BMJ 2003;326:1127 (24 May), doi:10.1136/bmj.326.7399.1127
Philip J B Brown, honorary lecturer in healthcare informatics1, Victoria Warmington, research associate2, Michael Laurence, general practitioner3, A Toby Prevost, medical statistician4
1 School of Information Systems, University of East Anglia, Norwich NR4 7TJ, 2 Humbleyard Practice, Hethersett, Norfolk NR9 3AB, 3 Bacon Road Medical Centre, Norwich NR2 3QX, 4 Department of Public Health and Primary Care, University of Cambridge, Institute of Public Health, Cambridge CB2 2SR
Correspondence to: P J B Brown, Humbleyard Practice, Hethersett, Norfolk NR9 3AB Pjbb{at}hicomm.demon.co.uk
Design Randomised crossover trial. Clinicians coded patient records using both schemes after being randomised in pairs to use one scheme before the other.
Setting 10 general practices in urban, suburban, and rural environments in Norfolk.
Participants 10 general practitioners.
Source of data Concepts were collected from records of 100 patient encounters.
Main outcome measures Percentage of coded choices ranked as being exact representations of the original terms; percentage of cases where coding choice of paired general practitioners was identical; length of time taken to find a code.
Results A total of 995 unique concepts were collected. Exact matches were more common with Clinical Terms (70% (95% confidence interval 67% to 73%)) than with Read Codes (50% (47% to 53%)) (P < 0.001), and this difference was significant for each of the 10 participants individually. The pooled proportion with exact and identical matches by paired participants was greater for Clinical Terms (0.58 (0.55 to 0.61)) than Read Codes (0.36 (0.33 to 0.39)) (P < 0.001). The time taken to code with Clinical Terms (30 seconds per term) was not significantly longer than that for Read Codes.
Conclusions Clinical Terms Version 3 performed significantly better than Read Codes 5 byte set in capturing the meaning of concepts. These findings suggest that improved coding accuracy in primary care electronic patient records can be achieved with the use of such a clinical terminology.
Comparisons of different clinical coding schemes have mainly been conducted by coding experts looking at the schemes' coverage in relation to existing lists of terms.7-11 No study has examined whether a clinical terminology improves the performance of coding electronic patient records by practising doctors in primary care. The main aim of this crossover study was to determine whether Clinical Terms Version 3 provides greater accuracy and consistency than Read Codes 5 byte set for coding electronic patient records by general practitioners.
Design
Each general practitioner manually recorded the consultation details of 10
consecutive patients in an arbitrarily chosen consultation session. A simple
framework of headings was provided (reason for encounter, diagnosis,
treatment, and medical history) to prompt entry of details, but there was no
restriction in the terms that could be recorded. The terms from these 100
records were then entered verbatim (except for correction of spelling
mistakes) into an Access (Microsoft) database. We used random number
tables12 to group the general practitioners into five pairs and to randomly
select one of each
pair to code terms with Read Codes 5 byte set (termed Read Codes in this
paper) first and then to code with the Clinical Terms Version 3 (termed
Clinical Terms in this paper), and the other doctor in each pair to use the
Clinical Terms first followed by Read Codes. We asked the clinicians to code
the terms collected from their own records and those from the other doctor in
their pair. Before this exercise, we identified and removed any duplicate
concepts in the Access database, providing a body of about 200 terms for
coding for each doctor (fig 1).
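The allocation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used printed random number tables, whereas the sketch uses a seeded pseudorandom generator, and the function name `allocate_pairs` and the doctor identifiers are hypothetical.

```python
import random


def allocate_pairs(doctors, seed=None):
    """Randomly group doctors into pairs and choose, within each pair,
    who codes with Read Codes first (the other starts with Clinical Terms)."""
    if len(doctors) % 2 != 0:
        raise ValueError("an even number of doctors is needed to form pairs")
    rng = random.Random(seed)
    shuffled = list(doctors)
    rng.shuffle(shuffled)  # random grouping into consecutive pairs
    allocation = []
    for i in range(0, len(shuffled), 2):
        pair = shuffled[i:i + 2]
        k = rng.randrange(2)  # random choice of coding order within the pair
        allocation.append({
            "read_codes_first": pair[k],
            "clinical_terms_first": pair[1 - k],
        })
    return allocation


# Hypothetical identifiers for the 10 participating general practitioners.
doctors = [f"GP{n}" for n in range(1, 11)]
allocation = allocate_pairs(doctors, seed=2003)
```

Fixing the seed makes the allocation reproducible, which plays the same role as recording the starting position in a random number table.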
We videotaped each doctor coding his or her allocated file of terms using both Clinical Terms and Read Codes. To minimise any confounding variables from the human-computer interface, all the participants used the same software (NHS Information Authority Clinical Terminology Browser, March 2001 release) and laptop computer (Sony Vaio) to search the two coding schemes. Participants were given standardised instructions and training to identify a code for each term that would be "an acceptable match if the coded record were the only documentation of the concept in a paperless practice." They were encouraged to identify a match by searching for an appropriate term by keying in one or more words or part words and, when necessary, to browse the hierarchies of the coding scheme until a suitable equivalent was found, and then to record their match electronically by pasting their choice from the browser into an Access database (fig 2).
A researcher reviewed each video and recorded the time taken to code each term. Two researchers then independently examined the coded choices made by each general practitioner and ranked each match as exact or non-exact in representing the meaning of the original term. Differences in ranking between the two researchers were resolved by consensus reviews.
We estimated the accuracy of each coding scheme by calculating the proportion of coded choices ranked as being exact semantic representations of the original terms. We estimated consistency by identifying the proportion of cases where the coding choice of the paired general practitioners was the same. The length of time taken to find a code was used as a measure of usability of each scheme.
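As a minimal sketch of the two proportions defined above, assume each unique term is stored with the code chosen by each doctor in the pair and the researchers' exact or non-exact ranking of that choice. The record layout (`CodedTerm` and its field names) and the example codes are hypothetical and are not taken from the study database.

```python
from dataclasses import dataclass


@dataclass
class CodedTerm:
    code_a: str    # code chosen by the first doctor of the pair
    exact_a: bool  # researchers ranked that choice as an exact match
    code_b: str    # code chosen by the second doctor of the pair
    exact_b: bool


def accuracy(terms):
    """Proportion of coded choices ranked exact, averaged over both doctors."""
    if not terms:
        raise ValueError("no terms supplied")
    return sum(t.exact_a + t.exact_b for t in terms) / (2 * len(terms))


def consistency(terms):
    """Proportion of terms coded identically and exactly by both doctors."""
    if not terms:
        raise ValueError("no terms supplied")
    hits = sum(1 for t in terms
               if t.code_a == t.code_b and t.exact_a and t.exact_b)
    return hits / len(terms)


# Made-up records for one scheme, for illustration only.
terms = [
    CodedTerm("A1", True, "A1", True),    # identical and exact
    CodedTerm("B1", True, "B2", False),   # different codes
    CodedTerm("C1", False, "C1", False),  # identical but not exact
    CodedTerm("D1", True, "D2", True),    # both exact but different codes
]
scheme_accuracy = accuracy(terms)
scheme_consistency = consistency(terms)
```

The third record shows why consistency is stricter than simple agreement: identical codes count only when they are also exact matches of the original term.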
Sample size
In a pilot study, the records of five patients generated 40 unique terms
from a total of 42 terms. With one coding scheme 60% of the terms were matched
exactly, and with the other scheme 70% were matched exactly, with 67% of terms
being discordant pairs. Thus, in order to detect a 10% difference in exact
matches between the schemes, 704 unique terms would provide 90% power using
McNemar's test and 5% significance level. One hundred patients ought to
generate 840 terms, which, after removal of duplicates, should be sufficient
to provide 704 unique terms or, more certainly, 527 unique terms required for
80% power. The calculation was conservative in that, rather than being
assessed by a single general practitioner, each term was assessed by two
randomly paired clinicians and the average of their outcomes was used in the
analysis.
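The sample size figures can be approximated with a common textbook formula for McNemar's test, n = (z_alpha/2 + z_beta)^2 x (proportion discordant) / (difference)^2. The paper does not state which formula the authors used, so the sketch below is an assumption; it gives a figure close to 704 for 90% power and a figure close to, but not necessarily identical with, the 527 quoted for 80% power. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def mcnemar_sample_size(diff, discordant, power, alpha=0.05):
    """Approximate number of paired observations needed to detect a
    difference `diff` in paired proportions, given the expected proportion
    of discordant pairs, using a two sided test at level `alpha`."""
    if not 0 < diff <= discordant <= 1:
        raise ValueError("need 0 < diff <= discordant <= 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_beta) ** 2 * discordant / diff ** 2)


# Pilot values from the text: 10% difference, 67% discordant pairs.
n_90 = mcnemar_sample_size(diff=0.10, discordant=0.67, power=0.90)
n_80 = mcnemar_sample_size(diff=0.10, discordant=0.67, power=0.80)
```

Because the required number scales with the discordant proportion, a pilot with fewer discordant pairs would have implied a smaller sample.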
Statistical analysis
We chose statistical methods of analysis to be consistent with the paired
nature of the design. For all outcomes, we performed stratified analyses
within each of the five pairs of participants and then pooled the within pair
estimates using weights proportional to the number of terms coded by each
pair. We used Cohen's κ coefficient to assess agreement among
participants in the exactness of coding under each scheme, with a value of
0.6 indicating good agreement.13 We
calculated the proportion of coded choices ranked as exact matches for both
doctors in each pair and averaged the proportions as a repeated measures
summary statistic before pooling them across the pairs. We calculated the 95%
confidence interval from standard errors for a pair's averaged proportion,
which was derived as half of the standard error of a difference in two paired
proportions used in McNemar's test,14 with results verified using the
bootstrap method. We also used these methods to
assess consistency between doctor pairs, defined as the proportion of concepts
coded identically and as an exact match by both doctors.
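Cohen's coefficient of agreement compares the observed agreement between two raters with the agreement expected by chance. The sketch below is a generic implementation for two lists of labels, not the authors' S-Plus code; the example rankings are made up for illustration.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    a, b = list(ratings_a), list(ratings_b)
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed proportion of items on which the raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    categories = set(a) | set(b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Made-up exact (1) or non-exact (0) rankings for 10 terms.
doctor_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
doctor_b = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(doctor_a, doctor_b)
```

In this example the raters agree on 8 of 10 items, but chance agreement is 0.52, so the coefficient is well below the raw agreement of 0.8.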
We calculated the difference between the two schemes in the time taken to code each entry for both doctors in each pair, and we used the average of the two differences as a repeated measures summary statistic in the analysis.13 For those entries where only one doctor's time difference was available, this was analysed instead of the average. We used the bias corrected accelerated non-parametric stratified bootstrap method 15 16 with 5000 replications within S-Plus 2000 software to estimate 95% confidence intervals and P values for mean time differences because the time differences were inconsistent with a normal distribution. We also used this method to test for period effects and for carryover effects (scheme by period interaction), stratified by participant pairs using the testing approach based on difference measures for period effect and average measures for carryover effect.17 All tests were two tailed and assessed at the 5% level of significance.
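The idea of a stratified bootstrap can be illustrated with a simpler variant than the one used in the study. The sketch below resamples time differences with replacement within each participant pair and takes a percentile interval for the overall mean; it does not apply the bias corrected accelerated adjustment, and the function name and example data are hypothetical.

```python
import random
from statistics import fmean


def stratified_bootstrap_ci(strata, replications=5000, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for the overall mean,
    resampling with replacement separately within each stratum."""
    if not strata or any(len(values) == 0 for values in strata):
        raise ValueError("every stratum needs at least one observation")
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        resample = []
        for values in strata:
            # Keep each stratum at its original size in every replicate.
            resample.extend(rng.choices(values, k=len(values)))
        estimates.append(fmean(resample))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * replications)]
    upper = estimates[min(replications - 1,
                          int((1 + level) / 2 * replications))]
    return lower, upper


# Made-up time differences in seconds (Clinical Terms minus Read Codes)
# for two participant pairs.
strata = [[-13.0, -5.0, 2.0, -20.0, -8.0], [1.0, -4.0, -6.0, 3.0, -2.0]]
ci_low, ci_high = stratified_bootstrap_ci(strata, replications=2000, seed=1)
```

Resampling within strata keeps each pair's contribution proportional to the number of terms it coded, in line with the pooling used elsewhere in the analysis.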
κ coefficients of 0.69 (95% confidence interval 0.64 to 0.74) for
Clinical Terms and 0.65 (0.60 to 0.69) for Read Codes.
Accuracy of coding schemes
The proportion of concepts ranked as exact semantic representations with
Clinical Terms ranged from 0.60 to 0.74 (pooled proportion 0.70) for the 10
participants, with seven of the doctors being in excess of 0.7. By contrast,
the proportion of concepts ranked exact with Read Codes ranged from 0.37 to
0.58 (pooled proportion 0.50). All 10 doctors coded significantly more
concepts as exact matches with Clinical Terms than with Read Codes (P <
0.001 for each doctor). The excess proportion of concepts ranked exact with
Clinical Terms ranged from 14% (95% confidence interval 7% to 21%) to 27% (19%
to 34%) for the 10 participants. The excess proportion of concepts exactly
matched with Clinical Terms was similar in the doctors who used this scheme
before Read Codes (22%) and in those who used the scheme after using Read
Codes (18%), although this relatively small difference represented a
significant period effect. We also found a significant carryover effect, with
proportions of exact matches in the first period being 69% for Clinical Terms
and 52% for Read Codes (excess 17% (95% confidence interval 13% to 19%)), and
in the second period being 71% and 47% respectively (excess 23% (20% to
26%)).
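Comparisons of paired proportions such as these rest on the discordant terms, that is, terms matched exactly under one scheme but not the other. A minimal sketch of McNemar's test with continuity correction follows; the discordant counts are invented for illustration, and the stratified analysis actually performed in the study is not reproduced here.

```python
import math


def mcnemar_test(only_first, only_second):
    """McNemar chi-square statistic (with continuity correction) and two
    sided P value from the two discordant counts of a paired 2 x 2 table."""
    n = only_first + only_second
    if n == 0:
        raise ValueError("no discordant pairs, so the test is undefined")
    statistic = max(abs(only_first - only_second) - 1, 0) ** 2 / n
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: 30 terms exact only with the first scheme,
# 10 terms exact only with the second.
statistic, p_value = mcnemar_test(30, 10)
```

Terms that were exact under both schemes, or under neither, do not enter the statistic.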
Consistency of coding schemes
The percentage of concepts ranked consistent (that is, exact matches and
coded identically by both members of a pair) ranged from 53% to 63% for
Clinical Terms and from 31% to 43% for Read Codes. The excess in proportion
ranked consistent with Clinical Terms ranged from 21% to 23% and was
significant for each of the general practitioner pairs. The pooled proportion
of consistent matches by general practitioner pairs was 0.58 for Clinical
Terms and 0.36 for Read Codes, with a pooled difference in proportion of 0.22
(0.19 to 0.25) (P < 0.001). A further 48 concepts in Clinical Terms and 80
concepts in Read Codes were coded identically by general practitioner pairs
but not as an exact match of the original terms.
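Pooled figures of this kind are weighted averages of the within pair estimates, with weights proportional to the number of terms coded by each pair. The sketch below shows the arithmetic only; the per pair values are invented, since the paper reports ranges rather than each pair's estimate and term count.

```python
def pooled_estimate(estimates, n_terms):
    """Weighted average of within pair estimates, with weights
    proportional to the number of terms coded by each pair."""
    if not estimates or len(estimates) != len(n_terms):
        raise ValueError("need one term count per estimate")
    total = sum(n_terms)
    if total <= 0:
        raise ValueError("total number of terms must be positive")
    return sum(e * n for e, n in zip(estimates, n_terms)) / total


# Invented within pair differences in the proportion of consistent matches.
pooled = pooled_estimate([0.20, 0.25], [100, 300])
```

A pair that coded more terms pulls the pooled value towards its own estimate, here towards 0.25.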
Usability of coding schemes
The median coding time for each of the 10 participants ranged from 14 to 27
seconds for Clinical Terms and from 18 to 49 seconds for Read Codes. For 989
terms (99%), either both (85%) or one (14%) of the paired general practitioners
had timing data recorded. Compared with Read Codes, the mean excess time taken
to code with Clinical Terms ranged from -29 to 12.3 seconds for the pairs of
participants. Coding with Clinical Terms was shorter by a
mean of 5.9 seconds (4.0 to 7.9), being significantly shorter in four pairs
(by 13, 6, 3, and 7 seconds) and not significantly different in the remaining
pair (0.5 seconds longer). However, on the basis of the 850 terms with full
data available, there were significant period and carryover effects. In the
first period mean coding times were 28.1 seconds for Clinical Terms and 42.1
seconds for Read Codes, and in the second period they were 30.5 seconds and
29.3 seconds respectively. Compared with Clinical Terms, mean coding time with
Read Codes was significantly longer in the first period, by 14.0 seconds (11.2
to 17.0), and non-significantly shorter in the second period, by 1.2 seconds
(-1.3 to 3.8).
Strengths and limitations of our study
We compared the content and usability of the two coding schemes in a
practical setting, where clinicians had a variable degree of competency in
coding. While formulating the study, we considered videotaping the coding
process during live patient consultations. We rejected this in favour of a
randomised crossover trial as consistency between experimenters would have
been difficult to assess and confounding variables such as time constraints on
searching would have been difficult to control. Coding performance and times
are therefore only proxy estimates of use in real patient encounters. Further
improvement of data entry might be achieved with more sophisticated software
than was used in our study, such as by using templates for data entry and
menus to access commonly used terms.
We compared Clinical Terms Version 3 with Read Codes 5 byte set rather than the earlier Read Code 4 byte set, which is still in use, because an earlier study of coding performance in secondary care had indicated that Read Codes 5 byte set was superior in coverage to the earlier scheme when tested against a set of 2624 concepts.18 Clinical Terms Version 3 also has the ability to support the construction of more detailed concepts using a mechanism of qualifiers; for example, the core notion of "skin abscess" can be qualified by its exact site with reference to a detailed anatomy chapter. This functionality provides great expressivity, but we excluded it from consideration in this study because it would have afforded an unfair comparative advantage and its influence would depend heavily on software implementation and on user familiarity and skill. The value added by use of this qualifying mechanism merits further investigation.
We reduced confounding variables by using a randomised crossover trial and the same browser for searching both schemes. External validity was improved by involving several general practitioners. We identified carryover effects using tests that were sensitively based on within pair comparisons of general practitioners coding the same terms. The carryover effect in the proportion of terms exactly matched was small compared with the size of the difference between the two coding schemes in each period. The carryover effect for coding time reflected the change from the first period to the second period in the difference between the schemes, from 14 seconds shorter to 1 second longer for Clinical Terms compared with Read Codes. Four of the five doctors who coded first with Read Codes took more than 10 seconds longer to code each term than they did with Clinical Terms; review of the video comments of the remaining participant suggested that the longer coding time in the second period related to user fatigue (including remarks about the doctor's uncertainty of meaning of the original term and technical difficulties in using the notebook keypad). Only one of the participants who coded first with Clinical Terms took more than 10 seconds longer to code with this than with Read Codes, and this may be accounted for by the doctor's familiarity with the content of Read Codes. The small number of participants limits our ability to explain such differences with certainty, and we have cautiously interpreted them to say that the time taken to code with Clinical Terms was not significantly longer than that with Read Codes.
We did not try to measure the potential clinical importance of the non-exact matches. Clearly the absence of a detailed variant of a concept (such as "lipoma of neck" as opposed to just "lipoma") is less important than the complete absence of a suitable concept (such as "monoclonal gammaglobulinaemia of uncertain significance"). Judging the importance of non-exact matching would have introduced a further subjective element, requiring further checks of inter-rater reliability that were outside the scope of the study, although our data provide valuable material for further study.
Comparison with other studies
In previous reports assessing the content of Clinical Terms Version 3,
terminology experts coded lists of pre-existing concepts and generated rates
of completeness of 73%, similar to our findings.7 9 Our study examined the
performance of a clinical terminology against an established coding scheme by
general practitioners (non-expert coders).
Cimino et al used videotaping to study 238 coding events in secondary care (using a terminology known as Medical Entities Dictionary): 71% of the codings captured the exact meaning of the required concept, with a mean coding time of 40.4 seconds.19 These findings are similar to our results. Cimino et al also evaluated the reasons for suboptimal performance and described problems in the terminology content (13%), representation (10%) and usability (6%). These aspects were not part of our current study, which concentrated on assessing practical performance, but further work in identifying the reasons for failing to achieve an exact match in our sample could provide useful information for improving the content of Clinical Terms Version 3 and software design.
Conclusion
The coding of clinical records is an important aspect of medical audit,
research, epidemiology, management of resources, and the direct care of
patients. For information technology to be fully adopted, clinical notions
that are often complex must be accurately and easily represented as coded
concepts that are "user friendly" and easily retrievable. Our
study suggests that substantial advantages may be achieved by investing in the
implementation of Clinical Terms Version 3 or a similar terminology.
Contributors: PJBB formulated the study, and ATP, ML, and VW contributed to the design, ATP through the Cambridge Research Development and Support Group. ML recruited the participating clinicians, and VW coordinated the project, created the database of terms, and performed the video analysis. PJBB and ML ranked the matches. ATP performed the statistical analysis. PJBB wrote the first draft, and all authors contributed to the final draft of the paper. PJBB is the guarantor of the paper.
Funding: The study was funded by the NHS Eastern Region R&D grant No RCC33031.
Competing interests: PJBB advises the NHS Information Authority on coding and terminology.
Ethical approval: The study was granted ethical approval by the Norfolk and Norwich Ethical Committee.