Research Methods & Reporting

Importance of accurately identifying disease in studies using electronic health records

BMJ 2010; 341 doi: (Published 19 August 2010) Cite this as: BMJ 2010;341:c4226
  1. Douglas G Manuel, senior scientist12345,
  2. Laura C Rosella, fellow456,
  3. Thérèse A Stukel, senior scientist46
  1. 1Ottawa Hospital Research Institute, 1053 Carling Ave, Ottawa, Ontario, Canada K1Y 4E9
  2. 2Statistics Canada, Ottawa
  3. 3Departments of Family Medicine and Epidemiology and Community Medicine, University of Ottawa
  4. 4Institute for Clinical Evaluative Sciences, Toronto, Ontario
  5. 5Dalla Lana School of Public Health, University of Toronto, Toronto
  6. 6Ontario Agency for Health Protection and Promotion, Toronto, Ontario
  1. Correspondence to: D G Manuel dmanuel{at}
  • Accepted 7 June 2010

Use of routinely collected electronic health data to identify people for epidemiology studies and performance reports can lead to serious bias

Disease registries and similar databases have facilitated epidemiological studies that contribute to our understanding of the natural course of disease and the value of medical and surgical interventions.1 These data have also allowed us to study the performance of health care, including patient safety and quality of care.2 3 However, there is an increasing possibility of inaccurate results arising from a shift in the type of data used to identify people with chronic diseases. In the past, registries for cancer and other diseases were laboriously created through active reporting from individual clinical records. Increasingly, however, disease databases are generated by applying a set of disease identification criteria en masse to routinely collected electronic data. For uncommon diseases, small errors in classifying people can result in a large number of incorrect entries in a database, leading to biased results and classification errors that propagate through calculations in ways that are difficult to appreciate intuitively.

How disease classification errors affect study conclusions

Routinely collected electronic data are increasingly used to identify patients with chronic diseases such as diabetes, heart disease, cancer, and arthritis, for research.1 4 Databases that contain information on patients with a wide range of diseases are even more widely used. The United Kingdom’s General Practice Research Database, for example, has been used for more than 700 studies of over 150 conditions (table 1),5 6 and hospital discharge databases are widely used in many countries for research and performance studies.

Table 1

 Examples of routinely collected data used to identify people with chronic diseases


However, few of these studies assess whether their findings may be biased by misclassification of patients in the database. We believe that the conclusions of many studies would change if their results were adjusted for bias or if there were no misclassification errors.

To illustrate our case, we estimated the potential bias in two published studies that use the Ontario Diabetes Database.7 8 Concerns about misclassification error have been described in other areas of health care, such as diagnostic accuracy studies, where methods to reduce error and reporting guidelines to disclose potential bias have been developed.9 We applied the same principles and methods to examine bias in the use of routinely collected data to identify disease.

Estimating bias

The Ontario Diabetes Database is a well developed database generated using only routinely collected administrative data. Both studies that we examined generated their study populations directly from this database, and both quoted a separate development study as validation that disease identification in the database was of high quality (table 2).10

Table 2

 Estimates of misclassified respondents in two published studies using sensitivity and specificity from development study


We calculated the potential percentage of misclassified people in the two study samples using a straightforward correction method described in different epidemiology settings.12 13 Like several other approaches to assessing misclassification and bias, this method centres on estimating the predictive accuracy of disease identification in terms of "false positives" (people who do not have a disease but are incorrectly enrolled in the disease database or study) and "false negatives" (people who do have the disease but are missing from the disease database).14 15 16 Once misclassification has been quantified, the amount of bias in a study can be estimated. For example, performance studies for diabetes care report the proportion of patients who receive care as recommended in clinical practice guidelines (such as regular haemoglobin A1c testing). Incorrectly including people without diabetes in a study of diabetes care will bias performance towards poor care, because people who do not have diabetes do not need regular testing. In this way, false positives and classification error will almost always bias performance reporting towards poor care.
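The direction of this bias is easy to see with a toy calculation. A minimal sketch, with wholly hypothetical numbers (a cohort of 10 000 true patients diluted by 4 000 false positives is assumed purely for illustration):

```python
def observed_performance(n_true, n_false_pos, care_rate_true, care_rate_false_pos=0.0):
    """Observed guideline-care rate when false positives sit in the study
    denominator. False positives do not need the recommended care, so by
    default none of them receive it."""
    cared = n_true * care_rate_true + n_false_pos * care_rate_false_pos
    return cared / (n_true + n_false_pos)

# Hypothetical cohort: 10 000 true diabetes patients, 95% of whom are
# tested as recommended, plus 4 000 false positives who receive no testing.
biased = observed_performance(10_000, 4_000, 0.95)
print(f"true rate 95.0%, observed rate {biased:.1%}")  # observed rate is pulled downward
```

However good the true quality of care, every false positive added to the denominator drags the reported rate further below it.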

Table 3 shows the findings from the validation study10 for the Ontario Diabetes Database and our estimate of false positives and false negatives in the study populations for each of two examples.

Table 3

 Estimates of misclassified respondents in two published studies using sensitivity and specificity from development study


The first study reported an annual rate of haemoglobin A1c testing and concluded that the level of testing was unacceptably low in 2005.7 The study reported that 58% of 63 699 patients with physician diagnosed diabetes (36 945 patients) received a haemoglobin A1c test. These results have been widely cited. The Health Council of Canada, for one, used these and other findings to conclude that care for people with diabetes in Canada is possibly the worst of any country in the Organisation for Economic Cooperation and Development.17 Applying a sensitivity of 86.1% and specificity of 97.1% from the database validation study, we estimate that 38 186 of the 63 699 participants were correctly classified as having diabetes (positive predictive value 59.9%). The remaining 25 513 patients were false positives, misclassified as having diabetes and not in need of regular haemoglobin A1c testing. Using this information, we calculated an unbiased estimate of haemoglobin A1c testing among diabetes patients of 97% (36 945/38 186).
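The arithmetic behind this correction can be checked directly from the figures quoted above (the sketch reproduces the article's calculation and, like the article, assumes that everyone who was tested is a true positive):

```python
enrolled = 63_699   # participants identified from the Ontario Diabetes Database
true_pos = 38_186   # estimated correctly classified (sens 86.1%, spec 97.1%)
tested = 36_945     # participants who received a haemoglobin A1c test

ppv = true_pos / enrolled            # positive predictive value
false_pos = enrolled - true_pos      # enrolled but without diabetes
biased_rate = tested / enrolled      # testing rate as originally reported
corrected_rate = tested / true_pos   # testing rate among true diabetes patients

print(f"PPV {ppv:.1%}, biased rate {biased_rate:.0%}, corrected rate {corrected_rate:.0%}")
```

The same 36 945 tests, divided by the true rather than the nominal denominator, turn "unacceptably low" testing into near-universal testing.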

The second study reported trends in the incidence and prevalence of diabetes8 and concluded that the prevalence of diagnosed diabetes among adult Ontarians in 2005 (8.9%, or 827 419 people) already exceeded the global prevalence predicted for 2030.18 The Ontario government extrapolated findings from the study to state that diabetes prevalence would increase by a further 30% by 2010.19 This prevalence estimate is widely quoted and is being used to support a considerably expanded diabetes strategy.19 However, applying the database validation study, we estimate that the unbiased prevalence of diabetes in 2005 was 19% lower than the original study found (7.2% versus 8.9%). Of the 827 419 people enrolled in the Ontario Diabetes Database, we calculated that 249 840 were wrongly classified as having been diagnosed with diabetes (positive predictive value 69.8%), and that 93 102 people had diabetes diagnosed by their physician but were not enrolled in the database (false negatives).
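The prevalence correction in this second example can be reproduced with the Rogan-Gladen estimator, which back-calculates true prevalence from apparent prevalence, sensitivity, and specificity. A sketch using the figures quoted above (the implied population size is approximate, so the false positive and false negative counts match the article's figures only to within rounding):

```python
def corrected_prevalence(apparent, sens, spec):
    """Rogan-Gladen estimator: back-calculate true prevalence from the
    apparent (database) prevalence and the algorithm's sensitivity and
    specificity."""
    return (apparent + spec - 1) / (sens + spec - 1)

apparent = 0.089            # database prevalence reported for 2005
sens, spec = 0.861, 0.971   # from the validation study

p_true = corrected_prevalence(apparent, sens, spec)
n = 827_419 / apparent      # implied adult population (approximate)

false_pos = (1 - spec) * (1 - p_true) * n   # enrolled without diabetes
false_neg = (1 - sens) * p_true * n         # diagnosed but not enrolled

print(f"corrected prevalence {p_true:.1%}")  # ~7.2%, versus 8.9% reported
```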

Why does this problem happen?

It is important to recognise a subtle but critical distinction between disease databases that individually verify diagnoses and those that do not. It is one matter to identify patients with a positive confirmation test such as a cancer pathology report, manually verify the report, and then use this information to create a disease registry. It is another matter to access an entire population's electronic records and apply identification criteria to automatically classify people who have a disease and exclude those who do not. Routinely collected electronic data offer the advantage of identifying many diseases in large populations at low cost. However, mass application of identification criteria is more prone to error than the traditional, more expensive, approach of individually or manually verifying disease diagnoses for each person.

When individual verification is not done, disease databases should at least attempt to gauge the accuracy of the identification process in a representative sample. Unfortunately, this step is commonly omitted. Instead, diagnoses are identified using the corresponding codes within health services data such as international classification of disease (ICD) codes from hospital admission discharge summaries or Read codes from primary care data.20 21 This approach assumes that the diseases are accurately and completely recorded in the databases, which in turn assumes that well implemented quality control procedures are in place at the point of data entry.21

The purpose of development and validation studies is to test these assumptions. These studies run different identification algorithms against a reference population whose disease status has been individually validated (box). Identification algorithms are constructed and tested using various diagnosis codes along with procedures and services in different combinations and intensities. Identification algorithms are then compared using tests of discrimination (sensitivity, specificity, likelihood ratios) and predictive accuracy (positive and negative predictive values).22 Other approaches for developing and validating identification methods are available.23 24 25 Because studies of identification accuracy (assessing the accuracy of tests to identify people already diagnosed with a disease) are similar to those studying diagnostic accuracy (assessing the accuracy of tests to diagnose people who may have a disease), the approaches to development, validation, and reporting are largely applicable to both types of studies.9
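These discrimination and predictive accuracy measures can be computed directly from a cross-tabulation of the algorithm's output against the reference standard. A minimal sketch (the flag lists below are invented purely for illustration; a real development study would use thousands of individually verified records):

```python
def accuracy_metrics(algorithm_flags, reference_flags):
    """Discrimination (sensitivity, specificity) and predictive accuracy
    (PPV, NPV) of an identification algorithm, measured against a reference
    standard of individually verified disease status."""
    pairs = list(zip(algorithm_flags, reference_flags))
    tp = sum(1 for a, r in pairs if a and r)          # true positives
    fp = sum(1 for a, r in pairs if a and not r)      # false positives
    fn = sum(1 for a, r in pairs if not a and r)      # false negatives
    tn = sum(1 for a, r in pairs if not a and not r)  # true negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Toy reference sample: 1 = has disease, 0 = does not
algo = [1, 1, 1, 0, 0, 0, 0, 1]   # algorithm's classification
ref  = [1, 1, 0, 0, 0, 0, 1, 1]   # individually verified status
metrics = accuracy_metrics(algo, ref)
```

Candidate algorithms (different code combinations and intensities) would each be scored this way and compared.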

Steps for creating and using a disease database when disease status is not individually verified

  • Development studies—Develop disease identification criteria by assessing the identification (or diagnostic) accuracy of different ascertainment approaches or algorithms against a reference standard of people with individually verified disease status

  • Create disease database—Systematically apply the case identification criteria to an entire population’s health data. Enrol people in the database if they satisfy case identification criteria. Regularly update the process when new data become available. Assign an enrolment (incident) date. Studies may not formally create a disease database; instead, the disease identification criteria are applied to the health data of all study participants

  • Assess for bias due to classification error—For each use of the disease databases or disease identification algorithm, assess the potential for misclassification to bias the study results. Estimate the number of people who may be false positives and false negatives, and examine how this affects the study results

  • Validation studies—(Re)validate the disease identification criteria in new study populations with a reference standard

Errors can also occur when information is abstracted from a database for a study. With increased computing power and wider availability of health data it is straightforward to apply identification criteria to an entire population, including populations beyond those represented in a development study (if one exists). Rather than formally creating a database for a specific study, it is common simply to apply the identification criteria to create a study population. For example, a hospital may assess its performance by examining the quality of care for people with an acute myocardial infarction, in terms of time to thrombolytic therapy or 30 day survival, by identifying people with a discharge diagnosis coded for acute myocardial infarction.26 However, if the ICD-9 code for myocardial infarction is recorded inaccurately, the quality measure may be biased.

Studies using electronically collected data can be grouped into three types:

  • Study denominator is drawn from the database—for example, examining healthcare performance for people with a particular condition

  • Study base is entire population and the numerator is people with a disease—for example, examining the incidence and prevalence of a disease in a population

  • Outcome of interest is people who develop a condition—for example, study of drug side effects such as admission for hyperkalaemia (identified from hospital discharge data) in patients prescribed spironolactone.27

Classification errors will potentially bias different study types in different ways. Performance reports are biased only from false positive entries, whereas estimates of disease incidence are affected by both false positive and false negative entries.

Assessment of bias

Bias from misclassification should be assessed for each use of data from electronic health record systems. Unless a diagnosis is individually verified there will inevitably be some classification error, and the resulting bias is difficult to intuitively gauge because both the amount and direction of bias are affected by the study design and by various properties of disease identification including prevalence, sensitivity, and specificity. The amount of bias may be large, even when the disease identification criteria seem to be accurate or there are well instituted data quality control procedures.

There are two general approaches that are used to estimate bias. The first approach applies the level of identification accuracy from development studies to a new study. We used this approach when we estimated bias in the two published diabetes studies. The second approach validates the identification in a new study, correcting for bias as needed.

Calculating bias is not always straightforward. First, development or validation studies are required, and they should report sensitivity and specificity or similar measures of disease identification accuracy. Many studies have not validated their method of disease identification. For example, more than two thirds of peer reviewed studies using the General Practice Research Database did not perform a validation study, and most of those that did calculated only specificity and positive predictive values.5 Even well performed validation studies carry generalisability concerns. In our examples, the validation study used a diagnosis of diabetes in general practice records as the reference standard. This reference standard is imperfect because, among other reasons, some patients may not have had their diagnosis recorded in their general practice records because their diabetes was diagnosed and cared for exclusively by specialists. Furthermore, it may be inappropriate to assume that sensitivity and specificity from a validation study hold for studies with different population characteristics. Methods are available to overcome these concerns, including sensitivity testing using different reference standards or levels of identification accuracy (calculating bias by varying sensitivity and specificity).9 We recommend the development and use of multi-attribute identification algorithms to estimate the probability of disease diagnosis (value of 0 to 1), rather than assigning disease status to a person (value of 0 or 1).28
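One such sensitivity analysis can be sketched by re-running the prevalence correction over a plausible range of sensitivity and specificity values. The ranges below bracket the validation study's point estimates and are assumptions chosen for illustration:

```python
def corrected_prevalence(apparent, sens, spec):
    # Back-calculate true prevalence (Rogan-Gladen estimator)
    return (apparent + spec - 1) / (sens + spec - 1)

apparent = 0.089   # database prevalence from the second example

estimates = [corrected_prevalence(apparent, s, c)
             for s in (0.82, 0.861, 0.90)    # sensitivity range (assumed)
             for c in (0.95, 0.971, 0.99)]   # specificity range (assumed)

print(f"corrected prevalence: {min(estimates):.1%} to {max(estimates):.1%}")
```

Even modest uncertainty in specificity moves the corrected estimate across a wide range, which is why reporting only a single corrected figure can be misleading.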


As our two examples show, even in well performed studies with well developed identification criteria, there is considerable opportunity for misclassification to bias results—so much so that studies can arrive at incorrect conclusions. Most of the time, it is straightforward to calculate the amount of potential bias and adjust the findings accordingly. Our findings are applicable beyond diabetes, particularly when disease prevalence is below about 10% and the specificity of identification is less than perfect (say, less than 98%).
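The interaction between prevalence and specificity can be made concrete with the standard positive predictive value formula. A sketch with illustrative values (the 5% prevalence and 90% sensitivity are assumptions, not figures from the examples above):

```python
def ppv(prevalence, sens, spec):
    """Positive predictive value: share of database entries that truly
    have the disease."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# At 5% prevalence, false positives from the large disease-free majority
# swamp the true positives unless specificity is near perfect.
low_spec = ppv(0.05, 0.90, 0.97)     # specificity below the ~98% threshold
high_spec = ppv(0.05, 0.90, 0.995)   # near-perfect specificity
print(f"PPV {low_spec:.0%} versus {high_spec:.0%}")
```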

The problem is further magnified because once a disease database is generated, many different investigators may use it for a wide range of studies or reports, propagating classification errors in their wake. However, data users cannot estimate bias when the accuracy of identification is unknown, and people who generate the databases or apply identification algorithms to routinely collected data should clearly describe the accuracy of their classification process. Researchers using such data should also publish an estimate of the percentage of false positives and negatives and the effect of misclassified people on the study’s findings. Readers of reports can reasonably ask if classification error potentially challenges the studies’ findings, and they should expect to see calculations that estimate the amount of bias.

It would be wrong to conclude that routinely collected data are poorly suited to studying people with chronic conditions. Routinely collected data are improving and increasingly include clinical information that can be used to verify disease status individually or to develop more accurate identification algorithms. With careful development and validation, disease identification can be made accurate, bias can be measured, and results adjusted accordingly.

Summary points

  • Routinely collected electronic health data are increasingly used to identify people with chronic conditions for research

  • Classification error can occur during the disease identification process

  • Even when the identification process has very good sensitivity and specificity, misclassification can considerably bias study findings

  • Studies using routinely collected data should assess the potential for classification error and adjust for bias




  • Contributors: All authors contributed to the development of the paper. DGM led the writing. LCR and TAS provided edits. DGM is the guarantor of the paper and analyses.

  • Competing interests: All authors have completed the unified competing interest form at (available on request from the corresponding author) and declare support for this article was provided by the Population Health Improvement Research Network; DGM holds a chair in applied public health from the Canadian Institute for Health Research and the Public Health Agency of Canada; no other relationships or activities that could appear to have influenced the submitted work. The opinions, results, and conclusions reported in this paper are not necessarily those of the funding or employment sources.

  • Provenance and peer review: Not commissioned; externally peer reviewed.