
Education And Debate

Evidence based diagnostics

BMJ 2005; 330 doi: (Published 24 March 2005) Cite this as: BMJ 2005;330:724
  1. Christian Gluud (cgluud{at}, head of department1,
  2. Lise Lotte Gluud, specialist registrar1
  1 Cochrane Hepato-Biliary Group, Copenhagen Trial Unit, Centre for Clinical Intervention Research, H:S Rigshospitalet, Copenhagen University Hospital, DK-2100 Copenhagen, Denmark
  Correspondence to: C Gluud
  • Accepted 24 January 2005

Diagnostic tests are often much less rigorously evaluated than new drugs. It is time to ensure that the harms and benefits of new tests are fully understood


No international consensus exists on the methods for assessing diagnostic tests. Previous recommendations stress that studies of diagnostic tests should match the type of diagnostic question.1 2 Once the specificity and sensitivity of a test have been established, the final question is whether tested patients fare better than similar untested patients. This usually requires a randomised trial. Few tests are currently evaluated in this way. In this paper, we propose an architecture for research into diagnostic tests that parallels the established phases in drug research.

Stages of research

We have divided studies of diagnostic tests into four phases (box). We use research on brain natriuretic peptide for diagnosing heart failure as an illustrative example.2 However, the architecture is applicable to a wide range of tests including laboratory techniques, diagnostic imaging, pathology, evaluation of disability, electrodiagnostic tests, and endoscopy.

Establishing the normal range

In drug research, phase I studies deal with pharmacokinetics, pharmacodynamics, and safe doses.3 Phase I diagnostic studies are done to determine the range of results obtained with a newly developed test in healthy people. For example, after development of a test to measure brain natriuretic peptide in human plasma, phase I studies were done to establish the normal range of values in healthy participants.4 5

Figure: The harms and benefits of diagnostic tests need evaluating, just as drugs do


Diagnostic phase I studies must be large enough to examine the potential influence of characteristics such as sex, age, time of day, physical activity, and exposure to drugs. The studies are relatively quick, cheap, and easy to conduct, but they may occasionally raise ethical problems—for example, finding abnormal results in an apparently healthy person.6
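As a rough illustration of a phase I analysis, the sketch below derives a normal range from measurements in healthy participants and repeats the calculation within subgroups. It assumes the range is defined as the central 95% of values (2.5th to 97.5th percentile), which is a common convention and not something specified in this paper. The function names, the record layout, and the simulated concentrations are all hypothetical.

```python
import random


def reference_interval(values, lower_pct=2.5, upper_pct=97.5):
    """Return (lower, upper) percentile limits using linear interpolation."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two measurements")

    def percentile(pct):
        # Position on a 0..n-1 scale, interpolating between neighbours.
        pos = (len(ordered) - 1) * pct / 100
        low = int(pos)
        high = min(low + 1, len(ordered) - 1)
        return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)

    return percentile(lower_pct), percentile(upper_pct)


def intervals_by_group(records, key):
    """Compute a separate interval for each level of one characteristic."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record["value"])
    return {level: reference_interval(vals) for level, vals in groups.items()}


# Simulated (not real) concentrations in healthy participants, in pg/ml.
random.seed(0)
healthy = [
    {"sex": sex, "value": random.lognormvariate(mu, 0.5)}
    for sex, mu in (("female", 3.2), ("male", 2.9))
    for _ in range(500)
]

for sex, (low, high) in intervals_by_group(healthy, "sex").items():
    print(f"{sex}: {low:.1f} to {high:.1f} pg/ml")
```

If the intervals differ materially between subgroups, separate normal ranges may be needed, which is why the sample must be large enough in each subgroup.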

Diagnostic accuracy

In phase II, studies explore the diagnostic accuracy of a test in participants with both known and suspected relevant disease. Phase IIa studies compare test results in participants with disease diagnosed by a standard method with those in healthy participants (from diagnosis to test result). For example, a phase IIa study found significantly raised concentrations of brain natriuretic peptide in participants with left ventricular dysfunction diagnosed by echocardiography (median 493.5 (range 248.9-909.0) pg/ml) compared with healthy participants (129.4 (53.6-159.7) pg/ml).7 Subsequently, brain natriuretic peptide was recommended as a useful diagnostic aid for left ventricular dysfunction.7

After an association has been found between test results and a certain disease, phase IIb studies may be done to examine whether test results are related to the severity of a disease. For example, in a phase IIb study, brain natriuretic peptide concentrations were measured in healthy participants and participants with congestive heart failure.8 The study found a linear relation between test values and the degree of ventricular dysfunction. The authors concluded that the concentration of brain natriuretic peptide is a good indicator of the severity of chronic heart failure.8 However, the design only allows inferences about how a test works under ideal conditions.

Phase IIc studies examine the predictive value of a test among people with suspected disease (from test results to diagnosis). For example, a phase IIc study measured brain natriuretic peptide concentrations in participants with suspected heart disease.9 All participants had transthoracic echocardiography. The results showed raised concentrations of brain natriuretic peptide in participants with left ventricular systolic dysfunction (median 79.4 (interquartile range 35.9-151.0) pg/ml) compared with those with normal ventricular systolic function (26.7 (12.2-54.3) pg/ml).9 A concentration > 17.9 pg/ml had a sensitivity of 88% and specificity of 34%. Choosing different cut-off points did not improve the predictive characteristics.
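The link between sensitivity, specificity, and predictive value can be made concrete with Bayes' theorem. The sketch below takes the sensitivity (88%) and specificity (34%) reported above and computes positive and negative predictive values at several prevalences. The prevalences are illustrative assumptions, not figures from the cited study, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0 < value < 1:
            raise ValueError(f"{name} must be strictly between 0 and 1")

    # Expected proportions of the tested population in each 2x2 cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity are from the text; prevalences are assumed.
for prevalence in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.88, 0.34, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With a specificity this low, most positive results at low prevalence are false positives, while a negative result remains fairly reassuring.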

Four phases in architecture of diagnostic research

Phase I—Determining the normal range of values for a diagnostic test through observational studies in healthy people

Phase II—Determining the diagnostic accuracy through case-control studies, including healthy people and (a) people with known disease assessed by diagnostic standard and (b) people with suspected disease

Phase III—Determining the clinical consequences of introducing a diagnostic test through randomised trials

Phase IV—Determining the effects of introducing a new diagnostic test into clinical practice by surveillance in large cohort studies

The authors concluded that measuring brain natriuretic peptide in addition to routine investigations provides a small diagnostic advantage.9 However, the characteristics of the test may be different in other settings. A narrative review summarised several phase II studies on brain natriuretic peptides for diagnosing left ventricular systolic dysfunction.10 The studies found that sensitivity ranged from 26% to 92% and specificity from 34% to 89%. The predictive ability seemed to depend on sex, and the test performed less well in community based studies than in referral series.

Several concerns surround the validity and applicability of phase II studies. Two of the most important concerns are blinded evaluations of test results and selection of cut-off values or limits for normal values.2 To improve the quality of reporting of studies of diagnostic tests, the Standards for Reporting of Diagnostic Accuracy (STARD) Initiative was launched.11 Checklists and flowcharts were developed to aid authors of phase II studies. Future studies are planned to evaluate the effect of the initiative.

Clinical effects

In some cases, the value of a diagnostic test is self evident, for example in genetic testing. However, for most diagnostic tests, phase III studies are necessary to evaluate the beneficial and harmful effects of implementing a new test. The potential effects depend on how the information is used in subsequent clinical decisions. In phase III diagnostic studies, randomisation determines whether participants have the test or not. In some randomised trials, the result of the test may be used to determine a specific clinical course, including treatment. Alternatively, knowledge of a test result may be incorporated into standard clinical practice while treatment strategies remain unchanged.

A phase III study compared the effect of using brain natriuretic peptide concentrations or clinical assessment to guide treatment.12 The study included 69 participants with impaired systolic function and symptomatic heart failure. Participants were randomised to receive treatment guided by brain natriuretic peptide concentrations or by a clinical score of symptoms and signs of heart failure. Fewer deaths, hospital admissions, and cases of decompensation of heart failure occurred among participants whose treatment was guided by brain natriuretic peptide values than among those whose treatment was guided by clinical score.

The study shows the way for diagnostic research. However, the interpretation of the results is not simple. Larger trials with the most recently developed drugs are necessary before the test is implemented in clinical practice. The benefits and harms of the test in other settings—for example, in screening for asymptomatic left ventricular dysfunction—also seem relevant.

Methodological issues also arise. Estimation of required sample size is difficult in diagnostic trials.13 In randomised trials comparing two binary diagnostic tests, patients in the two arms with concordant results will not contribute to the final difference. Sample size estimations in such trials therefore include discordance rates. Other methodological aspects are similar to those in randomised drug trials. In both trial types, methods for adequate generation of the allocation sequence, allocation concealment, and blinding deserve attention.14 When several randomised trials on diagnostic tests are completed, systematic reviews and possibly meta-analyses are warranted.15
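The effect of discordance on sample size can be sketched with the standard normal-approximation formula for comparing two proportions. This is an illustration under stated assumptions and not necessarily the method of reference 13: if only the discordant fraction of patients can be managed differently, the difference in outcome between the arms is assumed to shrink to the discordance rate multiplied by the effect among discordant patients. The function name and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, effect_in_discordant, discordance_rate,
              alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-arm comparison of proportions.

    Concordant patients are managed identically in both arms, so the
    arm-level risk difference is diluted to
    discordance_rate * effect_in_discordant.
    """
    if not 0 < discordance_rate <= 1:
        raise ValueError("discordance_rate must be in (0, 1]")
    diff = discordance_rate * effect_in_discordant
    p_new = p_control - diff
    if diff <= 0 or not 0 < p_new < 1 or not 0 < p_control < 1:
        raise ValueError("inputs must give two distinct valid proportions")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / diff ** 2)


# Assumed inputs: 30% event rate with the old test, and an absolute
# reduction of 15 percentage points among patients with discordant results.
for rate in (1.0, 0.5, 0.2):
    print(f"discordance {rate:.0%}: {n_per_arm(0.30, 0.15, rate)} per arm")
```

Under these assumptions the required sample size rises steeply as the discordance rate falls, which is one reason why trials of diagnostic tests often need to be larger than expected.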

Long term consequences

Logistical problems such as storage, freezing, and thawing of samples or poor calibration of equipment may affect the accuracy of a diagnostic test after it is introduced into routine clinical practice. Several factors, such as a change in diagnostic indications, may influence the circumstances under which a test is used. Phase IV studies are therefore needed to determine whether the diagnostic accuracy of a test in practice corresponds to predictions from systematic reviews of phase III trials.

Phase IV studies include large cohorts of consecutive participants. Regular reports on regional, national, and international quality and benchmarking may also help improve the quality of testing in clinical practice. Phase IV diagnostic studies are an important aid in quality assurance and quality development and are necessary to identify rare adverse events.16


Few would dispute that valid evidence is necessary before we introduce new drugs in clinical practice. The randomised trial is the best method for comparing interventions. Randomised trials are also necessary to evaluate the potential effects of introducing a diagnostic test. Unfortunately, few randomised trials deal with diagnostic tests. We searched the Cochrane Central Register of Controlled Trials (Issue 1, 2005) and found that only 4.2% (18 366 of 435 786 records) dealt with diagnostic tests or screening. Awareness of the need for evidence based diagnostic testing must be increased. Organisations such as the Cochrane Collaboration can help by improving facilities for and methodological quality of systematic reviews of diagnostic tests.

The demand for diagnostic phase III and phase IV studies is increasing with the continuous development of new diagnostic methods. Although defensive use of diagnostic tests improves clinical outcomes for some patients, it worsens clinical outcomes for others.17 The four temporal phases of research provide a logical, stepwise procedure for development of diagnostic tests. However, the four phases do not apply to all diagnostic tests or provide an adequate basis for all types of diagnostic studies. Furthermore, one type of study may occur in several phases. The phase concept is meant as a guide that may be adjusted according to individual circumstances.

Summary points

The harms and benefits of diagnostic tests should be fully evaluated before they are used in clinical practice

A four phase process of assessment is suggested, mirroring that used for new drugs

The first phase focuses on establishing the normal range

The second phase focuses on establishing sensitivity and specificity and other measures of diagnostic accuracy

Randomised trials are then needed to determine whether patients benefit from the testing

The final phase is large continuous surveillance studies to identify consequences of testing in clinical practice


  • Contributors and sources: CG directs the Copenhagen Trial Unit, a non-specialty oriented centre for clinical intervention research, and studies random and systematic errors in clinical research. LLG studies random and systematic errors in clinical research. CG and LLG are physicians and editors of the Cochrane Hepato-Biliary Group. The literature came from unsystematic and systematic searches of PubMed, The Cochrane Library, and personal files. CG drafted and LLG revised the paper. CG is the guarantor.

  • Competing interests: None declared.

