Retrospective analysis of evidence base for tests used in diagnosis and monitoring of disease in respiratory medicine

BMJ 2003; 327 doi: (Published 13 November 2003) Cite this as: BMJ 2003;327:1136
  1. Z Borrill, clinical fellow1,
  2. C Houghton, clinical fellow1,
  3. P J Sullivan, consultant (Paul.sullivan{at},
  4. P Sestini, associate professor of respiratory diseases2
  1. 1Department of Cardiorespiratory Medicine, Hope Hospital, Manchester M6 8HD
  2. 2Department of Clinical Medicine and Immunological Sciences, Division of Respiratory Diseases, University of Siena, Viale Bracci 3, 53100 Siena, Italy
  1. Correspondence to: P J Sullivan
  • Accepted 4 September 2003


Objectives To determine how many common clinical tests used in a respiratory medicine outpatient clinic are based on high quality evidence.

Design Retrospective review of case notes. Record of first three tests for each patient. Diagnostic tests, tests used to assess an existing condition, and explicit trials of therapy were included. Literature search for supporting evidence and grading of best evidence for each test.

Setting Inner city university teaching hospital in the United Kingdom.

Participants All new outpatients referred to a single respiratory medicine team over a period of three months.

Main outcome measures Proportion of tests supported by level 1a-1c evidence (scale developed by Centre for Evidence Based Medicine).

Results Only half the tests that were used to make or exclude a diagnosis and a fifth of the tests used to assess a known condition were supported by level 1a-1c evidence. There was no evidence to support trials of therapy.

Conclusions A large proportion of clinical tests in respiratory medicine are not supported by level 1a-1c evidence. None of the therapeutic trials that were used were supported by evidence.


Clinical practice based on scientific evidence is a major goal of the clinical governance process.1 The randomised controlled trial is regarded as the standard for the assessment of therapeutic interventions.2 Several studies have examined how many treatments in everyday clinical practice are based on good evidence in a range of specialties and in general practice.3-6 However, good treatment relies on accurate diagnosis, and doubts have been expressed regarding the quality and breadth of the current evidence base for diagnostic tests. Criteria for appraisal of papers that assess medical tests are available,7 just as they are for studies that look at therapeutic interventions, and in diagnostic testing poor study design has been shown to be associated with significant outcome bias.8

We used established criteria to assess the quality of available evidence for tests used in routine outpatient clinical practice in one respiratory medicine clinic. Previous studies of the proportion of therapeutic interventions that are evidence based have used the patient as denominator, expressing findings as the proportion of patients who received at least one evidence based intervention. Tests behave differently in that the final diagnosis may be based on a combination of test results. If an individual patient undergoes a series of tests that include high quality evidence based tests as well as inaccurate or unassessed tests the final diagnosis may be incorrect. We therefore used tests as the denominator rather than patients.


The study took place in a UK inner city teaching hospital that provides a referral service for primary care and other specialties. We examined the notes of all consecutive patients referred to the respiratory outpatient clinic in a three month period and recorded the first three eligible tests ordered for each patient. We included tests if they were performed to make a diagnosis or to assess a prediagnosed condition. We excluded tests performed as part of routine preclinical investigation and tests, such as full blood count, if they seemed to have been performed without any specific diagnosis in mind. Routine clinical examination was not included. The tests used were recorded along with the question that they were being used to answer. We used these test-question combinations as the denominator for this study—for example, “serum angiotensin converting enzyme concentration to diagnose sarcoidosis” or “serum angiotensin converting enzyme concentration to assess activity of known sarcoidosis” were considered separately.
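The test-question denominator described above can be illustrated with a minimal sketch. The records and field names below are hypothetical and are not study data; the point is only that the same test counts as a separate combination when it is ordered to answer a different question.

```python
from collections import Counter

# Hypothetical case-note records (NOT study data): each entry is one ordered
# test, stored together with the clinical question it was meant to answer.
records = [
    {"patient": 1, "test": "serum ACE", "question": "diagnose sarcoidosis"},
    {"patient": 2, "test": "serum ACE", "question": "assess activity of known sarcoidosis"},
    {"patient": 3, "test": "serum ACE", "question": "diagnose sarcoidosis"},
]


def count_combinations(rows):
    """Count how often each (test, question) pair was ordered."""
    return Counter((row["test"], row["question"]) for row in rows)


counts = count_combinations(records)
n_tests = sum(counts.values())   # every eligible test that was ordered
n_combinations = len(counts)     # distinct test-question combinations
```

In this toy example three tests collapse into two test-question combinations, because the same assay is used once for diagnosis and once for monitoring.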

We divided tests into three groups: group A comprised tests aimed at making a diagnosis; group B comprised tests performed to assess a previously diagnosed condition; and group C was a trial of therapy, which we included as a special type of test, when a drug was prescribed for a limited period with the explicit intention of predicting future response in an individual. A comprehensive Medline search was performed (1966-2001) for each test-question combination by two researchers experienced in searching medical databases. We used a published strategy with a sensitivity of 92%9 followed by a freely improvised search for each test-question pair. The best evidence that we retrieved for each test-question combination was graded according to the scale devised by the Centre for Evidence Based Medicine, Oxford (table 1). Some group A tests were regarded as absolutely specific and therefore graded as level 1c. In group C we searched for evidence that the result of a short term trial could predict the usefulness of a drug for an individual in the longer term.
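The grading step and the main outcome measure can be sketched as follows. This is an illustration only: the ordering of levels is an assumption about how the Centre for Evidence Based Medicine scale ranks evidence (best first), the function names are hypothetical, and the example gradings are invented rather than taken from the study.

```python
# Assumed ranking of the evidence levels, best first.
LEVELS = ["1a", "1b", "1c", "2a", "2b", "2c", "3a", "3b", "4", "5"]
HIGH_QUALITY = {"1a", "1b", "1c"}


def best_level(levels_found):
    """Return the best level among the retrieved studies, or None if none were found."""
    if not levels_found:
        return None
    return min(levels_found, key=LEVELS.index)


def proportion_high_quality(evidence_by_combination):
    """Proportion of test-question combinations whose best evidence is level 1a-1c."""
    if not evidence_by_combination:
        raise ValueError("at least one test-question combination is required")
    graded = [best_level(found) for found in evidence_by_combination.values()]
    supported = sum(1 for level in graded if level in HIGH_QUALITY)
    return supported / len(graded)


# Invented gradings for four hypothetical test-question combinations.
example = {
    "combination A": ["2b", "1b"],  # best evidence is 1b
    "combination B": ["4"],
    "combination C": [],            # no evidence retrieved
    "combination D": ["1c"],
}
share = proportion_high_quality(example)
```

A combination with no retrieved evidence still counts in the denominator, which matches the idea of reporting the proportion of all tests that are supported.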

Table 1

Levels of evidence according to criteria from Centre for Evidence Based Medicine, Oxford


Referrals were received for 90 patients during the three month period. Patients were seen by a consultant (PJS) or specialist registrar (or equivalent) in the same team. Not all patients had three eligible tests. A total of 165 tests were recorded: 137 in group A, 15 in group B, and 13 in group C. The tests could be represented as 38 different test-question combinations: 26 in group A, 5 in group B, and 7 in group C. Table 2 shows the best evidence found for each test, categorised and ranked according to the Centre for Evidence Based Medicine criteria. The finding of visible tumour on bronchoscopy with histological confirmation and the finding of Mycobacterium tuberculosis in bronchial washings when tuberculosis was the suspected diagnosis were regarded as absolutely specific and therefore level 1c. Both investigators agreed on the level of evidence assigned to each study. In group A there was level 1a-1c evidence for half of the test-question combinations and in group B for a fifth. In group C we found no studies that examined the predictive role for five of the seven therapeutic trials. In the case of trials of oral or inhaled corticosteroids in chronic obstructive pulmonary disease, the literature we found did not, in our judgment, show that these trials were predictive.

Table 2

Levels of evidence according to criteria from Centre for Evidence Based Medicine, Oxford


Few, if any, diagnostic tests give unambiguous results. To deal with this we are advised to combine clinical impressions of pretest probability with test results to derive a post-test probability of disease.10 This requires that the test be assigned a weighting, expressed formally as a likelihood ratio, which is calculated from the results of scientific studies of the test's performance. Standards for research of diagnostic tests have been published,7 and when these standards are not met studies have been shown to overestimate the value of tests.11 Many of the trials of diagnostic tests that are available fall short of these standards.
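As an illustration of how a likelihood ratio converts a pretest probability into a post-test probability, the following minimal sketch uses the standard odds form of Bayes' theorem. The function names and the example figures (sensitivity 0.90, specificity 0.80, pretest probability 0.30) are assumptions chosen for illustration and do not come from the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, and convert back."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Illustrative figures only (not from the study).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
after_positive = post_test_probability(0.30, lr_pos)
after_negative = post_test_probability(0.30, lr_neg)
```

With these illustrative inputs the positive likelihood ratio is about 4.5, so a positive result raises a 30% pretest probability to roughly 66%, while a negative result lowers it to about 5%. If the sensitivity and specificity come from a biased study, the likelihood ratio and therefore the post-test probability are distorted in the same direction.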

What is already known on this topic

Correct interpretation of test results requires information from scientific studies of test performance

If the studies do not meet quality standards the value of the test tends to be overestimated

What this study adds

Many diagnostic tests and tests used to monitor disease are not supported by high quality evidence

In 1996-7 only 30% of studies in one survey met at least six of eight standards11 and a similar survey in 1990-3 gave a figure of only 18%.12 Studies that evaluate diagnostic tests are also relatively rare. In a search of four prominent journals over a period of 16 years only 112 studies gave information on sensitivity, specificity, or likelihood ratios derived from more than 10 participants.13 It is therefore not surprising that a survey of 300 clinicians in a range of different specialties found that only 4% used formal methods to assess the accuracy of tests and 1% utilised likelihood ratios.14 Only half of the common tests we identified were supported by level 1a-1c evidence. We have also shown that there is little evidence to support tests that were used to assess previously diagnosed chronic diseases. The use of therapeutic trials to predict long term efficacy from short term response was similarly unsupported.

Our study reflects the practice in a single unit and the proportion of evidence based tests used elsewhere may be higher. Nevertheless, there is a clear need for further high quality research into medical tests, at least in the specialty that we have studied. There is also a need for an evidence base for the use of trials of therapy.


  • Contributors: PS had the original idea for the study. PJS and PS designed the study. PJS and ZB surveyed case notes, performed literature searches, and graded evidence. CH surveyed case notes. All authors commented on drafts. PJS is guarantor and can provide further details of the evidence found.

  • Funding None.

  • Conflict of interest None.