Assessing the value of diagnostic tests: a framework for designing and evaluating trials

BMJ 2012; 344 doi: http://dx.doi.org/10.1136/bmj.e686 (Published 21 February 2012)
Cite this as: BMJ 2012;344:e686
  1. Lavinia Ferrante di Ruffano, research fellow (1)
  2. Christopher J Hyde, professor of public health and clinical epidemiology (2)
  3. Kirsten J McCaffery, associate professor and principal research fellow (3)
  4. Patrick M M Bossuyt, professor of clinical epidemiology (4)
  5. Jonathan J Deeks, professor of biostatistics (1)

  (1) Department of Public Health, Epidemiology, and Biostatistics, School of Health and Population Sciences, University of Birmingham, Birmingham B15 2TT, UK
  (2) PenTAG, Institute for Health Services Research, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter, UK
  (3) Screening and Test Evaluation Program, School of Public Health, University of Sydney, Sydney, Australia
  (4) Department of Clinical Epidemiology and Biostatistics, Academic Medical Centre, University of Amsterdam, Amsterdam, Netherlands

  • Correspondence to: Jonathan J Deeks j.deeks{at}bham.ac.uk
  • Accepted 30 November 2011

The value of a diagnostic test is not simply measured by its accuracy, but depends on how it affects patient health. This article presents a framework for the design and interpretation of studies that evaluate the health consequences of new diagnostic tests.

Most studies of diagnostic tests evaluate only their accuracy. Although such studies describe how well tests identify patients with disease (sensitivity) or without disease (specificity), further evidence is needed to determine a test’s true clinical value. Firstly, since tests are rarely used in isolation, studies are needed that evaluate the performance of testing strategies, accounting for when and how a new test is used within a diagnostic pathway, and how its findings are combined with results of other tests.1 Secondly, decision making involves selecting among multiple testing strategies; thus studies that compare test strategies and estimate differences in sensitivity and specificity are more informative than those that evaluate the accuracy of one test or diagnostic strategy.2 Thirdly, improvements in test accuracy will not benefit patients unless they lead to changes in diagnoses and patient management, requiring evaluations of the effect of improved accuracy on decision making.3 Finally, improved decision making is only one route by which tests affect patient health, and empirical evaluations are needed to compare the effect of test strategies on patient health.4
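As a reminder of the two accuracy measures referred to above, the following minimal Python sketch computes sensitivity and specificity from a 2×2 table of test results against a reference standard. The function name and the counts are hypothetical and chosen purely for illustration; they are not taken from any study cited here.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 counts against a reference standard.

    tp: diseased, test positive      fn: diseased, test negative
    fp: non-diseased, test positive  tn: non-diseased, test negative
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased participant")
    sensitivity = tp / (tp + fn)  # proportion of diseased patients correctly identified
    specificity = tn / (tn + fp)  # proportion of non-diseased patients correctly identified
    return sensitivity, specificity


# Hypothetical counts for illustration only
sens, spec = accuracy_measures(tp=80, fp=30, fn=20, tn=270)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Neither number says anything about what happens to patients after the result is known, which is the gap the rest of this article addresses.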

Ideally, new tests should only be introduced into clinical practice if evidence indicates that they have a better chance of improving patient health than existing tests.5 6 Tests can be compared by evaluating the downstream consequences of testing on patient outcomes, either directly in a randomised controlled trial or by decision analysis models that integrate multiple sources of evidence. Test-treatment trials randomly allocate patients to tests, follow up subsequent management, and measure outcomes only after treatment has been received (fig 1).7 Decision models use existing clinical data to extrapolate, through a number of assumptions, the link between intermediate outcomes (such as accuracy) and long term outcomes.8 A key issue for trials and decision models is the selection of outcomes that need to be measured or modelled to evaluate how tests are affecting patients. This selection requires a priori knowledge of the mechanisms by which tests affect patient health.
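To make the idea of a decision model concrete, the sketch below links accuracy to an expected patient outcome under deliberately strong assumptions: every test positive patient is treated, every test negative patient is not, and each of the four resulting groups has a fixed average outcome (for example, a utility between 0 and 1). The function name, the parameter names, and all numerical values are hypothetical. A model of this kind captures only the accuracy mechanism and none of the others that can affect patient health.

```python
def expected_outcome(prevalence, sensitivity, specificity,
                     u_treated_disease, u_untreated_disease,
                     u_treated_healthy, u_untreated_healthy):
    """Expected outcome per patient for one testing strategy.

    Assumes test positives are always treated and test negatives never are.
    """
    tp = prevalence * sensitivity              # diseased and treated
    fn = prevalence * (1 - sensitivity)        # diseased and untreated
    fp = (1 - prevalence) * (1 - specificity)  # non-diseased but treated
    tn = (1 - prevalence) * specificity        # non-diseased and untreated
    return (tp * u_treated_disease + fn * u_untreated_disease
            + fp * u_treated_healthy + tn * u_untreated_healthy)


# Hypothetical utilities and accuracy values, for illustration only
utilities = dict(u_treated_disease=0.80, u_untreated_disease=0.50,
                 u_treated_healthy=0.95, u_untreated_healthy=1.00)
outcome_a = expected_outcome(0.20, 0.70, 0.90, **utilities)  # existing test
outcome_b = expected_outcome(0.20, 0.90, 0.90, **utilities)  # new, more sensitive test
print(f"expected gain per patient: {outcome_b - outcome_a:.3f}")
```

Under these assumptions the gain per patient equals the prevalence, multiplied by the improvement in sensitivity, multiplied by the benefit of treating a diseased patient, which shows how small the average effect of a more accurate test can be.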

Fig 1 Design of a test-treatment randomised trial assessing whether bronchoalveolar lavage reduces the rate of death from ventilator associated pneumonia compared with endotracheal aspiration.7 *All patients received broad spectrum antibiotics while waiting for test results. †In patients with confirmed pneumonia, antibiotics were adjusted using culture results and sensitivities; in test negative patients, antibiotics were discontinued

In this article, we provide a comprehensive review of the mechanisms that can drive changes to patient health from testing, and include a summary checklist to assist readers, researchers, and funders who wish to design or appraise studies evaluating diagnostic tests. We have based our framework on a review of a large cohort of published test-treatment trials9 and key methodological literature.

Effect of tests on patient health

To establish whether a new diagnostic test will change health outcomes, it must be examined as part of a broader management strategy. Testing represents the first step of a test-treatment process: (1) a test is administered to identify a target condition, (2) the test result is considered, (3) a diagnosis is decided in light of the result and other evidence, (4) a course of treatment is identified, and (5) the treatment is implemented (fig 2).10

Fig 2 Simplified test-treatment pathway showing each component of a patient’s management that can affect health outcomes10

Changes to any aspect of this pathway after the introduction of a new test could trigger changes in health outcomes. Table 1 lists the mechanisms that commonly affect health outcomes.

Table 1 Attributes of the test-treatment pathway that affect patient health

Direct test effects

Test procedure

Some diagnostic procedures carry a risk of harm, hence alternatives that offer reduced procedural morbidity will be of immediate benefit to patients. For example, use of sentinel lymph node biopsy rather than dissection of the axillary node to investigate metastatic spread in patients with early breast cancer results in much lower rates of postoperative swelling of the arm, seroma formation, numbness, and paraesthesia.11

Altering clinical decisions and actions

Feasibility and interpretability

The downstream value of a test will be impaired at the outset if there are contraindications to its use or if it is prone to technical failure (feasibility), while tests that are more difficult to interpret (interpretability) could produce fewer definitive results. Either problem could require additional investigations, increasing the time to diagnosis, or reducing diagnostic and therapeutic yields through incorrect decision making or poor diagnostic confidence.

We observed this in a trial evaluating the diagnosis of coronary artery disease. Patients with acute chest pain who were allocated to exercise electrocardiography were significantly more likely to be referred for further investigation (coronary angiography) than those allocated to stress echocardiography.12 This finding was caused by the higher frequency of inconclusive diagnoses produced by exercise electrocardiography, some of which were because the test was contraindicated.

Test accuracy, diagnostic yield, therapeutic yield, and treatment efficacy

More accurate tests will improve patient outcomes if the reductions in false positive or false negative results lead to more people receiving appropriate diagnoses (diagnostic yield) and appropriate treatment (therapeutic yield). The degree to which appropriate treatment can improve patient outcomes depends on its efficacy (treatment efficacy). In a trial evaluating the effect of fluorescence cystoscopy on the recurrence of bladder carcinoma in situ, the enhanced accuracy of fluorescence cystoscopy compared with white light cystoscopy alone led to a substantial increase in lesions being identified and treated at initial diagnosis, which significantly reduced the rate of recurrence.13

Diagnostic and therapeutic confidence

Although diagnostic yield generally increases with accuracy, it is also affected by a doctor’s confidence in the diagnostic test. Tests inducing greater confidence could benefit patients by reducing the need for further investigations and shortening the time to treatment. The results of a trial evaluating the use of positron emission tomography (PET) to triage patients with non-small cell lung cancer who had been referred for operative staging show how a lack of diagnostic confidence can over-ride the benefits of improved accuracy.14 PET identified patients for whom surgery was not indicated because of incurable mediastinal disease, but no difference was found in the proportion of patients avoiding a thoracotomy (the primary outcome) because surgeons still preferred to confirm PET findings using standard operative staging.

Doctors’ confidence in the ensuing success of a treatment plan can affect treatment effectiveness by influencing the approach to treatment, particularly in surgery. Digital subtraction angiography (DSA) and multidetector row computed tomographic angiography (MDR-CTA) can both determine the location and degree of vascular narrowing in patients with symptomatic hardening of peripheral arteries. Doctors using DSA were significantly more confident of plans for surgery, owing to the test’s clearer vascular images; by contrast, vessel wall calcifications on MDR-CTA images were found to obscure interpretation and decrease confidence.15

Changing timeframes of decisions and actions

Tests that are undertaken earlier or produce results more quickly can improve health outcomes. For example, patients with unstable angina and non-ST segment elevation myocardial infarction allocated to receive early coronary angiography had a reduced risk of death, non-fatal cardiac events, and readmission.16 Patients with ventilator associated pneumonia allocated to a rapid antimicrobial susceptibility test received definitive results on average 2.8 days earlier than those receiving the standard susceptibility test and experienced significantly fewer days of fever, bouts of diarrhoea, and days on mechanical ventilation.17

However, quicker results are beneficial only if they produce earlier diagnosis or treatment. The addition of polymerase chain reaction (PCR) to conventional analysis of nasopharyngeal swabs for distinguishing between viral and bacterial causes of lower respiratory tract infection failed to decrease time to treatment, because physicians were unwilling to base treatment decisions solely on PCR, preferring to wait for slower bacterial results.18 Earlier diagnosis can provide psychological benefit by dispelling anxiety or providing earlier reassurance but can also cause psychological harm, particularly if effective treatments are unavailable. The psychosocial impacts of an earlier diagnosis have been highlighted in women following a positive cervical smear test19 or mammogram.20

Influencing patient and clinician perceptions

The patient’s perspective and the doctor’s personal perspective can also influence decision making, sometimes in unexpected ways. These unpredictable responses can eliminate or enhance potential improvements gained from other aspects of the test-treatment pathway.

Patients

Patients’ perceptions of testing, their experience of the testing process, and their understanding of the test result can all affect downstream health. Many studies show social, emotional, cognitive, and behavioural effects of testing across various clinical conditions.21

Test-treatment pathways will be unsuccessful if patients are unwilling to undergo a procedure. This is especially important if multiple testing is required; an unpleasant first test can adversely influence patients’ willingness to attend follow-up testing or treatment. The experience of undergoing tests can also influence illness beliefs. In a randomised trial, women who were able to observe their diagnostic hysteroscopy on a screen were reportedly less optimistic about the effectiveness of treatment offered, experienced more anxiety, but were better able to deal with procedural discomfort than women who could not see the screen.22

Diagnostic placebo effects might occur if the impression of a thorough investigation improves perceptions of health status. This could account for the significant improvements in health utility that were reported by patients with acute undifferentiated chest pain diagnosed in a specialist unit, compared with those diagnosed in emergency departments, despite having equivalent treatment and rates of adverse cardiac events.23

Receiving a diagnosis can have behavioural and health consequences—for example, by confirming patients’ negative health beliefs. Patients with lower back pain reported higher pain scores and poorer health status after receiving an x ray than those who received only a standard consultation.24 The incidental diagnosis of non-pathological abnormalities may have given patients a reason for their pain and encouraged illness behaviour despite the absence of an organic cause.

Adherence to treatment

Patients’ experiences and perceptions of the test-treatment pathway will also affect downstream health behaviours, such as the willingness or motivation to adhere to medical advice.25 Negative perceptions or experiences of testing and clinical diagnosis could cause patients to lose confidence in the diagnosis or management plan, making them reluctant to have subsequent testing or treatment.

Doctors

Doctors’ emotional, cognitive, social, or behavioural perspectives, although external to objective medical concerns, are nevertheless important in decision making. Referring doctors might modify management to reassure and satisfy patients or to prevent perceived threats of malpractice, often by requesting additional diagnostic information.26 This defensive medicine tends to raise the diagnostic threshold needed to trigger a change in management,27 and if additional tests are less accurate, harmful, or lead to treatment delays, patients will be adversely affected.

Systemic approach to evaluating tests

These examples establish that diagnostic tests often affect patient health outcomes in many complex ways. Although test accuracy is commonly regarded as the main mechanism to influence clinical effectiveness,28 we caution against its use as a surrogate for patient health. Only by looking at the test-treatment pathway as a whole can we identify which outcomes need to be evaluated to fully capture a test’s health effects.

Sound evaluations of healthcare demand explanation of how the intervention will improve patient health.29 This is equally true of diagnostic tests, although they are considerably more challenging to evaluate because so many intermediate, interacting factors are at stake. The need to identify which of these factors will exert an effect, and how, is a key tenet of complex intervention guidance.30 Table 2 provides a list of questions to guide the structured assessment of which processes are relevant and need to be measured within a given diagnostic comparison. This approach highlights precisely where in a test-treatment pathway important differences might originate, and will be useful for designing studies, appraising existing research, and determining what new evidence is needed to formulate diagnostic guidelines (box).

Table 2 Checklist to determine clinically important differences between test-treatment pathways of new and existing diagnostic test strategies

Box: Example evaluation of a diagnostic test

Consider replacing conventional imaging (which usually involves multiple images with different technologies) with PET-computed tomography (CT) for the diagnosis of breast cancer recurrence in adults with clinically suspected tumours. The first step is to state the alternative diagnostic and management pathways that will be compared, and to note the differences between them to narrow down mechanisms to consider.

How PET-CT improves patient health

On the basis of a recent systematic review, we might expect the improved accuracy of PET-CT to be the main mechanism driving changes to health.31 A more accurate differentiation of patients with and without recurrence would increase diagnostic and therapeutic yields, and the treatment consequences thereof.

Accuracy improvements could be offset by other decisions; patient contraindications to PET-CT could mean patients must revert to the existing multitest strategy. Although the known technical capabilities of PET-CT might increase doctors’ confidence, the obligation to rely on the results of one test could initially weaken such confidence, thus reducing the effective accuracy of the new protocol.

By contrast, use of a single test could accelerate treatment by enabling a quicker diagnosis. Nevertheless, the requirement for a specialist to interpret PET-CT scans could mitigate this benefit. Comparative procedural harms might also differ, highlighting the importance of considering direct health outcomes, although conventional imaging usually requires CT, so the exposure to radiation as a consequence of using PET-CT is probably similar. However, the success with which the new strategy operates will depend on any differences in perceptions and experiences; PET-CT might be more or less reassuring to patients and clinicians, and these unknown influences would need to be measured carefully.

Choosing outcomes to evaluate PET-CT

Using the framework prompts the consideration of informative outcomes by showing the new pathway’s full range of health effects, and allowing the assessment of all relevant direct and downstream measures of important patient outcomes. In the present example, such outcomes might include measures of anxiety, reassurance, health beliefs, function, symptoms, recurrence, progression, and survival.

Identified mechanisms can be measured as process outcomes in order to assess whether the new pathway is operating as expected. For example, the impact of temporality could be assessed as the time to diagnosis or time to treatment, and diagnostic confidence might be measured directly or by the number of additional investigations ordered.

We identify three benefits from using this framework. Firstly, it presents a structure for carefully developing a rationale that underpins the performance of a putative testing strategy. Secondly, it guides the identification of outcomes for randomised controlled trials, and will also assist in constructing appropriate decision models, particularly when trials are not practicable.32 Finally, the approach supports a full interpretation of empirical results by enabling trialists to distinguish between true ineffectiveness, poor protocol implementation, and methodological flaws in the study design.33 These tasks are particularly important for trials of tests, where sample sizes often need to be several orders of magnitude larger than they do in trials of treatments to detect differences in patient outcomes (fig 3).34 Findings of no effect are all too often interpreted as “evidence of absence,” when in reality studies rarely make provision for being able to attribute negative results to the diagnostic intervention, the study design, or (importantly) an inconsistently implemented test-treatment strategy. These interpretations can be distinguished by identifying and measuring the relevant driving mechanisms. By recording the use of additional diagnostic tests, treatments, and decision making, the failure of PET to reduce the rate of thoracotomies in patients with non-small cell lung cancer was shown to lie with an ill conceived treatment strategy, rather than with efficacy of the test.14 The trialists identified patients for whom PET unexpectedly failed to change management decisions, and they then found that strong preferences for the existing management (to operate on all patients with stage IIIa disease) exceeded the effect of PET results. By identifying all relevant mechanisms, and measuring how they exert their effect, test-treatment trials are more likely to contribute important evidence to the use of tests in clinical practice.

Fig 3 Sample size calculations for test-treatment randomised controlled trials. In randomised trials of interventions, all participants in a study group are allocated to receive the same intervention. In test-treatment trials, participants in each group receive a variety of interventions, depending on the test results and ensuing diagnosis. The magnitude of the observed treatment effect depends on the differences in proportions of patients who receive interventions appropriate to their condition in each group. This proportion would be expected to be quite small. The figure identifies those participants who contribute statistical power in a randomised trial comparing two tests (where the difference in outcome originates entirely from a difference in diagnostic accuracy). Test 2 has higher sensitivity than test 1 (difference shown in A). Test 2 also has higher specificity than test 1 (difference shown in B). Different widths of diseased and non-diseased columns indicate the prevalence of disease in the study sample. Only participants in A and B would have different test results if they received test 2 rather than test 1 and therefore the potential for different outcomes (all other participants in the study would have the same test result, irrespective of which test they were allocated to). Statistical power therefore depends on only the numbers of participants in A and B (particularly A); for example, if disease prevalence was 20%, and test 2 improved sensitivity by 20%, only 4% of the total sample size would fall in A.34
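The arithmetic in this caption can be sketched in a few lines of Python. The first function reproduces the calculation in the caption: the share of a trial sample falling in region A is the prevalence multiplied by the improvement in sensitivity, and the share in region B is the proportion without disease multiplied by the improvement in specificity. The second function is our own rough illustration and not part of the original figure. It assumes that the average effect across the whole sample is the per-patient effect diluted by this share, and that the required sample size scales with the inverse square of the effect to be detected, as in a standard two-group comparison of means with fixed variance. Both function names are hypothetical.

```python
def discordant_fraction(prevalence, delta_sensitivity, delta_specificity=0.0):
    """Share of the sample whose test result would differ between the two tests."""
    region_a = prevalence * delta_sensitivity        # diseased, detected only by test 2
    region_b = (1 - prevalence) * delta_specificity  # non-diseased, cleared only by test 2
    return region_a + region_b


def rough_inflation_factor(fraction):
    """Approximate multiplier on sample size when only `fraction` of participants
    can experience a different outcome (assumes n scales with 1 / effect**2)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1 / fraction ** 2


# Example from the caption: prevalence 20%, sensitivity improved by 20 percentage points
share = discordant_fraction(prevalence=0.20, delta_sensitivity=0.20)
print(f"share of sample in region A: {share:.0%}")
print(f"rough sample size multiplier: {rough_inflation_factor(share):.0f}")
```

Under these simplifying assumptions, a trial in which only 4% of participants can benefit from the new test would need several hundred times as many participants as a trial in which every participant receives the effective intervention.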

Conclusion

Establishing benefit to patient health must be the priority for diagnostic evaluations. Test accuracy is one component of test evaluation, but does not capture the impact of tests on patients. By considering the ways in which tests affect patients’ health, we reiterate the complex intervention perspective30 that it is not sufficient to measure outcomes, but rather it is essential to understand how these outputs are created, by conducting analyses of their workings and the mechanisms that underpin them. Clearly, this process must be undertaken with expert and stakeholder consultation to ensure all influential mechanisms are identified.

Summary points

  • The value of diagnostic tests ultimately lies in their effect on patient outcomes

  • Tests can affect patient health by changing diagnostic and treatment decisions, affecting time to treatment, modifying patient perceptions and behaviour, or putting patients at risk of direct harm

  • Improved accuracy is not always a necessary prerequisite for improving patient health, nor does it guarantee other downstream improvements

  • All elements of the management process (including decision making and treatment) must be considered when evaluating a diagnostic test

  • Randomised controlled trials of tests can measure these processes directly to understand why and how changes to patient health have occurred


Footnotes

  • Contributors: JJD conceived the idea for this project with support from CJH. LFR did most of the primary research for the paper. The initial framework was devised by LFR, CJH, and JJD, and further refined by all authors. All the authors drafted, revised, and gave final approval to the article. JJD is the guarantor.

  • Funding: The development of the framework was funded partly by the UK Medical Research Council Methodology Programme (grant G0600545, awarded to JJD), as part of a wider investigation into the use of randomised trials for evaluating the clinical effectiveness of diagnostic tests. The funders had no involvement in the research project. JJD is partly supported by the Medical Research Council Midland Hub for Trials Methodology Research, University of Birmingham (grant G0800808).

  • Competing interests: All authors have completed the ICMJE unified disclosure form at www.icmje.org/coi_disclosure.pdf and declare: the work was funded partly by the UK Medical Research Council Methodology Programme; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; and no other relationships or activities that could appear to have influenced the submitted work.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

References
