Use of 3×2 tables with an intention to diagnose approach to assess clinical performance of diagnostic tests: meta-analytical evaluation of coronary CT angiography studies

BMJ 2012; 345 doi: http://dx.doi.org/10.1136/bmj.e6717 (Published 24 October 2012)
Cite this as: BMJ 2012;345:e6717
  1. Georg M Schuetz, research assistant (1),
  2. Peter Schlattmann, professor of medical statistics (2),
  3. Marc Dewey, chief consultant (1)
  1. The Charité-Universitätsmedizin Berlin, Humboldt-Universität zu Berlin, Freie Universität Berlin, Department of Radiology, 10117 Berlin, Germany
  2. The Department of Medical Statistics, Informatics and Documentation, University Hospital of Friedrich Schiller University Jena, Jena, Germany
  • Correspondence to: M Dewey dewey{at}charite.de
  • Accepted 24 September 2012

Abstract

Objective To determine whether a 3×2 table, using an intention to diagnose approach, is better than the “classic” 2×2 table at reporting results transparently and handling non-evaluable results when assessing the accuracy of a diagnostic test.

Design After a systematic search for diagnostic accuracy studies of coronary computed tomography (CT) angiography, we evaluated the full texts of relevant studies to determine whether they provided enough data to calculate an alternative 3×2 table. To quantify an overall effect, we pooled diagnostic accuracy values using a meta-analytical approach.

Data sources Medline (via PubMed), Embase (via Ovid), and ISI Web of Science electronic databases.

Eligibility criteria Prospective English or German language studies comparing coronary CT with conventional coronary angiography in all patients and providing sufficient data for a patient level analysis.

Results 120 studies (10 287 patients) were eligible. Studies varied greatly in their approaches to handling non-evaluable findings. We found 26 studies (including 2298 patients) that allowed us to calculate both 2×2 tables and 3×2 tables. Using a bivariate random effects model, we compared the 2×2 table with the 3×2 table, and found significant differences (P<0.05) for pooled sensitivity (98.2% (95% confidence interval 96.7% to 99.1%) v 92.7% (88.5% to 95.3%)), area under the curve (0.99 (0.98 to 1.00) v 0.93 (0.91 to 0.95)), positive likelihood ratio (9.1 (6.2 to 13.3) v 4.4 (3.3 to 6.0)), and negative likelihood ratio (0.02 (0.01 to 0.04) v 0.09 (0.06 to 0.15)).

Conclusion Parameters for diagnostic performance significantly decrease if non-evaluable results are included by a 3×2 table for analysis (intention to diagnose approach). This approach provides a more realistic picture of the clinical potential of diagnostic tests.

Introduction

Clinical decisions in medicine are largely made on the basis of information gained from diagnostic testing. Against the background of more than 15 years of development and experience in evidence based medicine1 and in times of comparative effectiveness research,2 new diagnostic techniques have to be critically assessed and proven to be effective before they can be used on a wide scale. Diagnostic accuracy studies comparing an index test with a reference or gold standard and meta-analyses combining the results of many individual studies to explore a test’s diagnostic potential are an important and basic step in the overall evaluation process of the validity of a new diagnostic test.3 4 However, previous studies have shown that methodological deficits could affect the estimated diagnostic accuracy of a test.5 6 7

In non-invasive coronary imaging, technical innovations such as dual source8 and 320 row computed tomography (CT)9 have improved spatial and temporal resolution while reducing radiation. As a result of these developments, CT has evolved into the primary modality for non-invasively evaluating native coronary arteries over the past 10 years.10 11 Cardiac CT examinations performed on newer generation scanners (with at least 64 rows) have the potential to reliably rule out substantial stenoses in patients with a low to intermediate pretest likelihood, and thus can spare them an invasive catheterisation.11 12 Nonetheless, when exploring studies from this highly topical field of diagnostic imaging, we are confronted with a fundamental deficiency: non-evaluable results from the index test that are classifiable as neither positive nor negative. This problem has not yet been resolved adequately, although it has the most direct influence on diagnostic accuracy results. Coronary CT angiography studies commonly deal with non-evaluable results of the index test in different ways, especially when transferring a segment based (or vessel based) to a patient based evaluation: non-evaluable segments are simply excluded, patients with non-evaluable segments are excluded, or patients with non-evaluable segments are generally considered either positive or negative. These approaches lead to bias and overestimation of diagnostic accuracy at the study level, which is then introduced into meta-analyses pooling data from such studies.

In this article, therefore, we aimed to investigate how different approaches of dealing with non-evaluable results lead to variations in overall diagnostic accuracy values, using a systematically compiled pool of studies of non-invasive coronary CT angiography. We proposed an approach for the transparent reporting of such results—by applying a 3×2 table—to avoid biased overestimation of diagnostic accuracy.

Methods

We performed a systematic search for studies of coronary CT angiography reporting results on a patient level, using recently reported methods.13 14 Briefly, we searched Medline (via PubMed), Embase (via Ovid), and ISI Web of Science databases. The main inclusion criteria were prospective study design, conventional coronary angiography as the reference standard in evaluating native coronary arteries, both tests performed in all patients, and CT scanners with at least 12 detector rows. The studies also had to provide results allowing calculation of per patient 2×2 tables for obstructive coronary artery disease (defined as at least one coronary stenosis of at least 50%) and had to be published in English or German. We excluded studies that were explicitly stated to be retrospective or that potentially overlapped with other studies. The original meta-analysis has further methodological details.13 The update search was performed on 2 February 2011. We then checked the full texts of the pool of relevant studies to determine whether an alternative 3×2 table could be calculated on the per patient level, that is, whether they gave adequate background information on single patients with coronary CT images of non-evaluable quality and on each patient’s real health status as defined by the invasive catheter examination (gold standard).

Statistical analysis

For the meta-analytical data evaluation, we used an exact binomial rendition15 of the bivariate, mixed effects regression model developed by van Houwelingen and colleagues16 and modified for synthesis of diagnostic test data.17 We calculated summary diagnostic performance values including 95% confidence intervals from standard data of a 2×2 table (after excluding non-evaluable results) or the 3×2 table, including non-evaluable results either in the “false negative” or the “false positive” cell of a 2×2 table (worst case scenario) according to the results of the reference standard (intention to diagnose principle). For visually illustrating and directly comparing these two approaches, we combined two summary receiver operating characteristics curves18 into one graph. We also evaluated two further common approaches, categorically declaring non-evaluable results as either positive19 or negative.20
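For orientation, one common way of writing an exact binomial bivariate random effects model for study i is sketched below. The symbols (Se_i and Sp_i for study level sensitivity and specificity, n_{1i} and n_{0i} for the numbers of patients with and without disease, and the mean and covariance parameters) are our notation, chosen for illustration; the cited references give the exact parameterisation used in the analysis, which may differ.

```latex
\begin{aligned}
TP_i \mid Se_i &\sim \mathrm{Binomial}(n_{1i},\, Se_i), \\
TN_i \mid Sp_i &\sim \mathrm{Binomial}(n_{0i},\, Sp_i), \\
\begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},
\begin{pmatrix}
\sigma_{Se}^{2} & \rho\,\sigma_{Se}\sigma_{Sp} \\
\rho\,\sigma_{Se}\sigma_{Sp} & \sigma_{Sp}^{2}
\end{pmatrix}
\right)
\end{aligned}
```

Under the intention to diagnose approach, the counts TP_i and TN_i stay the same, but n_{1i} and n_{0i} also include the patients with non-evaluable results, which lowers the study level estimates that enter the model.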

Figures 1 to 3 summarise the different approaches and their influence on sensitivity and specificity. The “classic” 2×2 table (fig 1) does not take into account non-evaluable results. Figure 2 shows the effects of excluding non-evaluable results or declaring them as either positive or negative. Figure 3 presents the 3×2 table, suggested for transparent reporting and for avoiding overestimation of sensitivity and specificity (with an intention to diagnose approach). We used the MIDAS module21 for Stata, version 11 (StataCorp), and Proc GLIMMIX in SAS, version 9.2 (SAS Institute), to perform the analysis.

Fig 1 The “classic” 2×2 table, calculation of sensitivity and specificity

Fig 2 Different methods of handling non-evaluable results

Fig 3 3×2 table and intention to diagnose principle
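The approaches summarised in figures 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the class name, the function name, the approach labels, and the example counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreeByTwo:
    """Per patient 3x2 table: index test result by reference standard result."""
    tp: int      # test positive, disease present
    fp: int      # test positive, disease absent
    fn: int      # test negative, disease present
    tn: int      # test negative, disease absent
    ne_pos: int  # test non-evaluable, disease present
    ne_neg: int  # test non-evaluable, disease absent


def sens_spec(table, approach):
    """Return (sensitivity, specificity) for one way of handling non-evaluable results."""
    tp, fp, fn, tn = table.tp, table.fp, table.fn, table.tn
    if approach == "exclude":
        pass  # classic 2x2 table: non-evaluable results are dropped
    elif approach == "all_positive":
        tp += table.ne_pos  # every non-evaluable result counted as test positive
        fp += table.ne_neg
    elif approach == "all_negative":
        fn += table.ne_pos  # every non-evaluable result counted as test negative
        tn += table.ne_neg
    elif approach == "intention_to_diagnose":
        fn += table.ne_pos  # diseased and non-evaluable: false negative
        fp += table.ne_neg  # not diseased and non-evaluable: false positive
    else:
        raise ValueError(f"unknown approach: {approach}")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for illustration only (not taken from any included study).
example = ThreeByTwo(tp=45, fp=5, fn=2, tn=40, ne_pos=3, ne_neg=5)
APPROACHES = ("exclude", "all_positive", "all_negative", "intention_to_diagnose")

for name in APPROACHES:
    sensitivity, specificity = sens_spec(example, name)
    print(f"{name}: sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```

With these made-up counts, the intention to diagnose approach yields the lowest sensitivity and the lowest specificity of the four approaches, because every non-evaluable result is counted against the test.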

Results

We found 120 eligible studies8 9 19 20 22-137 of coronary CT angiography (including 10 287 patients) that compared each patient’s results with conventional coronary angiography. We checked their full texts to determine whether an alternative 3×2 table could be calculated.

Eleven (9%) of the 120 studies did not have non-evaluable results. Twenty six (22%) studies simply excluded non-evaluable segments, and 23 (19%) excluded patients with non-evaluable segments from analysis. Twenty six (22%) studies declared all patients with non-evaluable segments as positive, and seven (6%) declared them as negative. Three (3%) studies reported findings in a 3×2 table while retaining all non-evaluable results. For 24 (20%) studies, it remained unclear how non-evaluable segments were transferred to the patient level.
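The counts above can be checked with a short calculation. The following sketch uses the study counts reported in this paragraph; the dictionary labels and the helper name are ours, and rounding half up is an assumption made to match the reported percentages.

```python
# Number of studies per handling strategy, as reported in the Results.
handling_counts = {
    "no non-evaluable results": 11,
    "excluded non-evaluable segments": 26,
    "excluded patients with non-evaluable segments": 23,
    "declared patients positive": 26,
    "declared patients negative": 7,
    "reported a 3x2 table": 3,
    "handling unclear": 24,
}


def percent_half_up(part, whole):
    """Whole number percentage, rounded half up, using integer arithmetic only."""
    return (200 * part + whole) // (2 * whole)


total_studies = sum(handling_counts.values())
percentages = {
    label: percent_half_up(count, total_studies)
    for label, count in handling_counts.items()
}

for label, count in handling_counts.items():
    print(f"{label}: {count}/{total_studies} = {percentages[label]}%")
```

The seven categories add up to the 120 eligible studies. Because each percentage is rounded separately, the rounded values need not add up to exactly 100.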

For 26 studies8 9 19 20 22 25 32 35 41 45 48 61 62 72 96 97 99 105 111 112 115 116 121 125 126 129 (including 2298 patients), it was possible to calculate alternative 3×2 tables (table 1). Sensitivity, area under the curve, and positive and negative likelihood ratios indicated significantly decreased diagnostic performance (P<0.05), compared with the 2×2 table (table 2).

Table 1 Analysed studies with recalculated 3×2 tables including non-evaluable results

Table 2 Effects of different ways of handling non-evaluable results on pooled diagnostic accuracy values

Other approaches, which are sometimes also referred to as “intention to diagnose” but only declare patients with non-evaluable segments as either positive or negative, overestimated either sensitivity or specificity (table 2). Figure 4 compares the findings for the 2×2 table and the 3×2 table calculations combining summary receiver operating characteristics curves in one graph.

Fig 4 Summary receiver operating characteristics curves for 2×2 and 3×2 tables. The graph shows summary receiver operating characteristics curves using pairs of sensitivity and specificity of the 26 studies that provided enough background information to construct 3×2 tables. The upper left curve is based on the results of the studies when excluding non-evaluable results (2×2 table), the lower right curve when including them as either false positives or false negatives according to the results of the reference standard (3×2 table with an intention to diagnose approach). Curves include a summary operating point for sensitivity and specificity on the curve and a 95% confidence contour ellipsoid

Discussion

Our analysis indicated a lack of consensus on how studies of non-invasive CT coronary angiography handle non-evaluable outcomes, and our meta-analytical examination shows how different yet common strategies can distort diagnostic accuracy results. The “classic” 2×2 table (fig 1) does not hold enough information to show the true range of possible results, and forces investigators to use one of the approaches in figure 2: by simply excluding non-evaluable results, sensitivity and specificity are artificially increased; and by declaring non-evaluable results as either positive or negative, either sensitivity or specificity is overestimated, and the absolute numbers of non-evaluable results are not accessible.

From a clinical perspective, patients with non-evaluable results will have to be further evaluated to rule out or confirm significant disease. Therefore, classifying these patients as positive and taking into account that they will need further investigation seems to be clinically appropriate in several scenarios. However, in relation to the true diagnostic capabilities of the test itself, such an approach might be misleading.

Only by transforming the 2×2 table into a 3×2 table and reporting all results accordingly will researchers make study outcomes fully transparent. Furthermore, using an intention to diagnose principle (fig 3) for calculation ensures that neither sensitivity nor specificity is overestimated. The range of possible outcomes for sensitivity and specificity between the two scenarios of declaring non-evaluable results as positive or negative represents the overall effect of non-evaluable results. But combining these two scenarios in a conservative approach (that is, counting non-evaluable results as false negatives in patients with disease and as false positives in patients without disease) seems to summarise the true clinical potential of the diagnostic test most adequately.

Acknowledging the fact that the reference standard can also yield non-evaluable results, the 3×2 table could even be extended further to a 3×3 table, also transparently reporting these non-evaluable results of the reference standard. However, non-evaluable results of conventional coronary angiography were rare in our analysis (0.1%).

As early as 1987, Simel and colleagues138 proposed using the 3×2 table as the standard method for reporting absolute numbers of diagnostic test results. To our knowledge, they were the first to propose a solution to overcome the problem. To appraise diagnostic accuracy, they used new operational definitions of sensitivity and specificity, calculating them traditionally but including further test values (especially the overall test yield) necessary to account for non-evaluable results and to characterise the diagnostic test. Based on the 3×2 table, we systematically analysed different approaches of handling non-evaluable results on coronary CT angiography and found that this method significantly altered the results regarding diagnostic accuracy in a meta-analytical evaluation. By applying an intention to diagnose approach, we avoided the overestimation of sensitivity and specificity by including non-evaluable results in our calculation. We believe that this approach is most convenient, because the customary method of characterising diagnostic accuracy as a pair of sensitivity and specificity, without the need for further test yield values, is preserved.

In addition to the fact that there is no consensus on how to handle non-evaluable results, our full text evaluation showed a general lack of comprehensive reporting of results by coronary CT angiography studies. Thus, in 109 (91%) of the 120 studies, the authors encountered non-evaluable results on the segment or vessel level, but for 24 (22%) of these studies, it remained unclear how, or if at all, the results were transferred to the patient level. Furthermore, only 26 (31%) of the remaining 85 studies provided enough background information to enable us to calculate alternative 3×2 tables. Only three (3%) of the 120 studies from our pool originally reported non-evaluable results in a 3×2 table.

Our findings indicate a persisting lack of awareness of poor reporting and inconsistent handling of non-evaluable results: of the 120 investigated studies from the young field of non-invasive coronary CT angiography, 106 (88%) were published after 2005. This problem is probably not restricted to coronary CT angiography but could greatly affect diagnostic accuracy studies in general. Firstly, the methodological differences in handling non-evaluable results compromise the comparability of diagnostic accuracy data from different studies. Furthermore, the common approaches of handling non-evaluable results (fig 2) distort findings regarding diagnostic accuracy. This distortion affects not only sensitivity and specificity but also predictive values and likelihood ratios, which has important implications for clinical decision making.

If we assume that the pool of studies investigated here is representative of all studies available on the topic, the overall potential of coronary CT angiography as a test to rule out significant stenoses is weakened, because the upper limit of the confidence interval of the negative likelihood ratio exceeded the value of 0.1 (0.09 (0.06 to 0.15)).139 Moreover, biased results will probably also have an effect at higher levels of evidence, and will therefore affect the evaluation of new diagnostic technologies even more. This scenario could happen when such biased data are combined in systematic reviews and meta-analyses that constitute an important basis for health technology assessment reports, which influence decision and policy makers.
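To illustrate what a negative likelihood ratio means for ruling out disease, the sketch below converts a pretest probability into a post-test probability after a negative test, using the standard odds form of Bayes' theorem. The negative likelihood ratios are the pooled values reported above; the pretest probability of 30% and all names in the code are hypothetical choices for illustration.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Post-test probability from pretest probability and a likelihood ratio."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


RULE_OUT_THRESHOLD = 0.1     # value named in the text for the negative likelihood ratio
HYPOTHETICAL_PRETEST = 0.30  # assumed pretest probability, for illustration only

# Pooled negative likelihood ratios: (estimate, lower limit, upper limit).
lr_negative_2x2 = (0.02, 0.01, 0.04)
lr_negative_3x2 = (0.09, 0.06, 0.15)

for label, (estimate, lower, upper) in (
    ("2x2 table", lr_negative_2x2),
    ("3x2 table", lr_negative_3x2),
):
    residual = post_test_probability(HYPOTHETICAL_PRETEST, estimate)
    exceeds = upper > RULE_OUT_THRESHOLD
    print(
        f"{label}: probability of disease after a negative test = {residual:.3f}; "
        f"upper confidence limit above {RULE_OUT_THRESHOLD}: {exceeds}"
    )
```

Only the interval from the 3×2 table crosses the value of 0.1, which is the observation made in the text.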

The STARD checklist (Standards for the Reporting of Diagnostic Accuracy Studies)140 and the QUADAS tool (Quality Assessment of Diagnostic Accuracy Studies Included in Systematic Reviews),141 the most common reporting guideline and assessment tool for diagnostic accuracy studies, do not pay enough attention to the problem of non-evaluable results. STARD item 19 postulates the reporting of results as “including indeterminate and missing” results, and STARD item 22 requests investigators to report on how these results were handled. Item 13 of the original QUADAS checklist asks “Were uninterpretable/intermediate results reported?”—despite recognising the topic, the question is restricted to the level of reporting. But the mere inclusion of “non-evaluable results as positives” does not necessarily mean that absolute numbers of non-evaluable results will be accessible to readers. In this regard, it is especially disappointing that QUADAS-2,142 the revised version of the original QUADAS tool,141 does not consider non-evaluable results explicitly. This situation underlines the need to find a consensus on how to report and integrate non-evaluable diagnostic test results in the future.

Although we believe that the issues discussed here are inherent to all diagnostic tests, our analysis is limited to a specific field of medical imaging. Therefore, the effect on diagnostic accuracy estimates in other medical fields should be evaluated in further research.

Responsible reporting is an essential component of research conduct.143 To improve the reliability and value of medical research by promoting transparent and accurate reporting, the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) network has been established.144 This international initiative aims to bring together all stakeholders with an interest in the improvement of publications and medical research, including authors, journal editors, and peer reviewers. Among other things, the network’s website (www.equator-network.org) offers a comprehensive online library with the available reporting guidelines.

Conclusions

For diagnostic accuracy studies in particular, complete reporting of all results on all levels (at the segment, vessel, and patient levels), if applicable, is the basis for fully characterising a test’s diagnostic potential. As a minimum, authors of diagnostic accuracy studies should adopt the STARD checklist140 to meet general reporting standards, and medical journals should consistently encourage (or even demand) the use of this checklist for submitting manuscripts on studies of diagnostic accuracy. Beyond that, a standardised approach for authors of diagnostic accuracy studies on coronary CT angiography (and on any other topic) could be to report their findings in a 3×2 table. If this approach is performed following intention to diagnose principles, all results become transparent to readers, and authors will be more cautious in interpreting an overly optimistic presentation of diagnostic test accuracy.

What is already known on this topic

  • Diagnostic accuracy studies and meta-analyses of pooled results constitute an important step in the evaluation of diagnostic tests

  • There is no consensus on how diagnostic accuracy studies should handle non-evaluable results, and common approaches to do so overestimate diagnostic accuracy

What this study adds

  • In a pool of studies of non-invasive coronary CT angiography, we saw no consensus on how to handle non-evaluable results

  • Common approaches of dealing with non-evaluable results led to significant differences in overall diagnostic accuracy estimates

  • Transparent reporting of findings in a 3×2 table including non-evaluable results and applying an intention to diagnose approach can provide a more realistic picture of the clinical potential of diagnostic tests

Footnotes

  • Funding: This study was supported by a grant of the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF; FKZ: 01KG1110) for meta-analyses as part of the joint programme “clinical trials” of the BMBF and the German Science Foundation (DFG). The supporting organisation had no involvement in the design, conduct, analysis, or manuscript preparation of this study.

  • Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: PS and MD are supported by a grant of the German Federal Ministry of Education and Research (BMBF) for meta-analyses as part of the joint programme “clinical trials” of the BMBF and the German Science Foundation (DFG); PS is also supported by another grant of the DFG (Schl 3-1) and has received lecture fees from Bayer-Schering; MD has received grant support from Heisenberg Program of the German Research Foundation (DFG) for a professorship (DE 1361/14-1), European Regional Development Fund (20072013 2/05, 20072013 2/48), German Heart Foundation/German Foundation of Heart Research (F/23/08, F/27/10), a joint programme from the German Research Foundation (DFG) and the German Federal Ministry of Education and Research (BMBF) for meta-analyses (01KG1013, 01KG1110), GE Healthcare, Bracco, Guerbet, and Toshiba Medical Systems; MD has received lecture fees from Toshiba Medical Systems, Guerbet, Cardiac MR Academy Berlin, and Bayer (Schering-Berlex); MD is a consultant to Guerbet and one of the principal investigators of multicentre studies (CORE-64 and 320) on coronary CT angiography sponsored by Toshiba Medical Systems; MD is the editor of Coronary CT Angiography and Cardiac CT, both published by Springer, and offers hands-on workshops on cardiovascular imaging; institutional master research agreements for MD exist with Siemens Medical Solutions, Philips Medical Systems, and Toshiba Medical Systems; GMS is a physician working as a research assistant in MD’s working group, whose salary is financed by a BMBF grant for meta-analyses granted to MD; no other relationships or activities that could appear to have influenced the submitted work.

  • Ethics approval: Not required.

  • Contributors: MD had the initial idea for the manuscript; MD and GMS conceived and designed the study; MD, GMS, and PS were responsible for analysis and interpretation of data; GMS drafted and MD and PS critically revised the article for important intellectual content; all authors approved the final manuscript; MD is the guarantor. The article was initiated by the authors, who had full control of the data analysis and interpretation, and there was no industry sponsorship.

  • Data sharing: No additional data available.

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.

References
