Reporting of precision of estimates for diagnostic accuracy: a review
BMJ 1999; 318 doi: https://doi.org/10.1136/bmj.318.7194.1322-a (Published 15 May 1999) Cite this as: BMJ 1999;318:1322
Diagnostic tests are often evaluated, compared, and marketed in terms of their diagnostic performance statistics. These statistics are based on comparison of the result of a test with some independent assessment of true disease status. This comparison is best visualised with the conventional two-by-two table, sometimes referred to as a decision matrix. The numbers of observations in each cell, together with the row, column, and grand totals, allow calculation of the proportions of patients with any particular combination of test result and true disease state. These proportions of correct and incorrect results are then used to calculate diagnostic performance statistics: the 'accuracy', 'sensitivity', 'specificity', and 'predictive values' for the test.

Although the general application of statistical probability and decision theory is detailed in standard texts,1,2 the statistics themselves are, as Harper and Reeves reveal, often misunderstood.3,4 In particular, few authors, reviewers, or editors seem to have appreciated that these statistics are proportions and that each represents an estimate of the probability of a particular outcome. They are all, therefore, subject to sampling error. If this error is not quantified, the reliability of any observed performance as an estimate of the true performance is questionable. Harper and Reeves focus on calculating the standard error of the observed sensitivity and specificity (or a confidence interval for the true sensitivity and specificity) but perhaps fail to emphasise that the sampling error inherent in all diagnostic performance statistics should be quantified. I congratulate the BMJ for highlighting this problem but am dismayed by the poor results presented by Harper and Reeves. When will the editorial rules that apply to other statistics be applied to diagnostic performance?
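The calculation described above can be sketched as follows (a minimal Python illustration with hypothetical counts, not data from any of the papers under discussion):

```python
# Hypothetical two-by-two decision matrix:
# columns = true disease status, rows = test result.
tp, fn = 90, 10   # diseased patients: test positive / test negative
fp, tn = 30, 170  # healthy patients:  test positive / test negative

n = tp + fn + fp + tn                # grand total
accuracy    = (tp + tn) / n          # proportion of all results that are correct
sensitivity = tp / (tp + fn)         # P(test positive | disease present)
specificity = tn / (tn + fp)         # P(test negative | disease absent)
ppv         = tp / (tp + fp)         # P(disease present | test positive)
npv         = tn / (tn + fn)         # P(disease absent  | test negative)

print(accuracy, sensitivity, specificity, ppv, npv)
```

Each of these statistics is a proportion estimated from a finite sample, which is precisely why each carries the sampling error discussed above.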
References
1. Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall, 1991.
2. Armitage P, Berry G. Statistical Methods in Medical Research. 2nd ed. Oxford: Blackwell Scientific Publications, 1987.
3. Harper R, Reeves B. Reporting of precision of estimates for diagnostic accuracy: a review. BMJ 1999;318:1322-3.
4. Mackenzie R, Palmer CR, Lomas DJ, Dixon AK. Magnetic resonance imaging of the knee: diagnostic performance statistics. Clin Radiol 1996;51:251-7.
Competing interests: No competing interests
I agree that exact confidence intervals for binomial proportions such as sensitivity and specificity are desirable, and I think that many statistical packages do not offer these. I have written a small program for the PC which does this, and it is available free at:
www.sghms.ac.uk/depts/phs/staff/jmb/jmbsoft.htm
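For readers without access to the program, the exact (Clopper-Pearson) interval it is concerned with can be sketched in standard-library Python by solving for the binomial parameters whose tail probabilities equal alpha/2; the function names and counts below are illustrative only:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_ci(k, n, alpha=0.05, tol=1e-8):
    """Clopper-Pearson exact interval for a proportion k/n, by bisection."""
    # Lower limit: the p for which P(X >= k | p) = alpha/2 (0 if k == 0).
    lo = 0.0
    if k > 0:
        a, b = 0.0, 1.0
        while b - a > tol:
            m = (a + b) / 2
            if 1 - binom_cdf(k - 1, n, m) < alpha / 2:
                a = m   # tail still too small: true limit lies above m
            else:
                b = m
        lo = a
    # Upper limit: the p for which P(X <= k | p) = alpha/2 (1 if k == n).
    hi = 1.0
    if k < n:
        a, b = 0.0, 1.0
        while b - a > tol:
            m = (a + b) / 2
            if binom_cdf(k, n, m) > alpha / 2:
                a = m   # tail still too large: true limit lies above m
            else:
                b = m
        hi = b
    return lo, hi

# e.g. an observed sensitivity of 18/20 = 0.90
print(exact_ci(18, 20))  # roughly (0.683, 0.988)
```

Note how wide the interval is even for a seemingly impressive observed sensitivity of 90%, which is exactly the point of quoting it.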
I hope this helps.
Competing interests: No competing interests
The paper by Harper and Reeves will undoubtedly help to improve the quality of the reporting of research on diagnostic tests. Reporting the precision of estimates is an important issue, and we can add some data. Our review of the use of diagnostic tests in papers published in Medicina Clínica (the medical journal with the highest impact factor among those written in Spanish) showed that only 7 out of 42 (17%) papers reported the precision of their estimates of diagnostic accuracy.1 However, our letter has another intention. While Harper and Reeves recommend the use of confidence intervals, we disagree with their use in their own paper. Why estimate the precision of the results of their review? To what population of papers do they intend to apply their estimates? They selected all the relevant papers published in the BMJ for 1996-7, and this is not a sample. In our opinion, it is not correct to use their results to make predictions about past or future BMJ papers on diagnostic test research.
Dr. Ildefonso Hernandez-Aguado and Francisco Bolumar. Professors of Universidad Miguel Hernandez. Apdo. 18 Facultad de Medicina, E-03550 San Juan de Alicante (Spain)
1. Ramos Rincón JM, Hernández Aguado I. Investigación sobre pruebas diagnósticas en Medicina Clínica. Valoración de la metodología [Research on diagnostic tests in Medicina Clínica: an assessment of the methodology]. Med Clin (Barc) 1998;111:129-34.
Competing interests: No competing interests
Harper and Reeves provide a welcome reminder and a useful tool for the reporting of indices of diagnostic accuracy. For some years, the Information for Authors of Clinical Chemistry has asked authors to provide confidence intervals for such indices. As Editor of that Journal, I can report (with confidence) that few papers arrive with the requested indicators of uncertainty.
We have published guidelines (1) for the reporting of studies of diagnostic accuracy. The guidelines include reminders about confidence intervals and can serve as a checklist for authors, reviewers, editors and readers. We welcome suggestions for modifications of the guidelines.
1. Bruns DE, Huth EJ, Magid E, Young DS. Toward a checklist for reporting of studies of diagnostic accuracy. Clin Chem 1997;43:2211.
Competing interests: No competing interests
Harper and Reeves are to be congratulated on their nice graph. They are right in demanding exact confidence intervals for binomial proportions.
For this and related purposes, clever algorithms from Cytel Software are available in their StatXact program. Some of them are said to have been licensed to other companies (notably SPSS and SAS). Harper and Reeves claim that 'most statistical packages will generate exact confidence intervals'. Which ones were you thinking of, and do they represent the majority of all versions of all existing statistical packages?
Unfortunately, many authors still routinely use the Normal approximation, even when this is inappropriate because of small values of n (less than 30) and proportions below 10% or above 90%. Would you please give credit to the programs that do better than that? What about confidence intervals for the difference between two binomial proportions?
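The failure of the Normal approximation in exactly these circumstances can be seen in a few lines of Python (illustrative counts; `wald_ci` is a name chosen here, not from any package mentioned above):

```python
from math import sqrt

def wald_ci(k, n, z=1.96):
    """Normal-approximation ('Wald') 95% interval for a proportion k/n."""
    p = k / n
    se = sqrt(p * (1 - p) / n)   # s.e. of a binomial proportion
    return p - z * se, p + z * se

# An observed specificity of 19/20 = 0.95: small n AND an extreme proportion.
lo, hi = wald_ci(19, 20)
print(lo, hi)   # the upper limit exceeds 1 -- an impossible value for a proportion
```

An exact method would keep both limits inside [0, 1], which is one concrete reason to prefer it here.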
Any thoughts on another problem encountered when diagnostic methods are compared, namely the propensity of Cohen's kappa to become unstable when the true proportion is far from 50%?
Yours faithfully
Competing interests: No competing interests
Comment on precision of diagnosis
With regard to the article "Reporting of precision of estimates for diagnostic accuracy: a review", may I draw your attention to two points regarding the normal approximation to the binomial distribution, using s.e. = √(pq/n).
First, p and q can obviously be used interchangeably, so the correct conditions should be n × p ≥ 10 and n × q ≥ 10. In fact, Fleiss suggests a value of 5 rather than 10.
Second, it may be slightly confusing to the non-statistician to use the term 'sample size' for n. In fact, for sensitivity n refers to the subsample comprising the true positives plus false negatives (n.1 in statistical notation), and similarly for specificity, the true negatives plus false positives (n.2). Of course, if accuracy were to be quoted, one would use the total sample size (n..).
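The point about the correct n can be made concrete with hypothetical counts (a Python sketch; the numbers are not from any paper under discussion):

```python
from math import sqrt

tp, fn = 45, 5    # diseased column:  n.1 = tp + fn, the n for sensitivity
tn, fp = 80, 20   # healthy column:   n.2 = tn + fp, the n for specificity

sens, n_sens = tp / (tp + fn), tp + fn
spec, n_spec = tn / (tn + fp), tn + fp
total = tp + fn + tn + fp                 # n.. , the n for accuracy

se_sens = sqrt(sens * (1 - sens) / n_sens)   # uses n = 50, not 150
se_spec = sqrt(spec * (1 - spec) / n_spec)   # uses n = 100, not 150
acc = (tp + tn) / total
se_acc = sqrt(acc * (1 - acc) / total)       # only accuracy uses the grand total

print(se_sens, se_spec, se_acc)
```

Using the total sample size of 150 throughout would understate the standard errors of both sensitivity and specificity.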
Dr P B Pynsent
Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York: John Wiley & Sons, 1981: 13.
Competing interests: No competing interests