Research

Sample sizes of studies on diagnostic accuracy: literature survey

BMJ 2006; 332 doi: https://doi.org/10.1136/bmj.38793.637789.2F (Published 11 May 2006) Cite this as: BMJ 2006;332:1127
  1. Lucas M Bachmann, senior research fellow (lucas.bachmann@evimed.ch)1,
  2. Milo A Puhan, research fellow2,
  3. Gerben ter Riet, clinical epidemiologist3,
  4. Patrick M Bossuyt, professor4
  1. Division of Epidemiology and Biostatistics, Department of Social and Preventive Medicine, University of Bern, Switzerland
  2. Horten Centre, University of Zurich, CH-8091 Zurich, Switzerland
  3. Department of General Practice, Academic Medical Centre, 1105 AZ Amsterdam, Netherlands
  4. Department of Clinical Epidemiology and Biostatistics, Academic Medical Centre, Amsterdam, Netherlands
  • Correspondence to: L M Bachmann
  • Accepted 7 March 2006

Abstract

Objectives To determine sample sizes in studies on diagnostic accuracy and the proportion of studies that report calculations of sample size.

Design Literature survey.

Data sources All issues of eight leading journals published in 2002.

Methods Sample sizes, number of subgroup analyses, and how often studies reported calculations of sample size were extracted.

Results 43 of 8999 articles were non-screening studies on diagnostic accuracy. The median sample size was 118 (interquartile range 71-350) and the median prevalence of the target condition was 43% (27-61%). The median number of patients with the target condition—needed to calculate a test's sensitivity—was 49 (28-91). The median number of patients without the target condition—needed to determine a test's specificity—was 76 (27-209). Two of the 43 studies (5%) reported a priori calculations of sample size. Twenty articles (47%) reported results for patient subgroups. The number of subgroups ranged from two to 19 (median four). No studies reported that sample size was calculated on the basis of preplanned analyses of subgroups.

Conclusion Few studies on diagnostic accuracy report considerations of sample size. The number of participants in most studies on diagnostic accuracy is probably too small to analyse variability of measures of accuracy across patient subgroups.

Introduction

Estimates of sensitivity and specificity in small studies on diagnostic accuracy are usually imprecise, with wide confidence intervals. This makes it difficult to assess just how informative a test may be. Subgroup analysis is often needed because sensitivity and specificity may vary across patient subgroups, yet estimates are even less precise when subgroups are considered.1 Investigators should calculate the sample size needed for sufficiently narrow confidence intervals at the planning stage of a study, as is common practice for randomised trials.2 3 For example, if adequate decision making requires a test with a sensitivity of at least 90%, the lower boundary of the 95% confidence interval for sensitivity should not fall below 90%.
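The planning calculation described above can be sketched in a few lines. This is an illustration only, not a method taken from the studies surveyed. It assumes a Wilson score interval, a hypothetical expected sensitivity of 95%, and a required lower confidence limit of 90%. The function names (`wilson_lower`, `cases_needed`, `total_needed`) are invented for the example. The prevalence of 43% is the median reported in this survey.

```python
import math
from statistics import NormalDist


def wilson_lower(p: float, n: int, confidence: float = 0.95) -> float:
    """Lower limit of the two sided Wilson score interval for proportion p in n patients."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / (1 + z * z / n)


def cases_needed(expected: float, required_lower: float, max_n: int = 100_000) -> int:
    """Smallest number of patients with the target condition for which the
    lower confidence limit reaches required_lower, given the expected sensitivity."""
    if not 0 < required_lower < expected < 1:
        raise ValueError("need 0 < required_lower < expected < 1")
    for n in range(1, max_n + 1):
        if wilson_lower(expected, n) >= required_lower:
            return n
    raise ValueError("no sample size up to max_n is sufficient")


def total_needed(cases: int, prevalence: float) -> int:
    """Total number of participants to recruit so that, on average,
    the required number have the target condition."""
    return math.ceil(cases / prevalence)


# Hypothetical planning values: expected sensitivity 95%, lower limit 90%.
cases = cases_needed(expected=0.95, required_lower=0.90)
print("patients with the target condition:", cases)
print("total participants at 43% prevalence:", total_needed(cases, 0.43))
```

The same calculation applies to specificity, using the number of patients without the target condition. It would need to be repeated for each preplanned subgroup, which multiplies the total accordingly.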

We hypothesised that studies of diagnostic accuracy rarely report considerations of sample size and tend to be small. We assumed that authors would state calculations of sample size if they had been performed. We investigated study sizes, the number of subgroup analyses, and how often studies on diagnostic accuracy reported calculations of sample sizes.

Methods

Two reviewers independently screened all issues of the BMJ, Lancet, New England Journal of Medicine, and JAMA as well as four specialist journals (Thorax, Gastroenterology, American Journal of Obstetrics and Gynecology, and European Journal of Pediatrics) published in 2002 for studies on the accuracy of tests. From each full report we extracted data on the type of test(s) studied (table), study sizes, the number of subgroup analyses, and how often the studies reported calculations of sample size. We calculated 95% confidence intervals, medians, and interquartile ranges.

Key features of 57 studies on accuracy of diagnostic tests published in eight major medical journals in 2002

Results

Fifty seven of 8999 articles reported test accuracy. Fourteen studies focused on a screening test and were excluded, which left 43 clinical studies for analysis. The median sample size was 118 (interquartile range 71-350) and the median prevalence was 43% (27-61%). The median number of patients with the target condition—needed to calculate a test's sensitivity—was 49 (28-91). The median number of patients without the target condition—needed to determine a test's specificity—was 76 (27-209).

Two of 43 studies (5%; 95% confidence interval 1.3% to 15.5%) reported a priori calculations of sample size, but no study reported that the sample size had been calculated on the basis of preplanned analyses of subgroups. Twenty articles (47%) reported results for subgroups of patients. The number of subgroups ranged from two to 19 (median four). Four studies used multivariable regression, but none used interaction terms.
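The text does not state which method was used for the confidence interval around two of 43 studies. As a sketch, and on the assumption that a Wilson score interval is acceptable, the following calculation gives limits that round to the reported 1.3% and 15.5%. The function name `wilson_interval` is chosen for illustration.

```python
import math
from statistics import NormalDist


def wilson_interval(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Two sided Wilson score interval for x events among n observations."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = x / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom


# Two of 43 studies reported a priori calculations of sample size.
low, high = wilson_interval(2, 43)
print(f"{2 / 43:.1%} (95% CI {low:.1%} to {high:.1%})")
```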

Discussion

In this survey of studies on diagnostic accuracy in eight major journals, only 4.7% of the studies reported that they considered sample size. Analysing small numbers of participants with and without the target condition usually yields imprecise estimates of overall diagnostic accuracy, and even less precise estimates for subgroups. For example, when the number of patients with the target condition is 49, the two sided 95% confidence interval of a sensitivity of 82% (40 true positives) is 68% to 91%.4 5
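The loss of precision in subgroups can be illustrated with an exact binomial (Clopper-Pearson) interval. The sketch below assumes that method, solved here by simple bisection, and should give limits close to the 68% to 91% quoted for 40 true positives among 49 patients. The subgroup of 12 patients with 10 true positives is hypothetical and roughly corresponds to one of four equally sized subgroups. The function names are chosen for illustration.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events among n with event probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def solve(f, target: float, increasing: bool) -> float:
    """Bisection on [0, 1] for f(p) = target, where f is monotonic in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Two sided Clopper-Pearson interval for x events among n observations."""
    alpha = 1 - confidence
    if x == 0:
        lower = 0.0
    else:
        # P(X >= x) increases with p; the lower limit makes it equal alpha / 2.
        upper_tail = lambda p: sum(binom_pmf(k, n, p) for k in range(x, n + 1))
        lower = solve(upper_tail, alpha / 2, increasing=True)
    if x == n:
        upper = 1.0
    else:
        # P(X <= x) decreases with p; the upper limit makes it equal alpha / 2.
        lower_tail = lambda p: sum(binom_pmf(k, n, p) for k in range(0, x + 1))
        upper = solve(lower_tail, alpha / 2, increasing=False)
    return lower, upper


# Whole study group: 40 true positives among 49 patients with the target condition.
print("all patients:", exact_interval(40, 49))
# Hypothetical subgroup: 10 true positives among 12 patients.
print("subgroup:", exact_interval(10, 12))
```

With a similar point estimate, the subgroup interval is considerably wider than that of the whole group, which is the concern raised about subgroup analyses in small studies.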

To ensure reasonably precise estimates of sensitivity and specificity investigators should consider sample sizes during the planning stages of the study. Investigators should calculate how precise the estimates of test accuracy should be for a particular diagnostic situation and report these calculations with confidence intervals. Arguably, sample size calculations are not important once data collection has been completed.2 All that matters is the width of the confidence intervals. However, besides determining the minimum study size needed, calculations of sample size have another useful feature that remains important after the study has finished. These calculations require authors to think about the minimum precision needed for a test to be clinically meaningful. It is easier for readers to interpret reported confidence intervals if they have access to these data.

In conclusion, few studies on diagnostic accuracy report calculations of sample size. The number of participants in most studies on diagnostic accuracy is probably too small to analyse the variability of measures of accuracy across subgroups of patients.

What is already known on this topic

To assess the minimum size needed for sufficiently narrow confidence intervals of sensitivity and specificity in study groups as a whole and in clinically relevant subgroups in particular, sample sizes should be considered at the planning stage of studies on test accuracy

What this study adds

Few studies on test accuracy report calculations of sample size

Overall size and subgroup size tend to be small in these studies, which leads to imprecise estimates of sensitivity and specificity

Footnotes

  • This article was posted on bmj.com on 20 April 2006: http://bmj.com/cgi/doi/10.1136/bmj.38793.637789.2F

  • Contributors All members of the SUBIRAR (subjectivity rationality and reasoning) research collaboration (Klaus Eichler, Madlaina Scharplatz, and Johann Steurer, Horten Centre, University of Zurich, Switzerland; Ulrich Hoffrage, Max Planck Institute for Human Development and Cognition, Berlin, Germany; Alfons G Kessels and Hans Severens, Maastricht University, Netherlands; Khalid S Khan, University of Birmingham, UK; Jos Kleijnen, Centre for Reviews and Dissemination, University of York, UK) were involved in the design and critical review of the study. LMB, MAP, and GtR developed the protocol. LMB and MAP acquired the data. All authors interpreted the data and helped prepare the manuscript. LMB was guarantor.

  • Funding LMB was supported by the Swiss National Science Foundation (grants 3233B0-103182 and 3200B0-103183).

  • Competing interests None declared.

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.