Improving the quality and clinical relevance of diagnostic studies

BMJ 2006;332:1129 (Published 11 May 2006)
  1. Frans H Rutten, general practitioner,
  2. Karel G M Moons, professor of clinical epidemiology,
  3. Arno W Hoes, professor of clinical epidemiology and general practice
  1. Julius Centre for Health Sciences and Primary Care, University Medical Centre, Utrecht, 3508 AB, Netherlands
  1. Correspondence to: F H Rutten

    Bachmann and colleagues show that few studies on diagnostic accuracy include calculations of sample size. Most such studies are too small to provide precise estimates of the overall sensitivity and specificity of a test, let alone for subgroups,1 and few studies have investigated this issue. We support the authors' recommendation that all diagnostic studies should calculate sample size at the planning phase, especially as straightforward methods are available for assessing simple proportions, such as sensitivity and specificity. However, they used the specificity and sensitivity of single tests to calculate sample size (understandable given the predominance of these tests in research) and did not consider the increasing number of clinically relevant studies that measure the accuracy of several tests in combination.2
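
    The kind of straightforward calculation mentioned above can be sketched as follows. This is a minimal illustration using the normal approximation for a single proportion, which is one common textbook approach and not necessarily the method used by Bachmann and colleagues. The function names, the expected sensitivity and specificity, the prevalence, and the confidence interval half-width are all assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def n_for_proportion(expected, half_width, confidence=0.95):
    """Subjects needed so that the normal-approximation confidence
    interval around a proportion has the requested half-width."""
    if not 0 < expected < 1:
        raise ValueError("expected proportion must lie between 0 and 1")
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * expected * (1 - expected) / half_width ** 2)


def total_sample_size(sensitivity, specificity, prevalence,
                      half_width, confidence=0.95):
    """Total patients to recruit so that both sensitivity and
    specificity reach the requested precision."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie between 0 and 1")
    # Sensitivity is estimated only in patients with the disease,
    # specificity only in patients without it.
    n_diseased = n_for_proportion(sensitivity, half_width, confidence)
    n_non_diseased = n_for_proportion(specificity, half_width, confidence)
    total_for_sensitivity = math.ceil(n_diseased / prevalence)
    total_for_specificity = math.ceil(n_non_diseased / (1 - prevalence))
    return max(total_for_sensitivity, total_for_specificity)


# Hypothetical planning values, not taken from any study
print(total_sample_size(sensitivity=0.90, specificity=0.80,
                        prevalence=0.25, half_width=0.05))
```

    The total is driven by whichever group is scarcer relative to its requirement, so a low prevalence quickly inflates the number of patients needed to estimate sensitivity precisely.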

    If you were testing the accuracy of B-type natriuretic peptide (BNP) for excluding heart failure in primary care, for example, precise estimation of the sensitivity and specificity of the test might seem important. Such estimates, however, have limited value in clinical practice. Firstly, in daily practice positive and negative test results merely help doctors to estimate the probability of disease.3 Secondly, a diagnosis in practice is seldom based on one test. Doctors would probably use the BNP test only if it added diagnostic information to that from other measures, such as signs and symptoms, which have already been assessed. To improve clinical practice, it would be better to measure the diagnostic accuracy of combinations of readily available tests (applying multivariable regression analysis with receiver operating characteristic curves) and then assess whether the addition of BNP improves accuracy.4 The BNP test should not be used when the patient's history and physical examination would provide equivalent diagnostic information.
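
    The comparison described above, accuracy of the readily available tests with and without BNP, can be illustrated by computing the area under the receiver operating characteristic curve for two sets of predicted probabilities. In the sketch below the probabilities are invented purely for illustration; in a real study they would come from multivariable regression models fitted to patient data, and the difference in area would need a confidence interval. The names `auc`, `p_clinical`, and `p_with_bnp` are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen diseased patient has a higher score than a
    randomly chosen non-diseased patient (ties count as one half)."""
    diseased = [s for s, y in zip(scores, labels) if y == 1]
    healthy = [s for s, y in zip(scores, labels) if y == 0]
    if not diseased or not healthy:
        raise ValueError("both diseased and non-diseased patients are needed")
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


# Hypothetical data: 1 = heart failure present, 0 = absent
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Invented predicted probabilities from signs and symptoms only
p_clinical = [0.70, 0.55, 0.40, 0.30, 0.50, 0.35, 0.30, 0.20, 0.15, 0.10]
# Invented predicted probabilities after adding BNP to the model
p_with_bnp = [0.85, 0.70, 0.60, 0.25, 0.40, 0.30, 0.20, 0.15, 0.10, 0.05]

auc_clinical = auc(p_clinical, labels)
auc_with_bnp = auc(p_with_bnp, labels)
print(f"AUC without BNP: {auc_clinical:.3f}")
print(f"AUC with BNP:    {auc_with_bnp:.3f}")
print(f"Added value:     {auc_with_bnp - auc_clinical:.3f}")
```

    If the difference in area is negligible, the added test contributes little beyond the history and physical examination.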

    We know even less about determinations of sample size for multivariable diagnostic studies. The number of tests studied is usually limited to allow for adequate data analysis. An often used rule is that at least 10 patients with the disease should be tested for each diagnostic test evaluated.5 Such ways of determining sample size are not ideal. If the method suggested by Bachmann and colleagues is used to determine sample size in evaluations of multiple tests, many assumptions must be made to achieve acceptable proportions of false negative and false positive diagnoses when a cut-off value is introduced.
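
    The rule of thumb quoted above translates into simple arithmetic. The sketch below assumes a hypothetical number of tests and disease prevalence and, as the letter notes, such a rule is not an ideal basis for determining sample size.

```python
import math


def patients_needed(n_tests, prevalence, diseased_per_test=10):
    """Return (diseased patients, total patients) implied by the rule
    of at least `diseased_per_test` diseased patients per test."""
    if n_tests < 1:
        raise ValueError("at least one test must be evaluated")
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    n_diseased = diseased_per_test * n_tests
    # Scale up by the prevalence to get the total to recruit
    n_total = math.ceil(n_diseased / prevalence)
    return n_diseased, n_total


# Hypothetical example: six tests, disease present in 25% of patients
n_diseased, n_total = patients_needed(n_tests=6, prevalence=0.25)
print(n_diseased, n_total)
```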

    Methodological improvements are needed to guide considerations of sample size in diagnostic research. Lack of consensus on some of these issues is no excuse for a complete lack of prior calculations of sample size in diagnostic studies. Bachmann and colleagues showed that a lack of such calculations is common. We hope that authors of studies on diagnostic tests will soon adopt more rigorous guidelines based on the standards for reporting of diagnostic accuracy (STARD initiative).


    • Contributors FHR, KGMM, and AWH critically discussed the structure of this article. FHR wrote the first draft and KGMM and AWH critically revised the manuscript.

    • Competing interests None declared.


    1. 1.
    2. 2.
    3. 3.
    4. 4.
    5. 5.