Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative
BMJ 2003; 326 doi: https://doi.org/10.1136/bmj.326.7379.41 (Published 04 January 2003) Cite this as: BMJ 2003;326:41
All rapid responses
Publication of the STARD paper, on standards for reporting of studies
of diagnostic accuracy, should ensure increased attention to the problems
of poor diagnostic research. Increased awareness of reporting of accuracy
studies should lead to better study designs and hence improve the evidence
base for diagnostic tests.
However, accuracy is but one aspect of assessing diagnostic tests.
Other evidence is required for determining the clinical utility of a test
- reproducibility, effectiveness and cost-effectiveness. A test is not
robust if not reproducible, yet evidence is often lacking. The effect of
test accuracy on patient outcomes is crucial, but the size of effect and
the optimum balance between sensitivity and specificity depend on the
context in which the test is used. Decisions about patient management may
be based on a test alone, for example with a screening test, or be part of
a battery of tests. For a screening test, in populations with a very low
prevalence of disease, false negatives are highly undesirable. With
additional information about a patient, reducing false positives may
become more important.
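
The dependence on context can be illustrated with a minimal sketch that applies Bayes' theorem to a test used at different prevalences. The sensitivity (0.90), specificity (0.95) and prevalence values are hypothetical figures chosen for illustration only; they do not come from the STARD paper or from any study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics, for illustration only.
for prevalence in (0.001, 0.01, 0.30):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

With the same sensitivity and specificity, the positive predictive value is very low in a screening population and much higher where disease is common, which is one reason the preferred trade-off between false negatives and false positives shifts with the setting.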
Whilst it may not be possible to provide comprehensive information
for established diagnostic tests, thorough appraisal of new tests should
include their relation to patient outcomes.
Improved reporting of accuracy studies is an excellent first step,
since currently new tests can be introduced with little evidence, unlike
the rigorous evaluation required for a new drug. However, we should be
moving to a staged evaluation for new tests, especially for population
screening, that adds evidence of effect on clinical, cost and personal
outcomes to measurement of the basic parameters of sensitivity,
specificity and reproducibility. This demands additional evaluative
methodology including clinical trials and mathematical modelling beyond
the studies discussed by the STARD group.
Competing interests: No competing interests
Jairo Echeverry-Raad and Patricia Agualimpia-Franky make valuable comments
about the difference between "validity" and "precision" and the necessity
to have both valid and precise estimators of diagnostic accuracy.
Sample size considerations were on the long list of potential items
for the STARD checklist but the item is not part of the final list,
published in BMJ and other journals. Yet we would like to point out that
item 12 of the STARD checklist addresses the statistical methods to
quantify uncertainty and that item 21 asks the authors to “Report
estimates of diagnostic accuracy and measures of statistical uncertainty
(e.g. 95% confidence intervals)”.
Many studies of diagnostic accuracy are rather small in size, with
expressions of imprecision often lacking from the respective publications.
We hope that the STARD checklist can help authors and reviewers to pay
more attention to these issues.
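
As a minimal sketch of the kind of uncertainty measure that item 21 asks for, the following computes a 95% confidence interval for sensitivity. The Wilson score method is our choice for illustration; the STARD checklist does not prescribe a particular interval, and the counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical studies with the same observed sensitivity of 0.90:
# 18 of 20 diseased patients test positive, versus 180 of 200.
for true_pos, diseased in ((18, 20), (180, 200)):
    low, high = wilson_interval(true_pos, diseased)
    print(f"{true_pos}/{diseased}: sensitivity 0.90, 95% CI {low:.3f} to {high:.3f}")
```

The point estimate is identical in both cases, but the interval from the small study is several times wider, which is the imprecision that often goes unreported.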
Competing interests: No competing interests
Would the incorporation of a sample size item be appropriate in the
checklist for reporting diagnostic accuracy studies – STARD statement?
It is interesting to observe, in recent years, a global process of
standardization in the design and conduct of diagnostic test studies, as
well as in the elements for their critical appraisal (1, 2), and that this
process has found in the pages of the BMJ a venue for its dissemination.
The effort made by the Standards for Reporting of Diagnostic Accuracy
(STARD) steering group is remarkable, and it was necessary, as were their
consensus statement recently published in full (3) and the summary that
appeared in the BMJ (4).
In the statement, the face validity and content validity of the
25-item checklist seem adequate. However, in our view, one extremely
important item, related to the consistency and precision of the data, is
missing: the required sample size.
Strangely, and for reasons we do not know, it is uncommon to find reports
of diagnostic test studies that have calculated the sample size needed to
obtain precise estimates.
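
A calculation of this kind can be sketched as follows. This is only one common approach, based on the normal approximation to the binomial, and it is our assumption rather than a method endorsed by the STARD group; the expected sensitivity, the desired half-width of the confidence interval and the prevalence are hypothetical planning values.

```python
import math


def diseased_needed(expected_sensitivity, half_width, z=1.96):
    """Diseased patients needed so that the ~95% CI for sensitivity has
    roughly the requested half-width (normal approximation)."""
    variance = expected_sensitivity * (1 - expected_sensitivity)
    return math.ceil(z**2 * variance / half_width**2)


def total_needed(n_diseased, prevalence):
    """Total patients to recruit so that about n_diseased have the disease."""
    return math.ceil(n_diseased / prevalence)


# Hypothetical planning values, for illustration only.
n_diseased = diseased_needed(expected_sensitivity=0.90, half_width=0.05)
n_total = total_needed(n_diseased, prevalence=0.25)
print(f"diseased patients needed: {n_diseased}")
print(f"total patients needed at 25% prevalence: {n_total}")
```

The same calculation with the expected specificity and the non-diseased fraction gives the number needed for specificity; the larger of the two totals would then apply.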
So it is important to distinguish between the terms "validity" and
"precision". They correspond to two different sources of inaccuracy when
estimating an effect: systematic error (which threatens validity) and
random error (which threatens precision). Systematic error occurs when
there is a difference between what the estimator is actually estimating
and the true effect measure. Systematic error (also known as bias) is
attributable to methodological aspects of the design and analysis other
than sampling variation, particularly the selection of subjects, the
quality of the information obtained, and the measurement of important
variables other than the disease and the study factor. Random error (also
known as “chance”), on the other hand, is the difference between the
estimate obtained from the study and the parameter actually being
estimated. Random error is essentially attributable to sampling variation
and depends on aspects of study design (sample size considerations) and
characteristics of the estimator (its variance) (5).
Estimators need to be valid in the first instance and then precise;
although the two qualities are conceptually different, both should
accompany the process.
Thus, substantial random error caused by an inadequate sample size
produces a lack of precision in the data, and this in turn has statistical
implications (inability to reject the null hypothesis, or very wide
confidence intervals). Small rearrangements or modifications of the
numbers in the cells of the 2x2 table generate large changes in the
results. In small groups of patients, new attempts to validate the test
will produce dissimilar results by chance alone. This is more evident when
subgroup analyses are
Now, what research and clinical implications could the lack of
precision in sensitivity, specificity, likelihood ratios, etc. have in
diagnostic test studies with small samples? What would happen if, at the
same time, the study sample is enriched with an unusually high pre-test
probability (as happens in tertiary-level hospitals, where most
diagnostic tests are usually evaluated)?
Well, there is a high possibility that, by simple chance, the estimate
obtained in a small study yields a high performance indicator,
overestimating the true value and making the test appear erroneously
excellent. In addition, the lack of stable and consistent results will
lead to discrepancies from study to study, and use of the test in other
settings with a broader spectrum and lower prevalence of disease will
create many difficulties for its applicability and reproducibility in
daily clinical practice.
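
The role of chance in small studies can be made concrete with a minimal sketch using the exact binomial distribution. The true sensitivity of 0.80 and the two study sizes are hypothetical; the sketch asks how often a study would report a sensitivity of 0.90 or more purely through sampling variation.

```python
from math import comb


def prob_at_least(k_min, n, p):
    """Exact binomial probability of k_min or more true positives among
    n diseased patients when the true sensitivity is p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


true_sensitivity = 0.80  # hypothetical true value, for illustration only

# Probability of observing a sensitivity of at least 0.90 by chance alone.
small_study = prob_at_least(18, 20, true_sensitivity)     # 18 or more of 20
large_study = prob_at_least(180, 200, true_sensitivity)   # 180 or more of 200
print(f"20 diseased patients:  P(observed >= 0.90) = {small_study:.3f}")
print(f"200 diseased patients: P(observed >= 0.90) = {large_study:.5f}")
```

Under these assumptions roughly one small study in five would report an apparently excellent sensitivity, whereas this would almost never happen in the larger study.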
Therefore, if the previous considerations are true, wouldn’t it be
important to include a sample size item in the checklist for reporting
diagnostic accuracy studies? And, furthermore, to propose methodological
elements for the calculation of the designs of diagnostic tests sample
JAIRO ECHEVERRY-RAAD M.D.
Pediatrician, Associate Professor, Faculty of Medicine, Universidad
Nacional de Colombia, Bogotá, Colombia, South America.
Pediatrician, Universidad Nacional de Colombia, Bogotá, Colombia, South
1 Knottnerus JA, van Weel C, Muris JWM. Evidence base of clinical diagnosis:
Evaluation of diagnostic procedures. BMJ 2002;324:478-480
2 Irwig L, Bossuyt P, Glasziou P, Gatsonis C, Lijmer J. Evidence base
of clinical diagnosis: Designing studies to ensure that estimates of test
accuracy are transferable. BMJ 2002;324:669-671
3 Bossuyt P, Reitsma JB, Bruns DE, Gatsonis C, Glasziou P, Irwig L,
et al. The STARD statement for reporting studies of diagnostic accuracy:
explanation and elaboration. Clin Chem 2003;49:7-18.
4 Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig
LM, Lijmer JG, Moher D, Rennie D, de Vet HCW for the STARD steering group.
Towards complete and accurate reporting of studies of diagnostic accuracy:
the STARD initiative. BMJ 2003; 326:41-44.
5 Kleinbaum DG, Kupper LL, Morgenstern H. Validity: General
considerations. In: Kleinbaum DG, Kupper LL, Morgenstern H, Eds.
Epidemiologic Research 1982. Edit Van Nostrand Reinhold. New York. Chap
Competing interests: No competing interests
In my opinion, the joint publication of the STARD statement in
several scientific journals really deserves to be marked “cum albo
lapillo” (a memorable occurrence). Perhaps this is particularly true
for clinical microbiologists, because the document is also being published
in the first 2003 issue of the Journal of Clinical Microbiology (JCM) (1).
So far clinical microbiology authorities have kept silent about the
application of EBM principles to diagnostic tests, which has made the EBM
proposal even harder to advance in widely hostile territory.
In the November issue of JCM, a scenario of clinical microbiology in
the year 2025 has been portrayed by Dunne et al: “Point-of-care diagnostic
systems … reversed the trend of centralized laboratory services
popularized in the late 1990s … Those commercial and central laboratories
that successfully adapted to the new market required dramatic changes in
the educational level of their personnel …” (2). While this scenario is
depicted for the USA, it also seems conceivable for other countries, and
not only in market terms. In one example reported by Dunne, “Fifteen DNA
and RNA sequences specific for Streptococcus pyogenes were detected in
sufficient quantity (by an “upper-respiratory cassette”) to suggest a
diagnosis of streptococcal pharyngitis. The organism was determined to
harbour macrolide-lincosamide-streptogramin B resistance secondary to the
detection of the ermTr gene sequence and is also resistant to beta-lactam
antibiotics because of the expression of a common molecular class A
beta-lactamase …”(2). For all that specialized testing a lot of health
technology assessment will surely be required, with adequate reporting of
As Professor Feinstein (3) recently pointed out, diagnostic
technologies should not only be evaluated on their diagnostic accuracy
(their ability to determine the presence or absence of the disease), but
also on their ability to change patient outcome (e.g. bacterial
sensitivity to antibiotics or test influence on the treatment choices). In
their comment on Feinstein’s article, Moons and Grobbee (4) explain that
complex mathematical models (e.g. logistic regression analysis) - not
singular test parameters (sensitivity, specificity, likelihood ratios) -
are needed to obtain true (added) diagnostic accuracy of a test. Besides,
follow-up studies, or clinical trials - instead of cross sectional studies
– are needed when quantification of beneficial effects of a diagnostic
test for patient outcome is deemed necessary. According to Choi (5), a
balance can be provided between simplicity and complexity: one way to do
it is to create “complex models with simple model-user interface”. If so,
most physicians will be able to interpret and apply the hi-tech POC
testing described by Dunne et al (2) as easily as most drivers or pilots
use complicated machines “without having a clue to what is going inside
of them” (5).
These are some of the many challenges for researchers on diagnostic
tests. In my opinion, improving the accuracy and completeness of reporting
of studies of diagnostic accuracy is one of the first faces of the
mountain that clinical microbiologists must climb, together with users of
their tests. Both of them are now at a crossroads: the uphill EBM way or
the downhill CRAP way (6) of doing the thing.
PS: The STARD statement has been translated into Italian by the EBM
working group of the Italian Society for Clinical Microbiology (7), and
its explanation will be translated as well (8). Recently, this group also
made an appeal
for immediate implementation of the STARD statement, together with many
colleagues of different specialties (9,10).
1. PM Bossuyt, et al. for the STARD steering group. Towards complete
and accurate reporting of studies of diagnostic accuracy: the STARD
initiative. BMJ 2003;326:41–4.
2. WM Dunne et al. Clinical microbiology in the year 2025. J Clin
3. AR Feinstein. Misguided efforts and future challenges for research
on “diagnostic tests”. J Epidemiol Comm Health 2002; 56:330-2
4. KGM Moons and DE Grobbee. Diagnostic studies as multivariable
prediction research. J Epidemiol Comm Health 2002; 56:337-8.
5. BCK Choi. Future challenges for diagnostic research: striking a
balance between simplicity and complexity. J Epidemiol Comm Health 2002;
6. Clinicians for the Restoration of Autonomous Practice (CRAP)
Writing Group. EBM: unmasking the ugly truth. BMJ 2002;325:1496–8.
7. The STARD steering group. L’esposizione completa e rigorosa degli
studi di accuratezza diagnostica: l’Iniziativa STARD (
8. PM Bossuyt, et al. The STARD Statement for reporting studies of
diagnostic accuracy: explanation and elaboration. Clinical Chemistry
9. G Giocoli et al. È necessario migliorare la qualità espositiva
degli articoli scientifici. L’iniziativa STARD. Microbiologia Medica 2002;
17: 275-6 (Poster XXXI Congr. AMCLI).
10. G Giocoli. L’iniziativa STARD per una corretta valutazione
dell’accuratezza dei test diagnostici. Microbiologia Medica 2003 (to be
Competing interests: No competing interests