STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies
BMJ 2015; 351 doi: https://doi.org/10.1136/bmj.h5527 (Published 28 October 2015) Cite this as: BMJ 2015;351:h5527
- Patrick M Bossuyt1,
- Johannes B Reitsma2,
- David E Bruns3,
- Constantine A Gatsonis4,
- Paul P Glasziou5,
- Les Irwig6,
- Jeroen G Lijmer7,
- David Moher8,9,
- Drummond Rennie10,11,
- Henrica C W de Vet12,
- Herbert Y Kressel13,14,
- Nader Rifai15,16,
- Robert M Golub17,18,
- Douglas G Altman19,
- Lotty Hooft20,
- Daniël A Korevaar1,
- Jérémie F Cohen1,21
- for the STARD Group
- 1Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Centre, University of Amsterdam, Amsterdam, the Netherlands
- 2Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, University of Utrecht, Utrecht, the Netherlands
- 3Department of Pathology, University of Virginia School of Medicine, Charlottesville, VA, USA
- 4Center for Statistical Sciences, Brown University School of Public Health, Providence, RI, USA
- 5Centre for Research in Evidence-Based Practice, Faculty of Health Sciences and Medicine, Bond University, Gold Coast, Queensland, Australia
- 6Screening and Diagnostic Test Evaluation Program, School of Public Health, University of Sydney, Sydney, New South Wales, Australia
- 7Department of Psychiatry, Onze Lieve Vrouwe Gasthuis, Amsterdam, the Netherlands
- 8Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada
- 9School of Epidemiology, Public Health and Preventive Medicine, University of Ottawa, Ottawa, Canada
- 10Peer Review Congress, Chicago, IL, USA
- 11Philip R Lee Institute for Health Policy Studies, University of California, San Francisco, CA, USA
- 12Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
- 13Department of Radiology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- 14Radiology Editorial Office, Boston, MA, USA
- 15Department of Laboratory Medicine, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- 16Clinical Chemistry Editorial Office, Washington, DC, USA
- 17Division of General Internal Medicine and Geriatrics and Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- 18JAMA Editorial Office, Chicago, IL, USA
- 19Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- 20Dutch Cochrane Centre, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, University of Utrecht, Utrecht, the Netherlands
- 21INSERM UMR 1153 and Department of Pediatrics, Necker Hospital, AP-HP, Paris Descartes University, Paris, France.
- Correspondence to: P M Bossuyt
- Accepted 18 September 2015
As researchers, we talk and write about our studies, not just because we are happy—or disappointed—with the findings, but also to allow others to appreciate the validity of our methods, to enable our colleagues to replicate what we did, and to disclose our findings to clinicians, other health care professionals, and decision makers, all of whom rely on the results of strong research to guide their actions.
Unfortunately, deficiencies in the reporting of research have been highlighted in several areas of clinical medicine.1 Essential elements of study methods are often poorly described and sometimes completely omitted, making both critical appraisal and replication difficult, if not impossible. Sometimes study results are selectively reported, and other times researchers cannot resist unwarranted optimism in interpretation of their findings.2 3 4 These practices limit the value of the research and any downstream products or activities, such as systematic reviews and clinical practice guidelines.
Reports of studies of medical tests are no exception. A growing number of evaluations have identified deficiencies in the reporting of test accuracy studies.5 These are studies in which a test is evaluated against a clinical reference standard, or gold standard; the results are typically reported as estimates of the test’s sensitivity and specificity, which express how good the test is at correctly identifying patients as having, or not having, the target condition. Other accuracy statistics can be used as well, such as the area under the receiver operating characteristic (ROC) curve or positive and negative predictive values.
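As an illustration of the accuracy statistics named above, the following minimal sketch computes sensitivity, specificity, and the predictive values from a 2×2 cross classification of index test results against the reference standard. The class name `TwoByTwo` and the counts are hypothetical and chosen only for illustration; they are not taken from STARD or from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2x2 table: index test result versus reference standard."""
    tp: int  # test positive, target condition present
    fp: int  # test positive, target condition absent
    fn: int  # test negative, target condition present
    tn: int  # test negative, target condition absent

    def sensitivity(self) -> float:
        # Proportion of participants with the condition who test positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of participants without the condition who test negative.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Proportion of positive results that are true positives.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Proportion of negative results that are true negatives.
        return self.tn / (self.tn + self.fn)


# Illustrative counts only (assumed, not from any real study).
table = TwoByTwo(tp=90, fp=30, fn=10, tn=170)
print(f"sensitivity={table.sensitivity():.2f}, specificity={table.specificity():.2f}")
print(f"PPV={table.ppv():.2f}, NPV={table.npv():.2f}")
```

A complete report would normally accompany each estimate with a measure of precision, such as a 95% confidence interval, which this sketch leaves out.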
Despite their apparent simplicity, such studies are at risk of bias.6 7 If not all patients undergoing testing are included in the final analysis, for example, or if only healthy controls are included, the estimates of test accuracy may not reflect the performance of the test in clinical applications. Yet such crucial information is often missing from study reports.
It is now well established that sensitivity and specificity are not fixed test properties. The relative number of false positive and false negative test results varies across settings, depending on how patients present and which tests they have already undergone. Unfortunately, many authors also fail to completely report the clinical context and when, where, and how they identified and recruited eligible study participants.8 In addition, sensitivity and specificity estimates can differ because of variable definitions of the reference standard against which the test is being compared. Thus this information should be available in the study report.
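One way to see why sensitivity is not a fixed test property is to treat it as a weighted average over subgroups of patients who present differently. The sketch below assumes, purely for illustration, that a test detects severe disease more readily than mild disease; the subgroup labels, sensitivities, and case mixes are hypothetical, and the function name `pooled_sensitivity` is ours.

```python
def pooled_sensitivity(case_mix: dict[str, int], sens_by_group: dict[str, float]) -> float:
    """Overall sensitivity as the case-mix-weighted mean of subgroup sensitivities."""
    total = sum(case_mix.values())
    return sum(case_mix[g] / total * sens_by_group[g] for g in case_mix)


# Assumed subgroup sensitivities (hypothetical values).
SENS_BY_SEVERITY = {"mild": 0.60, "severe": 0.95}

# Assumed numbers of diseased patients per subgroup in two settings.
primary_care = {"mild": 80, "severe": 20}
referral_clinic = {"mild": 20, "severe": 80}

print(round(pooled_sensitivity(primary_care, SENS_BY_SEVERITY), 2))
print(round(pooled_sensitivity(referral_clinic, SENS_BY_SEVERITY), 2))
```

Under these assumptions the same test shows a noticeably higher overall sensitivity in the referral setting than in primary care, which is why the setting and the recruitment of participants need to be reported.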
The 2003 STARD statement
To assist in the completeness and transparency of reporting diagnostic accuracy studies, a group of researchers, editors, and other stakeholders developed a minimum list of essential items that should be included in every study report. The guiding principle for developing the list was to select items that, if described, would help readers to judge the potential for bias in the study and appraise the applicability of the study findings and the validity of the authors’ conclusions and recommendations.
The resulting Standards for Reporting Diagnostic Accuracy (STARD) statement appeared in 2003 in two dozen journals.9 It was accompanied by editorials and commentaries in several other publications and endorsed by many more.
Since the publication of STARD, several evaluations have pointed to small but statistically significant improvements in reporting accuracy studies (mean gain 1.4 items (95% confidence interval 0.7 to 2.2)).5 10 Gradually, more of the essential items are being reported, but the situation remains far from optimal.
Methods for developing STARD 2015
The STARD steering committee periodically reviews the literature for potentially relevant studies to inform a possible update. In 2013, the steering committee decided that the time was right to update the checklist.
Updating had two major goals: first, to incorporate recent evidence about sources of bias, applicability concerns, and factors facilitating generous interpretation in test accuracy research, and, second, to make the list easier to use. In making modifications, we also considered harmonization with other reporting guidelines, such as Consolidated Standards of Reporting Trials (CONSORT) 2010.11
A complete description of the updating process and the justification for the changes are available on the Enhancing the Quality and Transparency of Health Research (EQUATOR) website at www.equator-network.org/reporting-guidelines/stard. In short, we invited the 2003 STARD group members to participate in the updating process, nominate new members, and comment on the general scope of the update. Suggested new members were contacted. As a result, the STARD group has now grown to 85 members, including researchers, editors, journalists, evidence synthesis professionals, funders, and other stakeholders.
STARD group members were then asked to suggest, and later to endorse, proposed changes in a two round, web based survey. This served to prepare a draft list of essential items, which was discussed in the steering committee in a two day meeting in Amsterdam in September 2014. The list was then piloted in different groups: starting and advanced researchers, peer reviewers, and editors.
The general structure of STARD 2015 is similar to that of STARD 2003. A one page document presents 30 items, grouped under sections that follow the introduction, methods, results, and discussion (IMRAD) structure of a scientific article (see table 1). Several of the STARD 2015 items are identical to the ones in the 2003 version. Others have been reworded, combined, or (if complex) split. A few have been added (see table 2 for a summary of new items and table 3 for key terms). A diagram to describe the flow of participants through the study is now expected in all reports (figure).
STARD 2015 replaces the original version published in 2003; those who would like to refer to STARD are invited to cite this article. The list of essential items can be seen as a minimum set, and an informative study report will typically present more information. Yet we hope to find all applicable items in a well prepared report of a diagnostic accuracy study.
Authors are invited to use STARD when preparing their study reports. Reviewers can use the list to verify that all essential information is available in a submitted manuscript and suggest changes if key items are missing.
We trust that journals that endorsed STARD in 2003 or later will recommend the use of this updated version and encourage compliance in submitted manuscripts. We hope that even more journals, and journal organizations, will promote the use of this and comparable reporting guidelines. Funders and research institutions may promote or mandate adherence to STARD as a way to maximize the value of research and downstream products or activities.
STARD may also be beneficial for reporting other studies that evaluate the performance of tests. This includes prognostic studies, which can classify patients on the basis of whether a future event happens; monitoring studies, in which tests are supposed to detect or predict an adverse event or lack of response; studies evaluating treatment selection markers; and more. We and others have found most of the STARD items useful when reporting and examining such studies, although STARD primarily targets diagnostic accuracy studies.
Diagnostic accuracy is not the only expression of test performance, nor is it always the most meaningful.12 Incremental accuracy from combining tests, relative to a single test, can be more informative, for example.13 For continuous tests, dichotomization into test positives and negatives may not always be indicated. In such cases, the desirable computational and graphical methods for expressing test performance are different, although many of the methodological precautions would be the same, and STARD can help in reporting the study in an informative way. Other reporting guidelines target more specific forms of tests, such as Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) for multivariable prediction models.14
Although STARD focuses on full study reports of test accuracy studies, the items can also be helpful when writing conference abstracts, including information in trial registries, and developing protocols for such studies. Additional initiatives are underway to provide more specific guidance for each of these applications.
STARD extensions and applications
The STARD statement was designed to apply to all types of medical tests. The STARD group believed that a single checklist, for all diagnostic accuracy studies, would be more widely disseminated and more easily accepted by authors, peer reviewers, and journal editors than separate lists for different types of tests such as imaging, biochemistry, or histopathology.
Having a general list may necessitate additional instructions for informative reporting, with more information for specific types of tests, specific applications, or specific forms of analysis. Such guidance could describe the preferred methods for studying and reporting measurement uncertainty, for example, without changing any of the other STARD items. The STARD group welcomes the development of such STARD extensions and invites interested groups to contact the STARD executive committee before developing them.
Other groups may want to develop additional guidance to facilitate the use of STARD for specific applications. An example of such a STARD application was prepared for history taking and physical examination.15 Another type of application is the use of STARD for specific target conditions such as dementia.16
The new STARD 2015 list and all related documents can be found on the STARD pages of the EQUATOR website. EQUATOR is an international initiative that seeks to improve the value of published health research literature by promoting transparent and accurate reporting and wider use of robust reporting guidelines.17 18 The STARD group believes that working more closely with EQUATOR and other reporting guideline developers will help us to better reach shared objectives. We have updated the 2003 explanation and elaboration document, which can also be found at the EQUATOR website. This document explains the rationale for each item and gives examples.
The STARD list is released under a Creative Commons license. This allows everyone to use and distribute the work if they acknowledge the source. The STARD statement was originally reported in English, but several groups have worked on translations into other languages. We welcome such translations, which are preferably developed by groups of researchers, by use of a cyclical development process, with back-translation to the original language and user testing.19 We have also applied for a trademark for STARD to ensure that the steering committee has the exclusive right to use the word “STARD” to identify goods or services.
Increasing value, reducing waste
The STARD steering committee is aware that building a list of essential items is not sufficient to achieve substantial improvements in reporting completeness, as the modest improvement after introduction of the 2003 list has shown. We see this list not as the final product, but as the starting point for building more specific instruments to stimulate complete and transparent reporting, such as a checklist and a writing aid for authors, tools for reviewers and editors, instruction videos, and teaching materials, all based on this STARD list of essential items.
Incomplete reporting has been identified as one of the sources of avoidable waste in biomedical research.1 Since STARD was initiated, several other initiatives have been undertaken to enhance the reproducibility of research and promote greater transparency.20 Multiple factors are at stake, but incomplete reporting is one of them. We hope that this update of STARD, together with additional implementation initiatives, will help authors, editors, reviewers, readers, and decision makers to collect, appraise, and apply the evidence needed to strengthen decisions and recommendations about medical tests. In the end, we are all to benefit from more informative and transparent reporting: as researchers, as healthcare professionals, as payers, and as patients.
This article is being simultaneously published in October 2015 by The BMJ, Radiology, and Clinical Chemistry. This article is published under the Creative Commons CC BY-NC license http://creativecommons.org/licenses/by-nc/4.0.
STARD Group collaborators: Todd Alonzo, Douglas G Altman, Augusto Azuara-Blanco, Lucas Bachmann, Jeffrey Blume, Patrick M Bossuyt, Isabelle Boutron, David Bruns, Harry Büller, Frank Buntinx, Sarah Byron, Stephanie Chang, Jérémie F Cohen, Richelle Cooper, Joris de Groot, Henrica C W de Vet, Jon Deeks, Nandini Dendukuri, Jac Dinnes, Kenneth Fleming, Constantine A Gatsonis, Paul P Glasziou, Robert M Golub, Gordon Guyatt, Carl Heneghan, Jørgen Hilden, Lotty Hooft, Rita Horvath, Myriam Hunink, Chris Hyde, John Ioannidis, Les Irwig, Holly Janes, Jos Kleijnen, André Knottnerus, Daniël A Korevaar, Herbert Y Kressel, Stefan Lange, Mariska Leeflang, Jeroen G Lijmer, Sally Lord, Blanca Lumbreras, Petra Macaskill, Erik Magid, Susan Mallett, Matthew McInnes, Barbara McNeil, Matthew McQueen, David Moher, Karel Moons, Katie Morris, Reem Mustafa, Nancy Obuchowski, Eleanor Ochodo, Andrew Onderdonk, John Overbeke, Nitika Pai, Rosanna Peeling, Margaret Pepe, Steffen Petersen, Christopher Price, Philippe Ravaud, Johannes B. Reitsma, Drummond Rennie, Nader Rifai, Anne Rutjes, Holger Schunemann, David Simel, Iveta Simera, Nynke Smidt, Ewout Steyerberg, Sharon Straus, William Summerskill, Yemisi Takwoingi, Matthew Thompson, Ann van de Bruel, Hans van Maanen, Andrew Vickers, Gianni Virgili, Stephen Walter, Wim Weber, Marie Westwood, Penny Whiting, Nancy Wilczynski, Andreas Ziegler.
Contributors: All authors confirm they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Funding: There was no explicit funding for the development of STARD 2015. The Academic Medical Center of the University of Amsterdam, the Netherlands, partly funded the meeting of the STARD steering group but had no influence on the development or dissemination of the list of essential items. STARD steering group members and STARD group members covered additional personal costs individually.
Competing interests: All authors have completed the Clinical Chemistry author disclosure form: N Rifai works for Clinical Chemistry, AACC; C A Gatsonis is a member of the RSNA Research Development Committee.
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/.