STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies

Incomplete reporting has been identified as a major source of avoidable waste in biomedical research. Essential information is often not provided in study reports, impeding the identification, critical appraisal, and replication of studies. To improve the quality of reporting of diagnostic accuracy studies, the Standards for Reporting Diagnostic Accuracy (STARD) statement was developed. Here we present STARD 2015, an updated list of 30 essential items that should be included in every report of a diagnostic accuracy study. This update incorporates recent evidence about sources of bias and variability in diagnostic accuracy and is intended to facilitate the use of STARD. As such, STARD 2015 may help to improve completeness and transparency in reporting of diagnostic accuracy studies.

As researchers, we talk and write about our studies, not just because we are happy, or disappointed, with the findings, but also to allow others to appreciate the validity of our methods, to enable our colleagues to replicate what we did, and to disclose our findings to clinicians, other health care professionals, and decision makers, all of whom rely on the results of strong research to guide their actions.
Unfortunately, deficiencies in the reporting of research have been highlighted in several areas of clinical medicine. 1 Essential elements of study methods are often poorly described and sometimes completely omitted, making both critical appraisal and replication difficult, if not impossible. Sometimes study results are selectively reported, and other times researchers cannot resist unwarranted optimism in interpretation of their findings. [2][3][4] These practices limit the value of the research and any downstream products or activities, such as systematic reviews and clinical practice guidelines.
Reports of studies of medical tests are no exception. A growing number of evaluations have identified deficiencies in the reporting of test accuracy studies. 5 These are studies in which a test is evaluated against a clinical reference standard, or gold standard; the results are typically reported as estimates of the test's sensitivity and specificity, which express how good the test is at correctly identifying patients as having the target condition. Other accuracy statistics can be used as well, such as the area under the receiver operating characteristic (ROC) curve or positive and negative predictive values.
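As a minimal sketch of how these statistics relate to a cross tabulation of index test results against the reference standard, the following Python function derives them from the four cells of a 2x2 table. The function name, the field names, and the counts in the usage line are hypothetical and serve only as an illustration; they are not taken from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyEstimates:
    sensitivity: float
    specificity: float
    ppv: float  # positive predictive value
    npv: float  # negative predictive value


def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> AccuracyEstimates:
    """Derive common accuracy statistics from a 2x2 table.

    tp, fn: participants with the target condition (according to the
    reference standard) with a positive or negative index test result.
    fp, tn: participants without the target condition with a positive
    or negative index test result.
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("cell counts must be non-negative")
    if tp + fn == 0 or fp + tn == 0 or tp + fp == 0 or fn + tn == 0:
        raise ValueError("every row and column total must be positive")
    return AccuracyEstimates(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


# Hypothetical counts, for illustration only.
example = accuracy_from_2x2(tp=90, fp=30, fn=10, tn=170)
```

With these hypothetical counts, sensitivity is 0.90 and specificity is 0.85. The predictive values additionally depend on the proportion of participants with the target condition in the sample.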
Despite their apparent simplicity, such studies are at risk of bias. 6 7 If not all patients undergoing testing are included in the final analysis, for example, or if only healthy controls are included, the estimates of test accuracy may not reflect the performance of the test in clinical applications. Yet such crucial information is often missing from study reports.
It is now well established that sensitivity and specificity are not fixed test properties. The relative number of false positive and false negative test results varies across settings, depending on how patients present and which tests they have already undergone. Unfortunately, many authors also fail to completely report the clinical context and when, where, and how they identified and recruited eligible study participants. 8 In addition, sensitivity and specificity estimates can differ because of variable definitions of the reference standard against which the test is being compared. Thus this information should be available in the study report.

The 2003 STARD statement
To assist in the completeness and transparency of reporting diagnostic accuracy studies, a group of researchers, editors, and other stakeholders developed a minimum list of essential items that should be included in every study report. The guiding principle for developing the list was to select items that, if described, would help readers to judge the potential for bias in the study and appraise the applicability of the study findings and the validity of the authors' conclusions and recommendations.
The resulting Standards for Reporting Diagnostic Accuracy (STARD) statement appeared in 2003 in two dozen journals. 9 It was accompanied by editorials and commentaries in several other publications and endorsed by many more.
Since the publication of STARD, several evaluations have pointed to small but statistically significant improvements in reporting accuracy studies (mean gain 1.4 items (95% confidence interval 0.7 to 2.2)). 5 10 Gradually, more of the essential items are being reported, but the situation remains far from optimal.

Methods for developing STARD 2015
The STARD steering committee periodically reviews the literature for potentially relevant studies to inform a possible update. In 2013, the steering committee decided that the time was right to update the checklist.
Updating had two major goals: first, to incorporate recent evidence about sources of bias, applicability concerns, and factors facilitating generous interpretation in test accuracy research, and, second, to make the list easier to use. In making modifications, we also considered harmonization with other reporting guidelines, such as Consolidated Standards of Reporting Trials (CONSORT) 2010. 11 A complete description of the updating process and the justification for the changes are available on the Enhancing the Quality and Transparency of Health Research (EQUATOR) website at www.equator-network.org/reporting-guidelines/stard. In short, we invited the 2003 STARD group members to participate in the updating process, nominate new members, and comment on the general scope of the update. Suggested new members were contacted. As a result, the STARD group has now grown to 85 members, who include researchers, editors, journalists, evidence synthesis professionals, funders, and other stakeholders.
STARD group members were then asked to suggest, and later to endorse, proposed changes in a two round, web based survey. This served to prepare a draft list of essential items, which was discussed in the steering committee in a two day meeting in Amsterdam in September 2014. The list was then piloted in different groups: starting and advanced researchers, peer reviewers, and editors.
The general structure of STARD 2015 is similar to that of STARD 2003. A one page document presents 30 items, grouped under sections that follow the introduction, methods, results, and discussion (IMRAD) structure of a scientific article (see

STARD 2015 replaces the original version published in 2003; those who would like to refer to STARD are invited to cite this article. The list of essential items can be seen as a minimum set, and an informative study report will typically present more information. Yet we hope to find all applicable items in a well prepared report of a diagnostic accuracy study.
Authors are invited to use STARD when preparing their study reports. Reviewers can use the list to verify that all essential information is available in a submitted manuscript and suggest changes if key items are missing.
We trust that journals that endorsed STARD in 2003 or later will recommend the use of this updated version and encourage compliance in submitted manuscripts. We hope that even more journals, and journal organizations, will promote the use of this and comparable reporting guidelines. Funders and research institutions may promote or mandate adherence to STARD as a way to maximize the value of research and downstream products or activities.
STARD may also be beneficial for reporting other studies that evaluate the performance of tests. This includes prognostic studies, which can classify patients on the basis of whether a future event happens; monitoring studies, in which tests are supposed to detect or predict an adverse event or lack of response; studies evaluating treatment selection markers; and more. We and others have found most of the STARD items useful when reporting and examining such studies, although STARD primarily targets diagnostic accuracy studies.
Diagnostic accuracy is not the only expression of test performance, nor is it always the most meaningful. 12 Incremental accuracy from combining tests, relative to a single test, can be more informative, for example. 13 For continuous tests, dichotomization into test positives and negatives may not always be indicated. In such cases, the desirable computational and graphical methods for expressing test performance are different, although many of the methodological precautions would be the same.
Although STARD focuses on full study reports of test accuracy studies, the items can also be helpful when writing conference abstracts, including information in trial registries, and developing protocols for such studies. Additional initiatives are underway to provide more specific guidance for each of these applications.
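As one illustration of expressing the performance of a continuous test without dichotomization, the sketch below computes the area under the ROC curve directly from the test values. It uses the rank-based interpretation of the area: the probability that a randomly chosen participant with the target condition has a higher value than a randomly chosen participant without it. The function name and the test values are hypothetical, and the sketch assumes that higher values point toward the target condition.

```python
from typing import Sequence


def auc_from_scores(diseased: Sequence[float], non_diseased: Sequence[float]) -> float:
    """Area under the ROC curve from continuous test values.

    Compares every participant with the target condition against every
    participant without it; a higher value counts as 1, a tie as 0.5.
    Assumes higher values indicate the target condition.
    """
    if len(diseased) == 0 or len(non_diseased) == 0:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for d in diseased:
        for n in non_diseased:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(diseased) * len(non_diseased))


# Hypothetical test values, for illustration only.
auc = auc_from_scores([3.1, 4.0, 5.2, 2.0], [1.0, 2.0, 2.5, 3.5])
```

The pairwise loop is quadratic in the number of participants, which keeps the sketch readable; a rank-based implementation would be preferable for large studies.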

STARD extensions and applications
The STARD statement was designed to apply to all types of medical tests. The STARD group believed that a single checklist, for all diagnostic accuracy studies, would be more widely disseminated and more easily accepted by authors, peer reviewers, and journal editors than separate lists for different types of tests such as imaging, biochemistry, or histopathology.
Having a general list may necessitate additional instructions for informative reporting, with more information for specific types of tests, specific applications, or specific forms of analysis. Such guidance could describe the preferred methods for studying and reporting measurement uncertainty, for example, without changing any of the other STARD items. The STARD group welcomes the development of such STARD extensions and invites interested groups to contact the STARD executive committee before developing them.
Other groups may want to develop additional guidance to facilitate the use of STARD for specific applications. An example of such a STARD application was prepared for history taking and physical examination. 15 Another type of application is the use of STARD for specific target conditions such as dementia. 16

Availability
The new STARD 2015 list and all related documents can be found on the STARD pages of the EQUATOR website. EQUATOR is an international initiative that seeks to improve the value of published health research literature by promoting transparent and accurate reporting and wider use of robust reporting guidelines. 17 18 The STARD group believes that working more closely with EQUATOR and other reporting guideline developers will help us to better reach shared objectives. We have updated the 2003 explanation and elaboration document, which can also be found at the EQUATOR website. This document explains the rationale for each item and gives examples.
The STARD list is released under a Creative Commons license. This allows everyone to use and distribute the work if they acknowledge the source. The STARD statement was originally reported in English, but several groups have worked on translations in other languages. We welcome such translations, which are preferably developed by groups of researchers, by use of a cyclical development process, with back-translation to the original language and user testing. 19 We have also applied for a trademark for STARD to ensure that the steering committee has the exclusive right to use the word "STARD" to identify goods or services.

Increasing value, reducing waste
The STARD steering committee is aware that building a list of essential items is not sufficient to achieve substantial improvements in reporting completeness, as the modest improvement after introduction of the 2003 list has shown. We see this list not as the final product, but as the starting point for building more specific instruments to stimulate complete and transparent reporting, such as a checklist and a writing aid for authors, tools for reviewers and editors, instruction videos, and teaching materials, all based on this STARD list of essential items.
Incomplete reporting has been identified as one of the sources of avoidable waste in biomedical research. 1 Since STARD was initiated, several other initiatives have been undertaken to enhance the reproducibility of research and promote greater transparency. 20 Multiple factors are at stake, but incomplete reporting is one of them. We hope that this update of STARD, together with additional implementation initiatives, will help authors, editors, reviewers, readers, and decision makers to collect, appraise, and apply the evidence needed to strengthen decisions and recommendations about medical tests. In the end, we are all to benefit from more informative and transparent reporting: as researchers, as healthcare professionals, as payers, and as patients.

Introduction
3. Scientific and clinical background, including the intended use and clinical role of the index test

Methods
Study design
5. Whether data collection was planned before the index test and reference standard were performed (prospective study) or after (retrospective study)
Participants
6. Eligibility criteria
7. On what basis potentially eligible participants were identified (such as symptoms, results from previous tests, inclusion in registry)
8. Where and when potentially eligible participants were identified (setting, location, and dates)
9. Whether participants formed a consecutive, random, or convenience series
Test methods
10a. Index test, in sufficient detail to allow replication
10b. Reference standard, in sufficient detail to allow replication
11. Rationale for choosing the reference standard (if alternatives exist)
12a. Definition of and rationale for test positivity cut-offs or result categories of the index test, distinguishing pre-specified from exploratory
Analysis
16. How missing data on the index test and reference standard were handled
17. Any analyses of variability in diagnostic accuracy, distinguishing pre-specified from exploratory
18. Intended sample size and how it was determined

Results
Participants
19. Flow of participants, using a diagram
20. Baseline demographic and clinical characteristics of participants
21a. Distribution of severity of disease in those with the target condition
21b. Distribution of alternative diagnoses in those without the target condition
22. Time interval and any clinical interventions between index test and reference standard
Test results
23. Cross tabulation of the index test results (or their distribution) by the results of the reference standard
24. Estimates of diagnostic accuracy and their precision (such as 95% confidence intervals)
25. Any adverse events from performing the index test or the reference standard

Discussion
26. Study limitations, including sources of potential bias, statistical uncertainty, and generalisability
27. Implications for practice, including the intended use and clinical role of the index test

Other information
28. Registration number and name of registry
29. Where the full study protocol can be accessed
30. Sources of funding and other support; role of funders

Rationale for selected items
Structured abstract (item 2): Abstracts are increasingly used to identify key elements of study design and results.
Intended use and clinical role of the test (item 3): Describing the targeted application of the test helps readers to interpret the implications of reported accuracy estimates.
Study hypotheses (item 4): Not having a specific study hypothesis may invite generous interpretation of the study results and "spin" in the conclusions.
Sample size (item 18): Readers want to appreciate the anticipated precision and power of the study and whether authors were successful in recruiting the targeted number of participants.
Structured discussion (items 26-27): To prevent jumping to unwarranted conclusions, authors are invited to discuss study limitations and draw conclusions keeping in mind the targeted application of the evaluated tests (see item 3).
Registration (item 28): Prospective test accuracy studies are trials, and, as such, they can be registered in clinical trial registries, such as ClinicalTrials.gov, before their initiation, facilitating identification of their existence and preventing selective reporting.
Protocol (item 29): The full study protocol, with more information about the predefined study methods, may be available elsewhere, to allow more fine grained critical appraisal.
Sources of funding (item 30): There is awareness of the potentially compromising effects of conflicts of interest between researchers' obligations to abide by scientific and ethical principles and other goals, such as financial ones; test accuracy studies are no exception.
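Item 24 of the list asks for estimates of diagnostic accuracy together with their precision, such as 95% confidence intervals. As a hedged sketch of one way to obtain such an interval for a sensitivity or specificity estimate, the function below computes a Wilson score interval for a proportion. The choice of the Wilson method is an assumption made for illustration: the STARD list does not prescribe a particular interval method, and the function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a proportion such as sensitivity.

    successes: for sensitivity, the number of true positives.
    total: for sensitivity, all participants with the target condition.
    z: standard normal quantile; the default gives a 95% interval.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp to [0, 1] to guard against floating point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical example: 90 true positives among 100 participants
# with the target condition (sensitivity 0.90).
low, high = wilson_interval(90, 100)
```

For these hypothetical counts the interval runs from roughly 0.83 to 0.94. Other interval methods exist and can give somewhat different limits, particularly in small samples.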

Figure
Prototypical STARD diagram to report flow of participants through the study.