Elsevier

European Journal of Radiology

Volume 93, August 2017, Pages 59-64
Research papers
Treatment of multiple test readers in diagnostic accuracy systematic reviews-meta-analyses of imaging studies

https://doi.org/10.1016/j.ejrad.2017.05.032

Highlights

  • Imaging systematic reviews rarely report how they handle multiple reader data.

  • Most imaging systematic reviews include studies with more than one reader.

  • Imaging systematic reviews use different strategies for handling multiple reader data.

  • The optimal method for handling multiple reader data in imaging reviews is presently unknown.

  • Development of a two-level hierarchical model may be necessary to address this.

Abstract

Objective

To evaluate the handling of multiple readers in imaging diagnostic accuracy systematic reviews-meta-analyses.

Methods

A search was performed for imaging diagnostic accuracy systematic reviews that included a meta-analysis and were published from 2005 to 2015. Handling of multiple readers was classified as: 1) averaged; 2) ‘best’ reader; 3) ‘most experienced’ reader; 4) each reader counted individually; 5) random; 6) other; 7) not specified. The incidence and reporting of multiple reader data were assessed in the primary diagnostic accuracy studies included in a random sample of reviews.

Results

Only 28/296 (9.5%) meta-analyses specified how multiple readers were handled: 7/28 averaged results, 2/28 included the best reader, 14/28 treated each reader as a separate data set, 1/28 randomly selected a reader, 4/28 used other methods. Sample of 27/268 ‘not specified’ reviews generated 442 primary studies. 270/442 (61%) primary studies had multiple readers: 164/442 (37%) reported consensus reading, 87/442 (20%) reported inter-observer variability, 9/442 (2%) reported independent datasets for each reader. 26/27 (96%) meta-analyses contained at least one primary study with multiple readers.
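The percentages reported above can be recomputed directly from the stated counts. The following Python sketch does so; the labels and variable names are illustrative and are not taken from the article.

```python
# Counts as stated in the Results paragraph; labels are illustrative only.
counts = {
    "reviews specifying reader handling": (28, 296),
    "primary studies with multiple readers": (270, 442),
    "consensus reading reported": (164, 442),
    "inter-observer variability reported": (87, 442),
    "independent per-reader datasets reported": (9, 442),
    "sampled reviews with at least one multi-reader study": (26, 27),
}


def percent(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


percentages = {label: percent(n, d) for label, (n, d) in counts.items()}

for label, (n, d) in counts.items():
    print(f"{label}: {n}/{d} = {percentages[label]:.1f}%")

# Breakdown of the 28 reviews that specified a method.
strategy_counts = {
    "averaged": 7,
    "best reader": 2,
    "each reader as a separate data set": 14,
    "randomly selected reader": 1,
    "other": 4,
}
total_specified = sum(strategy_counts.values())
print(f"reviews with a specified strategy: {total_specified}")
```

The strategy breakdown sums to the 28 reviews that specified a method, and 296 minus 28 gives the 268 ‘not specified’ reviews from which the sample of 27 was drawn.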

Conclusions

Reporting of how multiple readers were treated in imaging systematic reviews-meta-analyses is uncommon, and the methods used vary widely. This may result from a lack of guidance, the unavailability of appropriate statistical methods for handling multiple readers in meta-analysis, and sub-optimal primary study reporting.

Introduction

Many studies that evaluate the diagnostic accuracy of imaging modalities contain multiple readers, which means that more than one physician interprets each examination. This is commonly done to assess inter-observer variability, or to examine the impact of reader experience or expertise on diagnostic accuracy. Multiple independent readers are preferred to a single reader, as a single reader may have expertise that is difficult for others to reproduce. Reporting measures of inter-observer variability can help assess the generalizability of a diagnostic test to clinical practice [1]. However, the presence of multiple readers in primary diagnostic accuracy studies presents unique challenges when researchers try to synthesize the available evidence, such as in systematic reviews of imaging studies and corresponding meta-analyses [2], [3], [4].

When faced with such a challenge, authors of systematic reviews-meta-analyses appear to have several options: 1) use an average of the diagnostic accuracy results across readers within a study [5]; 2) select the ‘best’ reader within a study (i.e., the reader who achieved the highest accuracy) [6]; 3) select the ‘most experienced’ reader within a study (i.e., the reader with the most years of clinical experience) [7]; 4) count each reader within a study as an ‘individual study’ [8]; or 5) randomly select one reader within a study [9]. At present, there are no recommendations regarding which strategy is optimal [10], [11], [12].
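To make the five options concrete, the following Python sketch applies each of them to one hypothetical primary study with three readers. All counts, the `ReaderResult` class, and the function names are invented for illustration; averaging is shown as a simple unweighted mean of per-reader sensitivity and specificity, which is only one of several ways a review might average.

```python
import random
from dataclasses import dataclass


@dataclass
class ReaderResult:
    """2x2 counts for one reader in one primary study (hypothetical data)."""
    years_experience: int
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Hypothetical study: three readers interpret the same 100 examinations.
readers = [
    ReaderResult(years_experience=2, tp=32, fp=10, fn=8, tn=50),
    ReaderResult(years_experience=8, tp=36, fp=6, fn=4, tn=54),
    ReaderResult(years_experience=15, tp=34, fp=9, fn=6, tn=51),
]


def averaged(rs):
    """Option 1: unweighted mean sensitivity and specificity across readers."""
    n = len(rs)
    return (sum(r.sensitivity for r in rs) / n,
            sum(r.specificity for r in rs) / n)


def best_reader(rs):
    """Option 2: the reader with the highest overall accuracy."""
    return max(rs, key=lambda r: r.accuracy)


def most_experienced(rs):
    """Option 3: the reader with the most years of experience."""
    return max(rs, key=lambda r: r.years_experience)


def each_reader(rs):
    """Option 4: every reader contributes a separate data set."""
    return [(r.sensitivity, r.specificity) for r in rs]


def random_reader(rs, rng):
    """Option 5: one reader chosen at random."""
    return rng.choice(rs)


print("averaged:", averaged(readers))
print("best reader:", best_reader(readers))
print("most experienced:", most_experienced(readers))
print("each reader:", each_reader(readers))
print("random reader:", random_reader(readers, random.Random(0)))
```

In this invented example the ‘best’ and ‘most experienced’ readers are different people, so the two selection rules would send different 2x2 tables into the meta-analysis.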

Each of these strategies has potential disadvantages. If the results of multiple readers are averaged, heterogeneity arising from inter-observer variability may be masked. This is less of a concern when inter-observer agreement is high; therefore, if results are averaged, reporting inter-observer variability is crucial for interpreting the meta-analysis results. If only the best or most experienced readers are selected, the test’s diagnostic accuracy may be overestimated and may not reflect what is achievable in daily practice. The effect on accuracy estimates is less pronounced when all readers perform similarly, but these practices will still tend to overestimate test accuracy. If each reader is treated as an individual study, the results of a single study will be over-represented in the sample, the biases inherent to the study design will be magnified in the pooled results, and additional statistical challenges will arise because the data are paired (the readers interpret the same examinations); no straightforward statistical solutions currently exist for this. When a reader is chosen at random, there is a risk of bias due to sampling error: if the random selection happened to pick mostly the best, or mostly the worst, readers across the included studies, the results of the systematic review-meta-analysis would be biased.
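The tendency of ‘best reader’ selection to overestimate accuracy can be illustrated with a small simulation. The Python sketch below is not from the article and rests on simplifying assumptions: every reader has the same true sensitivity, readers are simulated independently (in practice they read the same cases, so their results are correlated), and ‘best’ is defined by observed sensitivity alone. All parameter values are hypothetical.

```python
import random


def simulate(n_studies=2000, n_readers=3, n_diseased=50,
             true_sens=0.80, seed=1):
    """Compare reader-averaged and best-reader sensitivity across studies.

    Each reader's true-positive count is a binomial draw with the same
    underlying sensitivity, so any observed difference between readers
    is due to chance alone.
    """
    rng = random.Random(seed)
    avg_total = 0.0
    best_total = 0.0
    for _ in range(n_studies):
        observed = []
        for _ in range(n_readers):
            tp = sum(rng.random() < true_sens for _ in range(n_diseased))
            observed.append(tp / n_diseased)
        avg_total += sum(observed) / n_readers   # strategy 1: average
        best_total += max(observed)              # strategy 2: best reader
    return avg_total / n_studies, best_total / n_studies


avg_mean, best_mean = simulate()
print(f"true sensitivity:              0.800")
print(f"mean of reader-averaged values: {avg_mean:.3f}")
print(f"mean of best-reader values:     {best_mean:.3f}")
```

Under these assumptions the averaged estimate stays close to the true value, whereas always taking the highest-scoring reader yields a larger mean, even though no reader is truly better than the others.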

The purpose of our study was to evaluate the current reporting of how multiple reader data is handled in systematic reviews-meta-analyses of diagnostic accuracy studies in imaging.

Identification of systematic reviews-meta-analyses

Medline was searched through PubMed, applying the database’s systematic review filter, combined with a previously published search filter for diagnostic accuracy studies (Appendix 1 in Supplementary data) [14], [15]. The searches were restricted to “radiology, nuclear medicine & medical imaging” journals, as defined by Thomson Reuters’ Journal Citation Reports [16]. A list of the 127 included journals is available in Appendix 2 in Supplementary data. Searches were performed on May 31, 2015, and

Identification of systematic reviews-meta-analyses

The literature search yielded 839 articles. After screening titles and abstracts, 532 articles were excluded, as they did not meet the inclusion criteria. Full texts of the remaining 307 articles were retrieved and assessed for inclusion. Eleven articles were excluded at the full text level for reasons specified in Fig. 1 and Appendix 3 in Supplementary data, yielding 296 systematic reviews for the study, published in 45 different journals. Summary demographic data of included reviews are

Discussion

We found that diagnostic accuracy systematic reviews-meta-analyses of imaging studies infrequently report methods for handling multiple readers, even though almost all of them must deal with primary studies that included multiple readers; this may be compounded by a lack of clear reporting of multiple reader data in the included primary diagnostic accuracy studies. Among the few systematic reviews-meta-analyses that do report

Conclusions

A minority of diagnostic accuracy systematic reviews-meta-analyses of imaging studies report on methods for handling multiple readers in primary studies, and only a minority of these primary studies report data for individual readers. For completeness of reporting and to assess clinical applicability of systematic review results, authors of systematic reviews-meta-analyses are encouraged to report the incidence and handling of multiple reader data. Authors of primary studies are encouraged

Conflicts of interest

The authors have no relevant conflicts of interest to declare.

Acknowledgements

We would like to acknowledge Mrs. Alexandra Davis, librarian at the Ottawa Hospital, for her assistance in the design of our search strategy and the retrieval of selected articles.

References (24)

  • W.L. Devillé et al. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J. Clin. Epidemiol. (2000)

  • N.A. Obuchowski et al. Multireader, multicase receiver operating characteristic analysis: an empirical comparison of five methods. Acad. Radiol. (2004)

  • D. Levine et al. Submissions to radiology: our top ten list of statistical errors. Radiology (2009)

  • T. Hodgdon et al. Can quantitative CT texture analysis be used to differentiate fat-poor renal angiomyolipoma from renal cell carcinoma on unenhanced CT images? Radiology (2015)

  • N. Schieda et al. Diagnostic accuracy of segmental enhancement inversion for the diagnosis of renal oncocytoma using biphasic computed tomography (CT) and multiphase contrast-enhanced magnetic resonance imaging (MRI). Eur. Radiol. (2014)

  • N. Schieda et al. Intracellular lipid in papillary renal cell carcinoma (pRCC): T2 weighted (T2W) MRI and pathologic correlation. Eur. Radiol. (2015)

  • Y.J. Lee et al. Hepatocellular carcinoma: diagnostic performance of multidetector CT and MR imaging—a systematic review and meta-analysis. Radiology (2015)

  • M.C. de Jong et al. Diagnostic performance of stress myocardial perfusion imaging for coronary artery disease: a systematic review and meta-analysis. Eur. Radiol. (2012)

  • S.K. Das et al. Usefulness of DWI in preoperative assessment of deep myometrial invasion in patients with endometrial carcinoma: a systematic review and meta-analysis. Cancer Imaging (2014)

  • M. Dave et al. Primary sclerosing cholangitis: meta-analysis of diagnostic performance of MR cholangiopancreatography. Radiology (2010)

  • A. Andreano et al. MR diffusion imaging for preoperative staging of myometrial invasion in patients with endometrial cancer: a systematic review and meta-analysis. Eur. Radiol. (2014)

  • P.G.C. Macaskill et al. Chapter 10: analysing and presenting results
