Original Article
Development of the RTI item bank on risk of bias and precision of observational studies

https://doi.org/10.1016/j.jclinepi.2011.05.008

Abstract

Objective

To create a practical and validated item bank for evaluating the risk of bias and precision of observational studies of interventions or exposures included in systematic evidence reviews.

Study Design and Setting

The item bank, developed at RTI International, was built from 1,492 questions drawn from earlier instruments and organized by the quality domains identified by Deeks et al. Items were eliminated and refined through face validity, cognitive, content validity, and interrater reliability testing.

Results

The resulting item bank consisting of 29 questions for evaluating the risk of bias and precision of observational studies of interventions or exposures (1) captures all of the domains critical for evaluating this type of research, (2) is comprehensive and can be easily lifted “off the shelf” by different researchers, (3) can be adapted to different topic areas and study types (e.g., cohort, case–control, cross-sectional, and case series studies), and (4) provides sufficient instruction to apply the tool to varied topics.

Conclusion

One bank of items, with specific instructions for focusing abstractor evaluations, can be created to judge the risk of bias and precision of the variety of observational studies that may be used in systematic and comparative effectiveness reviews.

Introduction

What is new?

Key finding

  1. We created and validated an item bank, entitled the “RTI item bank,” to evaluate risk of bias and precision for observational studies of interventions or exposures included in systematic literature reviews. It accommodates a variety of observational study design types, including studies with controls (cohort and case–control) and without controls that rely on changes or differences in exposure (cross-sectional and case series).

What this adds to what was known?
  1. No gold standard exists for evaluating the risk of bias of observational studies. Existing tools require modification or may not be applicable for specific designs such as cross-sectional or case series. In practice, review groups often develop their own critical appraisal tool. These ad hoc tools may lack validated questions and adequate instructions for reviewers, leading to inconsistent evaluations within and across reviews.

  2. We created a practical and validated item bank for evaluating the conduct of observational studies of interventions or exposures that (1) is comprehensive, capturing all of the risk of bias and precision domains critical for evaluating this type of research; (2) can be easily adapted to different topic areas and study types (e.g., cohort, case–control, cross-sectional, and case series studies); and (3) provides instruction to assist reviewers in creating and applying the best tool for varied topics.

What is the implication, what should change now?
  1. Systematic reviewers should adopt validated tools that enable greater transparency and consistency in evaluating risk of bias and precision of observational studies. The RTI item bank is one such tool.

In the past decade, the number of publications included in PubMed has increased at an average annual rate of nearly 6%, from 467,364 citations in 1998 to 816,597 in 2008. This steady expansion in the volume of published studies increases the complexity and variability of information that policy makers, clinicians, and patients need to evaluate to make informed health care choices. Systematic reviews that compare interventions play a key role in synthesizing the evidence [1]. The assessment of the design and conduct of individual studies is central to this synthesis and is routinely used for interpreting results and grading the strength of the body of evidence. Systematic reviewers may also use these assessments to select studies for the review and meta-analysis and to interpret heterogeneous findings [2].
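The "nearly 6%" figure can be checked against the two citation counts with a compound annual growth rate calculation; the snippet below is an illustrative verification of the cited numbers, not part of the original analysis:

```python
# Check the cited PubMed growth figures: 467,364 citations in 1998
# grew to 816,597 in 2008, a span of 10 years.
start, end, years = 467_364, 816_597, 10

# Compound annual growth rate: (end / start) ** (1 / years) - 1
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # prints 5.7% — i.e., "nearly 6%" per year
```

Note that a simple average of year-over-year increases would give a slightly different number; the compounded rate is the conventional reading of "average annual rate."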

Although well-designed and well-implemented randomized controlled trials (RCTs) have long been considered the gold standard for evidence, they frequently cannot answer all relevant clinical questions. RCTs may be unethical [3], limited in their ability to address harms because of limited size or length of follow-up [4], or lacking in applicability to vulnerable subpopulations [5]. Observational studies (lacking randomization, allocation concealment, blinding of participants and interventionists, and in some instances, control groups) may fill these gaps, but the trade-off is a wider range of sources of bias, including potential biases in selection, performance, detection of effects, and attrition; these biases have the potential to alter effect sizes unpredictably [6], [7].

The inclusion of non-RCT studies in systematic reviews requires validated tools to assess the likelihood of bias. Approaches to critical appraisal of study methodology and related terminology have varied and are evolving. Overlapping terms include quality, internal validity, risk of bias, and study limitations, but a central goal is an assessment of the believability of the findings. We use the phrase “assessment of risk of bias and precision” as the most representative of the goal of evaluating the degree to which the effects reported by the study represent the “true” causal relationship between exposure and outcome, that is, the accuracy of the estimation. The accuracy of an estimate depends on its validity (the absence of bias or systematic error in selection, performance, detection, measurement, attrition, and reporting, and adequacy in addressing potential confounders) and precision (the absence of random error through adequate study size and study efficiency) [8]. Thorough assessment of these threats to the validity and precision of an estimate is critical to understanding the believability of a study.

Table 1 presents a taxonomy and description of threats to validity and precision, drawing on two well-cited sources: the Cochrane Handbook for Systematic Reviews of Interventions [7] and Modern Epidemiology [8].

Several reviews of critical appraisal tools, including Deeks et al. [9] and West et al. [10], identified key quality domains but found no gold standard for evaluating quality [9], [10], [11], [12]. Deeks et al. reviewed quality appraisal tools for nonrandomized studies. Of 213 identified tools, only six [13], [14], [15], [16], [17], [18] met their criteria of evaluating six core elements of internal validity (creation of groups, comparability of groups at the analysis stage, allocation to intervention, similarity of groups for key prognostic characteristics by design, identification of prognostic factors, and the use of case-mix adjustment) and were specifically designed for use in systematic reviews [9]. These tools vary in the criteria covered [9] and in their overall approach. Tools focus on either a description or reporting of methods (questions regarding whether authors reported a particular element of the study in a manuscript) or a judgment of risk of bias (questions regarding whether the conduct of the study altered the believability of results).

Existing tools also have other constraints. Some tools such as the Newcastle-Ottawa Scale [14] are scales that rely mostly or entirely on uniform weights for all questions. The use of uniform weights may be difficult to justify in all contexts [7]; for example, if, for a particular topic, a single flaw substantially increases risk of bias. Tools may require modification or may not be applicable for specific designs such as cross-sectional or case series. In practice, the idiosyncrasies of topics often lead each review to develop its own critical appraisal tool. These ad hoc tools may lack validated questions and adequate instruction for reviewers, leading to inconsistent evaluations within and across reviews.

Our objective was to create a practical and validated item bank for evaluating the conduct of observational studies of interventions or exposures that (1) is comprehensive, capturing all of the risk of bias and precision domains critical for evaluating this type of research; (2) can be easily adapted to different topic areas and study types (e.g., cohort, case–control, cross-sectional, and case series studies); and (3) provides instruction to assist reviewers in creating and applying the best tool for varied topics.

Our resulting risk of bias and precision item bank provides a means to assess threats to the accuracy of an estimate provided in a study and is applicable to evaluating

  • studies of interventions or exposures that lack random allocation to an intervention and rely on associations between changes or differences in exposure or interventions and changes or differences in an outcome of interest [19]. It is not designed to evaluate diagnostic studies.

  • a variety of observational study design types, including studies with controls (cohort and case–control) and without controls that rely on changes or differences in exposure (cross-sectional and case series) [20].

  • internal validity only and not external validity (applicability).

Although we did not test the reliability of our item bank for other study designs, we believe that it can be used for evaluating these studies as well, with some modifications. For instance, evaluations of quasi-experimental studies will need to supplement questions from our item bank with questions from a validated RCT appraisal tool on allocation concealment and blinding of patients and interventionists. We anticipate that systematic review study directors (referred to as principal investigators [PIs]) will select specific items based on the needs of the review topic and the most likely potential sources of bias and threats to precision in the included studies.

As noted above, Deeks et al. [9, p23] identified two approaches to evaluating the quality of observational studies, focusing on either a description of methods (the evaluation of the “objective characteristics of each study's methods as they are described by the primary researchers”) or an evaluation of the risk of bias and threats to precision. Study appraisal based on risk of bias lists potential sources of bias (Table 1), relies heavily on judgment, and is supported by transparency in recording reasons for the judgment. One constraint of this approach is that threats to validity and precision can occur at various points in the study. Assessing these threats without explicit reference to methods used at each stage of research would require a relatively abstract evaluation and could result in poor interrater reliability. The alternative approach of “methods description” is easier to implement because methods for each stage of research tend to correspond well with how manuscripts are written. This approach relies less on reviewer judgment [9] but may fall short of evaluating believability. One solution, which we have adopted, uses both approaches: the methods description for each stage of research serves as the primary framework to facilitate ease of review, while the reviewer evaluates how the design and conduct of the study at that stage addresses threats to validity and precision. This approach requires the reviewer to judge risk of bias in the context of adequate reporting and description of methods. In developing our item bank, we identified questions relevant to each of the 12 “methods” domains identified by Deeks et al. [9]: (1) background/context, (2) sample definition and selection, (3) interventions/exposure, (4) outcomes, (5) creation of treatment groups, (6) blinding, (7) soundness of information, (8) follow-up, (9) analysis comparability, (10) analysis outcome, (11) interpretation, and (12) presentation and reporting. The item bank provides a tool for abstractors to review a manuscript to identify the risk of bias and threats to precision for these domains.

Section snippets

Methods

The project was conducted in two phases. The preliminary period, phase 1, resulted in the compilation of potential questions for the item bank. Phase 2 included face validity testing, cognitive testing, content validity testing, and interrater reliability testing. Also, to improve usability of the item bank, we reduced the number of questions to those relevant for evaluating risk of bias and precision in observational studies, eliminating questions regarding applicability or the conduct of

Compilation of potential items for item bank

During phase 1, we reviewed earlier instruments that had been used for evaluating the risk of bias and precision of observational studies and compiled 1,492 items that were available through the published literature and 84 [21], [22], [23], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67],

Discussion

With the increasing use of observational studies in evidence synthesis, systematic reviewers have a greater burden of evaluating the risk of bias of study results. Although this effort is essentially a subjective exercise, requiring judgments by a reviewer, it is the only means of evaluating the degree to which a study's results can be believed and is a critical step on the pathway to evaluating the strength of a body of evidence. The RTI Observational Studies Risk of Bias and Precision Item

References (126)

  • S.L. West et al. Systems to rate the strength of scientific evidence. Evidence Report/Technology Assessment No. 47 (Prepared by the Research Triangle Institute-University of North Carolina Evidence-based Practice Center under Contract No. 290-97-0011) (2002)
  • S. Sanderson et al. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Int J Epidemiol (2007)
  • P. Katrak et al. A systematic review of the content of critical appraisal tools. BMC Med Res Methodol (2004)
  • Thomas H. Quality assessment tool for quantitative studies. Effective Public Health Practice Project. Toronto, Canada:...
  • Wells G, Shay B, O'Connell D, Peterson J, Welch V, Losos M, et al. The Newcastle-Ottawa Scale (NOS) for assessing the...
  • S.H. Downs et al. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health (1998)
  • D.E. Cowley. Prostheses for primary total hip replacement. A critical appraisal of the literature. Int J Technol Assess Health Care (1995)
  • J.S. Reisch et al. Aid to the evaluation of therapeutic studies. Pediatrics (1989)
  • D.F. Stroup et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA (2000)
  • J.F. Peipert et al. Observational studies. Clin Obstet Gynecol (1998)
  • T.J. Wilt et al. Comparison of endovascular and open surgical repairs for abdominal aortic aneurysm, structured abstract. Evidence Report/Technology Assessment No. 144 (Prepared by the University of Minnesota Evidence-based Practice Center under Contract No. 290-02-0009) (2006)
  • J. Lau et al. Evaluation of technologies for identifying acute cardiac ischemia in emergency departments. Evidence Report/Technology Assessment No. 26 (Prepared by The New England Medical Center Evidence-based Practice Center under Contract No. 290-97-0019) (2001)
  • M. Sharma et al. Acute stroke: evaluation and treatment. Evidence Report/Technology Assessment No. 127 (Prepared by the University of Ottawa Evidence-based Practice Center under Contract No. 290-02-0021) (2005)
  • A.R. Jadad et al. Treatment of attention-deficit/hyperactivity disorder. Evidence Report/Technology Assessment No. 11 (Prepared by McMaster University under Contract No. 290-97-0017) (1999)
  • E.R. Myers et al. Management of adnexal mass. Evidence Report/Technology Assessment No. 130 (Prepared by the Duke Evidence-based Practice Center under Contract No. 290-02-0025) (2006)
  • J. Lau et al. Management of clinically inapparent adrenal mass. Evidence Report/Technology Assessment No. 56 (Prepared by New England Medical Center Evidence-based Practice Center under Contract No. 290-97-0019) (2002)
  • R. Chou et al. Empirical evaluation of the association between methodological shortcomings and estimates of adverse events. Technical Review No. 13 (Prepared by the Oregon Evidence-based Practice Center under Contract No. 290-02-0024) (2006)
  • M. Hardy et al. Ayurvedic interventions for diabetes mellitus: a systematic review. Evidence Report/Technology Assessment No. 41 (Prepared by Southern California Evidence-based Practice Center/RAND under Contract No. 290-97-0001) (2001)
  • E. Balk et al. B vitamins and berries and age-related neurodegenerative disorders. Evidence Report/Technology Assessment No. 134 (Prepared by Tufts-New England Medical Center Evidence-based Practice Center under Contract No. 290-02-0022) (2006)
  • C. Catlett et al. Training of clinicians for public health events relevant to bioterrorism preparedness. Evidence Report/Technology Assessment No. 51 (Prepared by Johns Hopkins Evidence-based Practice Center under Contract No. 290-97-006) (2002)
  • D.M. Bravata et al. Bioterrorism preparedness and response: use of information technologies and decision support systems. Evidence Report/Technology Assessment No. 59 (Prepared by University of California San Francisco-Stanford Evidence-based Practice Center under Contract No. 290-97-0013) (2002)
  • L.J. Appel et al. Utility of blood pressure monitoring outside of the clinic setting. Evidence Report/Technology Assessment No. 63 (Prepared by the Johns Hopkins Evidence-based Practice Center under Contract No. 290-97-006) (2002)
  • C. Balion et al. Testing for BNP and NT-proBNP in the diagnosis and prognosis of heart failure. Evidence Report/Technology Assessment No. 142 (Prepared by the McMaster University Evidence-based Practice Center under Contract No. 290-02-0020) (2006)
  • M. Viswanathan et al. Management of bronchiolitis in infants and children. Evidence Report/Technology Assessment No. 69 (Prepared by RTI International-University of North Carolina at Chapel Hill Evidence-based Practice Center under Contract No. 290-97-0011) (2003)
  • J.G. Ford et al. Knowledge and access to information on recruitment of underrepresented populations to cancer clinical trials. Evidence Report/Technology Assessment No. 122 (Prepared by the Johns Hopkins University Evidence-based Practice Center under Contract No. 290-02-0018) (2005)
  • P. Ellis et al. Diffusion and dissemination of evidence-based cancer control interventions. Evidence Report/Technology Assessment No. 79 (Prepared by McMaster University under Contract No. 290-97-0017) (2003)
  • T.J. Whelan et al. Impact of cancer-related decision aids. Evidence Report/Technology Assessment No. 46 (Prepared by McMaster University under Contract No. 290-97-0017) (2002)
  • A. Ammerman et al. Efficacy of interventions to modify dietary behavior related to cancer risk. Evidence Report/Technology Assessment No. 25 (Contract No. 290-97-0011 to the Research Triangle Institute-University of North Carolina at Chapel Hill Evidence-based Practice Center) (2001)
  • F. McAlister et al. Cardiac resynchronization therapy for congestive heart failure. Evidence Report/Technology Assessment No. 106 (Prepared by the University of Alberta Evidence-based Practice Center under Contract No. 290-02-0023) (2004)
  • O.D. Schein et al. Anesthesia management during cataract surgery. Evidence Report/Technology Assessment No. 16 (Prepared by the Johns Hopkins University Evidence-based Practice Center under Contract No. 290-097-0006) (2001)
  • H. Jampel et al. Treatment of coexisting cataract and glaucoma. Evidence Report/Technology Assessment No. 38 (Prepared by Johns Hopkins University Evidence-based Practice Center under Contract No. 290-97-0006) (2003)
  • M. Viswanathan et al. Community-based participatory research: assessing the evidence. Evidence Report/Technology Assessment No. 99 (Prepared by RTI-University of North Carolina Evidence-based Practice Center under Contract No. 290-02-0016) (2004)
  • A. Rostom et al. Celiac disease. Evidence Report/Technology Assessment No. 104 (Prepared by the University of Ottawa Evidence-based Practice Center under Contract No. 290-02-0021) (2004)
  • D. Grady et al. Results of systematic review of research on diagnosis and treatment of coronary heart disease in women. Evidence Report/Technology Assessment No. 80 (Prepared by the University of California, San Francisco-Stanford Evidence-based Practice Center under Contract No. 290-97-0013) (2003)
  • S.S. Marinopoulos et al. Effectiveness of continuing medical education. Evidence Report/Technology Assessment No. 149 (Prepared by the Johns Hopkins Evidence-based Practice Center under Contract No. 290-02-0018) (2007)
  • D.C. McCrory et al. Management of acute exacerbations of COPD. Evidence Report/Technology Assessment No. 19 (Contract No. 290-97-0014 to the Duke University Evidence-based Practice Center) (2001)
  • C.R. Flamm et al. Use of epoetin for anemia in chronic renal failure. Evidence Report/Technology Assessment No. 29 (Prepared by the Blue Cross and Blue Shield Association Technology Evaluation Center under Contract No. 290-97-0015) (2001)
  • M. Viswanathan et al. Cesarean delivery on maternal request. Evidence Report/Technology Assessment No. 133 (Prepared by the RTI International-University of North Carolina Evidence-based Practice Center under Contract No. 290-02-0016) (2006)
  • A.J. Bonito et al. Management of dental patients who are HIV-positive. Evidence Report/Technology Assessment No. 37 (Contract No. 290-97-0011 to the Research Triangle Institute-University of North Carolina at Chapel Hill Evidence-based Practice Center) (2002)
  • J.D. Bader et al. Cardiovascular effects of epinephrine on hypertensive dental patients. Evidence Report/Technology Assessment No. 48 (Prepared by Research Triangle Institute under Contract No. 290-97-0011) (2002)