- Liliane Zorzela, PhD candidate1,
- Su Golder, MRC fellow in health services research2,
- Yali Liu, lecturer3,
- Karen Pilkington, senior research fellow4,
- Lisa Hartling, assistant professor5,
- Ari Joffe, clinical professor6,
- Yoon Loke, clinical senior lecturer7,
- Sunita Vohra, professor8
- 1Department of Pediatrics, 4-548 Edmonton Clinic Health Academy, University of Alberta, Edmonton, Canada
- 2Centre for Reviews and Dissemination, University of York, York, UK
- 3Evidence-Based Medicine Center, School of Basic Medical Sciences, Lanzhou University, Chengguan District, Lanzhou, Gansu, People’s Republic of China
- 4School of Life Sciences, University of Westminster, London, UK
- 5Department of Pediatrics, 4-472 Edmonton Clinic Health Academy, University of Alberta, Canada
- 6Stollery Children’s Hospital, Department of Pediatrics, University of Alberta, Edmonton, Canada
- 7University of East Anglia Medical School, Norwich, UK
- 8Department of Pediatrics, University of Alberta, Canada, T5K 0L4
- Correspondence to: S Vohra
- Accepted 17 December 2013
Objectives To examine the quality of reporting of harms in systematic reviews, and to determine the need for a reporting guideline specific for reviews of harms.
Design Systematic review.
Data sources Cochrane Database of Systematic Reviews (CDSR) and Database of Abstracts of Reviews of Effects (DARE).
Review methods Databases were searched for systematic reviews having an adverse event as the main outcome, published from January 2008 to April 2011. Adverse events included an adverse reaction, harms, or complications associated with any healthcare intervention. Articles with a primary aim to investigate the complete safety profile of an intervention were also included. We developed a list of 37 items to measure the quality of reporting on harms in each review; data were collected as dichotomous outcomes (“yes” or “no” for each item).
Results Of 4644 reviews identified, 309 were systematic reviews or meta-analyses primarily assessing harms (13 from CDSR; 296 from DARE). Although the time interval was short, the comparison between 2008 and 2010-11 showed no difference in the quality of reporting over time (P=0.079). Titles in fewer than half the reviews (proportion of reviews 0.46 (95% confidence interval 0.40 to 0.52)) did not mention any harm related terms. About one quarter of DARE reviews (0.26 (0.22 to 0.31)) did not clearly define the adverse events reviewed, nor did they specify the study designs selected for inclusion in their methods section. Almost half of reviews (n=170) did not consider patient risk factors or length of follow-up when reviewing harms of an intervention. Of 67 reviews of complications related to surgery or other procedures, only four (0.05 (0.01 to 0.14)) reported professional qualifications of the individuals involved. The overall, unweighted, proportion of reviews with good reporting was 0.56 (0.55 to 0.57); corresponding proportions were 0.55 (0.53 to 0.57) in 2008, 0.55 (0.54 to 0.57) in 2009, and 0.57 (0.55 to 0.58) in 2010-11.
Conclusion Systematic reviews compound the poor reporting of harms data in primary studies by failing to report on harms or doing so inadequately. Improving reporting of adverse events in systematic reviews is an important step towards a balanced assessment of an intervention.
A balanced assessment of interventions requires analysis of both benefits and harms. Systematic reviews or meta-analyses of randomised controlled trials are the preferred method to synthesise evidence in a comprehensive, transparent, and reproducible manner. Randomised controlled trials rarely assess harms as their primary outcome; therefore, they typically lack the power to detect differences in harms between groups (table 1). Usually designed to evaluate treatment efficacy or effectiveness, randomised controlled trials are often done over a short period of time, with a relatively small number of participants. These trials are known to be poor at identifying and reporting harms, which can lead to a misconception that a given intervention is safe, when its safety is actually unknown.1 2 3 4 5 6 7 8 Systematic reviews with a primary objective to assess harms represent fewer than 10% of all systematic reviews published yearly.9 10 Systematic reviews of harms can provide valuable information to describe adverse events (frequency, nature, seriousness), but they are hampered by a lack of standardised methods to report these events and the fact that harms are not usually the primary outcome of included studies.9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
Several studies have identified challenges when developing a systematic review of adverse events.9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 These include the poor quality of information on harms reported in original studies, difficulties in identifying relevant studies on adverse events when using standard systematic search techniques, and the lack of a specific guideline for performing a systematic review of adverse events. The need for better reporting on harms in general,1 2 3 4 5 6 7 8 and in systematic reviews in particular,9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 has been voiced. In a previous review28 of systematic reviews from the Cochrane Database of Systematic Reviews (CDSR) and Database of Abstracts of Reviews of Effects (DARE), our team identified a significantly increased number of reviews of adverse events over the past 17 years (P<0.001); however, the proportion of these reviews out of the total number of reviews was unchanged at 5%.11 14 28 Some positive points were noted (for example, the increased number of databases searched per review and the reduction in the number of systematic reviews limiting their search strategies by date or language), but appropriate reporting of search strategies was still a problem.28
The PRISMA30 (preferred reporting items for systematic review and meta-analysis) statement was developed to deal with suboptimal reporting in systematic reviews. Thus far, PRISMA has mainly focused on efficacy and not on harms. A reporting guideline specific for systematic reviews of harms is crucial to provide a better assessment of adverse events of interventions. The first step for successful guideline development is to document the quality of reporting in published research articles to justify the need for the guideline.30 The goal of this review was to determine whether there is a need for a guideline specific for reviews of harms30 through assessment of the quality of reporting in systematic reviews of harms published between January 2008 and April 2011, from two major databases.
Development of the checklist items
To assess the quality of reporting in systematic reviews of harms, we developed a set of items (table 2) to be reported in these reviews.
The items were originally based on a draft generated from analysis of a systematic review of harms conducted previously.33 During the development of the data extraction form for this current review, several items were added. The wording and content were further refined over telephone meetings, by a group of experts in systematic reviews and guideline development.8 31 32 34 35 36 37 38 39
Not every PRISMA item has a corresponding harms item, and a few have more than one suggestion per item. PRISMA items 15, 19-25, and 27 did not have any specific harms related items.
We searched the CDSR (via the Cochrane Library) and DARE (via the Centre for Reviews and Dissemination and the Cochrane Library) databases for systematic reviews having an adverse event as the primary outcome measured. DARE is compiled through rigorous weekly searches of bibliographic databases (including Medline, Embase, PsycINFO, PubMed, and the Cumulative Index to Nursing and Allied Health Literature). It also involves less frequent searches of the Allied and Complementary Medicine Database and the Education Resource Information Center, hand searching of key journals, grey literature, and regular searches of the internet. CDSR includes all the systematic reviews published by the Cochrane Collaboration. We selected the combination of these two databases because they are likely to represent the most comprehensive collection of systematic reviews published in healthcare.28
The search was limited to a 40 month period between 1 January 2008 and 25 April 2011. The web appendix shows the search strategy used. The dates were selected to include recent reviews in order to describe the current state of reporting in systematic reviews of harms.
Reviews were selected if the primary outcome investigated was exclusively an unintended effect or effects of an intervention. It could be an adverse event, adverse effect, adverse reaction, harms, or complications (table 1) associated with any healthcare intervention (such as pharmaceutical interventions, diagnostic procedures, surgical interventions, or medical devices). Articles with a primary aim to investigate the complete safety profile of an intervention were included. Reviews were not excluded on the basis of their results or conclusions.
We excluded reviews assessing both beneficial and harmful effects (reviews of both efficacy and harms), reviews of desirable side effects of drugs, or reviews of prevention or reduction of unintended or adverse effects. No limitations on interventions, patient groups, or language were applied.
Screening and data extraction
Relevant studies were screened by title and abstract (when available) independently by two authors (LZ and SG). Any disagreements were resolved by consensus; if consensus could not be reached, disagreements were resolved with a third author (SV). Abstracts that were identified as potentially meeting the DARE criteria but were not assessed were called “provisional abstracts.” The full text was retrieved for these abstracts.
The data extraction was based on the items developed, piloted, and refined (table 2). Each field received a “yes” if the item was reported as defined, or “no” if not reported. The data were extracted by one author (LZ) and verified by a second author (YL). Disagreements were resolved by consensus.
The main outcome assessed was the quality of reporting in reviews of harms for each of the 37 items. We also measured the proportion of each “yes” response for each year of the search: 2008, 2009, and 2010-11 (12 reviews published in the first four months of 2011 were combined with the reviews published in 2010). The quality of reporting was compared between the earliest and latest years reviewed (2008 v 2010-11) to assess any improvement in quality of reporting during the study period.
We deliberately decided not to include an intermediate category (such as “unclear”). If the item was not clearly reported, it was considered as a “no” response and the unclear category would simply be duplication.
The present review did not aim to evaluate the reasons behind review authors’ decisions to examine harms. Our goal was to measure the quality of reporting in those reviews by answering the question “is the item clearly reported?” The goal was not to judge whether a methodologically appropriate decision was made (for example, statistical tests used, data extraction, data pooling), but to ensure clarity in reporting regarding the choices made. Information on the types of interventions, nature of included study designs, as well as search strategies and databases searched in systematic reviews published between 1994 and 2011 was previously reported by our team.28
This study does not intend to measure the effect of the PRISMA statement, for two reasons. Firstly, PRISMA focuses on efficacy; thus measurement of its effect would reasonably focus on systematic reviews of efficacy and not specifically of harms. Secondly, the 37 items measured in the present review were new items and not those found in PRISMA.
Data were collected as dichotomous outcomes (“yes” or “no”) for each item, and presented as proportions of reviews for each category (reported from 0 to 1) and as proportions divided by the database that the reviews were identified from (CDSR and DARE). We also provided 95% confidence intervals for each proportion based on methods described by Wilson and Newcombe40 41 using a correction for continuity.
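As an illustration of the interval described above, the following is a minimal sketch of a Wilson score interval with continuity correction, written from the standard published form of that method. The function name `wilson_cc_interval` and the example counts are illustrative assumptions, not the authors' code; the review's own calculations were done in Stata, so this sketch is not expected to reproduce the published intervals digit for digit.

```python
from statistics import NormalDist


def wilson_cc_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval with continuity correction for a binomial proportion.

    Hypothetical helper for illustration; not taken from the review itself.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z**2)

    # By convention the lower bound is 0 when no "yes" responses were observed.
    if successes == 0:
        lower = 0.0
    else:
        root = (z**2 - 2 - 1 / n + 4 * p * (n * q + 1)) ** 0.5
        lower = (2 * n * p + z**2 - 1 - z * root) / denom

    # By convention the upper bound is 1 when every response was "yes".
    if successes == n:
        upper = 1.0
    else:
        root = (z**2 + 2 - 1 / n + 4 * p * (n * q - 1)) ** 0.5
        upper = (2 * n * p + z**2 + 1 + z * root) / denom

    return max(0.0, lower), min(1.0, upper)


# Example: 4 "yes" responses among 67 reviews (counts chosen for illustration).
low, high = wilson_cc_interval(4, 67)
```

The continuity correction widens the interval slightly compared with the uncorrected Wilson interval, which matters most for items reported by very few or nearly all reviews.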
An overall reporting quality rate was provided through an unweighted average of proportions of items with good reporting. P≤0.05 was considered statistically significant. We did statistical calculations using StataIC-13.
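The unweighted average described above can be sketched as follows, assuming each review is stored as a list of yes/no answers, one per checklist item. The function names and the toy data are hypothetical and only illustrate the arithmetic: each item contributes equally to the overall rate, regardless of how many reviews answered "yes".

```python
from statistics import mean


def item_proportions(responses: list[list[bool]]) -> list[float]:
    """Proportion of reviews answering "yes" (True) for each item.

    `responses` holds one inner list per review, with one boolean per item.
    """
    n_items = len(responses[0])
    if any(len(review) != n_items for review in responses):
        raise ValueError("every review must answer the same number of items")
    return [
        mean(1.0 if review[i] else 0.0 for review in responses)
        for i in range(n_items)
    ]


def overall_reporting_rate(responses: list[list[bool]]) -> float:
    """Unweighted average of the per-item proportions of good reporting."""
    return mean(item_proportions(responses))


# Toy data: four reviews scored on three items (not data from the review).
toy = [
    [True, False, True],
    [True, False, False],
    [True, True, False],
    [False, False, True],
]
per_item = item_proportions(toy)        # [0.75, 0.25, 0.5]
overall = overall_reporting_rate(toy)   # 0.5
```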
The search yielded 4644 unique references. After screening and retrieving full text articles, an extra 14 papers were excluded as they did not fulfil the inclusion criteria. A total of 309 reviews were identified as systematic reviews or meta-analyses primarily assessing harms, of which 13 were identified in CDSR and 296 in DARE (fig 1). Disagreements about the inclusion and exclusion criteria were discussed by LZ and SG, and consensus was reached after discussion. Three of the included papers were published in Chinese, two in Spanish, and one in Portuguese. Table 3 provides detailed information on the reporting of each item.
The 309 systematic reviews and meta-analyses with harms as a primary outcome focused on the following interventions:
Drugs (223 studies, proportion of reviews 0.72 (95% confidence interval 0.66 to 0.77))
Surgery or other procedures (67 studies, 0.21 (0.17 to 0.26))
Devices (13 studies, 0.04 (0.02 to 0.07))
Blood transfusion (two studies, 0.006 (0.001 to 0.022))
Enteral nutrition (two studies, 0.006 (0.001 to 0.022))
Isolation rooms (one study, 0.003 (0.001 to 0.01))
Surgical versus medical treatment (one study, 0.003 (0.001 to 0.01)).
Titles and abstracts
Titles in close to half of included systematic reviews and meta-analyses of harms did not mention any harm related terms (proportion of reviews 0.46 (95% confidence interval 0.40 to 0.52)) and had no report of a patient population or condition under review (0.42 (0.36 to 0.48)). Twenty five reviews (0.08 (0.05 to 0.11)) used the word “safety” to identify a review of harms. In the abstract section, reviews often used harm related terms (0.84 (0.79 to 0.88)). In other words, one in every 6.5 reviews did not have any harms related word in the abstract, and half of reviews (0.5 (0.44 to 0.56)) did not report the study designs sought or included.
Introduction and rationale
Introductions were well written overall, explaining the rationale for the review and providing information on harms being reviewed. Fifty one reviews (proportion of reviews 0.16 (95% confidence interval 0.12 to 0.21)) were performed to investigate any adverse event associated with an intervention, rather than focusing on a specific event. As per our definition, these reviews provided “an explicit statement of questions being asked with reference to harms” as this broad goal was reported.
Protocol and registration
Consistent with Cochrane requirements for authors, all included systematic reviews conducted through the Cochrane Collaboration (Cochrane reviews) had a protocol. Cochrane reviews did not refer to a protocol in their full text reviews, but this item was considered “yes” for the reviews for which a protocol could be found. By contrast, only 22 reviews (proportion of reviews 0.07 (95% confidence interval 0.04 to 0.11)) identified through the DARE database reported the use of a protocol. Reporting clinical expertise was not deemed necessary to receive “yes” for this item (table 2).
About one quarter of DARE reviews (proportion of reviews 0.26 (95% confidence interval 0.22 to 0.31)) did not have a clear definition of the adverse events reviewed, nor did they specify the study designs selected for inclusion in their methods section. All Cochrane reviews had a clear report of their search strategy, eligible study designs, and methods of data extraction.
Authors did not usually search outside the peer reviewed literature for additional sources of adverse events; for example, only 54 reviews (proportion of reviews 0.17 (95% confidence interval 0.13 to 0.22)) searched databases of regulatory bodies or similar sources. Seven of 13 Cochrane reviews (0.53 (0.25 to 0.80)) searched for data from regulatory bodies or industry, compared with 47 of 296 (0.15 (0.11 to 0.20)) DARE reviews.
Study selection and data items
At the screening phase, most reviews only included studies if the harms being searched were reported (proportion of reviews 0.70 (95% confidence interval 0.65 to 0.75)). Of 67 reviews of complications related to surgery or procedures, only four (0.05 (0.01 to 0.14)) reported professional qualifications of the individuals involved.
Reports of any possible patient related risk factors (such as age, sex, or comorbidities) were sought in fewer than half of reviews (proportion of reviews 0.41 (0.36 to 0.47)). Furthermore, only 10 reviews adjudicated whether the adverse event could be biologically, pharmacologically, or temporally caused by the intervention, as measured by item 12a.
Fewer than half the reviews (132 of 309; proportion of reviews 0.42 (95% confidence interval 0.37 to 0.48)) included only controlled clinical trials (randomised or not). Only 59 reviews (0.19 (0.14 to 0.23)) included both clinical trials and observational studies. Only observational studies (prospective or retrospective) were included in 109 reviews (0.35 (0.29 to 0.40)), and case series or case reports were included in 24 (0.07 (0.05 to 0.11)). Three of 13 Cochrane reviews (0.23 (0.05 to 0.53)) included observational studies compared with 165 of 296 DARE reviews (0.55 (0.49 to 0.61)). After reviewing the full text, we found that nine reviews did not report the designs included anywhere in the text. Length of follow-up or patient demographics were reported in just over half of reviews (170 of 309 reviews; 0.55 (0.49 to 0.60)).
In a retrospective analysis, we compared the quality of reporting between 2008 and 2010-11 to measure any possible improvement over time. There was no significant difference (P=0.079) in the proportion of good reporting between the earlier and later years.
The 13 CDSR reviews had overall better reporting than DARE reviews in the abstracts, methods, and results sections; the other categories had similar levels of reporting quality. Almost half of DARE reviews (proportion of reviews 0.46 (95% confidence interval 0.40 to 0.51)) had poor reporting in the results section. Because the number of reviews was considerably different between databases, we considered it inappropriate to proceed with any formal tests to compare CDSR and DARE reviews.
Figure 2 provides a graphic trend in the proportion of reviews with good reporting over time (2008, 2009, and 2010-11), by each item reviewed. Reporting quality was poor in the methods and results sections, where on average half of the items were poorly reported.
The overall proportion of reviews with good reporting was 0.56 (95% confidence interval 0.55 to 0.57). The corresponding proportions were 0.55 (0.53 to 0.57) in 2008, 0.55 (0.54 to 0.57) in 2009, and 0.57 (0.55 to 0.58) in 2010-11.
We conducted this systematic review to assess the quality of reporting in systematic reviews of harms as a primary outcome using a set of proposed reporting items. This is the first step for the development of PRISMA Harms, a reporting guideline specifically designed for reviews of harms.30 There is a substantial difference between the number of systematic reviews measuring an adverse event as the main outcome identified through DARE and those identified through CDSR. CDSR reviews comprised only a small fraction of the total reviews included. This distinction may compromise any direct comparison between the two databases. Reviews of harms as a secondary outcome or a coprimary outcome were not included in this review, which could be one reason for the large dissimilarity. Despite the small number of reviews of harms published in the CDSR, they were better reported overall than DARE reviews, probably owing to the clear guidelines provided by the Cochrane Collaboration29 and the more flexible word limits allowed than those in other peer reviewed journals.
Several items were poorly reported in the included reviews, but a few are especially important when reviewing harms, because the lack of reporting on these could lead to misinterpretations of findings. In a systematic review, the screening phase is crucial and the exclusion of studies due to the absence of harms could overestimate the events and perhaps generate a biased review. Two thirds of reviews of harms only included studies if at least one adverse event was reported in the included studies. In a review of harms, “zero” is an important value, and studies with “no adverse events” are possibly as relevant to the review as those with reported adverse events. Nevertheless, zero events in studies require careful interpretation, because the lack of reported harms may have different reasons: they may not have occurred (that is, a zero event), they may not have been investigated (that is, unknown if zero or no events occurred), or they may have been detected but not reported (that is, unknown if zero or no events occurred). The lack of reporting can be thought of as a measurement bias or reporting bias and should be considered as such.1 2 3 4 5 6 7 8 9 10 11 12 24 25 26
All these scenarios have different implications for readers who need to judge whether an intervention may cause harm. Almost half of the included reviews of harms as a primary outcome did not consider patient risk factors or length of follow-up when reviewing adverse events of an intervention. Readers cannot properly judge whether there is an association between intervention and harms if these critical data are not reported. Most of the poorly reported items were in the methods and results sections. Clarity in the methods and results is essential to provide a clear picture of the authors’ intentions and the limitations of the findings, and transparency is warranted.
Strengths and limitations
This review was unique by including more than 300 reviews, from two major databases, that looked at harms as a main outcome; each review was evaluated in depth using a novel set of 37 items to measure the quality of reporting. A limitation of this review was the lack of a reporting guideline specific for systematic reviews of harms; different formats of reporting were found, and assessing whether the reporting was adequate was challenging. The reviewers were generous in their assessment, accepting a range of reports as a “yes” to the presence of an item, which could have underestimated the degree of the problem.
Our review was limited exclusively to systematic reviews where harms were the primary focus. We believed that measuring the quality of reporting of adverse events in reviews specifically designed to evaluate adverse events would generate a pure sample focusing on the reporting of such events. The documenting of poor reporting on these reviews would imply poor reporting on adverse events in general. At this stage, we also decided to be inclusive and measure all potentially relevant items. In the future PRISMA Harms extension, we will limit these to the minimum set of essential items for reporting harms in a systematic review.
Comparison with other studies
Hopewell and colleagues14 reviewed a sample of 59 Cochrane reviews, of which only one was a harms review. The remaining 58 reviews focused primarily on benefit, with adverse events as secondary or tertiary considerations. Hopewell and colleagues reported that 32 (54%) reviews had fewer than three paragraphs of information on adverse events, and 11 (19%) had fewer than five sentences on adverse events.
Hammad and colleagues42 reviewed a sample of 27 meta-analyses primarily assessing drug safety and identified that more than 85% of the PRISMA items were reported in the majority of the reviews. However, most reviews did not report the 20 items specifically developed by the authors to address drug safety assessment. Several of the items considered important by Hammad and colleagues were similar to the ones developed by our team. The present review adds to the voice of many authors highlighting the poor reporting in reviews of adverse events.9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 42
Systematic reviews may compound the poor reporting of harms data in primary studies by failing to report on harms or doing so inadequately.9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 We recognise the need to optimise quality of reporting on harms in primary studies, and attempts to enhance it have already been made.6 At this phase, we measured the quality of reporting in systematic reviews. As a future step, we intend to compare the quality of reporting in included studies with the reporting in the systematic review.
Despite their status as the preferred method for knowledge synthesis, systematic reviews can present an incomplete picture to readers by not representing a reliable assessment of a given treatment. Authors of the systematic review have a unique vantage point and can evaluate the entire evidence base under review, including deficits in reporting harms at the primary study level. This vantage point should be used to flag deficits in primary reporting. The goal over time should be to improve the quality and clarity of reporting in systematic reviews as well as the primary studies they evaluate. Although we are glad to see increasing numbers of systematic reviews of harms being published, we are also aware that the validity of the results can be heavily influenced by reviewers’ decisions during conduct of the review—even more so than in reviews of benefit because we are dealing with sparse data and secondary outcomes.43 44 45 Hence, we emphasise here the crucial importance of transparent reporting of the methods used in systematic reviews of harms.
Systematic reviews of interventions should put equal emphasis on efficacy and harms. Improved reporting of adverse events in systematic reviews is one step towards providing a balanced assessment of an intervention. Patients, healthcare professionals, and policymakers should base their decisions not only on the efficacy of an intervention, but also on its risks. Guidance on a minimal set of items to be reported when reviewing harms is needed to improve transparency and informed decision making, thereby greatly enhancing the relevance of systematic reviews to clinical practice.
This review used a set of proposed reporting items to assess reporting, and the findings indicate that specific aspects of harms reporting could be improved. The items will be further refined with the aim of developing a final set of criteria that would constitute the PRISMA Harms. The PRISMA statement30 is a living document, open to criticism and suggestions; these same principles will be shared by the PRISMA Harms. The development of a standardised format for reporting harms in systematic reviews will promote clarity and help ensure that readers have the basic information necessary to make an informed assessment of the intervention under review.
What this paper adds
There is room for improved reporting in systematic reviews of harms, including clear definition of the events measured, length of follow-up, and patient risk factors
Comparisons of reviews of harms have shown no improvement in the quality of reporting over time
Lack of detail or transparency in the reporting of systematic reviews of harms could hinder proper assessment of validity of findings
What is already known on this topic
The number of systematic reviews of adverse effects has increased significantly over the past 17 years
Harms are poorly reported in randomised controlled trials, but it is unclear whether there are weaknesses in the reporting of systematic reviews of harms
Although the PRISMA statement aims to provide guidance on transparent reporting in systematic reviews, its recommendations are mainly focused on studies of beneficial outcomes
Cite this as: BMJ 2014;348:f7668
Contributors: All authors participated in the study conception and design, and analysis and interpretation of data; drafted the article or revisited it critically for important intellectual content; and approved the final version to be published. SV is the study guarantor.
Funding: No specific funding was given for this study.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: No ethics approval was necessary because this was a review of published literature. No patient data or confidential information were used in this manuscript.
Data sharing: The dataset is available from the corresponding author at.
This manuscript is an honest, accurate, and transparent account of the study being reported; no important aspects of the study have been omitted; and any discrepancies from the study as planned (and, if relevant, registered) have been explained.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.