PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews

The methods and results of systematic reviews should be reported in sufficient detail to allow users to assess the trustworthiness and applicability of the review findings. The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement was developed to facilitate transparent and complete reporting of systematic reviews and has been updated (to PRISMA 2020) to reflect recent advances in systematic review methodology and terminology. Here, we present the explanation and elaboration paper for PRISMA 2020, where we explain why reporting of each item is recommended, present bullet points that detail the reporting recommendations, and present examples from published reviews. We hope that changes to the content and structure of PRISMA 2020 will facilitate uptake of the guideline and lead to more transparent, complete, and accurate reporting of systematic reviews.

"Currently there is no clear evidence to indicate which surgery is the best choice. It is unclear if the older operations that were previously available (such as anterior repair and colposuspension) really result in equivalent or better outcomes than the polypropylene mid-urethral sling. However, the feeling of our clinical experts who used to offer colposuspension and traditional slings is that these techniques had more frequent and severe associated complications and returning to them may be detrimental to women. To enable women to make an evidence-based choice and inform practice guidelines, it is essential to collect reliable evidence in a transparent, concise manner to allow impartial counselling of women regarding the benefits and risks of the alternative surgical operations for the management of stress urinary incontinence. The wide range of surgical operations available, the different techniques used to perform these operations and the lack of a consensus among surgeons make it challenging to establish which procedure is the most effective. The existing evidence base, including the Cochrane systematic reviews, has focused on discrete two-way comparisons, with no attempt being made to collate all of the evidence on the surgical options available and rank them in terms of clinical effectiveness, safety and cost-effectiveness. This has resulted in a piecemeal evidence base that is difficult for women and clinicians to interpret. This assessment includes an evidence synthesis of all available randomized controlled trials to determine the relative clinical effectiveness and safety of interventions, a discrete choice experiment (DCE) to explore women's preferences, an economic decision model to determine the most cost-effective treatment and a value-of-information (VOI) analysis to help inform the focus of further research." 
(11) Example 4: In a review examining the effects of dietary inorganic nitrate for lowering blood pressure in hypertensive adults, the authors report what information the review seeks to add to current knowledge, and indicate that no systematic review addressing the same question exists: "…it is well known that the organic nitrates lower blood pressure in hypertensive individuals, which brings about the question of whether inorganic nitrates have the same ability. This review focuses on the dietary alteration component of lifestyle modifications by the use of inorganic nitrate in the treatment of hypertension. The appraisal of the evidence was completed to ultimately help providers make informed decisions regarding interventions to address one of the nation's biggest killers. There was a systematic review published in 2013 that addressed the effects of dietary inorganic nitrate on blood pressure with an overrepresentation of healthy, normotensive participants. That review found that inorganic nitrates decrease blood pressure. For this reason, this review examines studies published from 2013 through 2018 with blood pressure greater than 120/80 mmHg in participants, which would be considered elevated according to the guidelines published by the American College of Cardiology (ACC) and American Heart Association (AHA). The results of this review will contribute towards a greater understanding of possible treatments for hypertension, sequentially resulting in less morbidity and mortality from cardiovascular diseases. At the time of this systematic review, there was no systematic review that evaluated the effects of inorganic nitrate specifically on adults with blood pressure greater than 120/80 mmHg." (12) Item 4. OBJECTIVES: Provide an explicit statement of the objective(s) or question(s) the review addresses.
Example 1: In a review examining the effects of anti-tumour necrosis factor-blocking agents for rheumatoid arthritis, the authors report a single objective of the review: "Objectives: To evaluate the benefits and harms of down-titration (dose reduction, discontinuation, or disease activity-guided dose tapering) of anti-tumour necrosis factor-blocking agents (adalimumab, certolizumab pegol, etanercept, golimumab, infliximab) on disease activity, functioning, costs, safety, and radiographic damage compared with usual care in people with rheumatoid arthritis and low disease activity." (13) Example 2: In a review examining the effects of pre-exposure prophylaxis for the prevention of HIV infection, the authors report five key questions the review addresses: "Key Questions: 1. What are the benefits of PrEP in persons without pre-existing HIV infection versus placebo or no PrEP (including deferred PrEP) on the prevention of HIV infection and quality of life? a. How do the benefits of PrEP differ by population subgroups? b. How do the benefits of PrEP differ by dosing strategy or regimen? 2. What is the diagnostic accuracy of provider or patient risk assessment tools in identifying persons at increased risk of HIV acquisition who are candidates for PrEP? 3. What are rates of adherence to PrEP in U.S. primary care-applicable settings? 4. What is the association between adherence to PrEP and effectiveness for preventing HIV acquisition? 5. What are the harms of PrEP versus placebo or no PrEP when used for the prevention of HIV infection?" 
(14) Example 3: In a review examining the effects of mobile health interventions during the perinatal period for mothers in low- and middle-income countries, the authors report the primary objective of the review and specify two questions the review addresses: "The primary objective of this review was to determine the impact of mother-targeted mHealth educational interventions during the perinatal period in low- and middle-income countries on maternal and neonatal outcomes. Thus, this quantitative review aimed to answer the following questions: i.
What is the impact of mother-targeted mHealth educational interventions on maternal knowledge, self-efficacy and antenatal/postnatal care clinic attendance in low- and middle-income countries? ii.
What is the impact of mother-targeted mHealth educational interventions on neonatal mortality and morbidity in low- and middle-income countries?" (15) Example 4: In a review examining the effects of screening for esophageal adenocarcinoma in patients with chronic gastroesophageal reflux disease, the authors report two key questions the review addresses: "In order to determine the effectiveness of screening for esophageal adenocarcinoma among gastroesophageal reflux disease patients, the following key questions were addressed: 1a. In adults (≥ 18 years) with chronic gastroesophageal reflux disease with or without other risk factors, what is the effectiveness (benefits and harms) of screening for esophageal adenocarcinoma and precancerous conditions (Barrett's Esophagus and low- and high-grade dysplasia)? What are the effects in relevant subgroup populations? 1b. If there is evidence of effectiveness, what is the optimal time to initiate and to end screening, and what is the optimal screening interval (includes single and multiple tests and ongoing 'surveillance')?" (16) Item 5. ELIGIBILITY CRITERIA: Specify the inclusion and exclusion criteria for the review and how studies were grouped for the syntheses.
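Reviews that manage screening in software sometimes encode eligibility criteria of this kind as explicit, testable rules rather than free text. The sketch below is a hypothetical illustration of that idea (all field names, criteria values, and records are invented; it is not a method used by any review quoted here):

```python
# Hypothetical sketch: eligibility criteria (Item 5) expressed as an
# explicit PICO-style filter over study records. Fields are invented.

def is_eligible(study: dict) -> bool:
    """Apply inclusion criteria to one study record."""
    return (
        study.get("design") == "RCT"                           # types of studies
        and study.get("min_age", 0) >= 18                      # types of participants
        and study.get("intervention") in {"drug A", "drug B"}  # interventions
        and bool(study.get("reports_outcomes"))                # outcomes of interest
    )

studies = [
    {"id": 1, "design": "RCT", "min_age": 18, "intervention": "drug A", "reports_outcomes": True},
    {"id": 2, "design": "cohort", "min_age": 18, "intervention": "drug A", "reports_outcomes": True},
    {"id": 3, "design": "RCT", "min_age": 12, "intervention": "drug B", "reports_outcomes": True},
]

included = [s["id"] for s in studies if is_eligible(s)]
print(included)  # [1]
```

Writing the criteria this way makes each inclusion decision reproducible and auditable, which is the same transparency goal the reporting item serves.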
Example 1: In a review examining the effects of family therapy for people with anorexia nervosa, the authors report the types of studies, participants, interventions, comparators, and outcomes that were eligible for inclusion in the review, and state that there were no restrictions on the type of reports that were eligible (i.e. published or unpublished, any language, any date of publication): "Types of studies: We include all published or unpublished randomised controlled trials (RCTs). We would also have included cluster-randomised controlled trials and cross-over trials, but we found none. There were no language restrictions, nor did we exclude studies on the basis of the date of publication.
Types of participants: We included people of any age or gender with a primary clinical diagnosis of anorexia nervosa (AN), either or both purging or restricting subtypes, based on DSM (APA 2013) or ICD criteria (WHO 1992) or clinicians' judgement, and of any severity. We included those with chronic AN. We included those with psychiatric comorbidity, with the details of comorbidity documented. Participants may have received the intervention in any setting (including in-, day- or outpatient) and may have started in the trial at the beginning of treatment or part-way through (e.g. after discharge from hospital or some other indication/definition of stabilisation). We included those living in a family unit (of any nature, as described/defined by study authors), and those living outside of a family unit.
Types of interventions: Trials where the intervention describes inclusion of the family in some way and is labelled 'family therapy'. These interventions may have been delivered as a monotherapy or in conjunction with other interventions (including standard care, which may or may not be in the context of an inpatient admission). The main categories of family therapy approaches considered were: • Structural family therapy • Systems (systemic) family therapy • Strategic family therapy • Family-based therapy and its variants (including short-term, long-term, and separated) and behavioural family systems therapy (these two therapies were grouped together, given the similarity of approach) • Other (including other approaches that use family involvement in therapy but are less specific about the theoretical underpinning of the therapy and its procedures).
Family therapy approaches were compared with: • Standard care or treatment as usual • Biological interventions (for example, antidepressants, antipsychotics, mood stabilisers, anxiolytics, neutraceuticals, and other agents such as anti-glucocorticoids) • Educational interventions (for example, nutritional interventions and dietetics) • Psychological interventions (for example, cognitive behavioural therapy (CBT) and its derivatives, cognitive analytical therapy, interpersonal therapy, supportive therapy, psychodynamic therapy, play therapy, other) • Alternative or complementary interventions (for example, massage, exercise, light therapies).
Additionally, different types of family therapy approaches were compared to each other. The addition of a family therapy approach to other interventions (including standard care) was also compared to other interventions alone. We would also have included the following comparisons: Family therapy approaches versus biological interventions; and Family therapy approaches versus alternative/complementary interventions; however, we had neither the relevant trials nor useable data from these.
Types of outcome measures: Primary outcomes included: • Remission (by DSM or ICD or trialist-defined cut-off on standardised scale measure for remission versus no remission) • All-cause mortality Secondary outcomes included: • Family functioning as measured on standardised, validated and reliable measures, e.g. Family Environment Scale (Moos 1994), Expressed Emotions (Vaughn 1976), FACES III (Olson 1985) • General functioning, measured by return to school or work, or by general mental health functioning measures, e.g. Global Assessment of Functioning (GAF) (APA 1994) • Dropout (by rates per group during treatment) • Eating disorder psychopathology (evidence of ongoing preoccupation with weight/shape/food/eating by eating-disorder symptom measures using any recognised validated eating disorders questionnaire or interview schedule, e.g. the Morgan-Russell Assessment Schedule (Morgan 1988), Eating Attitudes Test (EAT, Garner 1979), Eating Disorders Inventory (Garner 1983; Garner 1991)) • Weight, including all representations of this measure such as kilograms, body mass index (BMI, kg/m2) and average body weight (ABW) calculations. We included this measure after the finalisation of our protocol, due to the lack of universal reporting on remission, and the differing definitions used for remission • Relapse (by DSM or ICD or trialist-defined criteria for relapse or hospitalisation)" (17) Example 2: In a review examining the effects of perioperative interventions for prevention of postoperative pulmonary complications, the authors report the types of studies, participants, interventions and outcomes that were eligible for inclusion in the review, indicating that studies were excluded if they measured outcomes that were neither patient centric nor clinically relevant.
"Population: We included RCTs of adult (age ≥18 years) patients undergoing non-cardiac surgery, excluding organ transplantation surgery (as findings in patients who need immunosuppression may not be generalisable to others).
Intervention: We considered all perioperative care interventions identified by the search if they were protocolised (therapies were systematically provided to patients according to pre-defined algorithm or plan) and were started and completed during the perioperative pathway (that is, during preoperative preparation for surgery, intraoperative care, or inpatient postoperative recovery). Examples of interventions that we did or did not deem perioperative in nature included long term preoperative drug treatment (not included, as not started and completed during the perioperative pathway) and perioperative physiotherapy interventions (included, as both started and completed during the perioperative pathway). We excluded studies in which the intervention was directly related to surgical technique.
Outcomes: To be included, a trial had to use a defined clinical outcome relating to postoperative pulmonary complications, such as "pneumonia" diagnosed according to the Centers for Disease Control and Prevention's definition. RCTs reporting solely physiological (for example, lung volumes and flow measurements) or biochemical (for example, lung inflammatory markers) outcomes are valuable but neither patient centric nor necessarily clinically relevant, and we therefore excluded them." Example 3: In a review examining the effects of pharmacological or non-pharmacological interventions for adults with exacerbation of chronic obstructive pulmonary disease, the authors report inclusion and exclusion criteria for participants, interventions, comparators, outcomes, settings, study designs, and language of publication. More detail is provided in a table.
"The eligible studies had to meet all of the following criteria: 1) adult 18 years and older with exacerbations of chronic obstructive pulmonary disease (ECOPD); 2) received pharmacologic intervention or nonpharmacologic interventions; 3) compared with placebo, standard care, for antibiotics and systemic corticosteroids: different types of agents, different delivery modes, and different durations of treatments; 4) reported outcomes of interest; 5) conducted in outpatient, inpatients, and emergency department; 6) randomized controlled trials (RCTs); and 7) published in English. We excluded studies conducted in the intensive care unit, or chronic ventilator unit or respiratory care unit; studies of patients with exacerbation of chronic bronchitis if they did not have any evidence of airflow limitation on spirometry (at any time, including during a stable state); and studies of health service interventions (e.g. hospital in the home as alternative to hospitalization). We focused only on interventions during the initial acute phase of an exacerbation of COPD and not during the convalescence period. We did not restrict study location or sample size. The detailed inclusion and exclusion criteria are listed in Table 1. All outcomes were final health outcomes except for the intermediate outcome, "forced expiratory volume in one second" (FEV1). FEV1 was included because it is a commonly used outcome in COPD studies and has been shown to be highly predictive of final health outcomes during ECOPD (including mortality, need for intubation, or hospital admission for COPD)." (19) Item 6. INFORMATION SOURCES: Specify all databases, registers, websites, organisations, reference lists and other sources searched or consulted to identify studies. 
Specify the date when each source was last searched or consulted. Example 1: In a review examining the effects of altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption, the authors list the electronic bibliographic databases (with dates of coverage for each), trials registers and websites searched. They also indicate that reference lists of all eligible study reports were reviewed and forward citation tracking of all eligible study reports was conducted: "We conducted electronic searches for eligible studies within each of the following databases: • Cochrane Central Register of Controlled Trials (CENTRAL) (1992 to 23rd July 2018); • MEDLINE (including MEDLINE In-Process) (OvidSP) (1946 to 23rd July 2018); • Embase (OvidSP) (1980 to 23rd July 2018);
We conducted electronic searches of the following grey literature databases using search strategies adapted from the final MEDLINE search strategy, as described above: • Conference Proceedings Citation Index -Science (Web of Science) (1990 to 24th July 2018); • Conference Proceedings Citation Index -Social Science & Humanities (Web of Science) (1990 to 24th July 2018); and • OpenGrey (1997 to 24th July 2018).
We searched trial registers (US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov (www.clinicaltrials.gov/), the World Health Organization International Clinical Trials Registry Platform (apps.who.int/trialsearch/), and the EU Clinical Trials Register (www.clinicaltrialsregister.eu/) to identify registered trials (up to 25th July 2018), and the websites of key organisations in the area of health and nutrition, including the following: • UK Department of Health; • Centers for Disease Control and Prevention (CDC), USA; • World Health Organization (WHO); • International Obesity Task Force; and • EU Platform for Action on Diet, Physical Activity and Health.
In addition, we searched the reference lists of all eligible study reports and undertook forward citation tracking (using Google Scholar) to identify further eligible studies or study reports (up to 25th July 2018)" (20) Example 2: In a review examining the educational outcomes of children in contact with social care in England, the authors report the databases and other sources consulted, along with the date each source was searched: "On 21 December 2017, MAJ searched 16 health, social care, education and legal databases, the names and date coverage of which are given in Table 2. […] We also carried out a 'snowball' search to identify additional studies by searching the reference lists of publications eligible for full-text review and using Google Scholar to identify and screen studies citing them.
[…] On 26 April 2018, we conducted a search of Google Scholar and additional supplementary searches for publications on websites of 10 relevant organisations (including government departments, charities, think-tanks and research institutes). Full details of these supplementary searches can be found in the Additional file 2. Finally, we updated the database search on 7 May 2019, and the snowball and additional searches on 10 May 2019 as detailed in Additional file 3. We used the same search method, except that we narrowed the searches to 2017 onwards." (21) Example 3: In a review examining the effects of environmental interventions to reduce the consumption of sugar-sweetened beverages, the authors report the databases, trials registers and websites searched (noting the date when each was searched) and indicate in an Appendix the reports for which forward and backward citation searching occurred: "We performed searches in the following databases: In addition, we searched the websites of key organisations in the area of health, health promotion and nutrition, including the following: • EU platform for action on diet, physical activity and health (ec.europa.eu/health/ph_determinants/life_style/nutrition/platform/database/dsp_search.cfm).
• Harvard TH Chan School of Public Health Obesity Prevention Source (www.hsph.harvard.edu/obesity-prevention-source).
We handsearched reference lists of included studies and previously published reviews, and contacted the corresponding author of included studies and previously published reviews as well as the members of the Review Advisory Group to identify additional studies. We also conducted a citing studies search with Scopus, i.e. we searched for studies that have cited included studies and previously published reviews. The studies used for these forward and backward citation searches are provided in Appendix 6… Item 7. SEARCH STRATEGY: Present the full search strategies for all databases, registers and websites, including any filters and limits used. Example 1: The following terms were searched individually using the CADTH site search engine.
Five known relevant studies were used to identify records within databases. Candidate search terms were identified by looking at words in the titles, abstracts and subject indexing of those records. A draft search strategy was developed using those terms and additional search terms were identified from the results of that strategy. Search terms were also identified and checked using the PubMed PubReMiner word frequency analysis tool. The MEDLINE strategy makes use of the Cochrane RCT filter reported in the Cochrane Handbook v5.2. The RCT filter used in the Embase search was developed by the authors. Animal studies are removed from MEDLINE by using a standard algorithm and from Embase using an approach that uses animal-related subject headings but excluding records that are also indexed with the Emtree heading 'Human'. As per the eligibility criteria the strategy was limited to English language studies.
The search strategy was validated by testing whether it could identify the five known relevant studies and also three further studies included in two systematic reviews identified as part of the strategy development process. All eight studies were identified by the search strategies in MEDLINE and Embase.
The strategy was developed by an information specialist and the final strategies were peer reviewed by an experienced information specialist within our team. Peer review involved proofreading the syntax and spelling and overall structure, but did not make use of the PRESS checklist.
Three additional approaches were used to identify further studies. The reference lists of the eligible trials were screened, the included studies of two recent systematic reviews were screened and a forward citation search of the eligible trials to identify publications that had cited them was conducted using Web of Science on 3 Dec 2013. (24) Example 2: In a review examining the effects of environmental interventions to reduce the consumption of sugar-sweetened beverages, the authors report the full search strategy for all databases searched. The following is an excerpt of how they reported the search strategy for trials registers: "For clinicaltrials.gov we used the advanced search interface, and used the search syntax "(sugar-sweetened beverage) OR SSB OR soda" to run searches in the following fields: The search yielded 646 records, which we collated and de-duplicated in MS Excel. After de-duplication, 282 unique records remained.
For the International Clinical Trials Registry Platform (ICTRP) we used the advanced search interface, and used the search syntax "sugar-sweetened beverage OR SSB OR soda" to run searches in the following fields (with synonyms, all recruitment status): The search resulted in 171 hits.
Based on the search, we identified two completed studies eligible for inclusion in our review (Collins 2016 SNAP; Collins 2016 WIC), which we found through clinicaltrials.gov. Moreover, we identified 10 ongoing studies which we judged likely to meet our eligibility criteria upon completion. We present details of these in Characteristics of ongoing studies. We found eight of these through our search in clinicaltrials.gov, and two through our search in the ICTRP. We ran trial register searches on 21 June 2018." (22) Item 8. SELECTION PROCESS: Specify the methods used to decide whether a study met the inclusion criteria of the review, including how many reviewers screened each record and each report retrieved, whether they worked independently, and if applicable, details of automation tools used in the process.
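The trial-register example above describes collating records from several registers and de-duplicating them before screening (the authors did this in MS Excel). A minimal Python sketch of the same step is shown below; the registry IDs and titles are invented for illustration, and real de-duplication typically also compares authors, dates, and other fields:

```python
# Hypothetical sketch of de-duplicating trial-register records before
# screening. Keys combine the registry ID with a normalised title so
# trivially different copies of the same record compare equal.

def normalise(title: str) -> str:
    """Lowercase and collapse whitespace for comparison."""
    return " ".join(title.lower().split())

records = [
    {"registry_id": "NCT00000001", "title": "SSB tax and purchasing"},
    {"registry_id": "NCT00000001", "title": "SSB Tax and  Purchasing"},  # duplicate
    {"registry_id": "ISRCTN123",   "title": "Soda labelling trial"},
]

seen, unique = set(), []
for rec in records:
    key = (rec["registry_id"], normalise(rec["title"]))
    if key not in seen:
        seen.add(key)
        unique.append(rec)

print(len(unique))  # 2
```

Recording the counts before and after de-duplication, as the quoted example does (646 records, 282 unique), is exactly the information this kind of script makes easy to report.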
Example 1: In a review examining the key components of shared decision-making models, the authors report piloting, double screening, and consensus methods for study selection: "Three researchers (AP, HB-R, FG) independently reviewed titles and abstracts of the first 100 records and discussed inconsistencies until consensus was obtained. Then, in pairs, the researchers independently screened titles and abstracts of all articles retrieved. In case of disagreement, consensus on which articles to screen full-text was reached by discussion. If necessary, the third researcher was consulted to make the final decision. Next, two researchers (AP, HB-R) independently screened full-text articles for inclusion. Again, in case of disagreement, consensus was reached on inclusion or exclusion by discussion and if necessary, the third researcher (FG) was consulted." (25) Example 2: In a review examining the long-term effects of alcohol consumption on cognitive function, the authors report piloting, single screening titles/abstracts, partial single screening of full-text, and linking reports to studies: "Citations identified from the literature searches and reference list checking were imported to EndNote and duplicates were removed. Three reviewers independently screened a sample of 109 citations to pre-test and refine coding guidance based on the inclusion criteria. Disagreements about eligibility were resolved through discussion. One reviewer (SB, JR, or SM) then each screened about a third of the remaining citations (grouped by year of publication) for inclusion in the review using the pre-tested coding guidance.
Full-text of all potentially eligible studies were retrieved. A sample of full-text studies was independently screened by two reviewers (SB and JR) until concordance was achieved (~15%; 37/228 of full-text studies screened). The remaining full-text studies were screened by one reviewer (SB or JR). All included studies, and those for which eligibility was uncertain, were screened by a second reviewer (JR or SB). Disagreements or uncertainty about eligibility were resolved through discussion, with advice from the review biostatisticians (JM, AF, or both) to confirm eligibility based on study design and analysis methods. Further information was sought from the authors of two studies (Piumatti 2018, Wardzala 2018) to clarify methods and interpretation of the analysis.
Citations that did not meet the inclusion criteria were excluded and the reason for exclusion was recorded at the full-text screening. Cohort names, author names, and study locations, dates and sample characteristics were used to identify multiple reports arising from the same study (deemed to be a 'cohort'). These reports were matched and data extracted only from the report that provided the most relevant analysis and complete information for the review. In most cases, the decision was based on the outcome reported (global function was prioritised)." (26) Example 3: In a review examining the effects of altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption, the authors report priority screening methods and how non-English language articles were handled: "We imported titles and abstracts retrieved by the searches into EPPI Reviewer v.4.10.2 (ER4) systematic review software. Duplicate records were identified, manually reviewed and then removed using ER4's automatic de-duplication feature, with the similarity threshold initially set to 0.85 and then to 0.80. Due to the large number of records retrieved, we developed a semiautomated screening workflow in ER4 that used machine learning to assign title-abstract records for duplicate manual screening. This workflow was designed to maximise the recall of eligible studies while reducing the overall screening workload to match the resources available. We planned for duplicate manual screening to apply to up to a third of records retrieved.
In developing the workflow, we first screened a random sample of 500 title-abstract records to calculate inter-rater reliability and establish an initial estimate of the baseline inclusion rate (sample size determined as per Shemilt 2014). Secondly, title-abstract records were prioritised for manual screening using active learning to distinguish between relevant and irrelevant records in conjunction with manual user input. This phase of the workflow stopped when each review author had completed 15 hours of duplicate screening without identifying any further potentially eligible studies. In practice, this equated to 1700 title-abstract records.
When we found non-English language articles, we used Google Translate in the first instance to determine potential eligibility. We intended that if an article appeared to be eligible, we would have the article translated by a native language speaker or professional translation service, however no articles needed translating." (20) Example 4: The following is a made-up example showing how to report use of machine learning and crowdsourcing in the study selection process: Study selection followed a three-stage process that involved machine learning classifiers, crowdsourcing and manual screening. After removing duplicates, we applied Cochrane's RCT machine learning classifier (Thomas 2020) and removed from further consideration any record classified as highly unlikely to report a randomized trial (i.e. below the externally calibrated recall threshold of 99%). Records that remained were then screened by Cochrane Crowd (Noel-Storr 2020), a crowdsourcing platform that has consistently shown to be over 99% accurate. In Cochrane Crowd, every record is screened by at least two crowd members, with all disagreements resolved by two expert screeners. Records rejected by the crowd were removed from further consideration. Finally, records the crowd deemed likely to be reports of randomized trials were screened independently by two members of the review team in Covidence. [Example drafted by Steve McDonald and James Thomas, March 2020] Item 9. DATA COLLECTION PROCESS: Specify the methods used to collect data from reports, including how many reviewers collected data from each report, whether they worked independently, any processes for obtaining or confirming data from study investigators, and if applicable, details of automation tools used in the process.
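The classifier stage in the made-up selection example above amounts to discarding records whose predicted probability of reporting a randomized trial falls below a threshold calibrated to retain a target share of true RCTs. The sketch below illustrates only that filtering logic; the scores and the threshold value are invented, and Cochrane's actual classifier and calibration are not reproduced here:

```python
# Hypothetical sketch: filtering records by a classifier score against a
# recall-calibrated threshold. Scores and the cut-off are invented.

RECALL_99_THRESHOLD = 0.10  # assumed externally calibrated cut-off, for illustration only

records = [
    {"id": "rec-1", "rct_score": 0.92},
    {"id": "rec-2", "rct_score": 0.04},  # classed as highly unlikely to be an RCT
    {"id": "rec-3", "rct_score": 0.15},
]

# Records at or above the threshold go forward to crowd/manual screening;
# the rest are removed from further consideration.
to_screen = [r["id"] for r in records if r["rct_score"] >= RECALL_99_THRESHOLD]
print(to_screen)  # ['rec-1', 'rec-3']
```

Reporting the threshold and how it was calibrated, as the made-up example does, is what lets readers judge how much the automation step could have cost in missed studies.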
Example 1: In a review examining the effects of pharmacological interventions for promoting smoking cessation during pregnancy, the authors report using a data collection form, the number of authors collecting data from studies and the process for resolving disagreements, and indicate that study authors were contacted if any data were unclear: "We designed a data extraction form based on that used by Lumley 2009, which two review authors (RC and TC) used to extract data from eligible studies. Extracted data were compared, with any discrepancies being resolved through discussion. RC entered data into Review Manager 5 software (Review Manager 2014), double checking this for accuracy. When information regarding any of the above was unclear, we contacted authors of the reports to provide further details." (27) Example 2: In a review examining the effects of pharmacological or non-pharmacological interventions for adults with exacerbation of chronic obstructive pulmonary disease, the authors report using a standardized form that was pilot tested, and indicate that independent reviewers extracted data, which was checked by another reviewer: "We developed a standardized data extraction form to extract study characteristics...The standardized form was pilot-tested by all study team members using five randomly selected studies. Reviewers worked independently to extract study details. A third reviewer reviewed data extraction, and resolved conflicts." (19) Example 3: In a review examining the association between smoking and sickness absence, the authors report using a standardized form that was pilot tested, that one reviewer extracted data which was checked by another reviewer, and that study authors were contacted: "A data extraction sheet was developed, pilot tested on ten randomly selected included articles and then refined.
After finalizing the data extraction sheet, one reviewer performed the initial data extraction for all included articles and a second reviewer checked all proceedings…. Corresponding authors were asked for additional information in cases where data provided in the published articles were insufficient." (28) Item 10a. DATA ITEMS: List and define all outcomes for which data were sought. Specify whether all results that were compatible with each outcome domain in each study were sought (e.g. for all measures, time points, analyses), and if not, the methods used to decide which results to collect. Example 1: In a review examining the long-term effects of alcohol consumption on cognitive function, the authors list and define the outcomes for which data were sought (e.g. cognitive function), and specify the decision rules used to decide which results to collect when multiple were available in studies (e.g. when multiple measures, time points and unadjusted and adjusted analyses were available):
"Eligible outcomes were broadly categorised as follows:

Cognitive function
• Global cognitive function
• Domain-specific cognitive function (especially domains that reflect specific alcohol-related neuropathologies, such as psychomotor speed and working memory)

Clinical diagnoses of cognitive impairment
• Mild cognitive impairment (also referred to as mild neurocognitive disorders)

These conditions were 'characterised by a decline from a previously attained cognitive level'.
Major cognitive impairment (also referred to as major neurocognitive disorders; including dementia) was excluded.
We expected that definitions and diagnostic criteria would vary across studies, so we accepted a range of definitions as noted under the 'Methods of outcome assessment' section. Table 1 provides an example of specific domains of cognitive function used in the diagnosis of mild and major cognitive impairment in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).

Method of outcome measurement
Any measure of cognitive function was eligible for inclusion. The tests or diagnostic criteria used in each study should have had evidence of validity and reliability for the assessment of mild cognitive impairment, but studies were not excluded on this basis.
We anticipated that many different methods would be used to assess cognitive functioning across studies. These include the following.
Clinical diagnoses of cognitive impairment

Neuropsychological tests used to assess global cognitive function, for example the:
• Addenbrooke's Cognitive Examination-Revised (ACE-R), which "incorporates the MMSE and assesses attention, orientation, fluency, language, visuospatial function, and memory, yielding subscale scores for each domain"
• Montreal Cognitive Assessment (MOCA), which provides measures for specific cognitive abilities and may be more suitable for assessing mild cognitive impairment than the MMSE

Neuropsychological tests for assessing domain-specific cognitive function, for example, tests of:
• Attention and processing speed, for example, the Trail making test (TMT-A)

Results could be reported as an overall test score that provides a composite measure across multiple areas of cognitive ability (i.e. global cognitive function), sub-scales that provide a measure of domain-specific cognitive function or cognitive abilities (e.g. processing speed, memory), or both.

Timing of outcome assessment
Studies with a minimum follow-up of 6 months were eligible, a time frame chosen to ensure that studies were designed to examine more persistent effects of alcohol consumption. This threshold was based on previous reviews examining the association between long-term cognitive impairment and alcohol consumption (e.g. Anstey 2009 specified 12 months) and guidance from the Cochrane Dementia and Cognitive Improvement Group, which suggests a minimum follow-up of 9 months for studies examining progression from mild cognitive impairment to dementia. We deliberately specified a shorter period to ensure studies reporting important long-term effects were not missed.
No restrictions were placed on the number of points at which the outcome was measured, but the length of follow-up and number of measurement points (including a baseline measure of cognition) was considered when interpreting study findings and in deciding which outcomes were similar enough to combine for synthesis. Since long-term cognitive impairment is characterised as a decline from a previous level of cognitive function and implies a persistent effect, studies with longer-term outcome follow up at multiple time points should provide the most direct evidence.

We anticipated that individual studies would report data for multiple cognitive outcomes. Specifically, a single study may report results:
• For multiple constructs related to cognitive function, for example, global cognitive function and cognitive ability on specific domains (e.g. memory, attention, problem-solving, language);
• Using multiple methods or tools to measure the same or similar outcome, for example reporting measures of global cognitive function using both the Mini-Mental State Examination and the Montreal Cognitive Assessment;
• At multiple time points, for example, at 1, 5, and 10 years.
Where multiple cognition outcomes were reported, we selected one outcome for inclusion in analyses and for reporting the main outcomes (e.g. for GRADEing), choosing the result that provided the most complete information for analysis. Where multiple results remained, we listed all available outcomes (without results) and asked our content expert to independently rank these based on relevance to the review question, and the validity and reliability of the measures used. Measures of global cognitive function were prioritised, followed by measures of memory, then executive function.
In the circumstance where results from multiple multivariable models were presented, we extracted associations from the most fully adjusted model, except in the case where an analysis adjusted for a possible intermediary along the causal pathway (i.e. post-baseline measures of prognostic factors (e.g. smoking, drug use, hypertension))" (26) Example 2: In a review examining the effects of shockwave therapy for rotator cuff disease, the authors list and define the outcomes for which data were sought (e.g. pain, function), and specify the decision rules used to decide which results to collect when multiple were available in studies (e.g. when multiple measures, time points and unadjusted and adjusted analyses were available): "We presented the major outcomes below in the 'Summary of findings' tables.
• Participant-reported pain relief of 30% or greater.
• Mean pain score, or mean change in pain score on VAS or Numerical Rating Scale (NRS) or categorical rating scale (in that order of preference).
• Disability or function.
• Composite endpoints measuring 'success' of treatment such as participants feeling no further symptoms.
• Quality of life.
• Number of participant withdrawals, for example, due to adverse events or intolerance to treatment.
• Number of participants experiencing any adverse event.
We extracted outcome measures assessing benefits of treatment (e.g. pain, function, success, quality of life) at the time points:
• up to six weeks;
• greater than six weeks to three months (this was the primary time point);
• greater than three months to up to six months;
• greater than six months to 12 months;
• greater than 12 months.
If data were available in a trial at multiple time points within each of the above periods (e.g. at four, five and six weeks), we only extracted data at the latest possible time point of each period. We extracted adverse events, calcification resolution and treatment success at the end of the trial.
For a particular systematic review outcome there may be a multiplicity of results available in the trial reports (e.g. multiple scales, time points and analyses). To prevent selective inclusion of data based on the results, we used the following a priori defined decision rules to select data from trials.
• Where trialists reported both final values and change from baseline values for the same outcome, we extracted final values.
• Where trialists reported both unadjusted and adjusted values for the same outcome, we extracted unadjusted values.
• Where trialists reported data analysed based on the intention-to-treat (ITT) sample and another sample (e.g. per-protocol, as-treated), we extracted ITT-analysed data.
Where trials did not include a measure of overall pain but included one or more other measures of pain, for the purpose of combining data for the primary analysis of overall pain, we combined overall pain with other types of pain in the following hierarchy:
• overall or unspecified pain;
• pain at rest;
• pain with activity;
• daytime pain;
• night-time pain.
Where trials included more than one measure of disability or function, we extracted data from the one function scale that was highest on the following a priori defined list:
• Shoulder Pain And Disability Index (SPADI);
• Shoulder Disability Questionnaire (SDQ);
• Constant score;
• Disabilities of the Arm, Shoulder and Hand (DASH);
• Health Assessment Questionnaire (HAQ);
• any other function scale.
Where trials included more than one measure of treatment success, we extracted data from the one measure that was highest on the following a priori defined list:
• participant-defined measures of success, such as asking participants if treatment was successful;
• trialist-defined measures of success, such as a 30-point increase on the Constant Score." (29)

Example 3: In a review examining the effects of strategies to improve the implementation of healthy eating, physical activity and obesity prevention policies, practices or programmes within childcare services, the authors report the following decision rules to select results when multiple were available in study reports (e.g. multiple time points, multiple outcome measures, change scores versus final values): "We reported measures of treatment effect from included studies that were adjusted for potential confounding variables over reported estimates that were not adjusted for potential confounding. Where studies used multiple follow-up periods, we used data from the final (most recent) study follow-up. We included data from the primary implementation outcome in meta-analyses. In instances where the authors of included studies did not identify a primary implementation outcome, we used the outcome on which the study sample size and power calculation was based. In its absence, for studies using score-based measures of implementation, and reporting total and subscale scores, we assumed the total score represented the primary implementation outcome. Otherwise, we attempted to calculate a relative effect size for each implementation outcome measure, rank these based on effect size and used the measure reporting the median effect size to include in any pooled analysis. We calculated the effect size by subtracting the change from baseline of the primary implementation outcome for the control or comparison group from the change from baseline in the experimental or intervention group. If data to enable calculation of the change from baseline were unavailable, we used the differences between groups post-intervention. For score-based measures, we calculated a standardised ('d') measure of effect size for each outcome to rank the effect size. Where there were an even number of implementation outcomes, one of the two measures at the median was randomly selected and used for inclusion in meta-analysis." (30)
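The selection rule described in Example 3 (rank the candidate outcomes by standardised effect size and keep the one at the median, choosing one of the two middle outcomes at random when the count is even) can be sketched as follows. This is our illustration, not code from the review; the function name and the hypothetical outcome labels are ours.

```python
import random

def median_effect_outcome(effect_sizes):
    """Rank outcomes by their standardised ('d') effect size and
    return the one at the median; with an even number of outcomes,
    pick one of the two middle outcomes at random, as the review
    authors describe."""
    ranked = sorted(effect_sizes, key=effect_sizes.get)  # keys ordered by d
    mid = len(ranked) // 2
    if len(ranked) % 2 == 1:
        return ranked[mid]
    return random.choice(ranked[mid - 1:mid + 1])

# Three hypothetical implementation outcomes and their d values
print(median_effect_outcome({"menu": 0.10, "training": 0.52, "audit": 0.31}))  # → audit
```

Pre-specifying such a rule before seeing the results is what prevents selective inclusion of data based on the size of the effects.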

Example 4: In a review examining the effects of interventions for aggression and agitation in adults with dementia, the authors report how the outcome domains were selected and decision rules used to select results from among multiple measurement instruments: "Twelve dementia care partners (nurses, allied health professionals, physicians, and a caregiver) selected our study outcomes (18) by independently ranking a group of commonly reported neuropsychiatric symptoms (for example, aggression, agitation, and sleep disturbances) in descending order of importance. The care partners selected change in aggression as our main outcome and change in agitation as our secondary outcome… For all of our NMAs, we preferentially abstracted a scale (e.g. Neuropsychiatric Inventory (NPI) agitation subscale, CMAI) reported by study authors before abstracting an individual aggressive or agitated behaviour (e.g. kicking, biting, screaming). Only in the case of our NMA for the outcome of overall agitation and aggression were there cases where study authors reported more than one scale for the same outcome (e.g. NPI-agitation subscale and CMAI). The CMAI was the most commonly reported scale for the outcome of overall agitation and aggression. The NPI-agitation subscale was the second most common scale for the outcome of overall agitation and aggression. Other scales were reported much less frequently. Therefore, the CMAI was always preferentially abstracted, where reported. If the CMAI was not reported, but the NPI-agitation subscale was reported, then it was preferentially abstracted before any other scales used to report the outcome of overall agitation and aggression." (31) Item 10b. DATA ITEMS: List and define all other variables for which data were sought (e.g. participant and intervention characteristics, funding sources).
Describe any assumptions made about any missing or unclear information.
Example 1: In a review examining the long-term effects of alcohol consumption on cognitive function, the authors list and define all variables for which data were sought, including characteristics of the study design, exposure and comparator, and participants: "We extracted information relating to the characteristics of included studies and results as follows.

1. Study identifiers and characteristics of the study design
• Study references (multiple publications arising from the same study were matched to an index reference, which is the study from which results were selected for analysis or summary)
• Study or cohort name, location, and commencement date
• Study design (categorised as 'prospective cohort study', 'nested case-control study', or 'other' using the checklist of study design features developed by Reeves and colleagues)
• Funding sources and funder involvement in the study
2. Characteristics of the exposure and comparator groups
• Levels of alcohol consumption as defined in the study, including details of how consumption was measured and categorised, and information required to convert data for reporting and analysis
o Qualitative descriptors of each category, if used (e.g. never or non-drinker, abstainer, former drinker, low/moderate/heavy consumption)
o Upper and lower boundaries of each category (e.g. 1 to 29 g per day; 5.1 to 10 units per week based on a standard drink in the UK)
o Group used as referent category (comparator) in analyses and how defined
o Units of measurement (e.g. standard units of alcohol per day and definition of unit)
o Method of collecting alcohol consumption data (e.g. retrospective survey involving recall of alcohol consumption over different periods of life; intake diaries to measure current alcohol consumption); time points at which exposure data were collected
o Sample size for each exposure group at each measurement point and included in analysis; number lost to follow up [these data were used in the analysis and risk of bias assessment]
o Any additional parameters used to derive each category or exposure measure (e.g. alcohol consumption at each drinking occasion; frequency of drinking; recall period)
• Patterns of exposure
o Any additional data not listed above that characterises and quantifies different patterns of alcohol exposure (e.g. consumption on heaviest drinking day; diagnosis of an alcohol-use disorder such as dependence or harmful drinking, and the method of assessment; definition of other frequency-based categories used to characterise patterns of drinking such as occasional drinking or infrequent consumption)
o Duration/length of exposure period at study baseline and follow-up (directly reported or data that can be used to calculate)
o Age at commencement of drinking (initial exposure)

3. Characteristics of participants
• Age at baseline and follow up, sex, ethnicity, co-morbidities, socio-economic status (including education), use of licit or illicit drugs, family history of alcohol dependence
• Other characteristics of importance within the context of each study
• Eligibility criteria used in the study" (26)

Example 2: In this review examining the effects of pharmacological, psychological, and non-invasive brain stimulation interventions for treating depression after stroke, the authors list and define all variables for which data were sought, including characteristics of the report, participants, study design and intervention: "We collected data on:
• the report: author, year, and source of publication;
• the study: sample characteristics, social demography, and definition and criteria used for depression;
• the participants: stroke sequence (first ever vs recurrent), social situation, time elapsed since stroke onset, history of psychiatric illness, current neurological status, current treatment for depression, and history of coronary artery disease;
• the research design and features: sampling mechanism, treatment assignment mechanism, adherence, non-response, and length of follow up;
• the intervention: type, duration, dose, timing, and mode of delivery." (32)

Example 3: In this review examining the effects of caregiver involvement in interventions for improving children's dietary intake and physical activity behaviours, the authors report their assumption about ages of children when such information was not reported:
"When trial authors reported child grade rather than age, we assumed the following age distributions: kindergarten, four to six years; first grade, five to seven years; second grade, six to eight years; third grade, seven to nine; fourth grade, 8 to 10; fifth grade, 9 to 11; sixth grade, 10 to 12; seventh grade, 11 to 13; eighth grade, 12 to 14; ninth grade, 13 to 15; tenth grade, 14 to 16; eleventh grade, 15 to 17; and twelfth grade, 16 to 18." (33) Item 11. STUDY RISK OF BIAS ASSESSMENT: Specify the methods used to assess risk of bias in the included studies, including details of the tool(s) used, how many reviewers assessed each study and whether they worked independently, and if applicable, details of automation tools used in the process. Example 1: In a review examining the effects of altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption, the authors specify the risk of bias tool used, the domains of bias addressed by the tool, how many reviewers assessed each study and how an overall judgement was reached: "We assessed risk of bias in the included studies using the revised Cochrane 'Risk of bias' tool for randomised trials (RoB 2.0) (Higgins 2016a), employing the additional guidance for cluster-randomised and cross-over trials (Eldridge 2016; Higgins 2016b). RoB 2.0 addresses five specific domains: (1) bias arising from the randomisation process; (2) bias due to deviations from intended interventions; (3) bias due to missing outcome data; (4) bias in measurement of the outcome; and (5) bias in selection of the reported result. Two review authors independently applied the tool to each included study, and recorded supporting information and justifications for judgements of risk of bias for each domain (low; high; some concerns). 
Any discrepancies in judgements of risk of bias or justifications for judgements were resolved by discussion to reach consensus between the two review authors, with a third review author acting as an arbiter if necessary. Following guidance given for RoB 2.0 (Section 1.3.4) (Higgins 2016a), we derived an overall summary 'Risk of bias' judgement (low; some concerns; high) for each specific outcome, whereby the overall RoB for each study was determined by the highest RoB level in any of the domains that were assessed." (20)
Example 2: In a review examining the effects of red light camera interventions for reducing traffic violations and traffic crashes, the authors report the risk of bias domains they assessed, how each was rated, and how many reviewers performed assessments: "The expanded risk of bias analysis was based on six dimensions that focused on the design of the study, the analysis of the data, and the contents of the study report. These six dimensions, which conform to the requirements set forth by the UK Economic and Social Research Council (ESRC), are:
1. Selection and matching of intervention and control areas
2. Blinding of data collection and analysis
3. Pre- and post-intervention data collection periods
4. Reporting of results
5. Control of confounders
6. Control of other potential sources of bias
See Appendix G for a list of the 17 specific criteria included in each dimension. Each individual criterion statement was scored on whether it was True, False, or Unclear and these were used to assess each study on whether it presented a high, low, or unclear risk of bias across the six domains.
Risk of bias assessment was performed independently by three review authors (E.G.C., S.K., and C.P.). For the studies identified in the previous review, the same three review authors independently assessed the risk of bias of the included studies. Any discrepancies were resolved by deferment to further review authors (R.S. and P.E.). All disagreements were resolved by consensus." (34) Item 12. EFFECT MEASURES: Specify for each outcome the effect measure(s) (e.g. risk ratio, mean difference) used in the synthesis or presentation of results.
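For orientation, the two effect measures named in Item 12 can be computed as follows. This is a generic sketch under our own notation, not code from any cited review; the function names and example numbers are ours.

```python
import math

def risk_ratio(events1, n1, events2, n2):
    """Risk ratio: the event risk in group 1 divided by the risk in group 2."""
    return (events1 / n1) / (events2 / n2)

def standardised_mean_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: the difference in means divided by the pooled SD,
    used when studies measure an outcome on different scales."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

print(round(risk_ratio(30, 100, 20, 100), 2))                         # 1.5
print(round(standardised_mean_difference(12.0, 4.0, 50, 10.0, 4.0, 50), 2))  # 0.5
```

The SMD is the measure chosen in Example 1 precisely because the included trials used different measurement scales for the same construct.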
Example 1: In a review examining the effects of psychological interventions to foster resilience in healthcare students, the authors report planning to use the risk ratio for dichotomous outcomes and the standardised mean difference for continuous outcomes: "We planned to analyse dichotomous outcomes by calculating the risk ratio (RR) of a successful outcome (i.e. improvement in relevant variables) for each trial…Because the included resilience-training studies used different measurement scales to assess resilience and related constructs, we used standardised mean difference (SMD) effect sizes (Cohen's d) and their 95% confidence intervals (CIs) for continuous data in pair-wise meta-analyses." (35) Example 2: In a review comparing the effects of pars plana vitrectomy combined with scleral buckle with pars plana vitrectomy alone for giant retinal tear, the authors report using the risk ratio in the synthesis or presentation of results for dichotomous outcomes: "We estimated the risk ratio (RR) and its 95% confidence interval (CI) after surgery (pars plana vitrectomy combined with scleral buckle vs pars plana vitrectomy alone) for the following dichotomous outcomes with information obtained from the included studies.
• Second surgery for retinal reattachment.
• Development of adverse events such as retinal detachment recurrence, elevation of intraocular pressure above 21 mmHg, choroidal detachment, cystoid macular edema, macular pucker, proliferative vitreoretinopathy, progression of cataract in initially phakic eyes, and any other adverse events reported by included trials at any time from day one up to the last reported follow-up visit after surgery." (36) Example 3: In a review examining the effects of metformin for endometrial hyperplasia, the authors report using the hazard ratio or odds ratio in the synthesis or presentation of time-to-event (survival) outcomes: "For survival outcomes (e.g. regression of endometrial hyperplasia, recurrence of endometrial hyperplasia, progression to endometrial carcinoma), we planned to calculate hazard ratios if data were available. Otherwise, we would calculate rates at a set time point, using the Mantel-Haenszel odds ratio (OR) and the numbers of events in control and intervention groups." (37) Item 13a. SYNTHESIS METHODS: Describe the processes used to decide which studies were eligible for each synthesis (e.g. tabulating the study intervention characteristics and comparing against the planned groups for each synthesis). Example 1: In a review examining the effects of interventions to reduce homelessness, the authors report categorising the interventions delivered in the included studies according to four dimensions: "Given the complexity of the interventions being investigated, we attempted to categorize the included interventions along four dimensions: (1) was housing provided to the participants as part of the intervention; (2) to what degree was the tenants' residence in the provided housing dependent on, for example, sobriety, treatment attendance, etc.; (3) if housing was provided, was it segregated from the larger community, or scattered around the city; and (4) if case management services were provided as part of the intervention, to what degree of intensity. 
We created categories of interventions based on the above dimensions:
1. Case management only
2. Abstinence-contingent housing
3. Non-abstinence-contingent housing
4. Housing vouchers
5. Residential treatment with case management

Some of the interventions had multiple components (e.g. abstinence-contingent housing with case management). These interventions were categorized according to the main component (the component that the primary authors emphasized). They were also placed in separate analyses. We then organized the studies according to which comparison intervention was used (any of the above interventions, or usual services)." (38) Item 13b. SYNTHESIS METHODS: Describe any methods required to prepare the data for presentation or synthesis, such as handling of missing summary statistics, or data conversions.
Example 1: In a review examining the effects of interventions to reduce homelessness, the authors report methods used to calculate standard deviations from other statistics reported: "In cases where the means, number of participants and test statistics for the t-test were reported, but not the standard deviations, and there was the opportunity to include results in a meta-analysis, we calculated standard deviations, assuming the same standard deviation for each of the two groups (intervention and control)" (38).
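The back-calculation described in Example 1 follows from rearranging the two-sample t statistic under the equal-SD assumption. A minimal sketch (the function name and example numbers are ours, not the review authors'):

```python
import math

def sd_from_t(mean1, mean2, n1, n2, t):
    """Recover the common SD of two groups from their means, group
    sizes and the independent-samples t statistic, assuming (as the
    review authors did) the same SD in both groups:
        t = (mean1 - mean2) / (sd * sqrt(1/n1 + 1/n2))
    """
    return (mean1 - mean2) / (t * math.sqrt(1 / n1 + 1 / n2))

# e.g. means 12.0 vs 10.0, 50 participants per group, t = 2.5
print(round(sd_from_t(12.0, 10.0, 50, 50, 2.5), 3))  # 4.0
```

The recovered SD can then be entered into the meta-analysis alongside the reported means and sample sizes.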

Example 2: In a review examining the effects of interventions to reduce homelessness, the authors report methods used to combine intervention groups of multi-arm studies:
"Where we were interested in an intervention and it was compared to two or more comparison interventions that were both considered to be within the realm of "usual services", we combined the two comparison arms into one comparison group and compared the means of the combined control groups to the intervention for a given outcome (for Morse 1992). In one study we have combined two intervention arms that both employed slightly differing versions of an intervention (assertive community treatment) into one intervention group and compared that to the usual services comparison condition (for Morse 1997)" (38).
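Combining two arms into a single group, as in Example 2, can be done with the standard formulae for pooled sample size, mean and SD. This sketch mirrors the Cochrane Handbook formulae for combining groups rather than code from the review, and the example numbers are hypothetical:

```python
import math

def combine_arms(n1, mean1, sd1, n2, mean2, sd2):
    """Pool two study arms into one group: combined N, mean and SD
    reconstructed as if the raw data of both arms were merged."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
           + (n1 * n2 / n) * (mean1 - mean2) ** 2) / (n - 1)
    return n, mean, math.sqrt(var)

# Two "usual services" comparison arms merged into one comparison group
n, mean, sd = combine_arms(40, 8.0, 3.0, 60, 10.0, 3.5)  # pooled N = 100, mean = 9.2
```

Note the cross-term involving the difference in means: the combined SD reflects between-arm as well as within-arm variability.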
Example 3: In a review examining the effects of food fortification with multiple micronutrients on health outcomes in the general population, the authors report estimating and imputing intra-cluster correlation coefficients for cluster-randomised trials: "We used cluster-adjusted estimates from cluster randomised controlled trials (c-RCTs) where available. If the studies had not adjusted for clustering, we attempted to adjust their standard errors using the methods described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2019), using an estimate of the intra-cluster correlation coefficient (ICC) derived from the trial. If the trial did not report the cluster-adjusted estimate or the ICC, we imputed an ICC from a similar study included in the review, adjusting if the nature or size of the clusters was different (e.g. households compared to classrooms). We assessed any imputed ICCs using sensitivity analysis." (39)
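The standard-error adjustment described in Example 3 inflates the unadjusted SE by the square root of the design effect, 1 + (m − 1) × ICC, where m is the average cluster size. A sketch under that Handbook approach; the parameter values are hypothetical:

```python
import math

def cluster_adjusted_se(se, avg_cluster_size, icc):
    """Inflate the SE from an analysis that ignored clustering by
    sqrt(design effect), where design effect = 1 + (m - 1) * ICC."""
    design_effect = 1 + (avg_cluster_size - 1) * icc
    return se * math.sqrt(design_effect)

# e.g. unadjusted SE 0.10, average cluster size 20, assumed ICC 0.05
print(round(cluster_adjusted_se(0.10, 20, 0.05), 4))  # 0.1396
```

Because an imputed ICC is only a guess, the review authors sensibly probed it in sensitivity analyses.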

Example 4: In a review examining the effects of manually-generated reminders delivered on paper on professional practice and patient outcomes, the authors report standardising the direction of effects across studies: "Some studies targeted quality problems that involve 'underuse', so that improvements in quality correspond to increases in the percentage of patients who receive a target process of care (for example, increasing the percentage of patients who receive the influenza vaccine). However, other studies targeted 'overuse', so that improvements correspond to reductions in the percentage of patients receiving inappropriate or unnecessary processes of care (for example, reducing the percentage of patients who receive antibiotics for viral upper respiratory tract infections). In order to standardise the direction of effects, we defined all process adherence outcomes so that higher values represented an improvement. For example, data from a study aimed at reducing the percentage of patients receiving inappropriate medications would be captured as the complementary percentage of patients who did not receive inappropriate medications. Increasing this percentage of patients for whom providers did not prescribe the medications would thus represent an improvement. Each outcome can then be interpreted as compliance with desired practice." (40) Item 13c. SYNTHESIS METHODS: Describe any methods used to tabulate or visually display results of individual studies and syntheses.
Example 1: In a review examining the effects of interventions to reduce ambient particulate matter air pollution on health, the authors report their chosen plot, along with a rationale: "… in line with the review protocol we synthesized evidence narratively as well as graphically using harvest plots. Harvest plots have been shown to be an effective, clear and transparent way to summarize evidence of effectiveness for complex interventions (Ogilvie 2008; Turley 2013). We created eight separate harvest plots, one for health outcomes and one for air quality outcomes for each intervention category." (41) Example 2: In a review examining the effects of transfers and vouchers on the use and quality of maternity care services, the authors report using albatross plots to present results of individual studies, along with a rationale: "Meta-analyses could not be undertaken due to the heterogeneity of interventions, settings, study designs and outcome measures. Albatross plots were created to provide a graphical overview of the data for interventions with more than five data points for an outcome. Albatross plots are a scatter plot of p-values against the total number of individuals in each study. Small p-values from negative associations appear at the left of the plot, small p-values from positive associations at the right, and studies with null results towards the middle. The plot allows p-values to be interpreted in the context of the study sample size; effect contours show a standardised effect size (expressed as relative risk, RR) for a given p-value and study size, providing an indication of the overall magnitude of any association. We estimated an overall magnitude of association from these contours, but this should be interpreted cautiously." 
(42) Example 3: In a review examining the effects of altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption, the authors describe using 'Summary of findings' tables to present the synthesis results: "We developed 'Summary of findings' tables using GRADEpro GDT. These tables comprise summaries of the estimated intervention effect and the number of participants and studies for each primary outcome, and include justifications underpinning GRADE assessments. We planned to present separate summary effect sizes and certainty of evidence ratings for food, alcohol, and tobacco products, and for availability and proximity interventions within each of these product types, but in practice no eligible alcohol or tobacco studies were identified. Results of random-effects meta-analyses are presented as SMDs with 95% CIs.
To facilitate interpretation of these estimated effect sizes, we re-expressed them employing selected familiar metrics of selection or consumption using observational data from a population-representative sample." (20) Item 13d. SYNTHESIS METHODS: Describe any methods used to synthesize results and provide a rationale for the choice(s). If meta-analysis was performed, describe the model(s), method(s) to identify the presence and extent of statistical heterogeneity, and software package(s) used.
Example 1: In a review examining the effects of functional appliance treatment on the temporomandibular joint, the authors report their chosen meta-analysis model, along with a rationale, the between-study variance estimator used, methods used to quantify statistical heterogeneity, and the software packages used: "As the effects of functional appliance treatment were deemed to be highly variable according to patient age, sex, individual maturation of the maxillofacial structures, and appliance characteristics, a random-effects model was chosen to calculate the average distribution of treatment effects that can be expected. A restricted maximum likelihood random-effects variance estimator was used instead of the older DerSimonian-Laird one, following recent guidance. Random effects 95% predictions were to be calculated for meta-analyses with at least three studies to aid in their interpretation by quantifying expected treatment effects in a future clinical setting. The extent and impact of between-study heterogeneity were assessed by inspecting the forest plots and by calculating the tau-squared and the I-squared statistics, respectively. The 95% CIs (uncertainty intervals) around tau-squared and the I-squared were calculated to judge our confidence about these metrics. We arbitrarily adopted the I-squared thresholds of > 75% to be considered as signs of considerable heterogeneity, but we also judged the evidence for this heterogeneity (through the uncertainty intervals) and the localization on the forest plot…All analyses were run in Stata SE 14.0 (StataCorp, College Station, TX) by one author." (43)
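The heterogeneity statistics named in this example can be computed directly from study-level estimates and their variances. As an illustrative sketch (using the classical DerSimonian-Laird estimator rather than the REML estimator the quoted authors preferred, and with invented study data):

```python
import numpy as np

def dersimonian_laird(yi, vi):
    """DerSimonian-Laird estimates of tau^2 (between-study variance) and
    I^2 (% of total variability beyond chance) from study effects yi
    with within-study variances vi."""
    yi, vi = np.asarray(yi, float), np.asarray(vi, float)
    wi = 1.0 / vi                                # inverse-variance weights
    y_fixed = np.sum(wi * yi) / np.sum(wi)       # fixed-effect pooled estimate
    q = np.sum(wi * (yi - y_fixed) ** 2)         # Cochran's Q
    df = len(yi) - 1
    c = np.sum(wi) - np.sum(wi ** 2) / np.sum(wi)
    tau2 = max(0.0, (q - df) / c)                # truncated at zero
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return tau2, i2

# Hypothetical study effect estimates (e.g. mean differences) and variances
tau2, i2 = dersimonian_laird([0.2, 0.5, 0.8, 0.1], [0.04, 0.05, 0.03, 0.06])
```

REML has no closed form (it is fitted iteratively, e.g. by Stata's `meta` suite or R's metafor), which is why the simpler moment estimator is shown here.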

Example 2: In a review examining the effects of individual-level behavioural smoking cessation interventions tailored for disadvantaged socioeconomic position, the authors report their chosen meta-analysis model, along with a rationale, the between-study variance estimator used, methods used to quantify statistical heterogeneity, and the software packages used:
"Diverse interventions, settings, and participants characterise the field of smoking cessation. We judged it likely that the included studies would show heterogeneity in treatment effect (the observed intervention effects being more different from each other than one would expect because of random error alone). As such, the assumptions of a fixed-effect meta-analysis (that all studies in the meta-analysis share a common overall effect size and that all factors that could influence the effect size are the same across studies), were unlikely to hold…In random-effects meta-analysis models (restricted maximum-likelihood method), we calculated pooled risk ratios (RRs) with 95% confidence intervals (CIs) for both socioeconomic-position-tailored and non-socioeconomic-position-tailored interventions as the weighted average of each individual study's estimated intervention effect. All computations were done on a log scale with the log RR, its variance, and standard error (SE), before exponentiating the summary effect for interpretation. We explored heterogeneity by observation of forest plots and use of the χ² test to show whether observed differences in results were compatible with chance alone. We calculated I² statistics to examine the level of inconsistency across study findings…Analyses were done in the RStudio development environment version 1.1.463 using R version 3.5.2 and the metafor package." (44)
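The log-scale pooling this example describes can be sketched in a few lines. This is a hedged illustration with hypothetical 2×2 trial counts, and it uses a fixed-effect (inverse-variance) pool for brevity; the quoted review fitted a random-effects model, which would add the between-study variance tau² to each study's variance before weighting:

```python
import math

def pooled_rr(studies):
    """Fixed-effect (inverse-variance) pooling of log risk ratios.
    Each study is (events_treat, n_treat, events_ctrl, n_ctrl); returns the
    pooled RR and its 95% CI, exponentiated back from the log scale."""
    num = den = 0.0
    for a, n1, c, n2 in studies:
        log_rr = math.log((a / n1) / (c / n2))
        var = 1 / a - 1 / n1 + 1 / c - 1 / n2    # variance of the log RR
        w = 1.0 / var
        num += w * log_rr
        den += w
    log_pooled, se = num / den, math.sqrt(1.0 / den)
    return (math.exp(log_pooled),
            math.exp(log_pooled - 1.96 * se),
            math.exp(log_pooled + 1.96 * se))

# Hypothetical trials (events and totals in intervention vs control arms)
rr, lo, hi = pooled_rr([(30, 100, 20, 100), (45, 150, 30, 150), (12, 80, 10, 80)])
```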

Example 3: In a review examining the effects of manually-generated reminders delivered on paper on professional practice and patient outcomes, the authors report calculating the median effect estimate and interquartile range across all included studies:
"We based our primary analyses upon consideration of dichotomous process adherence measures (for example, the proportion of patients managed according to evidence-based recommendations). In order to provide a quantitative assessment of the effects associated with reminders without resorting to numerous assumptions or conveying a misleading degree of confidence in the results, we used the median improvement in dichotomous process adherence measures across studies…With each study represented by a single median outcome, we calculated the median effect size and interquartile range across all included studies for that comparison." (40)
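The median-and-interquartile-range summary used in this example needs no meta-analytic machinery; a minimal sketch with hypothetical per-study values:

```python
import numpy as np

# Hypothetical median improvement in dichotomous process adherence
# (percentage points), one value per included study
improvements = np.array([2.0, 4.5, 6.1, 7.0, 9.8, 12.4, 15.0])

median_effect = np.median(improvements)
iqr = np.percentile(improvements, 75) - np.percentile(improvements, 25)
```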

Example 4: In a review examining the effects of homeopathy, the authors report combining P values:
"The statistical approach used, therefore, was the combination of the significance levels (P values). The rationale for this choice is that all the trials explored the same broad question, i.e. "is homeopathic treatment efficacious?", even if, for individual trials, the question asked expressed more specific terms and focused on a given treatment of a particular disease. Thus, unlike in the conventional meta-analytical methods, the method used does not involve pooling the numerical estimates of treatment effect sizes obtained, in our case, in very different situations. Using this approach, the null hypothesis tested is that the effect of interest (in this case, the efficacy of homeopathic treatment) is not present in any of the trials considered. If the null hypothesis is rejected, we can conclude that in at least one trial there is a non-null effect…Thus, we used seven methods: the sum of logs, the sum of Z, the weighted sum of Z, the sum of t, the mean Z, the mean P, the count test and the logit procedure. We present the results obtained with the method that gave the most conservative (least optimistic) results. A two-sided approach was adopted because of the format of the tested hypothesis (i.e. the effect could be either "negative" or "positive")." (45) Item 13e. SYNTHESIS METHODS: Describe any methods used to explore possible causes of heterogeneity among study results (e.g. subgroup analysis, meta-regression).
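Returning to the homeopathy example under Item 13d above: several of the p-value combination methods it lists correspond to standard procedures — the "sum of logs" is Fisher's method and the "sum of Z" is Stouffer's method, both available in SciPy. A minimal sketch with invented one-sided p-values:

```python
from scipy.stats import combine_pvalues

# Hypothetical one-sided p-values from independent trials
pvals = [0.04, 0.20, 0.11, 0.35, 0.07]

# "Sum of logs" = Fisher's method: -2 * sum(ln p) ~ chi-squared with 2k df
chi2_stat, p_fisher = combine_pvalues(pvals, method='fisher')

# "Sum of Z" = Stouffer's method: sum of normal quantiles / sqrt(k)
z_stat, p_stouffer = combine_pvalues(pvals, method='stouffer')

# Rejecting the combined null implies a non-null effect in at least one trial,
# mirroring the hypothesis described in the quoted review.
```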

Example 1: In a review examining the effects of individual-level behavioural smoking cessation interventions tailored for
disadvantaged socioeconomic position, the authors report conducting meta-regression to explore possible causes of heterogeneity among study results, indicating the potential effect modifiers considered and how they were defined, and noted that these were pre-specified: "Given a sufficient number of trials, we used unadjusted and adjusted mixed-effects meta-regression analyses to assess whether variation among studies in smoking cessation effect size was moderated by tailoring of the intervention for disadvantaged groups. The resulting regression coefficient indicates how the outcome variable (log risk ratio (RR) for smoking cessation) changes when interventions take a socioeconomic-position-tailored versus non-socioeconomic-tailored approach. A statistically significant (p<0·05) coefficient indicates that there is a linear association between the effect estimate for smoking cessation and the explanatory variable. More moderators (study-level variables) can be included in the model, which might account for part of the heterogeneity in the true effects. We pre-planned an adjusted model to include important study covariates related to the intensity and delivery of the intervention (number of sessions delivered (above median vs below median), whether interventions involved a trained smoking cessation specialist (yes vs no), and use of pharmacotherapy in the intervention group (yes vs no). These covariates were included a priori as potential confounders given that programmes tailored to socioeconomic position might include more intervention sessions or components or be delivered by different professionals with varying experience. The regression coefficient estimates how the intervention effect in the socioeconomic-position-tailored subgroup differs from the reference group of non-socioeconomic-position-tailored interventions." 
(44) Example 2: In a review examining the effects of intensive LDL cholesterol-lowering treatment for the prevention of major vascular events, the authors report conducting subgroup analyses and meta-regression to explore possible causes of heterogeneity among study results, indicating the potential effect modifiers considered and how they were defined: "First, we assessed the association between absolute LDL cholesterol reduction (calculated as a difference in baseline minus last-measured achieved LDL cholesterol between the treatment groups) and the relative risk (RR) of major vascular events for statins, ezetimibe, and PCSK9 inhibitors. Second, we did analyses to establish the effect of a reduction of 1 mmol/L in LDL cholesterol on the RR of major vascular events, stratified into four groups with mean baseline LDL cholesterol concentrations of 2.60 mmol/L or less, 2.61-3.40 mmol/L, 3.41-4.10 mmol/L, and more than 4.10 mmol/L (the recommended LDL cholesterol thresholds for treatment initiation). Subgroups of trials that reported outcomes of patients with baseline LDL cholesterol less than 2.07 mmol/L (80 mg/dL) were also analysed to most closely approximate a mean baseline LDL cholesterol of 1.80 mmol/L (70 mg/dL; the LDL cholesterol threshold for treatment in high-risk patients in the ACC/AHA and CCS guidelines). Subgroups of trials that reported outcomes of patients by sex, presence or absence of diabetes, presence or absence of chronic kidney disease (defined as estimated glomerular filtration rate <60 mL/min per 1.73 m²), and presence or absence of heart failure were also meta-analysed. Meta-regression analyses were done with the following covariates: baseline LDL cholesterol, extent of LDL cholesterol reduction, mean age, 10-year risk of atherosclerotic cardiovascular disease, and median duration of follow-up. Non-standardised and standardised analyses were done for each 1 mmol/L reduction in LDL cholesterol.
We used a multivariable model including the same covariates and drug class... Heterogeneity of RRs was assessed using I², and Cochran's Q statistic was used to test for differences between subgroups." (46) Example 3: In a review examining the effects of psychological interventions for common mental disorders in women experiencing intimate partner violence in low-income and middle-income countries, the authors report conducting post-hoc subgroup analyses to explore possible causes of heterogeneity among study results, indicating the potential effect modifiers considered and how they were defined: "We anticipated that, in settings where intimate partner violence was sufficiently prevalent to be measured, female therapists might have been considered more culturally acceptable to female participants. We did post-hoc subgroup analyses to compare differences in standardised mean differences (dSMDs) of trauma-focused interventions versus generic psychological interventions, female-delivered interventions versus mixed gender-delivered interventions, novel treatments for low and middle income countries (LMICs) versus those with an established evidence base in high-income countries, and those asking only about recent (within the past 12 months) intimate partner violence versus lifetime intimate partner violence." (5) Item 13f. SYNTHESIS METHODS: Describe any sensitivity analyses conducted to assess robustness of the synthesized results.

Example 1: In a review examining the effects of vitamin D supplementation during pregnancy, the authors report conducting several sensitivity analyses:
"We conducted sensitivity meta-analyses restricted to trials with recent publication (2000 or later); overall low risk of bias (low risk of bias in all seven criteria); and enrolment of generally healthy women (rather than those with a specific clinical diagnosis).
To incorporate trials with zero events in both intervention and control arms (which are automatically dropped from analyses of pooled relative risks), we also did sensitivity analyses for dichotomous outcomes in which we added a continuity correction of 0.5 to zero cells." (47)
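The 0.5 continuity correction described above is straightforward to apply; a minimal sketch (assuming the common convention of adding the correction to every cell of the 2×2 table, with invented trial counts):

```python
def rr_with_continuity(events_t, n_t, events_c, n_c, cc=0.5):
    """Risk ratio with a continuity correction applied when either arm has
    zero events (such trials otherwise drop out of a pooled RR analysis).
    Adds cc to every cell of the 2x2 table, hence 2*cc to each arm total."""
    if events_t == 0 or events_c == 0:
        events_t, events_c = events_t + cc, events_c + cc
        n_t, n_c = n_t + 2 * cc, n_c + 2 * cc
    return (events_t / n_t) / (events_c / n_c)

# Hypothetical trial with zero control-arm events: (3.5/101) / (0.5/101) = 7.0
rr = rr_with_continuity(3, 100, 0, 100)
```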

Example 2: In a review examining the effects of omega-3, omega-6, and total dietary polyunsaturated fat for prevention and treatment of type 2 diabetes mellitus, the authors report conducting two sensitivity analyses and indicate that these were requested by the funders of the review:
"At the request of the funders, we did an additional sensitivity analysis with respect to compliance. Our protocol stated an intention to subgroup by "recent publications"; we changed this to run a sensitivity analysis including publications before 2010 combined with all publications from 2010 onwards with a trials registry entry (even if published retrospectively). As our funders were particularly interested in effects within trials of at least 12 months, we also ran an analysis limiting to trials of at least 52 weeks' duration." (48) Item 14. REPORTING BIAS ASSESSMENT: Describe any methods used to assess risk of bias due to missing results in a synthesis (arising from reporting biases).
Example 1: In a review examining the effects of surgery for rotator cuff tears, the authors report using funnel plots to assess small-study effects, noting that publication bias is one of several reasons for any asymmetry detected. They also report comparing outcomes specified within and across reports of studies to assess outcome reporting bias: "To assess small-study effects, we planned to generate funnel plots for meta-analyses including at least 10 trials of varying size. If asymmetry in the funnel plot was detected, we planned to review the characteristics of the trials to assess whether the asymmetry was likely due to publication bias or other factors such as methodological or clinical heterogeneity of the trials. To assess outcome reporting bias, we compared the outcomes specified in trial protocols with the outcomes reported in the corresponding trial publications; if trial protocols were unavailable, we compared the outcomes reported in the methods and results sections of the trial publications." (49)

Example 2: In a review examining the effects of interventions for seizures in catamenial (menstrual-related) epilepsy, the authors report assessing selective reporting bias in the included studies by comparing pre-specified with reported outcomes:
"To assess selective reporting bias, we compared the measurements and outcomes planned by the original investigators during the trial with those reported within the published paper by checking the trial protocols (when available) against the information in the final publication. Where published protocols were not available and the trial authors did not provide an unpublished protocol upon request, we compared the methods and the results sections of the published papers. We also used our knowledge of the clinical area to identify where trial investigators had not reported commonly used outcome measures." (50) Example 3: In a review examining the association between quality of dietary fat and genetic risk of type 2 diabetes, the authors report using contour-enhanced funnel plots and a statistical test for funnel plot asymmetry to assess small-study effects, noting that publication bias is one of several reasons for any asymmetry detected: "Small study effects owing to potential publication bias, poor methodological quality in smaller studies, artefactual associations, true heterogeneity, or chance were evaluated by using contour-enhanced funnel plots alongside visual examination and statistical tests for asymmetry (Debray's test)." (51)
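Statistical tests for funnel-plot asymmetry generally regress study effects against study precision. As a sketch of the widely used Egger regression test (the quoted review used Debray's test, a related asymmetry test; study data here are invented):

```python
import numpy as np
from scipy import stats

def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry: regress each
    study's standardized effect (effect/SE) on its precision (1/SE).
    An intercept far from zero suggests small-study effects."""
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    res = stats.linregress(1.0 / ses, effects / ses)
    return res.intercept, res.intercept_stderr

# Hypothetical effects and standard errors (smaller studies, larger effects)
intercept, se_int = egger_test([0.90, 0.60, 0.45, 0.30, 0.25],
                               [0.40, 0.28, 0.22, 0.15, 0.12])
```

A t-test of the intercept against zero (with k − 2 degrees of freedom) gives the usual p-value; as the quoted reviews note, asymmetry can reflect publication bias but also heterogeneity, chance, or methodological differences.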

Example 4: In a review examining the effects of testosterone treatment on depressive symptoms in men, the authors report using contour-enhanced funnel plots and adjusting the meta-analytic result to assess small-study effects, noting that publication bias is one of several reasons for any asymmetry detected:
"Small-study effects (e.g. publication bias) were checked by contour-enhanced funnel plots and adjusted for by obtaining a precision-effect estimate with standard error. Although precision-effect estimate with standard error tends to slightly underestimate the true association if the observed effects were generated by questionable research practices, simulations suggest that it provides the most precise estimates in the presence of residual effect heterogeneity and small-study effects." (52) Item 15. CERTAINTY ASSESSMENT: Describe any methods used to assess certainty (or confidence) in the body of evidence for an outcome.
Example 1: In a review examining the effects of Tai Chi for rheumatoid arthritis, the authors report using the GRADE approach for assessing certainty in the body of evidence, stating how many reviewers performed assessments, the domains considered, and software used to perform assessments: "Two people (AM, JS) independently assessed the certainty of the evidence. We used the five GRADE considerations (study limitations, consistency of effect, imprecision, indirectness, and publication bias) to assess the certainty of the body of evidence as it related to the studies that contributed data to the meta-analyses for the prespecified outcomes. We assessed the certainty of evidence as high, moderate, low, or very low. We considered the following criteria for upgrading the certainty of evidence, if appropriate: large effect, dose-response gradient, and plausible confounding effect. We used the methods and recommendations described in sections 8.5 and 8.7, and chapters 11 and 12, of the Cochrane Handbook for Systematic Reviews of Interventions. We used GRADEpro GDT software to prepare the 'Summary of findings' tables (GRADEpro GDT 2015). We justified all decisions to down- or up-grade the certainty of studies using footnotes, and we provided comments to aid the reader's understanding of the results where necessary." (53) Example 2: In a review examining the effects of implantable cardiac defibrillators for people with non-ischaemic cardiomyopathy, the authors report using standardised language to convey their certainty in the body of evidence for an outcome: "We reported our findings using the language suggested by Glenton and colleagues, focusing on the size of the effect and its clinical significance in relation to the certainty of the evidence on which the result is based (including the precision of the effect) (see Appendix 3)." (54)

Example 3: In a review examining the effects of pharmacotherapy for the treatment of cannabis use disorder, the authors report assessing certainty in the body of evidence using the system developed by the Evidence-based Practice Center (EPC) program (established by the US Agency for Healthcare Research and Quality):
"We classified the overall strength of evidence (SOE) for each outcome as high, moderate, low, or insufficient by using an established method that considers study quality, consistency of findings, directness of the comparisons, precision, and applicability (Berkman et al.). For findings with SOE greater than insufficient, we classified the direction of effect as "evidence of benefit," "no benefit" (that is, no difference from placebo or mixed findings), or "favors placebo."" (55) Item 16a. STUDY SELECTION: Describe the results of the search and selection process, from the number of records identified in the search to the number of studies included in the review, ideally using a flow diagram.
Example 1: In a review examining the effects of text message reminders for improving sun protection habits, the authors report results of the search and selection process in text and in a flow diagram: "We found 1,333 records in databases searching. After duplicates removal, we screened 1,092 records, from which we reviewed 34 full-text documents, and finally included six papers [cited]. Later, we searched documents that cited any of the initially included studies as well as the references of the initially included studies. However, no extra articles that fulfilled inclusion criteria were found in these searches (Fig 1)." (56)

Example 2: In a review examining the effects of pharmacological interventions for heart failure in people with chronic kidney disease, the authors report results of the search and selection process in text and in a flow diagram; the diagram includes an additional box specifying the number of studies and participants contributing to each synthesis:
"Our search of the Cochrane Kidney and Transplant specialised register identified 869 records. We identified an additional 78 records using other sources (reference lists of review articles, relevant studies, and clinical practice guidelines); therefore a total of 947 records (n=176 studies) were identified. We excluded 61 studies (n=252 records), either due to a population other than heart failure (n=38 studies), a non-pharmacological intervention (n=5), follow-up shorter than three months (n=16), or a study design other than a RCT (n=2) (see Characteristics of excluded studies). Overall, 115 studies were eligible. Of these, three are ongoing and awaiting publication of primary data (PARAGON-HF 2018; RELAXAHF-2 2017; TMAC 2007) and will be included in a future update of this review. As a result, 112 studies were included in this review (Figure 1)." (57)

Example 3: In an updated review examining the effects of non-menthol flavours in e-cigarettes on perceptions and use, the authors report results of the search and selection process in text and in a flow diagram; the diagram shows the flow of records in the original and updated searches:
"A total of 3191 articles resulted from searching the four databases during the initial search (21 March 2018). After authors removed duplicates, 2822 articles remained for title and abstract review, including 14 articles identified through manual search of references. Two authors (CM and HMB) reviewed the titles and abstracts of all 2822 articles. A third author (SK) resolved any discrepancies. Following this step, two authors (CM and HMB) reviewed the full text of all 114 articles eligible for full-text screening. A third author (SK) resolved any discrepancies. Eighty articles were excluded for the following reasons: they did not have data on the specified outcomes (n=27), used qualitative methodologies (n=27), focused on a tobacco product other than e-cigarettes (n=12), were only focused on menthol flavour (n=2), was a duplicate (n=1) or were not peer-reviewed, did not include original data, did not include full-text or included only a conference abstract (n=11). Articles that addressed e-cigarettes from the original systematic review (n=17) were then added to the 34 articles identified from this current review, combining for a total of 51 articles included in the final analysis. The study selection processes, which approximate but do not exactly follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, are illustrated in figure 1." (58) Example 4: In a review examining the effects of gabapentin for neuropathic pain, the authors present a flow diagram delineating the number of records according to the information source from which they originated (i.e. databases, trials registers, or other "hidden" sources): See page 52 of Supplement B of the systematic review by Mayo-Wilson et al. (59), available at https://ars.els-cdn.com/content/image/1-s2.0-S0895435617307217-mmc1.pdf Item 16b. STUDY SELECTION: Cite studies that might appear to meet the inclusion criteria, but which were excluded, and explain why they were excluded.

Example 1: In a review examining the effects of drug-eluting balloon angioplasty versus uncoated balloon angioplasty for the treatment of in-stent restenosis of the femoropopliteal arteries, the authors cite excluded studies and provide reasons for their exclusion in a table.
"We excluded seven studies from our review (Bosiers 2015; ConSeQuent; DEBATE-ISR; EXCITE ISR; NCT00481780; NCT02832024; RELINE), and we listed reasons for exclusion in the Characteristics of excluded studies tables. We excluded studies because they compared stenting in Bosiers 2015 and RELINE, laser atherectomy in EXCITE ISR, or cutting balloon angioplasty in NCT00481780 versus uncoated balloon angioplasty for in-stent restenosis. The ConSeQuent trial compared DEB versus uncoated balloon angioplasty for native vessel restenosis rather than in-stent restenosis. The DEBATE-ISR study compared a prospective cohort of patients receiving DEB therapy for in-stent restenosis against a historical cohort of diabetic patients. Finally, the NCT02832024 study compared stent deployment versus atherectomy versus uncoated balloon angioplasty alone for in-stent restenosis." (60)

Example 2: In a review examining the effects of organised cervical cancer screening on cervical cancer mortality in Europe, the authors present citations of excluded studies and a table providing reasons for their exclusion, in the appendix:
"Of the remaining 64 articles, 54 were excluded for a variety of other reasons (Fig. 1). Ultimately, this review included a total of ten studies. All included studies were present in the initial database search. Excluded articles are listed in the appendix." Item 17. STUDY CHARACTERISTICS: Cite each included study and present its characteristics.
"Of the 12 unique studies, three were prospective cohort studies, 15 18 22 three were case-control studies, 20 25 26 and six were cross-sectional studies 14 16 17 23 24 39 (table 1)." (62) Example 2: In a review examining the effects of antenatal corticosteroids for maturity of term or near term foetuses, the authors include a table presenting for each included study the citation, study location, study design, number of centres, duration of the study, percentage of participants lost to follow-up, number of participants, inclusion criteria, specific drug delivered and its dosage, the control, gestational age of participants at randomization, and how outcomes were defined: "Table 1 shows the characteristics of the included clinical trials." (63)

Example 3: In a review examining the effects of interventions to facilitate shared decision making to address antibiotic use for acute respiratory infections in primary care, the authors include a table presenting the components of interventions delivered in the included studies (e.g. materials delivered, who provided, how it was delivered):
"A summary of the main intervention components is described using the items from the Template for Intervention Description and Replication (TIDieR) checklist (see Table 1)." (64) Item 18. RISK OF BIAS IN STUDIES: Present assessments of risk of bias for each included study.

Example 1: In a review examining the effects of altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption, the authors present a table indicating the domain-specific and overall risk of bias judgement for each study result, and include justification for assessments in a file uploaded to a public repository:
"We used the RoB 2.0 tool to assess risk of bias for each of the included studies. A summary of these assessments is provided in Table 1. In terms of overall risk of bias, there were concerns about risk of bias for the majority of studies (20/24), with two of these assessed as at high risk of bias (Musher-Eizenman 2010; Wansink 2013a). A text summary is provided below for each of the six individual components of the 'Risk of bias' assessment. Justifications for assessments are available at the following (http://dx.doi.org/10.6084/m9.figshare.9159824)" (20)

Example 2: In a review examining the effects of intensity-modulated radiation therapy in curative-intent management of head and neck squamous cell carcinoma, the authors present a forest plot displaying risk of bias judgements for each study (represented as traffic lights):
"Fig 2. Forest plot (including the risk of bias assessment) demonstrating significant reduction in the risk of acute grade 2 or worse xerostomia with intensity modulated radiation therapy (IMRT) compared to conventional techniques. Note comparable benefit of IMRT over two-dimensional radiotherapy (2D-RT) and three-dimensional radiotherapy (3D-RT) on subgroup analyses."

Item 19. RESULTS OF INDIVIDUAL STUDIES: For all outcomes, present, for each study, summary statistics for each group (where appropriate) and an effect estimate and its precision (e.g. confidence/credible interval), ideally using structured tables or plots.

Example 2: In a review examining the blood pressure lowering efficacy of renin inhibitors for primary hypertension, the authors present forest plots showing summary statistics for each group and effect estimates with confidence intervals for each study, and indicate in a footnote the origin of particular data:
See meta-analysis in Figure 3. The footnote to this forest plot states: "CSPP100A2308 study: the SBP reduction in the treatment and placebo group are reported from the CSR page 61. CSPP100A2405 study: the SD for all treatment groups are calculated from SEM reported on page 7 in the CSR". (67) Item 20a. RESULTS OF SYNTHESES: For each synthesis, briefly summarise the characteristics and risk of bias among contributing studies.

Example 1: In a review examining the effects of breastfeeding programs and policies on maternal health outcomes in developed countries, the authors summarise various characteristics of the studies contributing to a synthesis of the effects of Baby-Friendly Hospital Initiative interventions:
"Twelve included studies (described in 13 publications) assessed the effectiveness of Baby-Friendly Hospital Initiative interventions. All focused on postpartum women enrolled from hospital wards or birth facilities soon after delivery. Studies were conducted in diverse country settings including the United States (two studies); Taiwan (two studies); and one each in the Republic of Belarus, Hong Kong, Czech Republic, Russia, Croatia, Brazil, United Kingdom (multiple regions), and Scotland. All studies focused on multiple hospitals (>4) or clusters of hospitals. The majority of studies focused on women giving birth between 2000 and 2009; two enrolled women in the late 1990s…One included study was an RCT, 10 were prospective cohort studies, and 1 was a single-group pre-post study…In terms of population characteristics, seven studies reported on maternal age and generally enrolled women in their 20s and 30s. Three studies (set in the United States and United Kingdom) reported on race; the percentage of nonwhite participants enrolled ranged from 3 to 47 percent. In the six studies reporting on the percentage of enrolled women who were primiparous, the range was 38 to 67 percent" (68) Example 2: In a review examining tools for determining stroke risk in patients with nonvalvular atrial fibrillation, the authors summarise various characteristics and the risk of bias of the studies contributing to the synthesis: "Overall, 61 studies described in 83 publications investigated our included tools for determining stroke risk in patients with nonvalvular AF and met the other inclusion criteria for Key Question 1. The included studies explored tools in studies of diverse quality, design, funding, and geographical location. Forty-three included studies were of good quality or rated as low risk of bias, 11 of fair quality or rated as medium risk of bias, and 7 were of poor quality or rated as high risk of bias. Studies with increased risk of bias had potential limitations related to handling of missing data, length of follow up between groups, blinding of outcomes assessors, whether confounders were assessed with reliable measures, and whether potential outcomes were prespecified.
The studies covered broad geographical locations with 32 studies conducted in UK or continental Europe, 18 exclusively in the United States, 3 studies exclusively conducted in Canada, and 7 multinational trials. There was one study that did not report geographic location of enrolment. Ten studies were supported solely by industry, 8 studies received solely government support, 6 studies were supported by non-government, non-industry organizations, 15 studies received funding from multiple sources including government, industry, non-government and non-industry, and 22 studies did not report funding or it was unclear. We identified 52 studies using observational study design (prospective and retrospective cohorts) while 9 studies were identified as randomized controlled trials (RCTs)." (69) Example 3: In a review examining the effects of antipsychotics for the prevention and treatment of delirium, the authors summarise various characteristics and the risk of bias in studies comparing delirium incidence between haloperidol and placebo groups: "Nine randomized controlled trials (RCTs) directly compared delirium incidence between haloperidol and placebo groups. These RCTs enrolled 3,408 patients in both surgical and medical intensive care and non-intensive care unit settings and used a variety of validated delirium detection instruments. Five of the trials were low risk of bias, three had unclear risk of bias, and one had high risk of bias owing to lack of blinding and allocation concealment. Intravenous haloperidol was administered in all except two trials; in those two exceptions, oral doses were given. These nine trials were pooled, as they each identified new onset of delirium (incidence) within the week after exposure to prophylactic haloperidol or placebo." (70)

Example 4: In a review examining the effects of quadruple versus triple combination antiretroviral therapies for treatment naive people with HIV, the authors present a figure displaying the characteristics and risk of bias of studies alongside their results for several meta-analyses:
See Graphical Overview for Evidence Reviews visual summary.

Item 20b. RESULTS OF SYNTHESES: Present results of all statistical syntheses conducted. If meta-analysis was done, present for each the summary estimate and its precision (e.g. confidence/credible interval) and measures of statistical heterogeneity. If comparing groups, describe the direction of the effect.

Example 1: In a review examining the effects of aspirin for primary prevention of cardiovascular disease, the authors report for a meta-analysis of risk ratios the number of included studies and participants, summary estimate and its 95% confidence interval, the I² measure of inconsistency, and they translate the relative effect into absolute terms: "Twelve studies [each study cited], including a total of 159,086 patients, reported on the rate of major bleeding complications. Aspirin use was associated with a 46% relative risk increase of major bleeding complications (risk ratio 1.46; 95% CI, 1.30-1.64; p < 0.00001; I² = 31%; absolute risk increase 0.077%; number needed to treat to harm 1295; Fig 1)" (71)

Example 2: In a review examining the effect of exercise programmes for ankylosing spondylitis, the authors report for a meta-analysis of mean differences the number of included studies and participants, summary estimate and its 95% confidence interval, the I² measure of inconsistency, and they translate the absolute effect into relative terms and describe the clinical importance of the result: "Physical function (BASFI, 0 to 10 scale; lower score indicates higher function): Seven studies (312 participants) found a reduction in physical function score with exercise versus no intervention at the end of the intervention (mean difference (MD) -1.3, 95% confidence interval (CI) -1.7 to -0.9; absolute risk difference 13% (95% CI 9% to 17%); relative change 32% (95% CI 23% to 42%); Analysis 1.1). The statistical heterogeneity was not important (I² = 23%). There was no important clinically meaningful benefit."
(72)

Example 3: In a review examining the effects of strategies to improve the implementation of healthy eating, physical activity and obesity prevention policies, practices or programmes within childcare services, the authors report for a meta-analysis of standardised mean differences the number of included studies and participants, summary estimate and its 95% confidence interval, the I² measure of inconsistency, and they translate the result into units of a particular measurement scale: "Score-based measures of implementation were the most common continuous outcomes in studies comparing an implementation strategy with usual practice or minimal support control and were reported in 11 studies including nine randomised trials. Pooled analysis providing moderate-certainty evidence including all nine randomised trials with score-based measures of implementation [each study cited] reported an improvement (standardised mean difference 0.49; 95% confidence interval 0.19 to 0.79; I² = 54%; P < 0.001; participants = 495 services; equivalent to a mean difference of 0.88 on the Environment and Policy Assessment and Observation (EPAO) scale) favouring groups receiving implementation support strategies (Analysis 1.1)." (30)

Example 4: In a review examining the effects of workplace interventions for reducing sitting at work, the authors report for a meta-analysis of mean differences the number of included studies, summary estimate and its 95% confidence interval, the I² measure of inconsistency and a prediction interval: "Ten studies compared the effects of using a sit-stand desk with or without information and counselling to the effects of using a sit-desk [each study cited].
The pooled analysis showed that the sit-stand desk with or without information and counselling intervention reduced sitting time at work by on average 100 minutes per eight-hour workday (95% confidence interval -116 to -84, I² = 37%; Analysis 1.1)… Data presented by one study, Sandy 2016, did not allow for calculation of time spent in sitting time at work and therefore we did not include the study in the quantitative synthesis. The prediction interval for sitting time ranged from -146 to -54 minutes a day." (73)

Item 20c. RESULTS OF SYNTHESES: Present results of all investigations of possible causes of heterogeneity among study results.

Example 1: "Among the 4 trials that recruited critically ill patients who were and were not receiving invasive mechanical ventilation at randomization, the association between corticosteroids and lower mortality was less marked in patients receiving invasive mechanical ventilation (ratio of…

Example 2: In a review examining the effects of community-based coordinating interventions in dementia care, the authors present results of several subgroup analyses, indicating for each the summary estimate and its precision for each subgroup and the P value for a test for subgroup differences: "Interventions using a case manager with a nursing background showed a greater positive effect on caregiver quality of life compared to those that used other professional backgrounds (standardised mean difference = 0.94 versus 0.03, respectively; p < 0.001). Interventions that did not provide case managers with supervision showed greater effectiveness for reducing the percentage of patients that are institutionalised compared to those that provided supervision (odds ratio = 0.27 versus 0.96 respectively; p = 0.02). There was weak evidence that interventions using a lower caseload for case managers had greater effectiveness for reducing the number of patients institutionalised compared to interventions using a higher caseload for case managers (odds ratio = 0.23 versus 1.20 respectively; p = 0.08).
There was little evidence that the other intervention components modify treatment effects (see Table 3)." (75)
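The quantities reported in the synthesis examples above (absolute risk increase, number needed to treat to harm, I², prediction interval) follow from standard formulas. A minimal sketch of that arithmetic is shown below; the baseline risk and between-study variance used in the demo lines are illustrative assumptions back-calculated for this sketch, not values reported by the cited reviews:

```python
import math

def absolute_effects(control_event_rate, risk_ratio):
    """Translate a relative effect into absolute terms:
    ARI = CER * (RR - 1); NNH = 1 / ARI."""
    ari = control_event_rate * (risk_ratio - 1.0)
    return ari, 1.0 / ari

def i_squared(q_statistic, n_studies):
    """Higgins' I²: percentage of total variability attributable to
    heterogeneity rather than chance, from Cochran's Q; truncated at 0."""
    df = n_studies - 1
    return max(0.0, (q_statistic - df) / q_statistic) * 100.0

def prediction_interval(pooled, se_pooled, tau2, t_crit):
    """Approximate 95% prediction interval for a random-effects
    meta-analysis: pooled ± t * sqrt(tau² + SE²), with the t critical
    value conventionally taken on k - 2 degrees of freedom."""
    half = t_crit * math.sqrt(tau2 + se_pooled ** 2)
    return pooled - half, pooled + half

# Hypothetical baseline bleeding risk of 0.167% with RR 1.46 gives
# roughly the magnitudes quoted in the aspirin example (ARI ~0.077%,
# NNH ~1300).
ari, nnh = absolute_effects(0.00167, 1.46)

# Back-calculated illustration of a prediction interval around a pooled
# mean difference of -100 minutes: SE ~8.16 (from the 95% CI -116 to
# -84), an assumed tau² of 331, and t(8 df, 0.975) = 2.306.
lo, hi = prediction_interval(-100.0, 8.16, 331.0, 2.306)
```

The prediction interval is deliberately wider than the confidence interval: it describes where the effect in a new study setting is expected to fall, not the uncertainty in the average effect.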

Example 3: In a review examining the effects of motor control stabilisation exercises in chronic nonspecific low back pain patients, the authors present results of several meta-regression analyses, indicating for each the regression coefficient and its confidence interval:
"The results of the five meta-regressions…are highlighted in Table 5. The training duration, frequency, total trainings dose and training-to-sustainability ratio showed no impact on the effect size of the primary outcome pain. The PEDro sum score was negatively associated with the effect size; a study with a score-decrease of 1 point shows an increase in the effect size of .24. Fig 9 illustrates this association." (76) Example 4: In a review examining the effects of cannabinoid administration for pain, the authors present results of several meta-regression analyses, indicating for each the regression coefficient and its confidence interval, and using plots to visualise the relationships: "Meta-regression results revealed that, when controlling for other explanatory variables, drug administration conditions were linked with pain reduction among included studies, such that cannabinoids (whole-plant cannabis and whole-cannabis extracts) β = −0.43, 95% confidence interval (CI) (−0.62, −0.24), p < 0.05 (Figure 4), and synthetic cannabinoids (Dronabinol, Nabilone, and CT3) β = −0.39, 95% CI (−0.65, −0.14), p < 0.05 (Figure 4), performed better than placebo. Furthermore, meta-regression results showed that, when controlling for other explanatory variables, sample size was linked with pain reduction, β = 0.01, 95% CI (0.00, 0.01), p < 0.05, such that studies involving smaller samples tended to report greater pain reduction effects (Figure 4). There were no observed interactions between drug administration conditions and sample size. Finally, meta-regression results showed that, when controlling for other explanatory variables, sample sex composition was linked with a modest, however non-significant, effect, β = −0.64, 95% CI (−1.37, 0.09), p = 0.09, such that studies including more female participants tended to report greater pain reductions ( Figure 5)." (77) Item 20d. 
RESULTS OF SYNTHESES: Present results of all sensitivity analyses conducted to assess the robustness of the synthesized results.

Example 1: In a review examining the effects of vitamin D supplementation during pregnancy, the authors present in an appendix tables showing the results of primary and sensitivity analyses for several meta-analyses:
"The magnitude of the pooled effect remained relatively stable in sensitivity analyses (table S13 in appendix 10)" (47) Example 2: In a review examining the effects of quadruple versus triple combination antiretroviral therapies for treatment naive people with HIV, the authors report that results of several sensitivity analyses were consistent with results of primary meta-analyses: "Sensitivity analyses that removed studies with potential bias showed consistent results with the primary meta-analyses (risk ratio 1.00 for undetectable HIV-1 RNA, 1.00 for virological failure, 0.98 for severe adverse effects, and 1.02 for AIDS defining events; supplement 3E, 3F, 3H, and 3I, respectively). Such sensitivity analyses were not performed for other outcomes because none of the studies reporting them was at a high risk of bias. Sensitivity analysis that pooled the outcome data reported at 48 weeks, which also showed consistent results, was performed for undetectable HIV-1 RNA and increase in CD4 T cell count only (supplement 3J and 3K) and not for other outcomes owing to lack of relevant data. When the standard deviations for increase in CD4 T cell count were replaced by those estimated by different methods, the results of figure 3 either remained similar (that is, quadruple and triple arms not statistically different) or favoured triple therapies (supplement 2)." (66) Example 3: In a review examining the effects of operative treatment versus nonoperative treatment of Achilles tendon ruptures, the authors show in a table the results of primary and sensitivity analyses for two meta-analyses and present forest plots for sensitivity analyses in an appendix: " Table 3 shows the results of the secondary sensitivity analyses. Re-rupture rate was reported in 17 (59%) high quality studies -10 randomized controlled trials and seven observational studies. 
The overall pooled effect showed that operative treatment was associated with a significant reduction in re-rupture rate compared with nonoperative treatment (risk difference 5.1%; risk ratio 0.44…)

Item 21. REPORTING BIASES: Present assessments of risk of bias due to missing results (arising from reporting biases) for each synthesis assessed.

Example 1: "…other three studies]…The authors reported small but significant improvements on the CIBIC-Plus for 183 patients (89 on latrepirdine and 94 on placebo) favouring latrepirdine following the 26-week primary endpoint (MD -0.60, 95% CI -0.89 to -0.31, P < 0.001). Similar results were found at the additional 52-week follow-up (MD -0.70, 95% CI -1.01 to -0.39, P < 0.001). However, we considered this to be low quality evidence due to imprecision and reporting bias. Thus, we could not draw conclusions about the efficacy of latrepirdine in terms of changes in clinical impression." (79)

Example 2: In a review examining the effects of pharmacotherapy for social anxiety disorder, the authors used visual inspection of a contour-enhanced funnel plot to conclude that the asymmetry observed was likely due to publication bias: "There is evidence of possible funnel plot asymmetry among trials providing data on response to short-term medication treatment, both for the SSRIs and all medications combined. Inspection of the contour-enhanced funnel plots for the SSRIs (Figure 4) and all of the trials (Figure 5) suggests that this asymmetry is due to publication bias, as trials with less precise treatment response outcomes are more likely than their higher precision counterparts to be missing from regions of the plot representing statistically nonsignificant treatment effects. Egger regression tests quantitatively confirmed this visual impression, providing evidence of possible publication bias for all of the medication trials (t = 2.8226, df = 49, P = 0.0069) and for the SSRIs (t = 2.6426, df = 22, P = 0.015)."
(80)

Example 3: In a review examining the effects of bystander programs on the prevention of sexual assault among adolescents and college students, the authors used visual inspection of a contour-enhanced funnel plot to conclude that the asymmetry observed was unlikely to be due to publication bias: "To examine small study and publication bias we created a contour-enhanced funnel plot of the 11 effect sizes plotted against their standard errors (see Figure 32). Visual inspection of the funnel plot reveals an absence of adverse intervention effects. Given the absence of negative effects in the regions of statistical significance and non-significance, the results from this contour-enhanced funnel plot do not indicate a potential risk of publication bias. To further investigate the possibility of bias, we conducted an Egger test for funnel plot asymmetry. The results provided no significant evidence of small study effects (bias coefficient: 0.36; t: −0.60, p = .56)...With these collective findings, we therefore conclude that the meta-analysis results shown in Figure 31 are likely robust to any small study/publication bias." (81)

Item 22. CERTAINTY OF EVIDENCE: Present assessments of certainty (or confidence) in the body of evidence for each outcome assessed.

Example 1: In a review examining the effects of surgery for rotator cuff tears, the authors report their certainty in text, along with rationale for their judgement, and present a Summary of Findings table including certainty judgements for several outcomes: "Compared with non-operative treatment, low-certainty evidence indicates surgery (repair with subacromial decompression) may have little or no effect on function at 12 months. The evidence was downgraded two steps, once for bias and once for imprecision - the 95% CIs overlap minimal important difference in favour of surgery at this time point." (49). The summary of findings table presents the same information as the text above, with footnotes explaining judgements.
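The Egger regression tests quoted in the reporting-bias examples above amount to an ordinary least squares regression of each study's standardised effect (effect divided by its standard error) on its precision (1 divided by the standard error), with an intercept far from zero suggesting small-study asymmetry. A dependency-free sketch is below; the effect sizes and standard errors in the demo are invented for illustration, and the sketch covers only the regression test, not the contour-enhanced funnel plot itself:

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression test for funnel plot asymmetry (sketch).

    Regress standardised effects (effect/SE) on precision (1/SE) by OLS
    and return (intercept, t statistic for the intercept, degrees of
    freedom). A large |t| suggests small-study asymmetry.
    """
    y = [e / s for e, s in zip(effects, std_errors)]
    x = [1.0 / s for s in std_errors]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares -> standard error of the intercept.
    resid_ss = sum((yi - (intercept + slope * xi)) ** 2
                   for xi, yi in zip(x, y))
    se_intercept = math.sqrt(resid_ss / (n - 2) * (1.0 / n + mx ** 2 / sxx))
    return intercept, intercept / se_intercept, n - 2

# Made-up data: five studies with similar effects regardless of
# precision, so the intercept should sit close to zero.
intercept, t_stat, df = egger_test(
    [0.5, 0.52, 0.48, 0.51, 0.49],
    [0.1, 0.2, 0.3, 0.15, 0.25],
)
```

In practice reviewers would compare the t statistic against a t distribution on n − 2 degrees of freedom to obtain the P values quoted in the examples; established implementations (e.g. in meta-analysis packages) should be preferred over a hand-rolled regression.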
Example 2: In a review examining the effects of polyunsaturated fatty acids on patient-important outcomes in children and adolescents with autism spectrum disorder, the authors report their certainty in the abstract text and present Summary of Findings tables including certainty judgements for several outcomes: "Polyunsaturated fatty acids (PUFAs) were superior compared to placebo in reducing anxiety in individuals with autism spectrum disorder (standardised mean difference -1.01, 95% CI -1.86 to -0.17; very low certainty of evidence)…Summary of findings for the comparisons PUFAs versus placebo and PUFAs versus healthy diet are presented in Table 2 and Table 3." (82) Item 23a. DISCUSSION: Provide a general interpretation of the results in the context of other evidence.

Example 1: In a review examining the effects of interventions to improve early grade literacy in Latin America and the
Caribbean, the authors compare their findings with those observed in other relevant reviews: "Although we need to exercise caution in interpreting these findings because of the small number of studies, these findings nonetheless appear to be largely in line with the recent systematic review on what works to improve education outcomes in low- and middle-income countries of Snilstveit et al. (2012). They found that structured pedagogical interventions may be among the effective approaches to improve learning outcomes in low- and middle-income countries. This is consistent with our findings that teacher training is only effective in improving early grade literacy outcomes when it is combined with teacher coaching. The finding is also consistent with our result that technology in education programs may have at best no effects unless they are combined with a focus on pedagogical practices. In line with our study, Snilstveit et al. …"

Example 2: In a review examining the effects of individualized funding interventions to improve health and social care outcomes for people with a disability, the authors describe how their review differs from the methods used in two previous reviews: "As outlined in the protocol, the authors were aware of only two previous systematic reviews prior to commencing this study (Carter Anand et al., 2012; Webber et al., 2014). In one sense, the eligibility criteria within the current study were broader and more inclusive; for example, Webber et al. limited their review to mental health users only. The need for a results refinement process further highlights the broad scope of the current review. In another sense, however, this review was more restrictive in terms of the quality of evidence. To this end, quantitative studies were excluded if they were not designed to robustly evaluate effectiveness or did not have a control group, while previous reviews included studies without control groups (for example).
Therefore, the studies included in this review are very different, in some respects, from those captured in the above reviews. At the same time, however, the findings from this review were consistent in many respects with the two reviews previously identified." (84)

Example 3: In a review examining the effects of Alcoholics Anonymous and other 12-step programs for alcohol use disorder, the authors compare the current review with the previous version of the review and with other reviews and studies: "The evidence contained in this review is similar to, and extends that of the prior Cochrane Review (Ferri 2006b), which this review updates and replaces, as well as of other narrative reviews which found overall positive effects for AA/TSF interventions (e.g. Kaskutas 2009a; Kelly 2003b). The results presented in this review are also supported by other published analyses. One study from Project MATCH (Longabaugh 1998), found that regardless of whether outpatients' pre-treatment network was supportive or unsupportive of alcohol use at treatment intake, AA/TSF participants were more likely to be involved with AA, which in turn, subsequently explained the observed lower drinks per drinking day (DDD) and greater PDA advantages for TSF-treated participants observed at the 36-month follow-up. The prior Cochrane Review contained eight studies with 3417 participants (Ferri 2006b), and found that on the whole, AA/TSF interventions were as effective, but not more effective, than the interventions to which they were compared. This new review is based on 27 studies reported in 36 articles and has a total of 10,565 participants. It is considerably larger, comprises more rigorous studies, and found that, compared to other active psychosocial interventions for AUD, AA/TSF interventions often produce greater abstinence - notably continuous abstinence - as well as some reductions in drinking intensity, fewer alcohol-related consequences, and lower alcohol addiction severity.
This review also included economic analyses, which augments prior reviews and adds important information regarding the cost-benefits of providing AA/TSF in clinical settings." (85) Item 23b. DISCUSSION: Discuss any limitations of the evidence included in the review.

Example 1: In a review examining the association between marijuana use and risk of cancer, the authors describe various limitations of the included studies:
"Study populations were young, and few studies measured longitudinal exposure. The included studies were often limited by selection bias, recall bias, small sample of marijuana-only smokers, reporting of outcomes on marijuana users and tobacco users combined, and inadequate follow-up for the development of cancer… Most studies poorly assessed exposure, and some studies did not report details on exposure, preventing meta-analysis for several outcomes." (86) Example 2: In a review examining indicators associated with job morale among physicians and dentists in low-income and middle-income countries, the authors describe the limited applicability of the conclusions to people in low and middle income countries: "…despite the use of a comprehensive search strategy, almost all included studies were from middle-income countries, possibly reflecting the shortage of resources for such studies in low-income countries. This means that our findings cannot be generalized to low-income countries. Also, relatively fewer findings were available from Africa, Southern Europe, and Central, Southern, and Southeastern Asia, which made it challenging to generalize conclusions about low and middle income countries." (87) Item 23c. DISCUSSION: Discuss any limitations of the review processes used.
Example 1: In a review examining the effect of quarantine alone or in combination with other public health measures to control COVID-19, the authors report several limitations of the review processes used: "Because of time constraints…we dually screened only 30% of the titles and abstracts; for the rest, we used single screening. A recent study showed that single abstract screening misses up to 13% of relevant studies (Gartlehner 2020). In addition, single review authors rated risk of bias, conducted data extraction and rated certainty of evidence. A second review author checked the plausibility of decisions and the correctness of data. Because these steps were not conducted dually and independently, we introduced some risk of error…Nevertheless, we are confident that none of these methodological limitations would change the overall conclusions of this review. Furthermore, we limited publications to English and Chinese languages. Because COVID-19 has become a rapidly evolving pandemic, we might have missed recent publications in languages of countries that have become heavily affected in the meantime (e.g. Italian or Spanish)." (88) Example 2: In a review examining the effects of regular inhaled therapies for patients with stable chronic obstructive pulmonary disease, the authors report several limitations of the review processes used: "We acknowledge several limitations…Although our network meta-analysis included all available randomized controlled trials, we could not conduct a subgroup analysis to identify a specific group of patients who could benefit from triple therapy more prominently…Because studies reporting information -such as eosinophil counts and chronic bronchitis -were fewer than expected, we could not generate a sufficient network for the sensitivity and meta-regression analyses. In addition, we did not evaluate the symptoms, use of rescue medication, quality of life, and lung function, which are other important outcomes." 
(89) Example 3: In a review examining the effects of red and processed meat consumption on risk of cardiometabolic and cancer outcomes, the authors report several limitations of the review processes used: "One of the primary limitations of our work is the heterogeneity of dietary patterns across studies. Although all patterns discriminated between participants with low and high intake of red and processed meat, other food and nutrient characteristics of dietary patterns and the quantity of red and processed meat consumed varied widely across studies. Moreover, the quantity of red and processed meat consumed differed across dietary patterns and studies. For example, one study compared 1.4 versus 3.5 servings of processed meat per week, whereas another compared 0.7 versus 4.9 servings per week. Such inconsistencies may have increased heterogeneity of meta-analyses and potentially reduced the magnitude of observed associations. Also, analyses of extreme categories of adherence may artificially inflate effect estimates and may not be indicative of effects observed at typical levels of adherence. Second, we were unable to analyze the data separately for red and processed meat because authors typically combined them or did not distinguish between them in primary studies." (90) Item 23d. DISCUSSION: Discuss implications of the results for practice, policy, and future research.
Example 1: In a review examining the effects of bystander programs on the prevention of sexual assault among adolescents and college students, the authors discuss the implications for practice given the evidence of benefit observed: "Implications for practice and policy: Findings from this review indicate that bystander programs have significant beneficial effects on bystander intervention behaviour. This provides important evidence of the effectiveness of mandated programs on college campuses. Additionally, the fact that our (preliminary) moderator analyses found program effects on bystander intervention to be similar for adolescents and college students suggests early implementation of bystander programs (i.e., in secondary schools with adolescents) may be warranted. Importantly, although we found that bystander programs had a significant beneficial effect on bystander intervention behaviour, we found no evidence that these programs had an effect on participants' sexual assault perpetration. Bystander programs may therefore be appropriate for targeting bystander behaviour, but may not be appropriate for targeting the behaviour of potential perpetrators. Additionally, effects of bystander programs on bystander intervention behaviour diminished by 6-month post-intervention. Thus, programs effects may be prolonged by the implementation of booster sessions conducted prior to 6 months post-intervention.
Implications for research: Findings from this review suggest there is a fairly strong body of research assessing the effects of bystander programs on attitudes and behaviors. However, there are a couple of important questions worth further exploration. First, according to one prominent logical model, bystander programs promote bystander intervention by fostering prerequisite knowledge and attitudes (Burn, 2009). Our meta-analysis provides inconsistent evidence of the effects of bystander programs on knowledge and attitudes, but promising evidence of short-term effects on bystander intervention. This casts uncertainty around the proposed relationship between knowledge/attitudes and bystander behavior. Although we were unable to assess these issues in the current review, this will be an important direction for future research. Our understanding of the causal mechanisms of program effects on bystander behavior would benefit from further analysis (e.g., path analysis mapping relationships between specific knowledge/attitude effects and bystander intervention). Second, bystander programs exhibit a great deal of content variability, most notably in framing sexual assault as a gendered or gender-neutral problem. That is, bystander programs tend to adopt one of two main approaches to addressing sexual assault: (a) presenting sexual assault as a gendered problem (overwhelmingly affecting women) or (b) presenting sexual assault as a gender-neutral problem (affecting women and men alike). Differential effects of these two types of programs remain largely unexamined. Our analysis indicated that (a) the sex of victims/perpetrators (i.e., portrayed in programs as gender neutral or male perpetrator and female victim) and (b) whether programs were implemented in mixed-or single-sex settings were not significant moderators of program effects on bystander intervention. 
However, these findings are limited to a single outcome and they should be considered preliminary, as they are based on a small sample (n = 11). Our understanding of the differential effects of gendered versus gender neutral programs would benefit from the design and implementation of high-quality primary studies that make direct comparisons between these two types of programs (e.g., RCTs comparing the effects of two active treatment arms that differ in their gendered approach). Finally, our systematic review and meta-analysis demonstrate the lack of global evidence concerning bystander program effectiveness. Our understanding of bystander programs' generalizability to non-US contexts would be greatly enhanced by high quality research conducted across the world." (81) Example 2: In a review examining the effects of trauma-informed approaches in schools, the authors discuss the implications for practice given the lack of evidence of benefit: "From this review, it seems like the most prudent thing for school leaders, policymakers, and school mental health professionals to do would be proceed with caution in their embrace of a trauma-informed approach as an overarching framework and conduct rigorous evaluation of this approach. We simply do not have the evidence (yet) to know if this works, and indeed, we do not know if using a trauma-informed approach could actually have unintended negative consequences for traumatized youth and school communities. We also do not have evidence of other potential costs in implementing this approach in schools, whether they be financial, academic, or other opportunity costs, and whether benefits outweigh the costs of implementing and maintaining this approach in schools. 
That said, calling for caution in adopting trauma-informed care in schools does not preclude schools from continuing to implement evidence-informed programs that target trauma symptoms in youth, nor does it mean that they should simply wait for the research to provide unequivocal answers. The benefit of the trauma-informed approach being made freely available by SAMHSA and other policymakers is that these components can form the basis for a school (or school district) to begin to adapt and apply this approach in schools." (91)

Example 3: In a review examining the effects of lenvatinib and sorafenib for differentiated thyroid cancer, the authors list implications for future research in order of priority: "In order of priority, the assessment group suggests the following further research priorities: 1. Clinical advice to the assessment group is that only radioactive iodine-refractory differentiated thyroid cancer patients experiencing symptoms, or those who have clinically significant progressive disease, are likely to be treated in routine clinical practice. Subgroup analyses suggest that the effects on progression-free survival are similar for patients treated with sorafenib regardless of whether they are symptomatic or asymptomatic. However, these findings are post hoc and include only a minority of symptomatic patients. It is unclear if other outcomes, such as overall survival, objective tumour response rate, adverse events and health-related quality of life, differ by symptomatic or asymptomatic disease. Future…"

Item 24b. REGISTRATION AND PROTOCOL: Indicate where the review protocol can be accessed, or state that a protocol was not prepared.
Example 1: In a review examining the effects of psychological interventions for fatigue in cancer survivors, the authors report that the protocol for the review is published and provide a citation for it: "The review protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) database (registration number: CRD42014015219) and the protocol has been published [citation for protocol provided]." (96) Example 2: In a review examining psychotropic medication non-adherence and its associated factors among patients with major psychiatric disorders, the authors report that the protocol for the review is published and provide a citation for it: "…this systematic review and meta-analysis protocol has been published elsewhere [citation for the protocol provided]." (97) Item 24c. REGISTRATION AND PROTOCOL: Describe and explain any amendments to information provided at registration or in the protocol.
Example 1: In a review examining the effects of pharmacological interventions for the treatment of delirium in critically ill adults, the authors describe and explain several amendments to information provided in the protocol: "Differences between protocol and review: In our protocol (Burry 2015), we planned the primary outcome to be duration of delirium, defined as the time from which it was first identified to when it was first resolved (i.e. screened negative as defined by study authors (e.g. first negative screen, two consecutive screenings)), measured in days, and our secondary outcome to be the total duration of delirium, measured in days. There was far more variability in the definition of the outcome used than we had anticipated. Only two trials reported on the duration of delirium's first episode, and the remaining trials reported days with delirium, time in delirium, or total duration of delirium; most did not report when delirium was identified or how trial authors defined resolution of delirium. We therefore chose to report the total duration of delirium as our primary outcome and to pool the variable definitions. We added the outcome number of days in coma, as this outcome was reported in four trials, and we believed it important to include it in this review, as it is a newer outcome that is likely to be included in subsequent studies." 
(98)

Example 2: In a review examining the effects of pharmacologic therapies on patients with idiopathic sudden sensorineural hearing loss, the authors describe and explain several amendments to information provided in the protocol: "We incurred no deviations from the a priori review protocol, with the exception of a minor modification of our modelling approach for: 1) continuous endpoints at baseline and at final follow-up with corresponding standard deviations (but without average changes and corresponding standard deviations per group) in certain studies; and 2) follow-up time due to variations in endpoints assessment time across studies." (99)

Example 3: In a review examining the effects of deworming in non-pregnant adolescent girls and adult women, the authors describe and explain several amendments to information provided in the protocol: …

Item 27. AVAILABILITY OF DATA, CODE AND OTHER MATERIALS: Report which of the following are publicly available and where they can be found: template data collection forms; data extracted from included studies; data used for all analyses; analytic code; any other materials used in the review.
Example 2: In a review examining the effects of self-management smartphone-based apps for post-traumatic stress disorder symptoms, the authors report that the data and analytic code are publicly available in the Open Science Framework repository and provide a DOI for readers to access the files: "All data and code are stored on a repository of the Open Science Framework (doi: 10.17605/OSF.IO/DZJT7)" (110) Example 3: In a review examining the effects of specialised treatments for anorexia nervosa, the authors report that the data and analytic code are publicly available in the Open Science Framework repository and provide a URL for readers to access the files: "The dataset and script to perform the analyses are available at https://osf.io/q7v2d/?view_only=c3cdaf346298411eab9ed15e863c9f21." (111)