CC BY-NC Open access

Living network meta-analysis compared with pairwise meta-analysis in comparative effectiveness research: empirical study

BMJ 2018; 360 doi: (Published 28 February 2018) Cite this as: BMJ 2018;360:k585
  1. Adriani Nikolakopoulou, research associate in biostatistics1,
  2. Dimitris Mavridis, assistant professor of statistics2 3,
  3. Toshi A Furukawa, professor of clinical epidemiology4,
  4. Andrea Cipriani, associate professor of psychiatry5 6,
  5. Andrea C Tricco, research scientist7 8,
  6. Sharon E Straus, professor of medicine7 9,
  7. George C M Siontis, consultant cardiologist10,
  8. Matthias Egger, professor of epidemiology and public health1,
  9. Georgia Salanti, associate professor of biostatistics and epidemiology1
  1. Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
  2. Department of Primary Education, University of Ioannina, Ioannina, Greece
  3. Centre de Recherche Épidémiologie et Statistique Sorbonne Paris Cité, Inserm/Université Paris Descartes, Paris, France
  4. Departments of Health Promotion and Human Behavior and of Clinical Epidemiology, Kyoto University Graduate School of Medicine/School of Public Health, Kyoto, Japan
  5. Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, UK
  6. Oxford Health NHS Foundation Trust, Warneford Hospital, Oxford, UK
  7. Knowledge Translation Program, Li Ka Shing Knowledge Institute, St Michael’s Hospital, Toronto, Ontario, Canada
  8. Epidemiology Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
  9. Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
  10. Department of Cardiology, Bern University Hospital, Bern, Switzerland
  1. Correspondence to: G Salanti georgia.salanti{at}
  • Accepted 22 January 2018

Abstract

Objective To examine whether the continuous updating of networks of prospectively planned randomised controlled trials (RCTs) (“living” network meta-analysis) provides strong evidence against the null hypothesis in comparative effectiveness of medical interventions earlier than the updating of conventional, pairwise meta-analysis.

Design Empirical study of the accumulating evidence about the comparative effectiveness of clinical interventions.

Data sources Database of network meta-analyses of RCTs identified through searches of Medline, Embase, and the Cochrane Database of Systematic Reviews until 14 April 2015.

Eligibility criteria for study selection Network meta-analyses published after January 2012 that compared at least five treatments and included at least 20 RCTs. Clinical experts were asked to identify in each network the treatment comparison of greatest clinical interest. Comparisons were excluded for which direct and indirect evidence disagreed, based on side, or node, splitting test (P<0.10).

Outcomes and analysis Cumulative pairwise and network meta-analyses were performed for each selected comparison. Monitoring boundaries of statistical significance were constructed and the evidence against the null hypothesis was considered to be strong when the monitoring boundaries were crossed. The significance level was set at α=5%, power at 90% (β=10%), and the anticipated treatment effect to detect was set equal to the final estimate from the network meta-analysis. The frequency of and time to strong evidence against the null hypothesis were compared between pairwise and network meta-analyses.

Results 49 comparisons of interest from 44 networks were included; most (n=39, 80%) were between active drugs, mainly from the specialties of cardiology, endocrinology, psychiatry, and rheumatology. 29 comparisons were informed by both direct and indirect evidence (59%), 13 by indirect evidence (27%), and 7 by direct evidence (14%). Both network and pairwise meta-analysis provided strong evidence against the null hypothesis for seven comparisons, but for an additional 10 comparisons only network meta-analysis provided strong evidence against the null hypothesis (P=0.002). The median time to strong evidence against the null hypothesis was 19 years with living network meta-analysis and 23 years with living pairwise meta-analysis (hazard ratio 2.78, 95% confidence interval 1.00 to 7.72, P=0.05). Studies directly comparing the treatments of interest continued to be published for eight comparisons after strong evidence had become evident in network meta-analysis.

Conclusions In comparative effectiveness research, prospectively planned living network meta-analyses produced strong evidence against the null hypothesis more often and earlier than conventional, pairwise meta-analyses.

Introduction

A timelier introduction of effective medical interventions was one of the early promises of meta-analysis of randomised controlled trials (RCTs).12 Cumulative meta-analysis, defined as updating a meta-analysis whenever a new eligible RCT becomes available, has been used retrospectively to examine how evidence on a given intervention has accrued over time and how quickly it has informed guidelines.34 More recently, the optimal time for updating a systematic review has been discussed567 and guidelines and decision tools developed.8910 In 2014 “living systematic reviews” were proposed as a framework for continuously updated meta-analyses.11

In recent years, network meta-analyses have gained prominence in comparative effectiveness research.1213 They extend conventional, pairwise meta-analysis to compare multiple treatments within a network of RCTs.141516 A living version of network meta-analysis has recently been suggested as the new paradigm in comparative effectiveness research.1718 Healthcare institutions such as the UK National Institute for Health and Care Excellence and the World Health Organization consider network meta-analyses and, if there is high confidence in the results, use them to inform recommendations.19 By including both direct and indirect evidence, continuously updated network meta-analysis can reach robust conclusions on the relative effectiveness of treatments earlier than pairwise meta-analyses, thus potentially facilitating timely recommendations and reducing research waste.171820

In a prospectively planned network meta-analysis, studies are designed and realised using a predefined protocol and they are cumulatively synthesised as their results become available. One study highlighted the potential of this approach to optimally inform comparative effectiveness of drugs, not only at the post-marketing stage but also before licensing.21 In the framework of a prospective living network meta-analysis, suitable methods are required for statistically monitoring the accumulating evidence while controlling for the risk of falsely concluding superiority of an intervention. Such methods have been developed recently, extending the sequential monitoring of trials and pairwise meta-analyses.1822 It is, however, unclear whether the theoretical potential of prospectively planned living network meta-analysis can be realised in comparative effectiveness research and whether its increased power compared with pairwise meta-analysis is substantial. We used sequential monitoring to assess recently published network meta-analyses of RCTs of medical interventions to examine whether living network meta-analysis would have provided strong evidence against the null hypothesis more often and earlier than the corresponding updated pairwise meta-analysis.

Methods

Search strategy and inclusion criteria

We compiled a large database of network meta-analyses of RCTs based on searches of Medline, Embase, and the Cochrane Database of Systematic Reviews from inception to 14 April 2015.12 In the present study we included networks published after January 2012, as empirical evidence suggested that the quality of the systematic reviews and statistical rigor have considerably improved in recent years.12 To ensure a critical mass of data, we included networks with at least one closed loop of evidence that compared at least five different treatments and included 20 or more RCTs published within at least 10 years.

Selection of comparison of interest

We focused on treatment comparisons that were of interest to the developers of clinical guidelines during the period the body of evidence accumulated. For each included network meta-analysis we asked senior clinicians or clinical researchers with experience in guideline development to choose the treatment comparison that “was of greatest interest to the developers of guidelines or which had the greatest influence on clinical decision-making during the indicated time period” and to justify their choice by providing a reference of a relevant clinical guideline. One expert evaluated each network and the comparison was chosen independently of the availability of direct or indirect evidence. Experts were blind to the results of the sequential analysis. Treatment effects were expressed as the standardised mean difference, odds ratios, or hazard ratios for continuous, binary, and time-to-event data, respectively.

Network meta-analysis rests on the assumption of consistency between direct and indirect evidence. We therefore excluded networks where the comparison of interest showed evidence of inconsistency, defined by a P value less than 0.10 when direct and indirect evidence were compared in a z test (Separate Indirect from Direct Evidence (SIDE), also called node splitting test).23
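The consistency check described above can be sketched as a simple z test on the difference between the direct and the indirect estimate. This is a minimal illustration, assuming the two estimates are independent and approximately normal; the function name and the input values are hypothetical and are not taken from the study or its R package.

```python
from math import sqrt
from statistics import NormalDist


def side_test(direct, se_direct, indirect, se_indirect, threshold=0.10):
    """Compare direct and indirect estimates of the same treatment effect.

    Returns the z statistic, the two-sided P value, and whether the
    comparison would be flagged as inconsistent (P below the threshold).
    Assumes the two estimates are independent.
    """
    difference = direct - indirect
    se_difference = sqrt(se_direct ** 2 + se_indirect ** 2)
    z = difference / se_difference
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < threshold


# Hypothetical estimates on the standardised mean difference scale.
z, p, inconsistent = side_test(direct=0.20, se_direct=0.10,
                               indirect=0.05, se_indirect=0.10)
print(f"z = {z:.2f}, P = {p:.3f}, excluded = {inconsistent}")
```

With a threshold of 0.10, a comparison is kept only when the P value is at least 0.10, which matches the exclusion rule stated in the text.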

Construction of monitoring boundaries and definition of strong evidence

We assumed that studies had been prospectively planned and that they were included in the synthesis model once results became available. Then we evaluated the evidence against the null hypothesis using hypothesis testing to decide whether further data were needed. Repeatedly testing whenever new evidence is added to a body of evidence leads to inflated type I errors.222425 Methods originally developed for sequential analysis of RCTs have been adapted to cumulative pairwise meta-analysis and more recently to network meta-analysis.1826 We used an adaptation of α spending functions, which we have described in detail elsewhere.18 Firstly, we defined an anticipated treatment effect to detect and the rates of type I and type II errors (α and β, respectively). Secondly, we constructed the α spending function boundary as a function of the statistical information (added at each update) and the maximum information. Statistical information is defined as the inverse of the variance (that is, precision). We defined the maximum information as the precision of a single RCT that is adequately powered to detect the anticipated difference between the two interventions, given α and β. The α error is then distributed along the sequential tests, with smaller values “spent” for early tests and larger values spent at later stages. The monitoring boundaries correspond to the quantiles of the α levels and approximate the (1−α/2) quantile of the standard normal distribution as the statistical information approaches its maximum. We implemented the methods in a freely available R package (see appendix N).
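The two ingredients named above, maximum information and an α spending function, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: maximum information uses the standard sample size formula for a two-sided test, and the spending function is the O'Brien-Fleming-type function of Lan and DeMets, chosen here only as an example because the text does not say which function was used. Turning cumulative α into actual boundary values needs recursive numerical integration, which is omitted.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def maximum_information(delta, alpha=0.05, beta=0.10):
    """Precision (1/variance) that one adequately powered study would reach
    for detecting an effect `delta` with a two-sided test at level `alpha`
    and power 1 - `beta`."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    return ((z_alpha + z_beta) / delta) ** 2


def information_fraction(variance, max_information):
    """Share of the maximum information accrued so far, capped at 1."""
    return min(1.0, (1 / variance) / max_information)


def alpha_spent_obf(fraction, alpha=0.05):
    """Cumulative type I error spent at a given information fraction
    (O'Brien-Fleming-type spending function; an assumed choice)."""
    if not 0 < fraction <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - STD_NORMAL.cdf(z_alpha / fraction ** 0.5))


i_max = maximum_information(delta=0.13)
for t in (0.25, 0.5, 0.75, 1.0):
    print(f"fraction {t:.2f}: cumulative alpha spent {alpha_spent_obf(t):.5f}")
print(f"maximum information for delta = 0.13: {i_max:.1f}")
```

Very little α is spent while the information fraction is small, so early looks need extreme z scores to cross the boundary. Once the fraction reaches 1 the full 5% has been spent, which is why the boundary approaches the conventional quantile.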

We set the significance level at α=5%, power at 90% (β=10%), and the anticipated treatment effect to detect equal to the final estimate from the network meta-analysis. We expressed results as z scores (ie, the effect size divided by its standard error). In the primary analysis for both pairwise meta-analysis and network meta-analysis, we imputed the heterogeneity variance as the median value of the empirical distributions from Cochrane reviews.2728 In a sensitivity analysis, we estimated heterogeneity from the data at each update. We considered that a pairwise or network meta-analysis provided strong evidence against the null hypothesis (the hypothesis that there is no difference between the interventions) when the accumulated information crossed the monitoring boundaries of statistical significance, constructed as described here and previously.18 Throughout, “strong evidence” is used as shorthand for strong evidence against the null hypothesis.

Example: olanzapine versus haloperidol in schizophrenia

We illustrate the approach using the example of the relative efficacy of olanzapine and haloperidol in the acute treatment of schizophrenia, based on one of the network meta-analyses included in this study29 (fig 1). We assume a standardised mean difference measuring the overall change in symptoms of 0.13 favouring olanzapine, equal to the final estimate from the random effects network meta-analysis, and type I and type II errors of 5% and 10%, respectively. The first RCT to compare olanzapine and haloperidol was published in 1996.30 The results showed that olanzapine tended to reduce symptoms more than haloperidol: the standardised mean difference was 0.07 and the z score was 0.26.30 Until 2007, a further 10 RCTs that directly compared the two drugs were published, resulting in a summary standardised mean difference of 0.12 favouring olanzapine (95% confidence interval −0.07 to 0.30); these results are added sequentially in cumulative pairwise meta-analysis (fig 1). In network meta-analysis the z score is updated whenever new direct or indirect evidence becomes available (fig 1). Indirect evidence accumulates through RCTs that compare the drugs of interest with placebo or another drug. At any time, the accumulated information is compared with the monitoring boundaries. The direct evidence from cumulative pairwise meta-analysis remains within the boundaries, whereas the mixed evidence (direct and indirect) from network meta-analysis crosses the monitoring boundaries in 2008 (after the inclusion of 131 RCTs in the entire network), indicating strong evidence against the null hypothesis and in favour of olanzapine (fig 1). The standardised mean difference at the point of crossing the monitoring boundary was 0.13 (95% confidence interval 0.02 to 0.24). Appendix M shows an alternative presentation of sequential monitoring using repeated confidence intervals.
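The z scores and statistical information behind this example can be approximated from the summary estimates quoted above. The sketch below back-calculates standard errors from the reported 95% confidence intervals, assuming symmetric normal intervals; because the published values are rounded, the results are approximate and need not match fig 1 exactly. The helper names are illustrative only.

```python
def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)


def z_score(estimate, lower, upper):
    """Effect size divided by its (back-calculated) standard error."""
    return estimate / se_from_ci(lower, upper)


def information(lower, upper):
    """Statistical information: the inverse of the variance."""
    return 1 / se_from_ci(lower, upper) ** 2


# Cumulative pairwise meta-analysis of the direct RCTs (values from the text).
pairwise_z = z_score(0.12, -0.07, 0.30)
# Network meta-analysis at the point where the boundary was crossed.
network_z = z_score(0.13, 0.02, 0.24)

print(f"pairwise: z = {pairwise_z:.2f}, information = {information(-0.07, 0.30):.0f}")
print(f"network:  z = {network_z:.2f}, information = {information(0.02, 0.24):.0f}")
```

The two point estimates are almost identical, so the difference in z scores comes almost entirely from the narrower interval in the network estimate, that is, from the information added by indirect evidence.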

Fig 1

Efficacy of olanzapine versus haloperidol in treatment of acute schizophrenia, as estimated from living pairwise meta-analysis and living network meta-analysis. Monitoring boundaries were constructed using an α spending function with type I and type II errors fixed at 5% and 10%, respectively. Conventional significance thresholds are shown as dotted lines (z=1.96). The horizontal axis shows statistical information that accumulated over time, compared with maximum statistical information (information in single adequately powered study). Heterogeneity variance is assumed to be equal to the median of predictive distributions (0.049)

Comparing living network meta-analysis and living pairwise meta-analysis

We compared the emergence of strong evidence against the null hypothesis (defined as crossing monitoring boundaries) between living pairwise and network meta-analyses. Among comparisons with strong evidence we examined whether boundaries were crossed as a result of indirect evidence. We analysed data in a 2×2 table using McNemar’s exact test and estimated differences in the probability of providing strong evidence. We plotted Kaplan-Meier curves and calculated the hazard ratio from a frailty time-to-event regression model to describe the time needed to cross the boundary while accounting for the paired nature of the data.31 For comparisons with strong evidence against the null hypothesis we recorded how many studies that directly compared the treatments were published after the boundaries had been crossed. All analyses were done in a frequentist framework using R (R Development Core Team, Vienna, Austria) and Stata (Stata, College Station, TX). Appendices K and N include the technical details about the R package, and worked examples.

Patient involvement

No patients were involved in setting the research question or the outcome measures, nor were they involved in the design and implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community.

Results

Database of network meta-analyses

Out of 456 published network meta-analyses included in the original database,12 44 met the inclusion criteria. The most important reasons for exclusion were publication before January 2012 and lack of outcome data (see appendix figure 1). The 44 network meta-analyses were published in 38 journals (28 specialist and 10 general medicine journals) (table 1). Most networks addressed research questions in the specialties of cardiology, endocrinology, psychiatry, and rheumatology (table 1). Clinical experts selected 54 treatment comparisons, and five were excluded owing to evidence of inconsistency (table 2, and see appendix table 1). Most of the 49 included comparisons were between two drug interventions (n=39, 80%). Five comparisons (10%) involved placebo, two (4%) were between invasive interventions, and four (8%) involved lifestyle modifications. Most primary outcomes were binary (66%) or continuous (23%) (table 1).

Table 1

Characteristics of 44 network meta-analyses and 49 network comparisons of medical interventions included in study. Values are numbers (percentages) unless stated otherwise

Table 2

Treatment comparisons selected in each network, type of evidence (direct, indirect, or both), and meta-analysis method that provides strong evidence against similarity of treatments for primary outcome studied (see appendix for more detailed version of table)

Of the 49 comparisons, 29 (59%) were informed by both direct and indirect evidence in the network meta-analysis. The P values from testing for agreement between indirect and direct evidence ranged between 0.11 and 0.99 (see appendix table 1). Thirteen comparisons (27%) were not examined directly in any RCT; seven (14%) were based on direct evidence only.

Comparison of living pairwise and network meta-analyses

For 10 of the 49 comparisons (20%), only network meta-analysis provided strong evidence against the null hypothesis. In seven instances (14%) both pairwise and network meta-analyses provided strong evidence against the null hypothesis, whereas in 32 comparisons neither analysis produced strong evidence (table 3, P=0.002 from McNemar’s exact test). The probability of providing strong evidence against the null hypothesis was 20 percentage points higher (95% confidence interval 10 to 35) with network meta-analysis than with pairwise meta-analysis. Results were similar when heterogeneity was estimated rather than imputed (see appendix table 2) or when summary effects from pairwise meta-analysis instead of network meta-analysis were used to define the anticipated treatment effect to detect (see appendix table 3). Restricting analyses to comparisons for which both direct and indirect evidence were available did not materially change results (P=0.016 from McNemar’s test, see appendix table 4).
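The paired comparison above can be reproduced from the counts given in the text. McNemar’s exact test uses only the discordant pairs: 10 comparisons where only network meta-analysis gave strong evidence and, by subtraction (49 − 7 − 10 − 32), none where only pairwise meta-analysis did. The sketch below is a plain binomial calculation with illustrative function names; it is not the authors' R or Stata code.

```python
from math import comb


def mcnemar_exact(only_first, only_second):
    """Two-sided exact McNemar P value from the two discordant counts.

    Under the null hypothesis each discordant pair is equally likely to
    fall in either cell, so the smaller count follows Binomial(n, 0.5).
    """
    n = only_first + only_second
    if n == 0:
        return 1.0
    k = min(only_first, only_second)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


both, network_only, pairwise_only, neither = 7, 10, 0, 32
total = both + network_only + pairwise_only + neither

p_value = mcnemar_exact(network_only, pairwise_only)
risk_difference = (network_only - pairwise_only) / total

print(f"P = {p_value:.3f}")
print(f"difference in probability of strong evidence = {risk_difference:.0%}")
```

The risk difference depends only on the discordant cells, which is why the 10 network-only comparisons out of 49 translate directly into the difference of about 20 percentage points.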

Table 3

Number of comparisons with strong evidence against the null hypothesis from pairwise and network meta-analysis. Values are numbers (percentages) unless stated otherwise

For nine out of the 17 treatment comparisons where network meta-analyses provided strong evidence against the null hypothesis (53%), this was achieved only after adding an RCT that contributed indirect evidence. For 13 treatment comparisons there was no RCT directly comparing the interventions of interest, and yet for three of them strong evidence was available by indirect comparison (table 2): exercise and non-steroidal anti-inflammatory drugs versus exercise for pain relief,32 nifedipine versus placebo for delaying delivery in women at risk of preterm delivery,33 and candesartan versus topiramate for the prevention of migraine.34

Median time to strong evidence against the null hypothesis (the first time that the monitoring boundary was crossed) was 19 years (interquartile range 16 to 23) with network meta-analysis and 23 years (interquartile range not estimable) with pairwise meta-analysis (fig 2). Network meta-analysis provided strong evidence earlier than pairwise meta-analysis, by 4 years (95% confidence interval 0 to 7 years). The hazard ratio comparing network with pairwise meta-analysis was 2.78 (95% confidence interval 1.00 to 7.72; P=0.05). For eight (47%) of the 17 comparisons with strong evidence, studies directly comparing the treatments of interest continued to be published after the boundary had been crossed (see appendix table 1). The total number of additional studies was 66; 40 of these compared edaravone with placebo.35 Appendix table 5 shows the results from pairwise and network meta-analyses for each medical specialty.

Fig 2

Kaplan-Meier curves for the probability of not yet having strong evidence against the null hypothesis, comparing sequential pairwise and network meta-analysis of 49 comparisons. An event occurs when the monitoring boundaries are crossed for the comparison of interest. Time is measured in years from the time point when both interventions are included in the network

In a scatterplot of the precision of estimates from pairwise and network meta-analyses, there was a clear gain in precision with network meta-analysis (see appendix figure 2). In almost half of the comparisons (13 out of 29 comparisons, 45%) the network meta-analysis produced a 95% confidence interval whose width was less than two thirds of that of the interval from pairwise meta-analysis. Appendix figure 3 presents the continuous updating (as z scores along with monitoring boundaries) of pairwise and network meta-analysis for all 49 comparisons included in this study.
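To relate the interval widths above to statistical information as defined in the methods, note that the width of a 95% confidence interval is proportional to the standard error, and information is the inverse of the variance. A brief derivation, assuming normal-theory intervals (the symbols w, SE, and I are introduced here for illustration):

```latex
\[
w = 2 \times 1.96 \times \mathrm{SE}, \qquad I = \frac{1}{\mathrm{SE}^2}
\quad\Longrightarrow\quad
\frac{I_{\mathrm{NMA}}}{I_{\mathrm{PMA}}}
  = \left(\frac{w_{\mathrm{PMA}}}{w_{\mathrm{NMA}}}\right)^{2}
\]
\[
\frac{w_{\mathrm{NMA}}}{w_{\mathrm{PMA}}} < \frac{2}{3}
\quad\Longrightarrow\quad
\frac{I_{\mathrm{NMA}}}{I_{\mathrm{PMA}}} > \left(\frac{3}{2}\right)^{2} = 2.25
\]
```

Under these assumptions, an interval less than two thirds as wide corresponds to more than 2.25 times the statistical information available from the pairwise meta-analysis.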

Discussion

This study found that among 49 treatment comparisons deemed important for guideline development and clinical practice, the probability of producing strong evidence against the null hypothesis was 20 percentage points higher with prospectively planned, living network meta-analysis than with living pairwise meta-analysis that was based on direct evidence only. Strong evidence became available four years earlier with network meta-analysis than with pairwise meta-analysis. Of note, studies comparing the two treatments of interest continued to be published even after strong evidence had become available. This is an important finding with implications for clinical research, especially in the context of the debate about research waste.20363738 As per the inclusion criteria, the findings of this study apply to treatment comparisons for which a considerable amount of data has accumulated over at least 10 years.

Strengths and weaknesses of this study

Several authors have argued that network meta-analyses and indirect treatment comparisons should be more frequently used to inform healthcare and regulatory decisions.3940414243 One of these studies analysed a network of interventions for primary open angle glaucoma and found that network meta-analysis showed the advantage of prostaglandins 10 years before they were recommended in clinical guidelines.39 In our study we empirically assessed the frequency of and time to strong evidence in comparative effectiveness research using network or traditional pairwise meta-analysis. Although the sample size was small (49 comparisons) we believe it is likely to represent situations where guideline developers and clinical decision makers might consider network meta-analysis.

We mimicked the situation of a prospectively planned living network or pairwise meta-analysis and asked clinical experts to choose the treatment comparisons that were of topical interest during the period the evidence accumulated. We restricted the data to networks that did not show evidence of statistical inconsistency for the comparisons of interest, and excluded only five networks with evidence of inconsistency; this is in line with previous studies that have shown that direct and indirect comparisons in a network disagree in about 10% of comparisons.4445 However, we cannot rule out inconsistency in some of the comparisons evaluated, because statistical tests for inconsistency have low power.4647 One in eight networks was previously found to show evidence of inconsistency using the design-by-treatment test; this means that our methods would not be applicable in, on average, one in eight networks.44

Our study has some limitations. We reanalysed published networks that did not show statistical inconsistency, but we did not examine the overall quality of the evidence they provided. We acknowledge that strong evidence against the null hypothesis does not necessarily translate into strong recommendations.48 Guideline panels typically consider the quality of evidence in the results from meta-analysis before making recommendations, often using the GRADE system.49 Evaluation of the confidence in the results from network meta-analysis is a matter of ongoing research,505152 and none of the included networks in our database attempted such an evaluation. We did not consider other components such as the risk of publication bias, the limitations of the RCTs included in the networks, or the comprehensiveness of the literature search and accuracy of the data extraction. Selective publication of studies of the comparison of interest or of studies that contributed indirect evidence, or a high risk of bias in the conduct and analysis of studies, could have affected our conclusions. Finally, we acknowledge that evidence against the null hypothesis based on P values might be of greater interest to regulatory agencies than to guideline developers. Interpretation of the 95% confidence interval of the treatment effect in the context of worthwhile effects is more useful for decision making.5354 Although recently published networks, such as those included here, conform with high methodological standards,55 the strength of recommendations from pairwise and network meta-analyses should be compared in future studies.

Relation to other studies and implications

Protocols for living pairwise and network meta-analyses have been published recently,565758 and health technology assessment agencies, the World Health Organization, and drug licensing bodies have recognised the value of synthesising direct and indirect evidence in network meta-analysis.19214159 The concept of living meta-analysis is in line with the goal to continuously update guidelines and to promptly translate results to recommendations, evidence summaries, and decision aids.6061 Our findings suggest that the gain from including both direct and indirect evidence when synthesising data from clinical research is substantial, as it can considerably reduce the time to strong evidence against the null hypothesis.

We found that for about one in four of the comparisons, the treatments of interest had not been directly compared in any RCT. This might be partly explained by the tendency of pharmaceutical companies to test drugs against placebo or suboptimal interventions, rather than the reference treatment given in routine practice.62 Taking into account direct and indirect evidence, the median time to obtain strong evidence against the null hypothesis about comparisons of interest was 19 years. Consequently, even in cases where direct evidence exists, it is important to strengthen the evidence base using network meta-analysis.

Unanswered questions and future work

Our results indicate that living network meta-analysis has the potential to reduce research waste and optimise the use of available evidence.17 Further research is needed to better understand the role of living meta-analysis in the prevention of research waste. Previous studies showed that the impact of available data on the design of subsequent research is generally low.63646566 We found that in about half of the comparisons for which strong evidence emerged from living network meta-analysis, further studies were conducted after the evidence was already available. We did not assess these studies to determine whether they were scientifically and ethically justified but are planning such work in the future.

Methodologists are debating the use of sequential methods for cumulative meta-analysis. In particular, there is concern about encouraging inference based on statistical rather than clinical significance (see box). The statistical routines are not widely available, and guidance and tutorials are lacking. Software tools and a template protocol for living systematic reviews are urgently needed. Further development of the methodology, such as the most appropriate approach to dealing with heterogeneity in a sequential meta-analysis, is another priority.1122

Box 1 Do researchers agree on the use of sequential methods in meta-analysis?

Consensus on the appropriateness of applying sequential methods in meta-analysis is lacking. The arguments against their use are mainly twofold: the suspicion that sequential methods encourage inferences on the basis of statistical significance and concerns about meta-analysis influencing decisions on the design of future trials.12 In our view, interpretation of meta-analysis results should emphasise the uncertainty surrounding effect estimates, irrespective of the use of sequential analysis. Uncertainty over stopping decisions can be expressed and inspected through repeated confidence intervals, which can be drawn in a forest plot along with conventional confidence intervals (see appendix M).34

Concerns about the influence of meta-analyses on future primary research have also contributed to the scepticism. In particular, detractors of the notion of sequential meta-analysis have questioned the analogy between stopping trials and stopping meta-analyses and wondered whether a decision of “no further updating warranted” would be reasonable.2 Editors of Cochrane review groups have indeed “closed a review,” judging that it is not likely that further evidence would challenge current conclusions, and therefore new trials are deemed unnecessary, costly, and unethical.56 Authors of systematic reviews are not in a position to decide whether further studies are done and to define their design. What they should do, however, is to provide recommendations for research, to identify potential gaps in the existing evidence base, and to discuss the implications of their reviews for future research.237891011

The application of sequential methods in meta-analysis is also controversial; this controversy is mainly driven by the nature of their use and interpretation in practice. Sequential methods are often used to correct for multiple testing when presenting and interpreting evidence from a systematic review. In our view, no multiplicity is induced in the conventional process of performing meta-analysis as a retrospective activity: researchers simply synthesise what is already known and are not in a position to decide on carrying out further studies.2 Retrospective application of sequential methods in meta-analysis in line with recommendations for cumulative meta-analysis7 should be done for illustrative purposes only.

However, in a prospective meta-analysis setting where studies are planned and analysed sequentially to answer a research question, control of type I error through sequential methods is desirable.12 Researchers undertaking such living systematic reviews have to decide a priori and describe in a protocol if and when the review is going to be terminated. If this decision is linked to whether treatment effects provide strong evidence against the null hypothesis or not, adjustment for multiple testing becomes important. Empirical results presented in this paper should be seen as an illustration of hypothetical living networks of prospectively planned studies rather than as an attempt to define the need of future research in the examined healthcare areas. Depending on whether researchers plan to simply describe implications for research, provide concrete recommendations for filling evidence gaps, or actively direct future research, use of sequential methods might be less or more imperative. In our view, the optimal use of living systematic reviews will be to highlight gaps and certainties in a body of evidence and provide research funders and regulators with the best evidence to decide whether new primary research is warranted and, if so, for which treatment comparisons.

Undertaking living systematic reviews within a bayesian framework is a possible alternative.3 13 The estimated treatment effects after each update form a prior for future updates. Then, approaches to monitor the accumulated evidence can be informal (eg, inspecting the estimated treatment effects and their precision without formally specifying a stopping criterion) or formal (eg, by defining a loss function or a boundary as a basis for monitoring).14 15
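The idea that each update's posterior serves as the prior for the next can be sketched with a conjugate normal model. This is a deliberately simplified illustration: it assumes a common (fixed) effect, known variances, and a vague normal prior, and it ignores between-study heterogeneity. The log odds ratios, variances, and function name are hypothetical.

```python
from statistics import NormalDist


def update_normal(prior_mean, prior_var, estimate, variance):
    """Combine a normal prior with a normal likelihood (known variance)."""
    post_precision = 1 / prior_var + 1 / variance
    post_mean = (prior_mean / prior_var + estimate / variance) / post_precision
    return post_mean, 1 / post_precision


# Hypothetical log odds ratios and their variances from three successive updates.
new_evidence = [(-0.30, 0.09), (-0.10, 0.04), (-0.25, 0.02)]

mean, var = 0.0, 100.0  # vague prior (an assumption of this sketch)
z = NormalDist().inv_cdf(0.975)
history = []

for estimate, variance in new_evidence:
    # The posterior from the previous update is used as the prior here.
    mean, var = update_normal(mean, var, estimate, variance)
    lower, upper = mean - z * var ** 0.5, mean + z * var ** 0.5
    history.append((mean, var, lower, upper))
    print(f"posterior mean {mean:.3f}, 95% credible interval {lower:.3f} to {upper:.3f}")
```

Printing the posterior mean and interval after each update corresponds to the informal monitoring described above; a formal approach would add an explicit stopping rule, such as a loss function or a boundary, which this sketch does not attempt.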

  • 1. Chalmers TC, Lau J. Meta-analytic stimulus for changes in clinical trials. Stat Methods Med Res 1993;2:161-72.

  • 2. Higgins JPT. Comment on “Trial sequential analysis: methods and software for cumulative meta-analyses” by Wetterslev and colleagues. Cochrane Methods. Cochrane DB Syst Rev 2012;(Suppl 1):1-56.

  • 3. Higgins JPT, Whitehead A, Simmonds M. Sequential methods for random-effects meta-analysis. Stat Med 2011;30:903-21.

  • 4. Jennison C, Turnbull BW. Repeated confidence intervals for group sequential clinical trials. Control Clin Trials 1984;5:33-45.

  • 5. Lacasse Y, Cates CJ, McCarthy B, Welsh EJ. This Cochrane Review is closed: deciding what constitutes enough research and where next for pulmonary rehabilitation in COPD.

  • 6. Sutton AJ, Donegan S, Takwoingi Y, Garner P, Gamble C, Donald A. An encouraging assessment of methods to inform priorities for updating systematic reviews. J Clin Epidemiol 2009;62:241-51.

  • 7. Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to Meta-Analysis. Wiley; 2011:434.

  • 8. Chapman E, Reveiz L, Chambliss A, Sangalang S, Bonfill X. Cochrane systematic reviews are useful to map research gaps for decreasing maternal mortality. J Clin Epidemiol 2013;66:105-12.

  • 9. Higgins JP, Green S, Scholten RJ. Maintaining Reviews: Updates, Amendments and Feedback. In: Higgins JP, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions. Chichester, UK: Wiley; 2008:31-49 [cited 2014 Jul 16].

  • 10. Habre C, Tramèr MR, Pöpping DM, Elia N. Ability of a meta-analysis to prevent redundant research: systematic review of studies on pain from propofol injection. BMJ 2014;348:g5219.

  • 11. Elliott JH, Turner T, Clavisi O, Thomas J, Higgins JPT, Mavergames C, et al. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med 2014;11:e1001603.

  • 12. Whitehead A. A prospectively planned cumulative meta-analysis applied to a series of concurrent clinical trials. Stat Med 1997;16:2901-13.

  • 13. Spence GT, Steinsaltz D, Fanshawe TR. A Bayesian approach to sequential meta-analysis. Stat Med 2016: 1 Aug.

  • 14. Spiegelhalter DJ, Abrams KR, Myles JP. Randomised controlled trials. In: Bayesian approaches to clinical trials and health-care evaluation. Wiley; 2003:181-249.

  • 15. Freedman LS, Spiegelhalter DJ. Comparison of Bayesian with group sequential methods for monitoring clinical trials. Control Clin Trials 1989;10:357-67.

In the present work, we focused on detecting differences between interventions. It is also possible to construct futility stopping boundaries,18 and empirical evidence about the relative advantage of network meta-analysis in this context is required. Such an extension might be particularly useful considering that we could not detect statistically significant differences in two out of three comparisons. We selected only one treatment comparison for each network, although decision making may involve several treatments included in the network. Future studies should investigate the superiority or non-inferiority of several competing treatments.


Continuously updated systematic reviews to inform guidelines and clinical decision making may provide strong evidence against the null hypothesis more frequently and earlier if both direct and indirect accumulating evidence is considered within the framework of a living network meta-analysis.

What is already known on this topic

  • Network meta-analysis is an extension to conventional meta-analysis, which includes both direct and indirect evidence on the comparative effectiveness of multiple treatments

  • Network meta-analysis might produce strong evidence against the null hypothesis on the comparative effectiveness of treatments earlier than standard pairwise meta-analysis but requires more assumptions and advanced statistical methods

  • Sequential methods for analysis of “living” network meta-analysis of prospectively planned randomised controlled trials have recently become available, allowing the continuous updating of evidence

What this study adds

  • Network meta-analysis was 20% more likely to provide strong evidence against the null hypothesis of treatment differences than pairwise meta-analysis

  • Network meta-analysis provided strong evidence against the null hypothesis four years earlier than pairwise meta-analysis

  • Prospectively planned living network meta-analysis can facilitate timely recommendations and contribute to reducing research waste by providing strong evidence against the null hypothesis earlier than living pairwise meta-analysis


We thank the following clinical experts who identified clinically relevant comparisons of highest interest and provided a relevant reference to guidelines (numbers in brackets refer to the relevant references to the included network meta-analyses, which can be found in appendix C): Graziella Filippini for reviewing a network on multiple sclerosis,[4] Maria Kyrgiou for reviewing a network on labour induction,[10] Niklaus Meier for reviewing a network on migraine,[44] Nikolaos Pandis for reviewing three networks on dentistry and periodontology,[1, 7, 34] and Stephan Reichenbach for reviewing three networks on rheumatology.[9, 12, 41] We also thank Akhil Parashar, Karan Sud, Apostolos Tsapas, Sara Mazzucco, and Klaus Linde for discussions on the process to select the comparison of highest interest in a previous version of this project; and Maria Petropoulou, Areti-Angeliki Veroniki, Patricia Rios, Afshin Vafaei, Wasifa Zarin, Myrsini Giannatsi, Shannon Sullivan, and Anna Chaimani for their contribution to building the database of network meta-analyses that we used for this study.


  • Contributors: GS and ME conceived and designed the study. TAF, AC, ACT, SES, and GCMS assisted in the design of the study, indicated the relevant comparisons of highest interest for their specialties of expertise, and provided a relevant reference to guidelines. AN compiled the database of eligible networks and performed the analysis. AN drafted the manuscript. TAF and AC assessed network meta-analyses pertaining to mental health, ACT and SES assessed networks in the specialties of endocrinology, dermatology, gastroenterology, obstetrics, oncology, anaesthesiology, and hepatology, and GCMS assessed network meta-analyses from the specialty of cardiology. For networks examining other conditions, external medical doctors or experienced researchers were approached. All authors critically revised the manuscript, interpreted the results, and performed a critical review of the manuscript for intellectual content. GS and ME produced the final version of the submitted article, and all coauthors approved it. AN and GS had full access to all data in the study and take responsibility for the integrity of the data and the accuracy of the analysis. GS and ME are the guarantors.

  • Funding: ACT is funded by a tier 2 Canada research chair in knowledge synthesis. SES is funded by a tier 1 Canada research chair in knowledge translation. GS received funding from a Horizon 2020 Marie-Curie individual fellowship (grant No 703254). AC is supported by the National Institute for Health Research Oxford cognitive health clinical research facility. The sponsors had no influence on the design, analysis, or reporting of this study, or on the preparation of the manuscript.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form and declare: no support from any organisation for the submitted work; TAF has received lecture fees from Eli Lilly, Janssen, Meiji, Mitsubishi-Tanabe, MSD, and Pfizer and consultancy fees from Takeda Science Foundation. He has received research support from Mochida and Mitsubishi-Tanabe; no other relationships or activities that could appear to have influenced the submitted work.

  • Ethical approval: Not required.

  • Data sharing: The list of included studies and details about the data are reported in the appendix. Study level data and R code are available in a GitHub repository. Instructions to access them can be found in appendix N.

  • Transparency: The lead author (GS) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: