Original Article
Trial sequential analysis may establish when firm evidence is reached in cumulative meta-analysis
Introduction
Meta-analyses of randomized trials increase the power and precision of estimated intervention effects. However, meta-analyses may also report spuriously significant results (type I errors) that should have been “nonsignificant.” Such spurious results may arise from systematic errors (bias) or from random errors due to repeated significance testing when meta-analyses are updated with new trials. First, bias from trials with low methodological quality [1], [2], [3], [4], outcome measure bias [5], [6], publication bias [7], early stopping for benefit [8], and small trial bias (which may be a proxy for the other bias mechanisms) [9] may result in spurious P-values. Second, repeated testing of significance as data accrue is bound to lead to rejection of the null hypothesis, within a realistic time span, with probability 1 [10]. Statistically significant results from small trials are often overruled when results from adequately powered and bias-protected trials emerge [11], [12].
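The inflation of type I error under repeated testing can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes equally informative trials, a true null effect, and an unadjusted two-sided threshold of |z| > 1.96 at every update; the function name and defaults are hypothetical.

```python
import random
from math import sqrt

def any_look_rejection_rate(n_looks=10, n_sims=2000, z_crit=1.96, seed=1):
    """Share of simulated null meta-analyses whose cumulative z
    exceeds z_crit in absolute value at one or more updates."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for k in range(1, n_looks + 1):
            # Assumption: each new trial adds an independent N(0, 1) increment
            total += rng.gauss(0.0, 1.0)
            z = total / sqrt(k)  # cumulative z after k equally informative trials
            if abs(z) > z_crit:
                rejections += 1
                break  # stop at the first nominally significant update
    return rejections / n_sims

rate = any_look_rejection_rate()
print(f"Share of null meta-analyses ever crossing |z| > 1.96 in 10 updates: {rate:.3f}")
```

With a single look the rate stays near the nominal 5%, whereas it grows with the number of updates, which is the motivation for adjusted monitoring boundaries.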
Armitage [13] and Pocock [14] introduced group sequential analysis with planned group sizes in a single randomized trial. Lan and DeMets [15] extended the methodology using an α-spending function to allow flexible unplanned monitoring of a randomized trial. They introduced the cumulative z-curve modeled as a Brownian motion and an α-spending function according to O'Brien and Fleming for construction of discrete sequential boundaries. If a treatment effect larger than expected occurs, a cumulative z-curve crossing the discrete sequential boundary in a group sequential analysis suggests early termination of the trial.
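For orientation, the O'Brien-Fleming-type α-spending function is commonly written as α(t) = 2 − 2Φ(z_{1−α/2}/√t), where t is the fraction of the required information accrued so far. The sketch below evaluates only this spending function; it is an illustration with hypothetical names, and it does not compute the discrete boundaries themselves, which require numerical integration over the joint distribution of the interim z-values.

```python
from math import sqrt
from statistics import NormalDist

def obrien_fleming_alpha_spent(t, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1)
    under the O'Brien-Fleming-type spending function."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must lie in (0, 1]")
    std_normal = NormalDist()
    z_final = std_normal.inv_cdf(1.0 - alpha / 2.0)  # e.g. about 1.96 for alpha = 0.05
    return 2.0 * (1.0 - std_normal.cdf(z_final / sqrt(t)))

for t in (0.25, 0.50, 0.75, 1.00):
    print(f"t = {t:.2f}: cumulative alpha spent = {obrien_fleming_alpha_spent(t):.5f}")
```

Very little α is spent at early looks, so early crossings require extreme z-values, and the full α is spent only when the required information size is reached.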
Pogue and Yusuf [16], [17] used discrete sequential boundaries to suggest when firm evidence in cumulative meta-analyses is reached. They calculated an “optimal information size” based on the assumption that participants in a meta-analysis originated from one trial [16], [17]. We suggest calling such analyses trial sequential analyses (TSAs) using trial sequential monitoring boundaries (TSMB).
TSA necessitates the use of an information size. We refer to the total number of participants in the meta-analysis, N, as the accrued information size (AIS). The information size based on an a priori anticipated intervention effect (APIS) is the sample size required for an adequately powered trial to detect a prespecified intervention effect with given risks of type I error (α) and type II error (β). The calculation of the information size in each meta-analysis is the cornerstone of analyzing the strength of the evidence. However, there is no standard method to calculate the information size. Pogue and Yusuf defined a target number of meta-analysis participants, or information size, as the size suggested for a single trial of the same therapeutic problem assuming an a priori relative risk reduction (RRR) of 20%. The a priori RRR was based on experiences from related areas in cardiology [16], [17]. Drawing inference from one intervention area to another may be problematic, and randomized trials may be biased [1], [2], [3], [4]. Generally, trials with high-bias risk (i.e., inadequate generation of the allocation sequence, inadequate allocation concealment, or inadequate double blinding) overestimate intervention effects compared to trials with low-bias risk [1], [2], [3], [4], [9]. Therefore, it seems more realistic and reliable to base the calculation on an effect magnitude that does not contradict the intervention effect estimated by the low-bias trials. Accordingly, we suggest applying the estimate of the intervention effect from a meta-analysis of the low-bias risk trials to calculate a low-bias information size (LBIS).
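As a rough illustration of an information size for a binary outcome, the sketch below uses the conventional two-group sample size formula N = 4(z_{1−α/2} + z_{1−β})² p̄(1 − p̄)/δ², with p̄ the mean of the two event proportions and δ their difference. The formula choice, the defaults α = 0.05 and β = 0.20, and the function name are assumptions made for this example and are not stated in this excerpt.

```python
from math import ceil
from statistics import NormalDist

def information_size(control_risk, rrr, alpha=0.05, beta=0.20):
    """Total participants (both groups) needed to detect a relative risk
    reduction `rrr` given the control-group event proportion."""
    if not 0.0 < control_risk < 1.0 or not 0.0 < rrr < 1.0:
        raise ValueError("control_risk and rrr must lie strictly between 0 and 1")
    p_control = control_risk
    p_experimental = control_risk * (1.0 - rrr)
    p_mean = (p_control + p_experimental) / 2.0
    delta = p_control - p_experimental            # absolute risk difference
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return ceil(4.0 * (z_alpha + z_beta) ** 2 * p_mean * (1.0 - p_mean) / delta ** 2)

# Illustrative inputs: a control event proportion of 30% with RRRs of 20% and 15%
for rrr in (0.20, 0.15):
    print(f"RRR = {rrr:.0%}: information size = {information_size(0.30, rrr)}")
```

A smaller anticipated RRR, such as an estimate drawn from low-bias trials only, leads to a larger required information size under this formula.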
Meta-analyzing trials as if participants originated from one trial may be biased because of trial heterogeneity [18], [19], [20]. Such heterogeneity influences whether to use a fixed- or a random-effects model [18]. Likewise, heterogeneity should be considered when the information size is calculated in TSA. Therefore, we suggest that LBIS be adjusted for heterogeneity, so that an increase in heterogeneity results in a larger information size: the low-bias, heterogeneity-adjusted information size (LBHIS). This may amount to “moving the goal post” [21], but it may be needed to establish firm evidence.
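This excerpt does not state the adjustment formula. One simple choice that has the described property (a larger information size with more heterogeneity) divides the unadjusted size by 1 − I², where I² is the inconsistency statistic expressed as a proportion. The sketch below assumes that choice; the function name and the LBIS of 2000 participants are hypothetical.

```python
from math import ceil

def heterogeneity_adjusted_information_size(lbis, i_squared):
    """Inflate an information size by 1 / (1 - I^2); assumed adjustment,
    with i_squared given as a proportion in [0, 1)."""
    if lbis <= 0:
        raise ValueError("lbis must be positive")
    if not 0.0 <= i_squared < 1.0:
        raise ValueError("i_squared must lie in [0, 1)")
    return ceil(lbis / (1.0 - i_squared))

# Hypothetical LBIS of 2000 participants at three illustrative levels of I^2
for i_squared in (0.0, 0.28, 0.41):
    lbhis = heterogeneity_adjusted_information_size(2000, i_squared)
    print(f"I^2 = {i_squared:.0%}: LBHIS = {lbhis}")
```

With I² = 0 the information size is unchanged, and the adjusted size grows without bound as I² approaches 1, which is why the function rejects I² = 1.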
We aimed to design a frequentist TSA that prevents premature declaration of superiority of an intervention based on a fortuitously low nominal P-value. TSA offers other advantages: it allows re-estimation of the sample size needed, provides incentives for new high-quality trials, and may stop further trials if the intervention benefits are remote or nonexistent or when the intervention effect is dramatic and no more trials are needed. We therefore examined randomly selected neonatal meta-analyses of binary outcomes applying TSA with AIS, LBIS, and LBHIS. Further, we compared these TSAs with TSA using an information size based on an a priori chosen RRR of 15% (APIS).
Section snippets
Material
We randomly selected six meta-analyses [22], [23], [24], [25], [26], [27] (Table 1) with at least five trials reporting a binary primary outcome from the 170 Cochrane Neonatal Group systematic reviews in The Cochrane Library (Issue 2, 2004) [28].
Cumulative z-curves, information size, and trial sequential monitoring boundaries
The sample size needed in the meta-analysis is at least the sample size needed in a single trial with a given α and β. We constructed the cumulative z-curve of the meta-analyses [22], [23], [24], [25], [26], [27] and assessed their crossing of the
TSA of six meta-analyses of interventions in neonatal patients
The binary outcomes were mortality or pneumothorax, respiratory failure, vascular compromise, infection, enterocolitis, and mortality or morbidity [22], [23], [24], [25], [26], [27] (Table 1). The mean event proportion in the control groups was 30% (range 17%–58%). Four meta-analyses were without heterogeneity (I² = 0%) [23], [24], [25], [27] and two showed heterogeneity (I² = 28% [22] and 41% [26]). Cumulative z-curves crossing P < 0.05 were observed in 5/6 meta-analyses (Fig. 1, Fig. 2, Fig. 3,
Discussion
Our study represents the largest number of TSAs of meta-analyses conducted so far and demonstrates the following. First, potentially spurious P-values are prevalent in meta-analyses, thus supporting the previous simulation results [10]. We identified such values in 5/6 meta-analyses using TSMB_AIS or TSMB_LBIS. Second, TSMB_AIS seems insufficient to detect all potentially spurious P-values, whereas most such values may be eliminated by TSMB_LBHIS or TSMB_APIS (Fig. 1, Fig. 2, Fig. 3, Fig. 4 and //ctu.rh.dk
Conclusion
TSA with TSMB_LBHIS and TSMB_APIS considering bias risk, heterogeneity, and WIF may reduce the risk of type I error compared to P < 0.05. TSA conducted with both TSMB_LBHIS and TSMB_APIS may guide investigators to plan a realistic and worthwhile sample size in a new trial based on the available evidence. TSA conducted with both TSMB_LBHIS and TSMB_APIS may have the potential to prevent the initiation of unnecessary trials when firm evidence has been obtained. TSA may offer a flexible tool to analyze
Acknowledgments
We thank the peer reviewers P.J. Devereaux and J. Hilden for valuable comments and suggestions on previous versions of this manuscript.
References (46)
- et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet (1998)
- et al. Uncertainty of the time of first significance in random effects cumulative meta-analysis. Control Clin Trials (1996)
- et al. Effect sizes in cumulative meta-analyses of mental health randomized trials evolved over time. J Clin Epidemiol (2004)
- et al. Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Control Clin Trials (1997)
- et al. Overcoming the limitations of current meta-analysis of randomised controlled trials. Lancet (1998)
- et al. Meta-analysis. Is moving the goal post the answer? Lancet (1998)
- et al. Computations for group sequential boundaries using the Lan-DeMets spending function method. Control Clin Trials (2000)
- et al. Antioxidant supplements for prevention of gastrointestinal cancers: a systematic review and meta-analysis. Lancet (2004)
- et al. Vitamin C and vitamin E in pregnant women at risk for pre-eclampsia (VIP trial): randomised placebo-controlled trial. Lancet (2006)
- et al. Strengths and limitations of meta-analysis: larger studies may be more reliable. Control Clin Trials (1997)
- The importance of effect mechanism in the design and interpretation of clinical trials: the role of magnesium in acute myocardial infarction. Prog Cardiovasc Dis
- Recursive cumulative meta-analysis: a diagnostic for the evolution of total randomized evidence from group and individual patient data. J Clin Epidemiol
- Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials. J Am Coll Cardiol
- Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA
- Reported methodological quality and discrepancies between large and small randomized trials in meta-analyses. Ann Int Med
- Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA
- Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ
- Registering clinical trials. JAMA
- Randomized trials stopped early for benefit: a systematic review. JAMA
- Contradicted and initially stronger effects in highly cited clinical research. JAMA
- Sequential analysis in therapeutic trials. Annu Rev Med
Preliminary results included in this article have been presented in part as three abstracts at the 26th annual meeting of The Society for Clinical Trials, 22–25 May, 2005, Portland, OR, and as an abstract at the XIII Cochrane Colloquium, 22–26 October, 2005, Melbourne, Australia; these abstracts are listed in Appendix III.