Original Article
Trial sequential analysis may establish when firm evidence is reached in cumulative meta-analysis

https://doi.org/10.1016/j.jclinepi.2007.03.013

Abstract

Background and Objective

Cumulative meta-analyses are prone to produce spurious P < 0.05 because of repeated testing of significance as trial data accumulate. Information size in a meta-analysis should at least equal the sample size of an adequately powered trial. Trial sequential analysis (TSA) corresponds to group sequential analysis of a single trial and may be applied to meta-analysis to evaluate the evidence.

Study Design and Setting

Six randomly selected neonatal meta-analyses with at least five trials reporting a binary outcome were examined. The low-bias, heterogeneity-adjusted information size and the information size determined from an assumed intervention effect of 15% were calculated and used to construct trial sequential monitoring boundaries. We assessed whether the cumulative z-curves crossed P = 0.05 and the boundaries.

Results

Five meta-analyses showed early potentially spurious P < 0.05 values. In three significant meta-analyses the cumulative z-curves crossed both boundaries, establishing firm evidence of an intervention effect. In two nonsignificant meta-analyses the cumulative z-curves crossed P = 0.05, but never the boundaries, demonstrating early potentially spurious P < 0.05 values. In one nonsignificant meta-analysis the cumulative z-curves never crossed P = 0.05 or the boundaries.

Conclusion

TSAs may establish when firm evidence is reached in meta-analysis.

Introduction

Meta-analyses of randomized trials increase the power and precision of the estimated intervention effects. However, meta-analyses may also report spurious significant results (type I errors) where the correct conclusion would have been "nonsignificant." Such spurious results may arise from systematic errors (bias) or from random errors due to repeated significance testing when meta-analyses are updated with new trials. First, bias from trials with low methodological quality [1], [2], [3], [4], outcome measure bias [5], [6], publication bias [7], early stopping for benefit [8], and small-trial bias (which may be a proxy for the other bias mechanisms) [9] may result in spurious P-values. Second, repeated significance testing as data accrue is bound to lead to rejection of the null hypothesis with probability 1 if testing continues long enough, and this may happen within a realistic time span [10]. Statistically significant small trials are often overruled when results from adequately powered and bias-protected trials emerge [11], [12].
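
The effect of repeated significance testing can be illustrated with a small simulation. The sketch below is illustrative and not the authors' method: it assumes normally distributed data with no true intervention effect, models the cumulative z-statistic as a sum of independent standard normal increments (a discrete Brownian motion), and applies an unadjusted two-sided test at α = 0.05 after every look. The function name and its parameters are hypothetical.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_looks: int, n_sims: int = 5000, seed: int = 2007) -> float:
    """Hypothetical helper: estimate how often an unadjusted two-sided test at
    alpha = 0.05 is 'significant' at least once across n_looks equally spaced
    looks when the null hypothesis is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
    hits = 0
    for _ in range(n_sims):
        score = 0.0  # running sum of standardized increments
        for look in range(1, n_looks + 1):
            score += rng.gauss(0.0, 1.0)  # one new batch of data, no true effect
            z = score / look ** 0.5       # cumulative z-statistic after this look
            if abs(z) >= z_crit:
                hits += 1
                break                     # declared "significant"; stop this run
    return hits / n_sims


if __name__ == "__main__":
    for looks in (1, 5, 20):
        print(looks, round(false_positive_rate(looks), 3))
```

With a single look the estimated rate stays near the nominal 0.05; as the number of looks grows it climbs well above 0.05, which is the random-error mechanism described above.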

Armitage [13] and Pocock [14] introduced group sequential analysis with planned group sizes in a single randomized trial. Lan and DeMets [15] extended the methodology with an α-spending function that allows flexible, unplanned monitoring of a randomized trial. They introduced the cumulative z-curve, modeled as a Brownian motion, and an α-spending function following O'Brien and Fleming for constructing discrete sequential boundaries. If a treatment effect larger than expected occurs, a cumulative z-curve crossing the discrete sequential boundary suggests early termination of the trial.
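
The O'Brien-Fleming-type α-spending function of Lan and DeMets can be written as α(t) = 2 − 2Φ(z_{1−α/2}/√t), where t is the fraction of the planned information accrued so far and Φ is the standard normal distribution function. The Python sketch below evaluates it and also computes the rough boundary shape z_{1−α/2}/√t. That boundary formula is an approximation assumed for illustration only: exact Lan-DeMets boundaries require recursive numerical integration over the increments of the z-process, which is omitted here. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent by the Lan-DeMets O'Brien-Fleming-type
    spending function at information fraction t, with 0 < t <= 1."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z / t ** 0.5)


def obf_boundary_approx(t: float, alpha: float = 0.05) -> float:
    """Rough O'Brien-Fleming-shaped boundary z_{1-alpha/2} / sqrt(t).
    Not the exact Lan-DeMets boundary, which needs numerical integration."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    return _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) / t ** 0.5


if __name__ == "__main__":
    for t in (0.25, 0.5, 0.75, 1.0):
        print(f"t={t:.2f}  alpha spent={obf_alpha_spent(t):.5f}  "
              f"approx boundary={obf_boundary_approx(t):.2f}")
```

Very little α is spent early, so the boundary lies far above 1.96 when few participants have accrued and approaches it only as t reaches 1.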

Pogue and Yusuf [16], [17] used discrete sequential boundaries to suggest when firm evidence is reached in cumulative meta-analyses. They calculated an "optimal information size" based on the assumption that the participants in a meta-analysis originated from one trial [16], [17]. We suggest calling such analyses trial sequential analyses (TSAs), using trial sequential monitoring boundaries (TSMB).
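
A cumulative z-curve for a meta-analysis is obtained by re-pooling the trials each time a new one is added and recording the pooled z-statistic against the accrued number of participants. The sketch below assumes a fixed-effect inverse-variance meta-analysis of log relative risks with a 0.5 continuity correction for zero cells; these modeling choices are assumptions, not stated in this section of the article, and the trial counts in the usage block are made up.

```python
import math
from typing import List, Sequence, Tuple

Trial = Tuple[int, int, int, int]  # (events_trt, n_trt, events_ctl, n_ctl)


def log_rr_and_variance(e_t: int, n_t: int, e_c: int, n_c: int) -> Tuple[float, float]:
    """Log relative risk and its large-sample variance for one trial."""
    if 0 in (e_t, e_c, n_t - e_t, n_c - e_c):
        # Assumed convention: add 0.5 to every cell when any cell is zero.
        e_t, e_c, n_t, n_c = e_t + 0.5, e_c + 0.5, n_t + 1, n_c + 1
    log_rr = math.log((e_t / n_t) / (e_c / n_c))
    variance = 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c
    return log_rr, variance


def cumulative_z_curve(trials: Sequence[Trial]) -> List[Tuple[int, float]]:
    """Return (accrued participants, cumulative z) after each trial, in the given
    (chronological) order, using fixed-effect inverse-variance pooling.
    For a harmful outcome, negative z favours the experimental intervention."""
    sum_w = sum_wy = 0.0
    accrued = 0
    curve = []
    for e_t, n_t, e_c, n_c in trials:
        y, v = log_rr_and_variance(e_t, n_t, e_c, n_c)
        sum_w += 1 / v
        sum_wy += y / v
        accrued += n_t + n_c          # uses the uncorrected group sizes
        pooled = sum_wy / sum_w
        curve.append((accrued, pooled * math.sqrt(sum_w)))  # pooled / SE
    return curve


if __name__ == "__main__":
    hypothetical_trials = [(8, 60, 14, 62), (15, 120, 22, 118), (30, 250, 41, 255)]
    for accrued, z in cumulative_z_curve(hypothetical_trials):
        print(accrued, round(z, 2))
```

A random-effects model would widen the pooled standard error when trials disagree; the fixed-effect version is used here only because it keeps the sketch short.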

TSA requires an information size. We refer to the total number of participants in the meta-analysis, N, as the accrued information size (AIS). The information size based on an a priori anticipated intervention effect (APIS) is the sample size required for an adequately powered trial to detect a prespecified intervention effect with a given risk of type I error (α) and type II error (β). Calculating the information size for each meta-analysis is the cornerstone of analyzing the strength of the evidence. However, there is no standard method to calculate the information size. Pogue and Yusuf defined a target number of meta-analysis participants, or information size, as the size suggested for a single trial addressing the same therapeutic question, assuming an a priori relative risk reduction (RRR) of 20%. This a priori RRR was based on experience from related areas of cardiology [16], [17]. Drawing inferences from one intervention area to another may be problematic, and randomized trials may be biased [1], [2], [3], [4]. Generally, trials with high risk of bias (i.e., inadequate generation of the allocation sequence, inadequate allocation concealment, or inadequate double blinding) overestimate intervention effects compared with trials with low risk of bias [1], [2], [3], [4], [9]. Therefore, it seems more realistic and reliable to base the calculation on an effect magnitude that does not contradict the intervention effect estimated by the low-bias trials. Accordingly, we suggest using the intervention effect estimated by a meta-analysis of the low-bias risk trials to calculate a low-bias information size (LBIS).
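
To make the APIS and LBIS definitions concrete, an information size can be computed from a control-group event proportion, an anticipated RRR, α, and β. The sketch below uses one standard normal-approximation formula for comparing two proportions with equal allocation, N = 4(z_{1−α/2} + z_{1−β})² p̄(1 − p̄)/δ², where δ = p_c × RRR is the absolute risk reduction and p̄ is the average of the control and experimental event proportions. The formula, the default β = 0.20, and the function name are assumptions for illustration; the article's own calculation is not reproduced in this section. For APIS the RRR is fixed a priori (15% in this study); for LBIS it would be taken from the meta-analysis of low-bias trials.

```python
import math
from statistics import NormalDist


def information_size(p_control: float, rrr: float,
                     alpha: float = 0.05, beta: float = 0.20) -> int:
    """Hypothetical helper: total participants (both arms) needed to detect a
    relative risk reduction `rrr` from control event proportion `p_control`
    with two-sided alpha and power 1 - beta (normal approximation)."""
    if not (0 < p_control < 1 and 0 < rrr < 1):
        raise ValueError("p_control and rrr must lie in (0, 1)")
    p_exp = p_control * (1 - rrr)      # anticipated experimental-arm proportion
    delta = p_control - p_exp          # absolute risk reduction
    p_bar = (p_control + p_exp) / 2    # average event proportion
    z = NormalDist().inv_cdf
    n = 4 * (z(1 - alpha / 2) + z(1 - beta)) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Illustrative inputs: 30% control event proportion, RRR = 15% (as for APIS)
    print(information_size(0.30, 0.15))
```

With these illustrative inputs the sketch asks for roughly 3,100 participants in total. A smaller RRR, such as one estimated from low-bias trials that show more modest effects, inflates the number further.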

Meta-analyzing trials as if all participants originated from one trial may be biased because of between-trial heterogeneity [18], [19], [20]. Such heterogeneity influences whether a fixed- or a random-effects model is used [18]. Likewise, heterogeneity should be considered when the information size is calculated in TSA. We therefore suggest adjusting LBIS for heterogeneity, so that an increase in heterogeneity results in a larger information size: the low-bias, heterogeneity-adjusted information size (LBHIS). This may seem like "moving the goal post" [21], but it may be needed to establish firm evidence.
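
One simple way to let heterogeneity enlarge the information size, assumed here for illustration, is to estimate I² from Cochran's Q and multiply LBIS by 1/(1 − I²): I² = 0 leaves LBIS unchanged, and I² approaching 1 inflates it without limit. The sketch below computes I² from per-trial effect estimates and variances and applies that factor. The function names and the LBIS value of 1000 in the usage block are hypothetical.

```python
import math
from typing import Sequence


def i_squared(effects: Sequence[float], variances: Sequence[float]) -> float:
    """I^2 = max(0, (Q - df) / Q), with Cochran's Q computed from
    fixed-effect inverse-variance weights."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching effects and variances for at least two trials")
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return 0.0 if q <= df else (q - df) / q


def heterogeneity_adjusted_information_size(lbis: float, i2: float) -> int:
    """Inflate a low-bias information size by the assumed factor 1 / (1 - I^2)."""
    if not 0 <= i2 < 1:
        raise ValueError("I^2 must lie in [0, 1)")
    return math.ceil(lbis / (1 - i2))


if __name__ == "__main__":
    # Hypothetical LBIS of 1000; I^2 of 0, 28% and 41% as reported in this article
    for i2 in (0.0, 0.28, 0.41):
        print(i2, heterogeneity_adjusted_information_size(1000, i2))
```

Other adjustments are possible; this factor is used only because it is simple and grows monotonically with I².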

We aimed to design a frequentist TSA that prevents premature declaration of an intervention's superiority prompted by a fortuitously low nominal P-value. TSA offers other advantages: it allows re-estimation of the required sample size, provides incentives for new high-quality trials, and may indicate that further trials are unnecessary, either because the intervention benefits are remote or nonexistent or because the intervention effect is dramatic. We therefore examined randomly selected neonatal meta-analyses of binary outcomes, applying TSA with AIS, LBIS, and LBHIS. Further, we compared these TSAs with a TSA using an information size based on an a priori chosen RRR of 15% (APIS).
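
The practical effect of the information-size choice can be seen by comparing one cumulative z-value against boundaries built from different information sizes. The sketch below is hypothetical: it uses the rough O'Brien-Fleming-shaped approximation z_{1−α/2}/√t rather than exact Lan-DeMets boundaries, caps the information fraction t = accrued/IS at 1, and uses invented numbers (the APIS value of 3109 is an illustrative figure, not one reported in the article).

```python
from statistics import NormalDist


def approx_boundary(accrued: int, info_size: float, alpha: float = 0.05) -> float:
    """Rough O'Brien-Fleming-shaped boundary at information fraction accrued / info_size."""
    t = min(accrued / info_size, 1.0)
    return NormalDist().inv_cdf(1 - alpha / 2) / t ** 0.5


def classify(z: float, accrued: int, info_size: float, alpha: float = 0.05) -> str:
    """Label a cumulative z-value relative to the nominal P = 0.05 line and the boundary."""
    nominal = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(z) >= approx_boundary(accrued, info_size, alpha):
        return "crosses monitoring boundary"
    if abs(z) >= nominal:
        return "P < 0.05 only (potentially spurious)"
    return "neither"


if __name__ == "__main__":
    z, accrued = -2.3, 800  # hypothetical cumulative z-value and accrued participants
    # With AIS the information size equals the accrued size, so t = 1 at the last trial.
    for label, info_size in (("AIS", 800), ("APIS", 3109)):
        print(label, classify(z, accrued, info_size))
```

Because the AIS equals the number of participants already accrued, t = 1 at the final analysis and the boundary collapses onto the nominal 1.96; a larger information size keeps t small and the boundary high. This illustrates one reason why boundaries based on AIS may fail to flag potentially spurious P-values.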


Material

We randomly selected six meta-analyses [22], [23], [24], [25], [26], [27] (Table 1) with at least five trials reporting a binary primary outcome from the 170 Cochrane Neonatal Group systematic reviews in The Cochrane Library (Issue 2, 2004) [28].

Cumulative z-curves, information size, and trial sequential monitoring boundaries

The sample size needed in the meta-analysis is at least the sample size needed in a single trial with a given α and β. We constructed the cumulative z-curve of the meta-analyses [22], [23], [24], [25], [26], [27] and assessed their crossing of the

TSA of six meta-analyses of interventions in neonatal patients

The binary outcomes were mortality or pneumothorax, respiratory failure, vascular compromise, infection, enterocolitis, and mortality or morbidity [22], [23], [24], [25], [26], [27] (Table 1). The mean event proportion in the control groups was 30% (range 17%–58%). Four meta-analyses had no heterogeneity (I² = 0) [23], [24], [25], [27], and two had heterogeneity (I² = 28% [22] and 41% [26]). Cumulative z-curves crossed P = 0.05 in 5/6 meta-analyses (Fig. 1, Fig. 2, Fig. 3,

Discussion

Our study represents the largest number of TSAs of meta-analyses conducted so far and demonstrates the following. First, potentially spurious P-values are prevalent in meta-analyses, supporting previous simulation results [10]. We identified such values in five of six meta-analyses using TSMB_AIS or TSMB_LBIS. Second, TSMB_AIS seems insufficient to detect all potentially spurious P-values, whereas most such values may be eliminated by TSMB_LBHIS or TSMB_APIS (Fig. 1, Fig. 2, Fig. 3, Fig. 4 and //ctu.rh.dk

Conclusion

TSA with TSMB_LBHIS and TSMB_APIS, considering bias risk, heterogeneity, and WIF, may reduce the risk of type I error compared with P < 0.05. TSA conducted with both TSMB_LBHIS and TSMB_APIS may guide investigators in planning a realistic and worthwhile sample size for a new trial based on the available evidence. Such TSA may also prevent the initiation of unnecessary trials when firm evidence has been obtained. TSA may offer a flexible tool to analyze

Acknowledgments

We thank the peer reviewers P.J. Devereaux and J. Hilden for valuable comments and suggestions on previous versions of this manuscript.

References (46)

  • K.L. Woods et al. The importance of effect mechanism in the design and interpretation of clinical trials: the role of magnesium in acute myocardial infarction. Prog Cardiovasc Dis (2002).
  • J.P. Ioannidis et al. Recursive cumulative meta-analysis: a diagnostic for the evolution of total randomized evidence from group and individual patient data. J Clin Epidemiol (1999).
  • G.A. Diamond et al. Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials. J Am Coll Cardiol (2004).
  • K.F. Schulz et al. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA (1995).
  • L.L. Kjaergard et al. Reported methodological quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med (2001).
  • B. Als-Nielsen, L.L. Gluud, C. Gluud. Methodological quality and treatment effects in randomised trials: a review of six...
  • A.W. Chan et al. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA (2004).
  • A.W. Chan et al. Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ (2005).
  • K. Dickersin et al. Registering clinical trials. JAMA (2003).
  • V.M. Montori et al. Randomized trials stopped early for benefit: a systematic review. JAMA (2005).
  • B. Als-Nielsen, W. Chen, L.L. Gluud, V. Siersma, J. Hilden, C. Gluud. Are trial size and reported methodological quality...
  • J.P.A. Ioannidis. Contradicted and initially stronger effects in highly cited clinical research. JAMA (2005).
  • P. Armitage. Sequential analysis in therapeutic trials. Annu Rev Med (1969).

Preliminary results included in this article were presented in part as three abstracts at the 26th annual meeting of The Society for Clinical Trials, 22–25 May 2005, Portland, OR, and as an abstract at the XIII Cochrane Colloquium, 22–26 October 2005, Melbourne, Australia; these abstracts are listed in Appendix III.
