Meta-analysis: Potentials and promise
BMJ 1997; 315 doi: https://doi.org/10.1136/bmj.315.7119.1371 (Published 22 November 1997) Cite this as: BMJ 1997;315:1371
- Matthias Egger, reader in social medicine and epidemiology,
- George Davey Smith, professor of clinical epidemiology
- Correspondence to: Dr Matthias Egger
The number of papers published on meta-analyses in medical research has increased sharply in the past 10 years (fig 1). The merits and perils of the somewhat mysterious procedure of meta-analysis, however, continue to be debated in the medical community.1 23 What, then, is meta-analysis? A useful definition was given by Huque: “A statistical analysis that combines or integrates the results of several independent clinical trials considered by the analyst to be ‘combinable.’” 4 The terminology, however, is still debated, and expressions used concurrently include “overview,” “pooling,” and “quantitative synthesis.” We believe that the term meta-analysis should be used to describe the statistical integration of separate studies, whereas “systematic review” is most appropriate for denoting any review of a body of data that uses clearly defined methods and criteria (box). Systematic reviews can include meta-analyses, appraisals of single trials, and other sources of evidence.6 In this article we examine the potentials and promise of meta-analysis of randomised controlled trials. In later articles of this series we will consider the practical steps involved in meta-analysis,7 examine various extensions beyond the calculation of a combined estimate,8 address potential biases and discuss strategies to detect and minimise the influence of these in meta-analysis of randomised trials9 and of observational studies.10 We will conclude with a discussion of unresolved issues and future developments.11 Details of relevant software will appear on the BMJ's website at the end of the series.
What's in a name? The case for “meta-analysis”
The term meta-analysis for statistically combining and analysing data from separate studies is appropriate because:
The term makes sense. “Meta” implies something occurring later, more comprehensive, and is often used to name a new but related discipline designated to deal critically with the original one
The alternative terms are less specific or less poignant—for example, “overview” is also used for traditional reviews, and “pooling” incorrectly implies that the source data are merged
“Meta-analysis” has recently been included as a Medical Subject Heading (MeSH) and publication type within the Medline indexing system of the National Library of Medicine.5
“Systematic review” denotes any type of review that has been prepared using strategies to avoid bias and that includes a material and methods section.6 Systematic reviews may or may not include formal meta-analyses
“Meta-analysis” is a useful term for describing a possible component of systematic reviews, and distinguishing between the two terms contributes to methodological clarity
Efforts to pool results from separate studies are not new. In his account of the preventive effect of serum inoculations against enteric fever, the statistician Karl Pearson was, in 1904, probably the first researcher to report the use of formal techniques to combine data from different samples. The rationale put forward by Pearson for pooling studies is still one of the main reasons for undertaking meta-analysis today: “Many of the groups … are far too small to allow of any definite opinion being formed at all, having regard to the size of the probable error involved.” 12
Well conducted meta-analysis allows for a more objective appraisal of the evidence, which may lead to resolution of uncertainty and disagreement
Meta-analysis may reduce the probability of false negative results and thus prevent undue delays in the introduction of effective treatments into clinical practice
Meta-analysis of a large number of individual studies or of individual patient data allows testing of a priori hypotheses regarding treatment effects in subgroups of patients
Heterogeneity between study results may be explored and sometimes explained
Promising research questions to be addressed in future studies may be generated, and the sample size needed in future studies may be calculated accurately
The first meta-analysis assessing the effect of a therapeutic intervention was published in 1955; interestingly, the treatment being evaluated was the placebo.13 A simple average was calculated of the effectiveness of placebos in such diverse conditions as postoperative wound pain, cough, and angina pectoris: the placebo was apparently effective in 35% of patients. The development of more sophisticated statistical techniques, however, took place in the social sciences, in particular in education research, in the 1970s. The term meta-analysis was coined in 1976 by the psychologist Glass.14 Meta-analysis was rediscovered by medical researchers to be used mainly in randomised clinical trial research, particularly in the fields of cardiovascular disease,15 oncology,16 and perinatal care.17 Meta-analysis of observational studies18 and “cross design synthesis” (the integration of observational data with the results from meta-analyses of randomised clinical trials19 20) have also been advocated.
More recently, a network of clinicians, epidemiologists, and other health professionals has been set up. The Cochrane Collaboration (named after Archie Cochrane, a pioneer in the field of evaluation of medical interventions) aims to prepare, maintain and disseminate comprehensive and systematic reviews of the effects of health care.21 22 Since the foundation of the Cochrane Centre in Oxford in October 1992, this initiative has been growing rapidly, with the foundation of 15 further centres in Europe, North and Latin America, Africa, and Australia and hundreds of individuals from all over the globe collaborating in review groups.
The unacceptable face of “statisticism”?
Despite its widespread use, meta-analysis continues to be a controversial technique. While some exponents feel that meta-analysis should “replace traditional review articles of single topic issues whenever possible,”23 others think of it as “a new bête noir,” which represents “the unacceptable face of statisticism” and “should be stifled at birth.” 24 This mixed reception is not surprising. The pooling of results from a particular set of studies may be inappropriate from a clinical point of view, producing a population “average” effect, while the clinician wants to know how best to treat his or her particular patient. Meta-analyses of the same issue may reach opposite conclusions, as shown by assessments of low molecular weight heparin in the prevention of perioperative thrombosis25 26 and of second line antirheumatic drugs in the treatment of rheumatoid arthritis.27 28 It is nevertheless clear that for maximum benefit to be obtained from prior research, sound reviewing strategies must become more accessible and highly valued.
Traditional narrative reviews have several disadvantages that meta-analyses appear to overcome. The classic review is subjective and therefore prone to bias and error.29 Without guidance by formal rules, reviewers can disagree about issues as basic as what types of studies it is appropriate to include and how to balance the quantitative evidence they provide. Selective inclusion of studies that support the author's view is common: the frequency of citation of clinical trials is related to their outcome, with studies in line with the prevailing opinion being quoted more frequently than unsupportive studies.30 31 Once a set of studies has been assembled, a common way to review the results is to count the number of studies supporting various sides of an issue and to choose the view receiving the most votes. This procedure is clearly unsound, as it ignores sample size, effect size, and research design. It is thus hardly surprising that reviewers using traditional methods often reach opposite conclusions32 and miss small, but potentially important, differences.33 Clinical medicine is riddled with controversies, with reviews often being commissioned to end an argument. However, in controversial areas the conclusions drawn from a given body of evidence may be associated more with the specialty of the reviewer than with the available data.23 By integrating the actual evidence, meta-analysis allows a more objective appraisal, which can help to resolve uncertainties when the original research, classic reviews, and editorial comments disagree.
Limitations of a single study
A single study often cannot detect or exclude with certainty a modest, albeit clinically relevant, difference in the effects of two treatments. A trial may thus show no significant treatment effect when in reality such an effect exists; that is, it may produce a false negative result. In this case we are dealing with a type II error, whose probability of occurrence can be calculated for a given difference in treatment effect, study size, and significance level. Generally better recognised is the type I error, in which a trial produces a significant difference due to chance alone; its probability is set by the chosen significance level, the threshold against which the probability (P) value is compared. An examination of clinical trials that reported no significant differences between experimental and control treatment has shown that type II errors in clinical research are common: for a clinically relevant difference in outcome, the a priori probability of missing this effect (given the trial size) was greater than 20% in 115 of the 136 trials examined.34 The number of patients included in clinical trials is thus often inadequate, a situation that has changed little over recent years. In some cases, however, the required sample size may be difficult to achieve. A drug that reduces the risk of death from myocardial infarction by 10% could, for example, delay many thousands of deaths each year in Britain alone. To detect such an effect with 90% certainty (that is, with a type II error of no more than 10%), over 10 000 patients in each treatment group would be needed.35
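The sample size argument can be made concrete with a rough calculation. The sketch below uses the standard normal approximation for comparing two proportions; the 10% control group mortality and the two sided 5% significance level are illustrative assumptions, not figures taken from the text or from reference 35, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_group(p_control, relative_reduction, alpha=0.05, power=0.90):
    """Approximate patients needed per arm to detect a relative risk reduction."""
    p_treat = p_control * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


def type_ii_error(p_control, relative_reduction, n_per_group, alpha=0.05):
    """Approximate probability of missing the effect with n patients per arm."""
    p_treat = p_control * (1 - relative_reduction)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    se = sqrt(variance / n_per_group)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    power = NormalDist().cdf(abs(p_control - p_treat) / se - z_alpha)
    return 1 - power


# Assumed scenario: 10% mortality in controls, 10% relative reduction on treatment.
n_needed = patients_per_group(0.10, 0.10)
beta_small_trial = type_ii_error(0.10, 0.10, n_per_group=1000)
print(n_needed, round(beta_small_trial, 2))
```

Under these assumptions the required number per group comes out well above 10 000, and a trial with 1000 patients per arm would miss the effect far more often than not.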
The meta-analytic approach seems to be an attractive alternative to such a large, expensive, and logistically problematic study. Data from patients in trials evaluating the same or a similar drug in several smaller, but comparable, studies are considered. In this way the necessary number of patients may be reached, and relatively small effects can be detected or excluded with confidence.
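How several small trials can jointly reach the necessary precision is illustrated below. This is a minimal sketch of one common approach, fixed effect inverse variance pooling of log odds ratios; the article does not specify a method at this point, and the five trials are invented for illustration only.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Return the log odds ratio and its approximate variance for one trial."""
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def two_sided_p(estimate, variance):
    """Two sided P value from a normal approximation."""
    z = estimate / sqrt(variance)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def pool_fixed_effect(trials):
    """Inverse variance weighted average of the trial log odds ratios."""
    results = [log_odds_ratio(*t) for t in trials]
    weights = [1 / var for _, var in results]
    pooled = sum(w * est for w, (est, _) in zip(weights, results)) / sum(weights)
    pooled_var = 1 / sum(weights)
    se = sqrt(pooled_var)
    return {
        "odds_ratio": exp(pooled),
        "ci_95": (exp(pooled - 1.96 * se), exp(pooled + 1.96 * se)),
        "p": two_sided_p(pooled, pooled_var),
    }


# Hypothetical trials: (deaths on treatment, n treated, deaths on control, n control)
trials = [
    (12, 100, 16, 100),
    (30, 250, 38, 250),
    (8, 80, 11, 80),
    (45, 400, 58, 400),
    (20, 200, 27, 200),
]
single_trial_p = [two_sided_p(*log_odds_ratio(*t)) for t in trials]
pooled = pool_fixed_effect(trials)
print([round(p, 2) for p in single_trial_p], pooled)
```

In this invented data set no single trial reaches P<0.05, whereas the pooled estimate does, which is the situation described above.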
Meta-analysis can also contribute to considerations about the generalisability of study results. The findings of a particular study may be valid only for a population of patients with the same characteristics as those investigated in the trial. If many trials exist in different groups of patients, with similar results in the various trials, then it can be concluded that the effect of the intervention under study has some generality. By putting together all available data, meta-analyses are also better placed than individual trials to answer questions about whether an overall study result varies among subgroups—for example, among men and women, older and younger patients, or subjects with different degrees of severity of disease. As discussed later in this series,8 these questions can be addressed in the analysis and often lead to insights beyond what is provided by the calculation of a single combined effect estimate.
Epidemiology of results
Meta-analysis thus not only consists of the combination of data but includes the epidemiological exploration and evaluation of results—the “epidemiology of results,” whereby the findings of an original study replace the individual as the unit of analysis.36 New hypotheses that were not posed in the single studies can thus be tested in meta-analyses. However, although the studies included may be controlled experiments, the meta-analysis itself is subject to many biases inherent in observational studies.37 Meta-analysis can, nevertheless, lead to the identification of the most promising or urgent research question, and may permit a more accurate calculation of the sample sizes needed in future studies. This is illustrated by an early meta-analysis of four trials that compared different methods of monitoring the fetus during labour.38 The meta-analysis led to the hypothesis that, compared with intermittent auscultation, continuous fetal heart monitoring reduced the risk of neonatal seizures. This hypothesis was subsequently confirmed in a single randomised trial of almost seven times the size of the four previous studies combined.39
A more transparent appraisal
One benefit of meta-analysis is that it renders an important part of the review process transparent. In traditional narrative reviews it is often not clear how the conclusions follow from the data examined. In an adequately presented meta-analysis readers should be able to replicate the quantitative component of the argument. To facilitate this, it is valuable if the data included in meta-analyses are either presented in full or made available to interested readers by the authors.
The increased openness required by meta-analysis leads to the replacement of unhelpful descriptors such as “no relation,” “some evidence of a trend,” “a weak relation,” and “a strong relation,” with reproducible numerical values.40 Furthermore, performing a meta-analysis may lead to reviewers moving beyond the conclusions that authors present in the abstract of papers, to a thorough examination of the actual data. As research assistants cannot be sent away with file cards to return with abridged conclusions, Rosenthal has suggested that this will lead to a “decrease in the splendid detachment of the full professor”40—in other words to a stronger involvement of the reviewers in the individual study results. As meta-analysis becomes a standard procedure, however, the splendid detachment may soon be restored.
Cumulative meta-analysis is defined as the repeated performance of meta-analysis whenever a new trial becomes available for inclusion. Such cumulative meta-analysis can retrospectively identify the point in time when a treatment effect first reached conventional levels of significance. For example, Lau and colleagues41 showed that for the trials of intravenous streptokinase in acute myocardial infarction, a significant (P=0.01) combined difference in total mortality had been achieved by 1973 (fig 2). At that time, 2432 patients had been randomised in eight small trials. The results of the subsequent 25 studies, which included the large GISSI-1 and ISIS-2 trials42 43 and enrolled a total of 34 542 additional patients, reduced the significance level to P=0.001 in 1979, P=0.0001 in 1986, and to P<0.00001 when the first very large trial appeared, narrowing the confidence intervals around an essentially unchanged estimate of about 20% reduction in the risk of death. Interestingly, at least one country licensed streptokinase for use in myocardial infarction before the GISSI-1 trial42 was published, whereas many national authorities waited for this trial to appear, and some waited a further two years for the results of the ISIS-2 trial43 (fig 2).
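The procedure can be sketched in a few lines: the pooled estimate is recomputed each time a trial is added in chronological order, and the first year in which it crosses the significance threshold is recorded. The fixed effect inverse variance weighting and the trial data below are assumptions made for illustration; they are not the streptokinase data analysed by Lau and colleagues.

```python
from math import exp, sqrt
from statistics import NormalDist


def cumulative_meta_analysis(trials):
    """Re-pool after each trial; trials are (year, log odds ratio, variance)."""
    sum_w = 0.0
    sum_w_est = 0.0
    steps = []
    for year, log_or, variance in sorted(trials):
        weight = 1 / variance            # inverse variance weight
        sum_w += weight
        sum_w_est += weight * log_or
        pooled = sum_w_est / sum_w
        se = sqrt(1 / sum_w)
        p = 2 * (1 - NormalDist().cdf(abs(pooled / se)))
        steps.append((year, exp(pooled), p))
    return steps


def first_significant_year(steps, alpha=0.05):
    """Year in which the cumulative estimate first reaches P < alpha, if any."""
    return next((year for year, _, p in steps if p < alpha), None)


# Hypothetical trials: four small ones followed by one very large one.
trials = [
    (1970, -0.40, 0.20),
    (1972, -0.30, 0.15),
    (1975, -0.45, 0.10),
    (1978, -0.35, 0.08),
    (1981, -0.30, 0.01),
]
steps = cumulative_meta_analysis(trials)
for year, odds_ratio, p in steps:
    print(year, round(odds_ratio, 2), round(p, 4))
print("first significant:", first_significant_year(steps))
```

In this invented series the cumulative result becomes significant with the fourth small trial, and the later large trial mainly narrows the uncertainty around a similar estimate.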
A similar picture is apparent in the case of β blockade in secondary prevention of myocardial infarction. In 1981 an influential editorial stated that “despite claims that they reduce arrhythmias, cardiac work, and infarct size, we still have no clear evidence that β blockers improve long-term survival after infarction despite almost 20 years of clinical trials.” 44 Retrospective cumulative meta-analysis, however, shows that a significant beneficial effect (P=0.02) was evident by 1977, and that the combined effect estimate was already both clinically important and highly significant (odds ratio 0.71 (95% confidence interval 0.59 to 0.84), P=0.0001) in 1981 (fig 3). Subsequent trials in a further 13 113 patients only confirmed this result.
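Readers can check that a reported odds ratio, confidence interval, and P value are mutually consistent. The sketch below assumes the interval was computed symmetrically on the log odds ratio scale with a normal approximation, which the text does not state; the function name is hypothetical.

```python
from math import log
from statistics import NormalDist


def p_value_from_ci(odds_ratio, lower, upper, level=0.95):
    """Approximate two sided P value implied by an odds ratio and its interval."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Interval width on the log scale spans 2 * z_crit standard errors.
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(odds_ratio) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Figures quoted for the 1981 cumulative estimate of the beta blocker trials.
p_1981 = p_value_from_ci(0.71, 0.59, 0.84)
print(p_1981)
```

Under this assumption the implied P value is of the order of 0.0001, in line with the figure quoted above.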
Another application of cumulative meta-analysis has been to correlate the accruing evidence with the recommendations made by experts in review articles and textbooks. Antman and colleagues showed for thrombolytic drugs that recommendations for routine use first appeared in 1987, 14 years after a significant (P=0.01) beneficial effect became evident in cumulative meta-analysis.45 Conversely, the prophylactic use of lidocaine continued to be recommended for routine use in myocardial infarction despite the lack of evidence for any beneficial effect and the possibility of a harmful effect being evident in the meta-analysis.
These examples have been taken to suggest that further studies in large numbers of patients may be at best superfluous and costly, if not unethical,46 once a significant treatment effect is evident from meta-analysis of the existing smaller trials. There are several other examples, however, of meta-analyses showing benefit of statistical significance and clinical importance that were later contradicted by large randomised trials.47 Meta-analysis clearly has advantages over conventional narrative reviews and carries considerable promise as a tool in clinical research and health technology assessment. Meta-analysis is not an infallible tool, however, as will be discussed later in this series.
We thank Dr T Johansson and G Enocksson (Pharmacia, Stockholm) and Drs A Schirmer and M Thimme (Behring, Marburg) for providing data on licensing of streptokinase in different countries. The department of social medicine at the University of Bristol is part of the Medical Research Council's health services research collaboration.
Funding: ME was supported by the Swiss National Science Foundation.