Meta-analyses III
BMJ 2011; 342 doi: https://doi.org/10.1136/bmj.d244 (Published 26 January 2011) Cite this as: BMJ 2011;342:d244
- Philip Sedgwick, senior lecturer in medical statistics
- Section of Medical and Healthcare Education, St George’s, University of London, Tooting, London, UK
Previous questions described a meta-analysis of the effectiveness of parenteral corticosteroids for the relief of acute severe migraine headache in adults.1 2 Seven randomised controlled trials were identified in which single dose parenteral dexamethasone, administered alone or in combination with standard abortive therapy, was compared with placebo or any other standard treatment for acute migraine in adults. For each trial, the relative risk for recurrence of acute severe migraine headache in adults within 72 hours for the dexamethasone treatment arm compared with the placebo arm was obtained.3
The results of the meta-analysis were presented in a forest plot. The test for statistical heterogeneity resulted in P=0.40.
Which of the following statements, if any, are true for the statistical test of heterogeneity?
a) Null hypothesis: homogeneity exists between the sample relative risks as estimates of the population parameter
b) Null hypothesis: heterogeneity exists between the sample relative risks as estimates of the population parameter
c) Alternative hypothesis: heterogeneity exists between the sample relative risks as estimates of the population parameter
d) Statistical heterogeneity existed between the seven trials in their estimates of the population relative risk.
Answers a and c are true, whereas b and d are false.
The meta-analysis combined the results of trials comparing single dose parenteral dexamethasone with placebo, with the aim of obtaining a summary estimate of the population relative risk for recurrence of severe migraine headache within 72 hours in adults. It was essential the meta-analysis incorporated a statistical test of heterogeneity to assess the extent of variation between the sample estimates. Statistical homogeneity would have existed if the sample relative risks were similar in magnitude and the variation between them was no more than expected when taking samples from the same population—that is, any variation between them was minimal and a result of sampling error. If statistical homogeneity did not exist, then statistical heterogeneity would be present and the sample estimates would differ substantially—it is possible that some sample estimates might favour dexamethasone and others placebo in the risk of recurrence.
Variation between the sample estimates may occur for a variety of reasons. The population parameter may differ in magnitude within the population—for example, between men and women, ethnic groups, with age, or with disease severity. Therefore, if trials differ in the composition of their samples, then their sample estimates may differ in magnitude. Any differences between trials in their methodology or interventions may also contribute to variation between the sample estimates. The result of the statistical test of heterogeneity influences how the total overall result would be obtained.
The statistical test of heterogeneity is similar to traditional statistical hypothesis testing, there being a statistical null and alternative hypothesis.4 The null hypothesis starts at the position of statistical homogeneity (a is true and b is false). For the above study, the null hypothesis states that there is minimal variation between the sample estimates of the population parameter; any variation that does exist is a result of differences between trials when sampling from the same population, or minor differences in methodology. The alternative hypothesis states that heterogeneity exists (c is true). For the above study, the alternative hypothesis states that there is substantial variation between the sample relative risks as estimates of the population parameter.
The result of the statistical test of heterogeneity is shown on the forest plot in the line titled “Test of heterogeneity.” In particular, we use the P value to establish whether the sample relative risks support the null hypothesis or provide evidence of heterogeneity, as specified by the alternative hypothesis. The P value, described in a previous question,5 represents the strength of evidence in support of the null hypothesis and is derived using the sample relative risks. The P value for the statistical test of heterogeneity was 0.40, which is larger than 0.05—the traditional critical level of significance. Therefore, there is no evidence to reject the null hypothesis in favour of the alternative hypothesis. It is concluded that statistical homogeneity exists between the sample estimates (d is false).
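The article does not name the test statistic that was used. A common choice is Cochran's Q, which compares each trial's estimate with an inverse-variance weighted average and refers the result to a chi-squared distribution with k−1 degrees of freedom. The sketch below assumes that choice and works on the log relative risk scale; the function names and the seven input values are hypothetical and are not the data from the dexamethasone trials.

```python
import math

def cochran_q(estimates, std_errors):
    """Return Cochran's Q and its degrees of freedom for k trial estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return q, len(estimates) - 1

def chi2_sf(x, df):
    """Upper-tail chi-squared probability via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= half / (a + n)
        total += term
    lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# Hypothetical log relative risks and standard errors for seven trials
# (illustrative only; NOT the values from the meta-analysis discussed here).
log_rr = [-0.36, -0.11, -0.51, -0.22, 0.05, -0.30, -0.45]
se = [0.30, 0.25, 0.40, 0.20, 0.35, 0.28, 0.33]

q, df = cochran_q(log_rr, se)
p_value = chi2_sf(q, df)
print(f"Q = {q:.2f} on {df} df, P = {p_value:.2f}")
# Decision rule described in the text: P > 0.05 gives no evidence against homogeneity.
print("evidence of heterogeneity" if p_value < 0.05 else "no evidence of heterogeneity")
```

The decision rule in the last line mirrors the reasoning above: a P value larger than 0.05 gives no evidence to reject the null hypothesis of homogeneity.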
The statistical hypothesis test of heterogeneity may not always accurately detect heterogeneity in the sample estimates. Because of this, the Higgins I² statistic is frequently used instead. This statistic represents the percentage of variation between the sample estimates that is caused by heterogeneity. It can take values from 0% to 100%, with 0% indicating that statistical heterogeneity does not exist. Significant statistical heterogeneity is often considered to be present if I² is greater than or equal to 50%. For the above study, in the line titled “Test of heterogeneity,” I² is 3.4%, corroborating the inference of the statistical hypothesis test that there was no evidence of statistical heterogeneity.
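The formula behind the statistic is not given in the article. It is commonly defined from Cochran's Q statistic and its degrees of freedom (df, the number of trials minus one), and under that assumption the two reported results can be checked against each other. The symbol Q is introduced here for illustration only. With seven trials, df is 6, and the reported value of 3.4% implies:

```latex
I^2 = \max\!\left(0,\ \frac{Q - \mathrm{df}}{Q}\right) \times 100\%
\qquad\Longrightarrow\qquad
Q = \frac{\mathrm{df}}{1 - I^2/100\%} = \frac{6}{1 - 0.034} \approx 6.21
```

A chi-squared distribution with 6 degrees of freedom has an upper-tail probability of about 0.40 at 6.21, which agrees with the P value reported for the test of heterogeneity. The back-calculation is possible only because the reported percentage is above zero; a value of 0% would indicate only that Q did not exceed df.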
Given that there was no evidence of statistical heterogeneity—that is, statistical homogeneity existed—a so called fixed effects meta-analysis was performed. If statistical heterogeneity had existed, then a random effects meta-analysis would have been undertaken. Whether a fixed effects or a random effects meta-analysis was performed is indicated at the top of the forest plot. The difference between these approaches is the methodology used to calculate the total overall effect. In the presence of heterogeneity, the random effects meta-analysis produces a wider confidence interval for the total overall effect than does a fixed effects meta-analysis, resulting in an overall effect size that has less precision.
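The two approaches can be contrasted in a short sketch. The article does not state which estimators were used, so the code assumes inverse-variance weighting for the fixed effects result and the DerSimonian-Laird estimate of the between-trial variance for the random effects result, both on the log relative risk scale. The function names and the five deliberately heterogeneous trial values are hypothetical.

```python
import math

def pool(estimates, std_errors, tau2=0.0):
    """Inverse-variance pooled estimate and its standard error.

    tau2 is the between-trial variance: 0 gives the fixed effects result,
    a positive value gives the random effects result.
    """
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def dersimonian_laird_tau2(estimates, std_errors):
    """Method-of-moments estimate of the between-trial variance, truncated at 0."""
    w = [1.0 / se ** 2 for se in std_errors]
    fixed, _ = pool(estimates, std_errors)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(estimates) - 1)) / c)

def rr_with_ci(log_rr, se, z=1.96):
    """Back-transform a log relative risk and its 95% confidence limits."""
    return tuple(math.exp(v) for v in (log_rr, log_rr - z * se, log_rr + z * se))

# Hypothetical, deliberately heterogeneous log relative risks (illustrative only).
log_rr = [-0.9, 0.3, -0.6, 0.4, -0.2]
se = [0.20, 0.25, 0.30, 0.20, 0.25]

tau2 = dersimonian_laird_tau2(log_rr, se)
for label, t in (("fixed effects", 0.0), ("random effects", tau2)):
    rr, low, high = rr_with_ci(*pool(log_rr, se, t))
    print(f"{label}: RR {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Adding the between-trial variance to every trial's variance shrinks and evens out the weights, which is why the random effects confidence interval is wider when heterogeneity is present. When the estimated between-trial variance is zero, the two results coincide.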
If statistical heterogeneity had existed in this study, it would have been necessary to explore why it had occurred. In addition to combining all sample estimates into a single overall effect, the studies should be split into homogeneous subgroups based on their sample composition or methodology, with the aim of establishing if homogeneity exists between the sample estimates within the subgroups. Such an approach would derive more accurate estimates of the population parameter and will be discussed in a future endgame.
Competing interests: The author has completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declares: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.