Measuring inconsistency in meta-analyses
BMJ 2003; 327 doi: http://dx.doi.org/10.1136/bmj.327.7414.557 (Published 04 September 2003) Cite this as: BMJ 2003;327:557
- Julian P T Higgins, statistician (firstname.lastname@example.org)1,
- Simon G Thompson, director1,
- Jonathan J Deeks, senior medical statistician2,
- Douglas G Altman, professor of statistics in medicine2
- 1MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR,
- 2Cancer Research UK/NHS Centre for Statistics in Medicine, Institute of Health Sciences, Oxford OX3 7LF
- Correspondence to: J P T Higgins
Cochrane Reviews have recently started including the quantity I² to help readers assess the consistency of the results of studies in meta-analyses. What does this new quantity mean, and why is assessment of heterogeneity so important to clinical practice?
Systematic reviews and meta-analyses can provide convincing and reliable evidence relevant to many aspects of medicine and health care.1 Their value is especially clear when the results of the studies they include show clinically important effects of similar magnitude. However, the conclusions are less clear when the included studies have differing results. In an attempt to establish whether studies are consistent, reports of meta-analyses commonly present a statistical test of heterogeneity. The test seeks to determine whether there are genuine differences underlying the results of the studies (heterogeneity), or whether the variation in findings is compatible with chance alone (homogeneity). However, the test is susceptible to the number of trials included in the meta-analysis. We have developed a new quantity, I², which we believe gives a better measure of the consistency between trials in a meta-analysis.
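As a rough illustration of the two quantities mentioned above, the following Python sketch computes a heterogeneity statistic and an inconsistency percentage from study-level estimates. It assumes the commonly used definitions, which this introduction does not itself state: Cochran's Q as the inverse-variance weighted sum of squared deviations from the fixed-effect pooled estimate, and I² = 100% × (Q − df)/Q with df equal to the number of studies minus one, truncated at zero. The function names `cochran_q` and `i_squared` and the example numbers are hypothetical and are not taken from the article.

```python
def cochran_q(effects, variances):
    """Cochran's Q for study effect estimates and their variances.

    Assumes inverse-variance weights and a fixed-effect pooled estimate.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, n_studies):
    """I² as a percentage: 100 * (Q - df) / Q, with negative values set to 0."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical data: four studies with equal precision and differing effects.
effects = [0.10, 0.30, 0.50, 0.70]
variances = [0.01, 0.01, 0.01, 0.01]

q = cochran_q(effects, variances)
print(round(q, 2), round(i_squared(q, len(effects)), 1))
```

With these made-up inputs, Q is about 20 on 3 degrees of freedom and I² is about 85%. When every study reports the same estimate, Q is 0 and I² is 0%.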
Need for consistency
Assessment of the consistency of effects across studies is an essential part of meta-analysis. Unless we know how consistent the results of studies are, we cannot determine the generalisability of the findings of the meta-analysis. Indeed, several hierarchical systems for grading evidence state that the results of studies must be consistent or homogeneous to obtain the highest grading.2–4
Tests for heterogeneity are commonly used to decide on methods for combining studies and for concluding consistency or inconsistency of findings.5 6 But what does the test achieve in practice, and how should the resulting P values be interpreted?
Testing for heterogeneity
A test for heterogeneity examines the null hypothesis that all studies are evaluating the same effect. The usual test statistic …