
Rapid response to:

Education And Debate
# Measuring inconsistency in meta-analyses

BMJ
2003;
327
doi: https://doi.org/10.1136/bmj.327.7414.557
(Published 04 September 2003)
Cite this as: BMJ 2003;327:557

## Rapid Response:

## I² Is Subject to the Same Statistical Power Problems as Cochran's Q

Tania Huedo-Medina, post-doctoral fellow¹ ², Blair T Johnson, professor of psychology

¹ Center for Health, Intervention, and Prevention (CHIP), University of Connecticut, 2006 Hillside Road, Unit 1248, Storrs, CT 06269-1248, USA.

² Correspondence: tania.huedo-medina{at}uconn.edu

In their popular article, Higgins and colleagues (2003) provided a valuable explanation of the importance of assessing heterogeneity in overall meta-analytic findings, and of how their new index helps scholars attain this goal. As these authors review, there are three general ways to assess heterogeneity in meta-analysis, but each has a liability for interpretation. First, one can assess the between-studies variance, τ², but its values depend on the particular effect-size metric used, along with other factors. The second is Cochran's Q, which follows a chi-square distribution to make inferences about the null hypothesis of homogeneity. (It is actually not a test of heterogeneity, as Higgins and colleagues assert, but of the hypothesis of homogeneity.) The problem with Q is that it has poor power to detect true heterogeneity when the number of studies is small. Because neither of these first two methods has a standardized scale, they are poorly equipped to compare degrees of homogeneity across meta-analyses.

The third and final way to assess heterogeneity is to calculate a scale-free index of variability. The Birge ratio, which originated in 1932, has been the most commonly used scale-free index to quantify the consistency of study findings; it is defined as the ratio of a chi-square to its degrees of freedom. Because the degrees of freedom are the expected value of each chi-square, the Birge ratio is close to 1.00 when the chi-square shows only random variation. Thus, to the extent that the Birge ratio exceeds 1.00, the results of a set of studies lack homogeneity; that is, they are more varied than one would expect based merely on sampling error.

Higgins and Thompson (2002; Higgins et al., 2003) extended the Birge ratio to the I² index in an effort to overcome the shortcomings of Q and τ². Like the Birge ratio, the I² index is a scale-free index of variability, defined from the ratio of Q to its degrees of freedom. The advantage of this new index is its easier interpretation, because it expresses variability along a scale-free range as a percentage from 0 to 100%. Although Higgins et al. claimed that an advantage of the I² index is that it "does not inherently depend on the number of studies in the meta-analysis" (p. 559), they provided no evidence to support this claim.
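For concreteness, the relation between Q, its degrees of freedom, and I² can be sketched in a few lines of code. The effect sizes and sampling variances below are invented purely for illustration; they come from no real meta-analysis:

```python
# Illustrative computation of Cochran's Q and the I^2 index under a
# fixed-effect model. All data below are hypothetical.

def q_and_i2(effects, variances):
    """Return Cochran's Q and I^2 (%) for study effects and their variances."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1                           # expected value of Q under homogeneity
    # I^2 rescales the excess of Q over its degrees of freedom onto a
    # 0-100% range; negative values are truncated to zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

effects = [0.8, 0.1, 0.5, -0.2]        # hypothetical standardized effects
variances = [0.04, 0.05, 0.03, 0.06]   # hypothetical sampling variances

q, i2 = q_and_i2(effects, variances)
print(f"Q = {q:.2f} on {len(effects) - 1} df, I^2 = {i2:.1f}%")
```

Equivalently, I² = 100% × (H² − 1)/H², where H² = Q/df is the Birge-type ratio described above, which makes the lineage of the index explicit.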

Direct comparisons of I² to Q are difficult because only Q has a known sampling distribution that can be used to estimate the probability of observing a particular value. To counter this problem with I², Higgins and Thompson (2002) developed approximate confidence intervals for I² based on the Birge ratio (which they termed the H index). Huedo-Medina et al. (2006) used these confidence intervals to compare the performance of I² with that of Q in a Monte Carlo simulation across a wide variety of potential meta-analytic conditions. Their results demonstrated that, like Q, I² suffers from low statistical power when the number of studies is small. Specifically, the confidence intervals around I² behave very similarly to tests of Q in terms of Type I error and statistical power. Readers can examine this conclusion for themselves: in each of the 14 examples that Higgins et al. (2003) provided, the inference about consistency reached from the I² index is identical to that reached from Q.

We concur with Higgins and colleagues that (1) in reporting Q (with its associated p value) and I² (with its confidence intervals), it is easier to interpret the degree of consistency in a set of study outcomes; (2) using I² greatly facilitates comparisons across meta-analyses; and (3) the values of I² themselves do not depend on the number of studies. Nonetheless, inferences from both Q and I² can be misleading when the number of studies is small. Under such circumstances, analysts should still interpret results with caution.
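The low-power point can be illustrated with a small simulation. The sketch below is a hypothetical setup of our own, not the simulation of Huedo-Medina et al. (2006): it draws study effects under a random-effects model with genuine between-studies variance and counts how often Q rejects homogeneity at the 5% level, with few versus many studies.

```python
# Monte Carlo sketch of the low-power problem. All numbers (tau^2,
# within-study variance, study counts) are hypothetical illustrations.
import random

random.seed(42)

TAU2 = 0.02    # true between-studies variance (so heterogeneity is real)
VAR = 0.04     # common within-study sampling variance
# Upper 5% chi-square critical values for df = 4 and df = 29.
CRITICAL = {4: 9.488, 29: 42.557}

def cochran_q(effects, variance):
    """Cochran's Q with equal inverse-variance weights."""
    w = 1.0 / variance
    mean = sum(effects) / len(effects)
    return w * sum((y - mean) ** 2 for y in effects)

def rejection_rate(k, reps=2000):
    """Share of simulated k-study meta-analyses in which Q rejects homogeneity."""
    df = k - 1
    hits = 0
    for _ in range(reps):
        # Marginally, each observed effect is N(0, TAU2 + VAR): true effects
        # vary, so the homogeneity null is false and rejections count as power.
        effects = [random.gauss(0, (TAU2 + VAR) ** 0.5) for _ in range(k)]
        if cochran_q(effects, VAR) > CRITICAL[df]:
            hits += 1
    return hits / reps

power_small = rejection_rate(5)
power_large = rejection_rate(30)
print(f"Power with 5 studies:  {power_small:.2f}")
print(f"Power with 30 studies: {power_large:.2f}")
```

Under these assumed values, the same real heterogeneity is detected far more often with 30 studies than with 5, which is the pattern described above.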

References

Birge, R. T. (1932). The calculation of errors by the method of least squares. Physical Review, 40, 207-227.

Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21, 1539-1558.

Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. British Medical Journal, 327, 557-560.

Huedo-Medina, T. B., Sánchez-Meca, J., Marín-Martínez, F., & Botella, J. (2006). Assessing heterogeneity in meta-analysis: I² or Q statistic? Psychological Methods, 11, 193-206.

Competing interests: None declared

04 January 2007