In their meta-analytical investigation of rare adverse events associated with varenicline use for tobacco cessation, Prochaska and Hilton compared four different methods of meta-analysis (the Peto odds ratio, and the Mantel-Haenszel odds ratio, risk ratio and risk difference methods) and noted that they yielded different estimates of effect [1]. Such differences are not unexpected, since the methods are all based on large-sample asymptotic statistical theory, the assumptions of which are challenged when events are rare. The three Mantel-Haenszel methods also involve the use of an arbitrary numerical correction to avoid the computational errors that occur when attempting to divide by zero.
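To make the divide-by-zero problem concrete, the following is a minimal sketch (not the authors' software, with hypothetical trial numbers) of how a 2×2-table odds ratio is conventionally patched with a 0.5 continuity correction when a cell is empty:

```python
def odds_ratio(a, b, c, d, correction=0.5):
    """Odds ratio for a 2x2 table: a/b = events/non-events on treatment,
    c/d = events/non-events on control. With an empty cell the raw
    odds ratio a*d / (b*c) is zero or involves division by zero, so
    software conventionally adds an arbitrary 0.5 to every cell."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)

# Hypothetical trial with no events on treatment: the estimate exists
# only by virtue of the correction, and depends on the value chosen.
print(round(odds_ratio(0, 100, 2, 98), 3))  # 0.196
```

The point of the sketch is that the corrected estimate is an artefact of the chosen constant, not of the data alone.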
Prochaska and Hilton conclude “for rare outcomes, summary estimates based on absolute effects are recommended and estimates based on the Peto odds ratio should be avoided”. They provide five arguments to support this statement, none of which is convincing, some of which are misleading, and some of which are seriously flawed. To demonstrate that a result is biased, it is necessary to know what the correct result should be. This cannot be achieved in a case study such as this, which simply compares four different analytical methods. The authors’ conclusion is not justified and is potentially dangerous.
First, Prochaska and Hilton state “treatment effects based on relative risks always are as or less extreme than those based on odds ratios”. It is a mistake to compare the ‘extremeness’ of two different metrics in this way. Furthermore, when the outcomes are rare as they are in this example, odds ratios are in fact very close approximations to relative risks.
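The near-equivalence of the two metrics for rare outcomes is easy to verify with illustrative numbers (1% and 2% risks, chosen for the example rather than taken from the trials):

```python
p_treat, p_control = 0.02, 0.01   # hypothetical rare event risks
risk_ratio = p_treat / p_control  # 2.0
odds_ratio = (p_treat / (1 - p_treat)) / (p_control / (1 - p_control))
print(risk_ratio, round(odds_ratio, 4))  # OR ~ 2.0204, very close to RR
```

As the event risks shrink towards zero, the odds p/(1 − p) converge to the risks p, so the two ratios converge.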
Second, they argue “relative statistics cannot be calculated for trials with zero events and therefore can bias summaries against the null hypothesis of no effect”. This reflects a common intuitive feeling that it is wrong to exclude any studies from a meta-analysis. However, if no events occur at all in a trial, the trial in itself conveys no information about the relative odds or risks of events between the two groups. A meta-analysis may be viewed as a weighted average of trial results, with weights reflecting the amount of information each study contains about the summary statistic. Allocating a trial with no information a zero weight is entirely appropriate and does not introduce bias.
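That weighting logic can be sketched as follows (assuming standard inverse-variance weights on the log odds ratio, not the authors' exact calculation): a double-zero trial has infinite variance, and hence exactly zero weight.

```python
def iv_weight_log_or(a, b, c, d):
    """Inverse-variance weight for one trial's log odds ratio, where
    a/b = events/non-events on treatment and c/d = on control. The
    large-sample variance is 1/a + 1/b + 1/c + 1/d; a trial with no
    events in either arm has infinite variance and so zero weight --
    it is not 'excluded', it simply carries no information."""
    if a == 0 and c == 0:
        return 0.0  # infinite variance -> zero weight
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return 1 / variance

print(iv_weight_log_or(0, 100, 0, 100))           # 0.0
print(round(iv_weight_log_or(3, 97, 6, 94), 3))   # a trial with events
```

Giving such a trial zero weight is mathematically the same thing as leaving it out, which is why the "exclusion" introduces no bias.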
Third, they argue that relative statistics hide the impact of the effect, whereas risk differences most clearly convey the effect, using this to justify their meta-analysis of risk differences. We agree that absolute effect measures convey more useful information. However, the use of relative measures in a meta-analysis is not a barrier to re-expressing the treatment effect in absolute terms, e.g. as a number needed to treat to benefit or harm, or a risk difference.
Prochaska and Hilton also cite Vandermeer and colleagues’ analysis of 1613 meta-analyses [2], claiming that they showed the Peto odds ratio to be particularly biased. Vandermeer and colleagues in fact compared results of asymptotic methods with meta-analytical techniques based on ‘permutation’ or ‘exact’ methods. As with the current study, they did not know what the true result was, so were unable to show that any method was biased, only that the methods gave different results.
Finally, they imply that the Peto odds ratio method must be biased because it produces the most extreme values of odds ratios for the individual studies. This is a misunderstanding of the way in which the method should be applied: it is designed to compute an overall meta-analytical statistic, not to estimate odds ratios for individual studies. The strength of the Peto method is that it aggregates within-trial comparisons across trials in a way that avoids the need for the arbitrary addition of 0.5 events to treatment groups in which no events were observed. It is this arbitrary addition that causes the Mantel-Haenszel risk ratio and odds ratio to be similar to each other, and the Mantel-Haenszel odds ratio to be smaller than the Peto odds ratio.
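To illustrate, here is a sketch of the standard Peto calculation with made-up trial data: each trial contributes only its observed-minus-expected event count (O − E) and hypergeometric variance V, and the pooled log odds ratio uses their sums, so no continuity correction is ever required.

```python
def peto_log_odds_ratio(trials):
    """Peto pooled log odds ratio for trials given as tuples of
    (events_treat, n_treat, events_control, n_control). The pooled
    estimate is sum(O - E) / sum(V); a double-zero trial adds 0 to
    both sums, so no 0.5 correction is needed anywhere."""
    sum_o_minus_e = sum_v = 0.0
    for et, nt, ec, nc in trials:
        n, events = nt + nc, et + ec
        sum_o_minus_e += et - nt * events / n                      # O - E
        sum_v += events * (n - events) * nt * nc / (n * n * (n - 1))  # V
    return sum_o_minus_e / sum_v

# Made-up data, including a double-zero trial that contributes nothing:
trials = [(2, 100, 0, 100), (1, 200, 1, 200), (0, 50, 0, 50)]
print(round(peto_log_odds_ratio(trials), 3))  # 1.004
```

Note that the zero-cell and double-zero trials are handled naturally, with no arbitrary constants added to any group.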
The rigorous approach to understanding bias in statistical methods is to undertake simulation studies, in which the investigators create data with known true treatment effects, apply the alternative statistical methods and examine how the estimates compare with the truth. Such studies can investigate bias in the treatment effect, the coverage of confidence intervals, the correctness of P-values, and the power that different methods have to detect differences. One of us undertook and reported such a study of statistical methods for meta-analysis of rare events, and found evidence that the methods to be recommended in practice are exactly the opposite of those recommended by Prochaska and Hilton, with the Peto method being the least biased and most powerful in situations where event rates were around 1% [3]. Notably, whilst the Mantel-Haenszel risk difference method produced relatively unbiased estimates of treatment effects, it was shown to be seriously limited in its ability to detect real differences, as its confidence intervals are wide, giving poor statistical power. It is thus unsuitable for meta-analysis of rare events, as it reduces the chance of real increases in rare adverse events being detected, with the subsequent possibility that patients continue to be exposed to harmful effects of some medications.
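The skeleton of such a simulation study is straightforward; a hedged sketch with arbitrary parameters (200 patients per arm, a 1% control-group risk, and a known true risk ratio), not the published study's actual design:

```python
import random

def simulate_trial(n_per_arm, control_risk, true_rr):
    """Simulate one two-arm trial under a known true risk ratio."""
    events_control = sum(random.random() < control_risk
                         for _ in range(n_per_arm))
    events_treat = sum(random.random() < control_risk * true_rr
                       for _ in range(n_per_arm))
    return events_treat, n_per_arm, events_control, n_per_arm

random.seed(42)
# One simulated meta-analysis dataset under the null (true RR = 1).
# A full simulation study would generate thousands of such datasets,
# apply each pooling method, and measure bias, confidence-interval
# coverage and power against the known truth.
dataset = [simulate_trial(200, 0.01, 1.0) for _ in range(20)]
print(len(dataset), dataset[0])
```

Because the true effect is fixed by construction, any systematic departure of a method's pooled estimates from it is, by definition, bias.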
1. Prochaska J, Hilton J. Risk of serious adverse cardiovascular events associated with varenicline use for tobacco cessation: systematic review and meta-analysis. BMJ 2012;344:e2856.
2. Vandermeer B, Bialy L, Hooton N, Hartling L, Klassen TP, Johnston BC, et al. Meta-analyses of safety data: a comparison of exact versus asymptotic methods. Stat Methods Med Res 2009;18:421-32.
3. Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Stat Med 2007;26:53-77.