Investigating and dealing with publication and other biases in metaanalysis
BMJ 2001;323:101 doi: https://doi.org/10.1136/bmj.323.7304.101 (Published 14 July 2001)
 Jonathan A C Sterne (jonathan.sterne{at}bristol.ac.uk), senior lecturer in medical statistics,
 Matthias Egger, senior lecturer in epidemiology and public health medicine,
 George Davey Smith, professor of clinical epidemiology
 Medical Research Council Health Services Research Collaboration, Department of Social Medicine, University of Bristol, Bristol BS8 2PR
 Correspondence to: J A C Sterne
This is the second in a series of four articles
Studies that show a significant effect of treatment are more likely to be published, be published in English, be cited by other authors, and produce multiple publications than other studies.1–8 Such studies are therefore also more likely to be identified and included in systematic reviews, which may introduce bias.9 Low methodological quality of studies included in a systematic review is another important source of bias.10
All these biases are more likely to affect small studies than large ones. The smaller a study the larger the treatment effect necessary for the results to be significant. The greater investment of time and money in larger studies means that they are more likely to be of high methodological quality and published even if their results are negative. Bias in a systematic review may therefore become evident through an association between the size of the treatment effect and study size—such associations may be examined both graphically and statistically.
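The link between study size and the effect needed for significance can be made concrete. The following is a minimal sketch, assuming a two sided test at the 5% level and a normal approximation on the log odds ratio scale; the function name and the three standard errors are illustrative assumptions and do not come from the studies discussed here.

```python
import math

Z_95 = 1.959964  # two sided 5% critical value of the standard normal distribution

def smallest_significant_odds_ratio(standard_error):
    """Smallest odds ratio above 1 that reaches P < 0.05 for a given
    standard error of the log odds ratio (normal approximation)."""
    return math.exp(Z_95 * standard_error)

# Hypothetical standard errors for a large, a medium, and a small trial.
for se in (0.1, 0.3, 0.6):
    threshold = smallest_significant_odds_ratio(se)
    print(f"SE {se:.1f}: odds ratio must exceed {threshold:.2f}")
```

For a beneficial effect expressed as an odds ratio below 1, the corresponding threshold is the reciprocal of the value returned. In either direction, the less precise the trial, the more extreme the estimate must be before it is declared significant.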
Summary points
Asymmetrical funnel plots may indicate publication bias or be due to exaggeration of treatment effects in small studies of low quality
Bias is not the only explanation for funnel plot asymmetry; funnel plots should be seen as a means of examining “small study effects” (the tendency for the smaller studies in a metaanalysis to show larger treatment effects) rather than a tool for diagnosing specific types of bias
Statistical methods may be used to examine the evidence for bias and to examine the robustness of the conclusions of the metaanalysis in sensitivity analyses
“Correction” of treatment effect estimates for bias should be avoided as such corrections may depend heavily on the assumptions made
Multivariable models may be used, with caution, to examine the relative importance of different types of bias
Graphical methods for detecting bias
Funnel plots
Funnel plots were first used in educational research and psychology.11 They are simple scatter plots of the treatment effects estimated from individual studies (horizontal axis) against some measure of study size (vertical axis). Because precision in estimating the underlying treatment effect increases as a study's sample size increases, effect estimates from small studies scatter more widely at the bottom of the graph, with the spread narrowing among larger studies. In the absence of bias the plot therefore resembles a symmetrical inverted funnel (fig 1 (left)).
Ratio measures of treatment effect (relative risk or odds ratio) are plotted on a logarithmic scale, so that effects of the same magnitude but in opposite directions (for example, 0.5 and 2) are equidistant from 1.0.12 Treatment effects have generally been plotted against sample size or log sample size. However, the statistical power of a trial is determined by both the sample size and the number of participants developing the event of interest, and so the use of standard error as the measure of study size is generally a good choice. Plotting against precision (1/standard error) emphasises differences between larger studies, which may be useful in some situations. Guidelines on the choice of axis in funnel plots are presented elsewhere.13
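The coordinates of a funnel plot can be computed directly from each trial's two by two table. The sketch below assumes the usual large sample (Woolf) formula for the standard error of a log odds ratio and assumes that no cell is zero; the function name and the two trials are hypothetical.

```python
import math

def log_odds_ratio_and_se(events_treat, n_treat, events_ctrl, n_ctrl):
    """Return (log odds ratio, standard error) for one two arm trial.

    Uses the large sample (Woolf) formula; assumes no cell count is zero.
    """
    a, b = events_treat, n_treat - events_treat   # treatment arm: events, non-events
    c, d = events_ctrl, n_ctrl - events_ctrl      # control arm: events, non-events
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

# Hypothetical trials: events and total in the treatment arm, then the control arm.
# The second trial has the same event proportions but ten times as many patients.
trials = [(10, 100, 20, 100), (100, 1000, 200, 1000)]
funnel_points = [log_odds_ratio_and_se(*trial) for trial in trials]

for log_or, se in funnel_points:
    # Horizontal axis: log odds ratio. Vertical axis: standard error or precision.
    print(f"log OR {log_or:+.3f}, SE {se:.3f}, precision {1 / se:.2f}")
```

Both hypothetical trials give the same odds ratio, but the larger one has the smaller standard error and so would sit nearer the narrow end of the funnel. Working on the log scale is also what makes odds ratios of 0.5 and 2 equidistant from 1.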
Reporting bias—for example, because smaller studies showing no statistically significant beneficial effect of the treatment (open circles in fig 1 (left)) remain unpublished—leads to an asymmetrical appearance with a gap in the bottom right of the funnel plot (fig 1 (centre)). In this situation the combined effect from metaanalysis overestimates the treatment's effect. 14 15 Smaller studies are, on average, conducted and analysed with less methodological rigour than larger ones, so that asymmetry may also result from the overestimation of treatment effects in smaller studies of lower methodological quality (fig 1 (right)).
Alternative explanations of funnel plot asymmetry
It is important to realise that funnel plot asymmetry may have causes other than bias.14 Heterogeneity between trials leads to asymmetry if the true treatment effect is larger in the smaller trials. For example, if a combined outcome is considered then substantial benefit may be seen only in patients at high risk for the component of the combined outcome affected by the intervention. 16 17 Trials conducted in patients at high risk also tend to be smaller because of the difficulty in recruiting such patients and because increased event rates mean that smaller sample sizes are required to detect a given effect. Some interventions may have been implemented less thoroughly in larger trials, which thus show decreased treatment effects. For example, an asymmetrical funnel plot was found in a metaanalysis of trials examining the effect of comprehensive assessment on mortality. An experienced consultant geriatrician was more likely to be actively involved in the smaller trials, and this may explain the larger treatment effects observed in these trials. 14 18
Other sources of funnel plot asymmetry are discussed elsewhere.19 Because publication bias is only one of the possible reasons for asymmetry, the funnel plot should be seen as a means of examining “small study effects” (the tendency for the smaller studies in a metaanalysis to show larger treatment effects). The presence of funnel plot asymmetry should lead to consideration of possible explanations and may bring into question the interpretation of the overall estimate of treatment effect from a metaanalysis.
Examining biological plausibility
In some circumstances the possible presence of bias can be examined through markers of adherence to treatment, such as drug metabolites in patients' urine or markers of the biological effects of treatment such as the achieved reduction in cholesterol concentration in trials of cholesterol lowering drugs. If patients' adherence to an effective treatment varies across trials this should result in corresponding variation in treatment effects. Scatter plots of treatment effect against adherence should be compatible with there being no treatment effect at 0% adherence, and so a simple regression line should intercept the vertical axis at zero treatment effect. If a scatter plot indicates a treatment effect even when no patients adhere to treatment then bias is a possible explanation. Such plots provide an analysis that is independent of study size. For example, in a metaanalysis of trials examining the effect of reducing dietary sodium on blood pressure Midgley et al plotted the reduction in blood pressure against the reduction in urinary sodium concentration for each study and performed a linear regression analysis (fig 2).20 The results show a reduction in blood pressure even in the absence of a reduction in urinary sodium concentration, which may indicate the presence of bias.
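The intercept check described above amounts to fitting a straight line and reading off the predicted effect at zero adherence or zero change in the marker. The following is a minimal sketch using an unweighted least squares fit; the data are invented for illustration and are not those of Midgley et al, and the units are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical trial level data (one value per trial):
# reduction in urinary sodium (mmol/day) and reduction in blood pressure (mm Hg).
sodium_reduction = [20.0, 50.0, 80.0, 110.0]
bp_reduction = [2.0, 3.5, 5.0, 6.5]

intercept, slope = fit_line(sodium_reduction, bp_reduction)
print(f"intercept {intercept:.2f} mm Hg, slope {slope:.3f} mm Hg per mmol/day")
```

An intercept clearly above zero, as in these made-up data, is the pattern that would raise the question of bias. A fuller analysis would weight each trial by its precision and attach a confidence interval to the intercept.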
Statistical methods for detecting and correcting for bias
Selection models
“Selection models” to detect publication bias model the selection process that determines which results are published, based on the assumption that the study's P value affects its probability of publication.21–23 The methods can be extended to estimate treatment effects, corrected for the estimated publication bias,24 but avoidance of strong assumptions about the nature of the selection mechanism means that a large number of studies is required so that a sufficient range of P values is included. Published applications include a metaanalysis of trials of homoeopathy and correction of estimates of the association between passive smoking and lung cancer. 25 26 The complexity of the methods and the large number of studies needed probably explain why selection models have not been widely used in practice.
Copas proposed a model in which the probability that a study is included in a metaanalysis depends on its standard error.27 Because there are not enough data to choose a single “best” model, he advocates sensitivity analyses in which the value of the estimated treatment effect is computed under a range of assumptions about the severity of the selection bias: these show how the estimated effect varies as the assumed amount of selection bias increases. Application of the method to epidemiological studies of environmental tobacco smoke and lung cancer suggests that publication bias may explain some of the association observed in metaanalyses of these studies.28
The “correction” of effect estimates when publication bias is assumed to be present is problematic and a matter of ongoing debate. Results may depend heavily on the modelling assumptions used. Many factors may affect the probability of publication of a given set of results, and it is difficult, if not impossible, to model these adequately. Furthermore, publication bias is only one of the possible explanations for associations between treatment effects and study size. It is therefore prudent to restrict the use of statistical methods that model selection mechanisms to the identification of bias rather than correcting for it.29
Trim and fill
Duval and Tweedie have proposed “trim and fill,” a method based on adding studies to a funnel plot so that it becomes symmetrical.30–32 Smaller studies are omitted until the funnel plot is symmetrical (trimming). The trimmed funnel plot is used to estimate the true “centre” of the funnel, and then the omitted studies and their missing “counterparts” around the centre are replaced (filling). This provides an estimate of the number of missing studies and an adjusted treatment effect, including the “filled” studies. A recent study that used the trim and fill method in 48 metaanalyses estimated that 56% of metaanalyses had at least one study missing, whereas in 10 the number of missing studies was statistically significant.33 However, simulation studies have found that the trim and fill method detects “missing” studies in a substantial proportion of metaanalyses, even in the absence of bias.34 Thus there is a danger that in many metaanalyses application of the method could mean adding and adjusting for nonexistent studies in response to funnel plot asymmetry arising from nothing more than random variation.
Statistical methods for detecting funnel plot asymmetry
An alternative approach, which does not attempt to define the selection process leading to publication, is to examine associations between study size and estimated treatment effects. Begg and Mazumdar proposed a rank correlation method to examine the association between the effect estimates and their variances (or, equivalently, their standard errors),35 whereas Egger et al introduced a linear regression approach, which is equivalent to a weighted regression of treatment effect (for example, log odds ratio) on its standard error, with weights inversely proportional to the variance of the effect size.14 Because each of these approaches looks for an association between the study's treatment effect and its standard error, they can be seen as statistical analogues of funnel plots. The regression method is more sensitive than the rank correlation approach, but the sensitivity of both methods is generally low in metaanalyses based on fewer than 20 trials.36
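The weighted regression formulation can be sketched in a few lines. The code below fits treatment effect on standard error with weights equal to 1/variance and returns the slope, which measures how much the estimated effect changes as studies become less precise. It is an illustration only: the function names and data are hypothetical, and the standard error and P value of the slope, which a real test of asymmetry needs, are not computed.

```python
def weighted_line(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def asymmetry_coefficient(effects, standard_errors):
    """Slope of treatment effect on standard error, weighted by 1 / variance."""
    weights = [1 / se ** 2 for se in standard_errors]
    _, slope = weighted_line(standard_errors, effects, weights)
    return slope

# Hypothetical log odds ratios in which less precise trials show larger benefit.
standard_errors = [0.1, 0.2, 0.4, 0.5]
log_odds_ratios = [-0.25, -0.40, -0.70, -0.85]

print(f"asymmetry coefficient {asymmetry_coefficient(log_odds_ratios, standard_errors):.2f}")
```

A slope near zero is what a symmetrical funnel plot would produce. A clearly negative slope on the log odds ratio scale, as in these invented data, corresponds to smaller studies showing more beneficial effects.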
An obvious extension is to consider study size as one of several different possible explanations for heterogeneity between studies in multivariable “metaregression” models. 37 38 For example, the effects of study size, adequacy of randomisation, and type of blinding might be examined simultaneously. Three notes of caution are necessary. Firstly, in standard regression models inclusion of large numbers of covariates (overfitting) is unwise, particularly if the sample size is small. In metaregression the number of data points corresponds to the number of studies, which is often fewer than 10.36 Thus tests for an association between treatment effect and large numbers of study characteristics may lead to spurious claims of association. Secondly, all associations found in such analyses are observational and may be confounded by other factors. Thirdly, regression analyses using averages of patient characteristics from each trial (for example, patients' mean age) can give misleading impressions of the relation for individual patients (the “ecological fallacy”).39
Metaregression can also be used to examine associations between clinical outcomes and markers of adherence to, or the biological effects of, treatment, weighting appropriately for study size. As discussed, a nonzero intercept may indicate bias or a treatment effect that is not mediated through the marker. The error in estimating the effect of treatment should be incorporated in such models: Daniels and Hughes discuss this and propose a Bayesian estimation procedure, which has been applied in a study of CD4 cell count as a surrogate end point in clinical trials of HIV. 40 41
Case study
Is the effect of homoeopathy due to the placebo effect?
The placebo effect is a popular explanation for the apparent efficacy of homoeopathic remedies.42–44 Linde et al addressed this question in a systematic review and metaanalysis of 89 published and unpublished reports of randomised placebo controlled trials of homoeopathy.25 They did an extensive literature search and quality assessment that covered dimensions of internal validity known to be associated with treatment effects.10
The funnel plot of the 89 trials is clearly asymmetrical (fig 3 (top)), and both the rank correlation and the weighted regression tests indicated clear asymmetry (P<0.001). The authors used a selection model to correct for publication bias and found that the odds ratio was increased from 0.41 (95% confidence interval 0.34 to 0.49) to 0.56 (0.32 to 0.97, P=0.037). 22 24 They concluded that the clinical effects of homoeopathy were unlikely to be due to placebo.25 Similar results are obtained with the trim and fill method (fig 3 (bottom)), which adds 16 studies to the funnel plot, leading to an adjusted odds ratio of 0.52 (0.43 to 0.63). These methods do not, however, allow simultaneously for other sources of bias. It may be more reasonable to conclude that methodological flaws led to exaggeration of treatment effects in the published trials than to assume that there are unpublished trials showing substantial harm caused by homoeopathy (fig 3 (bottom)).
The table shows the results from metaregression analyses of associations between trial characteristics and the estimated effect of homoeopathy. Results are presented as ratios of odds ratios: ratios of less than 1 correspond to a smaller odds ratio for trials with the characteristic and hence a larger apparent benefit of homoeopathy. For example, in univariable analysis the ratio of odds ratios was 0.24 (95% confidence interval 0.12 to 0.46) if the assessment of outcome was not adequately blinded, implying that such trials showed much greater protective effects of homoeopathy. In the multivariable analysis shown in the table there was clear evidence from the asymmetry coefficient that treatment effects were larger in smaller studies and in studies with inadequate blinding of outcome assessment. There was also a tendency for larger treatment effects in trials published in languages other than English.
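A ratio of odds ratios for a single yes/no trial characteristic can be illustrated with a simplified calculation. The sketch below pools log odds ratios by inverse variance within each group of trials and exponentiates the difference, which is what a fixed effect metaregression with one binary covariate estimates. This is an assumption made for illustration and may differ from the model behind the table; the function names and data are hypothetical, and no confidence interval is computed.

```python
import math

def pooled_log_or(log_ors, standard_errors):
    """Fixed effect (inverse variance) pooled log odds ratio."""
    weights = [1 / se ** 2 for se in standard_errors]
    return sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)

def ratio_of_odds_ratios(with_characteristic, without_characteristic):
    """Pooled odds ratio in trials with a characteristic divided by the
    pooled odds ratio in trials without it.

    Each argument is a non-empty list of (log odds ratio, standard error) pairs.
    """
    pooled_with = pooled_log_or(*zip(*with_characteristic))
    pooled_without = pooled_log_or(*zip(*without_characteristic))
    return math.exp(pooled_with - pooled_without)

# Hypothetical trials, grouped by whether outcome assessment was adequately blinded.
not_blinded = [(math.log(0.3), 0.4), (math.log(0.3), 0.5)]
blinded = [(math.log(0.6), 0.2), (math.log(0.6), 0.3)]

print(f"ratio of odds ratios {ratio_of_odds_ratios(not_blinded, blinded):.2f}")
```

In these invented data the ratio is below 1, meaning that the trials without adequate blinding show the smaller pooled odds ratio and hence the larger apparent benefit.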
Summary recommendations on investigating and dealing with publication and other biases in a metaanalysis
Examining for bias
Check for funnel plot asymmetry with graphical and statistical methods
Use metaregression to look for associations between key measures of trial quality and size of treatment effect
Use metaregression to examine other possible explanations for heterogeneity
If available, examine associations between size of treatment effect and changes in biological markers or patients' adherence to treatment
Dealing with bias
If there is evidence of bias, report this with the same prominence as any combined estimate of treatment effect
Consider sensitivity analyses to establish whether the estimated treatment effect is robust to reasonable assumptions about the effect of bias
Consider excluding studies of lower quality
If sensitivity analyses show that a review's conclusions could be seriously affected by bias, then consider recommending that the evidence to date be disregarded
The largest trials of homoeopathy (those with the smallest standard error) that were also double blind and had adequate concealment of randomisation show no effect. The evidence is thus compatible with the hypothesis that the clinical effects of homoeopathy are completely due to placebo and that the effects observed in Linde et al's metaanalysis are explained by a combination of publication bias and inadequate methodological quality of trials. We emphasise, however, that these results cannot prove that the apparent benefits of homoeopathy are due to bias.
Conclusions
Prevention is better than cure. In conducting a systematic review and metaanalysis, investigators should make strenuous efforts to find all published studies and search for unpublished work. The quality of component studies should also be carefully assessed.10 The box shows summary recommendations on examining for, and dealing with, bias in metaanalysis. Selection models for publication bias are likely to be of most use in sensitivity analyses in which the robustness of a metaanalysis to possible publication bias is assessed. Funnel plots should be used in most metaanalyses to provide a visual assessment of whether the estimates of treatment effect are associated with study size. Statistical methods may be used to examine the evidence for funnel plot asymmetry and competing explanations for heterogeneity between studies. The power of these methods is, however, limited, particularly for metaanalyses based on a small number of small studies. The results of such metaanalyses should always be treated with caution.
Statistically combining data from new trials with a body of flawed evidence does not remove bias. However, there is currently no consensus to guide clinical practice or future research when a systematic review suggests that the evidence to date is unreliable for one or more of the reasons discussed here. If there is clear evidence of bias, and if sensitivity analyses show that this could seriously affect a review's conclusions, then reviewers should recommend that some or all of the evidence to date be disregarded. Future reviews could then be based on new, high quality evidence. Improvements in the conduct and reporting of trials, prospective registration, and easier access to data from published and unpublished studies 45 46 mean that bias will hopefully be a diminishing problem in future systematic reviews and metaanalyses.
Acknowledgments
We thank Klaus Linde and Julian Midgley for unpublished data.
Footnotes

Series editor: Matthias Egger

Competing interests Systematic Reviews in Health Care: Metaanalysis in Context can be purchased through the BMJ Bookshop (http://www.bmjbookshop.com/); further information and updates for the book are available (http://www.systematicreviews.com/)