Meta-analysis: Spurious precision? Meta-analysis of observational studies
(Published 10 January 1998)
Cite this as: BMJ 1998;316:140
- Matthias Egger, reader in social medicine and epidemiology a,
- Martin Schneider, research fellow b,
- George Davey Smith, professor of clinical epidemiology a
- a Department of Social Medicine, University of Bristol, Bristol BS8 2PR
- b Department of Social and Preventive Medicine, University of Berne, CH-3012 Berne, Switzerland
- Correspondence to: Dr Egger
In previous articles we have focused on the potentials, principles, and pitfalls of meta-analysis of randomised controlled trials.1 2 3 4 5 Meta-analysis of observational data is, however, also becoming common. In a Medline search we identified 566 articles (excluding those published as letters) published in 1995 and indexed with the medical subject heading (MeSH) term “meta-analysis.” We randomly selected 100 of these articles and examined them further. Sixty articles reported on actual meta-analyses, and 40 were methodological papers, editorials, and traditional reviews (1). Among the meta-analyses, about half were based on observational studies, mainly cohort and case-control studies of medical interventions or aetiological associations.
The randomised controlled trial is the principal research design in the evaluation of medical interventions. However, aetiological hypotheses—for example, those relating common exposures to the occurrence of disease—cannot generally be tested in randomised experiments. Does breathing other people's tobacco smoke cause lung cancer, drinking coffee cause coronary heart disease, and eating a diet rich in saturated fat cause breast cancer? Studies of such “menaces of daily life”6 use observational designs or examine the presumed biological mechanisms in the laboratory. In these situations the risks involved are generally small, but once a large proportion of the population is exposed, the potential public health implications of these associations—if they are causal—can be striking.
Analyses of observational data also have a role in medical effectiveness research.7 The evidence available from clinical trials will rarely answer all the important questions. Most trials are conducted to establish efficacy and safety of a single agent in a specific clinical situation. Owing to the limited size of such trials, less common adverse effects of drugs may only be detected in case-control studies or in analyses of databases from postmarketing surveillance schemes. Also, because follow up is generally limited, adverse effects occurring many years later will not be identified. If established interventions are implicated in adverse effects years later, there will be ethical, political, and legal obstacles to the conduct of a new trial. Recent examples of such situations include the controversy surrounding a possible association between intramuscular administration of vitamin K to newborns and the risk of childhood cancer8 and the question of whether oral contraceptives increase women's risk of breast cancer.9
Summary points
- Meta-analysis of observational studies is as common as meta-analysis of controlled trials
- Confounding and selection bias often distort the findings from observational studies
- There is a danger that meta-analyses of observational data produce very precise but equally spurious results
- The statistical combination of data should therefore not be a prominent component of reviews of observational studies
- More is gained by carefully examining possible sources of heterogeneity between the results from observational studies
- Reviews of any type of research and data should use a systematic approach, which is documented in a materials and methods section
The patients who are enrolled in randomised trials often differ from the average patient seen in clinical practice. Women, elderly people, and minority ethnic groups are often excluded from randomised trials.10 11 Similarly, the university hospitals typically participating in clinical trials differ from the settings where most patients are treated. In the absence of evidence from randomised trials from these settings and patient groups, the results from observational database analyses may seem more relevant and more readily applicable to clinical practice.12 Finally, strong prior views may preclude the recruitment of sufficient patients or clinics into a randomised experiment. In complementary medicine, for example, consider a treatment that entails drinking your own urine.13 It would probably be impossible to recruit sufficient patients into a controlled trial.
Meta-analysis, by promising a precise and definite answer when the magnitude of the underlying risks is small or when the results from individual studies disagree, seems an attractive proposition both in aetiological studies and in observational effectiveness research.
Meta-analysis of randomised trials is based on the assumption that each trial provides an unbiased estimate of the effect of an experimental treatment, with the variability of the results between the studies being attributed to random variation. The overall effect calculated from a group of sensibly combined and representative randomised trials will provide an essentially unbiased estimate of the treatment effect, with an increase in the precision of this estimate. A fundamentally different situation arises in the case of observational studies. Such studies yield estimates of association which may deviate from true underlying relationships beyond the play of chance. This may be due to the effects of confounding factors, the influence of biases, or both.
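The contrast drawn here can be put in symbols. The following is a minimal sketch under a simple additive bias model, which is an assumption made for illustration and not a formulation given in this article; the symbols $y_i$, $\theta$, $b_i$, $v_i$, and $w_i$ are introduced here, and each $b_i$ is treated as a fixed quantity.

```latex
% Randomised trials: each estimate y_i is unbiased for the true effect \theta
y_i = \theta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, v_i)

% Fixed effects (inverse variance) pooled estimate and its variance
\hat{\theta} = \frac{\sum_i w_i y_i}{\sum_i w_i}, \qquad
w_i = \frac{1}{v_i}, \qquad
\operatorname{Var}(\hat{\theta}) = \frac{1}{\sum_i w_i}

% Observational studies: each estimate also carries a bias b_i
% (confounding, selection, differential recall)
y_i = \theta + b_i + \varepsilon_i
\quad\Longrightarrow\quad
E[\hat{\theta}] = \theta + \frac{\sum_i w_i b_i}{\sum_i w_i}
```

Under this model the variance of the pooled estimate shrinks with every study added, but the weighted average of the biases does not. The confidence interval narrows around a value that may lie some distance from $\theta$.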
Patients exposed to the factor under investigation may differ in several other aspects that are relevant to the risk of developing the disease in question. Consider, for example, smoking as a risk factor for suicide. Virtually all cohort studies have shown a positive association, with a dose-response relation being evident between the amount smoked and the probability of committing suicide.14 15 16 17 18 19 Figure 1 illustrates this for four prospective studies of middle aged men, including the massive cohort of patients screened for the multiple risk factors intervention trial. Based on over 390 000 men and almost five million years of follow up, a meta-analysis of these cohorts produces highly precise and significant estimates of the increase in suicide risk that is associated with smoking different daily amounts of cigarettes: relative rate for 1–14 cigarettes 1.43 (95% confidence interval 1.06 to 1.93), for 15–24 cigarettes 1.88 (1.53 to 2.32), and for ≥25 cigarettes 2.18 (1.82 to 2.61).
On the basis of established criteria,20 many would consider the association to be causal—if only it were more plausible. Indeed, it is improbable that smoking is causally related to suicide.14 Rather, it is the social and mental states predisposing to suicide that are also associated with the habit of smoking. Factors that are related to both the exposure and the disease under study—confounding factors—may thus distort results. If the factor is known and has been measured, the usual approach is to adjust for its influence in the analysis. For example, studies assessing the influence of coffee consumption on the risk of myocardial infarction should make statistical adjustments for smoking, as smoking is generally associated with drinking larger amounts of coffee, and smoking is a cause of coronary heart disease.21 However, even if adjustments for confounding factors have been made in the analysis, residual confounding remains a potentially serious problem in observational research. Residual confounding arises when a confounding factor cannot be measured with sufficient precision—which often occurs in epidemiological studies.22 23 Confounding is the most important threat to the validity of results from cohort studies, whereas many more difficulties, in particular selection biases, arise in case-control studies.24
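The mechanism can be illustrated with a small simulation. The sketch below was added for this purpose and is not an analysis from the studies cited: all probabilities are hypothetical and the function names are invented here. A binary confounder raises both the chance of exposure and the risk of disease, while exposure itself has no effect. The crude relative risk should then lie well above 1, whereas the Mantel-Haenszel estimate stratified by the confounder should be close to 1.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable


def simulate(n=100_000):
    """Return counts[confounder][exposed] = [cases, non_cases].

    All probabilities are HYPOTHETICAL. Exposure has no effect on disease.
    """
    counts = {c: {e: [0, 0] for e in (0, 1)} for c in (0, 1)}
    for _ in range(n):
        confounder = rng.random() < 0.3
        # The confounder makes exposure more likely...
        exposed = rng.random() < (0.6 if confounder else 0.2)
        # ...and raises disease risk; exposure plays no part here.
        diseased = rng.random() < (0.10 if confounder else 0.02)
        counts[int(confounder)][int(exposed)][0 if diseased else 1] += 1
    return counts


def crude_rr(counts):
    """Relative risk ignoring the confounder."""
    cases = {e: sum(counts[c][e][0] for c in counts) for e in (0, 1)}
    totals = {e: sum(sum(counts[c][e]) for c in counts) for e in (0, 1)}
    return (cases[1] / totals[1]) / (cases[0] / totals[0])


def mantel_haenszel_rr(counts):
    """Mantel-Haenszel relative risk, stratified by the confounder."""
    numerator = denominator = 0.0
    for c in counts:
        exposed_cases, exposed_total = counts[c][1][0], sum(counts[c][1])
        unexposed_cases, unexposed_total = counts[c][0][0], sum(counts[c][0])
        stratum_total = exposed_total + unexposed_total
        numerator += exposed_cases * unexposed_total / stratum_total
        denominator += unexposed_cases * exposed_total / stratum_total
    return numerator / denominator


counts = simulate()
crude = crude_rr(counts)
adjusted = mantel_haenszel_rr(counts)
print(f"crude relative risk {crude:.2f}, adjusted relative risk {adjusted:.2f}")
```

Stratification works in this sketch only because the confounder is recorded without error. When it is measured imprecisely, part of the spurious association survives adjustment as residual confounding.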
Implausibility of results, as in the case of smoking and suicide, rarely protects us from making misleading claims. It is generally easy to produce plausible explanations for the findings from observational research. In a cohort study of sex workers, for example, one group of researchers that investigated cofactors in transmission of HIV among heterosexual men and women found a strong association between oral contraceptives and HIV infection, which was independent of other factors.25 The authors hypothesised that, among other mechanisms, the risk of transmission could be increased with oral contraceptives due to “effects on the genital mucosa, such as increasing the area of ectopy and the potential for mucosal disruption during intercourse.” In a cross sectional study another group produced diametrically opposed findings, indicating that oral contraceptives protect against the virus.26 This was considered to be equally plausible, “since progesterone-containing oral contraceptives thicken cervical mucus, which might be expected to hamper the entry of HIV into the uterine cavity.” It is likely that confounding and bias had a role in producing these contradictory findings. This example should be kept in mind when assessing other seemingly plausible epidemiological associations.
Observational studies have consistently shown that people eating more fruits and vegetables, which are rich in β carotene, and people having higher serum β carotene concentrations have lower rates of cardiovascular disease and cancer.27 β carotene has antioxidant properties and could thus plausibly be expected to prevent carcinogenesis and atherogenesis by reducing oxidative damage to DNA and lipoproteins.27 Contrary to many other associations found in observational studies, this hypothesis could be, and was, tested in experimental studies. The findings of four large trials have recently been published.28 29 30 31 The results were disappointing and even—for the two trials conducted in men at high risk (smokers and workers exposed to asbestos)28 29—disturbing.
We performed a meta-analysis of the findings for cardiovascular mortality, comparing the results from the six observational studies recently reviewed by Jha et al27 with those from the four randomised trials. For the observational studies the results relate to a comparison between groups with high and low β carotene intake or serum β carotene concentration, whereas in the trials the participants randomised to β carotene supplements were compared with those randomised to placebo. With a fixed effects model, the meta-analysis of the cohort studies shows a significantly lower risk of cardiovascular death (relative risk reduction 31% (95% confidence interval 41% to 20%, P<0.0001)) (fig 2). The results from the randomised trials, however, show a moderate adverse effect of β carotene supplementation (relative increase in the risk of cardiovascular death 12% (4% to 22%, P=0.005)). Similarly discrepant results between epidemiological studies and trials were observed for the incidence of and mortality from cancer. This example illustrates that in meta-analyses of observational studies, the analyst may well be simply producing tight confidence intervals around spurious results.
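For readers unfamiliar with the calculation, a fixed effects model of the inverse variance type can be sketched as follows. This is a minimal illustration and not the code used for the analysis above. The function names are invented here, the three study results are hypothetical, and standard errors are recovered from 95% confidence limits on the log scale.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log ratio, recovered from its 95% confidence limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_pool(estimates):
    """Inverse variance fixed effects pooling of ratio estimates.

    `estimates` is a list of (ratio, lower, upper) tuples.
    Returns the pooled ratio with its 95% confidence limits.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for ratio, lower, upper in estimates:
        weight = 1.0 / se_from_ci(lower, upper) ** 2
        total_weight += weight
        weighted_sum += weight * math.log(ratio)
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - 1.96 * pooled_se),
        math.exp(pooled_log + 1.96 * pooled_se),
    )


# HYPOTHETICAL relative risks with 95% limits, invented for illustration only;
# they are not the results of the cohort studies or trials discussed here.
hypothetical_studies = [
    (0.70, 0.50, 0.98),
    (0.65, 0.45, 0.94),
    (0.75, 0.55, 1.02),
]

pooled, low, high = fixed_effect_pool(hypothetical_studies)
print(f"pooled relative risk {pooled:.2f} ({low:.2f} to {high:.2f})")
```

The pooled interval is narrower than that of any single study, whatever the validity of the individual estimates. Nothing in the calculation distinguishes unbiased inputs from confounded ones.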
Some observers suggest that meta-analysis of observational studies should be abandoned altogether.32 We disagree, but we think that the statistical combination of studies should not generally be a prominent component of reviews of observational studies. The thorough consideration of possible sources of heterogeneity between observational study results will provide more insights than the mechanistic calculation of an overall measure of effect, which will often be biased.
Heterogeneity can be explored in funnel plots, a graphical method discussed in detail previously.5 Funnel plots will, however, generally be less useful in the context of observational meta-analyses. Publication bias and related biases4 will be less important against the background of the numerous other biases and confounding factors that may introduce heterogeneity. Several such situations are depicted in figure 3. Consider diet and breast cancer. The hypothesis from ecological analyses33 that higher intake of saturated fat could increase the risk of breast cancer generated much observational research, often with contradictory results. A comprehensive meta-analysis34 showed an association for case-control but not for cohort studies (odds ratio 1.36 for case-control studies versus relative rate 0.95 for cohort studies comparing highest with lowest category of saturated fat intake, P=0.0002 for difference in our calculation) (fig 3). This discrepancy was also shown in two separate large collaborative pooled analyses of cohort and case-control studies.35 36 The most likely explanation for this situation is that biases in the recall of dietary items and in the selection of study participants have produced a spurious association in the case-control comparisons.36
That differential recall of past exposures may introduce bias is also evident from a meta-analysis of case-control studies of intermittent sunlight exposure and melanoma (fig 3).37 When studies were combined in which some degree of blinding to the study hypothesis was achieved, only a small and non-significant effect (odds ratio 1.17 (95% confidence interval 0.98 to 1.39)) was evident. Conversely, in studies without blinding, the effect was considerably greater and significant (1.84 (1.52 to 2.25)). The difference between these two estimates is unlikely to be a product of chance (P=0.0004 in our calculation).
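One common way of testing the difference between two independent pooled estimates is a z test on the log scale, with standard errors recovered from the confidence limits. The sketch below applies it to the two odds ratios quoted above. It is an assumption that this resembles the calculation referred to in the text, since the method used is not described here, so the P value it returns need not match the reported figure exactly. The function names are invented for this illustration.

```python
import math


def log_se(lower, upper, z=1.96):
    """Standard error of a log odds ratio from its 95% confidence limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def difference_test(or_a, ci_a, or_b, ci_b):
    """z test for the difference between two independent log odds ratios."""
    difference = math.log(or_b) - math.log(or_a)
    se_difference = math.sqrt(log_se(*ci_a) ** 2 + log_se(*ci_b) ** 2)
    z = difference / se_difference
    # Two sided P value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Odds ratios and 95% confidence intervals as quoted in the text
z, p = difference_test(
    1.17, (0.98, 1.39),  # studies with some degree of blinding
    1.84, (1.52, 2.25),  # studies without blinding
)
print(f"z = {z:.2f}, two sided P = {p:.4f}")
```

Under these assumptions the two sided P value comes out well below 0.01, which agrees with the conclusion that chance is an unlikely explanation for the difference.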
The importance of the methods used for assessing exposure is further illustrated by a meta-analysis of cross sectional data of dietary calcium intake and blood pressure from 23 different studies.38 As shown in figure 3, the regression slope describing the change in systolic blood pressure (in mm Hg) per 100 mg of calcium intake is strongly influenced by the approach used for assessing the amount of calcium consumed. The association is small and only marginally significant with diet histories (slope −0.01 (−0.003 to −0.016)) but large and highly significant when food frequency questionnaires were used (−0.15 (−0.11 to −0.19)). With studies using 24 hour recall an intermediate result emerges (−0.06 (−0.09 to −0.03)). Diet histories assess patterns of usual intake over long periods of time and require an extensive interview with a nutritionist, whereas 24 hour recall and food frequency questionnaires are simpler methods that reflect current consumption.39 It is conceivable that different precision in the assessment of current calcium intake may explain the differences in the strength of the associations found, a statistical phenomenon known as regression dilution bias.40
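Regression dilution can be shown with a short simulation. The sketch below is illustrative only: the slope, the standard deviations, and the variable names are hypothetical and are not taken from the calcium studies. Under the classical measurement error model assumed here, random error in the exposure attenuates the estimated slope by the factor var(true) / (var(true) + var(error)).

```python
import random


def ols_slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL values chosen for illustration
TRUE_SLOPE = -0.10   # change in outcome per unit of true exposure
SD_TRUE = 1.0        # spread of the true exposure
SD_ERROR = 1.0       # spread of the random measurement error
N = 50_000

true_x = [rng.gauss(0, SD_TRUE) for _ in range(N)]
y = [TRUE_SLOPE * x + rng.gauss(0, 1.0) for x in true_x]
measured_x = [x + rng.gauss(0, SD_ERROR) for x in true_x]

slope_true = ols_slope(true_x, y)          # exposure known exactly
slope_measured = ols_slope(measured_x, y)  # exposure measured with error
attenuation = SD_TRUE**2 / (SD_TRUE**2 + SD_ERROR**2)

print(f"slope with true exposure     {slope_true:.3f}")
print(f"slope with measured exposure {slope_measured:.3f}")
print(f"expected attenuated slope    {attenuation * TRUE_SLOPE:.3f}")
```

With equal variances for the true exposure and the error, the attenuation factor is 0.5, so the slope estimated from the error-prone measurements should be roughly half the true slope. Studies that differ only in the precision of their exposure assessment can therefore report associations of quite different strength.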
An important criterion supporting causality of associations is a dose-response relation. In occupational epidemiology the quest to show such an association can lead to very different groups of employees being compared. In a meta-analysis that examined the link between exposure to formaldehyde and cancer, funeral directors and embalmers (high exposure) were compared with anatomists and pathologists (intermediate to high exposure) and with industrial workers (low to high exposure, depending on job assignment).41 As shown in figure 3, there is a striking deficit of deaths from lung cancer among anatomists and pathologists (standardised mortality ratio 33 (95% confidence interval 22 to 47)), which is most likely to be due to a lower prevalence of smoking among this group. In this situation few would argue that formaldehyde protects against lung cancer. In other instances, however, such selection bias may be less obvious.
In these examples heterogeneity was explored in the spirit of sensitivity analysis2—to test the stability of findings across different study designs and different approaches to both exposure ascertainment and selection of study participants. Such sensitivity analyses should alert investigators to inconsistencies and prevent misleading conclusions. Although heterogeneity was noticed, explored, and sometimes extensively discussed, the way the situation was interpreted differed considerably. In the analysis examining studies of dietary fat and risk of breast cancer, the authors went on to combine case-control and cohort studies and concluded that “higher intake of dietary fat is associated with an increased risk of breast cancer.”34 The meta-analysis of exposure to sunlight and risk of melanoma was exceptional in its thorough examination of possible reasons for heterogeneity, and the calculation of a combined estimate was deemed appropriate in one subgroup of population based studies only.37 Conversely, uninformative and potentially misleading combined estimates were calculated both in the study on dietary calcium and blood pressure38 and in the meta-analysis of occupational exposure to formaldehyde.41 These case studies show that the temptation to combine the results of studies seems to be hard to resist.
The suggestion that formal meta-analysis of observational studies can be misleading and that insufficient attention is often given to heterogeneity does not mean that researchers should return to writing highly subjective narrative reviews. Many of the principles of systematic reviews remain: a study protocol should be written in advance, complete literature searches carried out, and studies selected and data extracted in a reproducible and objective fashion.42 This allows both differences and similarities of the results found in different settings to be inspected, hypotheses to be formulated, and the need for future studies, including randomised controlled trials, to be defined.
We are grateful to Jim Neaton (Multiple Risk Factors Intervention Trial Research Group); Juha Pekkanen and Erkki Vartiainen (North Karelia and Kuopio cohort studies); and Martin Shipley (Whitehall study) for providing additional data on suicides. The department of social medicine at the University of Bristol is part of the Medical Research Council's health services research collaboration.
Funding: ME was supported by the Swiss National Science Foundation.