- George Davey Smith, professor of clinical epidemiology^{a} (zetkin{at}bristol.ac.uk),
- Matthias Egger, reader in social medicine and epidemiology^{a},
- Andrew N Phillips, professor of epidemiology and biostatistics^{b}

^{a}Department of Social Medicine, University of Bristol, Bristol BS8 2PR

^{b}Department of Primary Care and Population Sciences, Royal Free Hospital School of Medicine, London NW3 2PF

- Correspondence to: Professor Davey Smith

## Introduction

In the previous two articles1 2 we outlined the potentials and principles of meta-analysis and the practical steps in performing a meta-analysis. Now we will examine how to use meta-analysis to do more than simply combine the results from all the individual trials into a single effect estimate. Firstly, we discuss the advantages and disadvantages of performing subgroup analyses. Secondly, we consider the situation in which the differences in effects between individual trials are related in a graded way to an underlying phenomenon, such as the degree of mortality risk of the trial participants.

#### Summary points

Meta-analysis can be used to examine differences in treatment effects across trials; however, the fact that randomised trials are included in meta-analyses does not mean that comparisons between trials are also randomised comparisons

Meta-analytic subgroup analyses, like subgroup analyses within trials, are prone to bias and need to be interpreted with caution

A more reliable way of assessing differences in treatment effects is to relate outcome to some underlying patient characteristic on a continuous, or ordered, scale

The underlying level of risk is a key variable which is often related to a given treatment effect, with patients at higher risk receiving more benefit than low risk patients

Individual patient data, rather than published summary statistics, are often required for meaningful subgroup analyses

## Subgroup analysis

The main aim of a meta-analysis is to produce an estimate of the average effect seen in trials of a particular treatment. The direction and magnitude of this average effect is intended to guide decisions about clinical practice for a wide range of patients. Clinicians are thus being asked to treat their patients as though each one is well represented by the patients in the clinical trials included in the meta-analysis. This runs counter to doctors' wish to use the specific characteristics of a patient to tailor that patient's management.3 Indeed, the effect of a given treatment is unlikely to be identical across different groups of patients—for example, young people versus elderly people, those with mild disease versus those with severe disease. It may therefore seem reasonable to base treatment decisions on the results of the trials that have included participants with similar characteristics to the patient under consideration rather than on the overall evidence as provided by meta-analysis.

Decisions based on subgroup analyses, however, are often misleading. Consider, for example, a doctor in Germany being confronted by the meta-analysis of long term β blockade after myocardial infarction (see previous article2). Although a robust beneficial effect is seen in the overall analysis, in the only trial that recruited a substantial proportion of German patients (trial N in previous article),4 there was, if anything, a detrimental effect associated with β blockers. Should the doctor give β blockers to German patients who have had an infarction? Common sense suggests that being German does not prevent a patient from obtaining benefit from β blockade. Thus the best estimate of the outcome for German patients may come through discounting the trial carried out in German patients. This may seem paradoxical; indeed the statistical expression of this phenomenon is known as Stein's paradox (box).5

#### Stein's paradox

Applying the findings from meta-analyses often means that the results from a particular trial are disregarded in favour of the combined result. This will generally be based on the assumption that inconsistent results are purely due to chance. But even if some real differences exist the overall estimate may still provide the best estimate of the effect in that group (Stein's paradox).5

Charles Stein showed that a quantity can be better estimated by taking into account the findings from similar studies, rather than by basing estimation solely on one study. The central principle of Stein's method is the “shrinking” of individual data points towards the grand mean. The amount by which an observed value is adjusted (shrinking factor) will depend on the precision of this value. An outlying value that was measured imprecisely is shrunk towards the grand mean to a greater extent than an outlier that was measured with considerable precision. The result of the trial including German patients contributed little weight to the combined analysis (see the main text)4 and would thus be shrunk a long way towards the overall estimate of a beneficial effect of β blockade.
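Precision-dependent shrinkage of this kind can be sketched in Python. This is an empirical Bayes variant rather than Stein's original estimator (which assumes equal variances): the between-trial variance τ² is estimated with the DerSimonian and Laird moment method (our choice), and each trial estimate is pulled towards the random effects mean by the factor v/(v + τ²), so imprecise trials move further. The log odds ratios and variances are invented for illustration.

```python
def dersimonian_laird_tau2(y, v):
    """Moment estimate of the between-trial variance tau^2."""
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    denom = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(y) - 1)) / denom)


def shrink(y, v):
    """Return (grand mean, shrunken estimates) for estimates y with variances v."""
    tau2 = dersimonian_laird_tau2(y, v)
    w = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Shrinking factor B = v / (v + tau^2): close to 1 for an imprecise trial,
    # so its estimate ends up near the grand mean
    shrunk = [mu + (1 - vi / (vi + tau2)) * (yi - mu) for yi, vi in zip(y, v)]
    return mu, shrunk


# Hypothetical log odds ratios: four precise trials favouring treatment, then
# a precise and an imprecise trial that both point towards harm
log_or = [-0.30, -0.25, -0.20, -0.28, 0.30, 0.30]
var = [0.01, 0.01, 0.01, 0.01, 0.01, 0.25]
mu, shrunk = shrink(log_or, var)
print(f"grand mean {mu:.3f}; shrunken estimates {[round(s, 3) for s in shrunk]}")
```

The two outliers start at the same value, but the imprecise one (last) is pulled much closer to the grand mean, mirroring the treatment of the trial in German patients described above.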

Making decisions between overall effects and particular results is not just a problem created by meta-analysis; it also applies to the interpretation of individual clinical trials.6 Authors of trial reports often spend more time discussing the results seen in subgroups of patients included in the trial than on the overall results. Yet frequently the findings of these subgroup analyses fail to be confirmed by later research. The various trials of β blockade after myocardial infarction yielded several subgroup findings with apparent clinical significance.7 Treatment was said to be beneficial in patients aged under 65 but harmful in older patients, or only beneficial in patients with anterior myocardial infarction. When examined in subsequent studies or in a formal pooling project8 these findings received no support.7 It can be shown that if an overall treatment effect is significant at the 5% level (P<0.05) and the patients are divided at random into two similarly sized groups then there is a 1 in 3 chance that the treatment effect will be large and highly significant in one group but irrelevant and non-significant in the other.9 Which subgroup “clearly” benefits from an intervention is thus often a chance phenomenon, inundating the literature with contradictory findings from subgroup analyses and wrongly inducing clinicians to withhold treatments from some patients.10 11 12
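How readily chance produces discordant subgroups can be checked with a short Python simulation. It assumes a trial split into two equal, independent halves with no true difference in effect, and conditions on an overall result exactly at P = 0.05, so that the two subgroup z statistics must sum to √2 × 1.96. Our criterion (significant at the 5% level in one half but not in the other) is looser than the one in reference 9, so the proportion it gives, about two in five, is not expected to reproduce the 1 in 3 figure; the sketch only shows that such discordance is common.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, about 1.96


def discordance_probability(z_overall=Z_CRIT, n_sim=100_000, seed=2024):
    """Monte Carlo estimate of P(one half significant, the other not)."""
    rng = random.Random(seed)
    z_sum = 2 ** 0.5 * z_overall  # z1 + z2 is fixed by the overall result
    hits = 0
    for _ in range(n_sim):
        # With no true interaction, z1 - z2 ~ N(0, 2), independent of z1 + z2
        diff = rng.gauss(0.0, 2 ** 0.5)
        z1, z2 = (z_sum + diff) / 2, (z_sum - diff) / 2
        if (abs(z1) >= Z_CRIT) != (abs(z2) >= Z_CRIT):
            hits += 1
    return hits / n_sim


def discordance_exact(z_overall=Z_CRIT):
    """Closed form, ignoring the negligible chance of a significant harmful half."""
    half_mean = z_overall / 2 ** 0.5  # expected z in each half
    sd = 0.5 ** 0.5                   # sd of each half's z given the overall z
    return 2 * (1 - NormalDist(half_mean, sd).cdf(Z_CRIT))


print(f"simulated {discordance_probability():.3f}, closed form {discordance_exact():.3f}")
```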

Meta-analyses offer a sounder basis for subgroup analysis, but they are not exempt from producing misleading findings. One of the explanations for the disappointing result seen in the β blocker trial in German patients was that the agent used, oxprenolol, had intrinsic sympathomimetic activity.13 This seemed plausible because the beneficial effect was assumed to be entirely mediated by blockade of the β1 receptor, and the supposition was supported by subgroup analysis in a meta-analysis,14 15 which showed less benefit in trials of patients treated with agents with intrinsic sympathomimetic activity (fig 1). The difference between the two classes of β blockers was significant (P<0.01). Since then, however, a trial was published showing a particularly strong beneficial effect of acebutolol, an agent with intrinsic sympathomimetic activity,16 whereas another trial using metoprolol, a β blocker without intrinsic sympathomimetic activity, was essentially negative.17 This illustrates that, far from aiding clinicians, post hoc subgroup analyses may confuse and mislead. A more reliable way of assessing differences in treatment effects is to relate outcome to some underlying patient characteristic on a continuous, or ordered, scale.18 19

## Meta-regression: examining gradients in treatment effects

The clinical trials included in a meta-analysis often differ in a way that would be expected to modify the outcome. In trials of cholesterol reduction the degree of cholesterol lowering attained differs markedly between studies, and the reduction in mortality from coronary heart disease is greater in the trials in which larger reductions in cholesterol are achieved.18 20 Such graded associations are not limited to situations where greater benefits would be expected consequent on greater changes in a risk factor. In the case of thrombolysis after acute myocardial infarction, the greater the delay in treatment, the smaller the benefit of thrombolysis.21 22 Here, the graded association is seen between the outcome and a characteristic of the treatment used. Such a gradient allows for a more powerful examination of differences in outcomes, as a statistical test for trend can be performed, rather than the less powerful test for evidence of global heterogeneity. Other attributes of study groups—such as age and length of follow up—can readily be analysed in this way. As discussed later in this series,23 such analyses will often require data on individual patients rather than published summary statistics.
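A test for trend of this kind can be sketched as an inverse variance weighted (fixed effect) meta-regression in Python; a random effects version would add a between-trial variance to each trial's variance. The trial data are invented. The trend statistic has 1 degree of freedom, whereas the global heterogeneity statistic Q has k − 1, which is why a real gradient is easier to detect with the former.

```python
import math


def meta_regression(x, y, v):
    """Fixed effect meta-regression of y (e.g. log odds ratios) on a trial covariate x."""
    w = [1 / vi for vi in v]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return {
        "slope": slope,
        "se": 1 / math.sqrt(sxx),
        "chi2_trend": slope ** 2 * sxx,  # test for trend, 1 df
        "q_total": sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y)),  # k - 1 df
    }


# Hypothetical trials: % cholesterol reduction, log odds ratio for CHD death, variance
reduction = [5, 8, 10, 12, 15, 20, 25]
log_or = [-0.02, -0.10, -0.12, -0.20, -0.22, -0.35, -0.40]
variance = [0.02, 0.03, 0.01, 0.04, 0.02, 0.05, 0.03]
fit = meta_regression(reduction, log_or, variance)
print(f"slope {fit['slope']:.4f} (SE {fit['se']:.4f}); "
      f"trend chi2 {fit['chi2_trend']:.1f} on 1 df; Q {fit['q_total']:.1f} on {len(log_or) - 1} df")
```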

## Risk stratification

A factor that is often related to a given treatment effect is the underlying risk of occurrence of the event that the treatment aims to prevent. It makes intuitive sense that patients at high risk are more likely to benefit than those at low risk. In the case of trials of cholesterol lowering, for example, the patient groups have ranged from survivors of heart attack with gross hypercholesterolaemia to groups of healthy asymptomatic people with moderately raised cholesterol concentrations. The death rates from coronary heart disease in the first group have been up to 100 times higher than the death rates in the second group. The outcome of treatment in terms of all cause mortality has been more favourable in the trials recruiting participants at high risk than in the trials recruiting participants at relatively low risk.18 Two factors contribute to this. Firstly, among the high risk participants, the great majority of deaths will be from coronary heart disease, the risk of which is reduced by cholesterol reduction. A 30% reduction in mortality from coronary heart disease therefore translates into a near equivalent reduction in total mortality. In the low risk participants, on the other hand, a much smaller proportion—about 40%—of deaths will be from coronary heart disease. In this case a 30% reduction in mortality from coronary heart disease would translate into a much smaller—about 10%—reduction in all cause mortality. Secondly, if there is any detrimental effect of treatment it may easily outweigh the benefits of cholesterol reduction in the low risk group, whereas in high risk patients, among whom a substantial benefit is achieved from cholesterol reduction, this will not be the case. In a recent meta-analysis of cholesterol lowering trials this situation was evident for trials using fibrates but not for trials using other drugs.24
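The arithmetic in both arguments can be made explicit in a few lines of Python. The 90% share of coronary deaths standing in for "the great majority" and the 10% excess of non-coronary deaths are assumptions for illustration, not figures from the trials. With a 40% share and no harm the reduction is 0.4 × 0.3 = 12%, in line with the "about 10%" quoted.

```python
def all_cause_reduction(chd_share, chd_rrr, other_rr=1.0):
    """Relative reduction in all cause mortality implied by a CHD-specific effect.

    chd_share: proportion of untreated deaths that are from coronary heart disease
    chd_rrr:   relative reduction in coronary deaths (0.30 for 30%)
    other_rr:  rate ratio for all other deaths (1.0 = no effect, >1 = harm);
               a hypothetical parameter
    """
    treated_relative_to_control = chd_share * (1 - chd_rrr) + (1 - chd_share) * other_rr
    return 1 - treated_relative_to_control


for label, share in [("high risk (90% CHD deaths, assumed)", 0.9),
                     ("low risk (40% CHD deaths)", 0.4)]:
    no_harm = all_cause_reduction(share, 0.30)
    with_harm = all_cause_reduction(share, 0.30, other_rr=1.10)
    print(f"{label}: {no_harm:.0%} without harm, {with_harm:.0%} with 10% excess non-CHD deaths")
```

The same excess of non-coronary deaths halves the benefit in the low risk group but barely changes it in the high risk group.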

A similar association between level of risk and benefit can be seen in meta-analyses carried out for other types of medical treatment.25 Thus the use of antiplatelet agents such as aspirin produces a 23% reduction in all cause mortality after an acute myocardial infarction but only a (non-significant) 5% reduction in the primary prevention setting.26 This may reflect a small increase in the risk of haemorrhagic stroke consequent on the use of antiplatelet agents, which counterbalances the beneficial effects on coronary heart disease among low risk individuals but not among those at higher risk. Similarly, a large reduction in relative risk of death was seen in the single study that has reported on treating HIV infection with zidovudine in patients with AIDS.27 A meta-analysis of seven trials, however, showed that zidovudine given early in the course of HIV infection was not associated with any long term survival benefit (fig 2).28 When outcomes are very different in groups at different levels of risk it is inappropriate to perform a meta-analysis in which an overall estimate of the effect of treatment is calculated. In the zidovudine trials, for example, an overall effect estimate from all eight trials (odds ratio 0.96; 95% confidence interval 0.75 to 1.22) is very different from that seen in the only trial among patients with AIDS (0.04; 0.01 to 0.33). If there had been more trials among patients with AIDS the overall effect would seem highly beneficial. Conversely, if there had been more large trials among asymptomatic patients the confidence limits around the overall effect estimate would exclude any useful benefit, which would be misleading if applied to patients with AIDS.
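How the mix of trials drives a pooled estimate can be sketched with fixed effect inverse variance pooling in Python. The AIDS trial's standard error is back-calculated from its quoted 95% confidence interval, which is only approximate because that interval is not symmetric on the log scale; the "early" trials (odds ratio 1, standard error 0.25) are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def se_from_ci(lower, upper):
    """Approximate standard error of a log odds ratio from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def pooled_or(trials):
    """Fixed effect inverse variance pooled odds ratio from (log OR, SE) pairs."""
    weights = [1 / se ** 2 for _, se in trials]
    log_pooled = sum(w * y for w, (y, _) in zip(weights, trials)) / sum(weights)
    return math.exp(log_pooled)


aids = (math.log(0.04), se_from_ci(0.01, 0.33))  # the single trial in patients with AIDS
early = (0.0, 0.25)  # hypothetical trial in asymptomatic patients

for n_early in (1, 7, 20):
    print(n_early, "early trials:", round(pooled_or([aids] + [early] * n_early), 2))
print("3 AIDS trials + 7 early:", round(pooled_or([aids] * 3 + [early] * 7), 2))
```

Nothing about zidovudine changes between these runs; only the composition of the trial set does, which is why a single overall estimate is inappropriate here.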

## Problems in risk stratification

When many trials have been conducted in a particular field, risk stratification can be performed at the level of individual trials. This was carried out in the case of cholesterol lowering, with mortality from coronary heart disease in the control arm of the trials as the stratification variable.18 This stratification is of clinical use, as this is the risk of death from coronary heart disease in patients without treatment—that is, the risk level that clinicians want to use for deciding whether patients will benefit from therapeutic cholesterol lowering. The analysis can also use risk of death in the control group as a continuous variable, through the examination of the interaction between treatment effect and risk in a logistic regression analysis. A significant statistical test for interaction suggests that there is a real difference in outcome at different levels of risk.

The use of mortality in the control group as a stratification variable introduces a potential bias into the analysis, as this mortality is included in the calculation of the effect estimate from each trial.18 29 30 31 Thus, if through chance variation, mortality from coronary heart disease in the control group happens to be low, apparently unfavourable effects of the treatment on mortality would be likely, as mortality in the treatment group would apparently be increased. This would itself produce an association between the outcome measure and the level of risk in the control group, with greater benefit (and fewer disbenefits) being seen in the trials in which the play of chance led to a high mortality in the control group. For example, a recent meta-regression analysis examined whether in middle aged patients with mild to moderate hypertension the benefit from drug treatment depends on the underlying risk of death.32 The scatterplot advocated by L'Abbé et al33 of event rates in the treated group against those in the control group was used (fig 3(top)). This plot is useful for examining the degree of heterogeneity between trials and for identifying outliers. If the treatment is beneficial, trials will fall to the right of the line of identity (the no effect line). A homogeneous set of trials will scatter around a parallel line, which corresponds to the combined treatment effect.

The authors then computed a linear regression model describing mortality in the treated groups as a function of mortality in the control group.32 Because the number of deaths and person years of follow up varied widely between studies, the analysis was weighted by the inverse of the variance of the rate ratio. The resulting regression line intersects with the “null effect” line at a rate of 6 per 1000 person years in the control group (fig 3 (top)). This was interpreted as showing “that drug treatment for mild to moderate hypertension has no effect on, or may even increase, all cause mortality in middle aged patients.” 32 In other words, antihypertensive treatment was considered to be beneficial only in patients at relatively high risk of death. This interpretation, however, is misleading because it ignores the influence of random fluctuations on the slope of the regression line.29 If, because sample sizes are finite, mortality in a control group is by chance particularly low then mortality in the treatment group will, on average, seem high. Conversely, if mortality among controls is by chance high then mortality in the treatment group will seem low. The effect of random error will thus rotate the regression line around a pivot, making it cross the line of identity on the right hand side of the origin.

This phenomenon, a manifestation of regression to the mean,30 can be illustrated in computer simulations. Using the same rates in the control group and assuming a constant reduction of all cause mortality of 10% in treated groups (relative risk 0.9), we considered the situation both assuming no random fluctuations in rates and allowing random error (fig 3 (bottom)).29 After we added error (by sampling 1000 times from the corresponding Poisson distribution) the regression line rotated and crossed the no effect line. Indeed, the intersection is at almost the same point as that found in the earlier meta-analysis—namely, at a mortality in the control group of about 6 per 1000 person years. It is thus quite possible that what was interpreted as reflecting detrimental effects of antihypertensive treatment32 was in fact produced by random variation in event rates.
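A simulation of this kind can be sketched in Python. Assumptions, none taken from the hypertension trials: the 20 control-group rates, 3000 person years per arm and 400 replicates are invented; deaths are drawn from Poisson distributions by Knuth's method; and the weights are inverse variances of the log rate ratio with 0.5 added to each count to avoid division by zero. The true rate ratio is 0.9 in every trial.

```python
import math
import random
from statistics import median


def poisson(lam, rng):
    """Draw a Poisson count (Knuth's method; adequate for the modest means used here)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def weighted_line(x, y, w):
    """Weighted least squares fit y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def crossing_point(control_rates, person_years, rate_ratio, rng=None):
    """Control rate (per 1000 person years) where the fitted line meets y = x.
    rng=None uses expected counts, i.e. no random error."""
    xs, ys, ws = [], [], []
    for rate in control_rates:
        mean_c = rate * person_years / 1000
        mean_t = rate_ratio * mean_c
        d_c = mean_c if rng is None else poisson(mean_c, rng)
        d_t = mean_t if rng is None else poisson(mean_t, rng)
        xs.append(1000 * d_c / person_years)
        ys.append(1000 * d_t / person_years)
        ws.append(1 / (1 / (d_c + 0.5) + 1 / (d_t + 0.5)))
    a, b = weighted_line(xs, ys, ws)
    return a / (1 - b)


rates = [2 + 1.5 * i for i in range(20)]  # hypothetical control rates per 1000 person years
PERSON_YEARS = 3000                       # hypothetical person years per arm

no_error = crossing_point(rates, PERSON_YEARS, 0.9)
rng = random.Random(7)
with_error = median(crossing_point(rates, PERSON_YEARS, 0.9, rng) for _ in range(400))
print(f"crossing without error: {no_error:.2f}; with Poisson error (median): {with_error:.2f}")
```

Without random error the fitted line passes through the origin; once chance variation in the control-group rate is added, the line flattens and typically crosses the line of identity to the right of the origin, even though the treatment benefits every trial equally.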

When mortality in the control groups varies greatly or when trials are large, the chance fluctuations that produce such spurious associations will be less important. Alternatively, the analysis can be performed using the overall mortality in the control and treatment arms of the trials as the risk indicator.18 This will generally, but not always, lead to bias in the opposite direction, diluting any real association between level of risk and treatment effect.30

Use of event rates from either the control group or overall trial participants as the stratifying variable when relating treatment effect to level of risk is thus problematic.29 30 Although some, more complex, statistical methods are less susceptible to these biases,31 34 it is preferable to use indicators of risk that are not based on outcome measures. In the case of the effect of angiotensin converting enzyme inhibitors on mortality in patients with heart failure, use of risk in the control group showed greater relative and absolute benefit in trials recruiting higher risk participants.25 In a meta-analysis, data were available on treatment effects according to clinical indicators within strata from many of the trials.35 Twenty nine per cent of patients with an ejection fraction of ≤0.25 at entry died during the trials, compared with 17% of patients with an ejection fraction of >0.25. A substantial reduction in mortality (odds ratio 0.69; 95% confidence interval 0.57 to 0.85) was seen in the first, higher risk group, whereas little effect on mortality was seen in the second, lower risk group (0.98; 0.79 to 1.23). A similar difference was seen if a combined end point of mortality or admission to hospital for congestive heart failure was used as the outcome measure.
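Rather than noting that one stratum shows a significant effect and the other does not, the two odds ratios can be compared directly with a test for interaction. A minimal Python sketch using the figures quoted above, assuming the two ejection fraction strata are independent (they contain different patients) and that each 95% confidence interval is symmetric on the log scale; rounding of the published figures adds some error, and the result is still a subgroup comparison to be read with the caution urged earlier.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def log_or_and_se(odds_ratio, lower, upper):
    """Log odds ratio and its standard error, back-calculated from a 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z95)


def interaction_test(group_a, group_b):
    """z test for a difference between two independent subgroup odds ratios."""
    (ya, sa), (yb, sb) = log_or_and_se(*group_a), log_or_and_se(*group_b)
    diff = ya - yb
    z = diff / math.sqrt(sa ** 2 + sb ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), z, p


# (odds ratio, lower, upper) for ejection fraction <=0.25 and >0.25
ratio, z, p = interaction_test((0.69, 0.57, 0.85), (0.98, 0.79, 1.23))
print(f"ratio of odds ratios {ratio:.2f}, z = {z:.2f}, P = {p:.3f}")
```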

## Confounding

That randomised controlled trials are included in meta-analyses does not mean that comparisons made between trials are randomised comparisons. When outcomes are related to characteristics of the trial participants, to differences in treatments used in the separate trials, or to the situations in which treatments were given, the associations seen are subject to the potential biases of observational studies. Confounding could exist between one trial characteristic—say, drug trials versus diet trials in the case of cholesterol lowering—and another characteristic, such as level of risk of the participants in the trial. In many cases there are simply too few trials, or differences in the average characteristics of participants in the trials are too small, for a stratified analysis to be performed at the level of the individual trial. It may be possible to consider strata within the trials—for example, male versus female, or those with or without existing disease—to increase the number of observations to be included in the regression analysis. Increasing the number of data points in this way is of little help if there are strong associations between the factors under consideration. For example, in a meta-regression analysis of total mortality outcomes of cholesterol lowering trials various factors seem to influence the outcome: greater cholesterol reduction leads to greater benefit; trials including participants with a higher level of risk of coronary heart disease show larger mortality reductions; and the fibrate drugs lead to less benefit than other interventions.20 24 These findings are difficult to interpret, however, as the variables included are strongly related—fibrates have been used mainly in trials recruiting lower risk participants, and they lower cholesterol much less than statins. In this situation all the problems of performing multivariable analyses with correlated covariates are introduced.36 37

## Conclusion

It is tempting to use a meta-analysis to produce more than a simple overall effect estimate, but caution is needed, for the reasons detailed above. One of the more useful extensions of meta-analysis beyond the grand mean relates to the examination of publication bias and other inclusion biases, which will be discussed later in this series.

## Notes

The department of social medicine at the University of Bristol and the department of primary care and population sciences at the Royal Free Hospital School of Medicine, London, are part of the Medical Research Council's health services research collaboration.