The relation between treatment benefit and underlying risk in metaanalysis
BMJ 1996; 313 doi: http://dx.doi.org/10.1136/bmj.313.7059.735 (Published 21 September 1996) Cite this as: BMJ 1996;313:735
Stephen J Sharp, research fellow in medical statistics^{a},
 Simon G Thompson, professor of medical statistics and epidemiology^{b},
 Douglas G Altman, head^{c}
 ^{a} Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London WC1E 7HT,
 ^{b} Department of Medical Statistics and Evaluation, Royal Postgraduate Medical School, London W12 0NN,
 ^{c} Imperial Cancer Research Fund Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF
 Correspondence to: Mr Sharp.
 Accepted 21 June 1996
In metaanalyses of clinical trials comparing a treated group with a control group it has been common to ask whether the treatment benefit varies according to the underlying risk of the patients in the different trials, with the hope of defining which patients would benefit most and which least from medical interventions. The usual analysis used to investigate this issue, however, which uses the observed proportions of events in the control groups of the trials as a measure of the underlying risk, is flawed and produces seriously misleading results. This arises through a bias due to regression to the mean and will be particularly acute in metaanalyses which include some small trials or in which the variability in the true underlying risks across trials is small. Approaches which previously have been thought to be more appropriate are to substitute the average proportion of events in the control and treated groups as the measure of underlying risk or to plot the proportion of events in the treated group against that in the control group (L'Abbe plot). However, these are still subject to bias in most circumstances. Because of the potentially seriously flawed conclusions that can result from such analyses, they should be replaced either by statistically appropriate (but more complex) approaches or, preferably, by analyses which investigate the dependence of the treatment effect on measured baseline characteristics of the patients in each trial.
Where there are substantial clinical differences between the different trials of a metaanalysis and their patients, or substantial quantitative differences in the results from the different trials, a single overall summary estimate of treatment benefit has little practical applicability.1 An analysis which ignores this heterogeneity is clinically misleading and scientifically naive.2 Many authors have now emphasised the clinical and scientific importance of investigating potential sources of heterogeneity when conducting a metaanalysis.3 4 5
If the results of a metaanalysis of clinical trials are to affect future clinical practice the clinician needs to know how the expected net treatment benefit varies according to certain measurable characteristics of a patient. A frequently investigated question is whether there is variation in the treatment benefit according to a patient's underlying risk of the event that the treatment is designed to prevent or delay. Underlying risk is used as a convenient summary of a number of characteristics which may be measurable risk factors but for which individual patient data are not available from some or all of the trials. Whereas it is often expected that the absolute risk reduction attributable to treatment will vary, possibly almost proportionately, with the underlying risk of patients in each trial, it is usually assumed that the relative risk (or odds ratio) does not vary in this way. Indeed, a recently proposed model to assess net benefit is based on this assumption.6 The existence of a relation between relative treatment benefit and underlying risk would have crucial implications for the interpretation of the results of a metaanalysis, in terms of both assessment of net treatment benefit and economic considerations.7
We address two issues. Firstly, while the intention to investigate possible variation in treatment benefit is laudable, we show that common approaches to this problem contain serious statistical pitfalls. Secondly, we argue that these same approaches fail to indicate the potential treatment benefit for individual patients, which is of more practical use to clinicians, and advocate instead the direct use of measurable patient characteristics.
The data
To demonstrate the statistical pitfalls in an analysis of underlying risk as a source of heterogeneity we use data from a metaanalysis of randomised trials to assess the effectiveness of endoscopic sclerotherapy in reducing mortality in patients with cirrhosis and oesophagogastric varices.8 Nineteen such trials comparing sclerotherapy with a control were reviewed; the data relevant to our discussion are shown in table 1. In the following, we use proportion of deaths (expressed as a percentage) on a log odds scale as the measure of “underlying risk” and the log odds ratio as the measure of treatment effect. However, the principles apply more generally to other measures of treatment effect, such as the risk ratio or mean difference for a quantitative outcome, and corresponding measures of “underlying risk.” There was substantial evidence of heterogeneity (P<0.0001) in the observed odds ratios across the 19 sclerotherapy trials, and therefore an investigation of whether underlying risk was part of the explanation for different observed treatment effects is scientifically relevant.2
Relating treatment effect to underlying risk: three conventional approaches
(1) GRAPH OF TREATMENT EFFECT AGAINST PROPORTION OF EVENTS IN CONTROL GROUP
A natural measure of underlying risk in a trial population is the observed proportion of events in the control group. Figure 1 shows a graph of odds ratio of death (log scale) against proportion of deaths in the control group (log odds scale) for the data from the sclerotherapy trials. Each trial on the graph is represented by a circle, the area of which indicates the study size. The graph includes the line of predicted values obtained from a weighted regression. The estimated slope is −0.61 (95% confidence interval −0.99 to −0.23), giving strong evidence of a negative association, that is, an increase in treatment benefit (lower odds ratio) with increasing proportion of events in the control group. The conclusion from this analysis would be that underlying risk is a significant source of heterogeneity. Furthermore, there is a temptation to use point T in the figure to define a cut off value of risk in the control group and conclude that the treatment is effective (odds ratio below 1) only in patients with an underlying risk higher than this value.
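The calculation behind such a graph can be sketched as follows. This is a minimal illustration only: the weights are assumed to be the inverse of the approximate variance of each log odds ratio (the text says only that the regression was weighted), the function names are ours, and the counts in `HYPOTHETICAL_TRIALS` are invented for illustration and are not the sclerotherapy data of table 1.

```python
import math

def log_odds(events, total):
    """Log odds of an event, given the number of events and the group size.

    Assumes 0 < events < total (no zero cells)."""
    return math.log(events / (total - events))

def weighted_line(xs, ys, ws):
    """Slope and intercept of the weighted least squares line of ys on xs."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def control_rate_regression(trials):
    """Regress the log odds ratio on the control group log odds.

    Each trial is (deaths_treated, n_treated, deaths_control, n_control).
    Weights are inverse variances of the log odds ratio (an assumption)."""
    xs, ys, ws = [], [], []
    for d_t, n_t, d_c, n_c in trials:
        x = log_odds(d_c, n_c)            # observed "underlying risk"
        y = log_odds(d_t, n_t) - x        # observed log odds ratio
        var = 1 / d_t + 1 / (n_t - d_t) + 1 / d_c + 1 / (n_c - d_c)
        xs.append(x)
        ys.append(y)
        ws.append(1 / var)
    return weighted_line(xs, ys, ws)

# Invented counts, for illustration only (NOT the data of table 1).
HYPOTHETICAL_TRIALS = [(10, 50, 20, 50), (30, 100, 30, 100), (12, 40, 8, 40)]
slope, intercept = control_rate_regression(HYPOTHETICAL_TRIALS)
```

Note that the control group log odds enters both the horizontal and the vertical coordinate of every point, which is the source of the difficulty with this analysis.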
The problem with this interpretation is due to regression towards the mean.9 10 Because the outcome in the control group is being related to the treatment effect, an expression that also includes the control group outcome, a relation is expected even when none truly exists. In the case where the treatment reduces the risk, a high observed proportion of events in the control group will tend to lead to a larger observed treatment effect, and the converse when the observed proportion is low. In other words, the bias will lead to the potentially incorrect inference that the treatment is most beneficial among high risk patients and least among low risk patients. The size of the bias can be surprisingly large. In the extreme case when the treated and control group outcomes are unrelated the expected correlation can be −0.71.11 Underlying risk may indeed be a source of heterogeneity, but such a graph and regression will misrepresent any true effect.
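The extreme case can be checked with a small simulation. The sketch below assumes that the control and treated group outcomes are independent normal variables with equal variance, which is a simplification of real trial data; under that assumption the correlation between the difference (treated minus control) and the control outcome is −1/√2, or about −0.71. The function names are ours.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_unrelated_outcomes(n_trials=20000, seed=1):
    """Correlation of (treated - control) with control when the two
    outcomes are generated independently (assumed standard normal)."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    treated = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    effect = [y - x for x, y in zip(control, treated)]
    return pearson(effect, control)

r = simulate_unrelated_outcomes()  # close to -1/sqrt(2)
```

A strong negative correlation therefore appears even though, by construction, the treated outcome carries no information about the control outcome.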
(2) GRAPH OF TREATMENT EFFECT AGAINST AVERAGE PROPORTION OF EVENTS IN THE CONTROL AND TREATED GROUPS
Figure 2 is a graph of odds ratio of death (log scale) against the average proportion of events in the control and treated groups (log odds scale) for the data from the sclerotherapy trials; there is a slight increase in treatment effect (reduction in odds ratio) as the average proportion increases, but the evidence is unconvincing: the slope of the fitted line is −0.16 (−0.73 to 0.42). Use of the average has led to a different conclusion: underlying risk is not a significant source of heterogeneity.
Unfortunately the validity of the approach using the average relies on the assumption that the true treatment effect does not vary between trials12; departures from this assumption will lead to bias in the size and direction of any observed association. To take an extreme example, consider a set of very large trials (where errors of measurement are negligible) which have the same underlying risk (as measured by proportion of events in the control group) but some of which have larger treatment benefits than others. A graph of treatment effect against average proportion will show a positive relation, whereas in truth there is no relation with underlying risk. Furthermore, there is also a conceptual difficulty with using a method which depends on an assumption of no variation in true treatment effect to estimate how the treatment effect varies with underlying risk.
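The extreme example can be worked through numerically. In the sketch below the common underlying risk and the true treatment effects are hypothetical values on a log odds scale, and measurement error is set to zero to represent very large trials. Because the average equals the underlying risk plus half the treatment effect, the fitted slope is exactly 2 although the underlying risk does not vary at all.

```python
def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical values: every trial has the same underlying risk,
# but the true treatment effects differ between trials.
mu = -0.5
deltas = [-0.8, -0.6, -0.4, -0.2, 0.0]

control = [mu for _ in deltas]            # no measurement error
treated = [mu + d for d in deltas]
effects = [y - x for x, y in zip(control, treated)]
averages = [(x + y) / 2 for x, y in zip(control, treated)]

slope_on_average = ols_slope(averages, effects)  # exactly 2 in this setup
```

The positive slope arises solely because the treatment effect is part of the average; it says nothing about underlying risk.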
To illustrate further the danger of a simple analysis using some observed measure of “risk,” we consider briefly an approach using a graph of treatment effect against proportion of events in the treated group. The data from the sclerotherapy trials produce a slope of the fitted line of 0.51 (0.02 to 1.00), leading to the opposite conclusion from that obtained using the proportion of events in the control group. In other words, the danger is that any observed association between treatment effect and underlying risk is strongly dependent on the choice of measure of underlying risk, even to the extent of determining the direction of the association.
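The reversal of direction can be reproduced in simulated data in which the true treatment effect is the same in every trial. The sketch assumes the additive normal error model of the appendix, with hypothetical standard deviations of 1 for both the true underlying risks and the measurement errors; under those assumptions the expected slopes are −0.5 against the control group and +0.5 against the treated group.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n_trials=20000, sd_risk=1.0, sd_error=1.0,
                    true_effect=-0.5, seed=7):
    """Slopes of observed effect on control and on treated outcomes,
    when the true effect is constant (all parameters hypothetical)."""
    rng = random.Random(seed)
    control, treated = [], []
    for _ in range(n_trials):
        mu = rng.gauss(0.0, sd_risk)                  # true underlying risk
        control.append(mu + rng.gauss(0.0, sd_error))
        treated.append(mu + true_effect + rng.gauss(0.0, sd_error))
    effects = [y - x for x, y in zip(control, treated)]
    return ols_slope(control, effects), ols_slope(treated, effects)

slope_on_control, slope_on_treated = simulate_slopes()
```

The same data thus suggest greater benefit at high risk or at low risk, depending only on which group supplies the measure of risk.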
(3) L'ABBE PLOT: PROPORTION OF EVENTS IN THE TREATED GROUP AGAINST PROPORTION OF EVENTS IN THE CONTROL GROUP
The L'Abbe plot of proportion of events in the treated group against that in the control group was proposed as a graphical means of exploring possible heterogeneity.13 If the trials are fairly homogeneous the points would lie around a line corresponding to the pooled treatment effect parallel to the line of identity: large deviations would indicate possible heterogeneity.
Figure 3 is a L'Abbe plot for the data from the sclerotherapy trials, including the line of identity and weighted least squares regression line, estimated slope 0.38 (0.07 to 0.69). This illustrates how L'Abbe plots may be used to draw two inappropriate conclusions: firstly, that treatment effect is associated with underlying risk, because the slope of the regression line differs from the slope of the line of identity; and, secondly, that point P defines a cut off value of proportion of events in the control group, above which the treatment is effective (odds ratio below 1) and below which it is not (odds ratio above 1), analogous to point T in figure 1. These interpretations are misleading because the proportion of events in the control group is measured with error, leading to underestimation of the true regression slope,14 another manifestation of regression towards the mean.9 10 L'Abbe plots may be useful as an exploratory graphical device (for example, to identify an unusual or outlying trial) but they cannot be used in conjunction with a regression analysis to define regions in which a treatment is or is not effective.
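The underestimation of the slope can be illustrated by simulation. The sketch below assumes the additive normal error model of the appendix with a constant true treatment effect, so that the true points lie on a line parallel to the line of identity (slope 1). With hypothetical variances of 1 for both the true underlying risks and the measurement errors, the expected fitted slope is 1/(1 + 1) = 0.5.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def labbe_slope(n_trials=20000, sd_risk=1.0, sd_error=1.0,
                true_effect=-0.5, seed=11):
    """Fitted slope of observed treated outcome on observed control
    outcome when the true slope is 1 (all parameters hypothetical)."""
    rng = random.Random(seed)
    control, treated = [], []
    for _ in range(n_trials):
        mu = rng.gauss(0.0, sd_risk)                  # true underlying risk
        control.append(mu + rng.gauss(0.0, sd_error))
        treated.append(mu + true_effect + rng.gauss(0.0, sd_error))
    return ols_slope(control, treated)

attenuated = labbe_slope()                 # near 0.5, not the true 1
error_free = labbe_slope(sd_error=0.0)     # exactly parallel to identity
```

A fitted line shallower than the line of identity, and hence a crossing point such as P, can therefore arise from measurement error alone.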
Interpretation of conventional graphs: practical guidelines
The extent to which these conventional approaches yield misleading conclusions depends on a number of factors, and below we summarise guidelines based on algebraic results (see appendix) for the interpretation of such plots.
Use of proportion of events in the control group (1):
is not an appropriate method, and will always be biased,
will be less misleading—that is, less biased—if the trials are mostly large, or the variation in true underlying risks is large.
Use of average proportion of events in control and treated groups (2):
is appropriate only if the true treatment effect is constant across trials,
will be less misleading if the variation in true underlying risks is large.
A L'Abbe plot (3):
is a useful exploratory graphical method as an adjunct to a standard metaanalysis plot,
is not appropriate for defining groups in which treatment is or is not effective.
In all of these approaches it is the use of ordinary linear regression to interpret the graph which leads to misleading conclusions.
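The guidelines can be explored numerically using the expected slopes given in the appendix, which apply when there is no true relation between treatment effect and underlying risk. The sketch below simply evaluates those two expressions; the function and argument names are ours, and the variances passed in are hypothetical.

```python
def expected_slope_control(var_error, var_risk):
    """Expected slope of observed effect on control group outcome:
    -var_error / (var_risk + var_error)."""
    return -var_error / (var_risk + var_error)

def expected_slope_average(var_error, var_risk, var_effect):
    """Expected slope of observed effect on the average of the control
    and treated outcomes:
    2*var_effect / (2*var_error + var_effect + 4*var_risk)."""
    return 2 * var_effect / (2 * var_error + var_effect + 4 * var_risk)

# Larger trials mean smaller measurement error, hence less bias.
small_trials = expected_slope_control(var_error=1.0, var_risk=1.0)   # -0.5
large_trials = expected_slope_control(var_error=0.1, var_risk=1.0)

# Wider variation in true underlying risk also reduces the bias.
wide_risk = expected_slope_control(var_error=1.0, var_risk=9.0)      # -0.1

# The average is unbiased only if the true effect does not vary.
constant_effect = expected_slope_average(1.0, 1.0, var_effect=0.0)   # 0.0
varying_effect = expected_slope_average(1.0, 1.0, var_effect=1.0)
```

In each case the expected slope would be zero for an unbiased method, so its distance from zero measures the bias.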
A clinically more useful alternative
Given that a patient's “underlying risk” is known only to the clinician through certain measured characteristics, a clinically more useful alternative to the problematic analyses we have described is to relate treatment benefit to measurable baseline characteristics. These characteristics, or some combination of them, would act as a surrogate measure of the patient's risk. In a metaanalysis such an analysis would ideally be based on individual patient data, but it would also be possible to use group data. To take a simple example, we might use age either for each patient or, much more crudely, the mean age of all patients in a trial. Such an analysis would allow age specific estimates of the treatment benefit to be derived.
An extension of this idea would be to combine several prognostic variables into a risk score.15 The relation of treatment benefit to risk score for individual patients could be evaluated using a regression analysis.5 Such a combination would avoid the problem of post hoc data dredging which arises when many variables are considered separately and would best be based on data from sources other than the trials which form the metaanalysis for treatment effects, such as prospective studies. Rothwell has discussed such prognostic modelling in the context of a single clinical trial, where the aim is to identify those patients most likely to benefit from treatment.16
Applying this approach to metaanalysis would avoid the biases inherent in the “underlying risk” analyses. Moreover, the benefit from treatment, for a particular patient, could then be estimated on the basis of specific measurable patient characteristics, rather than the unknown quantity “underlying risk.”
Discussion
The question of whether treatment effects are related to underlying risk has been considered in several conditions—for example, in trials of cholesterol reduction and mortality,17 of tocolysis using β mimetics in preterm delivery,18 and of antiarrhythmic drugs after acute myocardial infarction.19 20 Differences in the underlying risk of patients have also been proposed as an explanation for the differences between the results of metaanalysis and a subsequent megatrial of magnesium therapy after acute myocardial infarction. Unfortunately, while the question posed is simple, it turns out that there is no easy way to answer it validly. A similar problem occurs in several guises in medicine, perhaps most commonly the issue of the possible relation between change and initial value.10 12 21
As we have shown, the conventional approaches to answering this question are flawed and liable to produce misleading results, yet there are several examples of such methods in the medical literature. The problem of using the observed proportion of events in the control group was first discussed11 in response to a metaanalysis of 14 placebo controlled clinical trials to evaluate the effect of tocolysis with β mimetics on the risk of preterm birth.18 An example where the outcome was a continuous measure appeared in a review of 18 randomised controlled trials of prophylactic desmopressin to reduce perioperative blood loss during cardiac surgery,22 where one conclusion was that the efficacy of desmopressin was a function of blood loss in the control group. In a metaanalysis of 17 controlled clinical trials to test whether dietary supplementation with fish oil (ω-3 polyunsaturated fatty acids) reduces systolic blood pressure23 the conclusion that the magnitude of blood pressure reduction increased progressively with the level of blood pressure must also be in doubt, since the level used was the average of the levels at pretreatment and at the end of treatment. Finally, exactly the misleading interpretation of a L'Abbe plot described earlier was given after a metaanalysis of drug trials in mild hypertension24 25: deaths per 1000 person years in the intervention group were plotted against the same quantity in the control group, and the point of intersection of the regression line and line of identity was used to estimate an underlying risk of mortality below which drugs were harmful and above which they were beneficial.
Key messages
An association between treatment benefit and the underlying risk of the patients in different trials would have important implications in the evaluation of the treatment
Conventional analysis methods are based on the observed proportion of events in the control group or the average proportion in the control and treated groups
Such methods are flawed in most situations and can lead to seriously misleading conclusions
A preferable alternative approach is to relate treatment benefit to measurable patient characteristics
The biases described will not apply equally to all metaanalyses. As we have noted, the effect will be particularly acute in metaanalyses which include some small trials (a common case) or in which the true variability in underlying risk across trials is small. Although we have concentrated on binary outcomes, the same issues apply to all types of outcome. An approach based on the average may work for other problems,26 but the assumptions made12 are untenable in the context of metaanalysis, where each estimate has a different precision, being from a different trial. While the use of the average proportion of events is likely to be less biased than the use of the proportion of events in the control group, it would be clearly preferable to use an unbiased method. A statistically valid approach requires a correction of the observed relation between treatment effect and proportion of events in the control group to allow for the bias. A solution to this problem is difficult because each trial result has a different precision. Complex solutions have been proposed,27 28 but they are based on strong assumptions and we do not consider them further here. Even were a correct solution to this question available, it is unclear how this information could be used by a clinician, as there is no direct way of assessing the risk for an individual patient.
In conclusion, the statistical and philosophical difficulties with the conventional simple approaches (except in the extreme situation where all the trials in a metaanalysis are very large) lead us to recommend that these should be avoided. Attention instead should be given to the more practically applicable question of how measurable patient characteristics impact on treatment efficacy.
Appendix
Let $x_j$ and $y_j$ be the observed measures of “risk” in the control and treated groups of the $j$th trial; these could be log odds, absolute risks, log risks, or means of a quantitative outcome, whichever is most appropriate.
Write
$$x_j = \mu_j + \epsilon_{1j}, \qquad y_j = \mu_j + \delta_j + \epsilon_{2j}$$
where $\mu_j$ is the underlying risk in trial $j$, $\delta_j$ the true treatment effect, $\epsilon_{1j}$ and $\epsilon_{2j}$ random errors of measurement, and for all $j$
$$\mathrm{Var}(\mu_j) = \sigma^2_\mu, \quad \mathrm{Var}(\delta_j) = \sigma^2_\delta, \quad E(\epsilon_{1j}) = E(\epsilon_{2j}) = 0, \quad \mathrm{Var}(\epsilon_{1j}) = \mathrm{Var}(\epsilon_{2j}) = \sigma^2$$
Then $\sigma^2_\mu$ represents the variability in the true underlying risks, and $\sigma^2_\delta$ the variability in the true treatment effects between trials; $\sigma^2$ is the variability of the measured outcomes within each trial group, here assumed the same in each trial.
The observed treatment effect (for example, log odds ratio, absolute risk difference, log relative risk, mean difference) is
$$t_j = y_j - x_j$$
Even when there is no relation between the true treatment effects $\delta_j$ and underlying risks $\mu_j$ across trials, it can be shown that:
$$\text{Expected slope of regression of } t_j \text{ on } x_j = -\frac{\sigma^2}{\sigma^2_\mu + \sigma^2}$$
$$\text{Expected slope of regression of } t_j \text{ on } \frac{x_j + y_j}{2} = \frac{2\sigma^2_\delta}{2\sigma^2 + \sigma^2_\delta + 4\sigma^2_\mu}$$
The former is necessarily negative, and the latter positive unless $\sigma^2_\delta = 0$.
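The two expected slopes can be derived in a few lines. The sketch below writes $\mu_j$ for the underlying risk, $\delta_j$ for the true treatment effect, and $\epsilon_{1j}$, $\epsilon_{2j}$ for the errors of measurement in the control and treated groups, with variances $\sigma^2_\mu$, $\sigma^2_\delta$, and $\sigma^2$ respectively. It assumes, in addition to what is stated above, that these four quantities are mutually uncorrelated, and it takes the expected slope to be the ratio of the population covariance to the population variance.

```latex
% Observed effect: t_j = y_j - x_j = \delta_j + \epsilon_{2j} - \epsilon_{1j}
\begin{align*}
\mathrm{Cov}(t_j, x_j)
  &= \mathrm{Cov}(\delta_j + \epsilon_{2j} - \epsilon_{1j},\; \mu_j + \epsilon_{1j})
   = -\mathrm{Var}(\epsilon_{1j}) = -\sigma^2, \\
\mathrm{Var}(x_j) &= \sigma^2_\mu + \sigma^2, \\
\frac{\mathrm{Cov}(t_j, x_j)}{\mathrm{Var}(x_j)}
  &= -\frac{\sigma^2}{\sigma^2_\mu + \sigma^2}. \\[1ex]
\mathrm{Cov}\!\left(t_j, \tfrac{x_j + y_j}{2}\right)
  &= \mathrm{Cov}\!\left(\delta_j + \epsilon_{2j} - \epsilon_{1j},\;
     \mu_j + \tfrac{\delta_j}{2} + \tfrac{\epsilon_{1j} + \epsilon_{2j}}{2}\right)
   = \tfrac{\sigma^2_\delta}{2} + \tfrac{\sigma^2 - \sigma^2}{2}
   = \tfrac{\sigma^2_\delta}{2}, \\
\mathrm{Var}\!\left(\tfrac{x_j + y_j}{2}\right)
  &= \sigma^2_\mu + \tfrac{\sigma^2_\delta}{4} + \tfrac{\sigma^2}{2}, \\
\frac{\mathrm{Cov}\!\left(t_j, \tfrac{x_j + y_j}{2}\right)}
     {\mathrm{Var}\!\left(\tfrac{x_j + y_j}{2}\right)}
  &= \frac{2\sigma^2_\delta}{2\sigma^2 + \sigma^2_\delta + 4\sigma^2_\mu}.
\end{align*}
```

In the first regression the error $\epsilon_{1j}$ appears with opposite signs in $t_j$ and $x_j$, which forces a negative covariance. In the second the two errors cancel, leaving only the contribution of the varying treatment effect.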
Footnotes

Funding: Medical Research Council (SJS), University of London (SGT), Imperial Cancer Research Fund (DGA).

Conflict of interest: None.