Education And Debate Statistics Notes

Generalisation and extrapolation

BMJ 1998;317:409 (Published 08 August 1998)
  1. Douglas G Altman, head (a)
  2. J Martin Bland, professor of medical statistics (b)

  (a) ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, Oxford OX3 7LF
  (b) Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE

  Correspondence to: Mr Altman.

    All medical research is carried out on selected individuals, although the selection criteria are not always clear. The usefulness of research lies primarily in the generalisation of the findings rather than in the information gained about those particular individuals. We study the patients in a trial not to find out anything about them but to predict what might happen to future patients given these treatments.

    A recent randomised trial showed no benefit of fine needle aspiration over expectant management in women with simple ovarian cysts.1 The clinical question is whether the results can be deemed to apply to a given patient. For most conditions it is widely accepted that a finding like this validly predicts the effect of treatment in other hospitals and in other countries. It would not, however, be safe to make predictions about patients with another condition, such as a breast lump. In between these extremes lie some cases where generalisability is less clear.

    For example, when trials showed the benefits of β blockers after myocardial infarction the studies had been carried out on middle aged men. Could the findings reasonably be extrapolated to women, or to older men? It is probably rare that treatment effectiveness truly varies by sex, and claims of this kind often arise from faulty subgroup analysis.2 Age too rarely seems to affect the benefit of a treatment, but clinical characteristics certainly do. Treatments that work in mild disease may not be equally effective in patients with severe disease, or vice versa. Likewise the mode of delivery (for example, oral versus subcutaneous) or dose may affect treatment benefit. Clinical variation is likely to affect the size of the benefit of a treatment, not whether any benefit exists.

    The extent to which it is wise or safe to generalise must be judged in individual circumstances, and there may not be a consensus. Arguably many studies (especially randomised controlled trials) use over-restrictive inclusion criteria, so that the degree of safe generalisability is reduced.3 Even geographical generalisation may sometimes be unwarranted. For example, BCG vaccination against tuberculosis is much less effective in India than in Europe, probably because of greater exposure in India.4 For the clinician treating a patient the question can be expressed as: “Is my patient so different from those in the trial that its results cannot help me make my treatment decision?”5

    In a clinical trial we are interested in the difference in effectiveness between two treatments. There is no need to generalise the success rate of a particular treatment. In some other types of research, such as surveys to establish prevalence and prognostic or diagnostic studies, we may be trying to estimate a single population value rather than the difference between two of them. Here generalisation may be less safe. For example, the prevalence of many diseases varies across social and geographical groups. Results may not even hold up across time. For example, changes in case mix over time can affect the properties of a diagnostic test.6

    Many studies use regression analysis to derive a model for predicting an outcome from one or more explanatory variables. The model, represented by an equation, is strictly valid only within the range of the observed data on the explanatory variable(s). When a measurement is included in the regression model it is possible to make predictions for patients outside the range of the original data (perhaps inadvertently). This numerical form of generalisation is called extrapolation. It can be seriously misleading.


    Fig 1 Fetal biparietal diameter (on log scale) in relation to gestational age8 with quadratic (solid line) and cubic (broken line) regression models fitted to data from only those fetuses less than 30 weeks' gestation (n=119)

    To take an extreme example, a linear relation was found between ear size and age in men aged 30 to 93, with ear length (in mm) estimated as 55.9+0.22×age in years.7 The intercept of 55.9 mm is the predicted ear length at an age of zero, far below the youngest age studied. A baby with ears 5.6 cm long would look like Dumbo.
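
    The arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names and the choice to refuse predictions outside the studied ages are our own assumptions. Only the equation and the age range of 30 to 93 years come from the study quoted above.

```python
# Illustrative sketch; the names below are hypothetical, not from the cited study.
OBSERVED_AGES = (30, 93)  # age range of the men studied (years), as quoted in the text


def ear_length_mm(age_years):
    """Fitted line quoted in the text: 55.9 + 0.22 x age (result in mm)."""
    return 55.9 + 0.22 * age_years


def predict_ear_length(age_years, allow_extrapolation=False):
    """Predict ear length, refusing ages outside the observed range by default."""
    low, high = OBSERVED_AGES
    if not allow_extrapolation and not (low <= age_years <= high):
        raise ValueError(
            f"age {age_years} is outside the observed range {low} to {high} years"
        )
    return ear_length_mm(age_years)


# Interpolation: a 60 year old lies within the data.
print(round(predict_ear_length(60), 1))
# Extrapolation to a newborn has to be requested explicitly.
print(round(predict_ear_length(0, allow_extrapolation=True), 1))
```

    Making extrapolation an explicit option does not make the prediction any safer, but it does stop it happening inadvertently.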

    Extrapolating may be especially dangerous when a curved relation is found. Figure 1 shows fetal biparietal diameter (on a log scale) in relation to gestational age. Also shown are quadratic and cubic models fitted to the log biparietal diameter measurements from only those fetuses less than 30 weeks' gestation. Both curves fit the data well up to 30 weeks, but both give highly misleading predictions thereafter. The quadratic model shows a spurious maximum at around 34 weeks, while the cubic curve takes us again into elephantine regions.
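
    The same behaviour can be reproduced with made-up data. The sketch below is not a reanalysis of the fetal measurements: as an assumption for illustration we take the "true" curve to be log(week), which rises smoothly without ever turning over, sample it at seven gestational ages below 30 weeks, and fit quadratic and cubic polynomials by ordinary least squares. The helper names fit_poly and predict are hypothetical.

```python
import math

# Synthetic stand-in data: log(week) is chosen only because it is smooth and
# keeps increasing. These are NOT the fetal measurements shown in the figure.
weeks = [11, 14, 17, 20, 23, 26, 29]  # all below 30 weeks
log_size = [math.log(w) for w in weeks]


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns (coefs, centre) such that y is approximated by
    sum(coefs[k] * (x - centre) ** k). Centring x keeps the sums well scaled.
    """
    centre = sum(xs) / len(xs)
    us = [x - centre for x in xs]
    m = degree + 1
    a = [[sum(u ** (i + j) for u in us) for j in range(m)] for i in range(m)]
    b = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    coefs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][k] * coefs[k] for k in range(i + 1, m))
        coefs[i] = (b[i] - tail) / a[i][i]
    return coefs, centre


def predict(coefs, centre, x):
    """Evaluate the fitted polynomial at x (which may lie outside the data)."""
    return sum(c * (x - centre) ** k for k, c in enumerate(coefs))


quad, centre = fit_poly(weeks, log_size, 2)
cubic, _ = fit_poly(weeks, log_size, 3)

# Largest absolute error of either model at the observed weeks.
worst_in_range = max(
    abs(predict(c, centre, w) - y)
    for c in (quad, cubic)
    for w, y in zip(weeks, log_size)
)
# A quadratic with a negative leading coefficient turns over at -b1 / (2 * b2).
turning_week = centre - quad[1] / (2 * quad[2])

print(f"worst error within the data: {worst_in_range:.3f}")
print(f"quadratic turns over at week {turning_week:.1f}")
for w in (35, 41):
    print(w, round(math.log(w), 3),
          round(predict(quad, centre, w), 3),
          round(predict(cubic, centre, w), 3))
```

    With these synthetic values both polynomials follow the points closely below 30 weeks. Beyond the data the quadratic reaches a maximum that the underlying curve does not have, while the cubic overshoots it.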

    When we have two explanatory variables it will not usually be apparent (unless we examine a scatter diagram) when a patient has a combination of characteristics which do not fall within the span of the original data set. With more than two variables, such as in many prognostic models, it is not possible to be sure that the original data included any patients with the combination of values of a new patient. Nevertheless, it is reasonable to use such models to make predictions for patients whose important characteristics are within the range in the original data.
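
    One numerical stand-in for examining the scatter diagram is sketched below. It is our own illustration, not a method proposed in this note: the (height, weight) pairs are invented, and the Mahalanobis distance is only one of several ways to ask how far a new patient lies from the bulk of the original data when the variables are correlated.

```python
# Illustrative sketch with made-up (height cm, weight kg) pairs; not real patients.
original = [(155, 55), (160, 50), (165, 60), (170, 70), (175, 65)]
new_patient = (175, 50)  # tall but light


def within_each_range(data, point):
    """True if every coordinate lies inside that variable's own observed range."""
    return all(
        min(row[i] for row in data) <= value <= max(row[i] for row in data)
        for i, value in enumerate(point)
    )


def mahalanobis_sq(data, point):
    """Squared Mahalanobis distance of a two-variable point from the data centre."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2  # zero if the two variables are perfectly collinear
    dx, dy = point[0] - mean_x, point[1] - mean_y
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


furthest_original = max(mahalanobis_sq(original, row) for row in original)

# Each value is individually within range...
print(within_each_range(original, new_patient))
# ...yet the combination lies much further out than any original patient.
print(mahalanobis_sq(original, new_patient), furthest_original)
```

    Here the new patient passes the check on each variable separately but is far more distant from the centre of the data than any of the original five, which is the situation a scatter diagram would reveal at a glance.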

    Clearly patient characteristics, including the criteria for sample selection, need to be fully reported in medical papers. Yet such basic information is not always provided.

