Douglas G Altman a
a ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, Oxford OX3 7LF
b Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
Correspondence to: Mr Altman.
All medical research is carried out on selected
individuals, although the selection criteria are not always clear. The
usefulness of research lies primarily in the generalisation of the
findings rather than in the information gained about those particular
individuals. We study the patients in a trial not to find out anything
about them but to predict what might happen to future patients given
these treatments.
A recent randomised trial showed no benefit of fine needle aspiration
over expectant management in women with simple ovarian
cysts.1 The clinical question is whether the results can
be deemed to apply to a given patient. For most conditions it is widely
accepted that a finding like this validly predicts the effect of
treatment in other hospitals and in other countries. It would not,
however, be safe to make predictions about patients
with another condition, such as a breast lump. In between these
extremes lie some cases where generalisability is less clear.
For example, when trials showed the benefits of β blockers after
myocardial infarction the studies had been carried out on middle aged
men. Could the findings reasonably be extrapolated to women, or to
older men? It is probably rare that treatment effectiveness truly
varies by sex, and claims of this kind often arise from faulty subgroup
analysis.2 Age too rarely seems to affect the benefit of a
treatment, but clinical characteristics certainly do. Treatments that
work in mild disease may not be equally effective in patients with
severe disease, or vice versa. Likewise the mode of delivery (for
example, oral versus subcutaneous) or dose may affect treatment
benefit. Clinical variation is likely to affect the size of the benefit
of a treatment, not whether any benefit exists.
The extent to which it is wise or safe to generalise must be judged in
individual circumstances, and there may not be a consensus. Arguably
many studies (especially randomised controlled trials) use
over-restrictive inclusion criteria, so that the degree of safe
generalisability is reduced.3 Even geographical
generalisation may sometimes be unwarranted. For example, BCG
vaccination against tuberculosis is much less effective in India than
in Europe, probably because of greater exposure in India.4
For the clinician treating a patient the question can be expressed as:
"Is my patient so different from those in the trial that its results
cannot help me make my treatment decision?"5
In a clinical trial we are interested in the difference in
effectiveness between two treatments. There is no need to generalise
the success rate of a particular treatment. In some other types of
research, such as surveys to establish prevalence and prognostic or
diagnostic studies, we may be trying to estimate a single population
value rather than the difference between two of them. Here
generalisation may be less safe. For example, the prevalence of many
diseases varies across social and geographical groups. Results may not
even hold up across time. For example, changes in case mix over time
can affect the properties of a diagnostic test.6
Many studies use regression analysis to derive a model for predicting
an outcome from one or more explanatory variables. The model,
represented by an equation, is strictly valid only within the range of
the observed data on the explanatory variable(s). When a measurement is
included in the regression model it is possible to make predictions for
patients outside the range of the original data (perhaps
inadvertently). This numerical form of generalisation is called
extrapolation. It can be seriously misleading.

Fig 1 Fetal biparietal diameter (on log scale) in relation to gestational age8 with quadratic (solid line) and cubic (broken line) regression models fitted to data from only those fetuses less than 30 weeks' gestation (n=119)
To take an extreme example, a linear relation was found between ear size and age in men aged 30 to 93, with ear length (in mm) estimated as 55.9+0.22×age in years.7 The value of 55.9 corresponds to an age of zero. A baby with ears 5.6 cm long would look like Dumbo.
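The ear length example can be checked with a few lines of code. The sketch below uses only the equation and age range quoted above; the function name and the flag it returns are illustrative choices, not part of the cited study.

```python
# Ear-length model quoted in the text: length (mm) = 55.9 + 0.22 * age (years),
# derived from men aged 30 to 93.
INTERCEPT_MM = 55.9
SLOPE_MM_PER_YEAR = 0.22
AGE_MIN, AGE_MAX = 30, 93  # observed age range reported in the text


def predict_ear_length_mm(age_years):
    """Return (prediction in mm, True if age lies outside the observed range)."""
    prediction = INTERCEPT_MM + SLOPE_MM_PER_YEAR * age_years
    extrapolated = not (AGE_MIN <= age_years <= AGE_MAX)
    return prediction, extrapolated


for age in (0, 30, 60, 93):
    length, extrapolated = predict_ear_length_mm(age)
    note = "extrapolation" if extrapolated else "within observed range"
    print(f"age {age:2d}: {length:.1f} mm ({note})")
```

Returning the flag alongside the prediction makes an inadvertent extrapolation visible to whoever uses the number, rather than relying on them to remember the range of the original data.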
Extrapolating may be especially dangerous when a curved relation is found. Figure 1 shows fetal biparietal diameter (on a log scale) in relation to gestational age. Also shown are quadratic and cubic models fitted to the log biparietal diameter measurements from only those fetuses less than 30 weeks' gestation. Both curves fit the data well up to 30 weeks, but both give highly misleading predictions thereafter. The quadratic model shows a spurious maximum at around 34 weeks, while the cubic curve takes us again into elephantine regions.
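A spurious maximum of this kind follows from the algebra of the model rather than from the data: any quadratic a + b×x + c×x² with a negative c turns over at x = -b/(2c). The sketch below illustrates this with made-up coefficients, chosen only so that the turning point falls at 34 weeks; they are not the coefficients of the models shown in Figure 1.

```python
def quadratic_turning_point(b, c):
    """Turning point of y = a + b*x + c*x**2 (a does not affect its location)."""
    if c == 0:
        raise ValueError("c == 0: the model is linear and has no turning point")
    return -b / (2 * c)


def predict(a, b, c, x):
    """Value of the quadratic model at x."""
    return a + b * x + c * x ** 2


# Hypothetical coefficients for log(biparietal diameter) against
# gestational age in weeks; assumed for illustration only.
A, B, C = 2.1, 0.136, -0.002
FITTED_MAX_WEEKS = 30  # the models in the text were fitted below 30 weeks

peak = quadratic_turning_point(B, C)
print(f"turning point at {peak:.1f} weeks")
if peak > FITTED_MAX_WEEKS:
    print("turning point lies beyond the fitted data: an artefact of the model form")
for weeks in (20, 30, 34, 40):
    print(weeks, round(predict(A, B, C, weeks), 3))
```

Within the fitted range the curve rises throughout, so nothing in the data warns of the decline the model predicts later.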
When we have two explanatory variables it will not usually be apparent (unless we examine a scatter diagram) when a patient has a combination of characteristics which do not fall within the span of the original data set. With more than two variables, such as in many prognostic models, it is not possible to be sure that the original data included any patients with the combination of values of a new patient. Nevertheless, it is reasonable to use such models to make predictions for patients whose important characteristics are within the range in the original data.
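The difference between being within the range of each variable and being within the span of the data can be sketched as follows. The data, variable names, and the scaled nearest-neighbour distance are all assumptions made for illustration; the distance is a crude heuristic, not a method proposed in this note.

```python
def outside_ranges(new_patient, data):
    """Names of variables whose new value lies outside the range seen in data."""
    flagged = []
    for name, value in new_patient.items():
        observed = [row[name] for row in data]
        if not (min(observed) <= value <= max(observed)):
            flagged.append(name)
    return flagged


def nearest_scaled_distance(new_patient, data):
    """Distance to the closest original patient, scaling each variable by its observed range."""
    spans = {}
    for name in new_patient:
        observed = [row[name] for row in data]
        spans[name] = (max(observed) - min(observed)) or 1.0  # avoid dividing by zero
    return min(
        sum(((new_patient[k] - row[k]) / spans[k]) ** 2 for k in new_patient) ** 0.5
        for row in data
    )


# Hypothetical original data set: age in years and weight in kg
original = [
    {"age": 5, "weight_kg": 18},
    {"age": 40, "weight_kg": 70},
    {"age": 70, "weight_kg": 75},
]

# Each value is within its own observed range, but the combination was never seen
unusual = {"age": 5, "weight_kg": 75}
print(outside_ranges(unusual, original))
print(round(nearest_scaled_distance(unusual, original), 2))
```

A check on each variable separately passes this patient, while the distance to the nearest original patient shows that no one resembling them was studied. With many variables such distances become harder to interpret, which is consistent with the caution expressed above.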
Clearly patient characteristics, including the criteria for sample selection, need to be fully reported in medical papers. Yet such basic information is not always provided.