Confounding and Simpson's paradox
BMJ 1994; 309 doi: https://doi.org/10.1136/bmj.309.6967.1480 (Published 03 December 1994) Cite this as: BMJ 1994;309:1480
- Medical Statistics and Computing, University of Southampton, Southampton General Hospital, Southampton SO16 6YD
- Correspondence to: Mr Julious.
- Accepted 16 September 1994
A common problem when analysing clinical data is that of confounding. This occurs when the association between an exposure and an outcome is investigated but the exposure and outcome are strongly associated with a third variable. An extreme example of this is Simpson's paradox, in which this third factor reverses the effect first observed.1 This phenomenon has long been recognised as a theoretical possibility but few real examples have been presented.
Charig et al undertook a historical comparison of success rates in removing kidney stones.2 Open surgery (1972-80) had a success rate of 78% (273/350) while percutaneous nephrolithotomy (1980-5) had a success rate of 83% (289/350), an apparent improvement over open surgery. However, the success rates looked rather different when stone diameter was taken into account. For stones of <2 cm, 93% (81/87) of cases of open surgery were successful compared with just 87% (234/270) of cases of percutaneous nephrolithotomy. Likewise, for stones of ≥2 cm, success rates of 73% (192/263) and 69% (55/80) were observed for open surgery and percutaneous nephrolithotomy respectively.
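The arithmetic behind this reversal can be checked directly from the counts quoted above. The following Python sketch is an illustration only: the dictionary layout and the function names (`rate`, `crude_rate`) are assumptions made for this example and are not part of the original analysis.

```python
# Successes and patients quoted in the text from Charig et al,
# keyed by treatment and then by stone diameter.
COUNTS = {
    "open surgery": {"<2 cm": (81, 87), ">=2 cm": (192, 263)},
    "percutaneous nephrolithotomy": {"<2 cm": (234, 270), ">=2 cm": (55, 80)},
}


def rate(successes, patients):
    """Success rate as a proportion."""
    return successes / patients


def crude_rate(strata):
    """Success rate for one treatment after pooling all stone sizes."""
    successes = sum(s for s, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    return rate(successes, patients)


for treatment, strata in COUNTS.items():
    print(f"{treatment}: pooled {crude_rate(strata):.1%}")
    for size, (s, n) in strata.items():
        print(f"  stones {size}: {rate(s, n):.1%} ({s}/{n})")
```

Pooled over stone size, percutaneous nephrolithotomy has the higher success rate; within each size group, open surgery does.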
The success rates reversed mainly because the probability of having open surgery or percutaneous nephrolithotomy varied according to the diameter of the stones: most patients who had open surgery had stones of ≥2 cm (263/350), whereas most who had percutaneous nephrolithotomy had stones of <2 cm (270/350), and smaller stones were more likely to be removed successfully whichever treatment was used. In observational (non-randomised) studies comparing treatments it is likely that the initial choice of treatment will have been influenced by patients' characteristics such as age or severity of condition, so any difference between treatments could be accounted for by these original factors. Such a situation may arise when a new treatment is being phased in over time. Randomised trials are therefore necessary to demonstrate any treatment effect.
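The imbalance in stone size between the two treatments can be made explicit, along with what the comparison looks like once both treatments are weighted to the same mix of stone sizes. The sketch below uses direct standardisation, which is a technique chosen here for illustration and is not the method used in the original report; the reference mix (all 700 patients combined) and the names `size_mix`, `standardised_rate`, and `REFERENCE_MIX` are likewise assumptions.

```python
# (successes, patients) by treatment and stone diameter, as quoted in the text.
COUNTS = {
    "open surgery": {"<2 cm": (81, 87), ">=2 cm": (192, 263)},
    "percutaneous nephrolithotomy": {"<2 cm": (234, 270), ">=2 cm": (55, 80)},
}
SIZES = ("<2 cm", ">=2 cm")


def size_mix(strata):
    """Share of one treatment's patients in each stone size group."""
    total = sum(n for _, n in strata.values())
    return {size: strata[size][1] / total for size in SIZES}


def standardised_rate(strata, reference_mix):
    """Size-specific success rates weighted by a common reference mix."""
    return sum(
        reference_mix[size] * strata[size][0] / strata[size][1] for size in SIZES
    )


# Assumed reference population: the stone sizes of all patients combined.
all_patients = sum(n for strata in COUNTS.values() for _, n in strata.values())
REFERENCE_MIX = {
    size: sum(strata[size][1] for strata in COUNTS.values()) / all_patients
    for size in SIZES
}

for treatment, strata in COUNTS.items():
    mix = size_mix(strata)
    print(
        f"{treatment}: {mix['<2 cm']:.1%} of patients had stones <2 cm; "
        f"standardised success rate {standardised_rate(strata, REFERENCE_MIX):.1%}"
    )
```

With both treatments weighted to the same distribution of stone sizes, open surgery has the higher success rate, in agreement with the size-specific comparisons.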
In another example Hand reported that the proportion of male patients in a psychiatric hospital seemed to fall slightly over time, from 46.4% (343/739) in 1970 to 46.2% (238/515) in 1975.3 When the results were broken down according to patients' age, however, the proportion of male patients was seen to have increased in both age groups: from 59.4% (255/429) to 60.5% (156/258) among those aged <65 and from 28.4% (88/310) to 31.9% (82/257) among those aged ≥65.
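The overall proportion in each year is a weighted average of the age-specific proportions, with weights given by the age mix of the patients, so a shift in the age mix can move the overall figure in the opposite direction to both age groups. The following Python sketch works through this with the counts quoted above; the function names (`age_mix`, `male_proportion`) and the idea of reapplying the 1970 age mix to the 1975 proportions are assumptions made for illustration.

```python
# (male patients, all patients) by year and age group, as quoted in the text.
COUNTS = {
    1970: {"<65": (255, 429), ">=65": (88, 310)},
    1975: {"<65": (156, 258), ">=65": (82, 257)},
}


def age_mix(strata):
    """Share of a year's patients in each age group."""
    total = sum(n for _, n in strata.values())
    return {group: n / total for group, (_, n) in strata.items()}


def male_proportion(strata, mix=None):
    """Weighted average of the age-specific male proportions.

    With the year's own age mix (the default) this equals the overall
    proportion of male patients for that year.
    """
    if mix is None:
        mix = age_mix(strata)
    return sum(mix[group] * m / n for group, (m, n) in strata.items())


for year, strata in COUNTS.items():
    print(
        f"{year}: {age_mix(strata)['<65']:.1%} of patients aged <65, "
        f"{male_proportion(strata):.1%} male overall"
    )

# Hypothetical: 1975 age-specific proportions with the 1970 age mix.
counterfactual = male_proportion(COUNTS[1975], age_mix(COUNTS[1970]))
print(f"1975 proportions with the 1970 age mix: {counterfactual:.1%} male")
```

The share of patients aged <65, the group with the higher proportion of men, fell between the two years. Had the age mix stayed as it was in 1970, the overall proportion of male patients would have risen rather than fallen.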
The table shows another example, a study of mortality and diabetes (data from the Poole diabetic cohort4). In the study only 29% of the patients with insulin dependent diabetes died compared with 40% of the patients with non-insulin dependent diabetes. However, non-insulin dependent diabetes usually develops only after the age of 40 years.5 Hence, when the diabetic patients are split into two groups (those aged ≤40 and those aged >40), it is found that in both groups a smaller proportion of patients with non-insulin dependent diabetes died compared with patients with insulin dependent diabetes.
All three examples incorporate the arbitrary dichotomisation of continuous variables. However, adjustment can be made by keeping the variables continuous.
For the Poole diabetic cohort, a Cox proportional hazards survival model with just type of diabetes indicates that insulin dependent diabetes gives a significantly better prognosis for survival than non-insulin dependent diabetes (relative risk 0.69 (95% confidence interval 0.54 to 0.87)). However, correcting for age (by entering it into the model together with type of diabetes) reverses the direction of this estimate, so that the risk for insulin dependent diabetes is greater than for non-insulin dependent diabetes (relative risk 1.15 (0.91 to 1.46)).
Thus, a problem arises when the variable of interest is expected to be confounded with another factor (such as type of diabetes and age) or when there is an important imbalance of a factor at the different levels of the variable of interest (such as an imbalance in the proportion of the sexes on two treatments). To accommodate this, the factor should also be included in a multiple regression or multiple logistic regression model together with the variable of interest or as a covariate in an analysis of variance.
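As a minimal sketch of this kind of adjustment, the Python code below fits a logistic regression to the kidney stone counts quoted earlier, first with treatment alone and then with stone size included as well. The kidney stone data are used only because their counts appear in the text; this analysis is not part of the original report. The Newton-Raphson fitting routine and the names `solve`, `fit_logistic`, `crude_or`, and `adjusted_or` are assumptions written for this illustration, and in practice a statistical package would normally be used.

```python
import math

# Kidney stone counts from the text: (pcnl, large_stone, successes, patients).
# pcnl is 1 for percutaneous nephrolithotomy and 0 for open surgery;
# large_stone is 1 for stones >=2 cm and 0 for stones <2 cm.
CELLS = [
    (0, 0, 81, 87),
    (0, 1, 192, 263),
    (1, 0, 234, 270),
    (1, 1, 55, 80),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_logistic(rows, iterations=25):
    """Fit a logistic regression to grouped data by Newton-Raphson.

    rows is a list of (covariates, successes, patients); the first
    covariate should be 1.0 for the intercept. Returns the coefficients.
    """
    k = len(rows[0][0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, successes, patients in rows:
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            weight = patients * p * (1.0 - p)
            for i in range(k):
                grad[i] += x[i] * (successes - patients * p)
                for j in range(k):
                    hess[i][j] += weight * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Treatment alone, then treatment together with stone size.
crude = fit_logistic([([1.0, pcnl], s, n) for pcnl, _, s, n in CELLS])
adjusted = fit_logistic([([1.0, pcnl, large], s, n) for pcnl, large, s, n in CELLS])

crude_or = math.exp(crude[1])
adjusted_or = math.exp(adjusted[1])
print(f"odds ratio for success, treatment alone: {crude_or:.2f}")
print(f"odds ratio for success, adjusted for stone size: {adjusted_or:.2f}")
```

An odds ratio above 1 favours percutaneous nephrolithotomy and one below 1 favours open surgery. The unadjusted estimate lies above 1, while the estimate adjusted for stone size lies below 1, mirroring the reversal seen in the stratified success rates.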
We thank David Hand and David Spiegelhalter for providing the first two examples and the referee for useful comments.