Statistics Notes: Interaction 1: heterogeneity of effects
BMJ 1996; 313 doi: https://doi.org/10.1136/bmj.313.7055.486 (Published 24 August 1996) Cite this as: BMJ 1996;313:486
- a ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF
- b Department of Medical Statistics, University of Newcastle, Newcastle upon Tyne NE2 4HH
- Correspondence to: Mr Altman.
In several types of study we may want to examine the consistency of an observed relation across two or more subgroups of the individuals studied. For example, in a clinical trial we might want to know if the observed treatment difference is the same for young and old patients or for different stages of disease at presentation. In an epidemiological study we might want to know whether the observed relation between an exposure and disease is different among smokers and non-smokers.
In such cases we are interested in examining whether one effect is modified by the value of another variable. This may be viewed as the examination of the heterogeneity of an observed effect, such as treatment benefit in a clinical trial, across subsets of individuals. The statistical term for heterogeneity of this type is interaction; the medical concept of synergy is the same thing.
While it may well be of interest to look for heterogeneity of effect, this is not always wise. In a controlled trial there are numerous subgroups which might be compared by splitting the patients according to sociodemographic or clinical categories at the start of the trial. In addition, for continuous variables such as age or blood pressure there are many ways of creating groups. Exploratory examination of many such subgroups is almost certain to throw up some spurious significant interactions, and in practice we cannot tell if a specific interaction is real or spurious. For example, in a randomised controlled trial comparing dexamethasone phosphate with placebo for preventing neonatal respiratory distress syndrome, the researchers found unexpectedly that the overall beneficial effect of the active treatment was present only in female infants.1 Further studies would be needed to confirm the finding (or not). The refutation of such unexpected observations is common, and indeed in this case the finding was not replicated in further studies.2
Likewise, we can investigate the interaction between any pair of variables in a regression model. With 10 variables there are 45 such potential interactions and much scope for being misled. So, although we do not necessarily believe that all effects are truly independent, in many cases it is reasonable not to examine any possible interactions. For example, Pocock et al found a negative association between tooth lead concentrations and IQ (intelligence quotient) in children aged 6 years.3 Exploratory analysis revealed a strong association among boys and little association among girls. The authors were rightly cautious in their interpretation as there had been no prior hypothesis about such an effect.
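The count of 45 is simply the number of unordered pairs that can be formed from 10 variables. The short Python sketch below reproduces it and, under the simplifying assumption that the interaction tests are independent and that no interaction truly exists, estimates the chance that at least one test is nominally significant at the 5% level. The variable names and both helper functions are hypothetical and are not part of the original note. Interaction tests within a single regression model are generally not independent, so the probability is an illustration of scale rather than an exact figure.

```python
from itertools import combinations
from math import comb


def pairwise_interactions(variables):
    """Return every unordered pair of variables (each candidate two-way interaction)."""
    return list(combinations(variables, 2))


def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability of at least one nominally significant result when every
    null hypothesis is true, ASSUMING the tests are independent."""
    return 1 - (1 - alpha) ** n_tests


# Hypothetical names for the 10 variables in a regression model
variables = [f"x{i}" for i in range(1, 11)]
pairs = pairwise_interactions(variables)

print(len(pairs))    # number of potential two-way interactions
print(comb(10, 2))   # the same count from the binomial coefficient
print(chance_of_any_false_positive(len(pairs)))
```

Under these assumptions the chance of at least one spurious significant interaction among the 45 tests is roughly 0.9, which is why unplanned examination of every possible interaction gives so much scope for being misled.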
By contrast, when there is a specific prior suspicion of the existence of a particular interaction it is perfectly reasonable and desirable to examine it. A common example already mentioned is the interest in a possible difference of risk between smokers and non-smokers. For example, a study of Danish porcelain painters found that the adverse effects of cobalt exposure on lung function were more severe among non-smokers than smokers.4
Results of tests for interactions are likely to be convincing only if they were specified at the start of the study. In any study that presents subgroup analyses it is important to specify when and why the subgroups were chosen. Studies which present analyses without such justification can be difficult to interpret. For example, Penttinen found a significant excess of ischaemic heart disease in relation to back pain in farmers aged 30-49 years and a non-significant difference in the opposite direction among those aged 50-66 years.5 He did not explain why this age division was made, nor did he note that there was no relation when the two age groups were considered together. Studies where subgroup definition has been guided by the data, for example concentrating on males born in October,6 should be based on statistical tests that account for any multiple comparisons that have been made7 and should be scientifically sensible; even then they should be treated with scepticism until confirmed in subsequent studies.
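The note does not say which adjustment for multiple comparisons should be used. As one simple possibility, the sketch below applies a Bonferroni correction, which multiplies each P value by the number of comparisons made. The function name and the P values are invented purely for illustration (imagine one test per month of birth) and do not come from any of the studies cited.

```python
def bonferroni_adjust(p_values):
    """Bonferroni adjustment: multiply each P value by the number of
    comparisons made, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented P values from 12 hypothetical data-driven subgroup tests
p_values = [0.004, 0.03, 0.20, 0.45, 0.51, 0.08, 0.62, 0.33, 0.74, 0.90, 0.12, 0.27]
adjusted = bonferroni_adjust(p_values)

# Compare how many subgroups look "significant" before and after adjustment
n_significant_raw = sum(p < 0.05 for p in p_values)
n_significant_adjusted = sum(p < 0.05 for p in adjusted)
print(n_significant_raw, n_significant_adjusted)
```

In this invented example two subgroups are nominally significant before adjustment and only one remains so afterwards. The Bonferroni method is conservative, and even an adjusted result from a data-driven subgroup still needs confirmation in later studies.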
Problems of interpretation are exacerbated by incorrect analysis. We consider right and wrong ways to examine possible interactions in two subsequent statistics notes.