BMJ 1996;313:486 (24 August)

Education and debate

Statistics Notes: Interaction 1: heterogeneity of effects

Douglas G Altman, head (a); John N S Matthews, senior lecturer in medical statistics (b)

(a) ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, PO Box 777, Oxford OX3 7LF; (b) Department of Medical Statistics, University of Newcastle, Newcastle upon Tyne NE2 4HH

Correspondence to: Mr Altman.

In several types of study we may want to examine the consistency of an observed relation across two or more subgroups of the individuals studied. For example, in a clinical trial we might want to know whether the observed treatment difference is the same for young and old patients, or for different stages of disease at presentation. In an epidemiological study we might want to know whether the observed relation between an exposure and disease differs between smokers and non-smokers.

In such cases we are interested in examining whether one effect is modified by the value of another variable. This may be viewed as the examination of the heterogeneity of an observed effect, such as treatment benefit in a clinical trial, across subsets of individuals. The statistical term for heterogeneity of this type is interaction; the medical concept of synergy is the same thing.
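One conventional way to make this idea precise is a regression model with a product term. The symbols below are our own illustration, not notation used in this note: T is 1 for the active treatment and 0 for control, and G is 1 for one subgroup (say, older patients) and 0 for the other.

```latex
y = \beta_0 + \beta_1 T + \beta_2 G + \beta_3 (T \times G) + \varepsilon

% Treatment effect when G = 0:  (\beta_0 + \beta_1) - \beta_0 = \beta_1
% Treatment effect when G = 1:  (\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_2) = \beta_1 + \beta_3
% Difference between the two subgroup effects = \beta_3
```

In this model the treatment effect is the same in both subgroups (no interaction) exactly when \beta_3 = 0. Heterogeneity of effect therefore corresponds to a non-zero coefficient on the product term.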

While it may well be of interest to look for heterogeneity of effect, this is not always wise. In a controlled trial there are numerous subgroups which might be compared by splitting the patients according to sociodemographic or clinical categories at the start of the trial. In addition, for continuous variables such as age or blood pressure there are many ways of creating groups. Exploratory examination of many such subgroups is almost certain to throw up some spurious significant interactions, and in practice we cannot tell if a specific interaction is real or spurious. For example, in a randomised controlled trial comparing dexamethasone phosphate with placebo for preventing neonatal respiratory distress syndrome, the researchers found unexpectedly that the overall beneficial effect of the active treatment was present only in female infants.1 Further studies would be needed to confirm the finding (or not). The refutation of such unexpected observations is common, and indeed in this case the finding was not replicated in further studies.2
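The claim that many exploratory subgroup comparisons are almost certain to produce a spurious "significant" interaction can be checked with simple arithmetic. The sketch below rests on two simplifying assumptions of our own: there is no true interaction, and the k tests are independent, each using a 5% significance level. Real subgroup analyses of the same patients are correlated, so the figures are only indicative. The chance of at least one false positive is 1 - (1 - alpha)^k. A small simulation confirms this by drawing uniform P values, which is how valid P values behave when the null hypothesis is true.

```python
import random


def prob_at_least_one_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent tests is 'significant'
    when no true interaction exists (each test has false-positive rate alpha)."""
    return 1 - (1 - alpha) ** k


def simulate(k: int, alpha: float = 0.05, n_sims: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: under the null hypothesis a valid P value is uniform
    on [0, 1], so each simulated study draws k uniform P values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # A study "finds" an interaction if any of its k P values falls below alpha.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / n_sims


for k in (1, 5, 10, 20):
    exact = prob_at_least_one_false_positive(k)
    print(f"{k:2d} tests: exact {exact:.3f}, simulated {simulate(k):.3f}")
```

Under these assumptions the exact probabilities are about 0.23 for 5 tests, 0.40 for 10 and 0.64 for 20, so a "significant" subgroup finding becomes more likely than not once about 14 such comparisons have been made.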

Likewise, we can investigate the interaction between any pair of variables in a regression model. With 10 variables there are 45 such potential interactions and much scope for being misled. So, although we do not necessarily believe that all effects are truly independent, in many cases it is reasonable not to examine any possible interactions. For example, Pocock et al found a negative association between tooth lead concentrations and IQ (intelligence quotient) in children aged 6 years.3 Exploratory analysis revealed a strong association among boys and little association among girls. They were rightly cautious in their interpretation, as there had been no prior hypothesis about such an effect.
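The figure of 45 is the number of ways of choosing 2 variables from 10. The sketch below reproduces it and adds one further step under an assumption of our own: if none of the interactions is real and each is tested at the 5% level, the expected number of spuriously significant results is simply the number of tests multiplied by 0.05.

```python
from math import comb


def n_pairwise_interactions(n_variables: int) -> int:
    """Number of distinct two-way interaction terms among n variables (n choose 2)."""
    return comb(n_variables, 2)


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha


k = n_pairwise_interactions(10)
print(k)                            # 45
print(expected_false_positives(k))  # 2.25
```

This count covers two-way interactions only. Three-way and higher-order terms would enlarge the number of possible tests further.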

By contrast, when there is a specific prior suspicion of the existence of a particular interaction it is perfectly reasonable and desirable to examine it. A common example already mentioned is the interest in a possible difference of risk between smokers and non-smokers. For example, a study of Danish porcelain painters found that the adverse effects of cobalt exposure on lung function were more severe among non-smokers than smokers.4

Results of tests for interactions are likely to be convincing only if they were specified at the start of the study. In any study that presents subgroup analyses, it is important to state when and why the subgroups were chosen. Studies that present such analyses without this justification can be difficult to interpret. For example, Penttinen found a significant excess of ischaemic heart disease in relation to back pain in farmers aged 30-49 years and a non-significant difference in the opposite direction among those aged 50-66 years.5 He did not explain why this age division was made, nor did he note that there was no relation when the two age groups were considered together. Studies in which subgroup definition has been guided by the data, for example concentrating on males born in October,6 should use statistical tests that account for any multiple comparisons that have been made7 and should be scientifically sensible; even then they should be treated with scepticism until confirmed in subsequent studies.
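Reference 7 describes the Bonferroni method, one simple way of allowing for multiple comparisons. With k tests, each P value is multiplied by k (capped at 1), which is equivalent to comparing each raw P value with alpha / k. The sketch below is a minimal illustration. The 12 P values are invented for the example and do not come from any of the studies cited. The method is known to be conservative when the tests are correlated, as subgroup analyses of one data set usually are.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[tuple[float, float, bool]]:
    """Bonferroni method: multiply each P value by the number of tests k
    (capped at 1) and call it significant if the adjusted value is below alpha."""
    k = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * k)
        results.append((p, adjusted, adjusted < alpha))
    return results


# Hypothetical P values from 12 exploratory subgroup comparisons (illustrative only).
ps = [0.02, 0.30, 0.45, 0.01, 0.80, 0.06, 0.55, 0.003, 0.90, 0.12, 0.70, 0.04]
for p, adj, sig in bonferroni(ps):
    print(f"raw P = {p:.3f}  adjusted P = {adj:.3f}  significant: {sig}")
```

Here four raw P values fall below 0.05, but after adjustment for 12 comparisons only the smallest (0.003, adjusted to 0.036) remains below the threshold.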

Problems of interpretation are exacerbated by incorrect analysis. We consider right and wrong ways to examine possible interactions in two subsequent statistics notes.

  1. Collaborative Group on Antenatal Steroid Therapy. Effect of antenatal dexamethasone administration on the prevention of respiratory distress syndrome. Am J Obstet Gynecol 1981;141:276-87.
  2. Crowley P, Chalmers I, Keirse MJNC. The effects of corticosteroid administration before preterm delivery: an overview of the evidence from controlled trials. Br J Obstet Gynaecol 1990;97:11-25.
  3. Pocock SJ, Ashby D, Smith MA. Lead exposure and children's intellectual performance. Int J Epidemiol 1987;16:57-67.
  4. Raffn E, Mikkelsen S, Altman DG, Christensen JM, Groth S. Health effects due to occupational exposure to cobalt blue dye among plate painters in a porcelain factory in Denmark. Scand J Work Environ Health 1988;14:378-84.
  5. Penttinen J. Back pain and risk of fatal ischaemic heart disease: 13 year follow up of Finnish farmers. BMJ 1994;309:1267-8.
  6. Helgason T, Jonasson MR. Evidence for a food additive as a cause of ketosis-prone diabetes. Lancet 1981;ii:716-20.
  7. Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. BMJ 1995;310:170.

