Education And Debate

Interpreting the results of observational research: chance is not such a fine thing

BMJ 1994;309:727 (Published 17 September 1994)
P Brennan, P Croft

ARC Epidemiological Research Unit, University of Manchester Medical School, Manchester M13 9PT

Correspondence to: Mr Brennan.

Accepted 6 June 1994

In a randomised controlled trial, if the design is not flawed, different outcomes in the study groups must be due either to the intervention itself or to chance imbalances between the groups. Because of this, tests of statistical significance are used to assess the validity of results from randomised studies. Most published papers in medical research, however, describe observational studies, which do not include randomised intervention. This paper argues that the continuing application of tests of significance to such non-randomised investigations is inappropriate. It draws a distinction between bias and chance imbalance on the one hand (both randomised and observational studies can be affected) and confounding on the other (a problem unique to observational investigations). It concludes that neither the P value nor the 95% confidence interval should be used as evidence for the validity of an observational result.
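The distinction drawn above can be made concrete with a small simulation (an illustrative sketch only, not part of the original paper; the variable names and probabilities are invented for the example). Here a confounder C raises the probability of both the "exposure" E and the "disease" D, while E has no causal effect on D at all. With a large sample the crude association between E and D is strong and would be highly "significant", yet it vanishes once the analysis is stratified by C: no test of significance on the crude result could have revealed this.

```python
import random

random.seed(1)
N = 100_000

# Hypothetical confounder C (say, an age group) that raises both the
# chance of exposure E and of disease D.  Crucially, E never causes D.
crude = [[0, 0], [0, 0]]                       # 2x2 table: rows E, cols D
strata = {0: [[0, 0], [0, 0]], 1: [[0, 0], [0, 0]]}

for _ in range(N):
    c = random.random() < 0.5                  # confounder present?
    e = random.random() < (0.6 if c else 0.2)  # exposure depends on C only
    d = random.random() < (0.3 if c else 0.1)  # disease depends on C only
    crude[e][d] += 1
    strata[c][e][d] += 1

def odds_ratio(t):
    # Standard cross-product ratio from a 2x2 table.
    return (t[1][1] * t[0][0]) / (t[1][0] * t[0][1])

crude_or = odds_ratio(crude)
stratum_ors = [odds_ratio(strata[c]) for c in (0, 1)]

print(f"crude odds ratio = {crude_or:.2f}")        # well above 1
print(f"stratum odds ratios = "
      f"{stratum_ors[0]:.2f}, {stratum_ors[1]:.2f}")  # both close to 1
```

With these (arbitrary) probabilities the crude odds ratio is expected to be about 1.7, while the within-stratum odds ratios are close to 1. Because the spurious crude association grows more, not less, convincing as the sample size increases, a small P value or a narrow confidence interval offers no protection against it.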

Epidemiologists and clinical researchers design studies to estimate the effect which a presumed cause or treatment has on the occurrence of a disease. Most questions about causes of disease cannot be addressed by experiments: we must rely on the observation of life as it is, rather than of the results of controlled intervention. Such observational studies cannot provide proof of causality but are still the basis for reasoned public health decisions.

When results from observational studies are reported, significance tests are often presented as judgments on the “truth” or validity of the effect which a presumed cause has on the occurrence of a disease. In 1965 Bradford Hill lamented this application of statistics,1 a concern given prominence again recently.2 Yet almost 30 years on, phrases such as “the result just failed to reach statistical significance” are still part of the argot of medical papers and presentations. The move towards estimating confidence intervals has not resolved this problem, as the …
