
Research Methods & Reporting

The tyranny of power: is there a better way to calculate sample size?

BMJ 2009; 339 doi: (Published 06 October 2009) Cite this as: BMJ 2009;339:b3985
  1. John Martin Bland, professor of health statistics
  1. Department of Health Sciences, University of York, Heslington, York YO10 5DD
  1. mb55{at}
  • Accepted 12 June 2009

Martin Bland’s extensive experience in reviewing and using power calculations has led him to believe that it is time to replace them

When I began my career in medical statistics, back in 1972, little was heard of power calculations. In major journals, sample size often seemed to be whatever came to hand. For example, in September 1972, the Lancet contained 31 research reports that used individual subject data, excluding case reports and animal studies. The median sample size was 33 (quartiles 12 and 85). In the same month the BMJ had 30 reports of the same type, with median sample size 37 (quartiles 12 and 158). None of these publications explained the choice of sample size, other than it being what was available. Indeed, statistical considerations were almost entirely lacking from the methods sections of these papers.

Summary points

  • Most medical research studies have sample sizes justified by power calculations

  • Power calculations are based on significance tests

  • Many journals require results to be presented with confidence intervals

  • Sample size calculations should be based on the width of a confidence interval, not power
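
To illustrate the contrast drawn in the last summary point, here is a minimal sketch in Python of two ways to choose the sample size for comparing two means. It is not taken from the article: both functions use the standard normal approximation with a known common standard deviation and equal group sizes, and the function names, parameters, and example values (standard deviation 10, half-width 2, difference 5) are assumptions chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_ci(sd, half_width, conf_level=0.95):
    """Per-group sample size so that the confidence interval for a
    difference between two means extends about `half_width` on either
    side of the estimate.

    Assumes (for illustration) a normal approximation, a known common
    standard deviation `sd`, and two groups of equal size.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil(2 * (z * sd / half_width) ** 2)


def n_per_group_for_power(sd, difference, alpha=0.05, power=0.90):
    """Per-group sample size from a conventional power calculation for
    detecting a true `difference` between two means with a two-sided
    test at level `alpha`, under the same assumptions as above.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: SD of 10 units in each group.
    print("CI-based n per group:", n_per_group_for_ci(sd=10, half_width=2))
    print("Power-based n per group:", n_per_group_for_power(sd=10, difference=5))
```

The first function asks how precisely the difference should be estimated; the second asks how likely a significance test is to detect a prespecified difference. The two need not give similar answers, because they answer different questions.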

Compare the research papers of September 1972 with those in the same journals in September 2007, 35 years later. In the Lancet, there were 14 such research reports, with median sample size 3116 (quartiles 1246 and 5584), two orders of magnitude greater than in 1972. In September 2007, the BMJ carried 12 such research reports, with median sample size 3104 (quartiles 236 and 23 351). Power calculations were reported for four of the Lancet papers and five of the BMJ papers.

The patterns in the two journals are strikingly similar. For each journal, sample sizes increased almost 100-fold, the proportion of papers reporting power calculations increased from none to one third, and the number of studies of individual …
