Sample size calculations: should the emperor’s clothes be off the peg or made to measure?

BMJ 2012; 345 doi: http://dx.doi.org/10.1136/bmj.e5278 (Published 23 August 2012)
Cite this as: BMJ 2012;345:e5278

Authors:
  1. Geoffrey Norman, statistician and cognitive psychologist
  2. Sandra Monteiro, graduate student
  3. Suzette Salama, chair of McMaster research ethics board

Affiliations (numbered to match the authors):
  1. Clinical Epidemiology and Biostatistics, McMaster University, MDCL 3519, 1280 Main St W, Hamilton, Ontario L8S 2T1, Canada
  2. Department of Psychology, McMaster University, Hamilton
  3. Medicine, Hamilton Health Sciences, Hamilton

Correspondence to: G Norman norman{at}mcmaster.ca
Accepted 23 May 2012

Ethics committees require estimates of sample size for all trials, but statistical calculations are no more accurate than estimates from historical data. Geoffrey Norman and colleagues propose some “one size fits all” numbers for different study designs

Conventional wisdom dictates that it is unethical to conduct a study that is so large that excess numbers of patients are exposed or so small that clinically important changes cannot be detected [1]. This implies that there is some optimal sample size that can be calculated using statistical theory and information from previous research. But the choice of sample size is usually a compromise between statistical considerations, which always benefit from increased sample size, and economic or logistical constraints [2].

Only rarely is sufficient information available to make informed decisions. Moreover, despite the illusion of precision that arises from the application of arcane statistical formulas, in many situations the inputs (the expected treatment effect, the standard deviation, and the power) are subject to considerable uncertainty. As a result, sample size calculations for the same study may vary widely.
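To make that sensitivity concrete, the sketch below uses the standard textbook normal approximation for comparing two means, n per group = 2(z_{1-α/2} + z_{power})² σ² / δ². This is an illustration only and not necessarily the calculation the authors have in mind. The function name `n_per_group` and every input value in the grid are hypothetical, chosen simply to show how much the answer moves when the guessed treatment effect, standard deviation, or power changes.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, power=0.80, alpha=0.05):
    """Approximate sample size per group for a two-sided comparison of two means.

    Normal approximation: n = 2 * (z_(1 - alpha/2) + z_power)^2 * sd^2 / delta^2.
    delta is the expected treatment effect and sd the common standard deviation,
    both in the units of the outcome measure.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs (not taken from the article): a modest range of
    # plausible guesses for each quantity.
    for delta in (3, 5, 10):
        for sd in (10, 15):
            for power in (0.80, 0.90):
                n = n_per_group(delta, sd, power)
                print(f"effect={delta:>2}  sd={sd}  power={power:.2f}  n per group={n}")
```

Because the required n scales with the square of σ/δ, halving the assumed treatment effect roughly quadruples the sample size, so small changes in inputs that are themselves guesses produce large changes in the result.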

We argue that, in the absence of good data, it would be better to determine sample size by adopting norms derived from historical data based on large numbers of studies of the same type. We show that for many common situations we can define defensible, evidence based ranges of sample sizes.

An example

Imagine that you decide to do a study to see if control of primary hypertension is improved by home monitoring. You visit your local statistician for a sample size calculation, as the ethics board insists. The following are some key questions that he or she will ask and some tentative answers.

What is the distribution of blood pressure in the population you intend to study?

One study design might be to randomise people to treatment and control groups, put one group on monitors for a few months, then measure their blood pressure. …
