Sample size and power
BMJ 2011; 343 doi: https://doi.org/10.1136/bmj.d5579 (Published 08 September 2011) Cite this as: BMJ 2011;343:d5579
Philip Sedgwick, senior lecturer in medical statistics, Centre for Medical and Healthcare Education, St George’s, University of London, Tooting, London, UK
p.sedgwick{at}sgul.ac.uk
Researchers carried out a randomised controlled trial to compare the effectiveness of cryotherapy with that of salicylic acid for treating plantar warts. Participants were eligible if aged 12 years or over. Those randomised to cryotherapy had liquid nitrogen delivered by a healthcare professional, with a maximum of four treatments, each 2-3 weeks apart. Participants randomised to 50% salicylic acid (Verrugon) treated themselves daily for a maximum of eight weeks. The trial was conducted as a superiority study.1
The primary outcome was complete clearance of all plantar warts at 12 weeks. The cure rate for salicylic acid was assumed to be 70% at 12 weeks, and the smallest effect of clinical interest was a difference in cure rates of 15%. To demonstrate the smallest effect of clinical interest with 80% power at the 5% critical level of significance, a sample size of 120 patients in each treatment group was needed, or 133 patients in each group after allowing for 10% attrition (266 patients in total).
The proportion of participants with complete clearance of all plantar warts at 12 weeks was slightly lower in the cryotherapy group, although the difference was not statistically significant (13.64% versus 14.29%; 95% confidence interval for difference −9.63 to 8.33 percentage points; P=0.89).
Which of the following statements, if any, are true?
a) The alternative hypothesis is one sided, stating that in the population the cure rate for cryotherapy is greater than that for salicylic acid by at least 15%.
b) Power is the probability of observing the smallest effect of clinical interest, if it exists in the population.
c) If power was increased to 90%, the sample size would decrease.
d) It can be concluded that there is no difference in cure rates between treatments in the population.
Answers
Answer b is true, while a, c, and d are false.
The researchers reported that limited evidence existed to inform clinical decision making in the treatment of plantar warts. Therefore, the randomised controlled trial was carried out to compare the effectiveness of cryotherapy with that of salicylic acid. Designed as a superiority trial,2 the study aimed to establish whether cryotherapy was superior to salicylic acid or vice versa. For one treatment to be considered superior to the other, a difference between treatments in cure rates of 15% or more needed to be observed. This difference is termed the smallest effect of clinical interest and is based on prior clinical expertise. If this difference in cure rates between treatments was seen, the null hypothesis would be rejected in favour of the alternative hypothesis at the 5% level of significance. The alternative hypothesis is two sided, stating that in the population the cure rate for salicylic acid is less than, or greater than, that for cryotherapy by 15% or more (a is false).
It was not known if the smallest effect of clinical interest actually existed in the population. It was essential that the sample was representative of the population so that if the actual difference between treatments in the population was 15% or more, it was then found in the trial. The only way to ensure that the sample estimate was a truly accurate estimate of the population difference was for the entire population to be included in the trial. However, this was obviously not possible, and so a sample was taken from the population. The sample needed to be large enough for the sample estimate to be similar in magnitude to the population treatment difference. Therefore, if the smallest effect of clinical interest actually existed, it would be observed.
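The effect of sample size on how closely a sample estimate tracks the population difference can be illustrated with a small simulation. The following Python sketch is not part of the trial's analysis. It assumes hypothetical true cure rates of 70% and 85% (a difference of 15 percentage points), and the function name `simulate_differences` is illustrative only. It repeatedly draws trials of a given size and summarises the observed differences in cure rates.

```python
import random
from statistics import mean, stdev


def simulate_differences(n_per_group, p_a=0.70, p_b=0.85, n_trials=1000, seed=1):
    """Simulate repeated trials and return the observed differences in cure rates.

    p_a and p_b are assumed (hypothetical) population cure rates.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        # Each participant is cured with the assumed probability for their group.
        cured_a = sum(rng.random() < p_a for _ in range(n_per_group))
        cured_b = sum(rng.random() < p_b for _ in range(n_per_group))
        diffs.append(cured_b / n_per_group - cured_a / n_per_group)
    return diffs


for n in (30, 120, 480):
    diffs = simulate_differences(n)
    # Mean and spread of the observed differences across the simulated trials.
    print(n, round(mean(diffs), 3), round(stdev(diffs), 3))
```

Under these assumptions the average observed difference stays near 15 percentage points at every group size, while the spread of the estimates around that value narrows as the groups get larger.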
To calculate the required sample size, some idea of the expected cure rate for one of the treatments was needed. Published evidence indicated that the cure rate for salicylic acid was 70% at 12 weeks. The sample size was derived by using the theoretical concept of repeated sampling from the population. If samples of size 240 (120 in each treatment group) were taken repeatedly at random from the population, then 80% of these samples would demonstrate a treatment difference at the 5% level of significance, but obviously only if a difference of 15% actually existed in the population. This is called the power of the study: that is, the probability of observing the smallest effect of clinical interest, if it exists (b is true). We wish the power to be as high as possible, and as close as possible to 1.0 (that is, 100%). The only way the power could be 100% is if the entire population was included in the trial. Therefore, an increase in power requires a larger sample size (c is false). Typically in clinical trials, power is set to be a minimum of 0.8 (80%).
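A sketch of how such a calculation might look, using the usual normal approximation for comparing two proportions with a two sided test. The exact formula used by the trial is not given here, so the formula is an assumption, as is the choice of 85% (70% plus 15 percentage points) for the comparison cure rate. The name `n_per_group` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of two proportions.

    Uses the unpooled normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2


# Assumed cure rates: 70% (salicylic acid) versus 85% (70% + 15 points).
n80 = ceil(n_per_group(0.70, 0.85, power=0.80))
n90 = ceil(n_per_group(0.70, 0.85, power=0.90))
print(n80)  # 118
print(n90)  # 158

# Inflating the reported 120 per group for 10% attrition.
print(round(120 / (1 - 0.10), 1))  # 133.3
```

With these inputs the approximation gives about 118 per group at 80% power, close to (though not identical with) the 120 reported, and about 158 per group at 90% power, consistent with a higher power requiring more participants. Dividing 120 by 0.9 gives about 133, in line with the figure reported after allowing for 10% attrition.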
No statistically significant difference between treatments was observed. However, it cannot be concluded that such a difference does not exist in the population (d is false), only that there was no evidence of one. The trial participants were a single sample from the population that may not have been representative. It is possible that another sample may give different results. This concept has led to the phrase, “Absence of evidence is not evidence of absence.”3 4 The sample size was calculated with a power of 80%, and therefore there was a probability of 0.2 that the calculated sample size would not demonstrate the smallest effect of clinical interest even if it existed in the population.
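The reported confidence interval can be approximately reproduced with a normal approximation (Wald) interval for a difference in proportions. The text gives only percentages, so the counts used below (15 of 110 cured with cryotherapy, 17 of 119 with salicylic acid) are assumptions chosen to match 13.64% and 14.29%; they are not taken from the text. The name `diff_ci` is illustrative, and the trial's actual method may differ.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(x1, n1, x2, n2, level=0.95):
    """Wald confidence interval and two-sided P value for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error of the difference in proportions.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    return diff, diff - z * se, diff + z * se, p_value


# Assumed counts, chosen to match the reported percentages.
diff, lower, upper, p_value = diff_ci(15, 110, 17, 119)
print(round(100 * diff, 2), round(100 * lower, 2), round(100 * upper, 2))
print(round(p_value, 2))
```

Under these assumed counts the interval is about −9.6 to 8.3 percentage points with a P value of about 0.89, in line with the reported result. Because the interval includes zero, the difference is not statistically significant at the 5% level.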
Footnotes
Competing interests: None declared.