Endgames Statistical Quiz

Sample size calculations I

BMJ 2010; 340 doi: http://dx.doi.org/10.1136/bmj.c3104 (Published 16 June 2010) Cite this as: BMJ 2010;340:c3104
  Philip Sedgwick, senior lecturer in medical statistics
  Centre for Medical and Healthcare Education, St George’s, University of London, London SW17 0RE
  p.sedgwick{at}sgul.ac.uk

    A randomised, double blind, placebo controlled trial investigated whether fluvastatin reduced major adverse cardiac events in patients who had undergone successful percutaneous coronary intervention (with or without stenting). The primary outcome was the occurrence of major adverse cardiac events (defined as cardiac death, non-fatal myocardial infarction, or a reintervention procedure) within three years [1].

    To calculate the sample size needed to compare fluvastatin with placebo, it was assumed that the proportion of patients having major adverse cardiac events at three years without treatment would be 25%. For fluvastatin to be considered clinically superior to placebo, it would be necessary to demonstrate a relative improvement of 25%, with only 18.75% of patients having major adverse cardiac events at three years. To do so, a total sample size of 1828 patients (914 in each treatment arm) would be required to achieve 90% power using a two sided hypothesis test and critical level of significance of 0.05. A total of 1677 patients were subsequently recruited to the trial.

    Which of the following, if any, are true?

    • a) The specified difference in major adverse cardiac events between fluvastatin and placebo is called the smallest effect of clinical interest

    • b) Power is the probability of detecting the specified difference in major adverse cardiac events, if it exists in the population

    • c) If power was increased to 95%, the sample size would decrease

    • d) The maximum probability of a type I error was 0.05

    Answers

    Answers a, b, and d are true; c is false.

    The researchers wanted to investigate whether fluvastatin reduced major adverse cardiac events in patients who had undergone successful percutaneous coronary intervention. To compare fluvastatin with placebo, an optimal sample size was needed to show whether a clinically significant result existed in the population. To calculate the sample size it was necessary to have some idea of the expected results for the primary outcome.

    The researchers assumed that the rate of major adverse cardiac events at three years without treatment (on placebo) would be 25%. An observed proportion of 18.75% of patients with a major adverse cardiac event at three years, a relative reduction of 25%, was the smallest improvement that needed to be seen for fluvastatin to be considered clinically effective and superior to placebo. This difference is called the smallest effect of clinical interest (a is true). Larger differences would obviously also be of interest. The expected proportion of events on placebo and improvement with fluvastatin were based either on clinical experience or previous research studies involving smaller samples.
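    The arithmetic behind the smallest effect of clinical interest can be checked in a few lines. This is only a sketch of the calculation described above; the variable names are illustrative and do not come from the trial report.

```python
# Design assumptions quoted in the text.
placebo_rate = 0.25        # assumed event rate at three years on placebo
relative_reduction = 0.25  # smallest relative improvement of clinical interest

# Event rate that fluvastatin would need to achieve, and the
# corresponding absolute difference between the two arms.
fluvastatin_rate = placebo_rate * (1 - relative_reduction)
absolute_reduction = placebo_rate - fluvastatin_rate

print(f"Target event rate on fluvastatin: {fluvastatin_rate:.2%}")
print(f"Absolute reduction: {absolute_reduction:.2%}")
```

    A relative reduction of 25% therefore corresponds to an absolute reduction of 6.25 percentage points, from 25% to 18.75%.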

    Hypothesis testing and derivation of the P value are based on the hypothetical situation of sampling an infinite number of times [2]. Power is the percentage of these repeated samples, set at 90% in the above trial, in which the hypothesis test would detect the smallest effect of clinical interest (that is, give a statistically significant result) if that effect exists in the population (b is true). It is generally recommended that power is set to a minimum of 80% to ensure that there is a high probability of detecting the smallest effect of clinical interest, if it exists.

    The smallest effect of clinical interest may not actually exist in the population. But if it does, the likelihood that it will be seen in the sample needs to be maximised. Generally, as sample size increases and approaches that of the population, the sample difference in major adverse cardiac events will become similar to that in the population. Therefore, as sample size increases so does power, because the smallest effect of clinical interest is more likely to be observed for the sample, if it exists in the population. It follows that a higher power requires a larger sample: if power were increased from 90% to 95%, with the smallest effect of clinical interest and the critical level of significance unchanged, the required sample size would increase, not decrease (c is false).
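    The relation between power and sample size can be sketched with a standard normal approximation for comparing two proportions. The formula used below (unpooled variance, no continuity correction) is an assumption made for illustration; the trial report does not state which formula was used, and the function name `patients_per_arm` is hypothetical.

```python
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, power, alpha=0.05):
    """Approximate patients per arm for a two sided test of two proportions.

    Simple normal approximation with unpooled variance; formulas that pool
    the variance or add a continuity correction give larger numbers.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2


# Same design assumptions as the trial, at three levels of power.
for power in (0.80, 0.90, 0.95):
    n = patients_per_arm(0.25, 0.1875, power)
    print(f"power {power:.0%}: about {n:.0f} patients per arm")
```

    Under these assumptions the required number per arm is roughly 683 at 80% power, 914 at 90% power (in line with the 914 per arm quoted for the trial), and 1131 at 95% power, so raising the power raises the sample size.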

    To compare fluvastatin with placebo, a two sided hypothesis test with a critical level of significance of 0.05 was proposed. Fixing the critical level of significance in advance sets the maximum probability of a type I error [3]. As described above, hypothesis testing is based on the hypothetical situation of sampling an infinite number of times. Because the critical level of significance was set at 0.05, if there were no difference in major adverse cardiac events in the population, the null hypothesis would be rejected in favour of the alternative for 5% of these infinite numbers of samples. A type I error would occur if we rejected the null hypothesis in favour of the alternative when there was no difference in major adverse cardiac events in the population; an incorrect inference would have been made. Therefore, the maximum probability of a type I error is 0.05 (d is true). This probability is fixed by the critical level of significance and does not decrease as sample size increases; a larger sample instead increases power, that is, it reduces the probability of missing a difference that exists in the population.
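    The idea of repeated sampling under the null hypothesis can be illustrated by simulation. The sketch below assumes both arms share a true event rate of 25% and analyses each simulated trial with a two sided z test for two proportions; the choice of test, the number of simulated trials, and the function names are assumptions for illustration only.

```python
import random
from math import sqrt
from statistics import NormalDist

CRITICAL_Z = NormalDist().inv_cdf(1 - 0.05 / 2)  # two sided, significance 0.05


def rejects_null(events_a, events_b, n_per_arm):
    """Two sided z test for a difference in proportions (pooled variance)."""
    pooled = (events_a + events_b) / (2 * n_per_arm)
    se = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    if se == 0:
        return False  # no events or all events in both arms: nothing to test
    z = abs(events_a - events_b) / n_per_arm / se
    return z > CRITICAL_Z


def type_i_error_rate(n_per_arm=914, event_rate=0.25, trials=2000, seed=1):
    """Share of simulated trials that reject the null when it is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both arms have the same true event rate, so any rejection is a type I error.
        events_a = sum(rng.random() < event_rate for _ in range(n_per_arm))
        events_b = sum(rng.random() < event_rate for _ in range(n_per_arm))
        rejections += rejects_null(events_a, events_b, n_per_arm)
    return rejections / trials


rate = type_i_error_rate()
print(f"Simulated type I error rate: {rate:.3f}")
```

    With a finite number of simulated trials the observed rate will fluctuate around the critical level of significance of 0.05 rather than equal it exactly.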

    To undertake a statistical hypothesis test comparing fluvastatin with placebo, an optimal sample size needed to be calculated. This was a balance between power and the probability of a type I error based on the smallest effect of clinical interest. Typically we fix power at 80% or 90%, and limit the maximum probability of a type I error to 0.05, that is, the critical level of significance. If the sample size is too small it may not be representative of the population, and this could lead to a trial that lacks power. Too large a sample may be time consuming, expensive, and possibly unethical. However, it is not always possible to recruit the desired number of people for financial or practical reasons; in this trial 1677 patients were recruited against a calculated requirement of 1828.
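    As a rough illustration of what recruiting fewer patients than planned means for power, the sketch below applies the same normal approximation for two proportions to the 1677 patients recruited. It assumes an equal split between arms, that the original design assumptions (25% versus 18.75%) hold, and no losses to follow-up; the resulting figure is not reported in the source and `approximate_power` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(n_per_arm, p_control=0.25, p_treated=0.1875, alpha=0.05):
    """Approximate power of a two sided test comparing two proportions.

    Normal approximation with unpooled variance; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    se = sqrt(variance / n_per_arm)
    return z.cdf(abs(p_control - p_treated) / se - z_alpha)


planned = approximate_power(1828 / 2)    # sample size from the calculation
recruited = approximate_power(1677 / 2)  # assumes recruits split equally

print(f"Approximate power as planned:   {planned:.1%}")
print(f"Approximate power as recruited: {recruited:.1%}")
```

    Under these assumptions the shortfall in recruitment lowers power from about 90% to roughly 87%, which is still above the commonly recommended minimum of 80%.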

    Footnotes

    • Competing interests: None declared.

    References