- Timothy Clark, research scientist,
- Ursula Berger, senior statistician,
- Ulrich Mansmann, director, and chair of biostatistics and bioinformatics
- Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie, Faculty of Medicine, Ludwig-Maximilians University, Munich, Germany
- Correspondence to: U Mansmann
- Accepted 31 January 2013
Objectives To assess the completeness of reporting of sample size determinations in unpublished research protocols and to develop guidance for research ethics committees and for statisticians advising these committees.
Design Review of original research protocols.
Study selection Unpublished research protocols for phase IIb, III, and IV randomised clinical trials of investigational medicinal products submitted to research ethics committees in the United Kingdom during 1 January to 31 December 2009.
Main outcome measures Completeness of reporting of the sample size determination, including the justification of design assumptions, and disagreement between reported and recalculated sample size.
Results 446 study protocols were reviewed. Of these, 190 (43%) justified the treatment effect and 213 (48%) justified the population variability or survival experience. Only 55 (12%) discussed the clinical importance of the treatment effect sought. Few protocols provided a reasoned explanation as to why the design assumptions were plausible for the planned study. Sensitivity analyses investigating how the sample size changed under different design assumptions were lacking; six (1%) protocols included a re-estimation of the sample size in the study design. Overall, 188 (42%) protocols reported all of the information to accurately recalculate the sample size; the assumed withdrawal or dropout rate was not given in 177 (40%) studies. Only 134 of the 446 (30%) sample size calculations could be accurately reproduced. Study size tended to be over-estimated rather than under-estimated. Studies with non-commercial sponsors justified the design assumptions used in the calculation more often than studies with commercial sponsors but less often reported all the components needed to reproduce the sample size calculation. Sample sizes for studies with non-commercial sponsors were less often reproduced.
Conclusions Most research protocols did not contain sufficient information to allow the sample size to be reproduced or the plausibility of the design assumptions to be assessed. Greater transparency in the reporting of the determination of the sample size and more focus on study design during the ethical review process would allow deficiencies to be resolved early, before the trial begins. Guidance for research ethics committees and statisticians advising these committees is needed.
The determination of sample size is central to the design of randomised controlled trials.1 To have scientific validity a clinical study must be appropriately designed to meet clearly defined objectives.2 3 Clinical trials should provide precise estimates of treatment effects, thus allowing healthcare professionals to make informed decisions based on sound evidence.3 Equally, trials should not be too large, as these may expose some patients to unnecessary risks. An extensive literature on sample size calculations in clinical research now exists for a wide variety of data types and statistical tests.4 5 6 7 8 9 The International Conference on Harmonisation of technical requirements for registration of pharmaceuticals for human use, topic E9, sets down the requirements for sample size reporting in research protocols for studies supporting the registration of drugs for use in humans.1 Although these standards primarily concern commercial sponsors, the principles (box 1) have broad application to all clinical trials. The consolidated standards of reporting trials statement provides similar guidance for published randomised trials.10 11
Box 1: Core components of the sample size determination (International Conference on Harmonisation topic E9)1
For example, superiority, non-inferiority, equivalence
For example, parallel group, crossover, factorial
Clinically most relevant endpoints from the patients’ perspective
Statistical test procedure
For example, t test for continuous variables, χ2 test for binary variables
Ratio of number of participants in each treatment arm
Treatment difference sought
Minimal effect that has clinical relevance or the anticipated effect of the new treatment, where this is larger
Other design assumptions
For example, variance, response rates, and event rates, used in the calculation
Type I error
Probability of erroneously rejecting the null hypothesis
Type II error
Probability of erroneously failing to reject the null hypothesis
Expected rate of treatment withdrawals
Expected proportion of subjects with no post-randomisation information
Justification of treatment difference sought and other design assumptions
Investigation of how sample size changes under different assumptions (sensitivity analyses)
Other components depending on study design
Accrual and total study duration used to estimate the number of patients required in event driven studies
Adjustments for multiple testing—for example, multiple endpoints, multiple checks during interim monitoring
Surprisingly few evaluations have been made of the quality of the sample size determination in randomised controlled trials. Those that have been performed are mainly based on published data owing to the difficulty in obtaining access to unpublished research protocols.12 13 These reviews have several limitations. Firstly, the reporting of the sample size determination in the study publication is less detailed than in the research protocol. Secondly, they are affected by publication bias. Thirdly, one study showed that there are often discrepancies between the research protocol and the publication.14 Consequently, definitive conclusions about the quality of sample size determinations should be based on a review of original research protocols.
Studies that are too large or too small have been branded as unethical.2 15 16 The view that underpowered studies are in themselves unethical has been challenged by some researchers, who argue that this is too simplistic.17 18 19 We believe that a study must be judged on whether it is appropriately designed to answer the research question posed, and the validity of the sample size calculation is germane to this assessment. This is not merely a matter of whether the sample size can be recalculated, since the calculation can be correct mathematically but still be of poor quality if the assumptions used have not been suitably researched and qualified.20 Greater transparency in the reporting of the sample size determination and more focus on study design during the ethical review process would allow deficiencies to be resolved early, before the trial begins; once the trial starts it is too late. We assessed the quality of sample size determinations reported in research protocols with the aim of developing guidance for research ethics committees.
We searched the research ethics database, a web based database application for managing the administration of the ethical review process in the United Kingdom, using filter criteria (see supplementary file) to identify all validated applications for randomised (phase IIb, III, and IV) clinical trials of investigational medicinal products submitted to the National Research Ethics Service for ethical review during 2009. We designed these criteria to create a large database of recently submitted protocols (2009 was the last complete year before the project started in 2010) for randomised controlled trials.
Creation of the protocol database
Three researchers extracted the characteristics of the studies (table 1) from the research ethics database according to prespecified rules and entered the data into the protocol database. Two reviewers independently assessed each research protocol. The researchers met regularly to discuss and agree on the final data to be entered into the database.
To verify the data sources we checked that the information in the research ethics database was consistent with the research protocol on file at the research ethics committee office.
The database was analysed using SPSS version 19. We describe the results using frequency tables with percentages, cross tabulations, relative risks with 95% confidence intervals, and box and Bland-Altman plots.
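As an illustration of the relative risks with 95% confidence intervals mentioned above, the following is a minimal sketch in Python. It assumes the common log scale (Katz) Wald interval; the paper does not state which interval method SPSS applied, and the function name and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist

def relative_risk_ci(events_a, total_a, events_b, total_b, level=0.95):
    """Relative risk of group A versus group B with a log scale Wald interval.

    Assumption: Katz log method; the interval method used in the paper is not stated.
    """
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial samples
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts, not taken from the study data
rr, lower, upper = relative_risk_ci(60, 100, 40, 100)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 indicates a difference between the two groups at the chosen confidence level.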
Assessing the sample size determination
We assessed sample size determination based on three factors: the reporting of how the sample size was determined, the reporting of and justification for the design assumptions, and recalculation of the original sample size determination.
Reporting of sample size determination
We reviewed each protocol to determine the presence or absence of the core sample size components. Reporting of additional information such as adjustment for multiple testing (for example, multiple endpoints, multiple checks during interim monitoring) required for the sample size calculation was also documented. We did not assess the appropriateness of the proposed methods of analysis.
Reporting and justifying design assumptions
Each design assumption was categorised (box 2). We also documented the reporting of sensitivity analyses and consideration of an adaptive design. We did not independently assess the appropriateness of the design assumptions.
Box 2: Categorisation of design assumptions
Variable and data on which assumption based (the “basis”)
Details of the data underpinning the variable—for example, previous studies with the new drug or products in the same therapeutic class, physician survey, meta-analysis, literature search given
Treatment difference and data on which assumption based and discussion of its clinical importance
Required more than a simple statement that the “treatment difference was clinically important”. Reference to a specific study or studies in which the clinically relevant difference has been determined, or a detailed clinical discussion of why the investigators considered the difference sought to be meaningful
Variable and data on which assumption based and a reasoned explanation for choice
Required a discussion of the data underpinning the variable and an explanation why the value used in the sample size calculation was plausible for the planned study
Recalculation of original sample size determination
The three researchers who created the protocol database recalculated the original sample size according to prespecified rules. Two independent reviewers carried out each recalculation; the researchers met regularly to discuss and agree the final data to be entered into the database. Any outstanding questions were referred to a fourth reviewer for resolution (n=65).
If the sample size determination stated that a specific statistical software package had been used (for example, nQuery Advisor, East, PASS) or referenced a specific publication, then we used the same software or published methodology to recalculate the sample size. If the protocol stated that the sample size was based on a more complex method of analysis, such as analysis of covariance, then we used PASS 11 or nQuery 6.01. Otherwise we used standard formulas for normal, binary, or survival data.4 5 6
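The "standard formulas" referred to above are not written out in the paper. The sketch below shows widely used normal approximation versions for a two sided test with 1:1 allocation, which may differ in detail from the formulas in the cited references. The function names are hypothetical, and the survival formula is the Schoenfeld approximation for the total number of events.

```python
import math
from statistics import NormalDist

def _z_sum(alpha, power):
    """Sum of the standard normal quantiles for a two sided alpha and the target power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)

def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Normal data: participants per group to detect a mean difference delta."""
    return math.ceil(2 * (sd * _z_sum(alpha, power) / delta) ** 2)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Binary data: participants per group, unpooled variance approximation."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(_z_sum(alpha, power) ** 2 * variance / (p1 - p2) ** 2)

def total_events_logrank(hazard_ratio, alpha=0.05, power=0.80):
    """Survival data: total events required (Schoenfeld approximation)."""
    return math.ceil(4 * _z_sum(alpha, power) ** 2 / math.log(hazard_ratio) ** 2)

# Illustrative inputs only, not values from any reviewed protocol
print(n_per_group_means(delta=5, sd=10))
print(n_per_group_proportions(p1=0.5, p2=0.3))
print(total_events_logrank(hazard_ratio=0.7))
```

These return the number of evaluable participants (or events) before any allowance for withdrawals.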
Missing information was imputed in four ways. Firstly, if the withdrawal rate was not specified we recalculated the sample size using the variables given. This recalculated sample size was then compared with the sample size reported in the protocol. Secondly, if the type I error or type II error (power of a trial is 1−probability of a type II error) was not specified, we recalculated the sample size using a two sided 5% type I error or a 20% type II error. Thirdly, when adjustments for multiple testing were not reported we assumed no adjustment had been applied. Finally, if the sample size was based on a more complex method of analysis, but insufficient information was reported to allow the sample size to be recalculated for the planned method, we used standard formulas to recalculate the sample size.
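The imputation rules above can be sketched as a small helper. This is an illustrative reading, not the authors' code: the function names are hypothetical, a missing withdrawal rate is treated as no inflation (the recalculation simply uses the variables given), and the inflation step assumes the common convention of dividing the evaluable sample size by one minus the withdrawal rate.

```python
import math

def impute_design_inputs(alpha=None, beta=None, withdrawal_rate=None):
    """Fill in unreported values following the rules described in the text."""
    return {
        # Two sided 5% type I error when not specified
        "alpha": 0.05 if alpha is None else alpha,
        # 20% type II error (80% power) when not specified
        "beta": 0.20 if beta is None else beta,
        # Assumption: an unreported withdrawal rate means no inflation is applied
        "withdrawal_rate": 0.0 if withdrawal_rate is None else withdrawal_rate,
    }

def inflate_for_withdrawals(n_evaluable, withdrawal_rate):
    """Number to randomise so that n_evaluable remain after withdrawals.

    Assumption: n / (1 - rate), a common convention not confirmed by the paper.
    """
    if not 0 <= withdrawal_rate < 1:
        raise ValueError("withdrawal_rate must be in [0, 1)")
    return math.ceil(n_evaluable / (1 - withdrawal_rate))

inputs = impute_design_inputs(alpha=0.05)  # beta and withdrawal rate unreported
print(inputs)
print(inflate_for_withdrawals(126, 0.15))
```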
We defined two populations for analysis: protocols where missing information was imputed and protocols that reported all core components and any additional information such as adjustments for multiple testing required to accurately recalculate the sample size (complete reporting).
If the ratio of the number of evaluable patients or events reported in the protocol to that calculated fell within the range 0.95 to 1.05, then we considered the sample size to have been reproduced, since a difference of 5% or less either way represented an inconsequential reduction or increase in power (approximately 2% for normal, binary, or survival data).
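The reproduction rule can be expressed as a short classifier, together with a check of the claim that a 5% change in sample size moves power by roughly 2%. This is a sketch under assumptions: the function names are hypothetical, a ratio above 1.05 is read as an over-estimated study size and a ratio below 0.95 as under-estimated, and the power calculation uses the normal approximation for a two sided comparison of means.

```python
import math
from statistics import NormalDist

def classify_recalculation(reported, recalculated, lower=0.95, upper=1.05):
    """Label a protocol by the ratio of reported to recalculated sample size."""
    ratio = reported / recalculated
    if ratio < lower:
        return "under-estimated"
    if ratio > upper:
        return "over-estimated"
    return "reproduced"

def power_two_means(n_per_group, delta, sd, alpha=0.05):
    """Approximate power of a two sided, two sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(math.sqrt(n_per_group / 2) * delta / sd - z_alpha)

print(classify_recalculation(reported=120, recalculated=100))

# Illustrative design: about 80% power at 63 per group; 60 per group is about 5% fewer
full = power_two_means(63, delta=5, sd=10)
reduced = power_two_means(60, delta=5, sd=10)
print(round(full - reduced, 3))
```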
A total of 929 research protocols were identified by the initial search. Of these, 446 met the inclusion criteria (see supplementary figure 5). Table 2 lists the main characteristics of the 446 research protocols (also see supplementary table 5). The most common therapeutic areas were oncology (94; 21%) and endocrinology (49; 11%). Most studies were sponsored by industry (314; 70%), were in phase III (251; 56%), had a parallel group design (319; 72%), and had superiority of the test over control medicinal product as the primary objective (375; 84%). Six (1%) protocols included sample size re-estimation in the study design.
Reporting of sample size components
The individual core components of the sample size were generally reported in the 446 protocols, with the exception of withdrawals (269; 60%, fig 1) (also see supplementary table 6). Of the 446 protocols, 240 (54%) reported all the core components; withdrawal rate was the only element missing in 143 out of 206 (69%) protocols that did not report all core components.
When we considered protocols that reported all core components and additional information such as adjustments for multiple testing to accurately recalculate the sample size (complete reporting) then the number reduced to 188 protocols (42%).
Reporting design assumptions
Fewer than half of the 446 protocols (190; 43%) reported the data on which the treatment difference (or margin) was based. Of the 190 protocols that did report the basis of the treatment difference, 92 (48%) cited previous studies with the product or a product in the same class and 38 (20%) cited a literature search (fig 2 and supplementary table 7). In only four (2%) protocols was the estimated treatment difference based on a meta-analysis. Reporting the basis for the treatment difference was lowest in studies on oncology (28/94; 30%) and cardiovascular disease (12/36; 33%) and highest in those on pain and anaesthesia (16/27; 59%) (see supplementary table 8).
Overall, 55 out of 446 (12%) protocols reported both the basis of the treatment effect and its clinical importance, 135 (30%) protocols reported the basis only, and 256 (57%) reported neither. Limited information on the nature of the data underpinning the treatment effect was usually given, and just 13 (3%) protocols gave a reasoned explanation why the value chosen was plausible for the planned study.
The same pattern was observed with population variability or survival, with less than half (213/446; 48%) of the protocols reporting the basis of the variable used in the calculation (fig 2 and supplementary table 9). Previous studies, a literature search, or both, were again most commonly cited. The variability or survival estimate was based on a meta-analysis in only two of the 213 (1%) protocols. Again, limited information was usually given, and just 17 (4%) protocols explained the plausibility of the value chosen.
Only 11 out of the 446 (3%) protocols reported analyses investigating the sensitivity of the sample size to deviations from the assumptions used in the calculation.
Reporting of strategies to control type I (false positive) and type II (false negative) error
Adjustments for multiple comparisons (81/144; 56%) or interim analyses (56/95; 59%) were reported in just over half of the research protocols with these design features (see supplementary table 10). The potential for increasing the type II error was not considered in any study with multiple comparisons. If all co-primary variables must be significant to declare success then the type II error rate can be inflated, resulting in reduction in the overall study power.1 22
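The inflation of the type II error with co-primary endpoints can be made concrete with a small calculation. The sketch below assumes the simplest case of statistically independent endpoints, where the overall power is the product of the individual powers, and also gives bounds that hold for any correlation. The function names are hypothetical and the powers are illustrative.

```python
import math

def joint_power_independent(powers):
    """Power to show significance on every endpoint, assuming independence."""
    return math.prod(powers)

def joint_power_bounds(powers):
    """Bounds on the joint power that hold whatever the correlation between endpoints."""
    lower = max(0.0, 1 - sum(1 - p for p in powers))  # Bonferroni type bound
    upper = min(powers)  # reached when endpoints are perfectly correlated
    return lower, upper

# Two co-primary endpoints, each powered at 90% on its own (illustrative values)
print(round(joint_power_independent([0.9, 0.9]), 2))
print(tuple(round(b, 2) for b in joint_power_bounds([0.9, 0.9])))
```

With two independent endpoints each powered at 90%, the study as a whole has about 81% power, so the overall type II error is closer to 19% than the nominal 10%.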
Recalculation of the original sample size determination
If all protocols were considered using the rules for imputing missing information then 262 out of 446 (59%) sample size determinations could be reproduced, with 51 (11%) under-estimated and 103 (23%) over-estimated. Thirty (7%) of the original sample size calculations could not be recalculated (see supplementary table 11). Figure 3 shows a box plot of the relative differences between the reported and recalculated sample sizes.
A total of 134 of the 188 (71%) sample size calculations from protocols with complete reporting could be reproduced, with 20 (11%) under-estimated and 34 (18%) over-estimated. The reproducibility of the sample size increased with more comprehensive reporting, primarily of withdrawal rates and adjustments for multiple testing. None the less, both analyses showed a tendency for over-estimation, and in total only 134 of the 446 (30%) original sample size calculations could be accurately reproduced.
Supplementary figure 6 shows a Bland-Altman plot comparing reported and calculated sample sizes.
Commercial versus non-commercial sponsors
The reporting of the core components of the sample size determination did not differ noticeably between studies with commercial and non-commercial sponsors (fig 4 and supplementary table 12). Studies with non-commercial sponsors were more likely than those with commercial sponsors to report the basis for design assumptions (relative risk 1.69, 95% confidence interval 1.38 to 2.08 for treatment difference and 1.29, 1.07 to 1.56 for variance and survival). Conversely, studies with non-commercial sponsors were less likely than those with commercial sponsors to report adjustments for multiple comparisons (0.26, 0.13 to 0.50) and interim analyses (0.54, 0.31 to 0.93) and provide complete reporting (0.60, 0.45 to 0.81); the sample size calculation from protocols of studies with non-commercial sponsors was also less likely to be reproduced (0.72, 0.59 to 0.88).
Our review suggests that the reporting of the sample size determination in the research protocol often lacks essential information. Treatment difference and type I error were usually given, but withdrawal rates and adjustments for multiple testing were often missing. Only 188 of 446 (42%) protocols contained sufficient information to accurately recalculate the sample size. More than half of the research protocols provided no justification for the assumptions used in the sample size calculation. When a justification was given, it generally lacked detail. Sensitivity analyses, which can help investigators understand the reliability of the variables used in the sample size calculation and whether sample size re-estimation should be included in the study design, were rarely reported.23 24 25
Imputing missing information resulted in 262 out of 446 (59%) reproduced sample sizes. This increased to 134 out of 188 (71%) when only complete reports were considered. Overall, only 134 of the 446 (30%) sample size calculations could be accurately reproduced. Study size tended to be over-estimated rather than under-estimated.
Our research, the first extensive review of unpublished research protocols, raises several problems with the statistical planning of randomised controlled trials, in particular the limited consideration afforded to the choice of design assumptions. Sample size determinations are highly sensitive to changes in design assumptions, which behoves sponsors to be rigorous when estimating these variables.26 Moreover, if the degree of uncertainty is high then design assumptions should be checked during the course of the trial.26
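To illustrate how sensitive a sample size is to its design assumptions, the sketch below tabulates the required size per group across a range of plausible standard deviations. It assumes a two sided comparison of means with 1:1 allocation under the normal approximation; the function names and input values are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Participants per group for a two sided, two sample comparison of means."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil(2 * (sd * z_sum / delta) ** 2)

def sensitivity_table(delta, sds, alpha=0.05, power=0.80):
    """Required size per group for each candidate standard deviation."""
    return {sd: n_per_group_means(delta, sd, alpha, power) for sd in sds}

# Illustrative planning values: target difference 5, standard deviation uncertain
for sd, n in sensitivity_table(delta=5, sds=[8, 10, 12]).items():
    print(f"SD {sd}: {n} per group")
```

Because the required size scales with the square of the ratio of standard deviation to treatment difference in this formula, a 20% error in the assumed standard deviation changes the sample size by about 40% or more.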
Limitations of this review
We reviewed only the research protocol submitted to the research ethics committee and had no access to any other documents. Moreover, our review was completely independent of the ethical review process. The protocols were submitted in 2009 to the UK National Research Ethics Service and reflect clinical research practice at that time. None the less, the sample is relatively recent and many sponsors planned to include sites both within and outside the United Kingdom, so we believe our findings can be generalised to other countries and regions for commercial studies where global regulatory requirements exist. For non-commercial studies, the quality of reporting depends on the investigators' experience. We did not verify the appropriateness of the design assumptions used in the sample size determination in this research project.
Implications of the findings
In many instances the validity of the sample size determination and by extension the scientific validity of the study—one of the main aspects of the ethical review process—could not be judged.2 The available evidence suggests that key sample size assumptions are not determined in a rigorous manner. This may explain why large differences have been observed between design assumptions and observed data.13 27 Furthermore, sample sizes tended to be over-estimated, which is a concern given the challenges of recruiting to randomised controlled trials.28 Finally, methodologies to check assumptions and re-estimate sample size during the study are often not applied, despite the fact that these methods are encouraged by regulatory authorities.29 30
Investigators should be rigorous in the determination of design assumptions. There is no “one size fits all” approach. Sufficient information should be reported to allow the sample size to be reproduced and show that there is solid reasoning behind the assumptions used in the calculation (box 3).
Box 3: Recommended information to be reported in research protocols
All components necessary to reproduce the sample size, in particular withdrawal or dropout rate and adjustments for multiple comparisons or interim analyses
Confidence interval for variables used in the calculation
A concise summary of the data from which variable estimates are derived. If the variable is based on previous studies then give details of the study design, clinical phase, study population, relevant outcome measures, relevant results, and study size, ideally in a table
Discussion of the clinical importance of the treatment effect
A reasoned explanation of why the treatment difference and other design assumptions are plausible for the planned study, taking into account:
All existing data, for example, previous clinical studies, relevant clinical pharmacology (dose effect relation, etc) and non-clinical data
How any differences between the previous studies and the one planned impact on the design assumptions
How robust the sample size and/or statistical power is to different assumptions (sensitivity analysis). If the variable estimates are considered unreliable then re-estimation of the sample size could be considered
We would also ask the suppliers of software used to calculate sample size to consider including the withdrawal and dropout rate in the package to ensure that this is taken into account and reported in the research protocol.
A poorly designed trial cannot be saved once it is completed. Greater transparency in the reporting of sample size determinations in research protocols would facilitate the early detection of deficiencies in the study design. Moreover, better justification of the design assumptions in the research protocol would facilitate the overall ethical review process.31
Despite calls for a different approach to sample size determination, we believe that there is no substitute for spending time designing the study and giving due consideration to the risks and how these can be tackled.13 32
Wherever the responsibility for scientific and statistical review lies, we believe clear guidance on the sample size determination should be provided and followed. Individuals with appropriate statistical expertise should also play a central role in the ethical review of research protocols.33 Improving the review process to place more focus on study design was the aim of the National Research Ethics Service at the start of this project and we propose to use the results of our research to develop guidance, working with the ethics service and others interested in this area.
What is already known on this topic
Sample size determination is an accepted and important part of the planning process for randomised controlled trials
Sample size reporting in publications is often lacking essential information
What this study adds
Sample size reporting in original research protocols is often incomplete and in many instances the reliability of the design assumptions and hence the validity of the sample size determination cannot be judged
The ethical review process should place greater focus on study design
Withdrawal and dropout rate are frequently not reported and therefore suppliers of sample size software could include this variable in the package to improve reporting
Cite this as: BMJ 2013;346:f1135
We thank the National Research Ethics Service for its support; IBE research assistants Mathias Heibeck and Linda Hayanga for their contribution to this project; and Michael Campbell, Sir Iain Chalmers, Douglas Altman, Gary Collins, and Hugh Davies for their comments on this work.
Contributors: UM had full access to all of the study data and takes responsibility for the integrity of the data and the accuracy of the data analyses. He is guarantor. TC and UM conceived and designed the study. TC, Mathias Heibeck, and Linda Hayanga collected the data. TC and UB carried out the statistical analyses. TC, Mathias Heibeck, Linda Hayanga, and UM recalculated the sample sizes. TC, UM, and UB interpreted the data. TC drafted the manuscript. UM and UB critically reviewed the manuscript. The views expressed in the paper are those of the authors and do not necessarily reflect the views of the National Research Ethics Service, which became a part of the Health Research Authority in December 2011.
Funding: This study received no funding.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare that: TC received support for travel from the National Research Ethics Service and has worked as a consultant for the clinical research organisation ICON in the previous three years; they have no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: No additional data available.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.