Influence of trial sample size on treatment effect estimates: meta-epidemiological study
BMJ 2013; 346 doi: http://dx.doi.org/10.1136/bmj.f2304 (Published 24 April 2013) Cite this as: BMJ 2013;346:f2304
- Agnes Dechartres, assistant professor of epidemiology (1, 2, 3)
- Ludovic Trinquart, senior statistician (1, 4)
- Isabelle Boutron, professor of epidemiology (1, 2, 3, 4)
- Philippe Ravaud, professor of epidemiology and director (1, 2, 3, 4, 5)
- 1. INSERM, U738, Paris, France
- 2. AP-HP (Assistance Publique des Hôpitaux de Paris), Hôpital Hôtel Dieu, Centre d’Épidémiologie Clinique, 75004 Paris, France
- 3. Univ Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine, Paris, France
- 4. French Cochrane Centre, Paris, France
- 5. Columbia University, Mailman School of Public Health, New York, NY, USA
- Correspondence to: A Dechartres
- Accepted 2 April 2013
Objective To assess the influence of trial sample size on treatment effect estimates within meta-analyses.
Design Meta-epidemiological study.
Data sources 93 meta-analyses (735 randomised controlled trials) assessing therapeutic interventions with binary outcomes, published in the 10 leading journals of each medical subject category of the Journal Citation Reports or in the Cochrane Database of Systematic Reviews.
Data extraction Sample size, outcome data, and risk of bias extracted from each trial.
Data synthesis Trials within each meta-analysis were sorted by their sample size: using quarters within each meta-analysis (from quarter 1 with 25% of the smallest trials, to quarter 4 with 25% of the largest trials), and using size groups across meta-analyses (ranging from <50 to ≥1000 patients). Treatment effects were compared within each meta-analysis between quarters or between size groups by average ratios of odds ratios (where a ratio of odds ratios less than 1 indicates larger effects in smaller trials).
Results Treatment effect estimates were significantly larger in smaller trials, regardless of sample size. Compared with quarter 4 (which included the largest trials), treatment effects were, on average, 32% larger in trials in quarter 1 (which included the smallest trials; ratio of odds ratios 0.68, 95% confidence interval 0.57 to 0.82), 17% larger in trials in quarter 2 (0.83, 0.75 to 0.91), and 12% larger in trials in quarter 3 (0.88, 0.82 to 0.95). Similar results were obtained when comparing treatment effect estimates between different size groups. Compared with trials of 1000 patients or more, treatment effects were, on average, 48% larger in trials with fewer than 50 patients (0.52, 0.41 to 0.66) and 10% larger in trials with 500-999 patients (0.90, 0.82 to 1.00).
Conclusions Treatment effect estimates differed within meta-analyses solely based on trial sample size, with stronger effect estimates seen in small to moderately sized trials than in the largest trials.
Sample size varies greatly among trials, ranging from tens of patients to thousands of patients,1 even within a meta-analysis investigating the same question. For example, a meta-analysis in cardiology2 included trials with sizes ranging from 62 patients to 45 852 patients. Our knowledge about the influence of trial sample size on treatment effect estimates is based on the small study effect—the tendency for small trials to report greater treatment benefits than large trials in the same meta-analysis.3 4 5 A study based on a collection of meta-analyses in osteoarthritis showed that trials including fewer than 100 patients per arm yielded, on average, greater treatment effect estimates than did larger trials.6
The concept of a single threshold to distinguish small trials from large trials, whatever the medical area or intervention being tested, is not straightforward.7 For binary outcomes, the required sample size depends on the magnitude of treatment effect as well as the number of events and frequency of the medical condition. Therefore, a trial of 1000 patients can be considered large for certain medical conditions and small for others.
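The dependence of required sample size on effect magnitude and event frequency can be illustrated with the usual normal approximation for comparing two proportions. This is a minimal sketch and not part of the study's methods: the function name `n_per_arm`, the two sided α of 5%, the 80% power, and the event rates are all illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect p_control v p_treated.

    Uses the normal approximation for two independent proportions with a
    two sided test; alpha and power defaults are assumptions, not values
    taken from the article.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical event rates: the same 25% relative risk reduction
# for a frequent outcome and for a rare outcome.
common = n_per_arm(0.40, 0.30)
rare = n_per_arm(0.04, 0.03)
print(f"frequent outcome: {common} per arm; rare outcome: {rare} per arm")
```

Under these assumptions, a trial of 1000 patients would be more than adequate for the frequent outcome but far too small for the rare one, which is why a single size threshold is hard to justify.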
In this study, we assessed the influence of trial sample size on treatment effect estimates in a large collection of meta-analyses of various medical conditions and interventions.
This study combined data from two independent collections of meta-analyses of randomised controlled trials assessing therapeutic interventions with binary outcomes. The first collection included 48 meta-analyses (421 trials) published either in the 10 leading journals of each medical subject category of the Journal Citation Reports during two periods (July 2008 to January 2009, and January to June 2010) or in issue 4 of the Cochrane Database of Systematic Reviews in 2008. Further details about the search strategy and selection of meta-analyses have been published.8 Reports of all component trials from included meta-analyses were obtained.
The second collection included 45 meta-analyses (314 trials) published in the Cochrane Database of Systematic Reviews between January and July 2011. We included meta-analyses assessing a binary outcome for the primary or main outcome measure and involving four or more trials. If the meta-analysis reported combined results for more than one primary binary outcome, we selected the first reported outcome if it was described in enough trial reports. We excluded meta-analyses that overlapped with the first collection. Web appendix 1 includes details of the selection process for both collections.
Data extraction and risk of bias assessment
Using a standardised data extraction form, we extracted the following data for each randomised controlled trial: the date of publication, whether the trial was a single centre or multicentre trial (with at least two different centres), the number of patients with the outcome in each group, and the number of patients randomised in each group. Data for risk of bias were also collected by using the following domains of the risk of bias tool of the Cochrane Collaboration9 10: methods for sequence generation and allocation concealment, blinding, and incomplete outcome data. Each domain was rated as having low, high, or unclear risk of bias, according to the recommendations of the Cochrane Collaboration.9 10 For each trial, the overall risk of bias was classified as low (that is, low risk of bias for all domains), high (that is, high risk of bias for one or more domains), or unclear (that is, unclear risk of bias for one or more domains in the absence of high risk of bias). We extracted data from the original reports of trials for the first collection of meta-analyses (in duplicate for a third of the meta-analyses), and from the Cochrane reviews for the second collection.
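The rule for the overall risk of bias can be written as a short function. This is a minimal sketch of the classification as stated above; the function name, the domain keys, and the example ratings are hypothetical.

```python
def overall_risk_of_bias(domains):
    """Classify a trial from its per-domain ratings.

    domains maps a domain name to 'low', 'high', or 'unclear'.
    High if any domain is high; otherwise unclear if any domain is
    unclear; low only when every domain is low.
    """
    ratings = set(domains.values())
    unknown = ratings - {"low", "high", "unclear"}
    if unknown:
        raise ValueError(f"unexpected rating(s): {sorted(unknown)}")
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"


# Hypothetical trial: one unclear domain and no high risk domain.
example = {
    "sequence generation": "low",
    "allocation concealment": "unclear",
    "blinding": "low",
    "incomplete outcome data": "low",
}
example_rating = overall_risk_of_bias(example)
```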
Data synthesis and analysis
Association between trial sample size and treatment effect
The trials within each meta-analysis were sorted by their sample size: using quarters within each meta-analysis (from quarter 1 including 25% of the smallest trials, to quarter 4 including 25% of the largest trials), and using size groups across meta-analyses (<50, 50-99, 100-199, 200-499, 500-999, and ≥1000 patients). Treatment effects (measured as odds ratios) were compared between quarters and size groups by multilevel logistic regression models with random effects.11 These hierarchical models allowed for taking into account random intervention effects (between trial heterogeneity) within meta-analyses as well as random variation in the effect of trial sample size between meta-analyses. The results were expressed as average ratios of odds ratios. This measure is the ratio of the odds ratio in smaller trials to the odds ratio in larger trials. A ratio of odds ratios less than 1 indicates larger estimates of the treatment effect in smaller trials. The heterogeneity across meta-analyses was quantified with τ², the variance between meta-analyses. We performed tests for linear trend across quarters and size groups.
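The two ways of grouping trials can be sketched as follows. The size group boundaries come from the text; the rank based rule for assigning quarters is an assumption, because the article does not say how ties or trial counts not divisible by four were handled. The function names and the example sizes are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the size groups, as listed in the text.
SIZE_BOUNDS = [50, 100, 200, 500, 1000]
SIZE_LABELS = ["<50", "50-99", "100-199", "200-499", "500-999", ">=1000"]


def size_group(n_patients):
    """Fixed size group of a trial, comparable across meta-analyses."""
    return SIZE_LABELS[bisect_right(SIZE_BOUNDS, n_patients)]


def quarters(sizes):
    """Quarter (1 to 4) of each trial within a single meta-analysis.

    Trials are ranked by sample size and the ranks are cut into four
    roughly equal parts (assumed rule; ties keep their input order).
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    result = [0] * len(sizes)
    for rank, index in enumerate(order):
        result[index] = 1 + (4 * rank) // len(sizes)
    return result


# Hypothetical meta-analysis of eight trials.
example_sizes = [30, 120, 300, 5000, 90, 800, 1500, 210]
example_quarters = quarters(example_sizes)
example_groups = [size_group(n) for n in example_sizes]
```

Quarters are relative to each meta-analysis, so a trial of 300 patients could fall in quarter 4 of one meta-analysis and quarter 1 of another, whereas its size group is always 200-499.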
We reassessed the influence of trial sample size on treatment effect estimates by comparing treatment effects between trials by quarters (quarter 1 v quarters 2-4; quarters 1 and 2 v quarters 3 and 4; quarters 1-3 v quarter 4) and by fixed thresholds for trial sample size (50, 100, 200, 500, and 1000 patients; for example, for the 200 patient threshold, we compared treatment effects between trials with fewer than 200 patients and trials with 200 patients or more). We used the two stage approach to meta-epidemiological analyses as described by Sterne and colleagues12 and further adjusted these analyses for the following trial characteristics: domains of risk of bias,13 14 15 16 17 overall risk of bias, centre status,8 18 and time since publication of the first trial within each meta-analysis.
Web appendix 2 details the statistical methods. We used SAS version 9.2 (SAS) for the multilevel models and Stata MP version 10.0 (Stata Corp) for the meta-epidemiological analyses.
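The logic of a two stage meta-epidemiological analysis with a fixed threshold can be sketched in Python. This is a simplified illustration and not the authors' SAS or Stata code: stage one here uses fixed effect inverse variance pooling within each size stratum, stage two uses a DerSimonian-Laird random effects model, and a 0.5 continuity correction is added to every cell. These choices, the function names, and all the trial counts are assumptions.

```python
from math import exp, log, sqrt


def log_or(events_t, n_t, events_c, n_c):
    """Log odds ratio and variance, with 0.5 added to every cell."""
    a, b = events_t + 0.5, n_t - events_t + 0.5
    c, d = events_c + 0.5, n_c - events_c + 0.5
    return log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_fixed(estimates):
    """Inverse variance pooling of (estimate, variance) pairs."""
    weights = [1 / v for _, v in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1 / sum(weights)


def stage_one(trials, threshold):
    """Log ratio of odds ratios (smaller v larger trials) in one meta-analysis.

    Each trial is (total size, events_t, n_t, events_c, n_c); the
    meta-analysis must have trials on both sides of the threshold.
    """
    small = [log_or(*t[1:]) for t in trials if t[0] < threshold]
    large = [log_or(*t[1:]) for t in trials if t[0] >= threshold]
    (y_small, v_small), (y_large, v_large) = pool_fixed(small), pool_fixed(large)
    return y_small - y_large, v_small + v_large


def stage_two(log_rors):
    """Random effects pooling of log RORs across meta-analyses."""
    fixed, _ = pool_fixed(log_rors)
    w = [1 / v for _, v in log_rors]
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, log_rors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_rors) - 1)) / c) if c > 0 else 0.0
    pooled, var = pool_fixed([(y, v + tau2) for y, v in log_rors])
    se = sqrt(var)
    return exp(pooled), exp(pooled - 1.96 * se), exp(pooled + 1.96 * se), tau2


# Three hypothetical meta-analyses (all counts are invented).
meta_analyses = [
    [(40, 4, 20, 10, 20), (60, 8, 30, 14, 30),
     (2000, 180, 1000, 200, 1000), (3000, 270, 1500, 300, 1500)],
    [(80, 10, 40, 18, 40), (100, 12, 50, 20, 50),
     (1000, 95, 500, 100, 500), (5000, 450, 2500, 500, 2500)],
    [(50, 10, 25, 11, 25), (1200, 240, 600, 270, 600)],
]
ror, lower, upper, tau2 = stage_two([stage_one(m, 200) for m in meta_analyses])
print(f"ROR {ror:.2f} (95% CI {lower:.2f} to {upper:.2f}), tau2 {tau2:.2f}")
```

Because the comparison is made within each meta-analysis before pooling, differences between meta-analyses in condition or intervention do not confound the estimated effect of sample size.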
The study sample included 93 meta-analyses (735 randomised controlled trials; web appendix 3). A median of seven trials (range 3-30) were included per meta-analysis. Trial sample size varied greatly among the meta-analyses (median trial sample size per meta-analysis ranged from 34 to 2371 patients) and within the meta-analyses (for example, trial sample size ranged from 106 to 48 835 patients in one meta-analysis).
Association of trial sample size and treatment effect
Treatment effect estimates were significantly larger in smaller trials regardless of the sample size. Compared with trials in quarter 4 (which included the largest trials), treatment effects were, on average, 32% larger in trials in quarter 1 (which included the smallest trials; ratio of odds ratios 0.68, 95% confidence interval 0.57 to 0.82), 17% larger in trials in quarter 2 (0.83, 0.75 to 0.91), and 12% larger in trials in quarter 3 (0.88, 0.82 to 0.95). Heterogeneity across meta-analyses ranged from small to moderate in the three comparisons (τ²=0.30, 0.07, and 0.02, respectively; fig 1).
Compared with trials of 1000 patients or more, treatment effects were, on average, 48% larger in trials with fewer than 50 patients (ratio of odds ratios 0.52, 0.41 to 0.66), 34% larger in trials with 50-99 patients (0.66, 0.56 to 0.79), 30% larger in trials with 100-199 patients (0.70, 0.61 to 0.80), 19% larger in trials with 200-499 patients (0.81, 0.73 to 0.88), and 10% larger in trials with 500-999 patients (0.90, 0.82 to 1.00; fig 1). Heterogeneity across meta-analyses was moderate (τ² ranged from 0.11 to 0.26).
For both analyses, ratios of odds ratios showed a significant linear trend (both P<0.001).
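The percentages quoted above appear to follow from the ratios of odds ratios as (1 − ROR) × 100, a reading inferred from the paired figures rather than stated in the text. The short check below applies that conversion to the reported point estimates; the function names and the two odds ratios in the first example are hypothetical.

```python
def ratio_of_odds_ratios(or_smaller_trials, or_larger_trials):
    """ROR below 1 means a larger (more beneficial) effect in smaller trials."""
    return or_smaller_trials / or_larger_trials


def percent_larger(ror):
    """Express an ROR as 'x% larger effect in smaller trials'."""
    return round((1 - ror) * 100)


# Hypothetical odds ratios: 0.60 in smaller trials, 0.80 in larger trials.
example_ror = ratio_of_odds_ratios(0.60, 0.80)

# Point estimates reported in the text, each v the largest trials.
reported = {
    "quarter 1": 0.68, "quarter 2": 0.83, "quarter 3": 0.88,
    "<50": 0.52, "50-99": 0.66, "100-199": 0.70,
    "200-499": 0.81, "500-999": 0.90,
}
percentages = {label: percent_larger(ror) for label, ror in reported.items()}
```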
In the two stage meta-epidemiological analyses, treatment effect estimates were, on average, 23% larger in quarter 1 trials (that is, the smallest trials) than in the other trials (ratio of odds ratios 0.77, 95% confidence interval 0.65 to 0.91), 19% larger in quarter 1 and 2 trials than in quarter 3 and 4 trials (0.81, 0.74 to 0.88), and 15% larger in quarter 1-3 trials than in quarter 4 trials (that is, the largest trials; 0.85, 0.79 to 0.90). With comparisons of fixed thresholds of sample size, treatment effect estimates were also significantly larger in smaller trials, regardless of the threshold level. The heterogeneity across meta-analyses was low for all analyses (fig 2). Results were consistent after adjustment for the following trial characteristics: domains of risk of bias, overall risk of bias, centre status, and time since publication of the first trial (web appendix 4).
In this meta-epidemiological study of 93 meta-analyses of 735 trials, we found significantly larger estimates of treatment effects in smaller trials, regardless of sample size. Treatment effect estimates differed within meta-analyses solely based on trial sample size, with, on average, stronger estimates in small to moderately sized trials than in the largest trials. The average difference was substantial and ranged from 12% to 32% when comparing estimates between quarters of sample size within meta-analyses.
Strengths and weaknesses of the study
Our results were based on a large meta-epidemiological study of 93 meta-analyses, representing various medical areas published in the leading journals of each medical speciality or in the Cochrane Database of Systematic Reviews. Cochrane reviews have generally been shown to be of higher methodological quality, to be better reported, and to have fewer conflicts of interest than non-Cochrane reviews.19 20 21 To explore the influence of sample size on treatment effect, we used several complementary approaches, which all showed consistent results. However, because our results were based on meta-analyses of trials assessing binary outcomes, they cannot be extrapolated to trials assessing continuous outcomes because such trials usually differ in medical condition, risk of bias, sample size, and statistical analysis.
Several mechanisms could help explain the association between trial sample size and treatment effects regardless of sample size. The first may be related to reporting bias. Smaller studies are more prone to publication bias,5 defined as the tendency for reports of studies with significant results to be published more often than reports of studies with non-significant results.22 A continuum of publication bias could exist to some extent: the larger the trial, the greater the probability that results are published, regardless of statistical significance. Smaller trials might also be more prone to outcome reporting bias.23 24
Another possible mechanism is the difference in methodological quality by sample size.25 Our results tended to be consistent after adjustment for domains of risk of bias, as well as overall risk of bias. Finally, it is possible that the larger the sample size, the greater the heterogeneity in selecting participants26 or implementing interventions. Future research is needed to explore the effect of these different mechanisms.
Implications for researchers
Our results have important implications for the interpretation of results of clinical trials and meta-analyses in general. The main issue of systematic reviews and meta-analyses is whether the combined treatment effect estimated by synthesising all included studies provides the best estimate of the true treatment effect, or whether the studies overestimate or underestimate the treatment effect.27 Because larger trials28 are probably more pragmatic than smaller trials, with wider eligibility criteria26 and greater variability in interventions, treatment effect estimates reported in large trials could be closer to the true treatment effect in real life. Thus, meta-analyses of all available evidence, whatever the trial sample size, might not reflect the true treatment effect.
Several authors have suggested that the results from large randomised controlled trials are inherently superior to those from smaller trials, even when the results of smaller trials are pooled in a meta-analysis,29 30 and that a substantially large trial should be conducted to definitively answer the question across a large sample of the population. Glasziou and colleagues31 also proposed relying on the results of the most precise trial if a meta-analysis was not available. Our results raise questions about whether the meta-analysis should be restricted to larger trials (or even to the “largest” trial). The downside of this approach would be imprecise estimates of the treatment effect. Rücker and colleagues recently proposed a method of limit meta-analysis, which allows for predicting treatment effects when the precision of each trial is increased to infinity.32 33 This approach is close to the regression based model described by Moreno and colleagues.34
In this large meta-epidemiological study, we found smaller trials to have significantly larger estimates of treatment effects, regardless of sample size. Effect estimates differed within meta-analyses solely based on trial sample size, with, on average, stronger effect estimates in small to moderately sized trials than in the largest trials. These stronger effects might not reflect the true treatment effect; therefore, the robustness of the conclusions of a meta-analysis should be assessed, including the influence of trial sample size on treatment effect estimates, by using sensitivity analyses (for example, subgroup analyses comparing quarters or limit meta-analysis). Reviewers and readers can easily check whether the result for the overall meta-analysis agrees with the results for the largest trials (that is, those in quarter 4 of sample size). The pooled result should be interpreted with caution when this is not the case. More generally, our results raise questions about how meta-analyses are currently performed, especially whether all available evidence should be included in meta-analyses, because doing so could lead to more beneficial results.
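The check suggested for reviewers and readers, comparing the overall pooled result with the result for the largest trials, can be sketched as follows. This is a minimal illustration assuming fixed effect inverse variance pooling of odds ratios with a 0.5 continuity correction; the function names and the trial counts are hypothetical.

```python
from math import exp, log


def pooled_or(trials):
    """Fixed effect pooled odds ratio from (events_t, n_t, events_c, n_c)."""
    numerator = denominator = 0.0
    for events_t, n_t, events_c, n_c in trials:
        a, b = events_t + 0.5, n_t - events_t + 0.5
        c, d = events_c + 0.5, n_c - events_c + 0.5
        weight = 1 / (1 / a + 1 / b + 1 / c + 1 / d)
        numerator += weight * log(a * d / (b * c))
        denominator += weight
    return exp(numerator / denominator)


def largest_quarter(trials):
    """Trials in the top quarter by total sample size (at least one)."""
    ranked = sorted(trials, key=lambda t: t[1] + t[3], reverse=True)
    return ranked[:max(1, len(ranked) // 4)]


# Hypothetical meta-analysis of eight trials (invented counts).
trials = [
    (4, 20, 10, 20), (8, 30, 14, 30), (10, 40, 18, 40), (12, 50, 20, 50),
    (30, 150, 40, 150), (60, 300, 75, 300),
    (180, 1000, 200, 1000), (270, 1500, 300, 1500),
]
overall = pooled_or(trials)
largest = pooled_or(largest_quarter(trials))
print(f"all trials: OR {overall:.2f}; quarter 4 only: OR {largest:.2f}")
```

In this invented example the pooled estimate is more favourable than the estimate from the largest trials, which is the situation in which the text advises cautious interpretation.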
What is already known on this topic
Sample size varies greatly among trials, ranging from tens of patients to thousands of patients, even within one meta-analysis answering the same question
Small study effect has been previously defined as the tendency for small trials in a meta-analysis to show larger treatment benefits
The concept of using a single threshold to distinguish small and large trials, whatever the medical area or intervention being tested, is not straightforward
What this study adds
Smaller trials had significantly larger estimates of treatment effect, regardless of sample size
Effect estimates differed within meta-analyses solely based on trial sample size, with, on average, stronger estimates seen in small to moderately sized trials than in the largest trials
The robustness of the conclusions of a meta-analysis should be assessed with careful interpretation of results if the overall result is not consistent with those of the largest trials
We thank Mickaël Randrianandrasana for data extraction from the Cochrane reports for the second collection of meta-analyses.
Contributors: AD is the study guarantor, and was involved in the study conception, selection of trials, data extraction, data analysis, interpretation of results, and drafting of the manuscript; LT was involved in the study conception, data analysis, interpretation of results, and drafting of the manuscript; IB and PR were involved in the study conception, interpretation of results, and drafting of the manuscript. All authors, external and internal, had full access to all the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.
Funding: This study was funded by an academic grant from the Programme Hospitalier de recherche Clinique Régional (AOR10017). Our team is supported by an academic grant (DEQ20101221475) for the programme “Equipe espoir de la Recherche,” from the Fondation pour la Recherche Médicale. The researchers declare that they are independent from the funders. The funders did not have any role in the study.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support from the Programme Hospitalier de recherche Clinique Régional and the Fondation pour la Recherche Médicale for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: No additional data available.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.