Credibility of claims of subgroup effects in randomised controlled trials: systematic review
BMJ 2012; 344 doi: https://doi.org/10.1136/bmj.e1553 (Published 15 March 2012) Cite this as: BMJ 2012;344:e1553
All rapid responses
Sun and colleagues have investigated the credibility of reported claims of subgroup effects in published randomised controlled trials using specified criteria (1). The authors considered the critical criteria for evaluating the credibility of an observed subgroup effect to be: use of subgroup variables measured at baseline; pre-specification of subgroup hypotheses; and statistical significance of an interaction test. The first of these is reasonable, as subgroups defined according to post-randomisation characteristics might be influenced by the tested interventions, and so the observed differences may simply be the result of bias. However, the second and third criteria lack a firm conceptual rationale to justify them as stated, and so the conclusions drawn from the authors' analysis may be flawed.
In a framework of hypothesis testing the fundamental question of interest is “given the data, what is the probability that the null hypothesis (or the alternative hypothesis) is true”. It can be shown that, in the absence of bias, this probability depends on the prior probability of the alternative hypothesis being correct (the prior), on the statistical power of the study to detect an effect, and on the P-value (2). It is this simple observation that should inform the interpretation of reported subgroup effects.
In a randomized trial the prior for the primary endpoint ought to be close to 50 percent (equipoise). Under this prior, a result declared significant at P<0.05 will have a six percent chance of being a false positive if the study power was 80 percent. However, the prior for a subgroup effect being real may be small, and is often substantially smaller than 50 percent. Under a prior of 5 percent and the same power, a result declared significant at P<0.05 will have a 54 percent chance of being a false positive.
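The two figures above can be reproduced with a short calculation. The sketch below assumes the false positive report probability formulation described by Wacholder and colleagues, in which the chance that a significant result is a false positive is alpha × (1 − prior) divided by alpha × (1 − prior) + power × prior, with alpha taken as the significance threshold of 0.05. The function name and its parameters are illustrative and do not come from the original letter.

```python
def false_positive_probability(prior, power, alpha=0.05):
    """Chance that a result significant at level `alpha` is a false positive.

    prior: prior probability that the alternative hypothesis is true
    power: probability of a significant result when the effect is real
    alpha: significance threshold (assumed to be 0.05, as in the text)
    """
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    false_positives = alpha * (1 - prior)  # significant although the null is true
    true_positives = power * prior         # significant and the effect is real
    return false_positives / (false_positives + true_positives)


# Primary endpoint under equipoise: prior 50 percent, power 80 percent
primary = false_positive_probability(prior=0.50, power=0.80)

# Subgroup effect: prior 5 percent, same power
subgroup = false_positive_probability(prior=0.05, power=0.80)

print(f"primary endpoint: {primary:.0%}")  # 6%
print(f"subgroup effect:  {subgroup:.0%}")  # 54%
```

The only input that changes between the two calls is the prior, which is what moves the result from about 6 percent to about 54 percent.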
The prior does not depend directly on either the number of hypotheses tested or on whether or not the hypothesis was pre-specified, although these are related. If one pre-specifies a hypothesis it suggests that there was some rationale for that hypothesis, and therefore the prior for that hypothesis may be higher than for a post hoc hypothesis. But this may not always apply, as one could pre-specify an extremely unlikely hypothesis, for example that being born under the star sign Libra affects response to a particular drug. On the other hand, evidence from an external source may arise during the conduct of a trial that results in a new hypothesis that is highly plausible, and so, despite being post hoc, the prior may be high. It is apparent that it is not whether or not the hypothesis was pre-specified that is important, but whether or not that hypothesis is likely to be correct.
The P-value, in itself, is an uninformative probability: it is the probability of obtaining data as or more extreme than those observed IF the null hypothesis is correct. The incorrect interpretation of the P-value as the probability of the observed data occurring by chance has been responsible for serious misinterpretation of many of the findings from observational and clinical epidemiology, including the interpretation of subgroup analyses in randomized controlled trials. A similar error arises from interpreting a 95 percent confidence interval as the range over which we are 95 percent certain that the true value will lie. As shown above, when the prior is low, even a result declared significant at P<0.05 may be more likely to be a false positive than a true positive. The P-value can only be interpreted rationally when the prior is taken into account.
Statistical power also needs to be taken into account. A statistically significant result arising from a small study with low power is more likely to be a false positive than a result with the same P-value from a large study. This is particularly relevant for subgroup analyses, in which the sample sizes may be considerably smaller than in the main study.
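The effect of power can be illustrated with the same kind of calculation. This is a minimal sketch, assuming a significance threshold of 0.05 and a subgroup prior of 5 percent; the power values of 80, 50 and 20 percent are illustrative assumptions and are not taken from the letter, and the function name is hypothetical.

```python
def false_positive_probability(prior, power, alpha=0.05):
    """Chance that a result significant at level `alpha` is a false positive."""
    false_positives = alpha * (1 - prior)  # significant although the null is true
    true_positives = power * prior         # significant and the effect is real
    return false_positives / (false_positives + true_positives)


SUBGROUP_PRIOR = 0.05          # assumed prior for a subgroup effect
POWERS = [0.80, 0.50, 0.20]    # illustrative power values, high to low

# Map each power value to the resulting false positive probability
by_power = {
    power: false_positive_probability(SUBGROUP_PRIOR, power)
    for power in POWERS
}

for power, probability in by_power.items():
    print(f"power {power:.0%}: false positive probability {probability:.1%}")
```

With the prior and threshold held fixed, the false positive probability rises as power falls, which is the situation a small subgroup within a larger trial tends to be in.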
If they are to be useful and not just a tick-box exercise, any criteria for evaluating reported subgroup effects should explicitly state how those criteria might affect the prior. Only then can the subgroup effect be interpreted.
1. Sun X, Briel M, Busse JW, You JJ, Akl EA, Mejza F, Bala MM, Bassler D, Mertz D, Diaz-Granados N, Vandvik PO, Malaga G, Srinathan SK, Dahm P, Johnston BC, Alonso-Coello P, Hassouneh B, Walter SD, Heels-Ansdell D, Bhatnagar N, Altman DG, Guyatt GH. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ 2012;344:e1553.
2. Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J. Natl. Cancer Inst. 2004;96(6):434-42.
Competing interests: No competing interests
Sun et al.(1) studied subgroup analyses in published randomised controlled trials. One of their results was that essential statistical findings such as interaction tests were often not reported. The outcomes of subgroup analyses therefore need to be interpreted with caution, as was also emphasised by Oxman(2) in the related editorial. However, subgroup analyses are crucial for enhancing our knowledge and this was not discussed by the authors.
Only average results are reported in randomised controlled trials and it follows from this that a treatment effect in a randomised controlled trial only implies that there was a subgroup in which the treatment has been successful(3). Trial participants are often recruited from a very heterogeneous population. It could well be that treatment did not work or even had an adverse effect in a number of participating patients. The problem is that one has to identify the non-responders, consider adjusting the theory and plan new studies. Cartwright and Munro(4) argued that theoretical and practical knowledge related to the particular study area was crucial for doing this.
Sun et al.(1) rightly mentioned that treatment should not be withheld from patients only because statistical analysis has shown that it was not effective in their subgroup. However, it is important to report subgroup findings and plan new studies to test further claims.
1. Sun X, Briel M, Busse JW, You JJ, Akl EA, Mejza F, et al. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ. 2012;344:e1553.
2. Oxman AD. Subgroup analyses: The devil is in the interpretation. BMJ. 2012;2022(March):in press.
3. Cartwright N. Are RCTs the Gold Standard? BioSocieties. 2007;2:11–20.
4. Cartwright N, Munro E. The limitations of randomized controlled trials in predicting effectiveness. Journal of Evaluation in Clinical Practice. 2010;16:260–6.
Competing interests: No competing interests