The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials
BMJ 1998; 317 doi: http://dx.doi.org/10.1136/bmj.317.7167.1185 (Published 31 October 1998) Cite this as: BMJ 1998;317:1185
- Department of Nephrology, Charité, Berlin, Germany
- Health Services Research Unit, National Institute of Public Health, Oslo, Norway
- Correspondence to: Dr Oxman
Objective To summarise comparisons of randomised clinical trials and non-randomised clinical trials, trials with adequately concealed random allocation versus inadequately concealed random allocation, and high quality trials versus low quality trials where the effect of randomisation could not be separated from the effects of other methodological manoeuvres.
Design Systematic review.
Selection criteria Cohorts or meta-analyses of clinical trials that included an empirical assessment of the relation between randomisation and estimates of effect.
Data sources Cochrane Review Methodology Database, Medline, SciSearch, bibliographies, hand searching of journals, personal communication with methodologists, and the reference lists of relevant articles.
Main outcome measures Relation between randomisation and estimates of effect.
Results Eleven studies that compared randomised controlled trials with non-randomised controlled trials (eight for evaluations of the same intervention and three across different interventions), two studies that compared trials with adequately concealed random allocation and inadequately concealed random allocation, and five studies that assessed the relation between quality scores and estimates of treatment effects were identified. Failure to use random allocation and concealment of allocation were associated with relative increases in estimates of effects of 150% or more, relative decreases of up to 90%, inversion of the estimated effect and, in some cases, no difference. On average, failure to use randomisation or adequate concealment of allocation resulted in larger estimates of effect due to a poorer prognosis in non-randomly selected control groups compared with randomly selected control groups.
Conclusions Failure to use adequately concealed random allocation can distort the apparent effects of care in either direction, causing the effects to seem either larger or smaller than they really are. The size of these distortions can be as large as or larger than the size of the effects that are to be detected.
Empirical studies support using random allocation in clinical trials and ensuring that the allocation process is concealed—that is, that assignment is impervious to any influence by the people making the allocation
The effect of not using concealed random allocation can be as large as or larger than the effects of worthwhile interventions
On average, failure to use concealed random allocation results in overestimates of effect due to a poorer prognosis in non-randomly selected control groups compared with randomly selected control groups, but it can result in underestimates of effect, reverse the direction of effect, mask an effect, or give similar estimates of effect
The adequacy of allocation concealment may be a more sensitive measure of bias in clinical trials than scales used to assess the quality of clinical trials
It is a paradox that the unpredictability of randomisation is the best protection against the unpredictability of the extent and direction of bias in clinical trials that are not properly randomised
Observational evidence is clearly better than opinion, but it is thoroughly unsatisfactory. All research on the effectiveness of therapy was in this unfortunate state until the early 1950s. The only exceptions were the drugs whose effect on immediate mortality were so obvious that no trials were necessary, such as insulin, sulphonamide, and penicillin.1
“The basic idea, like most good things, is very simple.”1 Randomisation is the only means of controlling for unknown and unmeasured differences between comparison groups as well as those that are known and measured. Random assignment removes the potential of bias in the assignment of patients to one intervention or another by introducing unpredictability. When alternation or any other preset plan (such as time of admission) is used, it is possible to arrange to enter a patient into a study at an opportune moment. With randomisation, however, each patient's treatment is assigned according to the play of chance. It is a paradox that unpredictability is introduced into the design of clinical trials by using random allocation to protect against the unpredictability of the extent of bias in the results of non-randomised clinical trials.
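The contrast between a preset plan and the play of chance can be sketched in a few lines of Python. This is only an illustration, not part of the review's methods: the function names, the two arm labels, and the simple "guess that the next assignment differs from the last" strategy are all assumptions introduced here.

```python
import random


def alternation_schedule(n_patients):
    """Preset plan (hypothetical): treatment, control, treatment, control, ..."""
    return ["treatment" if i % 2 == 0 else "control" for i in range(n_patients)]


def random_schedule(n_patients, rng):
    """Each assignment is drawn independently with probability 0.5 per arm."""
    return [rng.choice(["treatment", "control"]) for _ in range(n_patients)]


def share_guessed_correctly(schedule):
    """Share of assignments an observer predicts correctly by always
    guessing that the next assignment differs from the previous one."""
    pairs = list(zip(schedule, schedule[1:]))
    hits = sum(1 for previous, current in pairs if current != previous)
    return hits / len(pairs)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the illustration is repeatable
    print("alternation:", share_guessed_correctly(alternation_schedule(1000)))
    print("random:     ", share_guessed_correctly(random_schedule(1000, rng)))
```

Under alternation every upcoming assignment can be foreseen, which is what makes it possible to enter a patient at an opportune moment; under random allocation the same guessing strategy does little better than a coin toss.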
Despite this simple logic, and many examples of harm being done because of delays in conducting randomised trials, there are limitations to the use of randomised trials, both real and imagined, and scepticism about the value of randomisation.2–5 We believe this scepticism is healthy. It is important to question assumptions about research methods, and to test these assumptions empirically, just as it is important to test assumptions about the effects of health care. In this paper we have attempted systematically to summarise empirical studies of the relation between randomisation and estimates of effect.
We included four types of comparisons in our review: randomised clinical trials versus non-randomised clinical trials of the same intervention, randomised clinical trials versus non-randomised clinical trials across different interventions, adequately concealed random allocation versus inadequately concealed random allocation in trials, and high quality trials versus low quality trials in which the specific effect of randomisation or allocation concealment could not be separated from the effect of other methodological manoeuvres such as double blinding. Both descriptive and analytical assessments of the relation between the use of random allocation and estimates of effect are included, based on cohorts or meta-analyses of clinical trials.
We identified studies from the Cochrane Review Methodology Database,6 other methodological bibliographies, Medline, and SciSearch, and by hand searching journals, personal communication with methodologists, and checking the reference lists of relevant articles. These searches were conducted up to July 1998. Potentially relevant citations were retrieved and assessed for inclusion independently by both authors. Disagreements were resolved by discussion.
We used the following criteria to appraise the methodological quality of included studies: Were explicit criteria used to select the trials? Did two or more investigators agree regarding the selection of trials? Was there a consecutive or complete sample of clinical trials? Did the study control for other methodological differences such as double blinding and complete follow up? Did the study control for clinical differences in the participants and interventions in the included trials? Were similar outcome measures used in the included trials? The overall quality of each study was summarised as: no important flaws, possibly important flaws, or major flaws.
For each study one of us (RK) extracted information about the sample of clinical trials, the comparison that was made, the type of analysis undertaken, and the results, and the other checked the extracted data against the published article. The reported relation between randomisation and estimates of effect was recorded and, if possible, converted to the relative overestimation or underestimation of the relative risk reduction. We prepared tables for each type of comparison to facilitate a qualitative analysis of the extent to which the included studies yielded similar results, and heterogeneity in the included studies was explored both within and across comparisons.
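One plausible reading of this conversion is sketched below in Python. The formula is an assumption made for illustration (the text does not spell it out): the relative risk reduction is taken as 1 minus the relative risk, and the deviation is the difference between the non-randomised and randomised relative risk reductions, expressed relative to the randomised one. The function names and example values are hypothetical.

```python
def relative_risk_reduction(relative_risk):
    """Relative risk reduction, taken here as 1 - relative risk."""
    return 1.0 - relative_risk


def relative_deviation(rr_randomised, rr_non_randomised):
    """Deviation of the non-randomised estimate from the randomised one,
    relative to the randomised relative risk reduction (assumed formula).

    Positive values indicate overestimation, negative values indicate
    underestimation, and values below -1 indicate an inverted effect.
    """
    reference = relative_risk_reduction(rr_randomised)
    if reference == 0:
        raise ValueError("randomised trials show no effect; ratio undefined")
    comparison = relative_risk_reduction(rr_non_randomised)
    return (comparison - reference) / reference


if __name__ == "__main__":
    # Hypothetical relative risks, not taken from any included study.
    print(f"{relative_deviation(0.8, 0.6):+.0%}")   # larger effect without randomisation
    print(f"{relative_deviation(0.5, 0.75):+.0%}")  # smaller effect without randomisation
```

Expressing every comparison on this common relative scale is what allows overestimates and underestimates from different studies to be set side by side in the tables.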
In summarising the results we have assumed that evidence from randomised trials is the reference standard to which estimates from non-randomised trials are compared. However, as with other gold standards, randomised trials are not without flaws, and this assumption is not intended to imply that the true effect is known, or that estimates derived from randomised trials are always closer to the truth than estimates from non-randomised trials.
We have identified 18 cohorts or meta-analyses that met our inclusion criteria, totalling 1211 clinical trials.7–24 Efforts to develop an efficient electronic search strategy using Medline have thus far not been successful due to poor indexing. Searches for studies that cited Colditz and colleagues,15 Miller and colleagues,16 Chalmers and colleagues,18 or Schulz and colleagues19 using SciSearch yielded seven additional studies. Searches using SciSearch for studies that cited the other studies meeting our inclusion criteria did not yield any other additional studies. Exploratory hand searching of three methodological journals (Controlled Clinical Trials, Statistics in Medicine, and the Journal of Clinical Epidemiology) for four years (1970, 1980, 1990, and 1995) yielded a single relevant study published in 1990. The 18 included studies were published in 14 different journals. The majority of studies were identified through personal communication with methodologists and through bibliographies and reference lists.
Randomised trials versus non-randomised trials of the same intervention
Table 1 summarises the eight studies comparing randomised clinical trials and non-randomised clinical trials of the same intervention. In five of the eight studies, estimates of effect were larger in non-randomised trials. Outcomes in the randomised treatment groups and non-randomised treatment groups were frequently similar, but worse outcomes among historical controls spuriously increased the estimated treatment effects. One study found comparable results for both allocation procedures, and two studies reported smaller treatment effects in non-randomised studies. In one study the smaller estimate of effect was due to a poorer prognosis for patients in the non-randomised treatment groups. The deviation of the estimates of effect for non-randomised trials compared with randomised trials ranged from an underestimation of effect of 76% to an overestimation of effect of 160%.
Randomised trials versus non-randomised trials across different interventions
The evidence from comparisons across different interventions and various study designs (randomised controlled trials and non-randomised controlled trials, crossover designs, and observational studies) is less clear (table 2). In all three studies several study designs and clinical conditions were combined and their diverse outcomes converted to a standardised effect size. There was substantial clinical heterogeneity, and there were many other factors that could distort or mask a possible association between randomisation and estimates of effect. No consistent relation between study design or quality and the magnitude of the estimates of effect was detected.
Adequately concealed allocation versus inadequately concealed allocation
Concealed random allocation to treatment (that is, blinding of the randomisation schedule to prevent subversion by the investigators or trial participants) should ensure protection against biased allocation. Chalmers and colleagues found that within randomised controlled trials failure adequately to conceal allocation was associated with larger imbalances in prognostic factors and larger treatment effects (table 3).18 They reported a more than sevenfold overestimation of the treatment effect in trials with inadequately concealed allocation. They did not, however, control for other methodological factors in their descriptive analysis.18 Schulz and colleagues conducted a multivariate analysis that controlled for blinding and completeness of follow up, which yielded similar results.19 They found that inadequately concealed random allocation (for example, alternation) compared with adequately concealed random allocation (for example, assignment by a central office) resulted in estimates of effect (odds ratios) that were on average 40% larger.
High quality trials versus low quality trials
Considerable differences in the observed treatment effect were detected when the results of high quality studies were compared with those of low quality studies in the context of systematic reviews of specific health care (table 4). In these studies the estimates of effect were distorted in both directions and even caused the alarming situation of a harmful intervention associated with a reduction in pregnancies (odds ratio 0.5, on the basis of high quality studies) seeming beneficial in low quality studies (odds ratio 2.6, on the basis of low quality studies). In two meta-analyses, low quality studies consistently underestimated the beneficial effect of the intervention being evaluated by 27% to 100%, and an effective treatment could have been discarded based on the results of low quality studies.
The methodological quality of the studies included in this review varied. Four studies met all of our criteria.19 21–23 Three of these assessed the impact of bias on the effect of a specific healthcare intervention as part of a systematic review, and the analysis was performed as part of a subgroup analysis to test the robustness of the overall finding.21–23 The other 14 studies had one or more methodological flaws including not controlling for other methodological manoeuvres 16182227 or clinical differences.7 13–17 20 24
It has proved difficult to develop efficient search strategies for locating empirical methodological studies such as the ones included in this review. Although we believe it is unlikely that there are many published methodological studies such as the ones by Sacks and colleagues,8 Schulz and colleagues,19 Chalmers and colleagues,18 and Emerson and colleagues20 that we have not identified, there may be unpublished or ongoing studies like these that we have not identified, and it is likely that there are many meta-analyses that meet the inclusion criteria for this review that we have not identified. The Cochrane Library contains 428 completed reviews and 397 protocols, and there are over 1700 entries in the database of abstracts of reviews of effectiveness.26 We have not systematically gone through all of these meta-analyses. An expanded version of this review will be published in the Cochrane Library and kept up to date through the Cochrane Empirical Methodological Studies Methods Group.27 Additional studies will be added to the review, and any errors that are identified will be corrected.
We have not included comparisons between randomised controlled trials and cohort studies,28 case-control studies,29 30 or evaluations of effectiveness using large healthcare administrative databases,3 although some of the studies in this review included observational studies. Observational studies often provide valuable information that is complementary to the results of clinical trials. For example, case-control studies may be the best available study design for evaluating rare adverse effects, and large database studies may provide important information about the extent to which effects that are expected based on randomised clinical trials are achieved in routine practice. However, it is important to remember that it is only possible to control for confounders that are known and measured in observational studies, and we should be wary of hubris and its consequences in assuming that we know all there is to know about any disease.
As with any review, the quality of the data is limited by the quality of the studies that we have reviewed. Most of the studies included in the review had one or more methodological flaws. In many of the included comparisons, particularly those between randomised controlled trials and historically controlled trials, methodological differences other than randomisation may account for some of the observed differences in estimates of effect.7–9 13 18
Four of the studies met all of our criteria for assessing methodological quality,19 21–23 and one study in particular provided strong support for the conclusion that clinical trials that lack adequately concealed random allocation produce estimates of effect that are on average 40% larger than clinical trials with adequately concealed random allocation, but that the degree and the direction of this bias vary widely.19 This study also shows the potential contribution that systematic reviews, and notably the Cochrane Database of Systematic Reviews, can make towards developing an empirical basis for methodological decisions in evaluations of health care. Currently this empirical basis is lacking, and many methodological debates rely more on logic or rhetoric than evidence. Analyses such as the one undertaken by Schulz and colleagues, in which methodological comparisons are made among trials of the same intervention, are likely to yield more reliable results than comparisons that are made across different interventions which, not surprisingly, tend to be inconclusive.15–17
We have assumed that, in general, differences between randomised trials and non-randomised trials or between trials with adequately concealed random allocation and inadequately concealed random allocation are best explained by bias in the non-randomised controlled trials and inadequately concealed trials. This assumption is supported by findings of large imbalances in prognostic factors as well. However, it is possible that randomised controlled trials can sometimes underestimate the effectiveness of an intervention in routine practice by forcing healthcare professionals and patients to acknowledge their uncertainty and thereby reduce the strength of placebo effects.4 25 31 It is also possible that publication bias can partly explain some of the differences in results observed in studies such as the one by Sacks and colleagues.8 This would be the case if randomised trials were more likely than historically controlled trials to be published regardless of the effect size. However, we are not aware of any evidence that supports this hypothesis, and the available evidence shows consistently that randomised trials, like other research, are also more likely to be published if they have results that are considered significant.32–35
Several explanations for discrepancies between estimates of effect derived from randomised trials and non-randomised trials are possible. For example, it can be argued that estimates of effect might be larger in randomised trials if the care provided in the context of trials is better than that in routine practice, assuming this is the case for the treatment group and not the control group. Similarly, strict eligibility criteria might select people with a higher capacity to benefit from a treatment, resulting in larger estimates of effect in randomised trials than non-randomised trials with less strict eligibility criteria. If, for some reason, patients with a poor prognosis were more likely to be allocated to the treatment group in non-randomised trials then this would also result in larger estimates of effect in randomised trials. Conversely, if patients with a poor prognosis were more likely to be allocated to the control group in non-randomised trials, as often seems to be the case based on the results of this review, this would result in larger estimates of effect in the non-randomised trials.
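The last of these mechanisms can be illustrated with a small simulation, given here as a minimal Python sketch. All numbers are hypothetical and chosen only for illustration: half of the patients have a poor prognosis (event risk 0.6 versus 0.2), the treatment has no true effect, and under subverted allocation a patient with a poor prognosis ends up in the control group with probability 0.7 rather than 0.5.

```python
import random


def apparent_relative_risk(n_patients, subverted, seed=0):
    """Simulate one trial of a treatment with NO true effect and return the
    observed relative risk (treatment versus control).

    All probabilities below are hypothetical values for illustration.
    """
    rng = random.Random(seed)
    events = {"treatment": 0, "control": 0}
    totals = {"treatment": 0, "control": 0}
    for _ in range(n_patients):
        poor_prognosis = rng.random() < 0.5
        if subverted:
            # Whoever enrols patients can steer poor-prognosis patients
            # towards the control group.
            p_control = 0.7 if poor_prognosis else 0.3
        else:
            # Concealed random allocation: prognosis cannot influence the arm.
            p_control = 0.5
        arm = "control" if rng.random() < p_control else "treatment"
        # Outcome depends on prognosis only, never on the arm.
        risk = 0.6 if poor_prognosis else 0.2
        totals[arm] += 1
        if rng.random() < risk:
            events[arm] += 1
    risk_treatment = events["treatment"] / totals["treatment"]
    risk_control = events["control"] / totals["control"]
    return risk_treatment / risk_control


if __name__ == "__main__":
    print("subverted allocation:", round(apparent_relative_risk(20000, True), 2))
    print("random allocation:   ", round(apparent_relative_risk(20000, False), 2))
```

With these assumed values the subverted trial shows a relative risk well below 1, an apparent benefit produced entirely by the poorer prognosis of the control group, whereas the randomly allocated trial gives a relative risk close to 1. Reversing the direction of steering would produce an apparent harm instead, which is why the direction of the bias cannot be predicted in advance.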
Overall, this review supports using random allocation in clinical trials and ensuring that the randomisation schedule is adequately concealed. The effect of not using random allocation with adequate concealment can be as large as or larger than the effects of worthwhile interventions. On average, non-randomised trials and randomised trials with inadequately concealed allocation result in overestimates of effect. This bias, however, can go in either direction, can reverse the direction of effect, or can mask an effect.
For those undertaking clinical trials this review provides support for using randomisation to assemble comparison groups.25 For those undertaking systematic reviews of clinical trials, this review provides support for considering sensitivity analyses based on the adequacy of allocation concealment in addition to or instead of on the basis of overall quality scores, which may be less sensitive measures of bias.
As Cochrane stated: “The [randomised controlled trial] is a very beautiful technique, of wide applicability, but as with everything else there are snags.”1 Those making decisions on the basis of clinical trials need to be cautious of small trials (even when they are properly randomised) and systematic reviews of small trials both because of chance effects and the risk of biased reporting.36 37 It is also possible to introduce bias into a trial despite allocation concealment.19 38 Finally, even when the risk of error due to either bias or chance is small, judgments must be made about the applicability of the results to individual patients39 40 and about the relative value of the probable benefits, harms, and costs.41 42
We thank Alex Jadad, Steve Halpern, and David Cowan for help in locating studies, Dave Sackett and Iain Chalmers for encouragement and advice, Mike Clarke for reviewing the manuscript, Annie Britton and other colleagues for provision of their bibliographies on research methodology, and the investigators who conducted the studies we reviewed.
Contributors: RK and ADO contributed to the preparation of the protocol and the final manuscript and assessed the relevance and methodological quality of retrieved reports. RK prepared the first drafts of the protocol and the paper, undertook the majority of the searches with help from David Cowan, Steve Halpern, Alex Jadad, and collected data from the included studies. ADO checked the collected data against the original reports. Both authors will act as guarantors for the paper.
Funding Norwegian Ministry of Health and Social Affairs
Competing interests None declared.