Any casualties in the clash of randomised and observational evidence?

BMJ 2001;322:879 (Published 14 April 2001)

No—recent comparisons have studied selected questions, but we do need more data

  1. John P A Ioannidis, associate professor and chairman,
  2. Anna-Bettina Haidich, research fellow,
  3. Joseph Lau, professor
  1. Clinical Trials and Evidence-Based Medicine Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina 45110, Greece
  2. Division of Clinical Care Research, Department of Medicine, New England Medical Center, Tufts University School of Medicine, Boston, MA 02111, USA

    Randomised controlled trials and observational studies are often seen as mutually exclusive, if not opposing, methods of clinical research. Two recent reports, however, identified clinical questions (19 in one report,1 five in the other2) where both randomised trials and observational methods had been used to evaluate the same question, and performed a head to head comparison of the designs. Contrary to the belief that randomised controlled trials give more reliable estimates of treatment effects, both reports found that observational studies did not overestimate the size of the treatment effect compared with their randomised counterparts. The authors conclude that the merits of well designed observational studies may need to be re-evaluated: case-control and cohort studies may need to command more respect in assessing medical therapies, and large scale observational databases should be better exploited.1 2 The first claim flies in the face of half a century of thinking, so are these authors right?

    The combined results from the two reports indeed show a striking concordance between the estimates obtained with the two research designs. In a correlation analysis we performed on their combined databases, the correlation coefficient between the odds ratios of randomised trials and the odds ratios of observational designs was 0.84 (P<0.001). This represents excellent concordance (figure). In fact, it is better than that observed when the results of small randomised trials and their meta-analyses were compared with the results of large randomised trials.3 To complicate matters, the concordance has been worse when the results of specific large randomised trials on the same topic were compared among themselves.3 Concato et al further observe that, for the five clinical questions they evaluated, the observational studies on each question had very similar odds ratios among themselves,2 whereas the results of the randomised trials were often highly heterogeneous. Popular wisdom has it that a "gold standard" method should give more or less the same results when repeated several times, while a poor method would suffer from lots of variability. So should observational studies be the gold standard instead of randomised trials?


    Comparison of pooled odds ratio from observational studies against pooled odds ratio from randomised controlled trials on the same question. The 25 questions with binary outcomes are derived from Benson and Hartz1 and Concato et al2; one question had three comparisons, and another topic is not included since the outcome was not dichotomous
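The correlation analysis described above can be sketched in a few lines. The paired odds ratios below are hypothetical placeholders (the actual values come from the two reports), and we assume, as is conventional for ratio measures, that the comparison is made on the log scale:

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired pooled odds ratios for the same clinical questions
# (illustrative only, not the data of Benson and Hartz or Concato et al).
or_rct = [0.55, 0.70, 1.10, 0.85, 0.40]   # randomised trials
or_obs = [0.60, 0.75, 1.00, 0.90, 0.45]   # observational studies

# Ratio measures are conventionally analysed on the log scale.
log_rct = [math.log(x) for x in or_rct]
log_obs = [math.log(x) for x in or_obs]
r = pearson(log_rct, log_obs)
```

With genuinely concordant pairs such as these, `r` comes out close to 1; the 0.84 reported above was computed on the 25 real comparisons.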

    Such a thought would be anathema to most clinical trialists.4 A closer inspection of the data suggests several caveats. Firstly, in six of the 25 comparisons the 95% confidence interval of the summary effect from the observational studies does not include the summary point estimate of the randomised trials. Moreover, in three cases the pooled point estimates are in opposite directions (one suggests harm, the other benefit); in two more cases one pooled odds ratio is exactly 1.00 while the other documents benefit. So perhaps the concordance is not all that perfect, depending on how one looks at it.

    Secondly, variability may be a blessing and not a nuisance. Variable results in randomised trials suggest that these trials have indeed managed to study diverse patient populations and treatment circumstances where the efficacy of a treatment may differ.5 Observational studies may tend to amalgamate large populations and reach average population-wide effects where there is less variability but where it is also more difficult to discern which patients are likely to benefit from an intervention.

    Perhaps more importantly, Benson and Hartz1 and Concato et al2 are still dealing with only a very small portion of randomised and observational research. Their sampling failed to capture some prodigious discrepancies between the two methods. Interventions such as β carotene and α tocopherol, which have brought fame to observational epidemiologists, crashed when they were tested in rigorous randomised controlled trials. 6 7 Given the hundreds of thousands of trials and observational studies that have been conducted and are still being conducted, the number of topics studied in the two reports is limited and subject to strong selection biases.

    Perhaps the most important bias is that both designs are concurrently used for only very selected clinical questions, and investigators are willing to compare the designs in an even smaller minority of these. In a continuing effort to compare the merits of the two designs, we have found, among over 2000 meta-analyses performed in the past 25 years, about 50 topics where both randomised and observational evidence were considered in the same meta-analysis. Despite some overlap, the two types of design are used in largely different settings.

    For interventions that show very large harmful effects in observational studies, randomised trials may be justifiably discouraged and never performed. For interventions that have already shown large beneficial treatment effects in observational studies (risk ratios less than 0.40) the ethics of randomisation may also be questioned. Interventions with modest postulated effects (risk ratios in the range 0.40-0.90) are likely to be targeted by randomised trials; in this setting, observational studies may not be given comparable credit and may be unjustifiably discarded once randomised trials have been performed. Finally, for interventions with very small postulated effects (risk ratios 0.90-1.00) adequately powered randomised trials may be difficult to perform, given the sample size requirements, and thus only observational evidence may be generated.
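The selection forces described above amount to a simple banding by the postulated effect size. A minimal illustrative sketch (the function name and the wording of the labels are ours; the thresholds are those given in the text):

```python
def likely_evidence(risk_ratio):
    # Which design tends to dominate, by postulated risk ratio
    # (bands taken from the text; labels are our paraphrase).
    if risk_ratio > 1.0:
        return "large harm: randomised trials may justifiably never be performed"
    if risk_ratio < 0.40:
        return "large benefit: the ethics of randomisation may be questioned"
    if risk_ratio <= 0.90:
        return "modest effect: the typical target of randomised trials"
    return "very small effect: often only observational evidence is feasible"
```

For example, `likely_evidence(0.60)` falls in the modest band where randomised trials dominate, while `likely_evidence(0.95)` falls in the band where adequately powered trials are hard to mount.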

    Besides the size of the postulated treatment effect, another important selection force is the frequency of the outcome of interest. Rare yet important outcomes are unlikely to be studied in trials, given the extreme requirements of sample size and follow up. In contrast, when the outcomes of interest are common, trials are convenient.
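To see why rare outcomes and small effects push research towards observational designs, consider the textbook normal-approximation sample size formula for comparing two proportions (a standard formula, not one used in the two reports; 5% two-sided alpha and 80% power assumed):

```python
from math import ceil

def n_per_arm(p1, p2, z_alpha=1.96, z_beta=0.84):
    # Approximate sample size per arm to detect a difference between
    # event rates p1 and p2 (two-sided alpha = 0.05, power = 0.80).
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# A common outcome with a large effect needs a modest trial...
common_large = n_per_arm(0.50, 0.30)
# ...but a rare outcome with a small relative effect needs an enormous one.
rare_small = n_per_arm(0.010, 0.009)
```

The first scenario needs on the order of a hundred patients per arm; the second, well over a hundred thousand, which is why such questions are usually left to observational evidence.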

    More empirical evidence is needed on the merits of various research designs. We need more quantitative evidence to understand what exactly each design can tell us and how often, and why, each design may go wrong. Discarding observational evidence when randomised trials are available is a missed opportunity. Conversely, abandoning plans for randomised trials in favour of quick and dirty observational designs is poor science. The careful comparisons of methods performed by Benson and Hartz1 and Concato et al2 can enhance our understanding of their relative merits, and we should encourage such comparisons whenever the use of various clinical research designs is ethically appropriate.


    ABH is supported by a grant from the General Secretariat of Research and Technology, Greece, and the European Union.

