Editorials

Bayesian statistical methods

BMJ 1996; 313 doi: https://doi.org/10.1136/bmj.313.7057.569 (Published 07 September 1996) Cite this as: BMJ 1996;313:569
Laurence Freedman
Acting chief, Biometry Branch, Division of Cancer Prevention and Control, National Cancer Institute, Bethesda, MD 20892, USA

    A natural way to assess clinical evidence

    In this week's BMJ, Lilford and Braunholtz (p 603) explain the basis of Bayesian statistical theory.1 They explore its use in evaluating evidence from medical research and incorporating such evidence into policy decisions about public health. When drawing inferences from statistical data, Bayesian theory is an alternative to the frequentist theory that has predominated in medical research over the past half century.

    As explained by Lilford and Braunholtz, the main difference between the two theories is the way they deal with probability. Consider a clinical trial comparing treatments A and B. Frequentist analysis may conclude that treatment A is superior because there is a low probability that such an extreme difference would have been observed when the treatments were in fact equivalent. Bayesian analysis begins with the observed difference and then asks how likely it is that treatment A is in fact superior to B. In other words, frequentists deduce the probability of observing an outcome given the true underlying state (in this case no difference between treatments), while Bayesians induce the probability of the existence of the true but as yet unknown underlying state (in this case, A is superior to B) given the data.

    The difference is quite profound, and, although the conclusions reached by applying the two methods may be qualitatively the same, the mode of expressing those conclusions will always be different. For example, a frequentist may conclude that the difference between treatments A and B is highly significant (P = 0.002), meaning that the chance of observing such an extreme difference when A and B are in fact equivalent is about 2 in 1000. Faced with the same data, a Bayesian may conclude that the probability that treatment A is superior to B is 0.999 (or some other number very close to 1). Both statements lead to the same conclusion, that there is overwhelming evidence of treatment A's superiority. However, in more complex situations, as illustrated by Lilford and Braunholtz,1 the conclusions will not necessarily coincide.
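The pairing of P = 0.002 with a probability of 0.999 can be reproduced under one simple set of assumptions, which are illustrative and not taken from the editorial: the estimated treatment difference is normally distributed with a known standard error, and the Bayesian starts from a flat (non-informative) prior. In that special case the posterior probability that A is superior is one minus the one sided P value. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p_value(z):
    """Frequentist: chance of a difference at least this extreme if A and B are equivalent."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def posterior_prob_a_superior(z):
    """Bayesian, ASSUMING a flat prior: probability that the true difference favours A."""
    return STANDARD_NORMAL.cdf(z)


# z is the observed difference divided by its standard error. The value is
# hypothetical, chosen so that the two sided P value comes out at 0.002.
z = STANDARD_NORMAL.inv_cdf(1 - 0.001)

print(f"P value: {two_sided_p_value(z):.3f}")
print(f"Pr(A superior | data): {posterior_prob_a_superior(z):.3f}")
```

The two numbers answer different questions even though they are computed from the same test statistic. With an informative prior the simple relation between them no longer holds.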

    Doctors may now be comfortable with stating the conclusions of a study in terms of P values. However, most people find Bayesian probability much more akin to their own thought processes. Indeed, many clinicians mistake P values for statements of Bayesian probability. A favourite multiple choice question for medical examinees has the root “P<0.05 means that” followed by a choice of options including “treatment A has less than a 5% chance of being superior to treatment B,” which is incorrectly chosen by many unwary candidates.

    If Bayesian inference is more natural, why has it not been used in preference to frequentism? The answer lies in the need in Bayesian analysis to establish prior probability. To obtain a Bayesian probability that treatment A is superior to B after a trial has been conducted, it is necessary to specify a prior probability—for example, the probability of A's superiority based on evidence available before the trial.
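A minimal sketch of why the prior matters, using Bayes' theorem in its odds form. The likelihood ratio and the three prior probabilities are invented for illustration, and the function name is hypothetical.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds = prior odds x likelihood ratio."""
    if not 0 < prior_probability < 1:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical trial result: the data are 9 times more likely if A is
# superior than if it is not.
likelihood_ratio = 9.0
for prior in (0.1, 0.5, 0.9):
    posterior = posterior_probability(prior, likelihood_ratio)
    print(f"prior {prior:.1f} -> posterior {posterior:.3f}")
```

The same trial result moves a sceptic (prior 0.1) only to even odds, while it leaves an enthusiast (prior 0.9) almost certain. This is the dependence on prior specification that the frequentist objection targets.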

    Frequentists argue that this specification introduces an element of arbitrariness and subjectivity into an otherwise objective procedure. Bayesians offer two counterarguments. First, research experiments are rarely conducted in the absence of external information, and if such information exists it should be quantified and included in the concluding inference. Second, frequentism itself has elements of subjectivity, most notably in the choice of P<0.05 as a criterion for statistical significance.

    While this debate continues in relation to some aspects of medical research, in others the use of Bayesian methods has penetrated deeply. One of the earliest and most natural applications is to diagnostic medicine. The formal calculation of an individual's probability of having one of a list of possible diseases dovetails with the diagnostic paradigm, in which the doctor conducts a series of tests to try to find the one disease for which the probability is high enough to justify the diagnosis. In this context the prior probability will reflect the relative frequency of the possible diseases at the same or similar hospitals in recent years.2 3 Bayesian methods have also been used for monitoring patients who are at risk of relapse or treatment failure.4 Bayesian methods have become the primary tool for expert systems that acknowledge uncertainty5 and are even becoming integrated with neural networks.6
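The diagnostic calculation described above can be sketched as follows. Every disease name and number is invented for illustration, and the sketch assumes that test results are conditionally independent given the disease, which real tests often are not.

```python
def update_disease_probabilities(prior, likelihood):
    """Return Pr(disease | test result) for each candidate disease.

    prior: Pr(disease) before the test, e.g. relative frequencies at similar hospitals.
    likelihood: Pr(observed test result | disease).
    """
    unnormalised = {d: prior[d] * likelihood[d] for d in prior}
    total = sum(unnormalised.values())
    return {d: value / total for d, value in unnormalised.items()}


# All numbers below are invented for illustration only.
prior = {"disease X": 0.70, "disease Y": 0.25, "disease Z": 0.05}
test_results = [
    {"disease X": 0.10, "disease Y": 0.60, "disease Z": 0.80},  # first test result
    {"disease X": 0.20, "disease Y": 0.90, "disease Z": 0.30},  # second test result
]

probabilities = prior
for likelihood in test_results:
    # The posterior after one test serves as the prior for the next
    # (ASSUMES conditional independence of the tests given the disease).
    probabilities = update_disease_probabilities(probabilities, likelihood)

most_likely = max(probabilities, key=probabilities.get)
print(most_likely, round(probabilities[most_likely], 3))
```

Disease X starts as the most frequent diagnosis, but two test results that fit disease Y better are enough to reverse the ranking.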

    Probabilities of different “states of nature,” calculated with Bayesian methods, can be combined with prespecified benefits that would result from taking a certain decision in the face of those given states of nature. (These benefits may combine expected health gains traded off against monetary costs.) The Bayesian approach is to choose the decision that maximises the expected benefit. Detsky has used the approach for deciding which clinical trials are cost effective,7 and Eddy has used it to evaluate the worth of introducing population screening programmes for various cancers.8
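A minimal sketch of choosing the decision with the greatest expected benefit. The states, decisions, probabilities, and benefit values are all hypothetical, and the benefits are in arbitrary units.

```python
def expected_benefit(benefits, state_probabilities):
    """Average the benefit of one decision over the possible states of nature."""
    return sum(state_probabilities[s] * benefits[s] for s in state_probabilities)


def best_decision(benefit_table, state_probabilities):
    """Return the decision whose expected benefit is largest."""
    return max(
        benefit_table,
        key=lambda d: expected_benefit(benefit_table[d], state_probabilities),
    )


# Posterior probabilities of the states of nature (invented values).
state_probabilities = {"A superior": 0.8, "no difference": 0.2}

# Benefit of each decision in each state (invented values): adopting A costs
# more, so it carries a net loss if A is in fact no better than B.
benefit_table = {
    "adopt A": {"A superior": 10.0, "no difference": -5.0},
    "keep B": {"A superior": 0.0, "no difference": 0.0},
}

print(best_decision(benefit_table, state_probabilities))
```

If the posterior probabilities were reversed (0.2 and 0.8), the same benefit table would favour keeping B, so the decision depends on both the evidence and the stated benefits.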

    The areas in which there is most resistance to Bayesian methods are those where the frequentist paradigm took root in the 1940s to 1960s, namely clinical trials and epidemiology. Resistance is less strong in areas where formal inference is not so important, for example during phase I and II trials, which are concerned mainly with safety and dose finding.9 10 But despite the continued resistance to Bayesian methods as the primary inference tool in phase III trials and epidemiology, there are important aspects of these studies in which the Bayesian approach has acknowledged advantages. One of these is monitoring data as they accumulate during a trial. Group sequential methods, based on frequentist analysis, are currently the standard used for recommending early termination of a trial when interim data indicate clear benefit or harm from one of the treatments. However, there is no agreed method of calculating a P value or confidence interval for the treatment effect after the use of a group sequential method.11 Nor can the methods readily accommodate new external data that might influence early termination.12 Bayesian methods that express prior scepticism about the existence of benefit from a new treatment seem to carry the same advantages as group sequential methods but also take account of new external data in making the final inference.12 These methods have been used recently for the design, monitoring, and analysis of several cancer trials sponsored by Britain's Medical Research Council.13
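One way to picture a sceptical prior at an interim analysis is the standard normal-normal update sketched below. This is an illustration under stated assumptions, not the procedure used in the trials cited: the treatment effect estimate is taken to be normally distributed with a known standard error, the prior is normal and centred on no benefit, and all numbers and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, estimate, standard_error):
    """Combine a normal prior with a normal estimate, weighting each by its precision."""
    prior_precision = 1 / prior_sd**2
    data_precision = 1 / standard_error**2
    posterior_precision = prior_precision + data_precision
    posterior_mean = (
        prior_precision * prior_mean + data_precision * estimate
    ) / posterior_precision
    return NormalDist(posterior_mean, sqrt(1 / posterior_precision))


def prob_benefit(posterior):
    """Posterior probability that the true treatment effect is greater than zero."""
    return 1 - posterior.cdf(0.0)


# Sceptical prior: centred on zero benefit (invented values). The interim
# estimate is two standard errors above zero.
posterior = normal_posterior(prior_mean=0.0, prior_sd=0.2, estimate=0.4, standard_error=0.2)
print(f"posterior mean {posterior.mean:.2f}, Pr(benefit) {prob_benefit(posterior):.3f}")
```

The sceptical prior pulls the interim estimate halfway back towards zero, so an early result that looks striking is less likely to stop the trial. External evidence that emerges later can be folded in by a further update of the same form, treating the current posterior as the new prior.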

    Another advantage of Bayesian methods involves the interpretation of multiple hypothesis testing. Clinical trials often address the effect of a treatment in different subgroups of patients. Epidemiological studies are often designed to test hypotheses about a range of putative risk factors for a given disease. Frequentist methods aim to control the probability of finding false subgroup effects or risk factors. This means using more stringent significance levels, for example through Bonferroni procedures, in which the degree of conservatism in the conclusions increases with the number of subgroup effects or risk factors tested. Bayesian methods of dealing with this multiple testing problem depend not on the number of subgroup effects or risk factors but on the prior information regarding the possibility of these effects. The frequentists' idea that conclusions about risk factor W must become more conservative simply because a study also considers risk factors X, Y, and Z makes the Bayesian approach seem scientifically more sensible.14 Nevertheless, specification of prior distributions in multiple testing problems is difficult, and more research in this area is needed.
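The way a Bonferroni procedure tightens with the number of tests can be sketched in a few lines. The P value for risk factor W and the overall significance level are hypothetical, as are the function names.

```python
def bonferroni_threshold(alpha, number_of_tests):
    """Per-test significance level that keeps the overall false positive rate at most alpha."""
    return alpha / number_of_tests


def is_significant(p_value, alpha, number_of_tests):
    """Apply the Bonferroni corrected threshold to a single P value."""
    return p_value < bonferroni_threshold(alpha, number_of_tests)


# Hypothetical P value for risk factor W, judged alone and then alongside
# three other risk factors (X, Y, and Z) in the same study.
p_value_w = 0.02
for number_of_tests in (1, 4):
    verdict = is_significant(p_value_w, alpha=0.05, number_of_tests=number_of_tests)
    print(number_of_tests, bonferroni_threshold(0.05, number_of_tests), verdict)
```

The evidence about W is identical in both cases, yet the conclusion changes once X, Y, and Z are also tested. A Bayesian posterior for W would instead change only if the prior for W changed.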

    Ten years ago, Bayesian calculations were difficult for all but the simplest problems. But advances in statistical computing techniques using Monte Carlo sampling methods15 have led to an explosion of interest among statisticians. Nowadays, a large proportion of research papers in theoretical statistics journals deal with Bayesian methods. It is only a matter of time before their use becomes more widespread in medicine. To prepare for this, doctors may like to ask their statistical colleagues to teach them about Bayesian methods or read the recently published book by Berry.16 They will be pleasantly surprised by the natural simplicity of the concepts.
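A small illustration of the Monte Carlo idea: instead of deriving a posterior probability analytically, draw many samples from the posterior distributions and count. The sketch uses plain independent sampling from beta posteriors for two response rates under uniform priors. It is not the Markov chain machinery needed for harder problems, and the trial counts and function name are invented.

```python
import random


def prob_a_better(successes_a, n_a, successes_b, n_b, draws=20000, seed=1):
    """Estimate Pr(response rate of A > response rate of B | data) by simulation."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Beta(1 + successes, 1 + failures) is the posterior under a uniform prior.
        rate_a = rng.betavariate(1 + successes_a, 1 + n_a - successes_a)
        rate_b = rng.betavariate(1 + successes_b, 1 + n_b - successes_b)
        wins += rate_a > rate_b
    return wins / draws


# Hypothetical trial: 30 of 50 patients respond to A and 20 of 50 respond to B.
print(prob_a_better(30, 50, 20, 50))
```

The estimate becomes more precise as the number of draws increases, and the same counting approach extends to quantities that have no convenient closed form.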

    References

    1. 1.
    2. 2.
    3. 3.
    4. 4.
    5. 5.
    6. 6.
    7. 7.
    8. 8.
    9. 9.
    10. 10.
    11. 11.
    12. 12.
    13. 13.
    14. 14.
    15. 15.
    16. 16.