Clinical trials and rare diseases: a way out of a conundrum
BMJ 1995; 311 doi: https://doi.org/10.1136/bmj.311.7020.1621 (Published 16 December 1995) Cite this as: BMJ 1995;311:1621
- Richard J Lilford, director of research and development (a),
- J G Thornton, reader in obstetrics and gynaecology (b),
- D Braunholtz, statistician (b)
- (a) West Midlands Health Authority, Arthur Thomson House, Birmingham B16 9PA
- (b) Institute of Epidemiology and Health Services Research, University of Leeds, Leeds LS2 9LN
- Correspondence to: Professor Lilford.
Currently, clinical trials tend to be individually funded and applicants must include a power calculation in their grant request. However, conventional levels of statistical precision are unlikely to be obtainable prospectively if the trial is required to evaluate treatment of a rare disease. This means that clinicians treating such diseases remain in ignorance and must form their judgments solely on the basis of (potentially biased) observational studies, experience, and anecdote. Since some unbiased evidence is clearly better than none, this state of affairs should not continue. However, conventional (frequentist) confidence limits are unlikely to exclude a null result, even when treatments differ substantially. Bayesian methods utilise all available data to calculate probabilities that may be extrapolated directly to clinical practice. Funding bodies should therefore fund a repertoire of small trials, which need have no predetermined end, alongside standard larger studies.
Introduction: the problem
Randomised clinical trials have become the standard method to assess clinical effectiveness when benefits are modest but worth while. They are more reliable than other methods1 and have solved some clinical questions conclusively--for example, the effectiveness of adjuvant treatment in early breast cancer. Clinical questions are most easily answered when a disease is fairly common and the outcome of interest has a high risk of occurring. It is not surprising that randomised controlled trials have provided fairly conclusive results about the treatment of such conditions as acute myocardial infarction and the common cancers and that these results have formed the basis of clinical guidelines and audit standards. When diseases are rare and benefits modest, however, clinical trials, as currently conceived, have little to contribute. This is because they cannot be expected to provide a “definitive” answer--that is, they cannot be expected to detect or exclude clinically worthwhile differences between treatments with standard levels of statistical confidence. Hence they are not funded by grant giving bodies.
In this article we argue that randomised trials can be expected to provide useful information, even when a definitive answer is unlikely in prospect. Standard (so called frequentist) statistical techniques are not, however, suitable in these circumstances, but bayesian methods provide a much clearer guide to action.
An example of the problem
The evaluation of treatments applicable to congenitally abnormal fetuses (fetal surgery) is an example. The conditions for which this surgery may be contemplated are, individually, rare. For example, fetal hydrothorax, suitable for drainage, has an incidence of 1 in 10 000 pregnancies. Current clinical trials are usually designed to give a chance of a false positive answer (P value) of 5%. Provided that the trial is designed to detect a clinical effect that would justify its use in practice, the chance of a false negative result in a trial (the beta or type two error) should also, logically, be 5%.2 Six hundred participants would be needed in each arm of a trial to show that intervention could reduce mortality from 40% to, say, 30%. Access to 12 000 000 pregnancies would be required to recruit sufficient participants, assuming 100% compliance. Clearly, a grant request designed to look at this problem is likely to fail: there have indeed been no randomised studies of fetal surgery.
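The arithmetic behind these figures can be sketched as follows. This is a minimal illustration that assumes the usual normal approximation for comparing two proportions with a two sided 5% significance level and a 5% type two error; the function name `participants_per_arm` is hypothetical, and the authors do not state which formula they used. Under these assumptions the result is a little under the six hundred per arm quoted above, which appears to be a rounded figure.

```python
from math import ceil
from statistics import NormalDist


def participants_per_arm(p_control, p_treated, alpha=0.05, beta=0.05):
    """Approximate per-arm sample size for comparing two proportions.

    Assumes the simple normal approximation (no continuity correction),
    a two sided significance level alpha and a type two error beta.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Mortality reduced from 40% to 30%, as in the fetal hydrothorax example.
n_per_arm = participants_per_arm(0.40, 0.30)

# One eligible fetus per 10 000 pregnancies, two arms, 100% compliance.
pregnancies_needed = 2 * n_per_arm * 10_000

print(n_per_arm, pregnancies_needed)
```

With roughly 585 to 600 participants per arm, the number of pregnancies that must be screened is of the order of 12 million, which is the scale of the recruitment problem the text describes.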
The same problem applies to many other rare diseases. Clinicians are forced to rely on observational studies, anecdotal information, (limited) clinical experience, and perception of biological plausibility. When a disease is uniformly and rapidly fatal such non-randomised case series may prove extremely valuable--for example, the use of penicillin to treat meningococcal meningitis. Most rare diseases, however, have a variable prognosis, and bias in allocating treatment in observational studies might be large in relation to the effects of treatment.
Current practice
Studies of rare diseases will remain vulnerable to this bias unless randomised trials are thought of in a different way. Currently, they are highly stylised. Clinicians formulate a clinical question--preferably one that applies to a well circumscribed group of patients. They then decide on a worthwhile clinical effect--the size of effect that would make one treatment worth while (allowing for other desirable or undesirable facets of treatment)--either by seeking subjective opinion or by means of decision analysis.3 The necessary sample size is then calculated on the basis of this worthwhile clinical effect and the acceptable risk for a false negative or false positive result.
In the case of rare diseases clinical scientists are likely to find that a trial of sufficient size to provide a definitive answer is virtually impossible because of the difficulty of recruiting sufficient patients. A study of sufficient size would need to recruit from very large areas over long periods. Such studies are expensive and difficult to organise. If different doctors have different areas of clinical uncertainty then the problem will be compounded because power will be lower still in the various prognostic subgroups. For example, it may be deemed necessary in the case of fetal hydrothorax to analyse results separately according to whether the hydrothorax is unilateral or bilateral or whether fetal ascites is present. What limits the collection of unbiased evidence on the treatment of rare diseases is the concept that trials should provide a definitive answer as defined above. Since clinically useful effects are unlikely to be seen at the standard level of statistical precision, clinicians are locked out: they remain in complete ignorance (or at least have to rely on evidence that is subject to treatment allocation bias).
Confronting the problem: bayesian methods
We suggest an alternative: carry out trials of treatments for rare diseases, even though a definitive answer is unlikely to result in practice. The idea is simply to change the level of certainty.
But how should the results be analysed, given that conventional (frequentist) confidence limits are unlikely to exclude a null result, even when treatments differ substantially? We suggest that the bayesian perspective is particularly useful in such circumstances.4 5 6 The bayesian approach can give probabilities that the clinical effect lies in a particular range (and also the size of the most likely effect). This is in stark contrast to the often misinterpreted P value produced by the usual (frequentist) approach. The frequentist P value is the probability of the observations (or something more extreme) occurring were the null hypothesis true--a difficult concept to grasp and one that does not provide what is wanted. The bayesian approach, by contrast, provides probabilities of treatment effects that apply directly to the next patient who is similar to those treated in any completed or ongoing trial. Put another way, the bayesian approach provides probabilities that can be used in formal decision analysis, or extrapolated to clinical practice. Thus, the conclusion of the bayesian trial might be that the probabilities that the drainage of fetal hydrothorax reduces mortality by at least 50%, by 25%, or not at all are 0.2, 0.5, and 0.2 respectively.
These probabilities are calculated on the basis of the observed data and a prior distribution of probabilities. In this case the prior distribution usually represents the expectations of clinicians (or a clinician) before the trial. The purpose of the trial is to alter that belief according to the strength of the evidence. A null hypothesis would suggest a prior distribution of probabilities based on no effect--that is, the most likely effect perceived before observation of the data is no difference. The further a hypothesised result deviates from this the less likely it is to happen. A typical prior expectation would be that relative risks differing by more than twofold are extremely unlikely--having a probability of 0.025 (2.5%) in each direction. The crucial point is that the bigger the trial the greater the relative effect of data on the prior distribution and the less will be the difference between conventional confidence limits and the equivalent bayesian interval. The particular strength of the bayesian approach is that it produces a probability distribution which may guide clinical action even when a “definitive” answer is not available--the expected result of clinical trials of rare diseases.
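One way to turn the "typical prior expectation" described above into numbers is sketched below. It assumes, purely for illustration, that the logarithm of the relative risk follows a normal distribution centred on zero (no effect); the text does not say which distributional form the authors have in mind, and the variable names are hypothetical.

```python
from math import log
from statistics import NormalDist

# Assumption (not stated in the article): log relative risk ~ Normal(0, prior_sd).
# Choose prior_sd so that a relative risk above 2 has prior probability 0.025.
z = NormalDist().inv_cdf(0.975)
prior_sd = log(2) / z
prior = NormalDist(mu=0.0, sigma=prior_sd)

# By symmetry on the log scale, a relative risk below 0.5 is equally unlikely.
p_rr_above_2 = 1 - prior.cdf(log(2))
p_rr_below_half = prior.cdf(log(0.5))

print(round(prior_sd, 3), round(p_rr_above_2, 3), round(p_rr_below_half, 3))
```

Under this assumption the prior standard deviation on the log scale is about 0.35, and the prior puts 95% of its probability on relative risks between one half and two.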
Because the bayesian approach attempts simply to enumerate the probabilities of effects of different sizes, the trial can be analysed as data accumulate--a so called open trial.5 7 8 9 The traditional (so called frequentist) method, by contrast, is based on hypothesis testing, and the probability of getting a false positive answer is affected by how often the data are tested statistically. This is a further argument for the bayesian approach for trials of treatment for rare diseases, since such studies are unlikely ever to be complete and the rate of recruitment is difficult to predict in advance.
Of course, small trials will sometimes mislead. Thus, in the example given above, a small trial may suggest that there is a probability of 0.5 (50%) that draining a fetal hydrothorax reduces mortality by 25% or more. However, the distribution of probabilities might be such that there is, say, a probability of 0.2 that this intervention does not reduce the risk of death or increases it. Clearly, there is a substantial risk of getting the wrong answer. If, however, we had an equipoised prior expectation--that is, we thought it equally likely at the start of the trial that the intervention (or non-intervention) would improve or diminish the chances of the desired outcome--then we have achieved something. Instead of an equal chance of benefit and harm, we now have an 80% chance of benefit and a 20% chance of harm. Clinicians are familiar with the need to make decisions under uncertainty and recommend the treatment which seems to have the best chance of maximising benefit (expected utility).
The alternative is to eschew clinical trials for rare diseases, and thereby to remain in ignorance and subject to bias. The point is that clinicians must make management decisions and clinical science should aim to provide the best possible information. The individual rare diseases are, by definition, not a big public health problem. There are, however, many rare diseases, so that, taken together, they represent a substantial health burden. Small trials might mislead us on, say, 20% of topics, but they would then suggest the correct treatment on the remaining 80%. Assuming that in each case clinicians were in justifiable collective equipoise10 before the trial began--that is, they were split 50:50--then if clinicians follow trial recommendations 20% of people overall would get the wrong treatment if such trials were done compared with 50% if they were not done.
Ethics and bayesian trials
The bayesian approach is ethically sound; it makes prior belief explicit. Trials are ethical when prior belief is in equipoise10 11 12--that is, randomisation occurs only when the doctor or doctor and patient expect utility to be the same with each treatment.12 13 14 Although randomisation will reduce the chance of treatment allocation bias, different doctors and doctor and patient pairs will be in equipoise at different levels of basic risk.15 Therefore, in the analysis it may be desirable to stratify for risk. This was done in the Medical Research Council's cervical cerclage trial.16 The entry criterion was clinical uncertainty, but the prognostic features producing such uncertainty varied considerably from clinician to clinician. Stratified analysis, however, showed that this treatment was effective for certain women--namely, those who had suffered two previous mid-trimester miscarriages. We suggest that the same principle of equipoise and a stratified analysis should apply to rare disease but that the trial should be analysed along bayesian lines, since this would allow the results to be scrutinised whenever deemed appropriate and it would not be essential to provide convincing evidence of likely recruitment to gain funding.
How the approach may work in practice
To illustrate this approach, consider a category of patient with fairly advanced (say, stage 3) primary biliary cirrhosis for whom a new drug (perhaps a biological modifier) is proposed as an alternative to conventional ursodeoxycholic treatment. The prevalence is so low that improvement in death and transplantation rates “cannot be subject to controlled trials.”17
This conclusion comes from thinking of trials as a black box to be used only when the chances are high that a clinically worthwhile effect can be seen at classic levels of statistical significance. Let us say that five year mortality among patients with this stage of primary biliary cirrhosis is 30% with current treatment. Standard (frequentist) power calculations suggest a study size of 320 in each arm to show even a large (10 percentage points) improvement in mortality from 30% to 20%. Let us imagine that a bayesian trial is started using an open access (see below) trials facility. Suppose further that after five years 50 patients have been randomly allocated to each treatment and that 15 patients given conventional treatment and 10 given the new treatment have died--that is, an improvement of 10 percentage points as proposed in the above power calculation. However, the 95% confidence interval for the true difference ranges from a 7 percentage point increase to a 27 percentage point decrease in mortality. The result is not significant, and a traditionalist (frequentist) would say that the new treatment was still unproved and that it may even be wrong to use this expensive and possibly harmful medicine until further studies had been carried out. However, the bayesian would ask: “how have these data impacted on my prior belief?” Such prior belief would be based on biological plausibility, the results of other (perhaps open) studies, and clinical experience.
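The confidence interval quoted above can be reproduced with a short calculation. The sketch below assumes the simple normal approximation (Wald interval) for a difference between two proportions; the authors do not state their method, and the function name `risk_difference_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(deaths_control, n_control, deaths_new, n_new, level=0.95):
    """Wald confidence interval for the reduction in mortality (control minus new)."""
    p_control = deaths_control / n_control
    p_new = deaths_new / n_new
    diff = p_control - p_new  # positive values mean fewer deaths on the new treatment
    std_error = sqrt(p_control * (1 - p_control) / n_control
                     + p_new * (1 - p_new) / n_new)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * std_error, diff + z * std_error


# 15 of 50 deaths on conventional treatment, 10 of 50 on the new treatment.
diff, lower, upper = risk_difference_ci(15, 50, 10, 50)

print(f"reduction {100 * diff:.0f} points, "
      f"95% CI {100 * lower:.0f} to {100 * upper:.0f} points")
```

A negative lower limit corresponds to an increase in mortality, so the interval runs from roughly a 7 percentage point increase to a 27 percentage point decrease and includes zero.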
Figure 3 shows the effect of the above data on an observer who previously considered a decrease and increase in mortality to be equally likely (in equipoise) and that a result more extreme than a 20 percentage point improvement or worsening in mortality was implausible. Calculation based on Bayes's theorem now shows that the most likely treatment effect is a 3.3 percentage point improvement in mortality, and there is a probability of 0.75 that the new treatment is better as far as mortality is concerned. In addition, both a reduction in mortality of more than 15 percentage points and an increase in mortality of more than 8.5 percentage points are highly unlikely (1% chance each).
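A calculation of this kind can be sketched with a normal prior and a normal approximation to the trial result. Both are assumptions made here for illustration: the text does not give the exact form of the equipoised prior, so the prior standard deviation of 6 percentage points below is a hypothetical choice (it makes a 20 point effect more than three standard deviations from zero), and `normal_update` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def normal_update(prior_mean, prior_sd, estimate, std_error):
    """Combine a normal prior with a normally distributed estimate (conjugate update)."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / std_error ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_mean * prior_precision + estimate * data_precision)
    return NormalDist(post_mean, sqrt(post_var))


# Trial data from the example: 15/50 deaths (conventional) v 10/50 (new treatment).
p_control, p_new, n = 15 / 50, 10 / 50, 50
estimate = 100 * (p_control - p_new)  # reduction in mortality, percentage points
std_error = 100 * sqrt(p_control * (1 - p_control) / n + p_new * (1 - p_new) / n)

# Assumed equipoised prior: centred on no difference, sd of 6 percentage points.
posterior = normal_update(0.0, 6.0, estimate, std_error)
p_new_better = 1 - posterior.cdf(0.0)

print(round(posterior.mean, 1), round(p_new_better, 2))
```

With this assumed prior the posterior is centred on a reduction of a little over 3 percentage points and gives a probability of about 0.75 that the new treatment lowers mortality, which is close to the figures quoted in the text. A wider or narrower prior would shift both numbers.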
The clinician in figure 4, however, was sceptical about the chances of the new treatment being better than no treatment and is rather vague about where the true difference lies. This clinician would find very large increases or smaller decreases in mortality plausible. Calculation of the posterior probabilities in the light of the data produced in the trial now means that this clinician is in equipoise. Such a clinician, if previously unwilling to offer randomised treatment to patients, could now do so against personal equipoise.
Figure 5 shows an enthusiastic clinician who thinks that the new treatment is likely to improve mortality but is otherwise vague about where the true difference lies. The posterior probabilities, which take account of the data, are centred on a 12.5 percentage point reduction in mortality. The probability that the new treatment improves mortality is 0.98. Such a clinician may have been (and would remain) unwilling to offer randomised treatment to patients unless such improvements in mortality were necessary to justify side effects or costs, or both.18 The clinician's overenthusiasm has also been curbed: the clinician's prior probability of 0.12 that the new treatment produces a huge 25 percentage point reduction in mortality is reduced to just 0.02 in the light of the data.
Making use of all knowledge
The results of non-randomised studies should not be discarded--rather they should be incorporated into prior beliefs with due caution (scepticism). Furthermore, randomised controlled trials of treatments for rare diseases have the potential to produce relatively precise information on surrogate outcomes; and the effect of this information on clinical opinion can be incorporated in bayesian calculations as shown in figure 6. Clinical trials capable of measuring improvements in mortality at classic significance levels are difficult to mount. Eleven small trials of ursodeoxycholic acid have been done for primary biliary cirrhosis, but only four mention time to transplantation or survival.19 They do, however, show that results in liver function tests are improved by this treatment in comparison with placebo or colchicine. (Fewer patients are required to show a significant change in a continuous variable, such as alkaline phosphatase concentrations, than in a rate, such as death rate.) Although an improvement in mortality cannot necessarily be inferred from improvement in such surrogate outcomes, such results would increase our confidence that death rates are indeed improved. Thus similar data on liver function tests of survivors in this hypothetical trial would increase the likelihood of an improvement in mortality and clinicians might wish to amend their prior expectations accordingly. This might move an initially equipoised prior belief in the direction of benefit (figure 6). The randomised results on improved mortality can then be combined formally with the amended prior beliefs to calculate the probabilities of different sizes of treatment effect. This shows that the most probable effect is a 7.3 percentage point reduction in mortality, and the probability of the new treatment being superior is 0.90. Of course, all randomised controlled trials can by chance produce an incorrect result--that the treatment which is really worse is better--and this is much more likely in small trials.
Nevertheless, a decision taken on the basis of a posterior belief that includes evidence from a randomised controlled trial, however small, is more likely to be correct than a decision based simply on a prior belief with no evidence from such a trial. Any randomised evidence is better than none.
Implications for science policy
The concept of randomisation when the beliefs of clinicians treating rare diseases are in equipoise has implications for science policy. Rather than fund individual trials, funding bodies should fund the capacity to undertake rolling trials on a continuous basis--this would include a trials office that deals with a wide range of issues, is knowledgeable, has facilities for 24 hour randomisation, and can follow up patients over time. Such facilities would allow responsible investigators to offer randomised treatment to the first patient treated by a new method or to randomise when the treatment in question might be completely supplanted at some time in the future. This would be a substantial change in how research commissioners think about trials, and we suggest that it may complement, but not supplant, the existing methods, which should remain the standard for common diseases.
Ethics committees sometimes include scientific review and reject randomised controlled trials of treatments showing worthwhile effects that have little chance of producing a result at classic levels of significance. If thereby they encourage a larger trial, then some good has been done. However, if the disease is rare and no patients are randomly allocated treatment (instead of a small number) science, and people who suffer from rare diseases, have been badly served. If this effect is unwittingly replicated over the world, then trials which could contribute to a structured review and meta-analysis will be thwarted, thereby compounding the error. When diseases are rare, or potentially supplantable in the short term, then some unbiased information is better than none. This philosophy might result in replication of the trial elsewhere, thereby providing an unexpected scientific bonus.
Acknowledgments
We thank Dr David Spiegelhalter for his advice and inspiration.