Education And Debate

When placebo controlled trials are essential and equivalence trials are inadequate

BMJ 1998; 317 doi: https://doi.org/10.1136/bmj.317.7162.875 (Published 26 September 1998) Cite this as: BMJ 1998;317:875
  1. Martin R Tramèr, research fellow (martin.tramer@hcuge.ch) a,
  2. D John M Reynolds, consultant clinical pharmacologist b,
  3. R Andrew Moore, consultant biochemist a,
  4. Henry J McQuay, clinical reader in pain relief a
  a. Pain Research and Nuffield Department of Anaesthetics, University of Oxford, Oxford Radcliffe Hospital, The Churchill, Headington, Oxford OX3 7LJ
  b. Department of Clinical Pharmacology, Radcliffe Infirmary, Oxford OX2 6HE
  Correspondence to: Dr M R Tramèr, Division of Anaesthesiology, DAPSIC, Geneva University Hospital, 1211 Geneva 14, Switzerland
  • Accepted 9 June 1998

Arguments against the use of placebo groups in clinical trials have been based on opinion rather than evidence. Ethical issues have been raised,1 but these are contentious. 2 3 Scientific requirements should not override ethical ones, but if placebo controls are not used, then active controlled trials (trials using other active drugs as controls) have to be able to determine the efficacy of an intervention and its likelihood of causing harm.

Summary points

  • Many consider the use of placebos in clinical trials to be unethical, but can trials without placebo controls provide sensible and useful results?

  • One problem is finding a gold standard comparator—for example, no gold standard comparator exists for the prophylactic antiemetic ondansetron

  • Another problem is the underlying variation in likelihood of an event (wanted or unwanted); the incidence of postoperative nausea and vomiting, for example, can range from 1% to 80% within 6 hours and from 10% to 96% within 48 hours after surgery

  • Ondansetron (pooled 4 mg and 8 mg data) seems better than metoclopramide 10 mg at preventing postoperative nausea and vomiting within 6 hours of surgery but not after 6 hours; comparisons with all the other antiemetics and data on adverse effects are inconclusive

  • In clinical settings where no gold standard treatment exists and where event rates vary widely, trial designs without placebo controls are unlikely to yield sensible results

  • The ethics of recruiting patients into trials that cannot yield sensible results is dubious

  • Systematic reviews could provide ethics committees and trialists with the necessary information to question the ethics of a trial design

Evidence from placebo controlled trials

We used the antiemetic ondansetron to explore the value of active controlled trials for two reasons. Firstly, the ethics of using placebo controls in ondansetron trials has been questioned repeatedly, both in oncology 4 5 and after surgery,6 causing confusion for trialists7 and ethics committees.8 Secondly, we had good estimates of ondansetron's antiemetic efficacy and harm postoperatively from a systematic review.9 That showed a dose-response and defined the optimal dose: 8 mg intravenously or 16 mg orally achieved a number needed to treat to prevent emesis of about 6 compared with placebo.9 It also showed that 1 in 30 patients treated with ondansetron will have a headache or raised concentrations of liver enzymes—they would not have had these complications without the drug.9 We compared these estimates of efficacy and harm with those from active controlled comparisons.
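The number needed to treat and its counterpart for harm both follow from a simple reciprocal of an absolute risk difference. The sketch below is illustrative only: the event rates are hypothetical values chosen to give figures of roughly the size quoted above, not data from the review, and the function name is an assumption.

```python
def number_needed(rate_difference):
    """Reciprocal of an absolute risk difference (NNT or NNH)."""
    if rate_difference == 0:
        raise ValueError("no difference in event rates: number needed is undefined")
    return 1.0 / abs(rate_difference)


# Hypothetical emesis rates (assumptions, not taken from the review).
placebo_emesis_rate = 0.50
ondansetron_emesis_rate = 0.33
nnt = number_needed(placebo_emesis_rate - ondansetron_emesis_rate)

# Hypothetical headache rates (assumptions, not taken from the review).
placebo_headache_rate = 0.05
ondansetron_headache_rate = 0.08
nnh = number_needed(ondansetron_headache_rate - placebo_headache_rate)

print(f"NNT to prevent one emetic event: {nnt:.1f}")
print(f"NNH for one extra headache: {nnh:.1f}")
```

Both figures need a placebo (or untreated) event rate; an active controlled trial supplies only the difference between two drugs.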

Active controlled trials—methods of quantitative systematic review

The methods used in systematic search, quality score, data extraction, and meta-analysis of active controlled ondansetron trials are described in detail elsewhere.9 Efficacy data for ondansetron as a treatment of established postoperative nausea and vomiting10 and trials without an active control arm9 were not analysed. Propofol anaesthesia was not regarded as an antiemetic comparator.11

Evidence from active controlled trials

Multiple different comparators - lack of a gold standard

Evidence

Data on included and excluded trials (retrieved up to September 1996) are shown in figure 1. Thirty three randomised controlled trials with data from 4827 patients (1837 treated with ondansetron) were finally analysed.12-44 The median size of ondansetron treatment groups was 33 (range 10-465) patients. The median quality score45 of all reports was 2 (1-5).

FIG 1

Data on included and excluded trials

Tables with relevant data extracted from the analysed reports are available on the internet (www.jr2.ox.ac.uk/Bandolier/painres/ondA/ondA.html).

Comment

Many different ondansetron regimens were compared with many different antiemetic controls. This shows uncertainty, both about which regimen of ondansetron is the best, and about which established antiemetic should be used as the gold standard active control. A gold standard is needed to establish the relative efficacy and harm of a new treatment, and that gold standard should be the most effective and the least harmful.46 There is still no such standard for prevention of postoperative nausea and vomiting. Only a minority of these trials used the optimal intravenous dose of ondansetron—namely, 8 mg.9 We do not know the most effective dose for any other antiemetic.

Underlying variation in likelihood of nausea and vomiting (control event rate)

Evidence

Nineteen trials included a placebo arm 14 16 18 20-24 27 28 31-35 38 41 43 44 and two trials included a “no treatment” arm. 15 30 The median number of patients in ondansetron groups in these trials was 32 (10 to 465). The median quality score was 2 (1 to 5). Graphically, the comparison of any dose of ondansetron with placebo in the trials that included a placebo arm suggested superiority of ondansetron (fig 2 (top)). Nausea or vomiting rates in placebo groups varied between 1% and 80% for outcomes up to six hours after surgery, and between 10% and 96% for outcomes up to 48 hours after.

FIG 2

Event rate scattergrams showing cumulative event rates up to 48 hours after surgery from (top) placebo and “no treatment” controlled trials and (bottom) active controlled trials (dotted line represents equality)

Comment

The extraordinary variation in the incidence of nausea and vomiting that was shown in placebo controlled trials (10% to 96%) is a big problem. If some patients do not vomit then prophylactic antiemetic efficacy cannot be shown. If everybody vomits then prophylactic antiemetic efficacy will be exaggerated. The variation is not an artefact of trial design or measurements and it is not confined to antiemetics.47 Reasons for this phenomenon are poorly understood.48 It may be due partly to random variation in small trials.49
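One way to see why this variation matters is to hold the relative effect of a drug fixed and let only the control event rate change. The sketch below assumes a constant relative risk reduction of 30%; that figure and the constancy assumption are purely illustrative and are not estimates from the review. The control event rates are the extremes reported above.

```python
def nnt_from_control_rate(control_event_rate, relative_risk_reduction):
    """NNT implied by a control event rate and a fixed relative risk reduction."""
    absolute_risk_reduction = control_event_rate * relative_risk_reduction
    if absolute_risk_reduction <= 0:
        raise ValueError("no events prevented: NNT is undefined")
    return 1.0 / absolute_risk_reduction


ASSUMED_RRR = 0.30  # hypothetical relative risk reduction, for illustration only

# Control event rates spanning the range seen in the placebo groups.
for control_rate in (0.01, 0.10, 0.80, 0.96):
    nnt = nnt_from_control_rate(control_rate, ASSUMED_RRR)
    print(f"control event rate {control_rate:.0%}: NNT about {nnt:.0f}")
```

Under these assumptions the same drug appears almost useless where events are rare and highly effective where nearly every patient has an event, which is why a trial that does not measure its own control event rate is hard to interpret.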

Equivalence

Evidence

Ondansetron was no better than placebo in 19 of the 52 possible comparisons with all outcomes. 14 18 20 23 24 27 30 32 33 35 44 The median number of patients in ondansetron groups in the 11 trials that failed to show a difference between ondansetron and placebo in at least one comparison was 30 (10 to 83).

Comment

Many of the trials showed no difference between ondansetron and its active control—that is, they showed equivalence. Failure to show a difference between two treatments, however, does not necessarily mean equivalence.50 The only conclusions that can be drawn if both drugs show similar efficacy are: (a) both drugs are effective to a similar degree; (b) both drugs are equally ineffective; or (c) the trial design was inadequate—for example, too small—to show the real difference between the two treatments. In equivalence trials we need to know that both treatments were indeed effective in an A versus B comparison of two active drugs.50 To meet this criterion we need to know the extent of the placebo response and that it does not vary. Otherwise a result seeming to show no difference between A and B could mean that both A and B were effective or that neither A nor B was effective. Only in trials with proved internal sensitivity (a positive dose-response or an active drug is better than a placebo) can we draw correct conclusions about equivalence. One trial produced a remarkable result—ondansetron seemed to be equivalent to placebo but significantly better than the active comparator.14 Only because there was a placebo group was the obvious lack of internal sensitivity detectable.
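Possibility (c), a trial too small to show a real difference, can be made concrete with an approximate power calculation. The sketch below uses a normal approximation for comparing two proportions at a two sided 5% significance level; the event rates are hypothetical, and the group size of 30 is chosen only because it is close to the median group sizes reported above.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(rate_a, rate_b, patients_per_group, alpha=0.05):
    """Approximate power to detect rate_a versus rate_b (normal approximation).

    Ignores the negligible chance of a significant result in the wrong direction.
    """
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    standard_error = sqrt(
        rate_a * (1 - rate_a) / patients_per_group
        + rate_b * (1 - rate_b) / patients_per_group
    )
    return standard_normal.cdf(abs(rate_a - rate_b) / standard_error - z_critical)


# Hypothetical true event rates (assumptions, not data from the review).
small_trial_power = approximate_power(0.50, 0.33, patients_per_group=30)
large_trial_power = approximate_power(0.50, 0.33, patients_per_group=300)

print(f"power with 30 per group:  {small_trial_power:.2f}")
print(f"power with 300 per group: {large_trial_power:.2f}")
```

With groups of about 30 patients, a trial of this kind would usually fail to detect a true difference of this size, so a finding of "no difference" says little about equivalence.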

Can we interpret these active comparisons?

Evidence of efficacy

The event rate scatter suggested little difference between any ondansetron regimen and any dose of any comparator (fig 2 (bottom)). With both ondansetron and comparators early event rates ranged from 0% to about 60% and late event rates from 0% to about 80%.

As in the original meta-analysis of placebo controlled ondansetron trials9 we intended to combine clinically homogeneous efficacy data—namely, similar active drug and comparator, similar dose and route of administration, similar emetic events, and similar observation periods. We could not do this here for more than two trials at a time, except for the comparison of ondansetron 4 mg with metoclopramide 10 mg. We therefore combined data from different doses of ondansetron and compared these data with combined data from different doses of any given comparator, but only if the trials at least reported similar emetic events and similar observation periods. The same was done for adverse effects. Only one active group was considered in trials with different doses of ondansetron or comparators. 28 39 43 The major results of the meta-analysis—that is, comparisons between ondansetron and either droperidol or metoclopramide—are shown in tables 1 and 2.

Table 1.

Meta-analysis of ondansetron (combined regimens) v droperidol (combined regimens) and metoclopramide (combined regimens) in adults and children, showing data on efficacy

Table 2.

Meta-analysis of ondansetron (combined regimens) and droperidol (combined regimens) in trials with a placebo arm showing data on harm

Ondansetron was also compared with nine other antiemetics in one trial each. During the first six hours after surgery there was no difference between ondansetron 4 mg and perphenazine 5 mg,21 promethazine 1 mg/kg,30 or dexamethasone 8 mg.27 Similarly, ondansetron 60 µg/kg was not different from prochlorperazine 0.1 mg/kg or 0.2 mg/kg.44 In the first 48 hours after surgery, ondansetron 60 µg/kg was significantly better than intravenous prochlorperazine 0.1 mg/kg (number needed to treat 6 (95% confidence interval 3.3 to 39)) and intramuscular prochlorperazine 0.2 mg/kg (number needed to treat 6.8 (3.5 to 114)) in preventing any emetic event,44 and ondansetron 8 mg was significantly more efficacious than sulpiride 50 mg (number needed to treat 4 (2 to 328)) in preventing any emetic event.39 During the same time period there was no difference between ondansetron 4 mg and promethazine 1 mg/kg,30 dexamethasone 8 mg,27 tropisetron 5 mg,31 or granisetron 3 mg,31 and ondansetron 8 mg was no better than alizapride 50 mg.15
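The very wide upper confidence limits on these numbers needed to treat (for example, 2 to 328) are what one would expect when the confidence interval of the absolute risk reduction only just excludes zero. The sketch below shows this by inverting a simple Wald interval; the counts are hypothetical, and this is not necessarily the method used in the review.

```python
from math import inf, sqrt


def nnt_with_interval(events_control, n_control, events_treated, n_treated, z=1.96):
    """NNT with an approximate 95% interval from inverting a Wald interval on the ARR."""
    rate_control = events_control / n_control
    rate_treated = events_treated / n_treated
    arr = rate_control - rate_treated
    standard_error = sqrt(
        rate_control * (1 - rate_control) / n_control
        + rate_treated * (1 - rate_treated) / n_treated
    )
    arr_low, arr_high = arr - z * standard_error, arr + z * standard_error
    if arr_low <= 0:
        # The interval includes no benefit, so the NNT has no finite upper bound.
        return 1.0 / arr, 1.0 / arr_high, inf
    return 1.0 / arr, 1.0 / arr_high, 1.0 / arr_low


# Hypothetical single small trial: 18/30 events on comparator, 10/30 on ondansetron.
nnt, nnt_lower, nnt_upper = nnt_with_interval(18, 30, 10, 30)
print(f"NNT {nnt:.1f} ({nnt_lower:.1f} to {nnt_upper:.1f})")
```

A point estimate that looks clinically attractive can therefore sit alongside an upper limit compatible with almost no benefit, which is typical of single small trials.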

Comment on efficacy

The scattergram suggests qualitatively that ondansetron was no better than droperidol, perphenazine, prochlorperazine, promethazine, alizapride, sulpiride, tropisetron, granisetron, or dexamethasone (fig 2 (bottom)). Ondansetron seemed to be more effective than metoclopramide (table 1), but the clinical importance of any difference is doubtful. At least six adult patients would have to be treated with ondansetron 4 mg or 8 mg, rather than metoclopramide 10 mg, to prevent nausea or vomiting in one patient in the first six hours after surgery. Unlike ondansetron,9 the optimal dose of metoclopramide is still not known; 10 mg may have been too low a dose. Before a sensible comparison between ondansetron and metoclopramide can be made, the optimal dose of metoclopramide needs to be established. This is true for all the other comparators. Ondansetron's antivomiting effect compared with droperidol or metoclopramide seemed to be more pronounced in children than in adults. Reasons for this are unknown. The effect on nausea was not reported in paediatric trials.

Evidence of adverse effects

Possible drug related adverse effects were reported in 19 trials, in 11 of them in dichotomous form. In three placebo controlled trials postoperative anxiety, restlessness, or agitation was described in patients treated with droperidol, and in six placebo controlled trials postoperative headache was reported in patients receiving ondansetron (table 2). No extrapyramidal symptoms were reported in any trial.

In trials without a placebo arm adverse effects described in relation to ondansetron were flush and urticaria in three patients 12 36 and nodal rhythm in three patients.34 Ten patients (0.2% of all patients) were admitted or readmitted to hospital because of excessive or prolonged postoperative nausea and vomiting (five children had received droperidol 50 µg/kg or 75 µg/kg, 20 22 42 two children ondansetron 150 µg/kg,42 and one adult metoclopramide 10 mg,28 and in two adults the treatment was not specified33).

Comment on adverse effects

Only placebo controlled trials enabled us to draw meaningful conclusions on drug related harm. The widely held view that use of droperidol is limited by adverse reactions was not supported.

Discussion

Arguments against the use of placebos include the general one that placebos are unethical51 and the specific one that, because ondansetron has proved more effective than placebo, we do not need further placebo controlled trials and what we need is to determine ondansetron's clinical role, through comparisons with existing antiemetics (active controlled trials). 6 52 These 33 active controlled ondansetron trials provided the opportunity to check whether these trials alone—that is, in the absence of any placebo controlled trials—would have been adequate to determine relative efficacy and likelihood of harm. They would not: three major shortcomings were discovered (fig 3).

FIG 3

Potential shortcomings of active controlled trials

Why we need placebos

In this review 36% of all trials did not include a placebo (or a no treatment) group. We do not know if these trials provide valid data because we do not know the event rates without antiemetic prophylaxis in these study populations, and because they lack an index of internal sensitivity. Interpretation of these trials is, therefore, impossible. It is likely that the use of placebos would have avoided most of the drawbacks in these active controlled trials. Moreover, placebo controls would have enabled estimation of efficacy and likelihood of harm for a variety of regimens of the new antiemetic ondansetron, compared with a standard comparator, a placebo. Models for estimation of an intervention's relative efficacy without direct comparisons have been proposed. 47 53

The problems of variation in control event rate (underlying variation in likelihood of an event) and lack of gold standard comparator justify the use of placebos as an index of a trial's validity. This is true in many therapeutic areas, not just in antiemetic trials. In a qualitative systematic review on the analgesic efficacy of intra-articular morphine, for example, only a minority of retrieved trials could be regarded as valid assays with proved internal sensitivity.54

Ethical argument against placebos

Why then, despite potential drawbacks, did 12 trials in this review not include a placebo arm? In some trials placebos were omitted on ethical grounds.12 This is illogical because studies destined to produce unreliable results should themselves be considered unethical. 3 55 The use of placebos is one of the thorniest issues facing clinical researchers today.56 It has been claimed that if an arm of a study is known to be less beneficial or more harmful than alternatives, investigators must protect patients from that additional known risk4 and that assignment of patients to placebo treatment when an effective treatment already exists is therefore unethical.1 Such general statements may be misinterpreted. This systematic review shows clearly that we do not know which treatment is most beneficial or least harmful in postoperative nausea and vomiting. Such systematic reviews could provide ethics committees with the necessary information to question the ethics of a trial design. 7 8 57

Ethical acceptability of placebos

The important question then is whether the use of placebos in trials of postoperative nausea and vomiting is unethical. Use of a placebo would be unethical if it meant that life was endangered or symptoms were made intolerable.3 These trials are designed to establish the number of patients who do not develop nausea or vomiting after surgery.58 Antiemetic “rescue” treatment would be needed both for the patients who were denied active prophylaxis (that is, those who received a placebo and who do vomit) and for the patients receiving active prophylaxis in whom that intervention failed. No evidence exists that treatment of established postoperative nausea and vomiting is less efficacious than prevention.10 Although postoperative nausea and vomiting may induce serious complications,59 it is most often a minor adverse effect of anaesthesia and surgery; it does not become chronic, and it almost never kills. The use of placebos in trials investigating postoperative nausea and vomiting may therefore be justified. Informed consent and adequate rescue antiemetic treatment are of course necessary to ensure ethical legitimacy.

Conclusions

This set of trials does not support the general argument that we should eschew placebo controlled trials in favour of direct comparisons alone. 1 46 51 52 These trials failed to improve our understanding of the therapeutic role of prophylactic ondansetron in prevention of postoperative nausea and vomiting, and that failure was entirely predictable from their equivalence design. The ethical acceptability of placebos is likely to be dependent on the setting. In situations such as postoperative nausea and vomiting that lack a gold standard treatment and where the likelihood of an outcome is expected to vary widely, trial designs without placebo controls are unlikely to yield sensible results. We contend that the ethics of recruiting patients into trials that cannot yield sensible results is dubious.

Acknowledgments

Funding: MRT was funded by a UK overseas research student award and a PROSPER grant (No 3233-051939.97) from the Swiss National Research Foundation. The review was funded by Pain Research Funds.

Competing interest: None.

References
