Efficacy of antidepressants

BMJ 2008;336 doi: http://dx.doi.org/10.1136/bmj.39510.531597.80 (Published 06 March 2008)
Cite this as: BMJ 2008;336:516
- 1Department of Psychiatry, Oregon Health and Science University, Portland, OR 97239, USA
- 2Department of Psychology, University of California, Riverside, CA 92521, USA
In February 2008, Kirsch and colleagues reported a meta-analysis of the efficacy of antidepressants using data from clinical trials submitted to the Food and Drug Administration.1 They provocatively concluded, “there seems little evidence to support the prescription of antidepressant medication to any but the most severely depressed patients.”
In January this year, we published an article about the selective publication of antidepressant trials and its influence on apparent efficacy,2 in which we also used FDA data. Our main finding was that antidepressant drugs are much less effective than is apparent from journal articles. From the FDA data we derived an overall effect size of 0.31. Kirsch and colleagues used FDA data from four of the 12 drugs we examined and calculated an overall effect size of 0.32.
Although these two sets of results were in excellent agreement, our interpretations of them were quite different. In contrast to Kirsch and colleagues’ conclusion that antidepressants are ineffective, we concluded that each drug was superior to placebo. The difference in our interpretations stems from Kirsch and colleagues’ use of the criteria for clinical significance recommended by the UK’s National Institute for Health and Clinical Excellence (NICE).
Clinical significance is an important concept because a clinical trial can show superiority of a drug to placebo in a way that is statistically, but not clinically, significant. Tests of statistical significance give a yes or no answer (for example, P<0.05 is deemed significant, P>0.05 non-significant) that tells us whether the observed effect can be distinguished from zero, but nothing about how large that effect is.3 In contrast, effect size measures magnitude, and thus allows us to address the question of clinical significance. Cohen proposed values of 0.2, 0.5, and 0.8 to represent small, medium, and large effects, respectively.4
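For readers unfamiliar with how such an effect size is obtained, the measure discussed here (Cohen's d) is simply the difference between the treatment and placebo group means divided by their pooled standard deviation. A minimal sketch, using made-up illustrative numbers rather than any trial data:

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardised mean difference: (mean1 - mean2) / pooled SD."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    # Pooled standard deviation, weighted by each group's degrees of freedom
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd

# Hypothetical improvements on a depression rating scale (illustrative only)
drug_group = [10, 12, 14]
placebo_group = [8, 10, 12]
print(cohens_d(drug_group, placebo_group))  # prints 1.0
```

Because d is scaled in standard deviation units, a value of 0.31 means the average drug treated patient improved about a third of a standard deviation more than the average placebo treated patient; whether that margin is clinically worthwhile is precisely the judgment at issue.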
NICE chose the “medium” value of 0.5 as a cut-off below which they deem benefit of a drug not clinically significant.5 This is problematic because it transforms effect size, a continuous measure, into a yes or no measure, thereby suggesting that drug efficacy is either totally present or absent, even when comparing values as close together as 0.51 and 0.49. Kirsch and colleagues compared their effect size of 0.32 to the 0.50 cut-off and concluded that the benefits of antidepressant drugs were of no clinical significance.
But on what basis did NICE adopt the 0.5 value as a cut-off? When Cohen first proposed these landmark effect size values, he wrote, “The terms ‘small’, ‘medium’, and ‘large’ are relative . . . to each other . . . the definitions are arbitrary . . . these proposed conventions were set forth throughout with much diffidence, qualifications, and invitations not to employ them if possible.” He also said, “The values chosen had no more reliable a basis than my own intuition.” Thus, it seems doubtful that he would have endorsed NICE’s use of an effect size of 0.5 as a litmus test for drug efficacy.
To illustrate Cohen’s use of “relative” with a metaphor, imagine antidepressant efficacy measured in terms of litres of a fluid called “d-juice” (named after Cohen’s “d”—the effect size measure described here). When our group measured 0.41 litres of d-juice in the “glass” representing journal articles, but only 0.31 litres in the FDA glass, we concluded that the FDA glass was less full than the journal glass. Nevertheless, we acknowledged that 0.31 litres was a measurable and significant amount. Kirsch and colleagues measured 0.32 litres of d-juice, but because they did not consider the glass sufficiently full (fullness being defined, arbitrarily, as d≥0.5), they concluded that the glass contained virtually no d-juice whatsoever. To summarise, we agree that the antidepressant “glass” is far from full, but we disagree that it is completely empty.
Hypothetically, if antidepressants are not worth taking, then what should doctors and patients do? Kirsch and colleagues recommend that if antidepressants are to be used at all they should be used only when alternative treatments have failed to provide a benefit.1 Although the authors did not specify a preferred first line treatment, they may have had psychotherapy in mind.6 7 It seems unfair that pharmacological, and not psychotherapeutic, treatment has become the usual first line approach to depression merely for economic reasons.7 But before we embrace any treatment as first line, it is prudent to ask whether its efficacy is beyond question. For psychotherapy trials, there is no equivalent of the FDA whose records we can examine, so how can we be sure that selective publication is not occurring here as well?
Our clinical recommendation is that when considering the potential benefits of treatment with antidepressants, be circumspect but not dismissive. Efficacy measured in clinical trials does not necessarily translate into effectiveness in clinical practice.8 Patients’ individual responses are like clinical trial effect sizes in that they are not all or none. Thus, when a patient is tried on his or her first antidepressant, a partial response should not be surprising or discouraging. Also, depression rating scales used in clinical trials seldom measure quality of life, which has been suggested to be a reasonable measure of clinical significance.9
With regard to policy, we reiterate our request in 2004 for drug regulatory authorities such as the FDA to make their reviews publicly available on the world wide web—retrospectively.10 Making this unbiased information more accessible will allow other researchers to move beyond antidepressants and ascertain the true efficacy of all marketed drugs.
Competing interests: None declared.
Provenance and peer review: Commissioned; not externally peer reviewed.