The effectiveness of exercise as an intervention in the management of depression: systematic review and meta-regression analysis of randomised controlled trials
BMJ 2001;322:763 (published 31 March 2001), doi: http://dx.doi.org/10.1136/bmj.322.7289.763
- Debbie A Lawlor, lecturer in epidemiology and public health medicine (a)
- Stephen W Hopker, consultant psychiatrist (b)
- (a) Department of Social Medicine, University of Bristol, Bristol BS8 2PR
- (b) Bradford Community Trust, Shipley, West Yorkshire BD18 3BP
- Correspondence to: D A Lawlor
- Accepted 1 December 2000
Objective: To determine the effectiveness of exercise as an intervention in the management of depression.
Design: Systematic review and meta-regression analysis of randomised controlled trials obtained from five electronic databases (Medline, Embase, Sports Discus, PsycLIT, Cochrane Library) and through contact with experts in the field, bibliographic searches, and hand searches of recent copies of relevant journals.
Main outcome measures: Standardised mean difference in effect size and weighted mean difference in Beck depression inventory score between exercise and no treatment and between exercise and cognitive therapy.
Results: All of the 14 studies analysed had important methodological weaknesses; randomisation was adequately concealed in only three studies, intention to treat analysis was undertaken in only two, and assessment of outcome was blinded in only one. The participants in most studies were community volunteers, and diagnosis was determined by their score on the Beck depression inventory. When compared with no treatment, exercise reduced symptoms of depression (standardised mean difference in effect size −1.1 (95% confidence interval −1.5 to −0.6); weighted mean difference in Beck depression inventory −7.3 (−10.0 to −4.6)). The effect size was significantly greater in those trials with shorter follow up and in two trials reported only as conference abstracts. The effect of exercise was similar to that of cognitive therapy (standardised mean difference −0.3 (95% confidence interval −0.7 to 0.1)).
Conclusions: The effectiveness of exercise in reducing symptoms of depression cannot be determined because of a lack of good quality research on clinical populations with adequate follow up.
What is already known on this topic
Depression is common
Management is often inadequate and many patients do not comply with antidepressant medication
The effect of exercise on depression has been a subject of interest for many years
What this study adds
Most studies of the effect of exercise on depression are of poor quality, have brief follow up, and are undertaken on non-clinical volunteers
Exercise may be efficacious in reducing symptoms of depression in the short term but its effectiveness in clinical populations is unknown
A well designed, randomised controlled trial with long term follow up is needed
Depression is a common and important cause of morbidity and mortality worldwide. Although effective pharmacological interventions are available, much depression remains inadequately treated. Compliance with antidepressant treatment is often poor: studies have shown that between 20% and 59% of patients in primary care stop taking antidepressants within three weeks of the drugs being prescribed. 1 2 The effect of exercise on depression has been the subject of research for several decades, and the literature on the subject is growing.3 In the past decade “exercise on prescription” schemes have become popular in primary care in the United Kingdom,4 many of which include depression as a referral criterion.
Several plausible mechanisms for how exercise affects depression have been proposed. In the developed world taking regular exercise is seen as a virtue; the depressed patient who takes regular exercise may, as a result, get positive feedback from other people and an increased sense of self worth. Exercise may act as a diversion from negative thoughts, and the mastery of a new skill may be important. 5 6 Social contact may be an important mechanism, and physical activity may have physiological effects such as changes in endorphin and monoamine concentrations. 7 8
Three meta-analyses have looked at the effect of exercise on depression, and all found a benefit.9–11 However, these analyses pooled data from a range of study types that included uncontrolled studies and randomised as well as non-randomised controlled trials. They also pooled data from trials that compared exercise and no treatment with data from trials that compared exercise and other forms of treatment, and they did not explicitly assess the quality of the studies. Other studies have been completed since the most recent of these meta-analyses was published. This review summarises the evidence from randomised controlled trials of the effectiveness of exercise as a treatment for depression.
Identification of the studies
We searched Medline (1966–99), Embase (1980–99), Sports Discus (1975–99), PsycLIT (1981–99), the Cochrane Controlled Trials Register, and the Cochrane Database of Systematic Reviews using the terms “exercise,” “physical activity,” “physical fitness,” “walking,” “jogging,” “running,” “cycling,” “swimming,” “depression,” “depressive disorder,” and “dysthymia.” We also examined bibliographies, contacted experts, and hand searched issues published in the 12 months to December 1999 of the following journals: BMJ, JAMA, Archives of Internal Medicine, New England Journal of Medicine, Journal of the Royal Society of Medicine, Comprehensive Psychiatry, British Journal of Psychiatry, Acta Psychiatrica Scandinavica, and British Journal of Sports Medicine. Three people independently reviewed titles and available abstracts to retrieve potentially relevant studies; a study was retrieved if any one of the three identified it.
Studies were included in the review if the participants were diagnosed as having depression (by any method of diagnosis and with any severity of depression) and were aged 18 or above (with no upper age limit). Only randomised controlled trials were included. A trial was defined as a randomised controlled trial if the allocation of participants to treatment and comparison groups was described as randomised (including terms such as “randomly,” “random,” and “randomisation”). Studies had to include depression as an outcome measure and could be in any language. We excluded studies that compared different types of exercise, those that measured outcomes immediately before and after a single exercise session, and those that looked at the effect of exercise on anxiety or other neurotic disorders. We included studies that compared exercise and other, established treatments for depression.
We assessed the quality of studies by noting whether allocation was concealed and intention to treat analysis was undertaken, and whether there was blinding. 12 13 For concealment of allocation we distinguished between trials that were adequately concealed (central randomisation at a site remote from the study; computerised allocation in which records are in a locked, unreadable file that can be accessed only after entering patient details; the drawing of sealed and opaque sequentially numbered envelopes), inadequately concealed (open list or tables of random numbers; open computer systems; drawing of non-opaque envelopes), and unclear (no information in report, and the authors either did not respond to requests for information or were unable to provide information). We defined trials as using intention to treat analysis if all the patients were analysed in the groups to which they were randomly allocated. If only those who started treatment or only those who completed treatment were included in the analysis we defined the study as not using intention to treat analysis. For blinding we distinguished between trials in which the main outcome was measured by an assessor who was blind to treatment allocation and those in which the main outcome was measured either by the participants themselves or by a non-blinded assessor.
The two authors independently extracted data (the quality criteria, participant details, intervention details, outcome measures, baseline and post-intervention results, and main conclusions), using a structured form. We resolved discrepancies by referring to the original papers and discussion.
Contact with authors
We found current contact details of all authors through correspondence addresses on study reports and by searching websites. We contacted all authors by email or post (sending three reminders to non-responders), to establish missing details in the methods and results sections of the written reports and to determine authors' knowledge of or involvement in any current work in the area. On the envelopes we put return address details and a request to inform us if the addressee was no longer at that address.
The studies used a number of psychometric instruments to assess depression, with several using more than one instrument. To include data from as many trials as possible we calculated effect sizes for each trial, using Cohen's method,14 and a standardised mean difference for the overall effect. To calculate a trial's effect size we defined the main outcome measure of depression as the one reported in the abstract or the first one reported in either the methods or results sections. As the main outcome measure in 10 of the 14 trials that were finally included was the Beck depression inventory, we also combined data from these trials to calculate the weighted mean difference in the Beck depression inventory score.
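Cohen's method expresses each trial's result as the difference in group means divided by a standard deviation, so trials using different depression scales can be combined. The Python sketch below is a minimal illustration, assuming the pooled within-group standard deviation as the denominator and the usual large-sample approximation for the variance of d; the paper cites Cohen's method but does not say whether a small-sample correction was applied. The function names and the trial numbers are hypothetical.

```python
import math


def cohens_d(mean_ex, sd_ex, n_ex, mean_ctl, sd_ctl, n_ctl):
    """Standardised mean difference, exercise minus control, using the pooled SD.

    Negative values mean lower (better) depression scores in the exercise group.
    """
    pooled_var = ((n_ex - 1) * sd_ex ** 2 + (n_ctl - 1) * sd_ctl ** 2) / (n_ex + n_ctl - 2)
    return (mean_ex - mean_ctl) / math.sqrt(pooled_var)


def var_d(d, n_ex, n_ctl):
    """Large-sample approximation to the sampling variance of d (an assumption)."""
    return (n_ex + n_ctl) / (n_ex * n_ctl) + d ** 2 / (2 * (n_ex + n_ctl))


# Hypothetical trial: post-intervention scores (mean, SD, n) in each arm.
d = cohens_d(10.0, 6.0, 20, 16.0, 6.0, 20)
print(d, var_d(d, 20, 20))
```

Each trial's d and its variance are the inputs that a pooling model then combines into an overall standardised mean difference.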
We undertook a narrative review of all studies and a meta-regression analysis of those studies with appropriate data. The effect of exercise compared with “no treatment” (controls on a waiting list; placebo intervention; or, where exercise was an adjunct, with both treatment and control groups receiving an identical established treatment) was considered separately from the effect of exercise compared with an established treatment for depression. Some studies were included in both analyses as they contained exercise, established treatment, and control groups.
We anticipated systematic differences between studies (heterogeneity). These were present in the meta-analysis of studies comparing exercise with no treatment; for these we used a random effects model based on DerSimonian and Laird's method to calculate the pooled effect size.15 The results of studies comparing exercise with cognitive therapy were homogeneous; for meta-analysis of these we used the fixed effects inverse variance method.16 We also undertook a meta-regression analysis to assess the effects of allocation concealment, intention to treat analysis, blinding, the setting (whether participants were volunteers from the community or clinical patients), baseline severity of depression (using Beck depression inventory scores or, for the two studies that did not report baseline Beck depression inventory scores for the whole study sample, imputing the mean of these scores), type of exercise (aerobic or non-aerobic), type of publication (peer reviewed journal, conference abstract, or doctoral dissertation), and length of follow up. We used STATA (version 6) statistical software for all the analyses.
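The two pooling models named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the STATA routines the authors ran: each trial contributes an effect size and its variance, the fixed effects estimate weights trials by inverse variance, and the DerSimonian and Laird estimate adds a between-study variance τ² derived from Cochran's Q. Function names and example inputs are hypothetical.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance fixed-effect pooled estimate, its SE, and Cochran's Q."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return pooled, math.sqrt(1 / sum_w), q


def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects estimate, its SE, and tau^2 (needs >= 2 studies)."""
    k = len(effects)
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    _, _, q = fixed_effect(effects, variances)
    # Method-of-moments between-study variance, truncated at zero.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight each study by 1 / (within-study variance + tau^2).
    re_weights = [1 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    return pooled, math.sqrt(1 / sum_re), tau2


def ci95(estimate, se):
    """Normal-approximation 95% confidence interval."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical effect sizes (exercise minus control) and their variances.
pooled, se, tau2 = dersimonian_laird([-2.0, -0.2], [0.1, 0.1])
print(pooled, ci95(pooled, se), tau2)
```

When Q is no larger than its degrees of freedom, τ² is truncated at zero and the random effects estimate coincides with the fixed effects one; otherwise the extra variance widens the confidence interval.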
Study inclusion and characteristics
Figure 1 summarises the process of inclusion of the studies for review and analysis. Of 72 potentially relevant studies, we excluded 56: 29 were non-systematic reviews or commentaries,17–45 15 were experimental non-randomised controlled trials,46–60 three were of psychiatric patients with mixed diagnoses and had no separate analysis for depressed patients,61–63 five did not have an outcome measure of depression,64–68 and four compared different types of exercise but had no non-exercising group.69–72
Sixteen articles reporting 14 studies fulfilled the inclusion criteria.73–88 Of these 14 studies, 10 were in the United States, 73 74 79–81 84–88 two were in the United Kingdom, 75 78 and one each were in Canada77 and Norway.82 Eight of the 14 studies compared an exercise group with a no treatment group, 74 75 77–79 82 and six compared exercise directly with an established form of therapy: four with cognitive therapy, 80 81 84 87 one with psychotherapy,88 and one with antidepressant treatment.73 Three of these also had a no treatment group 81 84 87; thus in total 11 studies compared exercise with no treatment and six compared it with an established treatment (fig 1).
Missing data and contact with authors
Authors of 11 of the 14 studies responded to our request to provide missing data, 73–78 80 82–86 88 but three were unable to provide all the information. 85 86 88 Only seven of the 14 written reports provided adequate data for statistical pooling and confirmation of study conclusions. Through contact with authors we were able to obtain adequate data for a further five.
Most studies were of poor quality. No study described how treatment allocation was concealed, and contact with authors established that allocation might have been adequately concealed in only three studies. 74 75 82 Intention to treat analysis was undertaken in two studies. 73 74 The main outcome was measured by the participants themselves, by means of a questionnaire, in all but two of the studies. 73 75 The outcome assessor in one of these exceptions was not blinded;75 therefore assessment of outcome was blind in only one of the 14 studies.
Nine of the studies involved non-clinical populations, 73 74 77 79–81 84 85 87 and most participants were recruited through the media. The participants in the study by McCann and Holmes85 were a sample obtained from a screening of all female entrants in one year to an undergraduate psychology course at the University of Kansas; the report stated that students had to participate in research as part of their course. Two studies reported financial incentives for participants. 79 80
Only four of the nine studies with non-clinical participants used a clinical interview to confirm the presence of depression, 73 74 79 84 the remainder using a cut-off point on the self reported Beck depression inventory score (each study using a different value). One of the studies in which depression was confirmed by clinical interview also included patients with dysthymia (depressed mood, without the full range of symptoms of clinical depression).74
Exercise compared with placebo intervention or as an adjunct to standard treatment
Table 1 summarises the 11 studies that compared exercise with no treatment, 10 of which had data available for analysis. The pooled standardised mean difference in effect size, calculated using the random effects model, was −1.1 (95% confidence interval −1.5 to −0.6). Significant heterogeneity between studies (Q=35.0, P<0.001) was not associated with allocation concealment, intention to treat analysis, blinding, setting, baseline severity of depression, or exercise type but was associated with type of publication and length of follow up. The reported effect of treatment was significantly larger in conference abstracts than in peer reviewed journals or doctoral dissertations (P<0.01). The estimated between-study variance (τ²) was reduced from 0.41 to 0.03 when “abstract” was added as a variable to the model. Length of follow up was significantly negatively associated with the size of effect: adding the variable “follow up” reduced τ² from 0.41 to 0.08. When both variables were included in the model, τ² was reduced to zero.
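Adding a study-level variable to the meta-regression leaves only the heterogeneity that the variable does not explain. For a single binary covariate, such as conference abstract versus any other publication type, that residual between-study variance can be sketched with a method-of-moments estimator: each group gets its own mean, and τ² is estimated from the within-group Q statistics. The paper does not say which estimator its STATA meta-regression used, so this sketch may not reproduce its τ² values; the function name and data are hypothetical.

```python
def moment_tau2(effects, variances, groups):
    """Method-of-moments tau^2 when each distinct label in `groups` gets its own mean.

    One label gives the DerSimonian-Laird estimate; two labels give the residual
    tau^2 from a meta-regression on one binary covariate. Assumes at least one
    group contains two or more studies, so the denominator is positive.
    """
    labels = sorted(set(groups))
    q = 0.0  # residual Cochran's Q, summed over groups
    c = 0.0  # scaling constant, summed over groups
    for g in labels:
        ws = [1 / v for v, gi in zip(variances, groups) if gi == g]
        ys = [y for y, gi in zip(effects, groups) if gi == g]
        s1 = sum(ws)
        mean_g = sum(w * y for w, y in zip(ws, ys)) / s1
        q += sum(w * (y - mean_g) ** 2 for w, y in zip(ws, ys))
        c += s1 - sum(w * w for w in ws) / s1
    df = len(effects) - len(labels)
    return max(0.0, (q - df) / c)


effects = [-2.4, -2.2, -0.8, -0.6, -0.7, -0.5]
variances = [0.1] * 6
abstract = [1, 1, 0, 0, 0, 0]  # hypothetical: first two are conference abstracts

print(moment_tau2(effects, variances, [0] * 6))   # no covariate: roughly 0.64
print(moment_tau2(effects, variances, abstract))  # adjusted for "abstract": 0.0
```

In this made-up example, splitting the trials by publication type removes all the excess variation, so the residual τ² is truncated at zero.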
Figure 2 shows the standardised mean differences in effect size of the 10 studies that provided these data, with the studies listed in order of length of intervention. Pooling studies according to type of publication gave standardised mean differences, calculated using the fixed effects model, of −0.7 (−1.0 to −0.5; n=8) for journal papers and dissertations and −2.3 (−2.9 to −1.8; n=2) for conference abstracts; pooling according to duration of intervention gave −1.8 (−2.3 to −1.3; n=2) for less than eight weeks, −1.3 (−1.8 to −0.9; n=4) for eight weeks, and −0.6 (−0.9 to −0.3; n=4) for more than eight weeks (there was no significant heterogeneity within these subgroups). Although the effect size remained significant when the two conference abstracts were excluded and when we analysed only those studies of more than eight weeks' duration, the effect was reduced.
Pooling the nine studies that used the Beck depression inventory as a measure of depression gave a weighted mean difference in the score of −7.3 (−10.0 to −4.6). Again there was significant heterogeneity, associated with type of publication and length of follow up.
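Unlike the standardised mean difference, the weighted mean difference stays in Beck depression inventory points, so each trial's input is simply the raw difference in means, with variance SD²/n summed over the two arms. The Python sketch below shows that input and a fixed effects inverse variance pooled value. The paper does not state which pooling model it used for the weighted mean difference, and given the heterogeneity it reports a random effects version may be more appropriate; all names and numbers here are hypothetical.

```python
import math


def mean_difference(mean_ex, sd_ex, n_ex, mean_ctl, sd_ctl, n_ctl):
    """Raw difference in mean score (exercise minus control) and its variance."""
    md = mean_ex - mean_ctl
    var = sd_ex ** 2 / n_ex + sd_ctl ** 2 / n_ctl
    return md, var


def pooled_wmd(studies):
    """Fixed-effect inverse-variance weighted mean difference and its standard error."""
    pairs = [mean_difference(*s) for s in studies]
    weights = [1 / var for _, var in pairs]
    wmd = sum(w * md for w, (md, _) in zip(weights, pairs)) / sum(weights)
    return wmd, math.sqrt(1 / sum(weights))


# Hypothetical trials: (mean, SD, n) for exercise, then (mean, SD, n) for control.
trials = [(10.0, 6.0, 20, 18.0, 8.0, 20), (12.0, 5.0, 25, 16.0, 5.0, 25)]
print(pooled_wmd(trials))
```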
Exercise compared with standard treatments for depression
Table 2 summarises the six studies that compared exercise and standard interventions, four of which compared exercise and cognitive therapy. Figure 3 shows the standardised mean differences for these four studies; the difference in effect size between exercise and cognitive therapy was not significant (standardised mean difference −0.3 (95% confidence interval −0.7 to 0.1)). These studies were homogeneous (Q=2.9, P=0.4).
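A Q statistic is judged against a chi-square distribution with k - 1 degrees of freedom, where k is the number of studies. The standard-library Python sketch below computes that upper-tail probability for integer degrees of freedom using closed-form series, and applies it to the two Q values reported in this paper, assuming that Q=35.0 was computed from the 10 studies with data (9 degrees of freedom) and Q=2.9 from the four cognitive therapy studies (3 degrees of freedom).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: finite Poisson-sum identity.
        term = math.exp(-x / 2)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return total
    # Odd df: start from df = 1 (a squared normal) and step up two df at a time.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def heterogeneity_p(q: float, k: int) -> float:
    """P value for Cochran's Q from k studies (df = k - 1)."""
    return chi2_sf(q, k - 1)


print(round(heterogeneity_p(2.9, 4), 2))  # 0.41
print(heterogeneity_p(35.0, 10) < 0.001)  # True
```

The first value is consistent with the reported P=0.4, and the second is well below 0.001, consistent with the heterogeneity reported for the no treatment comparison.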
Only one study compared exercise and standard antidepressant treatment.73 Its main outcome measure—the Hamilton rating scale of depression—did not differ significantly between the groups of patients receiving the exercise intervention, medication, or both; and at the end of the intervention period the proportion of patients diagnosed as no longer depressed was similar in each group.
Quality of the studies
Exercise may be efficacious in reducing depressive symptoms, but the poor quality of much of the evidence is of concern. The fact that none of the measures of study quality explained the variation among the studies is likely to be due to the low quality of most of the studies. The size of the effect is increased by results from two unpublished conference abstracts and studies with a shorter follow up period, suggesting that results may be sustained only in the short term. All the studies reported results at the end of the intervention, and only one study followed patients up beyond the completion of the intervention. This was the only study that found no effect of exercise, compared with the control group, at the end of the intervention period (12 weeks); at nine months' follow up the reduction in symptoms remained similar in the exercise intervention, control (meditation), and cognitive therapy groups.84 Thus this evidence does not support a sustained effect of exercise beyond the intervention period. Participants from one other study are being followed up for two years (N Singh, personal communication, 1999),74 and the results of this follow up will provide important information.
The size of the effect of exercise compared with no treatment in the studies we analysed is similar to those found by three previous meta-analyses.9–11 We aimed to provide a better quality analysis by including only trials that were described as randomised controlled trials. Our results did not differ from those of meta-analyses that also included non-randomised trials and observational data; this may be because the effect of randomisation was mitigated by the lack of adequate concealment, intention to treat analysis, and blinding, making the trials in our analysis no better than non-randomised trials.
Type of exercise
There was no association between type of exercise and the variation in results between studies, indicating that aerobic and non-aerobic exercise have a similar effect. Studies directly comparing different exercise types support such a finding.69–72 However, this may be because the effect is due to psychosocial factors, such as learning a new skill or socialising, rather than to the exercise itself. None of the participants in the studies we reviewed exercised alone: they were either with other participants or with a coach. McNeil et al included a social contact control group and found no significant difference in the effect on depressive symptoms between this group and the exercise group.77
Our aim was to assess clinical effectiveness—that is, the likely effect of exercise on clinical patients in everyday practice. Although no trial can exactly replicate everyday practice, the screening out of individuals who were not motivated to exercise, the use of non-clinical volunteers, and the lack of intention to treat analysis in most of the studies suggest that our results overestimate what would be likely in real life. In the United Kingdom rates of compliance with “exercise on prescription” schemes among patients with any referral criteria vary from 20% to 50%. 4 89 It is reasonable to assume that compliance among patients with depression would be similar or worse. Salmon has pointed out that the allocation of depressed patients in these studies to activities such as running or aerobics “must puzzle clinicians, who in treating depressed people, often have to contend with an absence of motivation to tackle much less strenuous features of life's routine.”37 Baseline severity of depression, when added to the regression model, was not associated with any of the systematic differences between studies. This suggests that, although different criteria for determining inclusion were used, participants in each study had similar levels of depression. However, the fact that most studies used non-clinical participants means that the results may be less generalisable.
To use as much of the available data as possible we calculated the standardised mean difference using effect sizes. The result is therefore expressed in standard deviation units. That is to say, our results show that people who exercise are “1.1 standard deviations less depressed than non-exercisers”; in clinical terms such a result is difficult to understand. We also calculated pooled differences in the mean score on the Beck depression inventory (a common instrument in mental health research) for those studies in which this measure was used. We argue, however, that even such a well known instrument is difficult to interpret clinically. Our result shows that people who exercise score 7.3 points lower on the Beck depression inventory than those who do not exercise, a result that is likely to have little meaning for most doctors and patients. A more useful outcome measure would be the likelihood of being depressed after the intervention, but only two studies included a dichotomous outcome. 73 81 Epstein, comparing exercise and no treatment, found no significant difference between the exercise and control groups in the numbers of participants who were still diagnosed as having depression.81 A dichotomous result is a more understandable and perhaps a more important outcome measure in clinical terms, and such measures should be included in future research in this area.
Many of the problems we identified in the studies we reviewed are also present in research into other interventions in the management of depression, 90 91 highlighting the need for better quality research in the area of depression. We conclude that it is not possible to determine from the available evidence the effectiveness of exercise in the management of depression. However, exercise may be efficacious in reducing the symptoms of depression in some volunteers in the short term. Doctors could recommend more physical activity to their motivated patients, but this should not replace standard treatment, particularly for those with severe disease. Other health benefits could accrue to patients who do become more active. 92 93 There is a need for well designed, randomised controlled trials on a clinical population that measure both continuous and dichotomous outcomes and that follow up participants for at least 12 months.
This work began as part of a training course at the NHS Centre for Reviews and Dissemination, University of York, and we thank Jos Kleijnen and other staff at the centre for their help. Alan Lui (audit nurse, Airedale General Hospital, West Yorkshire) helped with the protocol development and retrieval of articles. Domenico Scala (senior house officer, psychiatry, Lynfield Mount Hospital, Bradford) translated one Italian paper. Matthias Egger and David Gunnell (Department of Social Medicine, University of Bristol) gave useful comments on an earlier draft.
Contributors: Both authors developed the idea for the review, the protocol, and the search strategy, applied the search strategy, and independently extracted data from retrieved articles. DAL undertook all statistical analyses and wrote the original draft of the paper. Both authors contributed to the final version of the paper, and both act as guarantors.
Competing interests: None declared.