Effect of exercise for depression: systematic review and network meta-analysis of randomised controlled trials
BMJ 2024; 384 doi: https://doi.org/10.1136/bmj-2023-075847 (Published 14 February 2024) Cite this as: BMJ 2024;384:e075847
- Michael Noetel, senior lecturer1,
- Taren Sanders, senior research fellow2,
- Daniel Gallardo-Gómez, doctoral student3,
- Paul Taylor, deputy head of school4,
- Borja del Pozo Cruz, associate professor56,
- Daniel van den Hoek, senior lecturer7,
- Jordan J Smith, senior lecturer8,
- John Mahoney, senior lecturer9,
- Jemima Spathis, senior lecturer9,
- Mark Moresi, lecturer4,
- Rebecca Pagano, senior lecturer10,
- Lisa Pagano, postdoctoral fellow11,
- Roberta Vasconcellos, doctoral student2,
- Hugh Arnott, masters student2,
- Benjamin Varley, doctoral student12,
- Philip Parker, pro vice chancellor research13,
- Stuart Biddle, professor1415,
- Chris Lonsdale, deputy provost13
- 1School of Psychology, University of Queensland, St Lucia, QLD 4072, Australia
- 2Institute for Positive Psychology and Education, Australian Catholic University, North Sydney, NSW, Australia
- 3Department of Physical Education and Sport, University of Seville, Seville, Spain
- 4School of Health and Behavioural Sciences, Australian Catholic University, Strathfield, NSW, Australia
- 5Department of Clinical Biomechanics and Sports Science, University of Southern Denmark, Odense, Denmark
- 6Biomedical Research and Innovation Institute of Cádiz (INiBICA) Research Unit, University of Cádiz, Spain
- 7School of Health and Behavioural Sciences, University of the Sunshine Coast, Petrie, QLD, Australia
- 8School of Education, University of Newcastle, Callaghan, NSW, Australia
- 9School of Health and Behavioural Sciences, Australian Catholic University, Banyo, QLD, Australia
- 10School of Education, Australian Catholic University, Strathfield, NSW, Australia
- 11Australian Institute of Health Innovation, Macquarie University, Macquarie Park, NSW, Australia
- 12Children’s Hospital Westmead Clinical School, University of Sydney, Westmead, NSW, Australia
- 13Australian Catholic University, North Sydney, NSW, Australia
- 14Centre for Health Research, University of Southern Queensland, Springfield, QLD, Australia
- 15Faculty of Sport and Health Science, University of Jyvaskyla, Jyvaskyla, Finland
- Correspondence to: M Noetel m.noetel{at}uq.edu.au (or @mnoetel on Twitter)
- Accepted 15 January 2024
Abstract
Objective To identify the optimal dose and modality of exercise for treating major depressive disorder, compared with psychotherapy, antidepressants, and control conditions.
Design Systematic review and network meta-analysis.
Methods Screening, data extraction, coding, and risk of bias assessment were performed independently and in duplicate. Bayesian arm based, multilevel network meta-analyses were performed for the primary analyses. Quality of the evidence for each arm was graded using the confidence in network meta-analysis (CINeMA) online tool.
Data sources Cochrane Library, Medline, Embase, SPORTDiscus, and PsycINFO databases.
Eligibility criteria for selecting studies Any randomised trial with exercise arms for participants meeting clinical cut-offs for major depression.
Results 218 unique studies with a total of 495 arms and 14 170 participants were included. Compared with active controls (eg, usual care, placebo tablet), moderate reductions in depression were found for walking or jogging (n=1210, κ=51, Hedges’ g −0.62, 95% credible interval −0.80 to −0.45), yoga (n=1047, κ=33, g −0.55, −0.73 to −0.36), strength training (n=643, κ=22, g −0.49, −0.69 to −0.29), mixed aerobic exercises (n=1286, κ=51, g −0.43, −0.61 to −0.24), and tai chi or qigong (n=343, κ=12, g −0.42, −0.65 to −0.21). The effects of exercise were proportional to the intensity prescribed. Strength training and yoga appeared to be the most acceptable modalities. Results appeared robust to publication bias, but only one study met the Cochrane criteria for low risk of bias. As a result, confidence in accordance with CINeMA was low for walking or jogging and very low for other treatments.
Conclusions Exercise is an effective treatment for depression, with walking or jogging, yoga, and strength training more effective than other exercises, particularly when intense. Yoga and strength training were well tolerated compared with other treatments. Exercise appeared equally effective for people with and without comorbidities and with different baseline levels of depression. To mitigate expectancy effects, future studies could aim to blind participants and staff. These forms of exercise could be considered alongside psychotherapy and antidepressants as core treatments for depression.
Systematic review registration PROSPERO CRD42018118040.
Introduction
Major depressive disorder is a leading cause of disability worldwide1 and has been found to lower life satisfaction more than debt, divorce, and diabetes2 and to exacerbate comorbidities, including heart disease,3 anxiety,4 and cancer.5 Although people with major depressive disorder often respond well to drug treatments and psychotherapy,67 many are resistant to treatment.8 In addition, access to treatment for many people with depression is limited, with only 51% treatment coverage for high income countries and 20% for low and lower-middle income countries.9 More evidence based treatments are therefore needed.
Exercise may be an effective complement or alternative to drugs and psychotherapy.1011121314 In addition to mental health benefits, exercise also improves a range of physical and cognitive outcomes.151617 Clinical practice guidelines in the US, UK, and Australia recommend physical activity as part of treatment for depression.18192021 But these guidelines do not provide clear, consistent recommendations about dose or exercise modality. British guidelines recommend group exercise programmes2021 and offer general recommendations to increase any form of physical activity,21 the American Psychiatric Association recommends any dose of aerobic exercise or resistance training,20 and Australian and New Zealand guidelines suggest a combination of strength and vigorous aerobic exercises, with at least two or three bouts weekly.19
Authors of guidelines may find it hard to provide consistent recommendations on the basis of existing meta-analyses, which are mainly pairwise (that is, they assess a specific modality versus a specific comparator in a distinct group of participants).121322 These meta-analyses have come under scrutiny for pooling heterogeneous treatments and heterogeneous comparisons, leading to ambiguous effect estimates.23 Reviews also face the opposite problem, excluding exercise treatments such as yoga, tai chi, and qigong because grouping them with strength training might be inappropriate.23 Overviews of reviews have tried to deal with this problem by combining pairwise meta-analyses on individual treatments. One recent overview of this kind found no differences between exercise modalities.13 Comparing effect sizes between different pairwise meta-analyses can also lead to confusion because analytical methods differ between meta-analyses, such as the choice of control used as the referent. Network meta-analyses are a better way to precisely quantify differences between interventions as they simultaneously model the direct and indirect comparisons between interventions.24
Network meta-analyses have been used to compare different types of psychotherapy and pharmacotherapy for depression.62526 For exercise, they have shown that dose and modality influence outcomes for cognition,16 back pain,15 and blood pressure.17 Two network meta-analyses explored the effects of exercise on depression: one among older adults27 and the other for mental health conditions.28 Because of the inclusion criteria and search strategies used, these reviews might have been under-powered to explore moderators such as dose and modality (κ=15 and κ=71, respectively). To resolve conflicting findings in existing reviews, we comprehensively searched randomised trials on exercise for depression to ensure our review was adequately powered to identify the optimal dose and modality of exercise. For example, a large overview of reviews found effects on depression to be proportional to intensity, with vigorous exercise appearing to be better,13 but a later meta-analysis found no such effects.22 We explored whether recommendations differ based on participants’ sex, age, and baseline level of depression.
Given the challenges presented by behaviour change in people with depression,29 we also identified autonomy support or behaviour change techniques that might improve the effects of intervention.30 Behaviour change techniques such as self-monitoring and action planning have been shown to influence the effects of physical activity interventions in adults (>18 years)31 and older adults (>60 years)32 with differing effectiveness of techniques in different populations. We therefore tested whether any intervention components from the behaviour change technique taxonomy were associated with higher or lower intervention effects.30 Other meta-analyses found that physical activity interventions work better when they provide people with autonomy (eg, choices, invitational language).33 Autonomy is not well captured in the taxonomy for behaviour change technique. We therefore tested whether effects were stronger in studies that provided more autonomy support to patients. Finally, to understand the mechanism of intervention effects, such as self-confidence, affect, and physical fitness, we collated all studies that conducted formal mediation analyses.
Methods
Our findings are presented according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses-Network Meta-analyses (PRISMA-NMA) guidelines (see supplementary file, section S0; all supplementary files, data, and code are also available at https://osf.io/nzw6u/).34 We amended our analysis strategy after registering our review; these changes were to better align with new norms established by the Cochrane Comparing Multiple Interventions Methods Group.35 These norms were introduced between the publication of our protocol and the preparation of this manuscript. The largest change was using the confidence in network meta-analysis (CINeMA)35 online tool instead of the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) guidelines and adopting methods to facilitate assessments—for example, instead of using an omnibus test for all treatments, we assessed publication bias for each treatment compared with active controls. We also modelled acceptability (through dropout rate), which was not predefined but was adopted in response to a reviewer’s comment.
Eligibility criteria
To be eligible for inclusion, studies had to be randomised controlled trials that included exercise as a treatment for depression and included participants who met the criteria for major depressive disorder, either clinician diagnosed or identified through participant self-report as exceeding established clinical thresholds (eg, scored >13 on the Beck depression inventory-II).36 Studies could meet these criteria when all the participants had depression or when the study reported depression outcomes for a subgroup of participants with depression at the start of the study.
We defined exercise as “planned, structured and repetitive bodily movement done to improve or maintain one or more components of physical fitness.”37 Unlike recent reviews,1222 we included studies with more than one exercise arm and multifaceted interventions (eg, health and exercise counselling) as long as they contained a substantial exercise component. These trials could be included because network meta-analysis methods allow for the grouping of those interventions into homogeneous nodes. Unlike the most recent Cochrane review,12 we also included participants with physical comorbidities such as arthritis and participants with postpartum depression because the Diagnostic and Statistical Manual of Mental Disorders, fifth edition, removed the postpartum onset specifier after that analysis was completed.23 Studies were excluded if interventions were shorter than one week, depression was not reported as an outcome, or data were insufficient to calculate an effect size for each arm. Any comparison condition was included, allowing us to quantify the effects against established treatments (eg, selective serotonin reuptake inhibitors (SSRIs), cognitive behavioural therapy), active control conditions (usual care, placebo tablet, stretching, educational control, and social support), or waitlist control conditions. Published and unpublished studies were included, with no restrictions on language applied.
Information sources
We adapted the search strategy from the most recent Cochrane review,12 adding keywords for yoga, tai chi, and qigong, as they met our definition for exercise. We conducted database searches, without filters or date limits, in The Cochrane Library via CENTRAL, SPORTDiscus via Embase, and Medline, Embase, and PsycINFO via Ovid. Searches of the databases were conducted on 17 December 2018 and 7 August 2020 and last updated on 3 June 2023 (see supplementary file section S1 for full search strategies). We assessed full texts of all included studies from two systematic reviews of exercise for depression.1222
Study selection and data collection
To select studies, we removed duplicate records in Covidence38 and then screened each title and abstract independently and in duplicate. Conflicts were resolved through discussion or consultation with a third reviewer. The same methods were used for full text screening.
We used the Extraction 1.0 randomised controlled trial data extraction forms in Covidence.38 Data were extracted independently and in duplicate, with conflicts resolved through discussion with a third reviewer.
Data items
For each study, we extracted a description of the interventions, including the frequency, intensity, type, and time of each exercise intervention. Using the Compendium of Physical Activities,39 we calculated the energy expenditure dose of exercise for each arm as metabolic equivalents of task (METs) min/week. Two authors evaluated each exercise intervention using the Behaviour Change Technique Taxonomy (version 1)30 for behaviour change techniques explicitly described in each exercise arm. They also rated the level of autonomy offered to participants, on a scale from 1 (no choice) to 10 (full autonomy). We also extracted descriptions of the other arms within the randomised trials, including other treatment or control conditions; participants’ age, sex, comorbidities, and baseline severity of depressive symptoms; and each trial’s location and whether or not the trial was funded.
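As an illustration of the energy expenditure dose described above, the following minimal sketch multiplies an activity's MET value by the minutes per session and the sessions per week. The function name and the example MET value of 4.0 are assumptions made for illustration only; they are not taken from the review's own code or from a specific Compendium entry.

```python
def weekly_dose_met_minutes(met_value, minutes_per_session, sessions_per_week):
    """Return the prescribed exercise dose in MET-minutes per week.

    met_value: intensity of the activity in METs (hypothetical input,
               normally looked up in the Compendium of Physical Activities)
    minutes_per_session: prescribed duration of one session
    sessions_per_week: prescribed weekly frequency
    """
    if min(met_value, minutes_per_session, sessions_per_week) < 0:
        raise ValueError("inputs must be non-negative")
    return met_value * minutes_per_session * sessions_per_week


# Hypothetical prescription: an activity assumed to cost 4.0 METs,
# 30 minutes per session, three sessions per week.
dose = weekly_dose_met_minutes(4.0, 30, 3)
print(dose)  # 360.0
```

Under this definition, two arms can share the same weekly dose while differing in intensity (eg, shorter vigorous sessions versus longer light sessions), which is why intensity and dose can be examined as separate moderators.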
Risk of bias in individual studies
We used Cochrane’s risk of bias tool for randomised controlled trials.40 Risk of bias was rated independently and in duplicate, with conflicts resolved through discussion with a third reviewer.
Summary measures and synthesis
For main and moderation analyses, we used bayesian arm based multilevel network meta-analysis models.41 All network meta-analytical approaches allow users to assess the effects of treatments against a range of comparisons. The bayesian arm based models allowed us to also assess the influence of hypothesised moderators, such as intensity, dose, age, and sex. Many network meta-analyses use contrast based methods, comparing post-test scores between study arms.41 Arm based meta-analyses instead describe the population-averaged absolute effect size for each treatment arm (ie, each arm’s change score).41 As a result, the summary measure we used was the standardised mean change from baseline, calculated as standardised mean differences with correction for small studies (Hedges’ g). In keeping with the norms from the included studies, effect sizes describe treatment effects on depression, such that larger negative numbers represent stronger effects on symptoms. Using National Institute for Health and Care Excellence guidelines,42 we standardised change scores for different depression scales (eg, Beck depression inventory, Hamilton depression rating scale) using an internal reference standard for each scale (for each scale, the average of pooled standard deviations at baseline) reported in our meta-analysis. Because depression scores generally show regression to the mean, even in control conditions, we present effect sizes as improvements beyond active control conditions. This convention makes our results comparable to existing, contrast based meta-analyses.
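The summary measure described above can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the common small sample correction factor J = 1 − 3/(4df − 1) with df = n − 1, and it takes the reference standard deviation for the scale as a given input. All numbers in the example are hypothetical.

```python
def hedges_g_change(mean_pre, mean_post, sd_reference, n):
    """Standardised mean change from baseline with a small sample correction.

    sd_reference: reference standard deviation for the depression scale
                  (assumed to be supplied, eg, a pooled baseline SD)
    n: number of participants in the arm
    Negative values indicate a reduction in depressive symptoms.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if sd_reference <= 0:
        raise ValueError("sd_reference must be positive")
    d = (mean_post - mean_pre) / sd_reference   # uncorrected standardised change
    df = n - 1                                  # assumed degrees of freedom
    correction = 1 - 3 / (4 * df - 1)           # Hedges' small sample factor J
    return correction * d


# Hypothetical arm: scores fall from 28 to 18, reference SD of 10, 26 participants.
g = hedges_g_change(mean_pre=28, mean_post=18, sd_reference=10, n=26)
print(round(g, 2))  # -0.97
```

The correction shrinks the estimate slightly towards zero, and the shrinkage becomes negligible as the arm size grows.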
Active control conditions (usual care, placebo tablet, stretching, educational control, and social support) were grouped to increase power for moderation analyses, for parsimony in the network graph, and because they all showed similar arm based pooled effect sizes (Hedges’ g between −0.93 and −1.00 for all, with no statistically significant differences). We separated waitlist control from these active control conditions because it typically shows poorer effects in treatment for depression.43
Bayesian meta-analyses were conducted in R44 using the brms package.45 We preregistered informative priors based on the distributional parameters of our meta-analytical model.46 We nested effects within arms to manage dependency between multiple effect sizes from the same participants.46 For example, if one study reported two self-reported measures of depression, or reported both self-report and clinician rated depression, we nested these effect sizes within the arm to account for both pieces of information while controlling for dependency between effects.46 Finally, we compared absolute effect sizes against a standardised minimum clinically important difference, 0.5 standard deviations of the change score.47 From our data, this corresponded to a large change in before and after scores (Hedges’ g −1.16), a moderate change compared with waitlist control (g −0.55), or a small benefit when compared with active controls (g −0.20). For credibility assessments comparing exercise modalities, we used the netmeta package48 and CINeMA.49 We also used netmeta to model acceptability, comparing the odds ratio for drop-out rate in each arm.
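The relation between the absolute threshold and the threshold relative to active controls can be checked with simple arithmetic. The sketch below assumes a pooled active control change of g=−0.96, a value chosen only because it lies within the reported range for active controls (−0.93 to −1.00) and reproduces the −0.20 margin; it is not a figure reported in the text.

```python
ABSOLUTE_THRESHOLD = -1.16          # absolute change equal to the minimum important difference
ASSUMED_ACTIVE_CONTROL = -0.96      # hypothetical pooled change in active control arms


def relative_threshold(absolute_threshold, comparator_change):
    """Express an absolute change threshold as a benefit beyond a comparator."""
    return absolute_threshold - comparator_change


def exceeds_threshold(effect_vs_control, threshold_vs_control):
    """True when an effect is at least as large (as negative) as the threshold."""
    return effect_vs_control <= threshold_vs_control


threshold = relative_threshold(ABSOLUTE_THRESHOLD, ASSUMED_ACTIVE_CONTROL)
print(round(threshold, 2))                               # -0.2
print(exceeds_threshold(-0.63, round(threshold, 2)))     # True
```

The second call uses the point estimate reported for walking or jogging compared with active controls; a full assessment would compare the whole credible interval, not only the point estimate.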
Additional analyses
All prespecified moderation and sensitivity analyses were performed. We moderated for participant characteristics, including participants’ sex, age, baseline symptom severity, and presence or absence of comorbidities; duration of the intervention (weeks); weekly dose of the intervention; duration between completion of treatment and measurement, to test robustness to remission (in response to a reviewer’s suggestion); amount of autonomy provided in the exercise prescription; and presence of each behaviour change technique. As preregistered, we moderated for behaviour change techniques in three ways: through meta-regression, including all behaviour change techniques simultaneously for primary analysis; including one behaviour change technique at a time (using 99% credible intervals to somewhat control for multiple comparisons) in exploratory analyses; and through meta-analytical classification and regression trees (metaCART), which allowed for interactions between moderating variables (eg, if goal setting combined with feedback had synergistic effects).50 We conducted sensitivity analyses for risk of bias, assessing whether studies with low versus unclear or high risk of bias on each domain showed statistically significant differences in effect sizes.
Credibility assessment
To assess the credibility of each comparison against active control, we used CINeMA.3549 This online tool was designed by the Cochrane Comparing Multiple Interventions Methods Group as an adaptation of GRADE for network meta-analyses.35 In line with recommended guidelines, for each comparison we made judgements for within study bias, reporting bias, indirectness, imprecision, heterogeneity, and incoherence. Similar to GRADE, we initially considered the evidence for each comparison to show high confidence and then downgraded it on the basis of concerns in each domain, as follows:
Within study bias—Comparisons were downgraded when most of the studies providing direct evidence for comparisons were unclear or high risk.
Reporting bias—Publication bias was assessed in three ways. For each comparison with at least 10 studies51 we created funnel plots, including estimates of effect sizes after removing studies with statistically significant findings (ie, worst case estimates)52; calculated an s value, representing how strong publication bias would need to be to nullify meta-analytical effects52; and conducted a multilevel Egger’s regression test, indicative of small study bias. Given these tests are not recommended for comparisons with fewer than 10 studies,51 those comparisons were considered to show “some concerns.”
Indirectness—Our primary population of interest was adults with major depression. Studies were considered to be indirect if they focused on one sex only (>90% male or female), participants with comorbidities (eg, heart disease), adolescents and young adults (14-20 years), or older adults (>60 years). We flagged these studies as showing some concerns if one of these factors was present, and as “major concerns” if two of these factors were present. Evidence from comparisons was classified as some concerns or major concerns using majority rating for studies directly informing the comparison.
Imprecision—As per CINeMA, we used the clinically important difference of Hedges’ g=0.2 to ascribe a zone of equivalence, where differences were not considered clinically significant (−0.2<g<0.2). Studies were flagged as some concerns for imprecision if the bounds of the 95% credible interval extended across that zone, and they were flagged as major concerns if the bounds extended to the other side of the zone of equivalence (such that effects could be harmful).
Heterogeneity—Prediction intervals account for heterogeneity differently from credible intervals.35 As a result, CINeMA accounts for heterogeneity by assessing whether the prediction intervals and the credible intervals lead to different conclusions about clinical significance (using the same zone of equivalence from imprecision). Comparisons are flagged as some concerns if the prediction interval crosses into, or out of, the zone of equivalence once (eg, from helpful to no meaningful effect), and as major concerns if the prediction interval crosses the zone twice (eg, from helpful to harmful).
Incoherence—Incoherence assesses whether the network meta-analysis provides similar estimates when using direct evidence (eg, randomised controlled trials on strength training versus SSRI) compared with indirect evidence (eg, randomised controlled trials where either strength training or SSRI uses waitlist control). Incoherence provides some evidence the network may violate the assumption of transitivity: that the only systematic difference between arms is the treatment, not other confounders. We assessed incoherence using two methods: firstly, a global design-by-treatment interaction to assess for incoherence across the whole network,3549 and, secondly, separating indirect and direct evidence (SIDE method) for each comparison through netsplitting to see whether differences between those effect estimates were statistically significant. We flagged comparisons as some concerns if either no direct comparisons were available or direct and indirect evidence gave different conclusions about clinical significance (eg, from helpful to no meaningful effect, as per imprecision and heterogeneity). Again, we classified comparisons as major concerns if the direct and indirect evidence changed the sign of the effect or changed both limits of the credible interval.3549
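The imprecision and heterogeneity rules above can be sketched as a count of how many boundaries of the zone of equivalence (−0.2 and 0.2) an interval crosses. This is a simplified reading of the rules as described in the text, not the CINeMA tool's implementation; the function names are hypothetical, and the heterogeneity rule is approximated as the number of additional boundaries crossed by the prediction interval beyond those crossed by the credible interval.

```python
ZONE = (-0.2, 0.2)  # zone of equivalence for Hedges' g, from the text
LABELS = ("no concerns", "some concerns", "major concerns")


def boundaries_crossed(lower, upper, zone=ZONE):
    """Count how many zone boundaries lie strictly inside the interval."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return sum(lower < bound < upper for bound in zone)


def imprecision(cred_lower, cred_upper):
    """Rate imprecision from the 95% credible interval."""
    return LABELS[boundaries_crossed(cred_lower, cred_upper)]


def heterogeneity(credible, prediction):
    """Rate heterogeneity from extra boundaries crossed by the prediction interval."""
    extra = boundaries_crossed(*prediction) - boundaries_crossed(*credible)
    return LABELS[max(0, extra)]


print(imprecision(-0.80, -0.46))   # no concerns: interval entirely beyond the zone
print(imprecision(-0.50, -0.01))   # some concerns: interval extends into the zone
print(imprecision(-0.78, 0.23))    # major concerns: interval reaches the harmful side
# Hypothetical prediction interval that is wider than the credible interval.
print(heterogeneity((-0.80, -0.46), (-1.10, -0.10)))  # some concerns
```

The intervals passed to `imprecision` are used only as example inputs; the resulting labels are not the ratings assigned by the authors.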
Patient and public involvement
We discussed the aims and design of this study with members of the public, including those who had experienced depression. Several of our authors have experienced major depressive episodes, but beyond that we did not include patients in the conduct of this review.
Results
Study selection
The PRISMA flow diagram outlines the study selection process (fig 1). We used two previous reviews to identify potentially eligible studies for inclusion.1222 Database searches identified 18 658 possible studies. After 5505 duplicates had been removed, two reviewers independently screened 13 115 titles and abstracts. After screening, two reviewers independently reviewed 1738 full text articles. Supplementary file section S2 shows the consensus reasons for exclusion. A total of 218 unique studies described in 246 reports were included, totalling 495 arms and 14 170 participants. Supplementary file section S3 lists the references and characteristics of the included studies.
Network geometry
As preregistered, we removed nodes with fewer than 100 participants. Using this filter, most interventions contained comparisons with at least four other nodes in the network geometry (fig 2). The results of the global test design-by-treatment interaction model were not statistically significant, supporting the assumption of transitivity (χ2=94.92, df=75, P=0.06). When net-splitting was used on all possible combinations in the network, for two out of the 120 comparisons we found statistically significant incoherence between direct and indirect evidence (SSRI v waitlist control; cognitive behavioural therapy v tai chi or qigong). Overall, we found little statistical evidence that the model violated the assumption of transitivity. Qualitative differences were, however, found for participant characteristics between different arms (see supplementary file, section S4). For example, some interventions appeared to be prescribed more frequently among people with severe depression (eg, 7/16 studies using SSRIs) compared with other interventions (eg, 1/15 studies using aerobic exercise combined with therapy). Similarly, some interventions appeared more likely to be prescribed for older adults (eg, mean age, tai chi=59 v dance=31) or women (eg, per cent female: dance=88% v cycling=53%). Given that plausible mechanisms exist for these systematic differences (eg, the popularity of tai chi among older adults),53 there are reasons to believe that allocation to treatment arms would be less than perfectly random. We have factored these biases in our certainty estimates through indirectness ratings.
Risk of bias within studies
Supplementary file section S5 provides the risk of bias ratings for each study. Few studies explicitly blinded participants and staff (fig 3). As a result, overall risk of bias for most studies was unclear or high, and effect sizes could include expectancy effects, among other biases. Sensitivity analyses found no clear influence of any risk of bias criterion on effect sizes, although credible intervals were wide (see supplementary file, section S6). Nevertheless, certainty ratings for all treatment arms were downgraded owing to high risk of bias in the studies informing the comparison.
Synthesis of results
Supplementary file section S7 presents a forest plot of Hedges’ g values for each study. Figure 4 shows the predicted effects of each treatment compared with active controls. Compared with active controls, large reductions in depression were found for dance (n=107, κ=5, Hedges’ g −0.96, 95% credible interval −1.36 to −0.56) and moderate reductions for walking or jogging (n=1210, κ=51, g −0.63, −0.80 to −0.46), yoga (n=1047, κ=33, g=−0.55, −0.73 to −0.36), strength training (n=643, κ=22, g=−0.49, −0.69 to −0.29), mixed aerobic exercises (n=1286, κ=51, g=−0.43, −0.61 to −0.25), and tai chi or qigong (n=343, κ=12, g=−0.42, −0.65 to −0.21). Moderate, clinically meaningful effects were also present when exercise was combined with SSRIs (n=268, κ=11, g=−0.55, −0.86 to −0.23) or aerobic exercise was combined with psychotherapy (n=404, κ=15, g=−0.54, −0.76 to −0.32). All these treatments were significantly stronger than the standardised minimum clinically important difference compared with active control (g=−0.20), equating to an absolute g value of −1.16. Dance, exercise combined with SSRIs, and walking or jogging were the treatments most likely to perform best when modelling the surface under the cumulative ranking curve (fig 4). For acceptability, the odds of participants dropping out of the study were lower for strength training (n=247, direct evidence κ=6, odds ratio 0.55, 95% credible interval 0.31 to 0.99) and yoga (n=264, κ=5, 0.57, 0.35 to 0.94) than for active control. The rate of dropouts was not significantly different from active control in any other arms (see supplementary file, section S8).
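The surface under the cumulative ranking curve (SUCRA) mentioned above summarises a treatment's rank probabilities as a single number between 0 (certain to be worst) and 1 (certain to be best). The sketch below uses the standard definition, the mean of the cumulative rank probabilities over the first a − 1 ranks among a treatments; the rank probabilities in the example are hypothetical and are not taken from the review.

```python
def sucra(rank_probabilities):
    """Compute SUCRA for one treatment.

    rank_probabilities: probabilities of being ranked 1st, 2nd, ..., last,
                        in that order; they must sum to 1.
    """
    n_treatments = len(rank_probabilities)
    if n_treatments < 2:
        raise ValueError("need at least two treatments")
    if abs(sum(rank_probabilities) - 1.0) > 1e-9:
        raise ValueError("rank probabilities must sum to 1")
    cumulative = 0.0
    total = 0.0
    # Accumulate P(rank <= j) for j = 1 .. a-1 (the last rank always adds 1).
    for probability in rank_probabilities[:-1]:
        cumulative += probability
        total += cumulative
    return total / (n_treatments - 1)


# Hypothetical three-treatment network.
print(sucra([1.0, 0.0, 0.0]))            # 1.0 (always ranked best)
print(sucra([0.0, 0.0, 1.0]))            # 0.0 (always ranked worst)
print(round(sucra([0.5, 0.3, 0.2]), 2))  # 0.65
```

Because SUCRA depends only on ranks, a treatment can rank highly on the basis of few participants, which is relevant when interpreting the high ranking of nodes with small samples.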
Consistent with other meta-analyses, effects were moderate for cognitive behavioural therapy alone (n=712, κ=20, g=−0.55, −0.75 to −0.37) and small for SSRIs (n=432, κ=16, g=−0.26, −0.50 to −0.01) compared with active controls (fig 4). These estimates are comparable to those of reviews that focused directly on psychotherapy (g=−0.67, −0.79 to −0.56)7 or pharmacotherapy (g=−0.30, −0.34 to −0.26).25 However, our review was not designed to find all studies of these treatments, so these estimates should not supersede those from the directly focused systematic reviews.
Credibility assessment
Despite the large number of studies in the network, confidence in the effects was low (fig 5). This was largely due to the high within study bias described in the risk of bias summary plot. Reporting bias was also difficult to robustly assess because direct comparison with active control was often provided by fewer than 10 studies. Many studies focused on one sex only, older adults, or those with comorbidities, so most arms had some concerns about indirectness. Credible intervals were seldom wide enough to change decision making, so concerns about imprecision were few. Heterogeneity did plausibly change some conclusions around clinical significance. Few comparisons showed problematic incoherence, meaning direct and indirect evidence usually agreed. Overall, confidence was low for walking or jogging and very low for the other modalities.
Moderation by participant characteristics
The optimal modality appeared to be moderated by age and sex. Compared with models that only included exercise modality (R²=0.65), R² was higher for models that included interactions with sex (R²=0.71) and age (R²=0.69). R² showed no substantial increase for models including baseline depression (R²=0.67) or comorbidities (R²=0.66; see supplementary file, section S9).
Effects appeared larger for women than men for strength training and cycling (fig 6). Effects appeared to be larger for men than women when prescribing yoga, tai chi, and aerobic exercise alongside psychotherapy. Yoga and aerobic exercise alongside psychotherapy appeared more effective for older participants than younger people (fig 7). Strength training appeared more effective when prescribed to younger participants than older participants. Some estimates were associated with substantial uncertainty because some modalities were not well studied in some groups (eg, tai chi for younger adults), and mean age of the sample was only available for 71% of the studies.
Moderation by intervention and design characteristics
Across modalities, a clear dose-response curve was observed for intensity of exercise prescribed (fig 8). Although light physical activity (eg, walking, hatha yoga) still provided clinically meaningful effects (g=−0.58, −0.82 to −0.33), expected effects were stronger for vigorous exercise (eg, running, interval training; g=−0.74, −1.10 to −0.38). This finding did not appear to be due to increased weekly energy expenditure: credible intervals were wide, which meant that the dose-response curve for METs min/week prescribed was unclear (see supplementary file, section S10). Weak evidence suggested that shorter interventions (eg, 10 weeks: g=−0.53, −0.71 to −0.35) worked somewhat better than longer ones (eg, 30 weeks: g=−0.37, −0.79 to 0.03), with wide credible intervals again indicating high uncertainty (see supplementary file, section S11). We also moderated for the lag between the end of treatment and the measurement of the outcome. We found no indication that participants were likely to relapse within the measurement period (see supplementary file, section S12); effects remained steady when measured either directly after the intervention (g=−0.59, −0.80 to −0.39) or up to six months later (g=−0.63, −0.87 to −0.40).
Supplementary file section S13 provides coding for the behaviour change techniques and autonomy for each exercise arm. None of the behaviour change techniques significantly moderated overall effects. Contrary to expectations, studies describing a level of participant autonomy (ie, choice over frequency, intensity, type, or time) tended to show weaker effects (g=−0.28, −0.78 to 0.23) than those that did not (g=−0.75, −1.17 to −0.33; see supplementary file, section S14). This effect was consistent whether or not we included studies that used physical activity counselling (usually high autonomy).
Use of group exercise appeared to moderate the effects: although the overall effects were similar for individual (g=−1.10, −1.57 to −0.64) and group exercise (g=−1.16, −1.61 to −0.73), some interventions were better delivered in groups (yoga) and some were better delivered individually (strength training, mixed aerobic exercise; see supplementary file, section S15).
As preregistered, we tested whether study funding moderated effects. Models that included whether a study was funded did explain more variance (R2=0.70) compared with models that included treatment alone (R2=0.65). Funded studies showed stronger effects (g=−1.01, −1.19 to −0.82) than unfunded studies (g=−0.77, −1.09 to −0.46). We also moderated for the type of measure (self-report v clinician report). This did not explain a substantial amount of variance in the outcome (R2=0.66).
Sensitivity analyses
Evidence of publication bias was found for overall estimates of exercise on depression compared with active controls, although not enough to nullify effects. The multilevel Egger’s test showed significance (F(1,98)=23.93, P<0.001). Funnel plots showed asymmetry, but the result of pooled effects remained statistically significant when only including non-significant studies (see supplementary file, section S16). No amount of publication bias would be sufficient to shrink effects to zero (s value=not possible). To reduce effects below clinical significance thresholds, studies with statistically significant results would need to be reported 58 times more frequently than studies with non-significant results.
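As an illustration of the asymmetry test described above, the following minimal sketch implements the classic single level Egger regression in Python. It is not the multilevel model used in this review, and the function name `egger_test` and the example values are hypothetical rather than data from the included studies. Each study's standardised effect is regressed on its precision, and an intercept far from zero suggests funnel plot asymmetry.

```python
import math


def egger_test(effects, std_errors):
    """Classic Egger regression (a simplification, not the multilevel test).

    Regresses the standardised effect (g / SE) on precision (1 / SE) and
    returns the intercept with its t statistic (n - 2 degrees of freedom).
    """
    y = [g / se for g, se in zip(effects, std_errors)]  # standardised effects
    x = [1.0 / se for se in std_errors]                 # precisions
    n = len(y)
    if n < 3:
        raise ValueError("need at least three studies")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Ordinary least squares slope and intercept
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Standard error of the intercept from the residual variance
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, intercept / se_intercept


# Hypothetical effect sizes (Hedges' g) and standard errors, for illustration only
example_g = [-0.2, -0.3, -0.5, -0.8, -1.1]
example_se = [0.05, 0.10, 0.20, 0.30, 0.40]
intercept, t_stat = egger_test(example_g, example_se)
print(f"Egger intercept = {intercept:.2f}, t = {t_stat:.2f}")
```

When negative values of g indicate improvement, a negative intercept would suggest that smaller studies report larger benefits than larger studies.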
Qualitative synthesis of mediation effects
Only a few of the studies used explicit mediation analyses to test hypothesised mechanisms of action.54 55 56 57 58 59 One study found that both aerobic exercise and yoga led to decreased depression because participants ruminated less.54 The study found that the effects of aerobic exercise (but not yoga) were mediated by increased acceptance.54 “Perceived hassles” and awareness were not statistically significant mediators.54 Another study found that the effects of yoga were mediated by increased self-compassion, but not rumination, self-criticism, tolerance of uncertainty, body awareness, body trust, mindfulness, and attentional biases.55 One study found that the effects from an aerobic exercise intervention were not mediated by long term physical activity, but instead were mediated by exercise specific affect regulation (eg, self-control for exercise).57 Another study found that neither exercise self-efficacy nor depression coping self-efficacy mediated effects of aerobic exercise.56 Effects of aerobic exercise were not mediated by the N2 amplitude from electroencephalography, hypothesised as a neuro-correlate of cognitive control deficits.58 Increased physical activity did not appear to mediate the effects of physical activity counselling on depression.59 It is difficult to infer strong conclusions about mechanisms on the basis of this small number of studies with low power.
Discussion
Summary of evidence
In this systematic review and meta-analysis of randomised controlled trials, exercise showed moderate effects on depression compared with active controls, either alone or in combination with other established treatments such as cognitive behaviour therapy. In isolation, the most effective exercise modalities were walking or jogging, yoga, strength training, and dancing. Although walking or jogging were effective for both men and women, strength training was more effective for women, and yoga or qigong was more effective for men. Yoga was somewhat more effective among older adults, and strength training was more effective among younger people. The benefits from exercise tended to be proportional to the intensity prescribed, with vigorous activity being better. Benefits were similar across different weekly doses, comorbidities, and baseline levels of depression. Although confidence in many of the results was low, treatment guidelines may be overly conservative by conditionally recommending exercise as complementary or alternative treatment for patients in whom psychotherapy or pharmacotherapy is either ineffective or unacceptable.60 Instead, guidelines for depression ought to include prescriptions for exercise and consider adapting the modality to participants’ characteristics and recommending more vigorous intensity exercises.
Our review did not uncover clear causal mechanisms, but the trends in the data are useful for generating hypotheses. It is unlikely that any single causal mechanism explains all the findings in the review. Instead, we hypothesise that a combination of social interaction,61 mindfulness or experiential acceptance,62 increased self-efficacy,33 immersion in green spaces,63 neurobiological mechanisms,64 and acute positive affect65 combine to generate outcomes. Meta-analyses have found each of these factors to be associated with decreases in depressive symptoms, but no single treatment covers all mechanisms. Some may more directly promote mindfulness (eg, yoga), be more social (eg, group exercise), be conducted in green spaces (eg, walking), provide a more positive affect (eg, “runner’s high”), or be more conducive to acute adaptations that may increase self-efficacy (eg, strength).66 Exercise modalities such as running may satisfy many of the mechanisms, but they are unlikely to directly promote the mindful self-awareness provided by yoga and qigong. Both these forms of exercise are often practised in groups with explicit mindfulness but seldom have fast and objective feedback loops that improve self-efficacy. Adequately powered studies testing multiple mediators may help to focus more on understanding why exercise helps depression and less on whether exercise helps. We argue that understanding these mechanisms of action is important for personalising prescriptions and better understanding effective treatments.
Our review included more studies than many existing reviews on exercise for depression.13 22 27 28 As a result, we were able to combine the strengths of various approaches to exercise and to make more nuanced and precise conclusions. For example, even taking conservative estimates (ie, the least favourable end of the credible interval), practitioners can expect patients to experience clinically significant effects from walking, running, yoga, qigong, strength training, and mixed aerobic exercise. Because we simultaneously assessed more than 200 studies, credible intervals were narrower than those in most existing meta-analyses.13 We were also able to explore non-linear relationships between outcomes and moderators, such as frequency, intensity, and time. These analyses supported some existing findings: for example, our study and the study by Heissel et al22 found that shorter interventions had stronger effects, at least for six months; our study and the study by Singh et al13 both found that effects were stronger with vigorous intensity exercise compared with light and moderate exercise. However, most existing reviews found various treatment modalities to be equally effective.13 27 In our review, some types of exercise had stronger effect sizes than others. We attribute this to the study level data available in a network meta-analysis compared with an overview of reviews24 and higher power compared with meta-analyses with smaller numbers of included studies.22 28 Overviews of reviews have the ability to more easily cover a wider range of participants, interventions, and outcomes, but also risk double counting randomised trials that are included in separate meta-analyses. They often include heterogeneous studies without having as much control over moderation analyses (eg, Singh et al included studies covering both prevention and treatment13).
Some of those reviews grouped interventions such as yoga with heterogeneous interventions such as stretching and qigong.13 This practice of combining different interventions makes it harder to interpret meta-analytical estimates. We used methods that enabled us to separately analyse the effects of these treatment modalities. In so doing, we found that these interventions do have different effects, with yoga being an intervention with strong effects and stretching being better described as an active control condition. Network meta-analyses revealed the same phenomenon with psychotherapy: researchers once concluded there was a dodo bird verdict, whereby “everybody has won, and all must have prizes,”67 until network meta-analyses showed some interventions were robustly more effective than others.6 26
Predictors of acceptability and outcomes
We found evidence to suggest good acceptability of yoga and strength training, although the measurement of study drop-out is an imperfect proxy of adherence. Participants may complete the study without doing any exercise or may continue exercising and drop out of the study for other reasons. Nevertheless, these are useful data when considering adherence.
Behaviour change techniques, which are designed to increase adherence, did not meaningfully moderate the effect sizes from exercise. This may be due to several factors. It may be that the modality explains most of the variance between effects, such that behaviour change techniques (eg, presence or absence of feedback) did not provide a meaningful contribution. Many forms of exercise potentially contain therapeutic benefits beyond just energy expenditure. These characteristics of a modality may be more influential than coexisting behaviour change techniques. Alternatively, researchers may have used behaviour change techniques such as feedback or goal setting without explicitly reporting them in the study methods. Given the inherent challenges of behaviour change among people with depression,29 and the difficulty in forecasting which strategies are likely to be effective,68 we see the identification of effective techniques as important.
We did find that autonomy, as provided in the methods of included studies, predicted effects, but in the opposite direction to our hypotheses: more autonomy was associated with weaker effects. Physical activity counselling, which usually provides a great deal of patient autonomy, was among the lowest effect sizes in our meta-analysis. Higher autonomy judgements were associated with weaker outcomes regardless of whether physical activity counselling was included in the model. One explanation for these data is that people with depression benefit from the clear direction and accountability of a standardised prescription. When provided with more freedom, the low self-efficacy that is symptomatic of depression may stop patients from setting an appropriate level of challenge (eg, they may be less likely to choose vigorous exercise). Alternatively, participants were likely autonomous when self-selecting into trials with exercise modalities they enjoyed, or those that fit their social circumstances. After choosing something value aligned, autonomy within the trial may not have been helpful. Either way, data should be interpreted with caution. Our judgement of the autonomy provided in the methods may not reflect how much autonomy support patients actually felt. The patient’s perceived autonomy is likely determined by a range of factors not described in the methods (eg, the social environment created by those delivering the programme, or their social identity), so other studies that rely on patient reports of the motivational climate are likely to be more reliable.33 Our findings reiterate the importance of considering these patient reports in future research of exercise for depression.
Our findings suggest that practitioners could advocate for most patients to engage in exercise. Those patients may benefit from guidance on intensity (ie, vigorous) and types of exercise that appear to work well (eg, walking, running, mixed aerobic exercise, strength training, yoga, tai chi, qigong) and be well tolerated (eg, strength training and yoga). If social determinants permit,66 engaging in group exercise or structured programmes could provide support and guidance to achieve better outcomes. Health services may consider offering these programmes as an alternative or adjuvant treatment for major depression. Specifically, although the confidence in the evidence for exercise is less strong than for cognitive behavioural therapy, the effect sizes seem comparable, so it may be an alternative for patients who prefer not to engage in psychotherapy. Previous reviews on those with mild-moderate depression have found similar effects for exercise or SSRIs, or the two combined.13 14 In contrast, we found some forms of exercise to have stronger effects than SSRIs alone. Our findings are likely related to the larger power in our review (n=14 170) compared with previous reviews (eg, n=2551),14 and our ability to better account for heterogeneity in exercise prescriptions. Exercise may therefore be considered a viable alternative to drug treatment. We also found evidence that exercise increases the effects of SSRIs, so offering exercise may act as an adjuvant for those already taking drugs. We agree with consensus statements that professionals should still account for patients’ values, preferences, and constraints, ensuring there is shared decision making around what best suits the patient.66 Our review provides data to help inform that decision.
Strengths, limitations, and future directions
Based on our findings, dance appears to be a promising treatment for depression, with large effects found compared with other interventions in our review. But the small number of studies, low number of participants, and biases in the study designs prohibit us from recommending dance more strongly. Given most research for the intervention has been in young women (88% female participants, mean age 31 years), it is also important for future research to assess the generalisability of the effects to different populations, using robust experimental designs.
The studies we found may be subject to a range of experimental biases. In particular, researchers seldom blinded participants or staff delivering the intervention to the study’s hypotheses. Blinding for exercise interventions may be harder than for drugs23; however, future studies could attempt to blind participants and staff to the study’s hypotheses to avoid expectancy effects.69 Some of our ratings are for studies published before the proliferation of reporting checklists, so the ratings might be too critical.23 For example, before CONSORT, few authors explicitly described how they generated a random sequence.23 Therefore, our risk of bias judgements may be too conservative. Similarly, we planned to use the Cochrane risk of bias (RoB) 1 tool40 so we could use the most recent Cochrane review of exercise and depression12 to calibrate our raters, and because RoB 2 had not yet been published.70 Although assessments of bias between the two tools are generally comparable,71 the RoB 1 tool can be more conservative when assessing open label studies with subjective assessments (eg, unblinded studies with self-reported measures for depression).71 As a result, future reviews should consider using the latest risk of bias tool, which may lead to different assessments of bias in included studies.
Most of the main findings in this review appear robust to risks from publication bias. Specifically, pooled effect sizes decreased when accounting for risk of publication bias, but no degree of publication bias could nullify effects. We did not exclude grey literature, but our search strategy was not designed to systematically search grey literature or trial registries. Doing so can detect additional eligible studies72 and reveal the numbers of completed studies that remain unpublished.73 Future reviews should consider more systematic searches for this kind of literature to better quantify and mitigate risk of publication bias.
Similarly, our review was able to integrate evidence that directly compared exercise with other treatment modalities such as SSRIs or psychotherapy, while also informing estimates using indirect evidence (eg, comparing the relative effects of strength training and SSRIs when tested against a waitlist control). Our review did not, however, include all possible sources of indirect evidence. Network meta-analyses exist that directly focus on psychotherapy7 and pharmacotherapy,25 and these combined for treating depression.6 Those reviews include more than 500 studies comparing psychological or drug interventions with controls. Harmonising the findings of those reviews with ours would provide stronger data on indirect effects.
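The indirect comparison described above can be sketched for the simplest case, in which two treatments have each been compared with a common control such as a waitlist. The Python sketch below uses the frequentist Bucher adjusted indirect comparison, not the Bayesian network model fitted in this review. The function name `indirect_comparison` and all input values are hypothetical and are not estimates from the included studies.

```python
import math


def indirect_comparison(d_ac, se_ac, d_bc, se_bc, z=1.96):
    """Bucher adjusted indirect comparison of treatment A versus treatment B.

    d_ac and d_bc are the effects of A and B against a common control C.
    Assumes the two sets of trials are independent and sufficiently similar.
    Returns the indirect effect, its standard error, and a 95% interval.
    """
    d_ab = d_ac - d_bc                           # indirect effect of A v B
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)   # variances add
    interval = (d_ab - z * se_ab, d_ab + z * se_ab)
    return d_ab, se_ab, interval


# Hypothetical inputs for illustration only:
# treatment A v waitlist and treatment B v waitlist
d_ab, se_ab, (low, high) = indirect_comparison(-0.50, 0.10, -0.30, 0.12)
print(f"A v B: g = {d_ab:.2f} ({low:.2f} to {high:.2f})")
```

Because the variances of the two direct estimates add, an indirect estimate is always less precise than either direct comparison that feeds it, which is one reason that combining further sources of indirect evidence could strengthen the estimates.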
Our review found some interesting moderators by age and sex, but these were at the study level rather than individual level—that is, rather than being able to determine whether women engaging in a strength intervention benefit more than men, we could only conclude that studies with more women showed larger effects than studies with fewer women. These studies may have been tailored towards women, so effects may be subject to confounding, as both sex and intervention may have changed. The same finding applied to age, where studies on older adults were likely adapted specifically to this age group. These between study differences may explain the heterogeneity in the effects of interventions, and confounding means our moderators for age and sex should be interpreted cautiously. Future reviews should consider individual patient meta-analyses to allow for more detailed assessments of participant level moderators.
Finally, for many modalities, the evidence is derived from small trials (eg, the median number of walking or jogging arms was 17). In addition to reducing risks from bias, primary research may benefit from deconstruction designs or from larger, head-to-head analyses of exercise modalities to better identify what works best for each candidate.
Clinical and policy implications
Our findings support the inclusion of exercise as part of clinical practice guidelines for depression, particularly vigorous intensity exercise. Doing so may help bridge the gap in treatment coverage by increasing the range of first line options for patients and health systems.9 Globally there has been an attempt to reduce stigma associated with seeking treatment for depression.74 Exercise may support this effort by providing patients with treatment options that carry less stigma. In low resource or funding constrained settings, group exercise interventions may provide relatively low cost alternatives for patients with depression and for health systems. When possible, ideal treatment may involve individualised care with a multidisciplinary team, where exercise professionals could take responsibility for ensuring the prescription is safe, personalised, challenging, and supported. In addition, those delivering psychotherapy may want to direct some time towards tackling cognitive and behavioural barriers to exercise. Exercise professionals might need to be trained in the management of depression (eg, managing risk) and to be mindful of the scope of their practice while providing support to deal with this major cause of disability.
Conclusions
Depression imposes a considerable global burden. Many exercise modalities appear to be effective treatments, particularly walking or jogging, strength training, and yoga, but confidence in many of the findings was low. We found preliminary data that may help practitioners tailor interventions to individuals (eg, yoga for older men, strength training for younger women). The World Health Organization recommends physical activity for everyone, including those with chronic conditions and disabilities,75 but not everyone can access treatment easily. Many patients may have physical, psychological, or social barriers to participation. Still, some interventions with few costs, side effects, or pragmatic barriers, such as walking and jogging, are effective across people with different personal characteristics, severity of depression, and comorbidities. Those who are able may want to choose more intense exercise in a structured environment to further decrease depression symptoms. Health systems may want to provide these treatments as alternatives or adjuvants to other established interventions (cognitive behaviour therapy, SSRIs), while also attenuating risks to physical health associated with depression.3 Therefore, effective exercise modalities could be considered alongside those interventions as core treatments for depression.
What is already known on this topic
Depression is a leading cause of disability, and exercise is often recommended alongside first line treatments such as pharmacotherapy and psychotherapy
Treatment guidelines and previous reviews disagree on how to prescribe exercise to best treat depression
What this study adds
Various exercise modalities are effective (walking, jogging, mixed aerobic exercise, strength training, yoga, tai chi, qigong) and well tolerated (especially strength training and yoga)
Effects appeared proportional to the intensity of exercise prescribed and were stronger for group exercise and interventions with clear prescriptions
Preliminary evidence suggests interactions between types of exercise and patients’ personal characteristics
Ethics statements
Ethical approval
Not required.
Acknowledgments
We thank Lachlan McKee for his assistance with data extraction. We also thank Juliette Grosvenor and another librarian (anonymous) for their review of our search strategy.
Footnotes
Contributors: MN led the project, drafted the manuscript, and is the guarantor. MN, TS, PT, MM, BdPC, PP, SB, and CL drafted the initial study protocol. MN, TS, PT, BdPC, DvdH, JS, MM, RP, LP, RV, HA, and BV conducted screening, extraction, and risk of bias assessment. MN, JS, and JM coded methods for behaviour change techniques. MN and DGG conducted statistical analyses. PP, SB, and CL provided supervision and mentorship. All authors reviewed and approved the final manuscript. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: None received.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Data sharing: Data and code for reproducing analyses are available on the Open Science Framework (https://osf.io/nzw6u/).
The lead author (MN) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Dissemination to participants and related patient and public communities: We plan to disseminate the findings of this study to lay audiences through mainstream and social media.
Provenance and peer review: Not commissioned; externally peer reviewed.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.