CC BY-NC Open access
Research

Efficacy of psilocybin for treating symptoms of depression: systematic review and meta-analysis

BMJ 2024; 385 doi: https://doi.org/10.1136/bmj-2023-078084 (Published 01 May 2024) Cite this as: BMJ 2024;385:e078084

  1. Athina-Marina Metaxa, masters graduate researcher1,
  2. Mike Clarke, professor2
  1. 1Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford OX2 6GG, UK
  2. 2Northern Ireland Methodology Hub, Centre for Public Health, ICS-A Royal Hospitals, Belfast, Ireland, UK
  1. Correspondence to: A-M Metaxa athina.metaxa{at}hmc.ox.ac.uk (or @Athina_Metaxa12 on X)
  • Accepted 6 March 2024

Abstract

Objective To determine the efficacy of psilocybin as an antidepressant compared with placebo or non-psychoactive drugs.

Design Systematic review and meta-analysis.

Data sources Five electronic databases of published literature (Cochrane Central Register of Controlled Trials, Medline, Embase, Science Citation Index and Conference Proceedings Citation Index, and PsycInfo) and four databases of unpublished and international literature (ClinicalTrials.gov, WHO International Clinical Trials Registry Platform, ProQuest Dissertations and Theses Global, and PsycEXTRA), and handsearching of reference lists, conference proceedings, and abstracts.

Data synthesis and study quality Information on potential treatment effect moderators was extracted, including depression type (primary or secondary), previous use of psychedelics, psilocybin dosage, type of outcome measure (clinician rated or self-reported), and personal characteristics (eg, age, sex). Data were synthesised using a random effects meta-analysis model, and observed heterogeneity and the effect of covariates were investigated with subgroup analyses and metaregression. Hedges’ g was used as a measure of treatment effect size, to account for small sample effects and substantial differences between the included studies’ sample sizes. Study quality was appraised using Cochrane’s Risk of Bias 2 tool, and the quality of the aggregated evidence was evaluated using GRADE guidelines.

Eligibility criteria Randomised trials in which psilocybin was administered as a standalone treatment for adults with clinically significant symptoms of depression and change in symptoms was measured using a validated clinician rated or self-report scale. Studies with directive psychotherapy were included if the psychotherapeutic component was present in both experimental and control conditions. Participants with depression regardless of comorbidities (eg, cancer) were eligible.

Results Meta-analysis on 436 participants (228 female participants), average age 36-60 years, from seven of the nine included studies showed a significant benefit of psilocybin (Hedges’ g=1.64, 95% confidence interval (CI) 0.55 to 2.73, P<0.001) on change in depression scores compared with comparator treatment. Subgroup analyses and metaregressions indicated that having secondary depression (Hedges’ g=3.25, 95% CI 0.97 to 5.53), being assessed with self-report depression scales such as the Beck depression inventory (3.25, 0.97 to 5.53), and older age and previous use of psychedelics (metaregression coefficient 0.16, 95% CI 0.08 to 0.24 and 4.2, 1.5 to 6.9, respectively) were correlated with greater improvements in symptoms. All studies had a low risk of bias, but the change from baseline metric was associated with high heterogeneity and a statistically significant risk of small study bias, resulting in a low certainty of evidence rating.

Conclusion Treatment effects of psilocybin were significantly larger among patients with secondary depression, when self-report scales were used to measure symptoms of depression, and when participants had previously used psychedelics. Further research is thus required to delineate the influence of expectancy effects, moderating factors, and treatment delivery on the efficacy of psilocybin as an antidepressant.

Systematic review registration PROSPERO CRD42023388065.

Introduction

Depression affects an estimated 300 million people around the world, an increase of nearly 20% over the past decade.1 Worldwide, depression is also the leading cause of disability.2

Drugs for depression are widely available but these seem to have limited efficacy, can have serious adverse effects, and are associated with low patient adherence.34 Importantly, the treatment effects of antidepressant drugs do not appear until 4-7 weeks after the start of treatment, and remission of symptoms can take months.45 Additionally, the likelihood of relapse is high, with 40-60% of people with depression experiencing a further depressive episode, and the chance of relapse increasing with each subsequent episode.67

Since the early 2000s, the naturally occurring serotonergic hallucinogen psilocybin, found in several species of mushrooms, has been widely discussed as a potential treatment for depression.89 Psilocybin’s mechanism of action differs from that of classic selective serotonin reuptake inhibitors (SSRIs) and might improve the treatment response rate, decrease time to improvement of symptoms, and prevent relapse post-remission. Moreover, more recent assessments of harm have consistently reported that psilocybin generally has low addictive potential and toxicity and that it can be administered safely under clinical supervision.10

The renewed interest in psilocybin’s antidepressive effects led to several clinical trials on treatment resistant depression,1112 major depressive disorder,13 and depression related to physical illness.14151617 These trials mostly reported positive efficacy findings, showing reductions in symptoms of depression within a few hours to a few days after one or two doses of psilocybin.111213161718 These studies also reported only minimal adverse effects, and drug harm assessments in healthy volunteers indicated that psilocybin does not induce physiological toxicity, is not addictive, and does not lead to withdrawal.1920 Nevertheless, these findings should be interpreted with caution owing to the small sample sizes and open label design of some of these studies.1121

Several systematic reviews and meta-analyses since the early 2000s have investigated the use of psilocybin to treat symptoms of depression. Most found encouraging results, but as well as people with depression some included healthy volunteers,22 and most combined data from studies of multiple serotonergic psychedelics,232425 even though each compound has unique neurobiological effects and mechanisms of action.262728 Furthermore, many systematic reviews included non-randomised studies and studies in which psilocybin was tested in conjunction with psychotherapeutic interventions,2529303132 which made it difficult to distinguish psilocybin’s treatment effects. Most systematic reviews and meta-analyses did not consider the impact of factors that could act as moderators to psilocybin’s effects, such as type of depression (primary or secondary), previous use of psychedelics, psilocybin dosage, type of outcome measure (clinician rated or self-reported), and personal characteristics (eg, age, sex).252629303132 Lastly, systematic reviews did not consider grey literature,3334 which might have led to a substantial overestimation of psilocybin’s efficacy as a treatment for depression. In this review we focused on randomised trials that contained an unconfounded evaluation of psilocybin in adults with symptoms of depression, regardless of country and language of publication.

Methods

In this systematic review and meta-analysis of indexed and non-indexed randomised trials we investigated the efficacy of psilocybin to treat symptoms of depression compared with placebo or non-psychoactive drugs. The protocol was registered in the International Prospective Register of Systematic Reviews (see supplementary Appendix A). The study overall did not deviate from the pre-registered protocol; one clarification was made to highlight that any non-psychedelic comparator was eligible for inclusion, including placebo, niacin, micro doses of psychedelics, and drugs that are considered the standard of care in depression (eg, SSRIs).

Inclusion and exclusion criteria

Double blind and open label randomised trials with a crossover or parallel design were eligible for inclusion. We considered only studies in humans and with a control condition, which could include any type of non-active comparator, such as placebo, niacin, or micro doses of psychedelics.

Eligible studies were those that included adults (≥18 years) with clinically significant symptoms of depression, evaluated using a clinically validated tool for depression and mood disorder outcomes. Such tools included the Beck depression inventory, Hamilton depression rating scale, Montgomery-Åsberg depression rating scale, profile of mood states, and quick inventory of depressive symptomatology. Studies of participants with symptoms of depression and comorbidities (eg, cancer) were also eligible. We excluded studies of healthy participants (without depressive symptomatology).

Eligible studies investigated the effect of psilocybin as a standalone treatment on symptoms of depression. Studies with an active psilocybin condition that involved micro dosing (ie, psilocybin <100 μg/kg, according to the commonly accepted convention2235) were excluded. We included studies with directive psychotherapy if the psychotherapeutic component was present in both the experimental and the control conditions, so that the effects of psilocybin could be distinguished from those of psychotherapy. Studies involving group therapy were also excluded. Any non-psychedelic comparator was eligible for inclusion, including placebo, niacin, and micro doses of psychedelics.

Changes in symptoms, measured by validated clinician rated or self-report scales, such as the Beck depression inventory, Hamilton depression rating scale, Montgomery-Åsberg depression rating scale, profile of mood states, and quick inventory of depressive symptomatology were considered. We excluded outcomes that were measured less than three hours after psilocybin had been administered because any reported changes could be attributed to the transient cognitive and affective effects of the substance being administered. Aside from this, outcomes were included irrespective of the time point at which measurements were taken.

Search strategy

We searched major electronic databases and trial registries of psychological and medical research, with no limits on the publication date. Databases were the Cochrane Central Register of Controlled Trials via the Cochrane Library, Embase via Ovid, Medline via Ovid, Science Citation Index and Conference Proceedings Citation Index-Science via Web of Science, and PsycInfo via Ovid. A search through multiple databases was necessary because each database includes unique journals. Supplementary Appendix B shows the search syntax used for the Cochrane Central Register of Controlled Trials, which was slightly modified to comply with the syntactic rules of the other databases.

Unpublished and grey literature were sought through registries of past and ongoing trials, databases of conference proceedings, government reports, theses, dissertations, and grant registries (eg, ClinicalTrials.gov, WHO International Clinical Trials Registry Platform, ProQuest Dissertations and Theses Global, and PsycEXTRA). The references and bibliographies of eligible studies were checked for relevant publications. The original search was done in January 2023 and an updated search was performed on 10 August 2023.

Data collection, extraction, and management

The results of the literature search were imported to the Endnote X9 reference management software, and the references were imported to the Covidence platform after removal of duplicates. Two reviewers (AM and DT) independently screened the title and abstract of each reference and then screened the full text of potentially eligible references. Any disagreements about eligibility were resolved through discussion. If information was insufficient to determine eligibility, the study’s authors were contacted. The reviewers were not blinded to the studies’ authors, institutions, or journal of publication.

The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram shows the study selection process and reasons for excluding studies that were considered eligible for full text screening.36

Critical appraisal of individual studies and of aggregated evidence

The methodological quality of eligible studies was assessed using the Cochrane Risk of Bias 2 tool (RoB 2) for assessing risk of bias in randomised trials.37 In addition to the criteria specified by RoB 2, we considered the potential impact of industry funding and conflicts of interest. The overall methodological quality of the aggregated evidence was evaluated using GRADE (Grading of Recommendations, Assessment, Development and Evaluation).38

If we found evidence of heterogeneity among the trials, then small study biases, such as publication bias, were assessed using a funnel plot and asymmetry tests (eg, Egger’s test).39
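As an illustration only, Egger's regression test can be sketched as follows. This is not the code used in the review: it is a minimal sketch in which each study's standardised effect (effect divided by its standard error) is regressed on its precision (1 divided by the standard error) by ordinary least squares, and an intercept far from zero suggests funnel plot asymmetry. The function name and the example inputs are hypothetical.

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression: standardised effect ~ intercept + slope * precision.

    Returns (intercept, se_intercept, t_statistic); the t statistic would be
    compared with a t distribution on k - 2 degrees of freedom.
    """
    x = [1.0 / s for s in std_errors]                    # precision
    y = [e / s for e, s in zip(effects, std_errors)]     # standardised effect
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in residuals) / (k - 2)        # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + mean_x ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept

# Hypothetical effect sizes and standard errors for three studies
intercept, se_intercept, t_stat = egger_test([1.0, 1.5, 2.0 / 3.0], [1.0, 0.5, 1.0 / 3.0])
```

With so few studies the test has little power, which is consistent with the caution expressed about small numbers of included studies.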

Data items

We used a template for data extraction (see supplementary Appendix C) and summarised the extracted data in tabular form, outlining personal characteristics (age, sex, previous use of psychedelics), methodology (study design, dosage), and outcome related characteristics (mean change from baseline score on a depression questionnaire, response rates, and remission rates) of the included studies. Response conventionally refers to a 50% decrease in symptom severity based on scores on a depression rating scale, whereas remission scores are specific to a questionnaire (eg, score of ≤5 on the quick inventory of depressive symptomatology, score of ≤10 on the Montgomery-Åsberg depression rating scale, 50% or greater reduction in symptoms, score of ≤7 on the Hamilton depression rating scale, or score of ≤12 on the Beck depression inventory). Across depression scales, higher scores signify more severe symptoms of depression.
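The response and remission definitions above can be expressed as a short sketch. The cut-off values are those quoted in the text; the dictionary keys, function names, and example scores are hypothetical, and individual trials may have applied their own definitions.

```python
# Remission cut-offs quoted in the text (score at or below the value).
REMISSION_CUTOFFS = {"QIDS": 5, "MADRS": 10, "HAM-D": 7, "BDI": 12}

def is_response(baseline_score, follow_up_score):
    """Response: at least a 50% decrease in symptom severity from baseline."""
    if baseline_score <= 0:
        return False
    return (baseline_score - follow_up_score) / baseline_score >= 0.5

def is_remission(scale, follow_up_score):
    """Remission: follow-up score at or below the scale specific cut-off."""
    return follow_up_score <= REMISSION_CUTOFFS[scale]

# Hypothetical participant: MADRS score falls from 30 to 10
responded = is_response(30, 10)
remitted = is_remission("MADRS", 10)
```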

Continuous data synthesis

From each study we extracted the baseline and post-intervention means and standard deviations (SDs) of the scores between comparison groups for the depression questionnaires and calculated the mean differences and SDs of change. If means and SDs were not available for the included studies, we extracted the values from available graphs and charts using the Web Plot Digitizer application (https://automeris.io/WebPlotDigitizer/). If it was not possible to calculate SDs from the graphs or charts, we generated values by converting standard errors (SEs) or confidence intervals (CIs), depending on availability, using formulas in the Cochrane Handbook (section 7.7.3.2).40
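The two conversions mentioned above can be sketched as follows, assuming the standard error or confidence interval refers to a single group mean. The function names are hypothetical. The default multiplier of 1.96 is the large sample value for a 95% confidence interval; for small groups a critical value from the t distribution would be substituted.

```python
import math

def sd_from_se(se, n):
    """Standard deviation from the standard error of a mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, crit=1.96):
    """Standard deviation from a confidence interval around a mean.

    SD = sqrt(n) * (upper - lower) / (2 * crit), where crit is 1.96 for a
    95% interval in large samples (assumption: replace with a t value for small n).
    """
    return math.sqrt(n) * (upper - lower) / (2.0 * crit)

# Hypothetical group of 25 participants
sd_a = sd_from_se(2.0, 25)
sd_b = sd_from_ci(6.08, 13.92, 25)
```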

Standardised mean differences were calculated for each study. We chose these rather than weighted mean differences because, although all the studies measured depression as the primary outcome, they did so with different questionnaires that score depression based on slightly different items.41 If we had used weighted mean differences, any variability among studies would be assumed to reflect actual methodological or population differences and not differences in how the outcome was measured, which could be misleading.40

The Hedges’ g effect size estimate was used because it tends to produce less biased results for studies with smaller samples (<20 participants) and when sample sizes differ substantially between studies, in contrast with Cohen’s d.42 According to the Cochrane Handbook, the Hedges’ g effect size measure is synonymous with the standardised mean difference,40 and the terms may be used interchangeably. Thus, a Hedges’ g of 0.2, 0.5, 0.8, or 1.2 corresponds to a small, medium, large, or very large effect, respectively.40
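A minimal sketch of the calculation, using the commonly cited small sample correction J = 1 − 3/(4df − 1) with df = n1 + n2 − 2, is shown below. The function name and the example numbers are hypothetical, and the variance formula is the usual large sample approximation rather than anything specified in the review.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Return (g, variance of g) for two independent groups."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups
    s_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / s_pooled                  # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)                # small sample correction
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2))
    return j * d, j ** 2 * var_d

# Hypothetical mean improvements in depression score, 10 participants per arm
g, var_g = hedges_g(10.0, 4.0, 10, 6.0, 4.0, 10)
```

Here Cohen's d is 1.0 and the correction shrinks it to about 0.96; the shrinkage becomes negligible as group sizes grow.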

Owing to variation in the participants’ personal characteristics, psilocybin dosage, type of depression investigated (primary or secondary), and type of comparators, we used a random effects model with a Hartung-Knapp-Sidik-Jonkman modification.43 This model also allowed for heterogeneity and within study variability to be incorporated into the weighting of the results of the included studies.44 Lastly, this model could help to generalise the findings beyond the studies and patient populations included, making the meta-analysis more clinically useful.45 We chose the Hartung-Knapp-Sidik-Jonkman adjustment in favour of more widely used random effects models (eg, DerSimonian and Laird) because it allows for better control of type 1 errors, especially for studies with smaller samples, and provides a better estimation of between study variance by accounting for small sample sizes.4647
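For readers unfamiliar with the adjustment, the sketch below pools study effects with a DerSimonian and Laird estimate of the between study variance and then applies the Hartung-Knapp-Sidik-Jonkman standard error. It is illustrative only and is not the Stata code used in the review. The function name and input values are hypothetical, and the t critical value (k − 1 degrees of freedom) is supplied by the caller rather than computed.

```python
import math

def random_effects_hksj(effects, variances, t_crit):
    """Random effects pooling with DerSimonian-Laird tau^2 and an HKSJ interval.

    t_crit: two-sided critical value of the t distribution with k - 1 degrees
    of freedom (assumption: looked up by the caller).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))   # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)              # DerSimonian-Laird estimate
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    se_dl = math.sqrt(1.0 / sw_re)                  # conventional standard error
    # HKSJ: rescale the variance by the weighted residual mean square
    scale = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_re, effects)) / (k - 1)
    se_hksj = math.sqrt(scale / sw_re)
    ci = (mu - t_crit * se_hksj, mu + t_crit * se_hksj)
    return {"mu": mu, "tau2": tau2, "se_dl": se_dl, "se_hksj": se_hksj, "ci": ci}

# Three hypothetical studies; 4.303 is the 97.5th centile of t with 2 df
result = random_effects_hksj([0.2, 0.8, 1.4], [0.1, 0.1, 0.1], t_crit=4.303)
```

Because the interval uses a t distribution rather than a normal one, it is usually wider than the conventional interval when few studies are pooled.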

For studies in which multiple treatment groups were compared with a single placebo group, we split the placebo group to avoid multiplicity.48 Similarly, if studies included multiple primary outcomes (eg, change in depression at three weeks and at six weeks), we split the treatment groups to account for overlapping participants.40

Prediction intervals (PIs) were calculated and reported to show the expected effect range of a similar future study, in a different setting. In a random effects model, within study measures of variability, such as CIs, can only show the range in which the average effect size could lie, but they are not informative about the range of potential treatment effects given the heterogeneity between studies.49 Thus, we used PIs as an indication of variation between studies.
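One widely used formula for an approximate prediction interval is the pooled estimate plus or minus a t critical value (k − 2 degrees of freedom) times the square root of the sum of the between study variance and the squared standard error of the pooled estimate. The sketch below assumes that formula; the review does not state which formula its software applied. The function name and inputs are hypothetical, and the critical value is supplied by the caller.

```python
import math

def prediction_interval(mu, se_mu, tau2, t_crit):
    """Approximate prediction interval for the effect in a new, similar study.

    t_crit: two-sided critical value of t with k - 2 degrees of freedom
    (assumption: looked up by the caller).
    """
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return mu - half_width, mu + half_width

# Hypothetical pooled estimate 0.8 (SE 0.3) with tau^2 = 0.16 and t_crit = 2.0
pi_low, pi_high = prediction_interval(0.8, 0.3, 0.16, 2.0)
```

The interval widens with tau², which is why a pooled estimate can be clearly positive while the prediction interval still includes no effect.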

Heterogeneity and sensitivity analysis

Statistical heterogeneity was tested using the χ2 test (significance level P<0.1) and I2 statistic, and heterogeneity among included studies was evaluated visually and displayed graphically using a forest plot. If substantial or considerable heterogeneity was found (I2≥50% or P<0.1),50 we considered the study design and characteristics of the included studies. Sources of heterogeneity were explored by subgroup analysis, and the potential effects on the results are discussed.
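The two heterogeneity statistics can be computed as in the following sketch, which uses inverse variance weights. The function name and inputs are hypothetical; the P value for Q would come from a χ2 distribution with k − 1 degrees of freedom and is not computed here.

```python
def cochran_q_i2(effects, variances):
    """Return (Q, degrees of freedom, I^2 as a percentage)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2

# Three hypothetical studies with equal variances
q_stat, q_df, i2 = cochran_q_i2([0.2, 0.8, 1.4], [0.1, 0.1, 0.1])
```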

Planned sensitivity analyses to assess the effect of unpublished studies and studies at high risk of bias were not done because all included studies had been published and none were assessed as high risk of bias. Exclusion sensitivity plots were used to display graphically the impact of individual studies and to determine which studies had a particularly large influence on the results of the meta-analysis. All sensitivity analyses were carried out with Stata 16 software.

Subgroup analysis

To reduce the risk of errors caused by multiplicity and to avoid data fishing, we planned subgroup analyses a priori and limited them to: (1) patient characteristics, including age and sex; (2) comorbidities, such as a serious physical condition (previous research indicates that the effects of psilocybin may be less strong for such participants, compared with participants with no comorbidities)33; (3) number of doses and amount of psilocybin administered, because some previous meta-analyses found that a higher number of doses and a higher dose of psilocybin both predicted a greater reduction in symptoms of depression,34 whereas others reported the opposite33; (4) psilocybin administered alongside psychotherapeutic guidance or as a standalone treatment; (5) severity of depressive symptoms (clinical v subclinical symptomatology); (6) clinician versus patient rated scales; and (7) high versus low quality studies, as determined by RoB 2 assessment scores.

Metaregression

Given that enough studies were identified (≥10 distinct observations according to the Cochrane Handbook’s suggestion40), we performed metaregression to investigate whether covariates, or potential effect modifiers, explained any of the statistical heterogeneity. The metaregression analysis was carried out using Stata 16 software.

Random effects metaregression analyses were used to determine whether continuous variables such as participants’ age, percentage of female participants, and percentage of participants who had previously used psychedelics modified the effect estimate, all of which have been implicated in differentially affecting the efficacy of psychedelics in modifying mood.51 We chose this approach in favour of converting these continuous variables into categorical variables and conducting subgroup analyses for two primary reasons; firstly, the loss of any data and subsequent loss of statistical power would increase the risk of spurious significant associations,51 and, secondly, no cut-offs have been agreed for these factors in literature on psychedelic interventions for mood disorders,52 making any such divisions arbitrary and difficult to reconcile with the findings of other studies. The analyses were based on within study averages, in the absence of individual data points for each participant, with the potential for the results to be affected by aggregate bias, compromising their validity and generalisability.53 Furthermore, a group level analysis may not be able to detect distinct interactions between the effect modifiers and participant subgroups, resulting in ecological bias.54 As a result, this analysis should be considered exploratory.
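To make the idea concrete, the sketch below fits a metaregression with a single study level covariate by weighted least squares, using weights of 1/(within study variance + tau²). It is a simplification and not the Stata routine used in the review: tau² is taken as a known input rather than estimated, and the function name and example values (mean age as the covariate) are hypothetical.

```python
import math

def meta_regression(effects, variances, covariate, tau2=0.0):
    """Weighted least squares fit of effect = intercept + slope * covariate.

    tau2 is treated as given (assumption); dedicated software would estimate it.
    Returns (intercept, slope, standard error of slope).
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope

# Hypothetical study level data: effect sizes and mean participant age
b0, b1, se_b1 = meta_regression([0.5, 1.0, 1.5], [0.1, 0.1, 0.1], [40, 50, 60], tau2=0.1)
```

The slope is the change in effect size per unit of the covariate across studies, which is why it is vulnerable to the aggregate and ecological biases noted above.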

Sensitivity analysis

A sensitivity analysis was performed to determine whether the choice of analysis method affected the primary findings of the meta-analysis. Specifically, we reanalysed the data on change in depression score using a random effects DerSimonian and Laird model without the Hartung-Knapp-Sidik-Jonkman modification and compared the results with those of the originally used model. This comparison is particularly important in the presence of substantial heterogeneity and the potential of small study effects to influence the intervention effect estimate.55

Patient and public involvement

Research on novel depression treatments is of great interest to both patients and the public. Although patients and members of the public were not directly involved in the planning or writing of this manuscript owing to a lack of available funding for recruitment and researcher training, patients and members of the public read the manuscript after submission.

Results

Figure 1 presents the flow of studies through the systematic review and meta-analysis.56 A total of 4884 titles were retrieved from the five databases of published literature, and a further 368 titles were identified from the databases of unpublished and international literature in February 2023. After the removal of duplicate records, we screened the abstracts and titles of 875 reports. A further 12 studies were added after handsearching of reference lists and conference proceedings and abstracts. Overall, nine studies totalling 436 participants were eligible. The average age of the participants ranged from 36-60 years. During an updated search on 10 August 2023, no further studies were identified.

Fig 1

Flow of studies in systematic review and meta-analysis

After screening of the title and abstract, 61 titles remained for full text review. Native speakers helped to translate papers in languages other than English. The most common reasons for exclusion were the inclusion of healthy volunteers, absence of control groups, and use of a survey based design rather than an experimental design. After full text screening, nine studies were eligible for inclusion, and 15 clinical trials prospectively registered or underway as of August 2023 were noted for potential future inclusion in an update of this review (see supplementary Appendix D).

We sent requests for further information to the authors of studies by Griffiths et al,57 Barrett,58 and Benville et al,59 because these studies appeared to meet the inclusion criteria but were only provided as summary abstracts online. A potentially eligible poster presentation from the 58th annual meeting of the American College of Neuropsychopharmacology was identified, but the lead author (Griffiths) clarified that all information from the presentation was included in the studies by Davis et al13 and Gukasyan et al,60 both of which we had already deemed ineligible.

Barrett58 reported the effects of psilocybin on the cognitive flexibility and verbal reasoning of a subset of patients with major depressive disorder from Griffiths et al’s trial,61 compared with a waitlist group, but when contacted, Barrett explained that the results were published in the study by Doss et al,62 which we had already screened and judged ineligible (see supplementary Appendix E). Benville et al’s study59 presented a follow-up of Ross et al’s study17 on a subset of patients with cancer and high suicidal ideation and desire for hastened death at baseline. Measures of antidepressant effects of psilocybin treatment compared with niacin were taken before and after treatment crossover, but detailed results are not reported. Table 1 describes the characteristics of the included studies and table 2 lists the main findings of the studies.

Table 1

Characteristics of included studies

Table 2

Main findings of included studies

Side effects and adverse events

Side effects reported in the included studies were minor and transient (eg, short term increases in blood pressure, headache, and anxiety), and none were coded as serious. Carhart-Harris et al noted one instance of abnormal dreams and insomnia.63 This side effect profile is consistent with findings from other meta-analyses.3068 Owing to the different scales and methods used to catalogue side effects and adverse events across trials, it was not possible to combine these data quantitatively (see supplementary Appendix F).

Risk of bias

The Cochrane RoB 2 tools were used to evaluate the included studies (table 3). RoB 2 for randomised trials was used for the five reports of parallel randomised trials (Carhart-Harris et al63 and its secondary analysis Barba et al,64 Goodwin et al18 and its secondary analysis Goodwin et al,65 and von Rotz et al66) and RoB 2 for crossover trials was used for the four reports of crossover randomised trials (Griffiths et al,14 Grob et al,15 and Ross et al17 and its follow-up Ross et al67). Supplementary Appendix G provides a detailed explanation of the assessment of the included studies.

Table 3

Summary risk of bias assessment of included studies, based on domains in Cochrane Risk of Bias 2 tool

Quality of included studies

Confidence in the quality of the evidence for the meta-analysis was assessed using GRADE,38 through the GRADEpro GDT software program. Figure 2 shows the results of this assessment, along with our summary of findings.

Fig 2

GRADE assessment outputs for outcomes investigated in meta-analysis (change in depression scores and response and remission rates). The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). BDI=Beck depression inventory; CI=confidence interval; GRADE=Grading of Recommendations, Assessment, Development and Evaluation; HADS-D=hospital anxiety and depression scale; HAM-D=Hamilton depression rating scale; MADRS=Montgomery-Åsberg depression rating scale; QIDS=quick inventory of depressive symptomatology; RCT=randomised controlled trial; SD=standard deviation

Meta-analyses

Continuous data, change in depression scores—Using a Hartung-Knapp-Sidik-Jonkman modified random effects meta-analysis, change in depression scores was significantly greater after treatment with psilocybin compared with active placebo. The overall Hedges’ g (1.64, 95% CI 0.55 to 2.73) indicated a large effect size favouring psilocybin (fig 3). PIs were, however, wide and crossed the line of no difference (95% PI −1.72 to 5.03), indicating that there could be settings or populations in which psilocybin intervention would be less efficacious.

Fig 3

Forest plot for overall change in depression scores from before to after treatment. CI=confidence interval; DL=DerSimonian and Laird; HKSJ=Hartung-Knapp-Sidik-Jonkman

Exploring publication bias in continuous data—We used Egger’s test and a funnel plot to examine the possibility of small study biases, such as publication bias. Statistical significance of Egger’s test for small study effects, along with the asymmetry in the funnel plot (fig 4), indicates the presence of bias against smaller studies with non-significant results, suggesting that the pooled intervention effect estimate is likely to be overestimated.69 An alternative explanation, however, is that smaller studies conducted at the early stages of a new psychotherapeutic intervention tend to include more high risk or responsive participants, and psychotherapeutic interventions tend to be delivered more effectively in smaller trials; both of these factors can exaggerate treatment effects, resulting in funnel plot asymmetry.70 Also, because of the relatively small number of included studies and the considerable heterogeneity observed, test power may be insufficient to distinguish real asymmetry from chance.71 Thus, this analysis should be considered exploratory.

Fig 4

Funnel plot assessing publication bias among studies measuring change in depression scores from before to after treatment. CI=confidence interval; θIV=estimated effect size under inverse variance random effects model

Dichotomous data

We extracted response and remission rates for each group when reported directly, or imputed information when presented graphically. Two studies did not measure response or remission and thus did not contribute data for this part of the analysis.1518 The random effects model with a Hartung-Knapp-Sidik-Jonkman modification was used to allow for heterogeneity to be incorporated into the weighting of the included studies’ results, and to provide a better estimation of between study variance accounting for small sample sizes.

Response rate—Overall, the likelihood of psilocybin intervention leading to treatment response was about two times greater (risk ratio 2.02, 95% CI 1.33 to 3.07) than with placebo. Despite the use of different scales to measure response, the heterogeneity between studies was not significant (I2=25.7%, P=0.23). PIs were, however, wide and crossed the line of no difference (−0.94 to 3.88), indicating that there could be settings or populations in which psilocybin intervention would be less efficacious.

Remission rate—Overall, the likelihood of psilocybin intervention leading to remission of depression was nearly three times greater than with placebo (risk ratio 2.71, 95% CI 1.75 to 4.20). Despite the use of different scales to measure response, no statistical heterogeneity was found between studies (I2=0.0%, P=0.53). PIs were, however, wide and crossed the line of no difference (0.87 to 2.32), indicating that there could be settings or populations in which psilocybin intervention would be less efficacious.
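For orientation, a risk ratio and its confidence interval for a single trial can be computed as in the sketch below, which builds the interval on the log scale. The counts are hypothetical and are not taken from any included study; the pooled estimates reported above come from a random effects model across trials, which this sketch does not reproduce.

```python
import math

def risk_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio with a confidence interval computed on the log scale."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of log(RR) for two independent binomial proportions
    se_log = math.sqrt(1.0 / events_trt - 1.0 / n_trt
                       + 1.0 / events_ctl - 1.0 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical trial: 20 of 40 responders with treatment, 10 of 40 with control
rr, rr_low, rr_high = risk_ratio(20, 40, 10, 40)
```

A risk ratio of 1 is the line of no difference, so an interval that excludes 1 indicates a statistically significant difference between groups.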

Exploring publication bias in response and remission rates data—We used Egger’s test and a funnel plot to examine whether response and remission estimates were affected by small study biases. The result for Egger’s test was non-significant (P>0.05) for both response and remission estimates, and no substantial asymmetry was observed in the funnel plots, providing no indication for the presence of bias against smaller studies with non-significant results.

Heterogeneity: subgroup analyses and metaregression

Heterogeneity was considerable across studies exploring changes in depression scores (I2=89.7%, P<0.005), triggering subgroup analyses to explore contributory factors. Table 4 and table 5 present the results of the heterogeneity analyses (subgroup analyses and metaregression, respectively). Also see supplementary Appendix H for a more detailed description and graphical representation of these results.

Table 4

Subgroup analyses to explore potential causes of heterogeneity among included studies

Table 5

Metaregression analyses to explore potential causes of heterogeneity among included studies


Cumulative meta-analyses

We used cumulative meta-analyses to investigate how the overall estimates of the outcomes of interest changed as each study was added in chronological order72; change in depression scores and likelihood of treatment response both increased as the percentage of participants with past use of psychedelics increased across studies, as expected based on the metaregression analysis (see supplementary Appendix I). No other significant time related patterns were found.
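The idea of a cumulative meta-analysis can be sketched as repeated pooling: studies are sorted by some ordering variable (publication year here, but a study level covariate would work the same way) and the pooled estimate is recomputed each time a study is added. The study labels, years, effect sizes, and variances below are hypothetical, and the DerSimonian and Laird helper omits the Hartung-Knapp-Sidik-Jonkman adjustment for brevity.

```python
def pool_dl(effects, variances):
    """DerSimonian-Laird random effects estimate (a single study is returned as is)."""
    k = len(effects)
    if k == 1:
        return effects[0]
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)

def cumulative_meta_analysis(studies, order_key):
    """Re-pool after each study is added, in the order given by order_key."""
    ordered = sorted(studies, key=order_key)
    results = []
    for i in range(1, len(ordered) + 1):
        subset = ordered[:i]
        estimate = pool_dl([s["g"] for s in subset], [s["var"] for s in subset])
        results.append((subset[-1]["label"], estimate))
    return results

# Hypothetical studies (labels, years, effects, and variances are illustrative only)
studies = [
    {"label": "Trial C", "year": 2021, "g": 1.2, "var": 0.09},
    {"label": "Trial A", "year": 2011, "g": 0.5, "var": 0.20},
    {"label": "Trial B", "year": 2016, "g": 0.9, "var": 0.12},
]
for label, est in cumulative_meta_analysis(studies, order_key=lambda s: s["year"]):
    print(label, round(est, 3))
```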

Sensitivity analysis

We reanalysed the data for change in depression scores using a random effects DerSimonian and Laird model without the Hartung-Knapp-Sidik-Jonkman modification and compared the results with those of the original model. All comparisons found to be significant using the DerSimonian and Laird model with the Hartung-Knapp-Sidik-Jonkman adjustment were also significant without the adjustment, and confidence intervals were only slightly narrower. Thus, small study effects do not appear to have played a major role in the treatment effect estimate.

Additionally, to estimate the accuracy and robustness of the estimated treatment effect, we excluded studies from the meta-analysis one by one; no important differences in the treatment effect, significance, and heterogeneity levels were observed after the exclusion of any study (see supplementary Appendix J).
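A leave-one-out analysis of this kind can be sketched generically: each study is dropped in turn and the remaining studies are pooled again. In the sketch below the pooling function is passed in as an argument; the default inverse variance pooler is a simple stand-in chosen for brevity, not the model used in this review, and the input values are hypothetical.

```python
def pool_fixed(effects, variances):
    """Inverse variance pooled estimate (a simple stand-in for the review's model)."""
    w = [1.0 / v for v in variances]
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

def leave_one_out(effects, variances, pool=pool_fixed):
    """Return the pooled estimate obtained when each study is omitted in turn."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results.append(pool(kept_effects, kept_variances))
    return results

# Hypothetical effects with equal variances; one estimate per omitted study
print(leave_one_out([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

If any single omission moves the pooled estimate or its significance substantially, that study is driving the result and deserves closer inspection.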

Discussion

In our meta-analysis we found that psilocybin use showed a significant benefit on change in depression scores compared with placebo. This is consistent with other recent meta-analyses and trials of psilocybin as a standalone treatment for depression7374 or in combination with psychological support.2425293031326875 This review adds to those findings by exploring the considerable heterogeneity across the studies, with subsequent subgroup analyses showing that the type of depression (primary or secondary) and the depression scale used (Montgomery-Åsberg depression rating scale, quick inventory of depressive symptomatology, or Beck depression inventory) had a significant differential effect on the outcome. High between study heterogeneity has been identified by some other meta-analyses of psilocybin (eg, Goldberg et al29), with a higher treatment effect in studies with patients with comorbid life threatening conditions compared with patients with primary depression.22 Although possible explanations, including personal factors (eg, patients with life threatening conditions being older) or depression related factors (eg, secondary depression being more severe than primary depression) could be considered, these hypotheses are not supported by baseline data (ie, patients with secondary depression do not differ substantially in age or symptom severity from patients with primary depression).

The differential effects of the assessment scales used have not been examined in other meta-analyses of psilocybin, but this review’s finding that studies using the Beck depression inventory showed a higher treatment effect than those using the Montgomery-Åsberg depression rating scale and quick inventory of depressive symptomatology is consistent with studies in the psychological literature that have shown larger treatment effects when self-report scales are used (eg, Beck depression inventory).7677 This finding may be because clinicians tend to overestimate the severity of depression symptoms at baseline assessments, leading to less pronounced differences between before and after treatment identified in clinician assessed scales (eg, Montgomery-Åsberg depression rating scale, quick inventory of depressive symptomatology).78

Metaregression analyses further showed that a higher average age and a higher percentage of participants with past use of psychedelics both correlated with a greater improvement in depression scores with psilocybin use and explained a substantial amount of between study variability. However, the cumulative meta-analysis showed that the effects of age might be largely an artefact of the inclusion of one specific study, and alternative explanations are worth considering. For instance, Studerus et al79 identified participants’ age as the only personal variable significantly associated with psilocybin response, with older participants reporting a higher “blissful state” experience. This might be because of older people’s increased experience in managing negative emotions and the decrease in 5-hydroxytryptamine type 2A receptor density associated with older age.80 Furthermore, Rootman et al81 reported that the cognitive performance of older participants (>55 years) improved significantly more than that of younger participants after microdosing with psilocybin. Therefore, the greater decrease in depressive symptoms associated with older age could be attributed to a decrease in cognitive difficulties experienced by older participants.

Interestingly, a clear pattern emerged for past use of psychedelics—the higher the proportion of study participants who had used psychedelics in the past, the higher the post-psilocybin treatment effect observed. Past use of psychedelics has been proposed to create an expectancy bias among participants and amplify the positive effects of psilocybin828384; however, this important finding has not been examined in other meta-analyses and may highlight the role of expectancy in psilocybin research.
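A metaregression of this kind can be thought of as a weighted regression of study effect sizes on a study level covariate, such as the proportion of participants with past psychedelic use. The sketch below fits a single covariate by weighted least squares; the data are hypothetical, the function name is illustrative, and the between study variance `tau2` is assumed to have been estimated separately.

```python
def meta_regression(effects, variances, covariate, tau2=0.0):
    """Weighted least squares fit of effect = intercept + slope * covariate.

    Weights are 1 / (variance + tau2); tau2 is assumed to be estimated elsewhere.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(
        wi * (xi - x_bar) * (yi - y_bar)
        for wi, xi, yi in zip(w, covariate, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical studies: proportion with past psychedelic use and effect size
past_use = [0.1, 0.3, 0.5, 0.8]
effect = [0.35, 0.65, 0.95, 1.40]
print(meta_regression(effect, [0.10, 0.20, 0.05, 0.15], past_use, tau2=0.02))
```

A positive slope describes an association across studies only; with few studies and aggregate covariates it cannot show that past use changes the response of individual participants.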

Limitations of this study

Generalisability of the findings of this meta-analysis was limited by the lack of racial and ethnic diversity in the included studies—more than 90% of participants were white across all included trials, resulting in a homogeneous sample that is not representative of the general population. Moreover, it was not possible to distinguish between subgroups of participants who had never used psilocybin and those who had taken psilocybin more than a year before the start of the trial, as these data were not provided in the included studies. Such a distinction would be important, as the effects of psilocybin on mood may wane within a year after being administered.2185 Also, how psychological support was conceptualised was inconsistent within studies of psilocybin interventions; many studies failed to clearly describe the type of psychological support participants received, and others used methods ranging from directive guidance throughout the treatment session to passive encouragement or reassurance (eg, Griffiths et al,14 Carhart-Harris et al63). The included studies also did not gather evidence on participants’ previous experiences with treatment approaches, which could influence their response to the trials’ intervention. Thus, differences between participant subgroups related to past use of psilocybin or psychotherapy may be substantial and could help interpret this study’s findings more accurately. Lastly, the use of graphical extraction software to estimate the findings of studies where exact numerical data were not available (eg, Goodwin et al,18 Grob et al15), may have affected the robustness of the analyses.

A common limitation in studies of psilocybin is the likelihood of expectancy effects augmenting the treatment effect observed. Although some studies used low dose psychedelics as comparators to deal with this problem (eg, Carhart-Harris et al,63 Goodwin et al,18 Griffiths et al14) or used a niacin placebo that can induce effects similar to those of psilocybin (eg, Grob et al,15 Ross et al17), the extent to which these methods were effective in blinding participants is not known. Other studies have, however, reported that participants can accurately identify the study groups to which they had been assigned 70-85% of the time,8486 indicating a high likelihood of insufficient blinding. This is especially likely for studies in which a high proportion of participants had previously used psilocybin and other hallucinogens, making the identification of the drug’s acute effects easier (eg, Griffiths et al,14 Grob et al,15 Ross et al17). Patients also have expectations related to the outcome of their treatment, expecting psilocybin to improve their symptoms of depression, and these positive expectancies are strong predictors of actual treatment effects.8788 Importantly, the effect of outcome expectations on treatment effect is particularly strong when patient reported measures are used as primary outcomes,89 which was the case in several of the included studies (eg, Griffiths et al,14 Grob et al,15 Ross et al17). Unfortunately, none of the included studies recorded expectations before treatment, so it is not possible to determine the extent to which this factor affected the findings.

Implications for clinical practice

Although this review’s findings are encouraging for psilocybin’s potential as an effective antidepressant, a few areas about its applicability in clinical practice remain unexplored. Firstly, it is unclear whether the protocols for psilocybin interventions in clinical trials can be reliably and safely implemented in clinical practice. In clinical trials, patients receive psilocybin in a non-traditional medical setting, such as a specially designed living room, while they may be listening to curated calming music and are isolated from most external stimuli by wearing eyeshades and external noise-cancelling earphones. A trained therapist closely supervises these sessions, and the patient usually receives one or more preparatory sessions before the treatment commences. Standardising an intervention setting with so many variables is unlikely to be achievable in routine practice, and consensus is considerably lacking on the psychotherapeutic training and accreditations needed for a therapist to deliver such treatment.90 The combination of these elements makes this a relatively complex and expensive intervention, which could make it challenging to gain approval from regulatory agencies and to gain reimbursement from insurance companies and others. Within publicly funded healthcare systems, the high cost of treatment may make psilocybin treatment inaccessible. The high cost associated with the intervention also increases the risk that unregulated clinics may attempt to cut costs by making alterations to the protocol and the therapeutic process,9192 which could have detrimental effects for patients.929394 Thus, avoiding the conflation of medical and commercial interests is a primary concern that needs to be dealt with before psilocybin enters mainstream practice.

Implications for future research

More large scale randomised trials with long follow-up are needed to fully understand psilocybin’s treatment potential, and future studies should aim to recruit a more diverse population. Another factor that would make clinical trials more representative of routine practice would be to recruit patients who are currently using or have used commonly prescribed serotonergic antidepressants. Clinical trials tend to exclude such participants because many antidepressants that act on the serotonin system modulate the 5-hydroxytryptamine type 2A receptor that psilocybin primarily acts upon, with prolonged use of tricyclic antidepressants associated with more intense psychedelic experiences and use of monoamine oxidase inhibitors or SSRIs inducing weaker responses to psychedelics.959697 Investigating psilocybin in such patients would, however, provide valuable insight on how psilocybin interacts with commonly prescribed drugs for depression and would help inform clinical practice.

Minimising the influence of expectancy effects is another core problem for future studies. One strategy would be to include expectancy measures and explore the level of expectancy as a covariate in statistical analysis. Researchers should also test the effectiveness of condition masking. Another proposed solution would be to adopt a 2×2 balanced placebo design, where both the drug (psilocybin or placebo) and the instructions given to participants (told they have received psilocybin or told they have received placebo) are crossed.98 Alternatively, clinical trials could adopt a three arm design that includes both an inactive placebo (eg, saline) and active placebo (eg, niacin, lower psilocybin dose),98 allowing for the effects of psilocybin to be separated from those of the placebo.
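To make the logic of the 2×2 balanced placebo design concrete, the sketch below separates a drug effect, an expectancy (instruction) effect, and their interaction from four cell means. The cell labels and mean improvement scores are hypothetical and serve only to show how the contrasts are formed.

```python
def balanced_placebo_contrasts(cell_means):
    """Main effects and interaction from the four cell means of a 2x2 design.

    cell_means maps (drug, instruction) to a mean improvement score.
    """
    a = cell_means[("psilocybin", "told_psilocybin")]
    b = cell_means[("psilocybin", "told_placebo")]
    c = cell_means[("placebo", "told_psilocybin")]
    d = cell_means[("placebo", "told_placebo")]
    drug_effect = (a + b) / 2 - (c + d) / 2        # averaged over instructions
    expectancy_effect = (a + c) / 2 - (b + d) / 2  # averaged over drug received
    interaction = (a - b) - (c - d)                # expectancy effect on drug minus on placebo
    return drug_effect, expectancy_effect, interaction

# Hypothetical mean improvements in depression score for each cell
cells = {
    ("psilocybin", "told_psilocybin"): 12.0,
    ("psilocybin", "told_placebo"): 9.0,
    ("placebo", "told_psilocybin"): 5.0,
    ("placebo", "told_placebo"): 2.0,
}
print(balanced_placebo_contrasts(cells))
```

In this illustrative case the expectancy contrast is the same size with and without the drug, so the interaction is zero; a non-zero interaction would suggest that expectancy amplifies or dampens the pharmacological effect.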

Overall, future studies should explore psilocybin’s exact mechanism of treatment effectiveness and outline how its physiological effects, mystical experiences, dosage, treatment setting, psychological support, and relationship with the therapist all interact to produce a synergistic antidepressant effect. Although this may be difficult to achieve using an explanatory randomised trial design, pragmatic clinical trial designs may be better suited to psilocybin research, as their primary objective is to achieve high external validity and generalisability. Such studies may include multiple alternative treatments rather than simply an active and placebo treatment comparison (eg, psilocybin v SSRI v serotonin-noradrenaline reuptake inhibitor), and participants would be recruited from broader clinical populations.99100 Although such studies are usually conducted after a drug’s launch,100 earlier use of such designs could help assess the clinical effectiveness of psilocybin more robustly and broaden patient access to a novel type of antidepressant treatment.

Conclusions

This review’s findings on psilocybin’s efficacy in reducing symptoms of depression are encouraging for its use in clinical practice as a drug intervention for patients with primary or secondary depression, particularly when combined with psychological support and administered in a supervised clinical environment. However, the highly standardised treatment setting, high cost, and lack of regulatory guidelines and legal safeguards associated with psilocybin treatment need to be dealt with before it can be established in clinical practice.

What is already known on this topic

  • Recent research on treatments for depression has focused on psychedelic agents that could have strong antidepressant effects without the drawbacks of classic antidepressants, psilocybin being one such substance

  • Over the past decade, several clinical trials, meta-analyses, and systematic reviews have investigated the use of psilocybin for symptoms of depression, and most have found that psilocybin can have antidepressant effects

  • Studies published to date have not investigated factors that may moderate psilocybin’s effects, including type of depression, past use of psychedelics, dosage, outcome measures, and publication biases

What this study adds

  • This review showed a significantly greater efficacy of psilocybin among patients with secondary depression, patients with past use of psychedelics, older patients, and studies using self-report measures for symptoms of depression

  • Efficacy did not appear to be homogeneous across patient types—for example, those with depression and a life threatening illness appeared to benefit more from treatment

  • Further research is needed to clarify the factors that maximise psilocybin’s treatment potential for symptoms of depression

Ethics statements

Ethical approval

This study was approved by the ethics committee of the University of Oxford Nuffield Department of Medicine, which waived the need for ethical approval and the need to obtain consent for the collection, analysis, and publication of the retrospectively obtained anonymised data for this non-interventional study.

Data availability statement

The relevant aggregated data and statistical code will be made available on reasonable request to the corresponding author.

Acknowledgments

We thank DT who acted as an independent secondary reviewer during the study selection and data review process.

Footnotes

  • Contributors: AMM contributed to the design and implementation of the research, analysis of the results, and writing of the manuscript. MC was involved in planning and supervising the work and contributed to the writing of the manuscript. AMM and MC are the guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

  • Funding: None received.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at https://www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; AMM is employed by IDEA Pharma, which does consultancy work for pharmaceutical companies developing drugs for physical and mental health conditions; MC was the supervisor for AMM’s University of Oxford MSc dissertation, which forms the basis for this paper; no other relationships or activities that could appear to have influenced the submitted work.

  • Transparency: The corresponding author (AMM) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as registered have been explained.

  • Dissemination to participants and related patient and public communities: To disseminate our findings and increase the impact of our research, we plan on writing several social media posts and blog posts outlining the main conclusions of our paper. These will include blog posts on the websites of the University of Oxford’s Department of Primary Care Health Sciences and Department for Continuing Education, as well as print publications, which are likely to reach a wider audience. Furthermore, we plan to present our findings and discuss them with the public in local mental health related events and conferences, which are routinely attended by patient groups and advocacy organisations.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

References