
Open access (CC BY-NC)

Response to acute monotherapy for major depressive disorder in randomized, placebo controlled trials submitted to the US Food and Drug Administration: individual participant data analysis

BMJ 2022; 378 doi: (Published 02 August 2022) Cite this as: BMJ 2022;378:e067606
  1. Marc B Stone, deputy director for safety (1)
  2. Zimri S Yaseen, senior physician (1)
  3. Brian J Miller, assistant professor (2)
  4. Kyle Richardville, chief resident (3)
  5. Shamir N Kalaria, clinical reviewer (4)
  6. Irving Kirsch, associate director (5)
  1. Division of Psychiatry, Office of Neuroscience, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
  2. Division of Hospital Medicine, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  3. Department of Medicine, Cleveland Clinic Foundation, Cleveland, OH, USA
  4. Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
  5. Program in Placebo Studies, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
  1. Correspondence to: B J Miller brian{at}
  • Accepted 2 June 2022


Objectives To characterize individual participant level response distributions to acute monotherapy for major depressive disorder in randomized, placebo controlled trials submitted to the US Food and Drug Administration from 1979 to 2016.

Design Individual participant data analysis.

Population 232 randomized, double blind, placebo controlled trials of drug monotherapy for major depressive disorder submitted by drug developers to the FDA between 1979 and 2016, comprising 73 388 adult and child participants meeting the inclusion criteria for efficacy studies on antidepressants.

Main outcome measures Responses were converted to Hamilton Rating Scale for Depression (HAMD17) equivalent scores where other measures were used to assess efficacy. Multivariable analyses examined the effects of age, sex, baseline severity, and year of the study on improvements in depressive symptoms in the antidepressant and placebo groups. Response distributions were analyzed with finite mixture models.

Results The random effects mean difference between drug and placebo favored drug (1.75 points, 95% confidence interval 1.63 to 1.86). Differences between drug and placebo increased significantly (P<0.001) with greater baseline severity. After controlling for participant characteristics at baseline, no trends in treatment effect or placebo response over time were found. The best fitting model of response distributions was three normal distributions, with mean improvements from baseline to end of treatment of 16.0, 8.9, and 1.7 points. These distributions were designated Large, Non-specific, and Minimal responses, respectively. Participants who were treated with a drug were more likely to have a Large response (24.5% v 9.6%) and less likely to have a Minimal response (12.2% v 21.5%).

Conclusions The trimodal response distributions suggest that about 15% of participants have a substantial antidepressant effect beyond a placebo effect in clinical trials, highlighting the need for predictors of meaningful responses specific to drug treatment.


Depression is a leading cause of disability worldwide, affecting 300 million people globally, causing a major reduction in quality of life, with domestic costs (including costs related to work) estimated at more than $210.5 (£175.3; €207.1) billion annually [1, 2]. About 13% of Americans use antidepressants, and use of antidepressants in economically developed countries more than doubled between 2000 and 2015 [3, 4]. Although many factors affect depression and suicide rates, the hope was that wider use of antidepressants would improve these rates. Nonetheless [5], these rates have generally increased [6], particularly in younger age groups, highlighting the importance of understanding the magnitude and determinants of the efficacy of antidepressant drugs.

Previous reviews have assessed the effects of antidepressants by analyzing aggregate trial data [7-14] or participant level data from limited datasets. Meta-analyses have shown small mean differences between drug and placebo arms, and the clinical significance of these differences continues to be debated [7-18]. Patients do not feel the difference in response between drug and placebo (drug effect); rather, patients have an overall drug response in the context of pharmacotherapy. How much of that response is attributable to placebo effects is unobservable. In this paper, we use the term drug or placebo response to indicate change from baseline with the drug or placebo, and the term drug or placebo effect to indicate the component specifically attributable to the drug or placebo [19].

Lack of knowledge about the distributions of individual responses has hampered discussions of the clinical significance of mean effects. Whether treatment responses in clinical drug trials are best described by one or multiple underlying distributions of treatment response or how drug and placebo response distributions differ is not known. The drug effect might not be a uniform small, and hence clinically unimportant, benefit across patients (ie, a shift in distribution mean without a change in the shape of the distribution). Rather, it could occur as a large, and thus clinically important, difference for a small subpopulation (ie, a difference in response distribution composition). Some investigators have attempted to look at this possibility by comparing variability in treatment response in patients treated with a drug or placebo. These analyses cannot rule out the effects of restricted subpopulations, however [20-22]. Researchers also continue to debate the relation between the initial severity of the disease and the effect of the drug [14, 23-25], which patient subgroups benefit most from antidepressant treatment [26, 27], and whether new trials are hampered by rising placebo response rates [28, 29].

In this article, we report a participant level analysis of randomized, placebo controlled trials of acute monotherapy for the treatment of major depressive disorder submitted to the US Food and Drug Administration from 1979 to 2016. We used mixture modeling of the distributions of participant level responses in randomized, placebo controlled trials of antidepressant drugs to determine subpopulations that the response distributions might comprise and to determine whether the difference between drug and placebo can be accounted for by a broadly applicable small incremental effect of the drug. We supported our aggregation of subject level trial data and looked at other controversies relating to efficacy trials of acute antidepressant treatment, with examinations of how mean responses and baseline severity have changed over time. We also looked at relations between age, sex, baseline severity of depression, and their interactions, and mean responses.


Our database contained 232 randomized, double blind, placebo controlled trials of drug monotherapy for major depressive disorder submitted by drug developers to the FDA, comprising 73 388 participants. The database included studies in new drug applications, whether positive or negative, and whether an indication for major depressive disorder was applied for or approved. The dataset also included studies after approval. All studies had specified study objectives and inclusion and exclusion criteria, as required to meet regulatory standards, thus meeting typical criteria for high quality studies and low risk of bias. Also, studies of clinical efficacy submitted to the FDA are generally reviewed by expert reviewers for fitness for purpose before the start of the study if conducted in the US, providing additional assurance of study quality. The data elements used for this study were trial and participant identifiers, treatment assignment (specific drug or placebo), age, sex, primary scale used to measure the severity of symptoms of major depressive disorder, and the score on that scale at baseline and at the last observation on treatment.

To conduct analyses of the severity of major depressive disorder across trials that used different instruments, we converted all scores to equivalent 17 item Hamilton Rating Scale for Depression (HAMD17) scores. HAMD17 was the most widely used measure (in 104 of 232 trials) in the dataset. The supplementary material has details of the conversions.

A mixed effects model with study as random effect estimated the mean difference in change from baseline (last measurement before treatment) for drug compared with placebo, as well as change from baseline for individual participants adjusted for random study effects. Other models included age, sex, baseline severity, and their interactions with treatment assignment, and with each other. Ecological bias in treatment covariate interactions was avoided by centering the participant level covariate around the study mean, as recommended by Burke et al [30]. Because baseline severity correlates artifactually with change from baseline but not with severity at the end of treatment, models of change from baseline that included baseline severity as a predictor estimated end of treatment severity rather than change from baseline and were converted into estimates of change from baseline by subtracting baseline severity. Models that used date of study participation as an explanatory variable were used to look for trends over time. Multivariable adaptive regression splines, with the method of Royston and Sauerbrei [31], were used to account for non-linear effects of continuous explanatory variables. To explore differences in efficacy among individual drugs, we modeled the residuals from the random effects model that included all the non-linearities and interactions among age, sex, baseline severity, and drug versus placebo assignment as a function of individual drug assignment.
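The within-study centering step can be illustrated with a short sketch. This is not the Stata code used for the analysis: the record layout, the function name `center_within_study`, and the participant values are assumptions made for the example only.

```python
from collections import defaultdict


def center_within_study(records, covariate):
    """Return copies of the records with '<covariate>_centered' added.

    Each record is a dict with a 'study' key. The covariate is centered
    on its mean within the participant's own study, so that a
    treatment-by-covariate interaction reflects within-study variation
    rather than differences between studies.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec["study"]] += rec[covariate]
        counts[rec["study"]] += 1
    means = {study: totals[study] / counts[study] for study in totals}
    return [
        {**rec, covariate + "_centered": rec[covariate] - means[rec["study"]]}
        for rec in records
    ]


# Hypothetical participants (illustrative values only, not study data)
participants = [
    {"study": "A", "baseline": 20.0},
    {"study": "A", "baseline": 24.0},
    {"study": "B", "baseline": 17.0},
    {"study": "B", "baseline": 19.0},
]
centered = center_within_study(participants, "baseline")
```

In this toy input the study means are 22 and 18, so each centered value is the participant's deviation from the mean of their own trial.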

We used finite mixture modeling to evaluate whether response distributions were compatible with drug effects having a broadly applicable incremental improvement over placebo or whether response distributions were instead consistent with a combination of simpler distributions, as might be expected with different response populations. The models tested allowed for different numbers of component distributions and differences between drug and placebo groups, both in component distribution means and in proportional contributions. We also compared models of normal distributions with models of left skewed or right skewed log normal distributions. Minimization of the Akaike and Bayesian information criteria was used to select the best model, along with a requirement that each latent distribution represented at least 1% of the population. We also applied finite mixture modeling to a split sample randomized by trial and to several subgroups: men and women; mild, moderate, and severe baseline severities according to the criteria of the National Institute for Health and Care Excellence (NICE); and participants aged <18 years to evaluate model consistency across subsamples. eTable 3 provides more details of mixture model testing and selection. All statistical analyses were performed with Stata versions 15.1, 16.1, and 17.0.
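The selection rule described above can be sketched as follows, using the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The candidate models, their log likelihoods, parameter counts, and component shares are hypothetical values invented for illustration. Ranking on BIC alone is a simplification, because the analysis considered both criteria.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_model(candidates, n_obs, min_share=0.01):
    """Return the lowest-BIC candidate whose components each hold at
    least `min_share` of the population (simplified rule, BIC only)."""
    eligible = [c for c in candidates if min(c["shares"]) >= min_share]
    return min(eligible, key=lambda c: bic(c["loglik"], c["n_params"], n_obs))


# Hypothetical fits (all numbers invented for illustration)
candidates = [
    {"name": "two components", "loglik": -1000.0, "n_params": 5,
     "shares": [0.60, 0.40]},
    {"name": "three components", "loglik": -980.0, "n_params": 8,
     "shares": [0.20, 0.65, 0.15]},
    {"name": "four components", "loglik": -978.0, "n_params": 11,
     "shares": [0.20, 0.64, 0.155, 0.005]},  # one component below 1%
]
best = select_model(candidates, n_obs=500)
```

In this toy example the four component model is excluded by the 1% requirement, and the three component model has the lower BIC of the remaining two.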

Patient and public involvement

Patients and members of the public were not involved in the design, conduct, reporting, or dissemination of the research. Patients were not involved in the study planning process because industry study data are non-public and submitted to the FDA as part of the agency’s regulatory processes.


Sample characteristics and relations among baseline variables

Table 1 shows sample characteristics of the population. The random effects mean age of participants was 41.8 years, with 90% of participants aged 15-70.4 years. Table 2 and table 3 summarize baseline severity of depression (last assessed before randomized treatment) by demographic characteristics. Baseline severity was considerably lower for participants aged <18 years, particularly those aged ≤12 years. This finding was mostly because a lower level of severity was seen in trials that included only children, particularly in trials that included children aged <12 years. When individuals aged 16 and 17 years were included in adult trials, their mean baseline severity was 20.3 points; in pediatric trials, their average severity was 17.7 points. In pediatric trials with a minimum age of 12 or 13 years, the average baseline severity was 18.8 points; for trials that included participants aged <12 years, the average baseline severity for participants aged ≥12 years was 17.3 points. Within trials, the distribution of baseline severity was slightly skewed, with median severity being about 0.3 points lower than mean severity (eFig 2).

Table 1. Population characteristics

Table 2. Depression severity at baseline: HAMD17 equivalent scores by demographic group

Table 3. Depression severity at baseline across studies by HAMD17 equivalent score

Treatment effects

The random effects mean changes (supplement eTable 2) were improvements of 9.8 points (95% confidence interval 9.5 to 10.0) with active drug and 8.0 points (7.8 to 8.3) with placebo. The difference between drug and placebo was 1.75 points (1.63 to 1.86). The magnitude of the difference was unchanged when the analysis was done separately in subgroups with native HAMD17 scores (1.75, 95% confidence interval 1.57 to 1.93; standardized mean difference 0.232, 95% confidence interval 0.210 to 0.255) and converted scores (1.75, 1.59 to 1.91; 0.245, 0.223 to 0.267).

Influence of and interaction among participant characteristics

When sex was included in the model as the only covariate, little difference in response to placebo was found (0.14 points less improvement in men, 95% confidence interval −0.05 to 0.33, P>0.1). For active drug, the difference was greater (0.35 points less improvement in men, 0.21 to 0.49).

When baseline severity was used as the only covariate, improvement with drug and placebo increased with greater baseline severity (eFig 3). The advantage of drug over placebo increased with baseline severity by 0.09 points (95% confidence interval 0.06 to 0.12) for every one point increase in severity. The estimated difference between drug and placebo at a baseline severity of 16 points (5th centile) was 1.1 points, increasing to 2.5 points at a baseline severity of 29.6 points (95th centile).

With age as the only covariate, we found a linear relation between age and response to placebo for adults; improvement over baseline diminished by 0.30 (95% confidence interval 0.22 to 0.38) points for every decade increase in age. The observed response for adults to active drug also diminished with age.

When sex, baseline severity, and their interaction were included in the model, the differences between sexes in response to placebo (P=0.008) and active drug (P=0.02) were slight but statistically significant. Differences between drug and placebo increased similarly with greater baseline severity for both sexes.

When age, sex, and their interaction were included, both sexes showed a similar (P>0.4 for a difference) response to placebo that decreased with age, but with active drug the difference between sexes increased with age by an estimated 0.13 (95% confidence interval 0 to 0.25) points per decade. For women, the largest improvement over baseline with drug was estimated as 10.1 points at age 30 years; the largest difference between drug and placebo was estimated as 2.2 points at age 62 years. For men, the largest improvement over baseline with drug was estimated as 9.8 points between ages 22 and 23 years; the largest difference between drug and placebo was estimated as 1.7 points at age 57 years.

When age, baseline severity, and their interaction were included, these factors were strong predictors of change from baseline for both drug and placebo, and for difference between drug and placebo. For both drug and placebo, improvement from baseline increased with baseline severity and decreased with age. Comparing pediatric (age <18) participants directly with adults, the unadjusted difference between drug and placebo (eTable 2) was 1.12 points (95% confidence interval 0.66 to 1.57) greater for adults. Adjusted for baseline severity, the difference between drug and placebo was greater in adults by 0.64 points (0.17 to 1.12). Figure 1 shows the interactions among treatment, age, sex, and baseline severity. Generally, the greatest difference between drug and placebo was seen in older participants with higher baseline severity.

Fig 1. Heatmaps showing predicted treatment responses (change from baseline) as a function of sex, age, and baseline severity on the Hamilton rating scale for depression (HAMD17)

Trends over time in study characteristics and results

We found no detectable trend over time in sex distribution (P>0.15). The average age of participants was consistently 42 years until about 2005 when a notable downward trend was seen attributable to a relative absence of older adult participants, with an additional steep decline beginning in 2013 because most trials were conducted in children (79% of participants were aged <18 years). The random effects mean severity at baseline decreased by 1.54 (95% confidence interval 0.47 to 2.61) points, mostly between 1979 and 1995; a further reduction after 2013 was because most of the trials were conducted in children. End of treatment severity seemed to decrease slightly for active drug and placebo, but these trends were not statistically significant. After adjustment for age, sex, and baseline severity, no evidence for change in treatment responses over time was seen (P>0.7).

Response distributions

Figure 2 shows the distributions of responses to drug and placebo (compared with the superimposed modeled distributions). With drug, 41 790 (88.5%) of 47 243 participants showed some improvement (compared with 20 376 (84.4%) of 24 150 for placebo) and the median improvement was 9.8 points (compared with 7.2 points for placebo).

Fig 2. (Top) Fit of the mixture model distributions (curves) for drug and placebo responses with the respective histograms for the observed drug and placebo responses. (Bottom) Overall finite mixture model and component normal distributions for drug and placebo. HAMD17=Hamilton rating scale for depression; SEM=standard error of the mean; SD=standard deviation

The distributions of responses for drug and placebo did not appear unimodal. Analysis with finite mixture modeling found that the optimal model for drug and placebo responses was a combination of three overlapping normal distributions allowed to vary in relative size between drug and placebo (fig 2). The respective modeled overall distributions differed only in the proportions drawn from the underlying latent distributions (corresponding to the area under the curve for each). Allowing different means or skewness in the latent distributions for drug and for placebo did not improve the fit of the model. This trimodal normal model was robust across random subgroups and subgroups defined by baseline characteristics, and was consistently one of the two best models by Akaike and Bayesian information criteria (eTable 3). When other models (including models with four modes) showed a better value for the Akaike or Bayesian information criterion (although never both), no similar consistency across subgroups was found; rather, they seemed to deal with minor deviations from normality in the data.
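In symbols, a model of this kind can be written as below. The notation is introduced here for illustration and is not taken from the paper: $f_g$ is the response density in arm $g$, $\phi$ is the normal density, $\mu_k$ and $\sigma_k$ are the component means and standard deviations shared by both arms, and $\pi_{g,k}$ are the arm specific proportions.

```latex
f_g(x) = \sum_{k=1}^{3} \pi_{g,k}\, \phi\!\left(x;\ \mu_k, \sigma_k^2\right),
\qquad \sum_{k=1}^{3} \pi_{g,k} = 1,
\qquad g \in \{\text{drug}, \text{placebo}\}
```

Under this form, drug and placebo differ only through $\pi_{g,k}$, which matches the finding that allowing arm specific means did not improve the fit.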

One latent distribution (Large) represented a large degree of improvement (mean improvement 16 points, standard deviation 4.2), one (Minimal) represented little or no improvement (1.7, 3.0), and the third (Non-specific) represented a broad range (8.9, 7.0). Compared with placebo, the distribution for active drug was more likely to show Large responses (24.5% v 9.6%, odds ratio 3.07, 95% confidence interval 2.05 to 4.91) and less likely to show Minimal responses (12.2% v 21.5%, 0.51, 0.41 to 0.62). Most responses (63.3% of active drug and 68.9% of placebo), however, were in the Non-specific category. The estimated number needed to treat with active drug to realize one more patient with Large improvement was 6.7 (95% confidence interval 5.7 to 7.7). The number needed to treat with active drug to realize one less patient with Minimal improvement was 10.8 (9.0 to 12.5).
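As a rough arithmetic check, the reported component means and proportions can be combined to recover the overall mean responses and the numbers needed to treat. The sketch below uses only the rounded figures quoted in the text, so its results are approximate, and it does not reproduce the confidence intervals. The function names are ours.

```python
# Component means and arm-specific proportions as reported in the text
MEANS = {"Large": 16.0, "Non-specific": 8.9, "Minimal": 1.7}
DRUG = {"Large": 0.245, "Non-specific": 0.633, "Minimal": 0.122}
PLACEBO = {"Large": 0.096, "Non-specific": 0.689, "Minimal": 0.215}


def mixture_mean(weights):
    """Mean of a mixture: proportion-weighted sum of component means."""
    return sum(weights[name] * MEANS[name] for name in MEANS)


def number_needed_to_treat(p_first, p_second):
    """Reciprocal of the absolute difference between two proportions."""
    return 1.0 / abs(p_first - p_second)


drug_mean = mixture_mean(DRUG)        # about 9.8 points
placebo_mean = mixture_mean(PLACEBO)  # about 8.0 points
nnt_large = number_needed_to_treat(DRUG["Large"], PLACEBO["Large"])
nnt_minimal = number_needed_to_treat(DRUG["Minimal"], PLACEBO["Minimal"])
```

The implied means agree with the random effects mean changes of 9.8 and 8.0 points, and the two reciprocals round to the reported 6.7 and 10.8.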

Drug level differences in treatment effect

Figure 3 shows differences in effect size among active drugs, adjusted for age, sex, and baseline severity. The drugs showing the largest beneficial effect (amitriptyline, clomipramine, and venlafaxine) also showed larger effects in trials where they were directly compared with other agents. These drugs represented about 10% (4689 of 48 495) of participants who received active drug, and the distribution of responses for these drugs (28% Large, 62% Non-specific, 10% Minimal) differed modestly from the other antidepressants (24% Large, 63% Non-specific, 12% Minimal).

Fig 3. Estimated effect for each drug. Center line shows the overall (random effects) mean effect for active drug to show how each drug differs from the average. Placebo line is a reference for how the differences among drugs compare with the differences between drugs and placebo


Principal findings and comparison with other studies

This participant level analysis of all placebo controlled monotherapy antidepressant efficacy trials submitted to the FDA between 1979 and 2016 provides a comprehensive assessment of treatment responses and net drug effects. Consistent with meta-analyses, where the effects of antidepressant drugs on HAMD17 ranged from 1.62 [24] to 2.56 [15], with standardized mean differences of 0.23 [13] to 0.34 [32], we found a drug effect among adults equivalent to 1.82 points, with a standardized mean difference of 0.24. For pediatric participants, the drug effect was 0.71 points, with a standardized mean difference of 0.13. Some studies have attributed the relatively small standardized mean differences to increases in the placebo response over time [28, 29, 33], but this hypothesis was not supported in our study.

Similar to Thase et al [34], we found that a multimodal mixture model was the best fit for the data. However, we found that participants seemed to belong to one of three types of response populations. About two thirds of participants assigned drug and placebo had a Non-specific response. Those treated with drug were more likely to show a Large response (24.5% v 9.6% with placebo), however, and less likely to have a Minimal response (12.2% v 21.5%). Thus the observed advantage of antidepressants over placebo is best understood as affecting a minority of patients as either an increase in the likelihood of a Large response or a decrease in the likelihood of a Minimal response. Examination of response distributions by demographic and baseline factors (eFigs 9-12) showed differences in the overall distributions between subgroups, consistent with our findings in the regression model of the effects of baseline severity and age on treatment responses. The subgroup distributions were best described by trimodal models (eTable 3), however, showing that our findings were not artifacts of response patterns of different subgroups. This result highlights continued potential for identification of (endo)phenotypes that are specifically responsive to antidepressant drugs.

The Non-specific distribution included a broad range of responses. Its distinctly broad range and similar likelihood for drug and placebo groups suggest that these responses might reflect the diverse interactions of individual characteristics with placebo and other effects not related to drug treatment, such as response to increased clinical contact, spontaneous improvements, and regression toward the mean. The Large and Minimal response distributions were distinguished from the Non-specific category by differences in means and smaller standard deviations. The Large response was more than twice as likely with drug as with placebo and might be qualitatively different, showing a change in the depressive state rather than symptomatic attenuation; >90% of those having a Large response were subthreshold or not depressed according to the NICE criteria.


The trimodal model provides a different understanding of what is meant by clinical response. Response has been conventionally defined in the literature in clinical trials for depression by a 50% or greater improvement from baseline. These threshold definitions, although useful, are arbitrary. Also, comparing numbers of participants in response categories defined by thresholds cannot distinguish between a uniformly shifted distribution pushing a subset of patients over a threshold, and one where different subgroups have different treatment effects. Thus unlike latent distributions in a mixture model, outcome categories defined by thresholds also cannot suggest differential underlying processes for how a change in symptoms came about. The finite mixture models presented in this paper give insight into how participants might have achieved their responses. A useful analogy might be the difference between phenotype and genotype. The observed improvement is analogous to phenotype, whereas underlying response distributions are analogous to genotypes.

Our findings could therefore suggest a different framing of the discussion of the balance of potential benefits and harms for acute prescribing of antidepressant drugs. Our results suggest that rather than being likely to provide a small incremental benefit in reducing symptoms (beyond a placebo response), there is a modestly increased likelihood of providing a large near term reduction in symptoms or preventing continued near term symptom severity that would not have occurred otherwise. Also, patients with only a modest improvement in symptoms with an adequate trial of acute antidepressant treatment might not be having a drug specific effect (92% of those having a Large response improved by ≥10 points) and thus might need to switch to another treatment.
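The distinction between an observed improvement and its latent response category can be made concrete with the reported component parameters. The sketch below is our own illustration, not part of the published analysis: it computes the share of the Large component that lies at or above 10 points, and the probability that a drug treated participant with a 10 point improvement belongs to each component. It assumes the rounded means, standard deviations, and proportions quoted in the Results.

```python
import math

# (mean, SD) of each latent component, as reported in the Results
COMPONENTS = {
    "Large": (16.0, 4.2),
    "Non-specific": (8.9, 7.0),
    "Minimal": (1.7, 3.0),
}
# Proportions for the active drug arm, as reported in the Results
DRUG_WEIGHTS = {"Large": 0.245, "Non-specific": 0.633, "Minimal": 0.122}


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership(x, weights):
    """Probability of each component given an observed improvement x."""
    joint = {k: weights[k] * normal_pdf(x, *COMPONENTS[k]) for k in COMPONENTS}
    total = sum(joint.values())
    return {k: value / total for k, value in joint.items()}


# Share of Large responders improving by at least 10 points (about 92%)
share_large_ge_10 = 1.0 - normal_cdf(10.0, *COMPONENTS["Large"])

# Component probabilities for a drug-treated participant improving by 10 points
post_10 = membership(10.0, DRUG_WEIGHTS)
most_likely = max(post_10, key=post_10.get)
```

With these inputs, a 10 point improvement on drug is most likely to come from the Non-specific component, even though nearly all Large responses exceed 10 points. This is the sense in which a threshold alone cannot identify a drug specific response.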

Previous analyses of the relation between baseline severity and the efficacy of antidepressants found null [23, 24] to moderate (slope about 0.3) [14, 16] effects, and these effects might be attributable to instrument behavior rather than patient experience [25]. We found that the effect of baseline severity was statistically significant (P<0.001) but small (slope about 0.1). This finding could be partially because of the ceiling effects on improvement among participants who were less depressed but might also be attributable to an increasing likelihood of a specifically drug responsive phenotype among those with greater symptom severity. In line with the NICE guidelines [32], given the modest absolute likelihood of substantial benefit over placebo, and when consistent with availability and patient preferences, beginning with lower risk treatments for mild to moderate acute depression might be preferable if no underlying dysthymic disorder exists.

Strengths and limitations

A key strength of the study was its reliance on a comprehensive dataset free of publication bias [35], including published and unpublished data. Our findings are limited to acute efficacy while on treatment by the nature of the trials evaluated and the available subject level data; with only observations for baseline and end of treatment, we could not look at symptom trajectories in this study, or other relevant targets of estimation, such as expected symptom severity at a predefined time point (eg, symptom status two months after the start of treatment).

The database did not include details of study design, such as inclusion and exclusion criteria, and their effect on our findings cannot be assessed. Other limitations include lack of more demographic information, evaluation of blinding [36], individual length of treatment, and item level data that might look at or refine interpretation of modeled subpopulations and the clinical generalizability of the study sample. For example, the effects of treatment history of antidepressants [26] or discontinuations before study entry on our results are unknown, and drug registration trials usually exclude participants with recent suicidal ideation or suicide attempts, or major medical or psychiatric comorbidities. The patients included in our analysis are thus likely to have had less clinically complex but more acutely severe depression than is typically seen in the community. The STAR*D (Sequenced Treatment Alternatives to Relieve Depression) trial found that about 78% of patients with major depressive disorder are excluded from typical clinical trials. The mean response to antidepressant medication in the first phase of the STAR*D trial was substantially (3.3 points) smaller than that seen in our data, but this finding also does not necessarily indicate that drug effects were smaller [37].

We cannot fully exclude the possibility that the effects of the drugs are accounted for by functional unblinding [36]. This possibility could result in biased assessments by unblinded raters or an increased placebo effect in unblinded participants. If bias were the primary driver of drug effects, we might expect shifts in the means of the response distributions for active drug relative to placebo, or additional response modes, limited to active drug, generated by the subset of participants who were unblinded. Such effects were not seen, however. On the other hand, although the observed distributions cannot exclude drug effects being accounted for by an increased placebo effect because of functional unblinding, drugs with more marked unblinding potential, such as trazodone, mirtazapine, and bupropion, would be expected to evidence larger mean treatment effects than others, but this effect was not seen (fig 3). Finally, HAMD17 has been criticized as a method of assessing changes in depressive symptoms in clinical trials. We found that the Montgomery-Asberg Depression Rating Scale and the Children’s Depression Rating Scale correlated strongly with the HAMD17, however, and studies that used the HAMD17 and other scales produced estimates nearly identical in magnitude.


Patients with depression are likely to improve substantially from acute treatment of their depression with drug or placebo. Although the mean effect of antidepressants is only a small improvement over placebo, the effect of active drug seems to increase the probability that any patient will benefit substantially from treatment by about 15%. Further research is needed to identify the subset of patients who are likely to require antidepressants for substantial improvement. The potential for substantial benefit must be weighed against the risks associated with the use of antidepressants, as well as consideration of the risks associated with other treatments that have shown similar benefits [38-40]. Because the benefits and risks might be categorically different (eg, reduced sadness v anorgasmia), this weighing should be done at the individual level, jointly by patients and their care providers.

What is already known on this topic

  • Clinical trials of antidepressants in major depressive disorder show substantial mean improvement with both drug and placebo

  • Meta-analyses have confirmed that antidepressants have greater efficacy than placebo, but the mean difference is small

What this study adds

  • After accounting for participant baseline severity, age, and sex, placebo responses and drug effects were stable over time

  • Antidepressants and placebo showed the same three modal responses

  • The small mean advantage of antidepressants is because of differences between drug and placebo in a minority of participants in the likelihood of achieving a Large response or avoiding a Minimal response

Ethics statements

Ethical approval

Not required; analysis of aggregated deidentified clinical trial data.

Data availability statement

No additional data available.


This article reflects the views of the authors and should not be construed to represent US Food and Drug Administration’s views or policies.


  • Contributors: MBS (lead author and guarantor) designed and executed the analyses and wrote the manuscript. ZSY contributed to the design of the analyses and wrote the manuscript. BJM contributed to the design of the analyses and the manuscript. KR compiled the data analyzed and contributed to the manuscript. SNK compiled the data analyzed and contributed to the manuscript. IK contributed to the design of the analyses and the manuscript. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

  • Funding: None.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years except for BJM who reports serving as a member of the CMS Medicare Evidence Development and Coverage Advisory Committee and receiving fees outside the related work from the Federal Trade Commission, the Health Resources and Services Administration, the Heritage Foundation, and Oxidien Pharmaceuticals; BJM was a medical officer at the FDA in the Division of Psychiatry from 2016 to 2017; no other relationships or activities that could appear to have influenced the submitted work.

  • The lead author (the manuscript’s guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

  • Dissemination to participants and related patient and public communities: Dissemination of the work to the public and clinical community through social media and lectures is planned.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: