Autism intervention meta-analysis of early childhood studies (Project AIM): updated systematic review and secondary analysis
BMJ 2023; 383 doi: https://doi.org/10.1136/bmj-2023-076733 (Published 14 November 2023) Cite this as: BMJ 2023;383:e076733
- Micheal Sandbank, assistant professor1,
- Kristen Bottema-Beutel, associate professor2,
- Shannon Crowley LaPoint, postdoctoral research fellow3,
- Jacob I Feldman, research fellow45,
- D Jonah Barrett, doctoral student6,
- Nicolette Caldwell, research fellow7,
- Kacie Dunham, doctoral student48,
- Jenna Crank, independent researcher9,
- Suzanne Albarran, doctoral student10,
- Tiffany Woynaroski, assistant professor4581112
- 1Division of Occupational Science and Occupational Therapy, Department of Health Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- 2Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, USA
- 3TEACCH Autism Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- 4Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- 5Frist Center for Autism and Innovation, Vanderbilt University, Nashville, TN, USA
- 6University of Alabama at Birmingham, Birmingham, AL, USA
- 7Department of Curriculum and Instruction, University of Arkansas, Fayetteville, AR, USA
- 8Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
- 9Austin, TX, USA
- 10Department of Special Education, University of Texas at Austin, Austin, TX, USA
- 11Vanderbilt Kennedy Center, Nashville, TN, USA
- 12Department of Communication Sciences and Disorders, John A Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, USA
- Correspondence to: M Sandbank (or @michealsandbank.bsky.social on Bluesky)
- Accepted 29 September 2023
Objective To summarize the breadth and quality of evidence supporting commonly recommended early childhood autism interventions and their estimated effects on developmental outcomes.
Design Updated systematic review and meta-analysis (autism intervention meta-analysis; Project AIM).
Data sources A search was conducted in November 2021 (updating a search done in November 2017) of the following databases and registers: Academic Search Complete, CINAHL Plus with full text, Education Source, Educational Administration Abstracts, ERIC, Medline, ProQuest Dissertations and Theses, PsycINFO, Psychology and Behavioral Sciences Collection, SocINDEX with full text, Trials, and ClinicalTrials.gov.
Eligibility criteria for selecting studies Any controlled group study testing the effects of any non-pharmacological intervention on any outcome in autistic children younger than 8 years.
Review methods Newly identified studies were integrated into the previous dataset and were coded for participant, intervention, and outcome characteristics. Interventions were categorized by type of approach (such as behavioral, developmental, naturalistic developmental behavioral intervention, and technology based), and outcomes were categorized by domain (such as social communication, adaptive behavior, play, and language). Risks of bias were evaluated following guidance from Cochrane. Effects were estimated for all intervention and outcome types with sufficient contributing data, stratified by risk of bias, using robust variance estimation to account for intercorrelation of effects within studies and subgroups.
Results The search yielded 289 reports of 252 studies, representing 13 304 participants and effects for 3291 outcomes. When contributing effects were restricted to those from randomized controlled trials, significant summary effects were estimated for behavioral interventions on social emotional or challenging behavior outcomes (Hedges’ g=0.58, 95% confidence interval 0.11 to 1.06; P=0.02), developmental interventions on social communication (0.28, 0.12 to 0.44; P=0.003); naturalistic developmental behavioral interventions on adaptive behavior (0.23, 0.02 to 0.43; P=0.03), language (0.16, 0.01 to 0.31; P=0.04), play (0.19, 0.02 to 0.36; P=0.03), social communication (0.35, 0.23 to 0.47; P<0.001), and measures of diagnostic characteristics of autism (0.38, 0.17 to 0.59; P=0.002); and technology based interventions on social communication (0.33, 0.02 to 0.64; P=0.04) and social emotional or challenging behavior outcomes (0.57, 0.04 to 1.09; P=0.04). When effects were further restricted to exclude caregiver or teacher report outcomes, significant effects were estimated only for developmental interventions on social communication (0.31, 0.13 to 0.49; P=0.003) and naturalistic developmental behavioral interventions on social communication (0.36, 0.23 to 0.49; P<0.001) and measures of diagnostic characteristics of autism (0.44, 0.20 to 0.68; P=0.002). When effects were then restricted to exclude those at high risk of detection bias, only one significant summary effect was estimated—naturalistic developmental behavioral interventions on measures of diagnostic characteristics of autism (0.30, 0.03 to 0.57; P=0.03). Adverse events were poorly monitored, but possibly common.
Conclusion The available evidence on interventions to support young autistic children has approximately doubled in four years. Some evidence from randomized controlled trials shows that behavioral interventions improve caregiver perception of challenging behavior and child social emotional functioning, and that technology based interventions support proximal improvements in specific social communication and social emotional skills. Evidence also shows that developmental interventions improve social communication in interactions with caregivers, and naturalistic developmental behavioral interventions improve core challenges associated with autism, particularly difficulties with social communication. However, potential benefits of these interventions cannot be weighed against the potential for adverse effects owing to inadequate monitoring and reporting.
Autism is a relatively common diagnosis, with current estimates suggesting approximately 1-4% of the population is affected.123 Early childhood interventions are often strongly recommended for young autistic children to promote skill gain in areas that might contribute to positive long term outcomes.45 Pediatricians and other physicians are often the first line of care directing families of autistic children to early childhood interventions to support their development. Therefore, physicians should be familiar with the available interventions and the landscape of evidence supporting them to make practice recommendations. However, the types of early childhood interventions recommended for this population vary widely in terms of approach and intensity, and current best practice guidelines differ across countries. For example, in the United States, the most commonly recommended treatment is early intensive behavioral intervention, an approach that incorporates operant conditioning, targets functional skills, and is characterized by a recommended intensity of 20-40 hours per week.6 In contrast, the National Institute for Health and Care Excellence in England concluded that only two intervention approaches have sufficient evidence to support their use. These relatively low intensity interventions are pediatric autism communication therapy, and joint attention, symbolic play, engagement and regulation (JASPER), which target early social communication in the context of natural interactions.7 Previous attempts to synthesize intervention evidence to generate consistent clinical guidelines have been hindered by a number of factors: heterogeneous intervention approaches that prevented aggregation of evidence; low standards of evidence for designating practices as evidence based; limited evaluation of intervention outcomes; overreliance on vote counting over quantitative synthesis; and a rapidly transforming evidence base.
Consequently, doctors, clinicians, and families have to navigate confusing and often conflicting guidance on which supports are the most likely to be efficacious for autistic children.
Project AIM: initial scope and findings
We conducted Project AIM (autism intervention meta-analysis), a scoping systematic review and meta-analysis of controlled group studies of any non-pharmacological intervention designed to support any outcome in young autistic children.8 The initial search identified 139 studies of common intervention approaches (parsed by type) on various outcomes (categorized by domain) in young autistic children. This quantitative synthesis of evidence from group design intervention studies allowed comparison of the overall quality and findings of evidence according to intervention approach. The results of Project AIM were selected by the Interagency Autism Coordinating Committee of the US Department of Health and Human Services as an important advance in autism research.9 Subsequently, the findings have been incorporated into clinical guidelines510 and continue to shape intervention recommendations for young autistic children.
The initial Project AIM investigation and subsequent secondary analyses documented gaps in study quality, within specific intervention types and across the literature as a whole. Notable gaps included an overrepresentation of quasi-experimental studies (ie, in which participants were not randomly assigned to treatment groups), overreliance on outcomes measured by unmasked assessors and proxy reports, and inadequate monitoring of adverse events and harms.811 When intervention effects were estimated from all available evidence, regardless of quality, several intervention approaches were estimated to have positive and statistically significant effects on a variety of outcomes. However, when study quality was taken into account and effects were restricted to those immune to these risks of bias (ie, selection bias, detection bias, placebo-by-proxy bias), no intervention approach was estimated to have positive and statistically significant intervention effects on any outcome. Although adverse events, effects, and harms were inadequately monitored, many studies reported information indicating they occurred (such as reasons for attrition that should have been reported as adverse events, or statistically significant negative effects on a measured outcome, which would qualify as a harm).11
Our initial report also found that intervention effects were larger on proximal outcomes that were specifically targeted in the intervention compared with outcomes indicating more distal developmental improvement; and on outcomes that were measured in contexts identical or similar to those of the intervention compared with those generalized to other contexts.12 These findings show that conclusions drawn about intervention effectiveness, in addition to being compromised by quality concerns, are dependent on researcher measurement decisions. Interventions shown to have proximal impacts in specific contexts are often designated as effective, with little attention given to the limited scope of change quantified by these effects. These designations are then repeated in systematic reviews and meta-analyses, and eventually shape clinical guidelines until most interventions that have been designated as evidence based and recommended for clinical use are those that have been shown to effect circumscribed and specific change, rather than generalized developmental gains. Although physicians guiding families to clinical supports are often led to believe that a substantial evidence base endorses the effectiveness of interventions for enabling broad developmental improvements, our work showed that there is little evidence that this occurs. Families might then believe that their child’s lack of improvement when participating in these interventions is indicative of the complexity of their child’s condition, rather than the inadequacy of the interventions available.
Need for an updated meta-analysis
Although the initial Project AIM report was published in January 2020, the search was completed in November 2017. Therefore, studies published after the search date were not included in quality evaluations and summary effect estimation. During the decade before the search, there were considerable increases in funding and the rate of publication of autism related research.13 Two thirds of the studies and randomized controlled trials included in Project AIM were published in the five years preceding the search date. Therefore, it was reasonable to assume that a substantial number of studies, including many randomized controlled trials, had been published since the original search was conducted. The rapidly expanding evidence base suggests that an updated review would ensure the conclusions of Project AIM reflect the most recent evidence on interventions for young autistic children and provide guidance to medical professionals, specialist clinicians, and families.
In the current report, we sought to answer the following questions with an updated dataset that integrated studies published since 2017 (inclusive of all research conducted from 1975 to 2021):
2. What percentage of studies monitored and reported or showed evidence of adverse events, adverse effects, or harms?
3. When all available evidence is considered from randomized controlled trials, what intervention types are estimated to have positive and statistically significant effects on targeted outcomes?
4. What intervention types are estimated to have positive and statistically significant effects on targeted outcomes when evidence from randomized controlled trials is further restricted to outcomes measured directly (ie, excluding caregiver or teacher report outcomes) and by masked assessors?
5. Are intervention effects on proximal outcomes significantly larger than intervention effects on distal outcomes? Are intervention effects on context bound outcomes significantly larger than intervention effects on generalized outcomes?
Search terms and databases
We completed an updated search on 17 November 2021 that replicated the initial Project AIM search in terms and databases, but was limited to studies published after 1 November 2017 (the date of our previous search). Searched databases included Academic Search Complete, CINAHL Plus with full text, Education Source, Educational Administration Abstracts, ERIC, Medline, ProQuest Dissertations and Theses, PsycINFO, Psychology and Behavioral Sciences Collection, and SocINDEX with full text. Search terms are listed in the supplementary materials. This search yielded 6427 records that were then screened. In addition to searching databases that index dissertations and theses, we sought unpublished data by searching the journal Trials for published protocols and ClinicalTrials.gov with the search term “autism” to identify potentially relevant registered but unpublished clinical trials. Potentially relevant trials (n=168) were identified and we emailed researchers associated with those trials (n=187) with a request to share information that would allow their inclusion in the updated meta-analysis.
All identified records were double screened at the abstract level by two of 13 independent screeners using the web application abstrackr.14 Any record flagged as potentially eligible by at least one screener was then examined at the full text level. Studies were considered eligible if they were published in English between November 2017 and November 2021; they were experimental (ie, a randomized controlled trial) or quasi-experimental design group studies that included an intervention and a control or comparison group; they reported a simple majority of participants had autism; they included participant samples with an average age <8 years (<96 months); and they had not already been included in the previous Project AIM meta-analysis. Some studies that were eligible for inclusion reported insufficient data to allow extraction of appropriate effect sizes. For each of these studies, the first author contacted the corresponding author by email to request information that would allow effect size calculation.
All studies were independently double coded by the first author and by one of a team of five trained reliability coders. All discrepancies were identified and resolved through discussion before final codes were entered. Coding procedures were nearly identical to those used in the original Project AIM, but are briefly described here. The coding manual can be accessed through an online data repository.15 This update was not registered but replicated procedures used in the previous meta-analysis.
When reported, we extracted the following participant sample characteristics from reports: chronological age in months; language age in months (expressive was given preference, but receptive and total language ages were also extracted in the absence of expressive); and proportion of the sample reported as male.
Interventions were coded for type, setting, implementer, and cumulative intensity (total amount of intervention provided to participants in hours across the duration of the study). Approaches were categorized as belonging to one of nine possible types using the categorization system derived for the original Project AIM: animal assisted; behavioral; cognitive behavioral therapy; developmental; naturalistic developmental behavioral intervention (NDBI); TEACCH (formerly treatment of autistic and related communications handicapped children); technology based; sensory based; or other. Sensory based interventions were then further coded as sensory integration therapy, other sensory based interventions, or music therapy to ensure that interventions were grouped according to a consistent theory of change. We provide a non-exhaustive list of examples for each intervention type below. In the rare event that coders were unable to agree on intervention type, the first author contacted the corresponding author of the study and asked for their input.
Animal assisted therapy—Interventions that were mediated entirely through the presence of an animal or that were characterized primarily by interaction with an animal were categorized as animal assisted therapy. Examples include equine assisted therapy and use of service dogs.
Behavioral—Interventions were categorized as traditional behavioral interventions if they relied heavily on operant principles of learning and corresponding techniques (eg, were primarily adult led, used explicit instruction and prompting, provided explicit reinforcement with tangible rewards). Examples include early intensive behavioral intervention, the picture exchange communication system, and discrete trial training.
Cognitive behavioral therapy—Interventions described as cognitive behavioral therapy, which focuses on identifying and changing thinking patterns in an effort to change behavior, were categorized as such.
Developmental—Interventions were categorized as developmental if they were primarily child led, stressed the relational or transactional and social underpinnings of development, and taught skills according to a developmental sequence with the goal of allowing developmental cascades by repairing breakdowns in relational cycles. Examples include pediatric autism communication therapy and Hanen models.
Music therapy—Interventions were categorized as music therapy if they were explicitly described as such or incorporated music and rhythm based experiences toward therapeutic ends.
Naturalistic developmental behavioral intervention—Interventions were categorized as NDBIs if they were one of several named interventions listed in the consensus paper by Schreibman and colleagues.16 These interventions are motivated by developmental and behavioral theories of learning and characterized by child initiated interactions that give way to reciprocal social routines that are maintained through shared control by the adult and child. NDBIs are naturalistic in that they take place in environments and within routines that are already likely to occur in the child’s life (eg, play in the home or community), and rely on natural antecedents and rewards. Intervention targets tend to center on early social communication skills that are thought to serve as a foundation for further developmental cascades. Examples include the early start Denver model, JASPER, and pivotal response teaching.
Sensory integration therapy—Interventions were described as sensory integration therapy if they were explicitly described as such or were characterized by structured exposure to several types of sensory opportunities (ie, tactile, vestibular, proprioceptive).1718 This category included Ayres sensory integration and more general sensory integration therapy.
Sensory based interventions—Interventions were categorized as sensory based interventions if they incorporated targeted exposure to sensory stimuli with the goal of enhancing processing of sensory stimuli and theoretically related outcomes, but did not include several types of sensory opportunities. Examples include auditory integration, touch therapy, and massage.
TEACCH—Interventions were categorized as TEACCH if they were explicitly named as such. TEACCH is characterized by heavy reliance on predictable environments, structured work systems, and routines (eg, visual and picture activity schedules).
Technology based interventions—Interventions were categorized as technology based if technological mediation was described as the main change agent of the intervention. Technologies included electronic devices such as computers, iPads, or robots.
Other—Interventions that could not be adequately categorized in the previous categories were coded as other for type. These studies were excluded from summary effect estimation, but included in moderator analyses.
Outcomes were coded for domain, proximity, and boundedness.
Domain—Dependent variable names were extracted for each outcome, and outcomes were coded as representing diagnostic characteristics of autism (ie, social communication; restricted or repetitive patterns of behaviors, interests, or activities; sensory; or overall autism features—ie, total scores on diagnostic assessments) or related domains (ie, brain imaging, academic, adaptive, cognitive, language, motor, play, sleep, social emotional or challenging behavior). Further details about outcome domain coding are provided in the previous report.8 A non-exhaustive list of example measures and associated metrics represented in each domain category for which we estimated summary effects is provided in the supplementary materials.
Proximity—Outcomes derived from measures of skills and developmental achievements directly taught or modeled in the intervention were coded as proximal, while those derived from measures of broader skill sets and developmental milestones (across the targeted domain or in a distinct untargeted domain) were coded as distal.
Boundedness—To code outcome boundedness, coders considered the context of outcome measurement and context of the intervention across the four dimensions of materials, setting, interaction partner, and interaction style to determine the degree to which they matched. Outcomes measured in contexts that were the same or highly similar (differing on only one dimension) to that of the intervention context were coded as context bound. Outcomes measured in contexts that differed across two or more dimensions from the intervention context were coded as generalized.
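The boundedness rule above can be written as a small decision function. This is a minimal sketch, not the authors' coding tool: the dimension keys and the example context values are hypothetical labels chosen for illustration, and only the threshold (a mismatch on at most one of the four dimensions counts as context bound) comes from the text.

```python
# Four dimensions compared between intervention and outcome contexts.
# The key names are illustrative assumptions, not the coding manual's labels.
DIMENSIONS = ("materials", "setting", "interaction_partner", "interaction_style")


def code_boundedness(intervention_context: dict, outcome_context: dict) -> str:
    """Return 'context bound' or 'generalized' for one outcome."""
    # Count the dimensions on which the two contexts do not match.
    differing = sum(
        intervention_context[dim] != outcome_context[dim] for dim in DIMENSIONS
    )
    # Same or highly similar (0 or 1 differing dimension) is context bound.
    return "context bound" if differing <= 1 else "generalized"


if __name__ == "__main__":
    intervention = {
        "materials": "therapy toy set",
        "setting": "clinic",
        "interaction_partner": "caregiver",
        "interaction_style": "play based",
    }
    # Differs from the intervention on setting and interaction partner.
    outcome = dict(intervention, setting="home", interaction_partner="examiner")
    print(code_boundedness(intervention, outcome))
```

In practice this judgment was made by trained human coders; the function only makes the counting rule explicit.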
Risks of bias
Studies and outcomes were coded for standard risks of bias following guidance from Cochrane (ie, selection bias, detection bias, performance bias, attrition bias)19 and reliance on caregiver or teacher report to index outcomes. We coded this additional outcome characteristic because it indicates the risk of a specific type of detection bias known as placebo-by-proxy bias, which is introduced when assessors are not only aware of participant group assignment, but also personally invested in the outcome.20
Adverse events, effects, and harms
Adverse events refer to any unfavorable outcomes that occur during or subsequent to participating in an intervention, but that might or might not have been caused by the intervention. Adverse effects refer to unfavorable outcomes that can be reasonably attributed to the intervention. Harms refer to sustained deterioration during or after participation in an intervention.21 Using procedures similar to those outlined by Bottema-Beutel and colleagues,11 two independent coders searched full text copies of each article for the following terms: “adverse events,” “adverse effects,” “harm,” “side effect,” and “complication,” and coded whether adverse events were reported, whether the adverse events were described in such a way that they could reasonably be considered adverse effects, the number of adverse events reported, and whether the authors described adverse event monitoring procedures. Direct quotes describing the adverse events and monitoring procedures were copied and pasted verbatim into the coding spreadsheet.
Effect size information
For each reported outcome, postintervention means, standard deviations, and sample sizes were extracted for the intervention and counterfactual groups. These values were used to calculate and code the standardized mean difference (d) representing each intervention effect and corresponding variance. In the few cases in which reported outcomes were dichotomous, frequencies were extracted to estimate d and its variance. All calculations of d were derived using the Campbell Collaboration Practical Meta-Analysis Effect Size Calculator.22 Standardized mean difference effect sizes were then converted to Hedges’ g to correct for small sample sizes using R statistical computing software (R Core Team, 2022). Effect sizes were reflected for outcomes for which lower scores were considered adaptive, so that directionality of effect size was consistent across all effects.
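As an illustration of the effect size steps described above, the following sketch computes a standardized mean difference from postintervention summary statistics and applies the small sample correction to obtain Hedges' g. The formulas are the standard textbook ones (pooled standard deviation, correction factor J = 1 - 3/(4df - 1)) and are assumed here; the authors used the Campbell Collaboration calculator and R, so this Python function and its name are illustrative only.

```python
import math


def hedges_g(mean_tx, sd_tx, n_tx, mean_ctrl, sd_ctrl, n_ctrl, lower_is_better=False):
    """Return (g, variance of g) from group means, SDs, and sample sizes."""
    df = n_tx + n_ctrl - 2
    # Pooled standard deviation across the two groups.
    sd_pooled = math.sqrt(((n_tx - 1) * sd_tx**2 + (n_ctrl - 1) * sd_ctrl**2) / df)
    d = (mean_tx - mean_ctrl) / sd_pooled
    if lower_is_better:
        # Reflect the effect so that positive values always favor the intervention.
        d = -d
    # Large sample approximation to the variance of d.
    var_d = (n_tx + n_ctrl) / (n_tx * n_ctrl) + d**2 / (2 * (n_tx + n_ctrl))
    # Small sample correction factor.
    j = 1 - 3 / (4 * df - 1)
    return j * d, j**2 * var_d


if __name__ == "__main__":
    # Hypothetical numbers, not taken from any included study.
    g, var_g = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
    print(round(g, 3), round(var_g, 4))
```

The `lower_is_better` flag mirrors the reflection step in the text for outcomes where lower scores are considered adaptive.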
Primary and reliability coding sheets were independently sent to a separate coding auditor who identified discrepancies using a program created in-house for this purpose. Initial codes were stored in a separate location for reliability analyses before discrepancy discussions. Reliability was calculated in R23 with the irr package.24 Reliability was indexed using unweighted κ for categorical variables,25 where values over 0.6 reflect substantial agreement and values over 0.8 reflect near perfect agreement; and two way absolute intraclass correlation coefficients for continuous variables,26 where values over 0.75 reflect strong agreement and values over 0.9 reflect excellent agreement. κ values ranged from 0.62 to 0.9, and the average κ across all included categorical variables was 0.75. Intraclass correlation coefficients ranged from 0.68 to 0.99, and the average value across all continuous variables was 0.90.
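To make the reliability index concrete, the sketch below computes unweighted Cohen's κ for two coders from first principles. It is an illustrative Python equivalent under the standard definition of κ (observed agreement corrected for chance agreement), not the irr code the authors ran in R; the function name and example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical codes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty set of items.")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[cat] * freq_b[cat] for cat in freq_a) / n**2
    if expected == 1:
        raise ValueError("Kappa is undefined when both coders use a single category.")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    primary = ["proximal", "proximal", "distal", "distal"]
    reliability = ["proximal", "distal", "distal", "distal"]
    print(cohens_kappa(primary, reliability))  # 0.5
```

With three of four codes matching, observed agreement is 0.75 and chance agreement is 0.5, which gives κ = 0.5, below the 0.6 threshold for substantial agreement.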
All analyses were conducted in R.23 Given that this meta-analysis involved a complex data structure, wherein multiple effect sizes were extracted within overlapping participant samples and these effect sizes were then categorized according to intervention and outcome type, effect sizes were analyzed using robust variance estimation meta-analysis to account for the dependence structure of the data. Specifically, subgroup correlated effects working models27 were used to aggregate effect sizes based on type of outcome (outcome characteristics) within each type of intervention (intervention characteristics) using the clubSandwich28 and metafor packages.29 Summary effect sizes for each outcome and intervention type were only retained in the final models when the degrees of freedom (df) were greater than five.30 For moderation analyses, putative categorical moderators (ie, proximal v distal outcome, generalized v context bound outcome) were added to the final model used to estimate summary effects for all studies.
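The core idea of robust variance estimation, that effect sizes from the same study are treated as one cluster when computing the standard error, can be shown with a deliberately simplified sketch. It assumes an intercept only model with user supplied weights and the basic (CR0) sandwich estimator. It omits the subgroup correlated effects working model, the small sample adjustments, and the degrees of freedom calculation that the authors obtained from clubSandwich and metafor in R, so it should not be read as their analysis code.

```python
import math
from collections import defaultdict


def robust_pooled_effect(effects):
    """Pool effects with a cluster robust standard error.

    effects: list of (study_id, g, weight) tuples; several tuples may share
    a study_id when one study contributes multiple outcomes.
    """
    total_weight = sum(w for _, _, w in effects)
    estimate = sum(w * g for _, g, w in effects) / total_weight
    # Sum weighted residuals within each study so that correlated effects
    # from the same sample are not treated as independent information.
    cluster_scores = defaultdict(float)
    for study, g, w in effects:
        cluster_scores[study] += w * (g - estimate)
    variance = sum(score**2 for score in cluster_scores.values()) / total_weight**2
    return estimate, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical effects: study A contributes two outcomes, study B one.
    data = [("A", 0.2, 1.0), ("A", 0.4, 1.0), ("B", 0.6, 2.0)]
    estimate, se = robust_pooled_effect(data)
    print(round(estimate, 3), round(se, 3))
```

With few studies per category the uncorrected sandwich estimator tends to understate uncertainty, which is one reason summary effects were retained only when the degrees of freedom were sufficient.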
To assess possible publication bias across the extant literature, we conducted Egger’s regression test with cluster robust variance estimation methods (ie, the Egger MLMA with standard error approach) using the rma.mv function in the metafor package with the SCE covariance matrix obtained from the clubSandwich package.31 Egger’s regression test was conducted for each presented model, and within interventions for outcome types with df≥5 in the null models.
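The logic of Egger's regression can be sketched as a weighted regression of each effect size on its standard error, with the slope's standard error clustered by study. This is a simplified illustration that assumes inverse variance weights and the basic (CR0) sandwich estimator; the authors fitted the model with rma.mv in metafor and a clubSandwich covariance matrix, which this pure Python function does not reproduce. A slope clearly different from zero would suggest small study effects consistent with publication bias.

```python
import math
from collections import defaultdict


def egger_cluster_robust(effects):
    """Return (slope, robust SE of slope) for g regressed on its standard error.

    effects: list of (study_id, g, se) tuples.
    """
    rows = [(study, g, se, 1.0 / se**2) for study, g, se in effects]
    sw = sum(w for _, _, _, w in rows)
    sx = sum(w * se for _, _, se, w in rows)
    sy = sum(w * g for _, g, _, w in rows)
    sxx = sum(w * se * se for _, _, se, w in rows)
    sxy = sum(w * se * g for _, g, se, w in rows)
    det = sw * sxx - sx * sx
    if abs(det) < 1e-12 * sw * sxx:
        raise ValueError("Standard errors must vary across effects.")
    # Weighted least squares slope and intercept.
    slope = (sw * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / sw
    # Per study score vectors: weighted residuals and residuals times predictor.
    scores = defaultdict(lambda: [0.0, 0.0])
    for study, g, se, w in rows:
        resid = g - (intercept + slope * se)
        scores[study][0] += w * resid
        scores[study][1] += w * se * resid
    # Slope entry of the sandwich variance (bread * meat * bread).
    var_slope = sum(((-sx * u0 + sw * u1) / det) ** 2 for u0, u1 in scores.values())
    return slope, math.sqrt(var_slope)


if __name__ == "__main__":
    # Hypothetical effects in which smaller studies report larger effects.
    data = [("A", 0.30, 0.10), ("A", 0.45, 0.20), ("B", 0.75, 0.30), ("C", 0.85, 0.40)]
    slope, se_slope = egger_cluster_robust(data)
    print(round(slope, 2), round(se_slope, 2))
```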
Patient and public involvement
Coauthors Jacob I Feldman and Tiffany Woynaroski are parents of autistic individuals and they worked on the research question, analyses, and drafting of the manuscript. Although members of the public were not directly involved in this review because of funding limitations and lack of researcher training to engage the public, the focus of this work is aligned with the research priorities of autistic people, which include rigorous evaluations of interventions designed to support development, health, and wellbeing.32
Descriptives of included study samples, interventions, and outcomes
Figure 1 presents the PRISMA (preferred reporting items for systematic review and meta-analysis) diagram detailing the search process. From the 6427 records retrieved in the updated search, we included 139 eligible reports. Reports identified in the updated search were combined with reports from the original search to yield a dataset of 289 reports on 252 separate study samples (173 randomized controlled trials, 79 quasi-experimental design studies) that included a total of 13 304 participants and 3291 outcomes. The mean age of participant samples was 56.11 months (range 18.9-95.2, standard deviation 19.08). The mean language age equivalent in months for samples where it was reported was 22.36 (standard deviation 12.73). The average percentage of samples reported as male was 82.57 (standard deviation 11.27). There were 10 studies (10 reports) of animal assisted interventions, 4 studies (4 reports) of cognitive behavioral therapy, 48 studies (51 reports) of behavioral interventions, 19 studies (24 reports) of developmental interventions, 6 studies (8 reports) of music therapy, 57 studies (75 reports) of NDBI, 6 studies (6 reports) of sensory integration therapy, 6 studies (7 reports) of other sensory based interventions, 9 studies (9 reports) of TEACCH, 30 studies (33 reports) of technology based interventions, and 65 studies (72 reports) of interventions categorized as other. (When summed across intervention type, the total numbers of studies and reports slightly exceed the overall totals of 252 study samples and 289 reports because a small number of studies reported effects for two separate intervention approaches tested against a control. For these studies, a single report or study is represented in two separate intervention categories.) Data and analytic code are available in an online repository.33
Risk of bias and outcome characteristics
Figure 2 presents the percentage of outcomes for each risk of bias rating (ie, low risk of bias, high risk of bias, unclear) for all risk of bias indicators by intervention type. Only studies included in the generation of effect sizes are summarized in terms of risk of bias. Studies of animal assisted interventions, cognitive behavioral therapy, music therapy, sensory integration therapy, and other sensory based interventions are not described here because there were too few studies of these intervention types to reliably estimate the summary effects of these approaches.30 Overall, 173 of 252 studies were randomized controlled trials. Nearly two thirds of all outcomes (65.19%) were at risk of detection bias, and nearly half of all outcomes (45.51%) were derived from caregiver or teacher reports. Attrition bias was coded as high for 15.67% of outcomes. In most studies, the nature of the interventions prevented the possibility of fully masking participants to the intervention received. Therefore, we coded most outcomes (97.54%) as being at risk of performance bias, and we do not report risk of performance bias by intervention type because there is little variation. Most outcomes (80.56%) were coded as being generalized from the intervention context; the remainder were coded as context bound. Similarly, most outcomes (72.7%) were coded as being distal from the intervention targets; the remainder were coded as proximal.
We present effects stratified by risk of bias, first including all effects from randomized controlled trials, and subsequently restricting effects to those from randomized controlled trials and outcomes that were directly assessed (excluding caregiver or teacher reports), and then to effects from randomized controlled trials and outcomes subject to low risk of detection bias. Table 1 presents summary (Hedges’ g), heterogeneity (τ²), and significance estimates (P) for each of these analyses. Figures 3, 4, and 5 present forest plots of summary effect estimates associated with decreasing risks of bias and increasing levels of certainty. Summary estimates that include effects from quasi-experimental design studies, which are subject to high risk of selection bias, are provided in the supplementary materials (see figure S1).
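For readers less familiar with the effect size metric, the following is a minimal sketch of how Hedges' g is commonly computed from two group summaries: a standardized mean difference multiplied by a small-sample correction factor. The input numbers are hypothetical, and the function is not the authors' extraction code, which is available in their repository and may handle other designs (for example, change scores) differently.

```python
import math

def hedges_g(mean_tx, mean_ctrl, sd_tx, sd_ctrl, n_tx, n_ctrl):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_tx + n_ctrl - 2
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(
        ((n_tx - 1) * sd_tx ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    d = (mean_tx - mean_ctrl) / pooled_sd
    # Widely used approximation to the correction factor J
    j = 1 - 3 / (4 * df - 1)
    return j * d

# Hypothetical trial: 20 children per arm, 2-point difference, SD of 4
g = hedges_g(10.0, 8.0, 4.0, 4.0, 20, 20)
print(round(g, 3))
```

With these hypothetical inputs the uncorrected difference is 0.5 standard deviations, and the correction shrinks it slightly; the shrinkage is larger in smaller samples.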
Estimated effects from randomized controlled trials
Figure 3 reflects summary effect sizes derived from outcomes extracted only from randomized controlled trials, according to intervention and outcome type. Statistically significant effects were estimated for behavioral interventions on social emotional or challenging behavior (g=0.58, 95% confidence interval 0.11 to 1.06); for developmental interventions on social communication outcomes (0.28, 0.12 to 0.44); for NDBIs on adaptive (0.23, 0.02 to 0.43), language (0.16, 0.01 to 0.31), play (0.19, 0.02 to 0.36), and social communication outcomes (0.35, 0.23 to 0.47), and measures of diagnostic characteristics of autism (0.38, 0.17 to 0.59); for technology based interventions on social communication outcomes (0.33, 0.02 to 0.64) and social emotional or challenging behavior (0.57, 0.04 to 1.09). There were not enough controlled studies of animal assisted interventions, cognitive behavioral therapy, music therapy, sensory integration, or sensory based interventions to generate summary estimates of their effects on any outcome for young children. Additionally, there were not enough randomized controlled trials of TEACCH to generate summary effects for any outcome.
Estimated effects from randomized controlled trials excluding outcomes from caregiver or teacher reports
Figure 4 shows summary effects estimated exclusively from outcomes that were extracted from randomized controlled trials and that were not derived from caregiver or teacher reports (ie, were not at risk of placebo-by-proxy bias). Statistically significant effects were estimated for developmental interventions on social communication outcomes (g=0.31, 95% confidence interval 0.13 to 0.49) and for NDBIs on measures of diagnostic characteristics of autism (0.44, 0.20 to 0.68) and social communication outcomes (0.36, 0.23 to 0.49).
Estimated effects from randomized controlled trials excluding all outcomes subject to high risk of detection bias
Figure 5 shows summary effects estimated exclusively from outcomes that were extracted from randomized controlled trials where assessors were naive to group assignment. Summary effect estimation was possible only for NDBIs, on measures of diagnostic characteristics of autism (g=0.30, 95% confidence interval 0.03 to 0.57), and cognitive (0.17, −0.02 to 0.37), language (0.06, −0.13 to 0.25), and social communication outcomes (0.11, −0.03 to 0.26). Only the effect on measures of diagnostic characteristics of autism was statistically significant.
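The link between the intervals above and the statement about statistical significance can be made explicit: at the 5% level, an estimate is significant when its 95% confidence interval excludes zero. The sketch below applies that rule to three of the intervals reported in this paragraph; the helper name is illustrative, and the rule is a general reading aid rather than a description of the authors' models.

```python
# (estimate, lower bound, upper bound) as reported for NDBIs in this paragraph
intervals = {
    "diagnostic characteristics of autism": (0.30, 0.03, 0.57),
    "cognitive": (0.17, -0.02, 0.37),
    "social communication": (0.11, -0.03, 0.26),
}

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

significant = {
    name: excludes_zero(lower, upper)
    for name, (_, lower, upper) in intervals.items()
}

# Half-width of each interval, a rough index of precision
half_width = {
    name: round((upper - lower) / 2, 3)
    for name, (_, lower, upper) in intervals.items()
}

for name in intervals:
    print(name, significant[name], half_width[name])
```

Only the interval for diagnostic characteristics of autism lies entirely above zero, which matches the final sentence of the paragraph.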
Moderator analyses of proximity and boundedness
Meta-regression analyses across the entire dataset suggest that summary effects were significantly smaller for distal outcomes compared with proximal outcomes (B=−0.15, P=0.002). Additionally, effect sizes coded as generalized were significantly smaller than those coded as context bound (B=−0.27, P<0.001).
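To make the interpretation of B concrete: when the moderator is a 0/1 indicator, the slope of an inverse-variance weighted regression equals the difference between the weighted mean effects of the two groups. The sketch below shows this with hypothetical effect sizes and variances. It is a deliberately simplified illustration that ignores the dependence among effect sizes from the same study, which the authors' models account for.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def wls_slope(x, y, w):
    """Slope of a weighted least squares regression of y on x (with intercept)."""
    x_bar = weighted_mean(x, w)
    y_bar = weighted_mean(y, w)
    num = sum(wi * (xi - x_bar) * (yi - y_bar) for xi, yi, wi in zip(x, y, w))
    den = sum(wi * (xi - x_bar) ** 2 for xi, wi in zip(x, w))
    return num / den

# Hypothetical data: effect sizes, sampling variances, and a distal flag
g = [0.50, 0.42, 0.35, 0.20, 0.28, 0.15]
variances = [0.04, 0.05, 0.03, 0.04, 0.06, 0.05]
distal = [0, 0, 0, 1, 1, 1]  # 1 = distal outcome, 0 = proximal outcome
weights = [1 / v for v in variances]

b = wls_slope(distal, g, weights)

def group_mean(flag):
    rows = [(gi, wi) for gi, wi, d in zip(g, weights, distal) if d == flag]
    return weighted_mean([r[0] for r in rows], [r[1] for r in rows])

difference = group_mean(1) - group_mean(0)
print(round(b, 3), round(difference, 3))
```

A negative slope in this setup means that distal outcomes show smaller average effects than proximal outcomes, which is how the reported B=−0.15 should be read.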
Results from the Egger multilevel meta-analysis test31 indicated that, across all outcomes analyzed in our models, there was evidence for funnel plot asymmetry (B=2.98, P<0.001). There was still evidence of funnel plot asymmetry when we restricted outcomes to only those from randomized controlled trials (B=5.86, P=0.01), only those from randomized controlled trials that did not include caregiver or teacher outcomes (B=8.18, P<0.001), and only those from randomized controlled trials that were at low risk of detection bias (B=2.26, P=0.03).
Looking within interventions by outcome type (see table S1 and figures S1-S8), limited evidence was found for small study or publication bias within each outcome type with a sufficient number of effect sizes and clusters. After correcting for multiple comparisons with a Benjamini-Yekutieli false discovery rate correction,34 no outcome type within any intervention was identified with a statistically significant result for the Egger multilevel meta-analysis test. Therefore, the evidence indicates that small sample or publication bias influenced the results, but the extent to which publication bias could have influenced the results for individual intervention types is not clear.
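The Benjamini-Yekutieli correction mentioned above can be sketched as a step-up procedure on the ordered P values, with a harmonic-sum penalty that keeps the false discovery rate controlled under arbitrary dependence between tests. The implementation below is a minimal illustration, and the P values are hypothetical rather than taken from table S1.

```python
def benjamini_yekutieli(p_values, q=0.05):
    """Return a list of booleans: True where the test is rejected at FDR level q."""
    m = len(p_values)
    # Harmonic sum that penalizes for arbitrary dependence among tests
    c_m = sum(1 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose ordered P value is under its threshold
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected

# Hypothetical unadjusted P values from five Egger-type tests
p = [0.02, 0.03, 0.04, 0.30, 0.60]
nominal = [value < 0.05 for value in p]
corrected = benjamini_yekutieli(p)
print(nominal, corrected)
```

In this hypothetical set, three tests are nominally below 0.05 but none survives the correction, which mirrors the pattern described in the paragraph: signals that appear within individual outcome types can disappear once multiple comparisons are taken into account.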
Adverse events, effects, and harms
When reports across both search periods were considered together, 10% mentioned adverse events, and of these, 66% reported that no adverse events occurred, 34% reported that adverse events occurred, and 17% reported that adverse effects occurred. Additionally, only 28% of articles that mentioned adverse events reported monitoring procedures for determining if adverse events or effects occurred. No report in either search period mentioned harms or indicated any intention to monitor harms after the end of the intervention. The number of reported adverse events across studies ranged from 0 to 67. Interestingly, three quarters (76%) of the studies that mentioned adverse events, but did not describe any monitoring procedures, reported that no adverse events occurred. In contrast, only half (50%) of studies that described at least some adverse event monitoring procedures reported that no adverse events occurred. Therefore, it is possible that the frequency with which adverse events are reported to occur is dependent upon the robustness and transparency of procedures for monitoring them. Descriptions of adverse events extracted from studies that reported this information are provided in the supplementary materials. Examples of adverse events that could be attributed to the intervention were intense child aggression and serious adverse effects on parent mental health after participating in a parent mediated intervention (see Bottema-Beutel and colleagues11 for additional examples of adverse events extracted from studies during the initial search period).
The purpose of this systematic review and meta-analysis was to provide an updated summary of evidence on interventions designed for young autistic children that includes more recently published studies. In only four years after our initial search for Project AIM, the available evidence has doubled, including the number of randomized controlled trials (from 87 in our original report to 173 in the current report). Three quarters of all controlled group design tests of interventions and 80% of all randomized controlled trials were published in the past decade, which means that even intervention guidelines based on relatively recent evidence reviews3536 no longer reflect most available evidence from controlled group design studies.
Findings by intervention type
As in the previous meta-analysis, we estimated the effects for behavioral interventions, developmental interventions, NDBIs, and technology based interventions. However, there were still not enough studies to reliably estimate the summary effects for animal assisted interventions or cognitive behavioral therapy (although cognitive behavioral therapy is typically recommended for older populations). Because we restricted summary effect estimation to those from randomized controlled trials, we were also unable to estimate the summary effects of TEACCH on any outcome (though see supplementary figure 1 for estimates of summary effects when quasi-experimental studies were included). We further divided the previous sensory based intervention category into three groups (music therapy, sensory integration therapy, other sensory based interventions) to ensure that intervention categories had consistent theories of change. We found that these categories also had too few controlled studies to reliably estimate summary effects.
In the US, traditional behavioral interventions are the most frequently recommended intervention approach for autistic children.37 When evidence from randomized controlled trials is considered, it appears that behavioral interventions might have moderate positive effects on social emotional or challenging behavior outcomes. These estimates are mainly driven by effects from unmasked caregiver or teacher report measures. However, this finding differs from our previous report, in which not enough randomized controlled trials of behavioral interventions had been conducted to allow summary estimation of any effects. Several randomized controlled trials were published in the four years after the original search date; however, this increase was mostly due to randomized tests of focused behavioral interventions (eg, functional communication training, prevent teach reinforce, predictive parenting), and not to randomized tests of early intensive behavioral intervention or other comprehensive behavioral approaches. There was only one randomized controlled trial of early intensive behavioral intervention added to our current sample, which tested randomized comparisons of this treatment and the early start Denver model (an NDBI) delivered at various intensities (and found that neither intervention nor intensity was associated with superior effects on measured outcomes).38 More studies that compare robust delivery of two different but commonly recommended comprehensive interventions are needed. Given that behavioral interventions are routinely recommended for this population, it is vital that more controlled tests with unbiased measures are conducted, and that support recommendations are altered to reflect the current evidence base.
As in our original meta-analysis, estimates from randomized controlled trials suggest developmental interventions have positive and statistically significant effects on social communication, even when outcomes at risk of placebo-by-proxy bias are excluded. However, because developmental researchers frequently relied on observational measures of social communication derived from interactions with unmasked caregivers to indicate developmental intervention effects, our confidence in this estimate is limited by the risk of detection bias. Therefore, we recommend that intervention researchers estimate social communication effects (or effects on any domain) using assessments that use masked assessors, and not only masked coders; this can be done even when deriving variables from communication samples by using a masked clinician who is skilled at facilitating natural and responsive interactions.
Naturalistic developmental behavioral interventions
Despite only being named as a category in 2015,16 NDBIs are now the most frequently studied intervention approach for this population. Summary effect estimates from randomized controlled trials suggest that NDBIs might improve adaptive behavior, language, play, social communication, and measures of diagnostic characteristics of autism, though our confidence in these estimates is limited by reliance on measures that are subject to high detection bias. In contrast to our previous meta-analysis, we found a statistically significant positive effect of NDBIs on measures of the diagnostic characteristics of autism, even when effects were restricted to outcomes from randomized controlled trials with low risk of detection bias. This was the only intervention effect we were able to estimate when accounting for all risks of bias considered. Given that diagnostic measures index core features of autism (ie, repetitive patterns of behaviors, interests, or activities, and social communication challenges), and that our other estimates suggest NDBIs have null effects on repetitive patterns of behaviors, interests, or activities but positive effects on social communication outcomes, it is likely that the positive and statistically significant estimates of NDBIs on overall measures of the diagnostic characteristics of autism were driven by improvements in social communication. We were unable to estimate summary effects of NDBIs on outcomes categorized as social communication alone when outcomes at high risk of detection bias were excluded because social communication outcomes were frequently measured using interactions with unmasked assessors (eg, caregivers) in studies of NDBIs. Most of these effects were excluded when detection bias was considered, and diagnostic measures of autism remained the only masked assessments indicating improvements in social communication in these studies. 
Therefore, we conclude that there is relatively strong evidence that NDBIs can have positive effects on core features associated with autism, specifically social communication differences.
Using measures of diagnostic characteristics of autism (eg, Autism Diagnostic Observation Schedule, Childhood Autism Rating Scale) to index intervention outcomes could be problematic because interpreting “improved” scores on such measures as evidence of a positive intervention effect implies that the goal of an intervention is to make a child “less autistic.” Although these measures indicate difficulties associated with autism diagnosis, they also capture neutral or even beneficial aspects of autism (such as special interests). If interventions are only intended to support improvements in social communication, it is important that researchers use high quality masked assessments of this specific construct to index intervention effects, such as the Communication and Symbolic Behavior Scales39 or the Early Social Communication Scales40 as administered and coded by people who are naive to children’s group assignment.
Technology based interventions
The number of studies of technology based interventions nearly tripled (from 10 to 30 studies) in the four years between the original and updated searches. However, most of the high quality evidence supporting the potential benefits of technology based interventions reflects effects on proximal, circumscribed outcomes. Technology based interventions could have broad appeal because they tend to be new and motivating, use predictable formats, and have the potential to increase access for those who might otherwise have difficulty accessing intervention and supports. For these reasons, intervention developers might wish to integrate technological supports into more established intervention approaches to help the development of specific skills.
Categories of intervention approaches that were too sparsely represented in the literature to allow summary effect estimation for autistic children between birth and 8 years included animal assisted therapy, cognitive behavioral therapy, music therapy, sensory integration therapy, other sensory based interventions, and TEACCH. Given that these interventions are frequently prescribed for and used by this population, there is a need for more rigorous research evaluating the efficacy of such approaches. In the meantime, physicians guiding families toward interventions should bear in mind that the current evidence base is limited, and they should keep up to date with emerging literature.
Proximity and boundedness
As expected, we found evidence that intervention effects are greater on proximal than distal outcomes, and for context bound than generalized outcomes, replicating the findings of our previous work.841 These findings are consistent with the conclusions of a recent review of early interventions for children with or at high likelihood of receiving an autism diagnosis.42 Therefore, researchers who measure outcomes that are not designed to show sustained developmental change are likely to observe stronger effects and to draw more positive conclusions about intervention efficacy than is warranted. Intervention is often recommended for autistic children in early childhood at high intensities and for long durations based on the assumption that this approach is necessary to support generalized developmental improvements. If the goal of evidence summaries is to determine whether early childhood interventions support such improvements, controlled group design studies that use measures well equipped to tap broad developmental change offer the most relevant evidence. Therefore, in evaluating evidence and guiding families toward early childhood interventions, it is important that physicians and other clinicians consider not only the quality of evidence, but also the scope of change reflected by the outcomes.12
Current trends and future research
The doubling of randomized controlled trials in only four years suggests that it is possible for autism researchers to conduct randomized controlled trials, and that randomized tests of interventions should be a minimum standard for establishing intervention efficacy. Consequently, physicians should refrain from drawing firm conclusions when available evidence is largely quasi-experimental in nature, and researchers should continue to treat random assignment as a crucial component of experimental design.
Outcomes derived from caregiver or teacher report measures comprised a substantial subset (nearly half) of all outcomes, and the percentage of these report measures even increased between the original and updated samples. Physicians evaluating evidence from randomized controlled trials of specific intervention approaches should keep this in mind and exercise caution in interpreting results that emphasize statistically significant effects on outcomes measured by proxy reports and unmasked assessments with minimal reporting of effects on more stringent outcome measures.
Robust intervention evaluations require that researchers assess, interpret, and report adverse events, adverse effects, and harms.43 However, these remain infrequently monitored or reported in the autism intervention literature. Despite this lack of monitoring, there is some evidence to suggest that adverse events and adverse effects might be relatively common.11 One promising finding from our updated review is the increased frequency of describing adverse event monitoring procedures: 44% of studies that mentioned adverse events provided at least some description of their approach to monitoring procedures. Moving forward, researchers should develop definitions of adverse events and procedures for monitoring them that are tailored by intervention type, and shared across research groups. These measures could include procedures for active monitoring for adverse events that might occur within the intervention sessions (eg, child distress, injury from intervention equipment, aggression), and additional procedures for active monitoring of events that could occur outside of the intervention sessions (eg, sleep disturbance, changes in eating habits, anxiety, parental distress). Because anecdotal reports and qualitative evidence suggest autistic adults might have experienced long term harms from participation in specific interventions,44 and because they have expressed that research about potential intervention harms is among their top research priorities,4345 researchers should make concerted efforts to follow participants over longer periods of time to document the potential for sustained negative impacts of interventions. Until such information is explored by researchers, families and practitioners will continue to have little basis on which to weigh the potentially positive effects of interventions against the potential for negative impacts.
Strengths and limitations
This updated meta-analysis has several strengths. Our search of published and unpublished literature ensured that we identified and retrieved as many potentially eligible studies as possible. Our double screening and double coding process and high reliability suggest our data reliably reflect attributes of included studies and outcomes. Additionally, our statistical methods ensured that we estimated summary effects as precisely as possible, while accounting for the intercorrelated structure of the data and retaining power in subgroup analyses.27 Our coding process also accounted for study and outcome level risks of bias, and outcome characteristics that are often ignored (ie, boundedness and proximity), ensuring appropriate caution in interpreting results.
Our investigation also has a number of limitations. Although our procedures closely followed those used in the previous meta-analysis, this update was not registered. Even though we reliably categorized most interventions and our intervention categories have proved useful to other reviews,10 approximately a quarter of studies featured interventions that were categorized as other because they did not cluster together in terms of their theory of change and were not reflected in the current summary estimates. However, effect sizes from these studies were reflected in moderator analyses (ie, for proximity and boundedness) and can be included in future meta-regressions that investigate whether specific participant or intervention characteristics are associated with stronger effects. There was a range of studies in this catch-all category that describe interventions that might deserve further attention given that they support improvements in domains that are often regarded as impairments by autistic people46 and implicated in important developmental processes, but rarely targeted by intervention. These range from specialized intervention approaches that support sleep47 and management of eating aversions48 to academic programs that support reading acquisition.49
Authors of other prominent autism evidence reviews based their clinical recommendations on rubrics specifying different thresholds of study quality.3550 For example, the National Clearinghouse on Autism Evidence and Practice designated practices as evidence based if they were supported by two peer reviewed quasi-experimental or randomized controlled group studies conducted by at least two research groups; five single case design studies conducted by at least three research groups and featuring a minimum of 20 participants in total; or a combination of one controlled group study and three single case design studies conducted by at least two research groups.50 In contrast, the National Autism Center designated interventions as established if they were supported by two quasi-experimental or randomized controlled group studies or four single case design studies with at least 12 participants and consistent effects.35 Because we recognize that the designation of a single threshold as sufficient for drawing conclusions is somewhat arbitrary and varies by reviewer, we elected to transparently present summary effects associated with increasingly lower risks of bias (and higher levels of certainty). Consistent with our previous findings which were derived using a similar analytic approach, we observed that because contributing effects were increasingly excluded based on risk of bias, fewer summary effects could be estimated, and associated confidence intervals were wider. Consequently, as quality thresholds increased, fewer summary effects could be synthesized with a high degree of confidence (df≥5), and crossed the threshold for statistical significance, even though the magnitude of summary effect estimates was often similar. To some extent, this observation is a function of our analytic approach. Raising quality standards will often reduce the sample of studies and effects that meet such standards, and so reduce power to detect a true effect that might be present. 
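The trade-off described at the end of this paragraph can be illustrated with a simplified calculation. The sketch below pools hypothetical effects with inverse-variance (fixed-effect) weights and a normal approximation, first for 20 studies and then for the 5 that are assumed to meet a stricter quality threshold. All numbers are hypothetical, and the pooling method is a simplification that stands in for, and is not, the authors' analytic approach.

```python
import math

def pooled_ci(effects, variances, z=1.96):
    """Inverse-variance pooled estimate with a normal-approximation 95% interval."""
    weights = [1 / v for v in variances]
    estimate = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return estimate, estimate - z * se, estimate + z * se

# Hypothetical: identical effects (g=0.15) and variances (0.04) in every study
all_studies = pooled_ci([0.15] * 20, [0.04] * 20)
strict_subset = pooled_ci([0.15] * 5, [0.04] * 5)

for label, (estimate, lower, upper) in [
    ("all studies", all_studies),
    ("strict subset", strict_subset),
]:
    print(label, round(estimate, 2), round(lower, 3), round(upper, 3))
```

Under these assumptions the pooled estimate is the same in both cases, but the interval from the smaller subset is twice as wide and crosses zero, so the same underlying effect is no longer statistically significant.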
By presenting estimates associated with increasing levels of confidence, we hoped to show which clinical recommendations might be supported by some evidence (ie, evidence from randomized controlled trials but including effects at risk of detection and placebo-by-proxy bias), and which might be supported by the best available evidence (ie, evidence from randomized controlled trials excluding effects at high risk of detection bias).
Given that there are few intervention approaches at present with the best available evidence supporting their efficacy for improving developmental outcomes, what intervention recommendations should medical professionals make? The Australian government recently updated their national guideline for supporting autistic children and their families, and integrated a framework for making ethical support recommendations.1051 The framework suggests that supports should be plausible (have a clear mechanism of effectiveness and be supported by the best available evidence), practical (feasible to deliver in local conditions), desirable (consistent with child wants and needs, and family priorities), and defensible (benefits outweigh effort and opportunity costs, and will be viewed positively by the child later in life). Drawing on this framework, we recommend that physicians guide families toward interventions with the most robust evidence supporting their efficacy for improving the intended outcomes, provided that the supports can be offered in a way that integrates and strengthens child wellbeing and family routines rather than disrupting them. For example, interventions provided within the home, embedded in daily routines, and focused on strengthening caregiver capacities to support development are less likely to disrupt child and family wellbeing than interventions provided at high intensities in clinics directly to the child by clinicians. Clinicians should ensure that they have adequate systems for monitoring whether the selected intervention promotes progress in terms of acquiring specific skills in specific contexts, and also in the broader, generalized development of these children. Finally, it cannot be assumed that interventions and supports are harmless, so physicians should advise families to monitor for indicators of negative effects and child or family distress.
Studies investigating interventions for young autistic children have proliferated at an astonishing rate, but corresponding improvements in study quality have not kept pace. Some high quality evidence exists, which suggests that NDBIs can improve core features associated with autism. However, it is not clear if such outcomes are desirable for autistic people given that measures of core features of autism are not restricted to impairments that need to be addressed to positively influence autistic development. Interventions tend to have larger effects on small and specific changes in specific contexts, and smaller effects on distal and generalized developmental improvement. We are unable to weigh the potential benefits of any intervention against the potential for unintended negative consequences because most researchers are not adequately monitoring and reporting adverse events.
What is already known on this topic
Many different types of early childhood interventions are recommended and offered to support generalized development in young autistic children
Previous research is mixed in quality and conclusions about the effectiveness of these interventions
What this study adds
The evidence available has approximately doubled in only four years
Some evidence from high quality studies supports the effectiveness of specific early childhood interventions for improving certain outcomes
Researchers have inadequately monitored and reported adverse events, effects, and harms (for any intervention type); therefore, physicians should guide families to watch children closely when starting an intervention
Ethical approval was not required for this work.
Data availability statement
The coding manual that guided primary data extraction, and the full dataset are available at https://osf.io/kr2cd/
The authors acknowledge the helpful efforts of Elton Wells, who created in-house software that automated agreement checks for this investigation.
Contributors: MS conceptualized the research project, oversaw all aspects of the work, was responsible for conducting the search, full text screening, data extraction, and manuscript drafting, and acts as the primary guarantor of the published findings. KB-B conceptualized the research project, assisted with title and abstract screening, conducted adverse event coding, and participated in manuscript drafting. SCLP assisted with title and abstract screening, served as a key reliability coder, summarized risk of bias analyses, and edited the drafted manuscript. JIF assisted with title and abstract screening, conducted primary data analyses, assembled results tables and figures, edited the drafted manuscript, and acts as the secondary guarantor of published findings. DJB assisted with title and abstract screening, served as a key reliability coder, and edited the drafted manuscript. NC assisted with reliability coding and edited the drafted manuscript. KD assisted with reliability coding and edited the drafted manuscript. JC assisted with title and abstract screening, audited agreement between coders, and edited the drafted manuscript. SA assisted with reliability coding and edited the drafted manuscript. TW conceptualized the research project and edited the drafted manuscript. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: Research reported in this publication was supported in part by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number TL1TR002244 (principal investigator Hartmann) and the National Institute on Deafness and other Communication Disorders of the National Institutes of Health under award number F31DC020129 (principal investigator KD). Affiliated institutions and funders had no role in study design; collection, analysis, or interpretation of data; nor the writing of or decision to submit this report for publication.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: support from the National Center for Advancing Translational Sciences of the National Institutes of Health, and the National Institute on Deafness and other Communication Disorders of the National Institutes of Health for the submitted work. MS has received fees for presenting research findings in invited talks from Children’s Healthcare of Atlanta and the New Jersey Autism Center of Excellence, and from law firms representing the National Disability Insurance Scheme of Australia for providing expert evidence on the efficacy of early childhood interventions in court hearings. In the past three years, she taught courses in a program that was accredited by the Behavior Analyst Certification Board on both behavioral and NDBI early childhood interventions. KB-B has previously received fees for consulting with school districts on intervention practices for autistic children and teaches courses on autism interventions in her role as an associate professor of special education. She has also accepted speaker fees to discuss her work on research quality, adverse events, and researcher conflicts of interest as they pertain to autism intervention research. She also receives royalties for a coedited book titled Clinical Guide to Early Interventions for Children with Autism published by Springer. SCLP was formerly affiliated with an entity that trained students to become board certified behavior analysts and provided early intensive behavioral intervention. She is currently employed by the TEACCH Autism Program and served as an interventionist on an intervention developed at TEACCH for autistic transition age youth. JIF has been paid to provide adaptive horseback riding lessons (an animal assisted therapy). He is employed in a department that teaches students to provide early communication therapies. 
NC is a board certified behavior analyst at the doctoral level (BCBA-D) and is the current president elect of the Arkansas Association for Behavior Analysis. She teaches courses in a university program accredited by the Behavior Analyst Certification Board and formerly provided quality assurance and consultation services for the Arkansas Medicaid waiver program, which provides behavioral based services for children with autism aged 0-8. SA is a board certified behavior analyst who directly provides services to autistic children, adolescents, and adults. She is co-owner of a clinical practice that receives direct payment for behavior analytic services through contracts with local school districts, private and public insurance payors, and Texas Medicaid waiver programs. She is an instructor for coursework that is approved by the Behavior Analyst Certification Board, and she serves as a practicum and field supervisor for master’s level students in pursuit of advanced degrees in the field of behavior analysis. KD is a PhD candidate in a department that teaches students to provide early communication therapies, including some evaluated as part of this meta-analysis. JC was previously employed as an early intervention therapist, and was paid to provide behavioral and NDBI type therapies to children.
TW is the parent of an autistic child; has previously been paid to provide traditional behavioral, naturalistic developmental behavioral, and developmental interventions to young children on the autism spectrum; has received grant funding from internal and external agencies, including the National Institutes of Health and the Vanderbilt Institute for Clinical and Translational Research, to study the efficacy of various interventions geared toward young children with autism (though not to support this specific work); and is employed by the Department of Hearing and Speech Sciences at Vanderbilt University Medical Center, which offers intervention services (which include the types of interventions evaluated in this meta-analysis) for autistic children through their outpatient clinics and trains clinical students in the provision of treatments delivered over the course of early childhood. All other authors have no conflicts of interest to declare.
The lead author (the manuscript’s guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Dissemination to participants and related patient and public communities: The results of this work will be disseminated to the public via press releases, presentations at conferences oriented towards clinicians that serve young autistic children, and plain language summaries posted on websites and social media. The lead investigators are currently seeking funding to support development of a website that will provide the public with open access to the dataset, plain language summaries of findings and links to published papers, and data visualizations that intuitively characterize the data and findings for a lay audience. These findings will also inform a future clinical trial that includes robust patient involvement in evaluation of the intervention acceptability and tolerability.
Provenance and peer review: Not commissioned; externally peer-reviewed.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.