Autism is a heterogeneous neurodevelopmental condition defined by difficulties in social communication and interaction and restricted, repetitive patterns of behavior, interests, or activities (American Psychiatric Association, 2013). Autism has an estimated global prevalence of over 1%, and recognition has increased in recent years, especially in Western countries (Zeidan et al., 2022).

A ‘gold standard’ clinical diagnostic assessment of autism often includes a developmental interview, such as the Autism Diagnostic Interview-Revised (ADI-R; Lord et al., 1994), and a behavioral observation, such as the Autism Diagnostic Observation Schedule-Second Edition (ADOS-2; Lord et al., 2012). School reports and other assessments (e.g., developmental and cognitive) are used adjunctively (Lord et al., 2020). Nonetheless, ‘gold standard’ tools are resource-intensive and can lack reliability within clinical settings, where ongoing checks on adherence to administration procedures and consistency of interpretation are often lacking (Bishop & Lord, 2023). Furthermore, training and purchasing equipment to administer the ADOS-2 are costly (Galliver et al., 2017). The ADOS-2 was developed and validated in specific Western cultures, with little evidence supporting its use in non-Western, non-English-speaking populations (Harrison et al., 2017). Additionally, these assessments largely underdiagnose autism in girls/women (Navarro-Pardo et al., 2021). Lastly, the ADOS-2 and ADI-R show remarkably low agreement in clinical practice (Kamp-Becker et al., 2021), which reduces the validity of the combined assessment.

Despite evidence that autism can be reliably diagnosed by 24 months (Pierce et al., 2019; Sacrey et al., 2018), a diagnosis is frequently given much later. A recent review found that the average age of autism diagnosis is 60.48 months (van ‘t Hof et al., 2021); however, data are overwhelmingly from high-resource countries in Europe and North America, where waiting lists for assessment exceeding 18 months are common (Gordon-Lipkin et al., 2016). Barriers to timely autism assessment and diagnosis include geographic and ethnic disparities (Antezana et al., 2017; Daniels & Mandell, 2014; Khowaja et al., 2015; Lauritsen et al., 2014; Overs et al., 2017; Pillay et al., 2021; Samms-Vaughan, 2014; Shrestha et al., 2019; Williams et al., 2015; Zuckerman et al., 2013). In low- and middle-income countries (LMICs), valid and culturally sensitive tools are lacking, meaning that children and adults with autism are less likely to ever receive a diagnosis (Durkin et al., 2015; Marlow et al., 2019). Moreover, there is a significant shortage of professionals trained in assessing autism in rural and low-resourced areas, where the majority of children reside (Franz et al., 2017; Olusanya et al., 2018). Families living in rural areas also need to travel long distances for an assessment, which limits their access to services (Gallego et al., 2017). Finally, limited knowledge about autism, cultural perceptions of healthcare, and stigma surrounding autism impede timely assessment (de Leeuw et al., 2020).

Reflecting the need to provide timely medical services to underserved communities, and prompted by the disruption of healthcare during COVID-19, studies have examined the feasibility of telehealth to reduce costs and waiting times (Shore et al., 2020; Zwaigenbaum et al., 2021). Telehealth refers to ‘the provision of health care remotely by means of a variety of telecommunication tools, including telephones, smartphones, and mobile wireless devices, with or without a video connection’ (Dorsey & Topol, 2016, p. 154). Synchronous methods entail ‘live’ online assessments by a clinician using videoconferencing or audioconferencing. Asynchronous methods encompass an online review of material, such as video recordings or responses to digital questionnaires (Alfuraydan et al., 2020). Recent reviews concluded that telehealth assessments for autism have high clinical utility and comparable accuracy to in-person assessments (Alfuraydan et al., 2020; Dahiya et al., 2020, 2021; Stavropoulos et al., 2022a; Sutherland et al., 2018; Valentine et al., 2021). Nevertheless, the generalizability of these conclusions is limited, as these reviews included studies with heterogeneous samples, investigated broad outcomes, and were exploratory in nature.

This systematic review aims to update the existing literature and explore the evidence base around telehealth assessment for autism globally. We address the critical question: are ‘gold standard’ diagnostic assessments for autism, adapted for telehealth administration, valid and diagnostically accurate compared with care-as-usual in-person assessments? We evaluate the psychometric properties of novel tools developed or adapted for telehealth use in children with suspected autism, with the objective of informing the future direction of clinical practice globally.

Methods

This systematic review was conducted following PRISMA guidelines (Moher et al., 2009), and a PROSPERO protocol was registered online (CRD42022332500). This study comprises part of a larger project (Children’s Autism Technology-Assisted Assessments; CHATA), which aims to develop and test a novel online diagnostic assessment pathway for preschool children in an ethnically diverse sample in Newham, London.

Search Strategy

To identify eligible studies, three major databases were searched: MEDLINE, Embase, and PsycInfo. A grey literature search was conducted to identify additional studies (e.g., doctoral dissertations). Relevant reviews were manually searched and subject to reference mining.

The combination of the following search terms was used to search databases: (autism spectrum disorder* OR autis* OR autis* disorder OR Asperger’s Syndrome OR pervasive developmental disorder OR PDD-NOS) AND (online OR digital OR telehealth OR virtual) AND (assessment OR diagnosis OR identification) AND (validity OR reliability OR specificity OR sensitivity OR inter-rater agreement OR positive predictive value OR negative predictive value OR internal consistency OR psychometric OR acceptability OR satisfaction OR feasibility). Publications from inception to 24 May 2022 were identified.

Eligibility Criteria

The PICOS (Population, Intervention, Comparison, Outcomes and Study design) framework was used to develop the eligibility criteria (Methley et al., 2014). Table 1 lists the inclusion and exclusion criteria for this review based on the five PICOS domains. No article was excluded based on the language or country of publication.

Table 1 PICOS criteria

Study Selection and Data Extraction

Following the deduplication of records, one reviewer (PK) screened titles and abstracts, of which 10% were independently double-screened by a second reviewer (EB). Disagreements were resolved through discussion between the reviewers. When a consensus could not be reached, the research team was consulted. Full texts of potentially eligible articles were examined by the first reviewer, who selected studies for inclusion based on the eligibility criteria. Data were extracted in a standardized form including the following sections: first author and date; setting; participants; mean age; gender; ethnicity; tool(s); mode of assessment (synchronous or asynchronous); validity; reliability; diagnostic accuracy.

Quality Assessment

The quality of the included studies was assessed using the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) Risk of Bias checklist, with adaptations for clinician-reported outcomes (Mokkink et al., 2018, 2020), where appropriate. This tool assesses psychometric properties, including internal consistency, reliability, measurement error, content validity, construct validity (structural validity, hypotheses testing, and cross‐cultural validity), criterion validity, and responsiveness. Each COSMIN item was scored using one of the four options available (‘very good’, ‘adequate’, ‘doubtful’, or ‘inadequate’), and the overall rating of the quality of each study was determined by ‘the worst score counts’ principle, using the lowest rating of any standard in the box. No study was excluded based on quality assessment results.
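
As a concrete illustration of this scoring rule, the short sketch below shows the ‘worst score counts’ aggregation under the assumption that item-level ratings are recorded as text labels; the function name and example ratings are illustrative and are not part of the COSMIN materials.

```python
# Illustrative sketch of the COSMIN 'worst score counts' principle:
# the overall rating for a box equals the lowest rating given to any of its standards.

RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]  # worst to best

def overall_rating(item_ratings):
    """Return the worst (lowest-ranked) rating among the item-level ratings."""
    return min(item_ratings, key=RATING_ORDER.index)

# Hypothetical example: a single 'doubtful' standard determines the overall rating.
print(overall_rating(["very good", "adequate", "doubtful"]))  # -> doubtful
```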

Data Synthesis

Data were synthesized narratively. First, we present the results from studies which utilized adapted versions of ‘gold standard’ tools for telehealth administration and compared the diagnostic agreement between telehealth assessment and care-as-usual in-person assessment. Second, we report on the psychometric properties and diagnostic accuracy of diagnostic tools for autism that have been administered via telehealth, using either synchronous or asynchronous methods. Finally, we summarize the evidence regarding the psychometric properties and diagnostic accuracy of clinician-administered virtual screening tools for infants and toddlers that present with autistic traits.

Results

Study Selection

After removing duplicates, our search identified 1677 studies for title and abstract screening. After excluding 1432 studies, 245 records were retained for full-text screening. The two reviewers agreed in 97% of cases (k = 0.93). Of these, 17 articles met the eligibility criteria, and one additional article was identified through reference mining. In total, 18 studies were included in this review. Figure 1 illustrates the study selection procedure.
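
For readers less familiar with the statistic, Cohen’s kappa expresses inter-rater agreement corrected for chance; the standard definition, shown below for reference, is

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

where \(p_o\) is the observed proportion of agreement between the two reviewers (here 0.97) and \(p_e\) is the proportion of agreement expected by chance given each reviewer’s marginal rates of inclusion and exclusion.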

Fig. 1

Flowchart of study selection

Study Characteristics

The characteristics of the studies are presented in Table 2. Fifteen of the 18 studies were conducted in the USA; of the remaining three, one was conducted in South Africa, one in Indonesia, and one in the UK. Thirteen studies provided live assessments via synchronous methods, while five provided asynchronous assessments using video recordings. In total, 1593 children were assessed for autism via telehealth in the included studies. Descriptions of the tools administered via telehealth can be found in Online Resource 1.

Table 2 Characteristics of the studies included

Quality Assessment

Out of the 18 studies assessed, four (22.2%) were rated as ‘very good’; three (16.7%) were rated as ‘adequate’; five (27.8%) were rated as ‘doubtful’; and six studies (33.3%) were rated as ‘inadequate’ based on the ‘the worst score counts’ principle. Quality assessment results are presented in Table 3.

Table 3 Risk of Bias appraisal according to the COSMIN Risk of Bias checklist

Diagnostic Assessment via Telehealth

Twelve studies used adaptations of ‘gold standard’ procedures, or other tools developed or adapted for virtual administration, and compared diagnostic results from the telehealth assessment with those from an in-person assessment.

Adaptations of ADI-R and ADOS for Telehealth

In an RCT by Reese and colleagues (2013), children diagnosed with autism or developmental delay (DD) were assessed for autism either via videoconferencing or with the clinician physically present in the same clinical room. Families in the telehealth modality were assessed in a clinic room equipped with assessment materials and electronic devices, while clinicians conducted the assessments from a separate room. In both research modalities, clinicians utilized the ADOS and ADI-R, with appropriate adaptations in the telehealth condition. Clinicians’ diagnostic decisions matched children’s existing diagnoses in 86% of cases in the telehealth modality and 83% of cases in the in-person modality. In the telehealth modality, inter-rater agreement was moderate for the ADOS (average percentage agreement = 72.07%, SD = 15.96%, k = 0.50) and higher for the ADI-R (average percentage agreement = 88.89%, SD = 5.80%, k = 0.82).

A subsequent RCT compared the diagnostic accuracy of two assessment modalities, one conducted through videoconferencing and the other in person, against a subsequent care-as-usual assessment in clinic (Reese et al., 2015). In both research modalities, clinicians drew on information from a modified ADOS-2, a modified ADI-R, an unstructured 20-min observation, and the child’s medical and family history. The diagnostic results from the telehealth assessment agreed in 85.7% of cases with the diagnostic outcomes from the scheduled clinical assessment. The telehealth assessment showed high sensitivity (0.84) and specificity (0.88). The in-person assessment agreed in 82.4% of cases with the scheduled clinical assessment, with sensitivity of 0.78 and specificity of 0.88.
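
As a point of reference for the accuracy indices reported throughout this review, the definitional formulas below express agreement, sensitivity, and specificity in terms of a 2 × 2 table of telehealth versus reference (care-as-usual) diagnostic outcomes, where TP, FP, FN, and TN denote true positives, false positives, false negatives, and true negatives; they are standard definitions rather than a re-analysis of the Reese et al. data.

```latex
\text{Agreement} = \frac{TP + TN}{TP + FP + FN + TN}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
```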

In conclusion, telehealth assessments using adapted ‘gold standard’ procedures agreed closely with in-person clinical assessments, with agreement of 85.7 to 86%. Both the ADOS and ADI-R achieved acceptable reliability via telehealth.

Diagnostic Assessment Using Alternative Procedures via Telehealth

Juárez and colleagues (2018) conducted two studies in which toddlers were assessed for autism virtually using the Screening Tool for Autism in Toddlers and Young Children (STAT; Stone et al., 2000), an interactive screening tool examining behavioral features of autism, and a ‘DSM-5 Clinical Interview’ for autism, which has been used in research settings to diagnose autism (e.g., Swanson et al., 2014). The diagnostic results from the telehealth assessment were compared to the diagnostic outcome from a care-as-usual assessment, comprising the ‘DSM-5 Clinical Interview’, the ADOS-2, and an assessment of cognitive functioning and adaptive behavior. The telehealth assessment agreed in 80% of cases with the in-person evaluation and yielded a sensitivity of 0.79.

McEwen and colleagues (2016) examined the validity and diagnostic accuracy of the autism module of the Development and Well-Being Assessment (DAWBA; Goodman et al., 2000), a structured interview containing modules to diagnose ICD-10 and DSM-5 psychiatric conditions, via telehealth. The DAWBA was either administered via telephone interview or completed online by parents. Families subsequently received a ‘gold standard’ assessment using the ADI-R and ADOS. The DAWBA assessment agreed 86.2% of the time with the ‘gold standard’ in-person diagnostic outcome. DAWBA scores were highly correlated with ADI-R scores (r = 0.82, p < 0.001), indicating high convergent validity. Children diagnosed with autism (M = 30.65, SD = 10.77) showed higher DAWBA scores than unaffected co-twins (M = 9.02, SD = 10.30) and children at low likelihood of developing autism (M = 7.75, SD = 4.14), although significance levels were not calculated. Finally, the DAWBA displayed an excellent area under the curve (AUC) (0.91, p < 0.001, 95% CI: 0.87–0.94) and high sensitivity (0.88), specificity (0.85), positive predictive value (PPV) (0.81), and negative predictive value (NPV) (0.91), indicating high diagnostic accuracy.
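
Because PPV and NPV are reported for several tools in this review, it is worth noting that, unlike sensitivity (Se) and specificity (Sp), they depend on the prevalence (P) of autism in the sample assessed; by Bayes’ theorem,

```latex
PPV = \frac{Se \cdot P}{Se \cdot P + (1 - Sp)(1 - P)}, \qquad
NPV = \frac{Sp\,(1 - P)}{Sp\,(1 - P) + (1 - Se)\,P}
```

Predictive values observed in clinically referred, autism-enriched samples will therefore tend to be higher (PPV) and lower (NPV) than those expected in general community screening.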

In conclusion, a diagnostic assessment utilizing STAT and a diagnostic interview via videoconferencing showed good agreement with care-as-usual assessment (80%). Similarly, DAWBA administered virtually achieved high validity and good agreement (86.2%) with ‘gold standard’ in-person assessment for autism.

Behavioral Observation Tools

Four studies examined the TELE-ASD-PEDS (TAP; Corona et al., 2020), a brief observational tool developed to assess children under 36 months via telehealth using a series of structured tasks designed to elicit behaviors of concern. In an initial study (Corona et al., 2021), children with a diagnosis of autism or DD and a comparison group were assessed using the TELE-STAT (a telehealth adaptation of the STAT) or the TAP. Children receiving an autism diagnosis scored higher on the TAP (M = 15.53, SD = 1.77) than children for whom autism was ruled out (M = 8.83, SD = 1.72). The diagnostic outcome from the telehealth assessment agreed with children’s existing diagnoses in 86% of cases. Additionally, Wagner and colleagues (2021) found that children diagnosed with autism via the TAP received the highest TAP scores (M = 17.96, SD = 2.36), followed by children with suspected autism (M = 15.14, SD = 2.45), children with an uncertain diagnosis (M = 12.32, SD = 1.52), and children for whom a diagnosis was ruled out (M = 9.96, SD = 1.64). A subsequent study similarly found that children with autism (M = 16.67, SD = 2.62) and children with suspected autism (M = 16.57, SD = 1.99) scored higher on the TAP than children with an uncertain diagnosis (M = 12.71, SD = 2.49) and children with no signs of autism (M = 8.50, SD = 1.95) (Wagner et al., 2022). However, none of the studies above calculated significance levels. Finally, a study which utilized either the TAP or the TELE-ASD-KIDS, a telehealth behavioral observation tool using tasks and activities from the TAP and ADOS-2, found inter-rater agreement of 83.60% across cases (Stavropoulos et al., 2022b).

Dow and colleagues (2022) validated the Brief Observation of Symptoms of Autism (BOSA), a novel interactive observation tool adapted from the Brief Observation of Social Communication Change (BOSCC; Grzadzinski et al., 2016) and the ADOS-2. The BOSA can be administered in 12 to 14 min and uses materials from the ADOS-2. Four BOSA modules exist, selected according to the child’s age and verbal fluency, and these are scored using ADOS-2 scoring algorithms. The BOSA showed high inter-rater agreement, with intra-class correlation coefficients (ICC) ranging from 0.92 to 0.93 across modules, excellent test–retest reliability (ICC = 0.95, p < 0.01), and good cross-site reliability (ICC = 0.84, p < 0.01). High structural validity was found, with factor loadings ranging from 0.40 to 0.93 in the Toddler Module, 0.25 to 0.98 in Module 1, 0.50 to 0.95 in Module 2, and 0.55 to 0.96 in Module 3. The BOSA displayed high convergent validity, given its strong correlation with ADOS-2 scores (r = 0.54–0.74, p < 0.001). Lastly, the BOSA showed high AUC (0.87–0.96), sensitivity (0.86–0.96), and specificity (0.74–1.00) across modules.

Three studies assessed children for autism using the Naturalistic Observation Diagnostic Assessment (NODA; Nazneen et al., 2015). NODA is a mobile application which allows caregivers to answer questions about their child’s developmental history and to record their child in four scenarios. Parents upload the clips through the application, and the clinician can subsequently access them along with the developmental history. Clinicians assign a number of predefined ‘tags’ to the videos reviewed, with each ‘tag’ corresponding to a DSM-5 criterion. In an evaluation study, Nazneen and colleagues (2015) found 91% inter-rater agreement between clinicians in assigning a diagnosis using NODA. A validation study utilized NODA to assess children who had already undergone an in-person ‘gold standard’ assessment for autism (Smith et al., 2017). The diagnostic agreement between the NODA assessment and the in-person assessment was 88.2%. NODA displayed acceptable inter-rater agreement (78%) and moderate inter-rater reliability (k = 0.56, 95% CI: 0.53–0.59; ICC = 0.85, 95% CI: 0.73–0.91). The number of ‘tags’ assigned was significantly higher in children diagnosed with autism (Z = 2.54, p = 0.01) than in the comparison group, indicating high known-groups validity. High sensitivity (0.85, 95% CI: 0.67–0.94) and specificity (0.94, 95% CI: 0.71–1.00) were also found for NODA. Finally, one study used a virtual recording evaluation (VRE) protocol, which employed scenarios adapted from NODA, to assess children remotely (Sutantio et al., 2021). Diagnostic results from the VRE were compared with an in-person diagnostic assessment. The VRE agreed 82.5% of the time with the in-person assessment and showed high sensitivity (0.91, 95% CI: 0.80–1.00), specificity (0.71, 95% CI: 0.49–0.92), PPV (0.81, 95% CI: 0.66–0.96), and NPV (0.86, 95% CI: 0.67–1.00).

In summary, studies have used both synchronous (e.g., TAP, STAT, BOSA) and asynchronous (NODA) observational tools to assess for autism virtually. Those tools achieved acceptable reliability and validity and high diagnostic accuracy, and the telehealth diagnostic results showed high agreement with in-person care-as-usual assessments (82.5–88.2%).

Telehealth Screening for Children with Suspected Autism

Six studies assessed infants or toddlers with early signs of autism using clinician-administered screening tools.

Screening Infants with Early Symptoms of Autism Virtually

In two studies, infants with developmental concerns were evaluated for autism with the Telehealth Evaluation of Development for Infants protocol (TEDI; Talbott et al., 2020). As part of this assessment procedure, parents completed pre-assessment questionnaires online and subsequently engaged in clinician-guided interactive tasks via videoconferencing. The Autism Observation Scale for Infants (AOSI; Bryson et al., 2008), a brief interactive screener examining non-verbal behaviors in infants, comprises part of the TEDI. In a feasibility study (Talbott et al., 2020), high inter-rater agreement (ICC = 0.94) and test–retest reliability (r = 0.86, p = 0.002) were found for the AOSI total severity score. Reliability was considerably lower when the AOSI number of behaviors was analyzed. In a subsequent study examining the TEDI in a larger sample (Talbott et al., 2022), both the AOSI total score and the number of behaviors showed high inter-rater agreement (ICC = 0.94 and ICC = 0.89, respectively), while significant test–retest reliability was detected for AOSI total scores (r = 0.459, p = 0.01) but not for the AOSI number of behaviors (r = 0.47, p = 0.171).

Screening Toddlers with Suspected Autism via Telehealth

In one study (Phelps et al., 2022), children were assessed for autism using several tools, including the Childhood Autism Rating Scale-Second Edition (CARS-2; Schopler et al., 2010). The CARS-2 is a rating scale which examines 15 behavioral domains relevant to autism in toddlers, drawing on information from parent report or behavioral observation. Participants diagnosed with autism following the telehealth assessment scored significantly higher on the CARS-2 than those referred for in-person assessment (t = 6.27, p < 0.001) and those not diagnosed (t = −16.85, p < 0.001), indicating high known-groups validity. Additionally, the CARS-2 severity score significantly predicted the diagnostic outcome (β = −2.83, SE = 0.79, p = 0.001).

Another interactive screening tool is the Autism Detection in Early Childhood (ADEC; Young, 2007). The ADEC assesses for features of autism in toddlers aged 12 to 36 months, utilizes toys that families typically own, and is administered in 10 to 15 min. In a validation study (Kryszak et al., 2022a), toddlers were assessed with the ADEC-V, a modified version of the ADEC for virtual administration. The ADEC-V showed acceptable internal consistency (ω = 0.75, α = 0.77), which improved after removing low-performing items (ω = 0.82, α = 0.82). ADEC-V scores showed a modest correlation with ADI-R scores (r = 0.26, p < 0.01) and a strong correlation with CARS-2 scores (r = 0.70, p < 0.001), indicating acceptable convergent validity. The ADEC-V additionally showed high AUC (0.88), sensitivity (0.82), specificity (0.78), and PPV (0.95), but a lower NPV (0.42). ADEC-V scores also predicted the final diagnosis (OR = 1.33, 95% CI: 1.16–1.60, p < 0.001). Finally, children who received a diagnosis of autism scored significantly higher on the ADEC-V than the comparison group (t = 7.59, p < 0.001).
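
For context on the internal-consistency coefficients reported for the ADEC-V, Cronbach’s alpha is defined from the item and total-score variances as shown below; McDonald’s omega is a related coefficient derived from a factor model. The formula is standard and is not specific to the ADEC-V.

```latex
\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)
```

where k is the number of items, \(\sigma_i^2\) the variance of item i, and \(\sigma_X^2\) the variance of the total score.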

Two studies investigated the Systematic Observation of Red Flags (SORF; Dow et al., 2017), an observational screener which rates early signs of autism in behavior samples. In a validation study in a community setting (Chambers et al., 2017), clinicians utilized the SORF to code autistic symptoms from a Communication and Symbolic Behavior Scales Developmental Profile (CSBS-DP; Wetherby & Prizant, 2002) Behavior Sample and an unstructured naturalistic observation recording. Children who were later diagnosed with autism using ‘gold standard’ procedures (e.g., ADOS) scored significantly higher on the SORF than the comparison group on both the unstructured observation sample (F = 20.64, p < 0.001, 95% CI: 15.24–27.43) and the CSBS sample (F = 67.80, p < 0.001, 95% CI: 27.87–38.67), suggesting high known-groups validity. Dow and colleagues (2020) assessed children for autism with the SORF using behavior samples from a naturalistic video-recorded home observation. A clinical in-person assessment was subsequently conducted using the ADOS. The SORF displayed moderate to high AUC (0.79), sensitivity (0.70), specificity (0.67), PPV (0.55), and NPV (0.79). A composite score of the six best-performing items demonstrated improved AUC (0.81), sensitivity (0.76), specificity (0.75), PPV (0.66), and NPV (0.84).

In summary, infants with early signs of autism could be reliably assessed remotely using the AOSI (as part of the TEDI protocol). For toddlers, the CARS-2 and ADEC-V, both synchronous screening tools, showed high validity and accuracy, while the SORF, an asynchronous screener, demonstrated promising validity and moderate diagnostic accuracy.

Discussion

This review aimed to explore the psychometric properties and diagnostic accuracy of telehealth autism assessment tools for children globally. We identified studies examining an array of assessment tools and procedures, including behavioral observation tools, developmental and diagnostic interviews, and clinician-administered screening tools for children with suspected autism.

In accordance with previous reports, telehealth assessments for autism could be divided into two categories: those utilizing synchronous methods, involving real-time communication between providers and families, and those conducted via asynchronous, or store-and-forward, methods, in which parents send clinical information or material (e.g., videos) to providers for review (Alfuraydan et al., 2020). Overall, we found high diagnostic agreement between telehealth and in-person assessment, ranging from 80 to 88.2%. This is consistent with the review by Stavropoulos and colleagues (2022a), which found 80–91% agreement between the two modalities. However, unlike previous reviews, our study focused on telehealth assessments specifically for children presenting with autistic features, incorporated a larger pool of studies, including those examining clinician-administered screening tools, and systematically examined the psychometric qualities of those tools.

Early studies utilized modified versions of the ADOS and ADI-R for telehealth administration (Reese et al., 2013, 2015). Although those assessments were feasible and largely accurate, children were assessed in clinical rooms, limiting generalizability to other contexts, such as home or school. More recent studies examined novel behavioral observation tools, such as the BOSA (Dow et al., 2022) and TAP (Corona et al., 2021). While these tools appear promising, limitations exist. For example, the BOSA requires a specific set of toys to be provided to families in advance of the assessment, limiting its applicability for low-resource families and families from non-Western cultures (Berger et al., 2022). The TAP is an acceptable and convenient tool (Wagner et al., 2021, 2022) which overcomes this limitation, as typical toys found in families’ homes can be utilized. However, published data on the TAP’s psychometric properties are not currently available. The autism module of the DAWBA is valid and efficient when conducted virtually (McEwen et al., 2016) and has diagnostic accuracy similar to that of the ADI-R (Lebersfeld et al., 2021). This tool has been used in several national surveys of child mental health in the UK (Sadler et al., 2018) and has been translated into 19 languages.

Asynchronous tools, such as NODA, showed high agreement with in-person assessment across different cultural contexts (Smith et al., 2017; Sutantio et al., 2021). The ability to detect autistic behaviors from home videos has been described previously (Baranek, 1999; Ozonoff et al., 2010, 2011) and has been used by clinicians for diagnostic purposes (Gabrielsen et al., 2015). Store-and-forward methods benefit from increased ecological validity, given that videos can capture behavior in a naturalistic environment, mitigating the impact of clinic-induced anxiety (Kerns et al., 2014). Importantly, asynchronous assessment tools have been validated in LMICs (Chambers et al., 2017; Sutantio et al., 2021). Screening and assessment through mobile technologies can potentially improve early access and identification in LMICs (Kumm et al., 2022; Sondaal et al., 2016), although poor access to the internet and digital technology remains a barrier for low-resource families.

Some online screening tools exhibited excellent psychometric properties and promising diagnostic accuracy. In terms of screening infants with autistic traits virtually, the TEDI protocol was feasible and reliable (Talbott et al., 2020, 2022). Evidence shows that early signs of autism can emerge in the first year of life (Sacrey et al., 2018), a period of rapid brain development (Zwaigenbaum et al., 2015). Accordingly, this assessment protocol could potentially address the lack of tools for this age group and expedite access to early interventions for autism, which can lead to significant improvement in overall functioning and a better prognosis (Landa, 2018).

Other screeners showed high validity and diagnostic accuracy when used with toddlers. For example, the ADEC-V showed sensitivity and specificity comparable to those of the ADOS-2 and ADI-R (Lebersfeld et al., 2021). Unlike the ADOS-2, the ADEC-V does not require extensive training, is brief, and can utilize common toys found in families’ homes. Therefore, services could leverage the efficiency and accuracy of synchronous or asynchronous screeners to assess infants and toddlers at increased likelihood of developing autism (Roberts et al., 2019; Rotholz et al., 2017). This would particularly benefit families living in LMICs or rural areas, where professionals qualified to administer ‘gold standard’ tools are limited (Sukiennik et al., 2022).

Nevertheless, it is important to note that although the tools described in this paper exhibit promising psychometric properties, the majority have been validated in Western, high-income countries, which potentially limits their generalizability to other sociocultural contexts. This is in line with previous reviews highlighting the lack of studies investigating the cultural validity of autism assessment or screening tools (Al Maskari et al., 2018; Mukherjee et al., 2022; Stavropoulos et al., 2022a). It is imperative that further studies investigate the adaptations required for non-Western and ethnically and linguistically diverse populations (Stoll et al., 2021) and examine the optimal means of administration in LMICs (Marlow et al., 2019). This will require not only translation of the tools but also amendment of specific items and procedures to reflect cultural values, customs, and differences in the perception of autism across cultures (Al Maskari et al., 2018; de Leeuw et al., 2020). Novel, culturally appropriate tools can also be developed and used in those contexts (Gladstone et al., 2017). This will benefit both high-income countries with diverse sociocultural populations and LMICs.

Limitations

Many of the studies reviewed here were pilot or feasibility studies. Most comprised small samples, lacked a comparison group, and were rated as having low or moderate quality. Accordingly, firm conclusions about their validity and reliability, and about their applicability beyond their cultural context, cannot be drawn. Differences in methodology and in the terms used to describe outcomes (e.g., diagnostic or predictive validity and diagnostic accuracy) were apparent across studies. Another limitation is ascertainment bias: children with suspected autism are more likely to receive a diagnosis of autism, potentially inflating the diagnostic accuracy of the telehealth tools. Moreover, a clear protocol for telehealth assessment was lacking in most cases; clear protocols are important to ensure that assessments are provided in a standardized manner (Jang et al., 2022). Notably, 16 of the 18 studies (89%) were conducted in high-income Western countries. Lastly, a meta-analysis was not conducted.

Future Research

Future studies should use larger samples of children with autistic traits and include a comparison group consisting of children with other conditions (e.g., DD, intellectual disabilities) or children with no developmental concerns. This would allow investigation of the ability of those tools to discriminate between different conditions and predict a diagnosis of autism accurately. Studies conducted in different geographical (urban and rural) and sociocultural contexts and in low-resource settings are needed to explore the unique challenges and benefits of telehealth for those populations and to establish cultural sensitivity and validity. Sociocultural factors that might affect the acceptability and accuracy of the diagnostic tools should be identified through rigorous research guided by the framework proposed by de Leeuw and colleagues (2020). In this way, diagnostic procedures could be adapted to accommodate cultural elements and traditions and to facilitate access for groups facing hardship. In addition, although most of the tools described in this review demonstrated high validity and diagnostic accuracy, for almost one in five children a diagnosis was either given incorrectly or missed. Therefore, studies should investigate which children are most likely to benefit from a telehealth assessment. Initial evidence suggests that preschool children (Phelps et al., 2022; Stainbrook et al., 2019), especially those with profound developmental impairment and apparent features of autism (Kryszak et al., 2022b), are more likely to receive diagnostic clarity in a telehealth assessment than older and medically complex children, or those with intellectual disabilities (McNally Keehn et al., 2022). Given that some of the tools described in this paper demonstrated high known-groups validity, telehealth assessments for autism may be most accurate and appropriate for children with a clear presentation of autism. Children with subtle or atypical symptoms could be initially identified through a virtual assessment and subsequently referred for a comprehensive in-person assessment. Lastly, detailed telehealth assessment protocols, which would ideally also assess comorbidities and other domains of functioning, should be developed.

Conclusions

Our review aimed to investigate the evidence base for the diagnostic evaluation of autism in a telehealth environment for children. The current evidence suggests that clinicians can assess children for autism via telehealth accurately and reliably. Although research in this field is still in its infancy, our findings indicate that diagnostic decisions made via telehealth agree closely with the diagnostic outcomes of care-as-usual, face-to-face clinical assessments. Tools developed or adapted for virtual administration demonstrate promising validity and high diagnostic accuracy, in some cases comparable to that of ‘gold standard’ tools. Telehealth assessments for autism have the potential to increase the efficiency of neurodevelopmental services and improve families’ access to timely and accurate assessment.