Children with Autism Spectrum Disorders (ASD) present impaired social communication and repetitive sensorimotor behaviours linked to genetic and environmental factors (Lord et al., 2018). Compared with typically developing children, they may also show learning difficulties associated with language and cognitive development (Ousley & Cermak, 2014). Impairment in Executive Functions (EF) has been linked to difficulties in children's psychosocial adjustment, and several cognitive models (Baron-Cohen et al., 1985; Demetriou et al., 2019; Hill, 2004; Olde Dubbelink & Geurts, 2017) have been proposed to explain ASD symptomatology on the basis of atypical EF processes.

Although there is neither a cure nor a single gold-standard treatment for ASD, a diverse range of efficacious therapies targeting autism symptoms is available, such as “Applied Behaviour Analysis (ABA)” and its subsequent adaptations “Pivotal Response Training” and “Early Start Denver Model”, Cognitive Behavioural Therapy (CBT) models (Politte et al., 2015), and Theory of Mind (ToM) interventions (Begeer et al., 2011). Most approaches use task analysis, breaking a task into steps and rewarding the child for completing each step along the way. Better results are obtained with early and ongoing interventions tailored to each child’s specific needs (Rogers & Vismara, 2014).

ABA-based interventions are the most frequently used, but they are very time-demanding, involving as many as 40 therapeutic hours a week on a one-on-one basis (Leaf et al., 2021). Intervention studies based on ToM have produced mixed results regarding their efficacy; nevertheless, difficulties in reasoning about others' feelings and ways of thinking have proved critical in distinguishing children with ASD who have poorer social functioning (Altschuler et al., 2018; Rosello et al., 2020). CBT interventions have been shown to be efficacious in reducing mental health problems and improving emotion regulation, albeit with only small to moderate treatment effects (Weiss, 2018).

Technological development aims to enhance existing treatments for autism while helping to overcome barriers such as the lack of qualified therapists (especially in non-urban areas), the high cost of treatment, and problems arising from the repetitive nature of therapeutic tasks, such as fatigue or distraction (Roski et al., 2019). Integrating a social humanoid robot into standard clinical treatment has shown promise in the care of children with ASD (Diehl et al., 2012) and in giving clinicians an easier way to connect with children with ASD (Manzi et al., 2020; Pennisi et al., 2016).

Reviews covering more than fifty studies using social robots concluded that these studies showed encouraging results for social behaviour, imitation and engagement (Begum et al., 2016; Papakostas et al., 2021; Yuan et al., 2021), but also that they lacked scientific rigour, being mainly non-randomised, with small samples and questionable methodology (Duradoni et al., 2021; Ismail et al., 2019). More specifically, a recent meta-analysis (Kouroupa et al., 2022) found that the methodological quality of robot-assisted autism studies was undermined by the absence of intelligence assessment, the variability in intervention duration (from 3 to 180 min) and in session frequency (from a single session to three times a week), the lack of longitudinal data and, in some studies, the absence of clear documentation of statistical significance.

Moreover, very few studies had a clear clinical objective, such as comparing robot-led with human-led interventions to establish which provides better care. In a recent systematic review of randomised controlled trials (RCTs) of robot-assisted therapies for ASD, Salimi et al. (2021) concluded that robots are currently used in therapy mainly as entertainment agents, as they do not yet appear ready to deliver high-end care. The reviewed RCTs, although methodologically stronger than proof-of-concept studies, had limitations that reduced their robustness. Intervention duration ranged from a single session to 10 weeks, and sample sizes ranged from 14 to 36 participants, with group sizes from 6 to 24. Of the six trials that met the inclusion criteria, five used different types of robots. Most importantly, none of the trials included follow-up evaluations to ascertain whether the effects of the interventions were maintained. According to the review, social robots were found to be "less effective" when compared with humans, whereas when they were not compared with humans they were usually found to be effective (Salimi et al., 2021).

Despite these challenges, the scientific community has reason to pursue robot therapy research with optimism, mainly owing to the substantial technological progress in Artificial Intelligence (AI) and data science over the past decade (Gasser, 2021).

The theoretical frameworks underpinning ASD interventions vary considerably, resulting in different implementation models in terms of session frequency, therapeutic techniques, tools used, degree of parental involvement and outcomes (Lytridis et al., 2020; Seida et al., 2009). Consequently, several types of intervention have been tested in robot-assisted therapy (Robinson et al., 2019), such as ABA-based (Salvador et al., 2016), CBT-based (Marino et al., 2020) and ToM-based interventions (Zhang et al., 2019).

In this context, the present study focused on evaluating the efficacy of a robot-assisted psychosocial intervention for children with ASD. The secondary goal was to explore potential differences between a robot-assisted intervention group and a control group receiving the intervention from human therapists only. Specifically, the main hypothesis was that children in both groups would show improvements in social perception, prosocial behaviour and emotion regulation at the end of the intervention. The secondary hypothesis was that children in the robot-assisted group would show improvements similar to those of children in the control group.

Methods

Study Preparation

To design a valid and effective psychosocial protocol, the following preparatory steps were carried out (Fig. 1). First, a comprehensive literature review was conducted to identify theoretically derived implementation components likely to be relevant to the protocol (Papakostas et al., 2021). Second, a multidisciplinary focus group examined the applicability of the intervention using a mixed-methods research design involving quantitative and qualitative measures. Third, the protocol design was completed, and two pilot sessions (one with an 11-year-old typically developing girl and one with a 12-year-old male paediatric inpatient with ASD) were conducted and recorded to test its application in real settings before the official trial onset. Taking into account the research team’s comments, final corrections were made and the protocol was piloted in a feasibility study examining its clinical usability.

Fig. 1 Study preparation

Focus Groups

A multidisciplinary focus group was formed to identify the intervention’s requirements. The group consisted of 10 members with extensive experience in the field of autism and 4 members with experience in the field of artificial intelligence. More specifically, the group comprised a neuropaediatrician, a paediatrician, a clinical psychologist, a neuropsychologist, a child psychologist, three special needs educators, a speech therapist, a physiotherapist, an ergotherapist (occupational therapist) and a technical team of engineers and computer scientists. Collectively, the group had research and clinical expertise in the use of social robots for children with ASD.

Across several focus group sessions, one of the main challenges identified was determining how the robot should behave when a “crisis” arises, relating either to the child or to the robot itself (e.g. a robot malfunction). It was decided that the session’s flow would benefit from controlling the robot's behaviour through linguistic cues spoken by the therapist. To select appropriate linguistic cues mirroring those used by therapists, a qualitative study was conducted to explore the vocabulary clinicians use when a problem arises during interaction with the child.

Qualitative Study

Online surveys were completed by 33 professionals from different backgrounds to explore their perspectives on integrating social robots into existing workflows and to identify the vocabulary most often used in specific situations during psychosocial interventions with children with ASD. Taking into account the findings of Nikopoulou et al. (2021), six linguistic cues (e.g. “well done”, “again”) and specific phrases (e.g. “I need time to relax”) were selected and used to program the robot to show empathy and respond appropriately when a crisis emerges (Table 1).
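The cue mechanism can be illustrated with a minimal sketch. It assumes that the speech recogniser returns the therapist’s utterance as text; only the cue phrases “well done”, “again” and “I need time to relax” come from the text, while the behaviour names (`praise`, `repeat_task`, `calm_down`) are hypothetical placeholders rather than NAO or study identifiers, and Table 1 itself is not reproduced.

```python
import re
from typing import Optional

# Hypothetical mapping from therapist cue phrases to robot behaviours.
# The phrases come from the text; the behaviour names are placeholders.
CUE_TO_BEHAVIOUR = {
    "well done": "praise",
    "again": "repeat_task",
    "i need time to relax": "calm_down",
}


def match_cue(utterance: str) -> Optional[str]:
    """Return the behaviour for the first cue phrase found in the utterance.

    Longer phrases are checked first so that a long cue is not shadowed by a
    shorter cue it contains; word boundaries stop "again" matching "against".
    """
    text = " ".join(utterance.lower().split())  # normalise case and spacing
    for phrase in sorted(CUE_TO_BEHAVIOUR, key=len, reverse=True):
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            return CUE_TO_BEHAVIOUR[phrase]
    return None  # no cue recognised: the robot continues the current scenario


print(match_cue("Well done!"))             # praise
print(match_cue("Let's try that AGAIN"))   # repeat_task
print(match_cue("I need  time to relax"))  # calm_down
print(match_cue("Look at the ball"))       # None
```

A lookup table keeps the cue vocabulary editable by clinicians without touching the matching logic.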

Table 1 Therapist’s Cues to Adjust Robot’s Behaviour

Protocol Design

Based on ToM and ABA principles combined with cognitive-behavioural techniques, the intervention was designed to address socioemotional, cognitive and behavioural issues related to ASD. Specifically, it aimed to develop play skills (e.g. taking turns at games or sharing games); conversation skills (e.g. starting a conversation, communicating through body language); emotional self-regulation skills (e.g. responding appropriately, without tantrums or whining, when told “no” after a request, and understanding how others feel); and problem-solving skills (e.g. resolving conflicts or making decisions in social situations).

The tasks (i.e. free play, symbolic play, cognitive training, empathy training, behaviour training, relaxation training) were selected to meet the following requirements: consistency with standard therapeutic methods; suitability and adaptability to each child’s development and personal needs; and feasibility of delivery, in whole or in part, by the robot.

The intervention protocol consisted of 7 steps (Table 2), described in detail elsewhere (Holeva et al., 2019). Because children with ASD might have difficulty following all the steps, the therapist was allowed to intervene and change the flow of the steps in favour of the child’s emotional well-being. All therapeutic scenarios were designed around quick shifts between activities to keep the child engaged. Depending on the child's attention abilities, each session lasted 35 to 45 min. At the end of each session, the therapist gave the parents a summary of how the session had gone and a short homework task to help consolidate its benefits.

Table 2 The intervention protocol

Feasibility Study

A feasibility study with a pre-post assessment design was conducted from October 2019 to January 2020 to explore the effectiveness of the intervention protocol, its clinical usability, the suitability of the clinical tools and possible technological challenges (Kaburlasos et al., 2021). The results suggested that the intervention could benefit children with ASD by improving their socioemotional and communicative skills, as indicated by neuropsychological testing and parent reports.

Operations Manual

An operations manual was developed to standardise all procedures, and a one-day training session was held to familiarise the members of the research team with it. Data collection forms were reviewed and approved by the research team to ensure that they were clearly formatted and collected only relevant data. All changes to the clinical protocol were recorded, and the different versions were kept on file.

Main Study

The main study adopted a randomised controlled trial design (registration number: ISRCTN31154845) approved by the scientific committee of Papageorgiou General Hospital (identification number: 1274/18/10/2018). Written informed consent was obtained from parents prior to participation. More specifically, a preliminary session was held to deliver the study information, during which a member of the research team clarified the study’s aims and answered questions. Consent forms were given to both parents of each child to read and sign. Parents remained blind to group allocation until all baseline measures had been obtained. The outcome assessors were also blind to group allocation, and the therapists were blind to the results of the outcome measures.

Implementation

A web-based service was used to randomise children into two groups: a robot-assisted intervention group (NG) and a therapist-only intervention group (CG). Treatment allocation remained concealed from the main researchers until recruitment was irrevocable (Fig. 2). The intervention took place in two specialised centres: the paediatric clinic at Papageorgiou General Hospital in Thessaloniki, Greece, and the Learning Disabilities Centre “Praxis” in Kavala, Greece.
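The text does not state which randomisation scheme the web-based service applied. Purely as an illustration, the sketch below assumes 1:1 permuted-block randomisation with a block size of 4, a common way of keeping two arms balanced throughout recruitment; the group labels reuse NG and CG, and the `seed` parameter is an assumption added for reproducibility.

```python
import random
from typing import List


def permuted_block_allocation(n_participants: int, block_size: int = 4,
                              seed: int = 0) -> List[str]:
    """Allocate participants 1:1 to NG and CG in randomly permuted blocks."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    allocation: List[str] = []
    while len(allocation) < n_participants:
        block = ["NG", "CG"] * (block_size // 2)  # balanced block
        rng.shuffle(block)                         # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]  # the last block may be truncated


sequence = permuted_block_allocation(51)
print(sequence.count("NG"), sequence.count("CG"))
```

Every complete block is balanced, so the arms can differ by at most `block_size // 2` participants. Concealment then depends on the sequence being generated and held by a party independent of recruitment.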

Fig. 2 Flow chart

The study was conducted over a three-month period, with the pre-test and post-test sessions held during the first and last weeks of the intervention. Two sessions were conducted weekly. Each training session involved a triadic setting consisting of the therapist, the social robot NAO and the child in the NG, and a dyadic setting consisting of the therapist and the child in the CG. NAO was programmed and employed as an assistant therapist in the NG during 21 intervention sessions. These sessions aimed to teach children social skills (e.g. empathy, appropriate behaviour), control skills (e.g. emotional self-regulation, inhibitory control) and cognitive skills (e.g. joint attention, memory).

The inclusion criteria were: age 6–12 years; confirmed ASD diagnosis; IQ over 70; Greek-language comprehension; CARS-2 score from 29 to 37; ADI-R scores of social interaction ≥ 10, communication ≥ 8 (when verbal ability was present) or ≥ 7 (when verbal ability was absent), and stereotypic behaviour ≥ 3; and written informed consent from parents/caregivers. The children were assessed at baseline, at the end of the intervention and three months later by an independent “blind” outcome assessor.
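The inclusion criteria can be expressed as a single screening predicate. In this minimal sketch the field names are hypothetical, the ranges “6–12 years” and “29 to 37” are assumed to be inclusive, and the verbal-ability flag selects the ADI-R communication cut-off.

```python
from dataclasses import dataclass, replace


@dataclass
class Candidate:
    age_years: int
    asd_confirmed: bool
    iq: float
    greek_comprehension: bool
    cars2_total: float
    adir_social: int
    adir_communication: int
    verbal: bool               # whether verbal ability is present
    adir_stereotypic: int
    consent_signed: bool


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion criteria listed in the text (all must hold)."""
    # ADI-R communication cut-off depends on verbal ability: >= 8 verbal, >= 7 non-verbal.
    comm_cutoff = 8 if c.verbal else 7
    return (
        6 <= c.age_years <= 12              # inclusive bounds assumed
        and c.asd_confirmed
        and c.iq > 70                       # "IQ over 70"
        and c.greek_comprehension
        and 29 <= c.cars2_total <= 37       # inclusive bounds assumed
        and c.adir_social >= 10
        and c.adir_communication >= comm_cutoff
        and c.adir_stereotypic >= 3
        and c.consent_signed
    )


child = Candidate(9, True, 95, True, 32.0, 12, 9, True, 4, True)
print(is_eligible(child))                  # True
print(is_eligible(replace(child, iq=70)))  # False: IQ must exceed 70
```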

Tools

Because children with ASD are diagnostically heterogeneous, both groups underwent cognitive and psychological evaluations to determine areas of strength and weakness and to allow appropriate matching across groups and conditions. Each participant was assessed with the appropriate screening measures and with questionnaires measuring strengths, difficulties and satisfaction. The tools used were:

  • Autism diagnostic tools: The Childhood Autism Rating Scale (CARS-2; Schopler et al., 2010) and the Autism Diagnostic Interview–Revised (ADI-R; Rutter et al., 2003) were used to confirm the autism diagnosis, distinguish autism from other developmental disorders, and plan treatment by tailoring the intervention sessions. CARS-2 contains 15 items scored from 1 (no symptoms) to 4 (severe symptoms) in 0.5-point increments. For the standard version (CARS-ST), a total score of 15–29.5 favours the absence of an ASD diagnosis, 30–36.5 indicates mild to moderate autism, and 37–60 reflects moderate to severe autism. For the “high functioning” version (CARS-HF), intended for verbally fluent individuals over 6 years of age with an IQ greater than 80, the cut-off scores are 15–27.5 for no ASD, 28–33.5 for mild to moderate ASD, and 34–60 for moderate to severe ASD (Schopler et al., 2010). The CARS-2 total score was also used as a primary clinical outcome indicator.

  • Neuropsychological testing: The Wechsler Preschool and Primary Scale of Intelligence (WPPSI-IV; Wechsler, 2012) and the Wechsler Intelligence Scale for Children (WISC-V; Wechsler, 2014) were used to exclude children with cognitive deficits. The Intelligence Quotient is derived from five index scales: Verbal Comprehension, Visual Spatial, Fluid Reasoning, Working Memory and Processing Speed. A score of ≤ 69 is classified as “extremely low”, 70–79 as “very low”, 80–89 as “low average”, 90–109 as “average”, 110–119 as “high average”, 120–129 as “very high” and ≥ 130 as “extremely high”. The Developmental Neuropsychological Assessment (NEPSY-II; Korkman et al., 2007) was used to explore social cognition. The NEPSY-II subtests of Affect Recognition (AF) and Theory of Mind (ToM), which form part of the Social Perception domain, were administered to assess the ability to identify emotions and the ability to understand others’ beliefs, intentions, thoughts and feelings (Miranda et al., 2017). Inhibition (IN), another NEPSY-II subtest, was used to detect problems with impulsivity, cognitive flexibility and self-monitoring, as executive processes of behavioural regulation are crucial to social functioning (Leung et al., 2016). A standard score of < 70 (scaled score 1–3) is considered “well below expected”, 70–79 (scaled score 4–5) “below expected”, 80–89 (scaled score 6–7) “slightly below expected”, 90–110 (scaled score 8–12) “at expected” and > 110 (scaled score ≥ 13) “above expected”. AF, ToM and IN scores were used as primary clinical outcome indicators.

  • Parent/Teacher reporting: The Achenbach System of Empirically Based Assessment (ASEBA; Achenbach & Rescorla, 2001) and the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997) were completed by parents and teachers to assess symptoms, emotional difficulties, peer relationships and prosocial behaviour. The ASEBA Internalizing, Externalizing and Total Syndrome scores from parents (CBCL) and teachers (TRF), and the SDQ Peer Relationships, Prosocial Behaviour and Total Difficulties scores, were used as primary clinical outcome indicators. Other SDQ subscales, together with the subscales “Intervention evaluation” and “Additional aid” (completed at follow-up), were used as secondary clinical outcome indicators.

  • A semi-structured parent interview (e.g. “Name three positive characteristics of your child”; “What is his/her biggest difficulty in everyday life?”) was developed by the authors to further adapt the intervention to each child’s individual needs.

  • A 9-item satisfaction scale (e.g. “I am satisfied with the quality of the intervention”; “I would recommend the intervention to other parents”), rated on a 10-point Likert-type scale from “not at all agree” to “totally agree”, was developed by the authors and distributed at the end of the intervention as an anonymous parent evaluation form. The scale was used as a secondary outcome indicator.

  • Children’s satisfaction was measured with a questionnaire comprising 7 closed-ended questions (e.g. “I feel better since I've been coming here”), rated on a 5-point faces scale (from the first face, representing “zero satisfaction”, to the fifth, representing “total satisfaction”), and 3 open-ended sentence-completion questions (“I liked most…”; “I disliked most…”; “My favourite game with the robot was…”). The scale was used as a secondary outcome indicator.
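The score bands described for CARS-2, the Wechsler scales and the NEPSY-II scaled scores are interval lookups. The sketch below encodes them with inclusive upper bounds. It reads the Wechsler extremes as ≤ 69 and ≥ 130 so that every integer score falls in a band, and it does not validate minimum scores. The table and function names are illustrative.

```python
from bisect import bisect_left
from typing import List, Tuple

Bands = List[Tuple[float, str]]  # (inclusive upper bound, label), ascending

CARS2_ST: Bands = [(29.5, "no ASD"), (36.5, "mild to moderate"),
                   (60.0, "moderate to severe")]
CARS2_HF: Bands = [(27.5, "no ASD"), (33.5, "mild to moderate"),
                   (60.0, "moderate to severe")]
WECHSLER: Bands = [(69, "extremely low"), (79, "very low"), (89, "low average"),
                   (109, "average"), (119, "high average"), (129, "very high"),
                   (float("inf"), "extremely high")]
NEPSY_SCALED: Bands = [(3, "well below expected"), (5, "below expected"),
                       (7, "slightly below expected"), (12, "at expected"),
                       (19, "above expected")]


def classify(score: float, bands: Bands) -> str:
    """Return the label of the first band whose upper bound is >= score."""
    uppers = [upper for upper, _ in bands]
    i = bisect_left(uppers, score)
    if i == len(bands):
        raise ValueError(f"score {score} is above the highest band")
    return bands[i][1]


print(classify(29, CARS2_ST))      # no ASD
print(classify(29, CARS2_HF))      # mild to moderate
print(classify(13, NEPSY_SCALED))  # above expected
```

The same CARS-2 total can fall in different bands depending on the version used, which is why the version matters when CARS-2 is read as an outcome.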

Therapist’s Recordings

In each session, the therapist recorded the reasons for incomplete scenarios, categorised as “difficulties understanding the task”, “behavioural non-compliance”, “verbal refusal” and “not applicable”. The therapist’s recordings also included “ability to maintain eye contact”, “child's speech duration”, “voice volume”, “spontaneous communication” (how many times the child spontaneously addressed the robot), “gesture communication” (how many times the child pointed to an object without naming it), “focus duration on the target” and “number of words produced”. These recordings were used as secondary outcome indicators.

The Robot

The child-sized humanoid robot NAO was selected because it is one of the most popular and most researched humanoid robots worldwide and is predominantly used in robot-assisted interventions for children (Amirova et al., 2021; Papakostas et al., 2021). NAO is capable of verbal and non-verbal communication, has multiple sensors such as cameras and microphones, and is widely preferred for its flexible movement and multi-coloured eyes (Kumazaki et al., 2020). Using its hands and arms, the robot can perform human-like gestures to accompany its text-to-speech output or to provide non-verbal cues. In addition, its small size and human-like characteristics have been shown to appeal to children (Syriopoulou-Delli & Gkiolnta, 2020).

NAO was used as a co-therapist and was programmed to operate in a semi-autonomous mode during sessions, under the therapist’s supervision. If the child’s cognitive and emotional development did not allow NAO to act autonomously, the robot functioned as a social facilitator, rewarding and encouraging the child (e.g. lighting up its eyes, clapping its hands, saying encouraging words or playing music).

The robot was also used to collect real-time information during sessions, using machine vision algorithms to recognise actions and gestures and speech recognition algorithms to recognise answers (Lytridis et al., 2022). In each session, the robot logged voice volume, eye contact time, speech time, silence time and the number of correct answers. Action recognition relied on machine learning libraries (OpenPose and OpenFace) that detect body pose and facial landmarks, respectively. In addition, NAO’s built-in speech recognition module was used together with sound detection features to extract information on pauses, speech duration and speech volume.
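The text does not describe how per-session log values were aggregated. As an illustration only, the sketch assumes that per-frame outputs are already available: a boolean “gaze on robot” flag (for instance derived from facial landmarks) and an audio level in dBFS. It sums eye-contact, speech and silence time and averages the level over speech frames. The frame period and the −40 dBFS speech threshold are arbitrary assumptions rather than study values, and the code does not call OpenPose, OpenFace or the NAO SDK.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    gaze_on_robot: bool   # hypothetical per-frame eye-contact flag
    level_dbfs: float     # hypothetical per-frame audio level (dBFS, <= 0)


@dataclass
class SessionLog:
    eye_contact_s: float
    speech_s: float
    silence_s: float
    mean_speech_level_dbfs: float


def summarise(frames: List[Frame], frame_period_s: float = 0.1,
              speech_threshold_dbfs: float = -40.0) -> SessionLog:
    """Aggregate per-frame observations into session-level log values."""
    eye = sum(1 for f in frames if f.gaze_on_robot) * frame_period_s
    # Frames at or above the threshold are treated as speech (simple energy rule).
    speech_levels = [f.level_dbfs for f in frames
                     if f.level_dbfs >= speech_threshold_dbfs]
    speech = len(speech_levels) * frame_period_s
    silence = (len(frames) - len(speech_levels)) * frame_period_s
    # Arithmetic mean of dB values: a simplification, not an energy average.
    mean_level = (sum(speech_levels) / len(speech_levels)
                  if speech_levels else math.nan)
    return SessionLog(eye, speech, silence, mean_level)


frames = [Frame(True, -20.0), Frame(True, -55.0),
          Frame(False, -30.0), Frame(False, -60.0)]
print(summarise(frames))
```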

The Therapy Room

Intervention sessions were conducted in a white sensory room designed specifically to meet the need of children with ASD for a calm and relaxing environment and to limit possible distracting triggers during therapy sessions (Fig. 3).

Fig. 3 Therapy room

Exit Interviews

One month after the end of the follow-up session, exit interviews were conducted to allow parents and children to describe their experience of the intervention. Participants had the opportunity to report any positive or negative changes related to the intervention and to propose ideas for improving its implementation.

Data Analysis

The sample size calculation was based on the NEPSY-II AF subscale, which is considered a key measure of social and emotional functioning and was found to be significant in the feasibility study. Assuming an α level of 0.05 and a power of 0.80, the calculation indicated that 37 participants were required to detect an effect of at least d = 0.40.
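The text does not state which test family, sidedness or correlation between repeated measures underlies the reported figure. The following sketch therefore only illustrates how such a calculation is set up and does not reconstruct it. It applies the standard normal approximation for a two-sided paired (pre–post) comparison, n = ((z₁₋α/₂ + z₁₋β) / d)², where d is the mean change divided by the SD of the change scores. Calculations assuming a different design, or exact t-based methods, will give different values.

```python
import math
from statistics import NormalDist


def paired_sample_size(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size for a two-sided paired comparison.

    d is the standardised within-subject change (mean change / SD of change).
    """
    z = NormalDist()                   # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / d) ** 2)


print(paired_sample_size(0.5))  # 32
```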

The percentage of missing values per variable across the 326 variables ranged from 0 to 25%. In total, 822 of 14,344 values (5.8%) were missing. Missing-data pattern analysis (with the minimum percentage of missing values for a variable to be displayed set at 0.01) revealed a random pattern of missingness. Incomplete variables were imputed using the default settings of SPSS 26, and a comparison of descriptive statistics before and after imputation indicated plausible imputations.
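The missingness summary above consists of a per-variable range and an overall proportion of missing cells in a participants × variables matrix. With 44 participants and 326 variables, such a matrix has 44 × 326 = 14,344 cells. A minimal sketch, assuming missing entries are stored as `None` in a list-of-rows layout:

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows = participants, columns = variables


def missingness_summary(data: Matrix) -> Tuple[float, float, float]:
    """Return (min % missing per variable, max % missing per variable,
    overall % of missing cells)."""
    n_rows = len(data)
    n_cols = len(data[0])
    per_var = [
        100.0 * sum(row[j] is None for row in data) / n_rows
        for j in range(n_cols)
    ]
    total_missing = sum(value is None for row in data for value in row)
    overall = 100.0 * total_missing / (n_rows * n_cols)
    return min(per_var), max(per_var), overall


example = [
    [1.0, 2.0, None, 4.0],
    [1.5, None, None, 3.0],
    [2.0, 2.5, 3.5, 4.5],
    [0.5, 1.0, 2.0, None],
]
print(missingness_summary(example))  # (0.0, 50.0, 25.0)
```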

Analyses were conducted on both intention-to-treat (ITT) and per-protocol (PP) samples, but, for parsimony, only PP analyses are reported (except for demographics). The ITT sample included all participants who entered the trial (N = 51) and provided pre-treatment assessments, irrespective of whether they completed the intervention.

Baseline characteristics of NG and CG participants were compared descriptively. The effects of the two interventions on changes in core symptoms and psychosocial outcome scores over time were estimated with mixed linear models (MLM), specifying the two assessment points as a repeated measure and the type of intervention (robot-assisted vs. human only) as a fixed effect. Normality of distributions was tested using the Shapiro–Wilk test. The final model included a random intercept for each subject to account for between-subject variability and the correlation between time points, and was estimated by restricted maximum likelihood. An autoregressive covariance structure with heterogeneous variances was assumed. Moderating effects of gender and IQ on outcomes were explored by testing the significance of an additional random component.
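For readers working outside SPSS, a random-intercept model of this kind can be sketched with the `statsmodels` formula interface (requires `numpy`, `pandas` and `statsmodels`). This approximates the reported model and does not reproduce it: `statsmodels`’ `MixedLM` fits random effects but does not offer the heterogeneous autoregressive residual structure used in the study. The variable names and the simulated data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated long-format data: 44 children x 2 assessment points (hypothetical).
rng = np.random.default_rng(42)
n_children = 44
subject = np.repeat(np.arange(n_children), 2)
time = np.tile([0, 1], n_children)  # 0 = baseline, 1 = end of treatment
group = np.repeat(rng.permutation([0, 1] * (n_children // 2)), 2)  # 1 = robot-assisted
intercepts = np.repeat(rng.normal(10.0, 1.0, n_children), 2)        # child-level variation
score = intercepts + 2.0 * time + rng.normal(0.0, 0.5, 2 * n_children)

data = pd.DataFrame({"subject": subject, "time": time,
                     "group": group, "score": score})

# Fixed effects: time, group and their interaction; random intercept per child.
model = smf.mixedlm("score ~ time * group", data, groups=data["subject"])
result = model.fit(reml=True)  # restricted maximum likelihood
print(result.summary())
```

The `time:group` coefficient estimates the between-group difference in change. In these simulated data the true value is zero, which mirrors the “similar improvement” hypothesis.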

Finally, a general linear model was used to compare the progress of the two groups on the therapist’s recordings at the beginning, middle and end of the intervention. For parent reports, good agreement between parents indicated that a single parent informant could be used to simplify data analysis. All analyses were conducted using IBM SPSS software v.26.0.

Results

Treatment Attrition

Fifty-one children (mean age = 9.43 years, SD = 2.07; 80.4% male) were allocated to two groups: the NAO group (NG) and the control group (CG). Forty-four of the 51 participants (86%) completed all 21 sessions of the protocol and the end-of-treatment evaluation. The remaining 7 (14%) received a partial dose of the intervention: 3 participants (2 NG; 1 CG) attended 10–12 sessions, 1 (CG) attended 3 sessions and 3 (2 CG; 1 NG) dropped out after the first session. The total number of sessions attended by the ITT sample did not differ by group. Baseline demographic and clinical characteristics (Table 3) were similar across the two groups, so the analyses were not adjusted for baseline imbalance. The only statistically significant baseline differences were in the WISC-IV “Processing Speed” score, favouring the CG, and the NEPSY-II “Inhibition-Inhibition” score, favouring the NG.

Table 3 Baseline characteristics and scale’s scores (ITT sample)

Completers (n = 44; 79.5% male) were matched by age and school class and had a mean age of 9.48 years (SD = 1.95; MIN = 6, MAX = 13). Each completer group included 22 children (NG: mean age = 9.68 years, SD = 1.87, 19 males and 3 females; CG: mean age = 9.27 years, SD = 2.06, 16 males and 6 females).

Primary Outcomes

At the end of the intervention, both groups showed improvement in CARS-2 total score and better performance in the NEPSY-II Social Perception domain (AF and ToM subdomains), as well as fewer errors and shorter completion times in the NEPSY-II Inhibition subtest (part of the Attention and Executive Functioning domain). For parent reports, results revealed significant changes in the CBCL “Internalising problems”, “Externalising problems” and “Total problems” scores, indicating fewer problems at the end of the intervention (Table 4). In addition, a higher mean score at the end of treatment on the SDQ “Prosocial Behaviour” subscale indicated an increased intention to help others. No statistically significant differences were found in TRF scores; this may be because schooling was conducted online, which limited teachers’ ability to assess psychosocial functioning in addition to learning.

Table 4 Primary outcomes, change over time

Secondary Outcomes

Both parents’ and children’s satisfaction with participation in the study was explored. In the CG, satisfaction was high (children: M = 4.84, SD = 0.37, MIN = 4, MAX = 5; parents: M = 9.09, SD = 0.81, MIN = 8, MAX = 10). The mean Intervention evaluation score at the end of treatment was 3.62 for mothers (SD = 0.78; MIN = 2, MAX = 5), 3.35 for fathers (SD = 0.76; MIN = 2, MAX = 5) and 3.42 for teachers (SD = 0.60; MIN = 3, MAX = 5). Parents and teachers were also asked to evaluate the additional aid offered after the study; mean scores were 2.60 for mothers (SD = 0.66; MIN = 2, MAX = 4), 2.71 for fathers (SD = 0.86; MIN = 2, MAX = 5) and 1.58 for teachers (SD = 0.55; MIN = 1, MAX = 3). In the NG, the total satisfaction score was 4.81 for children (SD = 0.40; MIN = 4, MAX = 5) and 9.27 for parents (SD = 0.76; MIN = 8, MAX = 10). The mean Intervention evaluation score at the end of treatment was 3.74 for mothers (SD = 0.66; MIN = 3, MAX = 5), 3.68 for fathers (SD = 0.71; MIN = 2, MAX = 5) and 3.45 for teachers (SD = 0.60; MIN = 2, MAX = 4). The mean Additional aid score was 2.71 for mothers (SD = 0.70; MIN = 2, MAX = 4), 2.82 for fathers (SD = 0.73; MIN = 2, MAX = 4) and 1.77 for teachers (SD = 0.96; MIN = 1, MAX = 4).

Group Comparison

At the end of treatment, the only significant between-group difference on the NEPSY-II measures concerned the Inhibition subscales, with higher change scores in the CG; this difference was not considered clinically meaningful because a group difference on this subscale was already present at baseline. For CBCL and TRF outcomes, there were no statistically significant between-group differences at the end of treatment (all p > 0.05). For the SDQ, a statistically significant difference was found for Prosocial Behaviour (t(42) = 2.457, p = 0.014, 95% CI [0.2967, 2.6614]), with higher mean scores in the NG. Finally, there was no statistically significant between-group difference in satisfaction with participation in the study (all p > 0.05), although the mean scores on all satisfaction questions for parents, teachers and children were higher in the NG than in the CG.

Follow-Up

Follow-up evaluations were very limited (only 22 follow-up assessments were completed) and were therefore not included in the main MLM. Nevertheless, analyses of the follow-up ratings showed that changes over time remained statistically significant for “Affect Recognition” [F(2,28) = 6.58, p = 0.004], “Inhibition Naming Combined” [F(2,28) = 6.58, p = 0.004], “Inhibition-Inhibition Combined” [F(2,28) = 6.58, p = 0.004], “Theory of Mind Verbal” [F(2,28) = 6.58, p = 0.004], “Theory of Mind Contextual” [F(2,28) = 6.58, p = 0.004] and “Theory of Mind Total” [F(2,28) = 6.58, p = 0.004].

Therapist’s Recordings

A multivariate general linear model was used to compare the progress of the two groups (NG vs. CG) on the therapist’s recordings (eye contact, speech, communication, gestures, focus on target and number of spoken words) at baseline, mid-treatment and the end of treatment. Participants in both groups showed improvement in eye contact, spontaneous communication, speech-based social interaction, voice volume, gesture communication and focus on target. There was a statistically significant group × time interaction for speech-based social interaction [F(2,34) = 6.810; p = 0.003, Wilks’ Λ = 0.714] and for gesture-based social interaction [F(2,34) = 3.587; p = 0.039, Wilks’ Λ = 0.826], and a marginally significant group × time interaction for eye contact [F(2,34) = 3.131; p = 0.056, Wilks’ Λ = 0.844], on the combined dependent variables (Fig. 4).
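The reported Wilks’ Λ, F and p values appear consistent with the exact conversion that applies when the tested effect has one hypothesis degree of freedom (two groups) and the three time points are represented by two transformed variables. The sketch below assumes that design; the function names are illustrative. With two numerator degrees of freedom the F survival function has a closed form, so no statistics library is needed.

```python
def wilks_to_f(wilks_lambda: float, df1: int, df2: int) -> float:
    """Exact F for Wilks' lambda when the hypothesis has one degree of freedom.

    df1 = number of (transformed) dependent variables, df2 = error df of the
    F test; F = ((1 - L) / L) * (df2 / df1).
    """
    return (1.0 - wilks_lambda) / wilks_lambda * df2 / df1


def f_pvalue_df1_2(f: float, df2: int) -> float:
    """Upper-tail p-value of an F(2, df2) statistic (closed form for df1 = 2).

    Combined with wilks_to_f, this equals lambda ** (df2 / 2).
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


for label, lam in [("speech", 0.714), ("gesture", 0.826), ("eye contact", 0.844)]:
    f = wilks_to_f(lam, df1=2, df2=34)
    print(f"{label}: F(2,34) = {f:.2f}, p = {f_pvalue_df1_2(f, 34):.3f}")
```

This prints F = 6.81, 3.58 and 3.14 with p = 0.003, 0.039 and 0.056. The small F differences from the reported 3.587 and 3.131 arise because Λ is reported to three decimals.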

Fig. 4 Therapist’s recordings

Robot Recordings

The robot’s recordings (NG, every session) provided valuable data on the child's responsiveness during the intervention with respect to eye contact, voice volume, speaking time, sequence recordings and correct imitative movements. As with the therapist’s recordings, the robot’s recordings overall suggested improvements in these variables, but technical or other difficulties prevented some recordings in several sessions. These difficulties were: (a) the script did not run; (b) the robot made an error and no recording was made; (c) the therapist handled the script incorrectly (e.g. the "Exit" button may not have been pressed to import the recordings into the file); and (d) some scripts did not include recordings at all (e.g. relaxation scenarios).

Parents’ Comments (Exit Interviews)

Exit interviews were conducted at the end of the intervention to generate qualitative insights into change and to identify recurring themes in the interview data. Parents were invited to participate, and the information gathered was analysed using thematic analysis, with themes generated inductively from the data. Results are presented in Table 5.

Table 5 Content Analysis (Exit Interviews)

Aversive Reactions

An aversive reaction was observed in one of the 22 children in the NG during the familiarisation (first) session with NAO. The robot greeted the child entering the therapy room by calling his name, and the child immediately showed signs of stress; the therapist intervened to calm him. Similar reactions have been reported previously in studies introducing social robots into therapy (Bekele et al., 2014; Schadenberg et al., 2020). In this case, the child was successfully reintroduced to the robot using desensitisation techniques.

Discussion

This study explored whether a robot-assisted psychosocial intervention can improve psychosocial skills in children with ASD, by comparing change in several psychological and neuropsychological domains from baseline to post-intervention, and whether this change differs from that achieved when the same intervention is delivered by a human therapist alone.

At the end of the intervention, both groups showed significant improvement on the primary clinical outcome indicators: core symptoms of ASD (CARS-2 total score), social perception (AF and ToM) and executive functioning (IN).

Although the change in CARS-2 total score was statistically significant for both groups, it did not reach clinical meaningfulness, since the overall change was smaller than the 4.5-point threshold proposed for clinically relevant change following interventions in ASD (Jurek et al., 2021). This result is in line with another study (Pioggia et al., 2007) that used a robot as a therapist's assistant and CARS-2 as an outcome measure.
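The threshold criterion can be written explicitly. Because higher CARS-2 total scores indicate more severe autism symptoms, improvement corresponds to a decrease in score. Writing S for the CARS-2 total score (notation introduced here for illustration only), the criterion applied above is:

```latex
\Delta_{\text{CARS-2}} = S_{\text{baseline}} - S_{\text{post}},
\qquad
\text{clinically relevant change} \iff \Delta_{\text{CARS-2}} \ge 4.5
```

In this study the change was positive and statistically significant in both groups but remained below 4.5 points, so statistical significance did not translate into clinically relevant change.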

With regard to social perception, and similarly to other studies (Conti et al., 2019; Marino et al., 2020; Pop et al., 2013), AF and ToM clearly improved in both groups after the intervention. Both groups also showed significantly better performance in IN, with fewer errors and shorter completion times; IN is an executive function closely related to social functioning and emotional competence (Li et al., 2020). Similar progress in specific executive functions has been reported in other studies using the social robot NAO (Heidari et al., 2019) or other types of robots (Boccanfuso et al., 2017).

With respect to parent reports, both groups showed improvements on the CBCL “Total Problems”, “Internalising Problems” and “Externalising Problems” subscales. These findings, indicative of improvements in mental health symptoms, are in line with Yun et al. (2017), who found, based on parent reports, that robots are useful mediators of social skills training for children with ASD, and partly in line with Pinto Costa et al. (2019), whose robot-mediated emotional ability training significantly reduced internalising but not externalising problems. Prosocial behaviour, as measured by the SDQ, increased after the intervention, indicating a further improvement in psychosocial functioning. A similar result was reported by Kim et al. (2021), who explored robot-assisted therapy based on smile detection to facilitate prosocial behaviours in children with ASD.

The therapist's recordings (every child, every session) confirmed the positive course of the intervention. Insufficient robot recording data, due mainly to technical difficulties, precluded a valid statistical analysis; as a result, the robot's recordings could not be compared with the therapist's recordings to draw valid conclusions.

The level of change in the clinical outcome indicators and parent-completed measures did not differ significantly between the groups. This result is in line with some studies (Costescu et al., 2014; Huskens et al., 2013; Yun et al., 2017) and contrasts with others that favour either the robot-assisted group (Ghiglino et al., 2021; Soares et al., 2019; van den Berk-Smeekens et al., 2021) or the control group (Srinivasan et al., 2015). Differences in methodology, particularly in sample size, robot type and outcome measures, make comparison with other studies challenging.

Satisfaction among parents, teachers and children was higher in the NG than in the CG. This result is important because children who are satisfied with the process become intrinsically motivated and are more likely to benefit from treatment (Lebersfeld et al., 2018). Methods that enhance engagement in the therapeutic process, such as robot-based interventions, therefore offer an optimistic outlook for the future of autism therapies.

Taken together, the results indicate that participants adhered well to the robot-assisted intervention and showed positive outcomes after the therapy sessions, a finding that raises optimism for future applications of robot-assisted therapy. In support of these findings, robot-assisted interventions based on sections of the original intervention protocol have also shown beneficial results for participants. Specifically, positive outcomes regarding engagement and motivation were identified in two related sub-studies: a robot-assisted relaxation training for anxiety and anger symptoms (Holeva et al., 2021) and a robot-assisted relaxation training adapted for patients during hospitalisation (Nikopoulou et al., 2022). Moreover, a recent review of psychosocial interventions, including robot-assisted ones, suggested that they could increase the number of employable individuals with ASD, linking the positive impact of interventions to professional functioning (Ogawa et al., 2021).

Although technology is advancing rapidly, robot-assisted therapy still faces several challenges. Collaboration between psychologists and programmers seems to have given new impetus to the resulting products (Gubenko et al., 2021), and more recent trials have achieved greater methodological quality (Robinson et al., 2019). Still, many obstacles remain to be overcome. The children started the intervention with varying levels of cognitive and social functioning, and after a few sessions it became clear that some of them had lost interest in the robot. This finding is similar to that of other studies (Srinivasan et al., 2015) and underlines the importance of designing treatment protocols that offer variety and different levels of difficulty.

Robotic therapy is still in its infancy, and its effectiveness with regard to autism symptomatology and cognitive functions has not yet been fully studied. Its advantages relate to the robot's ability to take on various roles, such as instructor, educator, social companion, entertainer, therapist's assistant, diagnostician and observer (Kouroupa et al., 2022). Important disadvantages at this stage include the inability to meet therapists' or families' expectations, as well as difficulties related to cost, maintenance, programming and safety (Alabdulkareem et al., 2022).

Despite obstacles such as the COVID-19 pandemic, and challenges such as the symptomatology itself and technical issues, an encouraging message from this study is that the intervention was completed with minimal dropout and high satisfaction rates.

Limitations

Responses to the satisfaction questionnaire may have been affected by social desirability bias; this risk was reduced by the parallel completion of an anonymised version of the parental satisfaction questionnaire, which yielded similarly high results. Despite the research team’s efforts to match the two groups, children were already receiving additional forms of support when they began the intervention, so there is a small risk that the results also reflect the effect of other treatment components. Data from teachers were limited, probably because teachers could not observe and evaluate the impact of the treatment on children's social skills as they would have done had lessons taken place in the classroom rather than online. The robot recording data were also insufficient for a valid comparison with the therapist's recording data. Finally, the limited response at the 3-month follow-up, again due to COVID-related challenges, prevented an accurate assessment of the sustainability of the intervention’s outcomes over time.

Conclusions

Continued implementation of robot-assisted interventions in clinical practice will allow more accurate identification of their strengths and weaknesses, enabling continuous improvement and refinement of methods informed by Artificial Intelligence. The use of standardised and validated instruments is essential for evaluating study results, as is a rigorous methodology with an adequate sample size. Larger randomised controlled trials and guidelines for the implementation of robot-assisted therapy for ASD are needed to make comparisons across studies possible.