Review

Big data approaches to decomposing heterogeneity across the autism spectrum

Michael V Lombardo et al. Mol Psychiatry. 2019 Oct;24(10):1435-1450. doi: 10.1038/s41380-018-0321-0. Epub 2019 Jan 7.

Abstract

Autism is a diagnostic label based on behavior. While the diagnostic criteria attempt to maximize clinical consensus, they also mask a wide degree of heterogeneity between and within individuals at multiple levels of analysis. Understanding this multi-level heterogeneity is of high clinical and translational importance. Here we present organizing principles to frame research examining multi-level heterogeneity in autism. Theoretical concepts such as 'spectrum' or 'autisms' reflect non-mutually exclusive explanations regarding continuous/dimensional or categorical/qualitative variation between and within individuals. However, the common practices of small-sample-size studies and case-control models are suboptimal for tackling heterogeneity. Big data are an important ingredient for furthering our understanding of heterogeneity in autism. In addition to being 'feature-rich', big data should be both 'broad' (i.e., large sample size) and 'deep' (i.e., multiple levels of data collected on the same individuals). These characteristics increase the likelihood that study results are generalizable and facilitate evaluation of the utility of different models of heterogeneity. A model's utility can be measured by its ability to explain clinically or mechanistically important phenomena, and also by how it explains the way variability manifests across different levels of analysis. The directionality for explaining variability across levels can be bottom-up or top-down, and should incorporate development as a means of characterizing changes within individuals. While progress can be made with 'supervised' models built upon a priori or theoretically predicted distinctions or dimensions of importance, it will become increasingly important to complement such work with unsupervised, data-driven discoveries that leverage unknown and multivariate distinctions within big data. A better understanding of how to model heterogeneity between autistic people will facilitate progress towards precision medicine for symptoms that cause suffering, and towards person-centered support.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
Approaches to decomposing heterogeneity in autism. a A population of interest is shown, with autism cases colored green, pink, and blue; the different colors represent different autism subtypes. b The impact of ignoring heterogeneity on effect size. With a typical case–control model, we ignore these possible subtype distinctions and compare autism to controls on some dependent variable. In this example scenario there is no clear case–control difference, but the autism group shows higher variability (indicated by the larger error bars). An approach to decomposing heterogeneity is to construct a stratified model, whereby we model the subtype labels instead of one autism label and then re-examine differences on the hypothetical dependent variable of interest. In this example, the autism subtypes show contradictory effects. These effects are masked in the case–control model because averaging cancels out the distinct effects across the subgroups. c Heterogeneity in autism as a multi-level phenomenon. This panel also visualizes the difference between broad versus deep big data characteristics and labels the top-down versus bottom-up approaches to understanding heterogeneity in this multi-level context. Finally, this panel shows how development is another important dimension of heterogeneity to consider at each level of analysis (i.e., 'chronogeneity'). In this example, chronogeneity is represented by different trajectories for different types of autistic individuals.
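The masking effect described in panel b is easy to reproduce numerically. Below is a minimal sketch (not taken from the paper; the subtype means, standard deviations, and sample sizes are arbitrary assumptions for illustration) in which three simulated subtypes with opposing effects cancel out in a pooled case–control comparison, while the stratified comparisons reveal clear effects:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300  # hypothetical sample size per subtype

# Three hypothetical autism subtypes with opposing effects on a
# dependent variable (DV), plus a control group centered at zero.
sub1 = rng.normal(-1.0, 1.0, n)   # decreased response on the DV
sub2 = rng.normal( 0.0, 1.0, n)   # no difference from controls
sub3 = rng.normal( 1.0, 1.0, n)   # increased response on the DV
controls = rng.normal(0.0, 1.0, 3 * n)

autism = np.concatenate([sub1, sub2, sub3])  # pooled "one label" group

def cohens_d(a, b):
    """Standardized mean difference with pooled standard deviation."""
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

# Pooled case-control d is near zero: the opposing effects cancel,
# but the pooled autism group is more variable than controls.
print(cohens_d(autism, controls))
print([cohens_d(s, controls) for s in (sub1, sub2, sub3)])
```

Note that the pooled autism group also shows a larger standard deviation than controls, matching the "larger error bars" described in panel b.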
Fig. 2
Case–control vs stratified model example with adult autism and mentalizing ability. This figure reports data from Lombardo et al. [25] on two independent datasets of adults with autism and performance on an advanced mentalizing test, the Reading the Mind in the Eyes Test (RMET). a (Discovery), b (Replication) Case–control differentiation and the standardized effect size for each dataset. c–f RMET scores and standardized effect sizes from the same two datasets after unsupervised data-driven stratification into five distinct autism subgroups and four distinct TD subgroups. Autism subgroups 1–2 are highly impaired on the RMET, while autism subgroups 3–5 overlap completely with the TD population in RMET scores.
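The subgroups in this figure come from an unsupervised, data-driven stratification. As a generic, hypothetical illustration of the idea (this is not the pipeline used in Lombardo et al. [25]; the score distributions and cluster count are invented for the example), a minimal one-dimensional k-means over simulated test scores can be sketched as:

```python
import numpy as np

def kmeans_1d(x, k, n_iter=50):
    """Minimal k-means on a 1-D score vector (e.g., hypothetical test totals)."""
    # Deterministic init: spread initial centers across the score range.
    centers = np.quantile(x, np.linspace(0.0, 1.0, k))
    for _ in range(n_iter):
        # Assign each score to its nearest center, then update centers.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels, centers

# Hypothetical scores drawn from an impaired and an unimpaired subgroup.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(14, 2, 100), rng.normal(26, 2, 100)])
labels, centers = kmeans_1d(scores, 2)
```

In practice the choice of the number of clusters, the clustering algorithm, and its stability across resampling all require careful validation, especially in small samples.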
Fig. 3
Simulation of sample effect size estimates at different sample sizes and across a range of true population effects for a hypothetical case–control study. In this simulation we set the population effect size to a range of values, from very small (e.g., d = 0.1) to very large (e.g., d > 1.0); panels a–e show simulation results for effect sizes from d = 0.1 to d = 0.9 in steps of 0.2. We then simulated data from two populations (cases and controls), each with n = 10,000,000, that differed at these population effect sizes. Next, we simulated 10,000 experiments in which we randomly sampled from these populations at different sample sizes (n = 20, n = 50, n = 100, n = 200, n = 1000, n = 2000) and computed the sample effect size estimate (standardized effect size, Cohen's d) for the case–control difference. The gray histograms show how variable the sample effect size estimates are (black lines show 95% confidence intervals) relative to the true population effect size (green line). Visually, it is apparent that small sample sizes (e.g., n = 20) produce wildly varying sample effect size estimates, and that this variability is consistent irrespective of the true population effect size. Overlaid on each gray histogram are red histograms showing the distribution of sample effect size estimates for which the hypothesis test (e.g., independent samples t-test) passes statistical significance at p < 0.05. The rightward shift of this red distribution relative to the true population effect size (green line) illustrates the phenomenon of effect size inflation. The problem is much more pronounced at small sample sizes and when true population effects are smaller. We then computed the average effect size inflation for this red distribution and plotted it as a percentage increase relative to the true population effect in f; each line in panel f refers to simulations with a different sample size. This plot directly quantifies the degree of effect size inflation across a range of true population effects and sample sizes. The code for implementing and reproducing these simulations is available at https://github.com/mvlombardo/effectsizesim.
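The effect size inflation phenomenon can be reproduced with a few lines of code. The sketch below is a simplified re-implementation of the idea (the authors' own code, linked above, may differ in details); it samples directly from the generating distributions rather than from finite populations, fixes n = 20 and d = 0.3, and hardcodes the two-tailed critical t for df = 38 instead of computing exact p-values:

```python
import numpy as np

rng = np.random.default_rng(0)
true_d, n, n_experiments = 0.3, 20, 5000  # hypothetical settings
t_crit = 2.0244  # two-tailed critical t at alpha = 0.05, df = 2n - 2 = 38

sig_ds = []
for _ in range(n_experiments):
    cases = rng.normal(true_d, 1.0, n)
    controls = rng.normal(0.0, 1.0, n)
    # Cohen's d with pooled standard deviation
    pooled = np.sqrt((cases.var(ddof=1) + controls.var(ddof=1)) / 2)
    d = (cases.mean() - controls.mean()) / pooled
    t = d * np.sqrt(n / 2)  # equal-n two-sample t expressed via Cohen's d
    if t > t_crit:  # "significant" in the predicted direction
        sig_ds.append(d)

# Average effect size among significant results, and its inflation
# relative to the true population effect (the red vs green shift).
mean_sig_d = float(np.mean(sig_ds))
inflation_pct = 100 * (mean_sig_d - true_d) / true_d
print(round(mean_sig_d, 2), round(inflation_pct, 1))
```

With n = 20 per group, only estimates with d above roughly 0.64 can reach significance, so the significant subset necessarily overestimates a true d of 0.3 by a wide margin.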
Fig. 4
Simulation showing sampling variability and bias from enrichment of specific strata in small-sample-size studies. In this simulation we generated a control population (n = 1,000,000) with a mean of 0 and a standard deviation of 1 on a hypothetical dependent variable (DV). We then generated an autism population (n = 1,000,000) with 5 different autism subtypes, each with a prevalence of 20% (i.e., n = 200,000 per subtype). These subtypes vary from the control population in effect size in units of 0.5 standard deviations, ranging from −1 to 1. This was done to simulate heterogeneity in the autism population that reflects very different types of effects. For example, autism subtype 5 shows a pronounced increased response on the DV, whereas autism subtype 1 shows a pronounced decreased response. Across 10,000 simulated experiments, we then randomly sampled from the autism population at sample sizes of n = 20, n = 200, and n = 2000, and computed the sample prevalence of each autism subtype. The ideal, unbiased result would be a sample prevalence of around 20% for each subtype. This 20% sample prevalence is approached at n = 2000, and to some extent at n = 200. However, small sample sizes such as n = 20 show large variability in the sample prevalence rates of the subtypes, which can markedly bias the results of a case–control comparison. The code for implementing and reproducing these simulations is available at https://github.com/mvlombardo/effectsizesim.
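The subtype-enrichment part of this simulation can also be sketched compactly. The snippet below is a simplified re-implementation (again, the authors' linked code may differ): it draws subtype labels directly with equal 20% probability rather than sampling from a finite labeled population, and compares how variable the per-subtype sample prevalence is at small versus large sample sizes:

```python
import numpy as np

def sample_prevalence(n_sample, n_experiments=10000, n_subtypes=5, seed=0):
    """Repeatedly sample from a population with equally prevalent subtypes
    and return per-experiment, per-subtype sample prevalence rates."""
    rng = np.random.default_rng(seed)
    prev = np.empty((n_experiments, n_subtypes))
    for i in range(n_experiments):
        labels = rng.integers(0, n_subtypes, size=n_sample)
        prev[i] = np.bincount(labels, minlength=n_subtypes) / n_sample
    return prev

# At n = 20 the sample prevalence of each subtype swings widely around
# the true 20%; at n = 2000 it is tightly concentrated.
for n in (20, 200, 2000):
    p = sample_prevalence(n)
    print(n, round(p.mean(), 3), round(p.std(), 3))
```

The spread of the sample prevalence shrinks roughly with the square root of the sample size, which is why a study of n = 20 can easily be dominated by one or two subtypes purely by chance.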


References

    1. Lai MC, Lombardo MV, Baron-Cohen S. Autism. Lancet. 2014;383:896–910. - PubMed
    2. Buescher AV, Cidav Z, Knapp M, Mandell DS. Costs of autism spectrum disorders in the United Kingdom and the United States. JAMA Pediatr. 2014;168:721–8. - PubMed
    3. Leigh JP, Du J. Brief report: forecasting the economic burden of autism in 2015 and 2025 in the United States. J Autism Dev Disord. 2015;45:4135–9. - PubMed
    4. Kapur S, Phillips AG, Insel TR. Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Mol Psychiatry. 2012;17:1174–9. - PubMed
    5. Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med. 2015;372:793–5. - PMC - PubMed