Sci Adv. 2024 May 24;10(21):eadj4452.
doi: 10.1126/sciadv.adj4452. Epub 2024 May 23.

Using a comprehensive atlas and predictive models to reveal the complexity and evolution of brain-active regulatory elements

Henry E Pratt et al. Sci Adv.

Abstract

Most genetic variants associated with psychiatric disorders are located in noncoding regions of the genome. To investigate their functional implications, we integrate epigenetic data from the PsychENCODE Consortium and other published sources to construct a comprehensive atlas of candidate brain cis-regulatory elements. Using deep learning, we model these elements' sequence syntax and predict how binding sites for lineage-specific transcription factors contribute to cell type-specific gene regulation in various types of glia and neurons. The elements' evolutionary history suggests that new regulatory information in the brain emerges primarily via smaller sequence mutations within conserved mammalian elements rather than entirely new human- or primate-specific sequences. However, primate-specific candidate elements, particularly those active during fetal brain development and in excitatory neurons and astrocytes, are implicated in the heritability of brain-related human traits. Additionally, we introduce PsychSCREEN, a web-based platform offering interactive visualization of PsychENCODE-generated genetic and epigenetic data from diverse brain cell types in individuals with psychiatric disorders and healthy controls.

Figures

Fig. 1. Construction and characteristics of our PsychENCODE atlas of b-cCREs.
(A) 240-mammal phyloP scores of ENCODE cCREs, binned by the number of fetal brain DNase-seq experiments that support the cCRE’s activity. cCREs active in five or more DNase experiments (pink and red) are defined as fetal b-cCREs, while cCREs active in one to four biosamples (purple) are not unless they have DNase signal in the 99th percentile. These contrast with cCREs that are not brain active (blue) and 10,000 dinucleotide-matched random genomic regions (gray). (B) Stacked bar charts representing the classification of b-cCREs based on identification in fetal-specific (green), adult-specific (red), or both fetal and adult biosamples (blue) (top) and the classification of b-cCREs as neuron-specific (purple), glia-specific (gray), neuron and glia (blue), or low signal (light gray) based on adult NeuN FAN-sorted ATAC data (bottom). (C) Venn diagrams representing the overlap between adult b-cCREs and a published high-confidence enhancer set in adult brain (left) and between fetal b-cCREs and a published set of ATAC-seq peaks from the developing human cerebral cortex (right). (D) Ten most significantly enriched biological processes from GREAT analysis of fetal-specific b-cCREs. FDR, false discovery rate. (E) The validation rate of b-cCREs by transgenic mouse assays in the VISTA database. Fractions of b-cCREs, nb-cCREs, and non-cCREs overlapping VISTA enhancers that are active in various tissue types are grouped by color. (F) Overlap of active b-cCREs in 14 different brain regions (table S1D). The top-right and bottom-left triangles show pairwise overlap coefficients between different brain regions for NeuN− and NeuN+ nuclei, respectively. (G) b-cCRE activity, represented by the proportion of active b-cCREs over active cCREs, in brain (red) and nonbrain (blue) scATAC-seq experiments. Each violin plot represents a different single-cell study, with each point being a cell type from pseudo-bulk scATAC-seq data.
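The pairwise overlap coefficients in panel (F) can be illustrated with a minimal sketch. It assumes the standard Szymkiewicz-Simpson definition, the size of the intersection divided by the size of the smaller set; the legend does not state which definition was used. The region names, element identifiers, and the function names `overlap_coefficient` and `pairwise_overlap` are hypothetical.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Size of the intersection divided by the size of the smaller set.

    Assumed definition (Szymkiewicz-Simpson); not confirmed by the legend.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # convention chosen here for empty sets
    return len(a & b) / min(len(a), len(b))


def pairwise_overlap(active_by_region):
    """Overlap coefficient for every unordered pair of regions."""
    return {
        (r1, r2): overlap_coefficient(active_by_region[r1], active_by_region[r2])
        for r1, r2 in combinations(sorted(active_by_region), 2)
    }


# Hypothetical toy data: sets of active b-cCRE identifiers per brain region.
active = {
    "region_1": {"cCRE_a", "cCRE_b", "cCRE_c", "cCRE_d"},
    "region_2": {"cCRE_b", "cCRE_c"},
    "region_3": {"cCRE_c", "cCRE_e", "cCRE_f"},
}

scores = pairwise_overlap(active)
for pair, value in scores.items():
    print(pair, round(value, 3))
```

Because the denominator is the smaller set, a region whose active elements are entirely contained in another region scores 1 regardless of how much larger the other region is.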
Fig. 2. Identifying adult-specific, fetal-specific, neuron-specific, and glia-specific b-cCREs and characterizing their role in complex traits.
(A) Heritability enrichment meta-analysis of adult-specific, fetal-specific, shared b-cCREs, and nb-cCREs (left) and neuron-specific, glia-specific, neuron/glia-shared b-cCREs, b-cCREs with low signal in both NeuN+ and NeuN− adult samples, and nb-cCREs (right) in brain-related traits (red) and nonbrain-related traits (blue). LDSC meta-analysis P value for enrichment in heritability of genetic variants residing in subsets of b-cCREs: *P < 0.05, **P < 0.01, and ****P < 0.0001. (B) Heatscatter comparing z scores of b-cCREs (left-most plot) from NeuN+ versus NeuN− ATAC-seq data aggregated across multiple donors. Black lines intersecting the plot represent a z score of 1.64, our threshold for defining an active cCRE. Middle and right-most plots are the same but with all cCREs (middle) and nb-cCREs (right). (C) Same as (B) but plotting adult-specific b-cCREs (left), fetal-specific b-cCREs (middle), and b-cCREs active in both adult and fetal biosamples (right).
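The z score threshold of 1.64 in panel (B) suggests a simple way to read the four quadrants of the heatscatter. The sketch below is an illustration only: it assumes that an element counts as active when its z score is strictly greater than 1.64, and that the neuron-specific, glia-specific, shared, and low-signal labels follow directly from the two activity calls. The paper's actual classification may apply further criteria, and the function name `classify_element` is hypothetical.

```python
ACTIVE_Z = 1.64  # threshold for an active cCRE, taken from the legend


def classify_element(z_neun_pos, z_neun_neg, threshold=ACTIVE_Z):
    """Label an element from its NeuN+ (neuron) and NeuN- (glia) z scores.

    Assumption: 'active' means z > threshold (strict inequality).
    """
    neuron_active = z_neun_pos > threshold
    glia_active = z_neun_neg > threshold
    if neuron_active and glia_active:
        return "neuron/glia-shared"
    if neuron_active:
        return "neuron-specific"
    if glia_active:
        return "glia-specific"
    return "low signal"


# Hypothetical (NeuN+, NeuN-) z score pairs, one per quadrant.
examples = [(2.3, 0.4), (0.1, 3.0), (2.0, 2.5), (0.5, -0.2)]
labels = [classify_element(pos, neg) for pos, neg in examples]
print(labels)
```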
Fig. 3. Machine learning of sequence features distinguishing b-cCREs active at different developmental time points and in different brain cell types.
(A) Stacked bar charts representing the classification of b-cCREs based on identification in Corces et al. (19) single-cell ATAC cell types. Each bar represents the number of active b-cCREs within a particular cell type, split by activity in all seven cell types (black), activity shared among two to six cell types (gray), and activity specific to a particular cell type (colors vary by cell type). (B) Meta-analysis on the heritability enrichment in brain-related traits (red) and nonbrain-related traits (blue) for b-cCREs predicted to be active in each cell type using the Corces et al. (19) scATAC-seq data. LDSC meta-analysis P value for enrichment in heritability of genetic variants residing in subsets of b-cCREs: *P < 0.05 and ****P < 0.0001. (C) UMAP plot of latent sequence features learned by a variational autoencoder (VAE) within adult-specific (red), fetal-specific (green), and adult/fetal shared (blue) b-cCRE subsets. (D) UMAP plot of latent sequence features learned by a VAE within neuron-specific (purple), glia-specific (gray), and neuron/glia-shared (blue) b-cCRE subsets. (E) UMAP plot of latent sequence features learned by a VAE within cell type–specific b-cCRE subsets: Excitatory neurons (orange), inhibitory neurons (green), microglia (pink), astrocytes (red), oligodendrocytes (gray), and OPCs (blue). (F) Comparison of Analysis of Motif Enrichment (AME) scores (x axis) and random forest feature importance (y axis), to identify TF motifs that are found within excitatory neuron-specific b-cCREs, colored by TF family. (G) Differential gene expression of excitatory neurons versus all other cell types (x axis) is plotted against sum of scaled random forest feature importance and AME scores within excitatory neuron-specific b-cCREs using arbitrary units (a.u.) (y axis), colored by TF family.
Fig. 4. Deep learning of sequences influencing differences in chromatin accessibility at b-cCREs between neurons and glia.
(A) Illustration of ChromBPNet-identified high-importance sequences within an example b-cCRE. Top track: VLPFC neuron ATAC-seq signal at the b-cCRE; middle track: regions within the b-cCRE which are high-importance score sites (green); bottom track: the reference human genome sequence scaled according to ChromBPNet profile importance scores. Motifs (CREB1 and NFY) matching high-importance regions are shown below the tracks. (B) Left: Number of neuron (top), glia (middle), or neuron/glia-shared b-cCREs (bottom) overlapping at least one high-importance score site from ChromBPNet models—neuron model (purple), glia model (gray), or both neuron and glia models (blue). b-cCREs that do not contain any called high-score sites are in light gray. Right: Heatmaps of ChromBPNet importance scores in 300-bp windows centered on high-score sites within neuron/glia-shared b-cCREs. Top row heatmaps show the glia model importance scores, and bottom row heatmaps show neuron model importance scores. Left column sites are called high-score only by the neuron model, center column sites are called high-score by both neuron and glia models, and right column sites are called high-score only by the glia model. (C) Histograms of b-cCREs according to the number of high-score sites (cTFBSs) from the neuron model (purple bars) or glia model (gray bars) they contain. Left plot shows neuron-specific, middle shows glia-specific, and right shows neuron/glia-shared b-cCREs. (D) Average ChromBPNet importance scores of de novo–discovered NEUROD (left) and SPI1 (right) binding sites from ChromBPNet models trained using the pseudo-bulk scATAC-seq signal profile in each cell type (colored accordingly), along with the average phyloP scores at those binding sites.
Fig. 5. Evolutionary conservation of b-cCREs throughout the mammalian lineage and the role of conserved versus evolving elements in complex traits.
(A) Evolutionary conservation of b-cCREs according to the number of mammalian genomes in which they align. N1 indicates the number of genomes aligning ≥90% of the b-cCRE’s sequence. N2 denotes the number of genomes aligning ≤10% of the b-cCRE’s sequence. From left to right, only adult-specific, fetal-specific, and adult/fetal-shared b-cCREs are shown. (B) Heritability enrichment meta-analysis of adult-specific, fetal-specific, and adult/fetal-shared b-cCREs, split by evolutionary groups, in brain-related traits (red) and nonbrain-related traits (blue). LDSC meta-analysis P value for enrichment in heritability of genetic variants residing in subsets of b-cCREs: *P < 0.05, **P < 0.01, and ****P < 0.0001. (C) Heritability enrichment meta-analysis of single-cell–type–specific b-cCREs, split by evolutionary groups, in brain-related traits (red) and nonbrain-related traits (blue). LDSC meta-analysis P value for enrichment in heritability of genetic variants residing in subsets of b-cCREs: *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.
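The N1 and N2 counts defined in panel (A) can be computed as in the minimal sketch below. It assumes that, for each mammalian genome, the fraction of the b-cCRE's sequence that aligns is already available as a number between 0 and 1; the input values and the function name `alignment_counts` are hypothetical.

```python
def alignment_counts(aligned_fractions, high=0.90, low=0.10):
    """Return (N1, N2) for one element.

    N1: number of genomes aligning at least `high` of the element's sequence.
    N2: number of genomes aligning at most `low` of the element's sequence.
    """
    n1 = sum(1 for f in aligned_fractions if f >= high)
    n2 = sum(1 for f in aligned_fractions if f <= low)
    return n1, n2


# Hypothetical per-genome aligned fractions for a single b-cCRE.
fractions = [1.0, 0.95, 0.9, 0.5, 0.1, 0.0]
n1, n2 = alignment_counts(fractions)
print(n1, n2)
```

Genomes with intermediate alignment (here, the one at 0.5) contribute to neither count, so N1 + N2 can be smaller than the number of genomes compared.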
Fig. 6. Identifying primate-conserved b-cCRE sequence elements and characterizing their role in complex traits.
(A) Heritability enrichment meta-analysis of b-cCREs and nb-cCREs, split based on primate conservation, in brain-related traits (red) and nonbrain-related traits (blue). LDSC meta-analysis P value for enrichment in heritability of genetic variants residing in subsets of b-cCREs: **P < 0.01, ***P < 0.001, and ****P < 0.0001. (B) Comparison of heritability enrichment in brain-related traits (x axis) and nonbrain-related traits (y axis) for cCREs, divided into 50 bins based on primate conservation and then further divided by b-cCRE (red), nb-cCRE (blue), and non-cCRE (green). (C) Heritability enrichment meta-analysis of brain-related traits (y axis) of the top 2% of primate-constrained positions not intersecting a cCRE after removing those within x bp of a b-cCRE (b-cCRE slop). (D) Heritability enrichment of mammalian-constrained positions, primate-constrained positions, or both, intersecting b-cCREs, nb-cCREs, non-cCREs, and non-cCRE slop b-cCRE in brain-related traits (red) and nonbrain-related traits (blue). LDSC meta-analysis P value for enrichment in heritability of genetic variants residing in subsets of b-cCREs: *P < 0.05 and ****P < 0.0001.
Fig. 7. PsychSCREEN is an interactive platform to view and compare multiomic data and annotations.
(A) Screenshots from PsychSCREEN’s disease/SNP portals, including a written description of the queried disease, schizophrenia (top right), and a view of all risk loci identified by GWAS and where they appear in the genome (bottom left). Clicking on a locus redirects you to a genome browser (bottom right) spanning the highlighted coordinates, displaying tracks representing b-cCREs, aggregate neuron and glia ATAC signal, schizophrenia GWAS summary statistics, a set of documented SNPs and their linkage disequilibrium, and base pair–resolution mammalian conservation scores. (B) Screenshots from PsychSCREEN’s gene portal, upon searching for the gene OLIG2. The genome browser is the default view, displaying b-cCREs, aggregate neuron and glia ATAC signal, and ATAC signal on an individual experiment and predicted importance scores from ChromBPNet models of PsychENCODE data. On the ChromBPNet track, highlighting a section of nucleotides (shaded in red) will prompt a search of the closest matching TF motif, and the motif logo is displayed. (C) Screenshots from PsychSCREEN’s single-cell portal, upon searching for the gene SOX8. The UMAP plot from a scRNA-seq experiment is displayed (top), with the color of each dot representing the expression (natural log-transformed counts per 10^5 total sequencing reads) of the queried gene. Alternatively, a dotplot (bottom) can be displayed, with the size of the circle representing the percentage of cells in a particular cluster expressing the queried gene and the opacity representing the average expression of the gene in that cluster.

References

    1. Azevedo F. A. C., Carvalho L. R. B., Grinberg L. T., Farfel J. M., Ferretti R. E. L., Leite R. E. P., Jacob Filho W., Lent R., Herculano-Houzel S., Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain. J. Comp. Neurol. 513, 532–541 (2009).
    2. Herculano-Houzel S., The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost. Proc. Natl. Acad. Sci. U.S.A. 109 (Suppl. 1), 10661–10668 (2012).
    3. Wang D., Liu S., Warrell J., Won H., Shi X., Navarro F. C. P., Clarke D., Gu M., Emani P., Yang Y. T., Xu M., Gandal M. J., Lou S., Zhang J., Park J. J., Yan C., Rhie S. K., Manakongtreecheep K., Zhou H., Nathan A., Peters M., Mattei E., Fitzgerald D., Brunetti T., Moore J., Jiang Y., Girdhar K., Hoffman G. E., Kalayci S., Gümüş Z. H., Crawford G. E.; PsychENCODE Consortium, Roussos P., Akbarian S., Jaffe A. E., White K. P., Weng Z., Sestan N., Geschwind D. H., Knowles J. A., Gerstein M. B., Comprehensive functional genomic resource and integrative model for the human brain. Science 362, eaat8464 (2018).
    4. Li M., Santpere G., Kawasawa Y. I., Evgrafov O. V., Gulden F. O., Pochareddy S., Sunkin S. M., Li Z., Shin Y., Zhu Y., Sousa A. M. M., Werling D. M., Kitchen R. R., Kang H. J., Pletikos M., Choi J., Muchnik S., Xu X., Wang D., Lorente-Galdos B., Liu S., Giusti-Rodríguez P., Won H., de Leeuw C. A., Pardiñas A. F.; BrainSpan Consortium; PsychENCODE Consortium; PsychENCODE Developmental Subgroup, Hu M., Jin F., Li Y., Owen M. J., O’Donovan M. C., Walters J. T. R., Posthuma D., Reimers M. A., Levitt P., Weinberger D. R., Hyde T. M., Kleinman J. E., Geschwind D. H., Hawrylycz M. J., State M. W., Sanders S. J., Sullivan P. F., Gerstein M. B., Lein E. S., Knowles J. A., Sestan N., Integrative functional genomic analysis of human brain development and neuropsychiatric risks. Science 362, eaat7615 (2018).
    5. Colantuoni C., Lipska B. K., Ye T., Hyde T. M., Tao R., Leek J. T., Colantuoni E. A., Elkahloun A. G., Herman M. M., Weinberger D. R., Kleinman J. E., Temporal dynamics and genetic control of transcription in the human prefrontal cortex. Nature 478, 519–523 (2011).