Meta-Analysis

Am J Hum Genet. 2021 Dec 2;108(12):2336-2353.

doi: 10.1016/j.ajhg.2021.10.009. Epub 2021 Nov 11.

Genome-wide analysis of common and rare variants via multiple knockoffs at biobank scale, with an application to Alzheimer disease genetics

Zihuai He¹, Yann Le Guen², Linxi Liu³, Justin Lee⁴, Shiyang Ma⁵, Andrew C Yang⁶, Xiaoxia Liu⁶, Jarod Rutledge⁷, Patricia Moran Losada⁶, Bowen Song⁸, Michael E Belloy⁶, Robert R Butler 3rd⁶, Frank M Longo⁶, Hua Tang⁷, Elizabeth C Mormino⁶, Tony Wyss-Coray⁶, Michael D Greicius⁶, Iuliana Ionita-Laza⁵

Affiliations

¹ Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA; Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305, USA. Electronic address: zihuai@stanford.edu.
² Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA; Institut du Cerveau - Paris Brain Institute - ICM, Paris 75013, France.
³ Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA.
⁴ Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305, USA.
⁵ Department of Biostatistics, Columbia University, New York, NY 10032, USA.
⁶ Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA.
⁷ Department of Genetics, Stanford University, Stanford, CA 94305, USA.
⁸ Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA.

PMID: 34767756
PMCID: PMC8715147
DOI: 10.1016/j.ajhg.2021.10.009

Zihuai He et al. Am J Hum Genet. 2021.

Abstract

Knockoff-based methods have become increasingly popular due to their enhanced power for locus discovery and their ability to prioritize putative causal variants in a genome-wide analysis. However, because of the substantial computational cost for generating knockoffs, existing knockoff approaches cannot analyze millions of rare genetic variants in biobank-scale whole-genome sequencing and whole-genome imputed datasets. We propose a scalable knockoff-based method for the analysis of common and rare variants across the genome, KnockoffScreen-AL, that is applicable to biobank-scale studies with hundreds of thousands of samples and millions of genetic variants. The application of KnockoffScreen-AL to the analysis of Alzheimer disease (AD) in 388,051 WG-imputed samples from the UK Biobank resulted in 31 significant loci, including 14 loci that are missed by conventional association tests on these data. We perform replication studies in an independent meta-analysis of clinically diagnosed AD with 94,437 samples, and additionally leverage single-cell RNA-sequencing data with 143,793 single-nucleus transcriptomes from 17 control subjects and AD-affected individuals, and proteomics data from 735 control subjects and affected individuals with AD and related disorders to validate the genes at these significant loci. These multi-omics analyses show that 79.1% of the proximal genes at these loci and 76.2% of the genes at loci identified only by KnockoffScreen-AL exhibit at least suggestive signal (p < 0.05) in the scRNA-seq or proteomics analyses. We highlight a potentially causal gene in AD progression, EGFR, that shows significant differences in expression and protein levels between AD-affected individuals and healthy control subjects.

Keywords: Alzheimer disease; GWAS; knockoff statistics; omics; sequencing.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1**
Overview of KnockoffScreen-AL. (A) The KnockoffScreen-AL method. (B) The application of KnockoffScreen-AL to UK Biobank data. (C) Venn diagrams showing the number of identified loci that overlap with known AD loci or are replicated (p < 0.05). Common, common-variant loci; rare, rare-variant loci; overlap with known AD loci, overlap with Jansen et al. and Kunkle et al.; replication, replication p value < 0.05 based on summary statistics from Kunkle et al. (D) Venn diagrams showing the number of implicated genes that are significant (p < 0.05) in scRNA-seq or proteomics analysis; KS-AL only: the additional genes identified by KnockoffScreen-AL but missed by conventional association tests; ProteomicsAging: p value < 0.05 in the proteomics analysis of age effect; ProteomicsADvsHC: p value < 0.05 in the proteomics analysis comparing Alzheimer disease-affected individuals to healthy control subjects; scRNA-seq: p value < 0.05 in the scRNA-seq analysis for at least one cell type.

**Figure 2**
Computing time, peak random-access memory (RAM) use, power, and FDR of different knockoff generators. (A and B) The computing time and RAM were evaluated based on 2,000 variants, varying the sample size from 1,000 to 500,000. Naive SCIT, sequential conditional independent tuples (SCIT) with the “exact” linear model; BM, memory-efficient matrix operation. The shrinkage algorithmic leveraging BM method corresponds to the proposed KnockoffScreen-AL. The computing time for naive SCIT is truncated at sample size 100,000 because it cannot be applied to larger sample sizes. We also benchmark the computing time for phasing 10,000 samples via fastPhase with the number of states K = 12. (C and D) Power/FDR comparison between KnockoffScreen-AL and the naive SCIT. (E and F) Power/FDR comparison between KnockoffScreen-AL (SCIT multiple knockoffs + ACAT-O) and other existing knockoff generators and feature importance score calculations. The different colors indicate different knockoff generators. The different types of lines indicate different tests to define the importance score.

**Figure 3**
Genome-wide analysis of Alzheimer disease in UK Biobank. (A) The Manhattan plot of p values (truncated at 10⁻⁵⁰ for clear visualization) from the conventional common-variant and rare-variant association tests with conventional GWAS threshold (p < 5 × 10⁻⁸) for FWER control. (B) The Manhattan plot of KnockoffScreen-AL with target FDR at 0.10. The names of those loci previously reported by GWASs are shown in purple; names of discoveries not included in Jansen et al. and Kunkle et al. are shown in red (FDR = 0.05) and blue (FDR = 0.10).

**Figure 4**
Single-cell RNA-seq data (n = 143,793) analysis of the 43 proximal genes. For each gene, we present the differentially expressed gene (DEG) analysis, comparing Alzheimer disease-affected individuals (AD) with healthy control subjects. (A) All 43 proximal genes. (B) The additional genes identified by KnockoffScreen-AL but missed by conventional association tests. Each dot represents a gene. Colors represent different cell types. The black dashed lines mark the p value cutoff at 0.05; the gray dashed lines mark the p value cutoff at 0.05/43 (number of candidate genes). For visualization purposes, −log10(p) was capped at 15 and abs(log2(fold change)) was capped at 1.0. Positive log2 fold change corresponds to higher expression level in AD.

**Figure 5**
Proteomics data analysis of genes at the 31 significant loci. In addition to the 43 proximal genes, we include genes within ±200 kb of each significant locus that can be matched with a proteomics profile. (A and D) We present the differential abundance analysis comparing Alzheimer disease (AD)-affected individuals with healthy control subjects (HC) (A) and evaluate the age effect (D). Each dot represents a gene. Different colors represent different types of significance. NS, not significant; log2FC: |log2 fold change| ≥ 0.05; p value: p value ≤ 0.05; p value and log2FC: |log2 fold change| ≥ 0.05 and p value ≤ 0.05. The dashed gray lines correspond to the Bonferroni correction p value threshold 0.05/78 = 0.00064. (B and C) Differential abundance analysis of EGFR/TREM2. (E and F) Age effect analysis of EGFR/TREM2. MCI, mild cognitive impairment; LBD, Lewy body dementia.

**Figure 6**
Colocalization analysis of *EGFR*. (A) Colocalization analysis of EGFR and nearby genes with the brain eQTLs meta-analysis and GTEx brain tissue eQTLs. (B) Colocalization analysis of EGFR with the brain eQTLs meta-analysis. The lead variant rs75061358 and its LD-linked variant rs6979446 are highlighted (red and purple, respectively).

Cited by

Leveraging electronic health records and knowledge networks for Alzheimer's disease prediction and sex-specific biological insights.
Tang AS, Rankin KP, Cerono G, Miramontes S, Mills H, Roger J, Zeng B, Nelson C, Soman K, Woldemariam S, Li Y, Lee A, Bove R, Glymour M, Aghaeepour N, Oskotsky TT, Miller Z, Allen IE, Sanders SJ, Baranzini S, Sirota M. Nat Aging. 2024 Mar;4(3):379-395. doi: 10.1038/s43587-024-00573-8. Epub 2024 Feb 21. PMID: 38383858. Free PMC article.
Identification of blood metabolites associated with risk of Alzheimer's disease by integrating genomics and metabolomics data.
Liu S, Zhong H, Zhu J, Wu L. Mol Psychiatry. 2024 Apr;29(4):1153-1162. doi: 10.1038/s41380-023-02400-9. Epub 2024 Jan 12. PMID: 38216726.
Deep neural networks with controlled variable selection for the identification of putative causal genetic variants.
Kassani PH, Lu F, Guen YL, Belloy ME, He Z. Nat Mach Intell. 2022 Sep;4(9):761-771. doi: 10.1038/s42256-022-00525-0. Epub 2022 Sep 15. PMID: 37859729. Free PMC article.
Integrated analysis of plasma proteome and cortex single-cell transcriptome reveals the novel biomarkers during cortical aging.
Niu RZ, Feng WQ, Yu QS, Shi LL, Qin QM, Liu J. Front Aging Neurosci. 2023 Jul 19;15:1063861. doi: 10.3389/fnagi.2023.1063861. eCollection 2023. PMID: 37539343. Free PMC article.
BIGKnock: fine-mapping gene-based associations via knockoff analysis of biobank-scale data.
Ma S, Wang C, Khan A, Liu L, Dalgleish J, Kiryluk K, He Z, Ionita-Laza I. Genome Biol. 2023 Feb 13;24(1):24. doi: 10.1186/s13059-023-02864-6. PMID: 36782330. Free PMC article.
