A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data
- PMID: 35966919
- PMCID: PMC9372872
- DOI: 10.1212/NXG.0000000000200012
Abstract
Background and objectives: Exome sequencing (ES) and genome sequencing (GS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States, the Alzheimer Disease Sequencing Project (ADSP) has taken a leading role in sequencing AD-related samples at scale, and the resulting data are made publicly available so that researchers can generate new insights into the genetic etiology of AD. To achieve sufficient power, the ADSP has adopted a study design in which subsets of larger AD cohorts are collected and sequenced across multiple centers, using a variety of sequencing platforms. This approach may lead to variable variant quality across sequencing centers and/or platforms. In this study, we sought to implement and evaluate filters that can be applied quickly to robustly remove variant-level artifacts from the ADSP data.
Methods: We implemented a robust quality control procedure to handle ADSP data. We evaluated this procedure while performing exome-wide and genome-wide association analyses on AD risk using the latest ADSP whole ES (WES) and whole GS (WGS) data releases (NG00067.v5).
Results: We observed that many variants displayed large variation in allele frequencies across sequencing centers/platforms and contributed to spurious association signals with AD risk. We also observed that sequencing platform/center adjustment in association models could not fully account for these spurious signals. To address this issue, we designed and implemented variant filters that could capture and remove these center-specific/platform-specific artifactual variants.
Discussion: We derived a fast and robust approach to filter variants that represent sequencing center-related or platform-related artifacts underlying spurious associations with AD risk in ADSP WES and WGS data. This approach will be important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs.
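As a rough illustration of the kind of check the Results describe (variants whose allele frequencies differ widely across sequencing centers/platforms), the sketch below runs a chi-square test of allele-frequency homogeneity across centers for a single variant. It is not the filter the authors implemented, which the abstract does not specify. The function names, the choice to use counts from control samples only (to limit confounding by differing case/control ratios across centers), and the `1e-6` threshold are all assumptions made for this example.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer gamma values.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)
    return total


def center_heterogeneity(alt_counts, allele_totals):
    """Chi-square test that the alt-allele frequency is equal across centers.

    alt_counts[i]    : alternate-allele count at center i (hypothetical input)
    allele_totals[i] : total called alleles at center i (2 x diploid samples)
    Returns (statistic, p_value).
    """
    # Drop centers with no called alleles for this variant.
    pairs = [(a, n) for a, n in zip(alt_counts, allele_totals) if n > 0]
    total_alt = sum(a for a, _ in pairs)
    total = sum(n for _, n in pairs)
    if len(pairs) < 2 or total_alt == 0 or total_alt == total:
        return 0.0, 1.0  # single center or monomorphic: nothing to test
    p = total_alt / total
    stat = 0.0
    for alt, n in pairs:
        exp_alt, exp_ref = n * p, n * (1.0 - p)
        stat += (alt - exp_alt) ** 2 / exp_alt
        stat += ((n - alt) - exp_ref) ** 2 / exp_ref
    return stat, chi2_sf(stat, len(pairs) - 1)


# Assumed threshold for flagging a variant; not taken from the paper.
P_THRESHOLD = 1e-6

# Toy allele counts among controls at three centers (made-up numbers).
variants = {
    "consistent": ([50, 52, 48], [1000, 1000, 1000]),
    "center_specific": ([5, 6, 120], [1000, 1000, 1000]),
}
flagged = {
    name: center_heterogeneity(alts, totals)[1] < P_THRESHOLD
    for name, (alts, totals) in variants.items()
}
print(flagged)
```

For rare variants the chi-square approximation is unreliable because expected counts are small, so an exact or permutation-based test would likely be a better fit in that setting.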
Copyright © 2022 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology.