Neurol Genet. 2022 Aug 11;8(5):e200012.
doi: 10.1212/NXG.0000000000200012. eCollection 2022 Oct.

A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data

Michael E Belloy et al. Neurol Genet.

Abstract

Background and objectives: Exome sequencing (ES) and genome sequencing (GS) are expected to be critical for further elucidating the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States, the Alzheimer Disease Sequencing Project (ADSP) has taken a leading role in sequencing AD-related samples at scale and makes the resulting data publicly available so that researchers can generate new insights into the genetic etiology of AD. To achieve sufficient power, the ADSP has adopted a study design in which subsets of larger AD cohorts are collected and sequenced across multiple centers, using a variety of sequencing platforms. This approach may lead to variant quality that varies across sequencing centers and/or platforms. In this study, we sought to implement and evaluate fast-to-apply filters that robustly remove variant-level artifacts in the ADSP data.

Methods: We implemented a robust quality control procedure for ADSP data and evaluated it while performing exome-wide and genome-wide association analyses of AD risk using the latest ADSP whole ES (WES) and whole GS (WGS) data releases (NG00067.v5).

Results: We observed that many variants displayed large variation in allele frequencies across sequencing centers/platforms and contributed to spurious association signals with AD risk. We also observed that sequencing platform/center adjustment in association models could not fully account for these spurious signals. To address this issue, we designed and implemented variant filters that could capture and remove these center-specific/platform-specific artifactual variants.

Discussion: We derived a fast and robust approach to filter variants that represent sequencing center-related or platform-related artifacts underlying spurious associations with AD risk in ADSP WES and WGS data. This approach will be important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs.

Figures

Figure 1. Variant Artifacts Across Different Sequencing Centers/Platforms Drive Spurious Associations in ADSP WES and WGS data
In initial exome-wide and genome-wide association studies of ADSP WES and WGS, we observed many spurious associations (p ≤ 1e−5) using model 1 (i.e., without adjustment for sequencing center/platform; cf. Figures 2A and 3A). On inspection, these variants displayed large variation in genotype counts across sequencing centers/platforms. The MAF variation in controls for all analyzed variants is visualized in (A.a-b) for ADSP WES and in (B.a-b) for ADSP WGS. (C.a-b) An example of a variant showing spurious association. This variant, rs199707443, has an MAF of 0.003% in non-Finnish Europeans in Genome Aggregation Database v3.1.1, in contrast to the 411 heterozygote counts observed at the Broad sequencing center. Notably, this variant still showed genome-wide significant association with Alzheimer disease risk even after sequencing center/platform adjustment (cf. Figure 2B). ADSP, Alzheimer Disease Sequencing Project; CN, cognitively normal; HET, heterozygote; HOM, homozygote; MAF, minor allele frequency; WT, wild type; WES, whole-exome sequencing; WGS, whole-genome sequencing.
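Figure 1 describes variants whose control allele frequencies vary widely across sequencing centers/platforms. As a minimal sketch of that idea only, the Python below measures the spread (maximum minus minimum) of control allele frequency across centers and flags variants above a cutoff. This is not the authors' filter: the class and function names, the max-minus-min spread statistic, and the 0.01 cutoff are hypothetical choices for illustration, and the genotype counts in the usage example are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Diploid genotype counts for one variant among controls at one center."""
    wt: int   # homozygous reference (wild type)
    het: int  # heterozygote
    hom: int  # homozygous alternate


def alt_allele_frequency(c: GenotypeCounts) -> float:
    """Alternate-allele frequency; equals the MAF when the alternate allele is rare."""
    n_alleles = 2 * (c.wt + c.het + c.hom)
    if n_alleles == 0:
        return float("nan")  # no called controls at this center
    return (c.het + 2 * c.hom) / n_alleles


def frequency_spread(counts_by_center: dict[str, GenotypeCounts]) -> float:
    """Max minus min control allele frequency across centers/platforms."""
    freqs = [alt_allele_frequency(c) for c in counts_by_center.values()]
    freqs = [f for f in freqs if not math.isnan(f)]  # skip centers without calls
    return max(freqs) - min(freqs) if freqs else 0.0


# Hypothetical cutoff chosen for illustration only; not a value from the study.
MAX_SPREAD = 0.01


def flag_center_artifact(counts_by_center: dict[str, GenotypeCounts],
                         max_spread: float = MAX_SPREAD) -> bool:
    """True if control allele frequency differs across centers by more than max_spread."""
    return frequency_spread(counts_by_center) > max_spread


# Usage with made-up counts: a variant absent at one center but common at another.
controls = {
    "center_A": GenotypeCounts(wt=1000, het=0, hom=0),
    "center_B": GenotypeCounts(wt=600, het=400, hom=0),
}
print(frequency_spread(controls))      # 0.2
print(flag_center_artifact(controls))  # True
```

Allele frequency is computed per center rather than folded to the minor allele, so that the same allele is compared everywhere; for the rare variants of interest the two coincide.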
Figure 2. The Proposed Center-Based/Platform-Based Variant Filters Remove Spurious Associations in Alzheimer Disease Sequencing Project Whole-Exome Sequencing
Figure shows the Manhattan (left) and quantile-quantile (right) plots. (A) Model 1 indicates many spurious hits. (B) Model 2 shows that adjustment for center/platform can reduce many, but not all, spurious hits. The variant described in Figure 1C is highlighted by the blue arrow. (C) Filters remove most spurious hits. (D) Further adjustment for center/platform removes only a few additional spurious hits.
Figure 3. Proposed Center-Based/Platform-Based Variant Filters Remove Spurious Associations in Alzheimer Disease Sequencing Project Whole-Genome Sequencing
Figure shows the Manhattan (left) and quantile-quantile (right) plots. (A) Model 1 indicates many spurious hits. (B) Model 2 shows that adjustment for center/platform can reduce many, but not all, spurious hits. (C) Filters remove most spurious hits. (D) Further adjustment for center/platform removes only a few additional spurious hits.
Figure 4. Metrics of Variants Removed by the Proposed Center-Based/Platform-Based Variant Filters
(A.a, A.b, and B) ADSP WES. (C.a, C.b, and D) ADSP WGS. (A.a and C.a) Variants that passed filters showed largely consistent p values across the model 1 and model 2 case-control association analyses; only a few remaining variants reach suggestive significance in model 1 but lose it after center/platform adjustment in model 2 (lower right quadrant). (A.b and C.b) Variants that were removed by filters showed many inconsistent p values across models 1 and 2, consistent with center-related/platform-related variant artifacts that model 2 could not fully account for. (B and D) Frequency density plots comparing variants that were filtered (removed) with those that were not. Note that variants were filtered consistently across the full frequency range, with increased density at frequencies <1% or >10% in ADSP WES. ADSP, Alzheimer Disease Sequencing Project; WES, whole-exome sequencing; WGS, whole-genome sequencing.
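The quadrant reading of panels A.a and C.a can be sketched as follows. This is a hypothetical helper, not code from the study; it assumes −log10(p) for model 1 on the x-axis and for model 2 on the y-axis, and it assumes the p ≤ 1e−5 cutoff used for spurious associations in Figure 1 as the suggestive threshold.

```python
# Assumed suggestive threshold, borrowed from the p <= 1e-5 cutoff in Figure 1.
SUGGESTIVE_P = 1e-5


def classify(p_model1: float, p_model2: float, threshold: float = SUGGESTIVE_P) -> str:
    """Place a variant in a quadrant of a -log10(p) model 1 (x) vs model 2 (y) scatter."""
    sig1 = p_model1 <= threshold  # suggestive without center/platform adjustment
    sig2 = p_model2 <= threshold  # suggestive with center/platform adjustment
    if sig1 and sig2:
        return "upper_right"  # suggestive in both models
    if sig1:
        return "lower_right"  # suggestive only before adjustment
    if sig2:
        return "upper_left"   # suggestive only after adjustment
    return "lower_left"       # suggestive in neither model


# A variant that is suggestive in model 1 but not after adjustment.
print(classify(1e-8, 0.2))  # lower_right
```

Under these assumptions, a large share of removed variants in the lower right quadrant is the pattern the caption describes for center/platform artifacts.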
