Meta-Analysis
Nat Commun. 2022 Nov 23;13(1):7209.
doi: 10.1038/s41467-022-34932-z.

GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies

Zihuai He et al. Nat Commun. 2022.

Abstract

Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
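To illustrate why sample overlap matters when combining Z-scores, here is a minimal sketch of a generic weighted Z-score meta-analysis that rescales by the correlation between studies. It is not taken from the paper: the function name `combine_z_scores`, the equal weights, and the example correlation matrices are all assumptions chosen for illustration, and GhostKnockoff's own estimator of study correlations and optimal weights may differ.

```python
import math

def combine_z_scores(z, weights, corr):
    """Weighted sum of per-study Z-scores, rescaled to unit variance under the null.

    corr[i][j] is the assumed null correlation between the Z-scores of
    studies i and j (non-zero off-diagonals arise from shared samples).
    """
    k = len(z)
    numerator = sum(weights[i] * z[i] for i in range(k))
    variance = sum(weights[i] * weights[j] * corr[i][j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)

# Hypothetical example: two studies reporting the same Z-score for one variant.
z = [2.0, 2.0]
weights = [1.0, 1.0]
independent = [[1.0, 0.0], [0.0, 1.0]]   # no shared samples
overlapping = [[1.0, 0.5], [0.5, 1.0]]   # assumed correlation of 0.5 from overlap

z_indep = combine_z_scores(z, weights, independent)
z_overlap = combine_z_scores(z, weights, overlapping)
```

In this toy example the combined statistic is smaller when the studies are correlated, because overlapping samples contribute less independent evidence than the same number of distinct samples would.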

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of GhostKnockoff.
We present the workflow of GhostKnockoff compared to conventional GWAS and knockoff inference based on the same marginal test statistics using individual-level data. A Conventional GWAS. B Knockoff inference using individual-level data. We present the approach based on marginal test statistics. C The proposed GhostKnockoff using Z-scores from conventional GWAS as input.
Fig. 2. Empirical simulation studies for power, FDR and stability.
Two cohorts are randomly sampled from the same population. A–F. Power and FDR based on 1000 replicates for different types of traits (quantitative and dichotomous) and different levels of sample overlap (0%/25%/50%), with different target FDR varying from 0 to 0.2. GhostKnockoff-M/S: the proposed multiple/single knockoff method based on the meta-analysis of Z-scores calculated separately from each individual cohort. IndividualData Knockoff-M/S: knockoff inference based on individual-level data. G, H. Prioritization of causal variants. I, J. Stability of knockoff inference, with 25% overlap and 20% unobserved variants per study. The stability is quantified as the standard deviation of feature statistics across 1000 replicates due to randomly sampling knockoffs for a given dataset.
Fig. 3. Meta-analysis of Alzheimer’s disease studies.
A Study correlations estimated using the proposed method. For each study, we present sequencing technology, sample size and number of variants. B Optimal combination of studies estimated using the proposed method. Each bar presents the weight per study in percentage, i.e. weight per study divided by the summation of all weights. C We present the Manhattan plot of W statistics (truncated at 100 for clear visualization) from GhostKnockoff with target FDR at 0.05 (red) and 0.10 (blue). The results are based on the optimal weights combining the nine studies. Variant density is shown at the bottom of the Manhattan plot (number of variants per 1 Mb).
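The caption refers to W statistics being thresholded at a target FDR. As a rough illustration of how such a threshold can be chosen, the sketch below implements the standard single-knockoff ("knockoff+") filter from the general knockoff literature. This is an assumption for illustration only: the paper's multiple-knockoff procedure uses a different statistic and threshold rule, and the function names and example values here are hypothetical.

```python
def knockoff_threshold(w, q, offset=1):
    """Smallest t with (offset + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= q.

    Returns infinity when no threshold meets the target FDR level q.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)   # estimate of false discoveries
        positives = sum(1 for x in w if x >= t)    # number of selections at t
        if (offset + negatives) / max(positives, 1) <= q:
            return t
    return float("inf")

def select_variants(w, q):
    """Indices of features whose W statistic reaches the threshold."""
    t = knockoff_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]

# Hypothetical W statistics: large positive values favor the original variant
# over its knockoff copy; negative values favor the knockoff.
w_stats = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1.5, -1, 0.5, -0.2]
selected_q20 = select_variants(w_stats, 0.20)
selected_q05 = select_variants(w_stats, 0.05)
```

With the offset of 1, a target FDR of 0.05 cannot be met unless at least 20 features are selected, which is why the stricter level returns no selections for this small toy example.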
Fig. 4. Single-cell RNAseq data (n = 143793) analysis of the identified proximal AD genes.
A Differentially expressed genes (DEG) analysis using MAST implemented in Seurat, comparing Alzheimer’s disease cases (AD) with healthy controls. Each dot represents a gene. Colors represent different cell types. OPC: Oligodendrocyte progenitor cell. The black dashed line corresponds to p-value cutoff 0.05; the gray dashed line corresponds to p-value cutoff 0.05/38 (number of candidate genes) which accounts for multiple comparisons. For visualization purposes, −log10(p) values are capped at 15 and abs(log2(fold change)) values are capped at 1.0. Positive log2 fold change corresponds to higher expression level in AD. B Proportion of suggestive genes stratified by cell types. C Proportion of suggestive genes in at least one cell type. P-values are calculated with two-sided Fisher’s exact test. D Enrichment analysis of DEG nominal p-values relative to background genes. P-values are calculated by MAST implemented in Seurat.
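The two p-value cutoffs and the plotting caps described in panel A can be reproduced with a few lines of arithmetic. The sketch below only restates numbers given in the caption (0.05, 38 candidate genes, caps of 15 and 1.0); the helper names are hypothetical.

```python
import math

nominal_alpha = 0.05
n_candidate_genes = 38

# Bonferroni-style cutoff that accounts for testing 38 candidate genes.
bonferroni_alpha = nominal_alpha / n_candidate_genes

def capped_neg_log10(p, cap=15.0):
    """-log10(p), capped for plotting as described in the caption."""
    return min(-math.log10(p), cap)

def capped_abs_log2_fc(log2_fc, cap=1.0):
    """|log2 fold change|, capped for plotting as described in the caption."""
    return min(abs(log2_fc), cap)

# Heights of the two dashed lines on the -log10(p) axis.
nominal_line = capped_neg_log10(nominal_alpha)
bonferroni_line = capped_neg_log10(bonferroni_alpha)
```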
Fig. 5. Phenome-wide analysis of 1403 binary phenotypes from UK Biobank data with 408,961 white British participants with European ancestry.
A, B Comparison between conventional GWAS and GhostKnockoff. C Summary of (A) and (B). For each phenotype, we calculated the ratio between the total number of identified loci / the average number of proxy variants per shared locus by GhostKnockoff and by conventional GWAS (capped at 500 for better visualization). Panel (C) presents the average ratio (as in (A) and (B)) across 1403 phenotypes. The standard error is calculated as $\text{standard deviation of the ratio} / \sqrt{\text{total number of phenotypes} - 1}$. D Distribution of the number of identified loci. We present boxplot (median and 25%/75% quantiles) for each disease category. E For loci identified by both conventional GWAS and the proposed method, we present median and 25%/75% quantiles of the number of identified variants per locus. For visualization purposes, we present the results for disease phenotypes with 5 loci identified by either conventional GWAS or the proposed multiple knockoff inference for panels (D, E). F Functional score of variants identified by GhostKnockoff compared to that of genome-wide background variants. Each data point in the boxplot corresponds to the average score of one disease category. The boxplot presents median and 25%/75% quantiles.
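The standard error in panel C can be sketched as follows, reading the caption's formula as the standard deviation of the ratio divided by the square root of (number of phenotypes minus 1). That reading is an assumption, as is the use of the sample standard deviation; the caption does not say which variant is used, and the example ratios are made up.

```python
import math
import statistics

def standard_error(ratios):
    """Standard deviation of the ratios divided by sqrt(n - 1).

    Uses the sample standard deviation (an assumption).
    """
    n = len(ratios)
    return statistics.stdev(ratios) / math.sqrt(n - 1)

# Hypothetical per-phenotype ratios (GhostKnockoff vs. conventional GWAS),
# capped at 500 as described in the caption.
raw_ratios = [1.0, 2.0, 3.0, 4.0, 5.0]
ratios = [min(r, 500.0) for r in raw_ratios]

mean_ratio = statistics.mean(ratios)
se_ratio = standard_error(ratios)
```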

