Meta-Analysis
Nat Genet. 2022 Dec;54(12):1803-1815.
doi: 10.1038/s41588-022-01233-6. Epub 2022 Dec 6.

Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants

Krishna G Aragam, Tao Jiang, Anuj Goel, Stavroula Kanoni, Brooke N Wolford, Deepak S Atri, Elle M Weeks, Minxian Wang, George Hindy, Wei Zhou, Christopher Grace, Carolina Roselli, Nicholas A Marston, Frederick K Kamanu, Ida Surakka, Loreto Muñoz Venegas, Paul Sherliker, Satoshi Koyama, Kazuyoshi Ishigaki, Bjørn O Åsvold, Michael R Brown, Ben Brumpton, Paul S de Vries, Olga Giannakopoulou, Panagiota Giardoglou, Daniel F Gudbjartsson, Ulrich Güldener, Syed M Ijlal Haider, Anna Helgadottir, Maysson Ibrahim, Adnan Kastrati, Thorsten Kessler, Theodosios Kyriakou, Tomasz Konopka, Ling Li, Lijiang Ma, Thomas Meitinger, Sören Mucha, Matthias Munz, Federico Murgia, Jonas B Nielsen, Markus M Nöthen, Shichao Pang, Tobias Reinberger, Gavin Schnitzler, Damian Smedley, Gudmar Thorleifsson, Moritz von Scheidt, Jacob C Ulirsch, Biobank Japan, EPIC-CVD, David O Arnar, Noël P Burtt, Maria C Costanzo, Jason Flannick, Kaoru Ito, Dong-Keun Jang, Yoichiro Kamatani, Amit V Khera, Issei Komuro, Iftikhar J Kullo, Luca A Lotta, Christopher P Nelson, Robert Roberts, Gudmundur Thorgeirsson, Unnur Thorsteinsdottir, Thomas R Webb, Aris Baras, Johan L M Björkegren, Eric Boerwinkle, George Dedoussis, Hilma Holm, Kristian Hveem, Olle Melander, Alanna C Morrison, Marju Orho-Melander, Loukianos S Rallidis, Arno Ruusalepp, Marc S Sabatine, Kari Stefansson, Pierre Zalloua, Patrick T Ellinor, Martin Farrall, John Danesh, Christian T Ruff, Hilary K Finucane, Jemma C Hopewell, Robert Clarke, Rajat M Gupta, Jeanette Erdmann, Nilesh J Samani, Heribert Schunkert, Hugh Watkins, Cristen J Willer, Panos Deloukas, Sekar Kathiresan, Adam S Butterworth, CARDIoGRAMplusC4D Consortium
Krishna G Aragam et al. Nat Genet. 2022 Dec.

Abstract

The discovery of genetic loci associated with complex diseases has outpaced the elucidation of mechanisms of disease pathogenesis. Here we conducted a genome-wide association study (GWAS) for coronary artery disease (CAD) comprising 181,522 cases among 1,165,690 participants of predominantly European ancestry. We detected 241 associations, including 30 new loci. Cross-ancestry meta-analysis with a Japanese GWAS yielded 38 additional new loci. We prioritized likely causal variants using functionally informed fine-mapping, yielding 42 associations with fewer than five variants in the 95% credible set. Similarity-based clustering suggested roles for early developmental processes, cell cycle signaling and vascular cell migration and proliferation in the pathogenesis of CAD. We prioritized 220 candidate causal genes, combining eight complementary approaches, including 123 supported by three or more approaches. Using CRISPR-Cas9, we experimentally validated the effect of an enhancer in MYO9B, which appears to mediate CAD risk by regulating vascular cell motility. Our analysis identifies and systematically characterizes >250 risk loci for CAD to inform experimental interrogation of putative causal mechanisms for CAD.

Conflict of interest statement

All deCODE affiliated authors are employees of deCODE/Amgen. The TIMI Study Group has received institutional research grant support through Brigham and Women’s from Abbott, Amgen, Aralez, AstraZeneca, Bayer HealthCare Pharmaceuticals, BRAHMS, Daiichi-Sankyo, Eisai, GlaxoSmithKline, Intarcia, Janssen, MedImmune, Merck, Novartis, Pfizer, Poxel, Quark Pharmaceuticals, Roche, Takeda, The Medicines Company, and Zora Biosciences. R.C., J.C.H., M.I. and F.M. work at the Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, which receives research grants from industry that are governed by the University of Oxford contracts that protect its independence and has a staff policy of not taking personal payments from industry; further details can be found at https://www.ndph.ox.ac.uk/files/about/ndph-independence-of-research-policy-jun-20.pdf. A.S.B. reports grants outside of this work from AstraZeneca, Bayer, Biogen, BioMarin, Bioverativ, Merck, Novartis and Sanofi. A.B. and L.A.L. are employees of Regeneron Pharmaceuticals and the spouse of C.J.W. works at Regeneron Pharmaceuticals. J.L.M.B. and A.R. are members of the board of directors, founders and shareholders of Clinical Gene Networks AB that has an invested interest in STARNET. J.D. serves on scientific advisory boards for AstraZeneca, Novartis, and UK Biobank and has received multiple grants from academic, charitable and industry sources outside of the submitted work. J.C.U. has received compensation for consulting from Goldfinch Bio and is an employee of Patch Biosciences. O.G. became a full-time employee of UCB while this manuscript was being drafted. The other authors declare no conflicts of interest.

Figures

Fig. 1. Common variant association signals for CAD.
MAF versus per-allele OR for CAD for common sentinel variant (MAF > 5%) associations reaching genome-wide significance or the 1% FDR threshold in our study. Colored circles indicate genome-wide significant associations (P < 5.0 × 10⁻⁸) with sentinel variants that are not correlated (r² < 0.2) with a previously reported variant (red), genome-wide significant sentinel variants correlated with a previously reported variant (blue), new genome-wide significant sentinels after meta-analysis with Biobank Japan (gold) and associations reaching the 1% FDR threshold (P < 2.52 × 10⁻⁵) in our meta-analysis (gray). Two-sided P values are from Z scores from fixed-effect inverse-variance weighted meta-analyses.
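The fixed-effect inverse-variance weighted meta-analysis named in this caption can be illustrated with a minimal sketch. It assumes each study contributes a per-allele log odds ratio and its standard error; the function name and the input values are hypothetical and are not taken from the study.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Combine per-study log odds ratios using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]      # weight = 1 / variance
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)           # standard error of the pooled estimate
    z = beta / se
    # Two-sided P value under a standard normal null: P = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Hypothetical per-study estimates for one variant (illustrative values only)
study_betas = [0.05, 0.07, 0.04]
study_ses = [0.02, 0.03, 0.025]

beta, se, z, p = ivw_fixed_effect(study_betas, study_ses)
odds_ratio = math.exp(beta)                      # per-allele OR, as on the y axis of Fig. 1
genome_wide_significant = p < 5.0e-8             # threshold stated in the caption
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precise studies.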
Fig. 2. Polygenic prediction of incident and recurrent CAD.
a,b, Prognostication of incident CAD (a) and recurrent coronary events (b) by optimal PRS derived from the current meta-analysis of ~180 K CAD cases (2022 PRS; includes ~2.3 million variants) or a previously reported GWAS meta-analysis of CAD from 2015 involving ~60 K CAD cases (2015 PRS; includes ~1.5 million variants). We analyzed 815 incident events in the validation subset of the MDC Study and 1,074 recurrent coronary events in the FOURIER trial. Cox proportional hazards models were adjusted for age, sex and genetic principal components.
Fig. 3. Epigenetic enrichment and functionally informed fine-mapping of CAD loci.
a, Number of tissues/cell types in which 127 regions were enriched. Of 235 distance-based regions containing genome-wide significant associations in our meta-analysis, 127 regions had significant enrichment in at least one tissue type and were therefore fine-mapped using FGWAS. b, Distribution of 95% credible set sizes for the 127 enriched regions. For display purposes, the plot excludes ten regions for which the 95% credible set contained more than 100 variants (Supplementary Table 20). c, Circle plot of epigenetic enrichment for 53 significantly enriched GWAS regions containing a variant with PPA ≥ 0.5. The number of regions in which each tissue showed enrichment is displayed in the upper right quadrant. The number of regions that show enrichment with a given tissue/cell type is displayed in the box next to the tissue/cell type name. The 53 significantly enriched GWAS regions containing a variant with PPA ≥ 0.5 are colored according to the tissue with the strongest evidence of enrichment for that region. Region names with an asterisk denote those for which all conditionally independent association signals were annotated as being new. The histogram shows the total number of tissues with enrichment for each region and the links indicate the tissues/cell types in which each region was enriched. The number of 95% credible variants per region is displayed in the outer ring.
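As a rough illustration of how a 95% credible set relates to per-variant posterior probabilities (PPA), the sketch below ranks variants by PPA and keeps adding them until the cumulative probability reaches the coverage level. It assumes the PPAs are already available (for example from a fine-mapping tool such as FGWAS); the variant names and values are hypothetical.

```python
def credible_set(ppa, coverage=0.95):
    """Return the smallest set of top-ranked variants whose summed PPA reaches `coverage`."""
    ranked = sorted(ppa.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return selected

# Hypothetical PPAs for one region (illustrative values only, summing to 1)
region_ppa = {
    "variant_a": 0.70,
    "variant_b": 0.20,
    "variant_c": 0.06,
    "variant_d": 0.03,
    "variant_e": 0.01,
}

cs = credible_set(region_ppa)
print(cs)  # ['variant_a', 'variant_b', 'variant_c']
```

A region in which one variant carries most of the posterior probability yields a small credible set, which is why small sets are the most useful for experimental follow-up.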
Fig. 4. PoPS informs the identification of causal genes for CAD.
a, Feature clusters contributing to causal gene prioritization. Rank-order plot of 2,852 feature clusters (arising from 19,091 distinct features) contributing to the prioritization of likely causal genes for CAD by PoPS. Similarity-based cluster labels are provided for several top clusters. b, Prioritization of MFGE8 for rs1807214. Regional association plot at chromosome 15 demonstrates the prioritization of MFGE8 as the likely causal gene for rs1807214, which lies in an intergenic region of chromosome 15. Genes in the region are plotted by their chromosomal position (x axis) and PoPS (y axis).
Fig. 5. Integrating eight gene prioritization predictors to identify most likely causal genes.
a, Prioritization of 220 likely causal genes using eight predictors. Blue circles represent the eight predictors used to prioritize causal genes, which are as follows: (1) a gene in the region harbors a variant that ClinVar classifies as having evidence for being pathogenic for a cardiovascular-relevant monogenic disorder (Supplementary Table 34); (2) a gene in the region has been implicated by an effective drug targeting the protein and/or a positive MR study suggesting a causal effect of the protein on CAD (Supplementary Table 31); (3) either of the two top prioritized genes in the region from PoPS (Supplementary Table 24); (4) a gene in the region has an eQTL in a CAD-relevant tissue from GTEx or STARNET for which the lead eSNP is in high linkage disequilibrium (LD) (r² ≥ 0.8) with the CAD sentinel variant (Supplementary Tables 27 and 28); (5) a gene for which a mouse knock-out has a cardiovascular-relevant phenotype (Supplementary Table 35); (6) a gene in the region harbors a protein-altering variant that is in high LD (r² ≥ 0.8) with the CAD sentinel variant (Supplementary Table 31); (7) a gene in the region has been shown to have a rare variant association with CAD in a previous WES or genotyping study (Supplementary Table 31); (8) the nearest gene to the CAD sentinel variant. Numbers in the blue circles indicate, firstly, the number of genes for which this predictor agreed with the most likely causal gene, secondly, the number of genes for which this predictor provided evidence for at least one gene, and in parentheses, the percentage agreement (that is, the first number as a percentage of the second). The central histogram shows the number of agreeing predictors that supported the 220 prioritized genes by the number of genes. b, Predictors for 44 most likely causal genes strongly prioritized by at least four agreeing predictors. The matrix denotes predictors that supported the most likely causal gene (colored red) for each of the 44 most likely causal genes with at least four predictors that supported the gene. Genes are ordered by number of agreeing predictors. The sentinel variant for the association with the smallest P value for CAD is shown for each gene. Full details of the causal gene prioritization evidence for all 279 genome-wide associations are presented in Supplementary Table 31 and the 79 most likely causal genes with three agreeing predictors are displayed in the same format in Supplementary Fig. 1.
Fig. 6. Experimental interrogation of a new CAD locus near MYO9B.
a, Regional association plot from the primary CAD meta-analysis for the new gene-dense region around MYO9B. Colored dots represent the position (x-axis) in GRCh37 coordinates and –log₁₀(meta-analysis P value) (y-axis) of each variant in the region. Dots are shaded to represent the r² with the lead CAD variant (rs7246865), estimated using a random sample of 5,000 European ancestry participants from the UK Biobank. Recombination peaks are plotted in blue based on estimates of recombination from 1000 Genomes European ancestry individuals. b, Identification of a noncoding enhancer in the region around the CAD association signal. The plot shows an inset of a 5-kb window surrounding the lead CAD variant (rs7246865). The top three tracks (blue) show H3K27Ac ChIP-seq of human CA, aorta and tibial artery, identifying a vascular tissue enhancer element overlying rs7246865. The bottom three tracks (purple) show ATAC-seq of human monocytes, immortalized human aortic ECs and CA-VSMCs, identifying a region of open chromatin in all three cell types around rs7246865. The plot also shows the location of the sgRNAs used for deletion of the noncoding enhancer. c, Efficiency of CRISPR editing in primary human cells. The Cas9-sgRNA ribonucleoprotein nucleofection method resulted in noncoding enhancer deletion efficiency (x-axis) of greater than 0.5 by densitometry and was comparable across monocytes, ECs and CA-VSMCs. Points indicate enhancer deletion efficiency for each of the 12 replicates. Horizontal bars indicate mean enhancer deletion efficiency, and whiskers indicate 95% CIs. d, Relative expression of nearby genes after enhancer deletion in ECs. The y-axis shows mean expression of five local genes expressed in ECs compared to expression levels of a control gene (GAPDH). Blue bars indicate gene expression with Cas9–control sgRNA. Red bars indicate expression with tandem enhancer-deleting guides as identified in b. Points indicate gene expression levels for each of the six replicates. Vertical bars indicate mean expression levels and whiskers indicate 95% CIs. Gene expression was quantified by qPCR. Expression levels were compared using an unpaired two-way Student’s t test. Reduced expression of MYO9B and HAUS8 was identified after 131-bp enhancer deletion as in b. **P = 0.0020; ***P < 0.0001. e, Relative expression of nearby genes after enhancer deletion in CA-VSMCs. The y-axis shows mean expression of five local genes expressed in CA-VSMCs compared to expression levels of a control gene (GAPDH). Blue bars indicate gene expression with Cas9–control sgRNA. Red bars indicate expression with tandem enhancer-deleting guides as identified in b. Points indicate gene expression levels for each of the six biological replicates. Vertical bars indicate mean expression levels and whiskers indicate 95% CIs. Gene expression was quantified by qPCR. Expression levels were compared using an unpaired two-way Student’s t test. Reduced expression of MYO9B was identified after 131-bp enhancer deletion as in b. **P = 0.0044. f, In vitro endothelial wound healing with enhancer and gene deletions. The y-axis indicates fluorescence intensity, a read-out for endothelial wound healing and a composite of migration and proliferation. ECs with CRISPR–Cas9 genome editing for enhancer deletion (red) or single-gene knock-outs exhibited diminished wound healing relative to nontargeting control with no deletions (blue). Dots indicate endothelial wound healing for each of the six replicates. Vertical bars indicate mean wound-healing levels and whiskers indicate 95% CIs. Levels of wound healing were compared by one-way ANOVA. *P = 0.0464; **P = 0.0013; ***P = 0.0003; ****P < 0.0001; NS, not significant.
Extended Data Fig. 1. Study design.
Flowchart depicting contributing studies and analysis strategy.
Extended Data Fig. 2. Genetic architecture of 897 association signals for CAD.
Minor allele frequency versus per-allele odds ratio for CAD for all sentinel variants reaching genome-wide significance or the 1% FDR threshold in our study. Colored circles indicate genome-wide significant associations (P < 5.0 × 10⁻⁸) with sentinel variants that are not correlated (r² < 0.2) with a previously reported variant (red), genome-wide significant sentinel variants correlated with a previously reported variant (blue), and associations reaching the 1% FDR threshold (P < 2.52 × 10⁻⁵) in our meta-analysis (gray). Two-sided P values are from Z-scores from fixed-effect inverse-variance weighted meta-analyses.
Extended Data Fig. 3. Gene-based association testing of rare variants in UK Biobank.
QQ-plot of aggregate variant association tests from 15,923 genes versus CAD in UK Biobank. Results presented here are for the SKAT-O test using the Mask 1 (‘lenient’) filter, which includes variants with minor allele frequency < 5% that are annotated as missense, frameshift, stop gain, stop loss or splice site. Results for all genes, tests and filters are in Supplementary Table 7. Details of masks and tests are in Supplementary Table 6. The red dashed line indicates the Bonferroni threshold accounting for the number of genes tested. The gray dashed line indicates the null hypothesis (that is, observed = expected under the null). The blue shaded area indicates the 95% confidence interval around the null.
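The Bonferroni threshold and the null line of a QQ-plot can be sketched as follows. The gene count comes from the caption; the family-wise error rate of 0.05 is an assumption, since the caption does not state it, and the helper name is hypothetical.

```python
import math

N_GENES = 15_923   # number of genes tested, from the caption
ALPHA = 0.05       # assumed family-wise error rate; not stated in the caption

# Bonferroni threshold: divide the error rate by the number of tests
bonferroni_threshold = ALPHA / N_GENES

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs under a uniform null."""
    n = len(p_values)
    observed = sorted(p_values)
    return [
        (-math.log10((rank + 1) / (n + 1)), -math.log10(p))
        for rank, p in enumerate(observed)
    ]

# Hypothetical P values (illustrative only); points on the diagonal match the null
points = qq_points([0.5, 0.01])
```

Under these assumptions the threshold is roughly 3.1 × 10⁻⁶, and a gene would need a P value below it to exceed the red dashed line.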
Extended Data Fig. 4. Cross-ancestry comparison.
a, Comparison of allele frequencies between the meta-analysis and Biobank Japan. Black dots denote the allele frequencies for 199 sentinel variants reaching genome-wide significance in the (predominantly European ancestry) meta-analysis (y-axis) that were also present in the publicly available summary statistics from Biobank Japan (x-axis). Variants were aligned according to the effect allele in Supplementary Table 3. The Pearson correlation coefficient was 0.76. b, Comparison of beta estimates between the meta-analysis and Biobank Japan. Black dots denote the beta estimates for the CAD associations for 199 sentinel variants reaching genome-wide significance in the (predominantly European ancestry) meta-analysis (y-axis) that were also present in the publicly available summary statistics from Biobank Japan (x-axis). Variants were aligned according to the effect allele in Supplementary Table 3. Horizontal and vertical lines represent 95% confidence intervals. The Pearson correlation coefficient was 0.59, which increased to 0.85 when three outlying variants marked in red (at ATXN2, FER and SLC22A1) were excluded.
Extended Data Fig. 5. Epigenetically-informed fine-mapping of the MAFB locus.
a, Regional association plot from the CAD meta-analysis for the MAFB region. Colored dots represent the position (x-axis) in GRCh37 coordinates and –log₁₀(meta-analysis P value) (y-axis) of each variant in the region. Dots are shaded to represent the r² with the lead CAD variant (rs2207132), estimated using a random sample of 5,000 European ancestry participants from the UK Biobank. Recombination peaks are plotted in blue based on estimates of recombination from 1000 Genomes European-ancestry individuals. b, Tissue-specific imputed chromHMM states at the three credible set variants in the MAFB region. The top track shows the position on chromosome 20 (GRCh37) in the MAFB region. The second track shows as orange vertical bars the posterior probability (y-axis) for each variant in the window from the FGWAS fine-mapping, identifying rs1883711 (PPA = 0.77) as the most likely causal variant. The third track indicates as a black box the position of the imputed chromHMM state in each of the ten CAD-relevant tissues based on epigenomic data from the NIH Roadmap Epigenomics Consortium project. The yellow vertical line indicates the position of the most likely causal variant (rs1883711) with respect to the chromHMM states. rs1883711 lies in an enhancer region for liver (the most strongly enriched tissue for this region) and adipose, the two functionally enriched tissues in the region. The other two variants in the 95% credible set (rs2207132 and rs117113213) do not lie in regions annotated as chromHMM states. HSMM, human skeletal muscle myoblasts; HUVEC, human umbilical vein endothelial cells; PPA, posterior probability of being the causal variant.
Extended Data Fig. 6. Pairwise concordance of eight gene-prioritization predictors to identify most likely causal genes.
White squares lying on the diagonal contain the number of genes for which that predictor provided evidence (denominator) and the number of times for which that predictor prioritized the most likely causal gene at the locus (numerator). For example, eQTL data provided evidence for 105 causal genes, of which 90 (86%) were also the most likely causal gene at the locus. Blue squares below the diagonal show the concordance between pairs of predictors and contain the number of genes for which both predictors provided evidence (denominator) and the number of times for which the prioritized causal gene was the same (numerator). For example, the nearest gene and the presence of a protein-altering variant in high LD (r² > 0.8) with the CAD sentinel both provided evidence for a causal gene at 48 loci, of which they were concordant (that is, prioritized the same causal gene) at 34 (71%). Darker blue squares show higher levels of concordance. Orange squares above the diagonal show the discordance between pairs of predictors and contain the number of genes for which both predictors provided evidence (denominator) and the number of times for which the prioritized causal gene was different (numerator). For example, the nearest gene and the presence of a protein-altering variant in high LD (r² > 0.8) with the CAD sentinel both provided evidence for a causal gene at 48 loci, of which they were discordant (that is, prioritized a different causal gene) at 13 (27%). Darker orange squares show higher levels of discordance. See Fig. 5a for descriptions of the eight predictors used to prioritize causal genes.
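The percentages quoted in this caption are the numerator expressed as a rounded percentage of the denominator. A minimal sketch that reproduces the three worked examples, using only the counts given in the caption (the function name is hypothetical):

```python
def percent_agreement(numerator, denominator):
    """Express `numerator` as a whole-number percentage of `denominator`."""
    return round(100 * numerator / denominator)

# eQTL predictor versus the most likely causal gene: 90 of 105 genes
eqtl_agreement = percent_agreement(90, 105)        # 86

# Nearest gene versus protein-altering variant in high LD: 48 shared loci
pair_concordance = percent_agreement(34, 48)       # 71
pair_discordance = percent_agreement(13, 48)       # 27
```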
Extended Data Fig. 7. Prioritizing the likely causal variant, gene and pathway at the ITGA1 locus.
a, Regional association plot from the primary CAD meta-analysis for the ITGA1 region. Colored dots represent the position (x-axis) in GRCh37 coordinates and –log₁₀(meta-analysis P value) (y-axis) of each variant in the region. Dots are shaded to represent the r² with the lead CAD variant (rs4074793), estimated using a random sample of 5,000 European-ancestry participants from UK Biobank. Recombination peaks are plotted in blue based on estimates of recombination from 1000 Genomes European-ancestry individuals. b, Tissue-specific imputed chromHMM states at the two credible set variants in the ITGA1 region. The top track shows the position on chromosome 5 (GRCh37) with respect to the ITGA1 gene. The second track shows as a vertical orange line the posterior probability (y-axis) for each variant in the region from the FGWAS fine-mapping, identifying rs4074793 (PPA = 0.95) as the likely causal variant. The third track indicates as a black box the position of an enhancer state in each of the ten CAD-relevant tissues, using custom imputed chromHMM states based on epigenomic data from the NIH Roadmap Epigenomics Consortium project. The yellow vertical line indicates the position of the likely causal variant (rs4074793) with respect to the chromHMM states. rs4074793 is annotated to a chromHMM state for all five tissues that show enrichment in the region. HSMM, human skeletal muscle myoblasts; HUVEC, human umbilical vein endothelial cells; PPA, posterior probability of being the causal variant. c, Effect of rs4074793 on ITGA1 expression in liver in the STARNET study. The plot shows the position (x-axis) in GRCh37 coordinates and –log₁₀(P value) (y-axis) of each variant in the region. The likely causal CAD variant rs4074793 is circled in black. Only variants with P < 0.01 are displayed. d, Associations of rs4074793 with ITGA1 expression and phenotypes from a phenome-wide association study. The per-allele association of rs4074793-G (the CAD risk allele) measured in s.d. units is plotted for each phenotype. The box indicates the point estimate and the horizontal bars represent the 95% confidence intervals. The top panel shows the association estimates for ITGA1 expression from the STARNET study. The bottom panel shows associations from UK Biobank (liver enzymes and inflammatory markers) and the literature (lipids). ALP, alkaline phosphatase; ALT, alanine aminotransferase; CRP, C-reactive protein; GGT, gamma glutamyltransferase; LDL-c, low-density lipoprotein cholesterol; Tchol, total cholesterol.
