Nat Genet. 2018 Sep;50(9):1219-1224.
doi: 10.1038/s41588-018-0183-z. Epub 2018 Aug 13.

Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations

Amit V Khera et al. Nat Genet. 2018 Sep.

Abstract

A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation [1]. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature [2-5], it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk [6]. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.

Figures

Figure 1. Study design and workflow
A genome-wide polygenic score (GPS) for each disease was derived by combining summary association statistics from a recent large GWAS with a linkage disequilibrium reference panel of 503 Europeans. Thirty-one candidate GPSs were derived using two strategies: (1) ‘pruning and thresholding’, the aggregation of independent polymorphisms that exceed a specified level of significance in the discovery GWAS; and (2) the LDpred computational algorithm, a Bayesian approach that calculates a posterior mean effect for all variants based on a prior (effect size in the prior GWAS) and subsequent shrinkage based on linkage disequilibrium. The seven candidate LDpred scores vary with respect to the tuning parameter ρ, the proportion of variants assumed to be causal, as previously recommended. The optimal GPS for each disease was chosen based on area under the receiver-operator curve (AUC) in the UK Biobank Phase I validation dataset (N=120,280 Europeans) and subsequently calculated in an independent UK Biobank Phase II testing dataset (N=288,978 Europeans).
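The selection step in this workflow can be illustrated with a minimal Python sketch. It is not the authors' pipeline and implements neither pruning and thresholding nor LDpred; it assumes each candidate GPS has already been reduced to a list of per-variant weights, computes the score as a weighted sum of allele dosages, and keeps the candidate with the highest AUC. The function names (`polygenic_score`, `auc`, `select_best`) and the toy weights shown in the usage note are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as one half (the Mann-Whitney formulation). This pairwise
    loop is quadratic and only suitable for small illustrative inputs.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def select_best(
    candidates: Dict[str, List[float]],
    genotypes: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> Tuple[str, float]:
    """Return (name, AUC) for the candidate weight set with the highest AUC."""
    results = {}
    for name, weights in candidates.items():
        scores = [polygenic_score(g, weights) for g in genotypes]
        results[name] = auc(scores, labels)
    best = max(results, key=results.get)
    return best, results[best]
```

As a usage example with made-up data, `select_best({"a": [0.3, 0.2, -0.1], "b": [0.0, 0.0, 0.5]}, genotypes, labels)` would score every individual under both weight sets and return the name of the better-discriminating one. In the study design described above, the chosen score is then recalculated in a separate testing dataset, so the AUC used for selection is not reused as the final performance estimate.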
Figure 2. Risk for coronary artery disease according to genome-wide polygenic score.
(a) Distribution of the genome-wide polygenic score for CAD (GPSCAD) in the UK Biobank testing dataset (N=288,978). The x-axis represents GPSCAD, with values scaled to a mean of 0 and standard deviation of 1 to facilitate interpretation. Shading reflects the proportion of the population with 3-, 4-, and 5-fold increased risk versus the remainder of the population. Odds ratios were assessed in a logistic regression model adjusted for age, sex, genotyping array, and the first four principal components of ancestry; (b) GPSCAD percentile among CAD cases versus controls in the UK Biobank validation cohort. Within each boxplot, the horizontal lines reflect the median, the top and bottom of the box reflect the interquartile range, and the whiskers reflect the maximum and minimum values within each grouping; (c) prevalence of CAD according to 100 groups of the validation cohort binned according to percentile of the GPSCAD.
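The scaling in panel (a) and the comparison of a high-score tail against the remainder can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the odds ratio here is a crude 2x2 ratio, whereas the caption describes a logistic regression adjusted for age, sex, genotyping array, and ancestry components, so the two will generally differ. The function names and the `top_fraction` parameter are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to mean 0 and (population) standard deviation 1."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


def top_tail_odds_ratio(
    scores: Sequence[float], labels: Sequence[int], top_fraction: float
) -> float:
    """Unadjusted odds ratio of disease: top fraction of scores vs. the rest."""
    n = len(scores)
    n_top = max(1, int(round(n * top_fraction)))
    # Indices ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = set(order[:n_top])
    cases_top = sum(labels[i] for i in top)
    controls_top = n_top - cases_top
    cases_rest = sum(labels[i] for i in range(n) if i not in top)
    controls_rest = (n - n_top) - cases_rest
    if min(cases_top, controls_top, cases_rest, controls_rest) == 0:
        raise ValueError("a 2x2 cell is empty; odds ratio is undefined")
    return (cases_top * controls_rest) / (controls_top * cases_rest)
```

Sweeping `top_fraction` over a range of values and recording where the odds ratio crosses a threshold such as 3 is one rough way to see how a "fraction of the population at threefold risk" figure could arise, though the published estimates come from the adjusted model.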
Figure 3. Risk gradient for disease according to genome-wide polygenic score percentile
The validation cohort was divided into 100 groups according to percentile of the disease-specific GPS. Prevalence of disease is displayed for (a) atrial fibrillation, (b) type 2 diabetes, (c) inflammatory bowel disease, and (d) breast cancer according to GPS percentile.
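The percentile binning behind these risk-gradient panels can be sketched as below. This is an illustrative reconstruction, not the authors' code: it assumes individuals are ranked by score, split into equal-sized consecutive bins, and that prevalence is the simple fraction of cases in each bin. The function name `prevalence_by_bin` and its `n_bins` parameter are hypothetical.

```python
from typing import List, Sequence


def prevalence_by_bin(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 100
) -> List[float]:
    """Fraction of cases in each score-ranked bin, lowest scores first."""
    n = len(scores)
    if n < n_bins:
        raise ValueError("need at least one individual per bin")
    # Indices ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    cases = [0] * n_bins
    totals = [0] * n_bins
    for rank, i in enumerate(order):
        b = rank * n_bins // n  # bin index in 0 .. n_bins - 1
        totals[b] += 1
        cases[b] += labels[i]
    return [c / t for c, t in zip(cases, totals)]
```

With `n_bins=100`, each returned value corresponds to one percentile group, and plotting the list against its index gives a curve of the kind the caption describes.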

References

    1. Green ED, Guyer MS; National Human Genome Research Institute. Charting a course for genomic medicine from base pairs to bedside. Nature. 470, 204–213 (2011).
    2. Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Proc. Roy. Soc. Edinburgh 52, 99–433 (1918).
    3. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet. 18, 135–45 (2012).
    4. Golan D, Lander ES, Rosset S. Measuring missing heritability: inferring the contribution of common variants. Proc Natl Acad Sci U S A. 111, E5272–81 (2014).
    5. Fuchsberger C, et al. The genetic architecture of type 2 diabetes. Nature. 536, 41–47 (2016).