Cell. 2022 Sep 1;185(18):3426-3440.e19.

doi: 10.1016/j.cell.2022.08.004.

High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

Marta Byrska-Bishop¹, Uday S Evani², Xuefang Zhao³, Anna O Basile², Haley J Abel⁴, Allison A Regier⁴, André Corvelo², Wayne E Clarke⁵, Rajeeva Musunuri², Kshithija Nagulapalli², Susan Fairley⁶, Alexi Runnels², Lara Winterkorn², Ernesto Lowy⁶; Human Genome Structural Variation Consortium; Paul Flicek⁶, Soren Germer², Harrison Brand⁷, Ira M Hall⁸, Michael E Talkowski⁷, Giuseppe Narzisi², Michael C Zody⁹

Collaborators

Human Genome Structural Variation Consortium:
Evan E Eichler, Jan O Korbel, Charles Lee, Tobias Marschall, Scott E Devine, William T Harvey, Weichen Zhou, Ryan E Mills, Tobias Rausch, Sushant Kumar, Can Alkan, Fereydoun Hormozdiari, Zechen Chong, Yu Chen, Xiaofei Yang, Jiadong Lin, Mark B Gerstein, Ye Kai, Qihui Zhu, Feyza Yilmaz, Chunlin Xiao

Affiliations

¹ New York Genome Center, New York, NY 10013, USA. Electronic address: mbyrska-bishop@nygenome.org.
² New York Genome Center, New York, NY 10013, USA.
³ Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA.
⁴ McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA.
⁵ New York Genome Center, New York, NY 10013, USA; Outlier Informatics Inc., Saskatoon, SK S7H 1L4, Canada.
⁶ European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
⁷ Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
⁸ McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA; Center for Genomic Health, Yale University School of Medicine, New Haven, CT 06510, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA.
⁹ New York Genome Center, New York, NY 10013, USA. Electronic address: mczody@nygenome.org.

PMID: 36055201
PMCID: PMC9439720
DOI: 10.1016/j.cell.2022.08.004

Marta Byrska-Bishop et al. Cell. 2022.

Abstract

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.

Keywords: 1000 Genomes Project; INDEL; SNV; population genetics; reference imputation panel; structural variation; trio sequencing; whole-genome sequencing.

Conflict of interest statement

Declaration of interests: E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. P.F. is an SAB member of Fabric Genomics, Inc., and Eagle Genomics, Ltd.

Figures

**Figure 1**
SNV/INDEL discovery in the high-coverage WGS data across the 3,202 1kGP samples. (A) Counts of samples stratified by sex and super-population. Original: 2,504 original 1kGP samples. New: 698 newly added samples. (B) Cohort-level alternate allele counts of SNVs and INDELs across the 3,202 samples, stratified by AF bins. Novel/known: sites absent from/present in dbSNP build 155. AF was estimated based on the 2,504 unrelated samples. Pie chart: breakdown of all novel variants by the super-population ancestry. Gray area in the pie chart: novel sites that were called in more than one super-population. (C) Count of small variant loci per genome, stratified by population. See also Figures S1A–S1C. (D) Predicted functional SNVs and INDELs (autosomes). Top row: cohort-level counts (purple bar plot) overlaid with distributions of sample-level counts (boxplots) across the 2,504 unrelated samples. Middle row: fraction of rare (MAF ≤1%) SNVs and INDELs among the predicted functional sites. Bottom row: fraction of novel SNVs and INDELs among the predicted functional sites. See also Figures S1G and S1H. (E) Precision versus recall computed relative to the GIAB truth set v3.3.2, stratified by easy and difficult regions of the genome. See also Figure S1D. Super-population ancestry labels: EUR, European; AFR, African; EAS, East Asian; SAS, South Asian; AMR, American. Descriptions of population labels are in Table S1.

**Figure S1**
Evaluation of small variant calls, related to Figure 1. Sample-level counts of SNVs **(A)** and INDELs **(B)**, stratified by super-population. **(C)** Sample-level Het/Hom ratios across small variants, stratified by super-population. **(D)** Counts of true positive (TP), false positive (FP), and false negative (FN) SNV and INDEL calls in easy and difficult regions of the genome (GIAB v3.3.2 high confidence regions only). **(E)** Sample-level singleton (sites with AC = 1 across 3,202 samples) counts, stratified by relatedness status. **(F)** Counts of true positive (TP) and false positive (FP) singletons in NA12878 relative to either the GIAB v3.3.2 or GIAB v4.2.1 truth set (GIAB high confidence regions only). Due to the presence of NA12878’s parental samples in the expanded cohort, the analysis using the 3,202-sample 1kGP call set is based on both *de novos* and inherited variants private to the NA12878 trio. **(G)** Sample-level counts of predicted functional small variants, stratified by super-population. Reported counts are across the 2,504 unrelated samples only. **(H)** Distributions of log2(ratios) of sample-level counts from (G) normalized by the mean count across the 2,504 unrelated samples. Super-population ancestry labels: European (EUR), African (AFR), East Asian (EAS), South Asian (SAS), American (AMR). Descriptions of population labels are in Table S1. Panels E, G, H are based on autosomes.
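
The TP, FP, and FN counts reported against a truth set relate to precision, recall, and false discovery rate (FDR) through the standard definitions. The following is a minimal sketch of that arithmetic only; the function name and the example counts are hypothetical and are not taken from the paper or its benchmarking tooling.

```python
def precision_recall_fdr(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, FDR) from truth-set comparison counts.

    Standard definitions (not specific to this study):
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      FDR       = FP / (TP + FP) = 1 - precision
    Ratios with a zero denominator are returned as NaN.
    """
    called = tp + fp   # all calls made inside the assessed regions
    truth = tp + fn    # all truth-set variants inside the assessed regions
    nan = float("nan")
    precision = tp / called if called else nan
    recall = tp / truth if truth else nan
    fdr = fp / called if called else nan
    return precision, recall, fdr


# Illustrative, made-up counts (not values from the figures).
precision, recall, fdr = precision_recall_fdr(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} FDR={fdr:.3f}")
```

Because these ratios are computed only within the truth set's high-confidence regions, calls outside those regions are left unassessed rather than counted as errors.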

**Figure S2**
Ploidy of each chromosome across the 3,202 samples, related to Figure 1. **(A)** Ploidy of allosomes. **(B)** Copy number (CN) of each chromosome. Each dot represents the copy number of a 1-Mbp bin in a sample. Blue dots represent copy gain and red dots represent copy loss.

**Figure S3**
Benchmark of GATK-SV, svtools, and Absinthe, related to Figure 2. **(A)** Overlap of insertion sites between GATK-SV and Absinthe call sets. **(B)** Overlap of SVs other than insertions between the GATK-SV and svtools call sets. **(C)** Overlap of SV sites of each type between GATK-SV, svtools, and Absinthe. **(D)** Overlap of insertions in each genome between GATK-SV and Absinthe. **(E-G)** Overlap of deletions (E), duplications (F), inversion and complex SVs (G) in each genome between GATK-SV and svtools. The integers in (D-G) represent count of SVs per sample, followed by proportion of SVs validated by VaPoR/proportion of SVs assessable by VaPoR in the second row, proportion of SVs supported by PacBio SVs in Ebert et al. (2021)/proportion of SVs supported by PacBio SVs in Chaisson et al. (2019) in the third row, and transmission rate/rate of biparentally inherited SVs in the fourth row. **(H-I)** Precision of the insertion breakpoint (H) and length (I) assessed against PacBio assemblies. **(J-K)** Precision of the SV breakpoints in GATK-SV (J) and svtools (K) call sets assessed against PacBio assemblies. **(L)** Breakpoint distance of SVs shared by GATK-SV and svtools. **(M-N)** *de novo* rate of SVs in GATK-SV (M) and svtools (N) call sets when filtered at different boost score cutoffs. **(O)** False positives and false negatives in the GATK-SV and svtools call sets when filtered at different boost score cutoffs.

**Figure S4**
Comparison of small variant calls to the phase 3 call set, related to Figure 3. **(A)** Length of INDELs in the high-coverage as compared to the phase 3 call sets. **(B)** Number of true positive (TP), false positive (FP), and false negative (FN) SNVs and INDELs in the high-coverage vs. phase 3 call set, stratified by easy and difficult regions of the genome (GIAB v3.3.2 high confidence regions only). **(C)** Comparison of allele frequencies in the high-coverage vs. the phase 3 call set across shared loci, stratified by variant type and regions of the genome. r: Pearson correlation coefficient. Number of false positive (FP), true positive (TP), and unassessed (NA; sites outside of the GIAB v3.3.2 high confidence regions of the genome) predicted functional SNVs **(D)** and INDELs **(E)** in sample NA12878, defined based on the comparison against the GIAB NA12878 truth set v3.3.2. There were no stop-loss INDELs in sample NA12878, hence no plot for that category in E. See also Figures 3G and 3H (bottom row). Panels A, C, D, E: chr1-22; panel B: chr1-22 and X.

**Figure 2**
SV discovery in the high-coverage WGS data across the 3,202 1kGP samples. (A–C) The count (A), size distribution (B), and allele frequency distribution (C) of each SV class. (D–F) The mean per sample count of SVs by variant class (D) and ancestral population (E) is also provided, as well as inheritance and transmission rates (F) of all SVs. In (F), child inheritance rate refers to the proportion of SVs in a child inherited from the parents. Parental transmission rate refers to the proportion of SVs in parents’ genomes that are transmitted; displayed here are all informative SVs that are heterozygous in only one parental genome. Vertical colored lines in each row represent the mean value, whereas numbers on the right margin represent median SV counts across the children or families. SV Classes: DEL, deletion; DUP, duplication; mCNV, multiallelic copy number variant; INS, insertion; INV, inversion; CPX, complex SV; CTX, inter-chromosomal translocation. Super-population ancestry labels: EUR, European; AFR, African; EAS, East Asian; SAS, South Asian; AMR, American. Descriptions of population labels are in Table S1. See also Figure S3.

**Figure 3**
Comparison of small variant calls to the phase 3 call set. (A and B) Number of SNVs (A) and INDELs (B) across the 2,504 samples in phase 3 and high-coverage datasets, stratified by AF bins and regions of the genome. Secondary y axis: % of autosomal phase 3 variants recalled in the high-coverage call set across SNVs (A) and INDELs (B) in easy and difficult regions of the genome. See also Figure S4C. (C and D) Comparison of FDR across SNVs (C) and INDELs (D) between the high-coverage and phase 3 call sets, stratified by AF bins and regions of the genome. See also Figure S4B. (E and F) Sample-level SNV (E) and INDEL (F) counts in the phase 3 versus high-coverage call sets, stratified by 1kGP super-population ancestry. EUR, European; AFR, African; EAS, East Asian; SAS, South Asian; AMR, American. Reported counts are at a locus level. (G and H) Comparison of predicted functional SNV (G) and INDEL (H) counts in the high-coverage versus phase 3 call set. Log2(ratio) denotes ratio of variant counts in the high-coverage versus phase 3 call set. Top row: cohort-level comparison. Middle row: sample-level comparison. Bottom row: comparison of FDR. Red asterisks mark categories with fewer than 100 sites in sample NA12878 (i.e., categories where FDR estimation is less reliable). See also Figures S4D and S4E. FDR in (C), (D), (G), and (H) was estimated based on comparison of calls in sample NA12878 to the GIAB truth set v3.3.2. (A), (B), and (E–H): chromosomes (chr) 1–22; (C) and (D): chr1–22 and X.

**Figure 4**
Comparison of the ensemble SV calls to the phase 3 call set. (A) Count of SV sites in the current ensemble SV call set and phase 3 SV call set and their overlap. Numbers next to each bar represent the counts of SV sites in each dataset. (B) The distribution of SV counts per sample in both call sets and their average overlap, displayed in the Venn diagram. (C) Count of genes altered by SVs in both datasets. pLoF, predicted loss of function; CG, complete copy gain; IED, intragenic exon duplication. (D) Count of genes altered by SVs across ancestral populations. See also Figure S5.

**Figure S5**
Comparison of gene interruptive SVs in the high-coverage ensemble versus phase 3 1kGP call sets, related to Figure 4. **(A)** Count of genes interrupted as predicted loss of function (pLoF), **(B)** intragenic exon duplications (IED), and **(C)** complete copy gain (CG) by SVs in the high-coverage ensemble call set and 1kGP phase 3 SV call set. Super-population ancestry labels: European (EUR), African (AFR), East Asian (EAS), South Asian (SAS), American (AMR).

**Figure 5**
Small variant phasing and imputation performance. (A) Counts of small variants passing specified filtering criteria (chr1–22 and X; top 10 combinations of filtering criteria in terms of variant counts are shown). PASS, sites that passed VQSR; Miss., genotype missingness; HWE, Hardy-Weinberg Equilibrium exact test p value > 1e-10 in at least one of the five 1kGP super-populations; ME, mendelian error rate across complete trios; MAC, minor allele count. See also Table S6. (B) Haplotype phasing accuracy of the high-coverage and the phase 3 1kGP panel. SER, switch error rate relative to the Platinum Genome truth set. Two additional phasing conditions (dashed lines) are shown for the high-coverage panel for evaluation purposes only: (1) diamonds: SER obtained when phasing NA12878 without parents included in the cohort. (2) Triangles: SER obtained when phasing NA12878 with parents included but without the pedigree-based correction (duohmm) applied. See also Figures S6A and S6B. (C) Haplotype phasing accuracy of the high-coverage panel, stratified by relationship status. SER was computed relative to the HGSVC SNV call set (Ebert et al., 2021). See also Figure S6C. (D) Imputation accuracy of SNV and INDEL genotypes imputed using the high-coverage panel, stratified by genomic regions. Mean r², squared Pearson correlation coefficient averaged over 110 SGDP samples. See also Figures S6D–S6G. (E) Comparison of the imputation accuracy between the high-coverage and phase 3 panels for SNVs and INDELs, stratified by super-population ancestry. EUR, European; AFR, African; EAS, East Asian; SAS, South Asian; AMR, American. The comparison was restricted to sites that are shared between the two panels. (B–E) are based on autosomes.

**Figure 6**
SV phasing and imputation performance. (A) Cohort-level counts of filtered SVs included in the integrated haplotype-resolved panel, stratified by the SV type (chr1–22 and X). (B) Distribution of sample-level flip rate of phased HET DELs and INSs that were evaluated for phasing accuracy against the HGSVC truth set. (C) Distribution of sample-level parental flip rate of phased HET SVs, stratified by SV type. (D) SV imputation performance of the high-coverage panel in the SGDP study dataset, stratified by SV type. Mean r², squared Pearson correlation coefficient between imputed allelic dosages and dosages from the SV “truth set,” averaged over the 110 SGDP samples (except for the AF = 0.5% bin: 100 and 92 samples for INSs and DELs, respectively). (E) Counts of SVs imputed in the SGDP study dataset using the high-coverage reference panel at info >0.4 (left) and info >0.8 (right) across three MAF bins (MAF based on 110 imputed SGDP samples). (B–E) are based on autosomes. SV types: DEL, deletions; INS, insertions; DUP, duplications; INV, inversions.

**Figure S6**
SNV/INDEL phasing and imputation performance, related to Figure 5. SER: switch error rate stratified by **(A)** chromosome and **(B)** variant type. Note: SER on chr21 in the 0.1–1% MAF bin is equal to 0 (i.e., no switch errors found). This is a fluctuation due to low variant counts per MAF bin in sample NA12878 as chromosomes get smaller. Chromosome X is shown separately in (B) as it was phased using a different strategy than autosomes (statistical phasing vs. statistical phasing with pedigree-based correction, respectively). **(C)** Impact of inclusion of trios on the phasing accuracy of the 1kGP high-coverage call set, stratified by relationship status in the 3,202-sample cohort. log10(SER ratio) refers to the ratio of SER in the phasing run including trios (n = 3,202 samples) vs. phasing run without trios (n = 2,504 samples), computed relative to the HGSVC truth set (1 child, 5 parents, 9 unrelated samples). Imputation accuracy of the high-coverage panel stratified by super-population for SNVs **(D, E)** and INDELs **(F, G)** in easy and difficult regions of the genome. Imputation accuracy was estimated as described in Figure 5D. **(H-L)** Imputation accuracy of the high-coverage panel for each of the five super-populations, stratified by the population. **(M)** Genotype discordance rates for SNVs and INDELs imputed using the high-coverage and phase 3 panels stratified by super-population. **(N)** Counts of SNVs and INDELs imputed in the SGDP study dataset using the high-coverage vs. the phase 3 reference panel at info >0.4 (left) and info >0.8 (right) across three MAF bins (MAF based on the 110 imputed SGDP samples). Panels C-N are based on autosomes.

**Figure S7**
SV phasing and imputation performance, related to Figure 6. **(A)** Distribution of sample-level fractions of HET SVs (DELs and INSs) that were assessed for phasing accuracy against the HGSVC truth set in Figure 6B. **(B)** Distribution of sample-level fractions of HET SVs (DELs, INSs, DUPs, INVs) that were assessed for phasing accuracy using parental flip rate as shown in Figure 6C. **(C)** Fraction of SV sites (DELs and INSs; out of all DELs and INSs included in the high-coverage panel) that was included in the imputation performance evaluation against the HGSVC truth set shown in Figure 6D. **(D)** Upset plot showing site-level overlap of DELs and INSs discovered in the high-coverage 1kGP call set with those discovered in the long-read-based HGSVC call set used as the truth set. Overlap criteria: breakpoint position within ±50 bp from the start site in the 1kGP call set and 80% length overlap. SV types: DEL: deletions, INS: insertions, DUP: duplications, INV: inversions.
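
The site-level overlap criteria stated for panel (D) can be sketched as a simple predicate. This is one possible reading, not the authors' implementation: it assumes "80% length overlap" means the shorter SV length is at least 80% of the longer one, and that only SVs of the same type on the same chromosome are compared. The `SV` record and `sites_match` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a format used in the paper)."""
    chrom: str
    start: int    # start breakpoint position, bp
    length: int   # SV length, bp
    svtype: str   # e.g. "DEL" or "INS"


def sites_match(query: SV, truth: SV,
                max_start_dist: int = 50,
                min_len_ratio: float = 0.8) -> bool:
    """Return True if two SV sites match under the assumed criteria."""
    # Assumption: only same-chromosome, same-type SVs are comparable.
    if query.chrom != truth.chrom or query.svtype != truth.svtype:
        return False
    # Breakpoint within +/- max_start_dist bp of the query start site.
    if abs(query.start - truth.start) > max_start_dist:
        return False
    # Assumption: length agreement measured as shorter / longer length.
    shorter, longer = sorted((query.length, truth.length))
    return longer > 0 and shorter / longer >= min_len_ratio


# Made-up coordinates, for illustration only.
call = SV("chr1", 1_000, 500, "DEL")
print(sites_match(call, SV("chr1", 1_030, 450, "DEL")))  # start within 50 bp, lengths agree
print(sites_match(call, SV("chr1", 1_100, 500, "DEL")))  # start 100 bp away
```

A length-ratio test is used here because insertions have no reference span to intersect; for deletions, a reciprocal interval overlap would be an equally plausible interpretation of the caption.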

Comment in

1000 Genomes Project phase 4: The gift that keeps on giving.
Hanchard NA, Choudhury A. Cell. 2022 Sep 1;185(18):3286-3289. doi: 10.1016/j.cell.2022.08.001. PMID: 36055197 Clinical Trial.

Cited by

Estimation of genetic variation in vitiligo associated genes: Population genomics perspective.
Bharti N, Banerjee R, Achalare A, Kasibhatla SM, Joshi R. BMC Genom Data. 2024 Jul 26;25(1):72. doi: 10.1186/s12863-024-01254-6. PMID: 39060965 Free PMC article.
A sequence of SVA retrotransposon insertions in ASIP shaped human pigmentation.
Kamitaki N, Hujoel MLA, Mukamel RE, Gebara E, McCarroll SA, Loh PR. Nat Genet. 2024 Jul 24. doi: 10.1038/s41588-024-01841-4. Online ahead of print. PMID: 39048794
Molecular and clinical characterization of a founder mutation causing G6PC3 deficiency.
Zhen X, Betti M, Kars ME, Patterson A, Medina-Torres EA, Scheffler Mendoza SC, Herrera Sánchez DA, Lopez-Herrera G, Svyryd Y, Mutchinick O, Gamazon E, Rathmell J, Itan Y, Markle J, O'Farrill Romanillos P, Lugo-Reyes SO, Martinez-Barricarte R. Res Sq [Preprint]. 2024 Jul 11:rs.3.rs-4595246. doi: 10.21203/rs.3.rs-4595246/v1. PMID: 39041036 Free PMC article. Preprint.
An assessment of the genomic structural variation landscape in Sub-Saharan African populations.
Wiener E, Cottino L, Botha G, Nyangiri O, Noyes H, McLeod A, Jakubosky D, Adebamowo C, Awadalla P, Landouré G, Matshaba M, Matovu E, Ramsay M, Simo G, Simuunza M, Tiemessen C, Wonkam A, Sahibdeen V, Krause A, Lombard Z, Hazelhurst S; as members of the H3Africa Consortium. Res Sq [Preprint]. 2024 Jul 8:rs.3.rs-4485126. doi: 10.21203/rs.3.rs-4485126/v1. PMID: 39041024 Free PMC article. Preprint.
DSB profiles in human spermatozoa highlight the role of TMEJ in the male germline.
Scheuren M, Möhner J, Müller M, Zischler H. Front Genet. 2024 Jul 8;15:1423674. doi: 10.3389/fgene.2024.1423674. eCollection 2024. PMID: 39040993 Free PMC article.

References

1. Abel H.J., Larson D.E., Regier A.A., Chiang C., Das I., Kanchi K.L., Layer R.M., Neale B.M., Salerno W.J., Reeves C., et al. Mapping and characterization of structural variation in 17,795 human genomes. Nature. 2020;583:83–89. doi: 10.1038/s41586-020-2371-0.
2. Abyzov A., Urban A.E., Snyder M., Gerstein M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011;21:974–984. doi: 10.1101/gr.114876.110.
3. Almeida R., Ricaño-Ponce I., Kumar V., Deelen P., Szperl A., Trynka G., Gutierrez-Achury J., Kanterakis A., Westra H.-J., Franke L., et al. Fine mapping of the celiac disease-associated LPP locus reveals a potential functional variant. Hum. Mol. Genet. 2014;23:2481–2489. doi: 10.1093/hmg/ddt619.
4. Andrews S. FastQC. 2019. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
5. Broad Institute. Picard Toolkit, GitHub Repository. 2019. http://broadinstitute.github.io/picard/
