Abstract
Background: Many challenging medically relevant genes (CMRGs) lie in complex or highly repetitive regions of the human genome, which hinders comprehensive characterization of their genetic variants with next-generation sequencing technologies. In this study, we used long-read sequencing, a technology widely applied to complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations.
Results: Our analysis revealed high levels of genetic variation in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations: individuals of African ancestry harbored more copy number variants and short variants than samples from other continents. Notably, 15.79% to 33.96% of short variants were detectable only through long-read sequencing. Although the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions and thereby facilitated variant detection, some regions still lacked resolution.
Conclusion: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.
Data availability
The data analyzed in the present study were all downloaded from public data repositories. Details are provided in Supplementary Table 2.
Abbreviations
- CMRGs: Challenging medically relevant genes
- CNV: Copy number variation
- LRS: Long-read sequencing
- SRS: Short-read sequencing
- WES: Whole-exome sequencing
- WGS: Whole genome sequencing
- SNVs: Single nucleotide variations
- InDels: Short insertions and deletions
- GIAB: Genome in a Bottle
- PacBio: Pacific Biosciences
- ONT: Oxford Nanopore Technology
- SVs: Structural variants
- ADME: Absorption, distribution, metabolism, and excretion
- CLR: Continuous long reads
- HiFi: Highly accurate long reads
- DoC: Depth of coverage
- KIV-2: Kringle IV type 2
- OMIM: Online Mendelian Inheritance in Man
- HGMD: Human Gene Mutation Database
- SD: Standard deviation
- VEP: Ensembl Variant Effect Predictor
Acknowledgements
The authors thank Shuhang Li for discussions on short variant calling.
Funding
S.F. is supported by grants from the National Key R&D Program of China (Grant Nos. 2020YFE0201600 and 2021YFC2500202), Shanghai Municipal Science and Technology (Grant No. 2017SHZDZX01), and the National Natural Science Foundation of China (Grant Nos. 31970563 and 32370686). F.S. is supported by NIH grants (UM1HG008898 and 1U01HG011758-01).
Author information
Contributions
Conceptualization: S.F. and F.S.; data curation: Y.J. and J.G.; formal analysis: Y.J. and J.Z.; supervision: S.F. and F.S.; visualization: Y.J. and J.Z.; writing, original draft: S.F.; writing, review and editing: all authors.
Ethics declarations
Conflicts of interest
F.S. receives research support from Illumina, PacBio, and Oxford Nanopore.
Additional information
Communicated by Shuhua Xu.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ji, Y., Zhao, J., Gong, J. et al. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 299, 65 (2024). https://doi.org/10.1007/s00438-024-02158-x
DOI: https://doi.org/10.1007/s00438-024-02158-x