Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations

  • Original Article
  • Published in Molecular Genetics and Genomics

Abstract

Background: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations.

Results: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution.

Conclusion: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.

Data availability

The data analyzed in the present study were all downloaded from public data repositories. Please find more details in Supplementary Table 2.

Abbreviations

CMRGs: Challenging medically relevant genes

CNV: Copy number variation

LRS: Long-read sequencing

SRS: Short-read sequencing

WES: Whole-exome sequencing

WGS: Whole genome sequencing

SNVs: Single nucleotide variations

InDels: Short insertions and deletions

GIAB: Genome in a Bottle

PacBio: Pacific Biosciences

ONT: Oxford Nanopore Technologies

SVs: Structural variants

ADME: Absorption, distribution, metabolism, and excretion

CLR: Continuous long reads

HiFi: Highly accurate long reads

DoC: Depth of coverage

KIV-2: Kringle IV type 2

OMIM: Online Mendelian Inheritance in Man

HGMD: Human Gene Mutation Database

SD: Standard deviation

VEP: Ensembl Variant Effect Predictor

Acknowledgements

The authors thank Shuhang Li for the discussion of short variant calling.

Funding

S.F. is supported by grants from the National Key R&D Program of China (Grant Nos. 2020YFE0201600 and 2021YFC2500202), Shanghai Municipal Science and Technology (Grant No. 2017SHZDZX01), and the National Natural Science Foundation of China (Grant Nos. 31970563 and 32370686). F.S. is supported by NIH grants (UM1HG008898, 1U01HG011758-01).

Author information

Contributions

Conceptualization: S.F. and F.S.; data curation: Y.J. and J.G.; formal analysis: Y.J. and J.Z.; supervision: S.F. and F.S.; visualization: Y.J. and J.Z.; writing of the original draft: S.F.; review and editing: all authors.

Corresponding authors

Correspondence to Fritz J. Sedlazeck or Shaohua Fan.

Ethics declarations

Conflicts of interest

F.S. receives research support from Illumina, PacBio, and Oxford Nanopore.

Additional information

Communicated by Shuhua Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ji, Y., Zhao, J., Gong, J. et al. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 299, 65 (2024). https://doi.org/10.1007/s00438-024-02158-x

  • DOI: https://doi.org/10.1007/s00438-024-02158-x
