Cell Rep. 2013 Jan 31;3(1):246-59.
doi: 10.1016/j.celrep.2012.12.008. Epub 2013 Jan 10.

Deciphering signatures of mutational processes operative in human cancer

Ludmil B Alexandrov et al. Cell Rep. 2013.

Abstract

The genome of a cancer cell carries somatic mutations that are the cumulative consequences of the DNA damage and repair processes operative during the cellular lineage between the fertilized egg and the cancer cell. Remarkably, these mutational processes are poorly characterized. Global sequencing initiatives are yielding catalogs of somatic mutations from thousands of cancers, thus providing the unique opportunity to decipher the signatures of mutational processes operative in human cancer. However, until now there have been no theoretical models describing the signatures of mutational processes operative in cancer genomes and no systematic computational approaches are available to decipher these mutational signatures. Here, by modeling mutational processes as a blind source separation problem, we introduce a computational framework that effectively addresses these questions. Our approach provides a basis for characterizing mutational signatures from cancer-derived somatic mutational catalogs, paving the way to insights into the pathogenetic mechanism underlying all cancers.
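One way to make the blind source separation framing concrete is nonnegative matrix factorization (NMF), a common choice for nonnegative count data. The abstract does not name a specific algorithm, so the following is only a minimal sketch under stated assumptions: a catalog matrix M (K mutation types by G genomes) is approximated by the product of P (K by N signatures) and E (N by G contributions), using the standard multiplicative updates for the Frobenius objective. The function names, the symbol K, and all parameters are hypothetical and chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(m, p, e):
    """Frobenius norm of M - P*E, i.e. the overall reconstruction error."""
    approx = matmul(p, e)
    return sum((m[i][j] - approx[i][j]) ** 2
               for i in range(len(m)) for j in range(len(m[0]))) ** 0.5


def nmf(m, n_signatures, iterations=200, seed=0, eps=1e-9):
    """Factor M (K x G) into nonnegative P (K x N) and E (N x G).

    Assumption: plain multiplicative updates with a fixed random start;
    a production pipeline would add restarts, resampling, and model selection.
    """
    rng = random.Random(seed)
    k, g = len(m), len(m[0])
    p = [[rng.random() + 0.1 for _ in range(n_signatures)] for _ in range(k)]
    e = [[rng.random() + 0.1 for _ in range(g)] for _ in range(n_signatures)]
    for _ in range(iterations):
        # E <- E * (P^T M) / (P^T P E), element-wise
        pt = transpose(p)
        num = matmul(pt, m)
        den = matmul(matmul(pt, p), e)
        e = [[e[i][j] * num[i][j] / (den[i][j] + eps) for j in range(g)]
             for i in range(n_signatures)]
        # P <- P * (M E^T) / (P E E^T), element-wise
        et = transpose(e)
        num = matmul(m, et)
        den = matmul(p, matmul(e, et))
        p = [[p[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_signatures)] for i in range(k)]
    return p, e
```

Calling `nmf(m, 2)` on a small nonnegative matrix returns nonnegative factors whose product approximates `m`; comparing `frobenius_error` before and after the updates shows how much of the catalog the assumed number of signatures explains.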

Figures

Figure 1
Modeling Signatures of Mutational Processes Operative in Cancer Genomes (A) Simulated example of three mutational processes operative in a single cancer genome. The mutational catalog of the cancer genome is modeled as a linear superposition of the signatures of the three processes and the respective number of mutations contributed by each signature, plus added nonsystematic noise. (B) Simulated example illustrating mutational processes operative in a set of G cancer genomes. The mutational catalogs of these G cancer genomes can be used to decipher the signatures of N mutational processes as well as the number of mutations caused by each of the processes in each of the genomes. The extracted signatures and contributions do not allow an exact reconstruction of the original set, thus resulting in genome-specific reconstruction error.
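The linear superposition described in panel (A) can be illustrated with a toy calculation. This is a sketch under assumptions, not the authors' code: each signature is taken to be a probability distribution over mutation types, each contribution is a mutation count, and the function name `simulate_catalog` and the four-type example are hypothetical.

```python
def simulate_catalog(signatures, contributions, noise=None):
    """Build one genome's mutational catalog as a weighted sum of signatures.

    signatures:    list of N lists, each with K probabilities summing to 1
    contributions: list of N mutation counts, one per signature
    noise:         optional list of K values added as nonsystematic noise
    """
    n_types = len(signatures[0])
    catalog = [
        sum(sig[t] * count for sig, count in zip(signatures, contributions))
        for t in range(n_types)
    ]
    if noise is not None:
        catalog = [c + z for c, z in zip(catalog, noise)]
    return catalog


# Three hypothetical processes acting on four mutation types.
signatures = [
    [0.50, 0.50, 0.00, 0.00],
    [0.00, 0.25, 0.75, 0.00],
    [0.25, 0.25, 0.25, 0.25],
]
contributions = [100, 40, 20]
catalog = simulate_catalog(signatures, contributions)
```

Without noise, the catalog total equals the sum of the contributions, which is a quick sanity check on any simulated data set.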
Figure 2
Deciphering Signatures of Mutational Processes from a Set of Simulated Mutational Catalogs from 100 Cancer Genomes (A) Identifying the number of processes operative in a set of 100 simulated cancer genomes based on reproducibility of their signatures and low error for reconstructing the original catalogs. (B) Comparison between the ten deciphered signatures and the ten signatures used to simulate the catalogs. Signature recognition, measured using cosine similarity, and signature reproducibility, measured using average silhouette width, are given for each mutational signature. The error bars represent the SD of the corresponding characteristics for the extracted signature(s). (C) Comparison between deciphered and simulated contributions of one of the ten mutational processes in all cancer genomes. (D) Comparison between deciphered and simulated contributions of all signatures in a typical cancer genome. The error bars represent the SD of the corresponding characteristics for the extracted signature(s). (E) Comparison between the profiles of a typical deciphered and simulated signature. The error bars represent the SD of the corresponding characteristics for the extracted signature(s). (F) Comparison between the mutational catalogs of a typical deciphered (red line) and simulated (dark blue line) cancer genome. The mutational catalogs bootstrapped separately for each iteration (Experimental Procedures), which are used to decipher the mutational signatures and their contributions, are shown in light blue.
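Cosine similarity, the recognition measure named in panel (B), compares the shapes of two signature profiles regardless of their overall scale. A minimal sketch follows; the function name and the error handling for all-zero vectors are illustrative choices, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length nonnegative vectors.

    Returns 1.0 for profiles with identical shape and 0.0 for profiles
    that share no mutation types.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of mutation types")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero profile")
    return dot / (norm_a * norm_b)
```

Because the measure ignores scale, a deciphered signature can be compared with a simulated one even if the two are normalized differently.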
Figure 3
Evaluating Factors Affecting the Efficacy of Deciphering Mutational Signatures with Simulated Data (A) Evaluating the effect of deciphering similar mutational signatures from mutational catalogs containing different numbers of cancer genomes. Signatures III and IV were simulated with cosine similarity between 0.9 and 1.0 (i.e., with extremely similar shapes) whereas the remaining two signatures were very different from any of the other signatures (Figure S1A). (B) Evaluating the effect of deciphering mutational signatures with different similarities between them from mutational catalogs of 20 cancer genomes. (C) Evaluating the effect of deciphering different numbers of mutational signatures from sets of mutational catalogs derived from 10, 20, 30, 50, 70, 100, and 200 cancer genomes. (D) Evaluating the effect of deciphering different numbers of mutational signatures from sets of mutational catalogs derived from 50 cancer genomes. The catalogs were simulated with different average numbers of mutations per cancer genome. (E) Evaluating the effect of deciphering two, three, five, or seven mutational signatures from large sets of mutational catalogs containing a small average number of mutations per cancer genome. The line colors correspond to those in the legend of (D). (F) Evaluating the effect of deciphering mutational signatures with different contributions across sets of 50 mutational catalogs. Signature I was set to contribute a fixed percentage of all mutations either in the whole set of mutational catalogs, i.e., the overall contribution is fixed but different genomes can have different contributions of Signature I (blue bars), or in each individual cancer genome, i.e., Signature I’s contributions are fixed in every single mutational catalog (red bars). (G) Comparison, across all performed simulations, between the accuracy for deciphering mutational signatures and the deciphering error for identifying the contributions of these signatures. The deciphering Frobenius reconstruction error was calculated and averaged for each contribution and normalized based on the number of mutations in the respective mutational catalog. In all panels, deciphering accuracy is shown in cosine similarity where accuracy of 1.00 corresponds to extracting exactly the same process used to simulate the data. The error bars represent the SD of the deciphering accuracies after performing each simulation scenario 100 times. See also Figure S1.
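The legend does not give the exact formula for the normalized error, so the following is only one plausible reading: the Euclidean (Frobenius) distance between a genome's original and reconstructed catalogs, divided by the total number of mutations in the original catalog. The function name and this particular normalization are assumptions.

```python
import math


def normalized_reconstruction_error(catalog, reconstruction):
    """Frobenius distance between two catalogs, scaled by the mutation count.

    Assumption: normalization divides by the total number of mutations in
    the original catalog, so genomes with different burdens are comparable.
    """
    if len(catalog) != len(reconstruction):
        raise ValueError("catalogs must cover the same mutation types")
    total = sum(catalog)
    if total == 0:
        raise ValueError("cannot normalize a catalog with no mutations")
    distance = math.sqrt(
        sum((c - r) ** 2 for c, r in zip(catalog, reconstruction))
    )
    return distance / total
```

For example, a catalog of `[60, 30, 10]` reconstructed as `[57, 34, 10]` differs by a distance of 5 across 100 mutations, giving a normalized error of 0.05.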
Figure 4
Signatures of Mutational Processes Extracted from the Mutational Catalogs of 21 Breast Cancer Genomes (A) Four mutational signatures deciphered from the base substitutions (including their immediate 3′ and 5′ sequence context) identified in the 21 breast cancer genomes. (B) A fifth mutational signature identified when kataegis, dinucleotide substitutions, and indels at microhomologies and at mono- or polynucleotide repeats are added as mutation types. (C) Total contributions of mutations of the five signatures for kataegis, dinucleotide substitutions, and indels in the 21 breast cancer genomes. The error bars represent the SD of the contributions for each mutation type for the deciphered signature. See also Figure S2.
Figure 5
Strand Bias in Signatures of Mutational Processes Extracted from Genic Regions of 21 Breast Cancer Genomes (A) Four mutational signatures deciphered from the base substitutions (including their immediate 3′ and 5′ sequence context) identified in genic regions of 21 breast cancer genomes. (B) Sequence context independent summary of strand bias in the four mutational signatures extracted from the 21 breast cancer genomes. The error bars represent the SD of the contributions for each mutation type for the deciphered signature.
Figure 6
Signatures of Mutational Processes Extended to Include Additional Sequence Context (A) Signature 2 deciphered from the base substitutions (including the two bases 5′ and 3′ to each mutated base resulting in 1,536 possible mutated pentanucleotides) identified in 21 breast cancer genomes. (B) Detailed view of C > T mutation types in Signature 2. Purine nucleotides located two bases 5′ of the mutated base are shown in green whereas pyrimidine nucleotides are in red. (C) Summary of all mutation types caused by Signature 2. The error bars represent the SD of the contributions for each mutation type for the deciphered signature.
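The figure of 1,536 mutated pentanucleotides follows from simple counting: six substitution classes times four possible bases at each of the four flanking positions (6 × 4^4 = 1,536). The sketch below enumerates them, assuming the usual convention of reporting substitutions from the pyrimidine of the mutated base pair; the bracketed label format and the function name are illustrative only.

```python
from itertools import product

BASES = "ACGT"
# Six substitution classes, referred to by the pyrimidine of the base pair
# (an assumed convention; the legend does not spell it out).
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def mutation_types(flank):
    """List every substitution type with `flank` bases of context per side."""
    types = []
    for substitution in SUBSTITUTIONS:
        for five_prime in product(BASES, repeat=flank):
            for three_prime in product(BASES, repeat=flank):
                label = ("".join(five_prime) + "[" + substitution + "]"
                         + "".join(three_prime))
                types.append(label)
    return types


trinucleotide_types = mutation_types(1)   # immediate 5' and 3' base
pentanucleotide_types = mutation_types(2)  # two bases on each side
```

With one flanking base per side the same enumeration gives 96 types, which matches the immediate 3′ and 5′ context used for the other signatures.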
Figure 7
Signatures of Mutational Processes Extracted from the Mutational Catalogs of 100 Breast Cancer Exomes (A) Two mutational signatures deciphered from the base substitutions (including their immediate 3′ and 5′ sequence context) identified in the exomes of 100 breast cancers. (B) Strand bias signatures deciphered from the base substitutions identified in the exomes of 100 breast cancers. (C) Sequence context independent summary of strand bias in the two mutational signatures extracted from the 100 breast cancer exomes. The error bars represent the SD of the contributions for each mutation type for the deciphered signature.
Figure S1
Additional Factors Affecting the Efficacy of Deciphering Mutational Signatures with Simulated Data, Related to Figure 3 (A) Design for simulating the signatures of four mutational processes with different similarities between them. Signatures I and II differ significantly from each other as well as from the other two signatures (cosine similarity between 0.00 and 0.20). Signatures III and IV were simulated with varying similarities between them. (B) Dependency between accurately deciphered signatures (i.e., cosine similarity between simulated and deciphered signature > 0.95) and the number of mutational catalogs needed to decipher these signatures. (C) Identifying the maximum number of accurately deciphered signatures (cosine similarity between simulated and deciphered signature shown in the legend) from sets of mutational catalogs simulated using the signatures of 20 mutational processes. (D) Distribution of the normalized Frobenius error for identifying the contributions of accurately deciphered signatures of mutational processes (i.e., cosine similarity between simulated and deciphered signature > 0.95). (E) Average symmetric mean absolute percentage error for identifying the contributions of accurately deciphered signatures of mutational processes (i.e., cosine similarity between simulated and deciphered signature > 0.95) based on the number of mutations contributed by the signature.
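Symmetric mean absolute percentage error (SMAPE), mentioned in panel (E), has several variants in the literature, and the legend does not say which one was used. The sketch below assumes the form that divides each absolute difference by the mean of the two absolute values, and treats a pair of zeros as zero error; both choices are assumptions.

```python
def smape(simulated, deciphered):
    """Symmetric mean absolute percentage error between two count lists.

    Assumed variant: |d - s| / ((|s| + |d|) / 2), averaged over all pairs,
    giving values between 0 (perfect) and 2 (maximal disagreement).
    """
    if len(simulated) != len(deciphered) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    terms = []
    for s, d in zip(simulated, deciphered):
        scale = (abs(s) + abs(d)) / 2
        terms.append(0.0 if scale == 0 else abs(d - s) / scale)
    return sum(terms) / len(terms)
```

Because the denominator uses both values, the measure stays bounded even when a signature contributes very few mutations to a genome, which is the situation panel (E) examines.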
Figure S2
Signatures of Mutational Processes Extracted from the Extended Mutational Catalogs of 21 Breast Cancer Genomes, Related to Figure 4 Four of the five mutational signatures deciphered from the base substitutions (including their immediate 3′ and 5′ sequence context), kataegis, indels, and dinucleotide substitutions identified in the 21 breast cancer genomes. The fifth mutational signature is shown in Figure 4B.

Similar articles

  • The repertoire of mutational signatures in human cancer.
    Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, Boot A, Covington KR, Gordenin DA, Bergstrom EN, Islam SMA, Lopez-Bigas N, Klimczak LJ, McPherson JR, Morganella S, Sabarinathan R, Wheeler DA, Mustonen V; PCAWG Mutational Signatures Working Group; Getz G, Rozen SG, Stratton MR; PCAWG Consortium. Nature. 2020 Feb;578(7793):94-101. doi: 10.1038/s41586-020-1943-3. Epub 2020 Feb 5. PMID: 32025018. Free PMC article.
  • Mutational spectra and mutational signatures: Insights into cancer aetiology and mechanisms of DNA damage and repair.
    Phillips DH. DNA Repair (Amst). 2018 Nov;71:6-11. doi: 10.1016/j.dnarep.2018.08.003. Epub 2018 Aug 24. PMID: 30236628. Free PMC article. Review.
  • Bioinformatic Methods to Identify Mutational Signatures in Cancer.
    Islam SMA, Alexandrov LB. Methods Mol Biol. 2021;2185:447-473. doi: 10.1007/978-1-0716-0810-4_28. PMID: 33165866.
  • The topography of mutational processes in breast cancer genomes.
    Morganella S, Alexandrov LB, Glodzik D, Zou X, Davies H, Staaf J, Sieuwerts AM, Brinkman AB, Martin S, Ramakrishna M, Butler A, Kim HY, Borg Å, Sotiriou C, Futreal PA, Campbell PJ, Span PN, Van Laere S, Lakhani SR, Eyfjord JE, Thompson AM, Stunnenberg HG, van de Vijver MJ, Martens JW, Børresen-Dale AL, Richardson AL, Kong G, Thomas G, Sale J, Rada C, Stratton MR, Birney E, Nik-Zainal S. Nat Commun. 2016 May 2;7:11383. doi: 10.1038/ncomms11383. PMID: 27136393. Free PMC article.
  • Mechanisms underlying mutational signatures in human cancers.
    Helleday T, Eshtad S, Nik-Zainal S. Nat Rev Genet. 2014 Sep;15(9):585-98. doi: 10.1038/nrg3729. Epub 2014 Jul 1. PMID: 24981601. Free PMC article. Review.
