Nature. 2022 Dec;612(7938):106-115.
doi: 10.1038/s41586-022-05249-0. Epub 2022 Oct 26.

Single-cell genomic variation induced by mutational processes in cancer

Tyler Funnell et al. Nature. 2022 Dec.

Abstract

How cell-to-cell copy number alterations that underpin genomic instability [1] in human cancers drive genomic and phenotypic variation, and consequently the evolution of cancer [2], remains understudied. Here, by applying scaled single-cell whole-genome sequencing [3] to wild-type, TP53-deficient and TP53-deficient;BRCA1-deficient or TP53-deficient;BRCA2-deficient mammary epithelial cells (13,818 genomes), and to primary triple-negative breast cancer (TNBC) and high-grade serous ovarian cancer (HGSC) cells (22,057 genomes), we identify three distinct 'foreground' mutational patterns that are defined by cell-to-cell structural variation. Cell- and clone-specific high-level amplifications, parallel haplotype-specific copy number alterations and copy number segment length variation (serrate structural variations) had measurable phenotypic and evolutionary consequences. In TNBC and HGSC, clone-specific high-level amplifications in known oncogenes were highly prevalent in tumours bearing fold-back inversions, relative to tumours with homologous recombination deficiency, and were associated with increased clone-to-clone phenotypic variation. Parallel haplotype-specific alterations were also commonly observed, leading to phylogenetic evolutionary diversity and clone-specific mono-allelic expression. Serrate variants were increased in tumours with fold-back inversions and were highly correlated with increased genomic diversity of cellular populations. Together, our findings show that cell-to-cell structural variation contributes to the origins of phenotypic and evolutionary diversity in TNBC and HGSC, and provide insight into the genomic and mutational states of individual cancer cells.

Conflict of interest statement

B.W. reports ad hoc membership of the advisory board of Repare Therapeutics, outside the scope of this study. J.S.R.-F. reports receiving personal or consultancy fees from Goldman Sachs, REPARE Therapeutics, Paige.AI and Eli Lilly, membership of the scientific advisory boards of VolitionRx, REPARE Therapeutics and Paige.AI, membership of the Board of Directors of Grupo Oncoclinicas and ad hoc membership of the scientific advisory boards of Roche Tissue Diagnostics, Ventana Medical Systems, Novartis, Genentech and InVicro, outside the scope of this study. D.G.H. is the Chief Medical Officer of Imagia Canexia Health, outside the scope of this study. S.P.S. and S.A. are shareholders and consultants of Canexia Health and shareholders of Imagia Canexia Health, outside the scope of this study. The remaining authors declare no competing interests.

Figures

Fig. 1. Single-cell genome properties of CRISPR–Cas9-derived isogenic genotypes of 184-hTERT mammary epithelial cell lines.
a, Genotype lineage diagram showing wild-type→TP53BRCA1/BRCA2 alleles. The horizontal axis shows the relative passage number; the number of cell genomes per lineage is shown in parentheses. b,c, Wild-type (WT), TP53−/− (b) and BRCA1−/− (c) 184-hTERT single-cell genomes sequenced with DLP+. Top track, total copy number; bottom track, HSCN states (A haplotype in green; B haplotype in purple). d,e, Heat map representations of copy number profiles from cell populations of TP53−/− (n = 650 cells) (d) and BRCA1−/− (n = 382 cells) (e) lineages. Top, total copy number; bottom, haplotype-specific states. Rows represent cells, and columns the indicated chromosomes. Clone assignment is based on CNA profiles. hom, homozygous. f–k, Comparisons of the rates of polyploidization (f), proportions of cells with chromosome missegregation (g), distributions over number of segments with gains (red), loss (blue) and either gains or loss (box plots) (h), distributions over ratio of gains/losses (i), numbers of segments that have lost heterozygosity (j) and distributions of pairwise HSCN distances between 250 subsampled cells (n = 31,125 cell pairs for all datasets; see Methods) (k). f–j, One data point per cell; number of cells as shown in a. f–k, Horizontal axes: cell line genotypes; BRCA1 red, BRCA2 green, TP53 blue. Half-filled red and green boxes indicate BRCA1+/− and BRCA2+/−, respectively. All box plots indicate the median, first and third quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
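The two numerical conventions in this caption can be checked with a short sketch. It is illustrative only and is not the authors' code: the function names are hypothetical, and the quartile rule (Python's "inclusive" method) is an assumption, since the caption does not state which quantile definition was used. The first function confirms that 250 subsampled cells give 31,125 unordered cell pairs; the second computes the box plot elements as the caption defines them.

```python
from math import comb
from statistics import quantiles


def pair_count(n_cells: int) -> int:
    """Number of unordered cell pairs among n_cells subsampled cells."""
    return comb(n_cells, 2)


def box_plot_summary(values):
    """Median, hinges and whiskers as described in the caption.

    Whiskers are the most extreme data points lying no farther than
    1.5 x IQR from the nearest hinge. The quartile convention used here
    ("inclusive") is an assumption, not taken from the paper.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_whisker = min(v for v in data if v >= q1 - 1.5 * iqr)
    upper_whisker = max(v for v in data if v <= q3 + 1.5 * iqr)
    return {
        "median": median,
        "lower_hinge": q1,
        "upper_hinge": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
    }


if __name__ == "__main__":
    print(pair_count(250))
    print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

In the toy input, the value 100 lies beyond the upper fence, so the upper whisker stops at the largest point inside the fence and 100 would be drawn as an outlier.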
Fig. 2. Processes that generate cell-to-cell variation in single-cell genomes.
a, Number of HLAMPs per cell as a total proportion of cells. b–d, Oncogenic amplifications found in the cell lines. For each panel: annotated SVs: INV, inversion; FBI, fold-back inversion; DUP, duplications (top); HSCN in two cells with the locations of oncogenes shown with dashed lines and arrows (middle); distributions of copy number in single cells relative to wild-type cells (bottom). ***P < 0.001 (bottom; all P values < 10^−10). e, Parallel copy number gains in SA906b: heat maps showing total copy number (top) and HSCN (bottom) for chr. 20 in SA906b (n = 2,312 cells). f, Two individual cells from e. g, Number of parallel copy number events per cell in 184-hTERT mammary epithelial cell lines. cnLOH, copy neutral loss of heterozygosity. h, Copy number heat maps showing the variation in breakpoint location across cells. Top to bottom: dataset, breakpoint location and number of cells; ideogram indicating the chromosome region shown in the heat map and the number of cells; average copy number across cells in the heat map, with breakpoint-adjacent segment copy number states indicated with dotted black lines; copy number states inferred from HMMCopy; haplotype-specific states inferred from SIGNALS. Heat map x axis, genomic bins; y axis, cells with the indicated breakpoint. The greyscale passage number indicates time points (passage number) of each cell; cells are ordered by breakpoint position (left to right). The left and the middle heat map represent two different clones from the same cell line with distinct breakpoints.
Fig. 3. Single-cell genome properties of PDX models and patient tissues.
a, Total copy number (top) and HSCN (bottom) profiles for two single cells from a case of TNBC HRD-Dup (SA501) and a case of HGSC FBI (SA1049). b, Total copy number heat map (left) and HSCN heat map (right) for 1,283 cells from SA1049. c,d, Proportion of polyploid cells (c) and cells with missegregation events (d). e, Ratio of chromosomal gains versus losses across different ploidy states and mutational signature groupings (number of cells shown below each violin plot). All box plots indicate the median, first and third quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
Fig. 4. HLAMP copy number variation.
a, Clone and single-cell whole-genome consensus copy number profiles for chromosome 12 in FBI tumour (SA1049) clones A and C. The top track shows the absolute difference between the copy number of the two clones; the bottom two panels show consensus copy number profiles (coloured points) and all single-cell values (small black points). Cell numbers indicate the number of cells used for consensus copy number evaluation. b, Uniform manifold approximation and projection (UMAP) dimensionality reduction of scRNA-seq data from SA1049 coloured by gene expression cluster and violin plots of gene expression per cluster of KRAS; n = 1,697 cells (per cluster: 0 = 287, 1 = 286, 2 = 276, 3 = 270, 4 = 221, 5 = 200, 6 = 157). c, Immunohistochemistry staining of KRAS protein in primary human tissue (left) and PDX tissue (right). Scale bars, 200 µm (left); 300 µm (right). Images are representative of two cores stained from each PDX tissue. d, Copy number variance across cells for HLAMP bins within each dataset (number of bins shown below each violin). Datasets without HLAMPs not shown. e, Distribution of mean copy variance in eight HRD-Dup cases versus 12 FBI cases. P = 0.0096 (two-sided Wilcoxon test); **P < 0.01. Notches show individual data points. f, Width of genomic segment containing the amplification; dashed red line indicates a width of 10 Mb. g, Maximum cell copy number (CN) per gene. h, Clone maximum/minimum copy number ratio of cancer genes overlapping HLAMP regions. Genes across all cancer datasets with ratio > 2 are shown (n = 296). Colours as in d. For f–h, distributions of values are shown in a violin plot on the right. i, Distribution of the maximum logFC between gene expression clusters in matched scRNA-seq for variable oncogenes (n = 140) versus non-variable oncogenes (n = 159). P = 0.019 (two-sided Wilcoxon test). j, Consensus copy number profiles in two clones in SA1096 overlaid with lines indicating SVs. All box plots indicate the median, first and third quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
Fig. 5. Haplotype-specific parallel copy number evolution.
a–c, Heat maps of chromosomes 1q (a), 10q (b) and 8 (c) ordered by a phylogenetic tree. The tips of the phylogeny are coloured according to the allelic phase of the region of interest. Arrows indicate single cells, the copy number profiles of which are shown below each heat map. d, VAFs of SNVs in parallel copy number events in two haplotype-specific states in which the dominant allele switches between the two states. Each point is the VAF of a single SNV; lines connect the same SNV in the two states. Dashed lines indicate the expected VAF on the basis of the states. e, VAF of mutations (n = 66) present clonally on allele A after computationally mixing data from SA535 chr. 8 in cells with copy number 2|1 and 1|2. Mixing proportion = 0 means that all cells are in state 2|1 and mixing proportion = 1 means that all cells are in state 1|2. f, UMAP of scRNA-seq data from SA1053 coloured by allelic state of genes at the terminal end of chromosome 10. A (hom), n = 1,614 cells; B (hom), n = 890 cells. g, BAF (B-allele frequency) distribution of cells in f. h, Scatter plot of mean BAF per segment across all datasets (n = 828) computed in RNA versus DNA. i, BAF distribution on chromosome 17 in all tumours and cell lines with matched scRNA (n = 21,347 cells DNA; n = 70,553 cells RNA) versus wild-type cell line (n = 1,963 cells DNA; n = 5,752 cells RNA). j, Rate of gains and losses within whole chromosomes (n = 35 events), chromosome arms (n = 31 events) and segments (n = 341 events) on diploid (1|1) and tetraploid (2|2) backgrounds. WGD, whole-genome duplication. k,l, Correlation of the number of parallel copy number events with copy number distance (P = 0.0008) (k) and phylogenetic distance (P = 0.0003) (l). Annotations at the top indicate the correlation coefficient (R) and P value derived from a linear regression; shaded areas in plots show the 95% confidence interval of the linear regression. All box plots indicate the median, first and third quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
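The expected VAF values referred to in panels d and e can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the caption: the mutation is carried on every copy of allele A, reads are sampled in proportion to copy number, and the two cell populations contribute equally per cell. The function names are hypothetical and this is not the authors' implementation.

```python
def expected_vaf(mutated_copies: float, total_copies: float) -> float:
    """Expected variant allele fraction for a given number of mutated copies."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return mutated_copies / total_copies


def mixed_expected_vaf(mixing_proportion: float,
                       state_at_0=(2, 1),
                       state_at_1=(1, 2)) -> float:
    """Expected VAF of a mutation on allele A in a mixture of two states.

    States are (A copies, B copies). A mixing proportion of 0 means all
    cells are in state_at_0 (2|1); 1 means all cells are in state_at_1 (1|2).
    Assumes the mutation sits on every copy of allele A (an assumption).
    """
    if not 0.0 <= mixing_proportion <= 1.0:
        raise ValueError("mixing_proportion must lie in [0, 1]")
    p = mixing_proportion
    a_copies = (1 - p) * state_at_0[0] + p * state_at_1[0]
    total = (1 - p) * sum(state_at_0) + p * sum(state_at_1)
    return expected_vaf(a_copies, total)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, round(mixed_expected_vaf(p), 3))
```

Under these assumptions the expected VAF falls linearly from 2/3 at mixing proportion 0 to 1/3 at mixing proportion 1, passing through 1/2 when the two states are equally represented.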
Fig. 6. Breakpoint serriform variability.
a, Copy number heat maps showing variation in breakpoint location on the horizontal axis across single cells along the vertical axis. Top to bottom: dataset, breakpoint location and number of cells; ideogram indicating the chromosome region shown in the heat map; average copy number across cells in the heat map, with breakpoint-adjacent segment copy number states indicated with dotted black lines; copy number states inferred by HMMCopy; A allele (green) and B allele (purple) copy number states inferred by SIGNALS. Copy number state shading is shown in the adjacent key. Heat map x axis, genomic bins; y axis, cells with the indicated breakpoint. Cells are ordered by breakpoint position (left to right). Arrows in heat map 2 (from left) indicate the cells shown in b. b, Four single-cell copy number profiles from the SA609 SSV event in a. The cell number from the top of the heat map is indicated to the right of the profiles. Top, total copy number; bottom, HSCN. A and B alleles are indicated with green and purple, respectively. The dotted vertical line indicates the cell-specific breakpoint location. c, Breakpoint serration distribution for all cancer datasets for which scores could be computed (see Methods), segregated as FBI (brown), TD (red) and HRD (blue). d, Distribution of serration scores by case; colours as in c (number of scores shown below each violin). e, Mean per-case serration scores versus polyploid cell percentage. f, Mean cell-to-cell HSCN distance (even chromosomes only) per case versus serration (odd chromosomes only). Shaded areas show the 95% confidence interval of the linear regression; the correlation coefficient and P values are annotated at the top (P = 0.001 for e and P = 0.00061 for f). All box plots indicate the median, first and third quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
Extended Data Fig. 1. Study overview.
Experimental and cohort design. Single-cell genomes, transcriptomes and long-read sequencing libraries were generated from isogenic 184-hTERT cell lines (WT, or with TP53, BRCA1 or BRCA2 mutations) (a) or PDX tissue from patients with TNBC and patients with HGSC from a meta-cohort with assigned SV or SNV mutational signatures (b). Single-cell and long-read sequencing (c) was used to examine mutational processes and haplotype-specific genomic diversity, including HLAMPs or rearrangements (d), parallel events (e) and SSVs (f) at single-cell resolution and within clonal and subclonal populations. Generated using Biorender.com.
Extended Data Fig. 2. Sanger sequencing of cell lines and tumour histology.
a,b) Verification of CRISPR–Cas9-induced genotypes of 184-hTERT cell lines. a) Sanger sequencing of TOPO-cloned BRCA1 and BRCA2 regions. b) Western blotting for p53, BRCA1 and BRCA2 proteins for 184-hTERT cell lines, including an additional BRCA2−/− clone, 112.72, and TP53−/− clone, SA1101. GAPDH and vinculin loading controls were performed on the same blot as p53, BRCA1 or BRCA2 probes. Blots shown are representative of n = 3 (WT), n = 3 (BRCA1) and n = 6 (BRCA2) independent experiments. For source blots, refer to Supplementary Fig. 1. c) Histology of HGSC PDXs in the dataset. Scale bars 300 µm and 50 µm as indicated. Images are representative of two cores stained from each PDX tissue.
Extended Data Fig. 3. Validation of the SIGNALS method in an ovarian cancer cell line.
a,b) HSCN from 2 individual cells from the OV2295 cell line. c) Total copy number heat maps and HSCN heat map for 1,084 cells. Each row is an individual cell. Rows are ordered according to a UMAP + HDBSCAN clustering, with clusters annotated on the left-hand side. d) Distribution of VAFs as a function of haplotype-specific state. e,f,g) VAFs in clones where the dominant allele switches between A and B. Each point is the VAF of a mutation, with lines connecting the same mutation in different clones.
Extended Data Fig. 4. DLP summary statistics.
Summary of DLP+ sequencing statistics of data for 184-hTERT cell lines a–d) and HGSC and TNBC tumours e–h). Number of cells for each box plot as indicated in panels a) and e). Shared y axis labels shown at left. The legend for e–h) indicates the number of samples for each cancer and signature type. All box plots indicate the median, 1st and 3rd quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
Extended Data Fig. 5. Haplotype-specific analysis reveals breakage–fusion–bridge processes and parallel losses.
a) Diagram of BFBCs. b) Heat maps of the copy number of each homologue in SA1188. c–f) HSCN and structural variation in clusters B, I, F and the small subpopulation with PIK3CA amplification. Here we plot the copy number for each homologous chromosome in purple for homologue B and green for homologue A. g) Parallel copy number losses in SA906b: total copy number (top) and HSCN (bottom) heat maps for chr2 in SA906b. h) Two individual cells from g). i) UMAP dimensionality reduction plots of scRNA-seq data generated from SA906b; colours indicate the density of loss of chr 2q A vs. B haplotype. j) Enrichment of the haplotype-specific state on chr 2q of nearest neighbour cells (# cells with loss of A = 175, # cells with loss of B = 34, # balanced = 2,066). All box plots indicate the median, 1st and 3rd quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
Extended Data Fig. 6. SNV and SV signatures.
a) SNV and b) SV mutation signatures estimated from HGSC and TNBC bulk tumour mutation catalogues using the MMCTM method. The x axis in a) is the 96-channel (i.e. A[C>A]A, …, T[T>G]T) SNV types. SV types are DEL: interstitial deletions, DUP: tandem duplications, INV: inversions, FBI: fold-back inversions, TR: translocations. SV signature labels are S-Dup: small duplications, M-Dup: medium duplications, L-Dup: large duplications, S-Del: small deletions, L-Del: large deletions, Clust-FBI: clustered fold-back inversions, Clust-SV: clustered other structural variants, Tr: translocations, FBI/Inv: fold-back inversions and inversions.
Extended Data Fig. 7. Meta-cohort signature analysis of 139 TNBC and 170 HGSC bulk whole genomes.
a) Heat map representing individual patients as columns, annotation tracks (top) including cancer type and mutation status of key genes (strata with adjusted p-values ≤ 0.1 shown as coloured bars on left), standardized signature probabilities of SNVs and SVs (middle) and event counts (bottom). b) Signature type (see stratum annotation track) proportions by cancer type. c) SNV and SV count distributions per signature type (number of samples shown below each violin, data points shown left of violins). Kaplan–Meier survival probability of HGSCs faceted by d) HRD and e) more granular signatures (p-values computed using the log-rank test, p = 0.0038 for d) and p = 0.0022 for e)). All box plots indicate the median, 1st and 3rd quartiles (hinges), and the most extreme data points no farther than 1.5x the IQR from the hinge (whiskers).
Extended Data Fig. 8. Summary, quality control and features of single-cell WGS of tumours.
a) UMAP of meta-cohort signature probabilities. Lines connect DLP-pseudobulk to their bulk data counterpart. b) Correlation of proportion of the genome that is LOH between DLP-pseudobulk (horizontal) and matched bulk WGS (vertical). Correlation coefficient (R) and p-value (p) derived from a linear regression in inset, shaded area shows the 95% CI of the linear regression. c) VAF distributions (horizontal) for somatic mutations called in single cells as a function of haplotype-specific state (vertical), coded as integer copy level allele A | integer copy level allele B. Data from all DLP samples are included. d) Heat map showing total copy number (left) and HSCN (right) of single cells from a TNBC HRD-Dup case (SA501). e) Chromosomal gains and losses across different ploidy states and mutational signature grouping. Total counts (black), gains (red), and losses (blue) shown. f) Relationship between gain/loss ratios and number of gained or lost segments for representative datasets from each signature type (left) and all HRD-Dup, TD, or FBI cases (right). g) Differences in copy number segmental gain and loss counts (n = 12 FBI, n = 8 HRD-Dup, n = 3 TD), comparing ploidy-relative case-level consensus copy number profiles (green) and mean cell-level changes relative to clone copy number profiles (purple). h) HSCN distance distributions for all PDX samples. Distribution is over n = 1,000 sampled pairwise HSCN distances. Horizontal black line shows the mean value of the distribution. i) HSCN distance distributions as a function of signature type, each dataset is summarized as the mean of the distributions on the left. P-values indicate per group comparisons using the two-sided Wilcoxon test (n = 12 FBI, n = 8 HRD-Dup, n = 3 TD). j) Number of parallel copy number segments (n, size of circle) and the proportion of segments containing parallel events (f, colour of circle) across all datasets as a function of clonality. Clonal: CCF > 80%, Subclonal: 20% < CCF ≤ 80%, Rare: 1% < CCF ≤ 20%. k) Proportion of segments with parallel CNA in HRD-Dup vs FBI, * = p < 0.05, ns = p > 0.05, two-sided Wilcoxon test (n = 12 FBI, n = 8 HRD-Dup). Exact p-values from left to right, p = 0.85, p = 0.031, p = 0.1. All box plots represent the median, 1st and 3rd quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
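The clonality bins used in panel j can be written as a small lookup. This is only a sketch of the thresholds quoted in the caption: the function name is hypothetical, CCF is assumed to be supplied as a fraction between 0 and 1, and because the caption defines no category for CCF at or below 1%, such values return None here.

```python
from typing import Optional


def classify_clonality(ccf: float) -> Optional[str]:
    """Map a cancer cell fraction (0 to 1) to the caption's clonality bins.

    Clonal: CCF > 80%; Subclonal: 20% < CCF <= 80%; Rare: 1% < CCF <= 20%.
    Values at or below 1% have no bin in the caption, so None is returned
    (a choice made for this sketch).
    """
    if not 0.0 <= ccf <= 1.0:
        raise ValueError("ccf must be a fraction between 0 and 1")
    if ccf > 0.80:
        return "Clonal"
    if ccf > 0.20:
        return "Subclonal"
    if ccf > 0.01:
        return "Rare"
    return None


if __name__ == "__main__":
    for value in (0.95, 0.80, 0.50, 0.20, 0.05, 0.01):
        print(value, classify_clonality(value))
```

Note that the boundaries are inclusive on the upper side of each lower bin, so a CCF of exactly 80% is subclonal and exactly 20% is rare.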
Extended Data Fig. 9. Genomic features of HLAMPs and long-read sequencing validation.
a) Each column is a HLAMP that amplifies an oncogene. Each row is a feature extracted from a region 15 Mb either side of the amplification. Complexity = entropy of haplotype-specific states; #SV = total number of structural variants identified; proportion of SVs of each type: fold-back inversions, duplications, deletions and translocations; #chr = number of chromosomes involved in translocation; bin/chr = ratio of the copy number of the bin containing the oncogene to the average copy number across the chromosome. Ratio is the copy number ratio between the clone with the maximum copy number state and the clone with the minimum copy number state. b–c) HLAMPs involving multiple chromosomes. The left plot shows copy number profiles from pseudobulk clones derived from DLP, with lines indicating rearrangement breakpoints; the right plot shows example long reads from Oxford Nanopore Technologies that support inter-chromosomal translocations. Example reads and their mapping to chromosomes of interest (top right), long-read coverage of genomic region and alignment of all supporting reads (bottom right). b) SA1184 MYC amplification. c) SA1181 chr5q amplification. d) Long-read support for inter-chromosomal alterations involving chromosomes 3 and 6 in SA1096; DLP clone-level plots are shown in Fig. 4j.
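The "complexity" feature in panel a is described as the entropy of haplotype-specific states. One plausible reading is the Shannon entropy of the state labels observed across the bins of a region, sketched below. The logarithm base (2), the representation of states as "A|B" strings and the function name are all assumptions made for illustration; the caption does not specify how the authors computed the value.

```python
from collections import Counter
from math import log2


def state_entropy(states) -> float:
    """Shannon entropy (in bits) of a sequence of haplotype-specific states.

    `states` is an iterable of labels such as "2|1", one per genomic bin.
    Returns 0.0 for an empty input or when every bin shares one state.
    """
    labels = list(states)
    if not labels:
        return 0.0
    total = len(labels)
    entropy = 0.0
    for count in Counter(labels).values():
        frequency = count / total
        entropy -= frequency * log2(frequency)
    return entropy


if __name__ == "__main__":
    uniform_region = ["1|1"] * 8
    complex_region = ["1|1", "2|1", "4|1", "8|0", "2|1", "1|1", "4|1", "8|0"]
    print(state_entropy(uniform_region))
    print(state_entropy(complex_region))
```

A region in a single state scores zero, while a region split evenly across four states scores two bits, so higher values indicate a more heterogeneous mixture of states around the amplification.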

References

    1. Umbreit NT, et al. Mechanisms generating cancer genome complexity from a single cell division error. Science. 2020;368:eaba0712. doi: 10.1126/science.aba0712.
    2. Minussi DC, et al. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature. 2021;592:302–308. doi: 10.1038/s41586-021-03357-x.
    3. Laks E, et al. Clonal decomposition and DNA replication states defined by scaled single-cell genome sequencing. Cell. 2019;179:1207–1221. doi: 10.1016/j.cell.2019.10.026.
    4. Hakem R. DNA-damage repair; the good, the bad, and the ugly. EMBO J. 2008;27:589–605. doi: 10.1038/emboj.2008.15.
    5. Li Y, et al. Patterns of somatic structural variation in human cancer genomes. Nature. 2020;578:112–121. doi: 10.1038/s41586-019-1913-9.
