Nat Biotechnol. 2021 Feb;39(2):215-224.
doi: 10.1038/s41587-020-0652-7. Epub 2020 Sep 14.

Oncoprotein-specific molecular interaction maps (SigMaps) for cancer network analyses


Joshua Broyde et al. Nat Biotechnol. 2021 Feb.

Abstract

Tumor-specific elucidation of physical and functional oncoprotein interactions could improve the characterization of tumorigenic mechanisms and the prediction of therapeutic response. Current interaction models and pathways, however, lack context specificity and are not oncoprotein specific. We introduce SigMaps as context-specific networks comprising the modulators, effectors and cognate binding partners of a specific oncoprotein. SigMaps are reconstructed de novo by integrating diverse evidence sources (including protein structure, gene expression and mutational profiles) via the OncoSig machine learning framework. We first generated a KRAS-specific SigMap for lung adenocarcinoma, which recapitulated published KRAS biology, identified novel synthetic lethal proteins that were experimentally validated in three-dimensional spheroid models and established previously uncharacterized crosstalk with RAB/RHO. To show that OncoSig is generalizable, we then inferred SigMaps for the ten most mutated human oncoproteins and, finally, for the full repertoire of 715 proteins in the COSMIC Cancer Gene Census. Taken together, these SigMaps show that the cell’s regulatory and signaling architecture is highly tissue specific.

Conflict of interest statement

A.C. is founder and equity holder of DarwinHealth Inc., a company that has licensed some of the algorithms used in this manuscript from Columbia University. Columbia University is also an equity holder in DarwinHealth Inc.

Figures

Extended Data Figure 1. The top 40 predictions from the OncoSigNB algorithm for the KRAS LUAD SigMap were chosen for validation.
(a): Performance of OncoSigNB at recovering the Ingenuity-derived PGSS as a function of LRPost. At LRPost = 53 (probability = 0.50; vertical gray line), the OncoSigNB LUAD-specific KRAS SigMap contains 10% of the PGSS (horizontal gray line). The vertical red line corresponds to LRPost = 240, the cutoff used to select candidates for experimental validation. (b): ROC curve analysis, evaluated as the recovery of the Ingenuity-derived PGSS (FPR ≤ 0.05), for (1) OncoSigNB (green curve, N = 1,028), (2) Pearson’s correlation between the mRNA expression of KRAS and that of other proteins in LUAD (blue curve), and (3) random performance (black curve). Recovery using 2-fold cross-validation (green) is essentially indistinguishable from recovery using 100-fold Monte Carlo cross-validation (not shown). The OncoSigNB LUAD-specific KRAS SigMap contains 393 predictions with LRPost ≥ 53, corresponding to probability ≥ 0.50 and FPR ≤ 0.019 (purple dot), and 40 predictions with LRPost ≥ 240, corresponding to probability ≥ 0.82 and FPR ≤ 0.0018 (yellow dot). The top 40 predictions are listed by gene name in (c). (c): Orange and blue boxes contain, respectively, known upstream regulators and downstream effectors that are successfully recovered by OncoSigNB. Italicized text indicates proteins known to interact physically with KRAS. The box titled “validated predictions” shows the novel OncoSigNB predictions tested in the RNAi negative screen; those experimentally found to affect cell growth in a KRAS-dependent context are highlighted in bold.
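The caption pairs LRPost = 53 with probability 0.50 and LRPost = 240 with probability ≥ 0.82. Both pairs are consistent with the standard Bayesian conversion, posterior odds = likelihood ratio × prior odds, if the prior odds are taken to be 1/53. The minimal sketch below assumes that conversion; the prior-odds value is inferred from the caption’s 0.50 point rather than reported, and `lr_to_probability` is a hypothetical helper, not part of OncoSig.

```python
def lr_to_probability(lr_post: float, prior_odds: float = 1 / 53) -> float:
    """Convert a likelihood ratio into a posterior probability.

    Assumption: posterior odds = LR * prior odds, with prior_odds = 1/53
    inferred from the caption's pairing of LRPost = 53 with probability 0.50.
    """
    posterior_odds = lr_post * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


for lr in (53, 240):
    print(f"LRPost = {lr:>3} -> probability = {lr_to_probability(lr):.2f}")
```

Under this assumption, LRPost = 53 maps to 0.50 and LRPost = 240 maps to about 0.82, matching both operating points in the caption.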
Extended Data Figure 2. The OncoSigRF and OncoSigNB algorithms produce highly similar KRAS LUAD SigMaps.
(a) Comparison of ROC curves (FPR ≤ 0.05) for LUAD-specific KRAS SigMaps predicted by OncoSigNB (green and blue curves) and OncoSigRF (orange and red curves) trained on the Ingenuity PGSS and the MSigDB PGSS, respectively. (b) Gene Set Enrichment Analysis (GSEA) of the top 100 OncoSigNB LUAD-specific KRAS SigMap predictions within the OncoSigRF LUAD-specific KRAS SigMap predictions, ranked by OncoSigRF score (SRF). Both the OncoSigNB predictions tested in the knockdown experiments (red lines) and the remaining top 100 OncoSigNB predictions (blue lines) are highly enriched at the top of the OncoSigRF ranking (p = 5.6 × 10^-8 and p = 1.7 × 10^-19, respectively).
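The caption does not specify which GSEA variant was used. As an illustration only, the sketch below computes the classic unweighted running-sum enrichment score: walk down the OncoSigRF ranking, step up at each member of the query set (here, the top 100 OncoSigNB predictions) and down at each non-member, and report the largest deviation from zero. A large positive score means the set is concentrated at the top of the ranking. The p-values quoted above would additionally require a permutation or analytic null, which is not shown; `enrichment_score` is a hypothetical name.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted GSEA-style running-sum enrichment score.

    ranked_genes is ordered from highest to lowest score; the result is the
    maximum signed deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a proper, non-empty subset of the ranking")
    up, down = 1.0 / n_hit, 1.0 / n_miss  # each walk sums to +1 and -1 overall
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += up if is_hit else -down
        if abs(running) > abs(best):
            best = running
    return best
```

A set sitting entirely at the top of the ranking reaches the maximum score of +1; a set sitting entirely at the bottom reaches -1.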
Extended Data Figure 3. The log2FC of shRNA abundance for each of the novel proteins tested in the KRAS negative selection screen.
The 3–5 points plotted for each protein correspond to the individual shRNAs that target that protein’s mRNA (N = 100). The x-axis is sorted by the mean log2FC of all shRNAs targeting each gene, and point colors shift from red to green with mean log2FC.
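A minimal sketch of how per-hairpin log2FC values and the per-gene sort order in this panel could be computed from sequencing read counts. It assumes that counts are normalized to reads per million with a pseudocount of 1 and that the fold change compares a later time point with an earlier one; the caption states neither choice, and both function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def log2_fold_changes(
    early: Dict[str, int], late: Dict[str, int], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Per-shRNA log2 fold change of depth-normalized abundance (late vs early)."""
    early_total, late_total = sum(early.values()), sum(late.values())
    lfc = {}
    for hairpin, early_count in early.items():
        # Normalize to reads per million, then add a pseudocount to avoid log(0).
        early_rpm = early_count / early_total * 1e6 + pseudocount
        late_rpm = late.get(hairpin, 0) / late_total * 1e6 + pseudocount
        lfc[hairpin] = math.log2(late_rpm / early_rpm)
    return lfc


def genes_by_mean_lfc(
    lfc: Dict[str, float], hairpin_to_gene: Dict[str, str]
) -> List[Tuple[str, float]]:
    """Average hairpin log2FC per gene; sort from most depleted to most enriched."""
    per_gene: Dict[str, List[float]] = defaultdict(list)
    for hairpin, value in lfc.items():
        per_gene[hairpin_to_gene[hairpin]].append(value)
    return sorted(((gene, mean(vals)) for gene, vals in per_gene.items()), key=lambda gv: gv[1])
```

For example, a gene whose two hairpins each drop from 25% to 12.5% of all reads gets a mean log2FC close to -1 and sorts ahead of a gene whose hairpins are enriched.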
Extended Data Figure 4. OncoSigRF predictions are highly enriched in oncogenic KRAS^Mut dependencies.
(a): GSEA of KRAS^Mut synthetic lethal partners (25) (blue lines, N = 216) within the top 500 OncoSigRF LUAD-specific KRAS SigMap predictions, obtained by training on a modified PGSS from which the intersection with the synthetic lethal set was removed. The inset shows the GSEA using all OncoSigRF predictions obtained in this way, ranked by OncoSigRF score. Enrichment analysis was performed with the aREA (analytic Rank-based Enrichment Analysis) algorithm (10). (b): Enrichment of the protein resistance signature to the ERK inhibitor SCH772984 (28) (blue lines, N = 24) within OncoSigRF LUAD-specific KRAS SigMap predictions. (c): Enrichment of proteins involved in response to reactive oxygen species (GO:0000302) (8, 29) (blue lines, N = 276) within OncoSigRF LUAD-specific KRAS SigMap predictions.
Extended Data Figure 5. OncoSigRF KRAS SigMaps exhibit tissue context specificity.
(a): ROC curves for the OncoSigRF KRAS SigMaps in LUAD (red), LUSC (gray), COAD (brown), and PAAD (orange) for FPR ≤ 0.05. Performance is evaluated as the recovery of established KRAS pathway proteins. (b): Gene set enrichment analysis (GSEA) of KRAS^Mut synthetic lethal partners, as determined by Corcoran et al. (27) (N = 48, blue lines). To avoid training and testing on the same proteins, OncoSigRF predictions for COAD-specific KRAS SigMap proteins were obtained by training on a modified PGSS from which all established KRAS^Mut synthetic lethal proteins had been removed. Enrichment analysis was performed with the aREA (analytic Rank-based Enrichment Analysis) algorithm. (c): Scatterplot of OncoSigRF scores for KRAS SigMap proteins in PAAD vs LUAD (N = 19,789). Each dot represents the scores for one protein. Darker colored points have high scores (SRF ≥ 0.5) in at least one context, and lighter colored points score poorly in both contexts (SRF < 0.5). R^2 (PAAD vs LUAD) = 0.037. (d): OncoSigRF COAD-specific KRAS SigMap in the form depicted conceptually in Figure 1a. To prevent visual clutter, only the top 33 OncoSigRF predictions (FPR ≤ 0.01) that are also VIPER-inferred KRAS interactors (p ≤ 0.01), PrePPI-predicted KRAS physical interactors, or both are depicted. Bold and regular node labels represent established and novel predictions, respectively; orange and blue node colors represent upstream regulators and downstream effectors, respectively; red, blue, and black node borders represent predictions that are druggable (Drug Repurposing Hub (22)), KRAS^Mut synthetic lethal partners from the literature and validated here (see text), and both, respectively; solid orange and blue lines and gray nodes represent PrePPI-predicted physical interactors of KRAS.
Extended Data Figure 6. OncoSigRF SigMaps for hypermutated oncoproteins are retrospectively validated.
(a) Pairwise overlap of established pathway proteins (left) and of the OncoSigRF LUAD-specific SigMaps (FPR ≤ 0.01, right) for the ten hypermutated oncoproteins (row and column labels). Percent overlap is color-coded according to the scale at top. (b) SigMap predictions are highly enriched in 600 EGFR-centric network proteins (52) (p = 2.3 × 10^-43). Enrichment analysis was performed with the aREA (analytic Rank-based Enrichment Analysis) algorithm. (c) Box plots of the OncoSigRF LUAD-specific EGFR SigMap scores for two subsets of the curated EGFR pathway proteins from Astsaturov et al. (52): those identified as EGFR synthetic lethal partners (red, N = 58) and those not identified as synthetic lethal (gray, N = 542). The p-value (2 × 10^-4) was calculated using Welch’s two-sample t-test.
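Welch’s t-test compares two group means without assuming equal variances, which suits the very unequal group sizes in (c) (N = 58 vs N = 542). The sketch below computes the t statistic and the Welch–Satterthwaite degrees of freedom with the Python standard library only; the two-sided p-value then follows from the t distribution, for example via SciPy’s `scipy.stats.ttest_ind(a, b, equal_var=False)`. The helper name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df
```

Because the degrees of freedom are estimated from the data, they are generally not an integer.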
Figure 1. Protein-specific molecular interaction Signaling Map (SigMap) and the OncoSigRF algorithm
(a) Graphical representation of a SigMap for an anchor oncoprotein (red node). The SigMap comprises: (i) upstream activity modulators (orange nodes); (ii) downstream effectors responsible for mediating its pathophysiologic function (blue nodes); (iii) structural cognate binding partners (gray nodes), which may be either modulators (solid orange lines) or effectors (solid blue lines); and (iv) auto-regulatory loops connecting downstream effectors to upstream modulators (dashed green lines). To avoid unnecessary clutter, the implicit arrows connecting upstream modulators to the red node and the red node to its downstream effectors are omitted. Thus, the only interactions explicitly denoted by an edge are physical protein-protein interactions and auto-regulatory interactions between modulator and effector proteins. (b) Networks used to train the OncoSigRF algorithm. (i) PrePPI predicts interactions between a protein (red) and its physical and/or functional interactors (gray). (ii) The ARACNe algorithm predicts transcription factors or signaling molecules (red) that transcriptionally regulate target genes (blue). (iii) CINDy predicts signaling molecules (orange/red) that post-translationally modify transcription factors (blue boxes), which in turn leads to differential expression of the transcription factors’ targets (blue diamonds). (iv) The VIPER algorithm infers downstream effectors (blue) and upstream regulators (orange) for a given protein (red). VIPER associates (1) missense mutations (black dot) in the protein (red) with changes in the activity of transcription factors (blue) and (2) missense mutations (black dots) in signaling molecules (orange) with changes in the activity of the protein (red). (c) Feature matrix for the OncoSigRF algorithm.
Networks in (b) are encoded as a feature matrix (dark green box), where rows correspond to proteins in the human proteome, columns correspond to proteins for which evidence exists in PrePPI, ARACNe, CINDy, and VIPER, respectively, and each entry is a scalar proportional to the confidence in the corresponding interaction, as described in the literature. These scalars are likelihood ratios for PrePPI, mutual information for ARACNe, the number of SP-coF/TF-gene triplets for CINDy, and −log10 p-values for VIPER. The gold column indicates whether a protein is an established member of a particular pathway (PGSS; value of 1) or not (value of 0). Only a few components of a small subset of proteins are shown. The feature vector for LIMD1 (highlighted in light green) is described in the text. The last column gives the OncoSigRF scores of these proteins for the LUAD-specific KRAS SigMap (see Figure 2). SP = signaling protein, coF = co-factor, and TF = transcription factor.
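The feature-matrix layout in (c) can be sketched as follows. This is an illustrative reconstruction, not the OncoSig code: it assumes each evidence source is supplied as a mapping from (row protein, partner protein) pairs to a confidence score, encodes absent interactions as 0, and uses hypothetical protein names (P1, P2, A, B) in the example call.

```python
from typing import Dict, List, Set, Tuple

# Assumed input format: evidence[source][(protein, partner)] = confidence score,
# e.g. a PrePPI likelihood ratio or a VIPER -log10 p-value.
Evidence = Dict[str, Dict[Tuple[str, str], float]]


def build_feature_matrix(
    proteins: List[str], evidence: Evidence, pgss: Set[str]
) -> Tuple[List[Tuple[str, str]], List[List[float]], List[int]]:
    """Encode evidence networks as a protein-by-(source, partner) matrix.

    Missing interactions are encoded as 0.0. The gold label is 1 for
    established pathway members (PGSS) and 0 otherwise.
    """
    # One column per (evidence source, partner protein) pair, in a stable order.
    columns = sorted(
        {(source, partner) for source, edges in evidence.items() for (_, partner) in edges}
    )
    rows = [
        [evidence[source].get((protein, partner), 0.0) for source, partner in columns]
        for protein in proteins
    ]
    labels = [1 if protein in pgss else 0 for protein in proteins]
    return columns, rows, labels


columns, rows, labels = build_feature_matrix(
    ["P1", "P2"],
    {"PrePPI": {("P1", "A"): 120.0}, "VIPER": {("P2", "A"): 3.2, ("P1", "B"): 2.0}},
    {"P2"},
)
```

The resulting `rows` and `labels` could then be passed to a random-forest classifier (for example scikit-learn’s `RandomForestClassifier`) to score proteins for pathway membership; the exact OncoSigRF training settings are not described in this legend.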
Figure 2. The OncoSigRF LUAD-specific KRAS SigMap
(a): ROC curves showing performance at recovering established KRAS-pathway proteins (FPR ≤ 0.05) for OncoSigRF (red curve, N = 1,114), Pearson’s correlation between the mRNA expression of KRAS and that of other proteins in LUAD (green curve, N = 957), and random prediction (black curve). The inset shows the full ROC curves (red curve, N = 19,548; green curve, N = 18,891). (b): The ROC curve in (a) for FPR ≤ 0.01 (N = 263), split according to whether predictions correspond to (i) established KRAS-pathway proteins (top panel, yellow circles), with the best-known KRAS-pathway proteins individually labeled, or (ii) novel KRAS SigMap proteins (bottom panel, white circles). Circles annotate predictions as druggable (Drug Repurposing Hub) (red), experimentally validated KRAS interactors (BioGRID) (blue), or both (black). (c): OncoSigRF LUAD-specific KRAS SigMap in the form depicted conceptually in Figure 1a. To prevent visual clutter, only the top 68 OncoSigRF predictions that are also VIPER-inferred KRAS interactors, PrePPI-predicted physical interactors, or both are depicted. Bold and regular node labels represent established and novel predictions, respectively; orange and blue node colors represent upstream regulators and downstream effectors, respectively; red, black, and purple node borders represent predictions that are druggable (Drug Repurposing Hub), KRAS^Mut synthetic lethal partners from the literature or validated in this study, and both, respectively; solid orange and blue lines and gray nodes represent PrePPI-predicted physical KRAS interactors; green dashed lines represent auto-regulatory and feed-forward loop interactions.
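The N values in (a) and (b) appear to count how many top-ranked proteins are called before the false-positive rate reaches the stated cap. A minimal sketch of such a truncated ROC curve, assuming proteins are ranked by decreasing score and labeled 1 if they belong to the established pathway set; `partial_roc` is a hypothetical name, and ties are broken by input order for simplicity.

```python
from typing import List, Sequence, Tuple


def partial_roc(
    scores: Sequence[float], labels: Sequence[int], max_fpr: float = 0.05
) -> List[Tuple[float, float, int]]:
    """ROC points (FPR, TPR, number of predictions) up to max_fpr.

    Each prefix of the score ranking is one prediction set.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")
    # Stable sort, so tied scores keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0, 0)]
    for k, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
        else:
            fp += 1
        fpr = fp / n_neg
        if fpr > max_fpr:
            break  # stop once the false-positive cap is exceeded
        points.append((fpr, tp / n_pos, k))
    return points
```

The last point’s third element is the number of predictions made at the FPR cap, which is the kind of count reported as N above.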
Figure 3. Experimental validation of the OncoSigRF LUAD-specific KRAS SigMap
(a): Schematic of the pooled shRNA negative screen experiments. In the protocol implemented, an average of four shRNAs target each gene. KRAS^G12D/+; p53^fl/fl primary tumor cells (green patches) are isolated from the mouse and placed in a semi-solid 3D matrix (cylinder). A pooled shRNA knockdown is performed (Day 1), and each cell stochastically integrates one shRNA into its DNA. Cells that integrate different shRNAs are shown in red (shRNAs for novel predictions), green and purple (positive controls), and black (background pool). Some cells and their daughter cells form spheroids (Day 6). The spheroids are dissociated, reseeded in a new matrix, and re-form (Day 12). The fold change (FC) in shRNA abundance is measured by deep sequencing of the shRNAs at days 6 and 12. (b): Plot of log2FC for shRNAs targeting predicted KRAS functional partners (red, N = 100), known members of the KRAS signaling pathways (RALGDS, MAPK1, RASA1 and AKT1) (purple, N = 17), and two synthetic lethal positive controls (NUP205 and TBK1) (green, N = 8). Black dots show the log2FC of shRNAs targeting 515 genes within the Background Pooled Screens (BPS, N = 2,286) that are not expected to be involved in KRAS-regulated signaling. The x-axis is the normalized rank, calculated by ranking the log2FC values within each set of shRNAs and dividing by the number of shRNAs in that set. Each gene is represented by several dots, corresponding to different shRNAs. See Extended Data Figure 3 for details. (c): Density plots of log2FC for predicted KRAS functional partners (red), each individual BPS (gray), and the average of all BPS (black). (d): Fold change versus significance for genes whose knockdown significantly reduces organoid growth in the pooled shRNA negative screen (FDR < 0.05, gray dotted line). The log2FC of each gene’s shRNAs, averaged after removing one or two outlier hairpins, is plotted against log10-transformed, FDR-adjusted p-values. shRNAs are colored as in (b).
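Two computations in this legend can be sketched briefly: the normalized rank on the x-axis of (b) and the FDR adjustment in (d). The sketch assumes that rank 1 is the most depleted log2FC and that the FDR-adjusted p-values come from the Benjamini–Hochberg procedure; the legend states neither, and it does not specify the per-gene test that produces the raw p-values. Both function names are hypothetical.

```python
from typing import List, Sequence


def normalized_ranks(log2fc: Sequence[float]) -> List[float]:
    """Rank log2FC values within one shRNA set and divide by the set size.

    Assumes rank 1 is the most depleted shRNA, so values lie in (0, 1].
    """
    order = sorted(range(len(log2fc)), key=lambda i: log2fc[i])
    ranks = [0.0] * len(log2fc)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank / len(log2fc)
    return ranks


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for position in range(m - 1, -1, -1):
        i = order[position]
        running_min = min(running_min, pvalues[i] * m / (position + 1))
        adjusted[i] = running_min
    return adjusted
```

Dividing by set size puts shRNA sets of very different sizes (N = 100 vs N = 2,286) on a common 0-to-1 axis; a gene would then be called significant in (d) if its adjusted p-value is below 0.05.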
Figure 4. KRAS SigMap tumor context specificity
Scatterplots of OncoSigRF scores for KRAS SigMap proteins in (a) the LUSC-vs-LUAD (N = 19,790) and (b) the COAD-vs-LUAD (N = 19,438) tumor contexts. Each dot represents one protein. Gold and gray points represent established and novel predictions, respectively. Darker colored points have high scores (SRF ≥ 0.5) in at least one context, and lighter colored points have low scores (SRF < 0.5) in both contexts and should therefore not be compared. The squared correlation coefficients are R^2 (LUSC vs LUAD) = 0.35 (p < 10^-267) and R^2 (COAD vs LUAD) = 0.10 (p < 10^-165), respectively (Welch’s two-sample t-test). Specific points highlighted in black and green are discussed in the text.
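The R^2 values above summarize how well one protein’s OncoSigRF score in one tumor context predicts its score in another. A minimal sketch, assuming Pearson correlation (the legend does not name the estimator), plus a helper that flags the proteins the legend treats as comparable (SRF ≥ 0.5 in at least one context). Both function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between paired scores (one pair per protein)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def comparable(x: Sequence[float], y: Sequence[float], cutoff: float = 0.5) -> List[bool]:
    """True where a protein scores at or above the cutoff in at least one context."""
    return [a >= cutoff or b >= cutoff for a, b in zip(x, y)]
```

A low R^2, such as 0.10 for COAD vs LUAD, means that a protein’s score in one context says little about its score in the other, which is the sense in which the SigMaps are context specific.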
Figure 5. OncoSigRF LUAD-specific SigMap analysis of hypermutated oncoproteins
(a): ROC curves showing OncoSigRF’s performance at identifying established pathway proteins in LUAD-specific SigMaps for the 10 oncoproteins listed in the legend (FPR ≤ 0.05, N ≈ 1,000). The thick red line shows the performance of OncoSigRF for the LUAD-specific KRAS SigMap from Figure 2a, for reference. (b): Number of literature-derived KRAS^Mut synthetic lethal partners predicted by each of the ten SigMaps, as a function of OncoSigRF score. OncoSigRF scores are binned, and the bins are colored from dark red (highest scores) to white (SRF < 0.50), as depicted in the leftmost column. The number of predictions per score bin is color-coded according to the legend at the top.
Figure 6. OncoSig: Generalization of OncoSigRF to Cancer Gene Census proteins
(a): ROC curve analysis of LUAD-specific KRAS SigMaps generated by OncoSig (blue curve) versus OncoSigRF (red curve), FPR ≤ 0.10. Established KRAS signaling proteins recovered by OncoSig and OncoSigRF are labeled in gray and black, respectively. (b-d): LUAD-specific OncoSig SigMaps for three Cancer Gene Census proteins in the form depicted conceptually in Figure 1a: SMARCA4 (b), MET (c), and BIRC6 (d). Orange and blue node colors represent upstream regulators and downstream effectors, respectively; solid orange and blue lines and gray nodes represent PrePPI-predicted physical interactors; red node borders represent predictions that are druggable (Drug Repurposing Hub); blue node text represents experimentally observed interactions (BioGRID); and dashed lines represent regulatory interactions. In (b), the gray dashed box includes map members involved in chromatin organization (GO:0016568); green shading includes map members involved in histone acetylation (GO:0016573); and blue shading includes map members involved in chromatin remodeling (GO:0006338). In (d), blue shading includes the map members involved in protein ubiquitination (GO:0016567), and green shading includes map members involved in protein sumoylation (GO:0016925).

References

    - Prahallad A et al. Unresponsiveness of colon cancer to BRAF(V600E) inhibition through feedback activation of EGFR. Nature 483, 100–103, doi:10.1038/nature10868 (2012).
    - Bild AH et al. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439, 353–357, doi:10.1038/nature04296 (2006).
    - Krogan NJ, Lippman S, Agard DA, Ashworth A & Ideker T. The cancer cell map initiative: defining the hallmark networks of cancer. Mol Cell 58, 690–698, doi:10.1016/j.molcel.2015.05.008 (2015).
    - Greene CS et al. Understanding multicellular function and disease with human tissue-specific networks. Nat Genet 47, 569–576, doi:10.1038/ng.3259 (2015).
    - Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120, doi:10.1038/ng.2764 (2013).

Methods-only References

    - Woo JH et al. Elucidating Compound Mechanism of Action by Network Perturbation Analysis. Cell 162, 441–451, doi:10.1016/j.cell.2015.05.056 (2015).
    - Duan Q et al. LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures. Nucleic Acids Res 42, W449–460, doi:10.1093/nar/gku476 (2014).
    - Kramer A, Green J, Pollard J Jr. & Tugendreich S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30, 523–530, doi:10.1093/bioinformatics/btt703 (2014).
    - Li W & Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659, doi:10.1093/bioinformatics/btl158 (2006).
    - Arlot S & Celisse A. A survey of cross-validation procedures for model selection. Statist. Surv. 4, 40–79, doi:10.1214/09-SS054 (2010).
