Bioinformatics. 2024 May 2;40(5):btae202.
doi: 10.1093/bioinformatics/btae202.

A supervised Bayesian factor model for the identification of multi-omics signatures

Jeremy P Gygi et al. Bioinformatics. 2024.

Abstract

Motivation: Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful.
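The cost of running integration and prediction as separate steps can be illustrated with a small synthetic example. The sketch below is not taken from the paper: it uses plain PCA followed by least squares as a stand-in for a generic two-step pipeline, and the simulated data, function names, and dimensions are all hypothetical. The response is driven by a low-variance factor, so an unsupervised first step that keeps only the top factor discards the predictive signal.

```python
import numpy as np

def two_step_fit(X, y, k):
    """Step 1: unsupervised PCA via SVD. Step 2: least squares on the scores."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                      # (features x k), chosen without looking at y
    scores = Xc @ W                   # (samples x k) factor scores
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return mu, W, coef

def two_step_predict(model, X):
    mu, W, coef = model
    scores = (X - mu) @ W
    return coef[0] + scores @ coef[1:]

def simulate(n, rng):
    # Hypothetical toy data: factor 0 has large variance but is unrelated to y;
    # factor 1 has small variance and drives y.
    U = rng.normal(size=(n, 2)) * np.array([5.0, 1.0])
    B = np.zeros((2, 40))
    B[0, :20] = 1.0
    B[1, 20:] = 1.0
    X = U @ B + rng.normal(scale=0.5, size=(n, 40))
    y = U[:, 1] + rng.normal(scale=0.1, size=n)
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = simulate(200, rng)
X_test, y_test = simulate(200, rng)

def held_out_mse(k):
    model = two_step_fit(X_train, y_train, k)
    return float(np.mean((two_step_predict(model, X_test) - y_test) ** 2))

mse_k1 = held_out_mse(1)   # keeps only the high-variance, non-predictive factor
mse_k2 = held_out_mse(2)   # also keeps the low-variance, predictive factor
print(f"test MSE with 1 factor: {mse_k1:.3f}, with 2 factors: {mse_k2:.3f}")
```

In this toy setting the one-factor model predicts poorly because factor selection never saw the response; a supervised factor model aims to avoid that by letting the response inform which factors are constructed.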

Results: We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of coronavirus disease 2019 severity and breast cancer tumor subtypes.
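For orientation, a generic supervised factor model for a Gaussian response can be written as below. This is a textbook-style sketch, not SPEAR's exact specification: the priors, the adaptive weighting between factor structure and prediction, and the ordinal and multinomial likelihoods are not given on this page. The symbols X, Y, U, and B follow the Figure 1 caption; the index d over data types, the coefficient vector β, and the noise terms are assumptions introduced here for illustration.

```latex
% Generic supervised factor model (illustrative; not SPEAR's exact specification).
% X^{(d)}: N x p_d data matrix for omics type d;  U: N x K factor scores;
% B^{(d)}: p_d x K factor loadings;  \beta: K x 1 response coefficients (hypothetical symbol).
\begin{align}
X^{(d)} &= U \, B^{(d)\top} + E^{(d)}, \qquad d = 1, \dots, D, \\
Y &= U \beta + \varepsilon .
\end{align}
```

A two-step pipeline would estimate U from the first equation alone and then fit β; a supervised model estimates U so that both equations are explained together.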

Availability and implementation: SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.

Conflict of interest statement

S.H.K. receives consulting fees from Peraton. All other authors declare that they have no competing interests.

Figures

Figure 1.
SPEAR workflow overview: (A) SPEAR takes multi-omics data (X) taken from the same N samples, as well as a response of interest (Y). SPEAR supports Gaussian, ordinal, and multinomial types of responses. From these inputs, the algorithm first automatically estimates the minimum number of factors to use in the SPEAR model. X and Y are then jointly modeled in a variational Bayesian framework to adaptively construct factor loadings (B) and scores (U) that explain variance of X and are predictive of Y (reflected in B¯). (B) SPEAR factors are used to predict Y and provide probabilities of class assignment for ordinal and multinomial responses. (C) Downstream biological interpretation of factors is facilitated via automatic feature selection, expression profile analysis, enrichment analysis, and analyte correlation
Figure 2.
Gaussian simulation results. (A) Boxplots of mean-squared errors (MSE) of the models on the testing data. MSE results for each simulated iteration are connected. Results are shown for varying signal-to-noise ratios, including low, moderate, and high signals. (B) Scatterplots of various factor scores (y-axis) against the true Gaussian response (x-axis) of the moderate signal test data. Color is applied to factor scores found to be correlated with true factors 1 (red) and 2 (blue), with true factors 3–5 designated as grey. (C) Correlation matrix showing the Spearman correlation between each derived factor and the true factors for both the training and testing data of the moderate signal simulated dataset. Significant correlations are denoted with *P ≤ 0.001.
Figure 3.
TCGA-BC Tumor Subtype and COVID-19 Severity Prediction Results. (A, E) Test sample class predictions of the SPEAR model, colored by true class. (B, F) Multi-class AUROC statistics for each model for the LumB and Moderate classes. Error bars show the 95% confidence interval found via 2,000 stratified bootstrapping replicates. Significance testing is denoted as *P ≤ 0.05, **P ≤ 0.005, and ***P ≤ 0.0005. (C) AUROC plot for all models predicting LumB subtype. (G) AUROC plot for all models predicting the moderate severity class. (D, H) Balanced misclassification errors of SPEAR, MOFA, DIABLO, and Lasso on test samples from the (D) TCGA (Breast Cancer) dataset and (H) COVID-19 dataset.
Figure 4.
Downstream TCGA-BC Analysis. (A) Grouped violin plot of Factors 1–3 scores (y-axis) and tumor subtype (x-axis), with group means marked with a line. (B) 3D scatter plot, embedding samples by Factors 1, 2, and 3 scores. Samples are colored by tumor subtype. (C) Dotplot of GSEA results on mRNA features for Factors 1–3. Points are shaded by −log(P.adjusted) with color representing enrichment direction. (D) GSEA plot for Estrogen Response (Early) Hallmark pathway for SPEAR Factor 1. mRNA genes are ranked by their assigned projection coefficient from SPEAR Factor 1. (E) Heatmap showing normalized expressions for the top 24 mRNA genes involved in the Estrogen Response (Early) Hallmark pathway. mRNA genes were selected with a factor loading (projection coefficient) magnitude ≥0.02. Samples were ranked by Factor 1 score (x-axis) and genes were ranked by projection coefficient (y-axis). Also shown are corresponding true tumor subtypes (True) and SPEAR-predicted tumor subtypes (Pred)
Figure 5.
Downstream COVID-19 dataset analysis. (A) Grouped violin plot of factor 2 scores (y-axis) and simplified WHO score (x-axis), with group means marked with a line. (B) Grouped violin plot of factor 8 scores (y-axis) and simplified WHO score (x-axis), with group means marked with a line. (C) GSEA plot for IL6 JAK STAT3 Signaling Hallmark pathway for SPEAR Factor 2. Proteins are ranked by their assigned projection coefficient from SPEAR factor 2. (D) Embedding of samples by factor 8 (x-axis) and factor 2 (y-axis) scores. Samples are colored by WHO Ordinal Score, normalized IL6 expression, and normalized kynurenine expression. (E) Heatmap showing normalized expressions for proteins involved in the IL6 JAK STAT3 Signaling Hallmark pathway. Samples were ranked by Factor 2 score (x-axis) and proteins were ranked by projection coefficient (y-axis). Also shown are corresponding true patient severity scores (True) and SPEAR (sd) predicted severity scores (Pred). (F) Alluvial plot showing correlation between the IL6 JAK STAT3 proteins and the top metabolite features contributing to SPEAR Factor 2. Metabolites are grouped by positive/negative correlation and super-pathway. Also shown are normalized plasmalogen expressions of samples ranked by Factor 2 score (x-axis)
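
The balanced misclassification error reported in Figure 3 (D, H) is not defined on this page. A minimal sketch, assuming the common definition (the unweighted mean of per-class error rates, i.e. one minus balanced accuracy), is shown below; the function name and the example labels are hypothetical and are not data from the study.

```python
import numpy as np

def balanced_misclassification_error(y_true, y_pred):
    """Mean of per-class error rates, so each class counts equally
    regardless of how many samples it has (assumed definition)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    per_class_error = [np.mean(y_pred[y_true == c] != c) for c in classes]
    return float(np.mean(per_class_error))

# Hypothetical, imbalanced labels: 8 samples of one class, 2 of another.
y_true = ["LumA"] * 8 + ["LumB"] * 2
y_pred = ["LumA"] * 8 + ["LumA", "LumB"]   # one of the two LumB samples is missed

plain_error = float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))
balanced_error = balanced_misclassification_error(y_true, y_pred)
print(plain_error, balanced_error)
```

Here the plain error rate is one mistake in ten samples, while the balanced error averages an error rate of 0 for the majority class and 0.5 for the minority class, which is why the balanced version is more informative when class sizes differ.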
