  • Brief Communication

The tidyomics ecosystem: enhancing omic data analyses

Abstract

The growth of omic data presents evolving challenges in data manipulation, analysis and integration. Addressing these challenges, Bioconductor provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming offers a revolutionary data organization and manipulation standard. Here we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analyzing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas, spanning six data frameworks and ten analysis tools.

Fig. 1: Overview of the tidyomics ecosystem.
Fig. 2: Performance of the tidyomics ecosystem.

Data availability

Human Cell Atlas peripheral blood mononuclear single-cell data were downloaded from the CELLxGENE database. The relative weblink and CELLxGENE accession code for each sample are listed in Supplementary Table 1. The samples analyzed are accessible at the Human Cell Atlas. Metadata and gene-transcript abundance for these datasets, from the CuratedAtlasQuery database, are accessible at sample_metadata.0.2.3.parquet. Source data are provided with this paper.

Code availability

The tidyomics homepage, https://github.com/tidyomics (ref. 31), links to the constituent packages. The following packages are available from Bioconductor: the tidyomics meta-package (bioconductor.org/packages/tidyomics/), tidySummarizedExperiment (bioconductor.org/packages/tidySummarizedExperiment), tidySingleCellExperiment (bioconductor.org/packages/tidySingleCellExperiment) and tidySpatialExperiment (bioconductor.org/packages/tidySpatialExperiment/). The code used to benchmark workflow efficiency and to analyze peripheral blood mononuclear cells from the Human Cell Atlas, together with the source data for Fig. 2h, is available at github.com/tidyomics/tidyomics_paper.

References

  1. Tarazona, S., Arzalluz-Luque, A. & Conesa, A. Undisclosed, unmet and neglected challenges in multi-omics studies. Nat. Comput. Sci. 1, 395–402 (2021).

  2. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).

  3. Wickham, H. et al. Welcome to the Tidyverse. J. Open Source Softw. 4, 1686 (2019).

  4. Li, P. Computation and Visualization of Package Download Counts and Percentiles [R package packageRank version 0.8.3] (R Project, 2023).

  5. Çetinkaya-Rundel, M. et al. An educator’s perspective of the tidyverse. Preprint at https://doi.org/10.48550/arXiv.2108.03510 (2021).

  6. Lee, S., Cook, D. & Lawrence, M. plyranges: a grammar of genomic data transformation. Genome Biol. 20, 4 (2019).

  7. Mangiola, S., Doyle, M. A. & Papenfuss, A. T. Interfacing Seurat with the R tidy universe. Bioinformatics https://doi.org/10.1093/bioinformatics/btab404 (2021).

  8. Mangiola, S., Molania, R., Dong, R., Doyle, M. A. & Papenfuss, A. T. tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biol. 22, 42 (2021).

  9. Mu, W. et al. bootRanges: flexible generation of null sets of genomic ranges for hypothesis testing. Bioinformatics 39, btad190 (2023).

  10. Keyes, T. J., Koladiya, A., Lo, Y.-C., Nolan, G. P. & Davis, K. L. tidytof: a user-friendly framework for scalable and reproducible high-dimensional cytometry data analysis. Bioinform. Adv. 3, vbad071 (2023).

  11. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).

  12. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  13. Davis, E. S. et al. matchRanges: generating null hypothesis genomic ranges via covariate-matched sampling. Bioinformatics 39, btad197 (2023).

  14. Amezquita, R. A. et al. Orchestrating single-cell analysis with Bioconductor. Nat. Methods 17, 137–145 (2020).

  15. Ko, M. E. et al. FLOW-MAP: a graph-based, force-directed layout algorithm for trajectory mapping in single-cell time course datasets. Nat. Protoc. 15, 398–420 (2020).

  16. Righelli, D. et al. SpatialExperiment: infrastructure for spatially-resolved transcriptomics data in R using Bioconductor. Bioinformatics 38, 3128–3131 (2022).

  17. Wang, Y. et al. Spatial transcriptomics: technologies, applications and experimental considerations. Genomics 115, 110671 (2023).

  18. Rozenblatt-Rosen, O. et al. Building a high-quality Human Cell Atlas. Nat. Biotechnol. 39, 149–153 (2021).

  19. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).

  20. Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20, 163–172 (2019).

  21. Fernández, J. M. et al. The BLUEPRINT Data Analysis Portal. Cell Syst. 3, 491–495.e5 (2016).

  22. Xu, W. et al. Mapping of γ/δ T cells reveals Vδ2+ T cells resistance to senescence. EBioMedicine 39, 44–58 (2019).

  23. Law, C. W. et al. RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR. F1000Res https://doi.org/10.12688/f1000research.9005.3 (2016).

  24. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

  25. Lewis, M., Goldmann, K., Sciacca, E., Cubuk, C. & Surace, A. glmmSeq: General Linear Mixed Models for Gene-Level Differential Expression (glmmSeq: General Linear, 2022).

  26. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  27. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).

  28. International Multiple Sclerosis Genetics Consortium. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science 365, eaav7188 (2019).

  29. Wang, Y.-F. et al. Identification of 38 novel loci for systemic lupus erythematosus and genetic heterogeneity between ancestral groups. Nat. Commun. 12, 772 (2021).

  30. Mangiola, S. et al. A multi-organ map of the human immune system across age, sex and ethnicity. Preprint at bioRxiv https://doi.org/10.1101/2023.06.08.542671 (2023).

  31. tidyomics. GitHub https://github.com/tidyomics (2024).

Acknowledgements

We acknowledge the Bioconductor and tidyverse communities, on whose software and coding paradigms this work is based and without which it would not have been possible. We also thank the tidyomics community for their feedback and contributions. We thank V. Carey for his support and feedback on the project and M. Ritchie for his continuous support and feedback. Human illustrations were created with BioRender.com. S.M. was supported by the Victorian Cancer Agency Early Career Research Fellowship (ECRF21036). M.I.L. was supported by the Chan Zuckerberg Initiative (EOSS3-0000000057). A.T.P. was supported by the National Health and Medical Research Council (NHMRC) Senior Research Fellowship (1116955) and Investigator Grant (2026643). A.T.P., S.M. and W.H. were supported by the Lorenzo and Pamela Galli Medical Research Trust and the Galli Next Generation Discoveries Initiative. K.L.D. is the Anne T. and Robert M. Bass Endowed Faculty Scholar in Pediatric Cancer and Blood Diseases of the Stanford Maternal Child Health Research Institute and the Harriet and Mary Zelencik Endowed Faculty in Children’s Cancer and Blood Diseases. P.-P.A. was supported by the Cancéropole GSO and Intergroupe Français du Myélome. R.G. was funded by a project grant from the Swiss National Foundation. M.M. was supported by the NHGRI and NCI of the National Institutes of Health under award numbers U41HG004059 and U24CA180996. This work was supported by an ASPIRE award from the Mark Foundation for Cancer Research and the B+ Foundation. The research benefited from support from the Victorian State Government Operational Infrastructure Support and Australian Government NHMRC Independent Research Institute Infrastructure Support. The funders had no role in study design, data collection and analysis, or decision to publish or prepare the manuscript.

Author information

Contributions

S.M. proposed the study, and S.M. and M.I.L. designed it. W.J.H. and S.M. developed the novel tidy adapters for transcriptomics. W.J.H., T.J.K., S.M. and M.I.L. performed the analyses. W.J.H., T.J.K., H.L.C., J.S., C.S., E.S.D., N.S., L.M., B.T., A.A.N., M.K., Q.C., V.Y., W.M., J.-E.P., I.M., M.H.R., P.-P.A., P.P., C.-L.P., M.T., R.G., M.M., S.L., M.L., S.C.H., G.P.N., K.L.D., A.T.P., M.I.L. and S.M. contributed to the ecosystem’s development and ongoing improvement. S.M., M.I.L., A.T.P., K.L.D., S.C.H., M.L., M.M. and R.G. acted as the supervisory team. S.M., M.I.L. and A.T.P. contributed equally and jointly led the study. W.J.H. and T.J.K. contributed equally. All authors contributed to the writing of the manuscript.

Corresponding authors

Correspondence to Anthony T. Papenfuss, Michael I. Love or Stefano Mangiola.

Ethics declarations

Competing interests

R.G. has received consulting income from Takeda and Sanofi, and declares ownership in Ozette Technologies. M.K. is an employee of and declares ownership in Achilles Therapeutics. The other authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Bo Li and Judith Zaugg for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting Summary

Supplementary Table 1

List of samples used in peripheral blood mononuclear cell analysis.

Source data

Source Data Fig. 2

Source data used to create the benchmarking plot Fig. 2h.

About this article

Cite this article

Hutchison, W.J., Keyes, T.J., The tidyomics Consortium. et al. The tidyomics ecosystem: enhancing omic data analyses. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02299-2

  • DOI: https://doi.org/10.1038/s41592-024-02299-2
