BMC Bioinformatics. 2023 Feb 20;24(1):55.
doi: 10.1186/s12859-023-05179-2.

A robust and accurate single-cell data trajectory inference method using ensemble pseudotime

Yifan Zhang et al. BMC Bioinformatics. 2023.

Abstract

Background: Advances in single-cell RNA sequencing technology have enhanced the analysis of cell development by profiling heterogeneous cells at single-cell resolution. In recent years, many trajectory inference methods have been developed. Most take a graph-based approach: they infer the trajectory from single-cell data and then calculate the geodesic distance along it as the pseudotime. However, these methods are vulnerable to errors in the inferred trajectory, and the calculated pseudotime inherits those errors.
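
To make the graph-based approach described in the Background concrete, here is a minimal sketch of pseudotime computed as geodesic (shortest-path) distance from a root cell over a k-nearest-neighbour graph. It is an illustration only, not the procedure of any specific published method; the function names, the choice of a kNN graph, the toy coordinates, and the root cell are all assumptions.

```python
import heapq
import math

def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        neighbours = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in neighbours[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def geodesic_pseudotime(graph, root):
    """Dijkstra shortest-path distance from the root cell, rescaled to [0, 1]."""
    dist = {node: math.inf for node in graph}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    finite = [d for d in dist.values() if math.isfinite(d)]
    longest = max(finite) or 1.0  # avoid dividing by zero for a single cell
    return [dist[i] / longest for i in range(len(graph))]

# Toy example (hypothetical data): five cells along a line, cell 0 is the root.
cells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
pseudotime = geodesic_pseudotime(knn_graph(cells, k=2), root=0)
```

Because every pseudotime value here is read off the graph, a single wrong edge shifts the value of every cell downstream of it, which is the vulnerability the Background points to.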

Results: We propose a novel framework for trajectory inference called the single-cell data Trajectory inference method using Ensemble Pseudotime inference (scTEP). scTEP utilizes multiple clustering results to infer robust pseudotime and then uses the pseudotime to fine-tune the learned trajectory. We evaluated scTEP on 41 real scRNA-seq data sets, all of which have a ground truth development trajectory, and compared it with state-of-the-art methods on the same data sets. Experiments on real linear and non-linear data sets show that scTEP performed best on more data sets than any other method. scTEP also achieved a higher average and lower variance on most metrics than the other state-of-the-art methods, indicating stronger trajectory inference capacity. In addition, scTEP is more robust to the unavoidable errors resulting from clustering and dimension reduction.
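
The idea of combining several clustering results into one pseudotime can be sketched as follows. This is a simplified illustration and not the scTEP implementation: it assumes each cell takes the distance between its cluster centroid and the root cell's cluster centroid, rescales that to [0, 1], and averages the values across clusterings. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def cluster_centroids(points, labels):
    """Mean coordinate of the cells in each cluster."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }

def pseudotime_from_clustering(points, labels, root_cell):
    """Per-cell pseudotime for one clustering, rescaled to [0, 1]."""
    cents = cluster_centroids(points, labels)
    origin = cents[labels[root_cell]]
    raw = [math.dist(origin, cents[label]) for label in labels]
    top = max(raw) or 1.0  # a single cluster gives all zeros
    return [value / top for value in raw]

def ensemble_pseudotime(points, clusterings, root_cell):
    """Average the per-clustering pseudotime values cell by cell."""
    runs = [pseudotime_from_clustering(points, labels, root_cell)
            for labels in clusterings]
    return [sum(values) / len(values) for values in zip(*runs)]

# Toy example (hypothetical data): six cells on a line, two clusterings
# with different resolutions, cell 0 taken as the root.
cells = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
clusterings = [
    [0, 0, 1, 1, 2, 2],  # three clusters
    [0, 0, 0, 1, 1, 1],  # two clusters
]
ensemble = ensemble_pseudotime(cells, clusterings, root_cell=0)
```

In this toy case cells 2 and 3 receive different ensemble values even though the three-cluster result places them together, because the two-cluster result separates them. Averaging over clusterings dilutes the effect of any single clustering's boundaries.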

Conclusion: The scTEP demonstrates that utilizing multiple clustering results for the pseudotime inference procedure enhances its robustness. Furthermore, robust pseudotime strengthens the accuracy of trajectory inference, which is the most crucial component in the pipeline. scTEP is available at https://cran.r-project.org/package=scTEP .

Keywords: Pathway; Pseudotime; Single cell; Trajectory inference.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Box plots for HIM, F1 branches, F1 milestones, and correlation values for 26 gold standard data sets. The diamond shape in the box indicates the mean value of a method. The mean value of scTEP is also shown as a red dashed horizontal line for comparison. The scTEP outperforms other state-of-the-art trajectory inference methods by having the best mean values regarding all four metrics
Fig. 2
The visualization of ground truth, inferred trajectories, and pseudotime on the Mesoderm development loh data set. The landscape in the reduced dimension space provided by each method is colored by: a ground truth development stages, b trajectory inference results, c ground truth pseudotime, d pseudotime inferred by methods
Fig. 3
Box plots for correlation values for 15 real scRNA-seq data sets. The diamond shape in the box indicates the mean value of a method. The mean value of scTEP is also shown as a red dashed horizontal line for comparison. The scTEP outperforms other state-of-the-art trajectory inference methods by having the best average correlation value. a The correlation values of all data sets. b The correlation values of data sets that are larger than 50,000 cells
Fig. 4
Visualization and comparison on the Goolam [21] data set: a The landscape of Goolam data set using UMAP colored by ground truth development stages. b UMAP landscape colored by clustering results. c UMAP landscape colored by pseudotime. d The pseudotime of scTEP against ground truth development stages. e Slingshot. f TSCAN. g SCORPIUS. h PAGA. i Monocle3. j VIA
Fig. 5
Visualization and comparison on the Yuzwa [24] data set: a The landscape of MouseCortex data set using UMAP colored by ground truth development stages. b UMAP landscape colored by clustering results. c UMAP landscape colored by pseudotime. d The pseudotime of scTEP against ground truth development stages. e Slingshot. f TSCAN. g SCORPIUS. h PAGA. i Monocle3. j VIA
Fig. 6
The architecture of our proposed single-cell data Trajectory inference method using Ensemble Pseudotime inference (scTEP). It consists of four parts: a Data pre-processing and Pathway gene sets intersection, b scDHA [14] clustering and dimension reduction, c Ensemble pseudotime inference, and d Trajectory construction using MST algorithm on clusters and fine-tuned by Pseudotime
Fig. 7
Box plots for HIM, F1 branches, F1 milestones, and correlation values for 26 gold standard datasets. The diamond shape in the box indicates the mean value of a method. The mean values of scTEP and scTEP without pathway gene sets intersection procedure (scTEP-pw) are also shown as a red and blue dashed horizontal line for comparison, respectively. The scTEP’s performance is degraded by removing the pathway gene sets intersection procedure
Fig. 8
Box plots for correlation values for 15 real-world data sets. The diamond shape in the box indicates the mean value of a method. The mean values of scTEP and scTEP without pathway gene sets intersection procedure (scTEP-pw) are also shown as a red and blue dashed horizontal line for comparison, respectively. Removing the pathway gene sets intersection procedure degrades the correlation only marginally. a The correlation values of all data sets. b The correlation values of data sets that are larger than 50,000 cells
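
Part d of the architecture in Fig. 6 builds the trajectory with an MST over clusters and fine-tunes it by pseudotime. The sketch below shows one way such a step could look: Prim's algorithm over cluster centroids, followed by orienting each edge from the lower to the higher mean cluster pseudotime. The orientation rule is an assumption made for illustration, the fine-tuning in scTEP may differ, and all names and values are hypothetical.

```python
import math

def minimum_spanning_tree(centroids):
    """Prim's algorithm on the complete Euclidean graph over cluster centroids."""
    names = list(centroids)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that connects the current tree to a new cluster.
        _, a, b = min(
            (math.dist(centroids[a], centroids[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append((a, b))
        in_tree.add(b)
    return edges

def orient_by_pseudotime(edges, cluster_pseudotime):
    """Point every edge from the earlier cluster to the later one."""
    return [
        (a, b) if cluster_pseudotime[a] <= cluster_pseudotime[b] else (b, a)
        for a, b in edges
    ]

# Toy example (hypothetical data): four clusters forming one bifurcation.
centroids = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 1.0), "D": (2.0, -1.0)}
cluster_pseudotime = {"A": 0.0, "B": 0.3, "C": 0.9, "D": 0.8}
trajectory = orient_by_pseudotime(minimum_spanning_tree(centroids), cluster_pseudotime)
```

For this input the tree connects A to B and B to both C and D, so the result reads as a root cluster followed by a branch point with two terminal clusters.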

References

    1. Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS, Rinn JL. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32(4):381–386. doi: 10.1038/nbt.2859.
    2. Qiu X, Mao Q, Tang Y, Wang L, Chawla R, Pliner HA, Trapnell C. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods. 2017;14(10):979–982. doi: 10.1038/nmeth.4402.
    3. Ji Z, Ji H. TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 2016;44(13):117–117. doi: 10.1093/nar/gkw430.
    4. Shin J, Berg DA, Zhu Y, Shin JY, Song J, Bonaguidi MA, Enikolopov G, Nauen DW, Christian KM, Ming G-L, et al. Single-cell RNA-seq with Waterfall reveals molecular cascades underlying adult neurogenesis. Cell Stem Cell. 2015;17(3):360–372. doi: 10.1016/j.stem.2015.07.013.
    5. Street K, Risso D, Fletcher RB, Das D, Ngai J, Yosef N, Purdom E, Dudoit S. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genom. 2018;19(1):1–16. doi: 10.1186/s12864-018-4772-0.