STAR: ultrafast universal RNA-seq aligner
- PMID: 23104886
- PMCID: PMC3530905
- DOI: 10.1093/bioinformatics/bts635
Abstract
Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.
Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning 550 million 2 × 76 bp paired-end reads per hour to the human genome on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.
Availability and implementation: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
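The Results paragraph names the core of STAR's algorithm: a sequential search for maximal mappable prefixes (MMPs) in an uncompressed suffix array, followed by clustering and stitching of the resulting seeds. The Python sketch below illustrates only the seed-search idea on a toy genome, under simplifying assumptions: exact matching, forward strand only, a naively built suffix array, and a crude `infer_junction` helper standing in for the stitching step. All function names are hypothetical, and this is not STAR's C++ implementation, which additionally handles mismatches, both strands, seed clustering and alignment scoring.

```python
"""Toy sketch of sequential maximal-mappable-prefix (MMP) seed search.

Assumptions: exact matches only, forward strand only, tiny genome.
All names are illustrative and do not come from STAR's source code.
"""


def build_suffix_array(genome: str) -> list[int]:
    # Naive construction by sorting suffix slices; quadratic, toy inputs only.
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def _lcp(a: str, b: str) -> int:
    # Length of the longest common prefix of two strings.
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k


def _lower_bound(genome: str, sa: list[int], query: str) -> int:
    # Index of the first suffix in the array that is >= query (binary search).
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if genome[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    return lo


def maximal_mappable_prefix(genome: str, sa: list[int], query: str):
    """Return (length, loci): the longest prefix of query found in genome."""
    pos = _lower_bound(genome, sa, query)
    # The best-matching suffix is adjacent to the query's insertion point.
    best = max((_lcp(genome[sa[j]:], query) for j in (pos - 1, pos)
                if 0 <= j < len(sa)), default=0)
    if best == 0:
        return 0, []
    prefix = query[:best]
    # All suffixes starting with `prefix` form one contiguous block of the array.
    j = _lower_bound(genome, sa, prefix)
    loci = []
    while j < len(sa) and genome[sa[j]:sa[j] + best] == prefix:
        loci.append(sa[j])
        j += 1
    return best, sorted(loci)


def sequential_mmp_seeds(genome: str, sa: list[int], read: str):
    """Cover a read with seeds, restarting each search at the first unmapped base."""
    seeds, start = [], 0
    while start < len(read):
        length, loci = maximal_mappable_prefix(genome, sa, read[start:])
        if length == 0:
            start += 1  # base absent from the genome (e.g. 'N'): skip it
            continue
        seeds.append((start, length, loci))
        start += length
    return seeds


def infer_junction(genome: str, seed_a, seed_b, min_intron: int = 4):
    """Crude stand-in for stitching: intron between two read-adjacent unique seeds."""
    a_start, a_len, a_loci = seed_a
    b_start, _, b_loci = seed_b
    if len(a_loci) != 1 or len(b_loci) != 1 or a_start + a_len != b_start:
        return None
    intron_start = a_loci[0] + a_len  # first intronic base (0-based)
    intron_end = b_loci[0]            # one past the last intronic base
    if intron_end - intron_start < min_intron:
        return None
    intron = genome[intron_start:intron_end]
    return intron_start, intron_end, intron[:2] + "/" + intron[-2:]


# Toy genome: exon1 + intron (GT...AG) + exon2; the read is the spliced transcript.
exon1, intron, exon2 = "TTACGGCAT", "GT" + "A" * 9 + "G", "CCGATTGCA"
genome = exon1 + intron + exon2
sa = build_suffix_array(genome)
seeds = sequential_mmp_seeds(genome, sa, exon1 + exon2)
print(seeds)                                       # [(0, 9, [0]), (9, 9, [21])]
print(infer_junction(genome, seeds[0], seeds[1]))  # (9, 21, 'GT/AG')
```

Because each new search restarts at the first read base the previous MMP could not cover, a read spanning a splice junction naturally splits into one seed per exon, and the genomic gap between the two seeds (here 12 bases flanked by GT...AG, the canonical motif) is the candidate intron. Building the suffix array by sorting slices costs quadratic time and memory, so this sketch is only suitable for toy inputs.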
Figures
![Fig. 1.](https://cdn.statically.io/img/www.ncbi.nlm.nih.gov/pmc/articles/instance/3530905/bin/bts635f1.gif)
![Fig. 2.](https://cdn.statically.io/img/www.ncbi.nlm.nih.gov/pmc/articles/instance/3530905/bin/bts635f2.gif)
![Fig. 3.](https://cdn.statically.io/img/www.ncbi.nlm.nih.gov/pmc/articles/instance/3530905/bin/bts635f3.gif)
Similar articles
- Mapping RNA-seq reads to transcriptomes efficiently based on learning to hash method. Comput Biol Med. 2020 Jan;116:103539. doi: 10.1016/j.compbiomed.2019.103539. Epub 2019 Nov 13. PMID: 31765913. Review.
- Optimizing RNA-Seq Mapping with STAR. Methods Mol Biol. 2016;1415:245-62. doi: 10.1007/978-1-4939-3572-7_13. PMID: 27115637.
- Mapping RNA-seq Reads with STAR. Curr Protoc Bioinformatics. 2015 Sep 3;51:11.14.1-11.14.19. doi: 10.1002/0471250953.bi1114s51. PMID: 26334920. Free PMC article. Review.
- Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics. 2011 Sep 15;27(18):2518-28. doi: 10.1093/bioinformatics/btr427. Epub 2011 Jul 19. PMID: 21775302. Free PMC article.
- Supersplat--spliced RNA-seq alignment. Bioinformatics. 2010 Jun 15;26(12):1500-5. doi: 10.1093/bioinformatics/btq206. Epub 2010 Apr 21. PMID: 20410051. Free PMC article.
Cited by
- Growth behavior and mRNA expression profiling during growth of IPEC-J2 cells. BMC Res Notes. 2024 Jun 5;17(1):154. doi: 10.1186/s13104-024-06812-w. PMID: 38840260. Free PMC article.
- Sex differences in metabolic adaptation in infants with cyanotic congenital heart disease. Pediatr Res. 2024 Jun 5. doi: 10.1038/s41390-024-03291-4. Online ahead of print. PMID: 38839995.
- mRNA-encoded Cas13 can be used to treat dengue infections in mice. Nat Microbiol. 2024 Jun 5. doi: 10.1038/s41564-024-01726-6. Online ahead of print. PMID: 38839984.
- A virally encoded high-resolution screen of cytomegalovirus dependencies. Nature. 2024 Jun;630(8017):712-719. doi: 10.1038/s41586-024-07503-z. Epub 2024 Jun 5. PMID: 38839957.
- MYCT1 controls environmental sensing in human haematopoietic stem cells. Nature. 2024 Jun;630(8016):412-420. doi: 10.1038/s41586-024-07478-x. Epub 2024 Jun 5. PMID: 38839950. Free PMC article.