Nat Biotechnol. 2022 Jul;40(7):1023-1025.
doi: 10.1038/s41587-021-01156-3. Epub 2022 Jan 3.

SignalP 6.0 predicts all five types of signal peptides using protein language models

Felix Teufel et al. Nat Biotechnol. 2022 Jul.

Abstract

Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.

Conflict of interest statement

The downloadable version of SignalP 6.0 has been commercialized (it is licensed for a fee to commercial users). The revenue from these commercial sales is divided between the program developers and the Technical University of Denmark.

Figures

Fig. 1. Modeling SP structure using protein LMs.
a, Region structures of the five SP types. Twin arginine (RR)-translocated SPs feature a twin-arginine motif, while SPs cleaved by SPase II feature a C-terminal lipobox. Sec/SPIII SPs have no substructure. b, Protein LM training procedure. BERT learns protein features by predicting masked amino acids in sequences from UniRef100. c, t-Distributed stochastic neighbor embedding (t-SNE) projection of protein representations before prediction training. Different SP types form distinct clusters, separated from sequences without SPs. d, SignalP 6.0 architecture. An amino acid sequence is passed through the LM, and the resulting representation serves as input for the CRF, which predicts region probabilities at each position and the SP type. CS, cleavage site.
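The masked-amino-acid objective described in panel b can be illustrated with a minimal sketch. This is not the authors' training code: the mask symbol, the 15% masking rate, the function name `mask_sequence`, and the example sequence are all assumptions made for illustration, and the sketch omits the random-replacement variants that BERT-style training normally includes.

```python
import random

MASK = "#"  # hypothetical mask symbol, chosen so it cannot collide with an amino acid letter


def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Hide a fraction of residues and return (masked sequence, targets).

    targets maps each masked position to the residue the model would
    have to predict. mask_rate=0.15 is an assumed value.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    n_mask = max(1, round(len(seq) * mask_rate))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    masked = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = seq[pos]  # remember the true residue
        masked[pos] = MASK       # hide it from the model
    return "".join(masked), targets


# Illustrative sequence only, not taken from the paper's data.
masked, targets = mask_sequence("MKKTAIAIAVALAGFATVAQA")
print(masked)
print(targets)
```

A language model trained this way receives `masked` as input and is scored on how well it recovers the residues stored in `targets`.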
Fig. 2. SignalP 6.0 shows strong performance on all types and organism groups.
a, SP detection performance (ARC, Archaea; EUK, Eukarya; NEG, Gram-negative bacteria; POS, Gram-positive bacteria). SignalP 6.0 substantially improves performance on underrepresented types. b, CS prediction performance. SignalP 6.0 has improved precision for all categories. c, Dependence of performance on identity to sequences in the training data. At sequence identities lower than 60%, SignalP 6.0 outperforms SignalP 5.0.
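As a rough illustration of the cleavage-site (CS) precision mentioned in panel b, the sketch below counts a predicted CS as correct when it lies within a tolerance window of the annotated position. The function name `cs_precision_recall`, the `window` parameter, and the toy positions are assumptions for illustration; the record above does not state the exact metric definition used in the paper.

```python
def cs_precision_recall(true_cs, pred_cs, window=0):
    """Precision and recall of cleavage-site predictions.

    true_cs, pred_cs: equal-length lists of CS positions, with None
    meaning "no signal peptide". A prediction counts as correct when
    it is within `window` residues of the annotated site.
    """
    correct = sum(
        1
        for t, p in zip(true_cs, pred_cs)
        if t is not None and p is not None and abs(t - p) <= window
    )
    n_pred = sum(p is not None for p in pred_cs)  # all predicted sites
    n_true = sum(t is not None for t in true_cs)  # all annotated sites
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_true if n_true else 0.0
    return precision, recall


# Toy data, not results from the paper.
true_sites = [22, None, 18, 30]
pred_sites = [22, 25, 20, None]
print(cs_precision_recall(true_sites, pred_sites))            # exact match only
print(cs_precision_recall(true_sites, pred_sites, window=2))  # two-residue tolerance
```

Under this definition, precision falls when the model predicts a CS for a protein without one or places it outside the window, while recall falls when an annotated CS is missed.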
