Cheng-Tsung Lu, Kai-Yao Huang, Min-Gang Su, Tzong-Yi Lee, Neil Arvin Bretaña, Wen-Chi Chang, Yi-Ju Chen, Yu-Ju Chen, Hsien-Da Huang, dbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications, Nucleic Acids Research, Volume 41, Issue D1, 1 January 2013, Pages D295–D305, https://doi.org/10.1093/nar/gks1229
Abstract
Protein modification is an extremely important post-translational regulation that adjusts the physical and chemical properties, conformation, stability and activity of a protein, thus altering protein function. Given the high throughput of mass spectrometry (MS)-based methods in identifying site-specific post-translational modifications (PTMs), dbPTM (http://dbPTM.mbc.nctu.edu.tw/) has been updated to integrate experimental PTMs obtained from public resources as well as manually curated MS/MS peptides associated with PTMs from research articles. Version 3.0 of dbPTM aims to be an informative resource for investigating the substrate specificity of PTM sites and the functional association of PTMs between substrates and their interacting proteins. To investigate the substrate specificity of modification sites, a newly developed statistical method has been applied to identify significant substrate motifs for each PTM type with sufficient experimental data. According to the data statistics in dbPTM, >60% of PTM sites are located in the functional domains of proteins. Most PTMs are known to create binding sites for specific protein-interaction domains that work together for cellular function. Thus, this update integrates protein–protein interaction and domain–domain interaction data to determine the functional association of PTM sites located in protein-interacting domains. Additionally, information on the structural topologies of transmembrane (TM) proteins is integrated in dbPTM to delineate the structural correlation between the reported PTM sites and TM topologies. To facilitate the investigation of PTMs on TM proteins, the PTM substrate sites and the structural topology are graphically represented. Literature information related to PTMs, orthologous conservation and substrate motifs of PTMs are also provided in the resource. Finally, this version features an improved web interface to facilitate convenient access to the resource.
INTRODUCTION
Protein post-translational modification (PTM) plays an essential role in various cellular processes by adjusting the physical and chemical properties, folding, conformation, stability and activity of proteins, thus altering protein function (1). More than 200 different types of PTMs have been identified by mass spectrometry (MS)-based proteomics (2). The biological functions of these ubiquitous regulatory mechanisms include phosphorylation for signal transduction; attachment of fatty acids for membrane anchoring and association; glycosylation for changing protein half-life, targeting substrates and promoting cell–cell and cell–matrix interactions; acetylation and methylation of histones for gene regulation; and ubiquitylation for protein degradation (3). With high-throughput MS- and MS/MS-based methods in proteomics, several databases dedicated to specific modification types have been established. Phospho.ELM (4), Phosphorylation Site Database (5), PhosphoSitePlus (6), PHOSIDA (7) and PhosPhAt (8) were developed to accumulate experimentally verified phosphorylation sites. NetworKIN (9) and RegPhos (10) use integrative methods to identify kinase–substrate phosphorylation networks. O-GLYCBASE (11) and dbOGAP (12) are databases of glycoproteins, most of which include experimentally verified O-linked glycosylation sites. UbiProt (13) stores experimentally identified ubiquitylated proteins and ubiquitylation sites, which are implicated in protein degradation through an intracellular ATP-dependent proteolytic system. PupDB (14) is a prokaryotic ubiquitin-like protein (Pup) database that stores experimentally identified pupylated proteins and pupylation sites from published articles; it also integrates the pupylated proteins with their corresponding structures and functional annotations.
An increasing number of proteomic studies have suggested that protein S-nitrosylation plays an important role in the nitric oxide (NO)-related redox pathway. Accordingly, a new database, dbSNO (15), was established by manually curating S-nitrosylated peptides from research articles.
With regard to currently available public resources covering multiple PTM types, UniProtKB/Swiss-Prot (2,16) includes as much PTM information as is available, together with functional and structural annotations. SysPTM (17) provides a systematic platform for multi-type PTM research and data mining. Additionally, the Human Protein Reference Database (HPRD) (18) contains a wealth of information relevant to the function of human proteins in health and disease, as well as PTM annotations. Given the importance of protein modifications in biological processes, we previously proposed dbPTM (19), which integrates published databases to obtain experimentally validated protein modifications, as well as putative PTM substrate sites predicted by a series of accurate computational tools (20–22). Version 2.0 of dbPTM was extended to a knowledge base comprising the modified sites, solvent accessibility of substrates, protein secondary and tertiary structures, protein domains and protein variations (23).
Given the high throughput of MS/MS-based methods in identifying site-specific PTMs, this version (dbPTM 3.0) not only integrates experimental PTMs from public resources but also manually curates MS/MS peptides associated with PTMs from research articles using a text-mining approach. dbPTM 3.0 aims to be an informative resource for investigating the substrate specificity of PTM sites and the functional association of PTMs between substrates and their interacting proteins. To investigate the substrate specificity of modification sites, a newly developed method, MDDLogo (24), has been applied to identify significant substrate motifs for each PTM type. According to the data statistics in dbPTM, >60% of PTM sites are located in protein functional domains. Many PTMs can create binding sites for specific protein-interaction domains that work together for cellular function and read the state of the proteome to direct cellular organization (25). Thus, this update integrates both protein–protein interaction (PPI) and domain–domain interaction information to determine the functional association of PTM sites located in protein-interacting domains. Additionally, to delineate the structural correlation between the reported PTM sites and transmembrane (TM) topologies, information on the structural topologies of TM proteins is integrated in dbPTM 3.0. To facilitate the investigation of PTMs on TM proteins, PTM sites and the structural topology of TM proteins are graphically represented. Furthermore, the web interface has been enhanced to facilitate access to the resource, which is freely accessible at http://dbPTM.mbc.nctu.edu.tw/.
IMPROVEMENTS
The highlighted improvements and advances in dbPTM 3.0 are presented in Figure 1, including data integration from public PTM resources and research articles, investigation of PTM substrate site specificity, investigation of PTM-associated protein interactions and investigation of the effects of PTMs on TM proteins. To facilitate the study of PTMs and their functions, the web interface has been redesigned and enhanced. Published literature related to PTMs, orthologous conservation and substrate motifs of PTM sites are also provided in this online resource. The details of each improvement are described below.
Data integration from public PTM resources and research articles
Supplementary Figure S1 shows the detailed system flow of the construction of dbPTM 3.0. Because the database contents of several online PTM resources are inaccessible, a total of 11 biological databases related to PTMs are integrated in dbPTM: UniProtKB/Swiss-Prot (2), version 9.0 of Phospho.ELM (4), PhosphoSitePlus (6), PHOSIDA (26), version 6.0 of O-GLYCBASE (11), dbOGAP (12), dbSNO (15), version 1.0 of UbiProt (13), PupDB (14), version 1.1 of SysPTM (17) and release 9.0 of HPRD (27). A brief description and the data statistics of the integrated databases are given in Supplementary Table S1. To resolve the heterogeneity among data collected from different sources, the reported modification sites are mapped to UniProtKB protein entries by sequence comparison. Given the high throughput of MS-based methods in post-translational proteomics, this update also includes manually curated MS/MS-identified peptides associated with PTMs, collected from research articles through a literature survey. First, a list of PTM-related keywords is constructed by referring to the UniProtKB/Swiss-Prot PTM list (http://www.uniprot.org/docs/ptmlist.txt) and the annotations of RESID (28). Then, all fields in the PubMed database are searched using the keywords in this list, and the full texts of the retrieved research articles are downloaded. To cover the various experimental approaches to proteomic identification, a text-mining system is developed to survey full-text articles that potentially describe the site-specific identification of modified sites. Approximately 800 original and review articles associated with MS/MS proteomics and protein modifications were retrieved from PubMed (July 2012). Next, the full-length articles are manually reviewed to precisely extract the MS/MS peptides along with the modified sites.
Furthermore, to determine the locations of PTMs on full-length protein sequences, the experimentally verified MS/MS peptides are mapped to UniProtKB protein entries based on their database identifiers (IDs) and sequence identity. During mapping, MS/MS peptides that cannot be aligned exactly to a protein sequence are discarded. Finally, each mapped PTM site is linked to its source article (PubMed ID).
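The peptide-to-protein mapping step can be sketched as follows. This is a minimal illustration, not dbPTM's code: it assumes the full-length UniProtKB sequence has already been selected by its identifier and uses exact substring matching. The names `PTMSite` and `map_peptide_site` are hypothetical, and discarding peptides that match at more than one location is our own assumption rather than a documented dbPTM rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PTMSite:
    """A PTM site mapped onto a full-length protein (hypothetical record type)."""
    accession: str
    position: int      # 1-based residue position in the full-length protein
    residue: str       # one-letter code of the modified residue
    ptm_type: str
    pmid: str          # PubMed ID of the source article


def map_peptide_site(accession: str, protein_seq: str, peptide: str,
                     offset: int, ptm_type: str, pmid: str) -> Optional[PTMSite]:
    """Map the modified residue at 0-based `offset` within `peptide` onto `protein_seq`.

    Returns None when the peptide does not align exactly (the peptide is discarded).
    Peptides matching more than one location are also discarded (an assumption).
    """
    if not 0 <= offset < len(peptide):
        raise ValueError("offset lies outside the peptide")
    start = protein_seq.find(peptide)
    if start == -1:
        return None                       # no exact match: discard
    if protein_seq.find(peptide, start + 1) != -1:
        return None                       # ambiguous match: discard (assumption)
    index = start + offset
    return PTMSite(accession, index + 1, protein_seq[index], ptm_type, pmid)


# Toy example with a made-up sequence and placeholder identifiers.
site = map_peptide_site("TEST1", "MSTKPEPTIDEKR", "PEPTIDEK", 7,
                        "Acetylation", "PMID_PLACEHOLDER")
print(site)
```

Returning `None` instead of raising keeps the "discard unmappable peptides" rule explicit at the call site.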
Detection of PTM substrate site specificities
Because conserved motifs for a specific PTM are difficult to detect in large data sets, MDDLogo (24) was used to identify the substrate motifs for each PTM type with >500 modified peptides. MDDLogo exploits maximal dependence decomposition (MDD) to discover conserved motifs in a group of aligned signal sequences. MDD divides a set of aligned signal sequences into subgroups that capture the most significant dependencies between positions. It adopts the chi-squared test to evaluate the dependence of amino acid occurrence between two positions, Ai and Aj, surrounding the PTM substrate sites. MDDLogo has demonstrated its effectiveness in identifying substrate motifs of plant and virus phosphorylation (29,30), as well as mouse S-nitrosylation (31). To extract motifs that conserve the biochemical properties of amino acids, MDD categorizes the 20 amino acids into five groups, namely aliphatic, polar and uncharged, acidic, basic and aromatic, as shown in Supplementary Figure S2. An example of MDD clustering on S-nitrosylation data shows that position −7 has the maximal dependence with the occurrence of basic amino acids, including lysine (K), arginine (R) and histidine (H). Accordingly, all data can be divided into two subgroups: one with basic amino acids at position −7 and the other without. MDD clustering is a recursive process that divides the data sets into tree-like subgroups.
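The recursive MDD clustering described above can be sketched in Python as follows. This is a simplified sketch, not MDDLogo itself, and it rests on several assumptions. The five-group membership below is our guess, since the actual assignment is in Supplementary Figure S2. The score follows the classic Burge–Karlin style of MDD: for each position i and group g, an indicator "residue at i belongs to g" is tested with Pearson chi-squared against the residue group at every other position j, and the statistics are summed. `min_size` and `min_score` are hypothetical stopping parameters; they are not calibrated significance levels.

```python
from itertools import product

# Five physicochemical groups; the exact membership used by MDDLogo is given in
# Supplementary Figure S2, so this assignment is an assumption.
AA_GROUPS = {
    "aliphatic": "GAVLIMP",
    "polar_uncharged": "STCNQ",
    "acidic": "DE",
    "basic": "KRH",
    "aromatic": "FYW",
}
GROUP_OF = {aa: g for g, members in AA_GROUPS.items() for aa in members}
GROUP_NAMES = list(AA_GROUPS)


def chi_square(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n if n else 0.0
            if expected > 0:  # empty rows/columns contribute nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def best_split(seqs, center):
    """Find the (position, group) whose presence indicator depends most on the
    other positions, scored as the sum of chi-squared statistics over them."""
    positions = [p for p in range(len(seqs[0])) if p != center]
    best = None
    for i, group in product(positions, GROUP_NAMES):
        flags = [GROUP_OF.get(s[i]) == group for s in seqs]
        if all(flags) or not any(flags):
            continue  # this indicator cannot split the data
        score = 0.0
        for j in positions:
            if j == i:
                continue
            table = [[0] * len(GROUP_NAMES) for _ in range(2)]
            for flag, s in zip(flags, seqs):
                g = GROUP_OF.get(s[j])
                if g is not None:  # skip gaps and non-standard residues
                    table[int(flag)][GROUP_NAMES.index(g)] += 1
            score += chi_square(table)
        if best is None or score > best[2]:
            best = (i, group, score)
    return best


def mdd(seqs, center, min_size, min_score):
    """Recursively split aligned sequences into a tree of subgroups; returns the leaves."""
    if len(seqs) < min_size:
        return [seqs]
    found = best_split(seqs, center)
    if found is None or found[2] < min_score:
        return [seqs]
    pos, group, _ = found
    inside = [s for s in seqs if GROUP_OF.get(s[pos]) == group]
    outside = [s for s in seqs if GROUP_OF.get(s[pos]) != group]
    return mdd(inside, center, min_size, min_score) + mdd(outside, center, min_size, min_score)


# Toy 3-mer windows centred on a cysteine (index 1); not real dbPTM data.
seqs = ["KCD", "RCE", "HCD", "ACS", "GCT", "VCN"]
print(mdd(seqs, center=1, min_size=2, min_score=1.0))
```

Both subgroups are guaranteed to be non-empty, because indicators that are all true or all false are skipped. This guarantees that the recursion terminates.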
Integration of protein domains, domain–domain interactions and PPIs
Protein-interaction domains usually recognize short peptide motifs of a target protein but do not bind stably until the peptides carry the appropriate PTMs; PTMs can thus create binding sites for specific protein-interaction domains that work together for cellular function and read the state of the proteome to direct cellular organization (25). For instance, the SH2 domain binds phosphotyrosine (pTyr)-containing peptides in a manner that depends on ligand phosphorylation and the motif of the flanking amino acids (32,33). Thus, this update integrates information on protein functional domains and PPIs to infer PTM-dependent protein interactions. To investigate the functional domains preferred by PTM sites, this study uses the domain annotations in InterPro (34). InterPro is an integrated resource, developed initially as a means of rationalizing the complementary efforts of the PROSITE (35), PRINTS (36), Pfam (37) and ProDom (38) databases, that provides protein ‘signatures’ such as protein families, domains and functional sites. For experimentally verified PPIs, five databases, DIP (39), MINT (40), IntAct (41), HPRD (18) and STRING (42), are integrated in dbPTM (see Supplementary Table S2). Additionally, the domain–domain interactions from InterDom (43) are integrated to determine the functional association of PTM sites located in protein-interacting domains.
Integration of TM proteins with structural topology
TM proteins play crucial roles in various cellular processes (44). A genome-wide study found that ∼20–30% of the proteins encoded by a typical genome are TM proteins (45). However, owing to the experimental difficulty of obtaining high-quality structures, TM proteins are notably under-represented in the Protein Data Bank (46). The biological roles of PTMs on TM proteins include phosphorylation for signal transduction and ion transport, acetylation for structural stability, attachment of fatty acids for membrane anchoring and association, and glycosylation for receptor targeting, cell–cell interactions and virus infection (44,47). Given the importance of PTMs on TM proteins, experimentally curated membrane topologies are collected from TMPad (48), TOPDB (49), PDB_TM (50) and OPM (51). To provide a comprehensive investigation of TM proteins, a set of potential TM proteins is extracted from UniProtKB (52) by selecting entries that contain the keyword ‘TRANSMEM’ in a feature (‘FT’) line, a ‘membrane’ subcellular localization and TM topology information. The potential TM proteins are further filtered using the TM prediction program MEMSAT (53) to determine their membrane topologies. As shown in Supplementary Table S3, the filtering process resulted in 2216 experimental and 43 142 potential TM proteins with membrane topologies. To facilitate the investigation of PTMs on TM proteins, the structural topology of TM proteins and the PTM substrate sites are graphically represented using the PHP GD library. Moreover, the tertiary structures of TM proteins and PTM sites are visualized using the Jmol program (54).
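The first filtering step can be sketched as follows. The step keeps UniProtKB entries that have a ‘TRANSMEM’ feature and a ‘membrane’ subcellular localization. This sketch assumes one flat-file entry is passed in as a string. The regular expression is meant to accept both the older column layout of FT lines (start and end separated by spaces) and the newer `start..end` layout. The function names are hypothetical, and the sketch does not reproduce the topology check or the MEMSAT filtering.

```python
import re

# Matches both "FT   TRANSMEM     34     58" (older layout) and
# "FT   TRANSMEM        34..58" (newer layout) feature lines.
TRANSMEM_RE = re.compile(r"^FT   TRANSMEM\s+[<>]?(\d+)(?:\.\.|\s+)[<>]?(\d+)")


def parse_tm_features(record: str):
    """Return (TRANSMEM segments, membrane-localization flag) for one flat-file entry."""
    segments = []
    in_subcell = False
    membrane = False
    for line in record.splitlines():
        match = TRANSMEM_RE.match(line)
        if match:
            segments.append((int(match.group(1)), int(match.group(2))))
        if not line.startswith("CC"):
            in_subcell = False
        elif line.startswith("CC   -!- "):
            # A new comment topic starts; track whether it is SUBCELLULAR LOCATION.
            in_subcell = line.startswith("CC   -!- SUBCELLULAR LOCATION:")
        if in_subcell and "membrane" in line.lower():
            membrane = True
    return segments, membrane


def is_candidate_tm_protein(record: str) -> bool:
    """Keep entries with at least one TRANSMEM feature and a membrane localization."""
    segments, membrane = parse_tm_features(record)
    return bool(segments) and membrane
```

Continuation lines of a CC topic (`CC       ...`) keep the current topic state, so a localization wrapped onto a second line is still recognized.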
Integration of external biological databases
For a given protein, the basic biological functions can be obtained from the UniProtKB annotations. To provide more functional and structural annotations relevant to the modified proteins and the PTM substrate sites, the data contents of Gene Ontology (GO) (55), the Protein Data Bank (PDB) (46) and Clusters of Orthologous Groups (COGs) (56) have been integrated in dbPTM. The molecular function, cellular component and biological process annotations of a modified protein can be accessed through a crosslink to the corresponding QuickGO (57) entry via the UniProtKB accession number. To facilitate the investigation of structural characteristics surrounding PTM substrate sites, protein tertiary structures obtained from PDB are displayed with the Jmol program. For proteins with tertiary structures (5% of UniProtKB/Swiss-Prot proteins), structural properties such as the solvent accessibility and secondary structure of residues were calculated by DSSP (58). For proteins without known tertiary structures, following previous studies of the structural characteristics of PTMs (59–61), two effective tools, RVP-net (62) and PSIPRED (63), are used to predict solvent accessibility and secondary structure, respectively. To determine whether a PTM site is located in a region conserved among orthologous protein sequences, the COGs of proteins were integrated and the ClustalW (64) program was used to align the protein sequences in each COG cluster.
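A conservation check on an existing multiple alignment can be sketched as follows. This sketch assumes that a ClustalW alignment of a COG cluster has already been loaded into a dictionary of equal-length gapped strings, with `-` as the gap character. It maps a 1-based residue position of the query protein to its alignment column and reports the fraction of the other orthologs that carry the same residue there. The function names and the fraction-identity conservation score are our own illustrative choices.

```python
def residue_to_column(aligned: str, position: int) -> int:
    """Map a 1-based ungapped residue position to its 0-based alignment column."""
    seen = 0
    for column, char in enumerate(aligned):
        if char != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError("position exceeds the ungapped sequence length")


def site_conservation(alignment: dict, query: str, position: int) -> float:
    """Fraction of the other aligned orthologs sharing the query residue at a site."""
    column = residue_to_column(alignment[query], position)
    residue = alignment[query][column]
    others = [seq[column] for name, seq in alignment.items() if name != query]
    if not others:
        raise ValueError("alignment needs at least one ortholog besides the query")
    return sum(char == residue for char in others) / len(others)


# Toy alignment (not real orthologs): residue 3 of "query" is S at column 3.
alignment = {"query": "MK-SPT", "ortho1": "MKASPT", "ortho2": "MR-TPT"}
print(site_conservation(alignment, "query", 3))
```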
DATA CONTENT AND UTILITY
Data statistics of the integrated PTM sites
To provide the most comprehensive PTM data, this update not only integrates experimental PTMs from 11 external PTM-related resources but also manually curates MS/MS peptides associated with PTMs from ∼800 research articles. After removing redundant data among these heterogeneous resources, dbPTM 3.0 contains a total of 208 521 experimental PTM sites. All experimental PTM sites are further categorized by PTM type, and the number of non-redundant PTM sites is calculated for each type. As shown by the statistics of representative PTM types in Table 1, protein phosphorylation has the largest number of experimentally verified substrate sites. Owing to the high throughput of MS/MS-based proteomics in the site-specific identification of modified peptides, several PTMs have a substantially increased amount of experimental data, including protein ubiquitylation, acetylation, methylation, N-linked and O-linked glycosylation, as well as the emerging S-nitrosylation. In addition to the experimental PTM sites, UniProtKB/Swiss-Prot provides putative PTM sites inferred from sequence similarity or evolutionary potential, which are annotated as ‘by similarity’, ‘potential’ or ‘probable’ in the ‘MOD_RES’ fields. A total of 226 122 putative sites for all PTM types are integrated in dbPTM. Moreover, a KinasePhos-like method (19–22) has been adopted to construct profile hidden Markov models (HMMs) for 18 types of PTM. For protein phosphorylation in particular, >70 kinase-specific prediction models are constructed and used to identify putative phosphorylation sites together with their kinases. These models were applied to search for potential PTM sites in UniProtKB/Swiss-Prot protein sequences. As given in Table 1, a total of 2 509 267 putative sites for all PTM types are detected by the HMMs at 90% predictive specificity. All experimental and putative PTM sites can be downloaded from the web interface.
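The "90% predictive specificity" operating point can be illustrated with a simple cutoff calibration. This is a hedged sketch of how such a cutoff is commonly chosen, not dbPTM's documented procedure. It assumes that HMM scores, for example from a profile-HMM search tool, are already available for a set of negative (non-site) windows. The sketch then picks the smallest score cutoff at which at least 90% of the negatives fall below it. The function names are hypothetical, and `math.nextafter` requires Python 3.9 or later.

```python
import math
from bisect import bisect_left


def specificity(neg_scores, cutoff):
    """Fraction of negative (non-site) windows scored below the cutoff."""
    return sum(s < cutoff for s in neg_scores) / len(neg_scores)


def cutoff_at_specificity(neg_scores, target=0.90):
    """Smallest cutoff t such that predicting 'site' for score >= t keeps
    specificity on the negative set at or above `target`."""
    ordered = sorted(neg_scores)
    n = len(ordered)
    for t in sorted(set(ordered)):
        if bisect_left(ordered, t) / n >= target:  # negatives strictly below t
            return t
    # Ties at the top score: place the cutoff just above every negative score.
    return math.nextafter(ordered[-1], math.inf)


def sensitivity(pos_scores, cutoff):
    """Fraction of known sites scored at or above the cutoff."""
    return sum(s >= cutoff for s in pos_scores) / len(pos_scores)
```

Restricting candidate cutoffs to observed negative scores keeps the search exact. Candidates are checked in ascending order, so the first one that meets the target is the most sensitive admissible cutoff.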
PTM type | Experimental substrate sites | Putative substrate sites from UniProtKB/Swiss-Prot | HMM-predicted sites |
---|---|---|---|
Phosphorylation | 142 446 | 74 174 | 1 414 879 |
Ubiquitylation | 23 647 | 1702 | 8865 |
N-linked glycosylation | 15 242 | 87 529 | 418 253 |
Acetylation | 9683 | 19 981 | 1156 |
O-linked glycosylation | 3508 | 3695 | 373 758 |
Amidation | 2533 | 1445 | 114 034 |
Hydroxylation | 1629 | 1274 | 9743 |
Methylation | 1585 | 5479 | 22 332 |
Pyrrolidone carboxylic acid | 829 | 742 | 12 322 |
Sumoylation | 725 | 800 | 13 042 |
Gamma-carboxyglutamic acid | 448 | 814 | 1942 |
Palmitoylation | 312 | 5252 | 33 830 |
Sulfation | 207 | 800 | 70 005 |
Myristoylation | 178 | 1275 | 988 |
C-linked glycosylation | 156 | 99 | 3923 |
Prenylation | 130 | 1327 | 6741 |
Nitration | 80 | 93 | 1432 |
Deamidation | 52 | 165 | 2022 |
S-nitrosylation | 3096 | 170 | – |
Oxidation | 333 | 180 | – |
ADP-ribosylation | 140 | 164 | – |
N6-succinyllysine | 88 | 69 | – |
Formylation | 56 | 125 | – |
GPI anchoring | 34 | 849 | – |
Bromination | 33 | 56 | – |
N6-malonyllysine | 33 | 167 | – |
Citrullination | 32 | 110 | – |
N6-carboxylysine | 30 | 1566 | – |
Glutathionylation | 19 | 32 | – |
FAD | 19 | 163 | – |
Others | 1218 | 15 825 | – |
Total | 208 521 | 226 122 | 2 509 267 |
Enhanced web interface
To facilitate use of the dbPTM resource, the web interface has been redesigned and enhanced to allow efficient access to proteins of interest. Supplementary Figure S3 shows the content of a typical dbPTM query: (i) quick search by IDs and keywords, (ii) basic information, (iii) graphical visualization of PTM sites with structural characteristics and functional domains, (iv) table of experimental PTM sites with reported literature, (v) orthologous conservation of PTM substrate sites, (vi) PPIs and domain–domain interactions and (vii) literature related to PTMs. The combined visualization of PTM sites and functional domains along a protein sequence can help users understand the functional associations of PTM substrate sites. From the multiple sequence alignment of orthologous proteins, users can investigate whether a PTM site is located in an evolutionarily conserved region, which suggests that the orthologous sites in other species could undergo the same modification. Additionally, this update incorporates protein functional domains and domain–domain interactions to infer PTM-dependent protein interactions. Moreover, the literature associated with PTMs is categorized by modification type.
In addition to database queries by protein name, gene name, UniProtKB ID or accession, a protein sequence can be submitted for a homology search against the UniProtKB protein sequence database using the BLAST (65) program. The browse function of the dbPTM website provides a summary table of PTM types and their modified residues, so that users can efficiently see the number of entries for a specific modified amino acid of a PTM type. The annotations of PTM types follow the UniProtKB/Swiss-Prot PTM list (http://www.uniprot.org/docs/ptmlist.txt). As depicted in Supplementary Figure S4, selecting the acetylation of lysine (K) shows more detailed information, such as the location of the modification in the protein sequence, the chemical formula of the modification, the mass difference and the substrate site specificity, that is, the preference for amino acids surrounding the modification sites. Structural characteristics surrounding the PTM substrate sites, such as solvent accessibility and secondary structure, are also provided. Additionally, the substrate site specificity of acetylated lysines is investigated in detail with reference to the subcellular localization of the acetylated proteins. Previous work has demonstrated that the co-localization of acetyltransferases and substrate proteins could be a promising means of investigating substrate site specificities and could be adopted to improve the computational identification of protein acetylation sites (66).
Investigation of PTM substrate site specificities
Given a window length n, the fragment of 2n + 1 residues centered on the PTM site (position 0) is extracted, and the positional frequencies of amino acids are calculated and presented as sequence logos using WebLogo (67). Supplementary Figure S5 shows the substrate motif and structural characteristics of experimental phosphorylation sites. According to the kinase classification extracted from KinBase (http://kinase.com/) and RegPhos (10), the substrate site specificity of protein phosphorylation can be further categorized into >200 kinase groups. As given in Supplementary Figure S5, most kinase-specific substrate motifs have conserved amino acids surrounding the phosphorylation sites. For PTMs other than phosphorylation, catalytic enzymes or transferases are not annotated, because identifying the catalytic enzyme for a specific PTM is experimentally difficult. Based on the basic concept of sequence conservation, a sequence logo can display the substrate motif for each PTM type from a group of aligned sequences. However, conserved motifs are difficult to explore in large-scale sequence data; for instance, a sequence logo built from all phosphorylation data, which involve various catalytic kinases, does not clearly reveal kinase-specific substrate specificity. Thus, for PTMs with sufficient experimental substrate sites, MDDLogo was applied to cluster the aligned substrate sequences into subgroups containing statistically significant motifs. In the example of protein S-nitrosylation presented in Figure 2, 10 sequence logos, identified from 3095 S-nitrosylated peptides with a 13-mer window length, contain a conserved motif of positively charged amino acids (K, R and H) surrounding the S-nitrosocysteine. Interestingly, the first and sixth groups contain conserved motifs of negatively charged amino acids (D and E) accompanied by positively charged amino acids at two specific positions.
Consistent with previous studies (68–73), the S-nitrosylated cysteines may be located within an acid-base motif flanked by acidic and basic amino acids.
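The window extraction and positional-frequency step from the start of this subsection can be sketched as follows, with n = 6 giving the 13-mer windows used for S-nitrosylation. Several details are our own assumptions: padding positions beyond the protein termini with `-` and excluding them from the counts, and the function names. The information-content helper follows the usual sequence-logo definition, log2(20) minus the Shannon entropy. Unlike WebLogo, it applies no small-sample correction.

```python
import math
from collections import Counter


def extract_window(seq: str, site: int, n: int) -> str:
    """Return the 2n + 1 residues centered on a 1-based site; '-' pads past the termini."""
    center = site - 1
    return "".join(seq[i] if 0 <= i < len(seq) else "-"
                   for i in range(center - n, center + n + 1))


def positional_frequencies(windows):
    """Amino-acid frequencies at each position -n..+n (padding '-' excluded)."""
    length = len(windows[0])
    n = length // 2
    freqs = {}
    for k in range(length):
        counts = Counter(w[k] for w in windows if w[k] != "-")
        total = sum(counts.values())
        freqs[k - n] = {aa: c / total for aa, c in counts.items()} if total else {}
    return freqs


def information_content(position_freqs) -> float:
    """log2(20) minus Shannon entropy, in bits (no small-sample correction)."""
    entropy = -sum(p * math.log2(p) for p in position_freqs.values() if p > 0)
    return math.log2(20) - entropy


# Toy usage with made-up sequences.
windows = [extract_window("MKCDE", 3, 1), extract_window("ARCEG", 3, 1)]
print(positional_frequencies(windows))
```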
Investigation of PTM-associated domains and protein interactions
According to the data statistics in dbPTM, >60% of experimentally verified PTM sites are located in the functional domains of proteins. These statistics can be analyzed in detail for each PTM type. For instance, for protein S-nitrosylation, an emerging PTM that plays a crucial role in the regulation of NO-related cellular processes, ∼70% of the reported S-nitrosylation sites are located within functional domains. The detailed distribution of functional domains covering S-nitrosylation sites is given in Supplementary Table S4. The most preferred functional domain is the ‘nucleotide-binding alpha–beta plait’ (InterPro ID: IPR012677), which covers 47 S-nitrosylation sites. Another preferred functional domain is the ‘RNA recognition motif, RNP-1’ domain (InterPro ID: IPR000504), which covers 46 S-nitrosylation sites. This suggests that these S-nitrosylation sites may play important roles in the domains of proteins involved in DNA or RNA binding (74). In addition, Supplementary Table S5 shows the distribution of functional domains covering substrate sites for several representative PTMs, including acetylation, methylation, hydroxylation, N-linked and O-linked glycosylation, phosphorylation and ubiquitylation.
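Domain-coverage statistics of this kind can be sketched as an interval lookup. The sketch assumes PTM sites are given as (accession, position) pairs and domain annotations as (accession, InterPro ID, start, end) tuples with inclusive, 1-based bounds. InterPro entries can overlap or nest. In this sketch, a site inside several entries counts once toward the overall fraction but once for each entry in the per-domain tally, which is our own counting choice. The identifiers in the usage example are placeholders.

```python
from collections import Counter, defaultdict


def domain_coverage(sites, domains):
    """Fraction of PTM sites inside any annotated domain, plus per-domain site counts.

    sites:   iterable of (accession, position) pairs, positions 1-based
    domains: iterable of (accession, interpro_id, start, end), inclusive bounds
    """
    by_protein = defaultdict(list)
    for accession, interpro_id, start, end in domains:
        by_protein[accession].append((interpro_id, start, end))
    per_domain = Counter()
    covered = 0
    sites = list(sites)
    for accession, position in sites:
        hits = {ipr for ipr, start, end in by_protein.get(accession, [])
                if start <= position <= end}
        covered += bool(hits)
        per_domain.update(hits)   # overlapping entries each count the site once
    fraction = covered / len(sites) if sites else 0.0
    return fraction, per_domain


# Toy usage with placeholder accessions and domain IDs.
fraction, counts = domain_coverage(
    [("P1", 5), ("P1", 50), ("P2", 10)],
    [("P1", "IPR_A", 1, 20), ("P1", "IPR_B", 1, 10), ("P2", "IPR_A", 30, 40)],
)
print(fraction, counts.most_common())
```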
Many PTMs provide binding sites for specific protein-interaction domains, which often contain a conserved structure for the modified site and a more flexible surface for the flanking amino acids; together, they synergize to regulate cellular processes (75–78). To investigate PTM-associated protein interactions, domain–domain interaction information collected from InterDom is adopted in this study. In the case study of ‘Histone H3’ (UniProtKB ID: H31_HUMAN) presented in Figure 3, ‘Heterochromatin protein 1 homolog alpha’ (‘HP1’, UniProtKB ID: CBX5_HUMAN) and ‘WD repeat-containing protein 5’ (‘WDR5’, UniProtKB ID: WDR5_HUMAN) interact with ‘Histone H3’. Examining the interaction between ‘HP1’ and ‘Histone H3’ in detail reveals a domain–domain interaction between the ‘Chromodomain’ (InterPro ID: IPR000953) and ‘Histone H3’ (InterPro ID: IPR000164). Among the PTMs located in the ‘Histone H3’ domain, a previous study has demonstrated that the ‘HP1 chromodomain’ can bind ‘Histone H3’ methylated at lysine 10 in UniProtKB numbering, which counts the initiator methionine (Lys9 in conventional histone nomenclature) (79). The other protein interaction involves a domain–domain interaction between the ‘WD40 Repeat’ (InterPro ID: IPR001680) and the ‘Histone Core’ (InterPro ID: IPR007125). It has been proposed that the structural motif for the specific recognition of ‘Histone H3’ methylated at lysine 5 in UniProtKB numbering (Lys4 in conventional histone nomenclature) by the ‘WD40 Repeat’ of ‘WDR5’ is essential to vertebrate development (80,81). This investigation suggests that other PTM sites could be potential binding sites for protein-interaction domains.
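The inference pattern in the Histone H3 case study can be sketched as a join over three sources. A PTM site lies inside a domain of its protein, the protein interacts with a partner (PPI), and the two domains are known to interact (DDI). This is our reading of the described integration, not dbPTM's code. Interactions are treated as undirected, self-interactions are skipped, and all identifiers in the usage example are hypothetical.

```python
def ptm_associated_partners(site, domains, ppis, ddis):
    """List (partner, site_domain, partner_domain) triples supporting a
    PTM-dependent interaction.

    site:    (accession, 1-based position) of the PTM
    domains: {accession: [(domain_id, start, end), ...]}
    ppis:    iterable of (accession_a, accession_b) interacting protein pairs
    ddis:    iterable of (domain_a, domain_b) interacting domain pairs
    """
    accession, position = site
    site_domains = {d for d, start, end in domains.get(accession, [])
                    if start <= position <= end}
    ppi_set = {frozenset(pair) for pair in ppis}     # interactions are undirected
    ddi_set = {frozenset(pair) for pair in ddis}
    hits = []
    for partner, partner_domains in sorted(domains.items()):
        if partner == accession or frozenset((accession, partner)) not in ppi_set:
            continue  # self-interactions are skipped in this sketch
        for d1 in sorted(site_domains):
            for d2 in sorted({d for d, _, _ in partner_domains}):
                if frozenset((d1, d2)) in ddi_set:
                    hits.append((partner, d1, d2))
    return hits


# Hypothetical toy data: PROT_A's site at position 10 lies in DOM_X.
domains = {"PROT_A": [("DOM_X", 1, 50)], "PROT_B": [("DOM_Y", 10, 80)],
           "PROT_C": [("DOM_Z", 1, 30)]}
print(ptm_associated_partners(("PROT_A", 10), domains,
                              [("PROT_B", "PROT_A"), ("PROT_A", "PROT_C")],
                              [("DOM_Y", "DOM_X")]))
```

Using `frozenset` keys makes the lookups order-independent, so a pair listed as (B, A) in one source still matches (A, B) in another.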
Investigation of PTM sites on TM proteins
According to the dbPTM statistics on PTM sites and TM proteins, a total of 9644 and 68 775 PTM substrate sites are located on the 2088 experimental and 33 747 potential TM proteins, respectively. To investigate the structural distribution of PTM sites on TM proteins, the structural topology of a TM protein is categorized into four region types: extracellular, cytoplasmic, TM and unknown. Supplementary Table S6 provides the structural distribution of PTMs with >10 substrate sites on experimental TM proteins. Interestingly, when substrate sites located in unknown regions are excluded, all of the N-linked (GlcNAc …) glycosylation sites are located in the extracellular region, as are the O-linked and C-linked glycosylation sites. This distribution is consistent with the biological roles of glycosylation on TM proteins in receptor targeting and cell–cell interactions (47). In contrast, phosphorylation sites are mainly located in cytoplasmic regions, in line with their involvement in signal transduction and ion transport. The structural distribution of PTM sites could therefore serve as a means of inferring the potential roles of PTMs on TM proteins. Indeed, previous work has demonstrated that incorporating membrane topology can improve the performance of predicting O-linked glycosylation sites on TM proteins (82). Supplementary Figure S6 shows a graphical visualization of the PTMs and membrane topology of the human beta-2 adrenergic receptor (ADRB2). Furthermore, two modification sites, Tyr141 (pTyr) and Cys341 (S-palmitoyl cysteine), are highlighted in red on the tertiary structure (PDB ID: 2R4R) using the Jmol viewer, which shows their solvent accessibility and the distance between them.
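A per-region distribution such as the one in Supplementary Table S6 can be tabulated by assigning each site to the topology segment that contains it. This is a minimal sketch under stated assumptions, not the procedure used to build dbPTM: it assumes each TM protein's topology is given as non-overlapping 1-based inclusive segments labelled with the four region types above, and treats positions outside every segment as "unknown". The functions `region_of`, `topology_distribution` and `known_region_fractions`, and the toy protein `PROT_TM`, are hypothetical.

```python
from collections import Counter, defaultdict

REGIONS = ("extracellular", "cytoplasmic", "TM", "unknown")


def region_of(segments, pos):
    """Return the label of the segment containing pos, else 'unknown'."""
    for start, end, label in segments:
        if start <= pos <= end:
            return label
    return "unknown"


def topology_distribution(sites, topologies):
    """Count PTM sites per (PTM type, topology region).

    sites:      iterable of (protein, position, ptm_type)
    topologies: dict protein -> list of (start, end, label), 1-based inclusive
    """
    table = defaultdict(Counter)
    for protein, pos, ptm in sites:
        table[ptm][region_of(topologies.get(protein, []), pos)] += 1
    return table


def known_region_fractions(counts):
    """Fractions over extracellular/cytoplasmic/TM, excluding 'unknown'."""
    known = {r: counts[r] for r in REGIONS if r != "unknown"}
    total = sum(known.values())
    return {r: (n / total if total else 0.0) for r, n in known.items()}


# Hypothetical topology and sites, for illustration only.
topologies = {"PROT_TM": [(1, 30, "extracellular"), (31, 52, "TM"),
                          (53, 120, "cytoplasmic")]}
sites = [("PROT_TM", 12, "N-glycosylation"), ("PROT_TM", 25, "N-glycosylation"),
         ("PROT_TM", 80, "phosphorylation"), ("PROT_TM", 200, "phosphorylation")]
table = topology_distribution(sites, topologies)
for ptm, counts in table.items():
    print(ptm, dict(counts), known_region_fractions(counts))
```

Excluding the "unknown" bucket before computing fractions mirrors the comparison made in the text, where glycosylation sites are reported as exclusively extracellular only after sites in unknown regions are set aside.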
CONCLUSION
The expansion of the dbPTM database increases its usefulness for researchers investigating the impact of PTMs on protein function and cellular processes. Additionally, the enhanced web interface enables both wet-lab biologists and bioinformatics researchers to efficiently explore further information about protein PTMs. Table 2 summarizes the advancements and new features supported in dbPTM 3.0. In the future, we expect dbPTM to continue to grow with the increasing availability of data in resources such as Phospho.ELM, PhosphoSitePlus and UniProtKB. One area in which we envision dbPTM improving greatly in future work is the implementation of a more accurate method for discovering PTM substrate motifs. Enhancements to the text-mining algorithm will also enable the system to select MS/MS peptides associated with protein modifications from research articles with higher confidence. To provide more comprehensive information on PTM function, descriptions of the biological functions of PTMs will be extracted from research articles using an information retrieval system. Moreover, thermodynamic parameters for proteins (83), PPIs (84) and protein–nucleic acid interactions (85) could be integrated to investigate PTM-associated protein stability.
| Features | dbPTM 1.0 | dbPTM 2.0 | dbPTM 3.0 |
|---|---|---|---|
| Protein entry | UniProtKB/Swiss-Prot (release 46) | UniProtKB/Swiss-Prot (release 55) | UniProtKB release 2012-04 |
| Experimental PTM resource | UniProtKB/Swiss-Prot, Phospho.ELM and O-GLYCBASE | UniProtKB/Swiss-Prot, Phospho.ELM, PHOSIDA, HPRD, O-GLYCBASE and UbiProt | UniProtKB/Swiss-Prot, HPRD, SysPTM, Phospho.ELM, PhosphoSitePlus, PHOSIDA, O-GLYCBASE, dbOGAP, dbSNO, UbiProt and PupDB |
| Literature survey of PTMs | – | – | >5000 modified peptides extracted from ∼800 articles |
| Literature related to PTMs | – | Yes | Yes (categorized by PTM type) |
| Computationally predicted PTMs | Phosphorylation, glycosylation and sulfation | 20 types of PTM | 18 types of PTM |
| Protein tertiary structure | Protein Data Bank (PDB) | Protein Data Bank (PDB) | Protein Data Bank (PDB) |
| Structural properties of PTM sites | Amino acid frequency | Amino acid frequency, solvent accessibility and secondary structure | Amino acid frequency, solvent accessibility, secondary structure and intrinsic disorder region |
| PTM annotation | RESID (373 PTM annotations) | RESID (431 PTM annotations) | RESID (431 PTM annotations) |
| Kinase family annotation | – | KinBase | KinBase and RegPhos |
| Protein functional domain | InterPro | InterPro | InterPro and InterProScan |
| Protein–protein interaction | – | – | DIP, MINT, IntAct, HPRD and STRING |
| Domain–domain interaction | – | – | InterDom |
| Functional association of PTM | – | – | PTM-associated domains and PTM-dependent protein interactions |
| PTM substrate motif | – | WebLogo | WebLogo and MDDLogo |
| Evolutionary conservation of PTM sites | – | ClustalW | ClustalW and COG |
| Transmembrane topology | – | – | TMPad, PDBTM, TOPDB and OPM |
| Graphical visualization | PTM, solvent accessibility, protein variation and protein domain | PTM, solvent accessibility, secondary structure, protein variation, protein domain, tertiary structure, orthologous conservation and sequence logo | PTM, solvent accessibility, secondary structure, protein variation, protein domain, tertiary structure, orthologous conservation, sequence logo, PTM substrate motifs, domain–domain interaction, protein–protein interaction, transmembrane topology and tertiary structure of PTMs |
AVAILABILITY
The data content of dbPTM will be regularly maintained and semiannually updated. The resource is now available at http://dbPTM.mbc.nctu.edu.tw/.
FUNDING
This work was supported by the National Science Council of the Republic of China [contract nos. 101-2628-E-155-002-MY2, NSC 101-2311-B-009-003-MY3, NSC 100-2627-B-009-002, NSC 101-2911-I-009-101 and NSC 101-2319-B-400-001]. Funding for open access charge: National Science Council of Taiwan.
Conflict of interest statement. None declared.
REFERENCES
Author notes
The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.