Comparative Study

Genome Res. 2003 Sep;13(9):2129-41. doi: 10.1101/gr.772403.

PANTHER: a library of protein families and subfamilies indexed by function

Paul D Thomas¹, Michael J Campbell, Anish Kejariwal, Huaiyu Mi, Brian Karlak, Robin Daverman, Karen Diemer, Anushya Muruganujan, Apurva Narechania

PMID: 12952881
PMCID: PMC403709
DOI: 10.1101/gr.772403

Affiliation

¹ Protein Informatics, Celera Genomics, Foster City, California 94404, USA. paul.thomas@fc.celera.com

Abstract

In the genomic era, one of the fundamental goals is to characterize the function of proteins on a large scale. We describe a method, PANTHER, for relating protein sequence relationships to function relationships in a robust and accurate way. PANTHER is composed of two main components: the PANTHER library (PANTHER/LIB) and the PANTHER index (PANTHER/X). PANTHER/LIB is a collection of "books," each representing a protein family as a multiple sequence alignment, a Hidden Markov Model (HMM), and a family tree. Functional divergence within the family is represented by dividing the tree into subtrees based on shared function, and by subtree HMMs. PANTHER/X is an abbreviated ontology for summarizing and navigating molecular functions and biological processes associated with the families and subfamilies. We apply PANTHER to three areas of active research. First, we report the size and sequence diversity of the families and subfamilies, characterizing the relationship between sequence divergence and functional divergence across a wide range of protein families. Second, we use the PANTHER/X ontology to give a high-level representation of gene function across the human and mouse genomes. Third, we use the family HMMs to rank missense single nucleotide polymorphisms (SNPs), on a database-wide scale, according to their likelihood of affecting protein function.
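The "book" described in the abstract bundles an alignment, a family HMM, a family tree and function-based subtrees. The sketch below is a hypothetical data model, not PANTHER's file format: the class and field names (`Book`, `Subfamily`, `tree_newick`, `hmm_path`) are illustrative assumptions, and it assumes every sequence in a family falls in exactly one subfamily, as the singleton subfamilies mentioned in Figure 1 suggest.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subfamily:
    """A subtree of the family tree whose members share a function (hypothetical model)."""

    name: str
    members: set[str]            # sequence IDs at the leaves of this subtree
    hmm_path: str | None = None  # placeholder for a subfamily HMM file (assumption)


@dataclass
class Book:
    """One PANTHER/LIB 'book': alignment, family HMM, tree and subfamilies (hypothetical model)."""

    family_id: str
    alignment: dict[str, str]    # sequence ID -> aligned sequence, gaps as '-'
    tree_newick: str             # family tree stored as a Newick string (assumption)
    hmm_path: str | None = None  # placeholder for the family HMM file (assumption)
    subfamilies: list[Subfamily] = field(default_factory=list)

    def subfamily_of(self, seq_id: str) -> str | None:
        """Name of the subfamily containing seq_id, or None if it is unassigned."""
        for sf in self.subfamilies:
            if seq_id in sf.members:
                return sf.name
        return None

    def is_partition(self) -> bool:
        """True if the subfamilies cover every aligned sequence exactly once."""
        seen: set[str] = set()
        for sf in self.subfamilies:
            if seen & sf.members:
                return False  # a sequence was placed in two subtrees
            seen |= sf.members
        return seen == set(self.alignment)
```

Checking that subfamilies partition the family makes the "dividing the tree into subtrees" step concrete: each sequence ends up with exactly one functional label.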


Figures

**Figure 1**
Number of sequences in PANTHER families and subfamilies. (A) Distribution of the sizes of PANTHER/LIB families. Families are limited to between 10 and 1000 sequences (inclusive). (B) Distribution of the sizes of PANTHER/LIB subfamilies. Singleton subfamilies are not included in the figure. The insets show a more detailed view of the distributions for sizes smaller than 100 sequences.
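The two constraints in this caption, the 10 to 1000 sequence limit on families and the exclusion of singletons from the subfamily histogram, can be expressed directly. This is an illustrative sketch only; the bin width of 10 is an assumption, since the caption does not state how the histograms were binned.

```python
from __future__ import annotations

from collections import Counter

MIN_FAMILY_SIZE = 10    # lower limit stated in the Figure 1 caption
MAX_FAMILY_SIZE = 1000  # upper limit stated in the Figure 1 caption


def family_size_ok(n_sequences: int) -> bool:
    """True if a family size falls within the stated 10-1000 sequence limits."""
    return MIN_FAMILY_SIZE <= n_sequences <= MAX_FAMILY_SIZE


def subfamily_size_histogram(sizes: list[int], bin_width: int = 10) -> dict[int, int]:
    """Count subfamilies per size bin, skipping singletons as in Figure 1B.

    Keys are the lower edge of each bin; with bin_width=10, key 10 covers sizes 10-19.
    """
    counts = Counter((s // bin_width) * bin_width for s in sizes if s > 1)
    return dict(sorted(counts.items()))


print(subfamily_size_histogram([1, 1, 2, 5, 12, 15, 40]))  # {0: 2, 10: 2, 40: 1}
```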

**Figure 3**
Pairwise identity within PANTHER families and subfamilies. (A) Average pairwise identity within PANTHER families. (B) Average pairwise identity within PANTHER subfamilies. Singleton subfamilies are not included. Pairwise identity is calculated only over the region of the sequences that aligns to the family HMM.
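Restricting identity to the HMM-aligned region can be sketched as follows. This assumes the alignment is a list of equal-length strings with '-' for gaps, that `columns` lists the indices of columns aligned to HMM match states, and that a pair's identity is identical residues divided by columns where both sequences have a residue; the caption does not give the exact denominator, so that choice is an assumption.

```python
from __future__ import annotations

from itertools import combinations


def pairwise_identity(a: str, b: str, columns: list[int], gap: str = "-") -> float | None:
    """Identical residues / compared residues over `columns`, ignoring gapped positions.

    Returns None if the two sequences share no residue-residue column.
    """
    compared = identical = 0
    for i in columns:
        x, y = a[i], b[i]
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        identical += x == y
    return identical / compared if compared else None


def average_pairwise_identity(alignment: list[str], columns: list[int]) -> float:
    """Mean identity over all unordered sequence pairs, restricted to HMM-aligned columns."""
    values = [
        pid
        for a, b in combinations(alignment, 2)
        if (pid := pairwise_identity(a, b, columns)) is not None
    ]
    if not values:
        raise ValueError("no comparable sequence pairs")
    return sum(values) / len(values)


print(round(average_pairwise_identity(["MKVL", "MKIL", "MR-L"], [0, 1, 2, 3]), 3))  # 0.694
```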

**Figure 4**
Comparing classifications of human and mouse LocusLink genes using GO terms and their mapped PANTHER/X terms. Top-level molecular function categories for (A) PANTHER/X and (B) GO. Top-level biological process terms for (C) PANTHER/X and (D) GO. The set of gene classifications is identical for PANTHER/X and GO; the difference is in organization (relationships between ontology terms).
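The caption's point, that identical gene annotations can yield different category counts under differently organized ontologies, can be illustrated by rolling each gene's terms up to top-level categories. In this sketch the hierarchies are toy dictionaries mapping a term to its parent terms (GO allows several parents per term), "top-level" means a term with no parent in the dictionary passed in, and every gene and term name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def top_level_terms(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All ancestors of `term` (or `term` itself) that have no parent in `parents`."""
    stack, seen, tops = [term], set(), set()
    while stack:
        t = stack.pop()
        if t in seen:
            continue  # already reached through another path in the DAG
        seen.add(t)
        ps = parents.get(t, set())
        if not ps:
            tops.add(t)
        stack.extend(ps)
    return tops


def genes_per_category(annotations: dict[str, set[str]], parents: dict[str, set[str]]) -> Counter:
    """Number of genes under each top-level category; each gene counts once per category."""
    counts: Counter = Counter()
    for terms in annotations.values():
        categories: set[str] = set()
        for t in terms:
            categories |= top_level_terms(t, parents)
        counts.update(categories)
    return counts


# Hypothetical annotations, organized under two hypothetical hierarchies.
genes = {"g1": {"kinase"}, "g2": {"kinase", "receptor"}}
hierarchy_a = {"kinase": {"transferase"}, "receptor": {"receptor_activity"}}
hierarchy_b = {"kinase": {"transferase", "signaling"}, "receptor": {"signaling"}}
print(genes_per_category(genes, hierarchy_a))
print(genes_per_category(genes, hierarchy_b))
```

The gene-to-term assignments are the same in both runs; only the hierarchy changes, and with it the top-level counts.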

**Figure 5**
Distribution of amino acid scores (aaPEC) for different missense SNP alleles in HGMD and dbSNP. (A) The distribution from HGMD shows that >40% of the disease-associated mutant alleles (hatched bars) are rare (aaPEC < –3) in alignments of related sequences, whereas >70% of the wild-type alleles (black bars) are the most common allele across evolutionarily related sequences (aaPEC = 0). (B) The distribution from dbSNP (presumably randomly sampled SNPs) is very different from A, containing four times fewer evolutionarily rare alleles (aaPEC < –3) and more than one-third fewer evolutionarily most common alleles (aaPEC = 0).
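The caption uses two cut-offs: aaPEC = 0 for the most common allele and aaPEC < –3 for evolutionarily rare alleles. The sketch below assumes one definition consistent with those cut-offs, the natural log of the allele's probability divided by the highest amino acid probability at the aligned HMM position. That formula, the function names and the example probabilities are assumptions for illustration, not values or code taken from PANTHER.

```python
import math


def aa_pec(emissions: dict[str, float], allele: str) -> float:
    """Hypothetical aaPEC: ln(P(allele) / max P) at one HMM position.

    Equals 0 for the most probable amino acid and is negative for rarer ones.
    """
    p = emissions[allele]
    if p <= 0:
        return float("-inf")  # allele never emitted at this position
    return math.log(p / max(emissions.values()))


def classify(score: float) -> str:
    """Bin a score using the two cut-offs quoted in the Figure 5 caption."""
    if score == 0:
        return "most common"
    if score < -3:
        return "rare"
    return "intermediate"


# Assumed emission probabilities at a single aligned position.
emissions = {"L": 0.6, "I": 0.3, "W": 0.01}
for aa in emissions:
    print(aa, round(aa_pec(emissions, aa), 2), classify(aa_pec(emissions, aa)))
```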

**Figure 6**
Predicting whether a missense SNP will have an effect on protein function: comparison between position-specific scores (subPEC) and “average” substitution scores. Position-specific scores from PANTHER HMMs (blue line) make a larger number of correct predictions (true positives shown on Y-axis) for a given number of errors (false positives shown on X-axis) than the two most commonly referenced substitution scores: the Grantham scale (green line) and the BLOSUM62 substitution matrix (red line). The black line shows the curve for a random prediction, as a reference. HGMD mutations are used to approximate a set of functionally impaired proteins, and dbSNP variations are used to approximate a set of functional proteins (see text for more details).
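Curves like these plot cumulative false positives against true positives as a score threshold is relaxed. A minimal sketch, assuming HGMD alleles form the positive set, dbSNP alleles the negative set, and that lower (more negative) scores indicate a greater likelihood of affecting function; the function name and inputs are hypothetical, and the scores could come from any of the three scoring schemes being compared.

```python
from __future__ import annotations


def roc_points(positive_scores: list[float], negative_scores: list[float]) -> list[tuple[int, int]]:
    """(false positives, true positives) after each distinct threshold, lowest score first.

    A variant is called 'affects function' if its score is at or below the threshold.
    Tied scores are added together so the curve does not depend on input order.
    """
    labeled = sorted([(s, 1) for s in positive_scores] + [(s, 0) for s in negative_scores])
    tp = fp = 0
    points = [(0, 0)]
    i = 0
    while i < len(labeled):
        score = labeled[i][0]
        while i < len(labeled) and labeled[i][0] == score:
            if labeled[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp, tp))
    return points


print(roc_points([-5, -4, -1], [-4.5, 0, 0]))  # [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3), (3, 3)]
```

A random predictor would, on average, trace the straight line from (0, 0) to (number of negatives, number of positives), which is the role of the black reference line.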

**Figure 7**
Schematic illustration of the process for building PANTHER families.


Cited by

The Long Non-Coding RNA MALAT1 Modulates NR4A1 Expression through a Downstream Regulatory Element in Specific Cancer Cell Types.
Wernig-Zorc S, Schwartz U, Martínez-Rodríguez P, Inalef J, Pavicic F, Ehrenfeld P, Längst G, Maldonado R. Int J Mol Sci. 2024 May 18;25(10):5515. doi: 10.3390/ijms25105515. PMID: 38791553.
In vitro generation of genetic diversity for directed evolution by error-prone artificial DNA synthesis.
Wang B, Liu Y, Bai X, Tian H, Wang L, Feng M, Xia H. Commun Biol. 2024 May 24;7(1):628. doi: 10.1038/s42003-024-06340-0. PMID: 38789612.
Integrated proteomic, phosphoproteomic, and N-glycoproteomic analyses of small extracellular vesicles from C2C12 myoblasts identify specific PTM patterns in ligand-receptor interactions.
Chen X, Song X, Li J, Wang J, Yan Y, Yang F. Cell Commun Signal. 2024 May 16;22(1):273. doi: 10.1186/s12964-024-01640-8. PMID: 38755675.
Lactobacillus paracasei subsp. paracasei 2004 improves health and lifespan in Caenorhabditis elegans.
Kishimoto S, Nono M, Makizaki Y, Tanaka Y, Ohno H, Nishida E, Uno M. Sci Rep. 2024 May 7;14(1):10453. doi: 10.1038/s41598-024-60580-y. PMID: 38714725.
Chromatin accessibility profiling reveals that human fibroblasts respond to mechanical stimulation in a cell-specific manner.
Logan NJ, Broda KL, Pantelireis N, Williams G, Higgins CA. JBMR Plus. 2024 Feb 29;8(5):ziae025. doi: 10.1093/jbmrpl/ziae025. eCollection 2024 May. PMID: 38682000.
