Comparative Study
Genome Res. 2003 Sep;13(9):2129-41.
doi: 10.1101/gr.772403.

PANTHER: a library of protein families and subfamilies indexed by function

Paul D Thomas et al. Genome Res. 2003 Sep.

Abstract

In the genomic era, one of the fundamental goals is to characterize the function of proteins on a large scale. We describe a method, PANTHER, for relating protein sequence relationships to function relationships in a robust and accurate way. PANTHER is composed of two main components: the PANTHER library (PANTHER/LIB) and the PANTHER index (PANTHER/X). PANTHER/LIB is a collection of "books," each representing a protein family as a multiple sequence alignment, a Hidden Markov Model (HMM), and a family tree. Functional divergence within the family is represented by dividing the tree into subtrees based on shared function, and by subtree HMMs. PANTHER/X is an abbreviated ontology for summarizing and navigating molecular functions and biological processes associated with the families and subfamilies. We apply PANTHER to three areas of active research. First, we report the size and sequence diversity of the families and subfamilies, characterizing the relationship between sequence divergence and functional divergence across a wide range of protein families. Second, we use the PANTHER/X ontology to give a high-level representation of gene function across the human and mouse genomes. Third, we use the family HMMs to rank missense single nucleotide polymorphisms (SNPs), on a database-wide scale, according to their likelihood of affecting protein function.

Figures

Figure 1
Number of sequences in PANTHER families and subfamilies. (A) Distribution of the sizes of PANTHER/LIB families. Note that families are limited to no fewer than 10 and no more than 1000 sequences. (B) Distribution of the sizes of PANTHER/LIB subfamilies. Singleton subfamilies are not included in the figure. The insets show a more detailed view of the distributions for sizes smaller than 100 sequences.
Figure 2
Overlap of PANTHER families. Some sequences appear in more than one family, and this figure shows the distribution of the number of families in which a given sequence appears. Most sequences (163,912, 85%) appear in only one family, and no sequence appears in more than nine families.
Figure 3
Pairwise identity within PANTHER families and subfamilies. (A) Average pairwise identity within PANTHER families. (B) Average pairwise identity within PANTHER subfamilies. Singleton subfamilies are not included. Pairwise identity is calculated only over the region of each sequence that aligns to the family HMM.
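The average pairwise identity described in this caption can be illustrated with a minimal Python sketch. It assumes the sequences have already been trimmed to the HMM-aligned region and padded with `-` gaps, and that columns where either sequence has a gap are skipped; the caption does not state the gap convention, so that choice, the function names, and the toy alignment are all hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where both sequences have a residue.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def average_pairwise_identity(alignment: list[str]) -> float:
    """Mean identity over all unordered pairs of sequences in the alignment."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical data, not taken from PANTHER).
toy_alignment = ["MKV-LA", "MKVALA", "MRV-LG"]
toy_average = average_pairwise_identity(toy_alignment)
```

A singleton family has no pairs, which is why the sketch raises an error in that case; this matches the caption's note that singleton subfamilies are excluded.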
Figure 4
Comparing classifications of human and mouse LocusLink genes using GO terms and their mapped PANTHER/X terms. Top-level molecular function categories for (A) PANTHER/X and (B) GO. Top-level biological process terms for (C) PANTHER/X and (D) GO. The set of gene classifications is identical for PANTHER/X and GO; the difference is in organization (relationships between ontology terms).
Figure 5
Distribution of amino acid scores (aaPEC) for different missense SNP alleles in HGMD and dbSNP. (A) The distribution from HGMD shows that >40% of the disease-associated mutant alleles (hatched bars) are rare (aaPEC < –3) in alignments of related sequences, whereas >70% of the wild-type alleles (black bars) are the most common allele across evolutionarily related sequences (aaPEC = 0). (B) The distribution from dbSNP (presumably randomly sampled SNPs) is very different from A, containing four times fewer evolutionarily rare alleles (aaPEC < –3) and more than one-third fewer evolutionarily most common alleles (aaPEC = 0).
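To make the score ranges in this caption concrete, here is a small Python sketch of a position-specific score with the same behavior: 0 for the most probable amino acid at an alignment position and increasingly negative for rarer ones. It assumes the score is the natural log of an amino acid's probability relative to the most probable amino acid at that position. This excerpt does not define aaPEC, so the formula, the function name `aa_pec`, and the probabilities are assumptions for illustration only.

```python
import math


def aa_pec(position_probs: dict[str, float], amino_acid: str) -> float:
    """Log ratio of an amino acid's probability to the most probable one.

    Assumed form: ln(P(aa) / max P). Returns 0.0 for the most common
    amino acid at the position and a negative value otherwise.
    """
    p = position_probs[amino_acid]
    if p <= 0:
        raise ValueError("probability must be positive")
    p_max = max(position_probs.values())
    return math.log(p / p_max)


# Hypothetical emission probabilities for one alignment position.
toy_position = {"L": 0.70, "I": 0.20, "V": 0.08, "P": 0.02}

score_common = aa_pec(toy_position, "L")  # most common allele at this position
score_rare = aa_pec(toy_position, "P")    # evolutionarily rare allele
```

Under this assumed form, an allele falls below the caption's cutoff of –3 when it is roughly 20 times less probable than the most common amino acid at that position, since e^3 is about 20.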
Figure 6
Predicting whether a missense SNP will have an effect on protein function: comparison between position-specific scores (subPEC) and “average” substitution scores. Position-specific scores from PANTHER HMMs (blue line) make a larger number of correct predictions (true positives shown on Y-axis) for a given number of errors (false positives shown on X-axis) than scores from the two most commonly referenced substitution scores: the Grantham scale (green line) and the BLOSUM62 substitution matrix (red line). The black line shows the curve for a random prediction, as a reference. HGMD mutations are used to approximate a set of functionally impaired proteins, and dbSNP variations are used to approximate a set of functional proteins (see text for more details).
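The curves in this figure plot true positives against false positives as a score threshold is varied. A minimal Python sketch of that bookkeeping follows, assuming that lower scores indicate a larger predicted effect on function and that a variant is called deleterious when its score is at or below the threshold. The function name, the threshold convention, and the scores are hypothetical and are not values from HGMD or dbSNP.

```python
def roc_points(impaired_scores: list[float],
               functional_scores: list[float]) -> list[tuple[int, int]]:
    """Return (false_positives, true_positives) counts for each threshold.

    impaired_scores: scores for variants treated as functionally impaired
        (the role HGMD mutations play in the caption).
    functional_scores: scores for variants treated as functional
        (the role dbSNP variations play in the caption).
    Assumption: a variant is predicted deleterious when score <= threshold.
    """
    thresholds = sorted(set(impaired_scores) | set(functional_scores))
    points = [(0, 0)]  # strictest threshold: nothing is predicted deleterious
    for t in thresholds:
        tp = sum(s <= t for s in impaired_scores)
        fp = sum(s <= t for s in functional_scores)
        points.append((fp, tp))
    return points


# Hypothetical scores for illustration only.
toy_impaired = [-5.0, -4.0, -1.0]
toy_functional = [-3.0, -0.5, 0.0]
toy_curve = roc_points(toy_impaired, toy_functional)
```

A scoring scheme that separates the two sets well accumulates true positives before false positives, so its curve rises steeply near the origin. A random prediction tracks the diagonal, like the black reference line in the figure.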
Figure 7
Schematic illustration of the process for building PANTHER families.
