Nucleic Acids Res. 2022 Jan 7;50(D1):D785-D794.
doi: 10.1093/nar/gkab776.

GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy

Donovan H Parks et al. Nucleic Acids Res.

Abstract

The Genome Taxonomy Database (GTDB; https://gtdb.ecogenomic.org) provides a phylogenetically consistent and rank normalized genome-based taxonomy for prokaryotic genomes sourced from the NCBI Assembly database. GTDB R06-RS202 spans 254 090 bacterial and 4316 archaeal genomes, a 270% increase since the introduction of the GTDB in November 2017. These genomes are organized into 45 555 bacterial and 2339 archaeal species clusters, which is a 200% increase since the integration of species clusters into the GTDB in June 2019. Here, we explore prokaryotic diversity from the perspective of the GTDB and highlight the importance of metagenome-assembled genomes in expanding available genomic representation. We also discuss improvements to the GTDB website which allow tracking of taxonomic changes, easy assessment of genome assembly quality, and identification of genomes assembled from type material or used as species representatives. Methodological updates and policy changes made since the inception of the GTDB are then described along with the procedure used to update species clusters in the GTDB. We conclude with a discussion on the use of average nucleotide identities as a pragmatic approach for delineating prokaryotic species.
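The counts quoted in the abstract can be combined into a few derived figures. The following is a minimal arithmetic sketch, not part of the paper: the function name `summarize` and the dictionary keys are illustrative assumptions, and only the four input counts come from the abstract.

```python
# Counts quoted in the abstract for GTDB R06-RS202.
BACTERIAL_GENOMES = 254_090
ARCHAEAL_GENOMES = 4_316
BACTERIAL_SPECIES = 45_555
ARCHAEAL_SPECIES = 2_339


def summarize(n_bac, n_arc, sp_bac, sp_arc):
    """Return totals and two derived ratios (hypothetical helper)."""
    total_genomes = n_bac + n_arc
    total_species = sp_bac + sp_arc
    return {
        "total_genomes": total_genomes,
        "total_species_clusters": total_species,
        # Share of all genomes that are archaeal.
        "archaeal_genome_fraction": n_arc / total_genomes,
        # Average number of genomes per species cluster.
        "mean_genomes_per_cluster": total_genomes / total_species,
    }


summary = summarize(
    BACTERIAL_GENOMES, ARCHAEAL_GENOMES, BACTERIAL_SPECIES, ARCHAEAL_SPECIES
)
for key, value in summary.items():
    print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these inputs the release holds 258 406 genomes in 47 894 species clusters, so the mean cluster size is a little over five genomes, although the distribution across clusters is not stated in the abstract.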

Figures

Figure 1.
Growth of the GTDB since its inception in November 2017. (A, B) Number of bacterial and archaeal isolates, MAGs, and SAGs in the GTDB along with the total number of genomes. Archaea were introduced into the GTDB starting with R03-RS86 in August, 2018. (C, D) Percent growth in the number of bacterial and archaeal taxa in the GTDB. (E, F) Proportion of bacterial and archaeal taxa at each taxonomic rank in GTDB R06-RS202 comprised exclusively of environmental genomes (MAGs and/or SAGs), exclusively of isolates, or both isolate and environmental genomes. For comparison, the proportion of isolate and environmental genomes is shown in the right bar plot.
Figure 2.
Taxonomic, nomenclatural, and assembly quality information provided for individual genomes. (A) NCBI genome assembly accession and GTDB quality badges associated with this genome. Hovering over a tag provides information about the criteria used to establish the tag. (B) An external link is always provided to the NCBI Assembly page of the genome as all GTDB genomes are sourced from NCBI. A link is also provided to an LPSN species page when a genome is established to be assembled from the type strain of a species based on LPSN nomenclatural information. (C) GTDB and NCBI classifications for this genome along with its strain identifiers, nomenclatural status at GTDB (i.e. type strain of the species), and GTDB species representative status. GTDB taxa link to their corresponding position in the GTDB Taxonomy Tree (i.e. Figure 2F) while NCBI taxa link to NCBI Taxonomy Browser pages. Each genome also links to a table listing all genomes in the same GTDB species cluster. (D) GTDB classification of the genome in each GTDB release. GTDB taxa link to their corresponding Taxon History page (i.e. Figure 2E). (E) GTDB Taxon History view for genomes classified as Enterocloster bolteae, indicating this species was reclassified from Clostridium_M to Enterocloster in GTDB R95. Numbers in parentheses indicate the number of genomes assigned to a taxon. The Not Present label indicates genomes that were not available at the time of a GTDB release or failed the GTDB quality-control criteria used for the release, and thus had no GTDB classification. (F) GTDB Taxonomy Tree which provides a hierarchical exploration of the GTDB taxonomy and indicates nomenclatural type information, genomes selected as GTDB representatives, and Latin names in the GTDB which remain to be validated. Genomes link to their corresponding GTDB Genome page (i.e. A–D).
Figure 3.
Updating species clusters with each GTDB release. (A) Workflow for updating GTDB species clusters with results for the most recent GTDB release, R06-RS202, given below each step. There were 90 368 new genomes in this release, 987 genomes where the assembly at NCBI was updated, and 158 genomes where the assembly was suppressed at NCBI and thus not used in this release. All genomes were subjected to quality control which resulted in 26 407 (9.3%) genomes being removed from consideration. There were 2458 species where multiple genomes were identified as being assembled from the type strain of the species. Of these, 130 species had genomes that were sufficiently divergent to warrant manual inspection to establish the genome most likely to be from the type strain. The 31 910 representatives from the previous GTDB release, R05-RS95, were examined and 1131 (3.5%) updated to a new genome. In addition, 6 species defined in R05-RS95 were retired as the sole genome representing the species was suppressed at NCBI. (B) Illustrative example of a GTDB species cluster with previous and new genomes. Genomes are depicted by shapes and the distance between genomes scales with their ANI divergence. The large red circle indicates the ANI circumscription radius for assigning genomes to the current species cluster. The new/updated genome (blue triangle) will only replace the existing GTDB species representative (red circle) if the ANI between these genomes is sufficiently high and the new/updated genome is of sufficient quality (see Table 1). This decision is determined quantitatively using the balanced ANI score (see main text). (C) Updating the Macrococcus equipercicus species cluster from GTDB R05-RS95 to R06-RS202. The M. equipercicus genome assembly, GCF_004359525.1, was updated and found to be distinct from the previous assembly (ANI = 80.6%). Consequently, this genome formed a new species cluster and the genome GCF_004359515.1 was promoted to a species representative. GCF_004359525.2 is assembled from the type strain of M. equipercicus and GCF_004359515.1 is assembled from the type strain of M. carouselicus, indicating that the M. equipercicus cluster in GTDB R05-RS95 actually represented the species M. carouselicus and was incorrectly classified as a result of the GCF_004359525.1 assembly being incorrect.
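The assignment step described in panels B and C can be sketched as a simple threshold test. This is an illustration under stated assumptions, not the GTDB implementation: the caption gives no numeric circumscription radius, so the 95% default below is an assumption, and the function name, parameter names, and the `rep_a`/`rep_b` genomes are hypothetical. The balanced ANI score used to decide representative replacement is defined in the main text and is not reproduced here.

```python
from typing import Dict, Optional

# Assumed circumscription radius in percent ANI; the caption states no value.
ASSUMED_ANI_RADIUS = 95.0


def assign_to_species_cluster(
    ani_to_representatives: Dict[str, float],
    radius: float = ASSUMED_ANI_RADIUS,
) -> Optional[str]:
    """Return the closest representative if it lies within the radius.

    Returns None when no representative is close enough, meaning the
    genome would seed a new species cluster in this simplified model.
    """
    if not ani_to_representatives:
        return None
    # Pick the representative with the highest ANI to the query genome.
    best_rep = max(
        ani_to_representatives, key=lambda rep: ani_to_representatives[rep]
    )
    if ani_to_representatives[best_rep] >= radius:
        return best_rep
    return None


# The updated assembly in panel C had ANI = 80.6% to the previous assembly,
# far outside any plausible radius, so it forms a new cluster.
print(assign_to_species_cluster({"previous_representative": 80.6}))

# Hypothetical genome close to one of two representatives.
print(assign_to_species_cluster({"rep_a": 98.2, "rep_b": 91.0}))
```

The threshold is kept as a parameter so the sketch does not commit to a particular radius.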
Figure 4.
Use of genomic similarity for delineating species. (A) ANI values between 36 781 GTDB species representatives and their closest representative within the same genus. (B) Same plot as in A but restricted to the 9687 species where the GTDB representative genome is assembled from the type strain. (C) Pairwise ANI between the 24 Enterocloster bolteae and 34 E. clostridioformis genomes in GTDB R06-RS202. (D) Pairwise ANI between the 14 Bradyrhizobium elkanii and 8 B. pachyrhizi genomes in GTDB R06-RS202. (E) ANI between genomes and their closest genome in a different, intrageneric species cluster (108 503 total pairs). (F) ANI between the closest pairs in plot E for each of the 35 147 species considered.
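The quantity plotted in panel A, the ANI between each species representative and its closest representative in the same genus, can be sketched as follows. The function name, the placeholder genome and genus labels, and all ANI values are invented for illustration. ANI is treated as symmetric, which is a simplifying assumption.

```python
from typing import Dict, Tuple


def closest_intrageneric(
    genus_of: Dict[str, str],
    pairwise_ani: Dict[Tuple[str, str], float],
) -> Dict[str, Tuple[str, float]]:
    """Map each representative to its closest same-genus representative.

    Representatives with no same-genus partner are omitted from the result.
    """
    closest: Dict[str, Tuple[str, float]] = {}
    for (a, b), ani in pairwise_ani.items():
        if genus_of[a] != genus_of[b]:
            continue  # only intrageneric comparisons count
        # Record the pair in both directions, since ANI is assumed symmetric.
        for rep, other in ((a, b), (b, a)):
            if rep not in closest or ani > closest[rep][1]:
                closest[rep] = (other, ani)
    return closest


# Made-up example data: three representatives in one genus, one in another.
genus_of = {"A_sp1": "GenusA", "A_sp2": "GenusA", "A_sp3": "GenusA", "B_sp1": "GenusB"}
pairwise_ani = {
    ("A_sp1", "A_sp2"): 93.5,
    ("A_sp1", "A_sp3"): 88.0,
    ("A_sp2", "A_sp3"): 90.1,
    ("A_sp1", "B_sp1"): 78.0,
}

result = closest_intrageneric(genus_of, pairwise_ani)
for rep, (other, ani) in sorted(result.items()):
    print(f"{rep}: closest intrageneric representative {other} at {ani}% ANI")
```

A histogram of the ANI values in `result` would correspond to the kind of distribution shown in panel A, here on toy data only.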
