Nat Methods. 2016 Jul;13(7):581-3.
doi: 10.1038/nmeth.3869. Epub 2016 May 23.

DADA2: High-resolution sample inference from Illumina amplicon data

Benjamin J Callahan et al. Nat Methods. 2016 Jul.

Abstract

We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.

Figures

Figure 1. Sequence variants inferred by DADA2 compared to the OTUs constructed by UPARSE
The merged sequences output by DADA2 are plotted for three Illumina amplicon datasets: (a) Balanced, (b) HMP, and (c) Extreme. Frequency is plotted on the y-axis and Hamming distance to the closest more-abundant sequence on the x-axis. Shapes represent accuracy (Methods). When variants are well separated from other members of the community, the sequence variants inferred by DADA2 largely coincide with the OTUs output by UPARSE (black). However, DADA2 resolves additional variation (blue), especially within UPARSE's OTU radius (dashed line), while outputting fewer spurious sequences (One Off and Other).
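The x-axis quantity in Figure 1, the Hamming distance from each sequence to its closest more-abundant sequence, can be illustrated with a minimal Python sketch. This is not code from DADA2. The function names and the toy sequences and counts are hypothetical, and the sketch assumes that only equal-length sequences are compared and that ties in abundance do not count as "more abundant".

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def distance_to_closest_more_abundant(
    counts: Dict[str, int],
) -> Dict[str, Optional[int]]:
    """For each sequence, return the smallest Hamming distance to any
    strictly more abundant sequence of the same length, or None if no
    such sequence exists (for example, the most abundant sequence)."""
    result: Dict[str, Optional[int]] = {}
    for seq, n in counts.items():
        dists = [
            hamming(seq, other)
            for other, m in counts.items()
            # Assumption: skip sequences of a different length.
            if m > n and len(other) == len(seq)
        ]
        result[seq] = min(dists) if dists else None
    return result


# Hypothetical toy abundances, not data from the paper.
toy_counts = {
    "ACGTACGT": 1000,
    "ACGTACGA": 120,
    "ACGAACGA": 15,
    "TTGTACGT": 15,
}
distances = distance_to_closest_more_abundant(toy_counts)
for seq, d in distances.items():
    print(seq, toy_counts[seq], d)
```

In this toy example a low-abundance sequence one mismatch away from a more abundant one would fall near the left edge of such a plot, the region where the caption reports that DADA2 resolves variation inside the UPARSE OTU radius.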
Figure 2. Lactobacillus crispatus sequence variants in the human vaginal community during pregnancy
DADA2 identified six Lactobacillus crispatus 16S rRNA sequence variants that were present in multiple samples and accounted for a significant fraction of all reads (L1: 19.7%, L2: 11.1%, L3: 6.5%, L4: 3.1%, L5: 1.3%, L6: 0.4%). (a) The frequency of L1–L6 in each sample. Black bars at the bottom link samples from the same subject. (b, c) The per-sample frequency of (b) L1 vs. L2 and (c) L1 vs. L3. The dashed line indicates a total frequency of 1.
