Leonardo Mancabelli, Christian Milani, Gabriele Andrea Lugli, Francesca Turroni, Deborah Cocconi, Douwe van Sinderen, Marco Ventura, Identification of universal gut microbial biomarkers of common human intestinal diseases by meta-analysis, FEMS Microbiology Ecology, Volume 93, Issue 12, December 2017, fix153, https://doi.org/10.1093/femsec/fix153
Abstract
Intestinal diseases, such as Crohn's disease (CD), ulcerative colitis (UC) and pseudomembranous colitis (CDI), are among the most common diseases in humans and may lead to more serious pathologies, e.g. colorectal cancer (CRC). Next-generation sequencing has in recent years allowed the identification of correlations between intestinal bacteria and diseases, although the formulation of universal gut microbial biomarkers for such diseases is only in its infancy. In the current study, we selected and reanalyzed a total of 3048 public datasets obtained from 16S rRNA profiling of individuals affected by CD, UC, CDI and CRC. This meta-analysis revealed possible biases in the reconstruction of the gut microbiota composition due to the use of different primer pairs for PCR amplification of 16S rRNA gene fragments. Notably, this approach also identified common features of individuals affected by gut diseases (DS), including lower biodiversity compared to control subjects (CTRL). Moreover, potential universal intestinal disease microbial biomarkers were identified through cross-disease comparisons. In detail, CTRL showed high abundance of the genera Barnesiella, Ruminococcaceae UCG-005, Alistipes, Christensenellaceae R-7 group and an unclassified member of the Lachnospiraceae family, while DS exhibited high abundance of Lactobacillus, an unclassified member of the Erysipelotrichaceae family and Streptococcus.
INTRODUCTION
In the last 50 years, the incidence and prevalence of inflammatory bowel diseases (IBD), such as Crohn's disease (CD) and ulcerative colitis (UC), have increased worldwide (Cosnes et al. 2011; Molodecky et al. 2012; Ananthakrishnan 2015), especially in traditionally low-incidence regions such as Asia, South America and southern and eastern Europe (Lovasz et al. 2013; Ng et al. 2013). Moreover, IBD represents one of the main risk factors for the development of colorectal cancer (CRC) (Kulaylat and Dayton 2010), which is a major cause of morbidity and mortality throughout the world (Haggar and Boushey 2009). Furthermore, populations living in industrialized countries have been affected in the last 15 years by an increased incidence of Clostridium difficile infections (CDI) (Reveles et al. 2014), which represent one of the most common hospital-acquired infections, are generally associated with antibiotic use and are responsible for pseudomembranous colitis (Antharam et al. 2013). The recent development of high-throughput sequencing technologies, such as Roche 454, Ion Torrent and Illumina, has allowed profiling of the bacterial population harbored by the intestinal tract, i.e. the gut microbiota, and characterization of alterations in this microbiota, sometimes referred to as gut dysbiosis, that are associated with major intestinal diseases (Carding et al. 2015).
Although many studies have investigated possible correlations between gut microbiota and intestinal diseases, thereby revealing possible gut bacterial biomarkers (Table S1, Supporting Information), most such studies have focused on a single pathology such as CD (Perez-Brocal et al. 2015; Eun et al. 2016), UC (Duranti et al. 2016; Mar et al. 2016), CRC (Kostic et al. 2012; Geng et al. 2013; Wu et al. 2013; Zackular et al. 2014; Burns et al. 2015) or CDI (Yatsunenko et al. 2012; Antharam et al. 2013; Rojo et al. 2015; Gu et al. 2016; Khanna et al. 2016; Milani et al. 2016). Moreover, the accuracy of the majority of published microbiota analyses is very much dependent on the applied methodology (Turroni et al. 2012; Milani et al. 2013). One of the most critical steps for accurate 16S rRNA-based microbiota profiling is the selection of the primer pairs used for amplification, which may lead to under-representation of, or selection against, single bacterial species or even complete microbial groups (Klindworth et al. 2013).
Here, we assess the accuracy of different, currently used 16S rRNA gene-targeting PCR primers and evaluate their impact on the profiling of the gut microbiota. Furthermore, we performed a meta-analysis of case-control studies focusing on 16S rRNA profiling of the gut microbiota of individuals affected by CD, UC, CRC and CDI. We selected a total of 3048 datasets, 1252 corresponding to control subjects (CTRL) and 1796 corresponding to individuals with intestinal diseases, retrieved from 24 public studies (Table S2, Supporting Information). In detail, we collected 359, 1457, 512 and 720 samples belonging to CRC (Kostic et al. 2012; Geng et al. 2013; Weir et al. 2013; Wu et al. 2013; Zackular et al. 2014; Burns et al. 2015), CD (Gevers et al. 2014; Perez-Brocal et al. 2015; Eun et al. 2016), Clostridium difficile infections (Yatsunenko et al. 2012; Antharam et al. 2013; Rojo et al. 2015; Gu et al. 2016; Khanna et al. 2016; Milani et al. 2016) and UC studies (Gevers et al. 2014; Duranti et al. 2016; Mar et al. 2016; Shah et al. 2016), respectively.
MATERIALS AND METHODS
Selection of databases
All datasets included in this meta-analysis were collected from publicly available and published comparative human gut microbiota studies in the context of CD, UC, CRC and Clostridium difficile infections. For each intestinal disease, we collected 16S rRNA profiling datasets from a minimum of five studies. Illumina sequencing technology was preferred in order to ensure high data coverage and quality. Nevertheless, if Illumina datasets were not available, we included data produced by means of 454 sequencing. Moreover, selected datasets had to include both control and diseased subjects, and had to be obtained from fecal samples or biopsies collected from the adult human large intestine (average age of 22 ± 18 years).
Evaluation of primer pairs efficiency
The performance of the primer pairs employed in the studies included in our meta-analysis (Table S3, Supporting Information) was evaluated through the web tool TestPrime 1.0 (Klindworth et al. 2013). This tool performs an in silico PCR using the SILVA database as template and provides the percentage of amplified sequences for each bacterial genus (Klindworth et al. 2013).
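The idea behind an in silico PCR can be illustrated with a minimal Python sketch. It is not the TestPrime implementation: the function names, the mismatch parameter and the toy sequences in the usage note are assumptions made for illustration only. A template counts as amplified when the forward primer matches the template and the reverse complement of the reverse primer matches downstream of it, with IUPAC degenerate bases taken into account.

```python
# IUPAC degenerate nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement, preserving IUPAC degeneracy."""
    return seq.translate(COMPLEMENT)[::-1]


def find_site(primer, template, max_mismatches=0):
    """Return the first index where the primer matches, or -1 if none."""
    for start in range(len(template) - len(primer) + 1):
        mismatches = 0
        for p, t in zip(primer, template[start:start + len(primer)]):
            if t not in IUPAC[p]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Inner loop finished without exceeding the mismatch budget.
            return start
    return -1


def amplifies(forward, reverse, template, max_mismatches=0):
    """True if both primers match the template in the expected orientation."""
    fwd_pos = find_site(forward, template, max_mismatches)
    if fwd_pos < 0:
        return False
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is searched for downstream of the forward primer site.
    # Simplification: only the first forward site is considered.
    downstream = template[fwd_pos + len(forward):]
    return find_site(reverse_complement(reverse), downstream, max_mismatches) >= 0


def coverage(forward, reverse, templates, max_mismatches=0):
    """Percentage of templates predicted to be amplified by the primer pair."""
    hits = sum(amplifies(forward, reverse, t, max_mismatches) for t in templates)
    return 100.0 * hits / len(templates)
```

Calling `coverage("ACGT", "TTGG", ["GGACGTAAAACCAATT", "GGACGTAAAACCTATT"])` on these two made-up templates counts only the first as amplified; allowing one mismatch with `max_mismatches=1` counts both.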
16S rRNA-based microbiota analysis
To avoid biases caused by different bioinformatic analysis pipelines, the sequence read pools of each study were filtered and analyzed using the same custom script based on the QIIME software suite (Caporaso et al. 2010). Quality control retained sequences with a length between 140 and 400 bp and a mean sequence quality score >20, while sequences with homopolymers >7 bp or mismatched primers were omitted. 16S rRNA operational taxonomic units (OTUs) were defined at ≥97% sequence homology using UCLUST (Edgar 2010), and OTUs with fewer than 10 sequences were removed. All reads were classified to the lowest possible taxonomic rank using QIIME (Caporaso et al. 2010) and a reference dataset from the SILVA database v.123 (Quast et al. 2013). In order to assess the bacterial complexity, alpha diversity was evaluated based on the Chao1 index and represented by rarefaction curves generated using 10 subsamplings of the whole datasets. Furthermore, the Bray–Curtis dissimilarity index was used to estimate the beta-diversity between CTRL and individuals affected by intestinal diseases. Dissimilarities were reported through a 3D principal coordinate analysis (PCoA) representation.
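The read-level filters described above can be sketched in a few lines of Python. This is an illustrative approximation rather than the authors' QIIME-based script: the function names are hypothetical, reads are assumed to be available as a sequence string plus a list of Phred scores, and the primer-mismatch check is left out.

```python
import re

MIN_LEN, MAX_LEN = 140, 400   # retained length window in bp, as stated in the text
MIN_MEAN_Q = 20               # mean quality score must exceed this value
MAX_HOMOPOLYMER = 7           # homopolymer runs longer than this are rejected


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def passes_qc(seq, quals):
    """Apply the length, mean-quality and homopolymer filters to one read."""
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        return False
    if sum(quals) / len(quals) <= MIN_MEAN_Q:
        return False
    return longest_homopolymer(seq) <= MAX_HOMOPOLYMER


def filter_otus(otu_counts, min_count=10):
    """Drop OTUs supported by fewer than `min_count` sequences."""
    return {otu: n for otu, n in otu_counts.items() if n >= min_count}
```

Keeping the thresholds as named constants makes it easy to apply the identical filter to every study, which is the point of reprocessing all read pools with one pipeline.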
QIIME and SPSS software (www.ibm.com/software/it/analytics/spss/) were used to compute statistical analyses. PERMANOVA was performed using 1000 permutations to estimate P-values for differences among populations in PCoA analyses. Furthermore, differential abundance of bacterial genera and differences in alpha-diversity were tested by ANOVA. Moreover, covariance analysis between primer pairs and bacterial relative abundance was performed through the Pearson correlation coefficient.
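As a rough illustration of the covariance analysis, the following Python sketch computes a Pearson correlation coefficient between two equally long lists, for example the predicted primer efficiency and the observed relative abundance of one genus across datasets. The P-value here is estimated by permutation, which is a substitution made to keep the example dependency-free; the source does not state how the P-values were derived. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation P-value for the observed correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)
```

A genus would then be flagged as primer-sensitive when the coefficient is positive and the P-value falls below the chosen threshold, 0.05 in the text.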
RESULTS AND DISCUSSION
Homogeneity of the samples
16S rRNA-based microbiota profiling is a technique that relies on next-generation sequencing data for a cost-effective analysis of the bacterial community present in a given environmental sample. Due to its accuracy and ability to profile non-cultivable taxa, 16S rRNA-based profiling rapidly became the most widely exploited approach for gut microbiota characterization. Nevertheless, the absence of a gold standard protocol led to extensive methodological variation, with consequent output biases that might prevent reliable and meaningful comparisons between datasets derived from different studies (Milani et al. 2013). In detail, possible biases may be due to study design, sample collection, transport and storage of the samples, DNA extraction and other variables related to sequencing and bioinformatics analyses (Milani et al. 2013). Among the main reasons for variable data outputs is the species-specific efficiency and accuracy of the various sets of PCR primers employed to amplify part of the 16S rRNA genes that represent a given sample community (Milani et al. 2013). In order to evaluate the accuracy and efficacy of the 12 primer pairs that are used in the selected datasets and that also represent the currently most frequently used PCR primers in 16S microbial profiling, we tested the primer pairs through in silico PCR. Notably, this assessment revealed rather variable amplification performances that are expected to cause genera-specific biases (Table S4, Supporting Information).
In silico evaluation of the PCR primers accuracy
Primer pairs Probio_Uni/Probio_Rev, 357F/926R, 338F/806R, 530F/926R, V5F/V6R, 341F/534R and 515F/806R showed an in silico efficacy of >90% in their ability to amplify the targeted 16S rRNA gene sequences. In contrast, primer pairs 27F/338R, 8F/357R, 8F/518R, 27F/534R and 8F/530R exhibited a predicted capacity of <32% to amplify their specifically targeted 16S rRNA sequences. We then focused on the evaluation of genus-specific amplification performances of intestinal taxa that had been determined to be present at a relative abundance of >2% in at least one sample included in our meta-analysis (Table S5, Supporting Information). The analysis of the 252 selected intestinal genera confirmed the efficiency observed for all bacterial 16S rRNA sequences. In detail, the bacterial sequences belonging to 248 genera were amplified by all primers examined, and the primer pairs used in studies based on 454 sequencing showed the lowest efficiency (<32%) with the exception of 341F/534R (Muyzer, de Waal and Uitterlinden 1993; Juck et al. 2000) (efficiency at genus level of 95.54%), 357F/926R (Liu et al. 1997) (efficiency at genus level of 95.38%) and 515F/806R (Caporaso et al. 2011; Walters et al. 2011) (efficiency at genus level of 95.24%). Moreover, the evaluation of the primer pair-mediated amplification efficiency for each of the bacterial sequences belonging to the 252 selected taxa showed that the 27F/338R (Hongoh, Ohkuma and Kudo 2003; Fierer et al. 2008), 8F/530R (Frank et al. 2007; Perez-Brocal et al. 2013), 8F/357R, 27F/534R (Ben-Dov et al. 2006) and 8F/518R (Muyzer, de Waal and Uitterlinden 1993; Frank et al. 2007) primer pairs elicited an amplification efficacy of >70% only for the genera Ethanoligenens, Fretibacterium and Lachnospiraceae UCG-008. Notably, Probio_Uni/Probio_Rev (Milani et al. 2013) showed the highest in silico predicted PCR performance among all evaluated PCR primer pairs.
In fact, the Probio_Uni/Probio_Rev primer pair was predicted to amplify the 16S rRNA gene sequences of 75.40% of the 252 selected genera with an efficiency of >95%, followed by primer pairs 341F/534R (75.00%) and 357F/926R (71.03%). Furthermore, 530F/926R (Liu et al. 1997; Dowd et al. 2008), 515F/806R (Caporaso et al. 2011; Walters et al. 2011), V5F/V6R (Cai et al. 2013) and 338F/806R (el Fantroussi et al. 1999; Walters et al. 2011) displayed an efficiency >95% in less than 70% of the assessed 252 genera. In order to evaluate the correlation between a given primer pair and the corresponding predicted relative abundance at genus level, we performed a covariance analysis through the Pearson correlation coefficient based on the 3048 datasets and primer pair efficiency. This analysis indicated that 50 genera displayed a positive correlation with a given primer pair-mediated amplification efficiency (P-value <0.05), thereby indicating that the primer pair in question plays an important role in the generation of a bias in the determination of gut microbiota composition. Notably, when focusing on taxa with a relative abundance of >0.1% in at least one dataset (Fig. 1), the primer pairs appear to have an impact on assessing the presence and abundance of certain taxa that are considered key gut commensal bacteria, such as Bifidobacterium, Coprococcus 3 and genera belonging to Ruminococcaceae and Eubacteriaceae. These marked differences in amplification performance obtained for the tested 12 primer pairs therefore highlight the existence of biases in the reconstruction of the gut microbiota composition as reported by many published studies (Perez-Brocal et al. 2015; Rojo et al. 2015). This finding unfortunately prevents a reliable cross-study meta-analysis of all datasets corresponding to case and CTRL produced by different research projects. For this reason, the case-control sample sets from the different intestinal diseases, i.e. CD, UC, CRC and CDI, each processed with different laboratory protocols, were analyzed separately. Subsequently, the study-specific results were evaluated together to define a global trend (increase or decrease) for each bacterial taxon in control versus disease condition.
Figure 1. Covariance analysis based on the selected public datasets and primer pair efficiency. A heat map illustrating relative abundance of genera with a significant positive correlation to primer pair efficiency was shown. Only the genera with a relative abundance >0.1% in at least one dataset were reported.
Meta- and cross-analysis of the gut microbiota in intestinal diseases
Quality filtering of CD, UC, CRC and CDI samples produced an average of 49 651, 66 127, 62 242 and 376 768 reads, respectively (Table S2, Supporting Information). This level of DNA sequencing depth is considered appropriate for a thorough analysis of the gut microbiota (Hamady and Knight 2009).
Analysis of the microbiota complexity evaluated through alpha-diversity cross-study meta-averages, i.e. averages of all the CTRL and all affected subjects for each intestinal disease analyzed, showed higher complexity in control samples compared to CD (P-value < 0.01) (Fig. 2a), CRC (P-value < 0.05) (Fig. 4a) and CDI (P-value < 0.01) (Fig. 5a) samples. In contrast, the meta-analyzed studies of UC samples provided very different alpha-diversity curves and, as expected, evaluation of the control and UC cross-study meta-averages showed a P-value > 0.05. Therefore, such data may indicate that biases in taxonomic reconstruction induced by the use of different analytical protocols, such as selection of primer pairs, significantly affect the observed biodiversity (Hamady and Knight 2009), thus precluding cross-study meta-analysis of alpha diversity (Fig. 3a).
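For readers unfamiliar with the Chao1 index behind these rarefaction curves, a minimal Python sketch follows. It uses the bias-corrected form of the estimator, S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and twice; whether the original pipeline used this exact variant is an assumption. The averaging over 10 subsamplings per depth follows the Methods, while the function names and the fixed random seed are hypothetical.

```python
import random
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement and return the new OTU counts."""
    reads = [otu for otu, c in enumerate(counts) for _ in range(c)]
    return list(Counter(rng.sample(reads, depth)).values())


def rarefaction_curve(counts, depths, n_subsamples=10, seed=0):
    """Mean Chao1 at each sequencing depth, averaged over repeated subsamples."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        estimates = [chao1(rarefy(counts, depth, rng)) for _ in range(n_subsamples)]
        curve.append((depth, sum(estimates) / n_subsamples))
    return curve
```

Plotting the returned `(depth, mean Chao1)` pairs for each sample group gives curves of the kind compared between CTRL and diseased subjects.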
Figure 2. Exploration of the diversity and bacterial composition of CD and CTRL samples. Panel a shows the rarefaction curves calculated through Chao1 index of each study and of CD and CTRL average. Panel b reports the principal coordinate analysis (PCoA) of the collected case-control belonging to CD studies. Panel c displays the bacterial composition at phylum level based on cross-study of the CD and CTRL groups. Panel d reports the bacterial genera that present an average abundance variation of >0.5%.
Figure 3. Evaluation of the microbiota composition of UC and CTRL samples. Panel a indicates the alpha-diversity curves calculated through the Chao1 index of each study and of UC and CTRL average. Panel b shows the PCoA of the UC and CTRL groups. Panel c reports a bar plot depicting the bacterial composition at phylum level based on cross-study of the UC and CTRL groups. Panel d represents the average abundance variation >0.5% at the genus level.
Moreover, cross-study meta-PERMANOVA, i.e. PERMANOVA obtained for all CTRL samples and all affected subjects for each intestinal disease of the meta-analyzed studies, based on the Bray–Curtis dissimilarity index showed a P-value <0.001 for all comparisons, indicating a taxonomical difference among the samples from control and diseased subjects (Figs 2b, 3b, 4b and 5b).
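The logic of a PERMANOVA on Bray–Curtis dissimilarities can be sketched as follows. This is a simplified, pure-Python illustration of the pseudo-F statistic with label permutation, not the QIIME implementation used in the study; the function names and the default seed are assumptions, and the sketch expects non-zero within-group variation.

```python
import random
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def distance_matrix(profiles):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = bray_curtis(profiles[i], profiles[j])
    return d


def pseudo_f(d, labels):
    """Pseudo-F: among-group over within-group sums of squared distances."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(d[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(d, labels, n_perm=1000, seed=0):
    """Return the observed pseudo-F and a permutation P-value."""
    rng = random.Random(seed)
    observed = pseudo_f(d, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(d, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With 1000 permutations the smallest attainable P-value is 1/1001, which is consistent with results being reported as P-value <0.001.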
Figure 4. Examination of the complexity and bacterial differences between CRC and CTRL samples. Panel a displays the rarefaction curves of each study and of CRC and CTRL average. Panel b reports the beta-diversity of the collected case-control belonging to CRC studies. Panel c shows the bacterial composition at phylum level based on cross-study of the CRC and CTRL groups. Panel d reports the microbiota composition at genus level. The panel reports only the taxa that present an average abundance difference >0.5%.
Figure 5. Exploration of the microbiota composition of CDI and CTRL samples. Panel a displays the complexity of each studied sample and of CDI and CTRL average calculated through Chao1 and represented with rarefaction curves. Panel b reports the principal coordinate analysis (PCoA) of the collected case-control belonging to CDI studies. Panel c displays the bacterial composition at phylum level based on cross-study of the CDI and CTRL groups. Panel d reports the taxa composition at genus level reporting only bacteria with an average abundance variation >0.5%.
Cross-study meta-ANOVA of the bacterial profile at phylum level, i.e. based on averages of all CTRL and all affected subjects for each intestinal disease analyzed, showed a predominant abundance of Bacteroidetes in control samples for most of the diseases analyzed, i.e. an average of 44.42% (P-value < 0.01), 45.43% (P-value < 0.01) and 34.01% (P-value < 0.01) compared to CD, UC and CRC, respectively (Figs 2c, 3c and 4c). Conversely, when compared to control samples, the gut microbiota of diseased samples appears to exhibit a higher abundance of the Proteobacteria phylum, such as in the case of samples from individuals suffering from CD (average 19.11%, P-value < 0.01), CRC (average 15.11%, P-value < 0.01) and CDI (average 22.49%, P-value < 0.01) (Figs 2c, 4c and 5c).
Comparison between the gut microbiota composition of control individuals and of subjects for each intestinal disease analyzed at genus level showed higher abundance of genera belonging to the Bacteroidetes phylum in CTRL samples (Figs 2d, 3d, 4d and 5d). In contrast, genera belonging to Proteobacteria were particularly less abundant in control samples as compared to samples corresponding to each of the investigated diseases (Figs 2d, 3d, 4d and 5d).
Interestingly, comparison between the gut microbiota composition of CD and CTRL samples showed higher abundance of the genera Parabacteroides (increase of 67.10%, P-value < 0.05), Faecalibacterium (increase of 18.19%, P-value < 0.05), Prevotella 9 (increase of 185.24%, P-value < 0.05) and Bacteroides (increase of 45.68%, P-value < 0.05) in CTRL samples (Fig. 2d). In contrast, the genera Escherichia-Shigella (decrease of −39.08%, P-value < 0.05) and Haemophilus (decrease of −39.08%, P-value < 0.05) were less abundant in control samples as compared to samples obtained from individuals with CD (Fig. 2d). Notably, CRC samples possess a higher abundance of bacteria that have been associated with the development of intestinal diseases, such as Campylobacter (increase of 950.04%, P-value < 0.05) (Warren et al. 2013; Akutko and Matusiewicz 2017), or that are known to be involved in the transition from eubiosis, i.e. an optimal balance of gut microbiota composition (Iebba et al. 2016), to dysbiosis, such as Gemella (increase of 118.05%, P-value < 0.05) (Chen et al. 2016) (Fig. 4d). Conversely, 16S microbial profiles of CTRL samples displayed a higher abundance of members of the genus Faecalibacterium (increase of 77.91%, P-value < 0.05), which is considered a bacterial genus with a beneficial effect on the human gut (Ventura et al. 2014) and which could have a role in preventing CRC (Wei et al. 2016). The gut microbiota profiles of CDI samples also possess a higher abundance of opportunistic pathogens belonging to the phylum Proteobacteria and a lower abundance of taxa that are associated with health-promoting effects, such as Bifidobacterium and Faecalibacterium (Milani et al. 2014; Ventura et al. 2014; Milani et al. 2015) (Fig. 5d).
Moreover, the evaluation of the taxonomic trend (Tables S7, S9, S11 and S13, Supporting Information) and the differences of the gut microbiota composition at genus level across the meta-analyzed studies allowed us to identify genera that may represent suitable bacterial biomarkers of each analyzed disease. Interestingly, the relative abundance of 16, 5, 7 and 3 taxa increases, while that of 2, 4, 4 and 3 taxa decreases, in CD, UC, CRC and CDI subjects, respectively, when compared to CTRL individuals in all meta-analyzed studies (Tables S7, S9, S11 and S13, Supporting Information). A summary of the taxa that may constitute specific disease microbial biomarkers is reported in Table 1.
Intestinal diseases . | Samples . | Genera . |
---|---|---|
CTRL | Barnesiella | |
Ruminococcus 2 | ||
Actinomyces | ||
Eggerthella | ||
Blautia | ||
Crohn's disease | Peptoclostridium | |
CD | Flavonifractor | |
Erysipelatoclostridium | ||
Lactobacillus | ||
Streptococcus | ||
U. m. of Proteobacteria phylum | ||
Barnesiella | ||
CTRL | Odoribacter | |
Alistipes | ||
Faecalibacterium | ||
Ulcerative colitis | Streptococcus | |
UC | Veillonella | |
U. m. of Enterobacteriaceae family | ||
Haemophilus | ||
U. m. of Lachnospiraceae family | ||
Faecalibacterium | ||
CTRL | Ruminococcaceae UCG-005 | |
Subdoligranulum | ||
Colorectal cancer | Alloprevotella | |
Gemella | ||
Parvimonas | ||
CRC | ||
Streptococcus | ||
Leptotrichia | ||
Campylobacter | ||
Christensenellaceae R-7 group | ||
CTRL | U. m. of Lachnospiraceae family | |
Ruminococcaceae UCG-003 | ||
Clostridium difficile infection | ||
Erysipelatoclostridium | ||
CDI | Enterococcus | |
Lactobacillus | ||
Barnesiella | ||
Ruminococcaceae UCG-005 | ||
Total CTRL | Alistipes | |
Christensenellaceae R-7 group | ||
Total diseases | ||
U. m. of Lachnospiraceae family | ||
Lactobacillus | ||
Total diseases | Streptococcus | |
U. m. of Enterobacteriaceae family |
Identification of universal biomarkers
In order to evaluate the existence of universal intestinal disease biomarkers, we performed a meta-analysis for all datasets corresponding to studies regarding CD, UC, CRC and CDI. Cross-analysis of the alpha diversity showed a higher biodiversity in CTRL samples with respect to subjects affected by an intestinal disease (DS) (P-value < 0.05) (Fig. 6a). These data confirm previous observations that intestinal dysbiosis is linked to loss of microbiota diversity (Sha et al. 2013; Mosca, Leclerc and Hugot 2016). Moreover, the beta-diversity cross-analysis indicated a clear division between CTRL and samples affected by intestinal diseases (meta-PERMANOVA P-value < 0.05) (Fig. 6b). Therefore, we focused on the genus level to identify the differences in gut microbiota composition between these two groups. In detail, these analyses revealed a total of 261 genera with a significantly different abundance (P-value < 0.05) (Table S14, Supporting Information), of which 20 showed an average abundance variation of >0.5% (Fig. 6c).
![Investigation of the microbiota composition of all subjects affected by an intestinal disease (DS) and control samples. Panel a shows alpha-diversity curves calculated through Chao1 index of each study and of DS and CTRL average. Panel b reports the PCoA of the collected case-control belonging to DS studies. Panel c indicates the bacterial composition at phylum level based on cross-study of the DS and CTRL groups. Panel d shows the bacterial genera that present an average abundance difference >0.5%.](https://cdn.statically.io/img/oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/femsec/93/12/10.1093_femsec_fix153/2/m_fix153fig6.jpeg?Expires=1723515724&Signature=Rz1ETaA-rIBAy4EdK1WLGV2EhJRw9~DisQfSFgSt77~NlHnQtGg7HYaPxSDhXBfNXEM9znncUI5xN~owtkAUVEKkB7DnDEjZV1crP9ukaM8WGmOtPRpJIidw66lrXR6OaIwkGbj8Clsz3r6TiTnTPsWAvi3~uQAGDpn4hKTQtZw1Imk~C1er88I2QOCJx7-ZzYF42c-iTi9zF~4CZSxsI2kJol-hRqau3C~zoUpXct-oOhiqOsSwsPEit8NjR7kLeLmhjSnOfTXVa8IP416OIVZkSnHMyGdGK~0nExvEjfcHkEz9gcNR5RpfewD6esVEnMOnd8Li7cXakv48Tgpzlg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 6. Investigation of the microbiota composition of all subjects affected by an intestinal disease (DS) and of control samples. Panel a shows the alpha-diversity curves calculated through the Chao1 index for each study and for the DS and CTRL averages. Panel b reports the PCoA of the case-control samples collected from the DS studies. Panel c indicates the bacterial composition at phylum level based on the cross-study analysis of the DS and CTRL groups. Panel d shows the bacterial genera that present an average abundance difference >0.5%.
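The alpha-diversity comparison above relies on the Chao1 index, which estimates total richness from the observed taxa plus a correction based on rare taxa. The following is a minimal sketch, assuming the bias-corrected form of Chao1 (the text does not state which variant was used) and per-taxon read counts as input. The function name `chao1` and the two count vectors are hypothetical toy values, not data from the study.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts.

    Chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where S_obs is the
    number of observed taxa, F1 the number of singletons (taxa seen once)
    and F2 the number of doubletons (taxa seen twice).
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy read counts (assumed for illustration only).
ctrl_sample = [10, 5, 1, 1, 1, 2, 2, 8]
ds_sample = [40, 12, 1, 2]

print(chao1(ctrl_sample))  # 9.0
print(chao1(ds_sample))    # 4.0
```

In this toy case the control sample has more observed taxa and more rare taxa, so its estimated richness is higher, which is the direction of the difference reported for CTRL versus DS.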
Interestingly, when focusing on the genera with a significant P-value and a taxonomic trend with a prevalence of >80% (Table S15, Supporting Information), it was possible to identify five and three taxa characteristic of CTRL and DS subjects, respectively. In detail, CTRL showed high relative abundance of the genera Barnesiella (in 90% of the studies and P-value < 0.05), Ruminococcaceae UCG-005 (in 85% of the studies and P-value < 0.05), Alistipes (in 80% of the studies and P-value < 0.05), Christensenellaceae R-7 group (in 80% of the studies and P-value < 0.05) and an unclassified member of the Lachnospiraceae family (in 80% of the studies and P-value < 0.05), while DS displayed high relative abundance of the taxa Lactobacillus (in 90% of the studies and P-value < 0.05), an unclassified member of the Erysipelotrichaceae family (in 80% of the studies and P-value < 0.05) and Streptococcus (in 80% of the studies and P-value < 0.05).
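The selection described above combines two criteria: a significant cross-study P-value and a trend that holds in most of the individual studies. A minimal sketch of such a filter is given below. It assumes that each genus carries one sign per study (+1 for higher abundance in CTRL, -1 for higher abundance in DS) and a single cross-study P-value, and it treats the 80% threshold as inclusive because taxa found in exactly 80% of the studies are reported. The function name, the input layout and the genus labels are hypothetical, and the values are toy data.

```python
def consistent_biomarkers(trends, pvalues, min_prevalence=0.80, alpha=0.05):
    """Keep genera with a significant P-value and a consistent per-study trend.

    trends:  {genus: list of per-study signs (+1 higher in CTRL, -1 higher in DS)}
    pvalues: {genus: cross-study P-value}
    Returns {genus: (group, share of studies showing that trend)}.
    """
    selected = {}
    for genus, signs in trends.items():
        if not signs or pvalues[genus] >= alpha:
            continue  # no data or not significant
        ctrl_share = sum(1 for s in signs if s > 0) / len(signs)
        ds_share = sum(1 for s in signs if s < 0) / len(signs)
        if ctrl_share >= min_prevalence:
            selected[genus] = ("CTRL", ctrl_share)
        elif ds_share >= min_prevalence:
            selected[genus] = ("DS", ds_share)
    return selected


# Toy input for ten hypothetical studies.
trends = {
    "genus_A": [+1] * 9 + [-1],      # higher in CTRL in 9 of 10 studies
    "genus_B": [-1] * 8 + [+1] * 2,  # higher in DS in 8 of 10 studies
    "genus_C": [+1] * 6 + [-1] * 4,  # inconsistent trend
    "genus_D": [+1] * 10,            # consistent but not significant
}
pvalues = {"genus_A": 0.01, "genus_B": 0.03, "genus_C": 0.01, "genus_D": 0.20}

print(consistent_biomarkers(trends, pvalues))
# {'genus_A': ('CTRL', 0.9), 'genus_B': ('DS', 0.8)}
```

Requiring both criteria keeps a genus from being selected when a pooled difference is driven by only a few studies.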
In previous studies, the genus Barnesiella was identified only in populations living in developed countries (Mancabelli et al. 2017) and was correlated with beneficial effects on the human gut (Kulagina et al. 2012; Ubeda et al. 2013). Moreover, Ruminococcaceae UCG-005, Alistipes and an unclassified member of the Lachnospiraceae family have been reported to be butyrate-producing bacteria (Flint et al. 2012; Chen et al. 2017) that may protect healthy subjects from chronic intestinal inflammation (Lepage et al. 2011). In contrast, the higher relative abundance of the genus Streptococcus in DS confirms its previously reported correlation with a range of gastrointestinal diseases (Murray and Roberts 1978; Burnett-Hartman, Newcomb and Potter 2008) and renders it a valuable candidate as a universal biomarker of intestinal dysbiosis. Furthermore, bacteria belonging to the Erysipelotrichaceae family have been correlated with inflammation (Kaakoush 2015) and immunomodulation (Palm et al. 2014), but their functional correlation with intestinal diseases is far from being fully elucidated.
Notably, the observed higher relative abundance of the non-pathogenic taxon Lactobacillus in DS may reflect lower niche competition caused by simplification of the dysbiotic gut microbiota (Walter 2008).
CONCLUSIONS
A substantial number of studies based on 16S rRNA gene profiling have reported on the correlation between human gut diseases and microbiota composition. Nevertheless, one of the main sources of bias in the reconstruction of the gut microbiota composition through 16S rRNA profiling is the selection of reliable and universal primer pairs. In silico PCR and covariance analysis of the 12 primer pairs used in 24 selected public gut metagenomic studies confirmed their impact on biased amplification of the targeted section of the 16S rRNA gene. To overcome this limitation, we performed a cross-study meta-analysis of 3048 public metagenomic datasets, corresponding to 1252 control (CTRL) and 1796 patient subjects, in order to identify possible bacterial biomarkers for major intestinal diseases such as CD, UC, CRC and CDI. Furthermore, we analyzed all datasets together in order to identify possible universal gut disease microbial biomarkers. In detail, this cross-study analysis showed that the genera Barnesiella, Ruminococcaceae UCG-005, Alistipes, Christensenellaceae R-7 group and an unclassified member of the Lachnospiraceae family correlated with a healthy state of subjects. In contrast, subjects with an intestinal disease displayed higher abundance of genera reported to cause intestinal inflammation, such as an unclassified member of the Erysipelotrichaceae family and Streptococcus. The identification of novel universal biomarkers as indicators of human gut diseases may contribute to rapid diagnosis, help predict the course and prognosis of the disease, and guide therapeutic decisions, thereby improving patient care.
SUPPLEMENTARY DATA
Supplementary data are available at FEMSEC online.
Acknowledgements
Part of this research was conducted using the High Performance Computing (HPC) facility of the University of Parma.
FUNDING
This work was funded by the EU Joint Programming Initiative—A Healthy Diet for a Healthy Life (JPI HDHL, http://www.healthydietforhealthylife.eu/) to MV and DvS (Grant no. 15/JP/HDHL/3280), and the MIUR to MV. We thank GenProbio srl for financial support of the Laboratory of Probiogenomics. LM is supported by Fondazione Cariparma, Parma, Italy. DvS is a member of The APC Microbiome Institute funded by Science Foundation Ireland (SFI), through the Irish Government's National Development Plan (Grant no. SFI/12/RC/2273).
Conflict of interest. None declared.