Abstract

Background

Drivers of tuberculosis (TB) transmission in India, the country estimated to carry a quarter of the world's burden, are not well studied. We conducted a genomic epidemiology study to compare epidemiological success, host factors, and drug resistance among the 4 major Mycobacterium tuberculosis (Mtb) lineages (L1–L4) circulating in Pune, India.

Methods

We performed whole-genome sequencing (WGS) of Mtb sputum culture–positive isolates from participants in two prospective cohort studies and predicted genotypic susceptibility using a validated random forest model. We compared lineage-specific phylogenetic and time-scaled metrics to assess epidemiological success.

Results

Of the 612 isolates that met sequence quality criteria, Most were L3 (44.6%). The majority (61.1%) of multidrug-resistant isolates were L2 (P < .001) and L2 demonstrated a higher rate and more recent resistance acquisition. L4 and/or L2 demonstrated higher clustering and time-scaled haplotypic density (THD) compared to L3 and/or L1, suggesting higher epidemiological success. L4 demonstrated higher THD and clustering (odds ratio, 5.1 [95% confidence interval, 2.3–12.3]) in multivariate models controlling for host factors and resistance.

Conclusions

L2 shows a higher frequency of resistance, and both L2 and L4 demonstrate evidence of higher epidemiological success than L3 or L1 in Pune. Contact tracing around TB cases and heightened surveillance of TB DR in India is a public health priority.

Despite investment in tuberculosis (TB) healthcare infrastructure, the decline in global incidence has been very slow (∼2% annually) [1], decelerated by ongoing community transmission [2, 3]. In addition to host-related factors, Mycobacterium tuberculosis (Mtb) strain characteristics like drug resistance and lineage characteristics may directly affect or correlate with TB transmission [4]. An improved understanding of these characteristics is necessary to improve our ability to control TB transmission, especially in high-burden settings.

The Mtb species, in its strictest sense, has been divided into 4 major lineages based on genotype—the ancient lineage 1 (L1) and modern lineages L2, L3, and L4 [5]. Previous studies have supported transmissibility differences between these lineages based on a higher rate of genetic clustering [6], an increasing frequency of a lineage in a community [7], lineage propensity for younger hosts (implying recent transmission) [8], or comparing rates of secondary TB among close contacts of index cases infected with different lineages [9]. Yet some of this evidence has been criticized, including evidence derived from the analysis of genetic clustering because of several simplifying assumptions that may introduce bias. These include an assumption of similar rate of genetic evolution, or the molecular clock rate, between lineages when evidence suggests significant differences especially between L1 and L2/L3/L4 [10]. Time-scaled haplotypic density (THD) can measure epidemiological success at short and long time scales while incorporating the effect of differing molecular clock rates circumventing some of the shortcomings of clustering-based approaches [6]. Coalescent-based methods can also supplement these analyses by estimating if the data support population expansion or contraction under model assumptions [11, 12].

Few studies have examined both host and strain characteristics to understand TB transmission in India. India carries a quarter of the global burden and the highest annual incidence of TB [1]. Although geographic variation in lineage prevalence has been noted, L1 comprises approximately two-thirds (67%) of the Mtb isolates in the country, and India is unique in that all 4 major lineages currently circulate with appreciable frequencies [6, 13]. Furthermore, India is reported to have increasing rates of drug resistance at least in certain districts [14, 15] and indirect evidence of drug-resistant TB transmission in the community [2, 3, 6]. Host-level characteristics such as comorbidities and bacterial burden affect TB spread in conjunction with lineage and drug resistance. Yet most prior studies in India have not evaluated combined host and strain factors to understand the epidemiological success of TB in the community.

In this study, we leverage 2 prospective cohort studies of active pulmonary TB in western India that enrolled between 2013 and 2018: (1) a prospective observational study of active TB in Pune, India [16], and (2) a prospective study to estimate co-prevalence of diabetes and TB and its impact on treatment outcomes in Pune [17, 18]. We study the frequency of drug resistance among the major Mtb lineages and the age of resistance acquisition, as well as discrepancies between genotype and phenotypic resistance in this less-studied TB patient population. We also study host characteristics relevant to transmissibility, and several short-scale and long-scale metrics of epidemiological success between the 4 major Mtb lineages.

METHODS

Study Design and Participant Eligibility

The whole-genome sequences used in this study were obtained from (1) the prospective Cohort for Tuberculosis Research by the Indo-US Medical Partnership (CTRIUMPH) study conducted in Pune, India, from August 2014 to December 2019 [16], and (2) a prospective study, focused on the relationship between tuberculosis and diabetes in adults, conducted in Pune, India, between December 2013 and May 2019 [17, 18]. The eligibility criteria, study setting, and study procedures have been previously described [16–19]. In brief, both studies enrolled participants with new active pulmonary TB or extrapulmonary TB presenting to 11 TB treatment centers (run through the National Tuberculosis Elimination Program, previously known as Revised National TB Control Programme) located in the western Indian city of Pune, India, serving a largely urban population [17, 18]. Any adult (≥18 years of age) participant with microbiologically confirmed pulmonary TB was included in this analysis. CTRIUMPH also enrolled household contacts (including those <18 years of age) of index pulmonary TB cases and followed them for 24 months for development of TB infection or TB disease. Additional study procedures are provided in the Supplementary Methods.

Whole-Genome Sequencing and Variant Identification

Positive Lowenstein-Jensen cultures from baseline sputum samples were subjected to DNA extraction and sequenced using a standardized protocol as described in the Supplementary Methods. The raw reads were processed through a custom bioinformatics pipeline [20] (Supplementary Methods). Isolate lineage was called using standard techniques [21–23] implemented in a bioinformatics pipeline [24].

Phylogenetic Analysis and Epidemiological Success

We generated a final dated phylogenetic tree using maximum likelihood assuming a generalizable time-reversible nucleotide substitution model followed by time calibration using BEAST software as described in the Supplementary Methods. Participant characteristics were compared across the 4 lineages using analysis of variance for continuous variables or χ2 test for categorical variables. We compared the following characteristics across the 4 lineages using a 2-sample Kolmogorov-Smirnov test: terminal branch lengths, pairwise single-nucleotide substitution (SNS) distances, and branching times, with a 1-sided P < .01 considered to be significant. We investigated possible clusters of recent transmission [25] using a range of cut-offs for SNS distance as is routinely performed in public health settings [26–31]. For the clustering analysis, we defined clusters as isolates that had a maximum genetic distance using specified SNS thresholds. We computed the THD as an alternative measure of lineage success using the thd R-package that incorporates differing evolutionary rates and time scales as available and described in https://github.com/rasigadelab/thd, assuming an Mtb genomic length of 4.411 × 106, and lineage-specific molecular clock rates as reported by Menardo et al [32]. THD uses kernel density estimation to model the distribution of genetic distances between isolates. The distribution is assumed to be a truncated geometric distribution with a bandwidth scaled to the input time-scale and mutational rate for each lineage. The resulting THD values are proxies of epidemiological success that are thought to be more accurate than the simple genetic distances (or clustering based on this) as they incorporate an assessment of sampling time and the number of expected mutations per unit time. The THD values is used as the response variable in a linear regression model using predictors of epidemiological success, or lineage in the case of this study, as explanatory variables. The result of this regression indicates which lineages are associated with increased epidemiological success [48].

Host Contribution to Transmission

We used a previously described “propensity to propagate” (PTP) method [33] to assess host-related factors associated with transmission. PTP measures the likelihood of an isolate to be transmitted by adjusting for both patient-related characteristics and molecular measures of transmission (cluster size and proportion of cases in a cluster). This individual propensity is computed as the linear sum of premeasured coefficients of transmissibility multiplied by the value/status of risk factors assessed including age, sex, smear grade, and substance use, etc, as detailed in Nebenzahl-Guimaraes et al [33]. The PTP value for a cluster is computed as the geometric mean of each TB-infected individual's propensity score for transmissibility. Data were available on participants’ age, sex, smear positivity, and alcohol use. All participants had pulmonary disease. Data were not available on drug use, occupation, or homelessness. While foreign birth did not apply to our setting, internal migration may be contributory, but these data were not available. Alcohol use data were missing for 2 participants, and they were assumed to be nonconsumers as the overall frequency of alcohol consumption was low. In addition to computing PTP, we used the finalfit R v4.1.3 package version 1.03 [34] to perform logistic regression to identify factors associated with clustering at an SNS threshold of ≤12 and THD.

RESULTS

Study Participant and Mtb Isolate Characteristics

Of the 2257 participants screened in the two studies by 1 July 2018, 1046 (46%) had microbiologically positive sputum cultures at baseline. A total of 48 pediatric participants and 9 with extrapulmonary TB who had positive cultures were excluded. Of these, a total of 880 (84%) study participants with Mtb growth on sputum culture at baseline were successfully subcultured. Only 9 participants with active TB were identified among household contacts of index cases in the CTRIUMPH study but were excluded from sequencing due to low numbers. Extracted DNA from index samples available as of 1 February 2019 (n = 750) underwent whole-genome sequencing (WGS). Of these, 108 samples (14%) did not meet the sequence coverage threshold despite 2 attempts. Of the remaining 642 (86%), 612 (95%) met sequence quality criteria (Supplementary Methods). The majority of the 612 sequenced participants were male (64.7%), and the median age of participants was 31 years (interquartile range [IQR], 24.4–44.2 years). Most participants had a positive sputum smear microscopy (83.3%), human immunodeficiency virus coinfection occurred among 6.7%, and 13.1% had diabetes mellitus. Participants were infected with L3 isolates most commonly (n = 273 [44.7%]), followed by L1 (n = 162 [26.5%]), L4 (n = 132 [21.6%]), and L2 (n = 45 [7.3%]). Demographic characteristics and comorbidities of study participants did not differ by Mtb lineage (Table 1). Among the 574 isolates that underwent phenotypic drug susceptibility testing, 49 (8.5%) were monoresistant to isoniazid, 6 (1.0%) were monoresistant to rifampin, and 24 (4.2%) were multidrug resistant (MDR), that is, resistant to both isoniazid and rifampin (Supplementary Table 1).

Table 1.

Demographic Characteristics of Study Participants, by Lineage

CharacteristicLineage 1
(n = 162)
Lineage 2
(n = 45)
Lineage 3
(n = 273)
Lineage 4
(n = 132)
Total
(n = 612)
P Valuea
Female sex59 (36.4%)13 (28.9%)103 (37.7%)41 (31.1%)216 (35.3%).45
Median age, y (IQR)32 (25–45)29 (22–39)31 (24–44)30 (25–45)31 (24–44).23
Smear positive138 (85.2%)40 (88.9%)222 (81.3%)110 (83.3%)510 (83.3%).53
HIV positive6 (3.7%)1 (2.2%)17 (6.2%)14 (10.6%)38 (6.2%).15
Known diabetes mellitus21 (13.0%)3 (6.7%)38 (13.9%)18 (13.6%)80 (13.1%).60
CharacteristicLineage 1
(n = 162)
Lineage 2
(n = 45)
Lineage 3
(n = 273)
Lineage 4
(n = 132)
Total
(n = 612)
P Valuea
Female sex59 (36.4%)13 (28.9%)103 (37.7%)41 (31.1%)216 (35.3%).45
Median age, y (IQR)32 (25–45)29 (22–39)31 (24–44)30 (25–45)31 (24–44).23
Smear positive138 (85.2%)40 (88.9%)222 (81.3%)110 (83.3%)510 (83.3%).53
HIV positive6 (3.7%)1 (2.2%)17 (6.2%)14 (10.6%)38 (6.2%).15
Known diabetes mellitus21 (13.0%)3 (6.7%)38 (13.9%)18 (13.6%)80 (13.1%).60

Abbreviations: HIV, human immunodeficiency virus; IQR, interquartile range.

aχ2 test was used to compare characteristics across lineages, except for “Age” where analysis of variance was used and for “HIV positive” where Fisher exact test was used.

Table 1.

Demographic Characteristics of Study Participants, by Lineage

CharacteristicLineage 1
(n = 162)
Lineage 2
(n = 45)
Lineage 3
(n = 273)
Lineage 4
(n = 132)
Total
(n = 612)
P Valuea
Female sex59 (36.4%)13 (28.9%)103 (37.7%)41 (31.1%)216 (35.3%).45
Median age, y (IQR)32 (25–45)29 (22–39)31 (24–44)30 (25–45)31 (24–44).23
Smear positive138 (85.2%)40 (88.9%)222 (81.3%)110 (83.3%)510 (83.3%).53
HIV positive6 (3.7%)1 (2.2%)17 (6.2%)14 (10.6%)38 (6.2%).15
Known diabetes mellitus21 (13.0%)3 (6.7%)38 (13.9%)18 (13.6%)80 (13.1%).60
CharacteristicLineage 1
(n = 162)
Lineage 2
(n = 45)
Lineage 3
(n = 273)
Lineage 4
(n = 132)
Total
(n = 612)
P Valuea
Female sex59 (36.4%)13 (28.9%)103 (37.7%)41 (31.1%)216 (35.3%).45
Median age, y (IQR)32 (25–45)29 (22–39)31 (24–44)30 (25–45)31 (24–44).23
Smear positive138 (85.2%)40 (88.9%)222 (81.3%)110 (83.3%)510 (83.3%).53
HIV positive6 (3.7%)1 (2.2%)17 (6.2%)14 (10.6%)38 (6.2%).15
Known diabetes mellitus21 (13.0%)3 (6.7%)38 (13.9%)18 (13.6%)80 (13.1%).60

Abbreviations: HIV, human immunodeficiency virus; IQR, interquartile range.

aχ2 test was used to compare characteristics across lineages, except for “Age” where analysis of variance was used and for “HIV positive” where Fisher exact test was used.

Genotypic Drug Susceptibility Testing and Concordance With Phenotypes

We subjected the 612 Mtb isolates to genotypic drug susceptibility testing (DST) using a validated random forest model to assess discrepancies as these are understudied in the Indian context [35, 36]. Fifteen isolates were concordant for rifampin resistance by phenotypic and genotypic DST; 5 tested resistant only by genotype, and 15 tested resistant only by phenotype. For isoniazid, 38 isolates were concordant for resistance; 35 were tested resistant only by phenotype, and 19 tested resistant only by genotype. We identified 24 of 533 (4.5%) isolates to be MDR based on phenotype and 18 of 612 (2.9%) isolates as MDR based on genotype (Table 2), of which 13 of 612 (2.1%) were concordant resistant by both methods.

Table 2.

Isolates Predicted to be Resistant Based on Genotype, by Drug and Lineage

DrugLineage 1 (n = 162)Lineage 2 (n = 45)Lineage 3 (n = 273)Lineage 4 (n = 132)Total (n = 612)P Value (Fisher Exact, L2 vs Non-L2)
Isoniazid11 (6.8)14 (31.1)23 (8.4)12 (9.1)60 (9.8)<.001
Rifampin2 (1.2)11 (24.4)5 (1.8)2 (1.5)20 (3.3)<.001
Ethambutol012 (26.7)5 (1.8)017 (2.8)<.001
Ethionamide5 (3.0)4 (8.9)5 (1.8)2 (1.5)16 (2.6).02
Pyrazinamide02 (4.4)4 (1.5)2 (1.5)8 (1.3).11
Streptomycin4 (2.5)17 (37.8)13 (4.8)8 (6.0)42 (6.9)<.001
Levofloxacin2 (1.2)12 (26.7)10 (3.7)10 (7.6)34 (5.6)<.001
Capreomycin004 (1.5)04 (0.7)
Multidrug resistant1 (0.6)11 (24.4)4 (1.5)2 (1.5)18 (2.9)<.001
DrugLineage 1 (n = 162)Lineage 2 (n = 45)Lineage 3 (n = 273)Lineage 4 (n = 132)Total (n = 612)P Value (Fisher Exact, L2 vs Non-L2)
Isoniazid11 (6.8)14 (31.1)23 (8.4)12 (9.1)60 (9.8)<.001
Rifampin2 (1.2)11 (24.4)5 (1.8)2 (1.5)20 (3.3)<.001
Ethambutol012 (26.7)5 (1.8)017 (2.8)<.001
Ethionamide5 (3.0)4 (8.9)5 (1.8)2 (1.5)16 (2.6).02
Pyrazinamide02 (4.4)4 (1.5)2 (1.5)8 (1.3).11
Streptomycin4 (2.5)17 (37.8)13 (4.8)8 (6.0)42 (6.9)<.001
Levofloxacin2 (1.2)12 (26.7)10 (3.7)10 (7.6)34 (5.6)<.001
Capreomycin004 (1.5)04 (0.7)
Multidrug resistant1 (0.6)11 (24.4)4 (1.5)2 (1.5)18 (2.9)<.001
Table 2.

Isolates Predicted to be Resistant Based on Genotype, by Drug and Lineage

DrugLineage 1 (n = 162)Lineage 2 (n = 45)Lineage 3 (n = 273)Lineage 4 (n = 132)Total (n = 612)P Value (Fisher Exact, L2 vs Non-L2)
Isoniazid11 (6.8)14 (31.1)23 (8.4)12 (9.1)60 (9.8)<.001
Rifampin2 (1.2)11 (24.4)5 (1.8)2 (1.5)20 (3.3)<.001
Ethambutol012 (26.7)5 (1.8)017 (2.8)<.001
Ethionamide5 (3.0)4 (8.9)5 (1.8)2 (1.5)16 (2.6).02
Pyrazinamide02 (4.4)4 (1.5)2 (1.5)8 (1.3).11
Streptomycin4 (2.5)17 (37.8)13 (4.8)8 (6.0)42 (6.9)<.001
Levofloxacin2 (1.2)12 (26.7)10 (3.7)10 (7.6)34 (5.6)<.001
Capreomycin004 (1.5)04 (0.7)
Multidrug resistant1 (0.6)11 (24.4)4 (1.5)2 (1.5)18 (2.9)<.001
DrugLineage 1 (n = 162)Lineage 2 (n = 45)Lineage 3 (n = 273)Lineage 4 (n = 132)Total (n = 612)P Value (Fisher Exact, L2 vs Non-L2)
Isoniazid11 (6.8)14 (31.1)23 (8.4)12 (9.1)60 (9.8)<.001
Rifampin2 (1.2)11 (24.4)5 (1.8)2 (1.5)20 (3.3)<.001
Ethambutol012 (26.7)5 (1.8)017 (2.8)<.001
Ethionamide5 (3.0)4 (8.9)5 (1.8)2 (1.5)16 (2.6).02
Pyrazinamide02 (4.4)4 (1.5)2 (1.5)8 (1.3).11
Streptomycin4 (2.5)17 (37.8)13 (4.8)8 (6.0)42 (6.9)<.001
Levofloxacin2 (1.2)12 (26.7)10 (3.7)10 (7.6)34 (5.6)<.001
Capreomycin004 (1.5)04 (0.7)
Multidrug resistant1 (0.6)11 (24.4)4 (1.5)2 (1.5)18 (2.9)<.001

We investigated the discrepancies between genotypic and phenotypic DST since these are not well studied in India. We performed repeat DST for all discrepant isolates (n = 98) for the drugs isoniazid (n = 54), rifampin (n = 20), ethambutol (n = 26), and streptomycin (n = 46). Of the 98, 64 (65.3%) isolates had growth on subcultures without contamination. Repeat phenotypic testing resolved a substantial number of discrepancies with few remaining discrepant isolates—isoniazid (n = 6), rifampin (n = 1), ethambutol (n = 4), and streptomycin (n = 8). Mutations found in these isolates are shown in Supplementary Tables 2 and 3. The 3 isolates identified as isoniazid resistant only by genotypic DST harbored the canonical drug resistance-conferring mutation katG S315T. The one rifampin-resistant isolate identified only on genotypic DST also harbored the canonical resistance-conferring mutation rpoB L430P. Mutations identified in isolates that were genotypically predicted to be resistant to ethambutol and streptomycin but were determined to be susceptible on phenotypic testing are shown in Supplementary Table 2. We conducted capreomycin phenotypic testing on 4 isolates that were predicted to be capreomycin resistant but susceptible to isoniazid and rifampin based on genotype. All 4 of these isolates harbored an H68R mutation in the tlyA gene and tested phenotypically capreomycin susceptible at the critical concentration of 2.5 μg/mL. Among genotypically susceptible and phenotypically resistant isolates (Supplementary Table 3), we found no mutations known to be associated with resistance. The isolates harbored 7 mutations not previously reported in the World Health Organization (WHO) resistance catalogue and 13 mutations not associated with resistance or of unknown significance [37].

Frequency and Timing of Resistance Acquisition

L2 had a higher frequency of drug resistance acquisition (acquisition of new resistance-conferring mutations as percentage of total isolates) as compared to other lineages, specifically for rifampin, ethambutol, streptomycin, and levofloxacin (Supplementary Table 4). We investigated the specific drug resistance–conferring mutations acquired (Supplementary Table 5) and the timing of these acquisitions, with L2 demonstrating the highest number of resistance acquisition events. The median age of resistance acquisition was most recent for L2 (ca. 2016 [IQR, 2014–2016]) followed by L4 (ca. 2014 [IQR, 2007–2016]) (Wilcoxon P = .02) and then by L1 or L3 (Wilcoxon P value for L2 or L4 vs L1 or L3 = 2 × 10−8). L1 and L3 resistance ages did not significantly differ (pooled median age ca. 2008 [IQR, 2005–2012]; Wilcoxon P = .7).

Epidemiological success

Phylogenetic Characteristics

We generated a phylogenetic tree (Figure 1), and compared diversity and tree characteristics including terminal branch lengths as indirect measures of bacterial transmissibility [7, 23, 38] (Phylogenetic Analysis and Epidmiological Success Methods). The L2 and L4 phylogenies had shorter terminal branch lengths (median, 3.3 [IQR, 1.3–9.2] and 5.1 [IQR, 0.8–10.5], respectively) as compared to L1 and L3 (L1: 9.4 [IQR, 5.4–15.7], L3: 7.8 [IQR, 4.4–11.3]; P < .004 for comparisons with L4 or L2) (Figure 2A). Isolates belonging to L2 were genetically more similar than those belonging to other lineages (Figure 2B). Branching times, that is, time since divergence between branches, for L2 (median, 6.3 years [IQR, 3.2–10.9 years]) were more recent than for L1 (median, 12 years [IQR, 7–21 years]; P < .001) and L3 (median, 10.4 years [IQR, 6.8–15.0 years]; P = .001) but not L4 (median, 9.0 years [IQR, 2.9–13.8 years]; P = .071) (Figure 2C).

Dated Bayesian phylogenetic tree of 612 Mycobacterium tuberculosis isolates. Color bars represent antibiotic resistance and branch colors represent bootstrap support. Tip labels are shaded to depict lineage. The internal tree scale is depicted via solid and dashed blue concentric circles with time noted along the axis.
Figure 1.

Dated Bayesian phylogenetic tree of 612 Mycobacterium tuberculosis isolates. Color bars represent antibiotic resistance and branch colors represent bootstrap support. Tip labels are shaded to depict lineage. The internal tree scale is depicted via solid and dashed blue concentric circles with time noted along the axis.

Lineage-wise distribution of terminal branch lengths (natural log) (A), pairwise single-nucleotide substitution (SNS) distance (B), and branching times (C), using 612 tuberculosis isolates from Pune, India. P values calculated using 1-sided 2-sample Kolmogorov-Smirnov test.
Figure 2.

Lineage-wise distribution of terminal branch lengths (natural log) (A), pairwise single-nucleotide substitution (SNS) distance (B), and branching times (C), using 612 tuberculosis isolates from Pune, India. P values calculated using 1-sided 2-sample Kolmogorov-Smirnov test.

Propensity to Propagate

We investigated host-related factors associated with TB transmission for each host using a previously developed score called the PTP [33] (Host Contribution to Transmission section Methods). The median PTP for all the 612 hosts was 1.02 (IQR, 0.88–1.17) and did not vary significantly by lineage (L1: median, 1.01 [IQR, 0.88–1.17], L2: 1.17 [IQR, 1.01–1.30], L3: 1.01 [IQR, 0.88–1.17], L4: 1.06 [IQR, 0.89–1.17]; P > .1 for each pairwise test between lineages; Figure 3).

Lineage-wise distribution of host propensity to propagate (PTP). Subjects with isolates belonging to lineage 2 did not have a significantly higher PTP compared to other 3 lineages based on a 1-sided 2-sample Kolmogorov-Smirnov test (P > .05).
Figure 3.

Lineage-wise distribution of host propensity to propagate (PTP). Subjects with isolates belonging to lineage 2 did not have a significantly higher PTP compared to other 3 lineages based on a 1-sided 2-sample Kolmogorov-Smirnov test (P > .05).

Mtb Clusters

We investigated clusters as a measure of possible recent transmission [25]. When evaluated cumulatively over a range of SNS distances spanning 1–100 SNSs, we found L2 and L4 to have a higher proportion of isolates belonging to clusters at any given SNS cut-off (Figure 4). The odds ratio of clustering ranged from 2.2 to 10.6 for L2 and L4 relative to L3 and L1 across 5 SNS thresholds ranging from 5 [26] to 25 (P < .003, Supplementary Table 6). We studied host factors including smear grade, age, and alcohol use, as well as strain factors, specifically lineage and rifampin resistance, for association with clustering (as defined by a threshold of ≤12 SNSs) in a multivariable logistic model (Supplementary Table 7). Compared to L1, infection with L4 had 5.1 times the odds of clustering after adjustment (P < .001). L2 had an adjusted odds clustering of 2.1 compared with L1 but was not statistically significant (P = .27).

Proportion of isolates belonging to clusters by lineage (L) based on a range of single-nucleotide substitution (SNS) distance threshold.
Figure 4.

Proportion of isolates belonging to clusters by lineage (L) based on a range of single-nucleotide substitution (SNS) distance threshold.

As terminal branch lengths, pairwise distances, clustering, and branching times are subject to bias due to differing molecular clock rates by lineage, we computed the THD as an alternative metric of transmissibility or epidemiologic success [6]. At short time scales, the lineage-specific THD distributions mirrored the order of distributions of pairwise distance, clustering, and branching times and were statistically significantly different (Supplementary Figure 1). We conducted multivariable linear regression of host and bacterial factors using the isolate THD and reproduced the association of L4 with epidemiological success (vs L1) that we measured in the logistic regression using clustering as the measure of transmissibility (P < .001, Supplementary Table 8). In addition, we measured a significant association between L2 and epidemiologic success (high THD, compared with L1 P < .001). L3 was associated with lower epidemiologic success (P = .003, Supplementary Table 8).

Skyline Analysis

To assess longer-term epidemiological success, we performed a Bayesian skyline analysis comparing the four major lineages (Supplementary Methods). The skyline analysis provided evidence of long-term epidemic growth for L4, and a more recent introduction of L2 into this community than L4 or L3. L3 also showed evidence of past epidemic growth (ca. 1400–1800) (Supplementary Figure 2).

Clustering by Resistance Phenotype

Of the clustered isolates, 7.6% of L1, 50.0% of L2, 22.2% of L3, and 20% of L4 harbored resistance to one or more drugs (Supplementary Table 9). One L4 cluster network of four isolates (all ≤6 SNSs apart) tested levofloxacin monoresistance and harbored the D94G mutation in gyrA (Figure 5). These four isolates with levofloxacin monoresistance belonged to participants with no known epidemiological links. Two additional L4 pairs of isolates (≤10 SNSs apart) harbored isoniazid monoresistance for a total of 20% of clustered isolates harboring drug resistance (n = 40). Another L4 cluster of seven isolates (≤10 SNSs apart) was pan-susceptible (Figure 5).

Largest network of genetically closely related lineage 4 isolates identified using a single-nucleotide substitution (SNS) cut-off of ≤10 SNSs. Each vertex in this network is an isolate and each edge represents a SNS distance of ≤10. A, All isolates were rifampin-susceptible (genotypic and phenotypic) and levofloxacin-resistant based on genotypic prediction and were ≤5 SNSs apart, except for 678 and 662, which were 6 SNSs apart. B, All isolates were pan-susceptible based on genotype and phenotype. The numbers near each vertex depict age of study participant in years, sex, and human immunodeficiency virus coinfection status. The number of branches represents the number of SNSs separating the isolates represented by the connected nodes. Abbreviations: F, female; HIV-, human immunodeficiency virus negative; HIV+, human immunodeficiency virus positive; M, male.
Figure 5.

Largest network of genetically closely related lineage 4 isolates identified using a single-nucleotide substitution (SNS) cut-off of ≤10 SNSs. Each vertex in this network is an isolate and each edge represents a SNS distance of ≤10. A, All isolates were rifampin-susceptible (genotypic and phenotypic) and levofloxacin-resistant based on genotypic prediction and were ≤5 SNSs apart, except for 678 and 662, which were 6 SNSs apart. B, All isolates were pan-susceptible based on genotype and phenotype. The numbers near each vertex depict age of study participant in years, sex, and human immunodeficiency virus coinfection status. The number of branches represents the number of SNSs separating the isolates represented by the connected nodes. Abbreviations: F, female; HIV-, human immunodeficiency virus negative; HIV+, human immunodeficiency virus positive; M, male.

DISCUSSION

Using Mtb WGS from participants with pulmonary TB in western India, we characterize differences in transmission and resistance between the 4 major Mtb lineages. We find evidence supporting recent epidemiological success of L4 and L2 in our setting, with a high rate of clustering and THD as compared with L1 and L3, consistent with findings from other parts of the world [6, 7, 39, 40]. For L4 (clustering and THD) and L2 (THD only), these differences persisted after controlling for host factors. At longer time scales Bayesian skyline analysis suggested growth in the TB epidemic overall, and a more recent introduction of L2 than L4 or L3. Our findings highlight ongoing transmission of TB and the possibility that L4 and L2 may increase in relative prevalence in western India unless TB control measures reduce transmission substantially.

Despite the relatively small sample, L2 was noted to be more drug resistant and to have a larger number of resistance acquisition events. This is in line with recent evidence from Mumbai [6] and emphasizes the need to rapidly diagnose resistance and refer patients for therapy [3]. We identify discordance of genotype- and phenotype-based resistance diagnosis for TB isolates from Pune. Genotypic prediction increased the sensitivity of resistance diagnosis as all resistant isolates missed by phenotype harbored known/canonical resistance variants. Yet, a substantial proportion of isolates predicted genotypically susceptible were measured to be phenotypically resistant; among these, we identified 20 possible resistance-candidate variants not otherwise documented in the WHO resistance catalogue. Identification of novel resistance-associated mutations may put available genotype-based assays for resistance detection at risk, accentuating the critical need for additional studies in western India.

Recent clinical trials support the use of fluoroquinolones (FQs) to shorten TB treatment duration to 4 months [41]. In our study, we identified a group of genetically closely related rifampin-susceptible isolates that harbored the D94G mutation in the gyrA gene, known to confer FQ resistance [37]. This raises the possibility of high fitness and transmissibility of some FQ-resistant isolates [1, 15, 42–44], despite the recognized fitness cost of gyrA mutations in other bacteria [45], and supports the need for heightened surveillance and comprehensive DST for regimen selection [46].

This study had several limitations. Our sample of participants seeking active TB care is a small subsample of circulating TB in this community with risk for bias including missed transmission links, or incomplete sampling of certain lineages. However, lineage assignments were unknown at the time of subject enrollment, so sampling is unlikely to affect our findings. The use of WGS does not always allow inference of direction of transmission and misses genetic diversity in approximately 10% of the genome [47, 29], underestimating the genetic distance between isolates, but this should impact all isolates equally, supporting our differential clustering finding between lineages. Clustering and branch length comparisons, although routinely used in current public health practice, are imperfect proxies of recent transmission and would ideally be supplemented by social network analysis that was not performed in this study. We addressed this by using multiple independent metrics of epidemiological success that account for differences in evolutionary rates and effective population size between the lineages, and studied the effect of host variable in multivariate models. Last, data were not available on participant drug use history, occupation, or homelessness. These host factors can contribute to TB transmission. Our results are also consistent with recently published literature on the differential success and transmissibility of L4 and L2 [3, 6].

In conclusion, our findings highlight that differences between Mtb lineages may have implications for TB surveillance and management. To achieve control, resources will need to be directed toward interrupting transmission by increasing efforts toward active case finding, contact tracing, early diagnosis, and treatment. The wider adoption of WGS can assist these efforts by providing quicker and more comprehensive genotype-based DST results allowing clinicians to tailor therapy sooner, in turn decreasing transmission.

Supplementary Data

Supplementary materials are available at The Journal of Infectious Diseases online (http://jid.oxfordjournals.org/). Supplementary materials consist of data provided by the author that are published to benefit the reader. The posted materials are not copyedited. The contents of all supplementary data are the sole responsibility of the authors. Questions or messages regarding errors should be addressed to the author.

Notes

Acknowledgments. The authors wish to thank the study participants and acknowledge the clinical staff who collected and maintained samples, and the laboratory staff who conducted the culturing, phenotyping, and genotyping work across multiple institutions.

Author contributions. The study question and analysis design was conceived by A. D. and M. F. Analysis was executed by A. D. with key contributions from Y. E., L. F., and M. G.; A. D. and M. F. had access to the data and are responsible for study integrity. V. M., with contributions from A. G., J. G., A. K., R. K., R. L., N. G., N. P., J. A. T., M. P., S. D., and D. K., led the field participant recruitment, data collection, and sequencing coordination and secured grant funding. D. M. E. and M. S. performed the sequencing. A. D. and M. F. wrote the manuscript. All authors reviewed the manuscript and provided edits.

Financial support. This work was supported through The Impact of Diabetes on TB Treatment Outcomes project (NIH/NIAID R01A1097494 to J. G.); Through the Regional Prospective Observational Reserach for Tuberculosis (RePORT) India consortium (CTRIUMPH: Cohort for TB Research by the Indo-US Medical Partnership USB1–31147-XX-13 to V. M., A. G., A. K., R. K., R. L., N. G., N. P., J. A. T.), US Civilian Research and Development Foundation (CRDF) (OISE-17-63221 to V. M.) Indian Department of Biotechnology; Baltimore Washington India Clinical Trials Unit (BWI CTU) (NIAID UM1AI069465); Boston Children’s Hospital Office of Faculty Development Career Development Fellowship and the Bushrod H. Campbell and Adah F. Hall Charity Fund/Charles A. King Trust Postdoctoral Fellowship to A. D.; National Institutes of Health (grant number R01 AI55765 to M. R. F.); National Institute of Allergy and Infectious Diseases (grant numbers K23AI135102 and R21AI122922 to J. A. T.); and German Research Foundation (GR5643/1-1 to M. I. G.). The sequencing work was supported by the ReSeqTB sequencing platform, with direct funding from the Bill & Melinda Gates Foundation (grant number OPP1115887 to D. M. E. and M. S.).

References

1

World Health Organization
.
Global tuberculosis report 2020
.
2020
. http://www.who.int/tb/publications/global_report/en/. Accessed 22 July 2022.

2

Ragonnet
 
R
,
Trauer
 
JM
,
Geard
 
N
,
Scott
 
N
,
McBryde
 
ES
.
Profiling Mycobacterium tuberculosis transmission and the resulting disease burden in the five highest tuberculosis burden countries
.
BMC Med
 
2019
;
17
:
208
.

3

Atre
 
SR
,
Jagtap
 
JD
,
Faqih
 
MI
, et al.   
Tuberculosis pathways to care and transmission of multidrug resistance in India
.
Am J Respir Crit Care Med
 
2022
;
205
:
233
41
.

4

Mathema
 
B
,
Andrews
 
JR
,
Cohen
 
T
, et al.   
Drivers of tuberculosis transmission
.
J Infect Dis
 
2017
;
216
(
Suppl_6
):
S644
53
.

5

Gagneux
 
S
.
Ecology and evolution of Mycobacterium tuberculosis
.
Nat Rev Microbiol
 
2018
;
16
:
202
13
.

6

Dreyer
 
V
,
Mandal
 
A
,
Dev
 
P
, et al.   
High fluoroquinolone resistance proportions among multidrug-resistant tuberculosis driven by dominant L2 Mycobacterium tuberculosis clones in the Mumbai metropolitan region
.
Genome Med
 
2022
;
14
:
95
.

7

Holt
 
KE
,
McAdam
 
P
,
Thai
 
PVK
, et al.   
Frequent transmission of the Mycobacterium tuberculosis Beijing lineage and positive selection for the EsxW Beijing variant in Vietnam
.
Nat Genet
 
2018
;
50
:
849
56
.

8

Coscolla
 
M
,
Gagneux
 
S
.
Consequences of genomic diversity in Mycobacterium tuberculosis
.
Semin Immunol
 
2014
;
26
:
431
44
.

9

Gröschel
 
MI
,
Pérez-Llanos
 
FJ
,
Diel
 
R
, et al.  Host-pathogen co-adaptation shapes susceptibility to infection with Mycobacterium tuberculosis. medRxiv [Preprint]. Posted online 5 August 2022. https://doi.org/10.1101/2022.08.04.22278337. Accessed 8 January 2023.

10

Menardo
 
F
.
Understanding drivers of phylogenetic clustering and terminal branch lengths distribution in epidemics of Mycobacterium tuberculosis
.
Elife
 
2022
;
11
:
e76780
.

11

Drummond
 
AJ
,
Bouckaert
 
RR
.
Bayesian evolutionary analysis with BEAST
.
Cambridge, UK
:
Cambridge University Press
,
2015
.

12

Huang
 
H
,
Ding
 
N
,
Yang
 
T
, et al.   
Cross-sectional whole-genome sequencing and epidemiological study of multidrug-resistant Mycobacterium tuberculosis in China
.
Clin Infect Dis
 
2018
;
69
:
405
13
.

13

Poonawala
 
H
,
Kumar
 
N
,
Peacock
 
SJ
.
A review of published spoligotype data indicates the diversity of Mycobacterium tuberculosis from India is under-represented in global databases
.
Infect Genet Evol
 
2020
;
78
:
104072
.

14

Central TB Division, Ministry of Health and Family Welfare
.
India TB report 2020: National Tuberculosis Programme annual report
.
2020
. https://tbcindia.gov.in/WriteReadData/l892s/India%20TB%20Report%202020.pdf. Accessed 22 July 2022.

15

Ministry of Health and Family Welfare, Government of India
. Report of the first national anti-tuberculosis drug resistance survey, 2014–16. 2018. https://tbcindia.gov.in/WriteReadData/l892s/4187947827National%20Anti-TB%20Drug%20Resistance%20Survey.pdf. Accessed 31 January 2019.

16

Gupte
 
A
,
Padmapriyadarsini
 
C
,
Mave
 
V
, et al.   
Cohort for Tuberculosis Research by the Indo-US Medical Partnership (CTRIUMPH): protocol for a multicentric prospective observational study
.
BMJ Open
 
2016
;
6
:
e010542
.

17

Mave
 
V
,
Meshram
 
S
,
Lokhande
 
R
, et al.   
Prevalence of dysglycemia and clinical presentation of pulmonary tuberculosis in western India
.
Int J Tuberc Lung Dis
 
2017
;
21
:
1280
7
.

18

Mave
 
V
,
Gaikwad
 
S
,
Barthwal
 
M
, et al.   
Diabetes mellitus and tuberculosis treatment outcomes in Pune, India
.
Open Forum Infect Dis
 
2021
;
8
:
ofab097
.

19

Paradkar
 
M
,
Padmapriyadarsini
 
C
,
Jain
 
D
, et al.   
Tuberculosis preventive treatment should be considered for all household contacts of pulmonary tuberculosis patients in India
.
PLoS One
 
2020
;
15
:
e0236743
.

20

GitHub, Inc
. A wrapper pipe for variant calling and genome assembly for M. tuberculosis: github.com/farhat-lab/megapipe. 2018. https://github.com/farhat-lab/megapipe. Accessed 30 August 2018.

21

Coll
 
F
,
McNerney
 
R
,
Guerra-Assunção
 
JA
, et al.   
A robust SNP barcode for typing Mycobacterium tuberculosis complex strains
.
Nature Commun
 
2014
;
5
:
4812
.

22

Stucki
 
D
,
Ballif
 
M
,
Bodmer
 
T
, et al.   
Tracking a tuberculosis outbreak over 21 years: strain-specific single-nucleotide polymorphism typing combined with targeted whole-genome sequencing
.
J Infect Dis
 
2015
;
211
:
1306
16
.

23

Freschi
 
L
,
Vargas
 
R
 Jr
,
Husain
 
A
, et al.   
Population structure, biogeography and transmissibility of Mycobacterium tuberculosis
.
Nat Commun
 
2021
;
12
:
6099
.

24

GitHub, Inc
. fast-lineage-caller. 2020. https://github.com/farhat-lab/fast-lineage-caller. Accessed 10 January 2022.

25

Hatherell
 
H-A
,
Didelot
 
X
,
Pollock
 
SL
, et al.   
Declaring a tuberculosis outbreak over with genomic epidemiology
.
Microb Genom
 
2016
;
2
:
e000060
.

26

Walker
 
TM
,
Ip
 
CLC
,
Harrell
 
RH
, et al.   
Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study
.
Lancet Infect Dis
 
2013
;
13
:
137
46
.

27

Bryant
 
JM
,
Schürch
 
AC
,
van Deutekom
 
H
, et al.   
Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data
.
BMC Infect Dis
 
2013
;
13
:
110
.

28

Walker
 
TM
,
Lalor
 
MK
,
Broda
 
A
, et al.   
Assessment of Mycobacterium tuberculosis transmission in Oxfordshire, UK, 2007–12, with whole pathogen genome sequences: an observational study
.
Lancet Respir Med
 
2014
;
2
:
285
92
.

29

Meehan
 
CJ
,
Goig
 
GA
,
Kohl
 
TA
, et al.   
Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues
.
Nat Rev Microbiol
 
2019
;
17
:
533
45
.

30

Nikolayevskyy
 
V
,
Niemann
 
S
,
Anthony
 
R
, et al.   
Role and value of whole genome sequencing in studying tuberculosis transmission
.
Clin Microbiol Infect
 
2019
;
25
:
1377
82
.

31

Tagliani
 
E
,
Anthony
 
R
,
Kohl
 
TA
, et al.   
Use of a whole genome sequencing-based approach for Mycobacterium tuberculosis surveillance in Europe in 2017–2019: an ECDC pilot study
.
Eur Respir J
 
2021
;
57
:
2002272
.

32

Menardo
 
F
,
Duchêne
 
S
,
Brites
 
D
,
Gagneux
 
S
.
The molecular clock of Mycobacterium tuberculosis
.
PLoS Pathog
 
2019
;
15
:
e1008067
.

33

Nebenzahl-Guimaraes
 
H
,
Borgdorff
 
MW
,
Murray
 
MB
,
van Soolingen
 
D
.
A novel approach—the propensity to propagate (PTP) method for controlling for host factors in studying the transmission of Mycobacterium tuberculosis
.
PLoS One
 
2014
;
9
:
e97816
.

34

Harrison
 
E
,
Drake
 
T
,
Ots
 
R
. finalfit: quickly create elegant regression results tables and plots when modelling. 2021. https://CRAN.R-project.org/package=finalfit. Accessed 10 January 2022.

35

Farhat
 
MR
,
Sultana
 
R
,
Iartchouk
 
O
, et al.   
Genetic determinants of drug resistance in Mycobacterium tuberculosis and their diagnostic value
.
Am J Respir Crit Care Med
 
2016
;
194
:
621
30
.

36

Gröschel
 
MI
,
Owens
 
M
,
Freschi
 
L
, et al.   
GenTB: a user-friendly genome-based predictor for tuberculosis resistance powered by machine learning
.
Genome Med
 
2021
;
13
:
138
.

37

World Health Organization
.
Catalogue of mutations in Mycobacterium tuberculosis complex and their association with drug resistance
.
2021
. https://www.who.int/publications/i/item/9789240028173. Accessed 10 January 2022.

38

Colijn
 
C
,
Gardy
 
J.
 
Phylogenetic tree shapes resolve disease transmission patterns
.
Evol Med Public Health
 
2014
;
2014
:
96
108
.

39

Guerra-Assunção
 
JA
,
Crampin
 
AC
,
Houben
 
RMGJ
, et al.   
Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area
.
Elife
 
2015
;
4
:
e05166
.

40

Sobkowiak
 
B
,
Banda
 
L
,
Mzembe
 
T
,
Crampin
 
AC
,
Glynn
 
JR
,
Clark
 
TG
.
Bayesian reconstruction of Mycobacterium tuberculosis transmission networks in a high incidence area over two decades in Malawi reveals associated risk factors and genomic variants
.
Microb Genom
 
2020
;
6
:
e000361
.

41

Dorman
 
SE
,
Nahid
 
P
,
Kurbatova
 
EV
, et al.   
Four-Month Rifapentine Regimens with or without Moxifloxacin for Tuberculosis.
 
N Engl J Med.
 
2021
;
384
:
1705
18
. doi:.

42

Jain
 
A
,
Dixit
 
P
,
Prasad
 
R
.
Pre-XDR & XDR in MDR and ofloxacin and kanamycin resistance in non-MDR Mycobacterium tuberculosis isolates
.
Tuberculosis (Edinb)
 
2012
;
92
:
404
6
.

43

Selvakumar
 
N
,
Kumar
 
V
,
Balaji
 
S
, et al.   
High rates of ofloxacin resistance in Mycobacterium tuberculosis among both new and previously treated patients in Tamil Nadu, south India
.
PLoS One
 
2015
;
10
:
e0117421
.

44

Sharma
 
R
,
Sharma
 
SK
,
Singh
 
BK
,
Mittal
 
A
,
Kumar
 
P
.
High degree of fluoroquinolone resistance among pulmonary tuberculosis patients in New Delhi, India
.
Indian J Med Res
 
2019
;
149
:
62
6
.

45

Melnyk
 
AH
,
Wong
 
A
,
Kassen
 
R
.
The fitness costs of antibiotic resistance mutations
.
Evol Appl
 
2015
;
8
:
273
83
.

46

Udwadia
 
ZF
,
Tornheim
 
JA
,
Ganatra
 
S
,
DeLuca
 
A
,
Rodrigues
 
CS
,
Gupta
 
A
.
Few eligible for the newly recommended short course MDR-TB regimen at a large Mumbai private clinic
.
BMC Infect Dis
 
2019
;
19
:
94
.

47

Cole
 
ST
,
Brosch
 
R
,
Parkhill
 
J
, et al.   
Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence
.
Nature
 
1998
;
393
:
537
44
.

48

Wirth
 
T
,
Wong
 
V
,
Vandenesch
 
F
,
Rasigade
 
JP
.
Applied phyloepidemiology: detecting drivers of pathogen transmission from genomic signatures using density measures
.
Evol Appl
 
2020
;
13
:
1513
25
.

Author notes

A. D. and Y. E. contributed equally to this work.

V. M. and M. F. contributed equally to this work.

Potential conflicts of interest. All authors: No reported conflicts.

All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.

This work is written by (a) US Government employee(s) and is in the public domain in the US.

Supplementary data