Abstract

Background

Tuberculosis (TB) is among the leading infectious causes of death worldwide, and there is a need for time- and resource-effective diagnostic methods. In this novel and exploratory study, we show the potential of using buccal swabs to collect human DNA and investigate the DNA methylation (DNAm) signatures as a diagnostic tool for TB.

Methods

Buccal swabs were collected from patients with pulmonary TB (n = 7), TB-exposed persons (n = 7), and controls (n = 9) in Sweden. DNAm status was determined using the Illumina MethylationEPIC array.

Results

We identified 5644 significant differentially methylated CpG sites between the patients and controls. Performing the analysis on a validation cohort of samples collected in Kenya and Peru (patients, n = 26; exposed, n = 9; control, n = 10) confirmed the DNAm signature. We identified a TB consensus disease module, significantly enriched in TB-associated genes. Last, we used machine learning to identify a panel of 7 CpG sites discriminative for TB and developed a TB classifier. In the validation cohort, the classifier performed with an area under the curve of 0.94, sensitivity of 0.92, and specificity of 1.

Conclusions

In summary, the results of this study show the clinical potential of using DNAm signatures from buccal swabs to explore new diagnostic strategies for TB.

Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), was one of the most fatal infectious diseases worldwide in 2022 [1]. Mtb is spread via aerosol when an infected individual coughs or sneezes. The immunological events following exposure are heterogeneous and range from clearance of the bacteria with innate or adaptive immune responses, to latent infection or subclinical or clinical TB [2]. There are several challenges with diagnosing both latent and active TB, and the World Health Organization (WHO) has developed the End TB Strategy and issued a global priority in research for new diagnostic tools [3]. The current diagnostic methods for latent TB infection include the Mantoux tuberculin skin test (TST) and the interferon-gamma release assay (IGRA), both of which have several limitations. TST and IGRA are based on circulating adaptive immune memory but cannot distinguish between a latent, active, or eliminated infection [4–6]. Diagnosis of active TB requires detection of Mtb in sputum through microscopy, culture, or nucleic acid amplification tests. The most contagious patients can be detected by smear microscopy, but the time to diagnosis with culture can take up to 6 weeks [7]. In addition, diagnosis based on sputum is of limited use in children and in cases of extrapulmonary TB, which is common in people with human immunodeficiency virus (HIV) infection. The laboratory handling of sputum samples used for culture and microscopy requires experienced personnel and laboratory facilities with a high biosafety level. There is an urgent need for a diagnostic method that is resource-effective to enable rapid diagnosis in TB-endemic countries. Epigenetic signatures have become recognized as new promising tools for the diagnosis of different diseases, including cancer, neurodegenerative diseases, and cardiovascular disease (reviewed in [8–10]). Transcriptomic signatures from whole blood have been widely studied and diagnostic signatures for TB proposed [11].
DNA methylation (DNAm) signatures of peripheral blood mononuclear cells, whole blood, and lung immune cells have also been shown to distinguish both active and latent TB [12–18]. The buccal mucosa is a part of the mucosal immunity and the first line of defense in respiratory infections [19]. Humoral immune responses against Mtb have previously been described in saliva of TB patients [20]. During active TB, the bacteria can be present in the oral cavity and oral TB infection can develop from a pulmonary infection [21]. Several studies have investigated the possibility of diagnosing TB by detecting Mtb DNA in mouth swab samples, but the sensitivity varies widely between studies [22–25]. In this study, we aimed to investigate if TB infection and exposure could be reflected in the DNA methylome of buccal cells. We investigated the DNAm patterns using the Illumina EPIC array and identified differentially methylated CpG sites (DMCs) between the patients and controls. Using a validation cohort of samples collected in Peru and Kenya, we confirmed DNAm changes in the buccal mucosa of patients with TB compared to controls. The results showed cross-continental DNAm differences in buccal swabs from TB patients and controls. We further used machine learning to identify a panel of 7 CpG sites with TB case/control discriminative potential.

METHODS

Patients (n = 7) and individuals with occupational- or household-related TB exposure (n = 7) were enrolled in the study at the Department of Infectious Diseases at Linköping University Hospital. All patients had active pulmonary TB, and 2 patients also had bacteria spreading to other organs including pancreas and lymph nodes. TB patients in Sweden were diagnosed with sputum microscopy, polymerase chain reaction (PCR), or culture and were HIV negative (Supplementary Table 1). When a new TB case is discovered in Sweden, routine contact tracing is performed around the index case to identify exposed individuals. Individuals with >24 hours of exposure to a contagious patient or >8 hours of exposure to a highly contagious patient are enrolled in contact tracing and are tested with IGRA. Highly contagious patients were defined by positive smear microscopy diagnosis. Individuals enrolled in the contact tracing were asked to participate in the study. Healthy controls (n = 9) were enrolled at Linköping University. Additional participants were recruited in Eldoret, Kenya (patients; n = 19) and in Lima, Peru (patients; n = 7, control; n = 10, exposed; n = 9). All patients donated samples within 2 weeks of diagnosis. TB patients in Peru and Kenya were diagnosed with smear microscopy and GeneXpert PCR. In Kenya, urine lipoarabinomannan (LAM) or clinical diagnosis was also used. All TB patients in Peru had pulmonary TB; in Kenya, 15 patients had pulmonary and 4 had extrapulmonary TB. All TB patients in Peru were HIV negative; in Kenya, 5 TB patients were HIV positive (Supplementary Table 1). The exposed population in Peru was defined as household contacts of a TB patient (n = 5) or healthcare workers with occupational TB exposure (n = 4). Study participants donated buccal swab samples and blood samples for IGRA with the QuantiFERON TB-Gold Plus test (SSI Diagnostica, Hillerød, Denmark), which was analyzed according to the manufacturer's instructions.
Buccal swab samples were collected using OmniSwab (Qiagen, Hilden, Germany). The swab was rubbed against the buccal mucosa 5 times up and down for 10 seconds and ejected into a 2-mL tube; 1 swab per cheek was collected from each participant. The buccal swab was stored at 4°C for a maximum of 4 hours before DNA isolation was performed. DNA was extracted from the buccal swabs using the QIAamp DNA mini kit (Qiagen) following the manufacturer's instructions for DNA isolation from buccal swabs. The DNA was analyzed using the Illumina Infinium MethylationEPIC BeadChip microarray (Illumina, California). Bioinformatic and statistical analyses were performed as described in Supplementary File 1 (Supplementary Methods, Bioinformatics and Statistics).

RESULTS

Study Cohort and Design

We included study participants with active TB (patients; n = 7), persons with occupational- or household-related TB exposure (exposed; n = 7), and healthy controls (controls; n = 9) to investigate epigenetic patterns in buccal swabs in TB infection and exposure. The participants donated buccal swabs and blood samples for IGRA. The demographics of the included study participants are shown in Table 1. There were no significant differences regarding sex, age, weight, body mass index (BMI), or BCG vaccination status. We observed a significant difference in the IGRA status and height (P < .001 and P = .023, respectively). Furthermore, we included a validation cohort of participants recruited in Kenya (patients; n = 19) and Peru (patients; n = 7, control; n = 10, exposed; n = 9) to validate the results of the pilot cohort. The demographics of the validation cohort showed significant differences in weight and BMI between the groups (P < .001 and P < .001, respectively) (Table 2). The patients had significantly lower weight and BMI compared to exposed individuals (P < .001, P < .001) and controls (P < .001, P = .023). Malnutrition and underweight are both risk factors for developing TB and features of the disease [26]. The TB disease phenotype, diagnostic method used, and HIV status of all patients are shown in Supplementary Table 1.

Table 1.

Demographics of the Study Participants in Pilot Cohort

| Characteristic | Patient (n = 7) | Exposed (n = 7) | Control (n = 9) | P Value | Post Hoc P Value |
| --- | --- | --- | --- | --- | --- |
| Sex | | | | .838 | |
| Male | 4 (57.1) | 3 (42.9) | 4 (44.4) | | |
| Female | 3 (42.9) | 4 (57.1) | 5 (55.6) | | |
| BCG vaccine | | | | .551 | |
| Yes | 2 (33.3) | 5 (71.4) | 4 (44.4) | | |
| No | 3 (50) | 2 (28.6) | 5 (55.6) | | |
| NA | 2 (28.6) | 0 (0) | 0 (0) | | |
| IGRA status | | | | <.001 | |
| Positive | 5 (71.4) | 0 (0) | 0 (0) | | |
| Negative | 1 (14.3) | 7 (100) | 7 (77.8) | | |
| NA | 1 (14.3) | | | | |
| Age, y | | | | .857 | |
| Mean ± SD | 37.5 ± 9.5 | 35.9 ± 14.9 | 33.44 ± 14.0 | | |
| Min, max | 25, 50 | 18, 61 | 18, 51 | | |
| Weight, kg | | | | .490 | |
| Mean ± SD | 61 ± 11.7 | 66.3 ± 20.5 | 70.89 ± 16.51 | | |
| Min, max | 44, 80 | 49, 110 | 52, 100 | | |
| Height, m | | | | .023 | Patient–Control .308 |
| Mean ± SD | 1.68 ± 9.3 | 1.63 ± 0.1 | 1.78 ± 0.11 | | Control–Exposed .021 |
| Min, max | 1.54, 1.83 | 1.55, 1.77 | 1.65, 1.98 | | Exposed–Patient .943 |
| BMI, kg/m2 | | | | .427 | |
| Mean ± SD | 21.49 ± 2.7 | 24.58 ± 5.2 | 22.4 ± 4.5 | | |
| Min, max | 18.6, 24.5 | 19.9, 35.1 | 17.6, 30.9 | | |

Categorical variables are shown as No. (%). Continuous variables are shown as mean ± SD, and min, max shows the range of data. Significance tested in SPSS with χ2 test for categorical variables and independent sample Kruskal-Wallis test for continuous variables. For significant findings, post hoc testing with Bonferroni was applied.

Abbreviations: BMI, body mass index; IGRA, interferon-gamma release assay; NA, not applicable; SD, standard deviation.

Table 2.

Demographics of the Study Participants in Validation Cohort

| Characteristic | Patient (n = 26) | Exposed (n = 9) | Control (n = 10) | P Value | Post Hoc P Value |
| --- | --- | --- | --- | --- | --- |
| Sex | | | | .297 | |
| Male | 12 (46.2) | 2 (22.2) | 5 (50) | | |
| Female | 14 (53.8) | 7 (77.8) | 5 (50) | | |
| Smoking | | | | <.001 | |
| Yes | 2 (7.7) | 0 (0) | 0 (0) | | |
| Age, y | | | | .105 | |
| Mean ± SD | 35.54 ± 16.597 | 41.56 ± 14.388 | 28.8 ± 10.433 | | |
| Min, max | 19, 72 | 25, 72 | 20, 54 | | |
| Weight, kg | | | | <.001 | Patient–Control <.001 |
| Mean ± SD | 55.85 ± 11.979 | 73.67 ± 12.826 | 70.7 ± 10.914 | | Patient–Exposed <.001 |
| Min, max | 45, 90 | 57, 95 | 61, 95 | | |
| Height, cm | | | | .185 | |
| Mean ± SD | 165.23 ± 9.02 | 159.11 ± 9.36 | 165.6 ± 10.617 | | |
| Min, max | 148, 185 | 150, 159 | 159, 183 | | |
| BMI, kg/m2 | | | | <.001 | Patient–Control .023 |
| Mean ± SD | 20.58 ± 5.139 | 29 ± 4.0 | 25.9 ± 3.843 | | Patient–Exposed .00 |
| Min, max | 15, 34 | 24, 34 | 21, 33 | | |
| IGRA | | | | | |
| Positive | 0 | 5 | 1 | | |
| Negative | 0 | 4 | 9 | | |
| Unknown | 26 | 0 | 0 | | |
| Country | | | | | |
| Peru | 7 (21.2) | 9 (100) | 10 (100) | | |
| Kenya | 19 (57.6) | 0 (0) | 0 (0) | | |

Categorical variables are shown as No. (%). Continuous variables are shown as mean ± SD, and min, max shows the range of data. Significance tested in SPSS with χ2 test for categorical variables and independent sample Kruskal-Wallis test for continuous variables. For significant findings, post hoc testing with Bonferroni was applied.

Abbreviations: BMI, body mass index; IGRA, interferon-gamma release assay; SD, standard deviation.

DNA Methylation Pattern in Buccal Swabs Separates Patients, Exposed Contacts, and Healthy Controls

The DNA methylation status in >800 000 CpG sites was assessed using Illumina Infinium MethylationEPIC array. A singular value decomposition (SVD) analysis of the factors known to influence DNAm was performed (Supplementary Figure 1A) and the data were batch corrected (Supplementary Figure 1B). We performed an unsupervised clustering analysis using multidimensional scaling (MDS) of the 1000 most variable CpG sites in the dataset and observed separation of the groups (Figure 1A). To investigate if there were any significant differences between the groups, we identified DMCs (mean methylation difference [MMD], >0.2 and false discovery rate [FDR]–adjusted P < .05). There were 5644 significant DMCs between the patients and controls, 413 between patients and exposed individuals, and 309 between exposed individuals and controls. Using all significant DMCs (n = 5865), we created a heatmap showing a spectrum of DNAm changes in the exposed and separation of the patients and controls (Figure 1B). The overlap of the DMCs between the groups was analyzed in a Venn analysis and showed 5153 significant DMCs unique to the patients and controls (Figure 1C). Together, these results indicate that the DNAm profiles obtained from buccal mucosa differentiate TB patients, exposed individuals, and healthy controls. We further investigated the cellular heterogeneity of the samples, since different cell types display distinct DNAm patterns, which can influence the DNA methylomes in a mixed sample [27]. We identified epithelial cell proportions of 75% (standard error of the mean [SEM], 6.2%) in patients, 79% (SEM, 6.1%) in exposed individuals, and 85% (SEM, 3.7%) in controls. The proportion of neutrophils was 13.8% (SEM, 5.5%) in patients, 10% (SEM, 5%) in exposed individuals, and 5.8% (SEM, 2.2%) in controls. The remaining cells consisted of other leukocytes including B cells, natural killer cells, CD4+ T cells, and monocytes. 
There were no significant differences in cell proportions between the groups (P = .893; Figure 1D), and the cellular heterogeneity identified in the buccal swab samples was in line with previous findings based on DNAm data [28] and on microscopy characterization [29].
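The DMC criteria used in this section (MMD >0.2 and FDR-adjusted P < .05) can be illustrated with a minimal sketch. It assumes Benjamini-Hochberg adjustment, which the text does not specify, and it uses hypothetical CpG identifiers, group mean beta values, and raw P values. The function names are illustrative and do not represent the pipeline used in the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_dmcs(sites, mmd_cutoff=0.2, fdr_cutoff=0.05):
    """Keep CpG sites that pass both the MMD and the adjusted-P cutoffs.

    sites: list of (cpg_id, mean_beta_group1, mean_beta_group2, raw_p).
    """
    adjusted = benjamini_hochberg([site[3] for site in sites])
    dmcs = []
    for (cpg_id, beta1, beta2, _), p_adj in zip(sites, adjusted):
        mmd = abs(beta1 - beta2)  # mean methylation difference
        if mmd > mmd_cutoff and p_adj < fdr_cutoff:
            dmcs.append(cpg_id)
    return dmcs


# Hypothetical example values (not data from the study).
example_sites = [
    ("cpg_A", 0.80, 0.45, 0.0001),  # large difference, small P
    ("cpg_B", 0.52, 0.48, 0.0002),  # small P, but MMD below the cutoff
    ("cpg_C", 0.30, 0.62, 0.2000),  # large difference, not significant
    ("cpg_D", 0.15, 0.50, 0.0100),  # passes both criteria
]
print(select_dmcs(example_sites))
```

Requiring both an effect-size cutoff and a multiplicity-adjusted P value excludes sites that are statistically significant but differ only marginally in methylation level.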

Figure 1.

DNA methylation patterns in buccal swabs distinguish patients with active tuberculosis (TB) (pink triangles), TB-exposed individuals (green squares), and healthy controls (blue circles). A, Multidimensional scaling plot of the 1000 most variable CpG sites within the dataset. B, Heatmap of beta values of the differentially methylated CpG sites (DMCs) identified in pairwise comparisons across all groups with stringency criteria of adjusted P < .05 and mean methylation difference >0.2. Dendrogram shows separation based on groups. C, Venn diagram of the DMCs identified in the pairwise comparisons showing the largest number of DMCs between the TB patients and healthy controls. D, Hierarchical epigenetic dissection of intra-sample heterogeneity of the data showing proportions of different cell types within the mouth swab samples. No significant difference in cell types between groups was identified (Kruskal-Wallis test, P = .893).

Validation of DNA Methylation Pattern in Buccal Swab Samples of TB Patients, Exposed Individuals, and Controls Using a Validation Cohort

To validate the robustness and generalizability of our findings, we incorporated additional participants from Kenya and Peru into a validation cohort. This validation cohort, notably larger than the initial pilot cohort, provided a more rigorous test of our findings, particularly given the perfect separation observed in the pilot study. We performed an unsupervised clustering analysis using MDS of the 1000 most variable CpG sites in the dataset, which confirmed separation of the groups (Figure 2A). Furthermore, we identified 413 significant DMCs between the patients and controls, 32 between patients and exposed individuals, and 51 between exposed individuals and controls (MMD >0.2 and FDR-adjusted P < .05). The overlap in DMCs identified in the pilot and validation cohorts was investigated in a Venn analysis and showed 22 overlapping DMCs between the cohorts (Figure 2B). In summary, these results confirm the findings from the pilot cohort and support that DNAm from the buccal mucosa can distinguish TB patients from healthy controls and exposed individuals.

Figure 2.

A validation cohort confirms differential methylation pattern from buccal swabs between patients with tuberculosis (TB) and healthy controls. A, Multidimensional scaling plot of the 1000 most variable CpG sites from buccal swab samples of patients with TB (pink triangles), TB-exposed individuals (green squares), and healthy controls (blue circles) with 85% confidence ellipses. The interferon-gamma release assay (IGRA) status of participants is indicated with black outline. B, Differentially methylated CpG sites (DMCs) between the patients and controls from the pilot cohort and validation cohort compared in a Venn analysis showing an overlap of 22 DMCs. C, Multidimensional scaling plot of the 1000 most variable CpG sites in DNA methylomes from buccal swab samples from TB patients (pink), TB-exposed individuals (green), and healthy controls (blue) from Kenya (triangles), Peru (squares), or Sweden (circles).

Cross-continental DNA Methylation Patterns in Buccal Swab Samples of TB Patients

We identified the signature in the pilot cohort and showed its general applicability across different subpopulations by replicating the results in the validation cohort. To investigate the similarities and dissimilarities of TB patients from the different geographical areas, we combined the data from the pilot and validation cohorts. This allowed us to introduce and investigate population-based confounding factors. We performed an SVD analysis of the data including all samples of the pilot and validation cohorts (patients; n = 33, exposed; n = 16, controls; n = 19), and we identified significant contribution to variation in the data by the group, country, and slide (Supplementary Figure 2A). An SVD correction was performed to reduce technical batch effect from the slide (Supplementary Figure 2B). We performed an MDS analysis of the 1000 most variable CpG sites and identified separation between the groups regardless of the country (Figure 2C). In summary, the analysis shows that the patients with TB display a distinct DNAm pattern regardless of the population, suggesting a cross-continental DNAm signature in buccal swab samples for active TB.
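The first step of the clustering analyses, selecting the most variable CpG sites before MDS, can be sketched as follows. This is a minimal illustration with hypothetical beta values. It stops at the sample-to-sample distance matrix that an MDS routine would take as input, and the use of Euclidean distance is an assumption, since the text does not state the distance measure.

```python
from math import dist
from statistics import pvariance


def most_variable_sites(betas, k):
    """Return the k CpG ids with the highest variance across samples.

    betas: dict mapping cpg_id -> list of beta values, one per sample.
    """
    ranked = sorted(betas, key=lambda cpg: pvariance(betas[cpg]), reverse=True)
    return ranked[:k]


def sample_distances(betas, site_ids):
    """Pairwise Euclidean distances between samples over the chosen sites."""
    n_samples = len(next(iter(betas.values())))
    # One methylation profile (vector of beta values) per sample.
    profiles = [[betas[cpg][j] for cpg in site_ids] for j in range(n_samples)]
    return [[dist(a, b) for b in profiles] for a in profiles]


# Hypothetical beta values for 3 CpG sites in 4 samples (not study data).
example_betas = {
    "cpg_A": [0.1, 0.9, 0.1, 0.9],
    "cpg_B": [0.5, 0.5, 0.5, 0.6],
    "cpg_C": [0.2, 0.7, 0.3, 0.8],
}
top_sites = most_variable_sites(example_betas, k=2)
distances = sample_distances(example_betas, top_sites)
print(top_sites)
```

In this toy input, samples 0 and 2 have similar profiles and end up close together, which is the kind of structure an MDS plot then displays in two dimensions.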

Supervised Machine Learning Models Trained on a Panel of CpG Sites Achieve High Classification Performance of TB Patients and Controls

To investigate the potential of this DNAm signature as a diagnostic tool, we applied a machine learning approach to select a panel of CpG sites that can accurately distinguish TB patients from TB-exposed individuals and healthy controls. First, we trained L1-regularized multivariate logistic regression models on the pilot cohort (n = 23) using recursive feature elimination to promote model simplicity and interpretability, while mitigating potential overfitting and instability. Then, we validated the predictive accuracy of the selected CpG subsets by training supervised learning classifiers to estimate the probability of TB on samples from the pilot cohort and evaluating them on the left-out validation cohort. We observed that classifiers trained on the selected CpG subsets were able to achieve a high discriminatory performance for active TB (area under the curve [AUC] >0.90, sensitivity >0.70, specificity >0.95) among TB-exposed and healthy controls (Figure 3A and 3B). In particular, we found that a panel of 7 CpG sites optimized the balance between set size and model classification performance, showing an AUC of 0.94, sensitivity of 0.92, and specificity of 1, on the validation set (Figure 3C). The beta values for the 7 CpG sites for each group are shown in Figure 3D. Two CpG sites from the classifier were identified as DMCs in both the pilot and validation cohort independently, whereas the remaining 5 CpG sites were identified as DMCs in the pilot cohort (Supplementary Figure 3A). The model also performed within a satisfactory range (AUC, 0.82–0.92) when evaluated on the validation set without samples below the underweight BMI threshold (18.5 kg/m2), and on both the complete Peruvian cohort and the Peruvian cohort without low-BMI individuals (Supplementary Figure 3B). 
The average TB probability of the validation cohort samples estimated by the classifiers trained on these 7 CpG sites was significantly higher for active TB patients (mean probability, .76) compared to both healthy controls (mean probability, .13; adjusted Wilcoxon test P = 2.36e-8) and exposed individuals (mean probability, .52; adjusted Wilcoxon test P = 1.95e-4). Similarly, the TB probability predicted for exposed individuals was significantly higher than the estimations for controls (adjusted Wilcoxon test P = 1.30e-4) (Supplementary Figure 3C). Since Sweden is a low-incidence country for TB whereas Peru is a high-incidence country, we also investigated if this circumstance could have influenced the development of the classifier. We applied the same methodology as before to construct a new classifier trained on high-incidence settings, wherein we interchanged the training and validation sets (Peruvian cohort as training set, Swedish cohort as validation). The resulting model was built using 6 CpG sites, 2 of which overlapped with the previous set of 7 CpG sites (Fisher exact test P = 1.83e-5; odds ratio [OR], 548.54). Remarkably, this high-incidence classifier achieved an AUC of 0.98 on the Swedish cohort (Supplementary Figure 3D). These results suggest that DNAm levels from a small number of CpG sites suffice to accurately classify TB patients from exposed individuals and controls. Furthermore, the selected sites demonstrate consistency across different TB incidence settings.
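How the reported performance metrics relate to predicted TB probabilities can be shown with a short sketch. It assumes a probability threshold of 0.5 for calling a sample TB-positive, which is not stated in the text, and it uses hypothetical labels and probabilities. The AUC is computed with the rank-based (Mann-Whitney) formulation rather than by integrating a plotted curve.

```python
def sensitivity_specificity(labels, probs, threshold=0.5):
    """labels: 1 = TB patient, 0 = non-patient; probs: predicted TB probability."""
    pairs = list(zip(labels, probs))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, probs):
    """Probability that a random patient scores higher than a random non-patient."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    # Count patient/non-patient pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predictions (not data from the study).
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]
example_probs = [0.90, 0.80, 0.70, 0.40, 0.45, 0.30, 0.20, 0.10]
sens, spec = sensitivity_specificity(example_labels, example_probs)
print(sens, spec, auc(example_labels, example_probs))
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds. As a rough consistency check, a sensitivity of 0.92 in a group of 26 patients would correspond to 24 correctly classified patients (24/26 ≈ 0.92), assuming all validation patients were scored.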

Figure 3.

Tuberculosis (TB) classifier based on DNA methylation (DNAm) in buccal swab samples accurately classifies patients with TB among healthy controls and exposed individuals. Using machine learning, 20 candidate CpG sites with discriminative features for TB were obtained. A classifier was trained on the pilot cohort (blue) and tested on the validation cohort (orange). A, Sensitivity (y-axis) of the classifier based on DNAm level in 1–20 CpG sites (x-axis). Sensitivity of 0.70–0.94 was reached depending on the number of CpG sites included. B, Specificity (y-axis) of the classifier in 1–20 CpG sites (x-axis). Specificity of 0.95–1.00 was reached depending on the number of CpG sites included. C, Receiver operating characteristic (ROC) curve of the classifier based on 7 CpG sites (sensitivity of 0.92 and specificity of 1) showing the true-positive rate (y-axis) and false-positive rate (x-axis) with an area under the curve (AUC) of 0.94. D, Beta values of the 7 classifier CpG sites for each group. The CpG sites are ordered by importance. All samples from the pilot and validation cohorts are represented in the plot.

Identification of a Consensus Disease Module Enriched in TB-Associated Genes and Pathways

To explore the biological context of the TB DNAm signatures from the pilot and validation cohorts, we applied a network analysis approach to identify modules of highly interconnected genes using the MODifieR pipeline [30]. We mapped the DMCs between TB patients and controls for each cohort, generating a pilot cohort TB module of 763 genes and a validation cohort TB module of 126 genes. KEGG pathway enrichment analysis showed significant enrichment in several pathways of infectious diseases and immune system (Supplementary Figures 4 and 5, respectively). Furthermore, the genes from the pilot and validation modules overlapped significantly (P < 2.2e-16; OR, 13.56), allowing us to retrieve a consensus TB disease module of 48 genes (Figure 4A). To further examine the TB consensus module, we performed pathway and gene ontology enrichment analyses. Notably, we found that the consensus genes were significantly enriched (adjusted P < .05) in pathways associated with bacterial infection, extracellular matrix (ECM) interactions, and immunoregulation (Figure 4B, complete pathway analysis in Supplementary Figure 6). The main component of the consensus module (n = 42) was significantly enriched in TB-associated genes from DisGeNET (P = .03; OR, 2.75). The TB-associated genes identified were WNT family member 5A (WNT5A), growth factor receptor bound protein 2 (GRB2), mitogen-activated protein kinase 1 (MAPK1), epidermal growth factor receptor (EGFR), protein tyrosine phosphatase nonreceptor type 6 (PTPN6), and protein tyrosine phosphatase receptor type C (PTPRC).
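The kind of overlap test behind a statement such as "the modules overlapped significantly" can be sketched with a one-sided Fisher exact test on a 2 × 2 table. The module sizes (763 and 126 genes) and the overlap (48 genes) are taken from the text, but the background size of 20 000 genes is a hypothetical placeholder, so the output is not expected to reproduce the reported OR or P value. The sketch also reports the simple sample odds ratio, which can differ from the conditional estimate given by some statistical software.

```python
from math import comb


def overlap_enrichment(n_a, n_b, n_overlap, n_background):
    """One-sided Fisher exact test for the overlap of two gene sets.

    Returns (sample odds ratio, P value for an overlap at least this large).
    Assumes both off-diagonal cells of the 2 x 2 table are non-zero.
    """
    a = n_overlap                              # in both sets
    b = n_a - n_overlap                        # only in set A
    c = n_b - n_overlap                        # only in set B
    d = n_background - n_a - n_b + n_overlap   # in neither set
    odds_ratio = (a * d) / (b * c)
    # Hypergeometric upper tail, computed with exact integers.
    total = comb(n_background, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_background - n_a, n_b - k)
        for k in range(a, min(n_a, n_b) + 1)
    )
    return odds_ratio, tail / total


# Module sizes from the text; the background size is a hypothetical assumption.
odds_ratio, p_value = overlap_enrichment(
    n_a=763, n_b=126, n_overlap=48, n_background=20000
)
print(round(odds_ratio, 2), p_value)
```

The result depends strongly on the assumed background, which is why the choice of gene universe matters when interpreting overlap statistics of this kind.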

Figure 4.

Disease modules of patients with tuberculosis (TB) from pilot and validation cohort overlap and are enriched in TB-associated genes and pathways. A, Disease modules for the pilot and validation cohort identified based on the differentially methylated CpG sites from each cohort using MODifieR. The disease modules were compared in a Venn analysis showing significant overlap (P < 2.2e-16; odds ratio [OR], 13.56) and a consensus module of 48 genes. B, Network showing the genes in the consensus module and their connections. Hypermethylated CpG sites are shown in red, hypomethylated in blue, mixed methylation pattern in beige, and TB-associated genes with a black outline. There was a significant overlap of TB-associated genes in the interconnected module genes (n = 42) (P = .03; OR, 2.75). The module was explored using KEGG pathway enrichment analysis, and genes enriched in pathways of cell and extracellular matrix (ECM) interactions (light blue area) and genes enriched in immune system pathways (light red area) were identified.

DISCUSSION

To meet the need for efficient TB diagnostics with reliable performance in low-resource settings, WHO is asking for non-sputum-based TB triage tests (minimum sensitivity of 0.90 and specificity of 0.70) and confirmatory tests (minimum sensitivity of 0.65 and specificity of 0.98). Urine lipoarabinomannan (LAM) testing is used clinically in some settings but has suboptimal sensitivity [31]. Several blood transcriptomic classifiers have been suggested and performed with sensitivities of 0.83–0.91 at a specificity of 0.70 [11]. Another classifier based on 3 differentially methylated regions performed with an AUC of 0.84, sensitivity of 0.65, and specificity of 0.90 in a validation cohort of 31 TB patients and 31 controls [15]. Compared to blood samples, buccal swabs are less invasive and easier to collect and store, and DNAm marks are stable [32]. At the collection site, buccal swabs can be put in decontamination buffer to allow laboratory processing in Biosafety Level 2 laboratories. The cell proportions in buccal swab samples are also more homogeneous than in blood samples, which can make buccal swabs more suitable for the diagnostic development of epigenetic signatures, since cellular heterogeneity contributes to variation in DNAm [33]. By narrowing the signature down to a small number of PCR-addressable CpG sites that precisely separate TB patients from persons without TB, the technology can be aligned with existing high-throughput PCR protocols. Such a test is not primarily intended as a stand-alone tool but would have value in a clinical setting as a triage tool whose results would need to be followed up with confirmatory testing. In other fields of research, the buccal mucosa has been explored, and DNAm signatures of smoking [34], biological age [35], maternal stress during pregnancy [36], and in utero exposure to severe acute respiratory syndrome coronavirus 2 [37] have been described.
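The WHO minimum targets quoted above can be expressed as a simple check. The sketch below is illustrative only: the names `Target`, `TRIAGE`, `CONFIRMATORY`, and `meets` are hypothetical, and the confusion-matrix counts are assumed values chosen to be consistent with the validation cohort size (26 patients; 19 exposed or control participants) and with the rounded sensitivity and specificity reported for the 7-CpG classifier. They are not counts taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Minimum performance required for a given test category."""
    name: str
    min_sensitivity: float
    min_specificity: float


# Minimum targets as quoted in the text.
TRIAGE = Target("triage", min_sensitivity=0.90, min_specificity=0.70)
CONFIRMATORY = Target("confirmatory", min_sensitivity=0.65, min_specificity=0.98)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true TB cases that are classified as TB."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-TB participants that are classified as non-TB."""
    return tn / (tn + fp)


def meets(target: Target, sens: float, spec: float) -> bool:
    """True if both minimum criteria of the target are satisfied."""
    return sens >= target.min_sensitivity and spec >= target.min_specificity


# Assumed counts (hypothetical): 24 of 26 patients detected, 0 of 19 false positives.
sens = sensitivity(tp=24, fn=2)
spec = specificity(tn=19, fp=0)
for target in (TRIAGE, CONFIRMATORY):
    print(f"{target.name}: sensitivity={sens:.2f}, specificity={spec:.2f}, "
          f"meets minimum={meets(target, sens, spec)}")
```

Point estimates from a cohort of this size carry wide confidence intervals, so passing such a check on a small validation set is best treated as a signal for further evaluation rather than as evidence that a target has been met.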

In the present study, we analyzed DNA methylomes of TB-exposed study participants in geographically distant locations and thereby introduced population-based confounding, since there are population-specific epigenetic differences due to ethnicity and environment [38, 39]. Sweden is a low-incidence country where most TB patients are foreign-born [40]. Investigating covariates in our datasets with singular value decomposition (SVD) showed that country contributed to the variation in the data but that TB status contributed more. We confirmed that TB patients have a distinct DNAm pattern and that this pattern is shared across continents. The results are in line with our own and others’ previous work demonstrating that DNAm patterns are changed in blood- and lung-derived immune cells during clinical or subclinical TB infection or after TB exposure [12–18]. To our knowledge, we are the first to report that DNAm changes are present in the buccal mucosa during active TB. We also identified DNAm differences in the TB-exposed individuals as compared with the healthy controls, suggesting that exposure to Mtb induces epigenetic changes. The signature seemed independent of the involvement of the adaptive immune system, possibly reflecting a spectrum of disease severity that is not captured by IGRA status. The classifier assigned TB-exposed individuals a higher probability of being classified as TB patients than unexposed controls, suggesting that some of the exposed individuals may have a subclinical infection. We have previously shown that recently TB-exposed individuals have altered DNAm of lung immune cells regardless of IGRA status [14], suggesting that DNAm signatures could be a measure of TB exposure independent of immunological tests. The relationship between DNAm and the heterogeneity of TB, ranging from early clearance through latent and subclinical TB to active TB, needs further investigation.

DNAm is intricately linked to the regulation of gene expression, with hypomethylation of promoters being associated with increased expression and hypermethylation with silencing of genes [41]. We identified a consensus disease module from the pilot and validation cohorts, with significant enrichment for TB-associated genes. WNT5A was hypomethylated and is involved in the cellular processes following Mtb recognition through Toll-like receptor 2 [42]. GRB2, PTPN6 (both hypomethylated), and CDC42 are involved in signaling following Mtb recognition by the mannose receptor [43]. Furthermore, we identified hypermethylation in EGFR, and mutations in this gene have been reported at an increased frequency in TB patients [44]. We also identified hypomethylation in PTPRC, which has been suggested as a biomarker for the diagnosis of active and latent TB [45]. Although macrophages are the primary target cell for Mtb, studies have provided evidence of infection in alveolar epithelial cells (AECs) [46, 47]. It has been proposed that Mtb translocates over the epithelial barrier both when internalized in AECs and through migration of infected macrophages [48]. Mtb adheres to ECM proteins such as collagens, fibronectins, and laminins [48]. We observed enrichment in the pathways of ECM receptor interaction and bacterial invasion of epithelial cells. We also identified enrichment in pathways connected to shigellosis and Salmonella infection; these are invasive intracellular infections in which the pathogen is phagocytosed by macrophages and manipulates the host to avoid digestion and extend intracellular survival, as in TB [49].

Limitations of this study include the limited sample size, the lack of controls from Kenya, inconsistency in the diagnostic methods used, and potential model overfitting, given the high sensitivity achieved with a limited number of features. Temporal data and comparisons with other diseases would be required to further develop a DNAm signature with diagnostic properties.

CONCLUSIONS

We identified DNAm differences in buccal swab samples distinguishing TB patients from healthy controls. The signature was present in TB patients from 3 different populations, sampled in Sweden, Peru, and Kenya. Furthermore, we developed a TB-specific DNAm classifier that demonstrated promising performance in identifying TB patients within our limited-scale cohort. Our results suggest that buccal swabs can be used to identify TB patients and support the clinical relevance of further developing DNAm signatures as a diagnostic tool for TB.

Supplementary Data

Supplementary materials are available at The Journal of Infectious Diseases online (http://jid.oxfordjournals.org/). Supplementary materials consist of data provided by the author that are published to benefit the reader. The posted materials are not copyedited. The contents of all supplementary data are the sole responsibility of the authors. Questions or messages regarding errors should be addressed to the author.

Notes

Acknowledgments. We would like to acknowledge the Core Facility for Bioinformatics and Expression Analysis at Karolinska Institute for the help with DNAm analysis of all samples collected in Sweden. We would like to acknowledge Nicholas Kiprotich and Mary Chepkwemoi at Moi University for their contribution in the coordination of the project and collection and processing of samples. We would like to acknowledge Clinical Genomics Linköping, Science for Life Laboratory, Linköping University, for DNAm analysis performed on samples collected in Peru and Kenya. The authors also acknowledge the contributions by Martina Sönnerbrandt, Department of Infectious Diseases, Linköping University Hospital, for help with contact tracing and sample collections during the study. The authors acknowledge the students Simona Lazarevic, John Berg, Gordon Spiegel, Danna Gutierrez, Sandra Dahling, and Remo Andersson for their work in the project in Peru, and Raynice Waker, Sam Widén, Frida Lindgärde, Anders Appeldahl, Malin Grönqvist, and Felicia Ollfors for their work in the project in Kenya.

Author contributions. L. K. coordinated the study in Sweden and designed and coordinated the studies in Kenya and Peru, optimized methods, prepared samples, did bioinformatic analysis, generated figures, and wrote the manuscript. I. Ö. included participants and prepared samples in Linköping, Sweden, and designed and coordinated the studies in Kenya and Peru. S. S. designed the bioinformatic analysis. D. M.-E. and M. G. performed MODifieR analysis and performed machine learning to develop the TB classifier. P. E. was responsible for the inclusion of participants and sample preparation in Lima, Peru. M. M.-A. was responsible for the coordination and supervision of laboratory activities. C. U.-G. had medical responsibility and supervised the study in Lima. L. D. and R. T. had medical responsibility and supervised the study in Eldoret. J. P. had medical responsibility for the studies and designed the study. M. L. designed and funded the study and wrote the ethical application. All authors contributed to the manuscript.

Ethics approval. Ethical approval for the study in Sweden was obtained from the regional ethical review board in Linköping, No. 2016/237-31. In Kenya, ethical approval was obtained from Moi Teaching and Referral Hospital/Moi University Institutional Research and Ethics Committee, No. 0004260. In Peru, ethical approval was obtained from Universidad Peruana Cayetano Heredia Institutional Review Board, No. 209390. All participants provided signed informed consent.

Data availability. Participant-related data from this study are not available for sharing because Institutional Review Board rules currently limit the data release. Bioinformatic pipelines used to analyze the data and to generate graphs and figures will be available on the following GitHub account upon publication: https://github.com/Lerm-Lab/TB-BuccalSwabTB.

Financial support. This study was funded by the Heart and Lung Foundation (grant numbers 20180613 and 20220034 to M. L.) and by the Swedish Research Council (grant numbers 2018-02961 and 2018-04246 to M. L.).

References

1. World Health Organization. Global tuberculosis report. 2023. https://www.who.int/publications/i/item/9789240083851. Accessed 28 February 2024.
2. Simmons JD, Stein CM, Seshadri C, et al. Immunological mechanisms of human resistance to persistent Mycobacterium tuberculosis infection. Nat Rev Immunol 2018; 18:575–89.
3. World Health Organization. Implementing the end TB strategy: the essentials. 2015. https://iris.who.int/handle/10665/206499. Accessed 8 August 2023.
4. Pai M, Denkinger CM, Kik SV, et al. Gamma interferon release assays for detection of Mycobacterium tuberculosis infection. Clin Microbiol Rev 2014; 27:3–20.
5. Cobelens FG, Menzies D, Farhat M. False-positive tuberculin reactions due to non-tuberculous mycobacterial infections. Int J Tuberc Lung Dis 2007; 11:934–5; author reply 5.
6. Andersen P, Munk ME, Pollock JM, Doherty TM. Specific immune-based diagnosis of tuberculosis. Lancet 2000; 356:1099–104.
7. Steingart KR, Ng V, Henry M, et al. Sputum processing methods to improve the sensitivity of smear microscopy for tuberculosis: a systematic review. Lancet Infect Dis 2006; 6:664–74.
8. Tulsyan S, Aftab M, Sisodiya S, et al. Molecular basis of epigenetic regulation in cancer diagnosis and treatment. Front Genet 2022; 13:885635.
9. Mayo S, Benito-Leon J, Pena-Bautista C, Baquero M, Chafer-Pericas C. Recent evidence in epigenomics and proteomics biomarkers for early and minimally invasive diagnosis of Alzheimer's and Parkinson's diseases. Curr Neuropharmacol 2021; 19:1273–303.
10. Fischer MA, Vondriska TM. Clinical epigenomics for cardiovascular disease: diagnostics and therapies. J Mol Cell Cardiol 2021; 154:97–105.
11. Turner CT, Gupta RK, Tsaliki E, et al. Blood transcriptional biomarkers for active pulmonary tuberculosis in a high-burden setting: a prospective, observational, diagnostic accuracy study. Lancet Respir Med 2020; 8:407–19.
12. Chen H, Zhang JY, Hu XJ, et al. Methylation analysis and validation of whole genome DNA in active tuberculosis [in Chinese]. Sichuan Da Xue Xue Bao Yi Xue Ban 2018; 49:731–6.
13. Karlsson L, Das J, Nilsson M, et al. A differential DNA methylome signature of pulmonary immune cells from individuals converting to latent tuberculosis infection. Sci Rep 2021; 11:19418.
14. Pehrson I, Sayyab S, Das J, et al. The spectrum of tuberculosis described as differential DNA methylation patterns in alveolar macrophages and alveolar T cells. Clin Epigenetics 2022; 14:175.
15. Lyu M, Zhou J, Jiao L, et al. Deciphering a TB-related DNA methylation biomarker and constructing a TB diagnostic classifier. Mol Ther Nucleic Acids 2022; 27:37–49.
16. Chen YC, Hsiao CC, Chen TW, et al. Whole genome DNA methylation analysis of active pulmonary tuberculosis disease identifies novel epigenotypes: PARP9/miR-505/RASGRP4/GNG12 gene methylation and clinical phenotypes. Int J Mol Sci 2020; 21:3180.
17. Maruthai K, Kalaiarasan E, Joseph NM, Parija SC, Mahadevan S. Assessment of global DNA methylation in children with tuberculosis disease. Int J Mycobacteriol 2018; 7:338–42.
18. Du Y, Gao X, Yan J, et al. Relationship between DNA methylation profiles and active tuberculosis development from latent infection: a pilot study in nested case-control design. Microbiol Spectr 2022; 10:e0058622.
19. Senel S. An overview of physical, microbiological and immune barriers of oral mucosa. Int J Mol Sci 2021; 22:7821.
20. Khambati N, Olbrich L, Ellner J, Salgame P, Song R, Bijker EM. Host-based biomarkers in saliva for the diagnosis of pulmonary tuberculosis in children: a mini-review. Front Pediatr 2021; 9:756043.
21. Jain P, Jain I. Oral manifestations of tuberculosis: step towards early diagnosis. J Clin Diagn Res 2014; 8:ZE18–21.
22. Wood RC, Luabeya AK, Weigel KM, et al. Detection of Mycobacterium tuberculosis DNA on the oral mucosa of tuberculosis patients. Sci Rep 2015; 5:8668.
23. Mesman AW, Calderon RI, Pollock NR, et al. Molecular detection of Mycobacterium tuberculosis from buccal swabs among adult in Peru. Sci Rep 2020; 10:22231.
24. Flores JA, Calderon R, Mesman AW, et al. Detection of Mycobacterium tuberculosis DNA in buccal swab samples from children in Lima, Peru. Pediatr Infect Dis J 2020; 39:e376–80.
25. LaCourse SM, Seko E, Wood R, et al. Diagnostic performance of oral swabs for non-sputum based TB diagnosis in a TB/HIV endemic setting. PLoS One 2022; 17:e0262123.
26. Feleke BE, Feleke TE, Biadglegne F. Nutritional status of tuberculosis patients, a comparative cross-sectional study. BMC Pulm Med 2019; 19:182.
27. Zheng SC, Webster AP, Dong D, et al. A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix. Epigenomics 2018; 10:925–40.
28. van Dongen J, Ehli EA, Jansen R, et al. Genome-wide analysis of DNA methylation in buccal cells: a study of monozygotic twins and mQTLs. Epigenetics Chromatin 2018; 11:54.
29. Theda C, Hwang SH, Czajko A, Loke YJ, Leong P, Craig JM. Quantitation of the cellular content of saliva and buccal swab samples. Sci Rep 2018; 8:6944.
30. de Weerd HA, Badam TVS, Martinez-Enguita D, et al. MODifieR: an ensemble R package for inference of disease modules from transcriptomics networks. Bioinformatics 2020; 36:3918–9.
31. Engel N, Mwaura M. User perspectives on LF-LAM for the diagnosis of active tuberculosis: results from qualitative research. Lateral flow urine lipoarabinomannan assay (LF-LAM) for the diagnosis of active tuberculosis in people living with HIV: policy update. Geneva, Switzerland: World Health Organization, 2019.
32. Gosselt HR, Griffioen PH, van Zelst BD, Oosterom N, de Jonge R, Heil SG. Global DNA (hydroxy)methylation is stable over time under several storage conditions and temperatures. Epigenetics 2021; 16:45–53.
33. Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol 2014; 15:R31.
34. Jessen WJ, Borgerding MF, Prasad GL. Global methylation profiles in buccal cells of long-term smokers and moist snuff consumers. Biomarkers 2018; 23:625–39.
35. McEwen LM, O’Donnell KJ, McGill MG, et al. The PedBE clock accurately estimates DNA methylation age in pediatric buccal cells. Proc Natl Acad Sci U S A 2020; 117:23329–35.
36. Nazzari S, Grumi S, Mambretti F, et al. Maternal and infant NR3C1 and SLC6A4 epigenetic signatures of the COVID-19 pandemic lockdown: when timing matters. Transl Psychiatry 2022; 12:386.
37. Hill RA, Gibbons A, Han U, et al. Maternal SARS-CoV-2 exposure alters infant DNA methylation. Brain Behav Immun Health 2023; 27:100572.
38. Fraser HB, Lam LL, Neumann SM, Kobor MS. Population-specificity of human DNA methylation. Genome Biol 2012; 13:R8.
39. Liu J, Hutchison K, Perrone-Bizzozero N, Morgan M, Sui J, Calhoun V. Identification of genetic and epigenetic marks involved in population structure. PLoS One 2010; 5:e13209.
40. Lönnroth K, Mor Z, Arkens C, et al. Tuberculosis in migrants in low-incidence countries: epidemiology and intervention entry points. Int J Tuberc Lung Dis 2017; 21:624–37.
41. Moore LD, Le T, Fan G. DNA methylation and its basic function. Neuropsychopharmacology 2013; 38:23–38.
42. Blumenthal A, Ehlers S, Lauber H, et al. The Wingless homolog WNT5A and its receptor Frizzled-5 regulate inflammatory responses of human mononuclear cells induced by microbial stimulation. Blood 2006; 108:965–73.
43. Rajaram MVS, Arnett E, Azad AK, et al. M. tuberculosis-initiated human mannose receptor signaling regulates macrophage recognition and vesicle trafficking by FcRgamma-Chain, Grb2, and SHP-1. Cell Rep 2017; 21:126–40.
44. Hwang IK, Paik SS, Lee SH. Impact of pulmonary tuberculosis on the EGFR mutational status and clinical outcome in patients with lung adenocarcinoma. Cancer Res Treat 2019; 51:158–68.
45. Mamishi S, Pourakbari B, Sadeghi RH, Marjani M, Mahmoud S. Differential gene expression of ASUN, NEMF, PTPRC and DHX29: candidate biomarkers for the diagnosis of active and latent tuberculosis. Infect Disord Drug Targets 2021; 21:268–73.
46. Hernández-Pando R, Jeyamathan M, Mengistu G, et al. Persistence of DNA from Mycobacterium tuberculosis in superficially normal lung tissue during latent infection. Lancet 2000; 356:2133–8.
47. Eum SY, Kong JH, Hong MS, et al. Neutrophils are the predominant infected phagocytic cells in the airways of patients with active pulmonary TB. Chest 2010; 137:122–8.
48. Ryndak MB, Laal S. Mycobacterium tuberculosis primary infection and dissemination: a critical role for alveolar epithelial cells. Front Cell Infect Microbiol 2019; 9:299.
49. Pham TH, Monack DM. Turning foes into permissive hosts: manipulation of macrophage polarization by intracellular bacteria. Curr Opin Immunol 2023; 84:102367.

Author notes

Presented in part: Epigenomics of Common Diseases Conference, Wellcome Genome Campus, UK, 15–17 November 2023.

Potential conflicts of interest. M. L. and M. G. are founders of PredictME AB. S. S. and D. M.-E. are bioinformaticians at PredictME. All other authors report no potential conflicts of interest.

All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact journals.permissions@oup.com.
