Integrative transcriptomic meta-analysis reveals conserved transcriptional signatures and predictive biomarkers for active tuberculosis: a pathway-based machine learning approach
Li T, Liu H, Lei Q, You Z
Frontiers in microbiology · 2026-04
Abstract
Background: Tuberculosis (TB) caused 1.23 million deaths in 2024, and accurate diagnosis is hampered by population heterogeneity and limited biomarker generalizability. We developed an integrative framework combining multi-cohort transcriptomics and machine learning to identify host-derived transcriptional signatures of active TB.
Methods: Five transcriptomic datasets (GSE83456, GSE107995, GSE158802, GSE19435, GSE25534) comprising 529 samples were analyzed. After standardized preprocessing, we performed differential expression analysis, inverse variance-weighted meta-analysis, and single-sample gene set enrichment analysis (ssGSEA) for three KEGG pathways. Machine learning classifiers were developed using logistic regression with SHapley Additive exPlanations (SHAP)-based interpretability.
Results: Meta-analysis identified 108 core differentially expressed genes (80 upregulated, 28 downregulated) conserved across all cohorts. Upregulated genes showed significant enrichment in interferon signaling, antigen presentation, and chemokine activity. Pathway analysis revealed modest downregulation in NF-κB signaling (fold-change: -0.023, p = 0.02), antigen presentation (fold-change: -0.026, p = 0.08), and the tuberculosis pathway (fold-change: -0.023, p = 0.05). Machine learning classifiers achieved cross-validated AUCs of 0.85-0.94 (mean: 0.89 ± 0.04), with balanced sensitivity (82-91%) and specificity (85-93%). SHAP analysis identified interferon-stimulated genes (STAT1, IFITM1), chemokines (CXCL10, CXCL9), and MHC class II molecules (HLA-DRA) as top predictive features, underscoring the biological relevance of the human host response to Mycobacterium tuberculosis.
Conclusion: Our integrative framework identifies a conserved 347-gene transcriptional signature and three key immune pathways that transcend population and technical heterogeneity. The high diagnostic accuracy and biologically interpretable feature sets provide validated biomarkers for TB diagnosis and support clinical translation toward precision medicine approaches in global TB control.
Clinical trial registration: https://www.chictr.org.cn/, identifier ChiCTR2300074328.
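The inverse variance-weighted meta-analysis named in the Methods can be illustrated with a minimal sketch. The abstract does not say whether a fixed-effect or random-effects model was used, so the fixed-effect form below is an assumption. The function name `ivw_fixed_effect`, the `PooledEffect` type, and the example numbers are hypothetical and are not taken from the study.

```python
import math
from typing import NamedTuple, Sequence


class PooledEffect(NamedTuple):
    estimate: float   # pooled effect size (e.g. log2 fold change)
    std_error: float  # standard error of the pooled estimate
    z: float          # z statistic of the pooled estimate
    p_value: float    # two-sided p-value under a normal approximation


def ivw_fixed_effect(effects: Sequence[float],
                     std_errors: Sequence[float]) -> PooledEffect:
    """Pool per-cohort effects for one gene with inverse variance weights.

    Assumption: fixed-effect model, i.e. every cohort estimates the same
    underlying effect and differs only by sampling error.
    """
    if len(effects) == 0 or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and equal length")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Each cohort is weighted by the inverse of its variance (1 / SE^2),
    # so more precise cohorts contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    estimate = sum(w * b for w, b in zip(weights, effects)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    z = estimate / std_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return PooledEffect(estimate, std_error, z, p_value)


if __name__ == "__main__":
    # Illustrative (made-up) log2 fold changes and standard errors for a
    # single gene across five cohorts; these are not values from the paper.
    log2_fc = [1.10, 0.95, 1.30, 0.80, 1.05]
    se = [0.20, 0.15, 0.35, 0.40, 0.25]
    print(ivw_fixed_effect(log2_fc, se))
```

In a multi-cohort setting this pooling would be repeated per gene, followed by multiple-testing correction. A random-effects variant would add a between-cohort variance term to each weight's denominator.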