Machine-Learning-Derived, Mechanistically Informed Transcriptomic Signature to Diagnose Active Tuberculosis and Guide Host-Directed Therapy.
Asif Hassan Syed, Nashwan Alromema, Hatem A Almazarqi, Jasrah Irfan, Shakeel Ahmad, Altyeb A Taha, Alhuseen Omar Alsayed
Diagnostics (Basel, Switzerland) · 2026-02
Abstract
Differentiating active tuberculosis (TB) from latent TB infection (LTBI) remains an important diagnostic problem. Moreover, current biomarkers offer minimal insight into disease pathogenesis that could direct treatment. This motivated us to design a dual-purpose biomarker signature through a multicohort transcriptomic analysis and a stringent machine learning pipeline.

Active TB, latent TB, and healthy control samples were first passed through a stringent filter (ANOVA, p < 0.001), followed by feature selection with Boruta-XGBoost and LASSO regression.

This yielded a compact four-gene signature (,,, and) that was selectively and strongly upregulated in active TB (p < 0.001). A stacking ensemble classifier built on this signature (Random Forest and XGBoost) achieved very high diagnostic performance (ROC-AUC = 0.991; 95% CI: 0.983-0.997) in stratifying infection stages, and this performance was confirmed in an independent cohort (GSE19444). Importantly, functional pathway analysis showed that the genes map to core host pathways dysregulated in active TB: antigen presentation (), lipid trafficking (), interferon response (), and inflammasome signaling (). The signature therefore has a dual advantage: (1) a highly specific, non-sputum transcriptional diagnostic for active TB, and (2) a mechanistic map of key host pathways that identifies targets for intervention.

Thus, the signature serves two purposes: it is a biomarker panel aligned with WHO performance targets for TB triage, and it is a mechanistic framework for therapy, offering a practical route to translating transcriptomic discovery into clinical action against TB.
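
To make the reported metric concrete, the sketch below shows one common way to obtain a ROC-AUC with a 95% confidence interval: the AUC is computed as the probability that a randomly chosen positive sample outscores a randomly chosen negative one, and the interval comes from a percentile bootstrap. The abstract does not state how its interval was derived, so the bootstrap is an assumption. The function names `roc_auc` and `bootstrap_auc_ci`, the binary framing (active TB versus everything else), and the toy labels and scores are all hypothetical and are not data from the study.

```python
import random


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    point = roc_auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.choices(range(n), k=n)  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if 0 not in ys or 1 not in ys:
            continue  # skip resamples that lost one class entirely
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi


# Hypothetical classifier outputs: 1 = active TB, 0 = LTBI or healthy control.
toy_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.62, 0.55,
              0.60, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
auc, ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores)
print(f"ROC-AUC = {auc:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

With only a handful of samples the interval will be wide. A narrow interval such as the one reported requires a much larger evaluation set.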