Machine-Learning-Derived, Mechanistically Informed Transcriptomic Signature to Diagnose Active Tuberculosis and Guide Host-Directed Therapy
Asif Hassan Syed, Nashwan Alromema, Hatem A. Almazarqi, Jasrah Irfan, Shakeel Ahmad, Altyeb Taha, Alhuseen Omar Alsayed
Diagnostics · 2026-02
Abstract
Background/Objectives: An important diagnostic problem is to differentiate between active tuberculosis (TB) and latent TB infection (LTBI). Furthermore, the current biomarkers also offer minimal insight into disease pathogenesis to direct treatment. This triggered us to design a two-mode biomarker signature based on the multicohort analysis using a transcriptomic and stringent machine learning pipeline. Methods: When analyzing active TB, latent TB, and healthy control samples, a rigorous filter (ANOVA, p < 0.001) was used, followed by the selection of features with the help of Boruta-XGBoost and LASSO regression. This determined a small four-gene signature (TAP2, SORT1, WARS, and ANKRD22), which was selectively and highly upregulated in the active TB clinical state (p < 0.001). An ensemble staking classifier based on this signature (Random Forest and XGBoost) had a very high diagnostic performance (ROC-AUC = 0.991 (95% CI: 0.983–0.997)) in the stratification of infection phases, which was strongly confirmed in another cohort (GSE19444). Results: Importantly, the analysis of the functional pathways showed that all the genes are mapped to core dysregulated host pathways in active TB: antigen presentation (TAP2), lipid trafficking (SORT1), interferon response (WARS), and inflammasome signaling (ANKRD22). In such a way, the signature has a dual advantage: (1) high specificity, non-sputum transcriptional diagnostic of active TB, and (2) a mechanistic map of key host pathways, which describes targets of intervention. Conclusions: Thus, the signature provides a two-fold response: a biomarker panel aligned with WHO performance targets for TB triage and a mechanistic plan of therapy, which provides an easy way to implement transcriptomic discovery into clinical action against TB.
MeSH terms
- Active tuberculosis
- Computational biology
- Biomarker discovery
- Transcriptome
- Tuberculosis
- Lipoarabinomannan
- Biomarker
- Bioinformatics
- Medicine
- Disease
- Biology
- Classifier (UML)
- Signature (topology)
- Computer science
- Mycobacterium tuberculosis
- Feature selection