TB Research

Temporal Knowledge Discovery in Drug-Resistant Tuberculosis: A Decade-Long Machine Learning Analysis From Egyptian Clinical Data

Elsayed H, Elkhwsky F, Amin W, Abdelbaky I

International journal of medical informatics · 2026-04

Abstract

Background: Tuberculosis (TB) persists worldwide, and the growing threat of multidrug-resistant tuberculosis (MDR-TB) calls for advanced predictive tools. Traditional diagnostic techniques are frequently slow and limited in precision. This study uses machine learning (ML) to identify clinical and longitudinal treatment-history risk factors for drug-resistant tuberculosis (DR-TB) in Egypt using a ten-year dataset (2012-2022).

Methods: A retrospective case-control study was conducted at three major respiratory hospitals in Egypt. Demographics, clinical history, and lifestyle factors were examined in 1,462 patients: 677 DR-TB cases and 785 drug-susceptible tuberculosis (DS-TB) controls. Drug resistance was ascertained through phenotypic drug susceptibility testing (pDST) and, where applicable, rapid molecular assays including Xpert MTB/RIF. SHapley Additive exPlanations (SHAP) was used for feature selection. Four machine learning models (Random Forest, XGBoost, k-nearest neighbors (KNN), and Neural Networks) were built and validated using stratified five-fold cross-validation on the training set, combined with evaluation on an independent held-out test set (20% of the total dataset).

Results: The Random Forest model performed best, with 94.54% accuracy, 95.54% recall, and a ROC-AUC of 94.46% (cross-validation mean accuracy: 92.64% ± 0.50%). XGBoost followed closely, with an accuracy of 93.17% (cross-validation mean accuracy: 92.90% ± 1.98%). The most influential predictors were longitudinal treatment-history variables (prior first-line drug use, patient category, and number of previous treatment episodes), geographic region (Governorate), and radiological infiltrations.

Conclusion: Using longitudinal clinical data, ML models were highly effective at differentiating DR-TB from DS-TB. Integrating treatment-history features and geographic risk factors with AI provides a strong framework for early DR-TB prediction, which may optimize treatment initiation and resource allocation in high-burden settings.
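
The validation scheme described in the Methods (a 20% held-out test set plus stratified five-fold cross-validation on the remaining training data) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code. Only the class counts (677 DR-TB, 785 DS-TB), the 20% fraction, and the five folds come from the abstract. The helper names `stratified_holdout` and `stratified_kfold`, the random seed, and the assumption that the hold-out split was itself stratified are hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction, rng):
    """Split indices into (train, test) while preserving class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(indices, labels, k, rng):
    """Yield (fit, val) index lists; each class is dealt round-robin into k folds."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val


# Class counts from the abstract: 677 DR-TB cases (1), 785 DS-TB controls (0).
labels = [1] * 677 + [0] * 785
rng = random.Random(0)  # arbitrary seed, chosen only for reproducibility

# Hold out 20% of the data as the independent test set (assumed stratified).
train_idx, test_idx = stratified_holdout(labels, 0.20, rng)

# Stratified five-fold cross-validation on the training portion only.
folds = list(stratified_kfold(train_idx, labels, 5, rng))

print("train:", len(train_idx), "test:", len(test_idx))
for n, (fit, val) in enumerate(folds, start=1):
    dr_share = sum(labels[i] for i in val) / len(val)
    print(f"fold {n}: fit={len(fit)} val={len(val)} DR-TB share={dr_share:.3f}")
```

In practice a model would be fitted on each `fit` list and scored on the matching `val` list, with the test indices left untouched until the final evaluation. Keeping the test set out of the fold construction is what makes the reported test metrics independent of model selection.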

MeSH terms

  • Humans
  • Tuberculosis, Multidrug-Resistant
  • Antitubercular Agents
  • Risk Factors
  • Case-Control Studies
  • Retrospective Studies
  • Adult
  • Middle Aged
  • Egypt
  • Female
  • Male
  • Machine Learning