TB Research

Temporal Knowledge Discovery in Drug-Resistant Tuberculosis: A Decade-Long Machine Learning Analysis From Egyptian Clinical Data.

Hanan Elsayed, Fayek Elkhwsky, Wagdy Amin, Ibrahim Abdelbaky

International Journal of Medical Informatics · 2026-07

Abstract

BACKGROUND: The worldwide persistence of tuberculosis (TB) and the growing threat of multidrug-resistant tuberculosis (MDR-TB) call for advanced predictive tools. Traditional diagnostic techniques are often slow and of limited precision. This study uses machine learning (ML) to identify clinical and longitudinal treatment-history risk factors for drug-resistant tuberculosis (DR-TB) in Egypt using a ten-year dataset (2012-2022).

METHODS: A retrospective case-control study was conducted at three major respiratory hospitals in Egypt. Demographics, clinical history, and lifestyle factors were examined in 1,462 patients (677 DR-TB cases and 785 drug-susceptible TB (DS-TB) controls). Drug resistance was ascertained through phenotypic drug susceptibility testing (pDST) and, where applicable, rapid molecular assays including Xpert MTB/RIF. SHapley Additive exPlanations (SHAP) were used for feature selection. Four machine learning models (Random Forest, XGBoost, k-nearest neighbors (KNN), and Neural Networks) were built and validated using stratified five-fold cross-validation on the training set, combined with evaluation on an independent held-out test set (20% of the total dataset).
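The validation scheme described in the methods (a stratified 20% hold-out plus stratified five-fold cross-validation on the remaining training data) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data sized to match the cohort (1,462 rows), and the feature count, random seeds, and Random Forest hyperparameters are arbitrary placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in data (assumption): 1,462 rows to mirror the cohort size.
# The features are random numbers, not clinical variables.
X, y = make_classification(
    n_samples=1462, n_features=12, n_informative=6, random_state=0
)

# Hold out 20% as an independent test set, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)

# Stratified five-fold cross-validation on the training set only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Refit on the full training set, then score once on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(f"CV accuracy: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}")
print(f"Held-out test accuracy: {test_accuracy:.4f}")
```

Keeping the test set out of the cross-validation loop means the held-out score is not influenced by any model selection done on the folds.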

RESULTS: The Random Forest model performed best, with 94.54% accuracy, 95.54% recall, and a ROC-AUC of 94.46% (cross-validation mean accuracy: 92.64% ± 0.50%). XGBoost followed closely with an accuracy of 93.17% (cross-validation mean accuracy: 92.90% ± 1.98%). Longitudinal treatment-history variables (prior first-line drug use, patient category, and number of previous treatment episodes), geographic region (Governorate), and radiological infiltrations were the most influential predictors.
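For readers less familiar with the three metrics reported here, the sketch below computes accuracy, recall, and ROC-AUC from scratch on a toy example. The labels and scores are invented for illustration and are not study data; the 0.5 decision threshold and the convention that 1 denotes the positive (DR-TB) class are also assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model correctly flags."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def roc_auc(y_true, y_score, positive=1):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in y_score]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))  # 0.75
print(recall(y_true, y_pred))    # 0.75
print(roc_auc(y_true, y_score))  # 0.8125
```

Recall matters most when missing a resistant case is costlier than a false alarm, whereas ROC-AUC summarizes ranking quality across all thresholds rather than at a single cut-off.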

CONCLUSION: Using longitudinal clinical data, ML models showed high efficacy in differentiating DR-TB from DS-TB. Integrating treatment-history features and geographic risk factors with AI provides a strong framework for early DR-TB prediction, which may optimize treatment initiation and resource allocation in high-burden settings.

MeSH terms

  • Humans
  • Machine Learning
  • Tuberculosis, Multidrug-Resistant
  • Egypt
  • Male
  • Female
  • Retrospective Studies
  • Adult
  • Case-Control Studies
  • Middle Aged
  • Risk Factors
  • Antitubercular Agents