TB Research

Explainable machine learning for predicting clinical outcomes in HIV/TB co-infection: a comparative retrospective study.

Qingfeng Sun, Kai Zhang, Yuanlong Xu, Mengmei Luo, Zhouzhou Yang, Qianyu Liu, Sang Liu, Aimei Liu

BMC infectious diseases · 2025-11

Abstract

BACKGROUND: HIV/TB co-infection presents substantial public-health challenges, showing greater treatment-failure and mortality rates than tuberculosis alone. Recent advances in machine learning (ML) provide a robust means of identifying high-risk patients early in the disease course.

METHODS: This retrospective study enrolled 359 patients co-infected with HIV and TB at a single tertiary-care hospital. We extracted clinical and immunological data. The cohort was subsequently divided into training (0%) and test (0%) subsets, and class imbalance was addressed with the Synthetic Minority Over-sampling Technique (SMOTE). Six ML classifiers (Random Forest, XGBoost, LightGBM, Support Vector Machine, Extra Trees and CatBoost) were trained after grid-search hyper-parameter tuning. Model performance was assessed with the area under the receiver-operating-characteristic curve (AUC), accuracy, recall, precision, specificity and F1-score. Multi-criteria ranking was then conducted with the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). The leading model was interpreted using SHapley Additive exPlanations (SHAP).
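The TOPSIS ranking step described above can be illustrated with a minimal sketch. It assumes vector normalisation, equal criterion weights and benefit-type criteria only (higher is better); the abstract does not state which weighting or normalisation the study used. The model names and metric values below are hypothetical placeholders, not results from the study.

```python
import math


def topsis(scores, weights=None):
    """Rank alternatives by relative closeness to the ideal solution.

    scores  : dict mapping alternative name -> list of criterion values,
              all treated as benefit criteria (assumption).
    weights : optional list of criterion weights; equal weights if omitted
              (assumption, not taken from the study).
    Returns a list of (name, closeness) tuples, best first.
    """
    names = list(scores)
    n_crit = len(scores[names[0]])
    if weights is None:
        weights = [1.0 / n_crit] * n_crit

    # Vector-normalise each criterion column, then apply the weights.
    norms = [
        math.sqrt(sum(scores[n][j] ** 2 for n in names)) or 1.0
        for j in range(n_crit)
    ]
    weighted = {
        n: [weights[j] * scores[n][j] / norms[j] for j in range(n_crit)]
        for n in names
    }

    # Ideal and anti-ideal points: best and worst value per criterion.
    ideal = [max(weighted[n][j] for n in names) for j in range(n_crit)]
    anti = [min(weighted[n][j] for n in names) for j in range(n_crit)]

    ranking = []
    for n in names:
        d_pos = math.dist(weighted[n], ideal)  # distance to ideal
        d_neg = math.dist(weighted[n], anti)   # distance to anti-ideal
        total = d_pos + d_neg
        ranking.append((n, d_neg / total if total else 0.0))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical placeholder metrics in the order [AUC, accuracy, F1].
example = {
    "model_a": [0.80, 0.85, 0.55],
    "model_b": [0.70, 0.80, 0.40],
    "model_c": [0.75, 0.82, 0.50],
}
ranking = topsis(example)
for name, closeness in ranking:
    print(f"{name}: {closeness:.3f}")
```

A closeness of 1 means the alternative coincides with the ideal point on every criterion, and a closeness of 0 means it coincides with the anti-ideal point. Cost-type criteria, if any were used, would need the ideal and anti-ideal roles swapped for those columns.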

RESULTS: Overall, 304 of 359 patients (84.7%) had favourable outcomes, whereas 55 (15.3%) had unfavourable outcomes. LightGBM achieved the best overall performance (AUC = 0.771; accuracy = 84.72%; F1 = 0.522) and was ranked first by TOPSIS. SHAP analysis highlighted age, CD4 and CD8 counts, body-mass index and occupation as key predictors. Lower BMI, pronounced immunosuppression and older age were strongly associated with unfavourable outcomes, findings that align with established clinical evidence.
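SHAP attributions of the kind summarised above are built on Shapley values. The following toy sketch computes exact Shapley values by brute force for one hypothetical patient. The linear risk function, its coefficients, the patient values and the baseline are all invented for illustration and are not taken from the study; in practice tree models such as LightGBM are usually explained with the faster tree-specific algorithm in the `shap` package rather than by enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features, so this
    is only practical for small toy examples.
    """
    features = list(x)
    n = len(features)

    def value(coalition):
        point = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return predict(point)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(p):
    # Hypothetical risk score: rises with age, falls with CD4 count and BMI.
    return 0.02 * p["age"] - 0.001 * p["cd4"] - 0.03 * p["bmi"]


patient = {"age": 60, "cd4": 100, "bmi": 17}    # hypothetical patient
reference = {"age": 40, "cd4": 400, "bmi": 22}  # hypothetical baseline

phi = shapley_values(toy_risk, patient, reference)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes these attributions easy to read per feature.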

CONCLUSION: A gradient-boosted model (LightGBM) combined with SHAP interpretation demonstrated reliable predictive performance in HIV/TB co-infection and highlighted clinically actionable risk factors. Incorporating this tool into routine workflows could enable healthcare providers to identify high-risk individuals earlier, allocate resources more efficiently and, ultimately, improve TB-treatment success.

CLINICAL TRIAL REGISTRATION: Not applicable.

MeSH terms

  • Humans
  • Retrospective Studies
  • HIV Infections
  • Machine Learning
  • Female
  • Coinfection
  • Male
  • Adult
  • Tuberculosis
  • Middle Aged
  • ROC Curve
  • Treatment Outcome