TB Research

Explainable prediction of MDR/RR-TB in tuberculosis-diabetes mellitus multimorbidity: a machine learning model developed and validated in a dual-center study.

Xinxin Zhong, Kuan Liu, Jiujin Zhang, Xinhua Tang, Tao Lu, Qiqi Chen, Jianzhi Pang, Rongjun Chen, et al. (11 authors)

BMC Infectious Diseases · 2026-04

Abstract

BACKGROUND: Tuberculosis-diabetes mellitus (TB-DM) multimorbidity significantly increases the risk of multidrug-resistant/rifampicin-resistant tuberculosis (MDR/RR-TB). Early risk stratification tools for this high-risk population remain lacking.

OBJECTIVE: To develop and validate an interpretable machine learning (ML) model for predicting MDR/RR-TB in patients with TB-DM multimorbidity, and to identify key predictive factors using explainable artificial intelligence.

METHODS: This dual-center retrospective study enrolled 245 patients with TB-DM multimorbidity between January 2019 and December 2022. Models based on seven machine learning algorithms were developed and validated using 10-fold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, precision, recall, F1-score, calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) was applied to identify critical predictive factors.
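The evaluation steps named in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. The abstract says only "10-fold cross-validation", so the stratified fold assignment is an assumption. The 0.5 probability cut-off for the threshold metrics is also an assumption, and the toy labels and probabilities are hypothetical. Model fitting is omitted: in practice, each fold's held-out probabilities would come from a model trained on the other nine folds. AUC is computed as the probability that a random positive outranks a random negative. Net benefit follows the standard decision-curve definition, TP/n − (FP/n) · p_t/(1 − p_t).

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_kfold_indices(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that roughly preserve the class ratio.

    Assumption: stratification is illustrative; the abstract does not specify it.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal this class's shuffled indices round-robin across the folds.
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), treating 1 as MDR/RR-TB."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and false-negative rate at one cut-off (0.5 is assumed)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fnr = fn / (tp + fn) if tp + fn else 0.0  # equals 1 - recall
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "fnr": fnr}


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative), counting ties as 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt: TP/n - FP/n * pt/(1 - pt)."""
    y_pred = [1 if p >= pt else 0 for p in y_prob]
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    n = len(y_true)
    return tp / n - (fp / n) * pt / (1 - pt)


if __name__ == "__main__":
    # Hypothetical out-of-fold predictions (not study data).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
    print(stratified_kfold_indices(y_true, k=2))
    print(roc_auc(y_true, y_prob))              # 0.9375
    print(threshold_metrics(y_true, y_prob))    # accuracy/precision/recall/f1 = 0.75, fnr = 0.25
    print(net_benefit(y_true, y_prob, pt=0.5))  # 0.25
```

A decision curve like the one in the study would be traced by evaluating `net_benefit` over a grid of `pt` values. That curve is then compared against the "treat all" and "treat none" strategies.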

RESULTS: The random forest (RF) model performed best, with an AUC-ROC of 0.818, accuracy of 0.806, precision of 0.688, recall of 0.611, and F1-score of 0.647. The moderate recall implies a considerable false-negative rate (FNR = 1 − recall ≈ 0.389), which supports using the model as a triage tool rather than a stand-alone diagnostic test. Calibration and DCA confirmed robust predictive reliability and a substantial clinical net benefit across a clinically relevant range of threshold probabilities (0.06-0.80). SHAP analysis identified the symptom-to-diagnosis interval, tuberculosis (TB) treatment history, treatment adherence, pulmonary cavitation, and smoking history as the top five predictors.
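SHAP explains a prediction with Shapley values. Each feature receives its weighted average marginal contribution over all coalitions of the other features, and the contributions sum to the prediction minus a reference value. A global ranking such as the top-five list is commonly obtained from the mean absolute SHAP value per feature. The sketch below computes exact Shapley values by brute-force enumeration for a hypothetical five-feature score whose inputs are named after the reported predictors. The score function, its coefficients, the feature coding (for example, `poor_adherence` as a 0/1 flag), the toy patients, and the single all-zero baseline patient are all assumptions for illustration. This is not the published RF model. For tree ensembles, SHAP implementations typically use the tree-specific TreeSHAP algorithm instead of enumeration, because enumeration is exponential in the number of features. They also usually average over a background dataset rather than a single baseline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Sequence

# Hypothetical feature order, named after the predictors reported in the abstract.
FEATURES = ["symptom_to_diagnosis_interval", "tb_treatment_history",
            "poor_adherence", "pulmonary_cavitation", "smoking_history"]


def toy_risk_score(x: Sequence[float]) -> float:
    """Hypothetical risk score (NOT the published RF): additive terms plus one interaction."""
    interval, history, adherence, cavity, smoking = x
    return (0.04 * interval + 0.3 * history + 0.2 * adherence
            + 0.15 * cavity + 0.1 * smoking + 0.1 * history * adherence)


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; features outside a coalition are held at their baseline value."""
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for every coalition S of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(m)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def mean_abs_shap(model: Callable[[Sequence[float]], float],
                  X: Sequence[Sequence[float]], baseline: Sequence[float]) -> Dict[str, float]:
    """Global importance: mean absolute Shapley value of each feature across patients."""
    totals = [0.0] * len(baseline)
    for x in X:
        for i, value in enumerate(exact_shapley(model, x, baseline)):
            totals[i] += abs(value)
    return {name: total / len(X) for name, total in zip(FEATURES, totals)}


if __name__ == "__main__":
    baseline = [0.0] * 5                                             # hypothetical reference patient
    patients = [[6, 1, 1, 1, 0], [2, 0, 0, 1, 1], [10, 1, 0, 0, 1]]  # hypothetical patients
    print([round(v, 3) for v in exact_shapley(toy_risk_score, patients[0], baseline)])
    importance = mean_abs_shap(toy_risk_score, patients, baseline)
    for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

For the first toy patient, each additive term contributes coefficient × value. The history × adherence interaction is split equally between those two features, so the Shapley values sum to f(x) − f(baseline).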

CONCLUSION: The interpretable RF model accurately and reliably predicts the risk of MDR/RR-TB in patients with TB-DM multimorbidity, and the symptom-to-diagnosis interval was its most influential predictor. The model can support clinical triage, early intervention, and personalized management.