TB Research

Analysis of High-Risk Factors for Tuberculosis Retreatment Based on Machine Learning and Latent Class Analysis

Du X, Yimamu M, Na Y, Li X, Wang Z, Nuermaihaimaiti ZZ, Wang Y, Zhang L, et al. (9 authors)

DOAJ (Directory of Open Access Journals) · 2026-04

Abstract

Xilong Du,1 Maiwulajiang Yimamu,2 Yan Na,3 Xiaoxue Li,3 Ziyu Wang,3 Zulimire Z Nuermaihaimaiti,3 Yuxin Wang,3 Liping Zhang,3 Yanling Zheng3,4

1School of Public Health, Xinjiang Medical University, Urumqi, Xinjiang, People’s Republic of China; 2Tuberculosis and Leprosy Prevention and Control Department, Kashgar Prefecture Center for Disease Control and Prevention, Kashgar, Xinjiang, People’s Republic of China; 3College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, Xinjiang, People’s Republic of China; 4Institute of Medical Engineering Interdisciplinary Research, Xinjiang Medical University, Urumqi, Xinjiang, People’s Republic of China

Correspondence: Yanling Zheng, Email zhengyl_math@sina.cn

Objective: To identify high-risk factors for tuberculosis retreatment by integrating machine learning with latent class analysis, and to provide a scientific basis for developing targeted prevention and control strategies.

Methods: This retrospective study collected baseline and treatment-related data from 6,821 tuberculosis patients and applied machine learning and latent class analysis (LCA) to investigate the key factors associated with high-risk populations for retreatment.

Results: The XGBoost model achieved an overall accuracy of 84% and an area under the ROC curve (AUC) of 0.938. The analysis identified sputum examination results at month 6 or 8 of treatment, treatment regimen, and diagnostic classification as the most influential factors associated with retreatment. SHAP analysis further revealed that a sputum examination status of “not performed” was strongly linked to increased retreatment risk. Logistic regression confirmed this finding: at month 6 or 8, a status of “not performed” (OR = 123.47, P < 0.001) and a “positive” result (OR = 14.89, P = 0.02) were both significant risk factors. Latent class analysis stratified patients into four distinct subgroups, among which those characterized by comorbid diabetes or prior treatment failure constituted the highest-risk populations for retreatment.

Conclusion: It is recommended to improve treatment adherence and efficacy monitoring for newly diagnosed patients, strengthen whole-course supervision, and optimize management for elderly patients and those on long-term regimens.

Keywords: tuberculosis, latent class analysis, random forest, XGBoost, Cramér’s V
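The Results report odds ratios from logistic regression, and the keywords mention Cramér’s V. The abstract does not show how the authors computed either, so the following is only a minimal sketch of the standard definitions: an odds ratio is the exponential of a logistic regression coefficient, and Cramér’s V is derived from the chi-squared statistic of a contingency table. The function names are hypothetical, no bias correction is applied, and every row and column of the table is assumed to have a nonzero total.

```python
import math


def odds_ratio_from_coef(beta):
    """Convert a logistic regression coefficient (a log-odds) into an odds ratio."""
    return math.exp(beta)


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows.

    Uses the uncorrected Pearson chi-squared statistic. Assumes every row
    and column total is nonzero (otherwise an expected count would be zero).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by the sample size and the smaller table dimension minus one.
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))
```

Read this way, the reported OR of 123.47 corresponds to a coefficient of `math.log(123.47)` on the log-odds scale. A Cramér’s V of 0 indicates no association between two categorical variables, and a value of 1 indicates perfect association.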

MeSH terms

  • Tuberculosis
  • Artificial intelligence
  • Logistic regression
  • Latent class model
  • Machine learning
  • Medicine
  • Sputum
  • Class (philosophy)
  • Family medicine
  • Disease control
  • Disease
  • Latent tuberculosis
  • Public health
  • Environmental health
  • Tuberculosis control
  • Active tuberculosis