TB Research

Clinical data analysis research on tuberculosis based on machine learning.

Rongrong Kang, Huanqing Liu, Qian Lei, Tingting Li

Frontiers in Medicine · 2026-01

Abstract

BACKGROUND: Tuberculosis (TB) remains a global health challenge, with heterogeneous treatment outcomes despite standardized protocols. Traditional statistical models struggle with high-dimensional clinical data, necessitating advanced machine learning (ML) approaches.

OBJECTIVE: To analyze clinical data from 467 pulmonary TB patients and construct a predictive model using multiple ML algorithms.

METHODS: A prospective cohort of 467 patients (218 intervention, 249 control) was enrolled at Xi'an Chest Hospital. Clinically derived ratio features (ALT/AST, CD4/CD8) and polynomial interaction terms (e.g., RBC × ALT) were constructed, and recursive feature elimination (RFE) selected 60 predictive factors from the resulting 80-dimensional feature space. Fourteen ML algorithms were systematically compared, with hyperparameters tuned by grid search. Performance was evaluated with five-fold cross-validation using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE).
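The ratio and interaction features described in the methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the column names (`ALT`, `AST`, `CD4`, `CD8`, `RBC`) come from the abstract, but the dictionary record layout, the helper names `add_ratio_features` and `add_interaction_terms` (hypothetical), the NaN-on-zero-denominator rule, and the choice to form every pairwise product are all assumptions.

```python
from itertools import combinations


def add_ratio_features(record: dict[str, float]) -> dict[str, float]:
    """Return a copy of `record` with ALT/AST and CD4/CD8 ratios added.

    Assumption: a missing numerator, or a zero or missing denominator,
    yields NaN instead of raising an error.
    """
    out = dict(record)
    for num, den in (("ALT", "AST"), ("CD4", "CD8")):
        n, d = record.get(num), record.get(den)
        out[f"{num}/{den}"] = n / d if n is not None and d else float("nan")
    return out


def add_interaction_terms(record: dict[str, float], columns: list[str]) -> dict[str, float]:
    """Append degree-2 cross-products (a × b) for every pair in `columns`.

    Assumption: squared terms are omitted; only pairwise products are added.
    """
    out = dict(record)
    for a, b in combinations(columns, 2):
        out[f"{a} × {b}"] = record[a] * record[b]
    return out


# Hypothetical patient record (illustrative values, not study data)
patient = {"ALT": 40.0, "AST": 32.0, "CD4": 500.0, "CD8": 250.0, "RBC": 4.5}
features = add_interaction_terms(add_ratio_features(patient), ["RBC", "ALT", "AST"])
print(features["ALT/AST"])    # 1.25
print(features["CD4/CD8"])    # 2.0
print(features["RBC × ALT"])  # 180.0
print(len(features))          # 5 original + 2 ratios + 3 interactions = 10
```

The expanded records would then feed a selection step. The abstract reports RFE reducing 80 features to 60; scikit-learn's `RFE` class with `n_features_to_select=60` is one off-the-shelf option, although the authors' exact tooling is not stated.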

RESULTS: LightGBM achieved the highest initial predictive performance (R² = 0.1829, RMSE = 139.23). After hyperparameter optimization, Random Forest attained a marginally higher R² of 0.1867 with comparable error metrics and greater clinical interpretability, and was adopted as the final reference model. Feature engineering expanded the feature space from 33 to 80 features, of which 60 were retained.
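To make the reported figures concrete, the three metrics can be computed as below. The sketch assumes the standard definitions (the abstract does not spell them out): R² = 1 - SS_res / SS_tot, RMSE = square root of the mean squared residual, and MAE = mean absolute residual. Under that definition, an R² of 0.1867 means the model's residual sum of squares is roughly 81% of the total variation around the mean. The function name `regression_metrics` is hypothetical, and the toy values are invented for illustration, not study data.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Compute R², RMSE and MAE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    mean = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)         # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Toy example (invented numbers, not study data)
m = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print({k: round(v, 3) for k, v in m.items()})  # {'r2': 0.948, 'rmse': 2.55, 'mae': 2.5}
```

Under five-fold cross-validation these metrics would typically be computed on each held-out fold and then averaged. The abstract does not state whether the study averaged per-fold scores or pooled out-of-fold predictions.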

CONCLUSION: The optimized Random Forest model (R² = 0.1867) demonstrates moderate accuracy and clinical interpretability, supporting its potential as a decision-support tool for TB treatment optimization. Pharmacist-led therapeutic drug monitoring (TDM) further enhances individualized therapy. Future work requires multi-center validation and radiomics integration to improve predictive performance in severe cases.

CLINICAL TRIAL REGISTRATION: Registration Platform: Chinese Clinical Trial Registry [https://www.chictr.org.cn/], identifier [ChiCTR2300074328].