Clinical data analysis research on tuberculosis based on machine learning
Rongrong Kang, Huanqing Liu, Qian Lei, Tingting Li
Frontiers in Medicine · 2026-04
Abstract
Background: Tuberculosis (TB) remains a global health challenge, with heterogeneous treatment outcomes despite standardized protocols. Traditional statistical models struggle with high-dimensional clinical data, necessitating advanced machine learning (ML) approaches.
Objective: To analyze clinical data from 467 pulmonary TB patients and construct a predictive model using multiple ML algorithms.
Methods: A prospective cohort of 467 patients (218 intervention, 249 control) was enrolled from Xi'an Chest Hospital. Medical ratio features (ALT/AST, CD4/CD8) and polynomial interaction terms (e.g., RBC × ALT) were constructed. Recursive feature elimination (RFE) selected 60 predictive factors from an expanded 80-dimensional feature space. Fourteen ML algorithms were systematically compared, with hyperparameters optimized via grid search. Performance was assessed using five-fold cross-validation R², RMSE, and MAE.
Results: LightGBM achieved the highest initial predictive performance (R² = 0.1829, RMSE = 139.23). Following hyperparameter optimization, Random Forest attained a marginally improved R² of 0.1867 with comparable error metrics and enhanced clinical interpretability, serving as the final reference model. Feature engineering expanded the feature space from 33 to 80, with 60 optimal features retained.
Conclusion: The optimized Random Forest model (R² = 0.1867) demonstrates moderate accuracy and clinical interpretability, supporting its potential as a decision-support tool for TB treatment optimization. Pharmacist-led therapeutic drug monitoring (TDM) further enhances individualized therapy. Future work requires multi-center validation and radiomics integration to improve predictive performance in severe cases.
Clinical trial registration: Registration Platform: Chinese Clinical Trial Registry [https://www.chictr.org.cn/], identifier [ChiCTR2300074328].
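The workflow named in the Methods (ratio features, an interaction term, RFE, grid search, and five-fold cross-validated R², RMSE, and MAE) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code or data: the lab values and the target are synthetic, the variable names, feature counts, tree counts, and the `max_depth` grid are all assumptions chosen to keep the example small, and placing RFE inside the pipeline (so that selection is refit within each fold) is a design choice of this sketch rather than something the abstract states.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n = 200  # synthetic sample size (assumption; the study enrolled 467 patients)

# Synthetic stand-ins for laboratory values; NOT the study data.
alt = rng.uniform(10, 80, n)
ast = rng.uniform(10, 80, n)
cd4 = rng.uniform(200, 1200, n)
cd8 = rng.uniform(150, 900, n)
rbc = rng.uniform(3.5, 6.0, n)

# Feature engineering: raw values, two ratio features, one interaction term.
X = np.column_stack([
    alt, ast, cd4, cd8, rbc,
    alt / ast,   # ALT/AST ratio
    cd4 / cd8,   # CD4/CD8 ratio
    rbc * alt,   # RBC x ALT interaction
])

# Hypothetical continuous target, invented for illustration only.
y = 2.0 * (alt / ast) + 0.01 * (rbc * alt) + rng.normal(0.0, 1.0, n)

# RFE keeps 5 of the 8 features here (the paper keeps 60 of 80).
pipeline = Pipeline([
    ("select", RFE(RandomForestRegressor(n_estimators=30, random_state=0),
                   n_features_to_select=5)),
    ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Grid search scored with five-fold cross-validation on three metrics.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
grid = GridSearchCV(
    pipeline,
    param_grid={"model__max_depth": [3, None]},  # assumed toy grid
    scoring={"r2": "r2",
             "rmse": "neg_root_mean_squared_error",
             "mae": "neg_mean_absolute_error"},
    refit="r2",  # pick the configuration with the best mean R^2
    cv=cv,
)
grid.fit(X, y)

best = grid.best_index_
r2 = grid.cv_results_["mean_test_r2"][best]
rmse = -grid.cv_results_["mean_test_rmse"][best]  # scorers are negated errors
mae = -grid.cv_results_["mean_test_mae"][best]
n_selected = int(grid.best_estimator_.named_steps["select"].support_.sum())

print(f"best params: {grid.best_params_}")
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  features kept={n_selected}")
```

Because the error scorers in scikit-learn are negated so that larger is better, the sketch flips their sign before reporting RMSE and MAE. The numbers it prints describe the synthetic data only and say nothing about the reported R² of 0.1867.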
MeSH terms
- Random forest
- Machine learning
- Artificial intelligence
- Hyperparameter
- Feature selection
- Feature (linguistics)
- Support vector machine
- Hyperparameter optimization
- Clinical trial
- Feature engineering
- Computer science
- Mean squared error
- Medicine
- Statistical learning
- Ensemble learning
- Data mining
- Tuberculosis
- Big data
- Decision tree