Clinical Data-Driven Prediction of Pulmonary Tuberculosis with Comorbidities Using Extreme Gradient Boosting (XGBoost)
Dhea Salsabila Semitra, Riska Yanu Fa’rifah, Dita Pramesti
Abstract
Pulmonary tuberculosis (PTB) remains a major global health challenge: Indonesia ranks second worldwide in TB burden, and over 90% of its cases are pulmonary. Traditional diagnostic methods such as sputum microscopy and culture remain dominant in resource-limited settings despite their low sensitivity and long turnaround times. This contributes to delayed or missed diagnoses, especially in patients with comorbidities such as HIV/AIDS, diabetes mellitus, and COPD. This study proposes a machine learning approach that uses the Extreme Gradient Boosting (XGBoost) algorithm to classify PTB cases from real-world clinical and laboratory data collected from Indonesian hospital records. The dataset includes symptomatic features and blood-based laboratory indicators. To ensure robustness, the model was tuned with randomized hyperparameter search and evaluated across multiple train-test split ratios. The best performance, an accuracy of 96.67% and an AUC of 97.04%, was achieved with a 90:10 train-test split. SHAP analysis was used to improve interpretability and identified erythrocyte sedimentation rate (ESR, recorded as LED in the Indonesian data), platelet count, age, and body weight as key predictors. The study highlights the potential of XGBoost to identify clinical risk factors for PTB with comorbidities from structured clinical data, offering a scalable decision-support tool aligned with national PTB control strategies in resource-constrained environments.
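The abstract reports accuracy and AUC over several train-test split ratios. As a minimal sketch of how those two metrics and a stratified split can be computed, the standard-library Python below may help. It is not the authors' pipeline: the helper names (`stratified_split`, `accuracy`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and no XGBoost model is trained here.

```python
import random


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), sampling each class separately so the
    test set keeps roughly the same class proportions as the full data."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded score matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a randomly chosen positive case is scored
    above a randomly chosen negative case (ties count as one half)."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = PTB, 0 = non-PTB) and model scores, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1, 0.95, 0.35]
toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)

# A 90:10 split over 100 hypothetical cases (60 positive, 40 negative).
cohort_labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_split(cohort_labels, test_fraction=0.10)
```

In a full workflow, the scores would come from a classifier fitted on the training indices only, and the same metric functions would be applied to each split ratio being compared.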
MeSH terms
- Gradient boosting
- Boosting (machine learning)
- Artificial intelligence
- Computer science
- Tuberculosis
- Medicine
- Machine learning