Construction and Validation of a Predictive Model for the Risk of Anti-Tuberculosis Drug-Induced Liver Injury Based on Machine Learning Algorithms
Cheng J, Chen R, Pan H, Lu L, Zhang M, He X, Yi H, Tang S
Journal of clinical pharmacology · 2025-11
Abstract
Anti-tuberculosis drug-induced liver injury (ATLI) is the most harmful adverse reaction to anti-tuberculosis (TB) treatment. This study aimed to construct and validate a binary ATLI risk prediction model from clinical data using seven machine learning (ML) algorithms: logistic regression, decision tree, support vector machine, random forest, gradient boosting decision tree, extreme gradient boosting, and light gradient boosting machine (LightGBM). A retrospective cohort of 2356 TB patients followed between January 2017 and December 2024 was used to develop and evaluate the prediction model. Random undersampling was performed to address class imbalance. Least absolute shrinkage and selection operator (LASSO) regression was used for feature selection and retained 27 of the 31 original features. Of the seven ML algorithms trained, the LightGBM model performed best in the testing set as judged by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity (AUC = 0.789, sensitivity = 0.734, and specificity = 0.706). The model exhibited optimal simplicity and stability when restricted to an 8-feature combination comprising baseline high-density lipoprotein cholesterol (HDLC), γ-glutamyl transpeptidase (GGT), triglycerides, total cholesterol (TCHOL), uric acid, total bilirubin (TBIL), globulin (GLB), and liver disease history (AUC = 0.764, sensitivity = 0.758, and specificity = 0.610), and Shapley additive explanations (SHAP) analysis also showed that these variables were the most influential contributors. The optimal model maintained robust predictive ability in the external validation cohort (AUC = 0.721, sensitivity = 0.828, and specificity = 0.604). Using the LightGBM model, this study determined that the combination of baseline HDLC, GGT, triglycerides, TCHOL, uric acid, TBIL, GLB, and liver disease history was an important predictor of ATLI, which could help clinicians identify ATLI early.
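The abstract names random undersampling and three evaluation metrics (AUC, sensitivity, specificity) without showing how they are computed. The following is a minimal, standard-library-only sketch of those two steps on made-up toy data. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions, not the authors' code, data, or settings.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return sorted row indices with the majority class cut to the minority size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row and an equally sized random draw of majority rows.
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = liver injury, 0 = no injury; scores are model outputs.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.05]

balanced_idx = random_undersample(toy_labels, seed=1)
sens, spec = sensitivity_specificity(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
```

Undersampling is normally applied to the training split only, so that test-set metrics reflect the true class ratio; whether the study did so is not stated in the abstract. Sensitivity and specificity depend on the chosen threshold, whereas AUC does not.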
MeSH terms
- Humans
- Antitubercular Agents
- Retrospective Studies
- Algorithms
- Adult
- Aged
- Middle Aged
- Female
- Male
- Machine Learning
- Chemical and Drug Induced Liver Injury