Data-driven model analysis of the impact of environmental and socioeconomic factors on tuberculosis incidence
Tao Y, Zhao J, Cui H, Liang Z, Li J, Ren J, Zhu H
Infectious Disease Modelling · 2026-02
Abstract
Tuberculosis (TB), a global infectious disease, poses a formidable challenge to Taiwan, China, exacerbated by its aging population and the importation of pathogens from high-risk areas of Southeast Asia. In this study, we analyzed data from 19 cities and counties in Taiwan, China from 2014 to 2022, using four machine learning (ML) and four deep learning (DL) models to forecast monthly TB incidence from 12 drivers. The CatBoost, random forest, and gradient boosting models performed best. By combining these models with post-hoc explainable ML techniques, we consistently identified population size, sulfur dioxide level, physician count, normalized difference vegetation index, wind velocity, and precipitation level as the most influential drivers of TB incidence. We also revealed nonlinear interactions and threshold effects between these determinants and TB incidence. We further employed stepwise regression and statistical assessments to identify a model configuration that minimizes the number of required drivers while maintaining high predictive accuracy. The framework and findings of this study offer robust data support and a decision-making basis for TB mitigation initiatives on a global scale.
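The driver-reduction step can be illustrated with a minimal sketch of forward stepwise selection. This is not the authors' procedure, whose selection criterion and tests are not given in the abstract. It assumes ordinary least squares, a hypothetical stopping rule (`min_gain`, the smallest gain in R^2 that justifies adding a driver), and purely synthetic data. The column names borrow three drivers named above plus a hypothetical `noise` column.

```python
import random

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(rows, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def forward_stepwise(rows, y, names, min_gain=0.01):
    """Greedily add the driver that lowers RSS most; stop when the R^2 gain is below min_gain."""
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)
    chosen, remaining = [], list(range(len(names)))
    current = tss  # RSS of the intercept-only model
    while remaining:
        best_rss, best_col = min((rss(rows, y, chosen + [c]), c) for c in remaining)
        if (current - best_rss) / tss < min_gain:
            break
        chosen.append(best_col)
        remaining.remove(best_col)
        current = best_rss
    return [names[c] for c in chosen]

# Synthetic data (an assumption, not the study's data): only the first
# two columns actually influence the response.
random.seed(0)
names = ["population", "so2", "wind", "noise"]
rows = [[random.gauss(0, 1) for _ in names] for _ in range(200)]
y = [2.0 * r[0] - 1.5 * r[1] + random.gauss(0, 0.1) for r in rows]

selected = forward_stepwise(rows, y, names)
print(selected)
```

A threshold on the R^2 gain is only one possible stopping rule. Information criteria or significance tests would serve the same purpose of trading a small loss in accuracy for fewer required drivers.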