Leveraging machine learning for accurate forecasting of pulmonary tuberculosis epidemics in a coastal city in China.
Jingjing Yang, Jieru Pan, Jianhui Chen, Youqiong Xu, Xiaoyang Zhang
Tropical medicine and health · 2026-03
Abstract
BACKGROUND: While pulmonary tuberculosis (PTB) remains a leading notifiable cause of death in China, city-level monthly forecasts with sufficient resolution to guide vaccine, drug and bed logistics are scarce, and no head-to-head comparison of classical time-series versus machine-learning strategies under identical epidemiological conditions has been published.
METHODS: Using 168 monthly PTB case reports from Fuzhou (January 2009-December 2022) and a 24-month prospective validation set (2023-2024), we developed, tuned and independently tested three forecasting frameworks: seasonal ARIMA with automatic order selection, Facebook Prophet with multiplicative seasonality and change-point detection, and extreme gradient boosting (XGBoost) fed with 1- to 12-month lagged incidence, calendar and linear-trend covariates. Hyper-parameters were optimized by grid search and early stopping; accuracy was quantified with MSE, RMSE and MAE, while residual diagnostics, stationarity and white-noise tests assessed model adequacy.
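The lagged-covariate design described for XGBoost can be illustrated with a minimal sketch. It assumes a plain list of monthly counts starting in January 2009; the function name `build_lag_features`, the column order and the synthetic placeholder series are hypothetical and are not taken from the study.

```python
def build_lag_features(counts, start_month=1, max_lag=12):
    """Turn a monthly count series into (features, target) rows.

    Each feature row holds lags 1..max_lag, the calendar month (1-12)
    and a linear trend index. The first max_lag months are dropped
    because their lag window is incomplete.
    """
    rows, targets = [], []
    for t in range(max_lag, len(counts)):
        lags = [counts[t - k] for k in range(1, max_lag + 1)]  # lag 1 first
        month = (start_month - 1 + t) % 12 + 1                 # calendar covariate
        rows.append(lags + [month, t])                         # t = linear trend
        targets.append(counts[t])
    return rows, targets


# Synthetic placeholder series (NOT the Fuzhou data): 168 months.
series = list(range(168))
X, y = build_lag_features(series)
```

With 168 training months and 12 lags, this layout leaves 156 usable rows of 14 features each, which could then be passed to a gradient-boosting regressor.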
RESULTS: All three algorithms fitted the training data closely (RMSE 25.11, 25.31 and 0.0258 cases; MAE ≤ 22 cases). On unseen data, however, XGBoost achieved substantially lower prediction errors (RMSE 9.80; MAE 2.93; MSE 96.10) than ARIMA (RMSE 60.43; MAE 50.28; MSE 3651.86) or Prophet (RMSE 64.74; MAE 54.49; MSE 4191.86), correctly anticipating the observed 5.7% annual decline and the progressively narrowing spring-summer double peaks. Prophet slightly overestimated seasonal amplitude, whereas ARIMA accumulated trend-extrapolation bias; XGBoost residuals remained approximately white noise.
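The three accuracy measures reported above are related by simple formulas, sketched here in plain Python. The helper name `forecast_errors` and the four-month example values are illustrative assumptions, not study data.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE and MAE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / len(residuals)    # mean squared error
    mae = sum(abs(r) for r in residuals) / len(residuals)   # mean absolute error
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


# Made-up monthly case counts and forecasts, for illustration only.
errors = forecast_errors([100, 120, 90, 110], [98, 125, 90, 107])
```

Because RMSE is the square root of MSE, the reported pairs can be cross-checked: for example, 9.80 squared is about 96.0, in line with the reported MSE of 96.10 after rounding.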
CONCLUSIONS: For cities with nonlinear waning epidemics and seasonally contracting amplitude, machine-learning-based XGBoost offers superior extrapolation robustness over traditional ARIMA or Prophet approaches, providing an evidence-based tool for monthly PTB early-warning, precise resource pre-positioning and targeted control in comparable high-density, coastal urban settings.