TB Research

Leveraging machine learning for accurate forecasting of pulmonary tuberculosis epidemics in a coastal city in China

J. Joshua Yang, J. Pan, Jianhui Chen, Yan Xu, Xiaoyang Zhang

Tropical Medicine and Health · 2026-03

Abstract

BACKGROUND: Pulmonary tuberculosis (PTB) remains a leading notifiable cause of death in China, yet city-level monthly forecasts with enough resolution to guide vaccine, drug and bed logistics are scarce, and no head-to-head comparison of classical time-series and machine-learning strategies under identical epidemiological conditions has been published.

METHODS: Using 168 months of PTB case reports from Fuzhou (January 2009 to December 2022) and a 24-month prospective validation set (2023-2024), we developed, tuned and independently tested three forecasting frameworks: seasonal ARIMA with automatic order selection; Facebook Prophet with multiplicative seasonality and change-point detection; and extreme gradient boosting (XGBoost) fed with 1- to 12-month lagged incidence, calendar and linear-trend covariates. Hyperparameters were optimized by grid search and early stopping. Accuracy was quantified with mean squared error (MSE), root mean squared error (RMSE) and mean absolute error (MAE), while residual diagnostics, stationarity tests and white-noise tests assessed model adequacy.

RESULTS: All three models fitted the training data closely (RMSE 25.11, 25.31 and 0.0258 cases; MAE ≤ 22 cases). On the unseen validation data, however, XGBoost achieved substantially lower prediction errors (RMSE 9.80; MAE 2.93; MSE 96.10) than ARIMA (RMSE 60.43; MAE 50.28; MSE 3651.86) or Prophet (RMSE 64.74; MAE 54.49; MSE 4191.86), correctly anticipating the observed 5.7% annual decline and the progressively narrowing spring-summer double peaks. Prophet slightly overestimated the seasonal amplitude, whereas ARIMA accumulated trend-extrapolation bias; XGBoost residuals remained approximately white noise.

CONCLUSIONS: For cities with nonlinearly declining epidemics and contracting seasonal amplitude, XGBoost extrapolates more robustly than traditional ARIMA or Prophet approaches, providing an evidence-based tool for monthly PTB early warning, precise resource pre-positioning and targeted control in comparable high-density coastal cities.
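The XGBoost inputs and the accuracy checks described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `make_lag_features`, `error_metrics` and `ljung_box_q` are hypothetical, the monthly series is assumed to have no gaps and to start in a known calendar month, and the Ljung-Box statistic is an assumed choice of white-noise test because the abstract does not name one. Fitting XGBoost itself is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def make_lag_features(
    counts: Sequence[float], start_month: int = 1, max_lag: int = 12
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: one row per month that has max_lag earlier months.

    Row layout: lags 1..max_lag of the monthly series, then the calendar
    month (1-12), then a linear trend index. Assumes no missing months and
    that counts[0] falls in calendar month `start_month`.
    """
    rows: List[List[float]] = []
    targets: List[float] = []
    for t in range(max_lag, len(counts)):
        lags = [float(counts[t - k]) for k in range(1, max_lag + 1)]
        month = (start_month - 1 + t) % 12 + 1  # calendar month of index t
        rows.append(lags + [float(month), float(t)])
        targets.append(float(counts[t]))
    return rows, targets


def error_metrics(
    observed: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """MSE, RMSE and MAE of predicted versus observed values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty series of equal length")
    errors = [o - p for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / len(errors),
    }


def ljung_box_q(residuals: Sequence[float], max_lag: int) -> float:
    """Ljung-Box Q statistic over autocorrelation lags 1..max_lag.

    Under white noise, Q is approximately chi-square with max_lag degrees
    of freedom (reduced by the number of fitted ARMA parameters when used
    on ARIMA residuals); the p-value lookup is omitted to stay in stdlib.
    """
    n = len(residuals)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(residuals) - 1")
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denom = sum(c * c for c in centred)
    if denom == 0:
        raise ValueError("residuals are constant; autocorrelation undefined")
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(centred[t] * centred[t - k] for t in range(k, n)) / denom
        q += rho_k * rho_k / (n - k)
    return n * (n + 2) * q


if __name__ == "__main__":
    counts = list(range(100, 124))  # 24 made-up monthly counts, January start
    X, y = make_lag_features(counts)
    print(len(X), len(X[0]))  # 12 14
    print(error_metrics([10, 12, 14], [11, 12, 12]))
    print(ljung_box_q([1.0, -1.0, 1.0, -1.0], max_lag=1))  # 4.5
```

Because every feature row uses only values at least one month old, the matrix can feed any tabular regressor (for example `xgboost.XGBRegressor`) for one-step-ahead forecasts; forecasting a full 24-month horizon would additionally need a recursive or direct multi-step strategy, which the abstract does not specify.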

MeSH terms

  • Pulmonary tuberculosis
  • Autoregressive integrated moving average
  • China
  • Machine learning
  • Artificial intelligence
  • Extrapolation
  • Computer science
  • Tuberculosis control
  • Geography
  • Public health
  • Coronavirus disease 2019 (COVID-19)