TB Research

Artificial Intelligence-Based Surveillance of Tuberculosis in South Africa Using Google Trends Data

Nqobile S. Hlatshwayo, Seun O. Olukanmi

International Journal of Online and Biomedical Engineering (iJOE) · 2025-10

Abstract

Tuberculosis (TB) remains a significant global public health challenge and the deadliest infectious disease, according to the World Health Organization (WHO) Global TB Report of 2024. This study explores integrating Google Trends (GT) data with machine learning (ML) methods to forecast TB incidence in South Africa. Pearson correlation analysis identified eight TB-related search terms with moderate to strong correlations to official surveillance data from the National Institute for Communicable Diseases (NICD) from 2012 to 2021. Four ML models were compared using rolling-window cross-validation: partial least squares (PLS), LASSO regression, support vector machine (SVM), and long short-term memory (LSTM) networks. The PLS model achieved the best performance, significantly outperforming the more complex deep learning approach (LSTM). These findings demonstrate that simpler linear models can effectively leverage GT data to complement traditional TB surveillance systems in South Africa.
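The two generic steps named in the abstract, Pearson screening of search terms and rolling-window evaluation, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not state the correlation cut-off, window length, forecast horizon, or error metric, the data are synthetic, the names `term_a` and `term_b` are hypothetical, and a single-predictor least-squares line stands in for the study's PLS, LASSO, SVM, and LSTM models.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_terms(trends, cases, threshold=0.5):
    """Keep search terms whose |r| with the case series meets the threshold.

    The 0.5 default is an assumed cut-off, not a value from the study.
    """
    return [term for term, series in trends.items()
            if abs(pearson(series, cases)) >= threshold]


def rolling_windows(n_obs, train_size, horizon=1):
    """Yield (train_indices, test_indices), sliding forward by `horizon`."""
    start = 0
    while start + train_size + horizon <= n_obs:
        split = start + train_size
        yield list(range(start, split)), list(range(split, split + horizon))
        start += horizon


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor (stand-in model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def rolling_rmse(x, y, train_size, horizon=1):
    """Refit on each training window and pool the out-of-window squared errors."""
    sq_errors = []
    for train, test in rolling_windows(len(y), train_size, horizon):
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_errors += [(y[i] - (slope * x[i] + intercept)) ** 2 for i in test]
    return sqrt(sum(sq_errors) / len(sq_errors))


# Synthetic series for illustration only; not NICD or Google Trends data.
cases = [10, 12, 14, 16, 18, 20, 22, 24]
trends = {
    "term_a": [1, 2, 3, 4, 5, 6, 7, 8],   # hypothetical term, tracks cases
    "term_b": [5, 1, 4, 2, 6, 3, 5, 4],   # hypothetical term, mostly noise
}

selected = select_terms(trends, cases)
rmse = rolling_rmse(trends["term_a"], cases, train_size=5)
print(selected, rmse)
```

Each window trains only on observations that precede its test points, which is what keeps the evaluation honest for time-ordered surveillance data. Comparing several models would mean swapping `fit_line` for each candidate while reusing the same `rolling_windows` splits.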

MeSH terms

  • Leverage (statistics)
  • Artificial intelligence
  • Tuberculosis
  • Public health
  • Partial least squares regression
  • Computer science
  • Support vector machine
  • Machine learning
  • Big data
  • Public health surveillance
  • Real world data
  • Medicine
  • Global health
  • Incidence (epidemiology)
  • Geography
  • Linear model
  • Complement (music)
  • Environmental health
  • Data science
  • Data mining