TB Research

Development and validation of a machine learning model for community-based tuberculosis screening among persons aged ≥ 15 years in South Africa and Zambia

A. Zimmer, Kindie Fentahun Muchie, Henry Loharja, Lisa Koeppel, Helen Ayles, Maria del Mar Castro, Evangelia Christodoulou, Greg J Fox, et al. (28 authors)

medRxiv · 2026-04

Abstract

Introduction: Current tuberculosis (TB) screening tools, such as the WHO four-symptom screen (W4SS), lack sufficient sensitivity and specificity for effective community-based active case finding, contributing to both missed diagnoses and unnecessary diagnostic evaluations. This study aimed to develop and validate a machine learning (ML) model to improve TB risk prediction among persons aged ≥15 years in community settings of Zambia and South Africa.
Methods: A large, harmonized dataset was created from four community-based TB prevalence surveys in South Africa and Zambia (N=169,813), restricted to individuals not under treatment at the time of the survey. A binary reference outcome was defined based on available microbiological and radiographic data, grouping individuals as either 'Possible TB' or 'Unlikely TB'. An XGBoost model was trained on 80% of the data (N=135,854) using demographic, clinical, and socio-economic variables, and model interpretability was assessed using SHapley Additive exPlanations (SHAP) values. Internal validation was performed using a 20% hold-out test set (N=33,959). Model performance was assessed using discrimination, calibration, and clinical utility measures, and compared with the W4SS and with WHO's 2025 Target Product Profile (TPP) for a tool in a two-step screening algorithm.
Results: Overall, 16,413 individuals (9.7%) were labelled as 'Possible TB'. On the test set, the XGBoost model yielded an area under the curve (AUC) of 79.7% (95% CI: 78.7, 80.7), outperforming the W4SS (AUC 57.0%; 95% CI: 56.1, 57.8). The XGBoost model achieved 81.5% sensitivity (95% CI: 77.6, 84.9) at a 60% specificity threshold. This exceeded the W4SS, which achieved only 38.2% sensitivity (95% CI: 36.5, 39.9) on the same dataset. SHAP analysis identified age, previous TB treatment, number of times treated for TB, and unemployment as the primary contributors to risk.
Conclusion: The XGBoost ML model shows promise as a screening tool to support community-based active case finding prior to diagnostic testing. However, performance remained below TPP targets, and adding further variables, e.g. on geolocation, could be considered.
Registration: The study was not registered.
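
The abstract reports sensitivity at a fixed 60% specificity threshold. The sketch below illustrates one common way such an operating point can be read off a set of risk scores: scan candidate cut-offs from lowest to highest and keep the first one whose specificity reaches the target, which is also the one with the highest sensitivity. It is only an illustration under stated assumptions, not the authors' procedure: the function name `sensitivity_at_specificity`, the "flag if score >= cut-off" rule, and the toy labels and scores are all hypothetical and do not come from the study data.

```python
def sensitivity_at_specificity(labels, scores, target_specificity):
    """Return (cutoff, sensitivity, specificity) for the lowest score
    cut-off whose specificity is at least `target_specificity`.

    labels: 1 for the positive reference class, 0 for the negative class.
    scores: model risk scores; a person is flagged when score >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    # Raising the cut-off can only raise specificity and lower sensitivity,
    # so the first qualifying cut-off in ascending order is the most sensitive.
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        false_pos = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
        sensitivity = true_pos / positives
        specificity = 1 - false_pos / negatives
        if specificity >= target_specificity:
            return cutoff, sensitivity, specificity

    # No observed cut-off reaches the target: flag nobody.
    return float("inf"), 0.0, 1.0


# Hypothetical toy data (not from the study): 4 positives, 6 negatives.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.35, 0.6, 0.05]

cutoff, sens, spec = sensitivity_at_specificity(toy_labels, toy_scores, 0.60)
print(f"cut-off={cutoff}, sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With few distinct scores the achieved specificity can overshoot the target, as it does in the toy data; on a test set of tens of thousands of records the gap would typically be negligible.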

MeSH terms

  • Interpretability
  • Medicine
  • Machine learning
  • Medical diagnosis
  • Artificial intelligence
  • Test set
  • Test (biology)
  • Tuberculosis
  • Receiver operating characteristic
  • Sensitivity (control systems)
  • Area under curve
  • Product (mathematics)