Development and validation of a machine learning model for community-based tuberculosis screening among persons aged ≥ 15 years in South Africa and Zambia

A. Zimmer, Kindie Fentahun Muchie, Henry Loharja, Lisa Koeppel, Helen Ayles, Maria del Mar Castro, Evangelia Christodoulou, Greg J Fox, et al. (28 authors)

medRxiv · 2026-04

Abstract

Introduction: Current tuberculosis (TB) screening tools, such as the WHO four-symptom screen (W4SS), lack sufficient sensitivity and specificity for effective community-based active case finding, contributing to both missed diagnoses and unnecessary diagnostic evaluations. This study aimed to develop and validate a machine learning (ML) model to improve TB risk prediction among persons aged ≥15 years in community settings of Zambia and South Africa.
Methods: A large, harmonized dataset was created from four community-based TB prevalence surveys in South Africa and Zambia (N=169,813), restricted to individuals not under treatment at the time of survey. A binary reference outcome was defined based on available microbiological and radiographic data, grouping individuals as either 'Possible TB' or 'Unlikely TB'. An XGBoost model was trained on 80% (N=135,854) of the data using demographic, clinical, and socio-economic variables, and model interpretability was assessed using SHapley Additive exPlanations (SHAP) values. Internal validation was performed using a 20% hold-out test set (N=33,959). Model performance was assessed using discrimination, calibration, and clinical utility measures, compared to the W4SS and against WHO's 2025 Target Product Profile (TPP) for a tool in a two-step screening algorithm.
Results: Overall, 16,413 individuals (9.7%) were labelled as 'Possible TB'. On the test set, the XGBoost model yielded an area under the curve (AUC) of 79.7% (95% CI: 78.7, 80.7), outperforming the W4SS (AUC 57.0%; 95% CI: 56.1, 57.8). The XGBoost model achieved 81.5% sensitivity (95% CI: 77.6, 84.9) at a 60% specificity threshold. This exceeded the W4SS, which achieved only 38.2% sensitivity (95% CI: 36.5, 39.9) on the same dataset. SHAP analysis identified age, previous TB treatment, times treated for TB and unemployment as the primary contributors to risk.
Conclusion: The XGBoost ML model shows promise as a screening tool to support community-based active case finding activities prior to diagnostic testing. However, performance remained below TPP targets, and adding further variables, e.g. on geolocation, could be considered.
Registration: The study was not registered.
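
The two discrimination measures reported in the abstract, AUC and sensitivity at a fixed specificity, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc_rank` and `sensitivity_at_specificity` are hypothetical, the labels and scores are invented toy values, and the abstract does not state how the operating threshold was actually selected. The sketch assumes a case is flagged when its score is at or above the threshold, and it picks the lowest threshold whose specificity reaches the target.

```python
from typing import Sequence, Tuple


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores higher than
    a random negative (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    y_true: Sequence[int], scores: Sequence[float], target_spec: float = 0.60
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the lowest threshold
    whose specificity is at least target_spec. A case is flagged positive
    when score >= threshold (an assumption of this sketch)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Specificity never decreases as the threshold rises, so the first
    # threshold that meets the target also gives the highest sensitivity.
    for t in sorted(set(scores)):
        spec = sum(n < t for n in neg) / len(neg)
        if spec >= target_spec:
            sens = sum(p >= t for p in pos) / len(pos)
            return t, sens, spec
    # No observed score reaches the target: flag nobody.
    return float("inf"), 0.0, 1.0


# Invented toy data: 1 = 'Possible TB', 0 = 'Unlikely TB'.
y_true = [0, 0, 1, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.25, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

auc = auc_rank(y_true, scores)
threshold, sens, spec = sensitivity_at_specificity(y_true, scores, 0.60)
print(f"AUC={auc:.2f} threshold={threshold} sens={sens:.2f} spec={spec:.2f}")
```

Fixing specificity first and then reading off sensitivity matches how the abstract states its result ("sensitivity at a 60% specificity threshold"), and it lets a continuous risk score be compared with a yes/no rule such as the W4SS at a single operating point.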

MeSH terms

Interpretability
Medicine
Machine learning
Medical diagnosis
Artificial intelligence
Test set
Test (biology)
Tuberculosis
Receiver operating characteristic
Sensitivity (control systems)
Area under curve
Product (mathematics)
