Machine learning approaches to predict the risk of tuberculosis among household contacts of index TB patients in Central Ethiopia
Habtamu Milkias Wolde, Wakjira Kebede, Delenasaw Yewhalaw, Gemeda Abebe, Youngoh Bae, Seung Won Lee
Scientific Reports · 2026-02
Abstract
Tuberculosis (TB) transmission within households remains a significant challenge in Ethiopia. Household, index TB case, and contact characteristics may all play a role in transmission. Predicting which contacts of TB patients are at greatest risk of developing TB could help focus screening and intervention efforts on early detection and treatment initiation. In this study, we employed several machine learning models to predict TB risk among household contacts. We used data from a cross-sectional study of household contacts of index TB patients in Central Ethiopia, collected through household visits conducted by trained health workers as part of a community-based TB contact investigation program. A set of individual, household, and index case variables was used to train and evaluate multiple supervised machine learning models, including Random Forest (RF), Logistic Regression, Artificial Neural Network (ANN), and XGBoost classifiers. Model performance was assessed using recall, precision, accuracy, ROC-AUC, and F1-score, with recall (sensitivity) prioritized because of class imbalance. Among 1,277 household contacts screened, 23 (1.8%) were diagnosed with TB. Random Forest and Balanced Random Forest classifiers outperformed logistic regression, achieving recall scores of 0.85. Logistic regression, a commonly used classifier, had an accuracy of 0.92 but a modest recall of 0.43 and an AUC of 0.76. Important predictors of TB among household contacts, identified using the best-performing Balanced Random Forest model, included being a presumptive TB case at baseline, providing sputum, having a cough and a productive cough, fatigue, and loss of appetite. Machine learning models can identify household contacts at high risk for TB using routinely collected contact investigation data. Integration of predictive modeling into contact investigation could support more targeted and efficient TB screening efforts.
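To illustrate why recall rather than accuracy is prioritized under this degree of class imbalance, the following minimal Python sketch computes the standard metrics from confusion-matrix counts. It uses only the full-sample figures reported in the abstract (1,277 contacts, 23 TB cases); the helper name `metrics` and the "label everyone TB-negative" classifier are hypothetical illustrations, not the study's models, evaluation split, or code.

```python
def metrics(tp, fp, fn, tn):
    """Return accuracy, recall, precision and F1 from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (an assumption made
    here for convenience; other conventions exist).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Counts taken from the abstract.
N_CONTACTS = 1277
N_TB = 23

# Hypothetical trivial classifier: every contact is predicted TB-negative,
# so all 23 cases are false negatives and all others are true negatives.
all_negative = metrics(tp=0, fp=0, fn=N_TB, tn=N_CONTACTS - N_TB)

print(f"prevalence: {N_TB / N_CONTACTS:.3f}")
print(f"accuracy:   {all_negative['accuracy']:.3f}")
print(f"recall:     {all_negative['recall']:.3f}")
```

Under these assumptions the trivial classifier reaches an accuracy of about 0.982 while detecting none of the 23 cases (recall 0), which is why a model with high accuracy but modest recall can still miss most contacts who have TB.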
MeSH terms
- Random forest
- Machine learning
- Logistic regression
- Medicine
- Tuberculosis
- Artificial intelligence
- Recall
- Index (typography)
- Precision and recall
- Intervention (counseling)
- Transmission (telecommunications)
- Environmental health
- Predictive modelling