TB Research

Exploring Determinants and Predictive Models of Latent Tuberculosis Infection Outcomes in Rural Areas of the Eastern Cape: A Pilot Comparative Analysis of Logistic Regression and Machine Learning Approaches

Lindiwe Modest Faye, Cebo Magwaza, Ntandazo Dlatu, Teke Apalata

Information · 2025-03

Abstract

Latent tuberculosis infection (LTBI) poses a significant public health challenge, especially in populations with high HIV prevalence and limited healthcare access. Early detection and targeted interventions are essential to prevent progression to active tuberculosis. This study aimed to identify the key factors influencing LTBI outcomes using predictive models, including logistic regression and machine learning techniques, while also evaluating strategies to enhance LTBI awareness and testing. Data from rural areas of the Eastern Cape, South Africa, were analyzed with logistic regression, decision trees, and random forests to identify key determinants of LTBI positivity among demographic, health, and knowledge-related factors. The models evaluated factors such as age, HIV status, and LTBI awareness, with random forests demonstrating the best balance of accuracy and interpretability. Additionally, a knowledge diffusion model was employed to assess the effectiveness of educational strategies in increasing LTBI awareness and testing uptake. Logistic regression achieved an accuracy of 68% with high precision (70%) but low recall (33%) for LTBI-positive cases, identifying age, HIV status, and LTBI awareness as significant predictors. The random forest model had a lower accuracy (59.26%) but a higher F1-score (0.63), providing a better balance between precision and recall. Feature importance analysis revealed that age, occupation, and knowledge of LTBI symptoms were the most critical factors across both models. The knowledge diffusion model demonstrated that targeted interventions significantly increased LTBI awareness and testing, particularly in high-risk groups.
While logistic regression offers more interpretable results for public health interventions, machine learning models such as random forests provide enhanced predictive power by capturing complex relationships between demographic and health factors. These findings highlight the need for targeted educational campaigns and increased LTBI testing in high-risk populations, particularly those with limited awareness of LTBI symptoms.
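
The precision/recall trade-off described in the abstract can be made concrete with a short calculation. The sketch below derives an F1-score from the precision (70%) and recall (33%) reported for logistic regression on LTBI-positive cases and sets it beside the random forest F1-score of 0.63. The function name `f1_score` is illustrative, and the comparison assumes that the reported random forest F1-score refers to the same LTBI-positive class, which the abstract does not state explicitly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for logistic regression (LTBI-positive class).
lr_precision = 0.70
lr_recall = 0.33
lr_f1 = f1_score(lr_precision, lr_recall)

# F1-score reported in the abstract for the random forest model.
# Assumption: it refers to the same LTBI-positive class.
rf_f1_reported = 0.63

print(f"Logistic regression F1 (derived): {lr_f1:.2f}")
print(f"Random forest F1 (reported):      {rf_f1_reported:.2f}")
```

Because the F1-score is a harmonic mean, the low recall pulls the derived logistic regression value to roughly 0.45 despite the high precision. This is consistent with the abstract's statement that the random forest offers a better balance between precision and recall even though its accuracy is lower.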

MeSH terms

  • Logistic regression
  • Cape
  • Tuberculosis
  • Geography
  • Regression analysis
  • Machine learning
  • Artificial intelligence
  • Demography
  • Statistics
  • Computer science