Exploring Determinants and Predictive Models of Latent Tuberculosis Infection Outcomes in Rural Areas of the Eastern Cape: A Pilot Comparative Analysis of Logistic Regression and Machine Learning Approaches
Lindiwe Modest Faye, Cebo Magwaza, Ntandazo Dlatu, Teke Apalata
Preprints.org · 2024-10
Abstract
Latent Tuberculosis Infection (LTBI) poses a significant public health challenge, especially in populations with high HIV prevalence and limited healthcare access. Early detection and targeted interventions are essential to prevent progression to active tuberculosis. This study develops predictive models for LTBI outcomes using logistic regression and machine learning approaches and evaluates strategies to improve LTBI awareness and testing. Data from rural areas in the Eastern Cape, South Africa, were analyzed to identify key demographic, health, and knowledge-related factors influencing LTBI outcomes. Logistic regression was employed to predict LTBI positivity based on factors such as age, education, and HIV status. Machine learning models, including decision trees and random forests, were also applied to compare predictive accuracy. A knowledge diffusion model was used to assess the impact of educational interventions on increasing LTBI awareness and testing rates. Logistic regression achieved an accuracy of 66.67%, with high precision (80%) but low recall (33%) for LTBI-positive cases, and identified age, HIV status, and LTBI awareness as significant predictors. The random forest model had lower accuracy (59.26%) but a higher F1-score (0.63), providing a better balance between precision and recall. Feature importance analysis revealed that age, occupation, and knowledge of LTBI symptoms were the most critical factors across both models. The knowledge diffusion model demonstrated that targeted interventions significantly increased LTBI awareness and testing, particularly in high-risk groups. While logistic regression offers more interpretable results for public health interventions, machine learning models such as random forests can capture complex relationships between demographic and health factors and, in this pilot, gave a better balance between precision and recall.
These findings highlight the need for targeted educational campaigns and increased LTBI testing in high-risk populations, particularly those with limited awareness of LTBI symptoms.
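The precision and recall reported for logistic regression imply an F1-score that can be set beside the 0.63 reported for the random forest. The following is a minimal sketch of that arithmetic, assuming the rounded figures in the abstract (precision 0.80, recall 0.33) refer to the LTBI-positive class; the helper name `f1_from_precision_recall` is illustrative and not taken from the study.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # F1 is conventionally 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for logistic regression (assumed LTBI-positive class).
lr_precision = 0.80
lr_recall = 0.33
lr_f1 = f1_from_precision_recall(lr_precision, lr_recall)

# F1-score reported for the random forest; not recomputed here.
rf_f1_reported = 0.63

print(f"Logistic regression F1 (derived): {lr_f1:.2f}")
print(f"Random forest F1 (reported):      {rf_f1_reported:.2f}")
```

Under these assumptions the derived logistic regression F1-score is roughly 0.47, which is consistent with the random forest offering a better precision-recall balance despite its lower accuracy. Because the inputs are rounded, the derived value is approximate.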
MeSH terms
- Logistic regression
- Cape
- Artificial intelligence
- Machine learning
- Tuberculosis
- Regression analysis
- Computer science
- Geography
- Econometrics