TB Research

Application of Machine Learning K-Means Clustering and Linear Regression in Determining the Risk Level of Pulmonary Tuberculosis

Abhijit Pathak, Ziaul Islam Bablu, Towhidul Haque Limon, Sowmik Barua, Piyal Dey, Mowmita Tajnin Jiba, Tazrian Alam

Abstract

Pulmonary tuberculosis (TB) remains a significant public health concern in densely populated regions such as Bireuen, Bangladesh, which reported 755 cases in 2019 among a population of 400,000. This study used data from Bangabandhu Sheikh Mujib Medical University Hospital and the Health Department, covering 17 districts, to identify high-risk areas and predict disease incidence. Using K-Means clustering followed by cluster-wise linear regression, the analysis identified two high-risk areas in Cluster 1, six in Cluster 2, and nine in Cluster 3. The regression achieved an R-squared value of 0.5740, indicating moderate predictive capacity. Combining K-Means clustering with cluster-wise linear regression is a novel approach to TB risk assessment that offers a more nuanced understanding of TB incidence at a regional level. The findings give public health authorities actionable insights for devising targeted interventions and allocating resources efficiently: strategies such as targeted screening programs and improved access to diagnostic and treatment facilities in high-risk areas can help mitigate TB’s impact.
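The two-stage pipeline described in the abstract (cluster the districts, then fit one linear regression per cluster and report R-squared) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which variables were used, so the `density` predictor, the `cases` response, the min-max scaling step, and all data values below are synthetic assumptions chosen only for illustration. The sketch uses the Python standard library; in practice, scikit-learn's `KMeans` and `LinearRegression` would be the more usual choice.

```python
import random
from statistics import mean

def scale(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers are k distinct data points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(mean(dim) for dim in zip(*members))
    return labels

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic data for 17 hypothetical districts (NOT the study's data).
rng = random.Random(42)
density, cases = [], []
for base, n in ((10, 5), (40, 6), (80, 6)):
    for _ in range(n):
        density.append(base + rng.uniform(-5, 5))
        cases.append(2.0 * density[-1] + rng.gauss(0, 5))

# Stage 1: cluster the districts on scaled features.
points = list(zip(scale(density), scale(cases)))
labels = kmeans(points, k=3)

# Stage 2: fit one regression line per cluster and collect in-sample predictions.
models, preds = {}, [0.0] * len(cases)
for c in sorted(set(labels)):
    idx = [i for i, lab in enumerate(labels) if lab == c]
    a, b = fit_line([density[i] for i in idx], [cases[i] for i in idx])
    models[c] = (a, b)
    for i in idx:
        preds[i] = a + b * density[i]

r2 = r_squared(cases, preds)
print("cluster sizes:", {c: labels.count(c) for c in models})
print("overall R-squared of the cluster-wise fit:", round(r2, 4))
```

Because each cluster gets its own intercept and slope, the in-sample R-squared of the combined fit cannot be lower than that of predicting each cluster's mean, which is one reason cluster-wise regression can describe regionally heterogeneous data better than a single global line. The cluster labels themselves carry no risk ordering; ranking clusters as low, medium, or high risk requires inspecting each cluster's center.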

MeSH terms

  • Cluster analysis
  • Computer science
  • Linear regression
  • Artificial intelligence
  • Logistic regression
  • Pulmonary tuberculosis
  • Machine learning
  • Regression analysis
  • Regression
  • Statistics