TB Research

Boosting diagnostic precision for Tuberculosis and Sarcoidosis using gene expression data

Aman Yadav, Yasha Hasija

Abstract

Tuberculosis and sarcoidosis are clinically overlapping granulomatous diseases that present significant diagnostic challenges, particularly when distinguishing between their subtypes. Traditional diagnostic methods often lack specificity, leading to misclassification and suboptimal treatment. In this study, we propose a novel hierarchical machine learning framework that leverages gene expression profiling to enhance the differential diagnosis of tuberculosis and sarcoidosis. Using high-throughput transcriptomic data, we developed a classification model that distinguishes between healthy individuals, tuberculosis patients, and sarcoidosis patients. The model further differentiates key subtypes: active versus latent tuberculosis, pulmonary versus extrapulmonary tuberculosis, and active versus non-active sarcoidosis. Among the models evaluated, AdaBoost consistently outperformed the other algorithms across all classification tasks, demonstrating exceptional accuracy, robustness, and interpretability. Our results highlight the value of combining ensemble learning with gene expression data to uncover distinct molecular signatures of complex diseases. This approach not only improves diagnostic precision but also supports more targeted and individualized treatment strategies for infectious and inflammatory diseases.
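The abstract does not say how the hierarchy is wired or how AdaBoost is configured. The sketch below is illustrative only and assumes a two-level layout. A first AdaBoost model separates healthy, tuberculosis, and sarcoidosis samples. A per-disease AdaBoost model, trained only on that disease's samples, then assigns the subtype; the sketch covers active vs latent TB and active vs non-active sarcoidosis. The boosting is the multi-class SAMME variant of AdaBoost with decision stumps, written in plain Python. With two classes, SAMME reduces to classic discrete AdaBoost. The class names (`SammeAdaBoost`, `HierarchicalDiagnoser`), the three genes, and the toy expression values are all hypothetical. They are not the study's data, features, or code.

```python
import math
from collections import defaultdict


def fit_stump(X, y, w, classes):
    """Weighted decision stump: one gene, one threshold, weighted-majority class per side."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive observed values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values[:1]
        for t in thresholds:
            left, right = defaultdict(float), defaultdict(float)
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            left_cls = max(classes, key=lambda c: left[c])
            right_cls = max(classes, key=lambda c: right[c])
            err = (sum(left.values()) - left[left_cls]) + (sum(right.values()) - right[right_cls])
            if best is None or err < best[0]:
                best = (err, j, t, left_cls, right_cls)
    _, j, t, left_cls, right_cls = best
    return lambda row: left_cls if row[j] <= t else right_cls


class SammeAdaBoost:
    """Multi-class AdaBoost (SAMME) over stumps; with two classes it is discrete AdaBoost."""

    def __init__(self, n_rounds=20):
        self.n_rounds = n_rounds

    def fit(self, X, y):
        self.classes = sorted(set(y))
        k = len(self.classes)
        w = [1.0 / len(X)] * len(X)
        self.learners = []
        for _ in range(self.n_rounds):
            h = fit_stump(X, y, w, self.classes)
            miss = [h(row) != label for row, label in zip(X, y)]
            err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
            if err >= 1 - 1 / k:
                break  # no better than random guessing among k classes
            alpha = math.log((1 - err) / max(err, 1e-10)) + math.log(k - 1)
            self.learners.append((alpha, h))
            if err == 0:
                break  # perfect stump: further rounds would only repeat it
            # Up-weight misclassified samples, then renormalise.
            w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
            total = sum(w)
            w = [wi / total for wi in w]
        return self

    def predict_one(self, row):
        votes = dict.fromkeys(self.classes, 0.0)
        for alpha, h in self.learners:
            votes[h(row)] += alpha  # alpha-weighted vote of each stump
        return max(self.classes, key=votes.get)


class HierarchicalDiagnoser:
    """Level 1: healthy / tuberculosis / sarcoidosis. Level 2: subtype within the predicted disease."""

    def __init__(self, n_rounds=20):
        self.n_rounds = n_rounds

    def fit(self, X, top_labels, sub_labels):
        self.top = SammeAdaBoost(self.n_rounds).fit(X, top_labels)
        self.sub = {}
        for group in sorted(set(top_labels)):
            idx = [i for i, g in enumerate(top_labels) if g == group and sub_labels[i] is not None]
            if idx:  # groups without subtypes (e.g. healthy) get no level-2 model
                self.sub[group] = SammeAdaBoost(self.n_rounds).fit(
                    [X[i] for i in idx], [sub_labels[i] for i in idx])
        return self

    def predict_one(self, row):
        group = self.top.predict_one(row)
        sub_model = self.sub.get(group)
        return group, (sub_model.predict_one(row) if sub_model else None)


# Made-up expression values for three hypothetical genes (columns): gene 0 is
# high in TB, gene 1 is high in sarcoidosis, gene 2 tracks disease activity.
X = [
    [1.0, 1.0, 2.0], [1.2, 0.8, 8.0], [0.9, 1.1, 5.0],  # healthy
    [5.0, 1.0, 9.0], [5.5, 1.2, 8.5],                    # TB, active
    [4.8, 0.9, 2.0], [5.2, 1.1, 1.5],                    # TB, latent
    [1.0, 5.0, 9.0], [1.1, 5.4, 8.0],                    # sarcoidosis, active
    [0.9, 4.9, 1.0], [1.2, 5.1, 2.5],                    # sarcoidosis, non-active
]
top = ["healthy"] * 3 + ["tuberculosis"] * 4 + ["sarcoidosis"] * 4
sub = [None] * 3 + ["active", "active", "latent", "latent"] + ["active", "active", "non-active", "non-active"]

model = HierarchicalDiagnoser(n_rounds=20).fit(X, top, sub)
print(model.predict_one([5.1, 1.0, 8.8]))  # ('tuberculosis', 'active')
```

One design choice here is that each level-2 model sees only the samples of its own disease group. Subtype boundaries are therefore learned without interference from the other groups. A practical pipeline would more likely use a library implementation such as scikit-learn's `AdaBoostClassifier`, which also implements SAMME.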

Keywords

  • Boosting (machine learning)
  • Sarcoidosis
  • Computer science
  • Tuberculosis
  • Artificial intelligence
  • Computational biology