TB Research

Integrating WGCNA and machine learning to distinguish active pulmonary tuberculosis from latent tuberculosis infection based on neutrophil extracellular trap-related genes

Wang T, Lu T, Lu W, He J, Wu Z, Lei Y

Diagnostic Microbiology and Infectious Disease · 2025-08

Abstract

Background: Pulmonary tuberculosis (PTB) remains a major global public health challenge, and diagnostic delays are a key contributor to its high morbidity and mortality. Growing evidence suggests that neutrophil extracellular traps (NETs) are closely associated with PTB pathogenesis. This study aimed to elucidate the role of NETs in PTB and to identify diagnostic methods and potential biomarkers.

Methods: Weighted gene co-expression network analysis (WGCNA) was used to identify the three modules most strongly correlated with NETs. Differentially expressed genes (DEGs) from the GSE39939 dataset were intersected with the module genes to obtain NET-related DEGs. Four machine learning algorithms (least absolute shrinkage and selection operator [LASSO], random forest, recursive feature elimination [RFE], and Boruta) were applied to select feature genes and develop a PTB diagnostic model. Model performance was evaluated using support vector machine (SVM)-based receiver operating characteristic (ROC) and precision-recall (PR) curves, with validation in the GSE39940 dataset. The best-performing algorithm was then used to refine the feature genes and to construct a miRNA-gene regulatory network.

Results: ROC and PR curve analyses showed that the RFE and Boruta algorithms had superior diagnostic efficacy in distinguishing active PTB from latent TB infection (LTBI). Further analysis identified five high-ranking feature genes shared by the RFE and Boruta algorithms: GPR84, SIGLEC10, CCR2, TMEM167A, and GYG1. Five miRNAs (hsa-miR-1264, hsa-miR-664a-3p, hsa-miR-548e-5p, hsa-miR-4775, and hsa-miR-5056) were predicted to target these genes.

Conclusion: The RFE algorithm achieved high diagnostic accuracy for PTB and identified five potential biomarkers (GPR84, SIGLEC10, CCR2, TMEM167A, and GYG1). These findings may provide valuable tools for PTB diagnosis and treatment.
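The evaluation and overlap steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state which software was used, and the function names (`roc_auc`, `average_precision`, `overlap_top_k`), the toy labels and scores, and the placeholder gene names are all assumptions made for illustration. The sketch computes the area under the ROC curve through the pairwise-ranking identity, summarizes the PR curve as average precision, and intersects the top-ranked features of two selection methods.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive sample scores higher; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Summary of the precision-recall curve: mean of the precision
    values measured at the rank of each positive sample."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("no positive samples")
    return total / hits


def overlap_top_k(rank_a, rank_b, k):
    """Features in the top k of both rankings, in rank_a order."""
    top_b = set(rank_b[:k])
    return [g for g in rank_a[:k] if g in top_b]


# Toy data (hypothetical): 1 = active PTB, 0 = LTBI; scores stand in
# for classifier decision values and are not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

# Placeholder feature rankings from two hypothetical selectors.
rfe_rank = ["geneA", "geneB", "geneC", "geneD"]
boruta_rank = ["geneB", "geneE", "geneA", "geneD"]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
shared = overlap_top_k(rfe_rank, boruta_rank, k=3)
```

In practice the scores would come from a classifier such as an SVM trained on the selected genes and applied to a held-out dataset; comparing `auc` and `ap` across feature sets is one way to rank selection algorithms against each other.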

MeSH terms

  • Humans
  • Tuberculosis, Pulmonary
  • MicroRNAs
  • Diagnosis, Differential
  • ROC Curve
  • Gene Expression Profiling
  • Algorithms
  • Gene Regulatory Networks
  • Latent Tuberculosis
  • Extracellular Traps
  • Biomarkers
  • Machine Learning