Construction of a diagnostic model for tuberculosis based on long non-coding RNA
Ji X, Yao S, Jia H, Sun Q, Wang Y, Shang X, Wang Z, Huang M, et al. (12 authors)
Annals of Medicine · 2026-01
Abstract
Background: The World Health Organization encourages the development of novel diagnostic tools based on 'non-sputum' samples to meet global goals for tuberculosis (TB) control. We aimed to develop a machine learning-driven model for TB diagnosis, using long non-coding RNAs (lncRNAs) as biomarkers.

Methods: Peripheral blood mononuclear cells (PBMCs) from 10 TB patients, 10 individuals with latent TB infection (LTBI), and 10 healthy controls (HCs) underwent microarray analysis, and TB-related lncRNA modules were identified by weighted gene co-expression network analysis (WGCNA). Key lncRNAs were validated by qPCR and selected using LASSO regression. Five machine learning algorithms were used to construct the diagnostic model, whose performance was assessed by receiver operating characteristic (ROC) analysis.

Results: Based on the differential lncRNA profile, WGCNA identified 12 key modules associated with TB. From the most significant modules, 45 candidate lncRNAs were validated by qPCR, of which 14 showed differential expression among the TB (n = 192), LTBI (n = 55), HC (n = 66), and NTB (n = 78) groups. The five lncRNAs contributing most to TB diagnosis were then selected by LASSO analysis. An AdaBoost model incorporating these five lncRNAs achieved the best diagnostic performance, with an area under the curve (AUC) of 0.97 (95% CI: 0.95-0.98) in the training cohort (n = 272) and 0.91 (95% CI: 0.86-0.96) in the validation cohort (n = 119). Independent validation of the model in another cohort (n = 206) showed an AUC of 0.92 (95% CI: 0.88-0.95).

Conclusions: This study established a novel, blood-based diagnostic model that combines five host-derived lncRNAs with an AdaBoost algorithm, offering a non-sputum approach to enhance TB diagnosis.
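The abstract reports performance as an AUC with a 95% confidence interval. As a rough illustration of how such a figure can be computed, the sketch below estimates the AUC from classifier scores and attaches a percentile bootstrap interval. It is not the authors' procedure: the abstract does not say how the intervals were obtained, so the bootstrap is an assumption, the function names `auc` and `bootstrap_ci` are hypothetical, and the scores are synthetic rather than study data.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample needs both classes to define an AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic scores only: 60 "TB" cases and 60 "non-TB" cases (not study data).
rng = random.Random(42)
labels = [1] * 60 + [0] * 60
scores = [rng.gauss(1.5, 1.0) for _ in range(60)] + \
         [rng.gauss(0.0, 1.0) for _ in range(60)]

point = auc(labels, scores)
lo, hi = bootstrap_ci(labels, scores)
print(f"AUC = {point:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

The pairwise definition is quadratic in cohort size, which is acceptable for cohorts of a few hundred samples like those described here. A rank-based formulation gives the same value more efficiently for larger data.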
MeSH terms
- Leukocytes, Mononuclear
- Humans
- Tuberculosis
- Case-Control Studies
- ROC Curve
- Gene Expression Profiling
- Algorithms
- Adult
- Middle Aged
- Female
- Male
- Latent Tuberculosis
- RNA, Long Noncoding
- Biomarkers
- Machine Learning