TB Research

PTBD: a machine learning-based non-invasive diagnostic model for pulmonary tuberculosis using large-scale blood transcriptomes.

Changchun Wu, Xueqin Xie, Ziru Huang, Yuwei Zhou, Yushu Gou, Mengze Du, Hao Lin, Jian Huang

BMC Biology · 2026-03

Abstract

BACKGROUND: Accurate and rapid diagnosis is essential for controlling pulmonary tuberculosis (PTB) because it enables timely intervention and reduces disease transmission. Existing diagnostic methods for PTB are limited in sensitivity, specificity, or practicality.

RESULTS: Here, we present a non-invasive, blood transcriptome-based diagnostic model that integrates the top-scoring pair method with machine learning and identifies novel and robust transcriptomic biomarkers for diverse PTB patients. Using 2,792 peripheral blood transcriptome samples, the proposed model (PTBD) effectively distinguishes PTB from healthy individuals, latent tuberculosis infection, pneumonia, lung cancer, and pulmonary nodules, achieving AUCs of 0.869 in the test set and 0.909 in an independent external validation set. This reflects its robustness against the complex and heterogeneous negative sample background typical of real-world clinical settings. Its performance is consistent across different geographic regions, age groups, and special conditions, including Bacillus Calmette-Guérin vaccination, HIV infection, diabetes, and drug resistance, and it meets WHO requirements for community-based triage, children, and PTB with HIV infection. PTBD also enables diagnosis of extrapulmonary tuberculosis and prediction of treatment outcomes, with feature scores serving as molecular biomarkers that reflect disease progression and prognosis.

CONCLUSIONS: This study provides a broadly applicable tool for early PTB diagnosis, facilitating timely intervention and potentially reducing the global PTB burden. Moreover, PTBD uncovers novel transcriptomic biomarkers, represented by five diagnostic gene-pair expression patterns that embody the molecular hallmarks of PTB.
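
ILLUSTRATION: The abstract does not describe how PTBD builds its gene-pair features, so the following is only a minimal sketch of the generic top-scoring pair idea, not the authors' implementation. It assumes a samples-by-genes expression matrix and binary labels (1 = PTB, 0 = non-PTB). The function names (tsp_scores, top_pairs, pair_features) and the toy data are hypothetical. Each gene pair is scored by how differently the ordering "gene i < gene j" occurs in the two classes, and the best pairs become binary features that a downstream classifier could use.

```python
from itertools import combinations

import numpy as np


def tsp_scores(X, y):
    """Score each gene pair (i, j) by |P(X_i < X_j | y=1) - P(X_i < X_j | y=0)|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    scores = {}
    for i, j in combinations(range(X.shape[1]), 2):
        less = X[:, i] < X[:, j]  # per-sample ordering of the two genes
        scores[(i, j)] = abs(less[y].mean() - less[~y].mean())
    return scores


def top_pairs(scores, k):
    """Return the k highest-scoring gene pairs."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def pair_features(X, pairs):
    """Binary feature matrix: 1 where gene i is expressed below gene j."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([(X[:, i] < X[:, j]).astype(int) for i, j in pairs])


# Toy data (hypothetical): 4 samples x 3 genes, first two samples labelled PTB.
X = np.array([
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 10.0],
    [5.0, 1.0, 10.0],
    [6.0, 2.0, 3.0],
])
y = np.array([1, 1, 0, 0])

scores = tsp_scores(X, y)
best = top_pairs(scores, 1)
features = pair_features(X, best)
```

Because the features depend only on the relative ordering of two genes within a sample, they are insensitive to monotone per-sample rescaling, which is one common motivation for pair-based methods on data pooled from many cohorts.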

MeSH terms

  • Humans
  • Tuberculosis, Pulmonary
  • Machine Learning
  • Transcriptome
  • Biomarkers
  • Adult
  • Middle Aged
  • Male
  • Female
  • Child
  • Adolescent
  • Young Adult
  • Aged