TB Research

C106-01 Baseline and Post-treatment Radiographic Severity in Tuberculosis: A Comparative Assessment of Human and Machine-learning Derived Severity Scores in the Trust Cohort

K Harrington, S Kulkarni, M Ghanem, S Kulkarni Goodwin, T Carney, R Warren, K Jacobson, M R Farhat

American Journal of Respiratory and Critical Care Medicine · 2026-05

Abstract

Introduction: Chest x-ray (CXR) evaluation remains a cornerstone of tuberculosis (TB) diagnosis, severity assessment, and treatment monitoring. However, CXR interpretation requires a trained physician or radiologist, making it a labor-intensive diagnostic tool that is often not readily available in high-burden settings. Advances in machine learning (ML) models trained to extract features from CXRs offer a scalable method for rapid, standardized assessment, which can reduce the burden on providers and aid in triaging care. Yet few studies have compared ML-derived severity estimates with human reads longitudinally after TB treatment.

Methods: The TRUST-TB cohort recruited participants initiating treatment for rifampicin-susceptible pulmonary TB between 2017 and 2024 in Worcester, South Africa, and followed them for a year after treatment completion. CXRs were digitally captured at treatment initiation and 6-17 months post-treatment. Radiographic disease severity was quantified using two metrics: the percent of lung involved (PLI) with tuberculous pneumonia (range 0-100), and the Timika score, equal to the PLI plus 40 if cavitation is present (range 0-140). CXRs were interpreted independently by clinicians and by ensemble ML models developed in prior work. We studied longitudinal change in human-read and machine-read PLI and Timika scores from treatment initiation to post-treatment. Agreement between ML estimates and human reads was assessed using Pearson correlation coefficients and visualized in Figure 1A-D.

Results: Of 432 participants, 190 had baseline and follow-up CXRs with ML estimates for PLI and Timika scores, while 53 had both baseline and follow-up human reads for both metrics. Average PLI by human read was 22.1 at baseline and 7.9 at follow-up. Average Timika score by human read was 43.9 at baseline and 11.7 at follow-up. Among ML estimates, median percent change was -42.9 (IQR -61.0, -18.7) for PLI and -43.1 (IQR -58.0, -21.7) for Timika. Among CXRs with human reads, median percent change was -64.3 (IQR -75.0, -40.4) for PLI and -75.0 (IQR -89.6, -40.4) for Timika. PLI estimates from the ML model and human reads were moderately to strongly correlated at baseline (r = 0.62) and follow-up (r = 0.66) (both p < 0.001). Timika scores from the ML model and human reads were moderately correlated at baseline (r = 0.58) and follow-up (r = 0.49) (both p < 0.01).

Conclusions: ML-derived severity estimates correlated with human reads and captured treatment-associated improvements in lung involvement. These findings support the use of ML-based tools for standardized quantification of radiographic severity in TB treatment monitoring. Further work is needed to optimize calibration, particularly for Timika estimation, to improve correlation with human reads.

This abstract is funded by: Boston University/Rutgers Tuberculosis Research Unit
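For readers who want to reproduce the arithmetic behind the severity metrics, the following is a minimal Python sketch of the Timika score as defined in the Methods (PLI plus 40 when cavitation is present), a per-participant percent change, and a Pearson correlation. The function names are hypothetical, and the percent-change formula (change relative to the baseline value) is an assumption, since the abstract does not state how it was computed. This is not the study's analysis code.

```python
from math import sqrt

CAVITATION_POINTS = 40  # added to PLI when cavitation is present, per the abstract


def timika_score(pli: float, cavitation: bool) -> float:
    """Timika score: PLI (0-100) plus 40 if cavitation is present (range 0-140)."""
    if not 0 <= pli <= 100:
        raise ValueError("PLI must be between 0 and 100")
    return pli + (CAVITATION_POINTS if cavitation else 0)


def percent_change(baseline: float, follow_up: float) -> float:
    """Percent change relative to baseline (assumed definition, not from the abstract)."""
    if baseline == 0:
        raise ValueError("percent change is undefined for a baseline of 0")
    return 100.0 * (follow_up - baseline) / baseline


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Illustrative values only; these are not data from the cohort.
    baseline = timika_score(pli=30, cavitation=True)
    follow_up = timika_score(pli=10, cavitation=False)
    print(baseline, follow_up, percent_change(baseline, follow_up))
```

Under this assumed definition, a participant with a baseline PLI of 0 has no defined percent change, so any real analysis would need an explicit rule for such cases.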

MeSH terms

  • Medicine
  • Cohort
  • Radiography
  • Pneumonia
  • Severity of illness
  • Tuberculosis
  • Cohort study
  • Internal medicine