Development and validation of prediction algorithm to identify tuberculosis in two large California health systems

Heidi Fischer, Lei Qian, Zhuoxin Li, Katia Bruxvoort, Jacek Skarbinski, Yuching Ni, Jennifer H Ku, Bruno Lewin, et al. (13 authors)

Nature Communications · 2025-04

Abstract

California data demonstrate failures of latent tuberculosis infection screening to prevent progression to tuberculosis disease. We therefore developed a clinical risk prediction model for tuberculosis disease using electronic health records. The study included Kaiser Permanente Southern California and Northern California members aged ≥18 years during 2008-2019. Models were fit with Cox proportional hazards regression and evaluated with Harrell's C-statistic. They used a simulated TB disease outcome that includes both observed and simulated cases, to account for cases prevented by current screening. We compared the sensitivity and number needed to screen of screening model-identified high-risk individuals with those of current screening. Among 4,032,619 Southern California and 4,051,873 Northern California members, tuberculosis disease incidence was 4.1 and 3.3 cases per 100,000 person-years, respectively. The final model C-statistic was 0.816 (95% simulation interval 0.805-0.824). Screening model-identified high-risk individuals gave a sensitivity of 0.70 (0.68-0.71) and a number needed to screen of 662 (646-679) persons per tuberculosis disease case, compared with a sensitivity of 0.36 (0.34-0.38) and a number needed to screen of 1632 (1485-1774) with current screening. Here, we show that our predictive model improves tuberculosis screening efficiency in California.

In the United States, tuberculosis control focuses on preventing progression from latent tuberculosis infection to TB disease. Here, the authors develop and validate a prediction model that uses electronic health record data from California to identify individuals at risk of TB disease.
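The abstract reports a C-statistic, a sensitivity, and a number needed to screen without defining them. The following is a minimal sketch of how these quantities are commonly computed, not the authors' code. It assumes each person already has a risk score (for example, the linear predictor of a fitted Cox model, which is not shown). It uses a simplified Harrell's C that skips pairs with tied follow-up times, and it does not reproduce the study's simulated-outcome adjustment. The `Subject` type, all function names, the 0.45 threshold, and the four-person cohort are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    time: float   # years of follow-up until TB diagnosis or censoring
    event: bool   # True if TB disease was observed, False if censored
    risk: float   # hypothetical model risk score (higher = riskier)


def harrell_c(subjects: list[Subject]) -> float:
    """Simplified Harrell's C-statistic for right-censored data.

    A pair (a, b) is comparable when a had an event and a strictly shorter
    follow-up time than b. The pair is concordant when a has the higher risk;
    tied risks count as half. Pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for a in subjects:
        if not a.event:
            continue
        for b in subjects:
            if a.time < b.time:
                comparable += 1
                if a.risk > b.risk:
                    concordant += 1.0
                elif a.risk == b.risk:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def sensitivity_and_nns(flagged: list[bool], cases: list[bool]) -> tuple[float, float]:
    """Sensitivity and number needed to screen for a screening rule.

    flagged[i]: person i is selected for screening (e.g. model high-risk).
    cases[i]:   person i developed TB disease.
    Sensitivity = detected cases / all cases.
    NNS = persons screened / detected cases.
    """
    n_cases = sum(cases)
    n_screened = sum(flagged)
    detected = sum(f and c for f, c in zip(flagged, cases))
    if n_cases == 0 or detected == 0:
        raise ValueError("need at least one case detected by screening")
    return detected / n_cases, n_screened / detected


def incidence_per_100k(n_cases: int, person_years: float) -> float:
    """Crude incidence rate in cases per 100,000 person-years."""
    return 1e5 * n_cases / person_years


# Hypothetical toy cohort, for illustration only.
cohort = [
    Subject(time=1.0, event=True, risk=0.9),
    Subject(time=2.0, event=True, risk=0.4),
    Subject(time=3.0, event=False, risk=0.5),
    Subject(time=4.0, event=False, risk=0.1),
]
flagged = [s.risk >= 0.45 for s in cohort]  # hypothetical high-risk threshold
cases = [s.event for s in cohort]

print(harrell_c(cohort))
print(sensitivity_and_nns(flagged, cases))
```

On this toy cohort, 4 of the 5 comparable pairs are ranked correctly, so C = 0.8. The 0.45 threshold flags two people, one of whom is a case, which gives a sensitivity of 0.5 and a number needed to screen of 2. A higher-risk flag that captures more cases while screening fewer people raises sensitivity and lowers the number needed to screen. That is the direction of the improvement the abstract reports relative to current screening.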

MeSH terms

  • Tuberculosis
  • Medicine
  • Disease
  • Statistic
  • Proportional hazards model
  • Demography
  • Internal medicine