TB Research

Development and validation of a habitat-based computed tomography radiomics model for differentiating isolated lung cancer, isolated tuberculoma, and coexistence of tuberculosis with lung cancer: a dual-center retrospective study

Shi N, Wan Z, Wen L, Liu Z, Wang B, Li Y, Xiong P, Hou D, et al. (9 authors)

Translational lung cancer research · 2026-02

Abstract

Background: Isolated lung cancer (ILC), isolated tuberculoma, and coexistence of tuberculosis with lung cancer (CTBLC) share similar computed tomography (CT) imaging features but differ substantially in pathology, treatment strategy, and prognosis; accurate differential diagnosis is therefore critical for clinical management and patient safety. The purpose of this study was to develop and validate a habitat-based CT radiomics model that integrates intralesional subregion features with whole-lesion features for reliable differentiation among these three conditions.

Methods: This study retrospectively included 317 patients with ILC, tuberculoma, or CTBLC from 2018 to 2022. Of these, 239 patients from Beijing Chest Hospital, Capital Medical University (Center 1) formed the training and internal test cohorts, and 78 patients from the Infectious Disease Hospital of Heilongjiang Province (Center 2) constituted an external validation cohort. Volumes of interest (VOIs) were manually outlined on CT images by two experienced radiologists. Each lesion was then partitioned into two subregions using K-means clustering. A total of 1,218 three-dimensional whole-lesion radiomics features and 2,436 habitat features were extracted. Feature selection was performed with the least absolute shrinkage and selection operator (LASSO). Six classification algorithms were trained and evaluated. To distinguish ILC, tuberculoma, and CTBLC, three models were developed: (I) a traditional radiomics model using only whole-lesion radiomics features; (II) a habitat model based on intralesional habitat features; and (III) a combined habitat-radiomics model fusing both feature sets. Discrimination was assessed using the area under the curve (AUC), and SHapley Additive exPlanations (SHAP) was used to interpret the optimal model and visualize individual prediction decisions.

Results: The combined habitat-radiomics model outperformed the traditional radiomics model. Among the classifiers evaluated, the extreme gradient boosting (XGBoost)-based fusion model achieved the best performance in the internal test cohort (mean AUC = 0.934), surpassing both the radiomics model (mean AUC = 0.910) and the habitat model (mean AUC = 0.873). For individual classes, the fusion model yielded AUCs of 0.911 (ILC), 0.955 (tuberculoma), and 0.937 (CTBLC). The combined habitat-radiomics model also showed better discriminative performance than the interpretations provided by three radiologists. SHAP plots identified the key features and visualized individual predictions.

Conclusions: A habitat-based CT radiomics approach that incorporates intralesional subregion features into whole-lesion signatures improves differentiation among ILC, tuberculoma, and CTBLC. This combined model provides a noninvasive tool to support clinical decision-making.
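
The habitat step described in the Methods (partitioning each lesion into two subregions with K-means clustering) can be illustrated with a minimal sketch. The abstract does not state which voxel features were clustered, how the centroids were initialized, or which software was used, so everything below is an assumption: the sketch clusters on voxel intensity alone, starts from the minimum and maximum values, and uses a hypothetical function name (`two_habitat_kmeans`) and made-up intensity values. It is not the authors' implementation.

```python
from statistics import mean


def two_habitat_kmeans(intensities, max_iter=100):
    """Split voxel intensities into two clusters with 1-D K-means (k=2).

    Returns (labels, centroids); label 0 is the lower-intensity habitat.
    Assumption: clustering uses intensity only, which the abstract
    does not confirm.
    """
    if len(set(intensities)) < 2:
        raise ValueError("need at least two distinct intensity values")

    # Deterministic initialization (an assumption): start from the extremes.
    centroids = [float(min(intensities)), float(max(intensities))]
    labels = []

    for _ in range(max_iter):
        # Assignment step: each voxel joins its nearest centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in intensities
        ]
        # Update step: move each centroid to the mean of its members.
        updated = [
            mean(v for v, lab in zip(intensities, labels) if lab == k)
            for k in (0, 1)
        ]
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated

    return labels, centroids


# Hypothetical voxel intensities inside one lesion VOI (illustrative only).
voxels = [-20, -10, 0, 5, 40, 55, 60, 70]
habitat_labels, habitat_centroids = two_habitat_kmeans(voxels)
print(habitat_labels, habitat_centroids)
```

In a pipeline of the kind the abstract describes, radiomics features would then be extracted separately from the voxels in each of the two habitats and combined with the whole-lesion features before feature selection.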