Comprehensive comparative analysis of explainable deep learning model for differentiation of brucellar spondylitis and tuberculous spondylitis through MRI sequences
Parhat Yasin, Abudouresuli Tuersun, Anuar Ashir, Yerlan Makhambetov, Jie Sheng, Xinghua Song
European journal of medical research · 2025-12
Abstract
BACKGROUND: The differentiation of brucellar spondylitis (BS) from tuberculous spondylitis (TS) on magnetic resonance imaging (MRI) is a critical clinical challenge. While deep learning holds promise, the optimal architectural strategy for integrating information from multi-sequence MRI remains unclear. This study systematically compared distinct deep learning architectures to identify a valid and effective integration strategy for this diagnostic problem.
METHODS: In this retrospective, single-center diagnostic study, we included 235 patients with surgically and pathologically confirmed BS (n = 82) or TS (n = 153) from January 2014 to December 2024. We systematically evaluated four distinct architectural strategies for processing sagittal T1-weighted, T2-weighted, and fat-suppressed MRI sequences: (1) baseline models trained on single sequences; (2) a single-branch model that fused sequences as input channels; (3) a heterogeneous multi-branch model using different backbones for each sequence; and (4) a homogeneous multi-branch model using identical backbones. Models were developed on patient-level data splits for training (70%), validation (15%), and internal testing (15%). The primary performance metric was the area under the receiver operating characteristic curve (AUC) on the test set. Statistical significance of performance differences between models was assessed using the DeLong test, with P values adjusted for multiple comparisons using the Benjamini-Hochberg procedure.
RESULTS: The single-branch fusion model, which treated the three sequences as channels in a single input, failed to learn, yielding performance equivalent to random chance (test AUC range: 0.474-0.538). In contrast, both the single-sequence and multi-branch architectures were effective. The best single-sequence model achieved a test AUC of 0.765 (95% CI 0.759-0.771). The optimal multi-branch model, which integrated all three sequences, achieved a comparable test AUC of 0.764 (95% CI 0.757-0.770).
CONCLUSIONS: The choice of architecture for integrating multi-sequence MRI data is a critical determinant of model viability. Our findings demonstrate that naive channel-wise fusion is an invalid strategy for this task. In contrast, both processing a single MRI sequence and using a multi-branch parallel-processing architecture are valid and effective strategies, achieving comparable diagnostic performance. This study clarifies the architectural principles required for successfully applying deep learning to this multi-sequence diagnostic challenge.
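The multiple-comparison adjustment named in the methods (Benjamini-Hochberg) can be illustrated with a minimal sketch. It is not the authors' code: the function name `benjamini_hochberg` and the example P values are hypothetical and are not taken from the study. It shows only how raw P values from several pairwise model comparisons would be converted to adjusted values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order.

    Hypothetical helper for illustration; not part of the published study.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down to the smallest, scaling each by
    # m / rank and keeping a running minimum so that adjusted values never
    # increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw P values standing in for pairwise DeLong comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print([round(p, 4) for p in adjusted])
```

An adjusted value below the chosen false discovery rate (for example 0.05) would then be read as a significant difference between the two models being compared.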
MeSH terms
- Deep learning
- Medicine
- Spondylitis
- Artificial intelligence
- Ankylosing spondylitis
- Medical physics
- Radiology
- Architecture
- Machine learning
- Magnetic resonance imaging
- Computer science
- Convolutional neural network
- Sequence (biology)