Pan- and core genome analysis of Mycobacterium tuberculosis in high-resolution transmission and genetic diversity studies
Liu Z, Pei S, Ou X, Li X, Zhu Y, Lu Y, Zhang M, Che Y, et al. (11 authors)
Microbial genomics · 2025-10
Abstract
The application of pan-genomics to understanding Mycobacterium tuberculosis complex (MTBC) transmission remains understudied, particularly in high-burden settings such as China. We constructed an MTBC pan-genome (MTB_pan) using 307 complete genomes representing all eight lineages from the National Center for Biotechnology Information (NCBI). Core and accessory genomes were analysed, lineage-associated genes were identified and functional annotations were assessed using Pfam domains. Transmission dynamics were evaluated by comparing pan-genome- and H37Rv-based approaches for isolates collected from China, focusing on alignment rates, clustering efficiency and SNP distances. The MTB_pan (5.75 Mb) consisted of 3,893 core and 958 accessory genes, with 176 accessory genes significantly associated with specific lineages. These genes were enriched in PE/PPE/PGRS families, mobile genetic elements (e.g. IS6110) and pentapeptide repeats. The median alignment rate based on MTB_pan reached 99.8%, which was significantly higher than that based on H37Rv. The clustering rate of isolates based on MTB_pan (19.93%) was higher than that based on H37Rv (18.90%). Pairwise SNP distances below 50 SNPs decreased significantly within lineage 2 but showed no significant difference within lineage 4. Compared to a single reference genome, clustering using the pan-genome improved the identification of same-province transmission events. Pan-genomic analysis is therefore a more powerful analytical tool than a single-reference approach: it enables a high-resolution picture of tuberculosis transmission in different epidemiological settings, which will support more precise outbreak mapping and data-driven tuberculosis control strategies.
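The clustering rates reported above (19.93% with MTB_pan versus 18.90% with H37Rv) depend on how isolates are grouped from pairwise SNP distances. The abstract does not state the cutoff or the grouping method, so the following is only a minimal sketch under assumptions: isolates are linked when their pairwise distance is at or below a hypothetical threshold of 12 SNPs, links are merged transitively (single linkage), and the clustering rate is the fraction of isolates that end up in a group of two or more. The function name `clustering_rate` and the toy distances are illustrative and do not come from the study.

```python
def clustering_rate(distances, n_isolates, threshold=12):
    """Fraction of isolates that fall into a cluster of size >= 2.

    distances:  dict mapping (i, j) isolate index pairs to a pairwise SNP distance.
    n_isolates: total number of isolates (indices 0 .. n_isolates - 1).
    threshold:  hypothetical SNP cutoff (assumption, not taken from the abstract).
    """
    # Union-find structure: each isolate starts in its own group.
    parent = list(range(n_isolates))

    def find(x):
        # Follow parent links to the group root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of any two isolates within the SNP threshold.
    for (i, j), d in distances.items():
        if d <= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Count group sizes and sum the isolates that are not singletons.
    sizes = {}
    for k in range(n_isolates):
        root = find(k)
        sizes[root] = sizes.get(root, 0) + 1
    clustered = sum(size for size in sizes.values() if size >= 2)
    return clustered / n_isolates


# Toy distances for five isolates (invented for illustration only).
toy_distances = {
    (0, 1): 3,    # linked
    (1, 2): 10,   # linked, so isolate 2 joins isolates 0 and 1 transitively
    (0, 2): 13,   # above the cutoff, but 0 and 2 are already connected via 1
    (3, 4): 40,   # not linked
    (0, 3): 60,
}
rate = clustering_rate(toy_distances, n_isolates=5)
```

In this toy case three of the five isolates form one cluster. Running the same function on distance sets derived from two different references would give directly comparable rates, provided the threshold is held fixed.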
MeSH terms
- Humans
- Mycobacterium tuberculosis
- Tuberculosis
- Genomics
- Phylogeny
- Polymorphism, Single Nucleotide
- Genome, Bacterial
- China
- Genetic Variation
- Whole Genome Sequencing