Pan- and core genome analysis of Mycobacterium tuberculosis in high-resolution transmission and genetic diversity studies
Zhengwei Liu, Shaojun Pei, Xichao Ou, Xiangchen Li, Yelei Zhu, Yewei Lu, Mingwu Zhang, Yang Che, et al. (11 authors)
Microbial Genomics · 2025-10
Abstract
complex pan-genome (MTB_pan) using 307 complete genomes from the National Center for Biotechnology Information (NCBI) representing all eight lineages. Core and accessory genomes were analysed, lineage-associated genes were identified and functional annotations were assessed using Pfam domains. Transmission dynamics were evaluated by comparing pan-genome- and H37Rv-based approaches for isolates collected from China, focusing on alignment rates, clustering efficiency and SNP distances. MTB_pan (5.75 Mb) comprised 3,893 core and 958 accessory genes, of which 176 accessory genes were significantly associated with specific lineages. These genes were enriched in PE/PPE/PGRS families, mobile genetic elements (e.g. IS6110) and pentapeptide repeats. The median alignment rate based on MTB_pan reached 99.8%, significantly higher than that based on H37Rv. The clustering rate of isolates was also higher with MTB_pan (19.93%) than with H37Rv (18.90%). Within lineage 2, pairwise SNP distances below 50 SNPs decreased significantly, whereas those within lineage 4 showed no significant difference. Compared with clustering against a single reference genome, pan-genome-based clustering improved the identification of same-province transmission events. Pan-genomic analysis is therefore a more powerful analytical tool that provides a high-resolution picture of tuberculosis transmission in different epidemiological settings, enabling more precise outbreak mapping and supporting data-driven tuberculosis control strategies.
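To make the clustering rate reported in the abstract concrete, the following is a minimal sketch of how such a rate could be computed from pairwise SNP distances. It is not the authors' pipeline. The 12-SNP threshold, the single-linkage rule, the definition of clustering rate as the share of isolates in clusters of two or more, and the toy isolate names and distances are all assumptions made for illustration; the abstract does not state them.

```python
from collections import Counter


def cluster_isolates(names, dist, threshold=12):
    """Single-linkage clustering with union-find.

    Two isolates are linked when their pairwise SNP distance is at or
    below `threshold` (12 SNPs is an assumed cut-off, not from the abstract).
    Returns a dict mapping each isolate to its cluster representative.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[frozenset((a, b))] <= threshold:
                parent[find(a)] = find(b)
    return {n: find(n) for n in names}


def clustering_rate(assignments):
    """Fraction of isolates that fall in a cluster of size >= 2 (assumed definition)."""
    sizes = Counter(assignments.values())
    clustered = sum(size for size in sizes.values() if size >= 2)
    return clustered / len(assignments)


# Hypothetical toy data: five isolates and their pairwise SNP distances.
names = ["A", "B", "C", "D", "E"]
pairs = {
    ("A", "B"): 3, ("A", "C"): 13, ("B", "C"): 10,
    ("A", "D"): 120, ("A", "E"): 118, ("B", "D"): 121,
    ("B", "E"): 119, ("C", "D"): 125, ("C", "E"): 122,
    ("D", "E"): 40,
}
dist = {frozenset(k): v for k, v in pairs.items()}

assignments = cluster_isolates(names, dist)
rate = clustering_rate(assignments)
print(f"clustering rate: {rate:.2%}")
```

In this toy set, A and C are 13 SNPs apart yet end up in the same cluster because both link to B, which is the usual behaviour of single-linkage transmission clustering. Running the same calculation on distance matrices derived from two different references (for example a pan-genome and H37Rv) would give two rates that can be compared directly.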
MeSH terms
- Biology
- Genome
- Transmission (infectious disease)
- Genetics
- Computational biology
- Cluster analysis
- Lineage (genetic)
- Identification (biology)
- Pairwise comparison
- Mycobacterium tuberculosis complex
- Mycobacterium tuberculosis
- Genetic diversity
- Gene
- Evolutionary biology
- Single-nucleotide polymorphism (SNP)
- Core genome
- Genome-wide association study
- Phylogenetic tree