TB Research

Pan- and core genome analysis of Mycobacterium tuberculosis in high-resolution transmission and genetic diversity studies

Zhengwei Liu, Shaojun Pei, Xichao Ou, Xiangchen Li, Yelei Zhu, Yewei Lu, Mingwu Zhang, Yang Che, et al. (11 authors)

Microbial Genomics · 2025-10

Abstract

complex pan-genome (MTB_pan) using 307 complete genomes representing all eight lineages from the National Center for Biotechnology Information (NCBI). Core and accessory genomes were analysed, lineage-associated genes were identified and functional annotations were assessed using Pfam domains. Transmission dynamics were evaluated by comparing pan-genome- and H37Rv-based approaches for isolates collected from China, focusing on alignment rates, clustering efficiency and SNP distances. The MTB_pan (5.75 Mb) consisted of 3,893 core and 958 accessory genes, of which 176 accessory genes were significantly associated with specific lineages. These lineage-associated genes were enriched in PE/PPE/PGRS families, mobile genetic elements (e.g. IS6110) and pentapeptide repeats. The median alignment rate based on MTB_pan reached 99.8%, significantly higher than that based on H37Rv. The clustering rate of isolates based on MTB_pan (19.93%) was higher than that based on H37Rv (18.90%). Among isolate pairs separated by fewer than 50 SNPs, pairwise SNP distances within lineage 2 decreased significantly, whereas those within lineage 4 showed no significant difference. Compared to a single reference genome, clustering using the pan-genome improved the identification of same-province transmission events. Pan-genomic analysis is therefore a more powerful analytical tool, providing a high-resolution picture of tuberculosis transmission in different epidemiological settings, which will enable more precise outbreak mapping and support data-driven tuberculosis control strategies.
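
The clustering rate reported above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that isolates are already aligned to a common reference, that clusters are formed by single linkage on pairwise SNP distance, and that the clustering rate is the fraction of isolates falling in any cluster of two or more. The function names, the toy sequences and the threshold value are all hypothetical; the abstract does not state the SNP cutoff used.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a non-ACGT character
    (gap, N) are ignored. This masking rule is an assumption.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def cluster_isolates(seqs: dict[str, str], threshold: int) -> list[set[str]]:
    """Single-linkage clustering: link any pair within `threshold` SNPs."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Only groups with at least two isolates count as clusters.
    return [g for g in groups.values() if len(g) >= 2]


def clustering_rate(seqs: dict[str, str], threshold: int) -> float:
    """Fraction of isolates that belong to any cluster (assumed definition)."""
    clustered = sum(len(c) for c in cluster_isolates(seqs, threshold))
    return clustered / len(seqs)


# Toy aligned sequences (hypothetical, for illustration only).
toy = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTAT",
    "iso_C": "TTTTACGGGG",
    "iso_D": "GGGGGGGGGG",
}
rate = clustering_rate(toy, threshold=2)  # threshold is an assumed value
```

In this sketch, changing the reference (for example from H37Rv to a pan-genome) would change the aligned sequences passed in, and therefore the pairwise distances and the resulting clustering rate. Some studies use a different definition of clustering rate that subtracts one index case per cluster; which definition the paper uses is not stated in the abstract.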

MeSH terms

  • Biology
  • Genome
  • Transmission (infectious disease)
  • Genetics
  • Computational biology
  • Cluster analysis
  • Lineage (genetic)
  • Identification (biology)
  • Pairwise comparison
  • Mycobacterium tuberculosis complex
  • Mycobacterium tuberculosis
  • Genetic diversity
  • Gene
  • Evolutionary biology
  • SNP
  • Single-nucleotide polymorphism
  • Core genome
  • Genome-wide association study
  • Phylogenetic tree