Global whole-genome-based genomic insights into Mycobacterium tuberculosis: Clonal dominance, sequence-type structure, and antimicrobial resistance-virulence landscapes.
Laith B Alhusseini, Taif H Hassan, Firas Nabeeh Jaafar, Ebrahim Kouhsari, Mohammad Sholeh
Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases · 2026-04
Abstract
BACKGROUND: Tuberculosis (TB) remains a major global public health burden, exacerbated by the continued emergence and spread of drug-resistant Mycobacterium tuberculosis. Despite the rapid expansion of publicly available whole-genome sequencing data, gaps remain in the consistent characterization of global population structure, dominant sequence-type (ST) distributions, accessory genome variability, and antimicrobial resistance (AMR) gene profiles, largely due to fragmented and uneven sampling across regions and time periods. This study aimed to conduct a large-scale in silico analysis of publicly available M. tuberculosis whole-genome sequences to descriptively characterize global ST structure, accessory genome diversity, AMR gene landscapes, and their geographic and temporal distributions, while integrating available phenotypic susceptibility data.
METHODS: We conducted the largest genomic analysis of M. tuberculosis to date, examining 7890 high-quality genomes from 82 countries (1900-2024) retrieved from NCBI GenBank. Quality filtering with CheckM retained only genomes with >90% completeness and <5% contamination. Genomic characterization included assembly metrics, annotated gene features, multi-locus sequence typing, AMR profiling using AMRFinderPlus (v4.0.23; database 2025-07-16.1), temporal trend analysis, geographic distribution mapping, and gene presence pattern (GPP) clustering to assess accessory genome diversity.
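The quality thresholds stated in the methods can be illustrated with a minimal sketch. It assumes the CheckM results have already been parsed into one record per genome; the field names (`accession`, `completeness`, `contamination`) and the example records are hypothetical and are not taken from the study's pipeline.

```python
from typing import Iterable

# Thresholds as stated in the abstract (percent); everything else is assumed.
MIN_COMPLETENESS = 90.0   # genomes must be strictly above this
MAX_CONTAMINATION = 5.0   # genomes must be strictly below this


def passes_quality(record: dict) -> bool:
    """Return True if a genome record meets both quality thresholds."""
    return (record["completeness"] > MIN_COMPLETENESS
            and record["contamination"] < MAX_CONTAMINATION)


def filter_genomes(records: Iterable[dict]) -> list:
    """Keep only the records that pass the quality filter."""
    return [r for r in records if passes_quality(r)]


# Hypothetical example records, for illustration only.
example = [
    {"accession": "genome_A", "completeness": 99.2, "contamination": 0.4},
    {"accession": "genome_B", "completeness": 88.0, "contamination": 0.1},
    {"accession": "genome_C", "completeness": 97.5, "contamination": 6.3},
]
kept = filter_genomes(example)
```

Here only `genome_A` would be retained: `genome_B` fails on completeness and `genome_C` on contamination.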
RESULTS: Analysis of 7890 high-quality M. tuberculosis genomes from 77 countries (1900-2024) revealed a highly conserved global population dominated by a few epidemic clones. Although 158 STs were identified, three (ST 215, ST 279, and ST 276) accounted for 84.9% of all isolates, with ST 215 alone representing 58.0%, indicating a strong global clonal bottleneck; 90.5% of STs were rare (≤10 isolates each). Most isolates were human-derived (93.7%), and genome size (∼4.38 Mb) and gene content (∼4149 genes) showed minimal variation worldwide. AMR analysis identified 27 AMR genes, but >99.6% of isolates carried only three core genes (erm(37), blaC, and aac(2')-Ic), whereas all other resistance genes occurred in <0.25% of genomes, including a single vancomycin-resistant isolate (0.01%). Phenotypic data showed high susceptibility to first-line drugs (97-98%) but substantial non-susceptibility to several second-line agents, particularly the fluoroquinolones ciprofloxacin and ofloxacin, as well as capreomycin and ethionamide. GPP analysis identified 146 recurrent patterns, with GPP1 dominating ST 215, ST 279, and ST 276, suggesting limited accessory genome diversification. Overall, while the global M. tuberculosis population is driven by a few dominant clones with a conserved core genome, rare lineages and resistance profiles point to hidden genomic diversity.
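The two summary statistics used to describe the ST distribution (the share of isolates in the most common STs, and the fraction of STs with ≤10 isolates) could be computed along the following lines. This is a sketch under assumptions: the function names are hypothetical, and the counts are toy values chosen for easy arithmetic, not the study's data.

```python
from collections import Counter


def top_share(st_counts: Counter, k: int) -> float:
    """Fraction of all isolates that belong to the k most common STs."""
    total = sum(st_counts.values())
    top = sum(n for _, n in st_counts.most_common(k))
    return top / total


def rare_fraction(st_counts: Counter, max_isolates: int = 10) -> float:
    """Fraction of distinct STs represented by at most max_isolates isolates."""
    rare = sum(1 for n in st_counts.values() if n <= max_isolates)
    return rare / len(st_counts)


# Toy isolate counts per ST (hypothetical labels, 100 isolates in total).
toy = Counter({"ST_a": 60, "ST_b": 20, "ST_c": 12, "ST_d": 5, "ST_e": 3})

share_top1 = top_share(toy, 1)      # 60 of 100 isolates
share_top3 = top_share(toy, 3)      # 92 of 100 isolates
rare = rare_fraction(toy)           # 2 of 5 STs have <=10 isolates
```

In this toy example a single ST holds 60% of isolates and the top three hold 92%, while 40% of STs are rare. It illustrates how a few STs can dominate the isolate count even when most distinct STs are rare.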
CONCLUSIONS: This large-scale in silico analysis reveals a highly skewed global sequence-type distribution of M. tuberculosis, with pronounced geographic structuring and widespread presence of conserved, intrinsic chromosomal resistance-associated genes. The findings emphasize the importance of cautious interpretation of resistance gene prevalence and phenotypic non-susceptibility patterns derived from heterogeneous public datasets, and highlight key methodological considerations for global genomic analyses of M. tuberculosis.
MeSH terms
- Mycobacterium tuberculosis
- Genome, Bacterial
- Humans
- Whole Genome Sequencing
- Genomics
- Virulence
- Antitubercular Agents
- Drug Resistance, Bacterial
- Genetic Variation
- Tuberculosis