TB Research

Updated Erdman reveals tandem repeat copy number is phase-variable and impacts adaptation across evolutionary timescales.

Samuel J Modlin, Nachiket Thosar, Paulina M Mejía-Ponce, Raegan L Lunceford, Gaelle Guiewi Makafe, Brian Weinrick, Faramarz Valafar

mSystems · 2026-01

Abstract

High-quality reference genomes are essential for comparative genomics and accurate genotype-phenotype mapping. Here, we corrected the Erdman strain reference genome using ultra-deep HiFi sequencing. Among the small variants (n = 275) between the updated Erdman assembly and the current Erdman reference NC_020559.1, many are likely errors in the current reference. We identified a novel bias toward in-frame structural variations (SVs) in genes and 28 SVs between the updated assembly and the current reference, half representing likely errors in the current reference. Other SVs were consistent with evolution, including copy number variation (CNV) of promoter tandem repeats (PTRs). PTR CNVs were polyphyletic and occurred within isogenic populations (10-10 CNVs/chromosome), demonstrating the impact of phase-variable CNV across evolutionary timescales. These hypervariable PTRs pinpoint a genomic basis for rapidly switching nitric oxide resistance (Dop), biofilm formation (LpdA), drug tolerance (EfpA), and glycerol utilization (GlpD2) phenotypes. This work uncovers a common phase variation mechanism obscured by short-read sequencing limitations and provides an improved reference for comparative studies.

IMPORTANCE: Mycobacterium tuberculosis, the pathogen responsible for tuberculosis, is often described as genetically stable. Our findings reveal an overlooked evolutionary adaptation mechanism: phase variation driven by tandem repeat copy number changes in gene promoters. Enabled by ultra-deep, long-read sequencing, we corrected errors in the Erdman reference genome and uncovered frequent, spontaneous expansions and contractions of promoter repeats upstream of genes linked to nitric oxide resistance, drug efflux, and biofilm formation. By altering promoter strength, these dynamic promoter variants may generate phenotypic diversity within subpopulations and across diverse clinical lineages, suggesting a conserved evolutionary advantage for navigating host-imposed stress. This reframes the evolutionary potential of M. tuberculosis, highlighting how adaptive flexibility has been underestimated due to reliance on short-read sequencing and limited resolution of subpopulations at standard genomic depths. Our findings underscore the need to integrate structural variation-aware approaches into studies of M. tuberculosis pathogenesis, evolution, and drug response.

MeSH terms

  • Mycobacterium tuberculosis
  • DNA Copy Number Variations
  • Tandem Repeat Sequences
  • Evolution, Molecular
  • Genome, Bacterial
  • Adaptation, Physiological
  • Humans