TB Research

Copy number variation analysis of 9,482 Mycobacterium tuberculosis isolates identifies lineage-specific molecular determinants

Nikhil Bhalla, Anil Behera, Ashish Gupta, Ranjan Kumar Nanda

bioRxiv (Cold Spring Harbor Laboratory) · 2024-10

Abstract

Background: Clinical manifestations of tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) show lineage-specific differences, to which genetic polymorphisms such as phylogenetically informative single nucleotide variations (PhyloSNPs) and insertions or deletions (INDELs) contribute. Intragenomic rearrangements, such as gene duplications and deletions, may also cause gene copy number differences in Mtb and could contribute to lineage-specific phenotypic variation. These effects remain poorly understood.

Results: Relative gene copy number differences were determined in high-quality, publicly available whole-genome sequencing datasets of 9,482 clinical Mtb isolates by repurposing and modifying an RNA-seq data analysis pipeline. The pipeline comprised read alignment, coordinate sorting, GC-bias correction and variance-stabilising transformation. This strategy gave the maximum separation of lineage-specific clusters along two principal components, which together captured ∼54% of the variability. Unsupervised hierarchical clustering of the top 100 genes and pairwise comparisons between Mtb lineages revealed an overlapping subset of 42 genes with significantly perturbed copy numbers (Benjamini-Hochberg adjusted P < 0.05 and |log2(drug-resistant/sensitive)| > 1). These 42 genes formed multiple tandem gene clusters and are known to be involved in virulence, pathogenicity and defence responses against invading phages. A separate comparison showed significantly higher copy numbers of phage genes and of Rv1525, a recently reported druggable target, in pre-extensively drug-resistant (pre-XDR) and extensively drug-resistant (XDR) clinical Mtb isolates than in drug-sensitive isolates.

Conclusion: The gene sets identified in clinical Mtb isolates may be useful targets for developing lineage-specific therapeutics and diagnostics.
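The normalisation step of the pipeline described in the Results can be illustrated with a short sketch. It assumes that gene-level read counts per isolate have already been obtained from coordinate-sorted alignments, and it uses DESeq-style median-of-ratios size factors followed by a plain log2(x + 1) transform. The log transform is a simpler stand-in swapped in here for the variance-stabilising transformation, GC-bias correction is omitted, and the function names are hypothetical. This is not the authors' pipeline.

```python
import math
from typing import Dict, List

Counts = Dict[str, List[int]]  # gene -> read counts, one entry per isolate (assumed layout)


def size_factors(counts: Counts) -> List[float]:
    """DESeq-style median-of-ratios size factors, one per isolate."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        # Genes with a zero count anywhere have no finite geometric mean; skip them.
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    factors = []
    for r in log_ratios:
        if not r:
            raise ValueError("no gene has non-zero counts in every isolate")
        r.sort()
        mid = len(r) // 2
        median = r[mid] if len(r) % 2 else (r[mid - 1] + r[mid]) / 2
        # Take the median of the log-ratios, then exponentiate it.
        factors.append(math.exp(median))
    return factors


def log2_normalised_counts(counts: Counts, pseudocount: float = 1.0) -> Dict[str, List[float]]:
    """Scale counts by size factors, then apply log2(x + pseudocount).

    A simple stand-in (assumption) for a variance-stabilising transformation.
    """
    sf = size_factors(counts)
    return {
        gene: [math.log2(c / s + pseudocount) for c, s in zip(row, sf)]
        for gene, row in counts.items()
    }
```

Because every gene in an isolate is divided by the same size factor, differences in sequencing depth cancel out, and a gene present in two copies in one isolate but one copy in another would show a log2 difference of roughly 1 after normalisation.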
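The significance filter reported in the Results (Benjamini-Hochberg adjusted P < 0.05 combined with an absolute log2 ratio above 1) can be sketched as follows. This minimal standard-library version assumes that raw per-gene P-values and log2 fold changes are already available from a differential test. The function names and gene labels are hypothetical, and RNA-seq packages usually report adjusted P-values themselves.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    # Indices sorted by ascending raw P-value; rank 1 is the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pvalues[i] * m / rank)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


def significant_genes(
    genes: Sequence[str],
    log2_fc: Sequence[float],
    pvalues: Sequence[float],
    alpha: float = 0.05,
    min_abs_log2_fc: float = 1.0,
) -> List[str]:
    """Keep genes with adjusted P < alpha and |log2 fold change| > threshold."""
    padj = benjamini_hochberg(pvalues)
    return [
        g
        for g, fc, q in zip(genes, log2_fc, padj)
        if q < alpha and abs(fc) > min_abs_log2_fc
    ]


# Hypothetical example: only gene_a passes both cut-offs.
print(significant_genes(
    ["gene_a", "gene_b", "gene_c", "gene_d"],
    [1.5, -2.0, 0.5, 3.0],
    [0.01, 0.04, 0.03, 0.2],
))  # ['gene_a']
```

Here gene_b has a large fold change but an adjusted P of about 0.053, and gene_d has a large fold change but a non-significant P-value, so both criteria are needed to select a gene.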

MeSH terms

  • Lineage (genetic)
  • Mycobacterium tuberculosis
  • Biology
  • Tuberculosis
  • Evolutionary biology
  • Genetics
  • Computational biology