Copy number variation analysis of 9,482 Mycobacterium tuberculosis isolates identifies lineage-specific molecular determinants
Nikhil Bhalla, Anil Behera, Ashish Gupta, Ranjan Kumar Nanda
bioRxiv (Cold Spring Harbor Laboratory) · 2024-10
Abstract
Background
Clinical manifestations of tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), show lineage-specific differences attributed to genetic polymorphisms such as phylogenetic single nucleotide variations (PhyloSNPs) and insertions and deletions (INDELs). Intragenomic rearrangement events, such as gene duplications and deletions, may also cause gene copy number differences in Mtb and could contribute to lineage-specific phenotypic variation, but their role remains poorly understood.

Results
Relative gene copy number differences were determined in high-quality, publicly available whole genome sequencing datasets of 9,482 clinical Mtb isolates by repurposing and modifying an RNA-seq data analysis pipeline. The pipeline comprised read alignment, coordinate sorting, GC-bias correction, and variance-stabilising transformation. This strategy gave the maximum separation of lineage-specific clusters along two principal components, which together captured ∼54% of the variability. Unsupervised hierarchical clustering of the top 100 genes and pairwise comparisons between Mtb lineages identified an overlapping subset of 42 genes with significantly perturbed copy numbers (Benjamini-Hochberg adjusted P-value < 0.05 and |log2 fold change| > 1). These 42 genes formed multiple tandem gene clusters and are known to be involved in virulence, pathogenicity, and defence against invading phages. A separate comparison showed significantly higher copy numbers of phage genes and of Rv1525, a recently reported druggable target, in pre-extensively and extensively drug-resistant (pre-XDR and XDR) isolates than in drug-sensitive clinical Mtb isolates.

Conclusion
The gene sets identified in clinical Mtb isolates may be useful targets for developing lineage-specific therapeutics and diagnostics.
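The gene-selection criterion stated in the Results (Benjamini-Hochberg adjusted P-value < 0.05 and absolute log2 fold change > 1) can be illustrated with a minimal Python sketch. It assumes that per-gene P-values and group-mean normalized copy-number estimates have already been computed upstream. The abstract does not name the software used, so the function names, the pseudocount of 0.5 and the toy gene names below are hypothetical and are not the authors' code.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values, in the same order as the input."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so the adjusted values
    # stay monotone (a smaller P-value never gets a larger adjusted value)
    # and are capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(mean_a, mean_b, pseudocount=0.5):
    """log2 ratio of two group means; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def select_perturbed_genes(genes, pvalues, log2fc, alpha=0.05, min_abs_lfc=1.0):
    """Keep genes with BH-adjusted P < alpha and |log2 fold change| > min_abs_lfc."""
    padj = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, q, lfc in zip(genes, padj, log2fc)
        if q < alpha and abs(lfc) > min_abs_lfc
    ]


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only; not data from the study.
    genes = ["geneA", "geneB", "geneC"]
    pvalues = [0.001, 0.02, 0.04]
    log2fc = [log2_fold_change(7.5, 1.5), 0.4, -2.0]
    print(select_perturbed_genes(genes, pvalues, log2fc))  # ['geneA', 'geneC']
```

RNA-seq packages such as DESeq2 already report a BH-adjusted P-value (`padj`) and a `log2FoldChange` for each gene, so in a pipeline built on such a package only the final thresholding step would need to be written by hand.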
MeSH terms
- Lineage (genetic)
- Mycobacterium tuberculosis
- Biology
- Tuberculosis
- Evolutionary biology
- Genetics
- Computational biology