Novel whole-genome sequencing approaches to study drug resistance, transmission, and evolution of the Mycobacterium tuberculosis complex
Ana María García Marín
Repository of Digital Objects for Teaching, Research and Culture (University of Valencia) · 2025-01
Abstract
Tuberculosis (TB) is an airborne infectious disease with a high global burden. Despite decades of global efforts to control it, TB remains one of the leading causes of death worldwide. Social inequalities, difficult diagnosis, complex treatments, and rising drug resistance together make TB an enduring public health challenge. The Mycobacterium tuberculosis complex (MTBC), the group of closely related bacteria that cause TB, has a biology that makes it exceptionally resilient. Whole-genome sequencing (WGS) has emerged as a transformative tool for understanding MTBC biology and evolution. However, genomic studies lack global standardization, and conventional short-read technologies leave important regions of the genome unexplored. This dissertation investigates how advanced genomic approaches can overcome these limitations and generate insights of both clinical and epidemiological relevance. First, it evaluates the recently published WHO catalogue of mutations associated with drug resistance. Applied to a population-based dataset of isolates from the Comunitat Valenciana, the catalogue proved reliable for predicting resistance to first-line drugs, confirming that genome-based diagnostics can support individualized treatment even in low-burden regions. The results also showed that the catalogue needs to be continually updated as new evidence emerges. The work then turns to the challenge of capturing diversity across the entire genome through long-read sequencing. An optimized extraction protocol was developed to meet the demanding requirement for high-integrity DNA. This protocol enabled the creation of the largest collection to date of high-quality complete MTBC genomes. This unique dataset made it possible to study diversity across evolutionary scales. The analyses revealed higher levels of variation than short-read methods had previously estimated and highlighted the role of the pe/ppe gene family.
These genes, typically excluded from genomic studies, showed diversity hotspots caused by gene conversion, a homologous recombination mechanism that may affect host-pathogen interactions. Finally, the dissertation demonstrates the added value of complete genomes for studying transmission. Complete genomes recovered variants in previously masked regions and identified indels and structural variants missed by short reads. As a result, they improved the resolution of transmission clusters and enabled more accurate reconstruction of transmission networks. Using cluster- and patient-specific references uncovered genuine within-host diversity. This diversity provided clues about microevolution during infection and further sharpened the resolution of transmission events. The thesis validates standardized resistance prediction, produces a comprehensive dataset of complete genomes, and applies both to questions of diversity and transmission. Taken together, this work highlights that the future of tuberculosis control lies in fully integrating genomic insights into clinical practice and global control strategies.
MeSH terms
- Mycobacterium tuberculosis complex
- Tuberculosis
- Mycobacterium tuberculosis
- Standardization
- Biology
- Computational biology
- Genomics
- Genome
- Drug resistance
- Infectious disease (medical specialty)
- Data science
- Public health
- Precision medicine
- Global health
- DNA sequencing
- Whole genome sequencing
- Disease
- Genetics