AI-Driven Workflows Integrating Strain Diversity and Host Metabolic-Genetic Variation for Tuberculosis Treatment Prioritization
Andrea Valenzuela
Open MIND · 2025-01
Abstract
Tuberculosis (TB) remains the leading cause of death from infectious diseases, with treatment complicated by the growing prevalence of drug-resistant Mycobacterium tuberculosis (MTB) strains. Current regimens are largely standardized, overlooking substantial metabolic and genetic heterogeneity among clinical isolates, even within the same lineage or resistance phenotype. Large-scale experimental screening of drug combinations across diverse strains and host contexts is infeasible due to MTB's slow growth rate, cost, and hazardous study conditions, creating a need for scalable computational approaches that can prioritize mechanistic leads. This dissertation presents AI-driven, multi-omics workflows that integrate MTB strain diversity and host macrophage metabolic-genetic variation to prioritize such leads for tailored TB therapy. Genome-scale metabolic models (GEMs) were coupled with machine learning to predict drug interaction scores. Two novel, complementary genetics-informed systems biology pipelines link these predictions to strain-specific pathways, key metabolic reactions, host-informed gene knockout effects, and regulatory single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS). Across 43 drugs, over 13,000 drug combination predictions were generated for 20 GEMs (19 MTB, 1 host–pathogen), identifying both broadly synergistic combinations and strain-specific synergies. Using the sMtb-RECON (host macrophage–pathogen) model, in silico knockouts of over 2,000 genes were profiled for the standard HRZE regimen, ranking metabolic genes by their impact on interaction scores. High-confidence regulatory GWAS variants were mapped to these genes, with predicted effects evaluated via gene knockout simulations. Benchmarking against available in vitro and clinical datasets yielded high correlations, supporting the predictive strength of the approach.
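The distinction between broadly synergistic and strain-specific combinations can be illustrated with a minimal sketch. It is not the dissertation's pipeline: the function name, the drug and strain labels, the toy scores, the sign convention (negative scores taken to indicate synergy), the cutoff of -0.1, and the 80% "broad" fraction are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumption: scores below this cutoff count as synergy (negative = synergistic).
SYNERGY_THRESHOLD = -0.1


def classify_combinations(scores, n_strains, threshold=SYNERGY_THRESHOLD,
                          broad_fraction=0.8):
    """Split combinations into broadly synergistic and strain-specific sets.

    scores maps (strain, combination) -> predicted interaction score.
    A combination is "broad" if it is synergistic in at least
    broad_fraction of strains, and "specific" if synergistic in exactly one.
    """
    synergistic_strains = defaultdict(set)
    for (strain, combo), score in scores.items():
        if score < threshold:
            synergistic_strains[combo].add(strain)

    broad, specific = {}, {}
    for combo, strains in synergistic_strains.items():
        if len(strains) >= broad_fraction * n_strains:
            broad[combo] = sorted(strains)
        elif len(strains) == 1:
            specific[combo] = sorted(strains)
    return broad, specific


# Toy, made-up scores for three hypothetical strains and three combinations.
toy_scores = {
    ("strain_A", ("drug_1", "drug_2")): -0.40,
    ("strain_B", ("drug_1", "drug_2")): -0.30,
    ("strain_C", ("drug_1", "drug_2")): -0.50,
    ("strain_A", ("drug_1", "drug_3")): 0.20,
    ("strain_B", ("drug_1", "drug_3")): -0.35,
    ("strain_C", ("drug_1", "drug_3")): 0.10,
    ("strain_A", ("drug_2", "drug_3")): 0.30,
    ("strain_B", ("drug_2", "drug_3")): 0.25,
    ("strain_C", ("drug_2", "drug_3")): 0.05,
}

broad, specific = classify_combinations(toy_scores, n_strains=3)
print("broadly synergistic:", broad)
print("strain-specific:", specific)
```

With these toy values, the first combination is flagged as broadly synergistic and the second as specific to one strain. Real thresholds would need to be calibrated against the benchmarking data.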
In summary, these workflows generate testable hypotheses for experimental validation and provide a proof-of-concept for scalable, AI-driven integration of pathogen strain diversity, host macrophage metabolism, and genetic variation in mechanistic lead prioritization for TB drug development. The dissertation is organized into four chapters. Chapter 1 introduces the global burden of TB, the clinical and mechanistic limitations of current treatment strategies, and the rationale for computational approaches that integrate MTB strain diversity and host biology. It presents the research questions, objectives, and aims, and discusses the study’s scope, underlying assumptions, and significance. Chapter 2 details the development of the Strain-specific Tuberculosis Antibiotic Regimen (STAR) pipeline, an extension of the Condition-specific Antibiotic Regimen Assessment using Mechanistic Learning (CARAMeL) framework. STAR integrates drug interaction data, strain-specific GEMs, and flux-informed machine learning to predict drug combination outcomes across 19 diverse MTB clinical strains. The approach identifies synergistic or antagonistic combinations tailored to each strain and highlights the key metabolic reactions driving these outcomes. Chapter 3 investigates genome-scale gene knockout simulations in the host–pathogen model, focusing on the impact of high-probability regulatory variants on drug response. This work emphasizes the interplay between host and pathogen genetics and the value of incorporating variant data into mechanistic treatment models. Chapter 4 examines the translational potential of the AI-driven framework, outlining its integration into clinical decision-support systems, educational tools for healthcare providers, and associated ethical considerations. Collectively, these chapters present mechanistically interpretable, AI-enhanced workflows for prioritizing TB treatment leads.
The workflows developed herein address a critical gap by enabling researchers to focus experimental resources on the most promising mechanistic leads, a capability that is particularly valuable for resistant strains where laboratory testing is most challenging.
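The knockout-ranking step mentioned in the abstract (ranking genes by their impact on interaction scores) might look roughly like the sketch below. It assumes impact is measured as the absolute change in predicted interaction score relative to the unperturbed model, which the abstract does not specify. The function name, gene labels, and scores are hypothetical toy values, not results from the dissertation.

```python
def rank_knockouts(baseline_score, knockout_scores):
    """Rank genes by how much their knockout shifts the interaction score.

    baseline_score: predicted score for the regimen with no knockout.
    knockout_scores: dict mapping gene -> predicted score after knockout.
    Returns (gene, delta) pairs sorted by |delta|, largest first.
    """
    deltas = {gene: score - baseline_score
              for gene, score in knockout_scores.items()}
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy, made-up values: a baseline score and three hypothetical gene knockouts.
baseline = -0.30
toy_knockouts = {"gene_1": -0.28, "gene_2": 0.10, "gene_3": -0.55}

ranked = rank_knockouts(baseline, toy_knockouts)
for gene, delta in ranked:
    print(f"{gene}: delta = {delta:+.2f}")
```

Sorting by absolute change keeps knockouts that push the regimen toward either synergy or antagonism near the top. A signed ranking would be the natural alternative if only one direction is of interest.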
MeSH terms
- Computational biology
- In silico
- Biology
- Workflow
- Mycobacterium tuberculosis
- Systems biology
- Gene
- Single-nucleotide polymorphism
- Genetics
- Tuberculosis
- Genomics
- Mechanism (biology)
- Genome-wide association study
- Host (biology)
- Benchmarking
- Bioinformatics
- Prioritization
- DNA microarray
- Drug repositioning
- Genetic variation
- Ranking (information retrieval)
- Strain (injury)
- Drug resistance
- Druggability
- Context (archaeology)