TB Research

Leveraging large language models to predict antibiotic resistance in Mycobacterium tuberculosis

Testagrose C, Pandey S, Serajian M, Marini S, Prosperi M, Boucher C

Bioinformatics (Oxford, England) · 2025-07

Abstract

Motivation: Antibiotic resistance in Mycobacterium tuberculosis (MTB) poses a significant challenge to global public health. Rapid and accurate prediction of antibiotic resistance can inform treatment strategies and mitigate the spread of resistant strains. In this study, we present a novel approach leveraging large language models (LLMs) to predict antibiotic resistance in MTB (LLMTB). Our model is trained and evaluated on genomic data from 12 185 CRyPTIC isolates and their associated resistance profiles, utilizing natural language processing techniques to capture patterns and mutations linked to resistance. The model's architecture integrates state-of-the-art transformer-based LLMs, enabling the analysis of complex genomic sequences and the extraction of critical features relevant to antibiotic resistance.

Results: We evaluate our model's performance using a comprehensive dataset of MTB strains, demonstrating its ability to achieve high performance in predicting resistance to various antibiotics. Unlike traditional machine learning methods, fine-tuning or few-shot learning opens avenues for LLMs to adapt to new or emerging drugs, thereby reducing reliance on extensive data curation. Beyond predictive accuracy, LLMTB uncovers deeper biological insights, identifying critical genes, intergenic regions, and novel resistance mechanisms. This method marks a transformative shift in resistance prediction and offers significant potential for enhancing diagnostic capabilities and guiding personalized treatment plans, ultimately contributing to the global effort to combat tuberculosis and antibiotic resistance.

Availability and implementation: All source code is publicly available at https://github.com/ctestagrose/LLMTB.

MeSH terms

  • Humans
  • Mycobacterium tuberculosis
  • Computational Biology
  • Drug Resistance, Bacterial
  • Mutation
  • Genome, Bacterial
  • Natural Language Processing
  • Machine Learning
  • Large Language Models