TB Research

Advanced Machine Learning for Predicting Drug Resistance in Clinical Isolates of Mycobacterium Tuberculosis Complex

Naoufal Sirri, Christophe Guyeux, Christophe Sola

Abstract

Tuberculosis remains a significant public health issue, and addressing multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains is a critical global health priority. Resistance primarily results from mutations in genes related to drug targets or enzyme conversions, though our understanding of these mutations is still incomplete. Whole-genome sequencing (WGS) has become a prevalent method for rapidly characterizing bacterial isolates and detecting mutations associated with drug resistance. Despite its widespread use, WGS has limitations, particularly in accounting for the evolutionary aspects of resistance. Machine learning techniques, by contrast, have shown great promise in predicting Mycobacterium tuberculosis (MTB) resistance to specific drugs and in identifying resistance markers efficiently. In this study, machine learning models were applied to a dataset of 28,073 MTB isolates, which had undergone both WGS analysis and laboratory-based drug susceptibility testing (DST) for ten antituberculosis drugs. Two advanced boosting algorithms, extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM), together with a novel deep neural network model, were employed to predict drug resistance. Separate models were constructed for each drug, using the 10 most impactful feature classes as input variables during the training phase to optimize performance. The effectiveness of the models was evaluated using several metrics: sensitivity, specificity, F1 score, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). All three models accurately predicted drug resistance, with the deep learning model outperforming existing methods. AUC values for nine drugs ranged from 0.97 to 0.99, demonstrating model robustness.
This study underscores the utility of machine learning for drug resistance prediction, effectively integrating multiple predictors, aiding clinical decision-making, and improving SNP detection as the volume of WGS data grows.
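
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy isolates are assumptions made purely for illustration, with 1 denoting a resistant isolate and 0 a susceptible one. In the study's setting, such metrics would be computed separately for each drug's model.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), with 1 = resistant and 0 = susceptible."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of resistant isolates predicted as resistant."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """Fraction of susceptible isolates predicted as susceptible."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a resistant isolate receives a higher
    score than a susceptible one; tied scores count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one isolate of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical DST labels and model scores for one drug (toy data, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("F1 score:", f1_score(tp, fp, fn))
print("AUC:", roc_auc(y_true, scores))
```

Sensitivity, specificity, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are usually reported together.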

MeSH terms

  • Mycobacterium tuberculosis
  • Drug resistance
  • Tuberculosis
  • Drug
  • Microbiology
  • Computer science
  • Medicine