TB Research

Development of machine learning models to identify potentially active compounds against tuberculosis.

Aman Rawat, Saatvik Gupta, Chiranjit Pal, Sushobhan Chowdhury

Scientific Reports · 2025-12

Abstract

Tuberculosis (TB) is a contagious bacterial disease that affects millions of people globally and is a major cause of morbidity and mortality, particularly in the developing world. The spread of multidrug-resistant (MDR) and extensively drug-resistant (XDR) tuberculosis has emerged as a major challenge to TB treatment and demands the discovery of novel drug candidates, which in turn requires fast and efficient lead identification methods. This study explores machine learning algorithms for identifying compounds potentially active against TB. Using a dataset of 23,791 molecules from 5 targets, each with a unique SMILES string and ChEMBL ID, obtained from the ChEMBL database, a total of 103 classification models were developed based on six types of molecular representation: RDKit descriptors (RDKitDes), MACCS fingerprints (MACCSFP), Morgan fingerprints (MorganFP), atom-pair fingerprints (PairsFP), PubChem fingerprints (PubChemFP), and RDKit fingerprints (RDKitFP). Seven machine learning algorithms, namely Random Forest (RF), XGBoost, Decision Tree (DT), k-Nearest Neighbors (KNN), Gaussian Naive Bayes (GNB), Logistic Regression (LR), and an artificial neural network (ANN), were used to build these models using different combinations of the dataset. Model performance was assessed using tenfold cross-validation, with the area under the receiver operating characteristic curve (AUC) as the evaluation metric. In addition, the contribution of the important descriptors to the molecules' bioactivity was interpreted using the SHapley Additive exPlanations (SHAP) method.
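
The evaluation protocol named in the abstract (tenfold cross-validation scored by AUC) can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline. The fingerprints are synthetic bit sets rather than ChEMBL molecules, the classifier is a plain k-nearest-neighbors vote using Tanimoto similarity, and every function name and parameter here (`tanimoto`, `knn_score`, `roc_auc`, `stratified_folds`, `cross_validated_auc`, `toy_dataset`, `k=5`) is a hypothetical choice made for illustration.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_score(train, query, k=5):
    """Fraction of actives among the k training fingerprints most similar to query."""
    nearest = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)[:k]
    return sum(label for _, label in nearest) / len(nearest)


def roc_auc(labels, scores):
    """AUC as the chance a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, n_folds, rng):
    """Deal shuffled indices of each class round-robin so folds keep the class ratio."""
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % n_folds].append(i)
    return folds


def cross_validated_auc(fps, labels, n_folds=10, k=5, seed=0):
    """Mean AUC over n_folds held-out folds; each fold is scored by the other folds."""
    rng = random.Random(seed)
    aucs = []
    for test_idx in stratified_folds(labels, n_folds, rng):
        held_out = set(test_idx)
        train = [(fps[i], labels[i]) for i in range(len(fps)) if i not in held_out]
        scores = [knn_score(train, fps[i], k) for i in test_idx]
        aucs.append(roc_auc([labels[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)


def toy_dataset(n=200, n_bits=64, seed=1):
    """Synthetic stand-in for real fingerprints (assumption): actives favour low bits."""
    rng = random.Random(seed)
    fps, labels = [], []
    for i in range(n):
        label = i % 2  # alternate active (1) and inactive (0)
        lo, hi = (0, n_bits // 2) if label else (n_bits // 4, n_bits)
        fps.append({rng.randrange(lo, hi) for _ in range(12)})
        labels.append(label)
    return fps, labels


fps, labels = toy_dataset()
mean_auc = cross_validated_auc(fps, labels)
print(f"mean 10-fold AUC: {mean_auc:.3f}")
```

In practice, a cheminformatics toolkit would generate the fingerprints from SMILES and a machine learning library would handle fold splitting and scoring; the sketch only makes explicit what "tenfold cross-validated AUC" measures.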

MeSH terms

  • Machine Learning
  • Antitubercular Agents
  • Humans
  • Tuberculosis
  • Algorithms
  • Mycobacterium tuberculosis
  • Drug Discovery
  • Tuberculosis, Multidrug-Resistant