Dock & Discover: Sensitizing Mycobacterium tuberculosis to Oxidative Assault via MazG Inhibition
Scott W. Nelson
Abstract
To continue my summer of learning computational biology, I decided that I wanted to learn 1) how to do molecular dynamics (MD) simulations and 2) how to do large-scale in silico docking for enzyme inhibitor discovery. I’m going to make a separate notebook on the MD simulations (which are ongoing), and this one will be for the docking experiments (also still ongoing). I’ve found these research blogs super helpful for organizing my thoughts and documenting results, so I’m just going to keep making them even though no one is reading them. First, some background material.

A Global Health Crisis: Tuberculosis (TB)

Tuberculosis, caused by the bacillus Mycobacterium tuberculosis (Mtb), remains one of the deadliest infectious diseases worldwide. Despite the discovery of antibiotics nearly a century ago, TB still kills over a million people each year, with the heaviest burden falling on low- and middle-income regions. What makes Mtb so formidable is its ability to survive and even thrive within macrophages. After inhalation, Mtb is engulfed by macrophages and sequestered in phagosomes, yet it resists the ensuing oxidative burst and lysosomal onslaught. Even more challenging, a subpopulation of bacilli can slip into a dormant, non-replicating “persister” state that is clinically silent and impervious to drugs that target active growth processes such as cell-wall synthesis or DNA replication.

This propensity for persistence forces standard treatment regimens to stretch six to nine months, combining multiple antibiotics to clear both active and latent infections. Such lengthy courses often lead to poor patient adherence, fueling the rise of multidrug-resistant (MDR) and extensively drug-resistant (XDR) TB, an alarming trend that threatens to reverse decades of progress.
In the face of this global health crisis, novel interventions that can eliminate both replicating and dormant Mtb are desperately needed.

The Dual Oxidative Assault on Mtb: Macrophages, Antibiotics, and the MazG Defense

I need to start with a little background about how macrophages deal with Mtb. When a macrophage encounters Mycobacterium tuberculosis, it engulfs the bacillus into a membrane-bound compartment called the phagosome. Almost immediately, this nascent phagosome fuses with lysosomal vesicles and activates the “oxidative burst,” a rapid production of reactive oxygen species (ROS: superoxide, hydrogen peroxide) and reactive nitrogen species (RNS: nitric oxide and peroxynitrite). These powerful oxidants are meant to inactivate and kill the pathogen by damaging proteins, lipids, various metabolites, and DNA. Inside this hostile compartment, Mtb has to defend itself on two fronts: the acidified, enzyme-rich milieu of the phagolysosome, and the barrage of ROS/RNS.

One of the most dangerous forms of metabolite damage comes from oxidized nucleotides, which, if incorporated into the genome during replication, will either induce mutagenesis or completely stall DNA polymerases, leading to DNA double-strand breaks (DSBs). To survive this onslaught, Mtb relies on specialized “housekeeping” enzymes to intercept and detoxify these oxidized nucleotides before they reach the replication fork. One of the most important defenders is MazG, and disabling it through small-molecule inhibition could tip the balance back in the macrophage’s favor. Genetic deletion of mazG hypersensitizes M. tuberculosis to hydrogen peroxide and nitric oxide: ΔmazG strains exhibit a >10-fold drop in survival after ROS challenge, and they accumulate mutations consistent with the incorporation of damaged nucleotides. These observations underscore MazG’s dual role as both a guardian of the genome and a survival factor within host macrophages.
Besides macrophages, another potential source of oxidative damage is antibiotics. We know that many of our most potent antibiotics do more than inhibit a single enzyme or process: they unleash a cascade of reactive species inside the bacterial cell. For example, isoniazid (INH) generates peroxide-derived radicals that damage membranes, metabolites, and DNA. Fluoroquinolones like moxifloxacin release a burst of hydroxyl radicals via Fenton chemistry. Even aminoglycosides such as streptomycin increase superoxide levels by inducing mistranslation of respiratory complexes.

By chemically inhibiting MazG, we essentially shut down Mtb’s final “housekeeping” defense against oxidized nucleotides, regardless of whether they’re generated by the macrophage oxidative burst or by antibiotic-induced ROS. The result is a two-pronged onslaught: damaged nucleotides accumulate to mutagenic levels during DNA replication (causing lethal G→T transversions and double-strand breaks), while antibiotics such as INH, moxifloxacin, or streptomycin deliver their own bursts of free radicals. Inhibiting MazG therefore not only sensitizes Mtb to the immune system’s oxidative weaponry but also amplifies the collateral DNA damage triggered by frontline TB drugs, paving the way for faster kill rates and a reduced likelihood of resistance.

MazG, RelA, and the Stringent Response: Disabling Mtb’s Stress Adaptation

When MazG is deleted or inactivated, the consequences extend beyond an oxidized nucleotide pool. It turns out that Mtb also loses its ability to properly engage the RelA-mediated stringent response. Under oxidative or nutrient stress, MazG’s pyrophosphatase activity provides the signals that trigger RelA to synthesize the alarmone (p)ppGpp. This small molecule reprograms the bacterium’s transcriptional landscape, downregulating growth-related genes and upregulating stress-survival pathways, including antioxidant defenses, DNA repair enzymes, and toxin-antitoxin systems.
In ΔmazG strains (or in mutants carrying active-site mutations), RelA induction is markedly decreased. Without sufficient (p)ppGpp, Mtb cannot fully activate its oxidative-stress response genes, making it more vulnerable to both the macrophage oxidative burst and antibiotic-induced ROS. Moreover, a weakened stringent response increases sensitivity to nutrient deprivation and reduces the formation of drug-tolerant persister cells, which rely on (p)ppGpp signaling to enter a dormant, antibiotic-refractory state. Thus, by inhibiting MazG we may be able to achieve a dual hit: not only do oxidized nucleotides accumulate to lethal levels, but the bacterium’s master regulator of stress survival (RelA) fails to activate the defenses needed to cope with that damage. This compounded vulnerability offers a powerful strategy to both speed Mtb killing and prevent the emergence of persistent, treatment-resistant populations.

Why I’m interested in MazG

MazG stands out as a compelling drug target for tuberculosis because it sits at the crossroads of DNA damage repair and mutagenesis. In the infected macrophage, oxidized nucleotides threaten the integrity of the mycobacterial genome, but MazG’s pyrophosphatase activity intercepts them, preventing their mutagenic or lethal incorporation. I came to know about MazG through our work on DnaE2, a translesion polymerase that is responsible for incorporating or copying over the oxidized nucleotides when MazG isn’t doing its job. From a biochemical standpoint, MazG is an attractive target: it functions as a stand-alone enzyme, unencumbered by large multiprotein assemblies or DNA scaffolds, making it straightforward to express, purify, and characterize in vitro. I also thought that, as a novice computational biologist, it would be good to start out on a protein that doesn’t require accessory factors or use DNA as a substrate.
Structurally, MazG looks interesting, and its domain architecture suggests that there may be interesting dynamics or allostery that control its activity.

MazG Structure

This is what I meant when I said “interesting domain architecture”. MazG is a domain-swapped dimeric protein (one subunit in green and the other in cyan, and they crisscross) with three domains that I’ve come to think of as the shoulders, belly, and feet. This naming came about when I was looking at the MD simulation trajectories and watching these parts dance around. The active sites where the oxidized nucleotides bind are in the belly, near the interface of the shoulders and belly. The blue spheres represent Mg2+ ions, which are necessary for its pyrophosphatase activity.

The In silico Docking Workflow: Define the pocket

FYI: I’m going to talk about how I used MD simulations to generate an “average MazG structure” from the trajectories in a different notebook.

Once you have a structure, you need to define the pocket that you want to target. This speeds up the process by telling the docking software: “please try to fit a molecule into this area”. There are many different programs designed to find pockets, and I decided to use one called fpocket (“The fpocket suite of programs is a very fast open source protein pocket detection algorithm based on Voronoi tessellation”). I’ll leave it to you to ask ChatGPT what Voronoi tessellation is; that’s what I had to do. fpocket is really easy to use, and its output includes a PyMOL script that makes it really easy to visualize the pockets (below). Each group of colored spheres represents a potential pocket. It’s a bit easier to see the pockets when you look at the protein in its surface mode (below). You can see that some of the pockets are rather deep (e.g., the orange and black ones) and most are relatively shallow (take your pick). Fortunately, fpocket quantifies these things and gives you a “druggability score” for each pocket.
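To make the ranking step concrete, here is a minimal sketch of how the per-pocket scores could be pulled out of fpocket's text summary and sorted. It assumes the summary has the usual "Pocket N :" header followed by "Name : value" lines (as in fpocket's `*_info.txt` file); the function names and the sample numbers are made up for illustration and are not the MazG results.

```python
import re

# Illustrative text in the general shape of an fpocket *_info.txt file.
# The numbers are invented for the example, NOT taken from the MazG run.
SAMPLE_INFO = """
Pocket 1 :
    Score :                 0.210
    Druggability Score :    0.050
    Number of Alpha Spheres : 30

Pocket 2 :
    Score :                 0.480
    Druggability Score :    0.910
    Number of Alpha Spheres : 95

Pocket 3 :
    Score :                 0.450
    Druggability Score :    0.880
    Number of Alpha Spheres : 80
"""


def parse_fpocket_info(text):
    """Return {pocket_number: {field_name: value}} from fpocket-style text."""
    pockets = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*Pocket\s+(\d+)\s*:", line)
        if header:
            current = int(header.group(1))
            pockets[current] = {}
            continue
        # Numeric "Name : value" lines belong to the most recent pocket header.
        field = re.match(r"\s*(.+?)\s*:\s*(-?\d+(?:\.\d+)?)\s*$", line)
        if field and current is not None:
            pockets[current][field.group(1)] = float(field.group(2))
    return pockets


def rank_by_druggability(pockets, top_n=2):
    """Return [(pocket_number, druggability_score)], best score first."""
    ranked = sorted(
        pockets.items(),
        key=lambda item: item[1].get("Druggability Score", 0.0),
        reverse=True,
    )
    return [(num, fields.get("Druggability Score", 0.0)) for num, fields in ranked[:top_n]]


if __name__ == "__main__":
    for number, score in rank_by_druggability(parse_fpocket_info(SAMPLE_INFO)):
        print(f"Pocket {number}: druggability {score:.3f}")
```

In practice you would read the real summary file from the fpocket output directory instead of `SAMPLE_INFO`; if your fpocket version labels the fields differently, the `"Druggability Score"` key would need adjusting.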
There are 67 pockets in total, and the top two scoring pockets are shown below. There are two sites with good druggability scores. Pocket 57 is the active site, where MazG binds the oxidized nucleotides that it hydrolyzes. This is a gratifying result and basically a positive control for the method, since we know that this pocket binds small molecules. The other pocket is Pocket 13, which sits above the active sites, between the shoulders and body of the protein. Its druggability score is nearly as high as that of the active site, so I think it has a lot of potential as a small-molecule binding site. I’m only showing one pocket here, but due to the two-fold symmetry there are actually two such pockets, one on the front and one on the back of the protein. I’m going to discuss this more in the MD simulation notebook, but I had already identified this location as a potential allosteric hub that is connected to the active site. A teaser figure is below (the communication network is routed through those red amino acid side chains indicated by the ‘right here!’ label). So, it would seem we now have a very promising pocket to target in our inhibitor discovery efforts.

In silico screening of compounds that bind to Pocket 13

Now that we have our pocket, it’s time to do the fun stuff of actually trying to find a compound that binds to it (and hopefully allosterically inhibits the active site). There are many, many different docking programs to choose from, but I decided to go with perhaps the most established one: AutoDock Vina. There are many reasons for this, but the primary ones are: 1) it’s open source and very well supported, and 2) my HPC cluster already had it installed as an available module.

Obtaining the ligand files

Two things are required for docking with AutoDock Vina: 1) a “receptor file”, in our case the average structure of MazG, and 2) a bunch of individual files containing the 3D information for the compounds (file type = pdbqt). Requirement #2 turned out to be much more difficult than I was expecting.
There are quite a few different sources where you can download ligand files, but none of them are in the pdbqt format that AutoDock Vina accepts. The most common format for a small molecule is SMILES (Simplified Molecular Input Line Entry System), which turns a chemical structure into something like this: CC@@H]1CC[C@H]2[C@@HCC[C@@H]1[C@@]23OO4, which represents artemisinin (2D structure below). So, you can download a large list of compounds in SMILES format, and then technically you should be able to convert that to a 2D chemical structure and from there calculate a likely 3D structure that can be used by AutoDock Vina. I spent several days going around in circles with the typical programs that do this (RDKit and Open Babel), but the resulting pdbqt files always had some problem for roughly a third of the compounds. Some compounds were still 2D, others had really strange bond lengths, and some turned into a single sphere.

Eventually, I ended up going to the ChemDiv Diversity Libraries and downloading the SDF files for three libraries: the Smart™ Library (50,213 compounds), the 3D-Diversity Natural-Product-Like Library (17,656 compounds), and the 3D-Biodiversity Library (34,000 compounds). SDF stands for Structure-Data File, a format that holds the atom position information for a compound in either 2D or 3D. In this case, ChemDiv gives you the 2D format. I tried a few different things at this point, but here’s what was successful.

Step 1: split the SDF file containing all the compounds into individual SDF files (1 file per compound).
The Python code I used requires RDKit for this:

    #!/usr/bin/env python3
    """Split a multi-molecule SDF file into one SDF file per compound."""
    import os

    from rdkit import Chem

    INFILE = "3D_Diversity_NPL_Library.sdf"
    OUTDIR = "sdf_mols"
    os.makedirs(OUTDIR, exist_ok=True)

    # removeHs=False keeps any explicit hydrogens present in the input
    suppl = Chem.SDMolSupplier(INFILE, removeHs=False)
    print(f"Found {len(suppl)} molecules in {INFILE}")

    count = 0
    for i, mol in enumerate(suppl, start=1):
        if mol is None:  # RDKit yields None for records it cannot parse
            continue
        writer = Chem.SDWriter(f"{OUTDIR}/mol_{i:05d}.sdf")
        writer.write(mol)
        writer.close()
        count += 1

    print(f"Wrote {count} individual SDFs to {OUTDIR}/")

That works great and, in this case, gave me 17,656 SDF files. Now, I should be able to go straight from SDF to PDBQT with Open Babel, but that never produced reasonable-looking structures. However, I did find that if I went SDF -> PDB -> PDBQT, the resulting 3D structures were good. The code for this is:

    #!/bin/bash
    # Convert every .sdf file in the current directory to PDB (with 3D
    # coordinates and hydrogens), then to PDBQT with Gasteiger charges.
    # Requires Open Babel (obabel) and GNU parallel.

    echo "Setting up output directories..."
    mkdir -p pdb
    mkdir -p pdbqt
    echo "Directories 'pdb' and 'pdbqt' are ready."
    echo ""

    # Check for .sdf files
    if ! ls *.sdf 1> /dev/null 2>&1; then
        echo "Error: No .sdf files found in the current directory."
        exit 1
    fi

    echo "Starting parallel conversion process..."

    convert_file() {
        file="$1"
        base_name=$(basename "$file" .sdf)

        # Skip if already completed
        if [ -f "pdbqt/${base_name}.pdbqt" ]; then
            echo "Skipping: $file (already converted)"
            return
        fi

        echo "Processing: $file"
        # Step 1: SDF -> PDB, generating 3D coordinates and adding hydrogens
        obabel "$file" -O "pdb/${base_name}.pdb" --gen3d -h

        # Step 2: PDB -> PDBQT, assigning Gasteiger partial charges
        if [ -f "pdb/${base_name}.pdb" ]; then
            obabel "pdb/${base_name}.pdb" -O "pdbqt/${base_name}.pdbqt" --partialcharge gasteiger
        else
            echo " -> ERROR: Failed to create PDB for $file"
        fi
    }
    export -f convert_file

    # Run with up to 32 parallel jobs, skipping already completed files
    parallel -j 32 convert_file ::: *.sdf

    echo "----------------------------------------------------"
    echo "All conversions complete!"
    echo "Your PDB files are in the 'pdb' directory."
    echo "Your PDBQT files are in the 'pdbqt' directory."

As always, I would like to acknowledge ChatGPT o4-mini for the coding assistance.
I’m still pretty reliant on it, but I am actually starting to understand what it’s doing and could probably do some of it myself now.

Docking using AutoDock Vina

This part is very easy once you have your receptor file (MazG), the coordinates of the box that you will place around your pocket (you can get this using PyMOL), and your ligand files. Docking is computationally expensive and takes anywhere from a few hours to a few days depending on the number of compounds. The sbatch script I used to start the job is below. The most important parts of this are the number of CPUs (32), the exhaustiveness (8, a measure of how intensively the algorithm searches the conformational space), and the x, y, z coordinates for the search box.

    #!/usr/bin/env bash
    #SBATCH --job-name=NPL
    #SBATCH --output=NPL.%j.out
    #SBATCH --error=NPL.%j.err
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=32
    # Walltime limit (DD-HH:MM:SS)
    #SBATCH --time=7-0:0:0
    #SBATCH --mem=32G

    set -euo pipefail

    # Load Conda and activate environment
    eval "$(conda shell.bash hook)"
    conda activate comp_bio

    # Load AutoDock Vina module
    module load autodock-vina

    # --- User parameters ---
    RECEPTOR="receptor_vina.pdbqt"
    LIGDIR="/lustre/hdd/LAS/swn-lab/swn/MazG_Docking/Compound_libraries/3D_NPL_Library"
    OUTDIR="/lustre/hdd/LAS/swn-lab/swn/MazG_Docking/Compound_libraries/3D_NPL_Library/docking_results"
    CPU=32
    EXH=8
    CENTER_X=75.000
    CENTER_Y=55.640
    CENTER_Z=60
    SIZE_X=20
    SIZE_Y=35
    SIZE_Z=30
    # ------------------------

    # Prepare output
    mkdir -p "$OUTDIR"

    # The loop below is a sketch of the per-ligand docking step; the ligand
    # glob, the output file naming, and the failure counter are assumptions.
    echo "Starting Vina docking on ligands in $LIGDIR"
    failed=0
    for lig in "$LIGDIR"/*.pdbqt; do
        base=$(basename "$lig" .pdbqt)
        echo "Docking $base"
        vina --receptor "$RECEPTOR" --ligand "$lig" \
             --center_x "$CENTER_X" --center_y "$CENTER_Y" --center_z "$CENTER_Z" \
             --size_x "$SIZE_X" --size_y "$SIZE_Y" --size_z "$SIZE_Z" \
             --cpu "$CPU" --exhaustiveness "$EXH" \
             --out "$OUTDIR/${base}_out.pdbqt" || {
            echo "WARNING: docking failed for $base"
            failed=$((failed + 1))
            continue
        }
    done

    echo "Docking finished; results are in $OUTDIR"
    if [ "$failed" -gt 0 ]; then
        echo "Some ligands failed to dock ($failed)"
    fi

This of 17,656 compounds days to using a single and 32 You may that the output below is from a python script called This was a pretty script to and and I’ll this with else to an Open python time days abit to me and I’m on out how to speed it Eventually, I want to of so I need to make this more In case, below are the top from the (the script for is also on I’ve by a docking that near are good of by
is As you can we have compounds that are than the we are looking for a I’ll go through each one of these in the several with from the other two but here’s a teaser think we found some promising This is likely be a compound that I and against MazG in the to see if it can to a below for a bunch of including a This is what ChatGPT has to about the & = well the which well for both cellular and with two and that should good & This is below the of yet high to to around at This should you to up to in biochemical for & up to you can characterize in the to and has a good of macrophages to reach & You likely need in your should be you need to above you could an on the nitrogen to a For you small (e.g., a or on the to into the and a balance between and and give you a to MazG inhibition in both biochemical and cellular with
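For the docking step described above, here is a minimal sketch of how the top hits could be collected once the run finishes. It assumes each ligand has its own Vina output PDBQT file in one directory, and it relies on the `REMARK VINA RESULT:` lines that Vina writes for each pose, where the first number is the predicted affinity in kcal/mol. The function names and the directory name in the usage comment are hypothetical, not the script used for the results in this post.

```python
from pathlib import Path


def best_affinity(pdbqt_text):
    """Return the most favourable (lowest) Vina affinity in kcal/mol, or None."""
    scores = []
    for line in pdbqt_text.splitlines():
        # Each docked pose carries a line like:
        # REMARK VINA RESULT:    -7.5      0.000      0.000
        if line.startswith("REMARK VINA RESULT:"):
            scores.append(float(line.split()[3]))
    return min(scores) if scores else None


def rank_results(results_dir, top_n=10):
    """Return [(ligand_name, best_affinity)] sorted from strongest to weakest."""
    ranked = []
    for path in sorted(Path(results_dir).glob("*.pdbqt")):
        score = best_affinity(path.read_text())
        if score is not None:  # skip files without any docked pose
            ranked.append((path.stem, score))
    ranked.sort(key=lambda item: item[1])  # more negative = better predicted binding
    return ranked[:top_n]


if __name__ == "__main__":
    # "docking_results" is a placeholder; point this at your own output directory.
    for name, score in rank_results("docking_results", top_n=10):
        print(f"{name}\t{score:.1f} kcal/mol")
```

Sorting by the single best pose is the simplest choice; predicted affinities tend to favor larger molecules, so a size-normalized score (for example, affinity per heavy atom) may be worth comparing before picking compounds to order.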
MeSH terms
- DOCK
- Mycobacterium tuberculosis
- Oxidative phosphorylation
- Tuberculosis
- Chemistry
- Microbiology
- Biology
- Medicine