Decision letter: Combining genomics and epidemiology to analyse bi-directional transmission of Mycobacterium bovis in a multi-host system
Christian Gortázar
Abstract
Quantifying pathogen transmission in multi-host systems is difficult, as exemplified in bovine tuberculosis (bTB) systems, but is crucial for control. The agent of bTB, Mycobacterium bovis, persists in cattle populations worldwide, often where potential wildlife reservoirs exist. However, the relative contribution of different host species to bTB persistence is generally unknown. In Britain, the role of badgers in infection persistence in cattle is highly contentious, despite decades of research and control efforts. We applied Bayesian phylogenetic and machine-learning approaches to bacterial genome data to quantify the roles of badgers and cattle in M. bovis infection dynamics in the presence of data biases. Our results suggest that transmission occurs more frequently from badgers to cattle than vice versa (10.4 times more frequently in the most likely model) and that, for both species, within-species transmission occurs at higher rates than between-species transmission. If representative, our results suggest that control operations should target both cattle and badgers.
eLife digest
Disease-causing microbes that infect more than one type of animal can be difficult to control. This is especially true when they infect wildlife. For example, Mycobacterium bovis is a bacterium that causes tuberculosis in tens of thousands of cattle in Britain every year and also infects badgers and other wildlife. Controlling the infections in cattle is essential, as it helps prevent the bacteria from infecting humans, improves cattle welfare and reduces the substantial costs to the livestock industry. Analysing the relatedness of M. bovis genomes from infected cattle and badgers may help scientists work out how often badgers infect cattle and vice versa.
Scientists have collected data and M. bovis samples from infected badgers in Woodchester Park, in England, for over three decades. Using these data and additional information about M. bovis infecting nearby cattle may help scientists learn how the bacterium spreads and how to stop it. Now, Crispell et al. show that complex patterns of contact between cattle and badgers likely drive the persistence of tuberculosis in cattle, also known as bovine tuberculosis. In three separate analyses, Crispell et al. compared the genomes of M. bovis found in cattle and badgers, the animals' locations, when they were infected, and whether they could have been in contact. The analyses found that M. bovis was likely to have been transmitted more frequently from badgers to cattle than from cattle to badgers. They also showed that transmission within each species happened more often than transmission between species. If these results are confirmed by other studies, they may help scientists develop better strategies for controlling tuberculosis in British cattle. In particular, controversial control strategies, such as badger culls, could be better targeted to combat tuberculosis in cattle while having less of an impact on badgers. These insights might also aid control efforts in other countries where bovine tuberculosis is a problem and an important source of human tuberculosis.
Introduction
Control of a pathogen in a system where it can infect multiple species requires an understanding of the role of each host species in the infection dynamics (Haydon et al., 2002). For example, when each host species is capable of maintaining infection independently, control operations in one species can be rendered ineffective as a result of spillover from another.
Mycobacterium bovis infection in cattle populations (resulting in bovine tuberculosis, bTB) is a problem around the world (Ayele et al., 2004; Cousins and Roberts, 2001; de Kantor and Ritacco, 2006; Godfray et al., 2013; Reviriego Gordejo and Vermeersch, 2006; Schmitt et al., 2002), with many wildlife species implicated in its spread and persistence in different bTB systems (Delahay et al., 2002; Gortazar et al., 2003; Miller and Sweeney, 2013; Nugent, 2005; Nugent et al., 2015). On the islands of Britain and Ireland, the current evidence suggests that effective control of infection in cattle is hindered by transmission from an infected wildlife population, the European badger (Meles meles) (Godfray et al., 2013). Although a considerable amount of research demonstrates an association between M. bovis found in sympatric cattle and badger populations (Balseiro et al., 2013; Goodchild et al., 2012; Olea-Popelka et al., 2005; Vial et al., 2011; Woodroffe et al., 2005), quantification of the direction and extent of transmission remains elusive. Recent studies using whole genome sequences (WGS) have demonstrated a close genetic relationship among M. bovis isolates taken from sympatric cattle and wildlife populations (Biek et al., 2012; Glaser et al., 2016; Patané et al., 2017). However, the low genomic variability of M. bovis and imbalanced sampling across host species have limited the ability to identify the direction of transmission. Evidence to date suggests that, even with access to pathogen sequence data, obtaining directional estimates of transmission might only be possible at the population level and will require dense targeted sampling and fine-grained epidemiological metadata (Kao et al., 2016; Kao et al., 2014), as has previously been demonstrated in investigations of M. tuberculosis outbreaks in humans (Bryant et al., 2013; Gardy et al., 2011; Guthrie et al., 2018; Walker et al., 2012; Walker et al., 2018; Yang et al., 2017) and in tracing M. bovis outbreaks between cattle herds (Biek et al., 2012; Salvador et al., 2019). However, these approaches have yet to be applied to situations where dense multi-host pathogen data are available.
Since the 1970s, a high-density naturally infected badger population at Woodchester Park in southwest England has been the subject of detailed study (Delahay et al., 2013). Both the resident badgers and sympatric cattle herds are frequently infected with M. bovis, providing the potential for inter-species transmission of infection to occur in either direction (DEFRA, 2017; Delahay et al., 2013). The data and samples associated with bTB occurrence in and around Woodchester Park are uniquely detailed, with individual-level host life history data and archived M. bovis isolates available for both the cattle (Orton et al., 2018) and badger (Delahay et al., 2013) populations. By combining WGS of selected cattle and badger isolates with detailed local population data from this exceptionally in-depth study system, our work aimed to quantify the relative roles of the local badger and cattle populations in the spread and persistence of M. bovis in an endemic area. Based on previous evidence of transmission between cattle and badgers, and the success of combining detailed tracing methods with WGS for M. tuberculosis, we hypothesised that M. bovis circulation in our endemic setting is not limited to a single maintenance host and that it involves bi-directional transmission between the two host populations.
Our research aimed to test this hypothesis and to quantify transmission patterns by analysing the Woodchester Park data using a series of statistical and observational analyses linking pathogen genome data with diagnostic testing, population movement and demographic data for both cattle and badgers.
Results
Selecting the isolates, generating and processing the sequencing data
Archived M. bovis isolates were available from 116 badgers and 189 cattle living in and around Woodchester Park. Multiple isolates were available from the sampled badgers, resulting in a total of 230 isolates sourced from badgers. These isolates were whole genome sequenced and, after quality assessments, 193 badger-derived sequences (from 98 individual badgers sampled from 2000 to 2011) and 159 cattle-derived sequences (from 1988 to 2013) were retained for further analyses.
Evidence of epidemiological signatures in the genetic data
To investigate the presence of spatial, temporal, and network signatures associated with infection dynamics in the M. bovis genomic data, inter-sequence genetic distances were calculated between all the cattle- and badger-derived sequences and compared to population metrics. The metrics described the spatial, temporal, and network-based relationships that were expected to be associated with pathogen transmission. The genetic and epidemiological data were compared using Random Forest (Liaw and Wiener, 2002) and Boosted Regression (Elith et al., 2008) models in R (v3.4.3; R Development Core Team, 2016) to separately analyse badger–badger (n = 12483), cattle–cattle (n = 1927), and badger–cattle (n = 4838) comparisons. The Random Forest (and Boosted Regression) models were able to explain approximately 67% (62%), 60% (54%) and 75% (70%) of the variation observed in the inter-sequence genetic distance distributions associated with the badger–badger, cattle–cattle, and badger–cattle comparisons, respectively.
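The first step of this comparison, computing inter-sequence genetic distances and grouping each pair by host combination, can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes pre-aligned SNV strings with 'N' marking a missing call, uses hypothetical function names and toy data, and leaves out the spatial, temporal and network metrics and the Random Forest and Boosted Regression fits (which the text reports were run in R). It enumerates every pair, so it does not reproduce the comparison counts reported above, which may reflect pair selection not described here.

```python
from itertools import combinations


def snv_distance(seq_a: str, seq_b: str) -> int:
    """Count sites at which two aligned SNV strings differ, skipping sites
    where either sequence has an 'N' (assumed to mark a missing call)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a != "N" and b != "N"
    )


def pairwise_by_comparison(isolates):
    """isolates: list of (isolate_id, host, aligned_sequence) tuples, where
    host is "badger" or "cattle". Returns each pair's SNV distance, grouped
    by host combination (hypothetical helper)."""
    groups = {"badger-badger": [], "cattle-cattle": [], "badger-cattle": []}
    for (id_a, host_a, seq_a), (id_b, host_b, seq_b) in combinations(isolates, 2):
        key = f"{host_a}-{host_b}" if host_a == host_b else "badger-cattle"
        groups[key].append((id_a, id_b, snv_distance(seq_a, seq_b)))
    return groups


# Toy alignment (hypothetical data, not from the study).
isolates = [
    ("B1", "badger", "ACGTA"),
    ("B2", "badger", "ACGTT"),
    ("C1", "cattle", "ACCTT"),
    ("C2", "cattle", "NCCTT"),
]
for comparison, pairs in pairwise_by_comparison(isolates).items():
    print(comparison, pairs)
```

Each resulting distance would then be joined to the epidemiological metrics for the same pair of animals before model fitting; skipping 'N' sites is one simple convention, and other treatments of missing data are possible.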
For each of these models, metrics based on spatial and temporal distances were the most informative in explaining the variation in the genetic distances. Generally, as the temporal and spatial distances between the sampled animals decreased, the number of differences between their M. bovis genomes decreased (Appendix 1—figures 5, 6 and 7). There was substantial agreement in the variable rankings between the Random Forest and Boosted Regression models (Appendix 1—figures 2, 3 and 4). For the within-species comparisons, the network-based metrics were also highly informative. Generally, the number of differences between the genomes associated with a pair of animals of the same species decreased as the connectedness of their social groups (badgers) or herds (cattle) increased. The variation explained by the Random Forest models and the high ranking of spatial, temporal, and network-based metrics were robust to the presence of highly correlated or non-informative metrics and of metrics with missing data (data not shown).
Inter-species clades identified in the phylogeny
The relatedness of M. bovis genomes sampled from the cattle and badgers was evaluated by constructing a phylogenetic tree (Figure 1) using RAxML (v8.2.11; Stamatakis, 2014). Genetic diversity was observed among the cattle- and badger-derived M. bovis sequences, with the number of Single Nucleotide Variants (SNVs) between sequences ranging from 0 to 150 (median = 20). Five clades containing both cattle- and badger-derived sequences were identified (Figure 1 and Figure 1—figure supplement 1) using a 10 SNV threshold (informed by thresholds used for M. tuberculosis [Bryant et al., 2013; Jajou et al., 2018; Roetzer et al., 2013; Yang et al., 2017]).
Figure 1. A Maximum Likelihood phylogenetic tree constructed using RAxML (v8.2.11; Stamatakis, 2014) and rooted against the Mycobacterium bovis reference sequence, AF2122/97 (Malone et al., 2017). Badger and cattle isolates are represented at the tips of the phylogeny by circles and triangles, respectively. Five clades, labelled 1–5, are highlighted with cyan, pink, green, purple, and brown branches, respectively. Cattle and badger isolates within the clades can be distinguished by their shape and colour. Each internal node in the phylogeny is shown as a grey to black shaded circle, with the intensity of the shading indicating the amount of support each node had across 100 bootstraps.
Four of the five clades (1–4) contained highly similar (within three SNVs) badger- and cattle-derived M. bovis sequences. The badger-derived M. bovis sequence in clade 5 was six SNVs away from its closest cattle-derived sequence. The similarities between the cattle-derived and badger-derived M. bovis sequences in clades 1–4 indicate recent shared transmission histories (Meehan et al., 2018). Clade 4 (highlighted in purple in Figure 1) contained the majority (156/193) of the badger-derived M. bovis sequences and represents the main lineage circulating within the Woodchester Park badger population. In addition, the presence of 16 cattle-derived sequences in clade 4, 15 of which were distant (up to 12 SNVs) from the clade root, is consistent with multiple badger-to-cattle transmission events. In contrast, the presence of cattle-derived sequences close to the roots of clades 1–5 suggests that these lineages might have originated in cattle, although this pattern could also be explained by the cattle population being sampled up to 12 years before the badger population (cattle were sampled from 1988 to 2013 and badgers from 2000 to 2011). Although clades 1 and 5 contained highly similar sequences originating from cattle and badgers, each clade was associated with only eight animals, making meaningful inference of inter-species transmission patterns difficult. In addition to inter-species clades, several cattle-only clades were identified (Figure 1).
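The clades above were identified on the RAxML tree. As a simpler, tree-free stand-in (a swapped-in technique, not the authors' procedure), isolates can be grouped by single-linkage clustering on pairwise SNV distances at the same 10 SNV threshold, and groups containing both host species flagged as candidate inter-species clusters. The sketch below assumes a distance of at most 10 SNVs counts as "within" the threshold, and all names and data are hypothetical.

```python
def threshold_clusters(ids, dist, threshold=10):
    """Single-linkage clustering: two isolates share a cluster if they are
    connected by a chain of pairs each at most `threshold` SNVs apart.
    dist maps frozenset({a, b}) -> SNV distance; absent pairs never link."""
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, d in dist.items():
        if d <= threshold:
            a, b = tuple(pair)
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


def mixed_host_clusters(clusters, host):
    """Keep only clusters containing isolates from more than one host species."""
    return [c for c in clusters if len({host[i] for i in c}) > 1]


# Hypothetical example data.
ids = ["B1", "B2", "B3", "C1", "C2", "C3"]
host = {"B1": "badger", "B2": "badger", "B3": "badger",
        "C1": "cattle", "C2": "cattle", "C3": "cattle"}
dist = {
    frozenset({"B1", "B2"}): 3,
    frozenset({"B2", "C1"}): 8,
    frozenset({"B1", "C1"}): 11,
    frozenset({"B3", "C2"}): 15,
    frozenset({"C2", "C3"}): 2,
}
clusters = threshold_clusters(ids, dist)
print(clusters)                             # [['B1', 'B2', 'C1'], ['B3'], ['C2', 'C3']]
print(mixed_host_clusters(clusters, host))  # [['B1', 'B2', 'C1']]
```

Note the chaining behaviour: B1 and C1 are 11 SNVs apart yet share a cluster through B2. A tree-based definition can split such chains differently, which is one reason the phylogeny, rather than a flat threshold, is the more informative basis for the clades described above.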
Consistent with our hypothesis, the close genetic proximity of M. bovis genomes sourced from cattle and badgers suggests that inter-species transmission occurred in the sampled system. In addition, the presence of clades dominated by a single species suggests that sustained within-species transmission has been occurring in both the cattle and badger populations. The life histories of the sampled cattle and badgers, and of the in-contact animals, associated with the inter-species clades (clades 1–5) identified in Figure 1 were examined. In this manuscript, a badger or cow is considered 'sampled' if one of the M. bovis genomes analysed here was sourced from it. In-contact animals were defined as those that lived in the same herd (for cattle) or social group (for badgers) at the same time as one or more of the sampled animals, according to the available data. Examination of the life history data identified further evidence of inter-species transmission and of disease maintenance in the Woodchester Park badger population for the animals associated with clade 4 (Figure 2; equivalent figures for the remaining clades can be found in Figure 2—figure supplements 1, 2, 3, and 4). Infection was detected in the majority of the sampled badgers before it was detected in the majority of the sampled cattle. Based on the available capture and sampling data, sampled badgers were present in Woodchester Park from at least 1993 until 2011 (Figure 2c). The sampled badgers were in contact with 575 captured badgers, 291 (51%) of which had tested positive for M. bovis infection at some point in their lives (Figure 2a). In contrast, the sampled cattle were in contact with 1760 cattle, of which only 312 (18%) tested positive for M. bovis (Figure 2b).
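The percentages quoted for the clade 4 in-contact animals follow directly from the reported counts; a short check is sketched below (the helper name is hypothetical). The two figures are descriptive rather than strictly comparable, since the badger count covers animals positive at any point in their lives and the cattle count reflects the cattle testing regime.

```python
def percent_positive(positive: int, total: int) -> int:
    """Share of in-contact animals that tested positive, as a whole-number percentage."""
    return round(100 * positive / total)


# Counts for the clade 4 in-contact animals, as reported in the text.
in_contact = {
    "badgers": (291, 575),
    "cattle": (312, 1760),
}
for species, (positive, total) in in_contact.items():
    print(f"{species}: {positive}/{total} = {percent_positive(positive, total)}%")
```

This prints 51% for badgers and 18% for cattle, matching the values in the text: roughly half of in-contact badgers, but fewer than one in five in-contact cattle, tested positive.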
In the animals associated with clade 4, infection was detected earlier in badgers than in cattle, except in the case of one cow, despite the cattle population being sampled over a broader temporal and spatial window (see Materials and methods section: 'Selecting the isolates' for more details). In addition, badgers were the most represented species in clade 4. These two observations suggest that the clade 4 lineage was being maintained in the badger population. The single cattle-derived sequence found closest to the root node of clade 4 (Figure 2c) was sourced from an animal sampled six years before any badger-derived sequences were available. Across all inter-species clades investigated, the sampled cattle (n = 71) were in contact with approximately 11,732 animals, 1356 of which tested positive for M. bovis infection, whereas the sampled badgers (n = 97) were in contact with approximately 650 badgers, over half of which (329) tested positive.
Figure 2. Life history summaries of the sampled and in-contact cattle and badgers associated with clade 4 in Figure 1. (a) The number of in-contact badgers associated with the sampled badgers (total in grey, number of animals that have tested positive in red). (b) The number of in-contact cattle associated with the sampled cattle (total in grey [right axis], number of animals that reacted inconclusively [red] or positively [blue] to routine skin test [left axis]). In-contact animals are those that lived in the same herd (cattle) or social group (badgers) at the same time as the sampled animals. (c) The recorded lifespans of the sampled cattle (black horizontal bars) and badgers (grey horizontal bars) associated with clade 4.
Estimated inter-species transmission rates
Although the patterns observed in the phylogenetic and animal life history data were consistent with inter-species transmission in both directions, further analyses were required to quantify the inter-species transmission rates. These analyses needed to account for the temporal and spatial sampling biases resulting from the broader sampling window applied to the cattle population in time (1988 to 2013 versus 2000 to 2011) and space (cattle were sampled from up to 100 km away from the Woodchester Park area, whereas the badgers were only sampled from within Woodchester Park). A series of analyses were conducted using the Bayesian Structured coalescent Approximation (BASTA) package (De Maio et al., 2018), available as part of the Bayesian evolutionary analysis platform BEAST2 (Bayesian Evolutionary Analysis by Sampling Trees; Bouckaert et al., 2014). These analyses aimed to estimate the M. bovis inter-species transmission rates between the sampled badger and cattle populations. BASTA can estimate evolutionary dynamics in a structured population while accounting for sampling biases. Here, the sampled M. bovis population was treated as structured because it was circulating largely separately in the sampled cattle and badger populations, as seen in Figure 1 and in the strong population-specific epidemiological signatures found by the Random Forest and Boosted Regression analyses. Further structure exists within the cattle and badger populations, as these are subdivided into herds and social groups, respectively. A series of increasingly spatially structured population models was defined to determine whether the inter-species transmission rates estimated using BASTA were affected by the spatial patterns evident from the Random Forest and Boosted Regression analyses. Structured population models were also chosen to address the spatial sampling biases, by introducing an increasingly structured unsampled badger population.
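To make the idea of increasingly structured models concrete, the sketch below labels each sequence with a deme under a two-deme (host species) and a four-deme (host species by inner/outer ring) structure, after applying the subset filter reported in the next paragraph (1999 to 2014, within 10 km; inner ring within 3.5 km). It is a hypothetical illustration, not BASTA input: the `Isolate` fields, inclusive bounds and labels are assumptions, and unsampled demes (which carry no sequences) are not represented.

```python
from dataclasses import dataclass


@dataclass
class Isolate:
    isolate_id: str
    host: str           # "badger" or "cattle"
    year: int           # sampling year
    distance_km: float  # distance from Woodchester Park (hypothetical field)


def in_analysis_subset(iso: Isolate, max_km: float = 10.0,
                       first_year: int = 1999, last_year: int = 2014) -> bool:
    """Subset filter on space and time; inclusive bounds are an assumption."""
    return iso.distance_km <= max_km and first_year <= iso.year <= last_year


def assign_deme(iso: Isolate, n_demes: int = 2, inner_km: float = 3.5) -> str:
    """Label an isolate with its deme under a 2- or 4-deme structure."""
    if n_demes == 2:
        return iso.host
    if n_demes == 4:
        ring = "inner" if iso.distance_km <= inner_km else "outer"
        return f"{iso.host}-{ring}"
    raise ValueError("only the 2- and 4-deme structures are sketched here")


isolates = [
    Isolate("C1", "cattle", 2005, 2.0),
    Isolate("B1", "badger", 2003, 5.0),
    Isolate("C2", "cattle", 1990, 4.0),   # outside the year window
    Isolate("C3", "cattle", 2008, 40.0),  # outside the 10 km radius
]
for iso in filter(in_analysis_subset, isolates):
    print(iso.isolate_id, assign_deme(iso), assign_deme(iso, n_demes=4))
```

Adding demes in this way lets the model separate spatial structure from species structure, so that an inter-species rate is not inflated simply because the two hosts were sampled over different areas.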
Previous analyses have used BASTA in a similar fashion to estimate evolutionary dynamics in the presence of unsampled populations (De Maio et al., 2015). To further reduce the influence of the spatial and temporal biases and the computational load, the BASTA analyses used a subset of the cattle- (n = 83) and badger-derived (n = 97) M. bovis sequences obtained between 1999 and 2014 within 10 km of Woodchester Park. The AICM (Akaike's Information Criterion Markov Chain Monte Carlo) score (Baele et al., 2013) was used to compare the BASTA analyses based on different structured populations (Figure 3a). The structured population with two demes (M. bovis populations in badgers and cattle) had the best (lowest) AICM score, although there was considerable overlap with the bootstrapped AICM score interval for one of the four-deme models (splitting the M. bovis populations in badgers and cattle into inner and outer populations based on being within or beyond 3.5 km from Woodchester Park [Figure 3a]). The inter-species transition rates estimated by each BASTA analysis varied considerably, with some estimated cattle-to-badger transition rates bounding zero (Figure 3b). The estimated transition rates can be considered equivalent to transmission rates, because the states between which they were estimated represented different species. The estimates of the inter-species transition rates from the two-deme model with the best AICM score support the existence of both badger-to-cattle transmission (0.045 times per lineage per year, lower 2.5%: 0.028, upper 97.5%: 0.069) and cattle-to-badger transmission (0.0044 times per lineage per year, lower 2.5%: 0.00021, upper 97.5%: 0.017).
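One way to read rates expressed "per lineage per year" is to treat each as a constant Poisson rate along a single lineage. That is a simplification (structured coalescent transition rates describe lineage movements between demes, and the text equates them with transmission rates), so the sketch below is an interpretive aid under that assumption only. It uses the two-deme point estimates quoted above; the 10-year horizon and the function names are hypothetical.

```python
import math


def expected_events(rate: float, years: float) -> float:
    """Expected number of transitions on one lineage, assuming a constant rate."""
    return rate * years


def prob_at_least_one(rate: float, years: float) -> float:
    """P(N >= 1) when N ~ Poisson(rate * years)."""
    return 1.0 - math.exp(-rate * years)


# Point estimates (per lineage per year) quoted for the two-deme model.
rates = {"badger->cattle": 0.045, "cattle->badger": 0.0044}

for direction, rate in rates.items():
    print(f"{direction}: {expected_events(rate, 10):.3f} expected transitions "
          f"in 10 lineage-years, P(at least one) = {prob_at_least_one(rate, 10):.3f}")

ratio = rates["badger->cattle"] / rates["cattle->badger"]
print(f"ratio of point estimates: {ratio:.1f}")
```

Under this reading, a single lineage followed for 10 years would move from badger to cattle with probability of about 0.36, versus about 0.04 in the opposite direction. The ratio of the two point estimates (about 10.2) is close to, but not identical with, the reported 10.4; that difference would be expected if the 10.4 figure summarises the ratio across posterior samples rather than dividing two summaries, although the text does not state which was used.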
Figure 3b shows the order-of-magnitude difference between the estimated inter-species transmission rates, with the highest supported two-deme model estimating that badger-to-cattle transmission events occurred on average 10.4 times more frequently than cattle-to-badger transmission events in the sampled population. Figure 3c represents the lower bound on the number of times (according to the analyses based on the favoured two-deme model) that the sampled M. bovis population was transmitted from one animal to another (regardless of sub-population and, where possible, assuming the ancestral node and one of its daughter nodes represent infection in the same animal [Figure 3—figure supplement 1]). The estimated counts of these transmission events are consistent with the estimated inter-species transition rates and demonstrate that within-species transmission occurs at a higher rate. transmission was estimated to occur at least times more frequently than badger-to-cattle transmission 2.5%: upper 97.5%: In cattle, analyses estimated that at least transmission events occurred 2.5%: upper 97.5%: whereas the estimated number of cattle-to-badger events zero 2.5%: upper 97.5%: 4, with a of The counts of events between individual animals by BASTA represent the lower bound of the number of transmission events that occurred over the evolutionary history of the sampled M. bovis population because they are estimated on the transmission between the sampled and ancestral host animals and not account for missing in these
Figure 3. of and inter-species transition estimates from the BASTA analyses. structure is described in Figure and for each model the of defined demes were or to (a) The Akaike's Information Criterion Markov Chain Monte Carlo (AICM) score (Baele et al., 2013) is calculated for each of the of a structured population analysed in BASTA (Figure The show the lower and upper and of the AICM on 100 bootstrapped (b) Estimated inter-species transition rates for each multiple and transition rates were estimated (see Figure the were The each represent the of each either as a of associated with multiple estimated rates (for the and or a single (for the and (c) The number of between the known and estimated states on each phylogenetic tree in the by the structured population model analysed in BASTA is in Figure 3—figure supplement 1). The show the lower and upper and of the
The results from the BASTA analyses are consistent with the hypothesis that circulation of M. bovis in our study populations involves transmission within and between the badgers and cattle. In addition, the directional inter-species transmission rates indicate that transmission from badgers to cattle occurred more frequently than transmission from cattle to badgers, and inter-species transmission rates were estimated to be lower than within-species transmission rates.
Discussion
We hypothesised that the sampled M. bovis population was circulating within and between the sampled cattle and badger populations. our hypothesis across multiple analyses, found that, of these analyses are in their our results are consistent with our hypothesis and suggest that there has been a history of within- and between-species transmission in the Woodchester Park area, and an important role for badgers in disease Our of methods was based in part on our of data biases.
sampling should be to in the host populations and over the same spatial and temporal the of of the for cattle de et al., and badgers et al., and a on archived isolates, data biases were this are the dense sampling of both host populations and the exceptionally detailed Random Forest and Boosted Regression models identified strong epidemiological signatures of M. bovis transmission within and between host populations. metrics the spatial, temporal, and network dynamics were all highly indicative of M. bovis circulation being on these the variation observed between M. bovis sourced from cattle and badgers was found to be explained by where the animals and when they were in these relationships could be to identify in the as might be by badger social operations et al., Woodroffe et al., The present study identified further evidence of within- and between-species transmission in the phylogenetic relationships between the M. bovis genomes (Figure 1). Five clades containing highly similar M. bovis genomes derived from infected cattle and badgers were that substantial inter-species transmission had The presence of clades dominated by a single host species was also consistent with sustained within-species transmission. However, these phylogenetic relationships are to sampling biases and should be with For example, one of the of the cattle-derived M. bovis genomes in the clades shown in Figure 1 is that they originated in cattle. this could be the result of sampling the cattle population over a broader temporal window (from 1988 to 2013) than the badgers (2000 to 2011).
of the cattle and badger life histories associated with clade 4 (Figure 1) evidence of persistence of this lineage in the badger population (Figure the cattle population being sampled over a time the badgers associated with clade 4 were infected earlier than the cattle and that in the badgers for over 10 The remaining clades that cattle could have been infected before it was not possible to determine whether badgers of Woodchester Park could be these Our results suggest that transmission is likely to be dominated by that spatial distances less than were highly informative in the genetic relationships in the analyses. badgers further away from Woodchester Park are to be the patterns observed in our sampled badger and the clades observed here are more explained by of M. bovis from cattle. additional of these analyses is that other wildlife species were Previous research by Delahay et al. found other species infected with M. bovis in the area, at lower in and in than the sampled badger population Delahay et al., 2013). considerable evidence in the present study for inter-species transmission of M. bovis, used BASTA, an analysis platform that can account for sampling biases (De Maio et al., to quantify these (Figure 3b). The BASTA analyses estimated transition rates between demes within a structured population. the demes within the structured model were the estimated between-species transition rates can be considered equivalent to transmission rates between populations of badgers and cattle. 
The most favoured two-deme model estimated badger-to-cattle transmission rates on average 10.4 times higher than cattle-to-badger transmission rates (Figure and However, the most favoured model a more complex population estimated that inter-species transmission rates were close to Although even structured coalescent models not spatial contact that the model is favoured is more spatially structured models not However, the two-deme model may also have been favoured because of the limited genetic diversity available to estimate the evolutionary and further with spatial approaches is an important In the of inter-species transmission rates, the BASTA analyses also counts of the number of transmission even