Genomic epizootiology of a Brucella abortus outbreak in Northern Ireland (1997–2012)

Genomic epizootiology of a Brucella abortus outbreak in Northern Ireland (1997–2012)

Journal Pre-proof Genomic epizoology of a Brucella abortus outbreak in Northern Ireland (1997–2012) Adrian R. Allen, Georgina Milne, Kevin Drees, Ele...

5MB Sizes 0 Downloads 9 Views

Journal Pre-proof Genomic epizoology of a Brucella abortus outbreak in Northern Ireland (1997–2012)

Adrian R. Allen, Georgina Milne, Kevin Drees, Eleanor Presho, Jordon Graham, Paul McAdam, Kerri Jones, Lorraine Wright, Robin Skuce, Adrian M. Whatmore, Judith Graham, Jeffrey T. Foster PII:

S1567-1348(20)30067-8

DOI:

https://doi.org/10.1016/j.meegid.2020.104235

Reference:

MEEGID 104235

To appear in:

Infection, Genetics and Evolution

Received date:

10 October 2019

Revised date:

28 January 2020

Accepted date:

4 February 2020

Please cite this article as: A.R. Allen, G. Milne, K. Drees, et al., Genomic epizoology of a Brucella abortus outbreak in Northern Ireland (1997–2012), Infection, Genetics and Evolution(2019), https://doi.org/10.1016/j.meegid.2020.104235

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier.

Journal Pre-proof

Genomic epizoology of a Brucella abortus outbreak in Northern Ireland (1997-2012). Adrian R. Allen*a, Georgina Milnea, Kevin Dreesb, Eleanor Preshoa, Jordon Grahama, Paul McAdamc, Kerri Jonesa, Lorraine Wrighta, Robin Skucea, Adrian M. Whatmored, Judith Grahame, Jeffrey T. Fosterf. a

-p

ro

of

Agri Food and Biosciences Institute (AFBI), AFBI Stormont, Bacteriology Branch, Stoney Road, Belfast, United Kingdom. b University of New Hampshire, Department of Molecular, Cellular and Biomedical Sciences, Rudman Hall, 46 College Road, Durham, New Hampshire, USA. c Fios Genomics, Nine Edinburgh Bioquarter, 9 Little France Road, Edinburgh, United Kingdom. d Department of Bacteriology, Animal and Plant Health Agency (APHA), New Haw, Addlestone, Surrey, United Kingdom. e Department of Agriculture, Environment and Rural Affairs, Veterinary Service, Belfast, Northern Ireland, United Kingdom. f Pathogen and Micro-biome Institute, Northern Arizona University, Flagstaff, Arizona, USA.

lP

re

* Corresponding author Adrian Allen Email: [email protected]

Jo

ur

na

Declaration of interest - All authors declare no conflict of interest.

1

Journal Pre-proof

Abstract

ur

na

lP

re

-p

ro

of

Background In the recent past (1997–2012), Northern Ireland (NI) in the United Kingdom (UK), suffered an outbreak of Brucella abortus, which at its height affected over 200 cattle herds. Initially, isolates were characterized using multi locus variable number tandem repeats analysis (MLVA). Whilst informative in this setting, hyper-variability in some loci, limited the resolution necessary to infer fine-scale disease transmission networks. Consequently, we applied whole genome sequencing to isolates from this outbreak to evaluate higher resolution markers for disease epizoology. Results Phylogenetic analysis revealed the B. abortus outbreak in NI was caused by two distinct pathogen lineages. One contained isolates from an endemic infection previously believed to have been eradicated, consistent with the 1997-2012 outbreak being linked to a previously thought eradicated endemic infection. The dominant second lineage exhibited little genetic diversity throughout the recrudescent outbreak, with limited population sub-structure evident. This finding was inconsistent with prior MLVA molecular characterizations that suggested the presence of seven clonal complexes. Spatio-temporal modeling revealed a significant association of pairwise SNP differences between isolates and geographic distances, however effect sizes were very small owing to reduced pathogen diversity. Conclusions Genome sequence data suggested that MLVA hyper-variability in some MLVA loci contributed to an overestimate of pathogen diversity in the most recent outbreak. The low diversity observed in our genomic dataset made it inappropriate to apply phylodynamic methods to these data. We conclude that maintaining data repositories of genome sequence data will be invaluable for source attribution/epizoological inference should recrudescence ever re-occur. However genomic epizoological methods may have limited utility in some settings, such as when applied to recrudescent/re-emergent infections of slowly evolving bacterial pathogens.

Jo

Keywords – Brucellosis, molecular epizoology, genome sequencing, recrudescence

2

Journal Pre-proof

1. Introduction Brucellosis is one of the world’s most common and significant zoonoses (Adone and Pasquali 2013, Whatmore 2014). Indeed, it is recognised by the U.S. Centers for Disease Control (CDC) as one of the top 8 major zoonotic disease risks in the United States (CDC, 2019), whilst the World Health Organisation (WHO) highlights its global importance and lists it as a potential bioterror weapon (WHO, 2006).

re

-p

ro

of

Brucellosis is caused by members of the pathogenic bacterial genus, Brucella (Moreno and Moriyon 2002). To date, twelve species within the genus have been characterised (Olson and Palmer, 2014; Scholz et al. 2018) with a range of hosts from pinnipeds and cetaceans to pigs, goats and cattle (Whatmore 2014). Brucella abortus infection in cattle has inflicted significant economic damage to national agricultural industries (Wattam et al. 2013, Whatmore 2014). In domesticated livestock, the hallmarks of brucellosis infection are abortion and epididymitis (Moreno, 2014), with transmission occurring both vertically and horizontally (Moreno, 2014). A primary route of horizontal transmission is via aborted foetal material and amniotic fluids (Whatmore 2009, Abernethy et al. 2011, Moreno, 2014); other animals coming into contact with this material may become infected via the nasal and oral mucosae (Diaz Aparicio 2013, Poester et al. 2013). In intensive agriculture systems, such horizontal transfer can have devastating consequences for both single herds and contiguous enterprises, if not contained (Allen et al. 2015).

Jo

ur

na

lP

Northern Ireland (NI) was recently the site of a substantial B. abortus recrudescent outbreak (Stringer et al. 2008, Abernethy et al. 2011), which followed what appeared to have been a successful, endemic disease eradication program in the late 1980s / early 1990s. Three major sporadic outbreaks in 1997 underpinned this regional recrudescence (Abernethy et al. 2006; Stringer et al. 2008). Following further application of the internationally standardised test and slaughter eradication scheme and active serological surveillance (Ragan et al. 2013, OIE 2014), NI was declared officially brucellosis free (OBF) in September 2015; the last confirmed case in cattle occurred in March 2012. A retrospective molecular epizoological study employing a Multi Locus Variable Number Tandem Repeats Analysis (MLVA) inferred the genetic diversity and population structure of the Northern Irish B. abortus population during the recent outbreak, and highlighted the potential for molecular typing data to inform on disease source tracing (Allen et al. 2015). The study recognised the usefulness of the MLVA technique, demonstrating its ability to recapitulate the findings of classical epizoological investigations whilst also revealing historical insights about the outbreak and possible links to previous occurrences in the 1990s (Allen et al. 2015). However, the study also highlighted some now well documented limitations of MLVA genotyping, most notably hyper-variability of some VNTR loci and potential impacts on source attribution. Allen et al (2015) used a custom MLVA panel distinct from the international standard MLVA 16 panel, deliberately seeking to maximize discrimination in their B. abortus population by excluding monomorphic markers and retaining those that exhibited most diversity in the NI setting. The locus hypervariability phenomenon has been noted in markers used to characterise other Brucella species from a variety of hosts (Bricker and Ewalt 2006, Alvarez et al. 2011, Gyuranecz et al. 2013). In addition, it is well-documented that VNTR loci can be prone to homoplasy (convergent evolution), often making them unsuitable markers for phylogenetic inference and in establishing bacterial population structures (Pearson et al. 2009).

3

Journal Pre-proof

The advent of phylodynamic approaches applied to bacterial pathogen molecular epizoology may address some of these deficiencies. These methods integrate population genetic coalescent theory and epizoological modeling to better infer transmission dynamics (Volz et al. 2009). The ideal scenario is one in which a dense longitudinal sampling of infected individuals is available, and where pathogen mutation rate outstrips epidemic growth. In the latter ideal case, the bifurcations of phylogenetic trees assembled from bacterial isolates from distinct hosts can correspond to disease transmission events (Volz et al. 2009). Originally, these methods were developed for use with viral pathogens, whose mutation rates are typically an order of magnitude larger than those observed in bacteria. (Sanjuan et al. 2010, Biek et al. 2015).

na

lP

re

-p

ro

of

Due to major developments in sequencing technology and bioinformatics, phylodynamic approaches are increasingly being applied successfully to several bacterial populations, despite lower mutation rates. Whole genome sequence (WGS) based tracing of infectious sources for methicillin resistant Staphylococcus aureus (MRSA) in a special care baby unit at a UK hospital aided efforts to stop spread of infection (Harris et al. 2013). Food borne sources of a recent outbreak of Escherichia coli O104:H4 in Germany were similarly traced by WGS (Mellmann et al. 2011). Mycobacterium tuberculosis transmission among patients in the UK was also monitored using genomic data, leading to the identification of plausible transmission networks and super shedder patients, considered of prime importance in maintaining an epidemic (Walker et al. 2013). The complex multi host bovine tuberculosis epidemic in Britain and Ireland has been subjected to phylodynamic investigation, revealing ongoing transmission of Mycobacterium bovis between cattle and European badgers (Biek et al. 2012). Similarly, in the United States, WGS of B. abortus from the Greater Yellowstone Area (GYA) has clarified the roles of elk and bison in disease maintenance and spread (Kamath et al. 2016). A considerable advantage in employing WGS data to infer transmission dynamics is the increased resolution such data typically provides compared to any other genetic markers.

Jo

ur

We used WGS approaches to further investigate the MLVA-characterized B. abortus outbreak in NI (Allen et al, 2015) to better understand the pathogen’s history, population structure and disease dynamics. Our results highlight both the strengths and limitations of approaches using WGS data. Such data is the highest resolution possible, and can link new outbreaks to past epidemic / endemic events, thereby informing on historical sources of infection. However, recrudescent outbreaks with limited diversity do not lend themselves to WGS mediated phylodynamic approaches for inferring contemporaneous epizoological links.

2. Methods 2.1 Sampling procedure Previously, Allen et al. (2015) identified seven MLVA defined clonal complexes extant across NI’s 10 Divisional Veterinary Offices (DVOs) over a time window that covered most of the duration of the recrudescent outbreak, with some isolates from the prior outbreak (1991–2011). In this study, we selected 76 isolates, from 76 herds representative of the major MLVA diversity across the geographical extent of NI over the period of the outbreak. Full sample details, previous MLVA defined clonal complex assignment, time of isolation and location are listed in Supplementary data 1. Mapping was undertaken in the R environment (R Development Core Team, 2008) using the packages rgdal v1.4.4 (Bivand et al. 2019) and sp v1.3-1 (Pebesma et al.

4

Journal Pre-proof 2018); isolate collection locations are shown in Figure 1. 2.2 Bacterial culture and DNA extraction. B. abortus infected animals were identified by serological testing and culture confirmation using standardized methods (OIE, 2014). Isolates were cultured from one or all of the following lymph node tissues: internal iliac, submandibular, superficial inguinal, supra-mammary inguinal, retropharyngeal. Glands were trimmed and macerated, then streaked onto Farrell’s agar plates containing nantamycin. Incubation at 37C in a 5–10% carbon dioxide enriched environment was undertaken for up to 14 days (Alton et al. 1988). Confirmed cultures were grown to single colonies, heat killed, and genomic DNA extracted using a standard CTAB and solvent extraction protocol (van Soolingen et al. 2002).

lP

re

-p

ro

of

2.3 Whole Genome Sequencing and bioinformatics Sequencing libraries were prepared from the extracted DNA using the method of Baym et al. (2015) to produce inserts of approximately 300 bp. WGS was carried out on an Illumina HiSeq platform using Illumina V2 chemistry producing paired-end reads of 150 bp. Reads were mapped to a reference genome, B. abortus strain 2308 (GenBank accession AM040264.1) using the mapping-based phylogenomics pipeline, RedDog V1beta.10.3 (Edwards et al. 2016) to identify SNP variants across all 76 isolates. Alignment and mapping were carried out in Bowtie2 v2.2.9 (Langmead et al. 2012) using the sensitive local mapping setting. SNP calling was undertaken using SAMtools and BCFtools (Li et al. 2009) using the consensus caller setting. A minimum depth of 10x was set for SNP calling. The coverage failure filter, depth filter and mapping failure filters were set at 90%, 10x, and 50% respectively.

Jo

ur

na

2.4 Phylogenetic analyses To determine the most appropriate allele substitution model to employ for phylogenetic analysis, the RedDog-produced fasta alignment of informative SNPs from all 76 isolates was analysed using the ‘modelTest’ function of the package ‘Phangorn’ (Schliep, 2011), in the R environment (R Development Core Team, 2008). Specifically, the fits of the General Time Reversible (GTR), Jukes Cantor, and Hasegawa Kishino Yano models to the data were assessed. The allele substitution model exhibiting the lowest AIC was used to construct a maximum likelihood phylogeny using the rapid bootstrapping method in the program RAxML v8 (Stamakis, 2014). The rapid bootstrap search was stopped after 400 replicates. The phylogeny was visualized in FigTree v1.4.4 (Rambaut, 2018). The presence/absence of a temporal signal in the phylogenetic data was assessed using the program TempEst v1.5.1 (Rambaut et al. 2016). After selecting the best-fitting root isolate, which maximised the temporal signal, the root to tip divergence model was fitted using the residual mean squared method. 2.5 Epizoological inference Animal movement data collated from the Department of Agriculture, Environment and Rural Affairs’ (DAERA) Animal and Public Health Information System (APHIS) (Houston, 2001) were used to investigate possible linkages between herds that exhibited close pathogen genetic relatedness on the phylogenetic tree. Specifically, herds possessing genetically similar isolates were checked for any associations occurring in the 0-12 month period before brucellosis breakdowns were identified. This time period was chosen because the incubation time for brucellosis between infection and onset of symptoms can extend from 2 weeks up to 1 year (USDA, 2019). Movement between premises need not be the only facilitating factor for brucellosis transmission. Herds that use fields in close proximity to one another for grazing, can

5

Journal Pre-proof provide opportunity for contiguous disease transmission (Abernethy et al, 2011). To facilitate investigation of the likelihood of contiguous spread between herds that shared closely related B. abortus isolates, we determined the average pairwise distance between farms.

Jo

ur

na

lP

re

-p

ro

of

2.6 Spatio-temporal analyses Pairwise Euclidean distance (kilometres; Map_Dist) between farms where B. abortus was isolated, and pairwise time difference (no. of days; Time_Dist) between sample isolations were modeled for their relationship to pairwise B. abortus isolate SNP distances. We limited this analysis to the single major phylogenetic lineage (Major Lineage 2) containing 72 of the 76 isolates; Major lineage 1 contained too few samples (n=4) to model effectively. Firstly, the relationships between both explanatory variables and the outcome were visually explored using boxplots and smoothed scatterplots. Univariate associations between the outcome and untransformed explanatory variables were modeled using Poisson Generalized Linear Models (GLMs), however over-dispersion was detected and the standard errors were therefore corrected using quasi-GLM models. The deviance associated with each variable was also quantified. The over-dispersion warranted further investigation, and thus the form of the outcome variable (SNP distances) was further scrutinized in order to construct the optimal final model. The SNP distances variable was indeed found to follow a Poisson distribution, but there were more zeroes in the dataset (17.5%, n = 446, Figure 2A) than expected when compared to a simulated Poisson distribution (7.8%). Thus, the use of a hurdle model (i.e. zero-altered Poisson) was considered most appropriate. Hurdle models first model the probability of obtaining SNP distances of one of more, compared to SNP distances equal to zero. This component of the model takes the form of a binomial GLM. A Poisson GLM was also carried out on the sub-section of the data where SNP distances were greater than zero. There was no over-dispersion detected in this component of the model. Initial models were constructed with the two explanatory variables and their interaction terms in both the binomial and count components. Final models were selected for via backwards selection on the basis of likelihood ratio tests. Model diagnostics included visual inspection of the residuals; plots of fitted values and deviance residuals were constructed, along with plots of the deviance residuals and the explanatory variables. The exponentiated coefficients of the binomial model were interpreted as Odds Ratios (OR), whilst the exponentiated coefficients of the Poisson model were interpreted as Incidence Rate Ratios (IRR). These are reported alongside the upper and lower 95% confidence intervals (95% CI). Explanatory variables were log-transformed prior to inclusion in the final model. A small number of rows contained values of zero (Time_Dist: n = 14; Map_Dist: n = 2) and thus a small constant (0.001) was added to all data to permit log transformation. Hurdle models were constructed using the package ‘pscl’ v1.5.2 (Jackman et al. 2017).

3. Results 3.1 Sequence quality All 76 isolates passed quality assurance criteria. The average number of reads per sample was 3,814,392 (SD  1,212,605). At least 99.9% of the reference sequence across both B. abortus chromosomes mapped for all samples, at least 98.9% of reads from all isolates mapped to the combined, two chromosome reference. Average depth of coverage was 146x (range 56–248x). Average Phred base quality score was 34.6 (range 33.9 to 35.0) equating to a less than 1 in 1,000 probability of an incorrect base call in a read. Full details of sequence quality metrics are presented in Supplementary material 2. Forward and reverse reads for all 76 isolates are

6

Journal Pre-proof deposited at the National Centre for Biotechnology Information (NCBI) Sequence Read Archive (SRA), bio-project PRJNA576561. Accession numbers for all reads are shown in Supplementary material 1.

-p

ro

of

3.2 Phylogenetic analyses A total of 345 SNPs were identified across all 76 isolates. Locations of all SNPs relative to the reference sequence are presented in Supplementary material 3. The allele substitution model that fitted the B. abortus genetic data best, was the GTR model (AIC=3569.099). The SNPs were used to construct a maximum likelihood phylogeny, which indicated the presence of two lineages of B. abortus in NI over the period of the outbreak (Figure 3). Both lineages exhibited high bootstrap nodal support. Lineage 1, containing 15 SNPs, comprised three isolates from the recrudescent outbreak of 1997–2012, alongside a single isolate from the earlier outbreak. All isolates from lineage 1 were classified as belonging to MLVA clonal complexes 5 and 7 (Allen et al. 2015). Lineage 2, containing 43 SNPs, was comprised exclusively of 72 isolates from the recrudescent outbreak (date ranges 1999–2011), and exhibited very little diversity/internal branching. Indeed, all pairs of isolates from lineage 2 were separated by on average, only 2.6 SNPs (SD  2.1). All isolates from lineage 2 were defined as belonging to MLVA clonal complexes 1, 2, 3, 4, and 6 (Allen et al. 2015).

lP

re

TempEst selected isolate NI260 (1991), the earliest sampled isolate in the dataset, as the best fitting root. The phylogeny exhibited positive correlation between genetic divergence and sampling time (correlation coefficient 0.4056), but there was little temporal signal (R2 = 0.1645) indicated in the model for root to tip distance by year of sample isolation (Figure 4). Substitution rate, estimated from the slope of the model was 0.277 substitutions per year.

Jo

ur

na

3.3 Epizoological inferences Despite the general lack of diversity observed across the phylogeny, limited sub-structure, consistent with epizoological links between some herds, was apparent within both lineages - see Figure 3 clades 1A, 2A, 2B and 2C. Investigation of between herd animal movement records for herds within the same clades, for the period 0-12 months before a brucellosis outbreak, revealed little opportunity for direct contact between any registered animals, including those that provided sequenced isolates. There was potential for indirect contact however, as the herds involved in each clade regularly used the same sales market premises, albeit market use was not contemporaneous. The herds from clade 2C also exhibited associations with the same sales market premises, again raising the possibility of indirect means of transmission. Additionally, the two herds within clade 2C from which isolates NI051 and NI376 originate exhibited evidence of direct contact. Specifically the animal hosts of isolates NI051 and NI376, spent 7 months together in the same herd prior to brucellosis breakdown. NI051’s host animal was culled from this herd during the breakdown. The host animal of isolate NI376 had, prior to this breakdown moved to another herd which then succumbed to brucellosis. For assessment of likelihood of contiguous spread between herds, we determined the average pairwise distance between home farm premises represented in each of clades 1A (39.9km ±SD 13.9km), 2A (37.1km ±SD 28.1km) and 2C (30.9km ±SD 11.4km). The distance between the two farms representing clade 2B was 8.7km. Whilst not strictly a monophyletic clade, we also determined the average pairwise distance between farms whose isolates were members of

7

Journal Pre-proof MLVA clonal complex 3 (10.6km ±SD 8.9km). All spatial relationships of these farms are represented in Figure 5.

re

-p

ro

of

3.4 Spatio-temporal analyses The mean distance between isolates was 37.7 km (SD26.0), and the mean time between isolates was 928 days (SD800.3). Over 90% of the data were associated with SNP_Dist values of five or less (n = 2305, 90.2%). The modal SNP_Dist was 2 (n = 688, 26.9%). Univariable Poisson regression revealed a positive association between Map_Dist and SNP_Dist (IRR: 1.01, 95% CI: 1.00–1.01, Figure 2B), suggesting that SNP_Dist increases by 1% for every 1 km increase in Map_Dist. The deviance associated with Map_Dist in the univariate context was 11.17%. There was little evidence, however that Time_Dist was significantly associated with the outcome (IRR: 1.00, 95% CI: 0.99–1.00, Figure 2C), deviance <1%. The results from the final model suggested that log(Map_Dist) was an important variable to differentiate between a SNP_Dist of zero, and a SNP_Dist of one or more (i.e. the binomial component; OR: 1.59, 95% CI: 1.44–1.74). Each of log(Map_Dist) (IRR: 2.34, 95% CI: 1.98–2.76) and log(Time_Dist) (IRR: 1.34, 95%CI:1.22-1.48) were significant in the multivariable context in the Poisson component of the model, however an interaction term between both explanatory variables was also significant (IRR:0.92, 95% CI:0.89–0.94). These are comparable to the coefficient values derived from a purely Poisson GLM; (log(Map_Dist)(IRR: 2.39, 95% CI: 2.11–2.71); log(Time_Dist)(IRR: 1.33, 95% CI: 1.23–1.43) log(Time_Dist)* log(Map_Dist) (IRR: 0.92, 95% CI: 0.90–0.93)).

Jo

ur

na

lP

Additional analysis revealed that in models where the interaction term was omitted, the Time_Dist variable ceased to remain a significant predictor of the outcome. This indicates that isolates exhibited a greater difference in SNP_Distances when they were identified further apart in space but closer together in time, when compared to isolates further apart in space and further apart in time. However, the biological relevance of this should be interpreted with caution. Only 5.16% of the data (n=132) were associated with a SNP_Dist of seven or more, meaning that firm conclusions regarding the combined effect of Map_Dist and Time_Dist on the outcome cannot be strongly inferred. To investigate this further, the Map_Dist variable was divided into quartiles and the Time_Dist associated with each SNP_Dist in each Map_Dist quartile was plotted (data not shown). Whilst there does appear to be a decrease in Time_Dist at higher SNP_Dist values as Map_Dist increases, it should be noted, for example, that the values for Time_Dist at SNP_Dist 11 were derived from only six records. Model residual values tended to decrease as fitted values increased (Figure 2D). This demonstrated that the model tended to over-predict SNP_Dist values, compared to those found in the actual data. Further examination revealed a number of SNP_Dist values of zero that were associated with high values for Map_Dist and Time_Dist. Wide variation occurred in Map_Dist and Time_Dist values across different SNP_Dist values (Figures 2E and 2F).

4. Discussion 4.1 Two distinct phylogenetic lineages confirmed in the NI outbreak The genomic data support a speculative finding proposed by Allen et al. (2015), based solely on MLVA, that the recent, recrudescent B. abortus outbreak in NI comprised two distinct phylogenetic lineages. However, VNTR loci are prone to homoplasy (Pearson et al. 2009; O’Brien

8

Journal Pre-proof et al. 2017) and consequently, phylogenetic inference requires considerable caution. Here, broad inferences derived from MLVA data were supported (Allen et al. 2015). Indeed, the genome sequence data and associated maximum likelihood phylogeny (Figure 3) were congruent with the MLVA findings at the broadest of phylogenetic scales. Yet, detailed analyses indicated substantial differences between results from the two approaches.

lP

re

-p

ro

of

4.2 MLVA hyper-variability & comparison to genomic data While broad inferences of the MLVA phylogeny from Allen et al. (2015) were supported by the genomic data, the finer scale existence of discrete clonal complexes was not (Figure 3). In lineage 2, isolates classified in MLVA clonal complexes 1, 2, 3, and 4 clustered together and not in mono-phyletic sub-clades, suggesting that VNTR locus hyper-variability implied genetic substructure that was not supported, when viewed with the highest possible resolution genomic data. Allen et al. (2015) suggested that such locus hyper-variability would also affect epizoological precision. This was also suggested by Oliveira et al. (2017), who demonstrated that hyper-variable markers could mask genuine links between affected farms in Brazil. Similar concerns around VNTR hyper-variability have recently been raised in the UK and Kazakhstan, specifically noting that some VNTR loci are unstable after multiple culture passages (Whatmore et al. 2006; Shevstova et al. 2016). These data suggest that caution is warranted when deploying MLVA schemes to avoid spurious inferences of B. abortus population structure and unsupported epizoological connections. Certainly, with hindsight it appears that Allen et al’s (2015) choices of a custom MLVA panel that sought to maximize diversity in the NI population, could have resulted in finer scale sub-structure that was not completely reflective of the actual genetic diversity in the 1997-2012 outbreak.

Jo

ur

na

4.3 Bio-typing and genomic data We noted that within lineage 2, isolates from the MLVA defined clonal complex 6 resided within their own sub-clade (Figure 3). The two isolates from this sub clade (NI268 and NI261) were genetically indistinguishable, and were the only isolates in the entire dataset to be designated biovar 2. All other isolates were biovar 1. However, SNP-based phylogenetic data indicated that these isolates exhibited only a 2 SNP distance from isolates that were designated biovar 1, suggesting that isolates with a biovar 2 designation are closely related to biovar 1 and may have evolved from a very recent biovar 1 common ancestor. MLST data do suggest that some biovars represent distinct genetic lineages (Whatmore et al. 2016), a finding supported by WGS data that suggested biovars 5 and 9 seem to segregate into their own lineage whilst biovars 1,2, 4 and 6 are found in their own distinct lineage (Whatmore, unpubl. data). Indeed, a global analysis of B. abortus genomes indicates that biovars 1, 2, and 4 are closely related, but paraphyletic (Foster, unpubl. data). As a result, the usefulness of biotyping for epizoology has been questioned (Whatmore, 2009). A recent genomic epizoological investigation in Italy found inconsistencies in biovar assignment compared to phylogenetic lineage (Garofolo et al. 2017), whilst earlier studies suggested that some B. abortus biovars correspond to multiple genetic clusters (Ocampo-Sosa et al. 2005; Ica et al. 2008; Whatmore et al. 2016). Our data support the suggestion that bio-typing in B. abortus may not always be representative of deep ancestry nor indicative of stable associations with defined genetic groups. 4.4 Epizoological Inferences? Animal movement data were largely inconclusive in linking herds within sub-clades but did suggest shared market premises could be sites for indirect transmission. Transmission in such a setting could theoretically have occurred by mechanisms other than abortion, for instance

9

Journal Pre-proof contact with genital excretions released to the environment (Whatmore, 2009; Moreno, 2014). However we do not know if animals at the same markets would have used the same holding pens at different time points, and if pathogen could have survived in said environment in the interim (see below). Additionally such indirect transmission mechanisms would be dependent on newly infected, unsold animals then returning to their farm of origin to further spread disease, thereby linking affected herds. Without firm evidence for these caveats, this theory remains speculative. The exception to the observation above is in the case of isolates NI051 and NI376 from clade 2C. The likelihood of direct transmission between both host animals and subsequent spread to another herd by animal movement is supported by movement and pathogen genetic data.

re

-p

ro

of

At first appearance the pairwise geographic distances between home farm premises within clades 1A, 2A, 2B and 2C would appear to preclude the likelihood of contiguous spread of disease across soft boundaries / fences. However, land parcel fragmentation and seasonal rental are very common in Northern Ireland (Allen et al, 2015), with pasture several tens of kilometers distant from registered home farm buildings often used for grazing. There is therefore the possibility that some of the spread of disease has been contiguous in nature. However without detailed knowledge of land fragmentation and pasture usage, it is incredibly difficult to be definitive.

Jo

ur

na

lP

In the absence of detailed land parcel spatial data, a very coarse ‘rule of thumb’ perhaps for likelihood of contiguous disease spread may be that the closer home farm premises are geographically, the more likely it is that their outlying land fragments may be in close contact with one another, thereby facilitating contiguous disease transmission. In the case of the isolates from MLVA clonal complex 3, which do not form their own clade but do exhibit close genetic relatedness, this appears to hold true. Average pairwise distance between clonal complex 3 affected farms was 10.6km (±SD 8.9km), with all farms observed to cluster tightly in space – see Figure 5. Previously, Abernethy et al (2011) had suggested in their classical epizoological study that these farms likely represented a case of contiguous disease transmission. The close proximity of all farms and the genetic similarity as defined by both the phylogeny in Figure 3 and Allen et al’s (2015) MLVA data suggest this is likely the case. 4.5 Reduced diversity in recrudescent outbreaks and implications Lineage 2 isolates (Figure 3) exhibited very little genomic diversity, despite having been collected over a time period of 12 years (1999–2011) – see Supplementary Data 1 and Figure 3. TempEst analysis also suggested a poor temporal signal (R2 = 0.1645 – Firth et al. 2010) in the root to tip analysis of both lineages. Spatio-temporal modeling of lineage 2 isolates found significant associations between pairwise physical distances and isolation time differences with pairwise SNP distance between isolates. However, the association effect size was very small, consistent with the presence of minimal genetic diversity. Similar studies with B. abortus and M. bovis have shown these endemic veterinary pathogens to naturally diffuse across a landscape as they diversify (Biek et al. 2012; Trewby et al. 2016; Kamath et al. 2016), something not seen in our data. These findings indicate that the application of a phylodynamic approach in this instance may be less useful, limiting further inferences about disease transmission dynamics. B. abortus is a slow growing bacterium (Alton and Forsyth, 1996) whose rate of molecular evolution was estimated at 1.4x10-7 substitutions per site per year (Kamath et al. 2016),

10

Journal Pre-proof

ro

of

equivalent to 0.46 SNPs per year per 3.3 Mb B. abortus genome. Kamath et al. (2016) observed mutation rate heterogeneity across different lineages, however they estimated that overall variation was low, with a range of 0.36–0.57 SNPs per genome per year. Consequently, it is puzzling that such limited diversity was observed during the 12-year surveillance of lineage 2. Even at the lower bound of possible substitution rates from Kamath et al. (2016), we would expect greater diversity than was found in our study. While there is a paucity of B. abortus phylodynamic analyses to permit a direct assessment of the substitution rate presented here, we instead compare our findings to the substitution rates observed in M. bovis in NI, another slowly evolving veterinary pathogen. Trewby et al. (2016) indicated a substitution rate of 0.2 SNPs per year per genome in a single lineage of M. bovis sampled across 51 herds from 1996– 2011. They found an average pairwise SNP distance of 6.4 (range 0–19). This is more than twice the magnitude of that observed for the B. abortus isolates from lineage 2, although the sampling timescale was greater than that for B. abortus. Unlike B. abortus however, M. bovis has not been eradicated, and the potential for existing standing variation to contribute to overall M. bovis diversity still remains.

Jo

ur

na

lP

re

-p

We propose that at least part of the recrudescent outbreak from 1997–2012 is epizoologically linked to the endemic infection thought eradicated in the late 1980s/early 1990s. Genomic data provide more definitive support for this conclusion, and indeed indicate that a viable source of B. abortus from the earlier outbreak persisted, possibly within a single source (or limited number of sources). Whilst the provenance of such an infectious, surviving reservoir is impossible to elucidate in the absence of more data, we posit that (1) survival in limited number of hosts, (2) limited environmental persistence and (3) criminal activity, are potential mechanisms that could explain the observed lack of genetic diversity. First, whilst the exact mechanisms for long-term survival within hosts and immune system evasion are unknown, members of the Brucellae can survive in a dormant, non-replicative state (Olson and Palmer, 2014). Indeed, reactivation of a B. melitensis infection has been observed in a human host, 28 years after initial infection (Ogredici et al. 2010) and in cattle, nine years post infection (Lapraik and Moffat, 1982). Therefore, an animal latently infected with B. abortus from the historical outbreak could be responsible for the lineage 1 recrudescence event. We attempted to discover if any animals present in affected herds during the initial brucellosis outbreak of the 1980s/1990s were still alive in the period 1997–2012, however animal-level data are sparse from the earlier era and our queries were inconclusive. Second, environmental persistence of the Brucellae has been demonstrated in areas associated with infected animals (Olson and Palmer, 2014). Brucella spp can survive in dust, dung, water and slurry, with survival dependent on multiple factors including substrate, temperature, pH, sunlight and the presence of other microbes (WHO, 2006). Typically however, B. abortus survival in any matrix tends not to exceed 8 months (WHO, 2006), a period of time far shorter than that observed between the original endemic infection in NI, and the recrudescent outbreak. Indeed whilst environmental persistence can occur, Olson and Palmer (2014) indicate that duration of persistence is short and of little long term epizoological importance Third, there was alleged criminal involvement in the re-introduction and spread of B. abortus in the recrudescent outbreak in NI (Belfast Telegraph, 2010; BBC Northern Ireland, 2011; Allen et al. 2015). Specifically, B. abortus-infected tissues were alleged to have been intentionally deposited on farmland near grazing cattle (Northern Ireland Executive, 2010). If criminals used and reused the same stock of infective materials, Brucellae being known to survive well during refrigeration and freezing (WHO, 2006), then this could have acted as a natural bottleneck that helped to limit pathogen diversity. However, we note that this is extremely speculative in the absence of further data, and indeed it

11

Journal Pre-proof is impossible to quantify with any certainty the size of effect, if any, these potential mechanism/s had on pathogen diversity.

-p

ro

of

Notwithstanding these potential processes, we also acknowledge that the surprising lack of diversity found in B. abortus in NI may reflect a comparatively short sampling window, thus limiting the probability of observing increases in genetic diversity. Simply, insufficient time may have elapsed to acquire substantial genetic variation in the recrudesced lineage. Kamath et al. (2016) studied an established endemic outbreak, using samples straddling a 48-year temporal window to maximize the probability of observing clock-like behaviour across lineages. Additionally, in their work, Kamath et al included isolates taken from not just domestic cattle, but also two wildlife species – namely elk and bison. Such wide-ranging wild animal hosts experiencing little to no brucellosis control measures, would potentially have been free to develop greater pathogen diversity over the time period. However, our study focused on a reemerging infection over a temporal window of only 12 years. Mutation is a stochastic process, and a defined clock-like rate from one outbreak may not be directly applicable to another. Consequently the lack of diversity seen in our study may be due to entirely natural phenomena, or some combination of natural phenomena and the mechanisms detailed above.

ur

na

lP

re

Regardless of the explanation(s) for the low genetic diversity observed in this study, there is a more general cautionary note of the limitations of applying molecular tools to recrudescent infectious disease outbreaks. By their very nature, recrudescent infections are likely to arise from a genetically homogeneous pathogen stock. This natural population bottleneck virtually guarantees reduced genetic diversity. Additionally, when national control measures are applied, this has the effect of curtailing transmission and preventing the development of further pathogen differentiation. Indeed, we observed that despite the positive relationship between increasing Euclidian distance among farms and increasing genetic diversity, this relationship was less apparent as the time between isolates increased. We hypothesise that ongoing eradication efforts may be reducing the genetic diversity associated with increasing Euclidian distance. However, the relatively few records associated with some of the larger SNP distances mean that firm conclusions cannot be reached.

Jo

In such low diversity, recrudescent infectious outbreaks, even WGS may not be a panacea for molecular epizoological studies. Combinations of some MLVA markers and major lineage/clade defining SNPs may be more informative in homogeneous pathogen systems with little temporal depth (Garofolo et al. 2017). Indeed, hyper-variable MLVA loci, being representative of more recent polymorphism, may even have a role in such settings (Oliveira et al. 2016). Our findings within the SNP defined sub-clades across both lineages, and among members of MLVA clonal complex 3 would lend some credence to that concept. Specifically investigating SNP defined sub-clades that contain isolates with related MLVA profiles may be a way to further investigate epizoological links. Despite these reservations, genome sequence data are the essential data for pathogen monitoring. Comparison of sequence data to historical and contemporary genomes from multiple territories permits the rapid detection of recrudescence and can also identify potential sources of infection. The effectiveness of such measures will be greatly enhanced by the creation of national and international pathogen sequence data repositories, which will permit cataloguing of regional pathogen diversity (Gardy and Loman, 2018). As the technology becomes more affordable, accessible and indeed portable, the latter will become a reality. To advance

12

Journal Pre-proof progress in the phylodynamic investigations of disease transmission for slowly-evolving pathogens, integration of detailed epizoological data and modelling may be the solution (Trewby et al. 2016).

5. Conclusions We demonstrate that applying genomic methods to a recrudescent B. abortus outbreak had limited capacity to inform on disease transmission dynamics due to the limited genetic diversity detected in this case. The nature of recrudescent infections arising from limited, founder pathogen genetic stock, compounded by slow molecular evolution of some bacterial pathogens, and potentially other factors, makes application of phylodynamic approaches particularly challenging.

ro

of

However, genomic data did confirm that at least part of the recent brucellosis outbreak in NI was related to an earlier outbreak, previously assumed to have been eradicated. Regional and global archived pathogen genome datasets, collected over wide temporal scales, will be useful in assessing the likely provenance of such recrudescent infections in the future.

lP

re

-p

While MLVA data were congruent with genomic data at the widest of phylogenetic resolutions (lineage), VNTR locus hyper-variability may suggest more genetic sub-structure in a bacterial population than was actually present when genomic data are considered. Careful selection, configuration and deployment of MLVA schemes alongside genomic data is recommended to reduce the risk of incorrect epizoological inference.

Acknowledgements

Jo

ur

na

The authors gratefully acknowledge the Department of Agriculture, Environment and Rural Affairs, Northern Ireland for access to herd and animal level data. This work was funded by the US-UK Fulbright Commission and was kindly hosted from September to December 2015, at the laboratory of Dr. Jeffrey Foster, then of the University of New Hampshire. Thanks to Dr. Foster, Dr. Drees, Katy Parise and to Wayne and Betsy Burton of Durham, NH for their hospitality and kindness during the Fall of 2015. Work at APHA was funded by the UK Department of the Environment, Food and Rural Affairs (Defra).

13

Journal Pre-proof

Figure Legends Figure 1 – Map illustrating spatial location of all B. abortus herd breakdowns (n=76). Herd isolates are color coded by genome sequence defined lineage.

of

Figure 2 – (A) Histogram illustrating distribution of inter-isolate genomic differences (SNP distance). (B) Univariate Poisson model of inter-isolate physical distance (Map Distance) vs. inter-isolate genetic distance (SNP Distance). (C) Univariate Poisson model of inter-isolate temporal distance (Time Distance) vs. inter-isolate genetic distance (SNP Distance). (D) Distribution of residuals from final multivariate Poisson model. (E) Boxplot illustrating wide variation per unit of inter-isolate genetic distance (SNP distance) by inter-isolate physical distance (Map distance). (F) Boxplot illustrating wide variation per unit of inter-isolate genetic distance (SNP distance) by inter-isolate temporal distance (Time distance).

-p

ro

Figure 3 – Maximum likelihood phylogeny of B. abortus isolates (n=76) from Northern Ireland. Isolates are labeled with unique sample ID, Clonal Complex assignation as per Allen et al. (2015), MLVA genotype as per Allen et al. (2015), and year of isolation. Phylogeny branches are color coded by lineage – lineage 1 red, lineage 2 blue. Bootstrap support values are indicated for major nodes.

re

Figure 4 – TempEst root to tip divergence vs time regression for all 76 B. abortus isolates – fitted by residual mean squared method. R2=0.1645.

Jo

ur

na

lP

Figure 5 – Spatial distribution of home farm premises associated with B. abortus clades 1A, 2A, 2B, 2C and MLVA clonal complex 3.

14

Journal Pre-proof

References Abernethy, D.A., Pfeiffer, D.U., Watt, R., Denny, G.O., McCullough, S., McDowell, S.W., 2006. Epidemiology of bovine brucellosis in Northern Ireland between 1990 and 2000. Vet. Rec. 158, 717-721. Abernethy, D.A., Moscard-Costello, J., Dickson, E., Harwood, R., Burns, K., McKillop, E., McDowell, S., Pfeiffer, D.U., 2011. Epidemiology and management of a bovine brucellosis cluster in Northern Ireland. Prev. Vet. Med. 98, 223-229. Adone, R., Pasquali, P., 2013. Epidemiosurveillance of brucellosis. Rev. Sci. Tech. OIE. 32, 199205.

ro

of

Allen, A., Breadon, E., Byrne, A., Mallon, T., Skuce, R., Groussaud, P., Dainty, A., Graham, J., Jones, K., Pollock, L., Whatmore, A., 2015. Molecular Epidemiology of Brucella abortus in Northern Ireland-1991 to 2012. PLoS One 10, e0136721.

-p

Alton, G.G., Jones, L.M., Angus, R.D., Verger, J.M., 1988. Techniques for the brucellosis laboratory. Paris Institute National de la Recherche Agronomique.

re

Alton, G.G., Forsyth J.R.L., 1996. Medical Microbiology, Fourth ed, University of Texas, Galveston.

na

lP

Alvarez, J., Saez, J.L., Garcia, N., Serrat, C., Perez-Sancho, M., Gonzalez, S., Ortega, M.J., Gou, J., Carbajo, L., Garrido, F., Goyache, J., Dominguez, L., 2011. Management of an outbreak of brucellosis due to B. melitensis in dairy cattle in Spain. Res. Vet. Sci. 90, 208-211.

ur

Baym, M., Kryazhimskiy, S., Lieberman, T.D., Chung, H., Desai, M.M., Kishony, R., 2015. Inexpensive multiplexed library preparation for megabase-sized genomes. PLoS One 10, e0128036.

Jo

BBC Northern Ireland, 2011. Armagh 'brucellosis incident' under investigation. https://www.bbc.co.uk/news/uk-northern-ireland-12650613 (accessed September 2019) Belfast Telegraph, 2010. Top-level talks on criminals behind cattle disease. https://www.belfasttelegraph.co.uk/news/northern-ireland/toplevel-talks-on-criminals-behindcattle-disease-28562730.html (accessed September 2019) Biek, R., O'Hare, A., Wright, D., Mallon, T., McCormick, C., Orton, R.J., McDowell, S., Trewby, H., Skuce, R.A., Kao, R.R., 2012. Whole Genome Sequencing Reveals Local Transmission Patterns of Mycobacterium bovis in Sympatric Cattle and Badger Populations. PLoS Pathog. 8, e1003008. Biek, R., Pybus, O.G., Lloyd-Smith, J.O., Didelot, X., 2015. Measurably evolving pathogens in the genomic era. Trends Ecol. Evol. 30, 306-313.

15

Journal Pre-proof Bivand, R., Keitt, T., Rowlingston, B., Pebesma, E., Summer, M., Hijmans, R., Rouault, E., Warmerdam, F., Ooms, J., Rundel, C., 2019. rgdal: Bindings for the 'Geospatial' Data Abstraction Library. https://cran.rproject.org/web/packages/rgdal/index.html (accessed September 2019) Bricker, B.J., Ewalt, D.R., 2006. HOOF prints: Brucella strain typing by PCR amplification of multilocus tandem-repeat polymorphisms. Methods in molecular biology (Clifton, N.J.) 345, 141173.

of

CDC – Centres for Disease Control and Prevention, 2019. 8 Zoonotic Diseases Shared Between Animals and People of Most Concern in the US. https://www.cdc.gov/media/releases/2019/s0506-zoonotic-diseases-shared.html Acccessed September 2019

-p

ro

Diaz Aparicio, E., 2013. Epidemiology of brucellosis in domestic animals caused by Brucella melitensis, Brucella suis and Brucella abortus. Rev. Sci. Tech. OIE. 32, 43-51, 53-60.

re

Edwards, D.B., Pope, B., Holt, K., 2016. RedDog mapping based phylogenomics pipeline. https://github.com/katholt/RedDog (accessed September 2019)

lP

Firth, C., Kitchen, A., Shapiro, B., Suchard, M.A., Holmes, E.C., Rambaut, A., 2010. Using timestructured data to estimate evolutionary rates of double-stranded DNA viruses. Mol. Biol. Evol. 27, 2038-2051.

na

Gardy, J.L., Loman, N.J., 2018. Towards a genomics-informed, real-time, global pathogen surveillance system. Nat. Rev. Genet. 19, 9-20.

Jo

ur

Garofolo, G., Di Giannatale, E., Platone, I., Zilli, K., Sacchini, L., Abass, A., Ancora, M., Camma, C., Di Donato, G., De Massis, F., Calistri, P., Drees, K.P., Foster, J.T., 2017. Origins and global context of Brucella abortus in Italy. BMC Microbiol. 17, 28. Gyuranecz, M., Rannals, B.D., Allen, C.A., Janosi, S., Keim, P.S., Foster, J.T., 2013. Within-host evolution of Brucella canis during a canine brucellosis outbreak in a kennel. BMC Vet. Res. 9, 76. Harris, S.R., Cartwright, E.J., Torok, M.E., Holden, M.T., Brown, N.M., Ogilvy-Stuart, A.L., Ellington, M.J., Quail, M.A., Bentley, S.D., Parkhill, J., Peacock, S.J., 2013. Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study. Lancet Infect. Dis. 13, 130-136. Houston, R., 2001. A computerised database system for bovine traceability. Rev. Sci. Tech. OIE. 20, 652-661. Ica, T., Aydin, F., Erdenlig, S., Guler, L., Buyukcangaz, E., 2008. Characterisation of Brucella abortus biovar 3 isolates from Turkey as biovar 3b. Vet. Rec. 163, 659-661.

16

Journal Pre-proof Jackman, S., Tahk, A., Zeillis, A., Maimone, C., Fearon, J., Meers, Z., 2017. pscl - political science computational laboratory. https://cran.r-project.org/web/packages/pscl/index.html (accessed September 2019) Kamath, P.L., Foster, J.T., Drees, K.P., Luikart, G., Quance, C., Anderson, N.J., Clarke, P.R., Cole, E.K., Drew, M.L., Edwards, W.H., Rhyan, J.C., Treanor, J.J., Wallen, R.L., White, P.J., RobbeAusterman, S., Cross, P.C., 2016. Genomics reveals historic and contemporary transmission dynamics of a bacterial disease among wildlife and livestock. Nat. Commun. 7, 11448. Langmead, B., Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359.

-p

ro

of

Lapraik, R.D., Moffat, R., 1982. Latent bovine brucellosis. The Veterinary record 111, 578-579. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 20782079.

lP

re

Mellmann, A., Harmsen, D., Cummings, C.A., Zentz, E.B., Leopold, S.R., Rico, A., Prior, K., Szczepanowski, R., Ji, Y., Zhang, W., McLaughlin, S.F., Henkhaus, J.K., Leopold, B., Bielaszewska, M., Prager, R., Brzoska, P.M., Moore, R.L., Guenther, S., Rothberg, J.M., Karch, H., 2011. Prospective Genomic Characterization of the German Enterohemorrhagic Escherichia coli O104:H4 Outbreak by Rapid Next Generation Sequencing Technology. PLoS ONE 6, e22751.

na

Moreno, E., 2014. Retrospective and prospective perspectives on zoonotic brucellosis. Front. Microbiol. 5.

ur

Moreno, E., Cloeckaert, A., Moriyon, I., 2002. Brucella evolution and taxonomy. Vet. Microbiol. 90, 209-227.

Jo

Northern Ireland Executive, 2010. DNA science will help detect those who tried to infect herd Gildernew. Department of Agricluture and Rural Development, January 2010 news release. https://wayback.archiveit.org/11112/20150609181839/http:/www.northernireland.gov.uk/inde x/media-centre/news-departments/news-dard/news-dard-january-2010/news-dard-070110dna-science-will.htm (accessed September 2019) O'Brien, M.P., Beja-Pereira, A., Anderson, N., Ceballos, R.M., Edwards, W.H., Harris, B., Wallen, R.L., Costa, V., 2017. Brucellosis Transmission between Wildlife and Livestock in the Greater Yellowstone Ecosystem: Inferences from DNA Genotyping. J. Wildl. Dis. 53, 339-343. Ocampo-Sosa, A.A., Aguero-Balbin, J., Garcia-Lobo, J.M., 2005. Development of a new PCR assay to identify Brucella abortus biovars 5, 6 and 9 and the new subgroup 3b of biovar 3. Vet. Microbiol. 110, 41-51. Ogredici, O., Erb, S., Langer, I., Pilo, P., Kerner, A., Haack, H.G., Cathomas, G., Danuser, J., Pappas, G., Tarr, P.E., 2010. Brucellosis reactivation after 28 years. Emerg. Infect. Dis. 16, 20212022.

17

Journal Pre-proof

OIE, 2014. Bovine Brucellosis, Manual of diagnostic tests and vaccines for terrestrial animals. Olsen, S.C., Palmer, M.V., 2014. Advancement of knowledge of Brucella over the past 50 years. Vet. Pathol. 51, 1076-1089. Pearson, T., Okinaka, R.T., Foster, J.T., Keim, P., 2009. Phylogenetic understanding of clonal populations in an era of whole genome sequencing. Infect. Genet. Evol. 9, 1010-1019.

of

Pebesma, E., Bivand, R., Rowlingston, B., Gomez-Rubio, V., Hijmans, R., Summer, M., MacQueen, D., Lemon, J., O'Brien, J., O'Rourke, J., 2018. sp: Classes and Methods for Spatial Data. https://cran.r-project.org/web/packages/sp (accessed September 2019)

ro

Poester, F.P., Samartino, L.E., Santos, R.L., 2013. Pathogenesis and pathobiology of brucellosis in livestock. Rev. Sci. Tech. OIE. 32, 105-115.

-p

R Development Core Team, 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

lP

re

Ragan, V., Vroegindewey, G., Babcock, S., 2013. International standards for brucellosis prevention and management. Rev. Sci. Tech. OIE. 32, 189-198.

na

Rambaut, A., 2018. Figtree v1.4.4. https://github.com/rambaut/figtree/releases (accessed September 2019)

ur

Rambaut, A., Lam, T.T., Max Carvalho, L., Pybus, O.G., 2016. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2, vew007.

Jo

Sanjuan, R., Nebot, M.R., Chirico, N., Mansky, L.M., Belshaw, R., 2010. Viral Mutation Rates. J. Virol. 84, 9733-9748. Schliep, K.P., 2011. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592-593. Scholz, H.C., Banaj, M., Cloeckaert, A., Kämpfer, P., Whatmore, A.M., 2018. Brucella, in: W.B. Whitman, F.R., P. Kämpfer, M. Trujillo, J. Chun, P. DeVos, B. Hedlund and S. Dedysh. (Ed.), Bergey's Manual of Systematics of Archaea and Bacteria. John Wiley and Sons. Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313. Stringer, L.A., Guitian, F.J., Abernethy, D.A., Honhold, N.H., Menzies, F.D., 2008. Risk associated with animals moved from herds infected with brucellosis in Northern Ireland. Prev. Vet. Med. 84, 72-84. Trewby, H., Wright, D., Breadon, E.L., Lycett, S.J., Mallon, T.R., McCormick, C., Johnson, P., Orton, R.J., Allen, A.R., Galbraith, J., Herzyk, P., Skuce, R.A., Biek, R., Kao, R.R., 2016. Use of

18

Journal Pre-proof bacterial whole-genome sequencing to investigate local persistence and spread in bovine tuberculosis. Epidemics 14, 26-35. USDA – United States Department of Agriculture, 2019. Facts about brucellosis. https://www.aphis.usda.gov/animal_health/animal_diseases/brucellosis/downloads/brucfacts.pdf (accessed September 2019) van Soolingen, D., de Haas, P.E., Kremer, K., 2001. Restriction fragment length polymorphism typing of mycobacteria. Methods Mol. Med. 54, 165-203.

of

Volz, E.M., Kosakovsky Pond, S.L., Ward, M.J., Leigh Brown, A.J., Frost, S.D.W., 2009. Phylodynamics of Infectious Disease Epidemics. Genetics 183, 1421-1430.

-p

ro

Walker, T.M., Ip, C.L., Harrell, R.H., Evans, J.T., Kapatai, G., Dedicoat, M.J., Eyre, D.W., Wilson, D.J., Hawkey, P.M., Crook, D.W., Parkhill, J., Harris, D., Walker, A.S., Bowden, R., Monk, P., Smith, E.G., Peto, T.E., 2013. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect. Dis. 13, 137-146.

lP

re

Wattam, A.R., Foster, J.T., Mane, S.P., Beckstrom-Sternberg, S.M., Beckstrom-Sternberg, J.M., Dickerman, A.W., Keim, P., Pearson, T., Shukla, M., Ward, D.V., Williams, K.P., Sobral, B.W., Tsolis, R.M., Whatmore, A.M., O'Callaghan, D., 2014. Comparative phylogenomics and evolution of the Brucellae reveal a path to virulence. J. Bacteriol. 196, 920-930.

na

Whatmore, A.M., 2009. Current understanding of the genetic diversity of Brucella, an expanding genus of zoonotic pathogens. Infect. Genet. Evol. 9, 1168-1184.

ur

Whatmore, A.M., 2014. Ancient-Pathogen Genomics: Coming of Age? mBio 5, e01676-01614e01676-01614.

Jo

Whatmore, A.M., Koylass, M.S., Muchowski, J., Edwards-Smallbone, J., Gopaul, K.K., Perrett, L.L., 2016. Extended Multilocus Sequence Analysis to Describe the Global Population Structure of the Genus Brucella: Phylogeography and Relationship to Biovars. Front. Microbiol. 7, 2049. Whatmore, A.M., Shankster, S.J., Perrett, L.L., Murphy, T.J., Brew, S.D., Thirlwall, R.E., Cutler, S.J., MacMillan, A.P., 2006. Identification and characterization of variable-number tandemrepeat markers for typing of Brucella spp. J. Clin. Microbiol. 44, 1982-1993. WHO – World Health Organisation, 2006. Brucellosis in humans and animals. https://www.who.int/csr/resources/publications/Brucellosis.pdf (accessed September 2019)

19

Journal Pre-proof

ur

na

lP

re

-p

ro

of

We genome sequenced isolates from a recrudescent outbreak of Brucella abortus. Two major pathogen lineages present. One was linked to a prior epidemic. Genetic diversity was low, with little spatio temporal signal observed. Previous VNTR method had suggested more genetic sub-structure than was present. WGS data has limited use for epizoology of low diversity, recrudescent infections.

Jo

    

20

Journal Pre-proof

Genomic epizootiology of a Brucella abortus outbreak in Northern Ireland (1997–2012). Reference - MEEGID-D-19-00819 Credit Author Statement

Adrian Allen and Jeffrey Foster secured funding for the study. Adrian Allen, Jeffrey Foster, Adrian Whatmore and Robin Skuce designed the study.

of

Adrian Allen, Georgina Milne, Jeffrey Foster, Adrian Whatmore, Kevin Drees, and Robin Skuce wrote the manuscript.

ro

Adrian Allen, Georgina Milne, Jordon Graham and Paul McAdam performed statistical analyses and data curation.

re

-p

Adrian Allen, Eleanor Presho, Lorraine Wright and Kerri Jones performed all laboratory work.

Jo

ur

na

lP

Judith Graham supplied epidemiological information and veterinary advice.

21

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5