Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada

Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada

Infection, Genetics and Evolution xxx (2014) xxx–xxx Contents lists available at ScienceDirect Infection, Genetics and Evolution journal homepage: w...

2MB Sizes 0 Downloads 26 Views

Infection, Genetics and Evolution xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid

Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada Adam Chernick a,⇑, Dale L. Godson b, Frank van der Meer a a b

Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N4N1, Canada Prairie Diagnostic Services, University of Saskatchewan, 52 Campus Drive, Saskatoon, SK S7N5B4, Canada

a r t i c l e

i n f o

Article history: Available online xxxx Keywords: BVDV Phylodynamics Bayesian Evolution Most common recent ancestor Model selection

a b s t r a c t Bovine viral diarrhea virus (BVDV) has been recognized as an important pathogen of livestock in Canada. The high mutation rate of this virus leads to a great degree of diversity between isolates resulting in the ability to infer precise evolutionary relationships. Many studies have attempted to elucidate the regional and global evolution of BVDV, but so far few have applied Bayesian methods to this end. We aimed to describe the molecular epidemiology and phylodynamics of BVDV 1a isolates in Western Canada using 50 UTR and E1–E2 sequence data, collection dates and locations. Sequences were obtained from isolates submitted to a diagnostic laboratory in Saskatoon, Canada. Path sampling and stepping stone model testing were employed to identify the model that best fit the data. We found that these Western Canadian isolates share a most recent common ancestor dated near 1909. Furthermore, the E1–E2 region shows a median substitution rate about ten times greater than the 50 UTR. It was also noted that caution should be exercised when inferring phylogenetic relationships using the 50 UTR alone, as it becomes difficult to resolve relationships within major clades. Phylogeographic and population size fluctuation estimates require more thorough sampling than was performed here to be reliable. We have found that there are significant gains to be made by utilizing a Bayesian analysis and by incorporating additional types of data beyond the sequence. These include the estimation of most common recent ancestor dates and the precise inference of transmission routes. Future work will expand upon these findings by more thoroughly sampling BVDV isolates spatially and temporally and further refining the Bayesian model employed here. Ó 2014 Elsevier B.V. All rights reserved.

1. Introduction Bovine viral diarrhea virus (BVDV) is a Flavivirus infecting cattle in Canada (Taylor et al., 1995). This single stranded RNA virus is responsible for significant economic losses (Houe, 1999) due to its production limiting effects (Campbell, 2004) and has been reported to occasionally cause severe hemorrhagic outbreaks with high mortality (Carman et al., 1998; Taylor et al., 1997). In North America, genotypes 1a, 1b and 2a are known to circulate widely with effectively all being the noncytopathic (ncp) biotype except in cases of mucosal disease where a superinfection with a cytopathic (cp) biotype causes a lethal outcome (Fulton et al., 2005; Peterhans et al., 2010). To combat the significant costs associated with BVDV infections, vaccination has become a routine practice in many beef and dairy operations, although its application is often inconsistent (Carruthers and Petrie, 1996). Further study of how this important pathogen changes genetically over time using phylodynamic techniques can foster the development of more efficacious vaccines, increase insight into transmission routes of ⇑ Corresponding author. Tel.: +1 403 220 7892. E-mail addresses: [email protected], [email protected] (A. Chernick).

pathogens, provide tools to establish effective eradication programs and give new insight into the basic evolutionary biology of viruses. High mutation rates are a key driver of a quasispecies distribution (Domingo et al., 2012; Eigen, 1971) within BVDV genotypes (Dow et al., submitted). Hepatitis C virus, also a member of the Flavivirus genus, exhibits mutation rates around 104 substitutions per site per replication (Cuevas et al., 2009). The resulting swarm of genetically unique viruses dramatically increases the ability of BVDV to adapt to rapidly changing environmental conditions, particularly within a single host (Collins et al., 1999; Jones et al., 2002). While this extensive diversity has important implications for infection control methods (such as vaccine development), it is also useful for molecular epidemiology and phylodynamic studies. A brief overview of viral phylodynamics is provided as it is thoroughly reviewed elsewhere (Volz et al., 2013). The rapid diversification of RNA genomes over relatively short periods of time allows for precise inference of phylogenetic relationships over time scales ranging from days (Fordyce et al., 2013) to centuries (Liu et al., 2009). For example, it is possible to estimate the date of the most recent common ancestor (MRCA) that is shared by all sequences in the analysis. This inference requires the estimation of mutation

1567-1348/$ - see front matter Ó 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.meegid.2014.01.003

Please cite this article in press as: Chernick, A., et al. Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada. Infect. Genet. Evol. (2014), http://dx.doi.org/10.1016/j.meegid.2014.01.003

2

A. Chernick et al. / Infection, Genetics and Evolution xxx (2014) xxx–xxx

rates with an appropriate nucleotide substitution model. In the process of MRCA date estimation a phylogeny is also constructed. Using this phylogeny it is possible to infer other epidemiological properties such as historical fluctuations in population dynamics. For example, one can infer fluctuations in the effective population size over the phylogenetic tree (Drummond et al., 2005). Phylodynamics can also allow for the study of the spatial dispersion of ancestral viral strains to their contemporary locations in a geographical area by integrating viral collection dates and geographical collection locations into the analysis (Lemey et al., 2009). Only one such study has been previously reported for BVDV (Luzzago et al., 2012). It is not only possible to infer the history of the virus itself, but also, due to the close link between pathogen and host, the history of the host as well. Major geographical movements of the virus could be indicative of key migratory or transport routes followed by the host species. In the case of BVDV, it is likely that the movements of the virus mirror those of cattle due to the need for direct contact to facilitate viral transmission (Wentink et al., 1991), although indirect transmission routes are possible (Niskanen and Lindberg, 2003). Recent improvements in how we specify molecular clocks and nucleotide substitution models have increased the accuracy of phylodynamic inference. For example, evaluating the differences between sequences to determine a most recent common ancestor (MRCA) date is not a new concept (Zuckerkandl and Pauling, 1962), but modern refinements to the methods used have made it increasingly accurate. One such refinement is the use of different clock rates for different regions of a genome. This well established methodology accommodates for variation in mutation rates across a genome (Gibbs et al., 1998). Furthermore, a number of techniques to evaluate the fit of different models have also been developed recently, thus facilitating the construction of models that better fit the data (Baele et al., 2012). The use of extra information such as viral isolation dates can also be useful to estimate molecular clock rates accurately. Other additional data can be useful in further refining these sorts of analyses as well. Such metadata can include collection locations, clinical outcomes, tissues sampled and so on, depending on the nature of the question being asked. These data are then used to further inform a phylogenetic analysis and, in some cases, resolve ambiguities that can appear when the genetic sequence data is used in isolation. For example, if there is reason to believe that viruses with very difference clinical outcomes have distinct clades on a phylogenetic tree, an analysis can be constrained so that that clustering pattern is preserved. The ability to integrate this sort of prior knowledge is one of the key strengths of Bayesian methods. With respect to BVDV, the 50 untranslated region (50 UTR) region is a non-coding region of the genome. It has been used extensively (alone and occasionally with other gene regions) in phylogenetic analyses of BVDV. As of October 29, 2013 a search of NCBI PubMed using the term ‘‘(BVDV) and (phylogen*) and (50 UTR)’’ returned 76 results. The widespread use of the 50 UTR as a target for sequencespecific diagnostic methods and the resulting availability of this sequence data make it an obvious choice for phylogenetic work. However, it is not entirely clear what selective pressures are exerted on this region of the genome or even if the inferred relationships between 50 UTR sequences are representative of the overall BVDV phylogeny. It is also well documented that different regions of the BVDV genome mutate at different rates with the 50 UTR being one of the most stable (Hamers et al., 2001). The evolutionary pressures acting on other regions such as E2, a structural surface glycoprotein under intense immune surveillance, are much better understood (Brown et al., 2005). Additionally, the higher variability of a region such as E2 allows for a much higher resolution phylogeny over shorter time scales to be reconstructed compared to the relatively stable 50 UTR region. While phylogenies built on the

50 UTR sequence alone are still very informative (particularly over longer time scales), the use of additional sequence data in conjunction with metadata such as isolation dates can further clarify these complex relationships. This study aimed to examine the evolution of BVDV subgenotype 1a in Western Canada from 1999 to 2013. A phylodynamic analysis was undertaken to describe the genetic change, population dynamics, MCRA date and geographic movement of the virus using sequence data from the 50 UTR and E1–E2 regions, isolate collection dates and approximate collection locations. Our primary aim is to demonstrate the added value of Bayesian methods (Drummond et al., 2012; Heled and Drummond, 2008; Lemey et al., 2009) over other commonly employed tree-building techniques. For this reason we built neighbor-joining trees using the same data as a comparison. We also aim to demonstrate the importance of gene selection and the added benefit of sequencing multiple loci for such an analysis. In doing so this research provides a better understanding of the historical genetic dynamics of BVDV in Western Canada as well as arguing for the consideration of alternate methods of phylogenetic inference.

2. Materials and methods 2.1. Isolate and sequence preparation Twenty-six viral isolates were obtained from Prairie Diagnostic Services in Saskatoon, Saskatchewan. Dates of laboratory submission were used as an estimate of the actual collection dates and the postal codes of the submitting veterinary clinic were used as an estimate of the actual collection locations. The postal codes were geocoded to latitudes and longitudes using batchgeo.com. See Supplementary Table 1 for the dates and geocoded locations of all isolates. See Fig. 1 for a map of these locations. Regions of the 50 UTR and E1–E2 genes were amplified by reverse transcriptase PCR (RT-PCR) as described by Ridpath and Bolin (1998), Couvreur et al. (2002), respectively. Fragments were amplified using the Takara PrimeScript One Step RT-PCR Kit (Clontech, Mountain View, CA, USA) in a reaction volume of 25 lL. Thermal cycler conditions for the 50 UTR were 50 °C for 30 min, 94 °C for 2 min followed by 40 cycles of 98 °C for 10 s, 52 °C for 30 s and 72 °C for 1 min with a final extension at 72 °C for 7 min. Thermal cycler conditions for E1–E2 were 50 °C for 30 min, 94 °C for 2 min followed by 35 cycles of 94 °C for 30 s, 55 °C for 1 min and 72 °C for 1 min with a final extension at 72 °C for 7 min. PCR fragments were sent for Sanger sequencing using the forward primers used in their synthesis (Eurofins MWG Operon, Huntsville, AL, USA). The 50 UTR primers amplified a region from bases 165 to 388 on the NADL genome (GenBank: M31182) based on our sequence alignments. The E1–E2 fragments were from bases 2312 to 2708 on the same genome. All sequence reads were visually inspected for quality and edited where necessary. All sequences of the same gene fragment were aligned using MUSCLE (Edgar, 2004) in Geneious v6.1.6 (Biomatters, San Francisco, CA, USA). Alignments were visually inspected and corrected where necessary. In the case of the E1–E2 fragment the amino acid translation was used to correct both the alignment as well as any sequencing errors. The alignment was trimmed on the 50 end to the first full codon and then on the 30 end to make the alignment a uniform length. All sequences and metadata are available in GenBank (accession numbers GenBank: KF834209 – GenBank: KF834234 for 50 UTR and GenBank: KF834235 – GenBank: KF834260 for E1–E2). Isolates were genotyped by alignment of the 50 UTR sequences with the 50 UTR regions of a number of GenBank sequences of previously reported genotypes (1373 (GenBank: AF145967), type 2;

Please cite this article in press as: Chernick, A., et al. Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada. Infect. Genet. Evol. (2014), http://dx.doi.org/10.1016/j.meegid.2014.01.003

A. Chernick et al. / Infection, Genetics and Evolution xxx (2014) xxx–xxx

3

Fig. 1. Map of the geocoded locations of all isolates used in this analysis. Note that several points on the map (colored green) actually represent multiple isolates obtained from the same postal code. These are shown as a single point for clarity. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

NADL (GenBank: M31182), type 1a; Osloss (GenBank: M96687), type 1b). A neighbor-joining tree was constructed to confirm that the sequences used in our analysis all clustered with type 1a reference strains and not types 1b or 2 (data not shown). 2.2. Likelihood mapping Likelihood mapping was performed to assess the phylogenetic content of these alignments separately without the use of the additional metadata. TreePuzzle v5.2 (Schmidt et al., 2002) was used to produce 10,000 random quartets of the sequences that were then resolved via maximum likelihood methods to assess the strength of the phylogenetic signal (Strimmer and von Haeseler, 1997). The resulting likelihoods were plotted on a triangular surface were the points represented the three well resolved phylogenies of each quartet and the areas in between represent either network-like relationships (sides of triangle) or star-like relationships (center). 2.3. Substitution models and molecular clocks Alignments were exported to BEAUti v1.7.5 and BEAST v1.7.5 (Drummond et al., 2012). BEAUti v1.7.5 was used to specify parameters of the analysis. Separate data partitions were created for each gene fragment to allow for separate substitution models and molecular clocks to be applied. The latitude and longitude of the collection locations were used to create the third partition to infer the locations of ancestral nodes on the tree. All tips were dated with their exact collection dates. Stepping stone and path sampling model testing were performed to determine which set of model parameters best fit the data (Baele et al., 2012). HKY (Hasegawa et al., 1985), TN93 (Tamura and Nei, 1993) and GTR (Tavare, 1986) nucleotide substitution models with either invariant sites

or gamma (four discrete categories) rate distributions were evaluated for each gene as well as four different tree priors. These included a coalescent constant population size (Kingman, 1982), coalescent exponential population growth (Griffiths and Tavare, 1994), extended Bayesian skyline plot (Heled and Drummond, 2008) and GMRF Bayesian skyride model (Minin et al., 2008). All 144 permutations of these variables were tested. The E1–E2 sequence was also partitioned into 3 separate codon positions to allow for different relative substitution rates at positions 1, 2 and 3 throughout the sequence. The gamma relaxed random walk model was used for the location partition (Lemey et al., 2010). This model attempts to infer the locations of ancestral sequences based on the branching pattern of the tree. Some location coordinates were moved within a few hundred meters of their original locations to accommodate the model’s intolerance of multiple isolates with differing genetic sequences occupying exactly the same location. This applies to any isolates with the same symbols in Fig. 3 (or the green symbols of the map in Fig. 1). Separate strict molecular clocks were used for each gene sequence to account for the different relationships between time and nucleotide substitutions in these different gene regions. Markov chain Monte Carlo (MCMC) lengths were set to 50 million steps in all cases and sampled so as to produce 10,000 log entries and trees. 2.4. BEAST output analysis BEAST log files were inspected in Tracer v1.5 to ensure an adequate effective sampling size (ESS) and mixing indicating that the evaluation had reached convergence while adequately sampling the range of possible model parameters. For the variables of interest an ESS of >200 was considered adequate. The trees file was summarized using TreeAnnotator v1.7.5 to produce a maximum

Please cite this article in press as: Chernick, A., et al. Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada. Infect. Genet. Evol. (2014), http://dx.doi.org/10.1016/j.meegid.2014.01.003

4

A. Chernick et al. / Infection, Genetics and Evolution xxx (2014) xxx–xxx

clade credibility (MCC) tree with an initial burn in of 10% and median tree heights. Trees were then prepared in FigTree v1.3.1 (Rambaut, 2013). The MCC tree file was also processed using SPREAD v1.0.6 (Bielejec et al., 2011) to produce a keyhole markup language (KML) file readable in Google Earth (Google, Mountain View, CA, USA). The inferred population size from the extended Bayesian skyline plot was graphed for analysis.

alignment and an extended Bayesian skyline plot for the tree prior. For this model, path sampling indicated an increase in the log likelihood from the next best model of 1.78 and stepping stone sampling indicated an increase of 2.03. Both approaches to model selection agreed that this model fit the data best.

2.5. Neighbor-joining trees

The BEAST analysis using the best fitting model indicated a median root height of 104.03 years with a 95% highest probability density (HDP95) of 65.42–162.82. This equates to a most common recent ancestor dated in the year 1909.05 (HPD95 1850.26– 1947.66). Median clock rates in substitutions per site per year for the 50 UTR and E1–E2 regions were 5.90  104 (HPD95 2.80  104 9.66  104) and 1.26  103 (HPD95 7.05  1041.93  103), respectively. Codon partitioning of the E1–E2 alignment indicated a much higher relative rate parameter for the third codon position (median 1.54 HPD95 1.28–1.80) than the first (median 0.68 HPD95 0.48–0.89) or second (median 0.77 HPD95 0.52–1.05) positions. The MCC tree constructed from all BEAST trees is shown in Fig. 3. There are four main clusters defined by two nodes directly descendant from the root of the tree estimated to have occurred in 1925.72 and 1922.95. There is also a large extent of clustering associated with the postal code of origin (see the symbols adjacent to the isolate names). Neighbor-joining trees depict the inferred relationships using the genetic sequences alone. See Figs. 4 and 5 for the 50 UTR and E1–E2 based trees, respectively. All branches of these trees have been rotated so as to best match the topology of the MCC tree generated using BEAST. It should be noted that the general topology of these trees is quite similar to one another as well as to the MCC tree. However, there are several polytomies present in addition to numerous poorly supported branches, particularly in the 50 UTR tree. Another key difference to note is the different scales used. Those on the neighbor-joining trees are in nucleotide substitutions while the one on the MCC tree is in years. Between the neighborjoining trees there is also a significant difference in the scale stemming from the increased variability of the E1–E2 alignment relative to the 50 UTR alignment.

Neighbor-joining trees were constructed using the same model selected for the Bayesian approach. These trees are based on either alignment alone and do not include any additional metadata. They are intended to act as a comparison between a commonly employed method of tree building and the methods used here. 3. Results 3.1. Likelihood mapping The likelihood maps of the 50 UTR and E1–E2 regions are presented in Fig. 2. In the case of the 50 UTR map, 83.4% of all quartets were well resolved into tree like structures with only 16.2% having star-like phylogenies. Similarly for the E2 region, 88.7% were located at the points of the plot with only 7.0% in the center. This is indicative of a high amount of phylogenetic content in both cases, although somewhat more in the case of the E1–E2 alignment. 3.2. Model testing The evaluation of 144 different combinations of model choices showed that certain models (a) were more computationally tractable than others and (b) produced higher marginal log likelihoods than others. Models using some combination of HKY and TN93 substitution models always converged well and had ESS values >200 for all key parameters. Models where one or both sequences were evaluated with a GTR substitution model did not converge within 50 million steps and, as a result, had ESS values <200 for many parameters. The lack of convergence after so many steps indicates the need for further optimization of this substitution model to this dataset. For this reason the GTR model was not considered for use here. Of all of the combinations not using the GTR model, the one with the highest log marginal likelihoods used the TN93 model with a gamma distribution of rates for the 50 UTR alignment, the HKY model with invariant sites for the E1–E2

3.3. Phylodynamic analysis and trees

3.4. Population dynamics and phylogeography The best fitting model used a coalescent extended Bayesian skyline tree prior to describe the population dynamics going back in time. While this prior was the best fit to the data, the estimates of historic effective population size change are not presented here

Fig. 2. Likelihood map of the random quartets assessed by TreePuzzle v.5.2 for the 50 UTR (left) and E1–E2 (right) gene regions. The dots each represent the plotted likelihood of one resolved quartet while the percentages describe the proportion of quartets in each region of the triangle. Quartets falling in the corners represent well-resolved quartets while those on the edges represent network-like trees. Those in the middle represent star-like trees. These figures indicate that there is a stronger, more tree-like phylogenetic signal in the E1–E2 alignment than the 50 UTR alignment.

Please cite this article in press as: Chernick, A., et al. Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada. Infect. Genet. Evol. (2014), http://dx.doi.org/10.1016/j.meegid.2014.01.003

A. Chernick et al. / Infection, Genetics and Evolution xxx (2014) xxx–xxx

5

Fig. 3. MCC tree produced with the BEAST v1.7.5 output using TreeAnnotator v1.7.5. Branches are shaded based on the posterior probability (grey = 0, black = 1). Nodes are labeled with the median estimated year of divergence for those clades. In cases where multiple isolates were contributed from the same postal code the names were labeled with matching symbols. This tree utilizes the sequence alignments for both the 50 UTR (with a TN93 model) and E1–E2 (with an HKY model) as well as the dates noted in the isolate names. Phylogeographic transmission patters were also inferred from this tree using SPREAD v1.0.6 (not shown).

due to the small number of isolates analyzed and the potentially significant sampling bias associated with diagnostic laboratory submissions. Similarly, the phylogeographic inference did not produce reliable results and is not presented. 4. Discussion A key consideration when constructing a phylogeny is the selection of an appropriate gene region(s). Although more conserved regions with less variability may be useful for inferring more distant relationships they often lack the required diversity to resolve more recent evolutionary events. Several publications have indicated the danger of inferring BVDV phylogenies using the 50 UTR alone (Becher et al., 1997; Xia et al., 2007). The main pitfall also noted in this study is that the 50 UTR region lacks the diversity to clearly infer relationships within major clades. The relationship between isolates based on the 50 UTR region remained unresolved in many cases. In fact, many of the isolates share identical sequences that resulted in uninformative polytomies or branches that are poorly supported via bootstrap analysis. This improved with the use of the E1–E2 region where the increased sequence length and diversity resulted in fewer cases of polytomy and better bootstrap support of branches leading to a more resolved tree. The improvement seen here was not trivial and supports the conclusion that the addition of other gene regions and longer gene segments to a phylogenetic analysis should always be considered. Bayesian approaches in phylogenetic analyses of BVDV have been used to simultaneously evaluate a number of gene regions (Liu et al., 2009; Xia et al., 2007) and to include additional data types (Luzzago et al., 2012). The Bayesian analysis performed here was an extension of these methods applied to BVDV type 1a isolates from Western Canada. The estimates of substitution rates produced in our analysis are consistent with those produced in a

previous Bayesian analysis of BVDV. Luzzago et al. (2012) found a mean rate of 9.3  103 substitutions per site per year in the 50 UTR. This estimate falls well within the HPD95 of our estimate. Brown et al. (2005) found intra-patient mutation rates for the E1–E2 gene region of hepatitis C to be in a similar range as our estimate for the region as well. Our finding of a median year of 1909 for the MRCA is consistent with early reports of BVDV. These were first published in 1946 and the virus was presumably in circulation for some time prior to that (Childs, 1946; Olafson et al., 1946). Indeed, the introduction of BVDV to Western Canada could not have happened much prior to this date seeing as cattle were not present in any significant numbers until the late 1800’s (Government of Alberta, 2013). Although the HPD95 from our analysis was quite large (1850–1948), more thorough sampling over a larger time frame would improve the accuracy of this estimate. Additionally, the higher relative substitution rate of the third codon position relative to the first and second is consistent with the wobble hypothesis (Crick, 1966). 4.1. Model selection Selection of the appropriate substitution model and tree priors for a Bayesian analysis is crucial. These parameters define the underlying assumptions of the phylogenetic inference and should reflect one’s prior knowledge while being as uninformed as possible with respect to aspects where there is little prior knowledge. In this study the more computationally demanding stepping stone and path sampling (Baele et al., 2012) model selection methods were chosen for their increased accuracy over the harmonic mean estimator (Newton and Raftery, 1994). They were applied to select both tree priors and nucleotide substitution models. Because tree priors account for changes in the population size over the inferred phylogeny and there is little prior knowledge regarding the popu-

Please cite this article in press as: Chernick, A., et al. Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada. Infect. Genet. Evol. (2014), http://dx.doi.org/10.1016/j.meegid.2014.01.003

6

A. Chernick et al. / Infection, Genetics and Evolution xxx (2014) xxx–xxx

Fig. 4. Neighbor-joining tree based on the same 50 UTR sequence alignment used to construct the MCC tree in Fig. 2. A TN93 substitution model was used (the same as chosen via the model selection procedure) and the branches are shaded based on their bootstrap support values (grey = 0, black = 1).

lation size fluctuations of BVDV, several were evaluated to ensure the best fitting model was identified. The GTR substitution model was initially considered, particularly due to its use in similar work (Luzzago et al., 2012). However, the greatly increased number of parameters led to difficulties in reaching convergence of those analyses; therefore they were not considered for the final analysis. Further optimization of priors and longer runs may resolve this issue. However, while it is entirely possible that another analysis including the GTR model would have fit the data better than the one presented here, it is not likely to have significantly altered the outcomes of this study. This stems from a comparison of the MCC trees from other model permutations that still converged but did not have the best fit to the data (not shown). Most trees contained very similar branching patterns with the estimated divergence dates usually having overlapping HPD95 intervals. It appears that, with this data, the phylogenetic signal is strong enough to largely overpower changes to the nucleotide substitution models and tree priors. Although a more thorough assessment of the GTR model is needed for future work it is unlikely to have significantly influenced the phylogeny presented here. 4.2. Phylogeography and population dynamics The gamma relaxed random walk model was included and attempted to infer the physical geographical coordinates of the ancestral nodes (Lemey et al., 2010). The ‘‘relaxed’’ aspect refers to how the model allows for variable diffusion rates on different branches of the tree, thus accommodating non-uniform dispersion of various ancestral viruses. A noted limitation of this model is the need to modify the locations of isolates if they originate from exactly the same position. The likelihood of two isolates with different sequences occupying an identical location is very low or

effectively zero as calculated using these methods. The model cannot infer ancestral locations if the likelihood of the initial model is zero and this dataset contains several sequences from the same submitting veterinary clinic. The necessary modifications to the location data are described in Section 2.3. In this study the inference of precise transmission routes was not possible. The transmission routes identified are incongruous with previous knowledge of BVDV transmission. The single largest factor determining transmission routes is cattle movements (Courcoul and Ezanno, 2010; Van Campen, 2010) and the first documented cases of BVDV are from North America (Childs, 1946; Olafson et al., 1946). As such the global transmission patterns observed seem unlikely. A more plausible outcome would involve more regional transmission across Western Canada over this time period. However, there is still a link between the sequence-based phylogeny and the virus collection location. The detection of postal code-specific BVDV strains (see the symbols in Fig. 3) is consistent with other publications the have found farm-specific strains (Paton et al., 1995). As farms can be used as a proxy for geographical location, this finding supports the idea that with a more thorough approach to sampling it should be possible to infer precise transmission routes using the phylogeographic approaches outlined in this study. A more representative sampling approach is required to properly construct precise transmission routes. These samples need to accurately represent the BVDV population in Western Canada going back through time. This means more thorough sampling of the livestock population, incorporating older isolates into the analysis and less reliance on diagnostic laboratory submissions alone. A partial solution is the use of sequences from publically accessible databases. However, most sequences in these databases are not thoroughly annotated; at best they have a country

Please cite this article in press as: Chernick, A., et al. Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada. Infect. Genet. Evol. (2014), http://dx.doi.org/10.1016/j.meegid.2014.01.003

A. Chernick et al. / Infection, Genetics and Evolution xxx (2014) xxx–xxx

7

Fig. 5. Neighbor-joining tree based on the same E1–E2 sequence alignment used to construct the MCC tree in Fig. 2. A HKY substitution model was used (the same as chosen via the model selection procedure) and the branches are shaded based on their bootstrap support values (grey = 0, black = 1).

and year of isolation. To infer transmission patterns on the scale we have attempted these metadata must be at a much higher resolution. In order for databases sequences to be useful in these sorts of analyses they should be annotated with exact collection dates and precise latitudes and longitudes of collection locations. In a similar fashion it is not possible to infer the ancestral population dynamics of BVDV in this dataset. The small sample size and bias towards diagnostic laboratory submissions makes the inferred changes unreliable even though a tree prior designed specifically to estimate those parameters (Bayesian skyline) was used. As a primary challenge of this preliminary study was the need for more thorough sampling we recommend that all sequence data submitted to publically accessible databases include as much additional information as possible (Parr et al., 2012). Thoroughly annotated sequence files will allow an increased payout of the original data generation efforts. To better understand the phylogeography of Canadian BVDV an expanded sampling of archival and contemporary strains will be undertaken. Additionally, other gene regions and full genome sequences will be evaluated for their use in future studies that will employ similar analytical methods as those described here.

Acknowledgments We would like to thank Prairie Diagnostic Services for their invaluable contribution of BVDV isolates as well as Sandeep Atwal for his technical assistance in the laboratory. Additionally we thank all veterinarians and farmers for their sample submissions and participation in the project. Funding for this project was provided by the Alberta Livestock and Meat Agency (ALMA) and the Margaret Gunn Endowment for Animal Research.

Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.meegid.2014.01. 003.

References Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M.A., Alekseyenko, A.V., 2012. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Mol. Biol. Evol. 29, 2157–2167. Becher, P., Orlich, M., Shannon, A.D., Horner, G., Konig, M., Thiel, H.J., 1997. Phylogenetic analysis of pestiviruses from domestic and wild ruminants. J. Gen. Virol. 78 (Pt. 6), 1357–1366. Bielejec, F., Rambaut, A., Suchard, M.A., Lemey, P., 2011. SPREAD: spatial phylogenetic reconstruction of evolutionary dynamics. Bioinformatics 27, 2910–2912. Brown, R.J., Juttla, V.S., Tarr, A.W., Finnis, R., Irving, W.L., Hemsley, S., Flower, D.R., Borrow, P., Ball, J.K., 2005. Evolutionary dynamics of hepatitis C virus envelope genes during chronic infection. J. Gen. Virol. 86, 1931–1942. Campbell, J.R., 2004. Effect of bovine viral diarrhea virus in the feedlot. Vet. Clin. North Am. Food Anim. Pract. 20, 39–50. Carman, S., van Dreumel, T., Ridpath, J., Hazlett, M., Alves, D., Dubovi, E., Tremblay, R., Bolin, S., Godkin, A., Anderson, N., 1998. Severe acute bovine viral diarrhea in Ontario, 1993–1995. J. Vet. Diagn. Invest. 10, 27–35. Carruthers, T.D., Petrie, L., 1996. A survey of vaccination practices against bovine viral diarrhea (BVD) virus in Saskatchewan dairy herds. Can. Vet. J. 37, 621–622. Childs, T., 1946. X Disease of Cattle – Saskatchewan. Can. J. Comp. Med. Vet. Sci. 10, 316–319. Collins, M.E., Desport, M., Brownlie, J., 1999. Bovine viral diarrhea virus quasispecies during persistent infection. Virology 259, 85–98. Courcoul, A., Ezanno, P., 2010. Modelling the spread of bovine viral diarrhoea virus (BVDV) in a managed metapopulation of cattle herds. Vet. Microbiol. 142, 119– 128. Couvreur, B., Letellier, C., Collard, A., Quenon, P., Dehan, P., Hamers, C., Pastoret, P.P., Kerkhofs, P., 2002. Genetic and antigenic variability in bovine viral diarrhea virus (BVDV) isolates from Belgium. Virus Res. 85, 17–28. Crick, F.H.C., 1966. Codon—anticodon pairing: the wobble hypothesis. J. Mol. Biol. 19, 548–555.

Please cite this article in press as: Chernick, A., et al. Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada. Infect. Genet. Evol. (2014), http://dx.doi.org/10.1016/j.meegid.2014.01.003

8

A. Chernick et al. / Infection, Genetics and Evolution xxx (2014) xxx–xxx

Cuevas, J.M., Gonzalez-Candelas, F., Moya, A., Sanjuan, R., 2009. Effect of ribavirin on the mutation rate and spectrum of hepatitis C virus in vivo. J. Virol. 83, 5760– 5764. Domingo, E., Sheldon, J., Perales, C., 2012. Viral quasispecies evolution. Microbiol. Mol. Biol. Rev. 76, 159–216. Drummond, A.J., Rambaut, A., Shapiro, B., Pybus, O.G., 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22, 1185–1192. Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A., 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973. Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. Eigen, M., 1971. Self organization of matter and the evolution of biological macromolecules. Naturwissenschaften 58, 465–523. Fordyce, S.L., Bragstad, K., Pedersen, S.S., Jensen, T.G., Gahrn-Hansen, B., Daniels, R., Hay, A., Kampmann, M.L., Bruhn, C.A., Moreno-Mayar, J.V., Avila-Arcos, M.C., Gilbert, M.T., Nielsen, L.P., 2013. Genetic diversity among pandemic 2009 influenza viruses isolated from a transmission chain. Virol. J. 10, 116. Fulton, R.W., Ridpath, J.F., Ore, S., Confer, A.W., Saliki, J.T., Burge, L.J., Payton, M.E., 2005. Bovine viral diarrhoea virus (BVDV) subgenotypes in diagnostic laboratory accessions: distribution of BVDV1a, 1b, and 2a subgenotypes. Vet. Microbiol. 111, 35–40. Gibbs, P.E.M., Witke, W.F., Dugaiczyk, A., 1998. The molecular clock runs at different rates among closely related members of a gene family. J. Mol. Evol. 46, 552–561. Government of Alberta, December 30, 2013. Government of Alberta History http:// alberta.ca/history.cfm. Griffiths, R.C., Tavare, S., 1994. Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. London B 344, 403–410. Hamers, C., Dehan, P., Couvreur, B., Letellier, C., Kerkhofs, P., Pastoret, P.P., 2001. Diversity among bovine pestiviruses. Vet. J. 161, 112–122. Hasegawa, M., Kishino, H., Yano, T.-A., 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174. Heled, J., Drummond, A.J., 2008. Bayesian inference of population size history from multiple loci. BMC Evol. Biol. 8, 289. Houe, H., 1999. Epidemiological features and economical importance of bovine virus diarrhoea virus (BVDV) infections. Vet. Microbiol. 64, 89–107. Jones, L.R., Zandomeni, R., Weber, E.L., 2002. Quasispecies in the 50 untranslated genomic region of bovine viral diarrhoea virus from a single individual. J. Gen. Virol. 83, 2161–2168. Kingman, J.F.C., 1982. The coalescent. Stochastic Processes Appl. 13, 235–248. Lemey, P., Rambaut, A., Drummond, A.J., Suchard, M.A., 2009. Bayesian phylogeography finds its roots. PLoS Comput. Biol. 5, e1000520. Lemey, P., Rambaut, A., Welch, J.J., Suchard, M.A., 2010. Phylogeography takes a relaxed random walk in continuous space and time. Mol. Biol. Evol. 27, 1877– 1885. Liu, L., Xia, H., Wahlberg, N., Belak, S., Baule, C., 2009. Phylogeny, classification and evolutionary insights into pestiviruses. Virology 385, 351–357. Luzzago, C., Ebranati, E., Sassera, D., Lo Presti, A., Lauzi, S., Gabanelli, E., Ciccozzi, M., Zehender, G., 2012. Spatial and temporal reconstruction of bovine viral diarrhea virus genotype 1 dispersion in Italy. Infect. Genet. Evol. 12, 324–331. Minin, V.N., Bloomquist, E.W., Suchard, M.A., 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol. Biol. Evol. 25, 1459–1471.

Newton, M.A., Raftery, A.E., 1994. Approximate Bayesian-inference with the weighted likelihood bootstrap. J. Roy. Stat. Soc. B Met. 56, 3–48. Niskanen, R., Lindberg, A., 2003. Transmission of bovine viral diarrhoea virus by unhygienic vaccination procedures, ambient air, and from contaminated pens. Vet. J. 165, 125–130. Olafson, P., Mac, C.A., Fox, F.H., 1946. An apparently new transmissible disease of cattle. Cornell Vet. 36, 205–213. Parr, C.S., Guralnick, R., Cellinese, N., Page, R.D., 2012. Evolutionary informatics: unifying knowledge about the diversity of life. Trends Ecol. Evol. 27, 94–103. Paton, D.J., Carlsson, U., Lowings, J.P., Sands, J.J., Vilcˇek, S., Alenius, S., 1995. Identification of herd-specific bovine viral diarrhoea virus isolates from infected cattle and sheep. Vet. Microbiol. 43, 283–294. Peterhans, E., Bachofen, C., Stalder, H., Schweizer, M., 2010. Cytopathic bovine viral diarrhea viruses (BVDV): emerging pestiviruses doomed to extinction. Vet. Res. 41, 44. Rambaut, A., October 29, 2013. FigTree v1.3.1, http://tree.bio.ed.ac.uk/software/ figtree/. Ridpath, J.F., Bolin, S.R., 1998. Differentiation of types 1a, 1b and 2 bovine viral diarrhoea virus (BVDV) by PCR. Mol. Cell. Probes. 12, 101–106. Schmidt, H.A., Strimmer, K., Vingron, M., von Haeseler, A., 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504. Strimmer, K., von Haeseler, A., 1997. Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc. Natl. Acad. Sci. U.S.A. 94, 6815–6819. Tamura, K., Nei, M., 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10, 512–526. Tavare, S., 1986. Some probabilistic and statistical problems in the analysis of DNA sequences. In: Miura, R.M. (Ed.), Lectures on Mathematics in the Life Sciences. American Mathematical Society, Providence, RI, pp. 57–86. Taylor, L.F., Van Donkersgoed, J., Dubovi, E.J., Harland, R.J., van den Hurk, J.V., Ribble, C.S., Janzen, E.D., 1995. The prevalence of bovine viral diarrhea virus infection in a population of feedlot calves in western Canada. Can. J. Vet. Res. 59, 87–93. Taylor, L.F., Janzen, E.D., Ellis, J.A., van den Hurk, J.V., Ward, P., 1997. Performance, survival, necropsy, and virological findings from calves persistently infected with the bovine viral diarrhea virus originating from a single Saskatchewan beef herd. Can. Vet. J. 38, 29–37. Van Campen, H., 2010. Epidemiology and control of BVD in the U.S. Vet. Microbiol. 142, 94–98. Volz, E.M., Koelle, K., Bedford, T., 2013. Viral phylodynamics. PLoS Comput. Biol. 9, e1002947. Wentink, G.H., van Exsel, A.C., de Goey, I., van Lieshout, J.A., 1991. Spread of bovine virus diarrhoea virus in a herd of heifer calves. Vet. Q. 13, 233–236. Xia, H., Liu, L., Wahlberg, N., Baule, C., Belak, S., 2007. Molecular phylogenetic analysis of bovine viral diarrhoea virus: a Bayesian approach. Virus Res. 130, 53–62. Zuckerkandl, E., Pauling, L.B., 1962. Molecular disease, evolution, and genic heterogeneity. In: Kasha, M., Pullman, B. (Eds.), Horizons in Biochemisty. Academic Press, New York, NY, pp. 189–225.

Please cite this article in press as: Chernick, A., et al. Metadata beyond the sequence enables the phylodynamic inference of bovine viral diarrhea virus type 1a isolates from Western Canada. Infect. Genet. Evol. (2014), http://dx.doi.org/10.1016/j.meegid.2014.01.003