Accepted Manuscript High-resolution phylogeny providing insights towards the epidemiology, zoonotic aspects and taxonomy of sapoviruses
A.F. Barry, R. Durães-Carvalho, E.F. Oliveira-Filho, A.A. Alfieri, W.H.M. Van der Poel
PII: S1567-1348(17)30330-1
DOI: doi:10.1016/j.meegid.2017.09.024
Reference: MEEGID 3280
To appear in: Infection, Genetics and Evolution
Received date: 23 August 2017
Revised date: 18 September 2017
Accepted date: 19 September 2017
Please cite this article as: A.F. Barry, R. Durães-Carvalho, E.F. Oliveira-Filho, A.A. Alfieri, W.H.M. Van der Poel, High-resolution phylogeny providing insights towards the epidemiology, zoonotic aspects and taxonomy of sapoviruses. Meegid (2017), doi:10.1016/j.meegid.2017.09.024
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
High-resolution phylogeny providing insights towards the epidemiology, zoonotic aspects and taxonomy of sapoviruses
Barry A.F.a*, Durães-Carvalho R.b*†, Oliveira-Filho E.F.b*, Alfieri A.A.a, Van der Poel W.H.M.c
*These authors contributed equally to this work
a Laboratory of Animal Virology, Department of Preventive Veterinary Medicine, Universidade Estadual de Londrina. Campus Universitário, PO Box 6001, 86051-990, Londrina, Paraná, Brazil.
b Department of Virology, Aggeu Magalhães Institute, Oswaldo Cruz Foundation (FIOCRUZ), Av. Professor Moraes Rego s/n, Cidade Universitária, Recife, PE 50670-420, Brazil.
c Wageningen Bioveterinary Research, Wageningen University and Research, Department of Virology, P.O. Box 65, 8200 AB Lelystad, Edelhertweg 15, 8219 PH Lelystad, The Netherlands.
†Corresponding author: Ricardo Durães-Carvalho ([email protected]). Departamento de Virologia, Instituto Aggeu Magalhães (IAM), Fundação Oswaldo Cruz (FIOCRUZ), Recife-PE, Brasil.
Abstract
The evolution, epidemiology and zoonotic aspects of sapoviruses (SaV) are still not well explored. In this study, we applied high-resolution phylogeny to investigate the epidemiological and zoonotic origins as well as taxonomic aspects of animal and human SaV. Bayesian framework analyses showed an increase in porcine SaV (PoSaV) population dynamics between
1975 and 1982, resulting in SaV gene flow and the generation of new strains among porcine and human populations. Our results also show the contribution of different animal populations to SaV epidemiology and highlight zoonotic aspects, as exemplified by the crucial role that swine, dogs, mink and humans play in SaV spread. Additionally, phylogenetic analysis suggests that bats may play a key role in SaV epidemiology. According to our hypothesis, these animals may act as reservoirs or intermediate host species, contributing to viral dispersion in zoonotic and other epidemiological scenarios and facilitating the generation of new SaV genogroups and genotypes through recombination events. Data from large-scale phylogeny partition based on patristic distance did not show a correlation between transmission clusters and the generation of SaV genogroups; nevertheless, we present important findings about SaV taxonomy and considerations useful for further taxonomical studies.
Keywords: Sapoviruses; Phylogeny; Epidemiology; Taxonomy; Zoonoses
1. Introduction
Viral gastroenteritis is a major worldwide public health problem, and the enteric caliciviruses (norovirus – NoV and sapovirus – SaV) are responsible for most reported cases
(Fankhauser et al., 2002; Lopman et al., 2003). NoV is the leading cause of outbreaks, but studies have described that SaV infection also plays an important role in cases of gastroenteritis in humans (Hansman et al., 2007a; Svraka et al., 2010). The epidemiological profile of SaV infections is not well understood, since human and animal SaV strains present a certain genetic similarity and their zoonotic potential still needs to be investigated (Bank-Wolf et al., 2010).
The Sapovirus genus belongs to the Caliciviridae family, which also includes the genera Norovirus, Lagovirus, Vesivirus, Nebovirus, Recovirus and Valovirus (Green et al., 2000; Smiley et al., 2002; Farkas et al., 2008; L’Homme et al., 2009). Sapovirus particles are non-enveloped, 27 to 35 nm in diameter, and of icosahedral symmetry (Guo et al., 1999). The single-stranded, positive-sense, polyadenylated RNA genome, approximately 7.3 kb in length, is organized in two or three open reading frames (ORFs). ORF1 encodes a polyprotein that is cleaved simultaneously into the viral non-structural proteins and the main capsid protein (VP1) (Oka et al., 2005). Another structural protein is encoded by ORF2 and is responsible for stabilization of the viral particle and regulation of VP1 expression (Bertolotti-Ciarlet et al., 2003). Some human SaV strains possess an additional ORF that overlaps the 5’ end of the VP1 gene (Clarke and Lambden, 2000).
Since SaVs present high genetic variability and are constantly evolving, most studies perform SaV screening based on RT-PCR targeting the highly conserved RNA-dependent RNA polymerase (RdRP). This is then followed by sequencing of a more variable region (VP1), which allows molecular identification and phylogenetic analysis to characterize the different circulating SaV strains (Reuter et al., 2010).
The definition of genogroups and genotypes is very important for clarifying SaV
molecular epidemiology. For porcine and other animal SaVs, there is still no consensus concerning genogrouping. Classification systems are based on either complete or partial RdRP and VP1 genes (Farkas et al., 2004; L’Homme et al., 2009; Reuter et al., 2010). In addition, current knowledge about SaV evolution and evolutionary mechanisms is very limited. In the present study, we used an approach based on large-scale datasets and a Bayesian framework to understand the epidemiologic origin and zoonotic aspects of the sapoviruses, and we propose a classification system for SaV genogroups.
2. Material and Methods
2.1. Sequences dataset
All VP1 (n= 514), RdRP (n= 180) and whole-genome (n= 35) sequences of SaV were downloaded from the GenBank database up to February 2017. The sequences were then filtered by known location and sampling date (1979 to 2014), aiming to calibrate divergence time estimates and geographical ranges. Sequences containing degenerate bases, sequences annotated with information such as virus isolated from "unknown hosts", and sequences from clones and recombinants were discarded from our analyses. The datasets were aligned using the MUSCLE v3.8.31 software (Edgar, 2003).
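The filtering criteria of Section 2.1 can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Record` structure, the field names and the accession numbers below are hypothetical, and gap characters are tolerated as an assumption. Only the date window (1979 to 2014) and the exclusion rules are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Unambiguous nucleotide symbols; anything else (N, R, Y, ...) counts as degenerate.
UNAMBIGUOUS = set("ACGTU")


@dataclass
class Record:
    """Hypothetical container for one sequence and its metadata."""
    accession: str
    sequence: str
    host: Optional[str]
    country: Optional[str]
    year: Optional[int]
    is_clone: bool = False
    is_recombinant: bool = False


def keep(rec: Record, first_year: int = 1979, last_year: int = 2014) -> bool:
    """Return True if the record passes the filters described in Section 2.1."""
    if rec.host is None or rec.host.lower().startswith("unknown"):
        return False
    if rec.country is None or rec.year is None:
        return False
    if not (first_year <= rec.year <= last_year):
        return False
    if rec.is_clone or rec.is_recombinant:
        return False
    # Assumption: alignment gaps are tolerated; any other symbol is degenerate.
    return all(b in UNAMBIGUOUS or b == "-" for b in rec.sequence.upper())


# Invented records, for illustration only.
records = [
    Record("EX0001", "ATGGCGTCA", "swine", "US", 1979),
    Record("EX0002", "ATGGCNTCA", "human", "JP", 2005),   # degenerate base N
    Record("EX0003", "ATGGCGTCA", None, "BR", 2010),      # unknown host
    Record("EX0004", "ATGGCGTCA", "bat", "CN", 2016),     # outside the date window
    Record("EX0005", "ATGGCGTCA", "mink", "CA", 2007, is_recombinant=True),
]
kept = [r.accession for r in records if keep(r)]
```

In this toy input only the first record survives; each of the others fails exactly one of the criteria.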
2.2. Phylogenetic analysis
In order to check the evolutionary relationships among SaV genogroups and genotypes, phylogenetic reconstructions were performed using the Maximum Likelihood (ML) method implemented in the FastTree v.2.1.7 (Price et al., 2010) and IQ-TREE (Trifinopoulos et al., 2016) software, using the standard implementation GTR + CAT with 20 gamma distribution parameters and a mix of Nearest-Neighbor Interchanges (NNI) and Subtree-Prune-Regraft (SPR) moves (FastTree), and the combination of hill-climbing algorithms, random perturbation of current best trees, and a broad sampling of initial starting trees (IQ-TREE). A methodological approach for extraction of large-scale phylogenetic partitions based on patristic distance and SH-like support was also applied to identify transmission clusters among SaV genogroups and genotypes, aiming to investigate different aspects of SaV epidemiology and taxonomy (Prosperi et al., 2011).
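The patristic distance between two tips is the sum of the branch lengths along the path that joins them. The sketch below computes it on a toy tree and then applies a simplified cluster test combining node support with the median pairwise distance in a sub-tree. It is only an illustration: the tree, tip names and support values are invented, and the absolute threshold on the median is a simplifying assumption rather than the full procedure of Prosperi et al. (2011).

```python
from itertools import combinations
from statistics import median

# Toy rooted tree: child -> (parent, branch_length). Names and lengths are invented.
TREE = {
    "n1": ("root", 0.10),
    "n2": ("root", 0.12),
    "swine_A": ("n1", 0.01),
    "human_B": ("n1", 0.02),
    "bat_C": ("n2", 0.20),
    "dog_D": ("n2", 0.25),
}
# Hypothetical SH-like support values for the internal nodes.
SUPPORT = {"n1": 0.97, "n2": 0.95}


def path_to_root(node):
    """Map each ancestor (and the node itself) to its distance from the node."""
    dist, out = 0.0, {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(a, b):
    """Sum of branch lengths along the path joining tips a and b."""
    pa, pb = path_to_root(a), path_to_root(b)
    # Among shared ancestors, the most recent one gives the smallest sum.
    return min(pa[n] + pb[n] for n in pa if n in pb)


def tips_under(node):
    """List the tips descending from a node (a tip returns itself)."""
    children = [c for c, (p, _) in TREE.items() if p == node]
    if not children:
        return [node]
    return [t for c in children for t in tips_under(c)]


def is_cluster(node, max_dist=0.05, min_support=0.90):
    """Simplified test: well-supported node whose tips are closely related."""
    tips = tips_under(node)
    if len(tips) < 2:
        return False
    dists = [patristic(a, b) for a, b in combinations(tips, 2)]
    return SUPPORT.get(node, 0.0) >= min_support and median(dists) <= max_dist
```

With these invented values, `is_cluster("n1")` holds (support 0.97, distance about 0.03) whereas `is_cluster("n2")` does not, because the two tips are about 0.45 substitutions per site apart despite the high node support.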
The presence of phylogenetic signals was investigated by the likelihood mapping analysis of 10,000 random quartets generated using Tree-Puzzle v.5.2 software (available at: http://www.tree-puzzle.de/) (Strimmer and von Haeseler, 1997; Schmidt et al., 2001). The Pairwise Homoplasy Index (PHI) test for evidence of recombination was implemented in the
SplitsTree software v.4.10 following the default settings (Bruen et al., 2006). The reliability of the nodes was analyzed by the Shimodaira-Hasegawa-like (SH-like) test, aBayes and ultrafast bootstrap support values with 1000 replicates.
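Likelihood mapping evaluates the support for the three possible topologies of many four-taxon subsets (quartets). The sketch below shows only the sampling step, that is, how 10,000 random quartets might be drawn from a set of taxa. The likelihood evaluation itself is performed by Tree-Puzzle and is not reproduced here. The taxon labels, the fixed seed and the choice of 35 taxa (the number of whole-genome sequences listed in Section 2.1) are assumptions made for illustration.

```python
import random
from math import comb


def sample_quartets(taxa, n_quartets=10_000, seed=0):
    """Draw random quartets; each quartet holds four distinct taxa."""
    rng = random.Random(seed)
    # Quartets are drawn independently, so the same quartet may appear twice.
    return [tuple(sorted(rng.sample(taxa, 4))) for _ in range(n_quartets)]


taxa = [f"seq{i}" for i in range(35)]   # hypothetical labels
quartets = sample_quartets(taxa)
total_possible = comb(len(taxa), 4)     # number of distinct quartets for 35 taxa
```

For 35 taxa there are 52,360 distinct quartets, so 10,000 random draws cover a substantial fraction of them.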
2.3. Epidemiological insights and zoonotic aspects of the SaV dispersion
Phylogeographic and spatiotemporal analyses were applied to study the geographic spread pattern and population dynamics of SaVs hosted in different species and isolated in different years and locations. For this purpose, a Bayesian framework analysis was applied through the implementation of the Metropolis-Hastings Markov Chain Monte Carlo (MCMC) algorithm in the Bayesian Evolutionary Analysis Sampling Trees (BEAST) software package, v1.8.0 (Drummond and Rambaut, 2007). The coalescent demographic model Bayesian Skyline Plot (BSP) was used, imposing a strict or relaxed molecular clock (with log-normally distributed rates). The Lineage Through Time (LTT) plot was also applied to check the overall pattern of SaV diversification over time.
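The idea behind an LTT plot can be sketched as follows: moving forward in time from the root of a dated, bifurcating tree, every internal node turns one lineage into two, so the lineage count increases by one at each node date. The function below is a simplified illustration that ignores the different sampling dates of the tips, and the node dates are invented. It is not the procedure used to produce the figures of this study.

```python
def lineages_through_time(split_years):
    """Return (year, n_lineages) steps for a bifurcating tree.

    split_years: dates of the internal nodes, root included. Each split
    turns one lineage into two, so the count rises by one per node.
    """
    steps, n = [], 1
    for year in sorted(split_years):
        n += 1
        steps.append((year, n))
    return steps


# Invented node dates, for illustration only.
ltt = lineages_through_time([1995.4, 1972.5, 1979.3, 1978.0, 1981.1])
```

A steep run of steps within a short interval, such as the three splits between 1978 and 1981 in this toy input, is what appears as a burst of diversification on the plot.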
The Markov model of nucleotide substitution was selected using the software jModelTest v.2.1.6 (Darriba et al., 2012). The MCMC algorithm was run for up to 1 billion generations, with sampling every 100,000 generations, for each molecular clock. Good mixing of the MCMC was determined by Effective Sample Size (ESS) values ≥ 200 and the convergence of parameters was checked in Tracer v1.6 software (available at http://beast.bio.ed.ac.uk/) with 10% burn-in. The marginal likelihood for each clock model was obtained using the path sampling and stepping-stone algorithms (Baele et al., 2013a; Baele et al., 2013b). The posterior distribution of trees was summarized as a Maximum Clade Credibility (MCC) tree by TreeAnnotator (implemented in the BEAST v.1.8.0 package) and visualized in FigTree v.1.4.2 software (available at: http://tree.bio.ed.ac.uk/software/figtree/). Phylogenetic uncertainties were estimated by the 95% Highest Posterior Density (HPD) intervals. Evolutionary parameters such as nucleotide substitution rates, tree topologies, SaV demographic histories and time of the most recent common ancestor (tMRCA) were estimated from the SaV whole-genome sequences.
2.4. Recombination analysis
A full exploratory recombination scan was performed to detect SaV recombinants and their minor and major parental sequences over whole-genome sequences, using the Recombination Detection Program (RDP) v.4.80 (Martin et al., 2015) and the algorithms embedded in it: MaxChi (Smith, 1992); PhylPro (Weiller, 1998); GENECONV (Padidam et al., 1999); LARD (Holmes et al., 1999); SIScan (Gibbs et al., 2000); CHIMAERA (Posada and Crandall, 2001); BootScan (Martin et al., 2005) and 3Seq (Boni et al., 2006). P-values ≤ 0.05 were regarded as statistically significant. Only events found to be statistically significant across the nine programs were considered evidence of recombination.
3. Results
3.1. Phylogenetic relationships among SaV genogroups and genotypes
Initially, we analyzed large datasets comprising all available SaV RdRP (n= 180) and VP1 (n= 514) sequences to better understand the taxonomy in different host species (see Supplementary material). Our results show a large number of RdRP sequences from PoSaV and HuSaV, as well as HuSaV-like sequences detected in other species (Fig. 1A). On the other hand, the number of SaV VP1 sequences is almost three times higher than that of RdRP sequences (Fig. 1B).
Moreover, an interesting pattern in the evolutionary dynamics of SaV showed that different host species (such as swine, dog, mink and human) might have played an important role in SaV dispersion.
In this large-scale approach, we selected sequences from well-defined genogroups (GI, GII, GIII, GIV, GV), other genogroups proposed by Scheuer et al. (2013) and other non-assigned sequences available in the GenBank database. In addition, we established two criteria for grouping besides the phylogenetic analyses. (I) For a new strain to be considered a new genogroup, it should have a complete genomic sequence or, at least, a complete VP1/RdRP sequence, and it must group in a different clade. (II) In addition, we considered the existence of certain non-mathematical features such as geographic and transmission clusters among SaV-infected hosts. Following our approach, viruses not meeting these criteria could not be grouped in a potential new genogroup. Accordingly, we propose that SaV should be organized in ten genogroups (Table 1 and Fig. 2).
A large-scale phylogenetic partition approach was applied to the large RdRP and VP1 ML trees for SaV grouping (to select clusters based on patristic distances and SH-like support of more than 90%). Sequences were extracted from the large ML trees and new phylogenies were reconstructed, aiming to devise a strategy for the establishment of different taxonomic groupings of viruses (Fig. 1). However, the large-scale phylogeny partition methodology did not succeed in separating the defined genogroups when applying the optimal threshold interval, corresponding to an absolute distance range of 0.05 nucleotide substitutions per site (see Supplementary material).
3.2. Epidemiologic origin and zoonotic aspects of SaVs
A Bayesian coalescent-based method imposing a relaxed molecular clock was applied to trace the epidemiologic, zoonotic and phylogeographic origins of SaVs hosted in different animals. We applied the coalescent model Bayesian Skyline Plot (BSP) to evaluate SaV population dynamics over time and the Lineage Through Time (LTT) plot for an overall inference about the pattern of viral diversification. Our results suggest a potential role of the swine host in contributing to viral dissemination and to the increase in SaV population dynamics and epidemiological profile (Fig. 3A and Fig. S1). The dated Bayesian Maximum Clade Credibility (MCC) tree shows a SaV isolated from swine in 1979 (KT922087.1_PoSaV_Cowden_1979_US) as the most recent common ancestor (MRCA) responsible for the SaV introduction and spread to other species. Our results also highlight an increase in PoSaV population dynamics between 1975 and 1982 (Fig. 3A and Fig. 3B), exhibiting SaV gene flow amongst animals of different countries until finally reaching the human population. The evolutionary jumps of PoSaV to other host species probably acted as a source for the subsequent SaV lineages, triggering the generation of recombinant strains and thus playing an important role in sapovirus epidemiology (Fig. 3A and Fig. S1).
The BSP and LTT analyses show SaV genetic variation over time, taking into account the presence of different hosts involved in their epidemiology. Based on a peak in the viral effective population size (Ne) around 1978 and the 1980s, it was possible to hypothesize that different animals contributed strongly to the epidemiology of this virus (Fig. 3B and Fig. 3C). In order to better understand SaV evolution, we also mapped the SaV mutation rate over time. The genome-wide estimate of the SaV mutation rate was 2.723-2 changes per site per replication cycle (Fig. 3D). These data corroborate similar reports for NoVs (Cuevas et al., 2016).
Complete genomic sequence analysis via the ML method also inferred probable evolutionary and epidemiological pathways of SaV evolution. The phylogenetically-based statistical methods used (SH-aLRT, aBayes and ultrafast bootstrap supports) reinforce that HuSaV, as well as SaV isolated from other animals, apparently originates from recombinant SaV isolated from swine (KJ508818.1 and AY974192.2). In addition, we also highlight the important role that bats and swine play in SaV dispersion (Fig. 4 and Fig. S2).
Evidence based on the ML phylogenetic tree suggests that viral recombination has played an important role in SaV evolution and epidemiology. The measurement of patristic distance is an essential approach to the inference of transmission clusters. Our data from the tree branch lengths along the VP1 and RdRP SaV ML phylogenies show a close or identical patristic distance among SaVs hosted in different animals (such as dog, swine, bat and humans), thus indicating the contribution of these animals to the generation of SaV recombinants. In a way, this may contribute to the elucidation of viral zoonotic aspects (Table S1 and Table S2).
3.3. Recombination analysis
In order to give in silico evidence for the presence of recombination events in SaV whole-genome sequences, we inferred the presence of SaV recombinants using nine programs embedded in RDP4 and found three statistically highly credible events. Accordingly, possible recombination events were found in the SaV Ehime strain (isolated in Japan, GenBank code DQ058829.1; host: human) and in two PoSaVs (both isolated in China, GenBank codes FJ387164.1 and KF204570.1; host: swine) (Table S3). Although there are reports of SaV Ehime strain involvement in intergenogroup recombination events (Hansman et al., 2005), little attention has been given to its role in SaV epidemiology and zoonotic aspects considering multiple SaV host species.
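The consensus rule applied to the recombination scan can be sketched as a simple filter over per-method p-values. This is an illustration under stated assumptions, not an interface to RDP4: the event names and p-values are invented, a p-value of at most 0.05 is read as significant, the RDP method itself is assumed to be the ninth program alongside the eight named in Section 2.4, and an event is retained only when all nine methods support it.

```python
# Nine detection methods; counting RDP itself as the ninth is an assumption.
METHODS = ["RDP", "GENECONV", "BootScan", "MaxChi", "CHIMAERA",
           "SIScan", "PhylPro", "LARD", "3Seq"]


def supported_by_all(p_values, alpha=0.05):
    """True if every method reports a p-value at or below alpha."""
    return all(m in p_values and p_values[m] <= alpha for m in METHODS)


# Invented p-values, for illustration only.
events = {
    "event_1": {m: 0.001 for m in METHODS},
    "event_2": {**{m: 0.001 for m in METHODS}, "LARD": 0.20},  # one method fails
    "event_3": {"RDP": 0.01, "MaxChi": 0.02},                  # seven methods missing
}
credible = [name for name, p in events.items() if supported_by_all(p)]
```

Requiring agreement among all methods is deliberately conservative: it lowers the chance of reporting spurious events at the cost of discarding events that only some algorithms can detect.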
4. Discussion
This study employed a high-resolution phylogenetic analysis to understand SaV taxonomy, molecular epidemiology and evolutionary aspects and to provide insights into zoonotic aspects. The close genetic relationship of SaV found in animals and humans has previously raised concern regarding their zoonotic potential (Martella et al., 2008; Oka et al., 2015). According to Bank-Wolf et al. (2010), transmission from animals to humans and vice versa would have considerable consequences for SaV epidemiology. So far, animal SaVs have not been found in humans; however, the detection of porcine SaVs similar to human strains is well documented, giving rise to hypotheses concerning their potential for zoonotic infection and generation of new recombinant strains (Oka et al., 2015).
Our phylogenetic inferences showed the SaV evolutionary relationships and dispersion among different host species, such as canine, mink, swine, bat and humans, where viruses detected in different animals clustered together, exhibiting a close relationship, for instance between porcine and human SaVs (Fig. 1, Fig. 3A and Fig. 4). We also highlighted the important role of swine and bats in SaV dispersion and other evolutionary aspects (Fig. 4). Moreover, we identified an increase in SaV population dynamics and the generation of new viral variants around the 1980s as a possible reason for the introduction and dissemination of hitherto unknown genotypes (Fig. 3).
Through our phylogenetically-based statistical methods, we showed a plausible epidemiological pathway for SaV (Fig. 4). These transmission events (among swine, bats and humans infected by SaV) raise the important questions of whether transmission of these viruses between animals and humans can indeed occur and what the role of different animal species infected by SaV might be. These host species probably act as reservoirs or intermediate hosts for viral dispersion, contributing to genetic variability (e.g. generation of new SaV genogroups and genotypes) and possible zoonotic transmission.
Concerning SaV taxonomy, two important points must be considered. First, the use of multiple genomic regions based on partial sequences of different strains can lead to inconsistencies in SaV classification, due to the formation of ambiguous groups as reported for other ssRNA viruses (Oliveira-Filho et al., 2013). Secondly, viral recombination has already been reported for SaVs and is known to play an important role in the evolution of caliciviruses. Recombination sites usually lie in the region between RdRP and capsid sequences (VP1 and VP2) (Bull et al., 2007; Zhang et al., 2015). While a standardized method to classify recombinant strains does not yet exist, the analysis of both RdRP and VP1 sequences nevertheless allows their identification.
The major difference from the previously proposed classification is that the swine genogroups (GVI, GVII, GIX?, GX?, GXI?) (Scheuer et al., 2013) have been placed together as GVI. Despite their high genetic variability, porcine SaVs were not separated, since analyses using the VP1 gene showed that all sequences clustered together in a single swine clade (Fig. 2A and Supplementary material). Data from the RdRP phylogeny showed different SaV grouping patterns. GIII was excluded from the clade encompassing the newly proposed genogroups GIX?, GX? and GXI?, and the SaV hosted in swine within GV was removed. Moreover, GIII, GVIII, GXII and GIV were grouped within a set of hosts such as mink, swine and bat (Fig. 2B and Supplementary material). Additionally, while we found a correlation between transmission clusters and SaV genogroups, it did not provide a reliable separation for all genogroups. Assuming that the method considers nodes/sub-trees with a reliability ≥ 90% and ≥ 2 distinct patients (Prosperi et al., 2011), our approach shows that the results obtained are limited by the fact that many SaV genogroups are clustered in the same clade. In other words, these data suggest that previous taxonomic reports considered the presence of SaV strains within a single clade for the establishment of new RdRP-based genogroups.
On the other hand, viral recombination plays a fundamental role in the evolution and
genogroup classification of caliciviruses, as reported for instance for noroviruses (NoV) and feline calicivirus (FCV) (White, 2014; Hou et al., 2016). For SaV, recombination is known to be important in the evolutionary process (Katayama et al., 2004) and both intra- and inter-genogroup recombination have previously been reported (Hansman et al., 2005; Phan et al., 2005). SaVs have long circulated among different host species, probably (although this remains unconfirmed) mediated by a spillover mechanism linked to the high genetic variability and recombination potential of the virus. Our results from whole-genome SaV sequences hosted in different animal species show three statistically highly credible events, suggesting that SaV evolution has been driven by successive recombination events involving different species and that this may have contributed to the spillover between different animal species. These results reinforce the crucial role of viral recombination in the evolution and diversification of SaVs and provide directions and clarifications regarding their zoonotic aspects and potential.
In summary, phylogenetic analyses based on VP1 and complete genomic sequences were suitable and might be considered for the definition of genogroups and genotypes of currently available and future SaV sequences. In addition, further studies involving SaV molecular epidemiology are required to better understand the epidemiology and zoonotic potential of these viruses. It is important to note that some of our conclusions may be affected by a sampling bias due to the limited number of SaV sequences available in public databases and the lack of prior evolutionary studies. Our methodology may also be extended to understand the epidemiologic origin and taxonomic classification not only of SaVs but also of other viruses.
Conflict of interest statement
The authors have declared that no competing interests exist.
Acknowledgments
E.F. Oliveira-Filho and R. Durães-Carvalho are supported by FACEPE and MCT/CNPq
DCR grants. A.A. Alfieri is a recipient of CNPq fellowship. We thank Louisa Ludwig for the
careful English correction of the manuscript. The authors would also like to thank Dr. Alex
Bosser for fruitful discussion.
References
Baele, G., Lemey, P., Vansteelandt, S., 2013a. Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution. BMC Bioinformatics 14, 85. doi:10.1186/1471-2105-14-85
Baele, G., Li, W.L.S., Drummond, A.J., Suchard, M.A., Lemey, P., 2013b. Accurate model selection of relaxed molecular clocks in bayesian phylogenetics. Mol. Biol. Evol. 30, 239–243. doi:10.1093/molbev/mss243
Bank-Wolf, B.R., König, M., Thiel, H.-J., 2010. Zoonotic aspects of infections with noroviruses and sapoviruses. Vet. Microbiol. 140, 204–212. doi:10.1016/j.vetmic.2009.08.021
Bertolotti-Ciarlet, A., Crawford, S.E., Hutson, A.M., Estes, M.K., 2003. The 3’ end of Norwalk virus mRNA contains determinants that regulate the expression and stability of the viral capsid protein VP1: a novel function for the VP2 protein. J. Virol. 77, 11603–11615. doi:10.1128/JVI.77.21.11603-11615.2003
Boni, M.F., Posada, D., Feldman, M.W., 2006. An Exact Nonparametric Method for Inferring Mosaic Structure in Sequence Triplets. Genetics 176. doi:10.1534/genetics.106.068874
Bruen, T.C., Philippe, H., Bryant, D., 2006. A simple and robust statistical test for detecting the presence of recombination. Genetics 172, 2665–2681. doi:10.1534/genetics.105.048975
Bull, R.A., Tanaka, M.M., White, P.A., 2007. Norovirus recombination. J. Gen. Virol. 88, 3347–3359. doi:10.1099/vir.0.83321-0
Clarke, I.N., Lambden, P.R., 2000. Organization and expression of calicivirus genes. J. Infect. Dis. 181 Suppl 2, S309–S316. doi:10.1086/315575
Cuevas, J.M., Combe, M., Torres-Puente, M., Garijo, R., Guix, S., Buesa, J., Rodríguez-Díaz, J.,
Sanjuán, R., 2016. Human norovirus hyper-mutation revealed by ultra-deep sequencing. Infect. Genet. Evol. 41, 233–239. doi:10.1016/j.meegid.2016.04.017
Darriba, D., Taboada, G.L., Doallo, R., Posada, D., 2012. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772. doi:10.1038/nmeth.2109
Drummond, A.J., Rambaut, A., 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214. doi:10.1186/1471-2148-7-214
Edgar, R.C., 2003. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. doi:10.1093/nar/gkh340
Fankhauser, R.L., Monroe, S.S., Noel, J.S., Humphrey, C.D., Bresee, J.S., Parashar, U.D., Ando, T., Glass, R.I., 2002. Epidemiologic and molecular trends of “Norwalk-like viruses” associated with outbreaks of gastroenteritis in the United States. J. Infect. Dis. 186, 1–7. doi:10.1086/341085
Farkas, T., Sestak, K., Wei, C., Jiang, X., 2008. Characterization of a rhesus monkey calicivirus representing a new genus of Caliciviridae. J. Virol. 82, 5408–5416. doi:10.1128/JVI.00070-08
Farkas, T., Zhong, W.M., Jing, Y., Huang, P.W., Espinosa, S.M., Martinez, N., Morrow, A.L., Ruiz-Palacios, G.M., Pickering, L.K., Jiang, X., 2004. Genetic diversity among sapoviruses. Arch. Virol. 149, 1309–1323. doi:10.1007/s00705-004-0296-9
Gibbs, M.J., Armstrong, J.S., Gibbs, A.J., 2000. Sister-Scanning: a Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics 16, 573–582. doi:10.1093/bioinformatics/16.7.573
Green, K.Y., Ando, T., Balayan, M.S., Berke, T., Clarke, I.N., Estes, M.K., Matson, D.O., Nakata, S., Neill, J.D., Studdert, M.J., Thiel, H.J., 2000. Taxonomy of the caliciviruses. J. Infect. Dis. 181 Suppl 2, S322–S330. doi:10.1086/315591
Green, S.M., Lambden, P.R., Caul, E.O., Ashley, C.R., Clarke, I.N., 1995. Capsid diversity in small round-structured viruses: molecular characterization of an antigenically distinct human enteric calicivirus. Virus Res. 37, 271–283.
doi:10.1016/0168-1702(95)00041-N
Guo, M., Chang, K.O., Hardy, M.E., Zhang, Q., Parwani, A.V., Saif, L.J., 1999. Molecular characterization of a porcine enteric calicivirus genetically related to Sapporo-like human caliciviruses. J. Virol. 73, 9625–9631.
Hansman, G.S., Saito, H., Shibata, C., Ishizuka, S., Oseto, M., Oka, T., Takeda, N., 2007a. Outbreak of gastroenteritis due to sapovirus. J. Clin. Microbiol. 45, 1347–1349. doi:10.1128/JCM.01854-06
Hansman, G.S., Takeda, N., Oka, T., Oseto, M., Hedlund, K.-O., Katayama, K., 2005. Intergenogroup recombination in sapoviruses. Emerg. Infect. Dis. 11, 1916–1920. doi:10.3201/eid1112.050722
Holmes, E.C., Worobey, M., Rambaut, A., 1999. Phylogenetic evidence for recombination in dengue virus. Mol. Biol. Evol. 16, 405–409. doi:10.1093/oxfordjournals.molbev.a026121
Hou, J., Sánchez-Vizcaíno, F., McGahie, D., Lesbros, C., Almeras, T., Howarth, D., O’Hara, V., Dawson, S., Radford, A.D., 2016. European molecular epidemiology and strain diversity of feline calicivirus. Vet. Rec. 178, 114–115. doi:10.1136/vr.103446
Katayama, K., Miyoshi, T., Uchino, K., Oka, T., Tanaka, T., Takeda, N., Hansman, G.S., 2004. Novel recombinant sapovirus. Emerg. Infect. Dis. 10, 1874–1876. doi:10.3201/eid1010.040395
L’Homme, Y., Sansregret, R., Plante-Fortier, E., Lamontagne, A.-M., Ouardani, M., Lacroix, G., Simard, C., 2009. Genomic characterization of swine caliciviruses representing a new genus of Caliciviridae. Virus Genes 39, 66–75. doi:10.1007/s11262-009-0360-3
Lopman, B.A., Reacher, M.H., Van Duijnhoven, Y., Hanon, F.X., Brown, D., Koopmans, M., 2003. Viral gastroenteritis outbreaks in Europe, 1995-2000. Emerg. Infect. Dis. 9, 90–96. doi:10.3201/eid0901.020184
Martin, D.P., Murrell, B., Golden, M., Khoosal, A., Muhire, B., 2015. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evol. 1, vev003.
Martin, D.P., Posada, D., Crandall, K.A., Williamson, C., 2005. A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res. Hum. Retroviruses 21, 98–102. doi:10.1089/aid.2005.21.98
Martella, V., Lorusso, K.B., Decaro, N., 2008. Identification of a porcine calicivirus related genetically to human sapoviruses. doi:10.1128/JCM.00341-08
Oka, T., Katayama, K., Ogawa, S., Hansman, G.S., Kageyama, T., Ushijima, H., Miyamura, T., Takeda, N., 2005. Proteolytic processing of sapovirus ORF1 polyprotein. J. Virol. 79, 7283–7290. doi:10.1128/JVI.79.12.7283-7290.2005
Oka, T., Wang, Q., Katayama, K., Saif, L.J., 2015. Comprehensive review of human sapoviruses. Clin. Microbiol. Rev. 28, 32–53. doi:10.1128/CMR.00011-14
Oliveira-Filho, E.F., König, M., Thiel, H.-J., 2013. Genetic variability of HEV isolates: inconsistencies of current classification. Vet. Microbiol. 165, 148–154. doi:10.1016/j.vetmic.2013.01.026
Padidam, M., Sawyer, S., Fauquet, C.M., 1999. Possible emergence of new geminiviruses by frequent recombination. Virology 265, 218–225. doi:10.1006/viro.1999.0056
Phan, T.G., Yan, H., Khamrin, P., Quang, T.D., Dey, S.K., Yagyu, F., Okitsu, S., Müller, W.E.G., Ushijima, H., 2005. Novel intragenotype recombination in sapovirus. Clin. Lab. 52, 363–366.
Posada, D., Crandall, K.A., 2001. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc. Natl. Acad. Sci. USA 98, 13757–13762. doi:10.1073/pnas.241370698
Price, M.N., Dehal, P.S., Arkin, A.P., 2010.
FastTree 2--approximately maximum-likelihood trees for large alignments. PloS one 5, e9490. doi:10.1371/journal.pone.0009490
Prosperi, M.C.F., Ciccozzi, M., Fanti, I., Saladini, F., Pecorari, M., Borghi, V., Di Giambenedetto, S., Bruzzone, B., Capetti, A., Vivarelli, A., Rusconi, S., Re, M.C., Gismondo, M.R., Sighinolfi, L., Gray, R.R., Salemi, M., Zazzi, M., De Luca, A., 2011. A novel methodology for large-scale phylogeny partition. Nat. Commun. 2, 321. doi:10.1038/ncomms1325
Reuter, G., Zimsek-Mijovski, J., Poljsak-Prijatelj, M., Di Bartolo, I., Ruggeri, F.M., Kantala, T., Maunula, L., Kiss, I., Kecskeméti, S., Halaihel, N., Buesa, J., Johnsen, C., Hjulsager, C.K., Larsen, L.E., Koopmans, M., Böttiger, B., 2010. Incidence, diversity, and molecular epidemiology of sapoviruses in swine across Europe. J. Clin. Microbiol. 48, 363–368. doi:10.1128/JCM.01279-09
Scheuer, K.A., Oka, T., Hoet, A.E., Gebreyes, W.A., Molla, B.Z., Saif, L.J., Wang, Q., 2013. Prevalence of porcine noroviruses, molecular characterization of emerging porcine sapoviruses from finisher swine in the United States, and unified classification scheme for sapoviruses. J. Clin. Microbiol. 51, 2344–2353. doi:10.1128/JCM.00865-13
Schmidt, H.A., Strimmer, K., Vingron, M., von Haeseler, A., 2001. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504.
Smiley, J.R., Chang, K.O., Hayes, J., Vinjé, J., Saif, L.J., 2002. Characterization of an
ACCEPTED MANUSCRIPT
16
AC
CE
PT
ED
M
AN
US
CR
IP
T
enteropathogenic bovine calicivirus representing a potentially new calicivirus genus. J. Virol. 76, 10089–10098. doi:10.1128/JVI.76.20.10089-10098.2002 Smith, J.M., 1992. Analyzing the mosaic structure of genes. J. Mol. Evol. 34, 126–129. doi:10.1007/BF00182389 Strimmer, K., von Haeseler, A., 1997. Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc. Natl. Acad. Sci. USA. 94, 6815–6819. doi:10.1073/pnas.94.13.6815 Svraka, S., Vennema, H., van der Veer, B., Hedlund, K.-O., Thorhagen, M., Siebenga, J., Duizer, E., Koopmans, M., 2010. Epidemiology and genotype analysis of emerging sapovirusassociated infections across Europe. J. Clin. Microbiol. 48, 2191–2198. doi:10.1128/JCM.02427-09 Trifinopoulos, J., Nguyen, L.-T., von Haeseler, A., Minh, B.Q., 2016. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic acids Res. 44, W232–W235. doi:10.1093/nar/gkw256 Weiller, G.F., 1998. Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences. Mol. Biol. Evol. 15, 326–335. doi:10.1093/oxfordjournals.molbev.a025929 White, P.A., 2014. Evolution of norovirus. Clin. Microbiol. Infect. : Off. Publ. Eur. Soc. Clin. Microbiol. Infect. Dis. 20, 741–745. doi:10.1111/1469-0691.12746 Zhang, H., Cockrell, S.K., Kolawole, A.O., Rotem, A., Serohijos, A.W.R., Chang, C.B., Tao, Y., Mehoke, T.S., Han, Y., Lin, J.S., Giacobbi, N.S., Feldman, A.B., Shakhnovich, E., Weitz, D.A., Wobus, C.E., Pipas, J.M., 2015. Isolation and Analysis of Rare Norovirus Recombinants from Coinfected Mice Using Drop-Based Microfluidics. J. Virol. 89, 7722–7734. doi:10.1128/JVI.01137-15
Fig. 1. Sapovirus large-scale phylogeny partition and Maximum Likelihood (ML) phylogenetic tree. The colored ML phylogenetic reconstructions represent the large ML trees using RdRP (n = 180) (A) and VP1 (n = 514) (B) sequences, while the non-colored trees (in the center) represent the phylogenetic reconstructions of the sequences collected from the transmission clusters. The colors in the tree highlight SaVs isolated from different host types, as shown in the legend. The asterisks along the tree branches represent SH-like support values of ≥ 0.74. For a color version of this figure, the reader is referred to the web version of this article.
Fig. 2. Maximum Likelihood (ML) phylogenetic tree of complete SaV VP1 and RdRP genes. ML trees show SaV genogroup clustering patterns based on complete VP1 (A) and RdRP (B) genes. The animals represent SaV hosts associated with a particular genogroup or genogroups. The question marks on SaV genogroups refer to the classification scheme proposed by Scheuer et al. (2013). The light red highlight denotes a large clade of SaVs isolated from swine. The asterisks along the tree branches represent SH-like support values of ≥ 0.85. For a color version of this figure, the reader is referred to the web version of this article.
Fig. 3. Bayesian frameworks analyzing SaV population dynamics over whole-genome sequences. The light green highlight shows the epidemiological steps of SaV until it reaches other host types. The red circle represents our hypothetical period of SaV introduction and spread in the human population. The circles indicate SaVs involved in recombination events. The asterisks on the tree branches represent posterior probability values of ≥ 0.74. The horizontal bars show the phylogenetic uncertainty (A). The Bayesian Skyline Plot (BSP) (B) and the Lineage Through Time (LTT) plot (C) show SaV population dynamics and the overall pattern of SaV diversification over time. Black and dashed blue lines mark the medians and the 95% highest posterior density (HPD) credibility intervals, respectively. The Y-axis represents the effective viral population number through time (B and C). The density and the SaV nucleotide substitution rates are shown in panel D. The number above the panel represents the SaV mutation rate (2.723-2). Each color represents a nucleotide-pair substitution rate. Viral abbreviations: HuSaV: Human Sapovirus, PoSaV: Porcine Sapovirus, PESaV: Porcine Enteric Sapovirus. Country abbreviations: PH: Philippines, CH: China, SK: South Korea, BR: Brazil, DE: Germany, JP: Japan, US: United States. Other abbreviations: HPD: Highest Posterior Density, ESS: Effective Sample Size. For a color version of this figure, the reader is referred to the web version of this article.
Fig. 4. Maximum Likelihood (ML) phylogenetic tree exploring SaV epidemiology. The colors highlight the different hosts involved in the SaV epidemiological chain. The tree branches colored orange represent the SaV swine ancestry. Our hypothesis regarding the participation of other hosts in SaV epidemiological pathways and spread is shown in the center of the ML tree. The circles indicate SaVs involved in recombination events. Asterisks represent SH-aLRT/aBayes/ultrafast bootstrap support values of ≥ 80%. Viral abbreviations: HuSaV: Human Sapovirus, BatSaV: Bat Sapovirus, PoSaV: Porcine Sapovirus. For a color version of this figure, the reader is referred to the web version of this article.
Table 1: Proposed SaV genogroups based on the complete VP1 gene and reference strains

Proposed genogroup | Genogroup according to Scheuer et al. (2013) | Reference sequences | Hosts
GI | GI | U65427 (partial), AY237422, X86560, AY694184, DQ366345 | Human
GII | GII | AJ249939, AY603425, AY646855, AY237420, AY237419 | Human
GIII | GIII | AF182760, AY425671 | Swine
GIV | GIV | AF435814 (partial), DQ125333, DQ058829 | Human
GV | GV | AY646856, JN420370, AB521771, AB521772, AJ606699, AJ786352, DQ366344, AY289803, AB924385 | Human, Swine, Sea lion
GVI | GVI, GVII, GIX?, GX?, GXI? | AY974192, KJ508818, DQ359100 | Swine
GVII | GXII? | AY144337 | Mink
GVIII | GVIII | EU221477, FJ498786, KC309415, KC309416, KC309417, KC309419 | Swine
GIX | GXIII? | JN387135 | Dog
GX | GXIV? | JN899072, JN899074 | Bat
Highlights
We employed high-resolution phylogeny to investigate SaV evolution and taxonomy
We mapped SaV epidemiology, zoonotic aspects and taxonomy
We suggested criteria for SaV genogrouping and genotyping
We showed the importance of different animal species in SaV spread
We showed that different animal species may be acting as SaV reservoirs