High-resolution phylogeny providing insights towards the epidemiology, zoonotic aspects and taxonomy of sapoviruses


A.F. Barry, R. Durães-Carvalho, E.F. Oliveira-Filho, A.A. Alfieri, W.H.M. Van der Poel

PII: S1567-1348(17)30330-1
DOI: 10.1016/j.meegid.2017.09.024
Reference: MEEGID 3280
To appear in: Infection, Genetics and Evolution
Received date: 23 August 2017
Revised date: 18 September 2017
Accepted date: 19 September 2017

Please cite this article as: A.F. Barry, R. Durães-Carvalho, E.F. Oliveira-Filho, A.A. Alfieri, W.H.M. Van der Poel, High-resolution phylogeny providing insights towards the epidemiology, zoonotic aspects and taxonomy of sapoviruses. Meegid (2017), doi:10.1016/j.meegid.2017.09.024

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT


Barry A.F.a*, Durães-Carvalho R.b*†, Oliveira-Filho E.F.b*, Alfieri A.A.a, Van der Poel W.H.M.c

*These authors contributed equally to this work

a Laboratory of Animal Virology, Department of Preventive Veterinary Medicine, Universidade Estadual de Londrina. Campus Universitário, PO Box 6001, 86051-990, Londrina, Paraná, Brazil.

b Department of Virology, Aggeu Magalhães Institute, Oswaldo Cruz Foundation (FIOCRUZ), Av. Professor Moraes Rego s/n, Cidade Universitária, Recife, PE 50670-420, Brazil.

c Wageningen Bioveterinary Research, Wageningen University and Research, Department of Virology, P.O. Box 65, 8200 AB Lelystad, Edelhertweg 15, 8219 PH Lelystad, The Netherlands.

†Corresponding author: Ricardo Durães-Carvalho ([email protected]). Departamento de
Virologia, Instituto Aggeu Magalhães (IAM), Fundação Oswaldo Cruz (FIOCRUZ), Recife-PE, Brasil.

Abstract

The evolution, epidemiology and zoonotic aspects of sapoviruses (SaV) are still not well explored. In this study, we applied high-resolution phylogeny to investigate the epidemiological and zoonotic origins as well as taxonomic aspects of animal and human SaV. Bayesian framework analyses showed an increase in porcine SaV (PoSaV) population dynamics between 1975 and 1982, resulting in a SaV gene flow and generation of new strains amongst porcine and human populations. Our results also show the contribution of different animal populations involved in SaV epidemiology and highlight zoonotic aspects, as exemplified by the crucial role that swine, dogs, mink and humans play in SaV spread. Additionally, phylogenetic analysis suggests that bats may play a key role in SaV epidemiology. According to our hypothesis, these animals may act as reservoirs or intermediate host species, contributing to viral dispersion in zoonotic and other epidemiological scenarios and facilitating the generation of new SaV genogroups and genotypes through recombination events. Data from large-scale phylogeny partition based on patristic distance did not show a correlation between transmission clusters and the generation of SaV genogroups; nevertheless, we present both important findings about SaV taxonomy and considerations useful for further taxonomic studies.

Keywords: Sapoviruses; Phylogeny; Epidemiology; Taxonomy; Zoonoses

1. Introduction

Viral gastroenteritis is a major worldwide public health problem, and the enteric caliciviruses (norovirus, NoV, and sapovirus, SaV) are responsible for most reported cases (Fankhauser et al., 2002; Lopman et al., 2003). NoV is the leading cause of outbreaks, but studies have shown that SaV infection also plays an important role in cases of gastroenteritis in humans (Hansman et al., 2007a; Svraka et al., 2010). The epidemiological profile of SaV infections is not well understood, since human and animal SaV strains present a certain genetic similarity and their zoonotic potential still needs to be investigated (Bank-Wolf et al., 2010).


The Sapovirus genus belongs to the Caliciviridae family, which also includes the genera Norovirus, Lagovirus, Vesivirus, Nebovirus, Recovirus and Valovirus (Green et al., 2000; Smiley et al., 2002; Farkas et al., 2008; L’Homme et al., 2009). Sapovirus particles are non-enveloped, 27 to 35 nm in diameter, and of icosahedral symmetry (Guo et al., 1999). The single-stranded, positive-sense, polyadenylated RNA genome, approximately 7.3 kb in length, is organized in two or three open reading frames (ORFs). ORF1 encodes a polyprotein that is cleaved simultaneously into the viral non-structural proteins and the main capsid protein (VP1) (Oka et al., 2005). Another structural protein is encoded by ORF2 and is responsible for stabilization of the viral particle and regulation of VP1 expression (Bertolotti-Ciarlet et al., 2003). Some human SaV strains possess an additional ORF that overlaps the 5’ end of the VP1 gene (Clarke and Lambden, 2000).

Since SaVs present high genetic variability and are constantly evolving, most studies perform SaV screening based on RT-PCR targeting the highly conserved RNA-dependent RNA polymerase (RdRP). This is then followed by sequencing of a more variable region (VP1), which allows molecular identification and phylogenetic analysis to characterize the different circulating SaV strains (Reuter et al., 2010).

The definition of genogroups and genotypes is very important for clarifying SaV molecular epidemiology. For porcine and other animal SaVs, there is still no consensus concerning genogrouping. Classification systems are based on either complete or partial RdRP and VP1 genes (Farkas et al., 2004; L’Homme et al., 2009; Reuter et al., 2010). In addition, current knowledge about SaV evolution and evolutionary mechanisms is very limited. In the present study, we used an approach based on large-scale datasets and a Bayesian framework to understand the epidemiologic origin and zoonotic aspects of sapoviruses, and herewith propose a classification system for SaV genogroups.

2. Material and Methods

2.1. Sequences dataset

All VP1 (n = 514), RdRP (n = 180) and whole-genome (n = 35) sequences of SaV were downloaded from the GenBank database up to February 2017. The sequences were then filtered by known location and sampling date (1979 to 2014), in order to calibrate divergence time estimates and geographical ranges. Sequences containing degenerate bases, sequences annotated as isolated from "unknown hosts", and sequences from clones and recombinants were discarded from our analyses. The datasets were aligned using the MUSCLE v3.8.31 software (Edgar, 2003).
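The filtering criteria described above could be expressed roughly as follows. This is only a minimal sketch: the `SapoRecord` class, its field names, and the toy accessions are hypothetical and do not reflect the actual GenBank record layout or the scripts used in this study.

```python
from dataclasses import dataclass
from typing import Optional

# Unambiguous nucleotide symbols; any other symbol (N, R, Y, ...) is treated as degenerate.
UNAMBIGUOUS = set("ACGTU")


@dataclass
class SapoRecord:
    """Hypothetical minimal sequence record (illustrative field names)."""
    accession: str
    sequence: str
    host: Optional[str]
    country: Optional[str]
    year: Optional[int]
    is_clone: bool = False
    is_recombinant: bool = False


def keep_record(rec, first_year=1979, last_year=2014):
    """Return True if the record passes the filters listed in Section 2.1."""
    # Host must be known.
    if rec.host is None or rec.host.strip().lower() in {"", "unknown"}:
        return False
    # Location and sampling date must be known.
    if rec.country is None or rec.year is None:
        return False
    # Sampling date must fall inside the stated range.
    if not (first_year <= rec.year <= last_year):
        return False
    # Clones and recombinants are discarded.
    if rec.is_clone or rec.is_recombinant:
        return False
    # Sequences with degenerate bases are discarded.
    return set(rec.sequence.upper()) <= UNAMBIGUOUS


# Toy records with invented accessions, for illustration only.
records = [
    SapoRecord("TOY0001", "ATGGCGTCTAAG", "swine", "BR", 1999),
    SapoRecord("TOY0002", "ATGNCGTCTAAG", "human", "JP", 2005),    # degenerate base
    SapoRecord("TOY0003", "ATGGCGTCTAAG", "unknown", "US", 2001),  # unknown host
    SapoRecord("TOY0004", "ATGGCGTCTAAG", "bat", "CN", 2016),      # outside date range
    SapoRecord("TOY0005", "ATGGCGTCTAAG", "swine", "CN", 2010, is_recombinant=True),
]
kept = [r.accession for r in records if keep_record(r)]
```

Keeping each criterion as a separate early return makes it easy to see which rule rejects a given record.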

2.2. Phylogenetic analysis

In order to check the evolutionary relationships among SaV genogroups and genotypes, phylogenetic reconstructions were performed using the Maximum Likelihood (ML) method implemented in the FastTree v.2.1.7 (Price et al., 2010) and IQ-TREE (Trifinopoulos et al., 2016) software. FastTree was run with the standard implementation GTR + CAT with 20 gamma distribution parameters and a mix of Nearest-Neighbor Interchanges (NNI) and Subtree-Prune-Regraft (SPR) moves; IQ-TREE combines hill-climbing algorithms, random perturbation of current best trees, and a broad sampling of initial starting trees. A methodological approach for the extraction of large-scale phylogenetic partitions based on patristic distance and SH-like support (Prosperi et al., 2011) was also applied to identify transmission clusters among SaV genogroups and genotypes, aiming to investigate different aspects of SaV epidemiology and taxonomy.
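To make the notion of patristic distance concrete, the sketch below computes it on a toy tree as the sum of branch lengths along the path joining two tips, and then lists the tip pairs that fall under a distance threshold. It is a deliberately simplified illustration, not the partition method of Prosperi et al. (2011), which also takes node support and whole sub-trees into account; the tree, tip names and branch lengths are invented.

```python
from itertools import combinations

# Toy rooted tree: child -> (parent, branch length in substitutions per site).
# Tip names and lengths are invented for illustration only.
TREE = {
    "n1": ("root", 0.02),
    "n2": ("root", 0.30),
    "tipA": ("n1", 0.01),
    "tipB": ("n1", 0.02),
    "tipC": ("n2", 0.01),
    "tipD": ("n2", 0.03),
}


def depths_to_root(tree, node):
    """Map `node` and each of its ancestors to their distance from `node`."""
    dist = 0.0
    out = {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(tree, a, b):
    """Sum of branch lengths on the path between nodes a and b."""
    up_a = depths_to_root(tree, a)
    up_b = depths_to_root(tree, b)
    # Among shared ancestors, the most recent one gives the smallest sum.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def close_pairs(tree, tips, threshold):
    """Tip pairs whose patristic distance does not exceed `threshold`."""
    return [(a, b) for a, b in combinations(tips, 2)
            if patristic(tree, a, b) <= threshold]


pairs = close_pairs(TREE, ["tipA", "tipB", "tipC", "tipD"], threshold=0.05)
```

In this toy tree only the two sister pairs fall under the 0.05 threshold, because the long branch leading to `n2` separates the two halves of the tree.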


The presence of phylogenetic signal was investigated by likelihood mapping analysis of 10,000 random quartets generated using the Tree-Puzzle v.5.2 software (available at: http://www.tree-puzzle.de/) (Strimmer and von Haeseler, 1997; Schmidt et al., 2001). The Pairwise Homoplasy Index (PHI) test for evidence of recombination was run in the SplitsTree software v.4.10 with default settings (Bruen et al., 2006). The reliability of the nodes was assessed by the Shimodaira-Hasegawa-like (SH-like) test, aBayes and ultrafast bootstrap support values with 1000 replicates.

2.3. Epidemiological insights and zoonotic aspects of the SaV dispersion

Phylogeographic and spatiotemporal analyses were applied to study the geographic spread pattern and population dynamics of SaVs hosted in different species and isolated in different years and locations. For this purpose, a Bayesian framework analysis was applied through the implementation of the Metropolis-Hastings Markov Chain Monte Carlo (MCMC) algorithm in the Bayesian Evolutionary Analysis Sampling Trees (BEAST) software package, v1.8.0 (Drummond and Rambaut, 2007). The coalescent demographic model Bayesian Skyline Plot (BSP) was used, imposing either a strict or a relaxed molecular clock (with log-normally distributed rates). The Lineage Through Time (LTT) plot was also applied to check the overall pattern of SaV diversification over time.
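For readers unfamiliar with the Metropolis-Hastings algorithm mentioned above, the following is a minimal sketch of a random-walk sampler on a one-dimensional toy target. It illustrates only the accept/reject step; BEAST samples trees and many model parameters jointly with far more elaborate proposals. The target density, step size, chain length and burn-in fraction used here are assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    """Log-density (up to a constant) of a standard normal, used as a stand-in target."""
    return -0.5 * x * x


def metropolis_hastings(n_steps, step_size=1.0, x0=0.0, seed=1):
    """Random-walk Metropolis-Hastings; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Symmetric proposal, so the Hastings ratio reduces to the target ratio.
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


chain, acceptance_rate = metropolis_hastings(20000)
# Discard the first 10% of the chain as burn-in before summarizing.
kept_samples = chain[len(chain) // 10:]
posterior_mean = sum(kept_samples) / len(kept_samples)
```

Because successive states are correlated, the number of effectively independent samples is smaller than the chain length, which is why mixing diagnostics are needed on real runs.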

The Markov model of nucleotide substitution was selected with the software jModelTest v.2.1.6 (Darriba et al., 2012). The MCMC algorithm was run for up to 1 billion generations, with sampling every 100,000 generations, for each molecular clock. Good mixing of the MCMC was determined by Effective Sample Size (ESS) values ≥ 200, and the convergence of parameters was checked in the Tracer v1.6 software (available at http://beast.bio.ed.ac.uk/) with 10% burn-in. The marginal likelihood for each clock model was obtained using the path sampling and stepping-stone algorithms (Baele et al., 2013a; Baele et al., 2013b). The posterior distribution of trees was summarized as a Maximum Clade Credibility (MCC) tree by TreeAnnotator (implemented in the BEAST v.1.8.0 package) and visualized in the FigTree v.1.4.2 software (available at: http://tree.bio.ed.ac.uk/software/figtree/). Phylogenetic uncertainties were estimated by the 95% Highest Posterior Density (HPD) intervals. Evolutionary parameters such as nucleotide substitution rates, tree topologies, SaV demographic histories and time of the most recent common ancestor (tMRCA) were estimated from SaV whole-genome sequences.

2.4. Recombination analysis

A full exploratory recombination scan was performed to detect SaV recombinants and their minor and major parental sequences over whole-genome sequences, using the Recombination Detection Program (RDP) v.4.80 (Martin et al., 2015) and the algorithms embedded in it: MaxChi (Smith, 1992); PhylPro (Weiller, 1998); GENECONV (Padidam et al., 1999); LARD (Holmes et al., 1999); SIScan (Gibbs et al., 2000); CHIMAERA (Posada and Crandall, 2001); BootScan (Martin et al., 2005) and 3Seq (Boni et al., 2006). P-values ≤ 0.05 were regarded as statistically significant. Only events that were statistically significant across the nine programs were considered as evidence of recombination.

3. Results

3.1. Phylogenetic relationships among SaV genogroups and genotypes

Initially, we analyzed large datasets comprising all available SaV RdRP (n = 180) and VP1 (n = 514) sequences to better understand the taxonomy in different host species (see Supplementary material). Our results show a large number of RdRP sequences from PoSaV and HuSaV, as well as HuSaV-like sequences detected in other species (Fig. 1A). On the other hand, the number of SaV VP1 sequences is almost three times higher than that of RdRP sequences (Fig. 1B).


Moreover, an interesting pattern in the evolutionary dynamics of SaV suggested that different host species (such as swine, dog, mink and human) might have played an important role in SaV dispersion.

In this large-scale approach, we selected sequences from well-defined genogroups (GI, GII, GIII, GIV, GV), other genogroups proposed by Scheuer et al. (2013) and other non-assigned sequences available in the GenBank database. In addition, we established two criteria for grouping besides the phylogenetic analyses. (I) For a new strain to be considered a new genogroup, it should have a complete genomic sequence or, at least, a complete VP1/RdRP sequence, and it must group in a different clade. (II) In addition, we considered the existence of certain non-mathematical features such as geographic and transmission clusters among SaV-infected hosts. Following our approach, viruses not meeting these criteria could not be grouped in a potential new genogroup. Accordingly, we propose that SaV should be organized in ten genogroups (Table 1 and Fig. 2).

A large-scale phylogenetic partition approach was applied to the large RdRP and VP1 ML trees for SaV grouping (selecting clusters based on patristic distances and SH-like support above 90%). Sequences were extracted from the large ML trees and new phylogenies were reconstructed, aiming to outline a strategy for establishing different taxonomic groupings of viruses (Fig. 1). However, the large-scale phylogeny partition methodology did not succeed in separating the defined genogroups when applying the optimal threshold interval, corresponding to an absolute distance range of 0.05 nucleotide substitutions per site (see Supplementary material).

3.2. Epidemiologic origin and zoonotic aspects of SaVs


A Bayesian coalescent-based method imposing a relaxed molecular clock was applied to trace the epidemiologic, zoonotic and phylogeographic origins of SaVs hosted in different animals. We applied the coalescent model Bayesian Skyline Plot (BSP) to evaluate SaV population dynamics over time and the Lineage Through Time (LTT) plot for an overall inference about the pattern of viral diversification. Our results suggest a potential role of the swine host in contributing to viral dissemination and to the increase in SaV population dynamics and epidemiological profile (Fig. 3A and Fig. S1). The dated Bayesian Maximum Clade Credibility (MCC) tree shows a SaV isolated from swine in 1979 (KT922087.1_PoSaV_Cowden_1979_US) as the most recent common ancestor (MRCA) responsible for the SaV introduction and spread to other species. Our results also highlight an increase in PoSaV population dynamics between 1975 and 1982 (Fig. 3A and Fig. 3B), exhibiting a SaV gene flow amongst animals of different countries until finally reaching the human population. The evolutionary jumps of PoSaV to other host species probably acted as a source for the subsequent SaV lineages, triggering the generation of recombinant strains and thus playing an important role in sapovirus epidemiology (Fig. 3A and Fig. S1).

The BSP and LTT analyses show SaV genetic variation over time, taking into account the presence of different hosts involved in their epidemiology. Based on a peak in the viral effective population size (Ne) near 1978 and the 1980s, it was possible to hypothesize that different animals contributed strongly to the epidemiology of this virus (Fig. 3B and Fig. 3C). In order to better understand SaV evolution, we also mapped the SaV mutation rate over time. The genome-wide estimate of the SaV mutation rate was 2.723 × 10⁻² changes per site per replication cycle (Fig. 3D). These data corroborate similar reports for NoVs (Cuevas et al., 2016).
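As a reminder of what a lineage-through-time curve records, the sketch below counts lineages from a list of bifurcation times in a binary tree. It is a simplified illustration that assumes all tips are sampled after the last split and ignores dated (heterochronous) tips; the split times are arbitrary values in arbitrary time units and are not taken from our trees.

```python
from bisect import bisect_right


def ltt(split_times):
    """Return (time, lineage count) steps for a binary tree given its split times."""
    points = []
    lineages = 1  # a single ancestral lineage before the first split
    for t in sorted(split_times):
        lineages += 1  # each bifurcation adds one lineage
        points.append((t, lineages))
    return points


def lineages_at(split_times, t):
    """Number of lineages present at time t (splits at or before t are counted)."""
    return 1 + bisect_right(sorted(split_times), t)


# Invented split times, clustered in the middle to mimic a burst of diversification.
splits = [0.0, 4.0, 4.5, 5.0, 5.5, 9.0]
curve = ltt(splits)
before_burst = lineages_at(splits, 3.0)
after_burst = lineages_at(splits, 6.0)
```

A steep section of the curve, such as the jump from 2 to 6 lineages between times 3.0 and 6.0 in this toy example, is the kind of pattern read as rapid diversification.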


Complete genomic sequence analysis via the ML method also pointed to probable evolutionary and epidemiological pathways of SaV. The phylogenetically-based statistical methods used (SH-aLRT, aBayes and ultrafast bootstrap supports) reinforce that HuSaV, as well as SaV isolated from other animals, apparently originated from recombinant SaV isolated from swine (KJ508818.1 and AY974192.2). In addition, we highlight the important role that bats and swine play in SaV dispersion (Fig. 4 and Fig. S2).

Evidence based on the ML phylogenetic tree suggests that viral recombination has played an important role in SaV evolution and epidemiology. The measurement of patristic distance is an essential approach to the inference of transmission clusters. Our data from the tree branch lengths along the VP1 and RdRP SaV ML phylogenies show a close or identical patristic distance among SaVs hosted in different animals (such as dog, swine, bat and humans), thus exhibiting the contribution of these animals to the generation of SaV recombinants. In a way, this may contribute to the elucidation of viral zoonotic aspects (Table S1 and Table S2).

3.3. Recombination analysis

In order to provide in silico evidence for the presence of recombination events in SaV whole-genome sequences, we inferred the presence of SaV recombinants using the nine programs embedded in RDP4 and found three statistically highly credible events. Accordingly, possible recombination events were found in the SaV Ehime strain (isolated in Japan, GenBank code DQ058829.1; host: human) and in PoSaVs (both isolated in China, GenBank codes FJ387164.1 and KF204570.1; host: swine) (Table S3). Although there are reports of the involvement of the SaV Ehime strain in intergenogroup recombination events (Hansman et al., 2005), little attention has been given to its role in SaV epidemiology and zoonotic aspects considering multiple SaV host species.
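The consensus rule applied to the recombination scan, retaining only events supported by every method, can be sketched as below. The event names and p-values are invented for illustration and are not the values reported in Table S3; the dictionary layout is a hypothetical representation, not the RDP4 output format.

```python
ALPHA = 0.05  # significance threshold

# The nine detection methods: RDP plus the eight algorithms listed in Section 2.4.
METHODS = ("RDP", "MaxChi", "PhylPro", "GENECONV", "LARD",
           "SIScan", "CHIMAERA", "BootScan", "3Seq")


def supported_by_all(event_pvalues, methods=METHODS, alpha=ALPHA):
    """True if every method reports a p-value at or below alpha for this event.

    A method with no reported p-value is treated as not supporting the event.
    """
    return all(event_pvalues.get(m, 1.0) <= alpha for m in methods)


# Hypothetical events with invented p-values.
events = {
    "event_1": {m: 1e-6 for m in METHODS},
    "event_2": {**{m: 1e-4 for m in METHODS}, "LARD": 0.20},  # one method not significant
    "event_3": {m: 0.01 for m in METHODS if m != "3Seq"},     # one method missing
}
credible = [name for name, pvals in events.items() if supported_by_all(pvals)]
```

Requiring agreement across all methods is conservative: it lowers the chance of reporting a spurious event at the cost of discarding events detected by only some of the algorithms.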


4. Discussion

This study employed a high-resolution phylogenetic analysis to understand SaV taxonomy, molecular epidemiology and evolutionary aspects and to provide insights into zoonotic aspects. The close genetic relationship of SaV found in animals and humans has previously raised concern regarding their zoonotic potential (Martella and Lorusso, 2008; Oka et al., 2015). According to Bank-Wolf et al. (2010), transmission from animals to humans and vice versa would have considerable consequences for SaV epidemiology. So far, animal SaVs have not been found in humans; however, the detection of porcine SaVs similar to human strains is well documented, giving rise to hypotheses concerning their potential for zoonotic infection and generation of new recombinant strains (Oka et al., 2015).

Our phylogenetic inferences showed the SaV evolutionary relationships and dispersion among different host species, such as canine, mink, swine, bat and humans, where viruses detected in different animals clustered together, exhibiting a close relationship, for instance, porcine and human SaVs (Fig. 1, Fig. 3A and Fig. 4). We also highlighted the important role of swine and bats in SaV dispersion and other evolutionary aspects (Fig. 4). Moreover, we identified an increase in SaV population dynamics and generation of new viral variants occurring near the 1980s as a possible reason for the introduction and dissemination of hitherto unknown genotypes (Fig. 3).

Through our phylogenetically-based statistical methods, we showed a plausible epidemiological pathway for SaV (Fig. 4). These transmission events (among swine, bats and humans infected by SaV) raise the important questions of whether transmission of these viruses between animals and humans can indeed occur and what the role of different animal species infected by SaV might be. These host species probably act as reservoirs or intermediate hosts for viral dispersion, contributing to genetic variability (e.g. generation of new SaV genogroups and genotypes) and possible zoonotic transmission.

Concerning SaV taxonomy, two important points must be considered. First, the use of multiple genomic regions based on partial sequences of different strains can lead to inconsistencies in SaV classification, due to the formation of ambiguous groups, as reported for other ssRNA viruses (Oliveira-Filho et al., 2013). Second, viral recombination has already been reported for SaVs and is known to play an important role in the evolution of caliciviruses. Recombination sites usually lie in the region between the RdRP and capsid sequences (VP1 and VP2) (Bull et al., 2007; Zhang et al., 2015). While a standardized method to classify recombinant strains does not yet exist, the analysis of both RdRP and VP1 sequences nevertheless allows their identification.

The major difference from the previously proposed classification is that the swine genogroups (GVI, GVII, GIX?, GX?, GXI?) (Scheuer et al., 2013) have been placed together as GVI. Despite their high genetic variability, separation of porcine SaVs was not performed, since analyses using the VP1 gene showed that all sequences clustered together in a single swine clade (Fig. 2A and Supplementary material). Data from the RdRP phylogeny showed different SaV grouping patterns. GIII was excluded from the clade encompassing the newly proposed genogroups GIX?, GX? and GXI?, and the swine-hosted SaV within GV was removed. Moreover, GIII, GVIII, GXII and GIV were grouped within a set of hosts such as mink, swine and bat (Fig. 2B and Supplementary material). Additionally, while we found a correlation between transmission clusters and SaV genogroups, it did not provide a reliable separation for all genogroups. Assuming that the method considers nodes/sub-trees with a reliability ≥ 90% and ≥ 2 distinct patients (Prosperi et al., 2011), our approach shows that the results obtained are limited by the fact that many SaV genogroups are clustered in the same clade. In other words, these data suggest that previous taxonomic reports considered the presence of SaV strains within a single clade for the establishment of new RdRP-based genogroups.

On the other hand, viral recombination plays a fundamental role in the evolution and genogroup classification of caliciviruses, as reported for instance for noroviruses (NoV) and feline calicivirus (FCV) (White, 2014; Hou et al., 2016). For SaV, recombination is known to be important in the evolutionary process (Katayama et al., 2004) and both intra- and inter-genogroup recombination have previously been reported (Hansman et al., 2005; Phan et al., 2005). SaVs have long circulated among different host species, probably (although this remains unconfirmed) mediated by a spillover mechanism linked to the high genetic variability and recombination potential of the virus. Our results from whole-genome SaV sequences hosted in different animal species show three statistically highly credible events, suggesting that SaV evolution has been driven by successive recombination events involving different species and that this may have contributed to the spillover between different animal species. These results reinforce the crucial role of viral recombination in the evolution and diversification of SaVs and provide directions and clarifications regarding their zoonotic aspects and potential.

In summary, phylogenetic analysis based on the VP1 and complete genomic sequences was suitable and might be considered for the definition of genogroups and genotypes of currently available and future SaV sequences. In addition, further studies involving SaV molecular epidemiology are required to better understand the epidemiology and zoonotic potential of these viruses. It is important to note that some of our conclusions may be affected by sampling bias due to the limited number of SaV sequences available in public databases and the lack of prior evolutionary studies. Our methodology may also be extended to understand the epidemiologic origin and taxonomic classification not only of SaVs but also of other viruses.

Conflict of interest statement

The authors have declared that no competing interests exist.

Acknowledgments

E.F. Oliveira-Filho and R. Durães-Carvalho are supported by FACEPE and MCT/CNPq DCR grants. A.A. Alfieri is a recipient of a CNPq fellowship. We thank Louisa Ludwig for the careful English correction of the manuscript. The authors would also like to thank Dr. Alex Bosser for fruitful discussion.

References

Baele, G., Lemey, P., Vansteelandt, S., 2013a. Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution. BMC Bioinformatics 14, 85. doi:10.1186/1471-2105-14-85

Baele, G., Li, W.L.S., Drummond, A.J., Suchard, M.A., Lemey, P., 2013b. Accurate model selection of relaxed molecular clocks in Bayesian phylogenetics. Mol. Biol. Evol. 30, 239–243. doi:10.1093/molbev/mss243

Bank-Wolf, B.R., König, M., Thiel, H.-J., 2010. Zoonotic aspects of infections with noroviruses and sapoviruses. Vet. Microbiol. 140, 204–212. doi:10.1016/j.vetmic.2009.08.021

Bertolotti-Ciarlet, A., Crawford, S.E., Hutson, A.M., Estes, M.K., 2003. The 3’ end of Norwalk virus mRNA contains determinants that regulate the expression and stability of the viral capsid protein VP1: a novel function for the VP2 protein. J. Virol. 77, 11603–11615. doi:10.1128/JVI.77.21.11603-11615.2003

Boni, M.F., Posada, D., Feldman, M.W., 2006. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics 176. doi:10.1534/genetics.106.068874

Bruen, T.C., Philippe, H., Bryant, D., 2006. A simple and robust statistical test for detecting the presence of recombination. Genetics 172, 2665–2681. doi:10.1534/genetics.105.048975

Bull, R.A., Tanaka, M.M., White, P.A., 2007. Norovirus recombination. J. Gen. Virol. 88, 3347–3359. doi:10.1099/vir.0.83321-0

Clarke, I.N., Lambden, P.R., 2000. Organization and expression of calicivirus genes. J. Infect. Dis. 181 Suppl 2, S309–S316. doi:10.1086/315575

Cuevas, J.M., Combe, M., Torres-Puente, M., Garijo, R., Guix, S., Buesa, J., Rodríguez-Díaz, J., Sanjuán, R., 2016. Human norovirus hyper-mutation revealed by ultra-deep sequencing. Infect. Genet. Evol. 41, 233–239. doi:10.1016/j.meegid.2016.04.017

Darriba, D., Taboada, G.L., Doallo, R., Posada, D., 2012. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772. doi:10.1038/nmeth.2109

Drummond, A.J., Rambaut, A., 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214. doi:10.1186/1471-2148-7-214

Edgar, R.C., 2003. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. doi:10.1093/nar/gkh340

Fankhauser, R.L., Monroe, S.S., Noel, J.S., Humphrey, C.D., Bresee, J.S., Parashar, U.D., Ando, T., Glass, R.I., 2002. Epidemiologic and molecular trends of “Norwalk-like viruses” associated with outbreaks of gastroenteritis in the United States. J. Infect. Dis. 186, 1–7. doi:10.1086/341085

Farkas, T., Sestak, K., Wei, C., Jiang, X., 2008. Characterization of a rhesus monkey calicivirus representing a new genus of Caliciviridae. J. Virol. 82, 5408–5416. doi:10.1128/JVI.00070-08

Farkas, T., Zhong, W.M., Jing, Y., Huang, P.W., Espinosa, S.M., Martinez, N., Morrow, A.L., Ruiz-Palacios, G.M., Pickering, L.K., Jiang, X., 2004. Genetic diversity among sapoviruses. Arch. Virol. 149, 1309–1323. doi:10.1007/s00705-004-0296-9

Gibbs, M.J., Armstrong, J.S., Gibbs, A.J., 2000. Sister-Scanning: a Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics 16, 573–582. doi:10.1093/bioinformatics/16.7.573

Green, K.Y., Ando, T., Balayan, M.S., Berke, T., Clarke, I.N., Estes, M.K., Matson, D.O., Nakata, S., Neill, J.D., Studdert, M.J., Thiel, H.J., 2000. Taxonomy of the caliciviruses. J. Infect. Dis. 181 Suppl 2, S322–S330. doi:10.1086/315591

Green, S.M., Lambden, P.R., Caul, E.O., Ashley, C.R., Clarke, I.N., 1995. Capsid diversity in small round-structured viruses: molecular characterization of an antigenically distinct human enteric calicivirus. Virus Res. 37, 271–283. doi:10.1016/0168-1702(95)00041-N

Guo, M., Chang, K.O., Hardy, M.E., Zhang, Q., Parwani, A.V., Saif, L.J., 1999. Molecular characterization of a porcine enteric calicivirus genetically related to Sapporo-like human caliciviruses. J. Virol. 73, 9625–9631.

Hansman, G.S., Saito, H., Shibata, C., Ishizuka, S., Oseto, M., Oka, T., Takeda, N., 2007a. Outbreak of gastroenteritis due to sapovirus. J. Clin. Microbiol. 45, 1347–1349. doi:10.1128/JCM.01854-06

Hansman, G.S., Takeda, N., Oka, T., Oseto, M., Hedlund, K.-O., Katayama, K., 2005. Intergenogroup recombination in sapoviruses. Emerg. Infect. Dis. 11, 1916–1920. doi:10.3201/eid1112.050722

Holmes, E.C., Worobey, M., Rambaut, A., 1999. Phylogenetic evidence for recombination in dengue virus. Mol. Biol. Evol. 16, 405–409. doi:10.1093/oxfordjournals.molbev.a026121

Hou, J., Sánchez-Vizcaíno, F., McGahie, D., Lesbros, C., Almeras, T., Howarth, D., O’Hara, V., Dawson, S., Radford, A.D., 2016. European molecular epidemiology and strain diversity of feline calicivirus. Vet. Rec. 178, 114–115. doi:10.1136/vr.103446

Katayama, K., Miyoshi, T., Uchino, K., Oka, T., Tanaka, T., Takeda, N., Hansman, G.S., 2004. Novel recombinant sapovirus. Emerg. Infect. Dis. 10, 1874–1876. doi:10.3201/eid1010.040395

L’Homme, Y., Sansregret, R., Plante-Fortier, E., Lamontagne, A.-M., Ouardani, M., Lacroix, G., Simard, C., 2009. Genomic characterization of swine caliciviruses representing a new genus of Caliciviridae. Virus Genes 39, 66–75. doi:10.1007/s11262-009-0360-3

ACCEPTED MANUSCRIPT

15

AC

CE

PT

ED

M

AN

US

CR

IP

T

Lopman, B.A., Reacher, M.H., Van Duijnhoven, Y., Hanon, F.X., Brown, D., Koopmans, M., 2003. Viral gastroenteritis outbreaks in Europe, 1995–2000. Emerg. Infect. Dis. 9, 90–96. doi:10.3201/eid0901.020184
Martin, D.P., Murrell, B., Golden, M., Khoosal, A., Muhire, B., 2015. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evol. 1, vev003.
Martin, D.P., Posada, D., Crandall, K.A., Williamson, C., 2005. A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res. Hum. Retroviruses 21, 98–102. doi:10.1089/aid.2005.21.98
Martella, V., Lorusso, K.B., Decaro, N., 2008. Identification of a porcine calicivirus related genetically to human sapoviruses. doi:10.1128/JCM.00341-08
Oka, T., Katayama, K., Ogawa, S., Hansman, G.S., Kageyama, T., Ushijima, H., Miyamura, T., Takeda, N., 2005. Proteolytic processing of sapovirus ORF1 polyprotein. J. Virol. 79, 7283–7290. doi:10.1128/JVI.79.12.7283-7290.2005
Oka, T., Wang, Q., Katayama, K., Saif, L.J., 2015. Comprehensive review of human sapoviruses. Clin. Microbiol. Rev. 28, 32–53. doi:10.1128/CMR.00011-14
Oliveira-Filho, E.F., König, M., Thiel, H.-J., 2013. Genetic variability of HEV isolates: inconsistencies of current classification. Vet. Microbiol. 165, 148–154. doi:10.1016/j.vetmic.2013.01.026
Padidam, M., Sawyer, S., Fauquet, C.M., 1999. Possible emergence of new geminiviruses by frequent recombination. Virology 265, 218–225. doi:10.1006/viro.1999.0056
Phan, T.G., Yan, H., Khamrin, P., Quang, T.D., Dey, S.K., Yagyu, F., Okitsu, S., Müller, W.E.G., Ushijima, H., 2005. Novel intragenotype recombination in sapovirus. Clin. Lab. 52, 363–366.
Posada, D., Crandall, K.A., 2001. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc. Natl. Acad. Sci. USA 98, 13757–13762. doi:10.1073/pnas.241370698
Price, M.N., Dehal, P.S., Arkin, A.P., 2010. FastTree 2: approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490. doi:10.1371/journal.pone.0009490
Prosperi, M.C.F., Ciccozzi, M., Fanti, I., Saladini, F., Pecorari, M., Borghi, V., Di Giambenedetto, S., Bruzzone, B., Capetti, A., Vivarelli, A., Rusconi, S., Re, M.C., Gismondo, M.R., Sighinolfi, L., Gray, R.R., Salemi, M., Zazzi, M., De Luca, A., 2011. A novel methodology for large-scale phylogeny partition. Nat. Commun. 2, 321. doi:10.1038/ncomms1325
Reuter, G., Zimsek-Mijovski, J., Poljsak-Prijatelj, M., Di Bartolo, I., Ruggeri, F.M., Kantala, T., Maunula, L., Kiss, I., Kecskeméti, S., Halaihel, N., Buesa, J., Johnsen, C., Hjulsager, C.K., Larsen, L.E., Koopmans, M., Böttiger, B., 2010. Incidence, diversity, and molecular epidemiology of sapoviruses in swine across Europe. J. Clin. Microbiol. 48, 363–368. doi:10.1128/JCM.01279-09
Scheuer, K.A., Oka, T., Hoet, A.E., Gebreyes, W.A., Molla, B.Z., Saif, L.J., Wang, Q., 2013. Prevalence of porcine noroviruses, molecular characterization of emerging porcine sapoviruses from finisher swine in the United States, and unified classification scheme for sapoviruses. J. Clin. Microbiol. 51, 2344–2353. doi:10.1128/JCM.00865-13
Schmidt, H.A., Strimmer, K., Vingron, M., von Haeseler, A., 2001. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504.
Smiley, J.R., Chang, K.O., Hayes, J., Vinjé, J., Saif, L.J., 2002. Characterization of an enteropathogenic bovine calicivirus representing a potentially new calicivirus genus. J. Virol. 76, 10089–10098. doi:10.1128/JVI.76.20.10089-10098.2002
Smith, J.M., 1992. Analyzing the mosaic structure of genes. J. Mol. Evol. 34, 126–129. doi:10.1007/BF00182389
Strimmer, K., von Haeseler, A., 1997. Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc. Natl. Acad. Sci. USA 94, 6815–6819. doi:10.1073/pnas.94.13.6815
Svraka, S., Vennema, H., van der Veer, B., Hedlund, K.-O., Thorhagen, M., Siebenga, J., Duizer, E., Koopmans, M., 2010. Epidemiology and genotype analysis of emerging sapovirus-associated infections across Europe. J. Clin. Microbiol. 48, 2191–2198. doi:10.1128/JCM.02427-09
Trifinopoulos, J., Nguyen, L.-T., von Haeseler, A., Minh, B.Q., 2016. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 44, W232–W235. doi:10.1093/nar/gkw256
Weiller, G.F., 1998. Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences. Mol. Biol. Evol. 15, 326–335. doi:10.1093/oxfordjournals.molbev.a025929
White, P.A., 2014. Evolution of norovirus. Clin. Microbiol. Infect. 20, 741–745. doi:10.1111/1469-0691.12746
Zhang, H., Cockrell, S.K., Kolawole, A.O., Rotem, A., Serohijos, A.W.R., Chang, C.B., Tao, Y., Mehoke, T.S., Han, Y., Lin, J.S., Giacobbi, N.S., Feldman, A.B., Shakhnovich, E., Weitz, D.A., Wobus, C.E., Pipas, J.M., 2015. Isolation and analysis of rare norovirus recombinants from coinfected mice using drop-based microfluidics. J. Virol. 89, 7722–7734. doi:10.1128/JVI.01137-15


Fig. 1. Sapovirus large-scale phylogeny partition and Maximum Likelihood (ML) phylogenetic tree. The colored ML phylogenetic reconstructions represent the large ML trees built from RdRP (n = 180) (A) and VP1 (n = 514) (B) sequences, while the non-colored trees (in the center) represent the phylogenetic reconstructions of the sequences collected from the transmission clusters. The colors in the tree highlight SaVs isolated from different host types, as shown in the legend. The asterisks along the tree branches represent SH-like support values of ≥ 0.74. For a color version of this figure, the reader is referred to the web version of this article.


Fig. 2. Maximum Likelihood (ML) phylogenetic tree of complete SaV VP1 and RdRP genes. The ML trees show SaV genogroup clustering patterns based on complete VP1 (A) and RdRP (B) genes. The animals represent SaV hosts associated with a particular genogroup or genogroups. The question mark on SaV genogroups refers to the classification scheme proposed by Scheuer et al. (2013). The light red highlight denotes a large clade of SaVs isolated from swine. The asterisks along the tree branches represent SH-like support values of ≥ 0.85. For a color version of this figure, the reader is referred to the web version of this article.


Fig. 3. Bayesian frameworks analyzing SaV population dynamics over whole-genome sequences. The light green highlight shows the epidemiological steps of SaV until it reaches other host types. The red circle represents our hypothetical period of SaV introduction and spread in the human population. The circles indicate SaVs involved in recombination events. The asterisks on the tree branches represent posterior probability values of ≥ 0.74. The horizontal bars show the phylogenetic uncertainty (A). The Bayesian Skyline Plot (BSP) (B) and the Lineage Through Time (LTT) plot (C) show SaV population dynamics and the overall pattern of SaV diversification over time. Black and dashed blue lines mark the medians and the 95% highest posterior density (HPD) credibility intervals, respectively. The Y-axis represents the effective number of the viral population through time (B and C). The density and the SaV nucleotide substitution rate are shown in panel D. The number above the panel represents the SaV mutation rate (2.723-2). Each color represents the rate for one pair of nucleotides. Viral abbreviations: HuSaV: Human Sapovirus, PoSaV: Porcine Sapovirus, PESaV: Porcine Enteric Sapovirus. Country abbreviations: PH: Philippines, CH: China, SK: South Korea, BR: Brazil, DE: Germany, JP: Japan, US: United States. Other abbreviations: HPD: Highest Posterior Density, ESS: Effective Sample Size. For a color version of this figure, the reader is referred to the web version of this article.


Fig. 4. Maximum Likelihood (ML) phylogenetic tree exploring SaV epidemiology. The colors highlight the different hosts involved in the SaV epidemiological chain. The tree branches colored orange represent the SaV swine ancestry. Our hypothesis regarding the participation of other hosts in SaV epidemiological pathways and spread is shown in the center of the ML tree. The circles indicate SaVs involved in recombination events. Asterisks represent SH-aLRT/aBayes/ultrafast bootstrap support values of ≥ 80%. Viral abbreviations: HuSaV: Human Sapovirus, BatSaV: Bat Sapovirus, PoSaV: Porcine Sapovirus. For a color version of this figure, the reader is referred to the web version of this article.


Table 1: Proposed SaV genogroups based on the complete VP1 gene and reference strains

Proposed genogroup | Reference sequences | Hosts | Genogroups according to Scheuer et al. (2013)
GI | U65427 (partial), AY237422, X86560, AY694184, DQ366345 | Human | GI
GII | AJ249939, AY603425, AY646855, AY237420, AY237419 | Human | GII
GIII | AF182760, AY425671 | Swine | GIII
GIV | AF435814 (partial), DQ125333, DQ058829 | Human | GIV
GV | AY646856, JN420370, AB521771, AB521772, AJ606699, AJ786352, DQ366344, AY289803, AB924385 | Human, Swine, Sea lion | GV
GVI | AY974192, KJ508818, DQ359100 | Swine | GVI, GVII, GIX?, GX?, GXI?
GVII | AY144337 | Mink | GXII?
GVIII | EU221477, FJ498786, KC309415, KC309416, KC309417, KC309419 | Swine | GVIII
GIX | JN387135 | Dog | GXIII?
GX | JN899072, JN899074 | Bat | GXIV?


Highlights

We employed high-resolution phylogeny to investigate SaV evolution and taxonomy

We have mapped the SaV epidemiology, zoonotic aspects and taxonomy

We have suggested criteria for SaV genogrouping and genotyping

We have shown the importance of different animal species in SaV spread

We have shown that different animal species may be acting as SaV reservoirs
