Reassessing conflicting evolutionary histories of the Paramyxoviridae and the origins of respiroviruses with Bayesian multigene phylogenies


Infection, Genetics and Evolution 10 (2010) 97–107


Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid

Alex J. McCarthy, Simon J. Goodman *

Institute of Integrative & Comparative Biology, Faculty of Biological Sciences, University of Leeds, LS2 9JT, UK

Article info

Article history: Received 17 July 2009; received in revised form 26 October 2009; accepted 3 November 2009; available online 10 November 2009

Abstract

The evolution of paramyxoviruses is still poorly understood since past phylogenetic studies have revealed conflicting evolutionary signals among genes, and used varying methods and datasets. Using Bayesian phylogenetic analysis of full-length single and concatenated sequences for the 6 genes shared among paramyxovirus genera, we reassess the ambiguous evolutionary relationships within the family, and examine causes of varying phylogenetic signals among different genes. Relative to a pneumovirus outgroup, the concatenated gene phylogeny splits the Paramyxovirinae into two lineages, one comprising the avulaviruses and rubulaviruses, and a second containing the respiroviruses basal to the henipaviruses and morbilliviruses. Phylogenies for the matrix (M), RNA dependent RNA polymerase (L) and fusion (F) glycoprotein genes are concordant with the topology from the concatenated dataset. In phylogenies derived from the nucleocapsid (N) and phosphoprotein (P) genes, the respiroviruses form the most basal genus of the Paramyxovirinae subfamily, with the avulaviruses and rubulaviruses in one lineage, and the henipaviruses and morbilliviruses in a second. The phylogeny of the hemagglutinin (H) gene places the respiroviruses basal to the avula–rubulavirus group, but the relationship of this lineage with the henipa- and morbilliviruses is not resolved. Different genes may be under varying evolutionary pressures, giving rise to these conflicting signals. Given the level of conservation in the M and L genes, we suggest that these genes together with the F gene, or concatenated datasets for all six genes, are likely to reveal the most reliable phylogenies at the family level, and should be used for future phylogenetic studies in this group. Split decomposition analysis suggests that recombination within genera may have contributed to the emergence of dolphin morbillivirus, and of several species within the respiroviruses. A partial L gene alignment resolves the relationships of 25 unclassified paramyxoviruses into 4 clades (Chiropteran, Salmon, Rodentian and Ophidian paramyxoviruses), which group with the rubulaviruses, the respiroviruses, the morbilliviruses, and within the Paramyxovirinae, respectively.

© 2009 Elsevier B.V. All rights reserved.

Keywords: Paramyxovirus; Selection; Recombination; Emerging infectious diseases; Bats; Host jump; Disease emergence

1. Introduction

Paramyxoviruses are a family of single-stranded negative-sense RNA viruses that infect a diverse range of hosts including mammals, birds, reptiles and fish. Recently, several high-profile zoonotic emerging infectious diseases (EIDs) that have important implications for human and domestic animal health have arisen from the Paramyxoviridae. These include the nipah virus (NiV) and hendra virus (HV) in South East Asia, which both have natural reservoirs in bats and which have spread into humans and pigs (Enserink, 2000), and humans and horses, respectively (Selvey et al., 1995). Recent emergences of other paramyxoviruses have also been identified in rodents (beilong virus & J virus, Jun et al., 1977; Li et al., 2006b), bats (menangle virus and tioman virus, Tidona et al., 1999; Bowden et al., 2001), reptiles (Franke et al., 2001; Jacobson et al., 1997, 2001; Allender et al., 2006; Marschang et al., 2009), and salmon (Kvellestad et al., 2003).

The group also includes important established pathogens of humans, domestic animals and wildlife. Important human examples include measles virus (MV) and mumps virus (MuV), both of which still cause high levels of childhood mortality worldwide (Moss and Griffin, 2006). Newcastle disease virus (NDV) remains an important pathogen of poultry, whilst rinderpest virus (RDV) had devastating impacts on domestic and wild bovids but is now on the edge of eradication (Normile, 2008). Canine distemper virus is a growing threat to wild canid populations due to spill over from expanding domestic dog populations (Carpenter et al., 1998; Kennedy et al., 2000; Cleaveland et al., 2006). Understanding the phylogenetic relationships among paramyxoviruses is crucial for accurately characterising newly emergent viruses, identifying their origins, and gaining insights into the processes and drivers of disease emergence in this group.

* Corresponding author. Tel.: +44 0113 3432561; fax: +44 0113 3432835. E-mail address: [email protected] (S.J. Goodman).

1567-1348/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.meegid.2009.11.002


The evolution of, and the relationships among, individual species are well known for a subset of paramyxoviruses, but the evolution of unclassified species, and the relationships among subfamilies and genera in the family as a whole, are poorly understood. Currently, paramyxovirus species are classified into two subfamilies and seven genera: the Avulavirus, Henipavirus, Morbillivirus, Respirovirus and Rubulavirus genera belong to the Paramyxovirinae subfamily, whilst the Metapneumovirus and Pneumovirus genera compose the Pneumovirinae subfamily (International Committee on Taxonomy of Viruses (ICTV) 8th report 2005). All paramyxoviruses have a single-stranded negative-sense RNA genome and share six common genes encoding nucleoprotein (N), phosphoprotein (P), matrix protein (M), fusion glycoprotein (F), hemagglutinin/neuraminidase glycoprotein (H) and large RNA dependent RNA polymerase (L). A number of unshared genes also exist in paramyxovirus genomes, including those that encode the small hydrophobic protein (SH), matrix protein 2 (M2), and the non-structural proteins (NS)-1 and (NS)-2.

Previous phylogenies constructed for each of the paramyxovirus genes agree that the Henipavirus and Morbillivirus genera are closely related, and that the Avulavirus and Rubulavirus genera are sister clades (Chang et al., 2001; Kurath et al., 2004; Seal et al., 2000, 2002; Jordan et al., 2000; Westover and Hughes, 2001; Wise et al., 2004). The phylogenetic position of the respiroviruses remains ambiguous. Some analyses of the sequences of the N, P, M and F genes indicate that the respiroviruses form the most basal genus of the Paramyxovirinae subfamily (Jordan et al., 2000; Chang et al., 2001; Westover and Hughes, 2001; Seal et al., 2002; Kurath et al., 2004; Kumar et al., 2008; Jeon et al., 2008; Shi et al., 2008; Xiao et al., 2009). In contrast, other independent analyses of alignments of the F gene, M gene, and L gene suggest an early divergence between an avula–rubula lineage and a henipa–morbilli–respiro lineage, in which the respiroviruses are basal in the latter clade (Seal et al., 2000; Chang et al., 2001; Westover and Hughes, 2001; Wise et al., 2004; Jeon et al., 2008; Xiao et al., 2009). Maximum parsimony analysis of a complete genome alignment agrees with these latter findings (Lwamba et al., 2005; Nayak et al., 2008).

In this paper we aim to resolve previous ambiguities in the complete phylogeny for paramyxoviruses, and to determine the cause of differences in the phylogenetic signal returned by different genes. This study is the first that we are aware of that uses single gene and whole genome Bayesian phylogenetic analysis of the paramyxoviruses. In addition we attempt to resolve the evolutionary relationships of several novel and currently unclassified viruses, and discuss the results in relation to the future potential emergence of novel paramyxoviruses.

2. Materials and methods

2.1. Sequence data

Amino acid sequences were extracted from the GenBank database (www.ncbi.nlm.nih.gov). Except for the unclassified viruses, only full-length sequences for each gene were used, and where possible sequences were selected from full genome sequences so that the concatenated sequence analysis reflected the original genome. Two sequences per viral species were selected randomly where possible, in order to give a sample of genome diversity whilst keeping the number of taxa per dataset small enough that the computational time required for the analyses was not restrictive. Phylogenies were constructed using amino acid sequences, as opposed to nucleotide sequences, in order to increase the chance of recovering unambiguous relationships for the deep branches in the phylogeny and to limit bias due to homoplasies.
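The random selection of up to two sequences per viral species can be sketched as follows. This is a minimal illustration and not the authors' actual procedure: the function name, the fixed random seed and the accession labels are hypothetical placeholders introduced for the example.

```python
import random
from collections import defaultdict


def sample_per_species(records, per_species=2, seed=0):
    """Pick up to `per_species` accessions at random for each species.

    `records` is an iterable of (accession, species) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_species = defaultdict(list)
    for accession, species in records:
        by_species[species].append(accession)

    chosen = {}
    for species in sorted(by_species):
        accessions = sorted(by_species[species])
        k = min(per_species, len(accessions))  # "where possible"
        chosen[species] = sorted(rng.sample(accessions, k))
    return chosen


# Hypothetical accession labels, for illustration only.
records = [
    ("ACC_MV_1", "measles virus"),
    ("ACC_MV_2", "measles virus"),
    ("ACC_MV_3", "measles virus"),
    ("ACC_HV_1", "hendra virus"),
]
selection = sample_per_species(records)
```

Species with fewer than two available sequences simply contribute what they have, which mirrors the "where possible" qualifier in the text.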
Accession numbers for sequences used in the phylogenetic analysis are shown as part of taxa names in Figs. 1–3. Amino acid sequences from the same complete genome sequence were concatenated together to perform the multigene analysis. In all cases, except for the partial L gene phylogeny, complete amino acid sequences were used for each gene. Complete gene sequences were available for each of the six shared paramyxovirus genes for the majority of species.

2.2. Construction of alignments

Alignments for N, P, M, F, H and L amino acid sequences were created in Clustal X (Thompson et al., 1997). For simplicity the attachment glycoprotein gene is referred to as the H gene throughout, irrespective of the presence or absence of hemagglutinin and neuraminidase activity. Alignments were then edited by hand where necessary. Alignment files are available on request. ModelGenerator (Keane et al., 2006) was used to select the most likely model of molecular evolution for each alignment. The highest ranked model was used in subsequent Bayesian analysis.

2.3. Bayesian phylogenetic analysis

Phylogenetic analysis was performed using MrBayes v3.1.1 (Huelsenbeck and Ronquist, 2001). Analyses were run for a minimum of 1,000,000 iterations, and convergence of the Markov chain Monte Carlo (MCMC) chains was judged by the standard deviation between chains falling below 0.05. Trees were sampled every 1000 generations, and the first 20% of sampled trees were discarded as the burn-in period so that parameter estimates were only made from data drawn from distributions derived after the MCMC chains had converged (Huelsenbeck and Ronquist, 2001). Two replicate runs were performed for each analysis and the phylogenies assessed for consistency. All phylogenies were drawn and edited in MEGA v4.1 (Kumar et al., 2004). The Pneumovirinae subfamily is used as an outgroup to show evolutionary relationships between Paramyxovirinae genera. There was no suitable viral species that could be used as an outgroup to root the whole Paramyxoviridae family for multigene analysis.

2.4. Split decomposition analysis

Split decomposition is a method for identifying potential recombination events in phylogenies. The output of the split decomposition analysis is a graph that can be partitioned into an independent set and a clique. In phylogenetic analysis the presence of recombination results in the construction of a split graph with a reticulated network, as opposed to a bifurcating phylogeny (Bandelt and Dress, 1992). Hence, the analysis constructs the shortest pathways that link a set of taxa and includes links that produce a reticulated network. Split graph analysis was performed using SplitsTree v4 (Huson and Bryant, 2006) for each of the six shared paramyxovirus genes, as well as on the concatenated six-gene dataset. Genes M, H and L were analysed with the WAG + G model of evolution, using the ProteinMLdist function. The JTT + G model of evolution was used for genes N, P and F. In split decomposition analysis different models of evolution cannot be assigned to different partitions of a dataset. Therefore, analysis of the concatenated dataset was performed using an uncorrected protein distance model. Support values for branch lengths were computed from 1000 bootstraps.

3. Results

3.1. Bayesian phylogenetic analysis

The main objective of this research was to determine the evolutionary relationship among genera of the Paramyxovirinae


subfamily. We therefore do not comment in detail on evolutionary signals detected within genera. We discuss gene phylogenies in the order in which the genes appear in the genome in the 3′ to 5′ direction: N, P, M, F, H and L. The Pneumovirinae are used as an outgroup as they are clearly distinct from the Paramyxovirinae and therefore serve well for exploring relationships within the Paramyxovirinae subfamily. In the following discussion the term ‘basal’ is used to refer to the relationship of the genera relative to the topology of the trees with a pneumovirus outgroup. ‘Correct clades’ refers to the taxonomy and nomenclature agreed in the ICTV 8th report (2005).

Fig. 1. Single gene phylogenetic analysis. Bayesian phylogenies of the N, P, M, F, H and L proteins of the Paramyxoviridae family of viruses. The following sequence evolution models were used: JTT + G (N, P, F) or WAG + G (H, L, M). Posterior probabilities of branching events are shown at nodes. The Pneumovirinae subfamily is used as an outgroup to show evolutionary relationships among Paramyxovirinae genera (Avulavirus, Henipavirus, Morbillivirus, Respirovirus and Rubulavirus). Branch lengths represent relative genetic distances. GenBank accession numbers of each sequence are shown in taxa names. Abbreviations: Avian paramyxovirus (APMV)-2 and -6; avian pneumovirus (APV); bovine parainfluenza virus (BPIV)-1 and -3; bovine respiratory syncytial virus (BRSV); canine distemper virus (CDV); dolphin morbillivirus (DMV); goose paramyxovirus (GPMV); hendra virus (HV); human parainfluenza virus (HPIV)-1, -2, -3 and -4; human pneumovirus (HPV); human respiratory syncytial virus (HRSV); measles virus (MV); murine pneumovirus (MPV); Newcastle disease virus (NDV); nipah virus (NV); ovine respiratory syncytial virus (ORSV); peste-des-petits ruminants virus (PDPV); phocine distemper virus (PDV); porcine rubulavirus (LMPV); sendai virus (SeV); simian virus (SV)-5 and -41. Viral clades are shown with black lines. The Pneumovirinae are shown with a dotted line.

The nucleoprotein alignment was composed of 51 taxa (6 avula-, 4 henipa-, 12 morbilli-, 10 pneumo-, 8 respiro- and 11 rubulaviruses) and 615 amino acids. Phylogenetic analysis of the nucleoprotein alignment, using the Jones, Taylor and Thornton (JTT) model of evolution with a gamma distribution of variation (+G), resolved all taxa to the genera previously classified in the ICTV 8th report (Pr: 1.00; Fig. 1A). The Respirovirus genus is placed as the most basal clade in the Paramyxovirinae lineage. The next split separates a lineage containing the morbilliviruses and henipaviruses from a lineage consisting of the avulaviruses and rubulaviruses (Pr: 1.00). Relationships between the henipaviruses and morbilliviruses, and between the avulaviruses and rubulaviruses, are supported by high Bayesian posterior probabilities (Pr: 0.99 and 0.98, respectively).

Sequences for 51 taxa (6 avula-, 4 henipa-, 10 morbilli-, 10 pneumo-, 8 respiro- and 12 rubulaviruses) were included in the phosphoprotein alignment of 660 amino acids. Analysis using the JTT + G model of protein evolution recovered the same topology as for the N gene (Fig. 1B), supported by consistently high posterior probabilities (Pr: 0.93–1.00). The respiroviruses are basal to


the other genera in the Paramyxovirinae. The remaining taxa form a lineage composed of morbilliviruses and henipaviruses (Pr: 1.00) and another lineage consisting of the avulaviruses and rubulaviruses (Pr: 0.96).

The matrix protein alignment had 51 taxa (6 avula-, 4 henipa-, 12 morbilli-, 11 pneumo-, 8 respiro- and 10 rubulaviruses) and 421 amino acids, and was analysed using the Whelan and Goldman (WAG) model of evolution with a gamma distribution (+G). All taxa were separated into the correct genus-level clades (Pr: 0.94–1.00; Fig. 1C).
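The Pr values quoted for each node are Bayesian posterior probabilities, which in practice are estimated as the fraction of post-burn-in sampled trees that contain a given clade. The sketch below illustrates that bookkeeping for the sampling scheme of Section 2.3 (1,000,000 generations, one tree every 1000 generations, first 20% discarded). The toy tree sample and the function name are hypothetical, and the counts assume exactly one sampled tree per 1000 generations.

```python
def posterior_support(sampled_trees, clade, burnin_percent=20):
    """Fraction of post-burn-in trees that contain `clade`.

    Each tree is represented as a set of clades, and each clade as a
    frozenset of taxon names.
    """
    n_burnin = len(sampled_trees) * burnin_percent // 100
    kept = sampled_trees[n_burnin:]
    return sum(clade in tree for tree in kept) / len(kept)


# Sampling scheme from Section 2.3 (assumes one sample per 1000 generations).
generations, sample_every = 1_000_000, 1000
n_samples = generations // sample_every
n_discarded = n_samples * 20 // 100
n_kept = n_samples - n_discarded

# Toy sample of ten "trees" (hypothetical, not the study's MCMC output).
respiro_with_hm = frozenset({"Respirovirus", "Henipavirus", "Morbillivirus"})
respiro_basal = frozenset(
    {"Avulavirus", "Rubulavirus", "Henipavirus", "Morbillivirus"}
)
toy_sample = [{respiro_basal}] * 2 + [{respiro_with_hm}] * 6 + [{respiro_basal}] * 2

support = posterior_support(toy_sample, respiro_with_hm)
```

In the toy sample the first two trees fall in the burn-in and are ignored, so the support is computed from the remaining eight trees only.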

Analysis of the M protein dataset supports the grouping of the avulaviruses and rubulaviruses (Pr: 0.98). The respiroviruses are placed basal in a lineage that also includes the morbilliviruses and henipaviruses (Pr: 1.00).

The fusion protein alignment was composed of 53 taxa (8 avula-, 4 henipa-, 9 morbilli-, 11 pneumo-, 9 respiro- and 12 rubulaviruses) and 943 amino acids. The Jones, Taylor and Thornton (JTT) model of evolution with a gamma distribution of variation (+G) was used for the F protein. The phylogeny shows divergence into an Avulavirus and Rubulavirus lineage (Pr: 1.00) and a lineage comprising respiro-, morbilli- and henipaviruses (Pr: 0.91), with the respiroviruses basal in this latter group (Fig. 1D).

Fig. 2. Multigene concatenated phylogenetic analysis. The phylogenies were constructed from concatenation of previously aligned protein sequences in the Clustal X program. Partitions were made in the complete alignment and models of JTT + G (N, P, F) or WAG + G (M, L, H) were assigned to specific gene partitions. Posterior probabilities of branches are shown. The Pneumovirinae subfamily is used as an outgroup to show evolutionary relationships between Paramyxovirinae genera. Branch lengths represent relative genetic distances. GenBank accession numbers of each species are shown in taxa names. Viral clades are shown with black lines. The Pneumovirinae are shown with a dotted line.

Analysis of the hemagglutinin protein gene was performed on 58 taxa (9 avula-, 4 henipa-, 13 morbilli-, 14 pneumo-, 8 respiro- and 10 rubulaviruses). The H protein alignment was 827 amino acids in length and WAG + G was selected as the best-fit model. Again all taxa were separated into the correct clusters representative of ICTV genera (Pr: 1.00). The evolutionary relationship between genera of the Paramyxovirinae is not completely resolved (Fig. 1E), with the position of the henipa- and morbilliviruses relative to the other genera in the clade remaining ambiguous. The respiroviruses are placed basal in a lineage comprising the respiro-, avula- and rubulaviruses (Pr: 0.92).

The L protein alignment comprises 2480 amino acids and 42 taxa (6 avula-, 3 henipa-, 8 morbilli-, 9 pneumo-, 8 respiro- and 9 rubulaviruses). The WAG + G model of protein evolution was identified as the best-fit model. In the phylogeny all taxa were resolved into clusters corresponding to the expected genera (Pr: 1.00; Fig. 1F). As for the M and F gene topologies, one lineage was recovered comprising the avulaviruses and rubulaviruses, and a second composed of the henipa-, morbilli- and respiroviruses (Pr: 1.00). In the second lineage, the respiroviruses are basal to the henipaviruses and morbilliviruses (Pr: 1.00).

Fig. 3. Partial L gene phylogenetic analysis. Phylogeny from neighbour-joining analysis of partial L gene sequences (nucleotide sequences encoding residues 479–647 of the complete whole gene alignment) of the Paramyxoviridae family of viruses, including currently unclassified viruses. The Pneumovirinae subfamily is used as an outgroup to show evolutionary relationships between Paramyxovirinae genera. Branch lengths represent relative genetic distances. Support values are derived from bootstrap analysis. GenBank accession numbers of each species are shown in taxa names. The Pneumovirinae are shown with a dotted line, whilst unclassified virus clades are shown in grey. Abbreviations: Atlantic Salmon paramyxovirus (aSPMV), Beilong virus (BeV), Fer-de-Lance virus (FDLV), J-virus (JV), menangle virus (MenV), ophidian paramyxovirus (OPMV), Pacific Salmon paramyxovirus (pSPMV), tioman virus (TiV) and tupaia virus (TuV).

The phylogenies from the different genes vary in the placement of the Respirovirus genus. The placement of the respiroviruses varies between genes at the 3′ and 5′ ends of the genome. Notably, phylogenies based on 3′ genes (N and P: total 1275 amino acids) place the respiroviruses basal to the other four genera, whereas phylogenies based on 5′ genes (M, F, and L: total 3884 amino acids) place the respiroviruses basal to the henipa- and the morbilliviruses. The H gene places the respiroviruses basal to the avula- and rubulaviruses, but this should be treated with caution given the unresolved position of the morbilli- and henipaviruses. Potentially the difference between genes could be explained by a recombination event between the ancestor of the respiroviruses


and the ancestor of the avula–rubula clade, at a location between the P and M genes. This would be detectable as deep reticulation between the genera in a split decomposition analysis.

An analysis of a concatenated alignment using sequences from all genes combined yielded a topology identical to the topologies derived from the individual F, L and M gene datasets, but different from those obtained from the H, N and P genes (Fig. 2). The F, N and P protein regions were assigned the JTT + G model of evolution, whilst the H, L and M protein regions were assigned the WAG + G model of evolution. The avulaviruses and rubulaviruses form one lineage (Pr: 1.00), and the henipaviruses, morbilliviruses and respiroviruses form a second (Pr: 1.00), with the respiroviruses basal in this second group (Pr: 1.00).

Only limited sequence data were available for 25 unclassified virus sequences. The viruses included in this analysis were beilong virus (BeV), fer-de-lance virus (FDLV), J virus, menangle virus (MeV), salmon paramyxoviruses (SAPV) and tioman virus (TiV). These viruses were not included in the initial analysis since they lacked the necessary sequence coverage. In order to attempt to place these viruses we used 168 amino acid sites (residues 479–647) common to the main L gene alignment of 72 viral sequences. A Neighbour-Joining (NJ) phylogeny was constructed using a JTT distance with a Poisson correction, and 1000 bootstraps. The analysis recovered the same underlying topology as the complete L gene alignment with Bayesian analysis (Fig. 3). The 25 unclassified viruses form 4 clusters according to their host taxonomic origin, each with mid-to-high bootstrap support (Salmon paramyxoviruses (bootstrap: 100); Ophidian (snake) (bootstrap: 100); Rodentian (bootstrap: 66); Chiropteran (bootstrap: 99)).
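As a rough illustration of what a Poisson correction does to a pairwise protein distance, the sketch below computes the uncorrected proportion of differing sites p and the corrected distance d = -ln(1 - p). It covers the Poisson correction only and does not implement the JTT substitution matrix; the function names and the two short sequences are hypothetical and are not taken from the L gene alignment.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over sites where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p), allowing for multiple hits per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("sequences are saturated; distance is undefined")
    return -math.log(1.0 - p)


# Hypothetical aligned fragments, for illustration only.
frag_1 = "MKTLLVAGGA"
frag_2 = "MKTLIVAGCA"
p = p_distance(frag_1, frag_2)
d = poisson_distance(frag_1, frag_2)
```

The corrected distance is always at least as large as p, and the gap between the two grows as sequences become more divergent, which is why a correction matters for deep branches.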
There is good bootstrap support (bootstrap: 85) for clustering of salmon paramyxoviruses with respiroviruses, and this grouping, along with the Ophidian paramyxoviruses, clusters with a strongly supported group (bootstrap: 98) comprising rodentian paramyxoviruses, morbilliviruses and henipaviruses. Rodentian paramyxoviruses and morbilliviruses appear to be sister genera (bootstrap: 74), with henipaviruses basal to this grouping. The placement of the ophidian viruses within the Paramyxovirinae is not clear given the low bootstrap support (bootstrap: 53) at the internal node. Therefore, with present data, there is no strong evidence for grouping of Ophidian viruses with any known genus. MeV and TiV, both bat-hosted viruses, form a distinct chiropteran clade closely related to the rubulaviruses (bootstrap: 94).

3.2. Splits decomposition analysis

The split graph results for analyses of each gene are shown in online supplementary Figure 1 (Fig. S1). Bootstrap values are consistently above 85% for the majority of branches, supporting the robustness of the split graphs. The split graphs are consistent with the phylogenies recovered from the Bayesian analyses. Recombination disrupts bifurcating evolutionary signals to produce a network of sequences. This is represented in split graphs by the formation of parallel edges between taxa if conflicting signals exist in the data. Reticulations are therefore representative of recombination. Reticulations are present at the basal node of clades in the analyses of the N gene (morbilli- and pneumoviruses), P gene (rubulaviruses), M gene (morbilliviruses), F gene (morbilliviruses) and H gene (morbilli-, respiro- and rubulaviruses), but not in the L gene for any genera (Fig. S1). In addition the split decomposition analysis of the multigene concatenated dataset shows reticulations within the morbilli-, respiro-, rubula- and pneumoviruses (Fig. S2). In the presence of frequent and strong recombination between the ancestors of viral genera we would expect reticulation between genera clades, but these are not observed in any of the split decomposition analyses. The lack of strong recombination signals suggests that recombination has not played a prominent role in the evolutionary history of this virus subfamily in terms of the emergence of new genera, and probably does not explain the differing pattern of topologies recovered from genes at the 3′ and 5′ ends of the genome with respect to the position of respiroviruses. However, the splits analysis does indicate that recombination may have played a role in the emergence of multiple viral species within genera, including dolphin morbillivirus (DMV) in the morbilliviruses (reticulations are observed between DMV, canine distemper virus (CDV) and peste-des-petits ruminants virus (PDPV)), and several lineages within the respiroviruses and pneumoviruses.

4. Discussion

This study presents the first Bayesian multigene phylogenetic analysis for paramyxoviruses and a comparison of the differing phylogenetic signals from each of the main shared genes. Understanding the evolution of some viral species and genera through phylogenetic analysis has proven problematic in the past due to the saturation of synonymous substitutions (Lukashov and Goudsmit, 2002). To date most of the uncertainty in phylogenetic analyses of the paramyxoviruses has centred on the evolutionary relationship of the Respirovirus genus (Jordan et al., 2000; Seal et al., 2000, 2002; Chang et al., 2001; Westover and Hughes, 2001; Kurath et al., 2004; Wise et al., 2004; Lwamba et al., 2005; Kumar et al., 2008; Jeon et al., 2008; Shi et al., 2008; Nayak et al., 2008). Our findings show consistency in the topologies recovered from the M, F, L and concatenated datasets, with the respiroviruses placed basal to the lineage containing the morbilliviruses and henipaviruses as sister clades, and with the rubulaviruses and avulaviruses grouped in a separate lineage (Figs. 1C, D, F and 2). The N and P genes yield a different topology in which the respiroviruses are the most basal genus of the Paramyxovirinae subfamily, while retaining the morbilli–henipavirus and the rubula–avulavirus lineages (Fig. 1A and B).
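The two competing genus-level topologies can be compared by listing the clades each one implies. The sketch below encodes the M/F/L (and concatenated) topology and the N/P topology as nested tuples of genus names, rooted by the pneumovirus outgroup, and reports which clades are shared and which are unique to each. The tuple encoding and the function name are illustrative choices made for this example, not part of the original analysis.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


# Topology from the M, F, L and concatenated datasets.
mfl_tree = (
    ("Avulavirus", "Rubulavirus"),
    ("Respirovirus", ("Henipavirus", "Morbillivirus")),
)
# Topology from the N and P genes (respiroviruses most basal).
np_tree = (
    "Respirovirus",
    (("Avulavirus", "Rubulavirus"), ("Henipavirus", "Morbillivirus")),
)

shared = clades(mfl_tree) & clades(np_tree)
only_mfl = clades(mfl_tree) - clades(np_tree)
only_np = clades(np_tree) - clades(mfl_tree)
```

Here `shared` holds the avula–rubula and henipa–morbilli groupings (plus the full ingroup), while `only_mfl` and `only_np` each hold the single clade that distinguishes the two placements of the respiroviruses.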
The phylogeny of the hemagglutinin (H) gene places the respiroviruses basal to the avula–rubulavirus group, but the relationship of this lineage with the henipa- and morbilliviruses is not resolved (Fig. 1E). Overall we suggest the M, F, and L gene topology is likely to be the most reliable since it is favoured by the majority of genes, by the greatest length of sequence data, and by the concatenated dataset. Bias due to high mutation rates and selection may also be lower in the L and M genes. If we accept the MFL topology, this suggests that the respiroviruses diverged early in the history of the Paramyxovirinae group, but that they are probably not the common ancestor of all the remaining extant Paramyxovirinae.

Recently, there has been debate regarding the potential bias in the construction of phylogenies from concatenated datasets (Suzuki et al., 2002; Rokas et al., 2003). The statistical confidence of interior branches judged by the posterior probabilities of Bayesian analysis is considered to be better than bootstrap probabilities derived from Maximum Likelihood analysis. However, the judgment of posterior probabilities on concatenated datasets is argued to be too liberal and can result in elevated posterior probabilities. In some cases, phylogenies from concatenated datasets can be dominated by the topology arising from the sequences with the strongest phylogenetic signal. In our analysis models of evolution were assigned to each partition of the concatenated dataset. This takes into account differences in substitution rates, base compositions and transition/transversion rates between genetic loci. In concatenated datasets lineage sorting may lead to inconsistent estimates of phylogenetic trees. These cases are likely to be rare events (Rannala and Yang, 2008).

The most likely factors driving the differing phylogenetic signals among genes observed in this study, and by other authors, are varying levels of selection, recombination and mutation rate, which can cause deviations from assumptions of neutrality, or which can influence whether phylogenetically informative variation that captures divergence events is accumulated, or lost due to saturation. Due to the quantity of data available for each gene at the time of analysis, the taxa sample size varied between genes from 42 taxa for the L gene to 58 for the H gene. The H, N and P genes had some of the larger sample sizes, so potentially bias due to variation in sample size may also explain differences in the topology returned. However, this seems unlikely since all the datasets contain a representative sample of taxa in all the genera, and the partial L gene analysis with 72 taxa returns the same underlying topology as the full-length analysis with 42 species.

Testing for effects of selection at the family/subfamily level is problematic for a dataset such as this since either the population/species sample sizes are too small, or sequences are too divergent, violating assumptions of relatively simple tests such as the McDonald–Kreitman test (McDonald and Kreitman, 1991) and its derivatives (Fay et al., 2001). Such methods typically compare diversity within a species against the diversity between two species at synonymous and non-synonymous sites. Future studies need to sequence multiple strains of viral species in order to make more rigorous investigations of the difference in selective pressures acting on paramyxovirus genomes. Such a study could provide essential insights into the processes involved in the diversification of the paramyxoviruses and in the formation of novel lineages and species. Studies of individual virus genera, however, do indicate varying patterns, and different types of selection may be acting on individual genes.
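To make the logic of a McDonald–Kreitman-style comparison concrete, the sketch below builds the 2 × 2 table of fixed versus polymorphic differences at non-synonymous and synonymous sites, computes the neutrality index, and applies a two-sided Fisher's exact test. The counts are illustrative values chosen for the example and are not data from this study; the function names are likewise hypothetical.

```python
from math import comb


def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn / Ps) / (Dn / Ds); values below 1 suggest an excess of fixed
    non-synonymous differences, as expected under positive selection."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        prob(k) for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Illustrative counts only (not from this study).
dn, ds = 7, 17   # fixed differences between species: non-synonymous, synonymous
pn, ps = 2, 42   # polymorphisms within species: non-synonymous, synonymous

ni = neutrality_index(pn, ps, dn, ds)
p_value = fisher_exact_two_sided(dn, ds, pn, ps)
```

The test needs within-species polymorphism counts, which is why the text notes that sampling multiple strains per viral species is a prerequisite for applying it.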
A possible explanation for the poorly resolved phylogeny for the H gene is a high level of homoplasy due to elevated substitution rates driven by selection pressures to evade host immune systems. Recent studies on a large dataset of CDV sequences (McCarthy et al., 2007) also showed significant positive selection acting on the H gene, and evidence for varying levels of purifying selection acting across genomes. In the CDV H gene diversifying selection at key residues associated with receptor tropism appears to be linked with host switches, and therefore could be a driver of disease emergence in new species. Overall, the L and M genes may provide the most reliable evolutionary signal at the family level since these genes are more conserved than other paramyxovirus genes, in order to maintain crucial domains for replication and lytic formation respectively (Afzal et al., 1994; Zanetti et al., 2003).

We also considered whether recombination has contributed to differing evolutionary histories among genes at the family level, in particular whether a single recombination event could have occurred, between the P and M genes, between the ancestor of the respiroviruses and the ancestor of the rubulaviruses and avulaviruses. Previous studies have described only low rates of recombination in paramyxoviruses, implying that overall there is no prominent role for recombination in the evolutionary history of the paramyxoviruses (Worobey and Holmes, 1999; Chare et al., 2003; Schierup et al., 2005). In this study, with our larger sample size, split graphs from all six genes and the concatenated multigene dataset show the presence of minor reticulations at the basal nodes of genera, but not between genera (Figs. S1 and S2). This supports the view that recombination is probably not a major driver of evolutionary change in the paramyxoviruses at the family level, and probably does not explain the varying signal between the F, M and L genes and the H, P and N genes.
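How conflicting signal shows up as a reticulation can be illustrated on four taxa. For a quartet, each of the three possible splits has an isolation index equal to half the difference between the larger of the other two pairwise-sum totals and its own total; a tree-like distance matrix supports one split, whereas conflicting data support two, which a split graph draws as a box. The sketch below is a simplified quartet-only illustration of this idea, with hypothetical taxon labels and distances, and is not the SplitsTree implementation.

```python
def quartet_splits(dist, a, b, c, d):
    """Return {split: isolation index} for the splits of a quartet with positive index.

    `dist` maps frozenset({x, y}) to the distance between taxa x and y.
    """
    def pair_sum(x, y, u, v):
        return dist[frozenset((x, y))] + dist[frozenset((u, v))]

    sums = {
        ((a, b), (c, d)): pair_sum(a, b, c, d),
        ((a, c), (b, d)): pair_sum(a, c, b, d),
        ((a, d), (b, c)): pair_sum(a, d, b, c),
    }
    supported = {}
    for split, within in sums.items():
        others = [total for key, total in sums.items() if key != split]
        index = 0.5 * (max(others) - within)
        if index > 1e-12:
            supported[split] = index
    return supported


def make_dist(ab, cd, ac, bd, ad, bc):
    """Build a distance lookup for the hypothetical taxa A, B, C and D."""
    values = {("A", "B"): ab, ("C", "D"): cd, ("A", "C"): ac,
              ("B", "D"): bd, ("A", "D"): ad, ("B", "C"): bc}
    return {frozenset(pair): value for pair, value in values.items()}


# Tree-like distances for ((A,B),(C,D)): every branch has length 1.
tree_like = quartet_splits(make_dist(2, 2, 3, 3, 3, 3), "A", "B", "C", "D")
# Conflicting distances: a second split is also supported, giving a box.
conflicting = quartet_splits(make_dist(2, 2, 2.5, 2.5, 3, 3), "A", "B", "C", "D")
```

In the tree-like case the single supported split has an index equal to the internal branch length; in the conflicting case the two indices give the side lengths of the box.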
The split decomposition analysis presented here is consistent with rare recombination events that have been detected within individual species, such as in the canine distemper virus (CDV) genome (McCarthy et al., 2007; Han et al., 2008). McCarthy et al. (2007) identified recombination in the F gene between two pairs of


canine distemper virus (CDV) isolates from dogs. More recently, rare recombination events have also been identified in the H gene of isolates of CDV from wild carnivores (Han et al., 2008). Therefore, it is likely that rare recombination events have generated diversity within genera and species, and could potentially act as a trigger for the emergence of novel infectious disease agents (Holmes and Drummond, 2007). Split decomposition analysis indicates that dolphin morbillivirus (DMV) may have originated from recombination events between CDV and peste-des-petits-ruminants virus (PDPV; or the ancestral ruminant morbillivirus), since reticulations exist between these three viral species in all split decomposition analyses except that of the L gene (Figs. S1 and S2). Reticulations exist between multiple species in other lineages but are less consistent than for DMV. Overall, however, this suggests that rare recombination could be a driver for the future emergence of novel paramyxovirus species within genera.

Our analysis of currently unclassified paramyxoviruses based on partial L gene sequences helps position four previously uncharacterized clades. The respiroviruses form a well-supported lineage with the salmon paramyxoviruses, which clusters with a rodentian–morbilli–henipavirus clade. The ophidian paramyxoviruses also cluster with the two groups, but with present data their position cannot be resolved reliably. A future full genome analysis should resolve this ambiguity. This result supports the findings of Marschang et al. (2009) and also provides evidence for a clade of rodent viruses (including J virus (JV), isolated from feral mice, tupaia shrew virus (TuV) and Beilong virus (BeV); bootstrap: 66) as a sister group (bootstrap: 74) to the currently known morbilliviruses. This could suggest a potential origin in rodents for the extant morbilliviruses.
Whilst, based on topology alone, an origin of the rodent viruses in morbillivirus hosts is equally likely, the former would fit with a model of disease emergence in morbilliviruses being driven by progressive host jumps between domesticated species based on time of domestication, with mice/rats being the first species to become commensal with humans (McCarthy & Goodman, unpublished data). A second cluster of bat viruses (menangle pig/human/bat virus (MeV) and tioman fruit bat virus (TiV); bootstrap: 94), separate from the henipaviruses, was identified and groups as a sister lineage to the rubulaviruses (bootstrap: 95), confirming previous observations made by Chua et al. (2002).

An alternative view of the phylogeny and evolutionary history of this family would be that there is not a single correct phylogeny and that different parts of the paramyxovirus genome genuinely do have varying histories with respect to the respiroviruses, so there is no true species tree as such. However, before accepting this view, further analysis including complete genome sequences for all the newly identified genera should be done. In addition, as more species-level sequences become available, a thorough within-genus and within-species assessment of selection also needs to be carried out, to identify potential bias in different genes with more rigour than is possible with current data. Incorporating additional sources of phylogenetic information, such as the gain/loss of non-shared genes across the family, could also be informative. The latter may also prove useful in investigating mechanisms by which viruses gain new functions or host ranges.

Over the past few decades many novel paramyxoviruses have been discovered in animal species not previously known to host paramyxoviruses. In particular, four novel bat-associated paramyxoviruses have been identified since the early 1990s that have crossed into domestic animals and humans without warning.
Characterisation of novel viral diseases among pigs and horses and associated human cases in South-East Asia and Australia led to the identification of nipah virus (NV) and hendra virus (HV), which


were subsequently placed in the Henipavirus genus, a sister lineage to the morbilliviruses. Similarly, menangle virus (MeV) and tioman virus (TiV) have also recently been characterised and are closely related to the rubulaviruses (Bowden and Boyle, 2005; Chua et al., 2002; Wang et al., 2007). Additionally, mapuera virus (MPRV), isolated from fruit bats, and porcine rubulavirus (PoRV) are found to be closely related members of the rubulavirus genus (Wang et al., 2007). Since we now know there are multiple instances of bat and rodent viruses across the whole paramyxovirus phylogeny, and bats and rodents are being increasingly recognized as important sources of EIDs (Halpin et al., 2007), we suggest that there is a high probability that there are further, as yet undiscovered, paramyxoviruses circulating in bats and rodents, and predict that there should be bat respiroviruses. It is also possible that bats/rodents may have been the ancestral hosts of paramyxoviruses.

Given the importance of paramyxoviruses as human, domestic and wild animal pathogens, and their potential to cross species boundaries, including jumping into humans, we urgently need to establish the full number of species and the extent of their host range and potential reservoirs. We also need to increase our understanding of Paramyxoviridae family evolution, and determine the drivers that may contribute to the emergence of novel paramyxoviruses, or host jumps. Having the knowledge to predict such events is key to limiting future impacts from these pathogens (Jones et al., 2008).

Acknowledgements

A.J.M. was funded by a BBSRC PhD studentship. We thank Marie-Anne Shaw and Rupert Quinnell for constructive discussions during the course of the study. We are also grateful to the anonymous referees whose suggestions helped improve the original manuscript.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.meegid.2009.11.002.
References

Afzal, M.A., Pickford, A.R., Yates, P.J., Forsey, T., Minor, P.D., 1994. Matrix protein gene sequence of vaccine and vaccine-associated strains of mumps virus. J. Gen. Virol. 75, 1169–1172.
Allender, M.C., Mitchell, M.A., Phillips, C.A., Gruszynski, K., Beasley, V.R., 2006. Hematology, plasma biochemistry, and antibodies to select viruses in wild-caught Eastern massasauga rattlesnakes (Sistrurus catenatus catenatus) from Illinois. J. Wildlife Dis. 42, 107–114.
Bandelt, H.J., Dress, A.W., 1992. Split decomposition: a new and useful approach to phylogenetic analysis of distance data. Mol. Phylo. Evol. 1, 242–252.
Bowden, T.R., Westenberg, M., Wang, L.F., Eaton, B.T., Boyle, D.B., 2001. Molecular characterization of Menangle virus, a novel paramyxovirus which infects pigs, fruit bats, and humans. Virology 283, 358–373.
Bowden, T.R., Boyle, D.B., 2005. Completion of the full-length genome sequence of Menangle virus: characterization of the polymerase gene and genomic 5′ trailer region. Arch. Virol. 150, 2125–2137.
Carpenter, M.A., Appel, M.J., Roelke-Parker, M.E., Munson, L., Hofer, H., East, M., O'Brien, S.J., 1998. Genetic characterization of canine distemper virus in Serengeti carnivores. Vet. Immunol. Immunopathol. 65, 259–266.
Chang, P.C., Hsieh, M.L., Shien, J.H., Graham, D.A., Lee, M.S., Shieh, H.K., 2001. Complete nucleotide sequence of avian paramyxovirus type 6 isolated from ducks. J. Gen. Virol. 82, 2157–2168.
Chare, E.R., Gould, E.A., Holmes, E.C., 2003. Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J. Gen. Virol. 84, 2691–2703.
Chua, K.B., Chua, B.H., Wang, C.W., 2002. Anthropogenic deforestation, El Niño and the emergence of Nipah virus in Malaysia. Malay. J. Path. 24, 15–21.
Cleaveland, S., Meslin, F.X., Breiman, R., 2006. Dogs can play a useful role as sentinel hosts for disease. Nature 440, 605.
Enserink, M., 2000. Emerging diseases. Malaysian researchers trace Nipah virus outbreak to bats. Science 289, 518–519.

Fay, J.C., Wyckoff, G.J., Wu, C.I., 2001. Positive and negative selection on the human genome. Genetics 158 (3), 1227–1234.
Franke, J., Essbauer, S., Ahne, W., Blahak, S., 2001. Identification and molecular characterization of 18 paramyxoviruses isolated from snakes. Virus Res. 80 (1–2), 67–74.
Halpin, K., Hyatt, A.D., Plowright, R.K., Epstein, J.H., Daszak, P., Field, H.E., Wang, L.F., Daniels, P.W., 2007. Emerging viruses: coming in on a wrinkled wing and a prayer. Clin. Infect. Dis. 44, 711–717.
Han, G.Z., Liu, X.P., Li, S.S., 2008. Cross-species recombination in the haemagglutinin gene of canine distemper virus. Virus Res. 136, 198–201.
Holmes, E.C., Drummond, A.J., 2007. The evolutionary genetics of viral emergence. Curr. Top. Microbiol. Immunol. 315, 51–66.
Huelsenbeck, J., Ronquist, F., 2001. MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17, 754–755.
Huson, D.H., Bryant, D., 2006. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23, 254–267.
International Committee on Taxonomy of Viruses Eighth Report, 2005. Elsevier/Academic Press, London.
Jacobson, E.R., Adams, H.P., Geisbert, T.W., Tucker, S.J., Hall, B.J., Homer, B.L., 1997. Pulmonary lesions in experimental ophidian paramyxovirus pneumonia of Aruba Island rattlesnakes, Crotalus unicolor. Vet. Pathol. 34, 450–459.
Jacobson, E.R., Origgi, F., Pessier, A.P., Lamirande, E.W., Walker, I., Whitaker, B., Stalis, I.H., Nordhausen, R., Owens, J.W., Nichols, D.K., Heard, D., Homer, B., 2001. Paramyxovirus infection in caiman lizards (Draecena guianensis). J. Vet. Diag. Invest. 13, 143–151.
Jeon, W.J., Lee, E.K., Kwon, J.H., Choi, K.S., 2008. Full-length genome sequence of avian paramyxovirus type 4 isolated from a mallard duck. Virus Genes 37, 342–350.
Jones, K.E., Patel, N.G., Levy, M.A., Storeygard, A., Balk, D., Gittleman, J.L., Daszak, P., 2008. Global trends in emerging infectious diseases. Nature 451, 990–993.
Jordan, I.K., Sutter IV, B.A., McClure, M.A., 2000. Molecular evolution of the paramyxoviridae and rhabdoviridae multiple-protein-encoding P gene. Mol. Biol. Evol. 17, 75–86.
Jun, M.H., Karabatsos, N., Johnson, R.H., 1977. A new mouse paramyxovirus (J virus). Aust. J. Exp. Biol. Med. Sci. 55, 645–647.
Keane, T., Creevey, C., Pentony, M., Naughton, N., McInerney, J., 2006. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol. Biol. 6, 29.
Kennedy, S., Kuiken, T., Jepson, P., Deaville, R., Forsyth, M., 2000. Mass die-off of Caspian seals caused by canine distemper virus. Emerg. Infect. Dis. 6, 637–639.
Kumar, S., Tamura, K., Nei, M., 2004. MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5, 150–163.
Kumar, S., Nayak, B., Collins, P.L., Samal, S.K., 2008. Complete genome sequence of avian paramyxovirus type 3 reveals an unusually long trailer region. Virus Res. 137, 189–197.
Kurath, G., Batts, W.N., Ahne, W., Winton, J.R., 2004. Complete genome sequence of Fer-de-Lance virus reveals a novel gene in reptilian paramyxoviruses. J. Virol. 78, 2045–2056.
Kvellestad, A., Dannevig, B.H., Falk, K., 2003. Isolation and partial characterization of a novel paramyxovirus from the gills of diseased seawater-reared Atlantic salmon (Salmo salar L.). J. Gen. Virol. 84, 2179–2189.
Li, Z., Yu, M., Zhang, H., Magoffin, D.E., Jack, P.J., Hyatt, A., Wang, H.Y., Wang, L.F., 2006. Beilong virus, a novel paramyxovirus with the largest genome of nonsegmented negative-stranded RNA viruses. Virology 346, 219–228.
Lukashov, V.V., Goudsmit, J., 2002. Evolutionary relationships among Astroviridae. J. Gen. Virol. 83, 1397–1405.
Lwamba, H.C., Alvarez, R., Wise, M.G., Yu, Q., Halvorson, D., Njenga, M.K., Seal, B.S., 2005. Comparison of the full-length genome sequence of avian metapneumovirus subtype C with other paramyxoviruses. Virus Res. 107, 83–92.
Marschang, R.E., Papp, T., Frost, J.W., 2009. Comparison of paramyxovirus isolates from snakes, lizards and a tortoise. Virus Res. 144, 272–279.
McCarthy, A.J., Shaw, M.A., Goodman, S.J., 2007. Pathogen evolution and disease emergence in carnivores. Proc. Biol. Sci. 274, 3165–3174.
McDonald, J.H., Kreitman, M., 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654.
Moss, W.J., Griffin, D.E., 2006. Global measles elimination. Nat. Rev. Microbiol. 4, 900–908.
Nayak, B., Kumar, S., Collins, P.L., Samal, S.K., 2008. Molecular characterization and complete genome sequence of avian paramyxovirus type 4 prototype strain duck/Hong Kong/D3/75. Virol. J. 5, 124.
Normile, D., 2008. Rinderpest. Driven to extinction. Science 319, 1606–1609.
Rannala, B., Yang, Z., 2008. Phylogenetic inference using whole genomes. Annu. Rev. Genom. Hum. Genet. 9, 217–231.
Rokas, A., Williams, B.L., King, N., Carroll, S.B., 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425, 798–804.
Schierup, M.H., Mordhorst, C.H., Muller, C.P., Christensen, L.S., 2005. Evidence of recombination among early-vaccination era measles virus strains. BMC Evol. Biol. 5, 52.
Seal, B.S., King, D.J., Meinersmann, R.J., 2000. Molecular evolution of the Newcastle disease virus matrix protein gene and phylogenetic relationships among the paramyxoviridae. Virus Res. 66, 1–11.
Seal, B.S., Crawford, J.M., Sellers, H.S., Locke, D.P., King, D.J., 2002. Nucleotide sequence analysis of the Newcastle disease virus nucleocapsid protein gene and phylogenetic relationships among the Paramyxoviridae. Virus Res. 83, 119–129.

Selvey, L.A., Wells, R.M., McCormack, J.G., Ansford, A.J., Murray, K., Rogers, R.J., Lavercombe, P.S., Selleck, P., Sheridan, J.W., 1995. Infection of humans and horses by a newly described Morbillivirus. Med. J. Aust. 162, 642–645.
Shi, L.Y., Li, M., Yuan, L.J., Wang, Q., Li, X.M., 2008. A new paramyxovirus, Tianjin strain, isolated from common cotton-eared marmoset: genome characterization and structural protein sequence analysis. Arch. Virol. 153, 1715–1723.
Suzuki, Y., Glazko, G.V., Nei, M., 2002. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Sci. USA 99, 16138–16143.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Tidona, C.A., Kurz, H.W., Gelderblom, H.R., Darai, G., 1999. Isolation and molecular characterization of a novel cytopathogenic paramyxovirus from tree shrews. Virology 258, 425–434.
Wang, L.F., Hansson, E., Yu, M., Chua, K.B., Mathe, N., Crameri, G., Rima, B.K., Moreno-López, J., Eaton, B.T., 2007. Full-length genome sequence and genetic relationship of two paramyxoviruses isolated from bat and pigs in the Americas. Arch. Virol. 152, 1259–1271.
Westover, K.M., Hughes, A.L., 2001. Molecular evolution of viral fusion and matrix protein genes and phylogenetic relationships among the paramyxoviridae. Mol. Phylogenet. Evol. 21, 128–134.
Wise, M.G., Sellers, H.S., Alvarez, R., Seal, B.S., 2004. RNA-dependent RNA polymerase gene analysis of worldwide Newcastle disease virus isolates representing different virulence types and their phylogenetic relationship with other members of the paramyxoviridae. Virus Res. 104, 71–80.
Worobey, M., Holmes, E.C., 1999. Evolutionary aspects of recombination in RNA viruses. J. Gen. Virol. 80, 2535–2543.
Xiao, S., Paldurai, A., Nayak, B., Subbiah, M., Collins, P.L., Samal, S.K., 2009. Complete genome sequence of avian paramyxovirus type 7 (strain Tennessee) and comparison with other paramyxoviruses. Virus Res. 145, 80–91.
Zanetti, F., Rodríguez, M., King, D.J., Capua, I., Carrillo, E., Seal, B.S., Berinstein, A., 2003. Matrix protein gene sequence analysis of avian paramyxovirus 1 isolates obtained from pigeons. Virus Genes 26, 199–206.