Adaption and Parallel Evolution of Human-Isolated H5 Avian Influenza Viruses




Journal Pre-proof

Adaption and Parallel Evolution of Human-Isolated H5 Avian Influenza Viruses

Wanting He, Liang Wang, Yuhui Zhao, Ningning Wang, Gairu Li, Michael Veit, Yuhai Bi, George F. Gao, Shuo Su

PII: S0163-4453(20)30036-0
DOI: https://doi.org/10.1016/j.jinf.2020.01.012
Reference: YJINF 4432

To appear in: Journal of Infection

Accepted date: 18 January 2020

Please cite this article as: Wanting He, Liang Wang, Yuhui Zhao, Ningning Wang, Gairu Li, Michael Veit, Yuhai Bi, George F. Gao, Shuo Su, Adaption and Parallel Evolution of Human-Isolated H5 Avian Influenza Viruses, Journal of Infection (2020), doi: https://doi.org/10.1016/j.jinf.2020.01.012

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 The British Infection Association. Published by Elsevier Ltd. All rights reserved.

Adaption and Parallel Evolution of Human-Isolated H5 Avian Influenza Viruses

Wanting He1,a, Liang Wang2,a, Yuhui Zhao2, Ningning Wang1, Gairu Li1, Michael Veit3, Yuhai Bi2, George F. Gao2 and Shuo Su1*

1. MOE International Joint Collaborative Research Laboratory for Animal Health & Food Safety, Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China.
2. CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China.
3. Institute for Virology, Center for Infection Medicine, Veterinary Faculty, Free University Berlin, Robert-von-Ostertag-Straße 7-13, Berlin, Germany.

*Corresponding author: Shuo Su ([email protected])
a Co-first authors

Abstract

Avian-to-human transmission of highly pathogenic avian influenza viruses (HPAIV) and their subsequent adaptation to humans are of great concern to public health. Surveillance and early warning of AIVs with the potential to infect humans and cause a pandemic are crucial. In this study, we determined whether adaptive evolution occurred in human-isolated H5 viruses. We evaluated all available genomes of H5N1 and H5N6 avian influenza A viruses. Using a combination of novel comparative phylogenetic methods and structural analysis, we systematically identified several new mutations in H5 AIV that might be associated with human adaptation. Some changes are the result of parallel evolution, further demonstrating their importance. In total, we identified 102 adaptive evolution sites in eight genes. Some residues had been previously identified, such as 227 in HA and 627 in PB2, while others have not been reported so far. Ten sites from four genes evolved in parallel, but no obvious positive selection was detected. Our study suggests that during infection of humans, H5 viruses evolved to adapt to their new host environment and that the sites of adaptive/parallel evolution might play a role in crossing the species barrier and in the response to new selection pressure. The results provide insight for implementing early detection systems for transitional stages in H5 AIV evolution before potential adaptation to humans.

Author Summary Line

Surveillance and early warning of avian influenza viruses with the potential to infect humans depend on the identification of human-adaptation related mutations. In this study, we used a novel approach combining phylogenetic and structural analysis to identify possible human-adaptation related mutations in H5 AIVs. We recovered previously reported human-adaptation related mutations and identified some novel mutations exhibiting parallel evolution. Our results provide new insights into how avian viruses adapt to humans by point mutations.

Key words: H5 Avian Influenza Virus; Adaptive Evolution; Structural Analysis

Introduction

Since human infection with highly pathogenic avian influenza (HPAI) virus was reported in Hong Kong (SAR of China) in 1997 (1), these viruses have been an important threat to public health. By November 2018, 860 human infections had occurred worldwide, 454 of them with a fatal outcome (http://www.who.int/influenza/human_animal_interface/2018_09_21_tableH5N1.pdf?ua=1). The continued circulation of the H5N1 virus within poultry has caused outbreaks every few years and resulted in great economic losses. To reduce the deleterious effects of H5N1 infection, culling and/or vaccination of poultry were

applied (2). In China, the government obliges farmers to vaccinate all domestic birds, and vaccine strains are replaced with the current endemic virus strains every few years. Nevertheless, the live poultry trade has contributed to the continued presence of many avian influenza virus (AIV) subtypes in China. Moreover, the ‘silent spread’ of viruses within poultry flocks may increase the risk of transmission in poultry (3), which is largely due to incomplete vaccine protection at the flock level (4, 5). Because vaccination could not completely eliminate the virus, H5N1 has continuously evolved and diversified into many distinct lineages in China and other Asian countries (6, 7). Moreover, new H5N1 reassortants frequently emerge in domestic ducks in China, including H5N2, H5N8, H5N9 (8, 9), and H5N6, which replaced H5N1 as the dominant AIV subtype in China. These new viruses infected 22 humans (https://www.chp.gov.hk/sc/resources/29/332.html). Genetic analysis of H5Ny showed that the virus continued to undergo dynamic reassortment in live poultry markets (LPMs) with the potential to infect humans (10). The World Organization for Animal Health (OIE) and the World Health Organization (WHO) designated HPAI H5 subtypes and the recent H7N9 viruses as posing the highest threat to public health (10). Given the ongoing infections of humans by viruses from these lineages, they are considered to have pandemic potential.

AIVs must adapt to novel conditions when establishing themselves in humans, a process that provides strong selection pressure. Several amino acid exchanges have been described that are required for H5 viruses to acquire airborne transmissibility in ferrets, the best animal model for human transmission. In particular, HA must adapt to a different receptor, and this requires changes in the receptor binding pocket, the loss of a glycosylation site for better accessibility of the human receptor, and changes that enhance the stability of the HA molecule. Likewise, a mutation in the polymerase complex enhances the replication rate in human cells (11-15). This is especially interesting because the common genetic rules governing the “host jump” from birds to humans, which happens under different conditions, are still elusive.

Parallel molecular evolution describes repeated evolutionary changes leading to the same phenotype in unrelated populations, resulting from adaptation by natural selection to comparable ecological niches (16). The chances of adaptive evolution increase as selection pressure and population sizes grow. In China and some other Asian countries, many human infections occurred at LPMs, which are an important ecosystem for the circulation, evolution, and adaptation of avian influenza viruses. Notably, transmissibility of the HPAI Asian H5 lineage among humans is perhaps the most important pandemic risk and has become worrisome, especially since some cases of limited human-to-human transmission have occurred (17, 18).

To better understand the genetic dynamics of transmission and adaption from avian sources to humans, we developed a comparative approach that integrates phylogenetic and molecular selection analysis with a subsequent structural biology analysis. We aimed to identify genome-wide mutations that exhibit adaptive evolution in independent lineages, and we applied this approach to human isolates of H5 avian influenza viruses. This study provides the first large-scale approach to examine the evolution of H5 influenza virus. Since this virus has invaded the human population multiple times, we attempted to determine whether adaptive evolution had occurred, leading to changes from avian to human.

Materials and Methods

Sequence Retrieval and Phylogenetic Reconstruction

Since only H5N1 and H5N6 influenza viruses have been reported to infect humans, we focused on the viruses of these two subtypes in this study. We combined all H5N1 and H5N6 sequences in GISAID with sequences of human isolates from the NCBI Influenza Research Database. Identical sequences were removed using CD-HIT (19, 20). Sequences derived from laboratory strains or containing ambiguous characters were also removed from the dataset. ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/) was then used to determine the open reading frames (ORFs). Only the longest ORF was retained for each sequence, i.e. M2 and NS2 sequences were not analyzed. Protein sequences were aligned using MAFFT (21) with default parameters. We then back-translated the protein alignment to the corresponding coding sequences using an in-house script to avoid frame shift errors. To avoid the impact of partial coding sequences on the phylogenetic tree and on further analysis, sequences with more than 20% gaps were removed from the dataset. The total numbers of sequences for HA, NA, MP, NP, NS, PA, PB1, and PB2 were 5071, 3903, 2647, 2987, 2835, 3213, 3256, and 3191, respectively. Accession numbers and sequence sources are listed in Table S1. All HA sequences from GISAID were examined to determine whether they belonged to highly pathogenic avian influenza viruses (HPAIV): an HA sequence with a characteristic multibasic cleavage site was considered highly pathogenic.

Coding region sequences (CDS) were used for phylogenetic reconstruction using RAxML v8.2.4 (22) based on the GTR + G model. The autoMRE bootstrapping convergence criterion described previously (23) was applied to determine the most suitable number of replicates instead of the default 1,000 replicates. The number of bootstrap replicates was selected as follows: after every 50 replicates, all of the generated bootstrapped trees were repeatedly (1,000 permutations) split into two equal subsets, and the Weighted Robinson-Foulds (WRF) distance was calculated between the majority-rule consensus trees of both subsets (for each permutation). Bootstrapping convergence was considered to be reached if over 99% of permutations had low WRF distances (< 3%).
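The in-house back-translation script is not published with the article, so the following is only a minimal sketch of how the two alignment-preparation steps described above (threading each coding sequence onto its gapped protein row, then discarding rows with more than 20% gaps) could be implemented. The function names `back_translate`, `gap_fraction`, and `drop_gappy` are hypothetical, and the handling of a trailing stop codon is an assumption.

```python
def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned coding sequence onto one gapped protein alignment row.

    Every residue becomes its original codon and every protein gap becomes
    a three-base gap, so the reading frame is preserved.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    # Assumption: a single trailing stop codon may be present in the CDS
    # without a matching residue in the protein row; drop it.
    if len(codons) == n_residues + 1:
        codons = codons[:-1]
    if len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein sequence")
    out = []
    next_codon = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


def gap_fraction(aligned_seq: str) -> float:
    """Fraction of alignment columns that are gaps in this row."""
    if not aligned_seq:
        return 1.0
    return aligned_seq.count("-") / len(aligned_seq)


def drop_gappy(alignment: dict, max_gap: float = 0.20) -> dict:
    """Keep rows whose gap fraction does not exceed max_gap (20% in the text)."""
    return {name: seq for name, seq in alignment.items()
            if gap_fraction(seq) <= max_gap}
```

Working from the protein alignment and copying codons back, rather than aligning nucleotides directly, is what prevents gaps from splitting a codon.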
In this case, convergence was reached after 400, 350, 400, 350, 400, 350, 350, and 350 replicates, with average WRF distances of 2.67, 2.78, 2.85, 2.81, 2.81, 2.82, 2.69, and 2.65 at a permutation percentage of 99.9%, for HA, NA, MP, NP, NS, PA, PB1, and PB2, respectively. The phylogenetic tree was visualized using iTOL v4 (24).

Group Selection and Evolutionary Analysis

We first detected the presence of adaptive evolution based on the HA gene. All sequences from human isolates were selected for analysis of genetic diversity. In addition, we randomly selected the same number of sequences from avian isolates as there were sequences from human isolates in the group. The BEAST package (v1.10.5) was used to construct the maximum clade credibility (MCC) tree (25). The best-fit nucleotide substitution model was selected using IQ-TREE (26). We used the General Time Reversible model with gamma-distributed rate variation among sites as the nucleotide substitution model, an uncorrelated relaxed lognormal molecular clock, and a coalescent Bayesian Skyline tree prior; the total chain length was 5×10^9, sampling every 50,000 steps. Two independent runs were combined using LogCombiner, and the final tree was displayed in FigTree.

For further analysis, we split the large tree into several groups that were genetically distant from each other. The criteria for group selection were as follows: 1) the group contained more than 2 human sequences and 2 avian sequences; 2) the bootstrap value for the selected group and its closest sequences was ≥ 70. Ancestral sequences for each group were inferred separately using maximum likelihood methods implemented in CODEML of PAML 4.8 (27) and the Jones, Taylor and Thornton replacement matrix. Sites at which the dominant amino acid in human sequences differed from both the dominant amino acid in avian sequences and the ancestral amino acid of the group were considered potential adaptive sites (PAS); these are listed in Table S2. The χ2 test was used to determine the association between PAS and the phenotype (host jump from avian to human) using counts of dominant amino acids in human and avian sequences. PASs with p values < 0.05 were considered to be adaptive sites (AS). Parallel evolution sites were defined as PAS with amino acid changes in ≥ 50% of human sequences in at least 2 groups.
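The site-screening rules above can be illustrated with a short sketch. It is not the authors' pipeline: the layout of the 2×2 contingency table (host by presence of the human-dominant residue), the absence of a continuity correction, and the use of alignment columns as site numbers are all assumptions, and the names `scan_group` and `parallel_sites` are hypothetical. The p value for one degree of freedom is computed exactly from the complementary error function, so no external statistics library is needed.

```python
import math
from collections import Counter


def dominant(column):
    """Most frequent residue in an alignment column and its relative frequency."""
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def scan_group(human, avian, ancestor, alpha=0.05):
    """Potential adaptive sites (PAS) of one phylogenetic group.

    human, avian: lists of equal-length aligned protein sequences.
    ancestor: inferred ancestral protein sequence of the group.
    Site numbers are 1-based alignment columns (assumption: no renumbering).
    """
    sites = {}
    for i, anc in enumerate(ancestor):
        h_col = [seq[i] for seq in human]
        a_col = [seq[i] for seq in avian]
        h_res, h_freq = dominant(h_col)
        a_res, _ = dominant(a_col)
        # PAS: human-dominant residue differs from avian-dominant AND ancestral.
        if h_res == a_res or h_res == anc:
            continue
        h_hit, a_hit = h_col.count(h_res), a_col.count(h_res)
        _, p = chi2_2x2(h_hit, len(h_col) - h_hit, a_hit, len(a_col) - a_hit)
        sites[i + 1] = {
            "change": f"{anc}{i + 1}{h_res}",  # ancestral -> human-dominant
            "human_freq": h_freq,
            "p": p,
            "adaptive": p < alpha,
        }
    return sites


def parallel_sites(groups, min_freq=0.5, min_groups=2):
    """Changes carried by >= min_freq of human sequences in >= min_groups groups."""
    seen = {}
    for name, sites in groups.items():
        for rec in sites.values():
            if rec["human_freq"] >= min_freq:
                seen.setdefault(rec["change"], []).append(name)
    return {chg: names for chg, names in seen.items() if len(names) >= min_groups}
```

In practice the column index would still have to be mapped to the numbering scheme used for reporting (for example H3 numbering for HA), which this sketch leaves out.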
The χ2 test was then repeated on the combined sequences of these parallel groups to determine whether the mutations were associated with host adaptation. The p values were also corrected using the Benjamini & Hochberg procedure for each gene segment. To examine whether AS were the result of positive selection, we used CODEML of PAML 4.8 (27). Branches containing only human sequences and having more than 2 sequences were selected as foreground branches for further analysis. We then tested for positive selection under the branch-site model, allowing ω to vary among sites and branches. The likelihood ratio test (LRT) was then performed by comparing the alternative model to the null model. The significance of the test statistic (2ΔlnL) was determined by comparison with the χ2 distribution, with degrees of freedom equal to the difference in the number of parameters between the null and alternative models. Sites with a Bayesian posterior probability > 0.95 were considered to be under positive selection in foreground branches.

Inferring the effects of mutations under parallel evolution on protein function

Figures were created with PyMOL (Molecular Graphics System, Version 2.0 Schrödinger, LLC, https://pymol.org/2/). For visualization of HA, the PDB file 4K63 was used, which contains the structure of H5 HA from A/Indonesia/5/2005 virus bound to the avian receptor determinant LSTa (α2,3-pentasaccharide) (28). The recently published structure of a polymerase complex from an H5N1 virus (A/duck/Fujian/01/2002 (H5N1)) (29) was used to highlight the amino acids under parallel evolution in PA, PB1 and PB2. To show the location of three of these sites at the entrance of the NTP tunnel, we used the structure of a virus from bats (A/little yellow-shouldered bat/Guatemala/060/2010, H17N10, PDB 4WSB (30)), since several basic amino acids that line the tunnel are located in a small part of PB1 that is not resolved in the structure of the polymerase complex from H5N1.
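The two statistical steps of the evolutionary analysis, the Benjamini & Hochberg correction and the likelihood ratio test, can be sketched as follows. This is an illustration rather than the authors' code: the function names are hypothetical, the log-likelihoods would come from CODEML output, and the χ2 tail probability is implemented only for one or two degrees of freedom, where closed forms exist (one degree of freedom is assumed as the default, as is common for a branch-site comparison that differs in a single parameter).

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini & Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def lrt_pvalue(lnl_null, lnl_alt, df=1):
    """Likelihood ratio test: statistic 2*(lnL_alt - lnL_null) and its p value."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        return stat, math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return stat, math.exp(-stat / 2.0)
    raise ValueError("this sketch implements only df = 1 or df = 2")


if __name__ == "__main__":
    # Hypothetical p values for four sites of one gene segment.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
    # Hypothetical log-likelihoods for the null and alternative models.
    print(lrt_pvalue(-1000.0, -998.0))
```

Applying the correction separately to each gene segment, as the text describes, means calling `benjamini_hochberg` once per segment on that segment's p values only.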

Results

1. Adaptive evolution of H5 AIVs to human

Human-derived H5Ny viruses have been isolated on three continents: Asia, Africa, and North America (Fig 1A). Among them, Asia had the most human-derived isolates (about 74%). Human-derived H5N1 isolates were found on all three continents, but human-derived H5N6 isolates were only detected in China. According to the phylogenetic tree (Fig 1B, 1C, and Fig 2), human-derived sequences were scattered throughout the tree. In addition, most human-derived isolates clustered with isolates from birds, indicating that human infections were mainly of avian origin. Because of the large number of H5Ny sequences, we divided them into significantly distant genetic groups for further analysis. Despite the high similarity of influenza genes, we identified 17, 17, 7, 8, 10, 14, 17, and 14 genetically distinct groups for HA, NA, MP, NP, NS, PA, PB1, and PB2, respectively, based on the criteria described in Materials and Methods. The structure of the MCC tree based on HA was similar to that of the ML tree, clearly divided into G1 to G17 (Fig 3), with the effective population size increasing around 2005, in line with the increased number of samples.

Furthermore, we identified a total of 102 adaptive mutations (AMs) in 31 groups that were significantly associated with human adaptation (Table S3). By gene segment, there were 23, 28, 2, 11, 17, 11, and 10 AMs considered to be related to human adaptation for HA, NA, MP, NS, PA, PB1, and PB2, respectively (no AMs were found in NP). Of these, 10, 5, and 1 AMs for NA, PB1, and PB2, respectively, were specific for the H5N6 subtype groups, and 21, 18, 1, 17, 6, and 9 AMs for HA, NA, NS, PA, PB1, and PB2, respectively, were specific for the H5N1 subtype. The other AMs were located in groups containing both H5N1 and H5N6 subtypes. Some of the AMs have been previously reported to be associated with human adaptation or found in isolates from clinical patients, such as S227N (H3 numbering hereafter) in HA (31) and K526R and E627K (32) in PB2, further demonstrating the reliability of our method.

2. Some previously reported human/mammalian-adaptation related mutations dominate among poultry and are thus not absolutely required for human adaptation

T160A in HA has been experimentally demonstrated to enhance the binding affinity for the human receptor (33), but this mutation was not found among our AMs. Detailed analysis showed that T160 is mainly present in the phylogenetic group G8 and dominant in avian isolates in the same group, suggesting it was not an adaptive mutation. Furthermore, we found that alanine 160 was dominant in avian isolates from both Asia (~59%) and Africa (~85%), and also in human-derived isolates from Africa (100%), whereas threonine was dominant in human-derived isolates from Asia (~84%). Over time (Fig S1), alanine was dominant in avian isolates from Africa since 2004 and in avian isolates from Asia from 1999 to 2003 and from 2011 to 2018. The HA mutation A138V was previously reported to allow binding to sialic acid linked to both α-2,3 and α-2,6 galactose residues, indicating a partial adaption to the human receptor (34). However, according to our results, alanine was dominant in both human and avian isolates (Fig S2). A similar phenomenon was found for the D701N change in PB2, which promotes nuclear transport in mammalian cells (Fig S3) (35): D is dominant in both human and avian isolates. Taken together, although A138V in HA and D701N in PB2 were suggested to help virus adaption to humans, single mutations at these sites may not be absolutely required for human adaptation.

3. Parallel evolution found in some AMs

Parallel evolution of AIV has been described before (36). Recurrent mutations under parallel evolution are considered to be essential for adaptation of a virus to a similar environment. Here, we wanted to test whether parallel-evolved mutations existed in H5 AIV during transmission from birds to humans. Most human H5 IAV infections could only have arisen from independent avian-to-human transmission events (Fig 1). We therefore asked whether independent AMs showed parallel evolution.
In total, we identified ten mutations that showed parallel evolution in at least two groups and were not fixed in others; we refer to these as parallel evolved mutations (PEMs) (Table 1). S159N (shared by G3 and G9), N186S (shared by G7 and G15), A189T (shared by G13 and G16), T192I (shared by G11 and G13), and I202V (shared by G13 and G16) were found to be parallel evolved in HA. For PA, N321K (shared by G11 and G14) and D394N (shared by G2 and G5) were identified. L384S (shared by G5 and G7) and K386R showed parallel evolution in PB1. We also identified E627K (shared by G1, G4, and G11) in PB2 as a parallel evolved mutation. Among these PEMs, D394N in PA, K386R in PB1, and E627K in PB2 were parallel evolved in both H5N1 (G5 in PA and PB1; G1 and G11 in PB2) and H5N6 (G2 in PA; G1 in PB1; and G4 in PB2) subtype isolates. Other PEMs were only parallel evolved in H5N1. PEMs were not detected in the M, NA, NP, and NS genes. Although two parallel evolved mutations, T192I in HA and N321K in PA (Table 1), were not significantly associated with human adaptation by our follow-up statistical criteria, they have previously been considered human adaptive markers due to their repetitive presence (37, 38), and we therefore treated them as PEMs in this study.

We next investigated whether these 10 mutations were the result of positive selection. We did not detect any signal of positive selection at these sites in human sequences in any phylogenetically independent group (Table S4), which could result from both the scattered distribution of human isolates in the phylogenetic tree and the instability of the tree topology caused by highly similar sequences.

4. Structural analysis of parallel mutations of human H5N1 isolates

Fig 4A shows the five amino acids under parallel evolution on the crystal structure of H5 HA from A/Indonesia/5/2005 virus (28). Their side chains are exposed at the surface of the HA1 subunit, except for I202, which is a component of an internal β-sheet. Residue 159 is in the middle of an N-glycosylation sequon, NXT, which is located in a loop region on top of the globular head domain. The three other amino acids are located very close to the receptor binding site, which is composed of the base containing the three conserved residues Y95, W153, and H183 (not highlighted) and three structural elements at the side, the 130-loop, the 190-helix, and the 220-loop (labelled as blue sticks in Fig 4B). The 220-loop contains the avian-signature amino acids Q226 and G228, which must mutate to L226 and S228 to increase binding of H5 HA to human-type receptors (15). One of the amino acids under parallel evolution (N186) is known to be involved in receptor binding by forming a water-mediated hydrogen bond with Q226 (39). A189 and T192 are part of the 190-helix, but their side chains are not in contact with the ligand since they point upwards from the helix to the molecule's surface.

Five amino acids under parallel evolution were identified in the three proteins of the polymerase complex (Fig 5A). Their location is highlighted in the recently published crystal structure of the polymerase complex from an H5N1 influenza virus. In the cap-binding protein PB2 it is residue 627, which is well known to be involved in host adaption (40). In the endonuclease PA, which cleaves the cap structure from cellular mRNAs, two sites were identified, residues 321 and 394. The catalytic part of the polymerase complex, the PB1 protein, contains two neighboring residues (384 and 386) under parallel evolution. Both are also in close proximity to K394 of PA, exposed at the surface of the molecule and located near the proposed entry site for NTPs into the catalytic center of PB1 (Fig 5B) (30).

Discussion

Since 1997, H5N1 and H5N6 AIVs have caused the death of more than half of infected humans and therefore clearly represent a threat to public health (http://www.who.int). Most human cases of infection resulted from direct exposure to H5 virus-infected poultry or poultry products. However, human-to-human transmission, albeit limited, has been detected (18, 41). Amino acid changes occurring in avian H5N1 viruses that are fully adapted to airborne transmission between ferrets (an animal model for transmission between humans) have been described (11, 12).
However, it is largely unexplored whether (and which) amino acid changes occur in humans infected with an avian H5 virus that was not transmitted to other humans, although other studies observed that adaptation/parallel evolution occurred during multiple adaptations of H7N9 avian influenza viruses to human hosts (42). Here, we found that adaptive and parallel evolution occurred during the multiple adaptations of H5 viruses to human hosts. Our approach identified several mutations that had already been shown to be relevant to human adaption, further demonstrating the reliability of our method. For instance, HA S227N has been experimentally validated to alter the receptor specificity from avian to human (31). PB1 T182I, which had appreciable effects on H5N1 human adaptation, structurally and energetically modified the conformation of the PB1 β-sheet that interacts with the vRNA promoter (37). Mutation of PB2 site 627 was also demonstrated to be associated with viral host range (40), and E627K has been widely confirmed to increase replication in mammalian cells at relatively low temperature (32). The experimentally verified PB2 mutation K526R is a novel human-adaptive marker in human influenza A virus (43).

Some important, well-known mutations in HA, such as A138V and T160A, have been demonstrated to enhance the ability to infect humans/mammals (33). However, these mutations were not found among our AMs. In addition, some mammal-adaptive markers identified in mouse models, such as D701N in PB2, were also not detected in our study. This is important because clinical signs in humans and mice are different and the adaptive evolution of AIVs to different environments leads to different results (44).

The methods and criteria for detecting parallel evolved sites described in reference (45) rely heavily on an accurate and stable topology of the phylogenetic tree. Sequences of influenza viruses are known to be highly similar, so the topology near the leaf nodes may be unstable. Therefore, our method may be more suitable for the detection of parallel sites in viruses, especially influenza viruses. However, our approach has some shortcomings, which could be improved in the future. For instance, our method does not consider associations among mutations at different sites. In addition, more methods should be used to detect positively selected sites. Despite these limitations, we detected, for the first time, AMs in H5 human-adapted viruses in multiple phylogenetically distant groups, indicating that they are the result of parallel evolution. Since the criteria we used to detect adaptive mutations were quite strict, adaptive mutations that did not appear repetitively among genetically distant groups (i.e. were not parallel evolved) are also worthy of attention in the future.

Our analysis did not identify key residues in HA that are exchanged upon adaption of H5N1 for airborne transmission between ferrets (11, 12), but four of the five residues under parallel evolution might subtly influence receptor binding and/or antigenicity of HA. N186 has been structurally and experimentally shown to be involved in ligand binding by perturbing the biochemical properties of the receptor-binding pocket. Its amine group forms a water-mediated hydrogen bond with the –OH group of Q226, the decisive residue that determines the receptor binding specificity in avian viruses (15). Mutation of N186 to a positively charged K residue disrupts the hydrogen bond, reduces the affinity for the avian receptor and at the same time enhances binding to the human receptor (39). An exchange of N186 to K was also identified by deep sequencing in two human patients infected with the H5N1 virus A/Vietnam/04, although at low frequency (11). Whether the exchange we observed (N to S) has a similar effect is not known. A189 and T192 are components of the 190-helix that is part of the receptor binding site, but their side chains do not directly contact the ligand. However, an exchange to T (189) or I (192) might cause a small rotation of the helix and thus subtly influence the spatial pattern of the receptor binding site. Accordingly, a K to T exchange in the adjacent residue 193 improves binding of H5 HA to human-type receptors (46).


S159 is part of a glycosylation motif (NXS/T) that is lost upon forced adaption of an H5N1 virus to airborne transmission between ferrets by mutation of N158D and T160A, respectively (11, 12). Since the same glycosylation site is also not present in ferret- and guinea pig-transmissible H5 viruses (13), it is likely that a carbohydrate attached to N158 may sterically interfere with virus binding to the cellular receptor. Exchange of S159 by N is unlikely to prevent glycosylation at this site. X can be any residue

except

a

proline

and

the

bioinformatics

tool

NetNGlyc

http://www.cbs.dtu.dk/services/NetNGlyc/ does not predict a decreased probability for glycosylation at this site if S159 is exchanged by an N. However, since S159 (and also A189 and T192) are located in a variable loop on top of the molecule, which contains antibody escape mutations, their exchange might influence the antigenic properties of H5 HA (47). The relevance of the hydrophobic residue I202 and its exchange by an (also hydrophobic) valine is obscure. It is part of a ß-sheet region within the HA1 subunit and no role of this residue for the stability of HA or for the threshold pH at which HA fusion activity is triggered has been reported (14). An exchange of the adjacent P203 by an S has been observed at low frequency in a human patient H5N1 virus A/Vietnam (48) suggesting that this region might indeed be involved in adaption of avian H5 viruses to humans. We also identified five residues under parallel evolution in each of the three proteins of the polymerase complex. The role of this residue as a host-adaption factor is well established (13, 49). Of note, it was reported that a difference in the sequence of the cellular protein ANP32A between mammalian and avian cells is responsible for the suboptimal function of avian influenza virus polymerase in mammalian cells (50). Residue 321 in PA is located in the PA-C domain, but no specific function has been associated with this region. However, residue 321 was also identified in several human-adapted A(H1N1)pdm(2009) viruses in comparison to the prototypic 14

A/England/195/2009 strain (51). This mutation in PA (N321K) enhanced the polymerase activity of third-wave viruses and provided a replicative advantage in human airway epithelial cells. The same residue (among others) was also identified in a study of human adaption mutations in clade 2.2.1 of the H5N1 virus, although no effect on polymerase activity was observed experimentally (37). A substitution of S by P at residue 325 has been observed at low frequency in one human patient infected with the H5N1 virus A/Vietnam (48). The other three residues under parallel evolution (384 and 386 in PB1 and 394 in PA) lie in close proximity to each other on the surface of the molecule. Interestingly, they are located close to the proposed NTP entry channel, which is composed of basic, positively charged amino acids (labelled blue in Fig. 5B) and funnels nucleotides to the presumed active center of PB1 (acidic residues labelled magenta). PB1 residues 384 and 386 are L and K in avian viruses and are exchanged to S and R, respectively, in human viruses. PA residue 394 is D in avian viruses and N in human viruses, meaning that a negative charge is removed during human adaption. PB1 residues 384 and 386 were also identified in a study of human adaption mutations in the H5N1 clade 2.2.1 virus. Mutations of these residues had a slight effect on polymerase activity in transfected cells, but no appreciable effect on progeny vRNA production in infected cells (37). To our knowledge, residue 394 in PA has not previously been described as important for adaption of the polymerase to replication in humans, and no function has been assigned to this residue. One might speculate that, together with residues 384 and 386 in PB1, it binds a host cell factor that is involved in guiding NTPs into the polymerase complex.

Since we did not analyze viral proteins generated by splicing (M2, NS2) or by usage of an alternative initiation codon (PB1-F2), we do not know whether they also acquired adaptive mutations. Residue 66 in PB1-F2, an accessory protein of many influenza A viruses, is of particular pathogenic significance (52). Likewise, we did not analyze whether M2 acquired mutations in its transmembrane region that would generate amantadine-resistant virus strains (53).

In summary, four of the five parallel human adaptation polymerase mutations identified in our study had been previously reported in a study investigating adaption of H5N1 AIV to humans. However, when the residues were mutated, they had little or no effect on polymerase activity or vRNP production (18). One explanation might be that multiple mutations in different segments are required to yield a measurable effect. Furthermore, the adaptation mechanism of the polymerase might be multifactorial, suggesting that avian viruses might adapt by multiple pathways to enable their growth in humans. The jump of H5 AIVs from birds to humans exposes the virus to a new host environment and thus to new selection pressure. Avian influenza virus adaptation to humans involves the stepwise accumulation of potentiating mutations that favor the emergence of a particular adaptive mutation (48). Therefore, the mutational panel provided here might be useful as an early detection system for transitional stages in H5 AIV evolution, before the virus poses a greater public health threat. We also identified several dominant AMs that have not undergone parallel evolution; these are intriguing and worthwhile candidates for further experimental studies. The application of these findings is of value for public health given the constant threat of a more serious influenza pandemic.
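As an illustration of how such a mutational panel could be used in surveillance, the following minimal sketch checks protein sequences for the human-type polymerase residues named above (PA N321K and D394N, PB1 L384S and K386R). It is a hypothetical example, not the pipeline used in this study: the `PANEL` dictionary, the `screen` function and the toy sequence are assumptions, and positions are assumed to match the numbering used here without any alignment step.

```python
# Hypothetical panel: (segment, 1-based position) -> (avian residue, human residue).
# Only substitutions named in the discussion text are listed.
PANEL = {
    ("PA", 321): ("N", "K"),
    ("PA", 394): ("D", "N"),
    ("PB1", 384): ("L", "S"),
    ("PB1", 386): ("K", "R"),
}


def screen(proteins):
    """Return panel substitutions found in `proteins`.

    `proteins` maps a segment name to its amino-acid string. Positions are
    assumed to follow the same numbering as the panel (no alignment is done).
    """
    hits = []
    for (segment, pos), (avian, human) in PANEL.items():
        seq = proteins.get(segment)
        if seq is None or len(seq) < pos:
            continue  # segment missing or too short to cover this position
        if seq[pos - 1] == human:
            hits.append(f"{segment} {avian}{pos}{human}")
    return hits


# Toy PA sequence (not real data): 400 residues with N at position 394.
toy_pa = "A" * 393 + "N" + "A" * 6
hits = screen({"PA": toy_pa})
print(hits)
```

In practice, isolate sequences would first need to be aligned to a reference so that positions are comparable; the sketch skips that step for brevity.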

Acknowledgements

This work was financially supported by the National Key Research and Development Program of China (2017YFD0500101), the Young Top-notch Talent of the China National Ten Thousand Talent Program, and the Youth Talent Lift Project of the China Association for Science and Technology (2017-2019).

References

1.

de Jong JC, Claas EC, Osterhaus AD, Webster RG, Lim WL. 1997. A pandemic warning? Nature 389:554.

2.

Su S, Bi Y, Wong G, Gray GC, Gao GF, Li S. 2015. Epidemiology, Evolution, and Recent Outbreaks of Avian Influenza Virus in China. J Virol 89:8671-6.

3.

Savill NJ, St Rose SG, Keeling MJ, Woolhouse ME. 2006. Silent spread of H5N1 in vaccinated poultry. Nature 442:757.

4.

Connie Leung YH, Luk G, Sia SF, Wu YO, Ho CK, Chow KC, Tang SC, Guan Y, Malik Peiris JS. 2013. Experimental challenge of chicken vaccinated with commercially available H5 vaccines reveals loss of protection to some highly pathogenic avian influenza H5N1 strains circulating in Hong Kong/China. Vaccine 31:3536-42.

5.

Pantin-Jackwood MJ, Suarez DL. 2013. Vaccination of domestic ducks against H5N1 HPAI: a review. Virus Res 178:21-34.

6.

Duan L, Bahl J, Smith GJ, Wang J, Vijaykrishna D, Zhang LJ, Zhang JX, Li KS, Fan XH, Cheung CL, Huang K, Poon LL, Shortridge KF, Webster RG, Peiris JS, Chen H, Guan Y. 2008. The development and genetic diversity of H5N1 influenza virus in China, 1996-2006. Virology 380:243-54.

7.

Vijaykrishna D, Bahl J, Riley S, Duan L, Zhang JX, Chen H, Peiris JS, Smith GJ, Guan Y. 2008. Evolutionary dynamics and emergence of panzootic H5N1 influenza viruses. PLoS Pathog 4:e1000161.

8.

Chen H, Deng G, Li Z, Tian G, Li Y, Jiao P, Zhang L, Liu Z, Webster RG, Yu K. 2004. The evolution of H5N1 influenza viruses in ducks in southern China. Proc Natl Acad Sci U S A 101:10452-7.

9.

Guan Y, Peiris M, Kong KF, Dyrting KC, Ellis TM, Sit T, Zhang LJ, Shortridge KF. 2002. H5N1 influenza viruses isolated from geese in Southeastern China: evidence for genetic reassortment and interspecies transmission to ducks. Virology 292:16-23.

10.

Bi Y, Chen Q, Wang Q, Chen J, Jin T, Wong G, Quan C, Liu J, Wu J, Yin R, Zhao L, Li M, Ding Z, Zou R, Xu W, Li H, Wang H, Tian K, Fu G, Huang Y, Shestopalov A, Li S, Xu B, Yu H, Luo T, Lu L, Xu X, Luo Y, Liu Y, Shi W, Liu D, Gao GF. 2016. Genesis, Evolution and Prevalence of H5N6 Avian Influenza Viruses in China. Cell Host Microbe 20:810-821.

11.

Imai M, Watanabe T, Hatta M, Das SC, Ozawa M, Shinya K, Zhong G, Hanson A, Katsura H, Watanabe S, Li C, Kawakami E, Yamada S, Kiso M, Suzuki Y, Maher EA, Neumann G, Kawaoka Y. 2012. Experimental adaptation of an influenza H5 HA confers respiratory droplet transmission to a reassortant H5 HA/H1N1 virus in ferrets. Nature 486:420-8.

12.

Herfst S, Schrauwen EJ, Linster M, Chutinimitkul S, de Wit E, Munster VJ, Sorrell EM, Bestebroer TM, Burke DF, Smith DJ, Rimmelzwaan GF, Osterhaus AD, Fouchier RA. 2012. Airborne transmission of influenza A/H5N1 virus between ferrets. Science 336:1534-41.

13.

Neumann G, Kawaoka Y. 2015. Transmission of influenza A viruses. Virology 479-480:234-46.

14.

Mair CM, Ludwig K, Herrmann A, Sieben C. 2014. Receptor binding and pH stability - how influenza A virus hemagglutinin affects host-specific virus infection. Biochim Biophys Acta 1838:1153-68.

15.

Shi Y, Wu Y, Zhang W, Qi J, Gao GF. 2014. Enabling the 'host jump': structural determinants of receptor-binding specificity in influenza A viruses. Nat Rev Microbiol 12:822-31.

16.

Stern DL. 2013. The genetic causes of convergent evolution. Nat Rev Genet 14:751-64.

17.

Wang H, Feng Z, Shu Y, Yu H, Zhou L, Zu R, Huai Y, Dong J, Bao C, Wen L, Wang H, Yang P, Zhao W, Dong L, Zhou M, Liao Q, Yang H, Wang M, Lu X, Shi Z, Wang W, Gu L, Zhu F, Li Q, Yin W, Yang W, Li D, Uyeki TM, Wang Y. 2008. Probable limited person-to-person transmission of highly pathogenic avian influenza A (H5N1) virus in China. Lancet 371:1427-34.

18.

Ungchusak K, Auewarakul P, Dowell SF, Kitphati R, Auwanit W, Puthavathana P, Uiprasertkul M, Boonnak K, Pittayawonganon C, Cox NJ, Zaki SR, Thawatsupha P, Chittaganpitch M, Khontong R, Simmerman JM, Chunsutthiwat S. 2005. Probable person-to-person transmission of avian influenza A (H5N1). N Engl J Med 352:333-40.

19.

Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658-9.

20.

Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150-2.

21.

Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059-66.

22.

Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312-3.

23.

Pattengale ND, Alipour M, Bininda-Emonds OR, Moret BM, Stamatakis A. 2010. How many bootstrap replicates are necessary? J Comput Biol 17:337-54.

24.

Letunic I, Bork P. 2016. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res 44:W242-5.

25.

Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214.

26.

Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32:268-274.

27.

Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586-91.

28.

Zhang W, Shi Y, Lu X, Shu Y, Qi J, Gao GF. 2013. An airborne transmissible avian influenza H5 hemagglutinin seen at the atomic level. Science 340:1463-7.

29.

Fan H, Walker AP, Carrique L, Keown JR, Serna Martin I, Karia D, Sharps J, Hengrung N, Pardon E, Steyaert J, Grimes JM, Fodor E. 2019. Structures of influenza A virus RNA polymerase offer insight into viral genome replication. Nature doi:10.1038/s41586-019-1530-7.

30.

Pflug A, Lukarska M, Resa-Infante P, Reich S, Cusack S. 2017. Structural insights into RNA synthesis by the influenza virus transcription-replication machine. Virus Research.

31.

Gambaryan A, Tuzikov A, Pazynina G, Bovin N, Balish A, Klimov A. 2006. Evolution of the receptor binding phenotype of influenza A (H5) viruses. Virology 344:432-438.

32.

Hatta M, Gao P, Halfmann P, Kawaoka Y. 2001. Molecular basis for high virulence of Hong Kong H5N1 influenza A viruses. Science 293:1840-2.

33.

Gu M, Li Q, Gao R, He D, Xu Y, Xu H, Xu L, Wang X, Hu J, Liu X, Hu S, Peng D, Jiao X, Liu X. 2017. The T160A hemagglutinin substitution affects not only receptor binding property but also transmissibility of H5N1 clade 2.3.4 avian influenza virus in guinea pigs. Vet Res 48:7.

34.

Prasert A, Ornpreya S, Alita K, Chak S, Yasuo S, Kumnuan U, Suda L, Hatairat L, Phisanu P, Arunee T. 2007. An avian influenza H5N1 virus that binds to a human-type receptor. J Virol 81:9950.

35.

Gabriel G, Czudai-Matwich V, Klenk HD. 2013. Adaptive mutations in the H5N1 polymerase complex. Virus Res 178:53-62.

36.

Xue KS, Stevens-Ayers T, Campbell AP, Englund JA, Pergam SA, Boeckh M, Bloom JD. 2017. Parallel evolution of influenza across multiple spatiotemporal scales. Elife 6.

37.

Arai Y, Kawashita N, Daidoji T, Ibrahim MS, El-Gendy EM, Takagi T, Takahashi K, Suzuki Y, Ikuta K, Nakaya T, Shioda T, Watanabe Y. 2016. Novel Polymerase Gene Mutations for Human Adaptation in Clinical Isolates of Avian H5N1 Influenza Viruses. PLoS Pathog 12:e1005583.

38.

Eshaghi A, Duvvuri VR, Li A, Patel SN, Bastien N, Li Y, Low DE, Gubbay JB. 2014. Genetic characterization of seasonal influenza A (H3N2) viruses in Ontario during 2010-2011 influenza season: high prevalence of mutations at antigenic sites. Influenza Other Respir Viruses 8:250-7.

39.

Xiong X, Xiao H, Martin SR, Coombs PJ, Liu J, Collins PJ, Vachieri SG, Walker PA, Lin YP, McCauley JW, Gamblin SJ, Skehel JJ. 2014. Enhanced human receptor binding by H5 haemagglutinins. Virology 456-457:179-87.

40.

Subbarao EK, London W, Murphy BR. 1993. A single amino acid in the PB2 gene of influenza A virus is a determinant of host range. J Virol 67:1761-4.

41.

Beigel JH, Farrar J, Han AM, Hayden FG, Hyer R, de Jong MD, Lochindarat S, Nguyen TK, Nguyen TH, Tran TH, Nicoll A, Touch S, Yuen KY, Writing Committee of the World Health Organization Consultation on Human Influenza AH. 2005. Avian influenza A (H5N1) infection in humans. N Engl J Med 353:1374-85.

42.

Xiang D, Shen X, Pu Z, Irwin DM, Liao M, Shen Y. 2018. Convergent Evolution of Human-Isolated H7N9 Avian Influenza A Viruses. J Infect Dis 217:1699-1707.

43.

Wen L, Chu H, Wong BH, Wang D, Li C, Zhao X, Chiu MC, Yuan S, Fan Y, Chen H, Zhou J, Yuen KY. 2018. Large-scale sequence analysis reveals novel human-adaptive markers in PB2 segment of seasonal influenza A viruses. Emerg Microbes Infect 7:47.

44.

Li Z, Chen H, Jiao P, Deng G, Tian G, Li Y, Hoffmann E, Webster RG, Matsuoka Y, Yu K. 2005. Molecular basis of replication of duck H5N1 influenza viruses in a mammalian mouse model. J Virol 79:12058-64.

45.

Zhang J, Kumar S. 1997. Detection of convergent and parallel evolution at the amino acid sequence level. Mol Biol Evol 14:527-36.

46.

Peng W, Bouwman KM, McBride R, Grant OC, Woods RJ, Verheije MH, Paulson JC, de Vries RP. 2018. Enhanced Human-Type Receptor Binding by Ferret-Transmissible H5N1 with a K193T Mutation. J Virol 92.

47.

Stevens J, Blixt O, Tumpey TM, Taubenberger JK, Paulson JC, Wilson IA. 2006. Structure and receptor specificity of the hemagglutinin from an H5N1 influenza virus. Science 312:404-10.

48.

Imai H, Dinis JM, Zhong G, Moncla LH, Lopes TJS, McBride R, Thompson AJ, Peng W, Le MTQ, Hanson A, Lauck M, Sakai-Tagawa Y, Yamada S, Eggenberger J, O'Connor DH, Suzuki Y, Hatta M, Paulson JC, Neumann G, Friedrich TC, Kawaoka Y. 2018. Diversity of Influenza A(H5N1) Viruses in Infected Humans, Northern Vietnam, 2004-2010. Emerg Infect Dis 24:1128-1238.

49.

Te Velthuis AJ, Fodor E. 2016. Influenza virus RNA polymerase: insights into the mechanisms of viral RNA synthesis. Nat Rev Microbiol 14:479-93.

50.

Long JS, Giotis ES, Moncorge O, Frise R, Mistry B, James J, Morisson M, Iqbal M, Vignal A, Skinner MA, Barclay WS. 2016. Species difference in ANP32A underlies influenza A virus polymerase host restriction. Nature 529:101-4.

51.

Elderfield RA, Watson SJ, Godlee A, Adamson WE, Thompson CI, Dunning J, Fernandez-Alonso M, Blumenkrantz D, Hussell T, Investigators M, Zambon M, Openshaw P, Kellam P, Barclay WS. 2014. Accumulation of human-adapting mutations during circulation of A(H1N1)pdm09 influenza virus in humans in the United Kingdom. J Virol 88:13269-83.

52.

Kamal RP, Alymova IV, York IA. 2017. Evolution and Virulence of Influenza A Virus Protein PB1-F2. Int J Mol Sci 19.

53.

Abdelwhab EM, Veits J, Mettenleiter TC. 2017. Biological fitness and natural selection of amantadine resistant variants of avian influenza H5N1 viruses. Virus Res 228:109-113.

Figure 1A. Global distribution of H5Ny isolates and large-scale phylogeny and groups of HA and NA of H5Ny. Sequences were collected from GISAID. Laboratory-derived isolates were removed from the dataset. The host was divided into 4 categories: avian, human, environment and others. The proportion of isolates for each host category is visualized in a pie chart. The size of the pie chart is proportional to the number of isolates in each continent. 1B. Phylogeny and groups of HA for H5Ny. Only bootstrap values ≥ 70% are visualized as a purple circle in the middle of the branch. The size of the circle is proportional to the bootstrap value. Groups (subtype, host, location) are identified by different colors of circles in the outer part of the tree. 1C. Phylogeny and groups of NA of H5Ny.

Figure 2A. Phylogeny and groups of MP of H5Ny. Only bootstrap values ≥ 70% are indicated as a purple circle in the middle of the branch. The size of the circle is proportional to the bootstrap value. Groups are identified by different colors. 2B. Phylogeny and groups of NP of H5Ny. 2C. Phylogeny and groups of NS of H5Ny. 2D. Phylogeny and groups of PA of H5Ny. 2E. Phylogeny and groups of PB1 of H5Ny. 2F. Phylogeny and groups of PB2 of H5Ny.

Figure 3A. Maximum clade credibility tree of the HA gene. The maximum clade credibility tree was constructed with BEAST (v 1.10.5) using the GTR nucleotide substitution model with a gamma distribution, a relaxed molecular clock, the Coalescent: Bayesian Skyline tree prior, and a total chain length of 5×10^9, sampled every 50,000 steps. 3B. Bayesian skyline plot based on H5 influenza viruses from infected avian and human hosts.

Figure 4A. Location of amino acids under parallel evolution in H5 HA bound to the avian receptor analog LSTa. Amino acids are shown as red spheres. LSTa (α2,3-pentasaccharide) is shown as magenta sticks. Green: HA1 subunit, Blue: HA2 subunit. 4B. Receptor binding site of H5 HA bound to the avian receptor analog LSTa. Amino acids in the 130-loop, 190-helix, and 220-loop (including the “avian-signature” residues Q226 and G228) are shown as blue sticks. The amino acids under parallel evolution are shown as red sticks: N186 is involved in ligand binding, A189 and T191 are part of the 190-helix, S159 is in the middle of a (variable) glycosylation site, and I202 is part of an internal β-sheet. LSTa (α2,3-pentasaccharide) is shown as magenta sticks. Figures were created with PyMOL from PDB file 4K63 (H5 HA from A/Indonesia/5/2005).

Figure 5. Location of amino acids under parallel evolution in the polymerase complex. A. The polymerase proteins are shown as cartoon. Magenta: PB2, Green: PB1, Blue: PA subunit. The amino acids under parallel evolution are shown as red spheres. The figure was created with PyMOL from PDB file 6QPF, which is from the H5N1 virus A/duck/Fujian/01/2002(H5N1). B. Location of residues 384 and 386 of PB1 and 394 of PA near the proposed entry site for NTPs. Basic amino acids lining the NTP channel (Lys 235, Lys 237, Arg 239, Lys 308, Lys 480, Lys 481) are shown in blue. Amino acids of the proposed active centre of PB1 (Asp 305, Asp 445, Asp 446) are shown in magenta. This figure was created from PDB file 4WSB, which is the polymerase from a bat virus (A/little yellow-shouldered bat/Guatemala/060/2010 (H17N10)), since amino acids 227 to 241 in PB1 are not resolved in the structure of the polymerase complex from H5N1.

Table 1. Parallel evolved mutations and their association with human adaptation

Gene  Loc    Group      Type      Ancestor  Human  Dominate_Proportion  Avian  Chi2    P         Corrected P by BH
HA    159    G7,G15     N1,N1     S         N|10   1                    S|26   18.022  2.18E-05  5.45E-05
HA    186    G13,G16    N1,N1     N         S|7    0.583333333          N|56   30.372  3.57E-08  1.79E-07
HA    189    G13,G16    N1,N1     A         T|8    0.615384615          A|74   7.1566  0.007469  0.00933625
HA    192 a  G11,G13    N1,N1     T         I|5    0.5                  T|50   1.8725  0.1712    0.1712
HA    202    G11,G14    N1,N1     I         V|10   0.769230769          I|65   8.2317  0.004117  0.006861667
PA    321 a  G2,G5      N1,N1     N         K|2    0.5                  N|88   3.0386  0.08131   0.08131
PA    394    G3,G9      N6,N1     D         N|63   0.741176471          D|64   24.259  8.42E-07  1.68E-06
PB1   384    G5,G7      N1,N1     L         S|40   0.64516129           L|105  8.76    0.003079  0.003079
PB1   386    G1,G5      N6,N1     K         R|3    0.6                  K|191  16.543  4.76E-05  9.52E-05
PB2   627    G1,G4,G11  N1,N6,N1  E         K|21   0.617647059          E|216  112.6   2.20E-16  2.20E-16

a Not significantly associated with human adaptation.
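The "Corrected P by BH" column of Table 1 can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up adjustment. The sketch assumes that the correction was applied separately within each gene; this grouping is inferred from the reported numbers and is not stated in the text. The function name `benjamini_hochberg` and the dictionary `ha_p` (HA P values as read from Table 1, keyed by site) are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# HA P values from Table 1, keyed by site (assumption: corrected per gene).
ha_p = {159: 2.18e-05, 186: 3.57e-08, 189: 0.007469, 192: 0.1712, 202: 0.004117}
ha_adjusted = dict(zip(ha_p, benjamini_hochberg(list(ha_p.values()))))

for site, value in ha_adjusted.items():
    print(site, value)
```

Under the per-gene assumption, the adjusted HA values agree with the corrected column of Table 1 to the precision shown there; for site 189, for example, 0.007469 × 5/4 = 0.00933625.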