Conservation region finding for influenza A viruses by machine learning methods of N-linked glycosylation sites and B-cell epitopes


Mathematical Biosciences 315 (2019) 108217

Contents lists available at ScienceDirect

Mathematical Biosciences journal homepage: www.elsevier.com/locate/mbs

Jone-Han Liu (a), Chi-Chang Chang (b, c), Chi-Wei Chen (d), Li-Ting Wong (d), Yen-Wei Chu (d, e, ⁎)

a Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung 402, Taiwan, ROC
b School of Medical Informatics, Chung-Shan Medical University, Taichung 402, Taiwan, ROC
c IT Office, Chung-Shan Medical University Hospital, Taichung 402, Taiwan, ROC
d Institute of Genomics and Bioinformatics, National Chung Hsing University, 250, Kuo Kuang Rd., Taichung 402, Taiwan, ROC
e Biotechnology Center, Agricultural Biotechnology Center, Institute of Molecular Biology, National Chung Hsing University, Taichung 402, Taiwan, ROC

ARTICLE INFO

Keywords: Hemagglutinin; Neuraminidase; N-linked glycosylation; Linear B-cell epitope; Machine learning

ABSTRACT

Influenza type A, a serious infectious disease of the human respiratory tract, poses an enormous threat to human health worldwide and leads to high mortality rates in poultry, pigs, and humans. The primary target identity regions for the human immune system are hemagglutinin (HA) and neuraminidase (NA), two surface proteins of the influenza A virus. Research and development of vaccines is highly complex because the influenza A virus evolves rapidly. This study focused on three genetic features of viral surface proteins: ribonucleic acid (RNA) sequence conservation, linear B-cell epitopes, and N-linked glycosylation. On the basis of these three properties, we analyzed 12,832 HA and 9,487 NA protein sequences retrieved from the influenza virus database. We classified the viral surface protein sequences into the 18 HA and 11 NA subtypes that have been identified thus far. Using available analytic tools, we searched for the representative strain of each virus subtype. Furthermore, using machine learning methods, we looked for conservation regions whose sequences show linear B-cell epitopes and N-linked glycosylation. Compared with the Immune Epitope Database (IEDB) prediction of the antibody neutralization response (i.e., screening of antibody sequence regions), the regions found in this study gave large and accurate virus sequence coverage and contained N-linked glycosylation sites. The results showed that the machine learning-based prediction method can be used to address the vaccine mismatch that arises during the rapid evolution of the influenza A virus, and also as a pre-vaccine assessment. In addition, the screened fragments can serve as a design reference for a universal influenza vaccine in the future.

⁎ Corresponding author at: Institute of Genomics and Bioinformatics, National Chung Hsing University, 250, Kuo Kuang Rd., Taichung 402, Taiwan, ROC. E-mail address: [email protected] (Y.-W. Chu).

https://doi.org/10.1016/j.mbs.2019.108217 Received 29 September 2018; Received in revised form 7 May 2019; Accepted 15 June 2019; Available online 17 June 2019. 0025-5564/© 2019 Elsevier Inc. All rights reserved.

1. Introduction

Influenza viruses are ribonucleic acid (RNA) viruses. Clinically, on the basis of their nucleoprotein (NP) and matrix protein (M), influenza viruses are of three types: A, B, and C. Among the three types, influenza A is the most infectious and highly pathogenic, causing ∼250,000–500,000 deaths each year worldwide. Therefore, the battle against the influenza A virus is an important challenge in health care. Sometimes, two different influenza viruses simultaneously infect the same host. In such cases, genetic fragments of the two viruses may recombine to produce a new influenza virus subtype [1], a mechanism called antigenic shift [2]. For example, the 2009 influenza pandemic was caused by the human influenza H1N1 virus. This virus originally infected pigs, but through genetic recombination, it evolved into a new strain, resulting in the pandemic. Over the past 100 years, there have been four global pandemics with high mortality rates, resulting in millions of deaths. Among these four pandemics, the H1N1 influenza pandemic of 1918 caused 20 million deaths around the world [4]. Therefore, the development of effective vaccines is a serious medical issue [7–9].

The influenza A virus is an extremely unstable pathogen. It evolves rapidly [5,6] through molecular mechanisms such as antigenic shift and antigenic drift [2,3], resulting in changes in the main surface proteins. Mutation of the influenza A virus causes a mismatch between the vaccine and epidemic strains and reduces the vaccine's inhibition efficacy against the virus. Multivalent vaccines first need to be developed from previously available monovalent ones and then extended to universal vaccines [10–12] with multi-protection effects. Developing universal vaccines with bioinformatics technology is a problem of high dimensionality, so the dimensionality needs to be reduced by proper feature selection [15].

In this study, we propose a machine learning-based prediction method for finding conservation regions with three features that can be used for vaccine development. The results showed that in regions of hemagglutinin (HA) sequences showing high conservation, the coverage ratios of the neutralizing antibodies [35,36] F10 and C179 are 33.3% and 52.6%, respectively, much greater than the ratios of 6.7% and 5.3%, respectively, in the Immune Epitope Database (IEDB). We also compared these conservation regions with the binding sites of the anti-influenza drug oseltamivir to identify sequences and structures. The results showed that all the sites we found can effectively bind to drugs or antibodies, confirming that our proposed machine learning-based prediction method can be applied to universal influenza vaccine development.

Fig. 1. Flowchart of our system. [Flowchart, top to bottom. Influenza Virus Sequence Database, with subtypes by host: human and swine (H1N1, H3N2, pdm09_H1N1); human, swine, and avian (H5N1, H9N2); human and avian (H2N2, H7N7, H7N9); bat (H17N10, H18N11); avian (H3N8, H4N6, H6N2, H7N3, H8N4, H10N7, H11N6, H12N5, H13N6, H14N5, H15N9). Extracting surface protein sequences of influenza virus: sequences of hemagglutinin (HA) and sequences of neuraminidase (NA). Screening representative strains of virus and grouping: Clustal X2, representative virus strain of each subtype, MEGA software 6.0.6, FigTree 1.4.0, phylogenetic tree, clustered group of all virus subtypes. Prediction (machine learning-based tool): AVANA high conservation, N-linked glycosylation prediction, linear B-cell epitope prediction. Conservation regions (target sequences for vaccine development).]


2. Materials and methods

The influenza A virus contains 8 genetic fragments that can be used for protein translation. Of the surface proteins of the influenza A virus, 18 HA and 11 neuraminidase (NA) subtypes have been identified so far. Combinations of these two surface proteins define a variety of virus subtypes. Different subtypes infect different hosts, and each type of host has a specific representative virus strain. For example, H1N1, H1N2, and H3N2 cause influenza A infections in humans, while H5N1, H7N9, and H9N2 are highly lethal pathogens that infect poultry or other birds. In this study, we used the Influenza Virus Sequence Database, which was created by the National Center for Biotechnology Information specifically to hold influenza virus proteins and nucleotides [13,14]. We downloaded virus data using the following parameters: virus type = A, host = any, subtype = any. Next, we preprocessed the download into two datasets, one holding the HA sequences and one holding the NA sequences. The computing and analysis method we propose is depicted in Fig. 1.

To develop influenza vaccines, researchers must collect information about all virus subtypes that have caused pandemics in the past and then predict the representative strain most likely to be the epidemic target in the future. Predicting representative strains is critical, and many computational methods have been proposed. These methods focus on analyzing genetic features of influenza viruses, such as RNA sequence conservation, B-cell epitopes, and N-linked glycosylation. Although each of these genetic features is individually important for vaccine development, we believe that combining them as a basis for prediction will maximize vaccine efficacy. Therefore, in this study, we used these three genetic features of the influenza virus sequence to predict the most likely representative strain for vaccine design.

First, we collected 12,832 HA and 9,487 NA protein sequences from the IEDB (http://tools.iedb.org/main/download/). Next, from these protein sequences, we found the representative strain with the highest conservation for each influenza virus subtype. Finally, we analyzed the conserved sequences of the representative strains for B-cell epitopes and N-glycosylation.

Table 1
Representative strains of 18 HA subtypes of the influenza A virus.

HA    Subtype   Host    Virus strain
H1    H1N1      Human   A/California/04/2009(H1N1)
H1    H1N1      Swine   A/swine/Iowa/A01049003/2010(H1N1)
H1    H1N1      Human   A/Brevig Mission/1/1918(H1N1)
H1    H1N1      Swine   A/swine/1931(H1N1)
H2    H2N2      Human   A/Singapore/1/1957(H2N2)
H2    H2N2      Avian   A/duck/Hong Kong/273/1978(H2N2)
H3    H3N2      Human   A/Aichi/2/1968(H3N2)
H3    H3N2      Swine   A/swine/Colorado/1/1977(H3N2)
H4    H4N6      Avian   A/duck/Czechoslovakia/1956(H4N6)
H5    H5N1      Human   A/Vietnam/1203/2004(H5N1)
H5    H5N1      Swine   A/duck/Laos/3295/2006(H5N1)
H5    H5N1      Avian   A/swine/Banten/UT3062/2005(H5N1)
H6    H6N2      Avian   A/chicken/California/431/2000(H6N2)
H7    H7N3      Avian   A/turkey/Italy/8458/2002(H7N3)
H7    H7N7      Human   A/Netherlands/219/2003(H7N7)
H7    H7N7      Avian   A/red knot/NJ/325/1989(H7N7)
H7    H7N9      Human   A/turkey/Minnesota/1/1988(H7N9)
H7    H7N9      Avian   A/Shanghai/02/2013(H7N9)
H8    H8N4      Avian   A/turkey/Ontario/6118/1968(H8N4)
H9    H9N2      Human   A/swine/Shan Dong/1/2003(H9N2)
H9    H9N2      Swine   A/Hong Kong/1074/1997(H9N2)
H9    H9N2      Avian   A/turkey/Wisconsin/1/1966(H9N2)
H10   H10N7     Avian   A/chicken/Germany/N/1949(H10N7)
H11   H11N6     Avian   A/duck/England/1/1956(H11N6)
H12   H12N5     Avian   A/duck/Alberta/60/1976(H12N5)
H13   H13N6     Avian   A/gull/Maryland/704/1977(H13N6)
H14   H14N5     Avian   A/mallard/Astrakhan/263/1982(H14N5)
H14   H14N5     Avian   A/mallard/Gurjev/263/1982(H14N5)
H15   H15N9     Avian   A/shearwater/West Australia/2576/79(H15N9)
H16   H16N3     Avian   A/black-headed gull/Sweden/2/99(H16N3)
H17   H17N10    Avian   A/bat/Guatemala/060/2010(H17N10)
H18   H18N11    Avian   A/bat/Peru/033/2010(H18N11)
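Each strain name in Table 1 ends with its subtype in parentheses, for example (H3N2), which is what makes it possible to sort sequences into HA and NA subtypes. The following is a minimal sketch of that bookkeeping step, not the authors' preprocessing code. The function names `ha_na_subtype` and `bucket_by_subtype` are hypothetical, and the sketch assumes every name carries a trailing (HxNy) label.

```python
import re
from collections import defaultdict

# Matches a trailing "(HxNy)" label such as "(H3N2)" at the end of a strain name.
SUBTYPE_RE = re.compile(r"\(H(\d+)N(\d+)\)\s*$")

def ha_na_subtype(strain_name):
    """Return the (HA, NA) subtype labels encoded at the end of a strain name."""
    match = SUBTYPE_RE.search(strain_name)
    if match is None:
        raise ValueError(f"no (HxNy) suffix in {strain_name!r}")
    return f"H{int(match.group(1))}", f"N{int(match.group(2))}"

def bucket_by_subtype(strain_names):
    """Group strain names by HA subtype and, separately, by NA subtype."""
    by_ha, by_na = defaultdict(list), defaultdict(list)
    for name in strain_names:
        ha, na = ha_na_subtype(name)
        by_ha[ha].append(name)
        by_na[na].append(name)
    return dict(by_ha), dict(by_na)

if __name__ == "__main__":
    names = [
        "A/Aichi/2/1968(H3N2)",
        "A/Singapore/1/1957(H2N2)",
        "A/bat/Peru/033/2010(H18N11)",
    ]
    by_ha, by_na = bucket_by_subtype(names)
    print(by_ha)
    print(by_na)
```

Keeping the HA and NA buckets separate mirrors the two datasets described in the text, since a single strain contributes one sequence to each.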

2.1. Extracting surface protein sequences of the influenza A virus

First, we separated the HA [18] and NA [19] sequence data into individual datasets by species [20]. From these two datasets, it was necessary to find representative strains of influenza A viruses and perform cluster classification based on the phylogenetic tree of those representative strains. We used Clustal X2 [16] and MEGA 6.0.6 [17] to calculate the genetic relationship of the influenza A virus and then used phylogenetic tree software to find representative strains of its various subtypes. Next, we used FigTree 1.4.0 to determine the grouping of the influenza A virus data according to the representative strain of each subtype. On the basis of the genetic features of high conservation, N-linked glycosylation, and linear B-cell epitopes, we input these grouped sequence data to a machine learning-based online tool to predict whether they are suitable for vaccine development. Finally, we compared the predicted conservation regions with the results of other studies.

2.2. Screening representative strains of influenza A virus subtypes and grouping

Classification of influenza A viruses is based on the combination of 18 HA and 11 NA surface protein components to form different subtypes. We used Clustal X2 and MEGA 6.0.6 to perform multiple-sequence alignment and clustering of the phylogenetic relationship for the two protein types. Sequences with high similarity and the same year of circulation were put into the same cluster. From each cluster, we selected the virus strain with the highest frequency as representative of that influenza virus subtype. Our set of representative strains coincided with that of Jing Sun et al. Tables 1 and 2 show the representative strain of each influenza virus subtype. Next, we reused Clustal X2 and MEGA 6.0.6 to compute the phylogenetic relationship among these representative virus strains. Consequently, the 18 HA subtypes were divided into group 1 (H1, H2, H5, H6, H8, H9, H11, H12, H13, and H16) and group 2 (H3, H4, H7, H10, H14, and H15). In addition, the 11 NA subtypes were divided into three groups: group 1 (N1, N4, N5, and N8), group 2 (N2, N3, N6, N7, and N9), and group 3 (N10 and N11, subtypes discovered later). At present, N10 and N11 are less researched and analyzed [21]. Fig. 2, drawn with FigTree 1.4.0, depicts the phylogenetic trees of these grouped representative virus strains. After grouping the HA and NA subtypes, we looked for regions with sequences showing high conservation, linear B-cell epitopes, and N-linked glycosylation in each group.

Table 2
Representative strains of 11 NA subtypes of the influenza A virus.

NA    Subtype   Host    Virus strain
N1    H1N1      Human   A/Brevig Mission/1/1918(H1N1)
N2    H3N2      Human   A/Aichi/2/1968(H3N2)
N3    H7N3      Avian   A/turkey/Italy/8458/2002(H7N3)
N4    H8N4      Avian   A/turkey/Ontario/6118/1967(H8N4)
N5    H12N5     Avian   A/mallard duck/ALB/69/1976(H12N5)
N6    H4N6      Avian   A/duck/Czechoslovakia/1956(H4N6)
N7    H10N7     Avian   A/chicken/Germany/N/1949(H10N7)
N8    H3N8      Avian   A/Duck/Ukraine/1/1963(H3N8)
N9    H7N9      Human   A/Shanghai/02/2013(H7N9)
N10   H17N10    Bat     A/bat/Guatemala/060/2010(H17N10)
N11   H18N11    Bat     A/bat/Peru/033/2010(H18N11)

Fig. 2. Phylogenetic tree of the influenza A virus.
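The grouping and the representative-strain rule of Section 2.2 can be written down compactly. The sketch below is illustrative only. The group memberships are copied from the text (H17 and H18 are not assigned to an HA group there, so the lookup returns None for them). The helper names `group_of` and `representative` are hypothetical, and reading "highest frequency" as "the most common identical sequence in the cluster" is an assumption, since the text does not define it further.

```python
from collections import Counter

# Group memberships as listed in Section 2.2.
HA_GROUPS = {
    1: ["H1", "H2", "H5", "H6", "H8", "H9", "H11", "H12", "H13", "H16"],
    2: ["H3", "H4", "H7", "H10", "H14", "H15"],
}
NA_GROUPS = {
    1: ["N1", "N4", "N5", "N8"],
    2: ["N2", "N3", "N6", "N7", "N9"],
    3: ["N10", "N11"],
}

def group_of(subtype, groups):
    """Return the group number of a subtype, or None if the text assigns none."""
    for number, members in groups.items():
        if subtype in members:
            return number
    return None

def representative(cluster):
    """Pick a representative from a cluster of (strain_name, sequence) pairs.

    Assumption: the representative is the first strain carrying the most
    frequent sequence in the cluster.
    """
    counts = Counter(sequence for _, sequence in cluster)
    top_sequence, _ = counts.most_common(1)[0]
    return next(name for name, sequence in cluster if sequence == top_sequence)

if __name__ == "__main__":
    toy_cluster = [("strain_1", "MKT"), ("strain_2", "MKA"), ("strain_3", "MKA")]
    print(representative(toy_cluster))   # strain_2
    print(group_of("H7", HA_GROUPS))     # 2
    print(group_of("N10", NA_GROUPS))    # 3
```

The toy cluster uses made-up three-residue sequences; real clusters would come from the Clustal X2 and MEGA output.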

2.3. Prediction (machine learning-based tool)

2.3.1. High-conservation regions and sites

We analyzed the grouped sequences of representative strains using the Antigen Variability Analyzer (AVANA) for multiple-sequence alignment. AVANA can identify PB2, an important protein found in a host infected with the influenza A virus. When the influenza virus replicates enough to cause infection, AVANA can be used to observe changes in PB2. From the sequence evolution, we can identify regions with high conservation. In this study, we set the conservation value to ≥ 95% and the length of the conservation region to 1–20 amino acids. Using the AVANA results, we retrieved the conservation regions of the HA and NA sequences of each group of viruses and analyzed them further.

2.3.2. Sites of linear B-cell epitopes

The most challenging part of designing a vaccine against any virus is to find antigen sites in the surface protein sequence of the virus [22]. These sites are called the epitopes of the antigen. HA and NA, two surface proteins of the influenza A virus, are the main antigen targets for vaccine design. Their sequences contain many epitopes that can be recognized by the human immune system (B-cells). Locating epitopes on HA and NA sequences is a large and complicated task, and it was not feasible to use biochemical technology in this study to find them. Therefore, we used the B-cell antigen prediction system [23] LBtope (developed from an experimentally validated B-cell Immune Epitope Database [14]) to find the epitopes in HA and NA sequences. We uploaded sequences showing high conservation to the LBtope website for prediction of B-cell epitopes. We selected the “Prediction of Antigen Sequence” mode and set the prediction algorithm to “LBtope_Fixed.” Supplementary Figs. S1 and S2 show the results of prediction.

2.3.3. Sites of N-linked glycosylation

Wagner's research [24] demonstrated that when N-linked glycosylation occurs on viral surface proteins, it promotes efficient binding of HA to host receptors. In addition, NA increases its activity [25,26], so newly generated viruses can be effectively released outside the cell to infect other cells [27–29]. Therefore, predicting the sites of N-linked glycosylation in HA and NA sequences will help in designing influenza vaccines. To perform the prediction, we used NetGlycate 1.0 [30], an online tool that predicts N-linked glycosylation sites in human proteins using an artificial neural network algorithm.

3. Results

Supplementary Tables S1 and S2 show the results of prediction. The conservation regions we found are consensus sequences with high conservation, B-cell epitopes, and N-linked glycosylation. Using these sequences, we compared the coverage ratios with neutralizing antibodies and influenza drugs. The coverage ratios showed that these sequences are suitable for vaccine development.
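How a region comes to be "conserved and glycosylated" can be illustrated with a small sketch. This is not AVANA, LBtope, or NetGlycate. It assumes a toy set of pre-aligned, equal-length sequences, scores each column by the frequency of its most common residue, keeps runs of columns at or above the threshold (95% and at most 20 residues, as in Section 2.3.1), and then flags runs whose consensus contains the canonical N-X-S/T sequon (X not proline) as a simple stand-in for an N-linked glycosylation predictor. All function names and the toy alignment are hypothetical, and the B-cell epitope filter is omitted.

```python
import re
from collections import Counter

# Canonical N-linked glycosylation sequon: N, any residue except P, then S or T.
SEQUON = re.compile(r"N[^P][ST]")

def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue in each column."""
    total = len(aligned)
    return [Counter(col).most_common(1)[0][1] / total for col in zip(*aligned)]

def consensus(aligned):
    """Most common residue per column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

def conserved_regions(aligned, threshold=0.95, max_len=20):
    """Half-open (start, end) column ranges where every column meets the threshold."""
    scores = column_conservation(aligned) + [-1.0]  # sentinel closes a trailing run
    regions, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            # Split long runs into chunks of at most max_len columns.
            for j in range(start, i, max_len):
                regions.append((j, min(j + max_len, i)))
            start = None
    return regions

def regions_with_sequon(aligned, threshold=0.95, max_len=20):
    """Conserved regions whose consensus segment contains an N-X-S/T sequon."""
    cons = consensus(aligned)
    hits = []
    for start, end in conserved_regions(aligned, threshold, max_len):
        segment = cons[start:end]
        if SEQUON.search(segment):
            hits.append((start, end, segment))
    return hits

if __name__ == "__main__":
    toy_alignment = ["MKNGTAVLC", "MKNGTAVLC", "MKNGTSVLC", "MRNGTAVLC"]
    print(conserved_regions(toy_alignment))    # [(0, 1), (2, 5), (6, 9)]
    print(regions_with_sequon(toy_alignment))  # [(2, 5, 'NGT')]
```

Two simplifications are worth noting: gap characters are treated like any other residue, and chunking a long run at `max_len` can split a sequon across two chunks, so a production filter would need to handle both.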

3.1. Analysis and verification

After three series of computations, we obtained consensus sequences with high conservation, B-cell epitopes, and N-linked glycosylation, as shown in Fig. 3 [31]. In Fig. 3a, groups 1 and 2 are classified HA sequences and group 1 + 2 is unclassified HA sequences. In Fig. 3b, groups 1, 2, and 3 are classified NA sequences and group 1 + 2 + 3 is unclassified NA sequences. To analyze the population coverage of the consensus sequences of all datasets, we used the Epitope Conservancy Analysis tool. Computation was performed on grouped and ungrouped datasets. In the HA dataset, we found 13,155 and 6,945 consensus sequences of groups 1 and 2, respectively, while in the NA dataset, we found 8,442, 5,820, and 3 consensus sequences of groups 1, 2, and 3, respectively. The ungrouped dataset was the sum of all grouped datasets. We calculated the conservancy of each featured consensus sequence against the entire influenza database with identity threshold = 90%. We also aggregated the frequency for each consensus sequence with identity ≥ 90%. The final results are shown in Fig. 4.

Fig. 3. Consensus sequences of (a) HA and (b) NA.

Fig. 4. Statistics of the consensus sequences of (a) HA and (b) NA.

Next, we used the consensus sequences of the grouped datasets to compare the coverage ratios of the target identity region with the neutralizing antibodies of the influenza A virus [32–34]. The IEDB neutralizing antibodies F10 and C179 [35] were selected to calculate the coverage ratios against group 1 of the HA dataset, and antibody 39.29 [36] was selected to calculate the coverage ratios against group 2 of the HA dataset. The selected regions (SRs; target identity regions) showed much higher coverage ratios than the IEDB results for F10 and C179, but the SR coverage ratio was lower for antibody 39.29 (Tables 3 and 4). Finally, we obtained, from the IEDB, the NA protein sequence sites at which the anti-influenza drug [37] oseltamivir binds to the influenza A virus and compared our results with those predicted by the IEDB (Tables 5 and 6). In group 1 of the NA dataset, the coverage ratio for oseltamivir was 33.3%, which was less than the coverage ratio of 50.0% predicted by the IEDB, but the SRs not only covered important sites but also contained N-glycosylation sites. In group 2 of the NA dataset, the coverage ratio for oseltamivir was 50.0%, but the IEDB results did not show any coverage of drug-binding sites.

Table 3
Coverage ratios of group 1 of the HA dataset.

Dataset       Compared epitope   Coverage ratio
IEDB_result   F10                6.7%
IEDB_result   C179               5.3%
SR            F10                33.3%
SR            C179               52.6%

Table 4
Coverage ratios of group 2 of the HA dataset.

Dataset       Compared epitope   Coverage ratio
IEDB_result   39.29              30.4%
SR            39.29              21.7%

Table 5
Coverage ratios of group 1 of the NA dataset.

Epitope       Comparing drug   Accuracy
IEDB_result   Oseltamivir      50.0%
SR            Oseltamivir      33.3%

Table 6
Coverage ratio of group 2 of the NA dataset.

Epitope       Comparing drug   Accuracy
SR            Oseltamivir      50.0%

4. Discussion

4.1. Population coverage of consensus sequences

Fig. 4a and b show the frequency of the consensus sequences of the HA and NA datasets, respectively, with high conservancy (identity ≥ 90%). Furthermore, we converted the data in Fig. 4 to a normal distribution, as shown in Fig. 5. For both HA and NA, the ungrouped dataset has fewer consensus sequences than the grouped datasets in the higher-density section, implying that the ungrouped dataset has lower coverage ratios of the consensus sequences. This result suggested that consensus sequences with high conservation, B-cell epitopes, and N-linked glycosylation calculated by our machine learning-based prediction method have high target identity in all the other subtypes of the influenza virus. These high-target-identity regions are the most important target sequences to search for when developing universal influenza vaccines.

Fig. 5. Normal distribution of the consensus sequences of (a) HA and (b) NA.

4.2. Coverage ratios compared to the IEDB results

The epitopes recognized by neutralizing antibodies are the target identity regions of the influenza vaccine. The higher the coverage ratio, the better the vaccine's immunity. As mentioned earlier, the SRs in our study showed much higher coverage ratios for the neutralizing antibodies F10 and C179 and a lower coverage ratio for antibody 39.29. In Tables 5 and 6, we also showed that our machine learning-based prediction method can obtain the drug-binding sites at which the anti-influenza drug works to inhibit the virus. These results indicated that the machine learning-based prediction method we proposed is effective in locating epitopes to design influenza vaccines, and the high population coverage could facilitate the development of universal vaccines.

5. Conclusion

In this study, we selected the representative strain of each influenza A virus subtype with the highest conservation of surface protein sequences. Using the sequence similarity of each representative strain, we further grouped all influenza A virus subtypes. Current research has shown that correct grouping can lead to an understanding of the rapid evolution of the influenza A virus and help us find a representative strain suitable for vaccine production. In this study, however, we added analysis of linear B-cell epitopes and N-linked glycosylation sites to define a consensus region of viral surface proteins. These two genetic features of the protein sequence are important to the mechanisms by which the influenza A virus infects host cells. By adding them to the sequence conservation used in grouping, we found that the consensus regions had higher population coverage for different influenza A virus subtype sequences. The results confirmed that the SRs have higher coverage ratios. This means that our machine learning-based prediction method is suitable for finding target sequences during vaccine research. Moreover, since the SRs also have high target identity for different influenza A virus subtype sequences, our machine learning-based prediction method can also be used as a strategy for developing universal influenza A vaccines.

Despite these advantages, there is room for improvement in this study. First, the linear B-cell epitope and N-linked glycosylation sites were determined by predictive software, so the accuracy of prediction was a key factor. How to choose higher-accuracy software is an important research issue. Second, comparisons should be made with more neutralizing antibodies in order to understand the differences in their target identity regions and obtain more information about vaccine development.
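The coverage ratios reported in Tables 3 to 6 are given without a formula. One plausible reading, stated here as an assumption rather than the authors' definition, is the fraction of an epitope's (or drug-binding site's) residue positions that fall inside the selected regions. Under that reading, values such as 6.7% and 33.3% would be consistent with 1 and 5 of 15 positions, although the paper does not state the counts. The sketch below uses invented positions and a hypothetical function name, `coverage_ratio`.

```python
def coverage_ratio(epitope_positions, regions):
    """Fraction of epitope positions lying inside any (start, end) region.

    Regions use inclusive residue numbering on both ends (an assumption).
    """
    positions = set(epitope_positions)
    if not positions:
        raise ValueError("epitope has no positions")
    covered = {p for p in positions if any(start <= p <= end for start, end in regions)}
    return len(covered) / len(positions)

if __name__ == "__main__":
    # Invented 15-residue epitope and two invented selected regions.
    epitope = list(range(101, 116))
    selected_regions = [(90, 103), (114, 120)]
    ratio = coverage_ratio(epitope, selected_regions)
    print(f"{ratio:.1%}")  # 33.3%
```

Counting positions rather than whole regions keeps the ratio comparable between epitopes of different lengths, which matters when contrasting results for different antibodies.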

Acknowledgments

This research was supported by (a) the Ministry of Science and Technology, Taiwan, ROC, under grant numbers 106-2221-E-005-077-MY2, 107-2634-F-005-002, and 107-2321-B-005-013, and (b) National Chung Hsing University and Chung-Shan Medical University under grant number NCHU-CSMU-10705.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.mbs.2019.108217.

References

[1] D.A. Steinhauer, J.J. Skehel, Genetics of influenza viruses, Annu. Rev. Genet. 36 (2002) 305–332.
[2] J. Treanor, Influenza vaccine—outmaneuvering antigenic shift and drift, N. Engl. J. Med. 350 (3) (2004) 218–220.
[3] A.W. Hampson, Influenza virus antigens and ‘antigenic drift’, Perspect. Med. Virol. 7 (2002) 49–85.
[4] N.P. Johnson, J. Mueller, Updating the accounts: global mortality of the 1918-1920 "Spanish" influenza pandemic, Bull. Hist. Med. 76 (1) (2002) 105–115.
[5] H. Hayashida, H. Toh, R. Kikuno, T. Miyata, Evolution of influenza virus genes, Mol. Biol. Evol. 2 (4) (1985) 289–303.
[6] H. Hayashida, H. Toh, R. Kikuno, T. Miyata, Evolution of influenza virus genes, Mol. Biol. Evol. 2 (4) (1985) 289–303.
[7] C. Gerdil, The annual production cycle for influenza vaccine, Vaccine 21 (16) (2003) 1776–1779.
[8] WHO, WHO Position paper influenza vaccines, Weekly Epidemiol. Record 36 (2005) 277–288.
[9] M.P. Girard, J.S. Tam, O.M. Assossou, M.P. Kieny, The 2009 A (H1N1) influenza virus pandemic: a review, Vaccine 28 (31) (2010) 4895–4902.
[10] H. Zhang, L. Wang, R.W. Compans, B.Z. Wang, Universal influenza vaccines, a dream to be realized soon, Viruses 6 (5) (2014) 1974–1991.
[11] L. Wang, H. Zhang, R.W. Compans, B.-Z. Wang, Universal influenza vaccines–a short review, J. Immunol. Clinical Res. 1 (2013) 1003–1009.
[12] N. Pica, P. Palese, Toward a universal influenza virus vaccine: prospects and challenges, Annu. Rev. Med. 64 (2013) 189–202.
[13] Y. Bao, P. Bolotov, D. Dernovoy, B. Kiryutin, L. Zaslavsky, T. Tatusova, J. Ostell, D. Lipman, The influenza virus resource at the National Center for Biotechnology Information, J. Virol. 82 (2) (2008) 596–601.
[14] Q. Zhang, P. Wang, Y. Kim, P. Haste-Andersen, J. Beaver, P.E. Bourne, H.-H. Bui, S. Buus, S. Frankild, J. Greenbaum, Immune epitope database analysis resource (IEDB-AR), Nucleic Acids Res. 36 (Suppl 2) (2008) W513–W518.
[15] Q. Zou, J. Zeng, L. Cao, R. Ji, A novel features ranking metric with application to scalable visual and bioinformatics data classification, Neurocomputing 173, Part 2 (2016) 346–354.
[16] M.A. Larkin, G. Blackshields, N. Brown, R. Chenna, P.A. McGettigan, H. McWilliam, F. Valentin, I.M. Wallace, A. Wilm, R. Lopez, Clustal W and Clustal X version 2.0, Bioinformatics 23 (21) (2007) 2947–2948.
[17] K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: molecular evolutionary genetics analysis version 6.0, Mol. Biol. Evol. 30 (12) (2013) 2725–2729.
[18] J.J. Skehel, D.C. Wiley, Receptor binding and membrane fusion in virus entry: the influenza hemagglutinin, Annu. Rev. Biochem. 69 (2000) 531–569.
[19] J.L. McKimm-Breschkin, Influenza neuraminidase inhibitors: antiviral action and mechanisms of resistance, Influenza Other Respir. Viruses 7 (Suppl 1) (2013) 25–36.
[20] Y. Suzuki, Sialobiology of influenza: molecular mechanism of host range variation of influenza viruses, Biol. Pharm. Bull. 28 (3) (2005) 399–408.
[21] S. Tong, X. Zhu, Y. Li, M. Shi, J. Zhang, M. Bourgeois, H. Yang, X. Chen, S. Recuenco, J. Gomez, L.M. Chen, A. Johnson, Y. Tao, C. Dreyfus, W. Yu, R. McBride, P.J. Carney, A.T. Gilbert, J. Chang, Z. Guo, C.T. Davis, J.C. Paulson, J. Stevens, C.E. Rupprecht, E.C. Holmes, I.A. Wilson, R.O. Donis, New world bats harbor diverse influenza A viruses, PLoS Pathog. 9 (10) (2013) e1003657.
[22] C. Dreyfus, N.S. Laursen, T. Kwaks, D. Zuijdgeest, R. Khayat, D.C. Ekiert, J.H. Lee, Z. Metlagel, M.V. Bujny, M. Jongeneelen, R. van der Vlugt, M. Lamrani, H.J. Korse, E. Geelen, O. Sahin, M. Sieuwerts, J.P. Brakenhoff, R. Vogels, O.T. Li, L.L. Poon, M. Peiris, W. Koudstaal, A.B. Ward, I.A. Wilson, J. Goudsmit, R.H. Friesen, Highly conserved protective epitopes on influenza B viruses, Science 337 (6100) (2012) 1343–1348.
[23] H. Singh, H.R. Ansari, G.P.S. Raghava, Improved method for linear B-Cell epitope prediction using antigen's primary sequence, PLoS ONE 8 (5) (2013) e62216.
[24] R. Wagner, T. Wolff, A. Herwig, S. Pleschka, H.D. Klenk, Interdependence of hemagglutinin glycosylation and neuraminidase as regulators of influenza virus growth: a study by reverse genetics, J. Virol. 74 (14) (2000) 6316–6323.
[25] V.C. Chu, G.R. Whittaker, Influenza virus entry and infection require host cell N-linked glycoprotein, Proc. Natl. Acad. Sci. 101 (52) (2004) 18153–18158.
[26] K.I. Hidari, S. Shimada, Y. Suzuki, T. Suzuki, Binding kinetics of influenza viruses to sialic acid-containing carbohydrates, Glycoconj. J. 24 (9) (2007) 583–590.
[27] J. Stevens, O. Blixt, J.C. Paulson, I.A. Wilson, Glycan microarray technologies: tools to survey host specificity of influenza viruses, Nat. Rev. Microbiol. 4 (11) (2006) 857–864.
[28] M. Zhang, B. Gaschen, W. Blay, B. Foley, N. Haigwood, C. Kuiken, B. Korber, Tracking global patterns of N-linked glycosylation site variation in highly variable viral glycoproteins: HIV, SIV, and HCV envelopes and influenza hemagglutinin, Glycobiology 14 (12) (2004) 1229–1246.
[29] S. Sun, Q. Wang, F. Zhao, W. Chen, Z. Li, Glycosylation site alteration in the evolution of influenza A (H1N1) viruses, PLoS ONE 6 (7) (2011) e22844.
[30] M.B. Johansen, L. Kiemer, S. Brunak, Analysis and prediction of mammalian protein glycation, Glycobiology 16 (2006) 844–853.
[31] A.C. Shih, D.T. Lee, C.L. Peng, Y.W. Wu, Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences, BMC Bioinformatics 8 (2007) 63.
[32] M. Knossow, M. Gaudier, A. Douglas, B. Barrere, T. Bizebard, C. Barbey, B. Gigant, J.J. Skehel, Mechanism of neutralization of influenza virus infectivity by antibodies, Virology 302 (2) (2002) 294–298.
[33] J. Sun, U.J. Kudahl, C. Simon, Z. Cao, E.L. Reinherz, V. Brusic, Large-scale analysis of B-cell epitopes on influenza virus hemagglutinin—implications for cross-reactivity of neutralizing antibodies, Front. Immunol. 5 (2014) 38.
[34] D.C. Ekiert, R.H. Friesen, G. Bhabha, T. Kwaks, M. Jongeneelen, W. Yu, C. Ophorst, F. Cox, H.J. Korse, B. Brandenburg, R. Vogels, J.P. Brakenhoff, R. Kompier, M.H. Koldijk, L.A. Cornelissen, L.L. Poon, M. Peiris, W. Koudstaal, I.A. Wilson, J. Goudsmit, A highly conserved neutralizing epitope on group 2 influenza A viruses, Science 333 (6044) (2011) 843–850.
[35] C. Dreyfus, D.C. Ekiert, I.A. Wilson, Structure of a classical broadly neutralizing stem antibody in complex with a pandemic H2 influenza virus hemagglutinin, J. Virol. 87 (12) (2013) 7149–7154.
[36] G. Nakamura, N. Chai, S. Park, N. Chiang, Z. Lin, H. Chiu, R. Fong, D. Yan, J. Kim, J. Zhang, W.P. Lee, A. Estevez, M. Coons, M. Xu, P. Lupardus, M. Balazs, L.R. Swem, An in vivo human-plasmablast enrichment technique allows rapid identification of therapeutic influenza A antibodies, Cell Host Microbe 14 (1) (2013) 93–103.
[37] R.J. Russell, L.F. Haire, D.J. Stevens, P.J. Collins, Y.P. Lin, G.M. Blackburn, A.J. Hay, S.J. Gamblin, J.J. Skehel, The structure of H5N1 avian influenza neuraminidase suggests new opportunities for drug design, Nature 443 (7107) (2006) 45–49.