The fit of codon usage of human-isolated avian influenza A viruses to human


Journal Pre-proof The fit of codon usage of human-isolated avian influenza A viruses to human

Wen Luo, Lin Tian, Yingde Gan, Enlong Chen, Xuejuan Shen, Junbin Pan, David M. Irwin, Rui-Ai Chen, Yongyi Shen

PII: S1567-1348(20)30013-7

DOI: https://doi.org/10.1016/j.meegid.2020.104181

Reference: MEEGID 104181

To appear in: Infection, Genetics and Evolution

Received date: 8 July 2019

Revised date: 14 December 2019

Accepted date: 5 January 2020

Please cite this article as: W. Luo, L. Tian, Y. Gan, et al., The fit of codon usage of human-isolated avian influenza A viruses to human, Infection, Genetics and Evolution (2019), https://doi.org/10.1016/j.meegid.2020.104181

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier.

The fit of codon usage of human-isolated avian influenza A viruses to human

Wen Luo1, Lin Tian2, Yingde Gan1, Enlong Chen1, Xuejuan Shen1, Junbin Pan1, David M. Irwin3,4, Rui-Ai Chen1,5*, Yongyi Shen1,5*

1 College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China;

2 Guangdong Provincial Hospital of Chinese Medicine, Zhuhai 519015, China;

3 Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, M5S 1A8, Canada;

4 Banting and Best Diabetes Centre, University of Toronto, Toronto, M5S 1A8, Canada;

5 Zhaoqing Institute of Biotechnology, Zhaoqing 526238, China.

* Corresponding authors: Yongyi Shen: [email protected]; Rui-Ai Chen: [email protected]

ABSTRACT

Avian influenza A viruses (AIVs) are classified into 18 hemagglutinin (HA) and 11 neuraminidase (NA) subtypes. Although the H1N1 and H3N2 subtypes usually circulate among humans and cause infection, H5, H6, H7, H9, and H10 viruses that circulate in poultry occasionally also infect humans, especially H5N1 and H7N9. Efficient virus replication is a critical factor that influences infection. Because the codon usage of a virus must coevolve with its host for efficient viral replication, we conduct a comprehensive analysis of codon usage bias in human-isolated AIVs to test their adaptation to the host expression system. The relative synonymous codon usage (RSCU) pattern and the codon adaptation index (CAI) are calculated for this purpose. We find that all human-isolated AIVs tend to reduce their GC and CpG content, which may prevent activation of the host innate immune system. Although codon usage differs between AIV subtypes, our data support the conclusion that natural selection has played a major role, and mutation pressure a minor role, in shaping codon usage bias in all AIVs. We find that the codon usage of the genes encoding the surface proteins of H5N1, and of the polymerase genes of H7N9, fits the human expression system better. This may be associated with their better replication and infection in humans.

Keywords: codon usage; avian influenza A viruses; H5N1; H7N9; adaptation.

1. Introduction

Avian influenza A viruses (AIVs) are classified into 18 hemagglutinin (HA) and 11 neuraminidase (NA) subtypes (Long et al., 2019). Wild aquatic birds serve as the animal reservoir for most subtypes of AIVs (Long et al., 2019). In humans, the H1N1 and H3N2 subtypes usually cause seasonal infections or pandemics, but H5, H6, H7, H9, and H10 viruses that circulate in poultry occasionally also infect humans (Paules and Subbarao, 2017), especially H5N1 and H7N9 (Webby and Yang, 2017). Highly pathogenic avian influenza (HPAI) H5N1 virus was first identified in human infections in 1996 (Peiris et al., 2007). Since then, this subtype has circulated in migratory birds and caused large-scale poultry outbreaks in Asia, Europe, and Africa, along with over 800 human cases with about 53% lethality (WHO, 2019). Recently, H7N9 has surpassed H5N1 in the number of human infections since its emergence in China in 2013 (WHO, 2017).

The ability of a viral strain to cause disease is closely associated with its virulence, quantity, and portal of invasion, in combination with host immunity and environmental factors, among other variables. Efficient virus replication, i.e., the efficient production of new viruses, is a critical factor that also influences infection. Viruses depend on their hosts’ cellular structure and metabolism to replicate and assemble. Most virus genomes do not encode tRNAs and, consequently, the translation of viral proteins requires host tRNAs (Kumar et al., 2016). The molecular bases for efficient virus replication and transmission are complex and multifactorial (Harwig et al., 2017). Nevertheless, codon usage patterns of viruses reflect the evolutionary changes that allow them to optimize their survival and fitness in their hosts (Burns et al., 2006; Costafreda et al., 2014; Mueller et al., 2006; Tian et al., 2018). Because influenza virus replication relies on its host's expression systems, virus codon usage must coevolve with its host to use these host resources efficiently (Smith et al., 2018). Accordingly, greater similarity of overall codon usage should help virus replication.

Although the codon usage and base composition of avian and mammalian influenza viruses have been widely studied (Anhlan et al., 2011; Bera et al., 2017; Deka et al., 2018; Goni et al., 2012; Kumar et al., 2016; Li et al., 2018; Luo et al., 2019a; Luo et al., 2019b; Wong et al., 2010), little is known about the differences in codon usage among human-isolated AIVs. Host barriers restrict interspecies transmission of AIVs; nevertheless, some subtypes, especially H5 and H7, occasionally infect humans. Increased replication is necessary for AIVs to cross the species barrier and to adapt for replication and transmission in mammals (Herfst et al., 2014). Matching viral and host codon usage can enhance the translation of viral proteins (Carnero et al., 2009; Wang et al., 2006; Zhao and Chen, 2011). Herein, we hypothesize that the AIV subtypes that more easily infect humans have codon usage that corresponds better to that of humans. To test this hypothesis, we calculate the relative synonymous codon usage (RSCU) and codon adaptation index (CAI) values to assess how well these human-isolated AIVs correspond to the host (human) codon usage pattern.

2. Materials and methods

2.1 Data collection

Gene sequences of human-isolated AIVs were downloaded from the NCBI Influenza Virus Resource (http://www.ncbi.nlm.nih.gov/genomes/FLU), the Influenza Research Database (IRD) (www.fludb.org/brc/home.spg?decorator=influenza), and the Global Initiative on Sharing Avian Influenza Data (www.gisaid.org). Redundant sequences, laboratory strains, and short sequences (<90% of the length of the corresponding gene) were removed. Because fewer than five genomes were available for each of the human-isolated H6N1, H7N2, H7N3, H7N4, H7N7, and H10N8 subtypes, these subtypes were excluded from the subsequent analyses. Our final dataset contained 241 human-isolated H5N1 genomes, 31 human-isolated H5N6 genomes, 1013 human-isolated H7N9 genomes, and 21 human-isolated H9N2 genomes. Open reading frames (ORFs) for all eight viral genes (PB2, PB1, PA, HA, NP, NS, MP, and NA) were used. The accession numbers and other detailed information on these viruses, such as strain names, gene segments, and subtypes, are shown in Supplementary table 1.
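The length and redundancy filters described above can be illustrated with a minimal sketch. It assumes that "redundant" means an identical nucleotide sequence and that the full length of each gene is supplied by the caller; the function name `filter_sequences` and its parameters are hypothetical and are not taken from the original workflow. Removal of laboratory strains depends on database annotations and is not covered here.

```python
def filter_sequences(records, full_length, min_fraction=0.9):
    """Drop short and duplicate sequences from (name, sequence) pairs.

    Assumptions (not from the paper): duplicates are exact sequence
    matches, and full_length is the length of the complete gene.
    """
    seen = set()
    kept = []
    for name, seq in records:
        seq = seq.upper()
        if len(seq) < min_fraction * full_length:
            continue  # shorter than 90% of the corresponding gene
        if seq in seen:
            continue  # identical to a sequence that was already kept
        seen.add(seq)
        kept.append((name, seq))
    return kept
```

Calling `filter_sequences(records, full_length=1701)` on a list of HA records, for example, would keep only the first copy of each distinct sequence of at least 1531 nucleotides; the value 1701 is only an illustrative length.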

2.2 Preliminary analyses

Codon usage, nucleotide composition, the effective number of codons (ENc), and the relative synonymous codon usage (RSCU) (Sharp and Li, 1986) were calculated using the program CodonW (http://sourceforge.net/projects/codonw). All coding sequences of the host (human) were obtained from the Ensembl database (http://www.ensembl.org). The Wilcoxon rank sum test was used to compare the medians of continuous variables.
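For readers unfamiliar with RSCU, the following sketch shows the usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is an illustration only and is not the CodonW implementation; the function name `rscu` is hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(seq):
    """Return RSCU values for the 59 codons that have synonymous alternatives."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # skip stop codons, Met (ATG) and Trp (TGG)
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            # observed count / (family total / number of synonymous codons)
            values[codon] = counts[codon] * len(family) / total
    return values
```

An RSCU value above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often.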

2.3 Estimating fit of virus to host's expression system

The codon adaptation index (CAI) predicts the level of gene expression and the adaptation of viral genes to their hosts. The influenza coding sequences were analyzed with a local version of the CAIcal server (http://genomes.urv.cat/CAIcal/). CAI values range from 0 to 1. Viral sequences with higher CAI values are considered better adapted to their hosts than those with lower CAI values (Sharp and Li, 1987).
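The CAI of Sharp and Li (1987) is the geometric mean of the relative adaptiveness values of the codons in a gene, where the relative adaptiveness of a codon is its frequency in a reference set divided by the frequency of the most used synonymous codon. The sketch below illustrates that definition under stated assumptions: the reference set is passed in as raw sequences, unobserved codons receive a pseudocount of 0.5 (one common convention), and the function names are hypothetical. It is not the CAIcal implementation.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_seqs):
    """Compute w for each codon from a reference set (e.g. host genes)."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp carry no information
        best = max(counts[c] for c in family)
        if best == 0:
            continue  # amino acid absent from the reference set
        for codon in family:
            # Assumed pseudocount of 0.5 avoids log(0) for unseen codons.
            w[codon] = max(counts[codon], 0.5) / best
    return w


def cai(seq, w):
    """Geometric mean of w over the informative codons of seq."""
    logs = [math.log(w[c]) for c in split_codons(seq) if c in w]
    if not logs:
        raise ValueError("sequence has no codons covered by the reference set")
    return math.exp(sum(logs) / len(logs))
```

In practice the reference would be a human codon usage table rather than a handful of sequences; the structure of the calculation is the same.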

An ENc-plot mapping analysis was used to identify factors that influence codon usage bias. ENc values were plotted against GC3s values (Wright, 1990). The expected ENc value for each GC3s was calculated using the following formula:

$$\mathrm{ENc}_{\mathrm{expected}} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$

where s represents the GC3s value. Points falling near the expected curve indicate that mutation pressure was the main force shaping codon usage. Alternatively, when points fall considerably below the expected curve, selection appears to be the main force shaping codon usage.
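As an illustration of the ENc-plot comparison, the sketch below evaluates the expected curve of Wright (1990), commonly written as ENc = 2 + s + 29/(s² + (1 − s)²), and checks whether an observed point lies below it. The function names are hypothetical, and no threshold for "considerably below" is implied.

```python
def expected_enc(gc3s):
    """Expected ENc when codon usage is determined only by GC3s."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def below_expected_curve(enc, gc3s):
    """True when the observed ENc falls under the expected curve."""
    return enc < expected_enc(gc3s)
```

For example, with the mean values reported for H5N1-HA in Table 1 (ENc 49.506, GC3s 0.397), the expected ENc is about 58.0, so the observed point lies below the curve.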

A neutrality plot analysis was performed to assess the effects of natural selection and mutation pressure on codon usage patterns by plotting the P12 (GC12) values against the P3 (GC3s) values of the synonymous codons (Sueoka, 1988). GC12 served as the ordinate and GC3s as the abscissa, and each dot represented a virus strain. When all points lie along the diagonal, the three codon positions do not differ significantly, which indicates no or only weak external selection pressure. In contrast, if the regression line deviates from the diagonal or runs parallel to the horizontal axis, then GC12 and GC3s have little or no correlation, which indicates that selection outweighs mutation pressure.
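The neutrality plot reduces to a simple linear regression of GC12 on GC3s across strains. The sketch below is a minimal illustration with hypothetical function names. It assumes that GC3s is computed over all codons except ATG, TGG, and the stop codons, and that the regression slope is read as the share of mutation pressure (with one minus the slope as the share of selection), which matches how the slopes are interpreted in the Results.

```python
# Codons without synonymous alternatives, excluded from GC3s (assumption).
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def is_gc(base):
    return base in "GC"


def gc12_gc3s(seq):
    """Return (GC12, GC3s) for one coding sequence."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    gc1 = sum(is_gc(c[0]) for c in codons) / len(codons)
    gc2 = sum(is_gc(c[1]) for c in codons) / len(codons)
    synonymous = [c for c in codons if c not in NON_SYNONYMOUS]
    gc3s = sum(is_gc(c[2]) for c in synonymous) / len(synonymous)
    return (gc1 + gc2) / 2.0, gc3s


def regression_slope(xs, ys):
    """Least-squares slope of ys (GC12) regressed on xs (GC3s)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def pressure_shares(slope):
    """Return (mutation %, selection %) implied by a neutrality-plot slope."""
    return slope * 100.0, (1.0 - slope) * 100.0
```

A slope of 0.45, for instance, would be read as 45% mutation pressure and 55% natural selection.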

2.4 Multivariate analysis

The relationship between variables and samples was examined using multivariate correspondence analysis (CA), which yields a geometrical representation of the sets of rows and columns in a dataset (Greenacre, 1984). Each ORF was represented as a 59-dimensional vector, with each dimension corresponding to the RSCU value of one codon (all triplets excluding AUG, UGG, and the stop triplets). Major trends within the dataset were determined using measures of relative inertia, and genes were ordered according to their positions along the different axes. CA was performed on the RSCU values using the program CodonW.

3. Results

3.1 Base composition of different human-isolated subtypes

To reveal the potential influence of compositional constraints on codon usage bias, the nucleotide compositions of all eight viral genes were determined. As shown in Table 1 and Supplementary table 1, the mean composition of nucleotide A was the highest in all eight viral genes, with the nucleotides at the third position of synonymous codons (A3s, T3s, G3s, and C3s) showing similar compositional trends in each gene. The GC3s values were lower than 50%, and the mean AT compositions were higher than the GC compositions. The human genome had a higher GC composition than the human-isolated AIVs. The effective numbers of codons (ENc) of human-isolated AIVs were in the range of 49–57.

Codon AGA (Arg) was overwhelmingly favored in all eight viral genes, whereas ACG (Thr), TCG (Ser), CGC (Arg), GCG (Ala), and CGT (Arg) were rarely used, which revealed the tendency of human-isolated AIVs to reduce the usage of the dinucleotide CpG (Figure 1). As shown in Table 2, the mean composition of the dinucleotide ApA was the highest, while that of the dinucleotide CpG was the lowest, in all eight viral genes.
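Dinucleotide composition of the kind summarized in Table 2 can be obtained by counting overlapping pairs of adjacent nucleotides. The sketch below is illustrative only, with hypothetical function names. The observed/expected CpG ratio it also computes is a commonly used measure of CpG depletion and is an addition here; the text above reports dinucleotide composition, not this ratio.

```python
from collections import Counter


def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in seq."""
    seq = seq.upper().replace("U", "T")
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def cpg_odds_ratio(seq):
    """Observed CpG frequency divided by f(C) * f(G); below 1 means depletion."""
    seq = seq.upper().replace("U", "T")
    n_bases = sum(seq.count(b) for b in "ACGT")
    if n_bases == 0:
        return 0.0
    f_c = seq.count("C") / n_bases
    f_g = seq.count("G") / n_bases
    if f_c == 0 or f_g == 0:
        return 0.0  # ratio undefined without both C and G; report 0
    return dinucleotide_frequencies(seq).get("CG", 0.0) / (f_c * f_g)
```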

3.2 Codon usage variation between different human-isolated subtypes and hosts

A CAI analysis was performed to predict the level of gene expression and the adaptation of viral genes to their hosts. Comparison of CAI values revealed that the HA, NA, and MP genes of H5N1 are the best adapted to the human host expression system among the human-isolated AIVs (Figure 2). The HA gene of H7N9 shows the worst fit to human among the AIVs (Figure 2), while the PB1 and PB2 genes of H7N9 have higher CAI values than those of H5N1, H5N6, and H9N2.

The CA of RSCU values (Supplementary table 2) yielded insights into the trends of codon usage variation among the different subtypes. The space defined by the first three principal axes of the CA separated the different subtypes (Figure 3), and the locations of the MP, NP, NS, PA, PB1, and PB2 genes of H7N9 were close to those of H9N2.

3.3 Forces influencing codon usage patterns of AIVs

The ENc-plot analysis identified forces that shaped the codon usage patterns of AIVs. ENc denotes the effective number of codons, and GC3s denotes the GC content at the third position of synonymous codons. ENc values of all subtypes were below the expected ENc curve (Figure 4), indicating that it was not mutation pressure but other factors, including natural selection, that drove codon usage bias in AIVs.

The neutrality plot analysis identified the main factor shaping the codon usage of AIVs (Figure 5). For the HA gene, the slopes for H5Nx (0.02 for H5N1 and 0.09 for H5N6) were much lower than those for H7N9 (0.20) and H9N2 (0.29). For the NA gene, the slopes varied from 6.75E-3 to 0.17, and the slope of NA-H5N6 (6.75E-3) was the lowest. For the six internal genes, the slopes of MP varied from 3.84E-3 to 0.13, NP from 0.03 to 0.09, NS from 0.16 to 0.45, PA from 0.01 to 0.18, PB1 from 0.06 to 0.11, and PB2 from 0.02 to 0.08. According to this neutrality plot analysis, the influence of mutation pressure ranged from 1% to 45%, while natural selection accounted for 55%–99% of the total pressure acting on the evolution of codon usage in all eight genes.

4. Discussion

AIVs rely on their hosts for replication, and thus codon usage bias plays a role in their evolution. Because nucleotide composition strongly influences codon usage bias, the nucleotide compositions of all human-isolated AIVs were analyzed in the present study. All AIVs had much lower GC, GC3s, and CpG compositions than human (Tables 1 and 2). Previous studies showed that reduced GC content in AIVs may prevent activation of the host innate immune system (Greenbaum et al., 2009), or may form less stable viral mRNA structures and thereby allow more efficient viral RNA translation (Brower-Sinning et al., 2009). The reduced CpG dinucleotide content might relate to the anti-viral response of the cell (Karlin et al., 1994).

The effective number of codons (ENc) reflects variation in codon usage. ENc values of these human-isolated AIVs range from 49 to 57. An overall ENc value >40 indicates low codon usage bias in these AIVs. Low codon usage bias has also been observed among many RNA viruses, such as Ebola (ENc, 57.23) (Cristina et al., 2015), rabies (Zhang et al., 2018), chikungunya (ENc, 55.56) (Butt et al., 2014), Zika (ENc, 53.93) (Butt et al., 2016), and hepatitis C (ENc, 52.62) (Hu et al., 2011). Low codon bias in RNA viruses may promote efficient virus replication in host cells by reducing the competition for the synthesis machinery between the virus and the host, which potentially have distinct codon preferences (Butt et al., 2016).

ENc–GC3s plots identify the forces that drive codon usage bias (Figure 4). All subtypes fall under the expected curve, which indicates that, in addition to mutation pressure, other factors including natural selection drive codon usage bias in AIVs. The neutrality plot analysis further shows that natural selection is the predominant force (55%–99%) compared to mutation pressure (Figure 5).

Viruses depend on their hosts’ cellular structure and metabolism to replicate and assemble. Codon usage patterns of viruses reflect the evolutionary changes that allow them to optimize their survival and fitness in their hosts (He et al., 2017; Hu et al., 2011; Luo et al., 2019a; Rahman et al., 2017; Su et al., 2017; Taylor et al., 2017; Zang et al., 2017). Thus, we compared the codon usage bias of human-isolated AIVs to that of their host. Our findings support that the HA, NA, and MP genes of H5N1 are the best adapted to human host expression among the human-isolated AIVs (Figure 2). HA, NA, and MP encode the three surface proteins of the AIV envelope. HA is a major surface glycoprotein of influenza A virus (IAV) and is involved in viral infection via binding to sialic acid (SA)-containing glycans on the surface of the host cell. NA is the other major glycoprotein of IAV and cleaves SA on the host cell receptor to facilitate the release of viral particles to infect other cells. MP encodes the third surface protein of IAV, which has ion channel activity that regulates virus penetration and uncoating. Patients with H5N1 infections had relatively short incubation and survival periods and high mortality rates compared with patients with H7N9 infections (Cowling et al., 2013). The good fit of the three surface protein genes of H5N1 AIVs to the human codon usage pattern may contribute to the rapid disease progression and poor prognosis of patients with H5N1.

The H7N9 subtype has infected humans multiple times since 2013 and has surpassed H5N1 in the number of laboratory-confirmed cases, despite limited dissemination outside of China (Webby and Yang, 2017; Xiang et al., 2018). The HA gene of H7N9 shows the worst fit to human among the AIVs (Figure 2), whereas the PB1 and PB2 genes of H7N9 have higher CAI values than those of H5N1 and H9N2. The influenza virus polymerase, a heterotrimer composed of three subunits, PA, PB1, and PB2, is responsible for replication and transcription of the viral RNA genome in the nuclei of infected cells. The PB2 subunit binds the 5′ 7-methylguanosine cap of host pre-mRNAs, which are subsequently cleaved 10–15 nucleotides downstream by PA (Dias et al., 2009). The PB1 protein, a viral RNA-dependent RNA polymerase, catalyzes the addition of nucleotides to the resulting short capped RNA primer and initiates viral transcription. The better fit of the codon usage of PB1 and PB2 would help H7N9 to replicate in human cells. This may explain the greater number of human infections with H7N9 compared to H5N1 and H9N2.

5. Conclusions

Our findings reveal that the codon usage bias of all human-isolated AIVs is weak and is governed mainly by natural selection. All human-isolated AIVs tend to reduce their GC and CpG content, which may prevent the activation of the host innate immune system and help the virus escape the host anti-viral immune response. According to the CAI analysis, the codon usage of the surface protein genes of H5N1 and of the polymerase genes of H7N9 fits the human expression system better. This may be associated with their better replication and infection in humans. The findings of the present study not only aid in understanding the factors underlying the codon usage of all human-isolated AIVs and their fitness in humans, but also contribute to our understanding of the factors that drive AIV evolution.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 31822056), the Natural Science Foundation of Guangdong Province (Grant No. 2014A030306046), the Key Realm R&D Program of Guangdong Province (Grant No. 2019B020211004), the 111 Project, and the third batch of the Zhaoqing Xijiang innovation team project.

CONFLICT OF INTEREST

All authors: No reported conflicts of interest.

References

Anhlan, D., Grundmann, N., Makalowski, W., Ludwig, S., Scholtissek, C., 2011. Origin of the 1918 pandemic H1N1 influenza A virus as studied by codon usage patterns and phylogenetic analysis. RNA 17, 64-73.
Bera, B.C., Virmani, N., Kumar, N., Anand, T., Pavulraj, S., Rash, A., Elton, D., Rash, N., Bhatia, S., Sood, R., Singh, R.K., Tripathi, B.N., 2017. Genetic and codon usage bias analyses of polymerase genes of equine influenza virus and its relation to evolution. BMC Genomics 18, 652.
Brower-Sinning, R., Carter, D.M., Crevar, C.J., Ghedin, E., Ross, T.M., Benos, P.V., 2009. The role of RNA folding free energy in the evolution of the polymerase genes of the influenza A virus. Genome Biol 10, R18.
Burns, C.C., Shaw, J., Campagnoli, R., Jorba, J., Vincent, A., Quay, J., Kew, O., 2006. Modulation of poliovirus replicative fitness in HeLa cells by deoptimization of synonymous codon usage in the capsid region. J Virol 80, 3259-3272.
Butt, A.M., Nasrullah, I., Qamar, R., Tong, Y., 2016. Evolution of codon usage in Zika virus genomes is host and vector specific. Emerg Microbes Infect 5, e107.
Butt, A.M., Nasrullah, I., Tong, Y., 2014. Genome-wide analysis of codon usage and influencing factors in chikungunya viruses. PLoS One 9, e90905.
Carnero, E., Li, W., Borderia, A.V., Moltedo, B., Moran, T., Garcia-Sastre, A., 2009. Optimization of human immunodeficiency virus gag expression by Newcastle disease virus vectors for the induction of potent immune responses. J Virol 83, 584-597.
Costafreda, M.I., Perez-Rodriguez, F.J., D'Andrea, L., Guix, S., Ribes, E., Bosch, A., Pinto, R.M., 2014. Hepatitis A virus adaptation to cellular shutoff is driven by dynamic adjustments of codon usage and results in the selection of populations with altered capsids. J Virol 88, 5029-5041.
Cowling, B.J., Jin, L., Lau, E.H.Y., Liao, Q., Wu, P., Jiang, H., Tsang, T.K., Zheng, J., Fang, V.J., Chang, Z., Ni, M.Y., Zhang, Q., Ip, D.K.M., Yu, J., Li, Y., Wang, L., Tu, W., Meng, L., Wu, J.T., Luo, H., Li, Q., Shu, Y., Li, Z., Feng, Z., Yang, W., Wang, Y., Leung, G.M., Yu, H., 2013. Comparative epidemiology of human infections with avian influenza A(H7N9) and A(H5N1) viruses in China. Lancet 382, 129-137.
Cristina, J., Moreno, P., Moratorio, G., Musto, H., 2015. Genome-wide analysis of codon usage bias in Ebolavirus. Virus Res 196, 87-93.
Deka, H., Nath, D., Uddin, A., Chakraborty, S., 2019. DNA compositional dynamics and codon usage patterns of M1 and M2 matrix protein genes in influenza A virus. Infect Genet Evol 67, 7-16.
Dias, A., Bouvier, D., Crepin, T., McCarthy, A.A., Hart, D.J., Baudin, F., Cusack, S., Ruigrok, R.W., 2009. The cap-snatching endonuclease of influenza virus polymerase resides in the PA subunit. Nature 458, 914-918.
Goni, N., Iriarte, A., Comas, V., Sonora, M., Moreno, P., Moratorio, G., Musto, H., Cristina, J., 2012. Pandemic influenza A virus codon usage revisited: biases, adaptation and implications for vaccine strain development. Virol J 9, 263.
Greenacre, M.J., 1984. Theory and Applications of Correspondence Analysis. Academic Press, London.
Greenbaum, B.D., Rabadan, R., Levine, A.J., 2009. Patterns of oligonucleotide sequences in viral and host cell RNA identify mediators of the host innate immune system. PLoS One 4, e5969.
Harwig, A., Landick, R., Berkhout, B., 2017. The battle of RNA synthesis: virus versus host. Viruses 9, E309.
He, W., Zhang, H., Zhang, Y., Wang, R., Lu, S., Ji, Y., Liu, C., Yuan, P., Su, S., 2017. Codon usage bias in the N gene of rabies virus. Infect Genet Evol 54, 458-465.
Herfst, S., Imai, M., Kawaoka, Y., Fouchier, R.A., 2014. Avian influenza virus transmission to mammals. Curr Top Microbiol Immunol 385, 137-155.
Hu, J.S., Wang, Q.Q., Zhang, J., Chen, H.T., Xu, Z.W., Zhu, L., Ding, Y.Z., Ma, L.N., Xu, K., Gu, Y.X., Liu, Y.S., 2011. The characteristic of codon usage pattern and its evolution of hepatitis C virus. Infect Genet Evol 11, 2098-2102.
Karlin, S., Doerfler, W., Cardon, L.R., 1994. Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses? J Virol 68, 2889-2897.
Kumar, N., Bera, B.C., Greenbaum, B.D., Bhatia, S., Sood, R., Selvaraj, P., Anand, T., Tripathi, B.N., Virmani, N., 2016. Revelation of influencing factors in overall codon usage bias of equine influenza viruses. PLoS One 11, e0154376.
Li, G., Wang, R., Zhang, C., Wang, S., He, W., Zhang, J., Liu, J., Cai, Y., Zhou, J., Su, S., 2018. Genetic and evolutionary analysis of emerging H3N2 canine influenza virus. Emerg Microbes Infect 7, 73.
Long, J.S., Mistry, B., Haslam, S.M., Barclay, W.S., 2019. Host and viral determinants of influenza A virus species specificity. Nat Rev Microbiol 17, 67-81.
Luo, W., lia, Y., Yu, S., Shen, X., Tian, L., Irwin, D.M., Shen, Y., 2019a. Better fit of codon usage of the polymerase and nucleoprotein genes to the chicken host for H7N9 than H9N2 AIVs. J Infect 79, 174-187.
Luo, W., Tian, L., Huang, C., Li, J., Shen, X., Murphy, R.W., Liao, M., Shen, Y., 2019b. The codon usage bias of avian influenza A viruses. J Infect 79, 174-187.
Mueller, S., Papamichail, D., Coleman, J.R., Skiena, S., Wimmer, E., 2006. Reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity. J Virol 80, 9687-9696.
Paules, C., Subbarao, K., 2017. Influenza. Lancet (London, England) 390, 697-708.
Peiris, J.S., de Jong, M.D., Guan, Y., 2007. Avian influenza virus (H5N1): a threat to human health. Clin Microbiol Rev 20, 243-267.
Rahman, S.U., Yao, X., Li, X., Chen, D., Tao, S., 2017. Analysis of codon usage bias of Crimean-Congo hemorrhagic fever virus and its adaptation to hosts. Infect Genet Evol 58, 1-16.
Sharp, P.M., Li, W.H., 1987. The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15, 1281-1295.
Smith, B.L., Chen, G., Wilke, C.O., Krug, R.M., 2018. Avian Influenza Virus PB1 Gene in H3N2 Viruses Evolved in Humans To Reduce Interferon Inhibition by Skewing Codon Usage toward Interferon-Altered tRNA Pools. mBio 9, e01222-01218.
Su, W., Li, X., Chen, M., Dai, W., Sun, S., Wang, S., Sheng, X., Sun, S., Gao, C., Hou, A., Zhou, Y., Sun, B., Gao, F., Xiao, J., Zhang, Z., Jiang, C., 2017. Synonymous codon usage analysis of hand, foot and mouth disease viruses: A comparative study on coxsackievirus A6, A10, A16, and enterovirus 71 from 2008 to 2015. Infect Genet Evol 53, 212-217.
Sueoka, N., 1988. Directional mutation pressure and neutral molecular evolution. Proc Natl Acad Sci U S A 85, 2653-2657.
Taylor, T.L., Dimitrov, K.M., Afonso, C.L., 2017. Genome-wide analysis reveals class and gene specific codon usage adaptation in avian paramyxoviruses 1. Infect Genet Evol 50, 28-37.
Tian, L., Shen, X., Murphy, R.W., Shen, Y., 2018. The adaptation of codon usage of +ssRNA viruses to their hosts. Infect Genet Evol 65, 276-282.
Wang, S., Farfan-Arribas, D.J., Shen, S., Chou, T.H., Hirsch, A., He, F., Lu, S., 2006. Relative contributions of codon usage, promoter efficiency and leader sequence to the antigen expression and immunogenicity of HIV-1 Env DNA vaccine. Vaccine 24, 4531-4540.
Webby, R.J., Yang, Z., 2017. The changing landscape of A H7N9 influenza virus infections in China. Lancet Infect Dis 17, 783-784.
WHO, 2017. Human infection with avian influenza A(H7N9) virus – China (Disease outbreak news, 26 October 2017).
WHO, 2019. Cumulative number of confirmed human cases of avian influenza A(H5N1) reported to WHO.
Wong, E.H., Smith, D.K., Rabadan, R., Peiris, M., Poon, L.L., 2010. Codon usage bias and the evolution of influenza A viruses. Codon Usage Biases of Influenza Virus. BMC Evol Biol 10, 253.
Wright, F., 1990. The 'effective number of codons' used in a gene. Gene 87, 23-29.
Xiang, D., Shen, X., Pu, Z., Irwin, D.M., Liao, M., Shen, Y., 2018. Convergent evolution of human-isolated H7N9 avian influenza A viruses. J Infect Dis 217, 1699-1707.
Zang, M., He, W., Du, F., Wu, G., Wu, B., Zhou, Z., 2017. Analysis of the codon usage of the ORF2 gene of feline calicivirus. Infect Genet Evol 54, 54-59.
Zhang, X., Cai, Y., Zhai, X., Liu, J., Zhao, W., Ji, S., Su, S., Zhou, J., 2018. Comprehensive Analysis of Codon Usage on Rabies Virus and Other Lyssaviruses. Int J Mol Sci 19.
Zhao, K.N., Chen, J., 2011. Codon usage roles in human papillomavirus. Rev Med Virol 21, 397-411.

Figure legends:

Figure 1: Heat map of RSCU of human-isolated AIVs with host.

Figure 2: The codon adaptation index (CAI) of human-isolated AIVs. Human-isolated H5N1 AIVs are shown in red, human-isolated H5N6 in yellow, human-isolated H7N9 in blue, and human-isolated H9N2 in grey.

Figure 3: Multivariate correspondence analysis (CA) of different human-isolated AIVs with human. (A) HA gene. (B) NA gene. (C) MP gene. (D) NP gene. (E) NS gene. (F) PA gene. (G) PB1 gene. (H) PB2 gene. Each subtype is displayed in a 3-dimensional representation. The X, Y and Z axes are in arbitrary scales generated by the CA, and the weight of each codon in these axes varies in different segments.

Figure 4: ENc-plot analysis (ENc plotted against GC3s). (A) HA gene. (B) NA gene. (C) MP gene. (D) NP gene. (E) NS gene. (F) PA gene. (G) PB1 gene. (H) PB2 gene. ENc denotes the effective number of codons, and GC3s denotes the GC content at the third synonymous codon position. The solid blue line represents the expected curve derived from the positions of strains when the codon usage was only determined by the GC3s composition (without selection).

Figure 5: Neutrality plot analysis (correlation between GC12 and GC3s). (A) HA gene. (B) NA gene. (C) MP gene. (D) NP gene. (E) NS gene. (F) PA gene. (G) PB1 gene. (H) PB2 gene. GC12 stands for the average GC content at the first and second positions of the synonymous codons, while GC3s refers to the GC content at the third synonymous codon position.

Supplementary table 1. Accession numbers, strain names, subtypes, compositional features, and CAI values of all human-isolated AIVs used in this study.

Supplementary table 2. The CA analysis of RSCU values.

Table 1: Compositional features of different human-isolated subtypes.

Sample    ENc           GC3s          GC            T             C             A
H5N1-HA   49.506±0.756  0.397±0.009   0.411±0.004   0.238±0.003   0.184±0.003   0.351±0.004
H5N6-HA   50.05±0.744   0.392±0.012   0.413±0.004   0.241±0.002   0.184±0.002   0.346±0.003
H7N9-HA   49.806±0.458  0.346±0.007   0.421±0.002   0.238±0.001   0.179±0.001   0.341±0.002
H9N2-HA   50.01±0.739   0.373±0.008   0.417±0.004   0.241±0.002   0.189±0.004   0.342±0.003
H5N1-MP   54.28±1.101   0.473±0.008   0.482±0.004   0.232±0.003   0.214±0.002   0.285±0.003
H5N6-MP   55.346±2.711  0.461±0.009   0.48±0.004    0.232±0.003   0.213±0.003   0.288±0.003
H7N9-MP   56.711±0.651  0.466±0.008   0.484±0.003   0.229±0.002   0.213±0.002   0.286±0.002
H9N2-MP   56.939±0.847  0.464±0.009   0.483±0.003   0.23±0.004    0.213±0.003   0.286±0.003
H5N1-NA   50.04±0.889   0.381±0.01    0.436±0.004   0.266±0.002   0.183±0.003   0.298±0.003
H5N6-NA   47.243±0.926  0.41±0.007    0.445±0.003   0.21±0.002    0.209±0.002   0.345±0.003
H7N9-NA   51.617±0.428  0.393±0.007   0.44±0.003    0.221±0.002   0.209±0.002   0.34±0.002
H9N2-NA   50.71±1.267   0.359±0.015   0.424±0.006   0.261±0.002   0.186±0.003   0.316±0.006
H5N1-NP   51.067±0.851  0.438±0.01    0.477±0.004   0.208±0.002   0.199±0.003   0.315±0.003
H5N6-NP   50.927±0.632  0.429±0.019   0.473±0.007   0.21±0.003    0.198±0.004   0.318±0.005
H7N9-NP   50.382±0.554  0.413±0.008   0.468±0.003   0.209±0.002   0.195±0.002   0.323±0.003
H9N2-NP   50.902±0.98   0.421±0.013   0.471±0.005   0.207±0.004   0.198±0.002   0.321±0.004
H5N1-NS   51.001±1.445  0.42±0.012    0.446±0.006   0.237±0.004   0.204±0.003   0.316±0.004
H5N6-NS   51.035±1.965  0.393±0.021   0.439±0.012   0.243±0.005   0.195±0.005   0.318±0.008
H7N9-NS   48.584±1.223  0.369±0.011   0.425±0.005   0.245±0.002   0.191±0.003   0.33±0.004
H9N2-NS   48.776±1.818  0.376±0.015   0.429±0.009   0.244±0.003   0.192±0.004   0.327±0.007
H5N1-PA   52.81±0.915   0.433±0.011   0.437±0.004   0.229±0.003   0.198±0.003   0.334±0.004
H5N6-PA   53.204±1.093  0.443±0.017   0.44±0.007    0.227±0.006   0.202±0.006   0.333±0.004
H7N9-PA   53.464±0.57   0.447±0.006   0.441±0.003   0.223±0.002   0.205±0.002   0.336±0.002
H9N2-PA   53.111±1.122  0.444±0.012   0.439±0.004   0.226±0.004   0.202±0.005   0.335±0.002
H5N1-PB1  53.542±0.529  0.422±0.009   0.431±0.004   0.223±0.003   0.197±0.002   0.346±0.002
H5N6-PB1  53.506±0.849  0.442±0.021   0.437±0.005   0.22±0.004    0.2±0.002     0.343±0.002
H7N9-PB1  52.586±0.506  0.463±0.008   0.442±0.002   0.216±0.002   0.201±0.002   0.342±0.002
H9N2-PB1  52.614±0.522  0.449±0.023   0.439±0.006   0.218±0.003   0.2±0.003     0.343±0.004
H5N1-PB2  52.891±0.929  0.414±0.011   0.446±0.004   0.218±0.002   0.19±0.002    0.336±0.004
H5N6-PB2  51.362±1.095  0.413±0.02    0.444±0.006   0.22±0.002    0.185±0.002   0.335±0.006

H7N9-PB2

51.512±0.937

0.427±0.006

0.448±0.002

0.22±0.002

0.184±0.002

0.332±0.002

H9N2-PB2

52.03±0.995

0.416±0.018

0.445±0.006

0.22±0.003

0.185±0.002

0.335±0.006

Human

54

0.562

0.523

0.218

0.261

0.259

H9N2-PA

-p

re

lP

na

ur

Jo

H7N9-PA

of

Gene

ro

Journal Pre-proof

Note: ENc represents the effective number of codons. GC3s represents the frequency of the nucleotides G+C at the third position of synonymous codons. T, C, A, G represent the content of T, C, A, G, respectively. T3s, C3s, A3s, G3s represent the frequency of the nucleotides T, C, A, G at the third position of synonymous codons.
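The overall G content, which the note defines alongside T, C and A, can be recovered from the three nucleotide columns shown in Table 1. The sketch below is a minimal consistency check in Python, assuming that T, C, A and G are fractions of the same sequence and therefore sum to 1. The dictionary `table1_rows` and the helper `derived_g` are illustrative names, and only three rows of Table 1 (mean values, without standard deviations) are copied in.

```python
# Mean values copied from Table 1 (standard deviations omitted).
table1_rows = {
    "H5N1-HA": {"GC": 0.411, "T": 0.238, "C": 0.184, "A": 0.351},
    "H7N9-NA": {"GC": 0.440, "T": 0.221, "C": 0.209, "A": 0.340},
    "Human":   {"GC": 0.523, "T": 0.218, "C": 0.261, "A": 0.259},
}


def derived_g(row: dict) -> float:
    """G content implied by T, C and A.

    Assumption: the four nucleotide fractions sum to 1.
    """
    return 1.0 - row["T"] - row["C"] - row["A"]


for gene, row in table1_rows.items():
    g = derived_g(row)
    # C + G should reproduce the reported GC column up to rounding.
    print(f"{gene}: G = {g:.3f}, C + G = {row['C'] + g:.3f}, "
          f"reported GC = {row['GC']:.3f}")
```

Small differences in the third decimal place are expected, because each column of the table is rounded independently.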


Table 2: The mean compositions of dinucleotides of different human-isolated subtypes.

TpG          CpT          CpC          CpA          CpG          ApT          ApC          ApA          ApG
0.066±0.002  0.049±0.002  0.04±0.002   0.08±0.003   0.017±0.001  0.085±0.002  0.061±0.002  0.128±0.004  0.079±0.002
0.069±0.002  0.049±0.001  0.046±0.002  0.076±0.001  0.017±0.001  0.087±0.002  0.06±0.001   0.129±0.004  0.075±0.003
0.08±0.001   0.046±0.001  0.031±0.001  0.083±0.001  0.016±0.001  0.092±0.001  0.053±0.001  0.122±0.002  0.08±0.002
0.075±0.002  0.052±0.002  0.034±0.003  0.085±0.002  0.017±0.002  0.09±0.002   0.062±0.002  0.118±0.003  0.072±0.003
0.077±0.002  0.061±0.002  0.04±0.002   0.078±0.003  0.035±0.003  0.072±0.002  0.049±0.002  0.079±0.003  0.085±0.003
0.076±0.003  0.059±0.002  0.039±0.002  0.079±0.001  0.034±0.002  0.067±0.004  0.054±0.005  0.082±0.005  0.086±0.004
0.076±0.001  0.06±0.001   0.04±0.002   0.077±0.002  0.035±0.001  0.063±0.001  0.06±0.001   0.081±0.002  0.083±0.001
0.077±0.003  0.06±0.002   0.04±0.002   0.078±0.003  0.034±0.002  0.064±0.002  0.058±0.005  0.081±0.003  0.085±0.002
0.089±0.002  0.049±0.003  0.038±0.003  0.081±0.002  0.015±0.002  0.087±0.002  0.047±0.003  0.092±0.003  0.07±0.002
0.067±0.001  0.045±0.001  0.045±0.001  0.106±0.003  0.017±0.002  0.088±0.001  0.07±0.002   0.113±0.002  0.074±0.002
0.074±0.001  0.049±0.001  0.043±0.002  0.096±0.001  0.018±0.001  0.087±0.002  0.076±0.001  0.101±0.002  0.074±0.001
0.087±0.003  0.051±0.004  0.035±0.002  0.084±0.003  0.013±0.002  0.096±0.006  0.05±0.004   0.094±0.004  0.071±0.003
0.076±0.002  0.055±0.001  0.04±0.002   0.078±0.002  0.026±0.002  0.072±0.001  0.054±0.001  0.091±0.003  0.098±0.001
0.077±0.002  0.056±0.002  0.041±0.003  0.076±0.002  0.025±0.003  0.072±0.001  0.054±0.002  0.096±0.004  0.095±0.003
0.077±0.002  0.057±0.002  0.04±0.002   0.075±0.001  0.023±0.002  0.073±0.002  0.056±0.001  0.097±0.003  0.097±0.002
0.076±0.002  0.055±0.002  0.042±0.002  0.074±0.002  0.026±0.002  0.072±0.003  0.056±0.002  0.096±0.004  0.097±0.003
0.071±0.002  0.061±0.002  0.041±0.002  0.072±0.003  0.027±0.003  0.067±0.004  0.059±0.002  0.108±0.003  0.086±0.002
0.072±0.005  0.059±0.001  0.036±0.004  0.071±0.002  0.027±0.002  0.07±0.005   0.055±0.004  0.106±0.003  0.089±0.006
0.067±0.001  0.061±0.001  0.032±0.002  0.073±0.002  0.025±0.001  0.078±0.003  0.052±0.003  0.105±0.002  0.095±0.002
0.069±0.005  0.06±0.002   0.032±0.003  0.071±0.002  0.023±0.002  0.077±0.005  0.054±0.003  0.108±0.003  0.094±0.004
0.078±0.002  0.051±0.002  0.046±0.001  0.076±0.002  0.025±0.001  0.082±0.002  0.056±0.001  0.119±0.002  0.077±0.002
0.076±0.002  0.053±0.001  0.047±0.003  0.076±0.002  0.025±0.003  0.08±0.003   0.057±0.002  0.119±0.005  0.077±0.003
0.074±0.001  0.053±0.002  0.049±0.001  0.076±0.001  0.027±0.001  0.081±0.003  0.059±0.002  0.122±0.002  0.075±0.001
0.076±0.002  0.052±0.002  0.047±0.002  0.076±0.001  0.026±0.002  0.082±0.002  0.056±0.001  0.12±0.003   0.075±0.003
0.073±0.002  0.047±0.001  0.044±0.001  0.083±0.002  0.023±0.002  0.085±0.002  0.065±0.002  0.118±0.002  0.078±0.002
0.072±0.002  0.046±0.002  0.046±0.002  0.085±0.003  0.023±0.002  0.082±0.002  0.065±0.002  0.116±0.003  0.08±0.004
0.072±0.001  0.044±0.001  0.046±0.001  0.087±0.001  0.023±0.001  0.08±0.002   0.065±0.001  0.112±0.002  0.084±0.001
0.072±0.002  0.046±0.003  0.046±0.002  0.086±0.002  0.022±0.002  0.081±0.003  0.065±0.002  0.113±0.003  0.085±0.002
0.074±0.002  0.043±0.001  0.04±0.001   0.082±0.001  0.024±0.002  0.078±0.002  0.057±0.002  0.111±0.003  0.089±0.002
0.075±0.001  0.042±0.002  0.039±0.003  0.081±0.002  0.023±0.001  0.076±0.003  0.058±0.003  0.11±0.004   0.092±0.003
0.076±0.001  0.04±0.001   0.041±0.001  0.08±0.001   0.022±0.001  0.076±0.002  0.054±0.001  0.108±0.002  0.094±0.002
0.075±0.001  0.041±0.001  0.041±0.002  0.081±0.001  0.021±0.001  0.078±0.002  0.056±0.002  0.108±0.005  0.093±0.003
0.078        0.071        0.078        0.081        0.032        0.052        0.056        0.071        0.08
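Dinucleotide compositions such as those in Table 2 can be obtained by counting adjacent nucleotide pairs in each coding sequence. The following Python sketch is one hedged way to do this. It assumes overlapping pairs counted along the whole sequence and normalised by the number of unambiguous pairs, which may differ from the exact procedure the authors used. The function name `dinucleotide_frequencies` and the toy sequence are illustrative only.

```python
from collections import Counter
from itertools import product
from typing import Dict


def dinucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of each of the 16 dinucleotides.

    Assumptions: overlapping windows of length 2; RNA 'U' is treated
    as 'T'; pairs containing ambiguous bases (e.g. 'N') are skipped.
    """
    seq = seq.upper().replace("U", "T")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if set(p) <= set("ACGT")]
    if not valid:
        raise ValueError("sequence contains no unambiguous dinucleotide")
    counts = Counter(valid)
    total = len(valid)
    # Keys follow the 'XpY' naming used in Table 2.
    return {f"{a}p{b}": counts[a + b] / total
            for a, b in product("TCAG", repeat=2)}


# Toy sequence with pairs AT, TG, GC, CG, GA (five pairs in total).
freqs = dinucleotide_frequencies("ATGCGA")
print(freqs["CpG"], freqs["TpG"], freqs["ApA"])
```

Averaging these per-sequence frequencies over all strains of a subtype and gene segment would give mean values comparable in form to the entries of Table 2.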


Author Contributions

YS and RC conceived, designed, and supervised the study. WL, LT, YG, EC, XS, and JP collected and analyzed the data. YS and WL wrote the drafts of the manuscript. DMI and RC commented on and revised drafts of the manuscript. All authors read and approved the final report.


Highlights:

All human-isolated AIVs tend to eliminate GC and CpG compositions

Natural selection plays a major role in shaping the codon usage bias in all AIVs

Codon usage of surface proteins of H5N1 and has better fit to the human

Codon usage of the polymerase genes of H7N9 has better fit to the human
