Evolutionary characterization of Tembusu virus infection through identification of codon usage patterns


Accepted Manuscript

Short communication

Evolutionary characterization of Tembusu virus infection through identification of codon usage patterns

Hao Zhou, Bing Yan, Shun Chen, Mingshu Wang, Renyong Jia, Anchun Cheng

PII: S1567-1348(15)00286-5
DOI: http://dx.doi.org/10.1016/j.meegid.2015.07.024
Reference: MEEGID 2418

To appear in: Infection, Genetics and Evolution

Received Date: 10 March 2015
Revised Date: 15 July 2015
Accepted Date: 20 July 2015

Please cite this article as: Zhou, H., Yan, B., Chen, S., Wang, M., Jia, R., Cheng, A., Evolutionary characterization of Tembusu virus infection through identification of codon usage patterns, Infection, Genetics and Evolution (2015), doi: http://dx.doi.org/10.1016/j.meegid.2015.07.024

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Evolutionary characterization of Tembusu virus infection through identification of codon usage patterns

Hao Zhou a,#, Bing Yan a,#, Shun Chen a,b,c,#,*, Mingshu Wang a,b,c, Renyong Jia a,b,c, Anchun Cheng a,b,c,*

a Institute of Preventive Veterinary Medicine, Sichuan Agricultural University, Chengdu, Sichuan, 611130, P.R. China
b Avian Disease Research Center, College of Veterinary Medicine of Sichuan Agricultural University, Chengdu, Sichuan, 611130, P.R. China
c Key Laboratory of Animal Disease and Human Health of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan, 611130, P.R. China

* Corresponding authors. Tel.: +86-028-86296117; Fax: +86-028-86296117
E-mail address: [email protected]; [email protected]
Mailing address: Institute of Preventive Veterinary Medicine, Sichuan Agricultural University, No. 211 Huimin Road, Wenjiang District, Chengdu, Sichuan Province, 611130, China.
# These authors contributed equally as co-first authors of this work.

Abstract

Tembusu virus (TMUV) is a single-stranded, positive-sense RNA virus. As reported, TMUV infection has resulted in significant poultry losses, and the virus may also pose a threat to public health. To characterize TMUV evolutionarily and to understand the factors accounting for codon usage properties, we performed, for the first time, a comprehensive analysis of codon usage bias for the genomes of 60 TMUV strains. The most recently published TMUV strains were found to be widely distributed in coastal cities of southeastern China. Codon preference among TMUV genomes exhibits a low bias (effective number of codons (ENC)=53.287) and is maintained at a stable level. ENC-GC3 plots and the high correlation between composition constraints and principal component factor analysis of codon usage demonstrated that mutation pressure dominates over natural selection pressure in shaping the TMUV coding sequence composition. The high correlation between the major components of the codon usage pattern and hydrophobicity (Gravy) or aromaticity (Aromo) was obvious, indicating that properties of viral proteins also account for the observed variation in TMUV codon usage. Principal component analysis (PCA) showed that CQW1 isolated from Chongqing may have evolved from GX2013H or GX2013G isolated from Guangxi, thus indicating that TMUV likely disseminated from southeastern China to the mainland. Moreover, the preferred codons encoding eight amino acids were consistent with the optimal codons for human cells, indicating that TMUV may pose a threat to public health due to possible cross-species transmission (birds to birds or birds to humans). The results of this study not only have theoretical value for uncovering the characteristics of synonymous codon usage patterns in TMUV genomes but also have significant meaning with regard to the molecular evolutionary tendencies of TMUV.

Keywords: Tembusu virus, codon usage pattern, mutation pressure, cross-species transmission

1 Introduction

Tembusu virus (TMUV), a single-stranded, positive-sense RNA virus with a genome length of approximately 11 kb, was first identified in mosquitoes from Malaysia in 1955 (Platt et al., 1975). TMUV is a member of the genus Flavivirus in the family Flaviviridae and contains a single open reading frame (ORF) encoding three structural proteins, core (C), pre-membrane (prM), and envelope (E), and seven nonstructural (NS) proteins, NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5 (Chambers et al., 1990; Li et al., 2012a; Tang et al., 2012). TMUV has recently attracted increasing attention because of its association with serious illness, heavy declines in egg production, and severe neurological symptoms in avian species, with ducks being particularly susceptible (Cao et al., 2011; Yan et al., 2011). This virus can have devastating effects on poultry farming and cause serious economic losses in the waterfowl industry. In addition, as a novel member of the genus Flavivirus, TMUV should receive more attention because it represents a potential threat to public health. Indeed, cell-adapted duck Tembusu virus (DTMUV) presenting antibody-dependent enhancement (ADE) was able to replicate in a mouse model (Liu et al., 2013). With regard to humans, TMUV antibodies (71.9%) and RNA (47.7%) were detected in samples collected from duck farm workers in Shandong, China (Tang et al., 2013b). Nevertheless, the infectious route and evolutionary mechanisms of TMUV infection remain largely unknown; as such, the possibility of zoonotic transmission of TMUV from birds to humans should not be overlooked.

For viruses, little information exists regarding the extent and origin of synonymous codon variation. Recent efforts to elucidate codon usage biases in Flaviviridae have concentrated mainly on Dengue virus (DGV) (Zhou et al., 2013), West Nile virus (WNV) (Moratorio et al., 2013), and Hepatitis C virus (HCV) (Hu et al., 2011). In contrast, investigations of TMUV have primarily focused on viral isolation and identification (Huang et al., 2013; Tang et al., 2013a), the establishment and application of diagnostic methods (Li et al., 2012b; Yun et al., 2012), and genetic analyses of complete genome sequences (Tang et al., 2012; Wan et al., 2012; Zhu et al., 2012), whereas information regarding codon usage patterns is not available. Therefore, we examined the codon usage bias of TMUV to clarify the factors influencing codon usage patterns of the TMUV genome, which may provide a valuable foundation for a better understanding of the molecular evolutionary process of TMUV. Furthermore, a detailed understanding of the extent and causes of codon usage biases is critical for exploring the interplay between mutation pressure and natural selection. To our knowledge, this is the first comprehensive study to systematically investigate the codon usage bias of TMUV.

2 Materials and methods

2.1 Sequence data

A total of 60 genomic TMUV sequences were retrieved from the National Center for Biotechnology Information (NCBI) database. One complete TMUV genome was isolated from a duck in Chongqing and sequenced by our laboratory (accession number KM233707). After removing redundant and repeated sequences, the coding sequences (CDSs) of all TMUV strain genomes were analysed to investigate the characteristics of codon usage variation. Detailed information about these strains is listed in supplementary Table S1, and the distribution of these strains in the different provinces of China is shown in Fig. 1.

2.2 Relative synonymous codon usage

Relative synonymous codon usage (RSCU) measures the degree of synonymous codon usage bias while avoiding the influence of amino acid composition in individual genes. An RSCU value of 1 indicates that a codon is used as often as expected if all synonymous codons for that amino acid were used equally; a value greater than 1 indicates that the codon is used more frequently than expected, and a value less than 1 indicates that it is used less frequently.

2.3 The ENC-GC3s plot

The effective number of codons (ENC) of a sequence ranges from 20 to 61; lower values indicate a stronger codon usage bias, and higher values indicate a weaker codon preference. The GC content at the synonymous third codon positions (GC3s) in TMUV was also calculated. The expected ENC-GC3s curve has been widely used to determine whether the codon usage of given genes is affected by mutation only or also by other factors such as natural selection (Wright, 1990).

2.4 Principal component analysis

Principal component analysis (PCA) is a widely used multivariate statistical method that was employed in this study to analyse the major trends in codon usage among different TMUV strains. Each strain of TMUV was represented as a 59-dimensional vector (the RSCU value of each codon), excluding the codons ATG and TGG and the three stop codons. The first and second principal components, f1 and f2, were extracted to visualize the genetic relationships among the TMUV strains.

2.5 Data processing

The indices described above were calculated using the CodonW program and SPSS 20.0 software. Spearman's rank correlation analysis and cluster analysis by Euclidean distance were performed with SPSS 20.0 software.
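To make the RSCU index of Section 2.2 concrete, the following Python sketch computes RSCU values for a single coding sequence. It is only an illustration under stated assumptions, not the CodonW implementation used in the study: the function name `rscu` and the toy sequence are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for the 59 codons that have synonymous alternatives."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group codons by amino acid, dropping stops, ATG (Met) and TGG (Trp).
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


# Toy example (hypothetical sequence): four Leu codons, three of them CTG.
example = rscu("CTGCTGCTGCTA")
print(example["CTG"], example["CTA"])  # 4.5 1.5
```

The returned dictionary has 59 entries, so its values (in a fixed codon order) can serve as the 59-dimensional vector described in Section 2.4. Amino acids that do not occur in the sequence are given RSCU values of 0.0 here, which is a simplification chosen for this sketch.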

3 Results

3.1 Codon usage bias and synonymous codon usage among TMUV strains

To investigate the degree of codon usage variation in TMUV, we determined the RSCU values of all strains (Table 1). The results showed a preference for codons ending with A/T over codons ending with C/G, at a ratio of 11:7. Four preferred codons end with G, whereas three end with C; in addition, preferred codons ending with A were more frequent than those ending with T (6:5). Furthermore, the data show that the frequency of a preferred codon ending with a particular nucleotide was essentially stable in TMUV genes.

3.2 Compositional properties of TMUV strains

The overall base composition of the 60 TMUV strains evaluated was nonrandom. There was no remarkable difference between the A and G contents, with mean values of 28.59% and 28.96%, respectively. However, the mean T content (22.52%) was higher than the mean C content (19.93%). The G+C content ranged from 48.66% to 49.32%, with a mean value of 48.88% and an S.D. of 0.001, whereas the mean value of GC3s was 47.56%, with an S.D. of 0.001 (Table S2). The differences in nucleotide content suggest that composition constraints are determinants of the codon usage pattern of TMUV. The ENC values among the TMUV strains were similar, varying from 52.85 to 53.54, with a mean of 53.287 and an S.D. of 0.149 (Table S2), suggesting that codon preference in the TMUV genomes is not strongly biased (ENC>40) and is maintained at a stable level.

3.3 The main determining factor of codon usage bias in TMUV

In general, mutation pressure and natural selection are the two main factors that shape codon usage bias. A plot of actual ENC values against GC3s (%), together with the expected ENC curve, provides a useful display of trends in codon usage. In this study, all of the points lie below the expected curve (Fig. 2), indicating that although the TMUV genome is principally influenced by mutational pressure, other factors may also be responsible for shaping the codon usage bias of TMUV. To further assess whether codon usage is shaped mainly by natural selection or by mutation pressure, a correlation analysis was performed between the G+C content at the first and second codon positions (GC12s) and that at the synonymous third codon positions (GC3s). A significant correlation was observed (r=-0.264, p=0.042), demonstrating that mutation pressure dominates over natural selection pressure in shaping the TMUV nucleotide composition.

3.4 Other factors that affect codon usage patterns in TMUV

The factors influencing codon usage bias can be estimated and confirmed through both correlation analysis and PCA. Correlation analysis was conducted by comparing TMUV third-site codon compositions (A3s, T3s, G3s, C3s, and GC3s), ENC values, nucleotide compositions (A, T, G, and C), Gravy values, and Aromo values. Significant correlations were obtained between the ENC value and A (r=0.433, p<0.01) and between the ENC value and G (r=-0.538, p<0.01) (Table S3), indicating that composition constraints might influence synonymous codon usage in TMUV. Based on these results, codon usage bias among the different strains of TMUV was revealed to be directly related to base composition, suggesting that composition constraints could further drive TMUV evolution in complex environments. Moreover, correlations were observed between the major components (f1 and f2) of the codon usage pattern and nucleotide composition; for example, f1 had highly significant correlations with the G+C (r=0.456, p<0.01), A+T (r=-0.484, p<0.01), and GC3s (r=0.723, p<0.01) values (Table 2), indicating that composition constraints affect the codon usage patterns to some extent.
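The correlation coefficients reported in Sections 3.3 and 3.4 are Spearman's rank correlations, which the study computed in SPSS. As a rough illustration of what that statistic measures, the sketch below ranks two series (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names and the sample values are hypothetical and are not data from this study; no p-value is computed.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1           # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)                    # undefined if either series is constant


# Hypothetical per-strain values, invented purely for illustration.
gc12 = [0.495, 0.497, 0.494, 0.498, 0.496]
gc3 = [0.476, 0.474, 0.477, 0.475, 0.473]
print(spearman_rho(gc12, gc3))
```

A value near +1 or -1 indicates a strongly monotonic relationship between the two series; values near 0 indicate little monotonic association.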
Furthermore, f1 and f2 showed significant correlations with protein hydrophobicity (r=-0.384, p<0.01; r=-0.352, p<0.01, respectively), and f2 showed a correlation with aromaticity (r=0.298, p<0.01); these results indicate that, to some degree, hydrophobicity and aromaticity influence TMUV codon usage variation.

3.5 Dinucleotide biases also influence the TMUV codon usage bias

To discern the effect of dinucleotide biases on the codon usage bias of TMUV, we calculated the frequencies of the 16 dinucleotides and their correlation coefficients with the positions of the genomes along the first two axes (f1 and f2) (Table 3). Notably, the content of CG dinucleotides in the TMUV genomes was the lowest (mean frequency=0.02893, S.D.=0.0005); the TA dinucleotide content was 0.03659, with an S.D. of 0.0006 (Table S4). The results showed that CA, CC, CG, CT, TA, TC, TG, and TT in the TMUV genome were significantly related to the f1 axis (p<0.01), while the f2 axis was highly correlated with AA, AC, CA, GA, GG, and TG (p<0.01) (Table 3). These results revealed that the codon usage bias of TMUV is to some extent influenced by dinucleotide biases.

3.6 The genetic relationship based on synonymous codon usage in TMUV

As shown in Fig. 3, the points for strains from chickens generally aggregated into one cluster, with the exception of one strain isolated from a chicken in Malaysia. The points for strains from ducks, however, were generally separated into two major clusters; for geese, the points were aggregated into one specific group with chickens and ducks. Moreover, the codon usage patterns of the strains isolated from geese, pigeons, and sparrows were similar to the patterns of the strains isolated from ducks (Fig. 3). We inferred from this that the host of a viral strain is likely related to the TMUV codon usage pattern and that TMUV shows multi-host infection patterns.
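The f1 and f2 coordinates plotted in Fig. 3 come from a principal component analysis of the per-strain RSCU vectors (Section 2.4). The sketch below shows one simple way such coordinates can be obtained, using power iteration on the covariance matrix. It is an illustrative assumption, not the SPSS procedure used in the study, and the function names and the toy data are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _leading_axis(matrix, orthogonal_to=(), iters=1000):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]          # arbitrary non-zero start vector
    for _ in range(iters):
        w = [_dot(row, v) for row in matrix]
        for u in orthogonal_to:                   # remove directions already found
            proj = _dot(w, u)
            w = [wi - proj * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:                          # no variance left in this subspace
            return [0.0] * d
        v = [wi / norm for wi in w]
    return v


def first_two_components(rows):
    """Return (f1, f2) scores for each row of an n x d data matrix (n >= 2)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    axis1 = _leading_axis(cov)
    axis2 = _leading_axis(cov, orthogonal_to=[axis1])
    return [(_dot(c, axis1), _dot(c, axis2)) for c in centered]


# Toy data (hypothetical): four "strains" described by two variables.
scores = first_two_components([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
for f1, f2 in scores:
    print(round(f1, 3), round(f2, 3))
```

In practice each row would be one strain's 59 RSCU values, and the resulting (f1, f2) pairs would be plotted and coloured by host, year, or region. The sign of each axis is arbitrary, so mirrored plots are equivalent.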
Furthermore, the points for TMUV strains isolated from birds in 2010, 2011, and 2012 showed similar trends, in contrast to the four strains isolated from ducks in 2013 (Fig. S1); this finding indicated that most TMUV strains, both past and present, are relatively conserved phylogenetically. However, the possibility of a TMUV mutation in recent years should not be ignored. In addition, based on the potential influence of geographical factors on TMUV evolution, a plot of f1 and f2 was generated to illustrate geographic distribution (Fig. S2). The strains from China were divided into two subgroups and were separated from the Malaysian strains. A clear geographical demarcation in TMUV groups is vital for identifying the potential origin of TMUV isolates based on codon usage variation. Although the ancestry of the TMUV strain of Mainland China remains unknown, the TMUV strain in Chongqing (CQW1) clustered with two strains isolated from Guangxi (GX2013H and GX2013G) (Fig. S2, Fig. S4). Consequently, it is reasonable to speculate that CQW1 evolved from GX2013H or GX2013G or that certain strains of TMUV likely disseminated from southeastern China to the mainland.

3.7 Comparative analysis of codon usage between TMUV and human cells

A detailed comparison of different avian species revealed strong similarities in codon usage patterns (Fig. S3). However, codons encoding Leu, Ile, Pro, Ala, His and Gln showed dramatic differences among different birds, suggesting a high degree of variation for these amino acids during the epidemic process. Interestingly, TMUV's preferred codons for seven amino acids correspond to the optimal codons encoding these amino acids in human cells (Table 1). This suggests the possibility that TMUV can perform protein synthesis at a high rate of efficiency in human cells.

4 Discussion

Codon usage bias in different species has different characteristics and is often considered to be an indicator of the forces shaping viral evolutionary trends (Chen et al., 2007; Jenkins and Holmes, 2003; Selva Kumar et al., 2012; Wong et al., 2010). To date, analyses of Flavivirus codon usage are fragmentary, and such studies have primarily focused on particular virus species, as previously described (Hu et al., 2011; Moratorio et al., 2013; Zhou et al., 2013). In contrast, the evolutionary characteristics and genetic relationships of codon usage bias in the TMUV genome have not been systematically studied; indeed, only a few TMUV genomes have been sequenced to date, with many studies only recently performed. Our results revealed that several complex and interesting correlations exist among the base compositions of these genomes, affecting the codon usage bias. For instance, prominent correlations of f1 with G+C (r=0.456, p<0.01), A+T (r=-0.484, p<0.01), and GC3s (r=0.723, p<0.01) values were observed, indicating that the nucleotide composition of the genome is one major factor influencing the synonymous codon usage pattern.

The synonymous codon usage bias among TMUV CDSs was relatively low (ENC=53.287, S.D.=0.149), in agreement with previous results for RNA virus genomes, including West Nile virus (ENC=53.81) (Moratorio et al., 2013), Hepatitis C virus (ENC=52.62) (Hu et al., 2011), Newcastle disease virus (ENC=56.15) (Wang et al., 2011), and H5N1 influenza virus (ENC=50.91) (Zhou et al., 2005). Therefore, the degree of codon bias in TMUV appears to be low. A possible explanation is that weak codon usage bias in TMUV is essential for translational accuracy and efficiency and is also beneficial for effective virus replication, re-adaptation, and survival in various host cells. Another, less likely, possibility is that the low codon usage bias in TMUV results from the limited number of samples included in this study, which may not be fully representative of TMUV.
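For reference, the expected curve in an ENC-GC3s plot (Section 2.3, Fig. 2) is commonly drawn with the approximation given by Wright (1990), ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3s fraction. The short Python sketch below evaluates it; the function name `expected_enc` is hypothetical, and whether Fig. 2 was drawn with exactly this expression is an assumption.

```python
def expected_enc(gc3s):
    """Expected ENC when codon choice is constrained only by GC3s (Wright, 1990)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Mean GC3s reported in Section 3.2 (47.56%), expressed as a fraction.
print(round(expected_enc(0.4756), 2))  # 60.34
```

Under this approximation the expected value at the mean GC3s is about 60.3, which is higher than the observed mean ENC of 53.287 and is consistent with the points lying below the expected curve.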

As in some other RNA viruses, the main factor shaping codon usage is usually assumed to be mutation pressure rather than natural selection (Liu et al., 2010; Zhong et al., 2007; Zhou et al., 2005). A highly significant correlation between the ENC value and GC3s was found, suggesting that mutation pressure might be the dominant factor in shaping codon usage patterns in TMUV. In the ENC-GC3 plot, all of the points lie below the expected curve, which suggests that although codon usage bias is influenced by uneven base composition (mutation pressure), certain other factors also influence the codon usage variation. In addition, a significant relationship between the GC12 and GC3 contents was observed, indicating that mutation pressure from TMUV itself was dominant over natural selection pressure in shaping the composition of the TMUV genome. Notably, mutation rates in RNA viruses are much higher than those in DNA viruses (Drake and Holland, 1999). Therefore, it is understandable that mutation pressure would be the primary factor accounting for codon usage bias in TMUV strains. Furthermore, the first principal component had a substantial correlation with the general average hydrophobicity and aromaticity value of each strain. This finding, albeit preliminary, suggests that natural selection pressure may also contribute to codon usage variation.

To gain insight into the potential effect of dinucleotides, the relative abundances of the 16 dinucleotides were summarized. Notably, the content of CG dinucleotides in the TMUV genomes was the lowest among the 16 dinucleotides (Table S4). In addition, CpG was highly correlated with f1, and most CpG-containing codons of the TMUV genome were poorly used codons. The reason for this is not clear, but a possible explanation might be that CpG deficiency is a viral tactic used to escape the antiviral immune defence of the host.
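The dinucleotide frequencies summarized in Table S4 can be illustrated with a short Python sketch. It counts overlapping dinucleotides along a sequence and reports each as a fraction of all counted pairs. The use of overlapping windows, the function name, and the toy sequence are assumptions made for illustration; the study does not state how its frequencies were computed.

```python
from collections import Counter
from itertools import product


def dinucleotide_frequencies(seq):
    """Relative frequency of each of the 16 dinucleotides (overlapping windows)."""
    seq = seq.upper().replace("U", "T")
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    # Skip pairs containing ambiguous bases such as N.
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {"".join(p): (counts["".join(p)] / total if total else 0.0)
            for p in product("ACGT", repeat=2)}


# Toy example (hypothetical sequence): pairs are AC, CG, GC, CG.
freqs = dinucleotide_frequencies("ACGCG")
print(freqs["CG"], freqs["AC"])  # 0.5 0.25
```

With 16 possible dinucleotides, equal usage would give each a frequency of 0.0625, which offers a simple point of comparison for values such as the CG frequency reported above.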
Importantly, unmethylated CpGs mark foreign pathogens and are generally recognized by Toll-like receptor 9 (TLR9) of the host cells, triggering an effective immune response (Takeshita et al., 2001). Hence, TMUV strains may evade immune pressure from the host by avoiding the use of CG dinucleotides, resulting in large-scale outbreaks in various birds on distinct farms. These findings provide further support for the hypothesis that a lower CpG content in a viral genome could be beneficial for adaptation to vertebrate hosts, which is consistent with previous results (Shackelton et al., 2006; Wang et al., 2011).

In China, TMUV strains have been isolated from many species, including chickens, ducks, geese, house sparrows, pigeons, and even mosquitoes (Table S1). Although the transmission cycle of this virus in birds or mosquitoes remains uncertain, WNV is transmitted from infectious mosquitoes to humans and other mammalian species by releasing virions from their salivary glands during feeding (Dauphin and Zientara, 2007). Therefore, the geographical and seasonal distributions of mosquito vectors may be a key factor influencing the transmission of TMUV. In addition, geographic factors apparently do not influence the TMUV codon usage bias in various provinces or municipalities in China, suggesting that the importance of geography as a determinant of codon usage bias in this virus gradually decreases with the growth of commercial trade. The strains isolated from China tended to aggregate in a group that was distant from the strains from Malaysia, suggesting distinct genetic characteristics in different regions. Most strains of TMUV circulate among the coastal areas of China (Fig. 1A), indicating that social factors, such as public activities, frontier defence inspection, and international trade, might be involved in the genetic diversity of novel TMUV strains. Furthermore, the one Malaysian strain, identified in 1955, fell outside the major clusters, suggesting a distant phylogenetic relationship with the Chinese isolates. Additional attention should be given to the four strains isolated from ducks in 2013, as the plot distribution suggested that these four strains represent a different variant (Fig. 3, Fig. S1). The codon usage pattern of the goose TMUV was similar to that of most duck isolates, indicating that the goose TMUV strains were of possible duck origin. The codon usage pattern of the TMUV strains isolated from Guangxi in 2011 was different from that of the other Guangxi isolates from 2013 (Fig. S2). One possible explanation for this variance is that the codon usage pattern of the 2013 isolates has evolved to some degree; another possibility is that this strain originates from a location other than Guangxi. In short, these complex genetic diversities of TMUV strains isolated from diverse locations in different periods suggest that epidemic factors, including viral host, geographic location, and time, should be considered.

In general, the codon usage bias pattern is the result of interactions between mutation pressure and natural selection. Viral replication depends on the machinery of the host cell, and the codon usage variations of different hosts may be barriers for the synthesis, assembly, or drift of palingenetic viruses. The preferred codons for amino acids of TMUV were largely coincident with those of human cells, including TGC, GAC, AAG, CTG, CAG, GTG, and TAC (Table 1), suggesting tRNA competence between virus and host. Nonetheless, some of the preferred codons of TMUV, including GAA, TTT, CAT, and AAT (Table 1), are disfavoured codons for the corresponding amino acids in human cells, which may negatively influence the translation rate and the correct folding of viral proteins.
The translation of coding sequences in the TMUV genome is likely regulated via the fine-tuning of translation kinetics, suggesting that the interplay of codon usage between TMUV and its hosts may have a great influence on viral fitness, survival and evolution. Furthermore, the adaptive characteristics of TMUV in birds and humans might suggest the possibility of efficient TMUV dissemination through various routes of transmission, including birds-birds, birds-mosquitoes-birds, and birds-mosquitoes-mammals. Importantly, the seasonality of TMUV disease outbreaks is not particularly strong (Fig. 1B): TMUV infections can still be observed in winter, when mosquitoes disappear in cold environments. Although there are no experiments showing that this disease is transmitted through mosquitoes or ticks, one particular strain of TMUV was isolated from mosquitoes in Shandong Province, China. However, whether the transmission vector of TMUV is indeed a mosquito species or another vector, such as an avian species, has yet to be proven. Previous results demonstrated that TMUV-SDMS, which was first isolated from a mosquito in Shandong Province, was able to grow well in mosquito cells, duck cells and mammalian cells (Tang et al., 2015). Therefore, TMUV may pose a potential threat to public health because of possible cross-species transmission, persistent replication, and high virulence.

Additionally, the classification tree of TMUV codon usage was similar to the results of our principal component analysis (Fig. S4) and of previous phylogenetic analyses (Liu et al., 2012; Yu et al., 2013). The genetic diversity revealed by this analysis of codon usage bias patterns provides a foundation for understanding the complex mechanism responsible for the molecular evolution, prevalence and pathogenesis of TMUV.


5 Conclusion

In conclusion, a comprehensive analysis of codon usage patterns for TMUV has provided a basic understanding of the evolutionary characteristics of the codon usage of this virus. This analysis revealed that codon usage among TMUV strains is slightly biased and that mutation pressure plays a substantial role in influencing codon usage variation in TMUV. Additionally, other determining factors, such as nucleotide composition, dinucleotide content, properties of proteins (hydrophobicity and aromaticity), host, geographic location, and epidemic year, all influence the codon usage bias of TMUV strains. Furthermore, TMUV's potential for infecting human hosts or other mammals poses a threat to public health that should not be ignored. Thus, it is anticipated that deeper knowledge about the characteristics of recently identified TMUV strains will advance our current understanding of the mechanisms of TMUV infection and evolution.

Acknowledgments

This work was funded by the National Natural Science Foundation of China (31201891), the Sichuan Provincial Cultivation Program for Leaders of Disciplines in Science (2012JQ0040), the Major Project of Education Department in Sichuan Province (12ZA107), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20125103120012), the Innovative Research Team Program in Education Department of Sichuan Province (No. 12TD005, 2013TD0015), the National Science and Technology Support Program (No. 2015BAD12B05), the National Special Fund for Agro-scientific Research in the Public Interest (No. 201003012), and the China Agricultural Research System (CARS-43-8).

References

Cao, Z., Zhang, C., Liu, Y., Liu, Y., Ye, W., Han, J., Ma, G., Zhang, D., Xu, F., Gao, X., Tang, Y., Shi, S., Wan, C., Zhang, C., He, B., Yang, M., Lu, X., Huang, Y., Diao, Y., Ma, X., Zhang, D., 2011. Tembusu virus in ducks, China. Emerg Infect Dis 17, 1873-1875.

Chambers, T.J., Hahn, C.S., Galler, R., Rice, C.M., 1990. Flavivirus genome organization, expression, and replication. Annu Rev Microbiol 44, 649-688.

Chen, R., Yan, H., Zhao, K.N., Martinac, B., Liu, G.B., 2007. Comprehensive analysis of prokaryotic mechanosensation genes: their characteristics in codon usage. DNA Seq 18, 269-278.

Dauphin, G., Zientara, S., 2007. West Nile virus: recent trends in diagnosis and vaccine development. Vaccine 25, 5563-5576.

Drake, J.W., Holland, J.J., 1999. Mutation rates among RNA viruses. Proc Natl Acad Sci U S A 96, 13910-13913.

Hu, J.S., Wang, Q.Q., Zhang, J., Chen, H.T., Xu, Z.W., Zhu, L., Ding, Y.Z., Ma, L.N., Xu, K., Gu, Y.X., Liu, Y.S., 2011. The characteristic of codon usage pattern and its evolution of hepatitis C virus. Infect Genet Evol 11, 2098-2102.

Huang, X., Han, K., Zhao, D., Liu, Y., Zhang, J., Niu, H., Zhang, K., Zhu, J., Wu, D., Gao, L., Li, Y., 2013. Identification and molecular characterization of a novel flavivirus isolated from geese in China. Res Vet Sci 94, 774-780.

Jenkins, G.M., Holmes, E.C., 2003. The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 92, 1-7.

Li, L., An, H., Sun, M., Dong, J., Yuan, J., Hu, Q., 2012a. Identification and genomic analysis of two duck-origin Tembusu virus strains in southern China. Virus Genes 45, 105-112.

Li, X., Li, G., Teng, Q., Yu, L., Wu, X., Li, Z., 2012b. Development of a blocking ELISA for detection of serum neutralizing antibodies against newly emerged duck Tembusu virus. PLoS One 7, e53026.

Liu, P., Lu, H., Li, S., Moureau, G., Deng, Y.Q., Wang, Y., Zhang, L., Jiang, T., de Lamballerie, X., Qin, C.F., Gould, E.A., Su, J., Gao, G.F., 2012. Genomic and antigenic characterization of the newly emerging Chinese duck egg-drop syndrome flavivirus: genomic comparison with Tembusu and Sitiawan viruses. J Gen Virol 93, 2158-2170.

Liu, Y.S., Zhou, J.H., Chen, H.T., Ma, L.N., Ding, Y.Z., Wang, M., Zhang, J., 2010. Analysis of synonymous codon usage in porcine reproductive and respiratory syndrome virus. Infect Genet Evol 10, 797-803.

Liu, Z., Ji, Y., Huang, X., Fu, Y., Wei, J., Cai, X., Zhu, Q., 2013. An adapted duck Tembusu virus induces systemic infection and mediates antibody-dependent disease severity in mice. Virus Res 176, 216-222.

Moratorio, G., Iriarte, A., Moreno, P., Musto, H., Cristina, J., 2013. A detailed comparative analysis on the overall codon usage patterns in West Nile virus. Infect Genet Evol 14, 396-400.

Platt, G.S., Way, H.J., Bowen, E.T., Simpson, D.I., Hill, M.N., Kamath, S., Bendell, P.J., Heathcote, O.H., 1975. Arbovirus infections in Sarawak, October 1968--February 1970 Tembusu and Sindbis virus isolations from mosquitoes. Ann Trop Med Parasitol 69, 65-71.

Selva Kumar, C., Nair, R.R., Sivaramakrishnan, K.G., Ganesh, D., Janarthanan, S., Arunachalam, M., Sivaruban, T., 2012. Influence of certain forces on evolution of synonymous codon usage bias in certain species of three basal orders of aquatic insects. Mitochondrial DNA 23, 447-460.

Shackelton, L.A., Parrish, C.R., Holmes, E.C., 2006. Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses. J Mol Evol 62, 551-563.

Takeshita, F., Leifer, C.A., Gursel, I., Ishii, K.J., Takeshita, S., Gursel, M., Klinman, D.M., 2001. Cutting edge: Role of Toll-like receptor 9 in CpG DNA-induced activation of human cells. J Immunol 167, 3555-3558.

Tang, Y., Diao, Y., Chen, H., Ou, Q., Liu, X., Gao, X., Yu, C., Wang, L., 2015. Isolation and Genetic Characterization of a Tembusu Virus Strain Isolated From Mosquitoes in Shandong, China. Transbound Emerg Dis 62, 209-216.

Tang, Y., Diao, Y., Gao, X., Yu, C., Chen, L., Zhang, D., 2012. Analysis of the complete genome of Tembusu virus, a flavivirus isolated from ducks in China. Transbound Emerg Dis 59, 336-343.

Tang, Y., Diao, Y., Yu, C., Gao, X., Ju, X., Xue, C., Liu, X., Ge, P., Qu, J., Zhang, D., 2013a. Characterization of a Tembusu virus isolated from naturally infected house sparrows (Passer domesticus) in Northern China. Transbound Emerg Dis 60, 152-158.

Tang, Y., Gao, X., Diao, Y., Feng, Q., Chen, H., Liu, X., Ge, P., Yu, C., 2013b. Tembusu virus in human, China. Transbound Emerg Dis 60, 193-196.

Wan, C., Huang, Y., Fu, G., Shi, S., Cheng, L., Chen, H., 2012. Complete genome sequence of avian tembusu-related virus strain WR isolated from White Kaiya ducks in Fujian, China. J Virol 86, 10912.

Wang, M., Liu, Y.S., Zhou, J.H., Chen, H.T., Ma, L.N., Ding, Y.Z., Liu, W.Q., Gu, Y.X., Zhang, J., 2011. Analysis of codon usage in Newcastle disease virus. Virus Genes 42, 245-253.

Wong, E.H., Smith, D.K., Rabadan, R., Peiris, M., Poon, L.L., 2010. Codon usage bias and the evolution of influenza A viruses. Codon Usage Biases of Influenza Virus. BMC Evol Biol 10, 253.

Wright, F., 1990. The 'effective number of codons' used in a gene. Gene 87, 23-29.

Yan, P., Zhao, Y., Zhang, X., Xu, D., Dai, X., Teng, Q., Yan, L., Zhou, J., Ji, X., Zhang, S., Liu, G., Zhou, Y., Kawaoka, Y., Tong, G., Li, Z., 2011. An infectious disease of ducks caused by a newly emerged Tembusu virus strain in mainland China. Virology 417, 1-8.

Yu, K., Sheng, Z.Z., Huang, B., Ma, X., Li, Y., Yuan, X., Qin, Z., Wang, D., Chakravarty, S., Li, F., Song, M., Sun, H., 2013. Structural, antigenic, and evolutionary characterizations of the envelope protein of newly emerging Duck Tembusu Virus. PLoS One 8, e71319.

Yun, T., Ni, Z., Hua, J., Ye, W., Chen, L., Zhang, S., Zhang, Y., Zhang, C., 2012. Development of a one-step real-time RT-PCR assay using a minor-groove-binding probe for the detection of duck Tembusu virus. J Virol Methods 181, 148-154.

Zhong, J., Li, Y., Zhao, S., Liu, S., Zhang, Z., 2007. Mutation pressure shapes codon usage in the GC-Rich genome of foot-and-mouth disease virus. Virus Genes 35, 767-776.

Zhou, J.H., Zhang, J., Sun, D.J., Ma, Q., Chen, H.T., Ma, L.N., Ding, Y.Z., Liu, Y.S., 2013. The distribution of synonymous codon choice in the translation initiation region of dengue virus. PLoS One 8, e77239.

Zhou, T., Gu, W., Ma, J., Sun, X., Lu, Z., 2005. Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses. Biosystems 81, 77-86.

Zhu, W., Chen, J., Wei, C., Wang, H., Huang, Z., Zhang, M., Tang, F., Xie, J., Liang, H., Zhang, G., Su, S., 2012. Complete genome sequence of duck Tembusu virus, isolated from Muscovy ducks in southern China. J Virol 86, 13119.

Figure captions.

Fig. 1 A. Geographical location of the TMUV samples analysed in this study. The provinces or autonomous cities (regions) in which the TMUV strains were isolated are indicated in red. The CQW1 strain, isolated and sequenced by our laboratory, is marked with a green circle. B. Seasonality of the TMUV strains for which the isolation time is known (Table S1).

Fig. 2 Plot of the effective number of codons (ENC) against the GC content at the third codon position (GC3s). The continuous curve represents the expected relationship between GC3s and ENC under random codon usage. The ENC values of the TMUV strains are marked with black triangles.

Fig. 3 Genetic characteristics of TMUV coding sequences from six hosts (mosquito, chicken, duck, goose, pigeon, and house sparrow). The first axis (f1) accounts for 27.10% of the total variation, and the second axis (f2) accounts for 15.58% of the total variation.
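The expected curve described for Fig. 2 is usually drawn from the relation given by Wright (1990), ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s expressed as a fraction. The following is a minimal Python sketch of that relation, assuming this standard form is the one used for the figure; the function name `expected_enc` and the sampling grid are illustrative choices, not taken from the manuscript.

```python
def expected_enc(gc3s):
    """Expected ENC at a given GC3s (fraction in [0, 1]) when codon usage
    is shaped by base composition alone, following Wright (1990)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Sample the curve on a coarse grid, as one might before plotting it.
# The grid spacing (0.1) is an arbitrary illustrative choice.
curve = [(s / 10, expected_enc(s / 10)) for s in range(11)]
```

Strains whose observed ENC falls well below this curve at their GC3s value would suggest that factors other than base composition contribute to codon usage.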


Table 1. Synonymous codon usage (RSCU values) of TMUV. The preferentially used codons for each amino acid are displayed in bold.

Amino acid   Codon    Fraction   Frequency   Count   RSCU
Ala^a        GCA      0.279      21.517      4423    1.11
             GCC^c    0.287      22.174      4558    1.15
             GCG^d    0.122      9.452       1943    0.49
             GCT      0.312      24.076      4949    1.25
Cys^b        TGC^c    0.569      9.618       1977    1.14
             TGT^d    0.431      7.278       1496    0.86
Asp^b        GAC^c    0.524      22.908      4709    1.05
             GAT^d    0.476      20.845      4285    0.95
Glu^a        GAA^d    0.546      33.124      6809    1.09
             GAG^c    0.454      27.588      5671    0.91
Phe^a        TTC^c    0.436      14.006      2879    0.87
             TTT^d    0.564      18.087      3718    1.13
Gly^a        GGA      0.438      38.874      7991    1.75
             GGC^c    0.167      14.774      3037    0.67
             GGG      0.239      21.196      4357    0.96
             GGT^d    0.157      13.889      2855    0.63
His^a        CAC^c    0.492      9.627       1979    0.98
             CAT^d    0.508      9.934       2042    1.02
Ile^a        ATA^d    0.347      17.328      3562    1.04
             ATC^c    0.222      11.072      2276    0.67
             ATT      0.431      21.502      4420    1.29
Lys^b        AAA^d    0.49       27.943      5744    0.98
             AAG^c    0.51       29.082      5978    1.02
Leu^b        CTA      0.125      11.009      2263    0.75
             CTC      0.137      12.065      2480    0.82
             CTG^c    0.298      26.129      5371    1.79
             CTT      0.076      6.64        1365    0.45
             TTA^d    0.101      8.903       1830    0.61
             TTG      0.263      23.074      4743    1.58
Asn^a        AAC^c    0.476      17.499      3597    0.95
             AAT^d    0.524      19.284      3964    1.05
Pro^a        CCA      0.478      20.529      4220    1.91
             CCC^c    0.166      7.141       1468    0.67
             CCG^d    0.141      6.076       1249    0.57
             CCT      0.214      9.199       1891    0.86
Gln^b        CAA^d    0.45       12.478      2565    0.9
             CAG^c    0.55       15.251      3135    1.1
Arg^a        AGA^c    0.383      22.48       4621    2.3
             AGG      0.312      18.282      3758    1.87
             CGA^d    0.069      4.018       826     0.41
             CGC      0.055      3.23        664     0.33
             CGG      0.121      7.107       1461    0.73
             CGT      0.06       3.537       727     0.36
Ser^a        AGC^c    0.24       14.716      3025    1.44
             AGT      0.188      11.529      2370    1.13
             TCA      0.272      16.696      3432    1.63
             TCC      0.095      5.828       1198    0.57
             TCG^d    0.086      5.293       1088    0.52
             TCT      0.119      7.297       1500    0.71
Thr^a        ACA      0.409      27.856      5726    1.64
             ACC^c    0.209      14.234      2926    0.84
             ACG^d    0.154      10.454      2149    0.61
             ACT      0.228      15.489      3184    0.91
Val^b        GTA^d    0.1        8.061       1657    0.4
             GTC      0.218      17.518      3601    0.87
             GTG^c    0.429      34.54       7100    1.72
             GTT      0.253      20.369      4187    1.01
Tyr^b        TAC^c    0.587      14.229      2925    1.17
             TAT^d    0.413      10.026      2061    0.83

^a Stands for amino acids whose optimal codons are A- or T-end.
^b Stands for amino acids whose optimal codons are C- or G-end.
^c Stands for the optimal codons used in human cells (Pinto et al., 2007; Sanchez et al., 2003).
^d Stands for the rare codons used in human cells (Pinto et al., 2007; Sanchez et al., 2003).
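The RSCU and Fraction columns of Table 1 can be reproduced from the Count column. The sketch below assumes the usual definition of RSCU (the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid); the function names are illustrative and the Ala counts are copied from Table 1.

```python
def rscu(counts):
    """RSCU per codon: observed count divided by the mean count of the
    synonymous codons that encode the same amino acid."""
    mean = sum(counts.values()) / len(counts)
    return {codon: n / mean for codon, n in counts.items()}


def fraction(counts):
    """Share of each codon among the synonymous codons of one amino acid."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Counts for the four alanine codons, taken from Table 1.
ala_counts = {"GCA": 4423, "GCC": 4558, "GCG": 1943, "GCT": 4949}
ala_rscu = {c: round(v, 2) for c, v in rscu(ala_counts).items()}
ala_fraction = {c: round(v, 3) for c, v in fraction(ala_counts).items()}
```

With these counts the rounded values agree with the Ala rows of the table (for example, GCT has an RSCU of 1.25 and a fraction of 0.312). An RSCU above 1 indicates a codon used more often than expected under equal usage of synonymous codons.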

Table 2. Correlation analysis between the first two principal axes (f1 and f2) and the nucleotide contents (A, C, G, T), the respective nucleotide contents at the third position of codons (T3s, C3s, A3s, G3s), the G+C value (G+C), the A+T value (A+T), the GC content at the third position of codons (GC3s), the effective number of codons (ENC) values, the hydrophobicity (Gravy) values, and the aromaticity (Aromo) values of the TMUV.

Variable   f1          f2
A          0.259*      0.407**
C          0.590**     0.280*
G          0.159       -0.265*
T          -0.663**    -0.374**
T3s        -0.854**    0.234
C3s        0.756**     0.133
A3s        0.239       0.442**
G3s        0.090       0.242
G+C        0.456**     0.167
A+T        -0.484**    0.090
GC3s       0.723**     0.014
ENC        0.146       0.382**
Gravy      -0.384**    -0.352**
Aromo      0.217       0.298*

*P<0.05 **P<0.01
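Each entry in Table 2 is a correlation coefficient between the per-strain scores on one axis and one compositional index. The sketch below shows how such a coefficient could be computed, assuming a Pearson coefficient (the table caption does not state which coefficient was used). The input values are hypothetical and serve only to illustrate the calculation; they are not data from the manuscript.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-strain values, for illustration only.
f1_scores = [-0.42, -0.10, 0.05, 0.31, 0.58]
gc3s_values = [0.44, 0.47, 0.48, 0.51, 0.53]
r = pearson_r(f1_scores, gc3s_values)
```

The significance levels marked with asterisks in the table would additionally require a test on the coefficient, which this sketch leaves out.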

Table 3. Correlation coefficients between the frequencies of the 16 dinucleotides and the first two axes (f1 and f2) of TMUV. The frequency ratio of each dinucleotide is labelled with the name of that dinucleotide.

Dinucleotide   f1          f2
AA             0.012       0.561**
AC             0.193       0.334**
AG             0.223       -0.554
AT             -0.048      -0.034
CA             0.585**     0.340**
CC             0.868**     0.129
CG             0.725**     0.222
CT             -0.892**    -0.160
GA             0.084       -0.682**
GC             -0.138      0.073
GG             -0.172      0.476**
GT             -0.046      -0.252
TA             -0.352**    0.057
TC             -0.409**    -0.073
TG             -0.801**    -0.423**
TT             0.426**     -0.233

*P<0.05 **P<0.01
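The dinucleotide frequency ratios correlated in Table 3 are commonly computed as observed-to-expected odds ratios, that is, the frequency of dinucleotide XY divided by the product of the frequencies of X and Y. The sketch below assumes that definition, which the table caption itself does not spell out; the function name and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_ratios(seq):
    """Observed/expected ratio for each of the 16 dinucleotides.

    Expected frequency is the product of the two mononucleotide
    frequencies; the result is NaN when that product is zero.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    mono = Counter(seq)
    # Count overlapping dinucleotides along the sequence.
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x, y in product("ACGT", repeat=2):
        expected = (mono[x] / n) * (mono[y] / n)
        observed = di[x + y] / (n - 1)
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


# Toy sequence for illustration only; not a TMUV sequence.
toy_ratios = dinucleotide_ratios("AACC")
```

A ratio well below 1 marks an under-represented dinucleotide and a ratio well above 1 an over-represented one.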

Figure(1)

Figure(2)

Figure(3)

Highlights
The characteristics of the synonymous codon usage pattern in TMUV were analyzed.
The factors accounting for the evolutionary processes of TMUV codon usage bias were identified.
The potential threat of TMUV infection to public health should not be ignored.