The characteristic of codon usage pattern and its evolution of hepatitis C virus


Infection, Genetics and Evolution 11 (2011) 2098–2102


Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid

Short communication

Jin-song Hu a,b,1, Qin-qin Wang a,1, Jie Zhang a,1, Hao-tai Chen a, Zhi-wen Xu b, Ling Zhu b, Yao-zhong Ding a, Li-na Ma a, Kai Xu b, Yuan-xing Gu a, Yong-sheng Liu a,⇑

a State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou 730046, China
b Animal Biological Technological Center, Sichuan Agricultural University, Ya’an 625014, China

Article info

Article history:
Received 20 May 2011
Received in revised form 22 August 2011
Accepted 24 August 2011
Available online 1 September 2011

Keywords:
Hepatitis C virus
Codon usage
Relative synonymous codon usage values
Effective number of codons values
Correlation analysis

Abstract

To give a new perspective on the codon usage of the hepatitis C virus (HCV) and the factors accounting for shaping the codon usage pattern of the virus, the relative synonymous codon usage (RSCU) values, aromaticity and hydrophobicity of each polyprotein of the virus, effective number of codons (ENC) values and nucleotide contents were calculated to implement a comparative analysis to evaluate the dynamics of the virus evolution. The RSCU values of each codon of 144 HCV ORFs indicated that all abundant codons were C/G-ended codons. The plots of principal component analysis based on sub-genotype of HCV indicated that sub-genotype 1a and 1b separated clearly on the axis of f2, suggesting that the codon usage bias between sub-genotype 1a and 1b strains was different. By comparing the codon usage between HCV and human cells, we found that the synonymous codon usage pattern of HCV was a mixture of coincidence and antagonism to that of host cells. The characteristics of the synonymous codon usage patterns and nucleotide contents of HCV, and the correlation analysis between GC3s, GC1,2s, GC% (ORF), GC% (5′-UTR), GC% (3′-UTR), aromaticity, hydrophobicity and ENC value, respectively, indicated that mutational pressure was the dominant factor accounting for the codon usage variation and selection pressure also accounted for HCV codon usage pattern. © 2011 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Tel.: +86 931 8342771; fax: +86 931 8340977.
E-mail address: [email protected] (Y.-s. Liu).
1 These authors contributed equally to this work.

1567-1348/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.meegid.2011.08.025

1. Introduction

Studies of codon usage pattern can reveal the molecular evolution of organisms, and contribute to understanding the interaction between RNA viruses and the immune response of the hosts (Shackelton et al., 2006). Selection pressure is a phenomenon that alters the behavior and fitness of living organisms within the host environment, for example through the fine-tuning translation kinetics selection mechanism (Aragones et al., 2008, 2010). Mutation pressure is the change in some gene frequencies due to the repeated occurrence of the same mutations. The molecular bases of this genetic variability may be the high error rate of the viral RNA-dependent RNA polymerase and the absence of proofreading mechanisms. The mutation frequencies for a variety of RNA viruses range from 10⁻⁴ to 10⁻⁵ substitutions per base per round of copying (Domingo, 1996). It was reported that compositional constraints and translational selection were the main factors that accounted for codon usage variation among genes/organisms (Lesnik et al., 2000; Ghosh et al., 2000). In some unicellular organisms such as Escherichia coli and Bacillus subtilis, the highly expressed genes had a strong selective preference for codons with a high concentration of the corresponding tRNA molecule, whereas the lowly expressed genes displayed a uniform pattern of codon usage (Ikemura, 1981, 1985; Sharp and Li, 1986; Lesnik et al., 2000). Additionally, in some prokaryotes with extremely high A + T or G + C contents, mutation bias was the major factor accounting for the variation in codon usage (Bulmer, 1988). As for some RNA viruses, compared with translational selection, mutation pressure plays a more important role in the synonymous codon usage pattern (Jenkins and Holmes, 2003; Levin and Whittome, 2000). Although compositional constraints and translational selection are the generally accepted factors accounting for codon usage bias, other selection forces such as the fine-tuning translation kinetics selection, as well as escape from cellular antiviral responses, have also been reported (Aragones et al., 2008, 2010; Karlin et al., 1994; Sugiyama et al., 2005).

Hepatitis C virus (HCV), a small enveloped RNA virus, is the pathogen of chronic hepatitis C. It belongs to the hepacivirus genus of the Flaviviridae family (Suzuki et al., 2007; Alter, 1995). It consists of a single-stranded positive-sense RNA of about 9.6 kb, which contains an open reading frame (ORF) encoding a polyprotein precursor of approximately 3000 residues, flanked by untranslated regions (UTR) at both ends (Choo et al., 1989; Suzuki et al., 2007). An important feature of the HCV genome is its high degree of genetic variability (Pawlotsky, 2006; Suzuki et al., 2007). HCV is classified into at least seven principal genotypes that


differ in their nucleotide sequences by 31–34% and in their amino acid sequences by 30%. These genotypes (1, 2, 3, 4, 5, 6 and 7) show differences with regard to their worldwide distribution, transmission and disease progression, and have been further classified into different sub-genotypes (a, b, c, d, etc.) (Kim et al., 2010; Sharma, 2010; Duarte et al., 2010). HCV, which is most commonly spread by direct contact with infected blood and blood products, has been recognized as a major cause of chronic liver disease. Chronic infection eventually causes cirrhosis leading to hepatocellular carcinoma (HCC) and ultimately death. Currently, there is no vaccine to prevent hepatitis C (Suzuki et al., 2007; Sharma, 2010).

Abundant genome sequences of HCV have been published and many studies have been performed in recent years (Suzuki et al., 2007; Duarte et al., 2010), but little codon usage information about HCV is available. In order to better understand the characteristics of the viral genome evolution, the codon usage and the potential factors accounting for the codon usage pattern of HCV were analyzed in this study.

2. Materials and methods

2.1. HCV genome sequences

In this study, a total of 144 HCV complete genomic sequences representing seven genotypes and 35 sub-genotypes were selected from NCBI (http://www.ncbi.nlm.nih.gov/), taking into account, as far as possible, the availability of collection time, sub-genotype, sampling country, sampling year and patient information. The serial numbers, GenBank accession numbers, sub-genotypes and other detailed information are listed in Supplementary Table 1.

2.2. Compositional properties measures

To examine compositional properties of 144 HCV sequences, the GC3s (the frequencies of nucleotide G + C at the third codon position) and GC1,2s (the mean frequencies of nucleotide G + C at the first and the second position) of each ORF were calculated.
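The two positional GC measures defined above can be illustrated with a minimal Python sketch. It is not the CodonW implementation used in the study: the function name `gc_by_codon_position` is hypothetical, and the sketch counts every complete, unambiguous codon, whereas CodonW restricts GC3s to synonymous codons.

```python
def gc_by_codon_position(orf):
    """Return (GC1,2, GC3) as fractions for an in-frame coding sequence.

    Simplifying assumption: every complete codon without ambiguous bases
    is counted, including AUG, UGG and stop codons.
    """
    seq = orf.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    gc12 = gc3 = codons = 0
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if any(base not in "ACGT" for base in codon):
            continue  # skip codons containing ambiguity codes
        codons += 1
        gc12 += sum(base in "GC" for base in codon[:2])
        gc3 += codon[2] in "GC"
    if codons == 0:
        raise ValueError("no complete unambiguous codons found")
    return gc12 / (2 * codons), gc3 / codons
```

For the toy sequence AUGGCCUUA (codons AUG, GCC, UUA), two of the six first and second positions and two of the three third positions are G or C.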
The GC content of the ORF, 5′-UTR and 3′-UTR of the HCV samples was also calculated.

2.3. Analysis of codon usage

The relative synonymous codon usage (RSCU) values of each codon (excluding AUG, UGG and the termination codons) of 144 complete coding sequences of HCV were calculated according to the previously reported method (Sharp and Li, 1986; Sharp et al., 1986). Codons with RSCU values >1.0 have positive codon usage bias (abundant codons), while those with RSCU values <1.0 have negative codon usage bias (less-abundant codons); an RSCU value of 1.0 means that the codon is chosen equally or randomly (Sharp and Li, 1986). To examine the synonymous codon usage bias of the whole coding sequences, the ‘Effective Number of Codons’ (ENC) values (Wright, 1990; Schubert and Putonti, 2010) of 144 HCV strains were calculated. The ENC value is the best overall estimator of the absolute synonymous codon usage bias of a gene/ORF (Comeron and Aguade, 1998). The ENC values are always between 20 (when only one codon is used for each amino acid) and 61 (when all codons are used equally) (Sharp and Li, 1986). A plot was drawn to show the distribution of the GC3s and ENC values among 144 HCV strains. The codon adaptation index (CAI) of all HCV ORFs was also calculated. The CAI value ranges from 0 to 1. The CAI was proposed as a quantitative way of predicting the expression level of a gene based on its codon sequence. The most frequent codons simply have the highest relative adaptiveness values, and sequences with higher CAIs are preferred over


those with lower CAIs (Kadam and Ghosh, 2008). A rare codon was defined as one whose frequency was less than 30% that of its most abundant synonym in each of the codon usage tables (Gavrilin et al., 2000; Sanchez et al., 2003), and a comparative analysis of the codon usage was implemented between HCV and human cells.

2.4. Analysis of influencing factors of codon usage pattern

Correlation analyses between the ENC value and GC3s, GC1,2s, GC% (ORF), GC% (5′-UTR), GC% (3′-UTR), and the aromaticity and hydrophobicity of the polyprotein encoded by each gene were carried out among the 144 samples using the Pearson’s rank correlation analysis method (Ewens and Grant, 2001). Principal component analysis (PCA) was used to investigate the major trend in codon usage variation among ORFs (Jolliffe, 2002; Mardia et al., 1979; Liu et al., 2011). In order to minimize the effect of amino acid composition on codon usage, each ORF is represented as a 59-dimensional vector. Each dimension corresponds to the RSCU value of one sense codon (excluding AUG, UGG and the termination codons). The indices mentioned above were calculated with the program CodonW, and all statistical analyses were done with the statistical software SPSS 17.0.

3. Results

3.1. Compositional properties

Among the 144 samples, the GC3s values ranged from 56.5% to 69.1% with a mean value 65.9% and a standard error 0.03, the GC1,2s values ranged from 53.3% to 55.1% with a mean value 54.4% and a standard error 0.003, and the GC content of ORF ranged from 54.7% to 59.4% with a mean value 58.2% and a standard error 0.012 (Supplementary Table 2). These results indicated that HCV is a GC abundant virus.

3.2. Codon usage

The RSCU values of each codon of 144 HCV complete coding sequences indicated that all abundant codons were C/G-ended codons and the overwhelming majority of less-abundant codons were A/U-ended codons.
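The RSCU values and the rare-codon rule used in this section can be sketched as follows. This is an illustrative Python sketch rather than the CodonW procedure used in the study. It assumes the standard genetic code, and the names `rscu` and `rare_codons` are hypothetical. RSCU is taken as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. Within one amino acid, the ratio of two RSCU values equals the ratio of the codon frequencies, so the 30% rule can be applied to RSCU values directly.

```python
from collections import Counter

# Standard genetic code in RNA alphabet (UCAG order); '*' marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Synonymous families, excluding Met (AUG), Trp (UGG) and stops: 59 codons.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)


def rscu(orf):
    """Map each of the 59 codons to its RSCU value (None if the amino acid is absent)."""
    seq = orf.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            values[c] = counts[c] * len(codons) / total if total else None
    return values


def rare_codons(values, threshold=0.3):
    """Codons used at less than `threshold` times their most abundant synonym."""
    rare = []
    for codons in FAMILIES.values():
        observed = [values.get(c) for c in codons if values.get(c) is not None]
        if not observed:
            continue
        top = max(observed)
        rare.extend(c for c in codons
                    if values.get(c) is not None and values[c] < threshold * top)
    return sorted(rare)
```

As a check against Table 1, passing the six Leu values (UUA 0.24, UUG 0.94, CUU 0.89, CUC 1.73, CUA 0.57, CUG 1.63) to `rare_codons` flags only UUA, because 0.24 is below 0.3 × 1.73.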
Although the first nucleotide position is a synonymous position in Leu (UUA-CUA, UUG-CUG) and Arg (CGA-AGA, CGG-AGG), all the G-ended codons (UUG, CUG, CGG and AGG) are used more often than the A-ended codons (UUA, CUA, CGA and AGA) when the coding sequence of HCV is being translated (Table 1). This phenomenon suggested that the codon usage bias of HCV was related to the G/C bias of the coding sequences. The CAI values among the 144 HCV ORFs ranged from 0.167 to 0.199 with a mean value 0.179 and a standard error 0.007. These values were low, implying that the codon usage bias and the expression level of HCV were low. Furthermore, the ENC values among the 144 HCV ORFs ranged from 50.68 to 56.99 with a mean value 52.62 and a standard error 1.707 (Supplementary Table 2). Based on a comparison with published data on codon usage bias in other RNA viruses such as bovine viral diarrhea virus (ENC = 51.43), classical swine fever virus (ENC = 51.7), hepatitis A virus (ENC = 39.78) and hepatitis E virus (ENC = 48.2) (Wang et al., 2010; Tao et al., 2009; Sanchez et al., 2003; D’ Andrea et al., 2011; Jenkins and Holmes, 2003), we could conclude that the codon usage bias of the HCV whole coding sequence was lower. In addition, a tendency for the ENC values of genotype 1 strains to be lower than those of the other genotypes (2, 3, 4, 5, 6 and 7) was observed (Fig. 1). Furthermore, the differences of the ENC values between


Table 1. The characteristic of synonymous codon usage of HCV.

AA(a)  Codon  RSCU(b)  Note
Phe    UUU    0.63     c
Phe    UUC    1.37
Leu    UUA    0.24     c
Leu    UUG    0.94     d
Leu    CUU    0.89     c
Leu    CUC    1.73
Leu    CUA    0.57     c
Leu    CUG    1.63
Ile    AUU    0.56     c
Ile    AUC    1.73
Ile    AUA    0.71     c
Val    GUU    0.60     c
Val    GUC    1.37
Val    GUA    0.47     c
Val    GUG    1.55
Pro    CCU    0.95     c
Pro    CCC    1.54
Pro    CCA    0.80     c
Pro    CCG    0.72     d
Thr    ACU    0.82     c
Thr    ACC    1.59
Thr    ACA    0.79     c
Thr    ACG    0.80     d
Ala    GCU    0.93     c
Ala    GCC    1.46
Ala    GCA    0.70     c
Ala    GCG    0.91     d
Tyr    UAU    0.69     c
Tyr    UAC    1.31
His    CAU    0.79     c
His    CAC    1.21
Gln    CAA    0.77     c
Gln    CAG    1.23
Asn    AAU    0.64     c
Asn    AAC    1.36
Lys    AAA    0.65     c
Lys    AAG    1.35
Asp    GAU    0.56     c
Asp    GAC    1.44
Glu    GAA    0.50     c
Glu    GAG    1.50
Cys    UGU    0.63     c
Cys    UGC    1.37
Arg    CGU    0.57     c
Arg    CGC    1.16
Arg    CGA    0.52     c
Arg    CGG    1.25
Arg    AGA    0.79     c
Arg    AGG    1.71
Gly    GGU    0.64     c
Gly    GGC    1.53
Gly    GGA    0.61     c
Gly    GGG    1.22
Ser    AGU    0.40     c
Ser    AGC    1.25
Ser    UCU    0.90     c
Ser    UCC    1.88
Ser    UCA    0.83     c
Ser    UCG    0.74     d

The abundant codons (RSCU > 1.0) are unmarked.
a Stands for amino acid.
b Stands for the relative synonymous codon usage value.
c Stands for less-abundant codons which are A/U-ended.
d Stands for less-abundant codons which are G/C-ended.

Fig. 1. Distribution of GC3 and ENC of 144 HCV samples. The solid line shows the expectation of ENC given GC3 according to the hypothesis that there is no selection and G/C biases at silent sites are due to mutation (Wright, 1990).
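The solid line in Fig. 1 follows the null expectation of Wright (1990), in which the expected ENC depends only on the GC3s value s: ENC = 2 + s + 29 / (s² + (1 − s)²). A minimal Python sketch of that curve is given below; the function name `expected_enc` is hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC under GC3s-driven mutation alone (Wright, 1990).

    gc3s is the G + C frequency at synonymous third positions, as a
    fraction between 0 and 1.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)
```

At the mean GC3s reported for the 144 ORFs (0.659), this curve gives an expected ENC of roughly 55.3, which lies above the reported mean ENC of 52.62.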

the genotype 1 strains and the others (genotypes 2, 3, 4, 5, 6 and 7) are statistically significant (Mann–Whitney test, P < 0.001). The results implied that the codon usage bias of the genotype 1 strains was stronger than that of the other genotypes (2, 3, 4, 5, 6 and 7). However, the distinction of synonymous codon usage bias among genotypes 2, 3, 4, 5, 6 and 7 strains was not remarkable. This implied that genotype 1 strains may be subject to other specific selection pressures that explain the differences between genotype 1 and the other genotypes.

Among 59 synonymous codons, two rare codons were detected. By comparing the codon usage between HCV and human cells, we found that the codon usage of HCV was partially antagonistic to that of human cells. In detail, six codons (GUA, AUA, UUG, CUU, CUA and UCG) that were rare in human cells were not rare in HCV, while one rare codon of HCV (AGU) was not rare in host cells. In addition, the synonymous codon usage patterns of Gly, Lys, Asn, Gln, His, Glu, Asp, Tyr, Cys, Phe, Pro and Arg of HCV were in agreement with those of human cells, and the other rare codon of HCV (UUA) was also a rare codon in human cells (Supplementary Table 3).

3.3. Influencing factors of codon usage pattern

Firstly, the results of correlation analysis showed that significantly negative correlations existed between the ENC values and GC3s, GC1,2s and the GC content of the complete coding sequence (r = −0.974, p < 0.01; r = −0.639, p < 0.01; r = −0.964, p < 0.01, respectively) (Table 2). These analyses indicated that the codon usage pattern was directly related to the nucleotide composition of the coding sequence. Secondly, the r values showed that ENC values had a negative correlation with the GC content of the 3′-UTR (r = −0.650, p < 0.01) and the 5′-UTR (r = −0.592, p < 0.01) (Table 2). The results showed that the nucleotide composition of the UTRs was also important in affecting the codon usage pattern of the virus.
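The r values reported here come from SPSS. As an illustration only, the product-moment correlation coefficient can be computed as in the following Python sketch. The function name `pearson_r` and the four data points are made up for the example and are not taken from the 144 HCV sequences. The sketch also assumes the ordinary Pearson coefficient; whether the study used a rank-based variant is not stated beyond the method name.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values: ENC falls linearly as GC3s rises.
toy_gc3s = [0.60, 0.63, 0.66, 0.69]
toy_enc = [56.0, 54.5, 53.0, 51.5]
toy_r = pearson_r(toy_gc3s, toy_enc)
```

A negative r, as in this toy case, means that ORFs with higher GC3s tend to have lower ENC values, that is, stronger codon usage bias.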
We found that all points of the coding sequences lay under the expected curve (Fig. 1). That is, the effective number of codons for all of the 144 complete coding sequences was lower than the expectation. Therefore, it could be concluded that base composition was not the only factor influencing codon usage. The r values showed that ENC values had a positive correlation with the aromaticity and ORF length (r = 0.425, p < 0.01; r = 0.502, p < 0.01), while ENC values had a negative correlation with the hydrophobicity (r = −0.461, p < 0.01) (Table 2). The results showed that the hydrophobicity, aromaticity and ORF length were also critical in affecting the HCV codon usage pattern.

The PCA detected the first principal component (f1), which accounted for 28.89% of the total synonymous codon usage variation, and the second principal component (f2), which accounted for 17.74% of the total variation. A plot of f1 and f2 was drawn according to genotypes. The points of sub-genotype 1a strains and those of 1b strains each clustered tightly, while the points of the other genotypes (2–7) were widely scattered. In addition, the points of sub-genotype 1a and 1b separated clearly on the axis of f2 (Supplementary Figs. 1A and B). This suggested that the codon usage bias between sub-genotype 1a and 1b strains was different. A plot of f1 and f2 was also drawn according to the duration of infection in individuals (acute phase and chronic phase). The picture was somewhat complex: the points representing sub-genotype 1a strains from the acute phase and the chronic phase partly overlapped, while the sub-genotype 1b strains from the acute phase and the chronic phase were separated clearly (Supplementary Fig. 2). This suggested that the length of infection in individuals was probably related to the codon usage pattern in HCV sub-genotype 1b strains.
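The share of variation attributed to f1 is the largest eigenvalue of the covariance matrix of the RSCU vectors divided by the sum of all eigenvalues. The following Python sketch estimates that share by power iteration. It is only an illustration: the study used SPSS, the function name `first_component_share` is hypothetical, and PCA on the covariance matrix (rather than the correlation matrix) is an assumption.

```python
import math


def first_component_share(rows, iterations=500):
    """Fraction of total variance captured by the first principal component.

    rows: list of equal-length numeric vectors (for example, one
    59-dimensional RSCU vector per ORF).
    """
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two observations")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    total = sum(cov[j][j] for j in range(d))  # trace = total variance
    if total == 0:
        raise ValueError("data have zero variance")
    # Power iteration from a slightly asymmetric start vector.
    v = [1.0 + 0.01 * j for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector lies in the null space")
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the leading eigenvalue.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return eigenvalue / total
```

Power iteration is chosen here only to keep the sketch free of third-party dependencies; a numerical library's eigendecomposition would return all components at once.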
The strains of the other sub-genotypes (1c, 1g, 2c, 2j, 2k, 2q, 3a, 3b, 3i, 3k, 4a, 4b, 4d, 4f, 4k, 5a, 6a, 6b, 6d, 6g, 6h, 6k, 6n, 6q, 6r, 6s, 6t, 6u, 6v, 6w and 7a) have not been analyzed, due to the limited samples. We also performed a plot of f1 and f2


Table 2. Summary of correlation analysis between GC3s, GC1,2s, GC% (ORF), GC% (5′-UTR), GC% (3′-UTR), ORF length, aromaticity and hydrophobicity of each polyprotein of the virus and ENC values, respectively.

Variable        r (vs. ENC)   p
GC3s            −0.974        < 0.01
GC1,2s          −0.639        < 0.01
GC% (ORF)       −0.964        < 0.01
Aro             0.425         < 0.01
Hyd             −0.461        < 0.01
GC% (5′-UTR)    −0.592        < 0.01
GC% (3′-UTR)    −0.650        < 0.01
ORF length      0.502         < 0.01

according to collection time and collection areas. However, the plots failed to show any remarkable relationship between the collection time, country and codon usage pattern (Supplementary Figs. 3 and 4).

4. Discussion

The bovine viral diarrhea virus (BVDV) genome was A/U abundant and one half of the abundant codons of BVDV were A/U-ended codons (Wang et al., 2010). In this study, we found that HCV, which belongs to the same family (Flaviviridae) as BVDV, was G/C abundant and that all abundant codons of HCV were C/G-ended codons. The nucleotide composition of the coding sequence was therefore probably the crucial factor shaping the codon usage pattern.

The codon usage pattern of poliovirus was mostly coincident with that of its host, while the codon usage pattern of HAV is antagonistic to that of its host (Mueller et al., 2006; Sanchez et al., 2003; Liu et al., 2011). The codon usage pattern of HCV is a mixture of coincidence and antagonism to that of the host cells. The coincident portions of codon usage enable the corresponding amino acids to be translated efficiently, and the antagonistic portions of codon usage may enable viral proteins to be folded properly, although the translation efficiency of the corresponding amino acids is decreased. The CAI is designed for predicting the level of gene expression and assessing the adaptation of viral genes to their hosts. It is well known that highly expressed genes exhibit a strong bias for some particular codons in many bacteria and small eukaryotes (Kadam and Ghosh, 2008). The CAI value of HCV was low and the codon usage pattern of HCV was partially antagonistic to that of the human cells, implying that the translation of the coding sequences of HCV was possibly regulated under the fine-tuning translation kinetics selection (Aragones et al., 2010; Komar, 2009).
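For orientation, the CAI of a sequence is the geometric mean of the relative adaptiveness values w of its codons, where w for a codon is its frequency in a reference gene set divided by the frequency of the most used synonymous codon. The Python sketch below illustrates only this final averaging step. The function name `cai` and the weights in the usage note are hypothetical, and the reference set used in the study is not specified here.

```python
import math


def cai(codons, weights):
    """Codon adaptation index: geometric mean of relative adaptiveness values.

    codons: iterable of codon strings from one coding sequence.
    weights: dict mapping codon -> relative adaptiveness w, with 0 < w <= 1.
    Codons without a weight (for example AUG, UGG or stops) are skipped.
    """
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon with a known weight")
    return math.exp(sum(logs) / len(logs))
```

With the made-up weights GCC = 1.0 and GCU = 0.25, the two-codon sequence GCC GCU has a CAI of 0.5, the geometric mean of 1.0 and 0.25.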
We used ENC values to quantify the synonymous codon usage bias, which was widely used to analyze the variation of codon usage bias among the different virus species (Wright, 1990; Schubert and Putonti, 2010). It suggested that the codon usage bias of human RNA viruses was low with an ENC values close to 45 (Jenkins and Holmes, 2003). In the study, we found that the ENC values among the HCV samples ranged from 50.74 to 57.32 with a mean value 53.64 and a standard error 1.85. Compared with other human hepatitis virus such as hepatitis A virus (ENC = 39.78) and hepatitis E virus (ENC = 48.2) (Jenkins and Holmes, 2003; Sanchez et al., 2003; D’ Andrea et al., 2011), HCV has a lower synonymous codon usage bias. One possible explanation was that a low bias was advantageous to virus replication in human cells. That the GC content varies at all codon positions is usually assumed to be the result of mutational bias (Adams and Antoniw, 2004). A general mutational bias which affects the compositional properties of the whole genome would certainly account for the majority of the codon usage variation. Mutational bias is thought to be the dominant factor accounting for the codon usage variation in different organisms (Das et al., 2006; Levin and Whittome, 2000; Shackelton et al., 2006). In this study, we found that there was significant correlation between the nucleotide composition and the codon usage pattern and this was in agreement with the previous

reports about other viruses (Gu et al., 2004; Liu et al., 2010). Since mutation rates in RNA viruses are much higher than those in DNA viruses (Drake and Holland, 1999), it is understandable that mutation pressure is the determinant source of codon usage bias in the HCV strains in this study.

Generally, ORF length (bases), aromaticity and hydrophobicity of the corresponding polyprotein are thought to be the factors through which translational selection accounts for the codon usage pattern in different organisms (Wang et al., 2010; Zhong et al., 2007). It was reported that the codon usage was influenced by the hydropathy strength of each protein in Chlamydia trachomatis and Thermotoga maritima (Zavala et al., 2002; Romero et al., 2000; Zhong et al., 2007). There was a negative correlation between ENC values and hydrophobicity, and a positive correlation between ENC values and aromaticity in HCV, implying that translational selection pressure may be involved in the HCV codon usage pattern. The link between hydropathy and codon usage may arise because the expressed proteins are hydrophilic, as they accomplish their function in the aqueous media of the cell (Zhong et al., 2007).

The general association between codon usage bias and base composition suggests that mutational pressure rather than natural selection is the dominant factor accounting for the codon usage bias in the complete coding sequences of HCV. This was supported by the fact that the absolute r values of the correlations between ENC values and GC3s, GC1,2s, and the GC content of the ORF, 5′-UTR and 3′-UTR (0.974, 0.639, 0.964, 0.592 and 0.650, respectively) were all higher than the absolute r values of the correlations between ENC values and aromaticity, hydrophobicity and ORF length (0.425, 0.461 and 0.502, respectively).
It has been suggested that the patients infected with HCV subgenotypes 1b and, to a lesser degree, 1a are less likely to have a favorable response to interferon treatment than those infected with genotype 2 or 3 (Zein et al., 1996; Zein, 2000). In our study, a tendency was also observed that the codon usage bias of the genotype 1 strains were stronger than the other genotypes (2, 3, 4, 5, 6 and 7). The existence of correlation between the codon usage pattern and the heterogeneous virologic response rates to interferon-based therapy in patients with chronic hepatitis C caused by HCV genotype 1 and other genotypes needs to be further confirmed. By analyzing the codon usage pattern of subtype 1b strains from the acute phase and the chronic phase, we found the duration of infection in individuals may be related to the codon usage pattern in HCV sub-genotype 1b strains. However, this conclusion needs to be further analyzed due to the limited acute phase samples of HCV subtype 1b. Taken together, our analysis revealed that the mutational pressure was the dominant factor accounting for the codon usage pattern, and the selection pressure also accounted for HCV codon usage pattern. Acknowledgments This work was supported in parts by Grants from International Science and Technology Cooperation Program of China (No. 2010DFA32640) and Science and Technology Key Project of Gansu Province (No. 0801NKDA034). This study was also supported by


National Natural Science Foundation of China (Nos. 31172335 and 31072143).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.meegid.2011.08.025.

References

Adams, M.J., Antoniw, J.F., 2004. Codon usage bias amongst plant viruses. Arch. Virol. 149, 113–135.
Alter, M.J., 1995. Epidemiology of hepatitis C in the West. Semin. Liver Dis. 15, 5–14.
Aragones, L., Bosch, A., Pinto, R., 2008. Hepatitis A virus mutant spectra under the selective pressure of monoclonal antibodies: codon usage constrains limit capsid variability. J. Virol. 82, 1688–1700.
Aragones, L., Guix, S., Ribes, E., Bosch, A., Pinto, R.M., 2010. Fine-tuning translation kinetics selection as the driving force of codon usage bias in the Hepatitis A virus capsid. PLoS Pathog. 6, e1000797.
Bulmer, M., 1988. Are codon usage patterns in unicellular organisms determined by selection–mutation balance? J. Evol. Biol. 1, 15–26.
Choo, Q.L., Kuo, G., Weiner, A.J., Overby, L.R., Bradley, D.W., Houghton, M., 1989. Isolation of a cDNA clone derived from a blood-borne non-A, non-B viral hepatitis genome. Science 244, 359–362.
Comeron, J.M., Aguade, M., 1998. An evaluation of measures of synonymous codon usage bias. J. Mol. Evol. 47, 268–274.
D’ Andrea, L., Pintó, R.M., Bosch, A., Musto, H., Cristina, J., 2011. A detailed comparative analysis on the overall codon usage patterns in hepatitis A virus. Virus Res. 157, 19–24.
Das, S., Paul, S., Dutta, C., 2006. Synonymous codon usage in adenoviruses: influence of mutation, selection and protein hydropathy. Virus Res. 117, 227–236.
Domingo, E., 1996. Biological significance of viral quasispecies. Viral Hep. Rev. 2, 247–261.
Drake, J.W., Holland, J.J., 1999. Mutation rates among RNA viruses. Proc. Natl. Acad. Sci. USA 96, 13910–13913.
Duarte, C.A.B., Foti, L., Nakatani, S.M., Riediger, I.N., Poersch, C.O., Pavoni, D.P., Krieger, M.A., 2010. A novel Hepatitis C virus genotyping method based on liquid microarray. PLoS One 5, e12822.
Ewens, W.J., Grant, G.R., 2001. Statistical Methods in Bioinformatics. Springer, New York.
Gavrilin, G.V., Cherkasova, E.A., Lipskaya, G.Y., Kew, O.M., Agol, V.I., 2000. Evolution of circulating wild poliovirus and of vaccine-derived poliovirus in an immunodeficient patient: a unifying model. J. Virol. 74, 7381–7390.
Ghosh, T.C., Gupta, S.K., Majumdar, S., 2000. Studies on codon usage in Entamoeba histolytica. Int. J. Parasitol. 30, 715–722.
Gu, W.J., Zhou, T., Ma, J.M., Sun, X., Lu, Z.H., 2004. Analysis of synonymous codon usage in SARS coronavirus and other viruses in the Nidovirales. Virus Res. 101, 155–161.
Ikemura, T., 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151, 389–409.
Ikemura, T., 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.
Jenkins, G.M., Holmes, E.C., 2003. The extent of codon usage bias in human RNA virus and its evolutionary origin. Virus Res. 92, 1–7.
Jolliffe, I.T., 2002. Principal Component Analysis, second ed. Springer-Verlag Inc., New York.
Kadam, U.S., Ghosh, S.B., 2008. Codon adaptation index analysis of RNA genome plant viruses. Curr. Sci. 94, 1.
Karlin, S., Doerfler, W., Cardon, L.R., 1994. Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses? J. Virol. 68, 2889–2897.
Kim, J., Ahn, Y., Lee, K., Park, S.H., Kim, S., 2010. A classification approach for genotyping viral sequences based on multidimensional scaling and linear discriminant analysis. BMC Bioinform. 11, 434.
Komar, A.A., 2009. A pause for thought along the co-translational folding pathway. Trends Biochem. Sci. 34, 16–24.
Lesnik, T., Solomovici, J., Deana, A., Ehrlich, R., Reiss, C., 2000. Ribosome traffic in E. coli and regulation of gene expression. J. Theor. Biol. 202, 175–178.
Levin, D.B., Whittome, B., 2000. Codon usage in nucleopolyhedroviruses. J. Gen. Virol. 81, 2313–2325.
Liu, Y.S., Zhou, J.H., Chen, H.T., Ma, L.N., Ding, Y.Z., Wang, M., Zhang, J., 2010. Analysis of synonymous codon usage in porcine reproductive and respiratory syndrome virus. Infect. Genet. Evol. 10, 797–803.
Liu, Y.S., Zhou, J.H., Chen, H.T., Ma, L.N., Pejsak, Z., Ding, Y.Z., Zhang, J., 2011. The characteristics of the synonymous codon usage in enterovirus 71 virus and the effects of host on the virus in codon usage pattern. Infect. Genet. Evol. 11, 1168–1173.
Mardia, K.V., Kent, J.T., Bibby, J.M., 1979. Multivariate Analysis. Academic Press, New York.
Mueller, S., Papamichail, D., Coleman, J.R., Skiena, S., Wimmer, E., 2006. Reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity. J. Virol. 80, 9687–9696.
Pawlotsky, J.M., 2006. Hepatitis C virus population dynamics during infection. Curr. Top. Microbiol. Immunol. 299, 261–284.
Romero, H., Zavala, A., Musto, H., 2000. Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces. Nucleic Acids Res. 28, 2084–2090.
Sanchez, G., Bosch, A., Pinto, R.M., 2003. Genome variability and capsid structural constraints of Hepatitis A virus. J. Virol. 77, 452–459.
Schubert, A.M., Putonti, C., 2010. Evolution of the sequence composition of Flaviviruses. Infect. Genet. Evol. 10, 129–136.
Shackelton, L.A., Parrish, C.R., Holmes, E.C., 2006. Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses. J. Mol. Evol. 62, 551–563.
Sharma, S.D., 2010. Hepatitis C virus: molecular biology and current therapeutic options. Indian J. Med. Res. 131, 17–34.
Sharp, P.M., Li, W.H., 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24, 28–38.
Sharp, P.M., Tuohy, T., Mosurski, K., 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14, 5125–5143.
Sugiyama, T., Gursel, M., Takeshita, F., Coban, C., Jacqueline, C., Kaisho, T., Akira, S., Klinman, D.M., Ishii, K.J., 2005. CpG RNA: identification of novel single-stranded RNA that stimulates human CD14+ CD11c+ monocytes. J. Immunol. 174, 2273–2279.
Suzuki, T., Ishii, K., Aizaki, H., Wakita, T., 2007. Hepatitis C viral life cycle. Adv. Drug Deliver Rev. 59, 1200–1212.
Tao, P., Dai, L., Luo, M., Tang, F., Tien, P., Pan, Z., 2009. Analysis of synonymous codon usage in classical swine fever virus. Virus Genes 38, 104–112.
Wang, M., Zhang, J., Zhou, J., Chen, H., Ma, L., Ding, Y., Liu, W., Liu, Y., 2010. Analysis of codon usage in bovine viral diarrhea virus. Arch. Virol. 156, 153–160.
Wright, F., 1990. The “effective number of codons” used in a gene. Gene 87, 23–29.
Zavala, A., Naya, H., Romero, H., Musto, H., 2002. Trends in codon and amino acid usage in Thermotoga maritima. J. Mol. Evol. 54, 563–568.
Zein, N.N., 2000. Clinical significance of hepatitis C virus genotypes. Clin. Microbiol. Rev. 13, 223–235.
Zein, N.N., Rakela, J., Krawitt, E.L., Reddy, K.R., Tominaga, T., Persing, D.H., the Collaborative Study Group, 1996. Hepatitis C virus genotypes in the United States: epidemiology, pathogenicity, and response to interferon therapy. Ann. Intern. Med. 125, 634–639.
Zhong, J., Li, Y., Zhao, S., Liu, S., Zhang, Z., 2007. Mutation pressure shapes codon usage in the GC-Rich genome of foot-and-mouth disease virus. Virus Genes 35, 767–776.