Infection, Genetics and Evolution 54 (2017) 458–465
Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid
Short communication
Codon usage bias in the N gene of rabies virus
Wanting He¹, Hongyu Zhang¹, Yuchen Zhang¹, Ruyi Wang, Sijia Lu, Yanjie Ji, Chang Liu, Pengkun Yuan, Shuo Su⁎
Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
ARTICLE INFO
Keywords: Rabies virus; Nucleoprotein; Evolution analysis; Phylogenetic analysis; Codon usage bias
ABSTRACT
Since its emergence, rabies virus (RABV) has been a major concern worldwide, especially in developing countries. The nucleoprotein (N) of RABV is highly conserved and is key for genetic typing. A better understanding of the evolutionary trajectory of the N gene can therefore assist the development of control measures. We found that the RABV N gene has a low codon usage bias, with a mean effective number of codons (ENC) of 56.33. This bias is influenced by both mutation pressure and natural selection, although neutrality analysis indicated that natural selection dominates over mutation pressure. Dinucleotide bias also partly contributed to RABV codon usage bias. In addition, among the clades of the phylogenetic tree, the Africa 2 clade had the highest evolutionary rate, with a mean of 3.75 × 10⁻³ substitutions per site per year. Overall, our results on codon usage in the RABV N gene will inform future research on RABV evolution.
1. Introduction
Rabies is a fatal viral disease that has caused epidemics in 150 countries on every continent except Antarctica, and it remains a neglected public health problem (Baghi et al., 2016). More than 55,000 human rabies deaths occur worldwide every year, almost 99% of them in developing countries (Knobel et al., 2005; Consultation WHOE et al., 2005). Asia alone accounts for over 80% of human rabies cases worldwide (Tang et al., 2005). Rabies virus (RABV) belongs to the genus Lyssavirus in the family Rhabdoviridae (Nel and Markotter, 2007). The natural history of RABV includes multiple host switches, which allows comparative studies of the evolutionary patterns, processes and dynamics associated with host adaptation (Troupin et al., 2016). Previous studies showed that RABV isolates fall mainly into two phylogenetic groups: the dog-related and the bat-related RABV groups (Kuzmin et al., 2012). Bat-related isolates circulate mainly in bats and in rural skunks and raccoons (Kuzmin et al., 2012; Biek et al., 2007). Dog-related isolates occur not only in domestic dogs worldwide but also in wildlife, such as foxes in Europe, the Middle East and the Americas, raccoons in Asia and mongooses in Africa (Oem et al., 2013; Nel et al., 2005; Bourhy et al., 1999). Dogs are the main virus reservoir and the major source of disease dissemination (Tang et al., 2005).

The RABV genome is a single molecule of negative-sense RNA. It encodes five proteins: the nucleoprotein (N), the phosphoprotein (P), the matrix protein (M), the glycoprotein (G) and the large RNA-dependent RNA polymerase (L) (Schnell et al., 2009). N, P and L are required for viral RNA synthesis (Conzelmann and Schnell, 1994), while M and G are essential for virus release and infectivity (Mebatsion et al., 1996; Mebatsion et al., 1999). During replication, the N protein encapsidates the viral RNA to form the ribonucleoprotein (RNP) complex. This N–RNA complex serves as the template for transcription and replication by the viral polymerase complex, which comprises the P and L proteins (Tordo and Kouknetzoff, 1993; Tordo et al., 1988). The G protein is anchored in the viral lipid envelope (Mebatsion et al., 1999) and mediates infection of the host cell (Dietzschold et al., 1983). The M protein is an important determinant of pathogenicity (Finke et al., 2010). The L protein is multifunctional (Surhone et al., 2010), and the G protein carries antigenic determinants (Paul et al., 2008). Compared with the other four proteins, the N protein is highly conserved and is used for diagnosis and classification of the virus (Mehta et al., 2016). The N protein also plays a significant role in evasion of host innate immunity (Masatani et al., 2013) and in pathogenicity (Masatani et al., 2011), so the evolution of the N gene merits investigation.

The genetic code is degenerate because most amino acids are encoded by more than one codon (Lagerkvist, 1978). Synonymous codons are not used with equal frequency in different organisms, and this non-random usage is termed codon usage bias (Marín et al., 1989). In general, optimal codon usage is thought to improve translational efficiency and accuracy (Zhou et al., 2015), and highly expressed genes show a consistent codon usage pattern (Grantham et al., 1980). Codon usage bias plays an important role in viral evolution (Wang et al., 2011). Several factors can influence codon usage patterns, including:
- mutation pressure;
- natural or translational selection;
- protein secondary structure;
- replication and selective transcription;
- protein hydrophobicity and hydrophilicity;
- the external environment (Moratorio et al., 2013; Liu et al., 2011).
Codon usage bias has been reported for many RNA viruses, and mutation pressure and natural selection are the two main factors influencing it. The relationship between viral and host codon usage affects both viral survival and host health, because it bears on immune evasion as well as on viral evolution and fitness (Moratorio et al., 2013). Studying viral codon usage bias can therefore deepen our understanding of how viral gene expression is regulated and support the development of better vaccines (Butt et al., 2016). Few studies have analyzed RABV genes on a large, worldwide scale, and the N gene in particular has received little attention. Here, we analyzed in detail and compared the codon usage pattern and evolutionary rate of RABV from bats and dogs.

⁎ Corresponding author. E-mail address: [email protected] (S. Su).
¹ Wanting He, Hongyu Zhang and Yuchen Zhang contributed equally to this work.
http://dx.doi.org/10.1016/j.meegid.2017.08.012
Received 8 May 2017; Received in revised form 11 August 2017; Accepted 12 August 2017
Available online 14 August 2017
1567-1348/ © 2017 Elsevier B.V. All rights reserved.

2. Materials and methods
2.1. Sequence data
A total of 655 N gene sequences spanning 85 years (1931–2015) were retrieved from the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/). To obtain statistically reliable codon usage estimates, only sequences of at least 1353 bp were analyzed. These sequences came from a wide variety of host species and were collected in 70 countries between 1931 and 2015. After isolates from wild animals were removed, 649 sequences were used for codon usage analysis. Details of the 655 strains are shown in Table S1, including accession number, sequence name, country of origin, year of isolation and host.

2.2. Calculation of nucleotide contents
The nucleotide contents (A%, C%, G%, U%) of each RABV coding sequence were calculated with BioEdit. GC1s, GC2s, GC12s and GC3s values were calculated with CodonW (http://emboss.toulouse.inra.fr/cgi-bin/emboss/cusp).

2.3. Codon usage indices
2.3.1. Relative synonymous codon usage (RSCU)
To evaluate synonymous codon usage bias, the RSCU of the RABV N gene was estimated (Sharp and Li, 1986). RSCU values were calculated as described previously (Cai et al., 2009) with the EMBOSS cusp online tool (http://emboss.toulouse.inra.fr/cgi-bin/emboss/cusp), using the formula:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{j=1}^{n_i} g_{ij}}\, n_i$$

Here $g_{ij}$ is the observed number of the jth codon for the ith amino acid, which has $n_i$ synonymous codons (Kumar et al., 2016). Because each count is normalized by the total usage of its amino acid, RSCU reflects codon preference independently of amino acid composition and coding sequence length (Zhou et al., 2015). An RSCU > 1 indicates positive codon usage bias, meaning the codon is used more often than expected under equal synonymous usage. An RSCU < 1 indicates negative codon usage bias (Butt et al., 2014). For comparison with the RABV N gene, RSCU values of the canine host were obtained from the Codon Usage Database (http://www.kazusa.or.jp/codon/).

2.3.2. Effective number of codons (ENC)
ENC measures how far codon usage departs from equal use of the synonymous codons within each codon family. Its values range from 20, when only one codon is used per amino acid, to 61, when all synonymous codons are used equally. The lower the ENC, the stronger the codon usage bias (Wright, 1990). ENC reflects the combined influence of natural selection and mutation pressure on codon usage. ENC was calculated as:

$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$

Here $\bar{F}_i$ ($i = 2, 3, 4, 6$) is the mean F value of the i-fold degenerate amino acids (Chen et al., 2014). GC3s is the frequency of G + C at the third position of synonymous codons. It was calculated as the number of G + C at that position divided by the total number of A + U + G + C at that position (Vicario et al., 2007).

2.4. Analysis of parameters shaping codon usage bias of the N gene of RABV
2.4.1. ENC-plot analysis
An ENC-plot (ENC plotted against GC3s) is widely used to identify the factors that determine codon usage patterns (Wright, 1990). In genes whose codon choice is constrained only by G + C mutational bias, points lie on or close to the continuous curve of expected ENC values (Tsai et al., 2007). The expected ENC values were calculated as:

$$\mathrm{ENC}_{\text{expected}} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$

Here s is the frequency of G or C at the third position of synonymous codons, excluding Met, Trp and stop codons (i.e. GC3s) (Kumar et al., 2016). ENC-plot analysis of the RABV N gene was performed separately for strains grouped by phylogenetic clade, host species and geographical distribution.

2.4.2. Principal component analysis (PCA)
PCA is a multivariate statistical method widely used to identify the major trends in codon usage variation among genes (Gupta and Ghosh, 2001). Results are displayed by plotting the first axis against the second. In this analysis, each sequence is represented by a 59-dimensional vector of RSCU values. PCA transforms these vectors into a smaller number of uncorrelated components (Morla et al., 2016). PCA was performed with GraphPad Prism 5.0 for strains grouped by clade, host species and country.

2.4.3. Grand average of hydropathicity (Gravy) and aromaticity (Aroma) analysis
Gravy measures the overall hydrophobicity or hydrophilicity of a protein (Kyte and Doolittle, 1982). Positive values indicate a hydrophobic (nonpolar) protein and negative values a hydrophilic (polar) protein. Aroma is the frequency of aromatic amino acids in the protein. Both indices describe the encoded protein rather than the nucleotide sequence. Their correlations with codon usage indices are therefore used to assess the contribution of natural selection to codon usage bias (Kumar et al., 2016).

2.4.4. Neutrality analysis
Neutrality analysis is used to determine the relative roles of mutation pressure and natural selection in shaping codon usage bias (Sueoka, 1988). GC12s, the mean GC content at the first and second codon positions, was plotted against GC3s with GraphPad Prism 5.0. The slope of the regression line estimates the relative contribution of mutation pressure. A slope close to 1 indicates that mutation pressure dominates, whereas a slope close to 0 indicates that natural selection dominates.
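The codon usage indices of Sections 2.3.1, 2.3.2, 2.4.1 and 2.4.4 can be illustrated with a short Python sketch. It is not the CodonW or EMBOSS code used in this study. The sketch makes these assumptions:
- It uses the standard genetic code.
- It reads T as U.
- It counts only complete in-frame codons.
- It follows Wright's (1990) averaging of F within each degeneracy class, including his fallback for a missing F3 and the cap at 61.

All function names are hypothetical.

```python
from collections import Counter

# Standard genetic code (RNA alphabet). Met (AUG), Trp (UGG) and the three stop
# codons are omitted, leaving the 59 synonymous codons of 18 amino acids.
SYNONYMOUS = {
    "Ala": ["GCA", "GCC", "GCG", "GCU"],
    "Arg": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "Asn": ["AAC", "AAU"],
    "Asp": ["GAC", "GAU"],
    "Cys": ["UGC", "UGU"],
    "Gln": ["CAA", "CAG"],
    "Glu": ["GAA", "GAG"],
    "Gly": ["GGA", "GGC", "GGG", "GGU"],
    "His": ["CAC", "CAU"],
    "Ile": ["AUA", "AUC", "AUU"],
    "Leu": ["CUA", "CUC", "CUG", "CUU", "UUA", "UUG"],
    "Lys": ["AAA", "AAG"],
    "Phe": ["UUC", "UUU"],
    "Pro": ["CCA", "CCC", "CCG", "CCU"],
    "Ser": ["AGC", "AGU", "UCA", "UCC", "UCG", "UCU"],
    "Thr": ["ACA", "ACC", "ACG", "ACU"],
    "Tyr": ["UAC", "UAU"],
    "Val": ["GUA", "GUC", "GUG", "GUU"],
}


def codon_counts(seq):
    """Count complete in-frame codons; DNA input (T) is read as RNA (U)."""
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def rscu(counts):
    """RSCU_ij = g_ij * n_i / sum_j g_ij for every synonymous codon."""
    values = {}
    for codons in SYNONYMOUS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Undefined (NaN) when the amino acid does not occur at all.
            values[c] = counts[c] * len(codons) / total if total else float("nan")
    return values


def gc3s(counts):
    """Fraction of G or C at the third position of the 59 synonymous codons."""
    syn = [c for codons in SYNONYMOUS.values() for c in codons]
    total = sum(counts[c] for c in syn)
    return sum(counts[c] for c in syn if c[2] in "GC") / total


def enc(counts):
    """Wright's (1990) effective number of codons, capped at 61."""
    f_by_class = {2: [], 3: [], 4: [], 6: []}
    for codons in SYNONYMOUS.values():
        n_aa = sum(counts[c] for c in codons)
        if n_aa < 2:
            continue  # F is undefined for fewer than two codons
        sum_p2 = sum((counts[c] / n_aa) ** 2 for c in codons)
        f_by_class[len(codons)].append((n_aa * sum_p2 - 1) / (n_aa - 1))
    # Assumes the 2-, 4- and 6-fold classes are all represented.
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items() if v}
    if 3 not in mean_f:
        # Wright's fallback when Ile is absent: average of F2 and F4.
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(value, 61.0)


def expected_enc(s):
    """Expected ENC when codon usage is shaped only by GC3s mutational bias."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def neutrality_slope(gc3s_values, gc12s_values):
    """Least-squares slope of GC12s regressed on GC3s (neutrality plot)."""
    n = len(gc3s_values)
    mx = sum(gc3s_values) / n
    my = sum(gc12s_values) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(gc3s_values, gc12s_values))
    var = sum((x - mx) ** 2 for x in gc3s_values)
    return cov / var
```

Applied to each N gene sequence, `enc` and `gc3s` give one point of the ENC-plot, and `expected_enc` traces the reference curve. For example, `expected_enc(0.5)` returns 60.5. The paper reads the neutrality slope as the share of codon usage attributable to mutation pressure; Section 3.5 reports a slope of 0.003464, i.e. 0.3464%. Values from this sketch may differ slightly from CodonW output, for example in how rare amino acids are handled.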
2.4.5. Analysis of dinucleotide abundances
To assess the contribution of dinucleotide abundance to codon usage bias, dinucleotide compositions were calculated with DAMBE. The relative abundance of each of the 16 dinucleotides was calculated as (Karlin and Burge, 1995):

$$P_{xy} = \frac{f_{xy}}{f_x f_y}$$

Here $f_x$ and $f_y$ are the frequencies of nucleotides x and y, and $f_{xy}$ is the observed frequency of dinucleotide xy. The product $f_x f_y$ is its expected frequency. A dinucleotide with $P_{xy} > 1.23$ is considered over-represented, and one with $P_{xy} < 0.78$ under-represented (Nasrullah et al., 2015).

2.4.6. Correlation analysis
Correlations among A%, U%, G%, C%, A3s, U3s, G3s, C3s, GC3s, ENC, Aroma and Gravy were calculated with GraphPad Prism 5.0.

2.5. The evolutionary rate of the N gene of RABV
Africa 3 and the Indian subcontinent were excluded. For each of the remaining phylogenetic clades, N gene sequences were compared with the earliest sequence of that clade. The evolutionary rate was expressed as the number of substitutions per site per year (Troupin et al., 2016).

3. Results
3.1. Composition characteristics of the N gene of RABV
The nucleotide composition of the N gene of 649 RABV strains was analyzed (Table S2). The mean contents (± standard deviation, SD) were A% 29.13 ± 0.64, C% 20.51 ± 0.42, G% 23.91 ± 0.44 and U% 26.45 ± 0.68. A + U (55.58%) therefore exceeded G + C (44.42%), and A was the most abundant nucleotide in the selected sequences. Differences in content among the four nucleotides were modest. The SDs of all four nucleotides were very small, with C showing the lowest. Base composition therefore varies little among strains, particularly for C.

Table 1 RSCU analysis of the RABV N gene.
Amino acid  Codon  RSCU (RABV)  RSCU (canine)
Ala         GCA    1.456*       0.793
            GCC    1.021        1.754*
            GCG    0.293        0.457
            GCU    1.231        0.996
Cys         UGC    0.861        1.155*
            UGU    1.139*       0.845
Asp         GAC    0.969        1.142*
            GAU    1.031*       0.858
Glu         GAA    0.802        0.792
            GAG    1.198*       1.208*
Phe         UUC    1.209*       1.176*
            UUU    0.791        0.824
Gly         GGA    1.447*       0.968
            GGC    0.594        1.387*
            GGG    1.107        0.997
            GGU    0.851        0.648
His         CAC    0.869        1.221*
            CAU    1.131*       0.779
Ile         AUA    1.083*       0.446
            AUC    0.892        1.593*
            AUU    1.025        0.961
Lys         AAA    0.746        0.791
            AAG    1.254*       1.209*
Leu         CUA    0.906        0.388
            CUC    0.637        1.303
            CUG    1.357        2.558*
            CUU    0.755        0.699
            UUA    0.877        0.347
            UUG    1.469*       0.705
Asn         AAC    0.897        0.711*
            AAU    1.103*       0.543
Pro         CCA    0.779        1.014
            CCC    1.014        1.417*
            CCG    0.666        0.486
            CCU    1.542*       1.083
Gln         CAA    1.014*       0.505
            CAG    0.986        1.495*
Arg         AGA    2.758*       1.195
            AGG    1.395        1.264*
            CGA    0.633        0.638
            CGC    0.167        1.207
            CGG    0.444        1.252
            CGU    0.604        0.444
Ser         AGC    0.427        1.486*
            AGU    1.081        0.849
            UCA    1.546*       0.771
            UCC    0.967        1.447
            UCG    0.568        0.362
            UCU    1.411        1.085
Thr         ACA    1.198        1.031
            ACC    0.981        1.554*
            ACG    0.477        0.523
            ACU    1.344*       0.893
Val         GUA    0.614        0.406
            GUC    1.202        1.075
            GUG    0.918        1.938*
            GUU    1.266*       0.581
Tyr         UAC    0.649        1.207*
            UAU    1.351*       0.793
Asterisks (*) denote the eighteen abundant codons of RABV and canine.
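The composition measures reported in Section 3.1 and the dinucleotide relative abundance defined in Section 2.4.5 both reduce to simple counting. The Python sketch below is a minimal illustration, not the BioEdit or DAMBE procedure used in the study. It makes these assumptions:
- The input is a single-stranded sequence.
- T is read as U.
- Dinucleotides are counted as overlapping pairs.
- The 1.23/0.78 cut-offs from Section 2.4.5 classify the result.

Function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"


def base_composition(seq):
    """Percentage of A, C, G and U in a sequence (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq)
    total = sum(counts[b] for b in BASES)
    return {b: 100 * counts[b] / total for b in BASES}


def dinucleotide_odds(seq):
    """P_xy = f_xy / (f_x * f_y) over overlapping dinucleotides of one strand."""
    seq = seq.upper().replace("T", "U")
    mono = Counter(b for b in seq if b in BASES)
    n_mono = sum(mono.values())
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_pairs = sum(pairs[x + y] for x, y in product(BASES, repeat=2))
    odds = {}
    for x, y in product(BASES, repeat=2):
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        f_xy = pairs[x + y] / n_pairs  # observed dinucleotide frequency
        # f_x * f_y is the frequency expected if neighbouring bases were independent.
        odds[x + y] = f_xy / (f_x * f_y) if f_x and f_y else float("nan")
    return odds


def classify(p_xy, over=1.23, under=0.78):
    """Label a relative abundance using the cut-offs of Section 2.4.5."""
    if p_xy > over:
        return "over-represented"
    if p_xy < under:
        return "under-represented"
    return "within expected range"
```

For instance, `classify(0.57)` returns "under-represented", matching the mean CpG value in Table 3.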
3.2. Analysis of codon usage indices
3.2.1. RSCU analysis
The RSCU values of the 59 codons are listed in Table 1. The eighteen preferred codons were:
- GCA (Ala), UGU (Cys), GAU (Asp), GAG (Glu), UUC (Phe), GGA (Gly);
- CAU (His), AUA (Ile), AAG (Lys), UUG (Leu), UAU (Tyr), GUU (Val);
- ACU (Thr), UCA (Ser), AGA (Arg), CAA (Gln), CCU (Pro) and AAU (Asn).
All 18 had RSCU values > 1, indicating codon usage bias in the RABV N gene. Eight of these codons end with U, six with A, three with G and only one with C. Synonymous codon usage therefore favors A/U at the third codon position and is unequal across the N gene.

3.2.2. ENC analysis
To assess the degree of codon usage bias of the RABV N gene, ENC values were calculated. They ranged from 51.42 to 60.89 (mean ± SD, 56.33 ± 1.68). These high values indicate that codon usage bias in the RABV N gene is low.

3.3. The effect of mutation bias on codon usage of the N gene
To determine whether mutation pressure shapes codon usage, ENC-plots were drawn for strains grouped by phylogenetic clade, host species and country (Fig. 1A, B, C). Strains of each lineage clustered together, although the clades partly overlapped (Fig. 1A). Similarly, strains from the same host species or country of isolation largely clustered together (Fig. 1B–C). A few Asian strains lay above the expected curve, whereas a larger proportion of strains lay below it. This indicates that the low codon usage bias is affected by nucleotide composition. Because the third codon position is the wobble position, GC3s mainly reflects synonymous mutation pressure. The codon usage pattern of the N gene is therefore affected by mutation pressure, although other factors such as natural selection may also contribute.

To explore the effect of mutation pressure further, we analyzed correlations among three sets of measures (Table 2):
- third-position composition (A3s, U3s, G3s, C3s and GC3s);
- ENC;
- nucleotide composition (A%, U%, G%, C% and GC%).
ENC was significantly correlated with all five nucleotide composition measures, while third-position composition correlated differently with different nucleotides. U3s and C3s were significantly correlated with A%, C%, U% and GC%. A3s and G3s were significantly correlated with all nucleotide composition measures. Overall nucleotide composition is therefore closely linked to composition at the third codon position. Thus, the low codon usage bias of the RABV N gene is affected by mutation bias.

PCA was likewise performed for strains grouped by clade, host and geographical distribution (country of isolation) (Fig. 2A, B and C, respectively). The first four axes accounted for 22.71%, 15.58%, 13.26% and 10.17% of the variation in RSCU, respectively. The first two axes (38.29% together) therefore capture the main trends in codon usage. Points representing the six lineage clades were widely distributed, except for a few points from the Indian subcontinent clade (Fig. 2A). Asian and bat points lay near the origin. This indicates that mutation pressure and natural selection contribute to codon usage of the different clades, especially the Indian subcontinent, Asian and bat clades. Points grouped by host species were also widely distributed but mainly formed three groups (Fig. 2B):
- bat-related strains lay mostly to the left of the 2nd axis;
- dog-related strains lay mostly to the right of it;
- strains from Homo sapiens lay near the origin.
Codon usage is therefore affected by mutation pressure and natural selection, but to different degrees in the three host species. Points representing the seventy countries were widely distributed, with no particular pattern (Fig. 2C). Overall, the spread of all strains indicates roles for both mutation pressure and natural selection.

Fig. 1. ENC plotted against GC3s. The solid curve represents the expected ENC. A) Strains grouped by phylogenetic clade, excluding Africa 3, are marked with different colors; all Indian subcontinent isolates are shown in red. B) Strains grouped by host species; bat, dog and human strains are shown in purple, red and green, respectively. C) Strains grouped by country of isolation are marked with different colors. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. PCA of RSCU values. A) Strains grouped by phylogenetic clade; blue, light green, orange, dark green, red and yellow denote the same clades as in Fig. 1A. B) Strains grouped by host species; purple, red and green denote the same hosts as in Fig. 1B. C) Strains grouped by country are marked with different colors. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.4. The effect of natural selection on codon usage of the N protein
To estimate the role of natural selection in shaping codon usage bias of the RABV N gene, correlations of Gravy and Aroma with ENC and third-position composition were analyzed (Table 2). Gravy was significantly correlated with U3s, C3s and GC3s, while Aroma was significantly correlated with U3s, A3s and ENC. These results indicate that natural selection contributes to codon usage bias of the RABV N gene. Comparison of RABV RSCU values with those of the canine host showed little similarity among the 18 preferred codons. Only 3 (GAG, UUC and AAG) were shared with the host. Moreover, of the 18 preferred canine codons, 12 end with C and 6 with G, in contrast to the A/U-ending preference of RABV.

Table 2 Correlation analysis among codon composition, ENC value and nucleic acid composition.
            U3s       C3s       A3s       G3s       Nc        GC3s
A           −0.568**  0.108**   0.978**   −0.650**  −0.180**  −0.422**
C           −0.720**  0.939**   0.269**   −0.211**  0.323**   0.520**
G           0.009     0.031     −0.710**  0.935**   0.207**   0.740**
U           0.973**   −0.707**  −0.622**  0.135**   −0.167**  −0.409**
GC          −0.516**  0.706**   −0.339**  0.549**   0.390**   0.935**
1st axis    −0.208**  0.168**   0.329**   −0.213**  −0.076    −0.082*
2nd axis    0.154**   0.188**   −0.317**  0.024     0.165**   0.172**
Gravy       −0.119**  0.270**   −0.038    0.001     0.043     0.186**
Aroma       0.079*    0.008     −0.117**  0.067     −0.335**  0.054
Nc denotes ENC. ** denotes p < 0.01; * denotes p < 0.05.

3.5. Natural selection dominates over mutation pressure in shaping codon usage pattern
A neutrality analysis of GC12s against GC3s was performed (Fig. 3). GC3s ranged from 42.57% to 51.00%, and the two variables were essentially uncorrelated (r² = 0.000328). The slope of the regression line was 0.003464, far from the diagonal (slope = 1). The contribution of mutation pressure to codon usage is therefore estimated at only 0.3464%. Natural selection thus influences codon usage of the RABV N gene to a much larger extent than mutation pressure (Zhao et al., 2016).

3.6. Dinucleotide analysis
To examine the possible effect of dinucleotides on codon usage, the relative abundances of the 16 dinucleotides in the RABV coding sequences were calculated (Table 3). None of the dinucleotides occurred at exactly its expected frequency. GpA and UpG were over-represented, whereas GpU and UpA were under-represented. In addition, all eight CpG-containing codons had RSCU values < 1 (0.167–0.666; Table 1), indicating that CpG is suppressed.
Fig. 3. Neutrality plot of GC12s against GC3s.

Table 3 Dinucleotide abundance analysis of the RABV N gene.
Dinucleotide  Mean  SD    Max   Min
AA            0.94  0.04  1.10  0.91
AC            0.91  0.05  1.06  0.85
AG            1.12  0.04  1.22  1.09
AU            1.02  0.05  1.10  1.01
CA            1.17  0.08  1.37  1.18
CC            1.01  0.06  1.21  0.97
CG            0.57  0.06  0.75  0.60
CU            1.19  0.06  1.36  1.19
GA            1.31  0.04  1.41  1.30
GC            0.82  0.05  0.95  0.86
GG            1.03  0.05  1.26  1.05
GU            0.77  0.04  0.85  0.70
UA            0.65  0.03  0.72  0.60
UC            1.26  0.06  1.46  1.28
UG            1.17  0.05  1.29  1.11
UU            1.04  0.07  1.22  0.88

3.7. Evolutionary rate analysis
To study the evolutionary trend of the RABV N gene worldwide since 1931, its evolutionary rate was estimated for five phylogenetic clades (Fig. 4A):
- The Africa 2 clade had the highest substitution rate, with a mean of 3.75 × 10⁻³ substitutions/site/year.
- The Cosmopolitan clade had the lowest, with a mean of 9.98 × 10⁻⁴ substitutions/site/year.
The substitution rates of the bat and Africa 2 clades fluctuated before 1990 but have remained stable and low since then. By contrast, the Arctic-related, Cosmopolitan and Asian clades changed steadily throughout the sampled period.

Fig. 4. Evolutionary rate of different branches of the RABV N gene, excluding the Africa 3 and Indian subcontinent branches. The substitution rates of the bat, Africa 2, Arctic-related, Cosmopolitan and Asian clades are shown in blue, light green, orange, dark green and yellow, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4. Discussion
Since its emergence, RABV has posed a serious threat to public health worldwide, particularly to humans. The N protein is the most conserved of all the viral proteins and forms part of the ribonucleoprotein core of RABV. Previous studies of the RABV N gene have focused mainly on its basic molecular characterization (Body et al., 2014) and genetic polymorphism (Kissi et al., 1995). Many studies have analyzed RABV evolution (Lan et al., 2017; Lin et al., 2016; Velasco-Villa et al., 2017; Hayman et al., 2016; Al, 2008). Until now, however, no large-scale, comprehensive analysis combining codon usage bias and evolution of the RABV N gene has been reported. To understand the molecular evolution of the RABV N gene, we therefore analyzed its codon usage bias in the present study.

Nucleotide composition has previously been reported to constrain codon usage, so we first calculated the nucleotide composition. A and U together account for more than half (55.58%) of the nucleotides in the RABV N gene. In addition, 14 of the 18 preferred codons end with A or U, further confirming that nucleotide bias is essential in shaping the codon usage pattern. This indicates that base composition constrains synonymous codon usage during RABV evolution (Nasrullah et al., 2015). The RABV N gene also displays a low codon usage bias, with ENC values from 51.42 to 60.89. This bias is weaker than that of the RABV G gene (ENC 44.40–51.40; SD 1.20) (Zhao et al., 2016). The mean ENC of the N gene (56.33) is also higher than those reported for other RNA viruses:
- SARS virus, mean 48.99 (Zhao et al., 2008);
- PEDV, mean 47.91 (Chen et al., 2014);
- FMDV, mean 51.42 (Zhou et al., 2013).
This particularly low codon usage bias accords with previous reports that codon usage bias in RNA viruses is generally low (Jenkins et al., 2001). The weak codon usage bias of RABV may support accurate and efficient translation of the N gene. It may also help the virus evade host immunity (Zhou et al., 2015) and replicate efficiently in host cells (Nasrullah et al., 2015).

Previous studies showed that codon usage is mainly determined by two factors, natural selection and mutation pressure (Sharp et al., 2010). To dissect the factors shaping codon usage of the RABV N gene, ENC-plot and PCA analyses were performed with strains grouped by phylogenetic clade, host of origin and country of isolation. The ENC-plot showed that a few strains of the Asian clade isolated from dogs were influenced by natural selection. The other Asian strains were influenced to some degree by mutation pressure and other factors. Among the other five clades, codon usage of the Africa 2 clade appeared to be shaped mainly by mutation pressure, with ENC values slightly lower than the expected values. In the PCA, Asian strains were more widely distributed than those of the other five clades, possibly because RABV circulates mainly in developing countries (Zhang et al., 2017). Dog-related and bat-related strains were separated, which may reflect the fact that these are the two major RABV groups (Troupin et al., 2016). The six clades also overlapped, especially the bat, Asian and Indian subcontinent clades. This suggests a close relationship among these three clades during evolution. Taken together, both mutation pressure and translational selection are important in shaping codon usage of the RABV N gene. In addition, Gravy correlated significantly with GC3s and Aroma with ENC, indicating that natural selection contributes to its codon usage bias.

Comparison of RABV and canine RSCU values showed that RABV codon usage differs from that of its host to a large degree. This may reflect natural selection on RABV codon usage associated with escape from host immune responses (Liu et al., 2012). The ENC-plot and composition analyses suggested that mutation pressure has a greater influence than natural selection. The neutrality plot, however, indicated that the role of mutation pressure is minor compared with natural selection. Dinucleotide abundance has previously been described as a factor affecting codon usage bias (Kumar et al., 2016), in agreement with our findings. All CpG-containing codons had RSCU values < 1, indicating CpG deficiency. Toll-like receptor 9 (TLR9), an intracellular pattern recognition receptor, recognizes unmethylated CpG motifs and activates different immune response pathways (Dorn and Kippenberger, 2008). CpG suppression may therefore help RABV avoid such responses. In conclusion, mutation pressure, translational selection, dinucleotide abundance and geographical distribution all contribute, to different degrees, to codon usage of the RABV N gene during RABV evolution.

The evolutionary rate of the RABV N gene was also analyzed, for the first time on the basis of phylogenetic lineages (Troupin et al., 2016). The evolutionary rate differed among the five clades, being highest in the Africa 2 clade and lowest in the Cosmopolitan clade. By analyzing a large number of RABV N gene sequences, this study provides new findings. These results will serve future global RABV research, such as identifying strains with specific biological characteristics.

Appendix A. Supplementary data
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.meegid.2017.08.012.

References
Al, V.V.E., 2008. Enzootic rabies elimination from dogs and reemergence in wild terrestrial carnivores, United States - volume 14, number 12, December 2008 - emerging infectious disease journal - CDC. Emerg. Infect. Dis. 14 (12), 1849–1854.
Baghi, H.B., Bazmani, A., Aghazadeh, M., 2016. The fight against rabies: the Middle East needs to step up its game. Lancet 388 (10054), 1880.
Biek, R., Henderson, J.C., Waller, L.A., Rupprecht, C.E., Real, L.A., 2007. A high-resolution genetic signature of demographic and spatial expansion in epizootic rabies virus. Proc. Natl. Acad. Sci. U. S. A. 104 (19), 7993–7998.
Body, M.H.H., Rawahi, A.A., Hussain, M.H., Habsi, S.S.A., Wadir, A.A., Saravanan, N., et al., 2014. Study on molecular characterization of rabies virus N gene segment from different animal species in the Sultanate of Oman. J. Vet. Med. Anim. Health. 6 (12), 295–301.
Bourhy, H., Kissi, B., Audry, L., Smreczak, M., Sadkowskatodys, M., Kulonen, K., et al., 1999. Ecology and evolution of rabies virus in Europe. J. Gen. Virol. 80 (Pt 10), 2545–2557.
Butt, A.M., Nasrullah, I., Tong, Y., 2014. Genome-wide analysis of codon usage and influencing factors in chikungunya viruses. PLoS One 9 (3), e90905.
Butt, A.M., Nasrullah, I., Qamar, R., Tong, Y., 2016. Evolution of codon usage in Zika virus genomes is host and vector specific. Emerg. Microbes Infect. 5 (10), e107.
Cai, M.S., Cheng, A.C., Wang, M.S., Zhao, L.C., Zhu, D.K., Luo, Q.H., et al., 2009. Characterization of synonymous codon usage bias in the duck plague virus UL35 gene. Intervirology 52 (5), 266.
Chen, Y., Shi, Y., Deng, H., Gu, T., Xu, J., Ou, J., et al., 2014. Characterization of the porcine epidemic diarrhea virus codon usage bias. Infect. Genet. Evol. 28, 95–100.
Consultation WHOE, Expert WHO, Panel A, Foundation G, Alliance G, Control R, et al., 2005. WHO Expert Consultation on Rabies.
Conzelmann, K.K., Schnell, M., 1994. Rescue of synthetic genomic RNA analogs of rabies virus by plasmid-encoded proteins. J. Virol. 68 (2), 713–719.
Dietzschold, B., Wunner, W.H., Wiktor, T.J., Lopes, A.D., Lafon, M., Smith, C.L., et al., 1983. Characterization of an antigenic determinant of the glycoprotein that correlates with pathogenicity of rabies virus. Proc. Natl. Acad. Sci. 80 (1), 70–74.
Dorn, A., Kippenberger, S., 2008. Clinical application of CpG-, non-CpG-, and antisense oligodeoxynucleotides as immunomodulators. Curr. Opin. Mol. Ther. 10 (1), 10–20.
Finke, S., Granzow, H., Hurst, J., Pollin, R., Mettenleiter, T.C., 2010. Intergenotypic replacement of lyssavirus matrix proteins demonstrates the role of lyssavirus M proteins in intracellular virus accumulation. J. Virol. 84 (4), 1816.
Grantham, R., Gautier, C., Gouy, M., Mercier, R., Pavé, A., 1980. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8 (1), r49–r62.
Gupta, S.K., Ghosh, T.C., 2001. Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas Aeruginosa. Gene 273 (1), 63–70.
Hayman, D.T.S., Fooks, A.R., Marston, D.A., Garciar, J.C., 2016. The global phylogeography of lyssaviruses - challenging the ‘Out of Africa’ hypothesis. PLoS Negl. Trop. Dis. 10 (12), e0005266.
Jenkins, G.M., Pagel, M., Gould, E.A., Pm, D.A.Z., Holmes, E.C., 2001. Evolution of base composition and codon usage bias in the genus Flavivirus. J. Mol. Evol. 52 (4), 383–390.
Karlin, S., Burge, C., 1995. Dinucleotide relative abundance extremes: a genomic signature. Trends Genet. 11 (7), 283.
Kissi, B., Tordo, N., Bourhy, H., 1995. Genetic polymorphism in the rabies virus nucleoprotein gene. Virology 209 (209), 526–537.
Knobel, D.L., Cleaveland, S., Coleman, P.G., Fèvre, E.M., Meltzer, M.I., Miranda, M.E., et al., 2005. Re-evaluating the burden of rabies in Africa and Asia. Bull. World Health Organ. 83 (5), 360–368.
Kumar, N., Bera, B.C., Greenbaum, B.D., Bhatia, S., Sood, R., Selvaraj, P., et al., 2016. Revelation of influencing factors in overall codon usage bias of equine influenza viruses. PLoS One 11 (4), e0154376.
Kuzmin, I.V., Shi, M., Orciari, L.A., Yager, P.A., Velascovilla, A., Kuzmina, N.A., et al., 2012. Molecular inferences suggest multiple host shifts of rabies viruses from bats to Mesocarnivores in Arizona during 2001–2009. PLoS Pathog. 8 (6), e1002786.
Kyte, J., Doolittle, R.F., 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157 (1), 105–132.
Lagerkvist, U., 1978. “Two out of three”: an alternative method for codon reading. Proc. Natl. Acad. Sci. U. S. A. 75 (4), 1759–1762.
Lan, Y.C., Wen, T.H., Chang, C.C., Liu, H.F., Lee, P.F., Huang, C.Y., et al., 2017. Indigenous wildlife rabies in Taiwan: ferret badgers, a long term terrestrial reservoir. Biomed. Res. Int. 2017, 5491640.
Lin, Y.C., Chu, P.Y., Chang, M.Y., Hsiao, K.L., Lin, J.H., Liu, H.F., 2016. Spatial temporal dynamics and molecular evolution of re-emerging rabies virus in Taiwan. Int. J. Mol. Sci. 17 (3), 392.
Liu, Y.S., Zhou, J.H., Chen, H.T., Ma, L.N., Pejsak, Z., Ding, Y.Z., et al., 2011. The characteristics of the synonymous codon usage in enterovirus 71 virus and the effects of host on the virus in codon usage pattern. Infect. Genet. Evol. 11 (5), 1168–1173.
Liu, X.S., Zhang, Y.G., Fang, Y.Z., Wang, Y.L., 2012. Patterns and influencing factor of synonymous codon usage in porcine circovirus. Virol. J. 9 (1), 1–9.
Masatani, T., Ito, N., Ito, Y., Nakagawa, K., Abe, M., Yamaoka, S., et al., 2013. Importance of rabies virus nucleoprotein in viral evasion of interferon response in the brain. Microbiol. Immunol. 57 (7), 511–517.
Acknowledgements This work was financially supported by the National Key Research and Development Program of China (2016YFD0500402); the National Natural Science Foundation of Jiangsu Province (BK20170721); the Fundamental Research Funds for the Central Universities (Y0201600147) and the Priority Academic Program Development of Jiangsu Higher Education Institutions. 464
Infection, Genetics and Evolution 54 (2017) 458–465
W. He et al.
Sueoka, N., 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. U. S. A. 85 (8), 2653–2657. Surhone, L.M., Timpledon, M.T., Marseken, S.F., System, P.N., System, C.N., 2010. Rabies virus. Encycl. Neurol. Sci. 9 (1), 1027–1030. Tang, X., Luo, M., Zhang, S., Fooks, A.R., Hu, R., Tu, C., 2005. Pivotal role of dogs in rabies transmission, China. Emerg. Infect. Dis. 11 (12), 1970–1972. Tordo, N., Kouknetzoff, A., 1993. The rabies virus genome: an overview. Onderstepoort J. Vet. Res. 60 (60), 263–269. Tordo, N., Poch, O., Ermine, A., Keith, G., Rougeon, F., 1988. Completion of the rabies virus genome sequence determination: highly conserved domains among the L (polymerase) proteins of unsegmented negative-strand RNA viruses. Virology 165 (2), 565–576. Troupin, C., Dacheux, L., Tanguy, M., Sabeta, C., Blanc, H., Bouchier, C., et al., 2016. Large-scale phylogenomic analysis reveals the complex evolutionary history of rabies virus in multiple carnivore hosts. PLoS Pathog. 12(12). Tsai, C.T., Lin, C.H., Chang, C.Y., 2007. Analysis of codon usage bias and base compositional constraints in iridovirus genomes. Virus Res. 126 (1–2), 196. Velasco-Villa, A., Mauldin, M.R., Shi, M., Escobar, L.E., Gallardo-Romero, N.F., Damon, I., et al., 2017. The history of rabies in the Western Hemisphere. Antivir. Res. Vicario, S., Moriyama, E.N., Powell, J.R., 2007. Codon usage in twelve species of drosophila. BMC Evol. Biol. 7 (1), 226. Wang, M., Zhang, J., Zhou, J.H., Chen, H.T., Ma, L.N., Ding, Y.Z., et al., 2011. Analysis of codon usage in bovine viral diarrhea virus. Arch. Virol. 156 (1), 153–160. Wright, F., 1990. The ‘effective number of codons’ used in a gene. Gene 87 (1), 23–29. Zhang, Y., Vrancken, B., Yun, F., Dellicour, S., Yang, Q., Yang, W., et al., 2017. Crossborder spread, lineage displacement and evolutionary rate estimation of rabies virus in Yunnan Province, China. Virol. J. 14 (1), 102. 
Zhao, S., Zhang, Q., Liu, X., Wang, X., Zhang, H., Wu, Y., et al., 2008. Analysis of synonymous codon usage in 11 Human Bocavirus isolates. Biosystems 92 (3), 207–214. Zhao, Y., Zheng, H., Xu, A., Yan, D., Jiang, Z., Qi, Q., et al., 2016. Analysis of codon usage bias of envelope glycoprotein genes in nuclear polyhedrosis virus (NPV) and its relation to evolution. BMC Genomics 17 (1), 677. Zhou, J.H., Gao, Z.L., Zhang, J., Ding, Y.Z., Stipkovits, L., Szathmary, S., et al., 2013. The analysis of codon bias of foot-and-mouth disease virus and the adaptation of this virus to the hosts. Infect. Genet. Evol. 14 (2), 105–110. Zhou, H., Yan, B., Chen, S., Wang, M., Jia, R., Cheng, A., 2015. Evolutionary characterization of Tembusu virus infection through identification of codon usage patterns. Infect. Genet. Evol. 35, 27–33.
Marín, A., Bertranpetit, J., Oliver, J.L., Medina, J.R., 1989. Variation in G + C-content and codon choice: differences among synonymous codon groups in vertebrate genes. Nucleic Acids Res. 17 (15), 6181–6189. Masatani, T., Ito, N., Shimizu, K., Ito, Y., Nakagawa, K., Abe, M., et al., 2011. Amino acids at positions 273 and 394 in rabies virus nucleoprotein are important for both evasion of host RIG-I-mediated antiviral response and pathogenicity. Virus Res. 155 (1), 168–174. Mebatsion, T., König, M., Conzelmann, K.K., 1996. Budding of rabies virus particles in the absence of the spike glycoprotein. Cell 84 (6), 941–951. Mebatsion, T., Weiland, F., Conzelmann, K.K., 1999. Matrix protein of rabies virus is responsible for the assembly and budding of bullet-shaped particles and interacts with the transmembrane spike glycoprotein G. J. Virol. 73 (1), 242. Mehta, S., Charan, P., Dahake, R., Mukherjee, S., Chowdhary, A., 2016. Molecular characterization of nucleoprotein gene of rabies virus from Maharashtra, India. J. Postgrad. Med. 62 (2), 105–108. Moratorio, G., Iriarte, A., Moreno, P., Musto, H., Cristina, J., 2013. A detailed comparative analysis on the overall codon usage patterns in West Nile virus. Infect. Genet. Evol. 14 (1), 396–400. Morla, S., Makhija, A., Kumar, S., 2016. Synonymous codon usage pattern in glycoprotein gene of rabies virus. Gene 584 (1), 1–6. Nasrullah, I., Butt, A.M., Tahir, S., Idrees, M., Tong, Y., 2015. Genomic analysis of codon usage shows influence of mutation pressure, natural selection, and host features on Marburg virus evolution. BMC Evol. Biol. 15 (1), 1–15. Nel, L.H., Markotter, W., 2007. Lyssaviruses. Crit. Rev. Microbiol. 33 (4), 301. Nel, L.H., Sabeta, C.T., Teichman, B.V., Jaftha, J.B., Rupprecht, C.E., Bingham, J., 2005. Mongoose rabies in southern Africa: a re-evaluation based on molecular epidemiology. Virus Res. 109 (2), 165. Oem, J.K., Kim, S.H., Kim, Y.H., Lee, M.H., Lee, K.K., 2013. 
Complete genome sequences of three rabies viruses isolated from rabid raccoon dogs and a cow in Korea. Virus Genes 47 (3), 563. Paul, S., Bag, S.K., Das, S., Harvill, E.T., Dutta, C., 2008. Molecular signature of hypersaline adaptation: insights from genome and proteome composition of halophilic prokaryotes. Genome Biol. 9 (4), 1–19. Schnell, M.J., Mcgettigan, J.P., Wirblich, C., Papaneri, A., 2009. The cell biology of rabies virus: using stealth to reach the brain. Nat. Rev. Microbiol. 8 (1), 51–61. Sharp, P.M., Li, W.H., 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24 (1–2), 28–38. Sharp, P.M., Emery, L.R., Zeng, K., 2010. Forces that influence the evolution of codon bias. Philos. Trans. R. Soc. B 365 (1544), 1203–1212.
465