Virus Research 258 (2018) 68–72
Analysis of compositional bias and codon usage pattern of the coding sequence in Banna virus genome
Shiyu Long, Huipeng Yao⁎, Qi Wu, Guoling Li
College of Life Science, Sichuan Agriculture University, Ya'an 625014, Sichuan, People's Republic of China
ARTICLE INFO
Keywords: Banna virus; ENC-plot; Neutrality plot analysis; Correspondence analysis
ABSTRACT
Using DNASTAR, CUSP of EMBOSS, CodonW and IBM SPSS Statistics, we examined the nucleotide composition and codon usage pattern of 115 genes from 37 BAVs. All genes are richer in A and U than in G and C, and for most genes the nucleotide order is A, U, G and C. The ENC-values of the genes are relatively high, which indicates weak codon bias; among the genes, VP9 shows the strongest codon bias. The codon usage patterns of the 12 genes differ from one another and are related to their composition, function or host. For example, in correspondence analysis the points of the VP9 gene, which encodes the viral spike protein in contact with different hosts, are widely dispersed, whereas those of the VP1, VP2, VP3 and VP4 genes, which encode capsid proteins, are concentrated in the first quadrant. In correlation analysis, the ENC of VP5 is correlated with GC3s. In ENC-plot analysis, the points of the VP12 gene lie very close to the expected curve, but GC12 is not related to GC3 in neutrality analysis. Taken together, the analyses indicate that the codon usage pattern of the 12 genes of BAV is influenced by both natural selection and neutral mutation, to different extents. Because BAV is a pathogen of viral encephalitis, compositional and codon bias analysis of BAV can provide a theoretical basis for disease control.
1. Introduction
Banna virus (BAV) is a species of the genus Seadornavirus within the family Reoviridae (Hong and Zhou, 2006). Its genome consists of 12 segments of double-stranded RNA, each of which encodes a single viral protein (VP), designated VP1 to VP12 (Xu et al., 1990a, 1990b). Among these proteins, VP4 and VP9 are outer coat proteins. The former has methyltransferase activity. The latter, a soluble trimeric viral attachment protein, is responsible for adsorption to the host cell surface by binding a receptor and initiating endocytosis for subsequent penetration, so pretreatment of cells with trimeric VP9 can increase viral infection (Jaafar et al., 2005a). VP1, VP2, VP3, VP8, and VP10 are components of the inner capsid particle. VP1 participates in transcription and genome replication. VP2 is the sub-core shell 'T2' protein. VP3 has capping enzyme, methyltransferase, and helicase functions (Jaafar et al., 2005b). VP8 is the core-surface 'T13' protein. VP10 is the foot of the viral spike, which helps stabilize the VP9 spike protein in the viral coat. VP5, VP6, VP7, VP11, and VP12 are non-structural proteins (Jaafar et al., 2005c). VP6 is a nucleoside hydrolase containing a leucine zipper. VP7 is a protein kinase. VP12 is a double-stranded RNA binding protein. The functions of VP5 and VP11 are not clear. As an arbovirus, BAV can be carried and spread by a variety of
blood-sucking arthropods such as mosquitoes, ticks, and midges, causing disease that is markedly seasonal and often endemic. Unfortunately, there are no vaccines for the prevention and control of BAV (Harbin Veterinary Research Institute, 2013). BAV is considered to be one of the causes of fever of unknown origin (Xu et al., 1990a, 1990b). The symptoms of infection are fever accompanied by headache, muscle soreness and severe coma, and infection may further trigger encephalitis, so BAV is also one of the pathogens of viral encephalitis (Zhai et al., 2010). In addition, BAV infection can be lethal. For example, in 1990, Li Qiping and others reported that livestock in Xinjiang became anorexic and sick and eventually died because of BAV infection and ineffective treatment (Li, 1992). BAV was first isolated from patients in 1987 in Sip Son Panna (Xu et al., 1990a, 1990b). Isolates have subsequently been found in Indonesia, Vietnam and China, which indicates that BAV is widely distributed in East Asia (Xu et al., 1990a, 1990b; Li, 1992; Brown et al., 1993; Nabeshima et al., 2008). Based on an analysis of the geographical distribution, temporal distribution and evolution of BAV, Liu Hong et al. concluded that the danger and expansion of the virus had been ignored for a long time, and that each of the 12 segments has evolved in a different direction (Liu and Liang, 2011). In addition, using the genome sequence of BAV from Gansu Province in China, Zhai et al. analyzed the differences of its
⁎ Corresponding author. E-mail address: [email protected] (H. Yao).
https://doi.org/10.1016/j.virusres.2018.10.006 Received 21 February 2018; Received in revised form 19 August 2018; Accepted 9 October 2018 Available online 11 October 2018 0168-1702/ © 2018 Elsevier B.V. All rights reserved.
most genes of the strains (Fig. 1), except for the VP8 and VP9 genes, in which the U-content is sometimes slightly greater than the A-content. In addition, in the VP8 gene of 02VN018b and JKT-7043, the C-content is greater than the G-content. According to Fig. 1, the A-content of the VP11 gene is the highest in the BAVs (except for VP10 of QTM104536), with an average of 37.09%. The VP8 gene has the lowest A-content in the virus (except for VP9 of LN0689 and LN0688), with an average of 28.98%. Consequently, the VP11 gene can be considered to have a stronger A-bias than the VP8 gene. Nevertheless, the nucleotide composition of the same gene does not deviate greatly among different strains (Fig. 1).
geographic distribution and found that BAV may form a stable ecological cycle between its intermediate host and its terminal host (Zhai et al., 2010). An amino acid can be encoded by more than one triplet codon; for example, aspartic acid is encoded by 2 triplets, isoleucine by 3, and arginine by 6. Codons encoding the same amino acid are called synonymous codons. Often, a species tends to use one or several specific codons. Different viruses prefer different codons, so the study of viral codon bias helps to explore the evolutionary processes of viral genes, to analyze the mechanisms of interaction between a virus and its host, and to establish a scientific basis for disease control and prevention. BAV is widely distributed in East Asia and can infect humans and livestock and cause serious disease, so establishing a BAV prevention mechanism and developing an effective vaccine are necessary. Previous studies of BAV have concentrated on virus isolation, nucleotide sequence determination, protein function, and the direction of evolution. In this paper, we analyze for the first time the compositional bias and codon bias of its coding regions.
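The degeneracy figures quoted above (2 codons for aspartic acid, 3 for isoleucine, 6 for arginine) can be checked directly against the standard genetic code. The following is a minimal illustrative sketch, not part of the original analysis; the names `CODON_TABLE`, `DEGENERACY` and `synonymous_codons` are chosen here for illustration only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid symbols ("*" = stop),
# listed for codons in the order UUU, UUC, UUA, UUG, UCU, ... (bases U, C, A, G).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each RNA codon to its amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid.
DEGENERACY = Counter(CODON_TABLE.values())


def synonymous_codons(amino_acid):
    """Return the sorted list of codons that encode the given amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    for symbol, name in (("D", "aspartic acid"), ("I", "isoleucine"), ("R", "arginine")):
        print(name, DEGENERACY[symbol], synonymous_codons(symbol))
```

Codon bias arises when the codons within one of these synonymous families are used at unequal frequencies.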
3.2. ENC-plot analysis
The ENC-values range from 39.79 to 56.56 for all genes of the BAVs, with an average of 48.62, which indicates that the coding region of BAV has low codon bias. We analyzed the ENC-value of each gene to explore the factors that influence the codon bias of BAV. In general, most of the dots lie on or slightly under the expected curve, which suggests that the codon usage pattern is primarily influenced by mutation pressure (Fig. 2). According to Fig. 2, the points closest to the curve are those of the VP12 gene of 17 isolates, the VP7 gene of JKT-6423 and 02VN078b, the VP11 and VP5 genes of 02VN018b, the VP9 gene of BJ9575, and the VP10 gene of JKT-6423, JKT-7043 and JKT-6969 (the difference between the expected and actual values is below 1.6). This indicates that these genes may be affected by neutral mutation. At the least, the VP12 gene of BAVs is affected to a greater extent by neutral mutation. The points of the other genes lie slightly farther below the standard curve. In addition, the dots from the genes of the same strain are widely dispersed, so there is little strain specificity (Fig. 2). In general, codon usage bias differs among the different genes of the same strain but not among the same gene of different strains.
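The expected curve in an ENC-plot is usually drawn from Wright's (1990) approximation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s expressed as a fraction. The sketch below assumes that formula and applies the 1.6 cutoff mentioned above; the function names and the example values are hypothetical and are not data from this study.

```python
def expected_enc(gc3s):
    """Expected ENC under Wright's approximation when codon usage depends only on GC3s.

    gc3s is a fraction between 0 and 1.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_gap(observed_enc, gc3s):
    """Expected minus observed ENC; positive when the point lies under the curve."""
    return expected_enc(gc3s) - observed_enc


def near_expected_curve(observed_enc, gc3s, threshold=1.6):
    """True when the point lies within `threshold` ENC units of the expected curve."""
    return abs(enc_gap(observed_enc, gc3s)) < threshold


if __name__ == "__main__":
    # Hypothetical (ENC, GC3s) pairs, for illustration only.
    for enc, gc3s in ((52.0, 0.30), (45.0, 0.30)):
        print(enc, gc3s, round(enc_gap(enc, gc3s), 2), near_expected_curve(enc, gc3s))
```

A point with a small gap is consistent with codon usage shaped mainly by base composition, while a point well under the curve suggests additional constraints such as selection.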
2. Materials and methods
Information on 205 sequences of all BAVs, such as accession number, strain name, and gene name, was downloaded from GenBank on May 26, 2018. After rejecting 57 partial sequences and 33 repetitive sequences, 115 coding sequences (CDS), or genes, remained and were selected for analysis (Table S1). The values of C-content, A-content, U-content, G-content, GC-content, effective number of codons (ENC), the frequency of nucleotides G + C at the third position of synonymous codons (GC3s), frequency of optimal codons (FOP), grand average of hydropathy (Gravy), aromaticity (Aroma), codon bias index (CBI), and the frequencies of nucleotides G + C at the first (GC1), second (GC2), and third codon position (GC3) of all sequences were calculated with DNASTAR, CUSP of EMBOSS and CodonW (Burland, 2000; Peden, 2000). The ENC-value is widely used in codon bias analysis and reflects synonymous codon bias: the smaller the value, the greater the bias. An ENC-plot of ENC-values versus GC3s was used to analyze the factors that influence the codon bias of BAVs (Wright, 1990). If the dots of the given genes lie on the expected curve for random codon usage, codon bias is affected by neutral mutation; otherwise, codon bias is mainly affected by natural selection. A neutrality plot of GC12-values (the average of GC1 and GC2) against GC3-values was also used to analyze the factors influencing the codon bias of the viruses (Sueoka, 1988). Correspondence analysis (COA) was performed in CodonW based on relative synonymous codon usage (RSCU) values, and the first (Axis 1), second (Axis 2), third (Axis 3), and fourth (Axis 4) major axes were obtained. IBM SPSS Statistics was used to analyze the correlations among various parameters (CBI, FOP, ENC, GC1, GC2, GC3, GC, GC3s, Axis 1, Axis 2, Gravy and Aroma). Statistics and plotting were performed in Excel.
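Several of the indices listed above have simple definitions that can be illustrated in a few lines. The sketch below is not the DNASTAR, EMBOSS or CodonW implementation; it assumes the common definitions (GC12 as the mean of GC1 and GC2, GC3s computed over codons other than Met, Trp and stop codons, and RSCU as the observed codon count divided by the mean count of its synonymous family). All function names and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split a coding sequence into RNA codons, ignoring a trailing partial codon."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc_by_position(cds):
    """Return GC1, GC2, GC3, GC12 and GC3s as fractions."""
    codons = split_codons(cds)
    if not codons:
        raise ValueError("sequence has no complete codon")
    gc1, gc2, gc3 = (
        sum(1 for c in codons if c[pos] in "GC") / len(codons) for pos in range(3)
    )
    # Assumed GC3s convention: skip Met, Trp and stop codons (no synonymous choice).
    syn = [c for c in codons if CODON_TABLE.get(c, "*") not in "MW*"]
    gc3s = sum(1 for c in syn if c[2] in "GC") / len(syn) if syn else float("nan")
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2, "GC3s": gc3s}


def rscu(cds):
    """Relative synonymous codon usage for every codon of each amino acid present."""
    counts = Counter(c for c in split_codons(cds) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total:
            for c in codons:
                # Observed count relative to equal use of all synonymous codons.
                values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    toy = "ATGGATGATGACCGTTAA"  # hypothetical toy CDS
    print(gc_by_position(toy))
    print({c: round(v, 3) for c, v in rscu(toy).items() if c.startswith("GA")})
```

An RSCU value above 1 marks a codon used more often than expected under equal usage within its family, and a value below 1 marks one used less often.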
3.3. Neutrality plot analysis
As shown in Fig. 3, most of the points are above the diagonal, indicating that the GC3-value is less than the GC12-value and that A or U is also preferred at the third codon position for most of the genes in BAV. For most genes of BAV, the base composition at the third position is only weakly related to that at the first two positions: the R² of the fitted linear regression equation is 0.869 for the VP5 gene and below 0.504 for the rest (data not shown). It is notable that the slope of the fitted linear regression equation for the VP5 gene is only 0.287. Therefore, it can be concluded that BAV codon bias is mostly influenced by natural selection for most genes, but for the VP5 gene, codon bias may be influenced by mutation pressure.
3.4. Correlation analysis
Correlation analysis of the given parameters of the CDS of the same gene in different BAVs gave the following results (Table S2). There is a strong relationship among GC, GC3 and GC3s for the VP1, VP2, VP3, VP5, VP6, VP7, VP9 and VP12 genes. Among them, for the VP1, VP5, VP6, and VP9 genes, the ENC is correlated with GC, GC3 and GC3s (0.01 < p < 0.05), which indicates that codon bias is related to base composition at the third synonymous codon position. The relationship between ENC and Gravy (0.01 < p < 0.05) for the VP1 gene, and the relationship between ENC and CBI or FOP for the VP9 gene (0.01 < p < 0.05), illustrate the effect of natural selection on the codon usage pattern of BAV. For the VP4 gene, ENC is significantly correlated with GC (p < 0.01), and the relationships among Gravy, Aroma, CBI and FOP (0.01 < p < 0.05) show the effect of gene expression level on codon bias. For the VP8 gene, GC2 is correlated with CBI and FOP (p < 0.01). GC3 is significantly correlated with GC3s (0.01 < p < 0.05) for both the VP4 gene and the VP8 gene. This means that
3. Results
3.1. Nucleotide composition of coding region of BAV genome
To examine the similarities and differences in the base composition of the coding regions of the BAV genome, we compared the A-content, U-content, C-content, and G-content among all genes, grouping the same gene from different viruses. The nucleotide composition of the coding regions of all BAVs is shown in Fig. 1. The contents of A, U, C and G are stable for the same gene in different BAVs, but the differences among A, U, C and G are clear. The A-content is nearly twice the C-content, and the U-content is greater than the G-content. AU-content thus accounts for a larger proportion than GC-content in all genes. Furthermore, the order of nucleotide composition is found to be A > U > 0.25 > G > C in
Fig. 1. Summary of the A-content, U-content, G-content and C-content of all genes of BAVs. Comparison of nucleotide content in the same gene of different isolates, four base types are marked by four colors. A total of 12 genes of BAV are listed.
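The per-gene base contents summarized in Fig. 1 are simple fractions of each nucleotide in a coding sequence. As a minimal sketch (not the DNASTAR procedure used in the study), they could be computed as follows; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def base_composition(cds):
    """Return the fraction of A, U, G and C in a coding sequence."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(b for b in seq if b in "AUGC")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, U, G or C")
    return {b: counts[b] / total for b in "AUGC"}


def composition_order(cds):
    """Return the four bases sorted from most to least frequent."""
    comp = base_composition(cds)
    return sorted(comp, key=comp.get, reverse=True)


if __name__ == "__main__":
    toy = "AUGAAAUUUAAGGAUCAAUAA"  # hypothetical toy CDS
    print(base_composition(toy))
    print(" > ".join(composition_order(toy)))
```

Comparing each fraction with 0.25, the value expected under equal base usage, gives the kind of ordering reported in the text.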
Fig. 2. ENC-plot analysis of all genes of BAVs. ENC-values of each gene of BAVs are plotted against GC3s. The red curve represents the expected ENC-values versus GC3s when the codon usage is affected only by GC3s. Different genes are marked by different colors.
the codon bias of the VP4 and VP8 genes of BAV is affected by similar factors. Overall, codon bias differs among the coding regions of the different segments of BAV.
related to non-structural proteins with different functions, namely the points of the VP5, VP6, VP7, VP11, and VP12 genes, are concentrated in three separate areas.
3.5. Correspondence analysis
4. Discussion
Correspondence analysis of all genes was performed based on RSCU values. Axis 1 accounted for 24.38% of the total variation, Axis 2 for 14.20%, Axis 3 for 7.36% and Axis 4 for 5.09%. A plot of the two principal axes (Axis 1 and Axis 2) of the BAVs was therefore drawn. As shown in Fig. 4, the points of each of the 12 genes cluster together in one region of the plot. The points of most genes are distributed in the first quadrant, except for those of the VP9, VP10 and VP12 genes. The points of the VP9 gene are distributed in the fourth quadrant, those of the VP10 gene in the first and second quadrants, and those of the VP12 gene in the second and third quadrants. Almost all points related to capsid proteins (the VP1, VP2, VP3, VP4 and VP8 genes) gather together in the first quadrant. In addition, the points
It is known that dsRNA genomes are generally composed of many segments, as in Aquareovirus C, Changuinola virus and Great Island virus; however, the codon usage patterns of dsRNA viruses have been poorly studied. As a dsRNA virus, BAV has a genome of 12 segments encoding different proteins, among which VP1, VP2, VP3, VP8, and VP10 are components of the inner capsid particle, VP5, VP6, VP7, VP11, and VP12 are non-structural proteins, and VP4 and VP9 are outer coat proteins. The 12 genes of BAV have rich AU-content and weak codon usage bias, as does segment 7 of RBSDV (rice black-streaked dwarf virus) in the same family (Zhou et al., 2015). According to Fig. 1, we found that AU-content is greater than GC-content for every gene from
codon bias of the VP9 gene is affected by mutation pressure. Overall, the codon usage bias of the VP9 gene is affected by both selection pressure and mutation pressure. Among the non-structural proteins, VP6 is a nucleoside hydrolase, VP7 is a protein kinase, VP12 is a double-stranded RNA binding protein, and the functions of VP5 and VP11 are not clear. In correspondence analysis, the points related to non-structural proteins form three different regions, which indicates that the codon usage pattern of the non-structural protein genes is related to their function. In ENC-plot analysis, the points of the VP12 gene lie very close to the expected curve; moreover, the base substitution rate of segment 12 is the lowest among all segments (Liu and Liang, 2011). That is, the codon bias of the VP12 gene is mainly affected by mutation pressure and that of the other genes mainly by selection pressure. However, according to neutrality plot analysis and correlation analysis, the codon usage of the VP5 gene is influenced by mutation pressure. Overall, the codon usage of the VP5, VP6, VP7, VP11, and VP12 genes is affected by both selection pressure and mutation pressure, with the VP12 gene subject to the weakest selection pressure. VP1, VP2, VP3 and VP4 belong to the capsid particle (Jaafar et al., 2005c). In correspondence analysis, the points of the VP1, VP2, VP3 and VP4 genes are concentrated in the first quadrant, which means that their similar codon usage pattern may be related to their being capsid protein genes. This is similar to the ENC-plot analysis and neutrality plot analysis, in which the points of the VP1, VP2, VP3, and VP4 genes tend to gather closely together at different locations, respectively. In addition, ENC-plot analysis shows that the codon bias of the four genes is affected by neutral mutation. In general, the codon bias of the VP1, VP2, VP3, and VP4 genes is affected by both selection pressure and mutation pressure.
In conclusion, codon bias in BAV is weak and differs among genes, and it is affected by both neutral mutation and natural selection, including influences from the host and from the different functions of the genes.
Fig. 3. Neutrality plot analysis. The straight line in the figure shows that GC3 is equal to GC12. If all points lie on the line, it indicates that the codon usage pattern is mainly affected by neutral mutation. Otherwise, it is affected by natural selection.
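A neutrality plot is typically summarized by regressing GC12 on GC3 and reporting the slope and R² of the fitted line. In Sueoka's framing, a slope close to 1 is usually read as mutation pressure dominating, and a slope close to 0 as selection constraining the first two codon positions. The sketch below is an ordinary least-squares fit in plain Python; it is an illustration under those assumptions, not the SPSS or Excel procedure used in the study, and the input values are hypothetical.

```python
def neutrality_regression(gc3, gc12):
    """Least-squares fit of GC12 on GC3; returns (slope, intercept, r_squared)."""
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("GC3 and GC12 must both vary across sequences")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical GC3 and GC12 fractions for one gene across several strains.
    gc3 = [0.30, 0.32, 0.35, 0.37, 0.40]
    gc12 = [0.41, 0.42, 0.42, 0.43, 0.44]
    slope, intercept, r2 = neutrality_regression(gc3, gc12)
    print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

Both statistics matter when reading the plot: R² measures how closely GC12 tracks GC3, while the slope measures how strongly changes in GC3 carry over to the first two positions.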
the corresponding segment in all the different BAVs, which may help the virus to replicate, transcribe and even produce variable progeny in the host, except for the VP9 gene. For HIV-1, its extremely A-rich genome helps to avoid recognition by the innate immune system of the host cell (Vabret et al., 2012). The A-biased base composition of BAV may therefore favor its stable persistence in the host. A previous study showed that the nucleotide substitution rate of segment 9 is the highest among all 12 segments of Banna virus (Liu and Liang, 2011). In the correspondence analysis, the points of the corresponding VP9 gene are the most widely distributed and spread across two quadrants, showing that the VP9 gene has the greatest variability compared with the other genes. All of these characteristics may be related to adaptation of the virus to different hosts, because the VP9 gene encodes the viral spike protein, which is in direct contact with the host. Correlation analysis and neutrality plot analysis show that the codon usage bias of the VP9 gene is greatly affected by selection pressure. ENC-plot analysis shows that
Fig. 4. Correspondence analysis of each gene of BAV. A plot of Axis 1 against Axis 2 was drawn based on the RSCU values of all genes. Different genes are represented by different colors and shapes.
Acknowledgements
This work was supported by research grants from the Discipline Construction Double Support Project of Sichuan Agriculture University [grant number 00770114].
Appendix A. Supplementary data
Supplementary material related to this article can be found, in the online version, at doi: https://doi.org/10.1016/j.virusres.2018.10.006.
References
Brown, S.E., Gorman, B.M., Tesh, R.B., Knudson, D.L., 1993. Coltiviruses isolated from mosquitoes collected in Indonesia. Virology 196 (1), 363–367.
Burland, T.G., 2000. DNASTAR's Lasergene sequence analysis software. Methods Mol. Biol. 132, 71.
Harbin Veterinary Research Institute, 2013. Chinese Academy of Agricultural Sciences, Veterinary Microbiology, 2nd ed. China Agriculture Press, Beijing.
Hong, J., Zhou, X., 2006. The universal system of virus taxonomy in the 8th ICTV report. Acta Phytopathologica Sinica 21 (1), 84–96.
Jaafar, F., Attoui, H.M., Siebold, C., Sutton, G., Mertens, P., De Micco, P., et al., 2005a. The structure and function of the outer coat protein VP9 of Banna virus. Structure 13 (1), 17.
Jaafar, F., Attoui, H., Mertens, P.P., De, M.P., De, L.X., 2005b. Identification and functional analysis of VP3, the guanylyltransferase of Banna virus (genus Seadornavirus, family Reoviridae). J. Gen. Virol. 86 (Pt 4), 1141–1146.
Jaafar, F., Attoui, H., Mertens, P.P., De, M.P., De, L.X., 2005c. Structural organization of an encephalitic human isolate of Banna virus (genus Seadornavirus, family Reoviridae). J. Gen. Virol. 86 (Pt 4), 1147–1157.
Li, Q.P., 1992. First Isolation of New Orbivirus (Banna) from Ticks and Infected Cattle Sera in Xinjiang. Endemic Disease Bulletin.
Liu, H., Liang, G.D., 2011. Study on the Genomic Characteristics and Molecular Evolution of Banna Virus (BAV). Chinese Center for Disease Control and Prevention. http://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CDFD2012&filename=1011210944.nh.
Nabeshima, T., Thi, N.P., Guillermo, P., Parquet, M.C., Yu, F., Thanh, T.N., et al., 2008. Isolation and molecular characterization of Banna virus from mosquitoes, Vietnam. Emerg. Infect. Dis. 14 (8), 1276–1279.
Peden, J.F., 2000. Analysis of Codon Usage. University of Nottingham 90 (1), 73–74.
Sueoka, N., 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. U. S. A. 85 (8), 2653–2657.
Vabret, N., Bailly-Bechet, M., Najburg, V., Müller-Trutwin, M., Verrier, B., Tangy, F., 2012. The biased nucleotide composition of HIV-1 triggers type I interferon response and correlates with subtype D increased pathogenicity. PLoS One 7 (4), e33502.
Wright, F., 1990. The 'effective number of codons' used in a gene. Gene 87 (1), 23.
Xu, P.T., Wang, Y.M., Zuo, J.M., Lin, J.W., Xu, P.M., 1990a. New orbiviruses isolated from patients with unknown fever and encephalitis in Yunnan Province. Chin. J. Virol. 6 (1), 27–33.
Xu, P.T., Wang, Y.M., Zuo, J.M., Che, Y., Peng, H., Huang, Z., et al., 1990b. Recovery of the same type of virus as human new orbivirus from sera of cattle and pigs collected in Yunnan Province. Chin. J. Virol. 6 (4), 327–331.
Zhai, Y.G., Wang, H.Q., Yu, D.S., Li, G.T., Jiang, J.X., Jia, Y.X., et al., 2010. Isolation and identification of a novel subtype of Banna virus in Gansu Province. Chin. J. Zoonoses 26 (04), 304–309.
Zhou, Y., Weng, J., Chen, Y., Wu, J., Meng, Q., Han, X., et al., 2015. Molecular genetic analysis and evolution of segment 7 in rice black-streaked dwarf virus in China. PLoS One 10 (6), e0131410.