Accepted Manuscript Analyses of nucleotide, codon and amino acids usages between peste des petits ruminants virus and rinderpest virus
Xiao-xia Ma, Qiu-yan Chang, Peng Ma, Lin-jie Li, Xiao-kai Zhou, De-rong Zhang, Ming-sheng Li, Xin Cao, Zhong-ren Ma
PII: S0378-1119(17)30777-1
DOI: doi:10.1016/j.gene.2017.09.045
Reference: GENE 42194
To appear in: Gene
Received date: 20 May 2017
Revised date: 3 September 2017
Accepted date: 21 September 2017
Please cite this article as: Xiao-xia Ma, Qiu-yan Chang, Peng Ma, Lin-jie Li, Xiao-kai Zhou, De-rong Zhang, Ming-sheng Li, Xin Cao, Zhong-ren Ma, Analyses of nucleotide, codon and amino acids usages between peste des petits ruminants virus and rinderpest virus. The address for the corresponding author was captured as affiliation for all authors. Please check if appropriate. Gene (2017), doi:10.1016/j.gene.2017.09.045
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Analyses of nucleotide, codon and amino acids usages between peste des petits ruminants virus and rinderpest virus
Xiao-xia Ma1, Qiu-yan Chang1, Peng Ma1, Lin-jie Li, Xiao-kai Zhou, De-rong Zhang, Ming-sheng Li, Xin Cao*, Zhong-ren Ma*
Engineering & Technology Research Center for Animal Cell, Gansu; College of Life Science and Engineering, Northwest Minzu University, Gansu, 730030, PR China
1 These authors contributed equally to this work
* Corresponding authors. E-mail addresses: Xin Cao, [email protected]; Zhong-ren Ma, [email protected]

Abstract: Peste des petits ruminants virus (PPRV) and rinderpest virus (RPV) are the causative agents of economically important diseases of ruminants (e.g., sheep, cattle and goats). In this study, the nucleotide, codon and amino acid usages of PPRV and RPV were analyzed by multivariate statistical methods. Relative synonymous codon usage (RSCU) analysis shows that ACG for Thr and GCG for Ala are under-represented in both PPRV and RPV, whereas AGA for Arg in PPRV and AGG for Arg in RPV are over-represented. The nucleotide pair CpG tends to be depleted from the viral genes of both viruses, suggesting that evolutionary forces other than mutation pressure from nucleotide usage at the third codon position take part in the evolution of the viral genes. The overall nucleotide usage of a viral gene is not a major factor in shaping synonymous codon usage patterns, whereas nucleotide usage at the third codon position and nucleotide pairs play important roles. Although PPRV and RPV are closely related antigenically, the codon and amino acid usage patterns of their genes reveal significant genetic diversity between the two viruses. Moreover, the overall codon usage trends of the viral genes of PPRV and RPV are mainly influenced by mutation pressure from nucleotide usage at the third codon position and by translation selection from the hosts. Taken together, this is the first comprehensive analysis of nucleotide, codon and amino acid usages of the viral genes of PPRV and RPV, and the findings are expected to increase our understanding of the evolutionary forces that influence viral evolutionary pathways and adaptation to hosts.

Key words: Peste des petits ruminants virus, rinderpest virus, synonymous codon usage, nucleotide pair, mutation pressure, translation selection, host

1. Introduction
Peste des petits ruminants virus (PPRV) and rinderpest virus (RPV), which are members of the Morbillivirus genus of the Paramyxoviridae family, can cause highly contagious and devastating viral diseases of ruminants (Gibbs et al., 1979; Plowright, 1962). Like other morbilliviruses, PPRV and RPV are enveloped, single-stranded, nonsegmented, negative-sense RNA viruses, and the two viruses share similar genomes that contain six genes in the order 5’ L-H-F-M-P-N 3’. PPRV and RPV have in common some immunological cross-reactions and relatively similar clinical syndromes (Kumar et al., 2014). Small ruminants are the major hosts of PPRV, and some other ruminant species can also be infected by this virus (Banyard et al., 2010). RPV can cause fatal disease in cattle, yaks and buffaloes (Carrillo et al., 2010). Notably, transmission of PPRV from infected goats to cattle and wild animals has recently received attention (Lembo et al., 2013), and RPV also infects small ruminant species and wildlife species without clinical syndromes (Anderson, 1995), suggesting that PPRV and RPV can switch hosts. A successful viral life cycle often requires several properties; for example, the virus has the capability
to infect host cells, to take control of the cellular translation systems and to direct them toward the efficient production of new viruses. Viruses are characterized by very high natural mutation rates, and RNA viruses often show markedly higher rates than DNA viruses (Drake and Holland, 1999; Jenkins and Holmes, 2003). Co-evolution and adaptation of viruses to their natural hosts have mostly been studied by comparing mutations at synonymous and non-synonymous coding sites in specific genes; these studies indicate a strong correlation between nucleotide usage patterns and synonymous codon usage patterns caused by multiple evolutionary forces (Bahir et al., 2009; Butt et al., 2014; Lobo et al., 2009; Wong et al., 2010; Zhou et al., 2013a; Zhou et al., 2013c). In the standard genetic code, the 64 codons specify 20 amino acids and three stop signals. This redundancy enhances the resistance of genes to mutation, because the third codon letter can often be changed without affecting the primary sequence of the protein. Nevertheless, synonymous codons are not selected at random, and all synonymous codons might be integral parts of the genetic code with equal importance in maintaining its functional integrity (Biro, 2008). Synonymous codon usage has been studied in a wide range of organisms, from prokaryotes to eukaryotes and viruses, because the genetic code links nucleotides to amino acids and is involved in many biological functions (e.g., translation efficiency and protein structure). Several factors shape synonymous codon usage, including secondary protein structure, replication and selective transcription, hydrophobicity and hydrophilicity of the protein, the external environment, mutation pressure and translational selection (Ma et al., 2015; Ma et al., 2013; Plotkin and Kudla, 2011; Rosano and Ceccarelli, 2009; Sharp et al., 2010; Zhang et al., 1994; Zhou et al., 2012; Zhou et al., 2013b; Zhou et al., 2006; Zouridis and Hatzimanikatis, 2008).
To date, no comprehensive study of the genetic features of nucleotide, synonymous codon and amino acid usages in the viral genes of PPRV and RPV has been reported. A better understanding of the genetic diversity of each functional gene in the viral genome, and of the co-evolution between PPRV, RPV and their natural hosts, is necessary to identify the characteristics of codon usage among the viral functional genes and the degree of adaptation of each virus to its host.

2. Materials and Methods
2.1 Sequence data
The 9 genomes of PPRV and the 9 genomes of RPV were downloaded from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.gov/Genbank/). The sequence data of PPRV include AJ849636: Ovis aries (sheep), FJ905304: goat, EU267273: goat, HQ197753: caprine goat, EU267274: goat, KC594074: alpine goat, JX217850: wild bharal, NC_006383: Ovis aries (sheep) and JF939201: goat. The sequence data of RPV include AB547190: Oryctolagus cuniculus var. domesticus (rabbit), NC_006296: cattle, AB547189: Bos taurus (cattle), X98291: cattle, JN234010: bovine, JN234009: bovine, JN234008: bovine and Z30697: vaccine strain. To estimate the synonymous codon usage patterns and the codon usage bias of viral genes, the six functional genes (L, P, F, H, M and N) of each viral genome were obtained by multiple sequence alignment with the Clustal W (1.7) computer program (Thompson et al., 1994). Additionally, codon usage frequencies of sheep (Ovis aries) and cattle (Bos taurus) were obtained from the codon usage database (Nakamura et al., 2000).

2.2 Analyses of nucleotide usages of viral genes
The nucleotide usages were calculated for the viral genes of PPRV and RPV, including the total frequency of occurrence of each nucleotide (U%, C%, A% and G%), the frequency of occurrence
of each nucleotide at different codon positions (U1%, U2%, U3%, C1%, C2%, C3%, A1%, A2%, A3%, G1%, G2% and G3%), the frequency of occurrence of G plus C at the first and second codon positions (GC12%) and the frequency of G plus C at the third codon position (GC3%).

2.3 Calculation of nucleotide pairs in viral genes
To investigate the effects of nucleotide pairs on the synonymous codon usage patterns of viral genes, the relative dinucleotide abundance was used in this study. The relative abundance of nucleotide pairs in viral genes was calculated as previously reported (Karlin and Burge, 1995). The odds ratio was calculated according to the following formula:
$$P(xy) = \frac{F(xy)\, n^{2}}{F(x)\, F(y)\, (n-1)}$$
where F(xy) is the number of occurrences of the nucleotide pair (xy), F(x) is the number of occurrences of nucleotide (x), F(y) is the number of occurrences of nucleotide (y), and n is the total number of nucleotides in the sequence. Compared with a random association of mononucleotides, a nucleotide pair (xy) is regarded as over-represented when P(xy) is more than 1.23 and as under-represented when P(xy) is less than 0.78.

2.4 Analyzing the synonymous codon usage patterns of viral genes
To avoid the effects of gene length and amino acid composition on the synonymous codon usage patterns, the relative synonymous codon usage (RSCU) value was used in this study, as previously reported (Sharp et al., 1986). Notably, the three stop codons (UGA, UAA and UAG), UGG for Trp and AUG for Met are excluded from the RSCU analyses. For the codon usage frequencies of the sheep and cattle genomes, RSCU values were also calculated for the 59 synonymous codons using the following formula:
$$\mathrm{RSCU} = \frac{g_{ij}}{\sum_{j=1}^{n_i} g_{ij}}\, n_i$$
where $g_{ij}$ is the observed number of the jth codon for the ith amino acid (which has $n_i$ synonymous codons). The RSCU value is the ratio between the observed usage frequency of a codon in a viral gene and the frequency expected if all codons for that amino acid were used at random. When the RSCU value is 1, the corresponding synonymous codon shows no usage bias. In addition, to further investigate the genetic diversity of each viral gene between PPRV and RPV in terms of codon usage bias (CUB), the CUB values (CUB = RSCU - 1) of the 59 synonymous codons of the target viral genes in PPRV and RPV were computed and grouped by complete-linkage clustering with Euclidean distance, and heat maps of the CUB values for each viral gene were drawn with the software Java TreeView (http://jtreeview.sourceforge.net/).

2.5 Calculation for the overall trend of codon usage for viral gene
To investigate the overall trends of codon usage in the viral functional genes, the ‘effective number of codons’ (ENC), a useful estimator of absolute codon bias, was used to quantify the overall codon usage of each gene (Wright, 1990). A plot of ENC value versus GC3% can be used to assess the heterogeneity of codon usage among different genes. The ENC value ranges from 20 (when only one synonymous codon is chosen by
the corresponding amino acid) to 61 (when all synonymous codons are used equally). A gene with an ENC value of 35 or less is regarded as having significant codon bias.

2.6 Calculating the relative amino acid usage value of viral genes
To better analyze the amino acid usage patterns of the viral proteins, we adapted the RSCU formula into a simple formula for the relative amino acid usage (RAAU) value of a target amino acid sequence:
$$\mathrm{RAAU}_{ij} = \frac{M_{ij} / N_i}{1/20}$$

where $\mathrm{RAAU}_{ij}$ denotes the relative magnitude of amino acid j as selected by the ith amino acid sequence, $M_{ij}$ denotes the usage frequency of amino acid j in the ith amino acid sequence, and $N_i$ denotes the total number of amino acids in the ith amino acid sequence. Notably, Met and Trp are included in the RAAU calculation.
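The RAAU definition above can be illustrated with a minimal Python sketch. It assumes that M_ij is the raw count of amino acid j in sequence i and that sequences use the one-letter code; the function name `raau` is illustrative and is not taken from the study.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (Met and Trp included).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def raau(protein: str) -> dict:
    """Relative amino acid usage: (M_ij / N_i) / (1/20) for each amino acid j.

    Assumption: M_ij is the count of amino acid j and N_i the number of
    standard residues in the sequence; other symbols (e.g. '*', 'X') are ignored.
    """
    residues = [aa for aa in protein.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    # (count / n) / (1 / 20) is written as 20 * count / n.
    return {aa: 20.0 * counts[aa] / n for aa in AMINO_ACIDS}


if __name__ == "__main__":
    values = raau("MKTLLVAGW")
    for aa in AMINO_ACIDS:
        print(aa, round(values[aa], 3))
```

A value of 1 means the amino acid is used at the frequency expected under uniform usage of the 20 amino acids; the 20 values of one sequence form the vector that would enter a principal component analysis.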
Furthermore, to better estimate the genetic diversity of the different viral proteins between PPRV and RPV, principal component analysis (PCA) was used in the current study; PCA reduces data dimensionality by performing a covariance analysis between factors (the 20 types of amino acid). This analysis provides a convenient way to visualize the genetic diversity of each type of viral protein. All statistical analyses were performed with the statistical software SPSS 11.5 for Windows.

2.7 Estimating the effect of host on viral genes at the codon usage
To estimate the potential effect of the overall codon usage of the host cell on that of the viral genes of PPRV and RPV, we used a formula for the comparative analysis of codon usage between the host and a specific viral gene (Zhou et al., 2013d).
$$R(A,B) = \frac{\sum_{i=1}^{59} a_i b_i}{\sqrt{\sum_{i=1}^{59} a_i^{2} \cdot \sum_{i=1}^{59} b_i^{2}}}$$

$$D(A,B) = \frac{1 - R(A,B)}{2}$$
where R(A,B) is the cosine of the angle between the vectors A and B and represents the degree of similarity between the target virus and a specific host in overall codon usage pattern, $a_i$ is the RSCU value of a specific codon among the 59 synonymous codons of the viral coding sequence, and $b_i$ is the RSCU value of the same codon in the host. D(A,B) represents the potential effect of the overall codon usage of the host on that of the virus, and this value ranges from zero to 1.0.

3. Results
3.1 The nucleotide usages in viral genes between PPRV and RPV
The comparative analysis of overall nucleotide usage shows that A% is the highest in all viral genes of PPRV and RPV, and that U1%, C1% and G2% are generally low in these
viral genes (Fig. 1). In detail, for the F gene (Fig. 1A), A%, U%, U1%, C1%, U2%, C2%, G2%, U3%, C3% and A3% differ significantly between PPRV and RPV. For the H gene (Fig. 1B), all nucleotide contents except U1% and C1% differ significantly between PPRV and RPV. For the L gene (Fig. 1C), the nucleotide contents at the second codon position are similar between PPRV and RPV, with no significant difference in U2%, C2% and G2%, whereas all nucleotide contents at the third codon position differ significantly; in addition, the A contents at the different codon positions differ significantly between PPRV and RPV. For the M gene (Fig. 1D), the A contents at the different codon positions differ significantly between PPRV and RPV. For the N gene (Fig. 1E), the total nucleotide contents differ significantly between PPRV and RPV, as do the G contents at the different codon positions. For the P gene (Fig. 1F), all nucleotide contents except U3% differ significantly between PPRV and RPV. Notably, judging from the fluctuations of nucleotide content at different codon positions across the six viral genes of the two viruses, the total nucleotide contents do not determine the trends of nucleotide usage at individual codon positions, suggesting that the total nucleotide contents are merely a combined reflection of the contents at the different codon positions, which are influenced by various evolutionary forces. Because G and C have high base-stacking forces, their usage at different codon positions was examined by correlation analysis.
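The position-specific quantities compared here (U1% to G3%, GC12% and GC3%) can be computed as in the following minimal Python sketch. It assumes an in-frame coding sequence given as a string; the function name and the returned dictionary keys are illustrative and are not taken from the study.

```python
from collections import Counter


def codon_position_composition(cds: str) -> dict:
    """Overall and per-codon-position nucleotide percentages of an in-frame CDS.

    Assumptions: the sequence starts at codon position 1, T is treated as U,
    and a trailing partial codon is discarded.
    """
    seq = cds.upper().replace("T", "U")
    seq = seq[: len(seq) - len(seq) % 3]  # keep complete codons only
    if not seq:
        raise ValueError("sequence must contain at least one complete codon")

    result = {}
    total = Counter(seq)
    for base in "UCAG":
        result[f"{base}%"] = 100.0 * total[base] / len(seq)

    # Every third character starting at offset 0, 1 or 2 gives one codon position.
    for pos in range(3):
        column = seq[pos::3]
        counts = Counter(column)
        for base in "UCAG":
            result[f"{base}{pos + 1}%"] = 100.0 * counts[base] / len(column)

    result["GC%"] = result["G%"] + result["C%"]
    # GC12% is the mean GC content of the first and second codon positions.
    result["GC12%"] = (
        result["G1%"] + result["C1%"] + result["G2%"] + result["C2%"]
    ) / 2.0
    result["GC3%"] = result["G3%"] + result["C3%"]
    return result


if __name__ == "__main__":
    stats = codon_position_composition("AUGGCCAAAUGA")
    for key in ("GC%", "GC12%", "GC3%"):
        print(key, round(stats[key], 2))
```

Computing these values for every gene of every strain gives the per-gene GC%, GC12% and GC3% columns on which a correlation analysis could then be run.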
For the viral genes of PPRV, the comparative analyses of GC%, GC12% and GC3% show highly significant positive correlations (r=0.911, p<0.001; GC% versus GC12%) (r=0.734, p<0.01; GC12% versus GC3%) (Fig. 2A & B). For the viral genes of RPV, there are also highly significant positive correlations (r=0.976, p<0.001; GC% versus GC12%) (r=0.538, p<0.001; GC% versus GC3%) (Fig. 2C & D). The high correlation between GC% and GC12% is expected, because nucleotide usage at the first and second codon positions largely determines the amino acid composition of viral proteins. Although GC12% and GC3% are significantly and positively correlated in both PPRV and RPV, this correlation is much weaker than the correlations of GC% versus GC12% and GC% versus GC3% (Fig. 2E & F), further suggesting that, although GC12% and GC3% are related in the genomes of PPRV and RPV, the two contents are shaped by different evolutionary forces.

3.2 The usages of nucleotide pairs in viral genes
Because the relationship between GC12% and GC3% was examined above, it is necessary to analyze the features of other kinds of nucleotide pairs in the viral genes of PPRV and RPV. To better investigate the potential effect of nucleotide pairs on the overall trends of codon usage in the viral genes of PPRV and RPV, we calculated the relative abundances of the 16 types of nucleotide pairs in the viral genes.
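The relative abundance of the 16 nucleotide pairs can be sketched in Python as follows. This is a minimal illustration that assumes F(x) and F(xy) are raw counts, so that the odds ratio is F(xy)·n² / (F(x)·F(y)·(n−1)) as in Section 2.3, and it uses the 0.78 and 1.23 cut-offs stated there. The function names are illustrative and are not taken from the study.

```python
import math
from collections import Counter
from itertools import product


def dinucleotide_odds_ratios(seq: str) -> dict:
    """Odds ratio P(xy) for all 16 nucleotide pairs of one sequence.

    Assumptions: U is treated as T (the pair names follow the XpY notation),
    overlapping pairs are counted, and a pair whose mononucleotides are
    absent gets NaN instead of raising an error.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    mono = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))

    ratios = {}
    for x, y in product("ACGT", repeat=2):
        denominator = mono[x] * mono[y] * (n - 1)
        if denominator == 0:
            ratios[f"{x}p{y}"] = math.nan
        else:
            ratios[f"{x}p{y}"] = pairs[x + y] * n * n / denominator
    return ratios


def classify_pair(p: float, low: float = 0.78, high: float = 1.23) -> str:
    """Label an odds ratio using the cut-offs given in Section 2.3."""
    if math.isnan(p):
        return "undefined"
    if p > high:
        return "over"
    if p < low:
        return "under"
    return "normal"


if __name__ == "__main__":
    ratios = dinucleotide_odds_ratios("ATGGCGAACTTTCGAGGACCA")
    for name, value in sorted(ratios.items()):
        print(name, round(value, 3), classify_pair(value))
```

Running the function on each of the six genes of a strain gives one row of 16 values per gene, which is the kind of layout that a table of pair abundances would use.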
In detail, in the viral genes of PPRV, the under-represented nucleotide pairs include ApA, TpT and CpG in the F gene; ApA, TpA, TpT, CpG and GpC in the H gene; ApA and CpG in the L gene; TpT, CpG and GpC in the M gene; ApA, TpA, CpG and GpT in the N gene; and ApA, ApC, TpA, TpT and CpG in the P gene. The over-represented nucleotide pairs include TpC and GpA in the H gene; ApT, TpT, TpC, TpG, CpT and GpA in the L gene; GpA in the M gene; TpC and GpA in the N gene; and TpC and GpA in the P gene (Table 1). Turning to the viral genes of RPV, nucleotide pairs with
under-representation include ApA, TpA, CpG and GpC in the F gene; ApA, TpA and CpG in the H gene; ApA, ApC, CpT, CpG and GpT in the L gene; ApA, TpT, CpG and GpT in the M gene; ApA, TpT, CpT, CpG and GpA in the N gene; and ApA, ApC, TpT, TpA, CpT, CpG, GpA and GpC in the P gene. Nucleotide pairs with over-representation include CpA and GpG in the H gene; TpG and GpG in the L gene; GpG in the M gene; CpC, CpA and GpG in the N gene; and TpC and GpG in the P gene (Table 2). The occurrences of nucleotide pairs were not randomly distributed, and no nucleotide pair matched its expected frequency. Interestingly, CpG is under-represented in all viral genes of PPRV and RPV even though G and C are not used at low frequencies at any codon position (Fig. 1), suggesting that the usage of CpG is suppressed by evolutionary forces. In addition, TpA is under-represented in the H, N and P genes of PPRV and in the F, H and P genes of RPV. Moreover, the nucleotide pairs with high base-stacking energies (GpG, CpC, CpG and GpC) and those with low base-stacking energies (TpT, TpA, ApA and ApT) are generally used at low frequencies by PPRV and RPV. These results suggest that a balanced composition of nucleotide pairs may help sustain normal transcription of the viral genes of PPRV and RPV. Comparing the corresponding genes of PPRV and RPV (Tables 1 & 2), the two viruses had different usage patterns of GpG and GpA in the L gene, GpG in the M gene, GpG and GpA in the P gene, and GpA in the N gene. These results imply that biased usage of nucleotide pairs is one of the nucleotide-composition constraints, related to mutation pressure, that affect the synonymous codon usage patterns of PPRV and RPV.
3.3 The synonymous codon usage patterns of viral genes
In the viral genes of PPRV (Table 3), the under-represented codons include UCU, CCC, CCG, ACG, GCG, CGU, CGC, CGA, CGG and GGU and the over-represented codons include CUG, UCA, CCA, ACA, GCA, AGA and GGG in the F gene; the under-represented codons include UUA, UCG, AGC, ACG, GCG, GAC, CGU, CGC and CGA and the over-represented codons include CUG, UCA, CCU, AGA and AGG in the H gene; the under-represented codons include UCG, ACG, GCG, CGU and CGC and the over-represented codons include AGA and AGG in the L gene; the under-represented codons include CUU, CUC, UCC, UCG, CCG, ACG, GCG, UAU, GAU, GAC, UGU, CGU and CGC and the over-represented codons include CUG, UCA, CCC, GCA and AGA in the M gene; the under-represented codons include UUA, CUA, AUA, GUA, GUG, CCG, ACG, GCG, UAC, CAU, AAU, UGU, CGA, CGG, GGU and GGC and the over-represented codons include GUC, UCA, CAC, AAC, AGA and AGG in the N gene; and the under-represented codons include UUG, CUG, GUA, UCG, AGU, ACG, GCG, UGC, CGC, CGA, CGG and GGU and the over-represented codons include CUC, GCA and AGA in the P gene. In the viral genes of RPV (Table 4), the under-represented codons include UUA, ACG, GCG, CGA, CGG and GGA and the over-represented codons include AGG and GGG in the F gene; the under-represented codons include UCG, CCG, ACG, GCG, CAC, GAA, CGU and CGC and the over-represented codons include UCA, GCU and AGA in the H gene; the under-represented codons include GUU, UCG, ACG, CGU and CGC and the over-represented codons include UCC, AGU, CCA, ACA, GCA, AGA, GGU and GGG in the M gene; the under-represented codons include CUA, AUA, UCG, ACG, GCG, UGC, CGU, CGC and CGA and the over-represented codons include GCA, UGU, AGA and AGG in the N gene; and the under-represented codons include UUU, UUA, UUG, CUA, AUA, GUA, ACG, GCG, CAC and CGU and the over-represented codons include CUG, AUC and UCU in the P gene. Of note, the codons that contain the nucleotide pair CpG (ACG, GCG, CGC, CGA and CGG) are used in viral genes at low levels in both
PPRV and RPV. Furthermore, the corresponding genes of PPRV and RPV show some different usage patterns, including AGA and AGG for Arg in the F gene; CGA for Arg in the H gene; UCC, UCA and UCG for Ser, CCC and CCA for Pro, ACA for Thr, GCU, GCC and GCG for Ala, and GGG for Gly in the M gene; AGC for Ser, UAU and UAC for Tyr, CAU and CAC for His, and UGU and UGC for Cys in the N gene; and CUA and CUG for Leu in the P gene (Tables 3 & 4). These results suggest that nucleotide usage at the third codon position and the formation of nucleotide pairs may influence the synonymous codon usage patterns of the viral genes of the two viruses. In addition, the codon usage bias of all 59 synonymous codons (excluding Met, Trp and the three stop codons) was compared by clustering, generating a heat map for each viral gene of the PPRV and RPV strains. Clearly, each viral gene has followed a distinct evolutionary path in PPRV and RPV (Fig. S1). Interestingly, although the RPV vaccine strain (Z30697) was developed by multiple passages in different cells to induce attenuation, so that its codon bias might be expected to be artefactual, the codon usage bias of the 6 genes of this vaccine strain is closely related to that of wild-type RPV (X98291). This suggests that passage of RPV in in vitro culture does not greatly alter viral codon usage bias, probably because few evolutionary pressures act under those conditions. The six heat maps show that the usage of synonymous codons for Arg is highly biased in the viral genes, whereas the usage of synonymous codons for Lys and Asn is almost equal and random in the viral genes of the two viruses. These results indicate that codon usage bias has influenced the evolution of the two viruses and that RSCU is a reasonable indicator for evaluating the evolutionary relationship between PPRV and RPV.
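For readers who wish to reproduce values of this kind, the RSCU calculation of Section 2.4 can be sketched in Python as follows. This is a minimal illustration that assumes the standard genetic code and an in-frame coding sequence; the names `CODON_TABLE` and `rscu` are illustrative and are not taken from the study.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """RSCU of the 59 synonymous codons (stop codons, Met and Trp excluded).

    Assumptions: the sequence is in frame, T is treated as U, and codons of an
    amino acid that does not occur in the sequence are reported as NaN.
    """
    seq = cds.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons into one family per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for codon in family:
            # RSCU = g_ij * n_i / sum_j g_ij
            values[codon] = counts[codon] * len(family) / total if total else math.nan
    return values


if __name__ == "__main__":
    values = rscu("AUGGCUGCUGCAAGAAGGCGUUAA")
    for codon in ("GCU", "GCA", "AGA", "CGU"):
        print(codon, round(values[codon], 3))
```

Subtracting 1 from each value gives the CUB value defined in Section 2.4, so a per-gene vector of 59 such values is the input that a clustering step would use.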
3.4 The overall trends of codon usage for viral genes
To further quantify the overall codon usage bias of the viral genes of PPRV and RPV, the ENC value of each viral gene was calculated. As shown in Fig. 3, all ENC values are above 50 for the viral genes of PPRV and RPV; however, all ENC values are lower than the corresponding expected values (that is, they lie below the expected curve). Although PPRV and RPV each have six viral genes with different biological functions, nucleotide usage at the third codon position strongly affects the overall codon usage trends in both genomes. Because the ENC points of the viral genes do not lie on the expected curve, other evolutionary forces may also take part in the genetic processes of PPRV and RPV.

3.5 The genetic features for amino acid usages for viral genes
According to the PCA of amino acid usage patterns for the six viral genes of PPRV and RPV, each viral gene has its own genetic features of amino acid usage (Fig. 4). The amino acid usage patterns of the P, N, M, H and F proteins differ significantly between PPRV and RPV, whereas those of the L protein show a similar evolutionary trend in the two viruses. These results imply that the biological functions of the L protein require stable and similar amino acid usage patterns in PPRV and RPV. The normal biological functions of viral proteins can be regarded as indicators of the evolutionary process of viruses, and specific amino acid usage patterns play an important role in producing proteins with the correct structure (Ma et al., 2013; Zhou et al., 2013c).
3.6 The effects of codon usages of hosts on that of viral genes
Although synonymous codon usage patterns of hosts and target viral genes have been compared in previous reports (Ma et al., 2014; Sanchez et al., 2003; Zhou et al., 2013e), those comparisons are limited in estimating the comprehensive effects of the overall codon
usage of hosts on that of viral genes. The similarity index D(A,B) was therefore calculated for each viral gene in relation to the host. Generally, the comprehensive effect of the overall codon usage of sheep is higher than that of cattle on all viral genes of both PPRV and RPV (Fig. 5). Among the six genes of PPRV, the similarity of overall codon usage is highest for the F gene and lowest for the L gene. Among the six genes of RPV, the similarity of overall codon usage is highest for the M gene and lowest for the P gene (Fig. 5). These results imply that the viral genes of PPRV and RPV have followed different evolutionary processes to fit the overall codon usage environment of their different natural hosts, and that viral genes with different biological functions show different degrees of codon usage adaptation to their hosts.

4. Discussion
RNA viruses, whose life cycles depend strongly on host cells, often replicate and mutate rapidly. Although PPRV can cause subclinical infections of large ruminants in the absence of RPV, this has not led to enhanced virulence or transmission (Nambulli et al., 2016), suggesting that specific genetic features of PPRV limit its adaptability and pathogenicity (Muniraju et al., 2014). Although RPV has been eradicated worldwide, it still poses a potential risk of disease recurrence (Hamilton et al., 2015). Comparing the genetic diversity of PPRV and RPV can help reveal the features of the evolutionary processes in the viral genomes. Even though an amino acid can be encoded by several synonymous codons, these synonymous codons tend to be used unequally in the viral genes of PPRV and RPV. The usages of nucleotides and synonymous codons are key factors in the genetic diversity of PPRV and RPV (Carrillo et al., 2010; Fukai et al., 2011; Muniraju et al., 2014; Padhi and Ma, 2014). The bias of synonymous codon usage tends to be affected directly by the distribution of overall nucleotide usage in viral genomes.
In their viral genes, PPRV and RPV share a similar model of nucleotide usage at different codon positions, and overall nucleotide usage influences nucleotide usage at the first and second codon positions more strongly than at the third codon position, suggesting that multiple evolutionary forces (e.g., mutation pressure, translation selection, selection for fine-tuning of translation kinetics, transcription, biological function and immunological defense from the host) take part in the formation of synonymous codon usage (Aragones et al., 2008; Aragones et al., 2010; Karlin et al., 1994a; Sugiyama et al., 2005). In comparisons among GC12%, GC3% and GC% of the viral genes of PPRV and RPV, the correlation between GC12% and GC% is stronger than that between GC3% and GC%, further suggesting that total nucleotide composition plays a stronger role in amino acid usage, which is controlled by the first and second codon positions, than in synonymous codon usage. According to analyses of nucleotide conservation in the PPRV genome, strong purifying selection pressure acts on PPRV, and nucleotide usage at the first and second codon positions mainly determines the amino acid usage patterns of PPRV (Bao et al., 2017; Clarke et al., 2017). Biased usage of nucleotide pairs can influence the overall codon usage pattern in the genomes of RNA viruses (Cheng et al., 2013; Greenbaum et al., 2008). In the synonymous codon usage patterns of PPRV and RPV, the codons that contain CpG are under-represented, further indicating that nucleotide pairs contribute to the formation of synonymous codon usage. Moreover, most nucleotide pairs are used within the normal range (0.78≤P≤1.23); however, the nucleotide pair CpG tends to be used at low levels in the viral genes of PPRV and RPV.
In DNA, over-represented CpG tends to increase the risk of cytosine methylation (Law and Jacobsen, 2010). Turning to viral genomes, CpG has also
been observed to be predominantly under-represented (Karlin et al., 1994b; Rima and McFerran, 1997). The mechanism of cytosine DNA methylation is unlikely to explain the under-represented CpG in the viral genes of PPRV and RPV, because there is no DNA intermediate stage during RNA replication. Of note, because C and G have high base-stacking energy (Breslauer et al., 1986), under-representation of CpG may reduce the stability of RNA secondary structure and ensure reasonably efficient transcription and replication of the PPRV and RPV genomes. Another reason for the under-represented CpG may be evasion of host innate immunity (Cheng et al., 2013). Although the codon usage of the viral genes of PPRV and RPV is clearly shaped by the joint action of nucleotide usages under mutation pressure, the adaptive benefit of this codon usage remains less clear. For the overall codon usage trends of the various viral genes, the plot of GC3% versus ENC value (Fig. 3) is widely used to estimate codon usage bias among different kinds of genes. In this plot, if the observed points lie on the expected ENC curve, nucleotide usage at the third codon position dominates the overall codon usage of the gene; if the observed points deviate from the expected ENC curve, other evolutionary forces act together with nucleotide usage at the third codon position on the overall codon usage (Wright, 1990). Comparison of the overall codon usage trend with GC3% revealed weak overall codon usage bias in the viral genes of PPRV and RPV, with nucleotide usage at the third codon position dominating the overall codon usage of both viruses.
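The expected ENC curve referred to above is conventionally drawn from the approximation given by Wright (1990), ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3 fraction. The Python sketch below evaluates that curve and the gap between an observed ENC value and the curve. It assumes GC3 is supplied as a fraction between 0 and 1 rather than a percentage; the function names are illustrative and are not taken from the study.

```python
def expected_enc(gc3: float) -> float:
    """Expected ENC when codon usage is constrained only by GC3 (Wright, 1990).

    Assumption: gc3 is a fraction in [0, 1], i.e. GC3% divided by 100.
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def enc_deviation(observed_enc: float, gc3: float) -> float:
    """Observed minus expected ENC; negative values lie below the curve."""
    return observed_enc - expected_enc(gc3)


if __name__ == "__main__":
    # Hypothetical input values, used only to show the call pattern.
    for gc3, observed in ((0.40, 52.0), (0.50, 55.0)):
        print(gc3, round(expected_enc(gc3), 2), round(enc_deviation(observed, gc3), 2))
```

A point far below the curve indicates that forces other than GC3 composition contribute to codon bias, whereas a point on or near the curve is consistent with third-position nucleotide usage being the dominant constraint.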
One possible explanation for the weak codon usage bias of viral genes in PPRV and RPV is that it might be advantageous for efficient replication in host cells, which may have distinct codon preferences. Of note, PPRV and RPV are closely related antigenically. Infection of cattle with PPRV can impair the immune response against RPV, and antibodies against PPRV can offer cross-protection against RPV (Anderson and McKay, 1994; Taylor, 1979). Analysis of the genetic diversity of amino acid usage in the viral proteins of the two viruses shows that the amino acid usage patterns of L protein are highly related between PPRV and RPV, while those of the other proteins retain specific genetic diversity between the two viruses. The conserved biological functions of PPRV and RPV play an important role in amino acid usage for L protein (Bailey et al., 2005). Interestingly, although M protein, which serves as a link between the N protein and the surface glycoproteins (F and H), is the most conserved viral protein within the morbilliviruses (Haffar et al., 1999; Muthuchelvan et al., 2006), the amino acid usage patterns of this protein show significant genetic diversity between PPRV and RPV. A possible explanation is that the amino acid usage patterns for L protein need to change to match the varied amino acid usage patterns of the N, F and H proteins. As for the biological function of N protein, the PPRV and RPV genomes are encapsidated by the N protein in a ribonucleoprotein complex (Diallo et al., 1987). The genetic diversity of amino acid usage between PPRV and RPV may indicate that the amino acid usage pattern of N protein needs to meet the requirements of the viral genome context. Judging from the adaptation of the overall codon usage of PPRV and RPV to their hosts, evolutionary selection from hosts contributes to the molecular evolution of PPRV and RPV at the level of codon usage. 
Some evidence indicates that PPRV is extending its host range (Jaisree et al., 2017; Li et al., 2017; Zakian et al., 2016; Zhou et al., 2017). Given the adaptation of the overall codon usage of viral genes to the host, selection from hosts acts on the overall codon usage of viral genes. The fitness of codon usage for protein translation is an important fitness determinant in rapidly growing organisms (Cannarozzi et al., 2010).
In general, the overall codon usage trends for viral genes of PPRV and RPV are slightly biased, but the major evolutionary force shaping codon usage is still mutation pressure caused by nucleotide usage at the third codon position and by nucleotide pairs. In addition, contributions of other evolutionary factors (translation selection from hosts, transcription and amino acid usage) are also evident from our analysis. Co-evolution and adaptation of viruses to their hosts have mostly been studied through the codon usage patterns of viral genes. As in other RNA viruses, codon usage is mainly influenced by strong mutation pressure, but the codon usage patterns of viral genes in the genomes of PPRV and RPV are largely regulated by translation selection from hosts. In conclusion, the synonymous codon usage patterns of viral genes of PPRV and RPV are shaped by an equilibrium of mutation pressure, translation selection, transcription and the cellular antiviral response of hosts.
Acknowledgements
The work was supported by the National Natural Science Foundation of China (No. 31700763, No. 81760287), the Central Universities Deriving from the Northwest Minzu University (No. 31920170158), Innovative Research Team in University (No. IRT_17R88), Gansu Provincial Science and Technology Grant (No. 1504WKCA094) and Ministry of Science and Technology Assistance Project Grant (KY201501005).
References
Anderson, E.C., 1995. Morbillivirus infections in wildlife (in relation to their population biology and disease control in domestic animals). Vet Microbiol 44, 319-332.
Anderson, J., McKay, J.A., 1994. The detection of antibodies against peste des petits ruminants virus in cattle, sheep and goats and the possible implications to rinderpest control programmes. Epidemiol Infect 112, 225-231.
Aragones, L., Bosch, A., Pinto, R.M., 2008. Hepatitis A virus mutant spectra under the selective pressure of monoclonal antibodies: codon usage constraints limit capsid variability. J Virol 82, 1688-1700.
Aragones, L., Guix, S., Ribes, E., Bosch, A., Pinto, R.M., 2010. Fine-tuning translation kinetics selection as the driving force of codon usage bias in the hepatitis A virus capsid. PLoS Pathog 6, e1000797.
Bahir, I., Fromer, M., Prat, Y., Linial, M., 2009. Viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences. Mol Syst Biol 5, 311.
Bailey, D., Banyard, A., Dash, P., Ozkul, A., Barrett, T., 2005. Full genome sequence of peste des petits ruminants virus, a member of the Morbillivirus genus. Virus Res 110, 119-124.
Banyard, A.C., Parida, S., Batten, C., Oura, C., Kwiatek, O., Libeau, G., 2010. Global distribution of peste des petits ruminants virus and prospects for improved diagnosis and control. J Gen Virol 91, 2885-2897.
Bao, J., Wang, Q., Li, L., Liu, C., Zhang, Z., Li, J., Wang, S., Wu, X., Wang, Z., 2017. Evolutionary dynamics of recent peste des petits ruminants virus epidemic in China during 2013-2014. Virology 510, 156-164.
Biro, J.C., 2008. Does codon bias have an evolutionary origin? Theor Biol Med Model 5, 16.
Breslauer, K.J., Frank, R., Blocker, H., Marky, L.A., 1986. Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci U S A 83, 3746-3750.
Butt, A.M., Nasrullah, I., Tong, Y., 2014. Genome-wide analysis of codon usage and influencing factors in chikungunya viruses. PLoS One 9, e90905.
Cannarozzi, G., Schraudolph, N.N., Faty, M., von Rohr, P., Friberg, M.T., Roth, A.C., Gonnet, P., Gonnet, G., Barral, Y., 2010. A role for codon order in translation dynamics. Cell 141, 355-367.
Carrillo, C., Prarat, M., Vagnozzi, A., Calahan, J.D., Smoliga, G., Nelson, W.M., Rodriguez, L.L., 2010. Specific detection of Rinderpest virus by real-time reverse transcription-PCR in preclinical and clinical samples from experimentally infected cattle. J Clin Microbiol 48, 4094-4101.
Cheng, X., Virk, N., Chen, W., Ji, S., Sun, Y., Wu, X., 2013. CpG usage in RNA viruses: data and hypotheses. PLoS One 8, e74109.
Clarke, B., Mahapatra, M., Friedgut, O., Bumbarov, V., Parida, S., 2017. Persistence of Lineage IV Peste-des-petits ruminants virus within Israel since 1993: An evolutionary perspective. PLoS One 12, e0177028.
Diallo, A., Barrett, T., Lefevre, P.C., Taylor, W.P., 1987. Comparison of proteins induced in cells infected with rinderpest and peste des petits ruminants viruses. J Gen Virol 68 (Pt 7), 2033-2038.
Drake, J.W., Holland, J.J., 1999. Mutation rates among RNA viruses. Proc Natl Acad Sci U S A 96, 13910-13913.
Fukai, K., Morioka, K., Sakamoto, K., Yoshida, K., 2011. Characterization of the complete genomic sequence of the rinderpest virus Fusan strain cattle type, which is the most classical isolate in Asia and comparison with its lapinized strain. Virus Genes 43, 249-253.
Gibbs, E.P., Taylor, W.P., Lawman, M.J., Bryant, J., 1979. Classification of peste des petits ruminants virus as the fourth member of the genus Morbillivirus. Intervirology 11, 268-274.
Greenbaum, B.D., Levine, A.J., Bhanot, G., Rabadan, R., 2008. Patterns of evolution and host gene mimicry in influenza and other RNA viruses. PLoS Pathog 4, e1000079.
Haffar, A., Libeau, G., Moussa, A., Cecile, M., Diallo, A., 1999. The matrix protein gene sequence analysis reveals close relationship between peste des petits ruminants virus (PPRV) and dolphin morbillivirus. Virus Res 64, 69-75.
Hamilton, K., Visser, D., Evans, B., Vallat, B., 2015. Identifying and Reducing Remaining Stocks of Rinderpest Virus. Emerg Infect Dis 21, 2117-2121.
Jaisree, S., Aravindhbabu, R.P., Roy, P., Jayathangaraj, M.G., 2017. Fatal peste des petits ruminants disease in Chowsingha. Transbound Emerg Dis.
Jenkins, G.M., Holmes, E.C., 2003. The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 92, 1-7.
Karlin, S., Burge, C., 1995. Dinucleotide relative abundance extremes: a genomic signature. Trends Genet 11, 283-290.
Karlin, S., Doerfler, W., Cardon, L.R., 1994a. Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses? J Virol 68, 2889-2897.
Karlin, S., Mocarski, E.S., Schachtel, G.A., 1994b. Molecular evolution of herpesviruses: genomic and protein sequence comparisons. J Virol 68, 1886-1902.
Kumar, N., Maherchandani, S., Kashyap, S.K., Singh, S.V., Sharma, S., Chaubey, K.K., Ly, H., 2014. Peste des petits ruminants virus infection of small ruminants: a comprehensive review. Viruses 6, 2287-2327.
Law, J.A., Jacobsen, S.E., 2010. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet 11, 204-220.
Lembo, T., Oura, C., Parida, S., Hoare, R., Frost, L., Fyumagwa, R., Kivaria, F., Chubwa, C., Kock, R., Cleaveland, S., Batten, C., 2013. Peste des petits ruminants infection among cattle and wildlife in northern Tanzania. Emerg Infect Dis 19, 2037-2040.
Li, J., Li, L., Wu, X., Liu, F., Zou, Y., Wang, Q., Liu, C., Bao, J., Wang, W., Ma, W., Lin, H., Huang, J., Zheng, X., Wang, Z., 2017. Diagnosis of Peste des Petits Ruminants in Wild and Domestic Animals in Xinjiang, China, 2013-2016. Transbound Emerg Dis.
Lobo, F.P., Mota, B.E., Pena, S.D., Azevedo, V., Macedo, A.M., Tauch, A., Machado, C.R., Franco, G.R., 2009. Virus-host coevolution: common patterns of nucleotide motif usage in Flaviviridae and their hosts. PLoS One 4, e6282.
Ma, X.X., Feng, Y.P., Bai, J.L., Zhang, D.R., Lin, X.S., Ma, Z.R., 2015. Nucleotide composition bias and codon usage trends of gene populations in Mycoplasma capricolum subsp. capricolum and M. Agalactiae. J Genet 94, 251-260.
Ma, X.X., Feng, Y.P., Liu, J.L., Ma, B., Chen, L., Zhao, Y.Q., Guo, P.H., Guo, J.Z., Ma, Z.R., Zhang, J., 2013. The effects of the codon usage and translation speed on protein folding of 3D(pol) of foot-and-mouth disease virus. Vet Res Commun 37, 243-250.
Ma, X.X., Feng, Y.P., Liu, J.L., Zhao, Y.Q., Chen, L., Guo, P.H., Guo, J.Z., Ma, Z.R., 2014. The characteristics of synonymous codon usage in the initial and terminal translation regions of encephalomyocarditis virus. Acta Virol 58, 86-91.
Muniraju, M., Munir, M., Parthiban, A.R., Banyard, A.C., Bao, J., Wang, Z., Ayebazibwe, C., Ayelet, G., El Harrak, M., Mahapatra, M., Libeau, G., Batten, C., Parida, S., 2014. Molecular evolution of peste des petits ruminants virus. Emerg Infect Dis 20, 2023-2033.
Muthuchelvan, D., Sanyal, A., Sreenivasa, B.P., Saravanan, P., Dhar, P., Singh, R.P., Singh, R.K., Bandyopadhyay, S.K., 2006. Analysis of the matrix protein gene sequence of the Asian lineage of peste-des-petits ruminants vaccine virus. Vet Microbiol 113, 83-87.
Nakamura, Y., Gojobori, T., Ikemura, T., 2000. Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res 28, 292.
Nambulli, S., Sharp, C.R., Acciardo, A.S., Drexler, J.F., Duprex, W.P., 2016. Mapping the evolutionary trajectories of morbilliviruses: what, where and whither. Curr Opin Virol 16, 95-105.
Padhi, A., Ma, L., 2014. Genetic and epidemiological insights into the emergence of peste des petits ruminants virus (PPRV) across Asia and Africa. Sci Rep 4, 7040.
Plotkin, J.B., Kudla, G., 2011. Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12, 32-42.
Plowright, W., 1962. Rinderpest virus. Ann N Y Acad Sci 101, 548-563.
Rima, B.K., McFerran, N.V., 1997. Dinucleotide and stop codon frequencies in single-stranded RNA viruses. J Gen Virol 78 (Pt 11), 2859-2870.
Rosano, G.L., Ceccarelli, E.A., 2009. Rare codon content affects the solubility of recombinant proteins in a codon bias-adjusted Escherichia coli strain. Microb Cell Fact 8, 41.
Sanchez, G., Bosch, A., Pinto, R.M., 2003. Genome variability and capsid structural constraints of hepatitis a virus. J Virol 77, 452-459.
Sharp, P.M., Emery, L.R., Zeng, K., 2010. Forces that influence the evolution of codon bias. Philos Trans R Soc Lond B Biol Sci 365, 1203-1212.
Sharp, P.M., Tuohy, T.M., Mosurski, K.R., 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 14, 5125-5143.
Sugiyama, T., Gursel, M., Takeshita, F., Coban, C., Conover, J., Kaisho, T., Akira, S., Klinman, D.M., Ishii, K.J., 2005. CpG RNA: identification of novel single-stranded RNA that stimulates human CD14+CD11c+ monocytes. J Immunol 174, 2273-2279.
Taylor, W.P., 1979. Protection of goats against peste-des-petits-ruminants with attenuated rinderpest virus. Res Vet Sci 27, 321-324.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673-4680.
Wong, E.H., Smith, D.K., Rabadan, R., Peiris, M., Poon, L.L., 2010. Codon usage bias and the evolution of influenza A viruses. Codon Usage Biases of Influenza Virus. BMC Evol Biol 10, 253.
Wright, F., 1990. The 'effective number of codons' used in a gene. Gene 87, 23-29.
Zakian, A., Nouri, M., Kahroba, H., Mohammadian, B., Mokhber-Dezfouli, M.R., 2016. The first report of peste des petits ruminants (PPR) in camels (Camelus dromedarius) in Iran. Trop Anim Health Prod 48, 1215-1219.
Zhang, S., Goldman, E., Zubay, G., 1994. Clustering of low usage codons and ribosome movement. J Theor Biol 170, 339-354.
Zhou, J.H., Gao, Z.L., Zhang, J., Chen, H.T., Pejsak, Z., Ma, L.N., Ding, Y.Z., Liu, Y.S., 2012. Comparative [corrected] codon usage between the three main viruses in pestivirus genus and their natural susceptible livestock. Virus Genes 44, 475-481.
Zhou, J.H., Gao, Z.L., Zhang, J., Ding, Y.Z., Stipkovits, L., Szathmary, S., Pejsak, Z., Liu, Y.S., 2013a. The analysis of codon bias of foot-and-mouth disease virus and the adaptation of this virus to the hosts. Infect Genet Evol.
Zhou, J.H., Su, J.H., Chen, H.T., Zhang, J., Ma, L.N., Ding, Y.Z., Stipkovits, L., Szathmary, S., Pejsak, Z., Liu, Y.S., 2013b. Clustering of low usage codons in the translation initiation region of hepatitis C virus. Infect Genet Evol 18, 8-12.
Zhou, J.H., You, Y.N., Chen, H.T., Zhang, J., Ma, L.N., Ding, Y.Z., Pejsak, Z., Liu, Y.S., 2013c. The effects of the synonymous codon usage and tRNA abundance on protein folding of the 3C protease of foot-and-mouth disease virus. Infect Genet Evol 16, 270-274.
Zhou, J.H., Zhang, J., Sun, D.J., Ma, Q., Chen, H.T., Ma, L.N., Ding, Y.Z., Liu, Y.S., 2013d. The distribution of synonymous codon choice in the translation initiation region of dengue virus. PLoS One 8, e77239.
Zhou, J.H., Zhang, J., Sun, D.J., Ma, Q., Ma, B., Pejsak, Z., Chen, H.T., Ma, L.N., Ding, Y.Z., Liu, Y.S., 2013e. Potential roles of synonymous codon usage and tRNA concentration in hosts on the two initiation regions of foot-and-mouth disease virus RNA. Virus Res 176, 298-302.
Zhou, T., Sun, X., Lu, Z., 2006. Synonymous codon usage in environmental chlamydia UWE25 reflects an evolutional divergence from pathogenic chlamydiae. Gene 368, 117-125.
Zhou, X.Y., Wang, Y., Zhu, J., Miao, Q.H., Zhu, L.Q., Zhan, S.H., Wang, G.J., Liu, G.Q., 2017. First report of peste des petits ruminants virus lineage II in Hydropotes inermis, China. Transbound Emerg Dis.
Zouridis, H., Hatzimanikatis, V., 2008. Effects of codon distributions and tRNA competition on protein translation. Biophys J 95, 1018-1033.
Fig. 1 The nucleotide usages of viral genes in both PPRV and RPV. (A) The nucleotide usages of F gene in both PPRV and RPV; (B) The nucleotide usages of H gene in both PPRV and RPV; (C) The nucleotide usages of L gene in both PPRV and RPV; (D) The nucleotide usages of M gene in both PPRV and RPV; (E) The nucleotide usages of N gene in both PPRV and RPV; (F) The nucleotide usages of P gene in both PPRV and RPV.
Fig. 2 The relationships between GC content, GC12 content and GC3 content of viral genes. (A) The relationship between the GC12 content and the total GC content for genes of PPRV; (B) The relationship between the GC3 content and the total GC content for genes of PPRV; (C) The relationship between the GC12 content and the total GC content for genes of RPV; (D) The relationship between the GC3 content and the total GC content for genes of RPV; (E) The relationship between the GC12 content and the GC3 content for genes of PPRV; (F) The relationship between the GC12 content and the GC3 content for genes of RPV.
Fig. 3 The plot of ENC vs. GC3 content of viral genes. (A) PPRV; (B) RPV.
Fig. 4 The plot of the first two dominant factors derived from PCA.
Fig. 5 The similarity degree of the overall codon usage between virus and its natural host.
Fig. S1 Heat maps of codon usage bias of viral genes between PPRV and RPV. Heat map of CUB of 59 synonymous codons from PPRV and RPV using Euclidean distance and complete linkage clustering module. (A) F gene; (B) H gene; (C) L gene; (D) M gene; (E) N gene; (F) P gene.
Table 1 Dinucleotide relative abundance for viral genes of PPRV
L gene
M gene
N gene
P gene
0.70 1.04 0.87 1.15 0.88 0.75 1.09 1.12 1.23 1.08 0.88 0.51 1.20 0.94 0.95 0.84
0.68 1.10 0.91 1.13 0.73 0.74 1.24 1.13 1.20 1.08 0.93 0.54 1.30 0.88 0.65 0.86
0.62 1.42 0.85 1.02 1.05 1.45 1.47 1.52 1.08 1.43 0.91 0.54 1.74 1.07 0.81 0.87
0.82 0.98 0.88 1.13 0.80 0.76 1.06 1.19 1.06 1.08 0.99 0.50 1.35 0.97 0.73 0.84
0.73 0.99 0.81 1.20 0.58 0.94 1.37 0.97 1.23 1.16 0.80 0.61 1.51 0.75 0.85 0.91
0.74 1.15 0.69 1.21 0.61 0.62 1.57 1.11 1.16 1.21 0.85 0.51 1.79 0.81 0.79 0.85
H gene
AA AT AC AG TA TT TC TG CA CT CC CG GA GT GC GG
F gene
Table 2 Dinucleotide relative abundance for viral genes of RPV
L gene
M gene
N gene
P gene
0.76 1.01 0.91 1.07 0.96 0.69 1.09 1.14 1.20 1.18 0.87 0.48 1.00 0.98 0.89 0.68
0.69 1.02 1.01 1.19 0.88 0.75 1.09 1.06 1.22 1.25 0.83 0.40 1.26 0.80 0.81 0.91
0.65 1.19 0.76 1.23 0.94 0.94 1.09 1.29 1.03 1.08 0.64 0.42 1.55 0.95 0.76 0.92
0.78 1.05 0.89 1.05 0.64 0.91 1.16 1.12 1.23 0.87 0.95 0.63 1.44 0.87 0.71 0.83
0.68 1.03 0.91 1.16 0.64 0.85 1.21 1.18 1.28 1.28 0.77 0.36 1.52 0.66 0.85 0.97
0.77 1.22 0.65 1.13 0.42 0.71 1.57 1.23 1.22 1.17 0.79 0.60 1.72 0.72 0.89 0.77
H gene
AA AT AC AG TT TA TC TG CC CA CT CG GG GA GT GC
F gene
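Tables 1 and 2 report the relative abundance of the 16 dinucleotides. A minimal sketch of how such values can be obtained is given below, assuming the commonly used odds-ratio definition rho_xy = f_xy / (f_x * f_y) (Karlin and Burge, 1995), where f_xy is the observed dinucleotide frequency and f_x, f_y are the mononucleotide frequencies. The function name and the toy sequence are hypothetical; the exact procedure behind the tables is not described in this section.

```python
from collections import Counter
from itertools import product


def dinucleotide_relative_abundance(seq: str) -> dict:
    """Return rho_xy = f_xy / (f_x * f_y) for all 16 dinucleotides.

    Assumes an unambiguous nucleotide sequence; U is treated as T.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    mono = Counter(seq)
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for x, y in product("ATCG", repeat=2):
        fx, fy = mono[x] / n, mono[y] / n
        fxy = di[x + y] / (n - 1)
        rho[x + y] = fxy / (fx * fy) if fx > 0 and fy > 0 else float("nan")
    return rho


if __name__ == "__main__":
    toy = "ACGT" * 10  # toy sequence, not a viral gene
    values = dinucleotide_relative_abundance(toy)
    print({k: round(v, 2) for k, v in values.items()})
```

Values well below 1 (as seen for CG in both tables) indicate under-representation relative to what the base composition alone would predict.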
Table 3 The RSCU values for viral genes of PPRV
M
N
1.12 0.88 0.46 1.27 0.98 1.58 0.58 1.12 0.99 1.42 0.59 1.24 1.62 0.54 0.60 0.68 1.18 1.73 0.93 0.84 0.64 0.95 1.35 1.10 0.60 1.22 1.39 0.93 0.46 0.77 1.08 1.55 0.60 1.53 0.47 0.20 1.80 0.88 1.12 0.34 1.66 0.89
0.96 1.04 0.92 0.67 0.49 0.59 1.55 1.78 0.79 1.20 1.01 0.77 1.55 0.72 0.97 0.92 0.32 2.36 0.00 1.16 1.24 0.71 2.05 0.85 0.39 1.02 1.33 1.38 0.26 0.86 1.54 1.60 0.00 0.52 1.48 1.08 0.92 0.69 1.31 0.83 1.17 1.09
L 0.92 1.08 0.74 1.22 1.07 0.82 0.97 1.18 0.73 1.25 1.02 0.82 1.24 0.97 0.97 1.02 1.17 1.20 0.51 0.99 1.11 1.33 0.84 1.10 0.73 0.85 1.37 1.51 0.26 1.15 1.33 1.25 0.27 1.02 0.98 1.09 0.91 0.98 1.02 1.13 0.87 1.12
H 0.93 1.07 0.68 0.99 1.03 0.87 0.77 1.66 0.98 1.26 0.76 1.04 1.03 0.85 1.08 0.97 1.16 2.29 0.02 1.16 0.41 1.60 0.80 0.84 0.76 0.96 1.37 1.38 0.29 1.22 1.44 1.00 0.33 0.89 1.11 1.28 0.72 0.74 1.26 0.97 1.03 0.82
UUU(F) UUC(F) UUA(L) UUG(L) CUU(L) CUC(L) CUA(L) CUG(L) AUU(I) AUC(I) AUA(I) GUU(V) GUC(V) GUA(V) GUG(V) UCU(S) UCC(S) UCA(S) UCG(S) AGU(S) AGC(S) CCU(P) CCC(P) CCA(P) CCG(P) ACU(T) ACC(T) ACA(T) ACG(T) GCU(A) GCC(A) GCA(A) GCG(A) UAU(Y) UAC(Y) CAU(H) CAC(H) CAA(Q) CAG(Q) AAU(N) AAC(N) AAA(K)
F 1.04 0.96 0.82 0.66 1.20 0.67 1.01 1.64 0.64 1.06 1.30 1.06 0.92 1.23 0.79 0.52 0.93 1.73 0.69 0.96 1.17 1.36 0.47 1.79 0.39 1.02 1.06 1.63 0.29 0.80 1.14 1.69 0.37 0.82 1.18 0.73 1.27 0.68 1.32 1.36 0.64 0.75
P 1.09 0.91 0.69 0.59 1.10 1.86 1.20 0.56 0.72 1.38 0.90 0.97 1.92 0.15 0.97 1.53 1.37 1.21 0.29 0.58 1.03 0.97 0.80 1.43 0.80 0.88 1.27 1.39 0.47 1.05 0.89 1.61 0.44 1.35 0.65 1.08 0.92 0.98 1.02 0.84 1.16 0.88
0.91 1.43 0.57 0.84 1.16 0.27 1.73 0.24 0.70 0.83 0.24 2.65 1.34 0.97 0.87 1.38 0.78
1.11 0.76 1.24 0.87 1.13 0.20 1.80 0.62 0.61 0.50 0.45 1.87 1.95 0.54 0.59 1.41 1.46
0.88 1.14 0.86 0.84 1.16 1.03 0.97 0.33 0.34 0.68 0.70 2.12 1.83 1.11 0.77 0.76 1.36
1.18 1.43 0.57 0.77 1.23 1.16 0.84 0.38 0.40 0.34 0.66 2.40 1.82 0.62 1.15 1.05 1.18
1.25 1.15 0.85 0.62 1.38 0.83 1.17 0.54 0.35 0.57 0.51 2.84 1.20 0.28 0.84 0.92 1.95
AAG(K) GAU(D) GAC(D) GAA(E) GAG(E) UGU(C) UGC(C) CGU(R) CGC(R) CGA(R) CGG(R) AGA(R) AGG(R) GGU(G) GGC(G) GGA(G) GGG(G)
1.12 1.17 0.83 0.75 1.25 1.45 0.55 1.14 0.29 0.39 0.33 2.93 0.91 0.55 0.96 1.57 0.93
Table 4 The RSCU values for viral genes of RPV
Codon     F     H     L     M     N     P
UUU(F)    0.75  0.67  0.94  0.71  1.01  0.52
UUC(F)    1.25  1.33  1.06  1.29  0.99  1.48
UUA(L)    0.50  0.73  0.90  0.64  0.80  0.25
UUG(L)    0.87  0.98  0.94  1.22  1.01  0.54
CUU(L)    0.99  0.63  0.90  0.46  1.14  1.51
CUC(L)    1.04  1.06  1.16  1.17  1.04  1.20
CUA(L)    1.19  1.05  1.03  1.15  0.47  0.31
CUG(L)    1.40  1.55  1.08  1.35  1.53  2.19
AUU(I)    1.01  0.96  0.67  1.29  0.87  0.80
AUC(I)    1.00  1.11  1.18  1.08  1.73  1.64
AUA(I)    0.99  0.93  1.15  0.63  0.41  0.55
GUU(V)    0.65  0.76  0.55  0.63  0.88  1.05
GUC(V)    1.14  1.50  1.23  1.50  1.09  1.40
GUA(V)    0.73  0.87  1.03  0.69  0.94  0.47
GUG(V)    1.48  0.87  1.19  1.19  1.09  1.08
UCU(S)    0.69  0.78  0.85  0.51  1.18  1.62
UCC(S)    0.82  0.78  0.75  1.70  0.66  0.76
UCA(S)    1.22  2.18  1.79  0.71  1.48  1.21
UCG(S)    0.89  0.39  0.35  0.48  0.17  0.66
AGU(S)    1.70  0.93  1.21  1.73  0.96  0.73
AGC(S)    0.69  0.95  1.05  0.87  1.55  1.01
CCU(P)    0.94  1.37  1.21  0.97  1.23  0.67
CCC(P)    0.80  0.78  1.09  0.58  1.13  1.20
CCA(P)    1.40  1.33  1.09  1.67  1.17  1.11
CCG(P)    0.85  0.52  0.61  0.78  0.46  1.02
ACU(T)    1.31  1.55  1.09  0.63  1.23  1.03
ACC(T)    1.31  0.84  0.89  1.00  1.55  1.46
ACA(T)    1.22  1.29  1.74  2.09  1.05  1.18
ACG(T)    0.15  0.31  0.28  0.28  0.17  0.34
GCU(A)    1.43  1.81  1.40  0.15  1.31  1.21
GCC(A)    0.92  1.00  1.06  0.65  0.82  0.86
GCA(A)    1.41  1.00  1.32  2.24  1.62  1.50
GCG(A)    0.25  0.20  0.22  0.96  0.25  0.43
UAU(Y)    0.69  0.65  0.98  0.79  0.77  0.67
UAC(Y)    1.31  1.35  1.02  1.21  1.23  1.33
CAU(H)    0.73  1.51  1.17  0.74  1.20  1.54
CAC(H)    1.27  0.49  0.83  1.26  0.80  0.46
CAA(Q)    0.97  0.65  0.78  1.18  1.01  0.93
CAG(Q)    1.03  1.35  1.22  0.82  0.99  1.07
AAU(N)    1.04  0.79  1.01  0.98  0.89  0.89
AAC(N)    0.96  1.21  0.99  1.02  1.11  1.11
AAA(K)    0.72  1.04  0.81  0.82  1.15  0.80
AAG(K)    1.28  0.96  1.19  1.18  0.85  1.20
GAU(D)    1.04  0.97  1.19  1.23  0.85  1.26
GAC(D)    0.96  1.03  0.81  0.77  1.15  0.74
GAA(E)    1.16  0.50  0.78  0.81  1.00  0.79
GAG(E)    0.84  1.50  1.22  1.19  1.00  1.21
UGU(C)    1.16  0.75  0.76  1.12  2.00  0.92
UGC(C)    0.84  1.25  1.24  0.88  0.00  1.08
CGU(R)    0.84  0.28  0.17  0.41  0.33  0.30
CGC(R)    0.46  0.42  0.37  0.62  0.50  0.98
CGA(R)    0.46  1.08  0.47  0.83  0.08  0.79
CGG(R)    0.46  0.16  0.70  0.65  0.74  0.98
AGA(R)    1.12  2.09  2.26  2.19  1.70  1.44
AGG(R)    2.67  1.97  2.03  1.30  2.65  1.51
GGU(G)    1.17  0.66  1.15  0.59  0.70  0.73
GGC(G)    0.74  0.61  0.56  0.78  0.90  0.97
GGA(G)    0.47  1.18  0.91  1.01  1.27  1.44
GGG(G)    1.63  1.55  1.38  1.62  1.13  0.87
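For readers who want to reproduce values of the kind listed in Tables 3 and 4, the sketch below assumes the standard RSCU definition (Sharp et al., 1986): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name, the codon-family dictionary and the toy sequence are hypothetical and are not taken from this study.

```python
from collections import Counter

# The 59 synonymous codons grouped by amino acid (Met, Trp and stops excluded).
CODON_FAMILIES = {
    "F": ["UUU", "UUC"], "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "I": ["AUU", "AUC", "AUA"], "V": ["GUU", "GUC", "GUA", "GUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "P": ["CCU", "CCC", "CCA", "CCG"], "T": ["ACU", "ACC", "ACA", "ACG"],
    "A": ["GCU", "GCC", "GCA", "GCG"], "Y": ["UAU", "UAC"],
    "H": ["CAU", "CAC"], "Q": ["CAA", "CAG"], "N": ["AAU", "AAC"],
    "K": ["AAA", "AAG"], "D": ["GAU", "GAC"], "E": ["GAA", "GAG"],
    "C": ["UGU", "UGC"], "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
}


def rscu(cds: str) -> dict:
    """Return RSCU for each of the 59 codons of an in-frame coding sequence."""
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for family in CODON_FAMILIES.values():
        total = sum(counts[c] for c in family)
        for codon in family:
            # Observed count over the count expected under equal usage.
            values[codon] = counts[codon] * len(family) / total if total else 0.0
    return values


if __name__ == "__main__":
    toy = "UUUUUUUUC"  # toy sequence: UUU, UUU, UUC
    result = rscu(toy)
    print(round(result["UUU"], 2), round(result["UUC"], 2))
```

By construction, the RSCU values of the codons for one amino acid sum to the number of synonymous codons for that amino acid, which is a convenient consistency check on a tabulated column.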
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Highlights
>A systematic analysis of nucleotide, codon and amino acid usages between PPRV and RPV
>Synonymous codons with under-/over-representation are found between PPRV and RPV
>Synonymous codons containing the CpG nucleotide pair tend to be avoided by the two viruses
>Synonymous codon usage patterns can represent genetic diversity between PPRV and RPV
>Mutation pressure & natural selection drive the evolutionary pathways of PPRV and RPV
Abbreviations: PPRV, Peste des petits ruminants virus; RSCU, Relative synonymous codon usage