Journal of Integrative Agriculture 2018, 17(9): 2074–2081 Available online at www.sciencedirect.com
RESEARCH ARTICLE
Synonymous codon usage pattern in model legume Medicago truncatula
SONG Hui1, 2, LIU Jing1, CHEN Tao1, NAN Zhi-biao1
1 State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730000, P.R.China
2 Grassland Agri-husbandry Research Center, Qingdao Agricultural University, Qingdao 266109, P.R.China
Abstract
Synonymous codon usage pattern presumably reflects gene expression optimization as a result of molecular evolution. Although much attention has been paid to various model organisms ranging from prokaryotes to eukaryotes, codon usage has not yet been extensively investigated in the model legume Medicago truncatula. In the present study, 39 531 available coding sequences (CDSs) from M. truncatula were examined for codon usage bias (CUB). Based on analyses including neutrality plots, effective number of codons plots, and correlations between the frequency of optimal codons and the codon adaptation index, we conclude that natural selection is a major driving force in M. truncatula CUB. We identified 30 optimal codons encoding 18 amino acids based on relative synonymous codon usage. These optimal codons characteristically end with A or T, except for AGG and TTG, which encode arginine and leucine, respectively. Optimal codon usage is positively correlated with the GC content at the three nucleotide positions of codons and with the GC content of CDSs. The abundance of expressed sequence tags, a proxy for gene expression intensity in the legume, is unrelated to either CDS length or GC content. Collectively, we unravel the synonymous codon usage pattern in M. truncatula, which may provide valuable information for genetic engineering of the model legume and forage crop.
Keywords: codon usage, gene expression, Medicago truncatula, natural selection, optimal codon
Received 5 September, 2017 Accepted 11 April, 2018
Correspondence SONG Hui, E-mail: [email protected]; NAN Zhi-biao, E-mail: [email protected]
© 2018 CAAS. Publishing services by Elsevier B.V. All rights reserved. doi: 10.1016/S2095-3119(18)61961-6
1. Introduction
The genetic code comprises 61 sense codons that encode 20 amino acids during translation. Fifty-nine of the 61 codons are synonymous, i.e., several different codons encode the same amino acid. Bias in synonymous codon usage may arise as a result of evolutionary forces such as natural selection and mutation pressure (Hershberg and Petrov 2008). Studies have shown that selection favours specific codons that promote efficient and accurate translation of genes expressed at high levels (Duret 2000; Hershberg and Petrov 2008). Thus, an effective method to identify codons favoured by selection (i.e., optimal codons) is to compare codon usage for each amino acid between highly and lowly expressed genes (Ingvarsson 2008; Qiu et al. 2011a; Whittle and Extavour 2015). Under mutation pressure, in contrast, preference is given to codons that reflect the equilibrium bias toward particular nucleotides (i.e., GC vs. AT) at
codon positions, especially at the third nucleotide position (Sueoka 1988; Hershberg and Petrov 2008). In addition, mutation pressure can act upon genes encoding common and uncommon amino acids (Hershberg and Petrov 2008).
Furthermore, many other factors are associated with codon usage bias (CUB), including base composition, relative transfer ribonucleic acid (tRNA) abundance, gene length, intron number and length, gene expression level, translation initiation efficiency, protein structure, codon and anticodon binding energy, and alternative splicing (Rao et al. 2011; Novoa and de Pouplana 2012; Williford and Demuth 2012; Chaney and Clark 2015; Liu et al. 2015; Song et al. 2017b).
CUB in unicellular eukaryotes is relatively simple because these organisms have fewer introns and fewer alternative transcripts (Akashi 2001; Plotkin 2011). In contrast, CUB in higher plants is complex (Ikemura 1985; Ingvarsson 2007; Camiolo et al. 2012; Liu et al. 2015). Studies have demonstrated that mutation pressure, selection, or both are associated with CUB in various multicellular organisms (Hershberg and Petrov 2008; Plotkin 2011; Li et al. 2016). Although researchers have so far examined CUB in plants including Arabidopsis thaliana, Arachis spp., Oryza sativa, Picea spp., Populus tremula, Silene latifolia, Triticum aestivum and Zea mays (Morton and Wright 2007; Whittle et al. 2007; Zhang et al. 2007; Ingvarsson 2010; Qiu et al. 2011a; Camiolo et al. 2015; Liu et al. 2015; De La Torre et al. 2015; Song et al. 2017a, c), CUB in Medicago truncatula, a model legume species, remains poorly understood.
In this study, we analysed CUB in M. truncatula using 39 531 available coding sequences (CDSs). We found that natural selection is a major driving force behind CUB in M. truncatula. Furthermore, we identified 30 optimal codons for 18 amino acids, most of which characteristically end with A or T. In addition, we found that optimal codons are frequently present in CDSs with higher GC content. These findings not only unravel molecular evolution in terms of synonymous codon usage in the model legume, but also provide valuable information for genetic improvement of the forage crop.
2. Materials and methods
2.1. Sequence data
The CDSs of M. truncatula were downloaded from the M. truncatula genome website (http://jcvi.org/medicago/display.php?pageName=General&section=Download) (Young et al. 2011). To avoid biases caused by short sequence fragments, we filtered the CDSs using the following criteria: (1) CDSs start with ATG and end with TAA, TAG or TGA; (2) CDS lengths exceed 300 bp; and (3) CDSs lack premature stop codons or ambiguous codons. Data on M. truncatula tRNA abundance were obtained from GtRNAdb (version 3.0, http://gtrnadb.ucsc.edu).
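The three filtering criteria can be sketched in a few lines of Python. This is only an illustration, not the script used in the study: the function name `passes_filters` is hypothetical, and the extra requirement that the length be a multiple of three is an assumption added so that the sequence splits cleanly into codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
VALID_BASES = set("ACGT")


def passes_filters(cds: str, min_length: int = 300) -> bool:
    """Return True if a CDS satisfies criteria (1)-(3) of Section 2.1."""
    seq = cds.upper()
    # Criterion (2): length must exceed 300 bp.
    # Assumption (not stated in the text): length is a multiple of three.
    if len(seq) <= min_length or len(seq) % 3 != 0:
        return False
    # Criterion (3), first half: no ambiguous bases such as N.
    if not set(seq) <= VALID_BASES:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Criterion (1): ATG start codon and a TAA/TAG/TGA stop codon.
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Criterion (3), second half: no in-frame stop codon before the last codon.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])
```

Applying such a predicate to every downloaded sequence and keeping the ones that return True corresponds to the reduction from all annotated CDSs to the filtered set analysed here.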
2.2. Codon bias indices
We calculated the content of each of the four bases at the third synonymous codon positions (i.e., A3s, C3s, G3s, and T3s), the GC content at the third synonymous codon positions (GC3s), the codon adaptation index (CAI), the effective number of codons (ENC), the frequency of optimal codons (Fop), and relative synonymous codon usage (RSCU) using the CodonW program (version 1.4, http://codonw.sourceforge.net). CAI and Fop are directional measures of CUB, which quantify the degree of selection acting upon a gene (Ikemura 1985; Sharp and Li 1987). CAI values range between 0 and 1; values close to 1 suggest that a given gene has experienced increasingly intense selection to maintain codons optimized for efficient translation (Sharp and Li 1987). ENC is a non-directional measure that depends on the nucleotide composition of genes. ENC values range from 20 to 61: a value of 20 indicates that only one codon is used for each amino acid, whereas 61 indicates that all codons are used equally (Wright 1990). RSCU values greater than 1 indicate that a particular codon is used more frequently than expected, while RSCU values less than 1 indicate that a codon is used less frequently than expected (Sharp and Li 1987). CDS and genomic DNA sequence lengths and the GC content at the first (GC1), second (GC2), and third (GC3) nucleotide positions of codons were calculated using an in-house Perl script.
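As an illustration of how RSCU is obtained, the sketch below divides the observed count of each codon by the count expected if all synonymous codons of that amino acid were used equally. It is a minimal re-implementation for clarity, not the CodonW code; the function names `count_codons` and `rscu_from_counts` are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codons(sequences):
    """Count in-frame codons over an iterable of CDS strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1
    return counts


def rscu_from_counts(counts):
    """RSCU = observed count / (family total / number of synonymous codons)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the data set
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values
```

For example, the Phe counts listed for highly expressed genes in Table 1 (TTT 18 892, TTC 8 414) give `rscu_from_counts({"TTT": 18892, "TTC": 8414})` values that round to 1.38 and 0.62, in agreement with the table.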
2.3. Data analyses
Many methods for identifying optimal codons have been reported (Hershberg and Petrov 2009; Wang et al. 2011), among which the RSCU-based approach is particularly popular (Whittle et al. 2011; Whittle and Extavour 2015). Optimal codons were defined as described by Whittle and Extavour (2015). Briefly, the CDSs were sorted by ENC value, and the top and bottom 5% of sequences were defined as the genes with high and low expression, respectively. ΔRSCU was calculated as $\Delta\mathrm{RSCU} = \mathrm{RSCU}_{\text{mean, highly expressed CDS}} - \mathrm{RSCU}_{\text{mean, lowly expressed CDS}}$. A codon with a statistically significant and positive ΔRSCU value was considered optimal; when more than one codon matched this criterion for an amino acid, the codon with the largest ΔRSCU was defined as the primary optimal codon (Ingvarsson 2008; Whittle et al. 2011).
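The ΔRSCU procedure can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, genes with the lowest ENC (strongest bias) are assumed to represent the highly expressed class, and the statistical significance test is omitted, so every codon with a positive ΔRSCU is reported. The example values are the mean RSCU values for Phe and Leu codons from Table 1.

```python
def split_by_enc(enc_by_gene, fraction=0.05):
    """Return (lowest-ENC genes, highest-ENC genes), each `fraction` of the total.

    Assumption: low ENC (strong bias) is taken as the high-expression class.
    """
    ranked = sorted(enc_by_gene, key=enc_by_gene.get)  # ascending ENC
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


def delta_rscu(rscu_high, rscu_low):
    """Delta RSCU = mean RSCU (high expression) - mean RSCU (low expression)."""
    return {codon: rscu_high[codon] - rscu_low[codon]
            for codon in rscu_high if codon in rscu_low}


def primary_optimal_codons(delta, codon_to_aa):
    """For each amino acid, pick the codon with the largest positive delta RSCU."""
    best = {}
    for codon, value in delta.items():
        aa = codon_to_aa[codon]
        if value > 0 and (aa not in best or value > delta[best[aa]]):
            best[aa] = codon
    return best


# Mean RSCU values for Phe and Leu codons, copied from Table 1.
RSCU_HIGH = {"TTT": 1.38, "TTC": 0.62, "TTA": 1.34, "TTG": 1.65,
             "CTT": 1.67, "CTC": 0.43, "CTA": 0.72, "CTG": 0.19}
RSCU_LOW = {"TTT": 0.93, "TTC": 1.07, "TTA": 0.61, "TTG": 1.25,
            "CTT": 1.32, "CTC": 1.59, "CTA": 0.50, "CTG": 0.73}
CODON_TO_AA = {"TTT": "Phe", "TTC": "Phe", "TTA": "Leu", "TTG": "Leu",
               "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu"}
```

With these inputs, the codons with positive ΔRSCU are TTT, TTA, TTG, CTT and CTA, which are the codons marked as optimal for Phe and Leu in Table 1.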
A total of 256 975 M. truncatula expressed sequence tag (EST) sequences were downloaded from the National Center for Biotechnology Information (NCBI) database on January 19, 2016. EST abundance has been used to estimate gene expression intensity (Ohlrogge and Benning 2000). In this study, we searched all available EST sequences against each M. truncatula CDS using a local BLAST program (Altschul et al. 1997). The number of EST sequences that match a specific CDS defines the expression intensity of the corresponding gene (Ohlrogge and Benning 2000; Song and Nan 2014). The following thresholds were used to select matches for further analyses (Song et al. 2015): (1) length of aligned sequences >200 bp; (2) identity >96%; and (3) E-value ≤10^–10. Correlation analyses were carried out using JMP 9.0 (SAS Institute Inc., Cary, NC, USA), and results were depicted using Origin 9.0 (OriginLab, Northampton, MA, USA).
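A minimal sketch of how the three thresholds might be applied to BLAST hits is given below. It assumes, as an illustration only, that hits are stored in the 12-column tab-delimited format produced by BLAST+ with `-outfmt 6`, with ESTs as queries and CDSs as subjects; the text does not state the output format actually used, and the function name `est_counts` and the identifiers in the usage example are hypothetical.

```python
from collections import defaultdict


def est_counts(lines, min_length=200, min_identity=96.0, max_evalue=1e-10):
    """Count distinct ESTs matching each CDS under the three thresholds.

    Assumed columns: qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    matched = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        est_id, cds_id = fields[0], fields[1]
        identity = float(fields[2])
        aligned_length = int(fields[3])
        evalue = float(fields[10])
        if (aligned_length > min_length and identity > min_identity
                and evalue <= max_evalue):
            matched[cds_id].add(est_id)  # each EST is counted once per CDS
    return {cds: len(ests) for cds, ests in matched.items()}
```

Counting each EST only once per CDS is a design choice made here so that multiple local alignments of the same EST do not inflate the estimated expression intensity.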
3. Results
3.1. Base composition of M. truncatula
A total of 62 319 CDSs have been previously identified in the sequenced M. truncatula genome (Young et al. 2011). The total number of CDSs used in the present study was reduced to 39 531 using the filtering criteria described in Materials and methods. The GC content in these CDSs varied from 23.9 to 69.7% (SD=3.77, Appendix A). Moreover, the GC content differed among the three nucleotide positions of codons: GC1 was the highest (47.6%), followed by GC2 (38.5%) and GC3 (36.3%). The average GC content across the three positions was 40.9%, indicating that CDSs in M. truncatula are AT-rich (59.1%). RSCU is expressed as the observed frequency of a codon divided by its expected frequency. Thirty-one codons were used less frequently than expected (i.e., RSCU<1) and 26 codons were used more frequently (i.e., RSCU>1) in CDSs of M. truncatula, indicating that these 26 codons are used preferentially among all M. truncatula CDSs (Appendix B). Furthermore, the RSCU analysis demonstrated that CDSs in M. truncatula are biased towards codons ending with A or T, except for AGG (Arg) and TTG (Leu) (Appendix B).
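The position-specific GC contents reported above can be computed as in the following sketch, given here in Python rather than the in-house Perl script, which is not available. The function name `gc_by_position` is hypothetical, and whether the start and stop codons were included in the original calculation is not stated; this version simply uses every complete codon in the sequence it is given.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3) in percent for one coding sequence."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    percentages = []
    for position in range(3):
        bases = seq[position:usable:3]  # every third base, starting at `position`
        gc = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc / len(bases))
    return tuple(percentages)


def gc12(cds):
    """Average of GC1 and GC2, as used in neutrality plots."""
    gc1, gc2, _ = gc_by_position(cds)
    return (gc1 + gc2) / 2.0
```

For the toy sequence `ATGGCCTAA` (codons ATG, GCC, TAA) the three positions contain A/G/T, T/C/A and G/C/A, so GC1 and GC2 are each one third and GC3 is two thirds.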
3.2. Factors associated with codon usage in M. truncatula
In a neutrality plot, a significant correlation between the average of GC1 and GC2 (GC12) and GC3 with a slope close to 1 suggests that mutation pressure is the major force shaping the codon usage pattern (Sueoka 1988). If natural selection is the dominant factor, in contrast, the slope is close to 0 (Sueoka 1988). In this study, we observed a significant positive correlation between GC12 and GC3 (r=0.12, P<0.01) with a slope close to 0 (slope=0.08; Fig. 1), suggesting that natural selection has shaped the codon usage pattern in M. truncatula.
Genes fall along the expected curve of the ENC plot if codon usage is constrained only by neutral mutation pressure (Wright 1990; Zhang et al. 2007). Other pressures influence codon usage if genes fall below or above the ENC curve. In addition, Kawabe and Miyashita (2003) demonstrated that natural selection shapes codon usage if the distribution of GC3s across genes is narrow. Our analysis showed that in M. truncatula, most genes fell below the ENC curve and GC3s values were distributed within a narrow range (0.2–0.5, Fig. 2), suggesting that natural selection plays a substantial role in the codon usage pattern.
A comparison between ENC and CAI can be used to assess the relationship between nucleotide composition and natural selection (Sharp and Li 1987; Wright 1990). In this study, we found no correlation between ENC and CAI (r=0.06) in M. truncatula. There were significantly negative correlations between ENC and either A or T content at synonymous third codon positions (A3s, r=–0.26, P<0.01; and T3s, r=–0.37, P<0.01), and significantly positive correlations between ENC and either C or G content at synonymous third codon positions (C3s, r=0.40, P<0.01; and G3s, r=0.24, P<0.01). These patterns indicate that CUB in M. truncatula is characterized by high A3s and T3s (AT3s) values and correspondingly low G3s and C3s (GC3s) values. A reasonable explanation is that natural selection acts on the third codon position to increase the A and T content (AT3, 63.7%) rather than the G and C content (GC3, 36.3%).
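The two diagnostics used in this section can be sketched briefly. The first function estimates the neutrality-plot slope by ordinary least squares, regressing GC12 on GC3. The second gives the expected ENC for a given GC3s in the absence of selection, following the approximation of Wright (1990), ENC = 2 + s + 29/(s² + (1 − s)²), where s is GC3s expressed as a fraction. The function names are hypothetical, and the original analyses were run in JMP and CodonW rather than with this code.

```python
def neutrality_slope(gc3, gc12):
    """Least-squares slope of GC12 (y) regressed on GC3 (x)."""
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    return sxy / sxx


def expected_enc(gc3s):
    """Expected ENC under no selection (Wright 1990); gc3s is a fraction in [0, 1]."""
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)
```

A slope near 0 from `neutrality_slope` points to selection and a slope near 1 to mutation pressure. A gene whose observed ENC lies well below `expected_enc` at its GC3s is more biased than base composition alone would predict; the curve peaks at 60.5 for GC3s = 0.5.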
[Fig. 1 plot: GC12 (%) against GC3 (%); r=0.12, P<0.01]
Fig. 1 Correlation between GC12 (GC1 and GC2) and GC3. GC content at the first (GC1), second (GC2), and third (GC3) codon positions were calculated using an in-house Perl script. Correlation analyses were executed in JMP 9.0, and the figure was generated using Origin 9.0.
[Fig. 2 plot: ENC against GC3 (axis range 0–1.0)]
Fig. 2 Effective number of codons (ENC) plot. The ENC values shown in this plot were generated using CodonW. The figure was generated using Origin 9.0. The continuous curve indicates the relationship between ENC and GC3s values under neutral selection. Each dot indicates a gene. GC3, GC content at the third position of synonymous codons.
3.3. Identification of optimal codons
We identified 30 optimal codons that encode 18 amino acids in M. truncatula using the ΔRSCU method (Table 1). These optimal codons, except AGG (Arg) and TTG (Leu), preferentially end with A or T. This is consistent with the RSCU result. Furthermore, the RSCU and optimal codon analyses identified 26 optimal codons with high frequency and 4 optimal codons without high frequency, whereas 28 codons are neither highly frequent nor optimal.
To define the factors that determine optimal codons, we selected Fop as an evaluation index for a correlation analysis. There was no correlation between Fop and either CDS length or genomic DNA (exon and intron) length (Table 2). Significant positive correlations were observed between Fop and the GC content of both CDS and genomic DNA sequences (Table 2). These results indicate that the GC content at each of the three codon positions has a similar effect on optimal codon usage. Moreover, optimal codon usage is associated with higher GC content in CDSs and intronic sequences (Table 2). There was a significant positive correlation between Fop and CAI (r=0.76, P<0.01). High CAI values indicate high levels of gene expression. This finding is consistent with previous studies, which showed that optimal codons were used in highly expressed genes under the impact of natural selection (Ingvarsson 2007; Qiu et al. 2011a). In general, low ENC values indicate strong CUB. In this study, Fop and ENC exhibited a significant positive correlation (r=0.21, P<0.01), indicating that genes with higher optimal codon usage have lower CUB.
3.4. Correlation between gene length, GC content and gene expression
Various factors have been examined for their association with gene expression, including GC content, intron size, and protein sequence length (Rao et al. 2011; Williford and Demuth 2012; De La Torre et al. 2015). Based on EST data analysis, gene expression intensity in M. truncatula was not correlated with CDS length, genomic DNA length, or the GC content of either CDS or genomic DNA sequences (Table 2). The results indicate that CDS length and GC content do not influence gene expression in M. truncatula.
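The correlations summarized in Table 2 relate per-gene values of Fop or EST abundance to sequence features. The sketch below shows a simplified Fop (optimal codons divided by all codons counted, with Met, Trp and stop codons excluded) and a plain Pearson correlation coefficient. Both functions are illustrative assumptions with hypothetical names: CodonW's exact Fop rules may differ, and the text does not state which correlation coefficient was computed in JMP.

```python
import math

# Codons ignored by this simplified Fop: Met, Trp and the three stop codons.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def fop(cds, optimal_codons):
    """Fraction of counted codons in `cds` that belong to `optimal_codons`."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counted = [codon for codon in codons if codon not in EXCLUDED]
    if not counted:
        return 0.0
    return sum(codon in optimal_codons for codon in counted) / len(counted)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute `fop` for every filtered CDS with the 30 optimal codons of Table 1, then pass the resulting list together with, for example, per-gene GC3 values to `pearson_r`.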
4. Discussion
In terms of codon usage studies, plants have remained well behind prokaryotic models. One major reason is the comparatively limited number of completely sequenced plant genomes. Hordeum vulgare, Nicotiana tabacum, Pisum sativum, T. aestivum, and Z. mays were among the first plants investigated for codon usage, using their EST or partial genome sequences (Fennoy and Bailey-Serres 1993; Kawabe and Miyashita 2003). Following the completion of A. thaliana genome sequencing in 2000, more plant genome sequences have become available. So far, codon usage has been analysed for several sequenced model plants, including A. thaliana, Brachypodium distachyon, and O. sativa (Morton and Wright 2007; Qiu et al. 2011b; Liu et al. 2015). However, codon usage patterns in M. truncatula remained unexamined. In this study, we analysed codon usage patterns in M. truncatula utilizing 39 531 CDSs. Our results suggest that: (1) natural selection acts on the codon usage pattern in M. truncatula; (2) for 18 out of 20 amino acids, the optimal codons characteristically end with A or T; (3) optimal codons are more widely present in genes with higher GC content; and (4) gene expression intensity is not correlated with either gene length or GC content.
In Populus and Arabidopsis, tRNA abundance is positively correlated with optimal codon usage (Wright et al. 2004; Ingvarsson 2007). However, we found that nine optimal codons (TTT, TAT, CAT, AAT, AAA, GAT, AGT, TGT and GGT) with high RSCU are associated with a low abundance of the corresponding tRNA in M. truncatula (Table 1). Williford and Demuth (2012) explained this phenomenon through two hypotheses: (1) codon-anticodon recognition heavily depends on post-transcriptional modifications of tRNA sequences. It has been confirmed that nucleotide A is always modified into I (inosine), and nucleotide U at the first anticodon position experiences extensive changes that could expand or restrict the number of recognized codons (Agris et al. 2007); (2) codons that correspond to highly
Table 1 Optimal codons and transfer ribonucleic acid (tRNA) abundance in Medicago truncatula1)
Amino acid  Codon2)  tRNA (copy number)  High expression RSCU (number)  Low expression RSCU (number)
Phe         TTT*     AAA(ND)             1.38 (18 892)                  0.93 (10 326)
Phe         TTC      GAA(26)             0.62 (8 414)                   1.07 (11 874)
Leu         TTA*     TAA(7)              1.34 (11 882)                  0.61 (4 627)
Leu         TTG*     CAA(14)             1.65 (14 638)                  1.25 (9 456)
Leu         CTT*     AAG(9)              1.67 (14 771)                  1.32 (9 996)
Leu         CTC      GAG(ND)             0.43 (3 779)                   1.59 (12 083)
Leu         CTA*     TAG(10)             0.72 (6 359)                   0.50 (3 798)
Leu         CTG      CAG(4)              0.19 (1 689)                   0.73 (5 498)
Ile         ATT*     AAT(19)             1.58 (19 105)                  1.22 (10 003)
Ile         ATC      GAT(ND)             0.50 (6 044)                   1.29 (10 569)
Ile         ATA*     TAT(10)             0.92 (11 198)                  0.49 (4 065)
Met         ATG      CAT(40)             1.00 (15 239)                  1.00 (10 857)
Val         GTT*     AAC(17)             2.02 (17 918)                  1.54 (12 384)
Val         GTC      GAC(3)              0.33 (2 892)                   0.93 (7 500)
Val         GTA*     TAC(7)              0.74 (6 554)                   0.42 (3 390)
Val         GTG      CAC(7)              0.92 (8 138)                   1.11 (8 939)
Tyr         TAT*     ATA(1)              1.43 (12 499)                  0.99 (6 093)
Tyr         TAC      GTA(12)             0.57 (5 005)                   1.01 (6 231)
His         CAT*     ATG(ND)             1.56 (10 968)                  1.00 (6 231)
His         CAC      GTG(17)             0.44 (3 134)                   1.00 (6 190)
Gln         CAA*     TTG(17)             1.68 (17 357)                  1.12 (8 589)
Gln         CAG      CTG(5)              0.32 (3 251)                   0.88 (6 789)
Asn         AAT*     ATT(9)              1.39 (23 259)                  0.95 (9 876)
Asn         AAC      GTT(28)             0.61 (10 170)                  1.05 (10 973)
Lys         AAA*     TTT(18)             1.23 (24 736)                  0.98 (12 716)
Lys         AAG      CTT(16)             0.77 (15 646)                  1.02 (13 227)
Asp         GAT*     ATC(ND)             1.62 (22 419)                  1.21 (14 607)
Asp         GAC      GTC(28)             0.38 (5 297)                   0.79 (9 479)
Glu         GAA*     TTC(19)             1.38 (25 986)                  1.06 (13 691)
Glu         GAG      CTC(11)             0.62 (11 745)                  0.94 (12 166)
Ser         AGT*     ACT(ND)             1.30 (11 363)                  0.70 (4 833)
Ser         AGC      GCT(15)             0.51 (4 436)                   0.68 (4 666)
Ser         TCT*     AGA(12)             1.62 (14 165)                  1.33 (9 156)
Ser         TCC      GGA(2)              0.39 (3 434)                   1.30 (8 987)
Ser         TCA*     TGA(8)              2.05 (17 883)                  1.10 (7 582)
Ser         TCG      CGA(2)              0.12 (1 069)                   0.89 (6 131)
Pro         CCT*     AGG(7)              1.69 (10 906)                  1.18 (8 013)
Pro         CCC      GGG(ND)             0.20 (1 320)                   0.75 (5 050)
Pro         CCA*     TGG(11)             2.01 (12 949)                  1.03 (6 934)
Pro         CCG      CGG(1)              0.09 (587)                     1.04 (7 057)
Thr         ACT*     AGT(10)             1.47 (10 530)                  1.06 (7 358)
Thr         ACC      GGT(4)              0.48 (3 457)                   1.30 (9 016)
Thr         ACA*     TGT(10)             1.96 (14 026)                  0.86 (5 939)
Thr         ACG      CGT(1)              0.09 (625)                     0.78 (5 444)
Ala         GCT*     AGC(25)             1.85 (13 436)                  1.30 (10 876)
Ala         GCC      GGC(ND)             0.30 (2 206)                   0.95 (7 983)
Ala         GCA*     TGC(16)             1.76 (12 788)                  0.82 (6 835)
Ala         GCG      CGC(3)              0.08 (557)                     0.93 (7 797)
Cys         TGT*     ACA(ND)             1.48 (8 400)                   1.06 (4 690)
Cys         TGC      GCA(10)             0.52 (2 966)                   0.94 (4 186)
Trp         TGG      CCA(14)             1.00 (8 013)                   1.00 (6 161)
Arg         CGT      ACG(12)             0.35 (1 328)                   1.24 (5 892)
Arg         CGC      GCG(ND)             0.06 (245)                     1.23 (5 862)
Arg         CGA      TCG(5)              0.19 (744)                     0.83 (3 924)
Arg         CGG      CCG(3)              0.05 (184)                     0.83 (3 942)
Arg         AGA*     TCT(8)              3.77 (14 409)                  0.98 (4 657)
Arg         AGG*     CCT(6)              1.57 (5 998)                   0.89 (4 204)
Gly         GGT*     ACC(1)              1.58 (14 585)                  1.33 (10 158)
Gly         GGC      GCC(23)             0.29 (2 674)                   0.98 (7 432)
Gly         GGA*     TCC(15)             1.68 (15 498)                  0.99 (7 537)
Gly         GGG      CCC(3)              0.45 (4 171)                   0.70 (5 323)
1) RSCU, relative synonymous codon usage. ND indicates not detected.
2) * indicates optimal codon.
Table 2 Correlation analysis between coding sequence architecture features and gene expression based on expressed sequence tag (EST) abundance in Medicago truncatula1)
Feature              Fop       EST
CDS length           –0.07     –0.04
DNA length           –0.004    –0.02
GC1 content          0.47**    0.08
GC2 content          0.28**    0.04
GC3 content          0.27**    0.06
GC content in CDS    0.23**    0.10
GC content in DNA    0.40**    0.03
1) CDS, coding sequence; GC1–3, GC content at the first, second, and third codon positions, respectively; Fop, frequency of optimal codons. ** indicates significance at P<0.01.
abundant tRNAs are not necessarily translated most accurately, as shown by an analysis of 73 bacterial genomes from 20 different genera (Shah and Gilchrist 2010). Besides these hypotheses, other factors may also explain these results. First, optimal codons with low tRNA abundance may encode conserved domains. Purifying selection acts on codons with low-abundance tRNAs, many of which encode conserved domains that play a crucial role in physiological development (Zhou et al. 2013; Chaney and Clark 2015). As such, proteins encoded by genes with low-abundance tRNAs have experienced purifying selection, and these proteins may play a vital role in M. truncatula. Second, codons with low-abundance corresponding tRNAs may nevertheless be used more frequently. In vivo analyses in Saccharomyces cerevisiae indicated that codons preferentially used in highly expressed genes are not translated faster than non-preferred codons (Novoa and de Pouplana 2012; Qian et al. 2012).
Recent studies have focused on identifying factors that act on CUB, but some of the resulting conclusions are inconsistent. In this study, we performed correlation analyses between Fop and a number of variables, including sequence length, GC content, CAI, and ENC. We found that optimal codon use (i.e., Fop) is not correlated with CDS length, but is positively correlated with GC content and CAI. However, Wang and Hickey (2007) found that CUB is negatively correlated with gene length, and that short genes have higher GC content than long genes in rice. Ingvarsson (2007) showed that Fop values are negatively correlated with protein length, but strongly and positively correlated with the GC3 content in P. tremula. Note that CAI, which indicates CUB in genes with high expression levels, is the major factor positively associated with optimal codon usage. A strongly positive correlation has been found between CUB and gene expression in many species, including Cardamine spp., P. tremula, S. latifolia, and Tribolium castaneum (Ingvarsson 2007; Qiu et al. 2011a; Ometto et al. 2012; Williford and Demuth 2012). When average gene expression intensities within a given tissue type are examined, Fop is not correlated with gene expression; however, when maximal gene expression across tissues is examined, Fop is weakly correlated with gene expression (De La Torre et al. 2015).
In M. truncatula, gene expression is not correlated with CDS length or GC content. Similar results were observed previously in rice. Liu et al. (2004) confirmed that natural selection is one major driving force behind gene expression level, whereas CDS length only plays a minor role in rice. However, Qiu et al. (2011a) found that gene expression is positively correlated with the GC3 content, but strongly and negatively correlated with the intron GC content. GC3 is not positively correlated with gene expression in A. thaliana and A. lyrata, but there is a weak positive correlation between gene expression and intron GC content (Wright et al. 2004).
Studies of the correlation between CDS or genomic DNA length and gene expression have led to controversial results. Long gene sequences are associated with higher gene expression in species such as T. castaneum and Picea spp. (Williford and Demuth 2012; De La Torre et al. 2015). By contrast, Camiolo et al. (2015) reported that short and higher-GC DNA sequences are always positively correlated with gene expression and optimal usage bias in four monocots, 15 dicots and two mosses. Some studies have proposed that short protein-coding sequences with high expression levels are less costly in terms of metabolism (Williford and Demuth 2012; Whittle and Extavour 2015). However, Yang (2009) argued that short sequences with high expression levels hardly support the energy-cost hypothesis, but may be better reconciled with the time-cost hypothesis, in which rapidly expressed genes, rather than highly expressed genes, tend to be expressed in a time-cost-efficient manner.
5. Conclusion
Our results indicate that natural selection played a pivotal role in shaping the codon usage pattern in M. truncatula, and that gene expression is unrelated to either CDS length or GC content. In addition, we identified a total of 30 optimal codons for the model legume. These results could provide valuable information for genetic engineering of the model legume and forage crop.
Acknowledgements
We thank Dr. Wen Jiangqi (Noble Research Institute, USA) and Dr. Wang Hongliang (United States Department of Agriculture-Agricultural Research Service, Tifton) for critical reviews and comments. This study was supported by the National Basic Research Program of China (2014CB138702) and the National Natural Science Foundation of China (31502001).
Appendices associated with this paper are available at http://www.ChinaAgriSci.com/V2/En/appendix.htm
References
Agris P F, Vendeix F A P, Graham W D. 2007. tRNA’s wobble decoding of the genome: 40 years of modification. Journal of Molecular Biology, 366, 1–13.
Akashi H. 2001. Gene expression and molecular evolution. Current Opinion in Genetics and Development, 11, 660–666.
Altschul S, Madden T, Schäffer A, Zhang J, Zhang Z, Miller W, Lipman D. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25, 3389–3402.
Camiolo S, Farina L, Porceddu A. 2012. The relation of codon bias to tissue-specific gene expression in Arabidopsis thaliana. Genetics, 192, 641–649.
Camiolo S, Melito S, Porceddu A. 2015. New insights into the interplay between codon bias determinants in plants. DNA Research, 22, 461–470.
Chaney J, Clark P L. 2015. Roles for synonymous codon usage in protein biogenesis. Annual Review of Biophysics, 44, 143–166.
Duret L. 2000. tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends in Genetics, 16, 287–289.
Fennoy S L, Bailey-Serres J. 1993. Synonymous codon usage in Zea mays L. nuclear genes is varied by levels of C- and G-ending codons. Nucleic Acids Research, 21, 5294–5300.
Hershberg R, Petrov D A. 2008. Selection on codon bias. Annual Review of Genetics, 42, 287–299.
Hershberg R, Petrov D A. 2009. General rules for optimal codon choice. PLoS Genetics, 5, e1000556.
Ikemura T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Molecular Biology and Evolution, 2, 13–34.
Ingvarsson P K. 2007. Gene expression and protein length influence codon usage and rates of sequence evolution in Populus tremula. Molecular Biology and Evolution, 24, 836–844.
Ingvarsson P K. 2008. Molecular evolution of synonymous codon usage in Populus. BMC Evolutionary Biology, 8, 307.
Ingvarsson P K. 2010. Natural selection on synonymous and nonsynonymous mutations shapes patterns of polymorphism in Populus tremula. Molecular Biology and Evolution, 27, 650–660.
Kawabe A, Miyashita N T. 2003. Patterns of codon usage bias in three dicot and four monocot plant species. Genes and Genetic Systems, 78, 343–352.
Li X, Song H, Kuang Y, Chen S, Tian P, Li C, Nan Z. 2016. Genome-wide analysis of codon usage bias in Epichloë festucae. International Journal of Molecular Sciences, 17, E1138.
Liu Q, Feng Y, Zhao X, Dong H, Xue Q. 2004. Synonymous codon usage bias in Oryza sativa. Plant Science, 167, 101–105.
Liu Q, Hu H, Wang H. 2015. Mutational bias is the driving force for shaping the synonymous codon usage pattern of alternatively spliced genes in rice (Oryza sativa L.). Molecular Genetics and Genomics, 290, 649–660.
Morton B R, Wright S I. 2007. Selective constraints on codon usage of nuclear genes from Arabidopsis thaliana. Molecular Biology and Evolution, 24, 122–129.
Novoa E M, de Pouplana L R. 2012. Speeding with control: Codon usage, tRNAs, and ribosomes. Trends in Genetics, 28, 574–581.
Ohlrogge J, Benning C. 2000. Unraveling plant metabolism by EST analysis. Current Opinion in Plant Biology, 3, 224–228.
Ometto L, Li M, Bresadola L, Varotto C. 2012. Rates of evolution in stress-related genes are associated with habitat preference in two Cardamine lineages. BMC Evolutionary Biology, 12, 7.
Plotkin J B. 2011. Synonymous but not the same: The causes and consequences of codon bias. Nature Reviews Genetics, 12, 32–42.
Qian W, Yang J R, Pearson N M, Maclean C, Zhang J. 2012. Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genetics, 8, e1002603.
Qiu S, Bergero R, Zeng K, Charlesworth D. 2011a. Patterns of codon usage bias in Silene latifolia. Molecular Biology and Evolution, 28, 771–780.
Qiu S, Zeng K, Slotte T, Wright S, Charlesworth D. 2011b. Reduced efficacy of natural selection on codon usage bias in selfing Arabidopsis and Capsella species. Genome Biology and Evolution, 3, 868–880.
Rao Y, Wu G, Wang Z, Chai X, Nie Q, Zhang X. 2011. Mutation bias is the driving force of codon usage in the Gallus gallus genome. DNA Research, 18, 499–512.
Shah P, Gilchrist M A. 2010. Effect of correlated tRNA abundances on translation errors and evolution of codon usage bias. PLoS Genetics, 6, e1001128.
Sharp P M, Li W H. 1987. The codon adaptation index - A measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Research, 15, 1281–1295.
Song H, Gao H, Liu J, Tian P, Nan Z. 2017a. Comprehensive analysis of correlations among codon usage bias, gene expression, and substitution rate in Arachis duranensis and Arachis ipaënsis orthologs. Scientific Reports, 7, 14853.
Song H, Liu J, Song Q, Zhang Q, Tian P, Nan Z. 2017b. Comprehensive analysis of codon usage bias in seven Epichloë species and their peramine-coding genes. Frontiers in Microbiology, 8, 1419.
Song H, Nan Z. 2014. Genome-wide analysis of nucleotide-binding site disease resistance genes in Medicago truncatula. Chinese Science Bulletin, 59, 1129–1138.
Song H, Wang P F, Li T T, Xia H, Zhao S Z, Hou L, Zhao C Z. 2015. Genome-wide identification and evolutionary analysis of nucleotide-binding site-encoding resistance genes in Lotus japonicus (Fabaceae). Genetics and Molecular Research, 14, 16024–16040.
Song H, Zhang Q, Tian P, Nan Z. 2017c. Differential evolutionary patterns and expression levels between sex-specific and somatic tissue-specific genes in peanut. Scientific Reports, 7, 9016.
Sueoka N. 1988. Directional mutation pressure and neutral molecular evolution. Proceedings of the National Academy of Sciences of the United States of America, 85, 2653–2657.
De La Torre A R, Lin Y C, Van de Peer Y, Ingvarsson P K. 2015. Genome-wide analysis reveals diverged pattern of codon bias, gene expression, and rates of sequence evolution in Picea gene families. Genome Biology and Evolution, 7, 1002–1015.
Wang B, Shao Z Q, Xu Y, Liu J, Liu Y, Hang Y Y, Chen J Q. 2011. Optimal codon identities in bacteria: Implications from the conflicting results of two different methods. PLoS ONE, 6, e22714.
Wang H C, Hickey D A. 2007. Rapid divergence of codon usage patterns within the rice genome. BMC Evolutionary Biology, 7, S6.
Whittle C A, Extavour C G. 2015. Codon and amino acid usage are shaped by selection across divergent model organisms of the Pancrustacea. G3: Genes Genomes Genetics, 5, 2307–2321.
Whittle C A, Malik M R, Krochko J E. 2007. Gender-specific selection on codon usage in plant genomes. BMC Genomics, 8, 169.
Whittle C A, Sun Y, Johannesson H. 2011. Evolution of synonymous codon usage in Neurospora tetrasperma and Neurospora discreta. Genome Biology and Evolution, 3, 332–343.
Williford A, Demuth J P. 2012. Gene expression levels are correlated with synonymous codon usage, amino acid composition, and gene architecture in the red flour beetle, Tribolium castaneum. Molecular Biology and Evolution, 29, 3755–3766.
Wright F. 1990. The ‘effective number of codons’ used in a gene. Gene, 87, 23–29.
Wright S I, Yan C B, Looseley M, Meyers B C. 2004. Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Molecular Biology and Evolution, 21, 1719–1726.
Yang H. 2009. In plants, expression breadth and expression level distinctly and non-linearly correlate with gene structure. Biology Direct, 4, 45.
Young N D, Debellé F, Oldroyd G E, Geurts R, Cannon S B, Udvardi M K, Benedito V A, Mayer K F, Gouzy J, Schoof H, Van de Peer Y, Proost S, Cook D R, Meyers B C, Spannagl M, Cheung F, De Mita S, Krishnakumar V, Gundlach H, Zhou S, et al. 2011. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature, 480, 520–524.
Zhang W J, Zhou J, Li Z F, Wang L, Gu X, Zhong Y. 2007. Comparative analysis of codon usage patterns among mitochondrion, chloroplast and nuclear genes in Triticum aestivum L. Journal of Integrative Plant Biology, 49, 246–254.
Zhou M, Guo J, Cha J, Chae M, Chen S, Barral J M, Sachs M S, Liu Y. 2013. Non-optimal codon usage affects expression, structure and function of clock protein FRQ. Nature, 495, 111–115.
Section editor LUO Xu-gang Managing editor ZHANG Juan