BioSystems 85 (2006) 225–230
Whole genome analysis of non-optimal codon usage in secretory signal sequences of Streptomyces coelicolor
Yu-Dong Li a, Yong-Quan Li a,∗, Jian-shu Chen b, Hui-jun Dong a, Wen-Jun Guan a, Hong Zhou a
a Zhejiang University, College of Life, Hangzhou 310027, PR China
b Zhejiang University of Technology, College of Pharmaceutics, Hangzhou 310014, PR China
Received 28 December 2005; received in revised form 24 January 2006; accepted 15 February 2006
Abstract
Non-optimal (rare) codons have been suggested to reduce translation rate and facilitate secretion in Escherichia coli. In this study, a whole-genome analysis of non-optimal codon usage in secretory signal sequences and non-secretory sequences of Streptomyces coelicolor was performed. The results showed a higher proportion of non-optimal codons in secretory signal sequences than in non-secretory sequences. This tendency was more pronounced when tested with experimentally identified secretory proteins from proteomics analysis. Some non-optimal codons for Arg (AGA, CGU and CGA), Ile (AUA) and Lys (AAA) were significantly over-represented in the secretory signal sequences. This suggests that balanced non-optimal codon usage is necessary for protein secretion and expression in Streptomyces.
© 2006 Elsevier Ireland Ltd. All rights reserved.
Keywords: Non-optimal codon; Signal peptide; Secretory protein; Streptomyces coelicolor
1. Introduction
Streptomyces are Gram-positive soil bacteria renowned for their ability to produce a large number of different secondary metabolites and to secrete a wide range of extracellular hydrolytic enzymes (Hopwood, 1999). Recently, Streptomyces have also been developed as a potential bacterial host for secreting large amounts of heterologous proteins (Binnie et al., 1997). Many studies of heterologous gene expression and secretion in Streptomyces have been carried out, covering signal peptides (Lammertyn and Anne, 1998), secretory pathways (Schaerlaekens et al., 2004)
∗ Corresponding author. Tel.: +86 571 87951232 (O); fax: +86 571 86971634. E-mail address:
[email protected] (Y.-Q. Li).
and so on. However, the mechanism that ensures high extracellular protein yields is still uncertain. Streptomyces coelicolor is a model Streptomyces species, notable for its high G + C content and large linear chromosome (>8 Mb) (Bentley et al., 2002). Release of the S. coelicolor A3(2) genome sequence has further expanded knowledge of this organism and enabled large-scale analyses of the whole genome and proteome (Kim et al., 2005a,b). This can help to explain how heterologous gene expression works in Streptomyces. The presence of rare codons has been postulated to reduce translation rate (Kinnaird et al., 1991), probably because of a relative scarcity of cognate tRNA species. It is generally believed that the yield of a heterologous protein can be improved by optimizing the codon composition of the introduced gene (Hannig and Makrides, 1998; Lakey et al., 2000). The rare codon UUA in Streptomyces was linked to differentiation genes that
0303-2647/$ – see front matter © 2006 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.biosystems.2006.02.006
Y.-D. Li et al. / BioSystems 85 (2006) 225–230
were expressed in the later stages of the life cycle, when the cognate tRNA becomes abundant. If a heterologous gene contains UUA, its expression can be problematic; otherwise, expression is promoted (Fuglsang, 2005; Ueda et al., 1993). Surprisingly, a drop in protein secretion was observed when five consecutive rare codons at the 5′ end of the mouse TNF-α gene were changed to preferred codons (Lammertyn et al., 1996), which suggested that non-optimal codons may also have other functions. Furthermore, elevated levels of non-optimal codons were observed in Escherichia coli signal peptide sequences in a whole-genome comparison with non-secretory sequences. It was therefore proposed that non-optimal codons in the 5′ region of the mRNA play a role in protein secretion in E. coli: they may facilitate the targeting of secretory proteins by allowing the nascent signal peptide to interact with the translocation machinery prior to translocation across, or insertion into, the membrane (Burns and Beacham, 1985; Power et al., 2004). Previous studies have shown that translational selection may also act on codon usage in the GC-rich Streptomyces (Wright and Bibb, 1992). To investigate the effect of non-optimal codons on secretion, all non-optimal codons were selected and their frequency in the complete set of secretory signal sequences of S. coelicolor was investigated, using an available method (SignalP 3.0) for predicting signal sequences across the whole genome.
2. Materials and methods
2.1. Sequences
The complete DNA sequence file of S. coelicolor and its annotation file were downloaded from the Sanger Institute ftp site (ftp://ftp.sanger.ac.uk/pub/S coelicolor/). Some ORFs (incomplete ORFs, ORFs of fewer than 200 codons, and transposons or repeats) were excluded, and the remaining sequences were used for further analysis.
2.2.
Non-optimal codons set
The selection of non-optimal codons was based on a low frequency of usage of each codon relative to other synonymous codons plus a low relative abundance of the cognate tRNA species. Optimal codons were calculated from the relative synonymous codon usage (RSCU) value with codonW (http://codonw.sourceforge.net/), and each non-optimal codon has an RSCU value <0.5 in highly expressed genes. The frequency of each codon of S. coelicolor was retrieved from the codon usage database (http://www.kazusa.or.jp/codon/). Codons with a frequency of less than 1% were arbitrarily selected as non-optimal codons from all synonymous codons of an amino
acid. The tRNA copy number was retrieved from GtRNAdb (http://lowelab.ucsc.edu/GtRNAdb/). Since copy number correlates well with tRNA abundance, codons whose cognate tRNA copy number was larger than 1 were considered optimal. The selected non-optimal codons were checked by all three of these methods.
2.3. Generation of datasets
The first 70 amino acid residues of each ORF of the S. coelicolor genome were submitted to the SignalP server (Bendtsen et al., 2004) (http://www.cbs.dtu.dk/services/SignalP/) to determine whether the ORF encoded a secretory protein. According to the SignalP report, each ORF sequence in the S. coelicolor genome was sorted into one of three categories: non-secretory, secretory, and uncertain. Sequences for which the SignalP report marked both the mean S-score and the D-score “Y” were classed as secretory; those for which both were marked “N” were classed as non-secretory; all other sequences were classed as uncertain. Sequences in the uncertain category were excluded from further analysis. For a secretory sequence, the value in the second “pos” column (Ymax) of the report gives the position of the first amino acid of the mature sequence of the protein. This value was used to calculate the mean signal peptide length in S. coelicolor. Four datasets were derived from the secretory and non-secretory files (Fig. 1). Dataset 1 (5′ secretory) was derived from secretory sequences by copying the 5′ end of each sequence over the length of the signal peptide predicted by SignalP. Dataset 2 (5′ non-secretory) was derived from non-secretory sequences by copying the 5′ end of each sequence over the mean signal peptide length. Dataset 3 (secretory-mature) and dataset 4 (non-secretory-mature) were derived by copying an excerpt of mean signal peptide length from the remainder of the secretory and non-secretory sequences, respectively. The complete set of DNA sequences from S. coelicolor, both secretory and non-secretory, was named dataset 5 (genome).
2.4. Analysis of the datasets
The five datasets were first analyzed for codon usage.
Briefly, the sequences within each dataset were concatenated, enabling a single codon usage table to be generated for each dataset using the program “Countcodon” (http://www.kazusa.or.jp/codon/countcodon.html). For each dataset the percentage of non-optimal codons in each synonymous codon family was calculated from their corresponding Countcodon result.
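The counting step described above can be illustrated with a minimal Python sketch. It is not the Countcodon program itself: the function names `count_codons` and `percent_non_optimal` are hypothetical, and the sketch assumes every input sequence is an in-frame coding sequence, so that summing per-sequence counts gives the same table as concatenating the sequences first.

```python
from collections import Counter
from itertools import product

# Standard genetic code, RNA alphabet, built in the conventional UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(sequences):
    """Return one codon usage table (codon -> count) for a whole dataset."""
    counts = Counter()
    for seq in sequences:
        rna = seq.upper().replace("T", "U")
        usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = rna[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    return counts


def percent_non_optimal(counts, amino_acid, non_optimal):
    """Percentage of non-optimal codons within one synonymous codon family."""
    family = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    total = sum(counts[c] for c in family)
    if total == 0:
        return 0.0
    rare = sum(counts[c] for c in family if c in non_optimal)
    return 100.0 * rare / total


# Toy dataset (illustrative only, not S. coelicolor data).
toy_counts = count_codons(["AGACGC", "CGCCGCAAA"])
arg_percent = percent_non_optimal(toy_counts, "R", {"AGG", "AGA", "CGA", "CGU"})
```

Here the Arg set passed to `percent_non_optimal` is the one listed for Arg in Table 1; any other family can be handled by passing its one-letter amino acid code and its own set of non-optimal codons.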
Fig. 1. Derivation of four datasets 1–4. The top bar depicts secretory sequences and the bottom bar depicts non-secretory sequences. Dataset titles are shown in plain text. The regions of sequence depicted are not shown to scale.
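The derivation depicted in Fig. 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: the record keys (`cds`, `category`, `first_mature_residue`) and both function names are hypothetical, and because the text does not say where in the remaining sequence the excerpts for datasets 3 and 4 were taken, the sketch simply takes the stretch immediately downstream of the 5′ region.

```python
def classify(mean_s_flag, d_flag):
    """Sort an ORF by the Y/N flags SignalP reports for mean S-score and D-score."""
    if mean_s_flag == "Y" and d_flag == "Y":
        return "secretory"
    if mean_s_flag == "N" and d_flag == "N":
        return "non-secretory"
    return "uncertain"


def derive_datasets(orfs, mean_signal_len):
    """Build datasets 1-4 from ORF records; lengths are in codons."""
    datasets = {1: [], 2: [], 3: [], 4: []}
    span = 3 * mean_signal_len  # mean signal peptide length in nucleotides
    for orf in orfs:
        cds = orf["cds"]
        if orf["category"] == "secretory":
            # Ymax gives the first mature residue (1-based), so the signal
            # peptide covers the residues before it.
            cut = 3 * (orf["first_mature_residue"] - 1)
            datasets[1].append(cds[:cut])            # 5' secretory
            datasets[3].append(cds[cut:cut + span])  # secretory-mature (assumed position)
        elif orf["category"] == "non-secretory":
            datasets[2].append(cds[:span])           # 5' non-secretory
            datasets[4].append(cds[span:2 * span])   # non-secretory-mature (assumed position)
        # ORFs in the "uncertain" category are excluded.
    return datasets


# Toy records (illustrative only).
toy_orfs = [
    {"cds": "ATGAAACGCGGCTTT", "category": classify("Y", "Y"), "first_mature_residue": 3},
    {"cds": "ATGCCCGGGTTTAAA", "category": classify("N", "N")},
    {"cds": "ATGCCCGGGTTTAAA", "category": classify("Y", "N")},
]
toy_datasets = derive_datasets(toy_orfs, mean_signal_len=2)
```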
For each dataset, the relative synonymous codon usage (RSCU) values of the different codons of the selected amino acids were calculated. The RSCU value of the jth codon for the ith amino acid was calculated as below (Sharp and Li, 1986):
\[ \mathrm{RSCU}_{ij} = \frac{n_i \, \mathrm{obs}_{ij}}{\sum_{j=1}^{n_i} \mathrm{obs}_{ij}} \]
where obs_ij is the observed number of the jth codon for the ith amino acid, which has n_i synonymous codons. RSCU values smaller than 1 indicate negative codon bias, while RSCU values above 1 indicate positive codon bias. These codon bias indices and amino acid usage were calculated with GCUA (McInerney, 1998) (http://gcua.schoedl.de/). Statistical analyses, including the Kruskal–Wallis test and the Wilcoxon signed-rank test, were performed using the R software (http://rproject.org).
2.5. Plot of non-optimal codon usage
In order to investigate the distribution of non-optimal codons along the polypeptide chain, and to compare such distributions between non-secretory and secretory sequences, a sliding window program (CodonUsageW) was developed to determine the percentage of non-optimal codons present at each codon position. The proportion of non-optimal codons was calculated as: (all non-optimal codons)/(all synonymous codons where choice is available). For this analysis, the sequences were aligned relative to the start codon, and only the non-secretory and the secretory datasets were used.
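As a small worked illustration of the RSCU definition in Section 2.4 (Sharp and Li, 1986), the following Python sketch computes RSCU for a single synonymous codon family. The function name `rscu` and the toy counts are assumptions made for the example; the study itself used GCUA for this calculation.

```python
def rscu(family_counts):
    """RSCU for one synonymous family: n_i * obs_ij / sum_j(obs_ij).

    family_counts maps every codon of the family (including unused ones,
    with count 0) to its observed count, so that n_i is the family size.
    """
    n = len(family_counts)
    total = sum(family_counts.values())
    if total == 0:
        return {codon: 0.0 for codon in family_counts}
    return {codon: n * obs / total for codon, obs in family_counts.items()}


# Toy counts for the two Lys codons (illustrative only).
lys_rscu = rscu({"AAA": 1, "AAG": 3})
```

With these toy counts AAA falls below 1 (negative bias) and AAG above 1 (positive bias), and the values of a family always sum to its number of synonymous codons.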
3. Results
3.1. Secretory sequences and non-optimal codons in S. coelicolor
At the outset of the analysis, each ORF in the S. coelicolor genome was checked against the exclusion criteria described in Section 2.1, and 5491 sequences were selected for further analysis. According to the SignalP results, these ORFs were sorted into three categories: non-secretory (4452 sequences), secretory (776 sequences), and uncertain (263 sequences). The mean length of the signal peptide sequences was 32 amino acid residues, which was slightly shorter than previously reported (Morosoli et al., 1997). The four datasets derived from the secretory and non-secretory categories were analyzed for their content of non-optimal codons (Fig. 1). S. coelicolor, like other bacteria, uses a specific subset of the 61 available amino acid codons for the production of most mRNA molecules. The so-called optimal codons are those that have more abundant tRNA species (Ikemura, 1985) and tend to occur in highly expressed genes. Non-optimal codons tend to occur in genes expressed at a low level. The set of codons regarded as non-optimal in this study includes both rare and common codons (see Table 1).
The selection of the non-optimal codon set was based on cognate tRNA concentration and frequency of use. Three methods (see Section 2) were used, and their results were checked for consistency with each other. The non-optimal codons identified by RSCU value were similar to those identified by frequency and by tRNA species; only one codon (CGC), whose RSCU value was >2 but for which no tRNA species was found, was excluded from the non-optimal codon set.
3.2. Non-optimal codon usage in S. coelicolor
The concatenated datasets 1–5 were analyzed for their content of non-optimal codons (Table 2). It was clear that the rank order (datasets 1 and 2 > datasets 3–5) was conserved among all amino acids. Of the 18 amino acids with synonymous codons (Trp and Met, each encoded by a single codon, were excluded), 10 showed higher non-optimal codon usage in the 5′ end signal sequences of secretory proteins than in the 5′ ends of non-secretory proteins. In contrast, dataset 3 (secretory-mature) was slightly lower than datasets 4 and 5. The total usage follows the order dataset 1 > 2 > 4 ≈ 5 > 3. Among the amino acids with increased non-optimal codon usage, Arg (R), Ile (I) and Lys (K) showed the most marked increases. To examine visually whether non-optimal codons are preferred in signal peptide sequences, a sliding window program called “CodonUsageW” was developed to plot the frequency of use of non-optimal codons along the length of the sequences in the 5′ secretory versus 5′ non-secretory data. The result of this analysis is shown in Fig. 2. The plot shows that the secretory sequences maintain a high level of non-optimal codons over most of the 5′ region corresponding to the signal sequence; the difference in this left-hand part of the plot is statistically significant (P < 0.05, Kruskal–Wallis test).
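The positional analysis described above can be sketched as follows. CodonUsageW itself is not available here, so this is only a guess at its core calculation: for each codon position counted from the start codon, the percentage of non-optimal codons among codons where a synonymous choice exists. The function name, the exclusion of Met, Trp and stop codons as "no choice" positions, and the absence of any window smoothing are all assumptions of the sketch.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
NO_CHOICE = {"M", "W", "*"}  # single-codon amino acids and stop codons


def non_optimal_profile(sequences, non_optimal, max_codons):
    """Percentage of non-optimal codons at each codon position (0 = start codon).

    Positions where no sequence offers a synonymous choice yield None.
    """
    hits = [0] * max_codons
    eligible = [0] * max_codons
    for seq in sequences:
        rna = seq.upper().replace("T", "U")
        for pos in range(min(max_codons, len(rna) // 3)):
            codon = rna[3 * pos:3 * pos + 3]
            amino_acid = CODON_TABLE.get(codon)
            if amino_acid is None or amino_acid in NO_CHOICE:
                continue
            eligible[pos] += 1
            if codon in non_optimal:
                hits[pos] += 1
    return [100.0 * h / e if e else None for h, e in zip(hits, eligible)]


# Toy sequences aligned at the start codon (illustrative only).
toy_profile = non_optimal_profile(
    ["ATGAGAAAA", "ATGCGCAAG"], non_optimal={"AGA", "AAA"}, max_codons=4
)
```

Running the same function separately on the secretory and non-secretory 5′ datasets would give the two curves that a plot such as Fig. 2 compares.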
Fig. 2. Mean percentage of non-optimal codons: 5′-ends of non-secretory sequences vs. secretory sequences. Predicted secretory sequences were taken from SignalP result, and identified secretory sequences were taken from proteomics analysis data in Kim et al. (2005a,b).
Table 1
Non-optimal codons used by Streptomyces coelicolor

Amino acid   Codon   Frequency per 1000 codons (a)   RSCU in highly expressed genes (b)   tRNA copy (c)
Gly          GGU     9.23                            0.32                                 0
Gly          GGA     7.16                            0.12                                 1
Glu          GAA     8.73                            0.13                                 1
Asp          GAU     3.06                            0.02                                 0
Val          GUU     1.46                            0.03                                 0
Val          GUA     2.66                            0.04                                 1
Ala          GCU     3.07                            0.04                                 0
Ala          GCA     5.61                            0.06                                 1
Arg          AGG     3.66                            0.19                                 1
Arg          AGA     0.78                            0.02                                 1
Arg          CGA     2.58                            0.05                                 0
Arg          CGU     5.49                            0.32                                 1
Ser          AGU     1.51                            0.05                                 0
Ser          UCA     1.08                            0.02                                 1
Ser          UCU     0.65                            0.03                                 0
Lys          AAA     1.06                            0.01                                 1
Asn          AAU     0.72                            0.02                                 0
Ile          AUA     0.66                            0.02                                 0
Ile          AUU     0.63                            0.02                                 0
Thr          ACA     1.64                            0.02                                 1
Thr          ACU     1.20                            0.03                                 0
Cys          UGU     0.72                            0.07                                 0
Tyr          UAU     0.98                            0.02                                 0
Leu          UUG     2.44                            0.03                                 1
Leu          UUA     0.06                            0                                    1
Leu          CUA     0.36                            0                                    1
Leu          CUU     1.60                            0.03                                 0
Phe          UUU     0.44                            0.01                                 0
Gln          CAA     1.35                            0.01                                 0
His          CAU     1.70                            0.02                                 0
Pro          CCU     1.54                            0.03                                 0
Pro          CCA     1.35                            0.01                                 1

(a) A cut-off of <1% was used for non-optimal codons.
(b) The high expressed genes was selected according to Wu et al. (2005).
(c) Codon can not find tRNA species was selected as non-optimal.
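The three criteria summarised in Table 1 can be written down as a short Python sketch. The thresholds are the ones given in the text and footnotes (frequency below 1%, i.e. below 10 per 1000 codons; RSCU below 0.5 in highly expressed genes; no tRNA species found). The class and function names are hypothetical, and because the paper does not state how disagreements between criteria were resolved, the sketch only reports which criteria support a codon rather than making a final call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodonEvidence:
    codon: str
    per_thousand: float    # frequency per 1000 codons
    rscu_high_expr: float  # RSCU in highly expressed genes
    trna_copies: int       # tRNA copy number as listed in Table 1


def supporting_criteria(evidence, max_per_thousand=10.0, max_rscu=0.5):
    """Report which of the three criteria mark a codon as non-optimal."""
    return {
        "low_frequency": evidence.per_thousand < max_per_thousand,
        "low_rscu": evidence.rscu_high_expr < max_rscu,
        "no_trna_gene": evidence.trna_copies == 0,
    }


# Three rows copied from Table 1.
ggu = supporting_criteria(CodonEvidence("GGU", 9.23, 0.32, 0))
gga = supporting_criteria(CodonEvidence("GGA", 7.16, 0.12, 1))
uua = supporting_criteria(CodonEvidence("UUA", 0.06, 0.0, 1))
```

In this reading GGU is supported by all three criteria, while GGA and UUA are supported by frequency and RSCU only, which is consistent with Table 1 listing codons whose tRNA copy value is 1.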
The difference in non-optimal codon levels between the secreted and non-secreted groups decreased along the sequence, and the two groups became similar from approximately codon number 32, which was the mean signal peptide length. After the plots intersect, the mean values of the non-secretory and secretory sequences were virtually the same through the remainder of the plotted sequence (the secretory sequences were slightly lower than the non-secretory sequences). The tendency was clearer with secretory sequences identified by proteomics analysis (Kim et al., 2005b). In order to investigate the variation in usage of each codon across the datasets, the RSCU values of each codon were calculated with GCUA; the codons with increased usage are shown in Fig. 3. These non-optimal codons were used more frequently in secretory signal sequences than in the other datasets. Among these, the usage of AGA, CGA, CGU (Arg), AUA (Ile) and AAA (Lys) changed significantly (p < 0.05, Kruskal–Wallis test), and these codons may contribute most to secretion.
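For readers unfamiliar with the Kruskal–Wallis test used above, a minimal pure-Python sketch of the statistic follows. The study used R for this; the code below is an independent illustration with toy numbers, the function names are hypothetical, and the p-value helper is valid only for a two-group comparison (one degree of freedom, chi-square approximation).

```python
import math


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with midranks and tie correction."""
    data = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(data)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and data[j][0] == data[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[data[k][1]] += midrank
        ties = j - i
        tie_term += ties ** 3 - ties
        i = j
    if n < 2 or tie_term == n ** 3 - n:
        raise ValueError("H is undefined when all observations are tied")
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(group) for r, group in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


def p_value_two_groups(h):
    """Chi-square upper tail with 1 degree of freedom (two groups only)."""
    return math.erfc(math.sqrt(h / 2))


# Toy values (illustrative only, not data from this study).
h_toy = kruskal_wallis_h([1, 2, 3], [4, 5, 6])
p_toy = p_value_two_groups(h_toy)
```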
Fig. 3. Non-optimal codon usage in three datasets. Only non-optimal codons with increased usage were shown.
Table 2
Percentage of non-optimal codons in datasets 1–5

Amino acids (a)   5′ Secretory   5′ Non-secretory   Secretory mature   Non-secretory mature   Genome
ArgN              38.25          24.93              15.91              14.51                  14.99
IleN              17.73          9.79               4.32               5.07                   4.48
LysN              15.04          10.05              4.13               5.64                   5.1
GlyN              27.89          25.32              17.91              18.91                  17.17
GlnN              11.48          8.97               4.77               4.59                   5.06
PheN              5.62           4.49               2.39               1.72                   1.68
ProN              9.63           9.02               3.61               3.77                   4.69
ThrN              10.85          10.37              3.47               4.14                   4.61
GluN              23.46          23.21              14.75              14.54                  15.28
CysN              16.55          16.65              8.92               10.11                  9.28
ValN              7.58           7.91               4.53               4.4                    4.79
SerN              12.79          13.19              5.64               6.18                   6.54
AsnN              7.48           9.25               2.21               4.9                    4.23
AlaN              10.41          12.19              6.43               5.64                   6.36
LeuN              7.76           9.56               4.37               3.86                   4.38
AspN              8.06           10.07              4.6                5.37                   5
TyrN              6.61           8.68               5.07               5.81                   4.78
HisN              7.99           12.2               6.23               7.55                   7.24
Total             245.18         225.85             119.26             126.71                 125.66

(a) ArgN, etc., represent the proportion of non-optimal codons of that particular synonymous codon family expressed as a percentage of the total number of codons in that family.
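To make the size of the differences in Table 2 concrete, the ratio of the 5′ secretory value to the 5′ non-secretory value can be computed for the three amino acids singled out in the text and for the column totals. The short Python sketch below uses only numbers copied from Table 2; the ratio itself is an illustrative summary and is not a statistic reported by the authors.

```python
# (5' secretory, 5' non-secretory) percentages copied from Table 2.
TABLE2_5PRIME = {
    "Arg": (38.25, 24.93),
    "Ile": (17.73, 9.79),
    "Lys": (15.04, 10.05),
    "Total": (245.18, 225.85),
}


def fold_enrichment(secretory, non_secretory):
    """Ratio of non-optimal codon percentage: secretory over non-secretory."""
    return secretory / non_secretory


ratios = {
    name: round(fold_enrichment(sec, non_sec), 2)
    for name, (sec, non_sec) in TABLE2_5PRIME.items()
}
```

On these figures Ile shows the largest relative increase and Arg the largest absolute increase, while the column totals differ far less than the individual Arg, Ile and Lys families do.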
4. Discussion
Natural translational selection favors certain synonymous codons, yet non-optimal codons still persist in protein-coding genes in a wide variety of organisms. There are many hypotheses for the existence of translationally non-optimal codons, one of which is that non-optimal codons may be favored by alternative selection pressures at certain sites (Smith and Eyre-Walker, 2001). The high incidence of non-optimal codons in 5′ signal sequences of E. coli has been suggested to play a role in secretion and translation (Power et al., 2004). Recently, it was reported that there is translational selection in highly expressed genes in high GC content bacteria like S. coelicolor, as in most bacteria with a balanced GC/AT genomic base composition (Wu et al., 2005). As shown in Fig. 2, the proportions of non-optimal codons were relatively higher at the 5′ end of both the non-secretory and secretory sequences, and the proportion of non-optimal codons was markedly higher in the 5′ ends of the secretory sequences than in those of the non-secretory sequences. This result confirmed that non-optimal codon usage is increased in signal sequences of secretory proteins in high GC content bacteria, and it is consistent with non-optimal codon usage in signal sequences in E. coli. The signal recognition particle (SRP) pathway was also found recently in Streptomyces (Palacin et al., 2003), and secretory preproteins are escorted to the cell membrane following interaction with SRP. But SRP itself does not cause translation arrest (Fekkes and Driessen, 1999). The translation arrest caused by a high rate of non-optimal codon usage may increase the time available for SRP to interact with the emerging signal peptide and so improve secretion efficiency. A role for non-optimal codons in the secretion pathway is further supported by the differences in non-optimal codon usage among amino acids. Arg and Lys, which showed the largest increases in non-optimal codon usage (see Table 2), are found mainly in the N-region of the signal peptide, which has been suggested to interact with the translocation machinery, such as SRP, and with the negatively charged membrane (Lammertyn and Anne, 1998; Tjalsma et al., 2004). This result agrees well with our supposition about the role of non-optimal codons in signal sequences. Recently, considerable efforts have been made to exploit the potential of certain species of Streptomyces as hosts for efficient expression and secretion of recombinant heterologous proteins. It is known that expression can be improved by changing unfavored codons to preferred codons in many bacteria (Kane, 1995; Makrides, 1996). This phenomenon is thought to be related to the relative levels of the intracellular cognate tRNA species, which are high for optimal codons and low for low-usage codons. However, hypersecretion of HlyA in E. coli can be recovered and improved by introducing five rare codons into the hlyA sequence, which had also been mutated to obtain a high-expression mutant (Lee and Lee, 2005). When five consecutive minor codons of the mouse TNF-α gene were changed to preferred codons, a 104 -fold drop in protein secretion was observed in Streptomyces (Lammertyn et al., 1996). All these observations are consistent with our finding that non-optimal codon usage is increased in the 5′ end of signal sequences. They show that replacing rare codons in 5′ end signal sequences reduces protein secretion. Therefore, too many rare codons may reduce expression of the protein, while some non-optimal codons may be helpful for secretion efficiency. Our investigation suggests that balanced usage of non-optimal codons is necessary for improving the expression and secretion efficiency of heterologous proteins in Streptomyces.
Acknowledgements
We would like to thank Dr Zhongyu Xie of Louisiana University for his critical reading of this manuscript. This research was supported by grants from Key Research Foundation of State Education Ministry of China.
References
Bendtsen, J.D., Nielsen, H., von Heijne, G., Brunak, S., 2004. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340, 783–795.
Bentley, S.D., Chater, K.F., Cerdeno-Tarraga, A.M., Challis, G.L., Thomson, N.R., James, K.D., Harris, D.E., Quail, M.A., Kieser, H., Harper, D., et al., 2002. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417, 141–147.
Binnie, C., Cossar, J.D., Stewart, D.I., 1997. Heterologous biopharmaceutical protein expression in Streptomyces. Trends Biotechnol. 15, 315–320.
Burns, D.M., Beacham, I.R., 1985. Rare codons in E. coli and S. typhimurium signal sequences. FEBS Lett. 189, 318–324.
Fekkes, P., Driessen, A.J., 1999. Protein targeting to the bacterial cytoplasmic membrane. Microbiol. Mol. Biol. Rev. 63, 161–173.
Fuglsang, A., 2005. Intragenic position of UUA codons in streptomycetes. Microbiol. Sgm 151, 3150–3152.
Hannig, G., Makrides, S.C., 1998.
Strategies for optimizing heterologous protein expression in Escherichia coli. Trends Biotechnol. 16, 54–60.
Hopwood, D.A., 1999. Forty years of genetics with Streptomyces: from in vivo through in vitro to in silico. Microbiology 145 (Pt 9), 2183–2202.
Ikemura, T., 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.
Kane, J.F., 1995. Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Curr. Opin. Biotechnol. 6, 494–500.
Kim, D.W., Chater, K., Lee, K.J., Hesketh, A., 2005a. Changes in the extracellular proteome caused by the absence of the bldA gene product, a developmentally significant tRNA, reveal a new target
for the pleiotropic regulator AdpA in Streptomyces coelicolor. J. Bacteriol. 187, 2957–2966.
Kim, D.W., Chater, K.F., Lee, K.J., Hesketh, A., 2005b. Effects of growth phase and the developmentally significant bldA-specified tRNA on the membrane-associated proteome of Streptomyces coelicolor. Microbiology 151, 2707–2720.
Kinnaird, J.H., Burns, P.A., Fincham, J.R., 1991. An apparent rare-codon effect on the rate of translation of a Neurospora gene. J. Mol. Biol. 221, 733–736.
Lakey, D.L., Voladri, R.K., Edwards, K.M., Hager, C., Samten, B., Wallis, R.S., Barnes, P.F., Kernodle, D.S., 2000. Enhanced production of recombinant Mycobacterium tuberculosis antigens in Escherichia coli by replacement of low-usage codons. Infect. Immun. 68, 233–238.
Lammertyn, E., Anne, J., 1998. Modifications of Streptomyces signal peptides and their effects on protein production and secretion. FEMS Microbiol. Lett. 160, 1–10.
Lammertyn, E., Van Mellaert, L., Bijnens, A.P., Joris, B., Anne, J., 1996. Codon adjustment to maximise heterologous gene expression in Streptomyces lividans can lead to decreased mRNA stability and protein yield. Mol. Gen. Genet. 250, 223–229.
Lee, P.S., Lee, K.H., 2005. Engineering HlyA hypersecretion in Escherichia coli based on proteomic and microarray analyses. Biotechnol. Bioeng. 89, 195–205.
Makrides, S.C., 1996. Strategies for achieving high-level expression of genes in Escherichia coli. Microbiol. Rev. 60, 512–538.
McInerney, J.O., 1998. GCUA: general codon usage analysis. Bioinformatics 14, 372–373.
Morosoli, R., Shareck, F., Kluepfel, D., 1997. Protein secretion in streptomycetes. FEMS Microbiol. Lett. 146, 167–174.
Palacin, A., de la Fuente, R., Valle, I., Rivas, L.A., Mellado, R.P., 2003. Streptomyces lividans contains a minimal functional signal recognition particle that is involved in protein secretion. Microbiology 149, 2435–2442.
Power, P.M., Jones, R.A., Beacham, I.R., Bucholtz, C., Jennings, M.P., 2004.
Whole genome analysis reveals a high incidence of non-optimal codons in secretory signal sequences of Escherichia coli. Biochem. Biophys. Res. Commun. 322, 1038–1044.
Schaerlaekens, K., Lammertyn, E., Geukens, N., De Keersmaeker, S., Anne, J., Van Mellaert, L., 2004. Comparison of the Sec and Tat secretion pathways for heterologous protein production by Streptomyces lividans. J. Biotechnol. 112, 279–288.
Sharp, P.M., Li, W.H., 1986. Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons. Nucleic Acids Res. 14, 7737–7749.
Smith, N.G., Eyre-Walker, A., 2001. Why are translationally sub-optimal synonymous codons used in Escherichia coli? J. Mol. Evol. 53, 225–236.
Tjalsma, H., Antelmann, H., Jongbloed, J.D., Braun, P.G., Darmon, E., Dorenbos, R., Dubois, J.Y., Westers, H., Zanen, G., Quax, W.J., et al., 2004. Proteomics of protein secretion by Bacillus subtilis: separating the “secrets” of the secretome. Microbiol. Mol. Biol. Rev. 68, 207–233.
Ueda, Y., Taguchi, S., Nishiyama, K., Kumagai, I., Miura, K., 1993. Effect of a rare leucine codon, TTA, on expression of a foreign gene in Streptomyces lividans. Biochim. Biophys. Acta 1172, 262–266.
Wright, F., Bibb, M.J., 1992. Codon usage in the G + C-rich Streptomyces genome. Gene 113, 55–65.
Wu, G., Culley, D.E., Zhang, W., 2005. Predicted highly expressed genes in the genomes of Streptomyces coelicolor and Streptomyces avermitilis and the implications for their metabolism. Microbiology 151, 2175–2187.