Codon optimizer: a freeware tool for codon optimization


Protein Expression and Purification 31 (2003) 247–249 www.elsevier.com/locate/yprep

Anders Fuglsang*

Institute of Pharmacology, Danish University of Pharmaceutical Sciences, Universitetsparken 2, DK-2100 Copenhagen Ø, Denmark

Received 4 April 2003, and in revised form 13 June 2003

Abstract

Selection plays a major role in the determination of codon usage in all organisms studied so far. In highly expressed genes, a narrow set of codons is used and these codons correspond to the more abundant tRNA species. This minimizes the risk of tRNA depletion during translation. In fact, the codons in a gene may be true bottlenecks, especially in cases where foreign genes are expressed in a host in which the usage of codons in highly expressed genes does not resemble the usage of codons in the species from which the foreign gene originates. In such cases, it has been shown that substitution of rare codons in the introduced gene may increase the yield dramatically. In addition, replacement of rare codons might decrease the chance of misincorporation and protect the protein from premature turnover. Here, a piece of software is announced that calculates a codon-optimized sequence of any gene based on knowledge of highly expressed genes of a host. In addition, it calculates the codon adaptation index of the gene and identifies internal type II restriction sites of the optimized sequence. The program runs under Windows and is available as freeware for use in academia. © 2003 Elsevier Inc. All rights reserved.

* Fax: +45-35306020. E-mail address: [email protected].

1046-5928/$ - see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/S1046-5928(03)00213-4

Expressivity is a major determinant of codon usage in all species studied so far: highly expressed genes tend to use a narrow set of codons corresponding to the most abundant tRNA species [1–3]. This minimizes the risk of tRNA depletion during translation. The phenomenon is of practical importance for expression experiments, especially when the expressed gene contains many codons that are rarely used in the host. In such cases, the codons (or tRNA availability) can limit the yield. To overcome this problem, the non-optimal codons in the introduced gene can be replaced by codons that correspond to the more abundant tRNA species, which can increase yields considerably. The range of species in which codon optimization strategies have proven useful is expanding and presently includes prokaryotes (see, for example, [4–8]) and cells of uni- and multicellular eukaryotes (see, for example, [9–13]).

Moreover, rare codons may have negative consequences beyond the amount of protein obtained. The quality of the expressed protein also depends on the codons, as rare codons are associated with an increased chance of misincorporation. Several researchers have reported decreased quality of the expressed product due to rare codons [14–17]. Finally, rare arginine codons have been reported to activate the SsrA tagging system of Escherichia coli when present near stop codons [18]; this tagging system directs protein degradation. All in all, there seem to be very good arguments for replacing rare codons when performing expression experiments (or, as an alternative, for using a producer strain that has been engineered to express higher amounts of otherwise scarce tRNA species; see [19] and the review by Makrides [20]).

I therefore wrote a computer program that reads a file of highly expressed genes of an organism and, based on this knowledge, calculates a sequence for any gene in which the codons have been optimized one by one, i.e., each replaced by the synonymous codon that is most abundant in the reference set of highly expressed genes.

The program takes two inputs from the user. First, a file containing sequences of highly expressed genes is loaded. The application scans through these genes, counts the different codons, and generates a codon usage table for the highly expressed genes. Next, the gene that the user wants to optimize is pasted into the application. The program then scans through its codons one by one, gives the weight statistic (see below) for each codon in the original sequence, and suggests the optimal codon for expression at each position.

The empirical prediction of gene expression levels from codon usage has been addressed in a few works [21,22]. Sharp and Li [22] defined a quantity called the codon adaptation index (CAI), which today is the most commonly used empirical measure of expressivity. It is calculated as

$$\mathrm{CAI} = \left( \prod_{i=1}^{L} W_i \right)^{1/L}, \tag{1}$$

where $W_i$ is the weight of the $i$th synonymous codon in a gene having $L$ amino acids with synonymous codons (thus, tryptophan and methionine are not included). The weight is a number between 0 and 1 that tells to what degree the codon is abundant in a reference set of highly expressed genes such as ribosomal proteins and elongation factors; it is the count of the codon divided by the count of the most frequent synonymous codon for the same amino acid. If, for example, the lysine codon AAA is present 50 times in the reference set and the lysine codon AAG is present 10 times, then AAA is given the weight 1.0 and AAG has the weight $W = 10/50 = 0.2$. The codon adaptation index thus tells to what degree the codons in a gene resemble the codons of highly expressed genes. A very low CAI means that the gene contains relatively many codons with a low weight and few with a high weight, which is unfavorable for high expression. The application calculates the CAI of the input sequence as an indicator.

One should be aware that the CAI is an empirical measure that does not predict the actual absolute yield of an experiment. Many factors other than the codons determine the yield: Shine–Dalgarno regions, plasmid copy numbers, promoter strengths, and secondary structure of the mRNA in the 5′ untranslated region are also involved in determining the final yield and should be taken into consideration.

The program also identifies potential type II restriction sites (hexapalindromes) in the optimized sequence. The output is a text file, which can be opened with any text editor, word-processing package, or spreadsheet.

The system requirements are very moderate. The program will run under any 32-bit Microsoft Windows operating system and requires neither much space on the hard drive (<1 MB) nor external dynamic link libraries. The program can be obtained from the author of this paper. Reference files containing highly expressed genes (ribosomal proteins and elongation factors) for several bacterial species are included.
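
The steps described above (building a codon usage table from a reference set, deriving weights, replacing each codon by the most abundant synonymous one, computing the CAI, and scanning for hexapalindromes) can be illustrated with a minimal Python sketch. This is not the author's Windows program: the function names are hypothetical, the standard genetic code is assumed, and the pseudocount of 0.5 for codons absent from the reference set is an assumption of this sketch that keeps the logarithm defined. The palindrome scan only reports 6-bp sequences equal to their own reverse complement; it does not check them against a list of actual enzymes.

```python
from collections import Counter
from math import exp, log

# Standard genetic code in the conventional TCAG ordering (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codons(seq):
    """Split a coding sequence into triplets, ignoring a trailing partial codon."""
    seq = "".join(seq.split()).upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_weights(reference_genes, pseudocount=0.5):
    """Weight = codon count / count of the most frequent synonymous codon."""
    counts = Counter(c for gene in reference_genes for c in codons(gene)
                     if c in CODON_TO_AA)
    # Assumption of this sketch: unseen codons get a small pseudocount.
    adjusted = {c: counts[c] or pseudocount for c in CODON_TO_AA}
    top = {}
    for codon, aa in CODON_TO_AA.items():
        top[aa] = max(top.get(aa, 0), adjusted[codon])
    return {c: adjusted[c] / top[aa] for c, aa in CODON_TO_AA.items()}


def optimize(gene, weights):
    """Replace every codon by the synonymous codon with the highest weight."""
    best = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in best or weights[codon] > weights[best[aa]]:
            best[aa] = codon
    # Codons that are not in the table (e.g. containing N) are left unchanged.
    return "".join(best.get(CODON_TO_AA.get(c), c) for c in codons(gene))


def cai(gene, weights):
    """Geometric mean of the weights, Eq. (1); Met, Trp and stops are skipped."""
    logs = [log(weights[c]) for c in codons(gene)
            if c in weights and CODON_TO_AA[c] not in "MW*"]
    return exp(sum(logs) / len(logs)) if logs else float("nan")


def hexapalindromes(seq):
    """Return (0-based position, site) for every 6-bp reverse-complement palindrome."""
    seq = "".join(seq.split()).upper()
    hits = []
    for i in range(len(seq) - 5):
        site = seq[i:i + 6]
        if set(site) <= set("ACGT") and site == site.translate(COMPLEMENT)[::-1]:
            hits.append((i, site))
    return hits


if __name__ == "__main__":
    # The lysine example from the text: AAA seen 50 times, AAG seen 10 times.
    w = codon_weights(["AAA" * 50 + "AAG" * 10])
    print(w["AAA"], w["AAG"])          # 1.0 0.2
    print(optimize("AAGAAG", w))       # AAAAAA
    print(cai("AAGAAA", w))
    print(hexapalindromes("AAGAATTCAA"))
```

Working with logarithms in `cai` avoids underflow when many small weights are multiplied in a long gene. With a real reference file, the list passed to `codon_weights` would hold the coding sequences of the highly expressed genes of the host.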

References

[1] T. Ikemura, Codon usage and tRNA content in unicellular and multicellular organisms, Mol. Biol. Evol. 2 (1985) 13–34.
[2] M. Bulmer, Coevolution of codon usage and transfer RNA abundance, Nature 325 (1987) 728–730.
[3] M. Gouy, C. Gautier, Codon usage in bacteria: correlation with gene expressivity, Nucleic Acids Res. 10 (1982) 7055–7074.
[4] N. Acosta-Rivero, J.C. Sanchez, J. Morales, Improvement of human interferon HUIFNalpha2 and HCV core protein expression levels in Escherichia coli but not of HUIFNalpha8 by using the tRNA(AGA/AGG), Biochem. Biophys. Res. Commun. 296 (2002) 1303–1309.
[5] X. Hu, Q. Shi, T. Yang, G. Jackowski, Specific replacement of consecutive AGG codons results in high-level expression of human cardiac troponin T in Escherichia coli, Protein Expr. Purif. 7 (1996) 289–293.
[6] R.S. Hale, G. Thompson, Codon optimization of the gene encoding a domain from human type 1 neurofibromin protein results in a threefold improvement in expression level in Escherichia coli, Protein Expr. Purif. 12 (1998) 185–188.
[7] S.J. Park, S.K. Lee, B.J. Lee, Effect of tandem rare codon substitution and vector-host combinations on the expression of the EBV gp110 C-terminal domain in Escherichia coli, Protein Expr. Purif. 24 (2002) 470–480.
[8] A.R. Meetei, M.R.S. Rao, Hyperexpression of rat spermatidal protein tp2 in Escherichia coli by codon optimization and engineering the vector-encoded 5′ UTR, Protein Expr. Purif. 13 (1998) 184–190.
[9] F.F. Hamdan, A. Mousa, P. Ribeiro, Codon optimization improves heterologous expression of a Schistosoma mansoni cDNA in HEK293 cells, Parasitol. Res. 88 (2002) 583–586.
[10] L. Deml, A. Bojak, S. Steck, M. Graf, J. Wild, R. Schirmbeck, H. Wolf, R. Wagner, Multiple effects of codon usage optimization on expression and immunogenicity of DNA candidate vaccines encoding the human immunodeficiency virus type 1 gag protein, J. Virol. 75 (2001) 10991–11001.
[11] G. Sinclair, F.Y.M. Choy, Synonymous codon usage bias and the expression of human glucocerebrosidase in the methylotrophic yeast, Pichia pastoris, Protein Expr. Purif. 26 (2002) 96–105.
[12] E.B. Vervoort, A. van Ravenstein, N.M.N.E. van Peij, J.C. Heikoop, P.J.M. van Haastert, G.F. Verhaijden, H.K. Linskens, Optimizing heterologous expression in Dictyostelium: importance of 5′ codon adaptation, Nucleic Acids Res. 28 (2000) 2069–2074.
[13] J.H. Woo, Y.Y. Liu, A. Mathias, S. Stavrou, Z. Wang, J. Thompson, D.M. Neville Jr., Gene optimization is necessary to express a bivalent anti-human anti-T cell immunotoxin in Pichia pastoris, Protein Expr. Purif. 25 (2002) 270–282.
[14] R. Seetharam, R.A. Heeren, E.Y. Wong, S.R. Braford, B.K. Klein, S. Aykent, C.E. Kotts, K.J. Mathis, B.F. Bishop, M.J. Jennings, Mistranslation in IGF-1 during over-expression of the protein in Escherichia coli using a synthetic gene containing low frequency codons, Biochem. Biophys. Res. Commun. 155 (1988) 518–523.
[15] D.E. McNulty, B.A. Claffee, M.J. Huddleston, J.F. Kane, Mistranslational errors associated with the rare arginine codon CGG in Escherichia coli, Protein Expr. Purif. 27 (2003) 365–374.
[16] T.L. Calderone, R.D. Stevens, T.G. Oas, High-level misincorporation of lysine for arginine at AGA codons in a fusion protein expressed in Escherichia coli, J. Mol. Biol. 262 (1996) 407–412.
[17] M.D. Forman, R.F. Stack, P.S. Masters, C.R. Hauer, S.M. Baxter, High level, context dependent misincorporation of lysine for arginine in Saccharomyces cerevisiae a1 homeodomain expressed in Escherichia coli, Protein Sci. 7 (1998) 500–503.
[18] C.S. Hayes, B. Bose, R.T. Sauer, Stop codons preceded by rare arginine codons are efficient determinants of SsrA tagging in Escherichia coli, Proc. Natl. Acad. Sci. USA 99 (2002) 3440–3445.
[19] B.J. Del Tito Jr., J.M. Ward, J. Hodgson, C.J. Gershater, H. Edwards, L.A. Wysocki, F.A. Watson, G. Sathe, J.F. Kane, Effects of a minor isoleucyl tRNA on heterologous protein translation in Escherichia coli, J. Bacteriol. 177 (1995) 7086–7091.
[20] S.C. Makrides, Strategies for achieving high-level expression of genes in Escherichia coli, Microbiol. Rev. 60 (1996) 512–538.
[21] M. Gribskov, J. Devereux, R.R. Burgess, The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression, Nucleic Acids Res. 12 (1984) 539–549.
[22] P.M. Sharp, W.-H. Li, The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications, Nucleic Acids Res. 15 (1987) 1281–1295.