Gene 288 (2002) 1–8 www.elsevier.com/locate/gene
Influences on translation initiation and early elongation by the messenger RNA region flanking the initiation codon at the 3′ side
C. Magnus Stenström, Leif A. Isaksson*
Department of Microbiology, Stockholm University, S-106 91 Stockholm, Sweden
Received 16 November 2001; received in revised form 22 January 2002; accepted 19 February 2002
Received by D.L. Court
Abstract
The downstream region (DR) located immediately after the initiation codon acts as a translational enhancer, and gene expression can vary considerably depending on its sequence. In order to determine the influence of the DR on the apparent translation initiation, we have analyzed several naturally occurring DRs (each a stretch of five codons) in a lacZ reporter gene. The efficiency of expression associated with these DRs did not show any correlation to the expression levels connected with the natural genes. Changes of the iso-codon composition in the DR, which maintain the amino acid sequence of the gene product, gave significant variations in gene expression. Thus, the messenger RNA base sequence, and not the encoded amino acid sequence, in the early coding region is the determinant for the apparent efficiency of translation initiation and/or early elongation. © 2002 Elsevier Science B.V. All rights reserved.
Keywords: Initiation; Escherichia coli; Early elongation; Shine-Dalgarno; Downstream region; Codon context
Abbreviations: SD, Shine-Dalgarno (sequence); DR, downstream region
* Corresponding author. Tel.: +46-8-164-197; fax: +46-8-612-9552. E-mail address: [email protected] (L.A. Isaksson).
0378-1119/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0378-1119(02)00501-2

1. Introduction
The initiation process and the formation of the initiation complex determine the efficiency of gene expression at the translational level. The sequences surrounding the start triplet act as determinants that influence this efficiency (Dreyfus, 1988; Stenström et al., 2001b and references therein). The Shine-Dalgarno sequence (SD), a few bases upstream of the initiation codon, is complementary to the anti-Shine-Dalgarno sequence near the 3′ end of the 16S rRNA (Shine and Dalgarno, 1974; O’Connor et al., 1999 and references therein; Stenström et al., 2001a and references therein). The purine-rich SD region is of prime importance for the association of messenger RNA (mRNA) with the 30S ribosomal subunit, thereby increasing the probability of decoding of the initiation codon (AUG, GUG, UUG and in one case AUU). The complementary base pairing between the SD and the anti-SD in the 16S rRNA during an early initiation step has been well established by site-directed mutagenesis (O’Connor et al., 1999 and references therein).

Even though the initiation codon and the SD are generally accepted to be the key determinants during the initiation event, several other sequences surrounding the start codon have been proposed to influence initiation (Stenström et al., 2001b and references therein). The non-random distribution of nucleotides among the 40 bases surrounding the initiation codon (Schneider et al., 1986; Stenström et al., 2001a and references therein) indicates that there might be more signals in the mRNA primary structure that constitute the ribosome binding site during initiation of translation (Dreyfus, 1988). A region located downstream of the start codon has been suggested to influence translation initiation by complementary mRNA-rRNA base pairing (Faxén et al., 1991; O’Connor et al., 1999 and references therein; Stenström et al., 2001a and references therein). However, mutational alterations in both the mRNA and the 16S rRNA failed to support such complementary binding (Firpo and Dahlberg, 1998; O’Connor et al., 1999). Still, this region influences translation initiation in a context-dependent manner (Stenström et al., 2001a,b).

A CA repeat sequence in the region downstream of the initiation codon in mRNA, with or without a leader, increases the level of expression (Martin-Farmer and Janssen, 1999). More than 30 naturally leaderless mRNAs have been found in Archaea, Bacteria and Eucarya (Wu and Janssen, 1996). The AUG initiation codon itself, not the codon-anticodon contact, is important for translation of leaderless mRNA in Escherichia coli (vanEtten and Janssen, 1998). The question is whether the initiation triplet is the sole determinant for a proper initiation event when a beneficial
upstream sequence is missing, or whether there is also a contribution from the region downstream of the initiation codon. Since earlier reports have suggested a strong influence of the downstream region (DR) on translation initiation or on the early elongation phase, we wanted to examine gene expression associated with different naturally occurring DRs. The fourteen DRs analyzed in this report originate from genes that are naturally expressed at different levels, keeping in mind that they are under the control of different promoters. We also analyzed context mutants in these DRs and how a stretch of lysine codons would influence expression. Our analysis shows that the base composition of the early codons, and not the amino acid sequence at the N-terminus of the protein product, is the determinant for gene expression.

2. Material and methods

2.1. Chemicals
Restriction enzymes and T4 DNA ligase were from Promega or BioLabs. The DNA extraction kit was from Qiagen. Plasmids were prepared with a JET Prep purification system from Genomed GmbH. Termination mixtures for automatic sequencing were purchased from Perkin-Elmer.

2.2. Bacterial strain
The Δlac E. coli strain MC1061 (Stenström et al., 2001b and references therein) was the host strain for all plasmids.

2.3. Plasmid and plasmid construction
Plasmids were constructed using standard recombinant DNA techniques. The Shine-Dalgarno region of pCMS71 (Stenström et al., 2001b) was replaced by a strong Shine-Dalgarno sequence (SD+) using the restriction sites ApaI and SwaI, giving the plasmid pCMS80 (Stenström et al., 2001a) (Fig. 1). pCMS71 and pCMS80 carry lacI and lacZ, with lacZ as the reporter gene, under control of an IPTG-inducible trc promoter (Stenström et al., 2001b). Oligonucleotides containing downstream sequences were cloned into pCMS71/pCMS80 cleaved with SwaI and SalI. To ensure correct transformants, the ligation reaction was followed by ethanol precipitation and then by treatment with XbaI, which cleaves plasmids that still carry the vector sequence with the XbaI site (Fig. 1).

2.4. DNA sequencing
All primary recombinants resulting from cloning of synthetic deoxy-oligonucleotides were sequenced by automatic sequencing (CyberGene) using a DNA Sequencing Kit (Perkin-Elmer Applied Biosystems). The sequence primer DA1 was used, which is complementary to a unique sequence (5′-GCG ATC GGT GCG GGC CTC-3′) approximately 100 bases downstream of the lacZ initiation site. Deoxy-oligonucleotides and primers were purchased from CyberGene.

2.5. β-Galactosidase assays and growth conditions
Cells were grown overnight at 37 °C in M9 minimal medium supplemented with all amino acids at recommended concentrations and 0.3 mg/ml ampicillin (Miller, 1972; Neidhardt et al., 1977). This culture was used to inoculate the same medium at 37 °C. Exponentially growing cells were harvested without IPTG induction at an A540 nm density of 0.4–0.5. β-Galactosidase assays, using the lysed un-induced cells, were carried out as described (Miller, 1972). All measurements were carried out using an iEMS Multiscan Microplate Photometer (Labsystems).
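β-Galactosidase activities throughout this paper are reported in Miller units (Miller, 1972). As a reminder of what that unit expresses, the following minimal sketch implements the commonly quoted textbook form of the Miller calculation. It is illustrative only: the function name, argument names and example readings are hypothetical, the wavelengths are those of the textbook assay rather than the A540 density reading and microplate photometer used here, and the paper does not state which corrections were applied.

```python
def miller_units(a420, a550, cell_density, minutes, volume_ml):
    """Textbook Miller formula (assumed form, not taken from this paper).

    a420         absorbance of the o-nitrophenol product
    a550         light-scattering correction from cell debris
    cell_density optical density of the culture at harvest
    minutes      reaction time
    volume_ml    volume of culture used in the assay
    """
    if minutes <= 0 or volume_ml <= 0 or cell_density <= 0:
        raise ValueError("time, volume and cell density must be positive")
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * cell_density)


# Hypothetical readings, chosen only to exercise the function.
example_activity = miller_units(
    a420=0.90, a550=0.02, cell_density=0.45, minutes=10, volume_ml=0.1
)
print(f"{example_activity:.0f} Miller units")
```

Because the result is normalized to reaction time, culture volume and cell density, activities from cultures harvested at slightly different densities (0.4–0.5 here) remain comparable.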
Fig. 1. Principal constructions of plasmids used in this study, starting with pCMS1 (Stenström et al., 2001b). The lacZ gene is under transcriptional control of the trc promoter (Amann et al., 1988). The derivatives pCMS71 and pCMS80 were used for insertion of different AUG downstream context sequences using the restriction sites SwaI and SalI as indicated. pCMS71 contains a sequence showing low homology to a Shine-Dalgarno region (TAAATAAA) eight nucleotides upstream of the start codon (SD−), while pCMS80 contains a strong SD+ (TAAGGAGG).
3. Results

3.1. The effect of natural DR sequences on gene expression
In this study two different Shine-Dalgarno (SD) sequences were used upstream of a lacZ reporter gene. The plasmid pCMS71 (Stenström et al., 2001b) contains a sequence designed to show low complementarity to the anti-SD region of the 16S rRNA (SD−; UAAAUAAA), while the derivative pCMS80 (Stenström et al., 2001a) contains a canonical Shine-Dalgarno sequence (SD+; UAAGGAGG). Both Shine-Dalgarno sequences were inserted with an appropriate spacing of eight nucleotides to the initiation codon (Fig. 1). Thirteen downstream region sequences (DRs) originating from naturally occurring E. coli genes and one originating from Staphylococcus aureus were cloned into the region immediately following the initiation codon, to analyze the influence of this region on gene expression.

The data in Table 1 show the levels of gene expression associated with these naturally occurring DRs together with either a strong (SD+) or weak (SD−) upstream region. Together with SD+ the highest expression was found for the trpL and trpB constructs, which were about 5-fold higher than for aroH (Table 1). An inspection of the molar ratios of tRNA/ribosome at the relevant growth rate (Dong et al., 1996) shows that these ratios for the trpL pentacodon context (0.22, 0.44, 0.47, 0.12, 0.48 tRNA/ribosome, respectively) and the aroH context (0.60, 0.50, 0.50, 0.22, 0.22) are not remarkably different from each other. The difference in expression associated with these DRs is therefore not likely to be a consequence of different tRNA pools. The fusA, aroG and spaA downstream sequences also generated high expression. For AUG-initiated variants together with SD− the difference in expression level was 35-fold, albeit at a lower level, if araC and trpB are compared (Table 1). A comparison of expression values for the DR model gene variants with the Codon Adaptation Index (CAI) values of the corresponding original natural genes, which serve as an index for the level of gene expression (Sharp and Burgess, 1992), does not indicate any correlation.

The influence of the DR on gene expression was also examined for the two near-cognate start codons, UUG (spaA) and GUG (lacI) (Table 1). As expected, together with SD− the activities were low for these two constructs. However, together with SD+ the spaA construct gave quite high expression, which was even higher than for most of the other constructs with AUG.

Some of the DRs, especially the sequences from the trpB and trpL genes, give significantly higher expression than the others. Even together with SD− these two DRs give activities that are higher than those found for several of the constructs with SD+. trpL represents the beginning of the gene sequence coding for the leader peptide of the trp operon, and trpB is one of the genes in the trp operon. The high activities found for the trpL and trpB DRs together with either SD+ or SD− indicate that these DRs are exceptionally favorable for gene expression during translation initiation or the early elongation phase.

Table 1
Effect on expression by naturally occurring DRs (a)

Gene          SD+                       SD−                       DR                                 CAI
              Plasmid    Expression     Plasmid    Expression
              pCMS                      pCMS
fusA          413        471 ± 43       399        71 ± 2.6       aat AUG GCU CGU ACA ACA CCC ucg    0.744
tsf           414        138 ± 12       400        23 ± 1.6       aat AUG GCU GAA AUU ACC GCA ucg    0.769
trpL (DR-A)   30         664 ± 12       40         234 ± 7.6      aat AUG AAA GCA AUU UUC GUA ucg    0.677
alaS          412        160 ± 12       398        14 ± 1.1       aat AUG AGC AAG AGC ACC GCU ucg    0.496
prfA          415        226 ± 14       401        24 ± 3.4       aat AUG AAG CCU UCU AUC GUU ucg    0.452
galK          422        270 ± 8.5      409        107 ± 18       aat AUG AGU CUG AAA GAA AAA ucg    0.422
trpB          420        616 ± 19       407        276 ± 30       aat AUG ACA ACA UUA CUU AAC ucg    0.409
aroG          418        387 ± 42       405        19 ± 2.1       aat AUG AAU UAU CAG AAC GAC ucg    0.379
aroH          417        127 ± 14       403        10 ± 0.7       aat AUG CGU GGC GGC AAA AAA ucg    0.320
malF          423        224 ± 11       410        64 ± 8.6       aat AUG GAU GUC AUU AAA AAG ucg    0.359
trpA          419        132 ± 10       406        9 ± 1.4        aat AUG GAA CGC UAC GAA UCU ucg    0.338
lacI          424        148 ± 16       411        3 ± 0.4        aat GUG AAA CCA GUA ACA UUA ucg    0.295
araC          421        202 ± 2.0      408        8 ± 0.7        aat AUG GCU GAA GCG CAA AAU ucg    0.250
spaA          445        518 ± 28       383        27 ± 1.6       aat UUG AAA AAG AAA AAC AUU ucg    n.a.

(a) Origin of downstream sequences from indicated genes. The sequences are preceded either by a strong Shine-Dalgarno (SD+) or by a sequence showing low homology to a canonical Shine-Dalgarno (SD−). The trpL downstream region is in some cases referred to as DR-A. β-Galactosidase activity is given in Miller units (Miller, 1972), and each value represents the mean value from at least six independent measurements. CAI values correspond to each respective gene. spaA originates from Staphylococcus aureus, for which no CAI value is available.

The +2 codon, and in particular AAA, has a strong influence on gene expression at the translational level, with either a strong (Looman et al., 1987) or a weak SD (Stenström et al., 2001b). To analyze whether this effect of AAA depends on the nature of the rest of the DR, four of the DR sequences were chosen for further studies of the +2 codon influence in constructs with SD−. Two codons, one favorable (AAA) and one unfavorable (GGG) (Stenström et al., 2001b), were introduced in position +2 of the DRs derived from tsf, trpB, araC and galK. As can be seen in Table 2, AAA instead of the natural codon increased expression, while GGG decreased expression for all except araC. Thus, AAA as +2 codon promotes high expression when present in several DRs, and GGG is associated with low expression values. However, even if AAA and GGG are associated with high and low expression, respectively, the size of the effect also depends on the rest of the sequence in the DR.

Table 2
Effect on gene expression by the +2 codon in naturally occurring DRs (a)

Gene          Plasmid    Expression     DR                                 Relative
              pCMS                                                         expression
tsf:3         426        142 ± 3.8      aat AUG AAA GAA AUU ACC GCA ucg    6.17
tsf:4         427        10 ± 0.8       aat AUG GGG GAA AUU ACC GCA ucg    0.43
trpB:3        435        372 ± 10       aat AUG AAA ACA UUA CUU AAC ucg    1.35
trpB:4        436        54 ± 5.7       aat AUG GGG ACA UUA CUU AAC ucg    0.20
araC:3        429        21 ± 1.7       aat AUG AAA GAA GCG CAA AAU ucg    2.62
araC:4        430        13 ± 2.0       aat AUG GGG GAA GCG CAA AAU ucg    1.62
galK:3        432        194 ± 5.2      aat AUG AAA CUG AAA GAA AAA ucg    1.81
galK:4        433        29 ± 2.3       aat AUG GGG CUG AAA GAA AAA ucg    0.27
trpL (GGG)    129        19 ± 3.9       aat AUG GGG GCA AUU UUC GUA ucg    0.08

(a) Analysis of the favorable AAA and the unfavorable GGG as +2 codon in four gene variants. A region showing low homology to a canonical Shine-Dalgarno (SD−) precedes the sequences. Relative expression is given as the ratio compared to the original DR, as shown in Table 1. β-Galactosidase activity is given in Miller units (Miller, 1972), and each value represents the mean value from at least six independent measurements.
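The relative expression values of Table 2 are ratios of each variant's activity to that of the corresponding original DR with the weak SD in Table 1. The short sketch below recomputes them from the mean activities printed in the two tables; the dictionary and function names are chosen here for illustration and are not from the paper.

```python
# Mean activities (Miller units) of the original DRs with the weak SD (Table 1).
ORIGINAL_WEAK_SD = {"tsf": 23, "trpB": 276, "araC": 8, "galK": 107, "trpL": 234}

# Variants with AAA (:3) or GGG (:4) after the initiation codon (Table 2):
# name -> (parent gene, mean activity).
VARIANTS = {
    "tsf:3": ("tsf", 142), "tsf:4": ("tsf", 10),
    "trpB:3": ("trpB", 372), "trpB:4": ("trpB", 54),
    "araC:3": ("araC", 21), "araC:4": ("araC", 13),
    "galK:3": ("galK", 194), "galK:4": ("galK", 29),
    "trpL (GGG)": ("trpL", 19),
}


def relative_expression(variant_activity, original_activity):
    """Ratio of a variant's activity to that of its original DR."""
    return variant_activity / original_activity


relative = {
    name: relative_expression(activity, ORIGINAL_WEAK_SD[gene])
    for name, (gene, activity) in VARIANTS.items()
}

for name, ratio in relative.items():
    print(f"{name:12s} {ratio:5.2f}")
```

Ratios above 1 mark variants that are expressed better than the natural DR; for araC both AAA and GGG give ratios above 1, which is the exception noted in the text.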
3.2. mRNA context effects
The influence of the context in the DR was analyzed by changes in the mRNA that do not alter the amino acid sequence of the protein product. When the DR is preceded by a UUG initiation codon, an increase in expression of about 1.5 times could be detected as a consequence of changing codons within the codon family (Table 3). If the same DR constructs were initiated by an AUG, no further significant increase of the already high expression value was found (pCMS384 and pCMS386). A comparison between the UUG (pCMS383) and AUG (pCMS384) constructs shows an increase of 13 times, while for the pair UUG (pCMS385) and AUG (pCMS386) the difference in expression is about nine times. The pCMS387 and pCMS388 variants have an AAA lysine codon in position +2, which is known to influence expression positively. In this case only a small further increase in the high expression value is obtained by altering the other downstream codons. For the low-activity pair pCMS389 and pCMS390, however, a substantial increase is obtained by the alterations in the downstream codons.

A series of five different DRs, generating the same amino acid sequence, was compared. The results presented in Table 4 illustrate the stimulation that can be achieved when using different iso-codons in a codon family. An 8-fold increase in expression can be obtained if a DR that is unfavorable for expression is properly modified within the codon family so as to maintain the amino acid sequence of the protein product. Thus, by choosing appropriate codons in a codon family, a DR that gives low expression can be improved to give higher expression. DRs that are associated with high expression cannot be substantially improved further unless other alterations are introduced, for instance involving the SD sequence upstream of the initiation codon.

Table 3
Effects on gene expression by iso-codons in DRs (a)

Plasmid    Expression     DR                                 Encoded amino acids
pCMS
383        27 ± 1.6       aat UUG AAA AAG AAA AAC AUU ucg    Lys Lys Lys Asn Ile
385        40 ± 2.5       aat UUG AAA AAA AAA AAU AUA ucg    Lys Lys Lys Asn Ile
384        355 ± 36       aat AUG AAA AAG AAA AAC AUU ucg    Lys Lys Lys Asn Ile
386        371 ± 35       aat AUG AAA AAA AAA AAU AUA ucg    Lys Lys Lys Asn Ile
387        284 ± 24       aat AUG AAA AAG ACA GCU AUC ucg    Lys Lys Thr Ala Ile
388        314 ± 14       aat AUG AAA AAA ACU GCA AUA ucg    Lys Lys Thr Ala Ile
389        26 ± 2.8       aat AUG GCA AAA AAG ACA GCU ucg    Ala Lys Lys Thr Ala
390        72 ± 7.4       aat AUG GCA AAA AAA ACU GCA ucg    Ala Lys Lys Thr Ala

(a) Iso-codons in the DR, giving the same amino acid sequence, are analyzed. The constructs are preceded by a sequence showing low homology to a canonical Shine-Dalgarno (SD−). β-Galactosidase activity is given in Miller units (Miller, 1972), and each value represents the mean value from at least six independent measurements.

Table 4
Effects on gene expression by iso-codons in the DR (a)

Plasmid    Expression     DR                                 Encoded amino acids
pCMS
367        4.2 ± 0.7      aat AUG GCA GCG GGU UCU AUU ucg    Ala Ala Gly Ser Ile
368        5.9 ± 0.7      aat AUG GCA GCC GGG AGC AUC ucg    Ala Ala Gly Ser Ile
364        16 ± 1.4       aat AUG GCA GCA GGC AGC AUU ucg    Ala Ala Gly Ser Ile
366        23 ± 3.0       aat AUG GCA GCA GGA UCA AUA ucg    Ala Ala Gly Ser Ile
370        35 ± 2.4       aat AUG GCA GCA GGC AGC AUC ucg    Ala Ala Gly Ser Ile

(a) Five different downstream contexts that code for the same amino acid sequence are analyzed. The sequence upstream of the initiation codon shows no homology to a Shine-Dalgarno sequence (SD−). β-Galactosidase activity is given in Miller units (Miller, 1972), and each value represents the mean value from at least six independent measurements.

3.3. Influence of lysine codons
The DR-A sequence (trpL) contains a favorable AAA lysine codon in position +2 and gives high gene expression. The effect of this codon at other positions was also analyzed. Introduction of AAA at positions +3 and +3/+4 increased expression slightly, irrespective of whether the gene variants were preceded by SD+ or SD− (Table 5). It is noted that expression of the variant with AAA at positions +2 and +3 is somewhat lower than that of the constructs with this codon at only +2 or at positions +2, +3 and +4. The reason for this effect is not clear. However, in this experiment the AAA codon replaced another codon (GCA at +2, AUU at +3), and this fact could possibly give the observed effect. A further introduction of an AAA in position +5 decreased the efficiency by about 40–50%, which could possibly be the result of a local lysyl-tRNA limitation.

Table 5
Effects on gene expression by lysine codons in the DR (a)

Name     SD+                       SD−                       DR
         Plasmid    Expression     Plasmid    Expression
DR-A     pCMS30     665 ± 11       pCMS40     234 ± 7.6      aat AUG AAA GCA AUU UUC GUA ucg
         pCMS494    594 ± 8.2      pCMS371    250 ± 9.0      aat AUG AAA AAA AUU UUC GUA ucg
         pCMS495    674 ± 6.1      pCMS372    312 ± 8.2      aat AUG AAA AAA AAA UUC GUA ucg
         pCMS496    401 ± 35       pCMS373    156 ± 5.0      aat AUG AAA AAA AAA AAA GUA ucg

(a) Changes in the downstream region of DR-A, introducing lysine codons from position +2 to +5, are indicated. The sequences are preceded either by a strong Shine-Dalgarno (SD+) or by a sequence showing low homology to a canonical Shine-Dalgarno (SD−). The DR-A constructs in pCMS30 and pCMS40 are homologous to the trpL sequence. β-Galactosidase activity is given in Miller units (Miller, 1972), and each value represents the mean value from at least six independent measurements.

3.4. Influence of transcriptional activity
The functional half-lives of different variants of lacZ mRNA were determined as described earlier (Stenström et al., 2001a), to analyze whether the differences in expression were due to mRNA stability and thus mRNA levels. Indistinguishable stability (250 ± 10 s) was found when testing a number of the analyzed constructs, indicating that the observed differences in expression are not due to an altered mRNA pool (data not shown).

3.5. Polarity effects and mRNA secondary structure
Computer analyses spanning 55 nucleotides surrounding the ribosome binding site did not show any differences in secondary structure of the investigated constructs that could explain the differences in expression (Zuker et al., 1999; Mathews et al., 1999). Expression of constructs with varied +2 codons in the polarity-free protein A system is in reasonable agreement with the corresponding DRs in the lac system used here (Björnsson and Isaksson, 1993; Mottagui-Tabar et al., 1994; Stenström et al., 2001b). We therefore have no reason to infer any influence of polarity on the expression characteristics associated with the DRs analyzed here.

4. Discussion

4.1. A strong DR can act together with the SD to increase the translation initiation
Previous studies have shown that the downstream sequence located immediately 3′ of the initiation triplet affects gene expression (Faxén et al., 1991; O’Connor et al., 1999; Stenström et al., 2001b) at the level of translation initiation and/or early elongation (Stenström et al., 2001a). The results presented here were obtained by analysis of natural DRs, composed of the five codons that follow the initiation codon, and variants thereof. The effects of the DRs on gene expression are not correlated with the known expression of the corresponding natural gene. This suggests that transcriptional control and/or other initiation signals besides the DR determine, or are superimposed on, the possible contribution of the DR to the expression levels of the native genes. Several recognition sites that could control initiation site selection and/or initiation efficiency have been proposed earlier (Stenström et al., 2001b and references therein).

The context of the DR that follows the initiation codon influences translation initiation and/or early elongation. This downstream context has a cooperative effect together with a strong SD upstream of the initiation codon (Stenström et al., 2001a). The first codon of the DR (the +2 codon) has pronounced effects on gene expression. Depending on the codon at position +2, a difference in gene expression of up to 20 times could be seen together with a weak SD sequence. A correlation between the efficiency of a certain codon and its abundance in position +2 in E. coli genes was also found (Stenström et al., 2001b).
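The iso-codon comparisons of Tables 3 and 4 rest on the compared DRs encoding exactly the same peptide. That premise can be checked mechanically. The sketch below translates the five DR codons of the Table 4 constructs and of the synonymous pairs in Table 3 with the standard genetic code, restricted to the codons that occur in those tables, and reports the spread in activity within Table 4. The data structures and names are illustrative choices, not part of the paper.

```python
# Standard genetic code, limited to the codons that appear in Tables 3 and 4.
CODON_TO_AA = {
    "AAA": "Lys", "AAG": "Lys",
    "AAC": "Asn", "AAU": "Asn",
    "ACA": "Thr", "ACU": "Thr",
    "GCA": "Ala", "GCC": "Ala", "GCG": "Ala", "GCU": "Ala",
    "GGA": "Gly", "GGC": "Gly", "GGG": "Gly", "GGU": "Gly",
    "AGC": "Ser", "UCA": "Ser", "UCU": "Ser",
    "AUA": "Ile", "AUC": "Ile", "AUU": "Ile",
}


def translate(dr):
    """Translate a space-separated run of codons into amino acids."""
    return tuple(CODON_TO_AA[codon] for codon in dr.split())


# Table 4: plasmid number -> (DR codons, mean activity in Miller units).
TABLE4 = {
    367: ("GCA GCG GGU UCU AUU", 4.2),
    368: ("GCA GCC GGG AGC AUC", 5.9),
    364: ("GCA GCA GGC AGC AUU", 16),
    366: ("GCA GCA GGA UCA AUA", 23),
    370: ("GCA GCA GGC AGC AUC", 35),
}

# Table 3: DR codons of the synonymous pairs (384/386 repeat 383/385 with AUG).
TABLE3 = {
    383: "AAA AAG AAA AAC AUU", 385: "AAA AAA AAA AAU AUA",
    387: "AAA AAG ACA GCU AUC", 388: "AAA AAA ACU GCA AUA",
    389: "GCA AAA AAG ACA GCU", 390: "GCA AAA AAA ACU GCA",
}
TABLE3_PAIRS = [(383, 385), (387, 388), (389, 390)]

table4_peptides = {translate(dr) for dr, _ in TABLE4.values()}
table3_pairs_match = all(
    translate(TABLE3[a]) == translate(TABLE3[b]) for a, b in TABLE3_PAIRS
)
activities = [activity for _, activity in TABLE4.values()]
fold_range = max(activities) / min(activities)

print("distinct peptides in Table 4:", len(table4_peptides))
print("Table 3 pairs synonymous:", table3_pairs_match)
print(f"activity range in Table 4: {fold_range:.1f}-fold")
```

A single peptide across all five Table 4 constructs, combined with a several-fold spread in activity, is the observation behind the conclusion that the mRNA sequence rather than the encoded peptide carries the effect.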
The results presented here show that the influence of the DR sequence is not limited to the +2 codon, since the nature of all five codons is of importance for the effect on gene expression. The results suggest that, given variation only within the five codons of the DR, gene expression reaches an upper limit that is almost 50 times higher than the lowest one if AUG is the start codon and the gene is preceded by the weak SD− sequence. If the gene is preceded by the strong SD+ sequence, expression is higher and the difference is about five times. The SD+ sequence can increase gene expression between 2 and 50 times compared to the SD− counterpart, depending on the initiation codon and the DR sequence.

4.2. Effects of SD and naturally occurring DRs
Data from chemical crosslinking and different protection assays suggest that during initiation the SD region in the
mRNA binds to the 16S rRNA in the small ribosomal subunit, which covers a region on both sides of the initiation codon. Even though different suggestions for the exact extension of this region have been presented, the ribosome should cover the codons that constitute the DR downstream of the initiation codon, besides the Shine-Dalgarno sequence and the initiation codon itself (Kang and Cantor, 1985; Hartz et al., 1988; Stenström et al., 2001a and references therein). A remarkable stimulation of translation is obtained by a CA repeat sequence downstream of the initiation codon (Martin-Farmer and Janssen, 1999). However, inspection of the sequences analyzed here does not indicate any such motif that could explain the observed effects on gene expression.

For the DR from the lacI gene with the weak GUG initiation codon, the effect of SD+ is almost 50-fold. This demonstrates the strong compensation that can be achieved by SD+ when a weak near-cognate initiation codon is used. For the other construct containing a weak initiation triplet (spaA), the effect of SD+ is almost 20 times. Similarly, the AUG-containing araC construct shows a difference of 25 times when the SD+ and SD− derivatives are compared. This indicates that the enhancing effect of the SD+ sequence on weak initiation codons depends on the context of the DR sequence.

The +2 codon of a very efficient DR sequence (DR-A) found in the trpL gene has been shown to influence expression up to 20-fold (Stenström et al., 2001b). To analyze whether the +2 codon also influences gene expression for other DRs, the codons AAA and GGG were chosen as efficient and inefficient +2 codons, respectively. Four of the DRs (tsf, trpB, araC and galK), together with an upstream SD− sequence, were changed in position +2. All four constructs with AAA showed increased expression compared to the original test construct.
Three of the constructs (tsf, trpB and galK) showed a decrease with GGG compared to their natural variant, while araC followed by either AAA or GGG showed a slight increase from an already low value (Tables 1 and 2). AAA is better than GGG in all four contexts (Table 2). With AAA as the +2 codon together with SD−, expression can be comparable to, or even higher than, that found for several of the native DRs, even if these were preceded by an SD+ structure. Secondary structure models of ribosomal RNAs have shown that approximately 40% of the nucleotides in unpaired regions are A residues (Gutell et al., 1985), suggesting that an mRNA rich in A residues is unstructured and thus favorable for translation initiation.

We found that different codons in the same codon family introduced large variations in expression. Comparison of such DRs based on different iso-codons, thus generating an identical protein, could give an approximately 8-fold difference (Table 4). This iso-codon effect strongly suggests that the influence on gene expression originates from the base sequence of the mRNA, and not from the encoded amino acid sequence of the protein product, even though direct proof for this is difficult to obtain. Introduction of five minor AGG codons downstream of the initiation triplet inhibited expression of β-galactosidase almost completely (Chen and Inouye, 1994). The decrease was suggested to be the result of a limited availability of charged tRNAs for the minor codon, causing ribosome stalling. This would then lead to masking of the initiation site for incoming ribosomes.

AAA is the most common and most expression-promoting codon in position +2 (Stenström et al., 2001b), and it is naturally found in DR-A, representing trpL. This codon was analyzed at positions +3 and +3/+4 of DR-A. A slight increase in expression was seen both when the DR was preceded by SD+ and when it was preceded by SD−. Introduction of one more AAA codon in position +5 decreased expression by 40–50% (Table 5). Thus, it is possible that a local decrease of the Lys-tRNA^Lys pool could also lower gene expression to a certain extent.

The negative influence of mRNA secondary structure on translation initiation has been well documented (Stenström et al., 2001a and references therein). However, mutations in the ribosome binding site that affect translation have been found which cannot be explained in terms of mRNA secondary structure (Looman et al., 1987; Stenström et al., 2001a and references therein). Instead, attention should be paid to the proper sequence of the ribosome binding site and not only to unfavorable secondary structures (Dreyfus, 1988). It has been pointed out (Jacques and Dreyfus, 1990) that translation is initiated before transcription is completed, and hence before the mRNA reaches a fully folded structure. RNA polymerase encounters pause sites in the lac message (starting from base +180), or even terminates transcription, if it is not closely followed by a translating ribosome (Rutenhouser and Richardson, 1989; Stanssens et al., 1986). This indicates that the secondary structure of the mRNA is probably of minor importance when the mRNA has just been transcribed.
Thus, the ribosome closely following the RNA polymerase can enter the ribosome binding sequence while it is still unfolded. The mRNA secondary structures, if stable enough (deSmit and vanDuin, 1994), will only be formed after the first ribosome has elongated further down along the mRNA, thus limiting the effect of secondary structure to the binding of a second ribosome. Even if a certain effect of secondary structure can thus be anticipated, computer analyses (Zuker et al., 1999; Mathews et al., 1999) of the constructs analyzed here, involving 55 bases with the initiation codon in the middle, did not reveal any correlation between gene expression and potential secondary structures. Likewise, we have no reason to expect that differences in mRNA pools resulting from different degradation can explain the observed effects associated with the different DRs analyzed here.

The SD and the DR sequences, upstream and downstream of the initiation codon, respectively, act together on the efficiency of translation initiation, but they can also independently give high gene expression. Optimal translation initiation, giving a high yield of the cognate protein, is detected only if efficient variants of both sequences are present. In fact, the otherwise weak UUG initiation codon can be as efficient as AUG provided it is surrounded by a strong SD and a strong DR (Stenström et al., 2001a). The SD sequence anneals to the anti-SD region of the 16S rRNA, making the first contact between the mRNA and the 30S subunit. Some nucleotides in the DR could form base pairs with different locations on the 16S rRNA (Sergiev et al., 1997), and such interactions could possibly affect translation efficiency. Drop-off of short peptidyl-tRNAs must also be considered as a possible explanation for the variable effects on gene expression observed with various DRs.

The reason for the +2 codon effect on gene expression is not clear, although an involvement of the respective decoding tRNA seems likely (Stenström et al., 2001b). Other possible explanations have been put forward, such as interactions with the 16S rRNA or with the initiation factors (Stenström et al., 2001b). For instance, it has been shown that IF3 can discriminate against non-canonical initiation codons and that IF1 contains an oligomer-binding motif (Stenström et al., 2001a and references therein).
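The interplay between the SD and the DR described above can be made concrete by computing, for each DR in Table 1, how much the strong SD raises expression over the weak SD. The sketch below does this from the published mean activities; the variable and function names are illustrative, and the standard deviations of the measurements are ignored for simplicity.

```python
# Table 1: gene -> (start codon, activity with strong SD, activity with weak SD),
# mean β-galactosidase activities in Miller units.
TABLE1 = {
    "fusA": ("AUG", 471, 71), "tsf": ("AUG", 138, 23),
    "trpL": ("AUG", 664, 234), "alaS": ("AUG", 160, 14),
    "prfA": ("AUG", 226, 24), "galK": ("AUG", 270, 107),
    "trpB": ("AUG", 616, 276), "aroG": ("AUG", 387, 19),
    "aroH": ("AUG", 127, 10), "malF": ("AUG", 224, 64),
    "trpA": ("AUG", 132, 9), "lacI": ("GUG", 148, 3),
    "araC": ("AUG", 202, 8), "spaA": ("UUG", 518, 27),
}


def sd_enhancement(strong_sd_activity, weak_sd_activity):
    """Fold increase in expression attributable to the strong SD."""
    return strong_sd_activity / weak_sd_activity


enhancement = {
    gene: sd_enhancement(strong, weak)
    for gene, (_, strong, weak) in TABLE1.items()
}

# List the DRs from the smallest to the largest SD effect.
for gene in sorted(enhancement, key=enhancement.get):
    start_codon = TABLE1[gene][0]
    print(f"{gene:5s} {start_codon}  {enhancement[gene]:5.1f}-fold")
```

Sorting by the ratio shows that the DRs giving the highest expression with the weak SD (trpB, galK, trpL) gain least from the strong SD, while DRs with low weak-SD activity, including the GUG-initiated lacI construct, gain most.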
Acknowledgements This work has been supported by grants from NFR, SSF and TFR to LAI.
References
Amann, E., Ochs, B., Abel, K.J., 1988. Tightly regulated tac promoter vectors useful for the expression of unfused proteins in Escherichia coli. Gene 69, 301–315.
Björnsson, A., Isaksson, L.A., 1993. UGA codon context which spans three codons. J. Mol. Biol. 232, 1017–1029.
Chen, G.-F.T., Inouye, M., 1994. Role of the AGA/AGG codons, the rarest codons in global gene expression in Escherichia coli. Genes Dev. 8, 2641–2652.
deSmit, M.H., vanDuin, J., 1994. Control of translation by mRNA secondary structure in Escherichia coli. J. Mol. Biol. 244, 144–150.
Dong, H., Nilsson, L., Kurland, C.G., 1996. Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J. Mol. Biol. 260, 649–663.
Dreyfus, M., 1988. What constitutes the signal for the initiation of protein synthesis on Escherichia coli mRNAs? J. Mol. Biol. 204, 79–94.
Faxén, M., Plumbridge, J., Isaksson, L.A., 1991. Codon choice and potential complementarity between mRNA downstream of the initiation codon and bases 1471–1480 in the 16S ribosomal RNA affects expression of glnS. Nucleic Acids Res. 19, 5247–5251.
Firpo, M.A., Dahlberg, A.E., 1998. The importance of base pairing in the penultimate stem of Escherichia coli 16S rRNA for ribosomal subunit association. Nucleic Acids Res. 26, 2156–2160.
Gutell, R.R., Weisser, B., Woese, C.R., Noller, H.F., 1985. Comparative anatomy of 16S-like ribosomal RNA. Prog. Nucleic Acid Res. Mol. Biol. 32, 155–216.
Hartz, D., McPheeters, D.S., Traut, R., Gold, L., 1988. Extension inhibition analysis of translation initiation complexes. Methods Enzymol. 164, 419–425.
Jacques, N., Dreyfus, M., 1990. Translation initiation in Escherichia coli: old and new questions. Mol. Microbiol. 4, 1063–1067.
Kang, C., Cantor, C.R., 1985. Structure of ribosome-bound messenger RNA as revealed by enzymatic accessibility studies. J. Mol. Biol. 181, 241–251.
Looman, A.C., Bodlaender, J., Comstock, L.J., Eaton, D., Jhurani, P., deBoer, H.A., van Knippenberg, P.H., 1987. Influence of the codon following the AUG initiation codon on the expression of a modified lacZ gene in Escherichia coli. EMBO J. 6, 2489–2492.
Martin-Farmer, J., Janssen, G.R., 1999. A downstream CA repeat sequence increases translation from leadered and unleadered mRNA in Escherichia coli. Mol. Microbiol. 31, 1025–1038.
Mathews, D.H., Sabina, J., Zuker, M., Turner, D.H., 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911–940.
Miller, J.H., 1972. Experiments in Molecular Genetics, Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
Mottagui-Tabar, S., Björnsson, A., Isaksson, L.A., 1994. The second to last amino acid in the nascent peptide as a codon context determinant. EMBO J. 13, 249–257.
Neidhardt, F.C., Bloch, P.L., Pedersen, S., Reeh, S., 1977. Chemical measurement of steady-state levels of ten aminoacyl-tRNA synthetases in E. coli. J. Bacteriol. 129, 378–387.
O’Connor, M., Asai, T., Squires, C.L., Dahlberg, A.E., 1999. Enhancement of translation by the downstream box does not involve base pairing of mRNA with the penultimate stem sequence of 16S rRNA. Proc. Natl. Acad. Sci. USA 96, 8973–8978.
Rutenhouser, E.C., Richardson, J.P., 1989. Identification and characterization of transcription termination sites in the Escherichia coli lacZ gene. J. Mol. Biol. 208, 23–43.
Schneider, T.D., Stormo, G.D., Gold, L., Ehrenfeucht, A., 1986. Information content of binding sites on nucleotide sequences. J. Mol. Biol. 188, 415–431.
Sergiev, P.V., Lavrik, I.N., Wlasoff, V.A., Dokudovskaya, S.S., Dontsova, O.A., Bogdanov, A.A., Brimacombe, R., 1997. The path of mRNA through the bacterial ribosome: a site-directed crosslinking study using new photoreactive derivatives of guanosine and uridine. RNA 3, 464–475.
Sharp, P.M., Burgess, C.J., 1992. Selective use of termination codons and variations in codon choice. In: Hatfield, D.L., Lee, B.J., Pirtle, R.M. (Eds.). Transfer RNA in Protein Synthesis, CRC Press Ltd, Boca Raton, FL, pp. 397–425.
Shine, J., Dalgarno, L., 1974. The 3′-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to non-sense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71, 1342–1346.
Stanssens, P., Remaut, E., Fiers, W., 1986. Inefficient translation initiation causes premature transcription termination in the lacZ gene. Cell 44, 711–718.
Stenström, C.M., Holmgren, E., Isaksson, L.A., 2001a. Cooperative effects by the initiation codon and its flanking regions on translation initiation. Gene 273, 259–265.
Stenström, C.M., Jin, H., Major, L.L., Tate, W.P., Isaksson, L.A., 2001b. Codon bias at the 3′-side of the initiation codon is correlated with translation initiation efficiency in Escherichia coli. Gene 263, 273–284.
vanEtten, W.J., Janssen, G.R., 1998. An AUG initiation codon, not codon-anticodon complementarity, is required for the translation of unleadered mRNA in Escherichia coli. Mol. Microbiol. 27, 987–1001.
Wu, C.J., Janssen, G.R., 1996. Translation of vph mRNA in Streptomyces lividans and Escherichia coli after removal of the 5′ untranslated leader. Mol. Microbiol. 22, 339–355.
Zuker, M., Mathews, D.H., Turner, D.H., 1999. Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In: Barciszewski, J., Clark, B.F.C. (Eds.). RNA Biochemistry and Biotechnology, Kluwer Academic Publishers, Dordrecht, pp. 11–43.