J. Mol. Biol. (1982) 156, 807-820

Analysis of Ribosome Binding Sites from the s1 Message of Reovirus
Initiation at the First and Second AUG Codons

MARILYN KOZAK

Department of Biological Sciences
University of Pittsburgh
Pittsburgh, PA 15260, U.S.A.

(Received 15 October 1981)
Two ribosome-protected initiation sites from the s1 message of reovirus have been characterized. Comparison of these sites with the previously determined sequence of s1 mRNA (Li et al., 1980) reveals that wheat germ ribosomes select and protect the first two AUG triplets in that message. This is unusual, since ribosomes initiate at a single site, the 5′-proximal AUG, in almost all other eukaryotic messenger RNAs that have been examined. The first AUG codon in s1 mRNA is preceded by a pyrimidine in position -3, thus distinguishing it from most other eukaryotic messages, which have a purine (usually A) in that position. The behavior of s1 mRNA is consistent with the hypothesis that flanking nucleotides modulate the efficiency with which the migrating 40 S ribosomal subunit recognizes an AUG codon as a stop signal. If the first AUG triplet is flanked by suboptimal sequences, as in s1 mRNA, some 40 S ribosomes bypass that site and initiate at the next AUG downstream. The second AUG in the s1 message conforms to the consensus sequence (A-N-N-A-U-G-G) for eukaryotic initiation sites.
0022-2836/82/120807-14 $03.00/0 © 1982 Academic Press Inc. (London) Ltd.

1. Introduction

Most eukaryotic messenger RNAs are functionally monocistronic. Ribosomes initiate at a single site, usually at the AUG codon that lies closest to the 5′-terminal cap. To explain this curious restriction, I have postulated that the 40 S ribosomal subunit binds initially at or near the 5′ end of a message and subsequently migrates along the RNA chain, stopping when it encounters the first AUG triplet which, solely by virtue of its position, is the initiator codon (Kozak, 1978, 1980a, 1981a). This "scanning mechanism" accounts for many aspects of translational initiation in eukaryotes, but it fails to explain those few instances in which translation begins at an AUG codon that is not 5′-proximal (reviewed by Kozak, 1981a) or at two AUG codons in the same message (Celma & Ehrenfeld, 1975; Piatak et al., 1979; Preston & McGeoch, 1981). The s1 message of reovirus falls into the latter category; i.e. ribosomes appear to initiate at the first and the second AUG triplets in s1 mRNA. Evidence documenting this unusual behavior is presented below.

The sequence flanking the first AUG triplet in the s1 message is C-G-G-A-U-G-G. The presence of a pyrimidine in position -3 distinguishes s1 mRNA from all other reovirus messages (and nearly all other eukaryotic messages), in which a purine occurs three nucleotides before the AUG initiator codon. I have recently proposed that flanking nucleotides might modulate the efficiency with which the migrating 40 S ribosomal subunit recognizes an AUG codon as a stop signal (Kozak, 1981b). Simply stated, the hypothesis is that all 40 S subunits stop when they encounter an AUG triplet preceded by a purine in position -3. On the other hand, an AUG codon preceded by a pyrimidine in position -3 is "leaky": some 40 S subunits stop, while some advance to the next AUG codon†. The present studies with the s1 message of reovirus provide support for this idea.
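The context rules of the modified scanning model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and label names (`classify_context`, `scan`, "strong", "leaky", "weak") are hypothetical, the rules are paraphrased from the footnote on the modified scanning model, and the 76-nucleotide sequence is the 5′-terminal sequence of s1 mRNA as given in Fig. 4. The case of a pyrimidine in position -3 with A in position +4 is not covered by the stated rules and is reported as "unspecified".

```python
# 5'-terminal 76 nucleotides of reovirus s1 mRNA (Fig. 4; Li et al., 1980).
S1_5PRIME = ("GCUAUUGGUCGGAUGGAUCCUCGCCUACGUGAAGAAGUAG"
             "UACGGCUGAUAAUCGCAUUAACGAGUGAUAAUGGAG")

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def classify_context(seq, i):
    """Classify the AUG at 0-based index i from positions -3 and +4.

    Labels are hypothetical shorthand for the footnote rules:
    A-N-N-A-U-G-N and G-N-N-A-U-G-R -> "strong" (all 40 S subunits stop),
    G-N-N-A-U-G-Y and Y-N-N-A-U-G-G -> "leaky",
    Y-N-N-A-U-G-Y                   -> "weak" (very weak or non-functional).
    """
    if seq[i:i + 3] != "AUG":
        raise ValueError("no AUG at this index")
    if i < 3 or i + 3 >= len(seq):
        return "incomplete context"
    minus3, plus4 = seq[i - 3], seq[i + 3]
    if minus3 == "A":
        return "strong"
    if minus3 == "G":
        return "strong" if plus4 in PURINES else "leaky"
    # Pyrimidine in position -3.
    if plus4 == "G":
        return "leaky"
    if plus4 in PYRIMIDINES:
        return "weak"
    return "unspecified"  # Y-N-N-A-U-G-A is not covered by the stated rules


def scan(seq):
    """Return (1-based position, label) for every AUG, in 5'-to-3' order."""
    hits = []
    i = seq.find("AUG")
    while i != -1:
        hits.append((i + 1, classify_context(seq, i)))
        i = seq.find("AUG", i + 1)
    return hits


for position, label in scan(S1_5PRIME):
    print(position, label)
```

Applied to the s1 sequence, the sketch reports the AUG at position 13 as leaky (C in position -3, G in position +4) and the AUG at position 71 as strong (A in position -3).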
2. Materials and Methods

(a) Synthesis and fractionation of reovirus mRNAs

The Dearing strain of reovirus type 3 was grown as described by Banerjee & Shatkin (1970). Purified virions, treated with chymotrypsin to activate the associated transcriptase, were incubated in the presence of one [α-32P]-labeled nucleoside triphosphate (New England Nuclear) and 3 non-radioactive triphosphates. The reaction conditions for transcription in vitro were as described by Both et al. (1975) except that the nucleoside triphosphate concentration was adjusted to give a specific activity of 6 × 10^6 cts/min per µg of RNA. During the 2 h incubation, all 10 reovirus messenger RNAs (4 small, 3 medium and 3 large species) were transcribed. The 3 size classes were resolved by sucrose gradient centrifugation (Both et al., 1975). Only the small size class of mRNA was used in this study.

(b) Characterization of ribosome-protected initiation sites

A mixture of the 4 small-sized reovirus mRNAs (s1, s2, s3 and s4) was incubated with wheat germ ribosomes in a reaction mixture containing 30 mM-HEPES (pH 7.4), 72 mM-KCl, 2.5 mM-magnesium acetate, 2 mM-dithiothreitol, 8 mM-creatine phosphate, creatine phosphokinase (40 µg/ml), 1 mM-ATP, 0.24 mM-GTP and 200 µM-sparsomycin. (Sparsomycin was a gift from the Drug Research and Development Division of the National Cancer Institute.) After 10 min at 20°C to allow formation of initiation complexes, T1 RNAase was added at 300 units/ml and incubation continued for 10 min at 20°C. The reactions were then chilled and layered onto 12-ml 10% to 30% (v/v) glycerol gradients. After centrifugation at 39,000 revs/min for 3 h at 4°C, gradient fractions were collected into vials containing phenol. The peak of radioactive material associated with 80 S ribosomes was located by monitoring Cerenkov radiation. The techniques involved in purifying, fingerprinting and sequencing the 80 S ribosome-protected mRNA fragments were as described by Kozak & Shatkin (1977a,b).
Ribosome-protected initiation fragments, which derived from all 4 small-sized messages, were fractionated in 2 steps. Electrophoresis through a 20% (w/v) polyacrylamide gel separated the s2 ribosome binding site (46 nucleotides in length) from the other protected fragments, which formed a broad band in the range of 28 to 38 nucleotides (see Kozak & Shatkin, 1977b, Fig. 1B). The latter set of fragments was eluted from the gel and subjected to 2-dimensional fractionation: electrophoresis at pH 3.5 on cellulose acetate was used for the first dimension, and homochromatography-b on a DEAE-cellulose thin-layer plate for the second (Barrell, 1971; Kozak & Shatkin, 1977b). This procedure resolved all of the component oligonucleotides, which could be eluted from the thin-layer chromatogram and individually sequenced. Deduction of the nucleotide sequence was facilitated by using nearest-neighbor transfer data, as described by Lebowitz et al. (1971) and Pieczenik et al. (1974).

† A more precise formulation of the modified scanning model is that all 40 S subunits stop migrating when they encounter either A-N-N-A-U-G-N or G-N-N-A-U-G-R (N = any nucleotide, R = purine, Y = pyrimidine). The sequences G-N-N-A-U-G-Y and Y-N-N-A-U-G-G are leaky; i.e. some 40 S subunits stop, while some advance to the next AUG triplet. An AUG triplet flanked by pyrimidines in both positions -3 and +4 is either very weak or non-functional. The rationale for these rules has been described (Kozak, 1981b).
3. Results

(a) Fractionation of ribosome-protected initiation fragments from the small size class of reovirus mRNA

When a mixture of the four small-sized messenger RNAs was incubated with wheat germ ribosomes in the presence of sparsomycin and the resulting 80 S initiation complexes were trimmed with T1 RNAase, six ribosome-protected fragments, in the size range of 28 to 38 nucleotides, were resolved (Fig. 1(a)). This set of protected fragments derived from three of the input messages, s1, s3 and s4. (The ribosome binding site from the s2 message is quite big, 46 nucleotides, and therefore it was separated from all the others during the polyacrylamide gel electrophoresis step that preceded the fractionation in Figure 1(a).) The two major spots in Figure 1(a), as well as minor spot s4′, were characterized previously (Kozak & Shatkin, 1977b). They derive from the s3 and s4 messages, as indicated in Table 1. Minor fragments s1a and s1b are the subject of the present
FIG. 1. (a) Fractionation of the 80 S ribosome-protected fragments from the small size class of reovirus mRNA. After trimming 80 S initiation complexes with T1 RNAase and fractionating the protected mRNA fragments in a 20% (w/v) polyacrylamide gel, 32P-labeled oligonucleotides in the size range of 28 to 38 nucleotides were eluted from the gel and subjected to 2-dimensional analysis, as described in Materials and Methods. The first dimension involved electrophoresis on cellulose acetate, and the second dimension involved chromatography on DEAE-cellulose using homomixture b. An autoradiogram is shown. Five of the ribosome-protected fragments in this Figure can be assigned to reovirus messages s1, s3 and s4 (see Table 1). The sixth spot (unlabeled) has not been identified. (b) T1 RNAase fingerprint of ribosome binding site s1b labeled with [α-32P]GTP. Oligonucleotides are numbered as in Table 2. (c) T1 RNAase fingerprint of ribosome binding site s1a labeled with [α-32P]GTP. Oligonucleotides are numbered as in Table 4. The dotted circle in (b) and (c) indicates the position of the tracking dye, xylene cyanol FF blue. Polyethyleneimine-cellulose thin-layer plates and homomixture c were used for the second dimension fractionation in (b) and (c).
TABLE 1
Summary of ribosome binding sites derived from the four small messenger RNAs of reovirus type 3

                    40 S ribosome-    80 S ribosome-protected fragments§
                    protected         Original       Renamed
mRNA†    Protein    fragment‡         designation    (Fig. 1(a))
s1       σ1         --‖               Y and Z        s1a and s1b
s2       σ2         s46               wvl            (not shown in Fig. 1(a))¶
s3       σNS        s45               X              s3
s4       σ3         s54               W and W′       s4 and s4′
† The correlation between a given message, its 40 S ribosome binding site and the encoded protein is based on data presented by McCrae & Joklik (1978), Mustoe et al. (1978), Darzynkiewicz & Shatkin (1980) and Levin & Samuel (1980). Transcription of each genome segment (S1, S2, etc.) yields a colinear mRNA, which carries the corresponding lower-case designation (s1, s2, etc.).
‡ The ribosome binding sites described in the Table are the regions of each message protected against T1 RNAase by wheat germ 40 S or 80 S (sparsomycin-blocked) ribosomes. The numbers assigned to 40 S ribosome binding sites (Kozak & Shatkin, 1977b; Kozak, 1977) indicate the length of each protected fragment. The 40 S-protected fragment from the s2 message, for example, is 46 nucleotides long, counting the m7G cap. Fragments s46 and s45 were called 45R and 45L (right and left) in early papers.
§ 80 S ribosome-protected fragments from the s2, s3 and s4 mRNAs were characterized previously (Kozak & Shatkin, 1977b). The letter designations (W through Z) used in the earlier studies are shown in column 4. The new designations in column 5 correspond to those in Fig. 1(a) of this paper. The justification for assigning fragments s1a and s1b to the s1 message is given in the text. The two 80 S-protected sequences from the s4 message overlap; minor fragment s4′ lacks one oligonucleotide present in the major s4 fragment (Kozak & Shatkin, 1977b). This differs from the s1 message, which yields 2 non-overlapping ribosome binding sites (see the text).
‖ The s1 message yielded no 40 S-protected fragment under conditions identical to those used for the other 3 messenger RNAs (Kozak & Shatkin, 1977b).
¶ Because the 80 S ribosome-protected region of the s2 message is slightly bigger than that derived from the other 3 messages, it was resolved from the others at an early stage of purification, and therefore is not included in the 2-dimensional fractionation shown in Fig. 1(a).
communication. The remaining spot in Figure 1(a), on the diagonal between fragments s1a and s1b, was obtained in low, variable yields and was not analyzed.
In earlier studies, we found that fragment s1a had a capped 5′ terminus (Kozak & Shatkin, 1977b). Since the capped, 5′-proximal initiation sites from messages s2, s3 and s4 had already been identified, we assumed that oligonucleotide s1a was derived from the s1 message, although we were unable to accumulate sufficient amounts of the fragment for detailed analysis. Preliminary experiments revealed that oligonucleotide s1b lacked a cap but had an AUG codon. Although the latter characteristic suggested that it might be a genuine initiation site, we largely ignored fragment s1b, because it was such a minor component of the mixture. The present studies indicate, however, that s1b derives from the same message as fragment s1a. Since the yield of s1b was always equal to or greater than that of s1a, both sequences must be considered potential initiation sites in the s1 message.

The data in the following section are sufficient to deduce the nucleotide sequence of ribosome binding site s1b. Unfortunately, the yield of fragment s1a did not permit a complete sequence analysis. But Joklik and colleagues recently reported the sequence of 76 nucleotides at the 5′ end of the s1 message, which they deduced by analyzing the 3′-terminal sequence of the corresponding "minus" strand (Li et al., 1980). Given that information, the partial sequence data presented below are sufficient to establish that fragment s1a derives from the 5′-terminal 30 nucleotides of the s1 message. Ribosome binding site s1a includes an AUG codon in positions 13 to 15; this is the first AUG in the s1 message. Further comparison with the sequence published by Li et al. (1980) reveals that ribosome-protected fragment s1b derives from the second AUG codon (positions 71 to 73) in the same message.

(b) The sequence of ribosome binding site s1b
Complete digestion of fragment s1b with T1 RNAase yielded six oligonucleotides, as shown in Figure 1(b). The sequence of each T1 oligonucleotide was deduced from analysis of secondary pancreatic RNAase digestion products and from nearest-neighbor analyses, as summarized in Table 2. The oligonucleotides obtained upon
TABLE 2
Analysis of T1 RNAase oligonucleotides derived from fragment s1b

Each T1 RNAase primary product† was analyzed by secondary digestion with pancreatic RNAase (preparations labeled with [α-32P]CTP‡, [α-32P]UTP or [α-32P]GTP) and by digestion with T2 RNAase (preparation labeled with [α-32P]ATP).

T1 RNAase           Sequence deduced for
primary product†    the T1 oligonucleotide§
T1                  U-C-A-A-A-A-G[G]
T2                  C-A-U-U-A-A-C-G[A]
T3                  U-A-U-C-A-C-U-G[U]
T4                  A-U-A-A-U-G[G]
T5                  A-G[U]‖
T6                  U-G[A]
† Oligonucleotides are identified as in Fig. 1(b).
‡ The designations [α-32P]CTP, etc., refer to the labeled ribonucleotide triphosphate precursors. An asterisk in the body of the Table denotes the 3′-nucleoside monophosphate that retained the 32P radioactivity after hydrolysis with alkali or T2 RNAase. A double dash in any column means that the oligonucleotide in question was not labeled under the indicated conditions.
§ All of the oligonucleotides terminate in a 3′ phosphate, which is not shown. The nearest-neighbor residue on the 3′ side of each T1 oligonucleotide is indicated in square brackets.
‖ A-G[U] is present in 2 copies, judging from the intensity of spot T5 in the fingerprint shown in Fig. 1(b).
pancreatic RNAase digestion of fragment s1b are displayed in Figure 2, and described in Table 3. The large pancreatic RNAase oligonucleotides overlapped several of the T1 oligonucleotides. Using this information, a unique sequence for fragment s1b can be deduced, as follows.

(1) Oligonucleotide P1 (G-G-A-G-U[A]) must overlap T4, the only T1 oligonucleotide that terminates in Py-G[G]. The structure of P1 reveals that T4 is followed by T5:

      T4          T5
A-U-A-A-U-G-G-A-G-U-A
              P1

Oligonucleotide T5 must be followed by T3, the only T1 oligonucleotide that begins with U-A. Thus, the order T4-T5-T3 is established.

(2) Pancreatic oligonucleotide P2 (G-A-G-U[G]) contains the second copy of T5 (A-G[U]) and overlaps T6 (U-G[A]), establishing the partial sequence G-A-G-U-G-A. This must be followed by a T1 oligonucleotide that begins with A. The only candidate is T4, since both copies of T5 have already been accounted for. Thus, the partial sequence constructed in (1) follows that deduced in (2); oligonucleotides T5-T6-T4-T5-T3 are contiguous (see Fig. 3).

(3) Nearest-neighbor analysis reveals that T3 is followed by a T1 oligonucleotide that begins with U, and the only remaining candidate is T1 (U-C-A-A-A-A-G[G]).

FIG. 2. Pancreatic RNAase fingerprints of ribosome binding site s1b, labeled with (b) [α-32P]UTP and (c) [α-32P]CTP. In the numbered tracing (a), black spots correspond to oligonucleotides that were labeled with [α-32P]UTP. Oligonucleotide P3 was labeled with [α-32P]CTP, and spots P0 and P0′ were detected only in preparations labeled with [α-32P]GTP or [α-32P]ATP. Cellulose acetate electrophoresis was used for the first dimension separation, and homochromatography on DEAE thin-layer plates for the second. The dotted circle labeled XC indicates the position of xylene cyanol. Oligonucleotides are identified in Table 3.

T1 RNAase products (5′ to 3′): T2, T5, T6, T4, T5, T3, T1
CAUUAACGAGUGAUAAUGGAGUAUCACUGUCAAAAGG
Pancreatic RNAase products (5′ to 3′): P7, P3, P2, P5, P4, P1, P7, P6, P8, P0 and P0′

FIG. 3. Nucleotide sequence of ribosome binding site s1b. Products obtained upon complete digestion with T1 RNAase are prefixed with T, and correspond to the oligonucleotides shown in Fig. 1(b). Products obtained upon complete digestion with pancreatic RNAase are prefixed with P, and correspond to the oligonucleotides described in Fig. 2 and Table 3. Pancreatic RNAase products released as mononucleotides are not labeled. The sequence A-G-U-G-A-U-A immediately preceding the AUG codon might base-pair with the complementary sequence U-A-U-C-A-C-U, which occurs just downstream from the AUG codon. Hyphens have been omitted for clarity.
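The two fingerprints can be cross-checked against the deduced sequence with a small in-silico digestion. The sketch below is an illustration, not part of the original analysis: the names `digest`, `T1_PRODUCTS` and `PANCREATIC_PRODUCTS` are hypothetical, and the only enzymology assumed is that T1 RNAase cleaves on the 3′ side of G and pancreatic RNAase on the 3′ side of pyrimidines (C and U).

```python
import re

# Sequence of ribosome binding site s1b as deduced in Fig. 3 (37 nucleotides).
S1B = "CAUUAACGAGUGAUAAUGGAGUAUCACUGUCAAAAGG"


def digest(seq, cut_after):
    """Split seq on the 3' side of every residue listed in cut_after.

    cut_after is a string of residues, e.g. "G" for T1 RNAase or "CU"
    for pancreatic RNAase. Fragments are returned in 5'-to-3' order.
    """
    pattern = f"[^{cut_after}]*[{cut_after}]|[^{cut_after}]+$"
    return re.findall(pattern, seq)


T1_PRODUCTS = digest(S1B, "G")
PANCREATIC_PRODUCTS = digest(S1B, "CU")

print(T1_PRODUCTS)
print(PANCREATIC_PRODUCTS)

# Distinct T1 oligonucleotides longer than a mononucleotide.
print(sorted({f for f in T1_PRODUCTS if len(f) > 1}, key=len))
```

Under these assumptions the T1 digest contains six distinct oligonucleotides (with A-G occurring twice), in the order T2, T5, T6, T4, T5, T3, T1, and the pancreatic digest contains A-U twice, once followed by U and once by C, which agrees with the two P7 entries in Fig. 3.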
TABLE 3
Analysis of pancreatic RNAase oligonucleotides derived from fragment s1b

Each pancreatic RNAase primary product† was analyzed by secondary digestion with T1 RNAase (preparations labeled with [α-32P]CTP‡, [α-32P]UTP or [α-32P]GTP) and by digestion with T2 RNAase (preparation labeled with [α-32P]ATP‖).

Pancreatic RNAase
primary product†     Sequence deduced§
P0                   A-A-A-A-G[G]
P0′                  A-A-A-A-G-G[X]¶
P1                   G-G-A-G-U[A]
P2                   G-A-G-U[G]
P3                   A-A-C[G]
P4                   A-A-U[G]
P5                   G-A-U[A]
P6                   A-C[U]
P7                   A-U[C, U]
P8                   G-U[C]
† Oligonucleotides are identified as in Fig. 2.
‡, § See the corresponding footnotes to Table 2.
‖ Neither UMP nor CMP was visible in the pancreatic RNAase fingerprint obtained with [α-32P]ATP-labeled material. Only a small amount of radioactive RNA was available for that fingerprint, however. Mononucleotides tend to diffuse more than oligonucleotides, making detection more difficult.
¶ The structure suggested for oligonucleotide P0′ is consistent with its mobility in 2-dimensional fingerprints; i.e. slightly faster than P0 in the first dimension, and slightly slower than P0 in the second dimension. P0 and P0′ presumably derive from staggered cutting by T1 RNAase at the 3′ end of the ribosome-protected fragment.
The order T3-T1 also accounts for oligonucleotide P8 (G-U[C]). The large pancreatic oligonucleotides P0 and P0′ are contained within T1. Since P0 and P0′ are the only oligonucleotides in the pancreatic RNAase fingerprint that lack a 3′ pyrimidine, they must be derived from the 3′ end of fragment s1b; thus oligonucleotide T1 is located at the 3′ end of the ribosome-protected fragment.

(4) The only remaining T1 oligonucleotide is T2, which must be located at the 5′ end of fragment s1b.

The complete sequence is presented in Figure 3. Comparison of sequence s1b with the 5′-proximal portion of the s1 message (Li et al., 1980) reveals that nucleotides 56 to 76 of that message are identical to the first 21 nucleotides of s1b. Thus, ribosome binding site s1b derives from the second AUG triplet in the s1 message.

(c) Partial sequence analysis of ribosome binding site s1a
Although fragment s1a yielded a rather simple T1 fingerprint (Fig. 1(c)), I have not been able to determine its complete nucleotide sequence. In addition to the problem of low yield, other complications included the tendency of oligonucleotide
TABLE 4
Analysis of T1 RNAase oligonucleotides derived from fragment s1a

T1 RNAase primary products† T0 to T4 were analyzed by secondary digestion with pancreatic RNAase‡ in preparations labeled with [α-32P]GTP, [α-32P]ATP, [α-32P]CTP and [α-32P]UTP. Oligonucleotide T0 contains the cap§.
† Oligonucleotides are numbered as in Fig. 1(c).
‡ The designations [α-32P]GTP, etc., refer to the labeled ribonucleotide triphosphate precursors. An asterisk denotes the 3′-nucleoside monophosphate that retained the 32P radioactivity after tertiary digestion with T2 RNAase. The amount of radioactive material was not adequate for tertiary analysis on the [α-32P]UTP-labeled preparation. The designation n.d. means that the analysis was not done, due to an insufficient amount of radioactive material; the oligonucleotide in question was labeled, however. A double dash in any column means that the oligonucleotide in question was not labeled under the indicated conditions.
§ The designation cap as used here refers to the structure m7GpppGmpCp.
T1 to undergo secondary splitting during digestion with T1 RNAase, the tendency of oligonucleotide T0 to streak during fingerprinting (as do many other capped oligonucleotides), and the presence of a few contaminating oligonucleotides in the T1 fingerprint (see unlabeled spots in Fig. 1(c)). Nevertheless, when the partial set of data assembled in Table 4 is compared with the sequence of the s1 message determined by Li et al. (1980), it is clear that ribosome binding site s1a derives from the 5′ end of the s1 message. The correspondence between the T1 oligonucleotides of s1a and the sequence of the s1 message is shown in Figure 4. An AUG codon is located close to the center of fragment s1a, just as we have found with all other ribosome-protected initiation sites from reovirus mRNAs.

        10         20         30         40         50         60         70
GCUAUUGGUC GGAUGGAUCC UCGCCUACGU GAAGAAGUAG UACGGCUGAU AAUCGCAUUA ACGAGUGAUA AUGGAG
             ---                                                             ---

FIG. 4. Sequence at the 5′ end of the reovirus s1 message (serotype 3), taken from Li et al. (1980). The nucleotide in position 11, which they were unable to identify, appears to be G based on the data in Table 4. AUG triplets are underlined. T1 oligonucleotides from ribosome-protected fragment s1a are indicated above the sequence. Nucleotides 56 to 76 overlap with ribosome-protected fragment s1b (see Fig. 3 and Discussion). Hyphens have been omitted for clarity.

(d) Accessibility of the two initiation sites in s1 mRNA

Binding of wheat germ ribosomes to site s1b is particularly sensitive to variations in magnesium concentration. Figure 5(a) shows that, when the magnesium
FIG. 5. Binding of wheat germ ribosomes to reovirus mRNA at low versus high magnesium concentration. Binding was carried out with a mixture of the 4 small-sized mRNAs, labeled with [α-32P]CTP. (a) and (b) show the 80 S ribosome-protected fragments obtained at (a) 3 mM and (b) 1.5 mM magnesium. The fractionation was carried out as in Fig. 1(a). Before addition of nuclease, samples of each reaction mixture were analyzed by glycerol gradient centrifugation. (c) Binding reactions, carried out at 1.5 (●) and 3 mM-magnesium (○), were analyzed in parallel; the gradient profiles are superimposed. Centrifugation was from right to left. Labels in (c): 80 S monosomes; unbound mRNA. Horizontal axis: fraction number.
concentration was increased to 3 mM, site s1a was protected (along with the more abundant initiation sites from mRNA species s3 and s4), but ribosome binding site s1b was not evident. When binding was carried out at slightly lower magnesium concentrations (1.5 mM in Fig. 5(b) or 2.5 mM in Fig. 1(a)), site s1b was readily detected. Under the low magnesium conditions that permitted binding at both sites s1a and s1b, a disome peak sedimenting ahead of the 80 S monosomes could be seen in glycerol gradients (Fig. 5(c), filled circles). Since the major change that occurred upon lowering the magnesium concentration was enhanced binding of ribosomes to site s1b (compare Fig. 5(a) with (b)), it seems reasonable to interpret the disome peak as an indication that, in a single mRNA molecule, both sites s1a and s1b are accessible to ribosomes. Formation of disomes makes it unlikely that the internal site s1b is active only after cleaving the message to eliminate the upstream AUG codon.

When binding of the four small-sized mRNAs was carried out in the presence of 0.8 mM-m7Gp, the overall level of mRNA bound was reduced by 75%. Analysis of
the ribosome-protected initiation sites obtained in the presence of the cap analogue revealed no enrichment for site s1b, as might be expected if that site were accessible only in partially degraded, uncapped RNA fragments.
4. Discussion

Although ribosome binding sites from other reovirus mRNAs were sequenced several years ago, the initiation sites from the s1 message have been difficult to analyze because of their low yield. It is not clear whether this is due to inefficient transcription of the s1 message in vitro, or inefficient binding of ribosomes to that message; it is likely that both steps are inefficient. Despite the low yield, the partial sequence data reported for initiation site s1a are sufficient to confirm that it derives from the 5′ end of the s1 message, the sequence of which was deduced by Li et al. (1980). Fragment s1a includes the cap and the first AUG codon (positions 13 to 15). It was not surprising to identify that region of s1 mRNA as a ribosome-binding site, in view of the similar results obtained with other reovirus messages (Kozak, 1977).

The unexpected finding was that initiation complexes also formed at the second AUG codon in s1 mRNA. The first 21 nucleotides of ribosome-protected fragment s1b (Fig. 3) correspond to the last 21 nucleotides of the sequence reported by Li et al. (1980) (Fig. 4), thus establishing the position of fragment s1b within that message. Since the yield of fragment s1b was always equal to or greater than that of s1a, the conclusion is that wheat germ ribosomes form initiation complexes at the first two AUG codons in the s1 message of reovirus. The s1 message is unusual in this regard, since ribosomes initiate at a single site (namely the 5′-proximal AUG) in almost all other eukaryotic messenger RNAs that have been examined (Kozak, 1981a), including eight other reovirus messages (Kozak & Shatkin, 1977a,b; Kozak, 1977, 1982).

What causes ribosomes to protect two sites in s1 mRNA? The phenomenon cannot be dismissed simply as an in vitro artifact, because the conditions used for ribosome binding with the s1 message were identical to those used with the other reovirus mRNAs. Thus, there is something peculiar about the s1 message itself.
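The placement of fragment s1b within the s1 message, described above as a 21-nucleotide match between the start of s1b and the end of the sequence of Li et al. (1980), can be reproduced with a simple suffix-prefix comparison. This is a minimal sketch under stated assumptions: the helper name `longest_overlap` is hypothetical, and the two sequences are those shown in Figs 3 and 4.

```python
# 5'-terminal 76 nucleotides of s1 mRNA (Fig. 4) and fragment s1b (Fig. 3).
S1_5PRIME = ("GCUAUUGGUCGGAUGGAUCCUCGCCUACGUGAAGAAGUAG"
             "UACGGCUGAUAAUCGCAUUAACGAGUGAUAAUGGAG")
S1B = "CAUUAACGAGUGAUAAUGGAGUAUCACUGUCAAAAGG"


def longest_overlap(left, right):
    """Length of the longest suffix of left that is also a prefix of right."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:n]):
            return n
    return 0


overlap = longest_overlap(S1_5PRIME, S1B)
s1b_start = len(S1_5PRIME) - overlap + 1   # 1-based position of s1b in s1 mRNA
merged = S1_5PRIME + S1B[overlap:]         # combined 5'-proximal sequence

# 1-based positions of every AUG triplet in the combined sequence.
aug_positions = [i + 1 for i in range(len(merged) - 2) if merged[i:i + 3] == "AUG"]

print(overlap, s1b_start, len(merged), aug_positions)
```

With these inputs the overlap is 21 nucleotides, fragment s1b begins at position 56, the combined sequence is 92 nucleotides long, and the only AUG triplets are those at positions 13 and 71.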
Although internal initiation sites can be activated readily by cleaving a message (Beemon & Hunter, 1978; Pelham, 1979; Kozak, 1980b; Lawrence, 1980), that does not seem to be the explanation for the extra ribosome binding site detected with reovirus s1 mRNA. The s1 message forms disomes in the presence of sparsomycin, implying that, in a single mRNA molecule, both initiation sites are accessible to ribosomes. Moreover, initiation at site s1b is sensitive to the cap analogue m7Gp. Had site s1b been activated by cleaving the message to eliminate the upstream AUG codon, binding of ribosomes to the derived uncapped 5′ terminus would not be inhibited by m7Gp.

The s1 messenger RNA of reovirus is unusual but not unique in permitting initiation at a second AUG codon. Four other eukaryotic messages display a similar phenomenon. Ribosomes appear to initiate at two sites in the late 19 S mRNA of simian virus 40 (SV40), directing synthesis of both VP2 and VP3 (Ghosh et al., 1978; Piatak et al., 1979). The mRNA encoding thymidine kinase also directs synthesis of a second smaller protein (Preston & McGeoch, 1981). It has long been known that there are two functional initiation sites in the poliovirus genome
(Celma & Ehrenfeld, 1975; Knauert & Ehrenfeld, 1979). And ribosome binding studies in vitro with the N message of vesicular stomatitis virus revealed low-level binding at the second and third AUG codons, although the first AUG was the predominant initiation site (Rose, 1978; Gallione et al., 1981). It is noteworthy that in each of these unusual messages in which a second initiation site has been detected, the alternate site lies a short distance downstream from the first†. This provides rather striking support for the idea that ribosomes scan the message in a linear fashion.

† It is not known whether this generalization holds for poliovirus, because the location of the alternate initiation site detected by Ehrenfeld has not been mapped. Kitamura et al. (1981) have recently published the sequence of the poliovirus genome, and the reader is referred to their paper for speculation about the possible location of the second initiation site described by Ehrenfeld. It is clear that poliovirus is more complicated than the other messages under discussion here, since the major initiation site (encoding the poliovirus polyprotein) is preceded by 8 other AUG triplets.

But why do ribosomes initiate at two sites in only these few messages? A possible explanation for the tendency of some ribosomes to bypass the first AUG codon and initiate at the next downstream site in reovirus s1 and the other messages described above is that the first initiator codon in those messages is flanked by "unfavorable" sequences. It has been proposed that sequences flanking the initiator codon, particularly in positions -3 and +4, modulate the efficiency with which an AUG triplet is recognized as a "stop signal" by the migrating 40 S subunit. Three observations support this hypothesis. (A detailed justification for the following ideas is presented by Kozak (1981b).)

(1) A survey of 153 mRNAs from eukaryotic cells and viruses reveals that the nucleotides flanking functional initiator codons are not random. Purines occur with very high frequency in positions -3 and +4, with A being the preferred nucleotide in position -3, and G the preferred nucleotide in position +4. No functional eukaryotic initiator codon has pyrimidines in both positions -3 and +4.

(2) Non-functional AUG triplets, which occur in the 5′ non-coding region of a few eukaryotic messages, are flanked by sequences that differ from those bordering most functional initiator codons. The sequence (A or G)-N-N-A-U-G-G, which characterizes most functional initiation sites, is never observed among AUG codons that occur upstream from the functional initiator codon. Instead, non-functional upstream AUG codons are nearly always preceded by a pyrimidine in position -3. The working hypothesis is that all 40 S ribosomal subunits stop migrating when they encounter an optimal sequence such as (A or G)-N-N-A-U-G-G. Although some 40 S subunits stop at less favorable sequences (G-N-N-A-U-G-Y or Y-N-N-A-U-G-G, Y = pyrimidine), some 40 S subunits continue migrating beyond such sites, to be stopped at the next AUG. In other words, AUG triplets flanked by unfavorable sequences are "leaky". The general observation is: in any message in which ribosomes initiate at an AUG triplet which is not first-in-line, all of the upstream AUG triplets are flanked by unfavorable sequences that permit (at least some) 40 S ribosomes to migrate beyond. Conversely, if the first AUG codon in a message is flanked by unfavorable sequences, the prediction is that some 40 S ribosomes should advance beyond that
site and initiate at the next AUG downstream. The demonstration in this paper of two ribosome binding sites in the sl message of reovirus may be taken as evidence in support of this proposition. (3) Additional support for the idea that purine nucleotides in positions -3 and +4 facilitate recognition of the AUG codon was obtained by measuring the binding of 32P-labeled oligonucleotides to wheat germ ribosomes in vitro. Binding of synthetic AUG-containing oligonucleotides was enhanced five- to 15fold by placing a purine in either of those positions (Kozak. 1981b). The 5’-proximal sequenceof sl mRNA is quite similar to that of one of the largesized reovirus messages.As shown in Figure 6. the sequence between the m7G cap
FIG. 6. Comparison between the 5′-proximal sequence of s1 mRNA and one of the large-sized reovirus messages. The latter message yields a single ribosome-protected oligonucleotide, which is centered at the AUG codon in position 14 to 16 (Kozak, 1982). Vertical lines are drawn between homologous bases in the 2 messages. Beyond the first AUG codon, the sequences diverge.
and the first AUG codon is identical in the two messages except for three positions; one of the positions of non-homology occurs three nucleotides upstream from the AUG codon. Thus, the l message (in which all ribosomes initiate at the first AUG) has an A in position -3; the s1 message, as noted above, has a pyrimidine in that position. Since the 5′ non-coding sequences of the two messages are otherwise so similar, the difference in position -3, and the fact that ribosomes tend to bypass the first AUG in s1 mRNA but not in the l message, provide additional circumstantial evidence for the hypothesis that a purine three nucleotides upstream facilitates recognition of the AUG codon.
The N-terminal amino acid sequence is not known for the reovirus σ1 protein; thus, I do not know which of the two ribosome binding sites described herein encodes the N terminus of the σ1 protein. Because the nucleotide sequence of the s1 message is not known beyond the first 92 nucleotides, I also do not know the size of the polypeptide that would be synthesized by initiating at the alternate ribosome binding site. The two AUG codons that have been identified are in different translational reading frames, and neither frame is interrupted by a terminator codon over the limited region that has been sequenced.
Finally, it should be emphasized that all of the reovirus initiation sites have been identified solely by the ribosome-protection assay. I assume that ribosome-protected sequences correspond to the sites where peptide bond formation would begin. In other, more thoroughly documented systems, this assumption has been verified (Steitz, 1969; Dasgupta et al., 1975; Legon, 1976; Browning et al., 1980). Only rarely do ribosomes protect non-initiator sites (Steitz, 1973), and those abortive sites can be recognized by the absence of an AUG codon. Experiments with RNA-3 from brome (Ahlquist et al.,
1979) and alfalfa mosaic virus (Pinck et al., 1981) and with the genomic RNA of tobacco mosaic virus (Filipowicz & Haenni, 1979) revealed an extra 80 S ribosome bound upstream from the functional initiator codon; i.e. at a site lacking an AUG triplet. This obviously differs from the disomes obtained with reovirus s1 mRNA, in which each of the two ribosome binding sites is centered about an AUG codon.
Apart from yielding two ribosome-protected initiation sites, the s1 message has two other interesting characteristics: (1) it fails to yield a 40 S ribosome-protected fragment when initiation complexes are formed in the presence of GMPPCP instead of GTP (Kozak & Shatkin, 1977b); and (2) it is translated very inefficiently both in vitro (Levin & Samuel, 1980) and in vivo (Zweerink & Joklik, 1970). It remains to be seen whether either of these peculiarities is related to the presence of a “leaky” 5′-proximal AUG codon.
This work was supported by grant AI 16634 and Career Development award AI 00380 from the National Institutes of Health.
REFERENCES
Ahlquist, P., Dasgupta, R., Shih, D. S., Zimmern, D. & Kaesberg, P. (1979). Nature (London), 281, 277-282.
Banerjee, A. K. & Shatkin, A. J. (1970). J. Virol. 6, 1-11.
Barrell, B. G. (1971). In Procedures in Nucleic Acid Research (Cantoni, G. L. & Davies, D. R., eds), vol. 2, pp. 751-779, Harper & Row, New York.
Beemon, K. & Hunter, T. (1978). J. Virol. 28, 551-566.
Both, G. W., Lavi, S. & Shatkin, A. J. (1975). Cell, 4, 173-180.
Browning, K. S., Leung, D. W. & Clark, J. M., Jr (1980). Biochemistry, 19, 2276-2283.
Celma, M. L. & Ehrenfeld, E. (1975). J. Mol. Biol. 98, 761-780.
Darzynkiewicz, E. & Shatkin, A. J. (1980). Nucl. Acids Res. 8, 337-350.
Dasgupta, R., Shih, D. S., Saris, C. & Kaesberg, P. (1975). Nature (London), 256, 624-628.
Filipowicz, W. & Haenni, A.-L. (1979). Proc. Nat. Acad. Sci., U.S.A. 76, 3111-3115.
Gallione, C. J., Greene, J. R., Iverson, L. E. & Rose, J. K. (1981). J. Virol. 39, 529-535.
Ghosh, P. K., Reddy, V. B., Swinscoe, J., Lebowitz, P. & Weissman, S. M. (1978). J. Mol. Biol. 126, 813-846.
Kitamura, N., Semler, B. L., Rothberg, P. G., Larsen, G. R., Adler, C. J., Dorner, A. J., Emini, E. A., Hanecak, R., Lee, J. J., van der Werf, S., Anderson, C. W. & Wimmer, E. (1981). Nature (London), 291, 547-553.
Knauert, F. & Ehrenfeld, E. (1979). Virology, 93, 537-546.
Kozak, M. (1977). Nature (London), 269, 390-394.
Kozak, M. (1978). Cell, 15, 1109-1123.
Kozak, M. (1980a). Cell, 22, 7-8.
Kozak, M. (1980b). J. Virol. 35, 748-756.
Kozak, M. (1981a). In Current Topics in Microbiology and Immunology (Shatkin, A. J., ed.), vol. 93, pp. 81-123, Springer-Verlag, Berlin.
Kozak, M. (1981b). Nucl. Acids Res. 9, 5233-5252.
Kozak, M. (1982). J. Virol., in the press.
Kozak, M. & Shatkin, A. J. (1977a). J. Mol. Biol. 112, 75-96.
Kozak, M. & Shatkin, A. J. (1977b). J. Biol. Chem. 252, 6895-6908.
Lawrence, C. B. (1980). Nucl. Acids Res. 8, 1307-1317.
Lebowitz, P., Weissman, S. M. & Radding, C. M. (1971). J. Biol. Chem. 246, 5120-5139.
Legon, S. (1976). J. Mol. Biol. 106, 37-53.
Levin, K. H. & Samuel, C. E. (1980). Virology, 106, 1-13.
Li, J. K.-K., Keene, J. D., Scheible, P. P. & Joklik, W. K. (1980). Virology, 105, 41-51.
McCrae, M. A. & Joklik, W. K. (1978). Virology, 89, 578-593.
Mustoe, T. A., Ramig, R. F., Sharpe, A. H. & Fields, B. N. (1978). Virology, 89, 594-604.
Pelham, H. R. B. (1979). FEBS Letters, 100, 195-199.
Piatak, M., Ghosh, P. K., Reddy, V. B., Lebowitz, P. & Weissman, S. M. (1979). In Extrachromosomal DNA. ICN-UCLA Symp. Mol. Cell. Biol. (Cummings, D. J., Borst, P., Dawid, I. B., Weissman, S. M. & Fox, C. F., eds), vol. 15, pp. 199-215, Academic Press, New York.
Pieczenik, G., Model, P. & Robertson, H. D. (1974). J. Mol. Biol. 90, 191-214.
Pinck, M., Fritsch, C., Ravelonandro, M., Thivent, C. & Pinck, L. (1981). Nucl. Acids Res. 9, 1087-1100.
Preston, C. M. & McGeoch, D. J. (1981). J. Virol. 38, 593-605.
Rose, J. K. (1978). Cell, 14, 345-353.
Steitz, J. A. (1969). Nature (London), 224, 957-964.
Steitz, J. A. (1973). J. Mol. Biol. 73, 1-16.
Zweerink, H. J. & Joklik, W. K. (1970). Virology, 41, 501-518.
Edited by M. Gottesman