Cell, Vol. 15, 113-122, September 1978, Copyright © 1978 by MIT
Nucleotide Sequence of Turnip Yellow Mosaic Virus Coat Protein mRNA

H. Guilley and J. P. Briand
Laboratoire de Virologie, Institut de Biologie Moleculaire et Cellulaire du CNRS, Universite Louis Pasteur, 15, rue Descartes, 67084 Strasbourg, France
Summary

The primary structure of the coat protein messenger RNA of turnip yellow mosaic virus is presented. This sequence is the first complete nucleotide sequence of the coat protein messenger of a plant virus to be reported. The coding region, consisting of 567 nucleotides, is flanked by a 5' noncoding region of 19 nucleotides (not including the initiation codon and the cap structure) and by a 3' noncoding region of 109 nucleotides (including the termination signal). The coat protein mRNA has a base composition identical to that of the genome RNA with, in particular, the same high content of cytosine (38%). The codons that govern the incorporation of amino acids into the coat protein are nonrandomly utilized: >50% of the time the third base of the codons used is a cytosine. This pattern of codon preference is particularly marked for Leu, Ile, Val, Thr and Cys.

Introduction

The RNA extracted from preparations of turnip yellow mosaic virus (TYMV) consists of two major components with molecular weights of 2 x 10^6 and 0.25-0.3 x 10^6 daltons (Klein et al., 1976; Pleij et al., 1976). The larger component is the genome RNA, whose sequence encodes all the information for the viral infectivity cycle including the cistron for the viral coat protein, which is believed to occupy the 3' terminal portion of the genome RNA (Briand et al., 1977; Richards et al., 1977). Experiments with cell-free protein synthesizing systems have shown that genome RNA can direct the synthesis of long polypeptides, but little if any coat protein (Klein et al., 1976; Pleij et al., 1976). The smaller RNA species found in virion preparations is, in contrast, an efficient template for coat protein synthesis in vitro and no doubt serves as the principal coat protein messenger in infected cells (Klein et al., 1976; Pleij et al., 1976).
The existence of one or more subgenomic RNAs is not confined to TYMV, but seems to be rather general among plant viruses, for example, tobacco mosaic virus (TMV) (Hunter et al., 1976), eggplant mosaic virus (Szybiak, Bouley and Fritsch, 1978), alfalfa mosaic virus (AlMV) (Jaspars, 1974; Mohier et al., 1975) and brome mosaic virus (BMV) (Davies and Kaesberg, 1974). With respect to this situation, two main questions arise: why is the initiation site for coat protein synthesis "closed" on the genome RNA, and why is the coat protein cistron on the subgenomic RNA translated more efficiently than the other viral cistrons? Concerning the first point, knowledge of the nucleotide sequence immediately preceding the initiation codon of the coat protein cistron on both genome and coat protein mRNAs might reveal significant features. With respect to the preferential translation of the coat protein mRNA, we could envisage that comparison of the 5' noncoding regions of both RNAs might shed light on the question, although other factors such as secondary and/or tertiary structure of the RNA may be involved in the mechanism of preferential recognition of these regions by ribosomes.

In the hope of answering these questions, we undertook the determination of the primary structure of the TYMV coat protein mRNA. We have elsewhere reported the sequence of the first 110 nucleotides of TYMV genome RNA (Briand, Keith and Guilley, 1978). The sequence of the 3' OH region of TYMV RNA has already been reported (Briand et al., 1977; Silberklang et al., 1977b). The earlier study was carried out upon the nonfractionated mixture of genome RNA and coat protein mRNA, but since the coat protein mRNA is believed to derive from the 3' region of genome RNA, both types of molecules should have identical 3' terminal sequences. In any event, both genome and coat protein mRNA can be aminoacylated with valine to a similar extent and at comparable rates in the presence of yeast valyl-tRNA-synthetase (Giege et al., 1978).

Results

Preparation of Pure TYMV Coat Protein mRNA

The RNA extracted from purified TYMV preparations consists essentially of two types of molecules: the genome RNA (molecular weight 2 x 10^6 daltons) and the coat protein mRNA (molecular weight 0.25-0.3 x 10^6 daltons) (Klein et al., 1976; Pleij et al., 1976). The two RNA species were separated from one another in two steps: first, filtration of the RNA through a polyacrylamide-agarose column, followed by purification of the appropriate fractions on a 4% polyacrylamide-8 M urea slab gel. The autoradiogram of such a gel is shown in Figure 1. The coat protein mRNA was extracted from the gel as described by Richards et al. (1978). About 100 µg of totally 32P-labeled coat protein mRNA (spec. act. 10^5 cpm/µg) were routinely obtained from six infected plants labeled with 100 mCi of
carrier-free 32P. "Nonradioactive" coat protein mRNA was prepared in the same way starting from TYMV RNA of very low specific activity (10^3 cpm/µg). Note that, in addition to the coat protein mRNA, at least three discrete, slightly slower migrating bands appear on the gel. The material extracted from these bands was analyzed and found to be derived from the 5' extremity of the genome RNA (authors' unpublished observations).

Sequence Analysis of Pancreatic and T1 Ribonuclease Oligonucleotides of the Coat Protein mRNA

Portions of the 32P-labeled RNA were totally digested with pancreatic or T1 RNAases, and the digestion products were fractionated by two-dimensional electrophoresis for the pancreatic RNAase digest (Figure 2A) or by two-dimensional electrophoresis-homochromatography for the T1 RNAase digest (Figure 2B). The molar ratio of each oligonucleotide was determined by counting the radioactivity in the spots. All the pancreatic oligonucleotides and most of the T1 oligonucleotides were well separated, and consequently, their characterization was relatively easy. Digestion of the pancreatic oligonucleotides with the complementary enzyme provided enough information for deducing their sequence, except for spots p18, p25, p27, p30 and p31, whose sequences were determined by partial P1 nuclease digestion of the 5' 32P-labeled oligonucleotides. As can be seen in Figure 2B, the coat protein mRNA contains several very large T1 oligonucleotides (up to 28 nucleotides). Their sequence was obtained by partial P1 digestion of the 5' 32P-labeled oligonucleotide and confirmed by compilation of the data given by pancreatic and U2 ribonucleases. Examples of such sequence determination are given in Figure 3 for oligonucleotides t1 and t4.

Figure 1. Isolation by Polyacrylamide Gel Electrophoresis of Uniformly 32P-Labeled Coat Protein mRNA
The coat protein mRNA-containing fractions of the ACA22 column (about 200 µg) were loaded onto a 4% polyacrylamide-8 M urea slab gel. Electrophoresis was for 16 hr at 500 V. The coat protein mRNA was located by autoradiography. I1, I2 and I3 are RNA fragments of discrete length derived from the genome RNA (see text).
Characterization of the Sequence Following the Cap

Spot p22 of the pancreatic RNAase fingerprint and spot t38 of the T1 RNAase fingerprint contain the cap moiety pppm7G which is known to block the 5' extremity of the RNA (Klein et al., 1976). The sequence of the first 3 nucleotides at the 5' terminus was determined by analysis of the capped end product p22 as outlined in Table 1. The deduced sequence is m7GpppAAU. The assertion made by Klein et al. (1976) in an earlier study that the 5' terminal sequence was heterogeneous, with m7GpppG as well as m7GpppA being present, proved to be incorrect. Presumably, the preparations of coat protein mRNA used in the earlier work were contaminated with fragments of genome RNA which is known to have a cap structure of the former type (Briand et al., 1978).

Figure 2. Autoradiograms of the Pancreatic (A) and T1 (B) RNAase Fingerprints of the Coat Protein mRNA
Fractionation of the pancreatic RNAase digest was by two-dimensional electrophoresis (first dimension on cellulose acetate at pH 3.5 in 7 M urea, 5 mM EDTA; second dimension on DEAE paper wetted with 7% formic acid) and by two-dimensional electrophoresis-homochromatography for the T1 RNAase digest (first dimension as above; second dimension on a 20 cm x 40 cm DEAE-cellulose thin-layer plate).

Construction of the Sequence

Derivation of the sequence was accomplished by examination of all subproducts obtained by partial digestion of the RNA with T1 and pancreatic RNAases under a variety of conditions. The partial digestion subproducts were prepared from uniformly labeled RNA or from nonradioactive RNA. In the latter case, the partial digestion products were afterwards 5' terminally labeled with γ-32P-ATP and polynucleotide kinase. Fractionation of the digests was by one- or two-dimensional electrophoresis on polyacrylamide gels. Analysis of the purified subproducts was by fingerprinting with T1 and pancreatic RNAases in the case of uniformly 32P-labeled RNA and by partial nuclease P1 digestion for 5' 32P-labeled material. This dual approach allowed us to determine for each subproduct first, the composition in T1 and pancreatic oligonucleotides, and second, the sequence of the first several nucleotides of the subproduct (usually between 15 and 20). Figure 4 is an example of how we determined the sequence of the subproduct which corresponds to residues 135-265 of the complete sequence. In all, about 200 subproducts originating from various parts of the RNA chain were examined.

Of particular interest were those fragments containing the 5' terminus of the molecule, since they should encompass the 5' noncoding region. The longest pure 5' terminal fragment produced upon partial T1 RNAase hydrolysis was 54 nucleotides long. The fingerprint and the sequence of this fragment are shown in Figure 5. Only one AUG occurs in the sequence, at position 20-22. This codon is followed by 11 triplets coding for a peptide whose sequence is exactly that of the N terminus of the coat protein. We therefore conclude that this AUG is the initiation codon for the viral coat protein synthesis.
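The argument for the initiation codon can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it scans the 54-nucleotide 5' terminal fragment for AUG triplets in every frame and translates the complete codons that follow the first one. The sequence string is transcribed from Figure 5, with one illegible position read as UG so that it agrees with the AUG reported at positions 20-22; the names `FRAGMENT_5P`, `find_aug_positions` and `translate_from` are illustrative only.

```python
from itertools import product

# 54-nt 5' terminal fragment (cap omitted), transcribed from Figure 5.
# Assumption: the illegible character in the printed figure is read as "UG".
FRAGMENT_5P = "AAUAGCAAUCAGCCCCAACAUGGAAAUCGACAAAGAACUCGCCCCCCAAGACCG"

# Standard genetic code in UCAG order (one-letter amino acids, "*" = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_aug_positions(rna):
    """Return the 1-based position of every AUG triplet, in any frame."""
    return [i + 1 for i in range(len(rna) - 2) if rna[i:i + 3] == "AUG"]


def translate_from(rna, start):
    """Translate the complete codons beginning at 1-based position `start`."""
    peptide = []
    for i in range(start - 1, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


aug_positions = find_aug_positions(FRAGMENT_5P)
peptide = translate_from(FRAGMENT_5P, aug_positions[0])
print(aug_positions, peptide)
```

Under these assumptions the scan finds a single AUG, and the translated residues correspond to Met-Glu-Ile-Asp-Lys-Glu-Leu-Ala-Pro-Gln-Asp; the two nucleotides left over (CG) can only begin an arginine codon.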
Consequently, the 5' noncoding region of the coat protein mRNA is 19 nucleotides long (not including the cap moiety and the initiation codon).

Another region of the RNA on which we focused our attention was the 3' noncoding region. The nucleotide sequence of a fragment produced by partial T1 RNAase digestion of the total TYMV RNA and encompassing the last portion of the coat protein cistron and the 3' OH extremity of the RNA has been reported elsewhere (Briand et al., 1977; Silberklang et al., 1977b). Aminoacylation experiments (Giege et al., 1978) and sequence homology studies (Richards et al., 1977) on genome and coat protein mRNA suggest that the smaller RNA is derived from the 3' extremity of genome RNA and that both RNAs possess the same 3' terminal sequence. All the evidence collected in the course of this work is consistent with the hypothesis that the 3' terminal sequence of the coat protein mRNA is in fact identical to the sequence determined earlier by Briand et al. (1977).

Figure 3. Autoradiograms of Partial Nuclease P1 Digests of 5' 32P-Labeled T1 Oligonucleotides t1 (A) and t4 (B)
Fractionation of the digests was by two-dimensional electrophoresis-homochromatography. t1 and t4 correspond to spots 1 and 4 of the T1 RNAase fingerprint (Figure 2B).

Analysis of the partial digestion products provided enough overlaps to build three long tracts of sequence. The first tract contains the cap moiety and thus constitutes the 5' extremity of the RNA (residues 1 to 131); the second is derived from the 3' extremity (residues 495 to 695); and the third comes from the interior of the RNA molecule (residues 135 to 493). Taken together, the three tracts account for all the T1 oligonucleotides and all the pancreatic oligonucleotides but p27 (GAAGU) and p15 (GGC). We could not find subproducts joining the three tracts, but keeping in mind that tracts 3 and 2 begin, respectively, with U and C, it is reasonable to conclude that their order is: tract 1-tract 3-tract 2, with GAAGU and GGC, respectively, forming the connection between tracts 1 and 3, and tracts 3 and 2. Any doubt that this interpretation is correct was eliminated by comparison of the deduced sequence with that of the TYMV coat protein (Peter et al., 1972), since only this solution is consistent with the amino acid sequence.

The complete sequence of the TYMV coat protein mRNA is shown in Figure 6. The coding region is 567 nucleotides long (not including the initiation and termination codons). Heterogeneity occurs in the sequence at position 178, giving rise to two separately migrating spots for oligonucleotide t9 [spots 9A and 9B in Figure 4(1)]. The sequence of the major variant is CUUCUCUCACCAUCG and that of the minor variant is CUUCUCUCACCAUUG. The minor spot t9 is mixed with spot t8 in the T1 fingerprint (Figure 2B). The sequence containing the base substitution is shown in Figures 4A and 4B. The heterogeneity takes place on the third base of the codon for isoleucine and does not affect the sequence of the coat protein.

Table 1. Characterization of the Sequence Following the Cap

Oligonucleotide p22 + U2 or T2 RNAase -> Core 1 + Ap + Up
Core 1 + phosphatase -> Core 2 + p
Core 2 + venom phosphodiesterase -> pm7G + pA + p
Sequence deduced: m7GpppApApUp

10 µg (about 10^6 cpm) of totally labeled coat protein mRNA were digested to completion with pancreatic RNAase, and the digest was fractionated by two-dimensional electrophoresis. The spot (p22) containing the cap moiety was eluted and characterized. Digestion with U2 RNAase was in 10 mM ammonium acetate (pH 4.5), at 1 U/mg of RNA for 16 hr at 37°C; digestion with T2 RNAase was in 50 mM sodium acetate (pH 4.7) at 20 U/mg RNA for 16 hr at 37°C. Fractionation of both digests was by electrophoresis on DEAE paper at pH 3.5. Digestion of core 1 with alkaline phosphatase was in 50 mM Tris-HCl, 10 mM MgCl2 (pH 7.5) at 4 U/mg RNA for 30 min at 37°C. The digest was electrophoresed on DEAE paper at pH 3.5. Digestion of core 2 with venom phosphodiesterase was in 20 mM Tris-HCl, 10 mM MgCl2 (pH 8.9) at 50 µg/mg RNA for 1 hr at 37°C. Fractionation of the digest was by electrophoresis on 3MM paper. Markers were used to identify pm7G, pA, Ap and Up.

Discussion

We report here the complete nucleotide sequence of the TYMV coat protein mRNA. This species, which accounts for 5-6% of the total RNA present in purified TYMV preparations, contains 695 nucleotides excluding the cap moiety pppm7G. The coding region (567 nucleotides) is preceded by a 5' noncoding region of 19 nucleotides and followed by a 3' nontranslated sequence of 109 nucleotides. The nucleotide sequence predicts an amino acid composition which is in total agreement with the known sequence of the viral coat protein (Peter et al., 1972). One interesting feature of the sequence is the high content in cytosine (38%), a characteristic common to all the RNA of most members of
the tymovirus group. The cytosine residues appear to be uniformly distributed along the RNA molecule, the longest C tracts corresponding to two runs of six residues each located at positions 42-47 and 509-514. It is known from physicochemical studies that cytosine residues are involved in RNA:protein interactions in the TYMV capsid at low pH (Jonard et al., 1976). It is believed that formation of hydrogen bonds between the cytosine residues and the carboxylic groups of aspartic and glutamic acids ensures the stability of the viral particle at acid pH (Briand, 1978).

The Coding Region

Table 2 summarizes the codons used for the synthesis of the viral coat protein. It is clear, as has already been reported for other mRNA sequences, that the utilization of synonymous codons is nonrandom (Fiers et al., 1976; Efstratiadis, Kafatos and Maniatis, 1977; Marotta et al., 1977; Pan et al., 1977; Sanger et al., 1977). This is particularly striking for Leu, Ile, Val, Thr and Cys. Another striking feature is the preference of C over the other bases in the third position of the codons, the overall percentages being 50, 18, 18 and 14 for C, U, A and G, respectively. Among the codons in which the third base must be a pyrimidine, the preference of C over U is in the ratio of 2.7 to 1. A similar observation has been made in the case of MS2 RNA (Fitch, 1976) and was interpreted in terms of selection for nonwobble pairing. The preference of pyrimidines over purines in the third position of those codons which may end in any of the four bases is in the ratio of 3 to 1; moreover, if a pyrimidine is chosen, the triplet XXC is predominant over the triplet XXU in the ratio of 2.5 to 1. This preferential selection of C in the third position will probably also be found for the coding part(s) of the genome RNA and may be essential for the stabilization of the virus particle as mentioned above.
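The third-position figures quoted above can be recomputed from codon counts. The sketch below is illustrative only: the counts in `CODON_COUNTS` are transcribed from Table 2 as far as it is legible in this copy, so individual entries are an assumption and may contain reading errors, and the function names are hypothetical. Termination codons are left out, so the counts cover the 189 sense codons including the initiator AUG.

```python
from collections import Counter

# Sense-codon counts transcribed from Table 2 (assumption: some entries were
# hard to read, so single values may be off; stop codons are omitted).
CODON_COUNTS = {
    "UUU": 1, "UUC": 4, "UUA": 0, "UUG": 1,
    "UCU": 4, "UCC": 4, "UCA": 4, "UCG": 4,
    "UAU": 1, "UAC": 2, "UGU": 0, "UGC": 4, "UGG": 2,
    "CUU": 2, "CUC": 11, "CUA": 2, "CUG": 1,
    "CCU": 2, "CCC": 10, "CCA": 5, "CCG": 3,
    "CAU": 2, "CAC": 1, "CAA": 5, "CAG": 3,
    "CGU": 1, "CGC": 1, "CGA": 0, "CGG": 1,
    "AUU": 2, "AUC": 12, "AUA": 2, "AUG": 4,
    "ACU": 6, "ACC": 16, "ACA": 1, "ACG": 3,
    "AAU": 1, "AAC": 3, "AAA": 5, "AAG": 2,
    "AGU": 0, "AGC": 1, "AGA": 0, "AGG": 0,
    "GUU": 3, "GUC": 9, "GUA": 2, "GUG": 0,
    "GCU": 5, "GCC": 8, "GCA": 1, "GCG": 1,
    "GAU": 2, "GAC": 5, "GAA": 5, "GAG": 1,
    "GGU": 2, "GGC": 4, "GGA": 2, "GGG": 0,
}

# Codon families whose third base may be any of the four nucleotides.
FOURFOLD_PREFIXES = ("CU", "GU", "UC", "CC", "AC", "GC", "CG", "GG")


def third_base_totals(counts, prefixes=None):
    """Sum codon counts by third base, optionally restricted to some families."""
    totals = Counter()
    for codon, n in counts.items():
        if prefixes is None or codon[:2] in prefixes:
            totals[codon[2]] += n
    return totals


overall = third_base_totals(CODON_COUNTS)
n_codons = sum(overall.values())
percentages = {base: round(100 * overall[base] / n_codons) for base in "CUAG"}

fourfold = third_base_totals(CODON_COUNTS, FOURFOLD_PREFIXES)
c_over_u = fourfold["C"] / fourfold["U"]
pyr_over_pur = (fourfold["C"] + fourfold["U"]) / (fourfold["A"] + fourfold["G"])

print(n_codons, percentages, round(c_over_u, 2), round(pyr_over_pur, 2))
```

With these transcribed counts the overall third-base percentages and the two ratios for the four-fold families come out close to the values stated in the text, which also serves as a consistency check on the reading of the table.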
It is interesting to speculate that the pattern of codon selection could be related to the host range of the virus. The preferential choice of C at the third position of the triplets would then reflect the adaptation of the virus to various factors of the host plant, for instance, tRNA isoacceptor composition. Note that, unlike TYMV coat protein mRNA, SV40 VP1, MS2 and ΦX174 mRNAs all show an overall preference of U over C for the third base of the triplets (Fiers et al., 1976; Pan et al., 1977; Sanger et al., 1977).

Figure 4. Nucleotide Sequence of a Subfragment (Residues 135 to 265) Obtained by Partial T1 RNAase Digestion of the Coat Protein mRNA
(1) T1 RNAase fingerprint of the subfragment. (2) Partial nuclease P1 digest of 5' 32P-labeled subfragment. (3) Partial nuclease P1 digest of a 5' 32P-labeled fragment produced upon partial pancreatic RNAase digestion of the coat protein mRNA. (4A and 4B) Nucleotide sequence around the heterogeneity found at position 178 as determined by partial nuclease P1 digestion. In all cases, fractionation was by two-dimensional electrophoresis-homochromatography. The continuous lines above the sequence denote the partial T1 RNAase fragments obtained from totally labeled coat protein mRNA. The dashed lines represent the sequences read from partial nuclease P1 digests of 5' 32P partial digestion products of the coat protein mRNA.

The Noncoding Regions

The 5' noncoding sequence of the coat protein mRNA is only 19 nucleotides long. The sequence of the corresponding region of the coat protein mRNA of brome mosaic virus (BMV) and alfalfa mosaic virus (AlMV) consists of 9 and 36 nucleotides, respectively (Dasgupta et al., 1975; Koper-Zwarthoff et al., 1977). The only recognizable common feature among these sequences is the presence of a 5' terminal methylated cap (pppm7G). This observation may be extended to other eucaryotic mRNAs such as tobacco mosaic virus (TMV) RNA (Richards et al., 1978), TYMV genome RNA (Briand et al., 1978), globin mRNAs (Baralle and Brownlee, 1978) and reovirus mRNAs (Kozak and Shatkin, 1978). The 5' noncoding regions of AlMV and BMV RNA 4 both lack guanosine residues (except for the 5' terminal G) and thus differ in this respect from TYMV coat protein mRNA in which two guanosine residues are to be found before the initiation AUG codon.

The coat protein mRNA is a very efficient messenger when translated in cell-free protein synthesizing systems such as wheat germ extract or rabbit reticulocyte lysate (Klein et al., 1976; Pleij et al., 1976). If we reason by analogy with procaryotic
systems (Shine and Dalgarno, 1975; Steitz and Jakes, 1975), we would expect ribosome recognition to involve base pairing between a short sequence preceding the initiator codon of the mRNA and a tract of bases near the 3' end of the 18S rRNA. We have been unable, however, to discover a plausible base pairing scheme between the 5' noncoding sequence of the coat protein mRNA and the known sequence near the 3' end of 18S rRNA (Hagenbuchle et al., 1978). It is interesting in this regard, however, to note that TYMV coat protein mRNA falls in the category of mRNAs whose sequence around the leader initiation codon is AUGG; this tetranucleotide is complementary to four bases in the anticodon loop of mammalian initiator tRNA, and as suggested by Kozak and Shatkin (1978), the additional base pair may improve the efficiency of translation of the messenger.

Figure 5. Nucleotide Sequence at the 5' Terminus of Coat Protein mRNA
m7GpppAAUAGCAAUCAGCCCCAACAUGGAAAUCGACAAAGAACUCGCCCCCCAAGACCG
(A) Autoradiogram of the T1 RNAase fingerprint of the 54 nucleotides long fragment. (B) Autoradiogram of the partial nuclease P1 digest of 5' 32P oligonucleotide t17. This oligonucleotide terminates with AUG, which is the initiator codon for coat protein synthesis. Fractionations were by two-dimensional electrophoresis-homochromatography. The lines above and under the sequence denote the partial products obtained from totally labeled coat protein mRNA (solid line) or from nonradioactive RNA (dashed lines). In the latter case, the subfragments were afterwards 5'-labeled with polynucleotide kinase and partially digested with nuclease P1.

Both genome RNA and coat protein mRNA carry the same tRNA-like structure at their 3' extremity (Pinck et al., 1970; Giege et al., 1978), since they can be aminoacylated with valine in a manner comparable to authentic tRNAVal. The sequence and an extensive study on the aminoacylation of this structure have already been reported (Briand et al., 1977; Giege et al., 1978). It is worth mentioning that most eucaryotic poly(A)-containing mRNAs sequenced so far possess the hexanucleotide AAUAAA within their 3' noncoding region (Proudfoot and Brownlee, 1976). Among the plant viral RNAs for which the sequence
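[Schematic at the top of Figure 6: m7G cap; 5' noncoding region, 19 nucleotides; coding region, 567 nucleotides; 3' noncoding region carrying the tRNA-like structure, 109 nucleotides]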
n,'G p p p A - A - U- A- G-C - A-A - U- C - A- GC- C- C -C-A - A .C - 29 -- • 38 17 -
Met
Glu
lie
Asp
Lys
Glu
Leu
Aid
Pro
Gin
Asp
Arg
Thr
Val
Thr
Val
Al .
Thr
V al
Lou
20 A-U-G .G-A-A . A -U-C .G -A- C .A-A -A .G-A - A.C- U-GG-C-C.C-CC .C-A-A .G-A-GC-G-C .A-C-C .G-U-C .A-CC .G-U-C.G-C-C .A-C-C .G-U-U.C-U-A . - 35- . 36 .-y - 75 .-26-+e5S- -46-e .41-r .60- •a 5-+ - 22 -
Pro
Al .
Val
Pro
Gly
Pro
So, Pro
Lou
Thr
It .
Lys
Gin
Pro
Ph . Gin
So,
Gin
Val
Lou
SO C-C-A.G-C-U .G-l) -C .C-C-C .G-G-C .C-C-A .U-C-A .C-C-U.C-U-C .A-C-C .A-U-C .A-A-A .C-A-A .C-C-G .U-Il-C.C-A-G .U-C-U .G-A-A .G-U-U .C-U- A . .60 . . .-4e1 .- 41 -+ -5e--64 .- --30-
Ph .
Al .
Giy
Thr
Lys
Asp
Ala
Glu
Al .
Sar
Lou
Thr
Il .
Al .
Asn
lie
Asp
So,
Val
So,
140 U - U -U.G-C-U .G-G-A .A-C-C .A-A-A .G-A-U .G-C-C .G-A-G.GC-U .U-C-U.C-U-C .A-C-C .A-U-C .G-CC .A-A-C.A-UC .G-AC .A-G-C .G.i-U .U-C- C. .60+ 1e .59 • . 62 . w1 . s 20 .49+ .64 .-
Mrs Thr Lou Thr Thr Ph . Tyr Arg His Al . Sar Lou Glu Ser Thr Its Pro La . TIP V a I 200 A-C-C .C-UC.A-C-C .A-C-C.U-U-C .U-A-C .C-G-U.C-A-U .G-C-A .U-C-U .C-U-G.G-A-A.U-C-A.C-U-C .U-G-G.G-U-C.A-C-U .A-UC .C-A-U.C-C-C. 4 -53 --. f- 3 7 14 6
TSr
Lou
Gin Al .
Pro
Thr
Phe Pro
Thr
Thr
Val
GIy
Val
Cys
TIP
Val
Pro
Ale
Asn
Ser
260 A-C-C .U-UG .C-A-A .GC-C .C-C-A . A-C-U . U-U-C .C-CG . A-C-C . A-CG .G-U-C.G-G-U .G-U-C .U-GC .U-rG.G-U-A .CCC.G-GC .A-A-U .U-CC . -49 10 ..- .44 .-+ .60- M3. .-5e- .eo- -4 712 -
Pro Vol Thr Pro Ale Gin II . Thr Lys Thr Tyr Gly Oly Gin Ill Pha C y . It* Gly Gly 320 C-C-A.G-UC.A-C-U .C-CC .G-CC .C-A-A .A-U-C.A-C-C .A-A-G .A-CC .U-A-U.G-G-U .G-G-C .C-A-G .A-UC.U-UC .U-G-C.A-U-U .G-G-C.G-GC. 241s 7 .- 33 - .63 . f 55 + 31 - f-53 - o14 .64
Ale It . Asn Thr Lou So, Pro II . Val Cys Pro Mat Met Asn Pro Lou Lys Lou Olu Arg 380 GC-C. A-UC.A-A-C.A-C-C .C U C .U-C-A .C-C-U.C-U-C .A-U-C .G-U-C .A-A-G .U-GC .CC-A .C-U-U.G-A-A .A-U-G .A-U-G.A-A-C .CCC.C-GG . 2 . f50 - .53 .---34 - -42 -. .59 . .--26
SO
Vol Lys Sar IIe Gin Ty r r Pro II . Sar II . Tis r Ala 01n Asp Lou Asp Lys Lou Lau 440 G-U-C .A-A-A .G-A-U .U-C-G .A-U-U.C-A-G .U-A-C .C-U-U .G-A-C .U-C-G .CCC .A-A-A .C-U-C .C-U-C .A-U-C .U-C-C.A-UC.A-CG.G-C-U .C-A-A. .-341, - -52 - .---39 - 43 -. -50 --- . 3
SO
Pro Thr A I a Pro Pro A l e S a r T h r Cys lie lie Thr Val So, G I y Thr L. u , al Hills 500 C-C-C. A-C-C .GC-U .C-C-C .C-C-C .G-CG.U-C-G .A-C-A .U-GC .A-U-A.A-U-A .A-C-U .G-U-A .U-C-A .GG-A .A-C-U .C-U -C .U-C-G .A-UG .C-A-C . -1 3 .-6o.-Go e4 .-- .40-y 15 .6s . 27 11
Sar
Pro
La .
Il .
Thr
Asp
Thr
So,
Thr
560
UC-U .GC-G.C-UC .A-U-C .A-C-G .G-A-C.A-C-U .ItC-C .A-C-C.U-A-A-G-U-U-C-UC-G-A-U-C-U-U-U-A-A-A-A-U-C-G-U-U-A-G-C-UC-G-C-C3 6 r51-. 8 t57--5s- -s6 - 24 . - -2
620 A-G-U-U-A-G-C-G-A-G-G-U-C-U-G-U-C-C-C-C-A-C-A-C-G-A-C-A-G-A-U-A-A-U-C-G-G-G-U - GC-A-A-C-U-C-C-C-G-C-C-C-C-U-C-U-U-C-C-G19 .49- .-32-. ..6316 --+ .-57- .64. .01- f58^+ 21
680 A-G-G-G-U-C-A-U-C-G-G-A-A-C-CGIA1 o . 1 .61 -41 -. -66 -

Figure 6. Nucleotide Sequence of TYMV Coat Protein mRNA
The distribution of TYMV coat protein mRNA sequences in the coding and noncoding regions is shown schematically at the top of the figure. T1 RNAase oligonucleotides are identified by number (see Figure 2B). The amino acid sequence determined by Peter et al. (1972) is indicated in the appropriate reading frame above the RNA sequence.
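The coordinates used in Figure 6 follow from simple arithmetic on the region lengths. The sketch below is an illustration under stated assumptions rather than part of the original analysis: the coding region is taken to begin at the AUG at positions 20-22, which is the reading under which the three region lengths sum to 695, and the helper names are hypothetical. It also locates the heterogeneous position 178 within its codon and compares the two reported variants of oligonucleotide t9.

```python
# Region lengths as reported in the text (cap not counted).
TOTAL_LENGTH = 695
REGION_LENGTHS = [("5' noncoding", 19), ("coding", 567), ("3' noncoding", 109)]

# The two variants of T1 oligonucleotide t9 given in the Results.
MAJOR_T9 = "CUUCUCUCACCAUCG"
MINOR_T9 = "CUUCUCUCACCAUUG"
ILE_CODONS = {"AUU", "AUC", "AUA"}  # isoleucine codons of the standard code


def region_bounds(lengths):
    """Return 1-based inclusive (start, end) coordinates for consecutive regions."""
    bounds, start = {}, 1
    for name, n in lengths:
        bounds[name] = (start, start + n - 1)
        start += n
    return bounds


def codon_position(pos, coding_start):
    """Return (first nucleotide of the codon, 1-based index of `pos` within it)."""
    offset = (pos - coding_start) % 3
    return pos - offset, offset + 1


bounds = region_bounds(REGION_LENGTHS)
codon_start, base_in_codon = codon_position(178, bounds["coding"][0])

# Index of the single difference between the two variants.
diff = [i for i, (a, b) in enumerate(zip(MAJOR_T9, MINOR_T9)) if a != b]
idx = diff[0]
# The slice below assumes the difference is the third base of its codon,
# which is what codon_position reports for nucleotide 178.
major_codon = MAJOR_T9[idx - 2:idx + 1]
minor_codon = MINOR_T9[idx - 2:idx + 1]

print(bounds, codon_start, base_in_codon, major_codon, minor_codon)
```

On this reading the substitution changes AUC to AUU, both isoleucine codons, in agreement with the statement that the heterogeneity does not affect the protein sequence.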
Table 2. Codons Found in the TYMV Coat Protein Cistron

First base U
  Phe  UUU  1     Ser  UCU  4     Tyr   UAU  1     Cys   UGU  0
  Phe  UUC  4     Ser  UCC  4     Tyr   UAC  2     Cys   UGC  4
  Leu  UUA  0     Ser  UCA  4     Term  UAA  1     Term  UGA  0
  Leu  UUG  1     Ser  UCG  4     Term  UAG  0     Trp   UGG  2

First base C
  Leu  CUU  2     Pro  CCU  2     His   CAU  2     Arg   CGU  1
  Leu  CUC  11    Pro  CCC  10    His   CAC  1     Arg   CGC  1
  Leu  CUA  2     Pro  CCA  5     Gln   CAA  5     Arg   CGA  0
  Leu  CUG  1     Pro  CCG  3     Gln   CAG  3     Arg   CGG  1

First base A
  Ile  AUU  2     Thr  ACU  6     Asn   AAU  1     Ser   AGU  0
  Ile  AUC  12    Thr  ACC  16    Asn   AAC  3     Ser   AGC  1
  Ile  AUA  2     Thr  ACA  1     Lys   AAA  5     Arg   AGA  0
  Met  AUG  4     Thr  ACG  3     Lys   AAG  2     Arg   AGG  0

First base G
  Val  GUU  3     Ala  GCU  5     Asp   GAU  2     Gly   GGU  2
  Val  GUC  9     Ala  GCC  8     Asp   GAC  5     Gly   GGC  4
  Val  GUA  2     Ala  GCA  1     Glu   GAA  5     Gly   GGA  2
  Val  GUG  0     Ala  GCG  1     Glu   GAG  1     Gly   GGG  0

The frequency of use of each codon is indicated. The initiator AUG is included. Amino acids showing a highly significant or marginally significant preference are underlined with solid or dashed lines, respectively.
of the corresponding region has been determined (Dasgupta and Kaesberg, 1977; Briand et al., 1977), none contain the poly(A) tail and only TMV RNA has the hexanucleotide AAUAAA (Guilley, Jonard and Hirth, 1975).

TYMV is one of those plant viruses whose genome RNA is processed in vivo to give a functional coat protein mRNA. Apparently, analogous processing events are involved in messenger production for Sindbis virus (Cancedda, Swanson and Schlessinger, 1974) and certain oncornaviruses (Pawson, Harwey and Smith, 1977). The reason such a strategy has been adopted by these viruses remains unknown, but it probably provides a means of regulating the production of the viral proteins, although the mechanism of regulation may differ from one messenger to another. With regard to the case of TYMV, however, it is possible to draw a 9 nucleotide base pairing scheme between the 3' extremity of the coat protein mRNA and the 5' terminus of the genome RNA (Briand et al., 1978). It is interesting to speculate that such secondary structure could hinder translation of the genome RNA in favor of the smaller species. Perhaps a similar mechanism exists for other genome-subgenome RNA systems.

Experimental Procedures

Preparation of the TYMV Coat Protein mRNA

Nonradioactive and 32P-labeled TYMV RNA were prepared as previously described (Briand et al., 1977). 10-15 mg RNA were dissolved in 5 ml of 50 mM Tris-HCl (pH 7.4), 1% SDS, heated at 60°C for 10 min and loaded onto a polyacrylamide-agarose column (Ultrogel ACA22 from LKB; 100 cm x 1.6 cm) equilibrated with 50 mM Tris-HCl (pH 7.4) buffer. The fractions coming off the column first correspond to the genome RNA, and those eluting later consist predominantly of coat protein mRNA (Klein et al., 1976). The coat protein mRNA-containing fractions were pooled, concentrated by ethanol precipitation and loaded onto a 4% polyacrylamide-8 M urea slab gel (20 cm x 40 cm x 0.4 cm). The electrophoresis buffer was 0.1 M Tris-borate, 2.5 mM EDTA (pH 8.3), and the gel was run at 400 V for about 16 hr. The band containing the coat protein mRNA was located on the gel using the autoradiogram as a guide, and the RNA was extracted as described earlier (Richards et al., 1978). The RNA was finally freed of co-precipitating acrylamide by precipitation with 10% perchloric acid (v/v) as reported by Jeppesen et al. (1972).

Sequencing Strategy

During the course of this work, we combined the advantages of two sequencing techniques: the classical technique described by Sanger, Brownlee and Barrell (1965) and Brownlee (1972), and a more recent procedure which consists of introducing a 32P label at the 5' extremity of nonradioactive RNA or RNA fragments (Lockard and RajBhandary, 1976). The use of totally 32P-labeled RNA allowed us first, to determine the molarity of all pancreatic and T1 RNAase oligonucleotides and to characterize them roughly, and second, to obtain a catalogue of a large number of subproducts by partial RNAase digestion of the RNA. The complete sequence of all oligonucleotides was obtained using 5' 32P-labeled oligomers. Sequences of up to 20 nucleotides were also read at the 5' extremity of various 5'-labeled partial digestion products as mentioned above.

Enzymatic Digestion

Total digestion of the RNA with pancreatic or T1 RNAase was carried out in 50 mM Tris-HCl (pH 7.4), 10 mM MgCl2 at 37°C for 30 min. Partial digestion of the RNA with pancreatic or T1 RNAase was performed either in 50 mM Tris-HCl (pH 7.4), 10 mM MgCl2 or in 0.1 M Tris-HCl (pH 7.4), 0.1 mM EDTA. Incubations were at 0°C for 10-30 min with enzyme to substrate ratios ranging from 1:1000 (w/w) to 1:20,000 (w/w) for the pancreatic RNAase and 1:250 (w/w) to 1:2500 (w/w) for the T1 RNAase. At the end of the hydrolysis, the enzyme was eliminated by phenol extraction.

5' End Labeling with 32P

5' end labeling of oligonucleotides or of partial digestion products of the RNA was carried out essentially according to Lockard and RajBhandary (1976) and Silberklang, Gillum and RajBhandary (1977a). Briefly, 1-20 µg of totally or partially digested RNA in 50 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 10 mM dithiothreitol (reaction volume 10-50 µl) were 5'-labeled with 5-20 µl of γ-32P-ATP (200 Ci/mM, 300 pM/µl) and several units of T4 polynucleotide kinase. The reaction mixture was incubated for 30 min at 37°C and then boiled briefly.

Partial Digestion with Nuclease P1

Partial digestion of the 5' 32P-oligonucleotides or RNA fragments with nuclease P1 was carried out as described by Silberklang et al. (1977a). The 5'-labeled material containing 100 µg carrier tRNA was dissolved in 20 µl of 50 mM ammonium acetate buffer (pH 5.3), and 1 µl nuclease P1 (5 µg/ml) was added. The reaction was allowed to proceed at room temperature; aliquots were taken after 2, 6 and 15 min and rapidly frozen in a siliconized test tube. The reaction mixture was finally heated to 100°C for several minutes before fractionation. Interpretation of the partial P1 digestion patterns was according to Silberklang et al. (1977a).

Fractionation of Oligonucleotides and RNA Fragments

Oligonucleotides were fractionated by either two-dimensional electrophoresis or two-dimensional electrophoresis-homochromatography: the first dimension was on cellulose acetate at pH 3.5 in the presence of 7 M urea; the second dimension on DEAE paper wetted with 7% formic acid or on thin-layer plates (CEL 300, DEAE/HR-2/15 from Macherey-Nagel) developed with homomixture C (3%, 30 min hydrolyzed). The technique of two-dimensional polyacrylamide gel electrophoresis described by Frisby et al. (1976) was used for the fractionation of the partial digestion products. The first dimension was through a 10% gel, the electrophoresis buffer being 25 mM citric acid, 6 M urea (pH 3.5). The second dimension was through a 20% gel with 0.1 M Tris-borate, 2.5 mM EDTA (pH 8.3) as electrophoresis buffer.
Acknowledgments
We are much indebted to Professor L. Hirth for his interest and support during the course of this work. We thank our colleagues K. E. Richards and G. Jonard for helping us at different stages of this work. Polynucleotide kinase and 32P-ATP were a generous gift from Dr. G. Keith. This work was supported in part by the Délégation Générale à la Recherche Scientifique et Technique, the Centre National de la Recherche Scientifique and the Commissariat à l'Energie Atomique. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received June 6, 1978

References
Baralle, F. E. and Brownlee, G. G. (1978). Nature 274, 84-87.
Briand, J. P. (1978). Ph.D. Thesis, Universite de Strasbourg, France.
Briand, J. P., Keith, G. and Guilley, H. (1978). Proc. Nat. Acad. Sci. USA, in press.
Briand, J. P., Jonard, G., Guilley, H., Richards, K. E. and Hirth, L. (1977). Eur. J. Biochem. 72, 453-463.
Brownlee, G. G. (1972). Determination of Sequences in RNA (Amsterdam: North-Holland).
Cancedda, R., Swanson, R. and Schlessinger, M. J. (1974). J. Virol. 14, 652-663.
Dasgupta, R. and Kaesberg, P. (1977). Proc. Nat. Acad. Sci. USA 74, 4900-4904.
Dasgupta, R., Shih, D. S., Saris, C. and Kaesberg, P. (1975). Nature 256, 624-628.
Davies, J. W. and Kaesberg, P. (1974). J. Gen. Virol. 25, 11-20.
Efstratiadis, A., Kafatos, F. C. and Maniatis, T. (1977). Cell 10, 571-585.
Fiers, W., Contreras, R., Duerinck, F., Haegeman, G., Iserentant, D., Merregaert, J., Min Jou, W., Molemans, F., Raeymaekers, A., Van den Berghe, A., Volckaert, G. and Ysebaert, M. (1976). Nature 260, 500-507.
Fitch, W. (1976). Science 194, 1173-1174.
Frisby, D. P., Newton, C., Carey, N. H., Fellner, P., Newman, J. F. E., Harris, T. J. R. and Brown, F. (1976). Virology 71, 379-388.
Giege, R., Briand, J. P., Mengual, R., Ebel, J. P. and Hirth, L. (1978). Eur. J. Biochem. 84, 251-276.
Guilley, H., Jonard, G. and Hirth, L. (1975). Proc. Nat. Acad. Sci. USA 72, 864-868.
Hagenbüchle, O., Santer, M., Steitz, J. A. and Mans, R. J. (1978). Cell 13, 551-563.
Hunter, T. R., Hunt, T., Knowland, J. and Zimmern, D. (1976). Nature 260, 759-764.
Jaspars, E. M. J. (1974). Adv. Virus Res. 19, 37-149.
Jeppesen, P. G. N., Barrell, B. G., Sanger, F. and Coulson, A. R. (1972). Biochem. J. 128, 993-1006.
Jonard, G., Briand, J. P., Bouley, J. P., Witz, J. and Hirth, L. (1976). Phil. Trans. Roy. Soc. Lond. B 276, 123-129.
Klein, C., Fritsch, C., Briand, J. P., Richards, K. E., Jonard, G. and Hirth, L. (1976). Nucl. Acids Res. 3, 3043-3061.
Koper-Zwarthoff, E. C., Lockard, R. E., Deweerd, B., RajBhandary, U. L. and Bol, J. F. (1977). Proc. Nat. Acad. Sci. USA 74, 5504-5508.
Kozak, M. and Shatkin, A. J. (1978). Cell 13, 201-212.
Lockard, R. E. and RajBhandary, U. L. (1976). Cell 9, 747-760.
Marotta, C. A., Wilson, J. T., Forget, B. G. and Weissman, S. M. (1977). J. Biol. Chem. 252, 5040-5053.
Mohier, E., Hirth, L., Le Meur, M. A. and Gerlinger, P. (1975). Virology 68, 349-359.
Pan, J., Reddy, V. B., Thimmappaya, B. and Weissman, S. M. (1977). Nucl. Acids Res. 4, 2539-2548.
Pawson, T., Harvey, R. and Smith, A. (1977). Nature 268, 416-420.
Peter, R., Stehelin, D., Reinbolt, J., Collot, D. and Duranton, H. (1972). Virology 49, 615-617.
Pinck, M., Yot, P., Chapeville, F. and Duranton, H. M. (1970). Nature 226, 954-956.
Pleij, C. W. A., Neeleman, A., Van Vloten-Doting, L. and Bosch, L. (1976). Proc. Nat. Acad. Sci. USA 73, 4437-4441.
Proudfoot, N. J. and Brownlee, G. G. (1976). Nature 263, 211-214.
Richards, K. E., Briand, J. P., Klein, C. and Jonard, G. (1977). FEBS Letters 74, 279-282.
Richards, K. E., Guilley, H., Jonard, G. and Hirth, L. (1978). Eur. J. Biochem. 84, 513-519.
Sanger, F., Brownlee, G. G. and Barrell, B. G. (1965). J. Mol. Biol. 13, 373-398.
Sanger, F., Air, G. M., Barrell, B. G., Brown, N. L., Coulson, A. R., Fiddes, J. C., Hutchinson, C. A., Slocombe, P. M. and Smith, M. (1977). Nature 265, 687-694.
Shine, J. and Dalgarno, L. (1975). Nature 254, 34-38.
Silberklang, M., Gillum, A. M. and RajBhandary, U. L. (1977a). Nucl. Acids Res. 4, 4091-4108.
Silberklang, M., Prochiantz, A., Haenni, A. L. and RajBhandary, U. L. (1977b). Eur. J. Biochem. 72, 465-478.
Steitz, J. A. and Jakes, K. (1975). Proc. Nat. Acad. Sci. USA 72, 4734-4738.
Szybiak, U., Bouley, J. P. and Fritsch, C. (1978). Nucl. Acids Res. 5, 1821-1831.