Cell, Vol. 11, 483-493, July 1977, Copyright © 1977 by MIT
Sequence from the Assembly Nucleation Region of TMV RNA
G. Jonard, K. E. Richards, H. Guilley, and L. Hirth
Laboratoire de Virologie, Institut de Biologie Moléculaire et Cellulaire du CNRS, Associé à l’Université Louis Pasteur de Strasbourg, 15, rue Descartes, 67000 Strasbourg, France
Summary
In an effort to isolate RNA sequences containing the assembly nucleation region, uniformly 32P-labeled tobacco mosaic virus RNA was partially digested with pancreatic ribonuclease, and the mixture of fragments was incubated with limited amounts of tobacco mosaic virus protein disks in conditions favorable for reconstitution. The RNA fragments which became encapsidated were purified and sequenced by conventional techniques. The sequence of the first 139 nucleotides of P1, the principal encapsidated fragment, is AGGUUUGAGAGAGAAGAUUACAAGCGUGAGAGACGGAGGGCCCAUGGAACUUACAGAAGAAGUUGUUGAUGAGUUCAUGGAAGAUGUCCCUAUGUCAAUCAGACUUGCAAAGUUUCGAUCUCGAACCGGAAAAAAGAGU. Residues 1-110 of P1 overlap the assembly origin isolated and characterized in the accompanying papers by Zimmern (1977) and Zimmern and Butler (1977). Our results, taken in conjunction with the two accompanying papers, define the sequence of much of the nucleation region as well as sequences flanking it on both sides. The features of the P1 sequence which may have a role in the nucleation reaction are discussed in detail in the text.
Introduction
The in vitro reconstitution of tobacco mosaic virus (TMV) from its RNA and protein components is among the most experimentally accessible systems of biological self-assembly, since both reactants and product have been the subject of extensive investigation. The initial step in reconstitution is the binding of a two-layer disk aggregate of TMV protein to the RNA chain (Butler and Klug, 1971). The seventeen protein subunits comprising a ring of the disk structure may interact in a coordinated manner with as many as 51 nucleotide residues, thus helping to ensure that the energy barrier against initiation of the helix is surmounted. In this paper, we describe experiments designed to characterize the portion of the RNA chain which reacts with the initial disk. The site of initiation (or nucleation) of assembly
should be the region of the RNA chain to be first coated during a productive reconstitution event. One method of isolating this segment of the chain is to perform partial reconstitution with very limited amounts of disk protein and to remove the unencapsidated RNA tail by treatment with ribonuclease (Zimmern, 1976, 1977; Zimmern and Butler, 1977). Although straightforward conceptually, this approach has proved to entail certain difficulties from the point of view of sequence determination in that the protected RNA is on the average considerably more than 200 nucleotides in length and is heterogeneous in size (Zimmern, 1977; Zimmern and Butler, 1977). Consequently, in this and earlier studies, we have taken an alternate approach: the RNA is first cut into pieces by moderate endonucleolytic digestion, and reactive fragments are selected from the mixture by incubation with disk protein. The advantage of such a procedure is that by judicious choice of partial digestion conditions, one can obtain discrete fragments of convenient size. The disadvantage is that other lines of evidence are necessary to show that the encapsidated sequence does in fact contain the site of initiation. In an earlier attempt to isolate the disk initiation site, we reacted a partial T1 RNAase hydrolysate of TMV RNA with limited amounts of disk protein. The principal species to be captured by the disk was a sequence of 105 nucleotides which has been referred to as T1 or SERF A (specifically encapsidated fragment A) (Guilley, Jonard and Hirth, 1974; Guilley et al., 1975b). Sequence analysis has revealed that T1 is a part of the coat protein cistron, coding for amino acids 95-129 of the coat protein (Richards et al., 1974). We regard it as improbable that selection of this fragment has occurred solely by chance. Possibly, T1 represents the remnant of the initiation site of an ancestral form of TMV.
In any event, its affinity for the disk notwithstanding, it is clear that the T1 sequence serves no such function in the present day virus, since it is not within the portion of the RNA molecule to be covered with protein in the early stages of in vitro assembly (Richards et al., 1975). In an accompanying paper, Zimmern and Butler (1977) describe how limited amounts of disk protein can bind to the nucleation region of TMV RNA and protect it from nuclease digestion. In a second accompanying paper, Zimmern (1977) reports the results of sequence studies on the encapsidated RNA. In this paper, we show that a fragment of 148 nucleotides which overlaps the nucleation region can be selected by reacting disks with a partial pancreatic ribonuclease hydrolysate of TMV RNA. We have determined the sequence of this fragment by conventional radiochemical techniques and
have noted certain of its properties which may be of importance for the reaction with the disk. Taken together, our investigations and the findings of Zimmern (1977) define the sequence of much of the nucleation region, as well as sequences flanking the nucleation site on both sides.
Results
Isolation and Some Properties of Partial Pancreatic Ribonuclease Digestion Products Encapsidated by TMV Protein
As mentioned in the Introduction, incubation of disk protein with a partial T1 RNAase hydrolysate of TMV RNA results in the capture of fragments of the coat protein cistron but not of RNA coming from the assembly nucleation region. Apparently, the nucleation sequence is a good substrate for T1 RNAase and is cut into nonreactive pieces during partial digestion. Accordingly, we tested the feasibility of generating a reactive fragment from the nucleation region by treatment of the RNA with pancreatic RNAase. A preliminary study in which TMV RNA was partially digested with pancreatic RNAase in proportions ranging from 1:50 to 1:5000 (enzyme:RNA by weight) showed that in all cases, a nucleoprotein complex formed when the hydrolysate was mixed with disks. In subsequent experiments, we chose to work at the highest possible ratio of enzyme to RNA to limit the size of the encapsidated fragments. Uniformly 32P-labeled TMV RNA was partially hydrolyzed with pancreatic ribonuclease (enzyme:RNA = 1:50; for conditions, see Experimental Procedures), and the mixture of fragments was allowed to reconstitute with about an equal weight of protein disks. The reconstituted material was freed of nonreacted RNA and protein by two cycles of ultracentrifugation. With regard to their physical properties, the purified products of reconstitution resembled the previously described nucleoprotein complexes containing fragment T1 and related nucleotides (Jonard, Guilley and Hirth, 1975).
In brief, the particles appeared in the electron microscope as rings and short rods with diameter the same as that of TMV. Most of the rods were 100-150 Å in length, which corresponds to 4-6 turns of TMV helix. Both the ultraviolet spectrum of the particles and their buoyant density in gradients of cesium chloride (data not shown) suggest an average RNA content between half and two thirds that of native TMV. The buoyant density of the nucleoprotein particles was unaffected by treatment with ribonuclease. The RNA sequestered within the nucleoprotein complexes was freed of protein by phenol extraction and electrophoresed through an 8% polyacrylamide gel. Figure 1 shows that most of the RNA migrated as one slowly moving band (fragment P1), but distinctly smaller amounts of as many as 20 faster moving bands could also be detected. Sequence analysis of the minor bands (see below) has revealed that all but the two smallest (P15 and P16) represent portions of the P1 sequence. Thus the great bulk of the RNA encapsidated under the prescribed conditions originates from a region comprising only about 2% of the RNA chain. Figure 1 also demonstrates that the profile of the encapsidated RNA (Figure 1A) is not at all like that of the starting material, the mixture of RNA fragments before reaction with disk protein (Figure 1B). From this we may conclude that the disk has discriminated among the components of the partial hydrolysate. It should also be noted that the principal selected fragment, P1, is one of the longest fragments in the starting mixture and hence must have a rather tight secondary structure, high purine content or both.

Figure 1. Polyacrylamide Gel Electrophoresis of a Partial Pancreatic RNAase Digest (for Conditions, see Experimental Procedures) of TMV RNA (B) and of the Fragments Selected from Such a Digest by Reaction with Disks of TMV Protein (A)
Electrophoresis was in 6% gel containing 0.1 M Tris-borate, 0.025 M EDTA (pH 6.3) plus 8 M urea. The sequences of most of the selected fragments are given in Figure 5. The band which migrates just beneath P1 was identical to P1, except for the absence of the last 9 nucleotides at the 3’ terminus. The other unidentified bands were either mixtures of two or more fragments which could not be purified or were not reproducibly present. Band P16 (see text) was run off the gel.
The P1 Sequence Is Encapsidated Early in the Course of Assembly, but the Extremities of the RNA Chain Are Not
Several lines of evidence have previously been taken to indicate that TMV assembly starts at the 5’ terminus of the RNA molecule. In particular, partially reconstituted rods appear to have but one RNA tail when examined in the electron microscope, suggesting that assembly commences at or very near an extremity of the RNA (Stussi, Lebeurier and Hirth, 1969; Butler and Klug, 1971; Ohno, Nozu and Okada, 1971; Guilley et al., 1972). The starting point was believed to be the 5’ end, since limited digestion of the RNA with a 5’-OH-specific exonuclease (spleen phosphodiesterase) eliminated its ability to reconstitute (Butler and Klug, 1971; Guilley et al., 1971), whereas treatment with a 3’-OH-specific exonuclease did not (Butler and Klug, 1971), and since partial reconstitution of TMV RNA whose 3’-OH terminus had been oxidized with periodate and then marked radiochemically did not protect the tagged end of the RNA from subsequent nuclease digestion (Thouvenel et al., 1971; Ohno et al., 1971). Evidently, if the above model of assembly is correct, it should, in principle, be a relatively simple matter to show that P1 contains the origin of assembly since methods to identify or label the 5’ extremity are available. Recently, however, it has been shown that the 5’ end of TMV RNA is capped with a m7G5’ppp5’Gp moiety (Zimmern, 1975; Keith and Fraenkel-Conrat, 1975). This finding invalidates the earlier arguments in favor of a 5’ terminal initiation site, since the cap protects the extremity from 5’ exonucleolytic attack, while the m7G group, with its unsubstituted cis-hydroxyl groups, should be susceptible to the chemical probes (such as periodate oxidation) previously believed to be specific for the 3’-OH end. Indeed, the fact that no periodate-sensitive groups are protected with protein during partial reconstitution implies that neither extremity is involved in initiation, as has been verified by direct sequence analysis of the 200-300 nucleotide region of RNA which is first coated with protein under limiting conditions of assembly (Zimmern, 1975, 1977). We have confirmed this observation and extended it to include incomplete particles in which, on the average, 1000 nucleotides have been encapsidated. Table 1 demonstrates that limited encapsidation of TMV RNA to form particles with an average length of about one sixth the length of intact virus leaves both the m7GpppGp moiety and the 3’ terminal oligonucleotide CCCAOH wholly exposed to the action of ribonuclease. If, on the other hand, assembly is allowed to proceed for several hours in the presence of excess protein, protection of the 3’ terminus attains 80% and of the 5’ terminus 50% of completion, where completion is defined as encapsidation of 0.86 mole 5’ end group and 1.0 mole 3’ end group per 6600 nucleotides worth of RNA (see Experimental Procedures). In native virus, both termini are totally resistant to nuclease (Table 1). Unlike the 5’ and 3’ extremities of the RNA chain, the P1 sequence is covered with protein during partial reconstitution. To show this, another preparation of incompletely reconstituted virus particles (average length 400 Å, corresponding to about 800 nucleotides encapsidated) was treated with ribonuclease to eliminate the RNA tails, and the protected RNA was extracted with phenol. The protected RNA, when partially digested with pancreatic RNAase, generated fragment P1 in amounts comparable to that obtained from an equivalent quantity of full-length RNA (Figure 2). Although, of course, the partially assembled rods are too long to eliminate strictly the possibility that P1 lies close to but does not overlap the nucleation region, this experiment nonetheless encouraged us to determine the sequence of the fragment.

Table 1. Protection of 5’ and 3’ Terminal End Groups of TMV RNA from Attack by RNAase in the Course of Partial and Total Reconstitution

                               Percentage of the RNA Chains in Which the End Group Is Protected
                               m7GpppGp                      CCCAOH
                               Experiment 1   Experiment 2   Experiment 1   Experiment 2
Native TMV                     100            99             100            100
Reconstituted TMV              45             51             05             80
Partially reconstituted TMV    8              0              0              0
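The length-to-nucleotide conversions quoted in this section (rods of 100-150 Å as 4-6 turns, 400 Å as about 800 nucleotides, P1 as about 2% of the chain) can be checked with a short calculation. The sketch below is illustrative only: the helix pitch of 23 Å and the figure of 49 nucleotides per turn (16 1/3 subunits per turn, three nucleotides per subunit) are standard TMV parameters assumed here rather than values stated in this paper, and the function names are hypothetical. The chain length of 6600 nucleotides is the one used above for the end-group yields.

```python
# Assumed TMV helix parameters (not stated in this paper).
PITCH_ANGSTROM = 23.0     # rise per helical turn
NT_PER_TURN = 49.0        # 16 1/3 subunits per turn x 3 nucleotides per subunit
GENOME_NT = 6600          # chain length used in the text for end-group yields


def turns_from_length(length_angstrom):
    """Number of helical turns in a rod of the given length."""
    return length_angstrom / PITCH_ANGSTROM


def nucleotides_from_length(length_angstrom):
    """Approximate number of nucleotides encapsidated in a rod of the given length."""
    return turns_from_length(length_angstrom) * NT_PER_TURN


def fraction_of_genome(n_nucleotides):
    """Fraction of the full RNA chain represented by a fragment."""
    return n_nucleotides / GENOME_NT


if __name__ == "__main__":
    for length in (100, 150, 400):
        print(length, "A:", round(turns_from_length(length), 1), "turns,",
              round(nucleotides_from_length(length)), "nucleotides")
    print("P1 (148 nt) as a fraction of the chain:",
          round(fraction_of_genome(148), 3))
```

Under these assumptions the 400 Å particles come out slightly above the "about 800 nucleotides" quoted in the text, which is within the precision of an average length measured in the electron microscope.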
Figure 2. Polyacrylamide Gel Electrophoresis of RNA Fragments Encapsidated When a Partial Pancreatic RNAase Digest of Intact TMV RNA (B) or an Equivalent Amount of the RNA Extracted from Short Incompletely Reconstituted Particles (A) Was Reacted with Disk Protein
Conditions of gel electrophoresis are as in Figure 1.
Nucleotide Sequence of P1
The T1 and pancreatic ribonuclease fingerprints of P1 are shown in Figure 3; the sequences of the oligonucleotides are given in Tables 2 and 3. A feature which merits attention is the exceptional number of large pancreatic ribonuclease oligonucleotides found in the fragment: two dodecanucleotides, one nonanucleotide and three heptanucleotides (Table 2). Such purine-rich tracts are not unexpected in a sequence which is rather resistant to pancreatic ribonuclease, although it should be noted that the overall purine content (61%) is not remarkably higher than that of total TMV RNA (55%). Comparison of the oligonucleotide catalogue of P1 with that of the RNA protected by limited reconstitution (Zimmern, 1976, 1977) reveals a high degree of correspondence. In particular, four of the five long pancreatic oligonucleotides of P1, GAGAGAGAAGAU, GGAGGGC, AGAAGAAGU and GGAAGAU, are present in near molar yields in Zimmern’s protected RNA. The fifth long pancreatic oligonucleotide, GGAAAAAAGAGU, was only detected once by Zimmern, suggesting that it lies outside the initiation zone proper. With respect to the large T1 RNAase oligonucleotides, there is good agreement except for AUUACAAG (Zimmern has AUUACAAACG instead), ACUUG, AUCUCG and UCAAUCAG, which are present in P1 but missing from Zimmern’s protected material.

Figure 3. Pancreatic RNAase (A) and T1 RNAase (B) Fingerprints of Fragment P1
The oligonucleotides are numbered in the accompanying plans (C, D) and identified in Tables 2 and 3.

Tables 2 and 3 give the oligonucleotide catalogues of two of the shorter specifically encapsidated fragments, P6 and P10. It is readily seen that both fragments derive from P1, as indeed was also the case for the other fragments P2-P14 inclusive (data not shown). It remains uncertain whether the subfragments are created during the original partial pancreatic ribonuclease digestion and are encapsidated and co-purified along with P1, or whether they are degradation products of purified P1 caused by traces of pancreatic ribonuclease carried through the purification procedure. We
Table 2. End Products of Pancreatic RNAase Digestion of Fragments
Relative
Molar
Fragment
PI
spot
Sequence
Found
PI
u
10.5
P2
C
12.7
P3
AC
1.9
P4
GC
P5
AU
P6
AAU
P7
pa
Pl,
P6 and PI0
Yield Fragment Expected
Fragment
P6
PlO
Found
Expected
Found
Expected
14
a.2
5
a.0
a a
4.6
14
1.9
2
2
2.0
2
2.1
2
1.1
1
0.8
1
3.1
3
3.2
3
1.1
1
1 .o
1
1 .o
1
GAAC
0.9
1
AGAC
0.9
1
0.9
1
P9
AAGC
t 0.9
1
I 0.9
1
0.9
1
PlO
GU
6.7
6
4.3
4
1.3
1
Pll
GAU
3.0
3
1.3
1
Pi2
GGAAC
1 .o
1
0.9
1
0.9
1
PI3
AAAGU
1 .o
1
PI4
GAGAGAC
1 .o
1
0.8
1
0.8
1
PI5
AGGU
1 1 .o
1
P16
GAGU
1 .o
1
1.2
1
P17
GGAAGAU
0.9
1
pia
GAGAGAGAAGAU
P19
1 .o
0.9
1 1 1
0.8
1
0.8 0.8
1
GGAGGGC
0.8 0.8
P20
AGAAGAAGU
1 .o
1
1 .o
1
0.9
1
P21
GGAAAAAAGAGU
0.9
1
1
1
The relative molar yields of each oligonucleotide as determined experimentally (the average of at least three separate determinations) and as calculated from the final primary structure are indicated in the “found” and “expected” columns, respectively. Bracketed oligonucleotides were not separated on the fingerprint and were counted together.
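The "expected" columns of Tables 2 and 3 are what a complete digestion of the final sequence should give. As a rough cross-check, the sketch below performs an in silico complete digest of the first 139 nucleotides of P1 as given in the Summary, assuming the usual specificities: pancreatic RNAase cutting on the 3’ side of pyrimidines (C, U) and T1 RNAase cutting on the 3’ side of G. The function and variable names are hypothetical, and because the last 9 residues of the 148-nucleotide fragment are not included, the counts for some short products will be lower than the tabulated values for P1.

```python
from collections import Counter

# First 139 nucleotides of P1, as printed in the Summary, split for readability.
P1_FIRST_139 = (
    "AGGUUUGAGAGAGAAGAUUACAAGC"
    "GUGAGAGACGGAGGGCCCAU"
    "GGAACUUACAGAAGAAGUUGUUG"
    "AUGAGUUCAUGGAAGAUG"
    "UCCCUAUGUCAAUCAGACUUGC"
    "AAAGUUUCGAUCUCGAACCGGAAAAAAGAGU"
)


def complete_digest(rna, cut_after):
    """Split rna on the 3' side of every residue found in cut_after."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base in cut_after:
            fragments.append(rna[start:i + 1])
            start = i + 1
    if start < len(rna):          # trailing piece with no cut site at its end
        fragments.append(rna[start:])
    return fragments


# Assumed enzyme specificities (see lead-in).
pancreatic = complete_digest(P1_FIRST_139, "CU")
t1 = complete_digest(P1_FIRST_139, "G")

pancreatic_catalogue = Counter(pancreatic)
t1_catalogue = Counter(t1)

if __name__ == "__main__":
    long_products = sorted((f for f in pancreatic if len(f) >= 7), key=len)
    print("Long pancreatic products:", long_products)
    print("Large T1 products:", sorted(f for f in t1 if len(f) >= 5))
```

The long pancreatic products recovered this way are the two dodecanucleotides, one nonanucleotide and three heptanucleotides singled out in the text.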
currently favor the former explanation, since the relative abundance of the subfragments is not like the pattern of abundance found in a partial pancreatic ribonuclease digest of purified P1, and since rebinding experiments with the purified subfragments using the nitrocellulose filter binding assay described by Zimmern (1976) have shown that most, if not all, can bind independently to the disk. But whatever their origin, the availability of such natural partial digestion products in large quantities greatly facilitated resolution of the sequence. P14 (41 nucleotides) was the smallest naturally occurring subfragment of P1 which we were able to isolate. T1 RNAase fingerprints of this fragment contained molar amounts of C (and no other G-lacking oligonucleotide), indicating that P14 has GC at its 3’ end. P13 gave the same T1 RNAase catalogue plus one UUG and submolar UG, whereas P11 contained two UUGs (no UG) and the supplementary product AGAAGAAGU in the pancreatic RNAase fingerprint. Thus the sequence to the immediate left of P14 may be written AGAAGAAGUUGUUG. P12 contains AGAAGAAGU plus more nucleotides to the left and allows the sequence to be further extended in the 5’ direction to GGAACUUACAGAAGAAGUUGUUG (nucleotides 46-68) (Figure 4). P12 and P6 both contain the new T1 RNAase oligonucleotide ACAAG which necessarily derives from AUUACAAG and must be at the 5’ end of each fragment. AUUACAAG appears complete in P10 and P2 accompanied by UUUG; simultaneously, the products AGGU and GAGAGAGAAGAU appear in the pancreatic RNAase catalogue. This suggests that the sequence flanking P12 at its 5’ end is AGGUUUGAGAGAGAAGAUU (1-19), which is confirmed by the short pancreatic RNAase partial digestion fragments coming from this region (see Figure 4). At this stage of the analysis, the sequence to the left of P14 may be written AGGUUUGAGAGAGAAGAUUACAAGC(GU,GAGAGAC,GGAGGGCCCAU)GGAACUUACAGAAGAAG-
Table
3. End Products
of Tl
spot
Sequence
t1
G
t2
CG
t3
AG
t4 t5
Ribonuclease
Digestion
of Fragments
Pl,
Relative
Molar
Fragment
PI
Yield Fragment
P6
Fragment
Expected
Found
7.7
7
1 .l
1
10.2
9
ACG
0.9
1
1 .o
AAG
4.2
4
3.8
t6
AAACG
0.9
1
t7
CAAAG
1 .o
1
ta
UG
2.0
2
1.2
1
1.5
t9
UCCG
0.7
1
t10
AUG
3.3
3
2.5
‘2
t11
CCCAUG
0.8
1
1 .o
1
t12
AAAAAAG
0.8
1
t13
UUG
2.0
2
2.2
2
t14
ACUUG
1 .o
1
1 .o
1
t15
AUCUCG
0.8
1
t16
UUUG
1 .o
1
UUUCG
1 .o
1
t17
Found
P6 and PI0
Expected
Found
Expected
5.6
5
4.7
5
1 .l
1
1 .I
1
5.8
4
6.2
7
1
1 .o
1
3
2.8
3
0.7
1.3
t18
UUCAUG
0.9
1
1 .2
t19
UCCCUAUG
0.8
1
1 .o
t20
AUUACAAG
0.9
1
t21
AACUUACAG
0.8
1
0.8
1
t22
UCAAUCAG
0.8
1
0.8
1
t23
U
1 .o
1
C
0.8
1
ACAAG
1 .o
1
0.8 0.8
0.9
a Oligonucleotide ACAAG in fragment P6 is the 5’-OH terminus of the fragment. It is derived from AUUACAAG. The relative molar yield of each oligonucleotide as determined experimentally (the average of at least three separate determinations) and as calculated from the final primary structure is indicated in the “found” and “expected” columns, respectively.

UUGUUG-P14. Turning to the sequence which extends beyond the 3’ end of P14, fragments P8 and P9 contained four previously unmentioned T1 RNAase oligonucleotides, CAAAG, UUUCG, AACCG and AUCUCG, and four new pancreatic RNAase oligonucleotides, GAAC, AAAGU, GAU and GGAAAAAAGAGU. The 3’ terminus of both fragments was U. The sequence must therefore be P14-AAAGUUUCG(AUCUCG,AACCG)GAAAAAAGAGU. Finally, P3 and P5, which overlap P14 from the left but terminate before reaching CAAAG, solved the 3’ terminal portion of P14: UCCCUAUGUCAAUCAGACUUGC (87-107). In summary, the sequence of P1, insofar as it can be deduced from the naturally occurring subfragments, is AGGUUUGAGAGAGAAGAUUACAAGC(GU,GAGAGAC,GGAGGGCCCAU)GGAACUUACAGAAGAAGUUGUUG(AUG,AGUUCAUG,GAAGAUG)UCCCUAUGUCAAUCAGACUUGCAAAGUUUCG(AUCUCG,AACCG)GAAAAAAGAG(UG,UCCG)U. The final sequence of the principal subfragments is shown in Figure 4. To complete solution of the sequence, purified fragment P1 was partially digested with pancreatic RNAase (see Experimental Procedures), and the resulting fragments were purified by gel electrophoresis. The products of partial digestion provided enough additional overlaps to align unambiguously all oligonucleotides apart from a short stretch at the 3’ extremity (Figure 4). The final sequence consists of 148 nucleotides; it does not overlap any portion of the cistron for the coat protein, the only TMV protein whose sequence is known. Four UGA stop signals are present in P1 but are confined to two of the three possible reading frames. Thus the possibility remains open that all or part of the P1 sequence codes for viral protein. A plausible secondary structure for the sequence is given in Figure 5. The P1 sequence falls into two parts: the 5’ terminal portion of the sequence (position 1-62) is 73% purine and has a 20 base stretch free of cytidine at the left-hand end; the rest of the sequence is more normal in base composition, although a short purine tract (GGAAAAAAGAG) is also to be found at the 3’ extremity (position 128-138). In addition to P1 and subfragments P2-P14, two fragments which do not derive from the P1 sequence were sometimes captured by the disk. For fragment P16, the data suggest the sequence AAAGGAAAAAUUAGUAGU, where a fraction of the molecules (30-80% in three separate experiments) has a G rather than an A at position 6. No sequence resembling P16 was found by Zimmern within the RNA protected by limited reconstitution (see Zimmern, 1977). However, by using milder conditions of partial pancreatic RNAase digestion before reaction with disk protein, we have isolated fragments of about 210 nucleotides which encompass both P1 and P16. Hence P16 must be no more than 40 nucleotides from the 3’ end of P1. P15 has not yet been completely sequenced or localized, but it does not appear to correspond to any part of the protected RNA characterized by Zimmern (1977).

Figure 4. Sequence of Fragment P1
The sequences of the principal subfragments are indicated above the final sequence, and some of the pancreatic RNAase partial digestion products used to deduce the final sequence are indicated below it. The positions where Strasbourg and Cambridge common strain TMV differ by a purine to purine base shift are marked by stars.

P1 Overlaps the Origin of Assembly
In the accompanying paper and elsewhere, Zimmern (1976, 1977) has described some of the properties of the RNA protected during limited reconstitution. Although fragments as small as 50 nucleotides could be detected, the shortest sequence of RNA protected in appreciable quantities is 250-300 nucleotides in length, corresponding to 5-6 turns of nucleoprotein helix. Lowering the ratio of protein to RNA did not cause a shorter region of RNA to be protected, but only lowered the yield of the 250-300 nucleotide material. Zimmern refers to all this favored sequence as the nucleation region, since all of it is apparently coated on formation of a stable particle. In view of the substantial (but not perfect) agreement between their oligonucleotide catalogues, we anticipated a considerable overlap between P1 and the nucleation region isolated by Zimmern (Zimmern 1976, 1977). Comparison of Figure 4 with Figure 5 of the accompanying paper (Zimmern, 1977) proves that this is indeed the case: the first 110 nucleotides of P1 correspond to the 3’ terminal portion of Zimmern’s nucleation region. This is the part of the nucleation region which is believed, on the basis of Zimmern’s sequence analysis of the shortest protected fragments, to contain the origin of assembly, that is, the region of the RNA that binds to the initial disk (Zimmern, 1977). After binding of the first disk, rod growth proceeds rapidly toward the 5’ end of the RNA but much more slowly in the opposite direction, thus explaining why the oligonucleotides AUCUCG and GGAAAAAAGAGU of P1, both of which lie on the 3’ side of the assembly origin, were underrepresented or not present at all in the RNA protected by limited reconstitution. Examination of the finished sequences also reveals that the rest of the apparent discrepancies in the T1 RNAase catalogue can be satisfactorily accounted for by purine to purine base shifts between Cambridge and Strasbourg common strains of TMV (see Figure 4). Zimmern did not find any pure partial digestion products which contained the oligonucleotides AGAAGAAGU, CAAAG and UUG, forcing him to rely upon indirect arguments to position them. Nevertheless, his conclusions are in agreement with our placement of these products based upon more
straightforward lines of evidence. The sequence of P1 allows us to fill in the gap in Zimmern’s sequence (Zimmern, 1977) with a second UUG and indicates that the possible UG in fragment P13 of Zimmern’s sequence, the possible AUG in P16 and the other questionable products are definitely not there. Keeping in mind that one might reasonably expect the sequence in the nucleation region to have translational symmetry to fit into the repeating matrix of binding sites on the disk, it is evidently important that doubts concerning the existence of such products be dispelled.

Figure 5. Possible Secondary Structure of Fragment P1
The preferential sites of action of pancreatic RNAase are indicated. An AUG is present at one or the other positions marked by inverted triangles at the 3’ end.

Discussion
The evidence presented both in this paper and by Zimmern (1975, 1977) shows that contrary to previously held notions, neither the 5’ end nor the 3’ end of TMV RNA is covered by coat protein during the early stages of assembly. This observation taken by itself could be misleading, since it is difficult to rule out the possibility that the ends of an incomplete particle might unravel slightly in the course of purification. There are, however, other lines of evidence supporting the simple interpretation that assembly does not start near an extremity. In particular, no part of the assembly nucleation sequence contains the coat protein cistron, which is known to be within 1000 nucleotides of the 3’ end of TMV RNA (Hunter et al., 1976). Furthermore, there is no overlap between even the most 5’ distal (with respect to P1) of the tracts of sequence from the nucleation region (see Zimmern, 1977) and a fragment of 250 nucleotides containing the 5’ terminal m7GpppGp group (K. E. Richards and G. Jonard, unpublished observations). Hence the assembly initiation site can be no nearer than about 500 nucleotides to the 5’ end of TMV RNA, while it must be separated by at least the length of the coat protein cistron from the 3’ end. Treatment of TMV with dimethylsulfoxide preferentially removes protein subunits starting from the end of the particle containing the 3’ terminus of the RNA (Nicolaïeff et al., 1975; G. Lebeurier, further observations). The RNA extracted from stripped particles of 2000 Å or less from which the RNA tail has been eliminated by RNAase digestion has largely lost its ability to reconstitute (Nicolaïeff et al., 1975; Richards et al., 1975). The original interpretation of this finding was colored by the then current belief that TMV assembly started at the 5’ terminus, leading to the suggestion that the stripped particles had lost subunits from the 5’ end as well (Nicolaïeff et al., 1975).
We now believe it more probable, however, that the loss of ability to reconstitute was caused by elimination of the 2000 nucleotide 3’ terminal RNA tail rather than a short 5’ terminal one. Thus we would argue that the assembly initiation site is to the left of the coat protein cistron but probably no more than 2000 nucleotides from the 3’ end of the RNA. On the basis of the T1 RNAase oligonucleotide catalogue of short alkali-stripped particles, Zimmern and Butler (1977) have mapped P1 at about 1000 nucleotides from the 3’ end, which is compatible with the above conclusion. If reconstitution starts within the interior of the RNA chain, protein must evidently be able to add at both ends of the nucleation complex, although the rate of elongation in each direction may vary depending upon the part of the RNA molecule being covered or the state of aggregation of the protein. The sequence data show that growth goes preferentially toward the 5’ end of the RNA during the early stages of reconstitution (Zimmern, 1977), which is readily reconciled with the observation that the part of the RNA containing the coat protein cistron (that is, the 3’ end region) remains unencapsidated in the early stages of assembly (Richards et al., 1975). In summary, all available evidence predicts that short, partially reconstituted particles should have two sizable RNA tails, this in spite of the earlier electron microscope studies showing only one. Conceivably, the two tails may have a tendency to stick to one another so as to appear as one thread of RNA in the electron microscope. Indeed, recent electron microscope observations by Lebeurier and her colleagues suggest that this is the case. When partially reconstituted virus particles are deposited on the electron microscope grid under conditions designed to minimize RNA self-interactions, the great majority of the particles can be seen to have two tails (Lebeurier, Nicolaïeff and Richards, 1977). Interestingly enough, the two tails always appear to protrude from the same end of the rod. There are several possible ways of accounting for this observation, but the explanation which we regard as the most probable is that one of the two RNA tails traverses the length of the particle by passing along the central channel. If this hypothesis is correct, it implies that the RNA probably inserts itself between layers of the disk from the inside, that is, from within the central hole, as suggested by Champness et al. (1976) on the basis of their crystallographic studies of the disk structure.
This in turn implies that the region of the RNA involved in nucleation of assembly should have a loop structure, since it is difficult to imagine how the RNA could enter the central hole of the initiating disk without folding back on itself. Hence it is gratifying to note that P1 can be cast into a stable hairpin loop structure (Figure 5).

Under most conditions of reconstitution, TMV protein reacts preferentially with its own RNA. The structure of the finished virion is such that the RNA is sandwiched between helical layers of protein. Each subunit is associated with three successive nucleotides of the RNA chain in the turn below and three in the turn above. Nothing is known of the nucleotide bonding domains of the subunit, but it is readily imagined that one or more of them may interact preferentially with a certain base or bases. Evidently, even a weak base specificity at the level of the individual bonding domains could provide the foundation for sequence specificity in the multisite nucleation reaction between RNA and protein disk. The favored sequence would be expected to have a periodicity of order three, reflecting the trimodal repeat of bonding sites upon the disk.

Insofar as fragments P1 and T1 both have affinity for the disk, comparison of their sequences should help to focus attention on features of importance for recognition. One reservation, however, must be made. Although the specific binding of such fragments to the disk will undoubtedly reflect many characteristics of the actual assembly nucleation reaction, additional signals which are not possessed (or expressed) by the short fragments may have a role in the interaction between disk and intact RNA. In particular, although both P1 and T1 bind to the disk and become RNAase-resistant in doing so, we cannot be sure that they are competent to trigger the complete disk-lockwasher conversion.

Comparison of the sequences of P1 and T1 has not brought to light any significant blocks of sequence homology, nor is there a strict repetition of a trinucleotide motif in either sequence. Table 4 demonstrates, however, that there is a strong tendency in the part of P1 which overlaps the nucleation region (residues 1-108) for every third position to be occupied by a purine (particularly G). Since this portion of the sequence has a rather high content of purines, the regularity may conceivably be coincidental [the probability of 29 of 36 purine-headed triplets occurring by chance in a random sequence of this length containing 62% purine is about 1% (A. C. H. Durham, personal communication)]. Arguing against such a conclusion, however, is the fact that an even stronger trimodal repeat of purines (30 of 35 triplets started by a purine) is to be found in T1. The probability of such a regularity arising in a random sequence of this length by chance is still lower (P = 0.05%). Note that in both P1 and T1, the triplet reading frame which is headed by purines is also that which is free of nonsense codons.
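The triplet tallies discussed above can be rechecked directly from the sequence. The following Python sketch is an illustration, not the authors' procedure: it counts purines at each of the three triplet positions in residues 1-108 of P1 and evaluates a simple binomial tail for the most purine-rich position. The function names are hypothetical, and the binomial model (each triplet treated as an independent trial, with the overall purine fraction as the success probability) is an assumption made for this example; the calculation communicated by A. C. H. Durham is not described in the text and may differ.

```python
from math import comb

# Residues 1-108 of fragment P1, as printed with Table 4 (dots removed).
P1_1_108 = (
    "AGGUUUGAGAGAGAAGAUUACAAGCGUGAGAGACGG"
    "AGGGCCCAUGGAACUUACAGAAGAAGUUGUUGAUGA"
    "GUUCAUGGAAGAUGUCCCUAUGUCAAUCAGACUUGC"
)

PURINES = {"A", "G"}


def purines_by_position(seq):
    """Count purines at triplet positions 1, 2 and 3 (frame starts at residue 1)."""
    counts = [0, 0, 0]
    for i, base in enumerate(seq):
        if base in PURINES:
            counts[i % 3] += 1
    return counts


def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials (assumed model)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


counts = purines_by_position(P1_1_108)
n_triplets = len(P1_1_108) // 3
purine_fraction = sum(counts) / len(P1_1_108)

# Chance of seeing at least the observed purine count at one fixed position.
p_chance = binomial_tail(n_triplets, max(counts), purine_fraction)

print("purines at positions 1, 2, 3:", counts)
print("overall purine fraction:", round(purine_fraction, 2))
print("binomial tail probability:", p_chance)
```

The purine counts come out as 21, 29 and 17 of 36 for positions 1, 2 and 3 of the frame printed in Table 4, with an overall purine content of 62%, in agreement with the figures quoted in the text. Under the assumed model the tail probability is of the order of 1 to 2%.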
Table 4. Base Abundancy According to Position in Successive Triplets of Fragment P1 (Positions 1-108)

        Position 1    Position 2    Position 3
G       0.25          0.47          0.19
A       0.33          0.33          0.28
U       0.25          0.08          0.36
C       0.17          0.11          0.17

Sequence: AGG.UUU.GAG.AGA.GAA.GAU.UAC.AAG.CGU.GAG.AGA.CGG.AGG.GCC.CAU.GGA.ACU.UAC.AGA.AGA.AGU.UGU.UGA.UGA.GUU.CAU.GGA.AGA.UGU.CCC.UAU.GUC.AAU.CAG.ACU.UGC.

Near the center of P1 falls the 30 nucleotide sequence GAA.GAA.GUU.GUU.GAU.GAG.UUC.AUG.GAA.GAU (residues 56-85), in which 8 of the 10 base triplets start with G and 6 have A in the second position. Zimmern (1977) has suggested that this part of the sequence has a crucial role in binding the first disk. In this regard, it is noteworthy that all the naturally occurring subfragments of P1 extend over at least a portion of this sequence, suggesting that it does indeed have a special function in the recognition process. Furthermore, the sequence in question falls near the top of the hairpin loop secondary structure of P1 (Figure 5) and so would be strategically placed to insert itself between the layers of the initiating disk. It should be stressed, however, that the hairpin structure, although it may have a role in the nucleation reaction with intact RNA or P1, is not obligatory for formation of a disk-fragment complex, since subfragments of P1, such as P10 or P7, which correspond to only one or the other side of the hairpin loop, can still bind to the disk (G. Jonard and K. E. Richards, further observations).

The above considerations all point to a special role for purines in the recognition process and suggest that at least one of the nucleotide binding domains of the subunit has a preference for purines, perhaps G in particular. The strong purine (G) bonding domain may be flanked by a weaker A bonding domain. Furthermore, we expect that C exerts a negative influence, since it is significantly underrepresented in much of the nucleation region. Finally, the tertiary structure of the intact RNA may further intervene by presenting the binding sequence in a position favorable for interaction with the disk. The nature of the signal which provokes rearrangement of the planar disk structure into the helical (lockwasher) conformation is unclear and will probably remain so until more about the mechanism of the disk-lockwasher conversion is known.
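The position and composition of this 30 nucleotide stretch can be confirmed from the P1 sequence given in the Summary. The short Python sketch below is offered only as a check on the counts quoted above; the variable names are chosen for this example.

```python
# First 139 nucleotides of fragment P1, as given in the Summary.
P1 = (
    "AGGUUUGAGAGAGAAGAUUACAAGCGUGAGAGACGGAGGGCCCAUGGAAC"
    "UUACAGAAGAAGUUGUUGAUGAGUUCAUGGAAGAUGUCCCUAUGUCAAUC"
    "AGACUUGCAAAGUUUCGAUCUCGAACCGGAAAAAAGAGU"
)

# The 30 nucleotide sequence quoted in the text, dots removed.
segment = "GAAGAAGUUGUUGAUGAGUUCAUGGAAGAU"

# Locate the segment using 1-based residue numbering.
start = P1.find(segment) + 1
end = start + len(segment) - 1

# Split into successive triplets and tally the two features of interest.
triplets = [segment[i:i + 3] for i in range(0, len(segment), 3)]
g_first = sum(t[0] == "G" for t in triplets)
a_second = sum(t[1] == "A" for t in triplets)

print(f"residues {start}-{end}")
print(f"{g_first} of {len(triplets)} triplets start with G")
print(f"{a_second} of {len(triplets)} triplets have A in the second position")
```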
If we grant that some common feature of the T1 and P1 sequence, such as the above mentioned triplet repeat of purines, is partly or wholly responsible for the heightened affinity of these fragments for the disk, we are still faced with the question of why one sequence serves as the site of nucleation during virus assembly and the other does not do so. Using the filter binding assay, we have found that T1 and P1 complex with the disk at comparable rates (K. E. Richards and G. Jonard, unpublished observations), making it improbable that selection is mediated solely by the kinetics of binding. Possibly, intact TMV RNA is folded in such a way as to make the T1 sequence inaccessible. Alternatively, discrimination may take place subsequent to initial binding, perhaps, as suggested by Zimmern (1977), at the stage of translocation of the disk to the lockwasher form. Whatever the mechanism of selection, it is evident that the nucleation reaction has proved to be unexpectedly complex. Thus in spite of our knowledge of the initiating sequence and our growing understanding of the three-dimensional structure of the disk (Champness et al., 1976), much still remains to be learned about exactly how the two components interact.

Experimental Procedures
Partial Pancreatic RNAase Digestion of TMV RNA and Purification of Those Fragments with Affinity for the TMV Protein Disk

32P-labeled TMV RNA and disks of TMV protein (25S protein) were prepared as described earlier (Guilley et al., 1975b). Partial pancreatic RNAase digestion of TMV RNA was performed in 0.075 M sodium pyrophosphate (pH 7.25) (reconstitution buffer) at 0°C. The ratio of enzyme to RNA was 1:50 by weight. After 10 min, digestion was stopped by phenol extraction, and the phenol was eliminated from the aqueous phase by extraction with ether. The partially hydrolyzed RNA was then incubated with about an equal weight of 25S protein in reconstitution buffer for 2-4 hr at room temperature, and the nucleoprotein complexes which formed were separated from nonreacted RNA and protein by two cycles of ultracentrifugation (Guilley et al., 1975b). After purification, the encapsidated RNA fragments were freed of protein by phenol extraction and separated from one another by polyacrylamide gel electrophoresis in the presence of urea. Details of the experimental procedures for polyacrylamide gel electrophoresis and the methods used for further purifying the fragments have been given elsewhere (Guilley et al., 1975b).

Sequence Determination

Sequence analysis of RNA fragments was modeled after the approach developed by Sanger and his colleagues as applied in our laboratory (Guilley et al., 1975b). Products of total T1 RNAase and pancreatic RNAase were separated by two-dimensional electrophoresis. Electrophoresis in the first dimension was on cellulose acetate at pH 3.5 in the presence of 7 M urea; the second dimension was electrophoresis on DEAE paper moistened with 7% formic acid.

T1 RNAase oligonucleotides were characterized by some or all of the following procedures: total or partial digestion with pancreatic RNAase, alkaline hydrolysis, total venom phosphodiesterase digestion of the dephosphorylated oligonucleotide, modification with CMCT followed by hydrolysis with pancreatic RNAase and digestion with U2 RNAase. Help in deducing the correct sequence of t19 (UCCCUAUG) and t17 (UUUCG) also came from partial venom phosphodiesterase hydrolysis of the oligonucleotide 32P-labeled at the 5’ position with polynucleotide kinase. Pancreatic RNAase oligonucleotides were characterized by partial or total digestion with T1 RNAase, alkaline hydrolysis and venom phosphodiesterase hydrolysis of the dephosphorylated product. Oligonucleotide P15 was first assigned the sequence GGG(AG,G)C upon the basis of a very faint T1 RNAase partial digestion product. However, two-dimensional fractionation of a partial P1 nuclease digest of the 5’-32P-labeled oligonucleotide gave the sequence GGAGGGC (H. Guilley, personal observations) in accord with the sequence reported by Zimmern (1977).

To order the oligonucleotides in the sequence, purified fragment P1 was partially digested with pancreatic RNAase (enzyme:RNA = 1:100,000 by weight) for 10 min at 0°C, and the breakdown products were purified by gel electrophoresis and fingerprinted. Consideration of the overlaps among the partial digestion products plus those provided by the naturally occurring subfragments permitted unambiguous alignment of all nucleotides in the fragment except for a short sequence at the 3’ end.
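The logic of the fingerprinting step can be illustrated with a digest carried out in silico. The sketch below is not part of the original procedure. It assumes the standard specificities of the two enzymes (T1 RNAase cleaving on the 3’ side of G, pancreatic RNAase on the 3’ side of C and U), applies them to the 139 nucleotide P1 sequence given in the Summary, and checks that the oligonucleotides named in the text appear among the complete digestion products. The function name `digest` is hypothetical.

```python
# First 139 nucleotides of fragment P1, as given in the Summary.
P1 = (
    "AGGUUUGAGAGAGAAGAUUACAAGCGUGAGAGACGGAGGGCCCAUGGAAC"
    "UUACAGAAGAAGUUGUUGAUGAGUUCAUGGAAGAUGUCCCUAUGUCAAUC"
    "AGACUUGCAAAGUUUCGAUCUCGAACCGGAAAAAAGAGU"
)


def digest(seq, cut_after):
    """Split seq on the 3' side of every base in cut_after (complete digestion)."""
    products, current = [], ""
    for base in seq:
        current += base
        if base in cut_after:
            products.append(current)
            current = ""
    if current:
        products.append(current)  # 3'-terminal piece that ends without a cut site
    return products


# Assumed specificities: T1 cuts after G, pancreatic RNAase after C and U.
t1_products = digest(P1, "G")
pancreatic_products = digest(P1, "CU")

# Oligonucleotides mentioned in the text.
print("t19 UCCCUAUG in T1 digest:", "UCCCUAUG" in t1_products)
print("t17 UUUCG in T1 digest:", "UUUCG" in t1_products)
print("P15 GGAGGGC in pancreatic digest:", "GGAGGGC" in pancreatic_products)
```

Because the two enzymes cut at different bases, each T1 product spans several pancreatic products and vice versa; overlaps of this kind, together with those from partial digests, are what allow the oligonucleotides to be placed in order.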
Partial or Total Reconstitution of TMV

Incompletely reconstituted virus particles of 400-600 Å average length were prepared by reacting 32P-TMV RNA with a 4 fold weight excess of 25S protein in reconstitution buffer for 20 min at 24°C. The particles were collected by sedimentation for 3 hr at 35,000 rpm in a Beckman 50 rotor. Totally reconstituted particles were prepared in the same way, except that a 40 fold weight excess of protein over RNA was used and reconstitution was for 2 hr. To eliminate uncoated RNA, the preparation of particles was treated with 1 unit T1 RNAase per 10 µg RNA for 20 min at room temperature and sedimented again. The protected RNA was extracted from the particles with phenol and screened for the presence of the 5’ terminal and 3’ terminal groups (see below) or for the P1 sequence. In the latter case, the preparation of 32P-labeled protected RNA was supplemented with a large excess of nonradioactive TMV RNA, and the mixture was subjected to partial pancreatic RNAase digestion followed by reaction with disks in the same manner as described above for the isolation of fragment P1 from full-length TMV RNA. The yield of fragment P1 from the protected RNA was compared to that obtained from an equivalent amount of full-length RNA.

Measure of 5’ and 3’ End Groups in RNA Protected by Partial or Total Reconstitution

Detection of m7GpppGp

About 50 µg of TMV RNA or the RNA extracted from partially or wholly reconstituted particles were mixed with 10 µl of a solution of T1 RNAase (100 U/ml) + T2 RNAase (100 U/ml) + pancreatic RNAase (100 µg/ml), 0.05 M sodium acetate, 0.01 M EDTA (pH 4.7) and incubated overnight at 37°C. This treatment cleaves all phosphate linkages in TMV RNA except for the 5’-5’ triphosphate linkages of the 5’ terminal blocking group m7G5’ppp5’Gp (Zimmern, 1975). The mononucleotides were separated from the RNAase-resistant end group by electrophoresis at pH 3.5 on DEAE paper. After autoradiography, the amount of radioactivity present in the spot of m7GpppGp relative to that present in the mononucleotides was determined by liquid scintillation counting. The extent of protection of the end group in native or totally reconstituted TMV was compared to the experimental figure of 0.053% radioactivity as m7GpppGp for 100% protection. This value was measured in separate experiments with 32P-TMV RNA and corresponds to 86% of the figure expected on the basis of one “cap” per intact RNA molecule. In making the calculation for the RNA extracted from partially assembled rods, account was taken of the average size of the encapsidated RNA as determined from electron microscope measurement of the incomplete particles before extraction.

Detection of CCCA-OH

50 µg of each RNA sample were treated with 5 units of T1 RNAase for 1 hr at 37°C in 0.1 M Tris, 0.001 M EDTA (pH 7.4). The 3’-OH oligonucleotide, CCCA-OH, was separated from the other T1 RNAase oligonucleotides by electrophoresis on Whatman 3MM paper at pH 2.6 (Guilley, Jonard and Hirth, 1975a), and the amount of radioactivity present as CCCA-OH relative to that in other products was estimated by liquid scintillation counting. The extent of protection of the CCCA-OH end group was calculated as for the 5’ end group, and results are expressed relative to a value of 0.044% total radioactivity as CCCA-OH for TMV RNA.

Acknowledgments

We would like to thank Dr. D. Zimmern and Dr. P. J. G. Butler for communicating their results to us prior to publication. This work was supported by the Commissariat à l’Energie Atomique and by the Délégation Générale à la Recherche Scientifique et Technique.

Received November 26, 1976; revised January 31, 1977
References

Butler, P. J. G. and Klug, A. (1971). Nature New Biol. 229, 47-50.
Champness, J. N., Bloomer, A. C., Bricogne, G., Butler, P. J. G. and Klug, A. (1976). Nature 259, 20-24.
Guilley, H., Jonard, G. and Hirth, L. (1974). Biochimie 56, 181-185.
Guilley, H., Jonard, G. and Hirth, L. (1975a). Proc. Nat. Acad. Sci. USA 72, 864-868.
Guilley, H., Jonard, G., Richards, K. E. and Hirth, L. (1975b). Eur. J. Biochem. 54, 135-144.
Guilley, H., Stussi, C. and Hirth, L. (1971). C.R. Acad. Sci. Ser. D 272, 1181-1184.
Guilley, H., Stussi, C., Thouvenel, J. C., Pfeiffer, P. and Hirth, L. (1972). Virology 49, 475-485.
Hunter, T. R., Hunt, T., Knowland, J. and Zimmern, D. (1976). Nature 260, 759-764.
Jonard, G., Guilley, H. and Hirth, L. (1975). Virology 64, 1-9.
Keith, J. and Fraenkel-Conrat, H. (1975). FEBS Letters 57, 31-33.
Lebeurier, G., Nicolaïeff, A. and Richards, K. E. (1977). Proc. Nat. Acad. Sci. USA, in press.
Nicolaïeff, A., Lebeurier, G., Morel, M. C. and Hirth, L. (1975). J. Gen. Virol. 26, 295-306.
Ohno, T., Nozu, Y. and Okada, Y. (1971). Virology 44, 510-516.
Richards, K. E., Guilley, H., Jonard, G. and Hirth, L. (1974). FEBS Letters 43, 31-32.
Richards, K. E., Morel, M. C., Nicolaïeff, A., Lebeurier, G. and Hirth, L. (1975). Biochimie 57, 749-755.
Stussi, C., Lebeurier, G. and Hirth, L. (1969). Virology 38, 16-25.
Thouvenel, J. C., Guilley, H., Stussi, C. and Hirth, L. (1971). FEBS Letters 16, 204-206.
Zimmern, D. (1975). Nucl. Acids Res. 2, 1189-1201.
Zimmern, D. (1976). Phil. Trans. Roy. Soc., in press.
Zimmern, D. (1977). Cell 11, 463-482.
Zimmern, D. and Butler, P. J. G. (1977). Cell 11, 455-462.