J. Mol. Biol. (1989) 208, 307-325

Visualization of Protein-Nucleic Acid Interactions in a Virus

Refined Structure of Intact Tobacco Mosaic Virus at 2.9 Å Resolution by X-ray Fiber Diffraction

Keiichi Namba†, Rekha Pattanayek and Gerald Stubbs

Department of Molecular Biology, Vanderbilt University, Nashville, TN 37235, U.S.A.

(Received 24 August 1988, and in revised form 18 February 1989)
The structure of tobacco mosaic virus (TMV) has been determined by fiber diffraction methods at 2.9 Å resolution, and refined by restrained least-squares to an R-factor of 0.096. Protein-nucleic acid interactions are clearly visible. The final model contains all of the non-hydrogen atoms of the RNA and the protein, 71 water molecules, and two calcium-binding sites. Viral disassembly is driven by electrostatic repulsions between the charges in two carboxyl-carboxylate pairs and a phosphate-carboxylate pair. The phosphate-carboxylate pair and at least one of the carboxyl-carboxylate pairs appear to be calcium-binding sites. Nucleotide specificity, enabling TMV to recognize its own RNA by a repeating pattern of guanine residues, is provided by two guanine-specific hydrogen bonds in one of the three base-binding sites.
1. Introduction

Tobacco mosaic virus (TMV) is the first virus in which protein-nucleic acid interactions have been visualized at the molecular level. Crystal structures have been published for a number of icosahedral plant, animal and insect viruses (for reviews, see Liljas, 1986; Stubbs, 1989a), but, until recently, the nucleic acid had not been seen in any of these structures, because it does not conform to the icosahedral symmetry of the viral coat protein. About 10% of the RNA in bean pod mottle virus has now been seen in electron density maps (Chen et al., 1989), but these structures have not yet been refined or described in detail. In TMV, by contrast, the RNA genome is well ordered, with the same helical symmetry as the coat protein.

TMV is the type member of the tobamovirus group: a rod-shaped virus 3000 Å long and 180 Å in diameter (1 Å = 0.1 nm), with a central hole of diameter 40 Å. Approximately 2130 identical protein subunits of molecular weight 17,500 form a right-handed helix of pitch 23 Å, with 49 subunits in three turns. A single strand of RNA follows the basic helix between the protein subunits at a radius of 40 Å, with three nucleotides bound to each protein subunit. These features are illustrated in Figure 1. The amino acid sequence of the coat protein was determined by Wittmann-Liebold & Wittmann (1967), and confirmed by the complete …

† Present address: ERATO, 5-9-5 Tokodai, Toyosato, Tsukuba 300-26.
Figure 1. Computer graphics representation of a segment of the TMV particle. Protein subunits are light gray; RNA is dark gray. The RNA is shown extending beyond the end of the protein helix for clarity. There are 49 protein subunits in 3 turns of the viral helix. Protrusions on the RNA chain (1 pointing out and 2 pointing up) represent the 3 nucleotides bound to each protein subunit. (Graphics from Namba et al. (1985).)

…details were given in preliminary form by Namba & Stubbs (1986)), and we consider the significance of the structure, including at least two calcium-binding sites, for the assembly and disassembly of the virus.
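The helical parameters quoted in the Introduction are mutually consistent, and a short calculation makes that concrete. The sketch below is a minimal Python illustration, assuming an exact repeat of 49 subunits in 3 turns (the measured value is 49.02) and taking the 3000 Å length, 23 Å pitch and 40 Å RNA radius from the text. It recovers the approximate subunit count and the length of helical path available to the RNA per subunit, which is the constraint behind the 15.4 Å trinucleotide repeat discussed in the nucleic acid section. All names are hypothetical.

```python
import math

# Parameters from the text; 49/3 subunits per turn is an approximation
# to the measured 49.02 subunits in 3 turns (assumption).
SUBUNITS_PER_REPEAT = 49
TURNS_PER_REPEAT = 3
PITCH_A = 23.0        # helix pitch, Å
LENGTH_A = 3000.0     # particle length, Å
RNA_RADIUS_A = 40.0   # radius of the RNA, Å


def subunits_per_turn():
    return SUBUNITS_PER_REPEAT / TURNS_PER_REPEAT


def subunits_in_particle(length=LENGTH_A, pitch=PITCH_A):
    """Number of helical turns along the rod times subunits per turn."""
    return (length / pitch) * subunits_per_turn()


def rise_and_twist():
    """Axial rise (Å) and rotation (degrees) between successive subunits."""
    n = subunits_per_turn()
    return PITCH_A / n, 360.0 / n


def rna_path_per_subunit(radius=RNA_RADIUS_A):
    """RNA path length per subunit at the given radius (Å).

    Returns (arc length ignoring the axial rise, full helical path length)."""
    rise, twist = rise_and_twist()
    arc = radius * math.radians(twist)
    return arc, math.hypot(arc, rise)


if __name__ == "__main__":
    print(f"subunits per particle: about {subunits_in_particle():.0f}")
    rise, twist = rise_and_twist()
    print(f"rise {rise:.3f} Å, twist {twist:.2f} degrees per subunit")
    arc, path = rna_path_per_subunit()
    print(f"RNA per subunit at 40 Å: {arc:.2f} Å (arc), {path:.2f} Å (helix)")
```

Both path estimates come out close to 15.4 Å; including the small axial rise (about 1.4 Å) changes the result by well under 0.1 Å.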
2. Materials and Methods

TMV (common strain) was grown in Nicotiana tabacum var. Samsun and purified by differential centrifugation. Fiber diffraction specimens were prepared by drawing pellets of the centrifuged virus into quartz X-ray diffraction capillary tubes of nominal diameter 0.7 mm, mixing with small quantities of buffer solution, and moving the column of virus to orient the long particles by shearing forces (Gregory & Holmes, 1965). These diffraction specimens are extremely stable, and specimens were used that had been prepared over the period 1960 to 1975. Although most diffraction patterns were recorded within a few months of specimen preparation, the data used in the final refinement of the structure were obtained many years later from a specimen made by Dr K. C. Holmes in the laboratory of Dr D. L. D. Caspar in 1960. A diffraction pattern from this specimen is shown in Fig. 2. A specimen in Dr Holmes's laboratory, used in all the earlier work, had been exposed to X-rays repeatedly over a period of 10 years without any evidence of deterioration.

Figure 2. A diffraction pattern from an oriented gel of TMV, taken using flat film and a double-mirror focusing system.

Six heavy-atom derivatives were used in the analysis; to overcome the problem of cylindrical averaging in fiber diffraction, large numbers of derivatives are needed (Stubbs & Diamond, 1975). Details of the derivatives are given by Stubbs et al. (1977), and the heavy-atom parameters are recorded in Table 1 of Namba & Stubbs (1985). Only 2 of the derivatives (OsO4 and UO2F2) were made without specific chemical considerations. Three were made by the reaction of methylmercury nitrate with sulfhydryl groups: 1 from vulgare, 1 from vulgare with Lys68 chemically modified by a sulfhydryl imidoester group (Perham & Thomas, 1972), and 1 from a mutant that has a cysteine in place of Tyr139. The sixth derivative was made from lead acetate: lead has been known for many years to react with specific carboxyl groups (Caspar, 1963), although the identity of these groups is only now unambiguously established. Data were collected photographically, and diffracted intensities were determined by a modification of the method of angular deconvolution (Makowski, 1978).
Phase determination in fiber diffraction is complicated by the fact that, because the diffracting particles are randomly oriented about the fiber axis, the diffraction data are cylindrically averaged. The diffracted intensity is $I = \sum_n G_n G_n^{*}$, where $G_n$ is a complex Fourier-Bessel structure factor (Klug et al., 1958), analogous to the crystallographic F, and the sum runs over the Bessel orders n that contribute at a given point of the diffraction pattern. Both phases and magnitudes of the G terms must be determined in order to calculate an electron density map; thus the phase problem in fiber diffraction is multi-dimensional. The number of significant terms in the sum depends on the radius and symmetry of the diffracting particle and on the resolution: for TMV at 2.9 Å it can be as many as 8. Stubbs & Diamond (1975) proposed using a multi-dimensional analog of crystallographic isomorphous replacement to separate and phase the G terms. The TMV helix does not repeat perfectly, but has 49.02 subunits in 3 turns, and Stubbs & Makowski (1982) showed that the consequent fine splitting of layer-lines can be used to provide extra phase information, effectively doubling the information content of the best heavy-atom derivative data sets. Initial phases were calculated by these methods and refined by solvent flattening and other methods; one unusual step that was found to be particularly useful in improving phases was to re-refine heavy-atom parameters against phases based on molecular models of the virus as soon as the structure determination was sufficiently advanced to build such models. Details of the data collection and phase determination are given by Namba & Stubbs (1985).
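Why as many as 8 G terms can overlap follows from the helical selection rule. In the minimal sketch below, we assume an exact repeat of 49 subunits in 3 turns (repeat distance c = 69 Å), a particle radius of 90 Å, and the common approximation that the Bessel function $J_n(2\pi R r)$ is negligible once |n| exceeds $2\pi R r_{max}$. A Bessel order n then contributes on layer line l only if l = 3n + 49m for some integer m, with R limited by the 2.9 Å resolution sphere. The function name and the cutoff rule are illustrative assumptions, not the procedure the authors used.

```python
import math


def contributing_orders(l, u=49, t=3, c=69.0, r_max=90.0, d_min=2.9):
    """Bessel orders n allowed on layer line l by the selection rule
    l = t*n + u*m, keeping only |n| <= 2*pi*R_max*r_max (approximate cutoff).

    u, t: subunits and turns per repeat; c: repeat distance (Å);
    r_max: particle radius (Å); d_min: resolution limit (Å).
    """
    # Largest reciprocal radius R on this layer line inside the resolution sphere.
    r_recip_sq = 1.0 / d_min**2 - (l / c) ** 2
    if r_recip_sq <= 0:
        return []
    n_limit = int(2 * math.pi * math.sqrt(r_recip_sq) * r_max)
    # (l - t*n) divisible by u  <=>  some integer m satisfies l = t*n + u*m.
    return [n for n in range(-n_limit, n_limit + 1) if (l - t * n) % u == 0]


if __name__ == "__main__":
    max_l = int(69.0 / 2.9)  # highest layer line inside the 2.9 Å limit
    counts = {l: len(contributing_orders(l)) for l in range(max_l + 1)}
    print(contributing_orders(0))  # [-147, -98, -49, 0, 49, 98, 147]
    print(max(counts.values()))    # 8
```

Under these assumptions the allowed orders on any layer line are spaced 49 apart, and no more than 8 of them fit inside |n| ≤ 194, which agrees with the figure of 8 quoted above.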
Least-squares refinement with stereochemical restraints (Hendrickson & Konnert, 1980) was adapted for use with fiber diffraction data. The structure factor calculation was replaced by a Fourier-Bessel transform to utilize the helical symmetry of fiber diffraction systems, and intermolecular interactions were added as restraints, since these interactions are much more extensive in the close-packed assemblies that give rise to fiber diffraction than they are in crystals. It was also necessary to include restraints from covalent bonds between asymmetric units, in particular the continuous strand of RNA that follows the basic helix of the TMV structure. The programs were adapted from a set provided by Dr W. Hendrickson, and incorporated some modifications by Dr E. Westhof to handle nucleic acid structures. The programs are applicable to fiber diffraction systems in general, and are also being used by Dr L. Makowski and his group in their studies of filamentous bacteriophages (Nambudripad & Makowski, 1987). More details are given by Stubbs et al. (1986).

Stereochemically restrained least-squares refinement in fiber diffraction was generally found to behave as it does in protein crystallography. We repeated the first 20 or 30 cycles using a variety of parameters in order to determine those that gave the most rapid convergence to acceptable models. The parameters eventually used are given in Table 1. We found that increasing the weight given to the diffraction data improved convergence without serious deterioration in stereochemistry, until we reached a point where the refinement became unstable and the R-factor rose rapidly. This behavior presumably reflects the smaller number of diffraction data available in fiber diffraction than in crystallography: when the weight given to these data is increased too much, there are effectively too few data to restrain the atomic co-ordinates, and the refinement fails to converge. It was found possible to increase the diffraction data weight later in the refinement.

Difference Fourier maps are used during the course of a crystallographic refinement to correct the model and to add atoms such as those from solvent molecules; in protein crystallography,
maps synthesized using coefficients 2Fo - Fc and Fo - Fc are widely used. We have shown (Namba & Stubbs, 1987) that, because of the cylindrical averaging of the data, the fiber diffraction analogs of 2Fo - Fc maps do not give the most accurate representations of the structure, and that for TMV at 2.9 Å, coefficients analogous to 6Fo - 5Fc are most satisfactory. "Omit" maps, in which part of the structure is omitted from the calculation of Fc in order not to bias that region of the electron density map, are useful in fiber diffraction, just as they are in conventional crystallography, but we have found that only small regions can be omitted without impairing the quality of the map. We made some use of hybrid maps, in which part of the model is omitted from the calculation of the phase of G and the whole model is used for the division of the diffracted intensity into G terms; sample calculations had shown that these hybrid maps were not biased towards the model structure (Namba & Stubbs, 1987). In the later stages of the refinement, however, we relied on 6Fo - 5Fc and Fo - Fc maps with small parts of the model (up to 3 residues) omitted, and in the final stages we used Fo - Fc maps exclusively. Although these maps are harder to interpret, they are less affected by noise than any other synthesis. Fo - Fc omit maps were used during the last stages of refinement to check numerous side-chain conformations, including all residues that are important in virus structure or assembly. The quality of the difference maps can be judged from Fig. 3. In Fig. 3(a), 2 side-chains omitted from the Fc calculation are clearly visible in an Fo - Fc map; in Fig. 3(b), the structure of a larger omitted region is similarly confirmed; in Fig. 3(c), a 6Fo - 5Fc map in which Arg71 has been omitted but Tyr72 included shows both residues clearly.
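The difference syntheses described above all combine "observed" and calculated G terms. In the minimal numpy sketch below, we assume that the observed intensity at each point is divided among the overlapping Bessel terms in proportion to the model |G_n|², and that model phases are used throughout. It shows how the coefficient families a|Go| - b|Gc| with (a, b) = (2, 1), (6, 5) and (1, 1) are formed. The proportional split is an illustrative assumption; the text states only that the whole model is used to divide the diffracted intensity into G terms. The function names are hypothetical.

```python
import numpy as np


def split_intensity(i_obs, g_calc):
    """Divide one observed, cylindrically averaged intensity among the
    Bessel terms contributing at that point, in proportion to the model
    |G_n|^2 (assumed partition rule). Returns 'observed' magnitudes |G_n|."""
    g_calc = np.asarray(g_calc, dtype=complex)
    i_calc = np.sum(np.abs(g_calc) ** 2)
    return np.sqrt(i_obs / i_calc) * np.abs(g_calc)


def difference_coefficients(g_obs_mag, g_calc, a, b):
    """Coefficients (a*|Go| - b*|Gc|) * exp(i*phi_calc) for each Bessel term.

    (a, b) = (2, 1), (6, 5) and (1, 1) give the analogs of the 2Fo - Fc,
    6Fo - 5Fc and Fo - Fc syntheses, respectively."""
    g_calc = np.asarray(g_calc, dtype=complex)
    phase = np.exp(1j * np.angle(g_calc))
    return (a * np.asarray(g_obs_mag) - b * np.abs(g_calc)) * phase


if __name__ == "__main__":
    g_model = np.array([3 + 4j, 1 - 1j])   # two overlapping Bessel terms (toy values)
    g_obs = split_intensity(40.0, g_model)  # observed intensity above the model's 27
    print(difference_coefficients(g_obs, g_model, 6, 5))
```

When the model reproduces the observed intensity exactly, the (1, 1) coefficients vanish and the (6, 5) coefficients reduce to the model G terms themselves, which is the behavior expected of these syntheses.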
(a) Model building and refinement
A molecular model was built to fit the solvent-flattened electron density map at 3.6 Å resolution. The RNA electron density represents an average of all the nucleotide triplets in the TMV genome, but we used the sequence GAA in our model, as it is known to bind particularly strongly to the protein in order to initiate viral assembly (Zimmern, 1977; Steckert & Schuster, 1982), and the electron density did appear to fit this triplet well. All model building used the program FRODO (Jones, 1982) on various Evans and Sutherland computer graphics systems (models PS2, MPS and PS340). The crystallographic R-factor,

$$R = \frac{\sum \left| I_o^{1/2} - I_c^{1/2} \right|}{\sum I_o^{1/2}},$$

was 0.311 for the initial model between 10 and 3.6 Å. Refinement was carried out in 3 stages: refinement at 3.6 Å resolution (cycles 1 to 45), extension to 2.9 Å resolution (cycles 46 to 134), and refinement at 2.9 Å resolution, periodically using difference maps to correct the structure and add water molecules (cycles 135 to 251). The root-mean-square (r.m.s.) shift in the atomic co-ordinates during the last cycle was 0.005 Å. The progress of the refinement is summarized in Table 2. During the first 30 cycles only positional co-ordinates were refined; thereafter, isotropic temperature factors were also allowed to vary. The structure after 45 cycles of refinement against data between 10 Å and 3.6 Å resolution has been described briefly (Namba & Stubbs, 1986). Difference maps were not used in the early stages (Table 2), because a sound theoretical treatment of difference analysis in fiber diffraction (Namba & Stubbs, 1987) was not yet available. 2Fo - Fc maps were used to make some changes to the model and to add 90 water molecules after cycles 134 and 146, but much more progress was made using 6Fo - 5Fc maps after cycle 165. At that stage, 55 of the 90 water molecules were removed
because of unacceptable contacts with the protein or nucleic acid, but eventually Fo - Fc maps permitted us to add water molecules until 71 were included in the final model. Water molecules were added somewhat arbitrarily at first, requiring only stereochemically good contacts with protein or nucleic acid and reasonable density in a difference map. After cycle 216, however, we imposed the strict requirement that added water molecules should lie in peaks higher than 3 times the standard deviation of the electron density in the difference map, and all existing water molecules were checked in omit maps to ensure that they met this criterion.
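Two of the numerical checks used in this refinement can be sketched in a few lines of numpy. The first is the R-factor, assuming the usual fiber diffraction definition on amplitudes, R = Σ|Io^(1/2) - Ic^(1/2)| / Σ Io^(1/2), with the calculated intensities already on the observed scale. The second is the 3σ rule for accepting water molecules, interpreted here as local maxima of a gridded difference map that exceed three times the map's standard deviation. Both functions are hypothetical illustrations, not the refinement programs used in this work, and the grid-based peak search is an assumption about how peaks might be located.

```python
from itertools import product

import numpy as np


def fiber_r_factor(i_obs, i_calc):
    """R = sum|sqrt(Io) - sqrt(Ic)| / sum sqrt(Io); Ic assumed already scaled to Io."""
    a_obs = np.sqrt(np.asarray(i_obs, dtype=float))
    a_calc = np.sqrt(np.asarray(i_calc, dtype=float))
    return float(np.sum(np.abs(a_obs - a_calc)) / np.sum(a_obs))


def peaks_above_sigma(diff_map, n_sigma=3.0):
    """Grid indices of local maxima higher than n_sigma times the map's
    standard deviation. The map is treated as periodic (np.roll wraps)."""
    rho = np.asarray(diff_map, dtype=float)
    axes = tuple(range(rho.ndim))
    is_max = np.ones(rho.shape, dtype=bool)
    for shift in product((-1, 0, 1), repeat=rho.ndim):
        if any(shift):  # compare with every neighbour, skipping the point itself
            is_max &= rho >= np.roll(rho, shift, axis=axes)
    return np.argwhere(is_max & (rho > n_sigma * rho.std()))


if __name__ == "__main__":
    print(fiber_r_factor([4.0, 9.0], [1.0, 9.0]))  # (|2-1| + |3-3|) / (2 + 3) = 0.2
    rho = np.zeros((8, 8, 8))
    rho[2, 3, 4] = 1.0                             # a single strong difference peak
    print(peaks_above_sigma(rho).tolist())         # [[2, 3, 4]]
```

Flat regions of the map count as local maxima in this test but are discarded by the σ threshold, so only genuine peaks survive.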
3. Results

(a) The refined model
In the final model, there are 1379 atoms, including 71 water oxygens, two calcium ions, and all the non-hydrogen atoms in the RNA and the protein. The C-terminal residues 155 to 158 have very high temperature factors (typically 80 to 90), and were located in regions of very low electron density. The final difference map is very flat, containing only one peak of height comparable to those used to fit water molecules in the earlier difference maps; it was not possible to fit a water molecule with good stereochemistry into this peak. The r.m.s. difference between the atomic co-ordinates of the initial model and the refined model was 0.97 Å. This difference was the same for protein and nucleic acid atoms. The largest differences (3 to 6 Å) were in the terminal atoms of a few large side-chains that were rebuilt during the early stages of the refinement. The final R-factor was 0.096. It should be noted that, because of the cylindrical averaging of the data, fiber diffraction R-factors are inherently lower than crystallographic R-factors. For TMV at 2.9 Å resolution, the R-factor to be expected from a set of atoms randomly distributed within the radial limits of the virus would be 0.34 (Stubbs, 1989b). The reduction of the R-factor from 0.311 to 0.096, together with the omit maps, therefore convinces us that this model is a reliable representation of the virus.

Figure 3. Difference maps (difference Fourier-Bessel syntheses) at 2.9 Å resolution. Light lines: electron density. Heavy lines: parts of the final refined TMV model. (a) Coefficients Fo - Fc; side-chains of Glu50 and Asp77 (shown) were omitted from the calculation of the coefficients. (b) Coefficients Fo - Fc; all of residues 70 to 72 omitted. (c) Coefficients 6Fo - 5Fc; Arg71 was omitted from the calculation, all other residues included. At this resolution, for a structure with the helical parameters of TMV, known and unknown parts of a structure should appear with approximately equal intensity in a 6Fo - 5Fc Fourier-Bessel difference map. Arg71 and Tyr72 are equally recognizable.

Bond lengths deviate from ideal values by 0.011 Å. Distances between atoms connected by a single intervening atom (reflecting the magnitudes of bond angles) deviate from ideality by 0.046 Å. Other restrained parameters are close to or within the target values (Table 1). Although most close contacts were removed during the rebuilding and refinement, several remain. The shortest non-bonded interatomic distance is 2.3 Å. The hydrogen bond lengths are all greater than 2.3 Å, but 16 possible hydrogen bonds are less than 2.5 Å long. All of the non-glycine main-chain protein dihedral angles fall within or very close to the allowed regions of the Ramachandran plot (Fig. 4; Ramakrishnan & Ramachandran, 1965). Arg92 is the only residue in the left-handed α-helical region of the plot. This residue is stabilized in its conformation by the binding of the RNA. In the protein disk, where the RNA is absent, Arg92 is disordered (Champness et al., 1976; Jardetzky et al., 1978).

The mean temperature factor for protein atoms is 35, and for nucleic acid atoms 45. The temperature factors for the nucleic acid are probably higher because the RNA structure is an average of all of the triplets utilized in the genome. There is no significant difference between main-chain and side-chain temperature factors in the protein or the nucleic acid, reflecting the fact that the TMV structure is extremely closely packed. The variation of temperature factor with amino acid sequence was illustrated in Figure 2 of Namba & Stubbs (1986) for an earlier stage of refinement: the general features of that plot are retained, although the average temperature factors in the final model are slightly higher. The refined atomic co-ordinates have been deposited with the Protein Data Bank at Brookhaven National Laboratory, Upton, NY 11973, U.S.A.

(b) Protein structure
The virus coat protein has a high proportion of secondary structure, with 50% of the residues in α or 3₁₀ helices and 10% in β-structure, in addition to numerous reverse turns. The backbone structures of two subunits and the RNA are illustrated in Figure 5; the packing of these subunits to form the intact virus is shown in Figure 6. At the core of the subunit is a right-handed, four antiparallel α-helix bundle (Richardson, 1981), with the left and right slewed helices (LS and RS) and the left and right radial helices (LR and RR) extending outward from the RNA-binding site. The ends of these four helices, defined on the basis of hydrogen-bonding geometry, are at residues 19 and 33, 37 and 52, 111 and 135, and 73 and 86, respectively, but these ends are difficult to define because of irregularities, and in several cases the helices begin or end with one turn of 3₁₀ helix. At higher radius are two shorter α-helices, N and C. A region at low radius, called V in earlier papers, gave an impression of helical structure before refinement, but is better described as a series of reverse turns, including hydrogen bonds between residues 97 and 100 and between residues 103 and 106. There is only one good region of β-structure, a very short four-stranded sheet at the outer end of the four core α-helices (Fig. 5). The N terminus and a section near the C terminus of the protein run antiparallel for two residues, but the separation of the chains in the model is a little longer than would normally be acceptable for hydrogen bonds. The whole protein subunit is well ordered, except for the four C-terminal residues.

Figure 4. Ramachandran plot of the main-chain dihedral angles in TMV coat protein. Glycine residues are marked with open circles. Arg92 (marked) is the only residue in the left-handed α-helical region of the plot.

An aggregate of the protein in the absence of nucleic acid, the 34-subunit disk, has been crystallized and its structure determined at 2.8 Å resolution (Bloomer et al., 1978). In that structure, however, a 25-residue loop of the protein, containing the RNA-binding site and all the residues at lower radius, is disordered. The axial (top-to-bottom) intersubunit contacts in the disk and the virus are completely unrelated, but
many of the lateral contacts are similar. Other differences have been discussed by Namba & Stubbs (1986).

Most of the aromatic residues in the protein are near the outer surface of the virus. Together with several proline rings and other aliphatic side-chains, they form a continuous hydrophobic ribbon that follows the viral helix (Fig. 7). This ribbon corresponds to the closed "hydrophobic girdle" of the protein disk (Bloomer et al., 1978). The aromatic rings interact extensively, both within and between subunits, making electrostatically favorable edge-to-center contacts as in many other proteins (Burley & Petsko, 1985). The four tyrosine residues are close together at the bottom of the subunit, near the lateral subunit interface and the groove that follows the viral helix at high radii (Fig. 6). Many authors have remarked on the different capacities of the tyrosine residues to be iodinated: Tyr139 is very reactive, Tyr2 less so, and Tyr70 and Tyr72 are generally unreactive (Fraenkel-Conrat & Sherwood, 1967). The model suggests that these differences arise from the relative accessibility of the tyrosine rings. The edge of the Tyr139 ring is exposed to the solvent in the groove, while Tyr2 is near the outer surface of the virus, but pointing in toward the hydrophobic core. Tyr70 and Tyr72 are completely buried in the subunit interfaces.

Figure 5. Secondary structure in TMV coat protein. Backbone structures of 2 protein subunits and 3 RNA nucleotides, represented as GAA, are illustrated. The RNA is enlarged at lower left, with each nucleotide shaded differently. α-Helices are marked LS, RS, LR, RR (left and right slewed, left and right radial; nomenclature from Champness et al., 1976), N (N-terminal, substantially obscured in this view) and C (C-terminal). Nucleotides are labeled 1, 2 and 3 for reference in the text. The viral axis is vertical, to the left of the Figure.

All of the charged residues except Glu97 are in one of four regions of electrostatic interaction (Fig. 7). All four regions contain close approaches between like, generally negative, charges. These unfavorable interactions are partially ameliorated by nearby opposite charges. Glu97 is the only charged residue on the inner surface of the virus. The other charged residues near or inside the RNA radius form a large, loosely linked network of charge interaction, which includes the three nucleotide phosphates and extends across protein-protein subunit boundaries in all directions. It is characterized by medium-range, non-specific electrostatic interactions, although there are some identifiable ion pairs. There is a concentration of oxygen atoms at a radius of 25 Å that has been identified (Namba & Stubbs, 1986), on the basis both of its appearance (Einspahr & Bugg, 1984) and of its metal-binding properties (Stubbs et al., 1977), as one of two or more calcium-binding sites found in titration experiments (Gallagher & Lauffer, 1983a,b). The potential ligands (Fig. 8(a)) are one carboxylate oxygen each from Glu95 and Glu106, the hydroxyl oxygen from Thr104, and the main-chain carbonyl oxygens from residues 101 and 102. There is electron density at the proposed metal site, but, in this model, all the metal-ligand distances are close to 3 Å, suggesting that under the conditions of specimen preparation used, the site is probably occupied by a water molecule or a potassium ion rather than a calcium ion. This first charged region also includes the RNA-binding site, discussed below, and two lateral intersubunit salt bridges, Arg113-Asp115 and Asp88-Arg122. These salt bridges form part of one of the base-binding sites and also stabilize the protein subunit interactions.

Figure 6. Six subunits of TMV in a central section through the virus. Numbering from the subunit at the bottom right, these would be at positions 0, 8, 16, 24, 32 and 40 in the viral helix. The viral axis runs vertically through the center of the Figure. The α-carbon atoms are connected by cylinders 2 Å thick and color-coded for sequence, from the N terminus (yellow) to the C terminus (brown). The direction of each main-chain carbonyl group is indicated by a small bump on the cylinder. The atoms of the RNA are color-coded as follows: uncharged oxygen, red; charged oxygen, crimson; uncharged nitrogen, blue; sugar carbon, green; base carbon, purple; phosphorus, blue-purple. The diameters of the atoms are 3.5 Å for phosphorus, 3 Å for charged oxygen and 2.5 Å for all others. The graphics system is described by Namba et al. (1988). Apart from the groove that follows the viral helix on the outer surface, the subunits are very closely packed at all interfaces.

Figure 7. Two subunits of TMV in the same orientation as Fig. 5. Regions of electrostatic interaction are cross-hatched; the location of the hydrophobic ribbon (which runs approximately perpendicular to the page) is indicated by the circle marked H.

Figure 10. The proposed phosphate-carboxylate calcium-binding site. The calcium ion is represented by the blue dotted sphere; RNA is orange; protein is green. Two water molecules, 1 in front of and 1 behind the site, are represented by green asterisks. The remaining ligands are the phosphate oxygen atoms, O-2' from ribose 1 (numbered 155 in this graphics display), O-1' from ribose 3 (numbered 157), and 1 carboxylate oxygen from Asp116.

Figure 12. Lateral intersubunit interactions between 3 subunits of TMV. The viral axis is at top left, perpendicular to the page. Green: α-carbon trace of the protein chain. Orange: RNA. Blue and red: side-chains of arginine and aspartate residues, respectively, that make lateral intersubunit salt bridges. Purple: water molecules. Yellow: surfaces of hydrophobic interaction.

The second charge cluster (Fig. 8(b); see also Fig. 5(a) of Namba & Stubbs, 1986) is located in the axial subunit interface, at a radius of about 55 Å. It is centered about a close approach, about 4 Å, between Glu50 and Asp77 in the subunit above it. Although these side-chains bind uranyl fluoride in one of the heavy-atom derivatives, the electron density does not contain a peak identifiable as a metal ion or water molecule in this case, nor are there any more metal-binding ligands nearby. Asp77 is stabilized by Arg71 in the same subunit, and Glu50 by Arg134 from the subunit diagonally opposite it. Arg134 in turn forms an intrasubunit salt bridge with Glu131. Glu50 is also about 6 Å from Arg46 in the same subunit. In contrast to the low-radius charge cluster, all of these contacts except the last are close, most of them close enough to be hydrogen bonds as well as salt bridges.

The third and fourth charge clusters are on the outer surface of the virus (Fig. 7) and, like the second, are built around close, probably hydrogen-bonded, interactions. Both are located entirely within one subunit. The third cluster, deep inside the helical groove, contains the only two lysine residues in TMV protein. These lysines stabilize the close approach of Asp19 and Glu22. The fourth cluster contains two arginines, 61 and 141, which appear to form two hydrogen bonds with each other. The consequent close approach of charges is relieved by salt bridges between Arg141 and Asp64, and between Arg61 and Glu145, and also by the C terminus.

Hydrogen bonds utilizing side-chain atoms are found in three regions of the protein in particular (Fig. 7). At high radius, the relatively large surface of this part of the protein contains many hydrogen bonds, most of which are between main-chain and side-chain atoms, almost all within one subunit.
Figure 8. The subunit interface charge clusters. (a) The low-radius cluster, in the lateral interface, centered about Glu95 from 1 subunit and Glu106 from its lateral neighbor. The location of a possible calcium-binding site is marked by a filled circle; possible oxygen ligands by open circles. (b) The medium-radius cluster, in the axial interface, centered about Asp77 from 1 subunit and Glu50 from the subunit below it, and including parts of 2 other subunits.

Between 35 Å and 60 Å radius, particularly in the LS-RS loop, there is another concentration of hydrogen bonds. Many of these are between main-chain and side-chain atoms within a subunit, stabilizing this loop. An example of this type of bonding is Thr37, a highly conserved residue, which forms one hydrogen bond as a donor to the carbonyl oxygen of Gln34, and one as the acceptor for the peptide hydrogen of Ala46. Contrary to an earlier report based on a partial refinement at lower resolution (Namba & Stubbs, 1986), Thr37 does not appear to be hydrogen-bonded to a phosphate. There are also numerous intersubunit hydrogen bonds in this part of the protein. The loop at low radius is of particular interest, since it is ordered in the virus and in a low-pH aggregate of the protein (Mandelkow et al., 1981), but not in the crystalline protein disk (Champness et al., 1976) or in the 20 S aggregate of the protein in solution (Jardetzky et al., 1978). The ordered loop is stabilized by a number of hydrogen bonds, about equally distributed between main-chain and side-chain atoms. More than half of these are intersubunit bonds.

(c) Nucleic acid structure

The nucleic acid structure is strongly influenced by its interactions with the protein. This leads to some unusual RNA conformations, a fact that should be considered when modelling structures in which, by contrast to TMV, only the protein and not the nucleic acid structure is known. The RNA structure is shown in Figure 9. The RNA torsion angles are given in Table 3.
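Ribose conformations such as the C-3'-endo and C-2'-exo puckers discussed in this section are conventionally summarized by the pseudorotation phase angle P of Altona & Sundaralingam, computed from the five endocyclic torsion angles ν0 to ν4. The sketch below illustrates that standard formula under the usual convention, in which the ten 36° sectors are named after the envelope form at their centre (C-3'-endo at P = 18°, C-2'-exo at P = 342°). It is not the procedure the authors used, and the function names are hypothetical. It does show why C-2'-exo is "extremely close" to C-3'-endo: the two sectors are adjacent across P = 0°.

```python
import math

# Pucker names for the ten 36-degree sectors of the pseudorotation cycle,
# starting at P = 0 (each envelope form sits at the sector centre, 18 + 36k).
PUCKERS = ["C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
           "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo"]

_S = math.sin(math.radians(36)) + math.sin(math.radians(72))


def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Phase angle P and amplitude (degrees) from the endocyclic torsions
    nu0 = C4'-O4'-C1'-C2', nu1 = O4'-C1'-C2'-C3', nu2 = C1'-C2'-C3'-C4',
    nu3 = C2'-C3'-C4'-O4', nu4 = C3'-C4'-O4'-C1' (all in degrees)."""
    num = (nu4 + nu1) - (nu3 + nu0)
    den = 2.0 * nu2 * _S
    p = math.degrees(math.atan2(num, den)) % 360.0
    amplitude = math.hypot(num, den) / (2.0 * _S)
    return p, amplitude


def pucker_name(p):
    """Name of the 36-degree sector containing phase angle p."""
    return PUCKERS[int(p // 36.0) % 10]


def model_torsions(p, amplitude=38.0):
    """Ideal torsions nu_j = amplitude * cos(P + 144 * (j - 2)), j = 0..4."""
    return [amplitude * math.cos(math.radians(p + 144.0 * (j - 2)))
            for j in range(5)]


if __name__ == "__main__":
    for p_true in (18.0, 342.0, 162.0):
        p, amp = pseudorotation(*model_torsions(p_true))
        print(f"P = {p:6.1f}, amplitude = {amp:4.1f}, {pucker_name(p)}")
```

The round trip through `model_torsions` is only a self-check of the formula; with real co-ordinates the five torsions would be measured from the refined model.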
The structure is very similar to that found earlier from a 4 Å resolution map (Stubbs & Stauffacher, 1981), with an r.m.s. difference between the models of less than 1 Å. The conformations of all three ribose rings belong to the C-3'-endo group: specifically, riboses 1 and 3 are C-2'-exo (a conformation extremely close to C-3'-endo and often not distinguished from it), and ribose 2 is C-3'-endo. Nucleotides 1 and 2 are in the anti conformation, with the bases pointing away from the ribose ring; when base 3 is a purine, it is syn, pointing over the ring. It is not possible to determine the conformation of this base when it is a pyrimidine. The syn conformation is unusual, but it has been seen in small molecules and in Z-DNA (for references, see Saenger, 1984). In TMV, it appears to be stabilized by base stacking between bases 3 and 1, which are about 3.5 Å apart (Fig. 9), and, when the base is adenine, also by a hydrogen bond between N-6 and the main-chain carbonyl of Thr89. The combination of a syn base with a C-3'-endo ribose is particularly rare, but one of the few cases where it has been observed is another example of a nucleotide structure perturbed by protein binding: the adenine in NAD bound to two of the four subunits of lobster D-glyceraldehyde-3-phosphate dehydrogenase (Moras et al., 1975). The combination also occurs in Z-DNA (Wang et al., 1979).

Only one of the torsion angles is outside the range normally found in nucleotides, repeating helical nucleic acids or tRNA structures (Saenger, 1984): ε for nucleotide 2 is 73°, reflecting the fact that phosphate 3 is close to ribose 2, rather than being rotated away. This structure is stabilized by a hydrogen bond, about 2.7 Å in length, between the ribose O-2' and one of the phosphate oxygens; the C-2'-O-2'···O angle is 114°. The only non-bonding close contact in the RNA structure is a little less than 2.5 Å, between C-5' and a phosphate oxygen, both in nucleotide 2. The RNA-binding site is generally rather crowded; phosphate groups 2 and 3 are only 4.8 Å apart. The crowding is to some extent imposed by the helical symmetry of the virus: for the RNA to bind at a radius of 40 Å, the length of the repeating trinucleotide can only be 15.4 Å.

Figure 9. Two consecutive trinucleotides of TMV RNA drawn as GAAGAA, viewed approximately from the outside of the virus (compare with Fig. 5 to see the orientation relative to the protein), but rotated for ease of stereo viewing. As in Fig. 5, nucleotides are labeled 1, 2 and 3 for reference in the text.

(d) Protein-nucleic acid interactions
These may be considered in several classes: electrostatic interactions between the phosphate groups and protein side-chains, non-specific (hydrophobic) interactions between the bases and the protein, and base-specific hydrogen bonds with the protein. Only one direct hydrogen bond is seen between a ribose hydroxyl group and the protein, and none between ribose ring oxygen atoms and
protein, although there are some close approaches between the ribose oxygen atoms and water molecules. The ribose hydroxyl groups are, nevertheless, important to the structure, and are discussed below in the context of nucleic acid (RNA rather than DNA) specificity. The phosphate groups are, in general, neutralized by arginine residues from the protein subunit below them, but not always in the form of simple ion pairs. Phosphate 1 and phosphate 2 are close (about 3 Å) to Arg90 and Arg92, respectively. There are also several medium-range (5 to 7 Å) interactions: between phosphate 1 and Arg92, phosphate 1 and Arg41, phosphate 3 and Arg90, and phosphate 3 and Arg92. There is an unusually close approach (less than 3.5 Å) between the side-chain carboxyl group of Asp116 and phosphate 2. Electron density between these two groups has been interpreted as a calcium ion. This calcium ion utilizes one carboxylate oxygen atom, both phosphate oxygens, the ribose hydroxyl group of nucleotide 1, the ring oxygen of ribose 3, and two water molecules as ligands (Fig. 10). The other carboxylate oxygen of Asp116 is hydrogen-bonded to the nucleotide 1 ribose hydroxyl group. Phosphate-carboxylate calcium-binding sites are not common (Einspahr & Bugg, 1984). One has been observed in staphylococcal nuclease (Cotton et al., 1979), and the metal-binding site of DNA polymerase I utilizes three carboxylate groups and a nucleotide phosphate (Ollis et al., 1985). The interaction of the phosphate, carboxylate and calcium in TMV is presumed to play an important part in the assembly and disassembly of the virus, as discussed below.

All three bases lie flat against the left radial α-helix (Fig. 5). Base 1 presents its hydrophobic surface to a methyl group from Val119, while base 3 is close to the α-helical main chain between Asp116 and Ala117. These two bases stack together (Fig. 9) and point up into a cavity. The cavity is formed by the left radial helix, a segment of extended chain following the right radial helix, the left radial helix of the 3' neighboring subunit, and the intersubunit salt bridges Arg113-Asp115 and Arg122-Asp88. Thr89 can form a hydrogen bond with base 1 (N-1 if the base is a purine), either as a donor or as an acceptor, depending on the base. Base 2 lies along the left radial helix, between the helix and the peptide loop that connects the left and right slewed helices of the subunit below. Ser123 and Asn127 from the top subunit, and Asn33, Gln34 and Thr37 from the bottom subunit, provide a hydrophilic environment for the polar parts of the base.

There appear to be a number of base-specific hydrogen bonds between the coat protein and the RNA. The binding site for base 1 is particularly suited to accommodate guanine (Fig. 11). This base sits between two lateral intersubunit ion pairs, Arg122-Asp88 and Asp115-Arg113. If the base is guanine, atom O-6 can therefore form a hydrogen bond with Arg122, and atom N-2 can form one with Asp115. There is less specificity for base 2. If base 3 is adenine, N-6 can form a hydrogen bond with the main-chain carbonyl group of Thr89. Cytosine could also form this bond, but purines fit the base-binding cavity better.

Table 3
RNA dihedral angles (in degrees)

                           Nucleotide
Angle   Bond               1      2      3
α       P-O-5'           258    109    206
ζ       O-3'-P           142    263    143
β       O-5'-C-5'        206    316    207
γ       C-5'-C-4'        181    348    239
δ       C-4'-C-3'         91     69     88
ε       C-3'-O-3'        154     73    253
χ       Sugar-base       209    186     25

Nomenclature for atoms and dihedral angles is according to the IUPAC-IUB Joint Commission on Biochemical Nomenclature (1983; see also Saenger, 1984).

Figure 11. Part of the binding site for base 1. The base lies in the lateral intersubunit interface, between 2 salt bridges. Hydrogen bonds between O-6 and Arg122, and between N-2 and Asp115, make the binding of guanine particularly favorable in this site. A hydrogen bond between Thr89 and N-1 (not shown) could probably form with N-3 in a pyrimidine, thus accommodating any base.
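Each entry in Table 3 is a torsion angle about one bond, defined by four consecutive atoms. For example, ε is the C-4'-C-3'-O-3'-P torsion about the C-3'-O-3' bond. The short Python sketch below shows how such an angle can be computed from Cartesian coordinates. It is only an illustration and is not the program used in this work. The function name `dihedral` is hypothetical. The sketch reports angles on the 0-360° scale used in the table and follows the IUPAC sign convention: viewed along the central bond, the angle is positive when the near bond must turn clockwise to eclipse the far bond.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle p0-p1-p2-p3 about the p1-p2 bond, in degrees (0-360).

    IUPAC sign convention: viewed along p1 -> p2, the angle is positive when
    the near bond (p1-p0) turns clockwise to eclipse the far bond (p2-p3).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)              # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)              # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x)) % 360.0


if __name__ == "__main__":
    # Toy geometry (not TMV coordinates): near bond along +x, far bond along +z.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))  # -90 deg, reported as 270.0
```

Using `atan2` rather than `acos` keeps the sign of the rotation, which is what separates values such as 73° and 287°.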
(e) Water structure
Water molecules are distributed throughout the surface of the protein subunit, both on the viral inner and outer surfaces and in the subunit interfaces (Fig. 12). The model contains more water molecules in the interfaces. This probably reflects only the fact that the interfaces are more ordered than the surfaces, so that these water molecules are easier to see. There is no marked preference for water molecules to bind to side-chain or main-chain atoms, or to charged or uncharged polar groups.
4. Discussion
(a) Protein-nucleic acid interactions
(i) Non-specific interactions
Non-base-specific protein-nucleic acid interactions are essential to ensure that the coat protein can encapsidate the entire RNA genome. Such interactions are most easily formed by non-directional ionic and van der Waals forces. In TMV, non-specific base binding is achieved by interactions
between the flat base surface and the left radial α-helix. Unless the base is guanine, space will be left in the base-binding site. One might speculate that a water molecule fills the space in the vicinity of guanine N-2 when the base is adenine. Likewise, one or even two water molecules may take up the space when the base is cytosine or uracil. Such structural features cannot, of course, be determined from our electron density map, in which only averaged bases are seen. The electrostatic interactions are best considered as complementarity between the electrostatic surfaces of the protein and the nucleic acid, rather than as simple ion pairs of arginine residues and phosphate groups. Thus, the top surface of the helical array of protein subunits presents a positively charged groove to the RNA, accommodating the negatively charged phosphate groups. Arg90 and Arg92 make close, probably hydrogen-bonded, approaches to phosphate groups. The distances between these arginine residues and the other phosphates are still short enough to contribute to the binding energy. Arg41 appears to be part of the RNA-binding site, and is conserved in many strains and mutants of TMV, but it is almost 7 Å away from the nearest phosphate group. On the bottom surface of the subunit, there is an anomalous electrostatic interaction: the repulsion between Asp116 and phosphate 2. This is one of several interactions in the virus structure that demonstrate an important principle of viral structure, and of protein-nucleic acid structural interactions in general. The structure must achieve a metastable balance, rather than a deep energetic minimum. Almost all protein-nucleic acid assemblies are required to assemble and disassemble in response to changes, sometimes quite subtle, in their environments.
This particular example of the principle will be discussed further under viral assembly and disassembly (section (e), below).

(ii) Specific interactions
The TMV coat protein recognizes RNA with very high selectivity, efficiently encapsidating only its own or closely related RNA (for a review, see Stubbs, 1984). Zimmern (1977) showed that TMV assembly is initiated by a specific RNA sequence that includes AAGAAG as part of a sequence (XXG)n. Zimmern proposed a highly base-paired secondary structure for this region of the RNA, with the initiation sequence at the end of a long loop. TMV RNA specificity is presumed to be a consequence of this primary and secondary structure. Recognition of the origin-of-assembly sequence need not a priori be reflected in the RNA binding. The specificity of the early stages of viral assembly could, for example, be derived from the relative energies of transition states between the unassembled and nucleated states of the growing virus. The base-specific hydrogen bonds described above do, however, account for a preference by the protein for binding the triplet AAG, and in particular for
binding G at every third site. The preference must be slight, since the protein encapsidates the entire genome. In addition, the sequence XXG does not occur in phase with the assembly initiation sequence at a statistically significant frequency (Goelet et al., 1982). In a highly co-operative assembly initiation process, however, the repetition of the sequence would lead to a strong discrimination in favor of the viral RNA. Steckert & Schuster (1982) found that the trinucleoside diphosphate AAG binds to TMV protein at pH 5.4 and 20°C with the relatively high binding constant of 12 × 10³ M⁻¹. GAA and AGA also bind strongly, but with smaller binding constants of 4.7 × 10³ and 1.0 × 10³ M⁻¹, respectively. Most other trinucleoside diphosphates bind weakly if at all. The exceptions are sequences similar to AAG, including GAG (4.0), UAG (1.9), AUG (4.1) and CAG (5.8). The binding of AAG and similar trimers is readily accounted for by the pattern of potential hydrogen bonds in the RNA-binding site. The relative binding strengths of AAG, GAA and AGA can be explained by considering the different binding energies of the three phosphate sites. Suppose AAG is aligned in the RNA-binding site to make the best possible hydrogen bonds. The phosphate site left unoccupied (because the trinucleoside diphosphate has no terminal phosphates) is then the one closest to Asp116. This site would be expected to contribute little to the binding energy, particularly in the absence of calcium. By contrast, when AGA and GAA are aligned in the binding site, G always occupies the specific G site at nucleotide 1. One of the two strong phosphate-binding sites is then not utilized. The binding of these two trinucleoside diphosphates is therefore expected to be weaker, as is in fact observed. TMV protein does not assemble with DNA, even when the origin-of-assembly sequence is included (Gallie et al., 1987).
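The binding constants quoted above can be placed on a free-energy scale using ΔG° = -RT ln K. The Python sketch below assumes 20 °C (293.15 K) and the 1 M standard state implied by constants in M⁻¹. It converts the values for AAG, GAA and AGA. On this basis, the advantage of AAG comes to roughly 2.3 kJ/mol over GAA and about 6 kJ/mol over AGA, consistent with the statement that the per-site preference is slight. The last function makes the crude assumption that in-phase sites contribute independently and multiplicatively. Under that assumption it shows how a modest per-site preference could compound over repeated XXG triplets. It is not a model of the actual co-operative assembly mechanism, and all function names are hypothetical.

```python
import math

R = 8.314462618e-3   # gas constant, kJ mol^-1 K^-1
T = 293.15           # 20 degrees C, the temperature of the quoted measurements

# Association constants (M^-1) quoted in the text (Steckert & Schuster, 1982).
K = {"AAG": 12e3, "GAA": 4.7e3, "AGA": 1.0e3}


def binding_free_energy(k_assoc, temp=T):
    """Standard binding free energy, dG = -RT ln K, in kJ/mol (1 M standard state)."""
    return -R * temp * math.log(k_assoc)


def preference(k_better, k_worse, temp=T):
    """Free-energy advantage (kJ/mol) of the more tightly bound trimer."""
    return R * temp * math.log(k_better / k_worse)


def compounded_selectivity(per_site_ratio, n_sites):
    """Overall selectivity if n_sites in-phase sites each contribute the same
    ratio independently. A crude illustration only: real assembly is
    co-operative and the sites are not independent."""
    return per_site_ratio ** n_sites


if __name__ == "__main__":
    for seq, k in K.items():
        print(f"{seq}: dG = {binding_free_energy(k):.1f} kJ/mol")
    for other in ("GAA", "AGA"):
        print(f"AAG vs {other}: {preference(K['AAG'], K[other]):.1f} kJ/mol")
    ratio = K["AAG"] / K["GAA"]
    for n in (1, 3, 5):  # hypothetical numbers of in-phase sites
        print(f"{n} site(s): {compounded_selectivity(ratio, n):.0f}-fold")
```

Even about 2 kJ/mol per triplet, about one RT at this temperature, becomes a large factor once it is repeated over several in-phase sites. That is the qualitative point of the co-operativity argument.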
All three base-binding sites could accommodate thymine without difficulty. The RNA specificity must therefore be derived from the interactions made by the ribose 2' hydroxyl groups. The group from nucleotide 1 is of obvious importance. It forms one ligand in the metal-binding site, and it helps to stabilize the concentration of negative charge in this region by hydrogen-bonding to Asp116. A possible interaction between Asp115 and a ribose hydroxyl group has often been discussed in the literature. The Asp115 side-chain is in fact more than 8 Å from the nearest ribose hydroxyl. In nucleotide 2, the ribose hydroxyl group is hydrogen-bonded to the adjacent phosphate from nucleotide 3. Phosphate groups 2 and 3 are only about 5 Å apart, and this hydrogen bond might contribute to the stability of such an unusually close approach. The hydroxyl group in nucleotide 3 does not make hydrogen bonds with RNA or protein. We cannot, however, exclude the possibility of a poorly ordered water molecule, not visible in our difference maps, forming a bridge between the hydroxyl and Asp116. This hydroxyl does make
good van der Waals contacts with Ile94, a highly conserved residue in many strains of TMV.

(b) Protein-protein interactions
The interactions between protein subunits in TMV are of particular interest because of the variety of protein aggregates formed under different solution conditions (for reviews, see Stubbs, 1984; Bloomer & Butler, 1986). At pH values below about 6.5, long helical aggregates form that are isomorphous with the virus. An aggregate with 52 rather than 49 subunits in three turns of the viral helix has also been reported (Mandelkow et al., 1976; Potschka et al., 1988). At neutral pH and low ionic strength, a 20 S protein aggregate forms. This aggregate is required for initiation of viral assembly (Butler & Klug, 1971). It has generally been assumed to be the 34-subunit protein disk found in crystals formed at pH 8 and high ionic strength (Bloomer et al., 1978). More recent results suggest, however, that it is probably a helix of a little over two turns, containing about 38 subunits (Correia et al., 1985; Namba & Stubbs, 1986; Raghavendra et al., 1988). At intermediate ionic strength and neutral or alkaline pH, disks and aggregates of disks are formed. At low ionic strength and alkaline pH, mixtures of small oligomers, called A-protein, are formed. Little is known about the structure of these small oligomers. The lateral protein-protein interactions, illustrated for the virus in Figure 12, appear to be similar in all the aggregates, except for the disordering of the low-radius loop in the disk. Mandelkow et al. (1981) compared the long helical aggregates with each other and with the virus at 4 Å resolution. At that time, no differences in any of the protein-protein interactions were found. It will, however, be necessary to repeat those comparisons using the more reliable phase information available from the virus structure described here.
The differences between the virus and the disk are somewhat greater than they would be if the transformation between the subunits in the two forms were the minimum geometrically required to change the symmetry (Namba & Stubbs, 1986). The subunits in the virus are tilted about 12° from perpendicular to the viral axis (Fig. 6). By contrast, those of the top layer in the disk are roughly perpendicular to the disk axis, while the bottom-layer disk subunits are tilted in the opposite direction. The change in tilt produces a change in the subunit interface equivalent to a hinge movement of about 45°. None the less, the interacting residues are generally similar in the disk and the virus, and large side-chain conformational changes accommodate the different packing. For example, Asn25 makes a direct hydrogen bond with the main-chain nitrogen of Ser15 in the disk (Altschuh et al., 1987), but in the virus this interaction is mediated by a water molecule. Hydrophobic and hydrophilic interactions alternate (Fig. 12), and the largest hydrophobic patch is part of the hydrophobic ribbon described above. The outer two
hydrophobic patches correspond to those described in the disk by Bloomer et al. (1978). The innermost patch, at very low radius, is in the loop that is ordered only in the virus. There are no hydrophobic protein-protein interactions close to the RNA in either the lateral or the axial interfaces. This is not unexpected, since many of the interactions in this region involve the very hydrophilic loop 32-42. That loop contains only two hydrophobic side-chains (Phe35 and Ala40), both buried in the core of the protein. Most of the interactions in the low-radius loop are hydrophilic, and, in particular, there is a large number of lateral intersubunit hydrogen bonds. These interactions could well contribute to the high degree of co-operativity in viral assembly. There are two ion pairs in the lateral interface, Arg122-Asp88 and Asp115-Arg113, discussed above as part of the specific guanine-binding site. A number of the hydrophilic interactions, both lateral and axial, are mediated by water molecules. The axial protein-protein interactions in the virus are completely unrelated to those in the disk. This is partly because of the very different tilts of the subunits in the two aggregates, which lead to much closer packing in the virus. The main reason, however, lies in how neighboring subunits are offset. In the virus, each subunit is displaced about a third of a subunit to the left of its lower neighbor (viewed from the outside), whereas the disk subunit is displaced about a fifth of a subunit to the right. Thus, Glu22-Arg134 is an axial intersubunit salt bridge in the disk (Mondragon, 1984, quoted by Altschuh et al., 1987). In the virus, Glu22 makes no intersubunit interactions, and Arg134 instead forms an intersubunit salt bridge with Glu50. In the virus, the subunits are packed very closely, with the intersubunit interactions between the four core helices being as close as the intrasubunit interactions.
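The displacement of about a third of a subunit follows directly from the helical symmetry, with 49 subunits in three turns (16⅓ per turn). The Python sketch below assumes the commonly quoted TMV pitch of 23 Å, a value not stated in this section. It computes the azimuthal offset and axial rise of the subunits 16 and 17 positions further along the basic helix. The same helix parameters also give a check on the 15.4 Å trinucleotide repeat at 40 Å radius stated earlier, and on the number of turns covered by the first 69 nucleotides discussed under early events in infection. All function names are hypothetical.

```python
import math

# 49 subunits in three turns and the 40 A RNA radius are taken from the text;
# the 23 A pitch is the commonly quoted TMV value and is an assumption here.
SUBUNITS_PER_TURN = 49 / 3      # 16 1/3 subunits per turn
PITCH = 23.0                    # axial rise per turn, angstroms
RNA_RADIUS = 40.0               # radius of the RNA-binding site, angstroms
NT_PER_SUBUNIT = 3              # one trinucleotide per coat-protein subunit

RISE = PITCH / SUBUNITS_PER_TURN  # axial rise per subunit, about 1.41 A


def axial_neighbor(n):
    """Offset (in subunit widths) and axial rise (A) of the subunit n steps
    further along the basic helix, measured from the position directly above
    the starting subunit after the nearest whole number of turns."""
    whole_turns = round(n / SUBUNITS_PER_TURN)
    offset = n - whole_turns * SUBUNITS_PER_TURN
    return offset, n * RISE


def trinucleotide_length(radius=RNA_RADIUS):
    """Arc length per subunit at the given radius, and the helical path length
    once the axial rise per subunit is included (both in A)."""
    arc = 2 * math.pi * radius / SUBUNITS_PER_TURN
    return arc, math.hypot(arc, RISE)


def turns_for_nucleotides(n_nt):
    """Number of helical turns of protein needed to cover n_nt nucleotides."""
    return n_nt / NT_PER_SUBUNIT / SUBUNITS_PER_TURN


if __name__ == "__main__":
    for n in (16, 17):
        offset, rise = axial_neighbor(n)
        print(f"subunit +{n}: offset {offset:+.2f} subunit widths, rise {rise:.1f} A")
    arc, path = trinucleotide_length()
    print(f"trinucleotide repeat at 40 A: arc {arc:.2f} A, helical path {path:.2f} A")
    print(f"first 69 nucleotides: {turns_for_nucleotides(69):.2f} turns")
```

Subunits 16 and 17 both come out about one pitch higher than the starting subunit. One lies a third of a subunit width to one side of the position directly above it, and the other two thirds to the opposite side. This is consistent with these two subunits forming the major and minor axial interfaces. The trinucleotide repeat comes out close to 15.4 Å. The first 69 nucleotides correspond to about 1.4 turns.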
There are two types of axial subunit interface. The major interface lies between a subunit and its neighbor 16 subunits along the viral helix (that is, up and to the left). The minor interface lies between the subunit and its neighbor 17 subunits along the helix (up and to the right). Most of the axial interactions are hydrophilic, although there are a few small hydrophobic patches in the major interface, between the core helices. There are two axial intersubunit salt bridges in the minor interface: between Arg112 and Glu95, and between Arg134 and Glu50. Each of these pairs forms part of one of the extensive intersubunit charge interactions described above.

(c) Conserved amino acid residues
Amino acid sequences are known for a number of tobamoviruses, and for many mutants of TMV. Only a small fraction of the amino acids is invariant. About a third of the conserved residues stabilize the hydrophobic core of the protein. The rest either stabilize the protein fold around the RNA-binding site or form part of the binding site itself. Altschuh et al. (1987) compared seven tobamoviruses, in which 25 residues were invariant. Of
these 25, three (Arg41→Lys, Leu83→Asx, Thr89→Ser) are changed in cucumber green mottle mosaic virus, cucumber strain (CGMMV-C; Kurachi et al., 1972). Two others (Asn25→Ser and ProSS→Ser) have been changed in viable laboratory mutants (Durham & Butler, 1975). We will, however, consider all 25 residues. Altschuh et al. (1987) discussed the reasons for the conservation of some of these residues in terms of the structure of the protein disk. We will, therefore, discuss in more detail only those that take part in RNA binding. As discussed by Altschuh et al. (1987), seven residues at high radius (2, 61, 62, 63, 144, 145 and 152) are conserved in order to stabilize the hydrophobic core of this part of the molecule. This is the region of greatest similarity between the disk and the virus, since it has very few intermolecular contacts, and all of their conclusions apply equally to the virus. Leu83, Ala117 and Leu128 are all in the hydrophobic core of the molecule at lower radius, in the region of the four parallel α-helices. Several invariant residues form hydrogen bonds with main-chain atoms. These bonds stabilize the fold of two loops critical to the integrity of the RNA-binding site and to formation of the viral helix. Glu36 and Thr37 stabilize the hairpin loop between LS and RS, as also observed in the disk. In Asn91, the side-chain nitrogen forms hydrogen bonds with the main-chain carbonyls of residues 90 and 110. The side-chain oxygen atom interacts through a water molecule with the main-chain nitrogen atoms of residues 92 and 93, stabilizing the start of the low-radius loop. Arg41 interacts with the main chain of residue 87. Asn25 makes a lateral intersubunit contact, bridged by a water molecule to the main-chain nitrogen of residue 15. Asp88, Arg113 and Asp115 also make lateral contacts, but their importance probably derives more from their part in one of the base-binding sites, discussed below.
Most of the conserved residues at low radius are directly or indirectly important in binding the RNA. Arg90 and Arg92 bind phosphate groups, and Glu38 stabilizes the conformation of Arg92. Arg41 makes van der Waals interactions with Arg90, and also contributes its own positive charge, at a distance, to the RNA-binding site. Asp116 is hydrogen-bonded to ribose 1. Ala120 makes van der Waals contacts with ribose 2, and Ile94 with ribose 3. Asp116 also forms part of the calcium-binding site with phosphate 2 and two ribose oxygen atoms. Residues 88, 89, 113 and 115 all form part of the binding site for base 1. This is the only base-binding site to utilize invariant amino acid residues. That fact supports the suggestion made above that this site is particularly important to the virus, because of its capacity to recognize guanine residues. Thr89 forms a hydrogen bond with N-1 in guanine, and could probably form a similar bond with N-3 in cytosine. Since the hydroxyl group can be a proton donor or acceptor, the threonine side-chain can hydrogen-bond with any base. The lateral intersubunit salt bridges Asp115-Arg113 and Arg122-Asp88 form the
guanine-specific part of this binding site. Of these four residues, only Arg122 is not invariant; it becomes His in sunn-hemp virus and CGMMV-C. The interaction between Asp116 and a phosphate group places a strong evolutionary constraint on that amino acid residue. Durham & Hendry (1977) and Durham et al. (1977) based a suggestion on titrations and calcium-binding studies. They proposed that TMV contains a group titrating with a pK near 8 that is not present in the helical aggregate of the protein. Since this aggregate is structurally so similar to the intact virus (Mandelkow et al., 1981), it was proposed that the anomalous pK could arise from a protein-phosphate interaction. The interaction was not attributed to Asp116. At that time, Asp115 and Asp116, which are strictly conserved through over 100 strains and mutants of TMV, were widely thought to be part of a carboxyl-carboxylate pair. Contrary to expectation, however, carboxyl clusters are not conserved during evolution. Instead, they appear to migrate freely within subunit interfaces in both helical and spherical viruses (Rossmann et al., 1983; Stubbs, 1984; Lobert et al., 1987). A phosphate-carboxylate pair in the highly ordered RNA-binding site of TMV would not have this evolutionary freedom. The conservation of Asp116 is thus much better explained by the structure we now see than by earlier speculations.

(e) Viral assembly and disassembly
(i) Electrostatic interactions
We find three sites where negative charges from different molecules are juxtaposed in subunit interfaces, creating an electrostatic potential that could be used to drive disassembly and thus initiate the early stages of viral infection. The low-radius carboxyl-carboxylate pair, in the lateral interface, and the phosphate-carboxylate pair appear to bind calcium, as judged both by peaks in the electron density map and by the presence of sufficient suitable ligands for the metal ion. The high-radius carboxyl-carboxylate pair, in the axial interface, does not meet these criteria, but could bind a proton and thus titrate with an anomalous pK. Electrostatic interactions have been recognized for many years to be important in the assembly and disassembly of simple helical and spherical plant viruses (Caspar, 1963; Bancroft, 1970). Particular attention has been paid to carboxyl-carboxylate interactions, mediated either by bound protons with anomalously high pK values, or by cations, especially calcium. Caspar (1963) suggested that the anomalous pK values near 7 exhibited by TMV might arise from the forced juxtaposition of carboxyl groups. On the basis of titrations of the isolated TMV coat protein under various conditions, Shalaby & Lauffer (1977) suggested that the anomalously titrating groups would be in subunit interfaces, and Gallagher & Lauffer (1983a,b) showed that TMV has at least two sites for which calcium ions and protons compete. Bancroft (1970) recognized that anomalously titrating carboxyl groups
might also be found in many spherical plant viruses. Intersubunit carboxyl clusters have now been found in many simple spherical plant virus structures, often as calcium-binding sites (for reviews, see Stubbs, 1984; Liljas, 1986). Carboxyl clusters provide a sensitive switch, active under physiological conditions, to control the state of aggregation of virus proteins. Our structural results suggest that phosphate-carboxylate interactions can be similarly important. We might speculate that such interactions could occur in other viruses, particularly the animal picornaviruses and the plant comoviruses. In these spherical viruses, the N-terminal peptides lie on the inside of the protein shell, where they interact with the RNA (Hogle et al., 1985; Rossmann et al., 1985; Luo et al., 1987; Stauffacher et al., 1987; Chen et al., 1989). These peptides are slightly negatively charged because of the presence of acidic residues (Kitamura et al., 1981; Callahan et al., 1985; VanWezenbeck et al., 1983). By contrast, the N-terminal peptides in most of the plant viruses are positively charged, counteracting the negative charge of the RNA. Unlike the other plant viruses and TMV, picorna- and comoviruses do not have carboxyl clusters. Structural details of the protein-nucleic acid interactions in spherical viruses are not available in most cases, although part of the RNA has been seen in electron density maps of the comoviruses. Phosphate-carboxylate interactions of the same type as the one seen in TMV could nevertheless help maintain an energy balance, providing a source of potential energy to drive viral dissociation. The electrostatic repulsion in both carboxyl-carboxylate and phosphate-carboxylate pairs can be reduced by binding protons or calcium. On entry into the cell, both proton and calcium concentrations are lower than in the extracellular environment.
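The switch-like behavior of an anomalously titrating group can be illustrated with the Henderson-Hasselbalch relation. The Python sketch below assumes a single, independently titrating group. Over about one pH unit on either side of neutrality, a group with a pK raised to near 7 (the value Caspar proposed for TMV) changes from mostly protonated to mostly deprotonated. A normal carboxyl group, with a pK of about 4 (an assumed typical value), stays ionized throughout this range. The sketch also converts a pK shift into the equivalent free energy, which gives only a rough indication of the electrostatic strain such an interface could store. Calcium binding is not modeled, and the function names are hypothetical.

```python
import math

R = 8.314462618e-3   # gas constant, kJ mol^-1 K^-1


def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch fraction of one titrating acidic group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


def pka_shift_free_energy(delta_pka, temp=298.15):
    """Free energy (kJ/mol) equivalent to a pKa shift: ln(10) * R * T * delta_pKa."""
    return math.log(10.0) * R * temp * delta_pka


if __name__ == "__main__":
    NORMAL_PKA = 4.0     # assumed typical value for an isolated carboxyl group
    ANOMALOUS_PKA = 7.0  # "near 7", as suggested for TMV by Caspar (1963)
    for pH in (5.0, 6.0, 7.0, 8.0):
        print(f"pH {pH}: normal {fraction_protonated(pH, NORMAL_PKA):.3f}, "
              f"anomalous {fraction_protonated(pH, ANOMALOUS_PKA):.3f}")
    shift = ANOMALOUS_PKA - NORMAL_PKA
    print(f"pKa shift of {shift:.0f} units ~ {pka_shift_free_energy(shift):.1f} kJ/mol at 25 C")
```

A shift of three pK units corresponds to roughly 17 kJ/mol at 25 °C. Losing the bound proton as the pH rises releases this stabilization.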
In the case of the picornaviruses, this could provide a mechanism for release of the RNA from the A-particles formed when the virus binds to and penetrates the cell membrane (Rueckert, 1985). Thus, the concentration of negative charges could act as a switch, triggered by the intracellular environment to begin the process of dissociation.

(ii) Assembly
The structural details of TMV provide some indications of the molecular basis of assembly of the virus. Butler et al. (1977) and Lebeurier et al. (1977) provided a general description of the assembly process. The origin-of-assembly sequence of the RNA (Zimmern, 1977) binds to the 20 S aggregate of the protein, and elongation of the viral rod proceeds by addition of 20 S aggregates. The smaller A-protein oligomers also add to the growing virus, but they cannot initiate assembly. A remarkable feature of the assembly process is that the 5' end of the RNA is drawn through the central hole of the growing virus to reach its binding site. The basis of the affinity of the RNA assembly initiation sequence for the coat protein is clear from the hydrogen-bonding pattern, favoring guanine in
every third position, in the nucleic acid binding site. Other details of the initiation complex are less clear. For example, it is not obvious whether the RNA would initially bind only to the surface of a 20 S aggregate, or intercalate between the turns of the helix. In the absence of RNA binding, the disordered inner loop may sterically prevent the two-turn protein helix from polymerizing further (Namba & Stubbs, 1986). The ordering of the loop, induced by RNA binding, could be the basis of a highly co-operative process of viral assembly. The extensive intersubunit hydrogen-bonding network in the inner loop could account for this co-operativity.

(iii) Early events in viral infection
Combining our results with those of other groups, it is possible to outline a plausible sequence of steps following initial entry of a virion into a cell. The low calcium concentration and high pH of the cell (relative to extracellular conditions) could destabilize the close approach of the negative charges in the viral subunit interfaces. Some slight destabilization of the virus is observed under these conditions, but they are not sufficient for disassembly in vitro. Wilson (1984) resolved this problem by pretreating TMV particles at pH 8.0, during which the rods did not dissociate detectably. He showed that the pretreated particles could then be dissociated in vitro by a preparation containing ribosomes. This phenomenon, called cotranslational disassembly, was later observed in vivo (Shaw et al., 1986). The mechanism ensures protection of the genome even under unusual (for example, alkaline) extracellular conditions. It also protects the viral RNA from degradation inside the cell (Shaw et al., 1986). The early stages of infection, then, might be summarized in the following way.
It should be emphasized that this sequence of events has not been directly observed, although it is highly plausible in view of current knowledge of the virus structure and its disassembly behavior. A virion first enters a plant cell, where the calcium concentration is low and the pH is high relative to the extracellular environment. These conditions remove protons and calcium ions from the two carboxyl-carboxylate pairs and the phosphate-carboxylate calcium site. Electrostatic repulsive forces from the negative charges then destabilize the virus. The protein-nucleic acid interactions involving the first 69 nucleotides are weaker than in the rest of the genome because of the absence of guanine bases (Goelet et al., 1982). The protein subunits forming about 1.5 turns of the virus helix at the 5' end are therefore lost. The first start codon is thus exposed. Ribosomes bind and move toward the 3' end during translation, competing with the coat protein and stripping the rest of the genome. This begins the cycle of viral replication.

This work was supported by NIH grant GM33265. The Evans and Sutherland PS340 graphics system was purchased with funds from NIH grant 1-S10-RR02506.
References Altschuh, D., Lesk, A. M., Bloomer, A. (‘. & Klug. A. (1987). J. Mol. Biol. 193, 693-707. Bancroft, J. R. (1970). Advan. Virus Res. 16, 99-134. Bernal, ,J. D. & Fankuchen, T. (1941). J. Gen. Physiol. 28, 111-165. Bloomer, A. C. & Butler. P. ,J. G. (1986). In The Plant Viruses (van Regenmortel, M. H. V. & FraenkelConrat. H.. eds). vol. 2. pp. 19-57, Plenum. New York. Bloomer, A. C.. Champness, .J. K.. Bricogne, G.. Staden. R. & Klug, A. (1978). Nature (London), 276, 36% 368. Burley, S. K. & Petsko, G. A. (1985). Science, 229, 23-28. Butler, I’. ,J. G. $ Klug, A. (1971). Nature New Riol. 229. 47-50. Butler. P. J. G., Finch, J. T. & Zimmern, D. (1977). Nature (London), 265, 217-219. Callahan, P. L.. Mizutani, S. & Colonno, R. J. (1985). Proc.
Nat.
Acad.
Sci.,
U.S.A.
82,
732-736.
Caspar, D. 1,. D. (1963). Advan. Protein Chem. 18, 37- 121. Champness, J. Ku’.. Bloomer, A. C.. Bricogne. G.. Butler. P.
Durham,
Acad.
Sci.,
U.S.A.
A. C. H. & Butler.
Biochem.
76, 2551-2555.
P. J. G. (1975).
Eur. .I.
53, 397404.
Durham. A. C. H. & Hendry. D. A. (1977). J’irology. 77, 519519. Durham, A. C. H., Vogel, D. & de Marcillac. G. D. (1977). Eur. J. Biochem. 79, 151-159. Einspahr, H. & Bugg, C. E. (1984). In Metal Ions in Biological Systems, (Sigel. H.. ed.). vol. 17. pp. 51-97, Dekker, Kew York. Fraenkel-Conrat. H. & Sherwood, M. (1967). Arch. Biochem. Biophys. 120, 571-577. Gallagher. W. H. & Lauffer. M. A. (1983a). J. Mol. Biol. 170, 905-919. Gallagher, W. H. & Lauffer, M. A. (19836). J. Mol. Biol. 170. 921-929. Gallie, D. R., Plaskitt, K. A. & Wilson. T. M. A. (1987). Virology, 158, 473-476. Goelet, I’., Lomonossoff. G. I’., Butler, P. ,J. G., Akam. M. E., Gait, M. ,J. & Karn, J. (1982). Proc. Nat. Acad. Sci..
U.S.A.
79, 5818-5822.
Gregory, J. & Holmes, K. C. (1965). J. Mol. Biol. 13, 796-801.
Hendrickson, W. A. & Konnert, J. H. (1980). In Computing in Crystallography (Diamond, R., Ramaseshan, S. & Venkatesan, K., eds), pp. 13.01-13.26, Indian Academy of Science, Bangalore.
Hogle, J. M., Chow, M. & Filman, D. J. (1985). Science, 229, 1358-1365.
IUPAC-IUB Joint Commission on Biochemical Nomenclature (1983). Eur. J. Biochem. 131, 9-15.
Jardetzky, O., Akasaka, K., Vogel, D., Morris, S. & Holmes, K. C. (1978). Nature (London), 273, 564-566.
Jones, T. A. (1982). In Computational Crystallography (Sayre, D., ed.), pp. 303-317, Oxford University Press, Oxford.
Kitamura, N., Semler, B. L., Rothberg, P. G., Larsen, G. R., Adler, C. J., Dorner, A. J., Emini, E. A., Hanecak, R., Lee, J. J., van der Werf, S., Anderson, C. W. & Wimmer, E. (1981). Nature (London), 291, 547-553.
Klug, A., Crick, F. H. C. & Wyckoff, H. W. (1958). Acta Crystallogr. 11, 199-213.
Kurachi, K., Funatsu, G., Funatsu, M. & Hidaka, S. (1972). Agric. Biol. Chem. 36, 1109-1116.
Lebeurier, G., Nicolaieff, A. & Richards, K. E. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 149-153.
Liljas, L. (1986). Progr. Biophys. Mol. Biol. 48, 1-36.
Lobert, S., Heil, P., Namba, K. & Stubbs, G. (1987). J. Mol. Biol. 196, 935-938.
Luo, M., Vriend, G., Kamer, G., Minor, I., Arnold, E., Rossmann, M. G., Boege, U., Scraba, D. G., Duke, G. M. & Palmenberg, A. C. (1987). Science, 235, 182-191.
Makowski, L. (1978). J. Appl. Crystallogr. 11, 273-283.
Mandelkow, E., Holmes, K. C. & Gallwitz, U. (1976). J. Mol. Biol. 102, 265-285.
Mandelkow, E., Stubbs, G. & Warren, S. (1981). J. Mol. Biol. 152, 375-386.
Mondragon, A. (1984). Ph.D. thesis, University of Cambridge.
Moras, D., Olsen, K. W., Sabesan, M. N., Buehner, M., Ford, G. C. & Rossmann, M. G. (1975). J. Biol. Chem. 250, 9137-9162.
Namba, K. & Stubbs, G. (1985). Acta Crystallogr. sect. A, 41, 252-262.
Namba, K. & Stubbs, G. (1986). Science, 231, 1401-1406.
Namba, K. & Stubbs, G. (1987). Acta Crystallogr. sect. A, 43, 533-539.
Namba, K., Caspar, D. L. D. & Stubbs, G. (1985). Science, 227, 773-776.
Namba, K., Caspar, D. L. D. & Stubbs, G. (1988). Biophys. J. 53, 469-475.
Nambudripad, R. & Makowski, L. (1987). Biophys. J. 51, 373a.
Ollis, D. L., Brick, P., Hamlin, R., Xuong, N. G. & Steitz, T. A. (1985). Nature (London), 313, 762-766.
Perham, R. N. & Thomas, J. O. (1972). J. Mol. Biol. 62, 416-418.
Potschka, M., Koch, M. H. J., Adams, M. L. & Schuster, T. M. (1988). Biochemistry, 27, 8481-8491.
Raghavendra, K., Kelly, J. A., Khairallah, L. & Schuster, T. M. (1988). Biochemistry, 27, 7583-7588.
Ramakrishnan, C. & Ramachandran, G. N. (1965). Biophys. J. 5, 909-933.
Richardson, J. S. (1981). Advan. Protein Chem. 34, 167-339.
Rossmann, M. G., Abad-Zapatero, C., Hermodson, M. A. & Erickson, J. W. (1983). J. Mol. Biol. 166, 37-83.
Rossmann, M. G., Arnold, E., Erickson, J. W., Frankenberger, E. A., Griffith, J. P., Hecht, H.-J., Johnson, J. E., Kamer, G., Luo, M., Mosser, A. G., Rueckert, R. R., Sherry, B. & Vriend, G. (1985). Nature (London), 317, 145-153.
Rueckert, R. R. (1985). In Virology (Fields, B. N., Knipe, D. M., Chanock, R. M., Melnick, J., Roizman, B. & Shope, R., eds), pp. 705-738, Raven, New York.
Saenger, W. (1984). Principles of Nucleic Acid Structure, Springer-Verlag, New York.
Shalaby, R. A. F. & Lauffer, M. A. (1977). J. Mol. Biol. 116, 709-725.
Shaw, J. G., Plaskitt, K. A. & Wilson, T. M. A. (1986). Virology, 148, 326-336.
Stauffacher, C. V., Usha, R., Harrington, M., Schmidt, T., Hosur, M. V. & Johnson, J. E. (1987). In Crystallography in Molecular Biology (Moras, D., Suck, D., Strandberg, B., Drenth, J. & Blundell, T., eds), pp. 293-308, Plenum, New York.
Steckert, J. J. & Schuster, T. M. (1982). Nature (London), 299, 32-36.
Stubbs, G. (1984). In Biological Macromolecules and Assemblies, vol. 1, Virus Structures (Jurnak, F. A. & McPherson, A., eds), pp. 149-202, Wiley, New York.
Stubbs, G. (1989a). In Prediction of Protein Structure and Principles of Protein Conformation (Fasman, G., ed.), pp. 117-148, Plenum, New York.
Stubbs, G. (1989b). Acta Crystallogr. sect. A, 45, 254-258.
Stubbs, G. & Diamond, R. (1975). Acta Crystallogr. sect. A, 31, 709-718.
Stubbs, G. & Makowski, L. (1982). Acta Crystallogr. sect. A, 38, 417-425.
Stubbs, G. & Stauffacher, C. (1981). J. Mol. Biol. 152, 387-396.
Stubbs, G., Warren, S. & Holmes, K. (1977). Nature (London), 267, 216-221.
Stubbs, G., Namba, K. & Makowski, L. (1986). Biophys. J. 49, 58-60.
VanWezenbeck, P., Verver, J., Harmsen, J., Vos, P. & van Kammen, A. (1983). EMBO J. 2, 941-946.
Wang, A. H.-J., Quigley, G. J., Kolpak, F. J., Crawford, J. L., van Boom, J. H., van der Marel, G. & Rich, A. (1979). Nature (London), 282, 680-686.
Wilson, T. M. A. (1984). Virology, 137, 255-265.
Wittmann-Liebold, B. & Wittmann, H. G. (1967). Mol. Gen. Genet. 100, 358-363.
Zimmern, D. (1977). Cell, 11, 455-462.
Edited by R. Huber