J. Mol. Biol. (1979) 132, 19-51
Refined Models for Computer Simulation of Protein Folding
Applications to the Study of Conserved Secondary Structure and Flexible Hinge Points During the Folding of Pancreatic Trypsin Inhibitor
BARRY ROBSON AND DAVID J. OSGUTHORPE
Department of Biochemistry, The University, Manchester M13 9PL, England
(Received 18 August 1978)
A new model and parameters are proposed for the computer simulation of protein folding, which satisfy requirements for a fully automatic simulation as discussed in recent critical reviews.
The parameters were obtained, refined or checked by empirical observations on proteins of known sequence and conformation, in order to avoid as much as possible theoretical deductions about the nature of the interactions between groups in proteins, which may not be justified by the current status of the art. The major improvement over previous methods is to retain a more realistic and complete representation of the protein backbone, and to alternatively reduce the number of variables by coupling their behaviour. As an example, the method is applied to simulate the folding of pancreatic trypsin inhibitor, and leads to a root-mean-square fit of 6.0 Å with good secondary structure. This also allows a more detailed examination of secondary structure transitions during protein folding than has been possible hitherto. Although, in the simulation discussed most extensively, the advantage of initial statistical predictions is demonstrated, the secondary structure was free to change in the simulation. A simulation from an extended chain is also described, and refinements tested. By observing changes in secondary structure during the simulated folding, it is shown that α-helices and extended chain regions predicted at the outset, or formed early in the simulation, are conserved, and that certain residues are crucial as flexible hinge-points to bring the secondary structure together in order to achieve tertiary packing. In view of recent debate about the importance of glycyl residues as hinge-points, and the danger of imparting glycyl-like backbone behaviour to non-glycyl residues suspected to be hinge-points, it is of considerable interest that the hinge-point residues identified by us are not, in general, glycyl residues. This makes an important distinction between a "reverse turn region", for which glycine is statistically a strong candidate, and a hinge-point in the protein backbone.
It is argued that reverse turns are locally determined and likely to be fairly stable during the folding process, while hinge-points are determined by tertiary interactions. This distinction, implicit in most papers concerned with statistical methods of secondary structure prediction, has not been made clearly in recent reports of folding simulations.

0022-2836/79/210019-33 $02.00/0 © 1979 Academic Press Inc. (London) Ltd.
1. Introduction

The ability to simulate the folding up of a globular protein, and hence predict its tertiary structure from the amino acid sequence alone, would be of considerable biological and technological significance (Robson, 1976). Recently, there has been a number of attempts to simulate protein folding, particularly that of pancreatic trypsin inhibitor (PTI) because it contains only 58 amino acids and because its X-ray structure is known (Huber et al., 1971). These attempts have been critically reviewed by Némethy & Scheraga (1977), and Hagler & Honig (1978).

The methods for attempting to predict tertiary structure have been of two types. The Monte Carlo approach of the Scheraga school, discussed in detail in the review by Némethy & Scheraga (1977), samples the conformational energy surface at random and the emphasis is on the prediction of long-range residue to residue contacts when the secondary structure is well-predicted in advance or is constrained closely to the observed secondary structure. Further, it assumes that the native structure of a protein is that of least energy, whereas it might be the low-energy conformation that the protein can reach from an open conformation in reasonable time. Folding procedures involving minimisation of the conformational energy, on the other hand, take some account of the latter possibility but, in order to reduce the computation time to manageable proportions, use a very simplified representation of the molecular structure of the polypeptide chain (Levitt & Warshel, 1975; Levitt, 1976; Kuntz et al., 1976). The simplifications and other assumptions made in the latter methods have left them open to the criticism that they may have led to fortuitous approximate agreement with the observed structure (Robson, 1976; Némethy & Scheraga, 1977; Hagler & Honig, 1978). Further, they simplify the backbone structure to such a degree that they cannot be used to provide detailed information about changes in secondary structure during folding.

Whereas it cannot be doubted that simplifications and assumptions are necessary at present if any attack is to be made on the folding problem, it is obviously essential to avoid those that have been subjected to justifiable criticism. From the detailed criticisms presented by Némethy & Scheraga (1977), it follows that any acceptable simulation must: (1) take account of backbone-backbone hydrogen bonding in a way consistent with the stability of secondary structure features such as α-helix (see also Robson, 1975); (2) not assign glycyl-like backbone behaviour to the non-glycyl residues that are seen to occur in reverse turns in the PTI backbone (for further discussion, see Hagler & Honig, 1978); (3) use the same potential functions consistently throughout the simulation; and (4) use a conformational energy minimisation procedure that is entirely automatic and that cannot be controlled at intermediate stages and so take into account the subjective judgements of the user. That a simulation is acceptable by these criteria does not, of course, guarantee success in terms of good predictions of tertiary structure, but it does guard against fortuitous success.

The simulations carried out in this work satisfy these criteria, and do so in a way that still preserves the attractive time-saving features. More important, they employ a more detailed backbone representation that allows secondary structure changes to be investigated. Thus a simplification that we have not employed is to neglect the detailed position of backbone NH and CO groups. Levitt & Warshel (1975) halved the number of backbone variables by neglecting the NH and CO groups and by
joining the Cα atoms of the backbone with virtual bonds (Brant & Flory, 1965). Since the peptide group angle ω (Cα-C'-N-Cα) is constrained close to a planar-trans configuration, each virtual bond rotation angle (Cα...Cα...Cα...Cα) then replaces both backbone variable rotation angles φ (C'-N-Cα-C') of the ith residue and ψ (N-Cα-C'-N) of the (i - 1)th residue. However, even if representations of NH and CO groups are placed near the virtual bond (Levitt, 1976), they still cannot have the appropriate directionality, which depends on φ and ψ. Since this directionality strongly affects the relative stability of different secondary structure features, this contravenes the first requirement for an acceptable simulation as listed above. Nevertheless, it is clear that replacement of ψ and φ by a single variable would result in a considerable saving of computer time because, with the simplified side-chain representation, it reduces the folding of PTI to a problem in 58 variables.

Noting that the behaviour of φ and of ψ are not independent, our method is to couple the behaviour of φ and ψ of the same residue, rather than to neglect their individual behaviour by a simplification of the backbone. A new variable, y, is introduced as a function of both φ and ψ. The conformational energy of PTI is minimised as a function of 58 y variables, and at every stage in the minimisation the values of φ and ψ calculated from those y values are used in the building up of the molecular structure and hence in the calculation of its energy. This corresponds to constraining the conformational transitions of each residue to a single reaction path, the distance along which is represented by y. In order to be able to reproduce most residue conformations observed in proteins as exactly as possible, such a reaction path must pass through all regions of residue conformational space that are well-populated in proteins. Further, in order to satisfy criterion (1), it must also pass through all regions of residue conformational space associated with important secondary structure features. This differs from previous methods of reducing the number of backbone variables (1) because it implies a constraint, rather than a simplification of molecular structures, (2) because φ and ψ of the same residue are replaced by a single variable, rather than ψ of one residue and φ of the residue immediately following it, and (3) because realistic secondary structure hydrogen bonding is produced for important secondary structure features.

Almost all parameters used in this work were derived, refined or checked by observations on proteins of known sequence and conformation (not including PTI). This is a major distinction between this and previous studies, and reflects our view that, in the present status of the art of peptide conformational analysis, empirical observations on proteins provide a more realistic and safer account of protein group interactions.

Again we emphasise that, compared with other folding simulations, the simulation described here provides much greater detail about changes in, and hence the role of, secondary structure during the folding process. For example, it is shown that the ability to carry out good predictions of secondary structure at the outset (prior to the folding simulation) is useful but not essential for all regions of the backbone. Further, the simulation reveals which regions of the backbone have relatively labile secondary structure and so act as crucial hinge-points during the folding. These hinge-points revealed by our simulation do not usually contain glycine, and this contests the conclusion of Hagler & Honig (1978) that placing glycine in regions known to be turns in the backbone of PTI (Levitt & Warshel, 1975) may be essential
for simulating PTI folding. Finally, we note that, since the detailed configuration of groups is likely to be essential for the biological function of a protein, good predictions of secondary structure at the end of a simulation must obviously be as important a criterion of a successful simulation as the long-range interactions between residues, and this can be achieved only with a model like that described here, which places equal emphasis on the details and behaviour of the secondary structure.
2. Methods

Pohl (1971) has noted that the observed values of φ and ψ for dipeptide units in globular proteins tend to lie close to an elliptical path in the φ-ψ space. Hence the simplest path for conformational transitions in individual residues might be that described by an elliptical function. The problem is to derive an elliptical function that delivers values of φ and ψ as a function of only one variable, y, the other geometric parameters of the ellipse being taken as constant. In principle, the latter constant parameters could be made dependent on the type of dipeptide, but at the present level of approximation this was not found to be particularly useful.

To obtain the constant parameters that define a specific ellipse in φ-ψ space, we (1) determined the ellipse that would best fit the distribution of residue conformations in proteins as studied by Robson & Pain (1974), and (2) made minor adjustments to ensure that the ellipse passed through the lowest energy minima associated with the important secondary structure features in proteins (namely, right- and left-hand α-helices, β-pleated sheets, and reverse turns), as well as passing through or close to the lowest energy routes connecting those minima. The resulting ellipse is shown in Fig. 1.

Any point on this ellipse (which has its centre at φ = 0, ψ = 0) may be indicated by a single value, y, which is the angle between a fixed arbitrary vector passing through the origin φ = 0, ψ = 0 and a vector drawn from the origin to this point. The arbitrary reference vector was taken as that drawn to the most extended conformation in the pleated sheet region, which lies at 45° with respect to the φ axis. A value of y = 0 corresponds to φ = -140, ψ = 140. y is defined as positive for an anticlockwise rotation around the origin, so that the right- and left-hand α-helices that occur at φ = -60, ψ = -50 and φ = 60, ψ = 50, are defined by y = 85 and -85, respectively. The general equation relating y to φ and ψ is eqn (1) [not legible in this copy], where θ (the tilt of the ellipse) = 45°, a = 198.00 and b = 77.78. It is the value y for each dipeptide unit in a protein, rather than φ and ψ, that is manipulated by the minimiser, so that the conformational energy is minimised only as a function of the y variables.

(b) Conformational energy parameters for interactions within dipeptide units
For interactions within each glycyl and alanyl dipeptide unit of the protein, an exact all-atom representation was used, the conformational energy of the alanyl side-chain being minimised for each value of φ and ψ. The potential functions (with the 9th power repulsion term) of Hagler et al. (1974) were used in conjunction with the molecular geometry of Brant & Flory (1965). Justified by Robson & Hagler (1979) and Robson et al. (1978), the value of the repulsion term was halved for all atoms in vicinal (1-4) contact. This mimics the effect of relaxed geometry in low-energy regions, is consistent with the properties of extended-basis set ab initio calculations, and accounts for experimental properties of dipeptides in solution (Robson & Hagler, 1979; Robson et al., 1978). A dielectric constant of 3.5 was also employed, as used by Brant et al. (1967), because this gives an energy consistent with Monte Carlo simulation of alanyl dipeptide interactions with water (A. T. Hagler, D. J. Osguthorpe & B. Robson, unpublished results). The resulting energies of the dipeptides as a function of y are shown in Table 1. It should be noted that these energies are expressed with respect to the minimum in φ-ψ space.

The reasonable energies of the principal minimum and barriers in the energy surface used may be indicated by comparison with the ab initio calculations, which have been carried out in most detail for N-formyl glycyl amide and N-formyl glycyl N'-methylamide analogues (Robson et al., 1978; Hillier & Robson, 1979). At y = 0, ab initio calculation gives (depending on the analogue and choice of extended basis set) between zero and 1 kcal, compared with 0.7 kcal used in the folding program. For the barrier at y = 45°, ab initio gives 4.5 to 5.5 kcal compared with 3.6 kcal used in the folding program. This is a N...N cis configuration, which may be relaxed by flexible geometry. With minimisation of some crucial torsion angles, ab initio gave 3.95 kcal at this point. For the barrier at y = 135°, ab initio calculation gave 17 to 19 kcal, compared with 15.3 kcal used in the folding program. Minimisation of the geometry of this C'...C' cis configuration gave 12.3 kcal. At y = 85°, the α-helical conformation, 6.7 kcal was obtained by ab initio calculation, in relatively poor agreement (considering the importance of this conformation) with 2.3 kcal used in the folding simulation. However, the ab initio calculations were, of course, in vacuo. The preliminary Monte Carlo studies mentioned above suggest that the α-helical dipeptide unit conformation will be the most strongly stabilised, due to the larger dipole moment of this configuration, and the water-solute interaction constitutes 5.6 kcal at this point. The resulting value of about 1.1 kcal (both for glycine and alanine) is then in reasonable agreement with the value of 2.3 kcal used in the folding simulation.

FIG. 1. Energy surface of a residue capable of forming pleated sheet and α-helices, and the ellipse that best passes through the principal secondary structure features. Marked on the surface are: the idealised antiparallel pleated sheet conformation; the idealised parallel pleated sheet conformation; (αR) the idealised right-hand α-helix; (αL) the idealised left-hand α-helix; (αII) the modified α-helix common in proteins (see Némethy et al., 1967); centres of populations associated with pleated sheet regions as defined by Robson & Pain (1974) (see the text), and as found by Pohl (1971); conformations found in reverse turns; and the principal lowest-energy crossing point between α-helices and extended chain. R and L, direction of right- and left-hand twist to pleated sheet as defined by Chothia (1973). The ellipse defines the y variable, which is the angle specifying a point on the ellipse once the major and minor axes of the ellipse are defined. The origin is taken as the point on the upper left apex of the ellipse (extended chain), and positive values are determined by an anti-clockwise movement round the ellipse centre.
The energy surface is for a model system chosen to approximate a dipeptide unit including tertiary interactions with the rest of the protein. Two central residues of an N-acetyl (alanyl)n N'-methylamide chain were fixed in a type I reverse turn conformation, and the rest of the φ-ψ angles varied according to the periodicity constraint that all φ angles are equal, and all ψ angles are equal. The region near φ = -140, ψ = +140 then corresponds to a loop of antiparallel pleated sheet, while the region near φ = -60, ψ = -50 corresponds to a right-hand α-helix with a central distortion often observed in helices of proteins (Robson & Pain, 1971). The energy contours are at 0, 2, 5 and 10 kcal (mol dipeptide unit)^-1. The turn energy was minimised in the αR region.

The method of assigning the energies as a function of y to non-alanyl dipeptide units reflected two problems. The first was that good potential energy surfaces based on 6 to 9 potentials, ab initio calculations, flexible geometry and solvent effects are not available for dipeptides other than those of glycine and alanine (there is evidence from calculation of nuclear magnetic resonance and hydrodynamic properties of peptides that the solvent contribution is modified by the larger side-chains: Robson & Hagler, 1979). The second is that we were interested in the possibility of introducing statistical results from proteins of known sequence and conformation in order to represent the conformational behaviour of residues with the minimum of assumptions. The energy E'(y, dipeptide), which is a function of y and the type of dipeptide (tyrosyl, prolyl, etc.), was therefore calculated for all dipeptide units, other than glycyl and alanyl, as

$$E'(y, \text{dipeptide}) = E(y, \text{alanine dipeptide}) - \alpha I(y, \text{dipeptide}) + \alpha I(y, \text{alanine dipeptide}), \qquad (2)$$

where the information measures I(y, dipeptide) and I(y, alanine dipeptide) were obtained from Robson & Pain (1974) and Robson & Suzuki (1976). With respect to the latter data, which are based on 25 proteins of known sequence and conformation but are limited to various regions of φ-ψ space, sine functions of φ and ψ were used to interpolate intervening values and to produce information maps in qualitative agreement with those produced by Robson & Pain (1974). E(y, alanine dipeptide) is the actual energy surface for the alanine dipeptide calculated using the all-atom representation.

The value of α that defines the energy surface of dipeptides other than glycyl and alanyl (see eqn (2)) was chosen in the following way. Since the major effect of increasing α is to raise the energy of the α-helical region of helix-breaking residues, the energies of α-helices and hairpin loops of antiparallel pleated sheet (observed in proteins) were calculated and α adjusted so that if the α-helix regions are observed, they are also calculated as being of lower energy (and similarly for pleated sheet regions). This assignment depends on the parameters used for backbone-backbone interactions, which were derived independently, as described below. Being derived from information measures, the energy maps take into account the energy surface of the dipeptide and average interactions with neighbouring residues. Some of the latter contribution is thus, strictly speaking, counted twice, although it introduces empirical information from proteins of known sequence and conformation, which may compensate for hidden deficiencies in the parameters for interactions between dipeptide units. On the other hand, it may be noted that the stabilisation of the β-sheet and α-helical regions by backbone-backbone interactions, which are not intradipeptide unit interactions, is implicitly allowed for by this procedure, and in such a way as to be consistent with our backbone-backbone interaction parameters.
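The two ingredients of the intradipeptide energy described in this section can be illustrated numerically. The following is a minimal sketch, not the authors' program: it assumes the standard central polar form of an ellipse with the quoted constants (a = 198.00, b = 77.78, reference vector pointing to φ = -140, ψ = 140), and it reads eqn (2) as E' = E(alanine) - α[I(dipeptide) - I(alanine)]. The function names and the numerical inputs in the usage note are hypothetical.

```python
import math

# Ellipse constants quoted in the text (degrees in phi-psi space).
A_AXIS = 198.00   # semi-major axis, a
B_AXIS = 77.78    # semi-minor axis, b
# Polar angle of the reference vector to (phi, psi) = (-140, 140).
# Assumption: y is measured anticlockwise from this direction.
REF_ANGLE_DEG = 135.0


def y_to_phi_psi(y_deg):
    """Map the single variable y (degrees) to a point (phi, psi) on the ellipse."""
    y = math.radians(y_deg)
    # Central polar form of an ellipse, angle measured from the major axis.
    r = (A_AXIS * B_AXIS) / math.hypot(B_AXIS * math.cos(y), A_AXIS * math.sin(y))
    polar = math.radians(REF_ANGLE_DEG + y_deg)
    return r * math.cos(polar), r * math.sin(polar)


def dipeptide_energy(e_alanine, info_dipeptide, info_alanine, alpha):
    """Assumed reading of eqn (2): shift the alanine energy by the scaled
    difference in information measures at the same y."""
    return e_alanine - alpha * (info_dipeptide - info_alanine)


if __name__ == "__main__":
    for y in (0, 45, 85, 135):
        phi, psi = y_to_phi_psi(y)
        print(f"y = {y:3d}  phi = {phi:7.1f}  psi = {psi:7.1f}")
```

With these assumptions y = 0 returns the extended conformation near φ = -140, ψ = 140, and y = 85 lands within a degree of the right-hand α-helix at φ = -60, ψ = -50, which agrees with the values stated in the text. A residue with a negative information measure in the helical region receives a higher energy there as α increases, as the text describes for helix-breaking residues.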
(c) Conformational energy parameters for interactions between different dipeptide units

All interaction energies e_ij between groups i and j in different dipeptide units were calculated by a simple functional form:

$$e_{ij} = \frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} \qquad (3)$$

where A_ij and B_ij are dependent on the nature of the two groups, and r_ij is the distance between their centres. A_ij and B_ij may be readily calculated from a knowledge of the minimum energy E_ij of the interaction as a function of r_ij, and from R_ij, the value of r_ij at which the energy is a minimum:

$$A_{ij} = \frac{b}{b-a}\,E_{ij}\,R_{ij}^{a} \qquad (4)$$

$$B_{ij} = \frac{a}{b-a}\,E_{ij}\,R_{ij}^{b} \qquad (5)$$

Eqn (3) is parameterised to include van der Waals', electrostatic, hydrogen bonding and solvent-dependent interactions without resolving them into individual contributions. This simplifies the assignment of parameters, avoids the use of complex models for interactions that cannot be justified at the current level of understanding of protein group interactions in solvent, and also simplifies the interpretation of any discrepancies between the predicted and observed protein structures.

(i) Backbone parameters
For backbone-backbone interactions between dipeptide units, grouping of atoms into molecular groups with single centres of action simplifies parameterisation, and speeds calculation (calculation of square roots in the evaluation of distances between atoms or groups is a major time-consuming factor). Groupings were made that produce conformational energies consistent with the all-atom representation and the 6 to 9 potential functions of Hagler et al. (1974), using halved vicinal van der Waals' repulsions and a dielectric constant of 3.5 (for justification of this, see the above section on conformational energy parameters for interactions within dipeptide units). The calculations were compared for N-acetyl polyalanyl N'-methylamide in all periodic conformations (i.e. in which all φ values are the same and all ψ values the same), and in all periodic conformations containing a reverse turn in the middle of the chain (see Fig. 1). The latter allows an estimate of the relative energy of the hairpin loop of antiparallel pleated sheet. It was found that the CαH, the side-chain CH3, and (surprisingly) the amide NH and carbonyl CO groups could all be represented as single centres of action and still give energies within 2 kcal (mol dipeptide unit)^-1 of the all-atom representation for energies less than 10 kcal (mol dipeptide unit)^-1. Although NH and CO groups had to remain dependent on φ and ψ in order to produce agreement, the directionality of hydrogen bonding was preserved because of repulsions involving the adjacent backbone atoms when the NH group is centred on the amide nitrogen and the CO group is centred on the carbonyl carbon. The powers a and b (eqn (3)) with values 9 and 6 were found to be suitable for the group representation as for the all-atom representation, and the optimal values of A_ij and B_ij (eqns (4) and (5)) that produce the best fit to the all-atom representation are shown in Table 3.

Prior to refinement, the interactions between some backbone groups (and the alanyl side-chain), including NH...NH interactions, CO...CO interactions, NH...CαH interactions and so on, were calculated according to the solvation model described below. These interactions required relatively little refinement to make them consistent with the all-atom representation, which neglects solvent except in terms of a dielectric constant. This emphasises the value of the simple functional representation employed here, because more complex functional forms invariably imply assumptions about the qualitative nature of the interactions, whereas quantitative agreement between different approaches does not necessarily arise only when the approaches are based on the same physical interpretation.
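The relation between the well parameters (E_ij, R_ij) and the coefficients (A_ij, B_ij) can be checked numerically. The sketch below assumes that eqn (3) has the form e = A/r^a - B/r^b with a = 9 and b = 6 as stated above, and that eqns (4) and (5) follow from requiring a minimum of depth E_ij at r = R_ij. The function names and the example well (E = -0.5 kcal at R = 4.0 Å) are hypothetical and are not taken from Table 3.

```python
def coefficients(e_min, r_min, a=9, b=6):
    """Return (A, B) such that A/r**a - B/r**b has its minimum e_min at r_min."""
    A = b / (b - a) * e_min * r_min ** a
    B = a / (b - a) * e_min * r_min ** b
    return A, B


def pair_energy(r, A, B, a=9, b=6):
    """Interaction energy between two group centres separated by r."""
    return A / r ** a - B / r ** b


if __name__ == "__main__":
    # Hypothetical attractive well, for illustration only.
    A, B = coefficients(e_min=-0.5, r_min=4.0)
    for r in (3.5, 4.0, 4.5, 6.0):
        print(f"r = {r:.1f}  e = {pair_energy(r, A, B):8.3f}")
```

For an attractive well (negative E_ij) both coefficients come out positive, so the first term is repulsive and the second attractive. Evaluating the energy at r = R_ij returns E_ij, and the energy rises on either side of that distance.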
Dipeptide energies of the 20 dipeptide units naturally occurring in globular proteins, as a function of y (see the text). The dipeptide names are in standard IUPAC 1-letter code.
[Only the entries for y = 160 to 350 survive in this copy. The one-letter column labels are lost apart from a stray "H", so each dipeptide is listed as a numbered row in source order.]

   y    160   170   180   190   200   210   220   230   240   250   260   270   280   290   300   310   320   330   340   350
   1    0.8   1.1   0.7   0.6   0.9   2.2   3.6   3.6   2.9   2.5   2.2   2.4   3.5   6.1   9.7  16.2  14.4   4.7   0.8   1.1
   2    4.7  16.4  12.2   9.8  11.5  17.9  14.9  14.2   8.5   6.5   4.7   3.7   4.3   6.6  10.0  16.3  14.5   4.4   0.4   0.7
   3    6.2  17.6  13.4  12.1  14.5  20.9  17.4  16.6  10.5   8.2   6.1   4.9   5.2   7.3  10.6  16.5  14.7   4.5   0.5   0.8
   4    5.6  17.1  12.9  11.3  13.4  19.8  16.5  15.7   9.8   7.5   5.6   4.5   4.8   7.0  10.3  16.4  14.6   4.4   0.5   0.8
   5    6.4  17.8  13.6  12.4  14.9  21.3  17.9  17.0  10.9   8.5   6.4   5.2   5.4   7.5  10.7  16.6  14.8   4.5   0.5   0.8
   6    7.3  18.5  14.6  14.3  17.4  23.7  19.7  18.8  12.2   9.6   7.2   5.7   5.8   7.6  10.8  16.5  14.6   4.3   0.5   1.0
   7    7.0  18.3  14.3  13.8  16.7  23.0  19.2  18.2  11.8   9.3   7.0   5.6   5.6   7.6  10.7  16.5  14.7   4.4   0.5   1.0
   8    6.1  17.5  13.5  12.4  14.9  21.2  17.6  16.7  10.5   8.2   6.0   4.8   5.0   7.1  10.4  16.3  14.5   4.3   0.5   0.9
   9    4.4  16.2  11.9   9.3  10.8  17.1  14.3  13.6   8.1   6.1   4.4   3.5   4.1   6.5  10.0  16.3  14.5   4.5   0.4   0.7
  10    6.2  17.6  13.7  12.8  15.4  21.7  17.9  17.1  10.7   8.3   6.1   4.8   5.0   7.0  10.3  16.2  14.4   4.2   0.5   1.0
  11    5.7  17.2  13.1  11.6  13.9  20.2  16.8  16.0  10.0   7.7   5.7   4.5   4.8   7.0  10.3  16.3  14.5   4.4   0.5   0.8
  12    6.2  17.6  13.5  12.3  14.7  21.1  17.6  16.8  10.6   8.3   6.2   5.0   5.2   7.3  10.6  16.5  14.7   4.5   0.5   0.8
  13    6.5  17.9  13.9  13.0  15.7  22.1  18.4  17.5  11.2   6.7   6.5   5.2   5.4   7.4  10.6  16.4  14.6   4.4   0.5   0.9
  14    5.8  17.4  13.3  12.1  14.4  20.8  17.2  16.4  10.2   7.9   5.8   4.6   4.9   7.0  10.3  16.2  14.4   4.3   0.5   0.9
  15    5.9  17.4  13.3  11.9  14.3  20.6  17.2  16.4  10.3   8.0   5.9   4.7   5.0   7.1  10.4  16.4  14.6   4.4   0.5   0.8
  16    7.3  18.5  14.7  14.6  17.7  24.0  19.9  18.9  12.3   9.6   7.2   5.7   5.7   7.6  10.7  16.4  14.5   4.2   0.5   1.0
  17    7.1  18.2  13.7  12.4  14.9  21.5  18.2  17.4  11.4   9.1   7.0   5.9   6.1   8.2  11.4  17.3  15.4   4.9   0.6   0.7
  18    6.8  18.1  14.0  13.3  16.0  22.4  18.7  17.8  11.4   9.0   6.8   5.4   5.6   7.6  10.8  16.6  14.7   4.4   0.5   0.9
  19    5.4  16.9  12.5  10.4  12.3  18.7  15.7  15.0   9.3   7.2   5.3   4.3   4.8   7.1  10.5  16.6  14.8   4.6   0.5   0.7
  20  300.8 372.8 347.9 339.8 358.9 380.4 367.6 364.0 333.0 313.7 288.7 259.7 229.1 199.4 172.5 122.6 101.6  20.3   2.8  63.7
The parameters R_ij (eqns (4) and (5)) for each pair of interacting side-chains i and j may be obtained from the most probable interaction distances between spatially neighbouring side-chains in proteins of known sequence and conformation (Crampin et al., 1978). Generally, it is better to underestimate parameters R_ij for side-chain interactions, because the above work emphasised that side-chains can reorientate to permit closer approaches than the most probable separation distance. Similarly, it is also desirable to use a soft repulsive term to represent this reorientation effect, though this is to some degree compensated for by the fact that van der Waals' repulsions from many atoms accumulate to give a "stiff" repulsive force between groups (Levitt, 1976). By calculations on myoglobin α-helices, we decided that use of a = 9, b = 6 was again appropriate, in conjunction with some refinements of the radii constituting a reduction of the most probable contact distance of non-alanyl side-chains by 15 to 25%. These were the smallest reductions that would avoid clashes between the larger side-chains in α-helices, and we have employed them in the parameterisation of all side-chain interactions independent of the backbone conformation. A similar technique was used to parameterise side-chain-backbone interactions (again using a = 9, b = 6). These parameter assignments differ from those used by previous workers in folding simulations because, like all the parameters used by us, they are explicitly derived and refined by observations of known protein structures.

(iii) Solvent effects
The effect of the solvent may be introduced into eqn (3) via the E_ij term. For all interactions between dipeptide units, solvent was taken into account in this way. However, it may be noted (see above) that the solvent effect was also introduced for interactions within dipeptides via a dielectric constant, while backbone-backbone interactions between units were further refined, as described above, in order to make them reasonably consistent with the all-atom representation and the treatment described below.

Approximate methods for treating the solvent effect are widely employed (see e.g. Gibson & Scheraga, 1967; Levitt, 1976). Such methods confine the influence of protein groups on water to a shell of hydration around the groups, and calculate the energy of interaction between groups on the basis of mutual exclusion of some of the water from their hydration shells. The validity of this approach is suggested by solvent Monte Carlo studies (Moult & Hagler, 1978; A. T. Hagler, D. J. Osguthorpe & B. Robson, unpublished results).

In calculating group interactions in water, the following model was used. It was assumed that each group X is spherical with a radius σ_X Å and a solvation shell 3.7 Å deep. The σ_X terms were determined from the contact distances R_ij discussed above, assuming R_ij = σ_i + σ_j. When group i is approached by group j to a distance [...], group j displaces water from the solvation shell of i, and vice versa. If F_X is the free energy for removing all water from group X, and f_X is the actual fraction of water removed, then

[equation not legible in this copy]

The fraction f_X removed from a group is in turn given by

[equation not legible in this copy]

where v_X is the volume of water excluded by the approach of another group, v_tot is the total volume of water around group X without taking the presence of covalently bonded backbone atoms into account, and v_bb is a correction factor for the volume of water displaced from v_tot by the permanent presence of the backbone. The v_X, v_tot and v_bb terms all depend on the radius σ_X of group X, and are readily calculated by 3-dimensional geometry. The v_X term also depends, of course, on R_ij and the radius of the approaching group. Inspection of models suggested that the effect of the backbone on the hydration shell of a
COMPUTER
RIMULATIOS
OF
PROTEIS
FOLDING
29
side-cha,rn could be represented to a first approximation by a planar surface or, in practice, a sphere of diameter 100 A with a centre 98 A from the side-chain plus the length of the virtual bond linking the side-chain centre to the Ca atom. The free energies F, for removing all the water (utot - vUbb) from around a group may bo most readily obtained from thermodynamic transfer experiments in which group X is transferred from water to a non-aqueous solvent. They should then be checked (as described below) against empirical parameters derived by observations of proteins of known conformation. On the assumption that the non-aqueous solvent in the transfer experiments should resemble the interior of a globular protein (when parameters appropriate to protein stability are to be obtained), one should justify the choice of transfer experiments to be considered in defining parameters. We consider a solvent as resembling tile protein interior if (1) it has some polar character, since this is characteristic of many protein interiors, and (2) it does not generate a significant solvophobic effect (Ray, 1971). Nozaki & Tanford (1971) recommended parameters derived from transfer experiments that include ethanol or dioxane as the recipient solvent. The choice of ethanol and dioxane is supported if the transfer free energies correlate tolerably well with the distribution of side-chains between the interior and exterior of globular proteins whose structures have been determined by X-ray crystallography. The information furlction I(S = X:x; R) was thus evaluated by statistical analysis (Robson. 1974). where S = X is the state of a side-chain highly solvated on the protein surface, S = x is the state of a residue buried or mostly buried in the protein interior, and R is the type of side-chain (e.g. alanyl). The data base (25 proteins) used by Tanaka 8: Scheraga (1976) was employed. 
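The hydration-shell bookkeeping described above can be sketched numerically. The Python fragment below is only an illustration under stated assumptions: the water excluded from the shell of group i is taken to be the part of that shell lying inside the hard sphere of the approaching group j, the fraction removed is taken as v_X / (v_tot - v_bb), and the backbone correction v_bb is simply passed in as a number. The function names are hypothetical and are not taken from the original program.

```python
import math

SHELL_DEPTH = 3.7  # depth of the solvation shell in angstroms, as quoted in the text


def sphere_volume(r):
    return 4.0 / 3.0 * math.pi * r ** 3


def overlap_volume(d, a, b):
    """Volume common to two spheres of radii a and b whose centres are d apart."""
    if d >= a + b:
        return 0.0                       # spheres do not touch
    if d <= abs(a - b):
        return sphere_volume(min(a, b))  # smaller sphere lies wholly inside the larger
    return (math.pi * (a + b - d) ** 2
            * (d * d + 2.0 * d * (a + b) - 3.0 * (a - b) ** 2) / (12.0 * d))


def shell_volume(r_i, depth=SHELL_DEPTH):
    """v_tot: volume of the hydration shell around an isolated sphere of radius r_i."""
    return sphere_volume(r_i + depth) - sphere_volume(r_i)


def excluded_volume(r_ij, r_i, r_j, depth=SHELL_DEPTH):
    """v_X (assumed model): part of the shell of group i inside the sphere of group j."""
    return overlap_volume(r_ij, r_i + depth, r_j) - overlap_volume(r_ij, r_i, r_j)


def fraction_removed(r_ij, r_i, r_j, v_bb=0.0, depth=SHELL_DEPTH):
    """f_X = v_X / (v_tot - v_bb), capped at 1; v_bb is supplied by the caller."""
    available = shell_volume(r_i, depth) - v_bb
    return min(1.0, excluded_volume(r_ij, r_i, r_j, depth) / available)


def desolvation_energy(f_x, big_f_x):
    """Free energy of partial desolvation, f_X * F_X."""
    return f_x * big_f_x
```

For example, `fraction_removed(4.45, 2.11, 2.34)` evaluates the fraction for two groups in contact, with radii chosen purely for illustration as half of the V...V and L...L contact distances listed in Table 3.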
Formally, the transfer free energy should be zero when the information function is zero, and there should be a constant factor of proportionality. However, this factor need not be RT, because the average perturbing energy is not thermal but arises from environmental perturbations due to interactions within the protein. Fig. 2 shows a reasonable correlation with the transfer to ethanol and dioxane. (RT = gas constant × absolute temperature.) As suspected, the constant of proportionality between the free energy and the information was larger than RT, because interaction energies inside proteins frequently exceed RT. Approximate free energies may be obtained from the information measures by multiplying the information by 1.1. Since the transfer free energies of side-chains with a very high energy of transfer to non-aqueous solvent cannot be determined by transfer experiments, it is valuable that they can be estimated by the information analysis. The transfer free energies for the side-chains estimated in this way are listed in Table 2. The experimental F_X values include changes in van der Waals', electrostatic, and hydrogen-bonding interactions in going from water to ethanol, so that we do not add these in as a separate contribution. On the other hand, these contributions relate principally to transfer to ethanol, whereas in a folding simulation there is considerable variation in character of the groups that mutually displace solvation water, and this will be particularly important for interactions involving two polar groups. We have only adjusted the F_X value for polar side-chains on the basis of their electrostatic interactions with water molecules, using space-filling molecular models and data from the Monte Carlo calculations of water behaviour around peptides (Hagler & Moult, 1979; A. T. Hagler, D. J. Osguthorpe & B. Robson, unpublished results).
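The conversion from information to free energy described above can be illustrated with a short Python sketch. Two points are assumptions rather than statements from the text: the information is written in the usual log-odds form with natural logarithms, and the residue counts in the usage example are invented. Only the factor of 1.1 (giving kcal/mol) comes from the text.

```python
import math

KCAL_PER_UNIT_INFORMATION = 1.1   # empirical factor quoted in the text
R_KCAL = 1.987e-3                 # gas constant in kcal/(mol K)


def information(n_state, n_other, total_state, total_other):
    """Assumed log-odds form: how strongly one residue type favours a state
    (n_state versus n_other) relative to all residues (total_state versus total_other)."""
    return math.log(n_state / n_other) - math.log(total_state / total_other)


def free_energy_from_information(info):
    """Approximate free energy in kcal/mol: information multiplied by 1.1."""
    return KCAL_PER_UNIT_INFORMATION * info
```

With hypothetical counts, `free_energy_from_information(information(30, 10, 100, 100))` gives 1.1 × ln 3 kcal/mol, and an information of zero gives a free energy of zero, as required. If the information is indeed in natural units, the factor 1.1 exceeds RT at 298 K (`R_KCAL * 298`, about 0.59 kcal/mol), in line with the remark that the constant of proportionality is larger than RT.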
Some minor adjustments of parameters for the less polar and non-polar side-chains have also been made (see E* in Table 3), based on an analysis of the distribution of these side-chains between polar and non-polar regions of the interiors of proteins of known structure (cf. Crampin et al., 1978). Although there are, in our view, insufficient data in proteins of known conformation for refining free energies between specific pairs of side-chains, the refinement did conform to our philosophy of using such data as much as possible by using the data of Nicholson et al. to apply some constraints to the choice of more theoretically calculated adjustments.

FIG. 2. Experimental transfer free energies of side-chains from water to ethanol (circles) and dioxane (squares), plotted against the information measure for hydrophobicity (see the text). The regression line should go through the origin (filled circle). The side-chains are assigned their IUPAC 1-letter code. Glycine (g) refers to the backbone transfer data obtained from the difference in transfer free energy to ethanol between diglycine and triglycine (square) and glycine and diglycine (diamond). The triangle refers to the backbone transfer free energy estimated by Tanford (1970). Note that in this plot a correction was applied for the effect of residue conformation in solvation, but this did not significantly alter the data.

(iv) Molecular geometry

For the backbone, the geometry of Brant & Flory (1965) was adapted. The bond lengths, valence angles and torsion angles linking the single centres of action of the side-chains to the backbone are more problematic, since in reality the internal conformational freedom of most side-chains makes these variable quantities even for side-chains (e.g. lysyl) of the same type. By the analysis of proteins of known conformation, geometric parameters have been determined and it has been noted that these are sufficiently well-distributed round mean values to make them useful for a preliminary study of this nature (Nicholson et al., 1978). This geometry is given in Table 4, along with the deviations observed in carboxypeptidase A, chymotrypsin, cytochromes c and b5, lactate dehydrogenase, lysozyme, oxyhaemoglobin, ribonuclease and subtilisin.

(v) Treatment of disulphide bridges in PTI

Three disulphide bridges are formed from 3 pairs of sulphydryl groups in the cysteine side-chains of PTI. On the rationale that these bridges form part of the covalent structure of PTI and can be located chemically without X-ray crystallographic analysis, we have facilitated the correct pairing of the sulphydryl groups in this preliminary investigation by adding an artificial potential to the interaction between them. This closing potential has a value of zero at the equilibrium distance between the centres of cysteine side-chains linked by a disulphide bridge, and rises as the 4th power of the separation distance.
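The closing potential can be sketched as follows. This is a minimal illustration, assuming the form k (r - r_eq)^4; the force constant k and the default equilibrium distance of 4.0 Å (the C...C contact distance listed in Table 3) are placeholders, not values stated by the text for this potential. The pairing 5-55, 14-38 and 30-51 is the standard numbering of the disulphide bridges of bovine PTI and is not given in this section.

```python
import math

# Disulphide pairs of PTI by residue number (standard numbering, see lead-in).
PTI_DISULPHIDES = [(5, 55), (14, 38), (30, 51)]


def closing_potential(r, r_eq, k=1.0):
    """Zero at the equilibrium separation r_eq, rising as the 4th power away from it.
    The functional form k * (r - r_eq)**4 and the constant k are assumptions."""
    return k * (r - r_eq) ** 4


def total_closing_energy(centres, pairs=PTI_DISULPHIDES, r_eq=4.0, k=1.0):
    """Sum the closing potential over all bridged pairs.

    centres maps residue number -> (x, y, z) of the cysteine side-chain centre.
    """
    return sum(closing_potential(math.dist(centres[i], centres[j]), r_eq, k)
               for i, j in pairs)
```

A potential of this kind adds nothing when every bridged pair sits at its equilibrium separation and penalises mis-paired or distant cysteines steeply, which is how it steers the chain towards the chemically known pairing.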
(vi) Minimisation

Minimisation of γ-variables was carried out in such a way as to avoid trapping in small, local minima. We avoided the use of "normal mode thermalisation" (which implies a specific model of
OF TABLE
Free energy of hydration Side-chain
(via
energy vahxx
estimated
Transfer
by different
methods
to ethanol Transfer to dioxane (Nozaki & Tanford, 1971)
4.0
-
-
TrP Phe Tyicys
3.0 2.8 2.1 2.1
3.2 2.6 2.4
3.5 2.3 2.3 -
LeLI Met
2.0 1.8
2.2
1.4 -
Val His Pro Ax
1.4 I.1 0.4 0.3
1.5 0.5
of hydration All values
estimated are kcal/mol
- 0.3 0.4
-0.2
54513.00 23206.00 194785.00 54513.00 263746.00 617913.00 493205.00 13872.00 61277.00 684446.00 816666.00 10606~00 24496.00 1244796.00 169467.00 638149.00 869019.00 746966.00 962089.00 40476.00 300777.00 89776.00
-
0.5 -0.3
from st,atistiotll side-chain.
Interaction A
0.5 -
-
-0.1 -0.5 -0.7 -0.7 -0-Y -1.0 -1.2 -1.2
TABLE
CQ...CG Ca...NH 0.. .co c=...a c=‘...v CU...L @...I c=...s @...T @...D C=...E w.. .N 0.. .Q U=...K @...H CQ...R CQ...F O...Y ca...mP...C c=...1\1 c’a...r
Ile
Gln Thr Asn Glu LYS Ala. Ser rzsp
Free mental
2
of side-chains
Empirical information)
analysis
of proteins,
compared
with
experi-
3
energy parameters
R
E*
R*
E (5.5)
1762.61 0.00 0.00 1752.61 6618.29 13062.24 11387.83 0.00 1648.92 0.00 0.00 0.00 0.00 0.00 3311.38 0.00 16980.63 16016.69 18603.81 1258.85 6842.13 4308.42
- 0.268 0.228 0.912 - 0.268 -0.617 - 0.866 - 0.899 0.071 -0.177 3.002 2.782 0.030 0.064 2.762 -0.187 1.636 -0.961 - 0.899 ~ 1.052 --0.180 ~ 0.626 - 1.470
3.600 3.600 3.910 3.600 3.910 4.140 4.020 3.870 3.820 3.870 4.060 4.130 4.250 4.260 4.260 4.210 4.250 4.210 4.260 8.640 4.040 3.150
-0.051 0.005 0.042 - 0.051 -0.182 - 0.338 - 0.304 0.003 - 0.046 0.127 0.177 0.002 0.006 0.270 - 0.083 0.139 - 0.425 - 0.380 - 0.466 -0.037 -0.182 -0.136
TABLE 3-continued
A
B
NH...P NH...NH NH...CO NH...d NH...V NH...I, NH...1 NH...S NH...? SH...I) NH...J+: NH...N SH...Q NH...K NH...H NH...R NH...F NH...Y NH...\V NH...<’ NH...I\I XH...I’
23206.00 7873~00 1036308.00 23206.00 52064.00 !1077”,00 6821 1 m ” 10355~00 431407.00 566152M 87989940 4897984)O 76%57940 70933 1 m) 1048576m 9578!)2.00 I 1973740 lOG797~00 “7885%40 65864.00 Tl584.00 G20340
040 040 “6612.46 040 040 0.00 0~00 9 I9 I .0x 14837.15 18631.67 %50”8.“:1 36147-45 21691-13 040 0.00 0.1)0 0.00 04)O 0~00 0~00 040 0.00
0,2%X 0478 %~tiOO 0465 0~190 I aI4 o-394 2~600 2.600 3~000 :l~ooo 2~600
co...0 CO...NH co...co CO...A CO...V CO...L CO...1 co...s (‘O...T (10...1) CO...E CO...N CO...Q CO...K CO...H CO...R CO...F CO...Y co...w co...c CO...hl CO...P
194785~00 1036308~00 501791m 19478540 I78G:18~Oo 178638.00 178638.00 1817832.00 1857832.00 1786377.00 1786377~00 1857833~00 18~5783%~00 12143652.00 2143653.00 2143652.00 178638.00 178638.00 178638.00 178638.00 178638.00 69879.00
040 26612.46 040 0~00 woo II.00 0~00 38707.48 39”73.‘3 *I 0.00 0.00 :jg%T‘j.+:I < 39273.13 45315.25 45315.27 45315.27 0.00 0.00 040
t54513~00 27“06.00 -1 194785m 54513.00 “63746.00 617913.00 493205.00 13872.00 61277.00 584445.00 X1.5566.00 10506.00
1752.61
A...(‘” A...?jH A...CO A...,1 A...V A...L ;I...1 A...S A...T d...l) >\...I? A...?c
0.00 WOO 0.00
fc *
“.tioo I ~535 2~269 L!.3oti
IT*
fC ($5)
:I~520
IkIN) 0.00: 0.7:1ti o.lltl5 0~01 1 0.(11’0 ll.lIIR 0.2X6 0.441’
3.570 3.750 :i~;,itl 3.7RO 4.2GO
I~~550 (,.‘I’!I t 0~477 0.6lS II.154
3~KOO
3~600 :l~xxo 4.140 4,O”O 3~5.50 3~X”O
3.250
A.1611
O.L)L’X
‘t.z!l(l
Il.r’rlx
O~llX 0.951 0.!~73 0.x47 I.542 0416
4~550
O.(l”(i
3.640 4~04~ 3.150 :I.300 4. I x0
0.01:: lI4ltiI 04lOli
(I.1112 2~GOO 1,1X” 0.285 o~:Kbl O.!KI!)
:I~!110 3.X80 4.220 4~450
0.51 I :‘%OO
-&.1:10 4. I30
2.600 4~38ti 3.1 38 2~600 .~ 2~600 -~ R~ootl --3~000 - 3~000 0.098 0.763 0.320 2.514 1.716 0.196
4~:I:N 3.X60
1.140
4.180 4~?lGO 4.140 4.140 4. I40 4.140 4,140 4,960 3,950
I),01
o~tl-12 o~i:ifi ll.](,!i 11.(11:’ fl413!1
0.039 ~l4lX) I.004 I.0I.i I).3XX 0~3XH I~Old 1.015 I.172 1.17? l.li:! 0~03!1 0.039
4.350 3.460 3.610 4.140
0~03!3 o~o:N o.o:~o (1411.-l
:I~MMl
I)~051
04lO OVOID
4,140 4~450
0~005
1752.6 I 6618.2!) 13062.24 11387.83 040 1648.92 O~fI(l 04lll
3~tiOO
0.00
6
IbllOl
3.9 IO 4.140 4~020 4.410 :1.x20 4.410 4~5!fO 4.67(l
ll.(l11’ (I.051
(1. IX” o.:ns 0.3Il4 04)o:~ Cl~O4(i 0~1z!7 0,177 il.(IOL’
TABLE 3-continued

Interaction  A             B           E*       R*     E (5.5)
A...Q        24495.00      0.00        0.017    4.840  0.005
A...K        1244795.00    0.00        0.920    4.800  0.270
A...H        188278.00     3552.11     -0.187   4.300  -0.087
A...R        651790.00     0.00        0.529    4.750  0.142
A...F        887596.00     17221.77    -0.961   4.260  -0.429
A...Y        746965.00     15015.69    -0.899   4.210  -0.380
A...W        2139211.00    31914.35    -1.052   4.650  -0.688
A...C        40475.00      1258.85     -0.180   3.640  -0.037
A...M        300777.00     6842.13     -0.525   4.040  -0.182
A...P        89776.00      4308.42     -1.470   3.150  -0.136
V...Cα       263746.00     6618.29     -0.617   3.910  -0.182
V...NH       52064.00      0.00        0.190    4.020  0.011
V...CO       178638.00     0.00        0.334    4.330  0.039
V...A        263746.00     6618.29     -0.617   3.910  -0.182
V...V        820361.00     16374.16    -0.966   4.220  -0.413
V...L        1661147.00    28276.09    -1.214   4.450  -0.661
V...I        1335944.00    24684.02    -1.248   4.330  -0.602
V...S        73850.00      1926.11     -0.194   3.860  -0.054
V...T        367695.00     7829.41     -0.526   4.130  -0.203
V...D        1033349.00    0.00        2.100    4.290  0.224
V...E        1385240.00    0.00        1.944    4.470  0.301
V...N        229896.00     4721.64     -0.295   4.180  -0.121
V...Q        302280.00     5470.69     -0.265   4.360  -0.132
V...K        2153736.00    0.00        2.000    4.680  0.468
V...H        1008709.00    15443.80    -0.536   4.610  -0.339
V...R        923046.00     0.00        0.944    4.630  0.200
V...F        2277132.00    35787.45    -1.310   4.570  -0.798
V...Y        1965514.00    31926.56    -1.248   4.520  -0.727
V...W        5092219.00    62596.97    -1.401   4.960  -1.156
V...C        247859.00     6032.60     -0.529   3.950  -0.164
V...M        974376.00     17756.21    -0.874   4.350  -0.430
V...P        258562.00     9363.27     -1.819   3.460  -0.282
L...Cα       617913.00     13062.24    -0.865   4.140  -0.338
L...NH       90772.00      0.00        1.014    3.550  0.020
L...CO       178638.00     0.00        0.939    3.860  0.039
L...A        617913.00     13062.24    -0.865   4.140  -0.338
L...V        1661147.00    28276.09    -1.214   4.450  -0.661
L...L        3147264.00    46056.07    -1.461   4.680  -0.980
L...I        2550275.00    40344.44    -1.496   4.560  -0.904
L...S        282772.00     6199.52     -0.441   4.090  -0.163
L...T        880516.00     15935.62    -0.773   4.360  -0.385
L...D        1517375.00    0.00        8.760    3.820  0.329
L...E        1976504.00    0.00        7.540    4.000  0.429
L...N        684409.00     11969.92    -0.542   4.410  -0.284
L...Q        927494.00     14386.81    -0.513   4.590  -0.318
L...K        3131782.00    0.00        7.538    4.210  0.680
L...H        2284441.00    30222.82    -0.784   4.840  -0.596
L...R        1101399.00    0.00        2.952    4.160  0.239
L...F        4211693.00    57124.74    -1.557   4.800  -1.149
L...Y        3681346.00    51524.88    -1.495   4.750  -1.062
L...W        9009005.00    96664.28    -1.649   5.180  -1.536
L...C        605223.00     12430.19    -0.777   4.180  -0.318
L...M        1987901.00    31037.70    -1.121   4.580  -0.690
L...P        524190.00     15649.51    -2.066   3.690  -0.452
TABLE 3-continued

Interaction  A             B           E*       R*     E (5.5)
-- O-RQQ 0.394 0.511 -- 0.8Q'j 1.248 ~~ 1.496 1.530 0.476 0.808 3.643 3,248 O.tji7 0,547 3,368 -~-0.81x 1.278 1.591 1.530 1.683 0.811 1.156 2.101
4420 3.820 4.130 4.020 4.730 1t 4,560 4.440 3.970 4,240 4mo 4.270 4.290 4.470 4.480 4.720 4.430 4.680 4.630 5.070 4.060 4.460 3.570
0.015 043!b 0.304 0.602 0,904 0.82.5 (J.151 0.353 0~2.53 0.333 0.267 0.304 0.532 0.567 0.182 1.068 0.984 1.480 0.288 0.635 lb3X5
0.071 Z.600 2.600 04". d.. 0.194 0.441 0.476 0.128 0.072 I.660 1.603 0.134 0.153 1.554 0.133 1.048
I~~oo:j 0.286 I .004 O~lN13 0~054 0~163 0~151 0~010 O~OOX 0.129 0.18” lb01 8 0~029 0.2il om:j 0.165 0.227 0.190 0.394 0405 0.033 -0~oRQ
I...@ I...NH I...CO I...A l...V l...L I...1 l...S I...T L...D l...E 1.,.x I...& I...K I...H I...R I...F I...Y I...W I...C l...M l...P
493205.00 68211~00 178638.00 493205.00 1335944.00 2550275.00 2052578.00 23327600 715643.00 1166693.00 1532528.00 568051.00 780137.00 2448691.00 1903034~00 839640.00 3428165.00 2991891.00 7452099.00 486434.00 1613786.00 395813.00
11387.83 0.00 0.00 11387.83 24684.02 40344.44 35175.64 5592.29 14082.83 OdJO 0.00 10792.1% 13102.06 0.00 27146.37 0.00 50166.69 45216.20 85772.10 10902.77 27286.55 13048+8
S...C!a S...EH S...CO S...A s...v S...L H...I s...s S...T S...D S...E S...N S...Q S...K S...H S...R S...F S...Y s...w s...c S...M S...P
13872.00 210355.00 1817832.00 13872.00 73850.00 282772.00 233276.00 45572.00 37907.00 593213.00 840259.00 82663.00 132867.00 1248161.00 106754.00 761950.00 446390.00 354942.00 1160163.00 24058.00 61903.00 56352.00
0.00 91'31~68 38707.48 040 1926.11 6199.52 5592.29 0.00 040 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8973.44 7395.52 17878.74 0.00 1225.64 2787.0'
--0.476 -0.629 0.376 -~ 0.101 -~ 1.045
3.870 3.250 4.130 4.410 3.360 4,090 3.970 4.140 4,320 4.140 4.320 4.400 4.570 4.530 4,530 4.480 4.210 4.160 4.600 3.420 3.990 3.100
T...(P T...NH T.. .CO T...A T...V T...L T...T T...S T...T T...D
61277.00 431407.00 1857832.00 61277.00 367696.00 880516~00 715643.00 37907.00 48967.00 990593.00
1648.9% 14837.15 39273.23 1648.92 7829.41 15935.62 14082.83 0.00 1113.91 0.00
0.177 2.600 -m2.600 -~0.177 -0.626 0.773 0.808 0.072 0.085 I.890
:3~x”o 3.520 4.140 3.820 4.130 4.360 4.240 4~320 4.040 4.320
--0.537
0.304
lM46 0.44” 1.015 -~ 0.046 -0.203 ~13.385 0.353 woox 0~030 0~21.5
TABLE 3-continued
A
B
E*
0.00 0.00 0.00 0.00
1.792
R*
E (5.5)
4.500 4.580 4.750 4.710 4.520 4.660 4.480 4.430 4.870 3.860 4.260 3.370
0.010 0.018 0.442 - 0.056 0.237 - 0.487 -0.431 - 0.746 -0.025 -0.194 -0.185
0.294
T...E T...N T...Q T...K T...H T...R T...F T...Y T...W T...C T...M T...P
1356024.00 46681*00 8260500 2036894.00 150908.00 1091981~00 1263449.00 1061156~00 2960981.00 33828.00 400181*00 154570.00
2451.25 0.00 21077.30 18308.78 38453.86 882.28 7764.61 6057.97
0.053 0.067 1.786 - 0.096 1.054 - 0.869 - 0.807 -0.961 -0.089 - 0.433 - 1.379
D...Cα       584445.00     0.00        3.002    3.870  0.127
D...NH       565152.00     18631.67    -3.000   3.570  -0.550
D...CO       1786377.00    0.00        4.585    4.180  0.388
D...A        584445.00     0.00        0.926    4.410  0.127
D...V        1033349.00    0.00        2.100    4.290  0.224
D...L        1517375.00    0.00        8.760    3.820  0.329
D...I        1166693.00    0.00        3.643    4.090  0.253
D...S        593213.00     0.00        1.660    4.140  0.129
D...T        990593.00     0.00        1.890    4.320  0.215
D...D        2240643.00    0.00        6.271    4.140  0.487
D...E        3171396.00    0.00        6.052    4.320  0.689
D...N        1187541.00    0.00        1.921    4.400  0.258
D...Q        1757316.00    0.00        2.021    4.570  0.382
D...K        4222220.00    0.00        5.256    4.530  0.917
D...H        2680191.00    0.00        3.336    4.530  0.582
D...R        3759966.00    0.00        5.172    4.480  0.816
D...F        1855186.00    0.00        1.098    4.920  0.403
D...Y        1723578.00    0.00        8.069    3.910  0.374
D...W        3746027.00    0.00        7.299    4.310  0.813
D...C        659906.00     0.00        10.310   3.420  0.143
D...M        1408959.00    0.00        14.958   3.570  0.306
D...P        115206.00     0.00        0.322    4.140  0.025
E...Cα       815566.00     0.00        2.782    4.050  0.177
E...NH       879899.00     25028.23    -3.000   3.750  -0.713
E...CO       1786377.00    0.00        3.138    4.360  0.388
E...A        815566.00     0.00        0.902    4.590  0.177
E...V        1385240.00    0.00        1.944    4.470  0.301
E...L        1976504.00    0.00        7.540    4.000  0.429
E...I        1532528.00    0.00        3.248    4.270  0.333
E...S        840259.00     0.00        1.603    4.320  0.182
E...T        1356024.00    0.00        1.792    4.500  0.294
E...D        3171396.00    0.00        6.052    4.320  0.689
E...E        4413314.00    0.00        5.832    4.500  0.958
E...N        1626778.00    0.00        1.835    4.580  0.353
E...Q        2371421.00    0.00        1.926    4.750  0.515
E...K        5862339.00    0.00        5.139    4.710  1.273
E...H        3524369.00    0.00        3.090    4.710  0.765
E...R        5088950.00    0.00        4.911    4.660  1.105
E...F        2384063.00    0.00        1.021    5.100  0.518
E...Y        2229517.00    0.00        6.961    4.090  0.484
E...W        4663906.00    0.00        6.288    4.490  1.013
E...C        919167.00     0.00        9.050    3.600  0.200
E...M E...I’
040
s...cu S...NH N...CO N...A N...V N...L N...I N...S N...T N...I) N...E N...N N...Q N...K N...H N...H N...P N...Y N...W N...C N...M N...P
10506~00 489798.00 1857832.00 1050640 229896.00 684409.00 568051~00 82663.00 46581.00 1187541~00 1626778.00 134439.00 “12697.00 2400248.00 117399m 1392131~00 1025297~00 838326.00 2466206~00 30316~00 207553m 146927.00
16147.45 39273.23 040 4721.64 11969.92 10792~12 040 040 040 040 oa) 040 O@O 040 0.00 16544.23 13985.24 31061.70 040 3888.55 5509.53
Q...P Q...NH Q...c’O &..-A Q,..V Q...L Q...I Q...S Q...T &...I1 Q...E Q...N
::::r rQ...Y Q...W Q...C Q...M Q...l’
2449500 762579.00 1857832~00 2449500 302280.00 9274944~0 780137.00 132867.00 82605.00 1757316~00 %:~71421~00 “12697~00 329527~00 3469521.00 302650.00 “010667~00 1388441~00 1133694.00 3269619.00 5493640 256042.00 227112~00
0~00 21691.13 39273.23 0.00 5470.69 14386.81 13102.06 040 WOO 0.00 0.00 WOO 040 040 04)O 040 19932.24 16804~65 36972.4” 040 4242.90 73111.71
K...C’” K...NH K...(Y) K...A K...V K...L K...T K...S K...T
1244795m 70933la) “14365“m 1244795.00 215‘~776~00 I * 3131782.00 244869140 1248161.00 2036894M
Q...Q Q...K Q...H
0~00 0.00 45315.27 040 040 040 0.00 040 (I4lO
0~030 2.600 - 2.600 0~010 -~~0.295 .~ 0.54% 0.57T 0.134 0.053 I.921 1 .x:15 0~130 0.149 1.X08 o~oxx 1 153 O~(i3X 0.5ii 0.730 0.245 0~20”
1~148 0~054 2.600 L'.(ioo 0.01 7 0~265 0.5 I 3 .~ 0.545
0.1 .53 0467 242 I I ~!I”6 0.149 0.1 69 I~910 0.112 I.212 0~609 -~ 0.547 - o.ioo 0.296 -~ cl.1 i3 I.1 IX 2.752 I.535 3400 0.920 2400 7~538 :I 3 6 8 1.55-4 I.786
4.130 3.570 4.140 4.6iO 4.180 4,410 4.290 4,400 4,580 4.400 4~580 +fifM 4~x30 4.790 4.790 4.740 4.530 4.480 4.920 3.680 4.31fb :1.4L’o 4.250 3.7 3 I 4,140 4.840 4.360 4.590 1.470 4.570 4.7.50 4~570 4~i60 4~840 5~000 I.960 4.!JfiO ‘4~910 4.7 IO 4.660 5. I00 3.x50 -L.490 3~600 4.250 4-260 -4.140 4.x00 4.680 4.210 4.480 4.630 4.711)
0~002 0.4i7 1,015 0~002 0~121 0.284 0.267 (I4)IX O~lllC, O.“r,X Il.‘lj’(> 1 , 114m (I~(l46 0.52 I IJ~O2.5 o.:1oL’ o.:si,j ().:(“:I (1.5X7 ~I~007 1bO!J;, 0, I67 0~IJ05 0.61 x I~015 0~005 0. 1x2 0.3 I8 0.:(04 0~029
0~01 8 0.:1x:! 0.51 .i (I446 04)7:! 0.753 0.034 0.137
0.41’1 0.36 I 0~626 lb1ll 2 O~O!)X
TABLE 3-continued

Interaction  A             B           E*       R*     E (5.5)
K...D        4222220.00    0.00        5.256    4.530  0.917
K...E        5862339.00    0.00        5.139    4.710  1.273
K...N        2400248.00    0.00        1.808    4.790  0.521
K...Q        3469521.00    0.00        1.910    4.960  0.753
K...K        7658015.00    0.00        4.533    4.920  1.663
K...H        5210131.00    0.00        3.084    4.920  1.131
K...R        6955642.00    0.00        4.514    4.870  1.510
K...F        3808033.00    0.00        1.135    5.310  0.827
K...Y        3539220.00    0.00        7.042    4.300  0.768
K...W        7456271.00    0.00        6.663    4.700  1.619
K...C        1391142.00    0.00        8.223    3.810  0.302
K...M        2873221.00    0.00        11.998   3.960  0.624
K...P        291999.00     0.00        0.817    4.140  0.063
H...Cα       169467.00     3311.38     -0.187   4.250  -0.083
H...NH       1048576.00    0.00        2.269    4.260  0.228
H...CO       2143652.00    45315.27    -3.000   4.140  -1.172
H...A        188278.00     3552.11     -0.187   4.300  -0.087
H...V        1008709.00    15443.80    -0.536   4.610  -0.339
H...L        2284441.00    30222.82    -0.784   4.840  -0.596
H...I        1903034.00    27146.37    -0.818   4.720  -0.567
H...S        106754.00     0.00        0.133    4.530  0.023
H...T        150908.00     2451.25     -0.096   4.520  -0.056
H...D        2680191.00    0.00        3.336    4.530  0.582
H...E        3524369.00    0.00        3.090    4.710  0.765
H...N        117399.00     0.00        0.088    4.790  0.025
H...Q        202650.00     0.00        0.112    4.960  0.044
H...K        5210131.00    0.00        3.084    4.920  1.131
H...H        414979.00     4979.75     -0.106   5.000  -0.090
H...R        2710350.00    0.00        1.759    4.870  0.589
H...F        3195706.00    39283.76    -0.879   4.960  -0.725
H...Y        2712985.00    34379.07    -0.818   4.910  -0.653
H...W        6974982.00    68323.96    -0.971   5.350  -0.954
H...C        108523.00     1991.34     -0.099   4.340  -0.048
H...M        1071275.00    15088.89    -0.443   4.740  -0.312
H...P        516266.00     13570.08    -1.389   3.850  -0.378
R...Cα       638149.00     0.00        1.536    4.210  0.139
R...NH       957892.00     0.00        2.306    4.210  0.208
R...CO       2143652.00    45315.27    -3.000   4.140  -1.172
R...A        651790.00     0.00        0.529    4.750  0.142
R...V        923046.00     0.00        0.944    4.630  0.200
R...L        1101399.00    0.00        2.952    4.160  0.239
R...I        839640.00     0.00        1.278    4.430  0.182
R...S        761950.00     0.00        1.048    4.480  0.166
R...T        1091981.00    0.00        1.054    4.660  0.237
R...D        3759966.00    0.00        5.172    4.480  0.816
R...E        5088950.00    0.00        4.911    4.660  1.105
R...N        1392131.00    0.00        1.153    4.740  0.302
R...Q        2010667.00    0.00        1.212    4.910  0.437
R...K        6955642.00    0.00        4.514    4.870  1.510
R...H        2710350.00    0.00        1.759    4.870  0.589
R...R        5219170.00    0.00        3.717    4.820  1.133
R...F        1213761.00    0.00        0.394    5.260  0.264
R...Y        1202010.00    0.00        2.657    4.250  0.261
R...W        2103408.00    0.00        2.069    4.650  0.457
13. ROBSON
38
AND TABLE
A R...C R...M R...P
753202.00 1292028~00 35354.00
D.
J.
OHGUTHOKPIC
3--continued E*
fI
5.014 6.049 0.099
3.760 3.910 4.140
mmo.9til 0.118 om8 0.!)61 1.310 .1.557 1.592 0.537 0.869 1+98 1~021 0.63X 0.609 I .135 0,879 0.394 1.653 I.591 I.744 0.873 I.217 2.162
4.250 4.650 4.960 4.260 4.570 4~800 4.680 4.210 4.480 4.920 6~100 4.530 4.710 5.310 4.960 5.260 4.920 4.870 5,310 4.300 4~700 3.X10
0.00 0~00 WOO
I?...@ P...NH P...CO P...A F...V F...L F...l I?...8 B...T P...D B...E P...S F...Q F...K P...H F...K F...F F...Y F...\V F...C F...M F...P
869019.00 119737.00 178638.00 887596.00 2277132.00 4211693.00 3428165.00 446390.00 1263449.00 1855186~00 2384063.00 1026297.00 1388441.00 3808033.00 3195706.00 1213761.00 5583299.00 4903191.00 11709469.00 877052.00 P723221~00 731554.00
Y...CQ Y...NH Y...CO Y...A Y...V Y...L Y...I Y...S Y...T Y...J) Y...E Y...N Y...Q Y...K Y...H Y...R Y...F Y...Y Y...W Y...C Y...M Y...P
746965.00 106797.00 178638.00 746965.00 1965514.00 3681346.00 2991891.00 354942.00 1061156~00 1723578.00 2229517.00 838326.00 1133694.00 3539220.00 2712985.00 1202010~00 4903191.00 4295404.00 10374208.00 733724.00 1348165~00 631055.00
15015.69 0.00 0.00 15015.69 31926.56 51524.88 45216.20 7396.52 18308.78 0.00 0~00 13985.24 16804.65 0.00 34379.07 0.00 63677.08 57537.92 106927.39 14336.97 35031.68 17807.18
W...(>” W...NH W...CO W...A W...V W...L W...I W...S
952089.00 278852.00 178638.00 2139211.00 5092219.00 9009005~00 7452099.00 1160163~00
18603.81 om woo 31914.X 62596.97 96664.28 85772.10 17878.74
16980~6:1 0.00 0~00 17221.77 35787.45 57124.74 50166.69 8973.44 21077.30 0.00 0~00 16544.23 19932.24 0m 39283.76 om 70321~29 63677.08 117312.67 16546.69 39344.18 19840.96
/I’*
~ ~ -
0,899 0.95’ 0.763 wu99 1.248 1.495 1.530 0.476 0.807 8.069 6.961 0.577 0.547 7.042 0.818 2,657
- 1.691 ~- 1.530 - 1.683 mm~0.811 1.155 2.101 ~~ I.052 0.97? , .
0.320 1.05t’ I.401 1.64!) ~~ 1.683 -~O.B”!J
4,210 3.640 3.950 4.210 4.620 4.750 4.630 4.160 4.430 3.910 4.090 4.480 4.660 4.300 4.910 4.250 4.870 4.820 3.260 4.250 4.650 X.760 4.250 4.040 4.350 4.650 4.960 5. I90 5.070 1.600
R (5.5) 0.164 0.281 ems ~- 0.4L’5 0.026 0.039 0.429 -~ (I.798 I.149 1~068
().-““7 0.48i 0.40:1 0.518 0.375 0,419 0.817 0.71’5 0.264 1.328 1.236 I.696 0.4oi 0.831) I).*5*5X 0.380 0~01:l om9 0.3x0 0.7”i 1 .06:!
W984 0.190 --0.451 0.374 0.4x4 0.323 0.36 1 0.768 (M53 0.261 1 .“‘lfi _* I I46 1.610 0.35!) (I.756 0.506 0.465 0.06 1 0~039
0.688 1.156 1.536 I .4x0 0.3!)4
TABLE 3-continued

Interaction  A             B           E*       R*     E (5.5)
W...T        2960981.00    38453.86    -0.961   4.870  -0.746
W...D        3746027.00    0.00        7.299    4.310  0.813
W...E        4663906.00    0.00        6.288    4.490  1.013
W...N        2466206.00    31061.70    -0.730   4.920  -0.587
W...Q        3269619.00    36972.42    -0.700   5.100  -0.626
W...K        7456271.00    0.00        6.663    4.700  1.619
W...H        6974982.00    68323.96    -0.971   5.350  -0.954
W...R        2103408.00    0.00        2.069    4.650  0.457
W...F        11709469.00   117312.67   -1.744   5.310  -1.696
W...Y        10374208.00   106927.39   -1.683   5.260  -1.610
W...W        23325827.00   188931.23   -1.836   5.700  -1.760
W...C        2117467.00    30788.56    -0.964   4.690  -0.652
W...M        6001475.00    68264.66    -1.308   5.090  -1.163
W...P        1833269.00    37116.73    -2.254   4.200  -0.943
C...Cα       40475.00      1258.85     -0.180   3.640  -0.037
C...NH       25864.00      0.00        0.847    3.150  0.006
C...CO       178638.00     0.00        2.514    3.460  0.039
C...A        40475.00      1258.85     -0.180   3.640  -0.037
C...V        247859.00     6032.60     -0.529   3.950  -0.164
C...L        605223.00     12430.19    -0.777   4.180  -0.318
C...I        486434.00     10902.77    -0.811   4.060  -0.288
C...S        24058.00      0.00        0.376    3.420  0.005
C...T        33828.00      882.28      -0.089   3.860  -0.025
C...D        659906.00     0.00        10.310   3.420  0.143
C...E        919167.00     0.00        9.050    3.600  0.200
C...N        30316.00      0.00        0.245    3.680  0.007
C...Q        54936.00      0.00        0.296    3.850  0.012
C...K        1391142.00    0.00        8.223    3.810  0.302
C...H        108523.00     1991.34     -0.099   4.340  -0.048
C...R        753202.00     0.00        5.014    3.760  0.164
C...F        877052.00     16546.69    -0.873   4.300  -0.407
C...Y        733724.00     14336.97    -0.811   4.250  -0.359
C...W        2117467.00    30788.56    -0.964   4.690  -0.652
C...C        10485760.00   245760.00   -20.000  4.000  -6.602
C...M        273542.00     6041.35     -0.437   4.080  -0.159
C...P        94554.00      4369.16     -1.382   3.190  -0.137
M...CU M...NH M...CO M...A M...V M...L M...I M...S M...T M...l) M...E M...N M...Q M...K M...H M...R M...F M...Y
300777.00 71584.00 178638*00 300777*00 974376900 1987901.00 1613786.00 51903.00 400181.00 1408969.00 1873396.00 207663.00 266042.00 2873221.00 1071275.00 1292028.00 2723221.00 2348166.00
6842.13 0.00 0.00 6842.13 17766.21 31037.70 27286.66 1226.64 7764.61 0.00 0.00 3888.65 4242.90 0.00 15088.89 0.00 39344.18 36031.68
- 0.626 1.642 1.716 - 0.526 - 0.874 - 1.121 - 1.166 -0.101 ~ 0.433 14.968 12.776 - 0.202 -0.173 11*998 - 0.443 6.049 - 1.217 - 1.165
4.040 3.300 3.610 4.040 4.360 4.580 4.460 3.990 4.260 3.570 3.760 4.310 4,490 3.960 4.740 3,910 4.700 4.660
-0.182 0.016 0.039 -0.182 - 0.430 -0.690 - 0.636 - 0.033 - 0.194 0.306 0.407 - 0.095 -0*098 0.624 -0.312 0.281 - 0.830 -0.756
,\I...\$ iN...c Bl...N 1\1...1’
P...CU P...NH P...CO 1’....1 I’...\ l’...L I’...1 P...S E’...T P...T) P...E: P...N P...Q l’...K P...H P...R I’...F I’ Y P...W P...C P...M P...P
6ool475w “7354”m 1135066~00 341961.00 X9776~00 6203.00 69879m X9i76.00
258562.00 524190~00 :I95813~00 55352.00 154570~00 115206~00 166502~00 146927.00 L’27112~00 L’91999Gl 516266.00 :v354~00 731554.00 63105590 183326’160 94554.00 :141961TK~ 40747~00
68264.66
5.090
I.163
6041.35 18935.57 11086~23
4.086 4.480 3’590
0. 159 0.438 0.326
4308.4” 0.00 040 4308.4L’ *, 9163.27 15649.51 13048.98 2787.02 6057.97 040 040 5509.53 7301~71 040 13570~08 0~00 19840.96 17807.1X 37116~iR 4X69.16 11086.23 3105”‘5
- I .470 0.016 0~196 I.470
~-~ 1 .X1!) 2466 ~2.101 I.047 l.Ri!) 0.322 0.440 1.14x 1.1 IX 0.817 1.380 0~099 2.16“ 2.101 2.254 I ,:3x:! 1.726 i.(j7”
3-150 4,180 1.140 3.150 :j.-ltio 3.690 X.570
:1.100 3.370 4.140 1.140 3.420 3~600 4.140 3.850 4.140 :S 8 1 0 3’ 7 60
4~200 :i, 190 3.590 L’.700
Energy parameters (see the text) for interactions between the 22 groups (19 side-chain types and CαH or CαH2, and NH, CO backbone groups). For a description of parameters A and B, see the text. R* is the contact distance of the pairwise interaction, and E* is the energy at this distance. E (5.5) refers to the energy at a distance of 5.5 Å, the mean distance between side-chain centres found in proteins of known conformation by Crampin et al. (1978). The parameters for side-chain interactions were calculated from Table 2, with some refinement for non-additive effects dependent on the nature of both groups (see the text). In general, however, interactions were not imposed to be favourable because of possible charge or hydrogen-bond interactions, because these cannot be guaranteed to occur in proteins between any specified pair of groups. On the other hand, such interactions between a side-chain and a backbone in its vicinity are frequently observed and were taken into account for side-chain-backbone contacts.
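The relation between the columns of Table 3 can be checked with a few lines of Python. The functional form used here, E(r) = A/r^9 - B/r^6, is an assumption: it is not stated in this section, but it reproduces the tabulated E*, R* and E (5.5) entries of the rows tried, for instance the Cα...Cα row (A = 54513.00, B = 1752.61, R* = 3.600). The function names are illustrative.

```python
def pair_energy(r, a, b):
    """Assumed 9-6 form: E(r) = A / r**9 - B / r**6 (r in angstroms)."""
    return a / r ** 9 - b / r ** 6


def minimum_distance(a, b):
    """Distance at which the assumed 9-6 energy is lowest (requires B > 0).
    Setting dE/dr = 0 gives r**3 = 9A / (6B) = 1.5 A / B."""
    return (1.5 * a / b) ** (1.0 / 3.0)


# Cα...Cα row of Table 3
A_CA, B_CA = 54513.00, 1752.61
e_star = pair_energy(3.6, A_CA, B_CA)    # about -0.268, the tabulated E*
e_55 = pair_energy(5.5, A_CA, B_CA)      # about -0.051, the tabulated E (5.5)
r_star = minimum_distance(A_CA, B_CA)    # about 3.600, the tabulated R*
```

For rows with B = 0 the assumed form is purely repulsive and has no minimum, so R* in those rows is a contact distance rather than the position of an energy well.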
thermal effects) and the apparently arbitrary occasional use of "holding" and "pushing" potentials (Levitt, 1976). This was done by alternately and repeatedly invoking a SIMPLEX minimiser, which readily escapes from shallow minima, and a Davidon minimiser, in order to efficiently locate the minimum energy conformation of deep minima. As programmed, the SIMPLEX and Davidon procedures may be considered as 2 states of the overall minimisation process, the state being switched when the currently active procedure attains a specified degree of convergence. In practice, numerous small minima are encountered and so the SIMPLEX procedure is most often active. When a required degree of convergence was obtained by the Davidon procedure, and the SIMPLEX procedure could not find an escape route, the simulation was held to have reached a deep minimum in which the protein was likely to be trapped for a significant period of time. Typically, this corresponded to a minimum surrounded by a wall of at least 5 kcal. In preliminary studies, the minimiser was then switched to handle the φ and ψ variables individually for the first 26 residues, the last 25 residues, and the middle 25 residues of PTI. No new minima were reached by this procedure.
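The two-state switching logic described above can be sketched in Python. This is not the original program: a random-displacement search stands in for the SIMPLEX minimiser and a finite-difference steepest descent stands in for the Davidon minimiser, and all function names, tolerances and step sizes are hypothetical. The sketch shows only the control flow, in which a gradient stage locates the bottom of the current minimum and an exploratory stage then looks for an escape route.

```python
import random


def explore(f, x, step, trials, rng):
    """Derivative-free stage (stand-in for SIMPLEX): try random displacements
    of up to `step` per coordinate and keep any that lower the energy."""
    fx = f(x)
    for _ in range(trials):
        y = [xi + rng.uniform(-step, step) for xi in x]
        fy = f(y)
        if fy < fx:
            x, fx = y, fy
    return x, fx


def descend(f, x, tol=1e-10, h=1e-6, max_iter=10000):
    """Gradient stage (stand-in for Davidon): steepest descent with a
    forward-difference gradient and step halving."""
    fx = f(x)
    for _ in range(max_iter):
        grad = [(f(x[:i] + [x[i] + h] + x[i + 1:]) - fx) / h for i in range(len(x))]
        t = 1.0
        while t > 1e-12:
            y = [xi - t * gi for xi, gi in zip(x, grad)]
            fy = f(y)
            if fy < fx:
                break
            t *= 0.5
        else:
            return x, fx                 # no downhill step found: converged
        converged = fx - fy < tol
        x, fx = y, fy
        if converged:
            return x, fx
    return x, fx


def alternate(f, x0, step=0.5, trials=200, escape_tol=1e-6, max_cycles=50, seed=0):
    """Switch between the two stages until the exploratory stage can no longer
    find a lower energy than the converged gradient stage."""
    rng = random.Random(seed)
    x, fx = list(x0), f(list(x0))
    for _ in range(max_cycles):
        x, fx = descend(f, x)
        y, fy = explore(f, x, step, trials, rng)
        if fx - fy < escape_tol:
            return x, fx                 # no escape route: treat as a deep minimum
        x, fx = y, fy
    return x, fx
```

On a one-dimensional double well such as `lambda x: (x[0] ** 2 - 1.0) ** 2 + 0.2 * x[0]`, starting at x = 1, small exploratory moves leave the search in the shallower well, whereas larger moves let it cross the barrier into the deeper one. That is the behaviour the alternation is meant to provide.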
TABLE Geom,etry linking Bond
Ala -b2 ASll Asp cys Gln GlU His Ile LeLl Lys Met, Phe R-0 SW Thr Try TY~ Val
length
1.33-l.54P1.82 3.40-4.25-4.71 1.65-2.47-2.68 2.32-2.48-2.63 2.02-2.34-2.90 2.47-3.12-3.51 2.25-3.12-3.50 2.98 - 3.20& 1.71-2.28-3.75 2.19-2.60-2.85 %.673.542.34-2.95-3.28 3.063.41~~ 1.59P1.84--2.09 1.67 - 1.911~75-1~93-2~06 3.17P3.87P4.34 3.45-3.78-5.09 1.811.99--2.10
Bond
4.99 3.6% 2.15
centres to the backbone? angle
0 (“) 5
79-109-127 * < 75P11:SP154 59-116-153 75-111-154 85-111-145 81~115~154 65~111~152 70-111~162 49-110P148 81--113P152 53-115-157 66-ill-69-113-174 5464-81-lo9P144 8410876~115~170 72p 106~go--114.-137
4.35
4
the side-chain
I (A)$
*
140 75 126 163
Torsion
angle
dp
(“) 1~
95-121P138 60-134-194 83-139-185 82-139-181 82-145-185 74p138~-180 95-144-192 84-145-209 9513Op 163 99-144p~l78 70-140-213 102-142p 187 78-152-215 . I YO- l%OP~ 139 88p124158 97p129--156 72p144206 68& 14SP~191 97 -~ 132157
Observed geometries linking side-chain centres to the backbone in ribonuclease, lysozyme, cytochromes c and b5, chymotrypsin, carboxypeptidase A, subtilisin, lactate dehydrogenase, and oxyhaemoglobin, given in the format "lowest value-mean value-highest value". Distributions are approximately symmetrical or skewed Gaussian. The bond lengths for alanine (with a single methyl group side-chain) presumably reflect errors in co-ordinates rather than true flexibility.
† Backbone geometry used in the simulations is as follows: C'-N 1.33 Å; Cα-C' 1.52 Å; N-Cα 1.44 Å; Cα-C'-N 116°; N-Cα-C' 111°; C'-N-Cα 122°; Cα-C'-N-Cα 180°; N-Cα-C'-N ψ; C'-N-Cα-C' φ.
‡ l, Bond distance Cα-R.
§ θ, Bond angle N-Cα-R.
‖ Δρ, Torsion angle, the difference between ρ(C'-N-Cα-C') and ρ(C'-N-Cα-R).
3. Results

(a) Statistical predictions of y-values of PTI
The use of statistical (information theory) methods, based on analysis of local interactions between residues close together in the amino acid sequence, is widely held to lead to predictions of backbone conformation that relate to the earliest stages of folding. Predictions of the residue conformations may also be made in terms of y. Initially, a prediction of H, E, T and C states is made according to the directional information method tested by Garnier et al. (1978). E and H states are then assigned y = 0° and y = +85°, respectively. State T is assigned y = +63°, which lies close to the y value for the predominant type of reverse turn in the 26 proteins studied. The y assignment for state C is dependent on the type of residue (e.g. alanine, valine), and is made according to the y value corresponding to the φ and ψ values most favoured by the residue in nine proteins (see Table 1 of Robson & Pain, 1974). The conformation of proline is treated specially in deriving y, the favoured φ and ψ values for proline corresponding closest to the predicted H, E, T or C state being used in its calculation (see Table 1 of Robson & Pain, 1974).
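The assignment rule just described can be sketched as a simple lookup. The values for states E, H and T are those quoted in the text; the residue-specific coil values are not reproduced here, so `coil_y` is a hypothetical table that the caller must supply, and the special treatment of proline is omitted from this sketch.

```python
# y values (degrees) quoted in the text for the three fixed states.
STATE_Y = {"E": 0.0, "H": 85.0, "T": 63.0}

def assign_y(residues, states, coil_y):
    """Return an initial y value for each residue.

    residues : list of residue names, e.g. ["ALA", "VAL"]
    states   : string of predicted states, one of H, E, T, C per residue
    coil_y   : hypothetical dict mapping residue name -> y for state C
               (in the paper these come from favoured phi/psi values)
    """
    if len(residues) != len(states):
        raise ValueError("one predicted state is needed per residue")
    y_values = []
    for name, state in zip(residues, states):
        if state == "C":
            y_values.append(coil_y[name])   # residue-dependent coil value
        else:
            y_values.append(STATE_Y[state])
    return y_values
```

For example, `assign_y(["ALA", "VAL", "ALA"], "HET", {})` needs no coil table because no residue is predicted as C.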
FIG. 3. The secondary structure (local backbone conformation) of PTI protein, represented by plotting y (see Fig. 1 and the text) as a function of residue number. Filled circles, observed y values. Open circles, initially predicted values. Arrows, large changes (more than 45°) from the initially predicted y values when the tertiary (overall) structure is calculated via minimisation of the energy of the predicted secondary structure. Despite the fact that these changes are often to the detriment of the predicted secondary structure, the r.m.s. fit to the observed falls from 26.6 Å to 6.0 Å after minimisation. y for residues 38 to 46 fell to -45° early in the fold.
The resulting predicted y values of PTI are compared with the observed conformation in Figure 3, which also illustrates the convenience of the y notation. There is, in general, good agreement for α-helix and extended chain, except that one short section of α-helix (residues 4 to 6) is predicted as extended chain, while the major α-helix (residues 48 to 55) is somewhat shortened. Further, many "coil" regions (not α-helix, pleated sheet or turn) are predicted with y values close to the observed. In obtaining the statistical data and in employing the rules discussed above, no account was taken of the conformation of PTI, except to assume that PTI belonged neither to the helix-rich nor to the pleated sheet-rich class of proteins as defined by Garnier et al. (1978).

(b) Behaviour of α-helical regions during folding of PTI
The y representation allows a convenient representation of the time-course of protein folding. However, since our simulation is not a molecular dynamics procedure (which would be enormously expensive in terms of computer time), the time-course of folding is represented by sampling conformations after equal numbers of minimisation steps. Figure 4 shows the changes in α-helical regions contoured around a region of y
FIG. 4. The "time-course" of changes in secondary structure of the protein PTI during the minimisation of overall energy, starting from the initial prediction of y values (see Figs 1 and 3). The 2-dimensional surface of y as a function of residue number and "time" is contoured at 45° intervals. The "time" actually represents an arbitrary periodic sampling of the value of the y variables during the minimisation. The following represent conformations within the time-course of the simulation. Black regions, y values within 22.5° of a near-ideal right-hand α-helical conformation (y = +90°). Densely hatched regions, y values within 22.5° to 45° of the near-ideal right-hand α-helical conformation. Medium hatched, y values within 22.5° of the most extended conformation (y = 0°). Sparsely hatched regions, y values within 22.5° to 45° of the most extended conformation. Reading up the diagram describes a protein conformation at any moment of "time", the conformation being represented by a list of y values for each and every residue. Reading along the diagram shows how the y value of a selected residue changes with "time". Marginal symbols indicate the observed α-helical regions and the observed extended chain regions. Note that, apart from some distortions, most changes responsible for overall tertiary structure formation occur as an accumulation of usually small changes in y in the regions that are not α or β. These correspond to hinge-points as described in the text. Note that residues 3 to 6 are not assigned by the initial prediction, but adopt the correct α-helical conformation very early in the simulation.
space (y = 45° to 135°) in the α-helical region. The major observed α-helical region (residues 48 to 55) was initially predicted to be shorter (residues 48 to 52) and this shortened α-helix persisted to the end of the simulation with the loss of the last residue. Some distortions of up to about 30° from the classical φ-ψ angles for right-hand α-helix occurred. Residue 52 became severely distorted and cannot be regarded
as α-helical, although lying in the α-helical quadrant of y space. The short distorted α-helix in the middle of the PTI chain (residues 25 to 27) is generally described as a series of reverse turns of the α-helical type III (Crawford et al., 1973). This was predicted correctly in the starting conformation, although during the simulation residue 26 became distorted, so that residues 25 to 26 approximated a type II' (Crawford et al., 1973) reverse turn. The simulation demonstrates that α-helices observed in the native structure do not necessarily have to be predicted in the starting conformation: the turn of α-helix observed at the beginning of the chain (residues 4 to 6) was predicted to be extended chain in the starting conformation but became α-helical early in the simulation. Many of the α-helical distortions seen in the simulation are not reflected in the observed structure, though only residues 25 and 52 showed serious departure from classical α-helical conformations. This can be improved by refining the parameters further to avoid clashes between α-helical side-chains, or by neglecting local side-chain interactions already accounted for in preliminary, more exact calculations of local interactions. However, since we are here concerned with simplicity, such refinements will be considered elsewhere.

(c) Behaviour of β-pleated sheet regions during the folding of PTI
In the native structure of PTI, two extended regions, comprising residues 14 to 24 and 29 to 35, come together to form an antiparallel pleated sheet arrangement. The initial prediction gives residues 14 to 23 and 29 to 35 as extended chain and also predicts that the intervening residues (24 to 28) are of the reverse turn type. However, neither the φ-ψ nor the y values for these turn residues in the starting conformation bring the two extended regions together to form an antiparallel pleated sheet arrangement, i.e. the observed structure, until minimisation is carried out. During the minimisation residues 14, 18, 23 and 29 became kinked rather than extended, with the consequence that the extended region (residues 14 to 24) is now bent 90°, as in the native protein. Regions 19 to 22 and 30 to 35 come together to form an untwisted antiparallel pleated sheet, and this persists to the end of the minimisation. The observed twist in the pleated sheet has not been reproduced. In the native structure, residues 20 to 25 lie close to residues 1 to 12, so providing a very distorted antiparallel chain arrangement. In the simulation, the first section of the chain (residues 1 to 12) needs to be folded against the rest of the molecule, and this also has not been satisfactorily reproduced. This is precisely because the pleated sheet formed by residues 19 to 22 and 30 to 35 is too "classical", and needs to be twisted to receive residues 6 to 12 correctly. The ultimate cause of this may be the failure to predict residues 52 to 55 as α-helical, since these would stabilise the twisted sheet by a helix-sheet contact.
The problem of describing overall agreement between simulated folds and the observed native structure is complex, and has not yet been resolved. We here employ three principal methods of description. First, comparison of stereo pictures for the simulated and observed structures (Fig. 5(a) and (b)) reveals any general similarity in the backbone arrangement and correct threading of the backbone, since knotted structures are possible. Second, an Ooi plot (Ooi & Nishikawa, 1973) may be employed. This is a table of distances between all Cα atoms, which is capable of representing a three-dimensional chain topology in two dimensions, and which may be
FIG. 5. Stereo pictures of PTI in the native (a) and calculated (b) conformations. The latter simulation was begun in the predicted secondary structure conformation. The energy of the final conformation is -190 kcal mol⁻¹ on slightly relaxing the structure without the contribution from the disulphide closing potential (the protein conformational entropy contribution is not calculated), and the energy of the starting conformation was 52 × 10³ kcal mol⁻¹ without the disulphide closing potential.
contoured to show separations within certain distances (Fig. 6(a) and (b)). Third, the root-mean-square deviation between calculated and observed structures may be expressed via a function (here called r.m.s., as used by Levitt & Warshel, 1975), except that we are interested in the comparison of backbone topography and measure distances between the Cα atoms rather than the side-chains. The simulation gives r.m.s. = 6.0 Å. A theoretical lower limit of r.m.s. with the use of the y variable may be determined by minimising r.m.s. as a function of the simulated conformation, and is found to be r.m.s.ₘᵢₙ = 1.1 Å. An upper limit could be defined by replacing the calculated
FIG. 6. Ooi plots (see the text) of the native (a) and calculated (b) conformations, corresponding to the stereo pictures in Fig. 5(a) and (b). Both axes show residue number.
structure by a near-extended chain with y = 0, and is found to give r.m.s.ₘₐₓ = 71.2 Å. An even better "upper limit", in the sense that no structure generated by a simulation would resemble a fully extended chain, is to define r.m.s.ₘₐₓ as the average r.m.s. between a set of random chains and the native structure. This r.m.s. of 23.6 Å was determined by generating PTI structures by the Monte Carlo procedure of Premilat & Hermans (1975), not coupling φ and ψ, and calculating statistical weights from energies based only on interactions within dipeptide units. Using r.m.s.ₘₐₓ = 23.6 Å, a fairer measure of "native-like" character is given by:

$$N = \frac{\mathrm{r.m.s.}_{\max} - \mathrm{r.m.s.}_{\mathrm{calc}}}{\mathrm{r.m.s.}_{\max} - \mathrm{r.m.s.}_{\min}} \times 100,$$

which gives 78% for the simulation. A bad simulation could give N < 0, which would imply that it was worse than a random choice of tertiary structure.
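As a check on the arithmetic, the measure N can be evaluated directly from the limits quoted above (r.m.s.ₘₐₓ = 23.6 Å from random chains, r.m.s.ₘᵢₙ = 1.1 Å). The function name and argument names below are illustrative only.

```python
def native_likeness(rms_calc, rms_max=23.6, rms_min=1.1):
    """Percentage 'native-like' character N for a calculated structure.

    rms_calc : r.m.s. between the calculated and observed structures (Å)
    rms_max  : r.m.s. expected for random chains (Å), quoted in the text
    rms_min  : lowest r.m.s. attainable with the y variable (Å), quoted in the text
    """
    return (rms_max - rms_calc) / (rms_max - rms_min) * 100.0

n_predicted_start = native_likeness(6.0)   # fold from the predicted conformation
n_extended_start = native_likeness(7.0)    # fold from the near-extended chain
```

With r.m.s. = 6.0 Å the ratio is 17.6/22.5, which rounds to the 78% quoted; a structure no better than a random chain gives N = 0, and any r.m.s. above 23.6 Å gives a negative N.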
Of the methods considered above, r.m.s. or N seems the most objective and comprehensive measure of success. N still overemphasises crude topological agreement, which many crystallographers would consider unsatisfactory, but provides a better comparison between structures not in close agreement. We note, however, that compact structures made by assigning the same attractive potential to all non-glycyl side-chains yield N ≈ 50%. Hence simulations must do considerably better than N = 50% in order to be counted as a significant success, and even the simulated structure reported here can only be considered as being in crude topological agreement with the observed conformation. Similar observations have been made for other simulations and numerical estimates of success (M. Levitt, personal communication).
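The Ooi plot used as the second method of description is simply the table of distances between all pairs of Cα atoms, contoured at chosen separations. A minimal sketch follows; the coordinates and the cut-off value are hypothetical, and a real plot would use the Cα co-ordinates of the native or calculated structure.

```python
import math

def ooi_table(ca_coords):
    """Symmetric table of distances (Å) between all pairs of Cα atoms."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def contour(table, cutoff):
    """Mark residue pairs whose Cα atoms lie within the cut-off distance."""
    return [[d <= cutoff for d in row] for row in table]

# Hypothetical three-residue fragment laid out along a straight line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
table = ooi_table(coords)
close_pairs = contour(table, cutoff=5.0)
```

Because the table depends only on internal distances, two structures can be compared without first superimposing them, which is why it is convenient for judging chain topology.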
(e) Effect of folding from other starting conformations, and using different functions
Despite the fully automatic nature of the simulation carried out here, there is always the danger that the results are sensitive to the calculated energy surface,
and the starting point chosen in the energy surface. Further studies indicate that this is not the case. Minor changes in the starting conformation lead to similar results, providing the C-terminal helix (residues 48 to 55) is started in the observed conformation. The most radically different starting conformation tested was a near-extended chain (all y = 0). Even here, a comparable degree of success in terms of r.m.s. (7.0) and N (74%) was obtained. In all cases the pleated sheet features formed, although with differing distortions from the observed pleated sheet configurations. Considerable changes in the functions, at least for side-chain interactions (which have not been considered by many workers), lead to virtually identical results. For example, halving all the side-chain-side-chain A terms (which reduces R by 20% and quadruples the depth of the energy well in a pairwise interaction) leads to almost identical final conformations. The fold from the predicted conformation produced stereoscopic pictures and Ooi plots that were not readily distinguishable from those achieved with the functions described above, but for which r.m.s. = 5.6 and N = 80%. The most successful refinement gave r.m.s. = 5.26. The folding simulations were also not sensitive to drastic reductions in the disulphide closing potential, although preliminary studies suggest that in the absence of a closing potential, r.m.s. values in the range 5.0 to 7.0 can be obtained only if, for the repulsive potentials between the side-chain interactions for which there are no energy minima, R is reduced by at least a further 20%. Disulphide bonds may be a major stabilizing factor in PTI, but in any event failure to reduce repulsive side-chain potentials leads to r.m.s. values of approximately 10 Å. These studies would suggest that the general conclusions are independent of all reasonable choices of starting conformation and potential functions, except that the potentials between side-chains that are unstable in contact need further refinement. We emphasise that interactions between side-chains were calculated principally on the basis of solvation, and that there may be some justification, taking the possibility of ionic shielding into account, for treating side-chains of opposite charge as mutually attractive. In practice, parameterisation of such interactions is one of the most difficult problems to be solved, depending on correct treatment of solvent structure, including counterions. Further, since interactions between polar side-chains depend on interactions between one group in each of those side-chains, it is probable that a correct description of their interactions awaits the use of model systems that provide a fuller account of side-chain structure. The not insignificant improvement in r.m.s. (from 7.0 to 6.0) on starting from a predicted, rather than extended, structure does suggest the advantage of using initial predictions as the starting conformation, although this may be of greater importance for other proteins. One problem of using r.m.s. and N and comparable methods as criteria of success is that, by using the square of the deviation of distances between calculated and observed positions, overall (tertiary structure) features contribute a greater weight than local (secondary structure) features of the backbone. Further, because of the ease with which the extended regions (residues 14 to 23 and 29 to 35) of PTI come together to form pleated sheet, and since the loop so formed is a dominant feature of PTI, this protein may be a special case. The possibility that initial predictions are, in general, likely to be highly advantageous is suggested by our current preliminary investigation of myoglobin folding.
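The statement that halving the side-chain repulsive terms reduces R by about 20% and quadruples the well depth can be checked numerically. The functional form is not restated in this section, so the sketch below assumes a 9-6 pairwise potential, E(r) = A/r⁹ - C/r⁶, purely because that form reproduces both quoted figures; the parameter values are arbitrary and hypothetical.

```python
def well_9_6(A, C):
    """Minimum position and well depth of the assumed E(r) = A/r**9 - C/r**6."""
    r_min = (3.0 * A / (2.0 * C)) ** (1.0 / 3.0)   # from dE/dr = 0
    depth = -(A / r_min**9 - C / r_min**6)         # depth reported as a positive number
    return r_min, depth

r_full, depth_full = well_9_6(A=1000.0, C=10.0)    # arbitrary illustrative parameters
r_half, depth_half = well_9_6(A=500.0, C=10.0)     # the same pair with A halved

reduction_in_r = 1.0 - r_half / r_full             # fractional reduction of R
depth_ratio = depth_half / depth_full              # factor by which the well deepens
```

Under this assumption the minimum moves inward by a factor 2^(-1/3), a reduction of roughly 20%, and the depth scales as 1/A², so it increases fourfold. A 12-6 form would instead give about 11% and a doubling, which does not match the figures quoted.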
4. Conclusions

An important feature of this study was the re-examination, with our refined parameters and more detailed backbone representation, of the secondary structure changes during the folding of PTI. As described above, the extended regions between residues 14 to 23 and 30 to 35 were conserved during folding. The extended region between residues 2 and 7 became partly α-helical early in the simulation, in agreement with the observed structure, while the major α-helical region near the carboxy terminus of PTI remained stable, providing it was predicted in advance. Difficulties in obtaining this α-helix without an initial prediction are also reflected in the simulation presented by Levitt & Warshel (1975), in which the observed α-helix had to be assigned to the starting conformation, and in the simulation presented by Hagler & Honig (1978), who report that they obtained a left-hand α-helix at the end of the simulation. Nevertheless, the ability to predict at least the greatest part of this helical region by statistical methods suggests that local interactions are responsible for its formation, while this simulation suggests that, once correctly formed, the α-helix is largely preserved. Thus the α-helix and β-pleated sheet regions of PTI appear to be relatively inflexible during the major part of folding.

If there are relatively inflexible regions of a protein during folding, it follows that part of the remaining backbone must have sufficient flexibility to bring about tertiary packing. Proteins may have hinge-points where the chain bends to bring extensive regions of the protein together. This was suggested by Honig et al. (1976) and by Hagler & Honig (1978). The latter authors showed that assigning glycyl backbone behaviour to residues suspected to be hinge-points in PTI, as done by Levitt & Warshel (1975), leads to a structure with an r.m.s. value of 6.2 Å, even if all other residues are alanyl.
This shows that glycine is an important "hinge residue", and that fortuitous agreement may be obtained by emphasising only the glycyl-like and alanyl-like backbone behaviour of residues. Neither the Hagler & Honig nor the Levitt & Warshel work says anything about the true "hinge behaviour" of non-glycyl residues of PTI, since glycyl backbone behaviour was artificially imposed a priori on these non-glycyl residues. Our simulation does not impose glycyl-like behaviour a priori, but shows that certain non-glycyl residues do show considerably larger deviations than others in bringing extensive regions of the protein together. These are residues 4 to 6 (Phe, Cys, Leu), 9 to 11 (Pro, Tyr, Thr), 14 (Cys), 18 (Ile), 26 (Lys) and 29 (Leu). Changes in residues 52 (Met), 55 (Cys) and 56 (Gly) result from the failure to predict correctly the C-terminal end of helix 48 to 55, and the movement of residues 52, 55 and 56 does not correct the starting conformation determined by the initial prediction. The glycine residues at positions 12, 28, 36 to 37 and 57 do not move extensively from their initially predicted positions. The fact that glycine and other residues may form stable turns must therefore be distinguished from their flexibility during the course of folding. These findings cause us to differ from Levitt & Warshel (1975) and Hagler & Honig (1978) by distinguishing hinge-point behaviour in the sense used above from the tendency to form reverse turn structures which, like helix and pleated sheet, may be locally stable. This conclusion supports the observation that reverse turns may be fairly readily predicted (see e.g. Garnier et al., 1978). It is, perhaps, the "coil" residues (neither helix, pleated sheet, nor turns) that are the best candidates for inherently flexible hinge-points during folding.
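One simple way to flag candidate hinge residues of the kind listed above is to compare the y value of each residue before and after minimisation and report those that changed most. The sketch below is an illustration only: the 45° threshold, the treatment of y as a periodic angle and the example values are all assumptions, not data from the simulation.

```python
def angular_change(y_before, y_after):
    """Signed change in y (degrees), wrapped into the range -180 to 180."""
    return (y_after - y_before + 180.0) % 360.0 - 180.0

def hinge_candidates(y_start, y_end, threshold=45.0):
    """Residue numbers (1-based) whose y changed by more than the threshold."""
    return [i + 1
            for i, (a, b) in enumerate(zip(y_start, y_end))
            if abs(angular_change(a, b)) > threshold]

# Hypothetical four-residue example: starting and final y values in degrees.
y_start = [0.0, 85.0, 63.0, 0.0]
y_end = [5.0, 80.0, -45.0, 60.0]
flagged = hinge_candidates(y_start, y_end)
```

A large change in y is only a first indication; as the text stresses, a hinge-point is finally identified by its effect on tertiary structure during a full folding simulation.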
Finally, we note that, while predictions of coil regions may suggest possible hinge-points, not all coil regions need necessarily be hinge-points. A hinge-point is defined by a major change in tertiary structure during folding, and this can be found by a full folding simulation. Future studies should fruitfully be directed towards discovery of which coil features are largely locally determined, and which are inherently more flexible.

Although we find that prediction of tertiary structure from an extended structure also achieves comparable results in r.m.s. fit, we find that minimisation from initial secondary structure predictions gives a better account of secondary structure in the minimised conformation. Our rationale differs from that of Levitt & Warshel (1975) and Kuntz et al. (1976), because we ascribe equal importance to both overall (tertiary) and local (secondary) backbone arrangements. It is more closely related in this context to the rationale of Tanaka & Scheraga (1975). As an example of the importance of initially establishing good secondary structure, we again note that, in folding from the extended form, the major α-helix of PTI does not form unless initially predicted, as in this work, or imposed, as in the work of Levitt & Warshel (1975). Whether or not this is true of all representations and energy functions, initial good secondary predictions of structure are likely to be important in the technology of protein folding. More important, it suggests that initial nucleating of secondary structure features in the real molecule is essential for producing the detailed arrangement of groups vital to the biological function of a protein.

The present studies are to be regarded as preliminary investigations, illustrating the value of the y-variable and empirical methods for parameter assignments. One possible criticism is the imposition of a disulphide bridge closing potential. However, the important point remains that the glycyl residues were not those that underwent gross conformational changes to produce the folded structure, and it is difficult to see how this could arise as a consequence of the treatment of disulphide bonding.
REFERENCES

Brant, D. A. & Flory, P. J. (1965). J. Amer. Chem. Soc. 87, 2791-2800.
Brant, D. A., Miller, W. G. & Flory, P. J. (1967). J. Mol. Biol. 23, 47-65.
Chothia, C. (1973). J. Mol. Biol. 75, 295-302.
Crampin, J., Nicholson, B. H. & Robson, B. (1978). Nature (London), 272, 558-560.
Crawford, J. L., Lipscomb, W. N. & Schellman, C. G. (1973). Proc. Nat. Acad. Sci., U.S.A. 70, 538-542.
Garnier, J., Osguthorpe, D. J. & Robson, B. (1978). J. Mol. Biol. 120, 97-120.
Gibson, K. D. & Scheraga, H. A. (1967). Proc. Nat. Acad. Sci., U.S.A. 58, 420-426.
Hillier, I. H. & Robson, B. (1979). J. Theoret. Biol. 76, 83-98.
Hagler, A. T. & Honig, B. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 554-558.
Hagler, A. T., Huler, E. & Lifson, S. (1974). J. Amer. Chem. Soc. 96, 5319-5327.
Honig, B., Ray, A. & Levinthal, C. (1976). Proc. Nat. Acad. Sci., U.S.A. 73, 1974-1978.
Huber, R., Kukla, D., Ruhlmann, A. & Steigemann, W. (1971). Cold Spring Harbor Symp. Quant. Biol. 36, 141-150.
Kuntz, I. D., Crippen, G. M., Kollman, P. A. & Kimelman, D. (1976). J. Mol. Biol. 106, 983-994.
Levitt, M. (1976). J. Mol. Biol. 104, 59-107.
Levitt, M. & Warshel, A. (1975). Nature (London), 253, 694-698.
Moult, J. & Hagler, A. T. (1978). Nature (London), 272, 222-226.
Némethy, G. & Scheraga, H. A. (1977). Quart. Rev. Biophys. 10, 239-352.
Némethy, G., Phillips, D. C., Leach, S. J. & Scheraga, H. A. (1967). Nature (London), 214, 363-365.
Nozaki, Y. & Tanford, C. (1971). J. Biol. Chem. 246, 2211-2217.
Ooi, T. & Nishikawa, K. (1973). Jerusalem Symp. Quantum Chem. Biochem. 5, 173-187.
Pohl, F. M. (1971). Nature New Biol. 234, 277-278.
Premilat, S. & Hermans, J. (1975). J. Phys. Chem. 79, 1169-1175.
Ray, A. (1971). Nature (London), 231, 313-314.
Robson, B. (1974). Biochem. J. 141, 853-867.
Robson, B. (1976). Trends Biochem. Sci. 3, 49-50.
Robson, B. & Hagler, A. T. (1979). J. Amer. Chem. Soc. In the press.
Robson, B. & Pain, R. H. (1971). J. Mol. Biol. 58, 237-259.
Robson, B. & Pain, R. H. (1974). Biochem. J. 141, 869-882.
Robson, B. & Suzuki, E. (1976). J. Mol. Biol. 107, 327-356.
Robson, B., Hillier, I. H. & Guest, M. F. (1978). Chem. Soc. Faraday Trans. II, 74, 1311-1317.
Tanaka, S. & Scheraga, H. A. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 3802-3806.
Tanaka, S. & Scheraga, H. A. (1976). Macromolecules, 9, 945-950.
Tanford, C. (1970). Advan. Protein Chem. 24, 1-95.
Note added in proof: Several points were raised by the referees which, while dealt with in the text, admittedly deserve special emphasis as to their implications. (1) The r.m.s. used here is the function used by Levitt & Warshel (1975), not by crystallographers, and is used here for comparison. As the sole criterion (which it is not) it is of limited value. (2) By assigning (various) identical side-chain parameters to all non-glycyl residues in a comparative test (see Results, section (d)), this test procedure becomes similar to that of Hagler & Honig (1978). Since a poor result, N = 50%, was then obtained, it further emphasises that the other results obtained by us are not procedure-dependent, and that N = 78% obtained in the principal simulation indeed reflects the important role of the side-chain parameters. The reason that the Hagler-Honig result was much better than N = 50% is that they assigned glycyl backbone behaviour to certain non-glycyl residues, as did Levitt & Warshel (1975). As emphasised, we did not do this, since it is unrealistic and, according to Hagler & Honig, exploits knowledge of the observed structure and opinions about the glycyl-like behaviour of certain non-glycyl residues. (3) Finally, while we agree that glycine is intrinsically more flexible than other residues in dipeptides, we disagree about the extent to which glycine can escape from reverse turn conformations once these are formed. According to more recent detailed Monte Carlo studies by Hagler, Osguthorpe and Robson, the effect of solvent in stabilising the α-helical conformation of a dipeptide has probably been underestimated in Methods, section (b). However, it is probably still a good estimate for an α-helical section, since favourable inter-residue hydrogen bond interactions reduce opportunities for residue-solvent interactions.