d. Mol. Riol. (1986)
188,
259-2X1
Refined Models for Computer Calculations Protein Engineering
in
Calibration and Testing of Atomic Potential Functions Compatible with more Efficient Calculations B. Robson
and E. Platt
Laboratory of Theoretical Biochemistry Department of Biochemistry, The Medical School ~~nie~ersity of Man,chester, Manch~rster M 13 9 PT, IT. K
(Received 22 May 1985, and in! reuiwd form 5 November 1985) A reappraisal has been made of interatomic pot.ential functions for protein structure calculations using the all-atom approximation (except CH, CH, and CH,. which are treated as “united atoms”). Some key problems are identified and treated. The potential functions are somewhat novel in form and consistent’ with more efficient and robust folding algorithms. In addition, the potentials are calibrated for the rigid geometry approximation, since use of fixed standard bond lengths and valence angles (and fixed trans planar peptide groups) reduces the number of conformational variables and saves a great deal of computer time. Though these algorithms demand the use of pot,ent,ia.l functions of this special type, these functions can be readily implemented in more classical programs for the conformational analysis of proteins. They are calibrated or tested against a large body of experimental data, including (1) extended basis set ab in&o, quant’um mechanical calculations, (2) nuclear magnetic resonance spect,roscopic data and dipole moment data for di- and oligopeptides, (3) characteristic rat’io data for random coil homopolypeptides, (4) extensive data from peptide solubility studies, and (5) experiment)al structures of polyalanine fibres and globular proteins. This paper will form the basis of a further report, which will include investigations of hoa wat,er might be more realisticallv represented subject to the computing power available.
1. Introduction The ability to predict the conformational changes that may occur when the amino acid sequence of a prot.ein is modified is of fundamental importance in protein engineering, as is the ability to predict the conformations of natural proteins when experimental determination is prohibited or likely to be time-consuming. This paper describes an analysis of the interatomic forces pertinent to the conformations of peptides and proteins, resulting in a set of parameters suitable for computat,inn in prot,ein engineering. Prediction of protein structure by folding simulations has been attempted by several workers, including Levitt & Warshel (1975), Robson & Osguthorpe (1979), Levitt (1983a,b) and Robson et al. (1985). These studies represent various levels of detail concerning the model of the protein used, e.g. in some cases side-chains are represented by single “dummy atoms”. In many respects, such studies
come close to the kind of calculation required to explore effects of sequence modifications when the native conformation is known, except that. gross simplification of the protein structure would be less appropriate for the latter, since more detail is required. Hence. this paper describes a model in which all atoms are represented, except, that CH, CH, and CH, groups are represented as single centres of interaction. The reason why calculations similar to those used in folding simulations seem more appropriate t,o calculating effects of sequence changes is that. such changes may shift the protein conformation to a new, if neighbouring, minimum in the energy surface. Generally, we may say that new minima and intervening barriers are introduced. This is clearly the case when a larger sidechain, with more rotatable bonds, replaces a smaller one. There may also be an overall shift in the conformation of the protein, crossing a kinetically facile energy barrier. Techniques involving energy
minimizat,ion to the nearest minimum (as used in t.hr energy refinement of protein co-ordina,tes) or molecular dynamics simulations. which are limited to short time-scales and crossing only of very lon energy barriers. would most generally be misleading. In addition, it should he ensured that the change in sequencedoes not lead to the stahilitjr of it11 open form exceeding that, of the native-like state. which requires extensive energy minimization of open forms. Such calculations are computationally t’ime-consuming, and the model presented here is more rapid in that (1) the rigid geomet)ry approximat’ion is used (bond lengths, valence angles. and rotation angles round at least partiall) double bonds are fixed at standard values). with the interatomic parameters calibrat,ed for such a representation, and (2) because the pot)ential funclions for interatomic interactions are of a special form compatible with a class of rapid minimization techniques, of which one example is described &ewhere (Robson, 1982). The role of the solvent in determining prot)ein conformation is worthy of consideration. This is pa,rticularly so since. in our experience, an inadequate solvent model is t’he main cause of failure of existing potential functions to guarant,ee t)he csalculat,edstability of the native state over more open forms. This problem is not generally apparent unless powerful minimization techniques capable of cdrossingmany barriers are employed, since the low energy open forms must be locat,ed. The present model takes account of the hydrophobic3 effect, though it is found that,, at the level of the energy of int,rrac+on bet’ween each pair of atoms, t’his is of the tnagnitude of the weak van der Waals’ attractive force and hence, despite the fa,ct t’hat this is readily calibrated to reproduce the experimental hydrophobicities of whole side-chains, it does not in itself resolve the above problem. Rather. it, is found necessary to introduce a more explicit account of water molecules and an approach used is to a&l dummy water molecules int,o the model to allow for the fact t)hat the stabilitv of an intramolecular hydrogen bond is not sign&ant,ly great~erthan t’he satne hydrogen-bonding groups engaged with wat’er molecules. Trea,tment of this problem turns out to be possible though non-trivial. and this will be discussedin a subsequent paper in conjunct,ion with 11. C:. E. Sternberg. This paper is. however. useful n-hen considering conformations that are already compact (as in exploring sequence modifications of proteins of known st’ructure) or in folding simulations in which, as an interim measure. a compaction field is introduced (Robson et al., 1985). There is a reservation since, as will be discussed elsewhere. solvent molecules can affect the conformations of surface side-chains. Sonetheless. the model as prt’sent,ed here provides a more det’ailed 1reatment of solvent, effects than do those previously employed. using parameters based on quantum mechanical calculation of side-chain-water complexes as well as t’reatment of the hydrophobic effect on an atotn-atom interaction basis.
2. Theory and Methods E,(s). the potential energy of a molecule ivith I/ atoms and 77~ rotatable bondsis given by: L-J j:n k-:m E,(s) = c 1 c’ij(“)+ c z(a,). (1) f=l ?i=1 where eij(s)is the energy interact*ionbetween atoms i and j. s dependsupon. but is not equal to. rijt thta dist,ancr betweenatoms i andj (Rohsonrt ~1.. 1982).This unusual choiceof’ replacing l‘ij direct,ly is describrd in sections(b) t,o (d). below. I(Q) is the interna,] rotational pot,rntial ior thtl rotatable bond SIP.The enrrpirs of interaction pii are given by the analytical expression: j=*
Pi,= AijSY- BijS6$Cij.S.
(2)
where A. R and f’ parametersmodel the van der Waals forces (repulsive and att,ractive), the elert,rostatic* interaction and the hydrophobic,
effect. Z(Q) is given I)!-:
I(cq,)= bg (1 +c’os(S,(?,-a,))),
(3)
where K, is the rotational “force constant”, (lk ix a reference value of the angle cykand -Vk is the number of minima in t,he internal rotational potential.
Internal rotational pot’entialswereintroduced to modrl more realistically the conformat,ional behaviour of sidrchain rotatable bonds. involving CH, CH, and CH,, which are represent’ed as united atoms.
(b) The
s(rij)
dejinition
is dependent
and
upon
signi$cance
the
oj s
co-ordinates
(xi, yi. zi)
(xj. yj3 zj) of atoms i and j. and is defined by t)hefollowing algorithm
(Robson.
1982):
p = (I.i-.rj)2+(yi-yj)*+(Zi-7j)2
= r;
q = 1 +025p s = 2q~(q*+p’)
Note that r’. the reciprocal of s. is a poor rstimak of thtx Euclidean distance between atoms i and j. However. the resulting value of eij(
Refined
Models for Protein
Engheering
261
Table 1 C’ompnrison
of interatomic
energies
for
interaction
between
carbonyl
oxygen
atoms
in
r and I’ spacr Separation iA) -
van der Waals’ energ? (kcal mol-‘)
Electrostatic enerpv (kcal mol-‘) ‘-
(I)
CT’)
(r)
(1.7
(II
0.1 0.5
0.5 0.65 1.5 2.1 2.3 2.5 2.7 2.9 3.1 3.3 3.5 3.8 4.0 4.2 4.5 4-7 5-o 5.2 6.0 6.8 8.6 15 31 52
5 x 10’3 2 x 10’ 1067 41.18 15~88 6.22 2.36 0.78 0.142 -0.106 -0.186 -0.197 -0.181 -0.157 -0,132 -0.109 - 0.090 - 0.074 -0.041 -0.02 -0.01 - 040 0 0
2x 10’ Ix 10” 1056 41.18 15~88 6.20 2.33 0.78 0~117 -0.121 -0.19” -0,194t -0.171 -0.142
703 111 46.9 33.5 30.6 28.1 26.0 24.2 22.7 21.3 20.1 19,o 18.0 1i.l 16.3 156 14.9 14.3 12.X 11.7 9.9 6.9 44 3.:‘,
1 ,5
2.1 2.3 2.5 2.7 2.9 3.1 3.3 3.5 3.7 3.9 -1.1 4.3 4-5 4.7 44 5.5
6.0 7.0 IO 15
20
-0~11-1
- OG90 -0.070 - 0.054 - 042:i -0.01 - 0.00 - 0~00 0 0
0.7 139 108
46.8 33.5 30% P8- 1 264 24.2
224-j “1-l 19-9 18.7 17-6 16-i 15.X I49 l-l-2 13.5 11-6 10.4 8. I 4.6 2.2 1.4
In r’ space, r’ = (1 +1~5r2+0~06251-4)~(2+0~~~r2) replares the normal value of T in t.he potential function. Note that r’ = l/s (eqn (2)). t Minimum energy in van der Waals’ interaction.
(c)
PILysical ,jnsti’cation functions
for
the nature
of potentia,l
used
Departures between the values of eij calculat~ed by the 2 met,hods occur (1) at. small and (2) at large r values. At small values of r, the use of s softens the repulsive and attract,ive forces at) close range, the merits of which have been recognized (Robson. 1982: Levitt. 198%). \Yithout some approach of this kind, the terms Arm9 would go to infinity at r = 0. For non-zero but short distances. the trend in r is in better agreement with the trend obtained from SCF-MO ab initio calculations (Hillier 6 Robson, 1979), where the 9th power moves t,o a soft,er funct,ionalit~y at. short. contact, distances. *tt, longer range. it. is the electrostatic contribution to cij that, is affected primarily. Attenuation of the electrostatic interaction occurs as 7 increases in a manner that is consistent with a progressive increase in the value of the dielectric constant with increasing r. This behaviour is grnerall~~ considered desirable and has been modelled by other authors in various ways (see e.g. (ireenberp et ul.. 1978). Thus. the use of s forms part) of our model of the solvent effect. ((I) (~omputntiod
justiJication for potPntia1 functions
the nature
of th
The merit of the potential functions in regard to computation resides in a number of factors. Some of these are not dependent on t.he particular choice of software or “protein engineering model“. Others are, and of particular interest are t,he SIMPLEX method and the method of zero minima a.s described in t.he following section. The efficiency and robust,ness of these depends on
the potential functions having the form imposed by the above s-algorit’hm. More generally. an att)ractive feature of the s-algorithm is its avoidance of a square-root calculation. XII energy calculation rout.ines in conformational ana.lysis using pairwise poten& functions require Z nested caomputational loops, which select, all relevant pairs of atoms t,o evaluate their energy contribution. The innermost loop may be by far the most time-consuming part of the calculation, such that the computation t>ime required to solvtl a problem. once various initializations of the variables have been performed, is largely governed hy thr within the innermost loop. Depending on coding hardware and support software, the rate-limit.ing step is the evaluation of a square-root function to piv~ the interatomic distance of a pair of atoms. Thus. man> workers employ “look-up tables” of square root)s of r2: only a finite number of table elements can br stored. so unfortunat,ely some time-consuming c&ulation is required for interpolation of the tabulated values. The s-algorithm substit)utes 2 additions, 3 multiplications and 1 division for a square-root call and a tlivision. This is generally faster. and sometimes considerabl? so. hut t,his critirally depends on the hardware and software support. and particularly on the degree of c-ode optimization for square-root,. division and multiplication operations. With proper regard given to efficient programming. it is never slower. Thus. the s-algorithm seems an excellent vh0ic.e for programs to be run on a variety of machines. though the fact that it may be less advantageous on any one machine means that speed is not its most important merit.. This is with the important esception that substa.ntial speeding overall romes from indirect hen&s
.I substantial and appm~ntl,v universal merit of thcl s-algorithm is that it keeps the value of the potential t’1inc.tion finit.e. aud more important rralisticall> IOU. whtan the implied value of r approaches zero. That is. when atoms make a trlosr approach t.o racah other. A c.lassichal problem is t.hat t,he finite precision of a machine tlot~s not allow precise evaluation of the gradients in the pot,ential surface when at least one pair of atoms make a c~low approach, This csondition may arise in the c~hoic~eof t.hr initial st,art)inp conformation. the generation of a bad ljrobe point in a numerical gradient, method using post c4culat,ed gradients by sampling. and particulari>. at c*c,rtain stages of operation of a SIMPLEX procedure (see the next section). Larry large value derivat,ives calculated from derivative equations for gradient, methods and molecular dynamics can be a serious problem. In somr instam-es. these are also considerations of robustness. sin(.r the program may occ*asionally a.ctually ‘.blow up” in rrsponse to an overflow or underflow condit,ion of the accurnulat~or. The function forms proposed by R,obson (I!N) and Levitt (19836) avoid such difficulties. Though I,tAvitt’s approach involves a formulation not, entireI> dissimilar to the s-algorithm. it does diffrr fundamentally from Robson’s proposal by requiring the cust’omary evaluation of a real Euclidean dista,nce r, wit,h a postmultiplied correction t)o the pot,ential function. which we find to br slower. The 1nodificat)ion that the s-algorithm makes to longrange electrostatic interactions. consistent with our treatment of the electrostatic effect (see section (c). above). is clartainly less important than t,he above c*cmsiderations. However, there is a rrduct,ion in the nurnb~r of deep minima t.hat. can arise from several electrost~ati~ interactions: t.his presumably facilitat,es the use of t)he SIMPLEX type of procedure described in the nest section. and one case has been observed where this was of importance. Of considerable pract,ical importance is that the s-algorithm. which “fools” the computer into thinking that a wry close approach has not occurred by transforming the true interatomic distance, avoids the artifacts that oc(*ur in some potential functions for low values of r. Examples include the energy hole that results at 14ose approach when exponential functions of van der \\‘aats’ repulsion are employed. and when an amide hydrogen at,om makes a close approach to an elect.ronrgativta receptor atom in the potent,ial functions described by Hagler r/ nl. (1974). lising gradient m&hods. this is not normally a serious problem except by an unfortunate c.hoire of starting conformation. but it IS a problem when H powerful minimization procedure capable of exploring many minima by crossing energy barriers is employed (see the SIMPLEX method described below). Then, c*onformations may be attained that involve a wholl> artificial deep energy well and wholly artificial close contacts. Some numerical gradient, methods may give similar problems. This difficulty is avoided by the s-algorit,hm for what, we would consider as any reasonable cahoic*r of pot,ential fun&on parameters. \Vrittrn into a ln-ogram. the s-algorithm would allow the robust use of ruost types of potmt,ial fun&ion (although t.he functional form is thus c-hanged somewhat. we recall that the
variat,ion 1s relativeI>- minor). >lorc rnlportil~\t its us<‘ employed in para1neter c*alibration from t hca clutstst 21s reported here. c hsve no ljossibility of implying a very deep nrtitic.ial hoIt>. and brrcnnt~ more rflicirnt wit.h drc*rrasing tliffc~rrnc*c~ brtwerr~ the highest. and loarst. rnrrgy saltit+ ~)f’ t ht. func.tir)tl of’ (real or implied) r. These c.or1sidel.ittioi1:: arisr because the constant cij term to be added to tht1 func.tior1 (see beIoN ) ih the absolute value of tllfl IoH e.-t tanerg> at any point in t,tir function. It is noteworthy that. although many of t,hesr potrnt1al dificultirs could be avoitifd in other \r-ays. these invariably lead t,o the need for further t,in1~-c,clrlsurni11~ calculations. decision t,est,s that prohibit vectorization on currently available hardware, or introduce disc*ontinuities into the c>nrrgy surfacae or derivatives of it.
minimum
mdhod
Here \ve describe the algorithms in our modelling approach that, are most pertinent t)o the potential form chosen above. Those less pertinent in that respect will be described elsewhere. The SIMPLEX method of minimization is wideI> available in the program libraries of most c*omputer centres and in its modern form was first drsc+ribed b? Welder & klead (1965). Its first known application to proteins has been described by Robson & Osguthorpr (1979: set‘ also Robson, 1982, and Robson VI nl. 1982). The principles are thus presented only briefly her,,. emphasizing the changes associated wit.h the (*on1 bined use of the novel zero mini1nurn algorithm. Stricstly. the SIMPLEX procedure as used by us was written rspreia.ll> for applications in protein engineering. and does not rorrespond to standard versions. However. the changes exploited in this study relat,e to speeding. and deliver the same results. Other improvements. to be described elsewhere. are for simplicity and reproducibility omitt,ed in these studies. As in any procedure of t,he SIMPLEX class, the core feature is the manipulation of a simplex, i.e. of S+ 1 point,s in the paramet,er space, 9 being t)he number of variables of the problem (here dihedral angles). Once t.he simplex is generat,ed. only one point is moved at a time. For example. a major operation is the reflection of the point of highest funrtion value (here, conformation of highest energy) t,hrough the centroid of the remaining points. This ‘implies the concerted rot’ation of man? bonds. Movement of points by this and other operations is tentative. i.e. the new conformation may be rejected and another operation performed if the new point is of higher function value t,han that of a reference point in t,hr simplex. Typically, this reference point, is that of highest function value within the simplex. The precise function value of the tentative point is not, required. only the qualitative information that it is of higher energy. Tt is this disregard for gradient information in microscopic> regions of the function surface. and regard for only the more general gradient trend implied by t.he points in the simplex in t.he context of the operations such as reflection
Refined
Models for Protkn
performed on the simplex, that makes the simplex class of methods highly suited to minimization of functions with many small minima. The small minima are transparent to the procedure, depending on the regulatable spread of simplex points. In any event. the absence of any need to know the precise function value of a tentative point opens the possibility of a class of estimator methods by which the full function need not be evaluated for every point. The zero minimum algorithm provides such an “estimator”. More precisely. it allows the part’ial generation of a bentative conformation. and partial evaluation of the energ?, to inform the minimization procedure that the tentative point must be of higher energy. perhaps by just a very small amount.. even t.hough function evaluation is not complet,ed. The principle is outlined below. At present it suffices to state that this allows conformations t’o be rejected more rapidly by an early prediction of them as “bad” tentative points. This zero minimum algorithm thus has an eficiency depending on the proportion of bad simplex points generated, but would have no effect on a hypothetical ideal minimization procedure in which bad points are never generated. This is in t,he context of global. not local, minimization: a good gradient method used alone may not normally generate a higher energy point. but will become entrapped in the nearest. even microscopic and kinet’icallv facile. minimum. In practice, then, a perfect and practical minimization procedure in the above specific senses cannot be envisaged and the zero minimum algorithm alwa,ys enhances program speed to an extent depending on the efficiency of the specific minimization procedure. the nature of the molecule, the nature of the problem being treated, and thr details of the region of the conformational energy surfacr currently being traversed. The zero minimum algorithm (Robson. 1982) allows a conformer of ‘vi atoms, with conformation and energy evaluated only up to atom i, to be rejected without completion of the structure if EF > Ei, during minimization. Here. Ez is the previous conformation a with energ E” evaluated for interact.ions between all n atoms. and Ef is. similarly. t.he energy of conformation and evaluated only up to atom i. The zero minimum method ensures that if EF > Ei. t,hen Ef: > Ez is necessarily implied. In fact, this is a ver)’ conservative rejection criterion. and equally safe but much more efficient is rejection at that atom > at which Ef +F(i) > 8’:. Here F(i) is a function dependent on at)orn number and overall covalent chemistry. Choices of F(i) will be discussed e&where. but clearly it. aan be found by t,rial and error if a good but not necessarily optimal choice is required. The philosophy of the zero minimum approach is to recognize that rejection of an uncompleted structure, seemingly of high energy. is riskv and ma\- lead t,o premature rejection because of the possibility of negative interaction energies not yet counted. To circumvent this. all functions are made to have zero-positive enrrgp values at all int,eratomic dist,ances. in such a way as to affect only the absolute value:: of the energies of conformers. but not, relative vnlurs. This does not demand a change in t.he form of the potential functions presented in this paper. It implies adding a distance independent but atom-type-dependent caonst,ant Zij to the right-hand side of eqn (2). This is minus one times the minimum value (as a function of s) of eqn (2). which is dchpendent on dij. Bij and (Ii,. We refer to a potential cont,aining the cmonstant Zij so calibrated as a zero minimum potential. The name is associated with the fact t.hat the minimum value of the potential function is reset at. zero b,v the introduction of Zij. 6
Engineeri~Lg
(f) Palibration. (i)
(‘hoiw
of functions
263
of
the potrntial
as a starting
poinf
functions for
rrjnrnwnt
Recall that the 9th power term of eqn (2) is prefer& over t.he popular choice of 12th pcnvver becausr it is consistent with both t.he repulsive functiona1it.y implied by extended basis set ab initio IL’AO-MOS(‘F calculations (Hillier & Robson. 1979) and with the experimental conformation-dependent propert,ies of simple peptides (Robson et al.. 1979). These comparisons also suggest the merits of the parameters given by Hagler et rrl. (1974) calibrated against the crystal structures. heats of sublimation and dipole moments of peptidc,-like mokcules. Thus. the parameters given by Hagler et nl. (I 97-L) provided a basis for refinement of paratnet.rrs for the protein rasp. We also followed the philosophy of Hagler r/ al. (lB74) in calculating the values of .jij. /ji, and (‘i, according to the geometric mean constraints. i.? Aij = (diiA,j)l,‘2 or
Aij = .lidj.
wherca
‘4.1 = AI8 .j !..‘2.
(8)
In the case of paramet,ers C’. each (‘, and Cj term is to br taken as the partial atomic charge in appropriate units (here reported in electron units (e.u.)). Experimental dipole moments and ab initio quantum mechanical calculations are obviously of value in determining these partial charges. d and K being rr-calibrated for consistency with sssignments made 011 that basis. (ii) Rf$vwnwnt of thr potentid &nctions Refinement was generally carried out in 3 stagrs (SW Results for a detailed discussion). Step (1). .4b initio LCAO-MO S(IF e&ulations at the 3.~1%~ le\,el were rarried out or previously published calculations were used where available (reviewed by Platt & Robson. 1983). both for conformational energies and assignment of atomic8 charges by Mullikrn population a,nalysis (Mulliken. 1955). The structures used w?I’cL X-acetylmet.hylamide, ~~-acet\il-glvrvl-S’-mrt~h~lamid~~. ,V-formylalanyl-LV’-methplamide and various side-(#hain analogues (see below). Step (2). The potential functions were calibrat’rd to producae the optimal fit to conformational energies from ab initio calculations. using as a starting point (I) the *-! and R parameters given by Hagler it 01. (1!)74) for tht> closest at)omic t?pe. and (2) the charges f’ from thr JIullikrn population analysis. St,ep (3). Thesr A, H. C and I(%,) paranlet.ers wt’re t.hrn further refined to fit’ optimally the following simultaneously. (1) Dipole moments of t,he constituent c.hrmicA groups (McUrllan. 1963). (2) Dipole moments of 14 u.r,-alaninr oligopeptides, using the data and met,hod of Flory & Srlrimmel (1967). (3) ?;uc4ear magnetic resonance vicinal SH -(‘H coupling constant of S-acetvlalan?rl-S’-mt?th~larnide. (4) Thts rharacteristic ratios of hornopol?lpel~tides unbranchcd at C” by the method of Brant & Flory (1965). (5) Transfer free rnrrgies of a side-chain from water t,o a non-polar solvent: (Models the hydrophobic. interaction.) (6) Experimental barriers of rotation. Step (4). The resulting parameters were then rechecked against the ah inilio data to check that good agreement had been retained. There was no evidence that signifipant,ly different results would be obtained if all
experimental and cluantum mechanic~al dat,a were tittcd sittinltaneo~tsl~. .I referee has pointed out that the use of such nuc,lear trragnetic~ resonance data (item (3) of st’rp 3. above) is not a sensit,ivr test. This is t,o some extent so. as we have found earlier in conjunction with Dr A\. T. Hagler (Robson it 01.. 1979). It) must) be applied. however. since it would be problemat,ic if a tolerable result is not c~hined. This failure is certainly possible with reported pot,ential functions: what we have found is that. of those functions set,s that do produce tolerable agreement. the results are not sensitive to caertain subsequent changes (c,.~. c4anges in implied nitrogen dimensions, introdu&on of any dielectrics constant)). Valculat,ion of the chara?terist)ic+ ratio is a sensitive test), since calrulated values may ranpr from 2 to infinit.y. in theory. It is uot hard to fi11t1 established potentials that’ do not rrproduw the cwrrwt value. In the rigid geometry approximations. the functions described by Hagler et 01. (1971) give about IX. c~ompared wit.h an awept,ed experimental value of X to 9 and. including older data 5 to Il. The test of dipole moments of n.l,-oligoalanine is of int~ermediate srnsitivity. The ability to tit all 3 simultaneouslydoes Ilot automatically follow as facile (Robson c,t 01.. 1979). (‘ombinetl w&h further tests, such as comparison \vith high-grade quantum mechanical calculations. an o\-era11 tolerable fit provides a degree of cwnfidenee. Equal wighting was given to all the data in t’he calibration prowss. justified by the observation t,hat. rr pon~~riori. a tolerable fit) to all the dat,a caan be achieved. We might note that. whatever the underlying physical reactions. there is considerable merit in being able to reproduw local and random coil brhaviour preferences for applicw tions in prot,ein folding for an extended chain. In the case of (‘. (‘H (aliphat,icz and aromatic). (‘H, and (‘H, paramrt,ers. valyes .of A and II were calibrated for the hydrophobic interac%lon (see below) between st,eps (2) and (3). Agreement with (16 initio results is still welllwserwd. in that the equilibrium energies for the hydrophobir interaction emerges as being 1%ithin 0. I kral mol - ’ (I kcal = 4.184 kJ) of those for t,he van dpr Waals’ interaction (see below). It may be noted that n6 i~ifio IA’AO-MOWF calculations are widely held not to give an accwunt of dispersion forces; nonetheless. the values of H assigned t)o the (it,h power can be czhosen to tit simult~antwusly the a6 initio results and the hydrophobic interactions and dispersion forces. Xote that t,erm H is of fundamental importance in t,ailoring the repulsive term. rather than simply modelling the dispersion interacation.
This is a qualitative (*heck on the validity of the results for atomic dimensions implied by the chofce of A and I? values. tvhich should give rise to equilibrium distances roughly propnrtional t,o t,he sum of experiment~al van der \\‘aals’ groups in proteins. One diffirult,y here is that untrea,t)ed crystallographic data may contain unrealistic atomic clashes or, if treated by energy refinement, may imply distances dictated by the potential functions used in t’hat refinement that would be clashed by other criteria. To some extent. these difliculties have been c~ir~umrented by a novel “atom growth” algorithm first developed by us at a (‘.E.C..A.M. Workshop. Atoms are prawn from the points at) the reported co-ordinates at a relative rate proport,ional t,o initially assigned van der i\‘aals’ radii. (:rowt,h of any pair of atoms is stopped when they are about to overlap. i.e. the sum of their current radii is stopped when t,hr sum of their radii will
rxcwtl the wparation distanw m the next inc.rrment lit radii. \VhcsIi growth of all atoms has been st’oppeti in this \vay, the average cwtact radii for each atom or grou]) (e.g. (‘H,) is csalwlated. and used as the rw\l stat of radii for a new growth c-v&,, st)arting at, the given w-orclinateh. This procws is re&atrd I’or ahout 8 to 12 c*yclrs until c~c,nvrrgc~nc~c~of radii occurs. when the results I’MI ht. shown to brx independent of the initial set of \-an de1 M’aals‘ raclii c*hosrn. A\ major justification tc,r this approa(‘h is that it can be used automatic&ally to assign atom typrs. For example. a group with a bimotlel distribution of radii should be split int’o 2’ groups I+ ith further qualiticsation of their type. Hen egg-white lysosomr. bovine ribonuclease and sperm whale myglobin (then using data in the Brookhaven I)ata Katrk; Bernstein it ~1.. 1977) wew used to obtain tht, avrrap caont,act radii reported below. which Lvertl c.otlsistvnt \I ith the results ohtained from other proteilrs. Thtb 3 IIWI trt>rt illustrate the range of diversit!- of the radii ohtainrtl (f’ol, example. there are man!’ c-loser contacts of bac~kbow groups in helix-rich myoglohirl).
3. Results The non-polar groups comprise aromatic carbon atoms without’ attached hydrogen, and the groups CH (aliphatic and aromatic), CH, and C’H, (aliphatic). They also constitute the principle hydrophobic groups. In practice, the parameters A, B and (’ for these groups cannot’ be considered independently from t’hosefor backbone atoms, since the backbone contains a CH group (CH, for glycine). while the first, CH. CH, & CH, group of the side-cxhain has int,imate interact,ions with the backbone and a pronounced effect on bac*kbone behaviour. Nevertheless. t,hey represent the most convenient, starting point, Initially, t,hese particular calibrations were also carried out using nh initio calculations at t,he 3sj2p level for molecules with methyl groups and aromatic rings, and there are also data from st,udies on end-blocked alanine derivatives (Hillier & Robson, 1979). Mulliken (1955) population analysis shows that overall zero charge can reasonably be assigned to t,hesegroups. though in individual cases there may be a slight positive or negative surplus of charge due to the electronegativities of associated groups. To some extent, these may be absorbed into the ot,her solvent-dependent parameters calibrated as described below. Only certain values of H satisfactorily shape the net repulsive cont’ribution near the minimum of the int,eraction potential. and it must be ensured that assignments of H are consistent with t’his. These calculat’ions show that, choosing A = 75 x 103 and B = 19 x IO2 for CH. CH, and CH, groups gives a reasonable fit of the energy calculated via potent,ial fun&ons t,o the rcb inifio results. The minimurn value occurs at an interatomic separation of 3.9 A. whicah is consistent with the values obtained below by ot,her methods. The value at the minimum is -0.18 kcal mol-’ (atom pair) less than the energy at infinite separation (classically zero, but Zij in t,he
Re$ned
Models for Protein
zero-minimum potentials). which is also in excellent accord with the values det,ermined below, at least, for aliphatic groups. Subsequent’ly, nonetheless. me assumedthat A and R for non-polar groups are best calibrated against empirical data, providing this still gives a fit to the ah initio data within the accuracy of that method. R,ecall that the basis of the calibrat’ion against empirical data was to assign A, R and (’ such that the hydrophobic interact’ion is modelled. More specifically, the parameters are chosen such that. for sidrh-chainsburied in the non-polar interior of a protein: their energy of interact’ion wit’h the rest of t,hr protein will be equivalent to the experimental transfer fee energies of the side-chain from water t,o non-polar solvent’. Thus, strictly, the int,eraction energy of these groups is a free energy rather than an int,ernal energy: it includes dominant entropic contributions from the ordering of water round the non-polar groups in the transfer experiments. A% hydrophobic* side-chain in a protein has many int.erac%ions with the surrounding protein at,oms. From the outset. we did not expect t’he energy of each pairwise interaction between united at’oms CH, etc. t,o be very large: it is the sum over all these int,eractions that gives the large values chara,ct.eristic of hydrophobic interaction energies for whole groups. These pairwise contributions to the net transfer free rnerg,v of hydrophobic interaction energy are taken t,o relate t,o the equilibrium energy ti$. i.e. the lowest’ energy obtained by varying the int’eratomic separation. With neut’ral charges on the groups, the potent,ial functions cont,ain only trtrms .4 and IZ, and for the 9-6-l potent’ial. functions A and R are determined by the equilibrium energy e$ and equilibrium distance r* through: .-lij = -
2e$(r$)9
/lij = - 3e$(rfj)6.
(9) (10)
The choice of e* was made as follows. Data for transfer free energies of hydrophobic side-chains were collated from various sources; namely, from transfer from wat’er to ethanol and dioxane (Nozaki & Tanford, 1971). to chloroform, carbon tetrachloride. butyl ether, isoamyl alcohol, n-hexanol. and I-octanol (Nandi, 1976), and from the surface tension studies reported by Bull & Bressie (1974). These all give fairly consistent results over a narrow range for the hydrophobic side-chains. The equilibrium energies for the potent’ial functions of C (aromatic), CH (aliphatic and aromatic), CH, and CH, were then adjusted to bring the average interaction of each side-chain t’ppe with t’he rest of the protein (lysozyme. ribonuclease and bovine pancreatic trypsin mhibit’or) into the range of the above experimentSal t)ransfer free energies. To evaluate these hydrophobic “energies” within the protein one must have at least some preliminary estimate for r* as well as the trial values of e*. We initially took r* to be the sum of the radii for each
EngineerirLy
265
specific pair of groups after a converged series \)f growth cycles by the atom growth method (see Theory and Slet,hods). The equilibrium distance /.* is then either the sum of the mean radii for each interacting group (CH, etc.) of that type or the int.eratomic distance if less than the sum of thesr, radii. With this approach, it was found reasonable within the tolerance of the fit to treat aliphat’ic groups (‘H. CH, and CH, as having P* = -0.18 kcal mol-‘. and aromatic (’ and CH groups as having P* = -0.08 kcal mol- ‘. Then. for whole hydrophobic side-chains. thr average inter,action energy of the side-chain with the rest of the protein is consistent with the experimental ranges of the transfer free energies of these side-(*hains(see Table 2). Kate that int,erartions with polar groups were included in this study. and t,hat the values of P*. the int,eraction energies between polar and norIpolar groups, were evaluated by the geometric mean method (see above). The arithmetic+ mean method gives similar results. Sate also that solvents used to obtain the transfer data had some polar character. as does t)he protein interior. Preliminary estimates of r* provided by the atom growth st,udy (see Table 3) were used as a starting point for the penultimate phase of the refinement of potential functions. The radii for aliphatic CH groups lie in the range 1.81 to 1.87 AA(m>-oplobin = 1.X.54, lysozyme = 1.87 A, ribonurlease = 1.81 -4). For (‘H, and CH,. the range is I .75 IO I.80 .% (myoglobin = 1.80 A, lysozyme = 1.7ti .r\, ribonuclease = I.75 A): inspection shows that a (‘loser approach for CH, groups tends to occur. in that these can rotat)e and int)erdigitate the hydrogen atoms. Aromatic CH groups yield radii iu the range 1.X2 to 1.X5 AA(myoglobin = 1.82 -4. I,~~;r.yrne = 1.85 ,i, ribonuclease = 1.8%,!I). For other atoms. comparison of twice the values of radii (the contact distance) a-ith the equilibrium values obtained b>Hagler et al. (1971) from fitting crystal data shows that. for atoms. t)he sum of contact radii is typicall> less t,han the equilibrium distance: this \vas noted by Hagler of nl. (1974) for the situation in t’hc crystals they investigated. It implies a slight compression due to long-range net attractions, primarily electrostatic. for the systrm a~ a whole. Table 2 Transfer free energies of side-chains interior as calculated using current functions, compared with experimental energiex from water to various organic the text) x- I
(‘aldated Sidr-chain Ala \‘a.1 I&U Phr Tr p
of
(kral
mol-‘)
0.7 2.2 2.9 2.6 3.3
to proteirz potential transfer free soltlunts (see
--__-----
-I_-
Table 4
Table 3 Quulity
aan der Wuals’ contact radii of atom,,s and ysoup~s cnlrulated from contact distances observed in protri?js (see
(‘H (aliphat,ic) (‘H,/(‘H, (‘H (aromatic) (’ (carbonyl) () (carbonyl) SH (amide) 0 (carboxyl) ?iH, SH, (arginyl) OH S:SH
of
potential functions
ro~fo,rm,ntion-depe7Lderit
the text)
Myoglobin (A)
Lysozymr (4
1.85 1 .no
I.87 I.76
1.84
I .H.i
1.66 1.57 1.57 1.79 I.96 I.81 I.78 1.I)8
1.66 I .60 1.65 I .6:3
I .97 1.94 1.71 I .63
Ribonuclease (A) I.81 I.75 1.82 1.67 1.58 1.64 1 75 2.09 1.80 I.81 I .TO
A.
LVucl~ar mqnetir r~sononcv N-ncetylalunyl-N’-mrthyl~m~~
vicinul (Ii-)
of obigo-
coupling
(‘sing Karplus rsing relation Using relation Experirnent,al Possible value
relationt of Kystrov rt rtl. (1969) of Ramachandran rl IL/. ( I97 I )
IS. f’hnractwistir unbmnrhed
ratio tri (‘O
Note. the nature of thr algorithm leads to protein-dependent overestimates of the contact radii of predominantly surface groups. Other variations from protein to protein are in some sense “real” but dependent primarily on the secondary structure wntent and secondary structure packing arrangements. H atoms wre not distinguished, since their co-ordinates are only inferred indirectly from X-ray data for non-hydrogen atoms.
Dipole
of random
coil
74.5 -i’!Ni 7.H-L 7.8 *0+ 0 II
polyprpfidrs
with
sidr-chaitrn
range
mornent,s
J, /;)I
c’onstaul
13.1.5 .i II :! ‘Xl (debye
units)
of r>,L-alanine
oligopvptidrs
C‘alculated 25 26 31 35 30 34 39 37 45 35
1.1) 1.1. l)l.ll OLL III.“,. I.I.11 I.OL1. ,.,.I. 1lLLI) LLl)l,
Experimental 27 29 34 36 37 :37 30 40 II 41
ItOLL
43
44
I.i.LL)
46 47 49
45 48 4%
1~1.1.1. I.I.LL D.
II rid
range
(‘alculatrd Experimental Possible value (‘.
For example, the radius of a carhonyl carbon is 1.66 Lk and the contact distance thus 3.32 ,q. compared with the equilibrium distance of 3.6 A. The optimum scaling factor that would ma,ke the contact distances consist’ent with equilibrium distances, applied t’o the non-polar groups CH. etc., gives a final value of about 3.9 L% for aliphatic (YH and CH, groups and aromatic CH groups. This is in accord with the nh initio result,s (see above). The problem of how to treat the interdigitating CH, properly is, of course. a problem that’ arises from t,he use of a united atom rat’her than separate treatment of C and H atoms. This problem is less serious. however, because of the use of the softer nint,h power functionality. That) is to say. if we also take r* = 3.9 A for this group, the energy is still negative (in the classical type of potential, and less I ban Zij in the zero minimum potentials) down to 3.4 ,r\. Further, the floor of the minimum in t,he f’unction varies by only about 0.01 kcal mol-’ from 3.7 t’o 4.0 A separation. The final precise assignments of parameters t’o these groups were sought) that’ give a good account of random chain behaviour of polypeptides. which will more closely resemble t’he behaviour during folding. (‘sing the statist’ical mechanical techniques dr>scribed by Flory & Schimmel (1967). the dipole tnoments of 14 D,L alanine oligopeptides were calculated, and, using t’he method of Brant & Flory (1965). the characteristic ratio of long chains of homoalanine was calculated as a model for honlopolypeptides with non-polar side-chains unbranched at Co. At the same t’ime, the nuclear magnetic resonance vicinal SH-CH coupling (Bonstant, was evaluated for S-acetylalanyl-LV’methylamide using the Karplus-like relationships (Karplus, 1963) between @ and coupling constant described bay Ramachandran et al. (1971), Bystrov vt nl. (1969) and Cung et al. (1973), these being
In calculntiug
properties polypeptidrs
Propwtius
of hydrogen
bond
Equilibrium energy and NH distance Calibrated from amide crystal data Calculated from present functions E. C’onformution and energy of N-nc~tylnorLa-alanyl-N’-mpthylamide
Energy = - 13.10 kcal mol- ’ extended struct,ure (4 = -75”, t See Karpius
UC’
-2.9 -2.9
cg most
stable
relative to $ = -47”)
kcal mol-‘: kcal mol. stereoregular C/J = -57”.
most,
2.1 ,l ‘: 1.1 A ronformcc -17”
stable
near-
(1963).
averaged over all conformers with statistical weight e -Us” in molar ( E is the energy of each conformer units, Zc is the gas constant, and T is absolute temperature). As described below, the parameters both for CH and CH, groups and those for the other at’oms were varied so as to reproduce maximum agreement with experiment. The results obtained with the optimal choice of parameters are shown in Table 4, and these results indicate tha,t the optimal value of r* for the interaction bet,ween aliphatic groups is 3.9 A, consistent with the more qualitative reasoning based on the use of the at,om growth method shown above, and with the a,b initio results. (In) Peptide
baclcbone atoms
The backbone at’oms comprise the S, H. C‘ and 0 atoms of the peptide group. plus the CH group (for glycine. CH,).
Refined
Xodels
for Protein
Since the ab initio (3s/%p) level calculat’ions have been shown to be in good accord with t’he potential functions given by Hagler et al. (1974) when applied to glycyl (Robson et aZ., 1978) and alanyl (Hillier & Robson. 1979) derivatives, in principle t)hesewould represent a suitable choice. However. the ?i. H, ( and 0 atom assignments must be shown to be consistent with the united a,tom CH/CH, groups. Also. in practice, flIrther refinements are indicat)ed for our purposes, as described below. partly relating to the use of rigid geomet,ry. R’ecall t,hat the paramet’ers given by Hagler et al. (19i4) were used as a starting point for an initial root-mean-square fit to the above ab initio potent,ial surfaces. l’nder the further constraint of needing to fit’ the experimental peptide group dipole moment of 3.7 deb?e, the partial charges C are unchanged b\- this refinement, and there is no justification for t,he alteration of assignments to .-1 and H parameters of the (’ and 0 atoms of the carbonyl group. However. it was found that a significant enhancement of all energies below 8 kcal mol-‘, t’o agreement within 0.5 kcal mol-‘, could be obtained by modifica,tion of the peptide nitrogen and hydrogen paramet,rrs. Hillier $ Robson (1979) noted that a point of significant disagreement did exist in relation to the potential functions given 1)~ Hagler it nl. (1974) for hydrogen-bonded regions, and particularly in relation to the S . HK hydrogen bond and the N . N overlap that occurs when the group N-CH-C’-N’ is &s. At that time. it was proposed that the ab initio calculations overestimated hydrogen bond strengths, and this seemed very reasonable in terms of t,he need properly to treat back-polarization and the proper need for computat’ionallv expensive double-zeta basis set,s in the ah iniiio calculation. However,
267
h?~gineering
inrreasing the van der Waals’ repulsion on the amide hydrogen and decreasing that on the amide nitrogen was found in this st’udy to enhance agreement with ab in&o to t’he level described above, while maintaining the same XH OC hydrogen bond strength as calculated wit#h the original parameters given by Hagler et al. (1974). which fitted well the original crystal data. It is emphasized that the merits of this adjustment’ are, in the present status of the art. a matter of opinion. Comparison with experimental data from studies of peptides in solution (Robson et al.. 1979) is complicated by the fact that valence angles can open further to relax a cis interaction. For example, in SCH-C-S t)he N-CH-C’ and CH-C’-N angles can increase to relax the N . K overlap. The agreement with experiment can thus be obtained either by including functions for valence angle bending (Robson et al., 1979), or by the above adjustment, which still retains agreement with ab i/LitkJ results and the original NH OC hydrogen bond strength calculated for peptide crystals. If fumtions for valence angle bending are included after such an adjustment, there is litt’le enhancement of agreement, since there is less of an ?I’ . N overlap for variations in valence angles to relax. i.e. valence angles depart’ only slightly from their equilibrium values. Identical conclusions are reached if individual C and H, rather t,han united (‘H/(‘H,/CH, atoms are used in the calculations. We conclude that the required conformat8ional behaviour can thus be obtained whichever way the treatment of the NH parameters is just’ified and this suffices for present purposes. One indirect benefit of this adjustment to NH parameters is obvious: it circumvents the need to trea,t valence angles as variables in a protein-folding simulation
Table 5 !I--&1
Name of atom type (‘I (‘2 ( ‘3 (‘-. A0 AI (‘A (‘Ii
N’ x+ S*
H’ H’ HO 0 0’ or
empiricial
potential function using the geometric Description of atom type
Aliphatic CH Aliphatic CH, AliDhatic CH, C ok carhonyl (’ of coogroup Aromatic carbon Aromatic CH Histidine C-3 Histidine C-2 N of -NH N of -NH,. -NH: N of proline Sulphur of Met and Cyn H-bonded to N’ or N+ H of -NH; H of hvdroxvl 0 of h;drox;l 0 of carbonyl 0 of coogroup
used in ca~lculationx ou peptides mean constraint of Aij etc.
ri
B
75.142 75,142 75,142 12.470
1900 1900 1900
r* Glp) 3.9 3.9 3.9 3.75 3.75 3.6% 3.9
and protein
E* (km1
mol-')
(e.11.)
-0.18 -0.18 -0.18
0 0 W-L6 049 0 0 0.39 0.39 -0.26 - w.52 0
12.470 17,080 33.396 17,080 33.396 15,037 15037 15:037
355 355 540 x45 540 x45 ti28 628 628
3.62 3.9 3.3 3.3 3.3
-0,042 - 0.042 -04X -0.08 - 0.08 - 0.08 -0.162 -0,162 -0,162
104.X.57.6
2457.6
4.0
-0.20
0
2.2 8 2.28 2.28 3.65 3.65 3.65
-0903 -0403 -0.003 - 0.20 - 0.20 - 0.20
0.26 0.33
9.99 9.99 9.99 45 770 451770 45,800
1.26 1.26 I.26 I410 l-110 1410
0.3 -0.3 - 046 - IO4
4 :/=\ !.+ 0
Re$ned
Models for Protein
as to model the effects of dielectric and count,er ionic screening. We have elected to employ the latter. having noted that this does not necessarily imply large changes to individual partial charges or. surprisingly. to the net dipole calculated for tht above groups. It is possible to assign partial charges t,o satisfy the following constraints, which we considered to be appropriate for all or most contexts t,o be cncbountered in folding simulations. (1) Partial charges assigned to lie between those for the corresponding atoms in the c.ase of the neutral group and the fully charged group. (2) Net charge of about +O.j e.u.. such that. for two rharyed groups at) a distance. this models t.he use of t,wo fully cshargedgroups wit,h the vu&mar\ effect ivr dielectric constant of about 1 for the intervening medium. Lye note also that bringing a unit charge towards I he model systems with associat.ed OH or H,O+ ions produces a rouphl? 750 reduction in the electrostatic interaction. ’ 0 which is consistent with it choice of about 1 for the dielectric* constant. (3) Partial csharges on the hydrogerl-bonding at,oms such that at close range, Arong electrostatics int’eravtions oc(*ur between charged groups. which approximates the electrostat,ir interact ion for the fully charged groups. (4) Partial c*hsrges huch that the csalculated dipole moments agree as closely- as possible with extensive dat.a for the neutral forms in organic solvent,s. This ensures sensible interactions between st.rongly polar groups when transientl>- or l’ermanently buried in a foldiq simulat’ion. The result’inp net charges and dipole moments are shown in Table A. Note t,hat the neutral form of hisMine is used in our calculations. though the assignments of partial charges consistent with nh iflifif) c*alculation still allow strong hydrogen bonding 1,. the histidine ring. (d)
I,/trirrsic
rotational
potentirrls
ln our view. inclusioil of an intrinsic rotational potential as funct’ions of rotation round single bonds should IN, introduced only as a last resort. since often t’hey seem onty to correct deficiencies in the interatomic* param&ers. Xo such correction
Engsineering
269
seems to be required in t,he present case for backbone @. ‘I’ angles. In other cases. dc+icienc*irs can be corrected bv inclusion of norI-core orhit’als as ventres of intera,&& in the calculation (Watt t+ trl.. 1981). This criticism is less justified here. however. because of the approximate representations used in some cases.and this is part,icularly I rue fi)r rotation round a,liphatic CC’ bonds. where hvdrogen atoms are not explicitly represent,ed. In pi\ratlleterizatioil of a detailed representation for energy refinement. to be reported elsewhere. this problem llas been resolved by an orbital force tield approacah. However. there are undoubt,edly some special quantum rnerhanic~al effects that must be COIIsidered as justifying int’rinsic rotational potentials in side-chain cont)exts, as in the rotation of the bond to a carboxy group (see below). The intrinsic rot,ational potential for the aliphatic c’-(’ bond was developed to reproducae the barrier heights and relative energies of minima obtained experiment~ally for butjane (Momany it trl.. 1975). the rotational pot’ential again being considered as a corrective factor for t,he barriers c~al~~utated wit,lr the C’H, and C”H, group parameters. Thrb optimal intrinsic, rotational potential is: E.,,), = ( lr3)( 1 +ros 38):“. wit II
I’, = 3.8 kcal mol- ‘.
and gives the agreement betwetkn theory and experiment shown in Table 7. To consider C”OO- and (‘O?r’H, groups. we here rnake reference to the aspartate and asparagine side-chains, though of course similar arguments apply to glutamate and glutamine side-chains. Here there is a rer?- interesting problem, \vhic*h should apply t’o most empirical potential functions and which is often neglected. The distribution of x2 angles for aspartat)e and asl)araginc residues obtained from S-ray crystaltographic studies on 19 proteins is caentreti as x2 = 156” with a standard deviation of 38” (Janin et nl.. 1978). However, empirical calculations using the parameter set discussed so far give a minimum at’ x2 = 90”. with a barrier of approximat)ely 51 kcal mol-’ at x2 = 0. The empirical energy of conformation x2 = 155” is 3.9 kcal mol- ’ above the minimum.
Table 6 Elwtrostutic
(hup
properties
of highly polar side-chain partial charge assignments
Charge on H-bond donor/acceptor
(11)
Charge on covalently bonded atom
an.d terminal used
groups
with
thp
Dipole moment (debyes)
Set charge
(‘arboxyl Aminp (NH:) Aryit1yl
-048 +0.X3 +0.26
(0) (H) (H)
+0-N -0.52 -0.52
(( ‘) (N) (N)
- 047 + 047 + 046
Histidyl ring Prpt ide
+0.26 f0.26 -046
(H) (H) (0)
-0.52 -0.26 046
(N) (N) (C’)
0.00 0.00 040
(‘dc.
Expt.
Thr “parameter set” used here &g-Et (kcal mol- ‘) (t-g) energy barrier (kcal molt ‘) (g-g) energy barrier (kcal mol- ’ ) (‘-(‘-(‘(’ torsion angle (deg.). q (‘-(‘-(‘-(‘ (deg.) t
I .35 3.H.i 12.1tit
$1 1HO
t. truns conformation: Et, its energy. q, guurhr conformation; Eg. its energy. t If required, this barrier height, may be brought into experimental angles for (‘Y-C. The discrepancy, of less importance in the present refinement. t,o be reported elsewhere.
Clearly, the experimental and empirical calculations disagree (see Table 7), and we proceeded as follows. The CH,CH,COy H30+ system was taken t,o represent the aspartate residue (i.e. the peptide backbone being represented by a methyl group). 4h ir~itio calculations were performed on CH,CH,COi H30+: an extended basis set using t’he (‘is, 3~) orbitals given by Roos & Siegbahn (1970) contracted to 13s, 2~1 on C. 0 and the (4s) orbitals given by Dunning (1970) contracted to [2s] on H were used. We assumed standard CH, CC and C=O bond length and bond angles with HJO+ positioned as in (‘HJCO~H,O+ (see Fig. 1). Only the rot’ation about the x2 angle was considered. Therefore, x1 was fixed in a staggered conformation (i.e. H-7-C-2-C-l(.’ = 180”: see Fig. 2). These ab initio results predict, a minimum at x2 = 30”, 1 SO”, 2 10” and 330”, with a barrier of I .48 kcal mol - ’ at xZ = 90”, 270”, and a barrier of O-044 kcal mol-’ at x2 = O”, 180”. Thus, the nb in&o calculation agrees reasonably well with the experimental observations. Reasonable agreement between the ab initio and empirical calculations for aspartat,e can be obtained if a 2-fold intrinsic rotational potential of the form: $
&OR =
with !I’~ = 6.55 kcal moltion with
the interatomic Lp
(1 -cos
2x*).
‘, is included potentials:
= E,, + &-OR-
in conjunca-
(13)
see Table 8. Intrinsic rotational potentials for xZ rotations of aromatic (including histidine) side-chains have been caonsidered using ah initio calculations on model systems such as CH,CH,&H,. and comparison made with experimental data. To a first approximation. the interatomic potentials give an adequat’e account without need for an intrinsic rotational potential. Further refinement is being carried out for potential functions using explicit hydrogen atoms, but a more detailed treat,ment is not,
range by choiw of’larger va.lerrvr context. disappears in an all at.om
justified in t,he present chntext. From the point of view of bot,h quantum mechanical studies and inspection of experimental protein co-ordinates. however, the intuitively expected gross behaviour based on the use of abom-centred functions is realised. as opposed to the more complicated behaviour observed for COO and CONH, groups as described above. Arginyl and lysyl side-chains have also been considered in preliminary studies as CH,NH . (‘(NH,); and C‘H,CH,CH,NH: analogues without hydration. Nonetheless, there is no evidence for anamolous rotameric behaviour. For the intrinsic rotational potentials applied in these cases, see Table 9.
(e) Ability of potential Pxperimental
functions to conserrr conformations of proteins
This is a classic: test area, nonetheless full of pitfalls (both technical and fundamenta,l) for the unwary. Some discussion of these is thus provided in context) with the t)ests and results, as well as in the Discussion. A given set of potential functions would be highly unsuited for folding simulations if they could not lead to a tolerable result starting from the reported experimental conformation of a protein. One way to test’ t,his is t’o use the pot,ential functions for the “energy refinement’. of reported co-ordinates. A zero deviation bet,ween the reported co-ordinat,rs and the co-ordinates after “refinement.” is not expected because of experimental errors in thrl reported co-ordinates, and any special (*hoice ot potential functions used by the crystallographer to refine them, and because the conformat.ions of sidr-
W A,+/
‘“‘c+-, 0 H’ Figure 2. Hydrogen-bonding arrangement ab initio calculations on CH,CH,CO; (‘V-C
-0
= 180”.
\
used in the H30+ with
271
ReJined Jlodels for Protein Etagineeriny
Tabie 8 Confovmational
energies of CH,CH,-CO,-
H,O’
(empirical
and
E -v 1., = 6.5<5 (kcal
4,645 4,325 3473 2.102
30. 150. 210, 330
4.5. 135, 225. 316 60. 120. 240, 300 90. "70 + 3b initin
0
energy
at x2 = 30” is -342.0537736
atomic
chains at the protein surface in t,he crystal are particularly likely to deviate from t,hosr calculated or observed for an isolated molecule. Once the calculation, starting from the experimental conformation or by folding from an open chain, reaches the native minimum, a more refined calculation involving gradient minimization with all degrees of freedom (including bond lengths and valence angles). or molecular dynamics simulation could be used as a final pass. For present purposes, a degree of agreement between the model structure minimized from the experimental st’ructure and the starting experimental structure would certainly be acceptable if equal to, or less than, the error in the experimental determination. Further root-meansquare deviation of backbone atoms of lessthan 3 !I between an experimental structure and a predicted one (from a folding simulation) would be considered a significant achievement in reproducing the principal experimental features of the chain fold. Generally speaking, in starting from the native structure. we would consider an agreement of about 2 i! r.m.s.t as highly satisfactory a priori (the actual results are very significantly better) noting that our requirements, as indicated above, are less stringent than those of the crystallographer interested in providing information in intimate detail. The price paid in using the time-saving rigid geometry approximation. with it,s major benefits in t .Abbreviat,iorl used:r.m.s.. root-mean-square.
(kcal
mol-
I)
0.044 04 19 WOO6 0t 0.161 0.628 I.477
0,301 0,794 1475
0.957
initio)
Eobinirio
mol - ‘)
0.039 0.008 0 0.035
j.115
0. I x0
If,. 165, 195, 345 20. 160. 2OU.340
ah
unit<.
terms of conformational speed and ease of use with global minimization procedures, is the technical difficulty that the geometry chosen in the model (bond lengths, valence angles, dihedral angle of peptide group, etc.) may not correspond exactly to those implied by experimental co-ordinat,es, and one cannot simply “start” from t’he experimental st.ructure. It is pertinent to recall t.hat such details are below the level of det,ail revealed by the electron density map and t,he co-ordinates quoted by the crystallographer may have involved a fit to the data using his own standard geometry, or an energy refinement in Cartesian space using a chosen set of potent,ials. That is, the geometric details implied by the co-ordinates largely reflect, again at t,his level of detail. the theoretical technique the crystallographer has employed to refine his data. Be that as it ma\-. the practical difficulty arises that it is generally impossible precisely to construct the experimental struct’ure with the model used even before energy minimizat’ion. This is a purely technical difficulty. because any technique such as energy refinement’ in Cartesian space or molecular dynamics applied in the same spirit to test potentials suffers from a comparable problem in that there is an implied equilibrium geometry that may not be consistent. One method t’o circumvent this is of course to set up the model with the bond lengths. valence angles, et’c. consistent with the initial experimental coordinat,es. This is not recommended. since t,he potentials have been calibrated for w specific
Table 9 Intrinsic rotational potentials (IRPs) used in this work 1.,t (1
sit
Nit 0
0
3.i
+ 1
3
li-55 6.i 0.6 3.4 I.9 I.8
-1 - 1 + 1 - I fl +1
2 1 3 2 3
3
Torsional
angles applied
to all side-chain
angles
X,ofH,Y,F.W;X,ofR X, of V. L. I, S. T. D, E. S, Q. K. H. R F. Y. W. M. (‘: X, of Id. I. E. Q. K. R. M; X, of R, K: X, of K X, of D: X, of E X,ofN:X,ofQ X, of S. T X,ofY X,ofK X,ofM
..
+ JJWs have hriyht.
the form
‘i (1 +S, cos SiBi). where E;,, = T-T
S, is thea phase of. and -Vi the number
of minima.
Bi is the torsional in the IRE’
angle:
I’, is the harriet
sta,ndard rigid geometry. and t)f?f’ilUW this docv not allow one to test a general model. Ii is wart h noting. however. that this gives for twvillcb pancreatic trypsin inhibitor on minirniza,tiotl. ;I 0.5 A r.ni.s. agreement for all atotns iLlId 04 X for, backbone atoms. This distinction is made be~usv it1 ca,lcula,tions 011 an isolated molecult~ the irttcv molecular interactions in the crystal are not accountrd for and the conformat)ion of surfa,ce sidcchains may. in particular. varv with the envirollmelrt. In practicr. the use 2 a miiiimizatio~~ itr ditrc&al a~ngle spa(~e first requires \-et’!- acq*urate specification of bond geometry consistent wit Ii t tlv c~sprriment~al structure and. for anything other t ban a small prot)ein. a fitting proc~edure is rfyuirecl (WC belo\\-). =\n alternative approach is the customary one. to c*arry out a minimization from the nat i\-cl c7mformat ioti by minimization in (~‘artesiatl spacv. This requires all degrees of freedom, anti a valencv forty field must be implenient.ed: i.e. paramctws for thv distortion of bond Iengt hs. valen(~t~ angles. and dihedral angles f‘or hnds of‘ at If?itSt partial double bond charac+t’er. from equilibrium \-dues. Since the present pot~ctiiials ivere refined specificall~~ ft)t. the rigid geometry apl)roxitnatior1. no siic+ vnlvnc~r fot~ field exists for the present potentials. Indrrcl. b>, definition. no valence force field (‘au 1~ absolutely caonsistent with such pot,entials. HOW vvttr. it is of cotisidvrable interest to ask just ho\! sensit.ivc the rnodrl is to addition of’ a I~t’itSOllilt~lt” f01~c~etield Michael I,evitt has kindly provided an indepem darit test of’ our potent)ials on bovine pancreatic, t.rypsin inhibitor and. following correc+ion of ;I deficiency. he noted in our assignment of sulphur atom potent~ials. noted that minimizat.iotl in (‘artesian space produced a I .2’ A4 r.m .s. deviation from the rcpo’ted co-ordinate using his measure t ban ArX.3 (Levitt, 19X36). Th’ 1s is less favourable ttrts value of 0% A obtained using his o\vn flcxiblr geomet’ry potent’ials at that time. but is rrmarkabl~~ good considering t,hat our potentials are refined for use with rigid geometry. The above result is t hlls surprisingly f;lvourablr when compared wit Ii t trc It~vtd of agreement obtaitrcd using a trunib(Jr of’ rc’})ortetl potential furic+on sets in itlt t~;ll’liC~t’ vomparal i\r st utiy (see Tabto 6 of I,tJvitt, 19X3~). for this totcrablt~ Icvt4 of The explanation agrcvmt~rrt is like]\- to be that our noti-l)ontling ititrratomic pot)entials lie largely within the range of variation implied by most other pot,entiat fun(s1ion sets c~alibrated or chosen for use \vit h a \.alcnvt~ forvc field. I ti gross t.erms, they represent no \vorse a selec+on of non-bonding potentials than those currentI\in use with a valence forcae field. Howt~vrr. \ve would expect improvement for minimization in (‘artesiatl space using a fully caonsistrnt scst. iis indeed I,evitt observed (SW at;ovtx). The use of’ su(bh a procedure i,s not rc,c,olnn~c,flcled I)y us for a reason quite distint*t from c0nsistel;c.v. I’s? of a \alenctl force field arid minimization 1n (‘artesian space c~urrently demands. in practiw, a
yradietlt tnri hod. That ih it “t.eI;i~(v,‘~ 1hft flr$~lcbr!i ,~ti energ!. nlinimization to the wares1 II)(YII mtiiitI1iitt1. This ma!- IN, separated frorrr neighbourittg. titv~~~r minima b\. 10~~ cntvgy l~at~ric~rs. \vtlicntr ;tt’,’ I;itttLl t (*alty fac~ilt~ ill the IWII protclin. In oII(s s(‘n5t’. ttl(, tthchniquta dcliberat.ely biases the rt~sult to Iit* (~Iosic~ to the starting point antI t bus dot3 not it1 oitt’ \itau. alone cotistituttb a st r’itigt~n1 tt,st The
rnos1
t~rc~orlirlietitl(~tl
prwduw.
t,ot II
fi)r,
tc4tig t tte potentials anti for first strl)s in I)rottbitl engineering c~al~~ulations. suc*h as for predict iota of’ the eff‘rct of sequent tnodific~atiotr, is as follows. Tnitially. the ttistancaes /Iii I)(‘t\v(ht>tI 1 tie atonis i iLIl(l ,j in the given esprritnental st tvct urc ;IW t~v;~luattyl, The modrl is presented to I he c+otrrputvr it1 an>. conformat ion. and ttic fimct ion: cxptnl jyfit
=
c!!& ij
modrl ----ctij
z ~~ !-
II
is minimized as a tunct,ioti of 1 he dilldrill snytes of the model. Here. t/ is the number of interac+iolis: fi)r S atoms. tl = N(X1)/g. This represents the fitting procedure before minimization. and ittiposes t trc closest possible approximation of the c.otlforrlrat’ic,rr of the cxperirnental strucature on the c*otrformation of the model. TJ-pica.1 minimal values of I+‘,~, art about 04M. more convenient,l?; re-expressed it is ~rJn.s. = ~%pt(I”,,,) as F,,,,,, = 0.2. However. typically necessary to carry the degree of fit down to only about F,,,,,, = 0.5 since. when energ) calrula~iions are carried out. only minor movements from the experimental st,ructure result. The best possible degree of fit depends on the differences in the geometry used in the mode1 and as implied in the experimental co-ordinates. The minimization of Ffit uses a program identical with t,hat used in t,he energy minimizat,ion, save that t hr energ? evaluation function is replaced by the fit,tinp fimction (actually. t,his is an option in the same program); that is. it uses a simplex corn bined with zero minimum procedure. It is a measure of t,tie power of the procedure that it is possible to start the mode1 from an extended chain. Howevrr. this is time-consuming and in routine use it is desirable to start from the dihedral angles calculated from the rxperitnental structure. It is not,rworth\,. tiowevrr. that’ in t,he normal case of the inc~onsistency of bond geometry between the experiment~al and model conformations, t’he starting c~onforrnatiori before fit,ting wilt generally have IIO obvious similarity to the natjive structure. For this reason, a pobverful minimization procedure is required and a gradient met,hod would give problems. as several workers including (‘. Sander and Al. .J. E. Sternberg have communicated to us. The model conformation obtained by t,he above fitting procedure is now subjected to energy minimization in dihedral angle space by t’he STMI’LES and the zero minimum method. In practice. t’he overall operation may be speeded up by using a weighted sum of Ffi, and energy from t’he outset. phasing up the energy contribut,ion and
Refined Models for Protein Engineerirlg phasing down t’he fit contribution for a further minimization each time convergence is obtained. On minimization of avian pancreatic polypeptide, a local minimum with F,,,,,, of 0.41 A for the nonhydrogen atom (0.29 ‘4 for the C” backbone atoms onlv) was obtained. This minimum is akin to a local minimum obtained by a gradient minimization, i.e. no kinetically facile energy barriers have been crossed. However, on further minimization using SIMPLEX, a deeper energy minimum was obt.ained wit,h b r.m.s. = W6-2 A (0.43 ,% for the C” atom). This relaxation on using a powerful energy minimizat’ion technique capable of crossing kinetically facile energy barriers is common to other potential functions. However, on much more extensive global minimization, a lower energy open structure would be obtained. Onlv by using a suitable model for the solvent. can this deficiency be overcome. This model will he reported elsewhere. (f)
Further
tests
and analyses
Many studies are being carried out to validate the potentials. AMost are major computational projects and will be reported elsewhere, but some particularly relevant preliminary results are described below. Also considered here are a number of tests. because they individually provide a weaker basis for calibration (though together t,hey provide, as tests. a significant weight of support and a deeper understanding). This is because they would provide redundant information for calibration, involve data of a more qualitat.ive nat’ure, or involve similar but not identical potential functions. The latter is because various studies have been carried out in the course of the calibration. in some cases to test the consequence of varying parameters. Of particular interest is the relationship to t’he parameters given by Hagler et al. (1974), which formed the st,arting point for calibration. Following some early studies bet,ween our laborat,ories. both these paramet*rr sets in the context of an overall force field for molecular calculation have diverged considerably in our halids and t,he hands of Dr Hagler. The difference is primarily because, in the cont’ext of this work: we are interested in the validity for rapid calculation of (1) intramolecular interactions in aqueous solution and (2) the rapid rigid geometry approximation. Hagler. on t.he other hand, is primarily interested in applicat,ions to molecular dynamics, which properly demands full flexible geometry. In principle, any set of potential functions could have served as a st,arting point for caalibration. The merit of the earlier set of Hagler et al. (1974) is t’hat it described intermolecular int’eractions between essent’ially rigid peptide-like molecules, and is thus free of any further treatment to consider the question of transferability to intramolecular calculations, giving us a freer hand to explore the validity of the choice of rigid geometry. Moreover, being calibrated to reproduce the crystal structures of peptide-like molecules and
173
the heats of sublimation of the crystals. the parameters given by Hagler et al. (1974) input informat,ion about the forces in uacuo, and to this contribution we could adhere where appropriate. Specifically, the problem is that it is fundament,ally impossible to t’est, as a whole set the potentials described here. by reproducing heats of sublima tion. This concept has been replaced by that of transfer free energy between a protein int’erior and aqueous medium. What we have done is ensure that. important polar group interactions considered in the original calibration by Hagler et nl. (1974) are reproduced quantitatively, Though water has a role here. it is this model which will be added to the present model as discussed and to be presented elsewhere. Thus. except at very short range, the function of energy with distance of t’he interact’ion CH,?U’H OCCH, when calculated with the presentrd functions (and a dielectric constant of unit! for the electrost,at,ic interactions) reproduces that, calculat,ed using the potent’ials given hy- Hagler et nl. (1974). This seems remarkable in view of the fact, that the amide group parameters are quite unlike those given by Hagler et al. (1971) but it may be recalled that. there is a “slackness” or ambiguity in the assignment such that decreasing the nit,ropen dimensions and increasing the hydrogen ditnensions allows similar results. Also recall that our use of high-grade nb irbitio quantum mechanical dcul;~tions would appear t,o resolve the ambiguity in favour of the amide group parameters reported here. Nonetheless, we acrept that the quantum mechaiiical calculat,ions. however good t,hr stat,e of the art (and even allowing for the fact that we show convergence by t,he quantum mechanical ra.riation principle on ext,ending progressively the basis set). cannot be accorded the same status as experiment. The question of validity of t’he rigid geometry approximat’ion and the nitrogen dimensions arises in cotmrct.ion with the S N int.eract.ion in S-C’“c”- S. The tip energy barrier appears too strong in the potentials used by Hagler et trl. (1971) if rigid geometry is assumed. Quit’e properly. Hayler has argued that this shows t’he need for flexible valence geometry. For our present purposes, lion-ever. the potent,ials used by Hagler et nl. (1974) are unsatisfactory: they do not model a hydrophobic effect. and they do not reproduce the database employed here so well in the rigid geometry approximation. Most seriously. the characteristic ratio for polypeptides unbranched at the beta carbon is 19. well outside the experimental range. In studies with Hagler and colleagues (!Aobson et al.. 1979) we have noted that (1) artificially weakening t.he ?; N interaction. or (2) intro ducing a flexible geometrv force field derived from the infrared spectrum of ‘A-acetyl-S’-met hylamide produces the correct qualitative t’rend. At the level of the kind of data studied here. and without prior prejudice. it. is not clear whether the route of reducing the nitrogen dimensions, or of introducing valence flexibility. is the best. Vsing a valence force field calculated from ah initio methods (Platt &
lXobson. 1983) in conjunction with the present nonbonding parameters retains tolerable agreement with t,he present, database. This force field is “stiffer” than t,hat infrared force field developed with Hagler and colleagues. There is an ambiguitv inherent in the fact that data can be tolerabl) reproduced either with larger and stiffer a,toms and a softer force field, or with smaller or less stiff atoms and a stiffer force field, including rigid geometry- as a limiting case. Here softer and stiffer refer to the functionality of energy when departSing from equilibrium values. Most’ workers. notably. use nor)bonding potentials of 12-6-1 form, with a stiffer 12th power van drr Waals’ repulsive functionality, when calibra,ting or employing a valence force field from infrared spectroscopy. It is nonet,heless the ninth power repulsive form that is support’ed by nh initio calculations on peptides (Hillier 8 Robson. 1979). From all these aspect,s. we conclude that. providing we are interested in calculation of behariour due to int’ramolecular forces in solution. the rigid geometry approximation is a good approximation. Wit,h the a priori knowledge that valence geometry is flexible. however. we do SW a need to introduce our (or ang other) more det,ailed flcxiblc geomet,ry force field to obtain high resolution physicochemical detail once the approximate conformat’ional details have been assigned b>. the methods reported here. In the last analysis. the question of thr rigid geometry approximation as a working tool for conforma,tional st,udies on peptides rests on further application tests. Tn this respect. the present pot,entiaI functions for protein engineering applications. the original paramet,er set used bv Hagler rt (11. (1971). and all intermediate derivat’ions of the la,tter form a cohesive set in producing similar results when applied to small peptides in the rigid geometry approximation. This is with t,he proviso that) Hagler it al. (1974) did not) calibrat,e potent,ial functions appropriate to many side-chain groups. which had to be calibrated at an early st,age in out studies. It is also with t,he most import,ant proviso tht thfl original nitrogen parameters givcsn bv Hagler rf nl. (1971) are modified or t)he r\’ k interaction strength artificially reduced. This is of ditninished importance when considering nuclear magnetic resonance’ backbone coupling constants. however. All these pot)ential functions reproduct, thr observed coupling constants in t)hr following scarips of ,~‘-acet4’larninoacvl-~~‘-rnet,tl~lals in dimethylsulphoxide: glq’cyl. alanvl. aminobutyr?-I. and isoleucvl. This is with the phen~lalanyl exception of t,hc phenylalar& derivative. which is about 0.X Hz below t,he experiment’al range. Nonct heless, such a deviation is within that observed experiment~all~ n-hen changing the solvent. which sugg&s either t’hc need for-a more det’ailed solvent model for greater realism. or at least the recalibrat ion of t.hr solvent aspect, of our paratnet,ers for t)hc solvents used by the nuclear magnetic resonanct~ spectroscopists. Tt is int,eresting that the experimental trend in using a less polar solvent. such as
chloroform. may rrduc*e t ht, \-slut, ot’ I hta c~oul)Iitl~ const,ant by up to 1 Hz. Rearing in mincl t ht) sprtk;t(l of values in the series from glycyl to phenylalan~l is about 2 Hz. this clearly implies that th(x tI;tta (10 nob provide a sensit)ive test, (‘ven though t hcb nlJiIit>, to reproduce the trend is interesting. FIon.c*vc~r. (‘f1 noting that t)he possible variation in t h(a NH ricinal coupling constant for prptitlr~ ca,lculatinp the potent)ial surface is 0 IO I1 11~. it i:, important at least’ to include one typical rnc~tnbc~r of’ the series in the calibration. Sot including t h(L cJtht,r members could arguably be int~rrprrtrd. of’ ~YJLII’S~~. as decreasing the welyht)ing from t htl llu(*lear magnetic resonance dat.a in t hr cAibration. t bough each item actually used \vas givt>rr equal weighting. Another aspecbt that justifies this indireci do~vrlweighting of the nuclear magnetic. I’(‘.Sotliitl(‘~’ contribut’ion is the fact that the parameter’s given by Hagler it al. (1974). and all intrrmrdiatt~ stagt,s of refinement givfx V(‘I’\’ similar rfasults;. Tht justification for using at ICast orit’ inem her,. 0tI 1h(s other hand. is that potential functions that appt~at reasonable in other respects may not reI)ro(luc,c, t ht> data. For example. the original valenc*t~ f;Jrc-rL field from infrared Spec~rOSW~Jy. \vhich brings thtb characteristic ratio into t)hc r~xperimental range. hah been not,ed to reproduc~e coupling c*onst,nnt h signiticant*]?- outside the realm of rxperimt~ntal tlrror SimilarI?-, the present or earlier iritc~rtnf~tliate forms of’ the potential functions g,rivcA tolf~rablf~ results when applied to c~alculatr thth \-icinal coupling constants (SH (‘H and (‘H (‘H) or biologicall\ int,eresting peptidrs well-studied 1)) nuclear magnetic resonancth. This inc4udrs a ch(bmotatic peptide (P. LV. Finn. A. .I. ~forf%~~v. 13. Robson. ?I. I). (:lasel. K. ,i. Frt~er K- :\. I<:. Ijay. unpublished result,s) and th~rot,roI)irl-r~~l~,i~sitl~ her rnone (‘Cl’wd rt rrl.. 1986). Itr the forrn~r (sast’. the calculations also supportf,d t hc crystal sl ru(*t urt’ of a n-analogue t)hat was more reaciily c~r~~t.allizablt~. The usefulness of the calcnlat~ions as a &sign 1ool is well-supported by a study using a drug design tnethod to correlate the c*alrulated and observed potencies of the thyrotropin-releasing hormone (R’obson 8 Finn, 1983: Finn rt (11.. unpublished results: Robson $ Gamier, 1986). E:xtensive nuclear tnagnet ic resonance data were not a~vailablc for t’he 1 S-residue polypeptide neurot’ensin, but t~ht~kastenergy structure calculated with a solvent model was in accord with biological data (Finn P/ n/.. 1984) and, it has been called to our attent,ion, in accord with the conformat’ion implied for a c~onstituent tyrosinr residue as revealed when substitution of a n-tyrosine residue increases the potency (Xemt~rofl’ d al.. 1980). The earliest (Hagler pt al.. 1971) and present potentials give extremely similar result,s when applied to neurotensin: the crit,ical factSor.is whether a solvent) model based on tht> Onsager reaction tield is included. When it is omitted. a more classical hairpin type of structure is obtained. though in both cases the gross feat’ures of a loop form are obtained (R. Fishleigh, unpublished results). i).
ht~fi~t~f~
Refined
The ability to calculate detailed alpha-helical geometry of polyalanine in crystal or fibre form has been employed occasionally to test potentials (Ramachandran, 1973). The present potentials reproduce such data well within the limits of experimental error, about 0.1 ,& r.m.s. It is thus reasonable to ask why the backbone geometry of globular proteins is fitted at about 0.4 A r.m.s. One explanation is that the standard geomet’ry used here does not accord exactly with that used in energy refinement or other modelling methods employed by the crystallographer. Another more technically interesting question relates, however, to the fact that reverse-turn-type conformations might be reproduced less well in the rigid geometry approximation, in that they involve in some instances a near eis N . N interaction i.e. with $ = -O’.) The resolution of the data makes this a weak test. The relative energies of different turn t,ypes can be shown, however, critically to depend on the potentials used and particularly in the cisbarrier heights they imply. Clearly, reverse t’urns can be obtained in energy refinement of experimental co-ordinat,es, but need not be the right type of t)urn. Interestingly, within the limits of t,he experimental data, each turn type at each locus is tolerably well-produced. This is clearly and strongly underlined when calculating the preferred turn conformations in tight hairpins, since the relative abundance of turn types here is almost the converse of that’ found in other conformational features of globular prot,eins (Sibanda & Thornton, 1985). To explore this, energy minimizations of the model hairpin ~V-acetyl-penta-alanyldiglycl-penta-alanylX-met,hylamide have been carried out. In accord wit)h observation. type I, II and III turns are of much higher energy (and hence implicitly rarely observed in t,ight hairpins). More interestingly, Table 10 shows that the rank order of abundance of I’+ TIT’ and II’ types calculated with the present potentials is consistent with the relative abundance of these classesobserved by Sibanda &r Thornton (1985). These computer experiments, including tests of other models and exploration of a solvent’ model, will be presented elsewhere. It is pertinent’ to note that this system does provide an identification of the limits of the validity of the rigid geometry approximation. in that the calculated dihedral angles of the turn residues did show a tendency to move significantly away from the N N barrier, though not so seriously as to cause movement to another t’urn type. This can be overcome by a valence force field allowing anglesto vary. It can also be overcome by a constraining force between the PI; and c’ t,erminus of the beta-strands of the hairpin, and imposing a less twist,ed loop, to model the constraints of the rest of the protein in situ. Even in this potentiallv sensitive test area, then. the cause for the me&s of flexible geometry is not overwhelming. The abili@ to reproduce the conformation phenomena in globular proteins as in the above instance is arguably of much greater importance
2-T,?
Engineering
Models for Protein
than the ability to reproduce physiochemical detail at’ high resolution. Hence, tests of that t,ype will be reported, and are being investigated in a sub&antial research program. The numerical assignment,s of residue t,ypes (alanine etc.), when converted to qualitative helix former and breaker classes, provide an interesting test database and, in principle. a challenging one because of t)he possible need to include entropy of vibration and libration. and interact’ions with the rest of the protein. Using model helices of X-acetyl-decaglyc$X’-methylamide with inserted guest residues between Gly5 and Gly6 and extensive minimization. caused the helix t)o break for guests Asp, Asn. Arg. Ser, Tyr and Pro, and not otherwise. (To do this study requires the procedure described in the next) paragraph.) As shown by Robson & Susuki (1976). these are largely the helix breakers with negat’ive assignments of “helix-forming character”. Aspartate was classed as helix-forming by Robson & Susuki (1976). but it is the weakest of the helix formers (i.e. with the lowest value of helix-forming propensity). Cyst,eine was not calculat,ed as a breaker. The parameters given by Robson & Susuki (1976) for CBS include both cysteine and the S-Sbonded cystme, however. Replacing the -CH,-SH side-chain by -C’H-S-S-(N-acetylcystyl-X’-methylamide) does disrupt the helix, though this may not be an ideal model. The origin of t,he breaking effect for polar residues was clearly the ability to form a hydrogen-bonding interaction between hackbone and side-chain hydrogen-bonding groups. However, this depends on side-chain stereochemistry. Glutamate and lysine can form side-chain hydrogenbonding interactions with the amide or carbonyl groups of their own residue backbone. respect#ively. without displacing the barkbone-backbone hydrogen bonding (bifurcate hydrogen bonds in the general sense)and without significantly pert,urbing helical backbone stereochemistry. They are thus not breakers. In contrast, the others can only do so by adopting ba,ckbone angles inconsist,rnt with an alpha-helix. Aspartate. for example. prefers an extended conformation to form a hydrogen bond with its own backbone group. There are problems: threonine did not emerge as a helix breaker. and
Table 10 !‘alculated
Turn
t.ype I’ II'
II I
energies types in
and observed abundances tight hairpin beta-loops
(‘alculated
mergiest -25.4 - 24.7 - 19.4 +3.8
Observed
of turn
ahundancesj 1-i
II) 0 45
t Energies in kcal mol -I with respect t,o the extended chain of the model polypeptide A,G,A, with the reported potentials. $ Obsrrved abundance in the protein database of Kabsch & Sander I 1984) noted by Sibanda & Thornton (1985). 5 So observed tight hairpins of type I have glycinr in the turn rrgion.
c,omplications, since in the case of t.yrosine antI asparagine t,he helix kinked hadly ht. did not hreak. More funda~mentally, glycine does not t)reak it polyglycine helix: in fact 1 one would e,xprc,t the glycine helix not. to be stable at all. This sugpest.s the nerd specially t’o doctor the glycine case to allow for t,he great conformational entropy in the random roil form or. of course. to c3lwlate explicitly this component, This study will be reported when complet.etl. since a proper ranking hy comparison of the energies. and ideally free energies. of helical and broken forms is rquired (as not,ed below. there is also merit in awaiting a detailed discussion of the solvent models). 11 is important too. however. that the polyalanine helix with one guest residue at a time is generally stable (except for il proline insertion). The above “experiment” depends on a “compression field”. which is the direction of compressing the structure to a sphere. This is calibrated so that t’he helix is only marginally mow sta,hle than an at least bent form. Tn practice. this can he done 1)~ using a term. added to t hrl net energy. which IS proportional to the root -mtwsquare distance between all atoms and hence. hi the theorem of Lagrange. to the radius of gyration. This difficulty does not arise in a protein-folding simulation or within c+alculatiotis on it glohulat st twcture. hecause hydrogen bonds broken in the helix can Iw replaced by Iivdrogen-l)ollcling to the rest of the protein. or to solvent. The problem (‘an ho c,irc~ttmrf~nt’etl also hy the appropriate choic~e of solvent model, which illust,rates the further nrwl fora it solwtit~ model for a ftill trcatmrnt.
4. Discussion The development of potential functions is an or)going activity in many laboratories and no single set can represent the final word at~y more than can my single experiment,al study of any kind 1ea.d to conclusions. However. att’empts to irrevocahlc develop a betIter set lead to a deeper insight into molecular forces determining pepMe conformat~ion attd the relation bet,ween primary sequencse and conformation. Kxamplex from the present stud!, are as followw: in t tica rigid geomct r? ( I ) Potential futic+ons a~c:(:olltlt of aI’l.troxitnatioti can give a.n adequ”te c~otiformat.iotial propertie s of peptide text systetns. Particular choices of atomic parameters may lower the barriers to bond rotation artificially. thus mimicking some effect,s of the opening of valrnw angles as A-K-C’ and IMP1) in t,he bond system A-FM-I). Ijut (lo not prohibit, t,heir use as iin ndt~qlmtr working system. It) is particularI> interesting that parameters that do this are no less well-supported 1)~ the experimental database used t ha,n would he parameters developed in conjunct.ion with t.ertns for flexible geometry (Rohson cut nl.. 1979). The most importalit aspect of the parameters that relat,es t’o the validity or otherwise of the IIW of the rigid geometry approximation is the question of
wpulsivf~ I)itt’killl~tt’l’s. illl(l il)f’f’itil’i\li,V ttjc, implied iltOtnic dimensions. of thf> aniitlr NH grc~ult. I)ecrrasinp the equilibrium tlistancy for intt-‘ra(,tiorth wtth the H. and tttcrrasitry those for N. (,t‘ P;O W~XII. (‘:I11 IW tlOtlP in hll<‘h it \V;\y a8 to tFtilil1 ;r.g,lrt~t~tttetlt Lvith mttclj c~sperirnental clata. while ;tlIcr\rittg it calihratiott to the rot)atiott harrierh of the N-C” bond 10 product a tit to the remaitlirtg tlut;\. 1Vit.h physical realism rather tttatt sptwi in rttittcl. thv obvious opt,ion is to have i’fw)iirw 10 1 tit’ tisf‘ t)t infrared spectroscopic tlftti1 ti)r pfy)tides. ‘I’his has been clone elsewhere (Robsott 14 ol., 1!479) I)III ttw not full>- resolved this dilemma. .-I0 iuitir, (YIIwI~ tions leading to both irttc~r’atotnic~ Itaramvtt~ru atttl valence angle terms would. if’ ianything. fav01ir the potentials dereloprtl hrrt~. as (lidcussfvl ah0vf~ at10 elsewhflrts (T’latt & K~ot)ac,tt. 19X3). (‘ottsitlfyiitg al1 these asfte(:ts. the IIW of the (*ompttt atiottall> fast. rigid peotnetr\- approsimat ion would c~c~rlairil\~s(wn well-justitied ‘for prc)tt~itl-folditty simitlatiotis fs\f*f%pt. possibly, in t tie final stages. awl f’vf’rt tbf*rfs lhfa sufficietic+\~ of a rigid geomet rv ttVi~t.tIlf~tlt Iltls yf.1 to he disprct\-en. (2) The s;tndv of sit&-chain rotation h;trrirrs ti)r polar sidr-chain g”“11ps ittdicat,t‘s t tlil t (~;L~t~t’lll scrutiny of t hr parameters used t+wwhere i tt tacrg? refinemrnt and tnolecular dynamics is required. The preferencc~ for a roughly cis cottfigurat iott OF the ca~rhox~~l antI cwtwxamidr (‘OSH, is ktto\vtr to cr~stallopraythers hut it is importattt to (~tttphasiw that “anomalous” h eh,a\ 7’tour, and the (letailh of’ the change in etrwgy as the (‘ (‘00 or (’ -(‘ONF12 hot~tl rotates. must he import)atit for a c~orrwt tltwript iott of hvdrogert hontling hclweeti lhcse ;~ticI other groups. Seglect of a corrfvt trrattnent will most generally he expected to clestahilizt~ t tiosfa cwttti)rrna tions of sidw~hains required to make suc,h ;I I~~nti. The overall studies on sith-chain cont;~r~tr~ntiot~s tlescrihed hew should just,ify the choices tttadfb for of sitIt>-chain c~ottfot~rtiatioris int,erim assipnmrnts made ivhcn the elect’ron density tttaft for srtrtitc,t groups indicates disorder. frw energies fi)r at least (3) The transfer predominant.ly hydrophobic side-chains taken from wa,ter to a variety of non-polar recipient solvents studied suggest their adequacy as data for calihmting t,he transfer of side-chains from outside to the int,erior of a protein. Though a hydrophobic aide-chain such ax t,hat. of t.yrosinr may cortt.ain a polar group. then many of the recipient solvents. for example ethanol, also have significant polar character and a,llow for side-chain to environment hydrogen honds as for t’he same side-chaitt itt a prot.ein int.erior. The data used overall t.hus emphasize the free energy changes due to thr transfer of the non-polar (‘, CH. (‘H, and (‘H, groups of t’he side-chains. When these data are used to calihratc the pot,entials su(‘h that the interactions of nort-polar groups of side-chains with the rest of the protein correspond to the transfer free energy of the side-chain as a whole. the equilibrium of intergroup pot,entials. for enerptes the (‘H,: (‘Hz etc., are of the same approximate t 1lP
Refined
Models
for Proteh En,ginewin.g
magnitude as the equilibrium energies in a typical van der Waals’ (in vacua) potential. The importance of the hydrophobic effect is in the number of interact,ions within a protein involving the nonpolar groups. and does not reside in large equilibrium energy per 8~ for atom-atom interactions modelling the hydrophobic effect. This does not imply that the hydrophobic contribution can be neglected, but it does imply that calibration against the experimental transfer free energies can be caarriedout, without causing a significant departure from the degree of fit between calculated and observed properties of simple pept,ides, and with the nb in&o results. It is worth emphasizing that our potentials as reported here do not, in general, give the native-like form as the least energy form in comparison to results obtained when minimizing relatively open st’ructures rich in intramolecular hydrogen bonding but not in hydrophobic contacts. Examination of the performance of long-established potentials under the same conditions, including those of Hagler rt nl. (1974). revealed. however, exact,ly t,he same drfi&ncy. It must. again be emphasized t,hat this general and fundamental difficulty is apparent only when fairly powerful minimization procedures are used. allowing the relatively open structures to negotiate many local minima. The nearest met’astable caonformation to an arbitrary start’ing conformation would almost invariably be of higher energy t,han the native-like state as calculated. giving a false satisfactory impression of the adequacy of the model for full folding simulations. This general deficiency is easily traced to t,he overestimat,ion of hydrogen bond strengths in aqueous media from the point of view of overall
277
thermodynamic balance. In reality, intramolecular hydrogen bonds have stabilities comparable with intermolecular hydrogen bonds between protein and wat,er. Simple extensive reduction of hydrogenbonding strengths, say by introduction of a dielectric constant to reduce the dominant. electrost’atic component, does not resolve the problem. since this also destabilized int,ramoleaular hydrogen bonds in compact, including native. structures. The problem is resolved in the interim by penalizing open structures using t,he compaction potential reported elsewhere (Robson ef 01.. 1985). The present, pot,entials can either be applied to folding simulations in conjunction wit,h t’his compaction potential, or be applied wit’hout’ it t’o already cornpact st’ructures for approximate energy refinement, prediction of the effects of site-directed mutagenesis, or modelling proteins by initial fitting t.o known homologous conformat,ions. as well as for simpler peptides in non-polar solvents. The compaction pot’ential is nonetheless somewhat artificial. However, with modification. a solvent tnodel developed in conjunction with M. ,I. E. Steinberg for better treat,ment of compact strmtures has been found to resolve the problem in a more realisbic way for the free energy balance of cornpact and open structures. It is this aspect that we will report elsewhere. It has not been our intention to describe proteitifolding simulations in t,his work. This will be done in subsequent papers from this laboratory. Potential functions are only one aspect of the problem. and algorithms other than those related to the use of the potentials must yet’ be itnplemented to treat a complete folding simulation iii rtlasonable computer time.
APPENDIX
Residue Representations and Molecular Geometries Recommended for use with the Potential Functions The following representations and geomet#riesare recommended for use where the rigid geometry approximation is employed. They are given in a format compat,ible with t.he program SPECIAL described by R,obson & Pain (1973). This program is now fairly widely disseminated in various versions and can generally be recognized by the method used for coding the molecule and generating the conformat’ion and the variables PULL. BE?I;D. TWIST. R’OTATE and UPDATE for geomet’ric parameters and matrices used throughout the calculation. However tnuch the program has changed in the hands of other workers, the method of coding the molecule according to the following representation is fundamental to t)he logic of operation, and the following input is likely to be applicable with relatively minor modificat.ions. The input is also compatible wit.h the developrnent.al programs BR’OFOLD l- 3 and the current version
LUCIFER (Logical Use of C’onformational Information and Fast Energy Routines), which: amongst other novel algorithms. exploits the SIMPLEX method, s-algorithm, and zero minimum algorithm described in the main text. These can be implemented easily in versions of SPE(YTAL. The data are given in the form of a series of integers describing the connect,ivities of atoms, and real values describing the bond geometries (internal variables) in relation to those connectivities. The units so described are residues and blocking groups that can be assembled by the protein into an overall representation of the protein by the program given a primary sequence as input. Since the present data in terms of residues and blocking groups are invariant as far as work on prot,eins is concerned. it is normally referred to as a Residue Dictionary. As far as earlier versions were concerned. a special treatment is required for the N-terminal blocking
-
-
-
---
----
d
2is
Re$ned Models for Protein En,gineering Table 11 (continued) 1
2
3
0’ ** H. S’ H’ (‘1 (‘2 (‘A (‘13 ?J+ B1 N+ H’ (” 0’ ** R. X’ H’ C’I (‘2 (‘2 (‘2 N’ H’ (” N+ H’ H’ Nf H’ H’ (” 0’ ** F. N’ H’ (‘I (‘2 A0 AI AI AI ‘4 1 AI (” 0’ **
12
13
Y. N’ H’ (‘I (‘2 AU AI
4
7
3 4
12
3
16 5 6 i 9
4
IO 11
13 12
14
18
1 1 1
0
2
Phrnylalaninr 1 2 0 1 2 I 3 11 3 4 5 4 5 6 6 7 5 i 8 6 7 8 9 8 9 10 9 10 3 11 13 II 12 Tyrosine 0 I 1 2 1 3 3 4 r 4 ,5 ii
6
8 1.24
Histidine 1 2 0 1 2 I 3 I1 3 4 5 4 :i 6 6 7 5 6 i 8 i 8 9 8 9 10 9 10 3 11 13 12 II Arginine 0 1 2 1 1 3 3 4 4 6 6 5 7 6 7 8 7 9 9 10 10 11 10 12 9 13 13 14 I3 I5 3 16 16 17
5
8
1
1 1 1 1 1
15
17
1
3 4
12
2
3
13 5 6 7
4
1 1 1
1
1 1 1
9 121.0
10
I
180.0
Al .A0 0 HO Al Al (” 0’ ** iv. N H’ (‘1 (‘2 A0 Al N’ H’ A0 Al A1 Al Al A0 C’ 0’ ** c. N H’ (‘1 (‘2 s (” 0’ ** M. N’ H’ c’l (‘2 (‘9 s(‘3 (“’ 0’ **
1.32 I.00 1.47 1.53 1.53 1.40 1.43 1.31 1.35 1m 1.53 1.24
114.0 123.0 123.0 109.47 109.47 130.16 107.81 106.7 112.67 126.44 109.47 121.0
0.0 0.0 180.0 - 121.5 0.0 180.0 180.0 0.0 0.0 180~0 0.0 180.0
1.32 1oI 1.47 1.53 1.53 1.53 1.47 1.00 1.35 1.33 1.00 1.00 1.33 1m 1.00 1.53 1.24
114.0 123.0 123.0 109.47 109.47 109.47 109.47 120.0 120.0 120.0 120.0 120.0 120.0 120.0 120.0 109.47 121.0
0.0 0.0 180.0 - 121.5 0.0 0.0 0.0 190.0 0.0 0.0 0.0 180.0 180.0 0.0 180.0 0.0 180.0
1.32 140 1.47 1.53 1.54 1.40 1.40 1.40 1.40 1.40 1.53 1.24
114.0 123.0 123.0 109.47 109.47 120.0 120.0 120.0 120.0 120.0 109.47 121.0
0.0 0.0 180.0 - 121.5 0.0 0.0 180.0 0.0 0.0 0.0 0.0 180.0
1.32 I.00 1.47 I .53 1.54 1.40
114.0 123.0 123.0 109.47 109.47 120.0
0.0 0.0 180.0 - 121.5 04 0.0
group (or indication that the chain commences NH:), when the first residue is glycine. This is a point of inconsistency between programs and arises from the fact the N-terminal modifications may modify bhe connectivities and geometries of the following unit’, in which glycine is unique in not having a side-chain. Entries to the dictionary (residues and blocking groups) are separated by the delimiter ** and commence with a two-character code (both in columns 1 and 2 of the input record) which is
I? N* c’l (‘2 (‘2 (~‘2 r 0’ **
2 6 i 8 9 8 iI 3 )3
3
4
6
7 8 9 IO 11 II 13 I4
8 9 10
11
Tryptophan 0 1 1 2 I 3 3 4 4 5 R 6 6 7 7 8 7 9 9 10 10 11 11 12 I2 13 13 14 3 15 15 16
Proline 0 I 2 3 4 2 6
1 2 3 4 5 6 i
7
I
12 15
14
2
3
I5 5 6 7 9
4
1
1 1 1
8
10 11 12 13 14 I7
(‘ystelne/(Zystine 0 1 2 1 2 1 3 6 3 4 5 4 5 3 6 8 6 7 Methionine 1 0 1 2 I 3 3 4 4 5 5 6 6 7 3 8 8 9
6
16
1
3 4
I 1
7
1
2
3
8 5 6 7
4
1 1 1 1
10
9
1
(* = I for rotatable 2 * 6 3 4 5 8
7
1
8
9
10
1.40 1.40 1.36 1 a0 1.40 1.40 I.53 1.24
120~0 120.0 120.0 1 I 0.0 120.0 120.0 109.4; 1ZI4
180.0 0.0 180.0 04 180.0 0.0 0.0 180.0
1.32 1 .oo 1.47 I ,53 1.53 1.34 1.43 1 .oo 1.31 1.40 1.39 1.35 1.41 1.40 1.53 I .24
1 14.0 123.0 123.0 109.47 109.47 127.0 107.0 127.0 109.0 128.0 116.0 121.0 122.0 117.0 109.47 121.0
0.0 0.0 1X0.0 - 121.5 0.0 0.0 180.0 180.0 0.0 1 no.0 180.0 0.0 0.0 0.0 0.0 180.0
1.32 140 1.47 1.53 1.86 1.53 1.24
114.0 123.0 123.0 109.4; 109.47 109.47 121.0
0.0 0.0 180.0 - 121.5 0.0 o-o 1 x04
1.32 1xKI 1.47 1.53 1.53 1.86 1.86 I .53 I .24
114.0 123.0 123.0 109.47 109.47 109.47 96.5 109.47 121.0
0.0 04 I no.0 - 121.5 0.0 04 04 04 180.0
peptide 1.32 1.47 1.53 1.53 1.53 I .53 1.24
link) 114.0 126.35 106.0 106.0 106.0 113.88 121.0
0.0 1 X0.0 - 121.5 22.5 - 13.3 - 65.5 180.0
followed by a clarifying comment. The twocharacter code relat’es to the code to be used in inputting the primary sequence. Although t,he recommended IUPAC code for computer input employs single letters, it is to be noted that our programs can be used to treat a rariet’y of molecules other than proteins, and flexibility must be allowed for this. Hence A. is employed for an alanine residue. and AP or A, for an adenine nucleotide in a nucleic acid. The two-letter code for amino acid sequences corresponds to the one-letter
2x0
H. K&son
urld E. I'latt
II:I-‘AC’ code. followed by a decimal point. ‘I%(~ ?r‘ and (’ termini of proteins must be spec4fic~all~ represented as + ( and ) - . respectively. corrtlspending t’o NH: and COO - groups. If ii t~rrminal unit complete in itself. such as pyroglut)amic ar%l. is coded in the dictionary. no end-blocking group is required. Input primar! sequences are + (K.V.F.G.R.(‘.F:.I,. (the S-terminal fragment 01 hen egg-white lysozyme) and M(A.)M (M-acetylalan,v-F-methylamide). The molecules are AWN strucatedin sequenceaccording to the ent’ry with the corresponding two-character heading in the diet,ionary ent’ry (see Table 11). Between the two-character heading and the delimiter ** are the ent’ries proper. The logic used for describing atom caonnrctivities and bond grometries (internal variables) means that each line of informat’ion (a file record) relat,es to it single atom. ( lld71 Yi1?7 I The first column (1) gives a two-character c~tle for the atom type. This is logically chosen for mnemonic purposes but the atom type onlv has meaning in regard to t,he ran drr Waals’ and elrct,rostatic terms to be associated with it). That, is. furthrr input will define the parameters -4, H, (‘ of the potential function defined for that atom or for pairs of named atoms (seeTable 5).
These describe t’he caonnect)ivit’ies between the atoms, i.e. wha,t is connected t’o what. and hence the cahemicalformulae. Column 3 is the simplest to understand since it is the “name by number”. the number of the atom in the chemical formulae. and the index for referring to that specific atom in subsequent calculation of the energy. In the earlier program SPEClAL, t,he numbering of the atoms c*ould be somewhat, arbitrary. The zero minimum algorithm is, however, more efficient if the energy of interaction of all atoms from j = 1 t,o .i = i- 1 are evaluated as the co-ordinates of atom i are calculat,ed. Rejection can then occur of a struct’ure not only when t’he energv is partly rvaluat’ed, but when the conformation ‘is also partly evaluated. This requires more caution for maximal efficiency, particularly when vectorization on a, supercomputrr is t,o be employed. Sumbering of the atoms generally proceeds from the N-terminal of a unit and goes down the shortest side-branch first. The significance of the remaining colums, 2 and 1. 5 and 6, is best understood by considering the method bv which a model would be constructed by hand, ad&ng each atom one at a time. It, is useful also to csonsiderthe molecular representation as a family tree of a unisexual species. Column 2 is the number of the atom t,o which t,he current atom is t.o be added: in family tree terms, this atom is the mother of t,he atom current,ly being added. Columns 1 to 6 carry the numbers of the atoms subsequently to be added to the current atom: in family tree not,ation. the daught,ers. Strictly speaking. either
mother or daughter inf’ormatioti alonc~ ~or~lrl tit’ suficient Howevrr. the order of t ht. ti;rlrghtr~r~ carries tiIrthrr informa,i ion in relation t (1 t orsioll angles. In particaular, the first dai~ghl~t~ it(orrl specified descmri beL q the standard rt~ti~rr~llc*t~ >~~~)III fi)r defining boric1rotat ion ai@++ (5~~~~ I)~low )
This is f for a rotatable bond and blank or zoo otherwise. If atoms 6 and 7 are c~onnrc+d hy II covalent single bond and t,he bond is to b(ballowing t’o rot,ate in rninimization of t,hr overall energy of’ the molecule, then 1 is placed in c.olumn 7 in the line (record) for atom 7. (~OlUwlnR N to 10 Bs in the Cart’esian (x.yz) system of c,o-ordinates. the geometric description of the molecule in terms of internal co-ordinates requires three numbers to be associated with each atom. These are bond length, valence angle and dihedral angle. Strictly. only 3n -6 numbers are required to describe the conformat’ion but this need not concern us here. except to note that bond lengths, valence angles and dihedral angles for specific atoms at the beginning of the molecule will describe t’he t’ranslation, tilt and rotation of the molecule wit)h respect to an external reference frame. Bond lengths, valence angles and dihedral angles require. respectively, two, three and four at,oms to be defined. To associate these three quantities with one at80m. say the (i)th, requires some consideration. (‘sing family tree notation, let m(i) be the tnot,hrr of atom i. let m(m(i)), or ,m,‘(;) sag. he the grandmothrr of i, and let m(m(m(i))), or m3(i) say. be the great grandmother of i. Then column 8 represent’s the length (A) of the bond ,n(i)-i. column 9 represents the valence angle (deg.) vr12(i)-~ m(i)+, and column 10 represents t’he dihedral angle m3(i)-m2(i)-m(i)-ii, all for atom i. The dihedral angle requires particular ment’ion. This is the angle between the plane occupied by m3(i), nr2(i) and m(i), and the plane occupied by m’(i)-m(i)-;. It is thus the torsion angle round t,hr m2(2)-m(;) bond. and not the m(i)-i bond, though it is associat,ed with atom i in the dictionary. More t,han one atom will generally be a, daughter of m(i). The first daughter added to m(i), say i, , has t,he absolut)e value of the dihedral angle associated wit’h it’ in the dictionary. Tf this value is zero, it, implies a cis relationship with respect to sm3(i).If bond m2(i)m m(i) is variable, the value assigned to i, in t,he dict,ionarx will be subsequently alt’ered, or “overwritt,en”, by the conformational changes in energy minimization. Thus, zero implies an arbitrary choice of torsion angle to be associated with atom 1. The subsequent daughter of m(i). say i,. is assigned a relative dihedral angle. It corresponds to the extent to which the atom i, must be rotated around bond m2(i)-m(i) to go from the position of atom i, (strictly. a position on the m(i) through i, vector) to reach its required position. Atom i, is similarly assigneda relative rotation angle with respect to i,.
R@ned Models for Protein Enyineeriny Thus, for a methyl group with symmetrically disposed hydrogen atoms, atom i, would be associated with a dihedral angle of 120”, and atom i, would be associated with a dihedral angle of 120”. This is the angle between the consecutive C-H bonds when looking down the C-C bond. The dihedral angle associated with atom i,, the first daught,er in the daughter list for atom m(i). defines the rotation angle round the m*(i)-m(i) bond, so for et’hane this would be O”, 120”, or -120” for an “eclipsed” conformation, and 60”, 180”, or -60” for a “staggered” conformation. Despite the initial impression of complexity of this method of describing molecular constitution and geometry. it is rigorous and comes easily with practice. In variants of the program SPECIAL in common use we have, however, noted that workers have often switched from the use of relative to absolute dihedral angles. That is, the three last hydrogen atoms in “eclipsed” ethane would be assigned dihedral angles O’, 120”, 240”. rather than 0”. 120”. 120”. This is clearly a matter of taste. We thank the SERC for financial support. The calculations were performed on the CDC Cgber 7600. I70-730 and 205 wmputrrs at. UMRCC.
References Arnott. S. B: Dover, S. D. (1967). J. ililol. Biol. 30. 209212. Bernstein, F. C.. Koetzle. T. F., Williams. G. J. B., Meyer. E. F. Jr, Bric,e. M. D., Rodgers, J. R.. Krnnard, 0.. Shimanouchi. T. & Tasumi. M. (1977). .J. Mol. Riol. 112, 535-542.
Krant, 1). A. & Flory. P. J. (1965).J. Amer.
Chem.
Sot.
87. 2791-2800. Bdl. H. K. h Bressie, K. (1!)74). Biochim. Biopkys. Acta. 161. 665-670. l<.yntrov. V. F.. Portnova, S. L., Tsetlin. V. I., Ivanov. V. I. & Ovcahinnikov. Y. A. (1969). Tetrahedron, 25. 493-5 15. (‘ung. MM.T., Marraud. M. & Neel, J. (1973). .Jerus. Symp. Quunt.
(rhem.
B&hem.
5. 69-83.
Dunning, T. H. (1970). J. Chem. Phys. 53. 2823-2833. Finn. P. W., Robson. B. & Griffiths, E. C. (1984). Znt. J. Pept. Protein Reu. 24, 407-413. Flory. 1’. J. & ScAhimmel. I’. R. (1967). J. Amer. Ch,em. Sot. 89, 6807-6813. (ireenherg. I). A., Barry. D. C. $ Marshall. (:. R. (1978). ,I. .-lmw. (‘hem. Sot. 100: 4020-4026. Hagler. A. T.. Huler. E. & Lifson, S. (1954). J. Amer. (‘hem. Sot. 96, 5319-5327. Hillier. 1. H. & Robson. R (1979). J. Theoret. Riol. 76. K---98.
,Janin. ,J.. Wodak, S.. Levitt, Mol. Biol. 125, 357-386.
M. $ Ma&ret.
R. (1978). J.
281
Kabseh, LV. & Sander, c‘. (1984). Biopolymrrc. 22. 15i’i2637. Karplus, .M. (1963). .I. Amer. Chem. Sot. 85. 2870--2871. Levitt. M. (1983a). J. Mol. Biol. 168, 595-620. Levitt, N (1983b). ,I. &fol. Biol. 170. 723-764. Levitt. M. & Warshel. A. (1975). NutzLw (London). 253. 694-698.
Mc(.‘lrllan.
L. (1963). Tables of Exppr,rimPnfrrl Dipok Freeman. San Francisco. Moman~. F. A., McGuire, R. F., Burgess. A. W. & Scheraga. H. A. (1975). J. Phys. C’hrm. 79. 2361 2381 Mulliken. R’. S. (1955). .J. Chem. Phys. 23. 1833- 1840. Xan,!i. I-‘. K. (1976). Znt. J. Pept. Pro&in Hrs. 8. E&264. Fielder. *J. A. & Mead. R. (1965). Comput. J. 7. 308-313. Ncmrroff. C. H., Hisseette, G.. Manberg. P. ,I.. Osbahr. .\. .J. III. Breese. G. R. & Prange. A. .J.. ,Jr (1980). RrairL Rrs. 195. 69-84. Sozaki. Y. 8 Tanford, C. (1971). J. Hiol. (~‘hunl. 246, 221 l-2217. Platt. E. & Robson, H. (1982). J. Theoret Rio/. 96. 3813!)9. Plat,t. E. & Robson. B. (1983). In CowLputing in Biological S&xce (Geisow. M. J. & Barrett, A. N., rds). pp. 91131. Elsevier Biomedical Press, Amsterdam. Platt E.. Robson. B. & Hillier. I. H. (1981). .I. Theorrt. Biol. 88. 333-353. Ramachandran, G. N. (1973). Jerus. Symp. Q7m7ct. (‘hem. Hiorhew1. 5. 1 Il. R.amachandran. (:. ?v’., Chandraseharan. R. & Kopplr. K. 1). (1971). Hiopolymers, 10, 2113-2131. Robson. B. (1982). Biochem. Sot. Trans. 10. 297Y298. Robson. K. & Finn. P. W. (1983). dtln J. 11. 67 78. Robson. B. & Gamier. ,J. (1986). In Introduction to Protrins rind Protein Engineering, Elsevirr Press. A1msterdam. In the press. Robson, B. hz Osguthorpe, D. .J. (1979). ,J. Mol. Rio/. 132. I !I-5 1 R’obson. B. &r Pain, R. H. (1973). In Conformation, of Hiological llrlolecules and PolywLers. Th,e Jerusalem S!ymposia on Quantum Chem&ry a& Biochemistry, \-()I. 5. p. 161. Academic Press,New York. Robson. 1s. & Susuki. E. (1976). J. Nol. Rio/. 107. 327.m 356. Robson. R.. Hillier. I. H. & Guest. M. F. (1978). d. C’h,em. Sot. (Farada.y Trans. II), 74, 1311-1318. Robson, B.. St)rrn. P. S., Hillier, I. H.. Osguthorpe. 1). J. & Hagler. A. T. (1979). J. (them. Phys. 76. 831-831. Robson. B.. Douglas. G. M. & Platt. E. (1982). Hiochem. Sot. Trans. 10. 388-389. Rohson. B., Plat,t. E., Finn, P. W.. Willard. I’., Gibrat. *J.-F. & Garnier. ,I. (1985). Int. .I. J+pf. Protein Res. 25. I-X. Row. F:. 8 Siegbahn, P. (1970). Throrrt (‘him. Acta (&din), 17. 209-215. Sibanda. B. 1,. & Thornton. .J. hl. (19X5). ,Vutwe (London), 316. 170-174. Ward. I). ,J.. Griffiths, E. C. & Robson. B. ( 19x6). Jnt. J. f~pt. Profein Res. In the press. A. .Wonzun.ts,
Edited by R. Huber