Journal of Molecular Structure, 88 (1982) 137-156 THEOCHEM Elsevier Scientific Publishing Company, Amsterdam - Printed in The Netherlands
QUANTUM THEORY OF THE STRUCTURE AND BONDING IN PROTEINS Part 12. Conformational analysis of side chains and the ethyl group as a model side chain
DAVID PETERS and JANE PETERS The Bourne Laboratory, Department of Chemistry, Royal Holloway College, University of London, Egham Hill, Egham, Surrey TW20 0EX (Gt. Britain)
(Received 3 August 1981)
ABSTRACT The conformational effects of the side chains of proteins have been examined at the ab initio molecular orbital level of approximation, using the ethyl group as a model side chain. The conformational energy map of the Cα ethyl glycine dipeptide (with the side chain away from the backbone) is computed first, and the energetic consequences of rotating the ethyl group about its bond to the backbone are computed subsequently. The results may be analysed in terms of torsion barriers about the single bond and non-bonded repulsions. The conformational energy map is very like that of alanine. Rotation of the ethyl group shows the familiar staggering about a single bond (χ1 = +60, 180, -60°); this effect is sometimes obscured, however, by large non-bonded repulsions between the side chain and the backbone. When the side chain points away from the backbone (χ1 = 180, -60°), there is little or no interaction between the two parts, but when it is folded towards the backbone (χ1 = +60°) there are extensive interactions. Comparisons with experiment show that the ethyl group is an accurate model for such side chains as leucine and an approximate model for many others. One major simplification of protein structural theory implied by these results is that side-chain conformations are local effects which are generally unaffected by the secondary and tertiary structure of the protein.
INTRODUCTION
The earlier parts of this series [1] showed that the structure and bonding in the backbones of protein molecules may be understood in terms of ab initio molecular orbital theory, particularly as regards hydrogen bonds, non-bonded repulsions and torsion barriers. The Ramachandran map [2] has thus been placed on a firm quantum mechanical basis, and detailed maps have been drawn for the glycine [1b], alanine [1h] and proline [1j] dipeptides. Moreover, the alanine map has proved useful as a first approximation to the dipeptide maps of all the amino acids except glycine and proline.
0166-1280/82/0000-0000/$02.75
© 1982 Elsevier Scientific Publishing Company
It is obvious that the nature and conformation of the individual side chains are important in many respects, and that it is necessary to examine the various side chains in a systematic manner. This has already been done in detail at the semi-empirical level, using the PCILO method, by Pullman and Pullman [3]. The computational work is straightforward using the readily available Gaussian 76 package [4], as in the earlier work [1]. Organising the computations to cope efficiently with nearly 20 side chains is less simple. The direct approach is to compute all the dipeptide maps of the entire set of residues, with their various side chains in every possible conformation. This is the same as computing the total energy as a function of several variables (φ, ψ, χ1, χ2, ...) for each amino acid. Such a task is, however, beyond our resources. Moreover, the results might prove too voluminous to comprehend, and so we have attempted to simplify matters by first studying the ethyl group as a model for at least some of the side chains. Accordingly, the Cα ethyl glycine dipeptide of Fig. 1 is featured throughout this paper. This molecule differs from alanine in having an ethyl group in place of the latter's methyl group. It is the natural choice as a model for such side chains as leucine, and perhaps also for those with the CH2CH2 structure (glu, gln, lys, arg, met). Moreover, the present results are readily adapted to the valine and isoleucine cases using a simple approximate method. Indeed, many other side chains also seem to fit the present results for Cα ethyl glycine, at least in an approximate sense (see below), despite the fact that they are quite different from the ethyl group. The computation of the dipeptide map constitutes the first part of this paper and the computations on the side-chain conformations the second part. It proved better to study the computational results alone first and only then to compare them with the available experimental data. In this way the computational results are kept separate from current experimental data, so that they may readily be compared with future experimental data. Several additional computations on the alanine residue are required for comparison purposes, and these are included in Appendix 2.

[Figure 1: structural drawing of the Cα ethyl glycine dipeptide; smaller panels (i) χ1 = +60°, (ii) χ1 = 180°, (iii) χ1 = -60°]
Fig. 1. Cα ethyl glycine dipeptide. The molecule is in the usual arrangement with the N terminus to the left (IUPAC convention [12]). See Appendix 1 and Table 3 for details of the geometry and energies. The upper drawing shows the molecule in the extended conformation of the backbone (φ = ψ = 180°) with the side chain in its standard conformation (χ1 = -120°, χ2 = 60°). The smaller diagrams show the physical situation corresponding to those staggered conformations of the side chain which occur in practice.

METHOD
The Ramachandran map of the Cα ethyl glycine dipeptide of Fig. 1 was computed first. To do this, the side chain was fixed in a standard conformation chosen so that the resulting map would be as simple and clear as possible. The conformation which keeps the side chain and the backbone as far apart as possible [χ1 = -120° (cf. Fig. 1)] gives a map which is very close to that of alanine [1h]. This choice has the disadvantage of corresponding to an eclipsed conformation about CαCβ, but this is shown below not to be a serious difficulty.
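The side-chain angles χ1 and χ2 are dihedral angles, signed by the IUPAC convention used throughout this work [12]. Under that convention χ1 is the N-Cα-Cβ-Cγ dihedral. The sketch below computes a signed dihedral from four Cartesian positions. It uses a common generic vector formulation and is not taken from the authors' programs. The test coordinates are a hypothetical geometry chosen so that the answer is obvious; they are not the standard geometry of Table 3.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Vec, b: Vec) -> float:
    return sum(a[i] * b[i] for i in range(3))


def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180].

    Positive when, viewed along p1 -> p2, the near bond p1-p0 must be turned
    clockwise to eclipse the far bond p2-p3 (IUPAC sign convention).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [c / length for c in b1]
    # Project both outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Usage with a hypothetical test geometry: central bond along z, near atom on +x.
near, centre_a, centre_b = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
for target in (60.0, 180.0, -60.0, -120.0):
    a = math.radians(target)
    far = (math.cos(a), math.sin(a), 1.0)
    print(target, round(dihedral_deg(near, centre_a, centre_b, far), 6))
```

The atan2 form is used because it returns the sign directly and stays well conditioned near 0° and 180°, where an arccos-based formula loses precision.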
The important points of the Ramachandran map, i.e. the α helix, the bridge, etc. (cf. Fig. 2), were then selected. At each of these backbone conformations the total energy of the molecule was computed as the side chain was rotated through 360° (i.e. χ1 varies from 0 to 360°). Experience suggested that this technique would suffice to map the important points of the energy surface. It should be remembered, however, that this is an approximate method, so important or unexpected results should be thoroughly checked. The technique is quite feasible within a small volume of an energy surface in three or four variables, such as φ, ψ, χ1 and χ2.
When the computations have been completed, it is essential to interpret the results in familiar terms. Suppose, for example, that the backbone and side chain choose their conformations independently of each other. The total conformational energy is then merely the sum of the two independent conformational energies. Another possibility is that non-bonded repulsions will exclude certain overall conformations of both parts of the molecule. Both these cases often arise in this work. Finally, more subtle interactions between the backbone and the side chain may arise, of a kind which have no simple classical interpretation. Such effects seem almost inevitable in, say, chemical reactivity problems, but so far none have been found within the present work.

[Figure 2: Ramachandran map of Cα ethyl glycine; φ and ψ axes from -180° to 180°; energy contours in kcal mol-1]
Fig. 2. Ramachandran diagram of Cα ethyl glycine. The left-hand half of the map was computed using a 30° grid overall plus a 10° grid in important regions (α helix, C7 regions). The left half of the right-hand side was computed in less detail except for the special points (αL helix, both saddle points and the low-energy region at the top and bottom near φ = +60°). The high-energy region at the right of the right-hand side was not computed because in the alanine case this region is at or above 30 kcal mol-1. The bottom left corner of the map is too high in energy because the standard conformation of the side chain is used over the whole map (cf. text). The zero of energy is the C7 conformation (χ1 = 240°, χ2 = +60°) and is -447.96467 hartrees. In practice, the low-energy conformations are all about 4 kcal mol-1 below the energies shown on this map. Accurate energy values are given in Table 4.

It is also useful to have a rough idea of the energies involved in this work. The Ramachandran map contains two regions, of low and high energy, separated by some 20 kcal mol-1. Within the low-energy area there are variations of about 5 kcal mol-1. The conformational energies within the side chain vary from about 3 kcal mol-1 to over 20 kcal mol-1. However, it seems unlikely that conformations with energies above about 7 kcal mol-1 are important in conformational problems. Overall, then, this work is concerned with energies of 1-10 kcal mol-1.
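The rigid scan described in this section can be sketched as follows: (φ, ψ) is held at one point of the map while χ1 is stepped through 360°. The energy function here is a hypothetical stand-in, not the ab initio energy used in this work. It is an idealised threefold torsion term V(χ1) = (V3/2)(1 + cos 3χ1), with V3 = 4 kcal mol-1 taken from the typical barrier quoted in the text. An optional Gaussian bump stands in for a non-bonded repulsion, and its position, height and width are illustrative assumptions. In a real calculation the energy callable would wrap a quantum chemical single-point energy.

```python
import math
from typing import Callable, List, Tuple

Profile = List[Tuple[float, float]]  # (chi1 in degrees, energy in kcal mol-1)


def torsion_term(chi_deg: float, v3: float = 4.0) -> float:
    """Idealised threefold torsion; minima at +60, 180 and 300 (= -60) degrees."""
    return 0.5 * v3 * (1.0 + math.cos(math.radians(3.0 * chi_deg)))


def repulsion_term(chi_deg: float, centre_deg: float, height: float,
                   width_deg: float = 25.0) -> float:
    """Hypothetical Gaussian penalty standing in for a side chain/backbone clash."""
    d = (chi_deg - centre_deg + 180.0) % 360.0 - 180.0  # wrapped into [-180, 180)
    return height * math.exp(-0.5 * (d / width_deg) ** 2)


def scan_chi1(energy: Callable[[float], float], step_deg: float = 10.0) -> Profile:
    """Rigid scan: chi1 stepped from 0 towards 360 degrees at one fixed (phi, psi)."""
    n = int(round(360.0 / step_deg))
    return [(i * step_deg, energy(i * step_deg)) for i in range(n)]


def local_minima(profile: Profile) -> List[float]:
    """Grid points lower in energy than both neighbours (the scan is periodic)."""
    n = len(profile)
    return [profile[i][0] for i in range(n)
            if profile[i][1] < profile[i - 1][1]
            and profile[i][1] < profile[(i + 1) % n][1]]


def barrier(profile: Profile, start_deg: float, end_deg: float) -> float:
    """Highest point between two grid angles, measured from the lower endpoint."""
    inside = [e for chi, e in profile if start_deg <= chi <= end_deg]
    ends = [e for chi, e in profile if chi in (start_deg, end_deg)]
    return max(inside) - min(ends)


pure = scan_chi1(torsion_term)
print(local_minima(pure))                       # [60.0, 180.0, 300.0]
print(round(barrier(pure, 180.0, 300.0), 3))    # 4.0

# A large repulsion centred on +60 degrees removes that staggered minimum.
clash = scan_chi1(lambda c: torsion_term(c) + repulsion_term(c, 60.0, 8.0))
print(local_minima(clash))                      # [180.0, 300.0]
```

The second scan shows in miniature how a non-bonded repulsion can override the torsional staggering at one position while leaving the other two staggered minima intact.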
COMPUTATIONAL RESULTS
The Ramachandran map of Cα ethyl glycine
This is shown in Fig. 2 and is computed with the side chain in the standard conformation (χ1 = -120°). As models suggest, the map differs very little from that of alanine. The only appreciable difference is in the bottom left-hand corner, where the ethyl group interacts directly with the NH2 group of the backbone. This interaction depends on whether χ1 is eclipsed (as here) or staggered (as in alanine). One consequence of this difference is that the saddle point at about φ = 0°, ψ = -90° is apparently raised in energy on the present map compared with the alanine map. Another is that residues may occasionally appear (cf. Fig. 3) in a region which, on maps computed using the standard conformation such as the present map, seems to be of very high energy relative to the alanine map. This may be physically unrealistic. If this aspect is important, the computations must be repeated in greater detail with the side chain in the position of minimum energy. These two exceptions apart, however, no unexpected complications arise on going from the alanine dipeptide to the Cα ethyl glycine dipeptide. This is helpful, since many of the conclusions on the alanine residue from earlier work [1h] may be carried over to the present work. Two comparisons with experimental data (Figs. 3 and 4) are made using the present map, and these are discussed below.
[Figure 3: elastase residues plotted on the Cα ethyl glycine map; axes from -180° to 180°; contours in kcal mol-1]
Fig. 3. Experimental data for elastase on the Cα ethyl glycine map. The glycine and proline residues are omitted [1b, 1j]. See caption to Fig. 2 concerning the bottom left-hand corner of the map.
Fig. 4. Experimental data for actinidin on the Cα ethyl glycine map. The glycine and proline residues are omitted [1b, 1j]. See caption to Fig. 2 concerning the bottom left-hand corner of the map.
Side-chain conformations
The results of rotating the ethyl group about its bond to the backbone are shown in Figs. 5-11. Each diagram refers to one important point of the main map, and χ2 is held fixed at 60° in all cases. The immediate impression from these results is that the staggered conformations about χ1 are strongly preferred in most situations. Only in the planar (or extended) form of Fig. 10 is there a departure from this generalisation, and this case is not of physical importance. The staggering is presumably a normal torsion barrier effect of the kind well known in small molecules [5]. The non-bonded repulsions often complicate the picture, but it is a simple matter to distinguish the two effects using a model and the internuclear distances.
The results are particularly straightforward when the side chain is away from the backbone (χ1 = 180 and -60°), as shown in (ii) and (iii) of Fig. 1. These two conformations of the side chain are of low energy in nearly all backbone conformations. They are separated by a low barrier whose magnitude is close to the torsion barrier in alanine (see below). An inspection of models and internuclear distances shows clearly that non-bonded repulsions are absent. There is therefore a normal torsion barrier about the CαCβ single bond in this range of χ1 (180 to -60°). In other words, the conformations of the side chain and the backbone are independent in this range of χ1 and for nearly all the backbone conformations. These two side-chain conformations (χ1 = 180 and -60°) are invariably close in energy, with a maximum gap of about 1 kcal mol-1. It is therefore not surprising to find that both are appreciably populated in reality. It may be the case that external factors of the kind neglected here will determine the relative populations of these two levels. The torsion barrier which separates these two conformations is usually about 4 kcal mol-1 and agrees well with that found in alanine (see Fig. …). Only in the left-hand sheet region is the barrier smaller, at about 2 kcal mol-1. Inspection of models suggests that this is due to non-bonded repulsions raising the total energy of the staggered form. This is why the alanine curve is fitted only approximately to the Cα ethyl glycine curve in Fig. 8, while the fitting of the two curves in all other cases is straightforward. Again, the extended regions of Fig. 10 are ignored since they are physically unimportant.
The case of the third staggered position (χ1 = +60°) is altogether more complicated than that of the first two positions (χ1 = 180 and -60°). This third side-chain conformation is shown in (i) of Fig. 1, and the results depend on the conformation of the backbone itself. For example, if the backbone is in the α helix conformation, this side-chain conformation (χ1 = +60°) is of relatively high energy (+7.5 kcal mol-1). If the backbone is in the β sheet conformation, it is of relatively low energy (Figs. 8 and 9). So there is now an interplay between the conformations of the backbone and the side chain. It is clear from models that the non-bonded repulsions between the two parts of the molecule cause the complications. These non-bonded repulsions are also responsible for the very large barriers which separate the χ1 = +60° conformation of the side chain from the other two staggered conformations. These energy barriers are large enough to lock the side chain into this +60° conformation unless other factors intervene.
The situation where χ1 = +60° is complicated. The most straightforward way to deal with it is in three steps: fix on one particular conformation of both side chain and backbone, read its total energy from the map and diagrams, and then study the various ways in which the energy may be lowered. For example, take the backbone in the α helical conformation and the side chain in the +60° conformation. One way of lowering the total energy is for the side chain to surmount a barrier of about 6 kcal mol-1 and revert to a χ1 value of 180° (Fig. 5). Another way is for the backbone to change its conformation towards the bridge region (Fig. 6). In the latter case, the side chain changes the conformation of the backbone in order to lower the energy, and does so without an apparent barrier to overcome. One additional problem in this case is that if the backbone is already an integral part of an α helix, substantial amounts of energy would be required to disrupt the α helix. The overall outcome must therefore be judged in terms of all the relevant factors.

Fig. 5. Energy as a function of χ1 with the α helical backbone. The left-hand scale refers to the absolute energy of the molecule, referred to the global zero (cf. Appendix 1). The repulsions to the left arise as the methyl group interacts with the NH2 or the NH groups of the backbone. The arrow denotes the standard conformation of the side chain.
Fig. 6. Energy as a function of χ1 with the bridge backbone. See caption to Fig. 5 concerning the scales and axes. The repulsions to the left arise as the methyl group interacts with the NH group or with the right-hand side oxygen atom.
Fig. 7. Energy as a function of χ1 with the C7 backbone. See caption to Fig. 5 concerning scales and axes. The large repulsion arises from the interaction between the methyl group and the right-hand side oxygen atom. The smaller repulsion arises from the interaction between the methyl group and the NH group. In this case, the local and the global zeros coincide.
Fig. 8. Energy as a function of χ1 with the left sheet backbone. See caption to Fig. 5 concerning scales and axes. The repulsions arise from the interaction between the methyl group and the NH group or the right-hand side oxygen atom.
Fig. 9. Energy as a function of χ1 with the right sheet backbone. See caption to Fig. 5 concerning scales and axes. The larger of the two repulsions is between the methyl group and the NH group, and the smaller is between the methyl group and the right-hand side oxygen atom.
Fig. 10. Energy as a function of χ1 with the extended backbone. See caption to Fig. 5 concerning scales and axes. The very large repulsions arise from the interaction between the methyl group and the NH2 group or the left-hand carbonyl group. This case is not important in practice and is included only for completeness.
Fig. 11. Energy as a function of χ1 with the αL helix. See caption to Fig. 5 concerning scales and axes. The very large repulsions arise from interactions between the methyl group and both oxygen atoms of the backbone.

The Ramachandran map plus the side-chain conformations
Combining the map and the side-chain results is very simple when only the low-energy conformations are examined.
Indeed, it was assumed in the previous two sections that the two may simply be added together, giving results which are self-consistent and in agreement with experiment (see below). Over most of the map (cf. Table 4), one may imagine that about 4 kcal mol-1 is subtracted from the whole map. Such an exercise is purely formal, since it leaves the relative energies on the map unchanged. There is an indication in both the bridge and the left-hand sheet regions that the lowering of the energy should be about 1 kcal mol-1 less than this. Equivalently, these two regions of the map should be raised by about 1 kcal mol-1. However, our results suggest no substantial changes in the relative energies of the different parts of the map. This approximately agrees with experiment, since all the residues apart from glycine and proline appear to fit onto one and the same Ramachandran map. Hence, the map is very successful in giving a qualitative or semi-quantitative account of the conformation of the backbone. It must be remembered, of course, that meeting the requirements of the Ramachandran map is a necessary but not sufficient condition on the individual residues. Additional conditions, such as the formation of the α helical hydrogen bonds, must also be satisfied in the real molecule.
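The additive bookkeeping described here can be written as a few lines of arithmetic, following the worked example given in Appendix 1. In that example the α helical backbone lies at about +4 kcal mol-1 on the map, and χ1 = 180° lies about 4.5 kcal mol-1 below that point. This is a sketch only: the function names are hypothetical, and the conversion factor 1 hartree ≈ 627.5095 kcal mol-1 is the standard value, not one quoted in the paper.

```python
HARTREE_TO_KCAL = 627.5095     # kcal mol-1 per hartree (standard conversion factor)
E_ZERO_HARTREE = -447.96467    # absolute zero of energy: the C7 conformation (Appendix 1)


def total_relative_energy(map_kcal: float, side_chain_shift_kcal: float) -> float:
    """Assumes backbone and side chain are independent, so their energies simply add."""
    return map_kcal + side_chain_shift_kcal


def to_absolute_hartree(rel_kcal: float) -> float:
    """Convert an energy relative to the global zero into an absolute energy in hartrees."""
    return E_ZERO_HARTREE + rel_kcal / HARTREE_TO_KCAL


# Appendix 1 example: (phi, psi, chi1, chi2) = (-60, -40, 180, +60) degrees
rel = total_relative_energy(4.0, -4.5)
print(rel)                                   # -0.5
print(round(to_absolute_hartree(rel), 5))    # -447.96547
```

The additive form is valid only in the independent regime (χ1 = 180 or -60°). At χ1 = +60° the side-chain curve itself depends on the backbone point, so the correction must be read from the diagram for that particular backbone conformation.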
COMPARISONS WITH EXPERIMENTAL RESULTS
As far as the Ramachandran map of Fig. 2 is concerned, extensive comparisons with experiment are not necessary. The map closely resembles that of the alanine residue, and it has already been shown how well the latter map agrees with the experimental data for insulin, lysozyme and α chymotrypsin [1h]. It is interesting, however, to make two additional comparisons with the recently published data for elastase [6] and actinidin [7], and these results are shown in Figs. 3 and 4. The experimental and theoretical results clearly agree well: some 400 points all fall in the low-energy region to the left of the map, plus a few in the αL region to the right of the map. Moreover, there is a marked concentration of the points in the α helix and β sheet regions of the map. The glycine and proline residues are well known to fit a different map, which has been given elsewhere [1b, 1j]. There is one apparent exception in the bottom left-hand region of the elastase results (Fig. 3), where a point lies in the +30 kcal mol-1 region. The high value of this region on the present map is quite false, however. In the alanine map, where the side chain is staggered about its bond to the Cα atom (χ1 = 60°), this region is at about +10 to +12 kcal mol-1. That is high enough to prevent numerous residues appearing there, but not high enough to prevent the appearance of one or two. These results are clearly satisfactory.
The next step is to compare the results on the side-chain conformations (Figs. 5-11) with experimental data. Two kinds of comparison are possible here. One is a broad comparison in which the detailed forms of the side chains are not fully considered. The other is a specific comparison with the experimental results for a particular side chain and a specific backbone conformation. The former is useful for verifying the general validity of the computations, and the latter for the analysis of specific problems.
The broad comparison is made by reproducing in Figs. 12 and 13 the available histograms of the numerical values of the χ1 angles in α chymotrypsin [8] and elastase [9]. Both histograms show the clear preference for the staggered conformations of the side chain (χ1 = +60, 180, -60°), despite the varying nature of the side chains of the residues displayed. In α chymotrypsin, 64% of the residues have a value of χ1 within ±20° of the ideal staggered positions; in elastase this value is 91%. In addition, the computed results suggest that the χ1 values of 180 and -60° are generally of lower energy than the +60° value, and the experimental data of the histograms verify this point. Indeed, the preference for the χ1 values of 180 and -60° is very clear if serine and threonine are excluded from the comparison, since other computations show that these two residues are exceptional [10]. One can conclude that in many cases the torsion barriers do govern the conformations of the side chains, as both experiment and theory suggest. These torsion barriers are sometimes overridden by large non-bonded repulsions.
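The ±20° criterion used for these histogram statistics can be made concrete with a short sketch. The angular distance must be taken modulo 360°, so that, for example, -175° counts as close to 180°. The input is the ten χ1 values of Table 2, used purely as an example. The 64% and 91% figures quoted above come from the full histograms of refs. 8 and 9, which are not reproduced here.

```python
from typing import Iterable, Tuple

STAGGERED = (60.0, 180.0, -60.0)  # ideal staggered positions of chi1, in degrees


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_staggered(chi: float) -> Tuple[float, float]:
    """Return (ideal staggered position, distance) for the closest position."""
    return min(((s, angular_distance(chi, s)) for s in STAGGERED), key=lambda t: t[1])


def fraction_staggered(chis: Iterable[float], tol: float = 20.0) -> float:
    """Fraction of chi1 values lying within tol degrees of a staggered position."""
    values = list(chis)
    hits = sum(1 for c in values if nearest_staggered(c)[1] <= tol)
    return hits / len(values)


# chi1 values of Table 2 (left-handed helix region): elastase, then alpha chymotrypsin
table2_chi1 = [-51, -101, -71, -67, -75, -100, -20, -49, -58, -47]
print(fraction_staggered(table2_chi1))   # 0.7
print(nearest_staggered(-175.0))         # (180.0, 5.0)
```

With the tolerance widened (e.g. `tol=45.0`), all ten values fall within range of -60°, consistent with the qualitative statement that they are all close to -60°.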
Fig. 12. Histogram of the values of χ1 in α chymotrypsin from ref. 8.
Three examples of the detailed comparisons are selected. The first of these is a comparison of the χ1 values of Cα ethyl glycine with those of leucine in elastase. The other two are concerned with two major areas of the Ramachandran diagram, the α helix and the αL helical region. The object of these comparisons is to verify again that the computed values are correct, and to apply the results computed here to explicit detailed problems.
Leucine in elastase
Inspection of models of the isobutyl group as the side chain of leucine suggests that it may be represented adequately by the ethyl group for present purposes. The χ1 values of the leucine residues in elastase [6] are shown in Fig. 14. Two of the 18 residues are near the α helix region and both have a χ1 value near -60°. The computations of Fig. 5 show that 180° and -60° are the low-energy conformations of the side chain for the α helical region of the backbone. The remaining residues are within the β sheet region of the Ramachandran map. Five of these have a χ1 value around 180°, and these are close to the C7 region of the map. Reference to Fig. 7 shows that this value of χ1 is indeed the lowest-energy conformation within this region of the map. Ten of these residues have a χ1 value of -60°, and these are within the right-hand sheet region of the map. The results of Fig. 9 show that this value of χ1 is the side-chain conformation of lowest energy, although the χ1 value of 180° is also of low energy. Finally, no case of a χ1 value of +60° is found in the experimental data, in agreement with the computed results of the figures. There is therefore no disagreement whatever between experiment and theory in this example. In some cases, however, the theory predicts two low-lying conformations for the side chain, and only one of the two is populated in reality. This suggests that additional small factors are involved, whose nature is unknown at present.

Fig. 13. Histogram of the values of χ1 in elastase from ref. 9.

[Figure 14: leucine residues of elastase on the Cα ethyl glycine map; axes from -180° to 180°; contours in kcal mol-1]
Fig. 14. The χ1 values of the leucine residues in elastase superimposed on the Cα ethyl glycine map of Fig. 2. The heavy line dividing the map into two regions at the top is drawn for convenience and is not a theoretical construction.

The α helical region
The results for Cα ethyl glycine show that in this region of the Ramachandran map the +60° conformation of the side chain, where the side chain is folded over the main chain, is of high energy (+7.5 kcal mol-1). The α chymotrypsin results (Table 1) show [8] that this enzyme contains two regions of α helix (165-172 and 235-242). Of the 16 residues with side chains in these two helices, 15 have χ1 values of 180° (3 residues) or -60° (12 residues). The only ambiguous result is in the side chain of valine 235, where a χ1 value of +30° is reported [8]. However, this can be connected with the well-known result that residues near the ends of helices have unusual conformations [11]. In the two corresponding helices in elastase (Table 1), the situation is less clear. One of the two helices is involved with the methionine loop (164-177) and is somewhat irregular [6]. Nevertheless, a χ1 value of +60° is found only in four serine residues. It is known [10] that the χ1 values of serine and threonine residues sometimes differ from those of the simpler side chains (see Figs. 12 and 13) and are not modelled reliably by the ethyl group. With the exception of the serines, then, experiment and theory agree well. It may be argued, of course, that these examples (Table 1) are of actual α helices, and that the whole structure of the helix may be affecting the conformation of the side chains. This does seem to be a real possibility, which we hope to investigate later when some of the relevant computations have been carried out.
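The leucine comparison earlier in this section amounts to checking each observed χ1 against the set of low-energy side-chain conformations computed for the relevant backbone region. A minimal sketch of that bookkeeping follows. The region names are hypothetical labels, and the sets of low-energy χ1 values are transcribed from the discussion of Figs. 5, 7 and 9. The observations encode only the 2 + 5 + 10 leucine residues whose χ1 values are described in the text, at their nominal values. The ±30° tolerance is an arbitrary illustrative choice.

```python
# Low-energy chi1 values (degrees) per backbone region, as described in the text
LOW_ENERGY_CHI1 = {
    "alpha_helix": {180.0, -60.0},   # Fig. 5
    "C7": {180.0},                   # Fig. 7: lowest-energy conformation
    "right_sheet": {-60.0, 180.0},   # Fig. 9: -60 lowest, 180 also low
}


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def agrees(region: str, chi_obs: float, tol: float = 30.0) -> bool:
    """True if the observed chi1 lies within tol of a computed low-energy value."""
    return any(angular_distance(chi_obs, c) <= tol for c in LOW_ENERGY_CHI1[region])


# Leucine residues in elastase, as counted in the text (nominal chi1 values)
observed = ([("alpha_helix", -60.0)] * 2
            + [("C7", 180.0)] * 5
            + [("right_sheet", -60.0)] * 10)
n_agree = sum(agrees(region, chi) for region, chi in observed)
print(f"{n_agree} of {len(observed)} agree")           # 17 of 17 agree

# A chi1 of +60 would be flagged as disagreeing in every region listed here.
print(any(agrees(r, 60.0) for r in LOW_ENERGY_CHI1))  # False
```

This check only tests membership in the low-energy set. It cannot explain why, where two low-lying conformations are predicted, only one is populated; the text attributes that to additional small factors.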
TABLE 1
Numerical values of χ1 (°) in the helices of α chymotrypsin [8] and elastase [6]

α Chymotrypsin
Residue     φ (°)   ψ (°)   χ1 (°)
Asn 165      -57     -29      -61
Thr 166      -64     -50      -68
Asn 167      -69     -24      -41
Cys 168      -88     -20     -165
Lys 169      -75     -19      -59
Lys 170      -54     -28      -80
Tyr 171     -104     -54      -42
Trp 172      -94     -33      -56

Val 235      -58     -44      +30
Asn 236      -57     -31      -58
Trp 237      -75     -39     +158
Val 238      -61     -49     -171
Gln 239      -66     -31      -71
Gln 240      -78     -29      -87
Thr 241      -74     -62      -65
Leu 242      -65     -12      -29

Elastase
Residue     φ (°)   ψ (°)   χ1 (°)
Tyr 165      -60     -39     -172
Ala 166      -59     -42       -
Ile 167      -89     -33     +175
Cys 168      -78     -16     -166
Ser 169      -98     +11      +62
Ser 170      -78    +138      -59
Ser 170A     -52     -24      +59
Ser 170B     -77     -17     -180
Tyr 171     -109    -113      -78
Trp 172      -96     -24      -82

Val 231      -41     -33     +176
Ser 232      -72     -13      +77
Ala 233      -85     -21       -
Tyr 234     -109      -2      -62
Ile 235      -42     -42     +175
Ser 236      -62     -45     +168
Trp 237      -68     -40     +178
Ile 238      -64     -34     +167
Asn 239      -79     -38      -73
Asn 240      -66     -37      -74
Val 241      -73     -40     +165
Ile 242      -76     -25     +160
Ala 243      -90     -22       -
Ser 244      -95      +3      +62
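The residue count quoted for the two α chymotrypsin helices can be checked against Table 1: three residues near 180°, twelve near -60°, and Val 235 at +30°. The check assigns each χ1 to its nearest ideal staggered position. This is a sketch only; nearest-position assignment is an assumption about how such counts are made. Val 235 lands on the +60° position simply because +30° is closer to it than to -60°.

```python
from collections import Counter

STAGGERED = (60.0, 180.0, -60.0)  # ideal staggered positions of chi1, in degrees


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_staggered(chi: float) -> float:
    """Ideal staggered position closest to the given chi1 value."""
    return min(STAGGERED, key=lambda s: angular_distance(chi, s))


# chi1 (degrees) of the alpha chymotrypsin helices, transcribed from Table 1
helix_165_172 = [-61, -68, -41, -165, -59, -80, -42, -56]
helix_235_242 = [+30, -58, +158, -171, -71, -87, -65, -29]

counts = Counter(nearest_staggered(c) for c in helix_165_172 + helix_235_242)
print(dict(counts))   # -60: 12, 180: 3, +60: 1 (the +60 entry is Val 235)
```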
The left-handed (αL) helix region
Figure 11 shows that in this backbone conformation a χ1 value of +60° is of very high energy, while the -60° value is slightly lower in energy than the 180° value. In all 10 examples given in Table 2, the experimental value of χ1 is close to -60°. It is interesting that no value of 180° is found for χ1. With only 10 examples and with varying side chains, however, close agreement with experiment on this finer point cannot be expected. Should the point become important in the future, it may be checked by computations using the explicit form of the relevant side chains, with more extensive coverage of the Ramachandran map than is possible in the present broad survey.
To summarise the results on these three special cases: of over 60 examples, only four show clear disagreement between theory and experiment concerning the possible values of the angle χ1. All four involve serine, a residue which is known [10] to be anomalous. There is, however, the problem that the theory sometimes predicts more than one conformation of low energy, while only one such conformation is found to be populated in practice. This is probably a genuine limitation of the theory (see below).
TABLE 2
Numerical values of χ1 in the left-handed helix region of α chymotrypsin and elastase

Elastase
Residue (a)   φ (°)   ψ (°)   χ1 (°)
Tyr 101         48      24      -51
Asn 133         86       9     -101
Asn 148         72      21      -71
Asn 204         47      37      -67
Arg 223         76      12      -75

α Chymotrypsin
Residue (a)   φ (°)   ψ (°)   χ1 (°)
Asn 18          65      31     -100
Ile 99          76      10      -20
Asn 101         57      47      -49
Asn 204         45      31      -58
Cys 220         51      27      -47

(a) All of these residues occur in the left-hand helix region at about φ = +60°, ψ = +40°. The Ile 99 example corresponds to both the methyl and the ethyl groups pointing away from the backbone, as in Fig. 11.
CONCLUSIONS
The most significant result of this work is the inference that the conformations of the side chains are determined by the nature and conformation of their own backbones. They are not determined by the secondary or tertiary structure of the protein, the presence or absence of particular water molecules, or similar external effects. The resulting simplification of the entire picture of protein structure is significant. Individual side chains may, of course, depart from this generalisation, and then the particular effects operating will have to be investigated.
It was mentioned above that the theory seems to have a genuine limitation when the energy differences between two or more conformations are as small as 1 kcal mol-1, since differences this small cannot be computed reliably. Fortunately, however, this kind of problem is readily resolved experimentally. The simple Boltzmann distribution shows that both conformations are then likely to be appreciably populated. On the other hand, when the energy difference is, e.g., 5 kcal mol-1, the theory is expected to be effective in demonstrating such a large energy gap. Experimentally, by contrast, it is very much more difficult to detect the presence of very small populations. This is an interesting example of the complementary nature of theory and experiment. Naturally, a great deal more experience and critical testing of the theory is required before the accuracy of the computed results is known. The borderline between accuracy and inaccuracy can be tentatively estimated as lying in the region of 2 kcal mol-1. This is consistent with some relevant small-molecule results [5].
The major explicit result of this work is the demonstration of how the torsion barriers and the non-bonded repulsions combine to control the conformation of the side chain of the Cα ethyl glycine dipeptide.
The question as to whether the side chain points away from the backbone or not is crucial in determining the balance between torsion barriers and non-bonded repulsions in this context.
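The Boltzmann argument in these conclusions can be quantified with a two-state model. The sketch below assumes a temperature of 298.15 K and equal degeneracies for the two conformations; both are assumptions, not values stated in the text. It uses the standard gas constant R = 1.987204 × 10-3 kcal mol-1 K-1.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant, kcal mol-1 K-1


def upper_state_fraction(delta_e_kcal: float, temperature_k: float = 298.15) -> float:
    """Fraction of molecules in the higher of two conformations separated by delta_e.

    Two-state Boltzmann model with equal degeneracies (an assumption).
    """
    ratio = math.exp(-delta_e_kcal / (R_KCAL * temperature_k))
    return ratio / (1.0 + ratio)


for gap in (1.0, 2.0, 5.0):
    print(f"gap {gap:.0f} kcal/mol -> upper-state fraction {upper_state_fraction(gap):.4f}")
```

Under these assumptions, a 1 kcal mol-1 gap leaves roughly 16% of the molecules in the upper conformation, while a 5 kcal mol-1 gap leaves only about 0.02%. This illustrates why the small-gap case is easy to observe experimentally but hard to compute, and the large-gap case the reverse.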
The use of the ethyl group as a model side chain seems to be effective. It is obvious, however, that side chains which are able to form hydrogen bonds will be materially different from the ethyl group, and this is known to be so for serine and threonine. The same is probably true for aspartic acid and asparagine. It is perhaps not true for glutamic acid and glutamine, where there are indications that the hydrocarbon chain dominates the conformational situation. One problem of this work is that an element of subjective judgement is involved in defining the α helix and bridge regions of the Ramachandran diagram, and similarly the β sheet region at the top of the map. Experience so far suggests that this is not a serious difficulty, but it is a source of uncertainty and there is no apparent way of avoiding the problem.
Finally, the comparisons with experiment reported here are simply illustrative of the effectiveness of the theory; they are not intended as an analysis of the data presently available. That task may be carried out at a later date, when the overall situation concerning the conformation of protein molecules is understood in greater depth. For example, obeying the requirements of the Ramachandran diagram is a necessary but not sufficient condition on the residues of the whole system. Plainly, additional requirements must be satisfied before the overall conformation of the entire protein molecule is determined. One such requirement is the α helical hydrogen bond, which is not accounted for in the Ramachandran diagram but whose existence has been investigated [1g]. When all the conditions which govern the conformations of the residues are established, a comprehensive comparison with experimental data may be attempted.

APPENDIX 1
The absolute zero of energy to which all values given are ultimately referred is the computed energy of the C7 conformation of Cα ethyl glycine with CαCβ eclipsed and CβCγ staggered (φ = -80°, ψ = +80°, χ1 = -120°, χ2 = +60°). These values of χ1 and χ2 define the standard conformation of the side chain used in the text. The computed value of this energy zero is -447.96467 hartrees. It is often convenient to use the energy of the relevant point on the Ramachandran map as the local zero for Figs. 5-11, which show how the total energy varies with the value of χ1. The dihedral-angle convention used throughout this work is the IUPAC-IUB version [12]. The standard geometries used for Cα ethyl glycine are the same as those used in earlier work [1b, 1h]; they are set out in Table 3 for convenient reference. The terms syn, gauche, trans etc. have not been used in this work since they are found to be more confusing than helpful in these complicated molecules. To determine the absolute energy of any given overall conformation (φ, ψ, χ1, 60°), the energy of the point φ, ψ can be read off the Ramachandran diagram (Fig. 2) and the energy from the appropriate χ1, 60° diagram added on. For example, the α-helical region of the backbone is at +4 kcal mol⁻¹ and the point χ1 = 180° is about 4.5 kcal mol⁻¹ below this, so the overall energy of the point (-60, -40, 180, +60°) is -0.5 kcal mol⁻¹ relative to the absolute zero (-447.96467 hartrees). The energy may of course be read directly from the χ1 diagram, but then the connection with the Ramachandran map is not self-evident. In fitting the points onto the Ramachandran maps of Figs. 3, 4 and 14, the positions of the points were frequently adjusted slightly so that the maps are clear and easily read. Consequently, the values of the dihedral angles read back from these maps may differ slightly from the recorded values.

TABLE 3
Fixed geometrical parameters in Cα ethyl glycine. The numbering is given in Fig. 1. Details of the sources are given in refs. 1a and 1b. If two or more lengths or angles are equivalent, only one value is reported. (? = atom index illegible in the source.)

Bond lengths (Å)
C1C2 1.519   C2O5 1.220   C2N6 1.380   C1H? 1.124   N6H? 1.022   N6H? 1.022   C1N9 1.459   N9H10 1.022
Further bond lengths (atom labels illegible in the source): 1.380, 1.220, 1.110, 1.519, 1.530, 1.124, 1.124

Bond angles (°)
C1C2O5 127   C1C2N6 113   (label illegible) 120   (label illegible) 113
C1, C?, C?: all angles tetrahedral (109.5)
N6, N9: all angles trigonal (120)

TABLE 4
Parameters of the selected regions of the Ramachandran map of Cα ethyl glycine. The absolute energies are in hartrees, the relative energies in kcal mol⁻¹ from the C7 region. The side chain is in the standard conformation (χ1 = -120°, χ2 = +60°). These points are displayed in Fig. 2 (cf. the text on the point that the relative energies are effectively unchanged whether the side chain is in the standard conformation or in the conformation of minimum energy).

Region   Conformation   Absolute energy (hartree)   Relative energy (kcal mol⁻¹)   φ (°)   ψ (°)
i        α-Helix        -447.95847                   3.90                           -60     -40
ii       Bridge         -447.95573                   5.61                           -100    0
iii      C7             -447.96467                   0.0                            -80     +80
iv       Left sheet     -447.96152                   1.98                           -150    +150
v        Right sheet    -447.95915                   3.45                           -80     +140
vi       Extended       -447.96033                   2.72                           -180    -180
vii      αL helix       -447.95713                   4.73                           +40     +60
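The relative energies in Table 4 and the additive recipe of Appendix 1 can be checked with a few lines of arithmetic. The sketch below converts the tabulated absolute energies to kcal mol⁻¹ relative to the C7 zero, using the standard conversion 1 hartree = 627.5095 kcal mol⁻¹ (a value not stated in the paper), and then repeats the worked example for the point (-60, -40, 180, +60°). The function and variable names are hypothetical. The computed values agree with the tabulated ones to within about 0.01 kcal mol⁻¹; the small differences presumably reflect rounding of the absolute energies to five decimals.

```python
# Standard conversion factor (assumption: not quoted in the paper)
HARTREE_TO_KCAL = 627.5095

# Absolute energies (hartrees) of the Table 4 regions, with the side
# chain in the standard conformation (chi1 = -120, chi2 = +60 degrees).
TABLE4_HARTREE = {
    "alpha helix": -447.95847,
    "bridge": -447.95573,
    "C7": -447.96467,
    "left sheet": -447.96152,
    "right sheet": -447.95915,
    "extended": -447.96033,
    "alphaL helix": -447.95713,
}

# The absolute zero of energy used throughout Appendix 1
E_ZERO = TABLE4_HARTREE["C7"]


def relative_kcal(e_hartree: float, e_zero: float = E_ZERO) -> float:
    """Energy relative to the C7 zero, converted to kcal/mol."""
    return (e_hartree - e_zero) * HARTREE_TO_KCAL


def overall_energy(backbone_kcal: float, chi1_offset_kcal: float) -> float:
    """Additive estimate from Appendix 1: the backbone energy read off the
    Ramachandran map plus the change read off the chi1 diagram
    (negative when the chi1 point lies below the local zero)."""
    return backbone_kcal + chi1_offset_kcal


relative = {name: round(relative_kcal(e), 2) for name, e in TABLE4_HARTREE.items()}
print(relative)

# Worked example: alpha-helical backbone at about +4 kcal/mol, chi1 = 180
# lying about 4.5 kcal/mol below that local zero.
print(overall_energy(4.0, -4.5))
```

Keeping the two terms separate, rather than reading one number from the χ1 diagram, preserves the connection with the Ramachandran map that the appendix points out.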
TABLE 5
Torsion barriers about the CαCβ bond in alanine. The total energy was computed at χ1 = 0, 30 and 60°, and the energies quoted are the differences between the 0° and 60° values. The staggered conformation (χ1 = 60°) is invariably the lowest-energy point. Energies in kcal mol⁻¹.

Region        Barrier
α-Helix       4.52
Bridge        4.52
C7            3.58
Left sheet    4.58
Right sheet   4.71
Extended      3.13
αL helix      3.25
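The Table 5 barriers are differences between energies computed at χ1 = 0° (eclipsed) and 60° (staggered). One common way to interpolate between such points, used here purely as an illustrative assumption and not as a procedure from the paper, is an idealised threefold cosine potential V(χ) = (V3/2)(1 + cos 3χ), which has its maximum V3 at χ = 0° and falls to zero at χ = 60°. The sketch evaluates this model with the tabulated barriers and reports their range; `threefold_torsion` is a hypothetical name.

```python
import math

# Barriers about the CalphaCbeta bond of alanine from Table 5:
# E(chi1 = 0) - E(chi1 = 60), in kcal/mol.
BARRIERS = {
    "alpha helix": 4.52,
    "bridge": 4.52,
    "C7": 3.58,
    "left sheet": 4.58,
    "right sheet": 4.71,
    "extended": 3.13,
    "alphaL helix": 3.25,
}


def threefold_torsion(chi_deg: float, v3: float) -> float:
    """Idealised threefold torsion potential (illustrative assumption):
    maximum v3 at the eclipsed chi = 0 and zero at the staggered chi = 60."""
    return 0.5 * v3 * (1.0 + math.cos(math.radians(3.0 * chi_deg)))


lo, hi = min(BARRIERS.values()), max(BARRIERS.values())
print(f"barrier range: {lo:.2f}-{hi:.2f} kcal/mol")

# Model energies at the three computed points, chi1 = 0, 30 and 60 degrees
for region, v3 in BARRIERS.items():
    profile = [round(threefold_torsion(chi, v3), 2) for chi in (0, 30, 60)]
    print(region, profile)
```

The model predicts that the χ1 = 30° energy lies halfway up the barrier; comparing that prediction with computed 30° energies would show how far the real profile departs from a pure threefold form in each backbone region.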
APPENDIX 2
The torsion barrier of the methyl group in alanine was partly established in earlier work [1h], but for present purposes greater detail was required. The additional data are shown in Table 5. The size of this barrier varies in the range 3.1-4.7 kcal mol⁻¹, depending on the backbone conformation. It is interesting to note that the larger barriers correspond to the populated regions of the Ramachandran map and the lower barriers to the unpopulated regions. It is possible that these short-basis (STO-3G) computations underestimate the non-bonded repulsions between the methyl group and the backbone; the point will be clarified when it becomes possible to carry out long-basis computations on these molecules (cf. ref. 1d).

REFERENCES
1 D. Peters and J. Peters, J. Mol. Struct., 60 (1978) 133 (a) (Part 1); 53 (1979) 103 (b) (Part 2); 62 (1980) 229 (c) (Part 3); 64 (1980) 103 (d) (Part 4); 68 (1980) 243 (e) (Part 5); 68 (1980) 255 (f) (Part 6); 69 (1980) 249 (g) (Part 7); 85 (1981) 107 (h) (Part 8); 85 (1981) 257 (j) (Part 9); 85 (1981) 267 (k) (Part 10); 87 (1982) 341 (l).
2 C. Ramakrishnan and G. N. Ramachandran, Biophys. J., 5 (1965) 909.
3 B. Pullman and A. Pullman, Adv. Protein Chem., 28 (1974) 348.
4 W. J. Hehre, W. A. Latham, R. Ditchfield, M. D. Newton and J. A. Pople, GAUSSIAN 76, QCPE 368, Indiana University, Bloomington, Indiana, U.S.A., 1976.
5 See, for example, D. R. Truax and H. Weiser, Chem. Soc. Rev., 5 (1976) 411.
6 L. Sawyer, D. M. Shotton, J. W. Campbell, P. L. Wendell, H. Muirhead, H. C. Watson, R. Diamond and R. C. Ladner, J. Mol. Biol., 118 (1978) 137.
7 E. N. Baker, J. Mol. Biol., 141 (1980) 441.
8 J. J. Birktoft and D. M. Blow, J. Mol. Biol., 68 (1972) 187.
9 Ref. 6, p. 185.
10 D. Peters and J. Peters, unpublished results.
11 T. L. Blundell and L. N. Johnson, Protein Crystallography, Academic Press, New York, 1976, p. 41.
12 IUPAC-IUB Commission on Biochemical Nomenclature, J. Mol. Biol., 52 (1970) 1.