Quantum theory of the structure and bonding in proteins


Journal of Molecular Structure, 62 (1980) 229-247
© Elsevier Scientific Publishing Company, Amsterdam - Printed in The Netherlands

QUANTUM THEORY OF THE STRUCTURE AND BONDING IN PROTEINS

Part 3. The tripeptide

DAVID PETERS and JANE PETERS

The Bourne Laboratory, Department of Chemistry, Royal Holloway College, University of London, Egham, Surrey (Gt. Britain)

(Received 10 July 1979)

ABSTRACT

It is shown that ab initio computations using the GAUSSIAN 70 package are able to identify the two C10 hydrogen bonds which are formed by 1:3 interactions in the tripeptide. The type II hydrogen bond is about twice as strong as the type I bond (4.2 vs. 2.3 kcal mol⁻¹). This is presumably because the former hydrogen bond is straight and the latter is bent. Such hydrogen bonds are often thought to be the stabilising factor in the formation of the bends in protein chains. It is also shown that many of the features found in the dipeptide, particularly the C5 and C7 hydrogen bonds, occur again in the tripeptide with virtually the same energy and geometry. This confirms the view, based on chemical experience, that in a first approximation the tripeptide may be viewed as the sum of three amide units plus small correction terms.

INTRODUCTION

In this third paper of a series [1, 2] on the ab initio computation of the structure and properties of small peptides, we discuss the tripeptide of Fig. 1. We use the GAUSSIAN 70 package with a short basis set in order to keep the amount of computer time within reasonable limits (see below). This tripeptide is the simplest possible such molecule and contains two glycine residues as its central structure. Such a molecule is tiny by biochemical standards, but it is large enough to exemplify several different kinds of hydrogen bond, two of which, the C7 and the C10, are often thought to be the stabilising factor in the bends which occur in protein chains. In this sense, the ab initio computations are of immediate biological significance. Notice that, although we speak of 1:3 interactions because they involve the first and third peptide units, these C10 hydrogen bonds are often called 1:4 because they involve the first and fourth amino acid residues. In this work on the tripeptide, we are unable to be as exhaustive as in the dipeptide work of ref. 2, because there we needed only 150 computations to construct the entire conformal energy map. To do the same thing in the


Fig. 1. The tripeptide. The molecule is obtained by adding another amino acid residue to the right-hand end of the dipeptide in the natural way. There are two amino acid residues in the molecule, plus the HCO on the left and the NH2 on the right. The labelling φ2, ψ2; φ3, ψ3 is conventional [6] and is used despite the fact that only fragments of the 1 and 4 amino acids are present. The labelling L/M/R for left, middle and right refers to the three peptide units. It is convenient to use the central plane (the symmetry plane of the molecule in the extended form shown in this figure) as the reference plane for the molecule in all conformations. The terms "above" and "below" refer to this plane. The bond lengths and angles used are those given in the legend of Fig. 1, ref. 2 [2].

tripeptide would require 10⁴ computations. Such a tour de force was achieved by Maigret and Pullman in their work [3] on the tripeptide using the PCILO method, but to do the same thing in an ab initio context would require a prohibitively large amount of machine time. We were therefore forced to modify the approach by using the dipeptide results as a starting point and then, with the help of models and chemical experience, reducing the tripeptide problem into the realms of possibility. The same general approach should be effective with the larger peptides. Within this restriction, however, we follow the methods of ref. 2 as closely as possible.

We first give evidence to show that the tripeptide is, in a first approximation, three separate peptide units joined together, as generally assumed. This point is not trivial because the molecular orbital method treats a molecule as a collection of fixed nuclei and a cloud of electrons, and there is no preconceived idea about the nature of subunits within the molecule. This is unnatural from a chemical point of view [4], but one of the strengths of conventional molecular orbital theory is that it is free from any preconceptions.

The next step is to examine the relationship between the tripeptide and the dipeptide. This is achieved by plotting the total energy of the tripeptide against the total energy of two dipeptides with matching conformations. We find a straight line with unit (45°) slope in all cases, with the exception of the clearly established 1:3 interactions, such as the C10 hydrogen bonds. It follows immediately that all the effects which occur in the dipeptide appear unchanged in the tripeptide; otherwise there would be departures from this line.

In the next part, we look for the plateau region of the tripeptide just as we did in the dipeptide. Again, we cannot do this exhaustively in the tripeptide, so we begin with the plateau regions of the two dipeptides and compute the total energy of the tripeptide within this region. The plateau, which is a four-dimensional surface in a five-dimensional space, seems to exist with about the same degree of accuracy as that in the dipeptide.


Finally, we search for the C10 hydrogen bonds. There are three such possible bonds, of which we have examined two. The unexamined hydrogen bond is the helical case, which will be dealt with later. The other two prove to be typical COHN hydrogen bonds with one (type II) about twice as strong as the other (type I). There are two short sections containing preliminary comparisons with experiment and some notes referring to the dipeptide work.

THE TRIPEPTIDE AND THE THREE MONOMERS

The object of this section is to show that, in a first approximation, the tripeptide is indeed a tripeptide, i.e. three peptide units joined together. The most direct and comprehensive way of doing this would be to optimise all the bond lengths and bond angles in the tripeptide at all conformations and then show that these are the same as those in the monomers of the appropriate conformation. This exercise is far beyond our resources, so we have simply selected the bond length which seems most sensitive to environmental effects (the NC bond of the amide unit [1]) and computed its optimised value in each case. The results for the tripeptide, the dipeptide and the monomers are shown in Table 1. It is clear that the optimised values of these NC bond lengths differ little in going from the monomers to the dipeptide and tripeptide. This suggests (but does not prove) that the electron organisation within the amide unit, and particularly the π electrons of the NC bond, are very alike in all three situations. Thus there is no evidence for extensive delocalisation of the π electrons throughout the dipeptide and tripeptide. The result does not rule out the possibility of an amount of delocalisation throughout the molecules sufficiently large to affect other properties, such as the dipole moment.

Another way of showing the similarity between the tripeptide and the monomers is to examine the populations of the atoms in the molecule (Fig. 2). The uniformity among these populations is clear. It is interesting to notice in passing that the populations alter when the atom attached to the atom in question is changed, and also that the populations remain the same when the atom one further removed is changed.

TABLE 1

Optimised NC bond lengths (Å) in mono-, di- and tripeptidesᵃ

Molecule                         NC bond length
Monopeptide
  Acetamide                      1.408
  N-methylformamide (trans)      1.403
Dipeptide (L/R)ᵇ
  Extended                       1.409/1.409
  C7 H-bond                      1.408/1.408
Tripeptide (L/M/R)ᵇ
  Extended                       1.410/1.408/1.409
  Double C7 H-bondᶜ              -/1.409/-

ᵃ Cf. Table 2 of ref. 2.
ᵇ L/M/R is left/middle/right as in Fig. 1. The experimental values of the NC bond length [20] are in the range 1.32-1.35 Å.
ᶜ The double C7 hydrogen bond has one above and one below the plane of Fig. 1.

Fig. 2. Populations in the tripeptide, the dipeptide and the monomers. All molecules are in the extended form (-180, -180). Dipeptide and monomer results are from refs. 1 and 2. [Structural diagrams with atomic populations not reproduced.]

Finally, further evidence that the tripeptide is in a first approximation just three monomers is given in the following section, where we see how closely the tripeptide resembles the dipeptide, which in turn resembles the two monomers [2]. With no evidence to the contrary, we will then assume, in line with chemical experience and the X-ray results, that the tripeptide is simply three monomers joined together in the usual way.

THE TRIPEPTIDE AND THE DIPEPTIDE

We seek to show now that the dipeptide unit is contained within the tripeptide. This point is altogether more subtle than the question of the monomers in the tripeptide of the last section, because serious distortion of the monomer units would require large amounts of energy (ca. 10 kcal mol⁻¹) to lengthen bonds and change bond angles substantially, while large changes in the conformation of the dipeptide may be achieved with relatively small amounts of energy (ca. 1 kcal mol⁻¹), as previously shown [2, 5]. We find that the best way of showing the dipeptide to be present in the tripeptide is simply to plot the total energy of the tripeptide against the sum of the total energies of two dipeptide units of the appropriate conformation. An excellent straight line with unit (45°) slope results. There is, of course, a large constant term for the extra atoms in the two dipeptides. The graph is shown in Fig. 3. This graph shows that whatever happens in the dipeptide also happens in

Fig. 3. The 45° line. This is drawn independently of the points and is not a fit to the points in the usual way. The points of low energy are top right. The crosses represent the type II hydrogen bond and the squares type I. Of the circles, the four points labelled A are double hydrogen bonds (C5 or C7), the group of some 25 points contains one C5 or C7 hydrogen bond and the group of some 20 points labelled C are plateau points. Points which correspond to 1:3 interactions must lie above the line if the interactions are attractive and below the line if they are repulsive. [Plot not reproduced; the horizontal axis is the energy of two dipeptides.]

the tripeptide. The graph covers several regions: plateau, extended forms, hydrogen bonds and non-bonded repulsions. In all regions save one there is an excellent fit to the line. Departures of about 0.5 kcal mol⁻¹ are very clear. It then follows that the C7 hydrogen bond, for example, occurs in the tripeptide just as in the dipeptide (see below). Further confirmation is provided by the fact that the departures from the line occur in just the regions where 1:3 interaction might be expected and, in particular, where we find (see below) that the C10 hydrogen bonds exist. We return to this point in the section on C10 hydrogen bonds, which are represented by crosses and squares in Fig. 3.

More evidence that the dipeptide unit is present in the tripeptide may be obtained by optimising the geometry of the tripeptide in the region of the C7 hydrogen bond and checking that this is the same as the optimised geometry of the same region of the dipeptide. There are four possible double C7 hydrogen bonds in the tripeptide in the general case, but with our molecule these form two pairs which are mirror images (or, to be exact, pseudo mirror images which differ in the exchange of two hydrogen atoms). These two different double C7 hydrogen bonds may be defined as one above, one below and as both above the plane of Fig. 1. We have optimised the geometry of the former case and the results are shown in Table 2. It is clear that both the C7 hydrogen bonds occur at virtually the same φ, ψ angles as in the

TABLE 2

Optimised φ, ψ angles (degrees) in the double C7 hydrogen-bonded tripeptideᵃ

Angle    Dipeptideᵇ    Tripeptide
                       L (below)    R (above)
φ        86.2          85.4         85.7
ψ        -70.4         -70.8        -69.5

ᵃ The optimisation was carried out in four steps by optimising each of the four angles in turn in the natural sequence. The terms "above" and "below" refer to the plane of Fig. 1.
ᵇ Data from ref. 2.

dipeptide [2]. The local geometry of these two hydrogen bonds is thus that given in Fig. 5 of ref. 2. We have further examined the populations in the single and double C7 hydrogen bond conformations of the tripeptide (Table 3). We see essentially the same result as in the dipeptide, with the hydrogen donor (H5 or H19, or both) losing about 0.02 to 0.03 electrons and the oxygen atom (O2 or O12, or both) gaining electrons, although to a lesser extent.

The above evidence shows that the basic concept of dipeptide units within the tripeptide is valid and that, if the tripeptide conformation departs from that of the dipeptide, factors external to the dipeptide unit must be responsible for the difference. Such external factors include solvation effects, crystal forces and hydrogen bonding from another part of the peptide chain. The point that the dipeptide unit survives unchanged in the tripeptide molecule was established in the PCILO work [3] in the sense that "the general contour of stability of each residue (within, say, the usual limit of 5 kcal mol⁻¹ above the global minimum) remains essentially unchanged". The limit of 5 kcal mol⁻¹ is rather large, since nearly all the effects with which we are dealing are smaller than this, though the general conclusion agrees with ours.
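The additivity test used in this section can be sketched in a few lines. This is only an illustration under stated assumptions: the function name is hypothetical, the energies in the usage lines are invented placeholders and are not values from this paper, and the line is fixed through one chosen reference point rather than drawn as in Fig. 3. The hartree to kcal mol⁻¹ factor is the standard conversion.

```python
HARTREE_TO_KCAL = 627.5095  # standard conversion, kcal/mol per hartree

def deviations_from_unit_slope(e_tri, e_di_sum, reference=0):
    """Deviation (kcal/mol) of each point from a line of unit slope.

    e_tri    : total energies of the tripeptide (hartree), one per conformation
    e_di_sum : summed total energies of the two matching dipeptides (hartree)
    reference: index of the point used to fix the large constant term

    A negative deviation means the tripeptide is more stable than the two
    dipeptides predict, i.e. an attractive 1:3 interaction.
    """
    const = e_tri[reference] - e_di_sum[reference]
    return [(t - d - const) * HARTREE_TO_KCAL for t, d in zip(e_tri, e_di_sum)]

# Hypothetical energies (hartree), chosen only to exercise the function.
e_tri = [-574.9400, -574.9350, -574.9417]
e_di_sum = [-741.6100, -741.6050, -741.6100]
dev = deviations_from_unit_slope(e_tri, e_di_sum)
```

With these placeholder numbers the second point lies on the line and the third lies about 1 kcal mol⁻¹ off it, which is the kind of departure the text attributes to 1:3 interactions.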

THE TRIPEPTIDE PLATEAU

In the dipeptide work, we showed that over large areas of the conformal map the total energy is effectively a constant and we called this region the plateau. We cannot carry out the same exercise for the tripeptide in a direct fashion because the number of computations would be very large, so we have used the results of the last section to provide an initial basis for the problem. We assume, in effect, that the tripeptide plateau will lie within both dipeptide plateaux. The results of some twenty computations over this region of the five-dimensional space covered an energy range of about 4 kcal mol⁻¹; this range is too large for a useful plateau. It is true, however, that the dipeptide plateau shows a scatter of some 1-1.5 kcal mol⁻¹, so with two such units within the


tripeptide we must expect about double this scatter, and this is what we actually find. In this situation, we are forced to change the definition of the plateau slightly by subtracting the energy of the two dipeptides from that of the tripeptide, thus cancelling out the dipeptide effects. The result is shown in Fig. 4, using the same scale as for the hydrogen bonds. All the points but two are now reasonably constant over about 1 kcal mol⁻¹. This is satisfactory in itself, but is only achieved at the price of using a more complicated definition of the plateau, with the concomitant loss of some simplicity and clarity.

THE C10 HYDROGEN BONDS AND 1:3 EFFECTS IN THE TRIPEPTIDE

We are concerned here with those effects which occur in the tripeptide and not in the dipeptide, and in particular with 1:3 interactions. We ignore for the moment the question of 1:2:3 interactions. The most interesting of the 1:3 interactions are the C10 hydrogen bonds (also called 1-4 or i to i + 3 hydrogen bonds when the amino acid is used as the basic unit). Inspection of simple models suggests that there are three possible such hydrogen bonds, one of which is helical (III) and two of which are non-helical (I and II). The labelling is that used by Venkatachalam [6] and Crawford et al. [7]. It was pointed out [6, 7] that if the signs of all four angles are reversed there will be created another three molecules whose local geometry within the hydrogen bond will be the same as in the initial three molecules. The three new molecules are labelled I', II' and III'. They are not mirror images of I, II and III in the usual sense of the term mirror image. This is because the molecules I and I', for example, differ by a reflection in the plane of Fig. 1 plus the interchange of the two atoms on the α carbon atoms. Even with two hydrogen atoms on the α carbon atom, as in our tripeptide, the molecules I and I' are still not mirror images because, if one labels the two hydrogens as, say, a and b, one must interchange this pair of hydrogen atoms as well as carrying out the rotations about φ and ψ in order to create a mirror image. Nevertheless, the primed molecules are physically identical with the unprimed counterparts in our molecule, so we have only three physically distinct molecules to examine. We shall deal with the helical molecule (III) elsewhere, in company with other helical cases, so we have two C10 hydrogen bonds to consider. We have worked with the molecules I' and II rather than I and II (because the former pair are both above the plane of Fig. 1) but we shall refer to them as I and II. Both of these C10 hydrogen bonds reverse the direction of the chain in a protein and are sometimes called reverse turns [7].

In the search for the C10 hydrogen bond, we first inspected models and used the package itself to calculate the local geometry within the hydrogen bond. We also used the dipeptide maps to ensure that the total energy within the dipeptide units is not too high. Using these three processes as filters, we were able to narrow down the region of search for the hydrogen bond to practical limits over a given volume of the five-dimensional space.
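The two angle manipulations mentioned here, reversing the signs of all four angles to obtain a primed molecule and converting between the 1970 IUPAC convention and the earlier one, can be illustrated with a short sketch. The function names are hypothetical. The 180° offset between the two conventions is an assumption that is consistent with the paired values listed in Table 4; the literature angles used in the usage lines are taken from that table.

```python
def reverse_signs(angles):
    """Angles (phi2, psi2, phi3, psi3) of the primed molecule, e.g. I -> I'."""
    return tuple(-a for a in angles)

def to_earlier_convention(angles):
    """Convert 1970 IUPAC angles (degrees) to the earlier convention.

    Assumes the two conventions differ by 180 degrees, with the earlier
    values quoted in the range 0 to 360.
    """
    return tuple((a + 180) % 360 for a in angles)

# Literature values from Table 4 (1970 IUPAC convention, degrees).
type_i = (-60, -30, -90, 0)
type_ii = (-60, 120, 90, 0)

type_i_primed = reverse_signs(type_i)
type_i_old = to_earlier_convention(type_i)
type_ii_old = to_earlier_convention(type_ii)
```

Applied to the literature values, the conversion gives the parenthesised entries of Table 4 for types I and II.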

TABLE 3

Tripeptide populations for various conformationsᵃ

          A           B           C           D           E           F
Total energy of molecule
          -574.94610  -574.93246  -574.93446  -574.94649  -574.94446  -574.94572
C1        5.693       5.694       5.695       5.690       5.706       5.682
O2        8.292       8.272       8.284       8.305(A)    8.294       8.303(A)
C3        6.009       6.012       6.010       6.011       6.011       6.009
N4        7.370       7.369       7.365       7.378       7.383       7.365
H5        0.792       0.817       0.808       0.778(D)    0.772(D)    0.799
C6        6.009       6.011       6.010       6.013       6.010       6.011
N7        7.368       7.366       7.366       7.363       7.364       7.368
H8        0.922       0.896       0.896       0.911       0.913       0.920
H9        0.922       0.938       0.937       0.919       0.922       0.919
H10       0.794       0.810       0.809       0.798       0.801       0.794
C11       5.755       5.750       5.750       5.743       5.745       5.754
O12       8.277       8.266       8.267       8.286(A)    8.281(A)    8.275
H13       0.944       0.941       0.942       0.932       0.936       0.941
C14       5.687       5.694       5.689       5.695       5.698       5.695
H15       0.923       0.941       0.916       0.918       0.929       0.920
H16       0.923       0.897       0.927       0.922       0.927       0.914
O17       8.287       8.278       8.290       8.293       8.284       8.290
N18       7.433       7.436       7.432       7.449       7.434       7.448
H19       0.806       0.814       0.808       0.787(D)    0.808       0.787(D)
H20       0.794       0.799       0.799       0.809       0.797       0.805
R         23.007      23.021      23.018      23.033      23.011      23.025
M         22.147      22.152      22.152      22.151      22.155      22.149
L         23.138      23.133      23.134      23.122      23.127      23.132

ᵃ The numbering sequence is that of Fig. 1. R/M/L is right/middle/left; R is CONH2, M is CONH and L is CHONH. The identifying labels A, B ... are as follows (angles in the sequence φ2, ψ2, φ3, ψ3 (Fig. 1)): A is -180, -180, -180, -180 (extended); B is -100, -100 (plateau); C is -100, -100, 90, -140 (plateau); D is -85.4, 69.2, -80, -100, 84.4, -69.5 (double C7 H-bond); E is -85.4, 70.8, -180, -180 (C7 H-bond plus extended); F is -180, -180, -85.6, 69.5 (extended plus C7 H-bond); G is 85.4, -70.8, 85.6, -69.5 (double C7 H-bond); H is -100, -100, -85.6, 69.5 (plateau plus C7 H-bond); I is -180, -180, -100, -100 (extended-plateau); J is -80, -80, -180, -180 (plateau-extended); K is -60, 120, 129.0, -30 (type II C10 H-bond); L is -60, -30, -120, 45 (type I C10 H-bond). The total energy of the molecule is given at the top of the table (a.u.).
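The population shifts discussed in the text can be read off Table 3 by differencing two columns. The sketch below does this for the double C7 hydrogen-bonded conformation (column D) against the extended form (column A). The choice of the extended form as the baseline is an assumption made here for illustration; the text does not say which conformation serves as the reference, and the size of the shift depends on that choice. The helper name is hypothetical.

```python
# Populations read from Table 3 for the donor hydrogens and acceptor oxygens.
extended = {"H5": 0.792, "H19": 0.806, "O2": 8.292, "O12": 8.277}   # column A
double_c7 = {"H5": 0.778, "H19": 0.787, "O2": 8.305, "O12": 8.286}  # column D

def population_change(conformation, reference):
    """Change in population (electrons) of each atom relative to a reference."""
    return {atom: round(conformation[atom] - reference[atom], 3)
            for atom in conformation}

change = population_change(double_c7, extended)
```

With this baseline the donor hydrogens lose between 0.01 and 0.02 electrons and the acceptor oxygens gain about 0.01, the same direction of change as described in the text.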

Starting with the type II bond, we then carried out some fifty computations over this region and were able to construct the Morse-like curve of Fig. 5 for the energy of the hydrogen bond as a function of the distance between the two heavy atoms. Such a curve could not be constructed for the dipeptide because the necessary flexibility of the whole molecule is lacking. As far as we are aware, this is the first demonstration of 1:3 intramolecular

TABLE 3 (continued)

          G           H           I           J           K           L
Total energy of molecule
          -574.94625  -574.93977  -574.93917  -574.93959  -574.94082  -574.938808
C1        5.690       5.688       5.688       5.698       5.693       5.693
O2        8.305(A)    8.294(A)    8.283       8.278       8.285       8.285
C3        6.010       6.010       6.008       6.009       6.006       6.009
N4        7.378       7.366       7.367       7.371       7.368       7.373
H5        0.778(D)    0.806       0.807       0.801       0.803       0.803
C6        6.013       6.013       6.010       6.010       6.016       6.017
N7        7.363       7.365       7.367       7.367       7.360       7.370
H8        0.918       0.895       0.925       0.909       0.932       0.918
H9        0.910       0.936       0.922       0.931       0.916       0.914
H10       0.798       0.806       0.792       0.808       0.798       0.804
C11       5.744       5.750       5.754       5.754       5.742       5.748
O12       8.286(A)    8.267       8.278       8.264       8.289(A)    8.275(A)
H13       0.933       0.940       0.943       0.943       0.933       0.936
C14       5.695       5.696       5.693       5.687       5.700       5.698
H15       0.919       0.922       0.939       0.924       0.898       0.898
H16       0.923       0.913       0.898       0.923       0.931       0.931
O17       8.293       8.292       8.280       8.291       8.296       8.293
N18       7.449       7.449       7.435       7.433       7.451       7.443
H19       0.787(D)    0.784(D)    0.813       0.805       0.775(D)    0.789(D)
H20       0.808       0.807       0.798       0.795       0.808       0.804
R         23.023      23.028      23.019      23.010      23.030      23.027
M         22.151      22.154      22.145      22.148      22.149      22.154
L         23.124      23.128      23.134      23.136      23.122      23.133

hydrogen bonding by ab initio methods. The local geometry of this bond is shown in Fig. 6, with the plane of the carbonyl group as the plane of the paper. It is clear from Fig. 6 that the arrangement of the four atoms COHN is quite close to a straight line. Moreover, the bond energy (4.2 kcal mol⁻¹) is large, and it seems that this may approach the optimum geometry for this hydrogen bond. Certainly, if the hydrogen atom is in the plane of the carbonyl group it will encounter the lone pairs of the oxygen atom, however these may be distributed [4]. The result shown in Fig. 5 is clear, and free from the difficulties associated with multiple minima of the kind which are found in other computations [3, 8]. We cannot be absolutely certain of this conclusion, as we have not searched the entire energy surface exhaustively, but we have encountered no signs of other minima in this general region. A bond energy as large as 4.2 kcal mol⁻¹ also means that the molecule will be firmly locked into this shape (the Boltzmann factor is about 10⁻³) insofar as thermal energy is

Fig. 4. The plateau in the tripeptide molecule. The points are scattered throughout the plateau regions of the two dipeptide fragments and cover the two dipeptide plateaux comprehensively. The two points which depart from the zero line by some 0.002 hartrees are near the edges of the plateau of the dipeptides. The quantity d on the vertical scale is the difference in energy of the tripeptide versus two dipeptides with a large constant removed. [Plot not reproduced; the vertical scale runs from -0.002 to 0.002 hartrees.]

TABLE 4

Potential C10 hydrogen bonds in a tripeptideᵃ

            Literature values [6, 7]      This work
Type I      -60, -30; -90, 0              -60, -30; -120, 45
            (120, 150; 90, 180)           (120, 150; 60, 225)
Type II     -60, 120; 90, 0               -60, 120; 130, -30
            (120, 300; 270, 180)          (120, 300; 310, 150)
Type III    -60, -30; -60, -30
            (120, 150; 120, 150)

ᵃ Convention is 1970 IUPAC with the earlier convention shown in parentheses for ease of reference. The sequence is the natural one φ2, ψ2, φ3, ψ3. Molecules I' and II are similar in general structure but are not the same molecule, nor are they related by any symmetry operation. Molecules I' and II are both above the plane of Fig. 1. (A clear drawing of the molecules is given on the right-hand side of Fig. 1 in ref. 7.) The type I molecule is point 17 on Fig. 7.

concerned. This amount of energy is relatively small, however, with respect to activation energies.
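The statement that the four atoms COHN are nearly collinear in the type II bond can be checked from the coordinates quoted in the caption of Fig. 6. The sketch below uses the hydrogen and nitrogen positions from that caption. The oxygen position is an assumption made here: the carbonyl carbon is placed at the origin with the C=O bond along z and 1.22 Å long, which the caption does not state. The function name is hypothetical.

```python
import math

H = (-0.53, 0.84, 2.93)   # hydrogen coordinates from the Fig. 6 caption (Å)
N = (-0.61, 1.12, 3.91)   # nitrogen coordinates from the Fig. 6 caption (Å)
O = (0.0, 0.0, 1.22)      # ASSUMED: C at origin, C=O along z, 1.22 Å long

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at the middle atom b."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    return math.degrees(math.acos(dot / (math.hypot(*ba) * math.hypot(*bc))))

nh = math.dist(N, H)            # N-H bond length
no = math.dist(N, O)            # heavy-atom (N...O) separation
nho = angle_deg(N, H, O)        # N-H...O angle; 180 would be exactly linear
```

The N-H distance depends only on the caption values and comes out at about 1.02 Å. Under the assumed oxygen position the N...O separation is close to 3.0 Å and the N-H...O angle is roughly 165°, in line with the description of this bond as nearly straight.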

The computed values of the geometry and energy of this hydrogen bond are close to general expectation, as expressed by the values adopted in many semi-empirical computations [3, 8]. The energy estimate is particularly important because this quantity cannot be deduced directly from experiment and is essential in trying to understand the secondary and tertiary structure of proteins. The populations shown in Table 3 suggest again that formation of the hydrogen bond leads to a loss of about 0.02-0.03 electrons from the hydrogen atom (H19). There is also some accumulation, perhaps about 0.015-0.02

Fig. 5. The Morse-like curve for the type II C10 hydrogen bond. These points are represented by crosses on the 45° line of Fig. 3. The four atoms COHN are almost rectilinear in this hydrogen bond (Fig. 6). [Plot not reproduced; the horizontal axis is the N-O distance (Å).]

Fig. 6. The local geometry of the type II C10 hydrogen bond. The plane formed by the carbonyl group and the two atoms bonded to the carbon atom is the xz plane. The coordinates (x, y, z) of the hydrogen atom are (-0.53, 0.84, 2.93) and those of the nitrogen atom are (-0.61, 1.12, 3.91). The y axis points vertically upwards. [Drawing not reproduced; the numeric labels 2.977 and .22 survive from it.]

electrons, on the oxygen atom (O12). There is no indication of a simple transfer of electrons from the NH unit to the CO unit, and this suggests that we are dealing with a simple polarisation within the two units. Alternatively, these computations may not be of sufficient accuracy to reveal the charge transfer which is generally thought to occur in hydrogen bonding. Nevertheless, these results are very like those obtained for the C7 hydrogen bond [2] and also like those for the same hydrogen bond in the tripeptide (cf. Table 3).

We next searched for the other type of C10 hydrogen bond (type I of ref. 6). Using the method given above to compute the energy of the hydrogen bond, we obtained the results shown in Fig. 7. There is plainly much scatter in the points, but there is an indication of a Morse-type curve, shown in Fig. 7, plus several points whose energy is altogether higher than would have been expected. Moreover, models show that this hydrogen bond is sharply bent, with the hydrogen atom approaching the carbon of the carbonyl unit. We suggest that the scatter to higher energies arises from repulsion between hydrogen and carbon atoms. To check this point, we give the carbon-hydrogen distances in Table 5. Making a division at about 2.6 Å, it appears that those points which fit on the curve are below this line and those which do not, lie

Fig. 7. The Morse-like curve of the type I C10 hydrogen bond. These points are represented by squares on the 45° line. The curve is drawn through the lowest points on the attractive part, while for the repulsive part we have used the curve of the type II hydrogen bond. It might seem better to draw the curve through the points 13, 9, 14, 2, 11, but this makes it impossible to account for point 12 and destroys the explanation based on the carbon-hydrogen distance given in the text and Table 5. The numbering sequence has no significance. We have used point 17 as the source of geometry and populations (Fig. 8 and Table 3). [Plot not reproduced; the horizontal axis is the N---O distance (Å).]

above it. In other words, there is a rough division into good and bad points depending on the carbon-hydrogen distance, and this supports the above supposition that repulsion is mixing in with the hydrogen-bond formation. If this idea is broadly correct, we must be prepared to monitor the carbon-hydrogen distance to confirm that it is above this value before we may expect the point to lie on the Morse curve. Further experience may allow us to make more accurate assessments of the threshold value of approach of the carbon-hydrogen pair. Notice that the repulsive part of the Morse-like curve is particularly vulnerable to this complication because, when the hydrogen atom is close to the oxygen atom, it is frequently also close to the carbon atom.

Accepting for the moment that the curve of Fig. 7 is roughly correct, we may estimate the energy and geometry of the type I hydrogen bond. It is clear that any estimate of the bond energy must be more tentative than is that of the type II hydrogen bond, and it may be that further exploration of the energy surface will reveal points of lower total energy. Our experience in trying to find such points leads us to believe that, if they exist, these points are confined to a small region of the five-dimensional space. The situation is different with the type II hydrogen bond, where it is easy to find points on the curve of Fig. 5. It would seem to follow that, in statistical terms, there is

TABLE 5

Hydrogen-carbon distances in the type I hydrogen bondᵃ

Point number    C-H distance (Å)
13              2.150
14              2.231
5               2.318
9               2.363
2               2.383
1               2.411
11              2.452
4               2.514

-               2.623
-               2.645
-               2.659
-               2.834
-               2.841
-               2.852
-               2.934
-               3.352
-               3.806

ᵃ The points above the space, having short carbon-hydrogen distances, are generally not on the Morse-like curve. They are indicated by crosses in Fig. 7. The points below the space are generally within reasonable distance of the curve. They are indicated by circles in Fig. 7.

much greater probability of finding the type II than the type I bond, and this would still be true even if the two bond energies were the same. At the present time, we feel that the best estimate of the energy of the type I hydrogen bond is about 2.5 kcal mol⁻¹.

The local geometry of the type I hydrogen bond is shown in Fig. 8. The bending of this bond, mentioned above, is clear from this figure. However, the distance between the two heavy atoms is very close to that in the type II bond, and this reinforces our idea that this is indeed a genuine hydrogen bond. The populations shown in Table 3, column L, seem to show the same donor and acceptor action as the other hydrogen bonds, although the donor action by the hydrogen atom is perhaps rather smaller in this case than in that of the type II hydrogen bond. In addition, the acceptor action is smaller than might have been expected. It seems to us that the figures are not yet accurate enough to justify more thorough analysis, but it is pleasing to note that the donor and acceptor situations can be picked out from Table 3 by inspection. Again, there seems to be little, if any, charge transfer between the amide units within the molecule.

Lastly, although we have not carried out a detailed investigation, we have looked for signs of the 1:3 interaction of formal charges (multipole-multipole interactions). This was done because such interactions are often supposed to occur and are part of many semi-empirical schemes [8, 9]. A

Fig. 8. The local geometry of the type I C10 hydrogen bond. The plane formed by the carbonyl group and the two atoms bonded to it is the xz plane. The NH distance is 1.022 Å.

summary of the earlier semi-empirical results is given in ref. 3. If interactions of this type were to exist, they must cause deviations from the 45° line of Fig. 3. We have seen no clear indications of such deviations, and it is difficult to believe that we would not have encountered them in the course of some one hundred computations on the tripeptide, unless they are smaller than about 0.5 kcal mol⁻¹ in the conformations studied.

The relationship between the hydrogen bonds and the bends in proteins

It is often supposed [3, 7] that the C10 hydrogen bonds are the factors responsible for the bends which involve amino acid residues 1-4 (or i to i + 3) and which occur widely in proteins, leading to a change in the direction of the chain by about 180°. In addition, the C7 hydrogen bonds may be the key to the bending of the chain at the glycine residues (see below) in several molecules. If this is really the case, we have a direct link between the bending of the protein chain and the results of the ab initio computations of this paper and ref. 2. We intend to take up the whole question of comparison with experiment in detail after further computations have been completed. At present we shall consider only a few of the salient points.

Some of the most direct and clear-cut experimental evidence on the question of bends in proteins comes from the structure of the B chain of insulin and the work of Blundell and co-workers [11]. There are three glycine


residues in this chain and all three are associated with bends (Table 6). One of the glycine residues (B8) creates a sharp bend in the chain and the experimental values of φ and ψ given in Table 6 (100° and -110°) are reasonably close to the computed values (86° and -70°) for these angles. This is the C7 hydrogen bond of ref. 2 and it is tempting to suggest that it may be this bond which stabilises the bend in question. We are unable to establish its energy as clearly as that of the C10 hydrogen bonds, but an estimate of 3.0-3.5 kcal mol⁻¹ seems reasonable. This would be strong enough to lock the molecule into the required shape at normal temperatures and conditions (with a Boltzmann factor of about 2 × 10⁻³). A similar, but not identical, situation occurs at B20 in the B chain of insulin, though here the bend is said to be shallower and to involve residues B21, B22 and B23, the last of which is the third glycine residue (Table 6).
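As a rough check on the order of magnitude of the Boltzmann factor quoted for the C7 hydrogen bond, the factor exp(-E/RT) can be evaluated directly. The sketch below is illustrative only: the temperature of 298.15 K is an assumption standing in for "normal temperatures", and the function name is hypothetical.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3

def boltzmann_factor(energy_kcal_per_mol, temperature_k=298.15):
    """Return exp(-E/RT) for a stabilisation energy E given in kcal/mol.

    The default temperature is an assumed value for 'normal temperatures'.
    """
    return math.exp(-energy_kcal_per_mol / (R_KCAL * temperature_k))

# Evaluate the two ends of the estimated energy range for the C7 bond.
for energy in (3.0, 3.5):
    print(f"E = {energy:.1f} kcal/mol -> exp(-E/RT) = {boltzmann_factor(energy):.1e}")
```

Both ends of the estimated range give factors of a few parts in a thousand, which is the order of magnitude quoted in the text.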

The B20 glycine residue again has angles reasonably close to the C7 hydrogen bond but these values are also quite close to the +60°, -120° of the type II' C10 hydrogen bond. It is difficult to understand the B23 angles in terms of the short-range forces which we are discussing here, but at least they are within the plateau region of the dipeptide map and are therefore allowed angles, even if they are remote from the hydrogen bonds under discussion. Further evidence concerning glycine and hairpin bends can be supplied by a study of the enzymes chymotrypsin, elastase, trypsin and cytochrome c (ref. 11, p. 43).

The experimental evidence relating to the C10 hydrogen bond [3, 6, 7, 12, 16] concerns both the polypeptides and the proteins themselves, and includes myoglobin and lysozyme. We are unable to make direct comparisons with the present work because nearly all this evidence is concerned with amino acid residues other than glycine, and so it may be misleading to attempt detailed comparisons. It does seem to be widely felt, however, that C10 hydrogen bonds are important in many cases and we intend to examine detailed examples later.

It is important to realise that the single C7 and single C10 hydrogen bonds by no means exhaust all the possibilities for hydrogen bonding, even within a fragment of protein chain as small as this tripeptide. For example, one can easily set up multiple hydrogen bonds involving two C7 hydrogen bonds as shown above (Table 2) and the role of such interactions in protein structure seems to be quite unknown. The point is important because the computed energy of the structure of the tripeptide with a double C7 hydrogen bond is some 3 kcal mol⁻¹ below that with the C10 hydrogen bond. Before we can extend our quantum-mechanical insight into the relationship between the secondary structure of proteins and the amino acid sequence [14], we must examine these structures with care. There is clearly a case for attempting long basis set computations, however time consuming they may be (see below), on special points such as multiple hydrogen bonds.

TABLE 6

The φψ angles at the bends in the B chain of insulin [11]ᵃ

Angle    B8           B20         B23        This work and ref. 2
φ        100/40       120/90      120/100    86
ψ        -120/-110    -90/-120    170/140    -70

ᵃConvention is 1970 IUPAC. The two values are for molecule 1 and molecule 2 (see discussion in ref. 11).

DISCUSSION

Our main purpose in this series of papers is to establish the nature of the results yielded by the ab initio molecular orbital method when applied to the problem of protein structure. We have shown so far that the dipeptide and tripeptide are both reasonably well represented as two and three peptide units joined together, as generally assumed. The phenomena which occur in the dipeptide, particularly the hydrogen bonds, reappear directly in the tripeptide, as expected. We have also shown that the two possible non-helical C10 hydrogen bonds do exist and we have established their geometry and bond energy in a straightforward way.

This has been done using the readily-available GAUSSIAN 70 package of conventional MO theory. This theory suffers from two disadvantages in the form used in these calculations. One is the purely practical point that in a large molecule we are compelled to use a short basis set. Judging from the results shown in Table 7 (see below), this limitation is not as damaging as one might think. The second limitation is a matter of principle: the correlation correction is not included. The PCILO work of Maigret and Pullman [3] achieved part of these results in a more exhaustive manner, though this method is apparently less effective in finding hydrogen bonds than is the ab initio method. They report that "areas of stability of the different types of the C10 ring ... consist of very flat zones with small conformational holes of little more stability".

It seems reasonable to conclude that were we to continue the work through the tetra-, penta- and polypeptides we would get results similar to those obtained above. That is, we could confirm on quantum-mechanical grounds that the polypeptides are indeed peptide units linked together in the usual way, with the major modifying factors being non-bonded repulsions and hydrogen bonds of various kinds. This is a pleasing result because it reinforces our confidence in both the intuitive chemical view which we have used for many years and in the newer quantum-mechanical approach to protein structure. Of course, the possibility remains that, as the molecules become larger, a departure from additivity of a more subtle kind could occur, which in the limit of genuine protein molecules could create solid state-like effects and even electrical conduction [15].

Our results have some bearing on the well-known semi-empirical methods for computing the conformations of proteins [16]. These methods start with


the assumption that a plateau exists and then add to this the special terms due to non-bonded repulsions, hydrogen bonds, torsional barriers and the interaction of point charges. Up to a point, our results support this general approach. We have demonstrated the existence of a plateau and of non-bonded repulsions and hydrogen bonds as separate and distinguishable entities with characteristic numerical values. On the other hand, we have shown that hydrogen bonds do not depend solely on the distance between the two heavy atoms, as is sometimes assumed in the semi-empirical approach, but also on other aspects of the geometry about the hydrogen bond. Our results offer little help with torsional barriers (cf. ref. 2) and none at all with charge effects. It may prove possible to use ab initio results to establish parameter values for the semi-empirical methods (such values are difficult to obtain, judging by the large range in the literature) but we will need many more ab initio computations to achieve this convincingly.

We have given detailed sets of population numbers (Table 3) in the hope that these will form a standard set against which the population results for isolated computations may be analysed. Whatever physical meaning one attaches to populations, the incontrovertible point is that populations serve as a crude summary of the wavefunction in the sense that were the wavefunction changed the populations would also change. The weakest assumption that can be made about populations therefore is that they act as the theoretical fingerprints of a molecule, just as do spectroscopic results on the experimental side. This aspect of populations does not seem to be widely appreciated. It must be realised that changes in the basis set will generally change the absolute values of the populations. It is to be hoped that the relative values of the populations will remain substantially the same for different basis sets. The populations seem to make physical sense in that it is possible to inspect Table 3 and pick out which hydrogen atoms are forming hydrogen bonds by the relatively low value (0.77-0.78) of their populations. However, there are puzzling aspects of the population results, in particular the populations of the methylene hydrogen atoms, which change markedly with conformation.
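The screening of hydrogen populations by inspection can be expressed as a simple threshold test. The sketch below is illustrative only: the atom labels and population values are hypothetical placeholders and are not the entries of Table 3, and the cut-off of 0.78 is an assumption taken from the upper end of the range quoted in the text.

```python
# Hypothetical populations for amide hydrogen atoms.
# These numbers are placeholders for illustration, NOT values from Table 3.
example_populations = {
    "H(N1)": 0.82,
    "H(N2)": 0.77,
    "H(N3)": 0.78,
    "H(N4)": 0.83,
}

def hydrogen_bonded(populations, cutoff=0.78):
    """Return the labels whose population is at or below the cut-off.

    A low hydrogen population is read here as a sign of donor action in a
    hydrogen bond; the cut-off is an assumed value, not a fitted one.
    """
    return sorted(label for label, value in populations.items() if value <= cutoff)

print(hydrogen_bonded(example_populations))
```

Because absolute populations depend on the basis set, any such cut-off would have to be re-established for each basis before it could be used for screening.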

Further comments on the dipeptide work [2]

Whilst Part 2 was in the course of publication, a paper by Robson, Hillier and Guest appeared, which dealt with the same conformal energy map as ours [17]. The quality of the individual computations which these authors carried out was higher than ours, since they used the ATMOL3 package with a long basis set (9s5p) contracted to (3s2p) for the heavy atoms and (4s) contracted to (2s) for the hydrogen atoms. The STO-3G basis uses (3s3p) throughout. On the other hand, each of their points required 40 min of CDC 7600 time while each of ours required 2 min on this machine. They were therefore able to compute 14 points while we computed 150 points in much less time. They were unable to construct a conformal map in a straightforward way, but were able to fit their data to the CFF map of Hagler et al. [18]. Moreover, the map they obtained is a close fit to ours (Fig. 2 of ref. 2). To demonstrate this quantitatively, we give in Table 7 the numerical values of the total energy at the 14 points in both the longer basis [17] and in our short basis set. We have adjusted the absolute value of our result to zero at -90°, +90° to make the comparison simpler. There is good agreement between the two sets of values throughout. Indeed, since the absolute error can hardly be less than about 2000 kcal mol⁻¹ in even the best computations which do not include the correlation effect, such agreement suggests that the short basis is remarkably effective in determining relative energy values in these molecules. Of course, this result in no way invalidates the point that a long basis is intrinsically more reliable than a short basis.

There are some points of disagreement between our results and those of Robson et al. [17]. For example, they find the extended form of the dipeptide to be some 3 kcal mol⁻¹ lower in energy than the C7 hydrogen bond. We find these two points to be roughly the same in energy. On the other hand, we agree with them that the point at about 0°, -90° is a saddle point quite high in energy (ca. 15-16 kcal mol⁻¹ as compared with their value of 18.7 kcal mol⁻¹). The PCILO method finds this point to be relatively low in energy. The general agreement between the results for the long and short basis sets is pleasing and suggests that we may have some confidence in the results obtained for the tripeptide from the short basis. To investigate the tripeptide

TABLE 7

Dipeptide total energies (kcal mol⁻¹) from long and short basis setsᵃ

         ψ       φ      Long basis (ATMOL3)    Short basis (STO-3G)
C7      -90     +90          0.0                    0.0
        -90     +60         -0.5                   -0.2
        -90     +30          3.9                    3.8
        -90       0          4.6                    4.5
        -60     -50          7.8                    2.3
       -120    -120          6.7                    2.5
       -120    -180          1.3                    0.0
       -180       0         17.2                   19.1
       -180     -50          5.3                    2.3
       -180    -120          3.3                    1.9
C5     -180    -180         -3.2                   -1.7
       -120    +120          1.4                    1.3
        -60     +50         19.6                   28
          0     +90         18.7                   15.0

ᵃBoth sets of values are fitted to the common zero at -90, +90. The first three columns are from Table 1 of ref. 17. Notice that the short basis set results must not be judged on their effectiveness at only fourteen points, e.g. the C7 hydrogen bond is not located at the same point by both the long and the short basis sets.
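The agreement between the two columns of Table 7 can be summarised numerically. The sketch below is one possible way of doing so and is not part of the original analysis: it takes the fourteen pairs of energies as printed, and it assumes that the thirteenth short-basis entry, printed as "28", means 28.0 kcal mol⁻¹. The function name is hypothetical.

```python
# Relative energies in kcal/mol, transcribed row by row from Table 7.
long_basis = [0.0, -0.5, 3.9, 4.6, 7.8, 6.7, 1.3,
              17.2, 5.3, 3.3, -3.2, 1.4, 19.6, 18.7]
# The thirteenth short-basis entry is printed as "28"; it is taken here
# as 28.0, which is an assumption about the intended value.
short_basis = [0.0, -0.2, 3.8, 4.5, 2.3, 2.5, 0.0,
               19.1, 2.3, 1.9, -1.7, 1.3, 28.0, 15.0]

def compare(reference, trial):
    """Return (signed differences, mean absolute deviation, index of worst row)."""
    if len(reference) != len(trial):
        raise ValueError("energy lists must have the same length")
    diffs = [t - r for r, t in zip(reference, trial)]
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    worst = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return diffs, mean_abs, worst

diffs, mean_abs, worst = compare(long_basis, short_basis)
print(f"mean absolute deviation: {mean_abs:.2f} kcal/mol")
print(f"largest deviation: {diffs[worst]:+.1f} kcal/mol at row {worst + 1}")
```

A per-row listing of `diffs` shows which regions of the map carry the larger discrepancies, which is more informative than a single summary figure when only fourteen points are available.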


using the long basis would hardly be a practical proposition (several hours of machine time per point) unless it proves critical for us to know, say, the relative energy of two points on the conformal map.

We inadvertently omitted to mention the elegant work of Griffin and Coppens [19] in Part 2. This work is an experimental evaluation of the electron density in the vicinity of the COHN system and we will return to the details elsewhere.

REFERENCES

1 D. Peters and J. Peters, J. Mol. Struct., 50 (1978) 133.
2 D. Peters and J. Peters, J. Mol. Struct., 53 (1979) 103.
3 B. Maigret and B. Pullman, Theor. Chim. Acta (Berl.), 35 (1974) 113.
4 D. Peters and J. M. Carpenter, in O. Chalvet, R. Daudel, S. Diner and J. P. Malrieu (Eds.), Localisation and Delocalisation in Quantum Chemistry, Vol. 1, Reidel, Dordrecht, p. 99.
5 J. Caillet, P. Claverie and B. Pullman, Theor. Chim. Acta (Berl.), 47 (1978) 17.
6 C. M. Venkatachalam, Biopolymers, 6 (1968) 1425.
7 J. L. Crawford, W. N. Lipscomb and C. G. Schellman, Proc. Natl. Acad. Sci. U.S.A., 70 (1973) 538.
8 G. N. Ramachandran, R. Chandrasekharan and R. Chidambaram, Proc. Indian Acad. Sci., Sect. A, 74 (1971) 270; D. Kotelchuck, H. A. Scheraga and R. Walters, Proc. Natl. Acad. Sci. U.S.A., 69 (1972) 3629.
9 E.g. D. M. Hayes and P. A. Kollman, J. Am. Chem. Soc., 98 (1976) 3335.
10 E.g. G. Nemethy and H. A. Scheraga, Q. Rev. Biochem., 10 (1977) 239.
11 T. L. Blundell and L. N. Johnson, in Protein Crystallography, Academic Press, London, 1976, Chap. 2 and p. 35.
12 R. Chandrasekaran, A. V. Lakshminarayanan, U. V. Pandaya and G. N. Ramachandran, Biochim. Biophys. Acta, 303 (1973) 14; A. W. Burgess, P. K. Ponnuswamy and H. A. Scheraga, Isr. J. Chem., 12 (1974) 239; I. L. Karle, J. Am. Chem. Soc., 101 (1979) 181; J. N. Brown and M. P. Klein, J. Am. Chem. Soc., 101 (1979) 445; M. Llinas, D. M. Wilson and M. P. Klein, J. Am. Chem. Soc., 99 (1977) 6846.
13 G. Nemethy and M. P. Printz, Macromolecules, 5 (1972) 755.
14 J. A. Lenstra, Biochim. Biophys. Acta, 491 (1977) 33.
15 J. Ladik, Int. J. Quantum Chem., Quantum Biol. Symp., 3 (1976) 237.
16 P. N. Lewis, F. A. Momany and H. A. Scheraga, Biochim. Biophys. Acta, 303 (1973) 211; V. Z. Pletnev, F. A. Kadymova and E. M. Popov, Biopolymers, 13 (1974) 1085; K. Nishikawa, F. A. Momany and H. A. Scheraga, Macromolecules, 7 (1974) 797; I. Simon, G. Nemethy and H. A. Scheraga, Macromolecules, 11 (1978) 797; V. Renugopalakrishnan and D. W. Urry, Int. J. Quantum Chem., Quantum Biol. Symp., 3 (1976) 13; S. S. Zimmerman, M. S. Pottle, G. Nemethy and H. A. Scheraga, Macromolecules, 10 (1977) 1.
17 B. Robson, I. H. Hillier and M. Guest, J. Chem. Soc., Faraday Trans. 2, 74 (1978) 1311.
18 A. T. Hagler, E. Huler and S. Lifson, J. Am. Chem. Soc., 96 (1974) 5319.
19 J. A. Griffin and P. Coppens, J. Am. Chem. Soc., 97 (1975) 3496.
20 G. N. Ramachandran, A. S. Kolaskar, C. Ramakrishnan and V. Sasisekharan, Biochim. Biophys. Acta, 359 (1974) 298.