CYCLIC TETRAPEPTIDES I: THE CALCULATED POTENTIAL ENERGY MINIMA OF CYCLIC TETRAPEPTIDES COMPOSED OF SMALL AMINO-ACID RESIDUES DAVID N. J. WHITE and CHRISTOPHER MORROW Chemistry Department,The University, Glasgow G12 SQQ, Scotland
(Receiued 24 iVovem6er 19?8) Abstract-An almost automatic general computational procedure has been devised which enables location of all the low-energy conformations of cyclic peptides containing- UD . to ten or so amino-acid residues. The scale of the computationiinvolved can be consider&l; reduced if only the global minimum energy conformation is required. The algorithm consists of folding a model of the linear polypeptide chain, with the desired number of residues and arbitrary geometry, into a series of low-energy conformations which correspond to all possible combinations of ‘generators’. The ‘generators’ are pairs of ((p, $1 torsion angles which indicate the positions of minima in Ramachandran maps for each overlapping dipeptide pair in the target cyclic peptide. Each ‘generator’ has an associated statistical weight which may or may not be subsequently used to screen out cyclic peptide conformations. The folded polypeptide chains so generated are operated on by a ring closure algorithm and a battery of tests to reject sterically unreasonable conformations. Recalcitrant conformations are marked for subsequent examination by a computer graphics system. The geometries of the surviving model conformations are fmally optimized by means of a hvo-stage, Newton-Rapbson energy minimization algorithm. Forty such low-energy conformations, including the global minimum, were obtained from 7776 generated conformations of c-Gly..
[NTRODU~ON
Attempts to define algorithms for the global energy minimization of model molecular structures have been by and large confined to one particular class of compounds, namely cyclic or linear polypeptides (White, 1975: De Santis & Liquori, 1971: Scheraga, 1971: Momany, 1976). There are a number of reasons why polypeptides have been tackled as indeed there are a number of reasons why the problem is amenable to solution with these compounds and not with others. The most obvious reason for singling out polypcptides is their biological importance, which has ensured that there is no dearth of accurate X-ray (Karle, Gibson & Karle, 1970) and neutron (Af-Karaghouii & Koetzle, 1975) crystal structure analyses as well as NMR-derived structural data (Ovchinnikov & Ivanov, 1975) suitable for checking the results of molecular mechanics calculations, Cyclic polypeptides are important for cation transport across membranes (Smith ef aI.. 1975) and molecular complexes involving cyclic peptides have been investigated as possible enzyme analogues (Bartman, Deber & Blout, 1977) in parallel with work on crown ethers (Cram & Cram, 1974). Meaningful structure/activity relationships are beginning to emerge from the wealth of data available for cyclic polypeptides (Pa&e, Chapman & Hillaby, 1972: Steele d al., 1976). Global minimization studies on polypeptides are bearing fruit because a number of circumstances conspire to reduce the dimensionality of the problem compared with say, alkanes. (Each molecule has 3N4 degrees of freedom so that, in general, energy minimization is performed in 3N-6 dimensions.) Firstly, the amide linkage itself is invariably truns-(o = 180”) (Biochemistry, 1970) except in a few well defined situations such as, incorporation into a cyclic peptide consisting of fewer than five amino-acid residues where ring closure constraints can force the adoption of a cisconfiguration (Kartha & Ambady, 1975): or instances where the amide nitrogen is alkylated, in which case the
There are two absolutely fundamental requirements for successful research projects based on molecular mechanics (MM) calculations (White, 1977). The first is a force field (White & Bovill, 1977) yielding sufficiently accurate geometries, energies, etc. for the purpose at hand-or the means to develop such a force field in a reasonable period of time. The present paper will not be concerned with this aspect of MM calculations because a force field suitable for polypeptide work has been developed previously (White & Guy, 1975), and also because the resultsaf the present work are to a large extent force field independent. The second requirement is that it must be possible to locate the gtobal minimum energy conformatiorl (GMEC) of every compound under consideration and it is this problem which forms the subject of the present work. The traditional approach to global energy minimization of model molecular structures by MM calculations consists .of manipulating Dreiding models into a series of intuitively plausible shapes, deriving sets of _internal coordinates from the Dreiding models, and using these as inputs to a molecular mechanics program. The GMEC is then, hopefully, the conformation of lowest energy amongst those considered. This approach is quite adequate for molecules containing up to I5 or 20 atoms when the number of possibilities to bc considered is fairly small (white, 1978). but becomes of dubious utility beyond this point when there are a large number of well defined energy minima. For example there are at least nine minima in the case of cyclodeca-ld-diene (White & Bovill, 1975). However, this simple procedure, or variants of it based on computer graphics molecular modelling (White & Morrow, 1978). remains the method of choice for almost all computations of molecular structure whether via molecular orbital or molecular mechanics calculations. 33
and CHRBTOFHER Moaaow DAVIDN. 1. WHITE
34
normal trans-preference of cti. 2 kcal mol-’ is reduced to almost vanishing point (Howard et al., 1973). Secondly, the intrinsic torsional potentials around the N-C, and C’-C, bonds (Biochemistry, 1970) can, for all practical be regarded as non-existent (Hagler, purposes, Leiserowitz & Tuval. 1976). This contrasts with the situation in say, alkanes, where the barriers to free rotation amount to ca. 3 kcal mol-‘. Thirdly, and perhaps most important of all, there are a number of stable hydrogen bonded substructures such as the Cs (Avignon & Huong, l!UO), y (Nbmethy & Printz, 1972) and fi (Blake et al.. 1967) turns which appear singly or in combination in all crystal structures and cyclic peptides (composed of naturally occurring amino acids) which have the capacity to form intramolecular hydrogen bonds of the N-H . . . O=C type. Finally, we propose the following hypothesis which can he used to half the dimensionality of energy minimization calculations on evenmembered ring structures: ‘The GMEC of an isolated monocyclic molecule composed of an even number of repeated subunits, each containing an integral number of atoms, will have C, or higher symmetry’. No proof is offered for this empirical observation. Even with the simplifications discussed above global energy minimization on a molecule of any size requires a large quantity of contiguous computer time if the procedure is to be automatic. There are few central computing facilities where several consecutive hours (or days) of computer time can he commandeered and even fewer where on-line interactive graphics facilities are available. The global minimization algorithm has therefore been implemented on a PDP-I1140 minicomputer with a VT1 l/VRIT refresh graphics display. G6 and Scheraga (1973) have proposed that differences between the backbone conformations of cyclic peptides composed of. glycyl residues and other cyclic peptides can be interpreted in terms of the extra steric requirements of the side-chains of non glycyl residues. It should therefore be expected that the conformations of all known cyclic peptides form a subset of the corresponding cyclopolyglycyl conformations. Cyctotetraglycyl represents the smallest cyclic peptide available for a non-trivial test of the global minimization algorithm, with the added bonus that a number of diierent cyclotetrapeptide and depsipeptide conformations are known so that Ga and Scheraga’s hypothesis can be tested. GumAL MrNlsuzA_
(a) Principles. Global optimization (i.e. maximization as well as minimization) is an important concept not only in the physical and mathematical sciences but commercially as well. It is not surprising therefore that .there is a substantial literature on the subject and that ,a number of algorithms have been proposed (Dixon & Szegii. 1973). There are three major classes of global optimization procedures which are of interest to chemists. namely, trajectory methods, random search techniques and space covering procedures. Trajectory methods are typified by the Wiberg-Boyd algorithm for simulating the dynamics of conformational changes (Wiherg & Boyd, 1972) and a similar but theoretically more elegant method described by Hilderbrant (Hilderbrant, 1977). Trajectory methods have been used to investigate the ring inversion of for instance cyclohexane (wibera & Bond. 1972) and cis.cis-cvcloo&a- 1.5diene‘(Et-m&, 1976j where . the traj&ory~ followed is the appropriate reaction coordinate. Exploration
of the various pathways for ring inversion leads naturally to the definitiqn of all local minima and maxima on the route so that the GMEC can easily be selected_ Trajectory procedures have never been used for the primary purpose of locating the GMEC of complex molecuies but rather used to interpret the dynamic NMR spectra of simple molecules with obvious ring inversion itineraries (Ermer, 1976: Anet & Yavari. 1977: Guy et al., 1978). The reason for this is that it is almost impossible to be certain that trajectories embracing all of the energy minima have been defined, particularly as it is possible to envisage a completely isolated minimum on the potential energy hypersurface. Branin’s trajectory algorithm appears to be a very promising approach to global optimization for problems of low (< IO) dimensionality (Branin, 1971) but has not been tested on the kind of systems under consideration here, which have high inherent dimensionality ( > 60). There are two kinds of random search algorithm, the pure random search and the multistart technique. In the pure random search method f(x) is evaluated at N uniformly distributed random points in the space S of the problem. Anderssen (Anderssen, 1972) has shown that the probability P of finding a value V such that not more than a predetermined proportion (Yof the locations of x in S have f(x) < V is given by P=l-(l-ap.
(I)
This enables an assessment to he made of the probability that the global minimum of f(x) has been located. Pure random searches, whereby all 3N-6 degrees of molecular freedom are atlowed to vary randomIy, have not been described in the chemical literature probably for the obvious reason that such an algorithm would be very inefficient. Random searches have been performed in the torsion angle space of linear dipeptides (Lewis, Momany & Scheraga, 1973) hut in this case the problem has heen considerably simplified by the prior input of chemical knowledge (i.e. of the bond lengths and angles). In the multistart technique the practitioner setects his favourite unconstrained optimization procedure and runs it from a number of dilferent starting points. The set of alt terminating points hopefully includes the global ininimum point. This is at present the most widely used procedure for chemical problems. However, as mentioned previously, there remains the problem of selecting an adequate starting set for problems of high dimensionality. The success of most space covering ‘algorithms is dependent on the requirement that tbe function to be minimized is Lipschitzian, that is lf(x”‘)
- f(x’“‘)l < LHX”’-
xq
(2)
for any x”’ and x”’ in S. The most obvious space covering procedure is the simple grid technique in which the function is evaluated at equispaced points throughout S. Equation (2) allows a lower bound on the function value to be established for each ‘cubic’ block, provided L is known, and any block in which this is bound is less than the lowest value of f(x) found can be investigated further. M?re efficient routines based on the Lipschitz property are available but all members of this ciass of procedure are only effective for problems of low dimensional&y (Evtushenko. 1971: Shubert, 1972). Despite t&e limitations space covering algorithms have been utilised
The calculated potential energy minima of cyclic tetrapeptides composed of small amino-acid residues
for quantitative conformational analysis of molecules whose steric energy can be approximated as a function of two or three torsion angles (Potenzone ef cl., 1977). It is obvious from the above discussions that global optimization is generally in a very primitive state of development. The saving grace as far as molecular mechanics calculations are concerned is that the algorithms used need not (indeed must not!) be completely general purpose, but must make as much use of easily available structural chemical information as possible in order to simplify the problem. (b) Previous algotithms. There are two algorithms which have been successfully applied to the location of the GMEC of cyclic peptides. Both are derivatives of the multi&art procedure where the starting points for energy minimization are chosen on a more objective basis than examination of Dreiding models. It is instructive to examine these procedures (which were developed some 5 yr ago) with a view to improvement in the light of the greatly increased computer power available at the present time. The first algorithm is due to Scheraga et al. (Gb & Scheraga, 1973, 1973a) and was applied to cyclohexaglycyl. The first step consisted of reducing the dimensionahty of the problem as far as was reasonably possible. The bond lengths and angles were fixed and the amide groups assumed to have invariant, tmns-PaulingCorey geometry (Pauhng, 1960) so that all the o’s were fixed at l&P. Furthermore, only molecular conformations with C,, &,, Ca. I and C2 symmetry were considered so that, as only the cpi and 4, are potential variables, there are 0, 1,.2, 3 and 4 independent variable torsion angles in each case. The dependent variables were calculated so as to fulfil the condition of exact ring closure of the polypeptide chain. The exclusive consideration of symmetric minima in this case appears to be just&d by the hypothesis proposed earlier. Initial points for the multistart procedure are generated by means of a grid search at 10” intervals threughout, the space or hyperspace of the particular symmetric :conformation. However instead of pursuing the grid se’arch to its conclusion the torsion angles corresponding to the vicinity of a minimum were refined by means of a generalized partan unconstrained optimization procedure (Williams, Stang & Schleyer, 19as) which is more efficient (but more time consuming) in the immediate neighbourhood of a minimum. Twenty four unique energy minima were located. There are a number of difficulties with this method, all of which arise from the approximations that were necessary to make the problem tractable. In the first place calculation of sets of torsion angles leading to exocf ring closure with standard Pauling-Corey (Patding, 1960) amide group geometries leads to the omission of an unknown number of potentially important conformations with non-standard (e.g. non-planar) amide groups. Indeed Gb and Scherazn (GG & Scheraaa. 1!970) move’ that it is impossibg to-generate a cycio&agIycyl conformation with phutar Pauling-Corey amide groups (although the GMEC for c-Gly4 has all transoid-amide groups) and point out that this is a geometric rather than energetic constraint. Secondly, conformations which are unique in a torsion angle subspace are not necessarily so in the full 3N-6 dimensional space (Niu, Gb C Scheraga, 1973) so that the twenty four c-Gly, conformations found above may not in fact be unique. Finally, the procedure is far from automatic and requires substantial operator intervention. The second approach, due to De Bantis and Liquori
35
(De Santis t Liquori, 1971) is conceptually less elegant than the first but offers considerable scope for refinement and automation. The calculations were performed on Gramicidin S which was known to have Cc molecular symmetry in sohrtion. For the purpose of calculating Ramacbandran maps of molecular potential energy (steric energy) as a function of the torsion angles 9 and I,$ (Ramachandran, 1968) it was assumed that the Pro, Val, Om, and Leu amino-acids could be approximated to by Ala and D-Phe bv D-Ala. There were five minima in the Ala-Ala Ramach&lran map and those for the D-Ala dipeptide were taken as the enantiomeric minima to the original five. Models of the Gramicidin S structure were computationally constructed by generating all the possible 2 x 4’ stereochemical sequences of the Ramachandran plot minima (and their enantiomorphs for DPhe). The nine conformations which resulted in an end-to-end distance of less than 10 A for the folded Gramicidin S amino-acid chain were selected for further examination, Of these nine conformations only two, after minor adjustment, could be induced to ‘form’ the hydrogen bonds which had been shown to exist, experimentally, between valine and leucine. Crude energy minimization calculations were performed in a p. 9 torsion angle subspace (o = lsoo) to select the GMEC. The principal diBiculties with this-algorithm are: the fairly heavy reliance on experimental data to ‘prime’ the calculation; the lack of generality, particularly as the Gramicidin S problem as formulated is almost trivial by 1978 (although not 1968)standards: and again the fact that all 3N-6 degrees of freedom were not allowed to relax. The global minimization algorithm which was finally constructed is a synthesis of all the foregoing ideas and discussions which was intended to be as general and as automatic as possible_ (c) The new algorithm. A block diagram of the interactive minicomputer graphics system employed for these calculations is shown in Fig. 1. In addition to the global energy minimization specific routines an advanced molecular modelling system, the Glasgow University Chemical Graphics System (White & Morrow, 1!978),was used for building the initial polypeptide chains, examining doubtful conformations generated by the GMEC algorithm and for the local minimizations on contenders for the GMEC. A block diagram of the global minimization procedure is shown in Fig. 2 and the following paragraphs discuss each step in detail. The first step in the algorithm consists of calculating a Ramachandran map for each overlapping dipeptide pair in the target cyclic peptide. These calculations are concerned with cyclotetiaglycyl and only the Gly-Gly map need be calculated; this is shown in Fig. 3. The map is almost identical to that calculated by Scheraga er al. who located seven minima and characterized them by means of molecular mechanics calculations. However, in view of the large areas of low energy conformational space one cannot be sure that there are no other minima. In addition, the precise location of the minima under these circumstances is very much force field and minimization routine dependent. Accordingly, we arbitrarily chose six roughly equidistantly placed points at ((p. +) equal to 1(90,60), 2( 120,60). 3(120,180), 4(- 120, l&o), X120, -60) and 6( L 90, - 60) degrees for ‘generators’ in the subsequent calculations. This removes any doubt that the final results might be iniluenced by the exact position of minima in the maps.
T4662 PLOTTER
VR-17 CRT
1
wDFv
VT-II
I
PDP 11140 CPU
KT-II MEMORY MANAGEM
, \
64 K WORDS MEMaRY
KE IIF CIS
I - I
KE HE EIS
I%. 1. A blockdiagramof the PDP-Ii configurationon which thesecalculationswere performed.
*
f
In RKOS
FULL
PURE MAGONAL MINIMISATION
.
L
/
\
I
\
1
REJECT NON-CYCLIC anmRMATlDNS
Fig. 2. A block diagramof theglobalenergyminimizationalgorithm.
v
/\
l
REJECT
GET STARTING COORMNATES
The calculated potential energy minima of cyclic tatrapeptides composedof small amino-acid residues
37
Fig. 3. A Ramachandran potential energy map for N-acetyl N’-methyl glycyl amide. Torsion angles in degrees and energies in kcal mol-’ above an arbitrary zero. The coordinates describing an arbitrary conformation of the -Glyr polypeptide chain would normally be built using the Chemical Graphics System and would probably contain planar amide groups. In fact, coordinates corresponding to a randomly chosen linear tetrapeptide chain were copied from the results of an X-ray crystallographic study nearest to hand. The fragment contained non-planar amide groups (Winkler & Dunitz. 1971) with a variation in geometry from residue to residue. The tetrepeptide chain was then folded up into a succession of probable low-energy conformations corresponding to all possible combinations of generators for each of the six possible configurational sequences of cisand trans-amide groups. At this point there are 63 combinations of (CPI,$11, (49. @~2)and (cpa. 43): ((p4.M being undetermined because they overlap the break in the polypeptide chain. In addition, the cis-/tram- sequences must be considered giving rise to 6’ possible conformations of the chain. Only some of these will approximate to a cyclic structure. Ring closure of the folded chains was attempted by means of a simple pattern search procedure which depended on geometric rather than energetic criteria to effect closure. Each individual ring torsion angle (cp. 1,5 and o) and a-carbon ring valency angle was allowed to vary by up to 220” from its starting value in an attem t to close the ring with f~,d~4 within to.1 A of 1.48 II. BtiCO+a and &_+.,+* as close to IIV and 122” as passible and finally ((~4, I#.+)as close as possible to each of the six generator values (Fig. 4). These requirements
were weighted so that ring closure was the predominant aim (weight=S) followed distantly by correct bond angles (weight = 1) and torsion angles (weight =0.8), Conformations which did not ring close to within -tO.lO A of the optimum value were rejected at this stage. The conformer generating algorithm, as so far described, will attempt to construct 6’ ( = 7776) structures for c-Gly,. However, not all of these are unique and a large number of possibilities can be eliminated in the following manner. Because these calculations were performed on a homopolymer it was possible to generate repeat conformations related to the original by advancing the pattern of torsion angles by one or more residues. The various conformations were labelled according to their configurational sequence together with the pattern of generators used to fold the chain: thus GLI-1234 represents a trans,trans.tmns,trans-isomer built from generators 1, 2, 3 and 4[((p, 4) = @0,60), (- l20,60), (120,180), (- 120,180) degrees] and is identical to GLI2341. GLl-3412 and GL1-4123. Obviously only one of these conformations need be considered, and the unique one is specified by taking the numerically smallest integer derived from the generator digits (1234 in the above set). Finding the smallest integer amongst contenders which can have up to twelve digits for large molecules, is not a trivial problem on a 16 bit minicomputer and our algorithm for achieving this discrimination is shown in Table 1. There is a second procedure for early elimination. of ‘impossible’ conformations which depends on
38
DAVID N. J. WHITE and CHRISTOPHERMORROW prior specification of suicide groups of generator digits. This point is best iilustrated by using a relatively simple molecule. such as cyclodecane for an example. A cwsot-y examination of a Dreiding model of ndecane reveals that if any three successive torsion angles are set in the anti-configuration (7 = 1W) then no matter how the remaining torsion a@es are arranged it is impossible to generate
Fig. 4. illustration of the nomenclature for atoms in the immediate vicinity of the ‘ring junction’.
a closed
ring model
of cyclodecane
Table I. Listing of the routine for performing cyclic redundancy checks of the generated confom
Ml-N-r+1 NT2-rme
-19E
m iG J-1.M m la 1-13-r K-MCkDC~J+I-21,M>+l W1OPB.B MAtCRCCI,J>-flCK>
C*mr***LooP TO FIMI-&%LE5T INTEGEB IN cuRRp(T Row a= PERMUTEDcocfP* c*wwuwEEs.ERcn R#l Cos POSITION IN THE l=CWw%n; TO R DIM1 m ld Lll.NT2 IF(MFFIKCL).EQ.l> Go To 14 IFIlWlQX:IK,Ll.LT.llIN~ IIIN-lWTCRC(K,L) coNTfNJz & IPlIrwrt: ==UrRTIDBCBy S3”rIFIc ~CM)-1)HITt-l DIGIT IN m POSITIa mm m T8-E IIINIPDo
is
m-l,Nr2
IF(~CMI.W.I>
m To 15
incor-
porating this triple anti-sequence. If the generator digit for MP was I then any conformation label containing the suicide sequence 111 would represent a forbidden conformation. A similar argument apdlies to cyclic pepprotides and the global minimization program con&s vision for specifying up to ten suicide sequences (5 digits maximum) in order to eliminate forbidden conformations. The algorithm is shown in Table 2.
The calculated potential energy minima of cyclic tetrapcptidas Table 2. Listing
composedof
small amino-acid residues
39
of the routine for eliminating conformations designated ‘impossible’ in advance of the actual calculations sumculIp(EIteOss
PFRTIAL** !zxmdT RKSIBLY BE I NcaRFwArrT) IN THECLCiSCIl RING.TE Drh*m*SEcu3cES PRI CONTRIN Up TO FOUR USER SPECIFIED DIGITS.********* Lc=sICa IGEN LOGICRLml IEUF(13l,IEXCLDC5~,EXCLUD CamCrvCNa
C-CONST~RRDl,~,MDI,NZING,NT,NZI,NtP,P9,NAV DcWW***ElKOlE TI-E CODE KneER INTO Gl Cl-kXQCTER
STRIrX;***+v***u~**u-cuw* EfKOIIEINT~,i.IELIF~ (Jl(I: ~. .I-I.NTP> FORPYlTI1211~ ~****TERMI:rwE cl+exTa? sn?Ir~ WITH A ZERO EvtE,*,*ts*****+***,*~*~~ NTPl .+iTP+l IBLFCNTPl>-0 C*******REWO\'E ONE iWf?lIa SEQLEHCE Rf FI TIPE FRMl THE EXCLLISION t?RRfiY** Cm**U*FIND USE TM LIBRRRY ROUTEt+i INDEX TO CHXK'IF' TM WRTIkLt****** C******uf;EQUEfKE FlppERps IN THE CODE NU1Ba7*~***uw*wa***u*~~**~~~**~~~~ DO 10 I-1,19XC1_13 DO 11. J-1,5 11 IEXCL.D(J)-EXCLUD(1.J) CRLL IN~XIISLF,IE%CLD,.rl, IFtll.NE.B> GO TO 12 C*******n HIT IF FI.NE.GIGO TO 12 SO MT WE CAN REMOVE CODE ri~~~BERvrvw* IC-0 C*****L*CLOC!XWISE MCKING fU_ WEE REVERSE PfV?TI!=d_SEMENCE TO SEM!CH** DIRECTIOPl+r*t******Irxx********xn*******rk~ C****t**IN F1I GNTICLOCKWIBE w a3 J-1,5 JfeJ-6-J I$(E$C$DCI,JREW.EQ.0> 60 TO 20 _ IExcLD~IC~~MauD~I.JREv:
a1
BY 21 J-IC+l.S IEXCl.D~J)-3 Ci3LL INDEX~I~,IEXCLD,,Ml IF(pI.NE.E) GO TO 12
::~*~%&GUENCE IGEN=. TFWE. 123 TO 13 12 IGEN-.FFltsE, 13
KESN'T
FPfZRR
IN CODE NVMBER,FRSS
CODE
MlMEERu
I1ETWN END
The connectivities of the surviving conformations were calculated on the basis of a maximum bonded distance of 1.6 A and those with ten or more ‘bonds’ in excess of the proper value, due to very short non-bonded intramolecular distances were rejected. Notice that this test was performed using only the main chain C, N and 0 atoms as steric crowding of the hydrogen atoms can usually be satisfactorily resolved during the subsequent energy minimization calculations. At this stage the original 7776 conformations had been reduced to around 60 structures which should have been capable of refinement by energy minimization. However, as a precaution, the conformations were individually examined by displaying calculated models on the C.R.T. of the minicomputer via the Chemical Graphics System prior to minimization. It would have been possrble at this point to arrange the conformation in decreasing order of probability, on the basis of their statistical weight products, (Appendix) so that only the most likely proportion (say the top quarter) of the full set need be considered further. This would have been an acceptable strategy if only the GMEC were being sought but in this case, and in general. the whole range of conformations within say 20 kcal mol-’ of the GMEC are of considerable interest. Again, if location of the GMEC were the primary interest then preliminary energy minimization in torsion angle space (with fixed bond lengths md z&es) could be used to screen out the higher energy con-
formations, at modest cost in computer time, prior to full relaxation energy minimization. The two stage Newton-Raphson energy minimization procedure used in those calculations has been described
elsewhere (White, 1977; white & Sim, 1973; White, M77a: White 8t Rrmer, 1975) and the pure diagonal iteration: x&+1= XL- VXX)-‘v:(X)
(3)
where x are the atomic coordinates, V, is the steric energy. VAX) = dii (a’ V,(X)/~X~) and the components of V.(x) are W,(x)/&, was used for stage one in this instance. During the course of energy mininnzation it became obvious that some groups of apparently different conformations were converging upon identical minima and redundant conformations were again deleted. The remaining forty conformations were subjected to stage two of the Newton-Raphson procedure which was terminated when the elements of V:(x) were all less than IO-” kcal mol-* A-’ in each cast. An example of an operator diiogue with the global minimization program is shown in Table 3. ltlEmm
AND Lltscumm
Diagrams of the tinal forty conformations of cGly*, armnged in increasing order of sterie energy and anno-
40
DAVID N. J. WHITE and CHIUSTOPHERMORROW
Table 3. An illustration of the dialogue between user and the GMEC program Global
Emrpv
V*r*1*n Th*
1.
N
Input
mndant
Omly
I*
thr
Format
*,-
cf
*c.
all
Cyclic
cheln
llol**ul**.
thr
O.rvgw*
*t*.> In
br
*noI**
of
ZrCU-l>
nolrculr, QT*
thr
*p*slfl*d.
wmbinotIon*
C--o.
not, I*
nrrd
torrlon
polr*
rrrponrr
QF-
r-rldu-r.
Thr
.r.-*c.
.=.a-rrct
morn
anglrr,
prrnutlne
In
for
Corbonvl
torrlon
thr**>.
*tkon**>
thr
CHvdroo*n*,
by
of
drralbr
N-3
numb*.-
genrr*l.rd NGEN
Proer*m
10376
*tom*
otomr
*tag*. M
Hlnlmlratlon dun*
of may
br
thr
#‘,n*r*d
CQI~
Thr
oil ot
thrr
p*ptldr*
of
conf*rm*r*
~nrrotorr
rpoclflrd
whu* QT*
Cthrrr
rlnaly
0~.
C*.e.
eqptldrr3.
In
~nrral,
floga*d.
Thr
program
loop*
till
*
rrcrlvrd.
_-_-_-----_ Bond
orrolr
rrqulrrd
varlatlon for
*mall
oehlrur C<
8
ring atom*
I*
clorurm3 or
4
In
g*n*rol
onlno
ocld
.,X*-I
of
bond*
r-J-et
conformation-
only
rr*l&u>.
----------Th.
rqqumrt
Input* *lenlfleMt
for
thr
lnfornotlcn
maxlmum vhlch
numbrr
of
vrry
lr
ollourd u--d rhort
La
non-bond-d
QYIC
thr
r-01
with
o
nurr(nr
cont*ct*.
_-__---__-_
DCiES THE FOLLOWING DATA REFER TO A NOMOPOLYNER TORSION *NtXES CIJ? 6 NUNBER OF VARIAELE TORSION ANGLES BY PAIRS CY/NIT q ARE THE SOND ANGLES TO REMAIN FIXED CY/N37 N r@-BCl~ ANGLES WILL VARY DURINB RING CLOSURE. NANE OF FILE t RK! tGLUfET TITLE ~N~T;SRAQLYCYL AU TRANS 2 TORSZON ENTER ENTIER 2 TORSION ENTER 2 TORSTON ENTER NUMBER OF
ANGLL N4 ANGLE<%> C6Y ATOH LASELS> Nf CENERATORS CZ3 6 CBY PAIRS, IN DEOREESB CFZ
CY/N37
C2 CS CS
CS CS Co
Y
N4 N7 NlS
zzs;z OS
ZK!&ZSER OF EXCLUSION STRINW crux tea cz3 e ENTER TAR= END-TO-END DISTANCE AND MAX DEVIATTON tNEREFRON ENTER SNORTEST TOLERABLE NON-BOWED DISTANCE WI I.0 ENTER NAXINUM PERMISSABLE NUNSEI? OF SUCN CONTACTS Cf3 8 DO YDU WANT TO RESTRICT THE DEPENDENT TORSION ANGLES CY/N3? ENTER STARTINS CONFORNER NUNBER CX3 1 ENTER STARTING OENmATOR STRSNO CII3 (II OUTPUT FILE NANE l RlCl -,TETOUT
Co
Nl0
CF3 N
Cl 1
1.47
.I
The calculated potential energy minima of cyclic tetrapeptides composed of small amino-acid residues tated with the ring torsion angles, are shown in Fig. 5.t It is important to remember that these calculations produce model structures which represent an isolated molecule in the gas phase.
The IR-spectrum
of sublimed
c-Gly,
suggests
that
all
four amide bonds are trons- (Titlestad, l977), and Gti and Scheraga’s proof that these amide groups cannot then be planar (Gti & Scheraga, 1970) lends support to the calculated $-symmetric GMEC which has four transoid amide groups. Further support for the calculations comes from the fact that a conformation very similar to GLI5252 has been observed in an X-ray crystal structure analysis of dihydrochlamydocin (Flippen & Karle, 1976). (Notice that this is contrary to a recent assertion (Titlestad, 19T7) that “No crystal structures have yet been determined of cyclic tetrapeptides which do not adopt the c-&r, (i.e. GL4-4532) conformation”.) The study of c-Gly4 in solution is much more varied and interesting and has been undertaken by two groups in a variety of solvents, largely by means of NMR spectroscopy (Titlestad, 1977: Grathwohl el al., 1975). There is some conflict be-en the two investigations and each will be discussed separately. The Scandinavian group (Titlestad, 1977) studied c-Gly, by NMR in both trifluoracetic acid (TFA) and water and find in favour of GL4-4532 but offer no explanation for this observation. If the minority conformations which are probably present (see Fig. 5 for candidates) are ignored then there are two possible explanations for this behaviour. There is some evidence that cyclic tetrapeptides could be protonated in TFA and this has been discussed by the Scandinavian group but they failed to reach a conclusion either way (Titlestad. 1977). Both possibilities (i.e. protonation and no protonation) will therefore be considered. If the amide groups on GLl-5252 are protonated (at the oxygen atoms) their double bond character wil’l be enhanced, thereby raising the barrier to free rotation around C’-N from the unprotonated value of in the region of the = 20 kcal mole’ to somewhere 60 kcal mol-’ typical of a C=C bond. Ring closure of GLl-5252 involves significant Pitzer strain as a result of the nonplanarity of the amide groups in addition to considerable Bayer strain at the u-carbon atoms. This situation is exacerbated to the extent of a IO kcal mol-’ on protonation (Fig. 6). On the other hand GL4-4532 is essentially free of Pitzer and Baeyer strain in this respect so that protonation has little effect on its stability. The net result of this argument
is that we calculate
pro-
of GLl-5252 to result in a conformational interconversion to GL4-4532-as is in fact observed. This explanation is rendered all the more plausible by-the fact that Winkler and Dunitz have produced conclusive evidence, backed by spectroscopic and X-ray crystal structure studies, that protonation of an amide group in a strained ring, such as that of GLl-5252, can induce trans-lcis- isomerization with large concomitant changes in ring geometry, exactly as discussed above (Winkler Bt tonation
Dunitz,
41
1975). There is however,
an alternative explanais stabilized by weak hydrogen bonds which form seven-membered rings each n-carbon atom across the ‘corners’ (where represents a corner) of GLl-5252, an observation which was first made by Karle on the basis of an X-ray crystal structure analysis of dihydrochlamydocia (Flippen & Karle, 1976). If sublimed c-Gly, is dissolved in TFA then the amide groups could form much stronger, and therefore energetically favourable, hydrogen bonds with the undissociated TFA than they could with each other. This would tend to sirrnificantlv destabilise GLI-5252. On the other hand GL&32 hai no intramolecular hydrogen bonds and intermolecular solvent/solute hydrogen bonding would, in the absence of other steric effects, have no influence on the stability of GL.4-4532 itself. It would appear therefore that even a relatively minor disturbance of the isolated gas phase GMEC of a c-Gty., is sufficient to destabilize it in favour of GL4-4532, an observation that is reinforced by further calculations on cyclic tetrapeptides (White 8i Morrow, 1978a). Identical conformations were found for c-Glv4 in TFA and water ITitlestad, l9n) and it seems likely in the latter cas; that GLI-5252 is destabilized in favour of GL4-4532 because of ,solvent solute hydrogen bonding as explained previously. The Swiss group have studied c-Gly, in TFA and dimethylsulphoxide (Grathwohl et al., 1975) and interpret their results in terms of GLI-5252. However. there is a strong possibility that one of the two separately prepared samples used was not authentic c-Gly, but c-Gly, (Titlestad, 1977: Wiithrich, 1978). These experiments are being repeated (Wlithrich, 1978) and we would expect them to favour GL4-4532 in both TFA, as in the ScanOS dinavian experiments, and DMSO because of NH.. hydrogen bonding. tion for this behaviour. GLl-5252
CONCLUSION
that the global minimization algorithm can successfully locate the GMEC of cyclic peptides, as well as a range of conformations close to the GMEC in potential energy. A conformation from this latter group has been used to explain the conformational changes which occur when sublimed c-Gly, (conformation GLl-5252) is dissolved in TFA, water and DMSO (to give conformation GL4-4532) (White & Morrow, 1977). It has also been proved that G6 and Scheraga’s hypothesis that all other cyclic tetrapeptide conformations (of which there are at least two) are a subset of the c-Gly, low-energy group is correct (Gti & Scheraga, 1973). However, this topic and its extension to include cyclodepsipeptide conformations (of which there are at least four) will form the subject of another paper (White & Morrow, 1978a). it
has been shown
PEFeRENCEs
in the half dozen or so lowest energy conformations. The occor-
Anderssen, R. S. (1972) In Optimization, (Edited by Anderssen, R. S.. Jennings, L. S. Bt Ryan, D. M.). AI-Karaghouli, A. R. & Kotkle, T. F: (1975) Acrn Cryst. B31, 2461.
rence of the cis- amide groups is not so unusual if it is home in mind that the all irons- conformations involve considerable angle strain and amide group distortion to effect ring closure. On the
Anet. F. A. L. a Yavari. I. (1977) I. Am. Chem. Sot. 99,64%. Avignon, hi. & Huong, P. V. (1970) Bio~olynters 9,427. Bartman. B.. Deber, G. M. & Blout, E. R. (1977) 1. Am. C/tern.
other hand the CTCT-conformations
sot. 99, 1028. (1970) Biochemistw 9.3471. kk& C. C. F., h&h: G. A., North, A. C. T.. Phillips, D. C. & Karma, R. (1967) Pt-oc. ROY. SOC. SW. Bt67. 365.
tNotice
that
the CTCT configuration is very well
represented
have the unfavourable
cis-
amide groups but very little angle strain or amide distortion. The energies. involved in the two cases are obviously of approximately the same order of magnitude.
DAVIDN. J. WHITEand CHRISTOPHER MORROW
42
163
-I
67
165
Fig. 5. The forty low energy conformations of cyclotetmglycyl. Peripheral values arc ring torsion angles in de-S and central values are energies in kcal mom’ referred to an arbitrary zero for theGMEC.
The calculated potential energy minima of cyclic tetrapeptides composed of small amino-acid residues
Fig. S.(Contd.)
43
44
DAVID N. J. WHITE
and CHRISTOPHER MORROW
Fig. Wontd.)
The calculated potential energy minima of cyclic tetrapeptides composed of small amino-acid residues
Fii. S.(Contd.)
45
DAvm N. 1. WHITEand CHIU~OPHERMo~now
46
131
44
94
-I
Fig. S.(Contd.).
The calculated potential energy minima of cyclii tetrapeptides composed of small amino-acid residues
Fig. 6. Energies in kcal mol-’ for the
eightlowest energy conformations of arbihary zero..
protonated c-Gly,, again referred to an
47
48
DAVIDN. J. WHITE and
Branin, F. H. Jr. (1971) Pmt. IEEE Cvnf. Systems, Networks B Computers. Mexico, p. 93. Cram, D. J. & Cram, J. M. (1974) Science 183, 803. De Santis. P. 8t Lisuori. A. M. (1971) Biom&mers 10.699. ti%On. L. c. w. 8i .%?epii. G. P. (Eels.) ii%+) Tow& GJobnl Optimization North-Holland, Amsterdam. Ertier. 0. (1976) 1. Am. Chem. SOC.98.3964. Evtushenko. Y. P. (1971) Zh. Vychisl. Mar. mat. Fir. 11. 1340. Flippen, J. L. & Karle. 1. L. (1976) Biopolymers 15. 1081. G& N. & Scheraga, H. A. (1970) Mucmmolecrdes 3, 178. G6. N. & Scheraaa, H. A. (1973) Macromolecules 6. 525. GcS, N. & ScherGa. H. A. (1973;) Macromolecules 6. 273. Guy, M. H. P., Heimbach. P., Macnicol, D. D. & White, Q. N. I. (1978) 1. Mol. strucr. 28, 143. Grathwohl. C., Tun-Kyi, A., Bundi, A., Schwyzer, R. & W&hrich, K. (1975) He/u. Chim. Acta 58, 415. Hagler. A. T., Leiserowitz, L. & Tuval, M. (1976) 1. Am. Chem.
CHRISTOPHER MORROW
White, 0. N. I. (1977a) Aclo Cryst. A33, 1010. White, D. N. 1. & Bovitl, M. J. (1977) I.C.S. Perkin II, 1610. white, D. N. J. & Morrow, C. (1977) Tetruhedmn Letters. 3385. White. D. N. J. (1978) In MO&~ Structure by D&action
Methods. Vol. 6, London. The Chemical Society, Chapter 2. White, D. N. J. & Morrow, C. (1978) Proceeding3 DECUS (UK), 17. White, D. N. 1. L Morrow. C. (1978a) 1. Am. Chem. Sot.. in preparation. Wiberg, K. B. & Boyd, R. H. (1972) 1. Am. Chem. Sot. 94, 8426. Willia& J. E.. Stang. P. J. 8t Schleyer, P. von R. (1968) Ann Rev. Phvs. ~,-~Chcm. 19.531. ~~ Winkler. F. K. & Dunilz, I. D. (1971) J. MO/. Biol. 59, 169. Winkler, F. K. & Dunitz, J. D. (1975) Acta Cryst. B31, 278. Wilthrich, K. (1978) Personal communication. I
sot. 98,4600.
Hilderbrant, R. L. (1977) & Chemistry 1, 179. Howard. J. C., Momany, F. A.. An&eat@ R. H. & Scheraga, H. A. (1973) hfacmmokcules 6,535. Karle, 1. L., Gibson, J. W. & Karle. J. (1970) J. Am. Chem. Sot. 92,375s. Kartha, G. & Ambndy, G. (1975) Acta Cryst. 831, 2035. Lewis, P. N.. Momany, F. A. & Scheragn. H. A. (1973) Ismel/. Chem. 11, 121. Momany, F. A. (1976) .J. Am. Chem. Sot. 98, 2990. N6meth;. G. & Printz, M. P. (1972) Mncmmolecules 5,755. Niu. G. C. C.. G% N. & Scheraga, H. A. (1973) Macromolecules 6, 91. Ovchinnikov, Yu. A. & Ivanov. V. T. (1975) Tetrahedron 31, 2177. Pache. W.. Chapman, D. & Hillaby, R. (1972) Biuchim. Biophys. Actu 255. 358. Pauling, L. (1960) The Nature of the Chemical Bond, p. 282. Cornell Universitv Press. New York. Potenzone, R. Jr., C&&hi; E,, Weintraub, H. J. R. & Hopfinger, A. J. (1977) Computers & Chemistry I. 187. Ramachandra, G. N. & Sasisekhar& V. (I=) Ada. ProI. Chem. 23. 283. Scheraga, HL A. (1971) Chem. Reus. 71, 195. Shubert. B. 0. (1972) SIAM J. Namer Anal. 9, 379. Smith, 0. D., Dnax, W. L., Langs. D. A.. DeTitta, G. T., Edmonds, J. W.. Rohrer, D. C. & Weeks, C. hf. (1975) J. Am. Chem. Sot. W,7242. Steele, J. A. Uchytil, T. F., Durbin, R. D., Bhatnager, P. & Rich, D. H. (1976) Pmt. Nutl. Acud. Sci. U.S.A. 73.2245. Titlestad, K. (1977) Acfu Cbem. Scund. 831,641. White, D. N. I. & Sim, G. A. (1973) Tetrahedron 29,3933. White, D. N. J. (1975) .Terrohedron Letters, 2101. White, D. N. I. & Bovill, M. I. (1975) Tetrnhedmn Letters, 2239. White, D. N. J. & Ermer. 0. (1975) Chem. Phys. L&t. 31, III. White, D. N. J. &Guy, M. H. P. (1975) J.C.S. Perkin II, 43. White, D. N. J. (1977) Computers & Chemistry 1, 225.
APPENDIX
The statistical from
weight, zi, of the ith generator
may be obtained
zi = Z=RT(det F,)-‘” exp ( - V$:iRlRT) where V, is the steric energy at the generator dran map (Scheraga. 1971) and
(4)
in its Rnmachan-
F=
(5) F may be evaluated by explicitly calculating the second derivatives (White, 1977) or variaus approximations may be used. For example, if the curvature at the generator is small then there will be a large number of adjacent grid points in the Ramachandran map with almost the same low energy as that at the minimum. It is obvious therefore that the number of grid points, n;, in the immediate vicinity of the ith minimum with V. less than some arbitrary upper limit (say 2 kcal mol-’ in the case of Fig. 3) is inversely proportional to det F so that (4) may be written (6) because (4) gives only relative statistical weights in the tirst place. The statistical weights for any dipeptide pair may be normalised as shown in (7).
(z&.rnl = Z&i
(7)
The statistical weight product for any (tetrapeptide) conformation is then simply the product of the slatistical weights of the generators used to fold up the linear chain.