A FAST COMPUTER
PROGRAM
CONFORMATIONAL
FOR
ANALYSIS
GIORGIO CASTELLANI and RAIMONW SCORDAMAGLIA Istituto Ricerche “Guide Donegani”, Via G. Fauser 4, 28100 Novara,
Italy
(Received I8 Jammy 1983; in revised form 6 July 1983) describe a program, caned SOBALCAN, that enables us to compute very quickly the rotation hypcrsurfacc of a flexible molecule in the “soft-sphere” approximation. Only thoseconformations in which the overlap of spheres associated to all pairs of non-bonded atoms is less than a prefixed threshold are computed. _
Abstract-We
INTRODUCTION
Conformational freedom plays an important role in the study of structure-activity relationships of drugs: since the rotation of atomic groupings about single bonds is more or less hindered, molecules can adopt a number of shapes, often very different from each other, of which only a few can interact with the receptor site. This latter, in turn, is usually assumed to have a well-defined geometry, resulting from the balance of intramolecular interactions, and the changes it undergoes when interacting with the active molecules are regarded as fairly small and their energetic cost negligible. If one wants to determine which requirements molecules should satisfy in order to be active, their steric properties must, therefore, be known. No detailed information on the receptor structure is generally available; in most cases, even the specific receptor for certain molecules is unknown. One then has to infer the receptor form by studying the steric features of a series of molecules which, on experimental basis, are known to possess a certain kind of activity. The rationale underlying this approach is that, if congeneric molecules interact with the same receptor site, they must share some steric properties. The assessment of common features is not a simple task when dealing with classes of molecules having a large number of rotational degrees of freedom; in principle, all accessible conformations should be considered. If no information enabling one to limit his search is at hand, some simplifying assumptions have to be made: the most frequently used consists of comparing molecules in their minimum-energy conformations (as determined through theoretical computations) or in the crystallographic conformations, when known. However, this might be a poor assumption for flexible ligands. Only the achievement of satisfactory relationships with activity for certain molecules or, hopefully, of correct predictions for new molecules, give a confirmation aposteriori of the validity of this approach. From literature we see that unfortunately none of the numerical methods of energy potential function minimization can solve the problem of finding the global minimum in a definite way. The most obvious one is to calculate conformational energy at a number of points in conformational space chosen either at random or over a fixed grid of values of independent variables and finally compare the energy values at various local minima (McCornick, 1972). In prin-
ciple, this method requires an enormous number of energy evaluations. But the problem can be reduced to a tractable size eliminating certain regions of conforrnational space on the basis of a study of molecular models and to perform the minimum search only in selected subspaces (Lewis er al., 1973). Another way, if you have experimental evidence, for example NMR spectra available, is to automatically analyse the raw data by searching for conformations that are possible sterically and which agree with experimental data. In this case the conformation assumed you can consider the ensemble average seen by the physical technique (Barry er al., 1971). To do this either interactive methods are described, which are able to take advantage of the power and versatility of the “human computer” for pattern recognition and inductive thinking, or automatic ones, which define some criterion function to describe the acceptability of a conformation which can then be optimized (Barry et al., 1974). Particularly in large molecules, one of the most important problems is that of false minima. Multiple local minima are quite common and a local minimum may have a higher statistical weight than the global one and may therefore correspond to the actual predominant conformation of a molecule (Scheraga, 197 1). Our present effort is directed both to reach automatically local minima in conformational hypersurface and to define regions in the variables space in which we can perform further energy optimization. The possibility of yielding and checking all conformations a molecule can assume appears at first sight beyond the present possibilities of computation. The exploration of the whole conformational space for a molecule with rn degrees of rotational freedom (m rotatable bonds), in steps of 360”/n for each rotation, requires the computation of n” conformations: even in the simple case of m = 5 and n = 12, the number of possible conformations (248, 832) is excessive even with the fastest algorithms. Only a fraction of conformations which are possible from the geometric point of view are acceptable from the energy point of view. But is there any means to avoid computing the energy of those conformations which can be regarded with certainty as high-energy ones? In other words, if N, is the total number of molecular conformations and N, (in general < NJ is the number of conformations whose non-bonded energy is lower than a prefixed threshold, is it possible to reduce the number N, of actual 127
G. CASTELLANI and R.
128
computations to a value as close as possible to N., by establishing criteria such that N, - N, (the number of conformations which can be safely discarded without computing their energy) is as high as possible? We have developed a computer program which allows us to give an affirmative answer to this question. It has been called SOBALCAN (Soft-Ball Conformational Analysis) and its description forms the subject of the present paper. GENERATING
CONFORMATIONS
starting molecular conformation is constructed by means of the STERIMOL program (Verloop et nl., 1976). The internal rotation angles are chosen by manipulating Dreiding (or equivalent) stereomodels into “plausible” shapes, i.e. in such a way that a visual inspection reveals the absence of any bad contacts between nonbonded atoms. If the molecule contains atoms whose structure cannot be reproduced properly with the data stored in STERTMOL, we use the program BAC (Builder of Atomic Coordinates), which computes the internal coordinates of atoms with any desired geometry (Castellani & Scordamaglia, 1980). The molecule is then divided into m + 1 groups. Each group may be regarded as a rigid body; should unacceptable contacts between non-bonded atoms exist within it, none of the inter-group rotations would be capable of restoring a good geometry. Thus the program makes a preliminary check on the above interactions, and stops if bad contacts are detected. In the next step interactions between neighboring groups are considered, by performing the M rotations about the bonds connecting groups. From this analysis we obtain a m x n rectangular matrix which contains the allowed and the non-allowed angular ranges for each rotation (see next section for the criteria of acceptability). The forbidden angular ranges are no more taken into consideration by the program, since those contacts between non-bonded atoms which cause a range to be put aside depend only on the dihedral angle considered and cannot be removed by rotating the others. The overall computational gain, i.e. the number of conformations which need no more to be examined, is of n”- ’ for each range removed. At this point the program makes a control to detect all possible rotational symmetries, and reduces the angular range accordingly: for example, for a methyl group, the angular range explored is 120” instead of 360”. Subsequently all angles are rotated sequentially, according to the following scheme. Let us indicate by 0 rp:, P2r I.. I cp; the starting values of internal rotation angles. First a complete rotation of (P,,, is made, i.e. the conformations A
SCOFUXMAGLIA
are computed; their number is less than n if, in the single rotation of (Pi, one or more forbidden ranges had been detected, and equal to n otherwise. Then the angle qz _ , is given successive increments of 360/n, and each time the whole rotation of (Pi is repeated. When the cycle on 50, _ , is completed (after doing n2 computations at most), we give successive increments of 360/n to ~z_~, and for each of these (up to) n conformations we perform the coupled rotation [cp,, cp,_ J, thus reaching a maximum of n3 computations. The procedure is then repeated for q,,_ 3r the whole conrp,-4,. . >‘PI, thus covering formational space with a multidimensional grid, During the above process a large number of computations leading to predictably unacceptable conformations may be avoided through the “skipping” process illustrated below.
Suppose that, when doing rotation q3, we detect, at &, a contact between two atoms belonging to groups G, and G,, resp. It is clear that rotations rp,, (p5 and ‘ps cannot remove this contact. Computation of the whole rotations q5 and ‘ps at (p, would be a waste of time, and so we can save (up to) n2 - 1 computations. If the rotation about the bond G,-G, had had a sequential number higher than 3, for example if we had called (p4 this rotation and cp, the rotation about G,-G,, we would have been able to save (up to) n’ - 1 computations. This proves that the choice of the order in which internal rotations are performed may differ from the sequence of rotatable bonds along the backbone chain: the more well-chosen that order, the smaller the value of N,. As an exampk in Table 1 we report the number of total calculated conformations N, obtained rotating three angles cp,, I) in separated ‘Pi, ‘Pi and (Pi. ‘p5, q, (see Fig. calculations respectively, of NRDC 161, an insecticide of cyclopropane carboxylate class, in all possible different sequences. Obviously the number of accepted conformations for a threshold of compen-
Table 1. No. of calculated conformations N, 107 252 107 192 252 204 122 192 122 192 192 192
A fast computer program for conformational analysis
129
Fig. 1. Molecule of NRDC 161 (a - cyano - 3 - phenoxybenzyl cis - 2,2 - dimethyl - 3 - (2,2 dibromovinyl)cyclopropane carboxylate from X-ray crystal structure analysis (Owen, 1975), with the indication of the seven internal rotations dealt with by SOBALCAN. etrability of non-bonded spheres of 20% does not change with the chosen order and is fixed to 48, in the first case, and 106 in the second. The number of conformations which can be discarded in this way is timmk, where k is the sequential number of the rotation for which the skip occurs, for each angular range skipped. Until now we have spoken of conformations, and shown the devices to reduce the number of the computed ones. But it should be borne in mind that “to compute a conformation” actually means to compute a number of interatomic distances of the order of the square of the number of atoms contained in the moIecule. A careful consideration of the nonbonded interactions allows us to further reduce the computational time. First of all, only the so-called non-bonded active interactions (Tosi & Lipari, 1981), i.e. interactions between atoms whose reciprocal distance is affected by a rotation, are taken into consideration. Secondly, the indices corresponding to the interacting atoms are collected into matrices (one matrix per internal rotation angle), of which the rows are formed by the atoms in the groups before the rotated bond, and the columns by the atoms in the groups after rotated bond. With this ordering, we give priority to those interactions which, if above the acceptability threshold, allow us to determine the internal rotation angle for which the largest possible number of computations are saved. By using these criteria we were able to reduce the number of computed conformations by approximately an order of magnitude, and to keep the CPU time to a few minutes on our UNIVAC 1100/20 computer for each of the molecules reported in Table 2. These molecules are insecticides of the cyclopropane carboxylate ester class, synthesized by Elliott (Elliott et al., 19741: they have the general formula X,C = CHCHC(CH,),CHC(O)OCHRPhOPh, with different isomers of the ethylenic part with respect to the carboxylic group (see also Fig. I for an indication of the five internal rotations considered). We see from Table 2 that, on the average, the number iv, of computed conformations is about l/6 of the total number of possible conformations N,. In turn, the number n, of accepted conformations is about l/3 of N,. In actual fact, the difference between “computed” and “accepted” conformations is much
Table 2. Conformational analysis with SOBALCAN series of cyclopropane carboxylate esters NRDC No.? 143 157 158 161 163 166 167 168 169 170 171 172
Isometry tram cis trans CiS
trans trans cis cis trans cis trans cis
X Cl Br Br Br Br Cl Cl Cl F F F F
R H H CN CN H
CN H CN CN CN H H
N, 23232 10637 17332 8056 22954 17625 10883 8216 21171 9164 27782 12081
iV, 56506 29000 49893 26324 565.51 49974 29458 26726 58248 28200 65809 31061
3
of a 3
N,
N<
0.227 0.117 0.201 0.106 0.227 0.201 0.118 0.107 0.234 0.113 0.264 0.125
0.411 0.367 0.347 0.306 0.406 0.353 0.369 0.307 0.363 0,325 0.422 0.389
TNRDC stands for National Research Development Company, the English institution from which these data come.
smaller than it appears from this comparison. A conformation is accepted when the summation of overlaps, extended over all pairs of non-bonded atoms, does not exceed a global threshold (see next section), chosen in advance by the program user. A conformation is computed, but not accepted, when at least one contact between non-bonded atoms higher than a single threshold is detected: clearly this detection can occur at any time during the examination of non-bonded interactions, and as soon as this happens the conformation is rejected and the remaining interactions are not considered. THE
SOFT-SPHERE
APPROXLMATION
According to the well-known Pauling’s model, atoms may represented as spheres which, on penetrating into each other, give rise to bonds, the penetration degree increasing from single to double to triple bonds. The radii of the spheres, typical for any atom, are the van der Waals radii of the corresponding chemical species, as defined, e.g. by Bondi (1964). The criterion we chose to decide on the acceptability of conformations is the degree of interpenetration of the spheres centered on any pair of non-bonded atoms. Whenever two non-bonded atoms (i.e. two at-
130
G.
- 16
SCORDAMAGLIA
\ /
16
>
2 z 14 3 x
and R.
CASTELLANI
nl
c
14 2
12
12 ii x
10
10
0
a
8
6
6
4
0
y+JY+j 4
2
; :
0
90
180
-90
2
d
90
0
180
-90
0
Fig. 2. PCILO energies, in kcal mole- ,‘ for rotations about the bonds adjoining the cyclopropane ring in crysanthemic acid, and maximum fractional overlap between non-bonded atoms (open circles). oms at three or more bond distance) approach one another closer than a certain limit, the molecular energy increases very rapidly and its largest part is due to this interaction. A geometric parameter capable to describe this phenomenon in a satisfactory way is the overlap between the spheres associated to the two atoms. The absolute value of the overlap has the disadvantage of dealing with small atoms (i.e. atoms with short van der Waals radius) and large atoms (i.e. atoms with long van der Waals radius) in the same way. This does not reflect the trend of the energy curves with the interatomic distance: in fact large atoms, due to the greater polarizdbility of their outershell electrons, can approach closer before energy increases to unacceptable values. A better option proved to be the $-actionnl overlap, expressed as 0, = 1 - f&&r0 + rh) (& < r, + rb), as its plot versus internal rotation angles was fairly similar to that of the energy curve computed with quantum-mechanical procedures: see for example Fig. 2, in which we compare the mrtximum fractional overlap between nonbonded atoms with the energies of crysanthemic acid conformations corresponding to rotations around the two bonds adjoining the cyclopropane ring, computed with the quantum-mechanical semirigorous method PCILO (Castellani et a!., 1980). The fact that, when two atoms approach one another too close, the predominant contribution to intramolecular energy arises from their repulsion, allows us to discard a conformation as soon as we detect an interaction higher than a “single” threshold, and to avoid any further test. A molecular conformation may be regarded as unacceptable also when many local strains are present, i.e. strains which, if considered separately, would be tolerated, but become unacceptable if considered together. We, therefore, define also a “global” threshold, and discard a conformation when the sum of two or more overlaps exceeds it. The choice of the threshold above which a conformation should be discarded is critical and depends to a large extent on the subsequent use of the results of our conformational analysis, Should we take o, - 0, the model would coincide with the hard-sphere one, in which no overlap between the spheres is admitted; most conformations are then discarded. If,
on the other hand, we take o,m 1, all geometrically possible conformations are computed and accepted, with a large increase of the computer time. In choosing a value of of intermediate between these two extremes, many criteria may be applied. First it is necessary to decide at which energy value a conformation is regarded as unacceptable; from this, one easily finds the corresponding overlap. On the other hand, the inherent approximations of the method should be taken into account. For example, bond stretching and bond angle deformation are neglected, while it is known that relaxation of all internal degrees of freedom of a molecule can provide easier paths for conformational interconversion, by reducing the overlaps between pairs of nonbonded atoms. One should also bear in mind that the computing time increases with increasing the threshold and with decreasing the width of the angular ranges. It is thus necessary to find a good compromise between these contrasting factors, without losing the significance of results. In our correlation studies we obtained satisfactory results with n = 12 (steps of 30”)) a single overlap of= 0.2 and a global overlap of 30~ If the interest in using this program is limited to find the energy minima, a threshold of 0.05 may be sufficient. CONCLUDING
REMARKS
The methodology used in SOBALCAN does not imply the direct computation of intramolecular energy; this energy, though, is implicit in the concept of mutual penetration of van der Waals spheres centered on non-bonded atom pairs. Because of this characteristic, the method is more suited to ascertain the “density” of conformational states accessible in the various angular regions, than to determine the stability of a given conformation. It is necessary to point out that coulombic interactions, and in particular hydrogen bonds, which can substantially modify the molecular energy, are not considered in this version of the program. So, the use of SOBALCAN is discouraged when dealing with molecules where these effects play an important role (e.g. ions, or molecules with strong intramolecular hydrogen bonds). To obviate these drawbacks an improved version of the present program, which we plan to describe in
A fast computer program paper, has therefore been worked out; in this new program (ERHYCA, which stands for Entire Rotational Hypersurface Conformational Analysis) we have maintained the same algorithms of SOBALCAN to save computer time, but the accessibility of conformations is evaluated by means of a potential energy function. a subsequent
REFERENCES Bondi, A. (1964). 1. Phys. Chem. 68. 441. Castellani, G., Scordamaglia, R. C Tosi. C. (1980), Gazz. Chim. Ital. 110, 457. Castellani, G. & Scordamaglia, R. (1980), EUCHEM Conference on Logical Structures, P&I-I, FRG. Elliott, M. & Famham, A. W. ef al. (1974). Natwe 248, 710.
for conformational analysis
131
Owen, J. D. (1975), J. Chem. Sot. Per/& I 186.5. Tosi, C. & Lipari, G. (1981), Thee-et. Chim. Acfo (Berlin) 60. 41 Verl&p, A., Hoogenstraaten, W. & Tipker, J. (I976), in MedicinaI Chemistry, Vol. VII, Chap. 4, E. J. AriBns, Ed., Academic Press, New York. McComick, G. P. (1972), in Numerical Methods for Non-Linear Optimization, F. A. Lotsma, Ed., Academic Press, London, p. 209. Lewis, P. N., Momany, F. A. & Scheraga, H. A. (1973), ISraP J. Chem. 11, 121. Barry, C. D., North, A. C. T’., Glasel, J. A., Williams, R. J. P. & Xavier, A. V. (1971), Nolure 232, p. 236 et seq. Barry, C. D., Bosshard, H. E., Ellis. R. A. & Marshall, G. R. (1974), Fed. Proc. 33(12). Scheraga, H. A., (1971), Chem. Rev. 71, 195.