Journal of Molecular Structure (Theochem), 308 (1994) 1-11
0166-1280/94/$07.00 © 1994 - Elsevier Science B.V. All rights reserved
Sprouting peptides in unprejudiced searches of conformation space
Carl W. David
Department of Chemistry, University of Connecticut, U-60 Rm. 161, 215 Glenbrook Road, Storrs, CT 06269-3060, USA
(Received 28 April 1993; accepted 17 May 1993)

Abstract
Methods of generating polypeptide structures which have a high probability of slipping into reasonable minimum energy structures are presented. Unfortunately, when compared to the structures obtained by minimizing from a crystal structure starting point, all our methods fail to do as good a job as “nature”, raising again the alarm that the prediction of secondary structure from primary structure is nowhere near being achievable.
Introduction
One promise of fully empirical energy computations is that someday, somehow, we will be able to ab initioᵃ generate the “true” structures of real peptides in the “vacuum” state (of low interest), in the solid state (of high interest to crystallographers), and in solution (of high interest to biologists). This would allow chemists and biologists to predict the secondary structure of peptides given the primary structure and to predict the effect of mutating one or several residues in a peptide. If function follows from structure, then structure predictions would allow function predictions.

To improve the efficiency of these sprouting schemes it would appear that there are two strategies available for exploration. If one knew that the peptides being synthesized (presumably on or in a ribosome [1,2]) achieved their local secondary structure as they were inserted into the extra-ribosomal (aqueousᵇ) medium, then one could attempt simulations based on such a transition from a hydrophobic to an aqueous environment; i.e. add a residue in the hydrophobic region, then ratchet the entire nascent peptide one residue’s length “forward” so that the “N-terminal tail” of the nascent peptide extends further into the aqueous surroundings, then go back, add another residue, etc., until the entire chain has been generated and expressed into the extra-ribosomal aqueous medium.

ᵃ Not necessarily from quantum mechanical computations, but from imitation potential energy functions which mimic the results of such computations.
ᵇ We limit our discussion to ribosomes which do not excrete their products into a membrane.
SSDI 0166-1280(93)03548-L

Without this knowledge, an alternative way to begin exploring secondary (and higher) structure is to attempt minimizations from arbitrary starting conformations, with the guiding principle that the more experiments performed, the more likely it is that the lowest energy structure found corresponds to the global minimum energy structure found in nature. Such a strategy demands an unprejudiced approach to generating peptide secondary structure. Given such an unprejudiced scheme, minimization from a starting conformation is a
relatively straightforward endeavor. In this paper, we continue an exploration of sprouting techniques which enable us to generate low energy conformations of peptides. Specifically, we here contrast several methods for generating starting conformations, and explore how the prejudice induced by using one of these systems is absent from the others. Further, we show how the absence of prejudice leads to beautiful structures of comparable low energy. Finally, we show that all of these methods fail!

Table 1
Methods of generating peptide structures

| Method | Description | Comments | Example |
|--------|-------------|----------|---------|
| 1a | Linear starting conditions, randomized | Results in helical structures | VLA-2, Fig. 1 |
| 1b | Random inside small box | Results in highly compacted (and knotted) structures | VLA-2, Fig. 2 |
| 2a | Random inside small box located on previous Cα | Abandoned in favor of 2b | VLA-2 cloned, Lys → Leu, Fig. 3 |
| 2b | Random polar angles but constant (3.7 Å) radius relative to last Cα in chain | Leads to dissociation | None |
| 3a | Create backbone as in 1b, then sprout, solvate, resprout with waters, remove overlapping waters, simulated annealing, minimization (with or without “jiggling”) | Makes beautiful structures | Fig. 6 |
| 3b | Create backbone as in 2b, then sprout, solvate, resprout with waters, remove overlapping waters, simulated annealing, minimization (with or without “jiggling”) | Makes high energy structures | Fig. 7 |

Sprouting (hydrated) peptides

In earlier work in this laboratory, we showed that the Nilges-Brünger [3] scheme using X-PLOR [4] can be employed to (1) sprout side-chains of peptides whose backbone structure is considered known [5] and (2) sprout side-chains (and water protons for hydrated peptides) whether their Cα structure was known or not [6].ᵃ

ᵃ Side-chain sprouting means locating all side-chain atoms initially randomly within a box centered on the Cα of each residue. Complete sprouting means locating all non-Cα atoms initially randomly within boxes centered on each Cα, and replicate boxes of water molecules (62 per box) centered on each Cα with the protons randomly distributed within small boxes centered on their own oxygens.

In the second paper [6] we used two techniques for generating unknown atom coordinates. In the first method, we laid the peptide down on the “x-axis”, say, with each atom’s ordinal number converted in a one-to-one fashion into angstroms. This scheme (method 1a, Table 1), which is introduced in the X-PLOR manual, usually leads to secondary structures which are helical [7] (an example from the VLA-2 system is shown in Fig. 1). Noting this fact, we turned to another scheme in which we generated the coordinates of all the Cα randomly inside a box of pre-fixed size (method 1b). With such a scheme, and using the sprouting techniques discussed, we found that we could generate high energy starting conformations (which were chemical nonsense) and that the simulated annealing scheme properly connected atoms and residues, sorting out a chaotic system into a simple, dense structure. Our ultimate
minimum energy structure for this same VLA-2 primary structure is shown in Fig. 2. It seemed reasonable to look for a generating scheme which would provide a semblance of chemical sanity to the starting conformation, prior to annealing. Presumably, Cα connected starting structures would anneal more efficiently.

Fig. 1. A helical structure for native sequence (seq_1). From an originally stretched out (elongated) chain, one finds a local minimum structure which is highly helical.
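The two starting schemes described in this section (methods 1a and 1b) can be sketched as follows. This is a minimal Python illustration, not the X-PLOR input used in this work; the function names, the default box edge, and the optional y/z scatter are assumptions made only for the sake of the example.

```python
import random

def linear_start(n_atoms, spacing=1.0, scatter=0.0, seed=0):
    """Method 1a (sketch): atom number i is placed at x = i * spacing angstroms.

    The optional random y/z offset (scatter) is an assumption, not a value
    taken from the paper.
    """
    rng = random.Random(seed)
    return [(i * spacing,
             rng.uniform(-scatter, scatter),
             rng.uniform(-scatter, scatter))
            for i in range(1, n_atoms + 1)]

def random_box_start(n_calpha, box_edge=10.0, seed=0):
    """Method 1b (sketch): every C-alpha is placed uniformly at random
    inside a cube of edge box_edge (assumed value) centered on the origin."""
    rng = random.Random(seed)
    half = box_edge / 2.0
    return [tuple(rng.uniform(-half, half) for _ in range(3))
            for _ in range(n_calpha)]

# Starting coordinates for a 15-residue fragment such as seq_1
stretched = linear_start(15)
compact = random_box_start(15)
```

Neither starting set respects chemical connectivity; in the scheme described above it is the subsequent sprouting and simulated annealing that are relied upon to connect atoms and residues.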
Random walk schemes

We chose to explore the following schemes:

(1) Place the coordinates of the Cα of the first residue at the origin (0, 0, 0).
(2) In linear fashion, starting at the N-terminus of the peptide, locate the nth residue’s Cα atom:
    (a) (method 2a) inside a random box of maximum size ±δ centered on the Cα of the (n - 1)th residue, or
    (b) (method 2b) at a random set of polar angles but a fixed radius (3.7 Å) from the Cα of the (n - 1)th residue.
(3) Optionally hydrate.ᵃ
(4) Sprout all non-Cα atoms (and optionally water protons) so that they are in the vicinity of their owners (Cα and/or water oxygen atoms).
(5) Anneal (simulated).
(6) Optionally strip off waters (if present).
(7) Finally, minimize the energy with the standard potential energy functions operative.

Fig. 2. The minimum energy seq_1 structure found (VLA-2). This structure corresponds to an energy of -581 kcal mol⁻¹. The backbone has adjusted itself so that Leu 15 allows its carbonyl oxygen atom to interact with Lys 7. Lys 7 is also interacting with Asp 8, i.e. Lys 7 is at the center of a large charge-charge interaction web.

Examples from mutations in the VLA-2 integrin system
VLA-2, a cell adhesion molecule found on platelets, fibroblasts, B cells and monocytes, binds to collagen in the presence of Mg²⁺. A putative Mg²⁺-mediated collagen binding domain on the α2 subunit of VLA-2 has been identified with the primary sequence Asp-Val-Asp-Lys-Asp-Thr-Ile-Thr-Asp (residues 470-478) [8] and we have undertaken a study of the possible interactions of this fragment in association with divalent cations and collagen. Since the three-dimensional structure of VLA-2 is unknown, it seemed worthwhile attempting to predict the structure of this fragment in the solvated state, flanked on either side by several extra residues (three in our case, yielding a fragment extending from residue 467 through 481, i.e. Cys, Ser, Val, Asp, Val, Asp, Lys, Asp, Thr, Ile, Thr, Asp, Val, Leu, Leu), to see if we could find a practical starting VLA-2 structure for studying the interaction between VLA-2 and collagen. Figures 1 and 2 show our results on this primary structure (herein denoted as seq_1).

ᵃ In hydrating, we took care to remove essentially overlapping waters which otherwise inadvertently raised the energy to too high a value. However, ultra-high energy situations occasioned by “overlapping” waters turned out to be less of a problem than the “overlapping” Cα.
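The residue bookkeeping for the seq_1 fragment can be checked with a short sketch. The sequence and the residue numbers 467-481 and 470-478 are taken from the text; the names `SEQ_1`, `FIRST_RESIDUE`, `residue_number` and `binding_domain` are hypothetical helpers introduced only for this illustration.

```python
# seq_1: the 15-residue VLA-2 fragment quoted in the text (residues 467-481)
FIRST_RESIDUE = 467
SEQ_1 = ["Cys", "Ser", "Val", "Asp", "Val", "Asp", "Lys", "Asp",
         "Thr", "Ile", "Thr", "Asp", "Val", "Leu", "Leu"]

def residue_number(position):
    """Map a 1-based position in the fragment to the VLA-2 residue number."""
    return FIRST_RESIDUE + position - 1

def binding_domain(seq, start=470, end=478):
    """Return the residues of the putative binding domain (inclusive range)."""
    return seq[start - FIRST_RESIDUE:end - FIRST_RESIDUE + 1]

print("-".join(binding_domain(SEQ_1)))
print("position 7 is", SEQ_1[6], "at residue", residue_number(7))
```

With this numbering the fragment positions used in the figure captions (Lys 7, Asp 8, Leu 15) correspond to VLA-2 residues 473, 474 and 481.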
In the study being reported here, we attempted to mutate the Lys 7 residue (since it appears to be involved in a complicated charge-charge interaction in the structure depicted in Fig. 2), hoping to determine whether or not the secondary structure we found with the lysine was destroyed when the lysine was transmuted into something else. In this paper we discuss the mutation Lys → Leu.

In cloning Lys → Leu and then using method 2a for coordinate generation, we found that after simulated annealing, the resultant molecule needed substantial minimization coupled with “jiggling” the coordinates, i.e. translating them a small but random amount in the ±x, ±y and ±z directions prior to invoking a minimization algorithm. This allowed the systems to escape from high lying local minima. However, we noted that it usually took two cycles of “jiggling” and minimizing before convergence was obtained. Our best mutated structure is shown in Fig. 3.

Fig. 3. The best minimum for seq_2 (Lys → Leu) achieved.

We therefore decided to create an even more “faithful” starting point by requiring that successive Cα be separated by approximately 3.7 Å (method 2b). This adjustment caused all the non-Cα atoms to lie in the vicinity of their ultimate destination (as specified by the location of “their” Cα). We were (and still are) surprised to note that this method, which is physically quite appealing, fails miserably. The chains generated in this manner were correctly connected, i.e. the Cα’s were separated by the proper amount (≈ 3.7 Å), but the cooperativity required to untangle badly tangled chains made it impossible to keep the chains intact, and instead, we always ended up with dissociated fragmentary residues rather than the properly connected peptide we had expected. Even small peptides such as this small mutated VLA-2 fragment gave problems. We attempted to remedy the situation by fixing all Cα during sprouting and simulated annealing, but to no avail. We were forced to constantly rescale coordinates fractionally, i.e. compress the globular cluster, in order to obtain Cα-Cα connectivity, and more often than not, the new starting point resulted in dissociation upon subsequent minimization. We conclude that this method is of no utility.

Fig. 4. The backbone of the reported crystal structure of rubredoxin. The energy of this structure is computed to be about -60 kcal mol⁻¹.

Method 3a is finally reasonable. Here, the absence of solvent in the first stage of sprouting and annealing leads to acquisition of reasonably disentangled chains where the required motions have been unfettered by surrounding (and interstitial) waters (which can’t “get out of the way”). Once the backbone is formed, resprouting, using all non-Cα atoms and water protons, results in ultimate conformations of good connectivity. Method 3b fails because the Cα(i)-Cα(i+1)-Cα(i+2) angle is not arbitrary!
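The random-walk placements (methods 2a and 2b) and the “jiggling” step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the X-PLOR protocol used in this work: the function names, the box half-width `delta`, the jiggle amplitude `max_shift`, and the choice of directions sampled uniformly over the sphere are all assumptions. The last helper measures the Cα(i)-Cα(i+1)-Cα(i+2) angle along a generated chain.

```python
import math
import random

CA_CA = 3.7  # angstroms; C-alpha to C-alpha separation quoted in the text

def random_direction(rng):
    """Unit vector from random polar angles (assumed uniform over the sphere)."""
    cos_theta = rng.uniform(-1.0, 1.0)
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)

def walk_2a(n, delta=3.0, seed=0):
    """Method 2a (sketch): each C-alpha lies inside a box of half-width delta
    (assumed value) centered on the previous C-alpha."""
    rng = random.Random(seed)
    chain = [(0.0, 0.0, 0.0)]
    for _ in range(n - 1):
        chain.append(tuple(c + rng.uniform(-delta, delta) for c in chain[-1]))
    return chain

def walk_2b(n, radius=CA_CA, seed=0):
    """Method 2b (sketch): each C-alpha lies at a fixed radius from the
    previous one, in a random direction."""
    rng = random.Random(seed)
    chain = [(0.0, 0.0, 0.0)]
    for _ in range(n - 1):
        d = random_direction(rng)
        chain.append(tuple(c + radius * u for c, u in zip(chain[-1], d)))
    return chain

def virtual_bond_angles(chain):
    """Angles (degrees) at C-alpha i+1 between C-alpha i and C-alpha i+2."""
    angles = []
    for a, b, c in zip(chain, chain[1:], chain[2:]):
        u = [p - q for p, q in zip(a, b)]
        v = [p - q for p, q in zip(c, b)]
        dot = sum(x * y for x, y in zip(u, v))
        cos_angle = dot / (math.hypot(*u) * math.hypot(*v))
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_angle)))))
    return angles

def jiggle(coords, max_shift=0.1, seed=0):
    """Translate every coordinate by a small random amount in +/- x, y, z;
    max_shift (angstroms) is an assumed value."""
    rng = random.Random(seed)
    return [tuple(c + rng.uniform(-max_shift, max_shift) for c in p)
            for p in coords]

chain = walk_2b(15)
print([round(a) for a in virtual_bond_angles(chain)])
```

In this sketch nothing restricts the angle between successive 3.7 Å steps, so a method 2b chain has the proper Cα-Cα separation while its Cα(i)-Cα(i+1)-Cα(i+2) angles can take any value between 0° and 180°, which is the arbitrariness the text identifies as the failure of method 3b.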
Fig. 5. A low energy view of rubredoxin (from the crystal structure). This is the lowest energy structure obtained by minimizing the system starting at the known crystal structure. The vacuum minimization leads to an energy of about -2042 kcal mol⁻¹.
Examples from the rubredoxin system

Rubredoxin is an iron/sulfur-containing electron transfer protein whose structure is well characterized, and whose intermediate length is appropriate as a test subject. This molecule serves as an electron carrier in several reactions. There is no inorganic sulfur in this molecule, and the iron (Fe(III)) is tetracoordinated to cysteines. The Fe-S distances to cysteines 6, 9, 39 and 42 are all about 2.2 to 2.3 Å [9], and the S-S distances are about 3.1 to 3.8 Å, the sulfurs being approximately tetrahedrally disposed.

We set ourselves the task of ab initio generating secondary structures and comparing them to both the known crystal structure and to the structure obtained by minimizing the energy starting at this known structure, in the absence of the Fe(III) cation. The latter should correspond to a reasonably low energy configuration.

We started these ab initio computations in two ways, first in vacuo (method 1b) and second, hydrated. In every case, the hydrated minimum structure, when stripped of waters and reminimized, came to a lower energy than those structures which were only minimized in vacuo. Therefore, we will not further discuss the vacuum minimized structures.

It is not clear that any minimization should generate the crystallographically known structure, given the absence of ferric ion in the simulations, the lack of knowledge of whether or not packing forces and solvent presence influence the crystal structure, etc. In fact, we attempted minimization using X-PLOR from the reported crystal structure, just to see if indeed that structure was close to a minimum, and found that the energy dropped from -60 to -2042 kcal mol⁻¹. This enormous drop in energy was accompanied by a further distortion but not a disruption of the sulfur “tetrahedron”.

We generated dense globular cluster starting states using method 3a to create starting sprouted, minimized, Cα chains, hydrated, dehydrated and reminimized several times, using different seeds for the random number generator, which guarantees uniquely different experiments for each run. The system consistently succeeds in generating dense, compact and connected chains. We were not able to generate chains (using method 3a) of lower energy than the lowest energy of the molecule starting from the known crystal structure (see Fig. 4). The vacuum minimized structure of rubredoxin starting from the crystal structure is shown in Fig. 5. Our best effort is shown in Fig. 6. A whimsical (and interesting) artificial structure is shown in Fig. 7.

Fig. 6. This is the best rubredoxin energy we have ever achieved (panel label: From Box_Spr_Solv_Spr, E = -1669). The energy is about -1669 kcal mol⁻¹.

Fig. 7. A high energy (but beautiful) knotted rubredoxin structure. The energy is about 2000 kcal mol⁻¹.

Discussion
The fact that our best rubredoxin “ab initio” structure is approximately 400 kcal mol⁻¹ higher in energy (about -1669 versus -2042 kcal mol⁻¹) than the one obtained by minimizing from the known (experimental) structure is very discouraging. The most surprising conclusion resulting from this work is that the sprouting technique works best with randomly located Cα, and poorest with sequentially randomly located Cα, whose entanglements make it impossible for the nascent polymer to properly anneal into reasonably connected configurations. The most disheartening result of this work is that one can be easily misled into accepting low energy
structures as minimum energy structures based on limited sampling. Even using a sprouting technique, in which the atoms gradually find their positions guided by the increasingly realistic forces that are being brought into play by the simulated annealing, we are faced with the fact that the starting configuration of the Cα limits the accessibility of regions of the phase space, so that it is entirely possible that the global minimum energy configuration lies outside of the sampled regions.

In the case of rubredoxin itself, the suggestion appears to be that it would be worthwhile investigating the molecule’s structure when the Fe(III) has been removed. Our work suggests that removal of this Fe³⁺ will cause a non-trivial disruption of secondary structure.

Acknowledgments

This research was supported in part by a grant from the Pittsburgh Supercomputing Center through the NIH Division of Research Resources Cooperative Agreement 1 P41 RR06009-01 and through a grant from the National Science Foundation Cooperative Agreement ASC8500650. The author is also indebted to the Digital Equipment Corporation for the loan of an Alpha computer/workstation which enabled the prototyping of the X-PLOR programs to be carried out in an expeditious fashion.

References

[1] K. Nagano, H. Takagi and M. Harel, Biochimie, 73 (1991) 941.
[2] M. Eisenstein, R. Sharon, Z. Berkovich-Yellin, H.S. Gewitz, S. Weinstein, E. Pebay-Peyroula, M. Roth and A. Yonath, Biochimie, 73 (1991) 879.
[3] M. Nilges and A. Brünger, Protein Eng., 4 (1991) 649.
[4] A. Brünger, X-PLOR manual, version 2.1, Yale University, New Haven, CT, 1990.
[5] C.W. David, J. Comput. Chem., 14 (1993) 715.
[6] C.W. David, J. Comput. Chem., 15 (1994) 23.
[7] P. Kraulis, J. Appl. Crystallogr., 24 (1991) 946.
[8] Y. Takada and M.E. Hemler, J. Cell. Biol., 109 (1989) 397.
[9] L.H. Jensen, in H. Matsubara, Y. Katsube and K. Wada (Eds.), Iron-Sulfur Protein Research, Springer, 1986, pp. 3-31.