Solvent effects on the φ–ψ potential surfaces of glycine and alanine dipeptides studied by PCM and I-PCM methods


Journal of Molecular Structure (Theochem) 586 (2002) 111–124

www.elsevier.com/locate/theochem

Michio Iwaoka, Mai Okada, Shuji Tomoda*
Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Komaba, Meguro-ku, 153-8902 Tokyo, Japan
Received 21 November 2001; accepted 21 February 2002

Abstract
Two-dimensional φ–ψ potential surfaces of representative amino acid models, For-Gly-NH2 (1) and For-Ala-NH2 (2), were calculated at the HF/6-311++G(2d,2p)//HF/6-31G(d) level of theory in various solvents by applying the polarizable continuum model (PCM) and isodensity polarizable continuum model (I-PCM) methods. Three dielectric constants of the medium (4.335, 32.63, and 78.39) were selected, corresponding to ether, methanol, and water, respectively. The PCM potential surfaces in the solvents were found to be significantly different from the corresponding I-PCM potential surfaces, presumably owing to the lower approximation level of the former. However, both methods resulted in local energy minima at the α region for 1 and at the α and β regions for 2. Thus, intrinsic conformational propensities of glycine and alanine residues to adopt folded secondary structures in the solvents were suggested. Comparison of the more reliable I-PCM potentials with the Ramachandran plots extracted from the protein databank (PDB_SELECT) revealed that glycine and alanine residues in folded proteins roughly adopt random statistical structures in high energy ranges (ΔE > 3 and 5 kcal/mol for 1 and 2, respectively). It was also proposed that the conformational space of a polypeptide molecule is significantly restricted in water compared with that in the gas phase. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Solvent effect; Conformational space; Glycine; Alanine; Isodensity polarizable continuum model

* Corresponding author. Fax: +81-3-5454-6998. E-mail address: [email protected] (S. Tomoda).

1. Introduction
Conformational preferences of single amino acid residues in the context of a polypeptide sequence are of great interest [1–7] because folding pathways of protein molecules must be primarily restricted by the total conformational space determined by the individual amino acid residues. Several theoretical methods, such as ab initio molecular orbital calculations [8–13] and Monte Carlo and molecular dynamics simulations using empirical force fields [14–17], have been applied to search for the stable conformers of dipeptidyl amino acid models. However, fine profiles of the whole potential surfaces, which are practically useful for predicting protein-folding pathways, have not been well elucidated, especially in polar solvents.
Two-dimensional φ–ψ potential surfaces of representative amino acid models, For-Gly-NH2 (1) and For-Ala-NH2 (2) (Fig. 1), were previously studied by Head-Gordon et al. [18] by the ab initio molecular orbital method: the potential surfaces obtained in vacuo at the HF/3-21G level showed a global energy minimum at the so-called C7 region, which corresponds to the conformer having a seven-membered hydrogen-bonded ring, and another energy minimum of similar stability at the so-called C5 region, which corresponds to the extended conformer having a five-membered hydrogen-bonded ring. In water, however, the

0166-1280/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0166-1280(02)00076-3


Fig. 1. Structural formulae of glycine dipeptide (1) and alanine dipeptide (2) with the definitions of dihedral angles φ and ψ.

potential surfaces changed significantly [19]: the C7 conformer was relatively destabilized, allowing the helical conformer (α) to become a local energy minimum, according to the potential surfaces calculated in water by using the self-consistent reaction field (SCRF) method with the original Onsager model [20]. Similar solvent effects were reported by Gould et al. [21]: the relative energies of stable conformers for Ac-Gly-NHMe and Ac-Ala-NHMe in water were calculated by using a more advanced solvation model, namely the polarizable continuum model (PCM) [22]. The calculated results were consistent with the experimental observation that Ac-Ala-NHMe mostly adopts α and β structures in water [23]. However, the whole potential surfaces have not been determined at the PCM level. Moreover, the inadequacy of the PCM method has recently been claimed [24,25]. It is, therefore, worthwhile to carry out more reliable theoretical calculations to obtain the potential surfaces of these amino acid models in water.
The isodensity polarizable continuum model (I-PCM) method [26] is more advanced than the PCM method in that the cavity of a solute is defined by the electron isodensity surface in I-PCM, instead of the van der Waals surface in PCM. The efficiency of this method has been widely recognized for describing the chemical behavior of small polar molecules in solution [27–30].
In this paper, two-dimensional φ–ψ potential surfaces of glycine dipeptide (For-Gly-NH2, 1) and alanine dipeptide (For-Ala-NH2, 2) have been calculated at the HF/6-311++G(2d,2p)//HF/6-31G(d) level of theory in various solvents by applying the PCM and I-PCM methods. Three dielectric constants of the medium, i.e. 4.335, 32.63, and 78.39 (corresponding to ether, methanol, and water, respectively), were selected. Based on the obtained φ–ψ potential surfaces, the solvent effects on the conformational preferences of glycine and alanine dipeptides have been studied in detail. In addition, the roles of long-range inter-amino acid interactions in the stability of protein structures have been discussed by comparing the I-PCM potential surfaces with the Ramachandran plots [31] of glycine and alanine residues extracted from the protein databank (PDB_SELECT) [32,33]. The φ–ψ potential surfaces obtained here will be useful for ab initio prediction of protein folding pathways and refinement of protein force fields.

2. Computational method
All molecular orbital calculations were carried out by using the Gaussian 98 program [34] installed on DEC Alpha 21164A Unix workstations or on IBM SP2 and SGI SGI2800 computer systems in the Computer Center, Institute for Molecular Science, Okazaki National Research Institutes.
The method reported by Head-Gordon et al. [18] was utilized to obtain the φ–ψ potential surfaces of 1 and 2, although larger basis sets and more advanced solute-treatment methods, i.e. PCM [22] and I-PCM [26], were applied in this study. The molecular structure was fully optimized at the HF/6-31G(d) level with the two dihedral angles (φ and ψ) frozen at the appropriate values, which were varied from −180° to +165° with a grid size of 15°. For the glycine model (1), which does not have a chiral center, 290 independent structures were considered. On the other hand, 576 structures were calculated for the alanine model (2). The sets of structures thus obtained were then subjected to single-point energy calculations to obtain the φ–ψ potential surfaces of 1 and 2 in vacuo and in three different solvents: ether (ε = 4.335), methanol (ε = 32.63), and water (ε = 78.39). A larger basis set, 6-311++G(2d,2p), was used for the energy calculations in solution in order to include long-range interactions as efficiently as possible. The PCM and I-PCM methods, which are implemented in the Gaussian 98 program, were applied with standard parameters.
Ramachandran plots [31] of glycine and alanine residues in folded proteins were obtained as follows. A total of 1085 heterogeneous protein structures (resolution ≤ 3.0 Å) were selected from the Protein Databank according to the PDB_SELECT database [32,33], which lists non-redundant protein structures with structural homology of less than 25%. Main-chain dihedral angles


Table 1
Methods applied for calculation of the φ–ψ potential surfaces of For-Gly-NH2 (1) and For-Ala-NH2 (2)

Method   Calculation level                  Solvation model(a)   Medium (ε)
I        HF/6-31G(d)//HF/6-31G(d)           none                 Vacuum (1.0)
II       HF/6-311++G(2d,2p)//HF/6-31G(d)    none                 Vacuum (1.0)
III      HF/6-311++G(2d,2p)//HF/6-31G(d)    PCM                  Water (78.39)
IV       HF/6-311++G(2d,2p)//HF/6-31G(d)    I-PCM                Ether (4.335)
V        HF/6-311++G(2d,2p)//HF/6-31G(d)    I-PCM                Methanol (32.63)
VI       HF/6-311++G(2d,2p)//HF/6-31G(d)    I-PCM                Water (78.39)

(a) PCM is a polarizable continuum model [22]. I-PCM is an isodensity polarizable continuum model [26].

(φ and ψ) were then calculated for all non-terminal glycine and alanine residues. In total, 10,836 glycine and 11,871 alanine residues were used for drawing the Ramachandran plots.
The conformational energy of each glycine or alanine residue in folded proteins was estimated by placing its φ and ψ dihedral angles on the computed φ–ψ potential surface of 1 or 2, respectively, and linearly interpolating the energy. The conformational energies thus estimated for glycine and alanine residues were statistically analyzed. In addition, the area within an arbitrary conformational energy range on the calculated φ–ψ potential surfaces of 1 and 2 was determined by dividing each potential surface into a 360 × 360 grid, for each cell of which the conformational energy was estimated by linear interpolation.
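The grid bookkeeping described in this section can be illustrated with a short sketch. It is not the authors' program: it assumes a 24 × 24 scan grid (−180° to +165° in 15° steps), treats the glycine surface as symmetric under (φ, ψ) → (−φ, −ψ), and reads "linear interpolation" as bilinear interpolation with periodic wrap-around. All function names are hypothetical.

```python
STEP = 15
ANGLES = list(range(-180, 180, STEP))  # -180, -165, ..., +165 (24 values)
N = len(ANGLES)


def wrap(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180) % 360 - 180


def count_independent(center_symmetric):
    """Count scan points, merging (phi, psi) with (-phi, -psi) if symmetric."""
    unique = set()
    for phi in ANGLES:
        for psi in ANGLES:
            point = (phi, psi)
            if center_symmetric:
                point = min(point, (wrap(-phi), wrap(-psi)))
            unique.add(point)
    return len(unique)


def interpolate(surface, phi, psi):
    """Bilinear interpolation; surface[i][j] is the energy at (ANGLES[i], ANGLES[j])."""
    x = (wrap(phi) + 180.0) / STEP
    y = (wrap(psi) + 180.0) / STEP
    ix, iy = int(x), int(y)              # lower grid indices (x and y are non-negative)
    fx, fy = x - ix, y - iy              # fractional position inside the grid cell
    i0, j0 = ix % N, iy % N
    i1, j1 = (i0 + 1) % N, (j0 + 1) % N  # periodic neighbours
    return ((1 - fx) * (1 - fy) * surface[i0][j0]
            + fx * (1 - fy) * surface[i1][j0]
            + (1 - fx) * fy * surface[i0][j1]
            + fx * fy * surface[i1][j1])


def percent_area_below(surface, threshold, n=360):
    """Percentage of an n x n sampling of the surface with energy <= threshold."""
    spacing = 360.0 / n
    hits = 0
    for a in range(n):
        for b in range(n):
            phi = -180.0 + a * spacing
            psi = -180.0 + b * spacing
            if interpolate(surface, phi, psi) <= threshold:
                hits += 1
    return 100.0 * hits / (n * n)


print(count_independent(True), count_independent(False))  # 290 576
```

Under these assumptions the symmetric count reproduces the 290 independent glycine structures, because four grid points, (−180°, −180°), (−180°, 0°), (0°, −180°) and (0°, 0°), map onto themselves and the remaining 572 pair up. With a real surface, `percent_area_below(surface, 3.0)` would give a quantity analogous to the %area values reported later.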

3. Results
Two-dimensional φ–ψ potential surfaces were obtained at various calculation levels for For-Gly-NH2 (1) and For-Ala-NH2 (2). The computational methods applied in this study are listed in Table 1. In all methods, molecular structures were optimized at the HF/6-31G(d) level. Table 2 lists definitions of conformers utilized in

Table 2
Definitions of conformers for For-Gly-NH2 (1) and For-Ala-NH2 (2) (the ranges of φ and ψ dihedral angles bear no relation to the depth and the width of the corresponding potential energy minima)

Symbols     φ (°)            ψ (°)                       Notes
C5          −180 to −150     150 to 180                  Extended with a five-membered hydrogen-bonded ring
PII         −90 to −60       150 to 180                  Poly-Pro II helix
β1          −135 to −105     135 to 150                  Anti-parallel β sheet
β2          −120 to −90      105 to 120                  Parallel β sheet
21          −165 to −150     45 to 90
C7, C7eq    −120 to −75      45 to 90                    With a seven-membered hydrogen-bonded ring
B           −135 to −90      0 to 30                     A bridge region
α, αR       −90 to −60       −60 to −15                  Right-handed α helix
α′          −180 to −135     −90 to −45
H           −150 to −90      −150 to −105                A high energy region
C7ax        60 to 90         −75 to −15                  With a seven-membered hydrogen-bonded ring
αL          30 to 75         15 to 60                    Left-handed α helix
αD          45 to 75         −180 to −120, 135 to 180
X                                                        Other regions
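The region definitions in Table 2 amount to a lookup from a (φ, ψ) pair to a conformer symbol. The following sketch shows one way to encode that lookup. The angle ranges are transcribed from Table 2; the function names, the inclusive boundaries, the first-match rule for points on shared edges, and the treatment of ±180° as the same angle are assumptions made for illustration only.

```python
# Each entry: (symbol, (phi_min, phi_max), [(psi_min, psi_max), ...]) in degrees,
# transcribed from Table 2. Anything not matched is reported as "X" (other regions).
REGIONS = [
    ("C5", (-180, -150), [(150, 180)]),
    ("PII", (-90, -60), [(150, 180)]),
    ("β1", (-135, -105), [(135, 150)]),
    ("β2", (-120, -90), [(105, 120)]),
    ("21", (-165, -150), [(45, 90)]),
    ("C7/C7eq", (-120, -75), [(45, 90)]),
    ("B", (-135, -90), [(0, 30)]),
    ("α/αR", (-90, -60), [(-60, -15)]),
    ("α′", (-180, -135), [(-90, -45)]),
    ("H", (-150, -90), [(-150, -105)]),
    ("C7ax", (60, 90), [(-75, -15)]),
    ("αL", (30, 75), [(15, 60)]),
    ("αD", (45, 75), [(-180, -120), (135, 180)]),
]


def _inside(angle, low, high):
    """True if the angle, or a 360-degree periodic image of it, lies in [low, high]."""
    return any(low <= angle + shift <= high for shift in (-360, 0, 360))


def classify(phi, psi):
    """Return the Table 2 symbol for a (phi, psi) pair; the first matching row wins."""
    for symbol, (phi_low, phi_high), psi_ranges in REGIONS:
        if _inside(phi, phi_low, phi_high) and any(
            _inside(psi, low, high) for low, high in psi_ranges
        ):
            return symbol
    return "X"


print(classify(180, 180), classify(-90, -30), classify(0, 0))  # C5 α/αR X
```

The three example points are minima that the paper itself labels C5, α and X for the glycine model, so the sketch can be checked against the tabulated assignments.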


Table 3
φ and ψ dihedral angles and relative energies (ΔE in kcal/mol) of energy minima on the potential surfaces of For-Gly-NH2 (1) obtained by various calculation methods

Method(a)   Conformer (φ, ψ) {ΔE}(b)
I           C5 (180, 180) {0.0}; C7 (−90, 60) {0.6}
II          C5 (180, 180) {0.0}; C7 (−90, 60) {1.1}; C7 (−90, 90) {1.2}
III         α (−90, −15) {0.0}; PII (−75, 180) {0.3}; C5 (180, 165) {0.5}; PII (−75, 150) {0.8}; β2 (−120, 105) {2.0}; H (−135, −135) {2.2}; α′ (−135, −90) {3.0}
IV          C5 (180, 180) {0.0}; α (−90, −30) {2.6}; C7 (−90, 45) {3.0}; 21 (−165, 60) {4.0}; X (−75, −75) {7.0}
V           C5 (180, 180) {0.0}; α (−90, −30) {2.3}; 21 (−165, 60) {3.9}; H (−90, −135) {4.0}; C7 (−90, 45) {4.1}; H (−135, −120) {5.0}; X (−75, −75) {5.8}; X (0, 0) {11.3}
VI          C5 (180, 180) {0.0}; α (−90, −30) {2.2}; 21 (−165, 60) {3.9}; H (−90, −135) {4.0}; C7 (−90, 45) {4.2}; H (−135, −120) {4.9}; X (−75, −75) {5.6}; X (0, 0) {11.4}

(a) See Table 1 for calculation methods.
(b) Conformers are arranged in the order of relative stability. The φ and ψ dihedral angles of the obtained energy minima are given at a resolution of 15°. See Table 2 for the definitions of conformer symbols.

this study. The same symbols as those used in the literature [18,19] are employed here except for β, which was assigned to the conformer with φ ≈ −120° and ψ ≈ 30° in the previous definition. We have assigned the symbol B (a bridge region) to this conformer. Instead, β1 and β2 are used for the conformers corresponding to anti-parallel and parallel β-sheets, respectively. In addition, three new symbols, 21, H, and X, have been introduced for previously undefined conformers according to the assignment by Grant et al. [35]. It should be noted that the ranges of φ and ψ dihedral angles for the stable conformers have been defined for convenience; hence they bear no relation to the depth and the width of the corresponding energy minima on the potential surfaces.

Fig. 2. The φ–ψ potential surface of For-Gly-NH2 (1) in vacuo (ε = 1.0) at the HF/6-311++G(2d,2p)//HF/6-31G(d) level (method II). Contours are drawn at intervals of 1 kcal/mol up to 15 kcal/mol. The positions of energy minima are indicated by appropriate symbols.

3.1. For-Gly-NH2 (1)

The main-chain φ and ψ dihedral angles and the relative potential energies (ΔE) of stable conformers located on the potential surfaces of 1 are summarized in Table 3. Since the potential surface of the prochiral glycine model (1) is center-symmetric, only the stable conformers found in the range −180° ≤ φ ≤ 0° are listed in the table, at a resolution of 15° for the dihedral angles φ and ψ. It is seen that the structure and the relative stability of the energy minima depend greatly on the calculation method.
The potential surface of 1 obtained in vacuo at the HF/6-311++G(2d,2p)//HF/6-31G(d) level (method II) is shown in Fig. 2. The profile was very similar to those at lower calculation levels (i.e. the HF/3-21G//HF/3-21G level [18] and method I), suggesting a small effect of basis sets on the conformational propensities of 1. On the other hand, the potential surface obtained in water by applying the PCM method (method III, not shown) was found to be clearly different from that shown in Fig. 2. This feature can also be deduced from the data in Table 3.
The I-PCM potential surfaces obtained in ether (method IV) and in water (method VI) are shown in Figs. 3 and 4, respectively. The potential surface in methanol (method V, not shown) was essentially the same as that in Fig. 4. These I-PCM potentials were significantly different from that in vacuo (Fig. 2), while they did not differ greatly from each other (Table 3). Despite the obvious discrepancy between the PCM and I-PCM potential surfaces, the α regions became distinct local energy minima in water irrespective of the solvation model. Therefore, a helical tendency of glycine residues in proteins was strongly suggested.

Fig. 3. The φ–ψ potential surface of For-Gly-NH2 (1) in ether (ε = 4.335) at the HF/6-311++G(2d,2p)//HF/6-31G(d) level with I-PCM (method IV). Contours are drawn at intervals of 1 kcal/mol up to 15 kcal/mol. The positions of energy minima are indicated by appropriate symbols.

Fig. 4. The φ–ψ potential surface of For-Gly-NH2 (1) in water (ε = 78.39) at the HF/6-311++G(2d,2p)//HF/6-31G(d) level with I-PCM (method VI). Contours are drawn at intervals of 1 kcal/mol up to 15 kcal/mol. The positions of energy minima are indicated by appropriate symbols.

Fig. 5. Ramachandran plots for glycine superimposed on the I-PCM potential surface of For-Gly-NH2 (1) in water (method VI). 10,836 glycine residues were extracted from 1085 non-redundant protein structures.

In order to analyze the effect of long-range


Table 4
Ratios of the area (%area) with a relative energy (ΔE) below 3.0 or 5.0 kcal/mol on the potential energy surfaces of For-Gly-NH2 (1), and the mean potential energies estimated for glycine residues in proteins, Ē(protein), and in the random distribution, Ē(Boltzmann), on the basis of the calculated potential surfaces

Method(a)   %Area (ΔE ≤ 3.0)   %Area (ΔE ≤ 5.0)   Ē(protein)(b) (kcal/mol)   Ē(Boltzmann)(c) (kcal/mol)   ΔĒ(d) (kcal/mol)
I           32.2               53.2               2.91 ± 1.61                0.96 ± 0.76                  1.95
II          30.5               52.5               3.00 ± 1.70                0.94 ± 0.79                  2.06
III         44.7               70.2               1.60 ± 1.03                1.19 ± 0.70                  0.41
IV          8.4                38.5               4.05 ± 1.51                1.07 ± 1.05                  2.98
V           4.5                22.8               4.72 ± 1.55                0.99 ± 1.00                  3.73
VI          4.1                21.3               4.80 ± 1.57                0.99 ± 1.00                  3.81

(a) See Table 1 for calculation methods.
(b) 10,836 glycine residues were extracted from 1085 heterogeneous protein structures.
(c) The temperature was set at 300 K.
(d) The relative mean potential energies, Ē(protein) − Ē(Boltzmann), indicating an average of the conformational strain energies for glycine residues in folded protein structures.

inter-amino acid interactions on the statistical conformations of glycine residues in folded proteins, Ramachandran plots of glycine residues were obtained from heterogeneous protein structures (PDB_SELECT). The resulting Ramachandran plots are superimposed on the I-PCM potential surface in water (Fig. 5). The plots formed strong clusters in the α regions and rather diffuse clusters in the PII and C5 regions. Although the centers of the α clusters were shifted slightly toward φ = 0°, the statistical conformations of glycine residues in proteins are roughly consistent with the φ–ψ potential surfaces in water. Most of the plots were located in the region with a relative energy (ΔE) of less than 7 kcal/mol.
Table 4 shows the ratios of the area (%area) on the potential surfaces of 1 with a relative energy (ΔE) below 3.0 or 5.0 kcal/mol. The mean potential energies estimated for the glycine residues in folded proteins, Ē(protein), and those in the hypothetical random distribution at 300 K, Ē(Boltzmann), are also listed. The magnitude of %area indicates the flatness of the calculated potential surfaces. The value of ΔĒ [= Ē(protein) − Ē(Boltzmann)] represents an average of the hypothetical conformational strain energies (perturbation energies), due to inter-amino acid interactions, for glycine residues in folded proteins. There is a clear tendency for ΔĒ to decrease as %area increases. The remarkably small value of ΔĒ obtained for the PCM potential in water (0.41 kcal/mol in method III) indicates excellent agreement between the PCM potential (method III) and the statistical conformations of glycine residues in proteins. The reason for this coincidence is not clear at the moment.

3.2. For-Ala-NH2 (2)
Potential surfaces obtained for For-Ala-NH2 (2) showed a dependence on the calculation methods similar to that observed for For-Gly-NH2 (1). Table 5 summarizes the main-chain φ and ψ dihedral angles and the relative potential energies (ΔE) of stable conformers found on the potential surfaces of 2 at various calculation levels. Because of the methyl group at the C(α) atom of alanine, the potential surfaces became unsymmetrical. Therefore, all stable conformers on the whole potential surfaces (−180° ≤ φ ≤ 180°) are listed in the table at a resolution of 15° for φ and ψ. The structure and the relative stability of the energy minima were largely affected by the calculation methods, although the basis set effects (methods I vs. II) seemed to be marginal.
Representative φ–ψ potential surfaces of 2 calculated by methods II, IV, and VI are shown in Figs. 6–8, respectively. As in the case of the glycine model (1), the potential surface of 2 in vacuo (method II, Fig. 6) was significantly different from the PCM (method III) and I-PCM (methods IV–VI) potential surfaces of 2 in


Table 5
φ and ψ dihedral angles and relative energies (ΔE in kcal/mol) of energy minima on the potential surfaces of For-Ala-NH2 (2) obtained by various calculation methods

Method(a)   Conformer (φ, ψ) {ΔE}(b)
I           C7eq (−90, 75) {0.0}; C5 (−165, 165) {0.4}; B (−120, 15) {2.4}; C7ax (75, −60) {2.5}; αL (60, 30) {4.6}; α′ (−165, −45) {5.5}
II          C5 (−150, 165) {0.0}; C7eq (−90, 75) {0.0}; C7ax (75, −45) {2.6}; αL (60, 30) {4.7}; α′ (−165, −45) {5.3}
III         C5 (−150, 150) {0.0}; β1 (−120, 150) {0.4}; B (−120, 15) {0.4}; αR (−75, −30) {0.7}; PII (−60, 150) {1.2}; PII (−90, 180) {1.3}; C7eq (−90, 75) {1.5}; αL (75, 30) {2.2}; 21 (−165, 75) {2.6}; H (−90, −150) {2.7}; H (−150, −135) {3.6}; αD (60, 180) {3.9}; C7ax (75, −60) {4.0}; αD (60, −120) {4.3}; αD (60, −150) {4.4}; αD (75, 135) {4.5}; H (−90, −105) {4.7}; X (165, −15) {4.9}; α′ (−135, −75) {5.0}
IV          β1 (−135, 135) {0.0}; β2 (−90, 120) {0.2}; C7eq (−105, 45) {1.3}; B (−105, 15) {1.5}; C5 (−150, 165) {1.5}; C7ax (75, −60) {2.7}; αL (75, 45) {3.9}; H (−120, −120) {4.1}; αD (60, −135) {4.3}; X (75, 105) {7.4}
V           β1 (−135, 135) {0.0}; β2 (−90, 120) {0.4}; C7eq (−120, 45) {1.1}; B (−105, 15) {1.2}; C5 (−150, 165) {2.3}; αR (−90, −60) {2.7}; αL (75, 45) {2.7}; H (−120, −120) {3.2}; C7ax (75, −60) {3.2}; H (−90, −120) {3.4}; αL (30, 60) {4.3}; αD (60, −135) {4.5}; X (75, 105) {5.3}; α′ (−165, −90) {5.9}; X (135, −60) {6.8}; αD (60, 165) {7.2}
VI          β1 (−135, 135) {0.0}; β2 (−90, 120) {0.4}; C7eq (−120, 45) {1.1}; B (−105, 15) {1.2}; C5 (−150, 165) {2.4}; αR (−90, −60) {2.5}; αL (75, 45) {2.5}; H (−120, −120) {3.1}; H (−90, −120) {3.2}; C7ax (75, −60) {3.2}; C7ax (75, −15) {4.2}; αL (30, 60) {4.2}; αD (60, −135) {4.5}; X (75, 105) {5.1}; α′ (−165, −90) {5.8}; X (135, −60) {6.8}; αD (60, 165) {7.1}

(a) See Table 1 for calculation methods.
(b) Conformers are arranged in the order of relative stability. The φ and ψ dihedral angles of the obtained energy minima are given at a resolution of 15°. See Table 2 for the definitions of conformer symbols.

various solvents. Again, the I-PCM potential surface in ether (Fig. 7) did not differ greatly from that in water (Fig. 8), and that in methanol (not shown) was essentially the same as that in water. Although the PCM and I-PCM potential surfaces were clearly different from each other, it is notable that the α and β regions were stabilized in the solvents irrespective of the solvation model (Table 5).
Ramachandran plots of alanine residues in proteins are superimposed in Fig. 9 on the I-PCM potential surface obtained in water (method VI). Strong clusters appeared in the regions of αR, PII, and C5, and a weak


cluster appeared in the region of αL. These regions were relatively stabilized on the PCM and I-PCM potential surfaces of 2 (Table 5 and Figs. 7 and 8). The distribution of the Ramachandran plots was in good agreement with the PCM and I-PCM potentials. Therefore, alanine residues would have intrinsic propensities to adopt folded secondary structures in the solvents.

Fig. 6. The φ–ψ potential surface of For-Ala-NH2 (2) in vacuo (ε = 1.0) at the HF/6-311++G(2d,2p)//HF/6-31G(d) level (method II). Contours are drawn at intervals of 1 kcal/mol up to 15 kcal/mol. The positions of energy minima are indicated by appropriate symbols.

Fig. 7. The φ–ψ potential surface of For-Ala-NH2 (2) in ether (ε = 4.335) at the HF/6-311++G(2d,2p)//HF/6-31G(d) level with I-PCM (method IV). Contours are drawn at intervals of 1 kcal/mol up to 15 kcal/mol. The positions of energy minima are indicated by appropriate symbols.

Fig. 8. The φ–ψ potential surface of For-Ala-NH2 (2) in water (ε = 78.39) at the HF/6-311++G(2d,2p)//HF/6-31G(d) level with I-PCM (method VI). Contours are drawn at intervals of 1 kcal/mol up to 15 kcal/mol. The positions of energy minima are indicated by appropriate symbols.

Fig. 9. Ramachandran plots for alanine superimposed on the I-PCM potential surface of For-Ala-NH2 (2) in water (method VI). 11,871 alanine residues were extracted from 1085 non-redundant protein structures.

Table 6 shows ratios of the area (%area) with a


Table 6
Ratios of the area (%area) with a relative energy (ΔE) below 3.0 or 5.0 kcal/mol on the potential energy surfaces of For-Ala-NH2 (2), and the mean potential energies estimated for alanine residues in proteins, Ē(protein), and in the random distribution, Ē(Boltzmann), on the basis of the calculated potential surfaces

Method(a)   %Area (ΔE ≤ 3.0)   %Area (ΔE ≤ 5.0)   Ē(protein)(b) (kcal/mol)   Ē(Boltzmann)(c) (kcal/mol)   ΔĒ(d) (kcal/mol)
I           17.1               31.4               3.37 ± 1.67                0.98 ± 0.74                  2.39
II          18.8               33.3               3.01 ± 1.75                0.76 ± 0.66                  2.25
III         18.8               38.5               2.21 ± 1.15                1.13 ± 0.84                  1.08
IV          13.8               30.2               2.95 ± 1.40                1.06 ± 0.71                  1.89
V           12.0               30.0               3.10 ± 1.31                1.29 ± 0.74                  1.81
VI          12.0               27.8               3.12 ± 1.31                1.31 ± 0.74                  1.81

(a) See Table 1 for calculation methods.
(b) 11,871 alanine residues were extracted from 1085 heterogeneous protein structures.
(c) The temperature was set at 300 K.
(d) The relative mean potential energies, Ē(protein) − Ē(Boltzmann), indicating an average of the conformational strain energies for alanine residues in folded protein structures.

relative energy (ΔE) below 3.0 or 5.0 kcal/mol on the potential surfaces calculated for 2, together with the mean potential energies expected for the alanine residues in folded proteins, Ē(protein), and in the hypothetical random distribution at 300 K, Ē(Boltzmann). In the case of the alanine model (2), the negative correlation between %area and ΔĒ was not as obvious as that observed for the glycine model (1). This suggests that the potential surfaces of 2 are more complicated than those of 1. The value of ΔĒ obtained by method III (PCM in water) was significantly smaller than those obtained by the other methods, again indicating the curious coincidence between the less reliable PCM potential and the statistical conformations of alanine residues in proteins.
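The quantities Ē(Boltzmann) and ΔĒ used in Tables 4 and 6 can be illustrated with a small sketch. The paper does not spell out the averaging procedure, so the code below assumes that Ē(Boltzmann) is a Boltzmann-weighted mean over equally sized grid cells of the surface and that the quoted ± values are the corresponding standard deviations. The function names and the value used for the gas constant are choices made here, not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol K)


def boltzmann_mean(energies, temperature=300.0):
    """Boltzmann-weighted mean and standard deviation of energies in kcal/mol.

    `energies` is assumed to hold one relative energy per equal-area grid cell.
    """
    beta = 1.0 / (R_KCAL * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(weights)
    mean = sum(w * e for w, e in zip(weights, energies)) / partition
    variance = sum(w * (e - mean) ** 2 for w, e in zip(weights, energies)) / partition
    return mean, math.sqrt(variance)


def strain_energy(mean_protein, mean_boltzmann):
    """Average strain energy: E(protein) minus E(Boltzmann)."""
    return mean_protein - mean_boltzmann


# Consistency check against the tabulated means for alanine, method VI (Table 6).
print(round(strain_energy(3.12, 1.31), 2))  # 1.81
```

The final line only reproduces the subtraction behind the last column of Table 6; evaluating `boltzmann_mean` on a real surface would require the interpolated energies, which are not available here.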

4. Discussion 4.1. Effect of solvent According to the results of ab initio calculations shown in Tables 3±6 and Figs. 2±4 and 6±8, it is clearly seen that the conformational preferences of the glycine model (1) and the alanine model (2) change abruptly from in vacuo to in solution. Although the observed solvent effects signi®cantly depend on the solvation models (PCM vs. I-PCM), the a region on the potential surfaces of 1 and the a and b regions on the potential surfaces of 2 are

commonly stabilized in solution irrespective of the solvation models. It is also notable that the effects of solvent polarity are rather small, at least in the range of 4.335 (ether) # 1 # 78.39 (water), compared with the larger solvent effects observed by changing the medium from vacuo …1 ˆ 1:0† to ether …1 ˆ 4:335†: These solvent effects are discussed later in detail. 4.2. Polarizable continuum model potentials For the case of the glycine model (1), stable conformers were located in vacuo only in the regions of C5 and C7 (methods I and II) (Table 3). However, these conformers were relatively destabilized in the PCM potential obtained in water (method III). Instead, stable energy minima (DE # 2.0 kcal/mol) appeared in the regions of a, PII, and b2. The observed solvent effects can be rationalized by the consideration that the intramolecular hydrogen bonds present in the C5 and C7 conformers are strong in vacuo but are weak in polar environments. Similar solvent effects were previously reported by Shang and Head-Gordon [19] based on the simple Onsager's dipole model. For the case of the alanine model (2), similar solvent effects were observed as those observed for 1. Table 5 shows that three stable conformers (C5, C7eq, and C7ax) located in vacuo (method II) are relatively destabilized in the PCM potential in water (method III), while new stable conformers appear in

120

M. Iwaoka et al. / Journal of Molecular Structure (Theochem) 586 (2002) 111±124

the regions of b1, B, aR, PII, and aL. The relative stabilities of these conformers are very close to each other (DE # 2.2 kcal/mol). Gould et al. [21] previously found two stable conformers (corresponding to B and a by the de®nition of this paper) for Ac-Gly-NHMe and ®ve conformers (corresponding to B, b1, C7eq, C7ax, and aL) for Ac-Ala-NHMe by applying the PCM method. The PCM potential surfaces obtained here for For-GlyNH2 (1) and For-Ala-NH2 (2) suggested the presence of other stable conformers, such as PII, C5, and b2 for glycine and C5, aR, and PII for alanine. The discrepancy may arise from differences in the calculation method and in the molecular structure of the amino acid models. 4.3. Isodensity polarizable continuum model potentials It must be more meaningful to analyze solvent effects on the I-PCM potentials because the I-PCM method is obviously more reliable than the PCM method [26]. The potential surfaces obtained by methods IV±VI were compared with those obtained by method II. For the case of the glycine model (1), the extended C5 conformer was found to be a global energy minimum both in vacuo (method II) and in solvents (methods IV±VI). However, the pro®les of the potential surfaces in solvents (Figs. 3 and 4) were largely different from that in vacuo (Fig. 2). The most signi®cant change was in the a region, which became a distinct local energy minimum in solvents. This feature is in common with the PCM potential of 1. Another obvious change in the I-PCM potentials was that the energetic barrier at f ˆ 08; between the C7eq and C7ax regions, increases with a dielectric constant of the medium, rendering the conformational space narrower in polar solvents than that in vacuo. The values of %area listed in Table 4 quantitatively show this feature. The observed trend was completely opposite to that observed for the PCM potential. 
However, considering more accurate treatment of a solute molecule in the I-PCM method [26], it was suggested that the conformational space of 1 in polar solvents is more restricted than that in vacuo. The potential surface in ether (Fig. 3) changed remarkably from that in vacuo (Fig. 2), whereas that

in water (Fig. 4) did not differ much from that in methanol. The decrease of the solvent effects with the solvent polarity is another feature of the solvent effects. For the case of the alanine model (2), similar trends of solvent effects were observed as those observed for 1. The b1, b2, aR, and aL regions became distinct energy minima on the I-PCM potentials of 2 in polar solvents (see Figs. 7 and 8). The result is consistent with the experimental observation that Ac-AlaNHMe has conformational preferences to adopt a and b structures in water [23]. According to the values of %area in Table 6, the conformational space of 2 may be more restricted in polar solvents than that in vacuo. This trend can be easily seen in Figs. 6±8. Although there were apparent discrepancies between the PCM and I-PCM potentials of 2, they showed the common trends that the a and b regions are relatively stabilized in solvents and that the solvent effects decrease significantly with the dielectric constant. 4.4. Effect of long-range interactions Conformations of amino acid residues in proteins are governed by both conformational preferences of the individual amino acids and long-range inter-amino acid interactions. Therefore, comparison of the f ± c potential surfaces obtained for single amino acid models with the Ramachandran plots [31] extracted from PDB would be valuable for elucidating the roles of inter-amino acid interactions on the structure and the folding pathways of proteins. Ramachandran plots of glycine (Fig. 5) and alanine (Fig. 9) resulted in apparently better agreement with the f ± c potential surfaces obtained in water than those in vacuo. This feature is quantitatively demon strated for alanine in Table 6 by the values of delta E; which represents an average of strain energies for alanine residues in folded proteins due to interamino acid interactions: The I-PCM potentials of 2 (methods IV±VI) showed smaller values of delta E than those in vacuo (methods I and II). 
However, for the glycine model (1), the values of the mean ΔE of the I-PCM potentials (methods IV–VI) were larger than those in vacuo (methods I and II). This is contrary to the apparent similarity between the I-PCM potentials (Figs. 3 and 4) and the Ramachandran plots (Fig. 5). The contradiction is due to unexpected


Fig. 10. Density of the Ramachandran plots (%obs/%area) as a function of the relative energy (ΔE) for glycine.

significant stabilization of the C5 conformer (ΔE > 2 kcal/mol) in the I-PCM potentials of 1. The observed consonance between the profiles of the potential surfaces calculated for single amino acid models in solvents and the pattern of the Ramachandran plots obtained for the corresponding amino acid residues in folded proteins suggests that the conformational propensities of single amino acids may play predominant roles in determining the folded structures of proteins. In order to assess the relative importance of long-range inter-amino acid interactions, the density of the Ramachandran plots (%obs/%area) with respect to the relative energy (ΔE) on the calculated φ–ψ potential surfaces was analyzed. The resulting ΔE vs. ln(%obs/%area) plots for For-Gly-NH2 (1) and For-Ala-NH2 (2) are shown in Figs. 10 and 11, respectively. Almost linear correlations were obtained for the PCM potentials of 1 and 2 (method III), indicating that the density of the Ramachandran plots decreases exponentially with the relative conformational

energy. This, in turn, implies a random (Boltzmann) distribution of the conformations of glycine and alanine residues in folded proteins. Therefore, based on the PCM potentials, long-range interactions may statistically cancel out. The temperatures estimated from the slopes of the correlations were 480 and 400 K for glycine and alanine, respectively. However, for the more reliable I-PCM potentials of 1 and 2, linear correlations between ΔE and ln(%obs/%area) were observed only in the high energy ranges. Conformations of glycine residues in proteins showed a statistically random distribution for relative energies larger than 5 kcal/mol, while the density of the Ramachandran plots was nearly constant in the lower energy range. This observation suggests that the conformations of glycine residues in proteins are controlled to a significant extent by long-range interactions in the low energy range (ΔE < 5 kcal/mol). About half of the glycine residues (51%) were assigned to this class based on the I-PCM potential in water (method VI).
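The conversion from a slope to a temperature can be sketched as follows. For a Boltzmann distribution, ln(%obs/%area) = c − ΔE/(RT), so a straight-line fit of ln(%obs/%area) against ΔE has slope −1/(RT). The snippet below is a minimal illustration with synthetic data, assuming an ordinary least-squares fit and R = 1.987204 × 10⁻³ kcal mol⁻¹ K⁻¹; the function name and the data are hypothetical and do not reproduce the points in Figs. 10 and 11.

```python
R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def boltzmann_temperature(delta_e, log_density):
    """Fit ln(density) = c + slope * delta_e and return T = -1 / (R * slope).

    delta_e:      relative energies in kcal/mol
    log_density:  ln(%obs/%area) values at those energies
    """
    n = len(delta_e)
    mean_x = sum(delta_e) / n
    mean_y = sum(log_density) / n
    sxx = sum((x - mean_x) ** 2 for x in delta_e)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(delta_e, log_density))
    slope = sxy / sxx  # negative for a Boltzmann-like decay
    return -1.0 / (R_KCAL * slope)


# Synthetic check: points generated for an assumed 480 K are recovered.
energies = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
log_dens = [1.0 - e / (R_KCAL * 480.0) for e in energies]
fitted_t = boltzmann_temperature(energies, log_dens)
```

Under these assumptions, temperatures of 480 and 400 K would correspond to slopes of roughly −1.05 and −1.26 per kcal/mol, respectively.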


Fig. 11. Density of the Ramachandran plots (%obs/%area) as a function of the relative energy (ΔE) for alanine.

It is also notable that the three I-PCM potentials of 1 (methods IV–VI) showed approximately the same correlations, indicating that the profiles of the potential surfaces are very similar to each other. Similar correlations were obtained for the I-PCM potentials of 2 (Fig. 11). A random distribution of the Ramachandran plots for alanine residues in folded proteins was indicated only in the high energy range (ΔE > 3 kcal/mol), while the density of the Ramachandran plots was roughly constant in the lower energy range. About 40% of alanine residues were assigned to the low energy class (ΔE < 3 kcal/mol). The peak at ΔE = 3–4 kcal/mol may arise from the strong α-helical tendency of alanine [36–38]: the relative energy (ΔE) of the α region (φ = −60° and ψ = −45°), where a strong cluster was observed in Fig. 9, was +3.2 kcal/mol on the I-PCM potential in water (Fig. 8). Based on the I-PCM potentials of 1 and 2, long-range inter-amino acid interactions would be important only in the low energy range (ΔE < 5 kcal/mol

for glycine and ΔE < 3 kcal/mol for alanine). Conformations of the amino acid residues in these ranges would be controlled by both the single amino acid conformational potentials and the long-range inter-amino acid interactions. In contrast, conformations of the amino acid residues in the higher energy ranges must be determined predominantly by the single amino acid potentials. This implies that the strength of the long-range interactions, such as hydrogen bonds, would be less than about 5 kcal/mol per amino acid residue on average. It is also interesting to note that the potentials of 1 and 2 in vacuo (method II) exhibit correlations similar to those of the I-PCM potentials in Figs. 10 and 11. This suggested that the main-chain conformational energies of proteins are not much different in the gas phase from those in water.

4.5. Implications for protein folding

Since Levinthal [39] pointed out that an unfolded

polypeptide needs an unrealistically long time to find the folded protein structure by random conformational search, protein folding pathways have been thought to be hierarchical [40–42]. Many kinetic experiments indeed suggested the presence of compact intermediates with native-like secondary structures at the very beginning of protein folding [43–46]. Theoretically, the rapid processes are explained by the dual effects of entropic loss due to compactness of the intermediates and enthalpic gain due to the formation of inter-amino acid interactions on the energy landscape [47], which is believed to be significantly biased toward the native fold. However, details of the early folding events are not yet well elucidated. The ab initio φ–ψ potential surfaces of 1 and 2 obtained in this work suggest the following as to the early events of protein folding: the conformational space of polypeptides in water must be significantly restricted to the folded α and β secondary structures by the conformational propensities of individual amino acid residues. In other words, polypeptides would have intrinsic conformational preferences to form the secondary structures without the aid of long-range interactions. If this is the case, it must be an easy task for polypeptide molecules to find the folded secondary structures by random conformational search. Indeed, our preliminary Monte Carlo simulations of polypeptides using simple potentials consisting of the obtained I-PCM potentials, electrostatic interactions, and van der Waals repulsion forces have reproduced helical and β-sheet-like structures as the stable conformations [48]. Moreover, the similarity of the potential surfaces obtained in ether and in water suggested that the main-chain conformational preferences of the amino acids in the core of globular proteins or in membrane proteins do not change greatly from those at the surface of proteins or in the unfolded states in water.
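How a random conformational search becomes biased toward a low-energy region can be illustrated with a Metropolis Monte Carlo walk on a φ–ψ surface. The sketch below is not the simulation of Ref. [48]: it treats a single residue, uses a hypothetical single-minimum energy function centred at φ = −60°, ψ = −45° in place of the I-PCM potential, omits electrostatic and van der Waals terms, and assumes a temperature of 300 K and a maximum move of 30°.

```python
import math
import random

RT = 1.987204e-3 * 300.0  # kcal/mol at an assumed temperature of 300 K


def toy_energy(phi, psi):
    """Hypothetical surface (kcal/mol) with one minimum at (-60, -45) degrees."""
    return (2.0 * (1.0 - math.cos(math.radians(phi + 60.0)))
            + 2.0 * (1.0 - math.cos(math.radians(psi + 45.0))))


def wrap(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def metropolis(energy, n_steps, max_move=30.0, seed=0):
    """Metropolis random walk in (phi, psi); returns the visited states."""
    rng = random.Random(seed)
    phi, psi = -180.0, -180.0  # start far from the minimum
    e = energy(phi, psi)
    samples = []
    for _ in range(n_steps):
        new_phi = wrap(phi + rng.uniform(-max_move, max_move))
        new_psi = wrap(psi + rng.uniform(-max_move, max_move))
        new_e = energy(new_phi, new_psi)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if new_e <= e or rng.random() < math.exp(-(new_e - e) / RT):
            phi, psi, e = new_phi, new_psi, new_e
        samples.append((phi, psi))
    return samples


def fraction_near(samples, phi0, psi0, radius=60.0):
    """Fraction of states within +/-radius degrees of (phi0, psi0) on both axes."""
    hits = sum(1 for phi, psi in samples
               if abs(wrap(phi - phi0)) <= radius
               and abs(wrap(psi - psi0)) <= radius)
    return hits / len(samples)


walk = metropolis(toy_energy, 20000)
near_minimum = fraction_near(walk, -60.0, -45.0)
```

On this toy surface most of the visited states fall near the minimum even though the moves themselves are random, which is the sense in which a restricted conformational space makes the search easy.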
Using the ab initio φ–ψ potentials of 1, 2, and other amino acid models, development of a novel protein force field for prediction of protein folding pathways is under way in our laboratory. The obtained φ–ψ potentials will also be useful for refining current protein force fields.

5. Conclusions

The solvent effects observed commonly in the PCM

and I-PCM potentials of 1 and 2 suggested (1) that the α region of 1 and the α and β regions of 2 are distinct local energy minima in polar solvents and (2) that the conformational preferences change only slightly in the solvents (ε = 4.4–78.4) but are significantly different from those in vacuo. The former trend is in accord with the previous observations by Shang and Head-Gordon [19]. Thus, an intrinsic folding nature of polypeptides is strongly suggested. The obtained I-PCM potentials further suggested that the conformational space of the amino acid residues is more restricted in water than that in vacuo. Comparison of the obtained PCM and I-PCM potentials with the Ramachandran plots extracted from the protein databank (PDB_SELECT) revealed that glycine and alanine residues in folded proteins adopt random statistical structures to a significant extent. According to the PCM potentials, conformations of the amino acid residues were found to obey an approximately random distribution, while they seemed to adopt a random distribution only in the high-energy ranges based on the I-PCM potentials. The discrepancy between the PCM and I-PCM potentials is ascribed to the simpler treatment of the solute molecule in the PCM method. Finally, some limitations of the obtained potentials are noted. First, solvent molecules are treated as continuous dielectrics in the PCM and I-PCM methods. This is likely too simple to capture the accurate conformational preferences of a solute molecule in the solvent. However, as long as a nonempirical potential surface is demanded, explicit consideration of hundreds of solvent molecules in ab initio calculations is currently impossible. Second, the resolution of the φ–ψ potential surfaces obtained here (15°) might be a little too coarse. Some shallow energy minima may have been overlooked in the present work, and some shallow energy minima found in the present work may disappear in more extensive calculations.
Third, the effect of electron correlation was not sufficiently included in the calculation methods applied here, although the significance of electron correlation was previously pointed out [49]. Fourth, the molecular structure of the amino acid models (i.e. For-Aaa-NH2 vs. Ac-Aaa-NHMe) may affect the conformational preference slightly, as suggested by Gould et al. [21]. Further studies are necessary to clarify these points.


Acknowledgements

We thank Professor H.A. Scheraga for valuable comments on this work. We also thank the Computer Center, Institute for Molecular Science, Okazaki National Research Institute for the use of IBM SP2 and SGI SGI2800 (project code, ep5). This work was supported by Grant-in-Aid for Scientific Research on Priority Areas (C) 'Genomic Information Science' from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

References

[1] S.S. Zimmerman, M.S. Pottle, G. Némethy, H.A. Scheraga, Macromolecules 10 (1977) 1.
[2] K.T. O'Neil, W.F. DeGrado, Science 250 (1990) 646.
[3] M.J. Rooman, J.-P.A. Kocher, S.J. Wodak, Biochemistry 31 (1992) 10226.
[4] M. Blaber, X. Zhang, B.W. Matthews, Science 260 (1993) 1637.
[5] C.L. Brooks III, D.A. Case, Chem. Rev. 93 (1993) 2487.
[6] A.G. Street, S.L. Mayo, Proc. Natl Acad. Sci. USA 96 (1999) 9074.
[7] P. Koehl, M. Levitt, Proc. Natl Acad. Sci. USA 96 (1999) 12524.
[8] I.R. Gould, P.A. Kollman, J. Phys. Chem. 96 (1992) 9255.
[9] S. Gronert, R.A.J. O'Hair, J. Am. Chem. Soc. 117 (1995) 2071.
[10] Y.K. Kang, J. Phys. Chem. 100 (1996) 11589.
[11] A. Perczel, O. Farkas, I.G. Csizmadia, J. Am. Chem. Soc. 118 (1996) 7809.
[12] A. Perczel, O. Farkas, A.G. Császár, I.G. Csizmadia, Can. J. Chem. 75 (1997) 1120.
[13] F.J. Ramírez, I. Tuñón, E. Silla, J. Phys. Chem. B 102 (1998) 6290.
[14] M. Pettitt, M. Karplus, J. Phys. Chem. 92 (1988) 3994.
[15] A.G. Anderson, J. Hermans, Proteins: Struct. Funct. Genet. 3 (1988) 262.
[16] R. Elber, J. Chem. Phys. 93 (1990) 4312.
[17] T.J. Marrone, M.K. Gilson, J.A. McCammon, J. Phys. Chem. 100 (1996) 1439.
[18] T. Head-Gordon, M. Head-Gordon, M.J. Frisch, C.L. Brooks III, J.A. Pople, J. Am. Chem. Soc. 113 (1991) 5989.
[19] H.S. Shang, T. Head-Gordon, J. Am. Chem. Soc. 116 (1994) 1528.
[20] L. Onsager, J. Am. Chem. Soc. 58 (1936) 1486.
[21] I.R. Gould, W.D. Cornell, I.H. Hillier, J. Am. Chem. Soc. 116 (1994) 9250.
[22] S. Miertus, E. Scrocco, J. Tomasi, Chem. Phys. 55 (1981) 117.
[23] V. Madison, K.D. Kopple, J. Am. Chem. Soc. 102 (1980) 4855.
[24] X. Lopez, A. Dejaegere, M. Karplus, J. Am. Chem. Soc. 121 (1999) 5548.
[25] K.-H. Cho, K.T. No, H.A. Scheraga, J. Phys. Chem. A 104 (2000) 6505.
[26] J.B. Foresman, T.A. Keith, K.B. Wiberg, J. Snoonian, M.J. Frisch, J. Phys. Chem. 100 (1996) 16098.
[27] J.L. Radkiewicz, S. Clarke, K.N. Houk, J. Am. Chem. Soc. 118 (1996) 9148.
[28] J.S. Jhon, Y.K. Kang, J. Phys. Chem. A 103 (1999) 5436.
[29] M.K. Shukla, S.K. Mishra, A. Kumar, P.C. Mishra, J. Comput. Chem. 21 (2000) 826.
[30] P. Bandyopadhyay, M.S. Gordon, J. Chem. Phys. 113 (2000) 1104.
[31] G.N. Ramachandran, V. Sasisekharan, Adv. Protein Chem. 23 (1968) 283.
[32] U. Hobohm, M. Scharf, R. Schneider, C. Sander, Protein Sci. 1 (1992) 409.
[33] U. Hobohm, C. Sander, Protein Sci. 3 (1994) 522.
[34] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, V.G. Zakrzewski, J.A. Montgomery, R.E. Stratmann, J.C. Burant, S. Dapprich, J.M. Millam, A.D. Daniels, K.N. Kudin, M.C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G.A. Petersson, P.Y. Ayala, Q. Cui, K. Morokuma, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J. Cioslowski, J.V. Ortiz, A.G. Baboul, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, C. Gonzalez, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W. Wong, J.L. Andres, C. Gonzalez, M. Head-Gordon, E.S. Replogle, J.A. Pople, Gaussian 98, Gaussian, Inc., Pittsburgh, PA, 1998.
[35] J.A. Grant, R.L. Williams, H.A. Scheraga, Biopolymers 30 (1990) 929.
[36] S. Marqusee, V.H. Robbins, R.L. Baldwin, Proc. Natl Acad. Sci. USA 86 (1989) 5286.
[37] E.J. Spek, C.A. Olson, Z. Shi, N.R. Kallenbach, J. Am. Chem. Soc. 121 (1999) 5571.
[38] C.A. Rohl, W. Fiori, R.L. Baldwin, Proc. Natl Acad. Sci. USA 96 (1999) 3682.
[39] C. Levinthal, J. Chem. Phys. 65 (1968) 44.
[40] M. Karplus, D.L. Weaver, Biopolymers 18 (1979) 1421.
[41] S.C. Harrison, R. Durbin, Proc. Natl Acad. Sci. USA 82 (1985) 4028.
[42] P.S. Kim, R.L. Baldwin, Annu. Rev. Biochem. 59 (1990) 631.
[43] K. Kuwajima, Proteins: Struct. Funct. Genet. 6 (1989) 87.
[44] H. Christensen, R.H. Pain, Eur. Biophys. J. 19 (1991) 221.
[45] P.X. Qi, T.R. Sosnick, S.W. Englander, Nat. Struct. Biol. 5 (1998) 882.
[46] S. Akiyama, S. Takahashi, K. Ishimori, I. Morishima, Nat. Struct. Biol. 7 (2000) 514.
[47] J.D. Bryngelson, J.N. Onuchic, N.D. Socci, P.G. Wolynes, Proteins: Struct. Funct. Genet. 21 (1995) 167.
[48] M. Iwaoka, S. Tomoda, unpublished results.
[49] R.F. Frey, J. Coffin, S.Q. Newton, M. Ramek, V.K.W. Cheng, F.A. Momany, L. Schäfer, J. Am. Chem. Soc. 114 (1992) 5369.