J. Mol. Biol. (1992) 224, 725-732
Structure-derived Hydrophobic Potential
Hydrophobic Potential Derived from X-ray Structures of Globular Proteins is able to Identify Native Folds
Georg Casari and Manfred J. Sippl
Institute for General Biology, Biochemistry and Biophysics, Department of Biochemistry, University of Salzburg, Hellbrunnerstraße 34, A-5020 Salzburg, Austria
(Received 19 August 1991; accepted 18 December 1991)
We present a model for the hydrophobic interaction in globular proteins that is based entirely on an analysis of known X-ray structures. This structure-derived hydrophobic force is identified as the strongest among the non-covalent interactions that stabilize native folds. The functional form of the hydrophobic interaction is found to be linear, corresponding to a constant force along the observable distance range (5 to 70 Å). The parameters of the hydrophobic amino acid pair potentials yield a structure-derived hydrophobicity scale that correlates strongly with scales derived by a variety of complementary approaches. We demonstrate that the structure-derived hydrophobic interaction alone is able to distinguish a substantial number of native conformations from a large pool of misfolded structures.
Keywords: hydrophobic interaction; hydrophobic potential; potentials of mean force; protein modelling; protein folding
1. Introduction
Proteins adopt a stable and well-defined native fold that is essential for their proper function. Although the forces that govern the formation of a stable structure are not fully understood, several general principles have been identified to be of central importance (e.g. see Novotny et al., 1984, 1988). These are optimized hydrogen bonding, side-chain packing in the interior of proteins, as well as electrostatic forces and hydrophobic interactions. In the case of hydrophobic forces, the general rule is that hydrophobic residues are closely packed in the interior of globular proteins whereas hydrophilic side-chains are mostly exposed to the solvent water. Several hydrophobicity scales have been derived from experimental data (Kidera et al., 1985). However, the functional form of hydrophobic interactions is still unknown and the question of the importance of hydrophobic interactions relative to electrostatic and other forces is still unsettled.
The database of native structures of globular proteins that have been solved by X-ray analysis contains, in some coded form, the information on all interactions that stabilize the native structure. It is possible to translate this information from the structures into energy potentials of mean force (PMF†). These potentials describe the interactions between individual amino acid residues (Sippl, 1990). For a particular protein they yield a self-consistent force field reflecting the various physical interactions that stabilize the native fold. The force field constructed from these PMF values can be used to identify native protein conformations among large numbers of misfolded structures (Hendlich et al., 1990). This ability to identify the proper native fold is an indispensable condition for any force field aiming at the prediction of protein structures from amino acid sequences.
An individual PMF represents a combination of several basic forces contributing to the stability of native folds with respect to the average environment of globular proteins. Among these are steric, electrostatic, covalent and hydrophobic interactions. In this study we characterize the strongest interaction between amino acid side-chains and obtain its functional form from an analysis of the set of PMF values derived from the database. We find that the most dominant contribution to protein stability corresponds to the hydrophobic force.
† Abbreviations used: PMF, potential of mean force; SDRH, Structure-Derived Relative Hydrophobicity.
0022-2836/92/070725-08 $03.00/0 © 1992 Academic Press Limited
Furthermore, we show that the hydrophobic potential derived from our analysis is able to identify the native conformation of many proteins among a large number of misfolded structures.

2. Interaction Potentials of Mean Force for Amino Acid Side-chains
X-ray structures of 88 individual chains of globular proteins of low sequence homology were selected for this analysis (Table 1). To obtain an effective and consistent representation of amino acid residues only the positions of β-carbon (Cβ) atoms were considered. We choose the Cβ atom to describe an amino acid, since the Cβ position is most sensitive to the location of the side-chain and it is uniquely defined by the backbone of the protein. In the case of glycine, which lacks a Cβ atom, a virtual position is constructed from the N, Cα and carbonyl C atoms. An individual PMF is defined by:
$$\Delta E^{ab}(r) = -kT \ln\left[\frac{f^{ab}(r)}{f(r)}\right],$$
where a and b each correspond to one of the 20 amino acids, $f^{ab}(r)$ is the relative frequency of pair ab at distance r and $f(r)$ is the relative frequency of any pair regardless of the nature of the amino acid at distance r (Sippl, 1990). The average environment as described by the distribution $f(r)$ serves as a reference state.
In this study, we focus on the nature of non-covalent forces. Since the interactions of amino acid residues linked by short peptide segments are largely dominated by covalent forces we calculated the PMF values for residue pairs with sequential separation of >20 peptide bonds. This results in 400 individual potentials $\Delta E^{ab}(r)$ for the possible combinations of the 20 standard amino acids. A selection of some typical examples is shown in Figure 1. Arg-Lys and Lys-Asp have low energies at large separations as compared to short distances r. This reflects a general tendency of repulsion. Ile-Phe and Cys-Cys pairs exhibit the opposite behaviour since they have low energy at short distances and high energy at large spatial separations. These residue pairs show a general tendency of attraction. The remaining examples, Asn-Ala and Asp-Trp, show no obvious tendency. The repulsive PMF values correspond to pairs of hydrophilic amino acids and the attractive PMF values to hydrophobic pairs. The PMFs with $\Delta E^{ab}(r) \approx 0$ belong to amino acid pairs of weak interactions. The fluctuations in the last two examples indicate the magnitude of statistical fluctuations that is inherent in our analysis due to the sparse database. The pair Asp-Trp is less frequent than Asn-Ala. Therefore, the fluctuations for this pair are more pronounced.
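The definition of the potential of mean force given above can be illustrated numerically. The following is a minimal sketch, assuming that pair distances have already been collected into a histogram `pair_counts[a, b, r]`; the function name, the binning and the small pseudocount used to avoid taking the logarithm of zero are assumptions made for illustration and are not the sparse-data treatment of the original work.

```python
import numpy as np

def pair_pmf(pair_counts, kT=1.0, pseudocount=1e-9):
    """Return dE[a, b, r] = -kT * ln(f_ab(r) / f(r)).

    pair_counts[a, b, r] is the number of residue pairs (a, b) observed in
    distance bin r (hypothetical input; bin width and range are assumptions).
    """
    counts = np.asarray(pair_counts, dtype=float)
    # f_ab(r): relative frequency of distance bin r for the specific pair ab
    f_ab = counts / np.maximum(counts.sum(axis=2, keepdims=True), pseudocount)
    # f(r): relative frequency of distance bin r over all pairs (reference state)
    total = counts.sum(axis=(0, 1))
    f_ref = total / max(total.sum(), pseudocount)
    # The pseudocount only guards against log(0); it is not part of the definition
    return -kT * np.log((f_ab + pseudocount) / (f_ref + pseudocount))
```

A pair that is over-represented at short distances relative to the reference state obtains a negative (attractive) energy in the short-distance bins and a positive energy at large separations, as described for Ile-Phe and Cys-Cys.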
Table 1
List of protein structures used in this work
155C    1ABP    1ACX    1BDS    1BP2    1CC5    1CPV
1CRN    1CSE-I  1CTF    1CTS    1EST    1ETU    1FDX
1GCR    1GP1-A  1HIP    1HOE    1HVP-A  1LH4    1LZ1
1MBD    1MCP-H  1MCP-L  1OVO-A  1PAZ    1PCY    1PFK-A
1PHH    1PP2-R  1PPT    1PYP    1RHD    1RN3    1SN3
1TGN    1TIM-A  2ABX-A  2ALP    2AZA-A  2B5C    2C2C
2CAB    2CCY-A  2CDV    2CNA    2CYP    2EBX    2GN5
2GRS    2HHB-A  2HHB-B  2LHB    2LZM    2MDH-A  2MHR
2MT2    2PAB-A  2PKA-A  2PKA-B  2SBT    2SGA    2SOD-O
2SSI    2TAA-A  351C    3ADK    3FXC    3FXN    3GAP-B
3GPD-G  3ICB    3PGK    3RP2-A  3WGA-A  4ADH    4APE
4CYT-R  4DFR-A  4LDH    4PTI    4TLN    5API-A  5API-B
5CPA    5RXN    8CAT-A  9PAP
The structures were taken from the Brookhaven Structural Database (Bernstein et al., 1977) and are listed by the names of the database entries. Selected chains are suffixed with a hyphen and letter.

[Figure 1: plots of $\Delta E^{ab}(r)$ against $r$ (Å); panel titles include Arg-Lys, Lys-Asp, Asn-Ala and Asp-Trp.]
Figure 1. Several examples of amino acid pair potentials of mean force (PMF) $\Delta E^{ab}(r)$. The repulsive potentials belong to hydrophilic pairs and the attractive potentials to hydrophobic pairs of residues. Weakly interacting pairs illustrate the fluctuations due to the sparse data set.

3. The Dominating Contribution to the Potentials of Mean Force
The potentials $\Delta E^{ab}(r)$ can be represented as a linear combination of a set of normalized basis functions $\psi_i(r)$:
$$\Delta E^{ab}(r) = \sum_{i=1} c_i^{ab}\,\psi_i(r).$$
The functions $\psi_i(r)$ describe the functional form of the basic forces and the parameters $c_i^{ab}$ reflect their contributions to $\Delta E^{ab}(r)$. In order to obtain a potential describing the most common feature of the set of all PMF values we seek a potential function $\psi_m(r)$ that maximizes the contribution to all values of $\Delta E^{ab}(r)$. Hence, we have to extremize:
$$\sum_{a=1,b=1}^{20,20}\left(\int_{r=0}^{\infty}\Delta E^{ab}(r)\,\psi_m(r)\,dr\right)^{2} \rightarrow \text{maximum},$$
subject to the normalization constraint:
$$\int_{r=0}^{\infty}\left(\psi_m(r)\right)^{2}dr = 1.$$
Introducing the Lagrange multiplier $\lambda$ we obtain the eigenvalue problem:
$$\sum_{a=1,b=1}^{20,20}\Delta E^{ab}(r)\int_{r'=0}^{\infty}\Delta E^{ab}(r')\,\psi_m(r')\,dr' = \lambda_m\,\psi_m(r),$$
where $\psi_m(r)$ is the eigenfunction corresponding to the largest eigenvalue $\lambda_m$. Since $\Delta E^{ab}(r)$ is obtained from sampling in intervals for numerical analysis, we have to use the discrete counterpart:
$$E^{T}E\,\psi_m = \lambda_m\,\psi_m,$$
where $E$ is a matrix with the discrete elements $e_{ab,r} = \Delta E^{ab}(r)$. The potential $\psi_m(r)$ obtained from this analysis is shown in Figure 2. In the regions that are well defined by many available data points (5 to 70 Å; 1 Å = 0.1 nm) the shape of $\psi_m(r)$ is nearly linear. Therefore, the force $d\psi_m(r)/dr$ corresponding to this interaction is constant in the observable distance range.

[Figure 2: plot of $\psi_m(r)$ against $r$ (Å).]
Figure 2. The potential function $\psi_m(r)$ of maximum contribution to the set of mean force potentials $\Delta E^{ab}(r)$. $\psi_m(r)$ is identified as the hydrophobic interaction potential $\Delta\phi(r)$. A linear fit for the region 4 to 70 Å is superimposed.

Table 2
The complete set of amino acid pair hydrophobicities derived from structures of globular proteins
[20 × 20 matrix of $h^{ab}$ values; the individual entries are not legible in this copy.]
Rows correspond to N-terminal, columns to C-terminal amino acid of each pair. The same scaling is used as for SDRH values in Table 3.

Once $\psi_m(r)$ is known, the coefficient $c_m^{ab}$ can be determined from $\Delta E^{ab}(r)$ as the scalar product:
$$c_m^{ab} = \int_{r=0}^{\infty}\Delta E^{ab}(r)\,\psi_m(r)\,dr.$$
This analysis results in a set of coefficients $c_m^{ab}$ that measure the contribution of $\psi_m(r)$ to the corresponding $\Delta E^{ab}(r)$. The complete set of these coefficients is listed in Table 2. The $c_m^{ab}$ values are large and positive for pairs of hydrophobic amino acids (e.g. Cys-Cys 5.0, Trp-Phe 3.7, Ile-Phe 3.5) and they are large and negative for pairs of hydrophilic amino acids (e.g. Arg-Lys -3.0, Asp-Gln -3.0, Lys-Lys -3.0). Positive values correspond to an attractive potential, negative values to repulsive interactions. This is similar to the behaviour of hydrophobic interactions. Therefore, $\psi_m(r)$ might serve as a model for the hydrophobic interaction where the coefficients $c_m^{ab}$ express the strength of that interaction for the pair ab. We define $\psi_m(r) = \Delta\phi(r)$, and we call $\Delta\phi(r)$ the structure-derived hydrophobic potential. We note that $\Delta\phi(r)$ appears to be a linear function in the observable distance range, so that $\Delta\phi(r) \approx (r - r_0)$. Similarly we define $c_m^{ab} = h^{ab}$ as the combined hydrophobicity of the amino acid pair ab. We emphasize that the resulting hydrophobic pair potential $\Delta\Phi^{ab}(r) = h^{ab}\Delta\phi(r)$ is defined with respect to the same reference state as the potentials $\Delta E^{ab}(r)$.
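The eigenvalue analysis of this section can be sketched numerically for the discrete case. The example below is an illustration only: it assumes that the 400 potentials have been sampled on a common distance grid and stacked as the rows of a matrix `E`, and it uses NumPy's symmetric eigensolver. The function name and the sign convention (the eigenvector is oriented so that it rises with r) are assumptions, not details taken from the original analysis.

```python
import numpy as np

def dominant_potential(E):
    """Dominant eigenfunction of a set of sampled potentials.

    E has shape (n_pairs, n_bins); row ab holds dE_ab(r) on a distance grid.
    Returns (largest eigenvalue, normalized psi_m, coefficients c_ab).
    """
    E = np.asarray(E, dtype=float)
    # Discrete eigenvalue problem: (E^T E) psi = lambda psi
    eigvals, eigvecs = np.linalg.eigh(E.T @ E)
    psi = eigvecs[:, -1]          # eigh returns eigenvalues in ascending order
    # The sign of an eigenvector is arbitrary; orient psi so that it rises with r
    if psi[-1] < psi[0]:
        psi = -psi
    coeffs = E @ psi              # c_ab = sum over r of dE_ab(r) * psi(r)
    return eigvals[-1], psi, coeffs
```

With this orientation, a pair whose potential is low at short distances and high at large distances receives a positive coefficient, in line with the sign convention described for the hydrophobic pairs.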
4. Structure-derived Hydrophobicities and Hydrophobic Interaction Potential
Our analysis yields a set of 400 amino acid pair hydrophobicities $h^{ab}$ (see Table 2). To determine whether these pair hydrophobicities exhibit a regularity, the $h^{ab}$ values were investigated further. Using a principal component analysis (Lebart et al., 1984) the set of the $h^{ab}$ can be decomposed to a combination of single residue hydrophobicities $h^{a}$. $h^{ab}$ is then approximated by:
$$h^{ab} \approx h^{a} + h^{b}.$$
The hydrophobicities $h^{a}$ of single residues establish a hydrophobicity scale, which we call Structure-Derived Relative Hydrophobicity (SDRH). In Tables 3 and 4 we compare the SDRH scale with other commonly used hydrophobicity scales. The correlation of SDRH with these scales is high (≈ 74%) indicating that SDRH is indeed closely related to the hydrophobicity of amino acids obtained by several complementary approaches. Using $h^{ab} \approx h^{a} + h^{b}$ and the linear model for $\Delta\phi(r) \approx (r - r_0)$, we obtain an expression for the hydrophobic interaction energy $\Delta\Phi^{ab}(r)$ of pair ab:
$$\Delta\Phi^{ab}(r) = h^{ab}\,\Delta\phi(r) \approx (h^{a} + h^{b})(r - r_0),$$
where r is the distance between the Cβ atoms of amino acids a and b and $r_0$ = 23 Å is a constant, which is determined from $\Delta\phi(r) = 0$ (see Fig. 2).
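The additive approximation and the resulting linear pair energy can be sketched as follows. Note that this example swaps in a plain least-squares fit of the additive model in place of the principal component analysis used in the paper; the function names are hypothetical, and the default r0 = 23 Å is the value quoted in the text.

```python
import numpy as np

def single_residue_scale(H):
    """Least-squares fit of H[a, b] ~ h[a] + h[b] for a square matrix H.

    Minimizing sum_ab (H[a, b] - h[a] - h[b])**2 gives the closed form
    h[c] = (row_sum[c] + col_sum[c]) / (2 n) - total / (2 n**2).
    This is a stand-in for the principal component analysis of the paper.
    """
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    row_plus_col = H.sum(axis=1) + H.sum(axis=0)
    return row_plus_col / (2 * n) - H.sum() / (2 * n**2)

def hydrophobic_pair_energy(h_a, h_b, r, r0=23.0):
    """Linear hydrophobic pair energy (h_a + h_b) * (r - r0), r in Angstrom."""
    return (h_a + h_b) * (r - r0)
```

Because row and column sums are both used, the fit does not require the pair matrix to be symmetric. For two hydrophobic residues (positive hydrophobicities) the energy is negative at separations below r0 and grows linearly with distance.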
Table 3
Comparison of the structure-derived hydrophobicity values with other commonly used scales
Residue   SDH    SDRH   K&D    OMH    S&E    Eng
Cys       2.3    1.9    1.0    0.2    0.3   -0.7
Trp       1.9    1.6   -0.1    0.5    0.8   -0.7
Ile       1.8    1.4    1.7    1.3    1.4   -0.9
Phe       1.3    1.0    1.1    2.0    1.2   -1.1
Val       1.0    0.7    1.6    0.9    1.1   -0.8
Tyr       0.9    0.5   -0.3    1.7    0.3   -0.1
Leu       0.9    0.5    1.5    1.3    1.1   -0.9
Met       0.8    0.5    0.8    1.0    0.7   -1.0
His       0.8    0.4   -0.9   -0.7   -0.4    0.3
Ala       0.5    0.2    0.8   -0.4    0.6   -0.6
Gly       0.3   -0.1    0.0   -0.7    0.5   -0.5
Thr      -0.1   -0.4   -0.1   -0.3   -0.1   -0.6
Asn      -0.1   -0.5   -1.0   -0.9   -0.8    0.7
Arg      -0.3   -0.7   -1.4   -0.6   -2.6    2.3
Pro      -0.6   -1.0   -0.4   -0.5    0.1   -0.2
Ser      -0.4   -0.7   -0.1   -0.6   -0.2   -0.4
Gln      -0.7   -1.1   -1.0   -0.9   -0.9    0.6
Glu      -0.9   -1.3   -1.0   -1.3   -0.7    1.4
Asp      -1.0   -1.4   -1.0   -1.3   -0.9    1.6
Lys      -1.2   -1.6   -1.2   -0.7   -1.5    1.6
All scales are normalized to a standard deviation of 1.0 and (with the exception of SDH) to an average value of 0.0. SDH, structure-derived hydrophobicity; SDRH, structure-derived relative hydrophobicity (centred around 0.0); K&D (Kyte & Doolittle, 1982); OMH, S&E (Sweet & Eisenberg, 1983); Eng (Engelman et al., 1986).
Table 4
Correlation table of several hydrophobicity scales
        SDH     K&D     OMH     S&E     Eng
SDH     1.0
K&D     0.74    1.0
OMH     0.74    0.75    1.0
S&E     0.73    0.88    0.72    1.0
Eng    -0.73   -0.85   -0.69   -0.94    1.0
The scales are labelled as in Table 3. SDH and SDRH yield identical correlations.
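The entries of the correlation table can be checked against the tabulated scales. The sketch below recomputes the correlation between SDRH and K&D with NumPy; the two value lists are the rounded entries as read from Table 3 (residue order Cys, Trp, Ile, Phe, Val, Tyr, Leu, Met, His, Ala, Gly, Thr, Asn, Arg, Pro, Ser, Gln, Glu, Asp, Lys), so small deviations from the published coefficients are to be expected.

```python
import numpy as np

# Rounded values as read from Table 3 (an assumption of this sketch is that
# the one-decimal table entries are sufficient to reproduce the correlation).
sdrh = np.array([1.9, 1.6, 1.4, 1.0, 0.7, 0.5, 0.5, 0.5, 0.4, 0.2,
                 -0.1, -0.4, -0.5, -0.7, -1.0, -0.7, -1.1, -1.3, -1.4, -1.6])
kd = np.array([1.0, -0.1, 1.7, 1.1, 1.6, -0.3, 1.5, 0.8, -0.9, 0.8,
               0.0, -0.1, -1.0, -1.4, -0.4, -0.1, -1.0, -1.0, -1.0, -1.2])

# Pearson correlation coefficient between the two scales
r_sdrh_kd = np.corrcoef(sdrh, kd)[0, 1]
print(f"correlation SDRH vs K&D: {r_sdrh_kd:.2f}")
```

Since SDH differs from SDRH only by a constant offset, the same coefficient is obtained for SDH, which is why a single row suffices for both scales.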
We note that the hydrophobic potential $\Delta\Phi^{ab}(r)$ and its approximation are extracted from the set of $\Delta E^{ab}(r)$ values compiled from a database of known protein structures. These potentials display the dominating contribution to the PMF values and can therefore be regarded as the strongest non-covalent force that stabilizes native protein folds.
5. The Average Hydrophobicity of Amino Acid Residues
The deduced hydrophobicities $h^{a}$ of SDRH are based entirely on differences between different amino acid types. With this approach we are not able to identify the average hydrophobicity common to all amino acids. The structure-derived hydrophobic pair potential $\Delta\Phi^{ab}(r)$ is the product of two terms ($\Delta\Phi^{ab}(r) = h^{ab}\Delta\phi(r)$). $h^{ab}$ is an amino acid pair parameter that is independent of the separation r. $\Delta\phi(r)$, on the other hand, is exclusively a function of r. Hence, the average hydrophobic interaction follows the same functional form $\Delta\phi(r)$, but the proper coefficient of the average hydrophobic potential $\Delta\bar{\Phi}(r)$ is unknown. In analogy to $h^{ab} \approx h^{a} + h^{b}$ the average pair hydrophobicity can be written as the sum of two average single residue hydrophobicities, $\bar{h} + \bar{h} = 2\bar{h}$. For the average hydrophobic interaction energy $\Delta\bar{\Phi}(r)$ we obtain the expression:
$$\Delta\bar{\Phi}(r) = 2\bar{h}\,(r - r_0).$$
We emphasize that the average hydrophobic potential is sequence independent. The complete model for the hydrophobic interactions of amino acid residues in globular proteins is the sum of the relative hydrophobic pair potential and the average hydrophobic potential:
$$\Delta\Phi'^{ab}(r) = \Delta\Phi^{ab}(r) + \Delta\bar{\Phi}(r) = \left[(h^{a} + \bar{h}) + (h^{b} + \bar{h})\right](r - r_0).$$

Table 5
Identification of the native fold using the hydrophobic potential
Protein   Length (residues)   $\Delta\bar{\Phi}$   $\Delta\Phi'^{ab}$   $\Delta\Phi^{ab}$
1CRN      46                  >100                 82                   32
5PTI      58                  70                   13                   >100
1CTF      68                  47                   1                    >100
1PCY      99                  23                   1                    >100
1ACX      107                 >100                 1                    >100
1CCR      111                 >100                 1                    >100
1ALC      122                 >100                 >100                 >100
1BP2      123                 >100                 3                    89
1RN3      124                 >100                 1                    >100
6LYZ      129                 >100                 3                    2
4FXN      138                 31                   1                    >100
2HHB-A    141                 >100                 1                    >100
2SNS      141                 >100                 1                    >100
2HHB-B    146                 >100                 2                    59
3DFR      162                 >100                 1                    >100
2LZM      164                 >100                 11                   >100
2ALP      171                 1                    1                    36
2FB4-L    216                 >100                 >100                 1
The list shows the position at which the native fold is ranked. The criterion was calculated in several models: $\Delta\bar{\Phi}$, average hydrophobic energy potential (hydrophobicity is $\bar{h}$); $\Delta\Phi'^{ab}$, absolute hydrophobic energy potential with optimum $\bar{h}$ (hydrophobicity from SDH, $h^{a} + \bar{h}$); $\Delta\Phi^{ab}$, relative hydrophobic energy potential (hydrophobicity from SDRH, $h^{a}$).
6. Determination of the Average Hydrophobicity
An important point is whether the hydrophobic potential $\Delta\Phi'^{ab}(r)$ can be used to identify native folds. To investigate this point we used the approach described by Hendlich et al. (1990). A particular protein sequence is mounted on all conformations in the database. If the total hydrophobic energy:
$$\sum_{i,\;j>i+20}\Delta\Phi'^{ab}(r_{ij}),$$
where a and b are the amino acids at sequence positions i and j, of the native fold is lowest, then the potential is able to distinguish the native fold from the alternatives. In order to apply $\Delta\Phi'^{ab}(r)$ it is necessary to estimate the average hydrophobicity $\bar{h}$. The most suitable estimate is that value of $\bar{h}$ which is most successful in the identification procedure. Hence, we applied the identification procedure using a series of values for $\bar{h}$ and monitored the success of identification. This variation affects only the average hydrophobic part $\Delta\bar{\Phi}(r) = 2\bar{h}\,\Delta\phi(r)$ of the hydrophobic pair potential. A limited set of 18 proteins from the database was used to assess the value of $\bar{h}$. Table 5 summarizes the identification success for $\Delta\Phi'^{ab}(r)$ with the optimum value for the average hydrophobicity obtained as $\bar{h}$ = 0.36. For comparison, the results for $\Delta\bar{\Phi}(r)$ and $\Delta\Phi^{ab}(r)$ are listed. Table 3 shows the absolute hydrophobicity scale SDH obtained from $h^{a} + \bar{h}$.
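The identification procedure described in this section can be sketched in a few lines. This is a simplified illustration under stated assumptions: every conformation is given as an array of Cβ coordinates with the same length as the sequence, the linear form of the potential is used, and pairs separated by more than 20 peptide bonds are selected with `min_sep=21`. The function names are hypothetical; the defaults 0.36 and 23 Å are the values quoted in the text.

```python
import numpy as np

def hydrophobic_energy(h, coords, h_mean=0.36, r0=23.0, min_sep=21):
    """Total hydrophobic energy of a sequence mounted on one conformation.

    h      : relative single-residue hydrophobicities along the sequence
    coords : (n, 3) array of C-beta positions in Angstrom (assumed input)
    """
    h_abs = np.asarray(h, dtype=float) + h_mean      # absolute hydrophobicity
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(h_abs), k=min_sep)    # pairs with j - i >= min_sep
    return float(np.sum((h_abs[i] + h_abs[j]) * (dist[i, j] - r0)))

def rank_of_native(h, native, alternatives, **kwargs):
    """Position of the native fold when all conformations are sorted by energy."""
    e_native = hydrophobic_energy(h, native, **kwargs)
    energies = [hydrophobic_energy(h, c, **kwargs) for c in alternatives]
    return 1 + sum(e < e_native for e in energies)
```

A rank of 1 means that no alternative conformation has a lower hydrophobic energy than the native fold. Scanning `h_mean` over a series of values and recording the ranks would mirror the way the average hydrophobicity is estimated in the text.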
7. Identification of Native Protein Folds by the Hydrophobic Potential
Using the absolute hydrophobicities SDH of Table 3 and the corresponding hydrophobic pair potential, we performed an extensive test of the ability of $\Delta\Phi'^{ab}(r)$ to identify the native folds of 68 protein chains. The results are summarized in Table 6. Of the 68 proteins, 45 (66%) are correctly identified using the absolute hydrophobic potential. In ten more cases (14.7%) the native fold is among the five conformations of lowest energy. In these cases the native structure belongs to the conformations that are strongly favoured by the hydrophobic energy as compared to several thousand alternatives. The 23 protein chains that are not identified correctly belong to one or several of the following classes. (1) Chains that are subunits of larger complexes (2FB4-H, 2FB4-L, 3WGA-A, 2HHB-B, 1HMQ-A, 2PKA-A, 1PP2-R). (2) Proteins containing prosthetic groups or metal ions (cytochromes 2CCY-A, 155C, 3C2C, 2CDV, 451C, haemoglobin 2HHB-B, or metal proteins 3ICB, 5RXN, 1FDX, 1HMQ-A). (3) Protein chains smaller than 80 residues (7 cases). These proteins have only a small number of long-range interactions (i.e. pairs with sequential separation >20 peptide bonds). In these cases the calculations take into account only a small fraction of the total conformational energy. (4) Phospholipases (1BP2, 1PP2-R), which have hydrophobic surface regions required in their interaction with membranes. (5) Lysozymes (2LZM and 6LYZ) consisting of two domains. This may account for the difficulty in identifying these proteins. If the average hydrophobicity $\bar{h}$ is lowered by one-third, the native structures of lysozymes are correctly identified.
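The success rates quoted above follow directly from the ranking positions. The short sketch below tallies them; the list holds the 68 positions as read from Table 6 in table order, with entries reported as ">100" encoded by the placeholder value 101 (an assumption of this sketch).

```python
# Ranking positions of the native fold as read from Table 6;
# 101 is a placeholder for entries reported as ">100".
positions = (
    [1] * 13
    + [101, 1, 1, 1, 1, 101, 1, 101, 1, 1, 101, 1, 11]
    + [1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 2, 1, 3, 2, 2, 1, 6, 3]
    + [1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 41, 101, 101, 1, 1, 11, 13, 2, 101, 2, 82]
)

n_total = len(positions)
n_first = sum(p == 1 for p in positions)          # native fold ranked first
n_top5 = sum(2 <= p <= 5 for p in positions)      # ranked second to fifth

print(f"{n_first}/{n_total} = {100 * n_first / n_total:.1f}% identified")
print(f"{n_top5}/{n_total} = {100 * n_top5 / n_total:.1f}% within the top five")
```

The remaining 13 chains are ranked below fifth place, which together with the ten near misses gives the 23 chains discussed in the classes above.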
Table 6
Test of the hydrophobic potential $\Delta\Phi'^{ab}(r)$ for its ability to identify the native fold of a protein
Length (residues)   Protein    Position   Size of pool of conformations
374                 8ADH       1          190
333                 4MDH-A     1          314
329                 6LDH       1          331
325                 3APR-E*    1          352
323                 2APP*      1          365
316                 3TLN       1          415
307                 5CPA       1          488
293                 2CYP       1          616
293                 1RHD       1          616
275                 1SBT       1          815
256                 2CAB       1          1044
247                 1TIM-A     1          1162
237                 2CNA       1          1303
229                 2FB4-H*    >100       1425
229                 1EST       1          1425
222                 1TPP*      1          1545
218                 2ACT*      1          1618
218                 1HNE-E*    1          1618
216                 2FB4-L*    >100       1657
215                 3RP2-A     1          1678
184                 2STV*      >100       2330
174                 1GCR       1          2551
171                 2ALP       1          2621
170                 3WGA-A     >100       2646
169                 2SGA       1          2672
164                 2LZM       11         2803
162                 3DFR       1          2858
153                 2LH4       1          3112
153                 1MBO*      1          3112
151                 2SOD-O     1          3173
149                 2LHB       1          3236
146                 2PKA-B     1          3337
146                 2HHB-B     2          3337
141                 2SNS*      1          3514
141                 2HHB-A     1          3514
138                 4FXN       1          3626
136                 1ECD*      1          3703
129                 6LYZ       3          3981
129                 2AZA-A     1          3981
127                 2CCY-A     2          4066
124                 1RN3       1          4196
123                 1BP2       3          4241
122                 1PP2-R     2          4287
121                 155C       2          4334
114                 2PAB-A     1          4664
113                 1HMQ-A*    6          4713
112                 3C2C       3          4763
111                 1CCR       1          4814
108                 1CPV       1          4968
107                 2CDV       5          5022
107                 1ACX       1          5022
106                 4FD1*      1          5077
99                  1PCY       1          5463
98                  3FXC       1          5520
85                  2B5C       1          6270
85                  1HIP       1          6270
83                  1CC5       1          6391
82                  451C       41         6453
78                  2PKA-A     >100       6702
75                  3ICB       >100       6892
68                  1CTF       1          7341
65                  1SN3       1          7537
62                  1NXB*      11         7736
58                  5PTI       13         8005
56                  2OVO       2          8142
54                  5RXN       >100       8282
54                  1FDX       2          8282
46                  1CRN       82         8851
The protein structures and their abbreviations are taken from the Brookhaven Structural Database. Proteins marked with an asterisk were not included in the data set (Table 1).

8. Discussion
From our analysis, we obtain an energy potential for hydrophobic interactions of amino acid residues in globular proteins. This potential is derived from potentials of mean force, which are compiled directly from a database of known protein structures. In deriving the hydrophobic interaction, no additional assumptions entered the calculation and the functional form of the potential followed directly from the analysis. The hydrophobic potential corresponds to the eigenfunction associated with the largest eigenvalue and therefore corresponds to the dominating stabilizing force. As such it is not necessarily a basic physical force. The identification of this eigenfunction as a hydrophobic potential is based on two observations: (1) the potential is attractive for hydrophobic amino acid pairs and repulsive for hydrophilic pairs; and (2) the single residue hydrophobicities (SDH) obtained in our study correlate with hydrophobicity scales derived by a variety of complementary experimental and theoretical approaches. The most striking deviation of the structure-derived single residue hydrophobicities from other scales is found for cysteine. Most cysteine residues in the database form disulphide bonds (i.e. they are present as cystines) and are not comparable to free cysteine residues in solution. In fact cystine pairs are probably strongly hydrophobic since the formation of disulphide bonds increases the hydrophobicity of the reactants (examples can be found in Weast, 1974). On the other hand, the covalent link has a strong influence in our analysis so that the apparent hydrophobicity of cysteine derived in our study is likely to be overestimated. If all pairs involving cysteine are excluded from the analysis, we obtain virtually the same single residue hydrophobicities for the remaining 19 amino acids. Some of the pair hydrophobicities, such as Ile-Phe and Phe-Ile, are non-symmetrical so that, surprisingly, the hydrophobicity of such pairs depends on chain direction. At present we cannot estimate the significance of this result since it may be due to the problem of sparse data.
The usefulness of the structure-derived hydrophobicity is demonstrated by the ability to identify a substantial number of native protein folds among a large number of alternatives. The hydrophobic effect is only one component of several stabilizing interactions. Therefore, it is surprising that the hydrophobic potential alone is able to identify a large number of native conformations without including the additional forces (e.g. electrostatic interactions). However, for several proteins additional forces may be very important. In such cases, the hydrophobic potential is likely to be an insufficient approximation to the molecular force field. Examples are 1HMQ or 2LZM. Using the complete potentials of mean force the native structures of these proteins are identified correctly (Hendlich et al., 1990).
The result that hydrophobic interactions are the dominating contribution to the stability of native folds is in agreement with experimental findings. Thermodynamic measurements demonstrated a large change in heat capacity $\Delta C_p$ for reversible denaturation of proteins pointing to a dominating role of the hydrophobic effect for protein stability (Privalov, 1990). Similar conclusions have been reached by lattice simulations employing a simple hydrophobic potential based on nearest-neighbour interactions (Dill, 1990). In spite of the very complex nature of hydrophobic interactions involving the protein and solvent molecules, in the observable distance range the hydrophobic potential can be well approximated by a linear dependency on the separation. Hence, the associated force is extremely long range. A similar potential of constant force is known for the surface tension of liquids (Moore, 1972), which suggests that the formation of liquid drops and the hydrophobic effect in globular proteins are intrinsically the same physical phenomenon. Recently, Sharp et al. (1991) reported similar conclusions.
The hydrophobic interaction derived in this study is a potential of mean force. This implies that it describes the protein solvent system in equilibrium. Therefore, the potential cannot be applied in the study of very fast processes like molecular dynamics simulations. However, this may be a remarkable advantage in the study of slow processes in which the system has sufficient time to relax to equilibrium in each step. Protein folding is such a slow process, which occurs on a time scale of seconds or minutes, where the gross structural changes can be regarded as a sequence of equilibrium states. Thus, mean force potentials like the structure-derived hydrophobic potential should be a valuable tool in studying protein folding.
In addition, the structure-derived hydrophobic potential might be useful for estimating the hydrophobic effect of protein complexes, since the assembly and stability of oligomers strongly depends on hydrophobic interactions.

We are indebted to all X-ray crystallographers who submitted co-ordinates to the Brookhaven Protein Data Bank. We thank Anton Beyer and Peter Lackner for their support as well as Rita Taba and Uttam Surana for corrections on the manuscript. Part of this work was done during the Ph.D. thesis of G.C. at the Institute of Molecular Pathology, Vienna. G.C. is a postdoctoral fellow (Otto-Loewi-Stipendium Nr. KO953-MED). This work was supported by the Fonds zur Förderung der wissenschaftlichen Forschung (Austria) under project number P&%361-CHE.

Edited by R. Huber

References
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rogers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535-542.
Creighton, T. E. (1984). Proteins. W. H. Freeman & Co., New York.
Dill, K. A. (1990). Dominant forces in protein folding. Biochemistry, 29, 7133-7155.
Engelman, D. M., Steitz, T. A. & Goldman, A. (1986). Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem. 15, 321-353.
Hendlich, M., Lackner, P., Weitckus, S., Floeckner, H., Froschauer, R., Gottsbacher, K., Casari, G. & Sippl, M. J. (1990). Identification of native protein folds amongst a large number of incorrect models. J. Mol. Biol. 216, 167-180.
Kidera, A., Konishi, Y., Oka, M., Ooi, T. & Scheraga, H. A. (1985). Statistical analysis of the physical properties of the 20 naturally occurring amino acids. J. Protein Chem. 4, 23-55.
Kyte, J. & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.
Lebart, L., Morineau, A. & Warwick, K. M. (1984). Multivariate Descriptive Statistical Analysis. John Wiley & Sons, New York.
Moore, W. J. (1972). Physical Chemistry. Longman, London.
Novotny, J., Bruccoleri, R. & Karplus, M. (1984). An analysis of incorrectly folded protein models: implications for structure predictions. J. Mol. Biol. 177, 787-818.
Novotny, J., Rashin, A. A. & Bruccoleri, R. E. (1988). Criteria that discriminate between native proteins and incorrectly folded models. Proteins, 4, 19-30.
Privalov, P. L. (1990). Cold denaturation of proteins. Crit. Rev. Biochem. Mol. Biol. 25, 281-305.
Sharp, K. A., Nicholls, A., Fine, R. F. & Honig, B. (1991). Reconciling the magnitude of the microscopic and macroscopic hydrophobic effects. Science, 252, 106-109.
Sippl, M. J. (1990). Calculation of conformational ensembles from potentials of mean force: an approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 213, 859-883.
Sweet, R. M. & Eisenberg, D. (1983). Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. J. Mol. Biol. 171, 479-488.
Weast, R. C. (1974). Editor of Handbook of Chemistry and Physics, 55th edit., CRC Press, Cleveland.