PHYSICA D, ELSEVIER
Physica D 107 (1997) 330-337
Local signals in the energy landscape of collapsed helical proteins
Jeffery G. Saven *, Peter G. Wolynes
School of Chemical Sciences, University of Illinois, Urbana, IL 61801, USA
Abstract
Using simple statistical mechanical models, we present a theory based on a free energy function that combines classical models of the helix-coil transition in polymers with approximate treatments of the effects of excluded volume, confinement, and helical packing. The theory includes "local signals" for folding in the form of stabilization energies for three types of local structure. Randomness in the energies of the conformational states is also considered. The thermal behavior of the model is presented for realistic estimates of the signal energies. Estimates of the relative contributions of local signals and specific tertiary interactions to the folding stability gap are obtained.
Keywords: Protein folding; Helix-coil transition; Local signals; Minimal frustration; Energy landscape
1. Introduction
In the energy landscape picture of protein folding, the statistical characterization of the protein's conformational energy surface focuses on two competing issues. First, for most collapsed conformations, the connectivity of the protein chain prevents all the interactions between residues from being favorable at once. Because of this frustration, the energy surface of a random heteropolymer has many local minima corresponding to partially misfolded states. The roughness of the energy landscape leads to a glass transition at low temperature, wherein the protein can become trapped in these low-energy conformations [1]. At low temperatures, the folding dynamics slow, and the search through landscape minima becomes difficult. Second, rapidly foldable proteins have an additional property that guides them through this rugged landscape: their sequences satisfy "the principle of minimal frustration" [2]. This principle states that the energy of the protein decreases more than is expected for a random sequence as the conformations it assumes become progressively more like the ground state. These energetic biases guide the protein toward the native conformation.

A natural concern in the thermodynamics of protein folding is the division of the guiding interactions between those that are local and those that are nonlocal in the polymer's sequence. An understanding of the balance between these two ways of achieving minimal frustration is needed for a complete description of how a protein's sequence determines its structure. Local ordering of residues near in sequence is necessary to form familiar secondary structures such as α-helices and β-turns. On the other hand, nonlocal interactions that involve tertiary contacts are more difficult to understand, since they bring together parts of the polymer that are very distant in sequence. In harmony with the principle of minimal frustration, each type of native structure may have corresponding signals or stabilization energies.

We summarize in this report a model that incorporates local conformational signals and treats the entropies of both the one-dimensional local ordering and the collapsed heteropolymer correctly. The theory allows us to probe the individual effects of various signals. Our calculation describes proteins whose secondary structure comprises α-helices and conformationally well-defined turns between helices. Although this model includes tertiary interactions in a statistical fashion, it highlights the degree to which local conformational signals may guide folding. The model accounts for the fact that the entropy of the protein is greatly reduced relative to the free chain by the effects of excluded volume, confinement, and helical ordering. Owing to the low entropy density, once collapse has occurred small energies may suffice to guide the folding of the protein. We apply the model to some representative cases and discuss some of the theory's implications for the overall folding process, including specific tertiary effects. (The reader is referred to the work of Saven and Wolynes [3] for a more complete description of the model.)

* Corresponding author. Tel.: 1 217 333 8710; fax: 1 217 244 0789; e-mail: [email protected].
0167-2789/97/$17.00 Copyright © 1997 Elsevier Science B.V. All rights reserved. PII S0167-2789(97)00102-4
J.G. Saven, P.G. Wolynes / Physica D 107 (1997) 330-337
2. Statistical thermodynamical model of a collapsed, helical protein with local conformational signals

We begin by defining $S_{\mathrm{config}}$, which accounts for the number of configurational states for given numbers of native helical and nonhelical (coil or turn) residues and native interfaces. In the folded or native structure of a helical protein having $n$ residues, there are $h_{\mathrm{nat}}$ residues in α-helical conformations and $c_{\mathrm{nat}} = n - h_{\mathrm{nat}}$ residues in nonhelical or "coil" conformations, which can form conformationally defined turns or dangle as random coil at the end of the sequence. In addition, there are $j_{\mathrm{nat}}$ interfaces or junctions between helical and nonhelical runs in the native structure. We take interfaces that are native to be in a well-defined conformational state: $z_i = 1$ for native interfaces. The number of residues that have the same conformation that they have in the native structure is $n_p$. This quantity measures how similar a particular conformation is to the native structure. The number of residues that are in their native helical conformations is $h_h$, the number of nonhelical residues in their native conformations is $c_c = n_p - h_h$, and the number of native interfaces is $j_r$. As the protein folds and approaches the native state, the limiting values of $n_p$, $h_h$, $c_c$, and $j_r$ are $n$, $h_{\mathrm{nat}}$, $c_{\mathrm{nat}}$, and $j_{\mathrm{nat}}$, respectively. The total numbers of helical residues and interfaces are $n_h$ and $n_i$, respectively. For given values of $n_p$, $n_h$, $n_i$, $h_h$, and $j_r$, the number of arrangements $\Omega_{\mathrm{config}}$ depends on the specific locations of the native helices in the protein's sequence. We simplify by dividing the sequence into native helical and native nonhelical regions, according to the conformations of individual residues in the native structure. We consider each type of region as a homopolymer, though each region is not necessarily contiguous in the protein's sequence. We treat each region using the combinatorial methods of the conventional helix-coil problem for homopolymers.
In so doing, we introduce one additional order parameter, the number of interfaces in the native helical region, $j_h$. An estimate for the total number of states for given values of the six ordering parameters $n_p$, $n_h$, $n_i$, $h_h$, $j_h$, and $j_r$ is

$$\Omega_{\mathrm{config}} \approx \binom{j_{\mathrm{nat}}}{j_r}\binom{h_h - j_r}{j_h/2}\binom{h_{\mathrm{nat}} - h_h}{j_h/2}\binom{n_h - h_h}{(n_i - j_h - j_r)/2}\binom{c_{\mathrm{nat}} - j_r - n_h + h_h}{(n_i - j_h - j_r)/2} \times \binom{c_{\mathrm{nat}} - n_h + h_h}{n_p - h_h}\, z_c^{\,h_{\mathrm{nat}} - h_h}\,(z_c - 1)^{\,c_{\mathrm{nat}} + 2h_h - n_h - n_p}. \tag{1}$$

Here $z_c$ is the number of nonhelical conformations per residue ($z_c \approx 10$). The configurational entropy is just $S_{\mathrm{config}} = k_B \ln \Omega_{\mathrm{config}}$.
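As a numerical illustration, the count of Eq. (1) can be evaluated directly. The sketch below assumes that Eq. (1) reads as a product of six binomial coefficients multiplied by $z_c^{h_{nat}-h_h}$ and $(z_c-1)^{c_{nat}+2h_h-n_h-n_p}$; that reading, the function name `omega_config`, and the example value $j_{nat} = 10$ are assumptions made for illustration and are not taken from the paper.

```python
from math import comb

def omega_config(n, h_nat, j_nat, n_p, n_h, n_i, h_h, j_h, j_r, z_c=10):
    """Number of configurations under one assumed reading of Eq. (1).

    Returns 0 when the order parameters are mutually inconsistent.
    """
    c_nat = n - h_nat                        # residues that are nonhelical in the native structure
    if j_h % 2 or (n_i - j_h - j_r) % 2:
        raise ValueError("j_h and n_i - j_h - j_r must both be even")
    half_h = j_h // 2                        # j_h / 2
    half_c = (n_i - j_h - j_r) // 2          # (n_i - j_h - j_r) / 2
    # (top, bottom) of each binomial coefficient, in the order assumed for Eq. (1)
    pairs = [
        (j_nat, j_r),
        (h_h - j_r, half_h),
        (h_nat - h_h, half_h),
        (n_h - h_h, half_c),
        (c_nat - j_r - n_h + h_h, half_c),
        (c_nat - n_h + h_h, n_p - h_h),
    ]
    exp_zc = h_nat - h_h                          # exponent of z_c
    exp_zc_minus_1 = c_nat + 2 * h_h - n_h - n_p  # exponent of (z_c - 1)
    if exp_zc < 0 or exp_zc_minus_1 < 0 or any(t < 0 or b < 0 for t, b in pairs):
        return 0
    count = 1
    for top, bottom in pairs:
        count *= comb(top, bottom)           # comb gives 0 when bottom > top
    return count * z_c ** exp_zc * (z_c - 1) ** exp_zc_minus_1

# Fully native state of a hypothetical 100-residue protein (h_nat = 80, j_nat = 10)
native = omega_config(100, 80, 10, 100, 80, 10, 80, 0, 10)
# State with no helical and no native residues
unfolded = omega_config(100, 80, 10, 0, 0, 0, 0, 0, 0)
```

Under this reading the fully native state has exactly one configuration (`native == 1`), while the state with no helix and no native residues has $z_c^{80}(z_c-1)^{20}$ configurations, which is the behavior one would expect of such a count.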
We define $\Delta S_{\mathrm{dir}}$ as the entropy due to the directional degrees of freedom of the helices in the confined heteropolymer:

$$\Delta S_{\mathrm{dir}}/k_B = \tfrac{1}{2}\,\alpha\,(n_i - j_r)\ln z_i, \tag{2}$$
where $z_i$ is the number of degrees of freedom for interfaces. Here we take $z_i = z_c$. Each correct interface has only one native conformation. After hydrophobic collapse, the heteropolymer chain is confined to a small volume. For simplicity, we take the shape of this volume to be a sphere of radius $D$. For a 100-residue protein, $D \approx 18$ Å. The entropy decrease due to confinement is

$\Delta S_{\mathrm{confine}}/k_B$ = [expression garbled in the source; the legible fragments show a term containing $6$ and $D$ and a logarithmic term], (3)

where

$$R^2 = n'_c\, r_c^2 + \tfrac{1}{2}\, n_i R_H^2 (1 - \mu) \tag{4}$$
and $r_c$ is the persistence length of a nonhelical residue, and $n'_c$ is the number of "free" coil segments not in their native conformations ($n'_c = n - n_h - (n_p - h_h)$). $R_H$ is the helix persistence length. As the protein folds and becomes more compact, adjacent native helices in the sequence are likely to become antiparallel to one another, thus diminishing the portion of the mean square displacement due to the helical segments; $\mu$ accounts for this. We use the following form:

$$\mu = \frac{a}{2}\left(\frac{j_r}{n_i}\right)^{2}\left(\frac{n_p - h_h}{c_{\mathrm{nat}}}\right)^{y}\left(1 - \frac{2}{n_i}\right), \tag{5}$$
where $a$ is a constant that is exactly 2 in the extreme limit of independent, exponentially distributed helix lengths (as they would be for a Poisson process). In real systems, $a$ is likely to be less than 2, and, for definiteness, we take $a = 1$. For the excluded volume loss of the entropy, we take

$$\Delta S_{\mathrm{steric}}/k_B = -\bigl(n - (1 - \alpha)(n_h + n_p - h_h)\bigr)\left(\frac{1 - \eta}{\eta}\ln(1 - \eta) + 1\right) + (1 - \alpha)\bigl[\,\cdots\,\bigr], \tag{6}$$

where the bracketed part of the second term is garbled in the source; its legible fragments involve $n_h$, $\eta_h$, and $Q$, with $Q = \eta_h(1 - n_i/2n_h)$ and $\eta_h = n_h\eta/n$. The volume fraction occupied by the polymer is $\eta$; here we take $\eta \approx 1$. In the aligned polymer ($\alpha = 0$), only nonnative coil segments contribute to the nonhelical part of the excluded volume entropy. The second term includes the length dependence of the helices. For the average total energy of the protein, we use the following general form:

$$\bar{E} = -2 n_h \epsilon_{hb} + 4 n_i \epsilon_{hb} - \epsilon_{NH} h_h - \epsilon_{NC}(n_p - h_h) - \epsilon_{NJ}\, j_r. \tag{7}$$

This includes the energy of nonspecific helix formation and additional stabilization due to the formation of native structure. The numbers of native helical residues, native coil residues, and native interfaces are $h_h$, $n_p - h_h$, and $j_r$, respectively. The effective hydrogen bond energy in a helix is $\epsilon_{hb}$. The native signals, $\epsilon_{NH}$, $\epsilon_{NC}$, and $\epsilon_{NJ}$, are the additional stabilization energies (per segment) of forming native helical residues, native coil (turn) segments, and native helix caps (interfaces), respectively.
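The average energy of Eq. (7) is simple enough to evaluate by hand, and a short sketch makes the bookkeeping explicit. It assumes Eq. (7) reads $\bar{E} = -2n_h\epsilon_{hb} + 4n_i\epsilon_{hb} - \epsilon_{NH}h_h - \epsilon_{NC}(n_p - h_h) - \epsilon_{NJ}j_r$, works in units of $\epsilon_{hb}$, and uses the signal energies quoted in the figure captions as defaults. The function name `mean_energy` and the example value of 10 native interfaces are hypothetical.

```python
def mean_energy(n_h, n_i, h_h, n_p, j_r,
                eps_hb=1.0, eps_NH=0.0, eps_NC=2.6, eps_NJ=2.0):
    """Average energy under the assumed reading of Eq. (7), in units of eps_hb."""
    # Nonspecific helix formation: each helical residue is stabilized,
    # each helix-coil interface carries a penalty.
    helix_term = -2.0 * n_h * eps_hb + 4.0 * n_i * eps_hb
    # Native "local signal" stabilization: helical residues, coil/turn residues, helix caps.
    signal_term = -eps_NH * h_h - eps_NC * (n_p - h_h) - eps_NJ * j_r
    return helix_term + signal_term

# Fully native state of a 100-residue protein with 80 helical residues;
# the 10 native interfaces are an assumed value for illustration only.
E_native = mean_energy(n_h=80, n_i=10, h_h=80, n_p=100, j_r=10)
```

For these inputs the helix term contributes $-160 + 40 = -120\,\epsilon_{hb}$ and the native signals contribute $-52 - 20 = -72\,\epsilon_{hb}$, so the two kinds of stabilization are of comparable size.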
The canonical (thermal) energy is

$$E(T) = \bar{E} - \frac{\Delta E^2}{k_B T}. \tag{8}$$

Here $\Delta E^2$ denotes the variance of the energies of the conformational states. Similarly, the thermal entropy $S(T)$ is given by

$$S(T) = S_{\mathrm{config}} + \Delta S_{\mathrm{dir}} + \Delta S_{\mathrm{confine}} + \Delta S_{\mathrm{steric}} - \frac{\Delta E^2}{2 k_B T^2}. \tag{9}$$

And lastly, we specify the thermal (Helmholtz) free energy

$$F(T) = E(T) - T S(T). \tag{10}$$

The equilibrium values of the ordering parameters ($n_p$, $n_h$, $n_i$, $h_h$, $j_r$, and $j_h$) are those that minimize the free energy.
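The thermal quantities can be sketched numerically for a fixed set of ordering parameters. The code below assumes the forms $E(T) = \bar{E} - \Delta E^2/(k_BT)$, $S(T) = S_0 - \Delta E^2/(2k_BT^2)$, and $F = E - TS$, where $S_0$ stands for the sum of the configurational, directional, confinement, and steric entropies. The function names and all numerical inputs are hypothetical, and the minimization over ordering parameters described in the text is not performed here.

```python
from math import sqrt

def thermal_quantities(E_bar, S0, dE2, T, kB=1.0):
    """Thermal energy, entropy and free energy for fixed ordering parameters.

    E_bar : average energy; S0 : entropy without the roughness correction;
    dE2 : variance of the conformational energies (Delta E^2).
    """
    E = E_bar - dE2 / (kB * T)               # energy lowered by landscape roughness
    S = S0 - dE2 / (2.0 * kB * T ** 2)       # entropy reduced by the same roughness
    F = E - T * S                            # Helmholtz free energy
    return E, S, F

def glass_temperature(S0, dE2, kB=1.0):
    """Temperature at which the thermal entropy above vanishes."""
    return sqrt(dE2 / (2.0 * kB * S0))

# Illustrative inputs only (units with kB = 1); they are not values from the paper.
E, S, F = thermal_quantities(E_bar=-100.0, S0=50.0, dE2=36.0, T=1.5)
T_g = glass_temperature(S0=50.0, dE2=36.0)
```

Setting $S(T_g) = 0$ in the assumed entropy gives $k_BT_g = \sqrt{\Delta E^2 k_B/(2S_0)}$, which is what `glass_temperature` returns. In the full model $S_0$ itself changes with temperature through the ordering parameters, so this closed form is only a fixed-parameter estimate.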
3. Results and discussion

We choose a set of native stabilization energies roughly estimated from experiment. One possible mechanism for helix start/stop signals is the interaction of hydrogen bonding side chains with hydrogen bonds of the backbone at the end of the helix [4-6]. We estimate that $\epsilon_{NJ} = 2\epsilon_{hb}$ for compact proteins. Our estimate of the native coil stabilization energy is obtained from estimates of the populations of β-turns in model peptides as observed by Wright et al. [7]. We estimate a turn stabilization energy of approximately $\epsilon_{NC} = 2.6\,\epsilon_{hb} \approx 1.3$ kcal/mol, which may well be an overestimate. We assume that there is no additional native stabilization of helical residues, $\epsilon_{NH} = 0$. This choice is motivated by studies of peptides in which only alanine was found to favor helix formation, and the other amino acids (except for proline) were indifferent or slightly helix destabilizing [8].

Fig. 1. The free energy $F$ (see Eq. (10)) as a function of the temperature $k_BT/\epsilon_{hb}$ for the following values of the native helical, native coil, and native helix cap stabilization energies: $\epsilon_{NH} = 0$, $\epsilon_{NC} = 2.6\,\epsilon_{hb}$, and $\epsilon_{NJ} = 2\epsilon_{hb}$. Presented are the free energies for the liquid-crystalline $\alpha = 0$ (solid) and disordered $\alpha = 1$ (dashed) phases of the collapsed globule. Indicated on the plot is the first-order phase transition at $k_BT_{lc}/\epsilon_{hb} = 3.0$. For temperatures below $T_{lc}$, the liquid-crystalline phase (solid) is thermodynamically favored. Also indicated is the glass transition temperature at $k_BT_g/\epsilon_{hb} = 0.75$. As the temperature is lowered below $T_g$, the free energy remains unchanged, since the system is frozen in a few low-energy states.

We now consider the temperature dependence. There is a first-order phase transition at $T_{lc}$, where the helically aligned state ($\alpha = 0$) and the disordered state ($\alpha = 1$) are of equal free energy [9]: $F(\alpha = 0, T_{lc}) = F(\alpha = 1, T_{lc})$. The free energies for the two types of ordering are presented in Fig. 1. At high temperatures, the lower free energy state is the disordered $\alpha = 1$ state. The ordering temperature is $k_BT_{lc}/\epsilon_{hb} \approx 3$. Below this temperature, the ordered state ($\alpha = 0$) becomes thermodynamically favored. For temperatures below the glass transition temperature $k_BT_g/\epsilon_{hb} \approx 0.75$, the free energy remains unchanged, since the system would be frozen in only a few conformational states in a non-self-averaging way. In Fig. 2, we present the entropy as a function of the temperature. As we might expect, the entropy of the ordered state vanishes at a higher temperature than that of the unaligned state, since the $\alpha = 0$ state is always more ordered at any given temperature. In Fig. 3, we present the total number of residues that are in their native conformations, $n_p$, as a function of the temperature. Also plotted are the numbers of residues in their native helical ($h_h$) and native nonhelical ($c_c = n_p - h_h$) conformations. At high temperatures, the helices show no preferential alignment, and less than 20%
Fig. 2. The entropy $S(T)$ as a function of the temperature $k_BT/\epsilon_{hb}$ (solid) for the following values of the native helical, native coil, and native helix cap stabilization energies: $\epsilon_{NH} = 0$, $\epsilon_{NC} = 2.6\,\epsilon_{hb}$, and $\epsilon_{NJ} = 2\epsilon_{hb}$. Presented are the entropies for the liquid-crystalline $\alpha = 0$ (dashed) and disordered $\alpha = 1$ (dotted) collapsed globules. The jump in the entropy at the liquid crystal ordering temperature $k_BT_{lc}/\epsilon_{hb} = 3.0$ is proportional to the latent heat of the transition. The entropy vanishes at the glass transition temperature $k_BT_g/\epsilon_{hb} = 0.75$.
Fig. 3. The number of native residues as a function of the temperature $k_BT/\epsilon_{hb}$. Plotted are the total number of residues in their native conformations $n_p$ (solid), the number of residues in their native helical conformations $h_h$ (dashed), and the number of residues in their native turn conformations $c_c$ (dotted). The native stabilization energies for native helical, turn, and helix cap local conformations are $\epsilon_{NH} = 0$, $\epsilon_{NC} = 2.6\,\epsilon_{hb}$, and $\epsilon_{NJ} = 2\epsilon_{hb}$. As the temperature decreases, the total number of native residues (solid) changes by only three residues at the liquid-crystalline ordering temperature $k_BT/\epsilon_{hb} = 3.0$ but increases rapidly near room temperature, $k_BT/\epsilon_{hb} = 1.2$. No additional native ordering occurs at temperatures below the glass transition temperature $k_BT_g/\epsilon_{hb} = 0.75$. The maximum possible values for $n_p$, $h_h$, and $c_c$ are their values in the native structure, where $n_p = n = 100$, $h_h = h_{\mathrm{nat}} = 80$, and $c_c = c_{\mathrm{nat}} = 20$.
of the molecule is folded. As the temperature is decreased, the system undergoes a liquid-crystalline ordering transition at $T_{lc}$, where the helices become aligned; $k_BT_{lc}/\epsilon_{hb} = 3.0$. Surprisingly, this occurs for small values of the average helix length, which is only about two residues, and there are approximately five such helices. The number of native helical residues $h_h$ is roughly independent of $\alpha$, the degree of liquid-crystalline ordering, at $T_{lc}$. There is little native structure at the liquid-crystalline ordering transition. Less than 20% of the protein's residues are in their native conformations for $T > T_{lc}$. At $T_{lc}$, the number of native residues increases by only about three. Thus this liquid-crystalline ordering transition, in this model, does not correspond to the folding transition. As the temperature decreases, the number of native helical and nonhelical segments increases. The transition to increasing helical content is much broader than that seen for unconfined systems [9]. The values of the native ordering parameters reach their maxima at the glass transition temperature $T_g$. At this temperature, the number of conformational states thermally accessible to the system is of order 1. For the values of the stabilization energies chosen here, $k_BT_g/\epsilon_{hb} \approx 0.75$. Note the high degree of similarity to the native structure ($n_p/n = 0.91$) at $T_g$.

We compare these results for a system with local conformational signals to previous estimates of the energetic and entropic contributions to the folding process. Using models without local signals, Onuchic et al. [10] used the theory of the helix-coil transition in collapsed polymers to estimate that at 65% helicity, the Levinthal entropy $S_L$ is approximately $0.6\,k_B$ per residue; thus for a 100-residue protein, $S_L = 60\,k_B$, with

$$S_L = S(T) + \frac{\Delta E^2}{2 k_B T^2}. \tag{11}$$

Fig. 4. Within the protein folding funnel [10,15], the stability gap $\delta E_s$ is the energy gap between the folded state and the average energy of the molten globule states. Roughly, 39% is due to specific, local nativizing interactions, 33% is due to nonspecific helix formation, and 28% is due to specific, native tertiary interactions.
When the local native signal stabilization energies take on reasonable values ($\epsilon_{NH} = 0$, $\epsilon_{NC} = 1.3$ kcal/mol $= 2.2\,k_BT$, and $\epsilon_{NJ} = 2\epsilon_{hb}$), the Levinthal entropy at the value of $\epsilon_{hb}/k_BT$ that corresponds to 65% helicity is $S_L = 40\,k_B$. The native stabilization acts to reduce the relevant conformational entropy. At this degree of helicity, the local signal stabilization energies account for roughly $40\,k_BT$ or 24% of the molten globule's internal energy [8]. We can also calculate the respective contributions to the stability gap $\delta E_s$, which is the energy difference between the folded state energy and the mean energy of the molten globule ensemble [10] (see Fig. 4). Assuming that the folded state entropy is negligible, $\delta E_s$ is

$$\frac{\delta E_s}{T_f} = S_L + \frac{\Delta E^2}{2 k_B T_f^2}, \tag{12}$$

where $T_f$ is the folding temperature. Estimates of $\Delta E^2/(2(k_BT_f)^2)$, based upon folding kinetics and fluctuation measurements on molten globules, range from 11 to 18 [10] for proteins that are 106 and 126 residues in length. Our choice of $\Delta E^2$ yields a value of $\Delta E^2/(2(k_BT_f)^2) = 17$. Using the $S_L$ for systems without local signals and this choice of $\Delta E^2$, we find that for a 100-residue protein $\delta E_s = 78\,k_BT_f$. The stability gap that we obtain from our calculation involving local native signal stabilization is $\delta E_s = 58\,k_BT_f$, where $T_f$ is near room temperature. The energy gap is diminished since some native ordering has already occurred in the molten globule. Within this stability gap, $23\,k_BT_f$ (39%) is due to locally nativizing interactions and $19\,k_BT_f$ (33%) is due to nonspecific helix formation. The remaining $16\,k_BT_f$ (28%) arises from specific, native tertiary contacts (see Fig. 4). The relative contributions of the specific nonlocal and local nativizing energies are comparable, and the stability gap is not dominated solely by one type of interaction or the other.

Free energy functions of the type presented here are useful tools for studying the effect of physical parameters such as size, packing density, and degree of native ordering on protein folding. The calculation here may be readily extended to include other ordering processes relevant to the folding process, such as the entropy diminution due to the formation of native tertiary contacts [11-13] or generic hydrophobic/hydrophilic phase separation in the molten globule [14].
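The stability-gap arithmetic quoted in this section can be checked in a few lines. The sketch assumes Eq. (12) in the dimensionless form $\delta E_s/(k_BT_f) = S_L/k_B + \Delta E^2/(2(k_BT_f)^2)$ and uses only the rounded numbers quoted in the text; the function and variable names are hypothetical.

```python
def stability_gap(S_L, roughness):
    """delta E_s / (k_B T_f) under the assumed form of Eq. (12).

    S_L is the Levinthal entropy in units of k_B; roughness is
    Delta E^2 / (2 (k_B T_f)^2).
    """
    return S_L + roughness

def percent_shares(parts):
    """Percentage of the total carried by each named contribution."""
    total = sum(parts.values())
    return {name: 100.0 * value / total for name, value in parts.items()}

roughness = 17                                    # value quoted in the text
gap_no_signals = stability_gap(60, roughness)     # S_L = 60 k_B, no local signals
gap_with_signals = stability_gap(40, roughness)   # S_L = 40 k_B, with local signals

# Contributions to the gap with local signals, in units of k_B T_f (from the text)
contributions = {"local signals": 23, "nonspecific helix": 19, "tertiary contacts": 16}
shares = percent_shares(contributions)
```

With the rounded inputs the two gaps come out as 77 and 57 $k_BT_f$, one unit below the quoted 78 and 58, which presumably reflects rounding of the quoted $S_L$ and roughness values. The three contributions sum to 58 $k_BT_f$, and their shares are close to the quoted 39%, 33%, and 28%.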
Acknowledgements

We thank Prof. Z.A. Luthey-Schulten for informative discussions. This material is based upon work supported by the National Science Foundation under grant CHE-93-01474 awarded in 1993 to JGS and by the National Institutes of Health under grant PHS 1 R01 GM44557.
References

[1] J.D. Bryngelson, J.N. Onuchic, N.D. Socci and P.G. Wolynes, Proteins 21 (1995) 167.
[2] J.D. Bryngelson and P.G. Wolynes, Proc. Nat. Acad. Sci. USA 84 (1987) 7524.
[3] J.G. Saven and P.G. Wolynes, J. Mol. Biol. 257 (1996) 199.
[4] J.S. Richardson and D.C. Richardson, Science 240 (1988) 1648.
[5] L.G. Presta and G.D. Rose, Science 240 (1988) 1632.
[6] H.X. Zhou et al., J. Am. Chem. Soc. 116 (1994) 6482.
[7] P.E. Wright, H.J. Dyson, J.P. Waltho and R.A. Lerner, in: Protein Folding, eds. L.M. Gierasch and J. King (AAAS, Washington, 1990) Ch. 9, pp. 95-102.
[8] A. Chakrabartty, T. Kortemme and R.L. Baldwin, Protein Sci. 3 (1994) 843.
[9] Z. Luthey-Schulten, B.E. Ramirez and P.G. Wolynes, J. Phys. Chem. 99 (1995) 2177.
[10] J.N. Onuchic, P.G. Wolynes, Z. Luthey-Schulten and N.D. Socci, Proc. Nat. Acad. Sci. USA 92 (1995) 3626.
[11] M. Sasai and P.G. Wolynes, Phys. Rev. Lett. 65 (1990) 2740.
[12] M. Sasai and P.G. Wolynes, Phys. Rev. A 46 (1992) 7979.
[13] A. Gutin and E. Shakhnovich, J. Chem. Phys. 100 (1994) 5290.
[14] K.A. Dill and D. Stigter, Adv. Protein Chem. 46 (1995) 59.
[15] P.G. Wolynes, J.N. Onuchic and D. Thirumalai, Science 267 (1995) 1619.