Exploring structures in protein folding funnels with free energy functionals: the transition state ensemble1

Exploring structures in protein folding funnels with free energy functionals: the transition state ensemble1

Article No. jmbi.1999.2613 available online at http://www.idealibrary.com on J. Mol. Biol. (1999) 287, 675±694 Exploring Structures in Protein Foldi...

1MB Sizes 0 Downloads 58 Views

Article No. jmbi.1999.2613 available online at http://www.idealibrary.com on

J. Mol. Biol. (1999) 287, 675±694

Exploring Structures in Protein Folding Funnels with Free Energy Functionals: The Transition State Ensemble Benjamin A. Shoemaker1, Jin Wang2,3,4 and Peter G. Wolynes1* 1

School of Chemical Sciences University of Illinois Urbana-Champaign, Urbana IL 61801, USA 2

Stragetic Analytics Unit Citigroup, 2nd Floor, 1125 Old Country Road, Plainview NY 11803 3

Department of Chemistry State University of New York at Stony Brook, Stony Brook NY 11794, USA 4

Department of Physics Jilin University, Changchun Jilin 130021, People's Republic of China *Corresponding author

We use free energy functionals that account for the partial ordering of residues in the transition state ensemble to characterize the free energy surfaces for fast folding proteins. We concentrate on chymotrypsin inhibitor and l-repressor. We show how the explicit cooperativity that can arise from many body forces, such as side-chain ordering or hydrophobic surface burial, determines the crossover from folding with a large delocalized nucleus and the speci®c small classical nucleus of the type envisioned in nucleation growth scenarios. We compare the structural correlations present in the transition state ensemble obtained from free energy functionals with those inferred from experiment using extrathermodynamic free energy relations for folding time obtained via protein engineering kinetics experiments. We also use the free energy functionals to examine both the size of barriers and multidimensional representations of the free energy pro®les in order to address the question of appropriate reaction coordinates for folding. # 1999 Academic Press

Keywords: folding kinetics; transition state ensemble; extrathermodynamic free energy relations; protein structure

Introduction If it was possible to view several different movies of the same protein folding into its native structure, it would be clear that the molecule follows different detailed routes to the folded state on each attempt (Wolynes et al., 1995). The precise set and order of structures through which the protein passes is not encoded in the amino acid sequence itself, but depends also on the random motions of the chain that are excited by thermal ¯uctuations of the surrounding environment. Despite this diversity of detailed folding routes, kinetically, protein folding is often quite simple to describe at the phenomenological level (Fersht, 1995b). Many of the fastest folding proteins studied in detail exhibit exponential kinetics, which suggest that there is an ensemble of structures which must be passed through that represents a bottleneck to the folding process. A completely precise characterization of this transition state ensemble would Abbreviations used: CI2, chymotrypsin inhibitor 2. E-mail address of the corresponding author: [email protected] 0022-2836/99/130675±20 $30.00/0

involve a knowledge of many features of protein chain dynamics. Nevertheless, a major determinant of the characteristics of this ensemble is the interplay between the guiding forces towards the folded state and chain entropy. This can be described at a quasi-thermodynamic level. A variety of arguments strongly suggest that the energy landscape of fast folding proteins resembles a funnel, because residues that interact in a native like fashion do so more stably than incorrect associations of residues. This is in accordance with the principle of minimal frustration (Bryngelson & Wolynes, 1987; Leopold et al., 1992). The funnellike nature of the energy landscape allows an approximate reaction coordinate for folding to be introduced (Bryngelson & Wolynes, 1989; Onuchic et al., 1995, 1996). Using such a coordinate, the transition state ensemble can be characterized and the statistics of the speci®c structures present in that bottleneck ensemble deduced, using a quasiequilibrium picture of the folding bottleneck. Here, we use free energy functional methods (Shoemaker et al., 1997) to explore the structures of proteins in the transition state region and to assess different reaction coordinates that can be used to # 1999 Academic Press

676 characterize folding. The approach taken here distinguishes different ensembles of structures on the energy landscape by the fraction of native-like contacts which are formed in an ensemble and the locations of those native-like contacts. Polymer statistical mechanics allows us to ®nd an approximate functional that gives the free energy of these ensembles. This functional re¯ects the exchange between the entropy loss caused by forming contacts and the free energy stabilization gained by making them. We have previously used such a functional to describe the structural correlations in the transition state of CI2 using empirical information to locate the folding bottleneck (Shoemaker et al., 1997). Here, we amplify the results of that study and use the free energy functional to approximately locate the transition state region and estimate barrier heights. This allows us to consider several different choices of the folding reaction coordinate and evaluate their usefulness. We are also able to quantify and qualitatively understand several physical effects that determine thermodynamic barrier heights. Two extreme limiting pictures of the reaction coordinate for folding have been used to discuss kinetic theories of folding. One is a globally de®ned reaction coordinate that depends on the total fraction of native contacts formed (Dinner et al., 1996; Onuchic et al., 1995). The other extreme choice of a reaction coordinate would be to describe progress towards the folded state by monitoring the formation of only a few contacts contained in a speci®c nucleus for folding (Abkevich et al., 1994). In our previous work we showed that when chain statistics is accounted for, even the globally de®ned reaction coordinate actually yields a statistical set of structures that could loosely be described as a delocalized folding nucleus (Onuchic et al., 1996). This delocalized nucleus or core does have a few ``hot spots'' that could be identi®ed with the classical speci®c nucleus if there were no other residues involved in the folding, but other residues are indeed involved. The partial order in these residues in the transition state contributes to their in¯uence on folding rates. Here, we will show that a reaction coordinate specifying only the ordering of these speci®c nucleus contacts can also be used to describe the transition state ensemble. As might be expected, the two choices give fairly similar results for experimental observations. The simultaneous consideration of both coordinates gives a satisfying picture of the ensemble of structures involved in what has been termed the ``nucleation-condensation'' mechanism of folding for this molecule (Fersht, 1995a; Itzhaki et al., 1995). We contrast this with the classical ``nucleation-growth'' picture in which the nucleus orders in a kinetically distinct step from the rest of the chain (Lifshitz & Pitaevskii, 1981). Another feature of the folding reaction that we explore is the role of explicit cooperativity in determining the magnitude of the barrier to folding. The magnitude of this explicit cooperativity, which

Free Energy Functionals for Transition States

can arise from the many body nature of the forces, determines the crossover from the mean®eld to capillarity picture of folding. In contrast to the former, the latter envisions a free energy surface where folded and unfolded parts of the chain are spatially sharply distinct. Here, we discuss our calculations of one and two-dimensional free energy pro®les. As noted, these pro®les allow us to re®ne our view of the reaction coordinate for folding. We discuss in detail the structures involved in the transition state ensemble of CI2. We pinpoint the effects of the heterogeneity of interactions on the folding transition state ensemble, and of explicit cooperativity in determining the degree of delocalization of the nucleus or core. We discuss the theoretical values of the extrathermodynamic free energy coef®cients (f values) using the different reaction coordinates, and compare them with the results of extensive site-directed mutagenesis experiments of the Fersht group. We then apply our methods to study the transition state ensemble of l-repressor. This fast folding helical protein has been the subject of protein engineering studies undertaken by the Oas group (Huang & Oas, 1995; Burton et al., 1997).

Results Free energy surfaces for CI2 The free energy changes over the entire course of folding are determined by ®rst calculating the Qij values at discrete points along the reaction coordinate, Qtot. This is done by changing the Qtot equation of constraint, recalculating the Qij values, and then substituting the resulting Qij values into the free energy functional itself. Figure 1(a) shows the free energy pro®le as a function of Qtot for CI2 (chymotrypsin inhibitor 2) with a realistic level of capillarity cooperativity in the low surface tension case (ac ˆ 0.05,ah ˆ 0.58,ah ÿ r ˆ 0.4). The resulting free energy barrier is located at Qtot ˆ 0.23 and has a height of 0.12 kBT. The unfolded state occurs at Qtot ˆ 0.13 and the folded state occurs at Qtot ˆ 0.37. In Figure 1(b) the free energy pro®les for CI2 are plotted for the high surface tension case (ac ˆ 0.1,ah ˆ 0.58,ah ÿ r ˆ 0.4). At the folding temperature the free energy barrier occurs at Qtot ˆ 0.31 with an increased barrier height of 4.5kBT. In this case the unfolded state shifts down to Qtot ˆ 0.06, and the folded state shifts up to Qtot ˆ 0.68 at Tf. Absolute experimental barrier heights are hard to determine, but the values computed here are not entirely unreasonable, even if lower than the expectations of some workers. As will be explained, the approximate transition state location inferred from experiment is at Q* ˆ 0.2. This is close to what the functional gives in the one dimensional scan. A more accurate two dimensional free energy dissection gives even better agreement for the transition state location. This suggests that the calculated free energy curve captures the behavior of the system.

Free Energy Functionals for Transition States

677

Figure 2. The dependence of barrier height on capillarity cooperativity. The free energy of CI2 is plotted along the reaction coordinate, Qtot, at increasing values of capillarity cooperativity, from ac ˆ 0.05 to ac ˆ 0.2.

Figure 1. Variation of calculated free energy pro®le with temperature for CI2 in the (a) low and (b) high surface tension regimes. The transition state shifts from Qtot ˆ 0.23 to Qtot ˆ 0.31 with the barrier height increasing from 0.12 kBT to 4.5 kBT in going from the low to high cooperative regimes.

In Figure 1 the free energy curves for CI2 are plotted by varying the temperature. As expected, the transition state ensemble and completely folded state shift to lower free energies as the temperature decreases. At the same time the unfolded state free energy scales with temperature as expected for a purely entropically stabilized ensemble. In both cases the folded state does not occur precisely at Q ˆ 1. This may be a sign of the need for additional side-chain ordering steps with their concomitant cooperativity. At lower temperatures Qfolded switches rapidly to one. Free energy curves are changed dramatically by the addition of explicit cooperativity mimicking those interactions with a many body character like the hydrophobic energy that depends on buried surface area or the effect of side-chain freezing. We compare a progression of free energy curves calculated at the folding temperature ranging from small to large capillarity cooperativity in Figure 2.

Notice that while the barrier height increases with increasing cooperativity, the locations of the unfolded, transition, and folded states differ only slightly from the least cooperative situation of the three cases. Although Qtot works well as a reaction coordinate, there are many other ways to measure the extent to which a protein has reached its native conformation. Another metric we can choose is the fraction of contacts formed in a given set of special contacts, Qs, such as a de®ned nucleus or core of contacts which predominates in the transition state. This core of contacts can be guessed using experimental input or calculated from the theory by using ®rst the Q coordinate to ®nd the approximate transition state and then identifying the most strongly participating residues. Contacts among these residues are measured by this second metric, Qs ˆ Qhot. Since the speci®c nucleus conception of the transition state would be most appropriate for high capillarity cooperativity, we can also search for a central core of contacts by examining the most ordered region in the very high capillarity cooperativity limit. In this case the coordinate is referred to as Qs ˆ Qcore to distinguish the two methods of ®nding the structured residues in the transition state (these nucleus residues are indicated in Figure 8(d)). The nucleus is comprised of residues from the N-terminal end to the middle of the a-helix (including residue 16), and the three residues about 49 and 57. Notice that for CI2 the contacts inferred from these theoretical approaches are the same as those arrived at experimentally by Itzhaki et al. (1995). In addition to the Qtot and Qcore reaction coordinates we may often use particular Qi values to monitor structure formation. Figure 3 shows the free energy as a function of various choices of this reaction coordinate. For each of the alternative

678

Free Energy Functionals for Transition States

Figure 3. The dependence of the free energy pro®le on reaction coordinate. The free energy of CI2 is plotted using ac ˆ 0.05,ah ˆ 0.58, ah ÿ r ˆ 0.4 along the following reaction coordinates: Qtot (black), Qhot (red), Q16 (green), Q49 (blue), Q57 (yellow), and Q13 (cyan). The Qx coordinates follow the average contact probabilities of residues chosen from the core and the helix.

reaction coordinates there is a double-well potential which is very broad. This breadth implies dynamical effects (transmission coef®cients) will be slightly different in different representations but structural details are pretty much the same for each. From this we conclude that the folding trajectory can be explored reasonably well for CI2 by monitoring contacts in the core and helix in addition to the overall contact probability. The reaction coordinates Qtot and Qcore (or Qhot) can be used to create two-dimensional free energy surfaces. This allows us to compare the merits of the two different reaction coordinates and to see if they order separately as in a classical nucleation plus growth scenario where an initial speci®c core forms followed by growth of the remaining contacts. To calculate these two dimensional surfaces, we now calculate the Qij* values subject to two equations of constraint, ij Qij/native Qij ˆ Q* and core ˆ Q*core at points throughout ij Qcore ij /native Qij tot core plane. the Q , Q Figure 4 shows a progression of two-dimensional free energy surfaces from low to high temperature using Qhot as determined by the one dimensional Q*ij values (with ac ˆ 0.1, ah ˆ 0.58, ah ÿ r ˆ 0.4). From the surface at the folding temperature, Tf, the unfolding minimum occurs at Qtot ˆ 0.06, Qhot ˆ 0.02 and the folding minimum occurs at Qtot ˆ 0.68,Qhot ˆ 0.98. The transition state location at Tf is calculated at Qtot ˆ 0.3, Qhot ˆ 0.7 with a barrier height of 4.5 kBT. The location and height of the barrier are reasonable in comparison with experiment (Itzhaki et al., 1995) and with the values calculated using only the Qtot reaction coordinate. As expected; the barrier height calculated using the two-dimensional representation is nearly the same as in one dimension. Also

Figure 4. Variation of two-dimensional free energy surfaces with temperature calculated for CI2. The free energy is plotted along two reaction coordinates: Qtot, the fraction of native contacts formed, and Qhot, the fraction of core contacts formed in the physically motivated high capillarity cooperativity regime. Core contacts are calculated from the one-dimensional transition state, but can also be determined by experiment. A progression of free energy surfaces is shown at (a) 23Tf, (b) Tf, and (c) 32Tf. In Tf, the dots correspond to the location of observed Qhot values in the one-dimensional case constraining only the Qtot coordinate.

Free Energy Functionals for Transition States

shown in Figure 4 are the locations of the free energies calculated with only the Qtot constraint (Figure 1(b)) by simply checking the value of Qhot from the appropriate Qij values at each point along Qtot. These one-dimensional points closely follow the folding contour of the two-dimensional surface and as at the transition state generally agree quite closely in free energy to the corresponding twodimensional value. Starting at Qtot ˆ 0,Qhot ˆ 0, we see the natural trajectory primarily follows Qhot until midway through Qtot, when the remaining non-core contacts ®ll in. We see that although growth and nucleation would be nearly concomitant that experimentally there is a tendency toward a two-step nucleation-growth (Lifshitz & Pitaevskii, 1981) scenario. In Figure 5 the two-dimensional free energy surface is shown at the folding temperature in the low capillarity cooperativity regime (ac ˆ 0.05, ah ˆ 0.58, ah ÿ r ˆ 0.4), again using the Qhot and Qtot coordinates. Notice that as in Figure 4(b) the free energy points calculated without the additional Qhot constraint (shown with black dots) correspond well to the two-dimensional folding contour. Here the folding trajectory follows much the same course as in the high capillarity picture of Figure 4(b), but the folding and unfolding minima have moved in much closer together away from the extremes of Q ˆ 0 and Q ˆ 1. Also the barrier height has dropped dramatically as was seen in Figure 1. Interestingly though, when the core residues from the high capillarity limit are used to determine Qcore a slightly different picture of the folding trajectory emerges. Using the same parameters

679

Figure 6. Two-dimensional free energy surface using the capillarity core calculated for CI2. The free energy is plotted along two reaction coordinates: Qtot, the fraction of native contacts formed, and Qcore, the fraction of core contacts formed in the high surface tension regime (ac ˆ 0.1, ah ˆ 0.58, ah ÿ r ˆ 0.4). Core contacts are calculated from the capillarity core.

(ac ˆ 0.1, ah ˆ 0.58, ah ÿ r ˆ 0.4), the two-dimensional pro®le in Figure 6 gives a transition state location at Tf of Qtot ˆ 0.3, Qcore ˆ 0.34 with a barrier height of 4.5 kBT. In this case the natural trajectory follows nearly a parallel course through the transition state and beyond to the folding minimum. The free energy functional approach then strongly indicates that either Qtot or Qcore can be used nearly equally well as an appropriate reaction coordinate to describe the folding of CI2. The initial Qtot formation is largely the formation of the frayed helix, but a detectable amount of initial collapse also exists. For extremely high cooperativity (ac ˆ 1) the two dimensional plots show a scenario much more like the traditional nucleation followed by growth. For many body forces this strong we ®nd Qcore to be a better reaction coordinate than Qtot. Nevertheless Qtot would be acceptable if a reasonable transmission factor is included. The role of explicit cooperativity in transition state ensemble structure

Figure 5. Two-dimensional free energy surface at low capillarity cooperativity calculated for CI2. The free energy at Tf in the low capillarity cooperativity regime is plotted along two reaction coordinates: Qtot, the fraction of native contacts formed, and Qhot, the fraction of core contacts formed. Core contacts are calculated from the one-dimensional transition state, but can also be determined by experiment. In Tf the dots correspond to the location of observed Qhot values in the one-dimensional case constraining only the Qtot coordinate.

Analogous to our companion study of equilibrium denatured states in the folding funnel (Shoemaker & Wolynes, 1999), we now consider the effects of varying the cooperativity parameters and energetic heterogeneity on the structures in the transition state ensemble. This gives some insight into the role of different physical effects in determining transition state geometry. We illustrate the effects using the protein CI2 (chymotrypsin inhibitor 2) as it is the main subject of the mutagenesis experiments. Figure 7(a) shows the three-dimensional crystal structure of CI2 colored by Sth at the transition

680

Figure 7. Transition state structure in¯uenced by helical cooperativity. We color the ribbon representation of CI2 with the calculated Qi with (a) no cooperativity, (b) helical cooperativity, ah ˆ 0.58, ah ÿ r ˆ 0, and (c) helicallocal density interaction cooperativity, ah ˆ 0, ah ÿ r ˆ 0.5. Red indicates Qi ˆ 0 and blue indicates Qi ˆ 1.

state (Q* ˆ 0.2) with no explicit cooperativity in the functional. In this limit the contact probability is fairly representative of the important structural regions in the transition state, with the highest

Free Energy Functionals for Transition States

density in the core and helical residues. In this strict mean ®eld limit, cooperativity comes only from the Flory correction to the Jacobson-Stockmayer entropy and from the coarse-graining (see Methods). However, the optimized potential seems to do a good job on its own of ®nding the important hot spots. Notice that although the free energy functional without explicit cooperativity does not give a free energy barrier (see Figure 2) along the progress coordinate, it is able to characterize the structural aspects of the transition state at the same value of Q*. Figure 7(b) displays the Sth values for the helical cooperativity, ah ˆ 0.58, which accounts for the cost of creating an helix-coil interface. The effect of this cooperativity in the case of CI2 which has only one helix of moderate length is to destabilize its contact probability relative to the rest of the protein. In Figure 7(c) the helical-local density interaction cooperativity is introduced, ah ÿ r ˆ 0.4, ah ˆ 0. The Sth values show that in the transition state mainly residues in the helix, but also residues in the core are partially ordered. Due to the constraint ij Qij/native Qij ˆ Q*, the increased stability of helical hydrogen bonds forces contacts in the rest of the protein with marginal stability to lose contact probability. Only contacts with relatively large stabilities persist outside of the helix. Figure 8 shows the progressive changes in transition state structure caused by increasing capillarity cooperativity from ac ˆ 0.05 to ac ˆ 0.2, for the one dimensionally determined transition state ensemble at Q8* ˆ 0.2 of CI2. This progression illustrates the migration from the mean ®eld limit to the capillarity limit. As expected, when the capillarity cooperativity is increased, the diffuse contact probability becomes more strongly localized to a speci®c spatial region. The high capillarity limit nucleus that forms may thus be identi®ed with the ``classical'' nucleus inferred by (Itzhaki et al. (1995), within the resolution of the coarsegrained contacts. This nucleus contains residues making contacts between the helix and the core. The emergence of the classical nucleus in the high capillarity limit suggests that its origin lies more in its being a core region of high local density in the native structure rather than from signi®cant energetic heterogeneity re¯ecting particularly stable individual contacts. This change of description can be followed in the distribution of Sth and Qij values. Histograms of these quantities are shown in Figure 9. In Figure 9(c) and (d) the Sth and Qij distributions are shown respectively using the cooperativity values ac ˆ 0.05, ah ˆ 0.58, ah ÿ r ˆ 0.4. In Figure 9(c) and (d) the Sth and Qij distributions are shown respectively using the high cooperativity limit of ac ˆ 0.2. Following the discussion of Onuchic et al. (1996), we see unimodal distributions for the low, physical values of the cooperativity while at ac ˆ 0.2, a nucleus begins to emerge. An interesting feature of these distributions is the qualitative similarity between the Sth distributions (left-hand panels) and

Free Energy Functionals for Transition States

681

Figure 8. Transition state structure of CI2 using an increasing amount of capillarity cooperativity. The structures are colored by Qi, which is calculated at the calculated transition states with capillarity cooperativity from ac ˆ 0.05 to ac ˆ 0.2. Red indicates Qi ˆ 0 and blue indicates Qi ˆ 1.

the Qij distributions (right-hand panels) for each of the cases in Figure 9. This shows the credibility of presenting results for kinetics based on either Qij or Sth in making comparison with experiment. f and Sth for CI2 Comparison with protein engineering kinetic experiments Using protein engineering methods, structural correlations in the transition state ensemble of CI2 have been inferred from the effects on rate and stability of single and multiple mutations throughout the sequence. These experiments give the extraWe have thermodynamic coef®cients, fexp i . calculated corresponding theoretical values, fth i , for all of the single mutations and we compare these to experiment. We group the mutated residues categorized by secondary structure as de®ned by Itzhaki et al. (1995). Earlier we set the location of the transition state at the empirical Q* ˆ 0.2. This was determined from the average of experimental f values, fexp, but agrees well with the predicted

transition state location above. Q* and fexp are roughly equivalent, since f is a measure of the change in the free energy difference between the transition and unfolded states over the change in free energy difference between the folded and unfolded states or, i.e. how similar the transition state is to the folded or unfolded state. Here the calculated transition state location is used to compare with protein engineering experiments. The amount of explicit helical cooperativity used (ah ˆ 0.58) is determined from helix-coil theory using known thermodynamic s values (Cantor & Schimmel, 1971; MunÄoz & Serrano, 1994; Luthey-Schulten et al., 1995). The magnitude of the helical-local density cooperativity, ah ÿ r ˆ 0.4 is found from the helix-coil theory by (LutheySchulten et al. (1995) to correspond roughly to 1 kBT of stabilization energy per helical residue. Two levels of capillarity cooperativity are used that bound our entropy calculation for speci®c surfaces to the surface entropy estimate of Finkelstein (ac ˆ 0.05 and ac ˆ 0.1; Finkelstein & Badretdinov, 1997). This enables our free energy functional to explore the two limits of low and high surface ten-

682

Free Energy Functionals for Transition States

Figure 9. The localization of contact probability throughout the progress coordinate of CI2. A progression of distributions of (a), (c) Sth values and Qij values is shown from Q* ˆ 0.1 to Q* ˆ 0.9 with the physically motivated level of low capillarity cooperativity (a), (b) ac ˆ 0.05, ah ˆ 0.58, ah ÿ r ˆ 0.4 and with the large limit of capillarity cooperativity (c), (d) ac ˆ 0.2.

sion with thermodynamically reasonable parameters. Details of the determination of the cooperativity parameters are discussed in the accompanying paper (Shoemaker & Wolynes, 1998). No additional many body cooperativity is used. While, as discussed earlier, barrier heights are sensitive to cooperativity values, we have found that structural averages (f values) are fairly robust even at the calculated location of the transition state. Table 1 lists the experimental f values for the mutations performed by the Fersht group. Also listed in Table 1 are the theoretical f values for low and high ac both at the one (using only the Qtot coordinate) and two-dimensional (using the Qtot and Qcore coordinates) transition state ensembles. Notice that a high degree of correlation exists among the four choices of surface tension and transition state. In Table 2 two of these choices are listed again with f values averaged over multiple mutations of a given residue. These average f values using low ac at the one-dimensional tran-

sition state and high ac value at the two-dimensional transition state are also plotted in Figure 10 and are representative of the four highly correlated cases listed in Table 1. As you can see the f values for different ac and choice of transition state ensemble are very similar. The left-hand panels of Figure 10 show the correlation plot of fth to fexp for the different groups of mutations based on secondary structure for the low cooperative case at the Q* ˆ 0.24 transition state calculated from the one-dimensional free energy. The experimental f values were averaged over multiple mutations of each site, when these were carried out. Figure 10(a) shows the correlation of f values for mutations in the core (red), the b-sheet (black), and the turns and coil (green). These f values show good correlation with a slope of 1.1 and a correlation coef®cient of 0.73 for core mutations, a slope of 0.72 and a correlation coef®cient of 0.60 for b-sheet mutations, and a slope of 0.80 and a correlation coef®cient of 0.71 for turns and coil mutations. Note that mutations on resi-

683

Free Energy Functionals for Transition States Table 1. The f values calculated from experiment and free energy functional theory Mut.

fth 1D/low

fth 1/d/high

fth 2D/low

fth 2D/high

fexp

L8A A16G V19A I20V I29V I29A V47A L49A V51A I57A P61A S12G E14Q E14D E15Q E15D K17A K17G K18A K18G L21A L21G Q22G D23A D23G K24A K24G T3A T3V T3G I30A I30G I30T V34A V34G V34T A58G V60A V60G V63T V63A V63G L32A L32I L32V V38A F50L F50V F50A K2A K2 M P6A E7A E26A T36V R43A D52A N56D N56A

0.3 0.6 0.2 0.6 0.4 0.3 0.2 0.3 0.3 0.2 0.1 0.3 0.6 0.5 0.6 0.8 0.6 0.4 ÿ0.8 0.4 1.4 ÿ0.3 0.4 0.6 1 0.4 0.6 0.1 0.1 0.1 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.1 0.1 0.2 0.2 0.2 0.4 0.6 ÿ0.5 0.3 ÿ0.7 0.1 0.3 0.1 0.1 0.1 0.3 0.3 0.2 0.3 0.1 0.2 0.4

0.3 0.4 0.2 0.5 0.5 0.3 0.2 0.3 0.3 0.2 0 0.2 0.5 0.4 0.5 0.8 0.4 0.2 ÿ0.1 0.4 0.8 0.1 0.4 0.5 0.7 0.4 0.5 0.1 0.1 0.1 0.5 0.5 0.5 0.2 0.2 0.2 0.1 0.2 0.2 0.2 0.2 0.2 0.4 1.9 ÿ0.1 0.4 ÿ0.8 0.1 0.4 0.1 0.1 0.2 0.3 0.3 0.3 0.3 0.2 0.2 0.4

0.2 0.6 0.2 0.6 0.4 0.3 0.2 0.3 0.3 0.2 0.1 0.4 0.6 0.5 0.6 0.8 0.6 0.4 ÿ0.9 0.4 1.5 ÿ0.3 0.4 0.6 1 0.4 0.6 0.1 0.1 0.1 0.4 0.4 0.4 0.2 0.2 0.2 0.1 0.1 0.1 0.2 0.2 0.2 0.3 0.9 ÿ0.3 0.3 ÿ0.6 0.1 0.3 0.1 0.1 0.1 0.3 0.3 0.2 0.3 0.1 0.2 0.4

0.3 0.4 0.2 0.5 0.5 0.3 0.2 0.3 0.3 0.2 0 0.2 0.5 0.4 0.4 0.7 0.4 0.1 0 0.4 0.8 0.1 0.4 0.5 0.7 0.3 0.4 0.1 0.1 0.1 0.5 0.5 0.5 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.4 1.9 ÿ0.1 0.4 ÿ0.8 0.1 0.4 0.1 0.1 0.2 0.3 0.3 0.3 0.3 0.2 0.2 0.4

0.2 1.1 ÿ0.3 0.4 0.2 0.3 0.2 0.5 0.3 0.1 0.4 0.3 1.2 0.2 0.5 0.5 0.3 0.4 ÿ0.4 0.7 0.3 0.4 0.1 ÿ0.3 0.3 ÿ0.4 0.1 0.1 0.5 0.1 0.3 0.3 0.4 0 0.2 0.2 0.1 0 0 0.1 0 0 0.2 ÿ0.3 ÿ0.1 0.1 0.3 0.3 0.3 ÿ0.2 0 0.1 0.4 0.4 0.2 0.1 0.1 0.2 0.1

The energy-weighted f values are listed for each mutation performed in CI2 as calculated from experiment and in the low and high surface tension regimes in the one-dimensional and twodimensional transition states as calculated by the free energy functional theory.

dues that are both in the core and another category, such as the a-helix, are listed with the other core mutations only. Figure 10(b) shows the correlation of f values for mutations in the a-helix (red) and in the minicore (black). The correlation in these plots

is not as strong as in the core or b-sheet mutations. This re¯ects the fact that the energy function contains no residue speci®c secondary structure energies. The right-hand panels of Figure show the correlation for the two-dimensional high cooperativity case. The f values for

684

Free Energy Functionals for Transition States

Table 2. Average f and S values calculated from experiment, simulation and free energy functional theory Mut. 8 16 19 20 29 47 49 57 61 12 14 15 17 18 21 22 23 24 3 30 34 58 60 63 32 38 50 2 6 7 26 36 43 52 56

fth 1D/low

fth 2D/high

fexp

0.3 0.6 0.2 0.6 0.35 0.2 0.3 0.2 0.1 0.2 0.55 0.7 0.5 ÿ0.2 0.55 0.4 0.8 0.5 0.1 0.4 0.2 0.2 0.1 0.2 0.17 0.3 ÿ0.1 0.1 0.1 0.3 0.3 0.2 0.3 0.1 0.3

0.3 0.4 0.2 0.5 0.4 0.2 0.3 0.2 0.1 0.1 0.45 0.55 0.25 0.2 0.45 0.4 0.6 0.35 0.1 0.5 0.2 0.2 0.2 0.2 0.73 0.4 ÿ0.1 0.1 0.2 0.3 0.3 0.3 0.3 0.2 0.3

0.2 1.1 ÿ0.3 0.4 0.25 0.2 0.3 0.1 0 0.35 0.7 0.5 0.35 0.15 0.35 0.1 0 ÿ0.15 0.23 0.33 0.13 0.1 0 0.03 ÿ0.07 0.1 0.3 ÿ0.1 0.1 0.4 0.4 0.2 0.1 0.1 0.15

Sth 1D/low 0.43 0.77 0.81 0.81 0.46 0.43 0.57 0.55 0.3 0.72 0.87 0.78 0.87 0.89 0.86 0.78 x 0.56 0.26 x 0.35 0.33 0.3 0.35 0.42 0.39 0.48 0.34 0.31 0.51 0.37 0.34 0.33 0.3 0.57

Sth 1D/high 0.47 0.81 0.83 0.83 0.46 0.43 0.67 0.68 0.29 0.85 0.91 0.82 0.87 0.89 0.9 0.77 x 0.55 0.2 x 0.35 0.32 0.25 0.33 0.41 0.32 0.53 0.33 0.24 0.58 0.33 0.27 0.25 0.27 0.72

Sexp 0.27 0.86 1.2 0.41 0.35 0.52 0.03 0.46 0.29 0.75 0.62 0.96 0.47 0.6 0.5 0.68 x 0.22 0.03 x 0.2 0.07 0.28 0.21 0.49 0.91 0.14 0.47 0.07 0.75 0.75 0.52 0.55 0.76 0.43

The energy-weighted f values are listed for each residue as calculated from experiment and for two choices of parameters as calculated from the theory, namely the low surface tension regime at the one-dimensional transition state and the high surface tension regime at the two-dimensional transition state. In addition the unweighted Ster values from simulation are shown with the analogous Qi in the low and high surface tension regimes of the theory at the transition state determined empirically from simulation.

different ac and choice of transition state ensemble are very similar. It is dif®cult to see which to use in comparison with experiment, but the one-dimensional low cooperativity case seems to agree best. Although f values show a range of structure throughout CI2 in the transition state, we can easily locate a small region, or hot spot, which contains the greatest amount of structure in the transition state. The Fersht group has classi®ed the residues Ala16, Leu49, and Ile57 as the classical nucleus, placing the highest amount of organized structure between the core and a-helix. We ®nd very similar f values for these three mutations and come to the same conclusion for the location of the central core or nucleus with the addition of residue ile20 from the helix. As Figure 11 shows, this residue along with Ala16 face into the core interacting with core residues Leu49 and Ile57. We see that our contact potential does a good job of predicting f values for mutations of buried residues. As is the case with other potentials though, solvent-accessible mutations are more dif®cult to calculate. Such mutations probably require an additional type of potential, such as secondary

structure propensity potentials based on back bone rotations, which could easily be incorporated within the current free energy functional framework. We can also compare Sth at the transition state to ter calculated from molecular dynamics simuS lations on the same system undertaken by Daggett et al. (1996). Ster is the ratio of the number of contacts a residue has in the transition state to the number of contacts it has in the folded state. While the experimental f re¯ects both energy and structure, the molecular dynamics simulated Ster is not weighted by an energy function. The molecular dynamics results also were computed at an unphysically high temperature. Figure 12 shows the correlation of our Sth values calculated again in two regimes of explicit cooperativity (ac ˆ 0.05, ah ˆ 0.58, ah ÿ r ˆ 0.4 and ac ˆ 0.1, ah ˆ 0.58, ah ÿ r ˆ 0.4), but this time at the transition state value Q* ˆ 0.45. We chose this value of the transition state again to match the average Ster formed by Daggett and Fersht in their simulation. They use this as their transition state location. As is clear from the free energy curves in Figure 4, the barrier determined by free energy functional

Free Energy Functionals for Transition States

685

Figure 10. Correlation of energy-weighted fth values to fexp values grouped by secondary structure classi®cation for CI2. The fth versus fexp for single mutations made in the core (a, red), the b-sheet (a, black), the turns and coil (a, green), the a-helix (b, red), the minicore (b, black), with fth calculated at the experimentally determined one-dimensional transition state in the low surface tension regime on the left and at the two-dimensional transition state in the high surface tension regime on the right.

Figure 11. Calculated transition state hot spots for CI2. Ribbon representation of the crystal structure colored by Qi using ac ˆ 0.05, ah ˆ 0.58, ah ÿ r ˆ 0.4 at Q* ˆ 0.3. Residues considered by Ithzaki et al. (1995) to be the classical transition state nucleus are indicated by arrows and identity. Red indicates Qi ˆ 0 and blue indicates Qi ˆ 1.

methods is relatively broad so the precise value of Q* is not easy to de®ne. In all likelihood experiments average the f values over a small region in Q. Showing the same general trends as in Figure 10, the Sth and Ster are grouped with values in the core and b-sheet regions in Figure 12(a) and values in the a-helical, turn and coil regions in Figure 12(b). The left and right-hand panels show the low and high surface tension regimes, respectively, for completeness, but notice that there is very little difference between the two cases. The correlation does not vary as much between secondary structural regions for these purely structural quantities in Figure 12 as it does for the corresponding fth-fexp plots in Figure 10, which also involve energetics. These ®ndings of increased structural correlation are in agreement with Figure 7(a) which shows that the ``classical'' folding nucleus can be found using the free energy functional without any explicit cooperativity. This suggests that while the free energy functional adequately describes structures resulting from the protein interactions, individual

686

Free Energy Functionals for Transition States

Figure 12. Correlation of the calculated Sth to Ster from simulation in the transition state. Ster, the fraction of native contacts in the transition state, was determined by Daggett et al. (1996) at Q* ˆ 0.45. Sth is also calculated at Q* ˆ 0.45 again with the low surface tension regime on the left and the high surface tension regime on the right. The same coloring scheme is used as in Figure 10.

interaction terms which must be convoluted with these structures to ®nd f need to be improved. By changing the constraint, Q*, in our expression for the Qij values, we can study how changing the location of the transition state will affect the contact probabilities, Qij, and therefore fi values. In Figure 13 there is a series of three-dimensional representations of CI2 colored by Sth with Q* from 0.1 to 0.6. The progression shows the emergence of a few well-formed residues near the classical nucleus at Q* ˆ 0.1. As Q* increases, the hot spots grow through the a-helix and core regions up to Q* ˆ 0.6, where a majority of the protein contains well-formed structure. We also show in Figure 14 the progression of Q values averaged over each residue as a function of sequence position at various depths in the funnel. Notice the clear emergence of the classical nucleus around residues 16, 49 and 57 as Q* increases. The f values shown so far have been calculated at the folding temperature. We are not limited to this temperature, however, and can study the sensitivity of our transition state to a temperature perturbation. Figure 15 shows the correlation of f

values calculated from the free energy functional theory at Tf to 3/2Tf. Even with this substantial change in temperature, which corresponds to F ˆ 7kBT, the f values in all domains remain relatively unchanged. This is consistent with the interpretation that the transition state ensemble of CI2 consists of a continuum of closely spaced energy levels and not two distinct populations (Fersht, 1997). Clearly for CI2 the nature of the transition state is constant over a range of denaturant conditions. A similar study of temperature has been carried out in which the effect of temperature on the intermediate and transition states in barnase was monitored (Dalby et al., 1998). They ®nd very little change of the average structures in the transition state as the temperature is varied. From Figure 15 we calculate an average @f/@T ˆ 0.0002 which is actually less than the values reported for barnase by the experimentalists. l -Repressor The 80 residue l-repressor is another system where transition state ensemble structure has been

Free Energy Functionals for Transition States

687

Figure 13. Distribution of native structure in CI2 along the folding funnel. The ribbon representation of the native structure of CI2 is used to show the progression of Qi values calculated from Q* ˆ 0.15 to Q* ˆ 0.6 using ah ˆ 0.45, ac ˆ 0.1. Red indicates Qi ˆ 0 and blue indicates Qi ˆ 1.

investigated and to which we can compare. Oas, Huang, and co-workers have performed kinetic and equilibrium experiments on this 80 residue version of l-repressor, l6-85, and a thermostable variant, G46A/G48A (Huang & Oas, 1995; Burton et al., 1997). This study is much less extensive than that done on CI2. The fth values are calculated for each of the single residue mutations they studied. Sth values for all the residues are also used to compare structural hot spots throughout the protein in the transition state. To begin with, one-dimensional free energy plots in Qtot are calculated to determine the transition state location, barrier height, and the general order of folding events. The free energy is calculated from equation (6) for l6-85 along the Qtot reaction coordinate and plotted in Figure 16. The transition state occurs at Q* ˆ 0.34 with a barrier height of 1.1kT in the low capillarity cooperativity limit (ac ˆ 0.05, ah ˆ 0.58, ah ÿ r ˆ 0.3). When the capillarity cooperativity is increased (ac ˆ 0.1, ah ˆ 0.58, ah ÿ r ˆ 0.3), the transition state remains at Q* ˆ 0.35 and the barrier

Figure 14. Residue structure formation in a progression down the funnel for CI2. Qi is plotted versus sequence number at representative values of the folding process in the physically motivated low surface tension regime (ac ˆ 0.05, ah ˆ 0.58, ah ÿ r ˆ 0.4).

688

Figure 15. Correlation of energy-weighted fth values at Tf to 3/2Tf for CI2. fth at Tf versus fth at 3/2Tf for single mutations made throughout all of the secondary structure regions of CI2, with fth calculated at the twodimensional transition state in the high surface tension regime.

Free Energy Functionals for Transition States

Figure 17. Calculated core of contacts in the capillarity limit of l6-85. Ribbon representation of the crystal structure colored by Qi, the average of all Qij values around residue i with ac ˆ 0.1 and Q* ˆ 0.37. Red indicates Qi ˆ 0 and blue indicates Qi ˆ 1.

height increases to 6.6 kT. These absolute numbers for barriers are slightly higher than CI2, but are still quite low and in keeping with the folding observed in the laboratory. Using our free energy functional, as in the case of CI2, a more complete description of the folding funnel can be made by imposing the additional constraint of a ®xed total probability for a predetermined set of special contacts, Qs. Qij values can then be calculated along the surface de®ned by ®xed values of Qtot and Qs. In order to ®nd an appropriate second reaction coordinate one approach is to ®nd

the set of Qij values in the limit of large capillarity cooperativity using the one-dimensional transition state (Figure 17). This limit favors residues spatially surrounded by the largest density of contact probability, making it possible to determine a set of core residues to use for the Qs ˆ Qcore coordinate. The capillarity core consists of blocks of three residues about residues 37, 40, 43, and 46. When this ``capillarity'' core is used to calculate the two-dimensional free energy surface, it gives the anticipated behavior of a transition state ensemble with considerable structure in the de®ned core residues as is shown in Figure 18. The folding trajectory is fairly parallel, but proceeds

Figure 16. Variation of calculated free energy pro®le with temperature for l6-85 in the (a) low and (b) high surface tension regimes. The transition state occurs at Qtot ˆ 0.34 with a barrier height of 1.1kBT in the low surface tension regime and at Qtot ˆ 0.35 with a barrier height of 6.6 kBT in the high surface tension regime.

Figure 18. Two-dimensional free energy surface using the capillarity core calculated for l6-85. We plot free energy along two reaction coordinates; Qtot, the fraction of native contacts formed, and Qcore, the fraction of core contacts formed. Core contacts are calculated from the capillarity core.

Free Energy Functionals for Transition States

Figure 19. Two-dimensional free energy surface calculated for l6-85. We plot free energy along two reaction coordinates; Qtot, the fraction of native contacts formed, and Qhot, the fraction of core contacts formed. Core contacts are calculated from the one-dimensional transition state. The free energy surface is shown at the folding temperature, Tf. At Tf, dots are also shown corresponding to the location of observed Qhot values in the one dimensional case constraining only the Qtot coordinate.

689 more along the Qcore coordinate than Qtot until reaching the transition state. Beyond the transition state Qtot orders slightly more rapidly than does Qcore en route to the folded state. The free energy surface for l6-85 using the capillarity core de®ned Qcore is similar to the Qcore surface calculated for CI2 in its parallel nature, but shows a slightly more nucleation-growth picture. An alternative way of calculating an appropriate second reaction coordinate is to examine the hot spots in the one-dimensional transition state at the physical values of cooperativity (see Figure 20(b)). These hot spots contain some of the same residues as the high capillarity core indicating that the two cores are similar. We can calculate the free energy surface with this set of hot residues obtained from the one-dimensional transition state rather than the capillarity core coordinate. Figure 19 shows the two-dimensional free energy surface using the Qs ˆ Qhot coordinate. There is an unfolding minimum at Qhot ˆ 0.15,Qtot ˆ 0.09 and a folding minimum at Qhot ˆ 0.95, Qtot ˆ 0.75. The general course of folding takes place close to the diagonal of the free energy surface in Figure 19 with contacts in the hot core ordering slightly more than contacts in

Figure 20. Distribution of native structure in l6-85 along the folding funnel. The ribbon representation of the native structure of l6-85 is used to show the progression of Qi values calculated from Q* ˆ 0.1 to Q* ˆ 0.6 using a physically reasonable level of cooperativity, ac ˆ 0.08, ah ˆ 0.58, ah ÿ r ˆ 0.3. The transition state is shown in (b) at Q* ˆ 0.35. Red indicates Qi ˆ 0 and blue indicates Qi ˆ 1.

690

Free Energy Functionals for Transition States Table 3. Values calculated from experiment and free energy functional theory for l6-85 Mut

Th

Exp

M15 M20 M37 M49 M63 M66 M81 M46,48

1.8 0.8 0.8 0.6 0.5 0.6 0.4 0.1

0.5 1.0 0.2 0.3 0.8 1.2 0.6 0.5

The energy-weighted f values are listed for each mutation performed for the repressor as calculated in the transition state from experiment and by the free functional theory.

Figure 21. Residue structure formation in a progression down the funnel for l6-85. Qi is plotted versus sequence number at representative values of the folding process using physically motivated cooperativity, ac ˆ 0.08, ah ˆ 0.58, ah ÿ r ˆ 0.3.

general prior to the transition state. Initial contact formation involves mainly the i,i ‡ 4 contacts of the helices. The transition state occurs at Qhot ˆ 0.61,Qtot ˆ 0.34 with a barrier height of 3.5 kBT. The hot core residues show comparable ordering in the transition state to the capillarity core residues described earlier. The difference in the nature of the hot spots for folding in l6-85 compared with CI2 is that in the repressor the hot spots nucleate only slightly faster than general growth regardless of how Qs is de®ned. In the case of CI2, the hot spots nucleate more quickly than general growth when Qhot is used instead of Qcore.

To locate the regions with most native like structure in the transition state of l6-85, it is useful to calculate the Sth values at the calculated transition state (Q* ˆ 0.35) as is displayed in Figure 20(b) at physically reasonable levels of cooperativity, ac ˆ 0.08, ah ˆ 0.58, ah ÿ r ˆ 0.3. We also show the progression of Sth versus depth Q* and sequence in the funnel (Figures 20 and 21, respectively). In these structures the hot spots are centered on helices A and D in the denatured states below the transition state. At this early part of the funnel helices C and E contain partial ordering and ahelix B is relatively unordered. The same trends are found experimentally from amide exchange studies (Huang & Oas, 1995). As folding continues into the transition state the ®rst four helices, in particular a-helix A, all contain structure with only helix E remaining relatively unordered. Beyond the transition state helices A-D continue to be the most structured with a-helix C forming the greatest amount of structure late in the folding process. Throughout the folding funnel structure generally forms in the center of helices leaving the frayed ends to form last. Figure 22 shows a theoretical titration curve for Sth averaged over each of the ®ve helices in l6-85. This shows that there is very little cooperative nature to helix formation with the ®ve helices ®lling in as a gradual process. Table 3 and Figure 23 show the theoretical and experimental f values for the thermostable variant of l6-85 using the same notation for the mutations as in the experiment (Burton et al., 1997). As in the experiment, higher f values are found from the free energy functional theory in helices A and D. Also in qualitative agreement with experiment is the partially ordered f value from a-helix E. Agreement is not as good for the other mutations.

Conclusions Figure 22. Calculated titration curves for the a-helices in l6-85. Calculated Qi values using physically motivated cooperativity in the free energy functional theory are averaged for each a-helix throughout the reaction coordinate, Q, giving a theoretical pro®le of the folding funnel.

The energy landscape approach to protein folding kinetics reduces much of the problem of computing rates to one of equilibrium statistical thermodynamics. As discussed in other studies (Plotkin & Wolynes, 1998; Socci et al., 1996; Du et al., 1998), there are, of course, important kinetic modi®cations to the thermodynamic approach that

691

Free Energy Functionals for Transition States

Further improvements in the parameters of the free energy functionals, especially in the residuespeci®c energies and contributions of residues to secondary structure formation should improve the agreement with experiment which is already quite pleasing. The resulting picture of the free energy landscape also can be re®ned through comparison with atomistic simulations of the unfolding process. Currently such approaches are somewhat limited by sampling dif®culties but they are fast becoming more realizable. Probably in the long run such calculations will act as benchmarks for the easier application of free energy functionals to a wide variety of proteins.

Methods Summary of methods Figure 23. Correlation of energy-weighted fth values to fexp values for l6-85. fth versus fexp for single mutations made in each of the ®ve helices of l6-85 and for a double mutation back to the wild type structure with ac ˆ 0.08, ah ˆ 0.58, ah ÿ r ˆ 0.3.

are related to the dynamics of crossing through the transition state. Here especially the non-Markovian aspect of chain dynamics is signi®cant, but can be theoretically treated (Plotkin & Wolynes, 1998). Much work has been done on that problem, but much remains to be done in quantifying dynamical effects on actual protein kinetics. Similarly, the glassy aspects of trapping deserve more attention. Nevertheless, the thermodynamic view of the energy landscape approach, when combined with free energy functionals, allows many structural features of the transition state ensemble to be deduced which are directly re¯ected in measurable effects on kinetics. As we have seen, there is already considerable richness in that information. Furthermore, the free energy functional allows a rapid assessment of how the thermodynamic barriers change with thermodynamic conditions. The free energy functional also allow us to dissect out the various contributions to the free energy barriers to gain insight into how the bottleneck is determined. We also see that the quantitative treatment of free energies made possible using such functionals allows us to assess the merits of different pictures of the reaction coordinate for folding, at least with the assumption that the semi-empirical energies of interaction are suf®ciently well known. In particular this allows us to contrast the nucleation condensation mechanism of folding in which some residues are more important than others, but most participate to some extent (Fersht, 1995a; Itzhaki et al., 1995) with the rival classical nucleation growth picture with a speci®c nucleus in which only three or four residues are considered important (Lifshitz & Pitaevskii, 1981).

We brie¯y review the details of the free energy functional as described (Shoemaker & Wolynes, 1999). The free energy functional is written in terms of contact probabilities de®ned by amino acid residue pairs which come within a cutoff distance of each other in the native state. These contact probabilities vary continuously from zero in the unfolded state to one in the native state. They are affected by energetic forces dependent on residue identity and entropic forces dependent on chain collapse. In addition, the contact probabilities become dependent on each other as cooperative forces corresponding to hydrophobic collapse and side chain ordering help bring together blocks of contacts in concert. The free energy functional which we use accounts for each of these forces. Once we have the free energy functional we use a variational approach to minimize the functional and constrain it along a particular reaction coordinate. This leads to self-consistent equations solvable by iteration. From this framework we discuss how to calculate experimentally relevant quantities.

Free energy As explained in detail in the accompanying paper, we use a free energy functional to calculate the contact probabilities, Qij, in the transition state ensemble. These represent residue-residue interactions between Cb atoms Ê in the crystal structure. Several which are within 6.5 A parameters are used in the functional, which are chosen to give proper thermodynamics (Table 4). This functional includes a two-body potential calculated by the information theoretic approach from databases (Miyazawa & Jernigan, 1985, 1996) or by an optimization strategy (Goldstein et al., 1992) giving an energetic contribution to the functional, E ˆ ij eijQij. The results shown here are calculated exclusively with the optimized potential including helical propensities veri®ed to agree with measured S values from helix-coil theory (Chakrabartty et al., 1994) as explained in the accompanying paper. In keeping with other statistically based potentials the eij values are scaled to give an appropriate folding temperature, Tf, and are known up to an additive constant, b0, whose magnitude sets the degree of hydrophobic collapse. Varying b0 determines whether collapse is a distinct phase of the folding process. The total energy

692

Free Energy Functionals for Transition States Table 4. Parameters used in Qij calculations for CI2 and l6-85 Parameter

Description

n Tf b0/m r0 l0 x ah ac ah-r

CI2 value

Conformations/residue Folding temperature (kBT) Hydrophobic collapse (kBT) Ê) Contact radius (A Ê) Kuhn length (A Contact matrix cell size Magnitude of helical cooperativity Magnitude of capillarity cooperativity Magnitude of interaction cooperativity

l6-85

4 1.4 ÿ1.2 6.5 5 3 0.58 0.05/0.1 0.4

4 1.8 ÿ1.3 6.5 5 3 0.58 0.05-0.1 0.3

All parameters are ®xed to realistic estimates according to experiment and polymer physics theory.

gained from all the contacts formed in the native state is scaled to the total entropy lost as the chain goes from the extended to folded states, Nlog(n) ˆ (ef ‡ b0)/Tf . This entropy loss is estimated as Nlog(n), where n is the number of conformations per residue. We also calculate the total molten globule energy, Äemg, with our potential by threading through alternate structures and ®x the difference between the folded and molten globule energies at the folding temperature as Nlog(n/e) ˆ ÿ(ef ÿ eÄmg)/Tf. The use of these two criteria amounts to assuming the protein is near a triple point. The free energy functional also includes a contact based entropy functional which is an interpolation between the Jacobson-Stockmayer functional (Jacobson & Stockmayer, 1950), Sij0 ˆ kBlog[V/ji ÿ jj3/2] and the Flory entropy functional in the mean ®eld limit (Flory, 1956), Sij0 ˆ kB log[(V/(N/m))3/2], where m is the number of coarsegrained contacts. In these expressions V ˆ (3/ 2p)3/2 t/l30 , where t is the volume of the interaction range and l0 is the persistence length of the chain. We set Ê V ˆ 2 consistent with a contact radius, t1/3, of 6.5 A Ê . This gives and a reasonable persistence length, l0 ˆ 5 A us the interpolated entropy functional: ij S0 …m†

ˆ kB log‰V…ji ÿ jj

ÿ3=2

‡ …N=m†

ÿ3=2

†Š

F…fQij …m†g† ˆ

ij

k

In addition, an interaction cooperativity exists between helical contacts and the local density contacts around the helix: Fhÿr …Q† ˆ ahÿr T

hx X

Qiÿ4;i

i

X

Qik

…4†

k

This cooperativity is estimated from the helix-coil theory by (Luthey-Schjulten et al. (1995) to be 1 kBT per helical residue. Finally, we include capillarity cooperativity which makes it easier for spatially local contacts to form in the neighborhood of i, j when the contact i, j is already formed, as discussed (Shoemaker & Wolynes, 1999):   XXX  1 1 Fc …Q† ˆ ac Qij ÿ Qlk ÿ QNat ik 2 2 ij k l 

1 ‡ Qij ÿ 2



!  1 Nat Qkl ÿ Qkj 2

…5†

In addition to these explicit forms of cooperativity, some non-additivity also enters the free energy from the coarsegraining of contacts as described in the companion paper (Shoemaker & Wolynes, 1999). The ®nal form of the free energy functional, including a mixing term that takes into account the different ways of forming contacts in a partially ordered protein, is:

0 1 m X X ij X ij eij Qij …m† ÿ T @ S0 …m†Qij …m† ‡ …@S0 …m0 †=@m†dQij …m0 † ‡ N log…n†A ij

hx  X

m0 ˆ1 ij

  hx X X 1 1 Qiÿ4;i …m† Qik …m† Qi;i‡4 …m† ÿ ÿ ahÿr T 2 2 i i k       X X X 1 1 1 1 Nat ÿ ac Qlk …m† ÿ QNat Q Q Qij …m† ÿ ‡ Q …m† ÿ …m† ÿ ij kl 2 2 ik 2 2 kj ij k l

ÿ ah

‡

…2†

In the general form of this expression we can add cooperativity to contacts forming anywhere in the protein by summing over the contacts neighboring i, j: X Ft …Q† ˆ at …Qij Qik ‡ Qij Qkj † …3†

…1†

Cooperative effects in the functional arise both from the mean ®eld limit of the entropy functional, from explicit terms added to the functional, and from coarsegraining the contacts as described (Shoemaker & Wolynes, 1999). Several of these explicit forms of cooperativity are added separately or in combination. A helical cooperativity comparable to the interface cost of forming helical ends is introduced by appropriately summing over all q pairs of i,i ‡ 4 contacts in a helix and consequently estimated from s in helix-coil theory (Cantor & Schimmel, 1971; MunÄoz & Serrano, 1994; Luthey-Schulten et al., 1995): X

  hx  X 1 1 Qiÿ4;i ÿ Qi;i‡4 ÿ Fh …Q† ˆ ah 2 2 i

X ij

Qiÿ4;i …m† ÿ

T…Qij …m† log‰Qij …m†Š ‡ …1 ÿ Qij …m†† log‰1 ÿ Qij …m†Š†

…6†

693

Free Energy Functionals for Transition States

where dQij(m0 ) ˆ Qij(m0 ) ÿ Qij(m0 ÿ 1). Minimizing this free energy with respect to Qij and solving for Qij gives for the contact probabilities at the transition state: !   hx X X ij Qij ˆ1= 1 ‡ exp eij =T ÿ l=T ÿ S0 …m† ‡ ahÿr 2 Qiÿ4;i ‡ Qik i

k

hx 1X 1 XX Nat ‡ ah …Qi;iÿ4 ‡ Qi‡4;i‡8 † ‡ ac …Qlk QNat ik ‡ Qkl Qkj † T i T k l

Here we use a Lagrangian multiplier, l, to satisfy the constraint ij Qij/native Qij ˆQ*. The equation is directly solvable when the extra cooperativity terms are absent. We can solve the expression for Qij iteratively when the explicit cooperativity is present. A separate Lagrange multiplier can be introduced to constrain the ordering of any subset of contacts. The equation for those contacts naturally contains a different l value than the global value in equation (7). The location of the transition state, Q*, was determined empirically using experimental data (Shoemaker et al., 1997). Here we also estimate Q* by using the calculated free energy curves from the functional itself. Since barrier heights arise from a sensitive balance of cancelling energy and entropy, these absolute numbers are more sensitive to modelling errors than the structural details. By changing the equation of constraint, the expression for Qij allows us to traverse the reaction coordinate, Qtot, from the extended state with no contacts formed (Qtot ˆ 0) to the native state with all the contacts formed (Qtot ˆ 1). Using the free energy functional to describe the transition state of protein folding we calculate several quantities which can be compared directly to experiment and to atomistic simulations. One of the quantities most easy to understand is Sth which we de®ne as the average of Qij around residue i. We use Sth to color three-dimensional crystal structure representations of proteins as a complement to the two-dimensional contact probability map representation. This quantity is accessible also to atomistic simulations that compute averages over a thermodynamic ensemble. To study the folding transition state, mutagenesis experiments have been carried out on a number of proteins (Itzhaki et al., 1995; LoÂpez-HernaÂndex & Serrano, 1996; Burton et al., 1997; Dalby et al., 1998). In these experiments the quantity f is calculated by measuring the ratio of the free energy change upon mutation in the transition state to the stability change of the folded state using kinetic and equilibrium experiments (f ˆ d log[kf]/d log[K] , where kf is the rate and K is the equilibrium constant). Similarly, we calculate the quantity f using our free energy functional as a ratio of energy differences fi ˆ (j eijQTST ÿ j eij QUF ij ij )/(j eij F UF Qij ÿ j eijQij ) where eij is the change of the contact mut energy upon mutation, (ewt ij ÿ eij ) (Onuchic et al., 1996). To calculate the energy of the mutated protein, emut ij , the identity of the residue is changed and the new interactions are calculated without changing the number of native interactions. This is correct in the linear response regime. This expression for f determines the effect of point mutations on the transition state by weighting the perturbed contact probabilities with the energy functional. Often f has been interpreted as directly measuring Sth. This is a good approximation if all interactions are of roughly the same strength and sign. This is reasonable under two conditions: ®rst, when the interactions are not frustrated and second, when the



…7†

mutations are very mild and can be thought of as changes caused by elimination of contacts as when alanine substitutions for larger residues are carried out. Interestingly an important residue in the transition state for CI2 has some frustrated interactions (Ladurner et al., 1997) and leads to a sensitive dependence of f on the energy function and transition state location. Here the energy weighted formula must be used. Rather than think of mutations at an atomistic level as the reduction (or addition) of methyl groups to a sidechain and the subsequent loss (or gain) of contacts, our algorithm takes residue speci®city into account by treating mutations at the residue level as the change in interaction strengths as determined by the contact potential. We can compare the computed fi directly to experiment. We can also use the analysis to study the variation of Sth or fi as the location of the transition state is changed, for example by the addition of denaturant.

Acknowledgments We thank John Portman, Shoji Takada, Steve Plotkin, and Michael Eastwood for helpful discussions. This work was supported by a grant from the National Institutes of Health, No. PHS 5 R01 GM44557-07.

References Abkevich, V. I., Gutin, A. M. & Shakhnovich, E. I. (1994). Speci®c nucleus as the transition state for protein folding: an evidence from the lattice model. Biochemistry, 33, 10026-10036. Bryngelson, J. D. & Wolynes, P. G. (1987). Spin glasses and the statistical mechanics of protein folding. Proc. Natl Acad. Sci. USA, 84, 7524-7528. Bryngelson, J. D. & Wolynes, P. G. (1989). Intermediates and barrier crossing in a random energy model (with applications to protein folding). J. Phys. Chem. 93, 6902-6915. Burton, R. E., Huang, G. S., Daugherty, M. A., Calderone, T. L. & Oas, T. G. (1997). The energy landscape of a fast-folding protein mapped by alagly substitutions. Nature Struct. Biol. 4, 305-310. Cantor, C. R. & Schimmel, P. R. (1971). Biophysical Chemistry. The Behavior of Biological Macromolecules, vol. 3, W. H. Freeman, New York. Chakrabartty, A., Kortemme, T. & Baldwin, R. L. (1994). Helix propensities of the amino acids measured in alanine-based peptides without helix-stabilizing side-chain interactions. Protein Sci. 3, 843-852. Daggett, V., Li, A., Itzhaki, L. S., Otzen, D. E. & Fersht, A. R. (1996). Structure of the transition state for folding of a protein derived from experiment and simulation. J. Mol. Biol. 257, 430-440.

694

Free Energy Functionals for Transition States

Dalby, P. A., Oliveberg, M. & Fersht, A. R. (1998). Movement of the intermediate and rate determining transition state of barnase on the energy landscape with changing temperature. Biochemistry, 37, 46744679. Dinner, A. R., SÏali, A. & Shakhnovich, M. K. E. (1996). The folding mechanism of larger model proteins: role of native structure. Proc. Natl. Acad. Sci. USA, 93, 8356-8361. Du, R., Pande, V. S., Grosberg, A. Y., Tanaka, T. & Shakhnovich, E. S. (1998). On the transition coordinate for protein folding. J. Chem. Phys. 108, 334-350. Fersht, A. R. (1995a). Optimization of rates of protein folding: the nucleation-condensation mechanism and its implications. Proc. Natl Acad. Sci. USA, 92, 10869-10873. Fersht, A. R. (1995b). Characterizing transition states in protein folding: an essential step in the puzzle. Curr. Opin. Struct. Biol. 5, 79-84. Fersht, A. R. (1997). Nucleation mechanisms in protein folding. Curr. Opin. Struct. Biol. 7, 3-9. Finkelstein, A. V. & Badretdinov, A. Y. (1997). Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold. Design, 2, 115-121. Flory, P. J. (1956). Theory of elastic mechanisms in ®brous proteins. J. Am. Chem. Soc. 78, 5222-5235. Goldstein, R. A., Luthey-Schulten, Z. A. & Wolynes, P. G. (1992). Optimal protein folding codes from spin glass theory. Proc. Natl Acad. Sci. USA, 89, 4918-4922. Huang, G. S. & Oas, T. G. (1995). Structure and stability of monomeric l repressor: NMR evidence for twostate folding. Biochemistry, 34, 3884-3892. Itzhaki, L. S., Otzen, D. E. & Fersht, A. R. (1995). The structure of the transition state for folding of chymotrypsin inhibitor-2 analysed by protein engineering methods: evidence for a nucleationcondensation mechanism for protein folding. J. Mol. Biol. 254, 260-288. Jacobson, H. & Stockmayer, W. H. (1950). Intramolecular reaction in polycondensations. i. the theory of linear systems. J. Chem. Phys. 18, 1600-1606. Ladurner, A. G., Itzhaki, L. S. & Fersht, A. R. (1997). Strain in the folding nucleus of chymotrypsin inhibitor 2. Fold. Des. 2, 363-368. Leopold, P. E., Montal, M. & Onuchic, J. N. (1992). Protein folding funnels-a kinetic appoach to the

sequence structure relationship. Proc. Natl Acad. Sci. USA, 89, 8721-8725. Lifshitz, E. M. & Pitaevskii, L. P. (1981). Physical Kinetics, Pergamon Press, Oxford. LoÂpez-HernaÂndex, E. & Serrano, L. (1996). Structure of the transition state for folding of the 129 aa protein CheY resembles that of a smaller protein, CI-2. Fold. Design, 1, 43-55. Luthey-Schulten, Z. A., Ramirez, B. E. & Wolynes, P. G. (1995). Helix-coil, liquid crystal, and spin glass transitions of a collapsed heteropolymer. J. Phys. Chem. 99, 2177-2185. Miyazawa, S. & Jernigan, R. L. (1985). Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules, 218, 534-552. Miyazawa, S. & Jernigan, R. L. (1996). Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol. 256, 623-644. MunÄoz, V. & Serrano, L. (1994). Elucidating the folding problem of helical peptides using empirical parameters. Nature Struct. Biol. 1, 399-409. Onuchic, J. N., Wolynes, P. G., Luthey-Schulten, Z. A. & Socci, N. D. (1995). Towards an outline of the topography of a realistic protein folding funnel. Proc. Natl Acad. Sci. USA, 92, 3626-3630. Onuchic, J. N., Socci, N. D., Luthey-Schulten, Z. A. & Wolynes, P. G. (1996). Protein folding funnels: the nature of the transition state ensemble. Fold. Design, 1, 441-450. Plotkin, S. S. & Wolynes, P. G. (1998). Non-markovian con®gurational diffusion and reaction coordinates for protein folding. Phys. Rev. Letters, 80, 5015-5018. Shoemaker, B. A. & Wolynes, P. G. (1999). Exploring structures in protein folding funnels with free energy functionals: The denatured ensemble. J. Mol. Biol. 287, 657-674. Shoemaker, B. A., Wang, J. & Wolynes, P. G. (1997). Structural correlations in protein folding funnels. Proc. Natl Acad. Sci. USA, 94, 777-782. Socci, N. D., Onuchic, J. N. & Wolynes, P. G. (1996). Diffusive dynamics of the reaction coordinate for protein folding funnels. J. Chem.Phys. 104, 5860-5868. Wolynes, P. G., Onuchic, J. N. & Thirumalai, D. (1995). Navigating the folding routes. Science, 267, 16191620.

Edited by A. R. Fersht (Received 6 October 1998; received in revised form 5 February 1999; accepted 5 February 1999)