J. theor. Biol. (2002) 217, 397–411 doi:10.1006/yjtbi.3006, available online at http://www.idealibrary.com
Thermodynamics and Kinetics of Protein Folding: An Evolutionary Perspective
Lloyd Demetrius*†
†Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, U.S.A.
(Received on 2 January 2002, Accepted in revised form on 6 March 2002)
This article appeals to an evolutionary model which postulates that primordial proteins were described by small polypeptide chains which (i) lack disulfide bridges, and (ii) display slow folding rates with multi-state kinetics, to determine relations between structural properties of proteins and their folding kinetics. We parameterize the energy landscape of proteins in terms of thermodynamic activation variables. The model studies evolutionary changes in these thermodynamic parameters, and we invoke relations between these activation variables and structural properties of the protein to predict the following correspondence between protein structure and folding kinetics.
1. Proteins with inter- and intra-chain disulfide bridges: large variability in both folding rates and stability of intermediates; multi-state kinetics.
2. Proteins which lack inter- and intra-chain disulfide bridges.
2.1. Single-domain chains: fast folding rates; unstable intermediates; two-state kinetics.
2.2. Multi-domain monomers: intermediate rates; metastable intermediates; multi-state kinetics.
2.3. Multi-domain oligomers: slow rates; metastable intermediates; multi-state kinetics.
The evolutionary model thus provides a kinetic characterization of one important subfamily of proteins which we describe by the following property: Folding dynamics of single-domain proteins which lack disulfide bridges are described by two-state kinetics. Folding rate of this class of proteins is positively correlated with the thermodynamic stability of the folded state.
© 2002 Published by Elsevier Science Ltd.
*E-mail address: [email protected] (L. Demetrius).
Introduction
Molecular recognition, the non-covalent interaction of complementary molecular surfaces, is the process that drives the spontaneous transformation of a protein from the denatured to the native state. Non-covalent forces also determine the secondary and tertiary structure of proteins. These facts suggest that protein structure imposes constraints on protein folding dynamics, as represented by folding rate and folding pathway. The problem addressed in this article is: Does a set of general rules exist which enables one to infer folding kinetics from protein structure? One of the more powerful experimental techniques that has been brought to bear on this problem is site-directed mutagenesis. The development in recombinant DNA technology now makes it possible to replace any specific
amino acid in a protein sequence by any other naturally occurring amino acid and thus assess the influence of these changes on folding and stability. It is surmised that such experiments will result in the development of a database of structure–activity relations, ultimately leading to general rules relating structure and folding kinetics. The application of this experimental technique has generated considerable insight into the relations between the structure and folding dynamics of several proteins, see for example, Fersht (1995); however, no general principle has emerged from these engineering considerations. This article appeals to the notion of site-directed mutagenesis as a technique for elucidating structure–kinetic relations, but embeds this process in an evolutionary context. We exploit the idea that the structure and folding dynamics of natural proteins are the result of an evolutionary experiment of several million years' duration. This experiment, which can be considered to be Nature’s analogue of site-directed mutagenesis, is a two-stage event, involving mutation, a random process which acts on the gene that codes for the protein, and selection, a deterministic process which acts on the organism that carries the protein. Mutation generates new varieties of proteins through changes in the amino acid sequence; selection screens proteins for those structures which confer a selective advantage on the organism. The large diversity observed in the folding dynamics and folding pathways of present-day proteins will therefore be contingent on the folding dynamics of primordial proteins, and on the nature of the evolutionary trajectory which relates present-day proteins to their primitive ancestors. The problems we now consider may be articulated as follows:
(I) What are the structure–activity relations that define primordial proteins?
(II) What are the dynamical principles that specify the changes in protein structure and kinetics as one class of proteins replaces another during the evolutionary process?
We address (I) by postulating the following properties regarding primitive polypeptide chains: Primitive polypeptide chains can form multiple conformational states and are characterized by the absence of disulfide bridges.
The hypothesis regarding multiple conformational states derives from the notion that pre-biotic polypeptides were necessarily short sequences, a property which they would have inherited from an ancestral RNA-based translation machinery. Empirical studies of the folding dynamics of short polypeptide chains indicate that these sequences can assume multiple conformational states, Davies et al. (1998), Kabsch & Sander (1984). The hypothesis regarding the absence of disulfide bridges derives from the following property: the formation of covalent links with disulfide bridges requires an oxidative environment. However, it is a well-established fact that an oxidative environment only emerged with the advent of oxygen-producing photosynthetic bacteria, a later event in the evolutionary history of protein molecules; accordingly, disulfide bridges in polypeptides represent evolutionary addenda. We address (II) in terms of a mathematical model which analyses changes in the topography of the folding energy landscape in terms of a new synthesis of transition state theory and evolutionary dynamics, Demetrius (1995). The outcome of this analysis is a series of rules relating (i) the structure of a protein, as defined by (a) the incidence of disulfide bridges and (b) the multiplicity of its domain structure, with (ii) its folding dynamics, as described by (a) the intrinsic folding rate, (b) the incidence of intermediate states, and (c) the stability of the folded state. The correlation between structure and kinetics which our model predicts is described in Table 1. Our analysis thus provides a characterization of an important subfamily of proteins which we describe by the following principle: Folding dynamics of single-domain proteins which lack disulfide bridges are described by two-state kinetics. Folding rate in this class of proteins is positively correlated with the thermodynamic stability of the folded state.
We now provide a synopsis of the evolutionary argument which forms the basis for the correlations expressed in Table 1. Transition state theory assumes that the transformation from the denatured to the folded state will occur by the reaction pathway with the lowest energy barrier and requires that the protein has sufficient energy to overcome this
Table 1
Relation between protein structure and folding kinetics

Structure                                 Folding rate        Stability of intermediates      Nature of folding pathway
----------------------------------------  ------------------  ------------------------------  -------------------------
Molecules with disulfide bridges          Large variability   Large diversity in stability    Multi-state
Molecules which lack disulfide bridges
  (a) Single-domain chains                Fast                Unstable                        Two-state
  (b) Multi-domain monomers               Intermediate        Metastable                      Multi-state
  (c) Multi-domain oligomers              Slow                Metastable                      Multi-state
barrier. The structure of highest energy on the reaction pathway is called the transition state. The structure of the transition state and its relation to the structure of the denatured state is determined by a set of activation parameters which reflect energetic and structural differences between the protein in the transition state and the unfolded state. The free energy, $\Delta G^{\ddagger}$, describes the mean-energy difference between the transition state and the unfolded state; the enthalpy, $\Delta H^{\ddagger}$, measures the change in magnitude of inter-chain interactions; and the entropy, $\Delta S^{\ddagger}$, describes the corresponding conformational and configurational changes the protein undergoes. These three quantities are related by the identity
$$\Delta G^{\ddagger} = \Delta H^{\ddagger} - T\Delta S^{\ddagger}, \tag{1}$$
where $T$ denotes the absolute temperature. The activation parameters are assumed to characterize intrinsic properties of the molecule, a condition which is equivalent to ignoring solvent effects. A fourth quantity of interest, the heat capacity of activation, $\Delta C^{\ddagger}$, arises when the activation enthalpy is temperature dependent. The quantity $\Delta C^{\ddagger}$ describes the difference between the heat capacities of the unfolded and activated states. Hence $\Delta C^{\ddagger} = C^{\ddagger} - C_u$, where $C^{\ddagger}$ and $C_u$ denote the heat capacity of the transition state and the denatured state, respectively. These quantities are assumed to refer to intrinsic properties of the protein: they thus characterize the protein in the anhydrous state, when the energetic contributions due to solvent effects may be neglected. Now, when contributions due to hydration terms
are ignored, the heat capacity $C^{\ddagger}$ in the transition state derives from two factors: (a) the primary or covalent structure of the protein, and the vibrational frequencies arising from the stretching and bending modes of each valence bond and internal rotations; (b) the non-covalent interactions arising from secondary and tertiary structure in the transition state. The first factor defines the heat capacity in the denatured state, $C_u$, and represents the major contribution to $C^{\ddagger}$, Gomez et al. (1995). Accordingly, the heat capacity difference will be primarily determined by contributions from non-covalent interactions, hence $\Delta C^{\ddagger}$ will be a small non-negative quantity. We will appeal to these observations to classify proteins in terms of their thermodynamic parameters in their kinetically significant transition states. When $\Delta C^{\ddagger} = 0$, a condition which entails that $\Delta H^{\ddagger}$ is temperature independent, the classification assumes the following pattern, Demetrius (1998):
$\Delta H^{\ddagger} > 0,\ \Delta S^{\ddagger} < 0$: type (1);
$\Delta H^{\ddagger} < 0,\ \Delta S^{\ddagger} < 0$: type (2);
$\Delta H^{\ddagger} > 0,\ \Delta S^{\ddagger} > 0$: type (3).
When $\Delta C^{\ddagger} > 0$, a condition which entails that $\Delta H^{\ddagger}$ is temperature dependent, we subdivide the type (3) model into three subgroups:
$\Delta H^{\ddagger} < T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger}$: type 3(a);
$\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger}$: type 3(b);
$\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} > \Delta C^{\ddagger}$: type 3(c).
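The classification by activation parameters can be written as a small decision function. The sketch below is illustrative only: the function names `activation_free_energy` and `classify`, the SI units, and the default temperature are assumptions that do not appear in the original, and any sign combination not listed in the classification is simply reported as "unclassified".

```python
def activation_free_energy(dH, dS, T=298.15):
    """Identity (1): dG = dH - T*dS (dH in J/mol, dS in J/(mol K), T in K)."""
    return dH - T * dS


def classify(dH, dS, dC=0.0, T=298.15):
    """Return the type label for a set of activation parameters.

    dH: activation enthalpy, dS: activation entropy,
    dC: activation heat capacity (assumed >= 0), T: absolute temperature.
    """
    if dH > 0 and dS < 0:
        return "1"
    if dH < 0 and dS < 0:
        return "2"
    if dH > 0 and dS > 0:
        if dC <= 0:
            return "3"            # no subdivision when dC = 0
        if dH < T * dC and dS < dC:
            return "3(a)"
        if dH > T * dC and dS < dC:
            return "3(b)"
        if dH > T * dC and dS > dC:
            return "3(c)"
    return "unclassified"         # combination not covered by the scheme


# Hypothetical values, chosen only to land in each category.
example_types = [
    classify(50e3, -100.0),
    classify(-20e3, -150.0),
    classify(20e3, 50.0, dC=100.0, T=300.0),
    classify(40e3, 50.0, dC=100.0, T=300.0),
    classify(40e3, 120.0, dC=100.0, T=300.0),
]
```

The numerical inputs are hypothetical and carry no experimental meaning; they only exercise each branch of the scheme.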
The significance of the classification in terms of the activation parameters resides in the fact that the categories are evolutionarily stable, that is, they remain invariant under mutation and selection, provided the mutational changes in the energy landscape induce no variations in $\Delta C^{\ddagger}$, Demetrius (1998). This property of evolutionary stability entails that the different subclasses can be considered as defining folding mechanisms which are not altered by small changes in the structure of the energy landscape. Our model postulates that the selective advantage of the protein is determined by two factors: (a) the production rate of the native protein, (b) its thermodynamic stability. We consider a class of primordial polypeptide chains, that is, chains which lack disulfide bridges and have multiple conformational states, and we analyse changes in the free-energy landscape of the protein under mutation and natural selection. Now production rate is a function of the intrinsic folding rate, a parameter determined by the activation free energy, and the stability of the intermediate in the folding pathway. We will appeal to the production rate parameter, as a measure of selective advantage, to describe evolutionary changes in the intrinsic folding rate, the stability of the intermediate, and the stability of the folded state. We will exploit this description to predict the following correspondence between the different protein classes and the kinetic variables.
A(1) Type 1: $\Delta H^{\ddagger} > 0,\ \Delta S^{\ddagger} < 0$. Large variability in intrinsic folding rates, large variation in the stability of intermediates, multi-state transition, instability of the folded state.
A(2) Type 2: $\Delta H^{\ddagger} < 0,\ \Delta S^{\ddagger} < 0$; Type 3(a): $\Delta H^{\ddagger} < T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger}$. Rapid folding rates, unstable intermediates, two-state transition, stable folded state.
A(3) Type 3(b): $\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger}$. Intermediate folding rates, metastable intermediates, multi-state transition, stable folded states.
A(4) Type 3(c): $\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} > \Delta C^{\ddagger}$. Slow folding rates, metastable intermediates, multi-state transition, stable folded states.
Now, the thermodynamic stability of the folded state, the other selective property we consider, can be modulated by the presence of disulfide bridges between cysteine residues in the protein, Betz (1993), Doig & Sternberg (1995). The introduction of disulfide bridges would therefore confer a selective advantage to proteins with unstable folded states, and be selectively neutral in the case of proteins with stable folded states. We invoke this property, together with the relations between the stability of the folded state and thermodynamic condition, as expressed in A(1)–A(4), to predict for proteins in the evolutionary limit, the following correspondence between structure, as defined by the incidence of disulfide bridges, and thermodynamic condition.
B(1) Molecules with disulfide bridges: Type (1), $\Delta H^{\ddagger} > 0,\ \Delta S^{\ddagger} < 0$.
B(2) Molecules which lack disulfide bridges: Type (2), $\Delta H^{\ddagger} < 0,\ \Delta S^{\ddagger} < 0$; Type (3), $\Delta H^{\ddagger} > 0,\ \Delta S^{\ddagger} > 0$.
Furthermore, in the case of proteins which lack disulfide bridges, we derive a correlation between molecular complexity and thermodynamic condition by appealing to certain structure–reactivity relations expressed in Hammond’s postulate (see for example, Maskill, 1985; Matthews & Fersht, 1995). We obtain the following relationships:
C(1) single-domain chains: Type 2, $\Delta H^{\ddagger} < 0,\ \Delta S^{\ddagger} < 0$; Type 3(a), $\Delta H^{\ddagger} < T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger}$.
C(2) multi-domain monomers: Type 3(b), $\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger}$.
C(3) multi-domain oligomers: Type 3(c), $\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} > \Delta C^{\ddagger}$.
These results will be integrated to predict, in the case of modern proteins, a relation between (a) structure, as defined by domain complexity, and the incidence of disulfide bridges; and (b) activity, as measured by folding rate and the stability of intermediates. The predictions are described in Table 1. One of the findings which has emerged from our analysis is a characterization of the folding
dynamics of single-domain proteins which lack disulfide bridges. Our analysis predicts that the folding rates in this class of proteins will be optimal and positively correlated with the thermodynamic stability of the folded state. This principle identifies in structural terms a subclass of proteins whose evolutionary lineage is described by a uni-directional increase in folding rate: the present-day molecules in this class are defined by maximal rates subject to the structural constraints that describe the protein. The folding dynamics of small proteins without disulfide bridges have been investigated experimentally, see for example Fersht (1995), Schindler & Schmidt (1996) and computationally, Thirumalai & Klimov (1999). Plaxco et al. (1998) have drawn from various sources to present a list of empirical observations regarding the folding kinetics of single domain chains. These studies indicate that: (1) folding is typically two state, without intermediate states; (2) folding rates are relatively insensitive to sequence changes that do not significantly alter the structure or stability of the folded state; (3) stability of the native state is correlated with folding kinetics. The analysis in this paper provides an evolutionary rationale for these empirical observations. Studies of the folding dynamics of proteins with disulfide bridges based on lattice models, Abkevich & Shakhnovich (2000), and experimental systems, Clarke & Fersht (1993), indicate that even though a disulfide bond always stabilizes the native state, its effect on the kinetics of protein folding depends on its location in the structure. Accordingly, correlations between folding rate and the stability of the native state should not necessarily obtain in the case of proteins with disulfide bridges. The evolutionary analysis developed in this paper provides an explanation for these empirical findings. 
The ideas and techniques used in this paper draw extensively from our earlier studies, Demetrius (1995, 1998), where we first developed a synthesis of transition state theory and evolutionary dynamics to elucidate structure–activity relations in enzyme catalysis. The notion of the free energy landscape which informs our analysis appeals to ideas of Bryngelson et al.
(1995), Dill (1993), Chan & Dill (1998). The series of articles from Fersht’s group based on chymotrypsin inhibitor and barnase, see Tan et al. (1996) for a review, together with the studies of the cold shock protein CspB, Schindler & Schmidt (1996), provide the experimental background for several of the theoretical developments advanced in this paper. A general account of transition state theory and a discussion of the significance of the activation parameters considered in this article can be found in Jencks (1975).
Folding Dynamics and Transition States
The transformation from the compact denatured state to the native state can be described by the following scheme:
$$U \rightleftharpoons I_1 \rightleftharpoons I_2 \rightleftharpoons \cdots \rightleftharpoons I_n \rightleftharpoons F,$$
where $U$ and $F$ denote the unfolded and folded states, respectively, and $I_1, \ldots, I_n$ denote the series of intermediates. In our model of the folding process, we will describe the transition state of the overall reaction in terms of the transition state of the rate determining step. The model then assumes the form:
$$U \rightleftharpoons I \rightleftharpoons X^{\ddagger} \rightleftharpoons F, \tag{2}$$
where $I$ denotes the intermediate, and $X^{\ddagger}$ the transition state (Fig. 1). We will assume throughout a quasi-thermodynamic equilibrium between the activated state $X^{\ddagger}$ and the unfolded state $U$.
CLASSIFICATION OF PROTEINS: THE ACTIVATION PARAMETERS
The intrinsic activation parameters, the free energy $\Delta G^{\ddagger}$, the enthalpy $\Delta H^{\ddagger}$, the entropy $\Delta S^{\ddagger}$ and the heat capacity at constant pressure $\Delta C^{\ddagger}$, characterize the configurational properties of the protein in the transition state. We first observe that the variables $\Delta G^{\ddagger}$, $\Delta H^{\ddagger}$ and $\Delta S^{\ddagger}$ are related by the identity (1). Accordingly, the transition state can be parameterized in terms of the variables $\Delta H^{\ddagger}$ and $\Delta S^{\ddagger}$.
When $\Delta C^{\ddagger} = 0$, we can appeal to the condition $\Delta G^{\ddagger} > 0$ to classify proteins into three groups, Demetrius (1992, 1995).
Type (1): $\Delta H^{\ddagger} > 0,\ \Delta S^{\ddagger} < 0$. This describes molecules in which the transformation to the transition state involves a weakening of the interaction between residues in the protein ($\Delta H^{\ddagger} > 0$), and a shift from a less structured unfolded state to a more structured transition state ($\Delta S^{\ddagger} < 0$).
Type (2): $\Delta H^{\ddagger} < 0,\ \Delta S^{\ddagger} < 0$. This corresponds to a folding mechanism which involves a strengthening of the interaction between residues in the protein ($\Delta H^{\ddagger} < 0$), and a transformation from a less compact unfolded state to a more compact transition state ($\Delta S^{\ddagger} < 0$).
Type (3): $\Delta H^{\ddagger} > 0,\ \Delta S^{\ddagger} > 0$. The folding mechanism represents a weakening of the inter-chain interactions ($\Delta H^{\ddagger} > 0$), and a shift to a less-structured transition state ($\Delta S^{\ddagger} > 0$) (Fig. 2).
When $\Delta C^{\ddagger} > 0$, type (3) admits a decomposition into three subgroups, Demetrius (1998). Type 3(a): $\Delta H^{\ddagger} < T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger}$; type 3(b): $\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger}$; and type 3(c): $\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} > \Delta C^{\ddagger}$ (Fig. 3).

Fig. 1. Free energy landscape with intermediate state. [Free energy against reaction coordinate, showing the states $U$, $I$, $X^{\ddagger}$ and $F$.]

STABILITY OF THE CLASSIFICATION
The biological significance of the classification we have described rests on certain stability properties of the different categories with respect to mutations that change the free energy profile of the folding reaction. Let $V^{\ddagger}$ denote the potential function defined on $\Omega$, the configuration space of the protein molecule. The partition function $Z^{\ddagger}$ is given by
$$Z^{\ddagger} = \int_{\Omega} \exp(-\beta V^{\ddagger})\, d\mu.$$
Here $\beta = 1/kT$, where $k$ denotes Boltzmann's constant, $T$ the absolute temperature, and $d\mu$ the probability measure assigned to each configuration in the space $\Omega$. The thermodynamic functions $G^{\ddagger}$, $H^{\ddagger}$ and $S^{\ddagger}$ can all be expressed in terms of the potential function $V^{\ddagger}$ and the partition function $Z^{\ddagger}$:
$$G^{\ddagger} = -kT \log Z^{\ddagger}, \tag{3a}$$
$$H^{\ddagger} = \int V^{\ddagger} g\, d\mu, \qquad S^{\ddagger} = -k \int g \log g\, d\mu, \tag{3b}$$
$$kT^{2} C^{\ddagger} = \int (V^{\ddagger})^{2} g\, d\mu - \left( \int V^{\ddagger} g\, d\mu \right)^{2}, \tag{3c}$$
where
$$g = \frac{\exp(-\beta V^{\ddagger})}{\int \exp(-\beta V^{\ddagger})\, d\mu}.$$

Fig. 2. Classification of proteins according to the parameters $\Delta H^{\ddagger}$, $\Delta S^{\ddagger}$. [Axes: $\Delta H^{\ddagger}$ against $T\Delta S^{\ddagger}$, with the line $\Delta H^{\ddagger} = T\Delta S^{\ddagger}$ and the regions Type (1), Type (2) and Type (3).]

Fig. 3. Classification in terms of the thermodynamic parameters $\Delta H^{\ddagger}$, $\Delta S^{\ddagger}$, $\Delta C^{\ddagger}$. [Axes: $\Delta H^{\ddagger}$ against $T\Delta S^{\ddagger}$, with the line $\Delta H^{\ddagger} = T\Delta S^{\ddagger}$, the value $T\Delta C^{\ddagger}$ marked on both axes, and the regions Type 3(a), Type 3(b) and Type 3(c).]

A mutation is described in terms of a change in the potential function. We will assume that the change in $V^{\ddagger}$ is given by $V^{\ddagger}(\delta) = V^{\ddagger} + \delta \cdot V^{\ddagger}$, where $\delta$ denotes the intensity of the mutation (Fig. 4). Mutations will typically result in a change in the values of the thermodynamic parameters that describe the transition state. Let $\tilde{\Delta} X^{\ddagger} = X^{\ddagger}(\text{mutant}) - X^{\ddagger}(\text{wildtype})$, where $X^{\ddagger}$ denotes a given thermodynamic parameter. The following perturbation relations have been shown to hold when $\delta$ is small, see the appendix,
$$\tilde{\Delta} G^{\ddagger} = \delta \cdot H^{\ddagger}, \qquad \tilde{\Delta} S^{\ddagger} = -\delta \cdot C^{\ddagger}. \tag{4}$$

Fig. 4. Changes in free energy landscape due to mutation. [Free energy against reaction coordinate, showing the states $U$, $I$, $X^{\ddagger}$ and $F$.]

These relations pertain to the perturbation at the transition state $X^{\ddagger}$. Evidently, we can consider the unfolded state $U$, defined by the thermodynamic parameters $G_u$, $H_u$, $S_u$ and $C_u$, and also invoke statistical mechanics to express these quantities in terms of the partition function $Z_u$ and the potential function $V_u$, yielding equations analogous to eqn (3). Assuming that the effects of mutation on the potential energy functions that define the transition states and the unfolded states are equivalent, we will obtain for mutational changes in the thermodynamic parameters $G_u$ and $S_u$ relations similar to eqn (4). Now, let $\Delta X^{\ddagger} = X^{\ddagger} - X_u$ denote the difference between the thermodynamic variables defined at the transition state and the unfolded state, respectively. In view of eqn (4) and its analog, which pertains to the unfolded state, we obtain:
$$\tilde{\Delta}(\Delta G^{\ddagger}) = \delta \Delta H^{\ddagger}, \qquad \tilde{\Delta}(\Delta S^{\ddagger}) = -\delta \Delta C^{\ddagger}. \tag{5}$$
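The statistical-mechanical expressions (3a)–(3c) and the first-order perturbation relations (4), $\tilde{\Delta} G^{\ddagger} \approx \delta H^{\ddagger}$ and $\tilde{\Delta} S^{\ddagger} \approx -\delta C^{\ddagger}$, can be checked numerically on a toy landscape. The sketch below is not taken from the paper: it assumes a finite configuration space with a uniform probability measure, reduced units with $k = 1$, and hypothetical energy levels; the names `thermo` and `mutate` are illustrative.

```python
import math

K_B = 1.0  # Boltzmann constant in reduced units (assumption)


def thermo(energies, T):
    """Evaluate eqs (3a)-(3c) on a finite configuration space.

    The probability measure is assumed uniform: each configuration has weight 1/n.
    """
    n = len(energies)
    beta = 1.0 / (K_B * T)
    boltz = [math.exp(-beta * v) for v in energies]
    Z = sum(boltz) / n                        # partition function
    g = [b / Z for b in boltz]                # density with respect to the measure
    H = sum(v * gi for v, gi in zip(energies, g)) / n
    V2 = sum(v * v * gi for v, gi in zip(energies, g)) / n
    S = -K_B * sum(gi * math.log(gi) for gi in g) / n
    G = -K_B * T * math.log(Z)
    C = (V2 - H * H) / (K_B * T * T)          # from kT^2 C = <V^2> - <V>^2
    return {"G": G, "H": H, "S": S, "C": C}


def mutate(energies, delta):
    """Mutation of intensity delta: V -> V + delta * V."""
    return [(1.0 + delta) * v for v in energies]


# Hypothetical energy levels and a small mutation intensity.
levels = [0.0, 1.0, 2.5, 4.0]
T = 1.0
delta = 1e-4

wild = thermo(levels, T)
mutant = thermo(mutate(levels, delta), T)

dG = mutant["G"] - wild["G"]   # compare with  delta * H
dS = mutant["S"] - wild["S"]   # compare with -delta * C
```

In this toy setting `dG` agrees with `delta * wild["H"]` and `dS` with `-delta * wild["C"]` up to terms of order `delta**2`, and `wild["G"]` equals `wild["H"] - T * wild["S"]`, as the identity $G = H - TS$ requires.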
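Since $\Delta C^{\ddagger} > 0$, we can infer from eqn (5) the following implications:
$$\Delta H^{\ddagger} < 0 \;\Rightarrow\; \tilde{\Delta}(\Delta G^{\ddagger}) \cdot \tilde{\Delta}(\Delta S^{\ddagger}) > 0, \tag{6a}$$
$$\Delta H^{\ddagger} > 0 \;\Rightarrow\; \tilde{\Delta}(\Delta G^{\ddagger}) \cdot \tilde{\Delta}(\Delta S^{\ddagger}) < 0. \tag{6b}$$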
These mutation relations allow us to assess the evolutionary stability of the different categories. In applying these relations we will take account of our earlier observation that, since solvation effects are ignored, $\Delta C^{\ddagger}$, the difference between the heat capacity at the transition state and the unfolded state, is a small positive quantity. In assessing the evolutionary stability of the different classes of proteins, we will distinguish between two categories of mutations in terms of their effects on $\Delta C^{\ddagger}$, a quantity which reflects the intensity of fluctuations due to the non-covalent interactions within the molecule.
Class (a) mutations: $\tilde{\Delta}(\Delta C^{\ddagger}) = 0$. This corresponds to variations which induce no change in the heat capacity. This class of mutations will exert, as shown in Demetrius (1998), small effects on $\Delta H^{\ddagger}$ and $\Delta S^{\ddagger}$. It can be shown from the perturbation relations given by eqn (5) that, under this class of mutations, the parameters $\Delta H^{\ddagger}$ and $\Delta S^{\ddagger}$ undergo no change in sign. This property entails that each protein category, as defined by the Type 1, Type 2 and Type 3 classes, characterizes a superfamily: all members of a given category being defined by similar folding pathways.
Class (b) mutations: $\tilde{\Delta}(\Delta C^{\ddagger}) \neq 0$. This characterizes mutations that alter the heat capacity. Such perturbations may induce large effects on the parameters $\Delta H^{\ddagger}$ and $\Delta S^{\ddagger}$. We showed in Demetrius (1998) that, in the event of such changes, the parameters $\Delta H^{\ddagger}$ and $\Delta S^{\ddagger}$ may undergo changes in sign. This situation indicates that, under class (b) mutations, shifts from one protein family to another may occur: types 2 and 3 will remain stable under these mutations; however, in the case of
type (1) proteins, transitions to types (2) and (3) may occur. The instability of the type (1) condition and the stability of the type (2) and type (3) models imply that the type (1) model must necessarily characterize the folding dynamics of primordial proteins. A critical point in our analysis will be the assumption that class (a) mutations, in view of their small effects, will be comparatively common, whereas class (b) mutations, which induce large effects, will be comparatively rare.
ACTIVATION PARAMETERS AND RATE CONSTANTS
The production rate $v$ of the process described by eqn (2) is given by the equation
$$v = k\, \frac{1}{1 + 1/K}. \tag{7}$$
The parameter $K$ represents the equilibrium constant of the reaction $U \rightleftharpoons I$, whereas $k$, the intrinsic folding rate, is related to the activation free energy $\Delta G^{\ddagger}$ by the relation
$$k = \frac{k_B T}{h} \exp\left( -\frac{\Delta G^{\ddagger}}{RT} \right), \tag{8}$$
where $k_B$ denotes Boltzmann's constant and $h$ Planck's constant. Now mutations in the energy landscape will induce changes in the kinetic variables $k$ and $K$, and correspondingly, changes in the thermodynamic variables $\Delta G^{\ddagger}$ and $\Delta S^{\ddagger}$. In view of the analytic relation between $k$ and $\Delta G^{\ddagger}$ as given in eqn (8), we obtain
$$\tilde{\Delta} k \cdot \tilde{\Delta}(\Delta G^{\ddagger}) < 0. \tag{9}$$
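Equations (7) and (8), and the sign relation (9), can be illustrated with a short calculation. The sketch below assumes SI units with $\Delta G^{\ddagger}$ in J/mol; the function names `intrinsic_rate` and `production_rate`, the default temperature, and the barrier heights are illustrative and not taken from the paper.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s
R_GAS = 8.314462618        # gas constant, J/(mol K)


def intrinsic_rate(dG, T=298.15):
    """Eq. (8): k = (k_B T / h) * exp(-dG / (R T)), with dG in J/mol."""
    return (K_B * T / H_PLANCK) * math.exp(-dG / (R_GAS * T))


def production_rate(k, K):
    """Eq. (7): v = k / (1 + 1/K), with K the U <-> I equilibrium constant."""
    return k / (1.0 + 1.0 / K)


# Hypothetical barriers: raising the activation free energy lowers k,
# which is the content of relation (9).
k_low_barrier = intrinsic_rate(50e3)
k_high_barrier = intrinsic_rate(60e3)

# A stable intermediate (large K) gives v close to k; K = 1 halves it.
v_half = production_rate(10.0, 1.0)
```

With $K = 1$ the production rate is exactly half the intrinsic rate, and as $K$ grows $v$ approaches $k$.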
Now, there exists no analytic relation between $K$ and $\Delta S^{\ddagger}$. To derive correlations between the changes $\tilde{\Delta} K$ and $\tilde{\Delta}(\Delta S^{\ddagger})$, we will appeal to Hammond’s postulate: If two states occur consecutively during a reaction process and have nearly the same energy content, their interconversion will involve only a small reorganization of molecular structure. In applying this postulate, we first observe that the relative position of the transition state and the native state can be characterized in terms of the activation entropy. This observation derives from computational, experimental and theoretical considerations which we now delineate. The transformation from the denatured state to the native state is always attended by a decrease in entropy. Hence when $\Delta S^{\ddagger} < 0$, the transition state will be more structurally similar to the native state than to the intermediate state. Analogously, when $\Delta S^{\ddagger} > 0$, the transition state will possess only a few of the contacts that define the native state. We infer from these observations that the sign of $\Delta S^{\ddagger}$ provides an index of the relative position of the transition state along the reaction pathway: $\Delta S^{\ddagger} < 0$ denotes a configuration far from the intermediate state; $\Delta S^{\ddagger} > 0$ describes a configuration close to the intermediate state. We can now use this property to infer, using Hammond’s postulate, correlations between the changes $\tilde{\Delta} K$ and $\tilde{\Delta}(\Delta S^{\ddagger})$ induced by mutations in the energy landscape. We have the following implications:
$$\Delta S^{\ddagger} < 0 \;\Rightarrow\; \tilde{\Delta} K \cdot \tilde{\Delta}(\Delta S^{\ddagger}) < 0, \tag{10a}$$
$$\Delta S^{\ddagger} > 0 \;\Rightarrow\; \tilde{\Delta} K \cdot \tilde{\Delta}(\Delta S^{\ddagger}) > 0. \tag{10b}$$
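The sign relations (6), (9) and (10) can be chained to obtain the sign of $\tilde{\Delta} k \cdot \tilde{\Delta} K$ for each thermodynamic type. The sketch below is an illustration under stated assumptions: $\Delta C^{\ddagger} > 0$, all perturbations are nonzero, and signs are encoded as +1 or -1; the function name `sign_dk_dK` is hypothetical.

```python
def sign_dk_dK(sign_dH, sign_dS):
    """Sign of (change in k) * (change in K) implied by eqs (6), (9), (10).

    sign_dH, sign_dS: +1 or -1, the signs of the activation enthalpy and entropy.
    Assumes the activation heat capacity is positive and no perturbation vanishes.
    """
    s6 = -sign_dH   # eq. (6): sign of d(dG) * d(dS) is opposite to the sign of dH
    s9 = -1         # eq. (9): dk * d(dG) < 0
    s10 = sign_dS   # eq. (10): sign of dK * d(dS) equals the sign of dS
    # The product of the three pairwise products equals
    # dk * dK * d(dG)^2 * d(dS)^2, so it carries the sign of dk * dK.
    return s6 * s9 * s10


signs_by_type = {
    "type 1": sign_dk_dK(+1, -1),
    "type 2": sign_dk_dK(-1, -1),
    "type 3": sign_dk_dK(+1, +1),
}
```

Under these assumptions the product is negative for type 1 and positive for types 2 and 3, so $k$ and $K$ change in opposite directions only in the type 1 class.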
Evolution of Rate Constants
Protein evolution is a two-stage process: the first is non-adaptive and is described by random mutation acting on the gene that codes for the protein; the second is adaptive and refers to natural selection acting on the organism that carries the protein. We will analyse the effects of this two-stage process on the different classes of protein molecules. The idea that drives this analysis is the notion that all modern proteins are the result of a long evolutionary history described by mutation and natural selection acting on a class of primordial molecules. Our study is based on the following set of assumptions.
1. The native configuration of primordial polypeptide chains is described by metastable conformational states.
2. Primordial sequences lack intra- and inter-chain disulfide bridges.
3. The changes in protein structure and function over evolutionary time are driven primarily by class (a) mutations, changes which induce small effects, and secondarily by class (b) mutations, variations with large effects.
We can infer from assumption (3), and our earlier remarks concerning the stability of the different protein categories under class (a) and class (b) mutations, that our protein phylogeny will situate the type 1 molecules at the root of the evolutionary tree. Evolutionary changes within the Type (1) lineage will be generated by class (a) mutations. Branching to type (2) and type (3) will be induced by class (b) mutations. We will now study evolutionary changes within each of the protein categories. We will first consider models in which the selective criterion is uniquely determined by the folding rate parameter $v$, and analyse the changes in the kinetic and thermodynamic parameters under random mutation, which acts on the gene that codes for the protein, and natural selection, which acts on the organism that carries the protein.
KINETIC VARIABLES: MUTATIONAL EFFECTS
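A mutation in the gene that codes for the protein will induce a change in the rate constants $k$ and $K$. The corresponding changes in $v$ can be derived by a perturbation analysis of eqn (7). We have
$$\left( 1 + \frac{1}{K} \right) \frac{\tilde{\Delta} v}{v} = \left( 1 + \frac{1}{K} \right) \frac{\tilde{\Delta} k}{k} + \frac{1}{K}\, \frac{\tilde{\Delta} K}{K}. \tag{11}$$
From eqn (11) we observe the following series of implications:
(I) If $\tilde{\Delta} k > 0$, $\tilde{\Delta} K > 0$, then $\tilde{\Delta} v > 0$ holds for all values of the kinetic parameters.
(II) (a) If $\tilde{\Delta} k > 0$, $\tilde{\Delta} K < 0$, then $\tilde{\Delta} v > 0$, provided $(1 + K)K/k > -\tilde{\Delta} K / \tilde{\Delta} k$.
(b) If $\tilde{\Delta} k < 0$, $\tilde{\Delta} K > 0$, then $\tilde{\Delta} v > 0$, provided $(1 + K)K/k < -\tilde{\Delta} K / \tilde{\Delta} k$.
It is evident from items (I) and (II) that, in view of the constraints on the parameters k and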
K which attend the invasion criterion $\tilde{\Delta} v > 0$, the following properties necessarily obtain:
(III) If $\tilde{\Delta} k \cdot \tilde{\Delta} K > 0$, then invading mutants are described by a uni-directional increase in both $k$ and $K$.
(IV) If $\tilde{\Delta} k \cdot \tilde{\Delta} K < 0$, then invading mutants are characterized by non-directional changes in the parameters $k$ and $K$.
The Evolutionary Dynamics
We will now draw from the analyses in the previous sections to study the evolutionary changes in the properties: intrinsic folding rate, the stability of the intermediate state, and the thermodynamic stability of the folded state. The intrinsic folding rate $k$ can be described in terms of Kramers' model, Hänggi et al. (1990). In this model the rate $k$ is given by
$$k = \frac{A}{\gamma_p + \gamma_s} \exp\left( -\frac{\Delta G^{\ddagger}}{RT} \right). \tag{12}$$
Here $A$ is a viscosity and temperature independent parameter which depends on the shape of the potential surface. The function $\gamma_p$ is a frictional constant determined by the internal motion of the protein such as intra-chain collisions. It reflects certain structural constraints within the protein and can be considered an expression of its topological complexity. The parameter $\gamma_s$ is a frictional constant generated by motion in the solvent; it is proportional to the solvent viscosity. The stability of the intermediate is determined by the equilibrium constant $K = [I]/[U]$, where $[U]$ and $[I]$ denote the equilibrium concentrations of the denatured state and the intermediate state, respectively. The stability of the native state is defined by the free energy difference $\Delta G = G(F) - G(U)$. This can be obtained from the experimental denaturation curve by using the expression $\Delta G = -RT \log \hat{K}$, where $\hat{K} = [F]/[U]$, and $[F]$ and $[U]$ denote the equilibrium concentrations of the native and denatured protein, respectively. The changes $\tilde{\Delta} K$ and $\tilde{\Delta} \hat{K}$ due to mutation are correlated. We have, by appealing to
Hammond’s postulate regarding reactivity–specificity relations, the property
$$\tilde{\Delta} K \cdot \tilde{\Delta} \hat{K} > 0. \tag{13}$$
We will exploit eqn (13) together with the correlations expressed in eqns (6), (9) and (10) to derive evolutionary trends in the folding rate $k$, the stability parameter $K$ which describes the intermediate state, and the stability parameter $\hat{K}$ which describes the native state. Items (III) and (IV) in the preceding section regarding evolution of rate constants will play a critical role in our analysis. The trends we now describe are all derived by analysing evolutionary changes in the kinetic and stability parameters $k$, $K$ and $\hat{K}$. Similar patterns were obtained in the study of evolutionary changes in the kinetic parameters that define enzyme catalysis, Demetrius (1998). We will refer to this article for details regarding the correspondence between the thermodynamic condition and the evolutionary trends we now delineate.
Type 1: $\Delta H^{\ddagger} > 0,\ \Delta S^{\ddagger} < 0$. This thermodynamic constraint entails, in view of eqn (6b), that the condition $\tilde{\Delta} k \cdot \tilde{\Delta} K < 0$ holds for mutations in the Type 1 category. We can now appeal to (IV) to infer that evolutionary changes in this protein family will be described by non-directional changes in the kinetic variables. In the evolutionary limit, this class of proteins will therefore be characterized by a large variability in the folding rate, and a large range of values for the stability parameters $K$ and $\hat{K}$. The latter property entails that stable intermediates and unstable final states may occur within this protein family.
Type 2: $\Delta H^{\ddagger} < 0,\ \Delta S^{\ddagger} < 0$; Type 3(a): $\Delta H^{\ddagger} < T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger}$. The thermodynamic condition now implies, in view of eqn (6a), that $\tilde{\Delta} k \cdot \tilde{\Delta} K > 0$ for mutants in the Type 2 and Type 3(a) families. We conclude from (III) that evolutionary changes in this protein family will be characterized by a uni-directional increase in the kinetic parameters.
The uni-directional increase in k; in view of the constraints on DH{ and DS{ ; imply that the global maximum in folding rates will be attained. This property, together with the directional
changes in K and K# entail that intermediates will be unstable and the folded state thermodynamically stable. Type 3(b): DH{ oTDC{ ; DS{ 4DC{ ; Type 3(c): DH{ 4TDC{ ; DS{ 4DC{ : The thermody* DK40 * namic condition also entails that Dk for mutants in this superfamily. We can therefore also infer a uni-directional increase in the kinetic variable k and the stability para# In view of the constraints on meters K and K: { { DH and DS which exist for this superfamily, the folding rate will tend to a local maximum. Accordingly, the folding rates, intermediate for Type 3(b) and slow for Type 3(c), will be distinctly inferior to the values defined by the Type 2 and Type 3a models. This feature, in view of the uni-directional increase in the parameters # imply that intermediates will be K and K; metastable and the folded configuration thermodynamically stable. These characterizations of the evolutionary dynamics of the different protein categories entail the following representation of the protein classes in the evolutionary limit. Type 1 proteins: variable folding rates, stable intermediates, unstable native state. Type 2 and Type 3(a): fast folding rates; unstable intermediates, stable native state. Type 3(b): intermediate folding rates, metastable intermediates, stable native state. Type 3(c): slow folding rates, metastable intermediates, stable native state. This correspondence between thermodynamic condition and kinetic variable is summarized in Table 2. We should emphasize at this juncture that the above description derives from the assumption that evolution occurred in a reducing environment, which, as we have postulated would hinder the formation of disulfide bridges. We now consider evolution under conditions where molecular oxygen is present, a much later stage in the evolutionary history of proteins. 
In this new model, we analyse the fixation and extinction dynamics of mutants that introduce disulfide bridges in the population, and we exploit the fact that these covalent crosslinks between cysteine
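Table 2. Relation between thermodynamic condition and folding kinetics

| Thermodynamic condition | Folding rate | Stability of intermediate | Nature of folding pathway |
|---|---|---|---|
| $\Delta H^{\ddagger} > 0$, $\Delta S^{\ddagger} < 0$ | Large variability | Stable | Multi-state |
| $\Delta H^{\ddagger} < 0$, $\Delta S^{\ddagger} < 0$ | Fast | Unstable | Two-state |
| $\Delta H^{\ddagger} < T\Delta C^{\ddagger}$, $\Delta S^{\ddagger} < \Delta C^{\ddagger}$ | Fast | Unstable | Two-state |
| $\Delta H^{\ddagger} > T\Delta C^{\ddagger}$, $\Delta S^{\ddagger} < \Delta C^{\ddagger}$ | Intermediate | Metastable | Multi-state |
| $\Delta H^{\ddagger} > T\Delta C^{\ddagger}$, $\Delta S^{\ddagger} > \Delta C^{\ddagger}$ | Slow | Metastable | Multi-state |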
residues can increase the stability of the folded state, Betz (1993), Doig & Sternberg (1995).

Now, in view of the instability of the folded state of Type 1 proteins, Type 1 mutants with disulfide bridges will have a selective advantage since these covalent links will necessarily confer stability to the folded state. Such mutants will increase in frequency and will ultimately become established in the population. However, the stability of the folded state in Type 2 and Type 3 proteins implies that, in evolution within these lineages, mutants with disulfide bridges will have a neutral effect on the stability of the folded state. Such mutants can only become established through random drift.

We can therefore predict the following correspondence between (a) the thermodynamic condition of the proteins, and (b) the incidence of disulfide bridges, and the kinetic and stability properties of the molecule.

D(I) Type 1 proteins: Incidence of disulfide bridges, large variability in folding rates, stable intermediates, thermodynamically stable native states.
D(II) Type 2 and Type 3(a): Absence of disulfide bridges, globally maximal folding rates, unstable intermediates, stable native state.
D(III) Type 3(b), Type 3(c): Absence of disulfide bridges, locally maximal folding rates, metastable intermediates, stable native states.
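The correspondence in Table 2 can be read as a simple decision rule. The following is a minimal sketch, not part of the original analysis: the function name `classify`, the `Kinetics` record and the numerical inputs are illustrative assumptions, and the conditions are taken as listed in Table 2. All quantities are assumed to be in consistent units (for example kJ/mol for the activation enthalpy and kJ/(mol K) for the activation entropy and heat capacity).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Kinetics:
    protein_type: str
    folding_rate: str
    intermediate: str
    pathway: str


def classify(dH: float, dS: float, dC: float, T: float) -> Optional[Kinetics]:
    """Map activation parameters to the categories of Table 2.

    dH, dS, dC: activation enthalpy, entropy and heat capacity (hypothetical inputs).
    T: absolute temperature. Returns None for combinations Table 2 does not list.
    """
    if dH > 0 and dS < 0:
        return Kinetics("Type 1", "large variability", "stable", "multi-state")
    if dH < 0 and dS < 0:
        return Kinetics("Type 2", "fast", "unstable", "two-state")
    if dH > 0 and dS > 0:  # Type 3 regime
        if dH < T * dC and dS < dC:
            return Kinetics("Type 3(a)", "fast", "unstable", "two-state")
        if dH > T * dC and dS < dC:
            return Kinetics("Type 3(b)", "intermediate", "metastable", "multi-state")
        if dH > T * dC and dS > dC:
            return Kinetics("Type 3(c)", "slow", "metastable", "multi-state")
    return None


# Illustrative (made-up) values at T = 300 K.
example = classify(dH=400.0, dS=0.5, dC=1.0, T=300.0)
print(example)
```

Combinations that fall outside the five listed conditions are returned as `None` rather than forced into a category, since the text does not assign them a type.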
Protein Structure and Thermodynamic Parameters

Our ultimate goal is to predict kinetics by appealing to the molecular geometry of the protein, as defined by the complexity of the domain structure (single-domain chains, multi-domain chains).

The complexity of the domain structure of a protein is known to determine the degree of cooperativity of the folding process, and the structural changes in the molecule during the transformation from the unfolded to the transition state. These structural changes can be parameterized in terms of the degree of compactness of the transition state and the uniformity of the interatomic interactions, properties which have thermodynamic representations. We will exploit these observations to relate the classification based on the domain structure with the categorization based on the activation parameters. This relation, which we will derive only for proteins which lack disulfide bridges, is based on two patterns of correspondence, the first relating domain structure with the dynamics of folding pathways, the second relating the dynamics of the folding pathway with the thermodynamic condition. The correspondence we describe is based on empirical studies of folding dynamics by Fersht (1995) using CI2, a single-domain chain, and barnase, a multi-domain protein.

(I) Domain structure and folding dynamics. A domain is a unit of tertiary structure. Domains constitute structural and functional folding units of a protein. Evidence that two regions of the same protein are domains derives from the fact that a limited cleavage of the polypeptide releases independent fragments of the protein which retain their respective native structures.

We will distinguish between three structural classes, defined by domain multiplicity: single-domain chains, multiple-domain monomers, multiple-domain oligomers. We will then parameterize the three structures in terms of the following properties that define the transition state. (i) The compactness of the transition state;
that is, the extent to which the elements that make up the secondary and tertiary structures are integrated to form a relatively rigid entity. (ii) The range of the interaction; that is, the extent to which the interactions do not segregate into regions that make more secondary or tertiary interactions within themselves than they do with neighboring regions.

Single-domain proteins: Folding in this class of molecules proceeds by a nucleation–condensation mechanism. This leads to the construction of a transition state around an extended delocalized nucleus. Proteins in this class will be characterized by compact transition states and short-range interatomic interactions.

Multi-domain monomers: Folding in this family of proteins does not proceed as a single cooperative unit. There is rapid formation of the individual subunits followed by association to form folded monomers. The rapidity of the formation of the individual units induces a highly compact transition state, as in the case of single-domain proteins. However, the independent folding dynamics of subunits entails that the interatomic interactions will be long range.

Multi-domain oligomers: The folding subunits are monomers, which are relatively stable in isolation. The units fold independently, thus leading to a heterogeneous transition state. Folding in this case will be characterized by a non-compact transition state. The interactions, as in the folding dynamics of multi-domain monomers, will be long range.

The correspondence between domain structure and folding pathways which we have delineated can be summarized as follows:

I (a) Single-domain chains: compact transition states, short-range interactions.
I (b) Multi-domain monomers: compact transition states, long-range interactions.
I (c) Multi-domain oligomers: non-compact transition state, long-range interactions.

(II) Folding dynamics and the thermodynamic condition.
The folding dynamics, as parameterized by the degree of uniformity of the interatomic interactions, and the degree of compactness of the transition state, can be expressed in terms of certain expansivity parameters that describe changes in activation enthalpy and activation entropy with respect to temperature. Our argument is based on the assumption that changes in the structure of the proteins with respect to changes in temperature are determined by the degree of uniformity and the degree of compactness of the transition state. We write

$$\alpha = \frac{\partial \log(\Delta H^{\ddagger})}{\partial \log T}, \qquad \beta = \frac{\partial \log(\Delta S^{\ddagger})}{\partial \log T}. \qquad (14)$$
The parameters $\alpha$ and $\beta$ describe the fractional increase in enthalpy and entropy, respectively, due to a fractional increase in temperature.

Now the enthalpy $\Delta H^{\ddagger}$ represents the change in the extent of interaction between residues during the transformation from the unfolded to the transition state. The parameter $\alpha$ is a measure of the relative expansivity of this property, a condition which will depend on the range of the interatomic interactions. The condition $\alpha > 1$ describes an increase in expansivity with temperature and hence corresponds to a transition state with short-range interactions between residues; $\alpha < 1$ describes a decrease in expansivity and represents the condition of long-range interactions.

The entropy $\Delta S^{\ddagger}$ represents the change in the degree of disorder of the polypeptide chain during the transformation to the transition state. The parameter $\beta$ is a measure of the relative expansivity of the change in disorder, a property which depends on the degree of compactness of the transition state. The condition $\beta > 1$ describes an increase in expansivity with temperature and hence corresponds to a compact transition state; $\beta < 1$ represents a non-compact transition state.

In view of the preceding observations, the relations between the structure of the protein in the transition state and the conditions on the parameters $\alpha$ and $\beta$ can be summarized as follows:

II (a) Short-range interactions, compact transition state: $\alpha > 1$, $\beta > 1$.
II (b) Long-range interactions, compact transition state: $\alpha < 1$, $\beta > 1$.
II (c) Long-range interactions, non-compact transition state: $\alpha < 1$, $\beta < 1$.
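As an illustration of how the expansivity parameters of eqn (14) might be estimated in practice, the sketch below computes a logarithmic derivative by central differences and maps the result onto categories II (a) to II (c). The helper names `log_derivative` and `transition_state_class`, and the power-law temperature dependences used as input, are hypothetical choices made for the example; they are not taken from the article. The logarithm requires positive activation enthalpy and entropy, so the sketch applies only in that regime.

```python
import math
from typing import Callable


def log_derivative(f: Callable[[float], float], T: float, rel_step: float = 1e-4) -> float:
    """Central-difference estimate of d log f / d log T at T (assumes f > 0 near T)."""
    up, down = T * (1.0 + rel_step), T * (1.0 - rel_step)
    return (math.log(f(up)) - math.log(f(down))) / (math.log(up) - math.log(down))


def transition_state_class(alpha: float, beta: float) -> str:
    """Assign the category II (a)-(c) from the expansivity parameters."""
    if alpha > 1 and beta > 1:
        return "II (a): short-range interactions, compact transition state"
    if alpha < 1 and beta > 1:
        return "II (b): long-range interactions, compact transition state"
    if alpha < 1 and beta < 1:
        return "II (c): long-range interactions, non-compact transition state"
    return "not covered by II (a)-(c)"


# Hypothetical temperature dependences (power laws chosen only for illustration).
def dH(T: float) -> float:
    return 2.0 * T ** 0.5


def dS(T: float) -> float:
    return 1.0e-4 * T ** 2.0


alpha = log_derivative(dH, 300.0)
beta = log_derivative(dS, 300.0)
label = transition_state_class(alpha, beta)
print(alpha, beta, label)
```

For a pure power law the logarithmic derivative equals the exponent, which makes this a convenient sanity check before applying the estimate to measured activation parameters.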
(III) Domain structure and thermodynamic condition. We now invoke (I) and (II) to infer a correspondence between domain multiplicity and thermodynamic condition. We first use (I) and (II) to infer a relation between domain multiplicity and the expansivity parameters, as defined by eqn (14). We have

(i) single-domain proteins: $\alpha > 1$, $\beta > 1$;
(ii) multi-domain monomers: $\alpha < 1$, $\beta > 1$;
(iii) multi-domain oligomers: $\alpha < 1$, $\beta < 1$.

Now $\Delta C^{\ddagger} = T\,\partial(\Delta S^{\ddagger})/\partial T = \partial(\Delta H^{\ddagger})/\partial T$. Hence from eqn (14), we have

$$\alpha = \frac{T\Delta C^{\ddagger}}{\Delta H^{\ddagger}}, \qquad \beta = \frac{\Delta C^{\ddagger}}{\Delta S^{\ddagger}}.$$

These expressions for $\alpha$ and $\beta$ yield, when the condition $\Delta H^{\ddagger} > 0$, $\Delta S^{\ddagger} > 0$ obtains, the following series of implications:

$$\Delta H^{\ddagger} < T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger} \iff \alpha > 1,\ \beta > 1;$$
$$\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} < \Delta C^{\ddagger} \iff \alpha < 1,\ \beta > 1;$$
$$\Delta H^{\ddagger} > T\Delta C^{\ddagger},\ \Delta S^{\ddagger} > \Delta C^{\ddagger} \iff \alpha < 1,\ \beta < 1.$$

The above implications were derived for the cases where $\Delta H^{\ddagger} > 0$, $\Delta S^{\ddagger} > 0$. This condition defines the Type 3 proteins. The preceding analysis thus refers to the Type 3(a), Type 3(b) and Type 3(c) molecules.

We now consider the Type 2 case ($\Delta H^{\ddagger} < 0$, $\Delta S^{\ddagger} < 0$). We note that when $\Delta H^{\ddagger} < 0$, the transition state will necessarily be described by short-range interatomic interactions, and when $\Delta S^{\ddagger} < 0$, the state is characterized by a decrease in disorder, a property which corresponds to a compact transition state.

We can now appeal to the above series of observations to infer the following correspondence between structure and thermodynamic condition for proteins which lack intra- and inter-chain disulfide bridges:

E (I) Single-domain chains: Type 2, Type 3(a);
E (II) Multi-domain monomers: Type 3(b);
E (III) Multi-domain oligomers: Type 3(c).

PROTEIN STRUCTURE AND FOLDING KINETICS
The evolutionary analysis based on transition state theory and evolutionary dynamics has established a correspondence between thermodynamic condition and the folding mechanism (see Table 2). The argument based on structure–activity relations has determined a correspondence between protein structure and the thermodynamic condition, as expressed in D(I), D(II), D(III), relating the incidence of disulfide bridges with the thermodynamic state; and in E(I), E(II) and E(III), relating domain structure with the thermodynamic state. These properties can be integrated to yield a series of relations between (a) protein structure, as defined by the multiplicity of the subunits, and (b) the folding kinetics, as expressed by intrinsic folding rates and the stability of intermediates. This correspondence is summarized in Table 1.

Folding Dynamics in Single-domain Chains

Proteins are known to show a large variability in folding rates, from the submillisecond range to minutes. The analysis we have described codifies and rationalizes this kinetic diversity. In the case of single-domain chains which lack disulfide bridges, our model predicts that folding kinetics will be two state and the folding rate will be given by eqn (12). Accordingly, the folding rate will be determined by (i) the height of the energy barrier $\Delta G^{\ddagger}$; (ii) the intra-chain collisions, as expressed by $\gamma_p$; (iii) the chain–solvent interactions, as reflected in $\gamma_s$.

The parameter $\gamma_p$, a measure of the topological complexity of the molecule, may show large variation. Consequently, the folding rates within the class of single-domain chains will be highly diverse: maximal rates being attained by molecules defined by negligible intra-chain collisions (topologically complex chains, $\gamma_p \neq 0$).

In view of eqn (12), and the uni-directional increase in the folding rate $k$ and the stability parameter $K$ which the evolutionary model predicts, it follows that the folding rate and the thermodynamic stability of the folded state will be positively correlated.

Table 3, adapted from Jackson (1998), gives a list of single-domain chains which lack disulfide
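Table 3. Single-domain proteins with two-state kinetics

| Protein | Structure | Chain length | Folding rate $k$ (s$^{-1}$) |
|---|---|---|---|
| Acyl-coenzyme A binding protein | $\alpha$ | 86 | 704 |
| Cytochrome C | $\alpha$ | 104 | 2800 |
| Cold shock protein (CspB) | $\beta$ | 67 | 1070 |
| Chymotrypsin inhibitor (CI2) | $\alpha/\beta$ | 64 | 48 |
| Ubiquitin | $\alpha/\beta$ | 76 | 1532 |
| FKBP12 | $\alpha/\beta$ | 107 | 4.3 |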
bridges. These proteins, in view of the large differences in topological complexity, exhibit a large variability in folding rates. Their kinetic behavior, however, is described by two-state transitions, in accord with our predictions. The list in Table 3 is restricted to molecules which function in the cell as independent units. We thus exclude polypeptide chains which are small fragments of larger proteins. Such chains, for example, the IgG binding domain of streptococcal protein L and the SH3 domains, are not independent entities; accordingly, their kinetics may not be determined by the dynamical principles which underlie the evolution of functional molecules.
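To make the roles of the three factors in eqn (12) concrete, the following sketch evaluates the Kramers-type rate for given friction constants and barrier height, and computes the spread of the folding rates listed in Table 3. The function name `kramers_rate`, the choice of kJ/mol for the barrier, and the numerical inputs in the usage lines are assumptions made for illustration; only the Table 3 rates are taken from the article.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def kramers_rate(A: float, gamma_p: float, gamma_s: float, dG_act: float, T: float) -> float:
    """Folding rate k = A / (gamma_p + gamma_s) * exp(-dG_act / (R T)), as in eqn (12).

    A: prefactor set by the shape of the potential surface (hypothetical value).
    gamma_p, gamma_s: internal and solvent friction constants.
    dG_act: activation free energy in kJ/mol; T: absolute temperature in K.
    """
    return A / (gamma_p + gamma_s) * math.exp(-dG_act / (R * T))


# At fixed barrier height, halving the total friction doubles the rate.
k_slow = kramers_rate(A=2.0, gamma_p=1.0, gamma_s=1.0, dG_act=10.0, T=298.0)
k_fast = kramers_rate(A=2.0, gamma_p=0.5, gamma_s=0.5, dG_act=10.0, T=298.0)

# Folding rates from Table 3 (s^-1), in table order.
table3_rates = [704.0, 2800.0, 1070.0, 48.0, 1532.0, 4.3]
rate_spread = max(table3_rates) / min(table3_rates)
print(k_fast / k_slow, rate_spread)
```

The spread between the fastest and slowest entries of Table 3 is nearly three orders of magnitude, which is the variability the text attributes to differences in topological complexity.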
Conclusion

This article invokes an evolutionary dynamic model of the energy landscape in protein folding to explain the large diversity which characterizes folding rates in proteins. The model rests on the idea that disulfide bridges act to stabilize the folded state and constitute late additions in the evolutionary history of proteins. The model predicts that, for proteins which lack disulfide bridges, folding dynamics in single-domain chains will be two state and the folding rate will be positively correlated with the stability of the native state. These predictions provide an evolutionary rationale for a large class of empirical observations and lattice model simulations.
REFERENCES

Abkevich, V. I. & Shakhnovich, E. I. (2000). What can disulfide bonds tell us about protein energetics, function and folding: simulations and bioinformatics analysis. J. Mol. Biol. 300, 975–985.
Betz, S. (1993). Disulfide bonds and the stability of globular proteins. Protein Sci. 2, 1551–1558.
Bryngelson, J., Onuchic, J., Socci, N. & Wolynes, P. G. (1995). Funnels, pathways and the energy landscape of protein folding: a synthesis. Proteins: Struct., Function Genet. 21, 167–195.
Chan, H. & Dill, K. (1998). Protein folding in the landscape perspective: chevron plots and non-Arrhenius kinetics. Proteins: Struct., Function Genet. 30, 2–33.
Clarke, J. & Fersht, A. R. (1993). Engineered disulfide bonds as probes of the folding pathway of barnase: increasing the stability of proteins against the rate of denaturation. Biochemistry 32, 4322–4329.
Davies, S. M. A., Kelly, S. M., Price, N. C. & Bradshaw, J. P. (1998). Structural plasticity of the feline leukaemia virus fusion peptide: a circular dichroism study. FEBS Lett. 425, 415–418.
Demetrius, L. (1992). Thermodynamic perturbation of molecular systems. J. Chem. Phys. 97, 6663–6667.
Demetrius, L. (1995). Evolutionary dynamics of enzymes. Protein Eng. 8, 791–800.
Demetrius, L. (1998). The role of enzyme–substrate flexibility in catalytic action: an evolutionary perspective. J. theor. Biol. 194, 175–194.
Dill, K. A. (1993). Folding proteins: finding a needle in a haystack. Curr. Opin. Struct. Biol. 3, 99–103.
Doig, A. & Sternberg, M. (1995). Side-chain conformational entropy in protein folding. Protein Sci. 4, 2247–2251.
Fersht, A. R. (1995). Mapping the structures of transition states and intermediates in folding: delineation of pathways at high resolution. Philos. Trans. R. Soc. London B 348, 11–15.
Gomez, J., Hilser, V. J., Xie, D. & Freire, E. (1995). The heat capacity of proteins. Proteins: Struct., Function Genet. 22, 404–412.
Hänggi, P., Talkner, P. & Borkovec, M. (1990). Rev. Mod. Phys. 62, 251.
Jackson, S. (1998). How do small proteins fold? Folding Des. 3, R81–R91.
Jencks, W. P. (1975). Binding energy, specificity and enzyme catalysis: the Circe effect. Adv. Enzymol. 43, 219–410.
Kabsch, W. & Sander, C. (1984). On the use of sequence homologies to predict protein structure: identical pentapeptides can have different conformations. Proc. Natl Acad. Sci. U.S.A. 81, 1075–1078.
Maskill, H. (1985). The Physical Basis of Organic Chemistry. Oxford: Oxford University Press.
Matthews, J. M. & Fersht, A. R. (1995). Exploring the energy surface of protein folding by structure-reactivity relationships and engineered proteins: observations of Hammond behavior for the gross structure of the transition state and anti-Hammond behavior for structural elements for unfolding/folding of barnase. Biochemistry 34, 6805–6814.
Plaxco, K. W., Simons, K. T. & Baker, D. (1998). Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277, 985–994.
Schindler, T. & Schmid, F. X. (1996). Thermodynamic properties of an extremely rapid folding reaction. Biochemistry 35, 16833–16842.
Tan, Y.-J., Oliveberg, M. & Fersht, A. R. (1996). Titration properties and thermodynamics of the transition state for folding: comparison of two state and multistate folding pathways. J. Mol. Biol. 264, 377–389.
Thirumalai, D. & Klimov, D. (1999). Deciphering the time scales and mechanisms of protein folding using minimal off-lattice models. Curr. Opin. Struct. Biol. 9, 197–207.
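Appendix A

Thermodynamic Perturbation Relations

Let $\Omega$ denote the configuration space of the state $X$ and let $V$ denote the potential function defined on $\Omega$. The partition function $Z$ of the system is given by

$$Z = \int_{\Omega} \exp(-\beta V)\, d\mu. \qquad (A.1)$$

Here $\beta = 1/kT$, $T$ the absolute temperature, and $d\mu$ the probability measure assigned to each configuration in the space $\Omega$. The thermodynamic functions can all be expressed in terms of the potential function $V$:

$$G = -kT \log \int_{\Omega} \exp(-\beta V)\, d\mu, \qquad (A.2)$$

$$H = \int V g\, d\mu, \qquad S = -k \int g \log g\, d\mu, \qquad (A.3)$$

$$C = \frac{1}{kT^2}\left[\int V^2 g\, d\mu - \left(\int V g\, d\mu\right)^2\right], \qquad (A.4)$$

$$g = \frac{\exp(-\beta V)}{\int \exp(-\beta V)\, d\mu}. \qquad (A.5)$$

Consider a perturbation of the potential function $V$ given by $V(\delta) = V + \delta V$, where $\delta$ denotes the magnitude of the perturbation, and let $\Delta G$, $\Delta H$ and $\Delta S$ denote the changes in the thermodynamic variables induced by the changes in $V$. Since $G = H - TS$, we have $\Delta G = \Delta H - T\Delta S$. We now invoke a perturbation analysis to evaluate $\Delta G$ and $\Delta S$.

(1) The function $\Delta G$: We note from eqn (A.2) that

$$\frac{dG(\delta)}{d\delta} = -\frac{1}{\beta}\,\frac{Z'(\delta)}{Z(\delta)},$$

where $Z(\delta) = \int_{\Omega} \exp[-\beta(V + \delta V)]\, d\mu$. Since $Z'(\delta) = -\int \beta V \exp[-\beta(V + \delta V)]\, d\mu$, we obtain $dG/d\delta = \int V g(\delta)\, d\mu$. Hence for $\delta$ small, $\Delta G = \delta \int V g(0)\, d\mu$. By appealing to the definition of $H$ given in eqn (A.3), we have

$$\Delta G = \delta H.$$

(2) The function $\Delta S$: Now $G = H - TS$. Hence $T\, dS/d\delta = dH/d\delta - dG/d\delta$. From the definition of $H$, we have

$$\frac{dH}{d\delta} = \int (V + \delta V)\, g'(\delta)\, d\mu + \int V g(\delta)\, d\mu.$$

Hence $T\, dS/d\delta = \int (V + \delta V)\, g'(\delta)\, d\mu$. Now $g'(\delta) = -\beta V g(\delta) + \beta g(\delta) \int V g(\delta)\, d\mu$. Hence, to leading order in $\delta$, $T\, dS/d\delta = -\beta\left[\int V^2 g(\delta)\, d\mu - \left(\int V g(\delta)\, d\mu\right)^2\right]$. Assuming $\delta$ is small, we obtain

$$\Delta S = -\frac{\delta}{kT^2}\left[\int V^2 g\, d\mu - \left(\int V g\, d\mu\right)^2\right].$$

In view of eqn (A.4), we obtain

$$\Delta S = -\delta C.$$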
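The perturbation relations of this appendix can be checked numerically on a small discrete configuration space. The sketch below is an illustration under stated assumptions: it uses the canonical sign conventions $G = -kT \log Z$ and $S = -k \int g \log g\, d\mu$, units with $k = T = 1$, and an arbitrary four-state potential; the function name `thermo` and all numerical values are hypothetical. With these conventions the first-order change in $G$ is $\delta H$ and the first-order change in $S$ is $-\delta C$.

```python
import math
from typing import List, Tuple


def thermo(V: List[float], w: List[float], k: float = 1.0, T: float = 1.0) -> Tuple[float, float, float, float]:
    """Return (G, H, S, C) for potential values V on states with probability weights w."""
    beta = 1.0 / (k * T)
    Z = sum(wi * math.exp(-beta * vi) for vi, wi in zip(V, w))   # partition function
    g = [math.exp(-beta * vi) / Z for vi in V]                   # density relative to the measure
    G = -k * T * math.log(Z)
    H = sum(wi * vi * gi for vi, wi, gi in zip(V, w, g))
    S = -k * sum(wi * gi * math.log(gi) for wi, gi in zip(w, g))
    second_moment = sum(wi * vi * vi * gi for vi, wi, gi in zip(V, w, g))
    C = (second_moment - H * H) / (k * T * T)
    return G, H, S, C


V = [0.0, 0.5, 1.0, 2.0]          # arbitrary illustrative potential
w = [0.25, 0.25, 0.25, 0.25]      # uniform probability measure
delta = 1e-4                      # small perturbation V -> V + delta * V

G0, H0, S0, C0 = thermo(V, w)
G1, H1, S1, _ = thermo([(1.0 + delta) * v for v in V], w)

dG_error = abs((G1 - G0) - delta * H0)     # deviation from Delta G = delta * H
dS_error = abs((S1 - S0) + delta * C0)     # deviation from Delta S = -delta * C
print(dG_error, dS_error)
```

Both deviations are second order in $\delta$, so they shrink roughly a hundredfold when $\delta$ is reduced tenfold.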