Lecture 21 In this lecture, we shall continue to discuss protein folding. All the experimental data we discussed, though very interesting by themselves, cannot answer the main question as to how a protein manages to find (Anfinsen, 1973; Levinthal, 1968) its native structure among zillions of others within those minutes or seconds that are assigned for its folding. And the number of alternative structures is vast indeed: it is at least 2100 but may be 3100 or even 10100 for a 100-residue chain, because at least 2 (but more likely 3 or 10) conformations are possible for each residue. Since the chain cannot pass from one conformation to another faster than within a picosecond (the time of a thermal vibration), their exhaustive search would take at least 2100 picoseconds (or 3100 or even 10100), that is, 1010 (or 1025 or even 1080) years. And it looks like the sampling has to be really exhaustive, as the protein can ‘feel’ that it has come to the stable structure only when it hits it precisely, ˚ deviation can strongly increase the chain energy in the because even a 1 A closely packed globule. How does the protein choose its native structure among zillions of possible others?, asked Levinthal (who first noticed this paradox), and answered (Levinthal, 1968, 1969): It seems that the protein folding follows some specific pathway, and the native fold is simply the end of this pathway, no matter if it is the most stable chain fold or not. In other words, Levinthal suggested that the native protein structure is determined by kinetics rather than stability and corresponds to the easily accessible local rather than the global free energy minimum. The difficulty of this problem is that it cannot be solved by direct experiment. Indeed, suppose that the protein has some structure that is more stable than the native one. How can we find it if the protein does not do so itself? Shall we wait for 1010 (or even 1080) years? On the other hand, the question as to whether the protein structure is controlled by kinetics or stability arises again and again, when one has to solve practical problems of protein physics and engineering. For example, in predicting a protein’s structure from its sequence, what should we look for? The most stable or the most rapidly folding structure? In designing a protein de novo, should we maximize the stability of the desired fold, or create a rapid pathway to this fold? However, is there a real contradiction between “the most stable” and the “rapidly folding” structure? Maybe, the stable structure automatically forms a focus for the “rapid” folding pathways, and therefore it is automatically capable of fast folding? Before considering these questions, ie, before considering the kinetic aspects of protein folding, let us recall some basic facts concerning protein thermodynamics (as before, I will talk about single-domain proteins only, ie, chains Protein Physics. http://dx.doi.org/10.1016/B978-0-12-809676-5.00021-1 © 2016 Elsevier Ltd. All rights reserved.
323
324 PART
V Cooperative Transitions in Protein Molecules
of 50–200 residues). These facts will help us to understand what chains and what folding conditions we have to consider. The facts are as follows: 1. Protein unfolding is reversible, and it occurs as an “all-or-none” transition. The latter means that only two states of the protein molecule, native and denatured, are present (close to the denaturation point) in a visible quantity, while all others, semi-native or misfolded, are virtually absent. Such a transition (as we already know) requires an amino acid sequence that provides a large energy gap between the most stable structure and the bulk of misfolded ones. 2. The denatured state, at least that of small proteins unfolded by a strong denaturant, is often the random coil. 3. Even under normal physiological conditions the native state of a protein is only more stable than its unfolded state by a few kilocalories per mole (and these two states have equal stability at mid-transition, naturally). The native structure is stable because of its low energy, ie, because of strong interactions within this structure, and the coil state is stable because of its high entropy, that is, because of the vast number of unfolded conformations. Inner voice: You say “entropy” and then mention solely “the vast number of unfolded conformations”, as if solvent entropy does not contribute to the total entropy! Lecturer: It is essential that you note the following: (1) as is customary in the literature on this subject, the term “entropy” as applied to protein folding means only the conformational entropy, and this “entropy” does not include the solvent entropy; (2) correspondingly, the term “energy” means, actually, the “free energy of interactions” (often called the “mean force potential”) since, eg, the hydrophobic and other solvent-mediated forces, with all their solvent entropy, are included in the “energy”. This terminology is commonly used to concentrate on the main problem of sampling the protein chain conformations. Throughout this lecture I will use the terms “energy” and “entropy” only in the above mentioned sense, which allows me to avoid speaking on the solvent—or rather, to take it into consideration in an implicit way. Thus, to solve the Levinthal’s paradox… Inner voice: Are you sure that the “Levinthal’s paradox” is a paradox indeed? Bryngelson and Wolynes (1989) have already mentioned that this ‘paradox’ is based on the absolutely flat (and therefore unrealistic) ‘golf course’ model of the protein potential energy surface (Fig. 21.1A), and somewhat later, Leopold et al. (1992), following the line of Go and Abe (1981), considered more realistic (tilted and biased to the protein’s native structure) energy surfaces and introduced the ‘folding funnels’ (Fig. 21.1B), which seems to eliminate the “paradox” completely!
Lecture 21
325
FIG. 21.1 (A) The “golf course” model of the protein potential energy landscape. (B) The “funnel” model of the protein potential energy landscape. The funnel is centered in the lowest-energy (“native”) structure. (C) In more detail: the bumpy potential energy landscape of a protein chain. A wide (of many kBT) energy gap between the global and other energy minima is necessary to provide the “all-or-none” type of decay of the stable protein structure. The gap makes the protein function specific: like a light bulb, either the protein works or it does not. Only two coordinates (q1 and q2) can be shown in the drawings, while the protein chain conformation is determined by hundreds of coordinates.
Lecturer: It’s not as simple as that… The problem of huge sampling time exists: it has been mathematically proven that, despite the folding funnels and so on, finding the lowest free-energy conformation of a protein chain is the so-called “NP-hard” problem (Ngo and Marks, 1992; Unger and Moult, 1993), which, loosely speaking, requires an exponentially large time to be solved (by a folding chain or by a man). But, indeed, various “folding funnel” models became popular for explaining and illustrating protein folding (Wolynes et al., 1995; Karplus, 1997; N€olting, 2010). The “energy funnel”, centered in the lowest-energy structure, seems to allow protein chains to avoid the “Levinthal’s” sampling all conformations. However, it can be shown that the energy funnels per se do not solve the Levinthal’s paradox—this, as Eugene Shakhnovich used to say, “Fermat’s Last Theorem of protein science”… Strict analysis (Bogatyreva and Finkelstein, 2001) of the straightforwardly presented funnel models (Zwanzig et al., 1992; Bicout and Szabo, 2000) shows that they cannot simultaneously explain both major features observed in protein folding: (1) its non-astronomical time, and (2) co-existence of the native and unfolded protein molecules during the folding process. By the way, a stepwise mechanism of protein folding (which we discussed recently) also cannot (Finkelstein, 2002) simultaneously explain both these major features observed in protein folding. Thus, neither stepwise nor simple funnel mechanisms solve the Levinthal’s problem, although they give a hint of what accelerates protein folding. A solution of the paradox is provided by special (so-called “capillarity”; see Wolynes, 1997) nucleation funnels (Finkelstein and Badretdinov, 1997a,b), allowing for separation of the unfolded and native phases within the folding chain. It is this solution that I am going to present.
326 PART
V Cooperative Transitions in Protein Molecules
So, I continue. To solve the “Levinthal paradox” and to show that the most stable chain fold can be found within a reasonable time, we could, to a first approximation, consider only the rate of the “all-or-none” transition between the coil and the most stable structure. And we may consider this transition only for the crucial case when the most stable fold is as stable as (or only a little more stable than) the coil, all other forms of the chain being unstable, ie, close to the “all-or-none” transition midpoint. Here the analysis can be made in the simplest form, without accounting for accumulating intermediates. True, the maximum folding rate is achieved when the native fold is considerably more stable than the coil, and then observable intermediates often arise. But let us first consider the situation when the folding is not the fastest but the simplest. Since, as you may remember, the “all-or-none” transition requires a large energy gap between the most stable structure and the misfolded ones (Fig. 21.1C), we will assume that the considered amino acid sequence provides such a gap. I am going to show you that the “gap condition” provides a rapid folding pathway to the global energy minimum, to estimate the rate of folding and to prove that the most stable structure of a normal size domain can fold within seconds or minutes. To prove that the most stable chain structure is capable of rapid folding, it is sufficient to prove that at least one rapid folding pathway leads to this structure. Additional pathways can only accelerate the folding since the rates of parallel reactions are additive. (One can imagine water leaking from a full to an empty pool through cracks in the wall between them: when the cracks cannot absorb all the water, each additional crack accelerates filling of the empty pool. And, by definition of the “all-or-none” transition, all semi- and misfolded forms together are too unstable to absorb a significant fraction of the folding chains and trap them.) To be rapid, the pathway must consist of not too many steps, and most importantly, it must not require overcoming of a too high free-energy barrier. An L-residue chain can, in principle, attain its lowest-energy fold in L steps, each adding one fixed residue to the growing structure (Fig. 21.2). If the free energy went downhill along the entire pathway, a 100-residue chain would fold in 100–1000 ns, since the growth of a structure (eg, an α-helix) by one residue is known to take a few nanoseconds (Zana, 1975). Protein folding takes seconds or minutes rather than a microsecond because of the free-energy barrier: most of the folding time is spent on climbing up this barrier and falling back, rather than on moving along the folding pathway. You should remember that, according to conventional transition state theory, the time of the process is estimated as: (21.1) TIME τ exp + ΔF# =RT where τ is the time of one step, and ΔF# the height of the free-energy barrier. For protein folding, τ is about 1–10 ns (according to the experimentally measured time of the growth of an α-helix by one residue; see Zana, 1975).
Lecture 21
327
FIG. 21.2 Sequential folding pathway. At each step one residue leaves the coil and takes its final position in the lowest-energy structure. The folded part (shaded) is compact and native-like. The bold line shows the backbone fixed in the already folded part; the fixed side chains are not shown for the sake of simplicity (the volume that they occupy is shaded). The broken line shows the still unfolded chain. A rapid pathway requires compactness of the folded phase, ie, the minimal border between folded and unfolded phases. (Adapted from Finkelstein, A.V., Badretdinov, A.Ya., 1997b. Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold. Des. 2, 115–121.)
As for ΔF#, this is our main question: how high is the free-energy barrier F# on the pathway leading to the lowest-energy structure? Folding of this structure decreases both the chain entropy (because of an increase in the chain’s ordering) and its energy (because of the formation of contacts stabilizing the lowest-energy fold). The former increases and the latter decreases the free energy of the chain. If the fold-stabilizing contacts start to arise only when the chain comes very close to its final structure (ie, if the chain has to lose almost all its entropy before the energy starts to decrease), the initial free energy increase would form a very high free-energy barrier (proportional to the total chain entropy lost). The Levinthal paradox claiming that the lowest-energy fold cannot be found within any reasonable time since this involves exhaustive sampling of all chain conformations originates exactly from this picture (loss of the entire entropy before the energy gain). However, this paradox can be avoided if there is a folding pathway where the entropy decrease is immediately or nearly immediately compensated for by the energy decrease. Let us consider a sequential (Fig. 21.2) folding pathway, which was first suggested by Wetlaufer (1973). At each step of this process, one residue leaves the coil and takes its final position in the lowest-energy 3D structure. Inner voice: This pathway looks a bit artificial… How can the residue know its position in the lowest-energy fold? Lecturer: As I told, we have to trace one pathway leading to the native fold. And the outlined pathway does so. As for the impression of its being artificial: look, it is exactly the pathway that you expect to see watching the movie on unfolding, but in the opposite direction.
328 PART
V Cooperative Transitions in Protein Molecules
Inner voice: And why cannot the protein use one way to unfold and quite another to fold? Lecturer: It cannot: the direct and reverse reactions follow the same pathway(s) under the same ambient conditions (and we have already agreed to consider the fixed ambient conditions corresponding to the mid-point of the folding-unfolding equilibrium). This is the detailed balance law of physics. Its proof is very simple: if the direct and reverse reactions followed different pathways under the fixed ambient conditions, a perpetual circular flow would arise in equilibrium. And you could use it to rotate a small turbine. That is, your suggestion would lead to a device (called a perpetuum mobile of the secondorder) that converts surrounding heat into work. And, as you know, or should know, the second law of thermodynamics, the law of maximum possible entropy, states that such a perpetuum mobile is impossible (Landau and Lifshitz, 1980). (Being more specific, the direct and reverse reactions must follow the same pathway under the same conditions, but under different conditions the pathways can be different, of course. That is, folding in water need not follow the same pathway as unfolding in concentrated denaturant; but at the same (fixed) denaturant concentration (eg, at mid-transition) they must use the same pathway.) Inner voice: Still, a movie about explosion of a building, even watched in the opposite direction, is quite different from a movie about building construction … Lecturer: Both construction and explosion proceed at the expense of a huge energy (or, to be more precise, free energy): fuel, manpower, explosives. There are no “fixed ambient conditions” for the building. On the contrary, protein folding and unfolding do not consume any “fuel”, and, as the chevron plots show (see Fig. 21.7 in the last part of this lecture), they can occur around and even at the equilibrium point (Fersht, 1999). This fact (I have stressed it many times) is very significant for an understanding of protein folding. Here, near the equilibrium point, the free energy difference between the folded and unfolded states is very small: zero or about kBT. And, at the very mid-transition (ie, the point where unfolding is in equilibrium with folding and the free energy difference is completely absent), these two processes occur simultaneously under the same ambient conditions, and here the direct and reverse reactions must follow exactly the same pathway and equilibrate one another. Inner voice: So, you are going to estimate the time of protein unfolding (not folding!), and then use the detailed balance law to prove that the folding time is exactly the same? Lecturer: Precisely. I shall estimate the protein unfolding time because it is easier (it is easier to outline a good unfolding pathway than a good folding pathway!) and, according to the detailed balance law, the folding time, under the same ambient conditions, is exactly the same as the unfolding time.
Lecture 21
329
And then, to complete the solution of Levinthal’s problem, we shall come back to considering the folding process. Thus, let us consider the energy change ΔE, the entropy change ΔS and the resultant free energy change ΔF ¼ ΔE TΔS along the sequential (Fig. 21.2) folding pathway. When a piece of the final globule grows sequentially, the interactions that stabilize the final fold are restored sequentially as well. If the folded piece remains compact, as in Fig. 21.2, the number of restored interactions grows (and their total energy decreases) approximately in proportion to the number n of residues that have taken their final positions (Fig. 21.3A). Approximately in proportion—but with one significant deviation: At the beginning of folding, the energy decrease is a little slower, since the contact of a newly joined residue with the surface of a small globule is, on average, smaller than its contact with the surface of a large globule. This results in a
FIG. 21.3 The change of energy (A), entropy and (B) free energy (C) along the sequential folding pathway (D) close to the point of thermodynamic equilibrium between the coil (n ¼ 0) and the final structure (n ¼ L: all the L chain residues are folded). The full energy and entropy changes, ΔE(L) and ΔS(L), are approximately proportional to L. The blue lines show the linear (proportional to the number of already folded residues n) parts of ΔE(n) and ΔS(n). The non-linear parts of ΔE(n) and ΔS(n) result mainly from the boundary between the folded and unfolded halves of the molecule. The maximum deviations of the ΔE(n) and ΔS(n) values from linear dependences are proportional to only L2/3. As a result, the ΔF(n) ¼ ΔE(n) –TΔS(n) value also deviates from linear dependence (the blue line) by a value of L2/3. Thus, at the equilibrium point (where ΔF(0) ¼ ΔF(L)), the maximal, along the pathway, free energy excess ΔF# over the blue free energy baseline is also proportional to only L2/3. (Adapted from Finkelstein, A.V., Badretdinov, A.Ya., 1997a. Physical reason for fast folding of the stable spatial structure of proteins: A solution of the Levinthal paradox. Mol. Biol. (Moscow, Eng. Trans.) 31, 391–398; Finkelstein, A.V., Badretdinov, A.Ya., 1997b. Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold. Des. 2, 115–121.)
330 PART
V Cooperative Transitions in Protein Molecules
non-linear surface term (proportional to n2/3 for an n-residue folded part) in the energy E of the growing globule. Thus, the maximum deviation from a linear energy decrease is proportional to the surface of the folded part, that is, to L2/3, while the total energy decrease is proportional to the total number L of residues. The entropy decrease is also approximately proportional to the number of residues that have taken their final positions (Fig. 21.3B). At the beginning of folding, though, the entropy decrease can be a little faster owing to disordered but closed loops protruding from the growing globule (Figs. 21.2 and 21.4). Their number is proportional to the interface between the folded and unfolded phases, and the free energy of a loop is known (Flory, 1969) to have a very slow, logarithmic dependence on its length. This again results in a non-linear surface term in the entropy ΔS of the growing globule. The overall entropy decrease is proportional to L again, and the maximum deviation from the linear entropy decrease again is proportional to L2/3 (actually, it is L2/3ln(L1/3) at the maximum, but, on the average, the multiplier of the main term, L2/3, is not more than 1 (Finkelstein and Badretdinov, 1997a); you may also take a look at the later rigorous mathematical papers (Fu and Wang, 2004; Steinhofel et al., 2006)). Both linear and surface constituents of ΔS and ΔE enter the free energy ΔF ¼ ΔE TΔS of the growing globule. However, when the final globule is in thermodynamic equilibrium with the coil, the large linear terms annihilate each other in the difference ΔE TΔS (since ΔF ¼ 0 both in the coil (ie, at n ¼ 0) and in the final globule (at n ¼ L)), and only the surface terms remain: ΔF(n) would be zero all along the pathway in the absence of surface terms. Thus, the free-energy barrier (Figs. 21.3C and 21.5) is connected only with the relatively small surface effects at the coil–globule boundary, and the
FIG. 21.4 (A) Compact semi-folded intermediate with protruding unfolded loops. Its growth corresponds to a shift of the boundary between the folded (globular) and unfolded parts. Successful folding requires correct knotting of loops: the structure with incorrect knotting (B) cannot change directly to the correct final structure: it first has to unfold and achieve the correct knotting. However, since a chain of 100 residues can only form one or two knots, the search for correct knotting can only slow down the folding two-fold or at most four-fold; thus, the search for correct chain knotting does not limit the folding rate of normal size protein chains. (Adapted from Finkelstein, A.V., Badretdinov, A.Ya., 1998. Influence of chain knotting on the rate of folding. Erratum to Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold. Des. 3, 67–68.)
Lecture 21
331
FIG. 21.5 The entropy converts the energy funnel (Fig. 21.1B) into a volcano-shaped (as it is called now, see Rollins and Dill, 2014) free-energy folding landscape with free-energy barriers (Fig. 21.3C) on each pathway leading from an unfolded conformation to the native fold.
height of this barrier on a sequential folding pathway (Figs. 21.2 and 21.3D) is proportional not to L (as Levinthal’s estimate implies), but to L2/3 only (Fig. 21.3C). As a result, the time of folding of the most stable chain structure grows with the number of the chain residues L not “according to Levinthal” (ie, not as 2L, or 10L, or any exponent of L), but as exp(λL2/3) only. The value L2/3 arises from the separation of the “native” and “unfolded” phases, and it is much smaller than L. A thorough estimate of the coefficient λ (which is based on only one experimental parameter, namely, the average heat of protein melting per residue that is known from Privalov’s (1979) work to be equal to about 2kBTmelt) shows that λ ¼ 1 0.5, the particular value of λ depending on the distribution of strongly and weakly attracted residues within the lowestenergy structure, and in the main, on the topology of the lowest-energy structure (λ is large when this structure is such that its folding requires intermediates with many closed loops protruding from the native-like part, and λ is small if such loops are not required; (see (Finkelstein and Badretdinov, 1997a,b, 1998) and Supporting Information for Garbuzynskiy et al., 2013). The observed protein folding times (for the coil ! native globule transition at the point of equilibrium between these two states) are indeed (Fig. 21.6) in the range 10 ns exp(0.5 L2/3) to 10 ns exp(1.5 L2/3) ns, in accordance with the estimate obtained. The reason for the “non-Levinthal” estimate, obtained for the mid-transition conditions (where the native fold free energy ΔF counted off that of the denatured state equals zero by definition), h i TIME τ exp ð1 0:5ÞL2=3
(21.2)
is that (1) the entropy decrease is almost immediately compensated for by the energy gain along the sequential folding pathway and (2) the free-energy barrier occurs owing to the surface effects only, which are relatively weak; τ ¼ 10 ns (Zana, 1975). It is noteworthy that the sequential folding pathway does not require any rearrangement of the globular part (which could take a lot of time): all rearrangements occur in the coil and therefore are rapid.
332 PART V Cooperative Transitions in Protein Molecules
FIG. 21.6 (A) Observed (before 2000) folding time at the point of equilibrium between the unfolded and the native states, vs L2/3 (L being the number of residues in the chain). The circles and squares refer to all 36 proteins listed in (Jackson, 1998), whose folding time at the mid-transition can be calculated from the available data. The symbols α, β refer to α-helical (Thompson et al., 1997) and β-hairpin (Mun˜oz et al., 1997) peptides. For many proteins, the folding intermediates are observed, but this occurs far from the mid-transition. The theoretically allowed (gold) region of ln(time) is limited by the lines ln(108 s) + 0.5 L2/3 and ln(108 s) + 1.5 L2/3. As seen, all the experimental points are within this range (except for that of the α-helix, which is a one-dimensional rather than 3D object). (Adapted from Galzitskaya et al., 2001.) (B) Experimentally measured in vitro folding rate constants in water and at the mid-transition for 107 single-domain proteins (or separate domains) without disulfide bonds or covalently bound ligands. The rates are shown in the background of the golden (with the bronze belt) triangle, theoretically outlined for “biologically normal” conditions, plus the white extension of the triangle, additionally outlined for mid-transition conditions. Yellow dashed line limits the area allowed only for oblate (1:2) and oblong (2:1) globules at mid-transition; bronze dashed line means the same for “biologically normal” conditions. L is the number of amino acid residues in the experimentally investigated protein chain. ΔG is the free energy difference between the native and unfolded states of the chain. (Adapted from Garbuzynskiy et al., 2013.) The left edges of triangles in plots (A) and (B) (at ΔF ¼ 0) approximately coincide with the estimated (Finkelstein and Garbuzynskiy, 2015; Finkelstein, 2015) time necessary for exhaustive sampling of all protein folds at the level of secondary structure formation and assembly.
Lecture 21
333
Another similar scaling law (ln(TIME) L1/2) was obtained by Thirumalai (1995) for the case of a very high native fold stability (ΔF ≪ –kBT). Then protein folding essentially goes “downhill” in energy all the way, but the “downhill slope” has (due to protein heterogeneity) random bumps, whose energy is proportional to L1/2. The estimate obtained in Eq. (21.2), illustrated by Fig. 21.6 shows that a chain of L80–90 residues will find its most stable fold within minutes or hours even near the mid-transition, where the folding is the slowest (Fig. 21.7). The native structures of such relatively small proteins are under thermodynamic control: they are the most stable among all structures of such chains. Native structures of larger proteins (of 90–400 residues) are additionally under kinetic control, in a sense that some too entangled folds of their chains cannot be achieved within days or weeks even if they are thermodynamically stable (Garbuzynskiy et al., 2013). This equation also explains why a large protein should consist (according to the “divide and rule” principle) of separately folding domains: otherwise, chains of more than 300 residues would fold too slowly. The above estimates (80–90 and 400 residues) are somewhat increased if the native fold’s free energy F is by many kBT lower than that of the unfolded chain (see below), but essentially remains the same (Garbuzynskiy et al., 2013). The following points are noteworthy: (1) Having found the free energy of the transition state (Fig. 21.3), one can further estimate the size of its globular part. This estimate shows that the nucleus must be as large as about half the protein. This is more or less compatible with
FIG. 21.7 Refolding and unfolding rates of hen egg-white lysozyme vs the concentration of guanidine dihydrochloride. The experimental points show the apparent rate kapp of approach to the equilibrium between the native and denatured forms of the protein. The filled circles correspond to refolding obtained upon diluting a concentrated GdmCl solution of the denatured protein. The open circles correspond to unfolding occurring after GdmCl addition to aqueous solution of the native protein. Note that these points overlap at the GdmCl concentration of 4.2 M, which corresponds to kinetic (and thermodynamic) equilibrium between folded and unfolded protein molecules. (Adapted from Kiefhaber, T., 1995. Kinetic traps in lysozyme folding. Рroc. Natl. Acad. Sci. USA 92, 9029–9033.)
334 PART
V Cooperative Transitions in Protein Molecules
experiment (Fig. 21.7), which shows a crudely equal (but of opposite sign) dependence of the folding and unfolding rates on the denaturant concentration. This means that the solvent-accessible area of the chain in the transition state is between the solvent-accessible areas of the unfolded and the globular states. It seems that the globular part of the transition state is somewhat larger than the folding nucleus itself at the cost of non-native interactions (Li et al., 2000). A large (comprising about half the protein) folding nucleus implies that there cannot be many alternative folding nuclei, which means that there cannot be too many parallel, alternative folding pathways, and thus consideration of only one of them can give a reasonable estimate of the folding time. (2) The “quasi-Levinthal” search over intermediates with different chain knotting (Fig. 21.4) can, in principle, be a rate-limiting factor, since knotting cannot be changed without a decay of the globular part. However, since the computer experiments show that one knot involves about a hundred residues, the search for knotting can only be important for extremely long chains (Finkelstein and Badretdinov, 1998), which cannot fold within a reasonable time (according to Eq. (21.2)) in any case. (3) Our estimate, Eq. (21.2), refers to ΔF ¼ 0, ie, to the point of equilibrium between the unfolded and the native states where the observed folding time is at a maximum and can exceed by orders of magnitude the folding time under native conditions (Fig. 21.7). How will the folding rate change when the native state becomes somewhat more stable than the coil (ie, ΔF < 0, but still ΔF kBT)? (In the opposite case, ie, if ΔF > 0 no folding will happen, and that’s all.) The initial growth of native fold stability has to increase the folding rate, since the transition state is stabilized as well, and the competing misfolded structures are still unstable (relative to the coil) owing to the energy gap between them and the most stable fold (Fig. 21.8A). This acceleration is indeed observed (see the left chevron limb in Fig. 21.7). However, the acceleration proceeds up to a certain limit only (see the short plateau at the top of the left chevron limb in Fig. 21.7; in larger proteins such a plateau is much more pronounced (Baryshnikova et al., 2005; Melnik et al., 2008)). It seems that the maximum folding rate is achieved when the metastable “misfolded” states (which include molten globules and serve as the on- or off-pathway kinetic intermediates) become as stable as the unfolded state. After that, the further increase in stability of the folded states leads to a rapid misfolding followed by a slower conversion into the native state that may occur via the unfolded state (Fig. 21.8B). Leaving temporarily aside the latter case, ie, rare proteins with extremely stable folding intermediates, let us estimate the time of folding under the native conditions. It depends on the mid-transition folding time and on the native (relative to the unfolded) state free energy ΔF under the native conditions, and can be estimated as: h i (21.3) TIME τ exp ð1 0:5Þ L2=3 + 0:4 ΔF=RT
Lecture 21
335
FIG. 21.8 Folding under different conditions. The bold lines show the free energies of the unfolded chain (U), of the most stable (‘native’) fold (N), and of the mis- or semi-folded structures (M). Dotted lines schematically show the behaviour of the free energy along the folding pathways leading to different structures (ignoring small irregularities); their maxima correspond to the transition states on these pathways. The main folding scenarios are given. (A) Fold N is more stable than the coil U, and the coil is more stable than all misfolded states M taken together. Misfolding and kinetic traps do not hinder rapid folding to the native state. The chain virtually does not explore the pathways leading to misfolded states, since on these pathways the free energy rises too high. (B) Both native and some of the misfolded folds are more stable than the coil (which is the phase where a fast rearrangement occurs). The chain rapidly comes to some misfolded form and then slowly undergoes transition to the stable state N via the partly unfolded state. Arrows show the mainstream of the folding process. (Adapted from Finkelstein, A.V., Badretdinov, A.Ya., 1997b. Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold. Des. 2, 115–121.)
The multiplier 0.4 corresponds to the average (between 0.3 and 0.5) fraction of a chain involved in the folding nucleus (Garbuzynskiy et al., 2013), so that 0.4 ΔF is, proportionally, the average free energy of the folding nucleus. This equation gives a unified, though approximate estimate (Fig. 21.6B) of folding rates occurring under various conditions. (4) Eqs. (21.2) and (21.3) estimate the range of possible folding rates rather than folding rates of an individual protein, which may differ (Fig. 21.6) from one another by many orders of magnitude. The influence of a protein chain fold upon the folding time (the factor λ ¼ 1 0.5 in Eqs (21.2) and (21.3)) can be estimated (Ivankov et al., 2003) using a phenomenological “contact order” parameter (CO%) (Plaxco et al., 1998). CO% is equal to the average chain separation of the residues that are in contact in the native protein fold, divided by the chain length. (A somewhat different order parameter, proportional to the average chain separation squared has been suggested by N€olting et al. (2003) and later analyzed in N€ olting (2010).) A high CO% value reflects entangling of the chain in the native protein (ie, existence of many long closed loops in the protein fold), while a high value of the factor λ reflects entangling of the chain in a semi-folded globule (Fig. 21.4). Therefore, CO% is more or less proportional to the λ value (Ivankov et al., 2003). CO% is a good predictor of folding rates of proteins equal in size, but it fails to compare folding rates of small proteins with those of large ones, because CO% decreases with decreasing protein size (which reflects a low entangling of chains forming large domains (Ivankov et al., 2003; Garbuzynskiy et al., 2013)), while the folding rate grows, on the average, with increasing protein size (Fig. 21.6).
336 PART
V Cooperative Transitions in Protein Molecules
As a result of collaboration of A.V.F.’s and Plaxco-Baker’s groups, it was shown that logarithms of the folding rates correlate well (Ivankov et al., 2003) with the “absolute contact order” value AbsCO ¼ CO%L, which grows with L as L2/3 on the average. The AbsCO (and ln(AbsCO) even better (Finkelstein et al., 2013)) allows predicting the folding rate of a protein from its size and structure. Moreover, protein folding rates can be predicted from the chain length and the secondary structure content (Ivankov and Finkelstein, 2004), and, because the latter can be predicted from the amino acid sequence (Jones, 1999), one can rather successfully predict protein folding rates from their amino acid sequences alone. (Here, I put aside those numerous works on protein folding or unfolding rate prediction that do not contain physical insights and are based simply on correlation of folding or unfolding rates with amino acid content, and so on. I also put aside works on relationship between folding nuclei and conservatism of some sites of amino acid sequences; cf. Shakhnovich et al., 1996.) The estimate of the folding time that I gave above is based on consideration of protein unfolding rather than folding. I considered unfolding because it is easier to outline a good unfolding pathway than a good folding pathway, while the result must be the same. You may remember that I considered the free-energy barrier between the unfolded and folded states, focusing on its unfolding side (connected with energy increase), and did not consider it from its folding side connected with entropy loss. Since the rates of direct and reverse reactions are equal under the mid-transition conditions (as follows from the physical “detailed balancing” principle), here the “unfolding” and “folding” sides of the barrier are of equal heights, and therefore, examination of only one (“unfolding”) side is sufficient to estimate the barrier height. However, a complete analysis of folding urges us to look at the barrier from its folding (connected with entropy loss) side as well; I promised to do this, and will do it now. To analyze folding, we have to analyze sampling of configurations of the protein chain. The total volume of the protein conformation space, considered at the level of amino acid residues by Levinthal, is huge indeed. However, interactions occurring inside the chain are mainly connected with secondary structures. Thus, a question arises as to how large the total volume of the protein conformation space is, if considered at the level of formation and assembly of secondary structures, that is, at the level that had been considered by Ptitsyn (1973) in his model of stepwise protein folding. To estimate this volume, one has to enumerate the number of local energy minima in the space of interacting secondary structures. It turns out that the total volume of the protein chain conformation space is by many orders of magnitude smaller at the level of secondary structures than that at the level of amino acid residues: the latter, according to Levinthal’s estimate, scales up as something like 10L or 3L with the number L of residues in the
Lecture 21
337
chain, while the former scales up no faster than LN with the number N of the secondary structure elements. N is much less than L, and this is the main reason for the drastic decrease of the conformation space. The estimate LN has been obtained as follows (see Fig. 21.9): The number of architectures (ie, types of dense stacks of secondary structures) is small (cf. Levitt and Chothia, 1976; Murzin and Finkelstein, 1988; Chothia and Finkelstein, 1990), since the architectures are packagings of a few secondary structure layers, and their combinatorics is small compared with other factors. The number of packings, ie, combinations of positions of N elements in the given protein architecture cannot exceed N! The number of topologies, ie, combinations of directions of these elements cannot exceed 2N. Transverse shifts and tilts of an element within each architecture are prohibited (Fig. 21.9D). Shifts and turns of secondary structure elements within an architecture are coupled (this is shown in Fig. 21.9E using β-sheet as the most evident example, but it also refers to α-helices— remember ‘knobs in the holes’ close packing (Crick, 1953); as a result, each α or β element can have about L/N possible shift/turns in the globule formed by N secondary structures of the L-residue chain. All this limits the space to
FIG. 21.9 A scheme of estimate of the volume of conformation space at a level of assembly of secondary structures. (Adapted from Supplement to Finkelstein, A.V., Garbuzynskiy, S.O., 2015. Reduction of the search space for the folding of proteins at the level of formation and assembly of secondary structures: A new view on solution of Levinthal’s paradox. Chem. Phys. Chem. 16, 3373–3378.)
338 PART
V Cooperative Transitions in Protein Molecules
(L/N)N 2N N! configurations, or LN in the main term (if L ≫ N ≫ 1). This number can be somewhat reduced by symmetry of the globule; also, no α-helix can take the place of a β-strand, and vice versa, and short or crossing loops between secondary structures can prevent these from taking arbitrary positions and directions in the globule, etc. (Ptitsyn and Finkelstein, 1980). However, this reduction is not important to us, because our aim is to estimate the upper limit of the number of configurations. Inner voice: How does the chain know where to form a secondary structure and what secondary structure is to be formed there? Lecturer: Most of the secondary structures are well outlined by local amino acid sequences (Jones, 1999). Besides, its “existence or non-existence” adds only one state to the number L/N of its possible shift/turns, which is already taken into account. The value of L/N (ie, the number of residues per secondary structure element plus accompanying loops) is 15 5 according to protein statistics (Rollins and Dill, 2014), and it should scale as 3 L1/3 in a large globule, because the secondary structures go from one end of the globule to another, so that the resulting LN scales up, in the main term, approximately as the upper limit of the estimate given by Eq. (21.2) (cf. also Fu and Wang, 2004; Steinhofel et al., 2006). Taking, from folding of the smallest proteins, a few microseconds as a rough estimate of the time necessary to sample one configuration, we see that the time necessary to sample the whole conformation space at the level of secondary structure formation and assembly approximately coincides with the upper limit of the time estimate obtained from consideration of unfolding, as well as from experiment (Fig. 21.6). Inner voice: Does this mean that a folding protein samples the entire conformation space at the level of secondary structure formation and packing? Lecturer: Not at all. This means only that the funnels and all that starting their work from the level of secondary structures have to accelerate folding by several orders of magnitude, rather than by many tens of orders, which would have been the case if the funnels were to start working from the level of amino acid residues. Actually, protein folding resembles crystallization (cf. Ubbelohde, 1965; Slezov, 2009); at the freezing temperature a perfect large single crystal (the lowest-energy structure) arises, although extremely slowly. As the temperature is lowered, a little the single crystal grows faster; and a further temperature decrease leads to rapid formation of a multitude of small crystals rather than of a perfect large single crystal. I would like to mention that the similarity of the thermodynamic aspects of protein folding to those of first-order phase transitions has been experimentally proved by Privalov (1979), and the similarity of the kinetic aspects of these two events has been outlined by the experiments of Fersht and his co-workers
Lecture 21
339
(Matouschek et al., 1989, 1990; Fersht, 1999) and by the analytical studies (Shakhnovich and Gutin, 1989, 1990) and computer experiments (Abkevich et al., 1994; Gutin et al., 1996) by Shakhnovich and Gutin. Sˇali et al. (1994) used simple computer models of protein chains to show that even a small (several kBT) difference in stability displayed by the lowest-energy fold and its competitors provides a reliable and fast folding to the global energy minimum. This work had a significant impact on the subsequent development of the theory of protein folding. Then the same simple computer models of protein chains were used to explore the region of fastest folding. It has been shown (Gutin et al., 1996) that the folding time grows with the chain length L in this region much more slowly than predicted by Eq. (21.2) for the mid-transition. Specifically, in this region, the folding time grows with L not as exp(L2/3) but as L6 for “random” chains and as L4 for the chains selected to fold most rapidly (ie, having a large energy gap between the most stable fold and the other ones). This emphasizes once again the dependence of the folding rate on the experimental conditions and on the difference in stability between the lowest-energy fold and its competitors (Wolynes, 1997). Now, it would be only natural if here you asked me the question as to what happens if two structures instead of one are separated from others by a large energy gap. The answer is: If these two structures are stable relative to the denatured state, the one with the better folding pathway (with a lower barrier) will be the first to fold. However, if this “more rapidly folded structure” is even a little less stable than the other one, a very slow transition to the more stable structure will follow. The transition will be slow since unfolding of the first stable structure will be required (Fig. 21.8B). This transition is similar to a polymorphous transition in crystals (recall the “tin disease”, ie, the transition of white tin into grey tin: in the 16th–18th centuries, this “disease” is known to have destroyed whole stores of tin soldier buttons during unusually severe frosts). It seems, though, that “polymorphous” proteins must be rare: as I told you, theoretical estimates show that the amino acid sequence coding one stable chain fold is a kind of wonder, but the sequence coding two of them is a squared wonder! There is evidence, though, for polymorphous transitions in some proteins (Tsutsui et al., 2012). I mean serpins (a family of serine protease inhibitors, Fig. 21.10) and a few other “chameleon” proteins (putting aside already considered amyloids, because their alternative folds are achieved in aggregates, ie, in an environment that differs from conditions in which a single chain folds; so, this is not surprising; the surprising thing is that a separate protein molecule undergoes rearrangement under unchanged environmental conditions). The serpin works as follows: it works as an inhibitor for some time and then stops working—without causing any damage to the chain or aggregation. Being denatured and then renatured in vitro, it regains its active state (Fig. 21.10, left), works as an inhibitor for an hour or so, and then stops working—and again
340 PART
V Cooperative Transitions in Protein Molecules
FIG. 21.10 Polymorphous latency transitions in serpin. Left panel: in the active state, the RCL region (red, on the top) is solvent-exposed, and the s1C region (yellow β-strand, on the top) is hydrogen-bonded to β-sheet C. The active state contrasts with the latent state (right panel), where s1C (yellow) is released from sheet C and the RCL (now in the bottom, left) is inserted in the centre of β-sheet A as the fourth (red) strand in a now six-stranded, fully antiparallel β-sheet. (Adapted with permission from Cazzolli, G., Wang, F., Beccara, S., Gershenson, A., et al., 2014. Serpin latency transition at atomic resolution. Proc. Natl. Acad. Sci. USA 111, 15414–15419.)
without damaging the chain or aggregation. And this cycle can be repeated a few times. This means that the “working” state of serpin is not the free energy minimum: it is a metastable state, while the stable state, ie, the global free energy minimum corresponds to its “latent” state (Fig. 21.10, right). The transition from the active, metastable state of serpin to its latent, inactive but stable state requires a large change in its conformation. With this, I have redeemed my promise to return to the “rare proteins with extremely stable folding intermediates”. In conclusion, let us return to the energy (Fig. 21.1) and free energy (Fig. 21.5) landscapes to visualize the fast folding pathways that automatically lead to the global energy minimum, provided it is much deeper than other energy minima. Each low-energy structure, each energy minimum is surrounded by a rosette of tracks going to the minimum through the hilly, even rocky landscape (the “rocks” stressed by Frauenfelder (2010) correspond to powerful but short-range repulsions at collisions of chain fragments). They form a “folding funnel” (Leopold et al., 1992; Wolynes et al., 1995; Karplus, 1997; Wolynes, 1997; Dill and Chan, 1997; N€ olting, 2010). Some of these tracks, more or less smooth, correspond to the sequential (Fig. 21.2) pathways of folding to the given energy minimum; the most advantageous are those that provide separation of the dense, compact (and thus relatively stable) already folded part of the native globule from the still unfolded chain. No rearrangements of the already folded part
Lecture 21
341
of the globule take place on the sequential pathways, and therefore they have no large pits and bumps (and small ones pose no obstacle when the temperature is not too low). When moving along these pathways to the energy minimum, the molecule gains energy but loses entropy (Fig. 21.3), since its structure becomes more and more fixed. The deeper the energy minimum, the greater the energy gain—and the easier it is to overcome the “entropic resistance”. The strength of this resistance is proportional to temperature. At too high a temperature the resistance is strong, and it does not allow the chain to fold into any fixed structure. At too low a temperature the resistance is too small, and the chain falls in any neighboring energy well; then it spends a lot of time to get out of it just to fall in a local minimum again—and thus it cannot approach the global energy minimum for a very long time (Fig. 21.8B). And at last, when the temperature is good for folding, the entropic resistance is overcome only on the pathways to the deepest energy minimum (which is much deeper than other minima: just recall the “energy gap”), and the chain gets there rapidly (Fig. 21.8A). The scheme given above of entropy-by-energy compensation along the folding pathway with separation of folded and unfolded phases, and the conclusion that it can solve the Levinthal’s paradox are applicable to formation of the native protein structure not only from the coil but also from the molten globule or from another intermediate. However, for these scenarios, all the estimates would be much more cumbersome, while these processes do not show (experimentally) any drastic advantage in the folding rate. Therefore, I will not go beyond the simplest case of the coil-to-native globule transition. All the above emphasizes the significance of development of the “new view” on folding, the view based on the nature of the free energy surfaces or “landscapes” that define the folding reaction rather than on discrete accumulation of folding intermediates. This process differs from the more familiar reactions of small molecules by the following: (1) a particular protein structure is the result of many weak interactions; (2) a large change in configurational entropy occurs during folding; (3) the native and denatured phases separate during folding. One of the most satisfying features of recent conceptual advances is the emergence of a “unified mechanism” of folding and the realization that apparently disparate patterns of experimental behavior result from the interplay of different contributions to the overall free energy under different circumstances. This unification of ideas provides a framework for joint work by experimentalists and theoreticians in order to obtain a more detailed understanding of the intricate folding process. In particular, special attention is now paid to the link between the fundamental process of folding and the multiplicity of folding protein states that can exist in a crowded cellular system, which leads to the need to consider their intermolecular as well as intramolecular interactions, such as the interaction of the folding nuclei with chaperones or other proteins, RNA and DNA. The understanding of protein folding is much more than just a challenge to the skills and intellects of protein scientists. Folding (or self-organization)
342 PART
V Cooperative Transitions in Protein Molecules
represents a bridge between the laws of physics and the results of the evolutionary pressure under which the character of biology has developed. For many years we could only discuss this issue in very general terms, emphasizing the biological importance of this complex physical process. Now, we are beginning to understand that folding is a physical process taking place only in systems that have been biologically selected to ensure its efficiency. Protein physics is grounded on two fundamental experimental facts: (1) protein chains are capable of forming their native structures spontaneously in the appropriate environment, and (2) the native state is separated from the unfolded state of the chain by an “all-or-none” phase transition. The latter ensures the robustness of protein action and minimal populations of partially folded and therefore aggregation-prone species. It appears that biological evolution selects only those sequences that fold into a well-defined 3D native structure. This is necessary for the protein to work reliably. Such a well-defined structure can be formed only by a sequence with a free energy that is much lower than that of alternative structures. And this is precisely the sequence that is capable of “all-or-none” folding and unfolding. The enhanced stability of the native structure seems to be due to its tight packing (even though it is not yet clear exactly what constraint is placed on protein chains by their capability of tight packing). Interestingly, these selected sequences appear to meet the requirement of correct folding into the stable structure simultaneously with the ability to fold quickly. It is probable that the latter helps incompletely folded polypeptide chains to avoid the competing process of intermolecular aggregation. The ability of stable structures to fold quickly solves the long-standing contradiction between the kinetic and thermodynamic choice of the native fold. The sequences of globular proteins look rather “random”, and their secondary and tertiary structures bear many features typical of all more or less stable folds of random co-polymers. However, the ability of polypeptides to fold rapidly and reproducibly to definite tertiary structures is not a characteristic feature of random sequences. The main feature of “protein-like” amino acid sequences, the feature that determines all their physical properties, is the enhanced stability of their native fold, ie, the existence of a large gap between the energy of the native fold and the minimum energy of misfolded globular structures. Although the size of this energy gap is not yet established by a physical experiment, some theoretical estimates show that even a rather narrow gap (of a few kilocalories per mole) can cause “protein-like” behavior of a protein domain. And it is already clear that the necessary reinforcement of the lowest-energy fold can be achieved by a gradual evolutionary selection among many random point mutations. Protein folding is perhaps the simplest example of a biological morphogenesis that can only take place in a system evolutionarily designed to allow it to happen. Nevertheless, protein folding has the typical physical characteristics associated with a complex process obeying the laws of statistical mechanics.
Lecture 21
343
In particular, folding of globular proteins involves a capillarity nucleation mechanism generally typical for “all-or-none” (ie, first-order) phase transitions. Folding pathways seem to be not unique: various pathways can lead to one target, although their rates may be different and may depend on the folding conditions. (A low-speed folding may be, though, dangerous: as shown by (Dobson, 2003), non-folded proteins tend to aggregate, which, as have been mentioned, can lead to “conformational diseases”.) Folding is the result of statistical fluctuations within the unfolded protein chain, which (under appropriate temperature and solvent conditions) result in its transition to the native state, which is almost always the structure of the lowest free energy. In this way a single welldefined structure can emerge from the statistical ensemble of unfolded or partially folded species. All this once again emphasizes a connection between protein structure, stability and ability to fold spontaneously and the role of natural selection in the introduction and maintenance of these qualities. It is easy to imagine and it is possible to show by computer simulations (but much more difficult to prove experimentally) that selection could start from the slightly increased stability of some structure of a polypeptide coded by a “random” piece of DNA, provided this structure can do something even marginally useful for a cell. And that then the pressure of evolution results, by random mutations and selection of the more and more reliably folding sequences, in the emergence of a sequence with the rare quality of spontaneously folding into a definite fold, which also allows specificity of function to be achieved, by additional selection, in a manner susceptible to feedback control and regulation in a complex and crowded cellular environment. However, this way of originating new globular proteins, origination from random sequences, seems to play no role at observable (ie, not the earliest) stages of biological evolution (this concerns water-soluble globular proteins, while fibrous and membrane proteins seem to evolve by multiplication of relatively short repeats or merging of relatively short blocks). As I have already mentioned, it is commonly accepted that now “new” proteins originate in two ways. One is gene duplication, which allows the “old” protein to keep working, while the sequence of the “new” one is free to mutate and to drift (with the aid of selection) towards some “new” function. The other is the merging of separate domains and small proteins into a multi-domain protein capable of performing more complicated functions and thus more susceptible to regulation because of the physical interactions of these domains. It is worth mentioning that multi-domain proteins are more typical of higher organisms than of bacteria and unicellular eukaryotes. (Though I have to say that there is no evidence that, historically, proteins went from “small and simple” to “large and complicated”; the already considered observation that natively disordered proteins are more typical of eukaryotes than of prokaryotes provokes some suspicions to this idea.) However, this does not revoke the privilege of some (“defect-free”) protein architectures, whose stability is compatible with a greater variety of sequences
344 PART
V Cooperative Transitions in Protein Molecules
to ensure more freedom for evolution and selection. Of course, selection is capable of creating even most improbable structures if they give an advantage to a species (the eye and brain being examples). However, protein function has only a little connection with its architecture, as we have seen and will see again. Therefore, it is not really surprising (although it is remarkable) that protein structures often look like those to be expected for stable folds of random sequences; that is, the “multitude principle” still seems to work in biological evolution.
REFERENCES Abkevich, V.I., Gutin, A.M., Shakhnovich, E.I., 1994. Specific nucleus as a transition state for protein folding: evidence from the lattice model. Biochemistry 33, 10026–10031. Anfinsen, C.B., 1973. Principles that govern the folding of protein chains. Science 181, 223–230. Baryshnikova, E.N., Melnik, B.S., Finkelstein, A.V., et al., 2005. Three-state protein folding: experimental determination of free-energy profile. Protein Sci. 14, 2658–2667. Bicout, D.J., Szabo, A., 2000. Entropic barriers, transition states, funnels, and exponential protein folding kinetics: A simple model. Protein Sci. 9, 452–465. Bogatyreva, N.S., Finkelstein, A.V., 2001. Cunning simplicity of protein folding landscapes. Protein Eng. 14, 521–523. Bryngelson, J.D., Wolynes, P.G., 1989. Intermediates and barrier crossing in a random energy model (with applications to protein folding). J. Phys. Chem. 93, 6902–6915. Chothia, C., Finkelstein, A.V., 1990. The classification and origins of protein folding patterns. Ann. Rev. Biochem. 59, 1007–1039. Crick, F.H.C., 1953. The packing of α-helices: Simple coiled coils. Acta Crystallogr. 6, 689–697. Dill, K.A., Chan, H.S., 1997. From Levinthal to pathways to funnels. Nat. Struct. Biol. 4, 10–19. Dobson, C.M., 2003. Protein folding and misfolding. Nature 426, 884–890. Fersht, A., 1999. Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding. W.H. Freeman, New York (Chapters 2, 15, 18, 19). Finkelstein, A.V., 2002. Cunning simplicity of a hierarchical folding. J. Biomol. Struct. Dyn. 20, 311–313. Finkelstein, A.V., 2015. Two views on the protein folding puzzle. Available at: http://atlasofscience. org/two-views-on-the-protein-folding-puzzle/. Finkelstein, A.V., Badretdinov, A.Ya., 1997a. Physical reason for fast folding of the stable spatial structure of proteins: A solution of the Levinthal paradox. Mol. Biol. (Moscow, Eng. Trans.) 31, 391–398. Finkelstein, A.V., Badretdinov, A.Ya., 1997b. Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold. Des. 2, 115–121. Finkelstein, A.V., Badretdinov, A.Ya., 1998. Influence of chain knotting on the rate of folding. Erratum to Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold. Des. 3, 67–68. Finkelstein, A.V., Garbuzynskiy, S.O., 2015. Reduction of the search space for the folding of proteins at the level of formation and assembly of secondary structures: A new view on solution of Levinthal’s paradox. Chem. Phys. Chem. 16, 3373–3378. Finkelstein, A.V., Bogatyreva, N.S., Garbuzynskiy, S.O., 2013. Restrictions to protein folding determined by the protein size. FEBS Lett. 587, 1884–1890. Flory, P.J., 1969. Statistical Mechanics of Chain Molecules. Interscience Publishers, New York (Chapter 3).
Lecture 21
345
Frauenfelder, H., 2010. The Physics of Proteins: An Introduction to Biological Physics and Molecular Biophysics. Springer, New York (Chapters 11, 12, 15). 11/d Fu, B., Wang, W., 2004. A 20(n log(n)) time algorithm for d-dimensional protein folding in the HPmodel. Lecture Notes in Computer Science 3142, 630–644. Galzitskaya, O.V., Ivankov, D.N., Finkelstein, A.V., 2001. Folding nuclei in proteins. FEBS Lett. 489, 113–118. Garbuzynskiy, S.O., Ivankov, D.N., Bogatyreva, N.S., Finkelstein, A.V., 2013. Golden triangle for folding rates of globular proteins. Proc. Natl. Acad. Sci. USA 147–150. Go, N., Abe, H., 1981. Noninteracting local-structure model of folding and unfolding transition in globular proteins. I. Formulation. Biopolymers 20, 991–1011. Gutin, A.M., Abkevich, V.I., Shakhnovich, E.I., 1996. Chain length scaling of protein folding time. Phys. Rev. Lett. 77, 5433–5436. Ivankov, D.N., Finkelstein, A.V., 2004. Prediction of protein folding rates from the amino-acid sequence-predicted secondary structure. Proc. Natl. Acad. Sci. USA 101, 8942–8944. Ivankov, D.N., Garbuzynskiy, S.O., Alm, E., et al., 2003. Contact order revisited: Influence of protein size on the folding rate. Protein Sci. 12, 2057–2062. Jackson, S.E., 1998. How do small single-domain proteins fold? Fold. Des. 3, R81–R91. Jones, D.T., 1999. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202. Modern version available at: http://bioinf.cs.ucl.ac.uk/psipred/. Karplus, M., 1997. The Levinthal paradox: yesterday and today. Fold. Des. 2 (Suppl. 1), S69–S75. Landau, L.D., Lifshitz, E.M., 1980. Statistical Physics (Volume 5 of A Course of Theoretical Physics), third ed. Elsevier, London (Sections 7, 8, 150). Leopold, P.E., Montal, M., Onuchic, J.N., 1992. Protein folding funnels: a kinetic approach to the sequence-structure relationship. Proc. Natl. Acad. Sci. USA 89, 8721–8725. Levinthal, C., 1968. Are there pathways for protein folding? J. Chim. Phys. Chim. Biol. 65, 44–45. Levinthal, C., 1969. How to fold graciously. In: Debrunner, P., Tsibris, J.C.M., Munck, E. (Eds.), M€ ossbauer Spectroscopy in Biological Systems: Proceedings of a Meeting Held at Allerton House, Monticello, Illinois. University of Illinois Press, Urbana-Champaign, IL, pp. 22–24. Levitt, M., Chothia, C., 1976. Structural patterns in globular proteins. Nature 261, 552–558. Li, L., Mirny, L.A., Shakhnovich, E.I., 2000. Kinetics, thermodynamics and evolution of non-native interactions in a protein folding nucleus. Nat. Struct. Biol. 7, 336–432. Matouschek, A., Kellis, J.T., Serrano, L., Fersht, A.R., 1989. Mapping the transition state and pathway of protein folding by protein engineering. Nature 340, 122–126. Matouschek, A., Kellis, J.T., Serrano, L., Fersht, A.R., 1990. Transient folding intermediates characterized by protein engineering. Nature 346, 440–445. Melnik, B.S., Marchenkov, V.V., Evdokimov, S.R., et al., 2008. Multi-state protein: Determination of carbonic anhydrase free-energy landscape. Biochem Biophys. Res. Commun. 369, 701–706. Mun˜oz, V., Thompson, P.A., Hofrichter, J., Eaton, W.A., 1997. Folding dynamics and mechanism of beta-hairpin formation. Nature 390, 196–199. Murzin, A.G., Finkelstein, A.V., 1988. General architecture of α-helical globule. J. Mol. Biol. 204, 749–770. Ngo, J.T., Marks, J., 1992. Computational complexity of a problem in molecular structure prediction. Protein Eng. 5, 313–321. N€ olting, B., 2010. Protein Folding Kinetics: Biophysical Methods. Springer, New York (Chapters 10, 11, 12). N€ olting, B., Scha¨like, W., Hampel, P., et al., 2003. Structural determinants of the rate of protein folding. J. Theor. Biol. 223, 299–307.
346 PART
V Cooperative Transitions in Protein Molecules
Plaxco, K.W., Simons, K.T., Baker, D., 1998. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277, 985–994. Privalov, P.L., 1979. Stability of proteins: small globular proteins. Adv. Protein Chem. 33, 167–241. Ptitsyn, O.B., 1973. Stages in the mechanism of self-organization of protein molecules. Dokl. Akad. Nauk SSSR 210, 1213–1215 (in Russian). Ptitsyn, O.B., Finkelstein, A.V., 1980. Similarities of protein topologies: evolutionary divergence, functional convergence or principles of folding? Quart. Rev. Biophys. 13, 339–386. Rollins, G.C., Dill, K.A., 2014. General mechanism of two-state protein folding kinetics. J. Am. Chem. Soc. 136, 11420–11427. Sˇali, A., Shakhnovich, E., Karplus, M., 1994. Kinetics of protein folding. A lattice model study of the requirements for folding to the native state. J. Mol. Biol. 235, 1614–1636. Shakhnovich, E.I., Gutin, A.M., 1989. Formation of unique structure in polypeptide-chains theoretical investigation with the aid of a replica approach. Biophys. Chem. 34, 187–199. Shakhnovich, E.I., Gutin, A.M., 1990. Implications of thermodynamics of protein folding for evolution of primary sequences. Nature 346, 773–775. Shakhnovich, E., Abkevich, V., Ptitsyn, O., 1996. Conserved residues and the mechanism of protein folding. Nature 379, 96–98. Slezov, V.V., 2009. Kinetics of First-Order Phase Transitions. Wiley-VCH, Weinheim, Germany. Steinhofel, K., Skaliotis, A., Albrecht, A.A., 2006. Landscape analysis for protein folding simulation in the H-P model. Lecture Notes in Computer Science 4175, 252–261. Thirumalai, D., 1995. From minimal models to real proteins: time scales for protein folding kinetics. J. Phys. I. (Orsay, Fr.) 5, 1457–1469. Thompson, P.A., Eaton, W.A., Hofrichter, J., 1997. Laser temperature jump study of the helix < ¼ > coil kinetics of an alanine peptide interpreted with a ‘kinetic zipper’ model. Biochemistry 36, 9200–9210. Tsutsui, Y., Cruz, R.D., Wintrode, P.L., 2012. Folding mechanism of the metastable serpin α1-antitrypsin. Proc. Natl. Acad. Sci. USA 109, 4467–4472. Ubbelohde, A.R., 1965. Melting and Crystal Structure. Clarendon Press, Oxford (Chapters 2, 5, 6, 10–12, 14, 16). Unger, R., Moult, J., 1993. Finding the lowest free energy conformation of a protein is an NP-hard problem: proof and implications. Bull. Math. Biol. 55, 1183–1198. Wetlaufer, D.B., 1973. Nucleation, rapid folding, and globular intrachain regions in proteins. Proc. Natl. Acad. Sci. USA 70, 697–701. Wolynes, P.G., 1997. Folding funnels and energy landscapes of larger proteins within the capillarity approximation. Proc. Natl. Acad. Sci. USA 94, 6170–6175. Wolynes, P.G., Onuchic, J.N., Thirumalai, D., 1995. Navigating the folding routes. Science 267, 1619–1620. Zana, R., 1975. On the rate determining step for helix propagation in the helix-coil transition of polypeptides in solution. Biopolymers 14, 2425–2428. Zwanzig, R., Szabo, A., Bagchi, B., 1992. Levinthal’s paradox. Proc. Natl. Acad. Sci. USA 89, 20–22.