Article
Effects of Topology and Sequence in Protein Folding Linked via Conformational Fluctuations
Daniel Trotter^1 and Stefan Wallin^1,*
^1 Department of Physics and Physical Oceanography, Memorial University of Newfoundland, St. John's, Newfoundland, Canada
ABSTRACT
Experiments have compared the folding of proteins with different amino acid sequences but the same basic structure, or fold. Results indicate that folding is robust to sequence variations for proteins with some nonlocal folds, such as all-β, whereas the folding of more local, all-α proteins typically exhibits a stronger sequence dependence. Here, we use a coarse-grained model to systematically study how variations in sequence perturb the folding energy landscapes of three model sequences with 3α, 4β + α, and β-barrel folds, respectively. These three proteins exhibit folding features in line with experiments, including expected rank order in the cooperativity of the folding transition and stability-dependent shifts in the location of the free-energy barrier to folding. Using a generalized-ensemble simulation approach, we determine the thermodynamics of around 2000 sequence variants representing all possible hydrophobic or polar single- and double-point mutations. From an analysis of the subset of stability-neutral mutations, we find that folding is perturbed in a topology-dependent manner, with the β-barrel protein being the most robust. Our analysis shows, in particular, that the magnitude of mutational perturbations of the transition state is controlled in part by the size or “width” of the underlying conformational ensemble. This result suggests that the mutational robustness of the folding of the β-barrel protein is underpinned by its conformationally restricted transition state ensemble, revealing a link between sequence and topological effects in protein folding.
SIGNIFICANCE
An unresolved issue in protein folding is how the folding process is altered by variations in the amino acid sequence, i.e., do different proteins with similar structures fold in a similar manner? Here, we show that the structural diversity of a key state for folding, the transition state, impacts the magnitude of effects from sequence variations. In other words, how a protein folds is not uncoupled from how it responds to mutations.
Submitted July 29, 2019, and accepted for publication January 13, 2020.
*Correspondence: [email protected]
Editor: Alexandr Kornev.
https://doi.org/10.1016/j.bpj.2020.01.020
© 2020 Biophysical Society.
1370 Biophysical Journal 118, 1370–1380, March 24, 2020
INTRODUCTION
Single-domain proteins or domains taken from multidomain proteins have long been of central interest as minimal models for protein folding (1–4). These proteins typically fold spontaneously and efficiently into their native states and therefore do not require consideration of chaperone molecules, which often aid folding in cells by preventing long-lived, partially folded molecular species (5). Moreover, small single-domain proteins with fewer than around 100 amino acids typically fold in a simple two-state manner (6). Experiments on two-state proteins, along with considerations of simple lattice models, were integral in the development of the energy landscape theory, which provides a conceptual framework for folding (7–9). From
an equilibrium perspective, two-state proteins populate two structurally distinct states, a denatured (or unfolded) state, U, and the native folded state, N, whereas intermediate states are only weakly populated. Folding is thereby rate limited by a single transition state (TS). Because of the high dimensionality of the energy landscape and the multiplicity of possible folding routes, the TS must be represented by an ensemble of conformations. A long-standing question in two-state protein folding has therefore been the characterization of the TS ensemble (10). Folding simulations on early lattice models (11) and subsequent bioinformatics studies (12) showed that the core of the TS ensemble, i.e., the folding nucleus, can be conserved across proteins with a common structure but highly divergent sequences. By emphasizing the role of native structure over sequence in determining the TS, these early lattice results foreshadowed the important observation that the native state topology, i.e., the pattern of residue-residue contacts in the native structure, is a major, even dominant, determinant of the folding rates kf of two-state proteins and hence of the TS (13). Observed correlations between ln kf and various topological parameters, such as the original relative contact order (RCO) (13) and later constructions (14–16), show that proteins with many local contacts (i.e., contacts with small sequence separation |i − j|, where i and j are residue numbers along the chain) tend to fold faster than proteins with relatively more nonlocal or long-ranged contacts (large |i − j|). Some, but not all, of these parameters also show significant rate correlations when the three main structural classes of proteins (all-α, mixed α/β, and all-β) are considered separately (17). Topology as a major folding determinant is also evident from the success of various implicit chain native structure-based approaches (18,19) in quantitatively predicting experimental data on folding such as φ-values. Simulation studies using Go-type (20–26) and lattice (27–29) models have attempted to uncover the implications for folding of these rate correlations. In particular, these studies indicate that the folding into complex, nonlocal topologies such as many all-β proteins is dominated by chain fluctuations into a conformationally restricted, nonlocal TS (24,28). In simpler, more local topologies such as α-helix bundles, this entropic cost can more easily be overcome by energetic factors (25). However, although topology is a major determinant of folding, delineating the role of the amino acid sequence is clearly central because it is the sequence that ultimately drives proteins into their native states. Fortunately, the “sequence effects” on folding have been studied by comparing the folding of different proteins within a single fold family, i.e., proteins with similar structures but divergent sequences (for reviews, see (30,31)).
Topology and Sequence Effects in Folding
Results indicate that the TS structure is conserved within some all-β and mixed α/β fold families, whereas the TSs of all-α proteins are typically more sequence dependent. For example, Im7 and Im9 fold in different ways despite having practically identical four-helix bundle structures and 72% sequence identity. Whereas Im9 exhibits two-state behavior, Im7 folds through an intermediate (32). A computational study suggests the difference is due to non-native interactions in Im7 (33), which, in turn, might result from functional constraints (34). By contrast, the src, spc, and fyn SH3 domains, with a common five-stranded β-barrel and 33–76% pairwise sequence identities, exhibit very similar TS structures (35–37), as judged by striking correlations between φ-values at equivalent positions, as shown in (30) and Fig. 1 A. We carried out an analysis of more recent experimental φ-value data from the literature, which confirms this trend but also provides a more nuanced picture (see Figs. 1 B and S1). For example, the more distant Grb2 SH3 domain (38), with only 25–30% sequence identity to src, spc, and fyn, exhibits no significant φ-value correlations with these proteins (see Fig. S1), thus indicating a limit on the TS conservation within SH3 domains. Moreover, data from a folding study (39) on variants of
FIGURE 1 Conservation of TS structure in SH3 domains and GA variants. Shown are experimental φ-values taken from the literature (35–37,39,78) at structurally aligned positions in proteins of similar structures. (A) Pairwise comparisons of the src, spc, and fyn SH3 domains with a common β-barrel fold are shown (35–37,78). (B) Pairwise comparisons of the GA30, GA77, and GA88 variants of the A domain of protein G (GA) with a common 3α fold are shown (39). Dashed lines are linear fits. Labels indicate the pair of proteins compared, their sequence identity in percent, and the correlation coefficient r. The pairwise alignments of the SH3 domains are chosen as in previous studies (35–37,78) (see Figs. S2 and S3). All φ-values included are listed in Tables S1 and S2. To see this figure in color, go online.
the A domain of protein G (GA) indicate that two-state folding and φ-value correlations can be maintained to some extent even within a three-helix bundle fold, as shown in Fig. 1 B. In an attempt to uncover general principles, we set out to investigate the folding free-energy landscapes of proteins with distinct native state topologies and how they are perturbed by variations in their amino acid sequence. To this end, we use a coarse-grained sequence-based model for folding with three amino acid types and an effective potential energy function (40). This model provides a self-contained framework within which the relationship between sequence and folding behavior, including structure, stability, and folding landscape, can be systematically studied (41,42). We selected a previously studied sequence, A54 (43), which adopts a 3α fold (44), and designed two new, to our knowledge, sequences, M54 and B35, which adopt 4β + α and β-barrel folds, respectively (see Fig. 2, A–C; Table 1), so that the three main structural classes of proteins are represented. We first characterize the folding of these three “parent” sequences and show that the barrier-crossing region along a simple progress variable can act as an approximate representation of the TS for these proteins. We then study sequence effects by applying a recently developed generalized-ensemble algorithm that allows the equilibrium behavior of multiple sequences to be determined in a single run (45). We find that, on average, stability changes of individual structural elements are the smallest for the β-barrel protein, in line with the previous theoretical (11) and experimental (30,31) results discussed above. Moreover, by analyzing the conformational properties of the TS ensemble, we find that the size of mutational effects on folding can be understood from consideration of the size and direction of conformational variations within the TS.
Trotter and Wallin
FIGURE 2 Native structures, motifs, and free-energy profiles. (A–C) Minimal energy conformations found for (A) A54, (B) M54, and (C) B35 are shown in ribbon representation and colored by the three amino acid types: p (beige), h (green), and t (yellow). Contact maps showing the residue-residue contact probabilities, Pij, at the lowest studied temperature (kBT = 0.50) are given. Any contact ij falling within an indicated structural motif (labeled boxes) contributes toward the total number of native contacts, Nnat. (D–F) Free-energy profiles F(Nnat) = −kBT ln P(Nnat) are shown for (D) A54, (E) M54, and (F) B35, where the probability distribution P(Nnat) is taken at the respective Tf-values. The Nnat ranges 80–100, 60–80, and 55–67 for A54, M54, and B35, respectively (gray areas and horizontal arrows), are referred to as TS regions (see text). Insets: locations of minima (N^U_nat or N^N_nat) and maxima (N^‡_nat) of F(Nnat) are shown as functions of T. To see this figure in color, go online.
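The profile definition in the caption, F(Nnat) = −kBT ln P(Nnat), can be sketched as a simple histogram estimate over sampled Nnat values. This is a minimal illustration only: the function name `free_energy_profile` and the toy sample are assumptions, and the sketch is not the analysis code used in the study.

```python
import math
from collections import Counter

def free_energy_profile(nnat_samples, kT):
    """Return {Nnat: F(Nnat)} with F = -kT * ln P(Nnat), estimated from samples.

    Hypothetical helper: P(Nnat) is a plain normalized histogram.
    """
    counts = Counter(nnat_samples)
    total = len(nnat_samples)
    return {n: -kT * math.log(c / total) for n, c in sorted(counts.items())}

# Toy data: Nnat = 1 is sampled three times as often as Nnat = 2.
profile = free_energy_profile([1, 1, 1, 2], kT=0.5)
barrier = profile[2] - profile[1]  # equals kT * ln(3) for this toy sample
```

Only differences in F are meaningful here, since an overall normalization of P(Nnat) shifts the whole profile by a constant.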
MATERIALS AND METHODS
Coarse-grained sequence-based model for protein folding
Calculations are carried out with the coarse-grained 3-letter protein model developed in (40). We briefly describe it below. The model combines an all-atom backbone (N, Cα, C, O, H, Hα1) with a single-site side-chain representation (an enlarged Cβ). There are three amino acid types: polar (p), hydrophobic (h), and turn type (t). Type t is closely related to glycine; it differs from p and h in that it lacks a Cβ atom, which is replaced by a smaller Hα2. All bond angles and lengths are held fixed at standard values such that the only degrees of freedom of a chain of L amino acids are the 2L backbone dihedral angles φi and ψi, where i = 1, ..., L. Solvent molecules are not explicitly represented. An effective potential energy function E(r, s), where r is the protein conformation and s the amino acid sequence, is based mainly on excluded-volume effects (1/r^12 repulsions), directionally dependent backbone NH-CO hydrogen bonding, and pairwise attractions between h amino acids (Lennard-Jones-like attractions between Cβ atoms of hh pairs). Model parameters such as the relative strengths between the different interaction types were determined by requiring that six designed model sequences of length 16–54 amino acids exhibit global free-energy minima corresponding to various structures, including α-helical structures and single-layered β-sheets (40). As it turns out, other sequences for which the model was not developed spontaneously fold into more elaborate folds, including mixed α/β and β-barrel structures, as we demonstrate in this work.

TABLE 1 Model Sequences Studied
Name    Length L    Sequence^a
A54     54          (A16)ttt(A16)ttt(A16)^b
M54     54          (B16)ttt(A16)ttt(B16)^c
B35     35          p(hp)3tt(ph)2tt(hp)3tt(ph)2tt(hp)3
^a Subscripts indicate repeats; e.g., (hp)3 is hphphp.
^b (A16) is short for p(phpphhp)2p.
^c (B16) is short for p(hp)3tt(ph)3p.
Model sequences
The three model sequences studied, A54, M54, and B35, shown in Table 1, are constructed by applying basic protein design principles (44). In particular, basic repeating heptad (phpphhp) or binary (ph) h-p patterns give sequences that can produce amphipathic α-helix or β-strand structures, respectively, and tt or ttt segments can accommodate turns between secondary structure elements. For example, the 16-amino-acid sequence p(hp)3tt(ph)3p, denoted B16 in Table 1, can produce a β-hairpin with all h Cβ atoms on one side of the sheet. M54 is constructed using two B16 segments linked by two ttt segments to a central α-helical pattern p(phpphhp)2p (denoted A16 in Table 1) that together allow the formation of a 4β + α fold with an h core. A54 is constructed with three A16 segments linked by two ttt segments. B35 is constructed from five β-strand segments with binary ph patterns that are linked together by four tt segments. For B35, some exploration of strand and turn lengths was required to find a sequence with a stable β-barrel native state.
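The shorthand used for the sequences in Table 1 can be expanded mechanically, which gives a quick check of the stated chain lengths. The sketch below is illustrative only; the helper `expand` and the `MACROS` dictionary are hypothetical names introduced here, not part of the model code.

```python
import re

# Named blocks from the Table 1 footnotes.
MACROS = {"A16": "p(phpphhp)2p", "B16": "p(hp)3tt(ph)3p"}

def expand(pattern, macros=MACROS):
    """Expand shorthand such as 'p(hp)3tt' into a plain string of h, p, t."""
    # Substitute named blocks such as (A16) first.
    for name, body in macros.items():
        pattern = pattern.replace(f"({name})", body)
    # Expand every '(unit)count' group; a missing count means one copy.
    group = re.compile(r"\(([hpt]+)\)(\d*)")
    while "(" in pattern:
        pattern, n = group.subn(lambda m: m.group(1) * int(m.group(2) or 1), pattern)
        if n == 0:
            raise ValueError("unrecognized group in pattern")
    return pattern

A54 = expand("(A16)ttt(A16)ttt(A16)")
M54 = expand("(B16)ttt(A16)ttt(B16)")
B35 = expand("p(hp)3tt(ph)2tt(hp)3tt(ph)2tt(hp)3")
```

With these definitions, `len(A54)`, `len(M54)`, and `len(B35)` reproduce the lengths 54, 54, and 35 listed in Table 1.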
Simulated tempering Monte Carlo
The thermodynamic behaviors of A54, M54, and B35 are determined using simulated tempering Monte Carlo (46,47), carried out using previous procedures (41,42). Simulated tempering works by letting the temperature jump between a set of predefined values, T1, ..., TK, while keeping the simulation at equilibrium. This is achieved by sampling the joint probability distribution

$$P(r, j) \propto \exp\left[-E(r)/k_{\mathrm{B}}T_j + g_j\right], \tag{1}$$

where j = 1, ..., K and gj are simulation parameters that control the marginal distribution P(j). Determining a set gj that provides a roughly flat P(j), thereby providing good conformational sampling at all temperatures, can be achieved by trial simulations. We use two types of conformational updates: the global pivot move, which rotates the chain around a single bond, and the semilocal biased Gaussian step (BGS) (48), which turns up to eight consecutive torsional angles in a coordinated way. The BGS update is tuned by two parameters, a and b, controlling the acceptance rate and degree of bias, respectively; we set a = 300 and b = 10, following previous work (49). For A54, M54, and B35, we use K = 16 temperatures in the range kBT = 0.50–0.70 and carried out, for each sequence, 16 independent runs of 3 × 10^9 elementary Monte Carlo steps each. Runs were started from a randomly chosen initial chain conformation.
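Because Eq. 1 implies that the marginal P(j) is proportional to exp(gj) times the partition function at Tj, one simple way to refine the gj from a trial run is to subtract the logarithm of the observed temperature histogram. The sketch below illustrates that idea under stated assumptions: the function name `refine_weights` is hypothetical, and the text does not specify which refinement procedure was actually used in the trial simulations.

```python
import math

def refine_weights(g, visits):
    """One refinement step g_j <- g_j - ln P_obs(j) from a trial-run histogram.

    g      : current simulated-tempering parameters g_1..g_K
    visits : number of times each temperature index was visited in the trial run
    The result is shifted so that the first entry is zero; a common offset in
    all g_j does not change the sampled distribution.
    """
    if any(v <= 0 for v in visits):
        raise ValueError("every temperature must be visited at least once")
    total = sum(visits)
    new_g = [gj - math.log(v / total) for gj, v in zip(g, visits)]
    return [x - new_g[0] for x in new_g]

# Toy example with two temperatures visited in a 3:1 ratio.
g_next = refine_weights([0.0, 0.0], [75, 25])  # second weight is raised by ln 3
```

In practice such a step would be repeated over a few trial runs until the histogram is roughly flat.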
Multisequence Monte Carlo
To determine the thermodynamic behavior of a set of mutant sequences at a single temperature T, we use multisequence simulations (45). This method works by sampling the joint probability distribution

$$P(r, s) \propto \exp\left[-E(r, s)/k_{\mathrm{B}}T + h(s)\right], \tag{2}$$

where the parameters h(s) are analogous to the gj-values of simulated tempering and chosen such that P(s) is roughly flat. The method alternates conformational updates (r → r′, s fixed) and sequence updates (s → s′, r fixed) during a run. Both update types are combined with appropriate accept or reject criteria so that detailed balance is maintained (45). For each set of sequences considered (see text), we performed 30 independent multisequence runs of 2 × 10^9 elementary MC steps each. Runs were started from a random sequence (within the selected sequence set) and a random conformation. Simulation data were analyzed using the multistate Bennett acceptance ratio method (50) to optimally combine statistics across sequences.
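As an illustration of an accept or reject criterion that is consistent with Eq. 2, the sketch below applies a standard Metropolis test to a sequence update s → s′ at fixed conformation r. It assumes a symmetric proposal of s′, which is an assumption of this sketch rather than a detail given in the text; the function name and arguments are likewise hypothetical.

```python
import math
import random

def accept_sequence_update(e_old, e_new, h_old, h_new, kT, u=None):
    """Metropolis test for s -> s' at fixed r, targeting P(r, s) of Eq. 2.

    e_old, e_new : E(r, s) and E(r, s') for the same conformation r
    h_old, h_new : sequence weights h(s) and h(s')
    u            : optional uniform random number in [0, 1), for reproducibility
    Assumes the proposal s -> s' is symmetric (illustrative assumption).
    """
    log_ratio = -(e_new - e_old) / kT + (h_new - h_old)
    if log_ratio >= 0.0:
        return True
    if u is None:
        u = random.random()
    return u < math.exp(log_ratio)

# A mutation that lowers the energy at equal weights is always accepted.
always = accept_sequence_update(-10.0, -12.0, 0.0, 0.0, kT=0.5)
```

The same form, with E(r)/kBTj and gj in place of E(r, s)/kBT and h(s), would apply to temperature jumps in simulated tempering.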
Observables and TS regions
The TS region for each protein is defined by an interval Nlow ≤ Nnat ≤ Nhigh, where Nnat is the total number of native contacts, and the range Nlow–Nhigh is 80–100, 60–80, and 55–67 for A54, M54, and B35, respectively. Conformations with Nnat > Nhigh are considered native such that the native state population Pnat = P(Nnat > Nhigh). The choice of the limits Nlow and Nhigh is described in the text. In calculating the observables Nnat, qI, and fi, amino acids i and j are considered in contact if |i − j| ≥ 3 and at least one of the Cα(i)-Cα(j), Cα(i)-Cβ(j), Cβ(i)-Cα(j), or Cβ(i)-Cβ(j) distances is < 7.5 Å.
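The contact criterion above can be sketched as follows. Several details are assumptions made for illustration: coordinates are plain (x, y, z) tuples in Å, residues without a Cβ atom (type t) are represented by `None` and contribute only their Cα site, and native motifs are encoded as inclusive index boxes. None of these conventions is taken from the model code.

```python
import math

def in_contact(i, j, ca, cb, cutoff=7.5, min_sep=3):
    """True if residues i and j satisfy |i - j| >= min_sep and any of the
    Ca-Ca, Ca-Cb, Cb-Ca, Cb-Cb distances is below the cutoff (in Angstrom)."""
    if abs(i - j) < min_sep:
        return False
    sites_i = [x for x in (ca[i], cb[i]) if x is not None]
    sites_j = [x for x in (ca[j], cb[j]) if x is not None]
    return any(math.dist(a, b) < cutoff for a in sites_i for b in sites_j)

def count_native_contacts(ca, cb, motifs, cutoff=7.5):
    """Count contacts (i < j) that fall inside at least one motif box.

    motifs: list of ((i_lo, i_hi), (j_lo, j_hi)) inclusive index ranges
            (hypothetical encoding of the boxed regions in the contact maps).
    """
    pairs = set()
    for (i_lo, i_hi), (j_lo, j_hi) in motifs:
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                if i < j and in_contact(i, j, ca, cb, cutoff):
                    pairs.add((i, j))
    return len(pairs)

# Toy hairpin-like arrangement of six Ca atoms, no Cb atoms.
ca = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (6, 3, 0), (3, 3, 0), (0, 3, 0)]
cb = [None] * 6
n_nat = count_native_contacts(ca, cb, motifs=[((0, 2), (3, 5))])
```

Collecting pairs in a set ensures that a contact lying in two overlapping boxes is counted once.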
pfold analysis
For each of the three proteins, a representative sample of conformations in the TS regions, taken from our simulated tempering simulations at the folding temperature (Tf), was selected and subjected to pfold calculations. The number of conformations considered was 16,975, 3889, and 2566 for A54, M54, and B35, respectively, representing 50% (A54) or 10% (M54 and B35) of all saved conformations within the respective TS regions. Following (51), we determined pfold values by carrying out 100 independent simulations at the fixed temperature Tf for each conformation. In these MC simulations, only the BGS update was used, with a = 300 and b = 0.0 (the pivot update was turned off). With this parameter choice, a BGS move corresponds to turning eight consecutive φi and ψi backbone angles by amounts δφ drawn independently from a Gaussian distribution with zero mean and standard deviation 1/√a = 3.4 (48). The acceptance rate was monitored for M54 and found to be around 25%. The runs were terminated when the chain was considered either unfolded (Nnat ≤ N^U_nat) or folded (Nnat ≥ N^N_nat), where N^U_nat and N^N_nat are the respective U and N minima in the free-energy profiles F(Nnat) at Tf (see Fig. 2). For A54, N^U_nat is taken at T = 0.99Tf because no clear minimum for U is present at Tf. The fraction of trajectories that reach N is assigned as the pfold value for the conformation. The number of conformations in the TS region satisfying 0.4 < pfold < 0.6 is 1936, 630, and 283 for A54, M54, and B35, respectively. pfold values were also determined for some conformations outside of the TS region.
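The bookkeeping of a pfold calculation can be sketched with a toy model. In the sketch below, the chain dynamics is replaced by an unbiased one-dimensional random walk in Nnat, which is purely a stand-in chosen so that the example runs on its own; the names `estimate_pfold` and `toy_step` are hypothetical, and only the termination rule and the fraction-of-trajectories definition follow the text.

```python
import random

def estimate_pfold(start, step, n_unfolded, n_folded, n_trials=100, rng=None):
    """Fraction of trial trajectories that reach N (Nnat >= n_folded)
    before U (Nnat <= n_unfolded), starting from the same state each time."""
    rng = rng or random.Random()
    reached_native = 0
    for _ in range(n_trials):
        n = start
        # Run until one of the two termination criteria is met.
        while n_unfolded < n < n_folded:
            n = step(n, rng)
        if n >= n_folded:
            reached_native += 1
    return reached_native / n_trials

def toy_step(n, rng):
    """Stand-in dynamics (assumption): unbiased +/-1 random walk in Nnat."""
    return n + rng.choice((-1, 1))

# Starting midway between the two boundaries, pfold should be close to 0.5.
p_mid = estimate_pfold(50, toy_step, 40, 60, n_trials=2000, rng=random.Random(0))
```

With 100 trials per conformation, as in the text, the statistical uncertainty of a pfold value near 0.5 is about 0.05, which is one reason a window such as 0.4 < pfold < 0.6 is a natural choice.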
RESULTS
Native structures and motifs
Motivated by the empirical observation that small proteins of different folds respond differently to variations in sequence, as shown in Fig. 1, we study the folding of the model sequences A54, M54, and B35 given in Table 1. Using our computational model (40) and simulated tempering Monte Carlo (46,47) (see Materials and Methods), we determined their thermodynamic behaviors over a range of temperatures T (kBT = 0.50–0.70 in model units, where kB is Boltzmann's constant). At low T, all three sequences fold spontaneously into stable, secondary-structure-rich native states. As representative native structures, we use the minimal energy conformations found for each sequence, as shown in Fig. 2, A–C. These structures are not unlike the experimental structures of some real single-domain proteins, e.g., the 3α and 4β + α structures of the A (52) and B (53) domains of protein G and the five-stranded β-barrel of the fyn SH3 domain (54), although there are minor topological differences. For example, β1 and β5 are antiparallel in B35, whereas they are parallel in fyn SH3 (54). B35 also lacks a small 3_10-helix and an extended loop between β-strands 1 and 2 (the so-called RT loop), making B35 shorter than the 56 amino acid fyn SH3. To quantify folding progress, we use, as is typical, the total number of native contacts formed, Nnat. Here, we define a set of native motifs to specify each fold (boxed areas in the contact maps in Fig. 2, A–C). Any contact ij is considered native if it falls within one such motif. This way, Nnat becomes insensitive to minor differences in the specific contacts formed as long as the overall fold is retained. This is desirable because it allows folding progress for both parent and mutant sequences to be assessed on an equal footing. Specifically, the 3α fold of A54 is defined by six motifs corresponding to intra- (α1, α2, and α3) or inter- (α1-α2, α2-α3, and α1-α3) helical contacts.
The β-barrel fold of B35 is defined by four β-hairpin motifs and the nonlocal β1-β5 motif, which “closes” the barrel. The 11 motifs of the 4β + α fold of M54 include four motifs that correspond to the four possible ways of combining the β1-β2 and β3-β4 hairpins into a single β-sheet. We include all four possibilities because our model does not fully discriminate these different tertiary arrangements, as seen from the equilibrium contact probabilities Pij in Fig. 2 B. However, the arrangement found in the native conformation is dominant with
75% of the population at the lowest studied temperature (see Figs. S4 and S5).
Character of the folding transition: Cooperativity, free-energy barriers, and TS shifts
Given the distinct native topologies of our three proteins, we expect differences in their folding. To characterize their folding transitions under similar stability conditions, we first determine their respective folding temperatures, Tf, by considering the temperature dependence of the heat capacity, Cv, and the native state population, Pnat (for the definition of Pnat, see Materials and Methods). Selecting Tf based on either the Cv peak or by fitting Pnat to a two-state model gives similar values, as shown in Fig. 3 A. The two-state fit also provides an estimate of the energy difference ΔE between the U and N states. We obtain ΔE/kBTf = 52, 88, and 54 for A54, M54, and B35, respectively. Although a precise quantitative agreement should not be expected, these values can be compared to experimentally determined enthalpies of unfolding on related proteins. For example, ΔHcal from differential scanning calorimetry measurements on GA (170 kJ/mol) (55), GB (258 kJ/mol) (56), and fyn SH3 (232 kJ/mol) (57) give ΔHcal/RTm = 60, 87, and 81, respectively, where Tm is the (experimental) midpoint and R the gas constant. The somewhat low ΔE for B35 compared to ΔHcal for fyn SH3 is likely due to the relatively shorter length of B35. The rough agreement between ΔE and ΔHcal shows that from the perspective of thermodynamic unfolding (melting) curves, the degree of cooperativity of our model is in line with experiments. However, we find that the apparent cooperativity of the folding transition is highly observable dependent. The chain
FIGURE 3 Folding thermodynamics. (A and B) Temperature dependence of (A) the native state population, Pnat, and (B) the average radius of gyration taken over Cα atoms, ⟨Rg⟩, is shown. Solid lines in (A) are obtained by fitting Pnat to the two-state equation 1/(1 + X), where X = exp[ΔE(1/kBT − 1/kBTf)] and ΔE and Tf are fit parameters. Inset: temperature dependence of the normalized heat capacity Cv/LkB = (⟨E²⟩ − ⟨E⟩²)/(L kB² T²), where L is the chain length, is shown; solid lines are obtained using reweighting techniques (50). The final selected folding temperatures Tf (solid plot symbols) are determined from the Cv peaks and given by kBTf = 0.55, 0.55, and 0.52 for A54, M54, and B35, respectively, whereas the two-state fits give 0.54, 0.55, and 0.51. Dashed lines in (B) are drawn to guide the eye.
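The normalized heat capacity defined in the caption is an energy-fluctuation estimate, which can be sketched for a set of energies sampled at a single temperature. The function name is hypothetical and the sketch uses a direct sample variance; it does not include the reweighting used for the solid lines in the figure.

```python
def heat_capacity_per_residue(energies, kT, length):
    """Cv/(L kB) = (<E^2> - <E>^2) / (L (kB T)^2) from energies sampled at kT.

    energies : sampled potential energies at one temperature (model units)
    kT       : kB*T in the same units
    length   : chain length L
    """
    n = len(energies)
    mean = sum(energies) / n
    variance = sum((e - mean) ** 2 for e in energies) / n
    return variance / (length * kT ** 2)

# Toy sample: two energies with variance 1, chain of length 2, kT = 0.5.
cv_toy = heat_capacity_per_residue([1.0, 3.0], kT=0.5, length=2)  # 1 / (2 * 0.25)
```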
collapse, as measured by the radius of gyration ⟨Rg⟩, occurs over a wider range of temperatures (see Fig. 3 B), and accordingly, the T dependence of ⟨Rg⟩ cannot be described with a two-state model (data not shown). A somewhat reduced cooperativity of our model sequences is also seen from the free-energy profiles F(Nnat) taken at Tf (see Fig. 2, D–F). Whereas M54 and B35 exhibit two clear minima representing U and N, respectively, separated by a small free-energy barrier in the range 0.5–1.0 kBT, the barrier for A54 is very small or even absent. Importantly, the folding cooperativities of our three proteins, as suggested by the barrier heights in the free-energy profiles F(Nnat), have a rank order (B35 > M54 > A54) that follows a corresponding decrease in RCO (0.37 > 0.29 > 0.25). Hence, despite exhibiting a somewhat reduced cooperativity overall, our model reproduces the general trend of topology-dependent protein folding. We also identify a “TS region” along Nnat for each of our three proteins such that it includes most of the TS ensemble conformations. To this end, we first determine the location of the barrier peak, N^‡_nat, in the profiles F(Nnat). We find that N^‡_nat is strongly temperature dependent for both M54 and B35, as shown in Fig. 2, E and F (insets). Reducing T below Tf shifts N^‡_nat to lower values, and conversely, increasing T shifts N^‡_nat to higher values. To the extent that a shift in N^‡_nat reflects a shift in the TS ensemble, this means that the TS becomes more native-like under destabilizing conditions and more unfolded-like under stabilizing conditions. This behavior is in line with the general chemical-denaturant-driven TS shifts (i.e., Hammond behavior) seen in a large-scale analysis of experimental φ-value data on 24 small single-domain proteins (58). We then use the ranges of N^‡_nat values obtained by varying T around Tf to select the TS regions for M54 and B35, as shown in Fig. 2, E and F (insets).
For A54, a specific barrier location N^‡_nat cannot be robustly identified at any T, but there are shifts in U and N for this sequence, too, in a manner similar to M54 and B35. This variation allows us to pick a reasonable TS region for A54 also (see Fig. 2 D, inset), keeping the size of the TS range relative to N^N_nat roughly the same as for M54 and B35. We emphasize that a set of conformations selected on the basis of a free-energy barrier along a one-dimensional order parameter such as Nnat does not, in general, coincide with the actual TS ensemble (59,60). The reason is that folding occurs on a high-dimensional energy landscape and is therefore not always captured by simple diffusion along a one-dimensional parameter (61), although Nnat has been found to work as a reaction coordinate for some Go-type models that ignore non-native interactions (62). Because our analysis below focuses on the “barrier regions” in the F(Nnat) profiles (shaded areas in Fig. 2, D–F), it is important to establish to what extent they represent the actual TS ensembles. To this end, we carry out a pfold analysis (59). The premise of this analysis is that, from a kinetics point of view, a true TS conformation should be equally likely to proceed rapidly to either the
native or the unfolded state. We determine the pfold value, i.e., the probability of proceeding to N before U, for a representative sample of conformations taken from the TS regions, as described in Materials and Methods. In line with previous studies (51), we find that pfold values can be very different even for conformations with similar Nnat. The frequency of “true” TS conformations (0.4 < pfold < 0.6) peaks at around 15–20% within the TS region for each of the three proteins (see Fig. S6). Hence, most of the “true” TS conformations are located within the TS region. Moreover, we compare averages of various observables taken over the two different ensembles (see Fig. S7). Although there are differences, they are relatively small. Hence, although the TS ensembles identified in Fig. 2, D–F are not true TS states, their statistics are reasonable representations of those found using a more rigorous pfold analysis. For notational simplicity, we refer occasionally to conformations taken from the shaded regions in Fig. 2, D–F as TS ensembles.
Local versus nonlocal structure formation and the origin of φ-value dispersion
To quantify the extent to which various structures are formed at different stages of the folding process, we focus on two variables, qI = nI/⟨nI⟩nat and fi = Ni/⟨Ni⟩nat, where nI is the number of contacts formed in native motif I, and Ni is the number of native contacts formed by amino acid position i. Note that Nnat = Σ_I nI = (1/2) Σ_i Ni, so that nI and Ni simply express how the formed native contacts are partitioned over different motifs and positions along the chain, respectively. The normalization constants, ⟨nI⟩nat and ⟨Ni⟩nat, ensure that qI and fi are approximately bounded between 0 and 1. This way, ⟨qI⟩ can be interpreted as a normalized measure of motif stability. Moreover, ⟨fi⟩, with the average taken over the TS ensemble, is a common computational analog of experimental φ-values (see, e.g., (25,63,64)). Stratified by Nnat, ⟨qI⟩ and ⟨fi⟩ exhibit several interesting trends, as shown in Fig. 4. For ⟨qI⟩, we note the following: 1) symmetries in the native structures of A54 and M54 are apparent. In particular, the N- and C-terminal helices in A54 (α1 and α3), which also have identical sequences (see Table 1), exhibit very similar stabilities across all Nnat, and similarly for the N- and C-terminal β-hairpins in M54 (β1-β2 and β3-β4); 2) local motifs, such as α-helices and β-hairpins, become stable at lower Nnat than nonlocal (tertiary) motifs; 3) tertiary structures are the drivers of cooperativity, as reflected by the sigmoidal-like shapes of the nonlocal ⟨qI⟩ curves (red curves). Interestingly, the inflection points in these curves, to the extent they can be determined, coincide with the barrier locations in F(Nnat). Cooperativity and long-range interactions have indeed been linked in previous computational studies (29,65,66). Krobath et al. (29) showed that folding cooperativity is modulated by the strength of interactions between the N- and C-terminal regions, which are often spatially close in two-state proteins. Experiments on consensus ankyrin repeat proteins by Aksel et al. (66) showed that long-range interfacial repeat interactions contribute strongly to cooperativity. Taken together, points 2 and 3 indicate that the TS is associated with the formation of nonlocal, tertiary structures whose stabilities, in turn, depend on at least partially formed constituent local structures. An interesting feature of the ⟨fi⟩ curves in Fig. 4, D–F is the difference in the dispersion of ⟨fi⟩-values between the three proteins, especially in the TS regions. The situation is in line with the large-scale analysis of experimental φ-values
FIGURE 4 Local versus nonlocal contact formation. (A–C) Stabilities qI = nI/⟨nI⟩nat of structural motifs I as functions of Nnat are shown for (A) A54, (B) M54, and (C) B35. Boxes drawn around plot symbols in (A)–(C) indicate a division of the structural motifs into two types: local (solid boxes) and nonlocal (dashed boxes). (D–F) Fractions of native contacts formed fi = Ni/⟨Ni⟩nat for h or p positions, i, are shown as functions of Nnat for (D) A54, (E) M54, and (F) B35. ⟨fi⟩ curves are drawn in dark blue for positions forming nonlocal motifs (i.e., 1–16 and 39–54 for A54, 1–16 and 39–54 for M54, and 1–6 and 29–35 for B35) and dark gray for all other positions. Normalization constants ⟨nI⟩nat and ⟨Ni⟩nat are determined for conformations with Nnat ≥ 140, 130, and 90 for A54, M54, and B35, respectively, corresponding to the high-Nnat end of the respective free-energy profiles in Fig. 2, D–F, where F(Nnat) is >4kBT higher than at the N minimum; qI or fi > 1 states are therefore not statistically relevant. To see this figure in color, go online.
Biophysical Journal 118, 1370–1380, March 24, 2020 1375
Trotter and Wallin
by Naganathan and Muñoz (58). They observed that the dispersion of φ-values was linked to structural class, with the TS of all-β proteins being the most polarized (largest spread in φ) and all-α proteins the least polarized (smallest spread). To understand the origin of this difference, it is instructive to compare the two extreme cases in our study, A54 and B35. Any dispersion in $\phi_i$ must originate from a dispersion in the contact stabilities, $P_{ij}$. In this regard, the two proteins are similar; both A54 and B35 exhibit a large stability gap between contacts $ij$ formed in local versus nonlocal motifs (cf. Fig. 4, A and C). In A54, however, these stability differences do not result in a large dispersion in $\phi_i$ because, in this fold, the fraction of the native contacts that are local, $N_{\mathrm{loc}}/N_{\mathrm{nat}}$, is relatively constant along the chain (see Fig. S8 A). By contrast, in the all-β fold, there are large variations in $N_{\mathrm{loc}}/N_{\mathrm{nat}}$ between amino acid positions (see Fig. S8 C). This variation is mainly due to the fact that β1 and β5 have very few local contacts. As a result, φ-values in β1 and β5 are much lower than those in the central region β2-β4, where the fraction of (more stable) local contacts is higher.

Effects of mutations on the TS are topology dependent

We turn now to the question of how features of the folding energy landscapes of A54, M54, and B35, as quantified by $\langle q_I \rangle$ and $\langle \phi_i \rangle$, are modified by mutations. To examine this question systematically, we first generate all possible single- and double-point mutants involving h ↔ p swaps. As it turns out, there are 1176, 990, and 378 such mutants for A54, M54, and B35, respectively (for example, the 48 h or p positions in A54 give $48 + 48 \times 47/2 = 1176$ mutants). We then determine the thermodynamics of all mutant sequences at the $T_f$ values of their respective parent proteins using multisequence Monte Carlo simulations (see Materials and Methods and (45)).
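The mutant counts quoted above follow from choosing one or two of the h or p positions and swapping each chosen letter. The short Python sketch below illustrates this counting; it is not the authors' code. The function names and the placeholder letter `x` for positions that are neither h nor p are hypothetical. The counts 990 and 378 are consistent with 44 and 27 h or p positions for M54 and B35, but those two numbers are inferred here from the formula and are not stated in the text.

```python
from itertools import combinations


def flip(residue):
    """Swap hydrophobic (h) and polar (p); leave any other letter unchanged."""
    return {"h": "p", "p": "h"}.get(residue, residue)


def hp_mutants(sequence):
    """Return all single- and double-point h<->p variants of `sequence`.

    Only positions holding 'h' or 'p' are mutated. Any other letter
    (here the hypothetical placeholder 'x') is kept fixed.
    """
    sites = [i for i, residue in enumerate(sequence) if residue in ("h", "p")]
    variants = []
    for n_points in (1, 2):
        for chosen in combinations(sites, n_points):
            chars = list(sequence)
            for i in chosen:
                chars[i] = flip(chars[i])
            variants.append("".join(chars))
    return variants


def n_mutants(n_sites):
    """Closed-form count: n singles plus n(n-1)/2 doubles."""
    return n_sites + n_sites * (n_sites - 1) // 2


if __name__ == "__main__":
    for n_sites in (48, 44, 27):
        print(n_sites, n_mutants(n_sites))
```

For a toy sequence such as `"hpxhp"`, which has four mutable positions, `hp_mutants` returns 4 single-point and 6 double-point variants, matching `n_mutants(4)`.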
Based on these simulations, we extract two sequence sets for each protein: a stable set, which satisfies $P_{\mathrm{nat}} > 0.5P^0_{\mathrm{nat}}$, and an isostable set, which satisfies $0.75P^0_{\mathrm{nat}} < P_{\mathrm{nat}} < 1.25P^0_{\mathrm{nat}}$, where $P^0_{\mathrm{nat}}$ is the parent stability. The stable sets contain 50% of A54 and M54 mutants and 16% of B35 mutants, suggesting that the 3α and 4β + α folds are mutationally more robust than the β-barrel. The isostable sets contain 25% of A54 and M54 mutants and 6% of B35 mutants. Next, we use the isostable sets to examine the magnitude of mutation-induced shifts $\Delta\langle q_I \rangle = \langle q_I \rangle - \langle q_I \rangle_0$ and $\Delta\langle \phi_i \rangle = \langle \phi_i \rangle - \langle \phi_i \rangle_0$, where the average $\langle \cdot \rangle_0$ refers to a parent sequence. Taken over the TS ensemble, the shifts $\Delta\langle q_I \rangle$ for either local or nonlocal motifs can be ranked in size as A54 > M54 > B35, as shown in Fig. 5 A. For example, there are many isostable A54 mutants with $|\Delta\langle q_I \rangle| > 0.1$, especially for nonlocal motifs, but very few B35 mutants have $|\Delta\langle q_I \rangle| > 0.05$, and none have $>0.1$. There is a similar trend for the shifts $\Delta\langle \phi_i \rangle$. More positions $i$ exhibit small $\langle \phi_i \rangle$ shifts for B35 than for A54 or M54
FIGURE 5 Effects of sequence variations on the TS ensemble. (A) The fraction of sequences in the isostable set as a function of the magnitude of the stability shift $|\Delta q_I|$ for nonlocal native motifs $I$ is shown. Inset: results for local motifs are shown; $\Delta q_I$ is here the summed shift over all local motifs. For the division of motifs into local and nonlocal types, see Fig. 4. (B) The fraction of amino acid positions $i$ as a function of the $\phi_i$-value shift $|\Delta\langle \phi_i \rangle|$ is shown, averaged over all $M$ sequences in the isostable set. All $\Delta q_I$ and $\Delta\langle \phi_i \rangle$ quantities are calculated over the respective TS regions in Fig. 2, D–F. To see this figure in color, go online.
(see Fig. 5 B). Hence, $\phi_i$-values are on average more conserved for the β-barrel protein B35 than for the all-α and mixed α/β proteins, A54 and M54. Interestingly, when examining the shifts $\Delta\langle q_I \rangle$ and $\Delta\langle \phi_i \rangle$ in the unfolded state, we find no such trend (see Fig. S9).

Conformational fluctuations and mutation-induced shifts are linked

Are there features of the folding of A54, M54, and B35 that can explain their distinct responses to mutations? To address this question, we note that for an observable $X$ and a mutation $s \to s'$, the shift $\Delta\langle X \rangle = \langle X \rangle_{s'} - \langle X \rangle_s$ can be written

$$
\Delta\langle X \rangle = \frac{\langle Xw \rangle_s - \langle X \rangle_s \langle w \rangle_s}{\langle w \rangle_s}, \tag{3}
$$

where $w = \exp(-\beta \Delta E)$ and $\Delta E = E_{s'} - E_s$, which follows from the Zwanzig-type formula $\langle X \rangle_{s'} = \langle Xw \rangle_s/\langle w \rangle_s$. The quantity $w$ indicates whether $s \to s'$ is energy neutral ($w = 1$), stabilizing ($w > 1$), or destabilizing ($w < 1$). Equation 3 describes reasonably well the observed shifts in $q_I$ and $\phi_i$ from the isostable single- and double-point mutations, as shown in Fig. 6, A and B. Moreover, because the numerator of Eq. 3 is the covariance $\langle \delta X\,\delta w \rangle_s$, Eq. 3 implies that the magnitude of the mutational shift $|\Delta\langle X \rangle|$ is controlled by the size of the "fluctuations" $\delta X = X - \langle X \rangle$ and $\delta w = w - \langle w \rangle$, i.e., $\sigma_X^2 = \langle \delta X^2 \rangle$ and $\sigma_w^2 = \langle \delta w^2 \rangle$, and the extent to which $\delta X$ and $\delta w$ are correlated. To delineate the impact of these different factors, we express the quantity in Eq. 3 as $(1/\langle w \rangle)\sigma_w\sigma_X\cos\alpha$, where $\alpha$ is the "angle" between the $X$, $w$ fluctuations satisfying $-\pi/2 \leq \alpha \leq \pi/2$. We find that the size of (nonlocal) $q_I$ fluctuations, taken over TS conformations, is $\sigma_{q_I} = 0.29$, 0.36, and 0.24 for A54, M54, and B35, respectively, suggesting the all-β B35 protein has the most conformationally restricted TS. Similarly, averaged over all isostable sequences, the factor $\cos\alpha$ (i.e., the $\delta q_I$, $\delta w$ correlation
Topology and Sequence Effects in Folding
FIGURE 6 Linking mutational effects and conformational properties. (A and B) The observed mutational shifts of (A) the stability of nonlocal motif $I$, $\Delta\langle q_I \rangle$, and (B) the $\phi_i$-value of a central chain position $i$, $\Delta\langle \phi_i \rangle$, for all sequences in the isostable set are compared with the values obtained by applying $(1/\langle w \rangle)\sigma_w\sigma_X\cos\alpha$ (see Eq. 3) with $X = q_I$ or $\phi_i$ to the parent protein and the corresponding mutations ($s$ = parent sequence and $s'$ = sequence variant); note that the sizes of fluctuations $\sigma_{q_I}$ and $\sigma_{\phi_i}$ are properties of the parent sequence alone, whereas $w$, $\sigma_w$, and $\cos\alpha$ depend on which isostable sequence (mutation) is considered. (C and D) The fraction of sequences in the isostable set is shown as a function of the following quantities: (C) $\sigma_{q_I}\cos\alpha$, (C, inset) $\sigma_w/\langle w \rangle$, and (D, inset) the $\phi_i$, $w$ correlation coefficient $\cos\alpha$. (D) The fraction of amino acid positions $i$ as a function of $\sigma_{\phi_i}$ is shown. All quantities are calculated over the respective TS regions in Fig. 2, D–F. To see this figure in color, go online.
coefficient) is 0.015, 0.010, and 0.008, respectively. Hence, the mutational-energetic fluctuations $\delta w$ are most aligned with $q_I$ fluctuations in the all-α protein, even though the correlations are very small overall. By contrast, the overall size of the energetic fluctuations, $\sigma_w/\langle w \rangle$, is similar across the folds, as seen in Fig. 6 C (inset). Mutationally induced shifts to $\langle q_I \rangle$ should therefore be described by the remaining factors $\sigma_{q_I}\cos\alpha$, which is indeed the case, as seen in Fig. 6 C (cf. Fig. 5 A). Interestingly, for $\phi_i$, the factor $\cos\alpha$ (i.e., the $\delta\phi_i$, $\delta w$ correlation) is similar across the folds (see Fig. 6 D, inset). The factor $\sigma_w/\langle w \rangle$ is, of course, the same for $\phi_i$ because it is observable independent. Therefore, $\sigma_{\phi_i}$ alone captures many of the observed trends in the shifts of $\langle \phi_i \rangle$ values, as seen in Fig. 6 D (cf. Fig. 5 B). In particular, this means that the underlying distribution $P(\phi_i)$, through its width $\sigma_{\phi_i}$, is a factor in controlling the rate of divergence of $\langle \phi_i \rangle$ as sequence changes accumulate.

DISCUSSION

We have used a coarse-grained model (40) and enhanced sampling techniques (45) to systematically study how variations in the amino acid sequence affect the folding of three topologically distinct proteins. We have found that the energy landscapes are perturbed in a fold-dependent manner. In particular, mutation-induced shifts in the TS average of two structural parameters, $\langle q_I \rangle$ and $\langle \phi_i \rangle$, tend to be smaller for the β-barrel protein than for the all-α and mixed α/β proteins, in agreement with the overall trend from experimental folding studies on structurally homologous proteins (30,31). Our analysis shows further that the malleability of the TS ensemble to mutational effects can be understood from features of the underlying conformational ensemble. We have found that mutation-induced shifts of the TS average of structural parameters such as $\langle q_I \rangle$ and $\langle \phi_i \rangle$ are modulated by two factors: 1) the "width" of the ensemble (i.e., $\sigma_{q_I}$ or $\sigma_{\phi_i}$) and 2) the correlation between the structural parameter ($q_I$ or $\phi_i$) and a factor that we denote $w$ (see Eq. 3), describing the energetic response of the mutation. As a result of point (1), many all-β proteins, which typically have conformationally restricted TSs (24) and thus smaller "fluctuations" in relevant structural parameters, will exhibit a reduced effect from (stability-neutral) sequence changes. By contrast, all-α proteins with typically wider TS ensembles will be more sensitive to mutational effects. We stress that these trends do not prevent individual mutations on the same protein from having very different impacts on folding. Indeed, the relative size of the fluctuations $\delta w$ (i.e., $\sigma_w/\langle w \rangle$) exhibits large mutation-to-mutation dispersion (Fig. 6 C). Perhaps surprisingly, however, we find that $\sigma_w/\langle w \rangle$, averaged over many mutant sequences, does not depend strongly on topology.
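The two factors discussed above, together with the energetic term $\sigma_w/\langle w \rangle$, come from rewriting the covariance in Eq. 3 as a product of two widths and a correlation coefficient. The following Python sketch checks that identity on samples from a parent ensemble. It is a minimal illustration under stated assumptions and not the authors' analysis code: the function names, the default `beta=1.0`, and the use of plain population (divide-by-N) averages are all choices made here.

```python
import math


def mean(values):
    """Plain arithmetic mean over sampled conformations."""
    return sum(values) / len(values)


def reweighted_average(x, delta_e, beta=1.0):
    """Zwanzig-type estimate <X>_{s'} = <X w>_s / <w>_s with w = exp(-beta*dE)."""
    w = [math.exp(-beta * de) for de in delta_e]
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


def mutational_shift(x, delta_e, beta=1.0):
    """Evaluate Eq. 3 and its factors from parent-ensemble samples.

    x       : value of the observable X in each sampled conformation
    delta_e : E_{s'} - E_s for the same conformations
    Returns (shift, (sigma_w / <w>, sigma_X, cos_alpha)).
    """
    w = [math.exp(-beta * de) for de in delta_e]
    x_mean, w_mean = mean(x), mean(w)
    # Numerator of Eq. 3: <Xw> - <X><w> = <dX dw>
    cov = mean([(xi - x_mean) * (wi - w_mean) for xi, wi in zip(x, w)])
    sigma_x = math.sqrt(mean([(xi - x_mean) ** 2 for xi in x]))
    sigma_w = math.sqrt(mean([(wi - w_mean) ** 2 for wi in w]))
    if sigma_x == 0.0 or sigma_w == 0.0:
        cos_alpha = 0.0  # no fluctuations, so the correlation is undefined
    else:
        cos_alpha = cov / (sigma_x * sigma_w)
    return cov / w_mean, (sigma_w / w_mean, sigma_x, cos_alpha)
```

As a hand-checkable case, two conformations with X = 0 and X = 1 and energy changes 0 and -ln 3 (with beta = 1) give w = 1 and 3, so the reweighted average is 0.75 against a parent average of 0.5. The shift of 0.25 equals the product of the three returned factors (0.5, 0.5, and 1).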
We note in this context that the addition of statistical (random) perturbations to either native (25) or non-native (26) contact energy parameters in a structure-based Cα model led to a fold-specific response akin to our results, i.e., low-RCO proteins are more sensitive to energetic perturbations (25). Here, we have shown that systematically chosen (stability-neutral) mutations produce, on average, similar energetic influences across folds. It is therefore not the size of the fluctuations $\delta w$ that underlies the different mutational sensitivities of folds but rather their "alignment" with conformational fluctuations (point 2). For example, we find, on average, lower correlations between $w$ and (nonlocal) $q_I$ in the TS of the β-barrel protein compared to those in the all-α and mixed α/β proteins, still further reducing the impact of mutations on the folding of the β-barrel protein. Previous studies have indicated that non-native interactions may play a relatively larger role for topologically simple proteins such as α-helix bundles (26,67). Consistent with these reports, we find relatively fewer non-native contacts in the TS of the β-barrel protein compared to the mixed α/β and all-α proteins (see Fig. S10), suggesting that non-native interactions may play an accelerating role in the sequence-driven divergence of folding mechanisms. We note, however, that important non-native interactions have been experimentally demonstrated in the folding of both the Fyn SH3 domain (68,69) (β-barrel) and α-helix bundle proteins (32,70). The
relationship between non-native interactions and divergence of folding mechanisms will need to be further explored.

Finally, our results imply that pathway heterogeneity in folding might be linked to a larger divergence in the folding behavior driven by accumulation of sequence changes. This lends support to a previous speculation by Davidson and Zarrine-Afsar (30). Indeed, heterogeneity in folding trajectories has been observed in small α-helical proteins (71,72). More specifically, the tendency for φ-values to diverge within structurally homologous proteins should be controlled by the underlying distribution $P(\phi)$, a quantity not easily accessible experimentally. However, this assertion might be testable by comparing measured φ-value data with distributions $P(\phi)$ extracted using statistical-mechanical-based models, such as the Wako-Saitô-Muñoz-Eaton model (73).

CONCLUSIONS

We have shown that the observed mutational robustness in the folding of some two-state proteins, such as SH3 domains, originates in part from the conformationally restricted nature of their TS ensembles. Because the TS ensembles of two-state proteins are determined largely by native state topology, effects of topology and sequence in protein folding cannot be separated. Experiments on proteins with the same fold but divergent sequences have provided crucial insights into protein folding despite being restricted to relatively few sequences (30,31). Here, we have demonstrated how systematically probing the relationship between sequence and folding energy landscape using a novel, to our knowledge, generalized-ensemble sampling scheme (45) can uncover novel principles. In this regard, recent developments in techniques for large-scale protein design (74,75) provide exciting new avenues for additional insights into folding and, by extension, perhaps also into the mutational processes that drive proteins to abruptly switch into entirely different folds (42,76,77).
SUPPORTING MATERIAL

Supporting Material can be found online at https://doi.org/10.1016/j.bpj.2020.01.020.

AUTHOR CONTRIBUTIONS

S.W. designed the research. D.T. and S.W. carried out the simulations and analyzed the data. S.W. and D.T. wrote the article.

ACKNOWLEDGMENTS

We thank Tobin Sosnick and Hue Sun Chan for useful discussions. This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada and by the computational resources provided by Compute Canada.

REFERENCES

1. Jackson, S. E., and A. R. Fersht. 1991. Folding of chymotrypsin inhibitor 2. 1. Evidence for a two-state transition. Biochemistry. 30:10428–10435.
2. Fersht, A. R., A. Matouschek, and L. Serrano. 1992. The folding of an enzyme. I. Theory of protein engineering analysis of stability and pathway of protein folding. J. Mol. Biol. 224:771–782.
3. Grantcharova, V. P., and D. Baker. 1997. Folding dynamics of the src SH3 domain. Biochemistry. 36:15685–15692.
4. van Nuland, N. A., F. Chiti, ..., C. M. Dobson. 1998. Slow folding of muscle acylphosphatase in the absence of intermediates. J. Mol. Biol. 283:883–891.
5. Kim, Y. E., M. S. Hipp, ..., F. U. Hartl. 2013. Molecular chaperone functions in protein folding and proteostasis. Annu. Rev. Biochem. 82:323–355.
6. Jackson, S. E. 1998. How do small single-domain proteins fold? Fold. Des. 3:R81–R91.
7. Bryngelson, J. D., and P. G. Wolynes. 1987. Spin glasses and the statistical mechanics of protein folding. Proc. Natl. Acad. Sci. USA. 84:7524–7528.
8. Sali, A., E. Shakhnovich, and M. Karplus. 1994. How does a protein fold? Nature. 369:248–251.
9. Dill, K. A., and H. S. Chan. 1997. From Levinthal to pathways to funnels. Nat. Struct. Biol. 4:10–19.
10. Sosnick, T. R., and D. Barrick. 2011. The folding of single domain proteins–have we reached a consensus? Curr. Opin. Struct. Biol. 21:12–24.
11. Abkevich, V. I., A. M. Gutin, and E. I. Shakhnovich. 1994. Specific nucleus as the transition state for protein folding: evidence from the lattice model. Biochemistry. 33:10026–10036.
12. Mirny, L. A., and E. I. Shakhnovich. 1999. Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. J. Mol. Biol. 291:177–196.
13. Plaxco, K. W., K. T. Simons, and D. Baker. 1998. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277:985–994.
14. Gromiha, M. M., and S. Selvaraj. 2001. Comparison between long-range interactions and contact order in determining the folding rate of two-state proteins: application of long-range order to folding rate prediction. J. Mol. Biol. 310:27–32.
15. Zhou, H., and Y. Zhou. 2002. Folding rate prediction using total contact distance. Biophys. J. 82:458–463.
16. Naganathan, A. N., and V. Muñoz. 2005. Scaling of folding times with protein size. J. Am. Chem. Soc. 127:480–481.
17. Istomin, A. Y., D. J. Jacobs, and D. R. Livesay. 2007. On the role of structural class of a protein with two-state folding kinetics in determining correlations between its size, topology, and folding rate. Protein Sci. 16:2564–2569.
18. Henry, E. R., R. B. Best, and W. A. Eaton. 2013. Comparing a simple theoretical model for protein folding with all-atom molecular dynamics simulations. Proc. Natl. Acad. Sci. USA. 110:17880–17885.
19. Jacobs, W. M., and E. I. Shakhnovich. 2018. Accurate protein-folding transition-path statistics from a simple free-energy landscape. J. Phys. Chem. B. 122:11126–11136.
20. Mirny, L., and E. Shakhnovich. 2001. Protein folding theory: from lattice to all-atom models. Annu. Rev. Biophys. Biomol. Struct. 30:361–396.
21. Jewett, A. I., V. S. Pande, and K. W. Plaxco. 2003. Cooperativity, smooth energy landscapes and the origins of topology-dependent protein folding rates. J. Mol. Biol. 326:247–253.
22. Chavez, L. L., J. N. Onuchic, and C. Clementi. 2004. Quantifying the roughness on the free energy landscape: entropic bottlenecks and protein folding rates. J. Am. Chem. Soc. 126:8426–8432.
23. Wallin, S., and H. S. Chan. 2005. A critical assessment of the topomer search model of protein folding using a continuum explicit-chain model with extensive conformational sampling. Protein Sci. 14:1643–1660.
24. Wallin, S., and H. S. Chan. 2006. Conformational entropic barriers in topology-dependent protein folding: perspectives from a simple native-centric polymer model. J. Phys. Condens. Matter. 18:S307.
25. Cho, S. S., Y. Levy, and P. G. Wolynes. 2009. Quantitative criteria for native energetic heterogeneity influences in the prediction of protein folding kinetics. Proc. Natl. Acad. Sci. USA. 106:434–439.
26. Kluber, A., T. A. Burt, and C. Clementi. 2018. Size and topology modulate the effects of frustration in protein folding. Proc. Natl. Acad. Sci. USA. 115:9234–9239.
27. Faísca, P. F., M. M. Telo Da Gama, and R. C. Ball. 2004. Folding and form: insights from lattice simulations. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 69:051917.
28. Faísca, P. F., R. D. Travasso, ..., A. Rey. 2012. Why do protein folding rates correlate with metrics of native topology? PLoS One. 7:e35599.
29. Krobath, H., A. Rey, and P. F. Faísca. 2015. How determinant is N-terminal to C-terminal coupling for protein folding? Phys. Chem. Chem. Phys. 17:3512–3524.
30. Zarrine-Afsar, A., S. M. Larson, and A. R. Davidson. 2005. The family feud: do proteins with similar structures fold via the same pathway? Curr. Opin. Struct. Biol. 15:42–49.
31. Nickson, A. A., and J. Clarke. 2010. What lessons can be learned from studying the folding of homologous proteins? Methods. 52:38–50.
32. Ferguson, N., A. P. Capaldi, ..., S. E. Radford. 1999. Rapid folding with and without populated intermediates in the homologous four-helix proteins Im7 and Im9. J. Mol. Biol. 286:1597–1608.
33. Chen, T., and H. S. Chan. 2015. Native contact density and nonnative hydrophobic effects in the folding of bacterial immunity proteins. PLoS Comput. Biol. 11:e1004260.
34. Friel, C. T., D. A. Smith, ..., S. E. Radford. 2009. The mechanism of folding of Im7 reveals competition between functional and kinetic evolutionary constraints. Nat. Struct. Mol. Biol. 16:318–324.
35. Martínez, J. C., and L. Serrano. 1999. The folding transition state between SH3 domains is conformationally restricted and evolutionarily conserved. Nat. Struct. Biol. 6:1010–1016.
36. Riddle, D. S., V. P. Grantcharova, ..., D. Baker. 1999. Experiment and theory highlight role of native state topology in SH3 folding. Nat. Struct. Biol. 6:1016–1024.
37. Northey, J. G., A. A. Di Nardo, and A. R. Davidson. 2002. Hydrophobic core packing in the SH3 domain folding transition state. Nat. Struct. Biol. 9:126–130.
38. Troilo, F., D. Bonetti, ..., S. Gianni. 2018. Folding mechanism of the SH3 domain from Grb2. J. Phys. Chem. B. 122:11166–11173.
39. Giri, R., A. Morrone, ..., S. Gianni. 2012. Folding pathways of proteins with increasing degree of sequence identities but different structure and function. Proc. Natl. Acad. Sci. USA. 109:17772–17776.
40. Bhattacherjee, A., and S. Wallin. 2012. Coupled folding-binding in a hydrophobic/polar protein model: impact of synergistic folding and disordered flanks. Biophys. J. 102:569–578.
41. Holzgräfe, C., and S. Wallin. 2014. Smooth functional transition along a mutational pathway with an abrupt protein fold switch. Biophys. J. 107:1217–1225.
42. Holzgräfe, C., and S. Wallin. 2015. Local versus global fold switching in protein evolution: insight from a three-letter continuous model. Phys. Biol. 12:026002.
43. Irbäck, A., F. Sjunnesson, and S. Wallin. 2000. Three-helix-bundle protein in a Ramachandran model. Proc. Natl. Acad. Sci. USA. 97:13614–13618.
44. Hill, R. B., D. P. Raleigh, ..., W. F. DeGrado. 2000. De novo design of helical bundles as models for understanding protein folding and function. Acc. Chem. Res. 33:745–754.
45. Aina, A., and S. Wallin. 2017. Multisequence algorithm for coarse-grained biomolecular simulations: exploring the sequence-structure relationship of proteins. J. Chem. Phys. 147:095102.
46. Marinari, E., and G. Parisi. 1992. Simulated tempering: a new Monte Carlo scheme. Europhys. Lett. 19:451–458.
47. Lyubartsev, A. P., A. A. Martsinovski, ..., P. N. Vorontsov-Velyaminov. 1992. New approach to Monte Carlo calculation of the free energy: method of expanded ensembles. J. Chem. Phys. 96:1776–1783.
48. Favrin, G., A. Irbäck, and F. Sjunnesson. 2001. Monte Carlo update for chain molecules: biased Gaussian steps in torsional space. J. Chem. Phys. 114:8154–8158.
49. Favrin, G., A. Irbäck, and S. Wallin. 2002. Folding of a small helical protein using hydrogen bonds and hydrophobicity forces. Proteins. 47:99–105.
50. Shirts, M. R., and J. D. Chodera. 2008. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129:124105.
51. Ding, F., W. Guo, ..., J. E. Shea. 2005. Reconstruction of the src-SH3 protein domain transition state ensemble using multiscale molecular dynamics simulations. J. Mol. Biol. 350:1035–1050.
52. Kraulis, P. J., P. Jonasson, ..., J. Kördel. 1996. The serum albumin-binding domain of streptococcal protein G is a three-helical bundle: a heteronuclear NMR study. FEBS Lett. 378:190–194.
53. Gronenborn, A. M., D. R. Filpula, ..., G. M. Clore. 1991. A novel, highly stable fold of the immunoglobulin binding domain of streptococcal protein G. Science. 253:657–661.
54. Noble, M. E., A. Musacchio, ..., R. K. Wierenga. 1993. Crystal structure of the SH3 domain in human Fyn; comparison of the three-dimensional structures of SH3 domains in tyrosine kinases and spectrin. EMBO J. 12:2617–2624.
55. Rozak, D. A., J. Orban, and P. N. Bryan. 2005. G148-GA3: a streptococcal virulence module with atypical thermodynamics of folding optimally binds human serum albumin at physiological temperatures. Biochim. Biophys. Acta. 1753:226–233.
56. Alexander, P., S. Fahnestock, ..., P. Bryan. 1992. Thermodynamic analysis of the folding of the streptococcal protein G IgG-binding domains B1 and B2: why small proteins tend to have high denaturation temperatures. Biochemistry. 31:3597–3603.
57. Schweiker, K. L., A. Zarrine-Afsar, ..., G. I. Makhatadze. 2007. Computational design of the Fyn SH3 domain with increased stability through optimization of surface charge-charge interactions. Protein Sci. 16:2694–2702.
58. Naganathan, A. N., and V. Muñoz. 2010. Insights into protein folding mechanisms from large scale analysis of mutational effects. Proc. Natl. Acad. Sci. USA. 107:8611–8616.
59. Du, R., V. S. Pande, ..., E. S. Shakhnovich. 1998. On the transition coordinate for protein folding. J. Chem. Phys. 108:334–350.
60. Hummer, G. 2004. From transition paths to transition states and rate coefficients. J. Chem. Phys. 120:516–523.
61. Neupane, K., A. P. Manuel, and M. T. Woodside. 2016. Protein folding trajectories can be described quantitatively by one-dimensional diffusion over measured energy landscapes. Nat. Phys. 12:700–704.
62. Best, R. B., and G. Hummer. 2010. Coordinate-dependent diffusion in protein folding. Proc. Natl. Acad. Sci. USA. 107:1088–1093.
63. Best, R. B., and G. Hummer. 2016. Microscopic interpretation of folding φ-values using the transition path ensemble. Proc. Natl. Acad. Sci. USA. 113:3263–3268.
64. Yang, J. S., S. Wallin, and E. I. Shakhnovich. 2008. Universality and diversity of folding mechanics for three-helix bundle proteins. Proc. Natl. Acad. Sci. USA. 105:895–900.
65. Badasyan, A., Z. Liu, and H. S. Chan. 2009. Interplaying roles of native topology and chain length in marginally cooperative and noncooperative folding of small protein fragment. Int. J. Quantum Chem. 109:3482–3499.
66. Aksel, T., A. Majumdar, and D. Barrick. 2011. The contribution of entropy, enthalpy, and hydrophobic desolvation to cooperativity in repeat-protein folding. Structure. 19:349–360.
67. Faísca, P. F., A. Nunes, ..., E. I. Shakhnovich. 2010. Non-native interactions play an effective role in protein folding dynamics. Protein Sci. 19:2196–2209.
68. Zarrine-Afsar, A., S. Wallin, ..., H. S. Chan. 2008. Theoretical and experimental demonstration of the importance of specific nonnative interactions in protein folding. Proc. Natl. Acad. Sci. USA. 105:9999–10004.
69. Neudecker, P., P. Robustelli, ..., L. E. Kay. 2012. Structure of an intermediate state in protein folding and aggregation. Science. 336:362–366.
70. Chung, H. S., S. Piana-Agostinetti, ..., W. A. Eaton. 2015. Structural origin of slow diffusion in protein folding. Science. 349:1504–1510.
71. Otosu, T., K. Ishii, ..., T. Tahara. 2017. Highly heterogeneous nature of the native and unfolded states of the B domain of protein A revealed by two-dimensional fluorescence lifetime correlation spectroscopy. J. Phys. Chem. B. 121:5463–5473.
72. Nagarajan, S., S. Xiao, ..., R. B. Dyer. 2018. Heterogeneity in the folding of villin headpiece subdomain HP36. J. Phys. Chem. B. 122:11640–11648.
73. Gopi, S., S. Paul, ..., A. N. Naganathan. 2018. Extracting the hidden distributions underlying the mean transition state structures in protein folding. J. Phys. Chem. Lett. 9:1771–1777.
74. Tian, P., J. M. Louis, ..., R. B. Best. 2018. Co-evolutionary fitness landscapes for sequence design. Angew. Chem. Int. Ed. Engl. 57:5674–5678.
75. Rocklin, G. J., T. M. Chidyausiku, ..., D. Baker. 2017. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science. 357:168–175.
76. Bryan, P. N., and J. Orban. 2010. Proteins that switch folds. Curr. Opin. Struct. Biol. 20:482–488.
77. Sikosek, T., H. Krobath, and H. S. Chan. 2016. Theoretical insights into the biophysics of protein bi-stability and evolutionary switches. PLoS Comput. Biol. 12:e1004960.
78. Northey, J. G., K. L. Maxwell, and A. R. Davidson. 2002. Protein folding kinetics beyond the phi value: using multiple amino acid substitutions to investigate the structure of the SH3 domain folding transition state. J. Mol. Biol. 320:389–402.