Hydrophobic Core Formation and Dehydration in Protein Folding Studied by Generalized-Ensemble Simulations

Hydrophobic Core Formation and Dehydration in Protein Folding Studied by Generalized-Ensemble Simulations

Biophysical Journal Volume 99 September 2010 1637–1644 1637 Hydrophobic Core Formation and Dehydration in Protein Folding Studied by Generalized-Ens...

746KB Sizes 0 Downloads 29 Views

Biophysical Journal Volume 99 September 2010 1637–1644

1637

Hydrophobic Core Formation and Dehydration in Protein Folding Studied by Generalized-Ensemble Simulations Takao Yoda,†‡* Yuji Sugita,‡§ and Yuko Okamoto{ †

Nagahama Institute of Bio-Science and Technology, Tamura, Nagahama, Shiga, Japan; ‡CREST, Japan Science and Technology Agency, Kawaguchi, Saitama, Japan; §RIKEN Advanced Science Institute, Wako, Saitama, Japan; and {Graduate School of Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi, Japan

ABSTRACT Despite its small size, chicken villin headpiece subdomain HP36 folds into the native structure with a stable hydrophobic core within several microseconds. How such a small protein keeps up its conformational stability and fast folding in solution is an important issue for understanding molecular mechanisms of protein folding. In this study, we performed multicanonical replica-exchange simulations of HP36 in explicit water, starting from a fully extended conformation. We observed at least five events of HP36 folding into nativelike conformations. The smallest backbone root mean-square deviation from the crystal ˚ . In the nativelike conformations, the stably formed hydrophobic core was fully dehydrated. Statistical analstructure was 1.1 A yses of the simulation trajectories show the following sequential events in folding of HP36: 1), Helix 3 is formed at the earliest stage; 2), the backbone and the side chains near the loop between Helices 2 and 3 take nativelike conformations; and 3), the side-chain packing at the hydrophobic core and the dehydration of the core side chains take place simultaneously at the later stage of folding. This sequence suggests that the initial folding nucleus is not necessarily the same as the hydrophobic core, consistent with a recent experimental f-value analysis.

INTRODUCTION Protein is a biological macromolecule by which the efficiency and specificity of most physiological phenomena are realized. Because of the variety of tertiary structures of proteins, spontaneous folding of proteins is essential for organisms. Otherwise, a cell organelle would be required that could force thousands of different proteins to fold into their native conformations. In the folding of soluble protein domains, hydrophobic interaction is an important driving force. To ensure that the hydrophobic core stays buried in folded protein molecules, there should be a lower limit of protein chain length (1). Thus, the typical chain length of domain structure, ranging from 50 to 200 residues (2), should be the result of foldability requirements of protein domains, as well as their physiological functions. However, there are a few subsequences of proteins, or miniproteins, that are shorter than the typical chain length of protein domains and known to fold spontaneously to their native conformations. Villin headpiece subdomains HP35 and HP36 are well-known examples of such miniproteins (3). Because of their small size and fast folding (4–6), these miniproteins have received attention from many researchers. HP35 has a sequence that corresponds to the C-terminal 35 residues of the headpiece domain (HP) of an actin-binding protein villin. HP35 folds into its stable native conformation with three a-helices (7). A hydrophobic core that contains three phenylalanine residues is surrounded by the helices. HP35 is one of the smallest proteins

Submitted January 29, 2010, and accepted for publication June 22, 2010. *Correspondence: [email protected]

that fold into the native structure with the hydrophobic core located inside the molecule. HP36 is the 36-residue polypeptide whose primary sequence is the same as HP35 except for an additional N-terminal methionine. The native structure, thermodynamic stability, and (un)folding kinetics of these two miniproteins have been studied experimentally (8–18). A number of simulation studies of HP35 and HP36, including some studies in explicit solvent (19–23), have also been reported (19–29). We have performed multicanonical replica-exchange (MUCAREM) (30–32) molecular dynamics (MD) simulations of HP36 with explicit water molecules. MUCAREM MD is a combination of multicanonical algorithm (33) and replica-exchange molecular dynamics (REMD) (34), the latter of which is the MD version of the replica-exchange method (35). The initial unfolded conformation of the protein was prepared without using information about the native structure. The CHARMM22 force field (36) with CMAP correction (37,38) was used for the protein molecule, and explicit TIP3P (39) water molecules were included in the simulated system. The total length of the production runs was 2.28 ms. MUCAREM allowed us to sample a wide range of conformations of HP36, including nativelike conformations, within the 2-ms production run (40–42), which is faster than the experimental folding time. METHODS AND SIMULATION PROCEDURES For efficient sampling of protein conformations, we adopted MUCAREM (30–32). The amino acid sequence of HP36 is MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKG LF. The N- and C-termini were not capped. After a 300-step

Editor: Gregory A. Voth. Ó 2010 by the Biophysical Society 0006-3495/10/09/1637/8 $2.00

doi: 10.1016/j.bpj.2010.06.045

1638

steepest-descent minimization, we first performed an REMD simulation (34) with a distance-dependent dielectric constant with 90 replicas and temperatures ranging from 250 K to 2000 K. The initial conformation of HP36 for all 90 replicas was the fully extended conformation that was prepared using MOE software. An unfolded conformation without helices was arbitrarily chosen from a high-temperature replica, and this was used as the initial conformation of HP36 for the succeeding simulations with explicit water solvent. After soaking the protein molecule in a water ˚ radius and performing a minimization, we sphere of 30-A performed a 50-ps MD simulation for equilibration and a 100-ps REMD simulation with 128 replicas at temperatures ranging from 250 K to 700 K. Based on the results of the REMD, we updated the temperature distributions to obtain uniform acceptance ratios of replica exchange and again performed a 50-ps equilibration and a 100-ps REMD. We prepared the first multicanonical weight factors based on the second REMD results and started MUCAREM simulations. The number of replicas in the MUCAREM simulation was 32 initially (for 3.6 ns), and then was reduced to 8. We updated the multicanonical weight factors many times during the 35.5-ns (per replica) MUCAREM that was followed by a 140.9-ns (per replica) production run (henceforth referred to as MUCAREM1). Hence, the total simulation time for MUCAREM1 is 140.9 ns  8 ¼ 1.127 ms. Note that the number of replicas required for MUCAREM was only 8, whereas for REMD as many as 128 replicas were required. The second production run was also a MUCAREM MD with eight replicas. The initial conformation of HP36 was the same as that described above. After soaking it in explicit water and applying an energy minimization, we performed a 3-ns conventional MD simulation at 699 K. The final configuration was then used to start the next eight-replica MUCAREM MD for 154.6 ns. Here, the first multicanonical weight factors were prepared based on the MUCAREM1 data. Thus, we did not need to perform an additional REMD for weight preparations. The final 144.6-ns trajectories (per replica) of the MUCAREM (henceforth referred to as MUCAREM2) were used for the data analysis. The total simulation time for MUCAREM2 was 144.6 ns  8 ¼ 1.157 ms. We note that during the two production runs (MUCAREM1 and MUCAREM2), we further updated the multicanonical weight factors several times. We used CHARMM22 force field (36) with CMAP backbone-torsion corrections (37,38) for the protein molecule and TIP3P (39) for the water model. The number of water molecules in the simulated system was 3513. The shape of ˚ ) was kept by a harmonic conthe water sphere (radius 30 A straining potential. Atoms in the protein molecule were kept ˚ from the center of the sphere by another conwithin 27 A straining potential. We also kept the center of mass of the HP36 molecule at the center of the water sphere during the simulations. The computer code developed previously Biophysical Journal 99(5) 1637–1644

Yoda et al.

(30,34,43,44), which is based on version 2 of PRESTO (45), was used after modification for calculation with CHARMM22 þ CMAP force field. The single MD time step was 1.0 fs. The temperature during the MD simulations was controlled by the constraint method (46). During the simulations with explicit water solvent, the electrostatic interactions were calculated without cutoff using the cell multipole method. Canonical expectation values of physical quantities were estimated by the weighed histogram analysis method (WHAM) (47,48), extended to accommodate the MUCAREM simulations (30–32). Finally, to examine the native state directly, we also performed a conventional canonical MD simulation at 300 K for 23 ns, starting from the native conformation determined by x-ray crystallography (PDB code: 1YRF) (7).

RESULTS AND DISCUSSION Efficiency of conformational sampling and nativelike conformations observed in the simulations We analyzed the 2.28-ms MUCAREM production trajectories (MUCAREM1 and MUCAREM2) of villin headpiece subdomain HP36 with explicit water solvent. Because we used MUCAREM, we expect our simulations to be long enough to observe folding events of HP36, but with a total simulation length shorter than the folding time evaluated from experiments (ranging from 4.3 to ~10 ms) (4–6,17), and also shorter than the recently reported long-term MD simulations with explicit water solvent (23). In previous studies, we examined the performance of generalized-ensemble simulations for conformational sampling by monitoring potential-energy random walks or tunneling events (32,49). However, because water-water interactions account for the major part of the potentialenergy values, potential-energy change from a small value to a large value and back (i.e., a potential-energy tunneling event) does not necessarily correspond to a conformational transition of proteins. In this study, where the conformational change of the solute molecule can be much slower than the typical timescale of potential-energy random walks because of the large size of the HP36 molecule, we examined conformational sampling by monitoring the radius of gyration (Rg) and backbone root mean-square deviation (RMSD) (with respect to the N, Ca, and C atoms) of the protein molecule during the production runs (see Fig. 1). It can be seen that HP36 molecules underwent folding to compact conformations and unfolding many times (see Fig. 1). By comparing Rg and RMSD (Fig. 1), it can also be seen that there are different compact conformers with ˚ ) that similar Rg values close to the native value (9.4 A have different backbone RMSD values. Fig. 1 also shows that the backbone conformation fluctu˚ at ated substantially and that backbone RMSD was <3.0 A

Core Formation in HP36 Folding

1639

FIGURE 1 Time courses of backbone RMSD (a, d, and g), the number of nonlocal interresidue native contacts formed (Nn) (b, e, and h), and radius of gyration (Rg) (c, f, and i) of HP36. Here, backbone RMSD was calculated using N, Ca, and C atoms of the backbone, and Rg was calculated with respect to heavy atoms. Three of 16 replicas were chosen here: replica 5 of MUCAREM1 (a–c), replica 8 of MUCAREM1 (d–f), and replica 5 of MUCAREM2 (g–i). For native contacts, nonlocal means that the two contacting residues are in different secondary-structure elements (Helix 1: residues 4–11, Loop 1: residues 12–14, Helix 2: residues 15–18, Loop 2: residues 19–22, and Helix 3: residues 23–32) and not next to each other in the primary sequence. Two side chains are ˚ . In the native structure (PDB code: 1YRF), there are defined as in contact if the distance between any two heavy atoms in the side chains is %4.5 A ˚ . Those in b, e, and h represent 18 nonlocal native side-chain contacts. The horizontal dashed lines in a, d, and g indicate backbone RMSD ¼ 3.0 A ˚ ). Nn ¼ 10, and those in c, f, and i represent the Rg value of the native conformation (9.4 A

several time points. We now consider how many times our MUCAREM production runs reached the native conformation. For this purpose, we introduce the following criteria for folding into the native conformation. 1), We consider that the backbone folded into the native structure from unfolded ˚ . The structures if the backbone RMSD becomes %3.0 A folding event is counted separately if it goes through an ˚ ). 2), unfolded structure (with the backbone RMSD R6.5 A We use the number of nonlocal interside-chain native contacts formed, Nn, because the overall side-chain rearrangement including hydrophobic core is a good measure for defining a folding event. Here, we consider that the overall side-chain arrangement has gone from an unfolded to a native structure when Nn is R10. The folding event is counted separately if it goes through an unfolded structure (with Nn ¼ 0). The exact definition for Nn is given below. For Criterion 1, we observed 11 folding events in seven different replicas (namely, replicas 5, 7, and 8 in MUCAREM1 and replicas 1, 2, 4, and 5 in MUCAREM2 (see Fig. S2 in the Supporting Material). For Criterion 2, we observed 35 folding events in 14 different replicas (namely, replicas 1–5, 7, and 8 in MUCAREM1, and replicas 1 and 3–8 in MUCAREM2). Moreover, the number of folding events that satisfy both criteria (backbone structure and

side-chain contacts) was still up to 5 (replicas 5 and 8 in MUCAREM1 and replicas 1, 4, and 5 in MUCAREM2). The lowest backbone RMSD (for the nonterminal 34 resi˚ (see Fig. 1 g and Fig. 2 a). We also dues) was 1.1 A

FIGURE 2 Low-RMSD conformations observed in the MUCAREM1 and MUCAREM2 production runs (orange). The x-ray structure (PDB code 1YRF) is also shown (blue and green). Here, the a-helices in the x-ray structure are green and the others are blue. Three phenylalanine side chains (Phe7, Phe11, and Phe18) are shown in ball-and-stick representation. (a) Lowest backbone RMSD (with respect to N, Ca, and C atoms) conformation observed in the simulations presented here (replica 5 of ˚ (for nonterminal 34 MUCAREM2). The backbone RMSD value is 1.1 A residues). (b) A low-RMSD conformation observed in MUCAREM1 ˚ for residues 9–32 and 3.3 A ˚ for (replica 8). The RMSD value is 1.0 A the nonterminal 34 residues. These figures were prepared using VMD (61). Biophysical Journal 99(5) 1637–1644

1640

Yoda et al.

monitored the backbone RMSD of residues 9–32, which cover Helix 2 and Helix 3. Among these residues, we found ˚ ) but with conformations with small RMSD values (<2 A larger RMSD values for full length. One example of these conformations is shown in Fig. 2 b, in which the RMSD ˚ and the full RMSD (residues (residues 9–32) is 1.0 A ˚ 2–35) is 3.3 A.

Thus, residues in Loop 2 (residues 19–22) and Helix 3 (residues 23–32) appear to acquire nativelike structures in the early stage of folding of HP36. In the later stage of folding (Nn R 10), Phe7 (41.9%), Val10 (36.2%), Phe11 (49.5%), Met13 (81.1%), Lys30 (50.0%), and Lys33 (51.4%) also acquire on average >30% interside-chain native contacts formed at 350 K.

Formation of nonlocal native contacts

Dehydration of core residues

HP36 native structure has a well-packed hydrophobic core buried inside the molecule. Thus, hydrophobic interaction should play an important role in HP36 stability and folding (9). In folding processes of typical globular proteins, earlystage intermediate conformations already have the hydrophobic moieties positioned closer together than in the unfolded state. The hydrophobic core then acquires specific contacts among side chains in the later stages. In the HP36 native conformation, residues constituting the hydrophobic core are widely distributed in the primary sequence, and nonlocal interactions stabilize the native state. Here, ‘‘nonlocal’’ means that the two contacting residues are far from each other in the primary sequence. In the later part of this study, we use the number of nonlocal interside-chain native contacts formed (Nn) as the reaction coordinate to analyze the folding of HP36. In the conventional 23-ns MD simulation at 300 K that was started from the native structure, the mean Nn value was 15.4, and the standard deviation of Nn was 1.2. In the simulations presented here, various values of Nn were observed (see Fig. 1). The maximum Nn observed was 17. We drew the free-energy landscapes at different temperatures, using Nn and radius of gyration (Rg) as reaction coordinates (see Fig. S3) based on the simulation data. It is clearly seen that there is a free-energy minimum near ˚ , and that the population of conformaNn ~ 3 and Rg ~ 10 A tions with large Rg is small at low temperatures (%350 K). This suggests that a collapse into compact denatured conformations takes place in the earliest stage of folding. After that, specific nonlocal side-chain configurations are constructed. One can also find a free-energy minimum that may correspond to a folding intermediate near Nn ~ 6 and ˚ at both 320 K and 400 K (Fig. S3, a and c), while Rg ~ 10 A the minimum is not definite at 350 K (Fig. S3 b). We also evaluated by WHAM the number of native sidechain contacts (including local contacts) as a function of Nn at 350 K for 13 residues, namely, Leu2, Phe7, Val10, Phe11, Met13, Arg15, Phe18, Leu21, Gln26, Leu29, Lys30, Lys33, and Leu35, which have more than two contacts in their native structure (47,48). Phe18 (30.5%), Leu21 (48.5%), Gln26 (55.3%), and Leu29 (40.0%) turned out to have on average >30% of the native contacts formed when Nn is between 4 and 7, whereas <20% were formed for Leu2 (13.8%), Phe7 (19.7%), Val10 (7.9%), Phe11 (17.9%), Met13 (10.5%), Arg15 (4.2%), and Lys33 (15.7%).

Upon the formation of the nativelike side-chain packing, dehydration of hydrophobic-core residues should also be observed. We evaluated the number of water molecules ˚ of the phenylalanine side chains in the lying within 3.5 A hydrophobic core (Phe7, Phe11, and Phe18). The average value and standard deviation of this quantity in the conventional MD simulation at 300 K started from the x-ray structure were 1.90 and 1.34, respectively. Fig. 3 a shows the probability that the number of water molecules lying near ˚ from) the three phenylalanine side chains as (i.e., %3.5 A a function of the backbone RMSD is %3. Fig. 3 b shows the same quantity as a function of the native contacts,

Biophysical Journal 99(5) 1637–1644

FIGURE 3 (a and b) The probability that the number of water molecules ˚ of side-chain nonhydrogen atoms in Phe7, Phe11, or lying within 3.5 A Phe18 is <4. The solid curves represent the results at 300 K and the dashed curves those at 350 K. (c) The probability that the number of water mole˚ of side-chain nonhydrogen atoms in Phe18, Leu21, cules lying within 3.5 A 26 29 Gln , or Leu is <9. The probability values were evaluated as functions of the backbone RMSD (in a) or Nn0 (in b and c), where Nn0 is defined as the integer part of Nn/2. These results were obtained by WHAM applied to the data from MUCAREM1 and MUCAREM2.

Core Formation in HP36 Folding

1641

Nn/2. These results show that dehydration of the core residues takes place when the protein molecule folds into ˚ and nativelike conformations with backbone RMSD %3 A Nn R 10. This suggests that water molecules reside inside the protein molecule until the latest stage of folding. We also examined the water molecules lying close to the Phe18, Leu21, Gln26, and Leu29 side chains whose native contacts were formed in an early stage of folding (see Fig. 3 c). In the MD simulation at 300 K started from the x-ray structure, the average value and standard deviation ˚ of those of the number of water molecules within 3.5 A four side chains were 6.26 and 1.75, respectively, which indicate that these side chains are partially exposed to water even in the native state. We evaluated the probability that the number of water molecules hydrating those four side chains is %8. Again, the result (Fig. 3 c) suggests that the residues near Loop 2 and Helix 3 acquire a nativelike environment earlier than the aromatic cluster regions. Formation of nativelike backbone structure It is experimentally known that none of the short isolated peptide fragments that correspond to one of the three native a-helices of HP36 stably forms helical conformations (50), suggesting the importance of nonlocal interactions for the helical structures (50,51). These studies have also shown that a peptide fragment whose primary sequence covers Helices 1 and 2 has a large amount of helical structure. This suggests that the nonlocal interactions stabilize the helices even in the denatured ensemble. Based on the MUCAREM simulation results, one can examine this character for the full sequence of HP36. To analyze the helix formation, we categorized HP36 conformations into three groups according to Nn. Helix contents for these conformational groups are shown in Fig. 4, a–c. Secondary structures were identified by the DSSP algorithm (52). It is clearly seen that three helices tend to be formed even in the conformations that are not so nativelike, and that those positions in the primary sequence are also similar to those in the native structure. These conformations (Nn % 7) were found to have 15.1 and 14.7 nonlocal interside-chain contacts on average (including nonnative contacts) at 300 K and 350 K, respectively. Thus, this denatured ensemble is far from the random-coil state. Rather, it has some compact shapes (see also Fig. S3), and the nonlocal interactions should stabilize the helices. These observations are consistent with the experimental implications (50). Helix 3 is the most highly formed of the three helices in this condition. When Nn becomes >7 (see Fig. 4 b), two changes take place. First, the a-helix contents of residues 12–14 (Loop 1) decrease while the backbone shape of this region is not exactly nativelike at this stage (see below). Second, the N-terminal end of Helix 3 becomes more definite at low temperatures (%400 K): a-helix contents for Leu23 and

FIGURE 4 (a–c) Results of secondary-structure analysis of HP36. Canonical expectation values of a-helix contents for each residue evaluated by WHAM applied to the data from MUCAREM1 and MUCAREM2. Values at five temperature values are shown: 300 K (solid line), 350 K (dashed line), 400 K (short-dashed line), 500 K (dotted line), and 600 K (dash-dotted line). HP36 conformations are divided into three groups according to the number of nonlocal interside-chain native contacts formed (Nn): average helix contents for conformations with Nn % 7 (a), Nn ¼ 8 or 9 (b), and Nn R 10 (c). (d) Canonical expectation values of backbone (N, Ca, and C) RMSDs for the 15-residue regions of HP36 as functions of Nn0 at 350 K evaluated by WHAM, where Nn0 is the integer part of Nn/2, for residues 4–18 (solid line); 9–23 (long-dashed line), 14–28 (short-dashed line), and 19–33 (dotted line).

Trp24 are close to 100% and those for Leu21 and Pro22, which take the loop conformation in the native structure, are close to zero. This suggests the importance of the Biophysical Journal 99(5) 1637–1644

1642

Yoda et al.

FIGURE 5 Representative snapshot structures with different Nn values. Snapshot structures with potential energy values typical for the canonical NVT ensemble at T ¼ 350 K or lower are chosen from MUCAREM1 and MUCAREM2. a-helices are purple, side chains of Phe7, Phe11, and Phe18 are green, and side chains of Leu21, Gln26, and Leu29 are blue. Numbers written near the snapshot structures represent their Nn values. Side chains of the core phenyl˚ from) those side alanine residues (Phe7, Phe11, and Phe18) and of Phe18, Leu21, Gln26, and Leu29, and oxygen atoms in water molecules that are near (%3.5 A chains are shown by space-filling representation in a–c, respectively. In c, water oxygen atoms that are near a side chain of Phe7, Phe11, or Phe18 are red, and those near a side chain of Leu21, Gln26, or Leu29 are orange.

decrease in nonnative secondary-structure elements in the intermediate stages of folding. When nonlocal interside-chain native contacts are more highly formed (see Fig. 4 c), the secondary structures become closer to those in the native structure. The a-helix contents of Helices 1 and 2 become larger, indicating that these two helices are stabilized by specific nonlocal interactions. In the conventional 23-ns MD simulation at 300 K that was started from the native structure, the C-terminal end of Helix 2 often got extended by two residues, and the C-terminal end (Phe11) of Helix 1 was fragile. Thus, the secondary structures shown in Fig. 4 c are fully nativelike. Which part of HP36 folds early? Nativelike helix formation may take place even without the nativelike fold. We evaluated backbone RMSDs for four 15-residue fragments that overlap with neighboring ones and cover residues 4–33 as a whole (see Fig. 4 d). The C-terminal region seems to become nativelike earliest and the N-terminal region seems to become nativelike latest. Therefore, all the results of our analyses of side-chain contacts, side-chain dehydration, and shape of the peptide backbone strongly suggest that the C-terminal sequence (Loop 2 and Helix 3) is the earliest folding segment. This is consistent with the folding landscape of HP35 with implicit solvent (26). Other experimental and simulation reports have also suggested that this part is important not Biophysical Journal 99(5) 1637–1644

only for physiological functions (11,53,54) but also for folding and stability (22,56). The late formation of the hydrophobic core packing observed in the simulations presented here are also consistent with the long-time folding simulations recently reported (23). These results are also consistent with recent experimental studies that suggest the importance of the formation of Loop 2 (15,18), one of which is a f-value analysis (18). The simulation with explicit water solvent described in this article suggests that the side-chain interactions among Phe18, Leu21, Gln26, and Leu29 are important for the formation of the loop structure (see Fig. 5, b and c). The dehydration of those side chains, together with the formation of the loop backbone structure, was found to occur in the early stage of folding. As for the hydrophobic core residues, on the other hand, the side-chain packing and dehydration turned out to occur at a later stage of folding, with a back˚ , even though this region has residual bone RMSD of ~3.5 A structures in the denatured state (see Figs. 3 a and 5, a and c). These results suggest the importance of precise formation of the loop (or turn) structures between secondarystructure elements. We consider that the early formation of the loop (or turn) structures narrows the conformational space taken by the protein molecule and helps the fast folding of the protein by preventing misfolding. It is interesting that this view on the loop structure is similar to that on the turn structure in the b-hairpin folding proposed in our previous study (42).

Core Formation in HP36 Folding

Temperature dependence of conformational properties of HP36 Although we have mainly described the folding process at low temperature (350 K), temperature dependence of physical quantities can also be evaluated from MUCAREM data. Fig. S4 shows normalized canonical expectation values of the native helix contents and Rg as functions of temperature. The two curves roughly agree with each other. This suggests that the collapse and secondary-structure formation take place simultaneously, consistent with the view on HP36 folding described above. The folding transition temperature and the collapse transition temperature are also of particular interest because the large foldability of HP36 can be reflected in them (57–60). Hansmann et al. (59) evaluated the transition temperature values and discussed folding to the ground-state conformers for small peptide systems without explicit water. However, in the system with explicit water solvent presented here, the curves in Fig. S4 and the Nn curve, which is a measure of the specific tertiary-structure formation, are not sigmoidal, and it is difficult to evaluate transition temperatures. In these results, the native state is not in the most stable ensemble but in a local-minimum state (Fig. S3). This may be the cause of the difficulty in evaluating the transition temperatures. CONCLUSIONS In this study, we performed MUCAREM simulations of chicken villin headpiece subdomain HP36 in explicit water solvent with eight replicas. In the simulation started from a fully extended conformation, we observed several events of folding into nativelike conformations. Among these, a conformation with a lowest backbone RMSD value as ˚ was obtained. small as 1.1 A As far as we know, this is the largest protein in which successful folding simulations in explicit solvent from a fully extended initial conformation were achieved several times within a single simulation run. We are now applying this method to a small protein with ~50 amino acids. Moreover, we are ready to study several issues of protein folding, such as effects of salts in solvent and effects of mutations on the nature of intermediate states and folding landscape. SUPPORTING MATERIAL Four figures are available at http://www.biophysj.org/biophysj/supplemental/ S0006-3495(10)00788-5. Some of the early computations in this study were done on the computers at the Research Center for Computational Science, Institute for Molecular Science. We also thank the RIKEN Supercomputer System (RSCC) for providing us with computational resources. This work was supported in part by the Core Research of Evolutional Science and Technology (CREST) Project (High Performance Computing

1643 for Multi-Scale and Multi-Physics Phenomena) of the Japan Science and Technology Agency (JST) and Grants-in-Aid for Scientific Research on Innovative Areas (Fluctuations and Biological Functions) and for the Next Generation Super Computing Project, Nanoscience Program, from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.

REFERENCES 1. Shen, M., F. P. Davis, and A. Sali. 2005. The optimal size of a globular protein domain: A simple sphere-packing model. Chem. Phys. Lett. 405:224–228. 2. Veretnik, S., P. E. Bourne, ., I. N. Shindyalov. 2004. Toward consistent assignment of structural domains in proteins. J. Mol. Biol. 339:647–678. 3. McKnight, C. J., D. S. Doering, ., P. S. Kim. 1996. A thermostable 35-residue subdomain within villin headpiece. J. Mol. Biol. 260: 126–134. 4. Wang, M., Y. Tang, ., D. P. Raleigh. 2003. Dynamic NMR line-shape analysis demonstrates that the villin headpiece subdomain folds on the microsecond time scale. J. Am. Chem. Soc. 125:6032–6033. 5. Kubelka, J., W. A. Eaton, and J. Hofrichter. 2003. Experimental tests of villin subdomain folding simulations. J. Mol. Biol. 329:625–630. 6. Buscaglia, M., J. Kubelka, ., J. Hofrichter. 2005. Determination of ultrafast protein folding rates from loop formation dynamics. J. Mol. Biol. 347:657–664. 7. Chiu, T. K., J. Kubelka, ., D. R. Davies. 2005. High-resolution x-ray crystal structures of the villin headpiece subdomain, an ultrafast folding protein. Proc. Natl. Acad. Sci. USA. 102:7517–7522. 8. McKnight, C. J., P. T. Matsudaira, and P. S. Kim. 1997. NMR structure of the 35-residue villin headpiece subdomain. Nat. Struct. Biol. 4: 180–184. 9. Frank, B. S., D. Vardar, ., C. J. McKnight. 2002. The role of aromatic residues in the hydrophobic core of the villin headpiece subdomain. Protein Sci. 11:680–687. 10. Vugmeyster, L., O. Trott, ., A. G. Palmer, 3rd. 2002. Temperaturedependent dynamics of the villin headpiece helical subdomain, an unusually small thermostable protein. J. Mol. Biol. 320:841–854. 11. Vermeulen, W., P. Vanhaesebrouck, ., F. A. Borremans. 2004. Solution structures of the C-terminal headpiece subdomains of human villin and advillin, evaluation of headpiece F-actin-binding requirements. Protein Sci. 13:1276–1287. 12. Havlin, R. H., and R. Tycko. 2005. Probing site-specific conformational distributions in protein folding with solid-state NMR. Proc. Natl. Acad. Sci. USA. 102:3284–3289. 13. Kubelka, J., T. K. Chiu, ., J. Hofrichter. 2006. Sub-microsecond protein folding. J. Mol. Biol. 359:546–553. 14. Brewer, S. H., B. Song, ., R. B. Dyer. 2007. Residue specific resolution of protein folding dynamics using isotope-edited infrared temperature jump spectroscopy. Biochemistry. 46:3279–3285. 15. Glasscock, J. M., Y. Zhu, ., F. Gai. 2008. Using an amino acid fluorescence resonance energy transfer pair to probe protein unfolding: application to the villin headpiece subdomain and the LysM domain. Biochemistry. 47:11070–11076. 16. Gao, J., and J. W. Kelly. 2008. Toward quantification of protein backbone-backbone hydrogen bonding energies: An energetic analysis of an amide-to-ester mutation in an a-helix within a protein. Protein Sci. 17:1096–1101. 17. Kubelka, J., E. R. Henry, ., W. A. Eaton. 2008. Chemical, physical, and theoretical kinetics of an ultrafast folding protein. Proc. Natl. Acad. Sci. USA. 105:18655–18662. 18. Bunagan, M. R., J. Gao, ., F. Gai. 2009. Probing the folding transition state structure of the villin headpiece subdomain via side chain and backbone mutagenesis. J. Am. Chem. Soc. 131:7470–7476. Biophysical Journal 99(5) 1637–1644

Yoda et al.

1644 19. Duan, Y., and P. A. Kollman. 1998. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science. 282:740–744.

40. Yoda, T., Y. Sugita, and Y. Okamoto. 2004. Comparisons of force fields for proteins by generalized-ensemble simulations. Chem. Phys. Lett. 386:460–467.

20. Ensign, D. L., P. M. Kasson, and V. S. Pande. 2007. Heterogeneity even at the speed limit of folding: large-scale molecular dynamics study of a fast-folding variant of the villin headpiece. J. Mol. Biol. 374:806–816. 21. Jayachandran, G., V. Vishal, ., V. S. Pande. 2007. Local structure formation in simulations of two small proteins. J. Struct. Biol. 157:491–499.

41. Yoda, T., Y. Sugita, and Y. Okamoto. 2004. Secondary-structure preferences of force fields for proteins evaluated by generalized-ensemble simulations. Chem. Phys. 307:269–283.

22. Piana, S., A. Laio, ., J. C. Martins. 2008. Predicting the effect of a point mutation on a protein fold: the villin and advillin headpieces and their Pro62Ala mutants. J. Mol. Biol. 375:460–470. 23. Freddolino, P. L., and K. Schulten. 2009. Common structural transitions in explicit-solvent simulations of villin headpiece folding. Biophys. J. 97:2338–2347. 24. Zagrovic, B., C. D. Snow, ., V. S. Pande. 2002. Simulation of folding of a small alpha-helical protein in atomistic detail using worldwidedistributed computing. J. Mol. Biol. 323:927–937. 25. Jang, S., E. Kim, ., Y. Pak. 2003. Ab initio folding of helix bundle proteins using molecular dynamics simulations. J. Am. Chem. Soc. 125:14841–14846. 26. Lei, H., C. Wu, ., Y. Duan. 2007. Folding free-energy landscape of villin headpiece subdomain from molecular dynamics simulations. Proc. Natl. Acad. Sci. USA. 104:4925–4930. 27. Yang, J. S., S. Wallin, and E. I. Shakhnovich. 2008. Universality and diversity of folding mechanics for three-helix bundle proteins. Proc. Natl. Acad. Sci. USA. 105:895–900. 28. Wei, Y., W. Nadler, and U. H. Hansmann. 2008. Backbone and sidechain ordering in a small protein. J. Chem. Phys. 128:025105. 29. Lei, H., X. Deng, ., Y. Duan. 2008. The fast-folding HP35 double mutant has a substantially reduced primary folding free energy barrier. J. Chem. Phys. 129:155104. 30. Sugita, Y., and Y. Okamoto. 2000. Replica-exchange multicanonical algorithm and multicanonical replica-exchange method for simulating systems with rough energy landscape. Chem. Phys. Lett. 329:261–270. 31. Mitsutake, A., Y. Sugita, and Y. Okamoto. 2003. Replica-exchange multicanonical and multicanonical replica-exchange Monte Carlo simulations of peptides. I. Formulation and benchmark test. J. Chem. Phys. 118:6664–6675. 32. Mitsutake, A., Y. Sugita, and Y. Okamoto. 2003. Replica-exchange multicanonical and multicanonical replica-exchange Monte Carlo simulations of peptides. II. Application of a more complex system. J. Chem. Phys. 118:6676–6688. 33. Berg, B. A., and T. Neuhaus. 1992. Multicanonical ensemble: a new approach to simulate first-order phase transitions. Phys. Rev. Lett. 68:9–12. 34. Sugita, Y., and Y. Okamoto. 1999. Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 314:141–151. 35. Hukushima, K., and K. Nemoto. 1996. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Jpn. 65: 1604–1608. 36. MacKerell, Jr., A. D., D. Bashford, ., M. Karplus. 1998. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B. 102:3586–3616. 37. MacKerell, Jr., A. D., M. Feig, and C. L. Brooks, 3rd. 2004. Improved treatment of the protein backbone in empirical force fields. J. Am. Chem. Soc. 126:698–699. 38. Mackerell, Jr., A. D., M. Feig, and C. L. Brooks, 3rd. 2004. Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J. Comput. Chem. 25:1400–1415. 39. Jorgensen, W. L., J. Chandrasekhar, ., M. L. Klein. 1983. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79:926–935.

Biophysical Journal 99(5) 1637–1644

42. Yoda, T., Y. Sugita, and Y. Okamoto. 2007. Cooperative folding mechanism of a b-hairpin peptide studied by a multicanonical replicaexchange molecular dynamics simulation. Proteins. 66:846–859. 43. Sugita, Y., and A. Kitao. 1998. Improved protein free energy calculation by more accurate treatment of nonbonded energy: application to chymotrypsin inhibitor 2, V57A. Proteins. 30:388–400. 44. Kitao, A., S. Hayward, and N. Go. 1998. Energy landscape of a native protein: jumping-among-minima model. Proteins. 33:496–517. 45. Morikami, K., T. Nakai, ., H. Nakamura. 1992. PRESTO (protein engineering simulator): a vectorized molecular dynamics program for biopolymers. Comput. Chem. 16:243–248. 46. Evans, D. J., and G. P. Morris. 1983. The isothermal/isobaric molecular dynamics ensemble. Phys. Lett. A. 98:433–436. 47. Ferrenberg, A. M., and R. H. Swendsen. 1989. Optimized Monte Carlo data analysis. Phys. Rev. Lett. 63:1195–1198. 48. Kumar, S., D. Bouzida, ., J. M. Rosenberg. 1992. The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J. Comput. Chem. 13:1011–1021. 49. Sugita, Y., and Y. Okamoto. 2005. Molecular mechanism for stabilizing a short helical peptide studied by generalized-ensemble simulations with explicit solvent. Biophys. J. 88:3180–3190. 50. Tang, Y., D. J. Rigotti, ., D. P. Raleigh. 2004. Peptide models provide evidence for significant structure in the denatured state of a rapidly folding protein: the villin headpiece subdomain. Biochemistry. 43:3264–3272. 51. Tang, Y., M. J. Goger, and D. P. Raleigh. 2006. NMR characterization of a peptide model provides evidence for significant structure in the unfolded state of the villin headpiece helical subdomain. Biochemistry. 45:6940–6946. 52. Kabsch, W., and C. Sander. 1983. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 22:2577–2637. 53. Doering, D. S., and P. Matsudaira. 1996. Cysteine scanning mutagenesis at 40 of 76 positions in villin headpiece maps the F-actin binding site and structural features of the domain. Biochemistry. 35:12677– 12685. 54. Vardar, D., A. H. Chishti, ., C. J. McKnight. 2002. Villin-type headpiece domains show a wide range of F-actin-binding affinities. Cell Motil. Cytoskeleton. 52:9–21. 55. Reference deleted in proof. 56. Vermeulen, W., M. Van Troys, ., C. Ampe. 2006. Identification of the PXW sequence as a structural gatekeeper of the headpiece C-terminal subdomain fold. J. Mol. Biol. 359:1277–1292. 57. Camacho, C. J., and D. Thirumalai. 1993. Kinetics and thermodynamics of folding in model proteins. Proc. Natl. Acad. Sci. USA. 90:6369–6372. 58. Klimov, D. K., and D. Thirumalai. 1996. Criterion that determines the foldability of proteins. Phys. Rev. Lett. 76:4070–4073. 59. Hansmann, U. H. E., M. Masuya, and Y. Okamoto. 1997. Characteristic temperatures of folding of a small peptide. Proc. Natl. Acad. Sci. USA. 94:10652–10656. 60. Mazzoni, L. N., and L. Casetti. 2006. Curvature of the energy landscape and folding of model proteins. Phys. Rev. Lett. 97:218104. 61. Humphrey, W., A. Dalke, and K. Schulten. 1996. VMD: visual molecular dynamics. J. Mol. Graph. 14:33–38, 27–28.