Communication
Determination of Protein Folding Intermediate Structures Consistent with Data from Oxidative Footprinting Mass Spectrometry
Florian Heinkel and Jörg Gsponer
Centre for High-Throughput Biology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
Correspondence to Jörg Gsponer: [email protected]
http://dx.doi.org/10.1016/j.jmb.2015.10.022
Edited by A. G. Palmer
Abstract The mapping of folding landscapes remains an important challenge in protein chemistry. Pulsed oxidative labeling of exposed residues and their detection via mass spectrometry provide new means of taking time-resolved “snapshots” of the structural changes that occur during protein folding. However, such experiments have been so far only interpreted qualitatively. Here, we report the detailed structural interpretation of mass spectrometry data from fast photochemical oxidation of proteins (FPOP) experiments at atomic resolution in a biased molecular dynamics approach. We are able to calculate structures of the early folding intermediate of the model system barstar that are fully consistent with FPOP data and Φ values. Furthermore, structures calculated with both FPOP data and Φ values are significantly less compact and have fewer helical residues than intermediate structures calculated with Φ values only. This improves the agreement with the experimental β-Tanford value and CD measurements. The restraints that we introduce facilitate the structural interpretation of FPOP data and provide new means for refined structure calculations of transiently sampled states on protein folding landscapes. © 2015 Elsevier Ltd. All rights reserved.
J Mol Biol (2016) 428, 365–371

Despite constant advances in experimental and computational techniques, the structural characterization of protein folding pathways remains a significant challenge [1]. The states of interest have short lifetimes and low populations, which often makes them inaccessible to high-resolution techniques such as nuclear magnetic resonance (NMR) spectroscopy or X-ray crystallography, and the folding timescale of many proteins is still outside the range of all-atom molecular dynamics (MD) simulations. The recent development of fast photochemical oxidation of proteins (FPOP) combined with mass spectrometry (MS) [2] marks an important advance in the quest to fully characterize protein folding pathways [3]. FPOP maps the modification footprint of photochemically formed hydroxyl radicals onto the sequence of a protein by identifying the mass shift of modified residue side chains in a tandem-MS approach [4,5]. The lifetime of the radicals (microseconds) is well below the timescale of folding (milliseconds or longer), and the timing of the photochemically induced radical formation can be tightly controlled with a laser, making this technique capable of taking snapshots of structural details along the folding/unfolding pathway of a protein [6–9]. The relation between oxidation pattern and protein structure originates from the fact that only solvent-accessible side chains can be covalently modified, whereas buried side chains are protected from attack by hydroxyl radicals. Nevertheless, the structural interpretation of FPOP data has been qualitative so far.

Here, we use FPOP data as restraints in MD simulations in order to determine the structure of an early folding intermediate of barstar. In this approach, we apply a restraining potential to MD simulations of the molecule that compares the normalized experimental oxidation level $\mathrm{oxl}^{\mathrm{exp}}$ to the simulated oxidation level $\mathrm{oxl}^{\mathrm{sim}}$. As described previously for biased MD [10], the difference between the two is incorporated in the reaction coordinate ρ; by minimizing ρ, the potential drives the back-calculated oxidation levels $\mathrm{oxl}^{\mathrm{sim}}$ toward the experimental ones.
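As a minimal sketch of how such a restraint could look, the Python snippet below assumes that the reaction coordinate ρ is the mean squared difference between simulated and experimental normalized oxidation levels, and that the bias is the half-harmonic, ratchet-like potential commonly used in biased MD. Both choices are assumptions made for illustration; the actual implementation is the one described in the supplementary information. The names `reaction_coordinate`, `bias_energy`, `run_ratchet`, and `alpha` are hypothetical, and all numbers are invented.

```python
def reaction_coordinate(oxl_sim, oxl_exp):
    """Assumed form of rho: mean squared deviation over the restrained residues."""
    if len(oxl_sim) != len(oxl_exp):
        raise ValueError("need one simulated value per experimental value")
    return sum((s - e) ** 2 for s, e in zip(oxl_sim, oxl_exp)) / len(oxl_exp)

def bias_energy(rho, rho_min, alpha):
    """Half-harmonic bias: only penalize rho when it rises above its best value so far."""
    if rho > rho_min:
        return 0.5 * alpha * (rho - rho_min) ** 2
    return 0.0

def run_ratchet(rho_series, alpha):
    """Evaluate the bias along a series of rho values, lowering rho_min as rho improves."""
    rho_min = rho_series[0]
    energies = []
    for rho in rho_series:
        energies.append(bias_energy(rho, rho_min, alpha))
        rho_min = min(rho_min, rho)
    return energies, rho_min

# Illustrative numbers only, not data from the paper.
oxl_exp = [0.2, 0.5, 0.9]
oxl_sim = [0.3, 0.5, 0.7]
rho = reaction_coordinate(oxl_sim, oxl_exp)
energies, rho_min = run_ratchet([0.10, 0.08, 0.09, 0.05], alpha=100.0)
```

In this sketch the bias is zero whenever ρ decreases or stays at its best value, so the system is never pushed, only prevented from drifting back toward poorer agreement with the experimental oxidation levels.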
Structure calculation with FPOP data
The normalized experimental oxidation level $\mathrm{oxl}^{\mathrm{exp}}$ is defined as

$$\mathrm{oxl}_i^{\mathrm{exp}} = \frac{m(X) - m(\mathrm{folded})}{m(\mathrm{unfolded}) - m(\mathrm{folded})}$$

where m() are the fractions of peptides that are modified (oxidized) at amino acid position i in the respective states. Because the level of oxidation depends on solvent accessibility of the amino acid side chains, $\mathrm{oxl}^{\mathrm{sim}}$ can be defined as

$$\mathrm{oxl}_i^{\mathrm{sim}} = \frac{\mathrm{SASA}(X) - \mathrm{SASA}(\mathrm{folded})}{\mathrm{SASA}(\mathrm{unfolded}) - \mathrm{SASA}(\mathrm{folded})}$$
where SASA() are the solvent accessibilities of residue i in the respective states. Hence, $\mathrm{oxl}_i^{\mathrm{sim}} = 1$ corresponds to the case where residue i has the same solvent accessibility in state X as in the unfolded state. If $\mathrm{oxl}_i^{\mathrm{sim}} = 0$, the residue is as protected from hydroxyl radical attack as in the folded state, that is, it has the same solvent accessibility.

To illustrate the functionality of this approach, we applied the FPOP/MS restraints to barstar, an inhibitor of the ribonuclease activity of barnase. Folding of this 89-residue protein has been extensively studied, X-ray [11] and NMR [12] structures are available, and residue-resolved FPOP data have been collected recently to characterize its folding pathway [5,13]. More specifically, FPOP has been used to characterize the long-lived early folding intermediate. The FPOP data suggest formation of an early folding nucleus at the N-terminus of barstar.
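The two definitions above share the same normalization between a folded and an unfolded reference, which a short Python sketch can make concrete. The function name `normalized_level` and all numerical values are hypothetical and chosen only for illustration; in practice the per-residue SASA values would come from whatever SASA routine the simulation package provides.

```python
def normalized_level(value_x, value_folded, value_unfolded):
    """Normalize a per-residue observable: 0 at the folded, 1 at the unfolded reference."""
    span = value_unfolded - value_folded
    if span == 0:
        raise ValueError("folded and unfolded reference values must differ")
    return (value_x - value_folded) / span

# oxl_exp from fractions of modified peptides m() (illustrative numbers)
oxl_exp = normalized_level(value_x=0.30, value_folded=0.10, value_unfolded=0.50)

# oxl_sim from per-residue SASA in square angstroms (illustrative numbers)
oxl_sim = normalized_level(value_x=90.0, value_folded=20.0, value_unfolded=160.0)
```

With these invented inputs both levels come out at one half, meaning the residue would be halfway between its folded and unfolded exposure by either measure.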
We used the available FPOP data [5] for seven residues to restrain MD simulations and generate structures of the barstar folding intermediate. A key assumption that we make is that there is a dominant folding pathway for barstar and that the structural variability on this pathway is reasonably small. If multiple folding pathways are populated at a significant level, partial normalized oxidation levels that are different from 1 or 0 may result from structures in different pathways with varying levels of residue exposure. In such a case, ensemble-restrained simulations would have to be used, in contrast to what we did here.

The simulations were carried out using a modified CHARMM [14] version with the CHARMM19 force field [15] and the FACTS implicit solvent model [16]. The details of the implementation of the restraint are given in the supplementary information. Simulations were initiated from the lowest-energy, NMR-derived structure. After random assignment of velocities, the system was slowly heated from 0 K to 300 K over 30 ps without the restraints applied. Afterwards, the restraints were slowly turned on and the restraint force constant α was adjusted to get good agreement between $\mathrm{oxl}^{\mathrm{sim}}$ and $\mathrm{oxl}^{\mathrm{exp}}$. Production runs with 100 simulated annealing cycles and heating to 450 K were then started to generate different structures complying with the restraints. As a control, an unfolding simulation at 500 K was carried out and structures that have the same radius of gyration (15 ± 1 Å) as those from the biased annealing cycles were selected. The comparison of experimental and back-calculated normalized oxidation levels is given in Fig. 1. While the correlation coefficient is above 0.9 for the restrained simulation, it is 0.2 for the structures collected from the unrestrained simulations.

Fig. 1. Comparison of simulated ($\mathrm{oxl}^{\mathrm{sim}}$) and experimental ($\mathrm{oxl}^{\mathrm{exp}}$) normalized oxidation levels of seven residues in barstar. (a) Comparison for a simulation biased by the new FPOP restraint. The restrained residues are indicated. (b) Comparison for structures from a 500 K unfolding simulation that have a radius of gyration similar to that of the structures calculated with the FPOP restraint.

Another well-established experimental technique that is capable of describing folding transition and intermediate states of proteins is Φ value analysis [17–19]. This analysis gives an estimate for the amount of native structure around a mutated residue present in a state of interest on the folding pathway. Based on this structural link, Φ values are often back-calculated from known structures or used as restraints in simulations by approximating them as the fraction of native contacts present in the state of interest [20,21]. An extensive Φ value analysis for both transition states, as well as the early intermediate state, has been carried out for barstar [22]. We compared back-calculated Φ values from structures determined by the new FPOP restraint with experimental Φ values. Overall, there is poor agreement between experimental and back-calculated Φ values (Supplementary Fig. 1). However, this is not surprising given the small number of FPOP restraints (7) that were used compared to the 21 Φ values.

Fig. 2. Comparison of simulated and experimental normalized oxidation levels and Φ values. (a) Comparison of $\mathrm{oxl}^{\mathrm{sim}}$ and $\mathrm{oxl}^{\mathrm{exp}}$ for structures of barstar obtained from a simulation biased by both the new FPOP restraints and the Φ values. The restrained residues are indicated. (b) Comparison of $\Phi^{\mathrm{sim}}$ and $\Phi^{\mathrm{exp}}$ calculated from the same structures.

Therefore, we tested whether Φ values and FPOP data are compatible in the sense that both of them can be satisfied in a single structure and whether the use of FPOP data improves the structural description of a folding intermediate state. To do so, we compared structures calculated from simulations restrained with Φ values alone with structures from simulations that use both normalized oxidation levels and Φ values as restraints. We used the same simulation protocol as for the simulations with FPOP restraints only, with heating to 450 K in the annealing cycles (see the supplementary information). Figure 2
reveals that both the FPOP restraints and the restraints based on Φ values can be satisfied at the same time. The correlation coefficients of both back-calculated parameters with the corresponding experimental values are above 0.9 (Φ: 0.95, oxl: 0.98). As a control, we also back-calculated FPOP values from the restrained simulations in which we used only Φ values. As can be seen in Supplementary Fig. 3, the back-calculated oxidation levels correlate poorly with the experimental ones when using Φ-value restraints only, indicating that FPOP data and Φ values provide complementary, non-redundant structural information.

We also carried out “leave-one-out” cross-validation for the restrained simulations using FPOP and Φ values. The ability to reproduce a left-out data point in a cross-validation is a good test of the self-consistency of the data and of how well the restrained structural property has converged overall. We were able to get good agreement between back-calculated and experimental FPOP values for residues H17 and L20 when they were not used in the calculations (Fig. 3c and d). Both residues are part of the N-terminal region of barstar that is well defined in terms of the FPOP coverage. However, the oxidation level of the “isolated” residue L88 cannot be reproduced when it is not used as an input, owing to the lack of FPOP data for the surrounding region (Fig. 3b). These results suggest that the structures generated for the folding intermediate of barstar may be further refined by using more FPOP data from C-terminal residues.

Fig. 3. Leave-one-out cross-validation. (a) Comparison of $\mathrm{oxl}^{\mathrm{sim}}$ and $\mathrm{oxl}^{\mathrm{exp}}$ for structures of barstar obtained from a simulation biased by both the complete set of FPOP restraints and the Φ values. Data points omitted in the cross-validation simulations are shown in green (H17, L20) and orange (L88). (b–d) Comparison of $\mathrm{oxl}^{\mathrm{sim}}$ and $\mathrm{oxl}^{\mathrm{exp}}$ for structures of barstar obtained from restrained simulations in which one FPOP data point was omitted: for residues L88, H17, and L20 in (b), (c), and (d), respectively.

MD simulation restraints that are based on the number of native contacts, such as Φ values [21] and hydrogen exchange restraints [23,24], are known to generate structures that are often too native-like and too compact. Therefore, we checked whether the use of FPOP data as restraints improved the agreement between the experimental and back-calculated β-Tanford values [25,26] of the barstar intermediate. A previous experimental study found the intermediate state of barstar to have a β-Tanford value of 0.5 [22]. We back-calculated the β-Tanford values from the generated structures based on the radii of gyration and SASA as

$$\beta_{T,R_G} = 1 - \frac{R_G(X) - R_G(\mathrm{folded})}{R_G(\mathrm{unfolded}) - R_G(\mathrm{folded})},$$

$$\beta_{T,\mathrm{SASA}} = 1 - \frac{\mathrm{SASA}(X) - \mathrm{SASA}(\mathrm{folded})}{\mathrm{SASA}(\mathrm{unfolded}) - \mathrm{SASA}(\mathrm{folded})}$$
where $R_G()$ and SASA() are the radii of gyration and total SASA values, respectively, in the different states. A β-Tanford value of 1 represents the compactness of the folded state, whereas a value of 0 represents the expansion of the unfolded state. We calculated the radius of gyration of the unfolded state in two ways: from structures of the 500 K unfolding simulation and by using the formula $R_G = R_0 N^{\nu}$ ($R_0$: function of the persistence length of the polymer; ν: exponential scaling factor; N: number of amino acids), for which $R_0$ and ν were previously determined empirically from radii of gyration derived from small-angle X-ray scattering of various proteins of different sizes [27]. The two methods provide quite different values for $R_G$ of 17.6 and 28.2 Å, respectively. The former value is likely to be too low, as it has been shown that many force fields describe unfolded states rather poorly and often make them too compact [28,29]. For the SASA-based calculation of the β-Tanford value, we calculated the upper- and lower-bound SASA values of the unfolded state as described by Rose and co-workers [30,31].

Table 1. Comparison of β-Tanford values based on RG and SASA and helical content computed from restrained simulations

Restraint applied   β-Tanford (RG, upper)   β-Tanford (RG, lower)   β-Tanford (SASA, upper)   β-Tanford (SASA, lower)   Native helical content (%)
oxl                 0.763                   0.358                   0.828                     0.720                     24
oxl + Φ             0.821                   0.514                   0.842                     0.742                     25
Φ                   0.923                   0.792                   0.897                     0.832                     47

As can be seen from Table 1, structures that were calculated only with Φ values as restraints are too compact and native-like. The addition of oxidative labeling restraints drives the calculations toward structures that are less compact and more compatible with the experimental β-Tanford value. The reduction in compactness upon addition of the FPOP restraint is also evident when comparing structures from calculations using only Φ values as restraints with those that were generated with the help of both Φ values and
FPOP data (Fig. 4b and c). The difference in compactness between the structures generated with Φ value restraints and those generated with FPOP restraints can be explained by the difference in the restraint implementation. As Φ values are calculated based on the number of native contacts only, enforcing Φ values different from unity will break native contacts but allow for non-native ones in their place, which can often conserve much of the compactness of the native state. FPOP restraints, by contrast, are defined with respect to the SASA, which will disfavor any type of contact, whether native or non-native.

The structures from both restrained simulations have some helical conformations at the N-terminus corresponding to the native helix H1. This helix is thought to form first along the folding pathway of barstar [5,22,32]. Part of helix H4 is also present in both groups of structures. This is consistent with the idea that this helix is formed in the consolidation process for the folding intermediate [22]. It is evident, however, that more secondary structure is present when the intermediate is derived from Φ values only (Fig. 4b) than when the oxidative labeling restraint is turned on (Fig. 4c). This is also reflected in the percentage of helical content with respect to the native structure (Table 1). Two independent experiments, using different conditions, previously measured 40% [22] and 0% [33], respectively, of the native helical content in the folding intermediate. Structures derived from simulations using Φ values alone contain too much helical structure (47%), whereas the addition of FPOP restraints reduces the helical content to 25% of that seen in the native state.

Fig. 4. Comparison of the NMR-derived native structure of barstar (a) with structures of the most populated clusters derived from biased simulations of the early folding intermediate using (b) Φ values only; (c) Φ values and FPOP data combined; (d) FPOP data only. Residues are rainbow-colored from the N-terminus (blue) to the C-terminus (red). The N-terminal early forming secondary structure motif around helix H1 is highlighted with a broken ellipse.

The FPOP restraints lead to a significant expansion of the entire structure. This expansion is most striking for the N-terminal hairpin that includes the early forming helix H1. The N-terminal hairpin is detached from a C-terminal globule. This modular arrangement is found when data from FPOP experiments are included but not in the compact structures calculated from Φ values alone. The modular structure of the intermediate is consistent with the folding mechanism that has been proposed for barstar [22,34].

In summary, we introduce a new MD restraint that uses data from FPOP/MS experiments for structure calculations. We demonstrate that, compared to structures calculated with Φ value restraints only, the combined use of FPOP and Φ value restraints improves the accuracy of the structures of the early folding intermediate of barstar in terms of their agreement with the experimental β-Tanford values and CD measurements. It has to be stressed that this improvement was achieved despite the fact that only seven FPOP restraints were available for these calculations. As most amino acids are susceptible to attack by hydroxyl radicals, it should be possible to obtain more FPOP restraints for long-lived folding intermediates of proteins and use them in restrained calculations. This will in turn further improve the quality of calculated structures. In any case, our new restraint paves the way for better structural interpretation of data from FPOP/MS experiments.
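The β-Tanford back-calculation described above can be illustrated with a short Python sketch. The scaling-law parameters R0 = 1.927 Å and ν = 0.598 are the values commonly quoted from ref. [27]; they are not stated in the text and are an assumption here, although they do reproduce the 28.2 Å quoted above for the 89 residues of barstar. The folded-state and intermediate radii of gyration passed to the hypothetical helper `beta_tanford` are placeholders, not values from this work.

```python
def unfolded_rg(n_residues, r0=1.927, nu=0.598):
    """Scaling-law estimate R_G = R0 * N**nu in angstroms (r0 and nu are assumed values)."""
    return r0 * n_residues ** nu

def beta_tanford(x, folded, unfolded):
    """Return 1 at the folded reference and 0 at the unfolded reference."""
    return 1.0 - (x - folded) / (unfolded - folded)

rg_unfolded_upper = unfolded_rg(89)  # scaling-law estimate for 89 residues
rg_unfolded_lower = 17.6             # value quoted for the 500 K unfolding simulation

# Placeholder radii of gyration in angstroms, for illustration only.
rg_folded, rg_x = 11.0, 15.0
beta_upper = beta_tanford(rg_x, rg_folded, rg_unfolded_upper)
beta_lower = beta_tanford(rg_x, rg_folded, rg_unfolded_lower)
print(round(rg_unfolded_upper, 1), round(beta_upper, 2), round(beta_lower, 2))
```

The same structure yields a higher β-Tanford value against the larger unfolded reference, which is why Table 1 reports an upper and a lower estimate for each restraint set.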
Appendix A. Supplementary data
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.jmb.2015.10.022.

Received 22 July 2015; Received in revised form 21 October 2015; Accepted 22 October 2015
Available online 30 October 2015

Keywords: fast photochemical oxidation of proteins; restrained molecular dynamics simulation; solvent-accessible surface area; protein folding intermediate; structure determination
Abbreviations used: FPOP, fast photochemical oxidation of proteins; MD, molecular dynamics; MS, mass spectrometry; SASA, solvent-accessible surface area.
References
[1] T.R. Sosnick, D. Barrick, The folding of single domain proteins - Have we reached a consensus? Curr. Opin. Struct. Biol. 21 (2011) 12–24, http://dx.doi.org/10.1016/j.sbi.2010.11.002.
[2] D.M. Hambly, M.L. Gross, Laser flash photolysis of hydrogen peroxide to oxidize protein solvent-accessible residues on the microsecond timescale, J. Am. Soc. Mass Spectrom. 16 (2005) 2057–2063, http://dx.doi.org/10.1016/j.jasms.2005.09.008.
[3] L. Konermann, Y. Pan, B.B. Stocks, Protein folding mechanisms studied by pulsed oxidative labeling and mass spectrometry, Curr. Opin. Struct. Biol. 21 (2011) 634–640, http://dx.doi.org/10.1016/j.sbi.2011.05.004.
[4] T. Poor, L.M. Jones, A. Sood, G.P. Leser, M.D. Plasencia, D.L. Rempel, T.S. Jardetzky, R.J. Woods, M.L. Gross, R.A. Lamb, Probing the paramyxovirus fusion (F) protein-refolding event from pre- to postfusion by oxidative footprinting, Proc. Natl. Acad. Sci. USA (2014), http://dx.doi.org/10.1073/pnas.1408983111.
[5] J. Chen, D.L. Rempel, B.C. Gau, M.L. Gross, Fast photochemical oxidation of proteins and mass spectrometry follow submillisecond protein folding at the amino-acid level, J. Am. Chem. Soc. 134 (2012) 18724–18731, http://dx.doi.org/10.1021/ja307606f.
[6] J. Chen, D.L. Rempel, M.L. Gross, Temperature jump and fast photochemical oxidation probe submillisecond protein folding, J. Am. Chem. Soc. 132 (2010) 15502–15504, http://dx.doi.org/10.1021/ja106518d.
[7] B.B. Stocks, A. Rezvanpour, G.S. Shaw, L. Konermann, Temporal development of protein structure during S100A11 folding and dimerization probed by oxidative labeling and mass spectrometry, J. Mol. Biol. 409 (2011) 669–679, http://dx.doi.org/10.1016/j.jmb.2011.04.028.
[8] Y. Pan, L. Brown, L. Konermann, Kinetic folding mechanism of an integral membrane protein examined by pulsed oxidative labeling and mass spectrometry, J. Mol. Biol. 410 (2011) 146–158, http://dx.doi.org/10.1016/j.jmb.2011.04.074.
[9] B.B. Stocks, L. Konermann, Time-dependent changes in side-chain solvent accessibility during cytochrome c folding probed by pulsed oxidative labeling and mass spectrometry, J. Mol. Biol. 398 (2010) 362–373, http://dx.doi.org/10.1016/j.jmb.2010.03.015.
[10] E. Paci, M. Karplus, Forced unfolding of fibronectin type 3 modules: An analysis by biased molecular dynamics simulations, J. Mol. Biol. 288 (1999) 441–459, http://dx.doi.org/10.1006/jmbi.1999.2670.
[11] V. Guillet, A. Lapthorn, J. Fourniat, J.P. Benoit, R.W. Hartley, Y. Mauguen, Crystallization and preliminary X-ray investigation of barstar, the intracellular inhibitor of barnase, Proteins Struct. Funct. Genet. 17 (1993) 325–328, http://dx.doi.org/10.1002/prot.340170309.
[12] M.J. Lubienski, M. Bycroft, S.M. Freund, A.R. Fersht, Three-dimensional solution structure and 13C assignments of barstar using nuclear magnetic resonance spectroscopy, Biochemistry 33 (1994) 8866–8877, http://dx.doi.org/10.1021/bi00196a003.
[13] B.C. Gau, J. Chen, M.L. Gross, Fast photochemical oxidation of proteins for comparing solvent-accessibility changes accompanying protein folding: Data processing and application to barstar, Biochim. Biophys. Acta 1834 (2013) 1230–1238, http://dx.doi.org/10.1016/j.bbapap.2013.02.023.
[14] B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, CHARMM: A program for macromolecular energy, minimization, and dynamics calculations, J. Comput. Chem. 4 (1983) 187–217, http://dx.doi.org/10.1002/jcc.540040211.
[15] T. Lazaridis, M. Karplus, Effective energy function for proteins in solution, Proteins Struct. Funct. Genet. 35 (1999) 133–152, http://dx.doi.org/10.1002/(SICI)1097-0134(19990501)35:2<133::AID-PROT1>3.0.CO;2-N.
[16] U. Haberthur, A. Caflisch, FACTS: Fast analytical continuum treatment of solvation, J. Comput. Chem. 29 (2008) 701–715, http://dx.doi.org/10.1002/jcc.20832.
[17] A. Matouschek, J.T. Kellis, L. Serrano, A.R. Fersht, Mapping the transition state and pathway of protein folding by protein engineering, Nature 340 (1989) 122–126, http://dx.doi.org/10.1038/340122a0.
[18] L. Serrano, A. Matouschek, A.R. Fersht, The folding of an enzyme, J. Mol. Biol. 224 (1992) 847–859, http://dx.doi.org/10.1016/0022-2836(92)90566-3.
[19] A.R. Fersht, S. Sato, Phi-value analysis and the nature of protein-folding transition states, Proc. Natl. Acad. Sci. USA 101 (2004) 7976–7981, http://dx.doi.org/10.1073/pnas.0402684101.
[20] M. Vendruscolo, E. Paci, C.M. Dobson, M. Karplus, Three key residues form a critical contact network in a protein folding transition state, Nature 409 (2001) 641–645, http://dx.doi.org/10.1038/35054591.
[21] E. Paci, M. Vendruscolo, C.M. Dobson, M. Karplus, Determination of a transition state at atomic resolution from protein engineering data, J. Mol. Biol. 324 (2002) 151–163, http://dx.doi.org/10.1016/S0022-2836(02)00944-0.
[22] B. Nölting, R. Golbik, J.L. Neira, A.S. Soler-Gonzalez, G. Schreiber, A.R. Fersht, The folding pathway of a protein at high resolution from microseconds to seconds, Proc. Natl. Acad. Sci. USA 94 (1997) 826–830, http://dx.doi.org/10.1073/pnas.94.3.826.
[23] R.B. Best, M. Vendruscolo, Structural interpretation of hydrogen exchange protection factors in proteins: Characterization of the native state fluctuations of CI2, Structure 14 (2006) 97–106, http://dx.doi.org/10.1016/j.str.2005.09.012.
[24] J. Gsponer, H. Hopearuoho, S.B.-M. Whittaker, G.R. Spence, G.R. Moore, E. Paci, et al., Determination of an ensemble of structures representing the intermediate state of the bacterial immunity protein Im7, Proc. Natl. Acad. Sci. USA 103 (2006) 99–104, http://dx.doi.org/10.1073/pnas.0508667102.
[25] A. Matouschek, A.R. Fersht, Application of physical organic chemistry to engineered mutants of proteins: Hammond postulate behavior in the transition state of protein folding, Proc. Natl. Acad. Sci. USA 90 (1993) 7814–7818, http://dx.doi.org/10.1073/pnas.90.16.7814.
[26] A. Matouschek, D.E. Otzen, L.S. Itzhaki, S.E. Jackson, A.R. Fersht, Movement of the position of the transition state in protein folding, Biochemistry 34 (1995) 13656–13662, http://dx.doi.org/10.1021/bi00041a047.
[27] J.E. Kohn, I.S. Millett, J. Jacob, B. Zagrovic, T.M. Dillon, N. Cingel, et al., Random-coil behavior and the dimensions of chemically unfolded proteins, Proc. Natl. Acad. Sci. USA 101 (2004) 12491–12496, http://dx.doi.org/10.1073/pnas.0403643101.
[28] M.R. Shirts, J.W. Pitera, W.C. Swope, V.S. Pande, Extremely precise free energy calculations of amino acid side chain analogs: Comparison of common molecular mechanics force fields for proteins, J. Chem. Phys. 119 (2003) 5740–5761, http://dx.doi.org/10.1063/1.1587119.
[29] K. Lindorff-Larsen, N. Trbovic, P. Maragakis, S. Piana, D.E. Shaw, Structure and dynamics of an unfolded protein examined by molecular dynamics simulation, J. Am. Chem. Soc. 134 (2012) 3787–3791, http://dx.doi.org/10.1021/ja209931w.
[30] T.P. Creamer, R. Srinivasan, G.D. Rose, Modeling unfolded states of proteins and peptides. II. Backbone solvent accessibility, Biochemistry 36 (1997) 2832–2835, http://dx.doi.org/10.1021/bi962819o.
[31] T.P. Creamer, R. Srinivasan, G.D. Rose, Modeling unfolded states of peptides and proteins, Biochemistry 34 (1995) 16245–16250, http://dx.doi.org/10.1021/bi00050a003.
[32] J. Yunger, A study of Barstar folding events using boundary value simulations, Phys. A Stat. Mech. Appl. 386 (2007) 791–798, http://dx.doi.org/10.1016/j.physa.2007.08.057.
[33] V.R. Agashe, M.C. Shastry, J.B. Udgaonkar, Initial hydrophobic collapse in the folding of barstar, Nature 377 (1995) 754–757, http://dx.doi.org/10.1038/377754a0.
[34] B. Nölting, Structural resolution of the folding pathway of a protein by correlation of phi-values with inter-residue contacts, J. Theor. Biol. 194 (1998) 419–428, http://dx.doi.org/10.1006/jtbi.1998.0783.