doi:10.1016/S0022-2836(03)00625-9
J. Mol. Biol. (2003) 330, 1189–1201
Effects of Domain Dissection on the Folding and Stability of the 43 kDa Protein PGK Probed by NMR
Michelle A. C. Reed1, Andrea M. Hounslow1, K. H. Sze2, Igor G. Barsukov3, Laszlo L. P. Hosszu4, Anthony R. Clarke4, C. Jeremy Craven1* and Jonathan P. Waltho1*
1 Department of Molecular Biology and Biotechnology, Krebs Institute for Biomolecular Research, University of Sheffield, Firth Court, Western Bank, Sheffield S10 2TN, UK
2 Department of Biochemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, SAR, People’s Republic of China
3 Department of Biochemistry, University of Leicester, Leicester LE1 7RH, UK
4 MRC Prion Unit, Institute of Neurology, National Hospital for Neurology and Neurosurgery, Queen Square, London WC1N 3BG, UK
*Corresponding authors
The characterization of early folding intermediates is key to understanding the protein folding process. Previous studies of the N-domain of phosphoglycerate kinase (PGK) from Bacillus stearothermophilus combined equilibrium amide exchange data with a kinetic model derived from stopped-flow kinetics. Together, these implied the rapid formation of an intermediate with extensive native-like hydrogen bonding. However, there was an absence of protection in the region proximal to the C-domain in the intact protein. We now report data for the intact PGK molecule, which at 394 residues constitutes a major extension to the protein size for which such data can be acquired. The methods utilised to achieve the backbone assignment are described in detail, including a semi-automated protocol based on a simulated annealing Monte Carlo technique. A substantial increase in the stability of the contact region is observed, allowing protection to be inferred on both faces of the β-sheet in the intermediate. Thus, the entire N-domain acts concertedly in the formation of the kinetic refolding intermediate rather than there existing a distinct local folding nucleus.
Keywords: folding; kinetic intermediate; NMR; assignment; phosphoglycerate kinase
© 2003 Elsevier Ltd. All rights reserved
Abbreviations used: PGK, phosphoglycerate kinase from Bacillus stearothermophilus; W-PGK, W290Y mutant of full-length PGK; N-PGK, engineered isolated N-terminal domain (residues 1–174) of PGK; W-PGKN, the N-terminal domain of PGK in the context of the intact molecule; W-PGKC, the C-terminal domain of PGK in the context of the intact molecule; U, unfolded state; I, kinetic folding intermediate; F, folded (native) state.
E-mail addresses of the corresponding authors: [email protected]; [email protected]
Introduction
Most proteins do not show one-step folding and have at least one distinct, populated intermediate between the fully unfolded state and the transition state barrier.1–5 The properties ascribed to these kinetic intermediates range from unfolded state reorganisation to well-ordered, relatively stable, native-like species.4,6–13 Kinetic intermediates have been shown both directly14,15 and indirectly13,16,17 to be advantageous to folding (on-pathway intermediates), though there are also examples of misfolded kinetic intermediates with non-native conformation18,19 or misligation of a prosthetic group.20 While detailed structural information about kinetically relevant protein folding intermediates is central to our understanding of protein folding mechanisms, the transient nature of these states limits the experimental methods with which they may be probed.21 Insights into their structural properties have come primarily from the effect of site-directed mutations on the kinetics of folding and unfolding,4,22 and from pulsed and equilibrium amide exchange studies.23,24 Equilibrium amide exchange methods, in combination with independent measurements of the stability of kinetic intermediates, can be used to overcome difficulties caused by the variation of stability with pH in pulsed experiments.10,13 Though these
methods establish that kinetic intermediates can involve the majority of residues and can be relatively stable, the intermediates remain more solvent accessible than their folded state counterparts and do not have immobilised side-chains.1 The latter occurs on the folded state side of the major refolding transition state.25 Phosphoglycerate kinase (PGK) from B. stearothermophilus folds via stable kinetic intermediate states. PGK is a 43-kDa protein that comprises two similarly sized and structurally homologous domains.26 On refolding from their unfolded states, both domains collapse independently and rapidly (on a millisecond or shorter timescale) to kinetic intermediate states.27,28 The primary involvement of inter-domain contacts occurs subsequently in folding and is indistinguishable from the major refolding transition state of the C-terminal domain, the slowest process in the folding mechanism. The rapid formation of kinetic intermediate states is similarly observed in engineered isolated N and C-domains of PGK, which also retain essentially the same final fold as in the intact protein (Ref. 29 and unpublished data). In contrast to many smaller proteins where kinetic intermediate states are much less stable than the native protein, the N-domain of PGK realises the majority of the total free energy change of folding in the unfolded to intermediate state transition.27 The resulting small difference in free energy between the folded state and the intermediate state permitted an exploration of the nature of its intermediate state using equilibrium amide exchange.13 Hydrogen bonding within the intermediate state can be readily probed since this state is visited numerous times before the majority of amide hydrogen atoms exchange. 
In combination with fluorescence-detected measurements of the intermediate to folded state transition, we thus established that many amide hydrogen atoms were afforded considerable protection from solvent exchange in the kinetic intermediate state of the isolated N-terminal domain.13 However, it is striking that although this protection is extensive, it occurs only on one side of the β-sheet. Two scenarios could cause this observation and each would lead to a very different interpretation of the nature of the kinetic intermediate, and thus the mechanism of folding. First, structure may be forming on one side of the sheet in the intermediate, while the folding of the other side only occurs on crossing the major refolding transition state. This would correspond to a rapidly folding unit that constituted only a part of the domain; however, this was sufficient to attain the tortuous topology of the native state. In other words, within a single domain, there exists an independently folding nucleus. Alternatively, the intermediate may be far more extensive, encompassing the entire secondary structure within the domain, but with its extent masked by local solvent exchange processes in the native state caused by the removal
of interactions with the C-domain. In this interpretation, the entire domain is acting concertedly. The resolution of these alternatives is presented here and requires the amide protection and corresponding folding kinetics to be measured on the intact protein, in which local solvent exchange processes are retarded. A prerequisite to the interpretation of these data is the backbone NMR resonance assignment for PGK, which at 394 residues is the second largest single-chain protein for which free energy backbone assignment has been reported to date. Due to the large size of the protein, and given the paucity of assignments approaching this size in the literature, we describe the process in detail, and in particular describe a semi-automated protocol based on a simulated annealing Monte Carlo technique. The methods are readily applicable to smaller proteins and should be particularly pertinent for high throughput studies. The assignment is then used to resolve the nature of the kinetic intermediate of the N-terminal domain.
Results
Backbone assignment
The strategy employed to obtain the 95% complete backbone assignment of the HN, N, Cα, Cβ and C′ resonances of the 394 residues of the full-length W290Y mutant of PGK (W-PGK) utilised a combination of 2D 15N-TROSY,30 3D TROSY ct-HNCA, ct-HN(CO)CA, HN(CA)CB, HN(COCA)CB, HNCO, and HN(CA)CO spectra.31,32 The correlation time of the protein was estimated from 15N relaxation measurements to be ca 18 ns at 37 °C, and the 15N-TROSY spectrum showed excellent dispersion (Figure 1). Excluding proline residues and the N-terminal residue, W-PGK contains 378 residues with assignable backbone amide groups. An initial list of 347 spin systems was constructed based upon the positions of 1H–15N correlations in the 15N-TROSY spectrum. Each spin system was then inspected in the ct-HNCA, ct-HN(CO)CA,
Figure 1. 15N–1H TROSY spectrum of W-PGK.
HN(CA)CB, and HN(COCA)CB spectra simultaneously, in order to determine the chemical shifts of the Cα, pCα, Cβ, and pCβ resonances (for definitions of terminology see Materials and Methods). At this stage a 3D TROSY HNCO spectrum was used mainly as a sensitive aid to detect spin systems with overlapped 1H–15N correlations, although pC′ chemical shifts were also collected for a later checking phase. This procedure indicated the existence of at least a further 16 spin systems, with two particularly problematic regions of 1H–15N overlap. The peak picking process was completed in approximately three days using in-house macros. The Cα, pCα, Cβ, and pCβ chemical shifts were then used as input to the simulated annealing program of the “asstools” suite of assignment programs. This program attempts to reorder the spin systems and place them in the amino acid sequence of the protein, by minimising an energy function based both on matching up the chemical shift data for adjacent residues, and on characteristic chemical shift ranges for resonances in particular residue types. In order to assess how well-determined each part of the assignment is, a comparison can be made of the output of a number of different runs of the program with different initial random configurations of spin systems. Further details of the algorithm are given in Materials and Methods. Initially, chemical shifts for only 348 spin systems were included in the input data for the program, as there was some uncertainty in the data for the other 15 spin systems. In 30 runs of the asstool program using this initial chemical shift data, 218 spin systems (58% of the 378 assignable backbone NHs) were consistently uniquely assigned to particular residues and these assignments were all correct when later compared to the finalised assignment.
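The reordering step described above can be illustrated with a minimal sketch. This is not the asstools code: the function names (`make_toy_systems`, `energy`, `anneal`), the swap move, the cooling schedule and the toy data are all assumptions made for illustration, and the residue-type term of the energy function and the handling of void spin systems are omitted.

```python
import math
import random

def make_toy_systems(n, seed=1):
    """Hypothetical test data: spin systems listed in their true sequence order."""
    rng = random.Random(seed)
    ca = [rng.uniform(45.0, 65.0) for _ in range(n)]
    cb = [rng.uniform(20.0, 70.0) for _ in range(n)]
    # 'pca'/'pcb' are the shifts of the preceding residue seen by each spin system.
    return [{"ca": ca[i], "cb": cb[i],
             "pca": ca[i - 1] if i else 0.0,
             "pcb": cb[i - 1] if i else 0.0} for i in range(n)]

def energy(order, systems):
    """Sum of squared mismatches between sequentially adjacent spin systems."""
    total = 0.0
    for a, b in zip(order, order[1:]):
        total += (systems[a]["ca"] - systems[b]["pca"]) ** 2
        total += (systems[a]["cb"] - systems[b]["pcb"]) ** 2
    return total

def anneal(systems, steps=20000, t_start=5.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo over orderings with an exponential cooling schedule."""
    rng = random.Random(seed)
    order = list(range(len(systems)))
    rng.shuffle(order)                      # random initial configuration
    current = energy(order, systems)
    best, best_e = order[:], current
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]      # trial move: swap two positions
        trial = energy(order, systems)
        if trial <= current or rng.random() < math.exp((current - trial) / t):
            current = trial                          # accept the move
            if current < best_e:
                best, best_e = order[:], current
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best, best_e
```

Running `anneal` several times with different `seed` values mimics the repeated runs from different random starting configurations; positions that receive the same spin system in every run are the well-determined ones.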
Later analysis also showed that a less restrictive criterion of accepting an assignment if it was the same in 25 or more runs of the program would have given no incorrect assignments, and would have assigned 300 (79%) residues. Where residues were assigned to a number of different spin systems or were assigned to void spin systems (see Materials and Methods), the spectra and spin system lists were manually inspected. Although a very small number of errors and peak picking inconsistencies were present in the original peak picked data, the main source of assignment uncertainty was that the original peak list did not contain sufficient spin systems. It appears that single missing residues create rather large regions of assignment uncertainty. It was possible to work into these regions from either side, starting from residues with secure 30/30 assignments, and slowly confirming the more marginal assignments by checking that no other spin system could fit equally well at that position and by checking that the correspondence of chemical shifts to residue types and the quality of frequency
matching between sequential residues were good. This generally led to the conclusion that no spin system in the current set would fit in a particular position. It was then possible to examine the 1H–15N planes of the 3D spectra to find 1H–15N correlations with appropriate carbon shifts. For instance, if the Cα and Cβ shifts of residue X − 1 were known then the planes of the HN(CO)CA and HN(COCA)CB at these frequencies could be examined. The 1H–15N of residue X should be at a position showing intensity in both these planes. In some cases, after elimination of 1H–15N positions that were already securely assigned, this procedure led to one of the regions of intense 1H–15N overlap described above. It was then possible to pair up pCα and pCβ shifts and assign them to one particular spin system. Elimination of these unpicked resonances reduced (or eliminated) ambiguity of other spin systems at the same 1H–15N position, thus often reducing assignment uncertainty in other parts of the sequence. It was also possible to work backwards from residue X + 1 in a similar fashion, thus confirming the new spin system, and identifying its Cα and Cβ shifts. In cases where no such 1H–15N position was found, it was concluded that the 1H–15N correlation was absent. In the final stages of the assignment, 15 residues were fixed as void in this way. The assignment proceeded in an iterative fashion, with the assignment program being rerun after the incorporation of additional information. When an extra spin system was added, improvements were often obtained in regions of the sequence distant from the added spin system, since adding information that makes the assignment of a particular residue unique reduces the freedom in the assignment of other residues. Firm assignments were obtained for 357 (94%) of the assignable backbone amide groups in about five days.
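The run-to-run consistency check that guides this iterative procedure (accepting an assignment found in all 30 runs, or in 25 or more of them) can be sketched as follows. The data layout is an assumption: each run is taken to be a list giving, for every residue position, the index of the spin system placed there, with `None` standing in for a void spin system; the function name is hypothetical.

```python
from collections import Counter

def consensus_assignments(runs, min_agree):
    """Return {residue: spin system} for residues on which >= min_agree runs agree."""
    accepted = {}
    for res in range(len(runs[0])):
        # Most frequent spin system placed at this residue across all runs.
        system, count = Counter(run[res] for run in runs).most_common(1)[0]
        if count >= min_agree and system is not None:
            accepted[res] = system
    return accepted

# Three toy runs over three residues: only residue 0 is placed identically every time.
runs = [[0, 1, 2], [0, 2, 1], [0, 1, 2]]
strict = consensus_assignments(runs, min_agree=3)
relaxed = consensus_assignments(runs, min_agree=2)
```

Residues left out of the returned dictionary are the ones that would be inspected manually in the spectra.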
The assignment was finalised in approximately a further five days, with 360 (95%) of the assignable backbone amide groups being confidently assigned, three residues (Ala273, Asp186 and Arg187) being provisionally assigned, and 15 “missing” residues giving no detectable 1H–15N correlation. Residue Ala273 lies between two missing residues and was provisionally assigned to a single remaining spin system which had appropriate Cα, pCα, Cβ, pCβ chemical shifts for an alanine preceded by a phenylalanine. A similar situation occurred for residues Asp186 and Arg187, which lie between two proline residues. When the assignment was deemed complete, carbonyl chemical shifts were then examined and found to be consistent with the obtained assignment. This acts as a strong independent check on the assignment, although it is not immune to problems of amide cross-peak overlap. No backbone assignment could be obtained for 15 of the 394 residues of W-PGK. A very careful inspection of spectra was made using the information about neighbouring residues as described above, to be confident that the assignment was not possible for
these residues due to the absence of amide cross-peaks rather than resonance overlap. The only additional peaks in the spectrum appeared to be due to a very small population of a modified form of the protein. The missing residues are all significantly solvent exposed in the X-ray structure, suggesting that the amide cross-peaks are absent due to extremely rapid chemical exchange with the solvent at the high pH and temperature that were necessary for sample stability and reduction of resonance linewidths. Of these missing residues, Ala196, Gly351, Asp352, Ser353 and Gly373 correspond to five of the seven backbone amide groups that ligate the substrates in the X-ray structure of the ternary complex of Trypanosoma brucei PGK, 3-phosphoglycerate and ADP.33 In the unliganded form of the protein studied, these amide groups are fully solvent exposed.
Comparison of chemical shifts of N-PGK and W-PGKN
The differences in chemical shift between the isolated N-domain (which we designate N-PGK) and the N-domain in the intact molecule (which we designate W-PGKN) are shown in Figure 2. The changes in chemical shift are very small for the region comprising residues 7–143. In contrast, larger changes in chemical shift were observed for residues 3–6 and 144–174, of which residues 145–152 and 171–174 showed the greatest chemical shift changes (Figure 2(g)). In N-PGK the secondary α-carbon shifts of residues 145–152 and 171–174 are negligible (i.e. ΔCαrc ≈ ΔCαtr; Figure 2(e) and (f)), whereas the secondary α-carbon shifts in W-PGKN are substantial, with magnitudes typical of stable secondary structure (Figure 2(f)). The chemical shifts were analysed using TALOS,34 which predicts backbone dihedral angles by comparison of observed chemical shifts with those in a database of proteins of known structure.
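The combined measure plotted in Figure 2(g) weights each nucleus inversely by its root mean square shift change across the whole N-domain, so that nuclei with intrinsically large ppm ranges (such as 15N) do not dominate. A minimal sketch is given below; the function name, the input layout and the choice to average (rather than sum) the five normalised terms are assumptions, and missing values are not handled.

```python
import math

def weighted_rms_shift_change(deltas):
    """deltas: {nucleus: [shift difference per residue, ppm]} -> combined value per residue."""
    nuclei = list(deltas)
    n_res = len(deltas[nuclei[0]])
    # RMS shift change of each nucleus over all residues; used as the weighting scale.
    scale = {x: math.sqrt(sum(d * d for d in deltas[x]) / n_res) for x in nuclei}
    combined = []
    for i in range(n_res):
        terms = [(deltas[x][i] / scale[x]) ** 2 for x in nuclei]
        combined.append(math.sqrt(sum(terms) / len(terms)))
    return combined

# Toy input with two nuclei and four residues (values are illustrative only).
toy = {"HN": [0.0, 0.1, 0.0, 0.1], "N": [0.0, 1.0, 0.0, 1.0]}
profile = weighted_rms_shift_change(toy)
```

In the toy input the 15N changes are ten times larger in ppm than the 1H changes, yet both contribute equally after weighting.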
In W-PGKN, TALOS makes confident predictions for residues 171–174 that are very close to the α-helical values observed in the X-ray structure,26 whereas in N-PGK the prediction for 171 deviates from α-helical values, and no confident prediction is made for residues 172 and 173. (A prediction cannot be made for residue 174 since it is not the central residue of a triplet of residues as required by the TALOS algorithm.) Residues 145–152 are in a region of irregular secondary structure, and since such regions are poorly represented in the TALOS structural database, these tend to be the regions in which TALOS is not able to make confident predictions. Consequently, few confident predictions are made by TALOS for residues 145–152 in either N-PGK or W-PGKN, which precludes a more detailed comparison to the X-ray structure by this method. The effect in the portion of the protein between these two regions (residues 153–170) is quite different. Although the shift changes are still large compared to those observed in the 7–143 region, they are much smaller than for the flanking regions (145–152 and 171–174) and, furthermore, the changes in α-carbon chemical shift for residues 153–170 are small compared to the substantial values that exist for the secondary α-carbon shifts in the isolated N-domain (i.e. ΔCαtr ≪ ΔCαrc). For residues 3–6, a combination of these types of behaviour is observed.
Figure 2. Pattern of chemical shift differences between the N-domain of PGK in the intact and truncated proteins, and comparison to α-carbon secondary shifts, as a function of residue number. (a)–(e) Changes in chemical shift for, respectively, the backbone amide proton, backbone amide nitrogen, β-carbon, carbonyl carbon, and α-carbon. ΔtrX is defined by ΔtrX = δXW − δXN, where δXW and δXN are the chemical shifts of a particular nucleus X in W-PGKN and N-PGK, respectively. (f) The α-carbon secondary chemical shift ΔrcCα = δCαW − δCαrc, where δCαrc is the α-carbon random coil chemical shift, taken from Wishart & Sykes.45 ΔrcCα is that part of the chemical shift that is predominantly due to the secondary and tertiary structure of the intact protein. (g) The weighted root mean square chemical shift change for all five nuclei in (a)–(e), the contributions for each nucleus being inversely weighted by the root mean square value for that nucleus across the whole N-domain.
Comparison of protection factors of N-PGK and W-PGKN
Amide exchange rates were measured for
W-PGK (see Materials and Methods), and were converted to protection factors using intrinsic exchange rates obtained by the method of Bai et al.35 (Table 1; Figure 3). The changes in chemical shift observed for residues 3–6 and 144–174 are paralleled by differences in protection factors between N-PGK and W-PGKN (Figure 3). In W-PGKN, protection is observed for residues 6 and 7, and for many residues in the region 145–174, which in N-PGK do not show measurable protection. Furthermore, in N-PGK, residues 130–143 have protection factors significantly smaller than those for the core of the domain, whereas in W-PGKN, the protection factors are comparable to the rest of the domain. The protection factors are shown in the context of the 3D structure of W-PGK in Figure 4(a). For the C-domain, protection is observed predominantly in the half of the domain distal from the N-domain. This may reflect the absence of ligands or may reflect a general plasticity of the interdomain region. Resolution of this question will be the subject of future studies.
Table 1. Amide exchange data for W-PGKN
Residue   kobs (a)   kintrinsic (b)   Log10(PF) (c)   Log10(PFI) (d)
6         56         44               5.9             3.0
9         160        20               5.1             2.1
15        72         240              6.5             3.6
24        190        62               5.5             2.5
38        34         110              6.5             3.6
39        250        250              6.0             3.0
40        13         150              7.1             4.1
49        15         27               6.3             3.3
50        34         110              6.5             3.6
60        22         69               6.5             3.6
72        60         110              6.3             3.3
77        8.1        110              7.1             4.2
81        150        170              6.1             3.1
83        8.3        28               6.5             3.6
102       200        150              5.9             2.9
103       3.4        30               6.9             4.0
104       190        54               5.5             2.5
115       3.6        28               6.9             4.0
116       1.6        330              8.3             5.4
117       2.3        62               7.4             4.5
118       2.3        130              7.8             4.8
119       2.5        140              7.8             4.8
123       160        68               5.6             2.6
127       39         160              6.6             3.7
133       8.9        200              7.3             4.4
134       0.64       86               8.1             5.2
135       2.3        170              7.9             4.9
136       11         46               6.6             3.7
139       4.7        75               7.2             4.3
141       1.1        49               7.6             4.7
145       2.2        98               7.6             4.7
149       150        150              6.0             3.0
153       30         160              6.7             3.8
154       7.9        260              7.5             4.6
155       3.2        350              8.0             5.1
157       15         73               6.7             3.7
158       9.8        200              7.3             4.4
160       2.8        88               7.5             4.6
163       17         44               6.4             3.5
166       1.9        30               7.2             4.3
167       2.6        110              7.6             4.7
168       5.7        280              7.7             4.8
169       30         130              6.6             3.7
172       5.5        59               7.0             4.1
173       47         96               6.3             3.4
174       17         61               6.6             3.6
Data are shown only for those residues for which a value could be reliably determined. The errors on the observed exchange rates are estimated to be ca 20% for kobs > 3 × 10⁻⁶ s⁻¹ and ca 50% for kobs < 3 × 10⁻⁶ s⁻¹. The lower limit for reliably determining kobs was 0.3 × 10⁻⁶ s⁻¹. Residues 16, 17, 18, 19, 20, 44, 45, 47, 48, 53, 54, 55, 56, 57, 58, 80, 156, 112, 140, and 142 exchanged more slowly than this lower limit.
(a) Observed exchange rate × 10⁻⁶ (s⁻¹). (b) Intrinsic exchange rate (s⁻¹). (c) Log10 (protection factor in native state). (d) Log10 (inferred protection factor in intermediate).
Figure 3. Backbone amide protection factors in PGK. (a) Protection factors in the native state of N-PGK. (b) Protection factors in the native state of W-PGKN. (c) Protection factors in the kinetic intermediate of W-PGKN. Black bars denote protection factors calculated from measured exchange rates, kobs. Red bars denote minimum protection factors for residues for which the exchange was so slow that only a lower limit could be placed on kobs.
Protection in the kinetic intermediate
It has previously been shown that the folding of the domains of PGK occurs via the following simple scheme:
U ⇌ I ⇌ F
where the unfolded state (U) to intermediate state (I) transition occurs within the dead time of stopped flow measurements, and the intermediate state (I) to folded state (F) transition is rate limiting. The thermodynamics and kinetics of folding have
been extensively studied previously13,27–29 for each of the N and C-domains, both in the context of the intact protein and in the isolated domains. The basis of the method for investigating the level of protection in the kinetic intermediate is as follows. If an amide group is completely solvent exposed in the intermediate, then under the conditions of the present study the exchange from F via I will be in the so-called EX1 limit, since the intrinsic amide exchange rates (which are in the range 10–100 s⁻¹ under the conditions of this study) exceed the rate constant for the “closing” reaction, $k_{\mathrm{I-F}}$ (1.1 s⁻¹). The observed exchange rate would therefore be given by $k_{\mathrm{F-I}}$ (0.0013 s⁻¹). Exchange significantly slower than this rate implies protection in the intermediate. Further analysis is complicated by the observation that a consequence of protection in the intermediate is that the exchange may cross into the so-called EX2 regime. Therefore, a more detailed analysis is required to obtain quantitative results, as follows. First, one can consider the partial contribution, $k^{\mathrm{I}}_{\mathrm{obs}}$, to the total observed exchange rate that arises owing to occasional excursions to the I state. Thus, if exchange occurs from the intermediate state at a rate $k^{\mathrm{I}}_{\mathrm{ex}}$, then:
$$k^{\mathrm{I}}_{\mathrm{obs}} = \frac{k_{\mathrm{F-I}}\,k^{\mathrm{I}}_{\mathrm{ex}}}{k_{\mathrm{I-F}} + k^{\mathrm{I}}_{\mathrm{ex}}} \qquad (1)$$
where $k_{\mathrm{F-I}}$ and $k_{\mathrm{I-F}}$ are the forward and backward rate constants for the F to I equilibrium. As the overall observed exchange rate $k_{\mathrm{obs}}$ may contain contributions from processes other than exchange in I (i.e. $k^{\mathrm{I}}_{\mathrm{obs}} \le k_{\mathrm{obs}}$) an upper limit can be obtained for $k^{\mathrm{I}}_{\mathrm{ex}}$ by rearrangement of equation (1):
$$k^{\mathrm{I}}_{\mathrm{ex}} \le \frac{k_{\mathrm{obs}}\,k_{\mathrm{I-F}}}{k_{\mathrm{F-I}} - k_{\mathrm{obs}}} \qquad (2)$$
Thus, a lower limit can be obtained for the protection factor in the intermediate, $PF_{\mathrm{I}}$:
$$PF_{\mathrm{I}} = \frac{k_{\mathrm{intrinsic}}}{k^{\mathrm{I}}_{\mathrm{ex}}} \ge \frac{k_{\mathrm{intrinsic}}\,(k_{\mathrm{F-I}} - k_{\mathrm{obs}})}{k_{\mathrm{obs}}\,k_{\mathrm{I-F}}} \qquad (3)$$
where $k_{\mathrm{intrinsic}}$ is the intrinsic rate of exchange for an unprotected amide. Using values for $k_{\mathrm{I-F}}$ and $k_{\mathrm{F-I}}$ from Parker et al.,27 lower limits on the protection factors in the intermediate in W-PGKN were calculated according to equation (3) (Table 1; Figure 3(c)) and are shown in the context of the 3D structure of W-PGKN in Figure 4(c).
Figure 4. Effects of domain dissection on the inferred nature of the kinetic intermediate and on the folded states of PGK. (a) Backbone representation of the structure of W-PGK,26 with spheres representing the protection factors in the native state. The radii of the spheres are proportional to the logarithm of the protection factors. Spheres in the N-domain are coloured red, whereas those in the C-domain are green. The position of the ADP molecule in the X-ray structure26 is shown in cyan. (b) Detail of the structure of PGK,26 to show the regions in the N-terminal domain perturbed by dissection from the intact molecule. The N-terminal domain is shown space-filled, and the C-terminal domain is shown in a cartoon representation. The N-terminal domain is coloured grey for regions effectively unperturbed by dissection; yellow for regions mildly affected (residues 144, and 153–170); and red (residues 143–152 and 171–174) or orange (residues 1–6) for regions strongly affected. (c) Backbone representation of the structure of W-PGKN,26 with spheres representing the inferred protection factors in the intermediate state of W-PGKN. Spheres for which protection could previously be inferred in the kinetic intermediate in the isolated N-domain are shown in red, whereas residues for which protection could not previously be inferred are in green. (a) and (c) were created using rasmol and povray; (b) was created using pymol.
For the C-terminal domain, W-PGKC, the folded state is much more stable relative to the kinetic intermediate than is the case for the N-domain. It is the relatively low value for $K_{\mathrm{F/I}}$ in the N-domain that makes it possible to probe the nature of the intermediate by monitoring amide exchange in the native state. Even if exchange occurs very rapidly from the intermediate, the rate of exchange observed in the folded state via conversion to the
kinetic intermediate fundamentally cannot exceed $k_{\mathrm{F-I}}$. For the very stable C-domain, the rate $k_{\mathrm{F-I}}$ is very low (< 10⁻¹⁰ s⁻¹), and is much lower than the observed rate of exchange from the folded state. Thus, we can infer no limit on the rate of exchange in the kinetic intermediate of the C-domain. We conclude that the exchange in the folded state of the C-domain is dominated by local exchange processes.
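The analysis of equations (1) to (3) can be reproduced numerically with a short sketch. The function names are hypothetical; the rate constants are the rounded N-domain values quoted in the text (1.1 s⁻¹ and 0.0013 s⁻¹), so results may differ from Table 1 in the last quoted digit.

```python
import math

K_FI = 0.0013  # s^-1, F -> I rate constant quoted in the text
K_IF = 1.1     # s^-1, I -> F ("closing") rate constant quoted in the text

def k_obs_via_intermediate(k_ex_i, k_fi=K_FI, k_if=K_IF):
    """Equation (1): exchange rate observed in F due to excursions to I."""
    return k_fi * k_ex_i / (k_if + k_ex_i)

def k_ex_upper_limit(k_obs, k_fi=K_FI, k_if=K_IF):
    """Equation (2): upper limit on the exchange rate from I."""
    if k_obs >= k_fi:
        raise ValueError("k_obs must be below k_F-I for a finite limit")
    return k_obs * k_if / (k_fi - k_obs)

def log10_pf_native(k_obs, k_intrinsic):
    """Log10 protection factor in the native state, k_intrinsic / k_obs."""
    return math.log10(k_intrinsic / k_obs)

def log10_pf_intermediate(k_obs, k_intrinsic, k_fi=K_FI, k_if=K_IF):
    """Equation (3): log10 of the lower limit on the protection factor in I."""
    return math.log10(k_intrinsic / k_ex_upper_limit(k_obs, k_fi, k_if))

# Residue 15 of Table 1: k_obs = 72e-6 s^-1, k_intrinsic = 240 s^-1.
pf15 = log10_pf_native(72e-6, 240.0)
pfi15 = log10_pf_intermediate(72e-6, 240.0)
```

For residue 15 this gives values of about 6.5 and 3.6, in line with the corresponding Table 1 entries. For an unprotected amide (very large exchange rate from I), `k_obs_via_intermediate` tends to the EX1 limit of 0.0013 s⁻¹, which is why observed rates well below this value imply protection in the intermediate.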
Discussion
Backbone assignment
At 394 residues, W-PGK is one of the largest proteins for which the backbone assignment has been completed. Whilst the assignment of a 723 residue protein has recently been reported,36 and even though proteins of similar size were first assigned some four years ago,37 it is significant that there remain rather few reported assignments of proteins of this size. We made a survey of the assignments deposited to date in the BioMagResBank38 (Figure 5), which showed that the vast majority of assignments are for proteins less than half the size of W-PGK.
Figure 5. Histogram of the sizes of entries containing Cα assignments in the BioMagResBank (BMRB) in January 2003. The histogram is plotted with a bin width of ten Cα resonances. Multi-chain entries are represented by their individual component chains. The arrow locates the W-PGK assignment.
Initially we anticipated that we would be able to use the backbone assignment of N-PGK29 to facilitate the assignment of the full-length protein, despite the fact that we also anticipated that there would be regions of substantial chemical shift change. However, the assignment proceeded so rapidly without reference to the N-PGK assignment, that we decided to keep the N-PGK data in reserve as a form of check on our final assignment. The method used for assignment was a semi-automated one, in which we manually picked peaks in the 3D spectra, and grouped them into spin systems, but then used a simulated annealing approach to assign spin systems to actual residues
(see Results). The simulated annealing method appears to be both robust and rapid, and is based on very simple principles, which makes it very easy for the spectroscopist to intervene to sort out problem areas. In our groups the annealing method has been used in the backbone assignment of more than 12 proteins, and the efficiency of the method makes it attractive for use in high throughput programs. A number of other automated or semi-automated schemes for backbone resonance assignment have been reported, as reviewed by Moseley & Montelione.39 Our approach appears to be highly competitive. For instance, the assignment of a 263 residue protein was recently described,40 using the AutoAssign program. A 95% complete assignment was reported to have taken approximately five weeks, including processing, manual peak picking, peak editing and assignment. For our assignment of W-PGK, the equivalent time was two weeks. Recently, a direct combinatorial enumeration technique using just Cα shifts was discussed by Andrec & Levy,41 which was proposed to be capable of assigning proteins up to a size of ca 80 residues. We had previously tested our simulated annealing method on the 98-residue protein stefin A, and found that an assignment could be achieved very straightforwardly with the data from a constant time HNCA/HN(CO)CA pair (unpublished data). Direct enumeration scales particularly badly with protein size, whereas a simulated annealing method is much less sensitive to protein size. Peak picking and assembling of spin systems is often considered an attractive target for automation; however, we preferred to use a manual method, coupled with an intensively streamlined and ergonomic computerised bookkeeping system. As a result, the peak picking occupied only three days, which is a relatively small part of the two-week process. Thus further automation of this stage would yield little benefit, especially as checking of the output of an automated method would be required.
On the other hand, after the initial peak picking and one run of the assignment program, a 60–80% complete assignment was obtained. The final 20–40% of the assignment took up the majority of the assignment time. Improving the level of assignment to 94% and then to 95% involved two further periods of five days each. This phenomenon of diminishing returns is a familiar feature of macromolecular NMR. For some applications, the initial partial assignment would be adequate, in which case automation of the peak picking process would yield a greater relative increase in efficiency. Since the assignment of W-PGK could be completed with only Cα and Cβ data, it is clear that even for proteins of this size it is not overlap in the carbon dimensions that is the limiting factor, but rather the necessity to resolve sufficient spin systems in the H–N plane.
Although we found it necessary to use only 3D experiments, the 4D experiments of Kay et al.31 alleviate the H–N overlap problem significantly by allowing the carbon shifts to be unambiguously compiled into spin systems, even for overlapped H–N resonances.
How large is the effect of PGK N-domain isolation on its folded state and kinetic intermediate?
On isolation of the N-domain from the intact protein, approximately 1400 Å² of surface area becomes solvent exposed, which represents a 19% increase in surface area. Previous work to resolve the issue of the effect that this perturbation has on the structure of N-PGK was limited to comparing NOE, chemical shift and amide protection data to that expected from the crystal structure of the intact protein.29 With the assignment of the intact W-PGK protein, it is now possible to make a much more direct comparison, which makes it clear that a variety of effects occur, ranging from complete abolition of secondary structure (residues 145–152 and 171–174) to a very mild destabilisation of structure (residues 130–142). The evidence for this comes from the analysis of chemical shift data and amide protection factors. For residues 145–152 and 171–174 not only is protection lost on isolation of the N-domain, but also the chemical shifts are effectively returned to random coil values, whereas for residues 130–142 there is a measurable loss of protection with no significant change in chemical shifts. Intermediate between these two extreme cases is the behaviour of residues 153–170, which are flanked in sequence by the stretches that lose structure. These residues show a significant loss of protection (to below the detection limits of the experiment) along with chemical shift changes that are measurable but small compared to the secondary shifts that are observed in W-PGKN. The data cannot distinguish between two subtly different explanations.
One is a rapid equilibrium between a structure very similar to that of W-PGKN and a significant (ca 10%) proportion of a different structure with very low protection, which is perhaps partially unfolded. The other explanation is that a small change in structure occurs, coupled with a destabilisation against local "open" states which are not sufficiently populated to influence the chemical shift. In NMR terms, the difference between these cases is that the chemical shift change arises due to the presence of a minor form in the first case, and due to a change in the major form in the second case. Each model is plausible given the loss of structure in the flanking regions. This range of effects correlates well with the proximity of residues to the C-domain (Figure 4(b)). The N- and C-domains of PGK make contact not only in the vicinity of the interdomain helix but also by the intimate folding of the ca 25-residue C-terminal segment of the C-domain against the N-domain. In the isolated N-domain, the severe destabilisation of structure for residues 145–152 is compatible with the loss of contacts to the two C-terminal helices comprising residues 373–380 and 386–389, and also with the disruption of the alpha-helical structure of residues 171–174. For instance, in the intact protein, the side-chain of Phe146 is packed into a large hydrophobic pocket formed by residues Glu174, Leu175, Leu178, Gly372, Ala375, Ser376, Phe379, Leu385, and Val388. The destabilisation of the structure for residues 171–174 is also readily understood since the C-terminal part of this helix is deleted. The N terminus of the protein also makes contact with this helix, and with the C-terminal four residues of the C-domain, which accounts for its destabilisation. Residues 153–170, which are less severely affected, make less extensive contacts with the C-domain, and retain many contacts with the core of the N-domain.

The folding of PGK has been extensively studied,13,27–29 providing a detailed kinetic framework for the folding of each of the two domains, in both the intact protein and as isolated domains. Each domain folds via a kinetic intermediate that is formed on a millisecond or shorter timescale. In the N-domain, in particular, the stability of the folded state relative to the intermediate is modest, with KF/I < 1000. We have previously exploited this fact to obtain information about stable structure formed in the kinetic folding intermediate. This methodology relies on the fact that exchange occurring via I can have a significant effect on the rate of exchange occurring from the folded state, owing to the significant fraction of the time for which the I state is populated. However, the method is always limited by the extent to which other rapid exchange processes are occurring in the folded state, which will render any protection in the intermediate state undetectable.
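The masking effect described above can be illustrated with a small numerical sketch. It assumes, for illustration only, a two-state F/I equilibrium in the EX2 limit in which the observed exchange rate of an amide is the population-weighted sum of exchange through local opening of F and exchange from I. The function name, its parameters and all numerical values are hypothetical and are not taken from the analysis used in this work.

```python
def apparent_protection(k_fi, p_intermediate, p_local):
    """Apparent protection factor of one amide (illustrative EX2 sketch).

    k_fi           -- equilibrium constant [F]/[I] (assumed two-state F/I)
    p_intermediate -- protection factor of the amide within I
    p_local        -- protection factor against local opening within F
    """
    pop_i = 1.0 / (1.0 + k_fi)      # fraction of time spent in I
    pop_f = 1.0 - pop_i             # fraction of time spent in F
    # Exchange rate relative to the intrinsic rate: parallel pathways add.
    relative_rate = pop_f / p_local + pop_i / p_intermediate
    return 1.0 / relative_rate


# Hypothetical amide with negligible local exchange in F:
# unprotected in I versus 100-fold protected in I.
unprotected = apparent_protection(1000.0, 1.0, 1e12)
protected = apparent_protection(1000.0, 100.0, 1e12)

# Same amide when local opening in F is fast: protection in I is masked.
masked = apparent_protection(1000.0, 100.0, 1e3)
```

In this sketch, an amide whose local exchange in F is slow reports on its protection in I (the apparent protection factor scales with the product of KF/I and the protection in I), whereas fast local exchange caps the apparent value near `p_local` regardless of the structure present in I.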
Our previous studies focussed on the isolated N-domain, due to its relatively tractable size for detailed NMR assignment. In the present study, by assigning the intact PGK protein, we are able to study the folding in the context of the intact protein, where the stability of structure in the vicinity of the interdomain contact region is increased significantly. This increase in local stability observed for the region of the N-domain comprising residues 145–174 in the full-length protein enables the picture of the kinetic intermediate of the N-domain to be completed. From Figure 3 it is clear that residues 145–174 are also stably hydrogen bonded in the kinetic intermediate. It therefore appears that the answer to our original question13 as to the origin of the one-sidedness of the intermediate is that its true extent is masked by local exchange processes caused by removal of interactions with the C-domain. Thus, the entire N-domain acts concertedly in the formation of the kinetic refolding intermediate rather than there existing a distinct local folding nucleus. For the C-domain, the
stability of the folded state relative to the intermediate is such that local exchange effects dominate even in the intact protein, and thus information regarding hydrogen bonding in its kinetic folding intermediate is not attainable by this method. Previously, NMR methods have predominantly been applied to the study of rather small proteins, which tend to fold as single units. The ability to rapidly assign proteins of the size of PGK will enable methods such as amide exchange in the folded state (as in the present study) or pulsed amide exchange to be applied to proteins with a number of extensive independently folding units.
Materials and Methods

Protein expression and purification

The plasmid pKK223-3 bearing the gene for the W290Y mutant of PGK from B. stearothermophilus was transformed into BL21(DE3) cells and the resultant cell line expressed W-PGK as a soluble protein under the control of the tac promoter. Cells were cultured in a defined medium containing M9 salts,42 1 mg/l thiamine and 650 ml/l trace elements solution (containing 5.5 g/l CaCl2·2H2O, 1.4 g/l MnSO4·H2O, 400 mg/l CuSO4·5H2O, 2.2 g/l ZnSO4·7H2O, 450 mg/l CoCl2·6H2O, 260 mg/l NaMoO4·2H2O, 400 mg/l H3BO4 and 260 mg/l KI) with 3 g/l of sodium acetate as the C source and 100 mg/l ampicillin. Except where stated otherwise, all cultures were incubated at 37 °C, with liquid cultures shaken at 250 rpm. 3 mM NaN3 was added to all buffers used during purification and NMR experiments. For the 15N-labelled sample, cells were grown in the defined M9 medium (with (15NH4)2SO4 as the sole nitrogen source) until A600 nm = 0.6. W-PGK production was induced by adding IPTG to a final concentration of 2 mM. After a further 16 hours, cells were harvested by centrifugation, resuspended in a minimal volume of 50 mM triethanolamine pH 7.5 buffer (containing 1× Complete™ protease inhibitor cocktail), and lysed by sonication. Ammonium sulfate was added to 45% saturation to precipitate further contaminating proteins, which were removed by centrifugation. The protein was then further purified by hydrophobic interaction chromatography on a phenyl toyopearl 650S column. W-PGK eluted at approx. 1.5 M ammonium sulfate on a 600 ml 1.9–1.0 M ammonium sulfate gradient in 50 mM Tris, pH 8.0. The samples gave a single band of apparent Mr 43 kDa on an SDS-PAGE gel, and the method yielded approximately 50 mg per litre of Escherichia coli culture. The purified protein was stored as a precipitate in an 80% saturated ammonium sulfate solution at 4 °C.
For 15N/2H and 15N/13C/2H protein samples, the same defined medium was used except that 2H2O replaced H2O, and the sole C source used for the triply-labelled sample was 13C/1H sodium acetate. Stock reagents were filter sterilized and the medium was prepared, filter sterilized and used immediately to reduce 2H/1H exchange. Before expressing protein in 2H2O medium, cells were acclimatized to deuterated defined M9 medium by a gradual change in culture conditions.43 Initially a single colony was inoculated into 10 ml LB/H2O and incubated
until A600 nm = 0.6. Cells were then diluted 20-fold into LB (90% 2H2O) and grown to A600 nm = 0.6. Finally, these cells were diluted 20-fold into defined M9 media (3 g/l glucose as C-source, 90% 2H2O) and grown until A600 nm = 0.6. 50% glycerol suspensions were flash frozen and stored at −80 °C. When required, the glycerol suspension was streaked on a defined M9 media agar plate (90% 2H2O, 3 g/l glucose, 200 mg/l ampicillin). After about 36 hours at 37 °C, single colonies were picked into 4 × 10 ml defined M9 media (90% 2H2O, 3 g/l glucose) and grown overnight. This inoculum was diluted 50-fold into defined M9 media (99.8% 2H2O, (15NH4)2SO4, 3 g/l sodium acetate). The cells were incubated until A600 nm = 0.4 (about 23 hours) and then protein expression was induced by adding IPTG to a final concentration of 2 mM. The temperature was reduced to 34 °C and the cells were harvested after a further 14 hours. The protein purification procedure was identical with that used for protonated samples but yields were reduced to about 25 mg/l media.

NMR sample preparation

In order to reprotonate amide groups, the 2H/13C/15N sample was partially unfolded in 100 mM piperidine, 50 mM EDTA, 40 mM DTT, pH 11.45 at a final concentration of 0.1 mg/ml for 15 minutes and then refolded by addition of an equal volume of 1 M Tris pH 7.5. Precipitate was removed by filtration. For the assignment experiments, NMR samples at concentrations of approximately 1 mM were prepared by dialysis against 20 mM potassium phosphate buffer containing 3 mM NaN3 at pH 7.50. 10% (v/v) 2H2O was added as an internal lock. To prepare a fully amide deuterated 2H/15N sample for hydrogen exchange experiments, the sample was maintained at 37 °C in 20 mM potassium phosphate (pHread = 8.75, 2H2O) until amide deuteration was complete as determined by the absence of cross-peaks in HSQC spectra. The solution was then dialysed into 2H2O before snap freezing in liquid nitrogen and freeze-drying.
NMR backbone assignment experiments

NMR experiments were acquired at 37 °C on a Bruker DRX-800 spectrometer (HN(CA)CO) and a Bruker DRX-600 spectrometer (all others). The following experiments were recorded: 2D 15N-TROSY;30 3D TROSY ct-HNCA, ct-HN(CO)CA, HN(CA)CB, HN(COCA)CB and 3D TROSY HN(CA)CO;31 3D TROSY HNCO.32 All experiments employed water flip-back pulses to minimise saturation of the water signal. The protein was expressed in bacteria grown in 2H2O with 13C/1H sodium acetate as the carbon source. This is much more economical than using a 13C/2H labelled carbon source, but does not produce such high levels of deuteration. Presumably, as a result of this, we found that the constant time HN(CA)CB and HN(COCA)CB experiments were not adequately sensitive,37 and thus we used non-constant time versions. Although clearly the constant time experiments give higher resolution,37 we found the simulated annealing strategy was still able to assign W-PGK with data from ct-CA experiments and non-ct-CB experiments, and without use of carbonyl data. For a high resolution 2D 15N-TROSY, the numbers of
acquired complex data pairs, acquisition times, and final matrix size were (1H, 15N) 4096 × 300, 273 ms × 143 ms, and 2048 × 2048, respectively. For the 3D spectra, the numbers of acquired complex data pairs (1H, 15N, 13C), acquisition times, and final matrix sizes were HNCA/HN(CO)CA 2048 × 48 × 105, 136 ms × 22.8 ms × 24.7 ms, 1024 × 128 × 256; HNCACB/HN(CA)CB 2048 × 48 × 80, 136 ms × 22.8 ms × 8 ms, 1024 × 128 × 256; HNCO 2048 × 48 × 81, 136 ms × 22.8 ms × 36.1 ms, 1024 × 128 × 256; HN(CA)CO 2048 × 40 × 32, 102 ms × 14.2 ms × 15.6 ms, 1024 × 128 × 256. The direct proton dimension was apodised using a 3° (2D) or 60° (3D) shifted sine-bell. The constant time 15N time domain data were extended to double the acquired length by linear prediction, then apodised using a 90° shifted squared sine-bell. Constant time 13C dimensions were apodised with a 90° shifted sine-bell. Non-constant time 13C dimensions were apodised with a 60° shifted sine-bell. Chemical shifts were referenced relative to the 1H signal of TSP, using 15N/1H and 13C/1H γ-ratios of 0.101329118 and 0.25144953, respectively. NMR data were processed and peak picked using FELIX 2000 software (Accelrys). The peak picking was performed semi-manually in FELIX, whereby the user selects a point close to the maximum of a peak and the software optimises the position. In-house scripts were used to reformat peak tables for the simulated annealing program of the asstool suite, and to analyse the output of the program.

Simulated annealing assignment
We describe here the aspects of the program pertinent to this study. Further details can be found in Sze, Barsukov, Lian & Roberts (unpublished results), and the program is available on request from I.G.B. Scripts for formatting input and analysing output data can be obtained from C.J.C. The primary input for the simulated annealing program from the asstool suite is the amino acid sequence and a table of arbitrarily numbered spin systems. We will assume below that the spin systems are derived from a conventional 1H–15N based strategy from experiments such as HNCO, but the program does not impose this. For each spin system, a number of experimentally determined chemical shifts can be specified. These are typically the intraresidue Cα, Cβ, and C′ chemical shifts and the corresponding shifts for the preceding residue, pCα, pCβ, and pC′. Values can also be specified as missing. If fewer spin systems are specified than the number of residues for which NH correlations are in principle detectable (i.e. excluding proline residues, and the N-terminal residue) then the necessary number of "void" spin systems are automatically generated by the program. In our main assignment runs, we did not include the carbonyl shifts, in order to reserve them as a check on the assignment. Initially spin systems are randomly assigned to residues to generate a starting "configuration". A simulated annealing Metropolis Monte Carlo algorithm is then used to minimise an energy function defined by:

$$E = \sum_{j=2}^{N_{\mathrm{RES}}} \sum_{L} a_L \, f_L\!\left(\delta^{j}_{pL} - \delta^{j-1}_{L}\right) + \sum_{j=1}^{N_{\mathrm{RES}}} \sum_{B} b_B \, g_B\!\left(R^{j}_{B}\right)$$

where the first double sum is termed the linking energy, which describes how well sequential chemical shifts match, and the second double sum is termed the binding energy, which describes how well the chemical shifts of spin systems match those expected for the residue types to which they are assigned. NRES is the number of residues. The sum on L can run over nuclear types Cα, Cβ, C′ etc. as specified in the table of spin systems. $\delta^{j}_{X}$ is the shift of nucleus X in the spin system assigned to residue j. $f_L$ can have two main forms, either

(a) Square well:

$$f_L(x) = -1, \quad |x| < \mathrm{tol}_L; \qquad f_L(x) = 0, \quad |x| \ge \mathrm{tol}_L$$

or

(b) Power law with cutoff:

$$f_L(x) = \left(\frac{|x|}{\mathrm{tol}_L}\right)^{\gamma} - 1, \quad |x| < \mathrm{tol}_L; \qquad f_L(x) = 0, \quad |x| \ge \mathrm{tol}_L$$

The choice between forms, and the choice of power $\gamma$ and tolerance $\mathrm{tol}_L$, are user-definable; $a_L$ is a weighting term for nuclear type L.

If the chemical shift information is missing (either for a void spin system, or for an incomplete spin system), then:

$$f_L(x) = -1$$
The sum on B can run over the Cα, Cβ, C′, pCα, pCβ and pC′ chemical shifts. $R^{j}_{B}$ is 0 if the B shift of the spin system assigned to residue j lies within a range specified for that residue type in a library derived from the data of Grzesiek & Bax,44 otherwise $R^{j}_{B}$ is the amount by which this shift lies outside the library range. $g_B(x)$ can take the same forms as $f_L(x)$. $b_B$ is a weighting term for the chemical shift type B.

A standard Metropolis Monte Carlo scheme is used whereby a trial new configuration is generated from the current configuration via the reconfiguration scheme described below. If this new configuration has a lower energy than the current configuration it is accepted as the new configuration. If the new configuration has a higher energy than the current configuration, then it is accepted with a probability exp(−ΔE/T), where ΔE is the difference in energy between the two configurations and T is the current "temperature" of the system. Simulated annealing is achieved by iterating the Monte Carlo procedure at an elevated temperature (where configurations that make the energy worse have a higher probability of being accepted) and then reducing the temperature gradually to a very low value.

A number of reconfiguration schemes were tried, of which the most efficient was found to be a version of the random segment best swap scheme. In this scheme, a segment of the configuration is chosen at random, and a systematic search is performed to find the position into which this segment can be swapped to generate the configuration with the lowest energy. In broad outline, the segment is chosen such that a balance is struck between moving round large parts of the configuration, ensuring that adjacent portions for which there is inadequate information to assess their sequential link are not moved together, and ensuring that adequate shuffling of the configuration occurs. In detail, a spin system is randomly chosen as the
head of a segment. The segment is then generated by extending out from the head of the segment in one direction only by one spin system at a time. The extension only proceeds if there are sufficient nuclei for which chemical shift information is available to assess the sequential link between the two spin systems. At this stage of generating segments, the degree to which these links are within tolerance is not relevant, but rather it is the possibility of assessing the link that is crucial. The extension direction is chosen at random for each segment. The initial temperature is NRES × TINIT, and is reduced by a factor of TFACTOR at each temperature cycle until no higher or lower energy configuration is obtained for three successive temperature cycles. At each temperature, configurations are generated until either the number of successful reconfigurations exceeds NRES × NSUCC, or the total number of reconfiguration trials exceeds NRES × NRECONF. The first condition typically occurs first at high temperatures, whereas the second condition becomes limiting at low temperatures. In a configuration there will typically be a number of "void" spin systems. These are used to handle spin systems for which no chemical shift information is available, of which there are two distinct types. The first type is spin systems that correspond to proline residues (and unprotected N-terminal residues). For these residues, void spin systems are fixed at corresponding positions in the configuration. The second type is spin systems that must be added to account for "missing" spin systems for which no NH correlation is observed. By default, this type of void spin system is not fixed to a particular position in the configuration, with the reconfiguration rules described above forcing them to be moved as single spin system segments. As an example, consider a ten-residue peptide with an unprotected N terminus, a proline at position five and for which only seven NH correlations are observed.
The observed spin systems would be numbered 1–7, and void spin systems would be denoted by 0. Example configurations would be 0547020136, 0607021453, or 0763001254, where the movable void spin system is the 0 at position 7, 3 and 6, respectively (the voids at positions 1 and 5 are fixed). If the sequence position of a missing spin system can be determined then a void spin system can be fixed at that position. For the assignment of W-PGK, the following parameter values were used. For both the linking and binding terms, the default value of γ = 1 was used, with tolerances of 0.5 ppm and 5 ppm, respectively. aL = bB = 1 for all L and B, TINIT = 0.15, TFACTOR = 0.9, NSUCC = 10 and NRECONF = 100. On a Silicon Graphics R10000 processor, 30 runs of the program for the W-PGK dataset took seven CPU hours. Void information must a priori be as acceptable at any position as a good match. Since there are only enough void spins available to give the correct total number of spin systems, the misplacement of a void spin is only possible if the displaced spin system can fit in an alternative position. If a square well potential is used, then the assignment of a void spin system scores equally with the assignment of a spin system that is within tolerance. For a power law well, a void spin does score better than one with a small |x|, but empirically this is balanced by the necessity to fit the displaced spin elsewhere as described above. The parameters used here were found to work well for the present dataset. In other cases, it may be necessary to adjust the weighting of missing data and void spin systems.
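The procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not the asstool program: only the linking energy is evaluated (the binding term and its shift library are omitted), all weights are set to 1, and a simple pairwise swap replaces the random segment best swap scheme. All function names and the synthetic test data generator are hypothetical.

```python
import math
import random

def well(x, tol, gamma=1.0, power_law=True):
    """f_L(x) as defined in the text; x is None when a shift is missing or void."""
    if x is None:
        return -1.0
    if abs(x) >= tol:
        return 0.0
    return (abs(x) / tol) ** gamma - 1.0 if power_law else -1.0

def linking_energy(config, spins, tol=0.5, nuclei=("CA", "CB")):
    """Linking term with a_L = 1: preceding-residue shifts of the spin system
    at position j are compared with the own shifts of the one at j - 1."""
    total = 0.0
    for j in range(1, len(config)):
        cur, prev = spins[config[j]], spins[config[j - 1]]
        for nuc in nuclei:
            p, q = cur["p" + nuc], prev[nuc]
            total += well(None if p is None or q is None else p - q, tol)
    return total

def metropolis_accept(delta_e, temp, rng):
    """Always accept downhill moves; accept uphill with probability exp(-dE/T)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temp)

def anneal(spins, t_init=0.15, t_factor=0.9, n_succ=10, n_reconf=100, seed=0):
    rng = random.Random(seed)
    n = len(spins)
    config = list(range(n))
    rng.shuffle(config)                       # random starting configuration
    energy = linking_energy(config, spins)
    best, best_e = config[:], energy
    temp, unchanged = n * t_init, 0           # initial temperature NRES x TINIT
    while unchanged < 3:                      # stop after 3 cycles without change
        start_e, successes, trials = energy, 0, 0
        while successes <= n * n_succ and trials <= n * n_reconf:
            trials += 1
            a, b = rng.sample(range(n), 2)    # pairwise swap (simplifying assumption)
            config[a], config[b] = config[b], config[a]
            new_e = linking_energy(config, spins)
            if metropolis_accept(new_e - energy, temp, rng):
                energy = new_e
                successes += 1
                if energy < best_e:
                    best, best_e = config[:], energy
            else:
                config[a], config[b] = config[b], config[a]   # undo the swap
        unchanged = unchanged + 1 if energy == start_e else 0
        temp *= t_factor                      # cool by TFACTOR
    return best, best_e

def make_chain(n, seed=1):
    """Hypothetical noise-free data: spin system i belongs to residue i."""
    rng = random.Random(seed)
    ca = [rng.uniform(45.0, 65.0) for _ in range(n)]
    cb = [rng.uniform(20.0, 70.0) for _ in range(n)]
    return [{"CA": ca[i], "CB": cb[i],
             "pCA": ca[i - 1] if i else None,
             "pCB": cb[i - 1] if i else None} for i in range(n)]
```

Calling `anneal(make_chain(8))` returns the lowest-energy configuration visited and its linking energy. The default arguments mirror the parameter values quoted above (TINIT = 0.15, TFACTOR = 0.9, NSUCC = 10, NRECONF = 100, γ = 1, linking tolerance 0.5 ppm). The pairwise swap is used here only to keep the sketch short.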
Amide exchange

Freeze-dried protein (see above) was resuspended in 20 mM potassium phosphate buffer containing 3 mM NaN3, 5% (v/v) 2H2O, pH 7.50. Spectra were acquired at 37 °C. Acquisition of the first spectrum started 20 minutes after resuspension. 1H–15N TROSY spectra were acquired with a 15N acquisition time of 42 ms, and a total experiment time of one hour. Spectra were acquired every hour for 20 hours, then every two hours for the following 20 hours, and every four hours for a final 48 hours, giving a total experiment time of 88 hours. Peak heights were extracted using FELIX, and fitted to single exponentials using in-house software. Intrinsic exchange rates were determined using the method of Bai et al.,35 as implemented†. Data for N-PGK were acquired and analysed as described by Hosszu et al.13

Survey of sizes of assignments in the BMRB

All entries in the BioMagResBank on 20 January 2003‡ were downloaded in NMRStar format. The strings "_Chem_shift_ambiguity_code" and "stop_" were used to delimit individual chains, and the number of Cα entries was then counted.
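The counting step of the BMRB survey could be sketched as follows. This is an illustration only, not the script used for the survey: it assumes that each chemical shift loop lists "_Chem_shift_ambiguity_code" as its last tag, that the data rows follow until "stop_", and that the atom name is the fourth whitespace-separated field of each row. The function name and the sample text are hypothetical.

```python
def count_ca_per_chain(nmrstar_text):
    """Count CA rows in each block delimited by the two marker strings."""
    counts = []
    in_block = False
    n_ca = 0
    for line in nmrstar_text.splitlines():
        token = line.strip()
        if token == "_Chem_shift_ambiguity_code":
            in_block, n_ca = True, 0          # start of the shift rows
        elif token == "stop_" and in_block:
            counts.append(n_ca)               # end of one chain's shift table
            in_block = False
        elif in_block and token:
            fields = token.split()
            # Assumed column order: ID, residue number, residue, atom name, ...
            if len(fields) >= 4 and fields[3] == "CA":
                n_ca += 1
    return counts


# Hypothetical fragment in the assumed layout.
SAMPLE = """
loop_
   _Atom_shift_assign_ID
   _Residue_seq_code
   _Residue_label
   _Atom_name
   _Atom_type
   _Chem_shift_value
   _Chem_shift_value_error
   _Chem_shift_ambiguity_code

   1 1 MET CA C 55.2 0.1 1
   2 1 MET CB C 32.9 0.1 1
   3 2 LYS CA C 56.8 0.1 1
stop_
"""

ca_counts = count_ca_per_chain(SAMPLE)
```

Each element of the returned list is the number of Cα shifts in one delimited block, which serves as a proxy for the size of that assigned chain.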
Acknowledgements

This work was funded by project grants and equipment funding from BBSRC and the Wellcome Trust. We acknowledge the use of the BBSRC U.K. National 800 MHz NMR facility at the University of Cambridge, and thank Daniel Nietlispach for his expert assistance. We acknowledge the provision of FELIX software from Accelrys. The Krebs Institute is a designated BBSRC Center and a member of NESBIC. We thank Martin Parker for helpful discussions.
References

1. Clarke, A. R. & Waltho, J. P. (1997). Protein folding and intermediates. Curr. Opin. Biotechnol. 8, 400–410.
2. Roder, H. & Shastry, M. C. R. (1999). Methods for exploring early events in protein folding. Curr. Opin. Struct. Biol. 9, 620–662.
3. Dyson, H. J. & Wright, P. E. (1996). Insights into protein folding from NMR. Annu. Rev. Phys. Chem. 47, 369–395.
4. Brockwell, D. J., Smith, D. A. & Radford, S. E. (2000). Protein folding mechanisms: new methods and emerging ideas. Curr. Opin. Struct. Biol. 10, 16–25.
5. Dalby, P. A., Oliveberg, M. & Fersht, A. R. (1998). Folding intermediates of wild-type and mutants of barnase. I. Use of phi-value analysis and m-values to probe the cooperative nature of the folding preequilibrium. J. Mol. Biol. 276, 625–646.
6. Qi, P. X., Sosnick, T. R. & Englander, S. W. (1998). The burst phase in ribonuclease A folding and solvent dependence of the unfolded state. Nature Struct. Biol. 5, 882–884.
7. Jennings, P. A. & Wright, P. E. (1993). Formation of a molten globule intermediate early in the kinetic folding pathway of apomyoglobin. Science, 262, 892–896.
8. Englander, S. W., Sosnick, T. R., Mayne, L. C., Shtilerman, M., Qi, P. X. & Bai, Y. W. (1998). Fast and slow folding in cytochrome c. Accts Chem. Res. 31, 737–744.
9. Shastry, M. C. R., Sauder, J. M. & Roder, H. (1998). Kinetic and structural analysis of submillisecond folding events in cytochrome c. Accts Chem. Res. 31, 717–725.
10. Parker, M. J., Dempsey, C. E., Hosszu, L. L. P., Waltho, J. P. & Clarke, A. R. (1998). Topology, sequence evolution and folding dynamics of an immunoglobulin domain. Nature Struct. Biol. 5, 194–198.
11. Fersht, A. R. (2000). A kinetically significant intermediate in the folding of barnase. Proc. Natl Acad. Sci. USA, 97, 14121–14126.
12. Takei, L., Chu, R. A. & Bai, Y. W. (2000). Absence of stable intermediates on the folding pathway of barnase. Proc. Natl Acad. Sci. USA, 97, 10796–10801.
13. Hosszu, L. L. P., Craven, C. J., Parker, M. J., Lorch, M., Spencer, J., Clarke, A. R. & Waltho, J. P. (1997). Structure of a kinetic protein folding intermediate by equilibrium amide exchange. Nature Struct. Biol. 4, 801–804.
14. Heidary, D. K., Gross, L. A., Roy, M. & Jennings, P. A. (1997). Evidence for an obligatory intermediate in the folding of interleukin-1beta. Nature Struct. Biol. 4, 725–731.
15. Capaldi, A. P., Shastry, M. C. R., Kleanthous, C., Roder, H. & Radford, S. E. (2001). Ultrarapid mixing experiments reveal that Im7 folds via an on-pathway intermediate. Nature Struct. Biol. 8, 68–72.
16. Capaldi, A. P., Ferguson, S. J. & Radford, S. E. (1999). The Greek key protein apo-pseudoazurin folds through an obligate on-pathway intermediate. J. Mol. Biol. 286, 1621–1632.
17. Tsui, V., Garcia, C., Cavagnero, S., Siuzdak, G., Dyson, H. J. & Wright, P. E. (1999). Quench-flow experiments combined with mass spectrometry show apomyoglobin folds through an obligatory intermediate. Protein Sci. 8, 45–49.
18. Kuwata, K., Shastry, R., Cheng, H., Hoshino, M., Batt, C. A., Goto, Y. & Roder, H. (2001). Structural and kinetic characterization of early folding events in beta-lactoglobulin. Nature Struct. Biol. 8, 151–155.
19. Capaldi, A. P., Kleanthous, C. & Radford, S. E. (2002). Im7 folding mechanism: misfolding on a path to the native state. Nature Struct. Biol. 9, 209–216.
20. Sosnick, T. R., Mayne, L., Hiller, R. & Englander, S. W. (1994). The barriers in protein-folding. Nature Struct. Biol. 1, 149–156.
21. Fersht, A. (1999). Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding, W.H. Freeman, New York.
22. Matouschek, A., Kellis, J. T., Serrano, L., Bycroft, M. & Fersht, A. R. (1990). Transient folding intermediates characterized by protein engineering. Nature, 346, 440–445.
23. Clarke, J. & Itzhaki, L. S. (1998). Hydrogen exchange and protein folding. Curr. Opin. Struct. Biol. 8, 112–118.
24. Li, R. H. & Woodward, C. (1999). The hydrogen exchange core and protein folding. Protein Sci. 8, 1571–1590.
25. Staniforth, R. A., Dean, J. L. E., Zhong, Q., Zerovnik, E., Clarke, A. R. & Waltho, J. P. (2000). The major transition state in folding need not involve the immobilization of side chains. Proc. Natl Acad. Sci. USA, 97, 5790–5795.
26. Davies, G. J., Gamblin, S. J., Littlechild, J. A., Dauter, Z., Wilson, K. S. & Watson, H. C. (1994). Structure of the ADP complex of the 3-phosphoglycerate kinase from Bacillus stearothermophilus at 1.65 Å. Acta Crystallog., sect. D, 50, 202–209.
27. Parker, M. J., Spencer, J., Jackson, G. S., Burston, S. G., Hosszu, L. L. P., Craven, C. J. et al. (1996). Domain behavior during the folding of a thermostable phosphoglycerate kinase. Biochemistry, 35, 15740–15752.
28. Parker, M. J., Sessions, R. B., Badcoe, I. G. & Clarke, A. R. (1996). The development of tertiary interactions during the folding of a large protein. Fold. Des. 1, 145–156.
29. Hosszu, L. L. P., Craven, C. J., Spencer, J., Parker, M. J., Clarke, A. R., Kelly, M. & Waltho, J. P. (1997). Is the structure of the N-domain of phosphoglycerate kinase affected by isolation from the intact molecule? Biochemistry, 36, 333–340.
30. Pervushin, K. V., Wider, G. & Wuthrich, K. (1998). Single transition-to-single transition polarization transfer (ST2-PT) in [N-15,H-1]-TROSY. J. Biomol. NMR, 12, 345–348.
31. Yang, D. W. & Kay, L. E. (1999). TROSY triple-resonance four-dimensional NMR spectroscopy of a 46 ns tumbling protein. J. Am. Chem. Soc. 121, 2571–2575.
32. Yang, D. W. & Kay, L. E. (1999). Improved (HN)-H-1-detected triple resonance TROSY-based experiments. J. Biomol. NMR, 13, 3–10.
33. Bernstein, B. E., Michels, P. A. M. & Hol, W. G. J. (1997). Synergistic effects of substrate-induced conformational changes in phosphoglycerate kinase activation. Nature, 385, 275–278.
34. Cornilescu, G., Delaglio, F. & Bax, A. (1999). Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR, 13, 289–302.
35. Bai, Y. W., Milne, J. S., Mayne, L. & Englander, S. W. (1993). Primary structure effects on peptide group hydrogen exchange. Proteins: Struct. Funct. Genet. 17, 75–86.
36. Tugarinov, V., Muhandiram, R., Ayed, A. & Kay, L. E. (2002). Four-dimensional NMR spectroscopy of a 723-residue protein: chemical shift assignments and secondary structure of malate synthase G. J. Am. Chem. Soc. 124, 10025–10035.
37. Gardner, K. H., Zhang, X. C., Gehring, K. & Kay, L. E. (1998). Solution NMR studies of a 42 kDa Escherichia coli maltose binding protein beta-cyclodextrin complex: chemical shift assignments and analysis. J. Am. Chem. Soc. 120, 11738–11748.
38. Seavey, B. R., Farr, E. A., Westler, W. M. & Markley, J. L. (1991). A relational database for sequence-specific protein NMR data. J. Biomol. NMR, 1, 217–236.
39. Moseley, H. N. B. & Montelione, G. T. (1999). Automated analysis of NMR assignments and structures for proteins. Curr. Opin. Struct. Biol. 9, 635–642.
40. Mcfeeters, R. L., Swapna, G. V. T., Montelione, G. T. & Oswald, R. E. (2002). Semi-automated backbone resonance assignments of the extracellular ligand binding domain of an ionotrophic glutamate receptor. J. Biomol. NMR, 22, 297–298.
41. Andrec, M. & Levy, R. M. (2002). Protein sequential resonance assignments by combinatorial enumeration using 13Cα chemical shifts and their (i, i − 1) sequential connectivities. J. Biomol. NMR, 23, 263–270.
42. Sambrook, S., Fritsch, E. F. & Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
43. Venters, R. A., Huang, C. C., Farmer, B. T., Trolard, R., Spicer, L. D. & Fierke, C. A. (1995). High-level 2H/13C/15N labelling of proteins for NMR studies. J. Biomol. NMR, 5, 339–344.
44. Grzesiek, S. & Bax, A. (1993). Amino-acid type determination in the sequential assignment procedure of uniformly C-13/N-15-enriched proteins. J. Biomol. NMR, 3, 185–204.
45. Wishart, D. S., Bigam, C. G., Holm, A., Hodges, R. S. & Sykes, B. D. (1995). H-1, C-13 and N-15 random coil NMR chemical shifts of the common amino acids. 1. Investigations of nearest neighbor effects. J. Biomol. NMR, 5, 67–81.

† http://www.fccc.edu/research/labs/roder/sphere/sphere.html
‡ BMRB, http://www.bmrb.wisc.edu/
Edited by C. R. Matthews (Received 24 January 2003; received in revised form 24 April 2003; accepted 8 May 2003)