Proteins Studied Using NMR Spectroscopy


Paul N Sanderson, GlaxoWellcome Research and Development, Stevenage, UK
Copyright © 1999 Academic Press

Introduction The study of proteins by NMR spectroscopy has gained great impetus in recent years, providing a focus for the proliferation of many new complex NMR experiments and, quite possibly, justification for the purchase of more high-field spectrometers than any other field within NMR. The first 1H NMR spectrum of a protein was published in 1957; this accurately reflected the amino acid composition but had neither the sensitivity nor resolution to yield further information. In the last two decades, however, many protein structures have been characterized and greater insight into the activity of proteins has been obtained from protein NMR studies. These achievements became possible as a result of the concurrent development of higher field (≥ 500 MHz) superconducting magnets, powerful computational hardware and software, complex multidimensional heteronuclear NMR experiments and isotopic labelling techniques. The quest for structural knowledge has been driven by the recognition that functions of biologically active proteins (such as enzymes, hormones and receptors) are fundamentally dependent on their three-dimensional structure. The development of NMR techniques, in parallel with X-ray crystallography, to obtain greater structural information from increasingly complex protein systems has resulted in increased understanding of biological processes. NMR also has a role in the characterization of protein interactions via, for example, titrations that map the binding surface of a protein through specific chemical shift changes. These binding studies, which are discussed in a separate article, benefit from prior characterization of solution structure by NMR and it is the structural aspects of protein NMR that are the focus of the present article. The theory and practical aspects of many areas of NMR spectroscopy which are of importance to protein NMR studies are described in detail elsewhere in this Encyclopedia; these are consequently mentioned only briefly here. 
Protein NMR has been thoroughly documented in many books and reviews and for more detailed discussion the reader is directed towards the Further reading section.


What proteins are suitable for study by NMR? Full structural characterization of proteins in solution is possible for proteins of up to ~100 amino acids using homonuclear proton NMR. For larger proteins, of up to ~30 kDa, isotopic enrichment with 15N and 13C is required. There is no clear ‘cut-off’ in terms of protein size; each protein has to be considered on its own merits. Often, the likelihood of a successful structure determination only becomes clear after considerable protein purification and preliminary one-dimensional (1D) 1H NMR data have been obtained. Proteins must be non-aggregated and monomeric, or at least not present as large heteromultimers, under conditions of NMR measurements. Aggregated proteins give increased line widths and thus reduced spectral resolution and sensitivity. Internal mobility and the presence of multiple interconverting conformations also influence resonance line width.
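The two size limits quoted above are given in different units (residues and kDa). The sketch below converts between them, assuming an average residue mass of about 110 Da; that figure is a common rule of thumb and is not taken from this article, and the function names are illustrative only.

```python
# Assumed average mass of one amino acid residue in a protein (Da).
# This is a rule-of-thumb value, not a figure from the article.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues):
    """Approximate molecular mass (kDa) of a protein of n_residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def approx_residues(mass_kda):
    """Approximate number of residues in a protein of the given mass (kDa)."""
    return mass_kda * 1000.0 / AVERAGE_RESIDUE_MASS_DA


if __name__ == "__main__":
    print(approx_mass_kda(100))           # homonuclear limit, in kDa
    print(round(approx_residues(30.0)))   # labelled-protein limit, in residues
```

On this assumption the homonuclear limit of about 100 residues corresponds to roughly 11 kDa, and the 30 kDa limit for labelled proteins to somewhat under 300 residues.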

Protein structure Proteins are composed of a linear sequence of L-amino acid residues linked via amide bonds. The 20 naturally occurring amino acids are distinguished by the chemical nature of their side-chains. The amide bonds, or peptide linkages, are essentially planar and provide structural rigidity to the protein backbone, the only freedom to rotate being around the bonds to the α carbon. The angles of rotation are called, in IUPAC nomenclature, phi (φ) for the N–Cα bond [C(O)i–1–Ni–Cαi–C(O)i] and psi (ψ) for the Cα–C(O) bond [Ni–Cαi–C(O)i–Ni+1] (Figure 1). If these two torsion angles are known for each amino acid the conformation of the whole polypeptide backbone is defined. The linear sequence of amino acids represents the primary structure of the protein. Local regions within this sequence can adopt stable, defined, secondary structure such as α helices or β sheets. The packing of these secondary structural elements into compact domains gives rise to the protein’s tertiary structure, such that distant regions of the peptide chain can be spatially close together. Multimeric proteins are composed of several polypeptide chains arranged together in a quaternary structure. Adoption of the correct tertiary and quaternary structure is usually essential for biological function of a protein and a central dogma of biochemistry is that knowledge of a protein’s structure leads to greater understanding of its activity.

Figure 1 Stylized representation of a portion of a polypeptide chain, indicating the nomenclature of the backbone torsion angles (φ and ψ) and the side-chain torsion angle (χ1) for the bonds emanating from the α carbon of an amino acid residue (i).
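As an illustration of how the backbone torsion angles are defined, the following sketch computes a signed dihedral angle from the Cartesian coordinates of four atoms. For phi the atoms would be C(O)i–1, Ni, Cαi and C(O)i; for psi they would be Ni, Cαi, C(O)i and Ni+1. The function name and the test coordinates are hypothetical and are not taken from the article.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    Positive when, looking from p1 towards p2, the p2-p3 bond is rotated
    clockwise relative to the p1-p0 bond (the IUPAC sign convention).
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)          # first bond vector
    b1 = sub(p2, p1)          # central bond vector (rotation axis)
    b2 = sub(p3, p2)          # last bond vector
    n1 = cross(b0, b1)        # normal to the plane p0-p1-p2
    n2 = cross(b1, b2)        # normal to the plane p1-p2-p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates chosen to give simple angles.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans
```

Applied to every residue of a coordinate set, such a function yields the (phi, psi) pairs that define the backbone conformation.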

Isotopic labelling of proteins The range of proteins that can be studied by NMR and the nature of the structural information available have been extended through the biosynthetic incorporation of 15N and 13C isotopes. Data on isotopes encountered most frequently in protein NMR are given in Table 1. Most proteins for structural analysis are prepared in cultures of bacteria or yeast that have been genetically modified to overexpress the protein of interest. Typically, a primary expression of 20–100 mg of protein is required for a structural NMR study. Escherichia coli (E. coli) bacteria are preferentially used as they can be grown rapidly in large quantities on chemically defined media which can be supplemented with 15N-labelled ammonium salts and 13C-labelled compounds, such as glucose, acetate or glycerol, as required. It is essential to exclude all sources of natural abundance nitrogen and carbon from growth media to maximize incorporation of isotopic label. The production of 15N-labelled protein is reasonably straightforward and relatively inexpensive, provided it can be expressed in E. coli. Additional incorporation of 13C-label is more costly, but necessary for full analysis of larger proteins. The activity of expressed proteins should be validated to ensure correct folding and show that isotope incorporation has not impaired function.

Mammalian proteins produced in bacteria or yeast may not undergo correct post-translational processing, i.e. disulfide cross-linking, protein folding, phosphorylation or glycosylation. This problem may be overcome by expression in baculovirus systems, in insect cells or in mammalian cell lines such as Chinese hamster ovary (CHO) cells. Mammalian cells will not grow on minimal media, so their growth media must contain appropriately labelled amino acids, which can be obtained, for example, from hydrolysates of labelled algae.

Isotopic labelling with deuterium (2H) can be used to provide spectral editing. For example, specific incorporation of a deuterium atom into methylene or methyl groups of amino acids can be used to obtain detailed structural and dynamic information. Dramatic increases in proton NMR resolution for larger proteins can be achieved through random labelling of the protein with deuterium, at levels between 50 and 85%, by growth on substrates in which the ratio of 1H to 2H is controlled.

Table 1 Properties of isotopes most commonly encountered in protein NMR

Isotope   Natural abundance (%)   Resonance frequency at 14.0926 T (MHz)   Relative sensitivity
1H        99.98                   600.00                                   1.0000
2H        0.015                   92.10                                    0.00965
13C       1.108                   150.86                                   0.0159
15N       0.37                    60.80                                    0.00104
31P       100                     242.88                                   0.0663
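Because resonance frequency is directly proportional to magnetic field strength, the Table 1 frequencies can be rescaled to other magnets. The sketch below does this using only the Table 1 values; the function names and the 500 MHz example are illustrative assumptions.

```python
# Resonance frequencies (MHz) at 14.0926 T, copied from Table 1.
FREQ_AT_14T = {"1H": 600.00, "2H": 92.10, "13C": 150.86,
               "15N": 60.80, "31P": 242.88}
FIELD_TESLA = 14.0926


def frequency_mhz(isotope, proton_mhz):
    """Resonance frequency of an isotope on a magnet described by its
    nominal 1H frequency (frequencies scale linearly with field)."""
    return FREQ_AT_14T[isotope] * proton_mhz / FREQ_AT_14T["1H"]


def field_tesla(proton_mhz):
    """Field strength (T) corresponding to a nominal 1H frequency."""
    return FIELD_TESLA * proton_mhz / FREQ_AT_14T["1H"]


if __name__ == "__main__":
    # Hypothetical example: a "500 MHz" spectrometer.
    for isotope in FREQ_AT_14T:
        print(isotope, round(frequency_mhz(isotope, 500.0), 2))
    print(round(field_tesla(500.0), 4), "T")
```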

Preparation of protein samples for NMR A typical sample for protein NMR will contain 1 mM protein, of at least 95% purity, in 0.5 mL of aqueous solution. The final stages of purification may include desalting, buffer exchange, 2H2O exchange and concentration by lyophilization (if the protein is sufficiently stable), vacuum centrifugation or ultrafiltration. Sample volumes can be as small as 100 µL (for 2.5 or 3 mm diameter tubes) if only limited amounts of protein are available. Data collection can take several days, during which the protein must remain stable; therefore oxidation, hydrolysis and microbial contamination must be minimized. Samples are not usually degassed, as protein solutions tend to ‘froth’ and paramagnetic broadening by oxygen can usually be ignored. Cysteine and methionine residues are susceptible to oxidation, so spontaneous formation of non-native intra- or intermolecular disulfide bonds should be prevented by addition of low levels of dithiothreitol or β-mercaptoethanol. The growth of microorganisms can be prevented by sodium azide.

The quality of NMR spectra of proteins can be strongly influenced by sample pH, ionic strength, buffer, concentration and temperature; optimum conditions are generally determined empirically via 1D spectra. For 1H NMR, solutions are generally buffered, at 10–50 mM, with phosphate (which can cause precipitation or aggregation of some proteins) or with deuterated buffer salts (e.g. Tris). Deuterated reducing agents, cation chelators and proteolytic enzyme inhibitors may also be incorporated as necessary.
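The quantity of protein implied by these sample conditions follows from simple arithmetic. The sketch below estimates the mass required for a given concentration, volume and molecular mass; the 15 kDa protein in the example is hypothetical, and the optional purity correction is an illustrative addition rather than a procedure described in the article.

```python
def protein_mass_mg(conc_mM, volume_mL, mol_mass_kDa, purity=1.0):
    """Mass of material (mg) needed to prepare an NMR sample.

    mM x mL gives micromoles, and 1 kDa corresponds to 1 mg per micromole,
    so the product is a mass in mg. Dividing by the fractional purity gives
    the amount of impure material that contains that much protein.
    """
    return conc_mM * volume_mL * mol_mass_kDa / purity


if __name__ == "__main__":
    # Typical sample from the text: 1 mM in 0.5 mL; 15 kDa is a made-up example.
    print(protein_mass_mg(1.0, 0.5, 15.0))
    # Small-volume tube (100 microlitres) of the same hypothetical protein.
    print(protein_mass_mg(1.0, 0.1, 15.0))
```

For the hypothetical 15 kDa protein this gives 7.5 mg for a standard sample, which is consistent with the 20–100 mg expression scale mentioned earlier once purification losses are allowed for.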

Obtaining NMR data from proteins Protein NMR spectra are usually recorded in aqueous solutions to obtain data from the exchangeable amide protons. In these spectra the solvent water signal is at least 10⁵ times more intense than the protein protons of interest and must be suppressed to allow detection of protein signals at an acceptable signal-to-noise ratio. The most common method for suppressing the water signal is presaturation, in which continuous low level irradiation is applied at the frequency of the water resonance. This is the most effective method if the effects of exchange between solvent and solute are not important. Many alternative methods that do not reduce the intensity of solute signals in exchange with the water are available; these include the use of pulsed-field gradients. Full discussion of solvent suppression techniques can be found in a separate article.

One-dimensional NMR data

Initial assessment of the suitability of a protein sample and preliminary solution optimization can be achieved via 1D 1H NMR experiments; these should also be used to assess the consistency of a sample before and after lengthy acquisitions. Broad signals may indicate protein aggregation and suggest that further experiments are not worth pursuing on that sample. The presence (or absence) of protein structure can be assessed from the distribution of signals; typically, upfield shifted methyl signals (around δ 0), and downfield shifted alpha proton (δ 5–6) and NH proton (δ > 9) signals indicate that a protein is ‘structured’. These regions are highlighted in a 1H NMR spectrum of lysozyme in Figure 2. For a denatured (or random coil) protein, such signals are absent from these regions.

Multidimensional heteronuclear NMR experiments

Having established optimum sample conditions, all subsequent assignment and structural data are obtained from multidimensional NMR experiments. For smaller proteins, this can be achieved using two-dimensional (2D) homonuclear proton NMR experiments: correlation spectroscopy (COSY) and total correlation spectroscopy (TOCSY) to give scalar through-bond coupling data, and nuclear Overhauser effect spectroscopy (NOESY) to provide ‘through space’ information. For larger proteins, resonance overlap is considerable, increased line width results in cancellation of COSY cross peaks and much signal loss occurs during long TOCSY mixing times. Full assignment and NOE analysis can, in these cases, be achieved by spreading data into three or four dimensions to increase signal resolution and by further spectral simplification through 15N and 13C isotope editing. In heteronuclear correlation experiments, magnetization transfer between protons and heteronuclei can be via either heteronuclear single quantum coherence (HSQC) or heteronuclear multiple quantum coherence (HMQC) pathways. The HSQC sequence gives rise to narrower lines, but uses more pulses and requires a longer phase cycle than the HMQC. Thus, HSQC is used for 2D experiments where the highest resolution is required and HMQC is preferred for 3D sequences in which the experimental time is limited. Labelling with 15N alone can be sufficient to overcome spectral overlap for proteins of up to 20 kDa and, for these proteins, virtually complete resolution can often be achieved for the backbone amide groups in 2D 1H–15N HSQC experiments. These experiments are very robust and can be used to determine amide proton exchange rates or chemical shift temperature coefficients. For high protein concentrations 1H–15N HSQC data sets can be acquired rapidly; typically within 10 min for a 2 mM sample or 2–3 h for a 0.2 mM protein sample.
Consequently, this experiment has become the mainstay of NMR approaches to monitor the binding of ligands to 15N-labelled proteins through titration experiments. Information on side-chain resonances or sequential connectivities can be obtained by converting the 2D 1H–15N heteronuclear correlation experiment into a 3D experiment by adding either a TOCSY or a NOESY step. These 3D experiments can be considered as a series of 2D homonuclear experiments in which each is edited by a different 15N frequency. Thus, having established amide 1H/15N pairs via the HSQC, the TOCSY-HMQC correlates alpha protons (and in favourable cases side-chain protons) to these pairs and residue types can be assigned. Sequential assignment can then follow in a manner analogous to the 2D procedure (see below) via the NOESY-HMQC spectrum. Several other 3D experiments have been devised to facilitate assignments of 15N-labelled proteins. One such, which is of particular use for identifying helical regions of a protein, is the HMQC-NOESY-HMQC experiment, which allows identification of 1H–1H NOEs between residues with degenerate amide proton resonances.

For larger, double-labelled proteins, assignments are made via a range of three- and four-dimensional triple resonance experiments in which sequential assignments are facilitated via magnetization transferred through 13C-couplings. These couplings are usually at least as large as the 13C line width for proteins of up to 30 kDa and are reasonably independent of the protein backbone conformation. A summary of the magnitudes of coupling constants utilized in correlation experiments is shown in Figure 3. These experiments each focus on a particular coupling network (their nomenclature usually reflects this in a logical manner) and several different experiments may perform the same correlation, but via a different route. Residue-specific experiments can extend assignments from the peptide backbone along amino acid side-chains, for example through to the guanidino group of arginine residues.

Figure 2 One-dimensional 1H NMR spectrum of lysozyme in aqueous solution obtained with presaturation of the water signal, illustrating the range of proton chemical shifts expected for a random coil, or denatured, protein. The positions of upfield shifted methyl signals and downfield shifted alpha and amide proton signals that are indicative of a non-random, ordered protein structure are also shown.

Sequence-specific assignments

NMR spectra of small proteins can be fully assigned in a systematic manner by a sequential assignment procedure using well-resolved 2D homonuclear spectra and a knowledge of the amino acid sequence of the protein. This procedure depends on the fact that for all of the sterically allowed values of the torsion angles φ, ψ and χ1 (Figure 1) NOEs will be observed for at least one of the interproton distances between NH, αH or βH of adjacent amino acids. Hence, spin systems of individual amino acids are first identified via spin–spin couplings in COSY and TOCSY spectra and are then connected to neighbouring amino acids across the peptide linkage via ‘through-space’ correlations in NOESY spectra. The latter are often obtained from a small region of the 2D NOESY spectrum, the ‘fingerprint region’. This comprises amide proton frequencies in the directly detected dimension (F2) and α (and side-chain) proton frequencies in the indirectly detected dimension (F1) and contains both intra- and interresidue NH–αH connectivities, allowing sequential assignments to be made simply by ‘walking’ from peak to peak. The early stages of spin system identification are facilitated by the fact that many amino acids, either individually or in groups, give rise to characteristic peak patterns in COSY and TOCSY spectra and this feature provides the basis for automated assignment routines.

Figure 3 Schematic representation of the peptide backbone, showing the magnitudes of the one-bond coupling constants that are utilized in multidimensional heteronuclear correlation experiments to provide sequential connectivities.

Protein structure information from NMR parameters Conformation-dependent data, primarily in the form of NOEs and scalar coupling constants, are obtained from essentially the same experiments (NOESY and COSY respectively) as the assignments. Additional structural information can also be extracted from chemical shifts, relaxation times and amide exchange data. It should be emphasized here that all of these NMR parameters are population-weighted and time averaged; thus, as molecular systems are inherently dynamic in nature, they do not in general represent single, precise values of interatomic distances and angles. Nevertheless these NMR parameters, which are considered below in terms of the structural information they represent, provide useful restraints to allow calculation of protein structures and, in the case of proteins that cannot be crystallized, access to structural information that cannot be obtained by other means.

NOEs

A cross peak in a NOESY experiment indicates dipolar cross relaxation between two nuclei that are spatially close to one another. The cross peak intensity is dependent upon the inverse sixth power of the distance between the two nuclei. Thus, for two protons, i and j separated by a distance r, which give rise to a NOESY cross peak, the intensity of that cross peak I is proportional to r⁻⁶. This simplified relationship assumes that the protein can be considered as a rotating rigid body in which correlation times for all proton pairs are the same, and are equal to the correlation time for the overall tumbling of the protein. For a more thorough analysis, account must be taken of, amongst other factors, internal mobility and spin diffusion. These are dealt with in greater detail in a separate article. Interproton distances can, therefore, be determined from unambiguously assigned, well-resolved, high signal-to-noise NOESY data, by analysis of cross peak intensities. These may be obtained by volume integration and can be converted into estimates of interproton distances, using the relationship above, for NOESY data at short mixing times. In this method, each proton pair is considered in isolation and NOESY cross peak intensities are compared with a reference cross peak from a proton pair of fixed distance, such as a geminal methylene proton pair or aromatic ring protons. A problem inherent to this approach is that the fixed distance is usually smaller than the unknown distance; this usually leads to systematic underestimation of the latter. An alternative method of analysis of NOESY data, which is usually sufficient for resolved peaks with a digital resolution much greater than the intrinsic line width and coupling constants, is to measure the maximum peak amplitude or to count the number of contours. NOESY cross peaks can then be classified as strong, medium or weak and can be translated into upper distance restraints of around 2.5, 3.5 and 5.0 Å respectively.
The lower distance constraint is usually the sum of the van der Waals radii (1.8 Å for protons). This simple approach is reasonably insensitive to the effects of spin diffusion or non-uniform correlation times and can usually lead to definition of the global fold of the protein, provided a sufficiently large number of NOEs have been identified.
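Both ways of using NOESY intensities described above can be sketched in a few lines. The first function applies the isolated spin-pair approximation, in which intensity is proportional to the inverse sixth power of distance, to calibrate an unknown distance against a reference pair; the second maps the qualitative strong, medium and weak classes onto the distance bounds quoted in the text. Function names and the example intensities are hypothetical.

```python
# Upper distance bounds (angstroms) for qualitative NOE classes, from the text.
UPPER_BOUNDS = {"strong": 2.5, "medium": 3.5, "weak": 5.0}
# Lower bound: sum of the van der Waals radii for two protons, from the text.
LOWER_BOUND = 1.8


def distance_from_noe(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance from a cross peak volume.

    Uses I proportional to r**-6, so r = r_ref * (I_ref / I)**(1/6).
    Valid only for short mixing times and isolated spin pairs.
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def distance_restraint(noe_class):
    """Return (lower, upper) bounds in angstroms for a qualitative NOE class."""
    return (LOWER_BOUND, UPPER_BOUNDS[noe_class])


if __name__ == "__main__":
    # Made-up volumes: reference pair at 1.8 angstroms is 64 times more intense.
    print(distance_from_noe(1.0, 64.0, 1.8))
    print(distance_restraint("medium"))
```

The sixth-root dependence means that a 64-fold drop in intensity corresponds to only a doubling of distance, which is why the coarse three-class scheme is often adequate for defining a global fold.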


Greater accuracy can be achieved by methods that involve calculation of a full relaxation matrix from the NOESY data to generate interproton distances. A model protein structure can then be iteratively refined by back calculation until differences in the empirical and calculated data are minimized. The resulting distances can be used as restraints for further refining the protein structure by distance geometry or molecular dynamics methods.

Coupling constants

Geometric information, particularly for the bonds around the peptide backbone (Figure 1), can be obtained from vicinal spin–spin coupling constants. The magnitude of the coupling constant J is dependent upon the dihedral angle θ as well as the nature and orientation of substituents in a manner that is defined by the Karplus relationship, which has the general form:

$$J(\theta) = A\cos^{2}\theta + B\cos\theta + C$$

where A, B and C are empirical constants for the coupling concerned.

For a given coupling constant there are up to four valid solutions of the Karplus equation although knowledge of protein torsion angles (from protein structure databases) can be used to discount unlikely values. For example, the backbone torsion angle φ is usually negative, except in the case of asparagine, aspartate and glycine residues. The 3JNH,αH coupling constant, which is dependent upon the dihedral angle [HNi–Ni–Cαi–Hαi] (θ = φ − 60° for L-amino acids), is commonly used for assessing secondary structure. Thus, a sequence of small (< 5 Hz) values indicates an α helix, whereas extended β structures have large (> 9 Hz) values that reflect the trans relationship of the NH and αH protons. Intermediate values are indicative of nonstandard structure or conformational averaging. The 3JNH,αH values can be measured, in exceptional cases for small proteins, from high digital resolution 1D spectra but are more commonly obtained from 2D DQF-COSY spectra. If 15N-labelled protein is available they can be extracted from 15N-filtered correlation experiments and, in cases of signal overlap or insufficient digital resolution, from cross peak intensities in quantitative J-correlation experiments. The 3JαH,βH vicinal coupling constants can be determined from Exclusive COSY (E-COSY) type experiments and, together with intraresidue NOEs, can be used to obtain stereospecific assignments of β-methylene protons and side-chain χ1 angle restraints (Table 2).

Table 2 NMR parameters that define conformations about the Cα–Cβ bond in amino acids

Parameter       Conformation
                g–              t               g+
χ1              60°             180°            –60°
dNH,Hβ2         3.5–4.0 Å       2.5–3.4 Å       2.2–3.1 Å
NH–Hβ2 NOE      Weak            Strong/medium   Strong
dNH,Hβ3         2.5–3.4 Å       2.2–3.1 Å       3.5–4.0 Å
NH–Hβ3 NOE      Strong/medium   Strong          Weak
Hα–Hβ2 NOE      Strong          Strong          Weak
Hα–Hβ3 NOE      Strong          Weak            Strong
3JHα,Hβ2        < 5 Hz          < 5 Hz          > 10 Hz
3JHα,Hβ3        < 5 Hz          > 10 Hz         < 5 Hz
3JN,Hβ2         ~5 Hz           ~1 Hz           ~1 Hz
3JN,Hβ3         ~1 Hz           ~1 Hz           ~5 Hz
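The behaviour of the Karplus relationship for 3JNH,αH can be explored numerically. The sketch below assumes coefficients of A = 6.4, B = −1.4 and C = 1.9 Hz, one commonly quoted empirical parameterization for this coupling; these values are not given in the article and other parameterizations exist. It evaluates J for a given φ (using θ = φ − 60°) and scans φ to show how many torsion angles are compatible with a measured coupling. All function names are illustrative.

```python
import math

# Assumed Karplus coefficients (Hz) for 3J(NH,alphaH); NOT from the article.
A, B, C = 6.4, -1.4, 1.9


def karplus(theta_deg, a=A, b=B, c=C):
    """General Karplus relationship: J = A cos^2(theta) + B cos(theta) + C."""
    t = math.radians(theta_deg)
    return a * math.cos(t) ** 2 + b * math.cos(t) + c


def j_nh_alpha(phi_deg):
    """Predicted 3J(NH,alphaH) for an L-amino acid, with theta = phi - 60."""
    return karplus(phi_deg - 60.0)


def phi_solutions(j_obs, step=0.5):
    """Scan phi from -180 to 180 degrees and return every phi at which the
    predicted coupling crosses j_obs (linear interpolation within a step)."""
    roots = []
    n_steps = int(round(360.0 / step))
    for k in range(n_steps):
        lo = -180.0 + k * step
        f_lo = j_nh_alpha(lo) - j_obs
        f_hi = j_nh_alpha(lo + step) - j_obs
        if f_lo == 0.0:
            roots.append(lo)
        elif f_lo * f_hi < 0.0:
            roots.append(lo - f_lo * step / (f_hi - f_lo))
    return roots


if __name__ == "__main__":
    print(round(j_nh_alpha(-57.0), 2))    # typical alpha-helix phi
    print(round(j_nh_alpha(-139.0), 2))   # typical beta-sheet phi
    print([round(p, 1) for p in phi_solutions(4.0)])
    print([round(p, 1) for p in phi_solutions(7.0)])
```

With these assumed coefficients a small coupling such as 4 Hz is compatible with four values of φ, two of them positive, which illustrates why the usual preference for negative φ is needed to discount unlikely solutions.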

Chemical shifts

The chemical shift of a resonance reflects the chemical environment of the atom that gives rise to it. This is determined mostly by covalent bonding and to a smaller extent by the non-bonded environment. In unstructured peptides each amino acid exists in an ensemble of conformations and the random coil chemical shift represents the population-weighted mean value of these environments. The chemical shifts for amino acids in denatured proteins are close to random coil values; however, for structured proteins, many resonances are far removed from their random coil position (Figure 2). Even greater changes in proton chemical shift can result from the proximity of a proton to an aromatic ring (the ring current effect) or to a paramagnetic centre, as found in haem proteins such as haemoglobin and cytochromes. The availability of chemical shift assignments for many proteins of defined structure has allowed changes in chemical shift from random coil values to be related to secondary structure and, consequently, to be used predictively. Thus, upfield shifts of ∼0.3 ppm are characteristic for α protons in α helices, whereas α protons in β sheets experience downfield shifts of ∼0.3 ppm. Similar effects are observed for 13C chemical shifts of α carbons, which are shifted downfield by ~3 ppm in α helices and upfield by ~1.5 ppm in β sheets. A sequence of similar changes can therefore help in the initial characterization of regions of secondary structure. Chemical shifts can also be used for refining protein tertiary structure; 13C shifts, in particular, are sensitive to backbone geometry and can therefore help define backbone torsion angles.

Relaxation times

The introduction of 15N and/or 13C labels into a protein facilitates the study of dynamic properties and, in particular, localized intramolecular motions. This arises because relaxation of these nuclei is usually dominated by dipole–dipole interactions with the directly bonded proton and this relaxation is dependent upon internuclear distance (which is fixed) and the rotational correlation time, which is only uniform throughout a ‘rigid’ protein. Proteins, however, usually contain regions that have greater flexibility, such as surface loops, which have different local correlation times that are reflected in heteronuclear relaxation times.

Amide proton exchange and other exchange effects

Reduced rates of exchange of amide protons with the bulk solvent water indicate reduced solvent accessibility and potential involvement in hydrogen bonds. Almost all amide protons in regions of regular protein secondary structure (except for those near the edges) are hydrogen-bonded. A corollary of this is that fast amide exchange rates generally imply the absence of ‘structure’. Measurement of amide proton exchange rates by following the time-course of the disappearance of signals in COSY, TOCSY or 1H–15N HSQC spectra therefore provides supportive evidence of secondary structure. Other exchange phenomena that manifest themselves in NMR spectra include cis–trans isomerization of proline residues, aromatic ring ‘flipping’ and the rotation of primary amides of asparagine and glutamine.

Deriving protein structures from NMR data Extensive computational calculations are necessary to translate the information contained within NMR data into a protein structure. The quality of the structure obtained is dependent on the accuracy and, to a greater extent, quantity of the NMR data. As a rule of thumb, at least ten long-range NOE restraints are required for each amino acid residue to generate a ‘reasonable’ protein structure. Regions of protein secondary structure are identified via characteristic short-range (i.e. ≤ 5 residues apart) NOEs, coupling constants, amide proton exchange and chemical shift data (Table 3). Longer-range NOEs then define the tertiary structure, which can be refined further against all available data.

Table 3 Characterization of protein secondary structure from NMR data

Parameter (a)                       α Helix (b)            β Sheet (c)
φ [C(O)i–1–Ni–Cαi–C(O)i]            –57°                   –139°
3JNH,αH                             < 4 Hz                 > 9 Hz
dαN(i,i) (NOE intensity)            2.6 Å (strong)         2.8 Å (strong)
dαN(i,i+1) (NOE intensity)          3.5 Å (weak)           2.2 Å (very strong)
dαN(i,i+2) (NOE intensity)          4.4 Å (weak)           –
dαN(i,i+3) (NOE intensity)          3.4 Å (medium)         –
dαβ(i,i+3) (NOE intensity)          2.5–4.4 Å (medium)     –
dαN(i,i+4) (NOE intensity)          4.2 Å (weak)           –
dNN(i,i+1) (NOE intensity)          2.8 Å (strong)         4.3 Å (weak)
dNN(i,i+2) (NOE intensity)          4.2 Å (weak)           –
dαα(i,j) (NOE intensity)            –                      2.3 Å (very strong)
dαN(i,j) (NOE intensity)            –                      3.2 Å (medium)
dNN(i,j) (NOE intensity)            –                      3.3 Å (medium)
NH exchange rate                    Slow                   Slow
1Hα chemical shift change (d)       ~ –0.3 ppm             ~ +0.3 ppm
13Cα chemical shift change (d)      ~ +3 ppm               ~ –1.5 ppm

a dxy(i, i+n) refers to the distance between proton x in residue i and proton y in the residue n positions towards the C-terminus from residue i. In the case of the β sheet j refers to the cross-strand partner.
b A 3₁₀ helix differs from an α helix in that the NH of residue i is hydrogen bonded to the carbonyl of residue i–4 in the α helix and of residue i–3 in the 3₁₀ helix. The consequences of this for differences in NMR data are small but are, most notably, a small decrease in dαN(i, i+2), an increase in dαN(i, i+4) to the point where a NOE is not observed and a small decrease in 3JNH,αH for the 3₁₀ helix.
c The values given above for a β sheet are for an antiparallel β sheet. Equivalent values for a parallel β sheet are essentially the same except for the interstrand distances, in particular dαα(i, j) is much larger (4.8 Å).
d 1Hα and 13Cα chemical shift changes are relative to random coil values.

Calculation of protein structures from NMR data requires conversion of the NMR data into distances and angles, usually in the form of allowed ranges, as described above. These are incorporated as restraints into protein structure calculations, such that deviation from these values incurs an energetic penalty. In these calculations, distances and angles within a protein structure are optimized using a combination of distance geometry, molecular dynamics and simulated annealing procedures. Distance geometry calculations aim to optimize all interatomic distances within a protein on the basis of the experimental restraints and are often used to define the global fold of a protein. This may then be refined using restrained molecular dynamics simulations in which structures evolve over time under the influence of a force field that contains potential energy terms for both covalent and non-bonded interactions and includes NMR restraints. A variation on molecular dynamics is simulated annealing, which differs in that normally ‘prohibited’ potential energy barriers can be crossed, allowing regions of a molecule to ‘pass through’ one another, thereby sampling larger regions of conformational space; this technique does not require a defined starting structure. After optimization of the NMR restraints by any or all of these techniques the potential energy of the resulting structure is minimized by molecular mechanics calculations to determine the lowest energy structure. The accuracy of the protein structure can be further increased by refinement against coupling constants, chemical shifts (both 1H and 13C), relaxation time (T1:T2) ratios and residual dipolar couplings. Structural calculations usually result in an ensemble of protein structures that must be assessed to determine how well they satisfy the initial restraints. Violations in excess of 1 Å may indicate that regions of the structure are ill defined. NOESY spectra can be back calculated on the basis of the structures and compared with the experimental data to identify potential errors.
Comparison of calculated and experimental NOEs can also lead to an ‘R factor’ which gives an indication of the quality of the structures in a manner analogous to that used in X-ray crystallography. An alternative is to measure the RMS deviation of the ensemble of structures from the average structure. This measure should be used with care as it may indicate a high level of precision in the structures but not give a true indication of their accuracy.
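The idea that a restraint is an allowed range whose violation incurs an energetic penalty can be sketched as follows. The flat-bottomed quadratic form used here is one common choice and is assumed for illustration; the article does not specify a functional form, and the force constant, atom labels and coordinates are hypothetical. The second function lists restraints violated by more than a threshold, defaulting to the 1 Å figure quoted above.

```python
import math


def restraint_penalty(d, lower, upper, k=1.0):
    """Flat-bottomed penalty: zero inside [lower, upper], quadratic outside.

    d, lower and upper are distances in angstroms; k is an arbitrary
    force constant (assumed value, for illustration only).
    """
    if d < lower:
        return k * (lower - d) ** 2
    if d > upper:
        return k * (d - upper) ** 2
    return 0.0


def violations(coords, restraints, threshold=1.0):
    """Return (atom_i, atom_j, excess) for each restraint whose distance
    lies outside its bounds by more than threshold angstroms.

    coords maps atom labels to (x, y, z) tuples; restraints is a list of
    (atom_i, atom_j, lower, upper) tuples.
    """
    found = []
    for atom_i, atom_j, lower, upper in restraints:
        d = math.dist(coords[atom_i], coords[atom_j])
        excess = max(lower - d, d - upper, 0.0)
        if excess > threshold:
            found.append((atom_i, atom_j, round(excess, 3)))
    return found


if __name__ == "__main__":
    # Hypothetical three-proton example.
    coords = {"HA": (0.0, 0.0, 0.0), "HB": (5.0, 0.0, 0.0), "HC": (3.0, 0.0, 0.0)}
    restraints = [("HA", "HB", 1.8, 3.5), ("HA", "HC", 1.8, 3.5)]
    print(violations(coords, restraints))
```

In a real calculation the summed penalties would be added to the force-field energy during restrained molecular dynamics or simulated annealing, and the violation list would be compiled for each member of the final ensemble.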

Other applications of protein NMR

Protein dynamics

Static 3D protein structures cannot always explain biological processes or point the way, for example, to rational drug design; the dynamic properties of a protein may be of equal functional importance. Internal motions within proteins were recognized by early NMR experiments, and NMR spectroscopy has developed into a powerful technique for the study of dynamics, from the picosecond motions of bond vectors to millisecond motions. For labelled proteins, information about backbone dynamics and motions of nitrogen-containing side-chains can be obtained via 15N relaxation, which is dependent upon reorientation of the 15N–1H bond vectors. A more complete description can result from the additional use of 13C relaxation data. Side-chain dynamics can be further characterized via deuterium relaxation by incorporation of a single deuterium atom into methyl or methylene groups and determining the attenuation of intensities in 1H–13C-correlation experiments.

Protein folding

A major advantage of NMR over X-ray crystallography is the ability to characterize the structure and dynamics of unfolded and partially folded states of proteins. These are of importance in protein folding and many cellular processes; indeed, many proteins or domains are intrinsically unstructured and only become structured upon binding other molecules. In these states, proteins exhibit rapid fluctuations between a range of conformations and insight into these processes can be gained from NMR, either through studying equilibrium states or through direct monitoring of kinetic folding events. Stabilized intermediates or fully denatured states at equilibrium can be characterized by essentially the same techniques as structured proteins, although chemical shift dispersion, with the exception of 15N and 13C carbonyl resonances, is usually poor. The kinetics of protein folding can be monitored via the time-course of hydrogen–deuterium exchange in 2D experiments, by pulse labelling or by stopped-flow techniques. For 15N-labelled proteins, folding can also be monitored during the course of a single 2D-HSQC experiment by analysis of line shape changes.
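One simple way in which a hydrogen–deuterium exchange time-course might be analysed is sketched below in Python. It assumes that the intensity of a single amide cross peak decays as a single exponential, I(t) = I0 exp(−kt), with no baseline offset and strictly positive intensities. The function name, time units and data are hypothetical.

```python
import math

def fit_exchange_rate(times, intensities):
    """Least-squares straight line through ln(I) versus t.

    Returns (k, I0) for the assumed model I(t) = I0 * exp(-k * t).
    """
    logs = [math.log(value) for value in intensities]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return -slope, math.exp(intercept)

# Synthetic, noise-free data generated with k = 0.05 per minute
times = [0.0, 10.0, 20.0, 40.0]
intensities = [100.0 * math.exp(-0.05 * t) for t in times]
k, i0 = fit_exchange_rate(times, intensities)
print(k, i0)
```

Repeating such a fit for each resolved cross peak would give residue-specific rate constants. Real data would call for weighting by the noise level and a check that a single exponential is adequate.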

Concluding remarks

There are other areas of protein NMR that have not been considered here but are covered elsewhere in the Encyclopedia. These include the binding of ligands and the study of membrane-associated proteins in detergent micelles or phospholipid bilayers using solid-state magic-angle spinning techniques. The present article has focused on structural aspects of protein NMR. Determination of a de novo protein structure by NMR takes longer, in general, than by X-ray crystallography – provided the protein can be crystallized. However, NMR is the only technique for obtaining high-resolution structural data from proteins that cannot be crystallized and for which appropriate concentrations of non-aggregated protein can be achieved. Limitations of molecular size can be overcome, as with other biochemical and structural techniques, by considering protein fragments, or domains. Structural data for large proteins can thus be obtained for domains individually and the linkage and assembly of domains can then be determined. The extension of protein NMR to larger molecules may be further facilitated by measurement of residual anisotropic interactions, which give structural restraints that are orientational rather than distance based. To summarize, NMR spectroscopy can provide a wealth of structural information about proteins and, leading from this, a greater insight into protein interactions and dynamic processes.

List of symbols

I = cross peak intensity; J = coupling constant; r = distance between two protons; T1, T2 = relaxation times; φ = dihedral angle defined by Hi–Ni–Cαi–Hαi; ψ = dihedral angle defined by Hαi–Cαi–C=O; χ1 = dihedral angle defined by Hαi–Cαi–Cβ–R.

See also: Labelling Studies in Biochemistry Using NMR; Laboratory Information Management Systems (LIMS); Macromolecule–ligand Interactions Studied By NMR; Magnetic Field Gradients in High Resolution NMR; NMR Data Processing; NMR Pulse Sequences; NMR Relaxation Rates; Nuclear Overhauser Effect; Parameters in NMR Spectroscopy, Theory of; Solvent Suppression Methods in NMR Spectroscopy; Structural Chemistry Using NMR Spectroscopy, Peptides; Two-Dimensional NMR, Methods.


Proton Affinities

Edward PL Hunter and Sharon G Lias, National Institute of Standards and Technology, Gaithersburg, MD, USA

MASS SPECTROMETRY Theory

What is a proton affinity?

The ‘proton affinity’ and the related quantity ‘gas-phase basicity’ are defined thermodynamic quantities that enable us to assign numeric values to the tendency of a molecule to accept a proton in the gas phase or, conversely, the tendency of a positive ion to donate a proton. That is, ‘proton affinities’ and ‘gas-phase basicities’ provide quantitative measures of the acid–base properties of positive ions and the corresponding conjugate-base neutral species in the gas phase. These quantities have proved to be of