TIBTECH - NOVEMBER 1987 [Vol. 5]
Protein structure determination by NMR

Iain D. Campbell and Brian Sheard

For soluble proteins below about 10 kDa, nuclear magnetic resonance (NMR) now offers a route to obtaining a 3-D structure. The method is illustrated with work on human epidermal growth factor. The significance for biotechnology is that for the first time there is now an alternative to X-ray crystallography. Once a structure is solved by NMR methods, a relatively rapid examination of minor variants produced by recombinant DNA methods then becomes possible.

A recurring challenge in biotechnology is to determine, control and monitor the 3-D structures of proteins. Good experimental techniques are few, and the more one considers what is meant by 'structure', the more elusive the concept becomes. Nevertheless, and quite rightly, regulatory authorities seek reassurance about structure when proteins are being developed for human or animal therapy. Since the 1960s, the only really useful experimental route to a detailed structure of a protein has been single crystal X-ray crystallography. Unfortunately this still retains as its Achilles' heel the uncertain process of preparing suitable crystals.

This article deals with another method, one in which the protein is examined in solution. It takes its experimental information from high-resolution nuclear magnetic resonance (NMR). A number of computer-based mathematical and molecular simulation methods are then needed to solve the structure. NMR spectra (e.g. Fig. 1) arise from transitions between different magnetic energy levels in the molecule's hydrogen atoms when placed in a highly homogeneous magnetic field and exposed to radiofrequency radiation. The spectrum is a set of
absorption peaks at different frequencies, which are diagnostic for the different types of amino acid and contain information on their environment within the protein. The sample, held in a cylindrical glass tube, is inserted into fixed radiofrequency transmitter and receiver coils. The tube is spun on its axis to average out imperfections in the magnetic field. Broadly speaking, the higher the magnetic field, the greater the sensitivity. The best modern instruments operate at 500 MHz or 600 MHz, and the high magnetic fields required for such instruments are provided by superconducting solenoids.

The new method is illustrated with work on human epidermal growth factor (hEGF) resulting from a collaboration combining NMR expertise at Oxford University's Department of Biochemistry with modelling experience in ICI 1,2. Good crystals are not yet available for any member of the EGF family of proteins. The amino acid sequence of hEGF was determined by Gregory's group in ICI in 1975 3. It has 53 amino acid residues, with 3 disulphide bridges. In practice the 1-48 derivative was found to have solubility advantages and this was used for most of the NMR-based structural work reported so far. It is biologically very similar to the complete protein and further NMR work has confirmed that the 1-48 and the 1-53 structures are substantially the same. A similar structure has been reported for mouse EGF 4,5. Sequence homologies have been recognized with TGFα, with a vaccinia virus protein and with several other proteins.

Iain D. Campbell is at the Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK. Brian Sheard is at ICI Pharmaceuticals, Mereside, Alderley Park, Macclesfield, Cheshire SK10 4TG, UK.

© 1987, Elsevier Publications, Cambridge 0166-9430/87/$02.00
Simplifying the spectrum

For spectroscopy to give detailed information, peak assignments are required. That is, peaks of the spectrum must be associated with resonating groups in the protein. Different types of amino acid side chain are recognized from (i) NMR peak positions and (ii) the fine structure of the peaks. This is straightforward for small peptides: a brief examination of the spectrum of a dipeptide such as AlaTyr, for example, would be enough for an NMR spectroscopist to recognize the contributions from alanine and tyrosine. By contrast, Fig. 1a shows a 500 MHz NMR spectrum of hEGF. Too many peaks overlap for a complete assignment of the spectrum to be possible.

The large quantity of information in protein spectra was, until recently, a barrier to progress. Fortunately modern methods can now improve the resolution by spreading a spectrum out in two dimensions. Several 2-D NMR methods have been developed, which differ according to the sequence and nature of the radiofrequency pulses used to generate the spectrum. Two main types of 2-D experiment are important for proteins: one reveals through-bond interactions between atomic nuclei while the other detects through-space interactions. Specialized texts 6 should be consulted for the details of 2-D NMR; it is sufficient here simply to note that it provides an improvement in peak separation. Correlated spectroscopy (COSY) 7 is the archetypal through-bond experiment. Part of a COSY spectrum for hEGF is shown in Fig. 1b. From the complication of Fig. 1a there has emerged a separation of peaks in a form that can be analysed.
Assigning the spectrum

Improved resolution allows the spectroscopist to solve the difficult problem of assigning observed resonances to particular groups in the protein. Spectral assignment involves recognizing patterns of coupled resonances (spin systems) that are characteristic of a particular
type of amino acid residue (Fig. 2). At this stage in an investigation it is not known which particular residue in the amino acid sequence (e.g. which valine) is to be identified with each spin system. Therefore a way is needed to obtain a sequential assignment and allow specific peaks in the spectrum to be identified with known residues in the sequence. A way forward was shown by Wüthrich's group 8. Their method, and most of the NMR-based discussion from this point onwards, depends on the phenomenon of nuclear Overhauser enhancement (NOE).

The nuclear Overhauser effect causes a change in intensity of a resonance when the resonance of a nearby nucleus is irradiated in an appropriate manner. The NOE arises from magnetic dipole-dipole interactions, which cause spin polarization to be transferred from one nucleus to any other physically close to it. The atoms do not have to be within the same residue; they simply have to be near each other. If everything else is equal, an NOE varies inversely with the sixth power of the internuclear separation, r; that is, it varies with 1/r⁶. The important information is qualitative and it is this: if an NOE between two atoms is seen, the two atoms must be close together. In the 2-D version 9 of this experiment (NOESY), atoms that are near each other are identified from peaks away from the main diagonal of the 2-D spectrum.
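The steepness of the inverse sixth-power dependence explains why an observed NOE is such a strong indicator of proximity. The short Python sketch below illustrates this under the simplifying assumption that "everything else is equal", so that only the separation changes. The function name and the 0.25 nm reference distance are choices made for this illustration, not part of any published procedure.

```python
def relative_noe(r_nm: float, r_ref_nm: float = 0.25) -> float:
    """NOE intensity at separation r_nm relative to the intensity at r_ref_nm.

    Assumes pure 1/r^6 scaling; the reference distance is an
    illustrative choice, not a measured quantity.
    """
    if r_nm <= 0 or r_ref_nm <= 0:
        raise ValueError("distances must be positive")
    return (r_ref_nm / r_nm) ** 6


# Tabulate how quickly the effect dies away with distance.
for r in (0.25, 0.35, 0.45, 0.60):
    print(f"r = {r:.2f} nm -> relative NOE = {relative_noe(r):.4f}")
```

Doubling the separation reduces the effect by a factor of 2⁶ = 64, so beyond a few tenths of a nanometre the NOE is usually too weak to observe.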
"
Fig. 1. NMR spectra. (a) 500 MHz NMR spectrum of hEGF 1-48 in D2O at pH 3.2 (not corrected for isotope effects) and 30 °C. The spectrum is a set of absorption peaks at different frequencies. Conventionally the frequency axis is expressed as a shift from a reference frequency given in parts per million (p.p.m.). (b) Part of a 2-D spectrum of hEGF, drawn as a contour plot and showing an increased separation of NMR peaks compared with (a). Information along the diagonal can be ignored.
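The p.p.m. convention mentioned in the caption can be made concrete with a small Python sketch. It assumes the usual definition, in which the offset of a peak from the reference (in Hz) is divided by the spectrometer frequency (in MHz); the function names are hypothetical and chosen only for this illustration.

```python
def chemical_shift_ppm(freq_hz: float, ref_hz: float, spectrometer_mhz: float) -> float:
    """Shift of a peak from the reference frequency, in parts per million."""
    if spectrometer_mhz <= 0:
        raise ValueError("spectrometer frequency must be positive")
    # An offset in Hz divided by a frequency in MHz is already a ratio in p.p.m.
    return (freq_hz - ref_hz) / spectrometer_mhz


def ppm_to_hz(shift_ppm: float, spectrometer_mhz: float) -> float:
    """Offset from the reference, in Hz, for a given shift and spectrometer."""
    return shift_ppm * spectrometer_mhz


# The same 1.2 p.p.m. peak lies further from the reference, in Hz, at higher field.
for mhz in (500.0, 600.0):
    print(f"{mhz:.0f} MHz: 1.2 p.p.m. = {ppm_to_hz(1.2, mhz):.0f} Hz from the reference")
```

Expressing positions in p.p.m. makes spectra recorded on different instruments directly comparable, while the larger separation in Hz at higher field is one reason peaks overlap less on bigger magnets.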
Sequential assignment 8 is a process of identifying pairs of adjacent residues (spin systems) and assembling them to form a chain by comparison with the known amino acid sequence. The process is as follows. Through-bond connectivities are not seen across peptide links; too many bonds separate the hydrogen atoms of one residue from the hydrogen atoms of the following residue. However, pairs of adjacent residues can be identified from NOEs between, for example, an N-H, αC-H or βC-H hydrogen atom of one residue and the peptide N-H hydrogen atom of the following residue. These are labelled NN, αN and βN connections in Fig. 3. For the conformations usually found in proteins, at least one of the corresponding distances is small enough for an NOE to be observed 10. At this stage, not only have most of the NMR peaks been assigned to recognizable types of amino acid through their characteristic spin systems, but a valine peak, for instance, would be known to be specifically from Val19, say, rather than from Val34.

Secondary structure

The next step is to recognize major structural features in the protein. One kind of clue comes from the pattern of NOEs; for example, β-strands have strong αN connectivities while α-helices have medium-strong NN connectivities. To see how different types of secondary structure give
recognizable NOE patterns, molecular models have been studied in detail 11. A further clue comes from noting which peptide N-H hydrogen atoms do not undergo rapid isotopic exchange with deuterium atoms when D2O is used as the solvent. Although care is needed in its interpretation, slow exchange can point to the existence of long-lived hydrogen bonds, as in a helix or a sheet. Also useful are coupling constants between peptide N-H and αC-H resonances, which depend on the peptide dihedral angle phi. Low values are found in helices while larger values (>7 Hz) occur in extended strands 12. Helices, parallel and anti-parallel β-sheets and various kinds of turns have been recognized from this kind of information 11. In hEGF, the major structural feature was an anti-parallel β-sheet 1.

3-D structure

When this point of the analysis is reached, NMR has given some idea of
Fig. 2. Examples of complete spin systems (Ile and Glu).
the structural elements of the protein but important questions still remain: how do these structural elements fold together in three dimensions and what route is followed by the connecting strands? NOEs again provide key information. This time the NOEs which are particularly important are those between residues that are close in space but not close together in the amino acid sequence. Figure 4 shows such information for hEGF.

One way to handle this accumulating information is to convert it to a set of limits on the distances between pairs of atoms. Tables containing all internuclear distances in the protein are constructed, with experimental information used to provide some of the entries in these tables. Upper limits and lower limits are recorded for each internuclear distance. In practice strong, medium and weak NOEs are taken to indicate upper limits of 0.25 nm, 0.35 nm and 0.45 nm, respectively. Known molecular bond lengths, bond angles and standard geometries can be used to provide interatomic distances for atoms separated by one or two bonds. Information on peptide dihedral angles, obtained from coupling constants, can also be recast as limits on the distances between atoms separated by three bonds. A lower limit on interatomic distances is typically set at the sum of the van der Waals radii.
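The conversion of NOE intensities into distance limits can be sketched in a few lines of Python. The three upper limits are those quoted above; everything else is an assumption made for illustration: the 0.20 nm lower limit stands in for a van der Waals contact between two hydrogen atoms, the atom labels are invented, and the function names are hypothetical. The second function shows one simple way to score how far a trial distance falls outside its limits.

```python
# Upper limits (nm) for each NOE class, as quoted in the text.
NOE_UPPER_NM = {"strong": 0.25, "medium": 0.35, "weak": 0.45}

# Assumed lower limit (nm): an illustrative van der Waals contact distance.
VDW_LOWER_NM = 0.20


def build_bounds(noes):
    """Map each atom pair to (lower, upper) distance limits in nm.

    noes is an iterable of (atom_a, atom_b, strength) tuples.
    If a pair is listed twice, the tighter upper limit is kept.
    """
    bounds = {}
    for atom_a, atom_b, strength in noes:
        pair = tuple(sorted((atom_a, atom_b)))
        upper = NOE_UPPER_NM[strength]
        previous_upper = bounds.get(pair, (VDW_LOWER_NM, float("inf")))[1]
        bounds[pair] = (VDW_LOWER_NM, min(upper, previous_upper))
    return bounds


def violation_nm(distance_nm, lower_nm, upper_nm):
    """Distance by which a trial value lies outside its limits (0 if inside)."""
    if distance_nm > upper_nm:
        return distance_nm - upper_nm
    if distance_nm < lower_nm:
        return lower_nm - distance_nm
    return 0.0


# Invented example entries; these are not measured hEGF data.
example_noes = [
    ("res20:HN", "res31:HN", "strong"),
    ("res22:HA", "res29:HN", "medium"),
    ("res5:HB", "res40:HN", "weak"),
]
for pair, (low, high) in build_bounds(example_noes).items():
    print(pair, f"lower = {low:.2f} nm, upper = {high:.2f} nm")
```

A table of this kind, filled in further from covalent geometry and coupling constants, is the input on which the structure calculations operate.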
Fig. 3. Some important connections established by NOE measurements. NN labels the sequential connection established between two peptide N-H hydrogen atoms; αN labels the linking of an αC-H hydrogen atom to the following peptide N-H hydrogen atom; βN labels the linking of a βC-H hydrogen atom to the following peptide N-H hydrogen atom.
Fig. 4. A summary of experimental information on hEGF, showing where NOEs are seen between amino acid residues that are not adjacent in the linear sequence. NOEs between backbone hydrogen atoms are shown above the diagonal; NOEs between side chains and between backbone and side chains are shown below the diagonal. Axes: residue number.
We now have a fascinating problem. Is the set of distance constraints sufficiently complete to be compatible with a single 3-D structure? If so, can this structure be found? The interatomic distances are not precise, but are known only to lie within a defined range. Many of the distances are poorly constrained and the NMR-based data will contain experimental uncertainty and possibly error. Generating a 3-D structure is certainly not straightforward.

Two essentially different approaches are available: 'distance geometry' methods 13 and molecular dynamics 14. An alternative method 15 allows torsion angles to vary within the protein and finds a structure that gives an optimum value for some user-defined function of deviations from the target distances. A review of distance geometry and related methods has been given by Braun 16.

Distance geometry

The 'distance geometry' method, as its name implies, works with distances between points rather than with their Cartesian coordinates. This, of course, is the form in which the experimental information is now held. The mathematical manipulations involved, which use matrix mathematics, allow a choice of three mutually perpendicular axes to be made such that a 'best fit' emerges as a three-dimensional description of the structure. This 'best fit' usually still contains some small incompatibilities with the distance information supplied, however. These can then be minimized according to criteria supplied by the user. Although geometrically reasonable results are obtained, some kind of energy relaxation calculation may be needed to relieve strain.

As noted, the starting information on distances is a set of limits, an upper limit and a lower limit for each interatomic distance. Alternative solutions to the problem are supplied by repeating the calculation with a random choice for the interatomic distances, each somewhere between its upper and lower limits. The effect is to sample conformation space. No test is known for uniqueness, but confidence grows if repeated calculations converge to a similar endpoint.
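The matrix step at the heart of distance geometry can be illustrated with a minimal sketch, assuming NumPy is available. It uses the classical metric-matrix embedding: squared distances are converted to a matrix of scalar products about the centroid, and the three largest eigenvalues and their eigenvectors give the three mutually perpendicular axes. This is a textbook construction offered as an illustration, not the specific program used for hEGF; the function names are hypothetical, and practical programs also tighten the limits and refine the result afterwards.

```python
import numpy as np


def embed_from_distances(dist, dim=3):
    """Coordinates (n x dim) that best reproduce a full table of trial distances."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Matrix of scalar products between position vectors measured from the centroid.
    gram = -0.5 * centering @ d2 @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Negative eigenvalues signal distances that no 3-D structure can satisfy exactly.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def random_trial_distances(lower, upper, rng):
    """Pick each distance at random between its lower and upper limit."""
    trial = rng.uniform(np.asarray(lower, float), np.asarray(upper, float))
    trial = np.triu(trial, 1)  # keep one value per pair
    return trial + trial.T     # symmetric table with a zero diagonal


def pairwise(xyz):
    """All interpoint distances for a set of coordinates."""
    return np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)


# Toy example: limits set 10% either side of the distances in a known shape.
true_xyz = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)
true_d = pairwise(true_xyz)
rng = np.random.default_rng(0)
trial = random_trial_distances(0.9 * true_d, 1.1 * true_d, rng)
model = embed_from_distances(trial)
print("largest mismatch with trial distances:", np.abs(pairwise(model) - trial).max())
```

Repeating the last three lines with different random seeds gives a family of structures, which mirrors the repeated-calculation strategy for sampling conformation space.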
Molecular dynamics

The molecular dynamics method starts with a three-dimensional model, typically built in a molecular graphics system. Standard molecular geometries are usually used but the starting structure can, in principle, be anything. The atoms are given kinetic energy and the structure then changes as the atoms move according to equations of motion supplied by the user. Additional terms are included which make the energy increase when the NOE distances are violated. This is simply a stratagem to bring the overall structure to one that satisfies the NMR data. The trajectories of atoms are followed until an end point is recognizable, typically a simulation of a 10-50 picosecond period. As with the distance geometry approach, there is no known test for uniqueness, but confidence increases if repeated simulations from many starting points converge to a similar final structure.

Experience shows that all of these methods can provide convergence to a repeatable end point. For hEGF, distance geometry and restrained molecular dynamics gave what was recognizably the same end point. Distance geometry calculations are faster, but with a need to provide energy relaxation there is something to be said for using a blend of these various techniques.

As noted, there is no proof that the structure of hEGF (Fig. 5) has
been determined unambiguously. The possibility remains that a radically different structure might satisfy the experimental observations. For the moment we rely on a repeated convergence to a similar end point and the fact that a similar structure is emerging independently from NMR studies of mouse EGF 5.

Fig. 5. A structure for hEGF.

The method in practice

Besides the human 2 and mouse 5 EGFs, several proteins have now had 3-D structures determined in solution by these methods. They include glucagon bound to micelles 17, a proteinase inhibitor from bull semen 18, the Lac repressor headpiece 19, the prothrombin inhibitor, hirudin from the leech 20, the plant toxins phoratoxin 21 and α1-purothionin 22, the globular domain of chicken histone H5 23 and the α-amylase inhibitor, tendamistat 24. For tendamistat, X-ray analysis was performed independently and in parallel with NMR 25, and gratifying agreement was obtained.

The practical requirements of the protein are that it must be obtainable in pure form in quantities of the order of 20-50 mg, and it must be sufficiently soluble. With a few 0.5 ml samples at 5 mM, there are good prospects for solving the structure of a protein below 10 kDa. Like single crystal X-ray crystallography, NMR is expensive both in terms of the specialist expertise required and the equipment and computing resources used. With luck, about 3-4 months would be needed for a spectroscopist and a modeller to solve a new problem for a protein the size of EGF, assuming they are experienced on this kind of project, are working in a first class environment and have no limitations on the supply of pure protein. Examination of close analogues, if there is no large conformational change, should be very much faster.

The future

In the future, extending such methods to soluble proteins of higher molecular weight depends on continued improvement in NMR technology and on increasing use of isotopic substitution methods, which can simplify spectra and also aid spectral assignment 6. Spectral assignment is progressing for the 18.7 kDa protein T4 lysozyme, for example, by the use of 15N and 13C enrichment and detection of their effects on the 1H NMR signals 26. Griffey and Redfield estimate that proteins 'up to 40 kDa, and very likely larger' might be amenable to study by such methods 6. Whereas determination of the structure of smaller proteins such as EGF is now becoming well established, however, the region beyond 10 kDa is still largely unexplored.

There is no doubt that advances in other technologies, such as the use of artificial intelligence methods to help spectral assignment, will bring about significant improvements in this new way of determining macromolecular structure. High-temperature superconducting materials may also provide improvements in probe design.

Significance for biotechnology

For biotechnology, the significance of this approach is that for the first time an experimental alternative exists when crystals suitable for X-ray analysis are not available. Even if it lacks some of the precision available from X-ray methods, this structure of hEGF would be an excellent starting point for a protein-engineering approach to new health care products derived from EGF, TGFα or the vaccinia virus protein. Knowledge of the domain structure of a protein, or simply of which residues are close together in space, is a powerful stimulus to invention when analogues are being designed and decisions have to be made about how to proceed with genetic manipulation to alter the protein's properties.

Once this kind of spectral analysis is complete, NMR provides an extremely powerful monitor of conformation. Besides being valuable in a research programme designing protein analogues, it may be useful as a check on conformation in the early stages of process development. NMR in biotechnology also retains the ability, well recognized in chemistry, to detect impurities, particularly small-molecule trace contaminants.

References

1 Carver, J. A., Cooke, R. M., Esposito, C., Campbell, I. D., Gregory, H. and Sheard, B. (1986) FEBS Lett. 205, 77-81
2 Cooke, R. M., Wilkinson, A. J., Baron, M., Pastore, A., Tappin, M. J., Campbell, I. D., Gregory, H. and Sheard, B. (1987) Nature 327, 339-341
3 Gregory, H. (1975) Nature 257, 325-327
4 Montelione, G. T., Wüthrich, K., Nice, E. C., Burgess, A. W. and Scheraga, H. A. (1986) Proc. Natl Acad. Sci. USA 83, 8594-8598
5 Montelione, G. T., Wüthrich, K., Nice, E. C., Burgess, A. W. and Scheraga, H. A. (1987) Proc. Natl Acad. Sci. USA 84, 5226-5230
6 Griffey, R. H. and Redfield, A. G. (1987) Q. Rev. Biophys. 19, 51-82
7 Aue, W. P., Bartholdi, E. and Ernst, R. R. (1976) J. Chem. Phys. 64, 2229-2246
8 Wüthrich, K., Wider, G., Wagner, G. and Braun, W. (1982) J. Mol. Biol. 155, 311-319
9 Jeener, J., Meier, B. H., Bachmann, P. and Ernst, R. R. (1979) J. Chem. Phys. 71, 4546-4553
10 Billeter, M., Braun, W. and Wüthrich, K. (1982) J. Mol. Biol. 155, 321-346
11 Wüthrich, K., Billeter, M. and Braun, W. (1984) J. Mol. Biol. 180, 715-740
12 Pardi, A., Billeter, M. and Wüthrich, K. (1984) J. Mol. Biol. 180, 741-751
13 Havel, T. F., Kuntz, I. D. and Crippen, G. M. (1983) Bull. Math. Biol. 45, 665-720
14 van Gunsteren, W. F., Boelens, R., Kaptein, R., Scheek, R. M. and Zuiderweg, E. R. P. (1985) in Molecular Dynamics and Protein Structure (Hermans, J., ed.), pp. 92-99, Polycrystal Book Service
15 Braun, W. and Go, N. (1985) J. Mol. Biol. 186, 611-626
16 Braun, W. (1987) Q. Rev. Biophys. 19, 115-157
17 Braun, W., Wider, G., Lee, K. H. and Wüthrich, K. (1983) J. Mol. Biol. 169, 921-948
18 Williamson, M. P., Havel, T. F. and Wüthrich, K. (1986) J. Mol. Biol. 182, 295-315
19 Kaptein, R., Zuiderweg, E. R. P., Scheek, R. M., Boelens, R. and van Gunsteren, W. F. (1985) J. Mol. Biol. 182, 179-182
20 Clore, G. M., Sukumaran, D. K., Nilges, M., Zarbock, J. and Gronenborn, A. M. (1987) EMBO J. 6, 529-537
21 Clore, G. M., Sukumaran, D. K., Nilges, M. and Gronenborn, A. M. (1987) Biochemistry 26, 1732-1745
22 Clore, G. M., Nilges, M., Sukumaran, D. K., Bruenger, A. T. and Gronenborn, A. M. (1986) EMBO J. 5, 2729-2735
23 Clore, G. M., Gronenborn, A. M., Nilges, M., Sukumaran, D. K. and Zarbock, J. (1987) EMBO J. 6, 1833-1842
24 Kline, A. D., Braun, W. and Wüthrich, K. (1986) J. Mol. Biol. 189, 377-382
25 Pflugrath, J. W., Wiegand, G. and Huber, R. (1986) J. Mol. Biol. 189, 383-386
26 McIntosh, L. P., Griffey, R. H., Muchmore, D. C., Nelson, C. P., Redfield, A. G. and Dahlquist, F. W. (1987) Proc. Natl Acad. Sci. USA 84, 1244-1248
Cloning of genes involved in penicillin and cephalosporin biosynthesis

Juan F. Martin

Fermentations for the production of antibiotics have provided one of the cornerstones of biotechnology in the 20th century. However, despite tremendous improvements in fermentation yields, the molecular mechanisms of antibiotic synthesis are poorly understood. The isolation of genes coding for enzymes involved in antibiotic pathways is an important step towards understanding.

Industrial production of antibiotics by fermentation is well established, but basic studies on the biosynthesis and molecular mechanisms of control of the production of these microbial metabolites are scarce 1. Progress in understanding the biosynthesis of antibiotics has been very slow due, in part, to the intrinsic instability of the enzymes involved 2. Thus for the β-lactam antibiotics (penicillins, cephalosporins, cephamycins, carbapenems, monobactams), some of the enzymes involved in penicillin, cephalosporin and cephamycin biosynthesis have been characterized, but a comprehensive picture of the biosynthetic pathways and their control mechanisms is not available.

The molecular genetics of antibiotic-producing microorganisms are still in an early stage of development. Application of recombinant DNA technology to Penicillium chrysogenum and Cephalosporium acremonium will lead to the rational construction of strains with multiple copies of antibiotic biosynthetic genes. Another goal is to increase expression of these structural genes by fusing them to strong fungal promoters. Successful amplification of β-lactam biosynthetic genes will depend on progress in three areas: (1) cloning the structural genes, (2) reintroducing the cloned genes into P. chrysogenum or C. acremonium and (3) expressing them using highly efficient promoters. Each of these areas will be reviewed here.

Juan Martin is at the Departamento de Ecología Genética y Microbiología, Universidad de León, León, Spain.

© 1987, Elsevier Publications, Cambridge 0166-9430/87/$02.00

Cloning genes involved in β-lactam biosynthesis

Genes involved in penicillin or cephalosporin biosynthesis have been cloned in Escherichia coli, since the cloning systems in Penicillium and Cephalosporium are still at an early stage of development. However, cloning in E. coli has the disadvantage that the final product (the antibiotic itself) cannot be assayed directly. Even if the whole biosynthetic pathway was cloned and the genes expressed in E. coli, this host organism is very sensitive to the antibiotic formed. Identification of β-lactam genes cloned in E. coli necessarily relies on hybridization with labelled probes or detection of enzymic activities. These approaches have been used at Eli Lilly (Indianapolis, IN) to clone the isopenicillin N synthase (IPNS) genes of C. acremonium 3 and P. chrysogenum 4 and by our group at the University of León (Spain) to clone the same gene from a different strain of P. chrysogenum (Ref. 5 and unpublished results).

Isopenicillin N synthetase catalyses the oxidative condensation of δ-(L-α-aminoadipyl)-L-cysteinyl-D-valine (LLD-ACV) to isopenicillin N, an intermediate in the biosynthesis of penicillins, cephalosporins and cephamycins (Fig. 1). The IPNS gene was identified by purifying the