Helical protein design
Christian E Schafmeister* and Robert M Stroud†
It is now possible to design small proteins capable of folding into compact structures with well-packed cores. Given the present state of knowledge of protein folding and design, it is possible to extract a set of engineering guidelines that may assist in future de novo protein design.
Addresses
*Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Box 233, Cambridge, MA 02138, USA
†S-964 Department of Biochemistry and Biophysics, UCSF School of Medicine, San Francisco, CA 94143-0448, USA
Current Opinion in Biotechnology 1998, 9:350-353
http://biomednet.com/elecref/0958166900900350
© Current Biology Publications ISSN 0958-1669
Introduction
The goal of de novo protein design is the design of proteins with new catalytic, sensor, and information processing functions, as well as the elucidation of the principles of protein folding. The de novo design of proteins may be thought of as the construction of an amino acid sequence that folds back on itself in a designed (or at least in a defined) fashion to generate a molecule that exhibits cooperative unfolding based on tertiary as well as secondary structural interactions. This challenge can be recast as the problem of beginning with a backbone conformation and determining an amino acid sequence (or the manifold of sequences) that will fold into the desired conformation. For any protein to fold, its amino acid sequence must take advantage of hydrophobic, van der Waals, and electrostatic interactions between amino acids to overcome the high chain entropy of the disordered amino acid chain and fold into a compact three-dimensional structure.
An excellent way to imagine the problem and process of protein folding is through the use of energy landscapes [1•]. Each possible conformation of the amino acid chain is represented by a point in a multidimensional conformational space, with 3N-3 dimensions, where N is the number of atoms in the protein. The multidimensional energy landscape of a protein is described by the free energy of each conformation of the protein as a function of the atomic coordinates. A deep funnel with a single free energy minimum represents a well-folded protein that can fold reversibly. The current state of the art in de novo protein design has demonstrated that it is possible to design small proteins that are compact, have well-packed hydrophobic cores and fold reversibly and, thus, may be described by funnel-like energy landscapes [2,3,4••,5••].
From what has been learned about protein folding and protein design, one can extract engineering guidelines that can be used in the design of small compact proteins that exhibit native-like properties and can, therefore, be used as templates for display of functional determinants.
This review attempts to distill what are, to the authors, the most important considerations a protein designer must address, as well as principles that can be used to design proteins in general and helical proteins in particular.
Binary patterning
The most basic protein design guideline is that protein folding is driven by the burial of hydrophobic side-chains within the core of the protein [1•,6,7]. At the simplest level, this impacts de novo design in that residues that fall along the backbone of the desired three-dimensional structure can be assigned a binary pattern of being either a hydrophobic or a hydrophilic residue. This assignment is based on whether the side-chain of the amino acid will project into the core of the protein (hydrophobic) or project into solvent (hydrophilic). Thermodynamic scales for α-helical propensities of amino acids [8,9] are useful for stabilizing α helices. Analysis of natural proteins [10-12], however, together with protein design experiments [13], demonstrates that short (5-11 residue) amino acid sequences can be found to fold into different secondary structures, showing that secondary structure can be determined by context. To stabilize α helices in helical bundle structures, hydrophobic residues should be placed, on average, every 3.6 residues in such a way that they may interact with hydrophobic residues from other helices in the typical left-handed supertwisted pattern characteristic of knobs into holes packing. To stabilize β sheets in an amphipathic environment, hydrophobic and hydrophilic residues should alternate every other residue. By reducing the alphabet of amino acids used in a protein design, binary patterning and amino acid sequence selection become almost synonymous. My co-workers and I [4••] used a reduced alphabet approach in the four helix bundle protein DHP1 that we designed and demonstrated to have a stable tertiary fold. We used leucine and alanine as the only hydrophobic amino acids, and lysine, glutamate, and glutamine as the hydrophilic residues [4••].
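The spacing rules for helices and strands can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for any protein discussed here: it assumes an ideal α helix with 100° of rotation per residue (3.6 residues per turn) and calls a residue buried when it points within a chosen half-angle of the core-facing direction. The function names and the `half_width` parameter are hypothetical.

```python
def helix_pattern(n, core_angle=0.0, half_width=60.0, degrees_per_residue=100.0):
    """Binary pattern for an ideal alpha helix: 'H' = hydrophobic, 'P' = polar.

    Assumption: a residue is buried when its angular position around the
    helix axis lies within `half_width` degrees of `core_angle`.
    """
    pattern = []
    for i in range(n):
        angle = (i * degrees_per_residue) % 360.0
        # smallest angular distance between this residue and the core direction
        delta = abs((angle - core_angle + 180.0) % 360.0 - 180.0)
        pattern.append("H" if delta <= half_width else "P")
    return "".join(pattern)


def strand_pattern(n, first_buried=True):
    """Binary pattern for an amphipathic beta strand: strict alternation."""
    offset = 0 if first_buried else 1
    return "".join("H" if (i + offset) % 2 == 0 else "P" for i in range(n))


if __name__ == "__main__":
    print(helix_pattern(14))
    print(strand_pattern(8))
```

With the default settings the first seven helix positions come out as `HPPHHPP`, the familiar heptad-like pattern in which hydrophobic residues recur at alternating spacings of three and four, averaging close to 3.6.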
Proteins designed using patterning and inspection
Early reports incorporated the principle of binary patterning along with visual inspection to design helical proteins. In a series of papers, DeGrado and co-workers [14-16] described α1, α2, and α4, a four helix bundle protein designed incrementally. They first constructed an amphipathic α helix (α1) that was capable of self-association and then connected two and four of the peptides together with short turns (α2 and α4, respectively). The component amphipathic helices of α4 consisted of leucines at roughly every 3.6 residues, and whereas the protein forms stable helices and is compact, it has properties that suggest that the residues in the core are not well packed. In a later report,
Handel et al. [17] demonstrated that incorporating a zinc binding site into α4 produced a protein with more native-like properties. Hecht et al. [18] described the design and characterization of the four helix bundle protein 'Felix'. The design incorporated binary patterning by laying the component helices out on helical net diagrams, with hydrophobic residues placed at 3.6 residue intervals. Helical net diagrams can be constructed by arranging the α-carbons of amino acids as they fall on a helix onto a paper tube and cutting the tube lengthwise, followed by flattening the resulting rectangle. The arrangement of the α-carbon positions represents the helical net. These early designs demonstrated the power of binary patterning to produce proteins that fold into helical bundles; however, in each case these proteins demonstrated properties consistent with a poorly packed core.
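The paper-tube construction of a helical net can be mimicked numerically. The sketch below is only an illustration: it assumes the commonly quoted ideal α-helix geometry (3.6 residues per turn, 1.5 Å rise per residue, α-carbons at a radius of about 2.3 Å), and the function name `helical_net` is hypothetical.

```python
import math


def helical_net(n, residues_per_turn=3.6, rise=1.5, radius=2.3):
    """Return (x, y) positions of n alpha-carbons on an unrolled helix surface.

    x is the distance around the cut-open tube (0 <= x < circumference),
    y is the distance along the helix axis. Units follow `rise` and `radius`
    (angstroms with the assumed defaults).
    """
    circumference = 2.0 * math.pi * radius
    coords = []
    for i in range(n):
        # fraction of a full turn at which residue i sits, wrapped into [0, 1)
        turn_fraction = (i / residues_per_turn) % 1.0
        coords.append((turn_fraction * circumference, i * rise))
    return coords


if __name__ == "__main__":
    for index, (x, y) in enumerate(helical_net(8)):
        print(f"residue {index}: x = {x:5.2f}, y = {y:5.2f}")
```

Plotting these points and marking the hydrophobic positions shows at a glance whether they form a contiguous stripe along one face of the helix.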
Libraries of binary patterned proteins
Kamtekar et al. [19] demonstrated the power of binary patterning for protein design by constructing a large library of putative four helix bundle proteins where the individual amino acids of the proteins were not explicitly defined; rather, residue positions on the inside of the bundle were assigned random hydrophobic codons (NTN) (where N represents a random choice of DNA base), and residue positions on the outside were assigned random hydrophilic codons (NAN). Of the expressed proteins, 29 folded into structures that were protease resistant, and three of these were purified and found to form helical structures. In a subsequent report, Roy et al. [20••] described the characterization of one of these binary patterned proteins and demonstrated that it is well folded using several criteria, including NMR chemical shift dispersion and slow amide proton exchange.
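Why a T or an A in the middle codon position gives a binary code can be checked directly against the standard genetic code. The following sketch enumerates the residues encoded by the fully degenerate codons NTN and NAN as written in the text. The helper name `encoded_residues` is hypothetical, and the enumeration says nothing about any further restrictions on the degenerate bases that a real library might apply.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def encoded_residues(degenerate_codon):
    """Set of one-letter residues encoded by a codon in which 'N' means any base."""
    choices = [BASES if base == "N" else base for base in degenerate_codon]
    return {CODON_TABLE["".join(codon)] for codon in product(*choices)}


if __name__ == "__main__":
    print("NTN:", sorted(encoded_residues("NTN")))
    print("NAN:", sorted(encoded_residues("NAN")))
```

In the standard code, NTN yields only the nonpolar residues Phe, Leu, Ile, Met and Val, while NAN yields Tyr, His, Gln, Asn, Lys, Asp and Glu together with two stop codons (TAA and TAG).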
Packing
All protein designs must address the difficult problem of satisfactorily packing the protein core so that each amino acid side chain can pack into a unique environment. The simplest packing arrangement was suggested by Crick [21], in which two α helices with hydrophobic side chains idealized by knobs can pack against each other, 'knobs into holes' packing, with the holes being the spaces between side chains. Packing this way, Crick predicted that the helices would have a crossing angle of about 20°. My co-workers and I [22] dealt with the packing problem in our design of a helical peptide that could pack against the hydrophobic surface of a transmembrane protein, by designing the primary hydrophobic interface to consist solely of alanines. We predicted that this hydrophobic alanine surface would pack against itself and observed that, in the crystalline state, it indeed packs 'knobs into holes' with a crossing angle of about -20° as an anti-parallel four helix bundle. In a subsequent report, we linked four of the helical peptides with glycine turns and demonstrated that the protein folded into a well-packed structure [4••]. By systematically altering the hydrophobic residues of the heptad repeat in the helical peptide GCN4, Harbury et al.
[23] were able to control the packing and self-association of the peptides to form peptide dimers, trimers and tetramers. Desjarlais and Handel have developed an algorithm for packing the hydrophobic cores of proteins in which rotamer libraries (libraries of favorable side chain geometries for each amino acid side chain) were constructed for each hydrophobic side chain tailored to the local protein backbone. Then a genetic algorithm (an algorithm that modifies the sequence randomly and retains high scoring sequences for further modification) is used to optimize an amino acid sequence that best packs the protein core [24,25•]. They have demonstrated the power of this computational strategy by repacking the core of ubiquitin and phage 434 cro protein. In a series of reports, Dahiyat et al. [26-29] demonstrated another automated approach to solving the packing problem, starting with a fixed backbone fold, containing no amino acid side chains, and searching through all possible amino acid sequences in order to find one that best satisfies their scoring function. Each sequence is evaluated by scoring interactions of side-chains with the backbone and with other side-chains. Side-chain flexibility is treated by searching through a rotamer library, and the size of the rotamer/sequence space that must be searched is reduced using a variant of 'dead-end' elimination [30].
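To make the idea of dead-end elimination concrete, here is a minimal sketch of the original elimination criterion of Desmet et al. [30], not the variant used in the design work cited above. A rotamer r at position i is discarded when its best possible energy is still worse than the worst possible energy of some alternative rotamer t at the same position. The data layout (`self_e`, `pair_e`) and function names are assumptions made for this example, and the energies in the toy case are invented.

```python
def pair_energy(pair_e, i, r, j, s):
    """Pairwise energy between rotamer r at position i and rotamer s at position j.

    pair_e is keyed by (low_index, high_index) and holds a 2D list of energies.
    """
    if i < j:
        return pair_e[(i, j)][r][s]
    return pair_e[(j, i)][s][r]


def dee_prune(self_e, pair_e):
    """Iteratively apply the original dead-end elimination criterion.

    self_e[i][r] is the self (rotamer-backbone) energy of rotamer r at position i.
    Returns, for each position, the set of rotamer indices that survive.
    """
    n = len(self_e)
    alive = [set(range(len(energies))) for energies in self_e]

    def bound(i, r, pick):
        # pick=min gives the best case for rotamer r, pick=max the worst case
        total = self_e[i][r]
        for j in range(n):
            if j != i:
                total += pick(pair_energy(pair_e, i, r, j, s) for s in alive[j])
        return total

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                best_r = bound(i, r, min)
                # r is a dead end if some other rotamer t always does better
                if any(best_r > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


if __name__ == "__main__":
    # Toy problem: two positions, two rotamers each (made-up energies).
    self_e = [[0.0, 10.0], [0.0, 0.0]]
    pair_e = {(0, 1): [[0.0, 1.0], [0.0, 1.0]]}
    print(dee_prune(self_e, pair_e))
```

Because a rotamer is removed only when it provably cannot belong to the lowest-energy combination, the pruning shrinks the search space without changing the optimum that a subsequent exhaustive or stochastic search would find.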
Turns and helix termini
Turns have been proposed to play a defining role in protein structure, acting as elements of secondary structure that break helices or strands and send the protein chain off in a new direction. Some experimental evidence suggests that turns play a more passive role in helical proteins. Brunet et al. [31] replaced a three residue helical turn in cytochrome B-562 to create a library of turn mutant proteins. All 31 of the mutants that they characterized folded into stable structures. These included a turn that contained proline, a conformationally restricted amino acid, and several turns that contained highly flexible glycines, demonstrating that cytochrome B-562 is very accommodating of the types of amino acids that it will accept as turn residues. In a related experiment, MacBeath et al. [32••] randomly mutated three solvent accessible residues in a turn of the helical protein chorismate mutase and found that more than 63% of the sequences generated active protein; however, they found that a leucine that was immediately adjacent to the turn had a very strict requirement for aliphatic hydrophobic residues. In recent work [33••], they converted dimeric chorismate mutase into a monomeric protein through directed evolution and in the process engineered into the protein a new interhelical turn composed of random amino acids. Of these random turns, less than 1% generated active protein. The greater sensitivity of the engineered chorismate mutase to its turn composition compared to B-562 is probably due to the greater number of interactions that the engineered chorismate mutase turn makes with the hydrophobic core of the protein.
Nagi and Regan [34••] incorporated a series of turns consisting of 1-10 glycine residues in the protein ROP. They determined that all of the ROP turn mutants maintained their RNA binding activity and wild-type structure. The stability of the ROP turn mutants, however, decreased with increasing turn length in a way that correlated well with the free energy cost of closing the loop.
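One simple way to picture a loop-closure cost is the ideal-chain estimate, in which the entropy lost on closing a loop of n residues scales as (3/2) R ln n. The sketch below uses that textbook form purely as an assumption for illustration. It is not necessarily the model used in the ROP study, and the function name, reference length and temperature are hypothetical choices.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def loop_closure_penalty(n, n_ref=1, temperature=298.15, exponent=1.5):
    """Estimated destabilization (kcal/mol) of an n-residue loop relative to n_ref.

    Assumes an ideal (Gaussian) chain, for which the closure entropy scales as
    exponent * R * ln(n); only the difference between two loop lengths is returned.
    """
    if n < 1 or n_ref < 1:
        raise ValueError("loop lengths must be at least 1 residue")
    return exponent * R_KCAL * temperature * math.log(n / n_ref)


if __name__ == "__main__":
    for length in range(1, 11):
        print(f"{length:2d} residues: {loop_closure_penalty(length):.2f} kcal/mol")
```

Under these assumptions the penalty grows only logarithmically, so each added residue costs less than the one before; the practical point for design is the same as in the text, namely that longer turns are expected to be progressively less stable.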
Analysis of 215 natural proteins illuminates statistical preferences for particular residues at the ends of helices [35]. Asparagine is found as the first residue (N-cap residue) 3.5 times more often than would be expected based purely on a random distribution of residues. In most cases, the side chain of the asparagine residue was found to be hydrogen bonding to one of the free N-H groups at the end of the helix. Proline was found with a preference of 2.6:1 at N-cap + 1, presumably because at that position it caps one of the free N-H hydrogen bonds. Glycine had a large preference as the last residue of helices (C-cap), where it ended 34% of all helices, and is capable of satisfying two C=O groups while turning the protein chain in a new direction. These preferences can be rationalized in terms of the ability of these residues to satisfy hydrogen bonding requirements of the helix ends.
Functional helical bundles
The design of functional proteins is challenging, but concrete examples are becoming available. Robertson et al. [36] constructed a 62-residue helix-turn-helix peptide that assembles into a four-helix dimer and binds heme through hydrophobic contacts and two histidine side-chains. Using the 'binary code' patterned four helix bundle library developed by Kamtekar et al. [19], Rojas et al. [37••] have shown that half of the 30 proteins they expressed were capable of binding heme. A development in protein engineering that has implications for design of functional proteins is a recent report of the engineering of cyclophilin into a proline-specific protease [38•]. A Ser-His-Asp catalytic triad was introduced into the natural peptide-binding cleft of cyclophilin, converting it into an endopeptidase that cleaved the natural substrate of cyclophilin.
Conclusions
Given the present understanding of protein folding and design, we believe that it is possible to extract a tentative collection of engineering guidelines that might assist in the process of de novo protein design. Firstly, protein folding is driven by hydrophobic collapse, and so as a first step, binary patterning of amino acids according to a given backbone conformation can go a long way to producing the desired fold. Secondly, thermodynamic scales for secondary structure preferences can be used to stabilize helices. Thirdly, side-chain packing is complicated but can be treated using computational methods that search automatically through sequence space using rotamer libraries to find sequences that can pack together in an optimal manner. Fourthly, for helical proteins the composition of turns is not a determining factor in the overall structure of the protein so long as the side-chains of the turn residues do not form specific contacts with the core of the protein. Finally, the length of turns should be kept short in order not to reduce the stability of the protein.
De novo protein design has reached a very exciting stage, where it is now possible to design small proteins that fold into predetermined three-dimensional structures. By combining the tools of de novo design, computational strategies of packing, combinatorial approaches to packing and binding, and ultimately the incorporation of functional determinants, the goal of designing proteins de novo with novel and useful functions is a reality.
Acknowledgement
Research was supported by the NIH, GM 24485 to Robert M Stroud, and by an HHMI graduate fellowship to Christian E Schafmeister.
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
1. • Dill KA, Chan HS: From Levinthal to pathways to funnels. Nat Struct Biol 1997, 4:10-19.
An excellent review of the current state of understanding of protein folding. The authors show the use of free energy surfaces to describe the process of protein folding.
2. Struthers MD, Cheng RP, Imperiali B: Design of a monomeric 23-residue polypeptide with defined tertiary structure. Science 1996, 271:342-345.
3. Raleigh DP, Betz SF, DeGrado WF: A de novo designed protein mimics the native state of natural proteins. J Am Chem Soc 1995, 117:7558-7559.
4. •• Schafmeister CE, LaPorte SL, Miercke LJ, Stroud RM: A designed four helix bundle protein with native-like structure. Nat Struct Biol 1997, 4:1039-1046.
An example of the design and characterization of a four helix bundle protein with a compact and well folded structure.
5. •• Dahiyat BI, Mayo SL: De novo protein design: fully automated sequence selection [see comments]. Science 1997, 278:82-87.
Describes an elegant computational approach to solving the problem of constructing an amino acid sequence that packs efficiently.
6. Chothia C: Hydrophobic bonding and accessible surface area in proteins. Nature 1974, 248:338-339.
7. Dill KA: Dominant forces in protein folding. Biochemistry 1990, 29:7133-7155.
8. Padmanabhan S, Marqusee S, Ridgeway T, Laue TM, Baldwin RL: Relative helix-forming tendencies of nonpolar amino acids. Nature 1990, 344:268-270.
9. O'Neil KT, DeGrado WF: A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids. Science 1990, 250:646-650.
10. Kabsch W, Sander C: On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations. Proc Natl Acad Sci USA 1984, 81:1075-1078.
11. Argos P: Analysis of sequence-similar pentapeptides in unrelated protein tertiary structures. J Mol Biol 1987, 197:331-348.
12. Cohen BI, Presnell SR, Cohen FE: Origins of structural diversity within sequentially identical hexapeptides. Protein Sci 1993, 2:2134-2145.
13. Minor DL Jr, Kim PS: Context-dependent secondary structure formation of a designed protein sequence. Nature 1996, 380:730-734.
14. Eisenberg D, Wilcox W, Eshita SM, Pryciak PM, Ho SP, DeGrado WF: The design, synthesis and crystallization of an alpha-helical peptide. Proteins Struct Funct Genet 1986, 1:16-22.
15. Regan L, DeGrado WF: Characterization of a helical protein designed from first principles. Science 1988, 241:976-978.
16. Hill CP, Anderson DH, Wesson L, DeGrado WF, Eisenberg D: Crystal structure of α1: implications for protein design. Science 1990, 249:543-546.
17. Handel TM, Williams SA, DeGrado WF: Metal ion-dependent modulation of the dynamics of a designed protein. Science 1993, 261:879-885.
18. Hecht MH, Richardson JS, Richardson DC, Ogden RC: De novo design, expression and characterization of Felix: a four-helix bundle protein of native-like sequence. Science 1990, 249:884-891.
19. Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH: Protein design by binary patterning of polar and nonpolar amino acids [see comments]. Science 1993, 262:1680-1685.
20. •• Roy S, Ratnaswamy G, Boice JA, Fairman R, McLendon G, Hecht MH: A protein designed by binary patterning of polar and nonpolar amino acids displays native-like properties. J Am Chem Soc 1997, 119:5302-5306.
Describes the characterization of a well folded four helix bundle constructed using a random library of binary patterned proteins.
21. Crick FHC: The packing of α-helices: simple coiled-coils. Acta Crystallogr 1953, 6:689.
22. Schafmeister CE, Miercke LJ, Stroud RM: Structure at 2.5 Å of a designed peptide that maintains solubility of membrane proteins. Science 1993, 262:734-738.
23. Harbury PB, Zhang T, Kim PS, Alber T: A switch between two-, three- and four-stranded coiled coils in GCN4 leucine zipper mutants. Science 1993, 262:1401-1407.
24. Desjarlais JR, Handel TM: De novo design of the hydrophobic cores of proteins. Protein Sci 1995, 4:2006-2018.
25. • Lazar GA, Desjarlais JR, Handel TM: De novo design of the hydrophobic core of ubiquitin. Protein Sci 1997, 6:1167-1178.
An example of the use of a computational strategy to repack the core of a natural protein.
26. Dahiyat BI, Sarisky CA, Mayo SL: De novo protein design: towards fully automated sequence selection. J Mol Biol 1997, 273:789-796.
27. Dahiyat BI, Mayo SL: Probing the role of packing specificity in protein design. Proc Natl Acad Sci USA 1997, 94:10172-10177.
28. Dahiyat BI, Gordon DB, Mayo SL: Automated design of the surface positions of protein helices. Protein Sci 1997, 6:1333-1337.
29. Dahiyat BI, Mayo SL: Protein design automation. Protein Sci 1996, 5:895-903.
30. Desmet J, Demaeyer M, Hazes B, Lasters I: The dead-end elimination theorem and its use in protein side-chain positioning. Nature 1992, 356:539-542.
31. Brunet AP, Huang ES, Huffine ME, Loeb JE, Weltman RJ, Hecht MH: The role of turns in the structure of an alpha-helical protein. Nature 1993, 364:355-358.
32. •• MacBeath G, Kast P, Hilvert D: Exploring sequence constraints on an interhelical turn using in vivo selection for catalytic activity. Protein Sci 1998, 7:325-335.
One of two papers that explore the sequence requirements of turns in the helical protein chorismate mutase. This paper demonstrates that some protein turns can have rigid sequence requirements, making their design more difficult.
33. •• MacBeath G, Kast P, Hilvert D: Redesigning enzyme topology by directed evolution. Science 1998, 279:1958-1961.
This is the continuation of the story that began in [32••].
34. •• Nagi AD, Regan L: An inverse correlation between loop length and stability in a four-helix-bundle protein. Fold Des 1997, 2:67-75.
This paper illustrates the effect of loop length on protein stability and underscores the importance of making the loop long enough to span the distance required without making it so long that it destabilizes the protein.
35. Richardson JS, Richardson DC: Amino acid preferences for specific locations at the ends of (alpha) helices. Science 1988, 240:1648-1652.
36. Robertson DE, Farid RS, Moser CC, Urbauer JL, Mulholland SE, Pidikiti R, Lear JD, Ward AJ, DeGrado WF, Dutton PL: Design and synthesis of multi-haem proteins. Nature 1994, 368:425-432.
37. •• Rojas NR, Kamtekar S, Simons CT, McLean JE, Vogel KM, Spiro TG, Farid RS, Hecht MH: De novo heme proteins from designed combinatorial libraries. Protein Sci 1997, 6:2512-2524.
This paper demonstrates that some types of binding functionality can be relatively easy to achieve in designed proteins.
38. • Quemeneur E, Moutiez M, Charbonnier J-B, Menez A: Engineering cyclophilin into a proline-specific endopeptidase. Nature 1998, 391:301-304.
Cyclophilin is converted into a protease by the engineering of a catalytic triad into the natural substrate binding site.