Protein folding from a combinatorial perspective

Review

Robert T Sauer

Combinatorial mutagenesis experiments show the existence of many different solutions to the problem of complementary packing of non-polar sidechains in the protein core. They suggest that a significant amount of structural information is carried by the simple pattern of polar and non-polar residues along the polypeptide chain, indicate that the formation of buried polar interactions may be a fundamentally slow step in protein folding and show that proteins with many native properties occur at reasonable frequencies in random sequence libraries.

Address: Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139-4307, USA.

E-mail address: [email protected]

Electronic identifier: 1359-0278-001-R0027

Folding & Design 01 Apr 1996, 1:R27–R30

© Current Biology Ltd ISSN 1359-0278

Introduction

How proteins find and maintain a unique conformation is an issue that has fascinated researchers for the past 40 years. How much information do individual sidechains in a natural protein sequence carry and how is this information decoded? How many different sequences are compatible with the same basic protein structure? Are critical sets of interacting residues required for specific protein folds? How much sequence information is really required to encode a cooperatively folded protein structure? Why do some proteins fold in milliseconds or less whereas others require hours or more? The analysis of mutant proteins provides a powerful method for addressing questions of this type. In this review, I discuss studies in which combinatorial mutagenesis has been used to investigate the determinants of protein folding and stability.

In a combinatorial mutagenesis experiment, a library of mutants is generated by using cassette mutagenesis or PCR to randomly mutagenize a set of codon positions in a gene for a natural or designed protein [1–5]. The library is then transformed into cells, and mutant proteins that are functional or folded are identified using a biological selection or screens for activity, expression, or antibody cross-reactivity. By sequencing mutant genes which encode active and/or folded proteins, residue substitutions compatible with folding and function can be identified. If a specific sidechain property, such as size or charge, is required at a given sequence position, then only those sidechains with the required properties will be recovered. If the chemical identity of the sidechain is unimportant, then a variety of chemically dissimilar sidechains will be recovered. In some instances, statistical analysis of the frequency at which particular substitutions are recovered can also be used to infer the relative importance of individual sidechains and to test for interactions between sidechains [6,7]. The end result of any combinatorial mutagenesis experiment is a set of multiply mutant proteins whose functional, biochemical, or structural properties can be determined and compared with each other and with the wild type.

Sequence determinants of folding

What can be learned from combinatorial mutagenesis? Early application of this method to the N-terminal domain of λ repressor and the Arc repressor of phage P22 showed that the information most important for folding is carried by residues in the hydrophobic core [1–4,8,9]. These core residues could often be substituted by other hydrophobic residues, but almost never by charged or polar groups. By contrast, residues on the surface of these proteins could, in general, be readily changed as long as the surface as a whole remained reasonably polar. More detailed studies of the tolerance of surface and turn residues in the GCN4 leucine zipper and cytochrome b562 showed that these residue positions do carry structural information, but far less than core positions [7,10].

If the protein surface is relatively accommodating of non-conservative amino acid substitutions, then the core must play the major role in specifying structure and stability. Indeed, the tight complementary packing of hydrophobic sidechains in native protein structures immediately suggests an important role for these residues by excluding solvent, maximizing van der Waals interactions, and avoiding costly steric overlaps [11].

Combinatorial mutagenesis provides a way to investigate the rules of hydrophobic core packing. In the first studies of this type, a set of interacting core positions in λ repressor were mutagenized and active variants identified [3,12]. Somewhat surprisingly, a large number of different hydrophobic sidechain shapes and sizes were found to be accommodated in biologically functional proteins. For example, when three interacting residues in the λ repressor core were mutagenized to allow all 125 combinations of the hydrophobic sidechains Val, Leu, Ile, Met, and Phe, 70% of the resulting sequences were found to be biologically active at some level, including some differing substantially in core volume [12,13].
The allowed diversity was even greater when folded but inactive proteins were included. Clearly, for this set of core positions in this protein, many different combinations of hydrophobic sidechain shapes and volumes allow maintenance of the overall protein fold. Similar findings of surprisingly permissive core substitutions have also been observed for gene V protein [14], the GCN4 leucine zipper [15], cytochrome c [16], T4 lysozyme [17], the Rop protein [18], 434 Cro [19], and barnase [14]. In some of these studies, almost the entire protein core can be changed. In barnase, for example, active enzyme variants with hydrophobic substitutions at 12 of the 13 major core residues have recently been identified [20].

Although core packing can be remarkably malleable, there are also cases in which few if any changes are tolerated. Examples include the dimer interface of the N-terminal domain of λ repressor [1] and a helix–helix packing interface in Arc repressor [21]. Why are some core positions more tolerant than others? One possibility is suggested by the crystal structures of multiply mutant core variants of T4 lysozyme [17] and λ repressor [22], in which changes in core packing are accommodated by movements of the polypeptide mainchain. In λ repressor, for example, several α-helices move away from each other by a small distance to allow the core substitutions to achieve good packing [22]. In instances in which the spectrum of permitted core substitutions is quite restricted, such adaptive mainchain movements may be too energetically costly.

It is worth emphasizing that the observation of significant core permissivity does not indicate that complementary packing of core sidechains is unimportant for stable folding. In these cases, the flexibility of the protein structure, as seen in the mutant T4 lysozyme and λ repressor structures [17,23], probably allows mutant sidechains of different sizes and shapes to maintain reasonably complementary packing.
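The size of the three-position core library discussed in this section (five hydrophobic sidechains at each of three interacting positions) can be checked by direct enumeration. The following is a minimal sketch only; the function name and the use of three-letter residue codes are illustrative choices, not part of the original experiments.

```python
from itertools import product

# The five hydrophobic sidechains allowed at each mutagenized core position.
HYDROPHOBIC = ("Val", "Leu", "Ile", "Met", "Phe")


def enumerate_core_library(residues=HYDROPHOBIC, n_positions=3):
    """Return every residue combination for n_positions randomized sites.

    Illustrative helper: each site is assumed to sample the same
    residue set independently of the other sites.
    """
    return list(product(residues, repeat=n_positions))


library = enumerate_core_library()
print(len(library))  # 5**3 = 125 combinations
print(library[0])    # ('Val', 'Val', 'Val')
```

Each tuple corresponds to one candidate core; in an actual experiment only the recovered (active or folded) subset of these combinations would be sequenced and compared.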

Binary patterns and folding

The results of early combinatorial mutagenesis experiments suggested that the positions of hydrophobic and hydrophilic residues in a protein sequence encoded some fraction of the basic structural information [9]. Hecht and colleagues [5] explicitly tested this hypothesis by designing a combinatorial library in which only the polar/nonpolar nature but not the chemical identity of each of the α-helical residues in a designed four-helix bundle protein was specified. At non-polar positions, Val, Ile, Met, Leu or Phe were allowed, and at polar positions, Asn, Asp, His, Gln, Glu and Lys were allowed. Roughly 60% of the proteins in this complex library were found to be compact, soluble, and resistant to intracellular degradation as expected for proteins that can fold to some extent. Moreover, several purified proteins from this library were found to have properties expected for four-helix bundle proteins. Thus, the simple binary pattern of polar and non-polar residues along a polypeptide chain must encode a significant amount of structural information.

It is clear from numerous studies of mutants of natural proteins in which binary pattern is maintained but structure and/or folding is perturbed that the detailed chemical properties of sidechains are important in determining the precise structural properties of proteins. In λ repressor, for example, many of the core mutants were far less active than wild type despite being stably folded [12,13]. How then does simply having hydrophobic and hydrophilic residues at the correct positions help folding? One possibility is that the simple partitioning of hydrophobic and hydrophilic residues during folding constrains the polypeptide chain to a limited region of conformational space, thereby allowing it to find one or a few rough conformations from which more detailed structural factors can start to matter.

Buried polar interactions

In addition to hydrophobic interactions, salt-bridge and hydrogen-bonding interactions are also present in some protein cores. One appealing idea is that buried polar interactions of this type help to confer conformational specificity, because small structural changes which disrupted these interactions would leave unsatisfied polar groups in a hydrophobic environment and thus would be very energetically costly [23]. In line with this model, the oligomeric and conformational specificity of a designed coiled-coil heterodimer was found to be reduced significantly when two Asn sidechains that mediated hydrogen-bonding interactions in the core were replaced by Leu residues [24].

A very different result was obtained, however, when the importance of buried polar interactions in Arc repressor was probed by combinatorial mutagenesis [25]. In wild-type Arc, the sidechains of Arg31, Glu36, and Arg40 interact via hydrogen-bonding and electrostatic interactions to form a partially buried salt-bridge triad. Following combinatorial mutagenesis of these positions, a number of active variants were recovered in which the salt-bridge residues were replaced by hydrophobic sidechains. Some of these mutants were as much as 4 kcal mol⁻¹ more stable than wild-type Arc. The crystal structure of one mutant containing Met31, Tyr36, and Leu40 (MYL) was found to be very similar to wild type (0.7 Å rmsd) except for the substitution of hydrophobic interactions for the salt-bridge interactions. The main result with Arc, then, is that the buried salt bridge is not needed for conformational specificity and detracts significantly from protein stability.

At present, it is not certain why buried polar interactions seem to be important for conformational specificity in some instances but not others. For some proteins, hydrophobic interactions appear to be sufficient to specify a unique fold. For other proteins, polar interactions may be required to tip the balance toward one fold and away from alternative structures that would otherwise have similar energies.

The MYL variant of Arc folds at a rate that is 30–1000-fold faster than wild type depending on conditions [26]. Thus, formation of the wild-type salt bridge in a non-polar environment slows the rate of protein folding significantly. The transition state in Arc refolding occurs before the majority of sidechain information is used in core packing or the formation of hydrogen bonds [27]. In this model, it makes sense that the need to form the salt bridge would slow folding because desolvation of the polar sidechains would exact a large penalty that would not be significantly recovered until the hydrogen-bond geometries of the salt bridge were optimized. Hydrophobic groups, on the other hand, could begin to interact favorably and thus reduce the transition-state free energy even before tight, complementary packing was achieved. It will be interesting to see whether buried polar interactions in other proteins also slow the folding rates of these molecules. Indeed, it seems possible that the amount of buried polar surface area in proteins is one of the main structural factors that determines their overall folding rates.

Schindler et al. [28] have made the provocative suggestion that some proteins are capable of super-fast folding because there are no folding intermediates in these reactions. The MYL mutant of Arc can fold in less than 1 ms in a reaction in which the only significantly populated species are denatured and native protein [26] and thus would seem to support this general idea. However, there is evidence for an unstable, partially folded, dimeric intermediate in both the folding and unfolding reactions of the MYL mutant [26,29]. This suggests that even for proteins which exhibit super-fast folding, the acquisition of structure probably involves intermediate states.

Folding of random protein sequences

If the identities of surface residues in proteins are relatively unimportant for folding and the need for appropriate core residues can be satisfied by a large number of different combinations of hydrophobic residues, then it is conceivable that protein sequences capable of folding could be identified in random libraries produced by combinatorial mutagenesis. In fact, in libraries composed of random combinations of Leu, Gln, and Arg, proteins resistant to intracellular proteolysis were found at frequencies of about 1% [30,31]. Purification and biochemical studies of several of these proteins revealed them to be α-helical, oligomeric, and to display reversible thermal denaturation. However, even the most native-like of these ‘random’ proteins differed from natural proteins in requiring some denaturant for solubility and in having extremely rapid rates of amide exchange. Recently, libraries composed of random combinations of 16 of the naturally occurring amino acids have also been constructed and screened (M Cordes, A Davidson, RT Sauer, unpublished data). One candidate from this library has a cooperatively folded β-structure and appears to have a well packed core by the criterion of near-UV circular dichroism.
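To give a rough sense of scale for such random libraries, the sketch below counts the sequences available to a three-letter (Leu, Gln, Arg) or 16-letter alphabet and the number of protease-resistant clones expected at a frequency of about 1%. The chain length and the number of clones screened are hypothetical placeholders chosen for illustration; neither value is taken from the studies cited above.

```python
def sequence_space(alphabet_size: int, n_positions: int) -> int:
    """Number of distinct sequences when each position is drawn
    independently from alphabet_size residue types."""
    return alphabet_size ** n_positions


def expected_hits(n_screened: int, frequency: float) -> float:
    """Expected number of clones with a given property when that
    property occurs at the stated frequency in the library."""
    return n_screened * frequency


N_POSITIONS = 80    # hypothetical chain length (assumption, not from the text)
N_SCREENED = 1000   # hypothetical number of clones screened (assumption)

print(sequence_space(3, N_POSITIONS))    # three-letter (Leu, Gln, Arg) alphabet
print(sequence_space(16, N_POSITIONS))   # 16-letter alphabet
print(expected_hits(N_SCREENED, 0.01))   # about 1% frequency, as reported
```

Under these assumptions, the possible sequences vastly outnumber any library that could be screened, so a recovery frequency near 1% means that candidates should turn up even in a modest screen.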

At present, it seems clear that molecules with many of the properties of natural native proteins can be identified at surprisingly high frequencies in random combinatorial libraries. Whether truly native-like proteins or functional proteins can be obtained in this manner remains to be seen. Some level of design coupled with combinatorial mutagenesis may be necessary to achieve these ends. Nevertheless, the results are consistent with the idea that, for some protein folds, enough structural information is encoded in a sufficiently simple manner to allow sequences encoding these folds to arise by chance with reasonable odds. Such events must, after all, have occurred during evolution from the RNA to the protein world.

References

1. Reidhaar-Olson, J.F. & Sauer, R.T. (1988). Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences. Science 241, 53–57.
2. Bowie, J.U. & Sauer, R.T. (1989). Identifying determinants of folding and activity for a protein of unknown structure. Proc. Natl. Acad. Sci. USA 86, 2152–2156.
3. Lim, W.A. & Sauer, R.T. (1989). Alternative packing arrangements in the hydrophobic core of λ repressor. Nature 339, 31–36.
4. Breyer, R.M. & Sauer, R.T. (1989). Mutational analysis of the fine specificity of binding of monoclonal antibody 51F to λ repressor. J. Biol. Chem. 264, 13355–13360.
5. Kamtekar, S., Schiffer, J.M., Xiong, H., Babik, J.M. & Hecht, M.H. (1993). Protein design by binary patterning of polar and nonpolar amino acids. Science 262, 1680–1685.
6. Gregoret, L. & Sauer, R.T. (1993). Additivity of mutant effects assessed by binomial mutagenesis. Proc. Natl. Acad. Sci. USA 90, 4246–4250.
7. Hu, J.C., Newell, N.E., Tidor, B. & Sauer, R.T. (1993). Probing the roles of residues at the e and g positions of the GCN4 leucine zipper by combinatorial mutagenesis. Protein Sci. 2, 1072–1084.
8. Reidhaar-Olson, J.F. & Sauer, R.T. (1990). Functionally acceptable substitutions in two α-helical regions of λ repressor. Proteins 7, 306–316.
9. Bowie, J.U., Reidhaar-Olson, J.F., Lim, W.A. & Sauer, R.T. (1990). Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247, 1306–1310.
10. Brunet, A.P., Huang, E.S., Huffine, M.E., Loeb, J.E., Weltman, R.J. & Hecht, M.H. (1993). The role of turns in the structure of an alpha-helical protein. Nature 364, 355–358.
11. Richards, F.M. & Lim, W. (1993). An analysis of packing in the protein folding problem. Q. Rev. Biophys. 26, 423–498.
12. Lim, W.A. & Sauer, R.T. (1991). The role of internal packing interactions in determining the structure and stability of a protein. J. Mol. Biol. 219, 359–376.
13. Lim, W.A., Farruggio, D.C. & Sauer, R.T. (1992). The structural and energetic consequences of disruptive mutations in a protein core. Biochemistry 31, 4324–4333.
14. Sandberg, W.S. & Terwilliger, T.C. (1991). Energetics of repacking a protein core. Proc. Natl. Acad. Sci. USA 88, 1706–1710.
15. Hu, J.C., O’Shea, E.K., Kim, P.S. & Sauer, R.T. (1990). Sequence requirements for coiled-coil interactions: analysis using λ repressor–GCN4 leucine zipper fusions. Science 250, 1400–1403.
16. Fredericks, Z.L. & Pielak, G.J. (1993). Exploring the interface between the N- and C-terminal helices of cytochrome c by random mutagenesis within the C-terminal helix. Biochemistry 32, 929–936.
17. Baldwin, E.P., Hajiseyedjavadi, O., Baase, W.A. & Matthews, B.W. (1993). The role of backbone flexibility in the accommodation of variants that repack the core of T4 lysozyme. Science 262, 1715–1718.
18. Munson, M., O’Brien, R., Sturtevant, J.M. & Regan, L. (1994). Redesigning the hydrophobic core of a four-helix-bundle protein. Protein Sci. 3, 2015–2022.
19. Desjarlais, J.R. & Handel, T.M. (1995). De novo design of the hydrophobic cores of proteins. Protein Sci. 4, 2006–2018.
20. Axe, D.D., Foster, N.W. & Fersht, A. (1996). Active barnase variants with completely random hydrophobic cores. Proc. Natl. Acad. Sci. USA in press.

21. Milla, M.E. & Sauer, R.T. (1995). Critical side-chain interactions at a subunit interface in the Arc repressor dimer. Biochemistry 34, 3344–3351.
22. Lim, W.A., Hodel, A., Sauer, R.T. & Richards, F.M. (1994). The crystal structure of a mutant protein with altered but improved hydrophobic core packing. Proc. Natl. Acad. Sci. USA 91, 423–427.
23. Hendsch, Z.S. & Tidor, B. (1994). Do salt bridges stabilize proteins? A continuum electrostatic analysis. Protein Sci. 3, 211–226.
24. Lumb, K.J. & Kim, P.S. (1995). A buried polar interaction imparts structural uniqueness in a designed heterodimeric coiled coil. Biochemistry 34, 8642–8648.
25. Waldburger, C.D., Schildbach, J.F. & Sauer, R.T. (1995). Are buried salt bridges important for protein stability and conformational specificity? Nature Struct. Biol. 2, 122–128.
26. Waldburger, C.D., Jonsson, T. & Sauer, R.T. (1996). Barriers to protein folding: formation of buried polar interactions is a slow step in acquisition of structure. Proc. Natl. Acad. Sci. USA in press.
27. Milla, M.E., Brown, B.M., Waldburger, C.D. & Sauer, R.T. (1995). P22 Arc repressor: transition state properties inferred from mutational effects on the rates of protein unfolding and refolding. Biochemistry 34, 13914–13919.
28. Schindler, T., Herrler, M., Marahiel, M.A. & Schmid, F.X. (1995). Extremely rapid protein folding in the absence of intermediates. Nature Struct. Biol. 2, 663–673.
29. Jonsson, T., Waldburger, C.D. & Sauer, R.T. (1996). Nonlinear free energy relationships in Arc repressor unfolding imply existence of unstable, native-like folding intermediates. Biochemistry in press.
30. Davidson, A.R. & Sauer, R.T. (1994). Folded proteins occur frequently in libraries of random amino acid sequences. Proc. Natl. Acad. Sci. USA 91, 2146–2150.
31. Davidson, A.R., Lumb, K.J. & Sauer, R.T. (1995). Cooperatively folded proteins in random sequence libraries. Nature Struct. Biol. 2, 856–863.