Analysis of conformational tendencies in proteins


Malcolm J. McGregor and Fred E. Cohen

University of California, San Francisco, California, USA

A protein sequence is a blueprint for a three-dimensional structure. Computational and experimental scientists are learning to read this blueprint while protein designers are assembling the skills to create novel macromolecules. This article reviews recent developments in these areas.

Current Opinion in Structural Biology 1991, 1:345-350

Introduction

The folded structure of a protein is a unique compromise between the chemical preferences of the side chains and the physical demands of excluded volume on a polymeric chain. Theoreticians have endeavored to unravel the relationship between amino acid sequence and native folded structure by analysing proteins of known structure. More recently, simplified polymer models have been used to probe the behavior of proteins. Experimenters have studied the conformational properties of mutant proteins, protein fragments and de novo designed proteins.

Conformational analysis of protein structures

X-ray crystallographers and, more recently, NMR spectroscopists have determined the three-dimensional structures of over 150 different proteins. As Kauzmann anticipated, the native or folded state presents a surface to the solvent that is enriched in polar groups and simultaneously shields the majority of non-polar groups. Detailed analyses of these structures have suggested that secondary structure is an organizational intermediate in protein folding. Although the number of polypeptide conformations could be astronomically large, it has been suggested that the number of plausible motifs for a folded protein is small [1]. These motifs can be described as regular assemblies of efficiently packed secondary-structure elements with an extensive network of hydrogen bonds.

A detailed study of hydrogen-bonding stereochemistry has been conducted using 50 high-resolution structures [2]. Thirteen amino acid side chains were shown to be capable of forming classical hydrogen-bond interactions using 11 different functional groups. Scatter plots revealed the preferred hydrogen-bonding geometry; in general, interactions occur in the plane of the functional group and depend on the electronic configuration, steric accessibility and side-chain conformations of the groups involved.

The role of these factors has been further investigated in a study of the hydration of serine, threonine and tyrosine residues [3]. It was shown that the preferred interaction sites for these side chains with water molecules are influenced by the local secondary structure.

Wilmot and Thornton [4] have considered the β-turns (a subset of loops) in 58 protein structures. They have presented the latest version of their program for β-turn prediction, BTURN 2.0, which uses the observed propensities of residue types to appear at the four positions of the turn. They have also suggested a new nomenclature for β-turns. This is based on the conformation of the central two residues as before, but the turn type refers directly to the area of the φ,ψ map occupied by these residues. For example, what was formerly a type-I turn is now an αα turn, where α denotes the area of the Ramachandran plot dominated by α-helical residues. This approach clarifies β-turn nomenclature considerably.

This analysis of β-turns is directly relevant to the model building of macromolecular structures. Although generic solutions to the protein folding problem are not available, recognizing that a sequence is homologous to a protein of known structure facilitates the construction of sensible model structures. The hypervariable regions of immunoglobulins provide the best example of this fact. There is a limited number of conformations for at least five out of six of these antigen-binding loops, and the known structures of immunoglobulins provide a structural basis for understanding this restriction [5••]. For four of these structures, extremely accurate predictions have been made in advance of crystal structure determinations; the structures predicted compare favourably with experimental data (see [5••], and references therein). Modelled structures are becoming increasingly important in many analyses of protein structure and function.
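Returning to the turn nomenclature of Wilmot and Thornton [4] described above, the idea of naming a turn by the Ramachandran regions of its two central residues can be sketched in a few lines. This is only an illustration: the function names and the rectangular region boundaries are assumptions made here and are not the definitions used in [4].

```python
def phi_psi_region(phi, psi):
    """Assign a (phi, psi) pair, in degrees, to a coarse Ramachandran region.

    ASSUMPTION: these rectangular boundaries are illustrative placeholders,
    not the region definitions of Wilmot and Thornton [4].
    """
    if phi < 0:
        # Left half of the map: helical band versus extended region.
        return "alpha" if -120 <= psi <= 50 else "beta"
    # Right half of the map: left-handed helical band versus everything else.
    return "alpha_L" if -50 <= psi <= 120 else "other"


def turn_label(central_residues):
    """Name a beta-turn from the regions occupied by its two central residues."""
    return "-".join(phi_psi_region(phi, psi) for phi, psi in central_residues)


# Commonly quoted idealized angles for the central residues of a type-I turn.
print(turn_label([(-60.0, -30.0), (-90.0, 0.0)]))
```

With these placeholder boundaries, the idealized type-I angles fall twice in the helical region, which mirrors the renaming of the type-I turn described in the text.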
Sophisticated computational tools, interfaced to computer graphics devices, facilitate the creation of plausible energy-refined models. Unfortunately, little quality control exists in the construction of these models. The calculated energy of the final structure fails to capture many features of the protein-solvent interface or the preferred interactions between side chains.

Sequences and topology

Abbreviation: CD, circular dichroism.

© Current Biology Ltd ISSN 0959-440X

In an effort to address this problem, Gregoret and Cohen [6•] have considered the quantitative features of side-chain packing in proteins. They have devised a method that rapidly assesses packing efficiency and the suitability of a side chain for its spatial environment. Chiche et al. [7] have followed this work with a study of the atomic solvation potential of globular proteins. Building on the work of Eisenberg and McLachlan [8], they showed that the expected atomic solvation potential of a globular protein is linearly related to its molecular weight. For several (incorrect) protein models, the recorded atomic solvation potential values deviated significantly from the expected values.

Thomas [9] has constructed a mechanical model in which a protein is represented as a collection of rods (secondary-structure elements) joined by springs (loops). The hope is that structures that efficiently distribute their mechanical energy will correlate with the more stable protein conformations. Preliminary results have suggested that this computational procedure might distinguish some misfolded structures from their native-like counterparts.
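The solvation-based check of Chiche et al. [7] described above can be pictured as two steps: sum an atomic solvation term over the atoms of a structure, then compare the result with a linear trend in molecular weight. The sketch below assumes a simple area-weighted sum. The parameter values are illustrative placeholders of the magnitude usually associated with this kind of scheme and should be checked against [8]; the function names are hypothetical.

```python
# ASSUMPTION: illustrative atomic solvation parameters in cal/(mol * A^2).
# They are placeholders for this sketch, not values quoted from ref. [8].
ATOMIC_SOLVATION = {"C": 16.0, "N/O": -6.0, "O-": -24.0, "N+": -50.0, "S": 21.0}


def solvation_free_energy(atoms, params=ATOMIC_SOLVATION):
    """Sum parameter * area over (atom_type, area) pairs.

    The area may be an accessible surface area or an area difference
    relative to a reference state; the text does not specify which.
    """
    return sum(params[kind] * area for kind, area in atoms)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def deviation_from_trend(molecular_weight, energy, slope, intercept):
    """Difference between a model's energy and the value the trend predicts."""
    return energy - (slope * molecular_weight + intercept)
```

In use, the line would be fitted to a set of known structures, and a candidate model with a large deviation from that line would be flagged for closer inspection.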

Neural networks deserve special mention as a tool for extracting information from the protein structure database in a compact algorithmic form. Kneller et al. [10] have studied secondary-structure prediction using this approach. By dividing proteins into structural classes, it was possible to construct an algorithm that correctly localized helices in all-helical proteins with 80% fidelity. The network correctly anticipated 83% of the secondary structure of three all-helical proteins whose structures were determined after the development of the prediction algorithm. Kim and colleagues [11,12] have applied networks to the problem of predicting the surface exposure of amino acids and the disulfide-bonding state of cysteine residues. The challenge remains to understand how these neural networks succeed in encoding information about protein structure. It is also important to investigate whether or not deterministic networks with specialized inputs or hardwired connections can perform even better.

Lattice simulations

Polymer physicists have relied on simple lattices to describe the behavior of chain molecules (see Fig. 1). By quantizing conformational space, excluded volume effects are efficiently managed, thermodynamically valid ensembles can be studied, and simplified interaction potentials can be applied sensibly. The advantages offered by these simplified representations must be balanced against the errors intrinsic in a gross simplification of a fundamentally complex problem.


The problem of the packing of polymers has been considered on a three-dimensional cubic [13•] or two-dimensional square [14] lattice. Using a computer simulation, all possible sterically allowed conformations of a polymer chain were enumerated. In the three-dimensional lattice, compact chains adopted secondary structures that resembled α-helices and β-sheets, and the amount of secondary structure increased with increasing compactness of the chain. These findings suggest that secondary structure is a requirement for any linear polymer that tends to form many intrachain contacts. However, a maximally compact cubic-lattice 'protein' is 30% more dense than real proteins. As significant secondary structure is observed in the molten globule, a relatively porous intermediate in protein folding [15], the precise function of compactness in the creation of secondary structure remains to be defined.

Fig. 1. Simple lattices used to describe the behavior of chain molecules. (a) An example of a helix in a simple cubic lattice. (b,c) Two-dimensional projections of structures generated in a tetrahedral lattice: (b) a β-barrel; (c) a four-helix bundle. (d) An example of a knight's walk in two dimensions. Open circles represent excluded volume.

In an attempt to duplicate the geometrical heterogeneity of the polypeptide chain, Covell and Jernigan [16] have exploited a face-centered cubic lattice to study protein structure. All possible folding topologies were generated subject to a precise boundary constraint on the molecular surface. The energy of each residue was evaluated using the pairwise interaction potential of Miyazawa and Jernigan [17]. It is encouraging that the low-energy conformations always included the native-like state. Unfortunately, the numerical implications of even subtle distortions of the boundary conditions were large. As it is impossible to predict the precise molecular boundary of a protein, the usefulness of this approach is currently limited.

Cα representations of the protein chain confined to a tetrahedral lattice and, more recently, a 210 or knight's walk lattice (Fig. 1) have been used to study the dynamics of protein folding [18-20,21•]. Simplified potential functions are used to describe the energetics of the assembling chain, and small biases are introduced to favor local tendencies of the native geometry. A dynamic Monte Carlo method is used to mimic protein folding. Reversible phase transitions have been observed between the native and denatured state on a picosecond time scale [18,19].
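The text does not say which acceptance rule the dynamic Monte Carlo simulations [18-20,21•] use. A common choice for lattice moves of this kind is the Metropolis criterion, sketched below as an assumption rather than as the method of those papers. The function name and the `rng` parameter are introduced here purely for illustration.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Decide whether to accept a trial move with energy change delta_e.

    ASSUMPTION: the standard Metropolis rule; the simulations cited in the
    text may use a different scheme. delta_e and kT share the same units.
    """
    if delta_e <= 0:
        # Downhill (or neutral) moves are always accepted.
        return True
    # Uphill moves are accepted with Boltzmann probability exp(-dE / kT).
    return rng.random() < math.exp(-delta_e / kT)
```

Because uphill moves are sometimes accepted, a chain driven by this rule can leave a compact state as well as enter it, which is the kind of reversible behavior the simulations report.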
Examples of on-site construction, in which a protein assembles on a preformed nucleus, are common [21•]. More recent work using the knight's walk lattice has demonstrated diffusion-collision-adhesion behavior, in which preformed units collide and aggregate as the chain folds [21•]. The on-site construction mechanism has also been observed [21•]. In these simulations, the assembling chain passes through a compact transition state that is near the folded state. This finding agrees with experimental data, and supports the relevance of these simplified models. Work must now be directed towards the removal of any residual bias associated with preferences in the native structure.
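The exhaustive enumeration used in the first lattice studies of this section [13•,14] can be illustrated on the two-dimensional square lattice. The sketch below generates every self-avoiding walk of a given length and counts its non-bonded nearest-neighbour contacts as a simple measure of compactness. It is a toy version under stated assumptions: walks related by rotation or reflection are not merged, and the function names are hypothetical.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n_sites):
    """Yield every self-avoiding walk of n_sites lattice sites from the origin.

    Symmetry-related walks are all reported separately (an assumption made
    to keep the sketch short).
    """
    def extend(path, occupied):
        if len(path) == n_sites:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # excluded volume: one monomer per site
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    yield from extend([(0, 0)], {(0, 0)})


def contacts(walk):
    """Count lattice-adjacent pairs of sites that are not chain neighbours."""
    index = {site: i for i, site in enumerate(walk)}
    count = 0
    for i, (x, y) in enumerate(walk):
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair is seen once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                count += 1
    return count


walks = list(self_avoiding_walks(4))
most_compact = max(contacts(w) for w in walks)
print(len(walks), most_compact)
```

The number of walks grows very quickly with chain length, which is why the studies in the text restrict attention to short chains or to compact conformations.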

Protein engineering

Site-directed mutagenesis, peptide synthesis and protein design all offer direct approaches to answering how a sequence codes for a given structure. It has become increasingly clear that efficient side-chain packing plays a major role in determining the stability of the folded state. The molten-globule state, which can be trapped under certain conditions, has native-like secondary structure but lacks the precise tertiary packing that is the hallmark of the native state. Presumably, the cooperativity of the folding transition relates to the ability of the protein to elegantly interdigitate side chains while simultaneously satisfying the demands of the hydrophobic effect.

In general, substitutions of buried residues, which are usually hydrophobic, are disruptive to structure (and therefore to function) unless a very similar side chain is substituted, whereas exposed side chains are much more tolerant of mutation. This has been shown, for example, for the globins [22] and for the λ-repressor from λ-bacteriophage [23]. As an example of the 'reverse' hydrophobic effect, there is even a side chain on the surface of λ-repressor (Tyr26) which, when mutated to a more hydrophilic residue, increases the stability of the protein [24•].

The helix-forming tendencies of some hydrophobic amino acids have been studied using a 17-residue peptide that forms monomeric helices in water [25••]. Guest residues were inserted in specific positions, and circular dichroism (CD) spectroscopy was used to evaluate the helix-coil equilibrium. The results were compared to Pα, the propensity of a residue type to occur in a helical conformation in proteins, as computed by Chou and Fasman [26]. The two sets of data do not necessarily agree. CD spectroscopy provides information about the average conformation of the whole peptide but cannot identify the residue-dependent distortions that underlie the Pα distinctions. The relatively homogeneous character of the sequence of the peptides studied makes it difficult to use NMR spectroscopy to determine the precise conformation of individual residues. In order to avoid this problem, Bradley et al. [27] studied a peptide with a heterogeneous sequence that forms an α-helix. Regional conformational distinctions were noted by NMR that were not apparent by CD spectroscopy.
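The statistical propensity Pα mentioned above is a ratio: the fraction of a residue type found in helices, divided by the fraction of all residues found in helices. A minimal sketch follows. The counts in the usage example are invented solely to show the arithmetic and are not data from [26]; the function name is hypothetical.

```python
def helix_propensity(counts):
    """Compute a Chou-Fasman style helix propensity for each residue type.

    counts maps residue name -> (number observed in helices, total observed).
    A value above 1 means the residue is found in helices more often than
    the average residue; below 1, less often.
    """
    total_in_helix = sum(in_helix for in_helix, _ in counts.values())
    total = sum(n for _, n in counts.values())
    mean_fraction = total_in_helix / total
    return {
        residue: (in_helix / n) / mean_fraction
        for residue, (in_helix, n) in counts.items()
    }


# HYPOTHETICAL counts, chosen only to illustrate the calculation.
toy_counts = {"Ala": (60, 100), "Gly": (20, 100)}
for residue, p_alpha in helix_propensity(toy_counts).items():
    print(residue, round(p_alpha, 2))
```

A propensity of this kind averages over every context in which a residue occurs in known structures, which is one reason it need not agree with measurements made on a single model peptide.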
Specific interactions between side chains were an additional source of helix stabilization that could not be followed in the host-guest systems [25••].

X-ray crystallography provides a detailed view of macromolecular structure. Peptide crystallography presents special challenges to conformational analysis: intermolecular interactions can have a profound impact on the peptide structure. Eisenberg and co-workers [28] determined the crystal structure of a 12-residue peptide which was designed to associate as a tetramer, with unexpected results. The fundamental unit was a hexamer, with component helix dimers and trimers. A pair of dimers related by a twofold axis of symmetry formed a tetramer, but energy calculations suggested that the hexamer was the more stable structure under acidic high-salt conditions. The packing in the hexamer is unusual, involving leucines that do not interdigitate but instead abut against each other, filling the center of the hexamer (Fig. 2).

Fig. 2. Crystal packing in the α1-peptide. Three helices are shown, forming half of the hexamer. In this schematic view, only leucine side chains are presented.

Helical conformations can be favored by using non-standard amino acids. Several investigators have incorporated α-aminoisobutyric acid (α-methyl alanine or Aib) into their design [29,30]. The extra α-methyl group introduces conformational constraints that strongly favor helix formation. The idea is that the conformation of novel sequences may be forced into rigid modules, providing a molecular 'Meccano (or Lego) set' approach to the synthesis of protein mimics.

The most ambitious protein-engineering studies involve the design of globular proteins with novel sequences and/or properties. The overall topologies are based on commonly occurring structural motifs. Initially, some studies attempted to determine whether or not it is possible to design a novel protein that folds into a stable structural motif. Octarellin has been designed to fold into one of the best-studied topologies, the eight-stranded α/β-barrel [31•]. This consists of a repeating unit designed to adopt a β-strand/α-helix conformation. From an initial structural characterization, the eight-stranded unit adopts a compact structure whereas the seven- and nine-stranded units do not. These findings agree with the results of theoretical studies that have explored the most favorable geometric parameters for known α/β-barrels [32]. These calculations suggested that any number of strands other than eight would generate substantial shear in the β-sheet hydrogen-bonding pattern; packing irregularities were also anticipated in seven- and nine-stranded barrels.

Felix is a 79-amino-acid protein designed to fold into another common topology, the four-helix bundle [33••]. Needless to say, this is an extremely ambitious undertaking. The sequence is not homologous to any native protein but was designed using knowledge about the formation and packing of helices in real proteins. The sequence is native-like in that it is non-repetitive and has an amino acid composition similar to that of real sequences. It has been shown that the protein possesses many of the desired properties. For example, it is monomeric in solution and predominantly α-helical (as measured by CD spectroscopy); a designed disulfide bridge linking two helices does actually form; and the single tryptophan residue of Felix is buried in a hydrophobic environment, though it appears to be accessible to solvent when the disulfide bridge is not formed.

The next step forward is to design specific binding sites. For example, a Zn²⁺-binding site has been successfully incorporated into a dimeric helix-loop-helix peptide and into a four-helical bundle [34•]. The binding site consists of three histidine side chains and is based on a Zn²⁺-binding site in hemocyanin.

The ultimate goal in protein engineering is to build proteins with specific catalytic activities. Chymohelizyme is a novel peptide that attempts to graft the catalytic residues of a serine protease onto a four-helix-bundle scaffold [35•]. In contrast to Felix, the helices are parallel and do not form a continuous chain; instead they are covalently linked at one end. At the other end, each helix contains a residue that forms the catalytic site, the 'oxyanion hole' and substrate-binding pocket necessary for hydrolysis. Affinity for chymotrypsin ester substrates has been reported (with a rate of hydrolysis that is approximately 1% that of chymotrypsin) and the peptide is inhibited by chymotrypsin inhibitors. A detailed analysis of the origins of the catalytic activity of this molecule will depend on the stepwise replacement of the putative catalytic residues.


Conclusions

The architecture of proteins is dominated by simple building blocks and secondary-structure elements that can be assembled in a number of ways. The hydrophobic effect dictates the collapse of these systems and efficient packing restricts the number of possible assemblies. Sequences have been designed to reproduce a structural theme, but the ability to precisely correlate a natural sequence with a three-dimensional structure remains beyond our grasp. Simplified models of protein structure are needed to manage the complexity intrinsic in the process of protein folding. Potential functions must be developed that are compatible with these simplified models. Finally, strategies for bridging the gap between low- and high-resolution models are required if we hope to understand protein-chain assembly and computationally reproduce the process. The experimental characterization of intermediates in protein folding is vital to our understanding of this complex process.

Acknowledgements

This work was supported by the National Institutes of Health (GM39900) and the Searle Scholars/Chicago Community Trust.

References and recommended reading

Papers of special interest, published within the annual period of review, have been highlighted as:
• of interest
•• of outstanding interest

1. DORIT RL, SCHOENBACH L, GILBERT W: How Big is the Universe of Exons? Science 1990, 250:1377-1382.

2. IPPOLITO JA, ALEXANDER RS, CHRISTIANSON JM: Hydrogen Bond Stereochemistry in Protein Structure and Function. J Mol Biol 1990, 215:457-471.

3. THANKI N, THORNTON JM, GOODFELLOW JM: Influence of Secondary Structure on the Hydration of Serine, Threonine and Tyrosine Residues in Proteins. Protein Eng 1990, 3:495-508.

4. WILMOT CM, THORNTON JM: β-Turns and Their Distortions: a Proposed New Nomenclature. Protein Eng 1990, 3:479-493.

5. •• CHOTHIA C, LESK AM, TRAMONTANO A, LEVITT M, SMITH-GILL SJ, AIR G, SHERIFF S, PADLAN EA, DAVIES D, TULIP WR, COLMAN PM, SPINELLI S, ALZARI PM, POLJAK RJ: Conformations of Immunoglobulin Hypervariable Regions. Nature 1989, 342:877-883.
It is claimed that there is a small repertoire of main-chain conformations for five of the six immunoglobulin hypervariable regions. Predictions are made of newly determined immunoglobulin structures with great success. These findings have important implications for protein design.

6. • GREGORET LM, COHEN FE: Novel Method for the Rapid Evaluation of Packing in Protein Structures. J Mol Biol 1990, 211:959-974.
This paper addresses the problem (not adequately covered elsewhere) of assessing the accuracy of model-built structures. A method is presented that gives a measure of the goodness of packing in protein structures, based on residue-residue contacts.

7. CHICHE L, GREGORET LM, COHEN FE, KOLLMAN PA: Protein Model Structure Evaluation Using the Solvation Free Energy of Folding. Proc Natl Acad Sci USA 1990, 87:3240-3243.

8. EISENBERG D, MCLACHLAN AD: Solvation Energy in Protein Folding and Binding. Nature 1986, 319:199-203.

9. THOMAS DJ: The Entropic Tension of Protein Loops. J Mol Biol 1990, 216:459-465.

10. KNELLER DG, COHEN FE, LANGRIDGE R: Improvements in Protein Secondary Structure Prediction by an Enhanced Neural Network. J Mol Biol 1990, 214:171-182.

11. HOLBROOK SR, MUSKAL SM, KIM S: Predicting Surface Exposure of Amino Acids from Protein Sequence. Protein Eng 1990, 3:659-665.

12. MUSKAL SM, HOLBROOK SR, KIM S: Prediction of the Disulfide-Bonding State of Cysteine in Proteins. Protein Eng 1990, 3:667-672.

13. • CHAN HS, DILL KA: Origins of Structure in Globular Proteins. Proc Natl Acad Sci USA 1990, 87:6388-6392.
The conformation of a polymer chain has been modeled on a cubic lattice. All possible compact conformations are generated, and it is found that the formation of helices and sheet-like structures increases with increasing compactness of the chain. Thus, secondary structure is an essential feature of any compact polymer chain.

14. LAU KF, DILL KA: Theory for Protein Mutability and Biogenesis. Proc Natl Acad Sci USA 1990, 87:638-642.

15. KUWAJIMA K: The Molten Globule State as a Clue for Understanding the Folding and Cooperativity of Globular-Protein Structure. Proteins 1989, 6:87-103.

16. COVELL DG, JERNIGAN RL: Conformations of Folded Proteins in Restricted Spaces. Biochemistry 1990, 29:3287-3294.

17. MIYAZAWA S, JERNIGAN RL: Estimation of Effective Interresidue Contact Energies from Protein Crystal Structures: Quasi-Chemical Approximation. Macromolecules 1985, 18:534-552.

18. SKOLNICK J, KOLINSKI A: Dynamic Monte Carlo Simulations of Globular Protein Folding/Unfolding Pathways. I. Six Member Greek Key β-Barrel Proteins. J Mol Biol 1990, 212:787-817.

19. SIKORSKI A, SKOLNICK J: Dynamic Monte Carlo Simulations of Globular Protein Folding/Unfolding Pathways. II. α-Helical Motifs. J Mol Biol 1990, 212:819-836.

20. SIKORSKI A, SKOLNICK J: Dynamic Monte Carlo Simulations of Globular Protein Folding. Model Studies of in vivo Assembly of Four Helix Bundles and Four Member β-Barrels. J Mol Biol 1990, 215:183-198.

21. • SKOLNICK J, KOLINSKI A: Simulations of the Folding of a Globular Protein. Science 1990, 250:1121-1125.
A Cα representation of a protein chain on a knight's walk lattice has been used in a simplified simulation of protein folding. The simulation shows some features that agree with what is known about the folding of real proteins.

22. BORDO D, ARGOS P: Evolution of Protein Cores - Constraints in Point Mutations as Observed in Globin Tertiary Structures. J Mol Biol 1990, 211:975-988.

23. REIDHAAR-OLSON JF, SAUER RT: Functionally Acceptable Substitutions in Two α-Helical Regions of λ-Repressor. Proteins 1990, 7:306-316.

24. • PAKULA AA, SAUER RT: Reverse Hydrophobic Effects Relieved by Amino-Acid Substitutions at a Protein Surface. Nature 1990, 344:363-364.
Tyr26 is found on the surface of λ-repressor. Mutations at this position have the expected result: hydrophilic substitutions increase the stability of the mutant enzyme over the wild type. This is the reverse of the hydrophobic effect: enhancing the hydrophobicity of a residue on the surface of the native state stabilizes the folded state relative to the denatured state.

25. •• PADMANABHAN S, MARQUSEE S, RIDGEWAY T, LAUE TM, BALDWIN RL: Relative Helix Forming Tendencies of Nonpolar Amino Acids. Nature 1990, 344:268-270.
A model peptide that forms monomeric helices in water is used to study the effect of introducing certain hydrophobic residues on the helix-forming tendency. The results do not necessarily agree with earlier host-guest experiments or with the frequency of occurrence of these residues in protein helices.

26. CHOU PY, FASMAN GD: Empirical Predictions of Protein Conformation. Annu Rev Biochem 1978, 47:251-276.

27. BRADLEY EK, THOMASON JF, COHEN FE, KOSEN PA, KUNTZ ID: Studies of Synthetic Helical Peptides Using Circular Dichroism and Nuclear Magnetic Resonance. J Mol Biol 1990, 215:607-622.

28. HILL CP, ANDERSON DH, WESSON L, DEGRADO WF, EISENBERG D: Crystal Structure of α1: Implications for Protein Design. Science 1990, 249:543-546.

29. KARLE IL, FLIPPEN-ANDERSON JL, UMA K, BALARAM P: Parallel Zippers Formed by α-Helical Peptide Columns in Crystals of Boc-Aib-Glu(OBzl)-Leu-Aib-Ala-Leu-Aib-Ala-Lys(Z)-Aib-OMe. Proc Natl Acad Sci USA 1990, 87:7921-7925.

30. DEGRADO WF, LEAR JD: Conformationally Constrained α-Helical Peptide Models for Protein Ion Channels. Biopolymers 1990, 29:205-213.

31. • GORAJ K, RENARD A, MARTIAL JA: Synthesis, Purification and Structural Characterization of Octarellin, a de novo Polypeptide Modelled on the α/β-Barrel Proteins. Protein Eng 1990, 3:259-266.
The structure of octarellin consists of a repeating unit designed to fold into a β-strand/α-helix pattern. Seven-, eight- and nine-stranded polypeptides are synthesized, of which only the eight-fold unit forms a compact structure.

32. LASTERS I, WODAK SJ, PIO F: The Design of Idealized α/β Barrels: Analysis of β-Sheet Closure Requirements. Proteins 1990, 7:249-256.

33. •• HECHT MH, RICHARDSON JS, RICHARDSON DC, OGDEN RC: De novo Design, Expression, and Characterization of Felix: a Four-Helix Bundle Protein of Native-Like Sequence. Science 1990, 249:884-891.
This is an attempt to design a protein of novel sequence that folds into a commonly occurring structural motif. Physical characterization indicates that the protein in solution has many of the desired features, such as high helix content.

34. • HANDEL T, DEGRADO WF: De novo Design of a Zn²⁺-Binding Protein. J Am Chem Soc 1990, 112:6710-6711.
The Zn²⁺-binding site consists of three histidine side chains on a helix-turn-helix motif.

35. • HAHN KW, KLIS WA, STEWART JM: Design and Synthesis of a Peptide Having a Chymotrypsin-Like Esterase Activity. Science 1990, 248:1544-1547.
The structure consists of four covalently bonded helices, each of which provides one side chain necessary for catalytic activity. The peptide has 1% of the activity of chymotrypsin and is inhibited by chymotrypsin inhibitors.

MJ McGregor, Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94143-0446, USA.

FE Cohen, Departments of Pharmaceutical Chemistry and Medicine, University of California, San Francisco, California 94143-0446, USA.