J. Mol. Biol. (1974) 88, 857-878
Structural Principles of the Globular Organization of Protein Chains. A Stereochemical Theory of Globular Protein Secondary Structure
V. I. LIM
Institute of Protein Research, Academy of Sciences of the U.S.S.R., Poustchino, Moscow Region, U.S.S.R.
(Received 26 March 1974)
The paper reveals the types of amino acid sequences of polypeptide chain regions of globular proteins that form a regular (α or β) or irregular conformation in the native globule. The study takes into account the general “architectural” principles of packing of polypeptide chains in globular proteins and the interactions of proteins with water molecules. An a priori theory is developed which permits the identification of α-helical and β-structural regions in globular proteins from their primary structure, in good agreement with experiment.
1. Introduction
In recent years the attention of researchers studying the interrelation between the chemical and three-dimensional structures of globular proteins has been focused on statistical analysis of the distribution of single amino acid residues, and of their different combinations, within the helical and β-structural regions and the different bends of the protein chain. On the basis of such analyses, empirical rules have been proposed for finding different types of regular (α or β) or irregular regions (Prothero, 1966; Periti et al., 1967; Low et al., 1968; Ptitsyn & Finkelstein, 1970; Robson & Pain, 1971; Nagano, 1973; Wu & Kabat, 1973; Chou & Fasman, 1974). Most of these studies also attempted to elucidate the physical nature of the relation between primary and secondary structure. For example, the papers by Kotelchuk & Scheraga (1969), Lewis et al. (1970), Ptitsyn & Finkelstein (1970) and Chou & Fasman (1974) took into account local interactions, i.e. interactions between the side group of an amino acid residue and the backbone; this permitted a classification of the amino acids into helical and non-helical ones and, on that basis, the development of methods for locating helical regions in globular proteins. It should be noted, however, that in a real situation local interactions by themselves can create only a fluctuating secondary structure in the unfolded polypeptide chain, in which some regions stay longer in the helical state and others for a shorter time. In a real compact globule, local interactions lose their significance (Ptitsyn et al., 1972) and cannot play a dominant role in the stabilization of helical regions. Methods of locating helical regions that take into account only local interactions nevertheless agree satisfactorily with experiment (especially the method of Chou & Fasman), because in native globular proteins local interactions in most cases stabilize the same secondary structure as the long-range ones, i.e. both types of interaction are concordant (e.g. Ptitsyn & Finkelstein, 1970; Robson & Pain, 1971; Scheraga, 1971; Ptitsyn et al., 1972). The unique secondary structure, in which some regions are always helical and the others always non-helical, is formed only together with the compact tertiary structure of the globular protein. This means that the unique secondary structure is stabilized by long-range interactions between different regions of the globular protein. But these interactions impose their own rigid restrictions on the primary structure of helical and β-structural regions, completely independent of the restrictions imposed by local interactions. This fact was first taken into consideration by Schiffer & Edmundson (1967), who suggested an empirical method of finding helical regions based on a partial accounting of long-range interactions. On the basis of the studies by Perutz et al. (1965) of the distribution of hydrophobic side groups on the surface of the helical regions of myoglobin and haemoglobin, Schiffer & Edmundson proposed the rule of “helical wheels”. According to this rule, when the helix is viewed end-on along its axis, its surface must be clearly divided into two mutually non-intersecting zones, hydrophobic and hydrophilic. They concluded that this requirement is best satisfied by regions with hydrophobic amino acid residues in positions i, i + 3, i + 4 along the chain. In their view, such clustering of hydrophobic groups on the helix is important both for the realization of hydrophobic interactions in proteins and for stabilization of the helix by interactions of side groups in neighbouring turns of the helix.
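The geometric basis of the helical-wheel rule can be checked with a few lines of arithmetic. The sketch below is not Schiffer & Edmundson's own procedure; it only assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue) and reports how far around the helix axis residue i + k lies from residue i. Residues i + 3 and i + 4 come out within 60° of residue i, whereas i + 2 sits close to the opposite face, which is why hydrophobic residues at i, i + 3, i + 4 can form one contiguous hydrophobic zone.

```python
# Helical-wheel sketch. Assumption: an ideal alpha-helix with 3.6 residues
# per turn, i.e. each successive residue is rotated 100 degrees about the axis.
DEG_PER_RESIDUE = 100.0


def wheel_angle(offset: int) -> float:
    """Azimuth (0-360 degrees) of residue i + offset relative to residue i."""
    return (offset * DEG_PER_RESIDUE) % 360.0


def angular_separation(offset: int) -> float:
    """Smallest angle (0-180 degrees) between residues i and i + offset."""
    angle = wheel_angle(offset)
    return min(angle, 360.0 - angle)


if __name__ == "__main__":
    for k in range(1, 8):
        print(f"i+{k}: {angular_separation(k):5.1f} degrees from residue i")
```

For i + 1 to i + 7 this gives 100, 160, 60, 40, 140, 120 and 20 degrees respectively.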
The distribution of hydrophobic residues on the helix surface taken into account by Schiffer & Edmundson's method creates the necessary (but not the sufficient) conditions for hydrophobic stabilization only of helices lying tangentially to the globule surface; the sufficient conditions are determined by other types of interaction and by the concrete stereochemistry of the amino acid residues. Such an orientation is not the only possible one for helical regions in globular proteins, and for some proteins the method gives results that can hardly be considered satisfactory. A physical theory of secondary structure must therefore be developed by taking into account, first of all, the long-range interactions that play the main role in stabilizing the three-dimensional structure of globular proteins. All the types of long-range interaction that are common to, and consequently basic for, all globular proteins must be considered. The present paper is devoted to the elaboration of such a theory.
2. Main Principles of the Globular Organization of Protein Chains
Voluminous experimental data on the three-dimensional structures (see e.g. Dickerson & Geis, 1969; Matthews, 1973) and intramolecular packing (Lim & Ptitsyn, 1970; Klapper, 1971; Lee & Richards, 1971; Lim & Ptitsyn, 1972) of water-soluble globular proteins lead to the conclusion that the most salient and impressive structural features of these proteins are: (1) compactness of form; (2) the presence of a tightly packed hydrophobic core (or cores) and a polar shell.
These structural principles of organization single out globular proteins as a separate class of biopolymers and must be taken as the basis for analysing the conformation of globular protein polypeptide chains.
The first structural principle imposes the following requirements on the possible conformations of any region of the polypeptide chain:
(1a) all the helical, β-structural and irregular regions must be attached by non-covalent interactions to the main part of the globule;
(1b) the linear dimensions of regular and irregular regions cannot exceed the linear dimensions of the globule.
The following requirements result from the second structural principle:
(2a) The overwhelming majority of massive hydrophobic side groups must be fully or partially immersed in the hydrophobic cores, while hydrophilic groups must not penetrate into these cores; the NH and CO groups of the backbone also must not penetrate into the cores without forming a hydrogen bond.
(2b) The conformation of the backbone of a region bearing hydrophobic side groups must be compatible with the tight packing of these groups in the hydrophobic core or cores.
(2c) The hydrophilic shell must at least partially shield each hydrophobic core from water molecules; hydrophilic groups must, where possible, form salt and hydrogen bonds which, together with hydrophobic interactions, attach the separate parts of the globule to each other.
A very important property of a globular protein is the unique character of its three-dimensional structure (Flory, 1969). Conformational mobility of the protein chain in the neighbourhood of the native “X-ray” structure is naturally tolerated, since the protein molecule is a machine performing definite work and as such must have moving parts. Thus, a globular protein has a unique three-dimensional structure in which the conformation of the majority of regions of the chain is compatible with structural requirements 1a, 1b, 2a, 2b and 2c. This uniqueness permits us to presume that the primary structures of globular proteins are chosen so that any region of the chain can, as a rule, form only one conformation compatible with the structural requirements. Otherwise, if the majority of separate regions of the chain were compatible with the structural requirements in two or more conformations, these regions would have two or more conformations of equal free energy within the globule, and the polypeptide chain would form a set of several globular structures rather than one unique functioning globule.
It can also be supposed that, for any region of the chain, the conformation compatible with the structural requirements (which embrace all the main types of physical interaction in a globular protein in aqueous medium) will be the energetically most favourable one, since otherwise the unique character of the protein structure would again be disturbed. Consequently, the uniqueness of the protein three-dimensional structure permits the conclusion that all the regions of the chain that are compatible in the helical conformation with structural requirements 1a, 1b, 2a, 2b and 2c will be helical. Similarly, all the regions of the chain compatible in the β-structural conformation with the structural requirements will be β-structural. In other words, a completely a priori physical theory can be developed which finds the energetically most favourable conformation of any region of the protein polypeptide chain in the globule by a simple geometric approach based on the above-mentioned structural requirements. Such an approach avoids quantitative evaluation of energy and rests only on recognizing, by qualitative stereochemical considerations, the amino acid sequences that form a regular or irregular type of structure.
3. Tight Packing of the Hydrophobic Core
In the analysis of tight packing of the hydrophobic core two facts are important. First, the packing of hydrophobic side groups of amino acid residues in crystals of hydrophobic amino acids and in protein hydrophobic cores (see e.g. Gurskaya, 1966; Lim & Ptitsyn, 1972) can be schematically represented as in Figure 1(a). According to this scheme, the packing of hydrophobic groups closely resembles the tight packing of spheres. The alternative scheme (Fig. 1(b)) gives a less tight packing. When two or more hydrophobic groups occupy spatially approached positions (i, i + 1), (i, i + 3), (i, i + 4), (i, i + 1, i + 4) or (i, i + 3, i + 4) on the α-helix, or (i, i + 2) in the β-structure, the approach to these groups of a hydrophobic side group that is remote in the amino acid sequence will, according to scheme 1(a), lead to their rigid mutual fixation on the surface of the regular structure and, correspondingly, to stabilization of the regular structure. Realization of scheme 1(b) on the surface of the regular structure would lead to cavities comparable with the real dimensions of a water molecule, which are not usually observed in protein hydrophobic cores (Lee & Richards, 1971; Matthews, 1973). Moreover, unlike scheme 1(a), scheme 1(b) leads neither to rigid mutual fixation of hydrophobic groups on the surface of the regular structure nor to stabilization of that structure. Consequently, scheme 1(a) is the most acceptable for describing the packing of hydrophobic side groups. Second, hydrophobic side groups interact with each other mainly through their peripheral atomic groups. This regularity is strictly followed in crystals of hydrophobic amino acids (Gurskaya, 1966) as well as in the overwhelming majority of protein structures with hydrophobic cores (see e.g. Lim & Ptitsyn, 1972). It should be emphasized that the contacting peripheral parts of hydrophobic side groups such as Val, Ile, Leu, Phe and Trp are very similar in form and linear dimensions (Plate I). The peripheral parts (terminal methyl groups) of the Leu and Val side groups, for example, are geometrically identical; at the sterically allowed values of the angle χ2 equal to 180° and 300°, the distance between the terminal methyl groups in Ile is 2.6 Å, which is close to the 2.5 Å distance between the terminal methyl groups in Val and Leu. Consequently, the spatial approach of hydrophobic groups to each other in globular proteins is not determined by any specificity of the local geometry of their contacting peripheral surfaces. Thus, the tight packing of hydrophobic side groups in globular proteins is described by the scheme represented in Figure 1(a). This conclusion permitted us to perform a stereochemical analysis, using Courtauld Atomic Models and taking requirement 2b into account, of the possible entry into the hydrophobic core of hydrophobic groups located on the surface of the regular structure. As an example we shall show how such an analysis was done for hydrophobic side groups in spatially approached positions on the α-helix. This was modelled by approaching a leucine side group to hydrophobic side groups drawn together in positions (i, i + 1), (i, i + 3), (i, i + 4), (i, i + 1, i + 4) and (i, i + 3, i + 4) on the surface of the α-helix (the nature of the approached hydrophobic group is of no importance, owing to the above-mentioned similarity of the contacting regions of the side groups). The side groups on the surface of the α-helix were fixed in different combinations of their rotational isomers. In working with the models a deviation of ±20° from the equilibrium values was allowed for each angle of internal rotation.
Fig. 1. (a) Scheme of tight packing of hydrophobic side groups in the hydrophobic core. Side groups are conventionally designated by figures 1, 2, 3, 4. Shaded regions are the peripheral regions of the groups. (b) Alternative scheme of packing.
Pairs and triplets of hydrophobic residues in positions (i, i + 1), (i, i + 3), (i, i + 4), (i, i + 1, i + 4) and (i, i + 3, i + 4) were considered to form “helical” combinations if they could be tightly packed according to scheme 1(a) with the approached leucine side group. The packing was considered tight if the approached group simultaneously contacted both side groups of a pair, or all three side groups of a triplet, and if no cavities comparable in size with a water molecule were formed. Side groups were considered to be in contact if the distance between the contacting peripheral parts of the model atoms did not exceed about 2 cm (distances between groups belonging to the helix were not taken into consideration; the scale of the Courtauld Atomic Models was 2 cm : 1 Å). Interaction of the approached group with the polar OH group of tyrosine or the nitrogen of tryptophan was not permitted. We shall denote hydrophobic pairs and triplets as “antihelical” if they do not form a tight packing with the approached leucine side group on the α-helix surface. The list of antihelical pairs and triplets is given in Table 1 of the following paper (Lim, 1974a). None of the antihelical pairs and triplets will form a tight packing with the side groups of Ala, Met or Cys taken as the approached group either. This is because the peripheral parts of the Ala, Met and Cys side groups are smaller than those of Leu, Val, Ile, Phe, Tyr and Trp. (Interestingly, within the limits of this stereochemical analysis proline is equivalent to alanine.) The method of analysis presented here roughly models the local region of the hydrophobic core formed by spatially approached hydrophobic side groups of a fragment with a predetermined backbone conformation, in particular the α-helical one.
The tight packing of such a local region of the core in a real situation depends, of course, on the nature and possible orientations of the hydrophobic groups of the fragment with the predetermined backbone conformation, as well as on the type of the approached hydrophobic side group, its orientation and the conformation of its backbone. However, there are situations in which the spatially drawn together hydrophobic groups on a fragment with a predetermined backbone conformation cannot form a tight packing with any approached hydrophobic group, no matter how the approached group and its backbone are oriented relative to them. The tight-packing analysis described above serves to reveal precisely these situations. By this procedure, hydrophobic pairs and triplets were found on the α-helix surface which cannot fit tightly into the hydrophobic core, exclusively because of their insufficient conformational freedom there.
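The first, purely combinatorial step of this analysis, listing which hydrophobic pairs and triplets on an α-helical fragment occupy the spatially approached positions, can be sketched as follows. This is a hypothetical helper, not the author's procedure: the one-letter hydrophobic alphabet is an assumption, positions are 0-based, and the code does not decide whether a group is “helical” or “antihelical”. That decision required model building, and the resulting list is given in Lim (1974a).

```python
from itertools import combinations

# Offset patterns of spatially approached positions on the alpha-helix, as
# listed in the text: (i, i+1), (i, i+3), (i, i+4), (i, i+1, i+4), (i, i+3, i+4).
HELIX_PATTERNS = {(0, 1), (0, 3), (0, 4), (0, 1, 4), (0, 3, 4)}

# Assumption for illustration: one-letter codes counted as hydrophobic.
HYDROPHOBIC = set("ACFILMVWY")


def approached_groups(sequence: str) -> list:
    """Hydrophobic pairs/triplets (0-based positions) matching HELIX_PATTERNS."""
    hyd = [i for i, aa in enumerate(sequence) if aa in HYDROPHOBIC]
    groups = []
    for size in (2, 3):
        # combinations of an ascending list stay in ascending order
        for combo in combinations(hyd, size):
            offsets = tuple(p - combo[0] for p in combo)
            if offsets in HELIX_PATTERNS:
                groups.append(combo)
    return groups


if __name__ == "__main__":
    print(approached_groups("LKKLLKK"))  # [(0, 3), (0, 4), (3, 4), (0, 3, 4)]
```

Each group found this way would still have to be tested against the approached side group, as described above, before it could be called helical.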
4. Structural Role of Different Hydrophilic Side Groups in Stabilization of Protein Three-dimensional Structure
Hydrophilic amino acid residues can be divided into two groups. The first includes residues with massive, long and comparatively flexible side groups (Lys, Arg, Glu, Gln, His); the second, residues with small, short and comparatively rigid side groups (Ser, Thr, Asp, Asn). The polar atoms in the side groups of hydrophilic residues allow these side groups to form hydrogen and salt bonds which, together with other forces, make a considerable contribution to the stabilization of the protein three-dimensional structure. The two types of hydrophilic side group, most of which are concentrated on the protein surface, differ in their stereochemical behaviour. Massive, long hydrophilic side groups are convenient for interaction with spatially remote polar partners. When polar partners are spatially remote from hydrophilic residues of the first group, they will naturally also be distant from them along the protein chain. Such a “long-range interaction” along the chain permits, together with hydrophobic interactions, the attachment of separate large blocks of the globule to each other. Clearly, the attachment will be efficient if the massive long side groups lie tangentially to the globule surface in a relatively elongated, “tense” conformation. This also allows half the surface of the massive hydrocarbon part of such a side group to avoid undesirable contacts with water and, at the same time, to shield from water the internal parts of the globule (the hydrophobic core and the hydrogen bonds between the NH and CO groups of the backbone in regular regions). The presence of first-group hydrophilic residues near hydrophobic groups is extremely desirable in the protein structure, since such residues can easily shield the local region of the hydrophobic core.
Plate I. Comparison of the contacting peripheral parts of the valine and phenylalanine side groups assembled from Courtauld Atomic Models. The vertical linear dimension of each side group is 3.4 Å and the horizontal 6.5 Å.
Small, short hydrophilic side groups are convenient for interaction with spatially close polar partners. Of particular interest are polar partners that are close along the chain to hydrophilic residues of the second group. Such a “short-range interaction” along the chain is optimal for stabilizing irregular regions by salt and hydrogen bonds formed by a small, rigid side group with the NH and CO groups of the backbone (Watson, 1969; Birktoft & Blow, 1972; Esipova & Tumanyan, 1972) and with polar groups of other rigid side chains. In some cases residues of the second group are extremely desirable near residues of the first group. They make it possible to put a polar “patch” on the polar shell of the protein near the side group of a first-group hydrophilic residue without creating steric obstacles to that side group meeting its polar partner, i.e. to fill “vacant” cramped sites near the massive hydrophilic group on the surface of the globule with a small polar group. Besides the hydrogen and salt bonds considered above, it is necessary to take into account the possibility that individual water molecules form hydrogen bonds with polar protein groups (Birktoft & Blow, 1972). The following cases are of greatest interest:
(a) side group-H2O-side group;
(b) side group-H2O-backbone;
(c) backbone-H2O-backbone.
It is precisely such interactions of water molecules with the protein that can make a considerable contribution to the stabilization of protein structure. In cases (a) and (b) the water molecule effectively lengthens the hydrophilic side group, since, as a rule, a water molecule can join only its end. This gives the side group of the hydrophilic residue greater stereochemical possibilities.
5. Theory of β-Structure
β-Structural regions can be divided into three types according to their localization in the globule:
(a) the “internal” type: the β-structural fragment is immersed in the protein;
(b) the “surface” type: the planes of the peptide groups of the β-structural fragment are tangential to the globule surface;
(c) the “semi-surface” type: the planes of the peptide groups of the β-structural fragment lie on the periphery of the globule, approximately perpendicular to its surface.
Fig. 2. Schematic illustration of the β-structure. The backbone is designated by a saw-toothed line; the planes of the peptide groups are perpendicular to the plane of the Figure. Side groups are designated by circles.
It is seen in Figure 2 that in the β-structure the side groups are located on both sides of the “band” formed by the planes of the peptide groups. The side groups of residues i, i + 2, i + 4, ..., k lie on one side of the band, and those of residues i + 1, i + 3, i + 5, ..., k - 1 on the other. Because of structural requirement 2a (see section 2 above), the internal type of β-structure will be realized only when both sides of the band carry only hydrophobic side groups, or when the N- and (or) C-terminal positions of one or both sides of the band (positions i, k and i + 1, k - 1) are occupied by hydrophilic residues and the others by hydrophobic ones. In these cases the most acceptable hydrophilic residues are those of the first group (Lys, Arg, Glu, Gln, His). On their side of the band, the side groups of these residues can shield the local region of the hydrophobic core. As an example, consider the β-structural region with the amino acid sequence Val Arg Leu Phe Val Ile Ser Glu. It is easy to see that Arg and Glu, located on the periphery of the globule, can shield the local regions of the hydrophobic core formed with the participation of Phe and Ile, respectively. If there is a hydrophobic residue in position i, Gly is intolerable in position i + 1; similarly, Gly must not occupy position k - 1 if a hydrophobic residue is in position k. Otherwise a cavity would form in the hydrophobic core, because the glycine residue lacks a side group. Thus, the internal type of β-structure can be formed without violating the structural requirements (see section 2 above) by entirely hydrophobic regions, or by hydrophobic regions with one or two hydrophilic residues in the first two and (or) last two positions at the N- and C-termini. Longer hydrophilic termini (more than two residues) violate requirement 1a or 2a: if the last residue of a long hydrophilic terminus and its neighbour along the chain outside the β-structure lie on the surface of the globule (and not above it, which would violate requirement 1a), then at least one hydrophilic residue will be immersed in the globule (violation of requirement 2a). In the surface type of β-structure, one side of the band must carry only hydrophobic groups and the other only hydrophilic ones. For example, the residues in positions i, i + 2, i + 4, ..., k (see Fig. 2) must be hydrophobic and the residues in positions i + 1, i + 3, i + 5, ..., k - 1 hydrophilic. Only such a chain can be localized on the surface of the globule as a surface-type β-structure without violating any structural requirement.
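The two-sided band and the internal-type rule lend themselves to a short check. The sketch below is a simplified reading of the rule, not the author's procedure: the hydrophobic set is an assumption, and Gly is treated as non-hydrophobic everywhere, so it is also rejected in inner positions. It splits a candidate strand into the two sides of the band and tests whether non-hydrophobic residues occur only in the first two or last two positions, with no Gly next to a hydrophobic end residue.

```python
# Illustrative residue class (an assumption of this sketch): the text names
# Val, Leu, Ile, Phe, Tyr, Trp and treats Ala, Met, Cys as smaller hydrophobic groups.
HYDROPHOBIC = {"Val", "Leu", "Ile", "Phe", "Tyr", "Trp", "Met", "Cys", "Ala"}


def band_sides(strand):
    """Split a beta-strand into the two sides of the band (Fig. 2)."""
    # positions i, i+2, i+4, ... on one side; i+1, i+3, ... on the other
    return strand[0::2], strand[1::2]


def is_internal_type(strand) -> bool:
    """Non-hydrophobic residues allowed only in the first two or last two positions."""
    n = len(strand)
    terminal = {0, 1, n - 2, n - 1}
    for pos, res in enumerate(strand):
        if res not in HYDROPHOBIC and pos not in terminal:
            return False
    # Gly next to a hydrophobic end residue would leave a cavity in the core
    if n >= 2 and strand[0] in HYDROPHOBIC and strand[1] == "Gly":
        return False
    if n >= 2 and strand[-1] in HYDROPHOBIC and strand[-2] == "Gly":
        return False
    return True


if __name__ == "__main__":
    example = "Val Arg Leu Phe Val Ile Ser Glu".split()
    side_a, side_b = band_sides(example)
    print(side_a)  # ['Val', 'Leu', 'Val', 'Ser']
    print(side_b)  # ['Arg', 'Phe', 'Ile', 'Glu']
    print(is_internal_type(example))  # True
```

Applied to the example sequence, Arg, Phe, Ile and Glu fall on one side of the band and Val, Leu, Val and Ser on the other, and the region passes the internal-type test.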
In the surface type, the hydrophobic side of the band can be immersed in the hydrophobic core and the hydrophilic side can take part in forming the polar shell of the protein. Thus, regions with amino acid sequences of the type hydrophobic-hydrophilic-hydrophobic-hydrophilic-...-hydrophobic can form the surface type of β-structure. Such a β-structure must not have Gly on the hydrophobic side, nor Gly or Ala on the hydrophilic side. The presence of Gly on the hydrophobic side would impede the formation of tight packing in the hydrophobic core, and the presence of Gly or Ala on the hydrophilic side could allow water molecules to considerably loosen the hydrogen bonds of the peptide groups adjacent to the Cα atoms of Gly or Ala. Loosening of these hydrogen bonds is extremely undesirable since, as a rule, they take part in forming the β-structural sheet, i.e. they rigidly fasten regions of the globular protein that are distant along the chain, and therefore play a considerable role in stabilizing the tertiary structure together with other interactions. Finally, the third, semi-surface type can be realized in peripheral regions of β-structural sheets. Semi-surface peripheral regions of a β-structural sheet must have either only hydrophilic side groups or mainly glycine residues. In the first case, the hydrophilic side groups of the β-structural region can lie tangentially to the surface of the globule and, together with other spatially close hydrophilic groups, shield the hydrophobic core and the intramolecular hydrogen bonds forming the β-structural sheet. In the second case, when a semi-surface β-structure consists mainly of glycine residues, the backbones of other regions of the protein molecule can approach the backbone of the β-structural region on the surface of the globule and create steric obstacles for water molecules passing into the internal parts of the globule. Such an approach of backbones is necessary because the distance between the Cβ atoms of residues forming (1-3)† pairs in the β-structural conformation is comparatively large (about 7 Å), so shielding the internal part of the protein only by the hydrophilic groups of other backbones would be poor: large gaps would always remain between the side groups, through which water molecules would penetrate to the hydrogen bonds of the β-structural sheet and to the hydrophobic groups (violation of requirements 2a and 2c). Thus, regions with uninterrupted runs of glycine residues, or completely hydrophilic regions, can form the semi-surface type of β-structure. The primary structures of hydrophilic fragments that can form the semi-surface type of β-structure were not analysed in this study. In the first two types of β-structure considered (internal and surface) we dealt with hydrophobic residues, and the stereochemical analysis of whether these groups can be tightly built into the core should be carried out by the procedure described in section 3 above. However, such an analysis for hydrophobic residues entering the hydrophobic core in the β-structural conformation encounters great difficulties. These are caused by the large admissible spread of the φ and ψ angles of the backbone and by the large number of rotational isomers of the side groups compared with other types of structure, in particular the α-helix (Scheraga, 1968). At the same time, the very high conformational mobility of side groups in the β-structure allows us to hope that practically any set of side groups (with rare exceptions) can pack tightly into the hydrophobic core. The rare cases in which this is impossible arise, as a rule, from the presence of hydrophobic residues such as Tyr and Trp.
In these cases the tight packing of Tyr or Trp is either completely impossible or would lead to interaction of the OH group of tyrosine or the nitrogen of tryptophan with the approached (distant along the chain) hydrophobic groups. This situation is observed when Tyr or Trp is the only hydrophobic group on one of the sides of the band (see Fig. 2), and also when Trp is surrounded on its side of the band by hydrophobic groups both to the left and to the right along the chain, i.e. when Trp occupies position j and positions j - 2 and j + 2 are occupied by hydrophobic residues. Attention should also be paid to such a “non-standard” residue as proline. Proline cannot be included in the β-structure because of the stereochemistry of its side group. It is also undesirable to include in the β-structure the residue preceding proline along the chain, since in the majority of cases this leads to steric difficulties in forming the β-structural sheet. Finally, each of the three types of β-structure is specifically oriented relative to the surface of the globule. Therefore a region of the protein chain with an undistorted β-structural conformation must not have an amino acid sequence whose separate parts correspond to different types of β-structure.
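The remaining sequence rules of this section (the surface and semi-surface types, the Tyr/Trp exceptions and the proline restrictions) can be collected into a second hypothetical checker. It is a sketch under stated assumptions, not part of the original theory: the residue classes are illustrative, the semi-surface type is simplified to an all-Gly or all-hydrophilic region, and the functions only flag sequences; they do not predict structure.

```python
# Illustrative residue classes (assumptions of this sketch): Ala counts as
# hydrophobic; Gly and Pro belong to neither class.
HYDROPHOBIC = {"Val", "Leu", "Ile", "Phe", "Tyr", "Trp", "Met", "Cys", "Ala"}
HYDROPHILIC = {"Lys", "Arg", "Glu", "Gln", "His", "Ser", "Thr", "Asp", "Asn"}


def is_surface_type(strand) -> bool:
    """One side of the band entirely hydrophobic, the other entirely hydrophilic.

    Gly is in neither set, so it is rejected on both sides; Ala is only in
    the hydrophobic set, so it is rejected on the hydrophilic side.
    """
    even, odd = strand[0::2], strand[1::2]
    for hyd_side, phil_side in ((even, odd), (odd, even)):
        if all(r in HYDROPHOBIC for r in hyd_side) and all(r in HYDROPHILIC for r in phil_side):
            return True
    return False


def is_semi_surface_type(strand) -> bool:
    """Simplified: an uninterrupted run of Gly, or a completely hydrophilic region."""
    return all(r == "Gly" for r in strand) or all(r in HYDROPHILIC for r in strand)


def beta_warnings(strand, next_residue=None) -> list:
    """Flag the Tyr/Trp and proline restrictions for a candidate beta-strand.

    next_residue is the residue following the region along the chain, if known.
    """
    warnings = []
    # Tyr or Trp as the only hydrophobic group on one side of the band
    for side in (0, 1):
        hyd = [r for r in strand[side::2] if r in HYDROPHOBIC]
        if len(hyd) == 1 and hyd[0] in ("Tyr", "Trp"):
            warnings.append(f"{hyd[0]} is the only hydrophobic group on side {side}")
    for j, res in enumerate(strand):
        # Trp at j with hydrophobic neighbours at j - 2 and j + 2 on the same side
        if res == "Trp" and j >= 2 and j + 2 < len(strand):
            if strand[j - 2] in HYDROPHOBIC and strand[j + 2] in HYDROPHOBIC:
                warnings.append(f"Trp at {j} is flanked by hydrophobic residues")
        if res == "Pro":
            warnings.append(f"Pro at {j} cannot be in a beta-structure")
        following = strand[j + 1] if j + 1 < len(strand) else next_residue
        if following == "Pro":
            warnings.append(f"residue at {j} precedes Pro")
    return warnings


if __name__ == "__main__":
    print(is_surface_type("Val Lys Ile Thr Phe".split()))  # True
    print(is_semi_surface_type(["Gly", "Gly", "Gly"]))  # True
    print(beta_warnings("Val Lys Trp Ser Leu".split()))
```

Keeping the type predicates separate from the warnings mirrors the text: a region must match one type as a whole, and the Tyr/Trp and proline conditions then rule out particular members of otherwise acceptable regions.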
6. Theory of α-Helical Structure
Globular proteins must not have helices without hydrophobic side groups. Otherwise such helices would be detached from the main part of the globule (violation of requirement 1a) because of the tendency of their hydrophilic side groups to surround themselves with water; or, if such helices were attached to the main part of the globule, hydrophilic groups would penetrate into the hydrophobic core (violation of requirement 2a). Consequently, each separate helical region must have hydrophobic side groups (at least one) that permit the helix to attach itself to the hydrophobic core of the globule without violating the structural requirements (see section 2 above). Helices in globular proteins can also be divided by their geometric localization in the globule into internal and surface types. Here, however, a separate consideration of each type is not obligatory, since we are dealing with a cylinder and not with a band, as in the case of the β-structure. In fact, all the helices in the globule differ only in their degree of immersion in the globule, which depends mainly on the fraction of hydrophobic residues. This permits us to consider helical amino acid sequences in a general way, without dividing helices into types according to their localization relative to the surface of the globule. Since most helices in globular proteins have parameters close to those of the Pauling-Corey α-helix, we shall confine ourselves to this type of helix.
† For definition of this notation, see the following paper, Lim (1974a).
Fig. 3. Schematic representation of the right-handed α-helix. Cβ atoms are designated by filled circles.
The analysis of the immersion of hydrophobic side groups situated on the α-helix surface into the hydrophobic core according to the scheme of Figure 1(a) (see section 3 above) shows (here and hereafter only brief explanations or final conclusions are given; for details see Lim, 1974b) that the approached group makes the most advantageous contacts with any two hydrophobic groups if the latter form the pair (1-5)† from positions i and i + 4 (see Fig. 3). Hence, in α-helical regions of proteins, hydrophobic pairs (1-5) must play the leading role in attaching the α-helix to the protein hydrophobic core. It is then natural to presume that hydrophobic pairs (1-5) are distributed along the whole length of the α-helical cylinder, so that every part of the cylinder is firmly bound to the hydrophobic core. It is seen in Figure 3 that such binding will be best ensured if hydrophobic pairs (1-5) that are neighbours along the chain are "linked" by one to four residues and if there are no unlinked pairs (see Fig. 4). Hydrophobic residues (see Fig. 4(b)) not forming hydrophobic pairs (1-5) must make up hydrophobic triplets (1-2-5) or (1-4-5) with hydrophobic pairs (1-5) (see the hydrophobic residues crossed out in pairs 1 and 6 in Fig. 4). If a hydrophobic group is not included in a hydrophobic triplet, then, when this group interacts with the core, one of the hydrophilic groups neighbouring it on the α-helix surface will penetrate into the hydrophobic core.

† For the definition of this notation, see the following paper, Lim (1974a).

Fig. 4. (a) Possible scheme of hydrophobic pairs (1-5). H, hydrophobic residues; G, hydrophilic residues. The first and second hydrophobic pairs (1-5) are linked by one residue, the second and third by three, and the third and fourth by two residues. The terminal hydrophobic pair (1-4) designated by number 6 is tolerable if one of the hatched residues G is a hydrophilic residue of the first group. (b) Intolerable scheme of distribution of hydrophobic pairs (1-5). The sixth-seventh and seventh-eighth pairs are unlinked. The hatched residue H enters neither the hydrophobic pairs (1-5) nor the hydrophobic triplets (1-2-5) or (1-4-5).

An exception is made for hydrophobic residues that can be situated, without violation of the structural requirements, to the
left along the chain of the N-terminal hydrophobic pair (1-5) and (or) to the right of the C-terminal hydrophobic pair (1-5), one for each terminal fragment of the α-helix. These N- and C-terminal hydrophobic residues must form hydrophobic pairs (1-4) linked by one or two residues with the N- and C-terminal hydrophobic pairs (1-5), respectively; moreover, these hydrophobic pairs (1-4) must form hydrophobic-hydrophilic triplets (1-2-5) or (1-4-5) together with hydrophilic residues of the first group (see Fig. 4(a)). Thus, the distribution of hydrophobic side groups considered here can be briefly formulated as follows: all the hydrophobic residues on the α-helix surface must be included in linked hydrophobic pairs (1-5), terminal pairs (1-4) or hydrophobic triplets (1-2-5) and (1-4-5). (Such hydrophobic pairs and triplets were first examined by Schiffer & Edmundson (1967), but unfortunately the authors of that study imposed extremely rigid conditions on their mutual arrangement on the α-helix surface. Our distribution permits a freer arrangement of hydrophobic groups on the α-helix surface, and the distribution suggested by Schiffer & Edmundson (1967) is one of its particular cases.) Stereochemical analysis shows that for comparatively long α-helices (more than two turns) this distribution of hydrophobic residues satisfies the structural requirements better than the alternative distributions (see Lim, 1974b). It should be kept in mind, however, that even within this distribution of hydrophobic residues along the protein chain one may sometimes encounter separate fragments of the chain whose amino acid sequence is incompatible with the structural requirements in the α-helical conformation. For example, fragments having hydrophobic groups in positions i, i + 2 and i + 4 (1, 3 and 5) and hydrophilic groups in positions i + 1 and i + 3 (2 and 4) cannot become helical.
Such a fragment in the α-helical state is incompatible with requirement 2a. If hydrophobic groups are in positions i - 1, i and i + 1, the presence of hydrophilic residues of the second group in positions i ± 4 is intolerable: owing to their low mobility and small dimensions, the side groups of these residues would penetrate into the hydrophobic core with their polar parts, which is also incompatible with requirement 2a. There are also several special cases connected with the stereochemical behaviour of residues such as proline and tryptophan (see Lim, 1974b).

A short α-helix can be attached to the hydrophobic core without violation of the structural requirements by a single hydrophobic pair (1-5) or a single triplet (1-2-5) or (1-4-5), as well as by a single hydrophobic residue or a single hydrophobic pair (1-2). This means that there can be α-helices with a single hydrophobic side group or with only two hydrophobic residues that are neighbours along the chain. In all these cases, when there are only one or two hydrophobic groups on the α-helix, the positions spatially approached to the hydrophobic groups (i ± 1, i ± 3 and i ± 4 if the hydrophobic group is in the ith position; see Fig. 3) must contain mainly hydrophilic residues of the first group. The side groups of these hydrophilic residues can shield a local region of the hydrophobic core and form salt and hydrogen bonds with residues distant along the chain (see section 4 above). In this case such "long-range" salt and hydrogen bonds are very important for attaching the α-helix to the globule, since the contribution of hydrophobic interactions from one or two hydrophobic residues is comparatively small. α-Helices whose only hydrophobic residues are three, four or five consecutive residues along the chain cannot be built into the globule without penetration of hydrophilic groups into the protein.

As mentioned in section 4 above, the presence of hydrophilic residues of the first group near hydrophobic groups is very desirable in the protein structure. It is therefore convenient to begin and (or) finish the α-helix with a hydrophilic residue of the first group if this residue forms a hydrophobic-hydrophilic pair (1-2), (1-4) or (1-5) with even one hydrophobic group of the α-helix.
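The hydrophobic-pattern rules stated so far can be condensed into a small pattern check. The sketch below is our illustration, not the author's procedure. It assumes a hypothetical one-letter class encoding (H hydrophobic, P hydrophilic of the first group, Q hydrophilic of the second group, N any other residue) and leaves the assignment of amino acids to these classes to the caller. It does not test the linkage of neighbouring (1-5) pairs, and it accepts a terminal (1-4) pair without checking its first-group hydrophilic partner. It reports hydrophobic residues outside every pair and triplet, the fragment with hydrophobic groups at i, i + 2, i + 4 and hydrophilic groups at i + 1, i + 3, second-group hydrophilic residues at i ± 4 from three consecutive hydrophobic residues, and helices whose only hydrophobic residues form a single run of three to five.

```python
"""Sketch: checking a helix's hydrophobicity pattern against the rules in the text.

Hypothetical one-letter class encoding (an assumption, not the paper's notation):
  'H' hydrophobic, 'P' hydrophilic of the first group,
  'Q' hydrophilic of the second group, 'N' any other residue.
"""

HYDROPHILIC = {"P", "Q"}


def pairs_15(s: str) -> list[tuple[int, int]]:
    """Hydrophobic pairs (1-5): positions i and i+4 both hydrophobic."""
    return [(i, i + 4) for i in range(len(s) - 4) if s[i] == s[i + 4] == "H"]


def triplets(s: str) -> list[tuple[int, int, int]]:
    """Hydrophobic triplets (1-2-5) = (i, i+1, i+4) and (1-4-5) = (i, i+3, i+4)."""
    out = []
    for i, j in pairs_15(s):
        if s[i + 1] == "H":
            out.append((i, i + 1, j))
        if s[i + 3] == "H":
            out.append((i, i + 3, j))
    return out


def uncovered_hydrophobics(s: str) -> list[int]:
    """Hydrophobic positions outside every (1-5) pair, triplet and terminal (1-4) pair.

    Simplified: linkage of neighbouring (1-5) pairs and the first-group hydrophilic
    partner required for a terminal (1-4) pair are not checked.
    """
    pairs = pairs_15(s)
    covered = {k for p in pairs for k in p} | {k for t in triplets(s) for k in t}
    left = [i for i, c in enumerate(s) if c == "H" and i not in covered]
    if pairs:
        first, last = pairs[0][0], pairs[-1][1]
        # At most one terminal residue per end is accepted; it must form a
        # (1-4) pair with a residue already covered by a pair or triplet.
        n_term = [i for i in left if i < first and i + 3 in covered]
        c_term = [i for i in left if i > last and i - 3 in covered]
        for term in (n_term, c_term):
            if len(term) == 1:
                left.remove(term[0])
    return left


def forbidden_fragments(s: str) -> list[str]:
    """Local patterns that the text rules out in the alpha-helical state."""
    problems = []
    for i in range(len(s) - 4):
        w = s[i:i + 5]
        if w[0] == w[2] == w[4] == "H" and w[1] in HYDROPHILIC and w[3] in HYDROPHILIC:
            problems.append(f"H at {i}, {i + 2}, {i + 4} with hydrophilic at {i + 1}, {i + 3}")
    for i in range(1, len(s) - 1):
        if s[i - 1:i + 2] == "HHH":
            for j in (i - 4, i + 4):
                if 0 <= j < len(s) and s[j] == "Q":
                    problems.append(f"second-group hydrophilic at {j} next to HHH centred at {i}")
    h = [i for i, c in enumerate(s) if c == "H"]
    if 3 <= len(h) <= 5 and h[-1] - h[0] == len(h) - 1:
        problems.append("hydrophobic residues form only a single run of 3-5")
    return problems


def check_helix(s: str) -> list[str]:
    """Return a list of rule violations (empty if none were found)."""
    problems = forbidden_fragments(s)
    problems += [f"hydrophobic residue at {i} not in any pair or triplet"
                 for i in uncovered_hydrophobics(s)]
    return problems
```

For instance, `check_helix("HNNNHNNNH")` returns an empty list (two (1-5) pairs sharing the central residue), whereas `check_helix("NHPHPHN")` flags both the H-P-H-P-H fragment and the hydrophobic residue at index 3, which belongs to no pair or triplet.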
It is not difficult to see that the pairs of positions (1-2), (1-4) and (1-5) are spatially approached (see Fig. 3). Stereochemical analysis shows that hydrophobic pairs (1-4), (1-5) and triplets (1-2-5) and (1-4-5), together with the approached groups, will be particularly exposed to contact with water molecules if the hydrophobic groups are situated along the chain beside glycine or serine residues. The proximity of hydrophobic groups to glycine and serine residues along the chain is therefore undesirable. Such proximity can be justified when the glycine or serine residue is situated along the chain between the hydrophobic residue and a hydrophilic residue of the first group. It is seen in Figure 3 that in this situation the hydrophobic and hydrophilic residues will be located on diametrically opposite sides of the cylinder. It is natural to suppose (see section 4 above) that the side group of the hydrophilic residue will be oriented towards the hydrophobic group on the cylinder surface and, in particular, will pass near the glycine or serine side group without steric hindrance. Serine can play the role of a patch on the protein polar shell. In general, the presence of hydrophobic-hydrophilic pairs (1-5) composed of one hydrophobic residue and one hydrophilic residue of the first group is very desirable. Such hydrophilic residues attach the α-helix to the globule more rigidly than other hydrophilic residues. Such an attachment will be particularly effective when there are two or more hydrophilic residues of the first group forming hydrophilic pairs (1-5) linked with hydrophobic pairs (1-5) by three residues (see Fig. 3).

The structural analysis of α-helices in globular proteins presented here does not take into account structural requirement 2b, which will undoubtedly introduce corrections into the amino acid sequences of α-helical regions independently of the other structural requirements. To reveal these corrections, a stereochemical analysis of tight packing in the hydrophobic core of single hydrophobic pairs (1-2), (1-4), (1-5) and hydrophobic triplets (1-2-5) and (1-4-5) on the α-helix surface (see section 3 above) was carried out, and a Table of antihelical pairs and triplets was composed (see the Table and its discussion in the following paper and in Lim, 1974a,b). Requirement 2b will be satisfied if each hydrophobic residue in the α-helix is included in at least one α-helical hydrophobic pair (1-5), terminal α-helical hydrophobic pair (1-4) or α-helical hydrophobic triplet (1-2-5) or (1-4-5) (all the pairs and triplets not included in the Table are α-helical hydrophobic pairs and triplets). In the case of α-helices with a single hydrophobic group or a single hydrophobic pair (1-2), there must be no tyrosine or tryptophan residues and no antihelical hydrophobic pairs (1-2). In other cases there must be no antihelical terminal hydrophobic pairs (1-4) and no antihelical hydrophobic pairs (1-5). These pairs (1-4) and (1-5) are the most deeply immersed in the hydrophobic core, and therefore, to avoid cavities, hydrophobic groups distant along the chain must always approach them. Hydrophobic pairs (1-5) from positions i and i + 4 are an exception when positions i + 1 and i + 3 are occupied by hydrophobic residues and the hydrophobic pairs (1-4) from positions (i, i + 3) and (i + 1, i + 4) are α-helical: two hydrophobic groups can then approach these two hydrophobic pairs (1-4) without forming a cavity in the hydrophobic core. The presence of antihelical hydrophobic triplets (1-2-5) and (1-4-5) will not violate requirement 2b if the 2nd or the 4th hydrophobic residue of such a triplet participates in the formation of other α-helical hydrophobic pairs or triplets and if the hydrophobic pair (1-5) of the antihelical triplet is not antihelical.
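The conditions on antihelical pairs and triplets in a longer helix can be sketched as a check against the Table of antihelical pairs and triplets. The Table itself is given in the following paper (Lim, 1974a) and is not reproduced here; the placeholder below contains only the (1-2-5) triplet Leu-Phe-Leu quoted as an example in the text, so the sketch illustrates the logic rather than the actual Table. Terminal (1-4) pairs, the single-hydrophobic-group case and the exception for antihelical (1-5) pairs with hydrophobic residues at i + 1 and i + 3 are omitted. The function names and the hydrophobic residue subset in the usage lines are hypothetical.

```python
"""Sketch of the requirement-2b check for an alpha-helix (simplified, hypothetical helpers)."""

# Placeholder for the Table of antihelical pairs and triplets (Lim, 1974a).
# Only the (1-2-5) triplet Leu-Phe-Leu quoted in the text is entered here;
# the real Table is much larger, so results are illustrative only.
ANTIHELICAL = {
    ("1-2-5", ("Leu", "Phe", "Leu")),
}

# Position offsets of the motifs considered. Terminal (1-4) pairs and (1-2)
# pairs are omitted in this sketch.
MOTIFS = {"1-5": (0, 4), "1-2-5": (0, 1, 4), "1-4-5": (0, 3, 4)}


def hydrophobic_motifs(seq, hydrophobic):
    """Return (name, positions) for every pair/triplet made only of hydrophobic residues."""
    found = []
    for name, offsets in MOTIFS.items():
        for i in range(len(seq) - offsets[-1]):
            pos = tuple(i + o for o in offsets)
            if all(seq[p] in hydrophobic for p in pos):
                found.append((name, pos))
    return found


def is_antihelical(seq, motif):
    """Look the motif's residues up in the (placeholder) Table."""
    name, pos = motif
    return (name, tuple(seq[p] for p in pos)) in ANTIHELICAL


def satisfies_2b(seq, hydrophobic):
    motifs = hydrophobic_motifs(seq, hydrophobic)
    helical = [m for m in motifs if not is_antihelical(seq, m)]
    # (a) No antihelical (1-5) pair. The i+1/i+3 exception is not handled.
    #     This also guarantees that the (1-5) pair of any antihelical triplet
    #     is itself alpha-helical.
    if any(name == "1-5" and is_antihelical(seq, (name, pos)) for name, pos in motifs):
        return False
    # (b) An antihelical triplet is tolerated only if its 2nd/4th residue
    #     (the middle position) belongs to some other alpha-helical pair or triplet.
    for name, pos in motifs:
        if name != "1-5" and is_antihelical(seq, (name, pos)):
            if not any(pos[1] in other for _, other in helical):
                return False
    # (c) Every hydrophobic residue must lie in at least one alpha-helical motif.
    covered = {p for _, positions in helical for p in positions}
    return all(i in covered for i, res in enumerate(seq) if res in hydrophobic)


if __name__ == "__main__":
    hydrophobic = {"Leu", "Phe", "Ile"}  # illustrative subset only
    helix = ["Lys", "Leu", "Phe", "Glu", "Lys", "Leu", "Ile", "Glu"]
    print(satisfies_2b(helix, hydrophobic))
    print(satisfies_2b(helix[:6] + ["Glu", "Glu"], hydrophobic))
```

With the sample helix the function returns True, because Phe also forms the α-helical pair (1-5) Phe-Ile; replacing Ile by Glu leaves Phe only in the antihelical triplet, and the function returns False.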
For example, if there is an antihelical triplet (1-2-5), Leu-Phe-Leu (see the Table of antihelical pairs and triplets), and Phe forms an α-helical hydrophobic pair (1-5), Phe-Ile, requirement 2b can be satisfied by the two pairs (1-5), Leu-Leu and Phe-Ile.

Competitive interrelations between α-helical and β-structural conformations can arise when α-helical and β-structural regions are searched for. It may turn out, for example, that an α-helical sequence contains one or several fragments formed entirely of hydrophobic residues. Such fragments (see section 5 above) can form the β-structural conformation without violation of the structural requirements. It is natural to conclude that in such cases one long α-helix is energetically more advantageous than one or several shorter β-structural regions, since the α-helix forms more hydrogen bonds than the same chain divided into β-structures. On the other hand, separate helical regions in globular proteins have, as a rule, more residues than separate β-structural regions. This is explained by the fact that the rise per residue of the α-helix is about 2.5 times smaller than that of the β-structure, so that long fragments of the chain prefer to become helical rather than extend into the β-structure, so as not to violate structural requirement 1b. Therefore, to avoid competitive situations between α-helical and β-structural conformations, or a disturbance of the compactness of the globule, β-structural regions should be searched for on the fragments of the chain which cannot be helical, i.e. after the α-helical regions have been found.

7. Irregular Regions in Globular Proteins
In sections 5 and 6, amino acid sequences were found for separate regions of the protein chain which form α-helical and β-structural conformations. Concrete sets of side groups were assembled in a definite manner on the predetermined backbone structure of a separate fragment, so that the resulting structure could be built into the globule without violating the structural principles of globular protein organization. An analogous approach could also be applied to irregular regions if their conformations could be described by a limited set of standard structures. Unfortunately, at present this problem has been solved only partially, by identifying structures such as β- and γ-bends (see Venkatachalam, 1968; Chandrasekaran et al., 1973). Therefore, the task of selecting the "necessary" sequences can be solved only for structures such as β- and γ-bends. The stereochemical analysis of these structures is not touched upon in this paper, but the author would like to express some of his considerations on the development of a theory of irregular regions.

It is characteristic of irregular regions of globular proteins that, for steric reasons, comparatively few intramolecular hydrogen bonds are observed in them, and such regions are localized, as a rule, on the globule surface (their localization inside the globule is intolerable because their NH and CO groups would violate structural requirement 2a), where they are in contact with water molecules. This suggests that water molecules, together with other types of interactions, play an important role in stabilizing the structure of irregular regions (Kuntz, 1972; Berendsen, 1972). Within an irregular fragment of the chain, hydrogen bonds can be formed by water molecules according to schemes a, b and c (see section 4 above) (the author hopes that refinement of globular protein structure determinations will reveal a growing number of water molecules forming hydrogen bonds with the protein, especially in irregular regions). Separate intramolecular hydrogen bonds between peptide groups, as well as hydrogen bonds between hydrophilic side groups and peptide groups, can also be formed in irregular fragments.
Rigid conformations (conformations in which all the angles of internal rotation φ and ψ of the backbone are fixed by hydrogen bonds) can easily be found with the help of such hydrogen bonds for an irregular region with a definite amino acid sequence. It is thus possible to obtain a limited set of energetically advantageous conformations which necessarily includes the native conformation, and, owing to the unique character of the three-dimensional structure of a globular protein (see section 2 above), only this conformation will satisfy all the structural requirements. Therefore, it should not be very difficult to find the native conformation among the rigid ones: the whole set of rigid conformations must be submitted to stereochemical analysis to see which of them satisfies the structural requirements. This analysis will not differ in principle from that carried out in the previous sections of this paper.

It should be noted that the rigidity of the chain stipulated by hydrogen bonds is observed first of all in the regular regions. Which type of regular structure from this class is realized in a globular protein depends on the compatibility of the amino acid sequence with the structural requirements. Thus, one general structural regularity can be presumed along the whole protein chain, namely, the rigidity of the chain stipulated by hydrogen bonds.

8. Conclusion

We see that a logical and stereochemical analysis, based on the main structural principles of the globular organization of protein chains, has made it possible to develop a completely a priori physical theory of globular protein secondary structure which considers all the major types of physical interactions in the protein globule in an aqueous medium. The predicted secondary structure agrees well with experiment for all the proteins whose three-dimensional and primary structures were at the author's disposal at the time of writing (see Lim, 1974a).

It should be noted that the structural requirements on which the theory is based, and consequently the conclusions of the theory, can be violated for separate regions of protein chains. Such violations can be expected mainly at the following points: (a) at points of contact of the polypeptide chain with co-factors, whose stereochemistry is not taken into account in this theory; (b) at points of contact between separate subunits of a protein with quaternary structure, where massive hydrophobic groups not in contact with the hydrophobic core of a subunit can be situated on the surface of that subunit; (c) at the active centres of enzymes, where the packing of the polypeptide chain may be energetically strained because of the need to create an increased affinity for the substrate (Blow & Steitz, 1970). In some cases the protein chain may contain separate fragments whose amino acid sequence is not quite well adjusted by evolution to the structural requirements, but whose presence does not prevent protein self-organization and the realization of its functions. The proportion of such fragments in the protein chain is undoubtedly small, and the major part of the chain conforms to the structural principles found experimentally. It should also be noted that this theory is applicable only to water-soluble globular proteins, since the geometric structural principles and the unique character of the structure, on which the theory is based, are valid only for water-soluble globular proteins.

The author is grateful to O. B. Ptitsyn, A. S. Spirin and A. V. Finkelstein for valuable remarks and discussion of the results of the study.
REFERENCES
Berendsen, H. J. C. (1972). Proc. VIIIth FEBS Meeting, vol. 29, pp. 19-27, North Holland Publishing Company, Amsterdam, London.
Birktoft, J. J. & Blow, D. M. (1972). J. Mol. Biol. 68, 187-240.
Blow, D. M. & Steitz, T. A. (1970). Ann. Rev. Biochem. 39, 63-100.
Chandrasekaran, R., Lakshminarayanan, A. V., Pandya, U. V. & Ramachandran, G. N. (1973). Biochim. Biophys. Acta, 303, 14-27.
Chou, P. Y. & Fasman, G. D. (1974). Biochemistry, 13, 211-245.
Dickerson, R. E. & Geis, I. (1969). The Structure and Action of Proteins, Harper & Row, New York, Evanston, London.
Esipova, N. G. & Tumanyan, V. G. (1972). Mol. biologiya (U.S.S.R.), 6, 840-850.
Flory, P. J. (1969). Statistical Mechanics of Chain Molecules, Interscience, John Wiley & Sons, New York, London, Sydney, Toronto.
Gurskaya, G. V. (1966). Structure of Amino Acids, Nauka, Moscow.
Klapper, M. H. (1971). Biochim. Biophys. Acta, 229, 557-566.
Kotelchuk, D. & Scheraga, H. A. (1969). Proc. Nat. Acad. Sci., U.S.A. 62, 14-21.
Kuntz, I. D. (1972). J. Amer. Chem. Soc. 94, 4009-4012.
Lee, B. & Richards, F. M. (1971). J. Mol. Biol. 55, 379-400.
Lewis, P. N., Gō, N., Gō, M., Kotelchuk, D. & Scheraga, H. A. (1970). Proc. Nat. Acad. Sci., U.S.A. 65, 810-815.
Lim, V. I. (1974a). J. Mol. Biol. 88, 873-894.
Lim, V. I. (1974b). Biofizika (U.S.S.R.), 19, 366-378.
Lim, V. I. & Ptitsyn, O. B. (1970). Mol. biologiya (U.S.S.R.), 4, 372-382.
Lim, V. I. & Ptitsyn, O. B. (1972). Biofizika (U.S.S.R.), 17, 21-33.
Low, B. W., Lovell, F. M. & Rudko, A. D. (1968). Proc. Nat. Acad. Sci., U.S.A. 60, 1519-1526.
Matthews, B. W. (1973). The Proteins, 3rd edn, vol. 3 (Neurath, H. & Hill, R. L., eds), pp. 1-222, Academic Press, New York, London.
Nagano, K. (1973). J. Mol. Biol. 75, 401-420.
Periti, P. F., Quagliarotti, G. & Liquori, A. M. (1967). J. Mol. Biol. 24, 313-322.
Perutz, M. F., Kendrew, J. C. & Watson, H. C. (1965). J. Mol. Biol. 13, 669-678.
Prothero, J. W. (1966). Biophys. J. 6, 367-370.
Ptitsyn, O. B. & Finkelstein, A. V. (1970). Biofizika (U.S.S.R.), 15, 757-768.
Ptitsyn, O. B., Lim, V. I. & Finkelstein, A. V. (1972). Proc. VIIIth FEBS Meeting, vol. 25, pp. 421-431, North Holland Publishing Company, Amsterdam, London.
Robson, B. & Pain, R. H. (1971). J. Mol. Biol. 58, 237-259.
Scheraga, H. A. (1968). Advan. Phys. Org. Chem. 6, 103-184.
Scheraga, H. A. (1971). Chem. Reviews, 71, 195-217.
Schiffer, M. & Edmundson, A. B. (1967). Biophys. J. 7, 121-135.
Venkatachalam, C. M. (1968). Biopolymers, 6, 1425-1436.
Watson, H. C. (1969). In Progress in Stereochemistry, vol. 4 (Aylett, B. S. & Harris, M. M., eds), pp. 299-333, Butterworth & Co. (Publishers) Ltd., London.
Wu, T. T. & Kabat, E. A. (1973). J. Mol. Biol. 75, 13-31.