Taxonomy and conformational analysis of loops in proteins


J. Mol. Biol. (1992) 224, 685-699 Taxonomy and Conformational Analysis of Loops in Proteins Christine S. Ring’, Donald G. Kneller’, Robert Langrid...


J. Mol. Biol. (1992) 224, 685-699

Taxonomy and Conformational Analysis of Loops in Proteins

Christine S. Ring1, Donald G. Kneller1, Robert Langridge1,2 and Fred E. Cohen1,2,3†

Departments of Pharmaceutical Chemistry1, Biochemistry and Biophysics2, and Medicine3
University of California, San Francisco
San Francisco, CA 94143-0446, U.S.A.

(Received 6 September 1991; accepted 6 December 1991)

We propose a general classification scheme for loops, aperiodic segments of protein structure. In an effort to avoid the geometric complexity created by non-repeating φ, ψ angles, a morphologic definition that focuses upon the linearity and planarity of loops is utilized. Out of 432 loops (4 to 20 residues in length) extracted from 67 proteins, 205 are classified as linear (straps), 133 as non-linear and planar (omegas), and 86 as non-linear and non-planar (zetas). The remaining 8 are classified as compound loops because they contain a combination of strap, Ω, and ζ morphologies. We introduce a structural alphabet as a shorthand notation for describing local conformation. The symbols of this alphabet are based on the virtual dihedral angle joining four consecutive alpha carbons. The notation is used to provide a compact description of loop motifs in phosphate binding and calcium binding proteins. Since similar loop conformations form similar "words", the structural sequence facilitates the search for common structural motifs in a family of loops. Contrary to the view of loops as "random coils", we find loops to have positional preferences for amino acid residues analogous to those previously described for β-turns.

Keywords: loop classification; conformational analysis; loop structure; loop motifs; protein structure

† Author to whom all correspondence should be addressed.

0022-2836/92/070685-15 $03.00/0 © 1992 Academic Press Limited

1. Introduction

Pauling et al. (1951a,b) correctly postulated that two periodic structures, α-helices and β-sheets, would be common features in protein structures. The regular nature of these secondary structure elements has facilitated the analysis of sequence preferences, termination signals, and packing arrangements. In contrast, loops, the connections between secondary structures, form aperiodic structures. The non-repeating φ, ψ angles lead to highly variable local conformations. The conformational diversity of loops coupled with the wide distribution in their chain length has complicated a comprehensive analysis of loop structures.

The best characterized aperiodic structures are short and geometrically well defined. Examples include the three residue γ-turns (Rose et al., 1985; Milner-White et al., 1988) and the four residue β-turns (Venkatachalam, 1968; Lewis et al., 1973; Richardson, 1981; Rose et al., 1985; Wilmot & Thornton, 1988, 1990). Gamma turns have been classified into "inverse" and "classic" categories on the basis of differences in the backbone torsion angles of the second residue (Rose et al., 1985). Venkatachalam (1968) classified β-turns into three categories based on a study of the hydrogen bond between the carbonyl oxygen of the first residue and the backbone nitrogen of the last residue. Lewis et al. (1973) broadened Venkatachalam's original specification of β-turns by introducing more geometric freedom into the hydrogen bond definition. Four residues were defined as a β-turn if the distance between the first and fourth alpha carbons was less than 7 Å (1 Å = 0.1 nm). Alpha-helices were excluded explicitly. Using backbone dihedral geometry, Richardson (1981) reclassified β-turns into seven commonly accepted categories (I, I', II, II', VIa, VIb, and IV). Recently, Wilmot & Thornton (1990) have proposed an expanded classification that includes 16 categories. This follows from a shorthand notation for the backbone dihedral angles of the second and third residues in the β-turn.

Several efforts have been directed at understanding the conformations of longer loops. Chothia et al. (1987) have characterized canonical hypervariable loops in the immunoglobulin variable light and variable heavy chain domains. Edwards et al. (1987) and later Rice et al. (1990) and Colloc'h & Cohen (1991) have studied the loops that join α-helices to β-strands in α/β proteins. Leszczynski & Rose (1986) have examined Ω loops, irregular segments of chain where sequentially distant N and C-terminal residues are spatially proximal. Unfortunately, these loops compose only a fraction of all the loops observed in globular proteins. Little or no categorization exists for the majority of loops, which are often termed "random coil" for the lack of a better description. Unger et al. (1989) and Rooman et al. (1990) have addressed this problem by attempting to collect a common set of protein substructures. Since the number of relevant substructures exceeds 50, a simple picture of loop organization has yet to appear from their efforts.

We propose a simple unifying classification scheme that includes previously described loop classes as well as those examples not yet classified. This scheme is based on a systematic study of loops between 4 and 20 residues in length. Using geometric arguments, we divide loops into three main classes: strap loops, Ω loops, and ζ loops. Compound loops represent a fourth class that is composed of two or more elements from the main classes. We also introduce a structural alphabet based on virtual bond dihedral angles that conveniently describes local conformations. With this notation, the conformations of similar functional loops are easily compared. This classification scheme, in conjunction with the structural sequences, provides the vocabulary necessary for a concise description of aperiodic loop structures. From an examination of the correlation between the amino acid sequence and the corresponding structural alphabet sequence, we find that loops have positional preferences of amino acids analogous to those found in β-turns.

2. Methods

(a) Loop data set

Our protein data set consists of 67 high resolution (≤2.5 Å) structures with low sequence homology (<50% sequence identity). Smith & Smith's (1990) sequence alignment program PIMA is used to determine the level of similarity between proteins. In the case of identical subunits, only 1 of the subunits is considered. In the case of non-identical subunits, each is treated separately. The Protein Data Bank designations for the proteins studied are listed in Table 1 (Bernstein et al., 1977).

Loops are defined as the connection between 2 adjacent periodic secondary structural elements, α-helices and β-sheets. Secondary structure is automatically assigned using Richards & Kundrot's (1988) DEFINE program. Although DEFINE also identifies a subset of loops, β-turns and Ω-loops (as specified by Leszczynski & Rose, 1986), the program is only used to identify secondary structure and our more general definition of loops is employed. All putative loops are visually screened in the context of the protein for "missed" secondary structure, especially β-strands. A total of 39 putative loops were deleted from the set because they were designated as a strand by the PDB author's record or because the majority of the structure hydrogen bonds with a DEFINE-assigned β-strand. From this remaining data base, 432 loops with lengths between 4 and 20 residues are extracted. This constitutes our loop data set for further analysis.

Table 1
The loop data set proteins

PDB† | Å | Protein | Reference
1acx | 2.0 | Actinoxanthin | Pletnev & Kuzin (private communications)
1alc | 1.7 | Alpha lactalbumin | Acharya et al. (1989)
1bp2 | 1.7 | Phospholipase | Dijkstra et al. (1981)
1ctf | 1.7 | 50 S ribosomal protein | Leijonmarck & Liljas (1987)
1gcr | 1.6 | Gamma crystallin | Slingsby et al. (private communications)
1hne | 1.84 | Human neutrophil elastase | Navia et al. (1989)
1hoe | 2.0 | Alpha amylase inhibitor | Pflugrath et al. (1986)
1mbd | 1.4 | Myoglobin | Phillips (1980)
1pcy | 1.6 | Plastocyanin | Guss & Freeman (1983)
1phh | 2.3 | Hydroxybenzoate hydroxylase | Schreuder et al. (1988)
1ppt | 1.37 | Avian pancreatic polypeptide | Blundell et al. (1981)
1prc | 2.3 | Photosynthetic reaction center | Miki et al. (1989)
1rhd | 2.5 | Rhodanese | Ploegman et al. (1978)
1sn3 | 1.8 | Scorpion neurotoxin | Almassy et al. (1983)
1tim | 2.5 | Triose phosphate isomerase | Banner et al. (1976)
1ubq | 1.8 | Ubiquitin | Vijay-Kumar et al. (1987)
2abx | 2.5 | Alpha bungarotoxin | Love & Stroud (1986)
2act | 1.7 | Actinidin | Baker & Dodson (1980)
2app | 1.8 | Acid proteinase, penicillopepsin | James & Sielecki (1983)
2apr | 1.8 | Acid proteinase, rhizopuspepsin | Suguna et al. (1987)
2aza | 1.8 | Azurin (oxidized) | Baker (1988)
2cab | 2.0 | Carbonic anhydrase | Kannan et al. (private communications)
2ccy | 1.67 | Cytochrome c (prime) | Finzel et al. (1985)
2cdv | 1.8 | Cytochrome c3 | Higuchi et al. (1984)
2ci2 | 2.0 | Chymotrypsin inhibitor | McPhalen & James (1987)
2cpp | 1.63 | Cytochrome P450 | Poulos et al. (1987)
2cpv | 1.85 | Calcium binding parvalbumin B | Kumar et al. (1990)
2cts | 2.0 | Citrate synthase | Remington et al. (1982)
2cyp | 1.5 | Cytochrome c peroxidase | Finzel et al. (1984)
2fb4 | 1.9 | Human myeloma patient Kol serum | Marquart et al. (1980)
2gn5 | 2.3 | Gene 5 DNA binding protein | Brayer & McPherson (1983)
2hhb | 1.74 | Deoxy-hemoglobin | Fermi et al. (1984)
2liv | 2.4 | Leucine/isoleucine/valine binding protein | Sack et al. (1989)
2lzm | 1.7 | Lysozyme | Weaver & Matthews (1987)
2mhr | 1.7 | Myohemerythrin | Sheriff et al. (1987)
2pab | 1.8 | Prealbumin | Blake et al. (1978)
2paz | 2.0 | Pseudoazurin | Adman et al. (1989)
2pfk | 2.4 | Phosphofructokinase | Rypniewski & Evans (1989)
2prk | 1.5 | Proteinase K | Betzel et al. (1988)
2ptn | 1.55 | Trypsin | Walter et al. (1982)
2rhe | 1.6 | Bence-Jones protein | Furey et al. (1983)
2sec | 1.8 | Subtilisin Carlsberg | McPhalen & James (1988)
2sga | 1.5 | Proteinase A | Moult et al. (1985)
2sns | 1.5 | Staphylococcal nuclease | Legg (1977)
2sod | 2.0 | Superoxide dismutase | Tainer et al. (1982)
2stv | 2.5 | Coat protein of satellite tobacco necrosis virus | Liljas & Strandberg (1984)
3adk | 2.1 | Adenylate kinase | Dreusicke et al. (1988)
3cln | 2.2 | Calmodulin | Babu et al. (1988)
3ebx | 1.4 | Erabutoxin | Smith et al. (1988)
3est | 1.65 | Native elastase | Meyer et al. (1988)
3gap | 2.5 | Catabolite gene activator protein | Weber & Steitz (1987)
3grs | 1.54 | Glutathione reductase | Karplus & Schulz (1987)
3icb | 2.3 | Calcium binding protein | Szebenyi & Moffat (1986)
3tln | 1.6 | Thermolysin | Holmes & Matthews (1982)
451c | 1.6 | Cytochrome c551 (reduced) | Matsuura et al. (1982)
4dfr | 1.7 | Dihydrofolate reductase | Bolin et al. (1982)
4fd1 | 1.9 | Ferredoxin | Stout (1989)
4fxn | 1.8 | Flavodoxin | Smith et al. (1977)
4mdh | 2.5 | Cytoplasmic malate dehydrogenase | Birktoft et al. (1989)
5pti | 1.5 | Trypsin inhibitor | Marquart et al. (1983)
5cpa | 1.54 | Carboxypeptidase A | Rees et al. (1983)
5cyt | 1.5 | Cytochrome c (reduced) | Takano (1984)
6ldh | 2.0 | Lactate dehydrogenase | Abad-Zapatero et al. (1987)
7rsa | 1.26 | Ribonuclease A | Wlodawer et al. (1988)
8adh | 2.4 | Apo-liver alcohol dehydrogenase | Eklund et al. (1976)
8cat | 2.5 | Catalase | Fita & Rossmann (1985)
9pap | 1.65 | Papain | Kamphuis et al. (1984)

† PDB, Protein Data Bank. Å, resolution in Å.

(b) Linearity and flatness

For each loop in the data set, we determine the principal axes of the alpha carbon backbone by computing the eigenvectors and eigenvalues of the inertia tensor. The resulting eigenvectors are the principal axes of rotation. Each loop is transformed so that its principal axes lie along the x-, y-, and z-axis, with x corresponding to the smallest and z corresponding to the largest eigenvalues. (The smallest eigenvalue corresponds to the axis with the largest extent.) We define the extent in the x direction, $E_x$, as the average absolute distance of the loop alpha carbons from the yz-plane:

$$E_x = \frac{1}{N}\sum_{i=1}^{N} |x_i|$$

N is the number of residues in the loop. Similarly, $E_y$ is the average absolute distance from the xz-plane and $E_z$ is the average absolute distance from the xy-plane. By the choice of our principal axes, $E_x \ge E_y \ge E_z$. Linearity is defined as the ratio of $E_y$ to $E_x$. The linearity ratio ranges from 0 to 1, with 0 corresponding to a straight line and 1 describing an object that has identical extent along its 2 axes of greatest extent (i.e. circle or oblate spheroid). Flatness is defined as:

$$\text{flatness} = \frac{\sqrt{2}\,E_z}{\sqrt{E_x^2 + E_y^2}}$$

It varies from a value of 0 for a plane to 1 for a sphere.

Leszczynski & Rose (1986) previously developed a measure of flatness based on the eigenvalues ($\lambda_1 \le \lambda_2 \le \lambda_3$) of the inertia tensor for an all atom representation of the chain. Their flatness measure, which we will call LR-flatness, is:

$$\sqrt{\lambda_3 / \lambda_1}$$

This ratio can range from 1 (sphere) to ∞ (line). As shown by Kneller (1988), the problem with this ratio is that it is difficult to correlate with the geometric notion of flatness. For example, an oblate spheroid (i.e. a pancake) with $\lambda_1 = \lambda_2 < \lambda_3$ can be as LR-flat as a prolate spheroid (i.e. a sausage) with $\lambda_1 < \lambda_2 = \lambda_3$. Furthermore, geometrically flat objects may not result in the same value of LR-flatness since there is no unique value of flatness in this ratio. This is illustrated in Fig. 1, which shows the distribution of LR-flatness values calculated for the alpha carbons of 127 three-residue loops. Since 3 points define a plane, these loops are geometrically flat by definition. Yet a distribution of LR-flatness values is seen for these geometrically flat objects. Leszczynski and Rose compared the average LR-flatness for proteins, 1.77(±0.66), with that for loops, 2.17(±0.51), to conclude that loops are as globular as proteins. As demonstrated in Fig. 1, geometrically flat objects can also have these values. Hence, the Leszczynski and Rose measure seems ill-suited for determining the flatness or globularity of loops.

Figure 1. The distribution of Leszczynski and Rose's flatness value for 3 residue loops. Since 3 points define a plane, all these values describe flat structures. Leszczynski and Rose used the average value of flatness for proteins and loops to argue that loops were as globular as proteins. Because there are no unique values of flatness with this measure, it is impossible to judge the flatness or globularity of an object based solely on this value.

(c) Structural sequence

We have developed a structural sequence as a shorthand notation to represent the conformations of tetrapeptides. By analogy to the dihedral angle formed by 4 atoms linked by chemical bonds, the virtual bond dihedral angle (τ) is the angle formed by 4 consecutive alpha carbons linked by virtual bonds ($C^\alpha_{i-2}$, $C^\alpha_{i-1}$, $C^\alpha_{i}$, and $C^\alpha_{i+1}$). Coincidentally, the common shapes of these consecutive virtual bonds resemble 4 letters of the alphabet: J, L, U and Z (see Fig. 2(a)). The possible dihedral angles can be divided into 4 equal regions carrying these labels. With the angular boundaries listed below, it is possible to separate the common tetrapeptide conformers.

U: -75° (285°) ≤ τ < 15°
L: 15° ≤ τ < 105°
Z: 105° ≤ τ < 195°
J: 195° ≤ τ < 285° (-75°)

The conformation of a loop can be described as a series of Us, Ls, Zs and Js corresponding to the conformations of partially overlapping tetrapeptides.

Figure 2. (a) The definition and illustration of the structural alphabet. The letters J, L, U, and Z were chosen as mnemonic devices for their respective conformations. (b) The histogram shows the distribution of virtual dihedral values seen in loops. The same histogram for proteins shows a pronounced peak at approx. 50° corresponding to helices and a broader peak at approx. 220° corresponding to strands.

(d) Analysis of periodic features in loops

Fourier analysis is the classic strategy for identifying periodic elements in a complex data set. The short length of loops complicates this analysis. Cornette et al. (1987) have suggested a least-squares fit procedure to ameliorate this problem. The results of the 2 methods converge for longer sequences. The hydrophobicity profiles of loops are examined for periodic features using the Eisenberg (1984) consensus hydrophobicity scale. The hydrophobic profile is compared with a test sequence of known periodicity, typically {A cos(kω) + B sin(kω)}, where k is the position in the sequence, ω is the frequency and 2π/ω is the period. Least-squares analysis involves estimating the frequency and period of the given sequence and optimizing A, B, and C to produce the best fit of {C + A cos(kω) + B sin(kω)} to the sequence hydrophobicity profiles. A FORTRAN program written by Cornette et al. (1987) is used to calculate power spectra for the strap, Ω, and ζ loop categories.

(e) Determining the level of significance

Wilmot & Thornton (1988) have defined a statistical measure for the positional preferences of amino acids in β-turns. Using this statistical approach, we define the amino acid preferences at the 4 positions (i, i+1, i+2, i+3) of each element in the structural alphabet (J, L, U, Z). If there were no preferences, the probability of finding a particular amino acid, $p_x$, is:

$$p_x = \frac{\text{No. of amino acid X observed in loop data set}}{\text{Total no. of amino acids in the loop data set}}$$

The sum of the probabilities for all 20 amino acids is 1. Using position i of structural alphabet J as an example, the expected number (μ) of alanines at i is simply np, where n is the number of Js observed in the data set and p is the probability of finding alanine in loops. The expected values for the amino acids are normalized to the compositions found in our loop data set instead of proteins as a whole to avoid a bias toward common loop residues (e.g. serine, glycine). If we assume a binomial distribution, the standard deviation, σ, is $\sqrt{npq}$, where n and p are as before and q equals 1 - p. The significance factor, d (Z-score), is defined as (x - μ)/σ, where x is the number of times that a particular amino acid is observed at position i. The probability that the observed value, x, is due to the random scattering about the expected value, μ, is twice the integral of f(x) with respect to x from d to ∞, where:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

Three values are given for reference:

5%: |d| ≥ 1.97
1%: |d| ≥ 2.57
0.1%: |d| ≥ 3.30

As an example, |d| ≥ 1.97 corresponds to a 5% probability that the number observed is part of the expected distribution, or a 95% confidence level that this observed value is significant.
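The significance test of Methods (e) can be sketched in a few lines of Python. This is a minimal illustration under the binomial assumption stated above, not the program used in this work: the function name and the counts in the usage line are hypothetical, and the two-tailed probability is evaluated with the complementary error function of the standardized deviate instead of by explicit integration.

```python
import math

def positional_z_score(observed, n, p):
    """Return (d, probability) for one amino acid at one alphabet position.

    observed: times the amino acid is seen at the position (x in the text)
    n:        number of tetrapeptides of that alphabet letter in the data set
    p:        background frequency of the amino acid in the loop data set
    """
    mu = n * p                              # expected count
    sigma = math.sqrt(n * p * (1.0 - p))    # binomial standard deviation
    d = (observed - mu) / sigma             # significance factor (Z-score)
    # Two-tailed tail area of the standard normal beyond |d| (assumed normal approximation).
    prob = math.erfc(abs(d) / math.sqrt(2.0))
    return d, prob

# Hypothetical counts, for illustration only.
d, prob = positional_z_score(observed=30, n=200, p=0.08)
```

With these illustrative counts the expected number is 16, so an observation of 30 lies several standard deviations above expectation and would be reported as a significant preference.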

3. Results and Discussion

(a) Loop classification

Loops are examined with the molecular display software MidasPlus (Ferrin et al., 1988) on a Silicon Graphics Personal Iris 4D/25G. The results of our visual examination are contrary to the conclusions of Leszczynski & Rose (1986). Although some loops appeared as globular as proteins, most loops, especially their backbones, appear planar. Inspection of the planar loops reveals two different morphologies. One set contains linear, extended structures while the other contains non-linear and compact structures. When the loops are re-examined, they seem to fall into a natural morphological classification. Loops are generally either linear, non-linear but still flat, or globular. We have defined linearity and flatness to quantify our observations.

Figure 3(a) shows the linearity of all loops in the data set. The histogram of these data suggests two different populations. Based on these data, 0.5 was chosen as the initial cutoff. Loops with linearity less than or equal to 0.5 are designated as linear and the remaining loops are designated non-linear. Upon re-examination in the context of their designated assignments, the loops conform well with our visual expectations for linearity. Figure 4(a) shows the flatness values for the loop data set. Those loops in the non-linear category are designated flat if they have flatness values less than 0.4 and are designated globular otherwise. The cutoff value of 0.4 is chosen because almost all the linear, strap loops have flatness measures less than this value (Fig. 4(b)).

Figure 3. A histogram of linearity of the loop data set as a whole and the corresponding histograms once the loops are divided into linear and non-linear categories.

Figure 4. A histogram of flatness of the loop data set as a whole and the 3 main loop classes. For comparison, the average flatness for the proteins (alpha carbon backbones) in the data set is 0.628(±0.130).

Our measures of linearity and flatness are continuous variables. Thus, it is not surprising that the loops with linearity and flatness values near the cutoffs display intermediate morphologies. The decision to choose 0.5 and 0.4 is somewhat arbitrary but their exact values do not markedly affect our results. The classification of the loops using these quantitative measures into linear, non-linear but still flat, or globular categories agrees with our initial visual assignments. We have also repeated our calculations using an all-atom representation instead of just the alpha carbon backbone. Although the results are shifted toward slightly higher values, they do not change our classification scheme or the conclusions of this work.

A morphological loop classification scheme is presented in Figure 5. This "loop taxonomy" presents a simple picture of aperiodic protein substructures. Re-examination of the loops in their geometric classifications reveals that most loops within each of the three main classes share a similar general morphology. We view the idealized representation of these morphologies as the basic building blocks of loop structures. The few that do not follow this generalization are found to contain more than one of these building blocks. As a result, we designate simple loops as those that contain only one of these main components and compound loops as those that contain any combinations of simple loops.

Figure 5. The loop taxonomy based on the linearity and flatness of the geometric parameters. Loops can be viewed in terms of 3 main structural components which form simple loops. Combinations of these components result in compound loops.

Simple loops are subdivided into linear and non-linear categories. The linear loops, or straps, are irregular extended structures. The term "straps" is borrowed from Richardson & Richardson (1989), who used it to describe general connections between secondary structure elements. The non-linear loops are further subdivided into flat and globular categories. Omega loops are non-linear, flat loops that look like the Greek letter omega (Ω). Our Ω loops are a subset of Leszczynski and Rose's Ω loops. Their definition relied on the close proximity of the segment's termini and this includes most loops in our Ω and ζ categories.
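The classification by linearity and flatness can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the program used in this work: the function names are hypothetical, coordinates are taken about the centroid, the extents are sorted so that E_x ≥ E_y ≥ E_z, and flatness is taken as √2·E_z/√(E_x² + E_y²), a normalization chosen so that the value runs from 0 for a plane to 1 for a sphere. The cutoffs 0.5 and 0.4 are the ones quoted above.

```python
import numpy as np

def extents(ca):
    """Mean absolute alpha carbon coordinate along each principal axis, largest first."""
    ca = np.asarray(ca, dtype=float)
    centered = ca - ca.mean(axis=0)  # assumption: coordinates measured from the centroid
    # Eigenvectors of the scatter matrix coincide with those of the unit-mass inertia tensor.
    _, axes = np.linalg.eigh(centered.T @ centered)
    e = np.abs(centered @ axes).mean(axis=0)
    return tuple(sorted(e, reverse=True))  # (E_x, E_y, E_z)

def linearity_flatness(ca):
    e_x, e_y, e_z = extents(ca)
    linearity = e_y / e_x
    flatness = np.sqrt(2.0) * e_z / np.hypot(e_x, e_y)  # assumed normalization, see lead-in
    return linearity, flatness

def classify(ca, lin_cut=0.5, flat_cut=0.4):
    """Assign a simple loop to the strap, omega or zeta class."""
    linearity, flatness = linearity_flatness(ca)
    if linearity <= lin_cut:
        return "strap"
    return "omega" if flatness < flat_cut else "zeta"

# Idealized test shapes (not protein coordinates): a straight segment and a flat ring.
segment = [(float(i), 0.0, 0.0) for i in range(6)]
ring = [(np.cos(t), np.sin(t), 0.0) for t in np.linspace(0, 2 * np.pi, 12, endpoint=False)]
labels = classify(segment), classify(ring)
```

The idealized segment is assigned to the strap class and the flat ring to the omega class, in line with the morphological meaning of the two measures. Compound loops are not handled by this sketch.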
The zeta loops are the non-linear globular loops, so called because their alpha carbon backbones appear twisted like the Greek letter zeta (ζ). Zeta loops often contain a screw-like twist, reminiscent of an irregular turn of a helix. This creates a significant deviation of the loop from the planar approximation. Examples of these structural categories are illustrated in Figure 6.

Figure 6. Examples of the alpha carbon backbone of the 3 main loop classes and their respective structural alphabets. The extracted loops are from glutathione reductase (3grs: 317-324), human neutrophil elastase (1hne: 70&78E), and photosynthetic reaction center (1prc: 248L-257L), respectively.

Structurally more complicated loops are deconvoluted and are viewed as combinations of strap, Ω, and ζ loops. Figure 7 shows a compound loop from the coat protein of the satellite tobacco necrosis virus (2stv). The loop can be described as a combination of a strap and an Ω loop.

(b) Structural sequences

The lack of a simple conformational description of loops has hindered an analysis of their structures. Unlike helices and strands, which are associated with a single consensus structure, the aperiodic nature of loops provides a wide range of possible conformations. Although classifying a loop as a strap, Ω, or ζ can be the first step in a complete description, the idealized representations can only present a low resolution picture of the loop structure. For most purposes, more detailed structural information is necessary. While computer graphics provides a detailed structural perspective, the visual images are qualitative and this approach is not convenient when many loops are being considered. Backbone dihedral angles (φ, ψ and ω) provide more quantitative specifications of the three-dimensional conformation. However, it is difficult to anticipate the impact of variations in torsion angles on the overall structure. Presenting the backbone dihedral angles on a Ramachandran map can sometimes be helpful for shorter loops but these representations are much too complicated for longer loops. A convenient method that easily conveys conformational information is desirable.

Figure 7. An example of a compound loop. The 19-residue loop from the coat protein of the satellite tobacco necrosis virus (2stv: 97-115) connects a strand to a helix and can be thought of as a strap loop followed by an omega loop.

As we described earlier, tetrapeptide conformations can be represented in a four-letter code: J, L, U, or Z. For instance, all β-turns are classified into three categories, J, L, and U, in our simplified representation. The Z conformer is not observed in a β-turn because of its extended conformation. In general, types I and I' β-turns are L conformers while types II and II' as well as types VIa and VIb β-turns are U conformers. The miscellaneous category IV tends to be Js†. The structures of longer loops are specified as a series of Js, Ls, Us, and Zs that describe the conformations of the partially overlapping tetrapeptides. With practice, structural sequences are easily translated into loop structures and vice versa. For n amino acid residues, there should be a structural sequence of length n-3 that describes the loop's conformation. The structural sequence representations for sample structures from each of the three classes are also shown in Figure 6.

The structural sequence composition of the different loop classes is given in Table 2. Not surprisingly, the most populated conformers, L and J, are also the conformers that coincide with the helical and extended geometries found in regular secondary structures. The compositions of the structural sequences are also indicative of a particular loop class. Strap loops have a greater preference for tetrapeptides with extended conformations (Z and J) while the Ω and ζ loops have a greater preference for more compact or helical conformations (L and U).

Table 2
Structural alphabet composition of the loop classes

                                                     % Composition of tetramers
Loop class    No. of loops in dataset  No. of tetramers     J      L      U      Z
Strap loops   205                      985                36.9   28.4   11.5   23.3
Omega loops   133                      561                30.8   38.2   13.0   18.0
Zeta loops     86                      479                25.3   43.8   14.6   16.3

Eight loops in the dataset were classified as compound loops since they could be best described as a combination of the three main loop classes.

The structural sequence also presents a convenient method for comparing loop structures.
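As an illustration, a structural sequence can be derived from alpha carbon coordinates as sketched below in Python with NumPy. The function names are hypothetical and the sign convention of the dihedral (IUPAC, positive for a right-handed twist) is an assumption of this sketch; the angular boundaries are those given in Methods (U from -75° to 15°, L from 15° to 105°, Z from 105° to 195°, J from 195° to 285°).

```python
import numpy as np

def virtual_dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], defined by 4 consecutive alpha carbons."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    v = b0 - np.dot(b0, b1) * b1   # component of b0 perpendicular to the central virtual bond
    w = b2 - np.dot(b2, b1) * b1   # component of b2 perpendicular to the central virtual bond
    return np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))

def letter(tau):
    """Map a virtual dihedral angle (degrees) onto U, L, Z or J."""
    index = int(((tau + 75.0) % 360.0) // 90.0)   # four 90-degree bins starting at -75 degrees
    return "ULZJ"[min(index, 3)]                  # guard against floating point wrap at 360

def structural_sequence(ca):
    """Structural sequence of length n-3 for n alpha carbon positions."""
    ca = np.asarray(ca, dtype=float)
    return "".join(letter(virtual_dihedral(*ca[i:i + 4])) for i in range(len(ca) - 3))

# Idealized helix (radius 2.3 A, rise 1.5 A, 100 degrees per residue), used only as a check.
helix = [(2.3 * np.cos(np.radians(100 * i)), 2.3 * np.sin(np.radians(100 * i)), 1.5 * i)
         for i in range(7)]
word = structural_sequence(helix)
```

For the idealized helix every virtual dihedral is close to 50°, so all four overlapping tetrapeptides receive the letter L, consistent with the helical peak noted for Figure 2(b).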
The traditional approach for comparing protein substructures involves calculating the root-mean-square (r.m.s.‡) deviation of atomic positions of two optimally superimposed structures. When comparing loop structures, the similarity between any two loops is reduced to a single number. If the loops share a common motif in only part of the structures, the calculated r.m.s. deviation will not necessarily reflect this similarity. By contrast, the structural sequence breaks down structures into tetrapeptide units. In this notation, similar conformations form similar "words". Whole loop words can either be matched in their entirety or they can be sub-matched within longer loop words. In this way, a common motif can be easily recognized in a family of structures. In addition, longer structural sequences can be compared by exploiting algorithms and software already developed for DNA and protein sequence comparisons.

We utilize the structural sequences to describe in detail the conformations of two loop motifs identified previously: calcium binding loops, or EF-hands (Richardson & Richardson, 1988) and phosphate binding loops, or P-loops (Saraste et al., 1990). The P-loops are ζ loops, and three examples given by Saraste et al. for which co-ordinates exist in the PDB follow the same structural motif, LZJZ. On the other hand, examination of the EF-hands specified by Richardson & Richardson (1988) suggests that there are two related conformers. As shown in Figure 8, the calcium binding loops are Ω loops and follow the same structural motif with the exception of the first EF-hand found in the calcium binding protein from bovine intestine, 3icb1 (colored blue). The structural sequence of 3icb1 is ULJL instead of the consensus JZJL. Calculating the r.m.s. deviation of the alpha carbon backbone supports this differentiation (Table 3). The mean r.m.s. deviation of 3icb1 is 2.170(±0.124) Å from the rest of the members of this family. The mean r.m.s. deviation amongst the rest of the loops excluding 3icb1 is 0.378(±0.207) Å. A similar type of analysis of the immunoglobulins may aid in the classification of the canonical loop structures (Chothia et al., 1987).

† The four-residue β-turns are almost always Ωs and ζs depending on the angles between the two connecting anti-parallel β-strands. In general, Us tend to be Ωs, and Ls and Js are both Ωs and ζs.

‡ Abbreviations used: r.m.s., root-mean-square; PDB, Protein Data Bank.
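The word matching described in this section reduces to ordinary string operations, as in the short Python sketch below. The function names and the longer example word are hypothetical; the motifs JZJL and ULJL are the EF-hand words quoted above.

```python
def find_motif(word, motif):
    """Start positions at which a motif is sub-matched within a longer structural word."""
    k = len(motif)
    return [i for i in range(len(word) - k + 1) if word[i:i + k] == motif]

def mismatches(a, b):
    """Number of positions at which two structural words of equal length differ."""
    if len(a) != len(b):
        raise ValueError("structural words must have equal length")
    return sum(x != y for x, y in zip(a, b))

# "ZZJZJLLU" is a made-up loop word used only to show sub-matching.
hits = find_motif("ZZJZJLLU", "JZJL")
differences = mismatches("JZJL", "ULJL")
```

Here the consensus EF-hand word is found starting at the third tetrapeptide of the made-up loop, and the 3icb1 word differs from the consensus at its first two letters. For longer words, standard sequence alignment tools could replace the exact match used in this sketch.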

Figure 8. (a) Alpha carbon backbones of calcium binding EF-hands superimposed. Pictured are: 4 loops from calmodulin (3cln: 21-27, 57-63, 94-100, and 130-136), 2 loops from parvalbumin (4cpv: 52-58 and 91-97), 2 loops from calcium binding protein (3icb: 16-22 and 55-61) and 4 loops from troponin (5tnc: 31-37, 67-73, 107-113, and 143-149). The consensus structural sequence is JZJL while 3icb1 (blue) is ULJL. (b) Alpha carbon backbones of phosphate binding loops superimposed. Pictured are: adenylate kinase (3adk: 15-22), elongation factor Tu (1etu: 18-25), and p21 protein (2~21: 11-17). The consensus structural sequence is LZJZ.

Table 3. R.m.s. deviation (Å) among the Ca binding loops

       3cln1  3cln2  3cln3  3cln4  3icb1  3icb2  4cpv1  4cpv2  5tnc1  5tnc2  5tnc3  5tnc4
3cln1   0.00   0.21   0.24   0.42   2.26   0.15   0.23   0.25   0.37   0.79   0.13   0.25
3cln2   0.21   0.00   0.39   0.34   2.06   0.25   0.27   0.12   0.47   0.73   0.27   0.21
3cln3   0.24   0.39   0.00   0.49   2.34   0.20   0.27   0.36   0.40   0.91   0.19   0.31
3cln4   0.42   0.34   0.49   0.00   2.02   0.39   0.44   0.28   0.45   0.69   0.38   0.24
3icb1   2.26   2.06   2.34   2.02   0.00   2.23   2.22   2.07   2.30   1.98   2.26   2.13
3icb2   0.15   0.25   0.20   0.39   2.23   0.00   0.21   0.26   0.39   0.78   0.13   0.24
4cpv1   0.23   0.27   0.27   0.44   2.22   0.21   0.00   0.28   0.42   0.84   0.19   0.28
4cpv2   0.25   0.12   0.36   0.28   2.07   0.26   0.28   0.00   0.43   0.71   0.25   0.13
5tnc1   0.37   0.47   0.40   0.45   2.30   0.39   0.42   0.43   0.00   0.71   0.35   0.33
5tnc2   0.79   0.73   0.91   0.69   1.98   0.78   0.84   0.71   0.71   0.00   0.83   0.70
5tnc3   0.13   0.27   0.19   0.38   2.26   0.13   0.19   0.25   0.35   0.83   0.00   0.22
5tnc4   0.25   0.21   0.31   0.24   2.13   0.24   0.28   0.13   0.33   0.70   0.22   0.00

(c) Periodicity of hydrophobic residues

Alpha-helices and β-strands display characteristic hydrophobic periodicities of 3.6 and 2, respectively. We wish to see if strap, Ω and ζ loops also display periodic behavior. Using the Eisenberg (1984) consensus hydrophobicity scale, each loop's amino acid sequence is converted into a sequence of hydrophobicities. These are examined for periodic features using the method of Cornette et al. (1987). Power spectra for the three loop classes are shown in Figure 9. Lower frequencies are deleted in an effort to minimize noise. This seems justified since

frequencies below 20 cycles/residue correspond to a periodicity greater than the longest loop length in the data set.

Strap and ζ loops display some characteristic periodicities. Although the periodicities of these loops are not robust enough to predict loop class based on sequence alone, they corroborate some visual observations. The strap loop power spectrum contains a peak that corresponds to a periodic feature of 2.7 residues/cycle. We believe that this may be due to an averaging of the β-strand-like and α-bulge-like structures that are prevalent in strap loops. There is also a much smaller peak corresponding to a periodic feature of 4.2 residues/cycle. In some cases, this represents some "loose α-helical coils", which at longer lengths contribute to a linear structure. This 4.2 residues/cycle peak is the dominant feature in the ζ loop power spectrum. Since ζ loops often contain one or more loose helical regions, the 4.2 residue repeat may reflect an unraveling of the 3.6 residue repeat seen in α-helices. There are no significant peaks observed for Ω loops. This may be due to the observation that hydrophobic residues are often found at the ends of the Ω loops, anchoring the loop to the rest of the protein. Since this periodicity spans the length of the loop, it will not be evident in the power spectrum.

Figure 9. The power spectra for strap, omega and zeta loops. The strap loops show a peak at 130 cycles/residue corresponding to a periodicity of 2.7 residues/cycle and the zeta loops show a broad peak at 83 cycles/residue corresponding to a periodicity of 4.3 residues/cycle. No significant peak is observed for omega loops.
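The kind of calculation behind such power spectra can be sketched as follows. This is a plain Fourier power spectrum of a mean-centred hydrophobicity profile, offered as a simplified stand-in and not as the exact procedure of Cornette et al. (1987). It assumes that the frequency axis is an angle in degrees per residue, which is our reading of the quoted peaks (360/130 ≈ 2.8 and 360/83 ≈ 4.3 residues/cycle). The function names and the cut-off arguments are hypothetical.

```python
import cmath
import math


def power_spectrum(hydrophobicity, angles_deg):
    """Fourier power of a mean-centred hydrophobicity profile at each angle."""
    n = len(hydrophobicity)
    mean = sum(hydrophobicity) / n
    centred = [h - mean for h in hydrophobicity]
    spectrum = []
    for angle in angles_deg:
        omega = math.radians(angle)
        amplitude = sum(h * cmath.exp(1j * omega * k) for k, h in enumerate(centred))
        spectrum.append(abs(amplitude) ** 2)
    return spectrum


def dominant_period(hydrophobicity, min_angle=20, max_angle=180):
    """Angle (degrees/residue) of the largest peak and the matching period.

    Angles below `min_angle` are skipped, mirroring the deletion of the
    lowest frequencies described in the text.
    """
    angles = list(range(min_angle, max_angle + 1))
    spectrum = power_spectrum(hydrophobicity, angles)
    best = angles[max(range(len(angles)), key=spectrum.__getitem__)]
    return best, 360.0 / best
```

A strictly alternating toy profile such as `[1, -1] * 6` peaks at 180 degrees per residue, that is, a period of 2 residues, as expected for an idealized β-strand pattern. Real input would be the per-residue hydrophobicities of a loop on whichever scale is chosen.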

(d) Amino acid positional preferences

We have examined the amino acids in each of the four positions (i, i+1, i+2, i+3) in the structural sequences for possible positional preferences. Loops with lengths greater than four residues are considered as a series of overlapping tetrapeptides. The results are shown in Figure 10. Not surprisingly, glycine and proline exhibit the most positional preference. In general, the L conformer contains many of the same positional preferences seen by Wilmot & Thornton (1988) for type I β-turns. Both display a strong preference for proline, glutamic acid and serine at i+1 and glycine at both i+2 and i+3. The main differences seem to be in position i. Although strong preferences for asparagine, aspartic acid, and serine are seen for type I turns, they are absent in the Ls. These amino acids are thought to be preferred in type I turns because they readily form hydrogen bonds with the backbone nitrogen of i+2. Presumably, the reason for this difference is that type I turns are only a subset of the L conformers. While hydrogen bonding ability is important for type I turns, it is not necessarily so for Ls. A direct comparison of our statistics with the Wilmot and Thornton results is complicated because we use a different normalization procedure. We choose to normalize to the amino acid composition found in loops, instead of proteins as a whole, to avoid bias toward common loop residues.

The U conformer shares some of the characteristics of type II turns. Because the φ ψ values in the left-handed α-helical region are often necessary to accommodate an abrupt reversal of the chain direction, there is an overwhelming preference for glycine at i+2. A striking difference between type II turns and Us is shown in the preference for proline. In type II turns, proline is dominant at i+1 while in Us, proline is dominant at i but disfavored at i+1. At both positions i+1 and i+2 little preference is seen for one amino acid over others.
There are no known positional preferences with which structural alphabets J and Z can be compared. The differences in the amino acid preferences for the J, L, U, and Z conformers argue for some correlation between sequence and structure. However, much more work is needed to uncover these relationships.
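The bookkeeping behind such positional statistics can be sketched as below. Each loop is cut into overlapping tetrapeptides, residues are counted per structural letter and position, and the counts are normalized to the amino acid composition of the loops themselves. The ratio returned here is a simple observed/expected frequency and is only a stand-in for the statistical d values of Figure 10. The function name and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def positional_preferences(loops):
    """Observed/expected residue frequencies per (letter, position).

    `loops` is an iterable of (amino_acid_sequence, structural_word) pairs,
    where a loop of n residues carries a word of n - 3 letters, one letter
    per overlapping tetrapeptide. Positions 0..3 stand for i..i+3.
    """
    background = Counter()          # amino acid composition of all loops
    counts = defaultdict(Counter)   # (letter, position) -> residue counts
    for sequence, word in loops:
        if len(word) != len(sequence) - 3:
            raise ValueError("expected one letter per overlapping tetrapeptide")
        background.update(sequence)
        for k, letter in enumerate(word):
            for position in range(4):
                counts[(letter, position)][sequence[k + position]] += 1
    total = sum(background.values())
    preferences = {}
    for key, residue_counts in counts.items():
        n = sum(residue_counts.values())
        preferences[key] = {
            aa: (c / n) / (background[aa] / total)
            for aa, c in residue_counts.items()
        }
    return preferences
```

With the toy input `[("AGGA", "U"), ("PGAA", "U")]`, glycine at position i+1 of U scores above 1 (favored relative to the loop background) and alanine at position i scores exactly 1 (no preference). The sequences are invented for illustration only.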

Figure 10. Statistical d values of the 20 amino acids for the 4 structural alphabets. The positional preferences for structural alphabet J: (a) i, (b) i+1, (c) i+2, and (d) i+3. The positional preferences for structural alphabet L: (e) i, (f) i+1, (g) i+2, and (h) i+3. The positional preferences for structural alphabet U: (i) i, (j) i+1, (k) i+2, and (l) i+3. The positional preferences for structural alphabet Z: (m) i, (n) i+1, (o) i+2, and (p) i+3. The horizontal axis of each panel lists the amino acids in the order ACDEFGHIKLMNPQRSTVWY.

4. Conclusions

Although regular structures are adequately described as being either helical or extended, there is no simple analogy that describes the aperiodic nature of loop conformations. The non-repeating values of φ ψ angles produce a continuum of possible local conformations from helical to fully extended. This diversity of possible structures, especially for longer loops, together with the lack of an adequate method for their description, is responsible for references to loops as "random coils". In an effort to address this problem, we present a stratified picture of loop organization. Geometric variables, linearity and flatness, define a natural morphological classification for loop structures. The three main classes, linear, non-linear and flat, and globular, have been named strap, Ω and ζ loops, respectively. The similar general morphologies shared by the members of these classes may be viewed as the basic structural motifs of loop conformations. Simple loops are those that contain only one of these structural motifs while compound loops are constructed from combinations of simple loops.

In addition to the general morphologic description owing to the strap, Ω, or ζ loop classification, detailed conformational information can be further specified using the structural alphabet. The structural alphabet is a shorthand notation for the virtual dihedral angle formed by four consecutive alpha carbons. The three-dimensional structure of a loop is translated into a series of J, L, U, and Zs that describe the conformations of partially overlapping tetramers. The reduction of the three-dimensional structure into a one-dimensional string provides a convenient method for loop comparisons. Because similar conformations form similar words, consensus structural motifs can be used to describe a family of functionally equivalent loops. Establishing consensus structures for such functionally important loops as the calcium binding EF-hands and phosphate binding P-loops facilitates their study by explicitly linking structure to function. Moreover, expanding this type of analysis to families of homologous proteins may eventually extend the canonical loop concept beyond immunoglobulins. Loops in homologous families such as serine proteases may also form a limited number of structures. By examining series of these homologous loops, key residues and the interactions that determine aperiodic conformations may be elucidated.

The knowledge gained from studying loop conformations should be applicable to macromolecular structure modeling projects. So far, homology modeling, which uses a known structure as a scaffold to build the structure of a homologous protein, is the most accurate method for protein structure prediction. The secondary structure framework tends to be relatively well conserved between homologous structures. Because loops are most often sites of insertions and deletions, these are the least conserved elements and the most difficult regions to model correctly. Strategies for modeling loops will become increasingly important as more proteins become candidates for homology modeling. Recent work by Dorit et al. (1990) and Bowie et al. (1991) suggests that structure-based modeling may be extended to proteins that share little sequence identity by correlating sequences with known folding motifs.
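As a concrete footing for the structural alphabet summarized in these conclusions, the virtual dihedral angle of four consecutive alpha carbons can be computed with the standard atan2 formulation sketched below. The helper names are hypothetical. The mapping from angle to the letters J, L, U and Z is deliberately left to a caller-supplied function, because the angle ranges of the alphabet are not restated in this section.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], for four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def encode_loop(ca_coords, classify):
    """One symbol per overlapping set of four consecutive alpha carbons.

    `classify` maps an angle in degrees to a symbol; the caller must supply
    the angle ranges that define the alphabet.
    """
    return "".join(
        classify(virtual_dihedral(*ca_coords[k:k + 4]))
        for k in range(len(ca_coords) - 3)
    )
```

A loop of n residues yields n - 3 symbols, which matches the description of partially overlapping tetramers. Planar cis and trans arrangements of four points give 0 and 180 degrees respectively, which is a convenient sanity check before applying the function to real coordinates.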
If the canonical loop concept can be extended beyond immunoglobulin structure, this, in principle, would allow predictions of loop conformations as accurate as those achieved by Chothia et al. (1989) for hypervariable loops. In the meantime, sequence preferences found for the structural alphabets may be used as an additional filter for traditional loop building procedures. The loop dictionary approach, which involves searching a database of known structures for segments that match certain specified criteria such as the number of residues and end-to-end distance, does not produce a unique solution (Jones & Thirup, 1986). Neither does de novo generation, which builds loop structures by searching through possible φ ψ combinations (Bruccoleri & Karplus, 1987). Although energy calculations can help weed out unacceptable solutions, choosing between acceptable alternatives is difficult. By translating possible loop structures into structural sequences, positional amino acid preferences of the tetrameric units may be used to rank these structures. The likelihood of a particular loop structure could be defined as the product of the likelihoods of the tetrameric component substructures. In this way, the most likely structures will be those that are compatible with the observed amino acid sequence of the loop.

We thank Mcdonald Morris, Scott Presnell and John Troyer for their helpful suggestions and discussions. This work was supported by grants from the National Institutes of Health (GM39900, to F.E.C.; RR1081, to R.L.; GM07175, to C.S.R.), the Defense Advanced Research Projects Agency (N00014-86-K-0757, to F.E.C. and R.L.), the Searle Scholars (F.E.C.) and the Natural Resources and Engineering Research Council of Canada (D.G.K.).

References

Abad-Zapatero, C., Griffith, J. P., Sussman, J. L. & Rossmann, M. G. (1987). Refined crystal structure of dogfish M4 apo-lactate dehydrogenase. J. Mol. Biol. 198, 445-467.
Acharya, K. R., Stuart, D. I., Walker, N. P. C., Lewis, M. & Phillips, D. C. (1989). Refined structure of baboon alpha-lactalbumin at 1.7 Å resolution. Comparison with C-type lysozyme. J. Mol. Biol. 208, 99-127.
Adman, E. T., Turley, S., Bramson, R., Petratos, K., Banner, D., Tsernoglou, D., Beppu, T. & Watanabe, H. (1989). A 2.0-Å structure of the blue copper protein (cupredoxin) from Alcaligenes faecalis S-6. J. Biol. Chem. 264, 87-99.
Almassy, R. J., Fontecilla-Camps, J. C., Suddath, F. L. & Bugg, C. E. (1983). Structure of variant-3 scorpion neurotoxin from Centruroides sculpturatus Ewing, refined at 1.8 Å resolution. J. Mol. Biol. 170, 497-527.
Babu, Y. S., Bugg, C. E. & Cook, W. J. (1988). Structure of calmodulin refined at 2.2 Å resolution. J. Mol. Biol. 204, 191-204.
Baker, E. N. (1988). Structure of azurin from Alcaligenes denitrificans. Refinement at 1.8 Å resolution and comparison of the two crystallographically independent molecules. J. Mol. Biol. 203, 1071-1095.
Baker, E. N. & Dodson, E. J. (1980). Crystallographic refinement of the structure of actinidin at 1.7 Å resolution by fast Fourier least-squares methods. Acta Crystallogr. sect. A, 36, 559-572.
Banner, D. W., Bloomer, A. C., Petsko, G. A., Phillips, D. C. & Wilson, I. A. (1976). Atomic coordinates for triose phosphate isomerase from chicken muscle. Biochem. Biophys. Res. Commun. 72, 146-155.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112 (3), 535-542.
Betzel, C., Pal, G. P. & Saenger, W. (1988). Synchrotron X-ray data collection and restrained least-squares refinement of the crystal structure of proteinase K at 1.5 Å resolution. Acta Crystallogr. sect. B, 44, 163-172.
Birktoft, J. J., Rhodes, G. & Banaszak, L. J. (1989). Refined crystal structure of cytoplasmic malate dehydrogenase at 2.5-Å resolution. Biochemistry, 28, 6065-6081.
Blake, C. C. F., Geisow, M. J., Oatley, S. J., Rerat, B. & Rerat, C. (1978). Structure of prealbumin, secondary, tertiary and quaternary interactions determined by Fourier refinement at 1.8 Å. J. Mol. Biol. 121, 339-356.
Blundell, T. L., Pitts, J. E., Tickle, I. J., Wood, S. P. & Wu, C. W. (1981). X-ray analysis (1.4 Å resolution) of avian pancreatic polypeptide. Small globular protein hormone. Proc. Nat. Acad. Sci., U.S.A. 78, 4175-4179.
Bolin, J. T., Filman, D. J., Matthews, D. A., Hamlin, R. C. & Kraut, J. (1982). Crystal structures of Escherichia coli and Lactobacillus casei dihydrofolate reductase refined at 1.7 Å resolution. I. General features and binding of methotrexate. J. Biol. Chem. 257, 13650-13662.
Bowie, J. U., Luthy, R. & Eisenberg, D. (1991). A method to identify protein sequences that fold into a known three-dimensional structure. Science, 253, 164-169.
Brayer, G. D. & McPherson, A. (1983). Refined structure of the gene 5 DNA binding protein from bacteriophage fd. J. Mol. Biol. 169, 565-596.
Bruccoleri, R. E. & Karplus, M. (1987). Prediction of folding of short polypeptide segments by uniform conformational sampling. Biopolymers, 26, 137-168.
Chothia, C. & Lesk, A. M. (1987). Canonical structures for the hypervariable regions of immunoglobulins. J. Mol. Biol. 196, 901-918.
Chothia, C., Lesk, A. M., Tramontano, A., Levitt, M., Smith-Gill, S. J., Air, G., Sheriff, S., Padlan, E. A., Davies, D., Tulip, W. R., Colman, P. M., Spinelli, S., Alzari, P. M. & Poljak, R. J. (1989). Conformations of immunoglobulin hypervariable regions. Nature (London), 342, 877-883.
Colloc'h, N. & Cohen, F. E. (1991). β-Breakers: an aperiodic secondary structure. J. Mol. Biol. 221, 603-613.
Cornette, J. L., Cease, K. B., Margalit, H., Spouge, J. L., Berzofsky, J. A. & DeLisi, C. (1987). Hydrophobicity scales and computational techniques for detecting amphipathic structure in proteins. J. Mol. Biol. 195, 659-685.
Dijkstra, B. W., Kalk, K. H. & Drenth, J. (1981). Structure of bovine pancreatic phospholipase A2 at 1.7 Å resolution. J. Mol. Biol. 147, 97-123.
Dorit, R. L., Schoenbach, L. & Gilbert, W. (1990). How big is the universe of exons. Science, 250, 1377-1382.
Dreusicke, D., Karplus, P. A. & Schultz, G. E. (1988). Refined structure of porcine cytosolic adenylate kinase at 2.1 Å resolution. J. Mol. Biol. 199, 359-371.
Edwards, M. S., Sternberg, M. J. & Thornton, J. M. (1987). Structural and sequence patterns in the loops of βαβ units. Protein Eng. 1, 173-181.
Eisenberg, D. (1984). Three-dimensional structure of membrane and surface proteins. Annu. Rev. Biochem. 53, 595-623.
Eklund, H., Nordstrom, B., Zeppezauer, E., Soderlund, G., Ohlsson, I., Boiwe, T., Tapia, O. & Branden, C. I. (1976). Three-dimensional structure of horse liver alcohol dehydrogenase at 2.4 Å resolution. J. Mol. Biol. 102, 27-59.
Fermi, G., Perutz, M. F., Shaanan, B. & Fourme, R. (1984). The crystal structure of human deoxyhaemoglobin at 1.74 Å resolution. J. Mol. Biol. 175, 159-174.
Ferrin, T., Huang, C., Jarvis, L. & Langridge, R. (1988). The MIDAS Display System. J. Mol. Graph. 6, 13-37.
Finzel, B. C., Poulos, T. L. & Kraut, J. (1984). Crystal structure of yeast cytochrome c peroxidase refined at 1.7-Å resolution. J. Biol. Chem. 259, 13027.
Finzel, B. C., Weber, P. C., Hardman, K. D. & Salemme, F. R. (1985). Structure of ferricytochrome c(prime) from Rhodospirillum molischianum at 1.67 Å resolution. J. Mol. Biol. 186, 627-643.
Fita, I. & Rossmann, M. G. (1985). The NADPH binding site on beef liver catalase. Proc. Nat. Acad. Sci., U.S.A. 82, 1604-1608.
Furey, W., Wang, B. C., Yoo, C. S. & Sax, M. (1983). Structure of a novel Bence-Jones protein (RHE) fragment at 1.6 Å resolution. J. Mol. Biol. 167, 661-692.
Guss, J. M. & Freeman, H. C. (1983). Structure of oxidized poplar plastocyanin at 1.6 Å resolution. J. Mol. Biol. 169, 521-563.
Higuchi, Y., Kusunoki, M., Matsuura, Y., Yasuoka, N. & Kakudo, M. (1984). Refined structure of cytochrome c3 at 1.8 Å resolution. J. Mol. Biol. 172, 109-139.
Holmes, M. A. & Matthews, B. W. (1982). Structure of thermolysin refined at 1.6 Å resolution. J. Mol. Biol. 160, 623-639.
James, M. N. G. & Sielecki, A. R. (1983). Structure and refinement of penicillopepsin at 1.8 Å resolution. J. Mol. Biol. 163, 299-361.
Jones, T. A. & Thirup, S. (1986). Using known substructures in protein model building and crystallography. EMBO J. 5, 819-822.
Kamphuis, I. G., Kalk, K. H., Swarte, M. B. A. & Drenth, J. (1984). Structure of papain refined at 1.65 Å resolution. J. Mol. Biol. 179, 233-256.
Karplus, P. A. & Schulz, G. E. (1987). Refined structure of glutathione reductase at 1.54 Å resolution. J. Mol. Biol. 195, 701-729.
Kneller, D. G. (1988). Modeling of loops in proteins. Ph.D. Thesis, University of California, Berkeley.
Kumar, V. D., Lee, L. & Edwards, B. F. P. (1990). Refined crystal structure of calcium-liganded carp parvalbumin 4.25 at 1.5 Å resolution. Biochemistry, 29, 1404-1412.
Legg, M. J. (1977). Ph.D. Thesis, Texas Agricultural and Mechanical University.
Leijonmarck, M. & Liljas, A. (1987). Structure of the C-terminal domain of the ribosomal protein L7/L12 from Escherichia coli at 1.7 Å. J. Mol. Biol. 195, 555-579.
Leszczynski, J. F. & Rose, G. D. (1986). Loops in globular proteins: a novel category of secondary structure. Science, 234, 849-855.
Lewis, P. N., Momany, F. A. & Scheraga, H. A. (1973). Chain reversals in proteins. Biochim. Biophys. Acta, 303, 211-229.
Liljas, L. & Strandberg, B. (1984). The structure of satellite tobacco necrosis virus. In Biological Macromolecules and Assemblies (Jurnak, F. A. & McPherson, A., eds), pp. 97-119, John Wiley and Sons, New York.
Love, R. A. & Stroud, R. M. (1986). The crystal structure of alpha-bungarotoxin at 2.5 Å resolution. Relation to solution structure and binding to acetylcholine receptor. Protein Eng. 1, 37-46.
Marquart, M., Deisenhofer, J., Huber, R. & Palm, W. (1980). Crystallographic refinement and atomic models of the intact immunoglobulin molecule Kol and its antigen-binding fragment at 3.0 Å and 1.9 Å resolution. J. Mol. Biol. 141, 369-391.
Marquart, M., Walter, J., Deisenhofer, J., Bode, W. & Huber, R. (1983). The geometry of the reactive site and of the peptide groups in trypsin, trypsinogen and its complexes with inhibitors. Acta Crystallogr. sect. B, 39, 480-490.
Matsuura, Y., Takano, T. & Dickerson, R. E. (1982). Structure of cytochrome c551 from P. aeruginosa refined at 1.6 Å resolution and comparison of the two redox forms. J. Mol. Biol. 156, 389-409.
McPhalen, C. A. & James, M. N. G. (1987). Crystal and molecular structure of the serine proteinase inhibitor CI-2 from barley seeds. Biochemistry, 26, 261-269.
McPhalen, C. A. & James, M. N. G. (1988). Structural comparison of two serine proteinase-protein inhibitor complexes. Eglin-C-subtilisin Carlsberg and CI-2-subtilisin novo. Biochemistry, 27, 6582-6598.
Meyer, E., Cole, G., Radahakrishnan, R. & Epp, O. (1988). Structure of native porcine pancreatic elastase at 1.65 Å resolution. Acta Crystallogr. sect. B, 44, 26-38.
Miki, K., Deisenhofer, J. & Michel, H. (1989). Three-dimensional structure of photosynthetic reaction center from purple bacteria. Tanpakushitsu Kakusan Koso (Protein, Nucleic Acid, Enzyme), 34, 726-740.
Milner-White, E. J., Ross, B. M., Ismail, R., Belhedj-Mostefa, K. & Poet, R. (1988). One type of gamma-turn, rather than the other gives rise to chain-reversal in proteins. J. Mol. Biol. 204, 777-782.
Moult, J., Sussman, F. & James, M. N. G. (1985). Electron density calculations as an extension of protein structure refinement. Streptomyces griseus protease at 1.5 Å resolution. J. Mol. Biol. 182, 555-566.
Navia, M. A., McKeever, B. M., Springer, J. P., Lin, T. Y., Williams, H. R., Fluder, E. M., Dorn, C. P. & Hoogsteen, K. (1989). Structure of human neutrophil elastase in complex with a peptide chloromethyl ketone inhibitor at 1.84 Å resolution. Proc. Nat. Acad. Sci., U.S.A. 86, 7-11.
Pauling, L. & Corey, R. B. (1951a). Configurations of polypeptide chains with favored orientations around single bonds: two new pleated sheets. Proc. Nat. Acad. Sci., U.S.A. 37, 729-740.

Pauling, L., Corey, R. B. & Branson, H. R. (1951b). The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Nat. Acad. Sci., U.S.A. 37, 205-211.
Pflugrath, J. W., Wiegand, G., Huber, R. & Vertesy, L. (1986). Crystal structure determination, refinement and the molecular model of the alpha-amylase inhibitor HOE467A. J. Mol. Biol. 189, 383-386.
Phillips, S. E. V. (1980). Structure and refinement of oxymyoglobin at 1.6 Å resolution. J. Mol. Biol. 142, 531-554.
Ploegman, J. H., Drent, G., Kalk, K. H. & Hol, W. G. J. (1978). Structure of bovine liver rhodanese. I. Structure determination at 2.5 Å resolution and a comparison of the conformation and sequence of its two domains. J. Mol. Biol. 123, 557-594.
Poulos, T. L., Finzel, B. C. & Howard, A. J. (1987). High-resolution crystal structure of cytochrome P450cam. J. Mol. Biol. 195, 687-700.
Rees, D. C., Lewis, M. & Lipscomb, W. N. (1983). Refined crystal structure of carboxypeptidase A at 1.54 Å resolution. J. Mol. Biol. 168, 367-387.
Remington, S., Wiegand, G. & Huber, R. (1982). Crystallographic refinement and atomic models of two different forms of citrate synthase at 2.7 and 1.7 Å resolution. J. Mol. Biol. 158, 111-152.
Rice, P. A., Goldman, A. & Steitz, T. A. (1990). A helix-turn-strand structural motif common in alpha-beta proteins. Proteins, 8, 334-340.
Richards, F. M. & Kundrot, C. E. (1988). Identification of structural motifs from protein coordinate data: secondary structure and first-level supersecondary structure. Proteins, 3, 71-84.
Richardson, J. S. (1981). The anatomy and taxonomy of protein structure. Advan. Protein Chem. 34, 167-339.
Richardson, J. S. & Richardson, D. C. (1988). Helix lap-joints as ion-binding sites: DNA-binding motifs and Ca-binding "EF hands" are related by charge and sequence reversal. Proteins, 4, 229-239.
Richardson, J. S. & Richardson, D. C. (1989). Principles and patterns of protein conformation. In Prediction of Protein Structure and Principles of Protein Conformation (Fasman, G. D., ed.), pp. 1-98, Plenum Press, New York.
Rooman, M. J., Rodriguez, J. & Wodak, S. J. (1990). Automatic definition of recurrent local structure motifs in proteins. J. Mol. Biol. 213, 327-336.
Rose, G. D., Gierasch, L. M. & Smith, J. A. (1985). Turns in peptides and proteins. Advan. Protein Chem. 37, 1-109.
Rypniewski, W. R. & Evans, P. R. (1989). The crystal structure of unliganded phosphofructokinase from Escherichia coli. J. Mol. Biol. 207, 805-821.
Sack, J. S., Saper, M. A. & Quiocho, F. A. (1989). Periplasmic binding protein structure and function. Refined X-ray structures of the leucine/isoleucine/valine-binding protein and its complex with leucine. J. Mol. Biol. 206, 171-191.
Saraste, M., Sibbald, P. R. & Wittinghofer, A. (1990). The P-loop - a common motif in ATP- and GTP-binding proteins. Trends Biochem. Sci. 15, 430-434.
Schreuder, H. A., Van Der Laan, J. M., Hol, W. G. J. & Drenth, J. (1988). Crystal structure of p-hydroxybenzoate hydroxylase complex with its reaction product 3,4-dihydroxybenzoate. J. Mol. Biol. 199, 637-648.
Sheriff, S., Hendrickson, W. A. & Smith, J. L. (1987). Structure of myohemerythrin in the azidomet state at 1.7/1.3 Å resolution. J. Mol. Biol. 197, 273-296.
Smith, J. L., Corfield, P. W. R., Hendrickson, W. A. & Low, B. W. (1988). Refinement at 1.4 Å resolution of a model of erabutoxin B. Treatment of ordered solvent and discrete disorder. Acta Crystallogr. sect. A, 44, 357-368.

Smith, R. F. & Smith, T. F. (1990). Automatic generation of primary sequence patterns from sets of related protein sequences. Proc. Nat. Acad. Sci., U.S.A. 87, 118-122.
Smith, W. W., Burnett, R. M., Darling, G. D. & Ludwig, M. L. (1977). Structure of the semiquinone form of flavodoxin from Clostridium MP. Extension of 1.8 Å resolution and some comparisons with the oxidized state. J. Mol. Biol. 117, 195-225.
Stout, C. D. (1989). Refinement of the 7 Fe ferredoxin from Azobacter at 1.9 Å resolution. J. Mol. Biol. 205, 545-555.
Suguna, K., Bott, R. R., Padlan, E. A. & Subramanian, E. (1987). Structure and refinement at 1.8 Å resolution of the aspartic proteinase from Rhizopus chinensis. J. Mol. Biol. 196, 877-900.
Szebenyi, D. M. E. & Moffat, K. (1986). The refined structure of vitamin D-dependent calcium-binding protein from bovine intestine. Molecular details, ion binding, and implications for the structure of other calcium-binding proteins. J. Biol. Chem. 261, 8761-8777.
Tainer, J. A., Getzoff, E. D., Beem, K. M., Richardson, J. S. & Richardson, D. C. (1982). Determination and analysis of the 2 Å structure of copper, zinc superoxide dismutase. J. Mol. Biol. 160, 181-217.
Takano, T. (1984). Refinement of myoglobin and cytochrome c. In Methods and Applications in Crystallographic Computing (Hall, S. R. & Ashida, T., eds), pp. 262-272, Oxford University Press, Oxford.
Unger, R., Harel, D., Wherland, S. & Sussman, J. L. (1989). A 3D building blocks approach to analyzing and predicting structure of proteins. Proteins, 5, 355-373.
Venkatachalam, C. M. (1968). Stereochemical criteria for polypeptides and proteins. Conformation of a system of three linked peptide units. Biopolymers, 6, 1425-1436.
Vijay-Kumar, S., Bugg, C. E. & Cook, W. J. (1987). Structure of ubiquitin refined at 1.8 Å resolution. J. Mol. Biol. 194, 531-544.
Walter, J., Steigemann, W., Singh, T. P., Bartunik, H., Bode, W. & Huber, R. (1982). On the disordered activation domain in trypsinogen. Chemical labelling and low-temperature crystallography. Acta Crystallogr. sect. B, 38, 1462-1472.
Weaver, L. H. & Matthews, B. W. (1987). Structure of bacteriophage T4 lysozyme refined at 1.7 Å resolution. J. Mol. Biol. 193, 189-199.
Weber, I. T. & Steitz, T. A. (1987). Structure of a complex of catabolite gene activator protein and cyclic AMP refined at 2.5 Å resolution. J. Mol. Biol. 198, 311-326.
Wilmot, C. M. & Thornton, J. M. (1988). Analysis and prediction of the different types of β-turn in proteins. J. Mol. Biol. 203, 221-232.
Wilmot, C. M. & Thornton, J. M. (1990). β-Turns and their distortions: a proposed new nomenclature. Protein Eng. 3, 479-493.
Wlodawer, A., Svensson, L. A., Sjolin, L. & Gilliland, G. L. (1988). Structure of phosphate-free ribonuclease A refined at 1.26 Å. Biochemistry, 27, 2705-2717.

Edited by P. E. Wright