J. Mol. Biol. (1979) 128, 49-79

Gene Duplications in the Structural Evolution of Chymotrypsin

(Received 5 July 1978, and in revised form 31 October 1978)
Chymotrypsin and other members of the serine protease enzyme family have a structure built from two similar domains, each of which is a hydrogen-bonded barrel containing six antiparallel strands of β-sheet bonded in the order A-B-C-F-E-D-A. The folding patterns of the domains have been re-examined by several new, improved shape comparison methods to see whether the barrels could have evolved by gene duplication, as proposed by Matthews and Blow (Birktoft & Blow, 1972). The domains have a similar hydrogen-bond pattern, the same shear number (defined in this paper) for the twist of the barrel, and the cores of their β-sheets can be superimposed so that 46 topologically equivalent α-carbons fit within a root-mean-square distance of 2.43 Å and a larger set of 57 α-carbons fit within 3.4 Å. These results are highly significant when judged against shape comparisons of many other proteins with themselves, and give strong evidence for gene duplication. The duplication does not include any SS bridges. Both domains have a surprisingly symmetrical structure of two halves ABC, DEF paired round a dyad axis, and the half-domains are each made of two loops twisted in an L-shape, since the second strand (B or E) is bent into two halves B1, B2 or E1, E2. The cores of the four half-domains, each of 23 α-carbons, superimpose in pairs with root-mean-square distances ranging from 1.73 to 2.45 Å. In the entire molecule the half-domains are related by a screw dyad which converts domain I strands (ABC) (DEF) into domain II strands (DEF) (ABC), superimposing the six strands with a root-mean-square distance of 2.35 Å. These observations suggest that the chymotrypsin barrel originally evolved from a closely-linked dimer of two intertwined half-domains which became united into one domain by gene duplication. The enzyme evolved from a second dimer of two full domains and a second duplication. The bacterial protease B from Streptomyces griseus shows the same structural repeats and is consistent with the gene duplication hypothesis. Improved methods for shape comparison of proteins have been developed which are very fast and reliable.
1. Introduction

(a) Serine protease double domain structure
The original X-ray analysis of α-chymotrypsin (Matthews et al., 1967) showed that the backbone of the protein consists largely of antiparallel β-sheet and extended chain (Sigler et al., 1968; Birktoft et al., 1970) built round two hydrophobic cores. The cores are hydrogen-bonded barrels, or twisted cylinders, and Matthews (see Blow, 1969) noticed that their hydrogen-bonding patterns were similar, as if the entire molecule had been formed from two domains by gene duplication within the single polypeptide chain.
The hydrogen-bonded barrel is a very ancient structure which occurs throughout the chymotrypsin family (Shotton & Hartley, 1970), including not only the closely related serine enzymes elastase and trypsin (Fehlhammer et al., 1977; Kossiakoff et al., 1977; Stroud et al., 1974), but also thrombin (Magnusson et al., 1975), plasminogen (Magnusson et al., 1976) and many of the blood-clotting protein factors (Davie & Fujikawa, 1975; Koide et al., 1977; Fujikawa et al., 1977; Di Scipio et al., 1977). This family in turn is distantly related to the bacterial serine enzymes (W Hai% et al., 1976), since protease B of Streptomyces griseus (Delbaere et al., 1976) contains the same two domain structures even though the peripheral loops of the molecule are very different. The α-lytic protease from Myxobacter 495 probably also folds in a similar way (McLachlan & Shotton, 1971; Delbaere et al., 1975). The question of the evolution of the domain still remains open, but many more protein structures have been solved in the last few years and these have led to a better understanding of the relationships between structures. So it seems useful to re-examine the suggestion of gene duplication. It is no longer true that lack of sequence homology is in itself conclusive evidence against gene duplication. New observations on the folding of β-sheets and new quantitative methods of comparing three-dimensional structures make a fresh approach possible.
(b) Quantitative comparison of structures
Quantitative methods for detecting structural similarity between the α-carbon backbones of proteins usually work by a least-squares superposition of one piece of structure upon another, and were originally developed to compare closely related structures such as chymotrypsinogen with chymotrypsin or erythrocruorin with myoglobin (Freer et al., 1970; Huber et al., 1971). The next application was to compare known repeated structural elements in the same protein, like the two dyad-related calcium-binding sites in parvalbumin (McLachlan, 1972b), but a great advance has been the development of automatic methods which systematically search for matching runs of structure anywhere in a pair of proteins (Rao & Rossmann, 1973; Rossmann & Argos, 1976, 1977). This approach has shown clear structural and evolutionary relationships between the NAD-binding domains of dehydrogenases (Buehner et al., 1973; Rossmann et al., 1974, 1975; Rossmann & Liljas, 1974). There are also probable relationships between distantly related enzymes with similar functions, such as hen lysozyme and bacteriophage T4 lysozyme (Rossmann & Argos, 1976; Remington et al., 1978; Imoto et al., 1972; Remington & Matthews, 1978). In many other cases where two structures are partly similar it is difficult to distinguish between a distant relationship and chance coincidence. Examples are the NAD-binding domains of adenylate kinase and dehydrogenases (Schulz & Schirmer, 1974; Rossmann et al., 1975), the
super-secondary structures of subtilisin and carboxypeptidase (Rossmann & Argos, 1977), parts of parvalbumin and phage lysozyme (Tufty & Kretsinger, 1975), or the haem pockets of cytochrome b5 and haemoglobin (Rossmann & Argos, 1975, 1976). One of the most intriguing is the β-sheet fold of the antibody variable domain, which matches the structural core of Cu-Zn superoxide dismutase (Richardson et al., 1976) but can hardly have a close evolutionary relationship to it.
Rossmann's method of comparison rotates one protein across another and searches for long runs of atoms which coincide (Rossmann & Argos, 1977). Remington & Matthews (1978) have tried a new approach, like the one used by Fitch (1966) for comparing sequences, in which zones of a fixed length, say 50 residues, are matched by a least-squares process. This approach is simple and objective, while the statistical distributions of the matching distances give an empirical measure of significance. The only drawback is that it is difficult to compare related structures which differ by long deletions or insertions of amino acids. In the present work the same approach is used, with improvements which give a 1000-fold increase in computing speed.
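The fixed-zone comparison can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the program used in this work: all atoms are weighted equally, and the names `best_fit_rms`, `sym3_eigenvalues` and `zone_matrix` are hypothetical. For each pair of zones it finds the minimal r.m.s. distance after the best proper rotation and translation from the eigenvalues of UᵀU, obtained by the trigonometric (cosine) solution of the characteristic cubic, so the rotation matrix itself is never built.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def sym3_eigenvalues(m):
    """Eigenvalues of a symmetric 3x3 matrix, largest first, from the
    trigonometric (cosine) solution of its characteristic cubic."""
    p1 = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if p1 == 0.0:                                   # already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p = math.sqrt((sum((m[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1) / 6.0)
    shifted = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
               for i in range(3)]
    r = max(-1.0, min(1.0, det3(shifted) / 2.0))    # clamp rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]


def best_fit_rms(xs, ys):
    """Minimal r.m.s. distance between two equal-length point lists after the
    best proper rotation and translation; the rotation itself is never built."""
    n = len(xs)
    cx = [sum(p[k] for p in xs) / n for k in range(3)]
    cy = [sum(p[k] for p in ys) / n for k in range(3)]
    a = [[p[k] - cx[k] for k in range(3)] for p in xs]
    b = [[p[k] - cy[k] for k in range(3)] for p in ys]
    # U[i][j] = sum over atoms of b_i * a_j (equal weights assumed)
    u = [[sum(bb[i] * aa[j] for aa, bb in zip(a, b)) for j in range(3)]
         for i in range(3)]
    # eigenvalues of U^T U are the squared singular values of U
    utu = [[sum(u[k][i] * u[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
    s1, s2, s3 = (math.sqrt(max(v, 0.0)) for v in sym3_eigenvalues(utu))
    sign = 1.0 if det3(u) >= 0.0 else -1.0          # forbid reflections
    e0 = 0.5 * sum(v * v for p in a + b for v in p)
    e = e0 - (s1 + s2 + sign * s3)                  # half the minimal sum of squares
    return math.sqrt(max(2.0 * e, 0.0) / n)


def zone_matrix(chain_a, chain_b, zone):
    """rms[i][j] for zone chain_a[i:i+zone] fitted onto chain_b[j:j+zone]."""
    return [[best_fit_rms(chain_a[i:i + zone], chain_b[j:j + zone])
             for j in range(len(chain_b) - zone + 1)]
            for i in range(len(chain_a) - zone + 1)]
```

Each call here rebuilds the sums from scratch; updating them incrementally along the diagonals of the comparison matrix, as described in section 2(b), would avoid most of that work.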
The new zone comparisons will show that the chymotrypsin domains have similar cores which can be superimposed quite precisely when differences in the connecting loops are ignored. Moreover each core is itself made of a pair of consecutive interlocked three-stranded antiparallel β-sheet units which are related by a 2-fold rotation axis. In the complete molecule the two cores have an unusual spatial relationship. There is a 2-fold screw axis running through the centre of the protein, and the screw operation superimposes the two units of domain I onto the two units of domain II in reversed order (but with the same direction of the polypeptide chain). The active site thus consists of the front face of domain I and the back face of domain II. These relationships are illustrated later in Figures 4, 7 and 9.
A. D. McLACHLAN
If speed is unimportant and the 2 sets of atoms fit well, all 3 methods are good. But the rotation angle methods are all slow. The rotation matrix derived by decomposing the most general linear relation L according to the second procedure depends on whether A is fitted to B or vice versa, and does not minimize the r.m.s.† distance. The method used in this paper is a development of the third approach (McLachlan, 1972a), and is described in the Appendix. The need to improve existing algorithms arises because a shape comparison of 2 proteins by the zone method may need as many as 10⁵ least-squares fits. High speed is essential, and all the rare special cases must be treated correctly to avoid reliability errors.

(b) Computer programs
To fit atoms 1, ..., N of sets A and B to one another requires the matrix U, the sums of squares Σw a² and Σw b², and the centroids of A and B. These quantities are defined in the Appendix. Once they have been evaluated they can be updated for the next zone 2, ..., (N + 1) by deleting atom 1 and adding atom (N + 1). Thus the most efficient procedure is to work parallel to the diagonals of the comparison matrix, saving the results to be rebuilt in rows and columns later. To calculate E without finding R, the cubic equation for the positive eigenvalues of UU' is solved by the cosine method, which takes 1/1000 s on an IBM 370/165. To calculate R by decomposing U using the eigenvectors of the 6 × 6 matrix Ω takes 1/100 s. Thus a comparison of 2 proteins 150 residues long over a zone length of 50 takes only 10 s if R is not calculated, and it is possible to survey many proteins in a short time. During the calculations the mean and standard deviation of E are found, and the matrix is then printed out as a pattern of contour symbols which show the most significant relationships between zones.

(c) Screw transformations

The rigid-body transformation which superimposes a structure A onto another one B includes the translation which moves A, the centroid of set A, onto the other centroid B. It is often useful to express the full transformation

$$\mathbf{y} - \mathbf{B} = R(\mathbf{x} - \mathbf{A}) \qquad (1)$$

as a screw (Muirhead et al., 1967) consisting of the rotation R about the axis l through a suitably chosen centre C, combined with a translation by a distance t along l. To do this, first resolve the vector B - A into parts parallel and perpendicular to l:

$$\mathbf{B} - \mathbf{A} = \mathbf{s} + t\,\mathbf{l},$$

where t = (B - A)·l. Now choose C to lie on the plane normal to l through the midpoint of A and B. A suitable transformation is then

$$\mathbf{y} - \mathbf{C} = R(\mathbf{x} - \mathbf{C}) + t\,\mathbf{l}, \qquad (2)$$

where

$$\mathbf{C} = \tfrac{1}{2}\left[(\mathbf{l} \times \mathbf{s})\cot(\theta/2) + (\mathbf{B} + \mathbf{A})\right] \qquad (3)$$

and θ is the rotation angle.
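Equations (1) to (3) translate directly into code. The sketch below is illustrative only: it assumes R is a proper rotation matrix given as nested lists, and `screw_parameters` and `matvec` are hypothetical names. It reads the axis from the antisymmetric part of R, which becomes ill-conditioned as θ approaches 0° or 180°; near a true dyad the axis would be better taken from a nonzero column of R + I, which equals 2llᵀ at exactly 180°.

```python
import math


def matvec(m, v):
    """Product of a 3x3 matrix (nested lists) with a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def screw_parameters(rot, a, b):
    """Rewrite y - B = R(x - A) (equation 1) as a screw (equation 2).

    rot: proper 3x3 rotation matrix R; a, b: centroids A and B.
    Returns (l, theta_deg, t, c): unit axis, rotation angle in degrees,
    shift along the axis, and the centre C of equation (3)."""
    cos_theta = (rot[0][0] + rot[1][1] + rot[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    sin_theta = math.sin(theta)
    if sin_theta < 1e-6:
        # the antisymmetric-part formula below cannot fix the axis here
        raise ValueError("angle too close to 0 or 180 degrees for this sketch")
    l = [(rot[2][1] - rot[1][2]) / (2.0 * sin_theta),
         (rot[0][2] - rot[2][0]) / (2.0 * sin_theta),
         (rot[1][0] - rot[0][1]) / (2.0 * sin_theta)]
    d = [b[k] - a[k] for k in range(3)]                 # B - A
    t = sum(d[k] * l[k] for k in range(3))              # t = (B - A) . l
    s = [d[k] - t * l[k] for k in range(3)]             # part perpendicular to l
    l_cross_s = [l[1] * s[2] - l[2] * s[1],
                 l[2] * s[0] - l[0] * s[2],
                 l[0] * s[1] - l[1] * s[0]]
    cot_half = 1.0 / math.tan(theta / 2.0)
    # equation (3): C lies on the plane normal to l through the midpoint of A and B
    c = [0.5 * (l_cross_s[k] * cot_half + b[k] + a[k]) for k in range(3)]
    return l, math.degrees(theta), t, c
```

For a 90° rotation about z with A = (1, 0, 0) and B = (3, 0, 3), this gives l = (0, 0, 1), t = 3 and C = (2, 1, 1.5). Applying equation (2) with these values reproduces equation (1) for any point x.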
When the backbones of 2 proteins have a similar shape in part of their structure, the first molecule can be superimposed onto the second by choosing a certain set of atoms as guide points for the least-squares fit. However, it may happen that these atoms are not the best ones to use, or that a larger set of atoms fits equally well. At this stage of the investigation it is useful to draw a plot of all the distances between each α-carbon atom of the rotated molecule A and atoms of the fixed molecule B. The diagram is similar to the well-known Ooi contact map for the atoms of a protein (Ooi & Nishikawa, 1973; Nishikawa & Ooi, 1971; Sawyer et al., 1978), except that there are 2 different structures plotted along the horizontal and vertical axes. Matched lengths of chain show as lines of closely superimposed atoms running parallel to a diagonal of the plot, and gaps or insertions show as clear breaks. If the 2 structures are very similar in overall architecture, the plot may also show other ghost coincidences which mimic the atomic contacts within A itself or B itself. Once a possible relationship has been tried and the molecules brought into rough coincidence, the fit can be improved repeatedly by using the superposition plot to select new guide points until the best match is found. Figure 11, later, illustrates such a plot.

† Abbreviation used: r.m.s., root-mean-square.
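Here the superposition plot is read by eye, but the same idea can be automated. The helper below is hypothetical: it scans every diagonal of the distance matrix between the rotated α-carbons of A and the fixed α-carbons of B for runs of close pairs. The cutoff and minimum run length are arbitrary assumptions, not values used in this work.

```python
import math


def matched_runs(a_rot, b_fixed, cutoff=3.5, min_len=3):
    """Runs of consecutive close pairs along the diagonals j = i + offset
    of the superposition distance plot.

    Returns (start_in_a, start_in_b, length) for every run of at least
    min_len pairs whose distance is below cutoff (coordinates in angstroms)."""
    runs = []
    na, nb = len(a_rot), len(b_fixed)
    for offset in range(-(na - 1), nb):          # one pass per diagonal
        first_i = max(0, -offset)
        last_i = min(na, nb - offset) - 1
        start, length = None, 0
        for i in range(first_i, last_i + 1):
            if math.dist(a_rot[i], b_fixed[i + offset]) < cutoff:
                if length == 0:
                    start = i
                length += 1
            else:
                if length >= min_len:
                    runs.append((start, start + offset, length))
                length = 0
        if length >= min_len:                     # run reaching the end of the diagonal
            runs.append((start, start + offset, length))
    return runs
```

An insertion in B shows up as two runs on neighbouring diagonals, which matches the "clear breaks" described above.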
3. Statistical Significance of Fits

If the folded structures of proteins resembled frozen pieces of random coil, it would be possible to estimate the a priori probability distribution of the least-squares fitting distances in a zone comparison of two unrelated structures. Rossmann & Argos (1976) have attempted to use a criterion of this kind. But most globular proteins contain long regions of regular secondary structure, which are packed against one another in a fairly regular succession (for example the prevalent βαβ parallel sheets). This means that when the zone length is long, the distribution of comparison distances in a typical protein is almost unpredictable and rather irregular. Remington & Matthews (1978) have used cumulative probability plots to look for significant deviations from a normal distribution of distances. But the distribution is not always normal, even in proteins which appear to have perfectly typical structures without obvious repeats. At this stage of our knowledge it seems better to use a purely empirical measure. A sample of typical proteins representing the main structural types was taken, and each α-carbon backbone was compared with itself to give a table of significance thresholds (Table 1). For example, if 1% of the fits from a particular structure with a zone length of 21 amino acids have an r.m.s. distance less than 1.9 Å, then the distance of 1.9 Å is said to lie at the 1% level for this particular structure.
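The empirical measure amounts to ranking one protein's own comparison distances. The sketch below is minimal and hypothetical: it assumes the sample is a plain list of r.m.s. values from a single protein, and it uses a nearest-rank quantile. The text does not specify an interpolation rule, so that choice is an assumption.

```python
import math


def significance_level(distance, sample):
    """Fraction of the sampled fits whose r.m.s. distance is at or below `distance`."""
    return sum(1 for d in sample if d <= distance) / len(sample)


def threshold_at(level, sample):
    """Smallest sampled distance whose significance level reaches `level`
    (nearest-rank quantile), e.g. level=0.01 for the 1% threshold."""
    ranked = sorted(sample)
    k = max(1, math.ceil(level * len(ranked) - 1e-9))   # tolerance for float rounding
    return ranked[k - 1]
```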
TABLE 1 Probability
distributions
(zone length
All
for
zone least-squares
of 55 residues
along
the protein,
α-helix Myoglobin
α with β parallel Subtilisin
Carboxypeptidase A Adenylate kinase Triose phosphate isomerase Lactate
dehydrogenase
275 307 194 ‘47 3%
7.0 6.0 7.1 o.:i* .5.7*
0.5 (i-i
151 “37 236 197
8.4 8.2’ 7.2* :3.4*
x.4 8.6 7.6 3.6
7.X 0.7 7.’
fitting
distances
backbone)
7.3
7.x
10.9
!I.:! 7.9
10.1 9.3
12.5 12.2
7.x 7.2 i.4
8.7 8.4 9.0
12.0 11.X 12.4
All β-sheet
Superoxide dismutase Concanavalin A Chymotrypsin Chymotrypsin core
x.x IO.2 X.5 4.7
9.6 11.5 10.1 8.2
12.3 14.9 13.5 12.2
Comparisons include all non-overlapping matches of zones of α-carbon atoms within each protein. Distances are the threshold values for the 3 significance levels. Proteins marked (*) have some obvious repetitive features which may not be typical. The chymotrypsin core is discussed in section 3(a) of the text.
The distributions depend strongly on the zone length. In tests on over 20 proteins a length of 55 residues proved convenient for exploring large-scale organization (although lengths ranging from 33 to 131 have been used to follow up interesting features), and 21 residues for looking at finer details. In chymotrypsin itself a zone of 13 was used, but this is too short to be of value in most other proteins, especially α-helical ones. When a protein is compared with itself, two matched regions may overlap, e.g. residues (1 to 21) compared with (9 to 29), and such overlapping matches near the diagonal of the shape comparison matrix tend to give exceptionally good fits, because secondary structure persists locally along the length of the protein. So in Tables 1 and 2 all the figures refer to the statistically independent non-overlapping comparisons. The significance of a given r.m.s. fitting distance varies quite strongly from one protein to another. It depends on the structural type (Schulz & Schirmer, 1974; Levitt & Chothia, 1976) and the ratio of α-helix to β-sheet. For example, with 55-residue zones and a distance of 8.0 Å the significance level is trivial (50%) in the α-helical myoglobin, 1.1% in the mixed α-β protein carboxypeptidase, and zero in the β-sheets of concanavalin A. The same trend is even stronger with 21-residue zones (Table 2). Finally there is the question of deletions and insertions, which is so important in the comparison of the one-dimensional patterns in protein sequences. Haber & Koshland (1970) and Needleman & Wunsch (1970) showed how two completely unrelated sequences may be made to look alike by postulating only a very few gaps in the comparison. Is the same caution necessary when comparing three-dimensional structures?
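Restricting the statistics to independent comparisons means keeping only zone pairs whose residue ranges share no residue. The sketch below uses 0-based zone start positions, which is an assumed convention, and the function name is hypothetical.

```python
def non_overlapping_pairs(n_residues, zone):
    """Zone start pairs (i, j), i < j, for self-comparison of a chain of
    n_residues, keeping only pairs whose zones share no residue (j - i >= zone)."""
    last = n_residues - zone                     # last valid zone start (0-based)
    return [(i, j) for i in range(last + 1) for j in range(i + zone, last + 1)]
```

With 21-residue zones, the pair starting at residues 1 and 9 in the text's numbering (0 and 8 here) is excluded, while zones starting 21 residues apart are kept.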
TABLE 2
Probability distributions for zone least-squares fitting distances (zone length 21 residues)
All
α-helix
Myoglobin Myohemerythrin Parvalbumin α with
153 11x I ox
0.8 0,1* ,.z*
0.6 0.2 I.3
1.i 1.:1 3. I
β parallel
Subtilisin Carboxypeptidase A Adenylate kinase Triose phosphate isomerase Lactate dehydrogenase α with β antiparallel Lysozyme
Cytochrome All
4.1 5.0 5.5
3.7 3.7
6,5
4.6 4.5 4, I 4.1 4.’
6.7 6.7 6.3 0.5 6.7
4.2 4.1
5.1 Bl
β-sheet
Superoxide dismutase Concanavalin A Prealbumin Chymotrypsin Chymotrypsin core
151 237 114 L’x 197
2.5 9.9 I“.!I L’.rl* “A)*
2.7 3.1 3.1 X-4 3.1
3.8 4.1 :j.ci 4.1 3.7
7.7 x.9 8.8 i.9 8.1
Here there is the important difference that the protein backbone is a three-dimensional structure and cannot be "closed up" around a long gap in the same way as a sequence can. Gaps will only be introduced at points where the chain makes a loop out and back. Thus the scope for producing false fits of unrelated structures is more limited. There is, however, a different problem in shape comparisons: structural convergence caused by the geometrical rules of protein folding. For example, many pairs of α-helices cross at an angle of about 15 to 20°; two pairs from different proteins will generally match well at their crossover point if suitable lengths of connecting chain are deleted. The same caution applies when fitting parallel βαβ units or other typical subassemblies. The strongest case for an evolutionary relationship between two similar structures occurs when the structures share several atypical features and quirks, and require very few gaps in their superposition.
4. Chymotrypsin Domains
Before we consider the shapes of the domains, it is useful to start with a much simpler topological comparison based on their hydrogen bonding patterns. Structures which have diverged from a common ancestor should retain similar patterns even if their shapes have altered, and so the topology helps to narrow down the search for meaningful shape relationships.

(a) Topological relations
The central portions of the two domains are topologically equivalent in several senses: the connectivity of β-strands (Schulz & Schirmer, 1974); the hydrogen bonding pattern; the shear of the twisted barrel; and the positions of the buried side-chains. The six antiparallel β-strands A, B, C, D, E, F in each half of the sequence occur in the same order ABC-FED across the sheet. The main-chain hydrogen bonds have a similar but not identical pattern (Fig. 1). These facts are not in themselves particularly remarkable. In order to establish that the domains have a common origin, it is necessary to show a more detailed correspondence in which related regions are aligned atom by atom. During the course of evolution the lengths of the connecting loops between adjacent β-strands may have changed, but there are two structural features which ought to have been conserved. One is the shear number of the hydrogen-bonded connections. The other is that a side-chain which pointed towards the interior of the domain should continue to do so.

(b) The shear number
Almost all β-sheets have a right-handed twist along the strand direction (Chothia, 1973; Chothia et al., 1977). This means that in a cylindrical barrel the pairs of α-carbon atoms which are hydrogen-bonded to one another on the same side of the sheet lie on a left-handed helical trace across the surface of the barrel (Fig. 2). Following this path once round the cylinder, one arrives back at the first strand a certain number of residues further on. We call this the shear number S. In chymotrypsin S = +9 for both domains, as can be seen by tracing the vertical cross-connections in Figure 1 from top to bottom. Notice that S must be an even number in any strictly regular barrel, since alternate side-chains lie on opposite sides of the cylinder. The sign of S is positive for sheets with right-handed twists.
FIG. 1. Hydrogen bonding diagram of α-chymotrypsin, adapted from Birktoft & Blow (1972). Arrows running from NH to CO show main-chain hydrogen bonds. Large circles mark the limits of the strands which form the structural core of each domain. Black circles and half-shaded circles show side-chains which are buried or half-buried inside the barrels. Hexagons mark active site residues His57, Asp102, Asp194 and Ser195. Double lines are disulphide bridges. In each domain the first β-strand (A) is shown twice, at both ends of the barrel. Notice how the lower repeated strand is sheared 9 residues to the left. Asterisks mark the positions of local dyad axes inside each domain. Strands B and E change directions near black bars following residues 44, 86, 160 and 212. Each barrel forms by folding the upper right and lower left corners of the 6 strands diagonally up and over so that they meet above the central dyad.
FIG. 2. Shear number of a twisted β-sheet cylinder. Strands twist to the right from bottom to top. Arrows follow the left-handed helical path of hydrogen bond connections once round the cylinder, emerging S residues higher on the first strand (here S = 4).
The shear is related to the twist of the sheet as follows. Suppose that n curved β-strands form a regular twisted cylindrical barrel of radius R, in which each strand slopes at an angle α to the cylinder axis. Suppose that the distance along a strand between adjacent residues is a, and the distance across from one hydrogen-bonded strand to the next is b. (Typically a = 3.48 Å and b = 4.72 Å (Fraser & MacRae, 1973).) Then

$$S = \frac{n b \tan\alpha}{a} \qquad (4)$$

$$R = \frac{b}{2\sin(\pi/n)\cos\alpha}. \qquad (5)$$

The twist τ may be defined as the number of turns by which the plane of the β-sheet twists in moving from one residue to the next along a strand:

$$\tau = \frac{a \sin\alpha \cos\alpha}{2\pi R}. \qquad (6)$$
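Relations (4) to (6) can be evaluated directly for a given strand count and shear number. In this sketch the defaults for a and b are the typical values quoted above, α is obtained by inverting (4), and the function name is hypothetical.

```python
import math


def barrel_geometry(n, s, a=3.48, b=4.72):
    """Regular-barrel relations (4)-(6) for n strands and shear number s.

    Returns (alpha in degrees, radius R in angstroms, twist tau in turns/residue)."""
    alpha = math.atan(s * a / (n * b))                                   # inverts (4)
    radius = b / (2.0 * math.sin(math.pi / n) * math.cos(alpha))         # (5)
    tau = a * math.sin(alpha) * math.cos(alpha) / (2.0 * math.pi * radius)   # (6)
    return math.degrees(alpha), radius, tau
```

For n = 6 and S = 9 it returns α ≈ 47.9°, R ≈ 7.04 Å and τ ≈ 0.039 turns per residue.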
For chymotrypsin, with n = 6 and S = +9, these relations would give R = 7.0 Å, α = 47.9° and τ = +0.039 turns/residue if the barrel were regular. Table 3 shows the shear numbers of some other β-sheet barrels. Notice that few of them are regular structures, and that most are not properly closed by a complete ring of hydrogen bonds.

TABLE 3
Shear numbers of β-sheet barrels
Strands
l’rotein C’hymotrypsin I ant1 II Triose
domains
phosphate
I’yrnvnte
domain
A
6
9
Two
bent
triple
8
x
Regular
3 + 5
8
Two chains cylinder
isomerase
kinaxe
sotcs
Shear
Reference st,rands
Rirktoft
cylmdw
Banner
et ~1. (1955)
Lrvinr
et r/Z. (1978)
sheets
Xrnone
et OZ. (1971)
Ekhmd
et cd. (1976)
C’olman
et cd. (1972)
6
( 11)
Two
5
(12)
One sheet itself
Thcnnolytiin
5
(5)
Only closrd by bontls calcium ion
--
Double
(6)
Not
closrd
Richardson
Not
closecl
Poljak
Prealbumin (‘n-Zn
nuclease
Immunoglobulin
8-l
tlimer
superoxide
8
diamutase
8
V, domain
8
The shear numbers interrupted. Prealbumin
in parentheses are and immunoglobulin
not V,
doubled
layer
(1972)
in rc~gular
-1lcohol dehydrngenase catalytic domain core I
Staphylococcal
crossed
& Blow
awoss to
Blake
et trl. (1974) et al.
et al. (1975)
(1976)
well-defined because the barrels concerned arc’ are only barrels in a loose topological sense.
The fact that both domains have the same shear number is highly significant, because it allows the possibility that they have evolved continuously from a common precursor structure without ever breaching the main network of hydrogen bonds. Residues which are equivalent in the two domains should therefore still occupy corresponding positions in the hydrogen bond scheme. This severely narrows down the choice of possible alignments of the domains. For example, if residues 29 to 36 in domain I correspond to 134 to 141 in domain II, then it follows by tracing the connections in Figure 1 that 39 should match 155, 54 should match 185, and so on. The requirement that residues with buried side-chains should match is also a strong restriction. Thus residue 33 (internal) might be equivalent to 136, 138 or 140, but not to 135 or 137, since consecutive side-chains alternate between the inside and the outside of the barrel. The positions of the internal side-chains shown in Figure 1 do indeed correspond quite well if residues 33 and 138 are taken to be equivalent. Other reasons for this choice will be described below. Two cautions should be noted here. First, the shear number is not always precisely determined: there is some arbitrariness in the strand displacements in Figure 1 because the hydrogen bonding scheme is irregular. Thus, although the network of hydrogen bonding connections can only change exceedingly slowly during evolution, one cannot exclude the possibility that S might alter. The strand alignment in Figure 1 is the best one according to the criteria of Levitt & Greer (1977), and the assignment of the same value of S to both domains is supported by the shape comparisons which follow. Secondly, only two six-stranded barrels are known at present, and there may be physical reasons which favour S = +9.
(c) Domain shape comparisons
The shape comparison matrix for the complete chain of chymotrypsin, with a zone length of 55 residues, has two strong peaks with a fitting distance of 7.2 Å connecting domain I with domain II. The first relates the regions centred on residues 69 and 198; the second relates 61 to 208. The peaks are strong enough (see Table 1) to suggest a relationship between the domains, in spite of the different lengths of many of the loops. Further comparisons with zone lengths of 21 and 13 revealed which smaller portions of structure might match, and after a long series of trials the best alignment proved to be consistent with the hydrogen bonding topology (Fig. 1). A "regularised" pair of domains was constructed by trimming off the minimum number of residues in the outlying regions until corresponding pairs of loops had equal lengths: residues (9 to 15), (72 to 77) and (95 to 101) from domain I; residues (143 to 153), (165 to 179) and (192 to 193) from domain II. The trimmed structure of 197 residues contains an exceedingly strong repeat. Twenty-four consecutive zones of 55 residues have a fitting distance below 4.7 Å. The best overall match includes 57 residues and has a distance of only 3.4 Å. It matches the trimmed sections (29 to 108) with (134 to 226). Figure 3 illustrates the long repeat in the trimmed domains, using a short zone length of 13 residues, which shows more detail. Notice that there are several secondary repeats, which will be discussed later. It is convenient to define the core of each domain in terms of the six inner strands of β-sheet described in Table 4. These are arranged to fit the alignment in Figure 1 and consist of 46 residues each. Each domain itself consists of the two halves ABC, DEF, whose central strands B and E are bent midway along their lengths. They are therefore divided into two segments B1, B2 and E1, E2. The two cores superpose extremely well, with an r.m.s. distance of 2.43 Å over the 46 residues.
Superposed α-carbon distance plots were used to confirm that the correct pairs of atoms had been matched, with 35 atoms placed within 2.5 Å and 11 more within 3.5 Å. The plots also bring out a weak relationship between the twisted connecting chains which form two halves of the active site. Residues 54 to 59 match 185 to 190 within 5 Å, while the α-carbons of Asp194 and Ser195 match Asp61 and Thr62 within 2.7 and 1.7 Å, respectively. It should be noted that none of the disulphide bridges match in this superposition.
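The correspondence between the two cores can be expressed as a residue lookup built from the segment pairs of Table 4. This sketch is illustrative only: the list layout and the function name are hypothetical. The third domain I segment is taken as 45-48, so that each core contains the 46 residues stated in the text.

```python
# Strand segments of the aligned cores as (domain I range, domain II range),
# inclusive residue numbers, following Table 4.
SEGMENTS = [((29, 36), (134, 141)), ((39, 44), (155, 160)), ((45, 48), (161, 164)),
            ((50, 54), (181, 185)), ((62, 69), (195, 202)), ((80, 85), (207, 212)),
            ((87, 90), (214, 217)), ((104, 108), (224, 228))]


def equivalent_residue(res_i):
    """Domain II residue aligned with domain I residue res_i, or None when
    res_i lies outside the core segments (e.g. the unmatched residue 86)."""
    for (start_i, end_i), (start_ii, _end_ii) in SEGMENTS:
        if start_i <= res_i <= end_i:
            return start_ii + (res_i - start_i)
    return None
```

The lookup reproduces the matches given in the text: residue 39 maps to 155, 54 to 185, and 33 to 138.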
FIG. 3. Shape comparison plot with domains trimmed to match the loop lengths, leaving 197 residues. Strands ABCDEF are defined by the large circles in Fig. 1. Zigzags show where residues (72 to 77), (95 to 101), (143 to 153), (165 to 179) and (192 to 194) are cut out. Each point marks the comparison of 2 zones 13 residues wide centred at a pair of residues. Three levels of quality are drawn as full lines, dots and enclosed areas. They correspond to r.m.s. distances of 2.5, 3.3 and 4.2 Å, or probability levels of 1%, 7.5% and 25%.
The two halves of each domain are topologically similar, since each is a group of three antiparallel β-sheet strands with a similar hydrogen bonding pattern: A-(B1-B2)-C and D-(E1-E2)-F. More precisely, each half folds as a pair of connected hairpin loops which have a kink in their shared central strand and bend across one another at right angles. Chothia et al. (1977) have noted this bending as a typical consequence of the twisted nature of the β-sheet. A much more remarkable fact is that the two halves interlock symmetrically around a local dyad axis, as if each domain were a hydrogen-bonded dimer. Figure 4 illustrates this arrangement in domain I. Are the shapes of the two halves similar? This is suggested by the shape comparison in Figure 3, where a long correlation appears between ABC and DEF of domain I. One complication is that strand E is longer than B by one residue, but tests showed that the two halves could be matched best by deleting residue 86 in domain I and trimming away the loops (71 to 78) and (92 to 103). The shape comparison plot in Figure 5(a) demonstrates the repeat. To check the relationships more precisely, the cores of the halves have been fitted to one another and superposed distance plots drawn. These confirm that the best structural alignment of the half-cores agrees with the definitions in Table 4. Many comparisons can be made, since there are four half-domains which are potentially related. For each matched pair there is a screw transformation which superposes them, and Table 5 summarizes the results.

TABLE 4
Strand segments in the aligned domains

Domain I     Domain II
29-36        134-141
39-44        155-160
45-48        161-164
50-54        181-185
62-69        195-202
80-85        207-212
(86)         (213)
87-90        214-217
104-108      224-228

FIG. 4. Exploded view of domain I backbone to illustrate symmetrical pairing of the 2 halves about a dyad (central spot). Curved arrows show the course of the chain. Each half contains 2 loops which twist and cross over diagonally. The pieces come together and form a closed hydrogen-bonded shell. Each half-domain contributes one of the active site residues His57 and Asp102.
Figures 6 to 8 illustrate these relationships.

There is an unexpected spatial relationship between the wrong halves of the two domains within the whole molecule, since they can be superposed in pairs by a 2-fold screw dyad (Fig. 9). The last entry in Table 5 shows how the first halves of both
FIG. 6. Stereoscopic ribbon diagram of chymotrypsin backbone with ends of the main β-sheet strands numbered. Labels are immediately to the right of α-carbons. Triple lines represent disulphide bridges. Domain I is at top left and domain II bottom right. They are related by a 74° rotation about the direction (0.501, -0.814, 0.293).
FIG. 7. View of domain I along the dyad, from a point on th
FIG. 8. Domain II viewed along the dyad in an orientation matched to the view of domain I, from upper right rear of model. The long autolysis loop is at top left and the helical methionine loop underneath the centre. The distorted region 217 to 224 forms the substrate-binding pocket. Viewpoint (14.3, -18.0, 35.9); centre (19.2, --WC, 14.1).

FIG. 9. Both domains viewed along the screw dyad from the top of the molecule in Fig. 6. Upper surface of domain I (top front) matches that of domain II with the halves reversed but the chain direction preserved (bottom). A hydrogen bond connects residue 43 to 195 near centre. Active site lips to rear of centre on the left. Viewpoint (31.5, 25.3, 6.0); centre (11.6, -1.2, 14.1).
domirins can each be fitted simultaneously to the second half of the other by a rotation of 172” and a shift of 849 .k along an axis through the c&re of the whole tr10ltcu1r. ;-\ s(ww dyacl. urllikc~ a 2- fold rotation axis. is not’ a possihlv symmetry clement for thv point-group of a protein with several subunit’s. but it’ is a very common element in thtb space groups of typical protein crystals. Apparently this operation often brings two idrntical globular object,s together in an ~~xcept~ionall- favourable packing relationship. An approximat,c screw dyad even appears in one allosteric protein strwtuw. t,he dimer of hcxokinase (Steitz rf al., 19iA). This wlationship applies whew the direction oft he polypept’ide chain is taken in its
The half-domains
are the drawls
A-H,-RA.C’ or D-E,-EP-F
described
0.501 -0*861
- 0.472 0.872 0.604 0.413 - 0.848 - 0*832 0.293 -0.161 co-ordinate
--0.814 .._ 0.482
0.223 0.679 0.857 0.499 0.534
- 0.725 - 0.437 0.418 &SO7 “- 0.179 -0*150
of screw axis
-- 0.601 -
Direction
of the half domains
in Table 4. The Cart&an
i-43 2.35
II to I II.2 anrl II.1 to I 1 and I._’
r.m.s. distance (A) “49 2.45 2.10 2.15 1.79 _,dO
Atoms
I.2 t.0 I. 1 II.2 to 11.1 II.1 to I.1 II.2 to I.2 II.1 to I._, II.2 to 1.1
Pieces
Leastsqmwes jtting
TABLE 5
1.69 - 0.42 16.24 - 15.56 6.56 8*3A
169.4 171.5 79.5 78.6 169.9 172.3
8.09 +-&em is that of Rirktoft. C L:!ow (19i?)
74.4 liP*l
_ 16.5”
Shift, along axis (A) (“)
Angle
GENE
IN
THE
STRUCTURAL
EVOLUTION
OF CHYMOTRYPSIN
65
normal sense. However each half-domain itself happens to have a similar shape along the forward and reverse directions of the chain, and this accident, coupled with the effects of the three pseudo-dyad axes, seems to account for the mirror symmetry observed by Shotton & Watson (1970).
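The remark that a screw dyad cannot be a point-group symmetry can be checked with a toy calculation. A 2-fold screw combines a 180° rotation with a shift s along the axis. Applying it twice gives no net rotation but a net translation of 2s, so repeated application never brings a subunit back to where it started. A pure 2-fold axis (s = 0) does close on itself. The sketch below is illustrative only: the axis is placed along z and the 4 Å shift is arbitrary.

```python
def screw_dyad_z(p, shift):
    """2-fold screw about the z axis: rotate 180 degrees, then move `shift` along z."""
    x, y, z = p
    return (-x, -y, z + shift)


def repeat(p, shift, n):
    """Apply the screw operation n times to the point p."""
    for _ in range(n):
        p = screw_dyad_z(p, shift)
    return p


start = (3.0, 1.0, 0.0)
print(repeat(start, 0.0, 2))   # pure 2-fold axis: (3.0, 1.0, 0.0), back at the start
print(repeat(start, 4.0, 2))   # screw dyad: (3.0, 1.0, 8.0), displaced by twice the shift
```

A finite assembly of subunits needs every symmetry operation to fix a common point. The screw has no fixed point, which is why it can only arise in an infinite lattice or, approximately, between two non-equivalent objects.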
5. Bacterial Serine Protease
It is important to check whether the structural repeats found in chymotrypsin run through the whole serine protease family or merely apply to that enzyme alone. The structures of trypsin and elastase are too like chymotrypsin to yield any new insight, but there are three closely similar serine proteases from bacteria which are distant enough to give a more independent view of the evolution of the domains. They are proteases A and B from Streptomyces griseus (Delbaere et al., 1975; Johnson & Smillie, 1972) and the α-lytic protease from Myxobacter 495 (Olsen et al., 1970). The repeats in protease B will be described below, but first it is necessary to clarify the relationship between chymotrypsin and the bacterial proteases (McLachlan & Shotton, 1971; Delbaere et al., 1975). When Delbaere et al. (1975) determined the crystal structure of protease B at 2.8 Å resolution they found two surprising differences between this enzyme and chymotrypsin. The autolysis loop (141 to 154) appeared to be cut short, and in its place Arg169 from the methionine loop (168 to 189) formed a permanent ion pair with Asp194. The shortened methionine loop no longer seemed to wrap around the neighbourhood of Asp102, and instead a new "extra loop" between residues 186 and 187 seemed to take over the same function.
(a) Alignment with chymotrypsin
Shape comparison plots of protease B against chymotrypsin confirmed the alignment of the active site regions, but showed no clear-cut relationship between the molecules in the first half of domain II. Further tests with least-squares fits of separate loops and with superimposed distance plots revealed surprisingly large shifts of up to 15 Å for some β-sheet strands in this region. It suddenly became clear that a revised alignment could match the whole core of the bacterial domain II to chymotrypsin extremely closely. (1) The strands A, B, C of domain II are now identified as residues (95 to 102), (105 to 113) and (130 to 133) of protease B (Fig. 10). (2) The autolysis loop now occupies its correct position, and the extra loop is seen to be nothing more than a very severe distortion of the methionine loop. (3) The arginine (residue 99) which makes an ion-pair in the active site belongs to position 138 of the shortened autolysis loop.
Just after this work had been done Dr M. N. James kindly told me that his own high-resolution X-ray analyses of the structures of α-lytic protease and Streptomyces griseus protease A (Brayer et al., 1978; James et al., 1978) had already led him quite independently to this same re-interpretation of the alignments. With the new, more conservative alignment the core of domain II fits chymotrypsin remarkably well, with an r.m.s. distance of only 1.49 Å for 44 matched atoms.
A. D. McLACHLAN
FIG. 10. Protease B of Streptomyces griseus. Sequence aligned with chymotrypsin domain structure according to best fit of α-carbon atoms. Filled circles and underlines mark gaps and insertions. Numbers are for chymotrypsin structure. Ringed numbers and bold letters show residues which govern enzyme activity. Arg138 forms an ion-pair with Asp194. Main strands are boxed. Large letters show residues which fit chymotrypsin well. Note deletions of (71 to 79) and (142 to 156). In the methionine loop, residues (176 to 178) come near Asp102 as in chymotrypsin. Cys189 corresponds to Cys191 of chymotrypsin but has changed position in space. Loops (109 to 130) and (166 to 178) are the same length as in chymotrypsin but fold differently.
Domain I of protease B has a slightly distorted structure, particularly in the shortened loops D and E1, and fits chymotrypsin less well. Detailed comparisons suggest minor changes in the original alignment: residues 3 to 10 match strand A (29 to 36) and residues 51 to 57 match strand E (84 to 90). With these modifications, based purely on α-carbon positions, strands A, B, C, E and F fit within a distance of 2.1 Å. The cores of domains I and II, excluding strands D and E1 of domain I, now superpose onto chymotrypsin very well overall, within a distance of 2.08 Å for 76 matched atoms. (With these strands included the distance rises to 2.62 Å for 86 atoms.) When the two complete molecules are superposed, 79 atoms of protease B lie within 2.5 Å of the corresponding residue in chymotrypsin and a further 18 within 3.5 Å.
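Two kinds of figure are quoted here: an r.m.s. distance over matched α-carbons, and the numbers of atoms lying within 2.5 Å and within 3.5 Å of their partners. Both follow directly from the pairwise distances once the molecules are superposed. The following is a minimal pure-Python sketch. It assumes the two coordinate lists are already superposed and matched index by index, and the function name and toy coordinates are hypothetical.

```python
import math


def rms_and_counts(xs, ys, tolerances=(2.5, 3.5)):
    """r.m.s. distance between matched, already superposed atoms, plus the
    cumulative number of pairs lying within each tolerance (in Angstrom)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equal, non-empty lists of matched atoms")
    dists = [math.dist(p, q) for p, q in zip(xs, ys)]
    rms = math.sqrt(sum(d * d for d in dists) / len(dists))
    counts = {tol: sum(1 for d in dists if d <= tol) for tol in tolerances}
    return rms, counts


# Toy example: three matched pairs separated by 1, 3 and 4 Angstrom.
rms, counts = rms_and_counts([(0.0, 0.0, 0.0)] * 3,
                             [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 4.0)])
further = counts[3.5] - counts[2.5]   # atoms between 2.5 and 3.5 Angstrom
```

The counts are cumulative. The "further 18 within 3.5 Å" in the text therefore corresponds to the difference `counts[3.5] - counts[2.5]`.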
The revised alignment of domain II also appears to be feasible in α-lytic protease. Here the disulphide bridge which McLachlan & Shotton (1971) matched with Cys168-Cys182 would now be assigned to positions 137 and 159 of chymotrypsin, linking strands A and B of domain II.
The internal repeats of protease B depend critically on the choice of alignment. In general the same relations hold as in chymotrypsin if the distorted strands D and E of domain I are left out of account. Table 6 shows typical comparisons using the strands from Figure 10. The two domains are again closely similar to one another, with internal dyads and a screw dyad relating the crossed-over half-domains. Figure 11 is a distance plot in which domain II of protease B has been superposed on domain I of chymotrypsin by matching 43 core residues which fit within 2.29 Å. Notice how the bends between strands C and D follow a similar path, so that Asp194 and Ser195 match Thr61 and Thr62. This study of protease B thus confirms that the repeats in the domains are indeed characteristic of the serine proteases as a whole.
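A superposed distance plot of the kind shown in Fig. 11 compares every α-carbon of one superposed chain with every α-carbon of the other. For each pair it prints a symbol for the tightest tolerance the distance falls within. Aligned strands then show up as diagonal runs of symbols, and insertions or deletions show up as breaks in those runs. The following is a minimal pure-Python sketch. The default tolerances (2.5, 3.6, 5.4 and 9.0 Å) and symbols are assumptions modelled on the Fig. 11 caption, and the coordinates are toy values.

```python
import math


def distance_plot(down, across, tolerances=(2.5, 3.6, 5.4, 9.0),
                  symbols=("x", "o", "*", "-"), blank=" "):
    """Build a superposed distance plot as a list of strings.

    down, across: lists of superposed (x, y, z) alpha-carbon positions.
    Row i, column j shows the symbol of the tightest tolerance (Angstrom)
    that |down[i] - across[j]| falls within, or `blank` if the distance
    exceeds every tolerance. Every pair of atoms is compared.
    """
    rows = []
    for p in down:
        line = []
        for q in across:
            d = math.dist(p, q)
            mark = blank
            for tol, sym in zip(tolerances, symbols):
                if d <= tol:
                    mark = sym
                    break
            line.append(mark)
        rows.append("".join(line))
    return rows


# Two short toy chains whose atoms correspond roughly one-to-one.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_b = [(0.5, 0.0, 0.0), (4.0, 1.0, 0.0), (20.0, 0.0, 0.0)]
for row in distance_plot(chain_a, chain_b):
    print("|" + row + "|")
```

Because each row reads down one chain and each column across the other, a shift of the alignment by k residues moves the diagonal band sideways by k columns. This is how the plot exposes the revised strand assignment.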
6. Gene Duplication
(a) Sequence homology and structural similarity
A sceptic might argue that chymotrypsin did not evolve by gene duplication, because there is no detectable similarity between the amino acid sequences of the domains, and that the structural similarities arose by chance. It is true that ancestral relationships between proteins have often been established by comparing amino acid sequences zone by zone in all possible ways (Fitch, 1966, 1970; Needleman & Wunsch, 1970). These methods use the chemical similarity of amino acids or their observed substitution rates in evolution as a measure of relatedness. They are sensitive and can detect quite distant relationships (McLachlan & Walker, 1977), but amino acid sequence is generally a less reliable guide than three-dimensional structure. Families of distantly related proteins illustrate the principle that chain folding is conserved over long periods of evolution (Matthews, 1976). Protein molecules usually have a central closely-packed core of hydrogen-bonded secondary structure surrounded by more irregular outlying loops or bends. The core structure is conserved unchanged over very long periods, whereas outlying parts are often added or removed comparatively quickly (McLachlan, 1972a; McLachlan & Shotton, 1971). Within closely related groups of proteins, such as chymotrypsin and elastase (Shotton & Hartley, 1970) or the haemoglobins (Ladner et al., 1977; Fermi, 1975; Hendrickson & Love, 1971; Padlan & Love, 1971), there are only small structural differences and the sequences are clearly related. When the evolutionary distance increases, larger pieces of structure may be added or lost, as in the cytochromes (Dickerson et al., 1976) or the antibody constant and variable domains (Poljak et al., 1976; Beale & Feinstein, 1976), but the sequences are still visibly related. Further along the scale are proteins with a very distant relationship which is deduced from their similar structures but could not have been detected in the sequence alone.
One example is the conserved NAD-binding domain in the lactate, malate, alcohol and glyceraldehyde phosphate dehydrogenases (Rossmann et al., 1974, 1975; Biesecker et al., 1977). Others are the two-lobed structures of rhodanese (Ploegman et al., 1978) and penicillopepsin (Tang et al., 1978), where internal gene duplication has probably occurred. These examples show that the lack of sequence repeats in chymotrypsin (McLachlan, 1972b) is not conclusive evidence against a gene duplication if the folding of the polypeptide chain shows highly significant repeats in itself.

TABLE 6
Least-squares fitting of domains in Streptomyces griseus protease B

Pieces                          Atoms   r.m.s. distance (Å)   Rotation angle (°)   Comparison
I to II                           39           3.07                  82.4           without strand E1
I to II                           33           2.45                  80.8           without strands D and E1
I.1 to I.2                        17           3.05                 172.6
I.1 to I.2                        12           2.00                 178.9           local dyad, I.2 residues 41-43, 54-57, 65-69
II.1 to II.2                      21           2.25                 165.3           local dyad
I.1 to II.1                        …           1.90                  79.4
I.2 to II.2                       18           2.88                  95.0
I.1 and I.2 to II.2 and II.1      39           3.17                 169.3           screw dyad, crossed halves with strand D
I.1 and I.2 to II.2 and II.1      33           2.76                 169.8           screw dyad, crossed halves without strand D

In each comparison residues were chosen from the strands as defined in Fig. 10, omitting residues in small letters, which do not fit well.

FIG. 11. Superposed α-carbon distance plot of chymotrypsin domain I (down) against S. griseus protease B domain II (across). The symbols x, o, *, - represent fitting tolerances of 2.5, 3.6, 5.4 and 9.0 Å, respectively. Note that every pair of atoms is compared. Boxed regions contain the main β-sheet strands, and gaps in the diagonal bands show insertions or deletions.

(b) General regularities in protein structure
Many general trends and patterns have been established by examining known protein structures (Matthews, 1976; Sternberg & Thornton, 1978), and they make it possible to distinguish more reliably between structures which have a common ancestor and ones which merely share typical architectural features. (1) Proteins fall into classes according to the types of secondary structure which they contain and their topological connections (Schulz & Schirmer, 1974; Levitt & Chothia, 1976; Sternberg & Thornton, 1977; Nagano, 1977): either all α-helix, all β-sheet with most strands anti-parallel, mixed α-helix with antiparallel sheet, or alternating helices and parallel sheets which form units of super-secondary structure (Rao & Rossmann, 1973). Tables 1 and 2 show that in general the all β-sheet proteins are less repetitive in structure than the other classes. (2) The organisation of secondary structure tends to obey certain rules. In particular β-sheets have a right-handed twist when viewed along the strands (Chothia, 1973), and the strands which are connected by hydrogen bonds are often adjacent in the amino acid sequence (Levitt & Chothia, 1976; Richardson, 1977; Schulz & Schirmer, 1974). The chain connections between parallel β-sheets in both β-α-β units and pure sheets are almost always right-handed (Nagano, 1977; Sternberg & Thornton, 1976, 1977; Richardson et al., 1976). The need to pack internal side-chains closely (Richards, 1977) also influences protein architecture, so that the contacts between twisted β-sheets and those between α-helices obey geometrical rules (Chothia et al., 1977). (3) Several examples of twisted β-sheets which roll up into cylindrical barrels are known (Richardson, 1977), and their hydrogen bond patterns have been tabulated (Levitt & Greer, 1977). Examples were given in Table 3, while less complete barrels also exist in rubredoxin (Watenpaugh et al., 1973) and soybean trypsin inhibitor (Sweet et al., 1974).
It should be noticed that there is a great variety of anti-parallel β-sheet folds which have been described loosely as barrels, and they cannot be described purely by the order of their strands (Richardson, 1977). Only a few possess a complete ring of hydrogen bonds. Others can be classified as double layers of crossed sheets, or as single sheets bent double (Chothia et al., 1977). The only common feature is the marked right-handed twist, which clearly plays a dominant role in the organisation of these elaborate structures into a closed shell of hydrogen bonds. The chymotrypsin domain is different from all the others, and this makes any similarity between its domains all the more significant.
(c) Structural evidence
In chymotrypsin the evidence that the two domains have a common ancestor comes mainly from four features: the topology of the β-sheets, the shear numbers of the barrels, their unusual construction with two interlocked triple strands, and their detailed three-dimensional fit when superposed. The r.m.s. distance of 2.43 Å between 46 pairs of related atoms in the cores is highly significant. Supporting evidence includes the rough correspondence between the paths of the connecting chains (55 to 61) and (186 to 194). The bacterial proteases show that the domain relationships are an ancient feature of the serine protease family. The case for a relationship between domains I and II seems exceptionally strong. The case for an earlier gene duplication in the earliest stages of evolution of the ancestor domain depends on the approximate dyad axes. The r.m.s. distance of 1.79 Å for matching domain I.2 with II.1 is quite unexpected. An earlier gene duplication would explain these results quite simply, and the case seems strong.
7. The Dimer Hypothesis
(a) Dyad structures
Monod et al. (1965) have argued strongly that co-operative effects in multisubunit proteins usually involve interactions between pairs of identical units which associate symmetrically about a dyad axis, and the great majority of allosteric enzymes are tetramers with 222 symmetry or dimers with a dyad (Klotz et al., 1970; Matthews & Bernhard, 1973). The ease of formation of dimers, especially between antiparallel strands of β-sheet, is illustrated by the molecular pairs found in many protein crystals, notably insulin (Blundell et al., 1971), concanavalin A (Hardman & Ainsworth, 1972), prealbumin (Blake et al., 1974) and alcohol dehydrogenase (Brändén et al., 1975). Even crystalline chymotrypsin itself forms dimers about a local 2-fold axis (Birktoft & Blow, 1972; Tulinsky et al., 1973).
(b) Dimers before duplicates
Gene duplication within a single polypeptide chain has occurred in so many proteins that a structure with two copies of the same subunit joined in series must have considerable evolutionary advantages. Often the repeated units effectively form a locked-together dimeric structure with properties like those of a multisubunit enzyme: new co-operative interactions, and formation of binding sites at the interface. Many proteins which have probably evolved by gene duplication, such as bacterial ferredoxin (Adman et al., 1973), carp parvalbumin (Kretsinger & Nockolds, 1973) or rhodanese (Bergsma et al., 1976), have a dimeric structure within one chain with an approximate local dyad axis. Ferredoxin transports electrons in two identical clusters of iron and sulphur atoms buried within a small protein. Weeds & McLachlan (1974) noticed a surprising feature of the structure: the two halves of the chain do not form separate iron-sulphur clusters, as might have been expected. Instead, each cluster uses residues from both halves in a co-operative way (Fig. 12), and the repeated chain fold forms a very closely coupled dimer. An isolated half-chain of ferredoxin with the present structure would function poorly without its partner, because one sulphur is missing from the cluster. If there ever was a well-functioning half-chain monomer it must have had a different refolded structure in which all four Cys residues formed one cluster. One is forced to assume that the two half-chains adapted themselves to fit together before the chain doubled, so that the sequence of evolutionary steps was: single cluster → dimer cluster → double-chain pair. The gene duplication event would therefore consolidate an existing interaction and later allow the two halves of the new intramolecular dimer to become different. This pattern of dimerisation followed by duplication seems to be a common strategy of evolution (McLachlan, 1977; McLachlan & Walker, 1977) and has been proposed independently by Tang et al. (1978) to account for the spectacular double-lobed structure of penicillopepsin. The same route of evolution may have been followed by other proteins, since the two halves of the NAD-binding domain in lactate dehydrogenase are related by an approximate dyad (Rao & Rossmann, 1973). In chymotrypsin the first cycle of dimerisation and doubling probably involved the association of two half-domains ABC, DEF along a line of hydrogen bonds between strands C and F, with a weaker interaction between strands A and D (Fig. 4). In the second cycle a primitive enzyme may have formed from two identical but separate complete domains paired round a screw dyad axis. Then a further chain doubling allowed the two sides of the active site to come together in one chain and to differentiate. The approximate screw dyad which appears in the entire molecule when the half-domains are compared crossed over is more difficult to account for. Since association about a screw dyad is an unusually favourable way of packing two identical objects together, it seems possible that this spatial arrangement first established itself at a time when the two halves of each domain were much more alike than they are now, perhaps before the two whole domains fused together. The contact regions between symmetrical domains in a true screw and the crossed screw would have been nearly the same. If this path of evolution is correct it suggests that each domain once folded by itself in the way depicted in Figure 4 and may still do so. That is, each set of three strands is a folding unit which has a strong tendency to fold round the other.

FIG. 12. The intramolecular dimer structure of ferredoxin of Peptococcus aerogenes (after Adman et al., 1973). Residues from both halves of the 27-residue repeated structure contribute to the surroundings of each iron-sulphur cluster. Circled positions are Cys residues.
The possibility remains that the two half-domains have no evolutionary relationship, but are alike because of geometrical rules of β-sheet folding which are not yet appreciated. The truth may become clearer when more β-sheet protein structures are known.
8. Enzyme Function
The dimer hypothesis suggests that the protease activity developed in several stages. It seems likely that a single isolated half-domain never had an independent existence in the shape it now has, but evolved for a long period as one partner of a pair. Any original biological function of the monomer fragment would have been lost and could have been quite unrelated to the final one. The ferredoxin argument implies that the primitive split domain of paired halves had already acquired some function before the two halves fused together, but it is hard to see how a single split domain could have had enzyme activity. A more likely possibility is that the smallest functional unit was a pair of split domains associated together. Thus the earliest identifiable precursor of the serine proteases might have been a tetramer of four identical half-domains arranged in a quaternary structure corresponding to the present-day enzyme, with a narrow surface cleft forming the active centre. Once a whole domain had formed in one chain it would be possible in principle for two identical domains paired asymmetrically to develop active site residues in the present positions. Thus both domains could have had residues corresponding to Asp194 and Ser195 at positions 61 and 62, in addition to His57 and Asp102. Whatever steps took place in the ancient stages, the gene duplication hypothesis implies that the β-sheet barrels of the domains evolved before any of the other outlying pieces of the chymotrypsin skeleton were added. Therefore it seems reasonable to assume that features like the C-terminal helix and the extension of the methionine loop are not part of the original structure, but developed later, after the complete two-domain chain had been formed. Refined enzyme functions probably developed in roughly the following order.
(1) Elongation of the methionine loop. Growth of the C-terminal α-helix.
(2) Formation of the ion-pair from Asp194 to Arg138 in the autolysis loop.
(3) Extension of the amino end as a β-sheet to surround Asp102.
(4) Reshaping of the methionine loop round the Cys168-Cys182 disulphide bridge to form a helical region, and of the amino end and the autolysis loop to give the activation mechanism.
The disulphide bridges must be a comparatively late development in the evolution of the double domain, since the two invariant bridges in the active site (42 to 58 and 191 to 220) are not homologous between domains. Thus the primitive domain probably had no disulphide bridges and may even have evolved from an intracellular enzyme.
APPENDIX
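Least Squares Fitting of Two Structures
Given two sets A and B of atomic position vectors $\mathbf{a}_n$, $\mathbf{b}_n$ ($n = 1, \ldots, N$) with weights $w_n$, whose centroids both lie at the origin, the object is to find an orthogonal rotation matrix $R$ with determinant $+1$ which converts the co-ordinates $a_{in}$ ($i = 1, 2, 3$) to

$$r_{in} = \sum_j R_{ij}\, a_{jn} \qquad \text{(A1)}$$

and minimises the residual

$$E = \tfrac{1}{2} \sum_n w_n (\mathbf{r}_n - \mathbf{b}_n)^2. \qquad \text{(A2)}$$

This is the process of fitting the co-ordinates $\mathbf{a}_n$ to a set of given guide points $\mathbf{b}_n$. Consider the rotation $R$ followed by a small additional rotation through an angle $\theta$ about the direction $\mathbf{l}$. Then, to second order in $\theta$, the vector $\mathbf{r}$ becomes

$$\mathbf{r} + (\boldsymbol{\theta} \times \mathbf{r}) + \tfrac{1}{2}\, \boldsymbol{\theta} \times (\boldsymbol{\theta} \times \mathbf{r}), \quad \text{where } \boldsymbol{\theta} = \theta\, \mathbf{l}. \qquad \text{(A3)}$$

The corresponding change in $E$ is

$$\delta E = -\mathbf{g} \cdot \boldsymbol{\theta} + \tfrac{1}{2}\, \boldsymbol{\theta}' T \boldsymbol{\theta}, \qquad \text{(A4)}$$

where $\mathbf{g}$ is a couple, $\boldsymbol{\theta}'$ is a row vector, $\boldsymbol{\theta}$ a column vector, and $T$ is the matrix of second derivatives. Every quantity of interest can now be derived from two related matrices $U$ and $V$,

$$U_{ij} = \sum_n w_n\, a_{in}\, b_{jn}, \qquad \text{(A5)}$$

$$V_{ij} = \sum_n w_n\, r_{in}\, b_{jn}, \qquad \text{(A6)}$$

which are connected by the rotation $R$: $V = RU$. The antisymmetric part of $V$ gives the couple, since

$$g_1 = V_{23} - V_{32} \qquad \text{(A7)}$$

and so on. Also the diagonal sum

$$v = V_{11} + V_{22} + V_{33} \qquad \text{(A8)}$$

gives the residual as

$$E = \tfrac{1}{2} \sum_n w_n (a_n^2 + b_n^2) - v, \qquad \text{(A9)}$$

since $r_n^2 = a_n^2$. This follows from (A2), (A6) and (A8). The second derivative matrix has components

$$T_{ij} = v\, \delta_{ij} - \tfrac{1}{2}(V_{ij} + V_{ji}), \qquad \text{(A10)}$$

as may be verified by substituting (A3) into (A2) and comparing the result with (A4).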
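The identities above can be checked numerically. The following is a minimal sketch assuming NumPy and random test data; nothing here comes from the paper's own programs. It forms U and V from (A5) and (A6) and confirms V = RU. It then compares the residual (A2) with ½Σ w(a² + b²) − v from (A9). Finally it checks the second-order expansion (A4), using the couple of (A7) and the matrix T of (A10), against an exact small rotation built with Rodrigues' formula, of which (A3) is the second-order expansion.

```python
import numpy as np


def cross_matrix(t):
    """Matrix K such that K @ x equals np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def check_appendix_identities(seed=0, n=10):
    """Check V = RU, the residual formula (A9) and the expansion (A4)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.5, 2.0, size=n)
    a = rng.normal(size=(3, n))                  # column n holds a_n
    b = rng.normal(size=(3, n))                  # column n holds b_n
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    R = q if np.linalg.det(q) > 0 else -q        # an arbitrary proper rotation
    r = R @ a                                    # (A1)
    U = (a * w) @ b.T                            # (A5)
    V = (r * w) @ b.T                            # (A6)
    v = np.trace(V)                              # (A8)

    def E(x):                                    # (A2)
        return 0.5 * np.sum(w * np.sum((x - b) ** 2, axis=0))

    g = np.array([V[1, 2] - V[2, 1], V[2, 0] - V[0, 2], V[0, 1] - V[1, 0]])  # (A7)
    T = v * np.eye(3) - 0.5 * (V + V.T)          # (A10)
    E_from_v = 0.5 * np.sum(w * (np.sum(a**2, axis=0) + np.sum(b**2, axis=0))) - v  # (A9)

    # Exact small extra rotation through |theta| about theta (Rodrigues formula).
    theta = 1e-4 * rng.normal(size=3)
    phi = np.linalg.norm(theta)
    K = cross_matrix(theta / phi)
    R_small = np.eye(3) + np.sin(phi) * K + (1.0 - np.cos(phi)) * (K @ K)
    dE_exact = E(R_small @ r) - E(r)
    dE_model = -g @ theta + 0.5 * theta @ T @ theta   # (A4)
    return (bool(np.allclose(V, R @ U)),
            bool(np.isclose(E(r), E_from_v)),
            bool(abs(dE_exact - dE_model) < 1e-8))


print(check_appendix_identities())   # (True, True, True)
```

The remaining error in the last check is of third order in the small angle. That is why the test rotation is kept tiny.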
The problem of minimising $E$ therefore becomes that of finding a proper rotation $R$ with $|R| = 1$ such that $V = RU$ is symmetric and $v$ is a maximum, or $T$ is positive definite. McLachlan (1972c) described an analytic matrix solution, which was fairly complicated, but used an iterative numerical method for practical calculations. Kabsch (1976) used an ingenious Lagrangian multiplier to obtain a direct solution which is effectively the same. When $|U|$, the determinant of $U$, is positive, both methods yield the solution

$$R = (U'U)^{1/2}\, U^{-1}, \qquad \text{(A11)}$$

where $U'$ is the transpose of $U$ and one takes the positive square root of the positive definite matrix $U'U$. It may happen that $|U| < 0$, and $U$ can be singular or degenerate. Then Kabsch's method may yield an improper rotation, since the co-ordinate set A fits B best when both inverted and rotated. So it proves simpler to approach the problem differently, through the so-called singular-value decomposition (Golub & Reinsch, 1971) of the $3 \times 3$ matrix $U$. The proof below is quite short and shows how to identify all the special cases of the fitting problem. Consider the real eigenvalues $D$ and eigenvectors $\boldsymbol{\omega}$ of the $6 \times 6$ partitioned symmetric matrix $\Omega$:
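$$\Omega = \begin{pmatrix} 0 & U \\ U' & 0 \end{pmatrix}, \qquad \Omega\, \boldsymbol{\omega} = D\, \boldsymbol{\omega}. \qquad \text{(A12)}$$

Each normalised eigenvector can be partitioned into two three-component vectors,

$$\boldsymbol{\omega} = \tfrac{1}{\sqrt{2}} (\mathbf{h}, \mathbf{k}),$$

and the eigenvalues occur in three pairs $\pm D_\alpha$, with $\alpha = 1, 2, 3$:

$$\boldsymbol{\omega}_\alpha = \tfrac{1}{\sqrt{2}} (\mathbf{h}_\alpha, \mathbf{k}_\alpha) \ \text{for } +D_\alpha, \qquad \boldsymbol{\omega}_{-\alpha} = \tfrac{1}{\sqrt{2}} (\mathbf{h}_\alpha, -\mathbf{k}_\alpha) \ \text{for } -D_\alpha. \qquad \text{(A13)}$$

We make the convention that $D_\alpha \ge 0$, with $D_1 \ge D_2 \ge D_3$, and then the three pairs of vectors $\mathbf{h}_\alpha$, $\mathbf{k}_\alpha$ satisfy the equations

$$U'\mathbf{h}_\alpha = D_\alpha \mathbf{k}_\alpha, \qquad U\mathbf{k}_\alpha = D_\alpha \mathbf{h}_\alpha. \qquad \text{(A14)}$$

The vectors $\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3$ are orthonormal, and their signs may be chosen such that

$$\mathbf{h}_3 = \mathbf{h}_1 \times \mathbf{h}_2. \qquad \text{(A15)}$$

The corresponding orthonormal vectors $\mathbf{k}_1, \mathbf{k}_2, \mathbf{k}_3$ are determined uniquely from the $\mathbf{h}$ set through (A14) unless $D_3 = 0$, and if $|U| > 0$ they form a right-handed set with

$$\mathbf{k}_3 = \mathbf{k}_1 \times \mathbf{k}_2. \qquad \text{(A16)}$$

If $|U| < 0$ they are left-handed. Note that the matrix $\Omega^2$ has three paired eigenvalues $D_\alpha^2$, and that $\mathbf{h}_\alpha$, $\mathbf{k}_\alpha$ are, respectively, the eigenvectors of the positive definite matrices $UU'$ and $U'U$. The usual expansion of $\Omega$ in terms of its eigenvectors now shows that $U$ can be expressed uniquely in the form

$$U = \sum_\alpha \mathbf{h}_\alpha D_\alpha \mathbf{k}'_\alpha, \qquad \alpha = 1, 2, 3, \qquad \text{(A17)}$$

with $D_1 \ge D_2 \ge D_3 \ge 0$. When $|U| > 0$ the proper rotation

$$R = \mathbf{k}_1\mathbf{h}'_1 + \mathbf{k}_2\mathbf{h}'_2 + \mathbf{k}_3\mathbf{h}'_3, \qquad R_{ij} = \sum_\alpha k_{i\alpha} h_{j\alpha}, \qquad \text{(A18)}$$

converts $U$ into

$$V = \sum_\alpha \mathbf{k}_\alpha D_\alpha \mathbf{k}'_\alpha. \qquad \text{(A19)}$$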
Here $\mathbf{k}_\alpha$ is a column vector and $\mathbf{h}'_\alpha$ a row vector, so that each product $\mathbf{k}_\alpha \mathbf{h}'_\alpha$ is a $3 \times 3$ matrix. Notice that $V$ in (A19) is not merely symmetric, but will also be diagonal when referred to the $\mathbf{k}$ system of axes, with diagonal elements $V_{\alpha\alpha} = D_\alpha$. Hence

$$v = D_1 + D_2 + D_3. \qquad \text{(A20)}$$

The second derivative matrix $T$ will also be diagonal, with elements such as $T_{11} = D_2 + D_3$, which are all positive unless $U$ is of rank 1. Hence the given solution (A18) is stable. When $|U| < 0$ the best proper rotation is

$$R = \mathbf{k}_1\mathbf{h}'_1 + \mathbf{k}_2\mathbf{h}'_2 - \mathbf{k}_3\mathbf{h}'_3 \qquad \text{(A21)}$$

and $V$ has diagonal elements $D_1$, $D_2$, $-D_3$ in the $\mathbf{k}$ system, so that $v = D_1 + D_2 - D_3$. Notice that since $D_3$ has been chosen as the smallest eigenvalue, the diagonal elements of $T$, which are $(D_2 - D_3)$, $(D_1 - D_3)$ and $(D_1 + D_2)$, are non-negative, so that $E$ is a true minimum. The special cases which arise when $U$ is degenerate or singular can be treated as follows:
(1) For $|U|$ positive, equation (A18) gives a unique rotation $R$ whether or not $U$ is doubly or triply degenerate.
(2) For $|U|$ negative, equation (A21) gives a unique rotation unless $D_2 = D_3$.
(3) If $|U| < 0$ and $D_2 = D_3 \ne D_1$, then $R$ has one degree of freedom, since the pairs $(\mathbf{h}_2, \mathbf{k}_2)$ and $(\mathbf{h}_3, \mathbf{k}_3)$ are not unique. If $D_1 = D_2 = D_3$, then $\mathbf{h}_\alpha$ is arbitrary and $R$ has two degrees of freedom. In both cases $v = D_1$.
(4) When $U$ is singular of rank 2, it is always possible to choose $\mathbf{k}_3 = \mathbf{k}_1 \times \mathbf{k}_2$ and use equation (A18). $R$ is unique even if $U$ is degenerate, and $v = D_1 + D_2$ has a true minimum. (Rank 2 can occur if either set of atoms lies in one plane.)
(5) When $U$ is singular of rank 1, only the first pair of eigenvectors are unique. $R$ is any proper rotation which converts $\mathbf{h}_1$ into $\mathbf{k}_1$, and the axis of $R$ lies in the bisector plane of $\mathbf{h}_1$ and $\mathbf{k}_1$. Thus a rotation of 180° about $(\mathbf{h}_1 + \mathbf{k}_1)$ is suitable. (Rank 1 can occur if either set of atoms lies on a line.)
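The rules above can be followed literally in code. The following is a minimal sketch assuming NumPy. `omega_pairs` takes the eigenvectors of the 6 × 6 matrix Ω (A12) with `numpy.linalg.eigh` and splits them into h_α and k_α (A13). `best_rotation` fixes the signs by (A15) and applies (A18) or (A21); it covers special case (4) by completing both triads with cross products. `closed_form` evaluates (A11) for comparison when |U| > 0. The function names and the tolerance `eps` are illustrative assumptions, and a rank-1 U (case 5) is simply rejected.

```python
import numpy as np


def omega_pairs(U):
    """Eigen-analysis of Omega = [[0, U], [U', 0]], equations (A12)-(A14).

    Returns D (D1 >= D2 >= D3) and matrices H, K whose columns are the unit
    vectors h_alpha and k_alpha.
    """
    omega = np.block([[np.zeros((3, 3)), U], [U.T, np.zeros((3, 3))]])
    evals, evecs = np.linalg.eigh(omega)          # ascending: -D1, -D2, -D3, D3, D2, D1
    D = evals[::-1][:3].copy()
    top = evecs[:, ::-1][:, :3] * np.sqrt(2.0)    # omega_alpha = (h_alpha, k_alpha) / sqrt(2)
    return D, top[:3].copy(), top[3:].copy()


def best_rotation(U, eps=1e-12):
    """Best proper rotation by (A18) or (A21); returns (R, v). Needs rank(U) >= 2."""
    U = np.asarray(U, dtype=float)
    D, H, K = omega_pairs(U)
    if D[1] <= eps * D[0]:
        raise ValueError("U has rank 1: R is not unique (special case 5)")
    if D[2] <= eps * D[0]:
        # Special case (4): Omega does not define the third pair, so complete
        # both triads with cross products and use (A18).
        H[:, 2] = np.cross(H[:, 0], H[:, 1])
        K[:, 2] = np.cross(K[:, 0], K[:, 1])
        s = 1.0
    else:
        if np.linalg.det(H) < 0:                  # convention (A15): h3 = h1 x h2
            H[:, 2] *= -1.0
            K[:, 2] *= -1.0                       # flip the whole eigenvector omega_3
        s = 1.0 if np.linalg.det(U) > 0 else -1.0
    # R = k1 h1' + k2 h2' + s k3 h3', i.e. (A18) for s = +1 and (A21) for s = -1.
    R = K[:, :2] @ H[:, :2].T + s * np.outer(K[:, 2], H[:, 2])
    return R, D[0] + D[1] + s * D[2]


def closed_form(U):
    """(A11): R = (U'U)^(1/2) U^(-1), valid only when |U| > 0."""
    evals, evecs = np.linalg.eigh(U.T @ U)
    root = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    return root @ np.linalg.inv(U)
```

For a random U, passing both U and −U exercises both branches, since one of the two has a negative determinant. When |U| > 0, `best_rotation(U)[0]` agrees with `closed_form(U)`. When |U| < 0 the closed form would not give a proper rotation, which is why the text turns to the decomposition.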
To summarise this section, the steps required to find the best proper rotation are:
(a) Calculate the matrix $U$, equation (A5), and its determinant $|U|$.
(b) Calculate the eigenvalues $D_\alpha$ and eigenvectors $\mathbf{h}_\alpha$, $\mathbf{k}_\alpha$ from equations (A12) and (A13), choosing $\mathbf{h}_3 = \mathbf{h}_1 \times \mathbf{h}_2$ and with $\mathbf{k}_3$ fixed by $\mathbf{h}_3$.
(c) Determine $v = D_1 + D_2 \pm D_3$ and $R = \mathbf{k}_1\mathbf{h}'_1 + \mathbf{k}_2\mathbf{h}'_2 \pm \mathbf{k}_3\mathbf{h}'_3$ according to the rules above for the special cases.
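Steps (a) to (c), together with the quick route to the residual described in the next paragraph, can be sketched as a complete weighted fit. This is a minimal sketch assuming NumPy. It substitutes `numpy.linalg.svd` (a LAPACK singular-value decomposition) for the eigenvectors of Ω; the text allows this variation. It uses `numpy.linalg.eigvalsh` in place of solving the cubic |UU' − D²I| = 0. Coordinates are stored one atom per row, and the names `fit` and `residual_only` are illustrative.

```python
import numpy as np


def _prepare(a, b, w):
    """Weights, centred copies of a and b, their centroids, and U from (A5)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.ones(len(a)) if w is None else np.asarray(w, dtype=float)
    ca = w @ a / w.sum()                       # weighted centroids
    cb = w @ b / w.sum()
    a0, b0 = a - ca, b - cb
    U = (a0 * w[:, None]).T @ b0               # U_ij = sum_n w_n a_in b_jn
    return w, a0, b0, ca, cb, U


def fit(a, b, w=None):
    """Superpose points a (N x 3) onto guide points b (N x 3).

    Returns (R, t, E, rms) with b approximately a @ R.T + t, where
    E = 1/2 sum_n w_n |r_n - b_n|^2 as in (A2).
    """
    w, a0, b0, ca, cb, U = _prepare(a, b, w)   # step (a)
    H, D, Kt = np.linalg.svd(U)                # step (b): U = H diag(D) K'
    K = Kt.T.copy()
    H[:, 2] = np.cross(H[:, 0], H[:, 1])       # h3 = h1 x h2
    K[:, 2] = np.cross(K[:, 0], K[:, 1])       # k3 = k1 x k2: reverses k3 h3' when |U| < 0
    R = K @ H.T                                # step (c): R = sum_alpha k_alpha h_alpha'
    s = 1.0 if np.linalg.det(U) >= 0 else -1.0
    v = D[0] + D[1] + s * D[2]                 # v = D1 + D2 +/- D3
    E = 0.5 * np.sum(w * (np.sum(a0**2, axis=1) + np.sum(b0**2, axis=1))) - v   # (A9)
    E = max(float(E), 0.0)                     # clip tiny negative round-off
    t = cb - R @ ca
    return R, t, E, float(np.sqrt(2.0 * E / w.sum()))


def residual_only(a, b, w=None):
    """E from the eigenvalues D_alpha^2 of UU' alone, without computing R."""
    w, a0, b0, _, _, U = _prepare(a, b, w)
    D = np.sqrt(np.clip(np.linalg.eigvalsh(U @ U.T), 0.0, None))[::-1]
    s = 1.0 if np.linalg.det(U) >= 0 else -1.0
    v = D[0] + D[1] + s * D[2]
    return 0.5 * np.sum(w * (np.sum(a0**2, axis=1) + np.sum(b0**2, axis=1))) - v
```

Forming both triads as right-handed sets with cross products handles the sign rule of (A18) and (A21) automatically, and it also covers a rank-2 U as in special case (4). The r.m.s. distance returned is √(2E/Σw), which for unit weights is the ordinary r.m.s. distance over matched atoms.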
Sometimes only the value of the residual $E$ is required (normally with all the weights equal to $1/N$) and not the rotation $R$. Then a quick way to determine $v$ is to form the $3 \times 3$ matrix $UU'$ with eigenvalues $D_\alpha^2$ and solve the cubic eigenvalue equation $|UU' - D^2 I| = 0$ to determine the roots directly. A further variation is to calculate the decomposition of $U$, equation (A17), directly (Golub & Reinsch, 1971) without introducing the $\Omega$ matrix, using for example the Nottingham Algorithms Group program F01BHF.

I thank Dr Michael James for the use of his atomic co-ordinates for bacterial protease B and for comments on the alignment with chymotrypsin; Dr David Blow and Dr Brian Matthews for discussions about chymotrypsin and shape comparison methods; and Dr Robert Diamond and Dr Arthur Lesk for discussions of the mathematical aspects.
REFERENCES
Adman, E. T., Sieker, L. C. & Jensen, L. H. (1973). J. Biol. Chem. 248, 3987-3996.
Arnone, A., Bier, C. J., Cotton, F. A., Day, V. W., Hazen, E. E., Richardson, D. C., Richardson, J. S. & Yonath, A. (1971). J. Biol. Chem. 246, 2302-2316.
Banner, D. W., Bloomer, A. C., Petsko, G. A., Phillips, D. C., Pogson, C. I., Wilson, I. A., Corran, P. H., Furth, A. J., Milman, J. D., Offord, R. E., Priddle, J. D. & Waley, S. G. (1975). Nature (London), 255, 609-614.
Beale, D. & Feinstein, A. (1976). Quart. Rev. Biophys. 9, 135-180.
Bergsma, J., Hol, W. G. J., Jansonius, J. N., Kalk, K. H., Ploegman, J. H. & Smit, J. D. G. (1975). J. Mol. Biol. 98, 637-643.
Biesecker, G., Harris, J. I., Thierry, J. C., Walker, J. E. & Wonacott, A. J. (1977). Nature (London), 266, 328-333.
Birktoft, J. J. & Blow, D. M. (1972). J. Mol. Biol. 68, 187-240.
Birktoft, J. J., Blow, D. M., Henderson, R. & Steitz, T. A. (1970). Phil. Trans. Roy. Soc. ser. B, 257, 67-76.
Blake, C. C. F., Geisow, M. J., Swan, I. D. A., Rérat, C. & Rérat, B. (1974). J. Mol. Biol. 88, 1-12.
Blow, D. M. (1969). Biochem. J. 112, 261-268.
Blundell, T. L., Cutfield, J. F., Cutfield, S. M., Dodson, E. J., Dodson, G. G., Hodgkin, D. C., Mercola, D. A. & Vijayan, M. (1971). Nature (London), 231, 506-511.
Brändén, C.-I., Jörnvall, H., Eklund, H. & Furugren, B. (1975). In The Enzymes (Boyer, P. D., ed.), vol. 11, pp. 103-190, Academic Press, New York.
Brayer, G. D., Delbaere, L. T. J. & James, M. N. G. (1978). J. Mol. Biol. 124, 261-283.
Buehner, M., Ford, G. C., Moras, D., Olsen, K. W. & Rossmann, M. G. (1973). Proc. Nat. Acad. Sci., U.S.A. 70, 3052-3054.
Chothia, C. (1973). J. Mol. Biol. 75, 295-302.
Chothia, C., Levitt, M. & Richardson, D. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 4130-4134.
Colman, P. M., Jansonius, J. N. & Matthews, B. W. (1972). J. Mol. Biol. 70, 701-724.
Davie, E. W. & Fujikawa, K. (1975). Annu. Rev. Biochem. 44, 799-829.
De Haën, C., Neurath, H. & Teller, D. C. (1975). J. Mol. Biol. 92, 225-259.
Delbaere, L. T. J., Hutcheon, W. L. B., James, M. N. G. & Thiessen, W. E. (1975). Nature (London), 257, 758-763.
Diamond, R. (1966). Acta Crystallogr. 21, 253-266.
Diamond, R. (1976). Acta Crystallogr. sect. A, 32, 1-10.
Dickerson, R. E., Timkovich, R. & Almassy, R. J. (1976). J. Mol. Biol. 100, 473-491.
Di Scipio, R. G., Hermodson, M. A., Yates, S. G. & Davie, E. W. (1977). Biochemistry, 16, 698-706.
Eklund, H., Nordström, B., Zeppezauer, E., Söderlund, G., Ohlsson, I., Boiwe, T., Söderberg, B.-O., Tapia, O. & Brändén, C.-I. (1976). J. Mol. Biol. 102, 27-59.
Fehlhammer, H., Bode, W. & Huber, R. (1977). J. Mol. Biol. 111, 415-438.
Fermi, G. (1975). J. Mol. Biol. 97, 237-256.
Fitch, W. M. (1966). J. Mol. Biol. 16, 1-7, 8-16, 17-27.
Fitch, W. M. (1970). J. Mol. Biol. 49, 1-14, 15-21.
Fraser, R. D. B. & MacRae, T. P. (1973). Conformation in Fibrous Proteins. Academic Press, New York.
Freer, S. T., Kraut, J., Robertus, J. D., Wright, H. T. & Xuong, Ng. H. (1970). Biochemistry, 9, 1997-2009.
Fujikawa, K., Walsh, K. A. & Davie, E. W. (1977). Biochemistry, 16, 2270-2278.
Golub, G. & Reinsch, C. (1971). In Handbook for Automatic Computation II: Linear Algebra (Wilkinson, J. H. & Reinsch, C., eds), pp. 134-151. Springer, Berlin.
Haber, J. E. & Koshland, D. (1970). J. Mol. Biol. 50, 617-639.
Hardman, K. D. & Ainsworth, C. F. (1972). Biochemistry, 11, 4910-4918.
Hendrickson, W. A. & Love, W. E. (1971). Nature New Biol. 232, 197-203.
Huber, R., Epp, O., Steigemann, W. & Formanek, H. (1971). Eur. J. Biochem. 19, 42-50.
Imoto, T., Johnson, L. N., North, A. C. T., Phillips, D. C. & Rupley, J. A. (1972). In The Enzymes (Boyer, P. D., ed.), 3rd edit., vol. 7, pp. 665-868. Academic Press, New York.
James, M. N. G., Delbaere, L. T. J. & Brayer, G. D. (1978). Canad. J. Biochem. 49, 548-X%.
Johnson, P. & Smillie, L. B. (1972). Canad. J. Biochem. 50, 589-699.
Kabsch, W. (1976). Acta Crystallogr. sect. A, 32, 922-923.
Klotz, I. M., Langerman, N. R. & Darnall, D. W. (1970). Annu. Rev. Biochem. 39, 25-62.
Koide, T., Kato, H. & Davie, E. W. (1977). Biochemistry, 16, 2279-2286.
Kossiakoff, A. A., Chambers, J. L., Kay, L. M. & Stroud, R. M. (1977). Biochemistry, 16, 654-664.
Kretsinger, R. H. & Nockolds, C. E. (1973). J. Biol. Chem. 248, 3313-3326.
Ladner, R. C., Heidner, E. J. & Perutz, M. F. (1977). J. Mol. Biol. 114, 385-414.
Levine, M., Muirhead, H., Stammers, D. K. & Stuart, D. I. (1978). Nature (London), 271, 626-630.
Levitt, M. & Chothia, C. (1976). Nature (London), 261, 552-558.
Levitt, M. & Greer, J. (1977). J. Mol. Biol. 114, 181-240.
Mackay, A. L. (1977). Acta Crystallogr. sect. A, 33, 212-215.
Magnusson, S., Petersen, T. E., Sottrup-Jensen, L. & Claeys, H. (1975). In Proteases and Biological Control (Reich, E., Rifkin, D. B. & Shaw, E., eds), vol. 2, pp. 123-149. Cold Spring Harbor Laboratory, Cold Spring Harbor.
Magnusson, S., Sottrup-Jensen, L., Petersen, T. E., Dudek-Wojciechowska, G. & Claeys, H. (1976). In Proteolysis and Physiological Regulation, Miami Winter Symposia (Ribbons, D. W. & Brew, K., eds), vol. 11, pp. 203-238, Academic Press, New York.
Matthews, B. W. (1976). Annu. Rev. Phys. Chem. 27, 403-523.
Matthews, B. W. & Bernhard, S. A. (1973). Annu. Rev. Biophys. Bioeng. 2, 257-317.
Matthews, B. W., Sigler, P. B., Henderson, R. & Blow, D. M. (1967). Nature (London), 214, 652-656.
McLachlan, A. D. (1972a). J. Mol. Biol. 64, 417-437.
McLachlan, A. D. (1972b). Nature New Biol. 240, 83-85.
McLachlan, A. D. (1972c). Acta Crystallogr. sect. A, 28, 656-657.
McLachlan, A. D. (1977). Proc. 2nd Taniguchi Symposium on Biophysics. 7!17h’ (Motoo Kimura, ed.), National Institute of Genetics, Mishima, Japan.
McLachlan, A. D. & Shotton, D. M. (1971). Nature New Biol. 229, 202-205.
McLachlan, A. D. & Walker, J. E. (1977). J. Mol. Biol. 112, 543-558.
Monod, J., Wyman, J. & Changeux, J. P. (1965). J. Mol. Biol. 12, 88-118.
Muirhead, H., Cox, J. M., Mazzarella, L. & Perutz, M. F. (1967). J. Mol. Biol. 28, 117-156.
Nagano, K. (1977). J. Mol. Biol. 109, 235-250.
Needleman, S. B. & Wunsch, C. D. (1970). J. Mol. Biol. 48, 443-453.
Nishikawa, K. & Ooi, T. (1974). J. Theoret. Biol. 43, 351-471.
Olson, M. O. J., Nagabhushan, N., Dzwiniel, M., Smillie, L. B. & Whitaker, D. R. (1970). Nature (London), 228, 438-442.
Ooi, T. & Nishikawa, K. (1973). In Conformation of Biological Molecules and Polymers (Bergmann, E. D. & Pullman, B., eds), pp. 173-187, Academic Press, New York.
Padlan, E. A. & Love, W. E. (1974). J. Biol. Chem. 249, 4067-4078.
Penrose, R. (1955). Proc. Camb. Phil. Soc. 51, 406-413.
Ploegman, J. H., Drent, G., Kalk, K. H., Hol, W. G. J., Heinrikson, R. L., Keim, P., Weng, L. & Russell, J. (1978). Nature (London), 273, 124-129.
Poljak, R. J., Amzel, L. M. & Phizackerley, R. P. (1976). Progr. Biophys. Mol. Biol. 31, (ii X!).
Rao, S. T. & Rossmann, M. G. (1973). J. Mol. Biol. 76, 241-256.
Remington, S. J. & Matthews, B. W. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 2180-2184.
Remington, S. J., Anderson, W. F., Owen, J., Ten Eyck, L. F., Grainger, C. T. & Matthews, B. W. (1978). J. Mol. Biol. 118, 81-98.
Richards, F. M. (1977). Annu. Rev. Biophys. Bioeng. 6, 151-176.
Richardson, J. S. (1977). Nature (London), 268, 495-500.
Richardson, J. S., Thomas, K. A., Rubin, B. H. & Richardson, D. C. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 1349-1353.
Richardson, J. S., Richardson, D. C., Thomas, K. A., Silverton, E. W. & Davies, D. R. (1976). J. Mol. Biol. 102, 221-235.
Rossmann, M. G. & Argos, P. (1975). J. Biol. Chem. 250, 7525-7532.
Rossmann, M. G. & Argos, P. (1976). J. Mol. Biol. 105, 75-96.
Rossmann, M. G. & Argos, P. (1977). J. Mol. Biol. 109, 99-129.
Rossmann, M. G. & Liljas, A. (1974). J. Mol. Biol. 85, 177-181.
Rossmann, M. G., Moras, D. & Olsen, K. W. (1974). Nature (London), 250, 194-199.
Rossmann, M. G., Liljas, A., Brändén, C.-I. & Banaszak, L. J. (1975). In The Enzymes (Boyer, P. D., ed.), vol. 11, pp. 61-102, Academic Press, New York.
Sawyer, L., Shotton, D. M., Campbell, J. W., Wendell, P. L., Muirhead, H., Watson, H. C., Diamond, R. & Ladner, R. C. (1978). J. Mol. Biol. 118, 137-208.
Schulz, G. E. & Schirmer, R. H. (1974). Nature (London), 250, 142-164.
Shotton, D. M. & Hartley, B. S. (1970). Nature (London), 225, 802-806.
Shotton, D. M. & Watson, H. C. (1970). Nature (London), 225, 811-816.
Sigler, P. B., Blow, D. M., Matthews, B. W. & Henderson, R. (1968). J. Mol. Biol. 35, 143-164.
Steitz, T. A., Fletterick, R. J., Anderson, W. F. & Anderson, C. M. (1976). J. Mol. Biol. 104, 197-222.
Sternberg, M. J. E. & Thornton, J. M. (1976). J. Mol. Biol. 105, 367-382.
Sternberg, M. J. E. & Thornton, J. M. (1977). J. Mol. Biol. 110, 269-283.
Sternberg, M. J. E. & Thornton, J. M. (1978). Nature (London), 271, 15-20.
Stroud, R. M., Kay, L. M. & Dickerson, R. E. (1974). J. Mol. Biol. 83, 185-208.
Sweet, R. M., Wright, H. T., Janin, J., Chothia, C. H. & Blow, D. M. (1974). Biochemistry, 13, 4212-4228.
Tang, J., James, M. N. G., Hsu, I. N., Jenkins, J. A. & Blundell, T. L. (1978). Nature (London), 271, 618-621.
Tufty, R. M. & Kretsinger, R. H. (1975). Science, 187, 167-169.
Tulinsky, A., Vandlen, R. L., Morimoto, C. N., Mani, N. V. & Wright, L. H. (1973). Biochemistry, 12, 4185-4192.
Watenpaugh, K. D., Sieker, L. C., Herriott, J. R. & Jensen, L. H. (1973). Acta Crystallogr. sect. B, 29, 943-956.
Weeds, A. G. & McLachlan, A. D. (1974). Nature (London), 252, 646-649.