J. Mol. Biol. (1985) 182, 317-329
Thiol Proteases
Comparative Studies Based on the High-resolution Structures of Papain and Actinidin, and on Amino Acid Sequence Information for Cathepsins B and H, and Stem Bromelain

I. G. Kamphuis
Energie Onderzoek Centrum Nederland (ECN)
Postbus 1, 1755 ZG Petten, The Netherlands

J. Drenth
Laboratory of Chemical Physics, University of Groningen
Nijenborgh 16, 9747 AG Groningen, The Netherlands

and E. N. Baker
Department of Chemistry, Biochemistry and Biophysics
Massey University, Palmerston North, New Zealand

(Received 28 August 1984, and in revised form 18 November 1984)
An accurate three-dimensional structure is known for papain (1.65 Å resolution) and actinidin (1.7 Å). A detailed comparison of these two structures was performed to determine the effect of amino acid changes on the conformation. It appeared that, despite only 48% identity in their amino acid sequence, different crystallization conditions and different X-ray data collection techniques, their structures are surprisingly similar, with a root-mean-square difference of 0.40 Å between 76% of the main-chain atoms (differences <3σ). Insertions and deletions cause larger differences, but they alter the conformation over a very limited range of two to three residues only. Conformations of identical side-chains are generally retained to the same extent as the main-chain conformation. If they do change, this is due to a modified local environment. Several examples are described. Spatial positions of hydrogen bonds are conserved to a greater extent than are the specific groups involved. The greatest structural similarity is found for the active site residues of papain and actinidin, for the internal water molecules and for the main-chain conformation of residues in α-helices and anti-parallel β-sheet structure. This was reflected also in the similarity of the temperature factors. It suggests that the secondary structural elements form the skeleton of the molecule and that their interaction is the main factor in directing the fold of the polypeptide chain. Therefore, substitution of residues in the skeleton will, in general, have the most drastic effect on the conformation of the protein molecule. In papain and actinidin, some main-chain-side-chain hydrogen bonds are also strongly conserved and these may determine the folding of non-repetitive parts of the structure. Furthermore, we included primary structure information for three homologous thiol proteases: stem bromelain, and the cathepsins B and H. By combining the three-dimensional structural information for papain and actinidin with sequence homologies and identities, we conclude that the overall folding pattern of the polypeptide chain is grossly the same in all five proteases, and that they utilize the same catalytic mechanism.
© 1985 Academic Press Inc. (London) Ltd.

1. Introduction

Papain and actinidin are related thiol proteases, present in the fruits of the papaya tree (Carica papaya) and the Chinese gooseberry or kiwi fruit (Actinidia chinensis), respectively. Their molecular structures have been refined to a resolution of 1.65 Å for papain (Kamphuis et al., 1984) and 1.7 Å
for actinidin (Baker, 1980). A detailed comparison of the two structures demonstrates the effect on the conformation of the protein of differences in amino acid sequence, in crystallization conditions and in X-ray data collection and refinement techniques. These differences are appreciable. (1) Only 48% identity exists between the sequences. (2) Crystallization is from 67% (v/v) methanol for papain at pH 9.2 and from 20% (w/v) saturated ammonium sulfate at pH 6.0 for actinidin. (3) X-ray data collection was done with the oscillation technique for papain and by diffractometer for actinidin. (4) Finally, the structure refinement was carried out by the restrained least-squares method for papain and by unrestrained least-squares procedures alternated with regularization for actinidin.

The only other three-dimensional structure of a thiol protease published is that of Calotropin DI (Heinemann et al., 1982). The structure of this protein is related to that of papain, but is known only to 3.2 Å resolution and no sequence information is available. For no other thiol protease is secondary or tertiary structure information known, although the primary structures of some complete chains or fragments have been determined (Drenth, 1976; Goto et al., 1976, 1980; Lynn & Yaguchi, 1979; Lynn et al., 1980). A streptococcal proteinase has been sequenced by Tai & Liu (1976). Mammalian thiol proteases have been found in liver and spleen lysosomes, but they occur also outside lysosomes, e.g. in plasma (Dayton et al., 1976) and in nuclei (Tsurugi & Oyata, 1980). The amino acid sequences of the lysosomal thiol proteases cathepsins B and H have been determined by Takio et al. (1983), who pointed out the homology of the cathepsins with papain. Their relationship is suggested also by the similar reactions of the cathepsins and papain with high molecular weight inhibitors, detected in the cell (Kominami et al., 1982).
It is remarkable to find such an appreciable homology between plant and mammalian sulfhydryl proteases. In this paper, we present a comparison of the refined structures of papain and actinidin. This comparison is discussed in some detail because knowledge about the variability of a protein conformation is important for mutagenesis experiments aimed at modifying the properties of a protein molecule. On the basis of this experimental evidence, we then compare the primary structures of five thiol proteases with reference to the three-dimensional structures of papain and actinidin.
2. Comparison of the Papain and Actinidin Three-dimensional Structures

(a) Co-ordinates and temperature factors

A preliminary account of the comparative work on the actinidin (1.70 Å resolution) and papain (2.5 Å resolution) structures was given by Baker (1981). Now that the papain structure has been refined to 1.65 Å and individual atomic temperature factors and the positions of solvent molecules are available (Kamphuis et al., 1984), a detailed study is possible.

The actinidin polypeptide chain has 220 residues, of which only the first 218 are visible in high-resolution maps (Baker, 1980). Actinidin has three insertions and one deletion compared with the papain sequence. The insertions are: (1) Thr59 and Gln60 at the end of the second helix, LII (50-57)†; (2) Gly81 at the end of the third helix, LIII; (3) Thr171, Glu172, Gly173 and Gly174 in an extended piece of structure connecting two antiparallel β-strands. The only extra residue in papain is Gly194.

Superposition of co-ordinates was performed by the method of Rao & Rossmann (1973). Table 1 gives the results of this analysis, while Figure 1 shows both Cα backbones after optimal superposition. Separate calculations were made for atoms of the first domain (L, residues 10 to 111) and of the second domain (R, residues 1 to 9 and 112 to 206). The degree of three-dimensional similarity can be judged by two criteria: (1) the root-mean-square difference in co-ordinates after the last cycle, and (2) the fraction of atoms remaining in the last cycle. Using these criteria, it can be seen that the inclusion of identical side-chain atoms besides the main-chain atoms hardly affects the results. This means that side-chain and main-chain conformations of identical residues are conserved to the same extent. The R domains of both proteins, whose secondary structure is mainly β-sheet, are more closely similar than the L domains. The L domain includes most of the poorly conserved “middle region” (82 to 116), however, and if only the helices in this domain are used in the superposition, it appears that their relative orientation is conserved to approximately the same extent as that of the anti-parallel β-sheet structure.
Thus, the relative orientation of α-helices in the L domain and of β-strands in the R domain are among the most-conserved features of both structures and few mutations are allowed in these regions. It can be seen from the data in Figure 2 that where equivalent atoms agree more closely in position this tends to be accompanied by lower B-values for individual atoms and by a smaller root-mean-square difference in B-value. Residues near the active site and those actually involved in the catalytic process superimpose to an extent that approaches the atomic co-ordinate accuracy of both structures. As noted by Baker et al. (1980), the changes in the active site that do occur take place in the hydrophobic specificity “pocket”, the S2 subsite. In view of this high degree of similarity in active site residue positions, it is difficult to envisage a difference in catalytic mechanism for the two enzymes, as inferred from studies with 2-pyridyl disulfide probes (Brocklehurst et al., 1979, 1981). The similarity extends to the position of the Asp158 carboxylate group, which is far removed from the His159 imidazole ring. The shortest distance in papain between hydrogen-bond donors and acceptors in these two residues is 5.46 Å, for His159-Nδ1 to Asp158-Oδ. A two-state mechanism (Angelides & Fink, 1978) with a high pH, inactive UP-conformation and a low pH, active DOWN-conformation, in which the side-chains of Asp158 and His159 interact, must therefore be considered unlikely: Asp158 and His159 are in the UP conformation in both proteins, even though actinidin was crystallized at pH 6.

The patterns displayed by the B-factors (Fig. 2) are grossly the same for both proteins, except in the insertions and the neighborhood of deletions. Deviations occur also at positions 21 and 97 to 101. The former may be due to improper refinement, but this is difficult to judge. The latter differences correspond with the site of the largest conformational deviation between the two proteins. It is in the “middle part” of the polypeptide chain, where sequence homology is poor, and where most mutations involving acidic and basic residues occur.

† The actinidin numbering is in italics. LII refers to the second helix in the N-terminal domain L. For the nomenclature, see Kamphuis et al. (1984).

Figure 1. Backbone superposition of papain (heavy lines) and actinidin.
Table 1
Superposition of the papain and actinidin three-dimensional structures

Superposition   No. of        Fraction of atoms    r.m.s.Δ   B_act   B_pap   r.m.s.ΔB
type†           atoms         in the last cycle‡   (Å)       (Å²)    (Å²)    (Å²)

A. Whole molecule
  I             638           0.76                 0.402      9.06   11.41   5.09
  II            888           0.73                 0.395§     9.37   11.12   4.87
B. First domain (10-111)
  I             255           0.63                 0.350      7.97   10.48   5.03
  II            368           0.67                 0.383      8.22   10.65   4.98
C. Second domain (1-9; 112-206)
  I             307           0.75                 0.273      9.86   10.80   4.29
  II            450           0.73                 0.273     10.13   11.23   4.57
D. Helices in the first domain (24-42; 50-57; 67-78)
  I             [illegible]‖  0.68                 0.278      7.07    8.93   4.21
E. Active site residues (19; 23-25; 65-69; 133; 156-160; 175; 177; 205-207)
  II            86            0.69                 0.294      9.72   10.81   3.93

† The superpositions are: I, main-chain only (Cα, C, O, N); II, main-chain and side-chain atoms identical in the sequence.
‡ The final solution was approached in cycles. Atoms deviating more than 3σ at the end of a cycle were excluded in the next cycle. This criterion was used as a filtering factor for bad structural homologies. Usually no atoms were removed in the last cycle of this procedure (normally cycle 7 or 8). Each cycle was found to give convergence in less than 10 iteration rounds.
§ The optimal rotation matrix and translation vector transform x_ACT, y_ACT, z_ACT, the Cartesian atomic co-ordinates (Å) of actinidin in the crystal structure, into x'_ACT, y'_ACT, z'_ACT, the atomic co-ordinates (Å) at optimal superposition. [The numerical values of the matrix and vector are illegible in this copy.]
‖ Almost all atoms of helix LIII deleted.
r.m.s., root-mean-square.
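The cyclic filtering described in the footnote to Table 1 can be illustrated with a short Python sketch. It assumes that the equivalent atoms are already paired and superposed, takes σ to be the current r.m.s. difference (an assumption; the footnote does not define σ), and omits the re-fitting of the rotation and translation that the procedure of Rao & Rossmann (1973) performs in every cycle. The function names and the toy co-ordinates are hypothetical.

```python
import math

def rms_distance(pairs):
    """Root-mean-square distance over a list of (atom_a, atom_b) pairs."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in pairs) / len(pairs))

def filter_cycles(pairs, n_sigma=3.0, max_cycles=10):
    """Repeatedly drop pairs that deviate by more than n_sigma * r.m.s.

    pairs: list of ((x, y, z), (x, y, z)) tuples for equivalent atoms,
    assumed to be superposed already (no re-fitting is done here).
    Returns (kept_pairs, rms, fraction_kept).
    """
    kept = list(pairs)
    for _ in range(max_cycles):
        limit = n_sigma * rms_distance(kept)
        survivors = [(p, q) for p, q in kept if math.dist(p, q) <= limit]
        if len(survivors) == len(kept):  # nothing removed in this cycle
            break
        kept = survivors
    return kept, rms_distance(kept), len(kept) / len(pairs)

# Toy data (hypothetical): 20 well-matched atoms and one outlier 5 A away.
toy = [((float(i), 0.0, 0.0), (i + 0.1, 0.0, 0.0)) for i in range(20)]
toy.append(((50.0, 0.0, 0.0), (55.0, 0.0, 0.0)))
kept, rms, fraction = filter_cycles(toy)
print(len(kept), round(rms, 3), round(fraction, 2))
```

The two returned quantities, the final r.m.s. difference and the fraction of atoms kept, correspond to the two similarity criteria used in Table 1.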
Figure 2. The average B-factor (Å²) for main-chain atoms as a function of residue number. Papain is shown by the dotted line and actinidin by the continuous line. The difference ΔB between the papain and actinidin B-factors is indicated by the broken line. The average difference in position for corresponding atoms is depicted in the lower graph. The y scale of the latter graph is shown on the right. The vertical lines indicate the position of insertions and deletions.

These charge-changing mutations are responsible for the difference in isoelectric point of the two proteins (9.3 for papain, 3.1 for actinidin). The level of the B-values is, in general, higher for papain than for actinidin. This is true for both the main-chain and the side-chains. At positions where ΔR, the distance between equivalent atoms in both structures, is large, the B-factors for papain atoms are usually higher than those for actinidin. The B-factors were compared in a quantitative way as follows. The displacement ⟨u⟩ = √(B/8π²) was averaged per residue over the main-chain atoms. These ⟨u⟩ values were plotted as a function of the distance d of the residues to a central point P in the molecule (Kamphuis et al., 1984). The position of P was varied. Calculated values for ⟨u⟩ were obtained with the linear function ⟨u⟩ = ad + b. For papain and actinidin, maximum correlation between the observed and calculated values for ⟨u⟩ was found for a point near the molecular centroid: for papain, with a correlation coefficient of 0.777 and ⟨u⟩ = 0.012d + 0.197; for actinidin, with a correlation coefficient of 0.704 and ⟨u⟩ = 0.008d + 0.213. The distance-dependent term is primarily due to internal protein disorder. It is 50% higher for papain than for actinidin. Although the absolute level of B-values is not as reliable as their relative values, this difference between papain and actinidin could be caused by the difference in crystallization medium. A similar solvent effect has been found by Singh et al. (1980) for bovine trypsinogen at various concentrations of methanol.
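The B-factor analysis described here can be sketched in Python. The conversion ⟨u⟩ = √(B/8π²) and the linear model ⟨u⟩ = ad + b are taken from the text; the function names and the sample points are hypothetical, and an ordinary least-squares fit is assumed, since the fitting method is not stated.

```python
import math

def displacement(b_factor):
    """Displacement <u> (A) from an isotropic B-factor (A^2): sqrt(B / (8 pi^2))."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

def fit_line(d, u):
    """Ordinary least-squares fit u = a*d + b; returns (a, b, correlation)."""
    n = len(d)
    mean_d, mean_u = sum(d) / n, sum(u) / n
    s_dd = sum((x - mean_d) ** 2 for x in d)
    s_uu = sum((y - mean_u) ** 2 for y in u)
    s_du = sum((x - mean_d) * (y - mean_u) for x, y in zip(d, u))
    a = s_du / s_dd
    b = mean_u - a * mean_d
    r = s_du / math.sqrt(s_dd * s_uu)
    return a, b, r

# Hypothetical per-residue values lying exactly on the line reported for papain.
distances = [0.0, 5.0, 10.0, 15.0, 20.0]
u_values = [0.012 * x + 0.197 for x in distances]
a, b, r = fit_line(distances, u_values)
print(round(a, 3), round(b, 3), round(r, 3))

# Ratio of the distance-dependent slopes reported for papain and actinidin.
slope_ratio = 0.012 / 0.008
print(round(slope_ratio, 2))
```

The slope ratio of 1.5 is the "50% higher" distance-dependent term quoted for papain; in a real analysis the fit would be repeated for each trial position of the central point P and the position giving the highest correlation retained.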
The constant term b, which is mainly due to packing disorder, shows a comparatively small difference between the two proteins.

(b) Conformational angles

(i) Main-chain conformation
Local changes in related protein structures can be detected easily from a comparative study of conformational angles. Figure 3 is a plot of the difference in the dihedral angles φ and ψ. A strong similarity can be observed for the active site helix, LI (24-42). The largest differences are present near insertions and deletions. These main-chain-induced differences are spread out over just a few residues in the vicinity of the insertions/deletions. A change in ψ is often followed by a change in φ of the next residue, folding the chain back.

The largest change in backbone conformation not coupled to an insertion/deletion comprises residues 96 to 107 (in particular, residues 97 to 101) and is side-chain-induced. As noted earlier, this segment, which lies exposed on the protein surface, has virtually no homology in sequence. The reason for the conformational change has been noted (Baker, 1981): in papain, Ser97-Oγ points inwards to form a hydrogen bond to the carboxylate of Glu52 (d = 2.81 Å) and to W52 (2.54 Å). This water molecule is also hydrogen-bonded to Tyr103-OH, a buried residue. In actinidin, Ser97 is replaced by Val100, and Tyr103 by Lys106, both with exposed side-chains. Interactions as in papain are thus not possible in actinidin, resulting in an altered main-chain conformation (see Fig. 1). Another interaction in this region that may disturb the conformation occurs between the side-chains of Arg98 and Glu99 in papain. In actinidin, these residues are substituted by Ala101 and Leu102. Finally, the assumed ammonium ion in actinidin, W4, is located in this area (bound to 103-O, 104-O and the carboxylate of Glu52).

Fifteen glycyl residues in identical sequence positions fall outside the allowed regions of the φ/ψ plot (Ramakrishnan & Ramachandran, 1965), but are at approximately the same positions in this plot for papain and actinidin; namely, in the region allowed for D-amino acids. This means that the molecule does not, or cannot, seize the opportunity to change the conformation of the main-chain at the position of these glycyl residues.

Figure 3. The absolute values, |Δφ| and |Δψ|, of the differences in backbone conformational angles φ and ψ for papain and actinidin as a function of residue number. The positions of insertions and deletions in the sequences are indicated by arrows. The overall mean value for |Δφ| is 14.5° and for |Δψ| it is 18.7°. If values larger than 40° are deleted, then |Δφ| = 8.3° (σ = 0.5°) and |Δψ| = 9.3° (σ = 0.6°).

(ii) Side-chain conformations

Differences in side-chain conformations can be detected by plots of |Δχ1| and |Δχ2| (Fig. 4). Only identical residues are included in this plot, as non-identical residues, especially in the case of branched side-chains, are not directly comparable. It may be judged from Figure 4 that the conformations of identical side-chains are generally retained to the same extent (i.e. |Δχ1| and |Δχ2| < 40°) as the main-chain conformation. Exceptions arise for external side-chains that are fairly flexible (e.g. Tyr4 and Gln73) or for surface or internal side-chains whose
environment has changed as a result of nearby sequence changes. Examples of the latter include:

(1) Lys17, in the interface between the two domains, has a different conformation. In papain, Nζ forms hydrogen bonds to Thr14-Oγ1, Pro15-O, Gln47-Oε1 and W6; while in actinidin, interactions are with Glu35-Oε, Glu50-Oε and Glu86-Oε. The Nζ-Oγ1 interaction in papain is not possible in actinidin because Thr14 is changed to Val14. Moreover, Glu86 in actinidin is Arg83 in papain. Thus, the environment of the Lys17 side-chain is different. At the position of 17-Nζ in actinidin, however, we find a water molecule, W6, in papain. The result is that only minor rearrangements of the buried Glu35 and Glu50 and their associated water molecules are necessary to adapt to the changed situation. Most importantly, the network of hydrogen bonds in the domain-domain interface remains essentially unchanged, since those made by 17-Nζ in actinidin are instead made by W6 in papain (see Fig. 5(a)).

(2) The exposed Arg58 side-chains in the two proteins have a different conformation, due primarily to the presence of a nearby insertion of two residues in actinidin. The position of this insertion is occupied in papain by the side-chain of Tyr61.

(3) An example of a concerted change in conformation can be found in the 120° rotation in χ1 for the buried Ser131 side-chain. In papain, Oγ binds to a cluster of water molecules, which also make specific interactions with the main-chain atoms of Ser205 and Ser206. Water molecules are not present in the same place in actinidin, because Ser205 and Ser206 become Met211 and Pro212, with a consequent slight change in the main-chain conformation and in the hydrogen-bonding environment. A hydrogen bond from a rotated Ser134-Oγ to a slightly displaced Thr33-Oγ1 then seems preferable.

Figure 4. The absolute values, |Δχ1| and |Δχ2|, of the differences in side-chain conformational angles χ1 and χ2 for identical residues in papain and actinidin as a function of residue number. The 1-letter code is used to identify the amino acid residues.

(4) Several side-chains differ substantially in χ1 and χ2, yet occupy similar spatial positions. For Lys211, changes in χ1 are partially compensated for by changes in other conformational angles, so that the Nζ atoms are not very far apart. Nζ forms hydrogen bonds to the same main-chain atoms (106-O and 109-O) and to a water molecule but, as noted before, the main-chain conformation is somewhat different in this region, and this may be the reason for slightly different side-chain conformations for Lys211. For Arg191, as for Lys211, Δχ1 is fairly large, but rotations about subsequent bonds in the side-chain result in rather close spatial positions of the guanidino group in both proteins. The environment and orientation of this group, however, are different in the two proteins as a result of amino acid replacements relatively distant in the amino acid sequence, but close in the three-dimensional structures, and because of the close approach of a neighboring molecule in papain (Arg191 forms an intermolecular salt bridge with Asp140). In papain, Arg191-Nε forms a hydrogen bond with Gly167-O, whereas in actinidin Nη forms a hydrogen bond to the equivalent Gly170-O. In
papain, 191-Nη is connected by a hydrogen bond to Oε1 of Gln118, which in turn is connected, through Nε2, to the internal Tyr203-Oη. In actinidin, Tyr203 becomes Ala209 and Gln118 becomes Glu121. The carboxyl group of Glu121 moves in to take the position occupied in papain by Tyr203-Oη (see Fig. 5(b)). The guanidino group of Arg198 in actinidin then cannot form a hydrogen bond to Glu121. Instead, the Nη atom is bound via water molecule W10 to Asn199-Oδ1 (Gly in papain).

(5) In actinidin, the side-chain of Thr210, which is partially exposed, forms a hydrogen bond to Asn119-Oδ1. In papain, the corresponding residue is Tyr116, and Thr204-Oγ1 is now involved in a hydrogen bond to Gly201-O. The Thr atoms Oγ1 and Cγ2 almost exchange positions.

Figure 5. Examples of the effects of sequence changes on the structures of actinidin and papain. Residues between parentheses are in papain. In (a) the conformation of Lys17 is different in papain (compared with actinidin) because 17-Nζ is hydrogen-bonded to Oγ1 of Thr14 (Val in actinidin). However, water molecule W6 fills the position occupied by 17-Nζ in actinidin, so that the hydrogen-bonding network is conserved. Note also the conserved internal water molecules. The effect of substituting the internal Tyr203 by Ala is seen in (b). The position occupied in papain by Tyr203-Oη is filled in actinidin by Oε of Glu121 (Gln118 in papain), such that hydrogen bonding to an internal water molecule remains unchanged.

(c) Other structural features

Details of the hydrogen bonding in actinidin and papain have been given elsewhere (Baker, 1980; Kamphuis et al., 1984). There is a distinct pattern in the extent to which hydrogen bonds are conserved in the two proteins. Papain has 93 main-chain-main-chain hydrogen bonds and actinidin has 101, with the small difference arising mainly from the extra residues in actinidin. A total of 90 of these 93 hydrogen bonds are common to both proteins, i.e. almost all (97%) of the main-chain-main-chain hydrogen bonds in papain occur also in actinidin. This generally reflects the very strong conservation of secondary structures. With respect to side-chain-main-chain hydrogen bonds, papain has 54 and actinidin has 56, and the majority (38, or 70% of the total) are also common to both proteins. Furthermore, in this group there are an additional six cases where the loss of a hydrogen bond in one protein (e.g. due to an amino acid sequence change) is compensated for by the appearance of a new hydrogen bond involving a neighboring residue. Thus, the overall degree of conservation of main-chain-side-chain hydrogen bonds is of the order of 80%. Many of these stabilize regions without secondary structure (e.g. external loops). Finally, there is a much smaller conservation of side-chain-side-chain hydrogen bonds, because so many of the external polar side-chains are changed. Papain has 31 such hydrogen bonds, actinidin has 27 and only nine (about 30%) are common to both.

In general, the spatial positions of hydrogen bonds are conserved more than are the specific groups involved. Examples are given by the internal water molecule W6 in papain, which fulfils the hydrogen-bonding role of Lys17-Nζ in actinidin, and by Glu121-Oε in actinidin, which takes the position of Tyr203-Oη in papain (changed to Ala209 in actinidin) and so preserves the hydrogen bonding to an internal water molecule (see Fig. 5(b)).

A number of changes are found for residues in the hydrophobic cores of both domains. This may be
attributable to imperfect packing of the side-chains, which leaves small “holes” (Baker, 1981). Where a hydrophobic residue is identical in the sequence, its side-chain conformation is approximately the same in every case, except Ile189, where a rotation of 120° takes place, such that one Cγ atom of Ile189 superimposes on the other Cγ atom of Ile196. Examples of concerted changes involving pairs of residues in van der Waals' contact are Val32 ... Ala162 to Ala32 ... Val165 and Ile34 ... Val75 to Val34 ... Ile77. In most cases, however, the adaptations are more complex and involve several residues, as has been observed in comparative studies of the immunoglobulins (Lesk & Chothia, 1982). The packing efficiency appears to be approximately the same in both proteins. For instance, papain has 244 non-bonded C-C distances of less than 3.80 Å and actinidin has 238. Another feature probably related to the similar hydrophobic cores in the two proteins is that the central α-helix LI (24-42) in both shows a similar curvature as it packs over the core of the L domain. There is a break in the α-helical hydrogen bonding at 29-O in both proteins.

Some amino acid substitutions affect buried charges. Examples are:

(1) Arg83 ↔ Glu86: in both enzymes, the side-chain is bound to water molecules with low B-factors: 83-Nη and 86-Oε both interact with Glu50-Oε, while a Glu86 carboxylate oxygen also interacts with Lys17-Nζ which, as noted before, has a different conformation in the two proteins.

(2) Ser70 ↔ Asp72: Ser70-Oγ interacts with the Asp57 carboxylate group (Gly57 in actinidin). Asp72 in actinidin is attached to water molecules and to Thr62-Oγ1. The latter residue is close to an insertion in actinidin.

(3) Gln118 ↔ Glu121: as noted earlier, the carboxylate of Glu121 occupies the position of the internal Tyr203-Oη in papain (Ala209 in actinidin), where it interacts with the internal water molecule W16 and with Nε1 of the internal Trp178 side-chain.

Most amino acid replacements that involve a large change in shape and/or charge of a side-chain are found at the exterior of the protein, most prominently near insertions and deletions in the amino acid sequence. The best-conserved residue types are Cys (100%), Trp (80%), Phe (75%) and Gly (71%). Apart from the acidic and basic residues that give rise to the different isoelectric behavior of the proteins, Asn (conservation rate 15%), Gln (31%), Ser (29%) and Thr (38%) are least well-conserved. Met occurs twice in the actinidin sequence but is not present in papain. The space of Met194 is partly filled by Ile187. The Met211 ↔ Ser205 replacement, however, results in different surface characteristics of the S2 binding site, explaining the preferential binding of aromatic side-chains in papain to this subsite (Baker et al., 1980).

(d) The solvent structures in papain and actinidin

There are 272 water molecules in the refined actinidin structure (Baker, 1980). In the papain structure, the positions of 195 water and 29 methanol sites have been determined (Kamphuis et al., 1984). The total number of hydrogen bonds made by solvent molecules differs substantially for the two proteins. In actinidin, 185 main-chain-solvent, 125 side-chain-solvent and 195 solvent-solvent interactions shorter than 3.3 Å are found. In papain, the corresponding numbers are 122, 124 and 93. Thus, the solvent around actinidin appears to be better ordered. This is not surprising, as the crystallization medium was quite different for the two proteins.

Figure 6 shows the number of solvent molecules structurally identical within 1.5 Å, as a function of the B-value. It can be concluded that solvent structure identity is largest for low B-factor water molecules. The apparent rise in the number of solvent molecules in identical positions after number 140 can be explained by an extra dip in the curve in the region 60 to 140 in both structures, due to the presence of intermolecular water molecules with low B-factors. These water molecules cannot show any correspondence in position, due to a different crystal packing. Some water molecules in surface pockets in actinidin are replaced by methanol molecules in papain (the shaded area in Fig. 6). In these cases, the temperature factor of the methanol oxygen atom is substantially larger than the corresponding water temperature factor.

Table 2 lists the 15 internal water molecules found in both the papain and actinidin structures. The root-mean-square difference in position and B-value for these molecules is comparable to the low values found for other structure elements. Only two internal “solvent” molecules present in actinidin are not found in papain. One of these is proposed to be an ammonium ion in actinidin. In papain, there is in any case no space for a water molecule in this position, because of the presence of the partially buried Tyr103 aromatic ring. The corresponding residue in actinidin is an exposed lysine. The other is the internal water W5 in actinidin, which in fact lies only 0.49 Å from the position of the oxygen of methanol 23 in papain.

Table 2
Internal water molecules in nearly identical positions (within 1 Å) in actinidin and papain

Actinidin   B (Å²)   Papain   B (Å²)   ΔR (Å)   B_act - B_pap (Å²)
W1           7.24    W11      10.46    0.71     -2.22
W2           7.94    W3        8.13    0.52     -0.19
W3           7.98    W4        9.04    0.39     -1.06
W6           8.89    W16      11.92    0.29     -3.03
W7           9.29    W1        6.65    0.45     +2.64
W8           9.39    W2        7.19    0.19     +2.20
W9           9.65    W7        9.60    0.30     +0.05
W13         10.36    W17      12.18    0.46     -2.74
W14         10.63    W8       10.09    0.13     +0.55
W15         10.90    W19      13.12    0.34     -2.22
W16         11.05    W27      15.33    0.52     -4.28
W17         11.11    W5        9.10    0.42     +2.01
W19         12.04    W28      15.91    0.58     -3.87
W23         12.94    W14      10.95    0.99     +1.95
W26         14.17    W13      10.81    0.16     +3.36

r.m.s. ΔR = 0.481 Å; r.m.s. ΔB = 2.48 Å².
r.m.s., root-mean-square.
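As a worked check, the two r.m.s. values quoted under Table 2 can be recomputed from the ΔR and B-difference columns as printed. The Python sketch below uses only the numbers in the table; the variable and function names are hypothetical.

```python
import math

# Columns of Table 2 as printed: positional difference (A) and
# B-value difference, actinidin minus papain (A^2).
delta_r = [0.71, 0.52, 0.39, 0.29, 0.45, 0.19, 0.30, 0.46,
           0.13, 0.34, 0.52, 0.42, 0.58, 0.99, 0.16]
delta_b = [-2.22, -0.19, -1.06, -3.03, 2.64, 2.20, 0.05, -2.74,
           0.55, -2.22, -4.28, 2.01, -3.87, 1.95, 3.36]

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

rms_dr = rms(delta_r)
rms_db = rms(delta_b)
print(round(rms_dr, 3), round(rms_db, 2))
```

Both results agree with the tabulated r.m.s. ΔR of 0.481 Å and r.m.s. ΔB of 2.48 Å².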
Figure 6. Similarity in solvent structure in actinidin and papain crystals. The solvent molecules in actinidin are numbered in order of increasing B-value (abscissa: identification number of actinidin solvent sites). The ordinate gives the number of identical sites for 20 consecutive water molecules in the actinidin crystal structure. The hatched areas indicate the number of methanol oxygen atoms in papain present on actinidin water sites. The optimal superposition matrix and vector were applied to the co-ordinates of the actinidin solvent molecules. Solvent sites were considered to be identical when they were within a distance of 1.5 Å.
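The counting procedure behind Figure 6 can be sketched in Python, assuming the actinidin solvent co-ordinates have already been transformed into the papain frame. Only the 1.5 Å cut-off, the ordering by B-value and the window of 20 molecules come from the caption; the function name, the data layout and the toy co-ordinates are hypothetical, and non-overlapping windows are assumed because the caption does not say whether the windows overlap.

```python
import math

def identical_sites_per_window(act_sites, pap_sites, cutoff=1.5, window=20):
    """Count actinidin solvent sites that have a papain site within `cutoff` A.

    act_sites: list of (b_value, (x, y, z)), already in the papain frame.
    pap_sites: list of (x, y, z).
    Returns one count per consecutive block of `window` actinidin sites,
    taken in order of increasing B-value.
    """
    ordered = sorted(act_sites, key=lambda site: site[0])
    matched = [
        any(math.dist(xyz, p) <= cutoff for p in pap_sites)
        for _, xyz in ordered
    ]
    return [sum(matched[i:i + window]) for i in range(0, len(matched), window)]

# Toy example (hypothetical co-ordinates), with a window of 2 for brevity.
actinidin = [
    (30.0, (20.0, 0.0, 0.0)),  # nearest papain site is 3.0 A away: no match
    (5.0, (0.0, 0.0, 0.0)),    # papain site 0.5 A away: match
    (40.0, (30.0, 0.0, 0.0)),  # no papain site nearby
    (6.0, (10.0, 0.0, 0.0)),   # papain site 1.0 A away: match
]
papain = [(0.5, 0.0, 0.0), (10.0, 1.0, 0.0), (20.0, 3.0, 0.0)]
counts = identical_sites_per_window(actinidin, papain, window=2)
print(counts)
```

In the toy example both low-B sites find a partner and neither high-B site does, which mirrors the trend reported in the text.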
The very high degree of structural identity between papain and actinidin is thus found both for their polypeptide chains and for water molecules incorporated within the molecular structure. In contrast to the internal water molecules, the positional correspondence for water in surface pockets and depressions exists only for seven out of 34 molecules in actinidin. The identity is greatest for water molecules with low B-factors. Of 18 water molecules in the active site cleft in actinidin, only two are preserved in papain, but most are very poorly ordered, with high B-factors.
3. Comparison of Primary Sequences of Five Thiol Proteases
(a) Evidence for structural homology from sequence homology
(i) General comparison
Table 3 gives a compilation of the sequence data available for sulfhydryl proteases. In the following, the papain numbering scheme will be used. The papain and actinidin sequences were aligned on the basis of their three-dimensional structure. Alignment of the stem bromelain (Goto et al., 1976, 1980) and cathepsins B and H sequences (Takio et al., 1983) was according to the original publications, with minor modifications on the basis of the compatibility of insertions and deletions with the three-dimensional structure, and of the presence of comparable insertions and deletions in actinidin and stem bromelain. Apart from segments near the active site residues, the streptococcal proteinase sequence is difficult to align (Tai & Liu, 1976) and is therefore not included in the Table.
Four criteria were applied to estimate the conformational similarity for the five proteins in Table 3. (1) Retention of hydrophobicity in the cores of the two domains. (2) φ/ψ combinations for glycyl residues outside the normally “allowed” regions. (3) The average minimum base change per codon. (4) The position of insertions and deletions.
In applying the first criterion, it was found that only in a single case out of 26 substitutions in the cores is the hydrophobic character lost; namely, for Glu203 in cathepsin B.
φ/ψ combinations for glycyl residues outside the normally allowed regions very often have a functional significance. They can be subdivided into two categories. Type I glycyl residues, which are near the αL region in the φ/ψ plot (Ramakrishnan & Ramachandran, 1965), are involved mainly in turns. They provide conformational flexibility and facilitate insertions and deletions in the sequence. Taking papain and actinidin together, we find type I glycyl residues in 12 positions. All of them are at or close to the surface and seven are near deletions (Baker, 1980; Kamphuis et al., 1984). Type II glycyl residues are found in the upper and lower right corner of the φ/ψ diagram. They are present in 15 positions in Table 3, mainly in regions of appreciable sequence identity, e.g. 61 to 68. Serious alterations of the main-chain conformation through steric interactions would be caused by replacements of type II glycyl residues because of the introduction of a Cβ atom.
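The two categories of glycyl residues can be illustrated with a small classifier on (φ, ψ) pairs. This is a sketch under stated assumptions: the text gives no numerical boundaries, so the angle limits below are hypothetical choices made for illustration (a box around the left-handed helical region for type I, and positive φ with |ψ| of at least 120° for the upper and lower right corners), and the function name is ours.

```python
def classify_glycine(phi, psi):
    """Classify a glycyl (phi, psi) pair, in degrees, as 'type I',
    'type II' or 'other'. All numerical limits are illustrative
    assumptions; they are not taken from the paper."""
    if phi > 0:
        # Upper or lower right corner of the phi/psi diagram.
        if abs(psi) >= 120:
            return "type II"
        # Neighbourhood of the left-handed helical (alpha-L) region.
        if 20 <= phi <= 120 and -20 <= psi <= 100:
            return "type I"
    return "other"

print(classify_glycine(60, 40))     # type I
print(classify_glycine(80, 170))    # type II
print(classify_glycine(90, -160))   # type II
print(classify_glycine(-60, -45))   # other
```

Any serious use would replace these limits with boundaries taken from a published φ/ψ map.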
We calculated the minimum base change per codon including only residues that could be aligned reasonably well (i.e. 1 to 81, 117 to 212). The results are shown in Table 4. The minimum base change per codon is always much smaller than the values obtained in comparing random sequences: 1.24 to 1.60 (Dickerson, 1971). From the values shown in Table 4, one would conclude that the plant thiol proteases are evolutionarily most closely related among themselves, less so with the cathepsins, and that a closer relationship with the plant enzymes exists for cathepsin H than for cathepsin B. This could mean that, in an evolutionary sense, cathepsin B diverged from the common ancestral gene long before cathepsin H and the plant thiol proteases. The position of insertions and deletions is such that they can be accommodated easily in the papain structure. The conclusion from the comparison of the five
Table 3
Sequence data for papain (PAP), actinidin (ACT) (Carne & Moore, 1978), stem bromelain (SB) (Goto et al., 1976, 1980) and cathepsins B and H (CA-B, CA-H) (Takio et al., 1983)
1 6 7 10 15 16 20 30
PAP   IPEYVD----WRQKGAVTP-VKNQGSCGSCWAFSAVVTIEGI
ACT   LPSYVD----WRSAGAVVD-IKSQGEC’GGCWAFSAIATVEGINKI
SB    VPQSID----WRDYGAVTS-VKNQNPCGACW(A,F,G)AIATVE(S,V)AS
CA-B  LPESFDAREQWSNCPTIA&-IRDQGSCGSCWAFGAVEAMSDRI
CA-H  YPSSMD----WRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAI
1 uauauaauuuuaaua~a 0 PB P E c c CC c
41 50 58 60 70
PAP   RTGNLNQYSEQELLDCDR--RSYGCNGGYPWSALQLVAQYp--m-G
ACT   TSGSLISLSEQELIDCGRTQNTRGCDGGYITDGFQFIINDG----m-G
SB    YKGTLQPLSQQQVDDCAK
CA-B  HTNVNVEVSAEDLLTCCGIQCGDGCNGGYPSGAGNFWTRKGLVSGG
CA-H  ASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYI
a u uauaauuu ++uauaauuuauuu~ +1 1 C C CC 0 +=g ++
=-
PAP ACT112 SB CA-B CA-H
85 86 IHYRNT-PPYYE---------GVQRYCR-SREKGPY-----.4AKTD INTQEN--YPYT--p---p--AQDGDCD-VALQDQK---p-YVTID
PAP ACT190 SB CA-B CA-H
96
97
cc
c
i 104
K-
&PI
lJ c
c
140
TVAVVS
BBPPB c c C
C
108
+
127 128 130
109 116 117 119 120 GVRQVQPY-p--p---N&G-ALLYSIAN-QPVSVVLQAAGKDFQLY TYDNVPYN---p----NEW-ALQTAVTY-QPVSVALDAAGDAFKQY KARVPR- - - - - - - NN’ES S MYAVS SYKEDKHYGYTSYSVSDSEKEIMAEIYKNGPVEGAFTVF-SDFLTY N’VVNIPTL-p------NDEAAMVEAVALYNPVSFAFEVT-EDFMMY uuuuauuuauuu PPPBB
79
LYN-----KG
103
VYNSHIGCLPYTI PPCEHHVN’GSRPPCTGEGDTPKCNKMCEAGYST IMGEDS-PYPYIP--p-----GKNGQCKFNPEKA---p---VAFVK 1 0 % C
+ PAP ACT148 SB CA-B CA-H
90
I (‘I
78
z?
80
PAP ACT83 SU CA-S CA-H
40 I KI
A-
144
ANF(Z,L,Y)
u u 61a u c
180 182 155 167 168 170 145 149 150 154 160 RGGIF-VGPCG-p-NKVDHAVAAVGYG---PGYILIKNSWGTGWG ASGI F-TGPCG---TAVDHAI VI VGYGTEGGVDYWI VKNSWDTTWG KS GVGDGYC K- - DKLNHAVTAI GYNP - - - KAEFGD(G,S,G)KKARWG KS GVYKHEAGD- - VMGGHAI RI LGWGI ENGVPYWLVANS WNVDWG KSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWI VKNSWGSNWG + + BBBPBBB + BBPBBBPPBB == ++ B 2-k; c ( c c (+-) c * + 190 200 183 ENGYI RI KRGTGNSYGVCGLYTSSFYPVKN EEGYMRI LRNV-GGAGTCGI EAGYI RMARDVSSSSGI CGI DNGFFKILRG----ENHCGIESEIVAGIPRTQ NNGYFLI ERGK----NMCGLAACASYPI B BBBBPBBO El== + 10 z=cc UC c + w c
210 ATMPSYPVKYNN AI DPLYPTEG P&V PBPBP C
The actinidin numbering is in italics. Explanation of symbols: α, α-helix; β, β-sheet; q , type I Gly in PAP; m , type I Gly in ACT; + , type II Gly in PAP; + , type II Gly in ACT; C, residue in hydrophobic core; 1, S-S bond; c, carbohydrate attachment at the preceding residue; cis, the peptide bond preceding the prolyl residue at this position has the cis configuration. Type I glycyl residues are near the αL-region in the φ/ψ plot; they are involved mainly in turns. Type II glycyl residues are in the upper and lower right corner of the φ/ψ plot and are mainly in regions of appreciable homology.
Table 4
Number of identical residues (first entry) and average minimum base change per codon values (second entry) for the sequences aligned as in Table 3
PAP ACT CA-B CA-H SB
25; 0.737 26; 0.737 27; 0.684
CA-B 59; 0.918 53; 1.012
ACT 18; 0.895 \ 26; 0.702 28; 0.737
16; 1.053 16; 0.982
CA-H 74; 0.788 76; 0.771 62; 0.971 26; 0.719
The upper right part of the Table includes residue numbers 1 to 81 and 117 to 212. Below the diagonal, only the first 59 residues in all sequences are used, in order to be able to include bromelain. PAP, papain; ACT, actinidin; CA-B and -H, cathepsins B and H; SB, stem bromelain.
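The two kinds of entry in Table 4 can be illustrated with a short calculation. The sketch below is not the authors' procedure in detail. It assumes the standard genetic code, skips any aligned position containing a gap, and takes for each residue pair the smallest number of base differences between any codon of one amino acid and any codon of the other. The function names are ours, and the two 13-residue fragments are our reading of part of the papain and actinidin rows of Table 3, used only as toy input.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append(b1 + b2 + b3)

def min_base_change(aa1, aa2):
    """Smallest number of base differences between any codon of aa1
    and any codon of aa2 (0 for identical residues)."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )

def compare(seq1, seq2):
    """Return (number of identical residues, mean minimum base change
    per codon) over aligned positions that contain no gap ('-')."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in pairs)
    mean_change = sum(min_base_change(a, b) for a, b in pairs) / len(pairs)
    return identical, mean_change

# Toy input: short fragments read from the PAP and ACT rows of Table 3.
identical, mean_change = compare("VKNQGSCGSCWAF", "IKSQGECGGCWAF")
print(identical, round(mean_change, 3))  # 9 0.385
```

For this fragment, 9 of 13 residues are identical and the four substitutions need 1, 1, 2 and 1 base changes, which gives 5/13 ≈ 0.385. The values in Table 4 refer to much longer aligned segments, so they are not expected to match this toy result.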
sequences in Table 3 is that these thiol proteases have very similar structures.
(ii) Specific comparison
The sequences can be divided into three segments: two strongly homologous parts, the N-terminal part (1 to 81) and the C-terminal part (117 to 212), and a middle part (82 to 116) with virtually no identity or homology.
(1) The N-terminal part (1 to 81). Extensive homology can be observed from residue 16 onwards. Gln19 is conserved in all sequences: the carboxamide Nε2 atom of this residue is important in the catalytic mechanism, because it stabilizes the tetrahedral intermediate by taking part in the formation of the “oxyanion hole” (Drenth et al., 1976). The disulfide bond 22-63 can be traced in all sequences. The sequence along the active site helix LI (24-42), which contains the essential Cys25 and which contributes side-chains to the core of the first domain (L) and to the interface between domains L and R, shows a high degree of homology. The other two helices, LII (50-57) and LIII (67-78), show much less pronounced homology. In particular, LIII is extended by one residue in actinidin, and may be further extended in cathepsin B by the insertion of four residues. In the three-dimensional structure, this segment is adjacent to residues 103 to 105, which are part of a region with virtually no homology, and in the vicinity of which there are further insertions in cathepsin B. The buried acidic residues (Glu35, Glu50, Glu52 and Asp57) are not very well conserved. The side-chains of Glu35 and Glu50 in actinidin and papain take part in a complex arrangement of compensating charges involving the side-chains of Lys17 and Lys174, and a number of water molecules with low B-factors (Fig. 5). Lys17 is invariant in all sequences, except for cathepsin B where it is replaced by Arg. It is interesting to note that Lys174 is conserved in those sequences where Glu50 is also conserved (papain, actinidin and cathepsin H).
This suggests the evolutionary conservation of an inter-domain electrostatic interaction involving oppositely charged side-chains and water molecules. Extensive homology again exists for residues 62 to 70, which form part of the active site wall. Three conserved glycyl residues, whose φ/ψ values lie outside the allowed regions in the φ/ψ plot, are in close proximity in this 62 to 70 segment.
(2) The middle part (82 to 116). When the polypeptide chain in papain and actinidin leaves helix LIII it is far removed from the active site and folds across the first domain (L), separating the interior residues from solvent. Near residue 108 the chain crosses to the second domain (R) and forms the outer strand of the β-sheet structure of this domain. No further secondary structure is present in segment 82 to 116. In the papain and actinidin sequences, most substitutions responsible for the difference in pI (values 9.3 and 3.1, respectively) and for the largest difference in three-dimensional structure are found in this region. It is thus clear that the insertion here of 17 residues in the cathepsin B sequence should be possible without major changes in the overall molecular conformation and without affecting the conformational organization of active site residues. Bromelain and both cathepsins are glycoproteins. The attachment sites of the carbohydrate moiety are indicated in Table 3. They, too, lie in this middle region, and are at sequence positions corresponding to surface residues in the papain three-dimensional structure.
(3) The C-terminal part (117 to 212). The main secondary structural element of segment 117 to 212 is a twisted antiparallel β-sheet forming a β-barrel. Two α-helices (RI and RII) cover the ends of this barrel. A high degree of structural similarity between papain and actinidin is found for the β-strands. This is nicely reflected in the alignment of the sequences from the central β-strands 130 to 135 and 159 to 167. A major difference in conformation for the cathepsins may be expected for the segment 149 to 155.
Three arguments may be put forward in this respect: (1) the S-S bond connection 153-200 is not retained in cathepsin B; (2) in cathepsin H, an insertion of three residues is observed; (3) cis-Pro152, found in papain and actinidin, is not conserved. The non-prolyl residues will almost certainly have a trans peptide unit. The insertion between 167 and 168 in the cathepsins is analogous to that observed in actinidin: it would involve an elongation of two adjacent anti-parallel β-strands. Finally, an insertion in papain of four residues between positions 192 and 198 occurs with respect to cathepsins H and B, and of one residue compared to actinidin. In papain, this segment forms a U-shaped loop projecting into the solvent. The temperature factors indicate that this loop is very flexible (Kamphuis et al., 1984). The absence of this loop would not seriously alter the folding of the rest of the protein molecule.
The active site His159 is found in all sequences. Asn175 is also retained in all sequences, except for bromelain. The role of this residue is to orient the His159 imidazole ring by forming a hydrogen bond between 175-Oδ1 and 159-Nε2 (Drenth et al., 1976). In the refined structures of papain and actinidin, the side-chain Nδ2 of Asn175 has no hydrogen-bonding partners within 3.3 Å. Therefore, a carboxamide group would not be strictly necessary, and a serine Oγ could perform the orienting role equally well. This suggests the sequence (G-S-G) for bromelain. Another indication that this orientation mechanism for bromelain differs from that of actinidin and papain comes from the change of Trp177, which shields the His159-Asn175 hydrogen bond from the solvent, for Lys in bromelain.
Asp158 in papain and actinidin, which has been implicated in alternative serine protease-like mechanisms for thiol proteases (Zannis & Kirsch, 1978; Angelides & Fink, 1978), is substituted by Asn in bromelain and cathepsin H, and by Gly in cathepsin B. By the concomitant change in charge, this substitution would inactivate the enzyme in such mechanisms and they must therefore be regarded as very unlikely.
Valyl residues 133 and 157 are responsible for the substrate specificity of papain (Drenth et al., 1976). Val133 is replaced by Ala in all other sequences, while in bromelain and cathepsin B, Val157 is substituted by Leu and Gly, respectively. This is in accord with differences in specificity found between papain and stem bromelain (Lowe, 1976) and between papain and actinidin (Baker et al., 1980).
4. Discussion
The results presented in this paper show that the folding of the polypeptide chain in papain and actinidin is very similar indeed. There is no indication that the results are affected in any way by different data collection techniques or refinement methods. Moreover, the difference in crystallization solvent and the different packing of the molecules in the crystals do not appear to have an appreciable influence. In this respect, it will be most interesting to include the crystal structure of the papain D modification at 2.0 Å resolution in the comparison. The refinement of this structure has been completed (Priestle et al., 1984).
The greatest similarity is found for the conformation and spatial arrangement of the active site residues, the internal water, the α-helices and the anti-parallel β-sheet structure. This suggests that the secondary structural elements form the skeleton of the molecule and that their interaction is the main factor in directing the fold of the polypeptide chain. Therefore, substitution of residues in the skeleton will, in general, have the most drastic effect on the conformation of the protein molecule. Some main-chain-side-chain hydrogen bonds are also strongly conserved, and these may determine the folding of non-repetitive parts of the structure.
Side-chain conformations of identical residues in the sequence are conserved well, although some subtle changes are found due to neighboring mutations. Surprisingly, residues in the hydrophobic cores are not conserved well, although they do remain largely hydrophobic. The packing of the side-chains in these areas allows for mutations conserving hydrophobic character (Baker, 1981; Lesk & Chothia, 1982).
Extensive amino acid exchanges and three-dimensional structural differences are present in only one small segment of the polypeptide chain (around residue number 100), which is far removed from the active site and is not involved in any organized secondary structure. Insertions and deletions in the sequences disturb conformational homology over a very limited range of two to three residues and apparently do not influence the major conformational characteristics. However, they can change the molecular surface properties of the protein, e.g. charge and solubility, appreciably.
Solvent in intermolecular contact areas is well-ordered in both proteins. In other areas, papain has a less ordered and less extensive solvent shell than actinidin. Therefore, the conformation of individual groups in papain, especially near the surface, is less tightly constrained by the solvent structure. This results in a larger internal motion of the protein.
Although no three-dimensional structure information is available for stem bromelain or the cathepsins B and H, the combination of sequence information with the high-resolution, three-dimensional structures of papain and actinidin suggests that the overall folding pattern is grossly the same and that these proteins utilize the same catalytic mechanism as papain and actinidin.
We thank Dr M. C. H. De Maeyer for assistance with the Evans and Sutherland Picture System II, and Dr W. G. J. Hol for many valuable discussions. This work was supported in part by the Netherlands Foundation for Chemical Research, with financial aid from the Netherlands Organization for the Advancement of Pure Research.
References
Angelides, K. J. & Fink, A. L. (1978). Biochemistry, 17, 2659-2668.
Baker, E. N. (1980). J. Mol. Biol. 141, 441-484.
Baker, E. N. (1981). In Structural Studies on Molecules of Biological Interest (Dodson, G., Glusker, J. P. & Sayre, D., eds), pp. 339-349, Clarendon Press, Oxford.
Baker, E. N., Boland, M. J., Calder, P. C. & Hardman, M. J. (1980). Biochim. Biophys. Acta, 616, 30-34.
Brocklehurst, K., Malthouse, J. P. G. & Shipton, M. (1979). Biochem. J. 183, 223-231.
Brocklehurst, K., Baines, B. S. & Malthouse, J. P. G. (1981). Biochem. J. 197, 739-746.
Carne, A. & Moore, C. H. (1978). Biochem. J. 173, 73-83.
Dayton, W. R., Goll, D. E., Zeece, M. G., Robson, R. M. & Reville, W. J. (1976). Biochemistry, 15, 2150-2158.
Dickerson, R. E. (1971). J. Mol. Biol. 57, 1-15.
Drenth, J. (1976). In Handbook of Biochemistry and Molecular Biology (Fasman, G. D., ed.), pp. 356-357, CRC Press, Inc., Cleveland.
Drenth, J., Kalk, K. H. & Swen, H. M. (1976). Biochemistry, 15, 3731-3738.
Goto, K., Murachi, T. & Takahashi, N. (1976). FEBS Letters, 62, 93-95.
Goto, K., Takahashi, N. & Murachi, T. (1980). Int. J. Pept. Protein Res. 15, 335-341.
Heinemann, U., Pal, G. P., Hilgenfeld, R. & Saenger, W. (1982). J. Mol. Biol. 161, 591-606.
Kamphuis, I. G., Kalk, K. H., Swarte, M. B. A. & Drenth, J. (1984). J. Mol. Biol. 179, 233-257.
Kominami, E., Wakamatsu, N. & Katunuma, N. (1982). J. Biol. Chem. 257, 14648-14652.
Lesk, A. M. & Chothia, C. (1982). J. Mol. Biol. 160, 325-342.
Lowe, G. (1976). Tetrahedron, 32, 291-302.
Lynn, K. R. & Yaguchi, M. (1979). Biochim. Biophys. Acta, 581, 363-364.
Lynn, K. R., Yaguchi, M. & Roy, C. (1980). Biochim. Biophys. Acta, 624, 579-580.
Priestle, J. P., Ford, G. C., Glor, M., Mehler, E. L., Smit, J. D. G., Thaller, C. & Jansonius, J. N. (1984). Acta Crystallogr. sect. A, 40, suppl., Hamburg Congress Abstracts C-17.
Ramakrishnan, C. & Ramachandran, G. N. (1965). Biophys. J. 5, 909-933.
Rao, S. T. & Rossmann, M. G. (1973). J. Mol. Biol. 76, 241-256.
Singh, T. P., Bode, W. & Huber, R. (1980). Acta Crystallogr. sect. B, 36, 621-627.
Tai, J. Y. & Liu, T.-Y. (1976). J. Biol. Chem. 251, 1955-1959.
Takio, K., Towatari, T., Katunuma, N., Teller, D. C. & Titani, K. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 3666-3670.
Tsurugi, K. & Oyata, K. (1980). Eur. J. Biochem. 109, 9-15.
Zannis, V. I. & Kirsch, J. F. (1978). Biochemistry, 17, 2669-2674.
Edited by R. Huber