A model to predict functional sites of protein: Active sites of trypsin


Nonlinear Analysis, Theory, Methods & Applications, Vol. 30, No. 8, pp. 5395-5402, 1997. Proc. 2nd World Congress of Nonlinear Analysts. © 1997 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0362-546X/97 $17.00 + 0.00

PII: S0362-546X(97)00070-9

A MODEL TO PREDICT FUNCTIONAL SITES OF PROTEIN: ACTIVE SITES OF TRYPSIN

YUKIO KIHO† and AKO UBASAWA‡

†Ishihara Sangyo Kaisha, Ltd., 1-3-15, Edobori, Nishi-ku, Osaka 550, Japan; ‡Woodward Laboratory, Nara-cho, Aoba-ku, Yokohama 227, Japan

Key words and phrases: protein, amino acid sequence, active site, functional site, protein folding, trypsin, F-protein of Newcastle disease virus, DEV, DD, higher-dimensional structure, connector.

1. INTRODUCTION

A primary amino acid sequence is one of the parameters that determine the function of a protein. The amino acids are arranged linearly, and the sequence is not random in nature. In general, a functional system of a protein is composed of several functional sites, which are scattered along the linear protein molecule. Why are those functional sites scattered? Sophisticated technologies such as X-ray crystallography and NMR spectroscopy have revealed the three-dimensional structure of proteins and suggested that a protein must be folded into such a structure to perform its function. However, they do not tell us how a protein molecule, once made in a linear form, is folded into a three-dimensional structure, nor how or why some specific amino acids are selected as active sites.

In 1988, one of us (YK) proposed a 'deviation (DEV) model' [1] for a qualitative analysis of amino acid sequence distribution in protein. It starts from the notion: "The amino acid sequence of a protein is not random. A functional site must correlate with deviation from the randomness." However, it would be so complicated to specify inherently the randomness of a given sequence that a somewhat different approach was taken, as outlined in Section 2. Using the DEV model as a tool and trypsin as its material, we describe our attempts at prediction of the functional sites in Section 3, at analysis of protein-protein interaction in Section 4, at a model with higher-dimensional structure in Section 5, and at prediction of the active sites in the final section.ᵃ

2. PRELIMINARIES

Hypothesis I. A random sequence would be characterized in the sense that each amino acid is uniformly distributed throughout the sequence. In other words, within any region of the sequence, the expected frequency of occurrence of an amino acid is equal to its proportion in the overall sequence.

Based on this notion, a regional deviation (DEV) was defined. Fig. 1 shows a simple linear representation of the amino acids in a protein, where N is the total number of amino acids and X is the length of some region in the sequence. If the number of occurrences of an amino acid z found in the protein is N(z), then its overall proportion F(z) is N(z)/N. When we look at a region whose length is X amino acids and follow the assumption that the amino acids are uniformly distributed, the expected number of z found in the region is XF(z). Then, letting n(z) denote the actual number of z found in the region, the regional deviation from uniformity of the amino acid z can be quantified by

$$\mathrm{DEV}(z) = [\,n(z) - X F(z)\,]^{2} \qquad (2.1)$$

Fig. 1. Linear representation of a protein.

Now, summing these quantities over all the amino acids present in the protein, the regional deviation is defined as

$$\mathrm{DEV} = \sum_{z=A}^{V} \mathrm{DEV}(z)/X = \sum_{z=A}^{V} [\,n(z) - X F(z)\,]^{2}/X \qquad (2.2)$$

where z is the amino acid, A, R, N, D, C, Q, E, G, H, L, I, K, M, F, P, S, T, W, Y, and V. Note that if a particular amino acid is not present in the overall sequence, it contributes nothing to the sum. As for the size of the region, a smaller X gave a complicated DEV pattern, whereas the DEV pattern was smoothed out with a larger X, for instance X=20. From the biochemical standpoint, there are many examples in which a few amino acids participate in a particular function [2]. Therefore X=5 was typically used.

In the three-dimensional structure of a protein, some sites that are scattered in the primary amino acid sequence (Fig. 2(a)) come closer. Suppose that a peptide 'ABC' is located near a peptide 'DEF' as shown in Fig. 2(b), in which a site-B lies close to a site-E. If the site-B is a functional site, its DEV value calculated by formula (2.2) in the linear arrangement of the protein should be appreciably higher. Then let us imagine a hypothetical molecule 'ABEC' in which the site-E is incorporated into the site-B to make a site-BE. The DEV value of the site-BE would be decreased because of the effect of the site-E (Fig. 2(c)). This idea is formulated as a decrease rate of DEV (DD):

$$\mathrm{DD} = \left[\,1 - \frac{\mathrm{DEV}(BE)}{\mathrm{DEV}(B)}\,\right] \times 100 \qquad (2.3)$$

Fig. 2. Location of the sites in a protein. (a) One-dimensional structure. (b) Higher-dimensional structure. (c) Quantification of DD.

DD can be interpreted as a binding activity between a site-B and a site-E [1]. After the site-B binds to the site-E, the former loses its binding function because no further binding with another site-E is possible. However, DD does not tell us the physicochemical nature of the bond between them. In addition, DD is considered to be a measure of the distance between a site-B and a site-E: a large DD value means a short distance. Note that DD is a kind of vector assigned to the site-B and directed toward the site-E.
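Formulas (2.2) and (2.3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program. The function names are hypothetical, and the way site-BE is built (plain concatenation of the two regions, scored with the same DEV formula) is an assumption, since the text does not spell it out.

```python
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHLIKMFPSTWYV"  # the twenty residues listed under (2.2)


def proportions(sequence):
    """Overall proportion F(z) = N(z) / N of each amino acid z."""
    counts = Counter(sequence)
    return {z: counts[z] / len(sequence) for z in AMINO_ACIDS}


def dev(region, F):
    """Regional deviation (2.2): sum of [n(z) - X*F(z)]^2 over z, divided by X."""
    x = len(region)
    n = Counter(region)
    # Amino acids absent from the overall sequence (F(z) == 0) are skipped.
    return sum((n[z] - x * F[z]) ** 2 for z in AMINO_ACIDS if F[z] > 0) / x


def dev_profile(sequence, x=5):
    """DEV of every window of length x, keyed by 1-based start position."""
    F = proportions(sequence)
    return {i + 1: dev(sequence[i:i + x], F)
            for i in range(len(sequence) - x + 1)}


def dd(site_b, site_e, F):
    """Decrease rate of DEV (2.3), in percent.

    ASSUMPTION: site-BE is taken as the concatenation site_b + site_e and
    scored with the same DEV formula; the paper does not spell this out.
    """
    dev_b = dev(site_b, F)
    if dev_b == 0:
        raise ValueError("DD is undefined when DEV(B) is zero")
    return (1.0 - dev(site_b + site_e, F) / dev_b) * 100.0


if __name__ == "__main__":
    toy = "AAGG"  # toy sequence, for illustration only
    F = proportions(toy)
    print(dev_profile(toy, x=2))
    print(dd("AA", "GG", F))
```

In the toy sequence the window AA deviates from the overall 1:1 composition, and joining it with GG removes the deviation entirely, which is the situation that (2.3) scores as a large DD.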

Hypothesis II. When a perturbation is given to a protein, the interaction takes place at the surface first.

Most protein function is observed when a protein interacts with some perturbation coming from outside (PT-E in Fig. 2(b)). As a perturbation may not interact directly with the interior of a protein, a functional site may be located on or near the surface. Such a functional site, which should have a rather high DEV value, is expected to be exposed, and exposure correlates more or less inversely with the DD value. The ratio DEV/DD was therefore introduced as a quantification of Hypothesis II. In practice, analysis with DEV/DD is not so different from that with DEV, but the former often works better than the latter.
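As a small numerical illustration of the DEV/DD ratio, the sketch below ranks four site-Bs using the (DEV, DD) pairs printed for model d in Table 7, taking for each site-B the largest DD listed. The helper name dev_dd is hypothetical, and the plain ratio is shown without any scaling factor.

```python
def dev_dd(dev_value, dd_percent):
    """Ratio DEV/DD, with DD expressed in percent as in formula (2.3)."""
    return dev_value / dd_percent


# (site-B, DEV, largest DD) copied from Table 7 of this paper.
sites = [
    ("3-HCY", 1.49, 66.5),
    ("4-CNY", 1.29, 74.8),
    ("5-YNT", 1.39, 76.6),
    ("9-NND", 1.85, 73.9),
]

# A high DEV combined with a low DD gives a high ratio.
ranked = sorted(sites, key=lambda s: dev_dd(s[1], s[2]), reverse=True)

if __name__ == "__main__":
    for name, dev_value, dd_percent in ranked:
        print(f"{name}: DEV/DD = {dev_dd(dev_value, dd_percent):.4f}")
```

The ratio rewards a site that deviates strongly (high DEV) while being only weakly tied to its partner (low DD), which is how Hypothesis II reads exposure.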

3. PREDICTION OF FUNCTIONAL SITES [2]

In general, a functional system of a protein is made of several functional sites. In trypsin, the sites 40-H, 84-D and 177-Sᵇ are involved in its proteolytic cleavage reaction [3]. The way in which some functional units come together to make a functional system can be explained as follows. Fig. 3 shows the site-B and its two partners, the site-E1 and site-E2, which give the two largest DD values to the site-B. If two functional sites, i and j, are located near these site-Es, it is possible that these three sites (B, i and j) make a functional system (Bij). (As elaborated later, this assumption is borne out.) Here, we should take note of the following point. When the site-B and site-Es are involved in the formation of a functional system, the DD value of the site-B should be large enough for gathering the functional sites i and j. However, in order to be accessible to a perturbation from outside, the DD value given by the site-E1 should be small enough. A functional system may be made on a delicate balance of these two opposing principles. (Note that the DD value given by the site-E2 cannot be used here, because the accessibility of the site-B is governed mainly by the site-E1, which gives the largest DD value to the site-B.)

Fig. 3. Formation of a functional system Bij. A solid line between the site-B and site-E represents DD.

For trypsin, several site-Bs whose DEV values are more than one are chosen. According to Hypothesis II, their DEV/DD values are calculated, where the DD values are given by the site-E1. The site-Bs with DEV/DD > 1 are listed in Table 1 together with the site-E1 and site-E2. The site-B (79-84) has the maximum DEV/DD value, 33. Its site-E1 is (182-186), and its site-E2s are (44-48, 188-192). If we look for regions like those site-Es among the site-Bs listed in the table, the site-B (178-183) corresponds to the site-E1, and (38-42, 192-196) to the site-E2s.

The active sites of trypsin, 40-H, 84-D, and 177-S, are near or in the site-Bs (38-42, 79-84, 178-183). The site-E1s and site-E2s for these site-Bs are (27-31, 182-186, 28-32) and (44-48, 44-48, 219-223), respectively. Out of these site-Es, (28-31) relates to the 40-H and 177-S regions, and (44-48) relates to the 40-H and 84-D regions. We call these site-Es 'connector-Es'. A search for a connector-E is also helpful for prediction of functional sites.

In protein folding or interaction with a perturbation from outside, a contribution of hydrophobic amino acids followed by that of hydrophilic amino acids is generally observed. To evaluate the contribution, the hydrophobic amino acids in trypsin or its portion are replaced by glycine (G), so that the individuality of the hydrophobic amino acids is suppressed and the contribution of the polar amino acids is accentuated. We call the result a polar sequence. Conversely, a non-polar sequence is made to see the contribution of hydrophobic amino acids by replacing the hydrophilic amino acids by G or glutamic acid (E). (Any amino acid may be used, but there is a very small difference due to formula (2.3).)

With a non-polar sequence of trypsin, the DEV analysis was performed. As the site-Es are rather non-specific, we used information of the site-Bs alone. The four largest values of DEV/DD are listed in Table 2, where the active sites of trypsin are found near these site-Bs. With a polar sequence of trypsin, useful information was not obtained. However, the polar sequence is found to be a powerful tool for prediction of the active sites through that of the functional sites, as described later.

Similar strategies for prediction of the functional sites have been found and applied to various proteins including trypsin [2,5,8]. Although the DEV model offers several strategies for prediction of functional sites, no single strategy is the best one. Above all, the reason why active sites are located near site-Bs and/or site-Es is mysterious. All we can say is that both sites are derived from formula (2.3).
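The polar and non-polar sequences described above amount to a character substitution. The sketch below is illustrative only. The set of residues treated as hydrophobic is an assumption, because the paper names only A, L, P and V explicitly (in the notes to Table 5). Keeping glycine unchanged in the non-polar sequence is also an assumption, and the function names are hypothetical.

```python
# ASSUMPTION: illustrative hydrophobic set; the paper does not list it in full.
HYDROPHOBIC = set("AVLIPMFW")


def polar_sequence(seq, filler="G"):
    """Replace hydrophobic residues by the filler so polar residues stand out."""
    return "".join(filler if aa in HYDROPHOBIC else aa for aa in seq)


def nonpolar_sequence(seq, filler="G"):
    """Replace hydrophilic residues by the filler (G or E in the paper).

    ASSUMPTION: glycine itself is left unchanged.
    """
    return "".join(aa if aa in HYDROPHOBIC or aa == "G" else filler
                   for aa in seq)


if __name__ == "__main__":
    region = "AAHCY"  # residues 38-42 of trypsin, as given in Table 5
    print(polar_sequence(region))
    print(nonpolar_sequence(region))
    print(nonpolar_sequence(region, filler="E"))
```

The filler is a parameter because the text uses both G and E for the non-polar sequence and remarks that the choice makes only a very small difference.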

Table 1. DEV (> 1) and DD of site-Bs and their partners, site-E1 and site-E2, in trypsin.

Site-B      DEV    Site-E1              DEV/DD   Site-E2
(?-8)       1.04   (16-20) (68-72)      17
(12-16)     1.22   (27-31)              20
(23-27)     1.16   (16-20) (68-72)      21
(32-37)     1.05   (27-31)              20
(34-40)     1.05   (27-31)              23       (186-190)
(37-41)     1.09   (201-205)            21       (187-191)
(38-42)     1.38   (27-31)              22       (44-48)
(52-59)     1.26   (182-187)            23
(56-62)     1.18   (65-71)              23
(59-63)     1.24   (68-72)              24       (100-104)
(66-70)     1.42   (187-191)            21       (113-117)
(75-79)     1.20   (187-191)            19       (196-200)
(76-80)     1.01   (187-191)            18
(79-84)     1.93   (182-186)            33       (44-48) (188-192)
(81-86)     1.28   (182-186)            22       (41-45)
(84-88)     1.31   (182-186)            23       (122-126)
(85-91)     1.52   (195-199)            26       (111-115)
(91-95)     1.19   (187-191)            19
(92-96)     1.20   (201-208)            19
(108-112)   1.22   (201-205)            18
(126-130)   1.37   (208-214)            22
(134-138)   1.13   (154-158)            19
(135-140)   1.54   (44-48) (188-192)    26
(142-150)   1.52   (8-12)               24
(164-169)   1.07   (68-72)              18       (100-104)
(166-170)   1.03   (100-104)            17       (219-223)
(167-171)   1.10   (16-20)              19       (100-104)
(175-179)   1.55   (207-214)            28
(178-183)   1.35   (28-32)              22       (219-223)
(192-196)   1.10   (207-214)            18
(198-203)   1.20   (116-120)            20       (189-193)
(200-204)   1.06   (100-104)            18       (115-119)
(205-209)   1.15   (27-31)              18       (116-120)
(209-213)   1.13   (43-47)              18
(213-220)   1.24   (17-21)              22

Partly overlapped sites (B or E) with the same partners are combined, and the larger DEV and DEV/DD values are shown. When two different site-Es give the same value to the site-B, both sites are shown.
Site-E2 entries whose rows cannot be identified in this copy, in order of appearance: (116-120) (122-126) (183-187); (44-48) (41-45); (113-117) (100-104); (111-115) (187-191) (201-205); (184-188) (150-154) (195-199).

Table 2. Site-Bs in the non-polar sequence of trypsin.

Hydrophilic amino acid replaced by   Site-B       DEV    DEV/DD   Active site nearby
glycine (G)                          35-VVGAA     2.26   29       40-H
                                     86-MLIGL     2.01   26       84-D
                                     138-LGAPI    1.56   18
                                     178-GGPVV    1.09   16       177-S
glutamic acid (E)                    178-GGPVV    2.52   30       177-S
                                     60-GEGGF     1.93   26
                                     35-VVEAA     1.80   24       40-H
                                     84-EIMLI     1.55   21       84-D

As the site-Es are rather non-specific, these sites are not included in the table.

Table 3. Interaction between trypsin and NDV-F.

(a)
Site-B, trypsin (intra-trypsin DD)   Site-E, NDV-F   DD
126-130 (61.8)                       23-28           66.3
199-203 (59.7)                       97-104*         63.4
214-218 (57.1)                       266-270         61.9
3-7 (61.2)                           362-369         65.3
166-170 (60.9)                       362-369         65.9
199-203 (59.7)                       362-369         63.4
3-7 (61.2)                           421-425         66.2
166-170 (60.9)                       427-431         62.4
108-112 (61.4)                       431-435         62.8
144-148 (62.8)                       431-435         63.4

(b)
Site-B, NDV-F   Site-E, trypsin   DD
12-16           155-159           61.0
37-41           8-12              64.7
104-108*        27-31             64.6
150-154         122-126           60.6
234-238         8-12              68.0

Table 4. Intra-molecular interaction of trypsin.

Site-B     Site-E     DEV    DD
9-13       116-120    0.78   67.2
16-20      196-200    0.73   67.3
54-58      183-187    1.14   63.9
81-85      182-186    1.19   63.6
92-96      201-205    1.19   64.4
100-104    186-190    1.12   63.7
101-105    154-158    0.75   63.4
108-112    201-205    1.21   63.5
120-124    68-72      1.07   63.5
123-127    187-191    0.81   66.9
137-141    44-48      0.80   65.4
146-150    8-12       1.42   67.3
147-151    187-191    1.42   66.0
206-210    27-31      0.83   66.5

4. INTERACTION OF TRYPSIN AND ITS SUBSTRATE [8,9]

The F-protein of Newcastle disease virus (NDV-F)ᶜ is cleaved by trypsin at the peptide bond just after 91-R, which is followed by a long hydrophobic amino acid sequence (92-113). First, the DEV analysis was applied to NDV-F alone, and the site-B (87-91) having the maximum DEV/DD (66) and its partners, site-E1 (241-245) and site-E2 (102-106), were identified. As shown in Fig. 4, the cleavage site, 91-R, is at the end of this site-B, and the site-E2 is in the hydrophobic region.

Fig. 4. Site-B and its partners in NDV-F.

Then the interaction was studied. The first calculation was performed by assuming that trypsin and NDV-F are the sources of site-Bs and site-Es, respectively. As a control, trypsin alone was also analyzed. Thus, for the same site-B in trypsin, two kinds of DD are obtained: one given by NDV-F and the other by trypsin itself. When the former DD is larger than the latter, the intramolecular site-E of trypsin can possibly be replaced by the site-E of NDV-F. The second calculation is the reverse, that is, site-Bs from NDV-F and site-Es from trypsin. The results of these two calculations are shown in Table 3, which lists the mutual interactions between the two molecules (DD > 60). Clearly, the site-Bs and site-Es of trypsin that interact with NDV-F are located mainly in the N- and C-termini and in the intermediate region around (108-170). The active sites (40-H, 84-D, and 177-S) are more or less remote from them. Similarly, in NDV-F, the site-Bs that interact with trypsin do not include the cleavage site, 91-R. In contrast, the hydrophobic amino acid tract located downstream of 91-R is used for the interaction. It is known that this hydrophobic sequence is important for the substrate specificity. Out of the many Rs or Ks in the molecule, only 91-R just before this hydrophobic sequence is cleaved by trypsin, and this proteolysis is essential for NDV infection [10].

The above DEV analysis indicates that the active-site regions of trypsin and the cleavage site of NDV-F are not involved in the interaction. However, this interaction is considered to be the initial step of trypsin catalysis, a two-molecular mechanism. Following it, a mono-molecular interaction involving the active sites of trypsin and the cleavage site of NDV-F should take place. More details on the study of protein-protein interaction can be found in our published papers [8,9].

5. A MODEL WITH HIGHER-DIMENSIONAL STRUCTURE

As DD is a measure of the distance between two sites in a protein, it can be used for construction of a higher-dimensional (HD) structure. Over the entire region of a protein, site-Bs with large DEV and the respective site-Es giving the maximum DD are searched. From the cases having large DD (usually DD > 60), a wire-and-rod model is made to visualize the whole molecule. A wire represents the amino acid sequence, and its length is w cm per amino acid. A rod represents the distance between a site-B and a site-E, and its length is r/DD cm. The values w and r depend on the size of the protein.

Fig. 5. HD-structural model of trypsin.
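The two length rules of the wire-and-rod model are simple enough to write down directly. The sketch below uses the values reported for trypsin in this section (w = 0.5, r = 500), the sequence length 223 from footnote b, and two DD values printed in Table 4. The function names are hypothetical.

```python
def wire_length_cm(n_residues, w):
    """Length of the wire: w cm per amino acid."""
    return w * n_residues


def rod_length_cm(dd_percent, r):
    """Length of the rod joining a site-B and its site-E: r / DD cm."""
    return r / dd_percent


if __name__ == "__main__":
    w, r = 0.5, 500               # values reported for the trypsin model
    print(wire_length_cm(223, w))  # whole trypsin chain (1-223)
    for dd_value in (63.4, 67.3):  # smallest and largest DD printed in Table 4
        print(dd_value, round(rod_length_cm(dd_value, r), 2))
```

Because the rod length is inversely proportional to DD, pairs with a larger DD are held closer together in the model, in line with reading DD as a measure of distance.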

For an HD structure of trypsin, the cases having large DD values (DD > 63) are listed in Table 4. The wire-and-rod model is made with w = 0.5 and r = 500 (Fig. 5). The DEV model gives a very rough structure; however, the relative positions of the three active sites are similar to those found by X-ray crystallography. Several dotted circles in Fig. 5 indicate the regions that are used for the interaction with NDV-F described in Section 4. If we imagine the three-dimensional structure, these sites for inter-molecular interaction protrude out of the plane of the paper, and the active sites are located on the other side. This conformation suggests that functional sites are classified into two groups: one for the interaction with the substrate (two-molecular) and the other around the active-site regions (mono-molecular). An HD-structural model like this is a useful tool in its own way, and it gives us important information on the functional aspects of a protein.ᵈ More detailed structural information can be obtained only with a smaller protein, as shown in the next section.

6. PREDICTION OF ACTIVE SITES [11]

The DEV model, though a somewhat beautiful story, is unable to provide exact prediction of the active sites in trypsin. The reason is probably that the model neglects the physico-chemical nature of amino acids. Now it is time to look at the polar sequence of trypsin. As mentioned in Section 3, the polar sequence of the whole trypsin molecule does not work, and the assignment with it is less specific. We therefore turned our attention to a somewhat smaller molecule, an abstracted trypsin. Our starting protein model (Table 5, a) is an artificial molecule that consists of no more than three functional regions of trypsin. Those three functional regions are 38-42, 79-84 and 175-183, respectively (see Table 1). Its polar sequence (Table 5, b) is made in the way that every non-polar amino acid is replaced by glycine. The third model (Table 5, c) is composed of three regions, each of which contains a respective active site in its middle. The DEV analysis is then performed. From the DD values obtained, wire-and-rod models are made in which w = 1 and r = 250 are used (Fig. 6). Each model protein is assumed to be one molecular entity that is used for further biochemical analysis, and any borders between the functional sites that originally existed in trypsin are neglected.

Table 5. Amino acid sequences of three protein models.

Model   Sequence                Length
a       AAHCYNTLNNDGDSGGPVVC    20 amino acids
b       GGHCYNTGNNDGDSGGGGGC    20 amino acids
c       AAHCYNNDIMGDSGG         15 amino acids

(a) In trypsin, the original locations of AAHCY, NTLNND, and GDSGGPVVC are (38-42), (79-84) and (175-183), respectively. (b) Non-polar amino acids, A, L, P and V, are individually replaced by G. (c) The three active sites of trypsin, 40-H, 84-D and 177-S, are individually in the middle of each region.
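The three model proteins of Table 5 can be assembled and checked with a short script. The sketch below rebuilds model a from its three regions, derives model b by the A/L/P/V to G replacement stated in the table notes, and reports where the active-site residues fall. The sequence used for model c is read from a partly illegible table entry and should be treated as an assumption, and the helper name is hypothetical.

```python
# Model a: regions 38-42, 79-84 and 175-183 of trypsin, joined end to end.
MODEL_A = "AAHCY" + "NTLNND" + "GDSGGPVVC"

# Model b: non-polar residues A, L, P and V individually replaced by G.
MODEL_B = "".join("G" if aa in "ALPV" else aa for aa in MODEL_A)

# Model c: three five-residue regions, each centred on one active site.
# ASSUMPTION: sequence as read from a partly illegible Table 5 entry.
MODEL_C = "AAHCY" + "NNDIM" + "GDSGG"


def site_label(seq, position):
    """Label such as '3-H' for the residue at a 1-based position."""
    return f"{position}-{seq[position - 1]}"


if __name__ == "__main__":
    print(MODEL_A, len(MODEL_A))
    print(MODEL_B, len(MODEL_B))
    print(MODEL_C, len(MODEL_C))
    # Active-site residues (40-H, 84-D, 177-S of trypsin) inside the models.
    print([site_label(MODEL_B, p) for p in (3, 11, 14)])
    print([site_label(MODEL_C, p) for p in (3, 8, 13)])
```

Treating each model as a single string matches the statement that the model protein is one molecular entity and that the borders between the original regions are neglected.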

Fig. 6. Structures of three protein models, a, b, and c.

Looking at the three models in Fig. 6, we can notice that the model b is more compact than the other two. This is mainly due to the big DD value, i.e., DD = 77.2 at the site-B (7-TLNND) and the site-E (14-SGGGG), suggesting that the sites corresponding to 84-D and 177-S come closer. The same conclusion can be obtained by estimating the area of the triangle made up of the three active sites. The area is calculated from the distances between the three active-site amino acids, which lie at the vertices of the triangle. The results are shown in Table 6. The model b clearly has the smallest area. The model b is better than a because the polar amino acids are accentuated in the former. In the case of the model c, the DEV values are rather low compared to those of a and b (data not shown). This suggests that we cannot recognize a functional system of trypsin directly from the active sites.

As the size of the model protein is much smaller than trypsin, a simpler DEV pattern can be obtained. It permits the analysis at X=3. Surprisingly, 13-D, corresponding to 176-D in trypsin, was not caught in this analysis. This 13-D was therefore also replaced by G to make a model d. The results are shown in Table 7, and the structural models were made. The data of the models are shown in Table 8. Three cases with high DD values in Table 7 (d-1, d-2, d-3) are noticed. From their calculated areas, the model d-3 is the most probable candidate representing the situation around the functional system of trypsin. Two possible structures can be illustrated for the model d-3, namely d-3-1 and d-3-2, with and without counterclockwise folding, respectively. We cannot differentiate these two, but the model d-3-1 has the smallest area. In the site-Bs and the site-Es of the model d-3, the distinguished amino acids are 3-H, 11-D, 14-S and 20-C. The first three amino acids correspond to the active sites of trypsin, 40-H, 84-D and 177-S, respectively. Thus, this series of studies provided conclusive proof that the DEV model is a useful tool to predict the active sites of trypsin.
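The construction of model d and the listing of its X = 3 sites can be reproduced as follows. The sketch applies the two rules given in the note to Table 7: among windows with the same amino acid composition only the first is kept, and windows made of a single amino acid (such as GGG) are omitted. The function names are hypothetical, and sites are labelled in sequence order, so the window at position 4 appears as 4-CYN where Table 7 prints 4-CNY.

```python
MODEL_B = "GGHCYNTGNNDGDSGGGGGC"

# Model d: 13-D of model b replaced by G (position 13 is index 12).
MODEL_D = MODEL_B[:12] + "G" + MODEL_B[13:]


def windows(seq, x=3):
    """All windows of length x as (1-based start, region) pairs."""
    return [(i + 1, seq[i:i + x]) for i in range(len(seq) - x + 1)]


def distinct_sites(seq, x=3):
    """Keep the first window of each composition; drop one-residue windows."""
    seen = set()
    sites = []
    for start, region in windows(seq, x):
        composition = "".join(sorted(region))
        if len(set(region)) == 1 or composition in seen:
            continue
        seen.add(composition)
        sites.append(f"{start}-{region}")
    return sites


if __name__ == "__main__":
    print(MODEL_D)
    print(distinct_sites(MODEL_D))
```

Filtering by composition is justified because formula (2.2) depends only on the counts n(z) within a window, so windows such as 12-GGS, 13-GSG and 14-SGG necessarily receive the same DEV.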

Table 6. Areas of triangle made of three active sites of trypsin in the models a, b and c.

Columns: Model protein (a, b, c); Distance (cm) between H and D, H and S, D and S; Area (cm²).
Distance values, in order of appearance: 4.0 3.5 3.6; 3.4 3.0 3.9; 3.0 3.3 3.4.
Area values, in order of appearance: 4.3 5.4 5.2.

Table 7. Intramolecular interaction of the model d (GGHCYNTGNNDGGSGGGGGC).

Site-B   Site-E   DEV    DD
3-HCY    11-DGG   1.49   66.5
3-HCY    12-GGS   1.49   66.5
4-CNY    1-GGH    1.29   74.8
4-CNY    11-DGG   1.29   74.8
4-CNY    12-GGS   1.29   74.8
5-YNT    1-GGH    1.39   70.3
5-YNT    11-DGG   1.39   70.3
5-YNT    12-GGS   1.39   70.3
5-YNT    18-GGC   1.39   76.6
9-NND    1-GGH    1.85   69.2
9-NND    12-GGS   1.85   69.2
9-NND    18-GGC   1.85   73.9

The Model column of the table marks three rows as d-1, d-2 and d-3; their row positions cannot be identified in this copy.
In terms of calculation, there are cases having the same amino acid composition, for example 12-GGS and 13-GSG or 14-SGG. The first one alone is shown in the table. Also, the cases having the same amino acid, such as GGG, are omitted.

Table 8. Areas of triangle made of three active sites of trypsin in the model d. (w=1, r=100 are used.)

                Distance (cm) between
Model protein   H and D   H and S   D and S   Area (cm²)
d-1             3.2       1.6       3.0       4.6
d-2             3.2       2.0       3.0       4.5
d-3-1           2.5       3.5       3.0       3.2
d-3-2           2.8       2.3       3.0       4.1

REFERENCES

1. KIHO Y., Regional variation and function of nucleotide and amino acid sequence, Cell Struct. Funct. 13, 387-405 (1988).
2. KIHO Y., A simple way to identify functional sites in protein. I. Functional sites of trypsin, Proc. Japan Acad. 71B, 45-50 (1995).
3. STROUD R. M., KAY L. M. & DICKERSON R. E., The structure of bovine trypsin: electron density maps of the inhibited enzyme at 5 Angstrom and at 2.7 Angstrom resolution, J. Mol. Biol. 83, 185-208 (1974).
4. KIHO Y., Protein folding: Construction of functional structure, Cell Struct. Funct. 16, 45-48 (1991).
5. KIHO Y., MIYATA K., UBASAWA A., BHANDARI G., MIZUNO H., IWAI T. & OKADA Y., A simple way to identify functional sites in protein. III. Application of DEV analysis to proteins other than trypsin, Proc. Japan Acad. 71B, 244-249 (1995).
6. MARQUART M., WALTER J., DEISENHOFER J., BODE W. & HUBER R., Acta Cryst. (B) 39, 480 (Brookhaven Data Bank, 1TPO) (1983).
7. TOYODA T., SAKAGUCHI T., IMAI K., INOCENCIO N. M., GOTOH B., HAMAGUCHI M. & NAGAI Y., Structural comparison of the cleavage-activation site of the fusion glycoprotein between virulent and avirulent strains of Newcastle disease virus, Virology 158, 242-247 (1987).
8. KIHO Y., A simple way to identify functional sites in protein. II. Interaction of trypsin and its substrate, Proc. Japan Acad. 71B, 51-56 (1995).
9. KIHO Y., In Proceedings of the International Institute for Advanced Study (Mathematical Approach to Fluctuation, Complexity and Non-linearity) (1993).
10. GETHING M. J., WHITE J. M. & WATERFIELD M. D., Purification of the fusion protein of Sendai virus: analysis of the NH2-terminal sequence generated during precursor activation, Proc. Natl. Acad. Sci. USA 75, 2737-2740 (1978).
11. KIHO Y., BHANDARI G. & OKADA Y., A simple way to identify functional sites in protein. IV. Active sites of trypsin, Proc. Japan Acad. 72B, 11-15 (1996).

a The word 'active site' is used in the limited meaning of an experimentally known functional site. The proteolytic activity of trypsin is greatly impaired by modification of the amino acid in this site.

b Numbering of the amino acid sequence of trypsin (1-223) does not include the N-terminus of trypsinogen.

c The amino acid sequence of NDV-F (1-460) is cited from the references [6,7].

d The other HD-structural models, such as chymotrypsin, papain and carboxypeptidase, are reported [5].