Investigation of protein folding: Uneven distribution of point mutations along polypeptide chains


J. theor. Biol. (1979) 81, 247-258

Investigation of Protein Folding: Uneven Distribution of Point Mutations along Polypeptide Chains

I. SIMON

Institute of Enzymology, Biological Research Center, Hung. Acad. Sci., H-1502, P.O. Box 7, Budapest, Hungary

(Received 4 September 1978, and in revised form 4 June 1979)

The distribution of point mutations accepted by natural selection in the amino acid sequences of 16 cytochrome-C, 7 lysozyme, 15 myoglobin, 10 ribonuclease, 12 short neurotoxin, 16 plant ferredoxin and 6 bacterial ferredoxin molecules has been investigated. The number of point mutations tends to increase from the NH2-terminus towards the COOH-terminus of these proteins, or at least within their structural domains. Our results suggest that the continuous folding of the polypeptide chain during biosynthesis may play an important role in the formation of globular protein structure.

0022-5193/79/220247 + 12 $02.00/0 © 1979 Academic Press Inc. (London) Ltd.

1. Introduction

The polypeptide chains of proteins are built during their synthesis by the successive addition of single amino acid residues, as determined by the genetic code, beginning with the NH2-terminal amino acid. The biosynthesis of a polypeptide chain can be completed in a few minutes. The addition of a new amino acid to the growing chain takes several orders of magnitude longer than the formation of structural elements (α-helix, β-structures, etc.). Refolding experiments have shown that the time needed for the formation of an α-helix is about 10-’ s (Zana, 1975; Cummings & Eyring, 1975). The formation of a β-bend or a β-sheet may be almost as rapid (Finkelstein, 1978). To our present knowledge there is no evidence for any factor that could prevent the formation of these stable structural elements on the surface of the ribosomes, so these elements are expected to form during biosynthesis. As early as 1967, Phillips pointed out the role of continuous structure formation during biosynthesis (Phillips, 1967), but the success of refolding experiments has suggested that this cannot play the main role in the formation of the native structure (Anfinsen, 1973). The refolding of proteins in vitro has been widely studied both experimentally and theoretically (Anfinsen, Haber, Sela & White, 1961; Anfinsen & Scheraga, 1975). In this process structure formation starts simultaneously at different points of the polypeptide chain (e.g. in all "helix-forming" regions) (Ptitsyn & Finkelstein, 1970; Wetlaufer, 1973; Chou & Fasman, 1974, 1974a; Levitt & Chothia, 1976). It was shown experimentally for lysozyme that the COOH-terminal parts of the polypeptide chain are the first to reach their final conformation during refolding (Anderson & Wetlaufer, 1976). In contrast, during biosynthesis the NH2-terminal end of the protein molecule may assume a stable structure before the COOH-terminal part of the chain has been synthesized.

Because of the obvious difficulty of studying structure formation during biosynthesis, no attempt has been made to investigate this process experimentally. In this paper a theoretical method is used to study protein folding in vivo. The distribution of point mutations (transitions and transversions) along the polypeptide chains may reveal the importance of the continuous folding of polypeptide chains during biosynthesis. Because of the difficulties that arise when common statistical methods are applied to the mutability of the different positions along a polypeptide chain (Crowson, 1972; Wong, Liu & Wang, 1976), we have introduced a new approach to characterize the distribution of point mutations. Whereas point mutations (i.e. exchanges of one amino acid for another) occur with equal probability along the whole polypeptide chain, those accepted by natural selection are not randomly distributed.
The surface groups of proteins are more mutable than those located inside, owing to steric factors (Wong et al., 1976; Bello, 1978). The conservation of amino acids in the active centres of enzymes is well known. If a prenative structure necessary for the formation of the native structure, or the native structure itself, is formed during biosynthesis, it seems likely that a change in the amino acid sequence is more deleterious in the part of the chain synthesized first than near its end. If continuous protein folding occurs starting from the NH2-terminus of the molecule, the first structured segments of the chain may direct the folding of later parts of the molecule, and they should therefore be more conservative than average. On the other hand, the last segments are more likely to stay on the surface, which implies less stringent steric restrictions. It follows that the number of accepted point mutations should tend to increase from the NH2- to the COOH-terminus of the protein, or at least within the individual structural domains.

2. The Distribution of Point Mutations

A point mutation in the DNA sequence can result in the exchange of an amino acid in the corresponding polypeptide chain. However, an observed replacement may involve at least two point mutations (Trp → Tyr) or, in certain cases, three (Trp → Asp). On the basis of the genetic code it can be shown that if Trp and Tyr occur at the same position of two homologous proteins, at least two point mutations must have occurred at that position, whether the common ancestor contained Tyr, Trp, or perhaps Cys, from which a single point mutation can produce either of the two amino acids. If the common ancestor contained some other amino acid at this position, more than two point mutations must have taken place during evolution. Therefore, in studying the distribution of point mutations in homologous proteins isolated from different species, we have taken into account all possible base triplets corresponding to the amino acids found at a given position and determined the minimum number of point mutations at that position. This value is the minimum number of exchanges that could convert all the triplet codons occurring at the corresponding position into each other, and it does not depend on which amino acid was present in the common ancestor. These values do not, of course, always equal the real number of accepted point mutations, but the expected tendency of the distribution is the same, since these numbers are monotonic functions of the observed ones (Dayhoff, 1972). We have investigated the point mutations observed in the amino acid sequences of 16 cytochrome-C, 7 lysozyme, 15 myoglobin, 10 ribonuclease, 12 short neurotoxin, 16 plant ferredoxin and 6 bacterial ferredoxin molecules. Polypeptide chains of identical length were compared, with the exception of a few lysozyme and ferredoxin sequences. In the case of human leukaemia lysozyme we used corrected sequences (Thomsen, Lund, Kristiansen, Brunfeldt & Malmquist, 1972).
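For two amino acids, the minimum count described above is the smallest number of single-base substitutions separating any codon of one from any codon of the other under the standard genetic code. The Python sketch below computes only this pairwise case; it is not the author's procedure, the function names are hypothetical, and for more than two residues at one position the count of exchanges linking all codons needs more than a pairwise minimum. It reproduces the Trp → Tyr (two) and Trp → Asp (three) examples from the text.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated with each position running T, C, A, G
# (first position slowest); '*' marks stop codons.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA_STRING)}


def codons_for(aa: str) -> list[str]:
    """All codons that encode the one-letter amino acid code `aa`."""
    return [codon for codon, a in CODE.items() if a == aa]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_point_mutations(aa1: str, aa2: str) -> int:
    """Fewest single-base substitutions linking some codon of aa1 to some codon of aa2."""
    return min(hamming(c1, c2) for c1 in codons_for(aa1) for c2 in codons_for(aa2))


print(min_point_mutations("W", "Y"))  # Trp -> Tyr
print(min_point_mutations("W", "D"))  # Trp -> Asp
print(min_point_mutations("W", "C"))  # Trp -> Cys, one step, as for the Cys ancestor case
```

Because Cys is one substitution away from both Trp and Tyr, the Trp/Tyr pair needs two substitutions whichever of the three the ancestor carried, which is the point the text makes.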
There is either an insertion at position 48 in the 130-membered lysozyme chain or a deletion after position 47 in the 129-membered one. We have treated the latter as a 130-membered chain in which the amino acid at position 48 is missing. Similar corrections were necessary in a few plant ferredoxin sequences. Deletions and insertions were counted as single point mutations. The number of mutations at a single position is small, so the relative error of its estimate is large. Another source of error is that certain positions in the polypeptide chains are extremely conservative to ensure proper biological function, and these regions may distort the tendency of the distribution of point mutations. This effect can be especially large in oligomeric proteins, in which large continuous parts of the chain at the intersubunit contact areas should be conservative. In hemoglobin more than 20% of the amino acids are involved in interactions between the chains, and all of these belong to only two segments, of 25 and 40 amino acid residues, so they are not evenly distributed along the chain (Perutz, Muirhead, Cox & Goaman, 1968). We have therefore studied only single-chain proteins. In these proteins most of the functionally relevant conservative amino acids are located in the second half of the molecule or are distributed evenly along the polypeptide chain. To reduce the relative error in the estimated number of point mutations and to make any tendency in their distribution over longer segments of the polypeptide chain recognizable, we have investigated the number of mutations in overlapping segments of 10, 20, 40 and 60 amino acid residues as a function of the centre of these segments. Figures 1 to 4 show the number of mutations in the 1-, 10-, 20-, 40- and 60-membered overlapping segments in cytochrome-C, lysozyme, myoglobin and ribonuclease, respectively. (Ferredoxins and neurotoxins are too small for this kind of analysis.)

FIG. 1. Sum of the observed point mutations, $K_{z,m} = \sum_{i=z}^{z+m-1} k_i$, in cytochrome-C in segments having m amino acids (m = 1, 10, 20, 40 and 60) and beginning at position z, as a function of the segments' centre, $C_{z,m} = z + (m-1)/2$.

FIG. 2. Distribution of point mutations in lysozyme, plotted as in Fig. 1.

FIG. 3. Distribution of point mutations in myoglobin, plotted as in Fig. 1.

FIG. 4. Distribution of point mutations in ribonuclease, plotted as in Fig. 1.

To characterize the tendency of the distribution of point mutations quantitatively, we used the position of the centre of gravity of the distribution, i.e. the average position of the mutations. With $k_i$ mutations at position $i$ of a polypeptide chain of $n$ residues, the position of the centre of gravity of the mutation distribution is

$$ j = \frac{\sum_{i=1}^{n} i\,k_i}{\sum_{i=1}^{n} k_i} $$
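The segment sums plotted in Figs 1 to 4 and the centre of gravity $j$ can both be computed from a per-position list of mutation counts $k_i$. The following is a minimal Python sketch, assuming 1-based positions as in the text; the function names and the toy counts are illustrative only and are not data from the paper.

```python
def window_sums(k: list[int], m: int) -> list[tuple[float, int]]:
    """Return (C_{z,m}, K_{z,m}) for every m-residue segment of the chain.

    k[i - 1] holds the mutation count k_i at position i (positions are 1-based),
    K_{z,m} sums k_z ... k_{z+m-1}, and C_{z,m} = z + (m - 1)/2 is the segment centre.
    """
    n = len(k)
    if not 1 <= m <= n:
        raise ValueError("segment length must be between 1 and the chain length")
    result = []
    for z in range(1, n - m + 2):  # z = 1 .. n - m + 1
        segment_sum = sum(k[z - 1 : z - 1 + m])
        result.append((z + (m - 1) / 2, segment_sum))
    return result


def centre_of_gravity(k: list[int]) -> float:
    """j = sum_i i*k_i / sum_i k_i, with positions numbered from 1."""
    total = sum(k)
    if total == 0:
        raise ValueError("centre of gravity is undefined when no mutations are observed")
    return sum(i * k_i for i, k_i in enumerate(k, start=1)) / total


# Toy counts for a hypothetical 5-residue chain (not data from the paper).
counts = [0, 1, 0, 2, 3]
print(window_sums(counts, 2))
print(centre_of_gravity(counts))
```

With m = 1 each "segment" is a single residue and its centre is just z, which corresponds to the m = 1 panel of each figure.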

The position of $j$ relative to the centre of the polypeptide chain characterizes the distribution of point mutations. To obtain a parameter independent of the length of the polypeptide chain we introduce

$$ p = \frac{2j - n - 1}{n - 1} $$
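As a check on this normalization, $p = -1$ when $j = 1$, $p = 0$ when $j = (n+1)/2$ and $p = 1$ when $j = n$. The small Python sketch below (helper names are hypothetical) evaluates $p$ and inverts the definition to recover $j$ from a tabulated $p$ value.

```python
def p_parameter(j: float, n: int) -> float:
    """p = (2j - n - 1)/(n - 1): -1 at the NH2-terminus, 0 at the chain centre, +1 at the COOH-terminus."""
    if n < 2:
        raise ValueError("p is undefined for chains shorter than two residues")
    return (2 * j - n - 1) / (n - 1)


def centre_from_p(p: float, n: int) -> float:
    """Invert the definition of p to recover the centre of gravity j."""
    return (p * (n - 1) + n + 1) / 2


# Boundary values for a 105-residue chain (the cytochrome-C length in Table 1).
print(p_parameter(1, 105), p_parameter(53, 105), p_parameter(105, 105))
# Centre of gravity implied by the cytochrome-C entry of Table 1 (p = 0.252).
print(round(centre_from_p(0.252, 105), 1))
```

For cytochrome-C, p = 0.252 with n = 105 corresponds to j ≈ 66.1, about 13 residues beyond the chain centre at position 53.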


TABLE 1
Value of parameter p for different proteins

Protein                   n       p
Cytochrome-C            105     0.252
Myoglobin               153     0.044
Lysozyme                130     0.108
Short neurotoxin         62     0.04
Bacterial ferredoxin     55     0.070
Plant ferredoxin         99     0.051
Ribonuclease            124    -0.166

The value of p lies between -1 and 1 and varies linearly with the position of the centre of gravity in the polypeptide chain. For a random distribution p is close to 0; for an increasing tendency it is positive, and for a decreasing one it is negative. Table 1 contains the p values calculated for cytochrome-C, lysozyme, neurotoxin, plant and bacterial ferredoxins, myoglobin and ribonuclease.

Differences in the mutability of amino acids affect the distribution of point mutations (the conservative character of cysteine is well known). To rule out the effect of an uneven distribution of the different amino acids along the chain, we have analysed the amino acid sequences of bacterial ferredoxins from six different species. The evolution of this 55-residue protein has been complicated by an intragenic duplication that yielded two parts of identical sequence: the sequence at positions 1-26 was identical to the segment 30-55. Comparison of these proteins gives a 24% higher number of accepted point mutations for segment 30-55 (31 mutations) than for segment 1-26 (25 mutations).

3. Effect of Domain Structure of Protein

A large number of proteins, among them lysozyme, ribonuclease, neurotoxin and the bacterial and plant ferredoxins studied here, have a domain structure. According to X-ray analyses there are far fewer interactions between these domains than within them. To determine domain structures we used the X-ray crystallographic data of Phillips (1967) for lysozyme, Wyckoff et al. (1967) for ribonuclease, Low et al. (1976) for neurotoxins and Adman, Sieker & Jensen (1973) for ferredoxin. In the absence of X-ray data for plant ferredoxins, their domain structure was estimated from their sequence homology with the bacterial ferredoxins.


TABLE 2
Value of parameter p for individual domains

Protein                 Domain      p
Lysozyme                1-40        0.134
                        41-110      0.124
                        111-130     0.205
Short neurotoxin        1-30        0.025
                        31-62       0.266
Bacterial ferredoxin    1-29        0.155
                        30-55       0.156
Plant ferredoxin        1-31        0.058
                        32-65       0.190
                        66-99       0.152
Ribonuclease            1-20        0.158
                        21-40       0.022
                        41-124      0.037
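Restricting the centre-of-gravity calculation to one domain, with its residues renumbered from 1, gives per-domain values of the kind listed in Table 2. The Python sketch below assumes that renumbering (the text does not spell it out), and the helper name and counts are hypothetical. The toy chain shows how a negative whole-chain p, as for ribonuclease, can coexist with positive p in every domain.

```python
def domain_p(k: list[int], start: int, end: int) -> float:
    """p for residues start..end (1-based, inclusive), renumbered 1..n_d within the domain."""
    segment = k[start - 1 : end]
    n_d = len(segment)
    total = sum(segment)
    if n_d < 2 or total == 0:
        raise ValueError("a domain needs at least two residues and at least one mutation")
    j = sum(i * k_i for i, k_i in enumerate(segment, start=1)) / total
    return (2 * j - n_d - 1) / (n_d - 1)


# Hypothetical 8-residue chain made of two 4-residue domains (toy numbers).
counts = [2, 3, 4, 5, 0, 0, 0, 1]
print("whole chain:", round(domain_p(counts, 1, 8), 3))
for start, end in [(1, 4), (5, 8)]:
    print(f"domain {start}-{end}:", round(domain_p(counts, start, end), 3))
```

Here the whole chain gives p ≈ -0.371 because most mutations fall in the first domain, yet within each domain the counts rise towards its COOH-terminal end (p = 5/21 ≈ 0.238 and p = 1.0).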

The domains build up their structures independently of each other. It is therefore of interest to determine the distribution of point mutations along these sequentially continuous segments of the polypeptide chain, which constitute independent folding units. Table 2 contains the p parameters for the individual domains. These data show that the increasing tendency in the distribution of point mutations is even more striking within the individual domains than along the whole chain. Moreover, for proteins (e.g. ribonuclease) in which the number of mutations does not increase along the whole chain, the p parameters calculated for the individual domains are all positive, indicating an increasing tendency within every domain. From these results we conclude that continuous folding during biosynthesis may play an important role in the formation of the native structure.

4. Protein Folding During Biosynthesis

The balance between short-range and long-range interactions in determining protein structure is one of the most important questions in the analysis of protein folding. It is accepted that short-range interactions alone cannot determine the formation of protein structure. Extended chains, 3₁₀ helical fragments and various bends in an unfolded polypeptide chain are not stable enough to survive thermal fluctuations (Finkelstein & Ptitsyn, 1976). In our previous work, conformational energy calculations were carried out on a tetrapeptide that occurs as a bend at residues 94-97 in staphylococcal nuclease. The relatively high energy of this bend indicated that it must also be stabilized by long-range interactions (Simon, Némethy & Scheraga, 1978). The number of possible combinations of long-range interactions is so great that, for a polypeptide chain of 100 amino acids, a random search at the frequency of thermal fluctuations (10^13 conformations s^-1) would take about 105’ years to find the native conformation (Karplus & Weaver, 1976). Neither individual protein molecules nor the whole evolutionary process can have had such a long time to search through all possible variations. It is therefore very probable that the native structure of a protein does not correspond to the global free-energy minimum but is rather the result of a more or less definite folding pathway. Continuous structure formation would be such a definite folding process, in which the just-synthesized, not yet stable short segments pass through their most stable and thermodynamically most probable forms under the given circumstances. An incoming amino acid can alter the conformation of only the last few (at most 6-10) residues; it becomes stabilized by short-range interactions with the last few preceding amino acids and by long-range interactions of this still mobile small fragment with the already stabilized neighbouring parts of the polypeptide chain. The time between the addition of two amino acids to the growing chain is more than enough for the last small fragment to search through all possible conformations and find its global energy minimum under the given circumstances.
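The contrast between a random search over the whole chain and the sequential search described above can be made concrete with a back-of-the-envelope calculation. This Python sketch assumes a hypothetical number c of accessible conformations per residue (the text gives none) and uses the sampling rate of 10^13 conformations per second quoted above; the printed numbers depend entirely on that assumption and are not the paper's estimate.

```python
RATE = 1e13                 # conformations sampled per second, as quoted in the text
SECONDS_PER_YEAR = 3.156e7  # approximate length of a year in seconds


def global_search_years(n_residues: int, c: int) -> float:
    """Years needed to visit all c**n_residues conformations of the whole chain."""
    return c ** n_residues / RATE / SECONDS_PER_YEAR


def window_search_seconds(window: int, c: int) -> float:
    """Seconds needed to visit all conformations of only the last `window` mobile residues."""
    return c ** window / RATE


C = 3  # hypothetical number of conformations per residue (assumption, not from the text)
print(f"whole 100-residue chain: {global_search_years(100, C):.1e} years")
for w in (6, 10):
    # the text limits the mobile fragment to at most 6-10 residues
    print(f"last {w} residues: {window_search_seconds(w, C):.1e} s")
```

Even with a more generous c = 10, a 10-residue window takes only about 10^-3 s, far less than the interval between amino acid additions implied by a synthesis time of a few minutes, whereas the whole-chain search grows as c^n and does not finish on any evolutionary time scale.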
At the same time, long-range interactions with nearby, already stable parts of the polypeptide chain may ensure the stability of an individually unstable segment without considerably increasing the time needed for structure formation. These long-range interactions may explain the formation of unexpected α-helices or β-structures, as well as the absence of these elements, which cannot be predicted by considering only parts of the amino acid sequence.

5. Postsynthetic Conformation Change

The data calculated from the sequences of cytochrome-C, lysozyme, myoglobin, ribonuclease, ferredoxin and toxin suggest that continuous structure formation during biosynthesis may play an important role in the formation of globular protein structure. In many cases the result of this continuous folding of the polypeptide chain is only a definite prenative conformation. A number of enzymes are synthesized as inactive zymogens and converted to their active form by limited proteolysis. In general, the completed polypeptide chain enters a different environment after leaving the surface of the ribosome, and it may therefore undergo a postsynthetic conformational change. The relative positions of the structural domains, formed more or less independently during biosynthesis, must also depend on the environment. Our studies on the distribution of point mutations indicate that this postsynthetic conformational change is rather limited and does not amount to a complete refolding of the polypeptide chain. A complete refolding would be expected to proceed like refolding in vitro, where the NH2-terminal end of the molecule has no special role. In fact, the refolding of lysozyme starts at the COOH-terminus (Anderson & Wetlaufer, 1976), so if the native structure were reached by complete refolding, this part of the molecule should be the more conservative one; the positive p values found for lysozyme (Tables 1 and 2) indicate the opposite.

6. Stability of Continuously-formed Structures

At the beginning of biosynthesis the NH2-terminal end of the protein is synthesized in the absence of the later parts of the molecule. The fragments synthesized afterwards, towards the COOH-terminus, cannot considerably modify the previously stabilized sequences. Thus protein molecules cannot search through a number of, perhaps very favorable, long-range interactions. Therefore the native structure of a protein is definite but corresponds only to a local energy minimum, i.e. to a kinetically stable state. This does not contradict the notion that during biosynthesis the newly formed mobile segments assume the most stable structures available to them under the given circumstances. The main parts of the conformational energy, the hydrophobic energy and the energy of H-bonds, are structural invariants: they are simple functions of the molecular weight and are independent of the actual conformation of closely packed polypeptide chains (Chothia, 1975). The conformational energy also depends on the environment. By a postsynthetic conformational change the protein can reach the deepest point of the potential valley determined by continuous structure formation, but not necessarily the deepest potential valley of all (the global energy minimum), because a high potential barrier protects compactly folded structures. Continuous folding of polypeptide chains beginning at the NH2-terminal end would lead to the formation of a definite structure during biosynthesis. The maintenance of kinetically stable structures, the self-reassembly of molecules from metastable states caused by fluctuations of different amplitudes, the renaturation of reversibly denatured proteins and the structure formation of chemically synthesized polypeptide chains may also be determined by other factors. These factors will be discussed in our forthcoming papers.

For studying the number of mutations in different proteins we obtained data from the Atlas of Protein Sequence and Structure (Dayhoff, 1972), the Handbook of Protein Sequences (Croft, 1973) and Wakabayashi et al. (1978).

Cytochrome-C

Elephant seal, horse heart, dogfish, snapping turtle heart, kangaroo, whale heart, zebra, hog, human heart, camel heart, bovine heart, chicken heart, rabbit heart, lamprey, emu, ostrich.

Lysozyme

Duck egg II, duck egg III, human milk, hen egg, guinea hen egg, human leukaemia, baboon milk.

Myoglobin

Human, horse, sperm whale, kangaroo, harbour seal, porpoise, beef heart, dolphin, badger, sheep, hedgehog, chicken, California sea lion, treeshrew, American opossum.

Ribonuclease

Bovine, porcine, rat, giraffe, red deer, roe deer, ovine, reindeer, moose, fallow deer.

Bacterial ferredoxin

Clostridium pasteurianum, Clostridium acidi-urici, Clostridium butyricum, Clostridium tartarivorum, Clostridium thermosaccharolyticum, Clostridium M-E.

Plant ferredoxin

Taro, koa, spinach, alfalfa, P. americana I, P. americana II, E. telmateia I, E. telmateia II, E. arvense I, E. arvense II, S. quadricauda, A. sacrum I, A. sacrum II, S. platensis, S. maxima, N. muscorum I.


Short neurotoxin

Black mamba, Jameson's mamba, cape cobra, forest cobra, Iranian cobra, blackneck spitting cobra, ringhals, cape cobra (2), Formosan cobra, ringhals (2), broad-banded blue sea snake, common sea snake.

REFERENCES

ADMAN, E. T., SIEKER, L. C. & JENSEN, L. H. (1973). J. biol. Chem. 248, 3987.
ANDERSON, W. L. & WETLAUFER, D. B. (1976). J. biol. Chem. 251, 3143.
ANFINSEN, C. B. (1972). Biochem. J. 128, 737.
ANFINSEN, C. B. (1973). Science 181, 223.
ANFINSEN, C. B., HABER, E., SELA, M. & WHITE, F. H. (1961). Proc. natn. Acad. Sci. U.S.A. 47, 1309.
ANFINSEN, C. B. & SCHERAGA, H. A. (1975). Adv. prot. Chem. 29, 205.
BELLO, J. (1978). Int. J. Peptide Protein Res. 12, 38.
CHOTHIA, C. (1975). Nature 254, 304.
CHOU, P. Y. & FASMAN, G. D. (1974). Biochemistry 13, 211.
CHOU, P. Y. & FASMAN, G. D. (1974a). Biochemistry 13, 223.
CROFT, L. R. (1973). Handbook of Protein Sequences, Supplement A, 1973; Supplement B, 1976. Oxford: Joynson-Bruvvers Ltd.
CROWSON, R. A. (1972). J. mol. Evol. 2, 28.
CUMMINGS, A. L. & EYRING, E. M. (1975). Biopolymers 14, 2107.
DAYHOFF, M. O. (1972). Atlas of Protein Sequence and Structure, Vol. 5; Supplement 1, 1973; Supplement 2, 1976. Washington: The National Biomedical Research Foundation.
FINKELSTEIN, A. V. (1978). Bioorg. Khimiya, U.S.S.R. (in press).
FINKELSTEIN, A. V. & PTITSYN, O. B. (1976). J. mol. Biol. 103, 15.
KARPLUS, M. & WEAVER, D. L. (1976). Nature 260, 404.
LEVINTHAL, C. (1968). J. Chim. phys. 65, 44.
LEVITT, M. & CHOTHIA, C. (1976). Nature 261, 552.
LEWIS, P. M., MOMANY, F. A. & SCHERAGA, H. A. (1971). Proc. natn. Acad. Sci. U.S.A. 68, 2293.
LOW, B. W., PRESTON, H. S., SATO, A., ROSEN, L. S., SEARL, J. E., RUDKO, A. D. & RICHARDSON, J. S. (1976). Proc. natn. Acad. Sci. U.S.A. 73, 2991.
PERUTZ, M. F., MUIRHEAD, H., COX, J. M. & GOAMAN, L. C. G. (1968). Nature 219, 131.
PHILLIPS, D. C. (1967). Proc. natn. Acad. Sci. U.S.A. 57, 484.
PTITSYN, O. B. & FINKELSTEIN, A. V. (1970). Biofizika 15, 757.
PTITSYN, O. B., LIM, V. I. & FINKELSTEIN, A. V. (1972). Proc. VIIIth FEBS Meeting, Analysis and Simulation of Biochemical Systems (B. Hess & H. C. Hemker, eds), pp. 421-431. Amsterdam: North Holland Publ. Co.
SIMON, I., NÉMETHY, G. & SCHERAGA, H. A. (1978). Macromolecules 11, 797.
THOMSEN, J., LUND, E. H., KRISTIANSEN, K., BRUNFELDT, K. & MALMQUIST, J. (1972). FEBS Lett. 22, 34.
WAKABAYASHI, S., HASE, T., WADA, K., MATSUBARA, H., SUZUKI, K. & TAKAICHI, S. (1978). J. Biochem. 83, 1305.
WETLAUFER, D. B. (1973). Proc. natn. Acad. Sci. U.S.A. 70, 697.
WONG, A. K. C., LIU, T. S. & WANG, C. C. (1976). J. mol. Biol. 102, 287.
WYCKOFF, H. W., HARDMAN, K. D., ALLEWELL, N. M., INAGAMI, T., JOHNSON, L. N. & RICHARDS, F. M. (1967). J. biol. Chem. 242, 3984.
ZANA, R. (1975). Biopolymers 14, 2425.