BIOCHIMIE, 1975, 57, 1147-1154.
The evolution of the genetic code.
R. P. JORRÉ*◊ and R. N. CURNOW**
* The Post Office (SBRD), 93 Howland Street, London, W1P 6HQ and ** Department of Applied Statistics, University of Reading, Whiteknights, Reading, Berkshire RG6 2AN.
(12-5-1975).
Summary. A model is described for the evolution of proteins within organisms in the primitive environment. The model is then used to show which chemical and biological properties of the amino acids could have been important in the evolution of the genetic code.
A model for protein evolution has been described in [1]. This model is now used to determine the selective forces that may have influenced the evolution of the genetic code. These forces are then linked with certain chemical and biological properties of the amino acids. The development of the code most probably took place in very primitive life forms in the primeval soup. By classical Darwinian theory the code must have been selected such that it was the most efficient in ensuring the survival of the primitive genotype. Primitive organisms were most probably simple combinations of DNA and protein undergoing high mutation rates due to imperfections in the replication mechanism [2]. Under these conditions organism survival would be critically dependent on whether the translation mechanism was such that mistakes in replication would i. provoke a minimum number of harmful consequences and ii. be used as often as possible to improve organism function. Attempts have been made to explain the degeneracy of the genetic code and the assignments of codons [2, 3] by assuming that a code minimizing errors at amino acid level and minimizing substitutions between unlike acids would be selectively advantageous.
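As a rough illustration of what "minimizing errors at amino acid level" means, the sketch below counts how single-base changes from sense codons are distributed between no change of amino acid, a change of amino acid, and a "stop" codon. It assumes the present-day standard genetic code and treats every single-base change as equally likely; the names `CODE` and `classify_point_mutations` are illustrative and do not come from the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered with the first base varying slowest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutations(code=CODE):
    """Count single-base changes from sense codons by their effect."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, amino in code.items():
        if amino == "*":
            continue  # only start from codons that code for an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_amino = code[mutant]
                if new_amino == "*":
                    counts["nonsense"] += 1    # mutation to a "stop" codon
                elif new_amino == amino:
                    counts["synonymous"] += 1  # amino acid unchanged
                else:
                    counts["missense"] += 1    # amino acid replaced
    return counts


counts = classify_point_mutations()
```

Under these assumptions about a quarter of the 549 possible single-base changes leave the amino acid unchanged, which is one simple way of seeing the conservative character of the code.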
◊ To whom all correspondence should be addressed.

The absence of tRNA species and of sophisticated genes such as mutator genes, which developed later than the genetic code; the relatively short life-span of primitive organisms; and the lack of specificity of the urenzymes, would suggest that the Markov chain model of [1] with no selection at DNA, tRNA or amino acid level could be appropriate to the environment in which the genetic code evolved. At this time the evolutionary pressures on the code itself probably outweighed the effects of selection on the fixation probabilities of amino acids at individual sites.

To determine possible constraints on the evolution of the code we need measures of the effects of the code, as it eventually evolved, on the ease with which each particular amino acid can be replaced by each of the others. We shall use as these measures the mean first passage times between each pair of amino acids, aj and ak. The mean first passage time is the average number of mutation-fixations before the first appearance of ak after an appearance of aj. The mean first passage times between all pairs of amino acids, conditional on the avoidance of "stop" states, can be derived from the model described in [1] assuming i. mutations from any DNA base to any of the other three DNA bases are equiprobable, ii. no selection at DNA or tRNA level and iii. no selection at amino acid level except for stop states, which are considered as totally deleterious. A formal derivation for the calculation of these mean first passage times can be found in [4, 5] and the results are given in table I. The units in table I are numbers of mutation-fixation events. The three outstanding features of this table are i. diagonal values tend to be smaller than non diagonal values, which explicitly confirms the conservative nature of the genetic code by demonstrating that the mutation-fixation process favours the appearance of the same amino acid as was previously found at the site in question, ii.
mean first passage times from any amino acid to a "stop" state are far longer than those from one amino acid to another, as expected given that "stop" states are obviously highly deleterious and the genetic code should be such that they are difficult to reach, and iii. mean first passage times from a "stop" state to an amino acid are usually short, which could indicate that the extension of protein chains, apart from doubling of the gene, may be by mutation of a "stop" codon to one coding for an amino acid [1].

The table also shows that the mean first passage time between amino acid aj and amino acid ak is not equal to that between ak and aj. This asymmetry between mean first passage times is due to i. the influence of the "stop" codons and ii. different numbers of codons coding for different amino acids.

If the genetic code evolved so as to minimize harmful changes then the mean first passage times should be inversely related to the efficiency with which amino acid changes were adaptive in the primitive environment. Any relation between the adaptivity of these changes and chemical and biological properties of the amino acids will result in correlations between the mean first passage times and these properties. Conversely, if we find relations between mean first passage times and any particular property then we can consider that property as a possible determining constraint in the evolution of the code.

TABLE I.
Mean first passage times.
Rows ("From") and columns ("To"): Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val and STOP. Entries are numbers of mutation-fixation events.
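The mean first passage times discussed above can be illustrated for any finite Markov chain. The sketch below is not the derivation of [4, 5]: it is a generic textbook calculation, assuming an ergodic chain given as a row-stochastic matrix `p`, and the function names are hypothetical. For a target state j it solves m(i, j) = 1 + sum over k ≠ j of p(i, k) m(k, j).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mean_first_passage_times(p):
    """Return the matrix whose (i, j) entry is the mean number of steps
    before state j first appears, starting from state i."""
    n = len(p)
    mfpt = [[0.0] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        # (I - Q) t = 1, where Q is p restricted to the states other than j.
        a = [[(1.0 if i == k else 0.0) - p[i][k] for k in others]
             for i in others]
        t = solve_linear(a, [1.0] * len(others))
        for i, value in zip(others, t):
            mfpt[i][j] = value
        # Diagonal entry: mean time to return to j after leaving it.
        mfpt[j][j] = 1.0 + sum(p[j][k] * mfpt[k][j] for k in others)
    return mfpt


# Toy two-state chain: the passage times are not symmetric.
toy = mean_first_passage_times([[0.5, 0.5], [0.25, 0.75]])
```

In the toy chain the passage from state 0 to state 1 takes 2 steps on average while the reverse takes 4, a small-scale analogue of the asymmetry noted for table I.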
RESULTS.
Measurements of biological activity, as shown by the reaction of the uterus of rats and the blood pressure of mammals to stimulation by the synthetic analogues of the hormones Vasopressin and Ocytocin, are available [6]. The relationships between mean first passage times and biological activity in these analogues were examined by calculation of correlation coefficients [20]. Four independent variables were considered. These were the ranked mean first passage times between the amino acid in the naturally occurring protein at each of four sites and the amino acid found in the synthetic analogue. The ranked differences in biological activity between the naturally occurring protein and the synthetic analogue were taken as the dependent variable. All results and details of calculation can be found in [4, 5]. Table II shows the results which were obtained for Ocytocin when all the data were considered. From this table and other results available in [4, 5] it can be seen that many of these coefficients were highly significant, suggesting that mean first passage times are indeed linked to the biological activity of these proteins.

Rank correlation analysis [20] was also used to relate mean first passage times with 138 chemical and biological properties of the amino acids. Rank correlation coefficients (or their equivalent in standardized normal variate form for dichotomous properties) were calculated between each property and each group of mean first passage times and various transforms of these times. Each of these 210 groups consisted of 19 passage times between any one amino acid and the remaining 19 amino acids. Full details are available in [4, 5].

TABLE II.
Regression Analysis. Biological Activity and Mean First Passage Times. Ocytocin (All Data).

Site on ocytocin                2 (Tyr)      3 (Ile)      4 (Gln)    8 (Leu)   Sites 2, 3, 4, 8 combined
Mean first passage times
  Correlation coefficient       0.57         0.60         0.35       0.27      0.79
  F (a)                         16.88 (**)   19.37 (**)   5.04 (*)   2.71      (b) 13.46 (**)
Reduced transpose (c): mean first passage times
  Correlation coefficient       0.59         0.25         0.32       0.07      0.65
  F (a)                         18.89 (**)   2.36         3.98       0.16      (b) 5.72 (**)

(a) Degrees of freedom v1 = 1, v2 = 35 except for (b) where v1 = 4, v2 = 32.
(c) If R (aj, ak) is the mean first passage time from amino acid aj to amino acid ak then this transformation R* (aj, ak) is the average number of mutation-fixations necessary to arrive for the first time at aj, given that the original amino acid is ak, as a proportion of the average number necessary to proceed from ak to ak.
(*) Significant at P < 0.05. (**) Significant at P < 0.01.
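The reduced transpose and the rank correlation described above can be sketched as follows. The paper cites [20] for rank correlation without saying which coefficient was used, so Spearman's coefficient (the product-moment correlation of average ranks) is an assumption here, and all function names are illustrative only.

```python
def reduced_transpose(r):
    """R*(aj, ak) = R(ak, aj) / R(ak, ak) for a square matrix r of
    mean first passage times indexed as r[from][to]."""
    n = len(r)
    return [[r[k][j] / r[k][k] for k in range(n)] for j in range(n)]


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical 2 x 2 matrix of passage times, for illustration only.
rt = reduced_transpose([[3.0, 2.0], [4.0, 1.5]])
```

In the paper's setting, `x` would be one group of 19 passage times (or their reduced transposes) and `y` the values of one property for the same 19 amino acids.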
The number of groups, out of 20 (one for each amino acid), which were significantly (P < 0.05) correlated with the apparently important properties listed below is shown in table III.

TABLE III.
Number of Groups (a) of Mean First Passage Times which were significantly (P < 0.05) correlated with each of the important properties.
Columns: number of groups out of 20 which were significantly (P < 0.05) correlated, for the mean first passage times and for the reduced transpose mean first passage times (b).
Properties (c):
Dissociation constants: pk'2; pI; pk' (COOH); pk' (NH3); pk' (side chain, other acids at 7.0); pk' (side chain, other acids at 11.0).
Abiogenic synthesis: amino acids present in primitive environment; acids synthesised by methane, ammonia and electrical discharge.
Thermodynamic properties: heat capacity (C°p); absolute entropy (S°); heat of combustion (-ΔH°c).
Dichotomous properties: aliphatic monoamino monocarboxylic acids; amino acids with non-polar residues.
Other properties: molecular weight; solubility in water; maximum moment of inertia (α-β).
(a) Each group consists of 19 mean first passage times from some amino acid aj to the other 19 amino acids. There are, therefore, 20 such groups.
(b) If R (aj, ak) is the mean first passage time from amino acid aj to amino acid ak then this transformation R* (aj, ak) is the average number of mutation-fixation events necessary to arrive for the first time at aj, given that the original amino acid is ak, as compared to those necessary to proceed from ak to ak.
(c) For exact property specification and references see text.

The mean first passage times were also represented in 4-dimensional space by non metric multidimensional scaling followed by principal component analysis [7, 8, 9, 10]. The results of correlating each of the properties listed below with the different components of the mean first passage time space are given in [4, 5] and summarized in table IV. The signs of the correlation coefficients for semi-continuous properties in this analysis have no absolute meaning because both non metric multidimensional scaling and principal component analysis are invariant under reflection. The signs of the coefficients can however be compared between themselves. A measure of relationship equivalent to the correlation coefficient was used when the properties of the amino acids were in dichotomous form [4, 5]. This measure is normally distributed with zero mean and unit standard deviation [21]. The sign of this measure is opposite to that of the equivalent correlation coefficient if the property were semi-continuous.

All these results would seem to indicate that the following properties could have acted as constraints in the evolution of the genetic code. Presumably, these are the relevant properties because of their importance in the survival of primitive organisms.
TABLE IV.
Correlation of Important Properties with the Principal Components (a) of the Mean First Passage Time Space.
Columns: for each of component numbers 1 to 4, the correlation coefficient or the N (0, 1) measure (b).
Properties (c):
Dissociation constants: pk'2; pI; pk' (COOH); pk' (NH3); pk' (side chain, other acids at 7.0); pk' (side chain, other acids at 11.0).
Abiogenic synthesis: amino acids present in the primitive environment; acids synthesised by methane, ammonia and electrical discharge.
Thermodynamic properties: absolute entropy (S°); entropy change for formation (ΔSf°); heat of combustion (-ΔH°c); enthalpy of formation of compound (-ΔHf°); free energy change for formation of compound (-ΔGf°).
Dichotomous properties: aliphatic monoamino monocarboxylic acids; amino acids with non-polar residues.
Other properties: molecular weight; solubility in water; maximum moment of inertia (α-β).
(a) Only significant (P < 0.05) results are given.
(b) A measure of relationship equivalent to the correlation coefficient which was used when the properties of the amino acids were in dichotomous form [4, 5]. This measure is normally distributed with zero mean and unit standard deviation [21].
(c) For exact property specification and references see text.
(*) Significant at P < 0.05. (**) Significant at P < 0.01.

TABLE V.
Regression Analysis. Biological Activity and Important Properties (a). Ocytocin (All Data).
Columns: sites on ocytocin (site number 2, Tyr; site number 3, Ile; site number 4, Gln; site number 8, Leu) and multiple correlation; for each, the correlation coefficient and F (b).
Properties (c):
Dissociation constants: pk'2; pI; pk' (COOH); pk' (NH3); pk' (side chain, other acids at 7.0); pk' (side chain, other acids at 11.0).
Abiogenic synthesis: amino acids present in the primitive environment (d).
Thermodynamic properties: heat capacity (C°p); absolute entropy (S°); entropy change for formation from elements (ΔSf°); heat of combustion (-ΔH°c); enthalpy of formation (-ΔHf°); free energy change for formation (-ΔGf°).
Other properties: molecular weight; solubility in water; maximum moment of inertia (α-β).
(a) Only non-dichotomous properties were included in this analysis.
(b) Degrees of freedom follow the F value and are in ( ).
(c) For exact property specification see text.
(d) A continuous variant of the property (explained in [4, 5]).
-- Insufficient observations for worthwhile calculation of correlation coefficient.
(*) Significant at P < 0.05. (**) Significant at P < 0.01.

a. The Dissociation Constants [11]. The enzymatic capabilities of the primitive cells through the urenzymes present were most probably deciding factors in the battle for survival [2]. Enzyme efficiency is very dependent on the dissociation constants [12] and these considerations would seem to confirm that pk'1, pk'2 and pk'3 could have been important in the evolution of the code. The analyses show that pk'2, which consists of a mixture of pk' (NH3) and pk' (side chain), is closely linked to the selective tendencies present in the code. A variable was formed which consisted of the pk' (side chain) for all amino acids which have reactive side chains and 11.0 for all the others. This new variable proved to be almost as closely related to the code as pk'2. It was more closely related to the code than when the acids with non reactive side chains were given a neutral pH of 7.0. This result might indicate that reactions within primitive organisms took place in a basic environment.
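The construction of the side-chain variable described above can be sketched as follows. The pk' values are approximate textbook figures inserted purely for illustration; they are not the values the authors took from [11], and the choice of which amino acids count as having reactive side chains is likewise an assumption.

```python
AMINO_ACIDS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]

# Approximate side-chain pk' values (illustrative assumption, not from [11]).
REACTIVE_SIDE_CHAIN_PK = {
    "Asp": 3.9, "Glu": 4.3, "His": 6.0, "Cys": 8.3,
    "Tyr": 10.1, "Lys": 10.5, "Arg": 12.5,
}


def side_chain_variable(default_pk):
    """Side-chain pk' where one exists, else the chosen default value."""
    return [REACTIVE_SIDE_CHAIN_PK.get(name, default_pk) for name in AMINO_ACIDS]


basic_variant = side_chain_variable(11.0)   # "other acids at 11.0"
neutral_variant = side_chain_variable(7.0)  # "other acids at 7.0"
```

The two variants differ only in the value given to the amino acids without a reactive side chain, so comparing their correlations with the passage times isolates the effect of that default.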
b. Amino Acids Present in the Primeval Environment [13, 14, 15, 16]. This property separates the easily synthesized and stable acids from those that cannot be synthesized in abiogenic conditions and are most probably also quite unstable [13, 14, 15, 16]. A code which favours mutations to and from amino acids common, or at least available, in the primeval environment has obvious survival advantages for primitive organisms probably incapable of synthesizing amino acids themselves.
c. Thermodynamic Properties of the Amino Acids [17]. Nearly all the principal thermodynamic properties of the amino acids were linked through one or more of the components to the selective tendencies inherent in the code. These properties were heat capacity (C°p), absolute entropy (S°), entropy change for formation of the acid from the elements (ΔSf°), heat of combustion (-ΔHc°), enthalpy of formation of the acid (-ΔHf°), and free energy change for formation of the acid (-ΔGf°). These properties could perhaps play a role in protein molecule conformation. The amount of energy which was required to synthesise an amino acid, either within the organism or through chemical evolution in the primitive environment, could also be related to these properties.
d. Molecular Weight [18]. This property is important in the structural conformation of short proteins, especially for amino acids in the interior of the molecule [19].
e. Solubility in Water [6]. Biochemical reactions usually take place in an aqueous environment and could be affected by changes in solubility.
f. Nature of the side chains, e.g. acidic, basic, non-polar. These properties are primarily concerned with describing the structure of the acid. Their importance in evolution is probably due to their influence on the chemical properties of the acid.
g. Moment of Inertia (α-β). This property is probably of less importance. It is mentioned in [6] and thought to be related to enzyme function (Sneath P. H. A., personal communication).

Multiple regression [20] was also used to examine the relationships between these properties of the amino acids and biological activity in Vasopressin and Ocytocin analogues [4, 5]. Four independent variables were once again considered. This time these were the differences in property values between the amino acid in the naturally occurring protein at each of the four sites and the amino acid found in the synthetic analogue. The ranked differences in biological activity between the naturally occurring protein and the synthetic analogue were taken as the dependent variable. The highly significant correlation coefficients which were found would seem to confirm that these properties were important as constraints in the evolution of the genetic code. Results for Ocytocin (all data) are shown in table V.
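The F values quoted alongside the correlation coefficients appear to follow the usual relation between a (multiple) correlation coefficient and its F statistic, F = (R² / v1) / ((1 - R²) / v2). The check below is a sketch under that assumption, using the rounded coefficients and degrees of freedom printed in table II; `f_statistic` is an illustrative name.

```python
def f_statistic(r, v1, v2):
    """F statistic for a correlation coefficient r with v1 regression
    and v2 residual degrees of freedom."""
    r2 = r * r
    return (r2 / v1) / ((1.0 - r2) / v2)


# Table II, mean first passage times: site 2 alone and all four sites combined.
f_site2 = f_statistic(0.57, 1, 35)      # table gives 16.88
f_combined = f_statistic(0.79, 4, 32)   # table gives 13.46
```

The small differences from the tabulated values are of the size expected from rounding the coefficients to two decimal places.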
DISCUSSION. The genetic code as found in present day organisms has probably survived against its competitors because it was the most efficient in ensuring the survival of the primitive organism. Survival of these organisms could very well have depended on the effective use of energy, availability of the amino acids in the primeval environment and the conformation of the resulting proteins. These survival characteristics could be linked with the dissociation constants and the thermodynamic properties of the amino acids. Furthermore, the results of analyses based on the model described in [1] also seem to indicate the importance of component number 3 of the mean first passage times, which is in turn related to the thermodynamic properties of the amino acids (table IV). All these results seem to show that there is a relationship between the evolution of living matter and the basic properties associated with the physical chemistry of the amino acids.
Acknowledgements. The Authors would like to thank Mrs Valerie Turner of the Programming Section of the Department of Applied Statistics of Reading University for her help with programming and computing, Mr C. Banfield of the Department of Statistics of Rothamsted Experimental Station for the computation of non metric multidimensional scaling, principal components and procrustes rotation, and Dr. H. Vogel (CNRS) for his many comments which helped to clarify this work.
RÉSUMÉ.
A model is described for the evolution of proteins in organisms developing in the primitive environment. The model is then used to show which chemical and biological properties could have played an important role in the evolution of the genetic code.
REFERENCES.
1. Jorré, R. P. & Curnow, R. N. (1975) Biochimie, 57, 1141-1146.
2. Woese, C. R. (1967) The Genetic Code, Harper & Row, New York.
3. Sonneborn, T. (1965) Degeneracy of the Genetic Code: Extent, Nature and Genetic Implications, Eds Bryson V. and Vogel H., Academic Press, New York.
4. Jorré, R. P. (1974) Ph.D. Thesis, University of Reading. Statistical analysis of the evolution of proteins and of the genetic code.
5. Jorré, R. P. & Curnow, R. N. (1975) Paper submitted for publication.
6. Sneath, P. H. A. (1966) J. Theoret. Biol., 12, 157-195.
7. Kruskal, J. B. (1964) Psychometrika, 29, 1-27.
8. Kruskal, J. B. (1964) Psychometrika, 29, 115-129.
9. Kruskal, J. B. (1971) Psychometrika, 36, 57-62.
10. Gower, J. C. (1971) Statistical Methods for Comparing Different Multivariate Analyses of the Same Data, in Mathematics in the Archaeological and Historical Sciences (Hodson, F. R., Kendall, D. G. and Tautu P., Eds) 138-149, Edinburgh University Press, Edinburgh.
11. Meister, A. (1965) Biochemistry of the Amino Acids, Academic Press, New York.
12. Karlson, P. (1969) Introduction to Modern Biochemistry, Academic Press, New York.
13. Abelson, P. H. (1957) Ann. N. Y. Acad. Sci., 2, 276-285.
14. Ponnamperuma, C. (1971) Quart. Rev. Biophys., 4, 2 + 3, 77-106.
15. Lemmon, R. M. (1970) Chem. Rev., 70, 95-109.
16. Abelson, P. H. (1959) Prog. Chem. Org. Nat. Prod., 17, 379-463.
17. Hutchens, J. O. (1968) in Handbook of Biochemistry, B5, Eds Sober H. A., Harte R. A., The Chemical Rubber Co., Cleveland, Ohio.
18. Dayhoff, M. O. (1972) Atlas of Protein Sequence and Structure, Vol. 5, National Biomedical Research Foundation, Georgetown University Medical Centre, Washington D.C., U.S.A.
19. Epstein, C. J. (1967) Nature, 215, 355-359.
20. Kendall, M. G. & Stuart, A. (1961) The Advanced Theory of Statistics, Vol. 2, C. Griffin, London.
21. Quenouille, M. H. (1959) Rapid Statistical Calculations (Method 14), C. Griffin, London.