ELSEVIER
Bioelectrochemistry and Bioenergetics 44 (1997) 23-29
A correlation between the genetic code and the structure and electrical properties of amino acids
V.Ph. Pastushenko *, A.V. Pastushenko
Johannes Kepler University of Linz, Altenbergerstr. 69, A-4040 Linz, Austria
Received 4 December 1996; received in revised form 21 March 1997
Abstract

Simple rules are formulated for defining the polarity and acid-base properties of amino acids (aa) on the basis of the oxygen and nitrogen content of their side groups. A correlation between the genetic code and the electrical properties of aa is investigated; the study involves a determination of the extent to which the codons in the same and in different groups of aa are related. Two codons are considered to be related if they differ in only one position. Two systems of symbols for the genetic code were used, a standard one and a 'reduced' one. The average relatedness within the groups of nonpolar and of acidic aa is greater than the cross relatedness of these groups with others. The inner relatedness of uncharged polar and of basic aa is greater than their cross relatedness with nonpolar aa, but less than their cross relatedness with each other and with negatively charged aa. The distinctness of the separation of aa into different groups is evaluated quantitatively. The greatest distinctness is attained with the reduced symbolics when the aa are divided into two groups (nonpolar and other aa), and with the standard symbolics when the aa are divided into three groups. The distinctness of aa separation into groups with different electrical properties is greater when the reduced symbolics is used. © 1997 Elsevier Science S.A.

Keywords: Genetic code; Amino acid structure; Correlation; Electrical properties
* Corresponding author.
To the memory of Ph.E. Pastushenko (V.P., A.P.)
"What counts is the way in which the facts are interpreted and organized into theories that advance our understanding." C. de Duve.
0302-4598/97/$17.00 © 1997 Elsevier Science S.A. All rights reserved. PII S0302-4598(97)00043-3

1. Introduction

The main aim of modern biology is the study of the molecular mechanisms of the processes of life. The maintenance and reproduction of all living systems currently known depend on cellular biosynthesis, in which the central role is played by the mutually catalytic interaction of two main classes of substances, proteins and nucleic acids (na). Very likely, the cooperation between these substances was already highly developed in precellular forms of life. Therefore, the study of these interactions may help us come closer to understanding the origin of life.

Proteins and na consist mainly of the same four elements from the first two periods of the periodic system of elements: H, C, O and N. In addition, two neighboring elements from the third period are also present: S in proteins and P in na. The main modifiers of hydrocarbon chains are O and N. These enable the molecular interactions to take place via hydrogen bonds, both within and between the two classes. In other words, proteins and na are two close chemical relatives, which 'speak the same language of hydrogen bonds'.

The early stages of chemical evolution led to the selection of several bases, which in combination with inorganic phosphate and ribose give rise to the four nucleotides U, C, A, G, the building blocks of RNA, and of several amino acids (aa; the same abbreviation is also used for 'aminoacyl'), the building blocks of proteins. It is very likely that the first polypeptides consisted of a different set of aa than the twenty aa which are characteristic of contemporary forms of life. At present, practically all catabolic and anabolic reactions in living organisms are catalyzed by proteins, the main 'chemists' of the cells. The evolution of the elaborate system of proteins became possible due to the 'education' of living systems, which have 'learned' in the course of evolution to write down the information concerning the technology of protein synthesis in the form of nucleotide sequences, and to transfer this knowledge directly to their progeny. This has eventually led to extremely intricate biochemical pathways and to the evolution of multitudinous forms of life. Therefore, in spite of the absence of a clear boundary between living and nonliving matter, it is reasonable to consider the self-reproducing systems that existed before the appearance of the genetic code as nonliving. Regarded in this manner, the problem of the origin of life becomes identical to that of the evolution of the genetic code. According to Monod [1], 'the riddle of the code origin' is one of the main biological challenges.

In handbooks of molecular biology the high degree of order of the genetic code is emphasized, although the code is frequently considered not to be directly connected with the structure of the aa [2-4]. At the same time, the similar chemical compositions of na and proteins, together with the important role of OH-N and NH-N hydrogen bonds in their interactions, indicate that there may be a correlation between the structure and the properties of aa, as well as a correlation between aa and the genetic code. In fact, even before the code had been completely elucidated, its high degree of order was discussed in connection with some properties of aa [5-8].

The considerable interest in the logic behind codon assignment led to the search for its biochemical basis. In particular, the frequent irrelevance of the third nucleotide (and especially of which Pu-nucleotide or which Py-nucleotide occupies that site) has been discussed. A correlation between codons for groups of aa related in terms of their chromatographic behavior has been demonstrated [5] (cf. Section 5). In connection with the possible mechanisms for the evolution of the genetic code, the reduction of translation errors was considered as a possible criterion for natural selection [6]. This has been supported by the following observations.
(1) For Py-rich codons, the degeneracy of the code is almost totally due to the III-nucleotide, which is the one most prone to error.
(2) Most II-Pu codons are assigned to 'functional' aa, whereas the more error-prone II-Py codons are mostly assigned to 'nonfunctional' aa.

Similarly, the importance of II-nucleotides in the determination of the polarity of aa was indicated in [7]. Somewhat later it was suggested that 'the structure of the codon system is primarily an imprint of the prebiotic pathways of aa formation' [8]. These points will be considered again in Section 5.

The aim of this paper is to discuss the correlation between the structure and the electrical properties of aa, and to characterize semi-quantitatively the correlation between the electrical properties of aa and the genetic code.
2. A connection between the structure of aa and their electrical properties

In order to emphasize the side chains as much as possible, the structural element common to all aa, NH3+-CH-COO- (at pH 7), will be represented only by its outline, as shown in Fig. 1. Heteroatoms of oxygen are designated by empty circles, and heteroatoms of nitrogen by filled circles. The following convention is employed in this and the subsequent figure: N with four bonds is positively charged, N with three bonds is neutral. Analogously, O with two bonds is neutral, and O with one bond is negatively charged. This enables us to simplify the formulas by omitting the corresponding + and - signs. The fact that the polypeptide chains emerge from ribosomal complexes with their N terminus in front is an additional mnemonic advantage of this notation; it makes the birth of a protein seem similar to that of an animal (head first). Using these conventions, the structures of the 20 aa commonly found in proteins are shown in Fig. 2, together with the one letter (not used here) and three letter notations for them.

Fig. 1. Amino acid group and its short notation as a boat-shaped outline.

Fig. 2. A 'zoo' of amino acids, shown for reference. Essential amino acids are shown in color. S stands for sulfur. Empty circle = oxygen, filled circle = nitrogen. Hydrogen atoms correspond to free ends of bonds, carbon atoms are at the points where four bonds cross.
Strictly speaking, the electrical properties of aa should reflect all of the reactions in which they, or proteins made up of them, participate. Here, by the electrical properties of aa we understand only the properties which they exhibit as monomers under typical conditions and which are reflected by their charges (charged or neutral) and the polarity of the corresponding aminoacyls, i.e., side chain groups. The aa are usually separated into four different groups according to these properties: nonpolar, uncharged polar, negatively charged (acidic) and positively charged (basic) ones. These groups are shown in Table 1. The aa are classified there according to the number of the following atoms in their side chains: (1) carbon, (2) oxygen and (3) nitrogen in oxygen-free side chains. With the minor exception of trp, three of the four aa groups appear in this table as one column each (acidic, nonpolar and uncharged polar). The basic aa form three adjacent columns. Such a close relation between the electrical properties of aa and their structures makes Table 1 a kind of 'periodic table of aa'.

Looking at Table 1, one would expect trp to be a basic aa, just like lys, his and arg. However, according to [2], trp is uncharged polar, whereas according to [3,4], trp is nonpolar. This uncertainty reflects the fact that different hydrophobicity scales are used for the classification of the electrical properties of aa [9].

Based on Table 1, one can formulate the following rules.
1. Aa with side chains not modified by N or O are nonpolar.
2. Aa with side chains modified by one atom of O are uncharged polar, regardless of whether there is any additional modification by N.
3. Aa with side chains modified by two atoms of O are negatively charged (acidic).
4. Aa with side chains modified only by N atoms are positively charged (basic), except for nonpolar trp.
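The four rules can be restated as a short lookup, given here only as an illustrative sketch. The oxygen and nitrogen counts per side chain are standard chemistry; the names `SIDE_CHAIN_O_N` and `classify` are hypothetical and do not come from the paper.

```python
# Number of (oxygen, nitrogen) atoms in the side chain of each amino acid.
# Sulfur (cys, met) is not counted, in line with rule 1.
SIDE_CHAIN_O_N = {
    "gly": (0, 0), "ala": (0, 0), "val": (0, 0), "leu": (0, 0),
    "ile": (0, 0), "pro": (0, 0), "phe": (0, 0), "met": (0, 0),
    "cys": (0, 0), "ser": (1, 0), "thr": (1, 0), "tyr": (1, 0),
    "asn": (1, 1), "gln": (1, 1), "asp": (2, 0), "glu": (2, 0),
    "lys": (0, 1), "his": (0, 2), "arg": (0, 3), "trp": (0, 1),
}


def classify(aa):
    """Return the electrical group of an amino acid according to rules 1-4."""
    n_oxygen, n_nitrogen = SIDE_CHAIN_O_N[aa]
    if aa == "trp":
        return "nonpolar"          # the stated exception to rule 4
    if n_oxygen == 2:
        return "acidic"            # rule 3
    if n_oxygen == 1:
        return "uncharged polar"   # rule 2, nitrogen is ignored
    if n_nitrogen > 0:
        return "basic"             # rule 4
    return "nonpolar"              # rule 1


groups = {}
for name in SIDE_CHAIN_O_N:
    groups.setdefault(classify(name), []).append(name)
```

Treating trp as an explicit special case mirrors the text: by its structure alone (one N, no O) it would fall into the basic group.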
Taking into account the fact that the electrical properties of aa considerably influence the properties of the proteins, one might expect the strong structure-properties correlation to be reflected in the genetic code. The basis for such an expectation is that the code must have originated as a result of mutually favorable interactions between oligonucleotides and oligopeptides. Of course, these interactions should have also included mononucleotides, aa and their combinations with mononucleotides, and some other small molecules.

Table 1
Correlation between structure and properties of amino acids

          Oxygen   Oxygen            Nitrogen (oxygen-free side chains)
Carbon    2        1                 0               1                2      3
9                                                    trp (nonpolar)
7                  tyr               phe
4                                    leu, ile        lys              his    arg
3         glu      gln               pro, val, met
2         asp      asn, thr
1                  ser               ala, cys
0                                    gly
          acidic   uncharged polar   nonpolar        basic (columns 1 to 3)
3. A correlation between the genetic code and the structure of aa
The genetic code is the universal code which is found in all cells, viruses and plant mitochondria and chloroplasts. Some deviations in the mitochondrial codes of mammals, Drosophila and yeasts will be discussed later in the context of exceptions which confirm the rule. The genetic code is usually said to be degenerate due to the fact that there are several different codons for most aa, and also for the 'empty aa' STOP. The term 'degenerate' reflects the informational aspects of the code; perhaps to a lesser degree it manifests the inner logic of the coding. In our opinion, this logic is more clearly seen by employing several additional symbols: * = (U|C|A|G) for any of the four nucleotides (the vertical bar means 'or'), □ = (A|G) for purines, i.e., either adenine or guanine, and ○ = (U|C) for pyrimidines, i.e., either uracil or cytosine. Underlining, rendered here by enclosing underscores, means 'anything but'. Thus, the codon AU_G_(ile) means any of the codons AUU, AUC or AUA. The codon U_GG_(STOP) means any of the three codons UAA, UAG and UGA, i.e., any possible combination of Pu at the second and third positions except for GG. With the use of these symbols, only three aa remain not 'uniquely' coded: leu, ser, arg. The genetic code in such a reduced form is shown in Table 2, in which the aa are divided into four groups according to the different modifications of their side chains by O or N (where O is present, the atoms of N are not taken into consideration). Except for the aa STOP(U_GG_) and trp(UGG), and ile(AU_G_) and met(AUG), whose codons will be considered as pairwise related and discussed later in connection with some of the mitochondrial codons, all aa are coded by an even number of codons. This enables us to use a compact form of notation for 17 aa, which involves the utilization of the symbols *, □ or ○ in the third position.
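As an illustration of how the reduced symbols map back onto standard codons, the following minimal sketch expands a reduced codon into its explicit set. It is not taken from the paper: the function name `expand` is hypothetical, and the IUPAC letters N (any nucleotide), R (purine) and Y (pyrimidine) are used as ASCII stand-ins for the three symbols. An underlined codon is written as a pattern plus an explicit exclusion.

```python
from itertools import product

# ASCII stand-ins for the reduced symbols (an assumption of this sketch):
# N = any nucleotide, R = purine (A or G), Y = pyrimidine (U or C).
SYMBOLS = {"N": "UCAG", "R": "AG", "Y": "UC"}


def expand(pattern, exclude=()):
    """Expand a reduced codon such as 'GGN' into a sorted list of standard codons."""
    choices = [SYMBOLS.get(ch, ch) for ch in pattern]
    codons = {"".join(combo) for combo in product(*choices)}
    return sorted(codons - set(exclude))


gly = expand("GGN")                     # GG followed by any nucleotide
ile = expand("AUN", exclude=["AUG"])    # AU followed by anything but G
stop = expand("URR", exclude=["UGG"])   # U + Pu + Pu, except for GG
```

Here `ile` evaluates to the three codons AUA, AUC and AUU, and `stop` to UAA, UAG and UGA, in agreement with the two examples given in the text.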
The compact notation shows that there is a certain chemical logic to the coding, expressed in the tendency of codons to form groups according to the nature of their III-nucleotides (i.e., whether they are Pu or Py). One may see from Table 2 that the codons for almost all aa not modified by O or N have Py in the second position, with the exception of gly and cys. On the other hand, the codons for all aa modified by O or N have Pu in their second positions, with the exception of thr and (partly) of ser. Thus, as a rule, II-Pu codons are characteristic of modified aa. This rule is strictly true for groups 3 and 4; small deviations are observed in groups 1 and 2. In view of the strong correlation between the structure and electrical properties of aa, this is an essential factor in favor of the point of view that the coding of electrically active aa is logically connected with the presence of Pu in the second position. The role of the II-nucleotide has been discussed previously in closely related contexts [5-7].

Table 2
Amino acids grouped according to the modifications of the side chains

1 nonmodified      2 one O modified    3 only N modified    4 two O modified
gly  GG*           ser  UC*, AG○       lys  AA□             asp  GA○
ala  GC*           asn  AA○            his  CA○             glu  GA□
cys  UG○           thr  AC*            arg  CG*, AG□
pro  CC*           gln  CA□            trp  UGG
val  GU*           tyr  UA○
met  AUG
leu  CU*, UU□
ile  AU_G_
phe  UU○

○ = Py; □ = Pu; * = Pu|Py. Underscores enclose an underlined ('anything but') nucleotide.

One may characterize the distribution of Pu and Py in each group of aa by corresponding probabilities, assuming certain rules for determining the statistical weights of the different codons. This could be done, for instance, by using information on how frequently different codons are expressed. This kind of information is not yet available, and it will surely vary according to the species being considered and many additional factors. For this reason, in the first approximation, we employ an 'ergodic' principle, i.e., we assume equal statistical weights for different aa and equal statistical weights for different codons corresponding to a given aa. This approach enables us to estimate the probabilities of Pu in different positions. For the first and the third positions, these probabilities are approximately 0.5, regardless of the type of symbolics. For the probability of II-Pu codons, employing the reduced symbolics, we obtain 2/9 for nonmodified aa and 19/22 for modified aa. Correspondingly, due to the complementarity of Pu and Py, expressed by the equation P(Pu) + P(Py) = 1, the probabilities of II-Py codons for nonmodified and modified aa are 7/9 and 3/22, respectively.

As mentioned in [7], in the interpretation of these regularities an important role may be played by the structure-function relationships of different enzymes which participate in all stages of the biosynthesis of proteins. Especially important are aa-tRNA synthetases, because loading tRNA with aa closes the logical loop of expressing the genetic information. Although it is not excluded that aa-tRNA synthetases appeared only at later stages of the evolution of the genetic code, they attract special attention [10-14].
Some essential differences between procaryotic and eucaryotic aa-tRNA synthetases have been found [10-13], and some first insights into their structure and possible functional aspects were presented [14]. An interpretation of the correlation between II-nucleotides and the properties of aa may become possible on the basis of a more detailed study of evolutionary aspects of interactions between aa-tRNA synthetases and tRNA.
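The II-Pu probabilities quoted in this section for the reduced symbolics (2/9 and 19/22) can be checked with a few lines of code. The sketch below applies the 'ergodic' weighting as described: every aa has equal weight, and every reduced codon of a given aa has equal weight. The dictionaries transcribe Table 2 using the IUPAC letters N, R, Y and H (H = anything but G) as ASCII stand-ins for the reduced symbols; the names are hypothetical.

```python
from fractions import Fraction

# Reduced codons of Table 2 (N = any, R = Pu, Y = Py, H = anything but G).
NONMODIFIED = {
    "gly": ["GGN"], "ala": ["GCN"], "cys": ["UGY"], "pro": ["CCN"],
    "val": ["GUN"], "met": ["AUG"], "leu": ["CUN", "UUR"],
    "ile": ["AUH"], "phe": ["UUY"],
}
MODIFIED = {
    "ser": ["UCN", "AGY"], "asn": ["AAY"], "thr": ["ACN"], "gln": ["CAR"],
    "tyr": ["UAY"], "lys": ["AAR"], "his": ["CAY"], "arg": ["CGN", "AGR"],
    "trp": ["UGG"], "asp": ["GAY"], "glu": ["GAR"],
}


def prob_second_purine(group):
    """Average, over the aa of a group, of the share of codons with A or G in position II."""
    per_aa = [
        Fraction(sum(codon[1] in "AG" for codon in codons), len(codons))
        for codons in group.values()
    ]
    return sum(per_aa, Fraction(0)) / len(per_aa)


p_nonmodified = prob_second_purine(NONMODIFIED)   # Fraction(2, 9)
p_modified = prob_second_purine(MODIFIED)         # Fraction(19, 22)
```

The half-integer contribution comes from ser, whose two reduced codons UC* and AG○ have Py and Pu, respectively, in the second position.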
4. Relatedness of codons for different electrical groups of aa

The similarity between codons within different groups of aa has frequently been mentioned. For instance, the codons which differ only in one position have been regarded as 'contiguous' [8]. We shall consider such codons to be 'related'. The correlation between the relatedness and the electrical properties of aa may be quantitatively characterized on the basis of the 'ergodic' assumption by calculating the average relatedness of codons belonging to the same group of aa (inner relatedness) or to different groups of aa (cross relatedness). Different groups of aa, selected according to their electrical properties, are shown in Table 3. This table differs from Table 2 only in respect to the position of trp.

As already mentioned, we shall consider two codons to be related if they differ in only one position. The value of relatedness is 1 for related and 0 for nonrelated codons. For example, the codons AU_G_(ile), i.e., AU followed by anything but G, and AUG(met), as well as UU□(leu) and UU○(phe), are related no matter which type of symbolics is used. When the reduced symbolics is used, the codons GG*(gly) and GC*(ala) are also related, because they differ only in the second position. However, within the framework of the standard symbolics these codons are only partially related. The codons CU* and UU○ are not related within the framework of the reduced symbolics, and partially related within that of the standard symbolics. The average values of relatedness per one comparison are shown in Table 4, in which the diagonal elements of the matrices correspond to inner relatedness (i.e., within the same group), and nondiagonal elements correspond to cross relatedness (i.e., between different groups). Comparing the left and right matrices in Table 4, one may see that the results obtained for the two types of symbolics differ considerably in quantitative terms. Each entry in these matrices represents the average relatedness per one comparison.
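The definition of relatedness lends itself to a short sketch. The code below is illustrative only: the function names are hypothetical, the IUPAC letters N and Y stand in for the reduced symbols 'any' and 'Py', and the averaging over all codon pairs of two aa is an assumption about how 'average relatedness per one comparison' is formed for a single pair of aa. It does not attempt to reproduce Table 4.

```python
from itertools import product


def related(codon_a, codon_b):
    """Return 1 if the two codon strings differ in exactly one position, else 0."""
    differences = sum(x != y for x, y in zip(codon_a, codon_b))
    return int(differences == 1)


def mean_relatedness(codons_a, codons_b):
    """Average relatedness over all pairwise comparisons of two codon lists."""
    pairs = list(product(codons_a, codons_b))
    return sum(related(a, b) for a, b in pairs) / len(pairs)


# gly vs ala: fully related in the reduced symbolics, partially in the standard one.
gly_ala_reduced = mean_relatedness(["GGN"], ["GCN"])
gly_ala_standard = mean_relatedness(
    ["GGU", "GGC", "GGA", "GGG"], ["GCU", "GCC", "GCA", "GCG"]
)

# CU* (leu) vs UU + Py (phe): unrelated in the reduced symbolics, partially otherwise.
leu_phe_reduced = mean_relatedness(["CUN"], ["UUY"])
leu_phe_standard = mean_relatedness(["CUU", "CUC", "CUA", "CUG"], ["UUU", "UUC"])
```

In the reduced symbolics a codon is compared as a three-symbol string, so the symbol for 'any nucleotide' only matches itself; this is what makes GG* and GC* fully related while their sixteen standard codon pairs are related in only four cases.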
Within a given group, only different aa were compared. For instance, for acidic aa four comparisons are possible within the framework of the standard symbolics, but only one comparison is possible within that of the reduced symbolics, and both versions yield the same result.

Table 3
Amino acids grouped according to their electrical properties

1 nonpolar         2 uncharged polar   3 positively charged   4 negatively charged
gly  GG*           ser  UC*, AG○       lys  AA□               asp  GA○
ala  GC*           asn  AA○            his  CA○               glu  GA□
cys  UG○           thr  AC*            arg  CG*, AG□
pro  CC*           gln  CA□
val  GU*           tyr  UA○
met  AUG
leu  CU*, UU□
ile  AU_G_
phe  UU○
trp  UGG

○ = Py; □ = Pu; * = Pu|Py. Underscores enclose an underlined ('anything but') nucleotide.

Consider first the results contained in the left matrices of Table 4, which correspond to the reduced symbolics. From the values in Table 4A, which corresponds to the four groups in Table 3, one can see that the most distinct groups are the first and the last ones, i.e., nonpolar aa and acidic aa, because the inner relatedness for these groups is greater than their cross relatedness with other groups. Groups 2 and 3 are relatively well separated from group 1, but less well separated from group 4 and from each other. Group 4 is relatively well separated from groups 2 and 3. It is therefore interesting to consider the relatedness between different groups when groups 2 and 3 are merged together. Such a merging leads to three different groups (Table 4B). There, the inner relatedness of group 2 is still less than its relatedness with acidic aa, but the difference is not so great. Moreover, the separation between groups 1 and 2 is now greater, whereas the separation of group 2 from acidic aa remains almost the same. At the same time the inner relatedness of group 2 is now remarkably greater than the values for groups 2 or 3 in Table 4A.

Due to the high cross relatedness of groups 2 and 3 in comparison with the inner relatedness of group 2 in Table 4B, we may further merge together uncharged polar and charged aa into a single group, leaving the group of nonpolar aa unchanged. The corresponding results are shown in Table 4C. Such a division of aa shows a distinct grouping of the codons for electrically 'active' and electrically 'passive' (nonpolar) aa, which is formally similar to the division of aa into 'functional' and 'nonfunctional' groups in [6].

Table 4
Relatedness within (diagonal elements) and between (nondiagonal elements) different groups of aa

A. Four groups shown in Table 3
0.19  0.11  0.04  0.00        0.15  0.08  0.08  0.08
0.11  0.20  0.35  0.30        0.08  0.13  0.28  0.15
0.04  0.35  0.17  0.33        0.08  0.28  0.11  0.17
0.00  0.30  0.33  1.00        0.08  0.15  0.17  1

B. Groups 2 and 3 from Table A are merged together
0.19  0.08  0                 0.15  0.08  0.08
0.08  0.28  0.31              0.08  0.21  0.16
0     0.31  1                 0.08  0.16  1

C. Groups 2 and 3 from Table B are merged together
0.19  0.07                    0.15  0.08
0.07  0.31                    0.08  0.21

Numbers of rows and columns correspond to group numbers. Left: reduced symbolics. Right: standard symbolics.

Let us now compare these results with those obtained using the standard notation for the codons (see Table 4, right side). The same basic tendencies can be observed, but there are some indications that the reduced symbolics may more closely reflect the inner logic of biocoding. (i) Within the framework of the reduced symbolics the acidic and the nonpolar aa are completely separated, i.e., they have zero cross relatedness. At the same time, their cross relatedness within the framework of the standard symbolics is nonzero. (ii) One may define an overall distinctness of a given group as the ratio of its inner relatedness to its average cross relatedness with other groups. Such an overall distinctness may be considered as being analogous to the signal/noise ratio. It is then possible to calculate the average values of the overall distinctness for the cases A, B and C from Table 4, using the numbers of aa in each group as statistical weights for the averaging. The results are shown in Table 5; they strongly support the point of view that the reduced symbolics more closely corresponds to the inner logic of biocoding. In all cases the average overall distinctness is higher when the reduced symbolics is used. Moreover, the greatest average overall distinctness obtained with the standard symbolics is lower than the lowest one obtained with the reduced symbolics. The greatest distinctness involves dividing aa into three groups with the standard symbolics and into two groups with the reduced symbolics. In view of Tables 4 and 5, this can hardly be interpreted as an indication of a higher resolving power of the standard symbolics.

Table 5
Average overall distinctness of aa groups

      Reduced   Standard
A     2.64      2.02
B     3.44      2.54
C     3.66      2.28

A, B, C: same as in Table 4. Second and third columns: reduced and standard symbolics, respectively.
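The definition of the average overall distinctness can be written out as a short sketch. The function name is hypothetical, and the group sizes passed as weights are an assumption of this example, since the text does not list them explicitly. Because the entries of Table 4 are rounded to two decimals, values computed this way should be expected to agree with Table 5 only approximately.

```python
def overall_distinctness(matrix, sizes):
    """Size-weighted mean of inner relatedness / average cross relatedness.

    matrix: square relatedness matrix (diagonal = inner, off-diagonal = cross).
    sizes:  number of aa in each group, used as statistical weights.
    A group whose cross relatedness is zero everywhere would divide by zero.
    """
    n = len(matrix)
    weighted_sum = 0.0
    for i in range(n):
        cross = [matrix[i][j] for j in range(n) if j != i]
        average_cross = sum(cross) / len(cross)
        weighted_sum += sizes[i] * matrix[i][i] / average_cross
    return weighted_sum / sum(sizes)


# Table 4C, reduced symbolics. The sizes (10, 10) assume that trp is counted
# with the nonpolar aa; this choice is not stated in the text.
table_4c_reduced = [[0.19, 0.07], [0.07, 0.31]]
value_c_reduced = overall_distinctness(table_4c_reduced, [10, 10])
```

Keeping the weights as an explicit argument makes it easy to see how sensitive the result is to the assumed group sizes and to the rounding of the small cross relatedness values.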
5. Discussion

We have formulated simple rules for correlating the electrical properties of aa with the chemical structures of their side groups (i.e., modifications of hydrocarbons by heteroatoms of O or N). All the aa follow these rules with the exception of trp, which is more frequently considered to be a nonpolar aa. What could be the explanation for the trp anomaly? The hydrophobicity scales used for the classification of aa are so diverse [9] that it is not clear whether this anomaly needs any additional explanation. On the other hand, trp may really be a 'strange' aa. Universally, the codon UGA means STOP, but in the mitochondria of mammals, Drosophila and yeasts it codes for trp (Table 6). The strangeness of trp is reflected within the framework of the reduced symbolics by the fact that it is the only aa related with the 'empty' aa STOP. Within the framework of the standard symbolics, the value of the relatedness of universal codons for STOP and trp is 1/3, and within the framework of the reduced symbolics it is 1.

Table 6
Meanings of some codons in universal code and in some mitochondrial codes

Codon   Universal          Mammals        Drosophila        Yeasts
UGA     STOP (U_GG_)       trp (UGG)      trp (UGG)         trp (UGG)
AUA     ile (AU_G_)        met (AUG)      met (AUG)         met (AUG)
CUA     leu (CU*, UU□)     leu            leu               thr (AC*)
AG□     arg (CG*, AG□)     STOP (U_GG_)   ser (UC*, AG○)    arg

Relevant complete universal codons are shown in parentheses. ○ = Py; □ = Pu; * = Pu|Py. Underscores enclose an underlined ('anything but') nucleotide.

As the chemical structure of trp poses less uncertainty than its electrical properties, we have first considered the relatedness of codons for different groups of aa defined on the basis of their structures. This approach shows that the II-nucleotides are predominantly Py for nonmodified aa and Pu for modified aa (see also Refs. [6,7]). It is interesting that both in the universal and in the mitochondrial codes trp has II-Pu codons, like most other modified aa. The question is to what extent the II-Pu codons for modified aa might be explained biochemically. Although a really convincing explanation is not yet possible, some preliminary arguments can be put forward.

The exceptions to this correlation are gly and cys among the nonmodified aa, and thr and to some extent ser among the modified aa. The singular behavior of gly might be explained by its practically empty side chain. It has some similarity with the 'strange' aa trp, because (within the framework of the reduced symbolics) its codon is the only other one which has the combination GG (within the framework of the standard symbolics, GG is also present in two of the six codons for arg, which, like trp, is N-modified). Cys is not modified by either N or O, but it contains S, an element from the same group of the periodic system as O. The somewhat smaller electronegativity of S [3] explains to some degree the fact that the presence of S has not affected the polarity of cys. Nevertheless, the S-heteroatom is apparently important in the interaction of cys with tRNAcys or with cys-tRNAcys synthetase. The deviation of ser and thr from the regularity makes these aa in a certain sense 'hot'. This is confirmed by Table 6, where both aa are present.
Comparing the universal and mitochondrial codons for anomalous aa in Table 6, one may see that Pu or Py in the second position is always conserved, although changes do take place in all three positions. Thus, Table 6 represents the exceptions which confirm the rule. One more regularity may be seen from Table 6. With the exception of thr in yeast mitochondria, anomalous mitochondrial codons are related with the universal codons of the same aa. The most likely explanation for this is that the anomaly was introduced in the same manner as a point mutation. The deviation of thr in yeast mitochondria from this tendency could perhaps be explained by the high mutagenic activity of alcohol, a common product of yeasts; they are the only species in which this deviation is observed. It also cannot be excluded that the presence of six different codons for leu could play a role. The probability that one of them might deviate from its standard meaning (in the given case this occurred with CUA) is thereby enhanced. The same argument could then explain the alteration of the meaning of the codon AG□ in the mitochondria of mammals and Drosophila, because this codon represents two of the six codons for arg.

The relations between inner and cross relatedness for different groups of aa, selected according to their electrical properties, show a strong correlation of these properties with the genetic code (Tables 4 and 5). As mentioned above, the electrical groups differ from the structural groups only in respect to the position of trp. A correlation between the genetic code and the functional properties of aa was also discussed in Ref. [5]. We should like to compare the grouping of aa in Ref. [5] and in this paper. The following groups of aa were distinguished in Ref. [5] according to their chromatographic behavior:
1. Phe, leu, ile, val, met.
2. Ser, pro, thr, ala.
3. Cys, arg, ser, gly.
4. Tyr, his, asn, asp, gln, lys, glu.
Two minor remarks are possible: (i) ser is included in two different groups and (ii) trp is missing (it is really a strange aa!). More important is that the comparison of these groups with those shown in Table 2 or Table 3 shows that there is not much similarity between these two classifications of aa. For instance, nonmodified aa are here included in three different groups, (1), (2), (3), whereas O-modified aa are included in three other groups, (2), (3), (4).

The results obtained here for dividing aa into two different groups (nonpolar and others) may also be compared with the previous conclusion about the importance of the II-nucleotide in the determination of aa polarity [7]. It was mentioned in [7] that changing I- and III-nucleotides usually does not alter the polarity, whereas changing the II-nucleotide alters the polarity in 120/192 = 62.5% of all cases. Furthermore, if we replace any nucleotide in a given codon by any other arbitrarily selected nucleotide, the resulting codon indicates an aa with the same polarity in 60% of the cases. This may be interpreted as a 'signal/noise' ratio of the genetic code equal to 60/40 = 1.5. Estimated in the same way, the signal/noise ratio for the II-nucleotide is 1.7. This number is not very different from the value of distinctness, 2.3, obtained above within the framework of the standard symbolics (Table 5C, right). As we have seen, the reduced symbolics reflects the correlation between the code and the aa much better than the standard one (compare with the value of 3.7, Table 5C left; see also the left and right parts of Tables 4 and 5). Based on these considerations, it seems that the reduced symbolics better reflects the inner logic of biocoding.
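The two signal/noise figures quoted above can be reproduced by simple arithmetic. The sketch below assumes that a ratio is formed as p/(1 - p) from a fraction p of cases, as in 60/40 = 1.5, and that the value of 1.7 for the II-nucleotide follows in the same way from the 62.5% figure; the latter is an interpretation rather than something the text states explicitly. The function name is hypothetical.

```python
from fractions import Fraction


def signal_to_noise(p):
    """Return p / (1 - p) for a fraction p of cases counted as 'signal'."""
    p = Fraction(p)
    return p / (1 - p)


overall = signal_to_noise(Fraction(60, 100))        # 60/40
second_position = signal_to_noise(Fraction(120, 192))  # 62.5/37.5
```

Under these assumptions `overall` is exactly 3/2 and `second_position` is exactly 5/3, which rounds to the 1.7 given in the text.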
Whether this is true or not, the regularities discussed so far may help us to better understand the origin and the development of the genetic code. They clearly indicate that 'bioelectrochemistry was at the cradle of life'; therefore, their elucidation should be considered as one of the most important biological problems.

Different speculations may be put forward about the chemical processes which may have led to the origin of the genetic code. We believe that an effort to substantiate such speculations should involve a better study of the interactions between oligopeptides and oligonucleotides, with the participation of some ions and small molecules. For instance, the evolution of the genetic code was considered from the point of view of a tendency to minimize translation errors [6]. This approach was based on the concept of 'statistical' proteins inherent to primitive cells. By definition, a statistical protein is reproduced with an extremely high rate of error, so that its correct reproduction occurs only with a negligible probability. The weak point here is that a cell, however primitive it might be, must have possessed a membrane. Therefore, the existence of the cell assumes a well developed system of regulatory membrane proteins, which is incompatible with the assumption of 'statistical' proteins. Thus, it is likely that the genetic code first originated in a primitive form, before the cell came into being. Of course, in the later stages of the code's evolution further perfection of the translational process did take place.

Another example may be taken from [8], where it was postulated that the genetic code evolved from prebiotic pathways of aa synthesis. This postulate will necessarily remain a hypothesis as long as such pathways, which do not involve the participation of complicated enzymes, have not been discovered experimentally.
Again, the possibility cannot be excluded that the synthesis of aa has not left its imprint on the genetic code in the later stages of evolution.

Nevertheless, it is possible to make one general remark. Primitive self-reproducing systems could not, by definition, have had the highly developed enzymes which are necessary for rapid reproduction. For this reason, the rate of self-reproduction of primordial oligopeptides and oligonucleotides (whether on Earth or somewhere else) must have been relatively slow. However, it had to be high enough to compensate for the destructive effect of thermal molecular motion (and other external effects). The lifetime of different molecules is mainly defined by energetic barriers created by interatomic attraction. For this reason, the bioenergetic methods of biochemistry may provide a basis for an estimation of the lifetime of primitive molecular systems. Such estimates may serve as a useful constraint in an experimental search for minimal self-reproducing molecular systems, which could be considered as possible starting points for life.
Acknowledgements

We thank Dr. S. Sokoloff and Dr. H.J. Gruber (Inst. for Biophysics, J. Kepler University of Linz), Dr. V.I. Ivanov (V.A. Engelhardt Institute for Molecular Biology, Moscow), Dr. A.V. Efimov (Institute for Protein Research, Pushchino-na-Oke, Russia) and E.Ph. Balobanova (Pushchino-na-Oke) for several useful remarks.
References

[1] J. Monod, in: Chance and Necessity, A.A. Knopf, New York, 1971, p. 143.
[2] C. de Duve, A Guided Tour of the Living Cell, Scientific American, New York, 1984.
[3] J. Darnell, H. Lodish, D. Baltimore, Molecular Cell Biology, Sci. Am., New York, 1986.
[4] B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, J.D. Watson, Molecular Biology of the Cell, 2nd edn., Garland, New York, London, 1989.
[5] C.R. Woese, Proc. Natl. Acad. Sci. 54 (1965) 71-75.
[6] C.R. Woese, Proc. Natl. Acad. Sci. 54 (1965) 1546-1552.
[7] M.V. Volkenstein, Biochim. Biophys. Acta 119 (1966) 421-424.
[8] J. Tze-Fei Wong, Proc. Natl. Acad. Sci. 72 (1975) 1909-1912.
[9] J.L. Cornette, K.B. Cease, H. Margalit, J.L. Spouge, J.A. Berzofsky, C. DeLisi, J. Mol. Biol. 195 (1987) 659-685.
[10] F.C. Neidhardt, J. Parker, G.W. McKeever, Annu. Rev. Microbiol. 29 (1975) 215-260.
[11] P. Schimmel, Annu. Rev. Biochem. 56 (1987) 125-158.
[12] G. Eriani, M. Delarue, O. Poch, J. Gangloff, Nature 347 (1990) 203-206.
[13] M. Mirande, Prog. Nucleic Acids Res. Mol. Biol. 40 (1991) 95-142.
[14] G. Cerini, M. Semeriva, D. Gratecos, Eur. J. Biochem. 244 (1997) 176-185.