Bulletin of Mathematical Biology (2004) 66, 1241–1257 doi:10.1016/j.bulm.2003.12.006
On the 28-Gon Symmetry Inherent in the Genetic Code Intertwined with Aminoacyl-tRNA Synthetases—The Lucas Series CHI MING YANG∗ Physical Organic Chemistry and Chemical Biology, Neurochemistry and System Chemical Biology Program, Nankai University, Tian Jin 300071, China E-mail:
[email protected]
Despite considerable efforts it has remained unclear what principle governs the selection of the 20 canonical amino acids in the genetic code. Based on a previous study of the 28-gonal and rotational symmetric arrangement of the 20 amino acids in the genetic code, new analyses of the organization of the genetic code system together with their intrinsic relation to the two classes of aminoacyl-tRNA synthetases are reported in this work. A close inspection revealed how the enzymes and the 20 gene-encoded amino acids are intertwined on the polyhedron model. Complementary and cooperative symmetries between class I and class II aminoacyltRNA synthetases displayed by a 28-gon organization are discussed, and we found that the two previously suggested evolutionary axes within the genetic code overlap the symmetry axes within the two classes of aminoacyl-tRNA synthetases. Moreover, it has been shown that the side-chain carbon-atom numbers (2, 1, 3, 4 and 7) in the overwhelming majority of the amino acids recognized by each of the two classes of aminoacyl-tRNA synthetases are determined by a mathematical relationship, the Lucas series. A stepwise co-evolutionary selection logic of the amino acids is manifested by the amino acid side-chain carbon-atom number balance at ‘17’, when grouping the genetic code doublets in the 28-gon organization. The number ‘17’ equals the sum of the initial five numbers in the Lucas series, which are 2, 1, 3, 4 and 7. c 2004 Society for Mathematical Biology. Published by Elsevier Ltd. All rights reserved.
1.
I NTRODUCTION
Since Weber and Miller (1981) described reasons for the occurrence of the 20 coded protein amino acids, the fundamental properties of the 20 canonical amino acids have inspired a tremendous amount of excellent research and speculation ∗ Corresponding address: Neurochemistry and System Chemical Biology Group, Nankai University,
Tian Jin 300071, China. 0092-8240/04/051241 + 17 $30.00/0 Elsevier Ltd. All rights reserved.
c 2004 Society for Mathematical Biology. Published by
1242
C. M. Yang
(Swanson, 1984; Shcherbak, 1993; Rako˘cevi´c and Jokic, 1996; Dufton, 1997; Davydov, 1998; Rako˘cevi´c, 1998). Swanson has shown that ‘the genetic code is almost an example of a Gray code’ and ‘an example of minimum change binary code’ (1984). Shcherbak (1993) brilliantly found some unique arithmetical regularity within the 20 amino acids of the genetic code, and made a hypothesis that certain formalization might be available within these amino acids. The genetic code was more recently shown to be determined by the golden mean through the unity of the binary-code tree and the Farey tree in Rako˘cevi´c’s work (1998), and it was identified that atom number balances in amino acids are directed by golden mean route, also directed by the double–triple system of amino acids, as well as by two classes of aminoacyl-tRNA synthetases (AARS’s). As is well known, the first two bases in the tri-nucleotide codons are recognized to be the most specific fragments of the codons. Therefore, they are instrumental in analysis aimed at the elucidation of the coding principle underlying the genetic code system. The 64 codons can be divided into 16 groups of genetic code doublets, which are UUN, UCN, UGN, UAN . . . , (N = U, C, G and A). We recently explored a structural regularity within the four RNA nucleobases U, C, G and A, by invoking the nature of covalent bonding, the sp2 -hybrid nitrogenatom numbers in the nucleobase molecules, to investigate the symmetry inherent in the genetic code (Yang, 2003a). By advancing and modifying some early graphic and geometric approaches to understanding the structural features of the genetic code (Bertman and Jungck, 1979; Jim´enez-Monta˜no et al., 1996; Zhang, 1997; Karasev and Stefanov, 2001; Yang, 2003b), solid geometric analysis of the 16 genetic code doublets revealed that the rotational symmetries inherent in the distribution of both the number of the amino acids and their side-chain carbon-atom contents in the genetic code follow a quasi-28-gon (icosikaioctago) model with two hypothetical evolutionary axes (Yang, 2003b,c). The newly identified polyhedral symmetric nature as a 28-gonal pyramid, or a double-pyramidal nature (Yang, 2003b), of the genetic code, echoes the similar, but different structure of the simplest viruses, i.e., spherical viruses, which are of icosahedral symmetry (Casper and Klug, 1962; Racaniello, 1996). The insightful hypercube structural model (Jim´enez-Monta˜no et al., 1996) for the 16 nucleobase doublets enhanced our understanding of the genetic code. In comparison, our 28-gon model, in which the 16 base doublets are arranged in a polyhedral and spherical organization, instead of a block structure, took a totally different approach to understanding the multifaceted organizational properties of the coding system. The major advantage afforded by the 28-gon organization includes its simplicity and instrumental capacity in revealing the rotational symmetry and spherical features of the genetic code system, and demonstrating how the genetic code system has the capability of maintaining its own integrity. Subsequently, we found that the overall symmetric features of the code described by our spherical and polyhedral model together with two newly identified evolutionary axes are in excellent agreement with the recent Trifonov proposal
28-Gon, Genetic Code and the Lucas Series
1243
that Ala could be the first codonic amino acid (Trifonov and Bettecken, 1997; Trifonov, 2000). More interestingly, from the polyhedron symmetric model, we can further identify that the symmetric distribution of the 20 amino acids around the 28-gon excellently fits Rumer’s regularity (Shcherbak, 1989) and explain some of the unique arithmetical regularity discovered by Shcherbak (1993, 2003) and Verkhovod (1994). The amino acids in group IV (degeneracy 4) and the rest of the amino acids in the quasi-group III–II–I (Shcherbak, 1993) each occupy half of the surface on the 28-gon model (Yang, 2003b); see Fig. 1. Moreover, as is illustrated in Fig. 1(b), the quasi-28-gon organization helps explain ‘why 20’: the distribution of the 20 amino acids is symmetrical along a possible evolutionary axis from Ala codons to Tyr codons: the numbers of genetic code doublets (together with the numbers of amino acids) at every stage starting from Ala codons to Tyr codons are 1(1); 4(1, 1, 1, 1); 6(1, 1, 1, 2, 2, 2); 4(2, 2, 2, 2); and 1(1, 0), respectively. Therefore, the 16-vertex quasi-polyhedron has a pattern of 1 : 4 : 6 : 4 : 1 or αβ4 γ6 β4 α , and the point group of this naturally designed symmetry in response to the polyhedron is D2h . The genetic code can be transparently described by disassembling the 28-gon into two major blocks (Fig. 2). As the C6 hexagonal plane bisects the polyhedron, here, we present the two halves of the 28-gon in Fig. 2. The basis units in the three-dimensional arrangements of these genetic code doublets are all symmetry related. On the basis of symmetry point groups, their structural arrangements can be described as Cs symmetry for both a and b, respectively. The establishment of the genetic code essentially requires 20 enzymes called aminoacyl-tRNA synthetases (AARS’s), which catalyze aminoacylation of tRNAs by joining an amino acid to its cognate tRNA to establish the specificity of protein synthesis (Di Giulio, 1992; Hartman, 1995; Hartlein and Cusack, 1995; Cavalcanti et al., 2000; Woese et al., 2000; de Pouplana and Schimmel, 2001). In the present work we extend the study to the cooperative symmetry within the AARS’s in response to the polyhedral and rotational symmetry of the genetic code system, and examine a possibly intertwined mode—between the symmetry of the genetic code and that of the AARS’s. Finally, we reach a conclusion by identifying the Lucas series, from the carbon-atom numbers in amino acid side chains, and propose that a mathematical principle the Lucas series may have been underlying the natural and cooperative selection of the amino acid carbon contents in the genetic code.
2.
A P OLYHEDRAL S YMMETRIC M ODEL D ISPLAYING H OW THE G ENETIC C ODE MAY BE I NTERTWINED WITH AARS’ S
AARS’s possess very high specificity in selecting their cognate amino acids and tRNA substrates. At present, AARS’s are classified into two distinct and structurally unrelated classes (I and II), each class encompassing ten amino acids
1244
C. M. Yang
Figure 1. The icosikaioctagonal or 28-gonal relation within the 20 amino acids in the genetic code. (a) An icosikaioctagon or a 28-gon. The numbers at vertices indicate the edges at the vertices. (b) The distribution of the 20 amino acids on the 28-gon model. (c) The side-chain carbon-atom numbers (SCCAN) within the amino acid molecules at each of the 16 genetic code doublets. Two hypothetical evolutionary axes are indicated by arrows: ‘⇑’ as the primary (vertical) evolutionary axis, from Ala codons to Tyr codons; and ‘⇒’ as the secondary (horizontal) evolutionary axis, from Ser codons (order 4) to Met/Ile codons. Block lines (both red and blue) are edges of the polyhedron; red lines (both block and dotted) denote neighboring code doublet connection. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Figure 2. Distribution of the 20 amino acids (aa’s) in the polygon organization with a pattern of αβ4 γ6 β4 α : (a) an overview from the bottom of the 28-gon organization of the genetic code: αβ4 γ6 ; (b) an overview from the top: γ6 β4 α .
28-Gon, Genetic Code and the Lucas Series
1245
Table 1. Two classes of aminoacyl-tRNA synthetases (AARS’s) corresponding to the two classes of amino acids.a
a Here, ArgRS represents arginyl-tRNA synthetase, and so forth. Two types of lysyl-tRNA synthetase (class I and class II) are labeled accordingly.
(Woese et al., 2000); see Table 1. A well-known assumption is that the tRNAcharging function of the AARS’s evolved at least twice. Perhaps the two classes reflect a dichotomous origin of protein translation processes via two different primitive processes (Nagel and Doolittle, 1991; Woese et al., 2000). Therefore, 20 AARS’s, which are responsible for establishing the genetic code, are the potential markers of genetic code development (de Pouplana and Schimmel, 2001). Consequently, it has been suggested that the evolution of the genetic code and evolution of the AARS enzymes are intertwined (Woese et al., 2000) and, accordingly, the evolution of the AARS’s is instrumental in understanding the selective pressures maintaining the genetic code. The coexistence of the two distinct classes of AARS’s is one of the most striking features of the AARS’s (Eriani et al., 1990; Burbaum and Schimmel, 1991; Arnez and Moras, 1997; Cusack, 1997). However, despite a clear existence of correlation between the genetic code and the evolutionary patterns of the AARS’s, the biological importance of this fact is not known. According to this reasoning, an investigation into the internal relation of AARS’s with respect to the symmetric relation within the amino acids is likely to provide in-depth insight into the uniqueness, conservation and evolution of the genetic code. Given that the distribution of the 20 amino acids in the genetic code is symmetric in accordance with a 28-gon organization (Yang, 2003b), we now start from
1246
C. M. Yang
Figure 3. Amino acids, recognized by two classes of AARS’s, corresponding to the universal genetic code and the mitochondria genetic code on an icosikaioctagon or a 28-gon model. Shown by the side of the amino acid letters are the side-chain carbon-atom numbers (SCCAN) within the amino acid molecules.
examining how the polyhedral symmetric relation of the 20 amino acids within the genetic code and the AARS’s are intertwined on the polyhedral symmetric model (Fig. 3). In Fig. 3, the intertwining of the genetic code with AARS’s can be clearly displayed on the 28-gon. In an effort to explore the internal symmetric relation of the two classes of AARS’s with the 20 amino acids, we divide the whole system into three presumed stages for the genetic code and the AARS’s on the polyhedral symmetric mode (Fig. 4). The display in Fig. 4 helps to understand that a close evolutionary relation between the (class I, II) AARS’s is mirrored in the obvious coding relationship between the corresponding amino acids and their codons. Importantly, although the degeneracies of the 20 amino acids vary, exactly half of the 16 genetic code doublet positions are recognized by one of the two classes of AARS’s. It has been pointed out that the central role played by the AARS’s in translation may suggest that their evolution and that of the genetic code are somehow intertwined. Hence, new fundamental features may provide a clue to answering the question of whether the AARS’s in their evolution have contributed to the present structure of the code (Woese et al., 2000).
28-Gon, Genetic Code and the Lucas Series
1247
Figure 4. Distribution of the two classes of amino acids at the three hypothetical stages for the genetic code, which is intertwined with two classes of AARS’s on the polyhedral symmetric model.
Figure 5. Description of the symmetries and intrinsic relation of the two classes of AARS’s to the two classes of amino acids within the three presumed evolutionary blocks of the 28gon organization.
The overall symmetric relation of the genetic code, when considered with the two classes of AARS’s on the polyhedral symmetric model, can be transparently described by similarly disassembling the polyhedron into three blocks; see Fig. 5. The symmetry point groups at each of the three stages are Cs (stage 1), C2v (stage 2) and C1 (stage 3), respectively. We now analyze the amino acid side-chain carbon-atom numbers based on the polyhedral symmetry of the genetic code with respect to the two classes of AARS’s. The results are shown in Fig. 6.
1248
C. M. Yang
Figure 6. Amino acid side-chain carbon-atom number balances on the 28-gon model, their relation to the two classes of AARS’s and the unique number ‘17’ from several groups of amino acids (at their genetic code doublets). (o: amino acids recognized by class I AARS’s; •: amino acids recognized by class II AARS’s. The special case is lysine, which in some organisms is charged by a class I enzyme.) (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
As depicted in Figs. 4 and 6, after we deploy the 20 amino acids onto the polyhedral model in response to class I and II AARS’s, the two presumed evolutionary axes within the genetic code are more evident. First, the 20 amino acids can be grouped into three possible evolutionary stages along the primary hypothetical (vertical) evolutionary axis, i.e., (1) A, P, G, V and T; (2) S, L, R, E, D, M and I; (3) F, L, H, Q, N, K, C, W and Y. Second, at both stage 1 and stage 2 on the polyhedral model, there is one symmetric plane with respect to the two classes of AARS’s. Also of note is the fact that the total numbers of amino acid side-chain carbon atoms within each stage are 9, 9 + 17, 17 + 17 + 7, respectively, Fig. 6. From this step-by-step graphic examination of the AARS’s and evaluation of the corresponding amino acid side-chain carbon contents within the genetic code, it is clear that the symmetric and possible evolutionary relations between the genetic code and AARS’s are closely coiled with each other. In addition to the amino acid side-chain carbon-atom number balance at ‘17’ from several neighboring vertex (base doublet) groups at both stage 2 and stage 3 (Fig. 6), amino acid side-chain carbon-atom number balance at ‘17’ also occurs from a non-neighboring vertex group in stage 3: F(7); N(2)/K(4); H(4) ≡ Q(3); C(1)/W(9); L(4) ≡ 17 carbon atoms. On the left of the ‘equation’ are the amino acids recognized by class II AARS’s, on the right are the amino acids recognized by class I. We will discuss this unique but interesting number ‘17’ in Section 3. Provided that the 20 canonical amino acids in the genetic code can be deployed on the surface of a 28-gon model, the above atomic content analysis reveals a
28-Gon, Genetic Code and the Lucas Series
1249
Figure 7. (a) The bipyramidal nature of the genetic coding contents, displayed by (b) the amino acid side-chain carbon-atom number distribution and balance on the 28-gon model, when the 20 amino acids in the genetic code are viewed as a whole. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
bipyramidal nature of the genetic coding contents; see Fig. 7(a). On the basis of the relative carbon-atomic compositions in amino acid side chains, see Fig. 7(b), vertically, from A ⇒ P ⇒ S ⇒ F/L ⇒ Y, the total number of amino acid sidechain carbon atoms is 23; from A ⇒ T ⇒ M/I ⇒ N/K ⇒ Y, the total number of amino acid side-chain carbon atoms is also 23. By comparison, horizontally, from S ⇒ R ⇒ E/D ⇒ M/I, the total number of amino acid side-chain carbon atoms equals 17; from S ⇒ L ⇒ S/R ⇒ M/I, the total number of amino acid side-chain carbon atoms also equals 17. A ratio of 15 [(P + S + F/L) or (T + M/I + N/K)] to 9[(R + E/D) or (L + S/R)] equals 1.6666, which is not very far from the golden mean 1.6180339. . . (Rako˘cevi´c, 1998). Secondly, 15 + 17 = 32; whether the
1250
C. M. Yang
connection ‘32/23’ can be interpreted in terms of a ‘cyclic permutation’ pattern (Shcherbak, 2003) remains to be further investigated. Thirdly, the accurate balance of the amino acid side-chain carbon-atom number on the 28-gon has presumably validated the complementary feature of the amino acids, which was previously demonstrated as ‘inverse equality’ of the nucleon sum in amino acids (Verkhovod, 1994). The above findings, in conjunction with the previous systemization principle based on the arithmetic regularity (Shcherbak, 1993, 2003) and the concrete golden mean within the genetic code (Rako˘cevi´c, 1998), may help elucidate, at the encoded atomic content level, the intricacy of the genetic apparatus and explain the remarkable stability of the genetic code, which is one of the most highly conserved characters in living organisms. 3.
A F IBONACCI –L UCAS R ELATIONSHIP WITHIN THE C ANONICAL A MINO A CIDS
We now explore the possible existence of a mathematical relationship within the 20 canonical amino acids. 3.1. The Fibonacci series. A few mathematical concepts of similarity and proportion are known to be critical in understanding the growth processes in the natural world. For example, the Fibonacci series: 1, 1, 2, 3, 5, 8, . . . , is well known to lie at the heart of plant growth and living organisms (Kappraff and Marzec, 1983). The Fibonacci sequence is generated by adding the previous two numbers in the list together to form the next and so on: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . . . The basic Fibonacci relationship is in equation (1): Fi+2 = Fi+1 + Fi .
(1)
In 1202, Leonardo Fibonacci (1170–1240) in Italy discovered this simple numerical series that is the foundation for a remarkable mathematical relationship behind Phi or . Starting with 0 and 1, each new number in the series is simply the sum of the two numbers before it. Divide any number in the Fibonacci sequence by the one before it, for example 55/34, or 21/13, and the answers promptly converge on the golden mean Phi or , 1.6180339887498 . . .. 3.2. The Lucas series. Another mathematician, Edouard Lucas (1842–1891) in France, gave the above series of numbers 0, 1, 1, 2, 3, 5, 8, 13, . . . the name the Fibonacci numbers, and established that an alternative series occurs when investigating the Fibonacci number patterns: 2, 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, . . . .
28-Gon, Genetic Code and the Lucas Series
1251
Here, the Fibonacci rule of adding the latest two to get the next one is reserved, but it starts from 2 and 1 (in this order) instead of 0 and 1 for the regular Fibonacci numbers. This series, called the Lucas numbers, is defined as follows (where one writes its members as Li , for Lucas): Li = Li−1 + Li−2
for i > 1
(2)
L0 = 2 L1 = 1 L2 = 3. Three formulae are usually available for relating the Fibonacci to the Lucas numbers. The first two are Li = Fi−1 + Fi+1
for all integers i
(3)
for all integers i.
(4)
and 5Fi = Li−1 + Li+1
The third one is based on what are called the Lucas factors of the Fibonacci numbers: (5) F2i = Fi × Li . Therefore, the properties of Lucas numbers are very closely associated with the properties of Fibonacci numbers. 3.3. The Lucas relationship within the 20 canonical amino acids. Previous studies show that it is possible to subdivide the intricate subset of 20 different amino acids in many contrasting and overlapping ways. However, from a bioorganic synthesis point of view, these 20 amino acids structurally and chemical compositionally vary from one to another in their individually distinctive sidechain groups, which carry a series of numbers of carbon atoms. In view of the structural chemistry of a biomolecule, the number of carbon atoms within an organic molecule or within a particular functional group of the molecule can often provide a clue to its biochemical origin and evolutionary features. In the previous section, we discussed that, on the 28-gonal symmetric model of the genetic code system (Fig. 7b), the ratio of 15 [(P+S+F/L) or (T+M/I+N/K)] to 9 [(R + E/D) or (L + S/R)] equals 1.6666, which may be relevant to the golden mean 1.6180339. . . (Rako˘cevi´c, 1998) and a systemization principle (Shcherbak, 1993, 2003). The numbers of side-chain carbon atoms in the 20 amino acids are composed of very simple numbers 0, 1, 2, 3, 4, 7 and 9, i.e., ranging from 0 for glycine (G) to
1252
C. M. Yang
Figure 8. Numbers of side-chain carbon atoms within the 20 canonical amino acids encoded at the 16 genetic code doublet (vertex) positions on the polyhedral model: 0; 1; 1; 1; 2; 2; 2 (=1 + 1); 3; 3; 3; 3; 3 (=2 + 1); 4; 4; 4; 4; 4 (=2 + 2 or 3 + 1); 7; 7 (=4 + 3) and 9 (=7 + 2). ‘o’ denotes ‘stop’ codons.
9 for tryptophan (W) (Fig. 8). A very interesting phenomenon is that any bigger number of the side-chain carbon atoms (>2) in a bigger amino acid can be the sum of the two smaller numbers of the side-chain carbon atoms in two smaller amino acids from ancestor codons [equation (6)]. These two properties of the canonical amino acids are reminiscent of the Fibonacci–Lucas relationship [see equations (1), (2) and (6)]: (6) aai + aai+1 = aai+2 . Three observations can be obtained from a comparison of the Fibonacci–Lucas numbers with the side-chain carbon-atom numbers within the amino acids in Table 2. First, except glycine (G), which possesses no side-chain carbon atom, and tryptophan (W), which carries nine carbon atoms on its side chain, 18 other canonical amino acids carry from 1, 2, 3, 4 to 7 side-chain carbon atoms, respectively, which exactly match the initial five consecutive Lucas numbers, 2, 1, 3, 4 and 7. Second, within the type I amino acids recognized by class I AARS’s, the numbers of side-chain carbon atoms include 1, 3, 4 and 7, except for W, following the Lucas rule, independent of the type II amino acids. Within the type II amino acids recognized by class II AARS’s, except G, the numbers of side-chain carbon atoms are 1, 2, 3, 4 and 7, also consistent with the Lucas sequence, independent of the type I amino acids. The above two observations merge to a suggestion that the majority of the 20 canonical amino acid side chains selected in the genetic code are naturally selected by a double golden mean. Moreover, the sum of 1, 2, 3, 4 and 7 equals 17 (1 + 2 + 3 + 4 + 7 = 17), interestingly in conformity with the number ‘17’, which has been repeatedly found for the amino acid side-chain carbon-atom balance on the 28-gon model, as is pointed out in Section 2. Seemingly, the strong analogy between the amino acid side-chain carbon-atom numbers (SCCAN) and the Fibonacci–Lucas series prompts an immediate inquiry as to whether this quantitative property of the canonical amino acids appears to be a coincidence, especially considered together with Miller’s prebiotic simulation
28-Gon, Genetic Code and the Lucas Series
1253
Table 2. A comparison of some Fibonacci–Lucas numbers with the side-chain carbon-atom numbers (SCCAN) of the canonical amino acids.a
a Shown by the side of the amino acid letters are the side-chain carbon-atom numbers (SCCAN)
within the amino acid molecules. Type I amino acids are recognized by class I AARS’s; type II amino acids are recognized by class II AARS’s; the special case is lysine, which in some organisms is charged by a class I enzyme. Amino acids in bold are produced from Miller’s prebiotic simulation experiment (Miller, 1987).
experiment (Miller, 1987), in which a highly diversified class of small biomolecules in addition to some of the standard amino acids are produced. However, besides the symmetric feature manifested by the amino acid side-chain carbon-atom number balance which is directed by both the 28-gon organization of the genetic code and the two classes of AARS’s, remarkably, the numbers ‘5’ and ‘6’, which are not in the Lucas series, are not incorporated into the amino acid SCCAN category either. Further analyses may or may not reveal whether this is more than just coincidental. Therefore, a further aspect of the 28-gon structural organization of the genetic code is its instrumental capacity to allow the perceivable identification of the sidechain carbon-atom number balance, in particular, at ‘17’, by grouping the amino acids from neighboring genetic code doublets on the 28-gon (Fig. 9). The accurate balance of carbon-atom numbers from the amino acid side chains provided strong evidence for a highly cooperative and complementary coding process of the genetic apparatus. The number ‘17’ equals the sum of the initial five numbers in the Lucas series, i.e., 2, 1, 3, 4 and 7, revealing that the Fibonacci–Lucas relationship, in combination with the two classes of AARS’s, plays a central role in evolutionarily selecting the overwhelming majority of the 20 amino acids (Yang, 2003a,b,c). Additionally, the phenomena discussed here are again in support of the cooperative feature in the genetic code, which can be manifested at multiple levels.
1254
C. M. Yang
Figure 9. Amino acid side-chain carbon-atom number balance at ‘17’ obtained by grouping the genetic code doublets. The number ‘17’ is equal to the sum of the initial five numbers in the Lucas series: 2, 1, 3, 4 and 7. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Within the past few decades, significant strides have been made toward understanding the molecular basis of the genetic code. One early important advance in this area was the definition of an amino acid property called the polar requirement (Woese et al., 1966). Recently, much progress has also been made in elucidating how specific interactions between amino acids and nucleic acids may have played an important role in the origin of the genetic code (Yarus, 1998; Szathmary, 1999). Despite considerable efforts, the evolutionary dynamics and mechanism that shaped the code remain an enigma, and what property or properties of the amino acids the code actually reflect has long remained a mystery. As described above, an apparently decisive connection between the Lucas series (a numerical basis of the golden mean) and amino acid side-chain carbon-atom numbers suggests that the carbon-atom contents in the 20 amino acids may be part of the intrinsic necessity to be selected into the genetic code system, which is confined by the Lucas series-based golden mean, in cooperation with the two classes of AARS’s, when the 20 amino acids in the genetic code system are viewed as a whole.
4.
C ONCLUSION
On the basis of a detailed evaluation of the amino acid side-chain carbon-atom numbers, it has been identified in this study that the 28-gonal and rotational symmetric features within the internal relation of the 20 amino acids in the genetic code are stepwise intertwined with the two classes of AARS’s. The two previously suggested evolutionary axes within the genetic code cooperatively overlap the symmetry axes within the two classes of AARS’s on the 28-gon model.
28-Gon, Genetic Code and the Lucas Series
1255
Moreover, irrespective of the side-chain structural diversity and functional diversity of the 20 amino acids, it is probably the Lucas relationship that initiated, drove forward and confined the natural selection of the side-chain carbon-atom numbers (1, 2, 3, 4 and 7) in the canonical amino acids. The specific 28-gon organization model described for the genetic code has therefore revealed probably significant parts of the intricate features of the genetic coding framework. The overall findings may provide new insight into biological understanding of the universality of the genetic code, and stimulate new theoretical efforts towards further exploring the information logic in the genetic apparatus.
A BBREVIATIONS aa, amino acid; 20 amino acids are represented by A (Ala), P (Pro), V (Val), G (Gly), T (Thr), S (Ser), L (Leu), R (Arg), D (Asp), E (Glu), M (Met), I (Ile), F (Phe), C (Cys), W (Trp), H (His), Q (Gln), N (Asn), K (Lys) and Y (Tyr). ‘o’ denotes ‘stop’ codons. SCCAN: side-chain carbon-atom number; AARS: aminoacyl-tRNA synthetase.
ACKNOWLEDGEMENTS The author is grateful to Prof. Zhenyang Lin (Department of Chemistry and Chemical Technology, Hong Kong University of Science and Technology, Hong Kong) for many valuable comments on symmetry, Prof. Yongchuan Chen and Dr. Qinghu Hou (College of Mathematics, Nankai University) for a helpful discussion on graphic features. The author thanks Prof. Weiping Zhang and Prof. Yiming Long (College of Mathematics, Nankai University) for friendly advice on geometric concepts, Dr. Kai Chen (Harvard Medical School, USA), Mr. Sanzhong Luo, my students Mr. Yitong Li, Ms. Xiaofeng Zhao and Mr. Tai Huang for excellent assistance in a massive literature search. The author also thanks two anonymous referees for very insightful suggestions.
R EFERENCES Arnez, J. G. and D. Moras (1997). Structural and functional considerations of the aminoacylation reaction. Trends Biochem. Sci. 22, 211–216. Bertman, M. O. and J. R. Jungck (1979). Group graph of the genetic code. J. Hered. 70, 379–384. Burbaum, J. J. and P. Schimmel (1991). Structural relationships and the classification of aminoacyl-tRNA synthetases. J. Biol. Chem. 266, 16965–16968. Casper, D. L. D. and A. Klug (1962). Physical principles in the construction of regular viruses. Cold Spring Harb. Symp. Quant. Biol. 27, 1–24.
1256
C. M. Yang
Cavalcanti, A. R., B. D. Neto and R. Ferreira (2000). On the classes of aminoacyl-tRNA synthetases and the error minimization in the genetic code. J. Theor. Biol. 204, 15–20. Cusack, S. (1997). Aminoacyl-tRNA synthetases. Curr. Opin. Struct. Biol. 7, 881–889. Davydov, O. V. (1998). Amino acid contribution to the genetic code structure: end-atom chemical rules of doublet composition. J. Theor. Biol. 193, 679–690. Di Giulio, M. (1992). The evolution of aminoacyl-tRNA synthetases, the biosynthetic pathways of amino acids and the genetic code. Orig. Life Evol. Biosph. 22, 309–319. Dufton, M. J. (1997). Genetic synonym quotas and amino acid complexity: cutting the cost of proteins? J. Theor. Biol. 187, 165–173. Eriani, G., M. Delarue, O. Poch, J. Gangloff and D. Moras (1990). Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347, 203–206. Hartlein, M. and S. Cusack (1995). Structure, function and evolution of seryl-tRNA synthetases: implications for the evolution of aminoacyl-tRNA synthetases and the genetic code. J. Mol. Evol. 40, 519–530. Hartman, H. (1995). Speculations on the evolution of the genetic code IV. The evolution of the aminoacyl-tRNA synthetases. Orig. Life Evol. Biosph. 25, 265–269. Jim´enez-Monta˜no, M. A., R. de la Mora-Basa˜nez and T. P¨oschel (1996). The hyperstructure of the genetic code explains conservative and non-conservative amino acid substitutions in vivo and in vitro. Biosystems 39, 117–125. Kappraff, J. M. and C. Marzec (1983). Properties of maximal spacing on a circle relate to phyllotaxis and to the golden mean. J. Theor. Biol. 103, 201–226. Karasev, V. A. and V. E. Stefanov (2001). Topological nature of the genetic code. J. Theor. Biol. 209, 303–317. Miller, S. L. (1987). Which organic compounds could have occurred on the prebiotic earth? Cold Spring Harb. Symp. Quant. Biol. 52, 17–27. Nagel, G. M. and R. F. Doolittle (1991). Evolution and relatedness in two aminoacyl-tRNA synthetase families. Proc. Natl. Acad. Sci. USA 88, 8121–8125. Racaniello, V. R. (1996). Early events in poliovirus infection: virus–receptor interactions. Proc. Natl. Acad. Sci. USA 21, 11378–11381. Rako˘cevi´c, M. M. (1998). The genetic code as a golden mean determined system. BioSystem 46, 283–291. Rako˘cevi´c, M. and A. Jokic (1996). Four stereochemical types of protein amino acids: synchronic determination with chemical characteristics, atom and nucleon number. J. Theor. Biol. 183, 345–349. Ribas de Pouplana, L. and P. Schimmel (2001). Aminoacyl-tRNA synthetases: potential markers of genetic code development. Trends Biochem. Sci. 26, 591–596. Shcherbak, V. I. (1989). Rumer’s rule and transformation in the context of the co-operative symmetry of the genetic code. J. Theor. Biol. 139, 271–276. Shcherbak, V. I. (1993). Twenty canonical amino acids of the genetic code: the arithmetical regularity. J. Theor. Biol. 162, 399–401. Shcherbak, V. I. (2003). Arithmetic inside the universal genetic code. Biosystems 70, 187–209. Swanson, R. A. (1984). A unifying concept for the amino acid code. Bull. Math. Biol. 46, 187–203. Szathmary, E. (1999). The origin of the genetic code—amino acids as cofactors in an RNA world. Trends Genet. 15, 223–229.
28-Gon, Genetic Code and the Lucas Series
1257
Trifonov, E. N. (2000). Consensus temporal order of amino acids and evolution of the triplet code. Gene 261, 139–151. Trifonov, E. N. and T. Bettecken (1997). Sequence fossils, triplet expansion, and reconstruction of earliest codons. Gene 205, 1–6. Verkhovod, A. B. (1994). Alphanumerical divisions of the universal genetic code: new divisions reveal new balances. J. Theor. Biol. 170, 327–330. Weber, A. L. and S. L. Miller (1981). Reasons for the occurrence of the twenty coded protein amino acids. J. Mol. Evol. 17, 273–284. Woese, C. R., D. H. Dugre, W. C. Saxinger and S. A. Dugre (1966). The molecular basis for the genetic code. Proc. Natl. Acad. Sci. USA 55, 966–974. Woese, C. R., G. J. Olsen, M. Ibba and D. S¨oll (2000). Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64, 202–236. Yang, C. M. (2003a). Nitrogen atom hybrid in nucleobases suggests a primordial “core” in the genetic code? (submitted for publication). Yang, C. M. (2003b). On the quasi-icosikaioctagon (quasi-28-gon) symmetry and a presumed evolutionary axes of the genetic code (submitted for publication). Yang, C. M. (2003c). On the chemistry and three-dimensional display of the genetic code (submitted for publication). Yarus, M. (1998). Amino acids as RNA ligands: a direct-RNA-template theory for the code’s origin. J. Mol. Evol. 47, 109–117. Zhang, C. T. (1997). A symmetrical theory of DNA sequences and its applications. J. Theor. Biol. 187, 297–306. Received 22 August 2003 and accepted 3 December 2003