BioSystems 98 (2009) 105–114
The fourfold way of the genetic code
Miguel Angel Jiménez-Montaño a,b,∗
a Division of Mathematics, Science, and Technology, Parker Building, Nova Southeastern University, Fort Lauderdale, FL, 33314-7796 USA
b Facultad de Física e Inteligencia Artificial, Universidad Veracruzana, Xalapa, 91000 Veracruz, Mexico
Article info
Article history: Received 24 February 2009; received in revised form 14 July 2009; accepted 16 July 2009
Keywords: Genetic code table; Klein-4 group; Outer product; Genetic language; tRNAs aminoacylation
Abstract
We describe a compact representation of the genetic code that factorizes the table in quartets. It represents a “least grammar” for the genetic language. It is justified by the Klein-4 group structure of RNA bases and codon doublets. The matrix of the outer product between the column-vector of bases and the corresponding row-vector V^T = (C G U A), considered as signal vectors, has a block structure consisting of the four cosets of the K × K group of base transformations acting on doublet AA. This matrix, translated into weak/strong (W/S) and purine/pyrimidine (R/Y) nucleotide classes, leads to a code table with mixed and unmixed families in separate regions. A basic difference between them is the non-commuting (R/Y) doublets: AC/CA, GU/UG. We describe the degeneracy in the canonical code and the systematic changes in deviant codes in terms of the divisors of 24, employing modulo multiplication groups. We illustrate binary sub-codes characterizing mutations in the quartets. We introduce a decision-tree to predict the mode of tRNA recognition corresponding to each codon, and compare our result with related findings by Jestin and Soulé [Jestin, J.-L., Soulé, C., 2007. Symmetries by base substitutions in the genetic code predict 2′ or 3′ aminoacylation of tRNAs. J. Theor. Biol. 247, 391–394], and the rearrangements of the table by Delarue [Delarue, M., 2007. An asymmetric underlying rule in the assignment of codons: possible clue to a quick early evolution of the genetic code via successive binary choices. RNA 13, 161–169] and Rodin and Rodin [Rodin, S.N., Rodin, A.S., 2008. On the origin of the genetic code: signatures of its primordial complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity 100, 341–355], respectively.
© 2009 Elsevier Ireland Ltd. All rights reserved.
“La langue denotes the abstract systematic principles of a language, without which no meaningful utterance (parole) would be possible”.
Ferdinand de Saussure, Course in General Linguistics

1. Introduction

Since the discovery of the double helix in 1953, it has been clear that its structure implies that DNA can accommodate almost any sequence of base-pairs without violation of physical laws, namely, any combination of the bases adenine (A), cytosine (C), guanine (G) and thymine (T) and, hence, any digital message or information (Hood and Galas, 2003). The same is true for the complementary RNA transcript, called messenger RNA (mRNA), made up of A, C, G and uracil (U) instead of T. As was later established, the four bases of the DNA and RNA alphabets are related to the 20 amino acids of the protein alphabet by a triplet code: each three letters (or
∗ Correspondence address: Facultad de Física e Inteligencia Artificial, Universidad Veracruzana, Sebastian Camacho # 5, Xalapa, 91000 Veracruz, Mexico. Tel.: +1 52 228 817 2957; fax: +1 52 28 228 817 2855. URL: http://www.uv.mx/ajimenez. doi:10.1016/j.biosystems.2009.07.006
‘codons’) in a gene encodes one amino acid. The dictionary of DNA letters that make up the amino acids is called the genetic code. Specifically, the code defines a mapping between tri-nucleotide sequences and amino acids. There are 64 different codons, 61 of which encode an amino acid (with some exceptions in deviant codes), and three of which are used for ‘punctuation’ in that they signal the termination of the growing protein chain. Currently, a table or codon catalogue, which has remained unchanged for 40 years, displays the code dictionary. Although there is a considerable literature about the structure and evolution of the genetic code, two fundamental questions have remained almost untouched. First, how do we explain the simplicity-out-of-complexity that the table itself represents? Second, is there an optimal way to write the table? (Lehmann, 2000). That is, a most compact and informative arrangement that reflects not only the apparent symmetries of the code but also hidden symmetries. These are of special interest as they may highlight underlying organization principles of the code, as recently argued by Jestin and Soulé (2007) and corroborated by us. Here, we direct our attention to the second question and only briefly discuss some ideas that could shed some light on the first question. Except for the obvious purine/pyrimidine division of the third base degeneracy, the table information about any hidden symmetries underlying the observed structure of the code in multiplets
M.A. Jiménez-Montaño / BioSystems 98 (2009) 105–114
Table 1. The fourfold genetic code table.a
a Shaded region: four-member boxes (pro-robustness); white region: mixed boxes (pro-diversity); t = termination signal (Ter). Amino acids are displayed in single-letter notation.
is almost lacking, let alone information about the two modes of tRNA recognition, from the minor and major groove sides of the acceptor stem, respectively. Coloring the codons recognized by the two types of synthetases with different colors in the usual table (Ribas de Pouplana and Schimmel, 2001) neither shows any regular pattern nor separates the table into non-overlapping regions.

In the Universal (or canonical) genetic code the codons are unevenly distributed among the amino acids, with degeneracy 6, 4, 3, 2, and 1. However, they show an organization in 16 codon boxes, associated with the first two bases of each codon. Eight of them are standard 1-aa boxes, each assigning four codons to one amino acid, and five are standard 2-aa boxes, each assigning two pairs of codons to two different amino acids. To find an optimal arrangement of the table, we notice that the order in the pattern of degeneracy families depends on the succession in which the four bases are written. Of the 4! = 24 possibilities, most of the formerly proposed orderings separate pyrimidines (Y = U, C) from purines (R = A, G), as in the usual one [U, C, A, G] (see e.g., He et al., 2004; Jungck, 1978; Lehmann, 2000; Lehmann and Libchaber, 2008). The alternating pyrimidine/purine ordering [C, G, U, A] was originally proposed by Danckwerts and Neubert (1975). It is appropriate for our main purpose: to find a compact representation of the genetic code that factorizes the table in quartets and conveys degeneracy through codon classes (Table 1). It was rediscovered by Duplij and Duplij (2000), building on the parallel research of Rumer (1968). Recently, Delarue (2007) reintroduced the ordering [C, U, G, A] proposed by He et al. (2004), with a completely different aim in mind: to find a binary partition of the code table deduced from the two existing aminoacylation mechanisms (making use of either the 2′-OH or the 3′-OH of the terminal A76 of the tRNA).
With a similar purpose, Rodin and Rodin (2006, 2008) proposed an R/Y mirror-symmetric order [U, A, G, C] for their rearrangement of the genetic code, which separates the table in a yin-yang-like pattern, where the two regions mark the two modes of tRNA recognition. Elaborating on these works and our former proposal about the structure of the genetic code (Jiménez-Montaño, 2004), we built a decision-tree (Fig. 3) to predict the mode of tRNA recognition, from the minor and major groove sides of the acceptor stem, respectively. Therefore, two of our objectives are closely related to those in the report by Jestin and Soulé (2007) about the sets of twofold symmetries between partitions of the universal genetic code, used to study the properties of the degeneracy pattern and of tRNA aminoacylation specificity.

To discover hidden symmetries underlying the distribution of redundancy in the code, one must understand why the set of codons splits in two groups such that in one group the third base
is necessary to specify the amino acid, and in the other it is not (Rumer, 1968; Danckwerts and Neubert, 1975; Jestin and Soulé, 2007). One must also understand why it splits in two complete codes of 32 codons each, NNS and NNW, where W = [A, T/U] are weak bases, S = [C, G] are strong bases, and N is any base, such that both specify the 20 amino acids and termination signal, as in the more symmetrical mitochondrial code (Jiménez-Montaño et al., 1996). To answer these and similar questions, several representations of the genetic code have been proposed in the literature. Due to its digital nature, the code admits a description in terms of discrete mathematical tools, such as finite groups, formal grammars, mathematical logic and other abstract algebraic structures (Rumer, 1968; Danckwerts and Neubert, 1975; Bertman and Jungck, 1979; Findley et al., 1982; Swanson, 1984; Antillon and Ortega-Blake, 1985; Jiménez-Montaño, 2004; Jiménez-Montaño et al., 1996; Zhang, 1997; Karasev and Sorokin, 1997; Petoukhov, 1999; Duplij and Duplij, 2000; Štambuk, 2000; Negadi, 2002, 2003; Magini and Hornos, 2003; MacDónaill and Manktelow, 2004; He et al., 2004; Sánchez et al., 2005; Jestin, 2006; Jestin and Soulé, 2007).

The central mathematical object in this paper, which justifies the proposed arrangement of the codon catalogue (Table 1), is the matrix M (7) of the outer product between the column-vector of bases and the transpose vector V^T = (C G U A), considered as signal vectors (see Section 3). Duplij and Duplij (2005) introduced this matrix, which has no obvious block structure, without making the signal interpretation of the vectors we make here. We discerned the block structure of this 4 × 4 matrix in terms of the four cosets of the K × K group of base transformations acting on doublet AA. Formerly, Bertman and Jungck (1979) considered these cosets and realized that they present some interesting features with respect to the redundancy distribution within the genetic code.
However, they did not make any proposal for using them to optimize the code table. It is only after translating the matrix M (7) to binary alphabets that we found that each block consists of the same B1 B2 dinucleotide-matrix (11), in the R/Y alphabet, and one of the four B1 B2 dinucleotide-matrices (9) corresponding to the 4 syntactic classes SS, WW, SW, and WS, in the W/S alphabet. As we shall see, these symmetries of the codon doublets are responsible for the quartet-structure of the table and its separation in two disjoint regions (Table 1). This structure is consistent with the syntactic tree to classify codons with respect to third base degeneracy proposed in (Jiménez-Montaño et al., 1996). The formulation of the code in terms of syntactical categories (codon classes) is an adaptation to the genetic language of Harris’s principle of least grammar (Harris, 1998). For natural languages, Harris emphasized the importance of finding a “least grammar,” a description that requires an absolute minimum of primitive objects and relations. Any additional objects or relations in the description introduce extrinsic structure that obscures the informational structure in language. Therefore, whatever structure is found emerges in the process of making a least description. We notice, however, that a minimal description is by no means unique.

The answer to the first question posed above, about an explanation for the simplicity-out-of-complexity that the table itself represents, has an important evolutionary component. We recall that, physically, the realization of the code consists of two processes. On one side, in the tRNA adaptor molecule the anticodon reads the codon in mRNA through base-pairing. On the other side, aminoacylation reactions link specific amino acids to tRNAs that bear anticodons, with the help of 20 aminoacyl-tRNA synthetases.
Unlike the conserved codon assignments to amino acids in the genetic code, which are necessary to define a code, anticodon usages are variable and idiosyncratic among many organisms and organelles, due to evolutionary change. In different stages of the code evolution, different organisms followed diverse strategies of decoding (Tong and Wong, 2004). It is in the evolution of wobble
rules (Crick, 1966), base modifications in the anticodon loop, and evolutionary pressures associated with CG and AT content, where the complexity of the code lies. In some lineages, these changes have produced various rearrangements of the codon boxes leading to deviant codes (see Section 4). A current assumption is that wobbling was initially maximized (Lehmann and Libchaber, 2008). This idea is consistent with “Superwobbling”, or the reading of all four nucleotides in the third codon position with an unmodified U. Recently, Rogalski et al. (2008) have demonstrated this reading mechanism for glycine triplets. The remaining part of this paper is organized as follows. In Section 2, we summarize some facts about the Klein-4 group structure of the bases and codon doublets, employing an intuitive approach suitable for the non-specialist. In Section 3, we perform a signal analysis of B1 B2 codon doublets. From the outer product matrix M (7), we get the corresponding matrices in terms of the W/S and R/Y categories and obtain the rearrangement of the usual codon catalogue into the fourfold table of the genetic code (Table 1). We report a basic difference between mixed and unmixed codon families: the non-commuting (R/Y) doublets: AC/CA, GU/UG. In Section 4, we propose an explanation of the degeneracy in the canonical code and the systematic changes in deviant codes in terms of the divisors of 24. In Section 5, with the help of a Gray code representation of matrix M (7), we discuss binary sub-codes characterizing mutations in the quartets, which do not occur in other representations of the genetic code. In Section 6, we introduce a decision-tree (Fig. 3), to classify codons with respect to the two modes of tRNA recognition, from the minor and major groove sides of the acceptor stem, respectively. 
We compare our result with the symmetries by base substitutions in the genetic code and 2′ or 3′ aminoacylation of tRNAs discussed by Jestin and Soulé (2007), and with the works by Rodin and Rodin (2008) and Delarue (2007), respectively. Finally, in Section 7 we discuss our findings and draw our conclusions.

2. The Algebraic Structure of the Bases and Codon Doublets

2.1. The Order of the Bases

Rumer (1968) and, independently, Danckwerts and Neubert (1975) first elucidated the symmetries of codon doublets (see Bertman and Jungck, 1979; Jiménez-Montaño et al., 1996; Karasev and Sorokin, 1997; Zhang, 1997; Negadi, 2002, 2003; He et al., 2004; Duplij and Duplij, 2005; Sánchez et al., 2005; Jestin, 2006; Jestin and Soulé, 2007). In a pioneering paper, Danckwerts and Neubert (1975) found the group structure of the RNA alphabet and the symmetries of the sixteen B1 B2 codon doublets, in terms of the Klein-4 group of base transformations. Here, we adopt the definition of exchange operators proposed by Danckwerts and Neubert (1975) and discussed by Bertman and
Jungck (1979). Accordingly, from the 4! = 24 possibilities to place the 4 bases in the vertices of a rectangle (Fig. 1), we adopted the order C, G, U, A, and defined a digital representation of the bases following the Gray code: C (11), G (01), U (00), A (10). The first bit refers to the amino/keto character of the base (M/K), and the second one to the number of H-bonds (S/W). Starting from any vertex and moving in a circle to the right or to the left, the codes change a single bit from one base to the next one. We numbered the vertices in the clockwise direction in descending order 4, 3, 2, 1.

From the mirror-image ordering [A, U, G, C], and employing heuristic reasoning about the variability of GC and purine contents in codon usage, Yu (2007) found a genetic code table divided in quartets which is equivalent to our table, except for a rotation that transforms columns and rows, with respect to the ordinary codon catalogue. Since the table by Yu is the dual of our table, our mathematical approach also explains the symmetries in his table, without appealing to evolutionary pressure arguments. We emphasize the fact that both tables relate phenotypic characteristics of each quartet not only to one-base changes, as discussed by Yu for his table, but also to one-bit changes of the corresponding codons (see Section 5). Previously, Wilhelm and Nikolajewa (2004) also proposed an alternative scheme for the genetic code. Their binary representation separates fourfold and less than fourfold codon sets and displays explicitly several symmetries of the code. However, it is not compact and does not retain the organization in rows and columns of bases found in the current table or in our table.

2.2. The Klein-4 Group

The Klein-4 group (K) is the non-cyclic group of smallest order. It is an Abelian (commutative) group.
It is also the subgroup V (Vierergruppe) of the symmetric group (S4), consisting of the following four permutations: (1); (12) (34); (13) (24); (14) (23), where we have employed the cycle notation (Milson, 2007), and (1) is the identity permutation. It is possible to realize V as the set of symmetries of a non-square rectangle (Fig. 1). A clockwise rotation of 180° corresponds to the permutation (13) (24). Flipping the rectangle over the horizontal axis gives the permutation (14) (23), and a flip over the vertical axis gives the permutation (12) (34). From Fig. 1, we have the following correspondences with the three transformations of the bases, α, β, γ:

(13) (24) ↔ γ ⇒ R/Y invariance (transitions).
(14) (23) ↔ α ⇒ M/K invariance (transversion between non-complementary bases).
(12) (34) ↔ β ⇒ W/S invariance (transversion between complementary bases).

where the base categorizations are:

R: (A, G)      Purines
Y: (C, U/T)    Pyrimidines
W: (A, T/U)    Weak bases (2 H-bonds)
S: (C, G)      Strong bases (3 H-bonds)
M: (A, C)      Amino bases
K: (G, U)      Keto bases

Fig. 1. Exchange operators α, β and γ, which obey a Klein-4 group, and Gray code representation of the nucleotide bases. For the definition of letters and other symbols see text.
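The Gray-code labels and the three exchange operators can be checked with a short Python sketch. It assumes, as a reading of the correspondences above rather than something stated in the text, that each operator acts as an XOR mask on the two-bit label (α flips the S/W bit, β flips the M/K bit, γ flips both). The names GRAY, MASKS, apply_op, compose, invariant_classes and single_bit_steps are illustrative only.

```python
# Gray-code labels from the text: first bit = amino/keto (M/K), second bit = H-bonds (S/W).
GRAY = {"C": 0b11, "G": 0b01, "U": 0b00, "A": 0b10}
BASE = {code: base for base, code in GRAY.items()}

# Assumption (derived here, not stated in the text): each operator is an XOR mask.
MASKS = {"e": 0b00, "alpha": 0b01, "beta": 0b10, "gamma": 0b11}

CLASSES = {
    "R/Y": {"A": "R", "G": "R", "C": "Y", "U": "Y"},
    "W/S": {"A": "W", "U": "W", "C": "S", "G": "S"},
    "M/K": {"A": "M", "C": "M", "G": "K", "U": "K"},
}


def apply_op(op, base):
    """Apply an exchange operator to a single base."""
    return BASE[GRAY[base] ^ MASKS[op]]


def compose(op1, op2):
    """Name of the operator obtained by applying op2 and then op1."""
    mask = MASKS[op1] ^ MASKS[op2]
    return next(name for name, m in MASKS.items() if m == mask)


def invariant_classes(op):
    """Categorizations that op leaves unchanged for every base."""
    return [name for name, cls in CLASSES.items()
            if all(cls[b] == cls[apply_op(op, b)] for b in GRAY)]


def single_bit_steps(order="CGUA"):
    """True if neighbouring bases around the cycle differ in exactly one bit."""
    pairs = zip(order, order[1:] + order[0])
    return all(bin(GRAY[a] ^ GRAY[b]).count("1") == 1 for a, b in pairs)


for op in ("alpha", "beta", "gamma"):
    print(op, {b: apply_op(op, b) for b in "CGUA"}, invariant_classes(op))
print("single-bit cycle for C, G, U, A:", single_bit_steps())
```

Under this reading, the group is visibly a copy of Z2 × Z2: composing any two distinct non-identity operators gives the third, and each operator is its own inverse.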
Fig. 2. Hasse diagram of subgroups of the Klein-4 group (K).
Thus, the Klein-4 group of base-transformations is K ≡ V := {e, α, β, γ}, with the usual multiplication table (see e.g., Bertman and Jungck, 1979), where e is the identity transformation. Fig. 2 is the Hasse diagram of the lattice of subgroups of K. In this lattice, the subgroups generated by the three exchange operators are not comparable. Therefore, it is incorrect to assume that any pair of them is fundamental, while the third one is a derived transformation, as others and we (Jiménez-Montaño et al., 1996) did before.

2.3. The Structure of Codon Doublets

Perhaps the primordial feature of the genetic code associated with a hidden symmetry is the splitting of the 64 codons in two sets, of 32 codons each, such that in one the third base is necessary to specify the amino acid and in the other it is not. This symmetry of the codons is inherited from the corresponding symmetry of B1 B2 doublets: the 16 doublets are separated in two octets, M1 and M2, distinguished by their ability of amino acid determination. The doublets in M1 (CC, CG, GC, CU, UC, GU, AC, GG) determine the amino acid independently of the third base (fourfold degenerate codons), while the doublets in M2 (UU, UG, CA, AG, GA, AU, UA, AA) require the third base to determine the amino acid (less than fourfold degenerate codons) (Rumer, 1968; Danckwerts and Neubert, 1975). We may express the octets M1 and M2 in a compact way, in terms of syntactic categories: M1 = {SC, WC, SK} and M2 = {SA, WA, WK}. In this representation, it is evident that each set is invariant under the (β, 1) transformation, which interchanges weak bases among themselves and strong bases among themselves in the first position (vide infra). One can go from one set to the other by applying the amino/keto-preserving exchange operator (Fig. 1) to the first and second bases:

\[ M_1 \overset{(\alpha,\alpha)}{\longleftrightarrow} M_2 \tag{1} \]

where α is the “Rumer transformation” (Rumer, 1968), applied only to the B1 B2 doublet:

\[ C \overset{\alpha}{\longleftrightarrow} A, \qquad G \overset{\alpha}{\longleftrightarrow} U. \tag{2} \]
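As a quick check of (1) and (2), the following Python sketch applies the Rumer transformation to both octets and expands the syntactic categories quoted above. The octets and the α map are taken from the text; the helper names rumer, expand and expand_all, and the CATEGORY lookup, are illustrative.

```python
# The alpha map of eq. (2): C <-> A, G <-> U.
ALPHA = {"C": "A", "A": "C", "G": "U", "U": "G"}

# The two octets of B1 B2 doublets as listed in the text.
M1 = {"CC", "CG", "GC", "CU", "UC", "GU", "AC", "GG"}
M2 = {"UU", "UG", "CA", "AG", "GA", "AU", "UA", "AA"}

# Letters used in the syntactic categories: S and W for the first base,
# C, A (single bases) or K (keto: G, U) for the second base.
CATEGORY = {"S": "CG", "W": "AU", "K": "GU", "C": "C", "A": "A"}


def rumer(doublet):
    """Apply alpha to both bases, i.e. the (alpha, alpha) operator of eq. (1)."""
    return "".join(ALPHA[base] for base in doublet)


def expand(pattern):
    """Expand a two-letter category such as 'SK' into its doublets."""
    first, second = pattern
    return {b1 + b2 for b1 in CATEGORY[first] for b2 in CATEGORY[second]}


def expand_all(patterns):
    """Union of the doublets described by several categories."""
    return set().union(*(expand(p) for p in patterns))


print(sorted(rumer(d) for d in M1))
print(expand_all(["SC", "WC", "SK"]) == M1, expand_all(["SA", "WA", "WK"]) == M2)
```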
Recently, Jestin and Soulé (2007) obtained a new transformation for whole codons, which involves substitutions of the three bases and accomplishes the same result as the Rumer transformation. It is the transformation (AG/CT) for the first base (a transition), and the transformation (GT/AC) for the second and third bases. To analyze codon doublets, Danckwerts and Neubert (1975) introduced a Cartesian product of two Klein-4 groups, (K × K) = {e, α, β, γ} × {e, α, β, γ}, with multiplication defined coordinate-wise, i.e., (x, y) × (z, w) = (xz, yw), with identity element (e, e), and where the inverse of each element (x, y) is (x, y)^−1 = (x^−1, y^−1) (see also Bertman and Jungck, 1979). Then, they showed that the two octets of dinucleotides M1 and M2 are invariant under the operation of (β, 1) on B1 B2. Since the β transformation preserves the W/S character of the nucleotide (Fig. 1), we see that the third base degeneracy of a codon does not depend on the exact base B1 but only on its H-bond property (Jiménez-Montaño et al., 1996). Jestin (2006) recently formulated this symmetry for codons. He writes, “Another unique symmetry is described here: it exchanges each group into itself by substitution of G into C, C into G, A into T or T into A and is applied to the first codon base”. In contrast, no operation on B2 leaves M1 or M2 invariant, indicating that B2 is more important than B1 for the stability of the two octets. Lehmann and Libchaber (2008) confirmed the conclusion about the importance of B2 through structural analysis of the codon–anticodon interaction in the ribosome. Furthermore, Lim and Curran (2001) have pointed out that prohibition of mismatches at the second codon position has implications for the distribution of mixed boxes.
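The invariance statements above can be verified by brute force over the 16 elements of K × K. The sketch below is only an illustration: the base maps follow the correspondences of Section 2.2, the identity is written "e" (the text also writes it as 1), and the function names act, multiply and stabilizer are hypothetical.

```python
from itertools import product

# The four base transformations of the Klein-4 group as explicit maps.
OPS = {
    "e":     {"C": "C", "G": "G", "U": "U", "A": "A"},
    "alpha": {"C": "A", "A": "C", "G": "U", "U": "G"},
    "beta":  {"C": "G", "G": "C", "A": "U", "U": "A"},
    "gamma": {"C": "U", "U": "C", "A": "G", "G": "A"},
}

M1 = {"CC", "CG", "GC", "CU", "UC", "GU", "AC", "GG"}
M2 = {"UU", "UG", "CA", "AG", "GA", "AU", "UA", "AA"}


def act(pair, doublet):
    """Apply (x, y) in K x K to a doublet: x on B1 and y on B2."""
    x, y = pair
    return OPS[x][doublet[0]] + OPS[y][doublet[1]]


def _name_of(mapping):
    """Look up which named operator equals a given base map."""
    return next(name for name, m in OPS.items() if m == mapping)


def multiply(p, q):
    """Coordinate-wise product (x, y) x (z, w) = (xz, yw)."""
    (x, y), (z, w) = p, q
    first = {b: OPS[x][OPS[z][b]] for b in "CGUA"}
    second = {b: OPS[y][OPS[w][b]] for b in "CGUA"}
    return (_name_of(first), _name_of(second))


def stabilizer(octet):
    """All elements of K x K that map the octet onto itself."""
    return [pair for pair in product(OPS, repeat=2)
            if {act(pair, d) for d in octet} == octet]


print("stabilizer of M1:", stabilizer(M1))
print("stabilizer of M2:", stabilizer(M2))
```

In this sketch only the identity and (β, e) preserve the octets, so every operator with a non-trivial second coordinate moves at least one doublet out of its octet, in line with the remark that no operation on B2 leaves M1 or M2 invariant.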
From the formal point of view, Bertman and Jungck (1979, Table VI) reached the conclusion that, although it is impossible to divide the 16-element group (K × K) into two cosets of eight elements each corresponding to M1 and M2, it is possible to divide this group into four cosets with four elements each. We noticed that the 4 × 4 matrix of doublets M (7), of the outer product between the column-vector of the bases and the corresponding row-vector V^T = (C G U A), considered as signal vectors (see Section 3), is identical to a block matrix built from the cosets if, and only if, we make the following identifications:

\[
\begin{aligned}
1 &= \{\mathrm{UG}, \mathrm{AC}, \mathrm{UC}, \mathrm{AG}\}, \\
2 &= \{\mathrm{AA}, \mathrm{UU}, \mathrm{AU}, \mathrm{UA}\}, \\
3 &= \{\mathrm{GU}, \mathrm{CA}, \mathrm{GA}, \mathrm{CU}\}, \\
4 &= \{\mathrm{GC}, \mathrm{CG}, \mathrm{GG}, \mathrm{CC}\},
\end{aligned}
\tag{3}
\]

where the numbering corresponds to the adopted sequence of bases (Fig. 1) (see matrix M(1) (8) in the next section). Translating these sets into the W/S alphabet, a hidden regularity becomes apparent: each group corresponds to a different syntactic class:

\[ 1 \to \{\mathrm{WS}\}, \qquad 2 \to \{\mathrm{WW}\}, \qquad 3 \to \{\mathrm{SW}\}, \qquad 4 \to \{\mathrm{SS}\} \tag{4} \]
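The four sets in (3) and their W/S classes in (4) can be reproduced with a small Python sketch. It rests on an assumption not spelled out in the text: that the relevant subgroup is H = {e, β} × {e, β}, so that the four cosets of H in K × K, applied to the doublet AA, give the four sets. The names H, REPRESENTATIVES, coset_orbit and ws_classes are illustrative.

```python
# Base maps for the identity, alpha and beta (gamma is not needed here).
OPS = {
    "e":     {"C": "C", "G": "G", "U": "U", "A": "A"},
    "alpha": {"C": "A", "A": "C", "G": "U", "U": "G"},
    "beta":  {"C": "G", "G": "C", "A": "U", "U": "A"},
}

# Assumed subgroup H = {e, beta} x {e, beta} (not named explicitly in the text).
H = [(x, y) for x in ("e", "beta") for y in ("e", "beta")]

# One representative per coset, keyed by the numbering used in eq. (3).
REPRESENTATIVES = {
    1: ("e", "alpha"),
    2: ("e", "e"),
    3: ("alpha", "e"),
    4: ("alpha", "alpha"),
}


def act(pair, doublet):
    """Apply (x, y) to a doublet: x on B1 and y on B2."""
    x, y = pair
    return OPS[x][doublet[0]] + OPS[y][doublet[1]]


def coset_orbit(rep, seed="AA"):
    """Doublets reached from the seed by every element of the coset rep * H."""
    return {act(rep, act(h, seed)) for h in H}


def ws_classes(doublets):
    """Set of W/S patterns occurring among the given doublets."""
    return {"".join("S" if b in "CG" else "W" for b in d) for d in doublets}


for number, rep in REPRESENTATIVES.items():
    orbit = coset_orbit(rep)
    print(number, sorted(orbit), ws_classes(orbit))
```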
The importance of this result for the fourfold arrangement of the genetic code (Table 1) will be clear in the next section.

3. The Signal Analysis of Codon Doublets

3.1. Outer Product Analysis

In many experimental situations, one may wish to determine the relationships that exist between two types of signal, for example Near Infrared (NIR) and Mid Infrared (MIR), MIR and Raman, UV–Visible, etc. (Barros et al., 2008). For this purpose, it is useful to get the two sets of signals for the same samples under study and analyze how they vary simultaneously as a function of some property, such as concentration. Outer Product Analysis (OPA) is the appropriate mathematical technique for this kind of testing (Barros et al., 2008). Here, we applied this procedure to B1 B2 codon doublets, employing signal vectors composed of nucleotide bases. Essentially, the procedure starts by calculating the products of the intensities in the two signal domains for each sample. Multiplying all the intensities of one domain by all the intensities in the other domain results in a data matrix containing all possible combinations of the intensities in the two domains. The outer product of two signal vectors (assumed of the same length m) for the n samples gives n square (m × m) matrices. This procedure corresponds to a mutual weighting of each signal by the other.

Several authors have employed different representations of the Klein-4 group to study DNA and RNA symmetries (Zhang, 1997; Negadi, 2003). Our approach is similar to the 4 × 4 representation proposed in (Duplij and Duplij, 2005) for the analysis of the symmetries of the B1 B2 codon doublets, although our interpretation of the nucleotide-vectors as signals is new. Let us consider a vector “Signal Space”. A column-vector represents a signal
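\[
V = \begin{bmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{bmatrix}
\equiv \begin{bmatrix} C \\ G \\ U \\ A \end{bmatrix}
\tag{5}
\]

And also the transpose

\[ V^T = [\, C \;\; G \;\; U \;\; A \,] \tag{6} \]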
Then, the outer product of V and V^T gives the matrix:

\[
M = V \times V^T =
\begin{pmatrix}
CC & CG & CU & CA \\
GC & GG & GU & GA \\
UC & UG & UU & UA \\
AC & AG & AU & AA
\end{pmatrix}
\tag{7}
\]
We noticed that the four quadrants of M correspond, respectively, to the four cosets of the K × K group acting on AA (3). Therefore, we see that M has the structure

\[
M^{(1)} = \begin{pmatrix} 4 & 3 \\ 1 & 2 \end{pmatrix}
\tag{8}
\]
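A minimal Python sketch of this construction follows. It treats the outer product symbolically, as concatenation of base letters, and compares each 2 × 2 quadrant of M with the sets listed in (3) and the layout in (8). The names V, M, COSETS, LAYOUT and quadrant are illustrative.

```python
# Signal vector in the adopted order C, G, U, A (eq. 5).
V = ["C", "G", "U", "A"]

# Symbolic outer product V x V^T: entry (i, j) is the doublet V[i] followed by V[j].
M = [[b1 + b2 for b2 in V] for b1 in V]

# The four sets of eq. (3), keyed by their numbering.
COSETS = {
    1: {"UG", "AC", "UC", "AG"},
    2: {"AA", "UU", "AU", "UA"},
    3: {"GU", "CA", "GA", "CU"},
    4: {"GC", "CG", "GG", "CC"},
}

# Block layout of eq. (8).
LAYOUT = [[4, 3],
          [1, 2]]


def quadrant(matrix, row_block, col_block):
    """Set of doublets in the 2 x 2 block at (row_block, col_block)."""
    rows = matrix[2 * row_block: 2 * row_block + 2]
    return {entry
            for row in rows
            for entry in row[2 * col_block: 2 * col_block + 2]}


for row in M:
    print(" ".join(row))
print(all(quadrant(M, i, j) == COSETS[LAYOUT[i][j]]
          for i in range(2) for j in range(2)))
```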
Translating M into the W/S alphabet we get

\[
M_{WS} =
\left(\begin{array}{cc|cc}
SS & SS & SW & SW \\
SS & SS & SW & SW \\
\hline
WS & WS & WW & WW \\
WS & WS & WW & WW
\end{array}\right)
\tag{9}
\]
where the block structure with respect to the W/S syntactic classes is apparent. In each block, the doublets have the same combination of intensities that only change from one block to the next. Thus, the intensities, coded in W/S character, combine in the form explained above producing a mutual weighting of each signal by the other. In the same way, translating M into the R/Y categories we get the block matrix:
\[
M^{(2)} = \begin{pmatrix} J & J \\ J & J \end{pmatrix}
\tag{10}
\]

with

\[
J = \begin{pmatrix} YY & YR \\ RY & RR \end{pmatrix}
\tag{11}
\]
Explicitly,

\[
M_{RY} =
\left(\begin{array}{cc|cc}
YY & YR & YY & YR \\
RY & RR & RY & RR \\
\hline
YY & YR & YY & YR \\
RY & RR & RY & RR
\end{array}\right)
\tag{12}
\]
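The two translations can be reproduced mechanically. The Python sketch below rebuilds M, maps every base to its W/S or R/Y class, and checks the block structure of (9) and (12): each W/S block is constant, and each R/Y block equals the same matrix J. The helper names translate and block are illustrative.

```python
V = "CGUA"
# Symbolic outer product of eq. (7).
M = [[b1 + b2 for b2 in V] for b1 in V]

WS = {"C": "S", "G": "S", "U": "W", "A": "W"}  # strong / weak
RY = {"C": "Y", "U": "Y", "G": "R", "A": "R"}  # pyrimidine / purine


def translate(matrix, alphabet):
    """Rewrite every doublet of the matrix in a two-letter class alphabet."""
    return [["".join(alphabet[b] for b in doublet) for doublet in row]
            for row in matrix]


def block(matrix, row_block, col_block):
    """The 2 x 2 block at (row_block, col_block) as a nested list."""
    return [row[2 * col_block: 2 * col_block + 2]
            for row in matrix[2 * row_block: 2 * row_block + 2]]


M_WS = translate(M, WS)
M_RY = translate(M, RY)
J = [["YY", "YR"], ["RY", "RR"]]

for name, matrix in (("M_WS", M_WS), ("M_RY", M_RY)):
    print(name)
    for row in matrix:
        print("  " + " ".join(row))
```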
Therefore, moving in the clockwise (or anticlockwise) direction in each block, we have the same alternating pattern of changes of the R/Y character: 2nd letter → 1st letter → 2nd letter ... Here also, the two signal vectors with intensities coded in R/Y character combine, producing a mutual weighting of each signal by the other.

3.2. Fourfold Genetic Code Table

After taking into account the underlying symmetries of M (7), expressed in the derived matrices MWS (9) and MRY (12), we obtain the arrangement in the mitochondrial (UGR = W, AUR = M) or in the canonical genetic code, in view of the R/Y symmetry of the third base (Table 1). The arrangement of the table in four quadrants is consistent with a decision-tree we introduced previously (see Fig. 5 in Jiménez-Montaño et al., 1996). From that tree, it is clear that
only the S/W character of bases B1 B2 determines the redundancy of codons when both bases are of the same type: SSN codons (first quadrant in Table 1) belong to M1, and WWN codons (third quadrant in Table 1) belong to M2. However, for codons of mixed type, SWN and WSN, it is not possible to decide unless one has more information about the second base: SUN and WCN (with B2 = Y) belong to M1, while SAN and WGN (with B2 = R) belong to M2 (second and fourth quadrants in Table 1, respectively, in both cases). From the new arrangement of the genetic code it follows, with designations introduced in (Yu, 2007), that the four quadrants vary with respect to their sensitivity to genomic GC content, in clockwise direction, as GC-rich, GC in first codon position, AU-rich, and GC in second codon position, respectively. The changes within each quadrant are not only single-base changes, but they are also one-bit changes, corresponding to the R/Y character of the nucleotide. This structure has important consequences for codon mutability between neighboring codons. In the six-dimensional representation of the genetic code (Jiménez-Montaño et al., 1996), which takes into account single-bit changes, the quadrants correspond to four non-overlapping subspaces. With the help of the computer program HyperGCode (Jiménez-Montaño, 2004), available at http://www.uv.mx/ajimenez/#software, one can follow mutational pathways. For example, we recently applied it to the antigenic evolution of influenza A H3N2 virus hemagglutinin (Jiménez-Montaño and Ramos Fernández, 2007), confirming a high weight of the genetic code in the virus “fine tuning”, with possible important consequences for its infectivity.

3.3. Non-commuting (R/Y) Doublets

Comparing the side diagonal in Table 1 with the side diagonals in matrices M (7), MWS (9), and MRY (12), respectively, we see that the doublets GU (SW) and AC (WS) are of the form RY and belong to the octet M1, while UG (WS) and CA (SW) are of the form YR and belong to M2. Previously, Duplij and Duplij (2000, 2005) obtained the matrix M (7), building on the work of Rumer (1968) about the “power” of each of the 16 B1 B2 doublets to determine the encoded amino acid. Taking into account GC pressure and codon usage in evolution, they introduced an ad hoc numerical description of the “strength” of a nucleotide in a codon to determine the amino acid, called determinative degree. To calculate the determinative degree of codons, they assumed this measure to be additive. Since all the above doublets have the same value of the determinative degree according to Duplij and Duplij (2000, 2000a), we would have to consider them equivalent. However, they are not equivalent with respect to redundancy of the codon’s third base. Therefore, the non-commuting property shows that the additive property of the determinative degree of codons is indeed not correct. We assert that the basic difference between the codons in mixed and unmixed codon families is due precisely to these four non-commuting purine/pyrimidine doublets, separated in pairs as AC/CA and GU/UG (see Table 1). Most codon doublets have their “chiral” partner (mirror reflection with respect to the R/Y categorization) in the same doublet-set, M1 or M2, for example GC ↔ CG. However, AC and GU have their partners in the opposite set: AC and GU belong to M1, while CA and UG belong to M2. This characterization of M1 and M2 is consistent with the empirical rule that, for WS and SW doublets, the degeneracy is determined by the R/Y type of the base in the second position of the doublet.
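The decision rule of Section 3.2 and the non-commuting doublets can be checked against the standard genetic code with a short Python sketch. The codon table below is supplied here in the conventional compact layout (bases ordered U, C, A, G at each position, '*' for termination); it is not reproduced from the text. The function names is_fourfold and predicts_fourfold are illustrative.

```python
# Standard genetic code, conventional compact layout (supplied here, not from the text).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
DOUBLETS = [b1 + b2 for b1 in BASES for b2 in BASES]

STRONG = set("CG")
PYRIMIDINE = set("CU")


def is_fourfold(doublet):
    """True if the four codons of the box encode a single amino acid."""
    return len({CODE[doublet + b3] for b3 in BASES}) == 1


def predicts_fourfold(doublet):
    """Decision rule: SS -> M1, WW -> M2, mixed -> M1 only if B2 is a pyrimidine."""
    b1, b2 = doublet
    strong1, strong2 = b1 in STRONG, b2 in STRONG
    if strong1 and strong2:
        return True
    if not strong1 and not strong2:
        return False
    return b2 in PYRIMIDINE


# Octet M1 recovered from the code table itself.
M1 = {d for d in DOUBLETS if is_fourfold(d)}

# Doublets whose reversed partner falls in the opposite octet.
non_commuting = sorted(d for d in DOUBLETS
                       if is_fourfold(d) != is_fourfold(d[::-1]))

print(sorted(M1))
print(non_commuting)
```

Recovering M1 directly from the code table, rather than hard-coding it, keeps the check independent of the lists quoted in Section 2.3.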
In these mixed doublets, a pyrimidine in the second position, as in AC and GU, specifies a fourfold degenerate codon, while a purine, as in CA and UG, indicates a twofold degenerate codon (Lagerkvist, 1978). According to Lehmann and Libchaber (2008), an extra hydrogen bond within the anticodon loop, which occurs when R (but not Y) is present at position 35, additionally stabilizes this base. Since this interaction is asymmetrical (because it does not occur on the codon
Table 2a. Sense-to-sense codon identity changes.

Codon | aa change | Standard | Degeneracy change | Conserves boxes | In codes a
AUA | Met M | Ile I | 3+1→2+2 | Yes | 2, 5, 13, 21
CUU, CUC, CUA, CUG | Thr T | Leu L | 6+4→8+2 | Yes | 3
CGA, CGC | Absent | Arg R | 6→4+0 | Yes | 3
AGA, AGG | Ser S | Arg R | 6+6→8+4 | Yes | 5, 9, 14, 21
AAA | Asn N | Lys K | 2+2→3+1 | No | 9
CUG | Ser S | Leu L | 6+6→7 b +5 | No | 12
AGA, AGG | Gly G | Arg R | 6+4→4+6 | Yes | 13
AAA | Asn N | Lys K | 2+2→3+1 | No | 14

a Numbering from: The genetic codes, compiled by Andrzej (Anjay) Elzanowski and Jim Ostell, National Center for Biotechnology Information (NCBI), Bethesda, MD, USA. http://www.ncbi.nlm.nih.gov/.
b Changes with anomalous odd degeneracy (see text).
side) it may contribute to explain why AC/CA and GU/UG are indeed not commuting (I owe this explanation to an anonymous referee). Besides, the change in the relative order of base-type in the codon may be another factor that contributes to a small change in the stacking energy of the duplex, because the energy is orientationdependent: the energy of codon–anticodon pair formed with RYN codons is different from the energy of codon–anticodon pair formed with YRN codons. 4. Degeneracy in the Canonical Code and Changes in Deviant Codes 4.1. Degeneracy in the Canonical Code The problem of the degeneracy of the canonical code amounts to determine, from all the permutations of four letters, the ones that are allowed without violating the syntactic rules imposed by the three categorizations of the bases (Fig. 1).To approach this problem, we recall some facts from the theory of finite groups. The symmetric group S4 consists of all 4! = 24 permutations of four letters. It has five conjugacy classes, of which three are even and two are odd. The odd ones are the 4-cycle and the transposition. The even ones are the 3-cycle, the double transposition, and the identity element. The 24 elements of the symmetric group on {1, 2, 3, 4}, expressed using the cycle notation (Milson, 2007), and grouped according to their five conjugacy classes (subgroups) are: Permutation types of {1, 2, 3, 4}
Permutation type | Elements | Order (number of elements)
No change (identity) | () | 1
Interchanging 2 (transposition) | (1 2), (1 3), (1 4), (2 3), (2 4), (3 4) | 6
3-Cycle | (1 2 3), (1 3 2), (1 2 4), (1 4 2), (1 3 4), (1 4 3), (2 3 4), (2 4 3) | 8
4-Cycle | (1 2 3 4), (1 2 4 3), (1 3 2 4), (1 3 4 2), (1 4 2 3), (1 4 3 2) | 6
Double transposition | (1 2)(3 4), (1 3)(2 4), (1 4)(2 3) | 3
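The class sizes listed above, and the chain of commutator subgroups S4 ⊃ A4 ⊃ K invoked in this section, can be checked by brute force. The following is a minimal sketch, not part of the original analysis; permutations are represented as tuples of images on {0, 1, 2, 3}, and all function names are illustrative.

```python
from collections import Counter
from itertools import permutations

def cycle_type(p):
    """Cycle lengths of a permutation (tuple of images), longest first."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def compose(p, q):
    """Permutation obtained by applying q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(generators, n):
    """Closure of the generators under composition (a subgroup, since the group is finite)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            new = compose(g, h)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group

def derived_subgroup(group, n):
    """Subgroup generated by all commutators g h g^-1 h^-1."""
    commutators = {compose(compose(g, h), compose(inverse(g), inverse(h)))
                   for g in group for h in group}
    return generated_subgroup(commutators, n)

S4 = set(permutations(range(4)))
class_sizes = Counter(cycle_type(p) for p in S4)   # keyed by cycle type
A4 = derived_subgroup(S4, 4)                       # commutator subgroup of S4
K = derived_subgroup(A4, 4)                        # commutator subgroup of A4

print(dict(class_sizes), len(A4), len(K))
```

In S4 the cycle type determines the conjugacy class, which is why counting by cycle type suffices here.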
The observed degeneracy in the canonical code comprises some single transpositions and the commutator subgroup of S4. This is the subgroup of even permutations, A4 = {identity; 3-cycles; double transpositions}. In turn, the commutator subgroup of A4 is the Klein-4 group, i.e., the normal subgroup comprising the identity and the three double transpositions. The permutations that preserve codon boxes are (i) transitions in the third base, which correspond to the transpositions (1 3) and (2 4) (see Fig. 1); and (ii) double transpositions, which preserve the symmetries of the B1B2 doublets. Cyclic permutations of three elements (3-cycles) at position B3 match degeneracy 1/3, but break the box structure. These transformations permute three elements leaving one unchanged, e.g., AUH (I) and AUG (M) in the canonical code (Table 1), where H = (A, U, C). Now, to explain the degeneracies 1, 2, 3, 4, 6 in the canonical code, and to predict allowed and forbidden deviations, we notice that transpositions correspond to the 2-cycle subgroup C2 and double transpositions correspond to C2 × C2, which is isomorphic to the Klein-4 group. These groups are related to the modulo multiplication groups Mm. C2 is isomorphic to the modulo multiplication groups M3, M4, and M6 (which are the only modulo multiplication groups isomorphic to C2), and C2 × C2 is isomorphic to M8 and M12 (and to no other modulo multiplication groups). Furthermore, we recall that the only moduli m for which the elements of Mm are all self-conjugate are the divisors of 24: 1, 2, 3, 4, 6, 8, 12, 24. Thus, we have the following correspondences among groups:
e → M2 (no change)
C2 → M3, M4, M6 (transposition)
C2 × C2 = K → M8, M12 (2 transpositions)
C2 × C2 × C2 → M24
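These correspondences can be checked numerically. The sketch below assumes that “self-conjugate” means that every element of Mm is its own inverse (a·a ≡ 1 mod m), and it uses the standard codon table in the compact NCBI-style string form with base order U, C, A, G. The helper names are illustrative, not taken from the paper. It also recomputes two of the degeneracy changes listed in Table 2a.

```python
from collections import Counter
from math import gcd

def units(m):
    """Elements of the modulo multiplication group M_m (residues coprime to m)."""
    return [a for a in range(1, m + 1) if gcd(a, m) == 1]

def all_self_inverse(m):
    """True when every element a of M_m satisfies a*a = 1 (mod m)."""
    return all((a * a) % m == 1 % m for a in units(m))

# Moduli below 200 whose multiplication group has only self-inverse elements.
self_inverse_moduli = [m for m in range(1, 200) if all_self_inverse(m)]
# Group orders; an abelian group of self-inverse elements of order 2^k is C2^k.
orders = {m: len(units(m)) for m in self_inverse_moduli}

# Standard genetic code, base order U, C, A, G ('*' marks Ter).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}

def degeneracy(code):
    """Number of codons assigned to each amino acid (or Ter)."""
    return Counter(code.values())

def reassign(code, changes):
    """Copy of the code with some codons given a new meaning."""
    new_code = dict(code)
    new_code.update(changes)
    return new_code

canonical_values = sorted(set(degeneracy(CODE).values()))
agr_to_ser = degeneracy(reassign(CODE, {"AGA": "S", "AGG": "S"}))  # 6 + 6 -> 8 + 4
cug_to_ser = degeneracy(reassign(CODE, {"CUG": "S"}))              # 6 + 6 -> 7 + 5

print(self_inverse_moduli, canonical_values)
print(agr_to_ser["S"], agr_to_ser["R"], cug_to_ser["S"], cug_to_ser["L"])
```

The search bound of 200 is arbitrary; it only illustrates that no modulus beyond 24 appears in that range.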
The first five divisors of 24 match exactly the degeneracies of the canonical code. From the series, one could predict the next value, 8, which has already been observed in the Invertebrate, Trematode, Echinoderm and Flatworm Mitochondrial Codes, where the codons AGA and AGG passed from arginine (R) to serine (S), which thereby acquired two family boxes (see below). We therefore think it is not far-fetched to assume that, at some early stage in the evolution of the code, with fewer amino acids, multiple assignments were frequent. If one or two amino acids gained three family boxes, each would have a total degeneracy of 12; with six family boxes, the degeneracy would be 24, completing the correspondence.

4.2. Degeneracy in Deviant Codes

Several authors have proposed different mechanisms of codon reassignment in a number of nuclear and organelle lineages (see, e.g., Knight et al., 2001a and references therein). Most of the nonstandard codes arise from different alterations in tRNA, among them base modification. For a review of the changes in mitochondria, see Knight et al. (2001). Here we analyze these changes with respect to the conservation of two- and four-member family boxes (Table 1). We separate the sense-to-sense codon identity changes (Table 2a) from the changes involving the termination signal (Ter) (Table 2b). We see from Table 2a that most changes conserve the allowed box structure, with two exceptions: (1) the change of AAA from lysine (K) to asparagine (N), which produces a 3/1 degeneracy analogous to the isoleucine (I)/methionine (M) degeneracy in the canonical code; and (2) the change of CUG identity from leucine (L) to serine (S), which produces what we consider a forbidden odd degeneracy 7/5, with values that are not divisors of 24. Santos and coworkers (Miranda et al., 2006, 2007) have extensively documented this anomalous change of CUG to serine in the cytoplasm of various Candida and Debaryomyces species as an example of ambiguous translation (Knight et al., 2001a). Structural alteration of the translational machinery explains the codon identity change in this case, as well as the change of the leucine CUN codons to threonine in the mitochondria of S. cerevisiae and C. glabrata. Although codons starting with G are resistant to changes, the ones starting with C are not. It is therefore not the strength of the base-pair that determines the resistance to changes. From Table 2b we observe that in most changes involving Ter the box structure is not conserved, because of the frequent 3/1 degeneracy. There are only two cases of 7/5 degeneracy. This is not a surprise, because these changes are associated with the disappearance of release factors, and not with codon–anticodon interaction. However, the most frequent of all codon reassignments, UGA Ter → W, due to wobble expansion, actually creates a two-member family box. Notice that all changes correspond to codons starting with a weak base.

Table 2b. Codon identity changes involving Ter (a).

Codon | aa/Ter change | Standard/Ter | Degeneracy change | Conserves boxes | In codes (b)
AGA, AGG | Ter (*) | Arg (R) | 6 = 4 + 2 | Yes | 2
UCA | Ter (*) | Ser (S) | 6 + 3 = 5 + 4 | No | 22
UGA | Trp (W) | Ter (*) | 1 + 1 = 2 | Yes | 2, 3, 4, 5, 9, 13, 14
UAA, UAG | Gln (Q) | Ter (*) | 2 + 3 = 52 | No | 6
UGA | Cys (C) | Ter (*) | 3 + 2 = 2 + 3 | No | 10
UAA | Tyr (Y) | Ter (*) | 3 + 2 = 2 + 3 | No | 14
UAG | Gln (Q) | Ter (*) | 3 + 2 = 2 + 3 | No | 15
UAG | Leu (L) | Ter (*) | 3 + 6 = 2 + 7 (c) | No | 16
UAG | Leu (L) | Ter (*) | 3 + 6 = 7 + 2 | No | 22

(a) Mostly nonsense-to-sense.
(b) Numbering from: The genetic codes, compiled by Andrzej (Anjay) Elzanowski and Jim Ostell, National Center for Biotechnology Information (NCBI), Bethesda, MD, USA. http://www.ncbi.nlm.nih.gov/.
(c) Changes with anomalous odd degeneracy (see text).

5. Binary Sub-codes

Sub-codes occurring in the quartets of Table 1 become apparent after the codification of matrix M (7) according to the chemical classes. Expressing M in terms of the Gray coding of the bases (Fig. 1), we get a matrix without an apparent block structure:
\[
M_{01} =
\begin{pmatrix}
1111 & 1101 & 1100 & 1110 \\
0111 & 0101 & 0100 & 0110 \\
0011 & 0001 & 0000 & 0010 \\
1011 & 1001 & 1000 & 1010
\end{pmatrix}
\tag{13}
\]
However, for each 2 × 2 block, we have a Hamiltonian cycle: moving in the clockwise direction, one bit changes in a definite order: 3rd bit → 1st bit → 3rd bit . . . . Thus, in each quadrant, we have a sequence of alternating transformations that change only the M/K character and conserve the W/S character of the base (2nd and 4th bits). For example, if we consider the sub-matrix in the third block, we have
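\[
A_3 =
\begin{pmatrix} a_{33} & a_{34} \\ a_{43} & a_{44} \end{pmatrix}
=
\begin{pmatrix} \mathrm{UU} & \mathrm{UA} \\ \mathrm{AU} & \mathrm{AA} \end{pmatrix}
\tag{14}
\]
Because of the Gray code assignments (Fig. 1),
\[
a_{33} = \mathrm{UU} = 0000, \quad
a_{34} = \mathrm{UA} = 0010, \quad
a_{44} = \mathrm{AA} = 1010, \quad
a_{43} = \mathrm{AU} = 1000
\tag{15}
\]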
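The one-bit-per-step cycle in each 2 × 2 block can be verified directly. In the sketch below, the bit assignment C = 11, G = 01, U = 00, A = 10 (first bit amino/keto, second bit strong/weak) is inferred from the entries of (13) and (15) together with the base order (C, G, U, A); it is an assumption of this example rather than a transcription of Fig. 1, and the function names are illustrative.

```python
# Assumed two-bit Gray coding of the bases: first bit M/K, second bit S/W.
GRAY = {"C": "11", "G": "01", "U": "00", "A": "10"}
ORDER = "CGUA"  # order of the bases in the row and column vectors

# Matrix (13): entry (i, j) encodes the doublet B1 B2 = ORDER[i] ORDER[j].
M01 = [[GRAY[b1] + GRAY[b2] for b2 in ORDER] for b1 in ORDER]

def block_cycle(r, c):
    """Clockwise walk around the 2x2 block whose top-left corner is (r, c)."""
    return [M01[r][c], M01[r][c + 1], M01[r + 1][c + 1], M01[r + 1][c]]

def changed_bits(u, v):
    """1-based positions at which two bit strings differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(u, v)) if x != y]

def cycle_steps(cycle):
    """Bits that change at each step of the closed clockwise walk."""
    return [changed_bits(cycle[i], cycle[(i + 1) % 4]) for i in range(4)]

def complement(bits):
    """Apply the transformation 0 -> 1, 1 -> 0 to every bit."""
    return "".join("1" if b == "0" else "0" for b in bits)

steps = {(r, c): cycle_steps(block_cycle(r, c)) for r in (0, 2) for c in (0, 2)}

for corner, walk in steps.items():
    print(corner, block_cycle(*corner), walk)
```

Under this assignment every block shows the same pattern of changes (3rd bit, 1st bit, 3rd bit, 1st bit), and complementing all bits maps each block onto the diagonally opposite one.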
The codons with B1B2 doublets in the first column encode hydrophobic amino acids [F, L, I, M], with 2′-OH tRNA aminoacylation (Class I), while the codons with B1B2 doublets in the second column encode non-hydrophobic amino acids [K, N, Y], with 3′-OH tRNA aminoacylation (Class II) (Y is an ambiguous case), and the stop signal (Ter) (see Section 6). Therefore, mutations within the same column are conservative and are associated with aminoacyl-tRNA synthetases of the same class. In contrast, mutations between different columns are radical and are associated with aminoacyl-tRNA synthetases of opposite classes. We also notice that the sub-matrices in the 1st and 3rd blocks, and the sub-matrices in the 2nd and 4th blocks of (13), respectively, are equivalent under the transformation 0 → 1, 1 → 0, which changes both nucleotide classes M/K and S/W simultaneously.

6. Prediction of the Two Modes of tRNA Recognition

6.1. A Second Genetic Code Table

Aminoacyl-tRNA synthetases establish the genetic code by catalyzing aminoacylation reactions that attach amino acids to tRNA adaptors. As is well known, these enzymes come in two groups, according to sequence and structural patterns, called Class I and Class II aaRSs (Eriani et al., 1990). Each class has exactly 10 amino acids: for Class I, we have {L I M V C W R Y E Q}, and for Class II we have {F G P A S T K H D N}, with the corresponding partition of codons. Besides this classification, codons admit two more, slightly different, groupings. In the first one, codons are classified depending on whether the amino acids are acylated by aminoacyl-tRNA synthetases at the 2′ or at the 3′ hydroxyl group of the tRNA’s last ribose (Jestin and Soulé, 2007; Delarue, 2007). The second one combines codons according to the two modes of tRNA recognition, from the minor and major groove sides of the acceptor stem, respectively (Rodin and Rodin, 2008).
In an early paper, Wentzel (1995) pointed out a correlation between the middle base of a codon and aaRS class. We extended this result by identifying the binary category of the middle base as M/K: seven amino acids in Class I have codons of the form NKN, and eight amino acids of Class II have codons of the form NMN (Jiménez-Montaño, 1999). Here, we elaborate this result with the help of recent findings of Rodin and Rodin (2006, 2008). They suggested a rearrangement of the genetic code that also divides the table in quartets (Table 3 of Rodin and Rodin, 2008). However, with respect to the amino/keto categorization of the bases, the M (A/C) columns are atypical: the codons start with an N, in contrast to the K (U/G) columns where, as usual, the codons end with an N. In addition, on the left-hand side of the table the rows are in the Y/R order, while on the right-hand side they are in the opposite R/Y order. As mentioned by Carter (2008), the unusual form of some codons implies an early equivalence between the first and third codon bases, and hence would change some implications of the wobble hypothesis (Crick, 1966), which attributes the least significance to the third base. Actually, the order of significance, from the third to the first codon base, changes only in the case of NAN codons. This change is responsible for the main difference between the asymmetric pattern of codon differentiation proposed by Delarue (2007) and the model in Table 3 of Rodin and Rodin (2008), as already noticed by the latter authors.

6.2. A Decision-tree to Predict the Mode of tRNA Recognition

From the information in Table 3 of Rodin and Rodin (2008), we built a decision-tree (Fig. 3) from which it is possible to predict the mode of tRNA recognition. As we can see from the figure, recognition from the minor groove side of the acceptor stem occurs for all the codons in the class NUN, and from the major groove side of the acceptor stem for all the codons in the class NCN. However, for the codons in the classes NGN and NAN we need more information. The codons in classes YGN and NAR are recognized from the minor groove side of the acceptor stem, while the codons in classes RGN and NAY are recognized from the major groove side of the acceptor stem. Therefore, according to Fig. 3, we have the minor groove group {NUN, YGN, NAR} and the major groove group {NCN, RGN, NAY}. It is a remarkable example of simplicity-out-of-complexity that such simple rules apply, given the complex nature of the recognition interaction between tRNAs and aminoacyl-tRNA synthetases. Our codon classification agrees with the results of Jestin and Soulé (2007), Delarue (2007) and Rodin and Rodin (2006, 2008), with a few exceptions. Jestin and Soulé (2007) discussed in their paper the special case of the cysteine codons UGY (C), which they assigned to the class of 3′-acylated amino acids. Also, at variance with our result and the result of Rodin and Rodin (2006, 2008), they assigned the tyrosine codons UAY (Y) to the class of 2′-acylated amino acids. Delarue (2007) assigned them to both types. The ambiguous lysine codons AAR (K), whose tRNAs undergo both 2′ and 3′ aminoacylation, were assigned to class 3′ by Jestin and Soulé (2007). As was pointed out by Delarue (2007), AGR codons need a special mention. In the canonical code they belong to the second group of arginine (R) codons; thus Jestin and Soulé (2007) assigned them to the class of 2′-acylated amino acids. However, in deviant codes they are frequently reassigned to serine (S) and glycine (G). Therefore, except for Jestin and Soulé (2007), AGR codons were classified in the major groove group.

Fig. 3. Decision-tree of codon syntactic categories to classify codons according to the two modes of tRNA recognition, from the minor and major groove sides of the acceptor stem, respectively.

7. Discussion and Conclusions
In this paper, we have discussed two different representations of the genetic code arranged in quarters: our Table 1 and Table 3 in (Rodin and Rodin, 2008). Although they rest on very different arguments and serve different purposes, they show similarities from an abstract perspective. In both, there are two quarters with definite properties and two quarters with mixed properties. In the fourfold genetic code (Table 1), these correspond to the quadrants with doublets B1B2 in [SS, WW] and in [SW, WS], respectively. In Table 3 of (Rodin and Rodin, 2008), codons in the first and third quadrants have two definite modes of tRNA recognition, from the minor and from the major groove, respectively, while the second and fourth quadrants contain codons of both kinds. As a byproduct, both tables separate, with a yin-yang-like pattern, the 64 codons into two contiguous regions of 32 codons each. Each table has a corresponding decision tree. The first one matches the tree we introduced previously (see Fig. 5 in Jiménez-Montaño et al., 1996), and the second one, the tree in Fig. 3. In both trees, the decisive factor is the middle base in the context of the other two bases. Both represent classification rules, as normally formulated in machine-learning algorithms, to classify codons with respect to different criteria, either third-base degeneracy or tRNA recognition mode, respectively. In (Jiménez Montaño, 1994), we introduced a different codon classification tree (Fig. 3B of the mentioned paper), which is independent of amino acid assignments. That is, it classifies 64 objects (codons) in 21 boxes (20 amino acid classes and 1 terminator signal class), according to the R/Y character of each base in the codon. For that purpose, we implemented Quinlan’s inference algorithm (Quinlan, 1983), which classifies objects by means of an informational criterion that depends only on the codon pluralities. 
By construction, the tree reflects the distribution of redundancy in different codon positions, which agrees with the one observed in the codon catalogue of the canonical code. Furthermore, we wrote, “By its nature, the present approach cannot tell us much about the structure of primitive codes. However, interpreting the decision tree as a record of the code’s evolution, we make contact with some of the models previously proposed for describing its possible evolution”. Following this idea, in (Jiménez Montaño, 1999) we interpreted the tree as reflecting the evolution of the code by reduction of codon ambiguity (Woese, 1965), in terms of codon–anticodon interaction energy. Independently, Delarue (2007) has found an almost identical tree (Fig. 4 of his paper) by means of an asymmetric process of codon differentiation. We leave the comparison of these models of the evolution of the code, which we think may be closely related, as an open problem worth pursuing in a future publication. We reported a basic difference between mixed and unmixed codon families in terms of non-commuting (R/Y) doublets, AC/CA and GU/UG, and described the degeneracy in the canonical code and the systematic changes in deviant codes in terms of the divisors of 24. Altogether, the results reported help to highlight underlying organization principles of the genetic code, which rest on physical-chemical constraints and the digital nature of the code.
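The Fig. 3 rule summarized in Section 6.2 (minor groove group {NUN, YGN, NAR}; major groove group {NCN, RGN, NAY}) can be written as a short classification function. This is a minimal sketch of that rule only; the function name `recognition_mode` and the returned labels are illustrative choices, not notation from the paper.

```python
from itertools import product

PURINES = {"A", "G"}      # R
PYRIMIDINES = {"C", "U"}  # Y

def recognition_mode(codon):
    """Return 'minor' or 'major' (groove side) for an RNA codon."""
    rna = codon.upper().replace("T", "U")
    if len(rna) != 3 or any(base not in "ACGU" for base in rna):
        raise ValueError(f"not a codon: {codon!r}")
    b1, b2, b3 = rna
    if b2 == "U":                                   # NUN
        return "minor"
    if b2 == "C":                                   # NCN
        return "major"
    if b2 == "G":                                   # YGN vs RGN: first base decides
        return "minor" if b1 in PYRIMIDINES else "major"
    return "minor" if b3 in PURINES else "major"    # NAR vs NAY: third base decides

all_codons = ["".join(triplet) for triplet in product("UCAG", repeat=3)]
minor_group = [c for c in all_codons if recognition_mode(c) == "minor"]
major_group = [c for c in all_codons if recognition_mode(c) == "major"]

print(len(minor_group), len(major_group))
print(recognition_mode("AGA"), recognition_mode("UGU"))
```

The middle base is tested first, and the first or third base is consulted only for NGN and NAN codons, which is the sense in which the middle base is decisive “in the context of the other two bases”.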
With respect to the first question posed at the beginning of this paper, we think that the answer should come from different lines of thought. We may rationalize the quasi-invariance and the simplicity-out-of-complexity in the fourfold structure of the genetic code (Table 1), against the variability of the strategies of codon usage in different organisms and organelles, by making an analogy between the genetic language (Ratner, 1974; Kay, 2000) and linguistics (Saussure, 1986). On the one hand, we have the syntax of the genetic language (‘langue’), that is, the grouping of codons in classes and the probabilistic rules of mutation implicit in the topological structure of the code (Swanson, 1984; Jiménez Montaño, 1994; Jiménez-Montaño et al., 1996; Karasev and Sorokin, 1997; Štambuk, 2000; He et al., 2004). On the other hand, we have the use of the language (‘parole’), employed by different organisms and organelles, which is embodied in the different strategies for codon usage, the number of different isoaccepting tRNAs, and their abundance in the cell. Another line of argument is as follows. Although it is true that the genetic code is a mapping, it is not an arbitrary mapping. The digital and complementary nature of RNA plays a fundamental role in protein synthesis by the ribosome. The reading of the correct tRNA as dictated by messenger RNA (mRNA) induces the degeneracy of the genetic code mapping that, in a certain way, behaves like a program. Now, according to Pattee (2002), “A program is a linguistic structure, by definition, with a set of symbols and grammatical rules. Whatever these symbols and rules are, they must be implemented in physical structures that obey physical laws; but the fundamental property of languages, and all semiotics, is that neither the symbols nor the grammar requires any reference to physical laws. 
In other words physics and semiotics are treated as functionally disjoint categories.” Therefore, once the symbolic description of the translation process has been accomplished, neither its origin nor its physical basis has to be considered anymore. However, in contrast to natural and artificial languages, for the genetic language this is not completely true (Conrad, 1972), because the grammar embodied in the code is not conventional or arbitrary. To select the adequate order of categories to classify the nucleotides, and the correct arrangement of codons in the fourfold table, it is necessary to take into account the mechanistic aspect of translation (Woese, 2001); that is, the physical-chemical constraints of the translation process. Besides, the simplification and the symmetries implicit in Table 1 rest on the fact that the genetic code is a three-letter code, in which the first two positions of the codon are read by the anticodon strictly according to the rules of classic base-pairing. It is the third position in the codon that introduces complications (Agris et al., 2007). The first two base-pairs are completely restricted to canonical Watson-Crick base-pair interactions; the principal function of modified nucleosides at the first anticodon position seems to be to ensure the correct conformation of the three anticodon residues in the context of a seven-nucleotide loop, to allow for correct reading of the codon (Yokoyama and Nishimura, 1995). Recently, Woese (2001) complained, “Only one of the three facets of translation has been “solved,” that is, its coding aspect. Yet what kind of explanation have we here? The coding rules (the dictionary of codon assignments) are known. 
Yet they provide no clue as to why the code exists and why the mechanism of translation is what it is.” The subtle point to consider here is what constitutes a very singular situation in the history of science: the inseparability of physical-chemical, informational and biological aspects in the description of the genetic code. Within the translation apparatus, modified bases, and wobble and codon misreading rules (Crick, 1966; Heckman et al., 1980; Lim and Curran, 2001; Agris et al., 2007) modulate the physics and chemistry of the codon–anticodon interaction. In turn, this modulation obeys selective pressures, which can only be understood in evolutionary terms, reminding us of Dobzhansky’s famous dictum (Dobzhansky, 1973).
Acknowledgements I wrote this paper during a sabbatical at Nova Southeastern University. I want to thank Prof. Matthew X. He for his kind hospitality and useful comments on the manuscript. The suggestions by one anonymous referee, that greatly improved the manuscript, are warmly acknowledged. I am grateful for the support from CONACYT, Project: 81484; Sistema Nacional de Investigadores; PROMEP, Project: UV-CA-197 MEXICO, and Universidad Veracruzana. Finally yet importantly, I want to thank my wife Maria Eta Castellanos, for her patience and suggestions during the preparation of this work.
References Agris, P.F., Vendeix, F.A.P., Graham, W.D., 2007. tRNA’s wobble decoding of the genome: 40 years of modification. J. Mol. Biol. 366, 1–13. Antillon, A., Ortega-Blake, I., 1985. A group theory analysis of the ambiguities in the genetic code: on the existence of a generalized genetic code. J. Theor. Biol. 112, 757–769. Barros, A.S., Pinto, R., Jouan-Rimbaud Bouveresse, D., Rutledge, D.N., 2008. Principal component transform—outer product analysis in the PCA context. Chemom. Intell. Lab. Syst. 93, 43–48. Bertman, M.O., Jungck, J.R., 1979. Group graph of the genetic code. J. Hered. 70, 379–384. Carter, C.W., 2008. Thawing the “Frozen Accident”. Heredity 100, 339–340. Crick, F.H.C., 1966. Codon–anticodon pairing: the wobble hypothesis. J. Mol. Biol. 19, 548–555. Conrad, M.T., 1972. The importance of molecular hierarchy in information processing. In: Waddington, C.H. (Ed.), Towards a Theoretical Biology 4. Essays. Edinburgh University Press, Edinburgh, pp. 222–228. Danckwerts, H.J., Neubert, D., 1975. Symmetries of genetic code-doublets. J. Mol. Evol. 5, 327–332. Delarue, M., 2007. An asymmetric underlying rule in the assignment of codons: possible clue to a quick early evolution of the genetic code via successive binary choices. RNA 13, 161–169. Dobzhansky, T., 1973. Nothing in biology makes sense except in the light of evolution. Am. Biol. Teach. 35, 125–129. Duplij, D., Duplij, S., 2000. Analysis of symmetries of genetic code and the extent of determination of codons. Biophys. Bull. Kharkov State Univ. 488 (1), 60–70 (in Russian). Duplij, D., Duplij, S., 2000a. Determinative degree and nucleotide content of DNA strands. Biophys. Bull. Kharkov Univ. 525, 86–92. Duplij, D., Duplij, S., 2005. DNA sequence representation by trianders and determinative degree of nucleotides. J. Zhejiang Univ. Sci. 6B (8), 743–775. Eriani, G., Delarue, M., Poch, O., Gangloff, J., Moras, D., 1990. 
Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347, 203–206. Findley, G.L., Findley, A.M., McGlynn, S.P., 1982. Symmetry characteristics of the genetic code. Proc. Natl. Acad. Sci. U.S.A. 79, 7061–7065. Harris, Z., 1998. Language and Information. Columbia University Press, New York. He, X.M., Petoukhov, S.V., Ricci, P.E., 2004. Genetic code, hamming distance and stochastic matrices. Bull. Math. Biol. 66 (5), 1405–1421. Heckman, J.E., Sarnoff, J., Alzner-Deweerd, B., Yin, S., Rajbhandary, U.L., 1980. Novel features in the genetic code and codon reading patterns in Neurospora crassa mitochondria based on sequences of six mitochondrial tRNAs. Proc. Natl. Acad. Sci. U.S.A. 77 (6), 3159–3163. Hood, L., Galas, D., 2003. The digital code of DNA. Nature 421, 444–447. Jestin, J.-L., 2006. Degeneracy in the genetic code and its symmetries by base substitutions. C. R. Biol. 329, 168–171. Jestin, J.-L., Soulé, C., 2007. Symmetries by base substitutions in the genetic code predict 2′ or 3′ aminoacylation of tRNAs. J. Theor. Biol. 247, 391–394. Jiménez Montaño, M.A., 1994. On the syntactic structure and redundancy distribution of the genetic code. BioSystems 32, 11–23. Jiménez-Montaño, M.A., de la Mora-Basáñez, R., Pöschel, T., 1996. The hypercube structure of the genetic code explains conservative and non-conservative aminoacid substitutions in vivo and in vitro. BioSystems 39, 117–125. Jiménez-Montaño, M.A., 1999. Protein evolution drives the evolution of the genetic code and vice versa. BioSystems 54, 47–64. Jiménez-Montaño, M.A., 2004. Applications of hyper genetic code to bioinformatics. J. Biol. Syst. 12, 5–20. Jiménez-Montaño, M.A., Ramos Fernández, A., 2007. An empirical method to identify positively selected sites in antigenic evolution. In: Argüello-Astorga, G.R., González, A.R., Méndez-Salinas, E. (Eds.), Proc. V National Congress of Virology. Mexican Society of Biochemistry, Mexico. 
Jungck, J.R., 1978. The genetic code as a periodic table. J. Mol. Evol. 11, 211–224. Karasev, V.A., Sorokin, S.G., 1997. Topological structure of the genetic code. Russ. J. Genet. 33, 622–628. Kay, L.E., 2000. Who Wrote the Book of Life? A History of the Genetic Code. Writing Science. Stanford University Press, Stanford, CA. Knight, R.D., Landweber, L.F., Yarus, M., 2001. How mitochondria redefine the code. J. Mol. Evol. 53, 299–313.
Knight, R.D., Freeland, S.J., Landweber, L.F., 2001a. Rewiring the keyboard: evolvability of the genetic code. Nat. Rev. 2, 49–58. Lagerkvist, U., 1978. Two out of three: an alternative method for codon reading. Proc. Natl. Acad. Sci. U.S.A. 75 (4), 1759–1762. Lehmann, J., 2000. Physico-chemical constraints connected with the coding properties of the genetic system. J. Theor. Biol. 202, 129–144. Lehmann, J., Libchaber, A., 2008. Degeneracy of the genetic code and stability of the base pair at the second position of the anticodon. RNA 14, 1264–1269. Lim, V.I., Curran, J.F., 2001. Analysis of codon:anticodon interactions within the ribosome provides new insights into codon reading and the genetic code structure. RNA 7, 942–957. MacDónaill, D., Manktelow, M., 2004. Molecular informatics: quantifying information patterns in the genetic code. Mol. Simul. 30 (5), 267–272. Magini, M., Hornos, J.E.M., 2003. A dynamical system for the algebraic approach to the genetic code. Braz. J. Phys. 33 (4), 825–830. Milson, R., 2007. CycleNotation, http://planetmath.org/encyclopedia/CycleNotation.html. Miranda, I., Silva, R., Santos, M.A.S., 2006. Evolution of the genetic code in yeasts. Yeast 23, 203–213. Miranda, I., Rocha, R., Santos, M.C., Mateus, D.D., Moura, G.R., Carreto, L., Santos, M.A.S., 2007. A genetic code alteration is a phenotype diversity generator in the human pathogen Candida albicans. PLoS ONE (10), e996, www.plosone.org. Negadi, T., 2002. On the symmetries of the 16 genetic code-doublets. In: Gazeau, J.P., Kerner, R. (Eds.), Proc. XXIV Int. Coll. Group Theoretical Methods in Physics. Institute of Physics Publishing, Ltd., Paris, France, Bristol, UK, pp. 687–690. Negadi, T., 2003. Rumer’s transformation, in biology, as the negation, in classical logic. Int. J. Quant. Chem. 94, 65–74. Pattee, H.H., 2002. The origins of Michael Conrad’s research programs (1964–1979). BioSystems 64 (1–3), 5–11. Petoukhov, S.V., 1999. Genetic code and the ancient Chinese book of changes. 
Symmetry: Cult. Sci. 10 (3–4), 211–226. Quinlan, J.R., 1983. Learning efficient classification procedures. In: Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (Eds.), Machine Learning: an Artificial Intelligence Approach. Morgan Kaufmann, pp. 463–482. Ratner, V.A., 1974. The genetic language. In: Rosen, E., Snell, F.M. (Eds.), Progress of Theor. Biol. Academic Press, New York.
Ribas de Pouplana, L., Schimmel, P., 2001. Aminoacyl-tRNA synthetases: potential markers of genetic code development. Trends Biochem. Sci. 26 (10), 591–596. Rodin, S.N., Rodin, A.S., 2006. Partitioning of aminoacyl-tRNA synthetases in two classes could have been encoded in a strand-symmetric RNA world. DNA Cell Biol. 25, 617–626. Rodin, S.N., Rodin, A.S., 2008. On the origin of the genetic code: signatures of its primordial complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity 100, 341–355. Rogalski, M., Karcher, D., Bock, R., 2008. Superwobbling facilitates translation with reduced tRNA sets. Nat. Struct. Mol. Biol. 15 (2), 192–198. Rumer, Y.B., 1968. Systematization of codons in the genetic code. Dokl. Akad. Nauk. SSSR 183, 225–226 (in Russian). Sánchez, R., Morgado, E., Grau, R., 2005. A genetic code Boolean structure. I. The meaning of Boolean deductions. Bull. Math. Biol. 67 (1), 1–14. Saussure, F.D., 1986. In: Bally, C., Sechehaye, A. (Eds.), Course in General Linguistics in collaboration with Albert Riedlinger. McGraw-Hill Book Company, New York. Štambuk, N., 2000. Universal metric properties of the genetic code. Croat. Chem. Acta 73 (4), 1123–1139. Swanson, R., 1984. A unifying concept for the amino acid code. Bull. Math. Biol. 46, 187–203. Tong, K.-L., Wong, T.-F., 2004. Anticodon and wobble evolution. Gene 333, 168–177. Wentzel, R., 1995. Evolution of the aminoacyl-tRNA synthetases and the origin of the genetic code. J. Mol. Evol. 40, 545–550. Wilhelm, T., Nikolajewa, S., 2004. A new classification scheme of the genetic code. J. Mol. Evol. 59, 598–605. Woese, C.R., 1965. On the evolution of the genetic code. Proc. Natl. Acad. Sci. U.S.A. 54, 1546–1552. Woese, C.R., 2001. Translation: in retrospect and prospect. RNA 7, 1055–1067. Yokoyama, S., Nishimura, S., 1995. Modified nucleosides and codon recognition. In: Söll, D., RajBhandary, U.L. (Eds.), tRNA, Structure, Biosynthesis and Function. American Society for Microbiology, Washington, DC, pp. 
207–223. Yu, J., 2007. A content-centric organization of the genetic code. Genom. Prot. Bioinf. 5 (1), 1–6. Zhang, C.-T., 1997. A symmetrical theory of DNA sequences and its applications. J. Theor. Biol. 187, 297–306.