Rumer’s transformation: A symmetry puzzle standing for half a century


BioSystems 187 (2020) 104036

Contents lists available at ScienceDirect

BioSystems journal homepage: www.elsevier.com/locate/biosystems

Diego L. Gonzalez ⁎, Simone Giannerini, Rodolfo Rosa

IMM-CNR, Bologna Unit, Italy
Department of Statistical Sciences, University of Bologna, Italy

ARTICLE INFO

Keywords: Genetic code; Rumer’s transformation; Symmetry; Degeneracy; Tessera code

ABSTRACT

In 1966, only a few months after the complete elucidation of the standard nuclear genetic code (Kay, 2000), the Russian theoretical physicist Yury Borisovich Rumer uncovered a particular symmetry (Rumer, 1966): when the keto-amino transformation (also known as Rumer’s transformation) is applied to the bases of a codon, the degeneracy of the transformed codon changes. In particular, if the amino acid associated with the starting codon has degeneracy 4, then the amino acid associated with the transformed codon has degeneracy 1, 2 or 3 (and vice versa). Half a century after this discovery, and despite the universality of Rumer’s symmetry, little is known about its origin and its possible biological significance. In this article we show that Rumer’s symmetry could have originated in an ancestral version of the genetic code, i.e., the pre-early code, and is a natural consequence of the stereo-chemical symmetries of the ancestral synthesis machinery working around such a code (Gonzalez et al., 2019). Moreover, the conservation of Rumer’s symmetry through evolutionary periods suggests a connection with key biological features. In this respect, intriguing possibilities include error detection/correction, control over the synthesis of proteins, and frame maintenance. To a certain extent, such ideas have been explored in the framework of a mathematical model of the genetic code, the non-power model (Gonzalez, 2004; Gonzalez, 2008; Gonzalez et al., 2016), whose definition of dichotomic classes naturally includes Rumer’s symmetry (Gonzalez, 2008; Gonzalez et al., 2006, 2008), and of the theory of circular codes (Arquès and Michel, 1996; Gonzalez et al., 2011; Fimmel et al., 2015).

1. Introduction

What is Rumer’s transformation? In 1966, the Russian theoretical physicist Yury Borisovich Rumer published a paper describing for the first time the existence of a symmetry in the genetic code (Rumer, 1966; Kay, 2000). When the bases of a codon are transformed with the keto-amino transformation (also known as Rumer’s transformation), i.e., T,C,A,G -> G,A,C,T, the transformed codon belongs to a different degeneracy class than the original one. For example, consider the codon TTC, which codes for Phenylalanine (Phe) in the standard nuclear genetic code. Rumer’s transformation applied to this codon gives the codon GGA, which codes for Glycine (Gly). While Phe has degeneracy 2, Gly has degeneracy 4; that is, the transformed codon codes for an amino acid whose degeneracy is different from that of the original one. This property is shared by all the codons of the standard genetic code. Furthermore, it suffices to transform only the first two letters of the codon to obtain this (anti)symmetry. Thus, Rumer’s symmetry can be defined between quartets of the genetic code, that is, between groups of 4 codons that share the first two letters. In our example, the quartet TTN (where N means any of the four bases T,C,A,G), which has degeneracy 2+2 (coding for Phe and Leu2), is mapped onto the quartet GGN, which has a different degeneracy, i.e., 4 (it forms a family of codons coding for Gly). Any family quartet is mapped onto a non-family one and vice versa (see Fig. 1).

Rumer’s symmetry is a conserved property shared by almost every extant variant of the genetic code. Curiously, the exceptions involve codons of the amino acid Serine (quadrant AGN) in mitochondrial variants (codes 5, 9, 14, 21 listed in http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), or the anti-symmetric ones corresponding to the amino acid Leucine (quadrant CTN) for nuclear variants (variants 12, 26). One variant of the mitochondrial code (22, Scenedesmus obliquus mitochondrial code) breaks the symmetry in the quartet TCN, which also corresponds to Ser codons. In any case, the variants that break Rumer’s symmetry involve, in the majority of cases, only a couple of quadrants, i.e., AGN and CTN (minimal symmetry breaking). All the other variants, including the paradigmatic cases of the Standard Nuclear Code and the Vertebrate Mitochondrial Code, preserve Rumer’s symmetry. Observe that this set includes the most symmetric codes such

⁎ Corresponding author at: IMM-CNR, Bologna Unit, Italy.

https://doi.org/10.1016/j.biosystems.2019.104036 Received 31 August 2019; Received in revised form 18 September 2019; Accepted 19 September 2019 Available online 04 October 2019 0303-2647/ © 2019 Elsevier B.V. All rights reserved.


Table 1 Degeneracy distribution of the vertebrate mitochondrial genetic code.

Amino acid degeneracy    # of amino acids
2                        16
4                        8

as the Vertebrate Mitochondrial and the Nuclear Euplotes.

Rumer’s work represents one of the first attempts to apply the powerful mathematical lens of symmetry to genetics. He was convinced that the group-theoretic approach based on symmetry, which had been so fruitful for the study of conserved quantities in physics, is indeed one of the main theoretical tools for understanding the organization of genetic information. His background in theoretical physics probably motivated this belief (he collaborated in this field with Einstein and Landau, among others). The idea that the great achievements made possible by conservation laws in quantum mechanics could be extrapolated as part of a mathematical approach to biology was indeed in the air at the time the genetic code was uncovered; see, for example, Gamow (1954). In fact, Rumer achieved the first result in 1966 but published other observations about the symmetries of the genetic code early on (an introduction to Rumer’s work, together with an English translation of his first papers on the symmetry of the genetic code, can be found in Fimmel and Strüngmann (2016)).

It is known that the more universal an evolutionary feature is, the more ancient its origin should be and the more important its associated biological significance. Different papers have been written about Rumer’s contributions; see, for example, Guilloux and Jestin (2012), Shcherbak (1989), and references therein. But despite Rumer’s symmetry being a universal property of the genetic code, its origin and its biological significance are still poorly understood.

In this article we aim to shed light on the origin of Rumer’s symmetry. First, we review some results about the evolution of the genetic code and describe the pre-early code, i.e., a code from which the early code could have evolved. The early code is considered the precursor of the Universal Genetic Code characterizing the Last Universal Common Ancestor (LUCA) and is supposed to possess a degeneracy distribution coinciding with that of the vertebrate mitochondrial genetic code (Watanabe and Yokobori, 2014). We show that the pre-early code was probably constituted of codons of four letters (tesserae, from the Greek τεσσερα = four) that were read by symmetric ancient adaptors with anti-codons of the same length. This ancestral pre-early code naturally possesses Rumer’s symmetry, which arises as a consequence of the symmetry of the chemical molecules that originated the pre-early code. The evolutionary transition between the pre-early and the early codes implies some sort of symmetry breaking, as is expected in the transition to more complex instances of any physical system. This occurred in such a way as to preserve Rumer’s symmetry to the detriment of other symmetries that have been only partially preserved. In Section 2 we describe the tessera model as a prototype of the pre-early code, and in Section 3 we show how the evolutionary transition between the pre-early and early codes explains Rumer’s symmetry (and other partially conserved symmetries) in extant codes. In Section 4 we offer some conclusions and a discussion of open problems.

Fig. 1. Graphic representation of Rumer’s anti-symmetry. Grey quartets correspond to amino acids of degeneracy 4, white ones to degeneracy 2 + 2. If we apply the Keto-Amino transformation to a grey quartet we obtain a white quartet and vice versa (see text).

2. The tessera model of the pre-early code, a possible ancestor of all extant codes

The degeneracy structure of the vertebrate mitochondrial genetic code is of particular importance in evolutionary molecular biology. Due to its inherent symmetry and simplicity, it has been proposed as a model for the precursor of the Universal Genetic Code of the Last Universal Common Ancestor (LUCA), the so-called early code (Watanabe and Yokobori, 2014). The main rationale is that the Universal Genetic Code evolved from an early code having the same degeneracy structure as the present vertebrate mitochondrial genetic code. We have previously shown (Gonzalez et al., 2016, 2012) that, on the basis of a mathematical model of the genetic code (Gonzalez, 2004; Gonzalez et al., 2016), the degeneracy structure of the vertebrate mitochondrial genetic code (see Table 1) can be obtained by using only symmetry properties of the chemical molecules involved (Gonzalez et al., 2019). The surprising consequence of such an approach is that, in order to achieve this task, we need to resort to a genetic code based on codons and adaptors of length four. We call this code the “tessera code”, a special set of 64 codons of length four with specific symmetry properties.

Is this possible? Recent work hypothesizes that the genetic code originated by stereo-chemical affinity between long pieces of ancient nucleic acids and amino acids, followed by a successive reduction of the length of “codons” and anti-codons until the optimal length of three was reached (Baranov et al., 2009). In this framework, at some time before this final transition, the mean length of codons was necessarily four. Such a state with non-specific tetracodons leads inevitably to the tessera pre-early code. Here we mention only that Jukes’ proposal of a primitive code (Jukes, 1968) is the most probable for a random selection of length-4 codons and that the transition from generic codons to specific tesserae is ensured by evolutionary pressure favouring accuracy, due to the error-detecting properties of the set of tesserae; the evolutionary processes involved are discussed in detail in Gonzalez et al. (2019, 2016).

At this point it is useful to study the coding potential of genetic codes with codons of any length, as compared with the present solutions based on length 3 (which are shared universally by extant organisms). In Table 2 we show the number of possible length “n” codons composed of four different bases T(U), C, A, G. There is only one optimal solution with 64 codons, which corresponds to length 3. However, if we consider

symmetry, we found other possible solutions. We also found that the solution of length 3, having only 16 symmetric codons, is not optimal with respect to this aspect. In Table 3 we show the number of symmetric codons as a function of their length “n”. Symmetric codons are those that satisfy at least one of the two following invariances:

i) They are invariant under the reverse transformation, i.e., they are palindromic codons; example for n = 3: TAT.
ii) They are invariant under the self-complementary transformation, i.e., the complement followed by the reverse; example for n = 4: TATA.

Remarkably, codons with the self-complementary symmetry cannot exist for any odd codon dimension n. Palindromic codons, however, exist in all dimensions. Despite this fact, the total quantity of symmetric codons grows as the powers of two. This is because, when passing from an even dimension to an odd dimension, the quantity of palindromic codons is multiplied by four ($2^2$), while the number of self-complementary codons drops to 0.

Now, let $p_{2n}$ be the number of palindromic codons in dimension $2n$ (even). The number of self-complementary codons is the same, i.e., $s_{2n} = p_{2n}$; thus, the total number of symmetric codons in dimension $2n$ is $q_{2n} = 2p_{2n}$. In an odd dimension $2n+1$, the number of palindromic codons is $p_{2n+1} = 2^2 p_{2n}$ and the number of self-complementary codons is $s_{2n+1} = 0$; thus, the total number of symmetric codons in dimension $2n+1$ is

$q_{2n+1} = s_{2n+1} + p_{2n+1} = 4p_{2n} = 2q_{2n}$.

When passing from an odd to an even dimension, instead, the number of palindromic codons is kept constant but an equal quantity of self-complementary ones is added. That is,

$q_{2n} = p_{2n} + s_{2n} = p_{2n-1} + s_{2n} = p_{2n-1} + p_{2n-1} = 2p_{2n-1} = 2q_{2n-1}$.

Focusing on codons of length 4, we have 32 symmetric codons, which corresponds to one half of the total number of codons of length 3. This property holds only for this special case that links length-3 and length-4 codons.

Table 2 Number of codons as a function of codon length, “n”.

# bases per codon = n    Total # of codons = $4^n$
1                        4
2                        16
3                        64
4                        256
5                        1024
6                        4096
7                        16384
8                        65536
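The counting argument above can be checked by brute force. The following is a minimal sketch, not part of the original analysis: the function names are illustrative, and the complement is assumed to be the usual T<->A, C<->G pairing. It enumerates all codons of length n and counts those that are palindromic, self-complementary, or either.

```python
from itertools import product

BASES = "TCAG"
# Assumed Watson-Crick complement (the SW transformation of the text).
COMPLEMENT = {"T": "A", "A": "T", "C": "G", "G": "C"}


def is_palindromic(codon):
    """Invariant under the reverse transformation."""
    return codon == codon[::-1]


def is_self_complementary(codon):
    """Invariant under the complement followed by the reverse."""
    return codon == "".join(COMPLEMENT[b] for b in reversed(codon))


def count_symmetric(n):
    """Return (palindromic, self-complementary, symmetric) counts for length n."""
    pal = selfc = sym = 0
    for letters in product(BASES, repeat=n):
        codon = "".join(letters)
        p = is_palindromic(codon)
        s = is_self_complementary(codon)
        pal += p
        selfc += s
        sym += p or s
    return pal, selfc, sym


if __name__ == "__main__":
    for n in range(1, 9):
        p, s, q = count_symmetric(n)
        # The last column is the closed form 2^(n+1) for comparison.
        print(n, p, s, q, 2 ** (n + 1))
```

Exhaustive enumeration is affordable here because the largest case, n = 8, has only 65536 codons.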
Moreover, these 32 codons of length 4 are exactly one half of the tessera code, as shown in Gonzalez et al. (2012) (see Table 4). The first column in Table 4 corresponds to palindromic tesserae, i.e., tesserae that do not change when read in the reverse direction. The second column corresponds to self-complementary tesserae, that is, tesserae that are invariant when complemented and reversed. In our approach, the symmetry both of codons and of the ancient adaptors that read these codons determines the degeneracy of a specific amino acid. Indeed, if we read the tessera codons of Table 4 with adaptors able to i) read in both directions (direct and reverse) and ii) read self-complementary tesserae (i.e., they possess two anti-tesserae, one the self-complement of the other), then each adaptor is always associated with two different tesserae of Table 4. In brief, all these tesserae code for amino acids of degeneracy two.

Hence, it is possible to derive the complete tessera code that explains the degeneracy distribution of the vertebrate mitochondrial genetic code (see Table 5). It suffices to observe that any tessera of the first column is composed of two di-nucleotides. The first di-nucleotide ranges over all 16 possible di-nucleotides, and the second is obtained from the first by observing that the fourth letter is equal to the first and the third letter is equal to the second. In other words, the second di-nucleotide is obtained by applying the identity transformation, i.e., T,C,A,G <-> T,C,A,G, followed by the reverse one. For example, starting with the di-nucleotide AT, we apply the identity transformation, I(AT)=AT, we reverse it, R(AT)=TA, and, finally, we append it to the original di-nucleotide AT, obtaining the palindromic tessera ATTA. It is also easy to see that the second column can be characterized in an analogous way, with the identity replaced by the strong-weak transformation (or complement), i.e., T,C,A,G <-> A,G,T,C. For example, starting with AT, we apply the complement transformation (SW), which gives SW(AT)=TA; we reverse this di-nucleotide, obtaining AT; and finally we append it to the original one, obtaining ATAT, which is indeed a self-complementary tessera from the second column of Table 5.

The identity (I) and the complement (SW), together with the Keto-Amino (KM) transformation, i.e., T,C,A,G <-> G,A,C,T, and the Purine-Pyrimidine (YR) transformation, i.e., T,C,A,G <-> C,T,G,A, form a group isomorphic to the Klein V group of transformations. A group of transformations is characterized by a set of properties that need to be satisfied by the composition of the transformations (ordered successive application of two of them). Such properties are usually displayed as a multiplication table in which the row identifies the first transformation, the column the second one, and the intersection cell the result of the composition. In Table 6 we report the multiplication table of the four transformations I, SW, YR, KM, forming a Klein V group.

Now, to complete the construction of the full set of tesserae, it suffices to apply the two transformations that complete the Klein group, i.e., KM and YR, to any di-nucleotide, followed by the reverse transformation, and append the result to the original one. The two rightmost columns of Table 5 are obtained in this way. Reading the tessera set by cognate symmetric adaptors (depicted in Fig. 2), we obtain 16 elements of degeneracy 2 and 8 elements of degeneracy 4. Observe that a non-symmetric adaptor with an anti-codon of length four can read four different tesserae, giving origin to an amino acid family (an amino acid coded by four different tesserae). In this way the degeneracy distribution of the vertebrate mitochondrial genetic code is exactly described (see Table 1).

Fig. 2. An ancient adaptor possessing the two fundamental symmetries, i.e., reverse and self-complementarity, in its different spatial configurations. Reverse symmetry is achieved by allowing Watson-Crick pairing in both directions. Self-complementarity is achieved by adding an additional anticodon which is the self-complement of the original.

The tessera set is composed of exactly 64 four-base codons (a subset of the 256 possible tetracodons, see Table 2) and is invariant under the action of the Klein V group of transformations. The solution is unique and cannot be obtained by applying other sub-groups of the dihedral group (sub-groups of the group of transformations of a square that also naturally arise in all the possible symmetry transformations of 4 bases in DNA or RNA; Rotman, 1995). Some remarks are in order; see details in Gonzalez et al. (2016). First of all, the set is error detecting: any point mutation of a tessera produces a tetracodon that is not a tessera, so any point mutation can be detected. Second, any quadrant (group of four tesserae pertaining to a given column in Fig. 2, and characterized by the same transformation between the two first letters) is invariant with respect to the symmetry transformations of the Klein V group (I, SW, YR, KM) and their reverse. Seligmann (2012) found that, in a framework of nucleotide transformation/deletion, tesserae are coded preferentially in predicted mitochondrial peptides. It would be interesting to assess experimentally whether tesserae are more abundant in peptides that are hypothesized to be coded by generic tetracodons (Seligmann, 2017 and references therein).
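The construction of the tessera set described in this section, and its error-detecting property, can be illustrated with a short sketch. The code below is only an illustration under the definitions given in the text (a tessera is a di-nucleotide followed by the reverse of its image under I, SW, YR or KM); the helper names are hypothetical and not taken from the cited works.

```python
BASES = "TCAG"

# The four base transformations, written as images of T, C, A, G.
TRANSFORMS = {
    "I": dict(zip(BASES, "TCAG")),   # identity
    "SW": dict(zip(BASES, "AGTC")),  # complement (strong-weak)
    "YR": dict(zip(BASES, "CTGA")),  # purine-pyrimidine
    "KM": dict(zip(BASES, "GACT")),  # keto-amino (Rumer)
}


def tessera(dinucleotide, name):
    """Append the reversed image of the di-nucleotide under transformation `name`."""
    f = TRANSFORMS[name]
    image = "".join(f[b] for b in dinucleotide)
    return dinucleotide + image[::-1]


def tessera_set():
    """Map every tessera to the transformation (column) that generates it."""
    return {
        tessera(b1 + b2, name): name
        for name in TRANSFORMS
        for b1 in BASES
        for b2 in BASES
    }


def point_mutants(word):
    """Yield every word that differs from `word` in exactly one position."""
    for i, old in enumerate(word):
        for new in BASES:
            if new != old:
                yield word[:i] + new + word[i + 1:]


TESSERAE = tessera_set()

if __name__ == "__main__":
    print(len(TESSERAE))                             # size of the tessera set
    print(tessera("AT", "I"), tessera("AT", "SW"))   # the two worked examples
    detected = all(m not in TESSERAE for t in TESSERAE for m in point_mutants(t))
    print(detected)                                  # every point mutation leaves the set
```

The check is exhaustive: each of the 64 tesserae has 12 single-base mutants, and none of them is itself a tessera.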

3. Rumer’s symmetry in the pre-early and early codes

As described in Section 1, Rumer’s symmetry consists in the change of degeneracy when the Keto-Amino (or Rumer) transformation is applied to the first two letters of a codon. In the following we show that: i) this symmetry is naturally present in the tessera set (pre-early genetic code); ii) other symmetries that are present in extant codes are also a consequence of the symmetries present in the ancestral tessera code; iii) Rumer’s symmetry and the additional symmetries mentioned in ii) that are observed in extant codes can be derived by means of a mapping from the tessera set to the codon set of extant codes.

We first show that Rumer’s symmetry is present in the tessera code. From Table 5 we can see that Rumer’s transformation applied to the first two letters of any tessera of the first column produces a tessera of the last column on the same row. The converse is also true. Since tesserae of the first column belong to degeneracy 2 + 2 and tesserae of the last column to degeneracy 4, this half of the tessera set satisfies Rumer’s symmetry. Similarly, it is easy to see that the second column is mapped onto the third and vice versa. This, again, implies a transformation from quartets of degeneracy 2 + 2 onto quartets of degeneracy 4. Thus, the whole tessera set has the property of being (anti)symmetric with respect to the Keto-Amino transformation of the first two letters. This proves assertion i).

Regarding ii), note that the other two possible chemical transformations of the Klein V group are also related to symmetry properties of the tessera set. When applied to the first two letters of a given tessera, the SW, or complementary, transformation maps the first column onto the second one, and the third onto the fourth. Since it transforms tesserae between columns of the same degeneracy, i.e., 2 + 2 into 2 + 2 and 4 into 4, respectively, the SW transformation is a degeneracy-preserving transformation. Analogously, the YR transformation maps the first column onto the third, and the second onto the fourth. Thus, similarly to Rumer’s transformation, the YR transformation is also a degeneracy-breaking transformation. Finally, the identity transformation trivially maps the columns onto themselves and is a degeneracy-preserving transformation. In conclusion, the tessera set has two degeneracy-breaking transformations (KM and YR) and two preserving ones (SW and identity).

Extant codes, unlike the tessera code, have only two transformations that globally preserve or break degeneracy: the identity (preserving) and the KM (breaking). However, the other two chemical transformations, SW and YR, do not act randomly. We will see that, instead, they are uniformly distributed (maximum entropy) and arise as a consequence of the evolutionary transition between the primeval tessera code and more modern codes. In the transition from tesserae to extant codons we pass from an even (n = 4) to an odd (n = 3) dimensionality of the codons. As discussed above, self-complementarity is not possible in codons of odd length. Hence, in dimension three this symmetry is lost, so that the self-complementary transformation cannot define a column of quartets of degeneracy 2 + 2. Similarly, it is not possible to keep the degeneracy-breaking symmetry of the YR transformation. How did nature resolve this puzzle? In extant codes, the complementary (SW) and Purine-Pyrimidine (YR) transformations are evenly distributed and form a remarkable symmetry structure separating mixed codons (codons of the form WSN or SWN) from non-mixed codons (codons of the form SSN or WWN). In Fig. 3a–d we show the effect of the SW and YR transformations in these halves of extant codes defined by mixed and non-mixed codons.

In Fig. 3a the SW transformation preserves degeneracy on the set of non-mixed codons; in Fig. 3b we can appreciate that, on the same set, the YR transformation breaks degeneracy. In Fig. 3d we see that the SW transformation breaks degeneracy on the set of mixed codons; in Fig. 3c, the YR transformation preserves degeneracy on the same set. Other interesting symmetries regarding degeneracy are also present in extant codons, for example those involving only one of the first two letters of a codon, inverse codons, or compositions of these transformations.

Regarding point iii), we can observe that the transition from the pre-early to extant codes can be considered as a sequence of mappings that, starting from the tessera set, ends up on a prototype of the early code characterized by the standard set of 64 (3-base) codons. The key finding is that in the tessera world the transformations between bases of a tessera play the role of the nucleotides in the world of codons. In order to appreciate this, note that in extant genetic codes there exists an interesting symmetry: a central letter C defines quartets of degeneracy 4 and a central letter A defines quartets of degeneracy 2 + 2. Thus we can infer that the identity transformation on the tessera set is mapped onto a central letter A in codons. Likewise, the keto-amino transformation is mapped onto a central letter C of a codon. In order to complete the mapping, we can observe that there are some constraints: Rumer’s symmetry is conserved, so that quartets related by Rumer’s transformation have different degeneracy. For example, the quartets TGN and GTN possess degeneracy non-4 and 4, respectively. Now, since dinucleotides of the form WW define degeneracy 2 + 2 codons, and those of the form SS degeneracy 4 ones, we can make the ansatz that the same is valid for length-3 codons.

Thus, the essential structure of the mapping between tesserae and codons entails that the transformations between adjacent letters of a tessera become the nucleotides of a codon, with an additional keto-amino transformation of the first two bases of the codon when they are either Strong + T or Weak + G. In particular, given a tessera b1b2b3b4 we can have three chemical transformations between adjacent letters: t12 = f(b1b2) between b1 and b2, t23 = f(b2b3) between b2 and b3, and t34 = f(b3b4) between b3 and b4. We propose that t12 and t23 be mapped onto the first and second nucleotide of the codon, respectively (x1; x2). This correspondence is shown in Table 3. Moreover, the fourth letter b4 is mapped onto the third nucleotide of the codon, x3. A schematic representation of the mapping is presented in Fig. 4. Note that, according to this mapping, two columns of the tessera set are mapped onto corresponding columns of the genetic code, so that t23 = I is mapped onto NAN codons (degeneracy non-4), and t23 = KM is mapped onto NCN codons (composed only of families). Thus, these two columns of the tessera code share the same degeneracy with the corresponding columns of the genetic code (either 4 or 2 + 2). The natural completion of the mapping assigns t23 = SW to NUN codons and t23 = YR to NGN codons, with the aforementioned additional transformation. With these assignments it is easy to see that Rumer’s symmetry is indeed preserved and that the YR and SW transformations are exactly mixed (in one half of the code degeneracy is preserved and in the other half it is broken).
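The column argument used in this section for assertions i) and ii) on the tessera set can be verified mechanically. The sketch below makes two assumptions: the four columns are labelled by their generating transformation (I, SW, YR, KM, in that order), and the columns I and SW are assigned degeneracy 2 while YR and KM are assigned degeneracy 4, as stated in the text. Function names are illustrative.

```python
BASES = "TCAG"
TRANSFORMS = {
    "I": dict(zip(BASES, "TCAG")),
    "SW": dict(zip(BASES, "AGTC")),
    "YR": dict(zip(BASES, "CTGA")),
    "KM": dict(zip(BASES, "GACT")),
}
# Degeneracy of each column of the tessera set, as stated in the text.
DEGENERACY = {"I": 2, "SW": 2, "YR": 4, "KM": 4}


def column_of(word):
    """Return the transformation f with word = b1 b2 f(b2) f(b1), or None."""
    b1, b2, b3, b4 = word
    for name, f in TRANSFORMS.items():
        if f[b2] == b3 and f[b1] == b4:
            return name
    return None


def transform_first_two(word, name):
    """Apply transformation `name` to the first two letters only."""
    f = TRANSFORMS[name]
    return f[word[0]] + f[word[1]] + word[2:]


def column_action(name):
    """Map each column to the column reached after transforming the first two letters."""
    action = {}
    for col, f in TRANSFORMS.items():
        targets = {
            column_of(transform_first_two(b1 + b2 + f[b2] + f[b1], name))
            for b1 in BASES
            for b2 in BASES
        }
        assert len(targets) == 1 and None not in targets
        action[col] = targets.pop()
    return action


def classify(name):
    """Label a transformation as degeneracy 'preserving', 'breaking' or 'mixed'."""
    changes = {DEGENERACY[c] != DEGENERACY[d] for c, d in column_action(name).items()}
    if changes == {True}:
        return "breaking"
    if changes == {False}:
        return "preserving"
    return "mixed"


if __name__ == "__main__":
    for name in TRANSFORMS:
        print(name, column_action(name), classify(name))
```

Transforming the first two letters of a tessera in column f by g always lands in the column given by the product of f and g in the Klein V group, which is why each column is mapped onto a single column.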

4. Conclusions

Symmetry features in the genetic code and in genome information are not a mathematical curiosity: as in the evolution of physical systems, where the transition from simple to complex systems is usually associated with a symmetry-breaking chain, biological systems also exhibit this trend. In particular, Rumer’s transformation represents a key aspect for explaining the degeneracy of the genetic code, its origin, and its evolution to present forms; see Gonzalez et al. (2019), Lehmann and Libchaber (2008), van der Gulik and Hoff (2016). As shown in Gonzalez (2008) and Gonzalez et al. (2006, 2008), Rumer’s symmetry can be described exactly as a dichotomic class in terms of the chemical characters of the first two nucleotides of a codon. In physics, symmetry features lead naturally to conserved


Fig. 3. a. Symmetry-preserving complementary (SW) transformation of the non-mixed codon set. b. Symmetry-breaking Purine-Pyrimidine (YR) transformation of the non-mixed codon set. c. Symmetry-preserving (YR) transformation of the mixed codon set. d. Symmetry-breaking complementary (SW) transformation of the mixed codon set.
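The behaviour summarised in Fig. 3, together with Rumer’s anti-symmetry itself, can be checked against the standard genetic code. The sketch below rests on assumptions that should be read with care: the code table is the standard one written in T, C, A, G order; a quartet is treated as having degeneracy 4 when its four codons share one amino acid, and as non-4 otherwise; and the non-mixed set is taken to be the quartets whose first two bases are both strong or both weak (SS or WW), which is the reading under which the four panels of Fig. 3 hold for the standard code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

TRANSFORMS = {
    "SW": dict(zip(BASES, "AGTC")),  # complement
    "YR": dict(zip(BASES, "CTGA")),  # purine-pyrimidine
    "KM": dict(zip(BASES, "GACT")),  # keto-amino (Rumer)
}
STRONG = set("CG")


def is_family(prefix):
    """True when the quartet prefix+N codes for a single amino acid."""
    return len({CODE[prefix + b] for b in BASES}) == 1


def transform(word, name):
    """Apply a base transformation letter by letter."""
    return "".join(TRANSFORMS[name][b] for b in word)


def is_mixed(prefix):
    """True for WS or SW prefixes (one strong and one weak base)."""
    return (prefix[0] in STRONG) != (prefix[1] in STRONG)


def effect(name, prefixes):
    """Say whether `name` preserves or breaks family status on all given quartets."""
    flips = {is_family(p) != is_family(transform(p, name)) for p in prefixes}
    if flips == {True}:
        return "breaks"
    if flips == {False}:
        return "preserves"
    return "neither"


PREFIXES = ["".join(p) for p in product(BASES, repeat=2)]
MIXED = [p for p in PREFIXES if is_mixed(p)]
NON_MIXED = [p for p in PREFIXES if not is_mixed(p)]

if __name__ == "__main__":
    print("TTC ->", transform("TTC", "KM"), CODE["TTC"], "->", CODE[transform("TTC", "KM")])
    print("KM on all quartets:", effect("KM", PREFIXES))
    for name in ("SW", "YR"):
        print(name, "| non-mixed:", effect(name, NON_MIXED), "| mixed:", effect(name, MIXED))
```

Working at the level of quartets (first two letters) follows the remark in the Introduction that transforming the first two letters is sufficient to obtain the (anti)symmetry.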

Table 3 Number of symmetric codons as a function of codon length, “n”.

# bases per codon = n    # of symmetric codons = $2^{n+1}$
1                        4
2                        8
3                        16
4                        32
5                        64
6                        128
7                        256
8                        512

Table 5 Complete tessera set including the non-symmetric tesserae generated by the Klein V group of transformations. The last two columns are generated by the Purine-Pyrimidine, and the Keto-Amino transformations applied to the first di-nucleotide followed by the reverse one and appended to the first one. Groups of tesserae coding for the same amino acid are identified with the same colour (either white or green).

Table 4 First half of the tessera code composed of symmetric tesserae, i.e., palindromic tesserae (first column), and self-complementary tesserae (second column).

Table 6 Multiplication table of the four chemical transformations between the bases of DNA or RNA forming a Klein V group.

      I     SW    YR    KM
I     I     SW    YR    KM
SW    SW    I     KM    YR
YR    YR    KM    I     SW
KM    KM    YR    SW    I
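The multiplication table can be recomputed directly from the base mappings given in the text. The following sketch is illustrative only (the function name is an assumption): it composes the four transformations pairwise and prints the resulting table, with the row as the first transformation and the column as the second.

```python
BASES = "TCAG"
TRANSFORMS = {
    "I": dict(zip(BASES, "TCAG")),
    "SW": dict(zip(BASES, "AGTC")),
    "YR": dict(zip(BASES, "CTGA")),
    "KM": dict(zip(BASES, "GACT")),
}


def compose(first, second):
    """Name of the transformation obtained by applying `first`, then `second`."""
    f, g = TRANSFORMS[first], TRANSFORMS[second]
    image = "".join(g[f[b]] for b in BASES)
    for name, h in TRANSFORMS.items():
        if "".join(h[b] for b in BASES) == image:
            return name
    raise ValueError("composition is not one of the four transformations")


NAMES = list(TRANSFORMS)
TABLE = {(a, b): compose(a, b) for a in NAMES for b in NAMES}

if __name__ == "__main__":
    print("    " + " ".join(f"{n:>3}" for n in NAMES))
    for a in NAMES:
        print(f"{a:>3} " + " ".join(f"{TABLE[a, b]:>3}" for b in NAMES))
```

Every element is its own inverse and the table is symmetric, which are the defining features of the Klein V group.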

conservation over evolutionary times connects it to fundamental aspects of coding and decoding of genetic information, in particular error detection and correction. We have explored to some extent different cases related to this hypothesis: point error detection/correction, frame-shift error

quantities. In biology, these conserved quantities should be associated with relevant biological functions whose eventual failure is deleterious for an organism. In the case of Rumer’s symmetry, we argue that its 5

BioSystems 187 (2020) 104036

D.L. Gonzalez, et al.

article. Mainly, our research on the symmetry features of the genetic code leads to interesting hypotheses about the origin and evolution of the code. It also leads to puzzling hypotheses about the role of these symmetries and related biological processes in present codes. Last but not least, this research can provide promising new tools in biotechnology and bioinformatics.

Declaration of Competing Interest

The authors declare no competing interests.

References

Arquès, D., Michel, C.J., 1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45–58.
Baranov, P., Venin, M., Provan, G., 2009. Codon size reduction as the origin of the triplet genetic code. PLoS One 4, e5708.
Fimmel, E., Strüngmann, L., 2016. Yury Borisovich Rumer and his ‘biological papers’ on the genetic code. In: Cartwright, J., Giannerini, S., Gonzalez, D.L. (Eds.), DNA as Information, Theme Issue of the PTRSA 374 Issue 2063, 2015.0228.
Fimmel, E., Giannerini, S., Gonzalez, D.L., Strüngmann, L., 2015. Circular codes, symmetries and transformations. J. Math. Biol. 70, 1623–1644.
Gamow, G., 1954. Possible relation between deoxyribonucleic acid and protein structure. Nature 173, 318.
Gonzalez, D.L., 2004. Can the genetic code be mathematically described? Med. Sci. Monit. 10.
Gonzalez, D.L., 2008. The mathematical structure of the genetic code. In: Barbieri, M., Hoffmeyer, J. (Eds.), The Codes of Life: The Rules of Macroevolution, Biosemiotics 1. Springer, Netherlands, pp. 111–152.
Gonzalez, D.L., Giannerini, S., Rosa, R., 2006. Detecting structure in parity binary sequences: error correction and detection in DNA. IEEE Eng. Med. Biol. 25, 69–81.
Gonzalez, D.L., Giannerini, S., Rosa, R., 2008. Strong short-range correlations and dichotomic codon classes in coding DNA sequences. Phys. Rev. E 78, 051918.
Gonzalez, D.L., Giannerini, S., Rosa, R., 2011. Circular codes revisited: a statistical approach. J. Theor. Biol. 275, 21–28.
Gonzalez, D.L., Giannerini, S., Rosa, R., 2012. On the Origin of the Mitochondrial Genetic Code: Towards a Unified Mathematical Framework for the Management of Genetic Information. Available from Nature Precedings. https://doi.org/10.1038/npre.2012.7136.1.
Gonzalez, D.L., Giannerini, S., Rosa, R., 2016. The non-power model of the genetic code: a paradigm for interpreting genomic information. Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 374 2015.0062.
Gonzalez, D.L., Giannerini, S., Rosa, R., 2019. On the origin of degeneracy in the genetic code. Royal Society Interface Focus. Royal Society, London (in press).
Guilloux, A., Jestin, J.-L., 2012. The genetic code and its optimization for kinetic energy conservation in polypeptide chains. BioSystems 109, 141–144.
Jukes, T., 1968. Molecules and Evolution. Columbia University Press.
Kay, L.E., 2000. Who Wrote the Book of Life? Writing Science, Stanford University Press, Stanford, California.
Lehmann, J., Libchaber, A., 2008. Degeneracy of the genetic code and stability of the base pair at the second position of the anticodon. RNA 14, 1264–1269.
Rotman, J.J., 1995. An Introduction to the Theory of Groups. Springer, Berlin.
Rumer, Y., 1966. About the codon’s systematization in the genetic code. Proc. Acad. Sci. U.S.S.R. (Doklady) 167, 1393–1394 (in Russian).
Seligmann, H., 2012. Putative mitochondrial polypeptides coded by expanded quadruplet codons, decoded by antisense tRNAs with unusual anticodons. Biosystems 110 (2), 84–106.
Seligmann, H., 2017. Reviewing evidence for systematic transcriptional deletions, nucleotide exchanges, and expanded codons, and peptide clusters in human mitochondria. Biosystems 160, 10–24.
Seligmann, H., Warthi, G., 2017. Genetic code optimization for cotranslational protein folding: codon directional asymmetry correlates with antiparallel betasheets, tRNA synthetase classes. Comput. Struct. Biotechnol. J. 15, 412–424.
Shcherbak, V.I., 1989. Rumer’s rule and transformation in the context of the cooperative symmetry of the genetic code. J. Theor. Biol. 139 (2), 271–276.
van der Gulik, P., Hoff, W., 2016. Anticodon modifications in the tRNA set of LUCA and the fundamental regularity in the standard genetic code. PLoS One 11, 1–21.
Watanabe, K., Yokobori, S., 2014. How the early genetic code was established? -Inference from the analysis of extant animal mitochondrial decoding systems. In: Erdmann, V., Markiewicz, W., Barciszewski, J. (Eds.), Chemical Biology of Nucleic Acids: Fundamentals and Clinical Applications, RNA Technologies. Springer, pp. 25–40.

Fig. 4. (with permission from Interface Focus, Royal Society Publishing) Schematic representation of the mapping between the tessera (b1 b2 b3 b4) and the codon (x1 x2 x3).

detection/correction, and control over the synthesis of proteins. Concerning point mutations, the tessera model is error detecting. Moreover, in the framework of our mathematical model of the genetic code, we have shown that Rumer’s symmetry can be described algorithmically as a dichotomic class, similar to the parity and hidden classes (classes with two possible values that divide the genetic code evenly) (Gonzalez, 2008; Gonzalez et al., 2016). In this context, we found that strong short-range correlations exist between dichotomic classes, including Rumer’s class (Gonzalez et al., 2006, 2008). This fact provides strong evidence that Rumer’s class may be involved in some error detection/correction mechanism that compensates for the loss in redundancy due to the transition between the tessera code, the early code, and present codes. We hypothesize that the role played by chemical transformations in the tessera code is played by dichotomic classes (including Rumer’s class) in extant codes. As for frame-shift error detection/correction, we have found that dichotomic classes are intimately related to circular codes. In particular, some dinucleotide circular codes coincide exactly with some partitions induced by dichotomic classes (forthcoming work). This result points to a role of Rumer’s class in circular codes and, consequently, in the detection and maintenance of the correct reading frame (Gonzalez et al., 2011; Arquès and Michel, 1996; Fimmel et al., 2015). Regarding the control over protein synthesis, we have explored the possibility that circular codes are related to the efficiency of protein synthesis. In this respect, we have found universal properties of circular codes that are independent of the organism and that agree with recent experimental results on protein synthesis speed and efficiency.
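As an illustration of how Rumer’s symmetry can be expressed as a two-valued (dichotomic) class, the following minimal Python sketch checks the symmetry against the standard nuclear genetic code. It assumes the keto-amino exchange U↔G, A↔C as Rumer’s transformation. The function names and the explicit rule in `rumer_class` are hypothetical choices made for this example; they are not taken from the non-power model and should not be read as its algorithm.

```python
from itertools import product

BASES = "UCAG"
# Standard nuclear genetic code in the usual UCAG table order ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Keto-amino exchange (assumed here to be Rumer's transformation): U<->G, C<->A.
KETO_AMINO = str.maketrans("UCAG", "GACU")


def rumer(codon: str) -> str:
    """Apply the keto-amino exchange to every base of a codon."""
    return codon.translate(KETO_AMINO)


def is_fourfold(codon: str) -> bool:
    """True if the third base is irrelevant, i.e. the four codons sharing
    the first two bases all have the same meaning (degeneracy 4)."""
    root = codon[:2]
    return len({CODE[root + b] for b in BASES}) == 1


def rumer_class(codon: str) -> bool:
    """Illustrative two-valued rule (an assumption of this sketch): decide
    the degeneracy class from the second base, and from the first base
    when the second base is keto (G or U)."""
    first, second = codon[0], codon[1]
    if second == "C":
        return True           # always degeneracy 4
    if second == "A":
        return False          # never degeneracy 4
    return first in "CG"      # second base G or U: strong first base -> 4


if __name__ == "__main__":
    codons = list(CODE)
    # The rule reproduces the degeneracy class read from the code table.
    print(all(rumer_class(c) == is_fourfold(c) for c in codons))
    # The transformation always flips the class.
    print(all(is_fourfold(rumer(c)) != is_fourfold(c) for c in codons))
    # The class splits the 64 codons into two halves.
    print(sum(is_fourfold(c) for c in codons))
```

In this sketch the class is computed in two ways, once directly from the code table and once from a rule on the first two bases, so that the agreement between the two can be checked rather than assumed.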
The main rationale is that compliance with circular code properties contributes to a correct reading and maintenance of the frame, favouring the elongation step in protein synthesis (submitted work). Of course, there are many open problems suggested by our model. For example, on the basis of a new symmetry index (Seligmann and Warthi, 2017), it is hypothesized that some of the codon symmetries correlate with antiparallel beta-sheet formation in mitochondrial sequences and with tRNA synthetase classes in nuclear sequences. In this context, we have found another strong anti-symmetric rule connecting the aminoacyl-tRNA synthetases with the YR, reverse, and KM transformations. We will report on these symmetries in a forthcoming
