tRNA structure from a graph and quantum theoretical perspective


Journal of Theoretical Biology 240 (2006) 574–582 www.elsevier.com/locate/yjtbi

Johan F. Galindo (a), Clara I. Bermúdez (a,b), Edgar E. Daza (a,*)

(a) Grupo de Química Teórica, Universidad Nacional de Colombia, Bogotá, D.C., Colombia
(b) Grupo de Biología Molecular Teórica y Evolutiva, Universidad Nacional de Colombia, Bogotá, D.C., Colombia

Received 17 June 2005; received in revised form 23 September 2005; accepted 25 October 2005 Available online 6 December 2005

Abstract

One of the objectives of theoretical biochemistry is to find a suitable representation of molecules that allows us to encode what we know about their structures, interactions and reactivity. tRNA structure in particular is involved in processes such as aminoacylation and genetic code translation, and for this reason these molecules are biochemical objects of the utmost importance that require characterization. We propose here two fundamental aspects for characterizing and modeling them. The first takes into consideration the connectivity patterns, i.e. the set of linkages between atoms or molecular fragments (a key tool for this purpose is graph theory). The second requires knowledge, at least approximate, of some properties related to the interactions taking place within the molecule, and perhaps of its reactivity. We used quantum mechanics to achieve this goal; specifically, we have used partial charges as a manifestation of the response to structural changes. These charges were appropriately modified to be used as weighting factors for the elements constituting the molecular graph. This new graph-tRNA context allows us to detect some structure-function relationships.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: tRNA structure; Graph-indices; Quantum properties; Genetic code

* Corresponding author. Tel.: +57 1 316 5000x18324; fax: +57 1 316 5220. E-mail address: [email protected] (E.E. Daza).
0022-5193/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.jtbi.2005.10.017

1. Introduction

Two major problems can be recognized when chemical and biological phenomena are considered from a molecular perspective. The first is related to the very concept of what molecular structure is (Villaveces and Daza, 1990, 1997). The second arises when molecules are compared, because of the lack of good approaches to assess how similar or different two molecules are (Mezey, 1987, 1993; Carbó, 1995). Both questions become severe when we consider biomolecules such as proteins or nucleic acids (Reidys and Stadler, 1996).

Nowadays, the usual approach to characterizing a molecular structure is to try to figure out a fixed spatial distribution of the atoms forming a molecule in ordinary three-dimensional space, i.e. to set a geometry for each molecule. The same approach has been used in protein and nucleic acid research. Nevertheless, this is quite a difficult task; feasibility and accuracy depend on which type of methodology is employed, whether experimental or theoretical (Mezey, 1987; Sutcliffe, 1992). Experimentally, only an average geometry is available, because nuclei are in permanent motion. Moreover, depending on which approach is used, the resulting geometries can diverge and a great degree of uncertainty remains associated with every determination. This limitation is not due to experimental errors but to the underlying mechanical model, a classical perspective opposed to the quantum nature of nuclei, atoms and molecules.

Theoretical approaches also lead to different results. When molecular mechanics or molecular dynamics are used, the limitations arising from the classical model are present. Additionally, the implemented models make use of approximate interaction potentials, whether the force field is fitted to reproduce experimental or theoretical potentials (Mezey, 1987). On the other hand, if quantum mechanical calculations are intended, the size of biomolecules precludes calculations with a level of accuracy comparable to data obtained experimentally (Cioslowski, 1993). In both cases, we are also limited by the lack of a model simulating the environment in which molecules are embedded.

If we intend to establish models that are conceptually more rigorous and better suited to comparing molecules, we consider it necessary to go a step further in the interpretation of what molecular structure is, i.e. to go beyond the simple model of a geometrical distribution of nuclei. Therefore a new model encompassing most of the knowledge about chemical structure is required. To achieve this goal alternative approaches have been proposed; some of them convey hybrid models merging diverse knowledge.

In the second half of the 20th century, a method of representing chemical structures that became prominent was to consider molecules as mathematical objects capable of carrying information coming from different sources. That approach has employed graph theory (Harary, 1969; Trinajstic, 1983). Based on it, several chemical systems have been successfully represented, for instance single molecules, molecular fragments, crystals, polymers, clusters and chemical reactions. These systems share a common property: they are composed of a set of elements (atoms, molecules acting as monomers, or molecular fragments) which are linked or related in some way, for example by chemical bonds, van der Waals interactions, hydrogen bonds, reaction paths, and so forth (Kier and Hall, 1986; Randic, 1992). We are able to recognize a mathematical object behind these correspondences: a graph, $G = (V, E)$. A graph is a pair formed by one set of elements, $V = \{v_i\}$, the vertices, and a set of pairwise relations, $E = \{(v_i, v_j)\}$, the edges. By its definition, the standard molecular graph arises from the primary concept of chemical structure, i.e. the binding of atoms; therefore it constitutes by itself a very good alternative to those methods based on mechanical analogies that are usually in conflict with the quantum character of the molecular world.

From this representation it is possible to define mappings over a set of graphs (molecules), usually producing real numbers. These mappings are generally called graph theoretical indices. Families of these numbers have been used as quantitative representations of molecules and, when viewed as n-tuples, they define spaces representing chemical structures, that is, structural spaces (Niño et al., 2001). Multiple mappings have been defined so far (Randic, 2001; Basak et al., 2004). In order to distinguish among molecules having the same pattern of connectivity but different components, either vertices or connection types, properties from diverse origins are employed. For vertices, different classes of valences have been defined depending on which atoms are bonded, nuclear charges, etc.; for edges, weighting factors are assigned depending on the type of bond (single, double or triple) or on the bond order (Bermúdez et al., 1999). Recently, weighting factors have been derived from properties or quantities calculated by quantum mechanical methods on selected molecules. In this way, weighted graphs combining information from multiple sources are obtained.

Within this framework we have developed a model to characterize families of tRNAs using simultaneously graph theory and weighting factors from quantum mechanical calculations (Bermúdez et al., 1999). However, to study molecules with a very similar pattern of binding (like tRNAs) and an apparently small set of vertices (four nucleotides), it is necessary to revisit the graph definition and the method of assigning weighting factors, in such a way that the construction reveals finer details of the structure, such as the nature of each nucleotide (vertex) with respect to its first neighbors. In other words, the graph must reflect interactions like hydrogen bonding or stacking of different nitrogenous base pairs.

In this paper we propose an alternative way of constructing tRNA graphs as well as new graph theoretical indices which, used as structural descriptors, allowed us to compare the tRNAs decoding amino acids belonging to the aspartate and glutamate biosynthetic families of the halophilic archaeon Haloarcula marismortui, whose genome was reported in 2004 (Baliga et al., 2004). The methodology we have employed to build the new graphs, their weighting factors and graph theoretical indices is presented below.

2. Modeling tRNAs

2.1. Fragmenting tRNAs

The new approach to studying tRNA structure allows us to include in the model not only hydrogen bonding interactions between complementary base pairs, but also the stacking interactions of RNA base pairs, which are of the utmost importance for RNA or DNA stability (Hobza et al., 1995; Grosjean et al., 1996; Hobza and Sponer, 1998; Hou et al., 1999; Sengupta et al., 2000).
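As an illustration of the graph $G = (V, E)$ underlying this model, the following minimal sketch builds vertices (nucleotides) and two kinds of edges (phosphodiester links and hydrogen-bonded base pairs) from a secondary structure. The dot-bracket input format, the function names and the toy hairpin are assumptions made for this example only; they are not taken from the authors' software.

```python
def trna_graph(sequence, dot_bracket):
    """Build a secondary-structure graph (hypothetical helper).

    Returns (vertices, edges): vertices is a list of (index, nucleotide);
    edges is a set of (i, j, kind) with i < j, where kind is "backbone"
    (phosphodiester link) or "hbond" (base pair).
    """
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    vertices = list(enumerate(sequence))
    edges = set()
    # Consecutive nucleotides are joined by a phosphodiester link.
    for i in range(len(sequence) - 1):
        edges.add((i, i + 1, "backbone"))
    # Matching brackets mark hydrogen-bonded base pairs.
    stack = []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            edges.add((stack.pop(), j, "hbond"))
    if stack:
        raise ValueError("unbalanced structure")
    return vertices, edges


def standard_valences(n_vertices, edges):
    """Number of incident edges for every vertex."""
    degree = [0] * n_vertices
    for i, j, _kind in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


# Toy hairpin, invented for illustration (not a real tRNA).
vertices, edges = trna_graph("GCAUUUGC", "((....))")
print(standard_valences(len(vertices), edges))
```

Keeping the edge kind explicit makes it straightforward to attach different weighting factors to backbone links and base pairs later on.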
Before weighting each nucleotide (graph vertex), all the possible structural motifs defining the nearest neighborhood of each of the four nucleotides (A, G, C, U) were considered, i.e. we looked for differences due to the location of the nucleotide between one or more neighboring nucleotides within the tRNA structure. By following this strategy, we found a nucleotide classification that responds not only to the chemical environment surrounding a particular nucleotide but also to its chemical nature. This method of fragmenting tRNA into several molecular motifs, taking into account the influence of the first neighbors of each nucleotide in the tRNA secondary structure, lets us classify the structural motifs into seven different ones; see Fig. 1. Following this approach, 880 fragments were obtained, corresponding to all possible permutations of Watson–Crick base pairs, the atypical GU pair and three-nucleotide single strands (see Fig. 1). Fig. 2 shows the number of fragments for each motif. The black points represent nucleotides for which the influence of neighboring nucleotides (white points) was evaluated.

These molecular fragments can be classified into seven different motifs of three kinds: micro-helix, semimicro-helix and single strand. The motifs are shown in Fig. 1. The micro-helix (a) corresponds to tRNA double helix regions, the semimicro-helices (b, c, d, e) to loop closures and the single strand (g) to loop sequences and the CCA 3′-OH end. Motif (f) was used to characterize the so-called discriminator base 73.

Once the structural motifs were identified, we proceeded to build their corresponding 3D models in order to set the nuclear geometry of each fragment (Hyperchem5.0, 1997). The initial parameters were the standard ones for A-form RNA, as reported in the Protein Data Bank. -OH and PO₄²⁻ groups were used as caps for the 5′ and 3′ ends. Additional corrections for atomic superpositions were made by optimizing the phosphodiester backbone, using MM+ and the AMBER force field.

Fig. 1. Fragmenting tRNAs into seven structural motifs (a–g). Standard numbering of nucleotides in tRNAs from 1 to 76. Positions in each structural motif a–g are numbered from 1 to 6.

Fig. 2. Total structural fragments for each motif type.

2.2. Characterizing nucleotides

Once the geometries of the 880 fragments were built, ab initio HF/6-31G calculations were performed to obtain the corresponding wave functions (GAUSSIAN 98. Johnson et al., 1998). To estimate properties that give us an idea of how a nucleotide responds to the interactions taking place within the structural motif, we calculated the partial charges (Mulliken, 1955) associated with the nuclei in the nucleotide. These charges allow us to monitor the response of the nuclei to structural changes due to the nucleotide neighborhood. Although 880 molecular fragments were proposed (i.e. 880 nucleotides to be weighted), we actually have to deal with 1241 nucleotides because there are some additional nucleotides to be considered in the micro-helices (a) and semimicro-helices (b), and one more for the adenine in the CCA 3′-OH end; see the black points in Fig. 2. This required extracting and analyzing about 38,000 atomic charges.

Each nucleotide in a particular chemical context was characterized at the outset by all its partial atomic charges. However, to keep only the most discriminant ones, we reduced the set for each nucleotide by using Principal Component Analysis (PCA) (MINITAB, 2003). This multivariate method allowed us to choose the nuclei with the most influential partial atomic charges for each nucleotide (A, G, C, U) in a specific neighborhood. To do this, each nucleotide was subdivided into two major structural components: the aromatic base and the ribose-phosphate group. Then, the partial atomic charges most influenced in a fragment of one specific motif were evaluated (i.e. we selected the atoms having the largest variance). To choose these atoms, the PCAs were computed using two motifs as reference: micro-helices and single strands. The most discriminant atomic charges in both cases belong to atoms in the nucleoside, i.e. nuclei in the aromatic base (mainly in double bonds and hydrogen bonding regions) and in the ribose (Fig. 3). The atomic charges of the phosphate group did not seem to be influenced by the neighborhood interactions, at least according to this method.

Since the number of discriminant atomic charges was different for each nucleotide, it was necessary to find a homogeneous set of variables, giving as a result a homogeneous set of atoms for all the nucleotides considered. To do that, we extended the lower bound of minimal variance until a set of atoms fulfilling the property of being isoprotonic (sets of atoms having the same total nuclear charge, for instance Ne, HF, H2O, NH3 and CH4) was completed (Daza and Villaveces, 1994; Daza and Bernal, 2005). The purine and pyrimidine atoms whose partial charges were chosen are shown in Fig. 3:

$$\{\mathrm{C}_{1'}, \mathrm{H}_{1'}, \mathrm{C}_{2'}, \mathrm{H}_{2'}, \mathrm{C}_{3'}, \mathrm{C}_{5'}, \mathrm{O}_{5'}, \mathrm{N}_{9}, \mathrm{C}_{4}, \mathrm{C}_{5}, \mathrm{C}_{6}, \mathrm{N}_{1}, \mathrm{C}_{2}, (\mathrm{N}_{6} + \mathrm{H}_{6,1} \text{ or } \mathrm{O}_{6})\} = \text{Purines}, \tag{1}$$

$$\{\mathrm{C}_{1'}, \mathrm{H}_{1'}, \mathrm{C}_{2'}, \mathrm{H}_{2'}, \mathrm{C}_{3'}, \mathrm{C}_{5'}, \mathrm{O}_{5'}, \mathrm{N}_{1}, \mathrm{C}_{6}, \mathrm{C}_{5}, \mathrm{C}_{4}, \mathrm{N}_{3}, \mathrm{C}_{2}, (\mathrm{N}_{4} + \mathrm{H}_{4,1} \text{ or } \mathrm{O}_{4})\} = \text{Pyrimidines}. \tag{2}$$


Fig. 3. Nucleotide isoprotonic series, in gray the atoms used to weight graph-tRNA components.

With the partial charges of the atoms in these new sets, a new PCA was carried out over the 1241 nucleotides. The first four PCs, which are linear combinations of the chosen partial charges, provide a common basis to represent the molecular motifs we have defined. Each nucleotide in a particular chemical environment is then represented by a 4-tuple, considered as a vector $\vec{Q}$ in this molecular representation space:

$$\vec{Q} = (q_1, q_2, q_3, q_4), \tag{3}$$

where $q_i = PC_i$. An example of the nucleotide distribution in the molecular representation space defined by the first three PCs is shown in Fig. 4. This example corresponds to the $4^3 = 64$ different arrays of nucleotides in the structural fragment (g). The components of these vectors were used to define different weight factors for vertices and edges in the tRNA. From a theoretical perspective, we stress that in this procedure a quantum property is passed to the proposed graph model.

3. Mapping tRNAs: quantum and graph theoretical perspective

Based on this alternative construction of the graph associated with each tRNA, we have proposed some variations on a family of very well known graph theoretical indices, allowing us to represent tRNAs by real numbers. From the characteristic vector of each nucleotide we have defined weight factors to recalculate Randic indices (Randic, 2001) and a new weighted distance for the Balaban index (Balaban, 1985). Additionally, we have also evaluated two new indices, namely the sum of areas and one similar to the weight center, the charge-valence index (Bermúdez, 2004; Galindo, 2004). Software to carry out these calculations has been developed in our laboratory.
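Since every weight factor in this section starts from the characteristic vector of a nucleotide, a minimal sketch of how such 4-tuples could be obtained by PCA may help. It uses plain power iteration with deflation on a covariance matrix; the charge table is synthetic (random numbers), the number of atoms is arbitrary, and projecting the raw, uncentered charges so that an all-zero charge vector maps to the origin is only one possible reading of the text. The authors used MINITAB, so this is not their implementation.

```python
import random


def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
            for a in range(m)]


def mat_vec(matrix, v):
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def principal_components(cov, k, iters=1000, seed=0):
    """First k (eigenvalue, unit eigenvector) pairs by power iteration."""
    rng = random.Random(seed)
    m = len(cov)
    work = [row[:] for row in cov]
    found = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        for _ in range(iters):
            w = mat_vec(work, v)
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        found.append((eigenvalue, v))
        # Deflation: remove the component just found from the matrix.
        work = [[work[a][b] - eigenvalue * v[a] * v[b] for b in range(m)]
                for a in range(m)]
    return found


def q_vector(charges, components):
    """Project one nucleotide's charges onto the PCs: Q = (q1, ..., qk)."""
    return tuple(sum(c * x for c, x in zip(axis, charges))
                 for _eigenvalue, axis in components)


# Synthetic stand-in for a table of partial charges:
# 30 hypothetical nucleotides, 6 hypothetical atoms each.
rng = random.Random(42)
scales = (0.5, 0.3, 0.2, 0.1, 0.05, 0.01)
charges = [[rng.gauss(0.0, s) for s in scales] for _ in range(30)]

components = principal_components(covariance(charges), k=4)
Q = q_vector(charges[0], components)
print(len(Q))
```

Power iteration is chosen here only to keep the sketch free of external libraries; any standard eigensolver would serve the same purpose.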

3.1. Weighting edges and vertices

We have defined factors to weight edges as well as vertices. To weight vertices we have proposed four valences, $B_a$, $B_b$, $B_c$ and $B_d$, and one way of modifying graph distances, $D_{i,j}$, which can be considered as a weighting factor for edges. All of them depend on the characteristics of the nucleotide as determined by the structural motif where it is located. The first valence is defined as

$$B_a = I + (pH + 1)\,|q_1|, \tag{4}$$

where $I$ is the number of phosphodiester linkages, $pH$ is an integer equal to 1 if the nucleotide takes part in a hydrogen bonding linkage and 0 otherwise, and $|q_1|$ is the absolute value of the first PC. Note that the valence $B_a$ implicitly contains the standard graph theoretical valence of a vertex, $\nu$, which is the number of incident edges:

$$\nu = I + pH. \tag{5}$$

The $B_b$ valence is just the first principal component, i.e. a linear combination of the most discriminant charges. By definition it has units of charge:

$$B_b = |q_1|. \tag{6}$$

The third valence, $B_c$, is interpreted in a vectorial context as the 4-tuple from the PCA:

$$B_c = (q_1, q_2, q_3, q_4) = \vec{Q}. \tag{7}$$
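The first three valences of Eqs. (4)–(7) translate directly into code. The sketch below is illustrative only: the function names and the example 4-tuple are hypothetical, and `pH` is taken as 1 for a hydrogen-bonded nucleotide and 0 otherwise.

```python
def standard_valence(I, pH):
    """Eq. (5): number of incident edges (phosphodiester links + H-bond)."""
    return I + pH


def valence_a(I, pH, q1):
    """Eq. (4): B_a = I + (pH + 1) * |q1|."""
    return I + (pH + 1) * abs(q1)


def valence_b(q1):
    """Eq. (6): B_b = |q1|, the magnitude of the first PC score."""
    return abs(q1)


def valence_c(q):
    """Eq. (7): B_c is the whole 4-tuple Q."""
    return tuple(q)


# Hypothetical nucleotide inside a helix: two backbone links, one base pair.
Q = (-0.8, -1.1, 0.5, 0.2)  # invented PC scores, for illustration only
print(standard_valence(2, 1), valence_a(2, 1, Q[0]), valence_b(Q[0]))
```

For an unpaired nucleotide at a chain end (`I = 1`, `pH = 0`) the same functions give a standard valence of 1 and `B_a = 1 + |q1|`.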

The last valence, $B_d$, was proposed to define a unique number associated with each nucleotide represented by the vector $\vec{Q}$. Since these vectors have a common basis, we consider each nucleotide as a radius vector in a 4-dimensional linear space with a common coordinate origin $(0, 0, 0, 0)$. In our case, this origin corresponds to a molecule without redistribution of charges. We made use of the concepts of area, volume and hyper-volume generated by two, three or more vectors, as generalized in the theory of p-dimensional linear spaces (see Fig. 5). These quantities were calculated using Gram determinants (Löwdin, 1998).

Fig. 4. Structural space defined by the first three PCs (axes PC1, PC2 and PC3) for structural fragments (g), i.e. for a single strand. Four clusters are well defined (adenylate, guanylate, cytidylate and uridylate), each one with 16 single strands having the same central nucleotide.

Fig. 5. Area and volume generated by two and three vectors.

Fig. 6. Cases of the n-th nucleotide vectors (panels a–c).

Depending on its neighborhood, a nucleotide might have one, two or three neighbors (see Fig. 6); hence the valence $B_d$ may be associated with the area $D_2$ subtended by two vectors, a volume $D_3$ or a hyper-volume $D_4$. Consequently, the valence $B_d$ is

$$B_d = D_2, \tag{8}$$

or

$$B_d = 1.4932 + D_3, \tag{9}$$

or

$$B_d = 1.4932 + D_4 \times 10, \tag{10}$$

depending on the nucleotide position in the tRNA. The scale constant 1.4932 is the value of $D_2$ for the cytosine in the middle of the CCA 3′-OH end. To weight the graph theoretical distance, we have used the area spanned by two adjacent nucleotides; hence the distance $D_{i,j}$ between two nucleotides $i$ and $j$ is equal to the sum of the areas $D_2$ along the shortest path between $i$ and $j$.
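The areas, volumes and hyper-volumes used for $B_d$ can be sketched with Gram determinants as follows. Two points are assumptions of this example rather than statements of the paper: the measure is taken as the square root of the Gram determinant (the usual convention), and the factor 10 in Eq. (10) is read as multiplicative. The test vectors are arbitrary.

```python
SCALE = 1.4932  # value of D2 quoted in the text for the middle C of CCA


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(vectors):
    """Matrix of all pairwise inner products."""
    return [[dot(u, v) for v in vectors] for u in vectors]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def gram_measure(vectors):
    """Area (2 vectors), volume (3) or hyper-volume (4) they span."""
    return max(determinant(gram_matrix(vectors)), 0.0) ** 0.5


def valence_d(vectors):
    """B_d from the vectors of a nucleotide and its neighbors, Eqs. (8)-(10)."""
    size = len(vectors)
    if size == 2:
        return gram_measure(vectors)
    if size == 3:
        return SCALE + gram_measure(vectors)
    if size == 4:
        return SCALE + gram_measure(vectors) * 10  # assumed reading of Eq. (10)
    raise ValueError("expected 2, 3 or 4 vectors")


# Arbitrary orthogonal vectors, chosen so the results are easy to check.
e1, e2, e3, e4 = (1, 0, 0, 0), (0, 2, 0, 0), (0, 0, 3, 0), (0, 0, 0, 1)
print(valence_d([e1, e2]), valence_d([e1, e2, e3]), valence_d([e1, e2, e3, e4]))
```

The same `gram_measure` applied to two adjacent nucleotides gives the edge weight that the weighted distance sums along a path.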

With these values the distance matrices associated with each graph-tRNA were built.

3.2. Mapping tRNAs on real numbers

As stated above, graph theoretical indices have been proposed as numbers representing or capturing structural information. However, to represent the molecular structure by one index it is necessary to ensure a one-to-one relation between its values and the tRNAs. To design a discriminant index, it is essential to have a specific characterization of vertices and edges, i.e. to define a particular value for each nucleotide or bonding pattern associated with the nucleotide. To do this, we employed the valences and weighted distances defined above. Thus, we have defined twenty indices: seventeen based on Randic's definition of connectivity indices ($\chi^{l}_{\text{valence}}$), one on the Balaban distance index ($J$) and two new ones ($\mu$, $\sigma$).

3.2.1. Randic index $\chi$

The Randic index was proposed in 1975 (Randic, 1975) and soon extended to higher path values by Kier and Hall (1986), who also made some modifications to include heteroatoms, making the indices more discriminant. In our case, to calculate each modified index, the normal graph-theoretical valences $\nu_i$ were replaced by the valences $B_a$, $B_b$ and $B_d$ defined in the preceding section. By using these new valences three main groups of Randic indices were calculated: $\chi^{l}_{a}$, $\chi^{l}_{b}$ and $\chi^{l}_{d}$, where $l$ runs from 0 to 4. The $B_c$-valence Randic index of order 0, $\chi^{0}$, is

$$\chi^{0} = \sum_{n} \left(\langle \vec{Q}_n | \vec{Q}_n \rangle\right)^{-1/2} = \sum_{n} \left(\lVert \vec{Q}_n \rVert\right)^{-1}, \tag{11}$$

where $\lVert \vec{Q}_n \rVert$ is the norm of the vector representing nucleotide $n$, defined by means of the inner product. For the Randic index of order 1 ($\chi^{1}$) we have

$$\chi^{1} = \sum_{\text{paths}} \left(\langle \vec{Q}_n | \vec{Q}_m \rangle\right)^{-1/2}, \tag{12}$$

which corresponds to the inner product between each pair of vectors.

3.2.2. Balaban index $J$

The Balaban index $J$ (Balaban, 1985) is based on the graph-theoretical distance matrix, so by definition it can capture information on the secondary structure. Here, the distance $D_{i,j}$ is used to define the modified Balaban index:

$$J = \frac{K}{M + 1} \sum_{\text{edges}} (d_i \cdot d_j)^{-1/2}, \tag{13}$$

where $K$ is the total number of edges in the molecular graph, $n$ is the number of vertices, and $M$ ($M = K - n + 1$) is the cyclomatic number (which indicates the number of cycles of the graph; it is equal to the minimum number of edges that must be removed so that the graph becomes acyclic). Finally, $d_i = \sum_{l=1}^{n} D_{il}$ is the sum of the distances from vertex $i$ to all the other ones.

3.2.3. Charge-valence index $\mu$

This new index has not been previously reported in the literature. Since the numeric operations to determine Randic valences were replaced by vectorial operations, we have proposed it as an analogy between atomic charge spaces and linear spaces. Therefore, adopting this vector space idea and taking the index proposed by Pogliani (2000), we calculated a number similar to the definition of the center of mass. This new index is the sum of the products of the graph-theoretical valence of each vertex by the components of the corresponding $\vec{Q}$, generated as in the previous sections, divided by the sum of the standard graph-theoretical valences ($\nu$) in the graph:

$$\mu = \frac{\sum_{n=1}^{N} \sum_{i=1}^{4} \nu_n \, q_{in}}{V} \tag{14}$$

and

$$V = \sum_{n} \nu_n,$$

where $N$ is the total number of vertices.

3.2.4. Sum of areas $\sigma$

Finally, one last new index was defined. It corresponds to the sum of all the areas generated by each nucleotide with the adjacent ones:

$$\sigma = \sum_{i}^{n} D_{2_{i,j}}, \tag{15}$$

where $i$ and $j$ are two adjacent nucleotides.

An example of some of these indices is shown in Table 1. These indices correspond to the tRNAs from the halophilic archaeon H. marismortui, whose genome was reported in 2004 (Baliga et al., 2004), which decode amino acids belonging to the aspartate and glutamate biosynthetic families.

4. Discussion: comparing tRNAs

The twenty indices proposed were used to represent the H. marismortui tRNAs in order to quantify their structural similarity. These graph-theoretical tRNA indices were used as components of a vector representing each tRNA. Afterwards, the tRNAs from a biosynthetic family were classified following a typical clustering analysis. This allowed us to group tRNAs according to the different distance coefficients implemented in the cluster analysis.
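The clustering step can be sketched as follows with a small UPGMA (average linkage) routine on Euclidean distances. The numbers are the seven indices listed in Table 1 for the aspartate family; the column standardization and the omission of the preliminary PCA over all twenty indices are simplifications made for this example, so the output should not be expected to reproduce Fig. 7.

```python
import math

LABELS = ["AsnGUU", "AspGUC", "AspGUC", "LysCUU", "LysUUU",
          "IleGAU", "MetCAU", "ThrCGU", "ThrGGU", "ThrUGU"]
# Columns as in Table 1: J, mu, chi1_a, chi2_a, chi3_a, chi4_a, sigma.
ROWS = [
    [1.2305, 0.4920, 26.6238, 22.1596, 19.4459, 15.3878, 79.7830],
    [1.3442, 0.5128, 26.8013, 22.3485, 19.5770, 15.4545, 76.3350],
    [1.3442, 0.5128, 26.8013, 22.3485, 19.5770, 15.4545, 76.3350],
    [1.4229, 0.5443, 27.2852, 22.9473, 20.2502, 16.1395, 70.5984],
    [1.3759, 0.5056, 26.5819, 22.0232, 19.1998, 15.0981, 73.6908],
    [1.4340, 0.5165, 27.1402, 22.6017, 19.7427, 15.5646, 76.5495],
    [1.4972, 0.5654, 27.7902, 23.5567, 20.9602, 16.8336, 73.7162],
    [1.2057, 0.5762, 27.7451, 23.6892, 21.2775, 17.2262, 79.3206],
    [1.4598, 0.5111, 27.1338, 22.6946, 19.9480, 15.8363, 72.2404],
    [1.3360, 0.5521, 27.4738, 23.1666, 20.5664, 16.4676, 77.9204],
]


def standardize(rows):
    """Z-score every column so that no single index dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(points):
    """Return the merges as (members_left, members_right, height)."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                    height = sum(euclidean(points[i], points[j])
                                 for i, j in pairs) / len(pairs)
                    if best is None or height < best[0]:
                        best = (height, a, b)
        height, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), height))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


merges = upgma(standardize(ROWS))
for left, right, height in merges:
    print([LABELS[i] for i in left], [LABELS[i] for i in right], round(height, 3))
```

Because the two AspGUC rows of Table 1 are identical, they are joined first at height zero; the remaining merges depend on the standardization chosen.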

Table 1
Indices from tRNAs that recognize amino acids belonging to the aspartate biosynthetic pathway

tRNA      J        μ        χ^1_a     χ^2_a     χ^3_a     χ^4_a     σ
AsnGUU    1.2305   0.4920   26.6238   22.1596   19.4459   15.3878   79.7830
AspGUC    1.3442   0.5128   26.8013   22.3485   19.5770   15.4545   76.3350
AspGUC    1.3442   0.5128   26.8013   22.3485   19.5770   15.4545   76.3350
LysCUU    1.4229   0.5443   27.2852   22.9473   20.2502   16.1395   70.5984
LysUUU    1.3759   0.5056   26.5819   22.0232   19.1998   15.0981   73.6908
IleGAU    1.4340   0.5165   27.1402   22.6017   19.7427   15.5646   76.5495
MetCAU    1.4972   0.5654   27.7902   23.5567   20.9602   16.8336   73.7162
ThrCGU    1.2057   0.5762   27.7451   23.6892   21.2775   17.2262   79.3206
ThrGGU    1.4598   0.5111   27.1338   22.6946   19.9480   15.8363   72.2404
ThrUGU    1.3360   0.5521   27.4738   23.1666   20.5664   16.4676   77.9204
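Indices of the kind listed in Table 1 could be computed along the following lines. This is a sketch under stated assumptions, not the authors' software: the order-1 sum in Eq. (12) is taken over edges, the weighted distance is the shortest path with edge areas as weights (Floyd–Warshall), all inner products are assumed positive, and the three-vertex toy graph and its vectors are invented.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def area(u, v):
    """Area spanned by u and v, from the 2x2 Gram determinant."""
    return math.sqrt(max(dot(u, u) * dot(v, v) - dot(u, v) ** 2, 0.0))


def weighted_distances(n, edge_area):
    """All-pairs shortest distances with areas as edge weights."""
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), w in edge_area.items():
        dist[i][j] = dist[j][i] = min(dist[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def graph_indices(Q, edges):
    """chi0, chi1, J, mu and sigma for a connected graph with vectors Q."""
    n, K = len(Q), len(edges)
    edge_area = {(i, j): area(Q[i], Q[j]) for i, j in edges}
    degree = [0] * n  # standard valences
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    chi0 = sum(1.0 / math.sqrt(dot(q, q)) for q in Q)                 # Eq. (11)
    chi1 = sum(1.0 / math.sqrt(dot(Q[i], Q[j])) for i, j in edges)    # Eq. (12)
    d = [sum(row) for row in weighted_distances(n, edge_area)]
    M = K - n + 1                                                     # cyclomatic number
    J = K / (M + 1) * sum(1.0 / math.sqrt(d[i] * d[j]) for i, j in edges)  # Eq. (13)
    mu = sum(degree[v] * sum(Q[v]) for v in range(n)) / sum(degree)   # Eq. (14)
    sigma = sum(edge_area.values())                                   # Eq. (15)
    return {"chi0": chi0, "chi1": chi1, "J": J, "mu": mu, "sigma": sigma}


# Invented three-nucleotide chain 0-1-2 with made-up 4-tuples.
Q = [(1, 1, 0, 0), (1, 0, 0, 0), (1, 0, 1, 0)]
EDGES = [(0, 1), (1, 2)]
result = graph_indices(Q, EDGES)
print(result)
```

For a real tRNA the edge list would contain both phosphodiester links and base pairs, and `Q` would hold the PCA-derived vector of every nucleotide in its structural motif.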


In order to quantify and establish relationships with biological functions, we selected a special tRNA set. Since there are certain indications that a tRNA possesses detectable phylogenetic information (Di Giulio and Medugno, 1998), we propose that some ideas about the organization of the genetic code can be traced by studying molecular structure in the form characterized in our new model. Following the above considerations, we considered the H. marismortui tRNAs decoding amino acids belonging to the aspartate and glutamate biosynthetic pathways in order to understand the genetic code organization. This new perspective allows us to check the organization of these families in terms of precursor–product relationships. Since tRNA aminoacylation depends on specific tRNA recognition, we claim that our model captures the structural features that permit tracing some aspects of the co-evolution of the genetic code proposed by Wong 30 years ago (Wong, 1975).

To avoid the influence of sequence size on the index definition, the tRNAs were modified by using a constant size for the D and extra loops. These consensus sequences were assigned by consensus analysis of twenty different tRNA sequences from several archaea (Nicholas and Nicholas, 1997; Thompson et al., 1997). A homogeneous size of 75 nucleotides was employed. The secondary structure was predicted using the program RNAdraw (Matzura and Wennborg, 1996). Once the 2D structures were recovered, the graph-theoretical tRNAs and their indices were used in a PCA procedure to evaluate the influence of each index on the observed similarity. All the indices have an excellent discriminatory power inasmuch as all of them support a variance close to 97%. These new orthogonal indices were used in a cluster analysis (Euclidean and Pearson metrics, as well as UPGMA as a linkage method, were implemented).

The clusters obtained are statistically significant. The matrix correlation between the cophenetic value matrix and the Euclidean distance matrix for tRNAs from the aspartate biosynthetic family is 0.70, using the normalized Mantel statistic Z and the approximate Mantel t-test: t = 4.4, prob. (random Z < observed Z): p = 1, with all of one thousand random permutations being lower than Z. For tRNAs from the glutamate biosynthetic family, the matrix correlation between the cophenetic value matrix and the Euclidean distance matrix is 0.74, using the normalized Mantel statistic Z and the approximate Mantel t-test: t = 4.64, prob. (random Z < observed Z): p = 1.

Fig. 7 shows the similarity patterns observed for the tRNAs decoding amino acids belonging to each biosynthetic family. Notice the power for discriminating and characterizing each tRNA. It is possible to establish some precursor–product amino acid relationships. For instance, the tRNAs decoding glutamate and glutamine cluster together (Glu–Gln precursor–product). Another glutamate tRNA isoacceptor clusters with Arg and Pro, recovering the original biosynthetic precursor–product relationships between amino acids as presented in the evolutionary map of the genetic code (Wong, 1975, 2005). The aspartate family also shows some precursor–product relationships: Met–Thr in one group, and Asp–Lys and Asp–Thr recovered in the other. The next cluster shares no direct precursor–product relationships.

Fig. 7. Clusters obtained from orthogonal indices using Euclidean metrics and UPGMA linkage, for tRNAs that recognize amino acids belonging to the glutamate and aspartate families in H. marismortui.

5. Conclusions

Aiming to find a rigorous structural model, we proposed this new model to characterize tRNA chemical structure from a quantum and graph theoretical perspective. Our model embraces, in addition to molecular connectivity, other concepts like stacking interactions and some molecular interactions that are of the utmost importance to double helix stability. Also, the chemical environment surrounding the nucleotides could be captured, directly through the predicted tRNA secondary structure and indirectly through the quantum chemical approach.

Nucleotides and nucleotide binding were characterized in a rigorous way in our model. Vertices and edges embraced the relevant information due to the nucleotide neighborhood (vertex neighborhood) interactions. Atomic partial charges were able to capture the nucleotide nearest-neighborhood influences, and the resulting vertex and edge weights corresponded to an appropriate nucleotide chemical nature arising from the nucleotide stacking influence. The modified indices $\chi$, $J$, $\mu$ and $\sigma$, weighted using our approach, provided excellent discrimination and a one-to-one characterization of the tRNAs.

The similarity analysis showed precursor–product relationships that are in accordance with the co-evolution theory of the genetic code. Our results indicated that the grouping shows a striking correlation between tRNAs that recognize amino acids belonging to the glutamate and aspartate biosynthetic pathways.

We have presented this model as a novel way of comparing biomolecules, especially tRNAs, but in the future we hope to extend the model to cover other RNA molecules. The connectivity pattern characterizing vertices and edges recovers the tRNA structural information in a quantitative way by means of single numbers. As our results suggest, we could apply our structure characterization to other tRNAs with the purpose of understanding the evolution of the genetic code, mainly using the amino acid precursor–product relationship of the co-evolution hypothesis propounded 30 years ago.

Acknowledgements

This work was supported by the Universidad Nacional de Colombia, sede Bogotá, and Colciencias under Grant 110-05-13613. We thank Professor Rodrigo De Castro, of the Mathematics Department, for his critical reading of this manuscript. We also express our gratitude to Professor Eugenio Andrade for his valuable discussions during the development of this work.

Appendix

The tRNAs we used have accession number NC-006396, from the H. marismortui ATCC 43049 genome at NCBI GenBank.

References

Balaban, A.T., 1985. Applications of graph theory in chemistry. J. Chem. Inf. Comput. Sci. 25, 334–343.
Baliga, N.S., Bonneau, R., Facciotti, M.T., Pan, M., Glusman, G., Deutsch, E.W., Shannon, P., Chiu, Y., Weng, R.S., Gan, R., Hung, P., 2004. Genome sequence of Haloarcula marismortui: a halophilic archaeon. Genome Res. 14 (11), 2221–2234.
Basak, S.C., Gute, B.D., Balaban, A.T., 2004. Interrelationship of major topological indices evidenced. Croat. Chem. Acta. 77, 331–344.
Bermúdez, C., 2004. Caracterización de ARNs de transferencia como modelo para determinar correlaciones estructura función biológica. Master's Thesis, Universidad Nacional de Colombia, Bogotá, Colombia.
Bermúdez, C., Andrade, E., Daza, E., 1999. Characterization and comparison of Escherichia coli transfer RNAs by graph theory based on secondary structure. J. Theor. Biol. 197, 193–205.
Carbó, R., 1995. Molecular Similarity and Reactivity: From Quantum Chemical to Phenomenological Approaches. Kluwer Academic Publisher Group.
Cioslowski, J., 1993. Reviews in Computational Chemistry, vol. IV, VCH Publishers, Ch. 1.
Daza, E.E., Bernal, A., 2005. Energy bounds for isoelectronic molecular sets and the implicated order. J. Math. Chem. 38 (2), 247–263.
Daza, E.E., Villaveces, J.L., 1994. Upper and lower bounds for molecular energies. J. Chem. Inf. Comput. Sci. 34, 309–313.
Di Giulio, M., Medugno, M., 1998. The historical factors: the biosynthetic relationships between amino acids and their physicochemical properties in the origin of the genetic code. J. Mol. Evol. 46, 615–621.
Galindo, J.F., 2004. Construcción de índices grafo-teóricos ponderados cuánticamente para diferenciar tRNAs de plantas, mitocondrial y nuclear. Technical Report, Universidad Nacional de Colombia, Grupo de Química Teórica.
GAUSSIAN 98. Johnson, B.G., Robb, M.A., Cheeseman, J.R., Keith, T., Petersson, G.A., Montgomery, J.A., Raghavachari, K., Al-Laham, M.A., Zakrzewski, V.G., Ortiz, J.V., Foresman, J.B., Cioslowski, J., Stefanov, B.B., Nanayakkara, A., Challacombe, M., Peng, C.Y., Ayala, P.Y., Chen, W., Wong, M.W., Andres, J.L., Replogle, E.S., Gomperts, R., Martin, R.L., Fox, D.J., Binkley, J.S., Defrees, D.J., Baker, J., Stewart, J.P., Head-Gordon, M., Gonzalez, C., Pople, J.A., 1998. Gaussian, Inc., Pittsburgh PA.
Grosjean, H., Edqvist, J., Straby, K., Giegé, R., 1996. Formation of modified nucleosides in tRNA: dependence on tRNA architecture. J. Mol. Biol. 255, 67–85.
Harary, F., 1969. Graph Theory. Addison-Wesley, Reading, MA.
Hobza, P., Sponer, J., 1998. Significant structural deformations of nucleic acid bases in stacked base pairs: an ab initio study beyond Hartree–Fock. Chem. Phys. Lett. 288, 7–14.
Hobza, P., Sponer, J., Polášek, J.M., 1995. H-bonded and stacked DNA base pair: cytosine dimer. Ab initio second order Moller–Plesset study. J. Am. Chem. Soc. 117, 792–798.
Hou, Y., Motegi, H., Lipman, R., Hamann, C., Shiba, K., 1999. Conservation of a tRNA core for aminoacylation. Nucleic Acids Res., 4743–4758.
Hyperchem5.0, Hypercube, Inc., 1997.
Kier, L., Hall, H., 1986. Molecular Connectivity in Structure-Activity Analysis. Short Run Press Ltd., England.
Löwdin, P.O., 1998. Linear Algebra for Quantum Theory. Wiley Interscience, pp. 238–240 (Chapter 4).
Matzura, O., Wennborg, A., 1996. RNAdraw: an integrated program for RNA secondary structure calculation and analysis under 32-bit Microsoft Windows. CABIOS 12 (3), 247–249.
Mezey, P.G., 1987. Potential Energy Hypersurfaces. Elsevier, New York.
Mezey, P.G., 1993. Shape in Chemistry. VCH Publishers, New York.
MINITAB, 2003. MINITAB. Release 14.1. Statistical Software.
Mulliken, R.S., 1955. Electronic population analysis on LCAO-MO molecular wave functions. J. Chem. Phys. 23, 1833–1840.
Nicholas, K.B., Nicholas, H.B.J., 1997. GeneDoc: a tool for editing and annotating multiple sequence alignments. Distributed by the author.
Niño, M., Daza, E.E., Tello, M., 2001. A criteria to classify biological activity of benzimidazoles from a model of structural similarity. J. Chem. Inf. Comput. Sci. 41, 495–504.
Pogliani, L., 2000. The Concept of Graph Mass in Molecular Graph by Molecular Descriptors. Nova Science Publishers Inc., New York.
Randic, M., 1975. On the characterization of molecular branching. J. Am. Chem. Soc. 97, 6609–6615.
Randic, M., 1992. Chemical structure - what is she? J. Chem. Educ. 69, 713–718.
Randic, M., 2001. The connectivity index 25 years after. J. Mol. Graph. Mod. 20, 19–35.
Reidys, C., Stadler, P.F., 1996. Biomolecular shapes and algebraic structures. Comput. Chem. 20, 85–94.
Sengupta, R., Saulius, V., Yarian, C., Sochacka, E., Malkiewicz, A., Guenther, R., Koshlap, K., Agris, P., 2000. Modified constructs of the tRNA TΨC domain to probe substrate conformational requirements

ARTICLE IN PRESS 582

J.F. Galindo et al. / Journal of Theoretical Biology 240 (2006) 574–582

of m1 A58 and m5 U54 tRNA methyltransferases. Nucleic Acids Res. 28, 1374–1380. Sutcliffe, B.T., 1992. The chemical bond and molecular structure. J. Mol. Struct. (THEOCHEM) 259, 29–58. Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acid Res. 4876–4882. Trinajstic, N., 1983. Chemical Graph Theory, vols. I–II. CRC Press, Boca Raton, FL.

Villaveces, J., Daza, E.E., 1990. On the topological approach to the concept of chemical structure. Int. J. Quantum Chem. Quantum Chem. Symp. 24, 97–106.
Villaveces, J., Daza, E.E., 1997. Concepts in Chemistry. J. Wiley and Sons, England, The Concept of Chemical Structure (Chapter 4).
Wong, J., 1975. A coevolution theory of the genetic code. Proc. Natl Acad. Sci. 72, 1909.
Wong, J.T., 2005. Coevolution theory of the genetic code at age thirty. BioEssays 27, 416–425, doi: 10.1002/bies.20208.