Co-Evolution of the Genetic Code and Ribozyme Replication


J. theor. Biol. (2002) 217, 235–253 doi:10.1006/yjtbi.3013, available online at http://www.idealibrary.com



David S. Stevenson*†

†Department of Biology, University of Leicester, University Road, Leicester, LE1 7RH, U.K.

(Received on 29 September 2001, Accepted in revised form on 4 March 2002)

The origin of translation has stimulated much discussion since the basic processes involved were deciphered during the 1960s and 1970s. One strand of thought suggested that the process originated from RNA replication in the RNA world (Weiner & Maizels, 1987, 1994). In this paper I seek to extend this model. The mRNA originates as a replication intermediate of minus-strand ribozyme replication and thus contains all the genetic information contained in both the ribozyme portion and the putative tRNA-like portion of the RNA molecule. Qualitatively, this is similar to the model for the origin of chromosomes (Szathmary & Maynard-Smith, 1993; Maynard-Smith & Szathmary, 1993). This model explicitly describes the evolution of early chromosomes and the role replication played in generating the modern mRNA. Moreover, by pursuing this model, the START and STOP codons were derived and their original function with regard to the primitive 23S ribosomal RNA is suggested. Co-evolution of the genetic code (Wong, 1975) is also contained within the model. Lastly, I address some of the benefits and costs that the process may have for the organism in the context of autotrophy in the RNA world.

© 2002 Elsevier Science Ltd. All rights reserved.

Introduction

I present a hypothesis explaining how several of the key components of the modern translation apparatus could have arisen in the RNA world (e.g. Chowrira et al., 1993; Delarue, 1995; Di Giulio, 1997, 1998, 2000; Fry, 2000; Gesteland, 1999; Hall, 1979; Hopfield, 1978; Illangasekare et al., 1997; Lorsch & Szostak, 1994; Nitta et al., 1998; Noller et al., 1992; Piccirilli et al., 1992; Poole et al., 1999; Szathmary, 1989; Szathmary, 1993; White, 1976; White, 1982) and, furthermore, why the cell invested in this strategy in the first place. The model serves to unite several key hypotheses by earlier workers, combining the virtues of the genomic tag hypothesis and the co-evolution theory with the origin of the ribosome and the origin of the messenger RNA. The central two hypotheses are:

(i) The genomic tag model of Weiner & Maizels (1987, 1994).
(ii) The co-evolution theory of Wong (1975).

It was while considering the origin of the genetic code along the lines of the genomic tag hypothesis that the basis of this paper arose.

*Fax: +44-0116-252-33-30. E-mail address: [email protected] (D.S. Stevenson). 0022-5193/02/$35.00/0

Outline of the Model

The model presented builds on the proposal by Weiner & Maizels (1987, 1994) that the tRNA is a molecular fossil of the RNA world. The tRNA structure represents a site where RNA molecules were recognized by replication complexes, in the same way that certain viruses such as Qβ (Blumenthal et al., 1972) and Brome Mosaic Virus (BMV) (Hall & Wepprich, 1972, 1976, 1982; Hall, 1979; Miller et al., 1986), and the retroplasmid transcripts of the Mauriceville and Varkud plasmids of Neurospora (Chiang et al., 1992; Wang et al., 1992; Chiang & Lambowitz, 1997; Chen & Lambowitz, 1997), use tRNA-like structures to correctly position their replicase molecules. Furthermore, the CCA sequences present at the 3′ OH end of these structures (whether tRNA, viral or plasmid RNA molecules) are considered analogous to telomeric sequences in many (but not all) eukaryotes. The question then arises whether the model of Weiner and Maizels can be extended to derive other components of the translation apparatus. The following propositions are made:

(i) The key components of the translation apparatus (the aminoacyl-tRNA synthetases and the 23S ribosomal RNA) evolved to improve the fidelity of ribozyme replication in a primitive cell with a rapidly expanding biochemistry.
(ii) Co-evolution of the genetic code (Wong, 1975) is explained by co-expansion of the metabolic pathways of the cell and error-prone replication of the ribozymes that synthesize and inter-convert the amino acids.
(iii) The mRNA originated as a replication intermediate in the synthesis of minus-strand ribozymes with tRNA-like 5′ ends.
(iv) The original function of the STOP and START codons lay in replication. Both of these codons are derived through a simple replication strategy.
(v) Host–parasite interactions drive complexity. The genetic identity of the “first cell” is erased by gene transduction between primitive cells in a mixed population (Doolittle, 1999; Woese, 1998, 2000). The increased complexity of the cell is a means of regulating genetic interaction, but is itself driven further by these host–parasite interactions.

I admit this seems an ambitious scheme.
However, one is basically accepting a single proposition: that (error-prone) replication drives the evolution of translation at the molecular level, and that an increasing complexity of the genetic content entails necessary advances in the regulation of its replication. As such, the following model forms a coherent system for increasing biochemical regulation based on increasing genetic complexity.

Caveats

Before proceeding further with this model, a few points need to be made. Firstly, the organism’s genome consists of minus-strand RNA molecules (with respect to the sense of modern mRNAs). These RNAs also serve as the organism’s ribozyme-based biochemical machinery. Secondly, the cell is assumed to be autotrophic, either because it has all the genes required for autotrophy, or because it can access (through transduction) any “missing” genes whose absence would otherwise lead to auxotrophy (see below). Later, I will consider the problem of autotrophy within the context of the stochastic corrector model (Szathmary, 1989) and the hypercycle model (Eigen, 1971). Thirdly, it is probable that lateral gene transfer was extensive (Doolittle, 1999; Woese, 1998, 2000). This is a major concern for molecular modelling (e.g. Szathmary, 1989). However, I will suggest ways in which this problem may be tackled.

The Model

For the sake of simplicity, I will break the model into sub-sections, each dealing with a different facet. The first section tackled will be the cell cycle. Figures 1 and 2 outline an early and a late phase in the evolution of the cell cycle. In the early phase, only a few RNAs are present and simple replication is sufficient to ensure the continuation of the genetic heritage of the cell. At this stage the stochastic corrector model of Szathmary (Szathmary, 1989; Szathmary & Maynard-Smith, 1997) would (or could) apply. Here, a relatively simple genetic content can be transmitted from parent to progeny cell as long as the abundance of each RNA species is sufficiently high and the number of progeny cells is sufficient to ensure that at least some cells in the population obtain the full genetic complement of the parent.
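The dependence on copy number and progeny number described for the stochastic corrector model can be illustrated with a minimal sketch. It is not taken from the cited papers: it assumes that every RNA molecule is passed to a given daughter cell independently with probability 1/2, and that daughters can be treated as independent trials. The function names are hypothetical.

```python
def p_full_complement(copies_per_species: int, n_species: int) -> float:
    """Probability that one daughter inherits at least one copy of every species.

    Assumption (not from the text): each molecule goes to this daughter
    independently with probability 1/2.
    """
    p_species_missing = 0.5 ** copies_per_species
    return (1.0 - p_species_missing) ** n_species


def p_any_complete(copies_per_species: int, n_species: int, n_progeny: int) -> float:
    """Probability that at least one of n_progeny daughters is complete.

    Simplification: daughters are treated as independent trials.
    """
    p = p_full_complement(copies_per_species, n_species)
    return 1.0 - (1.0 - p) ** n_progeny


if __name__ == "__main__":
    # Five RNA species, eight progeny cells, increasing copy number per species.
    for copies in (1, 2, 5, 10):
        single = p_full_complement(copies, n_species=5)
        any_ok = p_any_complete(copies, n_species=5, n_progeny=8)
        print(copies, round(single, 4), round(any_ok, 4))
```

Under these assumptions, raising either the copy number per species or the number of progeny quickly pushes the chance of at least one complete daughter towards one, which is the qualitative point made above.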

[Figure 1 diagram labels: Degradation; Simple replication loop for minus-strand ribozymes; Replicase; Plus-strand intermediates.]

Fig. 1. The earliest cell cycle. Minus-strand RNA molecules are fed through the replicase. Plus-strand intermediates play no further part in cellular activities and are degraded to recycle ribonucleotides.

[Figure 2 diagram labels: Partially coded polypeptides; Proto-mRNA; Proto-ribosome; Plus-strand intermediate; Copy uncapped minus-strand RNAs; Capped minus-strand RNA; Replicase; Aminoacyl-tRNA synthetase; Acetylated minus-strand RNAs; Uncapped minus-strand ribozymes (catalytically active and replication deficient); Free hydrolysis; Amino acids.]

Fig. 2. Extending the model shown in Fig. 1, indicating where partial coding of protein synthesis occurs with respect to the cell cycle.

In Fig. 1 the plus-strand intermediate is shown degraded following replication, as this initially plays no further part in the genetic content of the cell. It is presumed that the cell may behave like a modern-day RNA virus (e.g. Barton et al., 2001) with bias in strand replication (Szathmary & Maynard-Smith, 1993; Maynard-Smith & Szathmary, 1993). However, this is irrelevant at this stage and will not be considered in detail. In Fig. 2, elaboration of the cell cycle is shown. The cell aminoacylates the tRNA-like structures on each ribozyme associated with amino acid metabolism. The cell develops these mechanisms to improve the efficiency of genetic transmission. Each RNA molecule is labelled, directing it to the replicase. Unlabelled ribozymes do not get replicated.

Mechanistics 1: The Formation of the mRNA

I will now discuss the mechanistics that lead us from the cell cycle shown in Fig. 1 to that shown in Fig. 2. I employ the copy-choice mechanism (Kennel et al., 1994; Maynard-Smith & Szathmary, 1993; Tolskaya et al., 1983) to produce a polycistronic plus-strand intermediate (the mRNA). Replication is initiated de novo, opposite the C2 nucleotide of the 5′ tRNA-like structure (Hall & Wepprich, 1976; Mohr et al., 2000; Wang et al., 1992; Weiner & Maizels, 1987). In this scheme, the replicase jumps from the 5′ end of one minus-strand RNA to the 3′ end of the next molecule, continually copying each RNA. This is illustrated in Fig. 3(a). The Mauriceville and Varkud plasmids often produce concatamerized copies of themselves. Maynard-Smith and Szathmary suggested this mechanism (Maynard-Smith & Szathmary, 1993; Szathmary & Maynard-Smith, 1993) as a means of producing chromosomes.

Completing Replication: a Role for the Ribosome and an Early START and STOP Codon

The origin of the mRNA, through a copy-choice replication mechanism, left open the problem of completing replication. As I mention in the discussion, the completion of viral RNA synthesis is still unclear. Therefore, any model based on viral replication strategies must be regarded as tentative, for now. However, whilst working with this model, the possibility arose of incorporating functional START and STOP codons before protein synthesis, per se, evolved. Perversely, I will tackle the STOP codon first, as this was the easiest to derive through the model.

In Fig. 3(a) the “16S” RNA component (the replicase) has completed the formation of a plus-strand concatamerized intermediate and dissociates from the 3′ end of the plus-strand molecule. The ancestral “23S” component then attaches to the replicase molecule and this complex re-attaches to the 5′ end of the plus-strand RNA. The complex then cleaves the amino acid from the first tRNA tag CCA-OH it encounters. Once this has happened, the replicase-proto-ribosome complex (analogous to the full ribosome of today) scans down the plus-strand molecule until it encounters the UGG triplet copy of the tRNA CCA tail of the next copied minus-strand RNA. At the UGG, the ancestral replicase-ribosome complex stops scanning and cleaves the next amino acid from the 3′ end of the tRNA tag. In this way, a signal encoded in the RNA serves as a “STOP” codon for the ribosome.

There are two points to make here. Firstly, this “STOP” codon is very similar to modern STOP signals: all STOPs have the sequence URR (where “R” is any purine). UGG is the only modern codon for tryptophan, while the STOP signals are UAA, UAG and UGA. Is it possible that the code has evolved by assigning this “STOP” codon to a new amino acid? Woese et al. (1966) and Wong (1975) take different sides in this aspect of the discussion (see Weber & Lacey, 1978, for further discussion). Similarly, one might argue that the use of UGA to encode selenocysteine, as well as its more common use as a STOP (Ma et al., 2002), could be regarded as a transitionary phase in the evolution of this codon. Moreover, if this is true then it suggests a leaky code at the stage of evolution discussed here. This is a reasonable assumption and fits with the argument on the co-evolution theory and ribozyme replication (below). The second point to make considers what happens after the proto-ribosome-replicase

[Figure 3 diagram labels. Panel (a): Replicase; “23S” (Regulator); Coupling of replicase to proto-“23S” regulator; Replicase-ribosome tracks from 5′ to 3′; Cleavage of amino acid tags and priming of negative-strand synthesis on exposed hydroxyl groups; the “ribosome” tracks from 5′ to 3′ along the plus-strand RNA. Panel (b): Copy-choice priming of plus-strand RNA synthesis; 23S complex binds to START sequence (CCAUGG); 23S disengages at end of RNA; product aa-aa-aa. Strands are drawn with CCA-aa and CCA-OH tags on the minus strands and GGU (UGG read 5′ to 3′) copies on the plus strand.]

Fig. 3. (a) Production of plus- and minus-strand RNAs from minus-strand ribozyme molecules. Concatamerized plus strands are synthesized by copy-choice shuffling of the replicase. This follows the scheme of Lambowitz et al. (see text) for the formation of concatamerized Mauriceville and Varkud plasmids from RNA precursors. A hypothetical scheme is also shown for the synthesis of the progeny minus-strand RNAs from the concatamer. Here, the “23S” ribosome component associates with the replicase after completion of plus-strand synthesis. The combined replicase-23S complex scans down the plus strand in a 5′ to 3′ direction until it finds a UGG “STOP”. The “23S” cleaves the amino acid tag off the minus strand, exposing a 3′ hydroxyl to prime minus-strand synthesis. Upon hydrolysis of the amino acid, the “23S” dissociates from the replicase. The replicase then extends the minus-strand RNA. This process continues along the plus strand, with dissociation and re-association cycles, until all the amino acid tags are removed. Thus another aspect of translation can be tied to replication. This stage is equivalent to that shown in Fig. 1. See text for further details. (b) Formation of the START codon by copy-choice initiation of plus-strand RNA synthesis. The scheme is largely identical to that shown in Fig. 3(a), except that the ribosome does not disengage from the mRNA until the last minus-strand molecule has been uncapped, this time by transfer of the amino acids to the growing polypeptide chain. Minus-strand RNAs are carried with the 23S from one “codon” to the next and the amino acids are linked together by peptidyl transferase activity. The last minus-strand OH reached primes RNA synthesis.

complex has stopped elongating on the plus-strand molecule. In a modern cell the ribosome disengages from the RNA once it reaches a STOP codon. I would suggest something similar could have happened with replication. In this scenario, the “23S” component disengages from the replicase and plus-strand RNA after cleaving the amino acid at the “STOP”. However, in this case the process yields a 3′ OH group on the minus-strand molecule that is used by the replicase to prime the completion of minus-strand replication, as illustrated. Thus, the ribosome “23S” component has already evolved a simple code to regulate replication, cleaving off the positive signal from the amino acid tag and liberating a 3′ OH group in a suitable location to prime minus-strand synthesis.

As can be seen, the process can run to completion on the plus-strand RNA (our mRNA). Once the first amino acid is cleaved, the replicase extends the minus-strand molecule and the amino acid tag is lost. The cycle continues with the completion of minus-strand replication. After completion of the first minus strand, the ribosome and replicase re-associate and scan down the plus strand until the next UGG is reached. (The first signal is ignored, as the amino acid tag is missing and the context of the signal has altered on the now extended minus strand.) The amino acid is cleaved from the 3′ end of the next minus strand, the ribosome dissociates and the replicase primes synthesis once more. This process repeats until all the minus-strand molecules have been copied.

Using an RNaseP-like molecule (Weiner & Maizels, 1987, 1994), the concatamerized daughter minus-strand RNAs can be cleaved at each internal CCA, liberating functional minus-strand ribozymes once more. There then remains the “5′ problem” for the last tRNA ribozyme. However, assuming the cell has more than one copy of each minus-strand RNA, no harm will come as long as at least one copy of each RNA molecule gets replicated (Szathmary, 1989).
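The sequence relationships used in this scheme can be checked with a toy sketch. It only manipulates strings: the three minus-strand sequences are hypothetical, chosen so that CCA occurs only at their 3′ ends, and the functions model base pairing and cleavage sites rather than any enzymology. The URR test simply encodes the pattern stated above.

```python
import re

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Hypothetical minus-strand ribozymes (5'->3'), each ending in a CCA tag.
MINUS_STRANDS = ["GAUUACGAUCCA", "AGGCUUAGCCCA", "UUGACGGAUCCA"]


def revcomp(rna: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def copy_choice_plus_strand(minus_strands):
    """Concatenate the copies made as the replicase jumps between templates."""
    return "".join(revcomp(m) for m in minus_strands)


def ugg_positions(plus_strand: str):
    """Positions of every UGG triplet met while scanning 5'->3'."""
    return [i for i in range(len(plus_strand) - 2) if plus_strand[i:i + 3] == "UGG"]


def cleave_after_cca(concatemer: str):
    """RNaseP-like cleavage: cut the daughter concatemer after each CCA."""
    return re.findall(r".+?CCA", concatemer)


def is_urr(codon: str) -> bool:
    """True for codons of the form U-purine-purine."""
    return len(codon) == 3 and codon[0] == "U" and codon[1] in "AG" and codon[2] in "AG"


if __name__ == "__main__":
    plus = copy_choice_plus_strand(MINUS_STRANDS)
    print(plus)
    print(ugg_positions(plus))              # one UGG at the start of each copied unit
    print(cleave_after_cca(revcomp(plus)))  # daughter minus strands, in reverse order
```

In this toy case each copied CCA tail appears in the plus strand as a UGG that marks the start of one unit, and copying the concatemer back followed by cleavage after each CCA recovers the original minus strands. Real sequences would contain additional UGG and CCA triplets, which is why the text appeals to the context of the signal.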
As you can see, there is one weakness with this model: why would the ribosome couple at all to the RNA when the first sequence it encounters is a UGG (“STOP”)? Is there anything that can be done to ensure the ribosome complex skips the “STOP” at the beginning of the plus strand? Figure 3(b) explores this question. The model in Fig. 3(a) assumes the early replicase could begin RNA synthesis without a primer. This is not an unreasonable assumption given the behaviour of some viral RNA polymerases and the reverse transcriptase of the Mauriceville and Varkud plasmids. However, what if a primer was required? Figure 3(b) illustrates how intermolecular priming with another tRNA-CCA tag could have produced the START (AUG) codon. Interestingly, this facet of the model produces not only the AUG codon but also the correct context for this triplet in modern mRNA (Kaufman, 1994). The advantage of primed RNA synthesis is that the process can be regulated more efficiently. It also means that each plus-strand intermediate carries the CCAUGG internally to the 5′ end of the molecule, exactly as seen in all modern mRNAs. And like modern mRNAs, the “16S” ribosomal RNA (the replicase) has to scan 50–100 bases down the molecule, from the 5′ end, to reach the start codon. Furthermore, this scheme permits the replicase to exclude the RNA primer when synthesizing the progeny minus-strand RNAs. Importantly, this mechanism also removes the initial UGG signal (below). A common RNA primer could have been used for each plus-strand synthesis, or it could have been a molecule specific to each pathway. However, regardless of the nature of the primer used, the sequence at the 5′ end of the copied portion of the plus-strand molecule would always be CCAUGG. This gave a common sequence for the “16S” (replicase) to recognize and bind the “23S”. The “16S” binds to the 5′ end of the plus strand and scans down the molecule until the CCAUGG is reached; the “23S” then binds, and the modern system of translation initiation has been developed without any forward requirement for protein synthesis.
The priming mechanism for plus-strand synthesis is similar, though not identical, to the mechanism whereby tRNAs are incorporated into suppressor variants of the Mauriceville plasmid (Chiang & Lambowitz, 1997). I propose this may have happened in early RNA polymerization during plus-strand synthesis.
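A short string-level sketch shows why a CCA-terminated primer placed in front of a copy of a CCA-terminated template always gives the junction CCAUGG. The primer and template sequences are hypothetical; only their CCA 3′ ends matter, and the function name is an assumption made for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

PRIMER = "GCGGAUUCCA"      # hypothetical tRNA-like primer, 3' end is CCA
TEMPLATE = "GAUUACGAUCCA"  # hypothetical minus-strand ribozyme, 3' end is CCA


def revcomp(rna: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def primed_plus_strand(primer: str, minus_strand: str) -> str:
    """Plus strand made by extending the primer's 3' OH along the template."""
    return primer + revcomp(minus_strand)


if __name__ == "__main__":
    product = primed_plus_strand(PRIMER, TEMPLATE)
    junction = product.find("CCAUGG")
    print(product)
    print(junction, product[junction + 2:junction + 5])  # position of CCAUGG, embedded AUG
```

Without a primer, the copy of the template begins with UGG; with the primer, the same UGG is preceded by CCA, so the AUG triplet sits inside CCAUGG a primer's length away from the 5′ end.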


In this extended scheme, the ribosome complex can ignore the RNA primer and the first UGG by binding directly to the sequence CCAUGG in a manner analogous to the modern “16S–23S” complex. The ribosome-replicase can now distinguish the RNA molecules it is to replicate from the primer. Once again, cleavage of the primer from the CCAUGG sequence of the intermediate could be accomplished using an RNaseP-like enzyme, thus regenerating the correct 5′ terminus of the plus-strand intermediate.

Could replication be the origin for polarized translation?

In life, simplicity is usually best. Since this model outlines a very simple role for the ribosome in replication, I remain unconvinced by the argument of Poole et al. (1999) with regard to the replication strategy. However, their role for the ribosome’s “23S” subunit was very similar to the one presented here and, therefore, requires discussion. Poole et al. see the ribosome as derived from the replicase, with the enzyme catalysing triplet elongation of RNA molecules via the future 16S subunit. This subunit binds tRNA and mRNA and can ratchet along the mRNA as a polymerase would be expected to do. Poole et al. view the 23S as a sort of regulator of the process, perhaps stabilizing the interaction of the tRNA and the mRNA through amino acid attachment. The role I envisage for the ancestral “23S” rRNA is much more dynamic and ties the ribosome to decoding, with this role an intimate function of RNA replication. To summarize this section, I propose that the function of the early ribosome was to read signals along the plus-strand replication intermediate and instruct the replicase where to prime daughter minus-strand RNA synthesis. It is possible that this synteny in sequence requirement (between the modern ribosome and the model outlined here) is produced by chance. Yet it seems worthwhile to explore these ideas further. I will leave that topic for discussion.
However, the similarity between the modern STOP and the sequence produced by replication of the tRNA tags seemed striking and worth considering, as did the origin for the START codon. The latter strategy produces an exact match with modern mRNA signals and it seemed unlikely that it could have arisen by chance.

Replication and Co-Evolution of the Genetic Code

If the cell has ribozymes synthesizing (or interconverting) amino acids and these are replicated by an error-prone polymerase, then the co-evolution theory naturally follows. In the proposed scheme, each amino-acid-synthesizing ribozyme has a tRNA-like tag that identifies that particular enzyme or biochemical pathway. Each pathway has its own code: this allows the cell to regulate the synthesis of new ribozyme molecules on a specific biochemical basis and not generally through a standard (modern) cell cycle. Since the replication machinery can now discriminate between biochemical pathways, the cell’s resources can be more effectively utilized. As error-prone replication proceeds, mutations can alter either the “anticodon” on each tRNA-like tag or the biochemical activity of the ribozyme. As long as mutations do not abolish the recognition of the biochemical pathway (or the molecule as a whole), the natural outcome is co-evolution (Wong, 1975) of the code with the biochemical activity of the ribozyme (and hence the amino acids). This section of the model has four requirements:

(1) The amino acids tag those ribozymes that make those amino acids.
(2) The replication process is error prone.
(3) A genetic code (of sorts) is present and is related to the biochemical pathways rather than specific amino acids.
(4) The code is sufficiently wobbly that single base substitutions do not abolish replicase binding and RNA synthesis.

The last point is important with regard to Fig. 3(a) and 3(b). See Di Giulio (1998) for further discussion of this topic.

From Aminoacyl Esterase to Peptidyl Transferase

The model illustrated in Figs 2 and 3(a) does not take the system through to protein synthesis, but it does introduce the key players: the ribosome, the mRNA and the aminoacyl transferases. Although I have yet to fully model the step between the uncapping activity of the ribosome and modern translation, some ideas are slipping into place. Intimate knowledge of ribosomal structure and function will be important here (Carter et al., 2001; Nissen et al., 2001; Ogle et al., 2001; Wimberley et al., 2001). Figure 3(b), therefore, forms a basis for further investigation. In the model shown in Fig. 3(a) and 3(b), the proposed signal for uncapping is the UGG. Clearly, this signal is not in an equivalent position to the copy of the ancestral anticodon in a modern tRNA: this signal lies in the copy of the acceptor stem. It is known that there are determinants of amino acid identity on the acceptor stem (Di Giulio, 1998) and this suggests a possible solution. Imagine the evolutionary steps in Figs 2 and 3(a) arose while the code was carried on a minihelix and not a full tRNA. At this stage the replicase-ribosome complex reads the UGG and stops scanning along the plus strand from 5′ to 3′. The amino acid determinant on the minus-strand molecule lies adjacent to the CCA tail. However, when two minihelices are joined to make a tRNA (Di Giulio, 1992; Dick & Schamel, 1995; Schimmel et al., 1993; Schimmel & Ribas de Pouplana, 1995), the major determinant moves from the acceptor stem to the anticodon loop. It is possible that the transition from esterase activity in the peptidyl transferase centre to bona fide polymerization coincided with the duplication of the ancestral minihelix structure. It is also possible that it was at this stage that the ribosome changed from reading the reiterated UGG as a signal in the mRNA to reading the mRNA copy of the determinant in the anticodon (Dick & Schamel, 1995). However, the choice of using a triplet code had been set. Thus, at this stage the ribosome processed the minus-strand molecules by moving along the mRNA in the 5′ to 3′ direction (as before).
The proto-ribosome ignored the repeated UGG sequences while proceeding along the mRNA. When it encountered the new “codon” signal (copied in the mRNA from each tRNA tag) it cleaved the amino acid by transferring it to the next minus-strand aminoacyl tRNA tag instead of reacting the ester link with water. One advantage of the peptidyl transferase reaction over hydrolysis relates to the ribosome-replicase itself. In the scheme in Fig. 3(a), the replicase complex uncouples from the “mRNA” at the UGG signal (corresponding to the end of the copied minus-strand RNA). This requires hydrolysis of the amino acid tag, uncoupling of the complex from the plus strand, then reinitiation of replicase-subunit (proto-16S) extension on the plus strand. With peptidyl transferase activity, the complex can proceed down the entire “mRNA” until it reaches the last UGG. Polymerization then produces a single daughter minus-strand RNA. An RNaseP-like enzyme would then cleave this concatamer, liberating the daughter ribozymes. Clearly this scheme would speed up replication by removing the need to continually couple and uncouple the “23S” subunit from the “16S” subunit during the replication process. I am not entirely satisfied with this part of the model at present, though it is not unreasonable. However, I feel enough of the key actors are on stage to be getting a feel for the play.

One benefit this replication model has overall is conferred by the manner in which the plus-strand RNAs are made. In the model illustrated in Fig. 3(b) we have a defined START for plus-strand replication and a START for “23S” binding and processing of signals on the mRNA. However, the order of the minus strands copied in the plus strand is random in this model. If the cell produces more than one copy of each plus-strand intermediate, then the corresponding proteins produced during replication will differ. When the cell switches from maintaining a highly segmented minus-strand genome to a less segmented plus-strand one (Szathmary & Maynard-Smith, 1993), the arrangement of minus-strand RNAs replicated within this plus-strand molecule will become fixed.
Thus, by the model shown, the proteins produced during the replication cycle will also become fixed. The cell will now have invented several proteins by the random assortment of minus-strand RNA molecules during earlier replication. From here it is easy to allow further selection on a riboprotein cell rather than the ribozyme-dominated one: the ribozymes’ days are now numbered.
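The two counting arguments in this part of the model can be made concrete with a small sketch. The amino acid tags are hypothetical examples, and the cycle count is a deliberate simplification: one “23S” coupling per minus strand under hydrolysis, and a single coupling per mRNA under peptidyl transfer.

```python
from itertools import permutations


def peptides_from_random_order(tags):
    """All peptides obtainable when tagged minus strands are copied in any order."""
    return {"-".join(order) for order in permutations(tags)}


def coupling_cycles(n_minus_strands: int, peptidyl_transferase: bool) -> int:
    """Number of 23S/16S coupling events needed per plus strand (simplified count)."""
    return 1 if peptidyl_transferase else n_minus_strands


if __name__ == "__main__":
    tags = ["Gly", "Ala", "Asp"]  # hypothetical amino acid tags
    for peptide in sorted(peptides_from_random_order(tags)):
        print(peptide)
    print(coupling_cycles(len(tags), peptidyl_transferase=False))
    print(coupling_cycles(len(tags), peptidyl_transferase=True))
```

With n distinct tags a random copying order allows up to n! different peptides, whereas a fixed arrangement on a single plus-strand chromosome yields just one, which is the sense in which the proteins become fixed.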


An Early Role for Protein Synthesis

I now ask why the cell invented protein synthesis in the first place, and whether any solution can be accommodated within this model. A number of hypotheses have been presented but none is completely satisfactory. The synthesis of semi-random polypeptides could have provided simple structural functions to ribozymes (see Di Giulio, 1998; Orgel, 1989; Wong, 1991). Furthermore, Orgel (1989) and Poole et al. (1999) suggested aminoacylation could help facilitate replication by directly increasing the binding of the tRNA portion of the molecule to the replicase. This is true of the Qβ phage of E. coli (Blumenthal et al., 1972) and of BMV in plants (Bastin & Hall, 1976) and is, therefore, reasonable. A further suggestion I would make is that if the cell is cycling in the manner I suggest here, then an “ambitious” ribozyme would make a large amount of its product molecule and hence drive its own replication further, potentially taking over the cell. Clearly, this would be an unfortunate outcome for the cell. By binding the amino acid tags from each ribozyme in a protein complex, the cell would limit the access of the aminoacyl transferases to the amino acids required to continually drive the replication cycle forward. Once the ribosome is capable of handling amino acid tags, subsequent steps are selectable. Polymerization of amino acids in the cell cycle would carry three other advantages. Firstly, if we accept that amino acids were important to the ribo-organism (Szathmary, 1999, 1993; Yarus, 1998), then limiting their accessibility to parasites would be advantageous: polymerization would be beneficial. Secondly, by limiting availability of amino acids to the cell’s biochemical machinery, particular biochemical pathways could be regulated efficiently. Thirdly, large quantities of free amino acids would alter the osmotic potential of the cell more significantly than if they were bound up as proteins (see below).
Therefore, it was useful to polymerize them in proteins, in a similar manner to the polymerization of monosaccharides in modern organisms. Finally, it is possible that synthesizing random polypeptides conferred some other benefit to the cell that is now lost: we may never know. However, I have suggested the cell can make non-random proteins once the change is made from maintaining the minus-strand chromosomes to the plus-strand ones. Selective pressures can act on any newly synthesized protein and hence drive further evolution in the system.

The efficiency of the mechanism outlined here depends sensitively on the kinetics of free hydrolysis, which are hard to determine for a primitive cell without knowledge of its environment (Orgel, 1989; Assouline et al., 2001; Yusim et al., 2001). However, free hydrolysis does not interfere with the amino acid tag’s role in directing the RNA for replication. As Fig. 2 shows, any RNA losing its tag by free hydrolysis simply fails to replicate. However, if aminoacylation is biochemically regulated, as one might suppose, the RNA will simply be redirected for re-acylation until it is successfully replicated. (An alternative cycle where de-acylation activates replication crumbles under pressure from any parasitic RNA.) As free hydrolysis does not block translation in modern cells, I do not consider this a problem. It does, however, introduce an energy cost to the cell. The dynamics of the system are difficult to assess. I have proposed some aspect of regulation by sinking the amino acids, following replication, in proteins. However, I feel viewing the cycle in isolation is too one-dimensional. One must consider that a complex cell would have other forms of biochemical feedback regulating both its metabolism and replication.

Autotrophy and Eigen’s Paradox

Why bother considering this model at all? Is there any reason to suppose complex aminoacylation reactions on multiple chromosomes were required as part of a primitive cell cycle? I will attempt to justify my conclusions here. Eigen (1971) showed that any cell with error-prone replication (Johnston et al., 2001) would experience difficulty maintaining its genetic composition once its complexity increased beyond a certain point.
Eigen proposed a hypercycle model, whereby certain co-operativity between “chromosomes” or genetic units could salvage the organism from genetic crises (see also Nemoto & Husimi, 1995). Cellularization was beneficial (Eigen & Schuster, 1977; Szathmary,


1989) as was the formation of longer chromosomal units (Maynard-Smith & Szathmary, 1993; Szathmary & Maynard-Smith, 1993). However, for the hypercycle to work, cellularization would have to have happened late, otherwise boundaries might interfere with the running of the hypercycles (reviewed in Fry, 2000). Szathmary’s stochastic corrector model circumvents these problems. Simply put, and as mentioned above, if the cell produces sufficient copies of each RNA molecule, and produces enough progeny, then there is a good probability that one or more of the progeny will contain the full complement of genes for survival. Furthermore, chromosomization enhances gene transmission by linking genes together (Maynard-Smith & Szathmary, 1993; Szathmary & Maynard-Smith, 1993). However, there is a problem with chromosome formation, and this is the problem the model initially sought to address: Eigen’s information crisis (Eigen, 1971). The problem is a function of replication efficiency. Eigen (1971) and Szathmary (adopting Eigen’s assumptions) suggest that the early cell, replicating with ribozymes, probably had a low fidelity of replication. (See Johnston et al., 2001 for a reality check with regard to these assumptions.) This view might be backed by considering present-day RNA viruses, albeit with vastly more efficient protein replicases (Poole et al., 1999). The largest RNA viruses with a single chromosome have genomes no larger than 33 kb [the coronaviridae (Buchen-Osmond, 1998) have the largest genomes], with other, slightly smaller viral genomes adopting a segmented approach [e.g. the orthomyxoviridae (McCauley & Mahy, 1983) and the reoviridae (Wickner, 1993)]. In the reoviridae each segment is 1–4 kb in size, with the total genome around 20–25 kb. In the case of the orthomyxoviridae and reoviridae, each segment encodes a single protein (or at most two), while the coronaviridae encode a handful of proteins on their single genomic RNA.
Although this limited genetic content may reflect a minimal requirement for a cellular parasite, it may also reflect the lower stability of the viral RNA genome relative to the cell’s DNA genome. Clearly, if the latter is a limiting factor,

then Eigen is correct in choosing a maximum RNA “gene” size in the range 100–1000 bases. Further, if the model of Weiner and Maizels is correct and each ribozyme required a minihelix or tRNA-like head, then 50–100 bases of the gene were already accounted for in the replicase recognition sequences. It should be noted that the replicase efficiency in a modern RNA virus is expected to be significantly greater than that of a primitive ribozyme (Eigen, 1971). Therefore, the genome size attained by a ribo-organism might be expected to be less than that of a modern virus. However, this is not a straightforward argument. Aside from the inherent error rate a polymerase may have, one other factor is important: processivity. The size of the template the enzyme can copy is determined by the processivity of the enzyme, not its error rate. If processivity is low then the maximal size of each RNA segment is reduced. However, this is not the same as saying the genome size is reduced. A highly segmented genome is possible even with an error-prone replicase (Doolittle, 1999; Woese, 1998, 2000).

I am unconvinced that DNA can rescue the genome here. The RNA-dependent RNA polymerases and reverse transcriptases are homologous at the protein and DNA level (Webster et al., 1989; Xiong & Eickbush, 1988, 1990; Lazcano et al., 1992). Unless we toss the rules of evolution aside, we must conclude that reverse transcriptases evolved from RNA-dependent RNA polymerases after protein synthesis had started but before DNA was present. If reverse transcription arose in the RNA world, then why would we suppose the sequence of the genes encoding these would be similar? There is no reason to believe that the sequence of a ribozyme that made DNA would be similar to one that made RNA. DNA is not the solution, at least not yet.

An additional concern comes from the evolutionary studies of Koonin (2000) and Kyrpides (Kyrpides et al., 1999).
Using a statistical analysis of bacterial genomes, it appeared that the minimal bacterial genome contained approximately 250 common genes. The smallest known bacterial genome belongs to the human parasite Mycoplasma genitalium. This contains


480 open reading frames and is an obligate parasite (an auxotroph). Naturally, the organism may have gained and lost genes as it adapted to life as an intracellular parasite. However, the genome size is a reasonable starting point for the consideration of autotrophy in our ribo-organism (Koonin, 2000). Koonin estimated that if every conceivable metabolite was supplied to the minimal organism, then the gene count could fall to ~100. This still leaves the bacterium with complex translation, transcription and replication machinery. Trimming this to bare basics for a ribo-organism with no DNA or advanced translation would still leave ~50 genes to transmit. With some hand-waving uncertainty it is just possible to shrink the genome to accommodate these genes on 2–3 coronaviridae-sized RNA molecules. Overlapping functions (Schultes & Bartel, 2001) could trim this even more. However, we are now stuck with a ribo-organism obtaining all its raw materials in ready-made form from a stress-free environment. I am unconvinced such an organism ever existed. I conclude this because the building blocks of RNA have not been found in any cosmological or geological setting. However, I note this conclusion may alter if any contradictory evidence is uncovered.

If we examine the geological/astronomical evidence, some hope is given by recent finds in the Murchison meteorite (Cooper et al., 2001) and from the chemistry of olivine (Freund et al., 2001). The Murchison meteorite belongs to a class of objects called carbonaceous chondrites. These are rich in reduced carbon compounds including amino acids and, recently discovered, sugars. It is believed that much if not all of the Earth’s inventory of these reduced carbon compounds (and possibly water) was delivered during the Hadean period from 4.6–3.9 Ga. Interestingly, there is evidence for life only 50–100 million years after the end of the Hadean (Rosing, 1999).
However, there is no evidence that these carbonaceous chondrites (compositionally identical to the raw material in the solar system) contain the building blocks of RNA (Cooper et al., 2001). Therefore, we are forced to conclude the ribo-organism had to make at least


some of its own raw materials, expanding its genome size, probably beyond the minimum 50–100 genes.

What’s the problem? If each gene is (assuming a mid-range value) 500 bases long and the cell has a sufficient number of genes to be autotrophic (~300–2000?), then the total genome size would lie in the range 1.5 × 10^5 to 1.0 × 10^6 nucleotides. [The recently published ribozyme replicase of Johnston et al. (2001) was 204 bases long, for comparison.] [The smallest sequenced genomes of autotrophs (e.g. E. coli and H. influenzae) contain around 3000 genes (Koonin, 2000).] Clearly, by Szathmary and Eigen’s criteria on RNA stability, this could not have been organised in a single chromosome: it would have to be fragmented like the reoviridae or orthomyxoviridae. The question is, then, how many chromosomes might the organism have had, and would the stochastic corrector model alone provide a sufficient probability of successful gene transmission? If replicase processivity was low, then we could be looking at randomly assorting hundreds of gene segments from parent to progeny cells. This sounds like a tall order. (However, it would be interesting to follow the outcome of the stochastic model when there are a large number of gene segments.)

Instead, I propose that, initially, like certain viruses, the genome of the early organism was highly fragmented (see also Doolittle, 1999; Woese, 1998). And, like the viruses, the addition of molecular tags to each molecule could be used to enhance genetic transmission. Starting with tRNA-like tags, the cell elaborated this system to improve transmission of its burgeoning collection of genes. To this end, Hall (Bastin & Hall, 1976) suggested that the Brome Mosaic Virus (BMV) might aminoacylate its tRNA-like genomic terminal structure to improve replication efficiency when template abundance is low. In essence, strong selective pressure is put on aminoacylation of certain RNA molecules to enhance transmission from parent to daughter cells.
It is for the reasons outlined in this section that I believe the ribo-organism had a large, segmented genome and required tagging to regulate replication.
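
The arithmetic of this section can be illustrated with a short Python sketch. It reproduces the genome-size range quoted above (500 bases per gene, 300–2000 genes) and then, under assumptions that are not part of the original argument, estimates how many segments such a genome would need and how likely a daughter cell is to inherit at least one copy of every segment by random assortment alone. The 4 kb segment ceiling (borrowed from the upper end of the reoviridae segment sizes), the copy numbers, the independent 50:50 partition of each copy between two daughters, and all function names are illustrative assumptions. The error-threshold function uses the commonly quoted approximation to Eigen's criterion, L_max ≈ ln(σ)/(1 − q), with hypothetical values for the per-base fidelity q and the selective superiority σ.

```python
import math


def genome_size_range(gene_length, min_genes, max_genes):
    """Total genome size (in bases) for the low and high gene counts."""
    return gene_length * min_genes, gene_length * max_genes


def segments_needed(genome_size, max_segment_length):
    """Minimum number of segments if no segment may exceed the ceiling."""
    return math.ceil(genome_size / max_segment_length)


def eigen_max_length(per_base_fidelity, selective_superiority):
    """Approximate error threshold: L_max ~ ln(sigma) / (1 - q)."""
    return math.log(selective_superiority) / (1.0 - per_base_fidelity)


def p_daughter_complete(num_segments, copies_per_segment):
    """Toy random-assortment model (an assumption, not the published
    stochastic corrector model): every copy of every segment goes to a
    given daughter with probability 1/2, independently of all others.
    Returns the chance that this daughter gets >= 1 copy of each segment."""
    p_one_segment = 1.0 - 0.5 ** copies_per_segment
    return p_one_segment ** num_segments


if __name__ == "__main__":
    low, high = genome_size_range(500, 300, 2000)
    print(low, high)  # 150000 1000000

    for size in (low, high):
        k = segments_needed(size, 4000)  # 4 kb ceiling is an assumption
        for copies in (5, 10, 20):       # hypothetical copy numbers
            p = p_daughter_complete(k, copies)
            print(f"{size} bases, {k} segments, {copies} copies: p = {p:.4f}")

    # Hypothetical fidelity of 99% per base and sigma = e gives about 100 bases
    print(round(eigen_max_length(0.99, math.e)))
```

In this toy model the probability of inheriting a complete set falls steeply as the number of segments rises unless each segment is present in many copies, which is the quantitative form of the "tall order" described above.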


Genomic Tagging for Transmission

Interestingly, in the picornaviridae (Barton et al., 2001; Wimmer et al., 1987) tagged RNAs are packaged and delivered to progeny virions whilst untagged ones serve as mRNAs. The picornaviruses are intriguing in that they have clover-leaf structures at the 5′ end of their sense-orientated genomes. Although these are not closely related to tRNAs in structure (Barton et al., 2001), they appear to promote initiation of minus-strand synthesis through a looping scheme which may be relevant here. The early cell could employ a related strategy, directing tagged RNAs to progeny cells whilst retaining those non-tagged molecules produced by replication. In this light the tags could serve an additional function as primitive centromeres, a further corollary to the model of Weiner and Maizels. This is an attractive proposal as it could be used to effectively partition RNAs in a cell with a substantial number of chromosomes.

The Origin of mRNA: Final Caveats and Comparison with Related Models

In their 1993 model, Maynard-Smith and Szathmary (1993) produced chromosomes by the copy-choice mechanism outlined above. They regarded the plus-strand molecule as the chromosome and the minus-strand daughter molecules as the transcripts. In a sense we are both correct. The appropriate analogy is presented by retroviruses and related LTR-retrotransposons. Whereas I regard the minus-strand molecules as the original, genomic molecules and the plus strand as the transcript, evolutionary drive leads both models together. During the replication of retrotransposons a plus-strand RNA is produced from the genomic DNA copy. Reverse transcription then produces a minus-strand molecule (in this case DNA) that is then copied via a copy-choice mechanism into a double-stranded DNA. Initially, I propose minus-strand chromosomes with plus-strand intermediates. Copy-choice replication mechanisms lead to fused plus-strand intermediates.
The role of the proto-ribosome, as regulator, is established as shown here [Fig. 3(a) and 3(b)]. This creates a

temporary plus-strand chromosome that is also the future mRNA, identical in every respect to the scheme employed by retroviruses and other (LTR) retrotransposons. For a time this plus-strand molecule serves as messenger only, degraded after replication is completed. The messenger is a slave to the minus-strand chromosomes. Permanence of this plus-strand molecule is later ensured when the number of minus-strand RNA molecules in the cell becomes too great to manage. Selection then acts to fuse genes together into larger units (Maynard-Smith & Szathmary, 1993; Szathmary & Maynard-Smith, 1993). Finally, when DNA is invented, the role of the plus-strand molecule is downgraded once more to that of intermediate, this time between the gene and the functional product (the protein borne on minus-strand tRNAs).

I further suggest that it is possible, or even probable, that the cell would replicate some of its genes for its own immediate purposes rather than for transmission to its progeny. Further, from biochemical considerations alone, a cell will want more of some enzymes than others and this will vary as the cell saunters through its environment. If concatamerization of the replication intermediates was a common occurrence during ribozyme replication, then a natural outcome of the replication of specific pathway ribozymes would be formation of an “mRNA” containing “codons” specific to this particular biochemical pathway, and hence some aspect of reproducibility in the synthesis of its mRNAs is introduced. Co-evolution of the code then becomes enshrined in the future chromosomes.

As I see it, this problem of codon ordering is central to any model for producing coded protein synthesis. The model I present here must, therefore, be regarded as a step in this process. The model produces “mRNAs” and a number of non-random proteins. I recognize that this is not a complete solution. However, the model does take the process further than it has been taken before.
The Origin of the tRNA and Replication

As an aside, a second method of priming RNA replication is rather intriguing. In his 1992 paper on the origin of the tRNA, Di Giulio


postulated that duplication of a progenitor minihelix RNA (see also Ibba et al., 2000; Ribas de Pouplana et al., 1998; Schimmel et al., 1993, 1999; Schimmel & Ribas de Pouplana, 1995) could be followed by end ligation of a duplicated RNA minihelix. The site of the ligation would eventually form the anticodon loop, while the other end remained open. Figure 4 illustrates an alternative replication scheme (which could have occurred as an aberration on the major replication pathway). This is very similar to the scheme of Dick & Schamel (1995), involving homologous pairing between two minihelix molecules to form a precursor for the tRNA.


In the Dick and Schamel model, the genomic RNA has a minihelix at either end of the molecule to facilitate priming of forward and reverse RNA synthesis. Although this is appealing, it is not necessary to facilitate complete RNA synthesis. The model outlined in Figs 2 and 3 will accommodate complete replication and is more similar to the mechanisms employed by modern RNA viruses. Therefore, I prefer this model to the scheme of Dick and Schamel (and that shown in Fig. 4). However, the possibility that this scheme occasionally occurred is not unreasonable. Even the occasional production of a “full tRNA” by aberrant replication might be selected for if the product tRNA was a better

[Fig. 4 diagram not reproduced. Recoverable panel labels: pairing of minus-strand RNAs via their conserved minihelix 5′ ends (Di Giulio, 1992; Schimmel, 1995; Dick & Schamel, 1995); priming of plus-strand RNA synthesis on the 3′ OH of each (label truncated in the source); correct resolution following completion of replication; aberrant resolution of replicated RNAs, leaving one ribozyme with a duplicated tRNA-like 5′ end and the other ribozyme decapitated.]

Fig. 4. Alternative scheme for initiation of replication of minus-strand ribozyme RNAs. Replication is primed on the 3′ OH of the CCA as before; however, one ribozyme pairs to another using homology in the tRNA-like/minihelix structure. Aberrant resolution of this complex after the formation of the plus-strand extensions can produce the duplicated helix structure proposed by Di Giulio and Schimmel as a progenitor for the tRNA. Grey bars indicate the ribozyme portion of the molecule. See Dick & Schamel (1995).


replication signal than the minihelix (Di Giulio, 1992; Dick & Schamel, 1995). As I suggested earlier, the formation of the tRNA by fusion of minihelix molecules could have had other advantages for RNA replication.

The Effect of Cellular Parasites: The Stochastic Corrector Model and N-Body Simulations

What happens to cells operating this system when cellular parasites are taken into account? Before I consider this in detail, I want to address what Woese called the progenote (Woese, 1998). Although I have considered a genetically and biochemically autonomous organism, this may well be wishful thinking. Woese’s idea of rapid gene transfer between primitive cells (or perhaps better phrased as organisms) is appealing as it alleviates many of the problems of replication in a genetically autonomous organism. The cell does not require a full complement of genes if it lives in (for want of a better expression) a commune. Such cells could use viruses in an almost sexual fashion. Genes are traded between cells using viral-like particles. In this light viruses are not “parasites”; they are a means to an end. The population, or commune, is autonomous as a whole, but the individual cell (organism) is not. The term organism, in the sense that we know it, is meaningless (Doolittle, 1999; Woese, 1998).

How could this model fit in with the idea of the “pangenome”? A cell may well want the genes that its neighbour has, but it would still wish to regulate the copying of these genes so that they do not interfere with its own machinery. Thus the replication regulation model I have outlined works, as it allows the cell to assess incoming RNA molecules before replicating them. Like any system, this is not perfect and there is still scope for cellular parasitism; however, this may be a beginning. I would suggest that there are other systems (such as RNAi) that appear ancient and operate at the RNA and translation level (Vance & Vaucheret, 2001; Szuromi, 2001) that may also help. Interestingly, RNAi involves both the translation apparatus and endogenous RNA-dependent RNA polymerases, enzymes formerly assumed to be unique to the RNA viruses.

These systems appear to have evolved to deal with cellular parasitism and are present in all the eukaryotes so far tested (Vance & Vaucheret, 2001). There also appears to be a role for RNAi and the RNA-dependent RNA polymerase in normal animal and plant development (Szuromi, 2001).

No man is an island, it is said, and I do not believe there is value in viewing systems in isolation, except to ease computational time. Moreover, one cannot argue for the antiquity of systems based on viral and transposon biology whilst simultaneously refuting their role in the biology of the day. If we propose that the tRNA tag is related to early replication, and that modern viruses and transposable elements are molecular fossils, then it would be folly to conclude they were not present, simply because our modelling capacity does not extend to include them. However, I do not wish this statement to seem negative. I draw analogy with the N-body problem in physics (Heritier, 2001; Kuhlman et al., 1996; Takahashi et al., 1998). The N-body problem exists in several branches of physics (and I include articles from particle and cosmological physics for reference). Simply put, in a system with more than two interacting particles under investigation, the mathematical simulation often becomes intolerably complex. I am particularly fond of cosmological simulations involving hundreds to millions of particles (modelling stars and gas clouds), each interacting with one another. Some models even include the complicating effects of stellar evolution on the outcome of the simulation. The point is this: the computational time for these simulations is immense; a single modelling run may take weeks on the fastest computers. Yet we are unable to produce realistic outcomes if we reduce the number of particles to more readily manageable proportions. I believe that in order to accurately determine the genetic behaviour of our autotrophic ribo-organism we must consider the N-body problem.
If we do not, we will remain stuck between a rock and a hard place. As Heritier says (Heritier, 2001), “A system where n is large constitutes the ‘many-body problem’ and can exhibit rich behaviour”. I believe this is also true of our ribo-organism.
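
The scaling behind the N-body analogy can be made concrete with a minimal Python sketch. It assumes a direct-summation scheme in which every particle interacts with every other, so that the work per time step grows with the number of distinct pairs, n(n − 1)/2. The function name and the example sizes are illustrative, and production cosmological codes commonly use approximations that reduce this cost, so the figures indicate the trend rather than actual run times.

```python
def pair_count(n: int) -> int:
    """Number of distinct interacting pairs among n bodies,
    assuming every body interacts with every other (direct summation)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Example sizes are arbitrary: two bodies, three bodies, and the
    # "hundreds to millions" of particles mentioned in the text.
    for n in (2, 3, 100, 1_000_000):
        print(f"{n:>9} bodies -> {pair_count(n):>15} pairs per step")
```

Read by analogy, and only if one assumes that every gene segment can influence every other, a cell assorting a few hundred segments would present tens of thousands of potential pairwise interactions.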


Discussion

I would have been happier to forward some of these arguments if there were better information on the replication of the RNA viruses (see Butcher et al., 2001; Barton et al., 2001 for recent advances). By extension of the arguments of Weiner and Maizels (1987, 1994), clearly it is the RNA viruses that form the closest molecular fossils for replication models for translation. However, although we are aware of the way many RNA viruses initiate replication at the tRNA structures, we do not know much about the subsequent initiation on the RF (replicative form) structures to regenerate the virion genomic strands. Whilst investigating this aspect of the model I was disheartened to compare models of these processes from the mid-1980s to the present day: many gaps remain unplugged. Therefore, I consider this to be a weakness with any model centred on the tRNA and RNA replication.

A central facet of this model is the improvement to replication initiation, and indeed to replication regulation in general, that modifying the tRNA structures might give to the cell. Bastin & Hall suggested, in 1976, that Brome Mosaic Virus replication might benefit from association with translation components (in this case with an aminoacyl tRNA synthetase). Similarly, Qβ RNA associates with translation factors during its replication. Intuitively, one suggests that translation and replication must be linked and that these viruses represent molecular fossils of this era (Weiner & Maizels, 1987, 1994). The alternative, that each virus independently chose association with components of translation, seems untenable. Consequently, one is drawn to the conclusion that all of the facets of the translation apparatus could have been associated either with RNA replication or its regulation. In this light it is possible to derive the origin of certain key components.

Importantly, this is the first model I have seen that produces the mRNA.
While other models show it appearing from thin air (Orgel, 1989; Poole et al., 1999), I show that it can be produced via replication alone and that it will contain all the genetic information present in the proto-tRNA necessary for (later) production of coded protein synthesis. The model of


Szathmary and Maynard-Smith (1993; see also Maynard-Smith & Szathmary, 1993) is thus likely to be correct: chromosomization in this case is analogous to mRNA formation, as is the case with modern retroelements. Furthermore, I was able to show that with a ribosome regulator, RNA replication could be completed using the mRNA as an intermediate and UGG “codons” as signals for the replicase. Like the Paramyxoviruses, replication/transcription was biased, producing nested minus-strand RNA molecules from the plus-strand template: the 5′ end of the plus-strand RNA encoding the largest number of progeny minus-strand molecules. Most strikingly of all, I can produce a START codon in its proper context using this model. Interestingly, this model may show the beginning of a triplet code in the translation system [Fig. 3(a) and 3(b)] without recourse to envisaging a replicase utilizing trinucleotide substrates of uncertain origin (Poole et al., 1999).

Using the replication idea and the model of Di Giulio (1992) it was possible to derive a precursor of the tRNA through the replication route. Although this is not identical to the scheme of Di Giulio (1992) or Schimmel et al. (1995), it was similar and suggestive. More straightforward replication strategies are undoubtedly possible, but I felt this was worth pointing out. Most importantly of all, the sense of the molecules is correct with respect to present-day translation, and with the advent of reverse transcription (presumably after protein synthesis was efficient), the mRNA can be used to produce the first DNA chromosomes in the same way that retroelements do today.

An argument against the model might be that if amino acids and their precursors bound to tRNA structures to instruct the replication machinery, then why do we only see amino acid tagging today (Di Giulio, 1998)? (And what happened to ribozymes on other biochemical pathways?)
As Szathmary might suggest, what I am proposing is that we are merely seeing the grin of the Cheshire cat. Amino acid modification on tRNAs is selected for, as protein synthesis is essential for cellular survival. Other pathways have dissolved through time with proteins


completely superseding their RNA predecessors. Although this may not be the best argument one could make (it is rather negative), it may be the best we can make.

Autotrophy seems to be the biggest problem evolutionary biologists have. An early organism would have grown up in a rough environment. Although it may have enjoyed a supply of some nutrients from cosmological (Cooper et al., 2001) and geological sources (Freund et al., 2001), it probably did not have a sufficient amount to shrink its genome to minimal proportions (Koonin, 2000). Unless there was a now-missing accessory component in the RNA world that allowed larger genomes with a ribozyme replicase, we seem stuck. Of the alternatives available, I prefer models with large genomes consisting of multiple (relatively) small RNAs, each several hundred bases to a few kilobases long. Once protein replicases (or much more sophisticated ribozyme replicases than we can currently make in the lab) have evolved, longer chromosomal units can arise. However, initially, extensive chromosomization with the RNA genome reduced to 3 or fewer molecules seems unlikely in an autotroph. Of course, this argument implies there was something before RNA that could survive using the contents of the environment. However, this is outwith the scope of this article. I believe the ribo-organisms were autotrophic, or at least they had sufficiently complex biochemistries to preclude small genomes.

Similarly, DNA cannot be the answer. DNA synthesis must post-date translation. The argument is simple: homology between RNA-dependent RNA polymerases and reverse transcriptases at the sequence level must preclude the advent of DNA synthesis before protein synthesis (Xiong & Eickbush, 1988, 1990; Lazcano et al., 1992). The chance that these distinctive classes of enzyme (present in a large number of distinct systems) arose separately yet have homologous (or orthologous) active site sequences seems hard to swallow and would contradict evolutionary teaching.
The argument must be that protein synthesis was functional before evolution invented DNA synthesis. This leaves us with the problem of having large RNA genomes prior to protein

synthesis. Furthermore, the energy cost associated with producing DNA intermediates from RNA by ribonucleotide reductase is relatively high considering the nature of the biochemical transformation. This implies the advantages of having DNA are high, yet the process did not arise early on.

As a penultimate point, I would like to throw down the gauntlet. We now have a non-specific ribozyme replicase with a reasonable error rate. I suggest that soon we could test the co-evolution theory directly. Take a ribozyme and allow the replicase ribozyme to copy it. Subsequently, after a few rounds of replication, assay the biochemical activity of the ribozyme, comparing it to the original enzyme. Then extend this experiment with a ribozyme that has a tRNA tag and carries out a simple modification on an amino acid. At this point, which may not be too far away, the co-evolution theory of Wong may gain some solid foundations. Soon science may have recreated the ribo-organism through test-tube (natural) selection. An interesting thought.

Finally, I conclude by reiterating that although I regard this model as a next step along the road to translation, it is far from the final one.

I’d like to thank Carl Woese for useful comments on an earlier version of this manuscript.

REFERENCES

Assouline, S., Nir, S. & Lahav, N. (2001). Simulation of non-enzymatic template directed synthesis of oligonucleotides and peptides. J. theor. Biol. 208, 117–125. doi:10.1006/jtbi.2000.2205. Barton, D. J., O’Donell, B. J. & Flanegan, J. B. (2001). 5′ cloverleaf in Poliovirus RNA is a cis-acting replication element required for negative-strand synthesis. EMBO J. 20, 1439–1448. Bastin, M. & Hall, T. C. (1976). Interaction of elongation factor 1 with aminoacylated Brome Mosaic Virus and tRNAs. J. Virol. 20, 117–122. Blumenthal, T., Landers, T. A. & Weber, K. (1972). Bacteriophage Qβ replicase contains the protein elongation factors EF-Tu and EF-Ts. Protein Nucleic Acid Sci. 69, 1313–1317. Buchen-Osmond, C. (1998). web page http://www.ncbi.nlm.nih.gov/ICTVdb/ICTVdB/19000000.htm#NucAcid. Butcher, S. J., Grimes, J. G., Makeyev, E. V., Bamford, D. H. & Stuart, D. I. (2001). A mechanism for initiating RNA-dependant RNA polymerisation. Nature 410, 235–240.


Carter, A. P., Clemons, W. M., Brodersen, D. E., Morgan-Warren, R. J., Wimberley, B. T. & Ramakrishnan, V. (2001). Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics. Nature 407, 340–348. Chen, B. & Lambowitz, A. M. (1997). De Novo and DNA primer-mediated initiation of DNA synthesis by the Mauriceville retroplasmid reverse transcriptase involve Recognition of a 3′ CCA sequence. J. Mol. Biol. 271, 311–332. Chiang, C. C., Kennell, J. C., Wanner, L. A. & Lambowitz, A. M. (1992). A mitochondrial retroplasmid integrates into mitochondrial DNA by a novel mechanism involving the synthesis of a hybrid cDNA and homologous Recombination. Mol. Cell. Biol. 14, 6419–6432. Chiang, C.-C. & Lambowitz, A. M. (1997). The Mauriceville retroplasmid reverse transcriptase initiates cDNA synthesis De Novo at the 3′ End of tRNAs. Mol. Cell. Biol. 17, 4526–4535. Chowrira, B. M., Berzal-Herranz, A. & Burke, J. M. (1993). Novel RNA polymerisation reaction catalysed by a group I ribozyme. EMBO J. 12, 3599–3605. Cooper, G., Kimmich, N., Bellisle, W., Sarinana, J., Brabham, K. & Garrel, L. (2001). Carbonaceous meteorites as a source of sugar-related compounds for the early earth. Nature 414, 879–882. Delarue, M. (1995). Partition of aminoacyl tRNA synthetases in two different structural classes dating back to early metabolism: implications for the origin of the genetic code and the nature of protein sequences. J. Mol. Evol. 41, 703–711. Dick, P. T. & Schamel, W. W. A. (1995). Molecular evolution of transfer RNA from two precursor hairpins: implications for the origin of protein synthesis. J. Mol. Evol. 41, 1–9. Di Giulio, M. (1992). On the origin of the transfer RNA molecule. J. theor. Biol. 159, 199–214. Di Giulio, M. (1997). On the RNA world: evidence on favour of an early ribonucleotide world. J. Mol. Evol. 45, 571–578. Di Giulio, M. (1998). Reflections on the origin of the genetic code: a hypothesis. J. theor. Biol. 191, 191–196.
Di Giulio, M. (2000). The RNA world, the genetic code and the tRNA molecule. Trends Genet., 60, 17–19. Doolittle, W. F. (1999). Lateral genomics. Trends Cell Biol. 9, M5-M8. Du, S. & Traktman, P. (1996). Vaccinia virus DNA replication: two hundred base pairs of telomeric sequence confer optimal replication efficiency on minichromosome templates. Protein Nucleic Acid Sci. 93, 9693–9698. Eigen, M. (1971). Self-organisation of matter and the evolution of cellular macromolecules. Naturwissenschaften 58, 465–523. Eigen, M. & Schuster, P. (1977). The hypercycle, Part A. The Emergence of the hypercycle. Naturwissenschaften 64, 541–565. Freund, F., Staple, A. & Scoville J. (2001). Organic protomolecule assembly in igneous minerals. Protein Nucleic Acid Sci. 98, 2142–2147. Fry I. (2000). The Emergence of Life on Earth. Rutgers University Press.


Gesteland, R. F. (1999). The RNA World. Cold Spring Harbor Laboratory Press.
Hall, T. C. (1979). Transfer RNA-like structures in viral genomes. Int. Rev. Cytol. 60, 1–26.
Hall, T. C., Miller, W. A. & Bujarski, J. J. (1982). Enzymes involved in the replication of plant viral RNAs. In: Advances in Plant Pathology (Ingram, D. S. & Williams, P. H., eds), Vol. 1, pp. 179–211.
Hall, T. C., Shih, D. S. & Kaesberg, P. (1972). Enzyme-mediated binding of tyrosine to Brome Mosaic Virus ribonucleic acid. Biochem. J. 129, 969–976.
Hall, T. C. & Wepprich, R. K. (1976). Functional possibilities for aminoacylation of viral RNA in transcription and translation. Ann. Microbiol. 127A, 143–152.
Heritier, M. (2001). In search of exact solutions. Nature 414, 31–32.
Hopfield, J. J. (1978). Origin of the genetic code: a testable hypothesis based on tRNA structure, sequence, and kinetic proofreading. Proc. Natl Acad. Sci. U.S.A. 75, 4334–4338.
Ibba, M., Becker, H. D., Stathopoulos, C., Tumbula, D. L. & Soll, D. (2000). The adaptor hypothesis revisited. TIBS 25, 311–316.
Illangasekare, M., Kovalchuke, O. & Yarus, M. (1997). Essential structures of a self-aminoacylating RNA. J. Mol. Biol. 274, 519–529.
Johnston, W. K., Unrau, P. J., Lawrence, M. S., Glasner, M. E. & Bartel, D. P. (2001). RNA-catalysed RNA polymerisation: accurate and general template primer extension. Science 292, 1319–1325.
Kaufman, R. J. (1994). Control of gene expression at the level of translation. Curr. Opin. Biotechnol. 5, 550–557.
Kennell, J. C., Wang, H. & Lambowitz, A. M. (1994). The Mauriceville plasmid of Neurospora spp. uses novel mechanisms for initiating reverse transcription in vivo. Mol. Cell. Biol. 14, 3094–3107.
Koonin, E. V. (2000). How many genes can make a cell: the minimal-gene-set concept. Annu. Rev. Genomics Hum. Genet. 1, 99–116.
Kuhlman, B., Melott, A. L. & Shandarin, S. F. (1996). A test of the particle paradigm in N-body simulations. Astrophys. J. 470, L41–L44.
Kyrpides, N., Overbeek, R. & Ouzounis, C. (1999). Universal protein families and the functional content of the last universal common ancestor. J. Mol. Evol. 49, 413–423.
Lazcano, A., Valverde, V., Hernandez, G., Gariglio, P., Fox, G. E. & Oro, J. (1992). On the early emergence of reverse transcription: theoretical basis and experimental evidence. J. Mol. Evol. 35, 524–536.
Lorsch, J. R. & Szostak, J. W. (1994). In vitro selection of new ribozymes with polynucleotide kinase activity. Nature 371, 31–36.
McCauley, J. W. & Mahy, B. J. W. (1983). Structure and function of the influenza genome. Biochem. J. 211, 281–294.
Ma, S., Hill, K. E., Caprioli, R. M. & Burk, R. F. (2002). Mass spectrometric characterization of full-length rat selenoprotein P and 3 isoforms shortened at the C terminus. Evidence that the 3 UGA codons in the mRNA open reading frame have alternative functions of specifying selenocysteine insertion or translation termination. J. Biol. Chem. (in press or online as manuscript M111462200).
Maynard-Smith, J. & Szathmary, E. (1993). The origin of chromosomes I: selection for linkage. J. theor. Biol. 164, 437–446.
Mohr, S., Wanner, L. A., Bertrand, H. & Lambowitz, A. M. (2000). Characterisation of an unusual tRNA-like sequence found inserted in a Neurospora retroplasmid. Nucleic Acids Res. 28, 1514–1524.
Miller, W. A., Bujarski, J. J., Dreher, T. W. & Hall, T. C. (1986). Minus-strand initiation by Brome Mosaic Virus replicase within the 3' tRNA-like structure of native and modified RNA templates. J. Mol. Biol. 187, 537–546.
Nemoto, N. & Husimi, Y. (1995). A model of the virus-type strategy in the early stage of encoded molecular evolution. J. theor. Biol. 176, 67–77.
Nissen, P., Hansen, J., Ban, N., Moore, P. B. & Steitz, T. A. (2001). The structural basis of ribosome activity in peptide bond synthesis. Science 289, 920–930.
Nitta, I., Kamada, Y., Noda, H., Ueda, T. & Watanabe, K. (1998). Reconstitution of peptide bond formation with Escherichia coli 23S ribosomal RNA domains. Science 281, 666–669.
Noller, H. F., Hoffarth, V. & Zimniak, L. (1992). Unusual resistance of peptidyl transferase to protein extraction procedures. Science 256, 1416–1419.
Ogle, J. M., Brodersen, D. E., Clemons Jr., W. M., Tarry, M. J., Carter, A. P. & Ramakrishnan, V. (2001). Recognition of cognate transfer RNA by the 30S ribosomal subunit. Science 292, 897–902.
Orgel, L. E. (1989). The origin of polynucleotide-directed protein synthesis. J. Mol. Evol. 29, 465–474.
Piccirilli, J. A., McConnell, T. S., Zaug, A. J., Noller, H. F. & Cech, T. R. (1992). Aminoacyl esterase activity of the Tetrahymena ribozyme. Science 256, 1420–1423.
Poole, A., Jeffares, D. & Penny, D. (1999). Early evolution: prokaryotes, the new kids on the block. BioEssays 21, 880–888.
Rosing, M. T. (1999). 13C-depleted carbon microparticles in >3700-Ma sea-floor sedimentary rocks from West Greenland. Science 283, 674–676.
Schimmel, P., Giegé, R., Moras, D. & Yokoyama, S. (1993). An operational RNA code for amino acids and possible relationship to genetic code. Proc. Natl Acad. Sci. U.S.A. 90, 8763–8768.
Schimmel, P. & Ribas de Pouplana, L. (1995). Transfer RNA: from minihelix to genetic code. Cell 81, 983–986.
Ribas de Pouplana, L., Turner, R. J., Steer, B. A. & Schimmel, P. (1998). Genetic code origins: tRNAs older than their synthetases? Proc. Natl Acad. Sci. U.S.A. 95, 11295–11300.
Schimmel, P. & Ribas de Pouplana, L. (1999). Genetic code origins: experiments confirm phylogenetic predictions and may explain a puzzle. Proc. Natl Acad. Sci. U.S.A. 96, 327–328.
Schultes, E. A. & Bartel, D. P. (2001). One sequence, two ribozymes: implications for the emergence of new ribozyme folds. Science 289, 448–457.
Szathmary, E. (1989). The emergence, maintenance, and transitions of the earliest evolutionary units. Oxf. Surv. Evol. Biol. 6, 169–205.

Szathmary, E. & Maynard-Smith, J. (1993). The evolution of chromosomes II: molecular mechanisms. J. theor. Biol. 164, 447–454.
Szathmary, E. (1993). Coding coenzyme handles: a hypothesis for the origin of the genetic code. Proc. Natl Acad. Sci. U.S.A. 90, 9916–9920.
Szathmary, E. & Maynard-Smith, J. (1997). From replicators to reproducers: the first major transitions leading to life. J. theor. Biol. 187, 555–571.
Szathmary, E. (1999). The origin of the genetic code. 15, 223–228.
Szuromi, P. (2001). This week, in science review and accompanying articles. Science 294, 742.
Takahashi, K. & Portegies Zwart, S. F. (1998). The disruption of globular clusters in the galaxy: a comparative analysis between Fokker–Planck and N-body simulations. Astrophys. J. 503, L49–L52.
Tolskaya, E. A., Romanova, L. A., Kolesnikova, M. S. & Agol, V. I. (1983). Intertypic recombination in poliovirus: genetic and biochemical studies. Virology 124, 121–132.
Vance, V. & Vaucheret, H. (2001). RNA silencing in plants – defense and counterdefense. Science 292, 2277–2280.
Wang, H., Kennell, J. C., Kuiper, M. R., Sambourin, J. R., Saldanha, R. & Lambowitz, A. M. (1992). Characterisation of a novel reverse transcriptase that begins cDNA synthesis at the 3' end of a template RNA. Mol. Cell. Biol. 12, 5131–5144.
Weber, A. L. & Lacey Jr., J. C. (1978). Genetic code correlations: amino acids and their anticodon nucleotides. J. Mol. Evol. 11, 199–210.
Webster, T. A., Patarca, R., Lathrop, R. H. & Smith, T. F. (1989). Potential structural motifs for reverse transcriptases. Mol. Biol. Evol. 6, 317–320.
Weiner, A. M. & Maizels, N. (1987). tRNA-like structures tag the 3' ends of genomic RNA molecules for replication: implications for the origin of translation. Proc. Natl Acad. Sci. U.S.A. 84, 7383–7387.
Weiner, A. M. & Maizels, N. (1994). Phylogeny from function: evidence from the molecular fossil record that tRNA originated in replication, not translation. Proc. Natl Acad. Sci. U.S.A. 91, 6729–6734.
White, H. B. (1976). Coenzymes as fossils of an earlier metabolic stage. J. Mol. Evol. 7, 101–104.
White, H. B. (1982). Evolution of coenzymes and the origin of pyridine nucleotides. In: Pyridine Nucleotide Coenzymes (Everse, J., Anderson, B. & You, K. S., eds), pp. 1–17. New York: Academic Press.
Wickner, R. B. (1993). Double-strand RNA virus replication and packaging. J. Biol. Chem. 268, 3797–3800.
Wimberley, B. T., Brodersen, D. E., Clemons Jr., W. M., Morgan-Warren, R. J., Carter, A. P., Vonrhein, C., Hartsch, T. & Ramakrishnan, V. (2001). Structure of the 30S ribosomal subunit. Nature 407, 327–339.
Wimmer, E., Kuhn, R. J., Pincus, S., Yang, C.-F., Toyoda, H., Nicklin, M. J. H. & Takeda, N. (1987). Molecular events leading to the Picornavirus genome replication. J. Cell Sci. Suppl. 7, 251–276.
Woese, C. R., Dugre, D. H., Dugre, S. A., Kondo, M. & Saxinger, W. C. (1966). On the fundamental nature and evolution of the genetic code. Cold Spring Harbor Symp. Quant. Biol. 31, 723–736.


Woese, C. R. (1998). The universal ancestor. Proc. Natl Acad. Sci. U.S.A. 95, 6854–6859.
Woese, C. R. (2000). Interpreting the universal phylogenetic tree. Proc. Natl Acad. Sci. U.S.A. 97, 8392–8396.
Wong, J. T. Z. (1975). A co-evolution theory of the genetic code. Proc. Natl Acad. Sci. U.S.A. 72, 1909–1912.
Wong, J. T. Z. (1991). The origin of genetically encoded protein synthesis: a model based on selection for RNA peptidation. Origins Life Evol. Bios. 21, 165–176.
Xiong, Y. & Eickbush, T. H. (1988). Similarity of reverse transcriptase-like sequences of viruses, transposable elements and mitochondrial introns. Mol. Biol. Evol. 5, 675–690.
Xiong, Y. & Eickbush, T. H. (1990). Origin and evolution of retroelements based on their reverse transcriptase sequences. EMBO J. 9, 3353–3362.
Yarus, M. (1998). Amino acids as RNA ligands: a direct RNA-template theory for the code's origin. J. Mol. Evol. 47, 109–117.
Yusim, K., Nir, S. & Lahav, N. (2001). A model for proto-tRNA loading. J. theor. Biol. 208, 109–116, doi:10.1006/jtbi.2000.2204.