Physica D 98 (1996) 111-127
The evolution of self-replicating computer organisms

A.N. Pargellis
Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974, USA
Received 1 September 1995; revised 23 December 1995; accepted 19 March 1996 Communicated by A.M. Albano
Abstract

A computer model is described that explores some of the possible behavior of biological life during the early stages of evolution. The simulation starts with a primordial soup composed of randomly generated sequences of computer operations selected from a basis set of 16 opcodes. With a probability of about 10^-4, these sequences spontaneously generate large and inefficient self-replicating "organisms". Driven by mutations, these protobiotic ancestors generate offspring more efficiently, initially by eliminating unnecessary code. Later they increase their complexity by adding additional subroutines as they compete for the system's two limited resources, computer memory and CPU time. The ensuing biology includes replicating hosts, parasites and colonies.
1. Introduction

The detailed mechanism by which living systems formed from the prebiotic conditions that existed on Earth over four billion years ago is not understood and is the focus of a great deal of ongoing research [1-10]. At a very basic level, the general scenario for the creation of life is as follows. The fundamental chemical and physical laws determined how atoms and molecules interacted, generating ever more complex molecular species. Eventually, out of the primordial broth consisting of these molecular building blocks arose self-replicating strands of information, probably RNA, made from the nucleotide bases that comprise today's genetic code. In contrast, the fundamental laws used in the artificial self-replicating systems studied thus far are much simpler, defining limited systems within which information is stored and manipulated. These rules range from defining the molecules from which templates are made
in chemical systems to defining the opcodes used for generating self-replicators in a computer model. Ideally, one would devise an experiment that simulates the prebiotic world by having a system with a minimum number of initial constraints. For example, this might include not having previously defined genetic codes. Although no such experiments have yet been done, there have been a number of studies that have followed the evolution and dynamics of systems containing a predefined genetic code. This has included biological and chemical systems as well as mathematical and computer models. The Qβ bacteriophage has been used to study Darwinian evolution of existing templates [11,12] as well as the de novo generation of RNA in a test tube containing Qβ replicase and activated nucleotide monomers, but no template [13]. Chemical systems have been used to study the kinetics of template-catalyzed replication [14-16]. At least one mathematical model has been devised to study self-replication [17]. Turing's computing machine uses a
set of symbols which, with the machine's present state, determines subsequent behavior [18]. Von Neumann constructed a cellular system of self-reproducing 29-state automatons [19]. A variety of computer models, starting with Core Wars studies [20,21], and more recently the Tierra [22] and Avida worlds [23], have been designed that use basis sets of opcodes from which sequences of instructions are constructed that code for self-replication. In this paper a computer model (the "Amoeba" world) [24] is described that mimics some of the possible behavior of early biological life. This model uses a predefined genetic basis set of 16 opcodes. However, this basis set is simple enough that, for the first time, self-replicating sequences (sometimes referred to in the literature as "digital life") are spontaneously generated from an instruction soup that is initially composed of randomly selected opcodes from a nontrivial basis set. A unique observation made with this computer model is that these ancestral organisms are large, unlikely and intricate combinations of opcodes that are able to replicate themselves. This is contrary to the conventional wisdom that early life formed from initially simple molecules that combined to give molecules of steadily increasing complexity [1-4]. The factors governing the spontaneous generation of these self-replicators, and their subsequent evolution and optimization, are discussed.
2. Introduction to the organization of information systems

There appear to be three main requirements for the creation and subsequent development of a self-replicating information system. First, there must be a source alphabet (for example, a genetic code or a set of opcodes used as a language for a computer) which is used to physically or symbolically represent the information. Second, the information must be organized into subunits by encapsulating or partitioning it (for example, an RNA strand or computer program). Third, there must be a mechanism, such as randomly introduced mutations, for improving and evolving the information content. Ideally, these requirements would
arise naturally out of a random, prebiotic world that contains as few rules as possible. Since this is not a trivial task, previous studies, as well as this one, start with a system containing a well-defined basis set, a means of segregating the information, and a means of manipulating and evolving the information. These three main requirements are discussed in greater detail in the sections that follow.

First, for a system to be self-continuing and evolving, a basis set must exist that enables the encoding and subsequent duplication of the information within the system. This requires a basis set that is large enough to allow for some interesting diversity but not so large that the probability of generating a particular sequence is vanishingly small. In biological life there are four different nucleotide bases (A, U, G, C for RNA) that are read in groups of three, resulting in 64 different combinations (codons). Many of these triplets are redundant, coding for only 20 different amino acids and three stop codons. Thus, the effective genetic basis set is composed of only 20 different instructions, from which are manufactured all of the required chemicals that drive an organism's chemical machinery. However, this small basis set is large enough to generate a huge number of combinations. The number of possible combinations for a sequence of length n drawn from a basis set of size b is b^n. For example, for a sequence of 10 operations, a binary basis set yields 2^10 ≈ 10^3 combinations, whereas a basis set that is 10 times larger (e.g. the biological basis set) yields 10 billion times as many combinations, 20^10 ≈ 10^13. The real, biological world is far more complex than any artificial system studied.
This can be partly ascribed to the following properties exhibited by many biological systems: the information is coded in very large DNA sequences (even viruses usually have more than 10^3 codons), eukaryotic DNA strands can simultaneously generate RNA strands at multiple locations, mechanisms exist for turning on ("expressing") different portions of the code, and the set of 20 amino acids can build proteins that are enormous three-dimensional molecules. In contrast, artificial systems study the behavior of very small sequences which: (i) are typically executed only one at a time (unless a parallel processing machine is used); (ii) use
basis sets that contain anywhere from 2 to 64 elements, and (iii) perform very limited functions that manipulate data stored in computer memory. Binary basis sets have been used in chemical and mathematical systems. Chemical studies [14-16] typically use solutions consisting of two molecular species, A and B, that combine to yield species C, where species C is the template that catalyzes the above combination reaction. Banzhaf has studied the spontaneous generation of self-replicators from a mathematical system based on the multiplication of square matrices [17]. He used binary sequences of length n^2 that are folded into n × n matrices that are multiplied in pairs to generate offspring. Larger basis sets have been used in computer models. Turing [18] discussed a hypothetical computing machine that would read a tape containing symbols chosen from a basis set tailored to the specific problem (sequence) to be computed. The machine's program would act on each symbol which, along with the machine's present state, would determine its subsequent state. Von Neumann took this a step further by considering self-reproduction in a cellular system of automatons (Turing machines) that would interact with each other [19]. Dewdney created Core Wars [20,21], where battle programs made up from a basis set of nine (later 10) opcodes competed for computer memory. More recently, Ray studied Darwinian evolution in a world called Tierra [22], a computer simulation that uses a basis set of 32 opcodes with which to code for "digital organisms" that compete for CPU time and memory. Variants of Tierra are the Avida [23] and Amoeba [24] worlds, which use basis sets of 20-30 and 16 opcodes, respectively, in addition to some other major changes. Second, an information system must be organized into subunits. In biology this is accomplished at the lowest level by storing the nucleotide bases in DNA or RNA strands.
At a higher level these strands can be organized into chromosomes and, with associated proteins, are encapsulated within cells. In computer models, the information is organized into cells that take the form of operations or executable programs. For example, a Turing machine reads symbols placed on a long, one-dimensional tape, whereas von Neumann's cellular automata contain individual instructions organized
on a two-dimensional lattice. More complex systems also enable cells to interact with each other by using each other's code, resulting in parasites, colonies and hypercycles. Third, the information system must evolve in order to optimize some aspect of its performance. This can occur through mutations such as substitutions, deletions and insertions [25,26], through the bulk exchange of genetic material between cells, or some combination. Typical mutation rates in biological systems, due to uncorrected mispairing, are about 10^-4 mutations per nucleotide base per replication cycle [4], reduced by repair mechanisms to about 10^-10 mutations per base. In the absence of repair, the 10^-4 rate prevents sequences of more than a few thousand bases from functioning, since most mutations are deleterious. Although point mutations result in very low evolutionary rates, higher and more efficient rates of genetic optimization can be achieved through the exchange of genetic code between interacting cells, through interactions where there is viral or sexual transfer of information. Viral transfer occurs when one cell (virus) writes all or part of its code into its host's code. Sexual transfer occurs when two parents exchange pieces of their respective codes, as is done in eukaryotes, bacterial conjugation, and computer models (genetic algorithms [27]).
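The basis-set counting argument above (a basis of size b yields b^n distinct sequences of length n) can be checked numerically. The sketch below is purely illustrative and is not part of the Amoeba simulator:

```python
def combinations(basis_size: int, length: int) -> int:
    """Number of distinct sequences of `length` symbols over a basis of `basis_size`."""
    return basis_size ** length

# A sequence of 10 operations: binary basis vs. the 20-element biological basis.
binary = combinations(2, 10)        # 2^10 = 1024, roughly 10^3
biological = combinations(20, 10)   # 20^10, roughly 10^13
print(biological // binary)         # 10^10: ten billion times as many combinations
```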
3. General operation of the computer model

The three categories described briefly in Section 2 (genetic basis set, information organization, and evolution) will be discussed specifically for the computer model¹ presented in this paper. First, the genetic code, with a basis set of 16 operations, is motivated and outlined. Second, the partitioning of the computer's memory into separate cells is described in detail. Third, the overall external environment, within which the replicating computer cells must compete, is discussed, including mutations and fitness parameters.
¹ A copy of the computer code, written in Basic, is available from the author on request.
Fig. 1. Block diagram depicting the replication process in a computer, separated into four main groups of subroutines: Initiation (allocate block of memory), Copy (copy ith instruction; increment i to i+1; if last instruction, exit loop; else jump back to loop's start), Separation (separate child from parent; activate child's pointer, etc.), and Reset (load address register; reset numerical registers; jump back to cell's start).
The choice of instructions for the basis set used in this model was motivated by Tierra, the computer model of Ray [22], and the general processes observed in biological replication [13]. Fig. 1 is a block diagram showing the overall replication cycle in a computer world, which is divided into four phases: initiation, copy, separation, and reset. The initiation phase involves allocating a block of memory ("child") for a parent cell to copy its code into. This is followed by a copy phase, consisting of a copy loop, where instructions are copied individually into the child's block of memory. Loops are exited through the use of conditional statements that transfer execution to the following separation phase, where the child's code is made executable. The reset phase resets registers and jumps the parent's pointer back to the beginning of its code, enabling the parent cell to begin generating a new child. Therefore, the basis set for a computer model that studies self-replication should include commands that are similar to the following: copy instructions from one part of memory to another, move an instruction pointer to another part of memory, have a means of exiting loops (by comparing the contents of registers), have a means of altering the contents of registers and, finally, have a means of simulating parallel processing by time-sharing CPU time between multiple, executable programs (cells). The computer codes used in Core Wars and Tierra follow this general format. In the model discussed here and previously [24], the basis set used by Ray has been abbreviated and modified in several fundamental ways so as to enable the "spontaneous" generation of replicators from a world initially composed only of random sequences. A summary of the computer basis set, consisting of 16 operations (opcodes), is shown in Table 1 and discussed in more detail in Appendix A. Each opcode is numerically described by a two-digit base-four number whose digits run from 1 to 4. A "0" means there is no instruction at that memory location. For descriptive purposes, each opcode is also given a four-letter acronym. For example, the operation of allocating a block of memory in which to copy code is given the number 12 and the descriptor Memory allocation (MALL).

Second, the computer memory is organized into a stack that is artificially divided, via software, into 1000 subunits or blocks of memory called cells. These cells can interact with each other depending on their position on the "interaction grid", a piece of which is shown in Fig. 2. Each cell occupies a coordinate on the grid, not necessarily unique, and under certain conditions can interact with a nearest neighbor by using its code. The interaction grid is discussed in more detail below. Any "active" cell, with an instruction pointer, must have at least one instruction and can contain up to 30 instructions. The instructions that fill the slots are selected from the set of 16 opcodes that comprise the genetic basis set. An empty slot has the value 0.
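The four-phase cycle outlined above can be sketched in Python. This is a hypothetical rendering for illustration only; the actual simulator is written in Basic, and the Cell class and all names here are invented:

```python
class Cell:
    """Minimal stand-in for an Amoeba cell: a list of opcodes plus the
    ax (copy) and bx (size) registers, with defaults as in Table 1."""
    def __init__(self, code):
        self.code = list(code)
        self.ax = 1               # copy register: position of next opcode to copy
        self.bx = len(self.code)  # size register: position of cell's last opcode

    def reset_registers(self):
        self.ax, self.bx = 1, len(self.code)

def replicate(parent):
    """One replication cycle: initiation, copy loop, separation, reset."""
    child_code = [0] * len(parent.code)          # initiation: MALL allocates empty slots
    while True:                                  # copy loop
        child_code[parent.ax - 1] = parent.code[parent.ax - 1]  # COPY one opcode
        parent.ax += 1                           # increment the copy register
        if parent.ax > parent.bx:                # conditional exit (cf. IFAL/IFAG)
            break
    child = Cell(child_code)                     # separation: DIVD activates the child
    parent.reset_registers()                     # reset, then jump back (JMPB)
    return child

parent = Cell([42, 11, 21, 12, 34, 11, 13, 21, 14])  # an arbitrary nine-opcode string
assert replicate(parent).code == parent.code
```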
Each cell is labeled with its unique position among the 1000 possible locations in the stack, with the oldest cell occupying the uppermost position, 1000. The simulation is run as a virtual parallel processor, with each cell successively given a small piece of CPU time (coined by Ray as a "timeslice"). In the data discussed here, the timeslice was about four times the cell's size, raised to the 0.8 power, thus creating a slight pressure for cells to reduce their size. The time
Table 1
A brief listing of the instruction set used in the Amoeba simulator

Code  Label  Function
11    PNOP   Put complement of next op into cx, do not execute that op
12    MALL   Allocate block of memory for daughter cell
13    COPY   Copy parent's axth opcode to child; increment parent's ax-register
14    DIVD   Divide daughter from mother, activating daughter's pointer
21    JMPB   Jump back to opcode in cx, else to first statement if cx empty
22    CALL   Call the opcode in cx, put op after call into dx-register
23    RETN   Return to op placed into dx by previous call, else to first statement
24    JMPF   Jump to opcode in cx, else to last statement if cx empty
31    IFAG   If ax > bx then skip next instruction
32    PYOP   Put complement of next op into cx, do execute that op
33    RESB   Reset bx-register; bx = cell size (last instruction)
34    IFAL   If ax < bx then skip next instruction
41    ADRF   Search forward for opcode in cx, put pointer location of cx into bx
42    RESA   Reset ax-register; ax = 1
43    BEQA   bx = ax
44    ADRB   Search back for opcode in cx, put pointer location of cx into ax

Note: The first column contains the numerical codes used, the second contains the four-letter acronym for each computer operation, followed by a brief description, including main defaults. Default values for registers are: ax = cx = dx = 1, bx = position of cell's last operation. Appendix A contains a more detailed description of each opcode summarized here.
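The numeric encoding in Table 1 pairs opcodes by complement: each base-four digit d maps to 5 - d, which matters for the PNOP/PYOP mechanism described later in this section. A small illustrative snippet (not taken from the simulator source) restates the table and the complement rule:

```python
# The 16-opcode Amoeba basis set, keyed by its two-digit base-four encoding.
OPCODES = {
    11: "PNOP", 12: "MALL", 13: "COPY", 14: "DIVD",
    21: "JMPB", 22: "CALL", 23: "RETN", 24: "JMPF",
    31: "IFAG", 32: "PYOP", 33: "RESB", 34: "IFAL",
    41: "ADRF", 42: "RESA", 43: "BEQA", 44: "ADRB",
}

def complement(op: int) -> int:
    """Complement each digit of a two-digit opcode: 1 <-> 4 and 2 <-> 3."""
    tens, ones = divmod(op, 10)
    return 10 * (5 - tens) + (5 - ones)

assert complement(42) == 13   # C(RESA) = COPY, as used in Section 3
assert complement(34) == 21   # C(IFAL) = JMPB
assert all(complement(complement(op)) == op for op in OPCODES)  # an involution
```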
of this computer world is divided into generations, where between 50 000 and 200 000 opcodes are executed during a typical generation. A new generation is initiated when either 200 000 opcodes have been executed or when all available cells in the stack are occupied by parents and their embryos (children whose parents have not yet finished copying in their code). At the beginning of a new generation, cells are reaped (slots emptied, registers set to the default values, and pointer deactivated) with a higher probability for the older cells near the top. All cells surviving the reap cycle are moved up the stack, leaving about 300-400 empty cells at the bottom of the stack. The bottom 40 cells in the computer stack are loaded with random sequences of computer operations, where the sequences have a uniform size distribution between 1 and 25 operations. This provides a continual background of new sequences. The uppermost slots (usually 600 or 700) contain the executable code of cells (which may or may not be self-replicators) that survived the reap cycle, and their embryos. Subsequent MALL commands allocate memory from the empty blocks for future children. When there are no more available empty slots, the generation is over and the cycle repeats itself.

Fig. 2. The interaction grid used for the computer model. Cells randomly diffuse about, interacting with nearest neighbors only: (a) an isolated self-replicator; (b) a parasite using some of the code of its nearest neighbor; (c) cells sharing their pointers within a colony.

Every cell is given an instruction pointer that moves through its code, with the instruction it is presently pointing to being executed next. In addition, each cell is given four registers: two numerical registers, ax (copy register) and bx (size register); an opcode register, cx (opcode to jump pointer to); and one address
register, dx (return address register). The instruction pointer is a two-vector, ip(cell, op), where "op" is the location (from 1 to 30) of the operation presently being executed in the "cell" associated with the pointer. Initially, op = 1 and is successively incremented to 2, 3, ... unless one of the operations moves (jumps) the pointer to a new position.

The two-dimensional interaction grid has periodic boundary conditions and serves only to define which cells can interact with each other. As mentioned above, each cell's code is a block in memory. Each cell has a position on the grid that changes as the cell randomly diffuses about on the grid. Under certain circumstances a cell can interact with a nearest neighbor by sending its pointer into the neighbor's code and executing it. This enables parasites and colonies to exist. There are two ways a cell loses its pointer: either the cell lacks a jump command at its end, so the pointer is not retained and is "lost", being appropriated by a neighboring cell, or the cell attempts to move its pointer to an opcode it does not contain and therefore cannot find. In the former case, the cell has permanently lost its pointer to another cell, which now retains the extra instruction pointer, enabling it to use that cell's CPU time as well as memory for an additional child. In the second case, the cell usually retains ownership of the pointer and is now considered a parasite. However, in some situations, for example via an ADRF or ADRB command, the parasite loses its pointer to the host. This enables the host to use the parasite's pointer (i.e. the parasite's CPU time) to copy the host's code into the block of memory originally allocated for the parasite's child (i.e. to steal memory from the parasite). Therefore, in addition to optimizing their code, cells can also compete for the computer's two resources, CPU time and memory, by trapping pointers from nearby cells.
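The CPU-time resource mentioned here is rationed by the timeslice rule quoted earlier: about four times the cell's size raised to the 0.8 power. A quick sketch shows the resulting size pressure. The constants are as stated in the text; the function name is invustrated for illustration only:

```python
def timeslice(cell_size: int) -> float:
    """Opcodes of CPU time granted per turn: ~4 * size^0.8 (sublinear in size)."""
    return 4.0 * cell_size ** 0.8

# Per opcode of its own code, a smaller cell receives more CPU time each turn,
# so shorter genomes replicate sooner, all else being equal.
assert timeslice(10) / 10 > timeslice(20) / 20
```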
Third, the overall environment within which various cells must compete consists of the "fitness landscape". In order to survive, a self-replicating cell must, in addition to replicating itself accurately, be able to replicate faster than its competitors. A cell's replication time is an extremely complicated function of many (n) parameters, six examples of which are: number of other viable cells, number of copy commands in its copy loop, penalties incurred for attempting illegal operations, susceptibility to parasites, number of nearby cells that lose their pointers, and mutation rate. Thus, the fitness landscape is a dynamic and complicated n-dimensional surface that is the geometric locus of the replication time as a function of n variables. Table 2 lists the default values of registers, as well as the time penalties incurred when a cell attempts an illegal operation such as copying or dividing before any memory was ever allocated. The fitness landscape is dynamic, being constantly altered by mutating cells as they pressure each other by competing for the limited resources of CPU time and computer memory.

In the Amoeba simulation, mutations were invoked only when a child was separated from its parent. There were three types of mutations (with relative probabilities): substitution (50%), deletion (25%) and insertion (25%). No qualitative difference was observed between mutating cells continually (probability of about 10^-4 to 10^-3 per executed instruction) and mutating only when a child was initiated (probability of about 10% that a single mutation occurred). The latter scheme was chosen so that the simulation would run faster and so that the probability a cell is mutated would not depend on its size.

Fig. 3 shows the anatomy and operation of a self-replicator. The replicator cell is shown in the center of the figure and consists of nine operations that are executed starting with the first operation at the top. The pieces of code associated with each of the four replication phases (initiation, copy, separation and reset) are labeled on the left. On the right are shown the varying states of three of the four registers. First, note that the order of the four replication phases is somewhat unexpected. For example, on the first cycle, this particular replicator tries to divide off a child before it has done any MALL or copying.
Secondly, note that the opcode loaded into the cx opcode register is defined to be the complement of the argument of the opcode responsible for loading the cx-register (PNOP commands for this cell, each with the next opcode as its argument), where the complement of 1 is 4 and of 2 is 3. As an example, if we define "C" as the operator which takes the complement of the argument, C(42) = 13. Complements are used in order to prevent a pointer movement command
(such as JMPB) from merely jumping the pointer to the opcode used as the argument of the PNOP or PYOP command. In this replicator, the complements used for loading the cx-register are: C(RESA) = COPY, since C(42) = 13. Similarly, C(IFAL) = JMPB. The PNOP command indicates the next operation is not executed, but is merely an argument whose complement is placed into the opcode (cx) register. Until the cell has finished copying its code to a child, the IFAL statement is true. Therefore, the PNOP command is skipped and the JMPB command jumps to the IFAL opcode, the complement of the cell's third statement, JMPB. However, when ax > bx = cell size, IFAL is false, the [PNOP/COPY] sequence is executed, and the JMPB command jumps the pointer back to the RESA command at the cell's beginning, as RESA is now the new opcode in the cx-register.

Fig. 3. Anatomy of a self-replicator. The replicator, consisting of nine instructions (opcodes), is shown in the center of the figure. The four different stages of the replication process are shown on the left-hand side. The values in the three registers ax, bx and cx are shown in the right-hand columns.

Table 2
A listing of the default and time penalty rules that determine the fitness landscape

Defaults:
  Numerical "copy" register ax: ax = 1
  Numerical register bx: bx = cell size
  Opcode register cx: no opcode
  Return address register dx: dx = 1
  Jump back (JMPB): to first instruction
  Jump forward (JMPF): to last instruction
  Return (RETN): to first instruction
  Address back (ADRB): RESA
  Address forward (ADRF): RESB

Lose one third of timeslice if:
  Attempt multiple Memory allocates
  Copy before Memory allocate
  Attempt to copy nonexistent statement
  Jump or opcode-search with cx empty
  No host found during search
  Return without prior Call

Lose entire timeslice if:
  Instruction pointer lost
  Divide an empty child
  Call at cell's end (no Return possible)

Note: The fundamental pressure driving the evolutionary process is the time a parent needs, in computer opcodes executed, to generate a copy of itself (child cell).

4. Evolution of replicators

Fig. 4 shows the average probability ⟨Pn⟩ that a randomly generated sequence of n operations is an initial self-replicating ancestor in a previously lifeless world. The average probability ⟨Pn⟩ for a sequence of length n opcodes (size) was determined as follows. For the ith run, it was recorded how many randomly
generated sequences (each of n opcodes only), si, were required before a self-replicator arose. This was done for each of q runs and the average taken, where Pi = 1/si, and
⟨Pn⟩ = (1/q) Σ_{i=1}^{q} (1/s_i).    (1)

For each size n, the number of runs q shown in Fig. 4 were: n(q) = 7(14), 10(15), 15(12), 20(16), and 25(13). The runs were much longer for worlds where sequences were generated for the smaller sizes. For example, a size seven self-replicator typically appears about once every 40 000 sequences, with 40 new sequences generated every generation. The error bars are not the standard deviation σ but the precision [28] of the measurement, σ/√q (also referred to as the standard error), since the probability ⟨Pn⟩ is not Gaussian distributed about some optimum number of randomly generated sequences. Instead, ⟨Pn⟩ is a constant, independent of the number of previously generated sequences. The smallest self-replicator is of size five, where the last instruction is one of the {JMPB, RETN} commands which moves the pointer back to the first instruction of the cell's code. The first four instructions are one of six permutations of the set of four commands, {MALL, COPY, DIVD, IFAG}. For a sequence of five instructions, chosen from a basis set consisting of 16 instructions, there are 16^5 ≈ 10^6 possible combinations. Of these there are 12 possible self-replicators, so that the probability of a randomly generated sequence of size five being a self-replicator is P5 = 12/16^5 ≈ 10^-5. If one next considers sequences consisting of six operations, there are several categories of self-replicators. First, there are the 12 size five replicators with anything in the sixth position. In these 12 cases the sixth opcode is irrelevant because the pointer never executes the last instruction, as the fifth operation (JMPB or RETN) returns it to the beginning of the cell. In addition, there are self-replicators that are either a new size six sequence, where the pointer is returned by the sixth opcode, or a sequence including a size five replicator plus an embedded, neutral operation.
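The counting just given can be verified directly: 12 functional orderings out of 16^5 equally likely sequences, which also matches the value quoted in the caption of Fig. 4. A brief illustrative check:

```python
# Probability that a random size-five sequence is a self-replicator:
# 6 permutations of {MALL, COPY, DIVD, IFAG} times 2 closing jumps {JMPB, RETN},
# out of 16**5 possible five-opcode sequences.
p5 = (6 * 2) / 16 ** 5
assert abs(p5 - 1.14e-5) < 1e-7   # the 1.14 x 10^-5 quoted for Fig. 4

# A size-5 replicator with an arbitrary tail gives the same lower bound for any m > 5:
for m in range(6, 26):
    assert 12 * 16 ** (m - 5) / 16 ** m == p5
```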
Therefore, the probability of generating a self-replicator with m > 5 instructions is Pm ≥ (12 × 16^(m-5))/16^m = P5, since a replicator of size five can have anything attached to it, in the sixth position of the sequence, plus entirely new combinations of six opcodes. It turns out that P6 ≈ 2P5 for the Amoeba world discussed here. Thus, an important observation that arises from this study is that the probability of a randomly generated sequence resulting in a self-replicator increases with replicator size. For a sequence of size 25, the largest size tested to date, P25 ≈ 10P5 ≈ 10^-4. It is important to note that this dependence of ⟨Pn⟩ on size n, although intriguing, is nevertheless a feature of the Amoeba simulation and not necessarily of other artificial life studies. For simulations such as Tierra [22] or Avida [23], which use larger basis sets (32 opcodes and from 20 to 30, respectively), the probability of generating a self-replicator from random sequences of code is so remote that the above behavior cannot even be tested. It is also interesting to determine the probability ⟨P∞⟩ for infinitely long sequences. While it is not trivial to calculate ⟨P∞⟩ exactly, it is possible to estimate an upper bound to Pm for large m. This is determined by the number of ways a pointer can get trapped in a loop that contains a finite number of opcodes that do not code for self-replication since, in the absence of such trapping, the pointer will eventually access a portion of code enabling the cell to replicate itself. If the first operation in a cell's code is JMPB or RETN, the pointer is trapped. As a crude estimate, 30% of the randomly generated combinations contain JMPB (or RETN) for the cell's first or second opcode, e.g. [JMPB/-] or [x/JMPB/-] respectively, where "x" represents a single instruction and "-" represents any sequence of instructions. Many of these combinations can trap the pointer in an infinite loop right at the cell's beginning.
Therefore, this quick estimate shows P∞ < 0.7 is an upper limit for the probability of generating a self-replicator from infinite sequences generated in the Amoeba simulation. The actual upper limit is undoubtedly much less, as the pointer could get trapped in more elaborate schemes where the pointer is jumped around later on in the code. Finally, the presence of colonies enables some additional computer sequences to self-replicate that would not survive in the absence of a colony. This
119
A.N. Pargellis / Physica D 98 (1996) 111-127 ' ' ' ' I ' ' ' ' I
....
I ....
I ' ' ' ' I ' ' ' '
LO
I
i
o
10
4-3 •~
.Q
5
.Q o
0 0
5
10
Size
15
20
25
30
(operations)
Fig. 4. Probability, Pn, that a randomly generated sequence consisting of n operations is a self-replicator. Pn = 0 for n < 5 and Pn = 1.14 × 10^-5 for n = 5. Data are shown by crosses.

results in Pn(exp) > Pn(th), since the theoretically determined probability is calculated for isolated self-replicators. For example, the sequence [MALL/COPY/IFAG/JMPB/DIVD] is not a self-replicator, as the cell loses its pointer after dividing
off the first child. However, if the parent and child are isolated from the surrounding primordial soup, the parent's pointer will be lost to the child. In this way,
colonies can develop where cells lose their pointers to each other. Fig. 5 shows a schematic diagram of evolution in the computer world discussed here. This closely follows the proposed phases of evolution for biological life [3]. Initially the world consists of "prebiotic"
sequences which are incapable of self-replication. These sequences either contain fewer than five instructions, the minimum required for a self-replicator in this model, or they lack a required operation such as allocating memory (MALL) or dividing off the completed child cell (DIVD). Most of the sequences in the prebiotic world contain only one or two instructions, being the children of cells with codes such as [MALL/COPY/DIVD]. However, since P5 ≈ 10^-5, there is a better than 10^-5 probability that a randomly generated sequence of at least five instructions can replicate itself. Therefore, an ancestor eventually is created which may either be a "protobiotic" replicator, which copies its code incorrectly at least once,
Fig. 5. A pictorial description of evolution in this computer model. Solid arrows represent moving the pointer to a particular opcode (or address if a RETN command), in some cases to another cell (host). Dashed arrows represent pointers lost to a neighboring cell. Dotted arrows to dashed cells represent children into which the parent has incorrectly copied its code. The prebiotic stage consists of cells incapable of replicating themselves accurately even once. This stage is replaced by the protobiotic stage where an ancestral cell is capable of replicating itself correctly once but also replicates incorrectly at least once. The final biotic stage consists of efficient replicators that replicate without error.
Fig. 6. The evolution of a protobiotic to biotic replicator from its initially very inefficient ancestral form (left-hand sequence) to a very efficient descendant (right-hand sequence). The shaded subcells represent pieces of code that are useless to the operation of the cell. The left-hand ancestor and an early variant (middle) are protobiotics because they lose their pointers to a neighboring cell after copying themselves once.

or a "biotic" replicator that can copy its code many times without any errors. In either case, mutations will drive the ancestor to rapidly evolve into a more efficient replicator as it competes with parasites and variations of itself for computer memory and CPU time. Cells survive by minimizing the number of opcodes executed in order to generate a child. This improvement frequently occurs in sudden jumps [22,24,29], where cells lose sequences of three or four operations in a short period of time, typically less than 50 generations. This is similar to punctuated equilibrium, an evolutionary pattern sometimes observed in the fossil record [30-32]. Fig. 6 shows the evolution of a protobiotic replicator to a biotic replicator over a period of 4200 generations. The code of the initial ancestor, a randomly generated sequence of 19 computer opcodes, is shown in the left-hand rectangle, read from top to bottom. The cell is very inefficient, performing several operations incorrectly: losing the pointer after the (second) DIVD operation, repeating the MALL operation in each copy loop, generating an empty child with
the first DIVD operation, and containing four separate pieces of useless code (introns), for a total of nine useless operations. The small opcodes given in square brackets merely show the complementary opcodes being loaded into the cx-register. The opcodes in parentheses are not executed because they follow a PNOP statement, which loads the cx-register and then skips the next statement. Note that the [PNOP/(PYOP)] sequence loads RETN(23), the complement of PYOP(32), into the cx-register. The CALL is thus to the RETN statement, which then returns the pointer to the MALL statement immediately following the CALL. Unfortunately, this cell can only copy itself once, since the pointer is then lost to another cell in the soup, shown by the heavy arrow pointing downwards. However, this is not fatal, because the pointer is usually entrapped by a neighboring cell that is a replicator of the same species. In other words, primitive colonies form, composed of cells which trade pointers with each other. These colonies are primitive types of multicellular life in which cells exchange resources, in this case computer memory and CPU time, through the exchange of instruction pointers. The middle rectangle shows the code for one of the variants the cell has evolved into after about 1400 generations. This version replicates faster because it contains only 14 operations, having lost most of the useless code. By losing portions of their unused code, cells can replicate faster and eventually dominate the soup. This is analogous to the molecular evolution studies of Spiegelman [12], who observed self-duplicating nucleic acid molecules (Qβ-RNA) reduce their size to only 17% of the original size when incubated in test tubes where they no longer required the viral genes encoding proteins. In addition, this variant has two COPY statements in the copy loop, resets the ax-register, and eliminates the multiple memory allocations.
As before, this cell loses its pointer to a neighbor after correctly copying its code once. This cell finally becomes a truly viable biotic, shown by the sequence in the right-hand rectangle, 4200 generations after the ancestor emerged from the soup. This cell has redundant routines for loading the opcode register, resets the ax-register (using the second ADRB operation), avoids
multiple memory allocations and suffers only one minor time penalty, from a return with no prior CALL statement. It is not clear why, or whether, the redundant routines for loading the opcode register are a benefit to the cell, since this is rarely seen. Fig. 6 shows a general trend observed in the evolution of self-replicators. The initial ancestor is a large, randomly generated sequence, and therefore contains a great deal of useless code. The ancestor discussed here has only eight useful operations, nine useless operations (shown by the shaded regions) and two deleterious operations (the first MALL and DIVD). Thus, the original ancestor is very inefficient, primarily because it contains a great amount of useless code (introns). Random point mutations generate variants that lose this unneeded code, through deletions, and then begin to incorporate additional subroutines, through insertions, such as loading the opcode register and additional COPY commands.
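The mutation-driven loss of introns described above can be illustrated with a toy mutation operator over the 16-opcode basis set. This is only a sketch: the actual Amoeba mutation operators and rates are not reproduced here, so the 2% rate and the three operator types are assumptions.

```python
import random

OPCODES = ["PNOP", "MALL", "COPY", "DIVD", "JMPB", "CALL", "RETN", "JMPF",
           "IFAG", "PYOP", "RESB", "IFAL", "ADRF", "RESA", "BEQA", "ADRB"]

def mutate(genome, rate=0.02, rng=random):
    """Apply point substitutions, deletions and insertions to a genome.

    Deletions are how variants shed useless code (introns); insertions
    are how they later acquire additional subroutines.
    """
    out = []
    for op in genome:
        r = rng.random()
        if r < rate:                     # substitution: replace this opcode
            out.append(rng.choice(OPCODES))
        elif r < 2 * rate:               # deletion: drop this opcode
            pass
        else:                            # copy the opcode unchanged
            out.append(op)
        if rng.random() < rate:          # insertion after this position
            out.append(rng.choice(OPCODES))
    return out

ancestor = ["MALL", "COPY", "IFAG", "JMPB", "DIVD"]
print(mutate(ancestor, rate=0.0) == ancestor)   # zero rate: exact copy
```

Repeated application of `mutate` over many generations, with selection for replication speed, is the kind of process that prunes the shaded intron regions of Fig. 6.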
Fig. 7. The change in polymeric entropy of the system as a function of time in generations. A robust self-replicator finally arose at the 1274th generation; feeble protobiotics emerged at the 42nd, 800th and 1234th generations.

5. Entropy as a means of quantifying the formation of life

To some extent, the generation and subsequent development of living organisms can be quantified by measuring the changes in the entropy of a system's information content [3]. The "polymeric" entropy 2 Sp is a measure of the average number of states available to a typical sequence of operations (string) and is thus an indicator of the degree of order in a system. But Sp is a useful indicator of the emergence of living systems in a previously lifeless world only if one observes a decrease in Sp that is sustained over a sufficiently long time. This is because the entropy can also drop precipitously for short periods, since the system can become dominated by parasitic sequences and thus ordered by processes other than the development of living organisms. The polymeric entropy Sp, per string, is defined [3,33] as

Sp = -Σ_{i=1}^{Ng} pi ln(pi),  (2)

2 Chris Adami suggested this nomenclature in a private communication.
where pi = Ni/Ntot is the density of genotypes, Ntot being the total number of sequences in the system, Ng the total number of different genotypes, and Ni the number of sequences of the ith genotype. Therefore, a system with only one genotype has Ng = 1, pi = p1 = 1 and Sp = 0, whereas a system in which all sequences (genotypes) are unique has Ng = Ntot, pi = 1/Ntot and Sp = ln(Ntot). Fig. 7 is a plot of the entropy Sp, averaged over five-generation time intervals, versus the time in generations. These data show, for a typical run, the development of the system from the initial chaotic phase (prebiotic world) to a phase dominated by self-replicating organisms (biotic world). A sustained drop in Sp is correlated with the spontaneous generation of a self-replicator in the 1274th generation. This organism's code is sufficiently efficient that it, its children and its mutated variants are able to rapidly become the dominant genotypes in the world, resulting in a consistently lower polymeric entropy of the world. Several earlier protobiotics, at the 42nd, 800th, and 1234th generations, failed to establish a foothold and
were reaped or suffered a fatal mutation before they had enough children to continue the species. As briefly mentioned above, Sp frequently drops to very low levels during the prebiotic phase. But these decreases are ephemeral, lasting less than 20 generations in the Amoeba world, and are not due to the emergence of life. Rather, they indicate a temporary increase in the system's order as it is occasionally dominated by one- and two-instruction children of "parasitic" sequences that are not able to self-replicate but whose children dominate the gene pool for a few generations. Since the probability of generating such parasitic sequences is quite high, Sp can fluctuate rapidly during the prebiotic phase.
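The polymeric entropy of Eq. (2) is straightforward to compute from a census of genotypes; the two limiting cases quoted above (Sp = 0 for a single genotype, Sp = ln(Ntot) when every sequence is unique) serve as sanity checks. A minimal sketch:

```python
import math
from collections import Counter

def polymeric_entropy(soup):
    """Sp = -sum_i p_i ln(p_i), with p_i = N_i / N_tot over genotypes."""
    n_tot = len(soup)
    counts = Counter(tuple(genome) for genome in soup)
    return -sum((n / n_tot) * math.log(n / n_tot) for n in counts.values())

# A soup dominated by a single genotype is perfectly ordered: Sp = 0.
clones = [("MALL", "COPY", "DIVD")] * 100
print(polymeric_entropy(clones) == 0.0)                        # True

# A soup in which every sequence is unique is maximally disordered.
unique = [(i,) for i in range(100)]
print(math.isclose(polymeric_entropy(unique), math.log(100)))  # True
```

A sustained drop of this quantity toward zero is the signature, used above, of one genotype and its close variants taking over the soup.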
6. Summary and discussion of results

In this model the early evolution of primitive cells has been studied. The probability of a randomly generated sequence being a self-replicator increases with the number of computer operations. In fact, a sequence of infinite length will always be able to replicate itself unless the instruction pointer becomes trapped in an infinite loop. Once a self-replicator has been created, evolutionary pressures rapidly drive variants of the ancestor to duplicate themselves as quickly as possible. Cells compete for computer memory and CPU time. They can appropriate both by retaining their own pointers while capturing pointers lost by neighboring cells. Cells can also replicate faster if they optimize their code, for example by adding additional copy commands to their copy loops. Hosts can prevent parasites from allocating valuable memory by loading their opcode register in a way that prevents their code from being used by parasites to generate parasitic children, or by capturing a parasite's pointer in a routine that generates children for the host or traps the parasite's pointer in a useless loop.

This model starts with a world consisting of a fairly sophisticated set of algorithms (opcodes) that cells use to replicate themselves. The ideal prebiotic model would initially have no predefined genetic basis set. Out of some ill-defined chaotic mess would arise a genetic code that would be manifested in the form of primitive self-replicating computer algorithms. This was presumably the case in our world when biological life (self-replicating strands of genetic code) arose. It has not yet been accomplished in artificial life studies. It should also be noted that the physical world we live in is governed by a set of sophisticated and complex laws defining (and thus limiting) the ways in which atoms combine to form molecules, the rates at which molecules react to form new molecules, etc. It is possible that the set of genetic codes generated by chance in an ensemble of worlds similar to ours is more limited than we might imagine, with the possible genetic codes very similar to the one we have today.

Acknowledgements

I am indebted to Bernie Yurke, who gave me much encouragement and asked many excellent questions during the course of these investigations. I also received many useful suggestions from John Denker, S. Keshav, Ed Rietman, David Tank, David Stoler, and Paul Graham. Finally, I thank Chris Adami for his suggestions and comments regarding entropy, which led to the studies reported here in Section 5.
Appendix A

There are five categories of computer instructions used in this model: fundamental replication algorithms: MALL(12), COPY(13), DIVD(14); operations that load the opcode register (cx): PNOP(11), PYOP(32); pointer movement operations: JMPB(21), JMPF(24), CALL(22), RETN(23); register manipulation operations: RESB(33), RESA(42), BEQA(43), ADRF(41), ADRB(44); and conditional operations: IFAG(31), IFAL(34). This section discusses in some detail the operation of each opcode used in the 16-element basis set. There are many examples of these opcodes in Figs. 8-10. The operational details of the self-replicators shown in Figs. 8-10 are discussed more fully in Appendix B. PNOP(11) - Load the complement of the next opcode (subtract the instruction's numerical code from 55)
Fig. 8. Biology of protobiotic cells. Sample genotypes of protobiotics that have just emerged from the primordial soup. See text for description of codes in (a)-(d).

into the cx opcode register and do not execute the next instruction. Examples are shown in Figs. 8(d), 9(b), (d) and 10. The pointer skips over it to the following instruction. This opcode is important as it is the means by which a cell loads its opcode register. It is this opcode which is searched for (and, if found, jumped to) by the jump opcodes JMPB and JMPF. For example, the sequence [PNOP/RESA] loads the cx-register with the complement of RESA, which is COPY (55 - 42 = 13). The RESA operation is not executed, so the ax-register is not reset. Subsequent JMPB (JMPF) commands would jump the instruction pointer back (forward) to the first occurrence of the COPY opcode. MALL(12) - Allocate the next available block of memory. This block initially consists of 30 empty slots for the parent's opcodes to be copied into. All self-replicators use this opcode, as shown in Figs. 3, 6 and 8-10. COPY(13) - Copy the opcode at the axth location of the parent to the axth location of the
Fig. 9. Sample genotypes of biotic cells that have just emerged from the primordial soup and can replicate themselves continuously: (a) the smallest self-replicator of size five; (b)-(d) larger replicators. See text for details.
child; increment ax-register. This operation first copies the parent's opcode, located at position ax, into the axth position of the child's memory block. Then the parent's copy (ax) register is incremented, ax = ax + 1. This opcode is required for all self-replicators. DIVD(14) - Divide child from parent, activating the child's pointer and registers. The child's code is now executable. The parent can now allocate memory (MALL) for another child. This opcode is also required for all self-replicators. JMPB(21) - Jump the instruction pointer to the opcode stored in the opcode register, cx. JMPB moves the pointer, from its present position, towards earlier instructions, stopping when it first encounters the opcode stored in cx. Examples are shown in Figs. 3, 8(c), (d) and 9(b)-(d). If the opcode is not found, a search ensues amongst the nearest neighbors on the interaction grid. If one contains the opcode, the "parasite's" pointer begins to execute that "host's" code. If the cx-register is empty (no opcode stored there by a previous PNOP or PYOP operation), the pointer is moved, by
Fig. 10. Biology of parasites: (top) an example of a host that is susceptible to a virus which repeatedly uses its code to generate daughters; (bottom) a host that is utilized by two different types of viruses. The left-hand virus is particularly efficient as it does not reset its bx-register to the host's (larger) size. Both viruses lose their pointer after a child has been generated. Since most cells in this run were viruses, a viral colony formed where viruses traded pointers.
default, all the way to the first statement in the cell's code (see Fig. 8(c)). CALL(22) - Call the opcode in cx by jumping the pointer to that location, and load the numerical position (address) in that cell of the opcode following the CALL statement into the dx-register (RETN register). Examples are shown in Fig. 6(a), (b). CALL first attempts a JMPF, from the pointer's location, to the nearest opcode in cx. If the JMPF fails to find the opcode, the pointer then attempts to JMPB from the CALL to the opcode in cx. In the eventuality that the opcode stored in cx is not anywhere in the cell (fairly common!), the pointer then searches neighboring cells (this was variable, but in the runs discussed here "neighbor" meant cells within a search radius of 3) for an occurrence of
the opcode. The cell containing the CALL statement is called the "caller"; in this way a cell can call subroutines in a nearby host. If the cx-register is empty, the instruction pointer goes forward to the last statement in the cell. RETN(23) - Return to the opcode at the numerical location placed in the return register (dx) by a previous CALL statement in the caller cell (which may not be the owner of the pointer if the cell is a virus in a host cell). For example, if the CALL opcode is the fifth statement in a "caller" cell, after the pointer has been sent to the called subroutine and encounters a subsequent RETN statement (which might be in another cell's code if the caller was a parasite), it is returned to the sixth statement in the caller cell. If the address stored in dx is not found, then the pointer is lost, at random, to a nearby cell on the interaction grid. If no address has been previously loaded into dx (because no CALL operation has yet been executed), then the RETN is to the cell's first statement (as shown in Fig. 8(c) and (d)). JMPF(24) - Jump the instruction pointer forward to the first occurrence of the opcode stored in cx. This is similar to JMPB, but JMPF moves the pointer, from its present position, towards later instructions. If the cx-register is empty, the pointer is moved, by default, all the way to the last statement in the cell's code (see Fig. 8(b), where the pointer is then lost). IFAG(31) - If ax > bx then skip the next instruction, else continue on to the following operation as usual. Many examples are shown in Figs. 8 and 9. PYOP(32) - Similar to PNOP(11), but in this case the following opcode, in addition to its complement being loaded into the cx opcode register, is also executed. Using the example from above for PNOP, the sequence [PYOP/RESA] loads cx with the complement of RESA, which is COPY (55 - 42 = 13). The RESA operation is then executed, so that ax = 1. Examples are shown in Figs. 9(b), (c) and 10.
RESB(33) - Reset bx-register: bx = cell size (last instruction). Examples are Figs. 9(c) and 10(b) where the only apparent value for this opcode is that parasites are slowed down as their bx-register is reset to their host's (usually larger) size.
IFAL(34) - If ax ≤ bx then skip the next instruction, else continue with the following operation as usual. Many examples are shown in Figs. 8-10. ADRF(41) - Search forward for the opcode in cx and put the numerical location of that opcode (an address within a particular cell) into bx. An example is shown in a parasite in Fig. 10(b), where ADRF is not executed but is used for loading the cx-register. This opcode is sometimes used by host cells as a defense against parasites, because ADRF also defines the cell in which the ADRF opcode is being executed to be the new owner of the parasite's pointer. Therefore, if a parasite has jumped or called its pointer to a nearby (host) cell containing the ADRF opcode, the host's code will be copied into the memory originally allocated for the parasite's child. If no opcode has been previously stored in cx, the default is that the bx-register is reset to the total number of opcodes in the cell: bx = cell size. RESA(42) - Reset ax-register: ax = 1. Many examples are shown in Figs. 3, 6, 9 and 10. BEQA(43) - Set bx = ax. This opcode is frequently used in a somewhat unusual class of replicators (two examples are in Figs. 8(a) and 9(d)). ADRB(44) - Similar to ADRF, but search backward for the opcode in cx and put the location of that opcode (an address in the cell) into the ax-register. One example is shown in Fig. 6(c). If no opcode was previously loaded into cx, the default is that the ax-register is reset: ax = 1.
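The opcode semantics above can be made concrete with a toy interpreter for the smallest self-replicator, [MALL/COPY/IFAG/JMPB/DIVD] (the Fig. 9(a) genotype). This is a simplified sketch only: it ignores the cx-register conventions, timeslice penalties and the interaction grid, and resets ax directly after DIVD, where the real model instead charges a time penalty in a default routine.

```python
MALL, COPY, IFAG, JMPB, DIVD = "MALL", "COPY", "IFAG", "JMPB", "DIVD"

def run(cell, steps=40):
    """Execute a cell for a fixed number of opcodes; return its children."""
    ax, bx = 1, len(cell)              # default register values
    child, children, ip = None, [], 0
    for _ in range(steps):
        op = cell[ip]
        ip += 1
        if op == MALL:                 # allocate a memory block for a child
            if child is None:
                child = [None] * len(cell)
        elif op == COPY:               # copy parent's ax-th opcode; ax = ax + 1
            if child is not None and ax <= len(cell):
                child[ax - 1] = cell[ax - 1]
            ax += 1
        elif op == IFAG:               # if ax > bx, skip the next instruction
            if ax > bx:
                ip += 1
        elif op == JMPB:               # cx empty: default to the first statement
            ip = 0
        elif op == DIVD:               # divide off the completed child
            children.append(child)
            child = None
            ax = 1                     # simplification: the real model resets ax
                                       # via a default routine, with a time penalty
        ip %= len(cell)
    return children

ancestor = [MALL, COPY, IFAG, JMPB, DIVD]
offspring = run(ancestor)
print(offspring[0] == ancestor)        # True: each child is an exact copy
print(len(offspring))                  # 2 children in 40 executed opcodes
```

Even this toy trace exhibits the inefficiency noted for Fig. 9(a): the MALL statement is re-executed on every pass through the copy loop, although only the first allocation does any work.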
Appendix B

Fig. 8 shows some examples of protobiotic cells, defined to be those cells which either copy themselves incorrectly at least once or lose their pointer to a neighboring cell. The initial, default values of the numerical registers are ax = 1 and bx = (cell size). Although the protobiotics frequently do not last more than a few generations, they sometimes become the dominant component of the soup by evolving into a more efficient organism, the biotic cell. In Fig. 8(a), the parent cell copies itself incorrectly the first two times before properly resetting the ax-register. The
first child contains only the parent's first statement, MALL, since the parent divides off the child when the IFAG conditional statement fails to skip the DIVD statement (ax = 2 < bx = 12). The second child lacks the first statement since the copy loop starts with ax = 2. The default routine in the main program penalizes the cell one-third of its timeslice, since it has to reset the ax-register to one after copying the 13th operation (which is 0). All future children will have the same code as their parent. In Fig. 8(b), the parent cell correctly copies itself the first time but then loses its pointer to some nearby cell in the soup after the final statement. Fig. 8(c) is unusual because the cell is actually a true self-replicator, but the code for a prebiotic cell is embedded within it. The subprogram [MALL/RESA/DIVD] does not copy anything, although the reset-ax-register statement is useful as it resets the parent's registers. In Fig. 8(d), the parent copies itself correctly the first time, but after dividing off the child the JMPB is to the defined opcode, IFAG(31), with C(JMPF) = IFAG. This effectively traps the pointer in the infinite loop [IFAG/DIVD/JMPB]. Fig. 9 shows some of the general trends encountered in this model. There is always a JMPB or RETN instruction at the end of the replication routines. The self-replicators last indefinitely, undergoing continual improvements as they compete for time and memory. As discussed above, in conjunction with Fig. 4, the genome of the smallest self-replicator has size five, an example of which is shown in Fig. 9(a). It is very inefficient, as the cell attempts to allocate memory for a new child each time it enters the copy loop, does not load the opcode register (cx), and does not reset the ax-register. A more efficient replicator is shown in Fig. 9(b), where two separate routines exist for loading cx. The code shown here is for an earlier ancestor of the replicator shown in Fig. 3.
The details of this cell's operation are interesting, as they incorporate many of the features that exist in the genetic code used in these simulations. The first [PNOP/JMPF] routine does not execute the jump-forward statement and loads IFAG = C(JMPF) into cx. The IFAG opcode is first reached, after the MALL statement, by the JMPF jump, bypassing the four-operation intron in the process. Then, until copying is finished, the IFAG condition is false and
the subsequent IFAL conditional operation is always true, skipping the next (PYOP) statement and copying an instruction to its child with the COPY operation. However, when done copying, the subsequent jump back (JMPB) to the addressed IFAG bypasses the IFAL statement, since IFAG is now true (ax > bx). This results in executing the load-cx sequence [PYOP/COPY], which now reloads cx with the cell's first statement, RESA. This time, the JMPB is to the beginning of the cell. Fig. 9(c) shows the code of another fairly efficient biotic. This replicator resets the ax-register (RESA) at the beginning of each new replication cycle. Fig. 9(d) shows a very efficient replicator which copies itself in only four copy loops. By first setting the numerical value of the bx-register equal to that of the ax-register (BEQA; bx = ax), ax > bx throughout all the copy loops, since each COPY command also increments the ax-register each time it is executed. When ax > (cell's size + 1), the default routine resets the ax-register, incurring a minor time penalty (see Table 2) only once at the end of each set of copies. Note that the MALL statement is not continuously used, since it is preceded by PNOP, so that it is not executed after the first time. [PNOP/MALL] sets BEQA (cx = 55 - 12 = 43) as the opcode in cx. Thus, the final JMPB is to the beginning of the cell (BEQA). Fig. 10 shows some examples of parasites. Parasites abound in this world, usually using a JMPF or CALL to access a routine in a host. Fig. 10(a) shows a host that is susceptible to a fairly sophisticated virus (shaded) that defines the host's COPY statement as the opcode that the viral pointer calls. Once in the host, the parasite's pointer remains within the host, repeatedly cycling through the host's COPY loop, then allocating memory for another child. Two other host-parasite combinations are shown in Fig. 10(b).
This host was particularly susceptible to viruses, two of which are shown, for many thousands of generations. The host's initial statement, RESB, sets the bx-register to the host's size. This afforded the host some protection, as many small viruses had their bx-register set equal to the host's much larger size. One example of this kind of virus is shown to the right of the host. The host's first statement is called, resetting the viral bx-register to the host's size of 14 statements. After copying its
four statements, the virus then copies 10 empty slots (0), which is time consuming. The host's return is to the end of the virus, which then loses its pointer to another cell in the neighborhood. Because there were so many viruses, the pointer was usually lost to another virus. Thus, a colony of viruses formed that essentially copied their code via the host and then shared their pointers with each other. A somewhat more robust virus is shown to the left of the host. This virus calls the DIVD statement of the host; thus the viral bx-register is not reset and the virus needs to copy only four statements before exiting the COPY loop. This virus also loses its pointer to the surrounding colony.
References

[1] B. Alberts et al., Molecular Biology of the Cell (Garland, New York, 1994).
[2] J.E. Darnell and W.F. Doolittle, Proc. Nat. Acad. Sci. USA 83 (1986) 1271.
[3] M. Eigen, Naturwiss. 58 (1971) 465.
[4] L. Stryer, Biochemistry (Freeman, New York, 1988).
[5] M. Eigen et al., Science 244 (1989) 673.
[6] F.H.C. Crick, Sci. Amer. 215 (4) (1966) 55.
[7] F. Dyson, J. Mol. Evol. 18 (1982) 344.
[8] L.E. Orgel, J. Theoret. Biol. 123 (1986) 127.
[9] T.R. Cech, Sci. Amer. 255 (5) (1986) 64.
[10] E. Rietman, Creating Artificial Life: Self-Organization (McGraw-Hill, New York, 1993).
[11] D.R. Mills, R.L. Peterson and S. Spiegelman, Proc. Nat. Acad. Sci. 58 (1967) 217.
[12] S. Spiegelman, Amer. Sci. 55 (3) (1967) 221.
[13] M. Eigen et al., Sci. Amer. 244 (4) (1981) 88.
[14] L.E. Orgel, Nature 358 (1992) 203.
[15] Q. Feng, T.K. Park and J. Rebek, Jr., Science 254 (1992) 1179; T.K. Park, Q. Feng and J. Rebek, Jr., J. Amer. Chem. Soc. 114 (1992) 4529; J. Rebek, Jr., Chem. Ind. 5 (1992) 171.
[16] G. von Kiedrowski, Angew. Chem. Int. Ed. Engl. 25 (10) (1986) 932; A. Terfort and G. von Kiedrowski, Angew. Chem. Int. Ed. Engl. 31 (5) (1992) 654.
[17] W. Banzhaf, Comput. Math. Appl. 26 (1993) 1.
[18] A.M. Turing, Proc. London Math. Soc. 42 (1936) 230-265; ibid. 43 (1937) 544-546.
[19] J. von Neumann, Theory of Self-Reproducing Automata (University of Illinois Press, Urbana and London, 1966).
[20] A.K. Dewdney, Sci. Amer. 250 (3) (1984) 15; ibid. 252 (3) (1985) 14.
[21] S. Rasmussen, C. Knudsen, R. Feldberg and M. Hindsholm, Physica D 42 (1990) 111.
[22] T. Ray, in: Artificial Life II, ed. C.G. Langton (Addison-Wesley, Redwood City, 1991) p. 371; K. Thearling and T.S. Ray, in: Artificial Life IV, eds. R. Brooks and P. Maes (MIT Press, Cambridge, 1994) p. 283; T. Ray, Artificial Life 1 (1994) 179; C. Adami, Physica D 80 (1995) 154.
[23] C. Adami, in: Artificial Life IV, eds. R. Brooks and P. Maes (MIT Press, Cambridge, 1994) p. 269; C. Adami and C.T. Brown, in: Artificial Life IV, eds. R. Brooks and P. Maes (MIT Press, Cambridge, 1994) p. 377.
[24] A.N. Pargellis, Physica D 91 (1996) 86-96.
[25] J.W. Drake, B.W. Glickman and L.S. Ripley, Amer. Sci. 71 (1983) 621.
[26] W.-H. Li and D. Graur, Fundamentals of Molecular Evolution (Sinauer, Sunderland, 1991).
[27] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs (Springer, New York, 1992).
[28] Y. Beers, Introduction to the Theory of Error (Addison-Wesley, Reading, MA, 1957).
[29] C. Adami, Phys. Lett. A 203 (1995) 29.
[30] S.J. Gould and N. Eldredge, Nature 366 (1993) 223.
[31] K.B. Springer and M.A. Murphy, Lethaia 27 (1994) 119.
[32] B.A. Malmgren, W.A. Berggren and G.P. Lohmann, Science 215 (1984) 317.
[33] C.E. Shannon, Bell Sys. Tech. J. 27 (1948) 379-423; ibid. (1948) 623-656.