Dynamics of molecular evolution

Dynamics of molecular evolution

Physica 22D (1986) 100-119 North-Holland, Amsterdam DYNAMICS OF MOLECULAR EVOLUTION Peter SCHUSTER h~stitut fiir Theoretische Chemie und Strahlenchem...

1MB Sizes 1 Downloads 159 Views

Physica 22D (1986) 100-119 North-Holland, Amsterdam

DYNAMICS OF MOLECULAR EVOLUTION Peter SCHUSTER h~stitut fiir Theoretische Chemie und Strahlenchemie, Universit~it Wien, Wiihringerstrasse 17, A 1090 Wien, Austria Molecular evolution is visualized as a dynamical process described by nonlinear differential equations which are derived from the kinetics of polynucleotide replication. First and higher order autocatalytic processes lead to different behavior in selection dynamics. One consequence of this fundamental difference is of primary importance for the evolutionary process: optimization of the mean rates of replication in the sense of Darwin's theory of evolution is restricted to first order autocatalysis and to some special cases of second order autocatalytic processes. Dynamical systems which model these processes are fairly simple: they can be represented by some kind of generalized gradient systems based on a potential function and they admit only saddle-node bifurcations. Emergence of new properties like oscillatory patterns, dissipative spatial structures and various forms of control and organization, however, is possible only in systems with complex dynamics which require at least second order autocatalysis. Error propagation through replication sets a limit to the genetic information content which can be transferred stably over many generations. This limitation is casted into quantitative terms by the theory. It leads to an error threshold which can be derived either from the differential equations of biochemical kinetics or from the theory of branching processes. The latter derivation is certainly more adequate since it allows to account for stochastic phenomena which are fundamental to evolution. Biological evolution is a superposition of many hierarchically ordered processes which take place on different time scales. We shall discuss three levels on which molecular evolution operates: elementary step dynamics of replication kinetics, selection dynamics leading to stationary mutant distributions, called "quasispecies", and evolutionary adaptation.

I. Concepts in molecular evolution Last century Charles Darwin introduced mechanistic thinking into biology. In essence his theory consists in the proposal of a mechanism of biological evolution which he derived from his rich empirical knowledge. His mechanism is based on two principles both of which are consequences of multiplication and inheritance under the constraints of some environment: 1) Variation of organisms is brought about by modifications which occur inevitably during replication. 2) Natural selection eliminates those variants which replicate less efficiently. The efficiency of replication is commonly called the Darwinian "fitness". It is a measure of the number of fertile descendants an individual contributes to the next generation. Darwin and most of his contemporaries had no idea what the mechanisms were which cause variation, although Gregor Mendel published his pioneenng studies in

genetics roughly at the same time the "Origin of Species" came out (for historical details see e.g. [1]). Mendel's work gave the first evidence for the existence of elements or "atoms" of inheritance. We call them now genes. The synthesis of Mendelian genetics and Darwin's theory was done successfully in the first half of this century by the famous scholars of population genetics, R.A. Fisher, J.B.S. Haldane and Sewall Wright. Clarification of the fundamental concepts as well as formalization casted this discipline into a mathematical frame which is based to a great extent on the analysis of nonlinear differential equations and of the corresponding stochastic models (see e.g. [2]). Population genetics deals with the spreading of genes in populations. It determines and discusses their transient and stationary distributions. Mutation, i.e. modification through unprecise replication, is introduced into population genetics as some rare event from outside, like the ~'deus ex machina" in the baroque theater. Knowledge of

0167-2789/86/$03.50 © Elsevier Science Publishers B.V. (North-Holland Physics Publishing Division)

P. Schuster/Dynamics of molecular evolution

the molecular processes basic to replication and inheritance was necessary to introduce a mechanism of mutation into the theory of evolution. Deoxyribonucleic acid (DNA) was identified as the carrier of genetic information. The proposal of the double helix as molecular structure of DNA by James Watson and Francis Crick in 1953 initiated a new discipline which is now known as molecular biology. The molecular mechanism of inheritance was discovered during a period of about thirty years of exciting research. Polynucleotides-this is the notion of a class of polymers comprising DNA and the closely related macromolecule ribonucleic acid, R N A - are properly visualized as strings of digits (fig. 1). There are four types of digits G(uanine), A(denine), C(ytosin) and U(racil) in RNA. In DNA uracil is replaced by T(hymidine). The bases are assigned to each other uniquely in complementarity relations. This assignment is the consequence of specific patterns of hydrogen bonds, three between G and C ( G ~ C ) and two between A and U or T ( A = U or A~-T), respectively. As indicated in fig. 1 such a string of digits, commonly called a "strand", can act as template in the synthesis of a uniquely defined negative copy, the minus strand (It-)). The minus strand in turn is the template for the synthesis of the plus strand (It+)). This mechanism of "complementary replication", i.e. replication via negative strands as intermediates has been found with RNA viruses. It has been investigated in great detail [3]. DNA replication is a much more complicated process [4]. It follows, nevertheless, the principle of template induced polymerization guided by the complementarity of bases. Several types of replication errors have been identified. They are distinguished as different classes of mutations. The most important of them are: 1) Point mutations are single base exchanges. The chain length of the polynucleotide (y) is conserved. 2) Deletions are mutations which lead to polynucleotides of smaller chain lengths. One base or a sequence of two or more bases is lost during replication.

101

3) Insertions lead to longer polynucleotides. A base sequence is inserted between two neighboring bases of the polynucleotide to be copied. Corn-

©

®@

©

A I~AUGCCAGUGAAC . . . . .

--> PLUS STRAND I (÷)

<'- UACGGUCACUUG . . . . .

- I MINUS STRAND I (-)

B F - AUGCCA["~ UGAAC . . . . . --> <-- UACGGU ILl I ACUUG . . . . . - I POINT MUTATION

,4, I~AUGCCA []

UGAAC . . . . . '->

C I--~AUGCCAGUGAAC . . . . .

--->

I~-AUGCCA~GUGAAC

INSERTION .....

->

13 I'-- AUG C I ' C " A ~ G A AC . . . . .

->

DELETION

,1, ~---AUGCIGAAC . . . . .

.-~

E Fig. 1. Molecular structure and replication of RNA. A. RNA consists of a molecular backbone built from ribose and phosphate onto which bases are attached regularly at equally distant positions. We have four different bases: G(uanine), A(denine, C(ytosine) and U(raeil). B, The complementarity of the bases (G and A are bound strongly to C and U by specific hydrogen bonds: ~ C and A ~ U ) allows to assign a unique minus strand to a given plus strand. Template induced replication of RNA proceeds from one end to the other through digit per digit assignment of the complementary base to.the template and formation of a linkage between the growing strand and the incoming digit. C. A replication error occurs as a consequence of an accidental mispalring of the type G--U. It leads to a "point mutation". D. "Insertions" may occur by partial duplication of the base sequence. E. In "deletions" part of the sequence is omitted during replication. The mechanism of DNA replication follows the same principles of complementarity of bases (in DNA the base uracil is replaced by thymine) and alignment of template and newly synthesized strand. In detail the process is much more complicated than RNA replication. It requires at least ten different enzymes [4] whereas virus RNA replication works with a single enzyme only.

102

P. Schuster/Dynamics of molecular evolution

monly, the sequence inserted is a copy of some part of the template. 4) Rearrangements of the polynucleotide sequence are caused by so-called transposable elements. These are base sequences which can bind specifically to each other. Certain combinations of transposable elements create "jumping genes" which may change their positions on the DNA during replication. Rearrangements may be accompanied by deletions or insertions. We know now what the major processes are which lead to mutations in prokaryotes-these are primitive organisms lacking the characteristic organized structure of a cellular nucleus, in particular viruses, bacteria and cyanobacteria. Much is known also about the replication process in the higher organisms, the eukaryotes. Yet, many details still wait to be discovered. Not surprisingly, molecular biology had a strong impact on the theory of evolution. We mention here two streams of research which contributed to a new field often referred to as "molecular evolution". The first is comparative studies of polynucleotide or polypeptide sequences in order to trace evolutionary changes in the genetic information through mutations. These studies are very popular and highly productive at present, because sequencing of DNA became almost routine and thousands of bases can be determined in very short times. One major goal of sequence comparisons is to reconstruct the history of present day organisms by measuring their relatedness. The classical method to construct phylogenetic trees is based on comparative morphological studies of organisms and hence suffers from the lack of a universal and precise quantitative measure of relatedness. Molecular evolution provides such a measure. It is the Hamming distance, D(j, k), of two polynucleotide sequences to be compared: D(j, k) is the number of positions in which the two properly aligned sequences Ij and I k differ. Another interesting detail was revealed by sequence studies as well. It turned out, quite unexpectedly, that selectively neutral mutations are common and play a non-negligible role in maintaining genetic variability in populations [5].

Comparison of sequences, the conventional static approach towards molecular evolution will be contrasted here by an alternative dynamical view of the evolutionary process. At the present state of the art this approach deals with primitive molecular systems which can evolve but which are not as complex as biological organisms in order to allow systematic kinetic investigations. The basic idea is to learn the mechanism of biological evolution from laboratory studies on replication of molecules under sufficiently simple and controllable environmental conditions. In essence, these studies demonstrated that the occurrence of Darwinian evolution is not bound to living organisms. Cell free assays of replicating molecules show all the characteristic properties predicted by Darwin's theory which we discussed above.

2. Autocatalysis and rate equations of replication Initiated by Manfred Eigen's pioneering study in 1971 [6] we developed a dynamical theory of molecular evolution which is based on the physical properties of biological macromolecules and the biochemical kinetics of replication [7-12]. Very detailed experimental investigations on replication of viral RNA [13-16] revealed that the initial assumptions of a simple overall replication kinetics used in the theory is well justified for most environmental conditions. 2.1. Replication and mutation RNA replication in the flow reactor (fig. 2) can be described properly by a series of single, first order autocatalytic reactions (A)+I k

fkQkk

,2Ik;

k = l , 2 .... ,n,

(1)

and the accompanying network of mutations fkQj~

(A) + Ik----~ Ij + Ik;

j , k = 1,2 ..... n.

(2)

Herein, we denote the individual polynucleotide

P. Schuster/Dynamics of molecular evolution INPUT SOLUTION: GTP,AT P.CTP.UTP ENZYME

OUTPUT SOLUTION: INSTANTANEOUS TANK CONTENT

(]

[.__ m

.......~

i--

~

i

103

tations, on the template I k. The frequencies at which correct copies and individual mutations are obtained are represented by the elements of the mutation matrix Q: Qyk is the frequency to obtain Ij as an error copy of I,. Accordingly, we have a conservation relation:

----~!- _ m

~ -

-

B

-

- -

-

-

_ -

~ _

l _ , .

Qj, = 1.

- m

.

1

_ _

-

- - i .

m

- -

-

~ ~

_

1

~ 1 .

m B

. - -

m

(3)

j=l

.

m

_ _

_

.

z - - i ,--] z _ j [--=

Every copy has to be either correct or erroneous. The diagonal elements of the mutation matrix, Qkk, were called the quality factors of the replication process [6, 7]. The case Qkk = 1 means ultimate accuracy of replication, or no mutations.

FLOW RATE : r = "OR-1 ]

Fig. 2. A continuously stirred tank reactor (CSTR). In this reactor which is known also as "chemostat" in microbiology, the material source, which makes up the open system, is provided by a continuous influx of a solution containing the material necessary for RNA replication: the four activated monomers, the nucleoside triphosphates GTP, ATP, CTP and U T P as well as the polymerizing enzyme. In the simplified model systems to be discussed here, we subsume all these compounds under a single input quantity A. Its input concentration will be denoted by a 0. The evolutionary constraint is provided by the continuous outflux of solution from the reactor. Replicating RNA molecules are injected into the reactor at t = to. Then, their concentrations may increase and eventually reach a stationary value, or the molecules may be diluted out of the reactor depending on the flow rate r which is commonly measured in terms of the reciprocal residence time (~-~) of the solution in the tank reactor.

sequences by I k. Accordingly, n such sequences form a set whose members are related by mutations (k = 1, 2 . . . . . n). The symbol (A) stands for the material required for RNA synthesis. In experimental systems this includes energy rich monomers, the nucleoside triphosphates GTP, ATP, CTP and UTP, in the proper stoichiometric amounts. Laboratory studies utilize also an RNA polymerizing enzyme, e.g. Qfl-replicase which is a virus specific RNA polymerase. The rate constants fk are a measure of the total rates of RNA synthesis, correct copies and mu-

2.2. The basis of structural variability It is worth considering the numbers of polynucleotides which are involved in such a replication-mutation system. Even if we restrict ourselves to point mutations-that means we consider exclusively polymers of the same chain length 7 - the numbers of possible sequences are "hyperastronomically" large: we have 4 r different sequences with 7 digits! In a typical experiment we are dealLn,g with 1012 polynucleotides or less. This number is approximately the number of oligomers of chain length 7 = 20. In nature we are dealing with much longer polynucleotide sequences: the smallest RNA molecules, the transfer RNA's are about 7 = 70 bases long. Tile genetic information of the smallest viruses amount to a few thousand bases. The D N A of bacteria i s s o m e million base pairs long, those of eukaryotic organisms even about 109 base pairs long. Any population size, "in vivo" or "in vitro", is by far smaller than the number of polynucleotide sequences. Thus, we observe a striking difference to ordinary chemical reaction networks which involve but a few molecular species each of which is present in a very large number of copies. The converse situation is the rule here: the numbers of different molecular species that may be interconverted through replication and mutation, exceeds

104

P. Schuster/Dynamics of molecular evolution

by far the total number of molecules present in any experiment, or even the number of molecules available on the Earth. Hence, the applicability of conventional kinetics to problems of molecular evolution is a subtle question which has to be considered carefully for each particular case. We cannot discuss this problem in full consequence here, but we refer to a forthcoming paper on molecular evolution described as a stochastic process [17] and treat only those aspects here for which the use of differential equations can be well justified. 2.3. Selection and constraints One basic concept of Darwin's theory is natural selection. In order to be able to provide a coherent picture of the evolutionary process we have to make the notion of selection more precise. Let us assume a population of several types of polynucleotides I k ( k = 1. . . . . n) at time t = 0 . We denote their concentrations by [Ik]----Ck. We say that the molecular species I,, has been selected if all other concentrations except c,,(t) vanish in the limit of infinite time:

lim c.,(t) 4=0 l'-* O0

and

lira Ck(t ) = 0 t"~ O0

for all k 4: m.

(4)

Selection occurs only in systems with some constraint concerning the total number of molecules. The basic effect of the constraint is to introduce finite lifetimes for the individual molecules. Constraints in nature are prox6ded by the finiteness of reservoirs and by degradation processes. The simplest degradation process we can think of is first order decay: dk

Ik~(B);

k = l , 2 .... , n .

(5)

Hydrolytic cleavage of RNA or DNA leads to energy poor monomers. These are the nucleoside monophosphates GMP, AMP, CMP and UMP or TMP, respectively. The symbol (B) stands here for

the stoichiometric combination of monomers as obtained by degradation of a given polynucleotide. One might well ask why we assume irreversibility of the reactions (1), (2) and (5). This question has been studied in some detail in previous papers (see e.g. [12, 18]). It turned out that irreversibility for practical purposes of these reaction steps is a necessary prerequisite of global selection. Only then the system is kept away from thermodynamical equilibrium for sufficiently long time. Moreover, we observe that the conditions of practical irreversibility of the formation and of the degradation processes are well fulfilled in all real biological systems: AG ( y o G T P + yAATP + YcCTP + YuUTP) >> AG(I + 3'PP) and zIG(I) >> AG(y c GMP + 3'A AMP + Yc CTP + ~'tJ UTP). We denote the stoichiometric coefficients by ~,6, YA, 7C and Yu with y = 7c + "/A + YC+ YU and use the symbol PP for pyrophosphate anion. Replication and degradation proceed under conditions of strongly negative AG. In a continuously stirred tank reactor (CSTR, see fig. 2) the constraint is provided by the flow which is commonly measured in terms of the reciprocal residence time of the solution in the tank: r = ~-~1. Then, th~ dilution flux introduces a constraint also in absence of degradation. The term in the kinetic equations is formally identical with that of first order decay. We just have: d 1 = d 2 . . . . dn=r. Another constraint called constant organization or constant population size is frequently used in molecular evolution and population genetics because it simplifies the mathematical analysis substantially. The total number of molecules, the population size, is kept constant by means of an unspecific dilution flux qfft). This flux is adjusted instantaneously to the rate of replication in order

P. Schuster/Dynamics of molecular evolution

to compensate for the excess production of the whole population [6, 8]. 2.4. Higher order autocatalysis Let us now consider examples of higher order autocatalysis. In these cases a polynucleotide does not only act as template in replication. In addition; it has some catalytic activity which is relevant for the replication process. Is it reasonable to assign such a catalytic activity to an RNA molecule? Some years ago the answer would have to be " n o " , since there was no evidence for catalytic activity of polynucleotides apart from ribosomal protein synthesis. Recent research, however, revealed such a catalytic activity of RNA also in natural RNA processing. In particular, specific autocatalytic action of some RNA molecules on its own production has been reported [19]. RNA catalysis in the specific cleavage of some precursor R N A molecules to yield the biologically active forms was found as well [20]. Catalytic activity, however, need not be exerted by the polynucleotide molecule itself. It may also be the result of the action of an intermediate. In the biochemistry of the cell, proteins are the usual carriers of catalytic activity. Proteins are synthesized from polynucleotides by template induced translation. Hence they are intermediates of the .type mentioned above. This type of indirect catalytic action of polynucleotides on polynucleotides via proteins represents the basic mechanism of interaction between genes. The biochemistry of the cell, indeed, can be visualized as an enormously complicated network of catalytic activities. Some of them are autocatalytic in higher order. In order to illustrate the effect of higher order autocatalysis we shall analyse the simplest case: second order autocatalysis without mutations, akj

(A)+Ik+Ij~2Ik+I

fi

k , j = l , 2 ....,n.

(6)

Here we have chosen I k to act as template and Ij to represent the catalyst for replication.

105

In the forthcoming sections we shall discuss the properties of several model systems which are based on the reactions (1), (2), (5) and (6). Two constraints will be applied: the CSTR and constant organization. 2.5. Rate equations for error-free replication Concentrations are denoted by lower case letters, [A] = a and [Ik] = Ck, as before. We shall normalize concentrations to unity wherever this is useful

Xk=Ck/~j cj;

~kX k = 1;

j , k = 1,2 .... , n .

In mass action kinetics the rate equations for reactions (1) and (5) or for reaction (6) under the conditions o f the CSTR (fig. 2) can be written in the form

6= aor-- a( Y',FkCk + r ),

(7a)

" k

Ok ----Ck(Fka -- r);

k = 1,2 . . . . . n,

(7b)

with r being the reciprocal residence time and a 0 the input concentration of compound A respectively. Under the constraint of constant organization we find for normalized concentrations

2k=Xk(Fk--cb);

k = 1 , 2 . . . . . n,

(8)

with ~ = EkXkFk. The replication kinetics is symbolized by the function Fk in both classes of differential equations. For first and second order autocatalysis We have by eqs. (1) and (5) or (6)

Fk=fk--dk----ek

or

Fk = ~ a k j x j , J

respectively. The differential equation for first order autocatalysis under constant organization can be solved easily since the individual equations are weakly coupled only. By straight-forward calculus we obtain

P. Schuster~Dynamicsof molecularevolution

106 QO

0_ P~:SELEC~ Z

I-<

Z tlJ cD Z 0 (..)

P0:EXTINCTION

I-n Z

FLOWRATE

)

l"

Fig. 3. The two asymptotically stable stationary states of first order autocatalytic replication in the CSTR. We observe only two states: Po, the state of extinction at which all RNA molecules are diluted out of the reactor and Pt, the state of selection at which the most efficiently replicating R N A molecule (I,,,) is selected. Stability of Po or P1 in the (r, ao)-plane is mutually exclusive. The outcome of replication in the CSTR is independent of the details of the initial conditions.

xk(t) = Xk(O)exp[ekt--fotE('r)d'r]; k = 1 , 2 ..... n.

(9)

Herein we define a mean excess production of the population by n

ft.(t) = ~, ekxk(t ). Clearly, the population converges asymptotically towards a homogeneous state in which the most efficiently replicating species is present exclusively:

lira xm(t ) I"-¢00

-- 1 ;

lira x k ( t ) = O t-cO0

for k4= m and em= max [ej; j =

1 ..... n].

(lo) We observe selection of the polynucleotide I m in this case. No analytical solutions were obtained so far for the differential equations of first order autocatalysis in the CSTR as described by (7a, b). It was possible, nevertheless, to perform complete

qualitative analysis of this dynamical system [12]. There are two stationary states. Only one of them is stable for a given pair of values of the external parameters (a0, r). As shown in fig. 3 we have either extinction of all polynucleotides, and then, the concentration of A approaches the input value a 0, or we have selection of I,, as under the constraint of constant organization. The transition between the two states occurs by a saddle-node bifurcation. The differential equations with second order autocatalysis under the constraint of constant organization (8) are more difficult to investigate. The general c a s e (Fk= Ejakjxj) is characterized by very rich dynamics including multiple stationary states, stable limit cycles and strange attractors. Transitions occur by a whole variety of bifurcations including saddle-node bifurcations, super- and subcritical Hopf-bifurcations, "blue sky" bifurcations and others. It was possible to perform complete qualitative analysis for some important special cases [21-25]. We shall come back to some of them in the next section. Second and higher order autocatalysis give rise to problems in mass action kinetics. Cubic o r higher order terms appear in the rate equations, if we take the concentration of A properly into account (see eq. (6)). These terms are highly improbable in elementary step reaction kinetics since they require simultaneous encounters of more than two molecules. Biochemical kinetics, however, is different in this respect. Very often mass action kinetics is inadequate to describe properly the dynamics of the many steps in enzyme catalysed reaction networks. Then it is advantageous to use simplifying assumptions and to apply a kind of over-all kinetics. One of the best estabfished approximations describes substrate binding together with the enzymic reaction. It is known as Michaelis-Menten kinetics. Allosteric enzymes are of particular importance because they can lead to co-operativity or self-enhancement which is equivalent to higher order autocatalysis within certain ranges of substrate concentrations. One well investigated example of this type is the glycolytic

107

P. Schuster/Dynamics of molecular evolution

chain. There are two allosteric enzymes giving rise to higher order autocatalysis: phosphofructokinase and pyruvate kinase [26].

eigenvalues (h~; i = 1. . . . . n) of the matrix W =

[w;;]: tl

n

Y k ( t ) = E u k i e x p ( ) ~ d ) E vqyj(O); i=1

2.6. The replication-mutation system As one example of a system with mutations we consider first order autocatalysis under the constraint of constant organization. In mass action kinetics and normalized concentrations the rate equations can be written as 5ck = X k ( f k Q k k -- dk -- ¢) + E f;G;X;; j*k k,j=l

.... ,n.

(11)

The unspecific dilution flux ~ in this equation can be expressed by the mean excess productivity of the whole population:

'=P.=EekXk=E(h-dAxk. k

k

In this case it is possible to remove the nonlinearity introduced by the flux term by means of a transformation of variables [27, 28]: Yk = Xk eX

r d~" .

02)

Then we are dealing with a linear differential equation Yk = ~ Wk_iYi, j=t

(13)

with w k j = f ; Q k j -- dk3k; being an element of the value matrix W. Accordingly, the diagonal elements of W are the selective values, e.g. Wkk = fkQkk -- dk is the selective value of the polynucleotide sequence I k. The linear differential equation can be analysed by standard techniques. Let us assume that the matrix W is diagonalizable. Then the solutions of eq. (13) can be expressed in terms of left-hand and right-hand eigenvectors (u i = [Uki; k = l . . . . . n] and vi=[Vik; k = l . . . . . n]) and

j=l

k = l . . . . . n.

(14)

In case we choose sufficiently small rate constants for the degradation reactions ( d k < f k Q k k ; k = 1 . . . . . n) the matrix W is a positive matrix which admits, by the theorem of Frobenius, an eigenvalue h 1 which is dominant in the sense that Xl > I~kl for all other eigenvalues h k of the matrix W. The eigenvalue 2~1 is simple and hence, standard linear theory shows that one has for long times t n

yk ( t ) = Ukl exp (hal) E °ljYj(O);

k=l,...,n.

j=l

In addition, the relative variables converge asymptotically to values which are independent of initial conditions:

k,j=l

..... n.

We called this stationary mutant distribution the "quasispecies" of the system in order to indicate the analogy to the notion of species in biology. The properties of the quasispecies are determined completely by the dominant eigenvector, ul, of the matrix W. This dominant eigenvector depends on the distribution of replication rate constants, fk (k = 1 . . . . . n),'as well as on the mutation matrix Q. We shall present one model for the Q matrix based on point mutations only [29] in section 4. It will be used to illustrate the error propagation problem. In the case of the replication-mutation system we do not observe selection in the ordinary sense. Instead, the stationary solution consists of a master sequence and its mutants. The master sequence (I,,) is the sequence which is present at the highest

P. Schuster~Dynamics of molecularevolution

108

concentration in the stationary distribution of polynucleotides. With the exception of some pathological cases with especially high mutation rates, the master sequence is the sequence with the maximum selective value:

I.,:

w.,.,=max[wn; j=l ..... n].

As we showed above, this stationary mutant distribution, called the quasispecies, is the dominant eigenvector of W. Hence, the concept of selection can be retained if we formulate the replicationmutation problem in an abstract space whose coordinate axes are spanned by the eigenvectors of the matrix W (see next section).

time, but it depends on time implicitly:

V=V[xt(t ) ..... x,,(t)]. It is easy to show that V is a non-decreasing function, since

dV v( °vld *

Selection in molecular evolution can be visualized as an example for optimization of rates of replication. In physics optimization is often described by dynamical systems whose trajectories follow the gradient of some potential function. Here we make an attempt to derive such a potential for molecular selection processes.

a [dxi]

=

{ a2v I

is symmetric and hence has no complex eigenvalues.

Other properties of gradient systems are more difficult to show, one important of them is the nonexistence of stable dissipative spatial structures in the corresponding reaction-diffusion equation

or in vector notation, x = (x 1. . . . . x,): dx d t = grad V. The potential function V is no explicit function of

;

k = l , . . . , n , (18)

Xj,~ k

3.1. Gradient systems

xj.k

(17)

= I OXi OXj ] x,..,., = aji

Ot =Dk~72Xk+ Optimization processes are particularly easy to visualize if they can be described in terms of "gradient systems". A gradient system is an ordinary differential equation of a special class which can be written in the form

(16)

Another property of gradient systems which can be proven straight away is the lack of oscillations in the variables Xk(t ). The Jacobian matrix A = [aq; i, j = 1 , . . . , n] with a'J=

3. Gradient systems, potential functions and selection

( °vl

dt = /_~~~-~k ] --jT-= ~ ~-~-~k] >0.

under no flux boundary conditions [30]. Herein, D k denotes the ,diffusion coefficient of the macromolecule I k. 3.2. Error-free replication and generalized gradients Let us first consider error-fre¢-replication with respect to selection and optimization of replication rates. In this case we have'Q~k = 8gk; the matrix W is diagonal (Wik= Wkk~ik=ek=fk--dk) and we are dealing with the differential equation

= x,(ek- 5); k = l .... ,n

and

ek=fk--dk"

(19)

109

P. Schuster~Dynamics of molecular evolution

Solution curves, Xk(t), of this equation were presented and discussed in the previous section. The system converges asymptotically to a homogeneous state in which the most efficiently replicating species (I,:) is selected. It is easy to verify that E is a non-decreasing function of time for time independent rate constants ek:

and hence

d/T

d t = Y'~xk(ek -- ~)2 > 0.

(20)

This equation is always fulfilled since the relative concentrations x k are positive or zero and rate constants are real by definition. The average excess production /T, however, is no potential function in the sense of eq. (15): the trajectories of eq. (19) do not intersect with the constant level sets of /~ at right angles. Using a mathematical technique originally introduced by Shahshahani [31] it was possible to derive a generalized gradient system of the form dx - ~ = Grad V

for error-free replication [12, 32]. This generalized gradient (Grad) is based on a Riemannian metric which is defined in the interior of the space of relative concentrations. More precisely, this is the interior of the unit simplex S,, (Int S,,: x k > 0; F.kxk = 1 for all k = 1. . . . . n). In order to point Out the difference between this Riemannian metric and the Euclidean metric which is the basis of the conventional gradient (grad) we compare the defintions of the two inner products: Euclidean metric: Shahshahani metric:

(x,

y) = ~ X k y k, k=l

[x, y]~

~ k-1

~1k X k Y k •

In the Shahshahani metric every component of the inner product is weighted by the corresponding coordinate of the point (z) at which the inner product is formed. This weighting distorts the direction of the gradient of the potential V ( x 1. . . . . x,,) = ff~= ~ ekx k k=l

(21)

just in the right way so that it coincides with the direction of the trajectories of eq. (19). Note, that the Shahshallani inner product is not defined on the boundary of S,, where at least one z k -- 0. Now we can visualize optimization of the mean rate of replication as a "hill climbing" process on a "landscape" which is given by an extremely simple, linear potential function (21) in this particular case. All results derived for gradient systems are valid for the first order autocatalytic system under the constraint of constant organization. Error-free replication in the CSTR can be transformed into a generalized gradient system too. In this case the form of the potential function is somewhat more complicated. It contains quadratic terms in the concentrations of polynucleotides as well. The basic formalism, however, is the same as that described above and therefore we despense from the details which can be found in a recent paper [12]. 3.3. Replication with errors Next we consider replication with errors under the constraint of constant organization. For sufficiently accurate replication the system approaches a stationary mutant distribution and we do not observe selection in the strict sense. We may, nevertheless, perform a linear transformation of variables and choose the eigenvectors of the matrix W as the new basis of the coordinate system: x(t)=[xt(t =

) . . . . . x~(t)] Xk(t)xk=

k--1

~_, Uk(t)Uk. k-1

(22)

110

P. Schuster~ Dynamics o f molecular evolution

Herein the vectors x k a r e the unit eigenvectors of the Cartesian coordinate system and u k the righthand eigenvectors of the matrix W. The variables u k ( t ) can be derived easily from the expressions given in the previous section:

u (t) = u (0) exp (x t) = exp

(Xkt) ~ O k j y j ( O ).

(23)

j--1

Formally the transformed differential equation is of the same general type as (8) with F k = h k a n d

,I, = E: /~k = U k ( X k - - if:);

k=l

..... n

and the same is true for the solutions

These are positions in DNA or RNA at which one observes extraordinarily high mutation rates. In general, the frequencies of the reverse mutations at these positions are much lower. As a consequence of (1) a n d / o r (2) the average excess production is no longer a non-decreasing function of time. There is no straight forward optimization of the mean rate of replication for replication with errors. There is, nevertheless, a uniquely defined asymptotically stable stationary state, the quasispecies, which is defined by the dominant eigenvector of the value matrix W. In the new coordinates this is the state: u~ = 1, u k = 0 for all k = 2 . . . . . n. It is illustrative to split the concentration space around this asymptotically stable point into orthants according to the signs of the variables u k. Since the differential equation (8) is invariant on the boundaries of the orthants, U k = O -"~ il k = O -"l' ~tk = O . . . .

k=l,...,n and the time dependence of the average excess production

dff~

dt =

EUk(Xk

-

~)2.

In detail, there are, however, two substantial differences between the two kinetic equations and their solutions: 1) The variables x k were relative concentrations and hence are non-negative by definition: x k > O. No such relation holds for the variables u k. 2) The coefficients e k were obtained from rate constants and mutation frequencies and hence represent real quantities. The h k ' S a r e the eigenvalues of a non-symmetric matrix and need not be real (see e.g. [33]). Condition (1) is a much more severe restriction than (2) is. In most of the relevant cases the matrix W has exclusively real eigenvalues (see e.g. the example discussed in section 4). Counter examples may be presented by systems with highly unsymmetric mutation frequencies (Qik --b Qki) like polynucleotide sequences with so called " h o t spots".

trajectories cannot cross these boundaries. The system, thus, will always stay inside the orthant to which it was assigned by the choice of initial conditions. There is one orthant, namely the positive orthant ( u k > 0; k = 1. . . . . n) in which ffS(t) is a nondecreasing function (for real eigenvalues Xk), and one orthant, the orthant (u 1 > 1, u k < 0; k = 2 . . . . . n) in which E(t) is nonincreasing (for real eigenvalues kk). For these two orthants potential functions can be derived easily. The differential equation then corresponds to a generalized gradient system. In the remaining 2 n- x _ 2 orthants may decrease or increase. The fact that the average excess production may decrease during a selection process can be illustrated by means of a simple example. Consider a homogeneous population consisting exclusively of the master sequence at time t = 0 (x,,(0) = 1). Clearly the excess production is largest since the master sequence is characterized by the maximum selective value (W,,m). During selection mutants are formed which have smaller selective values ( W k k < W,,,, for all k#: m) and finally when the system approaches the stationary distribution, the

P. Schuster/Dynamics of molecular evolution

quasispecies, the average excess production reaches the stationary value from above, i.e. from higher efficiency of replication. The existence of complex eigenvalues of the matrix W, although it may be an exceptional case, has another consequence: it indicates the occurrence of transient oscillations in the concentrations and rules out the existence of a potential function. In the limit of long times the system, nevertheless, converges towards the dominant eigenvector. The corresponding eigenvalue is real and positive and hence, all oscillations in the concentrations have to fade out inevitably. Jones [34] derived a function

g(t) = ~ pkRehk,

(24)

kffil

with Pk(t) = [uk(t)l/Ejluj(t)l and ReX k being the real part of an eventually complex pair of eigenvalues. It is easy to visualize that

d~

-47 =

~1~ pk(ReXk-- i)2> 0

(25)

k=l

holds in full generality. The results obtained here for the constraint of constant organization can be extended easily to the corresponding reph'cation-mutation system in the CSTR [12]. 3.4. Potential functions for second

order autocatalysis Let us now consider autocatalysis of second order. Again we shall restrict ourselves to the constraint of constant organization. Very similar results were obtained for second order autocatalysis in the CSTR [12]. Karl Sigmund derived necessary and sufficient conditions for the existence of a potential function corresponding to a Shahshahani gradient for the differential equation (8) with F k = Eakjx j [32]. The matrix of rate constants A = [akfi k, j - - 1..... n] has to fulfill the following condition: aij + ajk + aki =

aik + a k j

111

This relation includes two special cases of particular importance for which potential functions exist: 1) The multidimensional generalization of the SchliSgl model [34] which is characterized by a matrix A of diagonal form, ajk = akkSjk, and 2) Fisher's selection equation of population genetics in which the matrix A is symmetric, ajk = akj.

Fisher's selection equation describes the spreading of n genes, more precisely n "alleles" (A 1, A2,..., A,) at a single locus. The diploid genotypes are either homozygous (e.g. AkAk) or heterozygous (e.g. AkAj). The rate coefficients akk or aki are a measure of the fitness of the phenotypes corresponding to AkA k or AkAj, respectively. Clearly, the fitness of AkA j is identical to that of AjA k - i t does not matter on which arm of the chromosome A k or Aj sits. Consequently, the matrix A is always symmetric. The multidimensional SchlSgl model actually can be visualized as a pathological case of Fisher's selection equation, in particular as the case of vanishing fitness of all heterozygotes. Let us now consider the potential function

V(x x..... x.) = ¢k/2 =

a,yxkx

xk

k=l

1

(27) which applies to both cases. We use it here as an example to illustrate the usefulness of potential functions. We calculate first and second derivatives of V(x x..... x,,):

(OV) [lj~=x ]/'j~=x ~X k = "~ ( ak.i + ajk)X j - V xj,

(28)

02V 1 1

OV

OV-~ I ,, (29)

+ a.ii;

i,j,k=l,2

.... ,n.

(26)

Extremum

values of the potential

function

112

P. Schuster/Dynamics of molecular evolution

V(x 1. . . . , x , ) are characterized by vanishing first derivatives. In our example (27) we are dealing with one extremum in the general case, but we may also have no solution or infinitely many solutions of the corresponding equations ( OV/Ox k) -- 0 (k = 1 . . . . , n) within the physically accessible range of relative concentrations, the unit simplex S,, (0 ~ x k < 1 for all k -- 1. . . . . n and EkXk = 1). We consider the multidimensional SchlSgl model first. In this case the unique extremum V(£ 1. . . . . 2 , ) - - V a t lies always inside the unit simplex. The coordinates of the extremum are given by a~~

k, j = 1,2 . . . . . n.

(30)

The Hessian matrix H = [hkj= (02V/OXkOXj)] is diagonal at the extremum. The elements hk, = akk are identicalto the eigenvalues.They are allstrictly positive and hence, the cxtremum of V is a minimulIl,

The only equilibrium point in the interiorof S,, is unstable and all trajectoriesconverge asymptotiCally towards the boundaries. At the boundary of the unit simplex S. at least one of the n variables x , is zero. Consequently. the Shahshahani inner product as well as the generalized gradient are not defined there. By means of a slight modification, however, we can extend our approach easilyto the boundaries, or more preciselyto every subspace of S, with some x k = 0. W e just have to eliminate those variableswhich are zero. Consider for example the boundary where the variable xj = 0. There we define the Shahshahani inner product "

[x,y],=

E

1

~xkYk,

k - l , kC,j

and repeat the whole procedure described above. Again we have a:unique extremum of V which is a minimum. All trajectories converge to the .boundaries of tiffs n - 1-dimensional subspace :which ,is properly visualized as a .unit simplex :S,,_i. Clearly,w e have n such n - l-dimensional

subspaces defined by xj = O, j = 1,..., n. Depending on initialconditions x1(0).....x,(0) each of the variables x k may approach zero first. The property of V on the subspaces of S, holds in fullgenerality:in the interiorof every subspace of S,, with two and more dimensions we have one and only one equilibrium point. It is unstable. Only the homogeneous states(x k = I, xj = O, j = 1 ..... n, j ~ k with k=l,...,n) arc asymptotically stable. These states are the comers of S,,. Apart from special cases with probability of measure zero all trajectoriesstartingin the interiorof S,,converge asymptoticallyto one of the corners of the simplex at which one of the species initially present is selected. Hence, we observe a case of multiple stationary states in the multidimensional SchlSgl model. The outcome of the selectionprocess is not unique: it depends not only on the rate constants akk but also on the initialconditions. Every homogeneous state has its own basin of attraction.The sizesof these basins arc dctcrrnincd by the rate constants akk. From the coordinates of the unstable equilibrium point on S,, follows that the species with the largest rate constant, a m , ,= m a x [ a k , , k = 1 .....n], has the largest basin of attraction, etc. In fig. 4 we show a characteristic example in three dimensions. N o w let us consider Fisher's selectionequation. The internal equilibrium point corresponding to an extremum of V is the solution of the linear equation / I|

t!

~-'~akj.~y----~xtwith r , ~ j = l ; j-1

k = 1 . . . . . n.

j-1

(31) The extremum may lie on the simplex S,, or outside. From eq. (29) we derive that the extremum may be either a minimum or a maximum. Accordingly, the dynamics of this system is much more complicated. For further details of the properties of Fisher's selection equation see e.g. [2]. Both eases, flae multidimensional SchlOgl model and Fisher's selection equation share, nevertheless, one important property: they can be transformed

P. Schuster/Dynamics of molecular evolution

3

2

Fig. 4. The multidimensional Schl~Sgl model as an example of second order autocatalysis under the constraint of constant organization. For the purpose of illustration we chose n = 3, and a H = 3, a2," = 2 and a33 = 1. Unstable stationary states on the unit simplex $3 are denoted by open circles. The three asymptotically stable stationary states are the three comers: 1 (,x l = 1), 2 ( x , = 1) and 3 (x 3 = 1). Note that the basins of attraction of the three states are determined by the rate constants a~.k: the basin of 1 is largest followed by that of 2, and that of 3 which is smallest. The outcome of the selection process depends on the details of the choice of initial conditions, in particular in which basin the system started at t = t o.

into generalized gradient systems and hence, we are dealing with a process optimizing the potential function (27). In contrast to first order autocatalysis the optimum of the potential function is not unique on the unit simplex. There may be several local optima and the outcome of selection depends on initial conditions.

3.5. General second order autocatalysis Let us now come back to the general case of second order autocatalysis under the constraint of constant organization. The rate coefficients do not meet condition (26). It is worth noticing that the corresponding kinetic equation-eq. (8) with Fk = Eja~.j.- x j - k n o w n as "replicator equation" [35] is closely related to the generalized Lotka-Volterra equation, which is of particular importance in theoretical ecology. Josef Hofbauer has shown that

113

a replicator equation of dimension n can be converted into an n - 1-dimensional Lotka-Volterra equation by a nonlinear transformation [36]. The trajectories of both systems are equivalent up to a change in the time axis. Replicator equations with general rate coefficients akj Can lead to very complicated dynamical phenomena. In particular, Hopf-bifurcations occur for n > 3 and stable limit cycles were observed. The solutions of systems with n > 4 show series of bifurcations with period doubting and chaotic dynamics within certain ranges of the rate constants. The reaction-diffusion equations corresponding to replication with general second order autocatalysis show Turing instabilities for certain parameter values. Numerical integration demonstrates the occurrence of stable dissipative structures which play an important role in models of morphogenesis and biological pattern formation [37]. In general, there is no optimization of the mean rate of replication, q,(t)=q~[xl(t) ..... x,(t)], in dynamical systems with second order autocatalysis. In some cases q~(t) approaches an asymptotic value for long times which is below the maximum. Other examples are characterized by oscillating q~(t): no asymptotic value of the mean rate of replication exists. There is another important feature which distinguishes first and higher order autocatalysis. Systems which replicate with first order autocatalytic rate laws are very insensitive to external constraints. They have unique stable stationary states which they approach, no matter what the detailed time dependences of the input parameters are [38]. Systems with. second order autocatalytic terms, however, can be very sensitive to changes in the time dependence of the constraints. There are also fairly simple special cases of the replicator equation the coefficients of which do not fulfill (26) and hence, may give rise to complex dynamical phenomena and dissipative structures. Consider, for example, the hypercycle equation whose rate coefficients fulfill the relation aky= bkSk.j+l;

j, k m o d n ,

P. S c h u s t e r / D y n a m i c s of molecular evolution

114 or

JC ~- X k

matrix Q, into the contributions of single digits:

( b,xj- ep)" j=k-lmodn

~(k)~(k)~(k) Q k k ~ ffl if2 if3

and k = l .... ,n,

...

q~k).

(33)

(32)

with ~ = ~,kbkXjXk . This equation has been studied in great detail [8, 21, 23-25] and its dynamics is well understood. There are no asymptotically stable stationary states for hypercycle equations of dimension n >_ 5. Instead, the systems approach asymptotically stable limit cycles and all concentrations oscillate with wave like patterns [39]. Another interesting property of the hypereycle equation concerns the long time behavior of the solutions: no concentration approaches zero and hence no member of the hypercycle will disappear. This property is quite the opposite of selection and we called it "co-operation" [8] or "permanence"

[401. Reaction-diffusion equations with reaction terms closely related to the hypercycle equation were studied in the context of morphogenesis and biological pattern formation [37]. They yield dissipative structures in systems with suitable spatial boundaries and under the necessary chemical conditions to keep the system far away from thermodynamic equilibrium.

Herein q~k) is the accuracy of the incorporation of the first base into the newly synthesized polynucleotide sequence, q~k) that of the incorporation of the second base, etc. In analogy to the definition of quality factors (Qkk-) the limit q = 1 means no errors or ultimate precision of replication. In the model which we present here, we assume uniform single digit accuracies: ql = q2

.

.

.

.

.

qr = q-

This assumption implies also that the accuracy of replication does not depend on the base sequence of the polynucleotide. Then, the quality factors are of the simple form Qkk = qV. In addition we restrict the model to two "binary sequences", these are sequences built from two digits (0,1) only. Then, the frequency of the incorporation of the wrong digit is simply 1 - q. Offdiagonal elements of the mutation matrix can be calculated now directly from the Hamming distance of the two sequences to be interconverted by mutation, Qjk = qV- O(i'k)( 1 -- q) o(j,k)

4. Propagation of replication errors and error threshold

The accuracy of replication sets a limit to the amount of genetic information which can be transferred stably from generation to generation. We shall consider here first order autocatalysis under the constraint of constant organization. In order to explore the accuracy limit of replication we shall use a model for the Q matrix [29], the mutation matrix which we have discussed in section 2. This model is restricted to point mutations. Hence all polynucleotide sequences have the same chain length 3'. We may factorize the quality factors of replication, the diagonal elements of the mutation

(34)

(35)

Note, that the mutation matrix Q is symmetric in this case. Then, the off-diagonal elements of the value matrix W~can be written as wjk = f k q r - ocj, k)(1 _ q) O(j, k),

(36)

and hence, W is not symmetric. Nevertheless, David Rumschitzlcy [41] presented a proof that the value matrix W has exclusively real eigenvalues within this model. Let us now consider the dominant eigenvector of W as a function of q. As we showed in the previous section it represents the stationary mutant distribution or the quasispecies of the replication-mutation system. Binary sequences are par-

P. Schuster~Dynamics of molecular evolution ticularly attractive as a model for replication because we have two limiting cases at high and low single digit accuracies which allow a physically meaningful interpretation. Highly accurate replication, q = 1, converges towards error-free replication in the limit q ~ 1. Let us now consider the limiting case q ~ 0. Then, every digit of the template is copied as complementary digit. A polynucleotide sequence, thus, yields the complementary sequence on replication and we have precise complementary replication for q = 0. Direct and

1.0

1.0

J



0.5 ~i

i 0.5-

L 0 1.0 c

0.5 q ~

0

Fig. 5. The quasispecies as a function of the single digit accuracy of repfication (q) for a chain length 3' = 5. We plot the relative concentration of the master sequence (Y0), the sum of the relative concentrations of all one error mutants (Yt), of all two error m u t a n t s (Y2), etc. Note, that we have only one five error m u t a n t Its ) in this example. We observe selection of the master sequence at q = 1. Then, the relative concentration of the master sequence decreases with decreasing q. At the value q = 0.5 all sequences are present in equal concentrations. Hence, the s u m s of the concentrations of two and three error mutants are largest (they have a statistical weight of 10), those of the one and four error mutants are half as large (they have a statistical weight of 5) and finally the master sequence and its complementary sequence, the five error mutant, are present in a relative concentration of 1 / 3 2 only. At q = 0 we have selection of a " m a s t e r pair" which consists of I 0 and I s in our example. Thus, we have direct replication with errors in the range 1 > q > 0.5 and complementary replication with errors in the range 0 < q < 0.5. The rate constants were chosen f0 = 10 for the master sequence, and/'~. = 1 (k 4= 0) for all mutants.

115

complementary replication appear as the two error-flee limits of the same model. lim q --* 1"

Qkj = ~kj; Ik --'>2Ik,

limq-*O:

Qkj=t~kt+)kt-);

k = 1 .... , n,

I (++ ) ~ I) ( - ) +'kI ( kk

I(-) ~ I"k(+) + I(k-), k

k --- 1 . . . . . n / 2 .

Herein we use the notation I~÷) and I~-) for two complementary sequences. Clearly, we have n / 2 such pairs. A third special case is of particular interest for the following analysis. At the accuracy q = 0.5 the incorporation of correct and wrong digits occurs with the same probability. Replication errors are so frequent that the selective values Wkk have no influence on the stationary mutant distributions. Then, inheritance breaks down and all polynucleotides are synthesized with equal probabilities. At the stationary state the sequences are present according to their statistical weights. The quasispecies degenerates to a uniform sequence distribution and the concept of a master sequence becomes obscure. In order to point at the lack of sequence correlations between template and copy we called this process "random replication" [29]. We shall discuss the implications of random replication for the applicability of differential equations to describe low accuracy replication later on. Now we consider the results of numerical computations which are shown graphically in figs. 5, 6 and 7. The first special case represents the quasispecies in an ensemble of oligonucleotides of chain length "t = 5 (fig. 5). At high values of q we observe selection of a master sequence (Io). At low q-values we have the "master pair" of complementary replication which corresponds to the master sequence of direct replication. In our particular example this is the pair I0,I s (we denote here the unique five error mutant of I 0, which is its complementary sequence, by 15). Random replication in the strict sense occurs only at q--0.5, but highly ambiguous replication dominates the whole fiat range between 0.4 < q < 0.7.

P. Schuster/Dynamics of molecular evolution

116

1.0

1.0 qrnin

qmox Io

1o

1 I

03 Yi

Yi 0.5 t

/

1.0

015

-

0

q Fig. 6. The quasispecies as a function of the single digit accuracy of replication (q) for a chain length -y = 10. The computations were performed in complete analogy to those shown in fig. 5. Note, that the range of " r a n d o m replication" has increased substantially compared to the case "t = 5. We observe fairly sharp transitions between direct and random replication at the critical value q ~ qrain and between random and complementary replication at q = qmax'

Increasing chain length changes the general features of the plots of the quasispecies against the single digit accuracy q rather drastically. For 3' = 10 the range of random replication has widened substantially (fig. 6). The curves are horizontal in a wide range extending on both sides of the maximum irregularity condition at q = 0.5. In addition, the transitions from direct to random replication and from random to complementary replication are sharp. Lower and upper bounds for the range of random replication can be calculated from second order perturbation theory [8, 29, 42]. They coincide very well with the transition points determined numerically. We have a minimum single digit accuracy, q = q~a~, which is the critical accuracy separating direct and random replication, and we have a maximum single digit accuracy, q = qm~x, which represents the bound between random and complementary replication. The transitions become exceedingly sharp for still longer chains. In the calculations of the curves

shown in fig. 6 we used 3' = 50. The logarithmic plot gives an even better insight into the sharpness of the transition. It suggests the conjecture that we are dealing here with an analogue to higher order phase transitions in the limit 3' --+ ~ . Although we used a rather special model system for the demonstration of the existence of a sharply defined minimum accuracy of replication the resuits are valid in great generality. This has been shown, for example, by application to real systems as they are discussed in [6, 7]. Let us now consider the transition from direct to random replication in more detail. The critical accuracy of replication (q = qmin) defines an error threshold below which the concepts of quasispecies and master sequence become obscure. This error threshold of replication was casted into quantitative terms by means of straight-forward analysis [6, 7, 29]. We dispense here from a presentation of the formulas. We just mention that the minimum accuracy of replication leads to a maximum chain length for given, constant single digit accuracies q. The corresponding quantitative relations turned out to be very useful in the prediction of maximum chain lengths for real systems [7, 10]. There is, however, a fundamental conceptual problem with the deterministic error threshold: below error threshold the deterministic model predicts a uniform stationary distribution of sequences. Such a distribution can never be realized in any population, neither in nature nor in the laboratory. As ,we pointed in section 2.2 we are always dealing with many orders of magnitude more possible polynucleotide sequences than we have molecules in the population. A uniform distribution of polynucleotide sequences, thus, is far away from reality. A stochastic analysis of the replication-mutation system by means of multitype branching processes [43], however, allows a physically meaningful interpretation of the error propagation problem in finite populations. Instead of characterizing the details of the stationary mutant distribution in the replication-mutation system we ask for the probability of survival of the master

P. Schuster/Dynamics of molecular evolution

117

1.01

~in

T

~io.s

~-I(2s) I(23),~-I(27)

0 1.00

I(22),~'I(2a) ~-I(21X:=:I~a) , ~I(2o~')=Ibo) 0.90

o'.gs q~

,

0

0,97 I

q J

0.96 I

I,

0,95 I

0% I

-5"

T J

tog Yi -10~- I(1),~I(~9) --1o

-15'

~Iso q min

-20

b Fig, 7, (a) The quasispecies as a function of the single digit accuracy of replication (q) for a chain length y = 50. The computations were performed in complete analogy to those shown in fig. 5. The transition is very sharp at this chain length already, (b) In order to demonstrate the sharpness we show in addition a logarithmic plot of the relative concentrations around the critical point q = qmin"

118

P. Sdmster / Dynamics of molecular evolution

sequence (Ira) to infinite time. A non-zero probability of survival to infinite time is the stochastic equivalent to a stable quasispecies. The stochastic treatment shows that zero probability of survival of the master sequence implies zero probability of survival for all sequences. Below error threshold we are thus dealing with a steadily changing ensemble of polynucleotide sequences. Interestingly, the quantitative expressions for this stochastic error threshold become identical with those obtained from the analysis of differential equations provided both are derived from the same rate constants in the value matrix IV. This interpretation of the error threshold by means of stochastic processes shows some analogy to the "localization threshold" of mutant distributions which has been recently derived by John McCaskill [44]. The error threshold is of direct relevance for adaptation in evolution. Fast adaptation to a changing environment requires sufficiently high rates of mutation. Consider a system which is very precise in replication. It forms few mutants only and adaptation is a slow process. Soon, such a system will be driven out by its more flexible competitors. Too many replication errors, on the other hand, bring the system below error threshold, inheritance breaks down and deterioration is inevitable. Efficient evolution thus requires a compromise between the two extremes, an error frequency which is adjusted to the requirements dictatedby the environment. Empirical data from viruses and bacteria suggest [7] that at least these organisms operate close to the error threshold.

5. Concluding remarks In this contribution we made an attempt to analyse the dynamics of molecular selection processes. These processes occur typically on a time scale of thousands of generations. In molecular systems a generation corresponds roughly to one complete replication. Selection dynamics takes place on just one of the many hierarchical time scales of processes relevant for biological evolu-

tion. We shall mention here three levels of processes which are superimposed in biological evolution: elementary step kinetics of replication, selection dynamics in populations and evolutionary adaptation to a changing environment. Actually, there are many more such levels, particularly as far as long time scales are concerned. The fastest processes which are of interest for our purpose are the elementary steps of replication kinetics. A polynucleotide of a chain length of ~/= 1000 bases is synthesized in many thousands of elementary steps. For example, the mechanism of RNA replication by a virus specific enzyme is explored and discussed in [3]. Despite the complex dynamics of the polymerization process the overall kinetics of replication is fairly simple. It follows the rate law of first order autocatalysis provided the activated monomers (GTP, ATP, CTP arid UTP) as well as the polymerizing enzyme are available in excess. Other environmental conditions may lead to different rate laws. Apart from certain narrow regions of transition between different overall mechanisms, these rate laws are rather simple too. Accordingly, it is well justified and useful to visualize elementary step kinetics of replication and selection dynamics as two hierarchically separated processes which take place on different time scales. Natural selection is a typical process of the second time scale. It takes many, eventually thousands of generations to approach a stationary distribution of mutants in a population. Another process on this level is for example random drift caused by the spreading of selectively neutral mutants in populations [5]. Evolutionary adaptation is the slowest process to be considered here. The evolving system accounts for changes in the environment by changing the distribution of mutants and by the emergence of new mutants which were not present in the original population. The rate at which such rare mutants of higher selective value appear determines the pace of adaptation. The empirical data available at present confirm the suggestion that the time scales of selection dynamics and

P. Schuster/Dynamics of molecular evolution

a d a p t a t i o n are well separated. S t a t i o n a r y p o l y n u c l e o t i d e d i s t r i b u t i o n s of quasispecies type are c e r t a i n l y a t t a i n e d in test tube experiments on R N A e v o l u t i o n [16]. It seems that n a t u r a l p o p u l a t i o n s of viruses a n d b a c t e r i a reach i n t e r m e d i a t e stages of " q u a s i s t a t i o n a r i t y " as well. In higher organisms the p r o b l e m is m u c h m o r e involved: the question w h e t h e r selection equilibria are attained or not c a n n o t b e discussed without a detailed analysis of the n o t i o n o f biological species which goes far b e y o n d the scope of this contribution.

Acknowledgements F i n a n c i a l s u p p o r t of the w o r k r e p o r t e d here has b e e n p r o v i d e d b y the A u s t r i a n " F o n d s zur F S r d e r u n g d e r wissenschaftlichen F o r s c h u n g " ( p r o j e c t no. 5286), b y the " S t i f t u n g Volkswagenw e r k " , B R D a n d the " H o c h s c h u l j u b i l a e u m s s t i f t u n g d e r S t a d t W i e n " . T h e p r e p a r a t i o n of comp u t e r p l o t s a n d drawings b y Dr. JSrg Swetina a n d Mr. J o h a n n KiSnig is gratefully acknowledged.

References [1] G. de Beer, Mendel, Darwin and Fisher, in: Notes and Records of the Royal Society of London 19 (1964) 192. [2] W.J. Ewens, Mathematical Population Genetics (Springer, Berlin, 1979). [3] C.K. Biebricher, M. Eigen and W.C. Gardiner Jr., Biochemistry 22 (1983) 2544. [4] A. Kornberg, DNA Replication (Freeman, San Francisco, 1980). [5] M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge U.K., 1983). [6] M. Eigen, Naturwissenschaften 58 (1971) 465. [7] M. Eigen and P. Schuster, Naturwissenschaften 64 (1977) 541. [8] M. Eigen and P. Schuster, Naturwissenschaften 65 (1978) 7. [9] M. Eigen and P. Schuster, Naturwissenschaften 65 (1978) 341. [10] M. Eigen and P. Schuster, J. Mol. Evol. 19 (1982) 47. [11] M. Eigen, Ber. Bunsenges. Phys. Chem. 89 (1985) 658. [12] P. Schuster and K. Sigmund, Ber. Bunsenges. Phys. Chem. 89 (1985) 668. [13] C.K. Biebricher, M. Eigen and R. Luce, J. Mol. Biol. 148 (1981) 369.

119

[14] C.K. Biebricher, M. FAgenand R. Luce, J. Mol. Biol. 148 (1981) 391. [15] C.K. Biebricher, S. Diekmann and R. Lute, J. Mol. Biol. 154 (1982) 629. [16] C.K. Biebricher, Evolutionary Biology 16 (1983) 1. [17] M. Eigen, J.S. McCaskill and P. Schuster, A Stochastic Model of Molecular Evolution, preprint 1985. [18] J. Hofbauer and P. Schuster, Dynamics of Linear and Nonlinear Autocatalysis and Competition, in: Stochastic Phenomena and Chaotic Behaviour in Complex Systems, P. Schuster, ed. (Springer, Berlin, 1984) p. 160. [19] K. Kruger, P.J. Grabowski, A.J. Zaug, J. Sands, D.E. Gottschling and T.R. Ceeh, Cell 31 (1982) 147. [20] C. Guerrier-Takada, K. Gardiner, T. Marsh, N. Pace and S. Altmann, Cell 35 (1983) 849. [21] P. Schuster, K. Sigmund and R. Wolff, Bull. Math. Biol. 40 (1978) 743. [22] J. Hofbauer, P. Schuster, K. Sigmund and R. Wolff, SIAM J. Appl. Math. 38 (1980) 282. [23] P. Schuster, K. Sigmund and R. Wolff, J. Math. Anal. Appl. 78 (1980) 88. [24] J. Hofbauer, P. Schuster and K. Sigmund, J. Math. Biol. 11 (1981) 155. [25] P.E. Phillipson and P. Schuster, J. Chem. Phys. 79 (1983) 3807. [26] B. Hess and M. Markus, Ber. Bunsenges. Phys. Chem. 89 (1985) 642. [27] C.J. Thompson and J.L. McBride, Math. Biosc. 21 (1974) 127. [28] B.L. Jones, R.H. Enns and S.S. Rangnekar, Bull. Math. Biol. 38 (1976) 15. [29] J. Swetina and P. Schuster, Biophys. Chem. 16 (1982) 329. [30] R.G. Casten and C.J. Holland, J. Diff. Equ. (1978) 266. [31] S. Shahshahaxti, Memoirs Am. Math. Soc. 211 (1979). [32] K. Sigmund, The maximum principle for replicator equations,, in: Lotka-Volterra Approach to Dynamical Systems, M. Pesehel, ed. (Akademie Verlag, Berlin, 1985). [33] R. Feistel and W. Ebeling, Studia Biophysica 71 (1978) 139. [34] B.L. Jones, J. Math. Biol. 6 (1978) 169. [35] P. Schuster and K. Sigmund, J. Theor. Biol. 100 (1983) 533. [36] J. Ho.fbauer, Nonlinear Anal. 5 (1981) 1003. [37] H. Meinhardt, Ber. Bunsenges. Phys. Chem. 89 (1985) 691. [38] P. Schuster a.nd K. Sigmund, Selection and Evolution in General Open Systems, preprint 1985. [39] P.E. Phillipson, P. Schuster and F. Kemler, Bull. Math. Biol. 46 (1984) 339. [40] K. Sigmund and P. Schuster, Permanence and uninvadability for deterministic population models, in: Stochastic Phenomena and Chaotic Behavior in Complex Systems, P. Schuster, ed., (Springer, Berlin, 1984). p. 173. [41] D. Rumsehitzky, preprint 1985. [42] M. Eigen, J.S. McCaskill and P. Schuster, the Quasispecies Model in Molecular Evolution, preprint 1985. [43] L. Demetrius, P. Schuster and K. Sigmund, Bull. Math. Biol. 47 (1985) 239. [44] J.S. McCaskill, J. Chem. Phys. 80 (1984) 5194.