Evolutionary dynamics of a polymorphic self-replicator population with a finite population size and hyper mutation rate

Evolutionary dynamics of a polymorphic self-replicator population with a finite population size and hyper mutation rate

Journal of Theoretical Biology 382 (2015) 298–308 Contents lists available at ScienceDirect Journal of Theoretical Biology journal homepage: www.els...

855KB Sizes 0 Downloads 31 Views

Journal of Theoretical Biology 382 (2015) 298–308

Contents lists available at ScienceDirect

Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/yjtbi

Evolutionary dynamics of a polymorphic self-replicator population with a finite population size and hyper mutation rate Takuyo Aita a, Tetsuya Yomo a,b,c,n a

Exploratory Research for Advanced Technology, Japan Science and Technology Agency, Yamadaoka 1-5, Suita, Osaka, Japan Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Yamadaoka 1-5, Suita, Osaka, Japan c Graduate School of Frontier Biosciences, Osaka University, Yamadaoka 1-5, Suita, Osaka, Japan b

H I G H L I G H T S

    

We analyzed the evolutionary dynamics of an asexual self-replicator population. The Kauffman's NK model was used as a model of the fitness landscape. The dynamics was considered based on the local fitness distribution on the landscape. The first three cumulants of fitnesses in the evolving population were formulated. We examined how the dynamics depends on several parameters.

art ic l e i nf o

a b s t r a c t

Article history: Received 20 March 2015 Received in revised form 3 July 2015 Accepted 10 July 2015 Available online 21 July 2015

Self-replicating biomolecules, subject to experimental evolution, exhibit hyper mutation rates where the genotypes of most offspring have at least a one point mutation. Thus, we formulated the evolutionary dynamics of an asexual self-replicator population with a finite population size and hyper mutation rate, based on the probability density of fitnesses (fitness distribution) for the evolving population. As a case study, we used a Kauffman's “NK fitness landscape”. We deduced recurrence relations for the first three cumulants of the fitness distribution and compared them with the results of computer simulations. We found that the evolutionary dynamics is classified in terms of two modes of selection: the “radical mode” and the “gentle mode”. In the radical mode, only a small number of genotypes with the highest or near highest fitness values can leave offspring. In the gentle mode, genotypes with moderate fitness values can leave offspring. We clarified how the evolutionary equilibrium and climbing rate depend on given parameters such as gradient and ruggedness of the landscape, mutation rate and population size, in terms of the two modes of selection. Roughly, the radical mode conducts the fast climbing but attains to the stationary states with low fitness, while the gentle mode conducts the slow climbing but attains to the stationary states with high fitness. & 2015 Elsevier Ltd. All rights reserved.

Keywords: Fitness landscape Experimental evolution NK model In vitro evolution Evolvability

1. Introduction Many kinds of experimental evolution, by using bacteria (Barrick et al., 2009), viruses (Meyer et al., 2012) or self-replicating molecular systems (Kita et al., 2008), have been conducted over the world, and then it becomes more and more necessary to analyze the evolutionary dynamics and statistical properties of fitness landscapes (Betancourt and Bollback, 2006; Pitt and Ferre-D'Amare, 2010; Steinbruck and n Corresponding author at: Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Yamadaoka 1-5, Suita, Osaka, Japan. E-mail address: [email protected] (T. Yomo).

http://dx.doi.org/10.1016/j.jtbi.2015.07.007 0022-5193/& 2015 Elsevier Ltd. All rights reserved.

McHardy, 2011; Otwinowski and Nemenman, 2013). Meanwhile, many theorists have tackled to construct mathematical theories for experimental evolution (Eigen, 1985; Gerrish and Lenski, 1998; Wahl and Krakauer, 2000; Voigt et al., 2000; Schiffels et al., 2011). These theoretical works can be categorized into two viewpoints: models of fitness landscapes and regimes of evolution. As for the models of fitness landscapes, there are three categories: (1) the abstract mathematical model based on the sequence space or the allele frequency space, (2) the molecular structure-based model and (3) the “stairway to heaven” model. There are so many varieties of the mathematical model based on the sequence space, such as the “flat landscape” (Derrida and Peliti, 1991), the “additive fitness landscape” (Woodcock and Higgs, 1996; Prügel-Bennett, 1997),

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

d

Notation

ν λ ϵ σ k M

299

sequence length number of all available letters at each site parameter that governs the “gradient” of the NK landscape standard deviation of site-fitnesses over allpavailable λ ffiffiffi letters at each site. Approximately, σ  ϵ= 3 parameter that governs the “degree of ruggedness” of the NK landscape population size. Number of all different (parent) genotypes in a population

“random rugged landscape” (Macken and Perelson, 1989), “royal road landscape” (Nimwegen et al., 1999), “Babel-tower landscape” (Suzuki and Iwasa, 1999), “block model landscape” (Perelson and Macken, 1995; Orr, 2006a), “NK fitness landscape” (Kauffman, 1993; Altenberg et al., 1997), “rough mount Fuji-type” (Aita et al., 2000) and “neutral network landscape” (Nimwegen et al., 1999). For example of the molecular structure-based model, RNA folding landscape (Fontana et al., 1993; Huynen et al., 1996) and protein folding landscape (Govindarajan and Goldstein, 1997) were studied well. Particularly, the concept of the “neutral network” was proposed through the study of the sequence- (secondary) structure mapping for RNA molecules (Schuster et al., 1994). Then, the neutral molecular evolution is interpreted as a random drift along the neutral network (Schuster and Fontana, 1999; Schultes and Bartel, 2000). The stairway to heaven model, in which one assumes a constant distribution of selection coefficients or a constant fraction of beneficial mutations for every point on the landscape, has been used in many studies for evolution of asexual populations (Gerrish and Lenski, 1998; Wilke, 2004; Park and Krug, 2007; Hosoda et al., 2014). This model is effective when the sequence length is infinitely long and the final goal as the global optimum is far away. Therefore, this model can be applied to evolution of bacteria and is not so suitable for evolution of viruses. Several articles inferred the structural properties of real landscapes by fitting parameters of the mathematical model to experimental data (Kauffman and Weinberger, 1989; Wahl and Krakauer, 2000; Aita et al., 2007; Kryazhimskiya et al., 2009; Szendro et al., 2013; Neidhart et al., 2013). As for regimes of evolution, population size M and mutation rate μ per site per replication are crucial parameters. A representative study with infinitely large population size was done by Eigen's group, as is known as the “quasi-species” theory (Eigen, 1985). The quasi-species is a polymorphic population of self-replicators with high mutation rate and large population size. Its concept has been applied to interpret the evolutionary dynamics of viruses (Elena et al., 2008). Lenski's group has been examining the evolutionary dynamics of bacteria experimentally and theoretically (Gerrish and Lenski, 1998; Wiser et al., 2013). Since the mutation rate of bacteria is much lower than that of viruses, it becomes easy to pursue the dynamics of clonal interference among a few genotypes and their fixation (Gerrish and Lenski, 1998; Park and Krug, 2007; Fogle et al., 2008; Schiffels et al., 2011). Recently, the synthetic biology is contributing to experimental evolution. Ichihashi et al. (2013) has conducted an experimental evolution of a translationcoupled RNA replication system. This system takes a hyper mutation rate (μ ¼ 10  3 ) and large population size (M ¼ 108 ), showing a polymorphic population and highly complex dynamics (Ichihashi et al., in preparation). We consider that Ichihashi's system is one of the most simplified biological systems constructed artificially and then it is worth to analyze it theoretically. In this paper, based on our previous studies (e.g. Aita et al., 2007), we examined the evolutionary dynamics of finite self-

number of mutated sites. Hamming distance between a parent genotype and each of its mutants α parameter that governs the skewness of an unsymmetrical Gaussian Wm fitness of the mth fittest among M genotypes in a population Xm standardized fitness of W m W; U; S mean, standard deviation and skewness of fitnesses over M genotypes W t ; U t ; S t the W, U and S at the tth generation, respectively W n ; U n ; S n the W, U and S in the stationary state, respectively

replicator population by using the NK fitness landscape and the regime of evolution with hyper mutation rate and large population size such like Ichihashi's system. Our aim is (1) to formulate a dynamics of population's fitness distribution along a single fitness coordinate without considering the details of individual sequences and (2) to clarify how parameters, such as mutation rate, population size and the gradient and ruggedness of the landscape, affect evolutionary equilibrium and the rate of fitness increase. As a previous study, Campos et al. (2002) performed a numerical study of the evolutionary dynamics on an NK fitness landscape, on the aim to identify the mutation rate at which either the mean or the maximum fitness of the population is maximized and a transition point at which the population delocalizes at evolutionary equilibrium. Our study focuses on the mutation–selection–drift balance on this landscape and an evolutionary process toward it. To focus on population's fitness distribution is effective. Kryazhimskiya et al. (2009) took a similar approach and applied to estimate the types of the fitness landscape from the observed time series of fitness. Prügel-Bennett (1997) formulated the evolutionary dynamics based on the cumulants of fitness distribution for an additive fitness landscape and showed excellent results of prediction consistent with the simulations. In our paper, we deduced explicit recurrence relations of the first three cumulants (mean, variance and skewness) of fitnesses in an evolving Sections 2 and 3.1–3.3 do not specify types of fitness landscapes and have a generality. The remaining parts are based on an NK fitness landscape defined in Appendix B.

2. Model of an evolving population through natural selection and mutation We consider an evolving population of asexual self-replicators in a fixed environment. Each of their genotypes has the genome sequence with chain length of ν, in which λ letters are available at every site. At every generation, the population consists of M different genotypes, where M is fixed through all generations. Let Wm be the “fitness”1 of the mth fittest among M genotypes in the population: W 1 4 W 2 4⋯ 4 W M . First, all genotypes reproduce

offspring. The mth genotype reproduces cMeW m =Z offspring by P Wm and c⪢1 is an replicating its sequence, where Z  M m¼1e infinitely large constant. During the reproduction process of each offspring genotype, d-fold point mutations occur randomly in each sequence, where d corresponds to the number of mutated sites and the Hamming distance between a parent genotype and each of its offspring. We assume that d is fixed to a constant value throughout 1 Although, in biology, eW m should be called the (true) “fitness”, we call W m the “fitness” for convenience.

300

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

all generations. Although the resulting pool is a mixture of the M genotypes and cM offspring, we neglect the fraction of the M genotypes in the pool because c is infinitely large. Next, M offsprings are randomly chosen from the pool and become new (parent) genotypes in the next generation. Thus, the old mth genotype leaves its offspring into the new population by MeW m =Z individuals on average. The resulting population is heterogeneous and polymorphic one at every generation because of high mutation rate. The above model is equivalent to the Wright Fisher model with M individuals (Fisher, 1930). First, the mth individual is chosen from the old population with probability of eW m =Z. Second, d-fold point mutations occur randomly in its sequence. Third, the individual becomes a member in the next population. Our computer simulation was performed by using the Wright Fisher model.

we can circumvent the issue of being trapped in local optima due to Assumption A. The second assumption is Assumption B. The ψ m ðWÞ is sensitive to the fitness of the reference genotype m, and insensitive to the details of its sequence information. Namely, the genotypes that have the same fitness share the same probability density ψ m ðWÞ. The validity of Assumption B was supported in our previous studies for an NK landscape (e.g. Aita, 2008) and this assumption was used in other studies (Kryazhimskiya et al., 2009). According to Assumption B, we can describe the evolutionary dynamics along the single fitness coordinate W without considering the details of individual sequences. 3.2. Fitness distribution for an evolving population

3. Formulation of evolutionary dynamics 3.1. Local fitness distribution around a single genotype Consider a set of all conceivable d-fold point mutants generated from a reference genotype m, where d is a constant. All the d-fold point mutants are located on a surface of a hyper-ball with radius d centering at the reference sequence m in “sequence space”  (Maynard-Smith, 1970). The size of the set is νd ðλ  1Þd , where λ is the number of available letters at each site. Let W be a variable representing the fitness of an arbitrary mutant in the set, and let ψ m ðWÞ be the probability density of W over all the mutants in the set, where the subscript m means that this probability density is for the set of (all conceivable d-fold point) mutants derived from the genotype m. That is, the probability that a randomly chosen mutant takes a value between W and W þ dW is ψ m ðWÞdW. Of course the ψ m ðWÞ drastically depends on the local structural property of an underlying fitness landscape around the reference genotype m in sequence space. We base on the following two assumptions. The first assumption is A. The sequence length Assumption  ν ðλ  1Þd ⪢M, where M ¼ 103  1012 . d

ν is large enough to satisfy

This means that the surface of the hyper-ball with radius d in sequence space is much larger than the population size M. Then,

M genotypes generate an offspring pool with infinitely large population size. Let W be the fitness of an arbitrary offspring in the pool, and let Ψ ðWÞ be the probability density of W in the pool. That is, the probability that a randomly chosen offspring takes a value between W and W þ dW is Ψ ðWÞ dW. Ψ ðWÞ is given by PM Wm ψ m ðWÞ ¼1e Ψ ðWÞ ¼ m P ; ð1Þ M Wm m¼1e where eW m is the fitness of the mth genotype (see Fig. 1). The fitness values of new M genotypes are randomly chosen from the probability density Ψ ðWÞ. Here, in order to remove the probabilistic effect by random sampling as much as possible, we base on the following assumption: Assumption C. The population size M is sufficiently large enough to identify Ψ ðWÞ with the probability density of fitnesses for the new M genotypes, except that the maximum of fitnesses among the new M genotypes is probabilistically determined. Hereafter, we characterize the evolving population, which consists of M genotypes, by the following four variables (W; U; S and X1): the first three cumulants of Ψ ðWÞ, that is the mean W, standard deviation U and skewness S, and the standardized highest fitness X 1  ðW 1  WÞ=U among M genotypes, where the value of W1 is probabilistically determined as the extreme value calculated from the probability density Ψ ðWÞ (Aita et al., 2007). It is convenient to introduce the standardized fitness X m  ðW m  WÞ=U. By introducing a continuous variable X, the distribution of Xm over all m's is described as the probability density

ρðXÞ 

M 1 X δðX  X m Þ; Mm¼1

ð2Þ

Probability density

where δðÞ is the Dirac delta function. It should be noted that the mean, standard deviation and skewness for ρðXÞ are 0, 1 and S respectively, and higher order cumulants of ρðXÞ are the same as those of Ψ ðWÞ except the upper truncation according to Assumption C.2 For higher order cumulants, we base on the following assumption:

Fitness W Fig. 1. The Gaussian distribution shown with a colored dashed line represents the probability density ψ m ðWÞ, which is derived from a genotype with fitness Wm (m ¼ 1; 2; …; 5). The relative size of the distribution reflects the fitness expðW m Þ. The probability density of fitnesses in the whole offspring pool, Ψ ðWÞ shown with the black solid line, is obtained by superposing these five distributions. The rightdirected arrow represents the change in population's mean fitness.

Assumption D. At every generation, ρðXÞ is assumed as the following “unsymmetrical Gaussian” with a truncation at X ¼ X 1 : ( expð ðX  bÞ2 =2s2 Þ for X o b ð3Þ ρðXÞ ¼ c  expð ðX  bÞ2 =2ðαsÞ2 Þ for b r X r X 1 ; and ρðXÞ ¼ 0 otherwise, where α is a variable parameter to govern the skewness of ρðXÞ. The c  cðαÞ, b  bðαÞ and s  sðαÞ are given 2 Exactly, Ψ ðWÞ takes an upper truncation, but we can neglect it based on Assumption A.

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

301

as functions of α (see Appendix A). We assume that the truncation RX is so small enough to satisfy 11 ρðXÞ  1.

we define the two “modes of selection” from the viewpoint of the shape of A(X) as follows:

Note that the mean, standard deviation and skewness of ρðXÞ are 0, 1 and S, respectively. The parameter α is related with the skewness S by

Radical mode

b; if X 1 o X

ð9Þ

Gentle mode

b oX 1 : if X

ð10Þ

α  S þ 1:

b, In the gentle mode, A(X) takes a Gaussian shape with mean X while in the radical mode, A(X) takes a sharp peak at X ¼ X 1 . We note that the above definition is not rigorous, that is, the boundary between the two modes is unclear and then we call the boundary the “intermediate mode”. It is obvious that, in the radical mode, only a small number of genotypes with high fitnesses near X1 can leave offspring. This means the selection is driven radically. In the gentle mode, genotypes with moderate fitness values lower than X1 can leave offspring. This means the selection is driven gently. Several computer simulations for an NK fitness landscape suggested that the mode of selection is likely to be unchangeable throughout the evolution process, once the values of the parameters are given. However, it remains to be confirmed exactly. We note that the concept of “modes of selection” is effective to describe the dynamics of fitness distribution.

ð4Þ

Their rigorous relationship is shown in Fig. S1(a). Assumption D is the most crucial assumption. Then, the validity of this assumption is discussed in Discussion. Again, the X1 represents the standardized fitness of the fittest among the M individuals. Therefore, the effect of population size M is directly reflected in X1. At every generation, the X1 value is probabilistically determined due to the effect of random sampling from the offspring pool with infinitely large population size.3 We assume that, based on Assumption D, the value of X1 is probabilistically determined as the extreme value calculated from the unsymmetrical Gaussian (Aita et al., 2007). In this paper, the X1 is calculated by the following manner. Suppose that M random numbers are chosen from the probability density ρðXÞ in Eq. (3) without truncation. We consider X1 as the greatest number among the M numbers chosen. The expectation and standard deviation of X1 are given in Appendix A. Although they are complicated, it turned out that a simple approximation pffiffiffiffiffiffiffiffiffiffiffiffiffiffi X 1  2 ln M ð5Þ is effective and useful in almost cases. Hereafter, we use Eq. (5) and neglect the probabilistic effect of X1. 3.3. Two modes of selection: radical mode and gentle mode Eq. (1) is rewritten as Z X1 Ψ ðWÞ ¼ AðXÞψ m ðWÞ dX;

ð6Þ

1

where A(X) is the “amplification factor” defined as AðXÞ  R X

eUX ρðXÞ 1

1

eUY ρðYÞ dY

ðX r X 1 Þ;

ð7Þ

where ρðXÞ is defined in Eq. (2) and given in Eq. (3). The AðXÞ dX means a fraction of offspring generated from the genotypes with standardized fitnesses between X and X þ dX. The factor ψ m ðWÞ in Eq. (6) represents the effect of diversification by mutations. Here, we note that the variables and functions are determined in the following procedure: Reproduction

fW; U; S; X 1 ; ρðXÞg



Sampling

fAðXÞ; Ψ ðWÞg ⟶ fW; U; S; X 1 ; ρðXÞg-:

We focus on the function A(X), because its shape is related with the “modes of selection” (see Fig. 2). By substituting Eq. (3) into Eq. (7), A(X) is approximated to the following truncated Gaussian: ! b Þ2  ðX  X ð8Þ AðXÞ ¼ C  exp ðX r X 1 Þ; 2ðαsÞ2 b  b þ ðαsÞ2 U  U. This where C is the normalization factor and X states that the standard deviation of fitnesses in the population, U, plays a significant role in reproduction process. The shape of A(X) is classified into the following two types by the degree of truncation, as is presented in Fig. 2. If truncation is so small as b . If to be negligible, A(X) takes a Gaussian shape with mean X truncation is significant, A(X) takes a sharp peak at X ¼ X 1 . Then, 3 Exactly saying, one should consider the effect of random sampling from the offspring pool on W, U and p S.ffiffiffiffiThe statistical theory states that the W fluctuates by ffi standard deviation of U= M . However, we neglect this fluctuation due to Assumption C.

3.4. Parameters to give the effect from characteristics of an NK fitness landscape An NK fitness landscape we used in this study is defined in Appendix B. This landscape has two crucial parameters, ϵ Z0 and k Z 0, that correspond to the “gradient” and the “degree of ruggedness” of the landscape, respectively. The ϵ and λ (¼ number of available letters) are merged into a single parameter σ Z 0 (see Eq. (29)), which is defined as the standard deviation of the underlying pffiffiffi “site-fitness distribution” at each site. Approximately, σ  ϵ= 3 for large λ. The mean and variance of fitnesses over the whole sequence space are zero and σ 2 ν, respectively. The effect from the characteristics of the NK landscape is represented by σ and k. The form of ψ m ðWÞ, which is introduced in Section 3.1, was obtained in our previous paper (Aita, 2008), and shown in Appendix B. 3.5. Recurrence relations for an NK fitness landscape We denote the variables W (population's mean fitness), U (standard deviation), S (skewness) and α (a parameter of the unsymmetrical Gaussian) at the tth generation by W t , U t , S t and αt, respectively. These variables are predicted by solving recurrence relations through all generations. In the case of NK landscape, the recurrence relations are easily derived from Eq. (6) with Eqs. (32) and (33). For example, the population's mean fitness W is R1 calculated from  1 W Ψ ðWÞ dW (the details are omitted). The recurrence relations are given in Eqs. (4)–(8) in Supplemental materials, and the procedure is schematically presented in Fig. 3. The following three recurrence relations are simplified forms, which originated from Eqs. (4)–(8) in Supplemental materials:   ð1 þ kÞd Wt þ 1 ¼ 1  ð11Þ ðW t þF 1;t U t Þ

ν

U 2t þ 1  2σ 2 ð1 þ kÞd þ F 2;t U 2t  S t þ 1  F 3;t

Ut Ut þ 1

ð12Þ

3 ;

ð13Þ

where F n;t ðn ¼ 1; 2; 3Þ are significant quantities mentioned below. First, we define a function F n ðU; α; X 1 Þ (n ¼ 1; 2; 3) as the nth order center moment of the probability density A(X) defined in Eq. (7), RX where U, α and X1 are arguments: F 1  11 X AðXÞ dX and

302

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

1.2

Probability density

1.0

Radical mode AX Mean fitness

0.8

Intermediate mode

X

Gentle mode AX

X1

0

X1 X

X1 0.0

t

t

probabilistically

X 1,t

t

t

t 1

t 1

t 1

(X )

F1 (

t

,

t

, X 1,t )

F2 (

t

,

t

, X 1,t )

F3 (

t,

t 1

5

10

20

25

30

20

25

30

0.8

0.6

0.4

0.2

0.0 0

5

10

t , X 1,t )

X 1,t

15

1.0

Standard deviation

t

skewness

0

k

Fig. 2. Two modes of selection: gentle mode and radical mode. The black solid line represents ρðXÞ, where X is the standardized fitness. Each of the colored dashed lines represents the probability density A(X) (defined in Eq. (7)). The blue b o X 1 , while the red distribudistributions belong to the “gentle mode” because X b . The green distributions belong tion belongs to the “radical mode” because X 1 o X to the “intermediate mode”. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

s.d.

0.4

0.2

Standardized fitness X

mean

0.6

15

k

1

0.0

probabilistically

RX F n  11 ðX  F 1 Þn AðXÞ dX (n ¼2,3). Then, F n;t  F n ðU t ; αt ; X 1 Þ ðn ¼ 1; 2; 3Þ, where αt ¼ S t þ 1 (Eq. (4)) and X1 is given by Eq. (5). The forms of Fn are crucial in this theory and then shown in Fig. S4. Particularly, the mathematical form of F1 is described in Supplemental materials. The term W t þ F 1;t U t in Eq. (11) means the fitness value which supplies the largest number of offspring to the next generation. In the recurrence relations (Fig. 3), it is remarkable that the standard deviation U affects other variables. Therefore, the U plays a central role among all variables. We note that the resulting time series from this recursive procedure is an expected one. The effect of probabilistic fluctuation will be discussed elsewhere.

4. Results 4.1. Time series of variables through the recurrence relations and computer simulation To evaluate the validity of our theory, we carried out computer simulation of evolution based on the original model in Section 2

0.1 0.2

Skewness

Fig. 3. Recurrence relations of five variables for an NK landscape. Each arrow represents an elemental step for calculation. The variable at the head of the arrow is deterministically obtained from the variable at the origin of the arrow, except that X 1;t is probabilistically obtained from αt due to the effect of random sampling from the offspring pool. The dashed arrow from W t toward U t þ 1 represents the effect of a factor BðW t Þ defined in Eq. (9) in Supplemental materials. This effect is not so significant and negligible in the further approximation of BðW t Þ  2.

0.3 0.4 0.5 0.6

0

5

10

15

20

25

30

k Fig. 4. Time series of the population's mean fitness W, standard deviation U, skewness S and the maximal standardized fitness X1. By performing the computer simulation mentioned in Section 4.1, the average and standard deviation over 100 runs are shown with the series of dots (deep blue) and red lines, respectively. The black solid line represents the theoretically predicted one obtained by solving the recurrence relations (Eqs. (4)–(6) and (23) in Supplemental materials). The used parameters are λ ¼ 4, ν ¼ 200, ϵ ¼ 0:1, k ¼0, d ¼1 and M ¼5000. This case corresponds to the gentle mode. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

and an NK fitness landscape with no assumptions (without Assumptions A–D). For each trial of simulation, we started from a randomly generated sequence. The population size was fixed to M from the second generation. The used parameters are λ ¼ 4, ν ¼ 200, ϵ ¼ 1 or 0.1, k ¼0, 10, 20 or 30, d ¼1, M ¼500 or 5000. The results are shown in Figs. 4 and 5. Fig. 4 shows a time series of a single

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

evolution process, while Fig. 5 shows stationary values of W n , U n and S n in an evolutionary equilibrium. These results were compared with results of prediction by solving the recurrence relations, Eqs. (4)–(6) and (23), in Supplemental materials. Note that Eqs. (4)–(6) and (23) in Supplemental materials are more exact forms than Eqs. (11)–(13) with Eq. (5). As a whole, both the time series and stationary values predicted from the recurrence relations were close to those from the simulation. For example, in Fig. 4, the black solid lines (theory) reproduce the time series of dots (simulation) roughly, and in Fig. 5, the filled symbols (simulation) and open ones (theory) with the same color and same shape are located on close positions, except discrepancies between the filled circles and open ones for the skewness S n (the last figure). The discrepancies stem from the reasons below. We note that although this main text is presented based on Assumption A for a simple presentation, the situations in Figs. 4 and 5 do not necessarily satisfy Assumption A, that is, the issue of being trapped in local optima arises (Aita, 2008). This effect must be incorporated into calculation of the X1 value because X1 represents the upper limit of fitnesses in a population. More exact calculation of X1 is given by Eq. (23) in Supplemental materials, while Eq. (8) in Supplemental materials is based on Assumption A. Therefore, the predicted values in Figs. 4 and 5 were calculated by using Eq. (23) in Supplemental materials. We can guess that in the first figure (about the mean fitness W n ) in Fig. 5, the population represented by the filled and open circles was trapped in local optima actually because the filled and open circles are located near the single solid line, while the population represented by the filled and open triangles was not

303

trapped in local optima. In the last figure (about the skewness S n ) in Fig. 5, the filled and open circles are not close to each other. This discrepancy stems from that the effect of being trapped in local optima is not sufficiently incorporated into this theory. Concerning Fig. 4, the common features throughout the simulation are as follows: S and X1 tend toward around the stationary values after a few generations rapidly, while W tends toward the stationary value slowly. As for U, it rapidly increases up to the maximal value after a few generations but subsequently decreases slowly toward the stationary value. This gradual decrease is caused by the decreasing Vm (Eq. (33)), which is the variance of local fitness distribution around an arbitrary reference sequence. But this effect is not so significant and can be negligible in a further approximation. This simulation was implemented by the C program running in PC cluster Express5800/120Rg-1  128 nodes (48GFLOPS, Intel Xeon 3 GHz (Woodcrest) 2CPU 4 core, 16 GB memory per node). The operating system is SUSE Linux Enterprise Server 10. 4.2. Evolutionary equilibrium The evolving population reaches an evolutionary equilibrium after sufficient generations. There are two cases of the evolutionary equilibrium: settling into a mutation–selection–random drift balance or being trapped at local optima (Campos et al., 2002). The latter case occurs when the population size M is greater than ν d d ðλ  1Þ , which is the number of all possible d-fold point mutants derived from a reference sequence. Since this paper obeys Assumption A, we should consider only the former case.

0.4

Standard deviation

Mean fitness

15 10 5 0

0.3 0.2 0.1 0.0

0

100

200

300

400

500

0

100

Generation t

200

300

400

500

Generation t

0.0

4

0.1

3

X1

Skewness

5

2 0.2 1 0.3 0

100

200

300

Generation t

400

500

0

0

100

200

300

400

500

Generation t

Fig. 5. Results of computer simulation of evolution in comparison with theoretical prediction. The stationary values W n , U n and S n are shown against the ruggedness parameter k. The filled symbols and open ones with the same color and same shape represent the results of simulation and those of prediction, respectively. For the simulation, the average and standard deviation over 100 runs are shown with the filled symbols and error bars, respectively. The used parameters are λ ¼ 4, ν ¼ 200, ϵ ¼ 1 (circle) or 0.1 (triangle), k ¼0, 10, 20 or 30, d ¼ 1, M ¼ 500 (red) or 5000 (blue). The open symbols are the theoretically predicted ones obtained by solving the recurrence relations (Eqs. (4)–(6) and (23) in Supplemental materials). (a) The solid line represents the position where local optima with radius of one Hamming distance appear. (b) The pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi solid line represents the theoretically predicted lowest values U n ¼ 2σ 2 ð1 þ kÞd. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

304

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

We denote the stationary values of the four variables at an evolutionary equilibrium by W n , U n and S n . With three given parameters, ϵ (or σ in Eq. (29)), k, d and M, the stationary values

are determined by solving the recurrence relations (Eqs. (4)–(8) in Supplemental materials) at infinite generation. Fig. 6 shows how these values depend on ϵ (or σ), k, d and M. The behavior of these variables is classified by the modes of selection: the radical mode when X n1 o U n (Eq. (9)) and gentle mode when U n o X n1 (Eq. (10)). These conditions are rewritten by using the three parameters as follows: ln M o σ 2 ð1 þ kÞ; d

ð14Þ

ln M : d

ð15Þ

Radical mode

if

Gentle mode

if σ 2 ð1 þ kÞ o

The derivation is mentioned later. The ðln MÞ=d in (14) and (15) is a characteristic quantity in this theory. The d represents the degree of diffusion in sequence space by mutations, while M represents the area of the sampling in a sequence space. Roughly saying, the radical mode conducts a wide but rough search with small ðln MÞ=d values, while the gentle mode conducts a narrow but sophisticated search with large ðln MÞ=d values. As the population climbs the fitness landscape, the rate of ascending paths around the population decreases gradually. Then the gentle mode is expected to work well in order to find the ascending paths. In Fig. 6, we can see the following qualitative features. pffiffiffiffiffiffiffiffiffiffiffiffiffiffiFig. 6 (a) states that X n1 is approximately given by X n1  2 ln M and insensitive to other parameters. Particularly, this equation is well fitted for the gentle mode (indicated with blue lines). Fig. 6 (b) shows that S n takes from a narrow range between -0.2 and 0.0, and that S n decreases monotonously against ð1 þ kÞd in the gentle mode and subsequently increases monotonously toward zero in the radical mode. This is interpreted as follows. The S n is approximately equal to the third order center moment (denoted by F3) of the probability density A(X). In Fig. 2, we can see that the F3 takes near zero for gentle mode (blue line) and radical mode (red line), while the F3 takes large negative values for intermediate mode (green line). This is reflected in the behavior of S n in Fig. 6(b). From Eqs. (11)–(13) with Eqs. (4) and (5), the stationary values for the three variables are given as the following explicit functions of the parameters σ, k, d and M (derivation is given in Supplemental materials): ( ðν=ð1 þ kÞdÞU n X n1 for radical mode Wn  ð16Þ 2 for gentle mode; ðν=ð1 þ kÞdÞU n ( U

n2



Sn  

2σ 2 ð1 þkÞd

2ln M  κ 2

σ 2 ð1 þ kÞd 2 ln M  κ 2

for radical mode for gentle mode; for gentle mode;

ð17Þ

ð18Þ

where

sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 ln M =ln M: κ  1  ln 2 σ ð1 þ kÞd

ð19Þ

Fig. 6. Dependency of the stationary values W n , U n , S n and X n1 on given parameters ϵ, ð1þ kÞd and M. The ϵ values are indicted by colors: red lines: ϵ ¼ 1; blue lines: ϵ ¼ 0:1. The M values are shown in the figures. The solid lines represent the gentle mode, while the dashed lines represent the radical mode. We used ð1  ð1 þ kÞd=νÞ ¼p 1ffiffiffiffiffiffiffiffiffiffiffiffiffiffi in the calculation. In (a), the black dots at the left side indicate the values of 2 ln M . In (b), the thin lines represent the predicted values for the gentle mode by using pEq. pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi ffiffiffi (18). In (c), the dotted line represents U n ¼ 2σ 2 ð1 þ kÞd, where σ ¼ ϵ= 3. In (d), the horizontal thin dotted line represents the boundary between the two modes for the case of ϵ ¼ 1 (red) and for the casepffiffiffi of ϵ ¼ 0:1 (blue). Our theory states that the boundary lies at W n =ϵν ¼ 2σ= 3  σ. Note that these results were obtained by calculating the recurrence relations (Eqs. (4)–(8) in Supplemental materials) at infinite generation. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

S n tends to zero for radical mode. The validity of Eq. (18) can be seen in Fig. 6(b). At the intermediate mode, ðln MÞ=d ¼ σ 2 ð1 þ kÞ holds. Then, the upper term and lower term in each of the above equations share the same value (2σ 2 ν for Eq. (16) and 2σ 2 ð1 þkÞd for Eq. (17), and κ  1 in Eq. (19)). Fig. 6(c) and (d) is interpreted by Eqs. (16)–(17) as follows. The following is for cases where equilibrium settles in radical mode. The result of this mode is shown with the dashed lines in Fig. 6 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi (c) and (d). In Fig. 6(c), 2σ 2 ð1 þkÞd represents the standard deviation of fitnesses of offspring generated from the single fittest genotype (details are in Eq. (33) in Appendix B). U n tends to pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2σ 2 ð1 þ kÞd because the diversity of genotypes is considerably small (only a small number of the fittest or near fittest genotypes can leave offspring). Therefore, U n is insensitive to the population size M. In Fig. 6(d), the W n =ϵν (as the “degree of adaptation”, where ϵν represents the upper limit of fitness over the whole pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi sequence space) is approximately given by ðln MÞ=ð1 þ kÞd n (derived from Eq. (16)). The W =ϵν at the boundary (intermediate mode) between the radical mode and gentle mode takes about σ, pffiffiffi more exactly, 2σ = 3. The following is for cases where equilibrium settles in gentle mode. The result of this mode is shown with the solid lines in Fig. 6(c) and (d). It is remarkable that S n is related with U n and W n as can be seen in Eqs. (16)–(17). In Fig. 6(c), U n becomes larger pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi than 2σ 2 ð1 þ kÞd, because the gentle mode induces weak selection and the diversity of genotypes is large. Therefore, U n is sensitive to the population size M. In Fig. 6(d), although the behavior of W n =ϵν against M and ð1 þ kÞd is similar to that in the

radical mode, the W n =ϵν becomes sensitive to the σ values, as is a large difference from the radical mode. Again, we can see in Fig. 6(d) that if the population size M and the σ value are given, the gentle mode can lead the population to a much higher degree of adaptation than the radical mode does, although the radical mode induces strong selection than the gentle mode does (Fig. 2). The gentle mode enables the population to conduct a sophisticated search due to large ðln MÞ=ð1 þkÞd values (Condition (15)) and to find ascending paths toward the top of the landscape. 4.3. Kinetics toward equilibrium Next, we consider kinetics of the evolving population. We denote the four variables at the tth generation by W t ; U t ; S t and X 1;t . Let ΔW and ΔU be the expected changes in W and U, respectively, after a single generation: ΔW ¼ W t þ 1 W t and ΔU ¼ U t þ 1  U t . A mathematical analysis of ΔU suggests that U t reaches a stationary value U n rapidly in the early stages (after a few generations) of evolution. This tendency can be seen in Fig. 4 (b). This finding is very important because the U value can be replaced with the constant U n from the early stages of evolution. From the recurrence relation shown in Eq. (11), we obtain ( ðð1 þ kÞd=νÞW t þ X 1;t U t for radical mode ΔW  ð20Þ for gentle mode: ðð1 þ kÞd=νÞW t þ U 2t The right term þ X 1;t U t and þ U 2t pushes the population upward on the landscape, while the left term  ðð1 þ kÞd=νÞW t represents the effect of Muller's ratchet by random mutations and pushes the population downward. This states the following interesting findings: the climbing rate ΔW is proportional to the standard deviation U for the radical mode, while ΔW is proportional to the variance U 2 for the gentle mode. The latter case corresponds to the Fisher's fundamental theorem of natural selection (Fisher, 1930), and the discrepancy by the mutational effects is represented by the term  ðð1 þ kÞd=νÞW t . A mathematical analysis of

305

ΔW suggests that W t reaches a stationary value W n through so many generations and its time series is approximately given by  t ! ð1 þkÞd Wt  Wn 1  1  ; ð21Þ ν

where we assume W 0 ¼ 0 as the initial condition. The details of the mathematical analysis of kinetics is given in Supplemental materials. Here, we introduce the “relaxation generation” tn that satisfies W tn  1  e  2  0:86: Wn

ð22Þ

From Eq. (21), the tn is approximately given by tn 

2ν : ð1 þ kÞd

ð23Þ

We note that tn does not depend on σ and M in this simple case. If the population size M and σ value are given, the values of tn for the gentle mode are larger than those for the radical mode because the gentle mode takes small ð1 þ kÞd values. Roughly, the radical mode conducts the fast climbing but attains to the stationary states with low fitness, while the gentle mode conducts the slow climbing but attains to the stationary states with high fitness.

5. Discussion Assumption D is the most crucial assumption in this paper. In Supplemental materials, we showed that if ρðXÞ is the unsymmetrical Gaussian given in Eq. (3), Ψ ðWÞ in Eq. (6) is approximately an unsymmetrical Gaussian. Therefore, it seems reasonable to assume ρðXÞ to be the unsymmetrical Gaussian. Although there are several models of the unsymmetrical Gaussian, we adopted Eq. (3) because it is analytically solvable. But it remains to be sophisticated. The results of computer simulation (Figs. 4 and 5) support the validity of Assumption D. Our theory is based on a probability density ψ as a local fitness distribution around a reference sequence in sequence space (Section 3.1). The shape of ψ should be deduced from the structure of the underlying fitness landscape. In the NK landscape we used, the shape of ψ is a normal distribution with mean ð1 ð1 þkÞd=νÞW and variance σ 2 ð1 þ kÞd  2σ 2 ð1 þ kÞd (Appendix B). This is owing to that the change in fitness by mutations is given as a sum of random independent numbers at ð1 þ kÞd updated sites among ν sites. This is an interesting result because the structural property of the NK landscape is reduced to this simple distribution. Our approach may be applied to other types of landscapes, if their ψ's are obtained. Many researchers have tackled the issue about the shape of fitness (or fitness) distribution of beneficial mutations (Fontana et al., 1993; Imhof and Schlotterer, 2001; Orr, 2003; Sanjuan et al., 2004; Kassen and Bataillon, 2006; Fogle et al., 2008). Betancourt and Bollback (2006) reviewed this issue. A majority of them recognized that the shape of fitness distribution of beneficial mutations is an exponential function (e.g. Kassen and Bataillon, 2006; Kryazhimskiya et al., 2009). It is interesting to consider what type of landscape make the fitness distribution being an exponential function. The rate of fitness increase is a central issue of this field. Many studies proposed equations of time series of population's mean fitness. Kryazhimskiya et al. (2009) connected types of fitness landscapes with the time series, applying to prediction of the type of a landscape from the experimental data. Almost all time series show the diminishing returns, which are mathematically described as the convex exponential (this study), hyperbolic, power-law (Wiser et al., 2013) or others (Wahl and Krakauer, 2000; Kryazhimskiya et al., 2009). Which model is the most

306

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

realistic and general is a delicate problem because of the accuracy and reproducibility of experiments. When considering the time series deeply, the cause of plateau should be distinguished into two cases: setting in the mutation-selection-drift balance and being trapped at local optima. These are distinguishable by analyzing sequence information in the population. In the former case, sequences are variable after the equilibrium sets in. More sophisticated theory of evolution should pursue the fate of individual genotypes and mutation's identities (Wahl and Krakauer, 2000; Schiffels et al., 2011). If mutation rate is so small and beneficial mutations are rare, one can implement it through a mathematical analysis of clonal interference and fixation of mutations (Gerrish and Lenski, 1998; Park and Krug, 2007; Fogle et al., 2008; Schiffels et al., 2011). Our study assumes hyper mutation rate and strong selection, where every offspring genotype has point mutations and then almost all genotypes disappear in the next generation (in our model, the product of population size and mutation rate per site per replication is 1  107 ). Therefore, the regime in this study is close to that in the quasi-species model (Eigen, 1985), which assumes strong mutation and strong selection with infinitely large population size. Certainly, the stationary polymorphic population at equilibrium in our study corresponds to the quasi-species. Our theory can predict the stationary value of population's mean fitness and variance of the quasi-species, but can not predict the population's structure in sequence space after a mutation–selection–drift balance sets in. We speculate that the population at equilibrium diffuses in sequence space in keeping the same mean-fitness value. This view may be similar to random walks on a neutral network (Huynen et al., 1996). An interesting issue for NK landscapes is a transition between localization and delocalization of the population in sequence space. Campos et al. (2002) examined this problem by computer simulation. In their results, a low mutation rate makes the population to be trapped in local optima. As the mutation rate increases, the population can find higher local optima. When the mutation rate exceeds a transition point, the population leaves the local optima and settles in a lower fitness state, which may be caused by a mutation-selection-drift balance. Aita (2008) showed that the NK landscape defined in this paper has a hierarchical structure composed of ascending slopes, nearly neutral networks, highlands, and local optima with various radii. The findings by Campos et al. (2002) seem to be interpreted from this structural viewpoint. As mentioned in our previous studies (Aita et al., 2014), it is possible to interpret the evolutionary dynamics analyzed in this paper by introducing the thermodynamics-like concepts (Iwasa, 1988). We confirmed the evolutionary dynamics can be described in the same mathematical framework as the thermodynamics. In our thermodynamics-like scheme, the “evolutionary force” consists of the “fitness force”, which drives the population in high fitness direction, and “entropy force”, which drives the population in high entropy direction (Aita et al., 2014). In cases of high (low) mutation rate and small (large) population size, the entropy force (fitness force) dominates and the population settles in a low (high) fitness and high (low) entropy state. Wilke et al. (2001) proposed a similar concept of the “survival of the flattest” and “survival of the fittest”. The radical mode is the case where the entropy force dominates (and may be the case of the survival of the flattest), while the gentle mode is the case where the fitness force dominates (and may be the case of the survival of the fittest). Although we did not refer to the probabilistic fluctuation of the climbing rate in this paper, the fluctuation is deeply related with a temperature-like concept. Then, the relationship between them will be reported elsewhere.

Appendix A. Exact forms of equations The s and b in Eq. (3) are determined as the following functions

of

α:

sðαÞ 

bðαÞ 

sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi π ð1 þ αÞ ð2 þ πα2 2α2 Þα þ ð  2 þ 2α2 þ π Þ rffiffiffiffi 2 ð1  αÞsðαÞ:

π

ð24Þ

ð25Þ

The shapes of these functions of α are shown in Fig. S1(b, c). We pffiffiffiffi can see that αsðαÞ  α and bðαÞ  1  α hold when α 40:8. bð1Þ ¼ 0 and sð1Þ ¼ 1. The rough relationship between α and the skewness S is given by α  S þ 1. Their rigorous relationship is shown in Fig. S1(a). The expectation E½X 1  and standard deviation SD½X 1  of X1 are respectively given by sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi   2M α E½X 1   b þ αs  2 ln ð26Þ 1þα

αs SD½X 1   sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  : 2M α 2 ln 1þα

ð27Þ

Appendix B. Model of an NK fitness landscape and mathematical form of ψ m ðWÞ The “NK fitness landscape” is very popular one (Kauffman and Weinberger, 1989; Weinberger, 1991; Kauffman, 1993; Limic and Pemantle, 2004). In this model, a fitness contribution of an arbitrary site is influenced from other k sites. If k ¼0, the fitness landscape has a single peak and is called the Mount Fuji-type. As the k-value increases, the fitness landscape becomes more rugged. Suppose that the jth site is influenced from other k sites fj1 ; j2 ; …; jk g. These are randomly chosen from all ν  1 sites except the jth site. The fitness W for a given sequence “A1 A2 ⋯Aν ” is defined by W¼

ν X

wj ðAj jAj1 ; Aj2 ; …; Ajk Þ;

ð28Þ

j¼1

where wj ðAj jAj1 ; Aj2 ; ⋯; Ajk Þ is the “site-fitness”, i.e. a fitness contribution from a particular letter Aj combined with other given letters fAj1 ; Aj2 ; ⋯; Ajk g. The assignment of site-fitness values is modeled as follows: with a set of letters fAj1 ; Aj2 ; ⋯; Ajk g given, a site-fitness value of an arbitrary letter X (e.g., X¼ A, T, G, C) is assigned randomly from a discrete uniform distribution in the range ½  ϵ; ϵ, where ϵ is a positive constant. Roughly, we can say that ϵ corresponds to the gradient of the landscape, while k gives the degree of ruggedness of the landscape. The mean of the site-

λ letters is equal to zero, while the variance, σ2, is given by

fitness values over denoted by

σ2 ¼

ϵ2 ðλ þ 1Þ : 3ðλ  1Þ

ð29Þ

The ϵ (or σ) is a significant parameter which governs the strength of natural selection. The d mutations affect about kd other interacting sites and force them to update their site-fitness values. Let D be the expected number of sites that update their site-fitness values as a result of the d-fold point mutations. D is given by  d k D ¼ ν  ðν  dÞ 1  ; ð30Þ ν1

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

D  ð1 þ kÞd

for kd⪡ν:

ð31Þ

Therefore, ð1 þ kÞd is a crucial parameter to govern the evolutionary dynamics. The NK fitness landscape4 defined above satisfies Assumption B approximately. Based on the statistical property of the NK landscapes, the mathematical form of ψ m ðWÞ (defined in Section 3.1) can be approximately given by the Gaussian distribution with mean Em and variance Vm5:   ð1 þ kÞd Em  1  Wm ð32Þ

ν

Vm 

2

  ! Wm 2

ϵν

σ 2 ð1 þ kÞd;

ð33Þ

where ϵν represents the upper limit of fitness over the whole sequence space. Therefore,  1 r W m =ϵν r 1 tells a “degree of adaptation” of the genotype m (Schiffels et al., 2011). The approximation of Eq. (33), V m  σ 2 ð1 þ kÞd  2σ 2 ð1 þ kÞd, is used in the main text. Fig. 6(c) supports the effectiveness of this approxima  tion. Because of finite number of νd ðλ  1Þd , an upper limit of W exists in ψ m ðWÞ. However, according to Assumption A, we can neglect the effects of the upper limit on the evolutionary dynamics and then we do not take into account the upper limit in the main text. The ψ m ðWÞ given above expresses a characteristic feature of a mountainous landscape well. As the reference fitness (Wm) becomes larger or as the number of mutated sites, d, becomes larger, the fraction of deleterious mutants (with lower fitnesses than the reference fitness) becomes larger caused by the negative effect of  ð1 þ kÞdW m =ν in Eq. (32). Our approach is similar to the previous approach based on the Fisher's geometric model (Orr, 2006b). This issue is crucial for the evolutionary dynamics and is referred to in Discussion.

Appendix C. Supplementary data Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jtbi.2015.07.007. References Aita, T., Uchiyama, H., Inaoka, T., Nakajima, M., Kokubo, T., Husimi, Y, 2000. Analysis of a local fitness landscape with a model of the rough Mt. Fuji-type landscape: application to prolyl endopeptidase and thermolysin. Biopolymers 54, 64–79. Aita, T., Hayashi, Y., Toyota, H., Husimi, Y., Urabe, I., Yomo, T., 2007. Extracting characteristic properties of fitness landscape from in vitro molecular evolution: a case study on infectivity of fd phage to E.coli. J. Theor. Biol. 246, 538–550. Aita, T., 2008. Hierarchical distribution of ascending slopes, nearly neutral networks, highlands, and local optima at the dth order in an NK fitness landscape. J. Theor. Biol. 254, 252–263. Aita, T., Husimi, Y., 2014. Biomolecular information gained through in vitro evolution on a fitness landscape in sequence space. In: Richter, H., Engelbrecht, A.P. (Eds.), Recent Advances in the Theory and Application of Fitness Landscapes. Springer-Verlag, Berlin, Heidelberg, pp. 71–100. Altenberg, L., 1997. NK fitness landscapes. In: Back, T., Fogel, D., Michalewicz, Z. (Eds.), The Handbook of Evolutionary Computation, 1113. IOP Publishing, Bristol UK (B2.7:5-10). Barrick, J.E., Yu, D.S., Yoon, S.H., Jeong, H., Oh, T.K., Schneider, D., Lenski, R.E., Kim, J. F., 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461, 1243–1247. Betancourt, A.J., Bollback, J.P., 2006. Fitness effects of beneficial mutations: the mutational landscape model in experimental evolution. Curr. Opin. Genet. Dev. 16, 618–623.

4 Note that the NK model defined in this study is slightly different from the original NK model introduced by Kauffman's group (Kauffman and Weinberger, 1989). For example, the original NK model was introduced for binary sequences (λ ¼ 2 in this paper). 5 Roughly, this reason resides in that the change in fitness is given as a sum of random independent numbers (Aita, 2008).

307

Campos, P.R.A., Adamib, C., Wilke, C.O, 2002. Optimal adaptive performance and delocalization in NK fitness landscape. Physica A 304, 495–506. Derrida, B., Peliti, L, 1991. Evolution in a flat fitness landscape. Bull. Math. Biol. 53, 355–382. Eigen, M., 1985. Macromolecular evolution: dynamical ordering in sequence space. Ber. Bunsenges. Phys. Chem. 89, 658–667. Elena, S.F., Agudelo-Romero, P., Carrasco, P., Codoner, F.M., Martin, S., TorresBarcelo, C., Sanjuan, R., 2008. Experimental evolution of plant RNA viruses. Heredity 100, 478–483. Fisher, R.A., 1930. The Genetical Theory of Natural Selection. Clarendon, Oxford. Fontana, W., Stadler, P.F., Bornberg-Bauer, E.G., Griesmacher, T., Hofacker, I.L., Tacker, M., Tarazona, P., Weinberger, E.D., Schuster, P., 1993. RNA folding and combinatory landscapes. Phys. Rev. E 47, 2083–2099. Fogle, C.A., Nagle, J.L., Desai, M.M., 2008. Clonal interference, multiple mutations and adaptation in large asexual populations. Genetics 180, 2163–2173. Gerrish, P.J., Lenski, R.E., 1998. The fate of competing beneficial mutations in an asexual population. Genetica 102–103, 127–144. Govindarajan, S., Goldstein, R.A., 1997. Evolution of model proteins on a foldability landscape. Proteins 29, 461–466. Hosoda, K., Habuchi, M., Suzuki, S., Miyazaki, M., Takikawa, G., Sakurai, T., Kashiwagi, A., Sueyoshi, M., Matsumoto, Y., Kiuchi, A., Mori, K., Yomo, T., 2014. Adaptation of a cyanobacterium to a biochemically rich environment in experimental evolution as an initial step toward a chloroplast-like state. PLoS One 9, e98337. Huynen, M.A., Stadler, P.F., Fontana, W., 1996. Smoothness within ruggedness: the role of neutrality in adaptation. Proc. Natl. Acad. Sci. USA 93, 397–401. Ichihashi, N., Usui, K., Kazuta, Y., Sunami, T., Matsuura, T., Yomo, T., 2013. Darwinian evolution in a translation-coupled RNA replication system within a cell-like compartment. Nat. Commun. 4, 2494. Imhof, M., Schlotterer, C., 2001. Fitness effects of advantageous mutations in evolving Escherichia coli populations. Proc. Natl. Acad. Sci. USA 98, 1113–1117. Iwasa, Y., 1988. Free fitness that always increases in evolution. J. Theor. Biol. 135, 265–281. Kauffman, S.A., Weinberger, E.D., 1989. The NK model of rugged fitness landscapes and its application to maturation of the immune response. J. Theor. Biol. 141, 211–245. Kauffman, S.A., 1993. The Origin of Order. Oxford University Press, Oxford. Kassen, Bataillon, 2006. Distribution of fitness effects among beneficial mutations before selection in experimental populations of bacteria. Nat. Genet. 38, 484–488. Kita, H., Matsuura, T., Sunami, T., Hosoda, K., Ichihashi, N., Tsukada, K., Urabe, I., Yomo, T., 2008. Replication of genetic information with self-encoded replicase in liposomes. Chembiochem 9, 2403–2410. Kryazhimskiya, S., Tkacik, G., Plotkin, J.B., 2009. The dynamics of adaptation on correlated fitness landscapes. Proc. Natl. Acad. Sci. USA 106, 18638–18643. Limic, V., Pemantle, R., 2004. More rigorous results on the Kauffman–Levin model of evolution. Ann. Probab. 32, 2149–2178. Macken, C.A., Perelson, A.S., 1989. Protein evolution on rugged landscapes. Proc. Natl. Acad. Sci. USA 86, 6191–6195. Maynard-Smith, J., 1970. Natural selection and the concept of a protein space. Nature 225, 563–564. Meyer, J.R., Dobias, D.T., Weitz, J.S., Barrick, J.E., Quick, R.T., Lenski, R.E., 2012. Repeatability and contingency in the evolution of a key innovation in phage lambda. Science 335, 428–432. Neidhart, L., Szendro, I.G., Krug, J., 2013. Exact results for amplitude spectra of fitness landscapes. J. Theor. Biol. 332, 218–227. Nimwegen, E.V., Crutchfield, J.P., Mitchell, M., 1999. Statistical dynamics of the royal road genetic algorithm. Theor. Comput. Sci. 229, 41–102. Nimwegen, E.V., Crutchfield, J.P., Huynen, M., 1999. Neutral evolution of mutational robustness. Proc. Natl. Acad. Sci. USA 96, 9716–9720. Orr, H.A., 2003. The distribution of fitness effects among beneficial mutations. Genetics 163, 1519–1526. Orr, H.A., 2006a. The distribution of fitness effects among beneficial mutations in Fisher's geometric model of adaptation. J. Theor. Biol. 238, 279–285. Orr, H.A., 2006b. The population genetics of adaptation on correlated fitness landscapes: the block model. Evolution 60, 1113–1124. Otwinowski, J., Nemenman, I., 2013. Genotype to phenotype mapping and the fitness landscape of the E. coli lac promoter. PLoS One 8, e61570. Park, S.C., Krug, J., 2007. Clonal interference in large populations. Proc. Natl. Acad. Sci. USA 104, 18135–18140. Perelson, A.S., Macken, C.A., 1995. Protein evolution on partially correlated landscapes. Proc. Natl. Acad. Sci. USA 92, 9657–9661. Pitt, J.N., Ferre-D'Amare, A.R., 2010. Rapid construction of empirical RNA fitness landscapes. Science 330, 376–379. Prügel-Bennett, A., 1997. Modelling evolving populations. J. Theor. Biol. 185, 81–95. Sanjuan, R., Moya, A., Elena, F., 2004. Evolution of the folding ability of proteins through functional selection. Proc. Natl. Acad. Sci. USA 101, 8396–8401. Schiffels, S., Szollosi, G.J., Mustonen, V., Lassig, M., 2011. Emergent neutrality in adaptive asexual evolution. Genetics 189, 1361–1375. Schultes, E.A., Bartel, D.P., 2000. One sequence, two ribozymes: implications for the emergence of new ribozyme folds. Science 289, 448–452. Schuster, P., Fontana, W., Stadler, P.F., Hofacker, I.L., 1994. From sequences to shapes and back: a case study in RNA secondary structures. Proc. R. Soc. (London) B 255, 279–284. Schuster, P., Fontana, W., 1999. Chance and necessity in evolution: lessons from RNA. Physica D 133, 427–452.

308

T. Aita, T. Yomo / Journal of Theoretical Biology 382 (2015) 298–308

Steinbruck, L., McHardy, A.C., 2011. Allele dynamics plots for the study of evolutionary dynamics in viral populations. Nucleic Acids Res. 39, e4. Suzuki, H., Iwasa, Y., 1999. Crossover accelerates evolution in GAs with a Babel-like fitness landscape: mathematical analyses. Evol. Comput. 7, 275–310. Szendro, I.G., Schenk, M.F., Franke, J., Krug, J., de Visser, J.A.G.M., 2013. Quantitative analyses of empirical fitness landscapes. J. Stat. Mech. 1, P01005. Voigt, C.A., Kauffman, S., Wang, Z.G., 2000. Rational evolutionary design: the theory of in vitro protein evolution. Adv. Protein Chem. 55, 79–160. Wahl, L.M., Krakauer, D.C., 2000. Models of experimental evolution: the role of genetic chance and selective necessity. Genetics 156, 1437–1448. Weinberger, E.D., 1991. Local properties of Kauffman's N-k model: a tunably rugged energy landscape. Phys. Rev. A 44, 6399–6413.

Wilke, C.O., Wang, J.L., Ofria, C., Lenski, R.E., Adami, C., 2001. Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature 412, 331–333. Wilke, C.O., 2004. The speed of adaptation in large asexual populations. Genetics 167, 2045–2053. Wiser, M.J., Ribeck, N., Lenski, R.E., 2013. Long-term dynamics of adaptation in asexual populations. Science 342, 1364–1367. Woodcock, G., Higgs, P.G., 1996. Population evolution on a multiplicative singlepeak fitness landscape. J. Theor. Biol. 179, 61–73.