
A novel fitness allocation algorithm for maintaining a constant selective pressure during GA procedure

Li Feng Zhang, Chen Xi Zhou, Rong He, Yuan Xu, Meng Ling Yan

School of Information, Renmin University of China, 59 Zhongguancun Street, Haidian, Beijing 100872, PR China

Article info

Abstract

Article history: Received 7 April 2012; received in revised form 22 June 2012; accepted 4 July 2012; available online 12 August 2014.

During the evolution procedure of a GA, the fitness distribution of the population is always unforeseeable, since it varies with many factors such as the nonlinearity and multimodality of the optimization problem, the crossover and mutation algorithms, and the progress of the evolution procedure. For GAs using stochastic selection mechanisms, the fitness distribution sometimes significantly impacts the performance of the selection process, making it very ineffective at protecting superior individuals and suppressing inferior ones. In this study, a new fitness scaling method, named powered distance sums scaling (PDSS), is proposed to eliminate the influence of the fitness distribution on stochastic selection. Unlike previous approaches, the new method uses the powered sums of fitness distances, instead of either raw fitness values or ranking numbers, to compute the scaled fitness. It maintains a much more constant and consistent selective pressure under different conditions of the optimization problem and the GA algorithm design, and may help GA designers balance exploration and exploitation during evolution procedures. Empirical studies illustrate that, through the new scaling technique, the convergence speed of GA search becomes more controllable.

Keywords: Genetic algorithm; Fitness scaling; Fitness distribution; Stochastic selection; Selective pressure; Survival probability

1. Introduction

GA was essentially developed from the fundamental principle of Darwinian natural selection that a fitter individual has more chances to survive and produce offspring. It has been widely applied to real number optimization problems such as artificial neural network (ANN) training [1,2], and to combinatorial optimization such as the vehicle routing problem (VRP) [3]. In the GA procedure, the selection operation, which samples a population to produce copies for the new generation according to the individuals' fitness, provides the driving force of GA search and ensures a desirable evolutionary direction. Generally, there are three types of selection mechanisms: stochastic selection, deterministic selection, and mixed selection. In stochastic selection, the selective pressure, which is determined by how survival probability is allocated to each individual, is a critical issue and influences the effectiveness of GA search to a great extent. Adjusting the selective pressure is, in some sense, a matter of striking a balance between taking the risk of premature convergence and making the search ineffective. The most basic and straightforward methods for transforming raw fitness into survival probability (selective probability) are proportional selection and rank selection [4].


Proportional selection determines the probability of each individual in proportion to its raw fitness, whereas rank selection uses ranking numbers in place of the actual raw fitness values. Unfortunately, neither raw fitness values nor ranking numbers can comprehensively indicate the status of an individual within its population. In proportional selection, the survival probability of each individual is determined without adequately considering the other individuals. In rank selection, the scatter of the fitness values does not affect the sampling process at all. The selective pressure arising from these two methods may therefore vary with the fitness distribution of the population [5-7]; in particular, they sometimes cannot provide sufficient protection to the individuals who need to be protected. During the evolution procedure, the fitness distribution is always unforeseeable and varies with a number of factors. First, the landscape of the optimization problem determines the spread of fitness values to a great extent. Second, crossover and mutation operators can be viewed as special types of neighbourhood search; different crossover and mutation algorithms, as well as different parameter settings, imply different properties of the neighbourhood search and hence different distributions of offspring individuals. Finally, the fitness distribution of the population is also affected by random elements, and obviously varies with the progress of the evolution procedure. Therefore, neither proportional selection nor rank selection can give constant and consistent performance under different situations of optimization problem and GA design. In the last several decades, a number of fitness scaling methods have been proposed and widely applied in real applications [8-11].


These methods are developed based on the same ideas as proportional selection and rank selection, i.e., using either the actual fitness value or the ranking number to compute survival probability, but adopting different linear or nonlinear functions to transform raw fitness into scaled fitness in order to achieve an adjustable selective pressure. Some of these methods adaptively normalize raw fitness values by using their mean, standard deviation, and other statistics to counter the variation of the population distribution. For example, dynamic linear scaling simply subtracts the worst value from the raw fitness to ensure positive scaled fitness values [12]. The windowing technique introduces a moving baseline into fitness proportional selection, subtracting the worst value observed in recent generations from the raw fitness to maintain a more constant selective pressure [4]. In Sigma truncation, a constant which is a linear function of the mean and standard deviation of the fitness is subtracted from the raw fitness values in order to suppress the inferior individuals that would make the scatter of raw fitness very large and thereby reduce the selective pressure [13]. The adaptive span scaling technique also scales raw fitness in terms of the mean and standard deviation of the fitness, but through a nonlinear scaling function [14,9]. However, these statistics obviously cannot adequately characterize the scatter of the raw fitness, so all these scaling techniques still cannot completely eliminate the influence of the fitness distribution on the selection process; all previous methods share the same problem to some extent. To overcome this problem, a new fitness scaling algorithm named powered distance sums scaling (PDSS) is proposed in this study. The new method scales raw fitness values by using the powered sums of the raw fitness distances from the target individual to all the other individuals. In other words, it involves the entire fitness spread of the population in each survival probability computation. Compared to previous approaches, PDSS reduces the impact of the fitness distribution of the population and provides a more constant selective pressure to the stochastic selection process. Furthermore, the selective pressure arising from PDSS can be easily and effectively adjusted by setting different values of the scaling parameter, to suit different optimization problems and GA designs.

The present study is organized as follows. Section 2 gives a brief introduction to stochastic selection, selective pressure, and previous fitness scaling methods. Section 3 formulates the problem, and simulated examples are employed to illustrate the influence of the fitness distribution on the effectiveness of stochastic selection. In Section 4, a new fitness scaling method is proposed to overcome the problem. In Section 5, representative case studies are selected to demonstrate the new method, and both a real coded GA for unconstrained optimization and an integer coded GA for TSP optimization are implemented to extensively test the performance of PDSS. Finally, Section 6 draws conclusions to summarize the study.

2. Stochastic selection and fitness scaling

2.1. Stochastic selection and selective pressure

Selection mechanisms of GA can be classified into three types: stochastic selection, deterministic selection, and mixed selection [4,15]. In deterministic and mixed selection mechanisms, such as truncation selection, steady state selection, and tournament selection, fitter individuals are deterministically picked from a population or a set of individuals and passed on to the new generation. For these methods, fitness scaling techniques obviously need not be considered. In contrast, stochastic selection mechanisms probabilistically select individuals to form the new generation; roulette wheel selection, for example, is one of the most famous selection methods and is adopted by many researchers as the first choice. In this kind of selection mechanism, how to allocate survival probability to each individual is a critical issue that significantly impacts the performance of GA search. In a GA, a population of individuals represents a set of solutions to the optimization problem, and the evaluation operator simply converts their objective function values into raw fitness values $f_i$. The survival probability is then determined from $f_i$. The simplest and most straightforward approach, called proportional selection, is formulated as follows:

$P_i = \dfrac{f_i}{\sum_{j=1}^{N} f_j}$  (1)

where $P_i$ denotes the survival probability of individual $i$, and $N$ is the size of the population. Another popular method, called rank selection, uses the ranking number of each individual to determine its survival probability rather than the actual fitness value. The survival probability $P_k$ for the $k$th individual in the ranking of the population is computed as follows:

$P_k = P_{\max} - (k-1)\,r$  (2)

where $P_{\max}$ is the probability for the best individual. Letting $0 \le P_{\min} \le 1/N$ denote the probability for the worst individual, $r$ is then defined as follows:

$r = \dfrac{P_{\max} - P_{\min}}{N-1}$  (3)

The selection process provides the driving force in GA search, and the power of this driving force is commonly characterized as the selective pressure, i.e., the extent to which superior individuals get more chances to be selected for producing offspring. A possible way to analyze the selective pressure arising from a selection method is to investigate the expected number of copies of each individual that will be reproduced in the new generation. For roulette wheel selection, the expected copy numbers of an individual under proportional selection and rank selection are given by (4) and (5), respectively:

$E(N_i)_P = N P_i = \dfrac{N f_i}{\sum_{j=1}^{N} f_j}$  (4)

$E(N_k)_R = N P_k = N\left(P_{\max} - \dfrac{(P_{\max} - P_{\min})(k-1)}{N-1}\right)$  (5)

Based on this concept, the expected distribution of the new generation can also be observed with the use of a histogram. This method will be applied in the following sections to illustrate the effectiveness of the new fitness scaling method.
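As a concrete illustration of (1)-(5), the short Python sketch below computes the survival probabilities and expected copy numbers for an arbitrary raw fitness vector. The function names, the use of NumPy, and the normalization P_max = 2/N - P_min (which follows from requiring the rank probabilities to sum to one) are our own choices for illustration and are not prescribed by the text above.

import numpy as np

def proportional_probabilities(f):
    """Eq. (1): P_i = f_i / sum_j f_j (raw fitness assumed positive)."""
    f = np.asarray(f, dtype=float)
    return f / f.sum()

def rank_probabilities(f, p_min=0.0):
    """Eqs. (2)-(3): the k-th ranked individual receives P_max - (k - 1) * r."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    p_max = 2.0 / n - p_min              # assumption: probabilities must sum to one
    r = (p_max - p_min) / (n - 1)
    order = np.argsort(-f)               # rank 1 = largest raw fitness
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(1, n + 1)
    return p_max - (ranks - 1) * r

def expected_copies(probabilities):
    """Eqs. (4)-(5): expected copy number E(N_i) = N * P_i under roulette wheel selection."""
    p = np.asarray(probabilities, dtype=float)
    return len(p) * p

# example population: one superior individual, three moderate ones, one inferior one
fitness = np.array([9.5, 3.2, 3.1, 3.0, 0.4])
print(expected_copies(proportional_probabilities(fitness)))
print(expected_copies(rank_probabilities(fitness, p_min=0.0)))

Histograms of such expected copy numbers are what the expected distributions reported later (Figs. 3 and 4) show for the simulated populations.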

2.2. Fitness scaling techniques

In the last several decades, a number of fitness scaling methods have been proposed for the sake of statically or dynamically adjusting the selective pressure of stochastic selection. Some well-recognized fitness scaling techniques and recent developments are summarized in Table 1. As can be seen from the table, most previous fitness scaling approaches are monotone, i.e., in these methods a better fitted individual will definitely have a larger scaled fitness value. This kind of method performs very well in most applications.

Table 1. Fitness scaling methods.

Fitness value based methods:
  Non-adaptive linear or nonlinear mapping: Linear scaling [8]; Power law scaling [13]; Logarithmic scaling [12]; Relative fitness scaling [16]
  Adaptive to population: Dynamic linear scaling [12]; Sigma truncation [13]; Windowing [4]; Normalizing [17]; Exponential transformation with median [18]; Adaptive span scaling [14,9]
  Adaptive to generation: Boltzmann selection [19]; Standard deviation schedule based Boltzmann selection [20]
  Non-monotone: Averaged non-monotone scaling [21]; Adjustable non-monotone scaling [22]

Fitness ranking based methods:
  Non-adaptive linear or nonlinear mapping: Linear rank scaling [23]; Exponential rank scaling [24]; Probabilistic nonlinear rank scaling [25]; Stabilizing scaling [26]
  Adaptive to generation: Transform ranking [27]

In the fitness value based approaches, the actual value of the raw fitness fatefully determines how the scaled fitness and survival probabilities scatter. In the ranking based methods, ranking numbers are used in place of the actual fitness values in order to restrain individuals with extreme values. In both families, raw fitness is transformed into scaled fitness by different linear or nonlinear scaling functions, and the selective pressure can be adjusted by tuning the scaling parameters in these functions. To obtain an adaptive capability against variations of the fitness distribution, some of these methods, such as dynamic linear scaling, Sigma truncation, and windowing, normalize the raw fitness values by using statistics of the current or recent population such as the mean, maximum, minimum, and standard deviation. Some other methods, such as Boltzmann selection, dynamically change the scaling parameters as the number of generations increases, in order to adjust the selective pressure according to the progress of the evolutionary search. Non-monotone scaling methods can be viewed as a kind of disruptive selection and tend to penalize individuals with moderate fitness values; they provide outstanding performance for some special problems, such as the needle-in-a-haystack problem, where the global optima are surrounded by the worst solutions.
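To make the fitness value based family concrete, the sketch below gives minimal versions of three of the techniques listed in Table 1. The exact formulations differ between the cited sources, so these short functions should be read as illustrative approximations under our own parameter choices, not as the reference implementations of [12,13].

import numpy as np

def linear_scaling(f, a=1.0, b=0.0):
    """Static linear scaling: f' = a * f + b, with designer-chosen coefficients."""
    return a * np.asarray(f, dtype=float) + b

def dynamic_linear_scaling(f):
    """Dynamic linear scaling in the spirit of [12]: subtract the current worst raw
    fitness so that all scaled values are non-negative."""
    f = np.asarray(f, dtype=float)
    return f - f.min()

def sigma_truncation(f, c=2.0):
    """Sigma truncation in the spirit of [13]: f' = max(f - (mean - c * std), 0)."""
    f = np.asarray(f, dtype=float)
    return np.maximum(f - (f.mean() - c * f.std()), 0.0)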

3. Problem formulation

3.1. The impact of fitness distribution on selective pressure

As mentioned above, all the previous fitness scaling methods can be classified into two types. The most basic and typical approaches are proportional selection and rank selection. Unfortunately, both of them have drawbacks, because neither the actual fitness value nor the ranking number can comprehensively describe the situation of an individual in its population. The distribution of the population may therefore sometimes significantly impact the performance of stochastic selection. Furthermore, some of the previous scaling techniques adaptively adjust the scaling parameters by means of basic statistics of the current or recent populations. These statistics, however, are unable to fully characterize the distribution of the population, so such techniques still cannot effectively eliminate the influence of the population's distribution on the stochastic selection process. That is to say, all the previous scaling approaches share the same problem to some extent.

Fig. 1. Fitness plot for situation 1.

Fig. 2. Fitness plot for situation 2.

To simplify the argument without loss of generality, only proportional selection incorporated with dynamic linear scaling, and rank selection, are considered in this study to formulate the problem and to draw comparisons with the new scaling method. Two typical situations, which often arise at the early and later stages of the evolution procedure respectively, are discussed below.

Situation 1: As shown in Fig. 1, when a few super best individuals appear, proportional selection allocates comparatively higher survival probabilities to these individuals according to their excellent raw fitness values. These super individuals will thus produce more copies, so that the GA can immediately search around their neighbourhood regions. Rank selection, however, cannot recognize this kind of super best individual and cannot give them more chances to survive; such individuals may be omitted in the stochastic selection process with considerable probability. In this case, rank selection is much less effective than proportional selection. Moreover, when super worst individuals appear, proportional selection also displays a weak selective pressure: the advantage of the best individuals in fitness values is significantly weakened, since most moderate individuals also look excellent compared to the super worst individuals.

Situation 2: As shown in Fig. 2, when the GA is approaching convergence and most individuals have already assembled around an optimum, the best individual is only slightly above average. Proportional selection fails to differentiate the individuals in such a population, since their raw fitness values are very close. In addition, extreme values of inferior individuals may further reduce the selective pressure and make the GA become almost a random search around the optimum.


Fig. 3. (a)–(d) The fitness distribution of four simulated populations; (a1)–(d1) the corresponding expected distributions obtained from using (1); and (a2)–(d2) the corresponding expected distribution obtained from using (2) and (3).

Fig. 4. Expected distributions of the new populations; (a1)-(d1) and (a2)-(d2) are respectively obtained from using PDSS with α = 2 and α = 0.5.


Under this condition, rank selection still provides the best individual with a considerably large survival probability, and lets the GA perform fast exploitation. Contrary to situation 1, rank selection is more effective than proportional selection here.

The raw fitness distribution is always unforeseeable and varies with a number of factors [28]. Therefore, neither proportional selection nor rank selection can give constant and consistent performance under different optimization problems and GA algorithm designs, which is definitely undesirable.


3.2. Simulated examples

To demonstrate this problem, four simulated populations of size 100 were generated with different raw fitness distributions, as depicted in Fig. 3(a)-(d). Fig. 3(a1)-(d1) and (a2)-(d2) respectively present the expected distributions of the new populations obtained from using (1)-(3). To maximize the spread of survival probability, the probability for the worst individual in each population was fixed to zero by using the linearly scaled fitness $f'_i = f_i - f_{\min}$ to replace $f_i$ in (1), and by setting $P_{\min} = 0$ in (2). As shown in Fig. 3(a), (a1), and (a2), when super excellent individuals appear, proportional selection displays a very high selective pressure, as all the superior individuals receive large numbers of copies; rank selection fails to provide sufficient protection to this kind of superior individual. For the population given in Fig. 3(b), proportional selection becomes very ineffective, as the distributions shown in Fig. 3(b) and (b1) are almost the same; in contrast, rank selection provides a higher selective pressure in this situation. In Fig. 3(c1) and (c2), both proportional selection and rank selection fail to produce more copies for the superior individuals when super poor individuals appear. Similarly, in Fig. 3(d1) and (d2), neither of the two selection methods can provide sufficient protection to the best individual, which stands out alone from the crowd.

4. New fitness scaling method

4.1. Powered distance sums scaling (PDSS)

In the present study, a new fitness scaling method is proposed to overcome the aforementioned problem. The new method, named powered distance sums scaling (PDSS), is formulated as follows:

$f_i^{\circ} = \Big(\sum_{f_j \in f_i^-} (f_i - f_j)^{\alpha}\Big)^{1/\alpha} - \Big(\sum_{f_j \in f_i^+} (f_j - f_i)^{\alpha}\Big)^{1/\alpha}$  (6)

where $f_i^-$ and $f_i^+$ denote the raw fitness sets defined as $f_i^- = \{f_j \mid f_j < f_i,\ f_j \in \mathbf{f}\}$ and $f_i^+ = \{f_j \mid f_j > f_i,\ f_j \in \mathbf{f}\}$, and $\alpha > 0$ is a parameter for adjusting the spread of the scaled fitness. By increasing $\alpha$, the scaled fitness values of the superior individuals are enlarged and spread out, so they take up more of the survival opportunities and the selective pressure of PDSS is enhanced.
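A direct transcription of (6) into Python is given below; the helper is our own sketch, and ties (f_j = f_i) are excluded from both sums exactly as in the definitions of f_i^- and f_i^+ above.

import numpy as np

def pdss_scaled_fitness(f, alpha=2.0):
    """Powered distance sums scaling, Eq. (6)."""
    f = np.asarray(f, dtype=float)
    scaled = np.empty_like(f)
    for i, fi in enumerate(f):
        lower = f[f < fi]                                  # the set f_i^-
        upper = f[f > fi]                                  # the set f_i^+
        down = ((fi - lower) ** alpha).sum() ** (1.0 / alpha) if lower.size else 0.0
        up = ((upper - fi) ** alpha).sum() ** (1.0 / alpha) if upper.size else 0.0
        scaled[i] = down - up
    return scaled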

Fig. 5. Average objective function values of the best solutions obtained in 50 runs and sampled along the evolutionary processes for St70 problem.


Conversely, by decreasing $\alpha$, inferior individuals have more chances to survive and the selective pressure is reduced. The scaled fitness $f_i^{\circ}$ is then used to allocate survival probability as follows:

$P_i = \begin{cases} \dfrac{1}{N}, & f_{best}^{\circ} = f_{worst}^{\circ} \\ \dfrac{(1 - N P_{\min})\,(f_i^{\circ} - f_{worst}^{\circ})}{\sum_{j=1}^{N} (f_j^{\circ} - f_{worst}^{\circ})} + P_{\min}, & \text{otherwise} \end{cases}$  (7)

where $f_{worst}^{\circ}$ and $f_{best}^{\circ}$ denote the minimum and maximum values of $f^{\circ}$, respectively. $P_{\min}$ in (7) is defined in the same way as in (3) and needs to be fixed by the GA user. A smaller $P_{\min}$ results in a larger selective pressure: when $P_{\min} = 0$ the survival probabilities are most scattered, and when $P_{\min} = 1/N$ all individuals have the same chance of being selected for the new population.
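Continuing the sketch above, (7) maps the scaled values to survival probabilities; the degenerate branch covers the case in which every scaled value coincides. The helper below takes the already scaled fitness as input, so it can be combined with pdss_scaled_fitness from the previous sketch.

import numpy as np

def pdss_probabilities(scaled, p_min=0.0):
    """Eq. (7): allocate survival probability from PDSS-scaled fitness values."""
    s = np.asarray(scaled, dtype=float)
    n = len(s)
    shifted = s - s.min()
    if shifted.sum() == 0.0:                  # f_best = f_worst: every individual gets 1/N
        return np.full(n, 1.0 / n)
    return (1.0 - n * p_min) * shifted / shifted.sum() + p_min

For any input the returned vector sums to one, which is exactly the property proved next.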

In addition, $\sum_{i=1}^{N} P_i = 1$, which can be proved as follows.

Proof.

$\sum_{i=1}^{N} P_i = \sum_{i=1}^{N} \left( \dfrac{(1 - N P_{\min})(f_i^{\circ} - f_{worst}^{\circ})}{\sum_{j=1}^{N} (f_j^{\circ} - f_{worst}^{\circ})} + P_{\min} \right) = \dfrac{\sum_{i=1}^{N}(f_i^{\circ} - f_{worst}^{\circ}) - N P_{\min}\sum_{i=1}^{N}(f_i^{\circ} - f_{worst}^{\circ}) + \sum_{i=1}^{N} P_{\min} \sum_{j=1}^{N}(f_j^{\circ} - f_{worst}^{\circ})}{\sum_{j=1}^{N}(f_j^{\circ} - f_{worst}^{\circ})} = \dfrac{\sum_{i=1}^{N}(f_i^{\circ} - f_{worst}^{\circ})}{\sum_{j=1}^{N}(f_j^{\circ} - f_{worst}^{\circ})} = 1$  (8)

4.2. The properties of PDSS with different values of α

The properties of PDSS with different values of α are summarized as follows.

Proposition 1. When $\alpha = 1$, $f_i^{\circ} = N f_i - \sum_{j=1}^{N} f_j$. Since $N$ and $\sum_{j=1}^{N} f_j$ are constants, $f_i^{\circ}$ is simply a linear scaling of $f_i$. It can be proved as follows.

Proof. Let $f_i^{=} = \{f_j \mid f_j = f_i,\ f_j \in \mathbf{f}\}$; then $f_i^{+} \cup f_i^{-} \cup f_i^{=} = \mathbf{f}$, and

$f_i^{\circ} = \sum_{f_j \in f_i^-}(f_i - f_j) - \sum_{f_j \in f_i^+}(f_j - f_i) + \sum_{f_j \in f_i^=}(f_i - f_j) = N f_i - \sum_{f_j \in f_i^+} f_j - \sum_{f_j \in f_i^-} f_j - \sum_{f_j \in f_i^=} f_j = N f_i - \sum_{f_j \in \mathbf{f}} f_j$  (9)

Remark 1. When $\alpha > 1$, PDSS maintains a larger selective pressure than proportional selection.

Consider the two situations mentioned in Section 3.1 again. When the raw fitness values of the superior individuals are widely spread, PDSS allocates survival probabilities mostly according to their raw fitness; super excellent individuals then receive much larger survival probabilities than the others. When the raw fitness values of the superior individuals are very close, the new method provides a comparatively wide and even spread of scaled fitness values, primarily according to their ranking numbers.

Fig. 6. Average objective function values of the best solutions obtained in 50 runs and sampled along the evolutionary processes for Berlin52 problem.


This ensures that the individuals in the first rank have large survival opportunities even though their raw fitness values are only slightly better than the others.

Proposition 2. When $\alpha \to \infty$, the selection pressure of PDSS is maximized: the best individuals occupy the entire assignable survival probability, and all the other individuals have the same survival probability $P_{\min}$. It is formulated as follows:

$\lim_{\alpha\to\infty} P_{best} = \dfrac{1 - N P_{\min}}{N_{best}} + P_{\min}, \qquad \lim_{\alpha\to\infty} P_{nonbest} = P_{\min}$  (10)

where $P_{best}$ and $P_{nonbest}$ respectively denote the survival probability of the individuals who have the best raw fitness value and of all the other individuals in the population, and $N_{best} \ge 1$ is the number of best individuals. (10) is proved in the Appendix.

Remark 2. When $\alpha < 1$, PDSS maintains a weaker selective pressure than proportional selection. When the raw fitness values of the superior individuals are widely spread, the new method behaves quite similarly to rank selection. When the raw fitness values of the superior individuals are very close and only slightly above average, PDSS assigns survival probability primarily according to their raw fitness values.

Proposition 3. When $\alpha \to 0$, the selection pressure of PDSS is minimized, as all the individuals except the worst ones are allocated the same survival probability. It is formulated as follows:

$\lim_{\alpha\to 0} P_{nonworst} = \dfrac{1 - N_{worst} P_{\min}}{N - N_{worst}}$  (11)

where $P_{nonworst}$ denotes the survival probability of all the individuals except the worst ones, and $N_{worst}$ is the number of worst individuals. (11) is also proved in the Appendix.

4.3. The performance of PDSS under different fitness distributions

Consider the simulated populations given in Section 3 again. The expected distributions of the new populations obtained from using PDSS with α = 2 and α = 0.5 are shown in Fig. 4(a1)-(d1) and (a2)-(d2), respectively. As shown in Fig. 4(a1)-(d1), when α = 2 the new method maintains a large selective pressure in all four examples. Conversely, as shown in Fig. 4(a2)-(d2), PDSS always displays a weak selective pressure when α = 0.5.
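Reusing the two helpers sketched in Section 4 (both of them our own illustrative code rather than the authors' implementation), the snippet below reproduces the qualitative behaviour described above on a small artificial population containing one super best individual: for this particular population the expected copy count of the best individual grows as α is increased, while a smaller α flattens the allocation.

import numpy as np
# pdss_scaled_fitness() and pdss_probabilities() are the helpers sketched in Section 4

fitness = np.array([10.0, 2.0, 1.9, 1.8, 1.7, 1.0])
for alpha in (0.5, 2.0):
    p = pdss_probabilities(pdss_scaled_fitness(fitness, alpha), p_min=0.0)
    print(alpha, np.round(len(fitness) * p, 2))    # expected copy numbers, cf. Figs. 3 and 4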

5. Empirical studies

In the present study, empirical studies on real number and combinatorial optimization were performed to illustrate the effectiveness of PDSS. In these experiments, the new scaling method was compared with proportional selection and rank selection. The scaling parameter α was set to 2 and 4, respectively, and $P_{\min}$ was fixed to 0.

Fig. 7. Average objective function values of the best solutions obtained in 50 runs and sampled along the evolutionary processes for Kroc100 problem.


PDSS with α ≤ 1 has not been considered in this study because, if it were used, the convergence speed of the GA would become very slow. Different crossover algorithms and different GA parameter settings were taken into consideration in order to induce various fitness distributions during the evolution procedures. It is noted that no local optimization technique was applied in these experiments, so that the GA processes were driven only by stochastic selection. In addition, 50 independent runs were carried out for each example. The most popular stochastic selection mechanism, roulette wheel selection, was adopted here together with a regular GA procedure, and elite preservation with an elitism size of one was also employed (a minimal sketch of this selection step is given after the results discussion below).

5.1. Integer coded GA for TSP

In this subsection, a series of experiments on an integer coded GA for solving the well-known travelling salesman problem (TSP) was conducted to demonstrate the consistency of the performance of PDSS under different conditions of GA algorithm design. For this kind of optimization problem, the design of the crossover and mutation operators is always highly problem-dependent and sometimes may fatefully affect the performance of GA search. In this study, four crossover algorithms, namely partial-mapped crossover (PMX), position-based crossover (PBX), cycle crossover (CX), and sub-tour exchange crossover (SEX), were adopted separately to extensively test the performance of each fitness computation scheme. Three benchmark problems, Berlin52, St70, and KroC100, were selected from the TSPLIB website (http://elib.zib.de/pub/Packages/mp-testdata/tsp/tsplib/tsp/). The maximum generation number for the first two problems was set to 10,000, whereas that for the last problem was set to 20,000.

Figs. 5-7 depict the average objective function values of the 50 best solutions sampled along the evolution procedures. Table 2 presents the statistical results of the three examples, including the minimum and average objective function values of the best solutions obtained from the last generations of the 50 runs, as well as the mean and standard deviation of the minimum and average objective function values across the four crossover methods. $f_B(x^*)$ to $f_K(x^*)$ are the best results reported so far on the TSPLIB website.

Firstly, consider the convergence speed of GA search with proportional selection and rank selection. All the figures clearly suggest that for the GAs using PMX, these two methods always display very similar and quite slow convergence speeds. When PBX is applied, rank selection shows a faster convergence than proportional selection at the early stage of the evolution procedure, but becomes slower at a later stage, particularly for St70. When CX is adopted, rank selection shows a much faster convergence speed than proportional selection. When SEX is applied, rank selection also displays a faster convergence, but the difference from proportional selection is very small. In other words, for rank selection, GA processes using CX and SEX converge much faster than those using PMX and PBX. On the other hand, proportional selection provides an effective driving force to GA search only if SEX is applied. The performances of these two methods are thus very inconsistent across crossover algorithms.

Secondly, consider the performance of PDSS. All the figures clearly indicate that PDSS, whether with α = 4 or α = 2, always keeps a faster convergence speed than both proportional selection and rank selection; in some cases it accelerates GA search almost ten times relative to the other two. Consider the performance of PDSS with α = 4. For the Berlin52 problem, GA processes converge between 1000 and 2000 generations whichever crossover algorithm is adopted. For St70, they all converge at around 3000 generations. For KroC100, PDSS exhibits a convergence between 5000 and 10,000 generations, except that when PMX is adopted GA search converges between 10,000 and 20,000 generations.
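For completeness, the sketch below shows how the selection step described at the beginning of this section can be wired together: roulette wheel sampling of the PDSS probabilities plus an elite of size one. It again reuses the helpers from Section 4 (our own illustrative code), and the problem-specific crossover and mutation operators are deliberately left out, since they differ between the TSP and real coded experiments.

import numpy as np
# pdss_scaled_fitness() and pdss_probabilities() are the helpers sketched in Section 4

def select_next_population(population, fitness, alpha, rng, p_min=0.0):
    """One selection step: roulette wheel sampling of the PDSS probabilities,
    with the single best individual copied over unchanged (elitism of size one).
    Crossover and mutation are intentionally omitted from this sketch."""
    fitness = np.asarray(fitness, dtype=float)
    probs = pdss_probabilities(pdss_scaled_fitness(fitness, alpha), p_min)
    idx = rng.choice(len(population), size=len(population), p=probs)   # roulette wheel
    new_population = [population[i] for i in idx]
    new_population[0] = population[int(np.argmax(fitness))]            # preserve the elite
    return new_population

# example: select_next_population(tours, tour_fitness, alpha=4.0, rng=np.random.default_rng(0))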

Table 2. Statistical results for the 50 best solutions obtained at the last generation.

Berlin52 (f_B(x*) = 7542)

Minimum objective function value of the 50 best solutions
  Scaling method   PMX        PBX        CX         SEX        Mean       STD
  Proportional     7802       7542       7743       7542       7657.25    135.24
  Rank             8007       7627       7808       7596       7759.50    189.65
  PDSS(α=2)        7542       7542       7683       7542       7577.25    70.50
  PDSS(α=4)        7542       7542       7757       7542       7595.75    107.50

Average objective function value of the 50 best solutions
  Proportional     8640.88    8092.76    8334.94    7955.74    8256.08    300.65
  Rank             8610.78    8171.50    8269.94    8160.30    8303.13    210.93
  PDSS(α=2)        7838.14    8081.92    8268.22    7906.64    8023.73    192.63
  PDSS(α=4)        7830.18    8116.20    8325.14    8081.22    8088.19    202.93

St70 (f_S(x*) = 675)

Minimum objective function value of the 50 best solutions
  Proportional     837        709        690        680        729.00     73.00
  Rank             821        701        685        684        722.75     65.96
  PDSS(α=2)        675        689        697        680        685.25     9.74
  PDSS(α=4)        675        687        687        676        681.25     6.65

Average objective function value of the 50 best solutions
  Proportional     888.90     746.42     761.36     707.60     776.07     78.56
  Rank             898.72     784.34     730.20     722.26     783.88     81.38
  PDSS(α=2)        700.22     730.38     736.06     701.72     717.10     18.77
  PDSS(α=4)        690.74     723.64     729.50     719.04     715.73     17.20

KroC100 (f_K(x*) = 20,749)

Minimum objective function value of the 50 best solutions
  Proportional     29,785     24,540     21,707     20,890     24,230.50  4019.75
  Rank             30,123     21,384     21,339     20,895     23,435.25  4463.96
  PDSS(α=2)        21,211     21,410     21,180     20,908     21,177.25  206.41
  PDSS(α=4)        20,819     21,377     21,609     21,016     21,205.25  354.74

Average objective function value of the 50 best solutions
  Proportional     32,597.42  27,876.62  23,388.56  22,022.12  26,471.18  4788.97
  Rank             31,916.70  26,200.56  22,623.02  22,735.44  25,868.93  4360.44
  PDSS(α=2)        22,347.82  23,449.88  22,937.44  21,986.02  22,680.29  645.74
  PDSS(α=4)        21,561.56  22,547.62  22,948.18  22,233.36  22,322.68  585.70


When α = 2, the convergence speeds displayed by PDSS are a little lower and slightly less consistent than when α = 4. However, it still provides a much more constant and effective performance than both proportional selection and rank selection.

Finally, consider the quality of the optimization results. In Table 2, the best value in each of the 36 sub-columns can be identified; 33 of these best values are obtained with PDSS, and in the other 3 cases PDSS also produces very good results. In particular, the last two columns of the table clearly suggest that PDSS maintains a much more constant and higher-quality optimization performance under the various crossover algorithms, because all the best values in these columns are obtained with PDSS. This means that the new method could be a good choice for GA designers who do not have enough prior knowledge of the performance of their GA algorithm design.

Fig. 8. Average objective function values of the best solutions obtained in 50 runs and sampled along the evolutionary processes for Rosenbrock's function.

Fig. 9. Average objective function values of the best solutions obtained in 50 runs and sampled along the evolutionary processes for Rastrigin's function.


5.2. Real coded GA for unconstrained optimization

In this subsection, empirical studies on a real coded GA for unconstrained optimization were performed to further demonstrate the effectiveness of PDSS in dealing with multimodal problems. Convex crossover and Gaussian mutation with a standard deviation of one were employed (a sketch of these two operators is given below). Owing to the multimodality and nonlinearity of the test functions, it is very challenging to suggest common values of the GA parameters for the entire test suite. Hence, three combinations of crossover and mutation rates were applied in these experiments to examine the performance of each fitness computation scheme under ordinary parameter settings, enhanced crossover, and enhanced mutation.

Fig. 10. Average objective function values of the best solutions obtained in 50 runs and sampled along the evolutionary processes for Ackley's function.

Fig. 11. Average objective function values of the best solutions obtained in 50 runs and sampled along the evolutionary processes for Griewank's function.
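The two real coded operators mentioned above can be written down compactly. The uniform choice of the crossover weight and the clipping of mutated offspring to the search box are our own assumptions for illustration; the paper does not spell these details out.

import numpy as np

def convex_crossover(x, y, rng):
    """Convex (arithmetic) crossover: a random convex combination of two parent vectors."""
    lam = rng.uniform(0.0, 1.0)
    return lam * x + (1.0 - lam) * y

def gaussian_mutation(x, rng, sigma=1.0, low=None, high=None):
    """Gaussian mutation with standard deviation one, as used in this subsection."""
    child = x + rng.normal(0.0, sigma, size=x.shape)
    if low is not None and high is not None:
        child = np.clip(child, low, high)       # keep the offspring inside the search box
    return child

# example: offspring = gaussian_mutation(convex_crossover(p1, p2, rng), rng, low=-30.0, high=30.0)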


Table 3. Benchmark test functions.

Rosenbrock's function: $f_1(x) = \sum_{i=1}^{n-1} \big(100\,(x_i^2 - x_{i+1})^2 + (1 - x_i)^2\big)$, $-30 < x_i < 30$, $f_1(x^*) = 0$.

Rastrigin's function: $f_2(x) = \sum_{i=1}^{n} \big(x_i^2 - 10\cos(2\pi x_i) + 10\big)$, $-5.12 < x_i < 5.12$, $f_2(x^*) = 0$.

Ackley's function: $f_3(x) = -20\exp\Big(-0.2\sqrt{\tfrac{1}{n}\sum_{i=1}^{n} x_i^2}\Big) - \exp\Big(\tfrac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\Big) + 20 + e$, $-32 < x_i < 32$, $f_3(x^*) = 0$.

Griewank's function: $f_4(x) = \sum_{i=1}^{n} \dfrac{x_i^2}{4000} - \prod_{i=1}^{n} \cos\Big(\dfrac{x_i}{\sqrt{i}}\Big) + 1$, $-600 < x_i < 600$, $f_4(x^*) = 0$.
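For reference, the four benchmark functions of Table 3 translate directly into NumPy; x is a vector of length n, and these definitions follow the formulas of the table.

import numpy as np

def rosenbrock(x):
    return np.sum(100.0 * (x[:-1] ** 2 - x[1:]) ** 2 + (1.0 - x[:-1]) ** 2)

def rastrigin(x):
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)

def ackley(x):
    n = len(x)
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / n))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / n) + 20.0 + np.e)

def griewank(x):
    i = np.arange(1, len(x) + 1)
    return np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0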

Table 3 gives the four test functions, and $f(x^*)$ denotes the global minimum of each function. The dimension of these functions was set to n = 30, and the maximum generation number was fixed at 20,000. Figs. 8-11 show the average evolutionary trends of the 50 runs, and Table 4 gives the minimum and average function values of the 50 best solutions obtained at the last generation.

Table 4. Statistical results for the 50 best solutions obtained at the last generation.

f_1 (Rosenbrock's function)

Minimum objective function value of the 50 best solutions
  Scaling method   γc=0.6, γm=0.1   γc=0.8, γm=0.05   γc=0.3, γm=0.4   Mean        STD
  Proportional     2.26×10^1        1.40×10^1         1.37×10^1        1.67×10^1   5.03×10^0
  Rank             2.14×10^-1       1.59×10^0         3.43×10^-1       7.17×10^-1  7.62×10^-1
  PDSS(α=2)        7.21×10^-2       1.16×10^-2        3.85×10^-1       1.56×10^-1  2.01×10^-1
  PDSS(α=4)        7.37×10^-2       3.14×10^-4        1.31×10^-2       2.90×10^-2  3.91×10^-2

Average objective function value of the 50 best solutions
  Proportional     3.96×10^1        4.46×10^1         3.41×10^1        3.95×10^1   5.28×10^0
  Rank             4.84×10^1        5.62×10^1         2.88×10^1        4.45×10^1   1.41×10^1
  PDSS(α=2)        3.81×10^1        4.38×10^1         4.91×10^1        4.37×10^1   5.46×10^0
  PDSS(α=4)        3.46×10^1        3.39×10^1         2.28×10^1        3.04×10^1   6.59×10^0

f_2 (Rastrigin's function)

Minimum objective function value of the 50 best solutions
  Proportional     4.60×10^-3       2.50×10^-3        1.69×10^-3       2.93×10^-3  1.50×10^-3
  Rank             3.16×10^-8       1.12×10^-4        7.11×10^-7       3.7×10^-5   6.46×10^-5
  PDSS(α=2)        1.35×10^-5       7.21×10^-6        2.85×10^-5       1.64×10^-5  1.10×10^-5
  PDSS(α=4)        1.66×10^-10      4.30×10^-12       8.41×10^-9       2.86×10^-9  4.81×10^-9

Average objective function value of the 50 best solutions
  Proportional     8.40×10^-3       4.97×10^-3        5.56×10^-3       6.31×10^-3  1.83×10^-3
  Rank             3.81×10^-6       2.94×10^0         1.85×10^-6       9.80×10^-1  1.70×10^0
  PDSS(α=2)        3.01×10^-5       1.35×10^-5        6.35×10^-5       3.57×10^-5  2.5×10^-5
  PDSS(α=4)        4.06×10^-10      1.45×10^-11       2.38×10^-8       8.07×10^-9  1.36×10^-8

f_3 (Ackley's function)

Minimum objective function value of the 50 best solutions
  Proportional     4.92×10^-3       2.43×10^-3        5.94×10^-3       4.43×10^-3  1.80×10^-3
  Rank             2.09×10^-5       1.08×10^-4        5.42×10^-10      4.31×10^-5  5.75×10^-5
  PDSS(α=2)        2.40×10^-5       1.21×10^-5        1.10×10^-4       4.86×10^-5  5.33×10^-5
  PDSS(α=4)        1.41×10^-9       2.47×10^-10       3.06×10^-8       1.07×10^-8  1.72×10^-8

Average objective function value of the 50 best solutions
  Proportional     6.46×10^-3       3.36×10^-3        8.37×10^-3       6.07×10^-3  2.53×10^-3
  Rank             7.98×10^-5       4.42×10^-4        3.60×10^-9       1.74×10^-4  2.35×10^-4
  PDSS(α=2)        3.70×10^-5       1.91×10^-5        1.50×10^-4       6.87×10^-5  7.11×10^-5
  PDSS(α=4)        3.74×10^-9       3.09×10^-9        5.92×10^-8       2.20×10^-8  3.22×10^-8

f_4 (Griewank's function)

Minimum objective function value of the 50 best solutions
  Proportional     1.41×10^-7       3.74×10^-8        9.33×10^-8       9.05×10^-8  5.17×10^-8
  Rank             6.43×10^-13      2.32×10^-11       1.11×10^-16      7.96×10^-12 1.32×10^-11
  PDSS(α=2)        5.85×10^-10      1.11×10^-10       3.16×10^-9       1.28×10^-9  1.64×10^-9
  PDSS(α=4)        3.77×10^-15      3.33×10^-16       2.44×10^-13      8.27×10^-14 1.40×10^-13

Average objective function value of the 50 best solutions
  Proportional     4.84×10^-1       4.96×10^-1        6.01×10^-1       5.27×10^-1  6.43×10^-2
  Rank             3.62×10^-1       2.92×10^-1        6.77×10^-1       4.43×10^-1  2.05×10^-1
  PDSS(α=2)        5.08×10^-1       6.53×10^-1        8.11×10^-1       6.57×10^-1  1.52×10^-1
  PDSS(α=4)        5.13×10^-1       3.50×10^-1        5.60×10^-1       4.74×10^-1  1.10×10^-1

As can be seen from the figures, GAs using PDSS, especially with α = 4, always converge faster than those using proportional selection and rank selection.


The only exception is the last example, in which all the GA processes have similar evolutionary trends and alternately surpass each other. Compared to proportional selection, rank selection exhibits a faster convergence speed in most cases; it is noted that rank selection sometimes even displays a faster convergence than PDSS with α = 2 at the later stage of the evolution procedures. Nevertheless, its performance depends highly on the settings of the crossover and mutation rates. For the optimization of Rastrigin's function, rank selection gives excellent performance when γc = 0.6, γm = 0.1 and when γc = 0.3, γm = 0.4, but becomes extremely ineffective when γc = 0.8, γm = 0.05. For Ackley's function, rank selection shows an outstanding convergence speed only when γc = 0.3 and γm = 0.4. In contrast, the performance of PDSS is always consistent whatever crossover and mutation rates are specified. Moreover, as indicated in Table 4, most of the best results are obtained with PDSS, and in all cases PDSS produces excellent experimental results. That is to say, by means of PDSS, the convergence speed of GA search can be significantly improved without losing the capability of global optimization.

6. Conclusions

In the present study, a new fitness scaling method has been developed for the sake of providing a more constant selective pressure to the stochastic selection process during the evolution procedure. Compared to previous approaches, the new scaling method massively reduces the impact of the scatter of the raw fitness values on population reproduction, so that by using the new method a GA designer can easily adjust the performance of stochastic selection without worrying about variations in the fitness distribution of the population. Experimental results on both unconstrained optimization and TSP optimization clearly suggest that, by means of the new fitness scaling method, GA search may display a much faster and more consistent convergence speed without reducing its effectiveness in global optimization. Thus, the new method could be a good choice for GA users who wish to apply fitness scaling techniques in their GA designs to achieve a controllable selective pressure. Furthermore, the new method can also be used together with other dynamic fitness scaling or fitness sharing techniques, such as Boltzmann selection.

Appendix A

Reconsider Proposition 2:

$\lim_{\alpha\to\infty} P_{best} = \dfrac{1 - N P_{\min}}{N_{best}} + P_{\min}, \qquad \lim_{\alpha\to\infty} P_{nonbest} = P_{\min}$  (A.1)

Proof. The scaled fitness and the survival probability when $\alpha \to \infty$ can be derived from (6) and (7). The limit of the scaled fitness is

$\lim_{\alpha\to\infty} f_i^{\circ} = \Big(\sum_{f_j \in f_i^-} (f_i - f_j)^{\infty}\Big)^{1/\infty} - \Big(\sum_{f_j \in f_i^+} (f_j - f_i)^{\infty}\Big)^{1/\infty}$  (A.2)

Then $\lim_{\alpha\to\infty} P_{best}$ is derived as follows:

$\lim_{\alpha\to\infty} P_{best} = P_{\min} + (1 - N P_{\min}) \lim_{\alpha\to\infty} \dfrac{f_{best}^{\circ} - f_{worst}^{\circ}}{\sum_{i=1}^{N} (f_i^{\circ} - f_{worst}^{\circ})}$  (A.3)

Consider first the situation that $f_{best} \cup f_{worst} = \mathbf{f}$, and then the general case in which the population also contains other fitness values; since for every $f_i \in \mathbf{f}_{nonbest}$ both $f_{best} - f_i > 0$ and $f_{best} - f_{worst} > 0$, the contribution of the non-best individuals to the denominator of (A.3) vanishes in the limit, and in both cases

$\lim_{\alpha\to\infty} \dfrac{f_{best}^{\circ} - f_{worst}^{\circ}}{\sum_{i=1}^{N} (f_i^{\circ} - f_{worst}^{\circ})} = \dfrac{1}{N_{best}}$  (A.8)

Therefore,

$\lim_{\alpha\to\infty} P_{best} = \dfrac{1 - N P_{\min}}{N_{best}} + P_{\min}$  (A.14)

In addition,

$\lim_{\alpha\to\infty} \sum_{P_i \in P_{nonbest}} P_i = 1 - N_{best}\Big(\dfrac{1 - N P_{\min}}{N_{best}} + P_{\min}\Big) = (N - N_{best})\,P_{\min}$  (A.15, A.16)

Since $P_i \ge P_{\min}$, it follows that

$P_i = P_{\min}$ for any $f_i \ne f_{best}$  (A.17)

which completes the proof of (A.1).

Reconsider Proposition 3:

$\lim_{\alpha\to 0} P_{nonworst} = \dfrac{1 - N_{worst} P_{\min}}{N - N_{worst}}$  (A.18)

The scaled fitness for $\alpha \to 0$ is derived in the same way,

$\lim_{\alpha\to 0} f_i^{\circ} = \Big(\sum_{f_j \in f_i^-} (f_i - f_j)^{0}\Big)^{1/0} - \Big(\sum_{f_j \in f_i^+} (f_j - f_i)^{0}\Big)^{1/0}$  (A.19)

where the first sum counts the $N_{i-}$ elements of $f_i^-$ and the second sum counts the $N_{i+}$ elements of $f_i^+$. (A.18) can then be proved in the same manner as (A.1) above.


References

[1] D. Rivero, J. Dorado, J. Rabuñal, A. Pazos, Generation and simplification of artificial neural networks by means of genetic programming, Neurocomputing 73 (2010) 3200-3223.
[2] S.-H. Yang, Y.-P. Chen, An evolutionary constructive and pruning algorithm for artificial neural networks and its prediction applications, Neurocomputing 86 (2012) 140-149.
[3] N. Jozefowiez, F. Semet, E.-G. Talbi, An evolutionary algorithm for the vehicle routing problem with route balancing, Eur. J. Oper. Res. 195 (2009) 761-769.
[4] M. Gen, R. Cheng, Genetic Algorithms and Engineering Design, Wiley-Interscience, New York, 1997.
[5] P.J.B. Hancock, An empirical comparison of selection methods in evolutionary algorithms, in: Lecture Notes in Computer Science, vol. 865 (1994) 80-94.
[6] M.V. Butz, D.E. Goldberg, K. Tharakunnel, Analysis and improvement of fitness exploitation in XCS: bounding models, tournament selection, and bilateral accuracy, Evol. Comput. 11 (2003) 239-277.
[7] M.V. Butz, K. Sastry, D.E. Goldberg, Strong, stable, and reliable fitness pressure in XCS due to tournament selection, Genet. Programm. Evol. Mach. 6 (2005) 53-77.
[8] V. Kreinovich, C. Quintana, O. Fuentes, Genetic algorithms: what fitness scaling is optimal, Cybern. Syst. 24 (1993) 9-26.
[9] A. Wolfgang, N.N. Ahmad, S. Chen, L. Hanzo, Genetic algorithm assisted minimum bit error rate beamforming, in: The Proceedings of the Vehicular Technology Conference 2004, 17-19 May 2004, Milan, Italy, pp. 142-146.
[10] S. Hill, J. Newell, C. O'Riordan, Analysing the effects of combining fitness scaling and inversion in genetic algorithms, in: The Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, 15-17 November 2004, Boca Raton, USA, pp. 380-387.
[11] N. Surajudeen-Bakinde, X. Zhu, J. Gao, A.K. Nandi, Effects of fitness scaling and adaptive parameters on genetic algorithm based equalization for DS-UWB systems, in: The Proceedings of the 4th International Conference on Computers and Devices for Communication 2009, 14-16 December 2009, Kolkata, India, pp. 1-4.
[12] J.J. Greffenstette, J.E. Baker, How genetic algorithms work: a critical look at implicit parallelism, in: The Proceedings of the Third International Conference on Genetic Algorithms, 1989, San Mateo, CA, USA, pp. 20-27.
[13] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, 1989.
[14] F. Leclerc, J.-Y. Potvin, A fitness scaling method based on a span measure, in: The Proceedings of the IEEE International Conference on Evolutionary Computation 1995, 29 November-1 December 1995, Perth, WA, Australia, vol. 2, pp. 561-565.
[15] A. Rogers, A. Prugel-Bennett, Genetic drift in genetic algorithm selection schemes, IEEE Trans. Evol. Comput. 3 (1999) 298-303.
[16] S. Gupta, Relative fitness scaling for improving efficiency of proportionate selection in genetic algorithms, in: The Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference, 2009, Montreal, Canada, pp. 2741-2744.
[17] R. Cheng, M. Gen, Evolution program for resource constrained project scheduling problem, in: The Proceedings of the First IEEE Conference on Evolutionary Computation, 27-29 June 1994, Orlando, USA, pp. 736-741.
[18] W. Yan, Z. Zhu, A new optimization algorithm based on the principle of evolution, J. Electron. 15 (1998) 248-253.
[19] T. Bäck, Selective pressure in evolutionary algorithms: a characterization of selection mechanisms, in: The Proceedings of the First IEEE Conference on Evolutionary Computation, 27-29 June 1994, Orlando, USA, pp. 57-62.
[20] T. Mahnig, H. Muhlenbein, A new adaptive Boltzmann selection schedule SDS, in: The Proceedings of the Congress on Evolutionary Computation 2001, 27-30 May 2001, Seoul, South Korea, vol. 1, pp. 183-190.
[21] T. Kuo, S.Y. Hwang, A genetic algorithm with disruptive selection, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 26 (1996) 299-307.
[22] M. Li, J. Kou, A new non-monotone fitness scaling for genetic algorithm, Progr. Nat. Sci. 11 (2001) 622-630.
[23] J.E. Baker, Reducing bias and inefficiency in the selection algorithm, in: The Proceedings of the Second International Conference on Genetic Algorithms and their Applications, 1987, Cambridge, USA, pp. 14-21.
[24] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed., Springer, London, UK, 1996.
[25] L. Nolle, D.A. Armstrong, A.A. Hopgood, J.A. Ware, Optimum work roll profile selection in the hot rolling of wide steel strip using computational intelligence, in: Lecture Notes in Computer Science, vol. 1625 (1999) 435-452.
[26] M. Gen, B. Liu, K. Ida, Evolution program for deterministic and stochastic optimizations, Eur. J. Oper. Res. 94 (1996) 618-625.
[27] A.A. Hopgood, A. Mierzejewska, Transform ranking: a new method of fitness scaling in genetic algorithms, in: M. Bramer, F. Coenen, M. Petridis (Eds.), Research and Development in Intelligent Systems XXV: Proceedings of AI-2008, Springer, London, UK, 2008, pp. 349-354.
[28] K. Chellapilla, D.B. Fogel, Fitness distributions in evolutionary computation: motivation and examples in the continuous domain, BioSystems 54 (1999) 15-29.

Li Feng Zhang received the B.Sc. degree in electronic engineering from Heilongjiang University, Harbin, China, in 1999, the M.Sc. degree in radio frequency and communication engineering from the University of Bradford, Bradford, U.K., in 2003, and the Ph.D. degree from the Faculty of Computing, Engineering and Mathematical Sciences (CEMS), University of the West of England (UWE), Bristol, U.K., in 2007. Currently, he is an assistant professor at the Department of Economic Information Management, School of Information, Renmin University of China, Beijing, China. His research interests lie in the following fields: evolutionary computation, meta-heuristic algorithms, combinatorial optimization, vehicle routing problems (VRP), nonlinear dynamic system identification, intelligence modeling [artificial neural networks (ANN) and fuzzy inference systems (FIS)].

Chen Xi Zhou received the B.Sc. degree in information management and information systems from the Renmin University of China, Beijing, China, in 2011, where he is currently working toward the M.Sc. degree at the Department of Economic Information Management, School of Information. His research interests include genetic algorithm (GA), parallel genetic algorithm (PGA), and combinatorial optimization.

Rong He received the B.Sc. degree in information management and information systems from the Renmin University of China, Beijing, China, in 2011, where he is currently working toward the M.Sc. degree at the Department of Economic Information Management, School of Information. His research interests include artificial neural network (ANN), evolutionary computation (EC), and evolved neural models.


Yuan Xu received the B.Sc. degree in information management and information systems from the Renmin University of China, Beijing, China, in 2011. He is currently studying at the State University of New York at Binghamton. His research interest is in computational intelligence.

Meng Ling Yan received the B.Sc. degree in information management and information systems from the Renmin University of China, Beijing, China, in 2011. She is currently working toward the M.Sc. degree at the Department of Management Science and Engineering, Guanghua School of Management, Peking University. Her research interest is in artificial neural networks (ANN) and genetic algorithms (GA).