A population initialization method for evolutionary algorithms based on clustering and Cauchy deviates


Accepted Manuscript

A Population Initialization Method for Evolutionary Algorithms based on Clustering and Cauchy Deviates
Dražen Bajer, Goran Martinović, Janez Brest

PII: S0957-4174(16)30228-7
DOI: 10.1016/j.eswa.2016.05.009
Reference: ESWA 10667

To appear in: Expert Systems With Applications

Received date: 24 February 2016
Revised date: 18 April 2016
Accepted date: 5 May 2016

Please cite this article as: Dražen Bajer, Goran Martinović, Janez Brest, A Population Initialization Method for Evolutionary Algorithms based on Clustering and Cauchy Deviates, Expert Systems With Applications (2016), doi: 10.1016/j.eswa.2016.05.009

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights

• A population initialization method for evolutionary algorithms is proposed.
• The method utilizes partitional clustering and a simple Cauchy mutation.
• Viability of the method is tested on numerous standard functions.
• Performance of differential evolution benefits from incorporating the method.
• A considerable increase in the convergence rate is observed.


A Population Initialization Method for Evolutionary Algorithms based on Clustering and Cauchy Deviates

Dražen Bajer(a), Goran Martinović(a,1), and Janez Brest(b)

(a) Faculty of Electrical Engineering, J.J. Strossmayer University of Osijek, Kneza Trpimira 2b, 31000 Osijek, Croatia
(b) Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova ul. 17, 2000 Maribor, Slovenia
[email protected], [email protected], [email protected]

Abstract. The initial population of an evolutionary algorithm is an important factor affecting the convergence rate and, ultimately, the algorithm's ability to find high-quality or satisfactory solutions. If composed of good individuals, it may bias the search towards promising regions of the search space right from the beginning. However, if no knowledge about the problem at hand is available, the initial population is most often generated completely at random, so no such behavior can be expected. This paper proposes a method for initializing the population that attempts to identify, i.e., to get close to, promising parts of the search space and to generate (relatively) good solutions in their proximity. The method is based on clustering and a simple Cauchy mutation. The results obtained on a broad set of standard benchmark functions suggest that the proposed method succeeds in the aforementioned, which is most noticeable as an increase in convergence rate compared to the usual initialization approach and a method from the literature. Also, insight into the usefulness of advanced initialization methods in higher-dimensional search spaces is provided, at least to some degree, by the results obtained on higher-dimensional problem instances; the proposed method is beneficial in such spaces as well. Moreover, results on several very high-dimensional problem instances suggest that the proposed method is able to provide a good starting position for the search. Keywords: Cauchy deviates, clustering, differential evolution, evolutionary algorithms, initial population, mutation.

1 Corresponding author: G. Martinović, e-mail: [email protected], tel.: +38531495401, fax: +38531495402


1 Introduction

Evolutionary algorithms (EAs) (Eiben & Smith, 2003) are population-based search and optimization methods. As such, the initial population plays a key role in terms of convergence rate and may affect the success of an EA in finding high-quality or satisfactory solutions. Most often, the initial population is generated completely at random (Maaranen et al., 2007; Kazimipour et al., 2013) (using uniform deviates). Such initial populations are, as a rule, very diverse. This is a desirable feature since it enables an extensive exploration of the search space and helps keep the population from converging prematurely and from being trapped in local optima. On the other hand, such populations are generally composed exclusively of low-quality individuals. Consequently, a substantial amount of time is required for reaching promising regions of the search space and, ultimately, for converging towards high-quality solutions. When the evaluation of the objective function is computationally expensive or the number of function evaluations is limited, an increased convergence speed towards promising regions of the search space is beneficial since it reduces the time needed for finding satisfactory or high-quality solutions. An increased convergence rate may be achieved by introducing good solutions/individuals into the initial population (Martinović & Bajer, 2012). Needless to say, this is not always easy. When knowledge about the problem being solved is available, simple heuristics may be employed to obtain good solutions which can then be introduced into the initial population. The traveling salesman problem (TSP) is a prime example, since a multitude of heuristics exists for it. A modified nearest-neighbor algorithm was proposed in Martinović & Bajer (2012) and utilized for the population initialization of a genetic algorithm (GA). Wang et al.
(2009) proposed an improved greedy algorithm which was used to initialize half of the population, while the rest was initialized randomly. An extensive comparison of various population initialization methods used for the TSP can be found in e.g. Paul et al. (2015). A number of population initialization methods were proposed for many problems other than the TSP. For example, Guerrero et al. (2012) presented two distinct population initialization methods for the segmentation problem and compared them to the usual (random) initialization approach. In order to solve the flexible job-shop scheduling problem, Zhang et al. (2011) proposed a new initialization procedure. However, many real-world problems are like black boxes, hence no knowledge about their internals is available (black-box optimization (Cassioli & Schoen, 2013; Audet, 2014)). Generating an initial population that contains good individuals in such cases represents a significant problem. Thus, it is not surprising that the population is most often initialized completely at random, since it is very hard to find good solutions in an efficient and simple manner. Nonetheless, a few attempts at overcoming those problems can be found in the literature. Rahnamayan et al. (2007, 2008) proposed a method that employs opposition-based learning (OBL). This initialization approach was utilized by Dong et al. (2012) in a hybrid algorithm


for the circle detection problem. Furthermore, a completely different approach to the initialization of the EA population was proposed by Ali et al. (2013). The approach employs either a slightly modified quadratic interpolation or the nonlinear simplex method. Both of the aforementioned methods generate a new population from a starting uniformly random one. Also, both methods were tested for initializing the population of differential evolution (DE), which is a very competitive EA for numerical optimization. This paper proposes a new method for initializing the population of EAs when no problem knowledge is available. It is based on clustering and uses Cauchy deviates. Clustering is used in an attempt to identify, i.e., to get close to, promising regions of the search space, which are then represented by cluster centers. A simple Cauchy mutation is then employed in order to generate new individuals around those centers while favoring the better ones. Finally, individuals created uniformly at random are introduced in order to create a sufficiently diverse initial population. Although populations generated completely at random may span the whole search space, they become extremely sparse as the dimensionality of that space increases (Rahnamayan & Wang, 2009). Due to this fact, it is very unlikely that solutions in promising regions are generated. Hence, the approach adopted here aims at generating a population that is not necessarily diverse, but that contains at least a few relatively good solutions. The rest of the paper is organized as follows. Section 2 gives a brief and concise introduction to clustering and DE, since clustering is a key part of the proposed method, while DE is employed as a representative EA for the numerical experiments. The proposed method for initializing the EA population is described and analyzed in detail in Sect. 3. The setup of the conducted experimental analysis and the obtained results are reported and discussed in Sect. 4.
Finally, the drawn conclusions and suggestions for future work are presented in Sect. 5.

2 Preliminaries

Generally, a global optimization problem can be represented by the pair (S, f), where S ⊆ R^d is the search space, and f : S → R a real-valued objective function. Solving this problem requires that a d-dimensional point x* ∈ S is found such that

f(x*) ≤ f(x), ∀x ∈ S.  (1)
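As a concrete instance of such a pair (S, f), the following minimal sketch (our illustration, not taken from the paper) defines the well-known sphere function on S = [−100, 100]^d and a bound check for candidate points:

```python
import numpy as np

# Sphere function f(x) = sum_i x_i^2 on S = [-100, 100]^d; its global
# minimizer x* = (0, ..., 0) satisfies f(x*) <= f(x) for all x in S (Eq. (1)).
def sphere(x):
    return float(np.sum(np.asarray(x, dtype=float) ** 2))

# Membership test for the box-constrained search space S = [lo, hi]^d.
def in_bounds(x, lo=-100.0, hi=100.0):
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= lo) and np.all(x <= hi))
```

The helper names `sphere` and `in_bounds` are ours; any benchmark function from Table 1 could be plugged in the same way.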

Global, or more specifically, numerical optimization (Nocedal & Wright, 2006) problems may be subject to a multitude of linear and/or nonlinear constraints. The focus here is on unconstrained optimization problems, although bound constraints (box constraints) typically apply. Next, the clustering problem, with a focus on partitional clustering, is described, followed by a brief review of some previous applications of clustering in EAs. Also, a description of differential evolution is given. These topics are introduced since clustering is an essential part of the proposed method, whereas DE is the EA that


has been used for testing purposes.

2.1 Clustering

For a given set A = {aj : j = 1, . . . , n} ⊂ [α, β] ⊂ R^d, where α = (α1, . . . , αd) ∈ R^d and β = (β1, . . . , βd) ∈ R^d, clustering (Theodoridis & Koutroumbas, 2008; Xu & Wunsch, 2009) requires the grouping/division of that set into 1 < k < n subsets π1, . . . , πk called clusters such that

∪_{j=1}^{k} πj = A,   πr ∩ πs = ∅ for r ≠ s,   |πj| ≥ 1, j = 1, . . . , k.  (2)

The grouping of the set A into k subsets that satisfy (2) is a hard (or crisp) partition (or k-partition) of A, and can be represented by the set Π(A) = {π1, . . . , πk}. Furthermore, by introducing a proximity measure (Teboulle, 2007; Theodoridis & Koutroumbas, 2008) between the data points, each cluster πj can be assigned a representative in the form of a center zj. Usually, the Euclidean distance or norm (‖·‖) is used. Each cluster πj can then be represented by the center

zj = (1/|πj|) Σ_{a∈πj} a.  (3)

Conversely, a partition can be calculated based on provided centers Z = {z1, . . . , zk} according to the minimum-distance principle

πj = {a ∈ A : ‖a − zj‖ ≤ ‖a − zr‖, ∀r = 1, . . . , k},  j = 1, . . . , k.  (4)

Since the number of possible partitions of the set A is very large, a criterion for evaluating the different partitions must be introduced. The problem of finding an optimal partition may be defined as the global optimization problem (Teboulle, 2007; Xu & Wunsch, 2009)

min F(Z, Π) = Σ_{j=1}^{k} Σ_{a∈πj} ‖a − zj‖^2.  (5)

Different weights wj > 0 can be assigned to each data point aj ∈ A in order to emphasize their importance, thus determining their contribution in the calculation of the centers. Accordingly, (3) becomes

zj = (1/Wj) Σ_{a^r∈πj} a^r,   Wj = Σ_{a^r∈πj} wr.  (6)

One of the most popular and widely used algorithms that searches for a locally optimal partition of A (in terms of (5)) is the k-means (Theodoridis & Koutroumbas, 2008; Xu & Wunsch, 2009; Wu et al., 2007) algorithm. It is a simple alternating


optimization procedure that calculates a partition based on fixed centers, which is then used to correct/update those centers. The procedure is repeated until a termination criterion is met. Despite having some drawbacks, its simplicity and speed (as stated in Duda et al. (2000), in practice only a relatively low number of iterations is needed) make it a powerful and useful data mining tool (see e.g. Ayech & Ziou (2015); Öztürk et al. (2015)). The k-means algorithm for clustering data with assigned weights is outlined in Alg. 1.

Algorithm 1 k-means
1: Set k and choose initial centers
2: while termination condition not met do
3:   assign each a ∈ A to a corresponding cluster (Eq. (4))
4:   for each cluster π ∈ Π update the associated center (Eq. (6))
5: end while
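The alternating steps of Alg. 1 (assignment by Eq. (4), weighted center update by Eq. (6)) can be sketched in Python as follows. This is a minimal numpy illustration, not the authors' implementation; the function and parameter names are ours:

```python
import numpy as np

def weighted_kmeans(A, w, k, max_iter=50, eps=0.01, rng=None):
    """Weighted k-means sketch: alternate nearest-center assignment (Eq. (4))
    and weighted center updates (Eq. (6)) until centers stop moving."""
    rng = np.random.default_rng(rng)
    # choose k distinct data points as initial centers (avoids empty clusters)
    Z = A[rng.choice(len(A), size=k, replace=False)].copy()
    labels = np.zeros(len(A), dtype=int)
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest center (Euclidean)
        dist = np.linalg.norm(A[:, None, :] - Z[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # update step: each center becomes the weighted mean of its members
        Z_new = Z.copy()
        for j in range(k):
            members = labels == j
            if members.any():
                Z_new[j] = np.average(A[members], axis=0, weights=w[members])
        # terminate once no center moved by more than eps
        if np.linalg.norm(Z_new - Z, axis=1).max() <= eps:
            Z = Z_new
            break
        Z = Z_new
    return Z, labels
```

With equal weights this reduces to plain k-means; larger weights pull the centers towards the corresponding points, which is exactly how the proposed initialization method biases centers towards better individuals.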

2.2 Applications of Clustering in EAs


Clustering has found applications in numerous research areas that range from astronomy to signal processing. The partitioning of a data set into groups of mutually similar data may reveal new and interesting/useful information about that data set. Thus, it is not surprising that clustering, in various forms, also found itself a place in the domain of EAs. In that regard, it has been used in a number of different ways to enhance the search capability of various EAs. A clustering algorithm was utilized in a hybrid EA by Martinez-Estudillo et al. (2006) for the design of product-unit-based neural networks. More specifically, the k-means algorithm was used to partition a number of the best solutions found by the underlying EA in order to determine which solutions would undergo local optimization. The main goal of this was to reduce the computational cost by applying local search (LS) only to mutually different solutions, i.e., by applying it only to the best solution in each cluster. A different clustering approach for determining promising regions of the search space was proposed by Oliveira & Lorena (2007) in the form of a framework dubbed Clustering Search (CS). The framework serves for combining metaheuristics and LS, wherein clustering is used for detecting promising regions in which LS will be applied. Unlike usual clustering approaches, in CS the clusters are defined by a center and a common radius that is dynamically adjusted. Moreover, clusters are updated as soon as a new solution is generated by the used metaheuristic. Also, new clusters may be introduced or existing ones removed. Costa Salas et al. (2015) utilized CS for continuous optimization by combining variable mesh optimization (VMO) (Puris et al., 2012) and a couple of LS procedures. Since CS is a generic framework, Nagano et al. (2014) successfully applied it to a combinatorial optimization problem, i.e., a flow-shop scheduling problem. Zhang et al. (2007) used clustering for a different purpose. Namely, in order to estimate the optimization state of a GA, the k-means algorithm was utilized to partition the population into a preset number of clusters. The estimation is based on the relative sizes of the



cluster containing the best and the one containing the worst individual. This information was then used in a fuzzy system for adjusting the values of the crossover and mutation probabilities. With the goal of increasing the convergence rate and balancing exploration and exploitation, Gong et al. (2009) proposed the utilization of the fuzzy c-means (FCM) algorithm in DE. One step of the FCM algorithm is periodically applied to partition the population into a randomly generated number of clusters. Afterwards, the obtained cluster centers compete with selected population members for survival. The authors also used the k-means algorithm instead of the FCM algorithm in Cai et al. (2011). Nonetheless, in both cases the clustering procedure is aimed at increasing exploitation by gathering information contained in mutually similar individuals.

2.3 Differential Evolution

Differential evolution (Storn & Price, 1997) is a simple population-based direct search method. It has established itself as one of the most effective and efficient metaheuristics for global optimization, and has been successfully applied to a multitude of such problems (see e.g. Zamuda & Brest (2014); Martinović et al. (2014); Chen et al. (2015); Amrane et al. (2015); García-Domingo et al. (2015)). Like other common EAs, it employs crossover and mutation for creating new individuals, and selection for picking individuals for a new generation. The mode of operation of the standard DE is outlined in Alg. 2. The population of DE is composed of NP individuals, typically called vectors, vj = (v1j, . . . , vdj) ∈ R^d, j = 1, . . . , NP. In each generation/iteration a new population is created by mutation and crossover of individuals, i.e., vectors of the current population. Mutation creates for each target vector vj (in the current population) a corresponding mutant or donor vector

u^j = v^{r1} + F · (v^{r2} − v^{r3}),  (7)

where v^{r1}, v^{r2} and v^{r3} are randomly selected vectors from the current population (selected anew for each target vector), such that j ≠ r1 ≠ r2 ≠ r3. The base vector v^{r1} is perturbed by a difference vector represented by the difference of v^{r2} and v^{r3}. The parameter F ∈ [0, ∞) is the scale factor and determines the mutation step-size. The value of F is seldom greater than 1 (Neri & Tirronen, 2010). Next, the obtained mutant u^j and target vector v^j are crossed over to create a trial vector

t_i^j = u_i^j, if Ui[0, 1) ≤ CR or i = r^j; otherwise t_i^j = v_i^j,  i = 1, . . . , d,  (8)

where Ui[0, 1) is a uniform deviate in [0, 1), the parameter CR ∈ [0, 1] represents the crossover rate, while r^j is a randomly chosen number from the set {1, . . . , d}. After the new population of size NP composed of trial vectors is created, a one-to-one selection takes place. More specifically, a trial vector t^j passes into the next generation only if it is equal to or better than (in terms of the objective function) the corresponding target vector v^j.
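One generation of the mutation, crossover and selection steps just described can be sketched as follows (a minimal numpy illustration of DE/rand/1/bin under the assumption of an unconstrained problem; names are ours):

```python
import numpy as np

def de_generation(pop, f, F=0.5, CR=0.9, rng=None):
    """One generation of canonical DE (DE/rand/1/bin): mutation (Eq. (7)),
    binomial crossover (Eq. (8)) and one-to-one selection."""
    rng = np.random.default_rng(rng)
    NP, d = pop.shape
    new_pop = pop.copy()
    for j in range(NP):
        # r1, r2, r3: mutually distinct indices, all different from j
        r1, r2, r3 = rng.choice([i for i in range(NP) if i != j],
                                size=3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])   # Eq. (7)
        cross = rng.random(d) <= CR                  # Eq. (8): binomial mask
        cross[rng.integers(d)] = True                # index r^j: keep >= 1 gene
        trial = np.where(cross, mutant, pop[j])
        if f(trial) <= f(pop[j]):                    # one-to-one selection
            new_pop[j] = trial
    return new_pop
```

Repeatedly applying `de_generation` to an initial population (however it was generated) yields the search loop of Alg. 2.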


Algorithm 2 Canonical DE (DE/rand/1/bin)
1: Set NP, CR and F, and initialize population
2: while termination condition not met do
3:   for j := 1 → NP do
4:     create mutant vector u^j (Eq. (7))
5:     cross over u^j and v^j to create trial vector t^j (Eq. (8))
6:   end for
7:   for j := 1 → NP do
8:     if f(t^j) ≤ f(v^j) then
9:       v^j := t^j
10:     end if
11:   end for
12: end while


The described algorithm represents the standard or canonical DE, usually denoted as DE/rand/1/bin (Storn & Price, 1997; Price et al., 2005). Many other variants that improve on the canonical algorithm have been proposed in the literature. A comprehensive overview of state-of-the-art variants can be found in e.g. Neri & Tirronen (2010); Das & Suganthan (2011).

3 Proposed Method for Initializing the EA Population

The proposed method for initializing the population of an EA is based on the idea of generating (relatively) good-quality individuals in different promising regions of the search space. In order to achieve this, the method utilizes clustering and Cauchy deviates. The former is used to identify (to get close to) promising regions, while the latter, in the form of a simple Cauchy mutation, is used to generate new individuals, thereby further exploring those regions. The proposed method is outlined in Alg. 3. The (initial) EA population can be represented by the set PR = {vj : j = 1, . . . , NP} ⊂ [α, β] ⊂ R^d. It is most often, as mentioned in the introduction, generated completely at random, i.e., vij = Ui[αi, βi], i = 1, . . . , d, where Ui[αi, βi] is a uniform deviate in [αi, βi]. Also, considering the limited size of the population, it can only sample a small fraction of the search space. Thus, it is to be expected that the initial population will contain low-quality individuals exclusively. The proposed method starts out with such a population and utilizes the available information about the fitness landscape while attempting to find promising regions of the search space. It does so by clustering the whole population PR into k = √NP clusters, whereby it also uses the data about the quality of each individual. This way, better individuals contribute more to the cluster formation, meaning the centers will be closer to them. According to Gong et al. (2009); Cai et al. (2011), the clustering of the population can be considered a form of multi-parent crossover. However, unlike their approaches, each individual is first assigned a weight associated with its quality in terms of the objective function. The weights of the individuals are normalized in order to bound their range. Accordingly, each individual vj ∈ PR is assigned a


weight

wj = a + (fmax − fj)(b − a) / (fmax − fmin),  (9)

where fj is the quality of the individual, fmax and fmin are the quality of the worst and best individual, respectively, while a = 0.1 and b = 0.9. After the clustering of the population PR, a set of centers ZR = {z1, . . . , zk} representing the different clusters is obtained. Those can be considered as offspring, where the parents of the center zj compose the cluster πj. In the hope that those centers are close to promising parts of the search space, further exploration around them is conducted. Using a simple Cauchy mutation, a new individual is created in the proximity of a selected center z^p,

v̂_i^r = z_i^p + Ci(0, 1),  i = 1, . . . , d,  (10)

where Ci(0, 1) is a Cauchy deviate with location parameter l = 0 and scale parameter s = 1. The probability density function of the Cauchy distribution (Dekking et al., 2005),

fpd(x; l, s) = 1 / (sπ(1 + ((x − l)/s)^2)),  (11)


resembles that of the Gaussian distribution, but approaches the axis so slowly that an expectation does not exist, whereas the variance is undefined. Therefore, the creation of individuals farther away from the centers is more likely. Hence, a larger part of the search space around the centers can be covered. This, in the end, makes it better suited for exploring larger neighborhoods than the Gaussian distribution. Considering the difficulty of estimating the optimal number of clusters for a given set (see e.g. Vendramin et al. (2010)), the value k = √NP = √|PR| has been selected. This value is, as a rule, considered the upper bound (Sheng et al., 2005; Naldi et al., 2011). Generally, not all centers will be equally far from promising regions, and it can be suspected that better centers (in terms of the objective function) are closer. Hence, a more extensive exploration around such centers should be performed. In the proposed method, better centers are more likely to be chosen for the creation of new individuals. More specifically, centers are chosen by roulette-wheel selection. This way, the impact of the set number of clusters is effectively reduced. The obtained centers, depending on the starting population PR and the final partition, may be closer to or farther away from promising parts of the search space. Thus, in order to get as close as possible to those regions, a new individual replaces the center used to create it if it is better. Hence, exploration will subsequently be conducted around it. It must be noted that the original centers ZR are preserved. Based on the aforementioned approach, a new population PC of size NP − k is generated. However, it should be noted that such a population will be composed of individuals that are relatively close to the original centers. Furthermore, due to the close proximity of grouped individuals, the subsequent exploration of the search


space by the EA will be biased towards the regions in which the centers reside, especially the best ones, since more individuals were created around them. This may be desirable when one or more centers are close to the global optimum, but due to a loss in diversity, it may also lead to premature convergence. Therefore, in order to ensure sufficient diversity, the final initial population is composed of two parts. The first part is represented by the NP/2 best individuals from the set {ZR ∪ PC}, while the second part is represented by the NP/2 best individuals from PR.

Algorithm 3 Proposed method for EA population initialization
1: generate a uniformly random population PR, set k = √NP = √|PR|, PC = ∅
2: partition PR into k clusters to obtain centers ZR, ZT = ZR
3: for j := 1 → NP − k do
4:   select a center z^p from the set ZT
5:   generate a new individual v̂^j (Eq. (10)) and insert it into PC
6:   if f(v̂^j) < f(z^p) then
7:     z^p := v̂^j
8:   end if
9: end for
10: choose the NP/2 best individuals from the set {ZR ∪ PC} and another NP/2 best from the set PR as the initial population
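The whole procedure of Alg. 3 can be sketched compactly as follows. This is an illustrative, simplified implementation, not the authors' code: the k-means termination test is reduced to a fixed iteration cap, bound handling for Cauchy offspring is omitted, and all names are ours:

```python
import numpy as np

def cluster_cauchy_init(f, bounds, NP, rng=None):
    """Sketch of Alg. 3: random population -> weighted k-means -> Cauchy
    exploration around centers -> NP/2 best of {ZR u PC} + NP/2 best of PR."""
    rng = np.random.default_rng(rng)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    d = lo.size
    PR = rng.uniform(lo, hi, (NP, d))            # uniformly random start
    fPR = np.array([f(v) for v in PR])
    k = int(np.sqrt(NP))
    # quality weights (Eq. (9)) with a = 0.1, b = 0.9 (best -> 0.9, worst -> 0.1)
    w = 0.1 + (fPR.max() - fPR) * 0.8 / max(fPR.max() - fPR.min(), 1e-12)
    # weighted k-means with k distinct individuals as initial centers
    Z = PR[rng.choice(NP, k, replace=False)].copy()
    for _ in range(50):
        labels = np.linalg.norm(PR[:, None] - Z[None], axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                Z[j] = np.average(PR[labels == j], axis=0,
                                  weights=w[labels == j])
    ZR = Z.copy()                                # original centers are preserved
    fZ = np.array([f(z) for z in Z])
    PC = []
    for _ in range(NP - k):
        # roulette-wheel selection over center qualities, normalized as in Eq. (9)
        wz = 0.1 + (fZ.max() - fZ) * 0.8 / max(fZ.max() - fZ.min(), 1e-12)
        j = rng.choice(k, p=wz / wz.sum())
        v = Z[j] + rng.standard_cauchy(d)        # Cauchy mutation (Eq. (10))
        PC.append(v)
        if f(v) < fZ[j]:                         # better offspring moves the
            Z[j], fZ[j] = v, f(v)                # working copy of the center
    cand = np.vstack([ZR, np.array(PC)])
    cand = cand[np.argsort([f(v) for v in cand])][: NP // 2]
    keep = PR[np.argsort(fPR)][: NP - NP // 2]
    return np.vstack([cand, keep])
```

Note that, as in the paper, this costs 2·NP objective function evaluations: NP for the random population and NP − k for the Cauchy offspring, plus k for the centers.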

3.1 Implementation Details

The proposed method performs two key steps: (1) the clustering of the starting uniformly random population PR; (2) the creation of new individuals using the simple Cauchy mutation. The clustering is performed only once using the k-means algorithm (Alg. 1), although any other algorithm for partitional clustering (Xu & Wunsch, 2009) would be usable as well. The Cauchy mutation is used to create NP − k new individuals. Concerning the choice of the initial centers for k-means, it was made by randomly selecting k distinct data points, i.e., individuals from the population. Hence, the possible appearance of empty clusters is avoided. The termination condition of k-means was twofold: (1) a maximum of 50 iterations was allowed; (2) execution was terminated if the distance between any pair of centers in two consecutive iterations was not greater than ε = 0.01, i.e., it is assumed the algorithm converged. After clustering, the centers are obtained and used to create new individuals. For each new individual a center is chosen according to roulette-wheel selection. First, it must be noted that the original centers are preserved, thus a copy is used subsequently. Second, the quality of the centers is normalized analogously to the weight assignment, i.e., according to Eq. (9). This is done before selection takes place. Since a newly generated individual replaces a corresponding center if it is better, the normalization must be repeated after each replacement.


3.2 Complexity Analysis


The proposed population initialization method generates a total of 2·NP individuals, where NP are created in a uniformly random fashion, k = √NP are obtained through clustering, and the remaining NP − k are created using the Cauchy mutation. Therefore, 2·NP − k individuals are created using random variates (deviates). The computational cost of creating NP individuals using uniform deviates can be estimated as O(NP · d). Creating the NP − k individuals using the Cauchy mutation requires the prior selection of a center for each individual, thus the computational cost can be estimated as O((NP − k) · (k + d)). Due to the relatively small value of k and the simplicity of the roulette-wheel selection, the computational cost of creating the 2·NP − k individuals may be roughly estimated as O((2·NP − k) · d). Next, since the k individuals represent cluster centers, they are obtained by the k-means algorithm whose computational cost, according to Theodoridis & Koutroumbas (2008); Naldi et al. (2011), may be estimated as O(tmax · NP · d · k), where tmax is the maximum number of allowed iterations (50 in this case). This, of course, is the worst-case scenario, since an additional termination criterion is employed as well, i.e., termination occurs if no difference in centers is observed between two consecutive iterations. Note, however, that before clustering each individual is assigned a weight, thus the computational complexity of the whole clustering process may be estimated as O(NP · (tmax · k · d + 1)).

4 Experimental Analysis

An experimental analysis was conducted in order to assess the advantages and shortcomings of the proposed method for initializing the EA population. It was divided into four parts: (1) performance analysis in terms of convergence rates and execution times on low- and medium-dimensional problem instances; (2) analysis of the influence of the population size; (3) performance analysis in terms of convergence rates and execution times on higher-dimensional problem instances; (4) performance analysis in terms of convergence rates and execution times on very high-dimensional problem instances. The whole analysis was performed on a broad set of standard unconstrained benchmark functions with different properties. The used set of unconstrained functions covers various dimensionalities and properties, and includes a total of 123 problem instances. The used functions are presented in Table 1, where D = {10, 30, 50, 80, 100, 120}. More details about each function can be found in (Yao et al., 1999; Ji & Klinowski, 2006; Das et al., 2009; Jamil & Yang, 2013). It should be noted that functions f1–f7 are unimodal, f8 is a step function, f9–f15 are multimodal functions with many local minima, while f16–f28 are low-dimensional functions with few local minima. The proposed method was incorporated into the canonical DE (denoted DEC) in order to assess its impact on EA performance. Furthermore, for comparison purposes, the population initialization method proposed in (Rahnamayan et al., 2007) (denoted DEO and described next) and the usual/standard initialization approach


Table 1: Benchmark functions used (D = {10, 30, 50, 80, 100, 120}; in f12, yi = 1 + (xi + 1)/4, and u(xi, a, k, m) is the penalty function of Yao et al. (1999))

Function | S | f* | d
f1(x) = Σ_{i=1}^{d} xi^2 | [−100, 100]^d | 0 | D ∪ {500, 1000}
f2(x) = Σ_{i=1}^{d} |xi| + Π_{i=1}^{d} |xi| | [−10, 10]^d | 0 | D
f3(x) = Σ_{i=1}^{d} (Σ_{j=1}^{i} xj)^2 | [−100, 100]^d | 0 | D ∪ {500, 1000}
f4(x) = max_i {|xi|, 1 ≤ i ≤ d} | [−100, 100]^d | 0 | D ∪ {500, 1000}
f5(x) = Σ_{i=1}^{d−1} [100(x_{i+1} − xi^2)^2 + (xi − 1)^2] | [−30, 30]^d | 0 | D
f6(x) = (Σ_{i=1}^{d} xi^2)^2 | [−100, 100]^d | 0 | D ∪ {500, 1000}
f7(x) = Σ_{i=1}^{d} xi^2 + (Σ_{i=1}^{d} 0.5ixi)^2 + (Σ_{i=1}^{d} 0.5ixi)^4 | [−5, 10]^d | 0 | D
f8(x) = Σ_{i=1}^{d} (⌊xi + 0.5⌋)^2 | [−100, 100]^d | 0 | D ∪ {500, 1000}
f9(x) = Σ_{i=1}^{d} [xi^2 − 10 cos(2πxi) + 10] | [−5.12, 5.12]^d | 0 | D
f10(x) = −20 exp(−0.2 √((1/d) Σ_{i=1}^{d} xi^2)) − exp((1/d) Σ_{i=1}^{d} cos(2πxi)) + 20 + e | [−32, 32]^d | 0 | D ∪ {500, 1000}
f11(x) = (1/4000) Σ_{i=1}^{d} xi^2 − Π_{i=1}^{d} cos(xi/√i) + 1 | [−600, 600]^d | 0 | D ∪ {500, 1000}
f12(x) = (π/d){10 sin^2(πy1) + Σ_{i=1}^{d−1} (yi − 1)^2 [1 + 10 sin^2(πy_{i+1})] + (yd − 1)^2} + Σ_{i=1}^{d} u(xi, 10, 100, 4) | [−50, 50]^d | 0 | D ∪ {500, 1000}
f13(x) = 0.1{sin^2(3πx1) + Σ_{i=1}^{d−1} (xi − 1)^2 [1 + sin^2(3πx_{i+1})] + (xd − 1)^2 [1 + sin^2(2πxd)]} + Σ_{i=1}^{d} u(xi, 5, 100, 4) | [−50, 50]^d | 0 | D ∪ {500, 1000}
f14(x) = Σ_{i=1}^{d} |xi sin(xi) + 0.1xi| | [−10, 10]^d | 0 | D ∪ {500, 1000}
f15(x) = 1 − cos(2π √(Σ_{i=1}^{d} xi^2)) + 0.1 √(Σ_{i=1}^{d} xi^2) | [−100, 100]^d | 0 | D
f16(x) = [1/500 + Σ_{j=1}^{25} 1/(j + Σ_{i=1}^{2} (xi − aij)^6)]^{−1} | [−65.536, 65.536]^d | 0.998 | 2
f17(x) = 4x1^2 − 2.1x1^4 + (1/3)x1^6 + x1x2 − 4x2^2 + 4x2^4 | [−5, 5]^d | −1.0316285 | 2
f18(x) = (x2 − (5.1/(4π^2))x1^2 + (5/π)x1 − 6)^2 + 10(1 − 1/(8π)) cos(x1) + 10 | [−5, 10] × [0, 15] | 0.398 | 2
f19(x) = [1 + (x1 + x2 + 1)^2 (19 − 14x1 + 3x1^2 − 14x2 + 6x1x2 + 3x2^2)] · [30 + (2x1 − 3x2)^2 (18 − 32x1 + 12x1^2 + 48x2 − 36x1x2 + 27x2^2)] | [−2, 2]^d | 3 | 2
f20(x) = −Σ_{i=1}^{4} ci exp(−Σ_{j=1}^{3} aij (xj − pij)^2) | [0, 1]^d | −3.86 | 3
f21(x) = −Σ_{i=1}^{4} ci exp(−Σ_{j=1}^{6} aij (xj − pij)^2) | [0, 1]^d | −3.32 | 6
f22(x) = −Σ_{i=1}^{5} [(x − ai)(x − ai)^T + ci]^{−1} | [0, 10]^d | −10.1532 | 4
f23(x) = −Σ_{i=1}^{7} [(x − ai)(x − ai)^T + ci]^{−1} | [0, 10]^d | −10.4029 | 4
f24(x) = −Σ_{i=1}^{10} [(x − ai)(x − ai)^T + ci]^{−1} | [0, 10]^d | −10.5364 | 4
f25(x) = 100(x1^2 − x2)^2 + (1 − x1)^2 + 90(x3^2 − x4)^2 + (1 − x3)^2 + 10.1[(x2 − 1)^2 + (x4 − 1)^2] + 19.8(x2 − 1)(x4 − 1) | [−10, 10]^d | 0 | 4
f26(x) = (x1^2 + x2 − 11)^2 + (x1 + x2^2 − 7)^2 | [−5, 5]^d | 0 | 2
f27(x) = (1 − 8x1 + 7x1^2 − (7/3)x1^3 + (1/4)x1^4) x2^2 exp(−x2) | [0, 5] × [0, 6] | −2.3458 | 2
f28(x) = −sin^2(x1 − x2) sin^2(x1 + x2) / √(x1^2 + x2^2) | [0, 10]^d | −0.673668 | 2

AC

CE

(denoted DER) were implemented as well. Those two methods were selected since they are widely used in the literature. Although the latter is the most common choice, the former has been successfully utilized in a number of studies (see e.g. Jabeen et al. (2009); Gao et al. (2012); Dong et al. (2012); Ram et al. (2015)). It must be noted that the first two methods require 2 · NP objective function evaluations (FEs), while the third requires only NP evaluations. All algorithms were implemented in the C# programming language.

4.1 Opposition-based Learning for Population Initialization

A method for initializing the EA population that utilizes opposition-based learning (OBL; see e.g. Tizhoosh (2005)) was proposed by Rahnamayan et al. (2007), and further integrated into DE in Rahnamayan et al. (2008). The method makes use of the definition of opposite points in order to generate a new population based on a starting uniformly random one. The definition states that for a given d-dimensional point v = (v1, . . . , vd), with vi ∈ [αi, βi] ⊂ R, its opposite ṽ is given by

    ṽi = αi + βi − vi ,   i = 1, . . . , d .   (12)

The method is conceptually simple and is outlined in Alg. 4.

Algorithm 4 Population initialization by OBL
1: generate a uniformly random population PR of size NP
2: for j := 1 → NP do
3:     calculate the opposite point ṽj for vj ∈ PR (Eq. 12)
4: end for
5: select the best NP individuals from the set PR ∪ {ṽ1, . . . , ṽNP} as the initial population
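For concreteness, Alg. 4 can be sketched as follows (a minimal NumPy implementation; the function name `obl_initialize` and its arguments are illustrative, not from the original paper):

```python
import numpy as np

def obl_initialize(f, bounds, NP, rng=None):
    """Population initialization by OBL (sketch of Alg. 4).

    f      -- objective function, maps a d-vector to a scalar
    bounds -- array of shape (d, 2) with rows (alpha_i, beta_i)
    NP     -- population size
    Note: this costs 2 * NP objective function evaluations.
    """
    rng = rng if rng is not None else np.random.default_rng()
    alpha, beta = bounds[:, 0], bounds[:, 1]
    # 1: generate a uniformly random population P_R of size NP
    P_R = rng.uniform(alpha, beta, size=(NP, len(alpha)))
    # 2-4: opposite points, Eq. (12): v~_i = alpha_i + beta_i - v_i
    P_opp = alpha + beta - P_R
    # 5: keep the best NP individuals from the union of both sets
    union = np.vstack((P_R, P_opp))
    fitness = np.array([f(v) for v in union])
    return union[np.argsort(fitness)[:NP]]
```

The returned population is sorted by fitness, so its first individual is the best of the 2 · NP evaluated candidates.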

4.2 Behavior of the used Population Initialization Methods


The initialization methods employed in DER, DEO and DEC generate the initial population in completely different fashions, but all have a randomly generated one in common. Hence, in order to provide some insight into the behavior of the aforementioned methods, at least on a conceptual level, they have been used to generate initial populations for several chosen functions for d = 2. For each, 50 solutions were generated by the used methods inside the whole search space. The generated populations are visualized in Fig. 1, where the star shape (F) represents the location of the optimum, and the diamond shape (♦) the best achieved solution. As is visible from Fig. 1, the populations generated by the used methods differ in structure. Populations generated by the usual/standard method do not exhibit a meaningful structure, whereas populations generated by the method employed in DEO exhibit a certain symmetry, which corresponds to the definition of opposite points. In case of the proposed method, small groups/clusters of solutions surrounding the optimum may be observed. Another interesting point is the coverage of the search space: populations generated by the proposed method usually cover a smaller area than those of the other two methods. This, together with the cluster formations, directly decreases diversity. However, since numerous solutions (tightly) surround the optimum, the probability of reaching it with fewer FEs is larger in comparison, since those solutions effectively drive the search towards those regions right from the beginning.
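To make the contrast with the proposed method concrete, the following sketch illustrates its two ingredients — partitional clustering of a uniformly random population and Cauchy mutation around the obtained centers. It is only an illustration, not a reproduction of Alg. 3: the clustering choice (plain k-means), the number of clusters k, the scale factor gamma and all identifiers are assumptions, not details taken from the paper.

```python
import numpy as np

def cluster_cauchy_initialize(f, bounds, NP, k=5, gamma=0.1, iters=10, rng=None):
    """Illustrative sketch: cluster a random population, then create
    further individuals around the cluster centers via Cauchy deviates.
    k, gamma and iters are assumed values, not taken from the paper."""
    rng = rng if rng is not None else np.random.default_rng()
    alpha, beta = bounds[:, 0], bounds[:, 1]
    d = len(alpha)
    # starting uniformly random population
    P = rng.uniform(alpha, beta, size=(NP, d))
    # a few plain k-means iterations on the random population
    centers = P[rng.choice(NP, size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((P[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    # new individuals: center + scaled Cauchy deviates, clipped to the box
    per_center = NP // k
    new = np.vstack([
        np.clip(c + gamma * (beta - alpha) * rng.standard_cauchy(d), alpha, beta)
        for c in centers for _ in range(per_center)
    ])
    # keep the best NP overall; like OBL this costs roughly 2 * NP evaluations
    union = np.vstack((P, new))
    fitness = np.array([f(v) for v in union])
    return union[np.argsort(fitness)[:NP]]
```

The heavy-tailed Cauchy deviates occasionally place individuals far from a center, which is what lets the clusters in Fig. 1 remain loosely spread rather than collapsing onto the centers.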

4.3 Experiment Setup

For each used algorithm and problem instance, 50 independent algorithm runs were performed. The values of the DE control parameters were the same for all algorithms and correspond to the ones used in Rahnamayan et al. (2007): NP = 100, CR = 0.9 and F = 0.5. Furthermore, all algorithms were allowed a maximum of 10⁶ FEs. Termination occurred after the targeted optimization error ∆f = |f* − f_best| < τ was reached or the maximum number of FEs was performed.
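The termination criterion can be stated compactly (a trivial helper; the name is illustrative):

```python
def reached_target(f_best, f_star, tau):
    """Targeted optimization error: Delta_f = |f* - f_best| < tau."""
    return abs(f_star - f_best) < tau
```

For example, with f* = 0 and τ = 10 (the value used for f4 and f5, see below), a run terminates once the best objective value falls below 10 in absolute value, or once the 10⁶-FE budget is exhausted.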


Figure 1: Visualization of populations generated by the different initialization methods


The value of τ was selected to be relatively large, since the main interest was in the performance of the population initialization methods. The following values were used: τ = 10 for f4 and f5, τ = 10² for f9, τ = 1 for f15 and f21, τ = 10⁻³ for f17, f18, f20, f27 and f28, and τ = 10⁻¹ for the rest of the problem instances. It should be noted that values smaller than τ were taken as zero.

4.4 Results and Discussion

The subsequent sections present the results obtained in the conducted experimental analysis. It must be noted that functions f16 ∼ f28 were considered as low-dimensional problem instances (d ∈ {2, 3, 4, 6}), whereas functions f1 ∼ f15 for d = 10, 30, 50 were considered as medium-dimensional problems. Furthermore, functions f1 ∼ f15 for d = 80, 100, 120 were considered as higher-dimensional problems, whereas functions f1, f3, f4, f5, f8, f9, f11, f12, f14, and f15 for d = 500, 1000 were considered as very high-dimensional problems.

4.4.1 Results on low and medium-dimensional cases


Tables 2, 3, 4 and 5 report the obtained results in terms of convergence rates and execution times. The convergence rate is expressed as the total (summed over all runs of an algorithm on a given function) number of function evaluations needed to reach the targeted optimization error (TERNFEs). The execution times are expressed as averages in milliseconds (t). Furthermore, the tables present the speed-up,

    speed-up = (1 − [Σ_{i=1}^{n} TERNFEs(f_i) for DEC] / [Σ_{i=1}^{n} TERNFEs(f_i) for DER/O]) · 100% ,   (13)

AC

CE

PT

achieved by the algorithm incorporating the proposed initialization method compared to the algorithms incorporating the other two approaches. As may be noted from the presented results, the proposed method yielded a substantial speed-up in comparison. Observing the TERNFEs values, it is easily noted that DEC, on almost all problem instances, needed considerably fewer FEs for reaching the targeted optimization error than DER and DEO. Considering only functions f1 ∼ f8 (unimodal functions and the step function), a slight drop in speed-up may be noticed as the problem dimensionality increases. In case of functions f9 ∼ f15 (multimodal with many minima), an increase for d = 30 and again a decrease for d = 50 is observable. As is evident from Alg. 3, the proposed method is certainly more complex and computationally expensive than the other two methods, but in light of the average execution times, and especially their totals, this is more than made up for during the algorithm runs. As may be noted, on the majority of problem instances, the average execution times are lower for DEC in comparison to DER and DEO. Exceptions are the functions f1 ∼ f8 for d = 10 (Table 2) and the low-dimensional functions f16 ∼ f28 (Table 5), where the achieved speed-up was not completely reflected in

Table 2: Convergence rate in terms of TERNFEs, and average execution time (in milliseconds) on functions f1 ∼ f15 for d = 10

f | DER (TERNFEs / t) | DEO (TERNFEs / t) | DEC (TERNFEs / t)
1 | 629598 / 5.3 | 624249 / 5.9 | 565073 / 9.1
2 | 663920 / 5.1 | 657601 / 6.2 | 612907 / 5.3
3 | 826731 / 7.7 | 818200 / 8.0 | 751954 / 7.6
4 | 258219 / 1.9 | 254872 / 2.1 | 175286 / 1.8
5 | 719079 / 7.0 | 710082 / 7.3 | 669812 / 7.0
6 | 564302 / 4.2 | 558167 / 5.3 | 497777 / 4.3
7 | 634274 / 7.3 | 630162 / 7.6 | 601424 / 7.8
8 | 534918 / 4.3 | 520497 / 5.3 | 469705 / 4.3
total | 4831041 / 42.80 | 4773830 / 47.86 | 4343938 / 47.22
speed-up | 10.08% | 9.01% | –
9 | 14200 / <0.1 | 13888 / 0.1 | 4921 / 0.1
10 | 706835 / 14.3 | 703421 / 15.2 | 642223 / 13.9
11 | 6701493 / 144.2 | 6763696 / 148.5 | 6410670 / 141.3
12 | 523939 / 15.5 | 519805 / 16.6 | 456325 / 13.4
13 | 531172 / 15.5 | 518419 / 15.7 | 461830 / 14.0
14 | 823035 / 14.8 | 814528 / 15.1 | 675851 / 13.1
15 | 399100 / 3.4 | 402740 / 4.0 | 343117 / 3.5
total | 9699774 / 207.78 | 9736497 / 215.16 | 8994937 / 199.22
speed-up | 7.27% | 7.62% | –

Table 3: Convergence rate in terms of TERNFEs, and average execution time (in milliseconds) on functions f1 ∼ f15 for d = 30

f | DER (TERNFEs / t) | DEO (TERNFEs / t) | DEC (TERNFEs / t)
1 | 2360755 / 44.7 | 2341683 / 46.0 | 2020483 / 40.5
2 | 2901022 / 57.4 | 2840518 / 58.3 | 2504481 / 51.0
3 | 8323830 / 279.0 | 8267451 / 278.4 | 7420359 / 253.1
4 | 1598451 / 32.9 | 1490386 / 30.1 | 840614 / 17.1
5 | 10894289 / 269.0 | 10704923 / 270.9 | 10587676 / 263.4
6 | 2137195 / 41.0 | 2145044 / 41.9 | 1825632 / 36.6
7 | 7616249 / 197.1 | 7536734 / 199.3 | 7388675 / 192.7
8 | 1983283 / 40.6 | 1988418 / 43.0 | 1630459 / 35.0
total | 37815074 / 961.72 | 37315157 / 967.88 | 34218379 / 889.26
speed-up | 9.51% | 8.30% | –
9 | 21239208 / 1051.0 | 20675487 / 1035.3 | 19013648 / 949.1
10 | 2461239 / 133.8 | 2434837 / 131.8 | 2115688 / 116.6
11 | 2565439 / 151.4 | 2552295 / 151.7 | 2178252 / 135.0
12 | 2032055 / 153.9 | 1981410 / 151.7 | 1597951 / 111.1
13 | 2296792 / 176.5 | 2253951 / 173.8 | 1899263 / 133.5
14 | 2659430 / 129.1 | 2625741 / 131.9 | 2055879 / 101.7
15 | 2319224 / 49.0 | 2264354 / 47.2 | 1921439 / 40.2
total | 35573387 / 1844.80 | 34788075 / 1823.24 | 30782120 / 1587.30
speed-up | 13.47% | 11.52% | –

the execution times. This comes as no surprise considering the low computational cost involved in their evaluation. Nonetheless, the obtained results suggest that the population initialization plays an insignificant part in the overall execution times, at least in the case of the employed methods. It must be remarked that no particular attention was paid to implementation efficiency; better implementations may therefore show larger differences in overall execution times. Insight into the behavior of the algorithms incorporating the initialization methods used in the analysis is provided in Fig. 2. The mean optimization error ∆f in relation to the number of function evaluations (NFEs) is shown for several chosen functions.
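As a concrete check of Eq. (13), the TERNFEs totals for f1 ∼ f8 at d = 10 (Table 2) reproduce the reported speed-ups:

```python
# TERNFEs totals over f1-f8, d = 10 (Table 2)
total_DEC = 4343938
total_DER = 4831041
total_DEO = 4773830

speed_up_vs_DER = (1 - total_DEC / total_DER) * 100  # Eq. (13)
speed_up_vs_DEO = (1 - total_DEC / total_DEO) * 100

print(round(speed_up_vs_DER, 2))  # 10.08, as reported in Table 2
print(round(speed_up_vs_DEO, 2))  # 9.01
```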


Table 4: Convergence rate in terms of TERNFEs, and average execution time (in milliseconds) on functions f1 ∼ f15 for d = 50

f | DER (TERNFEs / t) | DEO (TERNFEs / t) | DEC (TERNFEs / t)
1 | 3735224 / 115.0 | 3736745 / 116.4 | 3193656 / 100.2
2 | 4520364 / 141.4 | 4452341 / 142.4 | 3872435 / 122.9
3 | 30356378 / 2012.3 | 30394405 / 2048.2 | 26373547 / 1785.3
4 | – (80%)* / 416.9 | – (92%)* / 217.0 | – (98%)* / 82.1
5 | 34243369 / 1357.5 | 34426042 / 1378.7 | 33845052 / 1363.7
6 | 3420621 / 104.4 | 3423683 / 105.9 | 2857032 / 88.2
7 | 29997029 / 1180.8 | 29918351 / 1205.0 | 29468536 / 1180.0
8 | 3071125 / 98.9 | 3077865 / 99.9 | 2567702 / 85.9
total | 109344110 / 5427.20 | 109429432 / 5313.48 | 102177960 / 4808.34
speed-up | 6.55% | 6.63% | –
9 | 38618846 / 3112.2 | 37322170 / 3061.6 | 34606605 / 2781.8
10 | 3687426 / 317.3 | 3668995 / 307.6 | 3683472 / 327.1
11 | 3782256 / 368.1 | 3760713 / 366.8 | 3743924 / 375.1
12 | 3259775 / 392.5 | 3160072 / 389.9 | 3151983 / 379.0
13 | 3942799 / 469.7 | 3885502 / 458.2 | 3798797 / 463.1
14 | 3667200 / 288.9 | 3678935 / 292.4 | 3591100 / 287.0
15 | 4099442 / 128.4 | 4077163 / 127.8 | 4083576 / 129.5
total | 61057744 / 5077.12 | 59553550 / 5004.26 | 56659457 / 4742.58
speed-up | 7.20% | 4.86% | –

* Success rate, percentage of algorithm runs in which the targeted optimization error was reached.

Table 5: Convergence rate in terms of TERNFEs, and average execution time (in milliseconds) on functions f16 ∼ f28

f | DER (TERNFEs / t) | DEO (TERNFEs / t) | DEC (TERNFEs / t)
16 | 116201 / 15.0 | 106943 / 15.2 | 85463 / 12.1
17 | 75166 / 0.6 | 76165 / 0.9 | 80361 / 1.0
18 | 91805 / <0.1 | 84012 / 0.6 | 83664 / 0.4
19 | 58397 / 0.1 | 53949 / 0.1 | 51674 / 0.3
20 | 73899 / 0.3 | 75477 / 2.1 | 76606 / 1.0
21 | 11033 / <0.1 | 11012 / 0.1 | 9957 / 0.2
22 | 253652 / 1.5 | 253844 / 1.9 | 245285 / 2.9
23 | 239580 / 1.3 | 241033 / 2.0 | 222074 / 1.9
24 | 230185 / 1.7 | 231687 / 2.1 | 212711 / 2.1
25 | 345772 / 1.2 | 361446 / 2.0 | 322577 / 1.7
26 | 45235 / <0.1 | 48343 / 0.1 | 37522 / <0.1
27 | 41227 / <0.1 | 44204 / 0.7 | 36504 / 0.1
28 | 48498 / <0.1 | 50581 / 0.1 | 48314 / 0.1
total | 1630650 / 21.78 | 1638696 / 27.96 | 1512712 / 23.66
speed-up | 7.23% | 7.69% | –

The optimization error was recorded every 100 FEs. A significant advantage of DEC compared to DER and DEO right from the beginning (after initialization is completed) is clearly visible on the shown instances. This also suggests that the proposed method successfully generates initial populations containing relatively good solutions/individuals. Although the advantage decreases with the NFEs, DEC manages to preserve it, as evidenced by the previously mentioned speed-ups. The obtained results clearly suggest better performance of the algorithm incorporating the proposed initialization method in comparison to the ones incorporating the other two methods, which are widely used in the literature. This better performance manifested itself in considerably fewer NFEs needed for reaching the


Figure 2: Convergence graphs for several selected functions

targeted optimization error, which in turn led to lower execution times on the majority of problem instances. The aforementioned suggests that the proposed method would be especially well suited for situations in which the allowed NFEs is limited or in which the evaluation itself is expensive.


Table 6: Performance of the two initialization methods (used in DER and DEO) in comparison to the proposed one (used in DEC) on functions f1 ∼ f15 for d = 10

f | NFEs=100 (DER / DEO) | NFEs=200 (DER / DEO) | NFEs=400 (DER / DEO) | NFEs=600 (DER / DEO) | DEC
1 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
2 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
3 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
4 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
5 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
6 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
7 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
8 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
9 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
10 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
11 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
12 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
13 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
14 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
15 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
5 (−) | 15 / 15 | 15 / 15 | 15 / 15 | 15 / 15 | —
5 (≈) | 0 / 0 | 0 / 0 | 0 / 0 | 0 / 0 | —
4 (≈) | 0 / 0 | 0 / 0 | 0 / 0 | 0 / 0 | —
4 (+) | 0 / 0 | 0 / 0 | 0 / 0 | 0 / 0 | —

Table 7: Performance of the two initialization methods (used in DER and DEO) in comparison to the proposed one (used in DEC) on functions f1 ∼ f15 for d = 30

f | NFEs=100 (DER / DEO) | NFEs=200 (DER / DEO) | NFEs=400 (DER / DEO) | NFEs=600 (DER / DEO) | DEC
1 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
2 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
3 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
4 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
5 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
6 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
7 | 4 (≈) / 5 (≈) | 5 (−) / 5 (≈) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
8 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
9 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
10 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
11 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
12 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
13 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
14 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
15 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
5 (−) | 14 / 14 | 15 / 14 | 15 / 15 | 15 / 15 | —
5 (≈) | 0 / 1 | 0 / 1 | 0 / 0 | 0 / 0 | —
4 (≈) | 1 / 0 | 0 / 0 | 0 / 0 | 0 / 0 | —
4 (+) | 0 / 0 | 0 / 0 | 0 / 0 | 0 / 0 | —

4.4.2 Influence of the population size

The population size plays an important role in the performance of EAs. Bigger populations enable a more extensive exploration of the search space, but require a larger number of FEs to converge compared to smaller ones. Additionally, bigger populations may cover a larger part of the search space, which may lead to a quicker discovery of promising regions. With that in mind, the influence of the population size on the performance of the proposed initialization method was further investigated. For that purpose, the used initialization methods were tested with different budgets of FEs. More specifically, with budgets of NFEs = 100, 200,


Table 8: Performance of the two initialization methods (used in DER and DEO) in comparison to the proposed one (used in DEC) on functions f1 ∼ f15 for d = 50

f | NFEs=100 (DER / DEO) | NFEs=200 (DER / DEO) | NFEs=400 (DER / DEO) | NFEs=600 (DER / DEO) | DEC
1 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
2 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
3 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
4 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
5 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
6 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
7 | 4 (≈) / 5 (≈) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
8 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
9 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
10 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
11 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
12 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
13 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
14 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
15 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
5 (−) | 14 / 14 | 15 / 15 | 15 / 15 | 15 / 15 | —
5 (≈) | 0 / 1 | 0 / 0 | 0 / 0 | 0 / 0 | —
4 (≈) | 1 / 0 | 0 / 0 | 0 / 0 | 0 / 0 | —
4 (+) | 0 / 0 | 0 / 0 | 0 / 0 | 0 / 0 | —

Table 9: Performance of the two initialization methods (used in DER and DEO) in comparison to the proposed one (used in DEC) on functions f16 ∼ f28

f | NFEs=100 (DER / DEO) | NFEs=200 (DER / DEO) | NFEs=400 (DER / DEO) | NFEs=600 (DER / DEO) | DEC
16 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
17 | 4 (≈) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
18 | 5 (−) / 5 (≈) | 5 (−) / 5 (−) | 5 (≈) / 5 (−) | 5 (−) / 5 (−) | —
19 | 5 (≈) / 5 (≈) | 5 (−) / 5 (−) | 4 (≈) / 5 (≈) | 5 (≈) / 5 (−) | —
20 | 4 (≈) / 5 (≈) | 4 (≈) / 5 (≈) | 4 (≈) / 4 (≈) | 4 (≈) / 5 (≈) | —
21 | 5 (≈) / 5 (≈) | 5 (−) / 5 (≈) | 5 (−) / 5 (−) | 5 (≈) / 5 (≈) | —
22 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
23 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
24 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
25 | 5 (−) / 5 (−) | 5 (≈) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | —
26 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (≈) / 5 (≈) | —
27 | 5 (−) / 5 (−) | 5 (−) / 5 (−) | 5 (−) / 5 (≈) | 5 (−) / 5 (−) | —
28 | 5 (≈) / 5 (≈) | 5 (≈) / 5 (≈) | 5 (−) / 5 (≈) | 5 (−) / 5 (−) | —
5 (−) | 8 / 8 | 10 / 10 | 10 / 9 | 9 / 10 | —
5 (≈) | 3 / 5 | 2 / 3 | 2 / 3 | 3 / 3 | —
4 (≈) | 2 / 0 | 1 / 0 | 1 / 1 | 1 / 0 | —
4 (+) | 0 / 0 | 0 / 0 | 0 / 0 | 0 / 0 | —

AC

400 and 600 just for generating the initial populations. In case of the usual/random initialization method (employed in DER ) the population size corresponds to the allowed NFEs (NP ⇔ NFEs). In case of the method utilizing OBL and in case of the proposed method, the population size corresponds to half the allowed NFEs (NP ⇔ NFEs/2). Tables 6, 7, 8 and 9 report the relative quality (in terms of the mean optimization error) of the initial populations generated by the methods employed in DER and DEO in comparison to the proposed one, employed in DEC . The results are based on 50 independent runs per function. It must be noted that the reported values refer to the best solution in the initial populations. The sign 5 indicates that the mean 20


Figure 3: Performance of the initialization methods utilized in DER , DEO and DEC on several selected functions

quality is worse (larger mean) than the one achieved by the proposed method, while the opposite is indicated by the sign 4. Furthermore, in parentheses, the sign − indicates a statistically significant difference in favor of the proposed method, the opposite is indicated by the sign +, whereas the sign ≈ indicates that there is no statistically significant difference. The Wilcoxon signed rank test at the 95% confidence level was utilized for that matter.
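In practice such a comparison is performed with a library routine (e.g. scipy.stats.wilcoxon); a minimal self-contained sketch of the test is given below, using the normal approximation and ignoring ties and zero differences. It is purely illustrative and not the implementation used by the authors; the example error samples are hypothetical.

```python
import math

def wilcoxon_signed_rank_p(x, y):
    """Two-sided p-value of the Wilcoxon signed-rank test for paired
    samples x, y (normal approximation; ties/zeros not handled)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # rank the differences by absolute value (ranks 1..n)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value

# hypothetical per-run errors: the second sample is consistently lower
errors_a = [1.0 + 0.01 * i for i in range(50)]
errors_b = [e * 0.8 for e in errors_a]
p = wilcoxon_signed_rank_p(errors_a, errors_b)
print(p < 0.05)  # True: significant at the 95% confidence level
```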

From the presented results, it is clearly evident that the proposed method performs better than the usual method and the method utilizing OBL. It manages to generate initial populations containing considerably better solutions, regardless of the allowed budget, and this advantage is present on almost all problem instances. The advantage is only slightly less prominent in case of the low-dimensional functions (f16 ∼ f28), which comes as no surprise considering that the search space is small for the majority of those functions. Figure 3 provides some insight into how large the differences in quality are. As may be noted, the differences are significant, and mostly increase with problem dimensionality. Also, interestingly, increasing the budget allowed for the initialization does not always result in considerably or visibly better populations. This indicates that consistently finding better solutions on some of the used functions (both unimodal and multimodal) requires a more exhaustive and directed search than the employed initialization methods are able to perform with their limited NFEs.

4.4.3 Results on higher-dimensional cases


As was stated in Kazimipour et al. (2013), many studies treating advanced initialization methods are usually limited to the analysis of performance on low- and medium-dimensional problem instances. For example, the largest problem dimensionality considered in Rahnamayan et al. (2007) was d = 30, in de Melo & Delbem (2012) it was d = 60, whereas in Richards & Ventura (2004) it was d = 50. Thus, the question about the usefulness of different population initialization methods in higher-dimensional search spaces mostly remains unanswered. In order to answer this question, at least to some degree, additional testing and analysis of the proposed method on higher-dimensional problem instances were performed. Tables 10, 11 and 12 present the obtained results in terms of the total number of performed function evaluations (total NFEs) and average execution times (t) in milliseconds. Furthermore, the percent values in parentheses represent the success rate (SR) in reaching the targeted optimization error. Since the SR on some instances was below 100%, the Wilcoxon signed rank test (95% confidence level) was performed in order to assess the performance in terms of solution quality. In parentheses, the sign − indicates a statistically significant difference in favor of DEC, while the sign ≈ indicates that there is no statistically significant difference. Also shown in the tables is the improvement,

    improvement = (1 − [Σ_{i=1}^{n} total NFEs(f_i) for DEC] / [Σ_{i=1}^{n} total NFEs(f_i) for DER/O]) · 100% ,   (14)

in terms of performed NFEs, which was calculated analogously to (13). According to the shown results, the algorithm using the proposed initialization method (DEC) achieved mostly large improvements (in terms of the NFEs) compared to the ones using the other two initialization methods (DER and DEO).


Table 10: Convergence behavior in terms of total NFEs, and average execution time (in milliseconds) on functions f1 ∼ f15 for d = 80

f | DER (total NFEs / t) | DEO (total NFEs / t) | DEC (total NFEs / t)
1 | 5592210 (100%) / 264.7 | 5508089 (100%) / 261.2 | 4759367 (100%) / 226.6
2 | 6400680 (100%) / 310.7 | 6349332 (100%) / 306.3 | 5583410 (100%) / 270.4
3 | 50000000 (0%) (−) / 6670.9 | 50000000 (0%) (−) / 6714.2 | 50000000 (0%) / 6687.5
4 | 50000000 (0%) (−) / 2294.5 | 48262521 (4%) (−) / 2297.4 | 23208985 (58%) / 1071.1
5 | 50000000 (0%) (≈) / 3094.9 | 50000000 (0%) (≈) / 3274.9 | 50000000 (0%) / 3105.7
6 | 5168572 (100%) / 244.1 | 5059574 (100%) / 239.2 | 4360957 (100%) / 209.6
7 | 50000000 (0%) (≈) / 3105.9 | 50000000 (0%) (≈) / 3093.0 | 50000000 (0%) / 3050.2
8 | 4802160 (100%) / 264.1 | 4934372 (100%) / 253.8 | 4479827 (100%) / 230.1
total | 171963622 / 16249.68 | 171851367 / 16439.94 | 169183561 / 14851.4
improvement | 13.32% | 12.59% | –
9 | 40478105 (84%) (≈) / 5346.4 | 42193308 (70%) (−) / 5425.1 | 38587109 (90%) / 4902.7
10 | 5258891 (100%) / 752.2 | 5239621 (100%) / 725.1 | 4564096 (100%) / 623.3
11 | 5459632 (100%) / 876.7 | 5416636 (100%) / 845.8 | 4565335 (100%) / 736.8
12 | 10888742 (90%) (≈) / 1881.1 | 6059240 (100%) / 1084.4 | 3657658 (100%) / 597.8
13 | 8563183 (100%) / 1582.2 | 10129853 (96%) (≈) / 1683.6 | 5641513 (100%) / 909.0
14 | 5907063 (100%) / 800.9 | 5911885 (100%) / 737.2 | 4859737 (100%) / 612.4
15 | 7444791 (100%) / 380.0 | 7308487 (100%) / 352.6 | 6469809 (100%) / 312.2
total | 84000407 / 11619.5 | 82259030 / 10853.78 | 68345257 / 8694.2
improvement | 18.64% | 16.91% | –

Table 11: Convergence behavior in terms of total NFEs, and average execution time (in milliseconds) on functions f1 ∼ f15 for d = 100

f | DER (total NFEs / t) | DEO (total NFEs / t) | DEC (total NFEs / t)
1 | 6868725 (100%) / 403.2 | 6831917 (100%) / 413.6 | 5957787 (100%) / 350.8
2 | 7595983 (100%) / 454.4 | 7556683 (100%) / 478.6 | 6619519 (100%) / 396.0
3 | 50000000 (0%) (−) / 9712.4 | 50000000 (0%) (−) / 9829.0 | 50000000 (0%) / 9601.7
4 | 50000000 (0%) (−) / 2876.5 | 50000000 (0%) (−) / 2898.5 | 42787947 (16%) / 2419.4
5 | 50000000 (0%) (−) / 4194.0 | 50000000 (0%) (−) / 3934.1 | 50000000 (0%) / 3835.3
6 | 6285518 (100%) / 369.4 | 6278105 (100%) / 379.5 | 5333529 (100%) / 311.3
7 | 50000000 (0%) (≈) / 3882.4 | 50000000 (0%) (≈) / 3877.4 | 50000000 (0%) / 3797.4
8 | 7792071 (100%) / 758.8 | 8434526 (100%) / 489.8 | 8166257 (100%) / 509.3
total | 228542297 / 22651.08 | 229101231 / 22300.34 | 218865039 / 21221.22
improvement | 4.23% | 4.47% | –
9 | 41279611 (78%) (≈) / 6555.7 | 41636469 (74%) (≈) / 6683.8 | 37916268 (88%) / 7037.6
10 | 6508419 (100%) / 1082.9 | 6531713 (100%) / 1078.8 | 5726470 (100%) / 1242.2
11 | 6635141 (100%) / 1254.5 | 6540032 (100%) / 1277.7 | 5514219 (100%) / 1458.5
12 | 13048849 (94%) (≈) / 2679.1 | 13234648 (90%) (≈) / 2694.0 | 4865743 (100%) / 1357.3
13 | 17209505 (94%) (≈) / 3113.5 | 15492165 (98%) (≈) / 3438.5 | 7962195 (100%) / 2243.1
14 | 7656245 (100%) / 1167.4 | 7539307 (100%) / 1171.7 | 6333863 (100%) / 1339.2
15 | 10121500 (100%) / 591.4 | 10005952 (100%) / 597.5 | 8935042 (100%) / 764.1
total | 102459270 / 16444.48 | 100980286 / 16941.9 | 77253800 / 15442.04
improvement | 24.60% | 23.50% | –

This particularly applies to functions f9 ∼ f15 (multimodal functions with many minima). In case of functions f1 ∼ f8 (unimodal and step function), the improvement decreases with the increase in problem dimensionality, which may be attributed to the virtually nonexistent success in reaching the targeted optimization error on half of those functions. The aforementioned improvements achieved by DEC are also reflected in the average execution times, which are lower in comparison on most problem instances. A similar behavior of the algorithms as previously observed (Fig. 2) may be seen in Fig. 4. Accordingly, DEC shows significant advantages compared to DER and DEO right from the beginning (after initialization). Although those


Table 12: Convergence behavior in terms of total NFEs, and average execution time (in milliseconds) on functions f1 ∼ f15 for d = 120

f | DER (total NFEs / t) | DEO (total NFEs / t) | DEC (total NFEs / t)
1 | 8345483 (100%) / 574.6 | 8301895 (100%) / 577.9 | 7197444 (100%) / 497.9
2 | 9026403 (100%) / 714.0 | 8911652 (100%) / 657.8 | 7862073 (100%) / 580.5
3 | 50000000 (0%) (−) / 13388.2 | 50000000 (0%) (−) / 13445.2 | 50000000 (0%) / 13017.9
4 | 50000000 (0%) (−) / 3985.5 | 50000000 (0%) (−) / 3600.5 | 48201963 (4%) / 3305.7
5 | 50000000 (0%) (≈) / 5084.0 | 50000000 (0%) (−) / 4636.3 | 50000000 (0%) / 4674.0
6 | 7669343 (100%) / 636.4 | 7595743 (100%) / 527.1 | 6511978 (100%) / 456.4
7 | 50000000 (0%) (≈) / 5480.8 | 50000000 (0%) (≈) / 4575.3 | 50000000 (0%) / 4679.1
8 | 14312776 (100%) / 1454.2 | 13212640 (100%) / 975.9 | 13508834 (100%) / 1015.4
total | 239354005 / 31317.62 | 238021930 / 28995.94 | 233282292 / 28226.84
improvement | 2.54% | 1.99% | –
9 | 41144515 (74%) (≈) / 8736.6 | 40606879 (72%) (≈) / 7690.6 | 37848993 (86%) / 7294.7
10 | 12279621 (90%) (≈) / 2805.4 | 10552120 (94%) (≈) / 2083.4 | 11330892 (90%) / 2339.0
11 | 7865185 (100%) / 1916.8 | 7786883 (100%) / 1789.2 | 6648083 (100%) / 1556.0
12 | 23197719 (78%) (−) / 6347.3 | 14669624 (96%) (≈) / 3572.2 | 7238706 (98%) / 1724.8
13 | 29627870 (88%) (≈) / 7769.9 | 28990314 (90%) (≈) / 6608.5 | 16252973 (90%) / 3661.6
14 | 8774348 (100%) / 1752.7 | 8494864 (100%) / 1548.0 | 7203369 (100%) / 1331.4
15 | 13583091 (100%) / 1318.8 | 13664083 (100%) / 956.3 | 11831174 (100%) / 840.5
total | 136472349 / 30647.46 | 124764767 / 24248.22 | 98354190 / 18747.98
improvement | 27.93% | 21.17% | –


advantages decrease with the NFEs, they remain until the end, as indicated by the previously mentioned results. It is worth noting that in cases where the SR is lower than 100%, statistically significantly better solutions were often obtained by DEC. Furthermore, Fig. 4 suggests that the proposed method is capable of generating populations containing relatively good solutions even in case of higher-dimensional search spaces. The reported results provide (at least to some degree, i.e., for the problem instances considered) an answer to the question about the usefulness of advanced population initialization methods in higher-dimensional search spaces. According to those results, the answer is in favor of advanced initialization methods. It clearly suggests that those methods may be beneficial even in case of such spaces, and are capable of providing an edge over the commonly utilized, completely random initialization approach.

4.4.4 Results on several very high-dimensional cases


The increase in problem dimensionality usually results in an even larger increase in problem complexity (Weber et al., 2011). This becomes a particular challenge in case of large-scale optimization problems, as they tend to have a huge number of independent variables. Several hundreds or even thousands of variables are not uncommon for such problems. This increase in computational complexity cannot be matched by the same increase in computational resources. Due to this, different approaches are required to tackle such problems. An initial population containing relatively good solutions might provide a good starting position for the search. Additional testing and analysis were performed in order to shed some light on the impact of the used population initialization methods in such extreme cases. The analysis was conducted on a subset of the previously used functions. Due to


Figure 4: Convergence graphs for several selected higher-dimensional functions

the hugely increased problem complexity, the targeted optimization errors have been increased accordingly: τ = 10³ for f1 and f9, τ = 10⁴ for f3, f5 and f8, τ = 30 for f4, and τ = 10 for f11, f12, f14 and f15. The obtained results are reported in Tables 13 and 14, where, as previously, the SR and statistical significance are given in parentheses. Although the values of τ have been largely increased, the success rate of DER and DEO is considerably low compared to DEC, especially for d = 1000. Hence the corresponding improvements and differences in the average execution times. Besides


Table 13: Convergence behavior in terms of total NFEs, and average execution time (in milliseconds) for d = 500 1 3 4 5 8 total – 9 11 12 14 15 total –

DER total NFEs 23184643 (100%) 50000000 (0%) (−) 50000000 (0%) (−) 41701639 (96%) (≈) 12588581 (100%) 177474863 46.27% 20226704 (100%) 23196985 (100%) 50000000 (0%) (−) 19841894 (100%) 45066238 (64%) (−) 158331821 59.10%

t 5108 132099 10732 12483 3024 163446.60 12509 17386 48104 11807 10063 99869.60

DEO total NFEs 22591911 (100%) 50000000 (0%) (−) 50000000 (0%) (−) 39342687 (100%) 10896215 (100%) 172830813 44.83% 19611803 (100%) 22740062 (100%) 49876637 (4%) (−) 19102337 (100%) 40846942 (86%) (−) 152177781 57.44%

t 4994 132093 10846 11779 2628 162339.60

DEC total NFEs t 14113025 (100%) 3113 50000000 (0%) 131467 1423169 (100%) 314 25462833 (100%) 7587 4358827 (100%) 1048 95357854 143527.78 improvement 16885044 (100%) 10387 13761396 (100%) 10299 6754590 (100%) 5218 12479019 (100%) 7364 14882475 (100%) 3321 64762524 36588.60 improvement

CR IP T

f

12135 17051 45771 11375 9170 95502.06

62100 75048 109203 59431 22181 327963.02

DEO total NFEs 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 250000000 22.47% 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 250000000 20.60%

M

t 22316 499298 21275 29845 23852 596585.90

PT

1 3 4 5 8 total – 9 11 12 14 15 total –

DER total NFEs 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 250000000 22.47% 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 50000000 (0%) (−) 250000000 20.60%

ED

f

AN US

Table 14: Convergence behavior in terms of total NFEs, and average execution time (in milliseconds) for d = 1000 t 22421 499218 21402 29905 23943 596889.38 62116 75205 101653 59510 22301 320785.32

DEC total NFEs t 49948683 (4%) 22163 50000000 (0%) 496913 2945070 (98%) 1263 50000000 (0%) 29791 40942209 (76%) 19501 193835962 569631.60 improvement 33031662 (100%) 40780 49982314 (2%) 74849 15792332 (98%) 23682 49697957 (16%) 58456 50000000 (0%) 22134 198504265 219901.50 improvement

AC

CE
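The improvement figures reported alongside the totals are consistent with the relative reduction in total NFEs achieved by DEC with respect to DER and DEO. As a minimal check (the function name is illustrative, not from the paper):

```python
def nfe_improvement(nfe_ref, nfe_dec):
    """Relative reduction in total NFEs of DEC with respect to a
    reference variant (DER or DEO), expressed as a percentage."""
    return 100.0 * (nfe_ref - nfe_dec) / nfe_ref

# Table 13, first function group (d = 500): DEC vs. DER and DEC vs. DEO.
print(round(nfe_improvement(177474863, 95357854), 2))  # 46.27
print(round(nfe_improvement(172830813, 95357854), 2))  # 44.83
```

The same computation reproduces the remaining tabulated percentages (59.10%, 57.44%, 22.47%, and 20.60%).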

that, DEC achieved statistically significantly better solutions in the cases where the SR is below 100%. According to Fig. 5, this can be attributed to the large differences in the quality of the initial populations. The figure also, as in all previous cases, suggests that the proposed method is capable of generating significantly better initial populations than the other two methods (employed in DER and DEO). Even though DEC performed much better than DER and DEO in this comparison, it must be noted that the targeted optimization errors were mostly set very high. Nevertheless, the results are still promising. Considering the presented results, it can be concluded that advanced initialization methods alone are not enough to handle large-scale optimization problems. Nonetheless, they are certainly capable of providing a good starting position for the search, especially the proposed method. This advantage, in combination with appropriate techniques for enhancing the search process (see, e.g., Zamuda et al. (2008); Segura et al. (2015)), might ultimately yield substantial savings in computational cost.


5 Conclusion
The paper proposed a method for initializing the EA population. The method is based on the idea of identifying promising regions of the search space and exploring them further. As a first step, it employs (partitional) clustering on a starting uniformly random population. The obtained centers are then used to create further individuals/solutions in their proximity using a simple Cauchy mutation. The viability of the proposed method is supported by the results of the experimental analysis conducted on a broad set of standard benchmark functions, covering a wide range of problem dimensionalities. It was found that it successfully generates initial populations containing (relatively) good solutions, with regard to various population sizes. The main benefit it provides is an increased convergence rate, which usually reflected itself in lower execution times.

The main drawback of the proposed population initialization method clearly seems to be its computational complexity. This may be mainly attributed to the clustering of the starting population. Nonetheless, the obtained results suggest that in the majority of cases the time spent during initialization is more than made up for during the algorithm run. Another limitation of the proposed method is the possibility that clusters of solutions are generated exclusively near local optima, which may cause premature convergence or stagnation. However, the experimental results do not seem to indicate occurrences of such situations, even on highly multimodal functions. Moreover, this possibility could be reduced by increasing the ratio of randomly generated solutions to those generated by clustering and the Cauchy mutation. Increasing the number of randomly generated solutions in the initial population directly increases its diversity, which might be beneficial for some problems.

Although the proposed method was incorporated into the canonical DE, it represents a general approach and is thus applicable in other EAs. Therefore, its impact on the performance of other EAs and other DE variants could be investigated in the future. Another future direction that may be interesting to follow is constrained optimization, since the applicability and usefulness of the proposed method in such scenarios is rather unpredictable. Considering the method itself, it would be interesting to investigate its behavior after replacing the Cauchy distribution with another probability distribution, like the Lévy distribution, since it is essential in the generation of the initial population. Similarly, replacing k-means with another partitional clustering algorithm may prove useful.

Figure 5: Convergence graphs for several selected very high-dimensional functions
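The initialization procedure at the core of the paper (cluster a uniform random starting population with k-means, then generate part of the initial population around the obtained centers via a simple Cauchy mutation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameters `ratio` and `gamma`, the scaling of the deviates by the bound range, and the replacement scheme are assumptions.

```python
import numpy as np

def clustering_cauchy_init(pop_size, dim, bounds, k=5, ratio=0.5,
                           kmeans_iters=20, gamma=0.1, rng=None):
    """Sketch: cluster a uniform random population with k-means, then
    create part of the initial population near the cluster centers via
    Cauchy deviates. `ratio`, `gamma` and the replacement scheme are
    illustrative assumptions, not taken from the paper."""
    rng = np.random.default_rng(rng)
    low, high = bounds

    # Step 1: uniform random starting population.
    pop = rng.uniform(low, high, size=(pop_size, dim))

    # Step 2: plain Lloyd's k-means on the starting population.
    centers = pop[rng.choice(pop_size, size=k, replace=False)]
    for _ in range(kmeans_iters):
        dists = ((pop[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pop[labels == j].mean(axis=0)

    # Step 3: replace a fraction of the population with solutions created
    # around the centers by a simple Cauchy mutation:
    #   x = c + gamma * (high - low) * standard Cauchy deviate.
    n_new = int(ratio * pop_size)
    picks = rng.integers(0, k, size=n_new)
    deviates = rng.standard_cauchy(size=(n_new, dim))
    pop[:n_new] = np.clip(centers[picks] + gamma * (high - low) * deviates,
                          low, high)
    return pop
```

Calling, e.g., `clustering_cauchy_init(50, 10, (-5.0, 5.0))` yields a 50 × 10 population within the bounds, half of it concentrated around the k-means centers; the heavy tails of the Cauchy distribution still allow occasional samples far from a center.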

Acknowledgments. The authors would like to thank the anonymous reviewers and the editors for their careful reading and insightful comments that helped improve the paper.

References

Ali, M., Pant, M., & Abraham, A. (2013). Unconventional initialization methods for differential evolution. Applied Mathematics and Computation, 219, 4474–4494.

Amrane, Y., Boudour, M., Ladjici, A. A., & Elmaouhab, A. (2015). Optimal VAR control for real power loss minimization using differential evolution algorithm. International Journal of Electrical Power & Energy Systems, 66, 262–271.

Audet, C. (2014). A survey on direct search methods for blackbox optimization and their applications. In P. M. Pardalos, & T. M. Rassias (Eds.), Mathematics Without Boundaries (pp. 31–56). Springer New York.

Ayech, M. W., & Ziou, D. (2015). Segmentation of terahertz imaging using k-means clustering based on ranked set sampling. Expert Systems with Applications, 42, 2959–2974.

Cai, Z., Gong, W., Ling, C. X., & Zhang, H. (2011). A clustering-based differential evolution for global optimization. Applied Soft Computing, 11, 1363–1379.

Cassioli, A., & Schoen, F. (2013). Global optimization of expensive black box problems with a known lower bound. Journal of Global Optimization, 57, 177–190.

Chen, N., Chen, W.-N., & Zhang, J. (2015). Fast detection of human using differential evolution. Signal Processing, 110, 155–163.

Costa Salas, Y. J., Martínez Pérez, C. A., Bello, R., Oliveira, A. C., Chaves, A. A., & Lorena, L. A. (2015). Clustering search and variable mesh algorithms for continuous optimization. Expert Systems with Applications, 42, 789–795.

Das, S., Abraham, A., Chakraborty, U. K., & Konar, A. (2009). Differential evolution using a neighborhood-based mutation operator. IEEE Transactions on Evolutionary Computation, 13, 526–553.

Das, S., & Suganthan, P. (2011). Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 15, 4–31.

Dekking, F. M., Kraaikamp, C., Lopuhaä, H. P., & Meester, L. E. (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer-Verlag London.

Dong, N., Wu, C.-H., Ip, W.-H., Chen, Z.-Q., Chan, C.-Y., & Yung, K.-L. (2012). An opposition-based chaotic GA/PSO hybrid algorithm and its application in circle detection. Computers & Mathematics with Applications, 64, 1886–1902.

Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern Classification. Wiley-Interscience.

Eiben, A. E., & Smith, J. E. (2003). Introduction to Evolutionary Computing. Springer-Verlag.

Gao, W., Liu, S., & Huang, L. (2012). Particle swarm optimization with chaotic opposition-based population initialization and stochastic search technique. Communications in Nonlinear Science and Numerical Simulation, 17, 4316–4327.

García-Domingo, B., Carmona, C., Rivera-Rivas, A., del Jesus, M., & Aguilera, J. (2015). A differential evolution proposal for estimating the maximum power delivered by CPV modules under real outdoor conditions. Expert Systems with Applications, 42, 5452–5462.

Gong, W., Cai, Z., Ling, C. X., & Du, J. (2009). Hybrid differential evolution based on fuzzy c-means clustering. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO ’09) (pp. 523–530). New York, NY, USA: ACM.

Guerrero, J. L., Berlanga, A., & Molina, J. M. (2012). Initialization procedures for multiobjective evolutionary approaches to the segmentation issue. In 7th International Conference on Hybrid Artificial Intelligent Systems – Volume Part I (pp. 452–463). Berlin, Heidelberg: Springer-Verlag.

Jabeen, H., Jalil, Z., & Baig, A. R. (2009). Opposition based initialization in particle swarm optimization (O-PSO). In Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO ’09): Late Breaking Papers (pp. 2047–2052).

Jamil, M., & Yang, X. (2013). A literature survey of benchmark functions for global optimisation problems. International Journal of Mathematical Modelling and Numerical Optimisation, 4, 150–194.

Ji, M., & Klinowski, J. (2006). Taboo evolutionary programming: a new method of global optimization. Proceedings of the Royal Society A, 462, 3613–3627.

Kazimipour, B., Li, X., & Qin, A. (2013). Initialization methods for large scale global optimization. In 2013 IEEE Congress on Evolutionary Computation (CEC 2013) (pp. 2750–2757).

Maaranen, H., Miettinen, K., & Penttinen, A. (2007). On initial populations of a genetic algorithm for continuous optimization problems. Journal of Global Optimization, 37, 405–436.

Martinez-Estudillo, A. C., Hervas-Martinez, C., Martinez-Estudillo, F. J., & Garcia-Pedrajas, N. (2006). Hybridization of evolutionary algorithms and local search by means of a clustering method. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 36, 534–545.

Martinović, G., & Bajer, D. (2012). Impact of NNA implementation on GA performance for the TSP. In 5th International Conference on Bioinspired Optimization Methods and their Applications (BIOMA 2012) (pp. 173–184).

Martinović, G., Bajer, D., & Zorić, B. (2014). A differential evolution approach to dimensionality reduction for classification needs. International Journal of Applied Mathematics and Computer Science, 24, 111–122.

de Melo, V. V., & Delbem, A. C. B. (2012). Investigating smart sampling as a population initialization method for differential evolution in continuous problems. Information Sciences, 193, 36–53.

Nagano, M. S., da Silva, A. A., & Lorena, L. A. N. (2014). An evolutionary clustering search for the no-wait flow shop problem with sequence dependent setup times. Expert Systems with Applications, 41, 3628–3633.

Naldi, M. C., Campello, R. J. G. B., Hruschka, E. R., & Carvalho, A. C. P. L. F. (2011). Efficiency issues of evolutionary k-means. Applied Soft Computing, 11, 1938–1952.

Neri, F., & Tirronen, V. (2010). Recent advances in differential evolution: A survey and experimental analysis. Artificial Intelligence Review, 33, 61–106.

Nocedal, J., & Wright, S. (2006). Numerical Optimization. Springer-Verlag.

Oliveira, A. C. M., & Lorena, L. A. N. (2007). Hybrid evolutionary algorithms and clustering search. In Hybrid Evolutionary Algorithms (pp. 77–99). Springer Berlin Heidelberg.

Öztürk, M. M., Cavusoglu, U., & Zengin, A. (2015). A novel defect prediction method for web pages using k-means++. Expert Systems with Applications, 42, 6496–6506.

Paul, P. V., Moganarangan, N., Kumar, S. S., Raju, R., Vengattaraman, T., & Dhavachelvan, P. (2015). Performance analyses over population seeding techniques of the permutation-coded genetic algorithm: An empirical study based on traveling salesman problems. Applied Soft Computing, 32, 383–402.

Price, K., Storn, R. M., & Lampinen, J. A. (2005). Differential Evolution: A Practical Approach to Global Optimization. Secaucus, NJ, USA: Springer-Verlag New York, Inc.

Puris, A., Bello, R., Molina, D., & Herrera, F. (2012). Variable mesh optimization for continuous optimization problems. Soft Computing, 16, 511–525.

Rahnamayan, S., Tizhoosh, H., & Salama, M. M. A. (2008). Opposition-based differential evolution. IEEE Transactions on Evolutionary Computation, 12, 64–79.

Rahnamayan, S., Tizhoosh, H. R., & Salama, M. M. A. (2007). A novel population initialization method for accelerating evolutionary algorithms. Computers & Mathematics with Applications, 53, 1605–1614.

Rahnamayan, S., & Wang, G. G. (2009). Toward effective initialization for large-scale search spaces. WSEAS Transactions on Systems, 8, 355–367.

Ram, G., Mandal, D., Kar, R., & Ghoshal, S. P. (2015). Opposition-based BAT algorithm for optimal design of circular and concentric circular arrays with improved far-field radiation characteristics. International Journal of Numerical Modelling: Electronic Networks, Devices and Fields. In press.

Richards, M., & Ventura, D. (2004). Choosing a starting configuration for particle swarm optimization. In 2004 IEEE International Joint Conference on Neural Networks (pp. 2309–2312, Vol. 3).

Segura, C., Coello, C. A. C., & Hernández-Díaz, A. G. (2015). Improving the vector generation strategy of differential evolution for large-scale optimization. Information Sciences, 323, 106–129.

Sheng, W., Swift, S., Zhang, L., & Liu, X. (2005). A weighted sum validity function for clustering with a hybrid niching genetic algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35, 1156–1167.

Storn, R., & Price, K. (1997). Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11, 341–359.

Teboulle, M. (2007). A unified continuous optimization framework for center-based clustering methods. Journal of Machine Learning Research, 8, 65–102.

Theodoridis, S., & Koutroumbas, K. (2008). Pattern Recognition (4th ed.). Academic Press.

Tizhoosh, H. R. (2005). Opposition-based learning: A new scheme for machine intelligence. In International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’06), Vol. 1 (pp. 695–701). Washington, DC, USA: IEEE Computer Society.

Vendramin, L., Campello, R. J. G. B., & Hruschka, E. R. (2010). Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining, 3, 209–235.

Wang, Z., Duan, H., & Zhang, X. (2009). An improved greedy genetic algorithm for solving travelling salesman problem. In 5th International Conference on Natural Computation (ICNC ’09) (pp. 374–378, Vol. 5).

Weber, M., Neri, F., & Tirronen, V. (2011). Shuffle or update parallel differential evolution for large-scale optimization. Soft Computing, 15, 2089–2107.

Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G. J., Ng, A., Liu, B., Yu, P. S., Zhou, Z.-H., Steinbach, M., Hand, D. J., & Steinberg, D. (2007). Top 10 algorithms in data mining. Knowledge and Information Systems, 14, 1–37.

Xu, R., & Wunsch, D. (2009). Clustering. Wiley-IEEE Press.

Yao, X., Liu, Y., & Lin, G. (1999). Evolutionary programming made faster. IEEE Transactions on Evolutionary Computation, 3, 82–102.

Zamuda, A., & Brest, J. (2014). Vectorized procedural models for animated trees reconstruction using differential evolution. Information Sciences, 278, 1–21.

Zamuda, A., Brest, J., Bošković, B., & Žumer, V. (2008). Large scale global optimization using differential evolution with self-adaptation and cooperative coevolution. In 2008 IEEE Congress on Evolutionary Computation (CEC 2008) (pp. 3718–3725).

Zhang, G., Gao, L., & Shi, Y. (2011). An effective genetic algorithm for the flexible job-shop scheduling problem. Expert Systems with Applications, 38, 3563–3573.

Zhang, J., Chung, H. S. H., & Lo, W. L. (2007). Clustering-based adaptive crossover and mutation probabilities for genetic algorithms. IEEE Transactions on Evolutionary Computation, 11, 326–335.