Investigating Smart Sampling as a population initialization method for Differential Evolution in continuous problems


Information Sciences 193 (2012) 36–53

Contents lists available at SciVerse ScienceDirect

Information Sciences journal homepage: www.elsevier.com/locate/ins

Vinícius Veloso de Melo a,*, Alexandre Cláudio Botazzo Delbem b

a Institute of Science and Technology, Federal University of São Paulo, São José dos Campos, SP, Brazil
b Laboratory of Reconfigurable Computing, University of São Paulo, São Carlos, SP, Brazil

Article info

Article history: Received 15 March 2011; received in revised form 21 November 2011; accepted 31 December 2011; available online 12 January 2012.

Keywords: Metaheuristic; Smart Sampling; Promising region; Population initialization; Differential Evolution; Global Optimization

Abstract

Recent research has shown that the performance of metaheuristics can be affected by population initialization. Opposition-based Differential Evolution (ODE), Quasi-Oppositional Differential Evolution (QODE), and Uniform-Quasi-Opposition Differential Evolution (UQODE) are three state-of-the-art methods that improve the performance of the Differential Evolution algorithm based on population initialization and different search strategies. Taking a different approach to achieve similar results, this paper presents a technique to discover promising regions in the continuous search-space of an optimization problem. Using machine-learning techniques, the algorithm, named Smart Sampling (SS), finds regions with a high probability of containing a global optimum. Next, a metaheuristic can be initialized inside each region to find that optimum. SS and DE were combined (originating the SSDE algorithm) to evaluate our approach, and experiments were conducted on the same set of benchmark functions used by the authors of ODE, QODE and UQODE. The results show that the total number of function evaluations required by DE to reach the global optimum can be significantly reduced and that the success rate improves if SS is employed first. Such results are also in line with the literature, which states the importance of an adequate starting population. Moreover, SS is more effective at finding initial populations of superior quality than the other three algorithms, which employ oppositional learning. Finally, and most importantly, the performance of SS in finding promising regions is independent of the metaheuristic with which it is combined, making SS suitable for improving the performance of a large variety of optimization techniques. © 2012 Elsevier Inc. All rights reserved.

* Corresponding author. Tel.: +55 12 3309 9500. E-mail addresses: [email protected] (V.V. de Melo), [email protected] (A.C. Botazzo Delbem).
0020-0255/$ - see front matter © 2012 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2011.12.037

1. Introduction

The task of global optimization arises in several areas of real-world problems, such as protein structure prediction [8], logistics or circuit design (traveling salesman problem) [6], chemical engineering [35], and airspace design [10]. This task involves the minimization or maximization of a known objective function or an unknown black-box function. In general, these functions are highly complex and may be time-consuming, taking several days, weeks or months to achieve an adequate result, which may not be the global optimum. To solve this type of task, several global optimization metaheuristics have been developed.

Metaheuristics [13,46,21] are optimization techniques used to search for high-quality solutions of a problem, one of which is expected to be the global optimum. One of their main characteristics is that metaheuristics need neither gradient information to guide the search nor specific knowledge of a problem (heuristic), which makes them useful to solve a wide range of problems, including black-box ones. Several strategies have been investigated to improve the exploratory efficiency of metaheuristics in reaching the global optimum of a problem. For instance, strategies have been developed to reduce premature convergence and to increase the chance of escaping from local optima. Basically, those strategies, when applied to population-based metaheuristics, involve the maintenance of diversity in the set of solutions.

Another step that has been investigated to improve a metaheuristic's performance is population initialization [15,27,34]. Initial population generation involves an exploration phase. This phase allows the algorithm to select locations to be explored and others to be discarded. Traditional metaheuristics move toward the best solutions. Thus, a bad initialization that generates solutions close to each other (clusters of solutions) could leave large areas with no solutions to explore. On the other hand, if the population is very dispersed, a large number of iterations may be required to reach a local optimum. Moreover, if two solutions far from each other are combined, there is a high chance that the offspring will be closer to the best solution found, which could leave a large unexplored gap. Thus, an initialization method capable of providing a better exploration of the search-space and presenting only high-quality solutions should improve the performance of a metaheuristic.

Metaheuristics themselves are naturally guided towards promising regions [36] (see Fig. 1). There is an exploration phase and then the population moves towards the best solution found in order to exploit that region. On the other hand, a more desirable movement could be the one presented in Fig. 2. The exploration phase may take longer and the population can be split into more than one region. After that, the regions can be exploited independently.
This paper presents an approach to explore the search-space and to find promising regions. The objective is to aid global optimization algorithms by indicating the initial search-space areas with a higher probability of containing the global optimum. The approach is iteratively applied to explore the search-space inside promising regions, which become smaller at each iteration, similarly to the strategy proposed in [17], excluding areas considered unfavorable. The new approach, called Smart Sampling (SS), seeks to preserve diversity in more than one promising region, providing a better exploration of the search-space. First, SS generates some solutions, evaluates them using the objective function, and splits them into good and bad based on a threshold applied to the function value of each solution. Then, SS employs a machine learning algorithm to map characteristics of these good solutions, allowing it to check whether a new solution is good without evaluating it. If the new solution is identified as a good one, then it can be evaluated by the objective function. This approach works well on problems with small or large numbers of variables. In the last step, another machine learning algorithm separates the different promising regions to be exploited by any metaheuristic, which will refine the high-quality solutions found during the SS process. Therefore, SS is employed to increase the efficiency of global optimization algorithms, and can be essential to obtain satisfactory results in situations in which the execution of a large number of experiments is not viable.

Furthermore, and most importantly, several researchers have studied ways to improve well-known metaheuristics by using heuristics or local-search methods, or by creating hybrids. Here, we propose a technique that is neither an operator nor a strategy to be included in a search technique; it has been developed for use as a preprocessing phase. It can be directly used to improve the performance of any population-based continuous global optimization technique. SS is tested in conjunction with Differential Evolution (DE), a well-known metaheuristic, on several bound-constrained optimization problems with different properties. The quality of SSDE's solutions is better than DE's, with fewer evaluations. The paper also presents a performance comparison with three other approaches (ODE, QODE and UQODE) that improve DE to explore the search-space in an attempt to find promising regions and escape from local optima. The results show that SSDE provides considerably better performance than the other three approaches.

The paper is organized as follows: Section 2 contains some related works on population initialization and machine learning techniques used to improve metaheuristics; in Section 3, the proposed SS algorithm is presented in detail; the Differential Evolution algorithm is briefly described in Section 4; Section 5 presents a preliminary study on SSDE's behavior using some well-known benchmark functions. Some charts show the distribution of high-quality solutions during the SS procedure and convergence curves through the optimization process. The experiments comparing SSDE, DE and the other three DE improvements are presented and discussed in Section 5. Finally, Section 6 concludes the paper and presents future works.

Fig. 1. The common movement of solutions in metaheuristics. The lighter areas are promising regions with higher probability of finding local or global optima.


Fig. 2. A possibly better movement of solutions in metaheuristics.

2. Related works

The use of machine learning algorithms to improve metaheuristics is not new. Jourdan et al. [12] showed how classification and clustering techniques are applied to hybridize metaheuristics, either reducing the computational time by simplifying the objective function with an approximation technique or improving the quality of the search by adding background knowledge to the operators. Ramsey and Grefenstette [34] initialized a genetic algorithm using case-based reasoning to allow the system to bias the search toward promising areas. However, this approach is problem-specific. In [16], several distribution algorithms to cover the search-space were investigated. After a statistical analysis, the authors concluded that, although the uniform distribution presents undesired clusters of points, the evaluated problems did not benefit from a better distribution algorithm. On the other hand, Rahnamayan et al. [32] developed a theory stating that it is possible to improve the performance of a metaheuristic by using oppositional points in the initialization and during the optimization process. The theory was successfully evaluated on several benchmark problems using the Differential Evolution algorithm as the metaheuristic to test it. Pant et al. [27] tested a quasi-random sequence generator in conjunction with a Particle Swarm Optimization method for solving constrained optimization problems. The results showed that the proposed algorithm was a promising alternative to the classical PSO. Based on the promising results obtained by related works, we have started to investigate other ways to generate good starting solutions for metaheuristics by finding promising regions in the search-space.

3. Smart Sampling

The basic flowchart of SS is presented in Fig. 3. First, SS samples the search-space to identify the first large regions which must be explored. The higher the dimensionality of the problem, the larger the first sample. The main idea of SS is to perform a resampling only in areas considered promising regions, avoiding wasting evaluations in non-promising areas. If a simple random resampling is executed inside promising regions, the final iterations will lead to local optima. This behavior should be avoided when the number of local optima is considerably high. Instead of elaborating complex probabilistic models or allowing a metaheuristic to perform the exploration, a simple resampling strategy is applied to the best points to generate new ones. A machine-learning technique is applied to verify whether a new solution can be considered promising. The promising solutions are evaluated and the bad ones are discarded, avoiding the computational effort of searching in non-promising regions.

Fig. 3. Basic smart sampling flowchart.

The technique applied to separate promising solutions from bad ones must be generated in a simple way so that it can be quickly tested to predict the class (promising or non-promising) of the newly sampled solutions. Box et al. [3] argue that "all models are wrong, but some are useful". Based on this principle, the algorithm does not have to achieve 100% precision or accuracy in the identification. Moreover, it is important that it fails sometimes. This importance is due to a problem called overfitting [18]. Overfitting occurs when the learned model presents high precision on the solutions used in the training phase, but low precision on new (unseen) solutions in the test phase, meaning that the model has memorized the training solutions, but cannot generalize the learning process. In our approach, the algorithm is expected to identify all promising solutions correctly, and it is acceptable that it identifies some non-promising solutions as promising ones. The opposite must be avoided, but it is difficult to achieve such high precision without overfitting. Following that idea, SS keeps diversity to reduce the chance of discarding the area which contains the global optimum.
It is not possible to guarantee this outcome, because the search-space cannot be fully explored. However, when the search is performed for longer periods around and among the best solutions found so far, the chance of discarding the global optimum is considerably reduced. This is valid when at least one of the current solutions is in the promising region of the global optimum, which requires a large sample in the exploration phase.

Many techniques that use stochastic knowledge try to guide the search by using highly complex models of the search-space [14,28]. The approach proposed in this work uses a machine-learning technique, known as a classifier, to separate promising solutions from non-promising ones. When the SS process finishes, the promising regions must be identified. Some authors have proposed the use of a clustering algorithm for this procedure [23,25,4]. However, there are some drawbacks in this approach. The clustering result is dependent on the seed, which can make this approach less robust. Also, the number of desired clusters, which needs to be specified for the majority of clustering algorithms, is not an easy parameter to obtain. The unsupervised k-windows clustering algorithm [41] does not require the number of clusters as a parameter, and was used as an operator in a hybrid DE [40]. Nonetheless, the experiments were conducted only on four, mostly two-dimensional, benchmark functions. In this paper, problems with up to 60 dimensions were tested.

As the solutions are split into regions which must be identified and for which clustering cannot be used, another machine-learning algorithm has to be employed. In this case, we chose a technique whose output is adequate for this purpose, as presented in Subsection 3.2.2. In summary, the main advantages of SS are:

1. it can find one or more promising regions with a high probability of containing the global optimum, instead of guiding the search to a single region, as commonly occurs with several metaheuristics;
2. it can significantly reduce the total number of evaluations, when the best promising region is close to the global optimum;
3. it was developed for general use, i.e., differently from approaches developed for specific metaheuristics (niching, speciation) [7,2] and hybrid algorithms which identify these promising regions during the optimization process [4,11], SS can be employed as a preprocessing phase for any metaheuristic.

On the other hand, the main drawback of SS may be the computational cost. For each iteration of the algorithm, a machine learning technique must be employed to generate a model. If this task is time-consuming, SS will take a long processing time. This is one of the reasons why we have chosen a simpler model. However, if the most time-consuming part is the evaluation function, SS can be very attractive because it can find high-quality solutions using fewer evaluations.

3.1. Smart Sampling: details

This section provides more details about the SS algorithm presented in Fig. 3.

Generate an initial random population: in this step, the population is initialized randomly inside the bounds of the problem, as commonly performed in the literature. The population is evaluated and its best part is selected to be the population of the algorithm. It is important that this initial population be large enough to provide an adequate covering of the search-space. As this step will be performed only once, the cost (in number of evaluations) is acceptable;


Select promising solutions: the promising solutions are the ones that present the best fitness values. A threshold is used to select them. A low threshold value can lead to local optima very quickly. A high value will probably do the opposite, as poor solutions will be treated as promising ones. A conservative value of 50% is indicated;

Stop criteria: number of resamplings (trials) OR minimum window size, both user-defined. Let the limits be −50 ≤ x_i ≤ 150 and the minimum window be 1%, meaning that the difference between the lower limit (initially −50) and the upper limit (initially 150) must be larger than (150 − (−50)) × 0.01 = 2 for each x_i. For instance, the final limits can be −1 ≤ x_i ≤ 1, 0 ≤ x_i ≤ 2, or −0.5 ≤ x_i ≤ 1.5. It is important to notice that each variable has its own interval;

Learn promising solutions: in this work, k-Nearest Neighbor (kNN with k = 1), presented in Subsection 3.2.1, was used. It is important to observe that any classification technique can be used to differentiate promising from non-promising solutions. We chose kNN for simplicity. The procedure is shown in Algorithm 1;

Sample new promising solutions: use some procedure to generate new solutions and classify them. Then save the solutions classified as promising and discard the others. Repeat this process until a desired number of promising solutions has been generated. When finished, evaluate the new solutions. This step discards any solution classified as non-promising. To generate new solutions, a simple strategy (see Algorithm 2) moves the promising points towards one of the best points found so far, called the target point. However, this movement is made with some noise, which can rotate the point, shift it, or even move it away from the target point. This is not a problem because SS was developed exactly to explore the search-space around promising regions. Also, at each call to this procedure, a counter is incremented to reduce the noise. Moreover, after these new solutions have been generated, only the ones classified as promising will be evaluated and possibly inserted into the population;

Insert the new promising solutions into the population: after insertion, sort the population by fitness and discard the worst solutions, keeping the same size as the initial population;

Identify the promising regions: while kNN is used to learn the characteristics of the promising solutions, a rule-based learner (see Subsection 3.2.2) is used to separate the regions where the promising solutions are located. This step works as a clustering phase. Algorithm 3 was employed to obtain the solutions located inside each region.
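The minimum-window stop criterion can be illustrated with a minimal Python sketch. It assumes that the current interval of each variable is measured as the spread of the promising solutions along that variable; the function name `window_reached` and this way of measuring the interval are illustrative assumptions, not details given in the text.

```python
def window_reached(population, bounds, window_size=0.01):
    """Return True when every variable's current interval is narrow enough.

    population: list of promising solutions (each a list of D floats).
    bounds: list of (lower, upper) tuples with the original problem bounds.
    window_size: minimum window, as a fraction of the original range.
    """
    for j, (low, high) in enumerate(bounds):
        values = [solution[j] for solution in population]
        current_width = max(values) - min(values)
        minimum_width = (high - low) * window_size
        if current_width > minimum_width:
            return False  # this variable still spans a wide interval
    return True


# One variable with limits -50 <= x <= 150 and a 1% window:
# the minimum width is (150 - (-50)) * 0.01 = 2.
bounds = [(-50.0, 150.0)]
wide = [[-1.0], [3.5], [0.2]]     # width 4.5, larger than 2: keep sampling
narrow = [[-0.5], [0.5], [0.2]]   # width 1.0, not larger than 2: stop
print(window_reached(wide, bounds), window_reached(narrow, bounds))
```

Because each variable has its own interval, the check is made per dimension and the criterion is met only when all variables satisfy it.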

Algorithm 1. Procedure to learn the difference between promising and non-promising solutions.
input:
  P_{N,D}: matrix with the Population of solutions, where N is the number of solutions and D the number of variables (dimensions). P can be all solutions of the current iteration or only a sample of them.
  fitness_N: a vector with the fitness of P.
output:
  C: a classifier.
1: sort the fitness vector.
2: create the class_N vector, where the solutions with better fitness (based on a threshold) are set to "GOOD" and the worst are set to "BAD".
3: train C using P and class.
4: return C.
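As a rough illustration of Algorithm 1, the sketch below labels solutions with a 50% threshold and "trains" a 1-nearest-neighbor classifier in plain Python. It is not the Weka IBk implementation used in the paper; the helper names (`label_solutions`, `train_1nn`) and the assumption of a minimization problem are illustrative.

```python
import math


def label_solutions(fitness, threshold=0.5):
    """Label the best `threshold` fraction (lowest fitness) GOOD, the rest BAD."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    n_good = max(1, int(len(fitness) * threshold))
    labels = ["BAD"] * len(fitness)
    for i in order[:n_good]:
        labels[i] = "GOOD"
    return labels


def train_1nn(population, labels):
    """Training a 1-NN classifier amounts to storing the labelled solutions."""
    stored = list(zip(population, labels))

    def classify(candidate):
        # Return the label of the stored solution closest in Euclidean distance.
        nearest = min(stored, key=lambda item: math.dist(item[0], candidate))
        return nearest[1]

    return classify


def parabola(x):
    return sum(v * v for v in x)


population = [[0.1, -0.2], [4.0, 3.0], [-0.3, 0.1], [-5.0, 2.0]]
fitness = [parabola(x) for x in population]
classifier = train_1nn(population, label_solutions(fitness, threshold=0.5))
print(classifier([0.0, 0.0]))   # nearest stored point is [0.1, -0.2]
print(classifier([4.5, 2.5]))   # nearest stored point is [4.0, 3.0]
```

The returned `classify` function can then screen new candidates so that only those predicted as GOOD are passed to the objective function.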

Algorithm 2. Resampling Operator. Try resampling until enough new solutions have been generated.
input:
  PS_{N,D}: matrix with the Promising Solutions, where N is the number of promising solutions and D the number of variables (dimensions).
  tries: number of times this operator has been called.
  tp_D: target point, one of the best solutions found, randomly chosen from the best 50%, for instance.
  lower_lim: lower limit for the random uniform distribution.
  upper_lim: upper limit for the random uniform distribution.
output:
  NS_{N,D}: matrix of the New Solutions.
1: diff = tp − PS
2: NOISE_{N,D} = U(N*D, lower_lim, upper_lim)
3: NS = PS + (NOISE/tries * diff)
4: Assert NS[i, j] is inside the problem's bounds. If not, replace NS[i, j] by a value (randomly chosen) near the closest bound. For instance: if the interval is [−1, 1] and NS[i, j] = 1.328, then a possible result is NS[i, j] = 0.972.
5: return NS
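The resampling operator can be sketched in plain Python as follows. The step numbers in the comments refer to Algorithm 2; how far from the violated bound a repaired value is drawn (here, within 5% of the variable's range) is an assumption made only for illustration, since the algorithm just asks for a random value near the closest bound.

```python
import random


def resample(promising, target, tries, lower_lim, upper_lim, bounds, rng=random):
    """One call of the resampling operator on a list of promising solutions."""
    new_solutions = []
    for solution in promising:
        moved = []
        for j, (low, high) in enumerate(bounds):
            diff = target[j] - solution[j]               # step 1
            noise = rng.uniform(lower_lim, upper_lim)    # step 2, one draw per element
            value = solution[j] + noise / tries * diff   # step 3
            # Step 4: repair values that left the bounds (5% margin is assumed).
            margin = 0.05 * (high - low)
            if value > high:
                value = high - rng.uniform(0.0, margin)
            elif value < low:
                value = low + rng.uniform(0.0, margin)
            moved.append(value)
        new_solutions.append(moved)
    return new_solutions                                 # step 5


rng = random.Random(1)
bounds = [(-1.0, 1.0), (-1.0, 1.0)]
promising = [[0.9, -0.9], [0.0, 0.5]]
target = [1.0, 1.0]
new_points = resample(promising, target, tries=1, lower_lim=0.5,
                      upper_lim=1.5, bounds=bounds, rng=rng)
print(new_points)
```

Dividing the noise by `tries` shrinks the step as the operator is called more often, which matches the counter-based noise reduction described in the text.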


Algorithm 3. Obtain the points using the rules.
input:
  POP_{N,D}: matrix with the Population, where N is the number of promising solutions and D the number of variables (dimensions).
  Ruleset: set of rules given by the RBL.
output:
  Subsets: an array where each position stores the solutions that satisfy a different rule.
1: for i = 1 to length(Ruleset) do
2:   Subsets[i] = (solutions from POP that satisfy Ruleset[i]) // For instance: V1 > 0.5 and V7 < 33
3: end do
4: return Subsets
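A minimal Python sketch of Algorithm 3 follows. It assumes that each rule has already been parsed into a conjunction of (variable index, comparison, threshold) tests; this representation, the helper names and the two example rules are hypothetical and do not reproduce the JRip output format.

```python
import operator

OPERATORS = {"<": operator.lt, "<=": operator.le,
             ">": operator.gt, ">=": operator.ge}


def satisfies(solution, rule):
    """A rule is a conjunction of (variable_index, comparison, threshold) tests."""
    return all(OPERATORS[op](solution[var], value) for var, op, value in rule)


def split_by_rules(population, ruleset):
    """Return one subset of the population per rule (steps 1 to 4)."""
    return [[s for s in population if satisfies(s, rule)] for rule in ruleset]


# Two hypothetical rules over a 2D problem (variables indexed from 0).
ruleset = [
    [(0, ">", -20.0), (0, "<", 20.0), (1, "<", 0.0)],
    [(0, ">", 30.0), (1, ">", 30.0)],
]
population = [[1.0, -48.0], [45.0, 52.0], [-5.0, -60.0], [-80.0, 10.0]]
subsets = split_by_rules(population, ruleset)
print(subsets)
```

A solution that triggers no rule, such as the last point above, simply does not appear in any subset.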

By reducing the initial search-space, the starting solutions that will be used by the metaheuristic can be concentrated inside the promising region found. Thus, SS tries to avoid the creation of solutions distant from the global optimum and to guarantee a high-quality sampling near the global optimum. The best promising region may not contain the global optimum. For instance, the global optimum can be x* = (0, 0), but a promising region found by SS may be defined by x1 ∈ [−1, 1] and x2 ∈ [0.5, 1], where x* is outside the interval of x2. To deal with this problem, a portion of the population of the metaheuristic can be randomly initialized outside the promising region in an attempt to generate at least one individual near the global optimum. Because the metaheuristic is not limited to the promising region, the recombination of the outside solutions with the solutions inside the promising region can reach the global optimum.

When SS is finished, the final promising areas must be separated. This is the second phase of the algorithm (dashed region in Fig. 3). Some authors have proposed the use of a clustering algorithm to accomplish this task [25,4,24]; however, there are some disadvantages in this approach. Many clustering algorithms use stochastic approaches, thus depending on the initial configuration of the clusters, which is corrected through the process. Moreover, the number of clusters is an input parameter that must be specified for most clustering algorithms. Obtaining such a parameter is not trivial; therefore, we chose a different path: the application of a classifier to generate rules that differentiate the areas.

3.2. Classifiers

Classification is a technique that consists in learning from a previously labeled set of examples to generate a model capable of correctly labeling another set of unknown or unseen instances. This technique has been used, for example, in fraud detection, data mining, pattern recognition, and drug discovery [18]. The most common classifiers are based on trees, neural networks, and rules. To work with classifiers, we used the Weka [44,45] open source machine-learning package (www.cs.waikato.ac.nz/ml/weka/). We chose the simplest classifier to separate promising solutions from non-promising ones (kNN), and a more accurate classifier to identify and separate the final promising regions (rule-based).

3.2.1. k-Nearest Neighbor

Instance-based classifiers, such as the k-nearest neighbor (kNN) algorithm, are amongst the simplest machine-learning algorithms. They classify unknown instances by relating them to the known instances according to some distance/similarity function. In other words, two instances that are close according to an appropriate distance function tend to belong to the same class, while two distant instances tend to belong to different classes. An object is classified by a majority vote of its neighbors, being assigned to the most common class amongst its k nearest neighbors, where k is typically a small positive integer. If k = 1, then the object is simply assigned to the class of its nearest neighbor. Neighbors are solutions from the training set represented by position vectors in a multidimensional feature space. It is usual to use the Euclidean distance and, in an optimization problem where the solutions are points on a response surface [20], this is the most suitable distance function. The instance-based kNN learner used in this work was IBk [1].

3.2.2. Rule-based learner

A rule-based classifier uses a set of IF-THEN rules to classify instances. An IF-THEN rule is a logical expression of the form

IF condition THEN conclusion.

An example of a rule, R1, is

R1: IF weather = sunny AND wind = weak THEN play_tennis = yes.

The conclusion of the rule contains a class prediction. In the case presented for R1, the class is "to play tennis" (yes or no). Using this type of structure, it is possible to parse the ruleset and obtain the set of instances that triggers each rule, splitting them into different regions.


As an example, let us use the Tripod function (see Fig. 4). This function has 2 local optima and 1 global optimum, which is located at x* = (0, −50). First, generate a random set of points. Evaluate them and select the promising solutions (the best 25%, for instance) to be of class "P" and the remaining points to be of class "n". Finally, train the RBL to separate the solutions. After this process, a possible set of promising regions is the one presented in Fig. 5, and the corresponding set of rules is shown in Fig. 6.

The rule-based learner used in this work was JRip, Weka's implementation of RIPPER (Repeated Incremental Pruning to Produce Error Reduction) [5]. Other classifiers, such as neural networks or SVMs, are able to determine regions with arbitrary shapes, reducing misclassifications. However, as SS was expected to be simple and fast, more complex and time-consuming techniques were not considered in this work.

4. Differential Evolution

Differential Evolution was introduced by Storn and Price in 1995 [37]. It is a floating-point-encoded, population-based metaheuristic, similar to classical evolutionary algorithms, successfully used to solve several benchmarks and real-world problems [22,26,43,42]. A population P of D-dimensional vectors is randomly initialized (using a uniform distribution) inside the problem's bounds and evaluated using the fitness function of the problem. Next, until a stop criterion has been met, the algorithm runs a loop of mutation, crossover and selection operators. For each vector x_i of the population, the mutation operator uses the weighted difference of parent solutions to generate a mutation vector v_i. Two of the most frequently used mutation strategies are:

1. rand/1/bin

$$v_i = x_{r1} + F \cdot (x_{r2} - x_{r3}) \qquad (1)$$

2. local-to-best/1/bin

$$v_i = x_i + (x_{best} - x_i) + x_{r1} + F \cdot (x_{r2} - x_{r3}) \qquad (2)$$

where x_r1, x_r2 and x_r3 are three distinct and randomly chosen vectors from P, x_best is the best vector from P, and F ∈ [0, 2] is the mutation factor. The crossover operator is applied on v_i to generate the final offspring vector u_i:

$$u_{i,j} = \begin{cases} v_{i,j}, & \text{if } U(0, 1) \le CR \text{ or } j == \operatorname{trunc}(U(1, D)), \\ x_{i,j}, & \text{otherwise} \end{cases} \qquad (3)$$

where j = 1, ..., D, U(a, b) is a random floating-point number from a uniform distribution between a and b, generated for each j, and CR ∈ [0, 1] is the crossover probability. Finally, the selection step keeps the better evaluated vector between x_i and u_i: the offspring replaces the parent if its fitness value is better; otherwise, the parent is maintained in the population.

5. Evaluating SS

The behavior of SS on continuous 2D test functions (Ackley, Alpine, Griewank, Parabola, Rastrigin, Rosenbrock and Tripod; see the mathematical definitions in Table 1) is illustrated, respectively, in Figs. 7–13. For this experiment, SS was configured as follows: window_size = 0.01, lower_lim = 0.5 and upper_lim = 1.5. The lim values are used in Algorithm 2. The graphs in the figures show the reduction of the promising regions through the SS iterations according to the distribution of high-quality solutions in the search-space. At each iteration, high-quality solutions are selected and a Kernel Density Estimation algorithm is applied to generate the plot. The gradation of colors, from dark to light, indicates the concentration of high-quality solutions: the lighter the color, the higher the concentration. These figures show that SS is capable of finding small regions close to the global optimum (x*).

Fig. 4. Surface of the Tripod function (see mathematical definition in Table 1).

Fig. 5. Example of regions (rectangles) detected according to the rules generated for the Tripod function. P are the solutions classified as promising and n as non-promising. Points in lighter color were incorrectly classified by the rules.

Fig. 6. Example of rules generated for the points in Fig. 5.

Table 1. Test functions used to plot the density charts showing the SS procedure. B contains the problem's bounds, x* is the global optimum, and D = 2.

Name | Definition | B | x*
Ackley | $f(x) = -20\exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D} x_i^2}\right) - \exp\left(\frac{1}{D}\sum_{i=1}^{D}\cos(2\pi x_i)\right) + 20 + e$ | $[-16, 48]^D$ | (0, 0)
Alpine | $f(x) = \sum_{i=1}^{D} \lvert x_i \sin(x_i) + 0.1 x_i \rvert$ | $[-5, 15]^D$ | (0, 0)
Griewank | $f(x) = \frac{1}{4000}\sum_{i=1}^{D} x_i^2 - \prod_{i=1}^{D}\cos\left(\frac{x_i}{\sqrt{i}}\right) + 1$ | $[-200, 800]^D$ | (0, 0)
Parabola | $f(x) = \sum_{i=1}^{D} x_i^2$ | $[-100, 100]^D$ | (0, 0)
Rastrigin | $f(x) = \sum_{i=1}^{D}\left(x_i^2 - 10\cos(2\pi x_i) + 10\right)$ | $[-5.12, 5.12]^D$ | (0, 0)
Rosenbrock | $f(x) = \sum_{i=1}^{D-1}\left[100\left(x_{i+1} - x_i^2\right)^2 + (x_i - 1)^2\right]$ | $[-30, 30]^D$ | (1, 1)
Tripod | $f(x) = \begin{cases} \lvert x_1 \rvert + \lvert x_2 + 50 \rvert, & \text{if } x_2 < 0 \\ 1 + \lvert x_1 + 50 \rvert + \lvert x_2 - 50 \rvert, & \text{else if } x_1 < 0 \\ 2 + \lvert x_1 - 50 \rvert + \lvert x_2 - 50 \rvert, & \text{otherwise} \end{cases}$ | $[-100, 100]^D$ | (0, −50)

Fig. 7. SS procedure in Ackley function. x1 and x2 are the two variables of the problem.

Fig. 8. SS procedure in Alpine function.

Fig. 9. SS procedure in Griewank function.

Fig. 10. SS procedure in Parabola function.

Fig. 11. SS procedure in Rastrigin function.

Fig. 12. SS procedure in Rosenbrock function.

Fig. 13. SS procedure in Tripod function.

For those same functions, an experiment using D = 10 was performed (except for the Tripod function, where D = 2) and the convergence curves of DE and SSDE were compared. The total number of evaluations, including the SS procedure, was set to D × 5000, and the configuration of DE was popsize = 100, F = 0.5, CR = 0.9, and strategy rand/1/bin. After SS has stopped, the region where the best point found is located is chosen as the starting region. To initialize the population for DE, the best 50 points of the region are selected, and another 50 points are randomly generated outside the region. For each of the seven functions, 50 runs were performed. The convergence curves, using the median of the 50 runs, are presented in Fig. 14. SS can find more than one promising region, but we selected only the region where the best solution was found. This methodology was adopted in this experiment only to present the curves; in real problems, it is advisable that more regions be investigated by DE.

Fig. 14 shows that SSDE provides better final solutions. For the Ackley, Parabola and Tripod (from the last row) functions, both curves (DE and SSDE) present a similar behavior. One possible conclusion is that DE is only refining the best solution inside the region found. The Tripod function is presented twice. The first plot corresponds to window_size = 0.01, for which the optimum was not found: the region containing the global optimum was also found by SS but, as it was second in the queue for investigation, it was not used by DE. On the other hand, the second Tripod plot (in the last row) corresponds to window_size = 0.1 and upper_lim lowered from 1.5 to 0.5, limiting the exploration.
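The experimental setup described above (DE with rand/1/bin, started from a population that mixes the best points of the promising region with random points outside it) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names, the clamping of mutant values to the bounds, the hypothetical region, and the small population and generation counts used in the usage lines are all assumptions.

```python
import random


def parabola(x):
    return sum(v * v for v in x)


def ssde_initial_population(region_points, region, bounds, n_outside, rng=random):
    """Points from the promising region plus random points outside it."""
    population = [list(p) for p in region_points]
    while len(population) < len(region_points) + n_outside:
        candidate = [rng.uniform(low, high) for low, high in bounds]
        inside = all(low <= v <= high for v, (low, high) in zip(candidate, region))
        if not inside:
            population.append(candidate)
    return population


def de_rand_1_bin(objective, population, bounds, F=0.5, CR=0.9,
                  generations=100, rng=random):
    """Minimise `objective` with DE rand/1/bin from a given starting population."""
    pop = [list(x) for x in population]
    fit = [objective(x) for x in pop]
    dims = len(bounds)
    for _ in range(generations):
        for i in range(len(pop)):
            r1, r2, r3 = rng.sample(range(len(pop)), 3)   # three distinct vectors
            forced = rng.randrange(dims)   # index that always takes the mutant value
            trial = []
            for j, (low, high) in enumerate(bounds):
                if rng.random() <= CR or j == forced:
                    value = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])  # Eq. (1)
                    value = min(max(value, low), high)  # clamping is an assumption
                else:
                    value = pop[i][j]
                trial.append(value)
            trial_fit = objective(trial)
            if trial_fit < fit[i]:   # offspring replaces the parent only if better
                pop[i], fit[i] = trial, trial_fit
    best = min(range(len(pop)), key=lambda k: fit[k])
    return pop[best], fit[best]


rng = random.Random(0)
bounds = [(-100.0, 100.0)] * 2
region = [(-5.0, 5.0)] * 2   # hypothetical promising region
region_points = [[rng.uniform(low, high) for low, high in region] for _ in range(10)]
population = ssde_initial_population(region_points, region, bounds, 10, rng)
best, best_fit = de_rand_1_bin(parabola, population, bounds, generations=200, rng=rng)
print(best_fit)
```

Keeping some individuals outside the region lets recombination reach an optimum that lies just beyond the region's limits, which is the motivation given in Section 3.1.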
In this case, the region with the global optimum was selected more often, resulting in a much better curve. The performance was very similar in the Rastrigin and Griewank functions. SSDE improved the best solution much faster than DE did. The region with the best solution was very close to the global optimum, making DE's work easy and cheap. Moreover, SSDE found the global optimum with a very small error (< 1e−200). This is reflected in the vertical line close to generation 400. When DE was applied alone to those functions, the improvement was slow and the best solution found was of much lower quality. For the Alpine function, a fast improvement was observed in the first hundred generations of SSDE, followed by smaller improvements until the end. Nonetheless, SSDE was able to find better solutions. Similarly to what occurred with the Tripod function, Rosenbrock presented an unexpected result. SS got trapped in the local minimum region close to the solution s = (0, ..., 0), whereas the global optimum is x* = (1, ..., 1). This result is depicted in the first Rosenbrock chart, which shows that SS very quickly finds a region that is promising but poor. The second Rosenbrock chart (in the last row) was obtained when the lower_limit was changed from 0.5 to 0.5. This change biases the search towards the best point found at each iteration, instead of allowing the algorithm to explore the response surface. Such a change accelerated the process by orders of magnitude, as seen in the chart. These results show that SS is able to find promising regions containing high-quality solutions. Those solutions can be used as the starting population of a metaheuristic, such as DE, without any modification to the metaheuristic. SS improved DE by finding much better solutions using fewer evaluations and less computational time. The next section compares our approach to initialization methods that use Opposition-Based Learning.

[Fig. 14 panels: ackley, alpine, griewank, parabola, rastrigin, rosenbrock, tripod and, in the last row, a second rosenbrock and a second tripod panel. Each panel plots the DE and SSDE curves on a logarithmic vertical axis.]

Fig. 14. Median curves over 50 trials for functions from Table 1. The circle at the beginning of SSDE's curve marks the generation in which the SS procedure finished and DE started. The horizontal axis presents the number of generations, and the vertical axis presents the objective function value.

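Table 2
Benchmark function definitions. S.P.B.: shifted parameter bounds.

| Name | Definition | S.P.B. | f(x*) |
|---|---|---|---|
| 1st De Jong | $f_1(X) = \sum_{i=1}^{D} x_i^2$ | $[-2.56, 7.68]^D$ | 0 |
| Axis parallel hyperellipsoid | $f_2(X) = \sum_{i=1}^{D} i x_i^2$ | $[-2.56, 7.68]^D$ | 0 |
| Schwefel's Problem 1.2 | $f_3(X) = \sum_{i=1}^{D} \left( \sum_{j=1}^{i} x_j \right)^2$ | $[-32.5, 97.5]^D$ | 0 |
| Rastrigin's function | $f_4(X) = 10D + \sum_{i=1}^{D} \left( x_i^2 - 10 \cos(2\pi x_i) \right)$ | $[-2.56, 7.68]^D$ | 0 |
| Griewank's function | $f_5(X) = 1 + \sum_{i=1}^{D} \frac{x_i^2}{4000} - \prod_{i=1}^{D} \cos\left( \frac{x_i}{\sqrt{i}} \right)$ | $[-300, 900]^D$ | 0 |
| Sum of different powers | $f_6(X) = \sum_{i=1}^{D} \lvert x_i \rvert^{(i+1)}$ | $[-0.5, 1.5]^D$ | 0 |
| Ackley's problem | $f_7(X) = -20 \exp\left( -0.2 \sqrt{\frac{\sum_{i=1}^{D} x_i^2}{D}} \right) - \exp\left( \frac{\sum_{i=1}^{D} \cos(2\pi x_i)}{D} \right) + 20 + e$ | $[-16, 48]^D$ | 0 |
| Levy function 13 | $f_8(X) = \sin^2(3\pi x_1) + \sum_{i=1}^{D-1} (x_i - 1)^2 \left( 1 + \sin^2(3\pi x_{i+1}) \right) + (x_D - 1) \left( 1 + \sin^2(2\pi x_D) \right)$ | $[-10, 10]^D$ | 0 |
| Michalewicz function | $f_9(X) = -\sum_{i=1}^{D} \sin(x_i) \left( \sin(i x_i^2 / \pi) \right)^{2m},\; m = 10$ | $[0, \pi]^D$ | −9.66015; −19.637014 |
| Zakharov function | $f_{10}(X) = \sum_{i=1}^{D} x_i^2 + \left( \sum_{i=1}^{D} 0.5 i x_i \right)^2 + \left( \sum_{i=1}^{D} 0.5 i x_i \right)^4$ | $[-5, 10]^D$ | 0 |
| Schwefel's Problem 2.22 | $f_{11}(X) = \sum_{i=1}^{D} \lvert x_i \rvert + \prod_{i=1}^{D} \lvert x_i \rvert$ | $[-5, 15]^D$ | 0 |
| Step function | $f_{12}(X) = \sum_{i=1}^{D} \left( \lfloor x_i + 0.5 \rfloor \right)^2$ | $[-50, 150]^D$ | 0 |
| Alpine function | $f_{13}(X) = \sum_{i=1}^{D} \lvert x_i \sin(x_i) + 0.1 x_i \rvert$ | $[-5, 15]^D$ | 0 |
| Exponential problem | $f_{14}(X) = -\exp\left( -0.5 \sum_{i=1}^{D} x_i^2 \right)$ | $[-0.5, 1.5]^D$ | −1 |
| Salomon problem | $f_{15}(X) = 1 - \cos(2\pi \lVert x \rVert) + 0.1 \lVert x \rVert$, where $\lVert x \rVert = \sqrt{\sum_{i=1}^{D} x_i^2}$ | $[-50, 150]^D$ | 0 |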
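As an illustration of Table 2, the sketch below implements three of the listed functions (f1, f4 and f7) together with the 50% domain shift that produces the S.P.B. column. It is a minimal Python sketch, not the authors' R code; the names `shifted_bounds`, `sphere`, `rastrigin` and `ackley` are chosen here for illustration only.

```python
import math


def shifted_bounds(a):
    """Shift the symmetric domain [-a, a] by 50%, giving [-a + a/2, a + a/2]."""
    return (-a + a / 2.0, a + a / 2.0)


def sphere(x):
    """f1 (1st De Jong): sum of squared coordinates."""
    return sum(xi * xi for xi in x)


def rastrigin(x):
    """f4: 10*D + sum(x_i^2 - 10*cos(2*pi*x_i))."""
    return 10.0 * len(x) + sum(
        xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x
    )


def ackley(x):
    """f7: Ackley's problem as written in Table 2."""
    d = len(x)
    mean_sq = sum(xi * xi for xi in x) / d
    mean_cos = sum(math.cos(2.0 * math.pi * xi) for xi in x) / d
    return (-20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
            - math.exp(mean_cos) + 20.0 + math.e)
```

For example, `shifted_bounds(5.12)` gives approximately (−2.56, 7.68), the bounds listed for f1, while all three functions still evaluate to (approximately) zero at the origin, which is no longer the center of the domain.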

Table 3
SS settings.

| Parameter | Value |
|---|---|
| Initial sample size | the best half of D * 100, resulting in D * 50 solutions |
| Number of promising solutions | the best 50% of the population |
| New solutions at each iteration | D * 10 |
| Maximum iterations | 100 |
| Maximum iterations without improvement | 10 |
| Minimum window size | 1% for each D |
| Lower limit | 0.1 |
| Upper limit | 1 |

Table 4
DE settings.

| Parameter | Value |
|---|---|
| Population size | N = 100 |
| Differential amplification factor | F = 0.5 |
| Crossover probability constant | CR = 0.9 |
| Maximum number of function calls | MAXNFC = 10^6 |
| Maximum generations without improvement | D * 10 |
| Strategy to compare SSDE with DE | rand/1/bin |
| Strategy to compare SSDE with ODE, QODE and UQODE | local-to-best/1/bin |
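To make the DE settings of Table 4 concrete, the following sketch shows one generation of the standard rand/1/bin scheme with F = 0.5 and CR = 0.9. It is only an illustrative Python sketch of the textbook scheme, not the DEoptim implementation used in the experiments; the function names are hypothetical, the update is synchronous, and bound handling is omitted.

```python
import random


def rand_1_bin_trial(pop, i, F=0.5, CR=0.9, rng=random):
    """Build one rand/1/bin trial vector for the target individual pop[i]."""
    dim = len(pop[i])
    # Pick three distinct individuals, all different from the target.
    r1, r2, r3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    j_rand = rng.randrange(dim)  # this coordinate always comes from the mutant
    trial = []
    for j in range(dim):
        if rng.random() < CR or j == j_rand:
            trial.append(pop[r1][j] + F * (pop[r2][j] - pop[r3][j]))
        else:
            trial.append(pop[i][j])
    return trial


def de_generation(pop, fitness, objective, F=0.5, CR=0.9, rng=random):
    """Run one generation with greedy selection (minimization).

    Returns the new population and its fitness values; each generation
    costs len(pop) function calls.
    """
    new_pop, new_fit = [], []
    for i in range(len(pop)):
        trial = rand_1_bin_trial(pop, i, F, CR, rng)
        f_trial = objective(trial)
        if f_trial <= fitness[i]:
            new_pop.append(trial)
            new_fit.append(f_trial)
        else:
            new_pop.append(pop[i])
            new_fit.append(fitness[i])
    return new_pop, new_fit
```

Because selection is greedy, the best objective value in the population can never get worse from one generation to the next, which is why the starting population matters so much for the number of function calls.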

6. Computational experiments

This paper presents the effects of SS on global optimization problems using a well-known global optimization algorithm, the classical Differential Evolution (DE) [38]. The results obtained using SS (our approach) in conjunction with DE, called SSDE, are compared to the results presented for ODE [33], QODE [31] and UQODE [29].

6.1. Test benchmark functions

To evaluate SSDE, our approach was applied to the same set of 14 standard continuous global optimization test problems used to test ODE, QODE and UQODE. These test problems correspond to 7 unimodal and 7 multimodal functions. Each of the 14 test functions was tested with two different dimensions (D and 2*D) to increase the problem's difficulty. Therefore, a DE started inside promising regions is compared, on 28 minimization problems, against a DE randomly started in the whole search-space, using the benchmark functions presented in Table 2.

Table 5
Comparison of DE and SSDE. D: dimension; NFC: number of function calls (average over 50 runs) until the VTR has been achieved; SR: success rate; SP: success performance. The last rows of the table present the averages. The best success performance (SP) for each case is highlighted in boldface. DE is unable to solve f9 (D = 20), and both SSDE and DE are unable to solve f10 (D = 60). The symbol † means that the difference in the marked value (NFC or SR) is statistically significant with α = 0.05.

| F | D | DE NFC | DE SR | DE SP | SSDE NFC | SSDE SR | SSDE SP |
|---|---|---|---|---|---|---|---|
| f1 | 30 | 78210 | 1 | 78210 | 46030† | 1 | **46030** |
| f1 | 60 | 140070 | 1 | 140070 | 93450† | 1 | **93450** |
| f2 | 30 | 86730 | 1 | 86730 | 53940† | 1 | **53940** |
| f2 | 60 | 159300 | 1 | 159300 | 115960† | 1 | **115960** |
| f3 | 20 | 161660 | 1 | 161660 | 114290† | 1 | **114290** |
| f3 | 40 | 746430 | 1 | 746430 | 464480† | 1 | **464480** |
| f4 | 10 | 275430 | 1 | 275430 | 20290† | 1 | **20290** |
| f4 | 20 | 642800 | 0.1 | 6428000 | 51900† | 1† | **51900** |
| f5 | 30 | 103280 | 1 | 103280 | 66330† | 1 | **66330** |
| f5 | 60 | 177250 | 1 | 177250 | 133400† | 1 | **133400** |
| f6 | 30 | 17160 | 1 | 17160 | 9010† | 1 | **9010** |
| f6 | 60 | 28980 | 1 | 28980 | 17580† | 1 | **17580** |
| f7 | 30 | 150670 | 1 | 150670 | 113440† | 1 | **113440** |
| f7 | 60 | 268800 | 0.7 | 384000 | 223480† | 1† | **223480** |
| f8 | 30 | 87160 | 1 | 87160 | 54590† | 1 | **54590** |
| f8 | 60 | 155844.4 | 0.9 | 173160.4 | 115220† | 1 | **115220** |
| f9 | 10 | 183775 | 0.4 | 459437.5 | 47600† | 0.5 | **95200** |
| f9 | 20 | – | 0 | – | 55660† | 1 | **55660** |
| f10 | 30 | 359640 | 1 | 359640 | 290760† | 1 | **290760** |
| f10 | 60 | – | 0 | – | – | 0 | – |
| f11 | 30 | 164890 | 1 | 164890 | 106770† | 1 | **106770** |
| f11 | 60 | 287910 | 1 | 287910 | 215460† | 1 | **215460** |
| f12 | 30 | 39760 | 1 | 39760 | 9090† | 1 | **9090** |
| f12 | 60 | 75070 | 1 | 75070 | 19560† | 1 | **19560** |
| f13 | 30 | 383070 | 1 | 383070 | 103000† | 1 | **103000** |
| f13 | 60 | 403610 | 1 | 403610 | 198930† | 1 | **198930** |
| f14 | 10 | 16820 | 1 | 16820 | 9170† | 1 | **9170** |
| f14 | 20 | 39940 | 1 | 39940 | 21580† | 1 | **21580** |
| Average SR | | | 0.803 | | | 0.96 | |
| Average SP | | | | 415194 | | | 100663 |

As 13 of the 14 benchmark functions have the global optimum in the center of the search-space, the function's domain was shifted by 50% to become asymmetric, moving the global optimum away from the center, as follows. Let O.P.B. be the original parameter bounds and S.P.B. the shifted parameter bounds. If O.P.B.: $-a \le x_i \le a$ and $f(x^*) = f(0, \ldots, 0) = 0$, then S.P.B.: $-a + \frac{a}{2} \le x_i \le a + \frac{a}{2}$.

6.2. Comparison strategies and measurements

To compare the results of the experiments and decide which algorithm performs better after 50 runs for each benchmark function, three metrics were used: number of function calls (NFC), success rate (SR), and success performance (SP). Since the global minimum is known for each of those functions, success can be determined according to a value-to-reach (VTR), which is an objective function value f(x) close to the global optimum f(x*). The level of proximity is defined by a maximum allowed error:

$$\mathrm{VTR} = \lvert f(x) - f(x^*) \rvert \le 10^{-8}.$$

The lower the NFC, the higher the convergence speed. The stop criterion of the algorithms is to reach either MAXNFC (the maximum number of function calls allowed) or the VTR. The NFC is averaged over the number of trials. The SR value represents the fraction of trials in which the algorithm succeeds in reaching the VTR:

$$\mathrm{SR} = \frac{\text{number of times VTR has been reached}}{\text{total number of trials}}.$$

To simplify the comparison of algorithms using multiple criteria, Suganthan et al. [39] introduced the SP, which, by combining NFC and SR, presents an estimate of the maximum number of function evaluations needed to achieve the global optimum. SP is our final measurement to determine which algorithm performs better:

$$\mathrm{SP} = \frac{\mathrm{mean}(\text{NFC for successful runs})}{\mathrm{SR}}.$$
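The three measurements can be computed from per-run records as in the sketch below. This is a minimal Python illustration under the assumption that each run stores its NFC and a success flag; the function names `reached_vtr` and `success_metrics` are hypothetical and do not come from the paper.

```python
def reached_vtr(f_x, f_star, tol=1e-8):
    """Success criterion: |f(x) - f(x*)| <= 10^-8."""
    return abs(f_x - f_star) <= tol


def success_metrics(nfc_per_run, success_per_run):
    """Return (SR, mean NFC over successful runs, SP) for one test case."""
    n_runs = len(success_per_run)
    successful = [nfc for nfc, ok in zip(nfc_per_run, success_per_run) if ok]
    sr = len(successful) / n_runs
    if not successful:
        # No run reached the VTR: NFC and SP are undefined for this case.
        return sr, None, None
    mean_nfc = sum(successful) / len(successful)
    return sr, mean_nfc, mean_nfc / sr
```

As a check against Table 5, a case with mean NFC = 642800 over the successful runs and SR = 0.1 gives SP = 642800 / 0.1 = 6428000, which is the value reported for DE on f4 (D = 20).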

Table 6
Comparison of ODE, QODE, UQODE, and SSDE2 (our approach, but with DE using the local-to-best/1/bin strategy). D: dimension; NFC: number of function calls (average over 50 trials) until the VTR has been achieved; SR: success rate; SP: success performance. The last rows of the table present the averages. The best success performance for each case is highlighted in boldface. ODE, QODE, and UQODE are unable to solve f10 (D = 60).

| F | D | ODE NFC | ODE SR | ODE SP | QODE NFC | QODE SR | QODE SP | UQODE NFC | UQODE SR | UQODE SP | SSDE2 NFC | SSDE2 SR | SSDE2 SP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| f1 | 30 | 50844 | 1 | 50844 | 42896 | 1 | 42896 | 23316 | 1 | **23316** | 25500 | 1 | 25500 |
| f1 | 60 | 101832 | 1 | 101832 | 94016 | 1 | 94016 | 68245 | 1 | 68245 | 55490 | 1 | **55490** |
| f2 | 30 | 56944 | 1 | 56944 | 47072 | 1 | 47072 | 26166 | 1 | **26166** | 28480 | 1 | 28480 |
| f2 | 60 | 117756 | 1 | 117756 | 105992 | 1 | 105992 | 77133 | 1 | 77133 | 64500 | 1 | **64500** |
| f3 | 20 | 177300 | 1 | 177300 | 116192 | 1 | 116192 | 33370 | 1 | 33370 | 28860 | 1 | **28860** |
| f3 | 40 | 834668 | 1 | 834668 | 539608 | 1 | 539608 | 170508 | 1 | 170508 | 95740 | 1 | **95740** |
| f4 | 10 | 75278 | 0.92 | 81823 | 181100 | 1 | 181100 | 72362 | 1 | 72362 | 9977 | 0.9 | **11086** |
| f4 | 20 | 421300 | 0.16 | 2633125 | 615280 | 0.16 | 3845500 | 154897 | 1 | 154897 | 28740 | 1 | **28740** |
| f5 | 30 | 74717 | 0.92 | 81214 | 100540 | 0.80 | 125675 | 105176 | 1 | 105176 | 33130 | 1 | **33130** |
| f5 | 60 | 128340 | 0.68 | 188735 | 115280 | 0.68 | 169529 | 204985 | 1 | 204985 | 73710 | 1 | **73710** |
| f6 | 30 | 10152 | 1 | 10152 | 9452 | 1 | 9452 | 11050 | 1 | 11050 | 8810 | 1 | **8810** |
| f6 | 60 | 11452 | 1 | **11452** | 14667 | 0.84 | 17461 | 18666 | 1 | 18666 | 16680 | 1 | 16680 |
| f7 | 30 | 100280 | 1 | 100280 | 82448 | 1 | 82448 | 151290 | 1 | 151290 | 50450 | 1 | **50450** |
| f7 | 60 | 202010 | 0.96 | 210427 | 221850 | 0.72 | 308125 | 127272 | 1 | 127272 | 116280 | 1 | **116280** |
| f8 | 30 | 70408 | 1 | 70408 | 50576 | 1 | 50576 | 81988 | 1 | 81988 | 29620 | 1 | **29620** |
| f8 | 60 | 121750 | 0.60 | 202900 | 98300 | 0.40 | 245800 | 172639 | 1 | 172639 | 65090 | 1 | **65090** |
| f9 | 10 | 213330 | 0.56 | 380900 | 247640 | 0.48 | 515900 | 63568 | 1 | **63568** | 114930 | 1 | 114930 |
| f9 | 20 | 253910 | 0.55 | 461700 | 193330 | 0.68 | **284300** | 276348 | 0.96 | 287863 | 585366 | 0.3 | 1951222 |
| f10 | 30 | 369104 | 1 | 369104 | 239832 | 1 | 239832 | 120278 | 1 | 120278 | 74210 | 1 | **74210** |
| f10 | 60 | – | 0 | – | – | 0 | – | – | 0 | – | 744210 | 1 | **744210** |
| f11 | 30 | 167580 | 1 | 167580 | 108852 | 1 | 108852 | 47208 | 1 | **47208** | 51160 | 1 | 51160 |
| f11 | 60 | 274716 | 1 | 274716 | 183132 | 1 | 183132 | 126302 | 1 | 126302 | 114260 | 1 | **114260** |
| f12 | 30 | 26400 | 1 | 26400 | 21076 | 1 | 21076 | 13682 | 1 | 13682 | 9030 | 1 | **9030** |
| f12 | 60 | 64780 | 1 | 64780 | 64205 | 1 | 64205 | 754400 | 1 | 754400 | 19140 | 1 | **19140** |
| f13 | 30 | 361884 | 1 | 361884 | 291448 | 1 | 291448 | 52492 | 1 | 52492 | 48310 | 1 | **48310** |
| f13 | 60 | 425700 | 0.96 | 443438 | 295084 | 1 | 295084 | 157248 | 1 | 157248 | 104030 | 1 | **104030** |
| f14 | 10 | 16112 | 1 | 16112 | 13972 | 1 | 13972 | 4420 | 1 | **4420** | 6250 | 1 | 6250 |
| f14 | 20 | 31720 | 1 | 31720 | 23776 | 1 | 23776 | 10689 | 1 | **10689** | 12940 | 1 | 12940 |
| Average SR | | | 0.87 | | | 0.85 | | | 0.96 | | | 0.97 | |
| Average SP | | | | 268864 | | | 286536 | | | 112043 | | | 142209 |

6.3. Statistical analysis

To compare the SR values of the algorithms, a proportion test (α = 0.05) was applied. In addition, Wilcoxon's rank sum test (α = 0.05) was employed to detect differences in the means and identify the better of the two algorithms (DE versus SSDE) for each test function. This test compares the means only over the runs in which the global optimum was found. However, as SR can differ between the algorithms for the same problem, the vectors being compared may have different sizes, which can lead to inadequate conclusions. Thus, those two statistics were added only as supplementary information. As presented in Section 6.2, a more adequate metric of comparison is SP. As the results of ODE, QODE and UQODE are unknown, they cannot be included in the statistical analysis.
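As an illustration of how two success rates can be compared, the sketch below implements a plain pooled two-proportion z-test using only the Python standard library. This is an assumption made for illustration: the paper does not state which variant of the proportion test was used, so the result of this sketch may differ from the significance marks in Table 5 (for instance, if a continuity correction was applied). The function name is hypothetical.

```python
import math


def two_proportion_z_test(successes_a, successes_b, n_a, n_b):
    """Two-sided pooled z-test for the difference between two proportions.

    Returns (z, p_value). If both samples have identical proportions of
    0 or 1 the standard error is zero and (0.0, 1.0) is returned.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

With 50 runs per algorithm, SR = 1 versus SR = 0.1 (50 and 5 successes) yields a p-value far below 0.05, whereas SR = 0.5 versus SR = 0.4 (25 and 20 successes) does not reach significance under this test.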

6.4. Configuration of the algorithms

Parameter settings of SS and DE for all conducted experiments are defined in Tables 3 and 4, respectively. The SS code was developed in the R language [30] (www.r-project.org). For the DE algorithm, we used the DEoptim package [19] available in R. DE is run for each promising region found by SS until the VTR or MAXNFC has been reached. SS returns a set of solutions separated into promising regions. The order of the promising regions is determined by sorting the best values found in each promising region. The region that contains the best solution is optimized first by DE. If the best solution remains the same during D * 10 iterations (stagnation) and has not reached the VTR, then the optimization of this region is stopped. This number of iterations was determined empirically.
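The region ordering and stopping rules described in this subsection can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the authors' R implementation: a region is represented as a list of (point, value) pairs, and `run_de` is a hypothetical callback that optimizes one region and returns the best value found and the number of function calls used.

```python
def order_regions(regions):
    """Sort regions so that the one containing the best (lowest) value is first."""
    return sorted(regions, key=lambda region: min(value for _, value in region))


def stagnated(best_history, patience):
    """True if the best value did not change during the last `patience` iterations.

    Assumes best_history is non-increasing (best-so-far values, minimization).
    """
    if len(best_history) <= patience:
        return False
    return best_history[-1] == best_history[-1 - patience]


def optimize_regions(regions, run_de, vtr_reached, max_nfc):
    """Run DE region by region until the VTR or the call budget is reached."""
    total_nfc = 0
    best = float("inf")
    for region in order_regions(regions):
        remaining = max_nfc - total_nfc
        if remaining <= 0:
            break
        value, used = run_de(region, remaining)  # hypothetical DE driver
        total_nfc += used
        best = min(best, value)
        if vtr_reached(best):
            break
    return best, total_nfc
```

In this outline, the stagnation check with `patience = D * 10` would be applied inside `run_de`, so that a region that stops improving is abandoned and the next region in the queue is tried.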

The population of 100 solutions for DE is generated as follows: 50% are the best points of a promising region and the other 50% are randomly generated inside the problem bounds. If the promising region has fewer than 50 points, the Resampling Operator is applied to these solutions in order to complete 50 solutions. As shown in Table 4, in the comparison between SSDE and DE both use the same strategy, so as to evaluate whether SS can improve DE. On the other hand, the other three algorithms employ improved strategies. Thus, for a fair comparison with them, we cannot use a strategy that explores the search-space for longer periods (rand/1/bin). Because SS finds promising regions, we used local-to-best/1/bin, which is better at local optimization.
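The population construction described at the start of this subsection can be sketched as below. This is a minimal Python illustration, assuming a region is given as a list of (point, value) pairs and the bounds as a list of (low, high) pairs; the function name is hypothetical, and the Resampling Operator used by the authors for small regions is not reproduced here, so the sketch simply refuses regions with too few points.

```python
import random


def build_initial_population(region, bounds, pop_size=100, rng=random):
    """Half of the population: best points of the region; half: uniform random."""
    half = pop_size // 2
    if len(region) < half:
        # The paper completes the set with its Resampling Operator,
        # which is outside the scope of this sketch.
        raise ValueError("region has fewer points than half the population")
    ranked = sorted(region, key=lambda pair: pair[1])  # lowest value first
    best_points = [list(point) for point, _ in ranked[:half]]
    random_points = [
        [rng.uniform(low, high) for low, high in bounds]
        for _ in range(pop_size - half)
    ]
    return best_points + random_points
```

The resulting list can be handed to any population-based optimizer as its starting population, which is the sense in which SS requires no modification of the metaheuristic itself.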

6.5. Results

Results of the application of DE and SSDE (our approach) to 14 test problems in 28 test cases are given in Table 5. The best success performance (SP) for each case is highlighted in boldface. The DE results in this table correspond to our own experiments, not to the results in [33]. As SP considers NFC and SR simultaneously, it is preferred as a measurement to compare optimization algorithms instead of a multi-criteria approach. As seen in Table 5, SSDE achieved the best SP for all functions. Also, SR improved from 80% (DE) to 96% (SSDE). This improvement can be more easily verified for functions f4 and f9, where DE presented very poor performance. In general, DE explored the search-space for much longer periods until it found a promising region to exploit. At other times, DE got stuck in a local optimum. While in some functions the number of evaluations required by SSDE was close to the number required by DE (f2 and f11, for instance), for others the reduction was much stronger (see f4, f9 and f12). Moreover, the average number of function evaluations needed to find the global optimum (SP) fell from 415,194 (DE) to 100,663 (SSDE), a reduction of approximately 76%. This is a substantial increase in performance considering that no changes were made to the DE algorithm. When compared to the classical DE on the set of optimization problems tested, SSDE obtained a higher success rate while using a considerably lower number of function evaluations. Based on this result, one can conclude that SS is very effective in finding high-quality promising regions with low effort.

Table 7
Statistics on the number of function evaluations for SSDE2.

| F | D | Min | 1st Quartile | Median | Average | 3rd Quartile | Max | Std. dev. |
|---|---|---|---|---|---|---|---|---|
| f1 | 30 | 24800 | 25150 | 25400 | 25500 | 25700 | 26600 | 551.76 |
| f1 | 60 | 52300 | 54225 | 55200 | 55490 | 56600 | 58400 | 1986.31 |
| f2 | 30 | 27300 | 28125 | 28300 | 28480 | 29025 | 29500 | 734.54 |
| f2 | 60 | 60100 | 63025 | 64800 | 64500 | 65850 | 68400 | 2309.4 |
| f3 | 20 | 27600 | 28200 | 28400 | 28860 | 29175 | 32100 | 1316.73 |
| f3 | 40 | 86900 | 91700 | 95250 | 95740 | 100925 | 103300 | 5763.14 |
| f4 | 10 | 9500 | 9700 | 10000 | 9977.78 | 10200 | 10400 | 330.82 |
| f4 | 20 | 19200 | 19600 | 19900 | 28740 | 20475 | 108200 | 27923.24 |
| f5 | 30 | 31400 | 32900 | 33150 | 33130 | 33725 | 34400 | 889.51 |
| f5 | 60 | 68200 | 71350 | 73250 | 73710 | 75050 | 81700 | 4016.76 |
| f6 | 30 | 6900 | 8400 | 8850 | 8810 | 9300 | 10300 | 1088.78 |
| f6 | 60 | 13200 | 16200 | 17100 | 16680 | 17400 | 18000 | 1408.55 |
| f7 | 30 | 48900 | 49625 | 50000 | 50450 | 51225 | 53100 | 1255.43 |
| f7 | 60 | 112500 | 113600 | 116250 | 116280 | 118975 | 119900 | 2835.02 |
| f8 | 30 | 28600 | 29400 | 29450 | 29620 | 30000 | 30700 | 588.41 |
| f8 | 60 | 62100 | 63300 | 65800 | 65090 | 66300 | 68400 | 2092.02 |
| f9 | 10 | 22900 | 63200 | 87450 | 114930 | 138775 | 289600 | 81833.53 |
| f9 | 20 | 422900 | 438050 | 453200 | 585366.67 | 666600 | 880000 | 255609.32 |
| f10 | 30 | 69500 | 71275 | 74850 | 74210 | 76750 | 79000 | 3273.28 |
| f10 | 60 | 611700 | 686950 | 743950 | 744210 | 768200 | 981900 | 99864.26 |
| f11 | 30 | 50400 | 50750 | 51050 | 51160 | 51350 | 52800 | 671.98 |
| f11 | 60 | 111000 | 112150 | 114400 | 114260 | 115700 | 118200 | 2429.08 |
| f12 | 30 | 8700 | 9000 | 9000 | 9030 | 9225 | 9300 | 221.36 |
| f12 | 60 | 17400 | 18750 | 19200 | 19140 | 19650 | 20400 | 957.08 |
| f13 | 30 | 45900 | 47750 | 48250 | 48310 | 48625 | 51600 | 1410.63 |
| f13 | 60 | 100700 | 102525 | 103400 | 104030 | 105325 | 108200 | 2535.55 |
| f14 | 10 | 5800 | 6200 | 6250 | 6250 | 6375 | 6600 | 232.14 |
| f14 | 20 | 12000 | 12825 | 13000 | 12940 | 13150 | 13600 | 429.99 |

The next step is to compare SSDE to other approaches that improve the classical DE. As those approaches improve not only the population initialization but also the procedure that searches for the global optimum, a fair comparison cannot use the classical DE. Thus, we replaced the classical strategy with one that is better at exploiting regions, as explained in Section 6.4. The results of SSDE compared to the results of ODE, QODE and UQODE are presented in Table 6. It is important to stress that UQODE is an improvement over QODE, which is an improvement over ODE, which is an improvement over DE. Thus, UQODE is the best of those four algorithms, as shown in [29]. In Table 6, SSDE achieved the best SP in 20 of the 28 cases, ODE in 1, QODE in 1, and UQODE in 6. In function f6 (D = 60), where ODE was the best, SSDE was the second best. In function f9 (D = 20), where QODE was the best, SSDE was the worst. One should notice that although SSDE was not the best in five of the cases (f1 with D = 30, f2 with D = 30, f11 with D = 30, and f14 with D = 10 and D = 20), it presented very competitive results. When the upper and lower limits of the resampling procedure of SS were changed to allow for a larger exploration, the result was similar to the one obtained by QODE. The idea is to avoid tuning the algorithm for specific problems, although tuning can be used to achieve better results. When compared to UQODE in those cases, SSDE required more evaluations because of SS. For easier problems, SS can get closer to the global optimum and is then responsible for a large share of the function evaluations. This behavior was observed for functions f1 (D = 30, more than half), f2 (D = 30, almost half), f6 (D = 60, all evaluations, so DE was unnecessary), and f14 (more than half). As the best of the other three algorithms is UQODE, let us make a simpler comparison: SSDE versus UQODE. Very large reductions in the number of function evaluations were obtained in f4 (85%), f5 (65%), f7 (66%), f8 (63%), f10 (38%), and f12 (97%).
Possible explanations are that UQODE wastes too much time in non-promising regions or gets trapped in local optima for too many iterations. With the SS initialization method, those behaviors tend to be reduced because SS focuses on promising regions of the search-space, avoiding excessive exploration. The average SR is only 1% higher with SSDE, while the average SP of UQODE (112,043) is 21% lower than that of SSDE2 (142,209), mainly because of f9 (D = 20). Ignoring f9 (D = 20) and f10 (D = 60) as outliers, one would have $SP_{UQODE} = 109{,}590$ and $SP_{SSDE2} = 49{,}478$, which seems much more adequate and represents a reduction of approximately 55% in the number of function evaluations. Issues related to these two functions could possibly be solved by a different configuration of the algorithms, using, for example, a larger population size. However, in this paper we do not intend to find the best configuration for each function. Finally, Table 7 presents a descriptive analysis of the number of function evaluations required by SSDE2 to achieve the VTR. This table is provided so that other authors can make different comparisons to our work. Those experiments were conducted to show not only that SSDE can outperform other algorithms designed to improve DE, but also that such performance can be achieved using a default setting of SS for a variety of problems, without fine-tuning the parameters of the algorithm. Another important aspect is that DE is a very powerful technique, and even the original DE strategies are able to perform well on several problems when a method to find promising regions is used.
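The outlier-excluded averages quoted above can be reproduced from the SP columns of Table 6. The short Python sketch below is only a worked check of that arithmetic; the variable and function names are chosen here for illustration.

```python
# SP columns of Table 6 in row order (f1..f14, two dimensions each).
# None marks the unsolved case reported as "–" in the table.
SP_UQODE = [23316, 68245, 26166, 77133, 33370, 170508, 72362, 154897,
            105176, 204985, 11050, 18666, 151290, 127272, 81988, 172639,
            63568, 287863, 120278, None, 47208, 126302, 13682, 754400,
            52492, 157248, 4420, 10689]
SP_SSDE2 = [25500, 55490, 28480, 64500, 28860, 95740, 11086, 28740,
            33130, 73710, 8810, 16680, 50450, 116280, 29620, 65090,
            114930, 1951222, 74210, 744210, 51160, 114260, 9030, 19140,
            48310, 104030, 6250, 12940]

# Zero-based row indices of the two cases treated as outliers:
# f9 (D = 20) and f10 (D = 60).
OUTLIERS = {17, 19}


def mean_without(values, skip):
    """Average of the entries whose row index is not in `skip`."""
    kept = [v for k, v in enumerate(values) if k not in skip]
    return sum(kept) / len(kept)


sp_uqode = mean_without(SP_UQODE, OUTLIERS)   # about 109,590
sp_ssde2 = mean_without(SP_SSDE2, OUTLIERS)   # about 49,478
reduction = 1.0 - sp_ssde2 / sp_uqode         # about 0.55
```

Over the remaining 26 cases the averages round to 109,590 and 49,478, and their ratio corresponds to the reduction of approximately 55% stated in the text.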

7. Conclusions and future works

This paper has presented a technique to find promising regions of the search-space of continuous functions. The approach, named Smart Sampling (SS), uses a machine-learning technique to identify promising and non-promising solutions and to guide the resampling procedure to smaller areas where higher-quality solutions can be found. This iterative process ends when a stop criterion has been met, for instance, when a promising region is too small. At this point, another machine-learning technique is applied to separate the promising solutions into promising regions. Those promising solutions can then be used as an initial population for metaheuristics. In this work, the metaheuristic used in conjunction with SS was Differential Evolution (DE), originating the SSDE algorithm. To evaluate SSDE's performance, a set of 7 classical optimization functions was used. We presented the distribution of promising solutions in the search-space during the SS process to find promising regions. An experiment was conducted on those functions to compare SSDE to DE and plot the optimization curves. The results showed a very clear advantage when using SS. DE and SSDE were then compared using 14 hard continuous optimization problems in two different sizes (dimensions), resulting in 28 cases. While DE achieved a success rate (SR) of 80% with a success performance (SP) of 415,194 evaluations, SSDE achieved SR = 96% with only SP = 100,663 evaluations, which represents an improvement of 16 percentage points in the success rate and a reduction of 76% in the number of function evaluations. SSDE was also compared to three other state-of-the-art algorithms that improve DE based on population initialization and different mutation strategies. Those three algorithms, i.e. Opposition-based Differential Evolution (ODE), Quasi-Oppositional Differential Evolution (QODE), and Uniform-Quasi-Opposition Differential Evolution (UQODE), showed significant reductions in the number of function evaluations. Experimental results of SSDE on the same 28 cases were compared to the results obtained by ODE, QODE and UQODE. SSDE (our approach) outperformed the other three approaches in 20 out of 28 cases based on the number of function evaluations, success rate, and success performance. In five other cases, SSDE achieved very competitive performance. Analyzing the results presented in this work, it is possible to conclude that the proposed approach has shown relevant results for the task of global optimization. The search for promising regions for population initialization is a research area that deserves further study, as it can not only provide much better results using fewer evaluations, but also increase the success rate.


As a final conclusion, the development of methods similar to SS can be considered relevant, as they can be independent of the optimization technique. This property is especially interesting because SS could provide new solutions for hard problems in a wide range of scientific areas. The way SS was designed allows for its application to any other metaheuristic in future works. Several future works can be developed from this paper. SS currently uses simple classifiers to identify differences among the samples and split the final solutions into promising regions. As explained in Section 3.2.2, we have chosen those algorithms to reduce computational time and evaluate the performance of SS without complex relationship models. However, we are aware that better classifiers can provide better results. Therefore, SVM, Neural Networks and Bayesian Networks are some classifiers that can be evaluated in future research. This first version of SS presents several parameters for configuration. However, the number of parameters of SS is similar to the number of parameters of a Genetic Algorithm (GA) [9], a well-known and widely employed metaheuristic. For instance, a GA comprises population size, crossover rate, number of generations, value-to-reach, and mutation probability, whereas SS comprises number of promising solutions, new solutions at each iteration, maximum iterations, minimum window size, and lower and upper limits. Moreover, we are seeking ways to make SS self-adaptive, reduce the number of parameters, or make a parameterless version. For now, empirical observations have suggested that for noisy and multimodal problems SS requires a larger population size (D * 100), a higher number of iterations (100), a larger window size (1% of the range) and larger limits to provide better exploration. On the other hand, unimodal problems can be solved with fewer iterations and a smaller window size, reaching regions very close to the global optimum.
SS is not recommended for problems with a very small number of variables, due to the large number of evaluations required by the first sampling of the search-space, unless the problem is very multimodal and noisy. Another important future study is the evaluation of SS with other population-based metaheuristics. Significant improvements over the original metaheuristic are expected if it tends to fail to escape from local optima. Preliminary tests performed with two other population-based metaheuristics have provided similar results and we are planning to publish them soon.

Acknowledgments

The authors would like to acknowledge CAPES (a Brazilian Research Agency) for the financial support given to this research.

References

[1] D.W. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, in: Machine Learning, 1991, pp. 37–66.
[2] T. Back, D.B. Fogel, Z. Michalewicz, Handbook of Evolutionary Computation, Institute of Physics, Ringbound, 1997.
[3] G.E.P. Box, W.G. Hunter, J.S. Hunter, Statistics for Experimenters, John Wiley, New York, 1978.
[4] R. Chelouah, P. Siarry, Genetic and Nelder-Mead algorithms hybridized for a more accurate global optimization of continuous multi-minima functions, European Journal of Operational Research 148 (2) (2003) 335–348.
[5] W.W. Cohen, Fast effective rule induction, in: Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann, 1995, pp. 115–123.
[6] T.R. Dastidar, P.P. Chakrabarti, P. Ray, A synthesis system for analog circuits based on evolutionary search and topological reuse, IEEE Transactions on Evolutionary Computation 9 (2) (2005) 211–224.
[7] K. Deb, D. Goldberg, An investigation of niche and species formation in genetic function optimization, in: J. Schaffer (Ed.), Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, USA, 1989, pp. 42–50.
[8] P. Gabriel, A. Delbem, Representations for evolutionary algorithms applied to protein structure prediction problem using HP model, in: K. Guimarães, A. Panchenko, T. Przytycka (Eds.), Advances in Bioinformatics and Computational Biology, Lecture Notes in Computer Science, vol. 5676, Springer, Berlin/Heidelberg, 2009, pp. 97–108.
[9] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
[10] S. Hitzel, L. Nardin, K. Sorensen, H. Rieger, Aerodynamic optimization of an UCAV configuration, in: N. Kroll, D. Schwamborn, K. Becker, H. Rieger, F. Thiele (Eds.), MEGADESIGN and MegaOpt – German Initiatives for Aerodynamic Simulation and Optimization in Aircraft Design, Notes on Numerical Fluid Mechanics and Multidisciplinary Design, vol. 107, Springer, Berlin/Heidelberg, 2009, pp. 263–285.
[11] M. Jelasity, P. Ortigosa, I. Garcia, UEGO, an abstract clustering technique for multimodal global optimization, Journal of Heuristics 7 (3) (2001) 215–233.
[12] L. Jourdan, C. Dhaenens, E.-G. Talbi, Using datamining techniques to help metaheuristics: a short survey, in: Hybrid Metaheuristics, 2006, pp. 57–69.
[13] A.Y.S. Lam, V.O.K. Li, Chemical-reaction-inspired metaheuristic for optimization, IEEE Transactions on Evolutionary Computation 14 (3) (2010) 381–399.
[14] K.-H. Liang, X. Yao, C. Newton, Evolutionary search of approximated n-dimensional landscapes, International Journal of Knowledge-Based Intelligent Engineering Systems 4 (3) (2000) 172–183.
[15] H. Maaranen, K. Miettinen, M. Mäkelä, Quasi-random initial population for genetic algorithms, Computers and Mathematics with Applications 47 (12) (2004) 1885–1895.
[16] H. Maaranen, K. Miettinen, A. Penttinen, On initial populations of a genetic algorithm for continuous optimization problems, Journal of Global Optimization 37 (2007) 405–436.
[17] V.V. Melo, A.C.B. Delbem, D.L. Pinto Junior, F.M. Federson, Discovering promising regions to help global numerical optimization algorithms, in: 6th Mexican International Conference on Artificial Intelligence (MICAI'07), 2007, pp. 72–82.
[18] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[19] K. Mullen, D. Ardia, D. Gil, D. Windover, J. Cline, DEoptim: an R package for global optimization by differential evolution, Journal of Statistical Software 40 (6) (2011) 1–26.
[20] R.H. Myers, D.C. Montgomery, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, second ed., Wiley, New York, 2002.
[21] Z. Naji-Azimi, P. Toth, L. Galli, An electromagnetism metaheuristic for the unicost set covering problem, European Journal of Operational Research 205 (2) (2010) 290–300.
[22] F. Neri, G. Iacca, E. Mininno, Disturbed exploitation compact differential evolution for limited memory optimization problems, Information Sciences 181 (12) (2011) 2469–2487.
[23] A.C.M. Oliveira, L.A.N. Lorena, Hybrid evolutionary algorithms and clustering search, in: Crina Grosan, Ajith Abraham, Hisao Ishibuchi (Eds.), Hybrid Evolutionary Systems – Studies in Computational Intelligence – Springer SCI Series, vol. 75, 2007, pp. 81–102.
[24] A.C.M.d. Oliveira, Algoritmos evolutivos híbridos com detecção de regiões promissoras em espaços de busca contínuos e discretos, PhD thesis, Instituto Nacional de Pesquisas Espaciais, São José dos Campos, Julho 2004.
[25] P.M. Ortigosa, I. García, M. Jelasity, Reliability and performance of UEGO, a clustering-based global optimizer, Journal of Global Optimization 19 (2001) 265–289.
[26] Q.-K. Pan, L. Wang, L. Gao, W.D. Li, An effective hybrid discrete differential evolution algorithm for the flow shop scheduling with intermediate buffers, Information Sciences 181 (2011) 668–685.
[27] M. Pant, R. Thangaraj, A. Abraham, Low discrepancy initialized particle swarm optimization for solving constrained optimization problems, Fundamenta Informaticae 95 (2009) 511–531.
[28] M. Pelikan, D.E. Goldberg, E. Cantu-Paz, Linkage problem, distribution estimation, and Bayesian networks, Evolutionary Computation 8 (3) (2000) 311–340.
[29] L. Peng, Y. Wang, Differential evolution using uniform-quasi-opposition for initializing the population, Information Technology Journal 9 (8) (2010) 1629–1634.
[30] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008. ISBN: 3-900051-07-0.
[31] S. Rahnamayan, H.R. Tizhoosh, M.M.A. Salama, Quasi-oppositional differential evolution, in: IEEE Congress on Evolutionary Computation, IEEE, 2007, pp. 2229–2236.
[32] S. Rahnamayan, H.R. Tizhoosh, M.M.A. Salama, Opposition-based differential evolution, IEEE Transactions on Evolutionary Computation 12 (1) (2008) 64–79.
[33] S. Rahnamayan, H.R. Tizhoosh, M.M.A. Salama, Opposition versus randomness in soft computing techniques, Applied Soft Computing 8 (2) (2008) 906–918.
[34] C.L. Ramsey, J.J. Grefenstette, Case-based initialization of genetic algorithms, in: Proceedings of the 5th International Conference on Genetic Algorithms, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993, pp. 84–91.
[35] G.P. Rangaiah, Stochastic Global Optimization, Techniques and Applications in Chemical Engineering, World Scientific Publishing Company, 2010.
[36] P. Siarry, Z. Michalewicz (Eds.), Advances in Metaheuristics for Hard Optimization, Natural Computing Series, Springer, 2008.
[37] R. Storn, K. Price, Differential Evolution – A Simple and Efficient Adaptive Scheme for Global Optimization over Continuous Spaces, Tech. rep., 1995.
[38] R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 11 (4) (1997) 341–359.
[39] P.N. Suganthan, N. Hansen, J.J. Liang, K. Deb, Y.-P. Chen, A. Auger, S. Tiwari, Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization, Tech. Rep. KanGAL Report 2005005, Nanyang Technological University, Singapore, 2005.
[40] D.K. Tasoulis, V.P. Plagianakos, M.N. Vrahatis, Clustering in evolutionary algorithms to efficiently compute simultaneously local and global minima, in: Congress on Evolutionary Computation, 2005, pp. 1847–1854.
[41] M.N. Vrahatis, B. Boutsinas, P. Alevizos, G. Pavlides, The new k-windows algorithm for improving the k-means clustering algorithm, Journal of Complexity 18 (1) (2002) 375–391.
[42] Y. Wang, B. Li, T. Weise, Estimation of distribution and differential evolution cooperation for large scale economic load dispatch optimization of power systems, Information Sciences 180 (2010) 2405–2420.
[43] M. Weber, F. Neri, V. Tirronen, A study on scale factor in distributed differential evolution, Information Sciences 181 (12) (2011) 2488–2511.
[44] Weka Machine Learning Project, Weka.
[45] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed., Morgan Kaufmann, San Francisco, 2005.
[46] E.E. Zachariadis, C.T. Kiranoudis, A local search metaheuristic algorithm for the vehicle routing problem with simultaneous pick-ups and deliveries, Expert Systems with Applications (2010).