Information Sciences 193 (2012) 36–53
Investigating Smart Sampling as a population initialization method for Differential Evolution in continuous problems
Vinícius Veloso de Melo a,⇑, Alexandre Cláudio Botazzo Delbem b
a Institute of Science and Technology, Federal University of São Paulo, São José dos Campos, SP, Brazil
b Laboratory of Reconfigurable Computing, University of São Paulo, São Carlos, SP, Brazil
Article info
Article history: Received 15 March 2011; Received in revised form 21 November 2011; Accepted 31 December 2011; Available online 12 January 2012
Keywords: Metaheuristic; Smart Sampling; Promising region; Population initialization; Differential Evolution; Global Optimization
Abstract
Recent research has shown that the performance of metaheuristics can be affected by population initialization. Opposition-based Differential Evolution (ODE), Quasi-Oppositional Differential Evolution (QODE), and Uniform-Quasi-Opposition Differential Evolution (UQODE) are three state-of-the-art methods that improve the performance of the Differential Evolution algorithm based on population initialization and different search strategies. In a different approach to achieve similar results, this paper presents a technique to discover promising regions in the continuous search-space of an optimization problem. Using machine-learning techniques, the algorithm, named Smart Sampling (SS), finds regions with a high probability of containing a global optimum. Next, a metaheuristic can be initialized inside each region to find that optimum. SS and DE were combined (originating the SSDE algorithm) to evaluate our approach, and experiments were conducted on the same set of benchmark functions used by the authors of ODE, QODE and UQODE. Results have shown that the total number of function evaluations required by DE to reach the global optimum can be significantly reduced and that the success rate improves if SS is employed first. Such results are also in agreement with results from the literature, which state the importance of an adequate starting population. Moreover, SS is more effective at finding initial populations of superior quality than the other three algorithms, which employ oppositional learning. Finally, and most importantly, the performance of SS in finding promising regions is independent of the metaheuristic with which SS is combined, making SS suitable for improving the performance of a large variety of optimization techniques.
© 2012 Elsevier Inc. All rights reserved.
⇑ Corresponding author. Tel.: +55 12 3309 9500. E-mail addresses: [email protected] (V.V. de Melo), [email protected] (A.C. Botazzo Delbem).
0020-0255/$ - see front matter © 2012 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2011.12.037

1. Introduction
The task of global optimization arises in several areas of real-world problems, such as protein structure prediction [8], logistics or circuit design (traveling salesman problem) [6], chemical engineering [35], and airspace design [10]. This task involves the minimization or maximization of a known objective function or an unknown black-box function. In general, these functions are highly complex and may be time-consuming, taking several days, weeks or months to achieve an adequate result, which may not be the global optimum. To solve this type of task, several global optimization metaheuristics have been developed.
Metaheuristics [13,46,21] are optimization techniques used to search for high-quality solutions to a problem, one of which is expected to be the global optimum. One of their main characteristics is that metaheuristics need neither gradient information to guide the search nor specific knowledge of a problem (heuristic), which makes them useful to solve a wide range of
problems, including black-box ones. Several strategies have been investigated to improve the exploratory efficiency of metaheuristics in reaching the global optimum of a problem. For instance, strategies have been developed to reduce premature convergence and to increase the chance of escaping from local optima. Basically, those strategies, when applied to populational metaheuristics, involve the maintenance of diversity in the set of solutions.
Another procedure that has been investigated to improve a metaheuristic’s performance is the population initialization step [15,27,34]. Initial population generation involves an exploration phase. This phase allows the algorithm to select locations to be explored and others to be discarded. Traditional metaheuristics move toward the best solutions. Thus, a bad initialization that generates solutions close to each other (clusters of solutions) could leave large areas with no solutions to explore. On the other hand, if the population is widely dispersed, a large number of iterations may be required to reach a local optimum. Moreover, if two solutions far from each other are combined, there is a high chance that the offspring will be closer to the best solution found, which could leave a large unexplored gap. Thus, an initialization method capable of providing a better exploration of the search-space and presenting only high-quality solutions should improve the performance of a metaheuristic.
Metaheuristics themselves are naturally guided towards promising regions [36] (see Fig. 1). There is an exploration phase, and then the population moves towards the best solution found in order to exploit that region. On the other hand, a desired movement could be the one presented in Fig. 2. The exploration phase may take longer and the population can be split into more than one region. After that, the regions can be exploited independently.
This paper presents an approach to explore the search-space and to find promising regions. The objective is to aid global optimization algorithms by indicating the initial search-space areas with higher possibility of finding the global optimum. The approach is iteratively applied to explore the search-space inside promising regions which become smaller at each iteration, similarly to the strategy proposed in [17], excluding areas considered unfavorable. The new approach, called Smart Sampling (SS), seeks to preserve diversity in more than one promising region, providing a better exploration of the search-space. First of all, SS generates some solutions, evaluates them using the objective function, and splits them into good and bad based on a threshold applied to the function value of each solution. Then, SS employs a machine learning algorithm to map characteristics of these good solutions, allowing it to check if a new solution is good without being evaluated. If the new solution is identified as a good one, then it can be evaluated by the objective function. This approach works well on problems with small or high numbers of variables. In the last step, another machine learning algorithm separates the different promising regions to be exploited by any metaheuristic, which will refine the high-quality solutions found during the SS process. Therefore, SS is employed to increase the efficiency of global optimization algorithms, and can be essential to obtain satisfactory results in situations in which the execution of a large number of experiments is not viable. Furthermore, and most important, several researchers have studied ways to improve well-known metaheuristics by using heuristics, or local-search methods, or creating hybrids. Here, we propose a technique that is neither an operator nor a strategy to be included in a search technique, and has been developed for use as a preprocessing phase. 
It can be directly used to improve the performance of any populational continuous global optimization technique.
SS is tested in conjunction with Differential Evolution (DE), a well-known metaheuristic, on several bound-constrained optimization problems with different properties. SSDE obtains solutions of better quality than DE while using fewer evaluations. The paper also presents a performance comparison with three other approaches (ODE, QODE and UQODE) that improve DE to explore the search-space in an attempt to find promising regions and escape from local optima. The results have shown that SSDE provides considerably better performance than the other three approaches.
The paper is organized as follows: Section 2 contains some related works on population initialization and machine learning techniques used to improve metaheuristics; in Section 3, the proposed SS algorithm is presented in detail; the Differential Evolution algorithm is briefly described in Section 4; Section 5 presents a preliminary study on SSDE’s behavior using some well-known benchmark functions. Some charts show the distribution of high-quality solutions during the SS procedure and convergence curves through the optimization process. The experiments comparing SSDE, DE and the other three DE improvements are presented and discussed in Section 5. Finally, Section 6 concludes the paper and presents future works.
Fig. 1. The common movement of solutions in metaheuristics. The lighter areas are promising regions with higher probability of finding local or global optima.
Fig. 2. A possibly better movement of solutions in metaheuristics.
2. Related works
The use of machine learning algorithms to improve metaheuristics is not new. Jourdan et al. [12] showed how classification and clustering techniques can be applied to hybridize metaheuristics, reducing the computational time and simplifying the objective function through an approximation technique, or improving the quality of the search by adding background knowledge to the operators. Ramsey and Grefenstette [34] initialized a genetic algorithm using case-based reasoning to allow the system to bias the search toward promising areas. However, this approach is problem-specific. In [16], several distribution algorithms to cover the search-space were investigated. After a statistical analysis, the authors concluded that, although the uniform distribution produced undesired clusters of points, the evaluated problems did not benefit from a better distribution algorithm. On the other hand, Rahnamayan et al. [32] developed a theory stating that it is possible to improve the performance of a metaheuristic by using oppositional points in the initialization and during the optimization process. The theory was successfully evaluated on several benchmark problems using the Differential Evolution algorithm as the metaheuristic to test it. Pant et al. [27] tested a quasi-random sequence generator in conjunction with a Particle Swarm Optimization (PSO) method for solving constrained optimization problems. The results showed that the proposed algorithm was a promising alternative to the classical PSO.
Based on the promising results obtained by related works, we have started to investigate other ways to generate good starting solutions for metaheuristics by finding promising regions in the search-space.

3. Smart Sampling
The basic flowchart of SS is presented in Fig. 3. First, SS samples the search-space to identify the first large regions which must be explored. The higher the dimensionality of the problem, the larger the first sample. The main idea of SS is to perform
Fig. 3. Basic smart sampling flowchart.
a resampling only in areas considered promising regions, avoiding wasting evaluations in non-promising areas. If a simple random resampling is executed inside promising regions, final iterations will lead to local optima. This behavior should be avoided when the number of local optima is considerably high. Instead of elaborating complex probabilistic models or allowing a metaheuristic to perform the exploration, a simple resampling strategy is applied to the best points to generate new ones. A machine-learning technique is applied to verify whether a new solution can be considered promising. The promising solutions are evaluated and the bad ones are discarded, avoiding computational effort spent searching in non-promising regions.
The technique applied to separate promising solutions from bad ones must be generated in a simple way so that it can be quickly tested to predict the class (promising or non-promising) of the newly sampled solutions. Box et al. [3] argue that ‘‘all models are wrong, but some are useful’’. Based on this principle, the algorithm does not have to achieve 100% precision or accuracy in the identification. Moreover, it is important that it fails sometimes. This importance is due to a problem called overfitting [18]. Overfitting occurs when the learned model presents high precision on the solutions used in the training phase, but low precision on new (unseen) solutions in the test phase, meaning that the model has memorized the training solutions but cannot generalize the learning process. In our approach, the algorithm is expected to identify all promising solutions correctly, and it is acceptable that it identifies some non-promising solutions as promising ones. The opposite must be avoided, but it is difficult to achieve such high precision without overfitting.
Following that idea, SS keeps diversity to reduce the chance of discarding the area which contains the global optimum.
It is not possible to guarantee that, because the search-space cannot be fully explored. However, when the search is performed for longer periods around and among the best solutions found so far, the chance of discarding the global optimum is considerably reduced. This is valid when at least one of the current solutions is in the promising region of the global optimum, which requires a large sample in the exploration phase.
Many techniques that use stochastic knowledge try to guide the search by using highly complex models of the search-space [14,28]. The approach proposed in this work uses a machine-learning technique, known as a classifier, to separate promising solutions from non-promising ones.
When the SS process finishes, the promising regions must be identified. Some authors have proposed the use of a clustering algorithm for this procedure [23,25,4]. However, there are some drawbacks in this approach. The clustering result is dependent on the seed, which can make this approach less robust. Also, the number of desired clusters, which needs to be specified for the majority of clustering algorithms, is not an easy parameter to obtain. The unsupervised k-windows clustering algorithm [41] does not require the number of clusters as a parameter, and was used as an operator in a hybrid DE [40]. Nonetheless, the experiments were conducted on only four benchmark functions, basically bi-dimensional ones. In this paper, problems with up to 60 dimensions were tested.
As the solutions are split into regions which must be identified and for which clustering cannot be used, another machine-learning algorithm has to be employed. In this case, we chose another machine learning technique, which provides an adequate output, such as the one presented in Subsection 3.2.2.
Summarizing, the main advantages of SS are:
1. it can find one or more promising regions with high probability of containing the global optimum, instead of guiding the search to a unique region, as commonly occurs with several metaheuristics;
2. it can significantly reduce the total number of evaluations when the best promising region is close to the global optimum;
3. it was developed in such a way that it allows a general use, i.e., differently from approaches developed for specific metaheuristics (niching, speciation) [7,2] and hybrid algorithms which identify these promising regions during the optimization process [4,11], SS can be employed as a preprocessing phase for any metaheuristic.
On the other hand, the main drawback of SS may be the computational cost. For each iteration of the algorithm, a machine learning technique must be employed to generate a model. If this task is time-consuming, SS will take a long processing time. This is one of the reasons why we have chosen a simpler model. However, if the most time-consuming part is the evaluation function, SS can be very attractive because it can find high-quality solutions using fewer evaluations.

3.1. Smart Sampling: details
This section provides more details about the SS algorithm presented in Fig. 3.
Generate an initial random population: in this step, the population is initialized randomly inside the bounds of the problem, as commonly performed in the literature. The population is evaluated and its best part is selected to be the population of the algorithm. It is important that this initial population be large enough to provide an adequate covering of the search-space. As this step will be performed only once, the cost, in number of evaluations, is acceptable;
Select promising solutions: the promising solutions are the ones that present the best fitness values. A threshold is used to select them. A low threshold value can lead to local optima very quickly. A high value will probably do the opposite, as poor solutions will be treated as promising ones. A conservative value of 50% is indicated;
Stop criteria: number of resamplings (trials) OR minimum window size; both user-defined. Let the limits be −50 ≤ xi ≤ 150, and the minimum window 1%, meaning that the difference between the lower limit (initially −50) and the upper limit (initially 150) must be larger than (150 − (−50)) × 0.01 = 2 for each xi. For instance, the final limit can be −1 ≤ xi ≤ 1; 0 ≤ xi ≤ 2; or −0.5 ≤ xi ≤ 1.5. It is important to notice that each variable has its own interval;
Learn promising solutions: in this work, k-Nearest Neighbor (kNN with k = 1), presented in Subsection 3.2.1, was used. It is important to observe that any classification technique can be used to differentiate promising from non-promising solutions. We chose kNN for simplicity. The procedure is shown in Algorithm 1;
Sample new promising solutions: use some procedure to generate new solutions and classify them. Then save the solutions classified as promising and discard the others. Repeat this process until a desired number of promising solutions has been generated. When finished, evaluate the new solutions. This step discards any solution classified as non-promising. To generate new solutions, a simple strategy (see Algorithm 2) moves the promising points towards one of the best points found so far, called the target point. However, this movement is made with some noise, which can rotate the point, shift it, or even move it away from the target point. This is not a problem because SS was developed exactly to explore the search-space around promising regions. Also, at each call to this procedure, a counter is incremented to reduce the noise. Moreover, after these new solutions have been generated, only the ones classified as promising will be evaluated and possibly inserted into the population;
Insert the new promising solutions into the population: after insertion, sort the population by fitness and discard the worst ones, keeping the same size as the initial population;
Identify the promising regions: while kNN is used to learn the characteristics of the promising solutions, a rule-based learner (see Subsection 3.2.2) is used to separate the regions where the promising solutions are located. This step works as a clustering phase. Algorithm 3 was employed to obtain the solutions located inside each region.
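As a worked illustration of the minimum-window stop criterion in the steps above, the following Python sketch checks whether the current interval of every variable has shrunk to the minimum width. The function name `window_reached`, its arguments, and the choice to require all variables (rather than a single one) to reach the minimum width are assumptions made for illustration; they are not prescribed by the SS description.

```python
import numpy as np

def window_reached(cur_lower, cur_upper, init_lower, init_upper, window_size=0.01):
    """Hypothetical helper: True when every variable's current interval is
    no wider than window_size times its initial interval."""
    cur_width = np.asarray(cur_upper, dtype=float) - np.asarray(cur_lower, dtype=float)
    init_width = np.asarray(init_upper, dtype=float) - np.asarray(init_lower, dtype=float)
    min_width = window_size * init_width  # e.g. 0.01 * (150 - (-50)) = 2
    # Assumption: stop only when ALL variables have reached their minimum window.
    return bool(np.all(cur_width <= min_width))

# Limits from the text: -50 <= x_i <= 150 with a 1% window, two variables.
init_lo, init_hi = [-50.0, -50.0], [150.0, 150.0]
print(window_reached([-0.5, 0.0], [0.5, 1.0], init_lo, init_hi))      # both widths are 1
print(window_reached([-0.5, -50.0], [0.5, 150.0], init_lo, init_hi))  # second is still 200 wide
```

Because each variable has its own interval, the widths are compared element-wise.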
Algorithm 1. Procedure to learn the difference between promising and non-promising solutions.
input:
  P_{N,D}: matrix with the Population of solutions, where N is the number of solutions and D the number of variables (dimensions). P can be all solutions of the current iteration or only a sample of them.
  fitness_N: a vector with the fitness of P.
output:
  C: a classifier.
1: sort the fitness vector.
2: create the class_N vector, where the solutions with better fitness (based on a threshold) are set to ‘‘GOOD’’ and the worst are set to ‘‘BAD’’.
3: train C using P and class.
4: return C.
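The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the Weka-based implementation used in this work: the class `OneNN`, the function `learn_promising`, and the assumption that lower fitness is better (minimization) are all hypothetical choices for the example.

```python
import numpy as np

class OneNN:
    """Minimal 1-nearest-neighbor classifier with Euclidean distance (illustrative)."""

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=object)
        return self

    def predict(self, Q):
        Q = np.atleast_2d(np.asarray(Q, dtype=float))
        # Squared Euclidean distance from every query to every training point.
        d2 = ((Q[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        return self.y[d2.argmin(axis=1)]

def learn_promising(P, fitness, threshold=0.5, classifier=None):
    """Sketch of Algorithm 1, assuming minimization.
    threshold is the fraction of solutions labeled GOOD (assumed meaning)."""
    P = np.asarray(P, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    order = np.argsort(fitness)                       # step 1: sort by fitness
    n_good = max(1, int(round(threshold * len(fitness))))
    labels = np.array(["BAD"] * len(fitness), dtype=object)
    labels[order[:n_good]] = "GOOD"                   # step 2: build the class vector
    clf = classifier if classifier is not None else OneNN()
    return clf.fit(P, labels)                         # steps 3-4: train and return C

# Usage on a 1-D parabola: the points closest to 0 are labeled GOOD.
P = [[0.0], [1.0], [2.0], [3.0]]
C = learn_promising(P, [p[0] ** 2 for p in P], threshold=0.5)
print(C.predict([[0.2], [2.9]]))
```

Any object exposing `fit` and `predict` could be passed as `classifier`, which mirrors the remark that any classification technique can be used.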
Algorithm 2. Resampling Operator. Resampling is repeated until enough new solutions have been generated.
input:
  PS_{N,D}: matrix with the Promising Solutions, where N is the number of promising solutions and D the number of variables (dimensions).
  tries: number of times this operator has been called.
  tp_D: target point, one of the best solutions found, randomly chosen from the best 50%, for instance.
  lower_lim: lower limit for the random uniform distribution.
  upper_lim: upper limit for the random uniform distribution.
output:
  NS_{N,D}: matrix of the New Solutions.
1: diff = tp − PS
2: NOISE_{N,D} = U(N × D, lower_lim, upper_lim)
3: NS = PS + (NOISE / tries × diff)
4: Assert NS[i, j] is inside the problem’s bounds. If not, replace NS[i, j] by a value (randomly chosen) near the closest bound. For instance: if the interval is [−1, 1] and NS[i, j] = 1.328, then a possible result is NS[i, j] = 0.972.
5: return NS
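A minimal NumPy sketch of Algorithm 2 is given below. The function name `resample`, the `bounds_lo`/`bounds_hi` arguments, and the interpretation of "near the closest bound" as "within 5% of the variable's range" (`near_frac`) are assumptions made for illustration only.

```python
import numpy as np

def resample(PS, tp, tries, lower_lim, upper_lim, bounds_lo, bounds_hi,
             rng=None, near_frac=0.05):
    """Sketch of the resampling operator (Algorithm 2)."""
    rng = np.random.default_rng() if rng is None else rng
    PS = np.asarray(PS, dtype=float)
    diff = np.asarray(tp, dtype=float) - PS                    # step 1
    noise = rng.uniform(lower_lim, upper_lim, size=PS.shape)   # step 2
    NS = PS + (noise / tries) * diff                           # step 3
    # Step 4: repair out-of-bounds values with a random value near the closest bound.
    lo = np.broadcast_to(np.asarray(bounds_lo, dtype=float), NS.shape)
    hi = np.broadcast_to(np.asarray(bounds_hi, dtype=float), NS.shape)
    margin = near_frac * (hi - lo)        # assumed meaning of "near"
    below, above = NS < lo, NS > hi
    NS = np.where(below, lo + rng.uniform(size=NS.shape) * margin, NS)
    NS = np.where(above, hi - rng.uniform(size=NS.shape) * margin, NS)
    return NS                                                  # step 5

# Usage: 5 promising points in [-1, 1]^2 moved towards the first one.
rng = np.random.default_rng(42)
PS = rng.uniform(-1.0, 1.0, size=(5, 2))
NS = resample(PS, PS[0], tries=1, lower_lim=0.5, upper_lim=1.5,
              bounds_lo=-1.0, bounds_hi=1.0, rng=rng)
print(NS.shape)
```

With a noise value of exactly 1 and tries = 1, a point lands on the target; values above 1 overshoot it and values below 1 stop short, and dividing by `tries` shrinks the step as the operator is called repeatedly.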
Algorithm 3. Obtain the points using the rules.
input:
  POP_{N,D}: matrix with the Population, where N is the number of promising solutions and D the number of variables (dimensions).
  Ruleset: set of rules given by the RBL.
output:
  Subsets: an array where each position stores the solutions that satisfy a different rule.
1: for i = 1 to length(Ruleset) do
2:   Subsets[i] = (solutions from POP that satisfy Ruleset[i]) // For instance: V1 > 0.5 and V7 < 33
3: end do
4: return Subsets
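Algorithm 3 can be sketched in Python as shown below. The rule encoding used here, a list of `(variable_index, operator, value)` conditions joined by AND, is a hypothetical representation chosen for the example; the actual rule format produced by the rule-based learner is not reproduced.

```python
import operator
import numpy as np

# Comparison operators allowed in a rule condition (assumed set).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def split_by_rules(POP, ruleset):
    """Sketch of Algorithm 3: one subset of solutions per rule.
    Each rule is a list of (variable_index, op, value) conditions (conjunction)."""
    POP = np.asarray(POP, dtype=float)
    subsets = []
    for rule in ruleset:
        mask = np.ones(len(POP), dtype=bool)
        for var, op, value in rule:
            mask &= OPS[op](POP[:, var], value)   # keep rows satisfying every condition
        subsets.append(POP[mask])
    return subsets

# Usage with two variables (0-based indices): rule 0 is "V1 > 0.5 and V2 < 33".
POP = [[0.6, 10.0], [0.4, 10.0], [0.9, 40.0]]
rules = [[(0, ">", 0.5), (1, "<", 33.0)], [(1, ">=", 33.0)]]
for i, subset in enumerate(split_by_rules(POP, rules)):
    print(i, subset.tolist())
```

A solution that satisfies no rule is simply absent from every subset, and a solution may appear in more than one subset if the rules overlap.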
By reducing the initial search-space, the starting solutions that will be used by the metaheuristic can be concentrated inside the promising region found. Thus, SS tries to avoid the creation of solutions distant from the global optimum and to guarantee a high-quality sampling near the global optimum. The best promising region may not contain the global optimum. For instance, the global optimum can be x* = (0, 0), but a promising region found by SS may be defined by x1 ∈ [−1, 1] and x2 ∈ [0.5, 1], where x* is outside the interval of x2. To deal with this problem, a portion of the population of the metaheuristic can be randomly initialized outside the promising region in an attempt to generate at least one individual near the global optimum. Because the metaheuristic is not limited to the promising region, the recombination of the outside solutions with the solutions inside the promising region can reach the global optimum.
When SS is finished, the final promising areas must be separated. This is the second phase of the algorithm (dashed region in Fig. 3). Some authors have proposed the use of a clustering algorithm to accomplish this task [25,4,24]; however, there are some disadvantages in this approach. Many clustering algorithms use stochastic approaches, thus depending on the initial configuration of the clusters, which is corrected through the process. Moreover, the number of clusters is an input parameter that must be specified for most clustering algorithms. Obtaining such a parameter is not trivial; therefore, we chose a different path: the application of a classifier to generate rules that differentiate the areas.

3.2. Classifiers
Classification is a technique that consists of learning from a previously labeled set of examples to generate a model capable of correctly labeling another set of unknown or unseen instances. This technique has been used, for example, in fraud detection, data mining, pattern recognition, and drug discovery [18].
The most common classifiers are based on trees, neural networks, and rules. To work with classifiers, we used the Weka [44,45] open source machine-learning package (www.cs.waikato.ac.nz/ml/weka/). We chose the simplest classifier to separate promising solutions from non-promising ones (kNN), and a more accurate classifier to identify and separate the final promising regions (rule-based).

3.2.1. k-Nearest Neighbor
Instance-based classifiers, such as the k-nearest neighbor (kNN) algorithm, are amongst the simplest machine-learning algorithms. They classify unknown instances by relating them to the known instances according to some distance/similarity function. In other words, two close instances based on an appropriate distance function tend to belong to the same class, while two distant instances tend to belong to different classes. An object is classified by a majority vote of its neighbors, being assigned to the most common class amongst its k nearest neighbors. k is typically a small positive integer. If k = 1, then the object is simply assigned to the class of its nearest neighbor. Neighbors are solutions from the training set represented by position vectors in a multidimensional feature space. It is usual to use the Euclidean distance and, in an optimization problem where the solutions are points on a response surface [20], this is the most indicated distance function. The instance-based kNN learner used in this work was IBk [1].

3.2.2. Rule-based learner
A rule-based classifier uses a set of IF-THEN rules to classify instances. An IF-THEN rule is a logical expression of the form:
IF condition THEN conclusion.
An example of rule R1 is:
R1: IF weather = sunny AND wind = weak THEN play_tennis = yes.
The conclusion of the rule contains a class prediction. In the case presented for R1, the class is ‘‘to play tennis’’ (yes or no). Using this type of structure, it is possible to parse the ruleset and obtain the set of instances that triggers each rule, splitting them into different regions.
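The majority-vote rule described in Subsection 3.2.1 can be illustrated with a short Python sketch. It is a plain re-statement of kNN with Euclidean distance and is not the IBk implementation from Weka; the function name `knn_predict` and the class labels ‘‘P’’ and ‘‘n’’ in the usage lines are chosen for the example.

```python
from collections import Counter
import numpy as np

def knn_predict(train_X, train_y, query, k=1):
    """Classify `query` by majority vote among its k nearest training points
    (Euclidean distance). Illustrative helper, not the Weka IBk learner."""
    train_X = np.asarray(train_X, dtype=float)
    dist = np.sqrt(((train_X - np.asarray(query, dtype=float)) ** 2).sum(axis=1))
    nearest = np.argsort(dist, kind="stable")[:k]   # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Two promising ("P") and two non-promising ("n") training solutions.
X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
y = ["P", "P", "n", "n"]
print(knn_predict(X, y, [0.5, 0.2], k=1))
print(knn_predict(X, y, [5.4, 5.0], k=3))
```

With k = 1 the vote reduces to copying the label of the single nearest neighbor, which is the setting used by SS.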
As an example, let us use the Tripod function (see Fig. 4). This function has 2 local optima and 1 global optimum, which is located at x = (0, −50). First, generate a random set of points. Evaluate them and select the promising solutions (best 25%, for instance) to be of class ‘‘P’’ and the remaining points to be of class ‘‘n’’. Finally, train the RBL to separate the solutions. After this process, a possible set of promising regions is the one presented in Fig. 5 and the corresponding set of rules is shown in Fig. 6.
The rule-based learner used in this work was JRip, Weka’s implementation of RIPPER (Repeated Incremental Pruning to Produce Error Reduction) [5]. Other classifiers, like neural networks or SVM, are able to determine regions with undefined shapes, reducing the misclassifications. However, as SS was expected to be simple and fast, more complex and time-consuming techniques were not considered in this work.

4. Differential Evolution
Differential Evolution was introduced by Storn and Price in 1995 [37]. It is a floating-point encoding populational metaheuristic, similar to classical evolutionary algorithms, successfully used to solve several benchmarks and real-world problems [22,26,43,42].
Population P of D dimensions is randomly initialized (using a uniform distribution) inside the problem’s bounds and evaluated using the fitness function for the problem. Next, until a stop criterion has been met, the algorithm runs a loop of mutation, crossover and selection operators. For each vector xi of the population, the mutation operator uses the weighted difference of parent solutions to generate mutation vectors vi. Two of the most frequently used mutation strategies are:
1. rand/1/bin
$$v_i = x_{r1} + F\,(x_{r2} - x_{r3}) \qquad (1)$$
2. local-to-best/1/bin
$$v_i = x_i + (x_{best} - x_i) + x_{r1} + F\,(x_{r2} - x_{r3}) \qquad (2)$$
where $x_{r1}$, $x_{r2}$ and $x_{r3}$ are three distinct and randomly chosen vectors from P, $x_{best}$ is the best vector from P, and $F \in [0, 2]$ is the mutation factor. The crossover operator is applied on $v_i$ to generate the final offspring vector $u_i$:
$$u_{i,j} = \begin{cases} v_{i,j}, & \text{if } U(0,1) \le CR \text{ or } j = \mathrm{trunc}(U(1,D)), \\ x_{i,j}, & \text{otherwise} \end{cases} \qquad (3)$$
where j = 1, . . ., D, U(a, b) is a random floating-point number from a uniform distribution between a and b generated for each j, and CR ∈ [0, 1] is the crossover probability. Finally, the selection step selects the best evaluated vector between xi and ui. The offspring replaces the parent if its fitness value is better. Otherwise, the parent is maintained in the population.

5. Evaluating SS
The functioning of SS on continuous 2D test functions (Ackley, Alpine, Griewank, Parabola, Rastrigin, Rosenbrock and Tripod, see mathematical definitions in Table 1) is illustrated, respectively, in Figs. 7–13. For this experiment, SS was configured as follows: window_size = 0.01; lower_lim = 0.5 and upper_lim = 1.5. The lower_lim and upper_lim values are used in Algorithm 2. The graphs in the figures show the evolution/reduction of promising regions through SS iterations according to the distribution of high-quality solutions in the search-space. At each iteration, high-quality solutions are selected and a Kernel Density Estimation algorithm is applied to generate the plot. The gradation of colors, from dark to light, indicates the
Fig. 4. Surface of the Tripod function (see mathematical definition in Table 1).
Fig. 5. Example of regions (rectangles) detected according to the rules generated for the Tripod function, plotted on axes x1 and x2, each ranging from −100 to 100. P are the solutions classified as promising and n as non-promising. Points in lighter color were incorrectly classified by the rules.
Fig. 6. Example of rules generated for the points in Fig. 5.
Table 1. Test functions used to plot the density charts showing the SS procedure. B contains the problem’s bounds, x* is the global optimum, and D = 2.

Name | Definition | B | x*
Ackley | $f(x) = -20 \exp\left(-0.2 \sqrt{\frac{1}{D}\sum_{i=1}^{D} x_i^2}\right) - \exp\left(\frac{1}{D}\sum_{i=1}^{D} \cos(2\pi x_i)\right) + 20 + e$ | $[-16, 48]^D$ | (0, 0)
Alpine | $f(x) = \sum_{i=1}^{D} \left| x_i \sin(x_i) + 0.1 x_i \right|$ | $[-5, 15]^D$ | (0, 0)
Griewank | $f(x) = \frac{1}{4000}\sum_{i=1}^{D} x_i^2 - \prod_{i=1}^{D} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1$ | $[-200, 800]^D$ | (0, 0)
Parabola | $f(x) = \sum_{i=1}^{D} x_i^2$ | $[-100, 100]^D$ | (0, 0)
Rastrigin | $f(x) = \sum_{i=1}^{D} \left[ x_i^2 - 10 \cos(2\pi x_i) + 10 \right]$ | $[-5.12, 5.12]^D$ | (0, 0)
Rosenbrock | $f(x) = \sum_{i=1}^{D-1} \left[ 100\,(x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \right]$ | $[-30, 30]^D$ | (1, 1)
Tripod | $f(x) = \begin{cases} |x_1| + |x_2 + 50|, & \text{if } x_2 < 0 \\ 1 + |x_1 + 50| + |x_2 - 50|, & \text{else if } x_1 < 0 \\ 2 + |x_1 - 50| + |x_2 - 50|, & \text{otherwise} \end{cases}$ | $[-100, 100]^D$ | (0, −50)

Fig. 7. SS procedure in Ackley function. x1 and x2 are the two variables of the problem.
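For reference, some of the definitions in Table 1 can be written directly in Python. The sketch below covers Ackley, Rastrigin, Rosenbrock and Tripod; the function names and the NumPy-based style are choices made for this illustration, and the remaining functions follow the same pattern.

```python
import numpy as np

def ackley(x):
    x = np.asarray(x, dtype=float)
    d = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / d) + 20.0 + np.e)

def rastrigin(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))

def rosenbrock(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2))

def tripod(x):
    """Piecewise 2-D function: one global optimum and two local optima."""
    x1, x2 = x
    if x2 < 0:
        return abs(x1) + abs(x2 + 50)
    if x1 < 0:
        return 1 + abs(x1 + 50) + abs(x2 - 50)
    return 2 + abs(x1 - 50) + abs(x2 - 50)

# Global optimum of Tripod and its two local optima.
print(tripod((0, -50)), tripod((-50, 50)), tripod((50, 50)))  # 0 1 2
```

The three printed Tripod values correspond to the global optimum at (0, −50) and the two local optima of the function.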
Fig. 8. SS procedure in Alpine function.
Fig. 9. SS procedure in Griewank function.
Fig. 10. SS procedure in Parabola function.
Fig. 11. SS procedure in Rastrigin function.
Fig. 12. SS procedure in Rosenbrock function.
Fig. 13. SS procedure in Tripod function.
concentration of high-quality solutions. The lighter the color, the higher the concentration. These figures show that SS is capable of finding small regions close to the global optimum (x*).
For those same functions, an experiment using D = 10 was performed (except for the Tripod function, where D = 2) and the convergence curves of DE and SSDE were compared. The total number of evaluations, including the SS procedure, was set to D × 5000, and the configuration of DE was popsize = 100, F = 0.5, CR = 0.9, and strategy rand/1/bin. After SS has stopped, the region where the best point found is located is chosen as the starting region. To initialize the population for DE, the best 50 points of the region are selected, and another 50 points are randomly generated outside the region. For each of the seven functions, 50 runs were performed. The convergence curves, using the median of the 50 runs, are presented in Fig. 14.
SS can find more than one promising region, but we selected only the region where the best solution was found. This methodology was adopted in this experiment only to present the curves. However, in real problems, more regions should be investigated by DE.
Fig. 14 shows that SSDE provides better final solutions. In the Ackley, Parabola and Tripod (from the last row) functions, both curves (DE and SSDE) present a similar behavior. One possible conclusion is that DE is only refining the best solution inside the region found. The Tripod function is presented twice. The first plot represents the process using window_size = 0.01, where the optimum was not found. The region containing the global optimum was also found. As it was the second in the queue for investigation, it was not used by DE. On the other hand, the second Tripod plot (in the last row) represents the process using window_size = 0.1 and upper_lim lowered from 1.5 to 0.5, limiting the exploration.
In this case, the region with the global optimum was selected more often, resulting in a much better curve. The performance was very similar in the Rastrigin and Griewank functions: SSDE improved the best solution much faster than DE did. The region with the best solution was very close to the global optimum, making DE's work easy and cheap. Moreover, SSDE found the global optimum with a very small error (<1e−200). This is reflected in the vertical line close to generation 400. When DE was applied alone to those functions, the improvement was slow and the best solution found was of much lower quality. For the Alpine function, a fast improvement was observed in the first hundred generations of SSDE, followed by smaller improvements until the end. Nonetheless, SSDE was able to find better solutions. Similarly to what occurred with the Tripod function, Rosenbrock presented an unexpected result. SS got trapped in the local minimum region close to the solution s = (0, . . . , 0), whereas the global optimum is x* = (1, . . . , 1). This result is depicted in the first Rosenbrock chart, which shows that SS very quickly finds a region that looks promising but is a poor one. The second Rosenbrock chart (in the last row) was obtained when the lower_limit was changed from 0.5 to 0.5. This change biases the search towards the best point found at each iteration, instead of allowing the algorithm to explore the response surface. Such a change accelerated the process by orders of magnitude, as seen in the chart. These results have shown that SS is able to find promising regions containing high-quality solutions. Those solutions can be used as a starting population of a metaheuristic, such as DE, without any modification in the metaheuristic. SS improved DE by finding much better solutions using fewer evaluations and less computational time. The next section compares our approach to another initialization method which uses Opposition-Based Learning.
Fig. 14. Median curves over 50 trials for functions from Table 1 (panels: Ackley, Alpine, Griewank, Parabola, Rastrigin, Rosenbrock and Tripod, plus a second Rosenbrock and a second Tripod panel in the last row; each panel shows the DE and SSDE curves). The circle at the beginning of SSDE's curve is the generation in which the SS procedure finished and DE started. The horizontal axis presents the number of generations, and the vertical axis presents the objective function value.
Table 2
Benchmark function definitions. S.P.B.: shifted parameter bounds.

| Name | Definition | S.P.B. | f(x*) |
|---|---|---|---|
| 1st De Jong | $f_1(X) = \sum_{i=1}^{D} x_i^2$ | $[-2.56, 7.68]^D$ | 0 |
| Axis parallel hyperellipsoid | $f_2(X) = \sum_{i=1}^{D} i x_i^2$ | $[-2.56, 7.68]^D$ | 0 |
| Schwefel's Problem 1.2 | $f_3(X) = \sum_{i=1}^{D} \left(\sum_{j=1}^{i} x_j\right)^2$ | $[-32.5, 97.5]^D$ | 0 |
| Rastrigin's function | $f_4(X) = 10D + \sum_{i=1}^{D} \left(x_i^2 - 10\cos(2\pi x_i)\right)$ | $[-2.56, 7.68]^D$ | 0 |
| Griewank's function | $f_5(X) = 1 + \sum_{i=1}^{D} \frac{x_i^2}{4000} - \prod_{i=1}^{D} \cos\left(\frac{x_i}{\sqrt{i}}\right)$ | $[-300, 900]^D$ | 0 |
| Sum of different powers | $f_6(X) = \sum_{i=1}^{D} \lvert x_i \rvert^{(i+1)}$ | $[-0.5, 1.5]^D$ | 0 |
| Ackley's problem | $f_7(X) = -20\exp\left(-0.2\sqrt{\frac{\sum_{i=1}^{D} x_i^2}{D}}\right) - \exp\left(\frac{\sum_{i=1}^{D} \cos(2\pi x_i)}{D}\right) + 20 + e$ | $[-16, 48]^D$ | 0 |
| Levy function 13 | $f_8(X) = \sin^2(3\pi x_1) + \sum_{i=1}^{D-1} (x_i - 1)^2 \left(1 + \sin^2(3\pi x_{i+1})\right) + (x_D - 1)\left(1 + \sin^2(2\pi x_D)\right)$ | $[-10, 10]^D$ | 0 |
| Michalewicz function | $f_9(X) = -\sum_{i=1}^{D} \sin(x_i)\left(\sin\left(i x_i^2/\pi\right)\right)^{2m}, \; m = 10$ | $[0, \pi]^D$ | −9.66015; −19.637014 |
| Zakharov function | $f_{10}(X) = \sum_{i=1}^{D} x_i^2 + \left(\sum_{i=1}^{D} 0.5 i x_i\right)^2 + \left(\sum_{i=1}^{D} 0.5 i x_i\right)^4$ | $[-5, 10]^D$ | 0 |
| Schwefel's Problem 2.22 | $f_{11}(X) = \sum_{i=1}^{D} \lvert x_i \rvert + \prod_{i=1}^{D} \lvert x_i \rvert$ | $[-5, 15]^D$ | 0 |
| Step function | $f_{12}(X) = \sum_{i=1}^{D} \left(\lfloor x_i + 0.5 \rfloor\right)^2$ | $[-50, 150]^D$ | 0 |
| Alpine function | $f_{13}(X) = \sum_{i=1}^{D} \lvert x_i \sin(x_i) + 0.1 x_i \rvert$ | $[-5, 15]^D$ | 0 |
| Exponential problem | $f_{14}(X) = -\exp\left(-0.5 \sum_{i=1}^{D} x_i^2\right)$ | $[-0.5, 1.5]^D$ | −1 |
| Salomon problem | $f_{15}(X) = 1 - \cos(2\pi \lVert x \rVert) + 0.1 \lVert x \rVert$, where $\lVert x \rVert = \sqrt{\sum_{i=1}^{D} x_i^2}$ | $[-50, 150]^D$ | 0 |
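As an illustration of Table 2, the sketch below implements three of the listed functions (Rastrigin, Griewank and Ackley) and the 50% domain shift that turns symmetric bounds [−a, a] into the shifted parameter bounds. The function names and the helper `shift_bounds` are hypothetical and chosen only for this example; they are not taken from the authors' R code.

```python
import math


def rastrigin(x):
    """f4 in Table 2: 10*D + sum(x_i^2 - 10*cos(2*pi*x_i))."""
    return 10.0 * len(x) + sum(
        xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x
    )


def griewank(x):
    """f5 in Table 2: 1 + sum(x_i^2)/4000 - prod(cos(x_i / sqrt(i)))."""
    total = sum(xi * xi for xi in x) / 4000.0
    product = math.prod(
        math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, start=1)
    )
    return 1.0 + total - product


def ackley(x):
    """f7 in Table 2, with the usual 20 + e offset so that f(0) = 0."""
    d = len(x)
    rms = math.sqrt(sum(xi * xi for xi in x) / d)
    mean_cos = sum(math.cos(2.0 * math.pi * xi) for xi in x) / d
    return -20.0 * math.exp(-0.2 * rms) - math.exp(mean_cos) + 20.0 + math.e


def shift_bounds(a):
    """Shift symmetric bounds [-a, a] by 50%: [-a + a/2, a + a/2]."""
    return (-a + a / 2.0, a + a / 2.0)


if __name__ == "__main__":
    origin = [0.0] * 10
    print(rastrigin(origin), griewank(origin), ackley(origin))
    print(shift_bounds(5.12))   # approximately (-2.56, 7.68), as in Table 2
    print(shift_bounds(600.0))  # (-300.0, 900.0), as in Table 2
```

After the shift the optimum at the origin is still inside the domain but no longer at its center, which is the purpose of the asymmetric bounds.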
Table 3
SS settings.
Initial sample size: the best half of D × 100, resulting in D × 50 solutions
Number of promising solutions: the best 50% of the population
New solutions at each iteration: D × 10
Maximum iterations: 100
Maximum iterations without improvement: 10
Minimum window size: 1% for each D
Lower limit: 0.1
Upper limit: 1
Table 4
DE settings.
Population size: N = 100
Differential amplification factor: F = 0.5
Crossover probability constant: CR = 0.9
Maximum number of function calls: MAXNFC = 10^6
Maximum generations without improvement: D × 10
Strategy to compare SSDE with DE: rand/1/bin
Strategy to compare SSDE with ODE, QODE and UQODE: local-to-best/1/bin
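To make the DE settings of Table 4 concrete, the following is a minimal sketch of the classical rand/1/bin strategy with F = 0.5 and CR = 0.9. It is not the DEoptim implementation used in the experiments. The function name `de_rand_1_bin`, the optional `pop` argument for passing a pre-built starting population, and the clipping of trial vectors to the bounds are assumptions made for this example.

```python
import random


def de_rand_1_bin(f, bounds, pop=None, popsize=100, F=0.5, CR=0.9,
                  max_gen=200, seed=0):
    """Minimize f with classical DE rand/1/bin (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    if pop is None:
        # Plain DE: uniform random start over the whole search-space.
        pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
               for _ in range(popsize)]
    pop = [list(p) for p in pop]
    fit = [f(p) for p in pop]
    n = len(pop)
    for _ in range(max_gen):
        new_pop, new_fit = list(pop), list(fit)
        for i in range(n):
            # Three mutually distinct individuals, all different from i.
            r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # assumption: clip to the bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            f_trial = f(trial)
            if f_trial <= fit[i]:  # greedy one-to-one selection
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = min(range(n), key=fit.__getitem__)
    return pop[best], fit[best]


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(de_rand_1_bin(sphere, [(-2.56, 7.68)] * 5, popsize=30, max_gen=150))
```

Because the starting population is an argument, the same routine can be started either at random (plain DE) or from points supplied by a sampling procedure, without any change to the DE loop.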
6. Computational experiments
This paper presents the effects of SS on global optimization problems using a well-known global optimization algorithm, i.e., the classical Differential Evolution (DE) [38]. The results using SS (our approach) in conjunction with DE, called SSDE, are compared to the results presented by ODE [33], QODE [31] and UQODE [29].
6.1. Test benchmark functions
To evaluate SSDE, our approach was applied to the same set of 14 standard continuous global optimization test problems used to test ODE, QODE and UQODE. These test problems correspond to 7 unimodal and 7 multimodal functions. The 14 test functions were tested with two different dimensions (D and 2 × D) to increase the problem's difficulty. Therefore, DE started inside promising regions is compared, on 28 minimization problems, against DE randomly started in the whole search-space, using the benchmark functions presented in Table 2.
Table 5
Comparison of DE and SSDE. D: dimension; NFC: number of evaluations (average over 50 executions) until VTR has been achieved; SR: success rate; SP: success performance. The best success performance (SP) for each case is highlighted in boldface. DE is unable to solve f9 (D = 20), and SSDE and DE are unable to solve f10 (D = 60). The last two rows present the averaged SR and SP. Symbol means that the lowest value (NFC or SR) is statistically significant with α = 0.05.

| F | D | DE NFC | DE SR | DE SP | SSDE NFC | SSDE SR | SSDE SP |
|---|---|---|---|---|---|---|---|
| f1 | 30 | 78210 | 1 | 78210 | 46030 | 1 | 46030 |
| f1 | 60 | 140070 | 1 | 140070 | 93450 | 1 | 93450 |
| f2 | 30 | 86730 | 1 | 86730 | 53940 | 1 | 53940 |
| f2 | 60 | 159300 | 1 | 159300 | 115960 | 1 | 115960 |
| f3 | 20 | 161660 | 1 | 161660 | 114290 | 1 | 114290 |
| f3 | 40 | 746430 | 1 | 746430 | 464480 | 1 | 464480 |
| f4 | 10 | 275430 | 1 | 275430 | 20290 | 1 | 20290 |
| f4 | 20 | 642800 | 0.1 | 6428000 | 51900 | 1 | 51900 |
| f5 | 30 | 103280 | 1 | 103280 | 66330 | 1 | 66330 |
| f5 | 60 | 177250 | 1 | 177250 | 133400 | 1 | 133400 |
| f6 | 30 | 17160 | 1 | 17160 | 9010 | 1 | 9010 |
| f6 | 60 | 28980 | 1 | 28980 | 17580 | 1 | 17580 |
| f7 | 30 | 150670 | 1 | 150670 | 113440 | 1 | 113440 |
| f7 | 60 | 268800 | 0.7 | 384000 | 223480 | 1 | 223480 |
| f8 | 30 | 87160 | 1 | 87160 | 54590 | 1 | 54590 |
| f8 | 60 | 155844.4 | 0.9 | 173160.4 | 115220 | 1 | 115220 |
| f9 | 10 | 183775 | 0.4 | 459437.5 | 47600 | 0.5 | 95200 |
| f9 | 20 | – | 0 | – | 55660 | 1 | 55660 |
| f10 | 30 | 359640 | 1 | 359640 | 290760 | 1 | 290760 |
| f10 | 60 | – | 0 | – | – | 0 | – |
| f11 | 30 | 164890 | 1 | 164890 | 106770 | 1 | 106770 |
| f11 | 60 | 287910 | 1 | 287910 | 215460 | 1 | 215460 |
| f12 | 30 | 39760 | 1 | 39760 | 9090 | 1 | 9090 |
| f12 | 60 | 75070 | 1 | 75070 | 19560 | 1 | 19560 |
| f13 | 30 | 383070 | 1 | 383070 | 103000 | 1 | 103000 |
| f13 | 60 | 403610 | 1 | 403610 | 198930 | 1 | 198930 |
| f14 | 10 | 16820 | 1 | 16820 | 9170 | 1 | 9170 |
| f14 | 20 | 39940 | 1 | 39940 | 21580 | 1 | 21580 |
| Average SR | | | 0.803 | | | 0.96 | |
| Average SP | | | | 415194 | | | 100663 |
As 13 of the 14 benchmark functions have a global optimum in the center of the search-space, the function's domain was shifted by 50% to become asymmetric, moving the global optimum away from the center, as follows: let O.P.B. be the original parameter bounds and S.P.B. the shifted parameter bounds. If O.P.B.: $-a \le x_i \le a$ and $f(x^*) = f(0, \ldots, 0) = 0$, then S.P.B.: $-a + \frac{a}{2} \le x_i \le a + \frac{a}{2}$.
6.2. Comparison strategies and measurements
To compare the results of the experiments and decide which algorithm performs better after 50 runs for each benchmark function, three metrics were used: number of function calls (NFC), success rate (SR), and success performance (SP). Since the global minimum is known for each of those functions, success can be determined according to a value-to-reach (VTR), which is an objective function value (f(x)) close to the global optimum (f(x*)). The level of proximity is defined by a maximum allowed error:
$$\mathrm{VTR} = \lvert f(x) - f(x^*) \rvert \le 10^{-8}.$$
The lower the NFC, the higher the convergence speed. The stop criterion of the algorithms is to reach either MAXNFC (maximum number of function calls allowed) or VTR. The NFC is averaged over the number of trials. The SR value represents the fraction of trials in which the algorithm reaches the VTR:
$$\mathrm{SR} = \frac{\text{number of times VTR has been reached}}{\text{total number of trials}}.$$
To simplify the comparison of algorithms using multiple criteria, Suganthan et al. [39] introduced the SP, which, by combining NFC and SR, presents an estimate of the maximum number of function evaluations needed to achieve the global optimum. SP is our final measurement to determine which algorithm performs better:
$$\mathrm{SP} = \frac{\operatorname{mean}(\text{NFC for successful runs})}{\mathrm{SR}}.$$
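The two ratios above can be sketched in a few lines. The helper names `success_rate` and `success_performance` are hypothetical, and the per-run NFC values in the usage example are placeholders set equal to the mean reported for DE on f4 (D = 20) in Table 5, since the individual runs are not published.

```python
def success_rate(n_successes, n_trials):
    """SR: fraction of trials in which VTR was reached."""
    return n_successes / n_trials


def success_performance(nfc_successful_runs, n_trials):
    """SP = mean(NFC for successful runs) / SR.

    Returns None when no run succeeded, since SP is then undefined
    (shown as a dash in the result tables).
    """
    if not nfc_successful_runs:
        return None
    sr = success_rate(len(nfc_successful_runs), n_trials)
    mean_nfc = sum(nfc_successful_runs) / len(nfc_successful_runs)
    return mean_nfc / sr


if __name__ == "__main__":
    # Placeholder data: 5 successful runs out of 50 (SR = 0.1), each with
    # the mean NFC of 642800 reported for DE on f4 (D = 20).
    runs = [642800] * 5
    print(success_rate(len(runs), 50))    # 0.1
    print(success_performance(runs, 50))  # about 6428000, as in Table 5
```

An algorithm that succeeds rarely is thus penalized in proportion to the number of restarts it would need on average, which is why SP can be much larger than NFC.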
Table 6
Comparison of ODE, QODE, UQODE, and SSDE2 (our approach, but DE with local-to-best/1/bin strategy). D: dimension; NFC: number of function calls (average over 50 trials) until VTR has been achieved; SR: success rate; SP: success performance. The last rows of the table present the averages. The best success performance for each case is highlighted in boldface. ODE, QODE, and UQODE are unable to solve f10 (D = 60).

| F | D | ODE NFC | ODE SR | ODE SP | QODE NFC | QODE SR | QODE SP | UQODE NFC | UQODE SR | UQODE SP | SSDE2 NFC | SSDE2 SR | SSDE2 SP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| f1 | 30 | 50844 | 1 | 50844 | 42896 | 1 | 42896 | 23316 | 1 | 23316 | 25500 | 1 | 25500 |
| f1 | 60 | 101832 | 1 | 101832 | 94016 | 1 | 94016 | 68245 | 1 | 68245 | 55490 | 1 | 55490 |
| f2 | 30 | 56944 | 1 | 56944 | 47072 | 1 | 47072 | 26166 | 1 | 26166 | 28480 | 1 | 28480 |
| f2 | 60 | 117756 | 1 | 117756 | 105992 | 1 | 105992 | 77133 | 1 | 77133 | 64500 | 1 | 64500 |
| f3 | 20 | 177300 | 1 | 177300 | 116192 | 1 | 116192 | 33370 | 1 | 33370 | 28860 | 1 | 28860 |
| f3 | 40 | 834668 | 1 | 834668 | 539608 | 1 | 539608 | 170508 | 1 | 170508 | 95740 | 1 | 95740 |
| f4 | 10 | 75278 | 0.92 | 81823 | 181100 | 1 | 181100 | 72362 | 1 | 72362 | 9977 | 0.9 | 11086 |
| f4 | 20 | 421300 | 0.16 | 2633125 | 615280 | 0.16 | 3845500 | 154897 | 1 | 154897 | 28740 | 1 | 28740 |
| f5 | 30 | 74717 | 0.92 | 81214 | 100540 | 0.80 | 125675 | 105176 | 1 | 105176 | 33130 | 1 | 33130 |
| f5 | 60 | 128340 | 0.68 | 188735 | 115280 | 0.68 | 169529 | 204985 | 1 | 204985 | 73710 | 1 | 73710 |
| f6 | 30 | 10152 | 1 | 10152 | 9452 | 1 | 9452 | 11050 | 1 | 11050 | 8810 | 1 | 8810 |
| f6 | 60 | 11452 | 1 | 11452 | 14667 | 0.84 | 17461 | 18666 | 1 | 18666 | 16680 | 1 | 16680 |
| f7 | 30 | 100280 | 1 | 100280 | 82448 | 1 | 82448 | 151290 | 1 | 151290 | 50450 | 1 | 50450 |
| f7 | 60 | 202010 | 0.96 | 210427 | 221850 | 0.72 | 308125 | 127272 | 1 | 127272 | 116280 | 1 | 116280 |
| f8 | 30 | 70408 | 1 | 70408 | 50576 | 1 | 50576 | 81988 | 1 | 81988 | 29620 | 1 | 29620 |
| f8 | 60 | 121750 | 0.60 | 202900 | 98300 | 0.40 | 245800 | 172639 | 1 | 172639 | 65090 | 1 | 65090 |
| f9 | 10 | 213330 | 0.56 | 380900 | 247640 | 0.48 | 515900 | 63568 | 1 | 63568 | 114930 | 1 | 114930 |
| f9 | 20 | 253910 | 0.55 | 461700 | 193330 | 0.68 | 284300 | 276348 | 0.96 | 287863 | 585366 | 0.3 | 1951222 |
| f10 | 30 | 369104 | 1 | 369104 | 239832 | 1 | 239832 | 120278 | 1 | 120278 | 74210 | 1 | 74210 |
| f10 | 60 | – | 0 | – | – | 0 | – | – | 0 | – | 744210 | 1 | 744210 |
| f11 | 30 | 167580 | 1 | 167580 | 108852 | 1 | 108852 | 47208 | 1 | 47208 | 51160 | 1 | 51160 |
| f11 | 60 | 274716 | 1 | 274716 | 183132 | 1 | 183132 | 126302 | 1 | 126302 | 114260 | 1 | 114260 |
| f12 | 30 | 26400 | 1 | 26400 | 21076 | 1 | 21076 | 13682 | 1 | 13682 | 9030 | 1 | 9030 |
| f12 | 60 | 64780 | 1 | 64780 | 64205 | 1 | 64205 | 754400 | 1 | 754400 | 19140 | 1 | 19140 |
| f13 | 30 | 361884 | 1 | 361884 | 291448 | 1 | 291448 | 52492 | 1 | 52492 | 48310 | 1 | 48310 |
| f13 | 60 | 425700 | 0.96 | 443438 | 295084 | 1 | 295084 | 157248 | 1 | 157248 | 104030 | 1 | 104030 |
| f14 | 10 | 16112 | 1 | 16112 | 13972 | 1 | 13972 | 4420 | 1 | 4420 | 6250 | 1 | 6250 |
| f14 | 20 | 31720 | 1 | 31720 | 23776 | 1 | 23776 | 10689 | 1 | 10689 | 12940 | 1 | 12940 |
| Average SR | | | 0.87 | | | 0.85 | | | 0.96 | | | 0.97 | |
| Average SP | | | | 268864 | | | 286536 | | | 112043 | | | 142209 |
6.3. Statistical analysis
To compare the SR values for each algorithm, a proportion test (α = 0.05) was applied. In addition, Wilcoxon's rank sum test (α = 0.05) was employed to detect differences in the means and identify the better of the two algorithms (DE versus SSDE) for each test function. This test is used to compare the means only when the global optimum has been found. However, as SR can be different for the same problem, the vectors of means may have different sizes, leading to inadequate conclusions. Thus, those two statistics were added as supplementary information. As presented in Section 6.2, a more adequate metric of comparison is SP. As the results of ODE, QODE and UQODE are unknown, they cannot be included in the statistical analysis.
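As a rough illustration of how two success rates can be compared at α = 0.05, the sketch below uses a pooled two-proportion z-test. This is only one common form of proportion test and is an assumption of this example; the text does not state which variant or software routine was applied. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(successes_1, trials_1, successes_2, trials_2):
    """Two-sided pooled z-test for the difference between two proportions.

    Returns (z, p_value). If both samples are all successes or all
    failures, the pooled variance is zero and (0.0, 1.0) is returned.
    """
    p1 = successes_1 / trials_1
    p2 = successes_2 / trials_2
    pooled = (successes_1 + successes_2) / (trials_1 + trials_2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / trials_1 + 1.0 / trials_2))
    if se == 0.0:
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Success counts implied by Table 5 for f4 (D = 20) over 50 runs:
    # DE SR = 0.1 (5 successes), SSDE SR = 1 (50 successes).
    z, p = two_proportion_z_test(5, 50, 50, 50)
    print(z, p, p < 0.05)
```

With only 50 runs per algorithm, small SR differences (for example 0.96 versus 1) will generally not be significant under such a test, which is consistent with treating it as supplementary information.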
6.4. Configuration of the algorithms
Parameter settings of SS and DE for all conducted experiments are defined in Tables 3 and 4, respectively. The SS code was developed in the R language [30] (www.r-project.org). For the DE algorithm, we used the DEoptim package [19], available in R. DE is run for each promising region found by SS until VTR or MAXNFC has been reached. SS returns a set of solutions separated into promising regions. The promising regions are ordered by the best value found in each of them, and the region containing the best solution is optimized first by DE. If the best solution remains the same for D × 10 iterations (stagnation) and has not reached the VTR, the optimization of this region is stopped. This number of iterations was determined empirically.
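The region ordering and the stagnation rule described in this section can be sketched as follows. The data layout (each region as a list of (point, value) pairs) and the helper names `order_regions` and `stagnated` are assumptions made for illustration and do not reproduce the authors' R code.

```python
def order_regions(regions):
    """Sort promising regions by the best (lowest) value each one contains.

    Each region is assumed to be a list of (point, value) pairs.
    """
    return sorted(regions, key=lambda region: min(v for _, v in region))


def stagnated(best_history, patience):
    """True if the best-so-far value did not improve in the last
    `patience` iterations. `best_history` holds one best-so-far value
    per iteration (minimization)."""
    if len(best_history) <= patience:
        return False
    return min(best_history[-patience:]) >= best_history[-patience - 1]


if __name__ == "__main__":
    regions = [
        [((0.9, 1.1), 3.0), ((1.0, 1.0), 2.0)],
        [((0.1, 0.0), 1.0)],
    ]
    queue = order_regions(regions)
    print(queue[0])  # the region holding the best value (1.0) comes first

    D = 2
    history = [5.0, 3.0] + [3.0] * (D * 10)
    print(stagnated(history, patience=D * 10))  # True: no gain in D*10 steps
```

In a full run, the optimizer would move on to the next region in the queue whenever `stagnated` becomes true before VTR or MAXNFC is reached.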
The population of 100 solutions for DE is generated as follows: 50% are the best points of a promising region and the other 50% are randomly generated inside the problem bounds. If the promising region has fewer than 50 points, the Resampling Operator is applied to these solutions in order to complete 50 solutions. As shown in Table 4, in the comparison between SSDE and DE both use the same strategy, in order to evaluate whether SS can improve DE. On the other hand, the other three algorithms employ improved strategies. Thus, for a fair comparison with them we cannot use a strategy that explores the search-space for longer periods (rand/1/bin). Because SS finds promising regions, we used local-to-best/1/bin, which is better at local optimization.
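A minimal sketch of this half-and-half initialization is given below. The function name `initial_population` and the (point, value) data layout are hypothetical. The case of a region with fewer than 50 points, which the paper handles with its Resampling Operator, is deliberately left out and raises an error here.

```python
import random


def initial_population(region, bounds, popsize=100, seed=0):
    """Build a DE starting population: the best half comes from one
    promising region, the other half is uniform random within the bounds.

    `region` is assumed to be a list of (point, value) pairs
    (minimization); `bounds` is a list of (low, high) pairs per dimension.
    """
    half = popsize // 2
    if len(region) < half:
        # The paper completes the region with its Resampling Operator;
        # that step is outside the scope of this sketch.
        raise ValueError("region has fewer points than half the population")
    rng = random.Random(seed)
    ranked = sorted(region, key=lambda pv: pv[1])
    best = [list(point) for point, _ in ranked[:half]]
    rand = [[rng.uniform(lo, hi) for lo, hi in bounds]
            for _ in range(popsize - half)]
    return best + rand


if __name__ == "__main__":
    rng = random.Random(42)
    bounds = [(-2.56, 7.68)] * 3
    sphere = lambda x: sum(v * v for v in x)
    points = [[rng.uniform(-0.1, 0.1) for _ in bounds] for _ in range(60)]
    region = [(p, sphere(p)) for p in points]
    pop = initial_population(region, bounds)
    print(len(pop))  # 100
```

The random half keeps some diversity over the whole domain, so the metaheuristic is not confined to the selected region.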
6.5. Results
Results of applying DE and SSDE (our approach) to 14 test problems in 28 test cases are given in Table 5. The best success performance (SP) for each case is highlighted in boldface. The DE results in this table correspond to our experiments, instead of the results in [33]. As SP considers NFC and SR simultaneously, it is preferred as a measurement to compare optimization algorithms instead of a multi-criteria approach. As seen in Table 5, SSDE achieved the best SP in every case that was solved (neither algorithm solved f10 with D = 60). Also, SR improved from 80% (DE) to 96% (SSDE). This improvement can be more easily verified for functions f4 and f9, where DE presented very poor performance. In general, DE explored the search-space for much longer periods until it found a promising region to exploit. At other times, DE got stuck in a local optimum. While in some functions the number of evaluations required by SSDE was close to the number required by DE (f2 and f11, for instance), for others the reduction was much stronger (see f4, f9 and f12). Moreover, the average number of function evaluations needed to find the global optimum (SP) was reduced from 415,194 (DE) to 100,663 (SSDE), i.e. by approximately 76%. This is a substantial increase in performance considering that no changes were made in the DE algorithm. When compared to the classical DE on the set of optimization problems tested, SSDE obtained a higher success rate while using a considerably lower number of function evaluations. Based on this result, one can conclude that SS is very effective in finding high-quality promising regions with low effort.
Table 7
Statistics on the number of function evaluations for SSDE2.

| F | D | Min | 1st Quartile | Median | Average | 3rd Quartile | Max | Std. dev. |
|---|---|---|---|---|---|---|---|---|
| f1 | 30 | 24800 | 25150 | 25400 | 25500 | 25700 | 26600 | 551.76 |
| f1 | 60 | 52300 | 54225 | 55200 | 55490 | 56600 | 58400 | 1986.31 |
| f2 | 30 | 27300 | 28125 | 28300 | 28480 | 29025 | 29500 | 734.54 |
| f2 | 60 | 60100 | 63025 | 64800 | 64500 | 65850 | 68400 | 2309.4 |
| f3 | 20 | 27600 | 28200 | 28400 | 28860 | 29175 | 32100 | 1316.73 |
| f3 | 40 | 86900 | 91700 | 95250 | 95740 | 100925 | 103300 | 5763.14 |
| f4 | 10 | 9500 | 9700 | 10000 | 9977.78 | 10200 | 10400 | 330.82 |
| f4 | 20 | 19200 | 19600 | 19900 | 28740 | 20475 | 108200 | 27923.24 |
| f5 | 30 | 31400 | 32900 | 33150 | 33130 | 33725 | 34400 | 889.51 |
| f5 | 60 | 68200 | 71350 | 73250 | 73710 | 75050 | 81700 | 4016.76 |
| f6 | 30 | 6900 | 8400 | 8850 | 8810 | 9300 | 10300 | 1088.78 |
| f6 | 60 | 13200 | 16200 | 17100 | 16680 | 17400 | 18000 | 1408.55 |
| f7 | 30 | 48900 | 49625 | 50000 | 50450 | 51225 | 53100 | 1255.43 |
| f7 | 60 | 112500 | 113600 | 116250 | 116280 | 118975 | 119900 | 2835.02 |
| f8 | 30 | 28600 | 29400 | 29450 | 29620 | 30000 | 30700 | 588.41 |
| f8 | 60 | 62100 | 63300 | 65800 | 65090 | 66300 | 68400 | 2092.02 |
| f9 | 10 | 22900 | 63200 | 87450 | 114930 | 138775 | 289600 | 81833.53 |
| f9 | 20 | 422900 | 438050 | 453200 | 585366.67 | 666600 | 880000 | 255609.32 |
| f10 | 30 | 69500 | 71275 | 74850 | 74210 | 76750 | 79000 | 3273.28 |
| f10 | 60 | 611700 | 686950 | 743950 | 744210 | 768200 | 981900 | 99864.26 |
| f11 | 30 | 50400 | 50750 | 51050 | 51160 | 51350 | 52800 | 671.98 |
| f11 | 60 | 111000 | 112150 | 114400 | 114260 | 115700 | 118200 | 2429.08 |
| f12 | 30 | 8700 | 9000 | 9000 | 9030 | 9225 | 9300 | 221.36 |
| f12 | 60 | 17400 | 18750 | 19200 | 19140 | 19650 | 20400 | 957.08 |
| f13 | 30 | 45900 | 47750 | 48250 | 48310 | 48625 | 51600 | 1410.63 |
| f13 | 60 | 100700 | 102525 | 103400 | 104030 | 105325 | 108200 | 2535.55 |
| f14 | 10 | 5800 | 6200 | 6250 | 6250 | 6375 | 6600 | 232.14 |
| f14 | 20 | 12000 | 12825 | 13000 | 12940 | 13150 | 13600 | 429.99 |
The next step is to compare SSDE to other approaches that improve the classical DE. As those approaches improve not only the population initialization, but also the procedure that searches for the global optimum, a fair comparison cannot use the classical DE. Thus, we replaced the classical strategy with one that is better at exploiting regions, as explained in Section 6.4. The results of SSDE compared to the results of ODE, QODE and UQODE are presented in Table 6. It is important to stress that UQODE is an improvement over QODE, which is an improvement over ODE, which is an improvement over DE. Thus, UQODE is the best of those four algorithms, as shown in [29]. In Table 6, SSDE achieved the best SP in 20 of the 28 cases, ODE in 1, QODE in 1, and UQODE in 6. In function f6 (D = 60), where ODE was the best, SSDE was the second best. In function f9 (D = 20), where QODE was the best, SSDE was the worst. One should notice that, although SSDE was not the best in five other cases (f1 with D = 30, f2 with D = 30, f11 with D = 30, f14 with D = 10 and D = 20), it presented very competitive results. When the upper and lower limits of the resampling procedure of SS were changed to allow for a larger exploration, the result was similar to the one obtained by QODE. The idea is to avoid tuning the algorithm for specific problems, although tuning can be used to achieve better results. Where SSDE required more evaluations than UQODE, the cause was SS itself. For easier problems, SS alone can get very close to the global optimum and is thus responsible for a large share of the function evaluations. This behavior was observed for functions f1 (D = 30, more than half), f2 (D = 30, almost half), f6 (D = 60, all evaluations; DE was unnecessary), and f14 (more than half). As the best of the other three algorithms is UQODE, let us make a simpler comparison: SSDE versus UQODE. Very large reductions in the number of function evaluations were obtained in f4 (85%), f5 (65%), f7 (66%), f8 (63%), f10 (38%), and f12 (97%). 
Possible explanations are that UQODE wastes too much time in non-promising regions or gets trapped in local optima for too many iterations. With the SS initialization method, those behaviors tend to be reduced because SS focuses on promising regions of the search-space, avoiding excessive exploration. The average SR is only 1% higher with SSDE, whereas the average SP of UQODE is about 21% lower than that of SSDE (112,043 versus 142,209), mainly because of f9 (D = 20). Ignoring f9 (D = 20) and f10 (D = 60) as outliers, one would have SP_UQODE = 109,590 and SP_SSDE2 = 49,478, which seems much more adequate and corresponds to a reduction of approximately 55% in the number of function evaluations. Issues related to these two functions could possibly be solved by a different configuration of the algorithms, using, for example, a larger population size. However, in this paper we do not intend to find the best configuration for each function. Finally, Table 7 presents a descriptive analysis of the number of function evaluations required by SSDE2 to achieve VTR. This table is provided so that other authors can make different comparisons to our work. Those experiments were conducted to show not only that SSDE can outperform other algorithms designed to improve DE, but also that such performance can be achieved using a default setting of SS for a variety of problems, without fine-tuning the parameters of the algorithm. Another important aspect is that DE is a very powerful technique, and even the original DE strategies are able to perform well on several problems when a method to find promising regions is used.
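The averages quoted above with the two outliers removed can be rechecked directly from the SP columns of Table 6. The short script below is only an arithmetic check; the list layout (rows in table order, `None` for an unsolved case) and the helper name `mean_without` are choices made for this example.

```python
# SP columns of Table 6 in row order (f1 D=30, f1 D=60, ..., f14 D=20).
# None marks a case the algorithm did not solve.
UQODE_SP = [23316, 68245, 26166, 77133, 33370, 170508, 72362, 154897,
            105176, 204985, 11050, 18666, 151290, 127272, 81988, 172639,
            63568, 287863, 120278, None, 47208, 126302, 13682, 754400,
            52492, 157248, 4420, 10689]
SSDE2_SP = [25500, 55490, 28480, 64500, 28860, 95740, 11086, 28740,
            33130, 73710, 8810, 16680, 50450, 116280, 29620, 65090,
            114930, 1951222, 74210, 744210, 51160, 114260, 9030, 19140,
            48310, 104030, 6250, 12940]

# Zero-based row indices of the two outliers: f9 (D = 20), f10 (D = 60).
OUTLIERS = {17, 19}


def mean_without(values, skip):
    """Mean of the entries whose index is not in `skip` and are not None."""
    kept = [v for i, v in enumerate(values) if i not in skip and v is not None]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    uqode = mean_without(UQODE_SP, OUTLIERS)
    ssde2 = mean_without(SSDE2_SP, OUTLIERS)
    print(round(uqode), round(ssde2))        # 109590 49478
    print(round(1.0 - ssde2 / uqode, 3))     # 0.549, i.e. roughly 55%
```

Both averages are taken over the same 26 remaining cases, so the comparison is not affected by which algorithm solved which outlier.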
7. Conclusions and future works
This paper has presented a technique to find promising regions of the search-space of continuous functions. The approach, named Smart Sampling (SS), uses a machine-learning technique to identify promising and non-promising solutions to guide the resampling procedure to smaller areas where higher-quality solutions can be found. This iterative process ends when a stop criterion has been met, for instance, when a promising region is too small. At this point, another machine-learning technique is applied to separate the promising solutions into promising regions. Those promising solutions can then be used as an initial population for metaheuristics. In this work, the metaheuristic used in conjunction with SS was Differential Evolution (DE), originating the SSDE algorithm. To evaluate SSDE's performance, a set of 7 classical optimization functions was used. We presented the distribution of promising solutions on the search-space during the SS process to find promising regions. An experiment was conducted on those functions to compare SSDE to DE and plot the optimization curves. The results showed a very clear advantage when using SS. DE and SSDE were then compared using 14 hard continuous optimization problems in two different sizes (dimensions), resulting in 28 cases. While DE achieved a success rate (SR) of 80% with a success performance (SP) of 415,194 evaluations, SSDE achieved SR = 96% with only SP = 100,663 evaluations, which represents an improvement of 16 percentage points in the success rate and a reduction of 76% in the number of function evaluations. SSDE was also compared to three other state-of-the-art algorithms that improve DE based on population initialization and different mutation strategies. Those three algorithms, i.e. Opposition-based Differential Evolution (ODE), Quasi-Oppositional Differential Evolution (QODE), and Uniform-Quasi-Opposition Differential Evolution (UQODE), showed significant reductions in the number of function evaluations. Experimental results of SSDE on the same 28 cases were compared to the results obtained by ODE, QODE and UQODE. SSDE (our approach) outperformed the other three approaches in 20 out of 28 cases based on the number of function evaluations, success rate, and success performance. In five other cases, SSDE achieved very competitive performance. Analyzing the results presented in this work, it is possible to conclude that the proposed approach has shown relevant results for the task of global optimization. The search for promising regions for population initialization is a research area that deserves further study, as it can not only provide much better results using fewer evaluations, but also increase the success rate.
As a final conclusion, the development of methods similar to SS can be considered relevant, as they can be independent of the optimization technique. This property is especially interesting because SS could provide new solutions for hard problems in a wide range of scientific areas. The way SS was designed allows for its application to any other metaheuristic in future works. Several future works can be developed from this paper. SS currently uses simple classifiers to identify differences among the samples and split the final solutions into promising regions. As explained in Section 3.2.2, we have chosen those algorithms to reduce computational time and evaluate the performance of SS without complex relationship models. However, we are aware that better classifiers can provide better results. Therefore, SVM, Neural Networks and Bayesian Networks are some classifiers that can be evaluated in future research. This first version of SS presents several parameters for configuration. However, the number of parameters of SS is similar to the number of parameters of a Genetic Algorithm (GA) [9], a well-known and widely employed metaheuristic. For instance, a GA comprises population size, crossover rate, number of generations, value-to-reach, and mutation probability, whereas SS comprises number of promising solutions, new solutions at each iteration, maximum iterations, minimum window size, and lower and upper limits. Moreover, we are seeking ways to make it auto-adaptive, reduce the number of parameters or make a parameterless version. For now, empirical observations have suggested that for noisy and multimodal problems SS requires a larger population size (D × 100), a higher number of iterations (100), a larger window size (1% of the range) and larger limits to provide better exploration. On the other hand, unimodal problems can be solved with fewer iterations and a smaller window size, reaching regions very close to the global optimum. 
SS is not recommended for problems with a very small number of variables, due to the large number of evaluations required by the first sampling of the search-space, unless the problem is highly multimodal and noisy. Another important future study is the evaluation of SS with other population-based metaheuristics. Significant improvements over the original metaheuristic are expected if it tends to fail to escape from local optima. Preliminary tests performed with two other population-based metaheuristics have provided similar results, and we are planning to publish them soon.
Acknowledgments
The authors would like to acknowledge CAPES (a Brazilian Research Agency) for the financial support given to this research.
References
[1] D.W. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, in: Machine Learning, 1991, pp. 37–66.
[2] T. Back, D.B. Fogel, Z. Michalewicz, Handbook of Evolutionary Computation, Institute of Physics, Ringbound, 1997.
[3] G.E.P. Box, W.G. Hunter, J.S. Hunter, Statistics for Experimenters, John Wiley, New York, 1978.
[4] R. Chelouah, P. Siarry, Genetic and nelder-mead algorithms hybridized for a more accurate global optimization of continuous multi-minima functions, European Journal of Operational Research 148 (2) (2003) 335–348.
[5] W.W. Cohen, Fast effective rule induction, in: Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann, 1995, pp. 115–123.
[6] T.R. Dastidar, P.P. Chakrabarti, P. Ray, A synthesis system for analog circuits based on evolutionary search and topological reuse, IEEE Transactions on Evolutionary Computation 9 (2) (2005) 211–224.
[7] K. Deb, D. Goldberg, An investigation of niche and species formation in genetic function optimization, in: J. Schaffer (Ed.), Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, USA, 1989, pp. 42–50.
[8] P. Gabriel, A. Delbem, Representations for evolutionary algorithms applied to protein structure prediction problem using hp model, in: K. Guimarães, A. Panchenko, T. Przytycka (Eds.), Advances in Bioinformatics and Computational Biology, Lecture Notes in Computer Science, vol. 5676, Springer, Berlin/Heidelberg, 2009, pp. 97–108.
[9] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
[10] S. Hitzel, L. Nardin, K. Sorensen, H. Rieger, Aerodynamic optimization of an ucav configuration, in: N. Kroll, D. Schwamborn, K. Becker, H. Rieger, F. Thiele (Eds.), MEGADESIGN and MegaOpt – German Initiatives for Aerodynamic Simulation and Optimization in Aircraft Design, Notes on Numerical Fluid Mechanics and Multidisciplinary Design, vol. 107, Springer, Berlin/Heidelberg, 2009, pp. 263–285.
[11] M. Jelasity, P. Ortigosa, I. Garcia, Uego, an abstract clustering technique for multimodal global optimization, Journal of Heuristics 7 (3) (2001) 215–233.
[12] L. Jourdan, C. Dhaenens, E.-G. Talbi, Using datamining techniques to help metaheuristics: a short survey, in: Hybrid Metaheuristics, 2006, pp. 57–69.
[13] A.Y.S. Lam, V.O.K. Li, Chemical-reaction-inspired metaheuristic for optimization, IEEE Transactions on Evolutionary Computation 14 (3) (2010) 381–399.
[14] K.-H. Liang, X. Yao, C. Newton, Evolutionary search of approximated n-dimensional landscapes, International Journal of Knowledge-Based Intelligent Engineering Systems 4 (3) (2000) 172–183.
[15] H. Maaranen, K. Miettinen, M. Mäkelä, Quasi-random initial population for genetic algorithms, Computers and Mathematics with Applications 47 (12) (2004) 1885–1895.
[16] H. Maaranen, K. Miettinen, A. Penttinen, On initial populations of a genetic algorithm for continuous optimization problems, Journal of Global Optimization 37 (2007) 405–436.
[17] V.V. Melo, A.C.B. Delbem, D.L. Pinto Junior, F.M. Federson, Discovering promising regions to help global numerical optimization algorithms, in: 6th Mexican International Conference on Artificial Intelligence (MICAI'07), 2007, pp. 72–82.
[18] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[19] K. Mullen, D. Ardia, D. Gil, D. Windover, J. Cline, DEoptim: an R package for global optimization by differential evolution, Journal of Statistical Software 40 (6) (2011) 1–26.
[20] R.H. Myers, D.C. Montgomery, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, second ed., Wiley, New York, 2002.
[21] Z. Naji-Azimi, P. Toth, L. Galli, An electromagnetism metaheuristic for the unicost set covering problem, European Journal of Operational Research 205 (2) (2010) 290–300.
[22] F. Neri, G. Iacca, E. Mininno, Disturbed exploitation compact differential evolution for limited memory optimization problems, Information Sciences 181 (12) (2011) 2469–2487.
[23] A.C.M. Oliveira, L.A.N. Lorena, Hybrid evolutionary algorithms and clustering search, in: Crina Grosan, Ajith Abraham, Hisao Ishibuchi (Eds.), Hybrid Evolutionary Systems – Studies in Computational Intelligence – Springer SCI Series, vol. 75, 2007, pp. 81–102.
[24] A.C.M.d. Oliveira, Algoritmos evolutivos híbridos com detecção de regiões promissoras em espaços de busca contínuos e discretos, PhD thesis, Instituto Nacional de Pesquisas Espaciais, São José dos Campos, Julho 2004.
[25] P.M. Ortigosa, I. García, M. Jelasity, Reliability and performance of uego, a clustering-based global optimizer, Journal of Global Optimization 19 (2001) 265–289.
[26] Q.-K. Pan, L. Wang, L. Gao, W.D. Li, An effective hybrid discrete differential evolution algorithm for the flow shop scheduling with intermediate buffers, Information Sciences 181 (2011) 668–685.
[27] M. Pant, R. Thangaraj, A. Abraham, Low discrepancy initialized particle swarm optimization for solving constrained optimization problems, Fundamenta Informaticae 95 (2009) 511–531.
[28] M. Pelikan, D.E. Goldberg, E. Cantu-Paz, Linkage problem, distribution estimation, and bayesian networks, Evolutionary Computation 8 (3) (2000) 311–340.
[29] L. Peng, Y. Wang, Differential evolution using uniform-quasi-opposition for initializing the population, Information Technology Journal 9 (8) (2010) 1629–1634.
[30] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008. ISBN: 3-900051-07-0.
[31] S. Rahnamayan, H.R. Tizhoosh, M.M.A. Salama, Quasi-oppositional differential evolution, in: IEEE Congress on Evolutionary Computation, IEEE, 2007, pp. 2229–2236.
[32] S. Rahnamayan, H.R. Tizhoosh, M.M.A. Salama, Opposition-based differential evolution, IEEE Transactions on Evolutionary Computation 12 (1) (2008) 64–79.
[33] S. Rahnamayan, H.R. Tizhoosh, M.M.A. Salama, Opposition versus randomness in soft computing techniques, Applied Soft Computing 8 (2) (2008) 906–918.
[34] C.L. Ramsey, J.J. Grefenstette, Case-based initialization of genetic algorithms, in: Proceedings of the 5th International Conference on Genetic Algorithms, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993, pp. 84–91.
[35] G.P. Rangaiah, Stochastic Global Optimization, Techniques and Applications in Chemical Engineering, World Scientific Publishing Company, 2010.
[36] P. Siarry, Z. Michalewicz (Eds.), Advances in Metaheuristics for Hard Optimization, Natural Computing Series, Springer, 2008.
[37] R. Storn, K. Price, Differential Evolution – A Simple and Efficient Adaptive Scheme for Global Optimization over Continuous Spaces, Tech. rep., 1995.
[38] R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 11 (4) (1997) 341–359.
[39] P.N. Suganthan, N. Hansen, J.J. Liang, K. Deb, Y.-P. Chen, A. Auger, S. Tiwari, Problem definitions and evaluation criteria for the cec 2005 special session on real-parameter optimization, Tech. Rep. KanGAL Report 2005005, Nanyang Technological University, Singapore, 2005.
[40] D.K. Tasoulis, V.P. Plagianakos, M.N. Vrahatis, Clustering in evolutionary algorithms to efficiently compute simultaneously local and global minima, in: Congress on Evolutionary Computation, 2005, pp. 1847–1854.
[41] M.N. Vrahatis, B. Boutsinas, P. Alevizos, G. Pavlides, The new k-windows algorithm for improving the k-means clustering algorithm, Journal of Complexity 18 (1) (2002) 375–391.
[42] Y. Wang, B. Li, T. Weise, Estimation of distribution and differential evolution cooperation for large scale economic load dispatch optimization of power systems, Information Sciences 180 (2010) 2405–2420.
[43] M. Weber, F. Neri, V. Tirronen, A study on scale factor in distributed differential evolution, Information Sciences 181 (12) (2011) 2488–2511.
[44] Weka Machine Learning Project, Weka.
. [45] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed., Morgan Kaufman, San Francisco, 2005. [46] E.E. Zachariadis, C.T. Kiranoudis, A local search metaheuristic algorithm for the vehicle routing problem with simultaneous pick-ups and deliveries, Expert Systems with Applications (2010).