Landscape-based adaptive operator selection mechanism for differential evolution




Information Sciences 418–419 (2017) 383–404



Karam M. Sallam, Saber M. Elsayed, Ruhul A. Sarker, Daryl L. Essam
School of Engineering and Information Technology, University of New South Wales, Canberra, Australia

Article info

Article history: Received 1 November 2016; Revised 14 July 2017; Accepted 6 August 2017; Available online 9 August 2017

Keywords: Differential evolution; Fitness landscape; Adaptive operator selection

Abstract

Over the last two decades, many different differential evolution algorithms for solving optimization problems have been introduced. Although most of these algorithms used a single mutation strategy, several with multiple mutation strategies have recently been proposed. Multiple-operator-based algorithms have been proven to be more effective and efficient than single-operator-based algorithms for solving a wide range of benchmark and practical problems. In these algorithms, adaptive operator selection mechanisms are generally applied to place greater emphasis on the best-performing evolutionary operators, based on their performance histories, for generating new offspring. In this paper, we investigate using problem landscape information in an adaptive operator selection mechanism. For this purpose, a new algorithm is proposed that considers both problem landscape information and the performance histories of the operators to dynamically select the most suitable differential evolution operator during the evolutionary process. The contributions of each component of the selection mechanism are analyzed, and the performance of the proposed algorithm is evaluated by solving 45 unconstrained optimization problems. The results demonstrate the effectiveness and superiority of the proposed algorithm over state-of-the-art algorithms. © 2017 Elsevier Inc. All rights reserved.

1. Introduction

Optimization plays an important role in many practical decision-making processes. Numerous optimization problems arise in the engineering domain, industry, and both the public and private sectors. In general, a global unconstrained single-objective optimization problem, which is considered in this paper, can be described as determining the decision vector $\vec{x} = (x_1, x_2, \ldots, x_D) \in \mathbb{R}^D$ that satisfies the variable bounds $\vec{x}_{min} \leq \vec{x} \leq \vec{x}_{max}$ and optimizes the objective function $f(\vec{x})$, where D is the problem's dimensionality and $\vec{x}_{min}$ and $\vec{x}_{max}$ are its lower and upper bounds, respectively. Such problems may contain different types of variables, such as integer, real, discrete, or a combination of them [10], while the objective function can be linear or non-linear, convex or non-convex, continuous or discontinuous, and uni-modal or multi-modal. Researchers and practitioners have used both traditional and computational intelligence (CI) approaches, such as evolutionary algorithms (EAs) and swarm intelligence (SI) techniques [25], to solve these types of complex optimization problems. However, the former encounter many difficulties [19], such as: 1) their convergence to a near-optimal or optimal



Corresponding author: K.M. Sallam ([email protected], [email protected]). Co-authors: S.M. Elsayed ([email protected]), R.A. Sarker ([email protected]), D.L. Essam ([email protected]). DOI: http://dx.doi.org/10.1016/j.ins.2017.08.028


solution relies on the initial solution; 2) they require specific mathematical properties, like continuity, convexity and differentiability, to be satisfied; and 3) they may need to simplify a problem by making different assumptions for the convenience of mathematical modeling [41]. Therefore, researchers have begun to use CI approaches because of their many advantages: for example, EAs are resilient to dynamic changes, have the capability of self-organization, do not require particular mathematical characteristics to hold, can evaluate several solutions in parallel, and are widely used in practice [10]; they have often been proven to work better than traditional methods [39]. However, there is no guarantee that EA-based methods can obtain exact solutions, and the quality of their solutions relies on the settings of their parameters. In fact, as no single EA has been found to be the best for solving all types of optimization problems, using more than one algorithm in a single framework may help to overcome this limitation [10]. Over the years, researchers and practitioners have used frameworks that contain more than one search operator or algorithm in a single algorithmic framework, such as ensemble-based (a mix of operators) [29], hyper-heuristic (a heuristic to select among heuristics) [15], multi-method (more than one optimization algorithm) [11,52], multi-operator (more than a single search operator within a single optimization algorithm) [10,18], and heterogeneous (different algorithms with different behaviors) [33] approaches. These approaches use a pool of different algorithms/operators and a selection procedure to determine the best-performing algorithm and/or operator during the evolutionary process. The selection procedures of these algorithms, usually called adaptive operator selection (AOS) mechanisms, rely on different criteria, such as reinforcement learning [20], improvement in solution quality and/or constraint violations and/or feasibility rate [10], convergence differences and progress ratios [14], and pheromone updates of the ant colony optimization meta-heuristic (ACO-MH) [33]. Landscape information is usually helpful for judging function complexity and, if carefully incorporated in the selection of operators, may boost the performance of an algorithm. Selection methods based on landscape analysis are very rare [2,7], and they have the following limitations: (1) the landscape analysis is performed in an off-line mode, i.e., initial experiments are conducted to calculate the values of the landscape statistics independently of the evolutionary process for solving the problem; (2) calculations of landscape measures are computationally expensive; and (3) as a training and testing mechanism is used, the algorithm may be limited to the test problems considered, and its performance may deteriorate when solving another set of problems. Among EAs, differential evolution (DE) has gained popularity for solving continuous optimization problems [5,8,37,40]. In this paper, in order to select the best-performing operator during the evolutionary process without any prior knowledge of the problem's characteristics, a new DE algorithm is proposed that utilizes the strengths of multiple DE mutation operators and has an adaptive operator selection process (DE with landscape-based adaptive operator selection, LSAOS-DE) based on both (1) the problem landscape and (2) the performance of the operators.
In the proposed algorithm, at each generation, m mutation strategies are considered, with each new solution able to be generated by any one of the m operators, and the landscape measure value and performance history of each operator are calculated and recorded for a certain number (CS) of generations. Then, based on the normalized average value of both the landscape and performance history measures of each operator, the best-performing operator is used to evolve the entire population for CS generations. However, as the performance of an operator may change during the evolutionary process, all the operators are then re-used to evolve all individuals in the entire population, and the landscape and performance history measures are re-evaluated after CS generations. This process is continued up to a pre-defined number of fitness function evaluations, after which the best-performing DE mutation strategy is selected to evolve the current population until an overall stopping condition is met. It is worth mentioning that this proposed algorithm has the same structure as multi-method/multi-operator, heterogeneous and ensemble-based algorithms, with its pool consisting of several different DE mutation strategies. However, different from those in the literature, the selection mechanism relies on both landscape information and the performance history of each DE mutation strategy. To judge the performance of LSAOS-DE, it is tested on 45 problems from benchmark sets with different mathematical properties (10, 30 and 50 dimensions), introduced in the CEC2014 and CEC2015 competitions [21,22], with its overall computational results better than those obtained from other state-of-the-art algorithms for most of the test problems.

The rest of this paper is organized as follows: in Section 2, a review of DE algorithms and operators, as well as some landscape measures, is provided; in Section 3, the proposed algorithm is described; in Section 4, the simulation results for benchmark problems and the effect of the parameters are discussed; and, finally, in Section 5, the conclusions drawn from this study and possible future research directions are presented.

2. Related work

In this section, a literature review of DE and the concept of landscape analysis are discussed.

2.1. Differential evolution algorithm

DE is an EA variant which was proposed by Storn and Price [44]. It is popular because it usually converges fast, is simple to implement, and the same parameter values can be applied to various optimization problems. In the literature, it has been shown to perform better than several other EAs for a wide range of problems [8,10]. It starts with an initial population and, to


construct a new solution from current or existing ones, employs three operators (mutation, crossover and selection) during the search process. A DE population consists of NP D-dimensional real-valued vectors

$$\vec{X}_{i,G} = (x_{i,1,G}, x_{i,2,G}, \ldots, x_{i,D,G}), \quad \forall i = 1, 2, \ldots, NP$$

where NP is the number of solutions in the population and G the generation number. Generally speaking, DE starts with a random population, such that

$$x_{i,j,0} = x_{j,min} + rand_{i,j}[0,1] \times (x_{j,max} - x_{j,min}), \quad i = 1, 2, \ldots, NP, \;\; j = 1, 2, \ldots, D \tag{1}$$

where D is the number of decision variables and $rand_{i,j}[0,1]$ a uniformly distributed number generated between 0 and 1.
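For concreteness, the following is a minimal NumPy sketch of the initialization in Eq. (1); the population is stored as an NP × D array, and the function and parameter names are illustrative rather than taken from the paper.

```python
import numpy as np

def random_init(NP, x_min, x_max, seed=None):
    """Eq. (1): uniform random initial population within the box bounds."""
    rng = np.random.default_rng(seed)
    x_min, x_max = np.asarray(x_min, float), np.asarray(x_max, float)
    return x_min + rng.random((NP, x_min.size)) * (x_max - x_min)

# e.g. 50 individuals in the [-100, 100]^10 search space used later in the paper
pop = random_init(50, [-100] * 10, [100] * 10)
```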

2.1.1. Mutation

After the initialization step, DE employs a mutation process to produce a mutant vector $\vec{V}_{i,G} = (v_{i,1,G}, v_{i,2,G}, \ldots, v_{i,D,G})$. In a simple mutation (DE/rand/1), three candidates are randomly selected and a mutant vector $\vec{V}_{i,G}$ is produced by multiplying a scaling factor (F) by the difference value (DV) between two individuals, with the result summed to a third one as:

$$\vec{V}_{i,G} = \vec{X}_{r_1^i,G} + F \times (\vec{X}_{r_2^i,G} - \vec{X}_{r_3^i,G}) \tag{2}$$

where $r_1^i \neq r_2^i \neq r_3^i$ are random integers selected from the range [1, NP], all different from the index i, F > 0 a control parameter of the mutation operator used to scale DV, and G the current generation.
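As a sketch, Eq. (2) can be implemented as follows (a minimal NumPy version, assuming the NP × D population array from the previous snippet; names are illustrative):

```python
import numpy as np

def de_rand_1(pop, F=0.5, seed=None):
    """DE/rand/1 mutation (Eq. 2): for each target i, pick three mutually
    distinct individuals r1, r2, r3 (all different from i) and combine them."""
    rng = np.random.default_rng(seed)
    NP = len(pop)
    mutants = np.empty_like(pop)
    for i in range(NP):
        candidates = np.delete(np.arange(NP), i)          # indices != i
        r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
        mutants[i] = pop[r1] + F * (pop[r2] - pop[r3])
    return mutants
```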

2.1.2. Crossover

Generally, after mutation, a crossover operator is employed on the parent vector to generate an offspring vector, with two simple ones usually used in DE: exponential and binomial. In the former, an integer c is first randomly selected from the decision space, c ∈ [1, D], where D is the number of decision variables; it acts as a starting point in the target vector for the crossover. Another integer, C, selected from [c, D], denotes the number of elements with which the donor vector actually participates in the offspring vector. Once c and C are selected, an offspring is generated as:

$$u_{i,j,G} = \begin{cases} v_{i,j,G} & \text{for } j = \langle c \rangle_D, \langle c+1 \rangle_D, \ldots, \langle c+C-1 \rangle_D \\ x_{i,j,G} & \text{otherwise} \end{cases} \tag{3}$$

where $j = 1, 2, \ldots, D$ and the angular brackets $\langle \cdot \rangle_D$ denote a modulo function of modulus D with a starting index of c. The binomial crossover, which is applied to every jth variable if a random number is less than or equal to the crossover rate (CR), is calculated by:

$$u_{i,j,G} = \begin{cases} v_{i,j,G} & \text{if } rand_{i,j}[0,1] \leq CR \text{ or } j = j_{rand} \\ x_{i,j,G} & \text{otherwise} \end{cases} \tag{4}$$

where $rand_{i,j} \in [0,1]$ and $j_{rand} \in \{1, 2, \ldots, D\}$ are randomly chosen, the latter ensuring that the offspring inherits at least one component from the mutant vector.

2.1.3. Selection operation

In order to decide whether the parent vector $\vec{X}_{i,G}$ or child vector $\vec{U}_{i,G}$ survives to the next generation, a selection operation based on survival of the fittest is employed between these vectors. For minimization problems, this is performed using

$$\vec{X}_{i,G+1} = \begin{cases} \vec{U}_{i,G} & \text{if } f(\vec{U}_{i,G}) \leq f(\vec{X}_{i,G}) \\ \vec{X}_{i,G} & \text{otherwise} \end{cases} \tag{5}$$

where $f(\vec{U}_{i,G})$ and $f(\vec{X}_{i,G})$ are the objective function values of the child and parent, respectively. The aforementioned steps are repeated, generation after generation, until some specific termination condition is met.
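A minimal sketch of Eqs. (4) and (5), again assuming NP × D NumPy arrays (illustrative names, fixed CR for simplicity):

```python
import numpy as np

def binomial_crossover(parents, mutants, CR=0.9, seed=None):
    """Binomial crossover (Eq. 4): each component comes from the mutant with
    probability CR; index jrand forces at least one mutant component."""
    rng = np.random.default_rng(seed)
    NP, D = parents.shape
    mask = rng.random((NP, D)) <= CR
    mask[np.arange(NP), rng.integers(D, size=NP)] = True   # jrand
    return np.where(mask, mutants, parents)

def greedy_selection(parents, trials, f):
    """One-to-one survivor selection (Eq. 5) for minimization."""
    f_par = np.array([f(x) for x in parents])
    f_tri = np.array([f(u) for u in trials])
    return np.where((f_tri <= f_par)[:, None], trials, parents)

# One DE generation on the sphere function, reusing de_rand_1 from above:
# pop = greedy_selection(pop, binomial_crossover(pop, de_rand_1(pop)),
#                        f=lambda x: float(np.sum(x**2)))
```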

2.2. Improved DE algorithms

In this section, commonly used improved variants of DE are discussed.


2.2.1. Single-operator DE variants

A fuzzy adaptive DE algorithm (FADE), whose inputs incorporated the relative function values and individuals of successive generations to adapt the parameters for mutation and crossover, was introduced by Liu and Lampinen [23]. The results showed that it performed better than the standard DE on most test functions. Zaharie [56] proposed an adaptive DE algorithm (ADE), in which population diversity is used to adapt F and CR. Omran et al. [35] introduced a self-adaptive DE algorithm (SDE) in which F and CR were randomly chosen based on a Gaussian distribution and a ring neighborhood topology was used. The experimental results showed that SDE was superior to other DE algorithms. Zhang et al. [58] proposed an ADE algorithm with an optional external memory (JADE), in which the $CR_i$ of each individual $X_i$ at each generation was independently generated according to a normal distribution with mean $\mu_{CR}$ and a standard deviation of 0.1; when the value of $CR_i$ fell outside the range [0, 1], it was repaired to a value within it. Also, the $F_i$ of each individual $X_i$ was independently generated according to a Cauchy distribution with parameter $\mu_F$ and a standard deviation of 0.1. If its value was greater than 1, it was truncated to 1, or re-generated if $F_i < 0$. Tanabe and Fukunaga [47] proposed a success-history-based parameter adaptation for DE (SHADE) in which, instead of using a single pair ($\mu_{CR}$, $\mu_F$) to guide parameter adaptation as in JADE, the mean values of $S_{CR}$ and $S_F$ for each generation were stored in memory as $\mu_{CR}$ and $\mu_F$, respectively. Later, Tanabe and Fukunaga [48] improved the SHADE algorithm by using a linear population size reduction (LPSR) to dynamically re-size the population NP during a run as the number of fitness evaluations increases; this variant is called L-SHADE. It showed good performance in comparison with other algorithms for solving a set of unconstrained optimization problems. A two-step sub-population strategy, in which individuals in the current population are sorted based on evaluation metrics and divided into superior and inferior sub-populations, was proposed in [60]. In it, the inferior sub-population evolves to generate offspring. If the generated offspring have better evaluation metric values than individuals in the superior sub-population, they replace the latter and are used as vectors for the mutation strategies. The proposed strategy was tested by solving a number of both single-objective optimization problems (SOPs) and multi-objective optimization problems (MOPs). The results indicate that the proposed sub-population strategy is capable of improving the performance of both single-objective and multi-objective algorithms. The application of the proposed approach was demonstrated by solving a microwave circuit design problem with stringent requirements.

2.2.2. Multi-operator DE variants

Motivated by the fact that no single DE operator is the best for all types of problems, a brief review of the considerable number of research studies in the literature aimed at overcoming this issue is provided in the following. Elsayed et al. [10] proposed a self-adaptive multi-operator DE (SAMO-DE) algorithm for solving COPs. In its internal process, the initial population is divided into a number of sub-populations, each of which is evolved by a different DE operator. The success rate (SR) of each DE operator is calculated based on the feasibility ratio, constraint violation and solution quality.
Then the number of solutions in each sub-population is adaptively updated, and more weight is given to the variant with the highest SR. The results obtained from two sets of test problems showed that SAMO-DE performs better than other state-of-the-art algorithms. A new version of a DE algorithm, jDE_{NP,MM}, for solving optimization problems, where MM stands for multiple mutation strategies because two are incorporated, was proposed by Zamuda and Brest [57], with a methodology for reducing the population size introduced to enhance its performance. Each new solution is generated by one of the two mutation strategies using the following selection mechanism: the first mutation strategy is used 1) when NP ≥ 100 or 2) when rand < 0.75, where rand is a uniform random number ∈ [0, 1]; and the second when 1) NP < 100 or 2) when rand ≥ 0.75. jDE_{NP,MM} was tested on 22 real-world applications and demonstrated better performance than two other algorithms. Wang et al. [53] developed a composite DE (CoDE), in which three mutation strategies are randomly combined with three fixed control parameter settings to generate a new trial vector. At each generation, three vectors are generated for each target vector, with the one with the best objective fitness value (FV) surviving to the next generation. It was inferred from their experimental results that CoDE obtained better results in comparison with state-of-the-art algorithms. A self-adaptive DE (SaDE) for solving unconstrained real-parameter optimization problems was proposed by Qin et al. [38]. In it, both the strategies for generating the trial vector and their associated control parameter values are gradually self-adapted according to a success rate calculated based on previous learning experiences. Success and failure memories store the numbers of successful individuals that enter the next generation and those discarded from the next generations, respectively. Firstly, all the mutation strategies have equal probabilities of generating a new solution, which are updated after the initial learning period (LP generations) as follows: at the end of each generation, after evaluating all the trial vectors obtained, the numbers of those generated by each strategy that successfully entered and failed to enter the next generation are recorded in the success and failure memories, respectively. The updated probability for each mutation strategy is the number of successful individuals divided by the number of all the individuals generated by that strategy in the previous LP. This algorithm performs much better than both the traditional DE algorithm and several state-of-the-art DE variants with adaptive parameters. A DE algorithm with an adaptive penalty technique for solving a set of constrained problems was proposed by Da Silva et al. [42]. In it, a mechanism for automatically choosing one of four DE mutation strategies is introduced, described as follows: 1) for the first 10% of the total number of generations allowed, four individuals are generated and the best chosen; 2) for the next 30%, new individuals are generated by only the variant which produced the best individual in the current


population; 3) all the variants are used again to generate new individuals for another 10%; 4) the best strategy is used to generate new solutions for the next 30%; and 5) a third evaluation stage is performed using all the operators, with the best-performing variant chosen for the remaining generations. Based on the results obtained from solving several structural and mechanical engineering optimization problems, it was claimed that the algorithm performs better than the GA techniques used in the continuous domain. An ensemble of mutation strategies and control parameters in DE (EPSDE) for solving unconstrained optimization problems was proposed by Mallipeddi et al. [29]. In it, each individual in the initial population is randomly assigned a mutation strategy, with its associated parameter values taken from pools of distinct mutation strategies and control parameter values, based on which new solutions are generated. The process for selecting the next generation of parent individuals is as follows: if the generated offspring 1) is better than its parent in the previous generation, its mutation strategy and parameter values are stored; or 2) is worse than its parent, the parent vector is reinitialized with a new mutation strategy and associated parameter values. The performance of EPSDE compared favorably with those of state-of-the-art DE methods in the literature. Elsayed et al. [11] proposed a united multi-operator EAs (UMOEAs) approach for solving constrained and unconstrained optimization problems. The algorithm starts with an initial population that is then divided into sub-populations, each of which is evolved using a different multi-operator EA, with the best-performing one determined based on its success rate. At the end of each cycle, an information exchange procedure is conducted to update some individuals in the worst-performing population based on the best-performing one. Later, Elsayed et al. [9] proposed an improved version of UMOEAs, called UMOEAsII, in which each multi-operator EA runs with multiple search operators. In it, an adaptive operator selection (AOS) mechanism, designed based on the quality of solutions produced and the diversity of the population, places emphasis on the better-performing multi-operator algorithm and its search operators. UMOEAsII was tested on the CEC2016 competition's single-objective real-parameter optimization problems, with the results demonstrating its capability to obtain better results than those of state-of-the-art algorithms, and it won the competition. Vrugt and Robinson [51] introduced the multi-algorithm genetically adaptive multi-objective (AMALGAM) algorithm, which has proven to be a powerful approach for solving multi-objective problems. Later, Vrugt et al. [52] extended their work [51] by proposing an algorithm known as AMALGAM for single-objective optimization (AMALGAM-SO), which uses the strengths of CMA-ES, a GA and a particle swarm optimizer (PSO) in a single algorithm for evolving a population. A self-adaptive learning strategy is used to automatically update the number of offspring each algorithm produces in each generation. When tested on a set of benchmark problems, this algorithm obtained similar efficiencies to existing ones for relatively simple problems but was increasingly superior for more complex and higher-dimensional multimodal optimization ones. Recently, the concept of heuristic space diversity (HSD) was described by Grobler et al. [15], who introduced six methods for controlling HSD during the optimization process. In their experiments, a heterogeneous meta-hyper-heuristic (HMHH) method with four common meta-heuristic algorithms as its set of constituent algorithms (a GA, guaranteed convergence PSO (GCPSO), self-adaptive DE with neighborhood (SaNSDE) and CMA-ES) was used as a basis for investigating the management of HSD. Their proposed algorithm showed good performance when compared with a popular PAP one. A multi-population-based framework, in which three mutation strategies (i.e., 'current-to-pbest/1', 'current-to-rand/1' and 'rand/1') were adopted into a novel DE variant called MPEDE, was proposed by Wu et al. [54]. Its initial population was dynamically divided into several sub-populations, including three equally sized small indicator sub-populations and one larger reward sub-population. Each mutation strategy was used to evolve one indicator sub-population, with the reward sub-population assigned to the currently best-performing mutation strategy as an extra reward. The ratios between the fitness improvements and the function evaluations consumed were used to select the best-performing mutation strategy. When tested on well-known unconstrained problems, MPEDE showed a competitive performance compared with those of other state-of-the-art algorithms. For more information regarding multi-operator algorithms, readers are referred to [8,34]. None of the above-mentioned methods incorporate any landscape information in the selection phase.

2.3. Landscape analysis

Generally, a fitness landscape consists of: (1) a set of solutions (populations of individuals); (2) a fitness value, which is an objective function value, assigned to each individual in the population; and (3) a neighborhood operator, which may be a distance measure [30]. Measuring the fitness landscape of a problem aids researchers in classifying the problem as either easy or difficult to solve [32]. Many landscape measures for understanding and analyzing different characteristics of optimization problems have been proposed [4,32,36], with some briefly reviewed below. Auto-correlation and correlation length [6,36] are often used to measure the ruggedness of a fitness landscape. The fitness distance correlation (FDC), which measures the correlation between the objective value and the distance to the nearest optimum in the search domain, is another method used to measure a problem's difficulty [49]. Lunacek et al. [24] introduced the dispersion metric (DM) to predict the existence of funnels in a fitness landscape, which are global basin shapes consisting of grouped local optima [46], by measuring the mean distance between high-quality solutions. The length scale is a landscape measure that can be used to reflect the ruggedness, smoothness and neutrality of a problem [45]. To determine the modality of a function, information characteristics analysis (ICA), which was proposed by Vassilev et al. [50], can be used to determine the characteristics of the fitness


landscapes of discrete problems. It was later adapted by Steer et al. [43] and Malan and Engelbrecht [27] to characterize continuous problems. Another measure is the searchability of a problem [26], which is the capability of a search operator to move to a region of the search space with a better fitness value. It is simple and does not require knowledge of the global optimum [26]. It is computed based on the difference between the information landscape vector of the problem to be solved and a reference one, which is the landscape of a function that is easy to optimize by any optimization algorithm in any dimension [26]. An information matrix $M = [a_{i,k}]$ for a minimization problem is constructed using:



$$a_{i,k} = \begin{cases} 1 & \text{if } f(x_i) < f(x_k) \\ 0.5 & \text{if } f(x_i) = f(x_k) \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

where $i = 1, 2, \ldots, NP$ and $k = 1, 2, \ldots, NP$. Not all the entries in an information landscape are necessary to define it [3]: there are duplicates due to symmetry (so the lower triangle can be omitted), the entries on the diagonal are always 0.5 (and can be omitted), and the row and column of the optimum solution can also be omitted. Therefore, the information matrix can be reduced to the vector $LS = (ls_1, ls_2, \ldots, ls_{|LS|})$, where the number of elements in LS is $|LS| = \frac{(NP-1)(NP-2)}{2}$.

Here, an example is given to show how LS is constructed. Consider the following minimization problem:

$$f(x, y) = 100(x^2 - y)^2 + (x - 1)^2$$

and the sample points $(x, y) = \{(0, 0), (1, 1), (1, 0), (0, 1)\}$; then $f(x_1, y_1) = 1$, $f(x_2, y_2) = 0$, $f(x_3, y_3) = 100$ and $f(x_4, y_4) = 101$. The pairwise matrix M is:

         f1     f2     f3     f4
  f1     0.5    0      1      1
  f2     1      0.5    1      1
  f3     0      0      0.5    1
  f4     0      0      0      0.5

To construct the vector LS, the lower triangle, the main diagonal, and the second row and column (those of the optimum, f2) are omitted, so LS = (a13, a14, a34) = (1, 1, 1).
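The construction above and the difference measure of Eq. (7) below are short enough to sketch directly; a minimal NumPy version follows (helper names are illustrative, and both samples are assumed to share the same best-point index):

```python
import numpy as np

def info_matrix(f_vals):
    """Pairwise comparison matrix of Eq. (6): a[i,k] = 1 if f_i < f_k,
    0.5 if f_i = f_k, and 0 otherwise."""
    f = np.asarray(f_vals, dtype=float)
    return (f[:, None] < f[None, :]) + 0.5 * (f[:, None] == f[None, :])

def landscape_vector(f_vals):
    """Reduce M to LS: keep the strict upper triangle and drop the row and
    column of the best (minimum) sample point."""
    f = np.asarray(f_vals, dtype=float)
    M = info_matrix(f)
    keep = [i for i in range(len(f)) if i != int(np.argmin(f))]
    return np.array([M[i, k] for a, i in enumerate(keep) for k in keep[a + 1:]])

def ld(f_vals, f_ref_vals):
    """Eq. (7): mean absolute difference between the two LS vectors."""
    return float(np.abs(landscape_vector(f_vals) - landscape_vector(f_ref_vals)).mean())

print(landscape_vector([1, 0, 100, 101]))   # worked example above: [1. 1. 1.]
```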

Given two landscape vectors $LS_f = (ls_{f,1}, ls_{f,2}, \ldots, ls_{f,|LS_f|})$ and $LS_{ref} = (ls_{ref,1}, ls_{ref,2}, \ldots, ls_{ref,|LS_{ref}|})$, the difference between them is computed using Eq. (7):

$$LD = \frac{1}{|LS_f|} \sum_{z=1}^{|LS_f|} |ls_{f,z} - ls_{ref,z}| \tag{7}$$

where $z = 1, 2, \ldots, |LS_f|$. When LD is close to 0 or 1, the problem is considered easy or difficult, respectively. Recently, researchers and practitioners have used the fitness landscape to determine and select an appropriate algorithm or operator for solving optimization problems. Bischl et al. [2] suggested using a cost-sensitive learning model to select the best algorithm from a set of four selected manually from the algorithms used to solve black-box optimization benchmarking (BBOB) problems in 2009 and 2010 [16,17], with the functions in 10D characterized using 19 measurements extracted by the exploratory landscape analysis (ELA) technique. In their work, the low-level features [30] were obtained by systematic sampling of the functions in the search space. Then, the separability, modality and global structure of the optimization problem were measured off-line as the first step in characterizing the landscape. Next, a machine-learning model was constructed to select the best-performing algorithm from the portfolio. Although this model was validated using two different cross-validation schemes, the results may not generalize to a knowledge base with problems of different dimensionalities, such as 2D, 3D, 5D and 20D. Because the low-level features were extracted in a separate step, the authors did not add the computational cost of calculating the features to the function evaluations. Also, as the four algorithms were selected manually, the model's validation for unobserved problems was weak. It is worth mentioning that the algorithm proposed in this paper differs from that reported in [2] in the following aspects: 1) no training or testing mechanism is used; 2) the best-performing operator is selected automatically from a set of operators; and 3) the value of the landscape measure is computed online, i.e., during the evolutionary process. Garden and Engelbrecht [13] employed a self-organizing feature map to cluster and analyze the landscapes of two commonly used benchmark problem sets. They concluded that, while there are functions that represent a wide range of properties, some other properties are not fully covered in these benchmarks. In [28], a prediction model for predicting when a PSO algorithm will fail to solve a particular optimization problem was developed, with decision trees used to predict the failures of seven different PSO variants utilizing a number of fitness landscape measures. Munoz et al. [31] used a regression model (a multi-layer feed-forward neural network) to determine the best of 8 parameter combinations for the CMA-ES algorithm. A knowledge base of 1800 problem instances drawn from the comparing continuous optimization (COCO) benchmark [16] dimensions was established to train the model, data from the CEC2005 competition were used to validate the model, and seven ELA measures were used to analyze and characterize each problem.

However, all the landscape analyses were performed off-line, and the sample size of $15 \times 10^3 \times D$ used to calculate the ELA measurements was too expensive to be practical. Also, during the validation phase, the accuracy of the model was compared with only random configurations on unseen problems. In [7], an AOS mechanism based on a set of four fitness landscape analysis techniques is used to train an on-line regression learning model (dynamic weighted majority) for predicting the weight of each operator in each generation. It determines the most suitable of four crossover operators for solving a set of capacitated arc-routing problem (CARP) instances. It uses an instantaneous reward, which is taken to be the value computed at the last evaluation. However, compared with some state-of-the-art algorithms, this algorithm was not significantly better. It should be mentioned that our algorithm also differs from that in [7]: it does not incorporate a regression model for assigning the weight of each operator; instead, a landscape measure as well as a performance measure are used to automatically select the best-performing operator during the evolutionary process. In addition, we deal with continuous-domain unconstrained optimization problems.

3. Landscape-based adaptive operator selection DE

In this section, a landscape-based adaptive operator selection algorithm for DE is presented.

3.1. LSAOS-DE

As previously described, existing multi-operator algorithms use an adaptive operator selection mechanism which is usually based on the success of generating new offspring. In this section, the LSAOS-DE algorithm, which uses the problem's landscape information and the operators' performance to adaptively place emphasis on the most suitable DE operator, is proposed. Its general steps are given in Algorithm 1.

Algorithm 1 Proposed algorithm.
1: counter ← 0;
2: Generate an initial population (X) of size NP using Latin hypercube design;
3: FES ← 0;
4: Calculate the fitness values of X;
5: FES ← FES + NP;
6: while FES ≤ MAXFES do
7:    counter ← counter + 1;
8:    if FES ≤ limit then
9:       if counter < CS then
10:         Randomly assign each operator to [NP/m] solutions;
11:         Generate the new population using the assigned operators;
12:         Calculate LD_op considering those individuals updated by operator op;
13:         Calculate the success rate SR_op (Eq. 11) considering those individuals updated by operator op;
14:      end if
15:      if counter == CS then
16:         Measure the performance of every operator using Eq. 14;
17:         Determine the best-performing operator, i.e., the one with the maximum value of NPM;
18:      end if
19:      if counter > CS and counter < 2CS then
20:         Evolve the entire population using the best DE mutation strategy;
21:      end if
22:      if counter == 2CS then
23:         counter ← 0; reset m to use all operators again; SR_op = 0, LD_op = 0;
24:      end if
25:   else
26:      Evolve the population using the best DE operator;
27:   end if
28:   FES ← FES + NP;
29:   Update NP using Eq. (8);
30: end while

In this algorithm, a pool of m mutation strategies with different characteristics (i.e., maintaining diversity and a high convergence rate) is defined, and then an initial population of NP individuals is generated using the Latin hypercube design (LHD) [55]. Each operator is assigned the same number of individuals, and a new solution is generated according to its assigned mutation strategy. At the same time, the information landscape metric and performance history of each operator are calculated using Algorithm 2 and Eq. 11, respectively. This process continues for CS generations, after which the normalized average value of the landscape metric and performance history measures is computed for each operator.


Fig. 1. The general structure for the proposed algorithm.

Algorithm 2 Algorithm for computing ILNS.
1: Input: population of individuals (X) updated by operator op;
2: Determine the location of the best individual in the sample, $\vec{x}^b$;
3: Construct the pairwise comparison matrix M using Eq. 6;
4: Construct the vector $LS_f$ that represents the information matrix of the problem;
5: Construct the reference function $f_{ref}$ using Eq. 10;
6: Construct the vector $LS_{ref}$ that represents the information landscape of the reference function;
7: Compute the value of the information landscape negative searchability index using Eq. 7;


Based on these values, the best-performing operator is selected to evolve the entire population in the subsequent cycle (CS generations). Once this step is finished, and owing to the fact that the performance of an operator may change during the evolutionary process, i.e., it may perform well in early generations but poorly in later ones or vice versa, all m operators are re-used to evolve the entire population, and the landscape measure is again calculated and recorded. Note that, after every CS generations, the success rate is reset to zero, as is the landscape metric. This process is continued until a pre-defined number of fitness evaluations (limit) is reached, and then the best-performing operator found so far is used to evolve the entire population until a stopping criterion is reached. At the same time, during the evolutionary process, NP is adaptively re-sized by using a linear population size reduction [48], i.e., it has a large value at the start of the evolutionary process which is then linearly reduced (by deleting the worst individual from the population) until $NP^{min}$ is reached, as

$$NP_{t+1} = \text{round}\left[ \left( \frac{NP^{min} - NP^{init}}{MAXFES} \right) \times FES + NP^{init} \right] \tag{8}$$

where $NP^{min}$ is the smallest number of individuals that the proposed algorithm can use, FES the current number of fitness evaluations, and MAXFES the maximum number of fitness evaluations. The purpose of adapting NP is to maintain diversity during the early stages of the evolutionary process while placing more emphasis on the intensification process at later stages [48].
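As a minimal sketch of Eq. (8), using the paper's own settings for D = 30 ($NP^{init}$ = 18D = 540, $NP^{min}$ = 7, MAXFES = 10,000 × D = 300,000):

```python
def lpsr_size(fes, max_fes, np_init, np_min):
    """Eq. (8): the population size shrinks linearly with the consumed
    fitness evaluations, from NP_init at the start down to NP_min."""
    return int(round(((np_min - np_init) / max_fes) * fes + np_init))

print(lpsr_size(0, 300_000, 540, 7))        # 540
print(lpsr_size(150_000, 300_000, 540, 7))  # 274 (roughly halfway)
print(lpsr_size(300_000, 300_000, 540, 7))  # 7
```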

3.2. The initialization phase

In this phase, LHD, which is a type of stratified sampling, is used to generate the initial population, as LHD is capable of producing a sample of points that efficiently covers a search region. The initialization is performed by

$$x_{i,j,0} = x_{j,min} + lhd(1, NP) \times (x_{j,max} - x_{j,min}), \quad i = 1, 2, \ldots, NP, \;\; j = 1, 2, \ldots, D \tag{9}$$

where lhd is an LHD function that generates random numbers from independent standard normal distributions [55,59].
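The following is a common uniform Latin hypercube sketch of this step; the paper's lhd routine [55,59] is based on normal distributions, so this uniform version is a simplifying assumption for illustration:

```python
import numpy as np

def lhd_init(NP, x_min, x_max, seed=None):
    """Latin hypercube initialization (Eq. 9): each dimension is cut into NP
    equal strata and each stratum is sampled exactly once, so the points
    cover the box more evenly than plain uniform sampling."""
    rng = np.random.default_rng(seed)
    x_min, x_max = np.asarray(x_min, float), np.asarray(x_max, float)
    D = x_min.size
    perm = np.argsort(rng.random((NP, D)), axis=0)   # one permutation per column
    u = (perm + rng.random((NP, D))) / NP            # stratified samples in [0, 1)
    return x_min + u * (x_max - x_min)
```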

3.3. Selection phase

3.3.1. Information landscape negative searchability (ILNS) measure

This measure is based on the difference between the information landscape vector of the problem to be solved and that of a well-known spherical function, used as a reference landscape due to its simplicity and scalability [26], which is constructed by:

$$f_{ref}(\vec{x}) = \sum_{j=1}^{D} (x_j - x_j^b)^2 \tag{10}$$

where $\vec{x}^b$ is the best individual in the sample. As the uniform distribution used in [26] to generate a sample of individuals may not properly cover the search space of the problem considered, the LHD is, as previously mentioned, used instead. After constructing the vector landscapes of the problem to be optimized and of the reference function ($LS_f$ and $LS_{ref}$, respectively), the ILNS measure is computed using Algorithm 2, with the information landscape difference calculated using Eq. 7.

3.3.2. Normalized performance measure (NPM)

In this section, the overall NPM of each operator is computed from the operator's searchability index (LD) and its success rate ($SR_{op}$), which is defined as the number of successful offspring generated by an operator (op) divided by the number of individuals assigned to that operator:

$$SR_{op} = \frac{\text{number of improved offspring}}{\text{number of all individuals evolved by operator } op} \tag{11}$$

The capability of each operator to explore the landscape ($LD_{op}$) is calculated by Algorithm 2, and the normalized values of the SR and LD metrics are calculated respectively by:

$$NMSR_{op} = \frac{MSR_{op}}{\sum_{op=1}^{m} MSR_{op}} \tag{12}$$

$$NMLD_{op} = \frac{1 - MLD_{op}}{\sum_{op=1}^{m} (1 - MLD_{op})} \tag{13}$$

where $op = 1, 2, \ldots, m$, and $MSR_{op}$ and $MLD_{op}$ are the mean values of the SR and LD of operator op, respectively. Subsequently, the overall normalized performance of each operator is computed by Eq. (14):

$$NPM_{op} = NMSR_{op} + NMLD_{op} \tag{14}$$
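A minimal sketch of Eqs. (12)-(14); an operator with a high mean success rate and a low mean landscape difference (an easier, more searchable landscape under that operator) scores higher on both terms (names are illustrative):

```python
import numpy as np

def npm(msr, mld):
    """Eqs. (12)-(14): fuse each operator's mean success rate (MSR_op) and
    mean landscape difference (MLD_op) into one normalized score NPM_op."""
    msr, mld = np.asarray(msr, float), np.asarray(mld, float)
    nmsr = msr / msr.sum()                  # Eq. (12)
    nmld = (1.0 - mld) / (1.0 - mld).sum()  # Eq. (13)
    return nmsr + nmld                      # Eq. (14)

scores = npm(msr=[0.30, 0.10, 0.20], mld=[0.2, 0.9, 0.5])
best_op = int(np.argmax(scores))            # operator 0 wins in this toy case
```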


3.4. Parameter adaptation

To calculate the values of the crossover rate ($CR_i$) and scaling factor ($F_i$) parameters, the same adaptation procedure as reported in [48], which uses a historical memory of length H for both F and CR, is used, with the memory values denoted as $\mu_F$ and $\mu_{CR}$ and initially set to 0.5. Each individual $\vec{x}_i$ is associated with its own $F_i$ and $CR_i$, the values of which are respectively created by:

$$CR_i = randn_i(\mu_{CR,r_i}, 0.1) \tag{15}$$

$$F_i = randc_i(\mu_{F,r_i}, 0.1) \tag{16}$$

where $r_i$ is randomly selected from [1, H], and $randn_i$ and $randc_i$ are values randomly generated from normal and Cauchy distributions with means of $\mu_{CR,r_i}$ and $\mu_{F,r_i}$, respectively, and a variance of 0.1. If the values of $CR_i$ and $F_i$ are outside [0, 1], a repair mechanism is used as follows: if $CR_i$ is outside the range, it is replaced by the limit value (0 or 1) closest to the generated value; if $F_i > 1$, it is replaced by 1; and if $F_i \leq 0$, Eq. 16 is repeatedly executed until a valid value is generated. At the end of each generation, the $CR_i$ and $F_i$ used by the successful individuals are recorded in $S_{CR}$ and $S_F$, and the contents of the respective historical memories are then updated as:



$$\mu_{CR,h,g+1} = \begin{cases} mean_{wL}(S_{CR}) & \text{if } S_{CR} \neq \emptyset \\ \mu_{CR,h,g} & \text{otherwise} \end{cases} \tag{17}$$

$$\mu_{F,h,g+1} = \begin{cases} mean_{wL}(S_F) & \text{if } S_F \neq \emptyset \\ \mu_{F,h,g} & \text{otherwise} \end{cases} \tag{18}$$

where $1 \leq h \leq H$ is the position in the memory to be updated; it is initialized to 1, incremented whenever a new element is inserted into the history, and reset to 1 if it exceeds H. $mean_{wL}(S_F)$ is the weighted Lehmer mean computed by:

$$mean_{wL}(S_F) = \frac{\sum_{h=1}^{|S_F|} w_h \cdot S_{F,h}^2}{\sum_{h=1}^{|S_F|} w_h \cdot S_{F,h}} \tag{19}$$

where $w_h$ is the weight computed using Eq. (20):

$$w_h = \frac{|f(u_{h,g+1}) - f(x_{h,g})|}{\sum_{h=1}^{|S_F|} |f(u_{h,g+1}) - f(x_{h,g})|} \tag{20}$$
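A minimal sketch of this adaptation procedure (Eqs. 15-20), assuming mu_f and mu_cr are the length-H memory arrays; the names and structure are illustrative, not the paper's code:

```python
import numpy as np

def sample_f_cr(mu_f, mu_cr, rng=None):
    """Eqs. (15)-(16) with the repair rules above: CR from a normal
    distribution (clipped to [0, 1]); F from a Cauchy distribution,
    regenerated while non-positive and truncated to 1."""
    rng = rng or np.random.default_rng()
    r = rng.integers(len(mu_cr))                     # random memory slot
    cr = float(np.clip(rng.normal(mu_cr[r], 0.1), 0.0, 1.0))
    f = 0.0
    while f <= 0.0:
        f = mu_f[r] + 0.1 * rng.standard_cauchy()
    return min(f, 1.0), cr

def weighted_lehmer_mean(s, improvements):
    """Eqs. (19)-(20): Lehmer mean of the successful F (or CR) values,
    weighted by the fitness improvement each success produced."""
    s, d = np.asarray(s, float), np.asarray(improvements, float)
    w = d / d.sum()                                  # Eq. (20)
    return float(np.sum(w * s**2) / np.sum(w * s))   # Eq. (19)
```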

4. Experimental setup and results

To judge the performance of the proposed algorithm, experiments were conducted on 45 optimization functions taken from the CEC2014 (30 problems) and CEC2015 (15 problems) competitions for single-objective optimization [21,22], with dimensions (D) of 10, 30 and 50 and a search space of $[-100, 100]^D$. F01-F03 and F31-F32 are unimodal, F04-F16 and F33-F35 simple multimodal, F17-F22 and F36-F38 hybrid, and F23-F30 and F39-F45 composite functions which combine multiple test problems in a complex landscape. The performance of the proposed LSAOS-DE algorithm is compared with those of three DE-based single-operator algorithms (LSHADE [48], SHADE [47] and JADE [58]), four multi-operator-based ones (CoDE [53], SaDE [38], EPSDE [29] and MPEDE [54]) and two powerful multi-method-based ones (UMOEAs [11] and AMALGAM-SO [52]). Also, LSAOS-DE was compared with seven of its variants (AOSDE-LS, AOSDE-SR, AOSDE-FIR, AOSDE-PR, AOSDE-PBM, AOSDE-RD and AOSDE-PAOC), in each of which only the selection mechanism differs, as described below.

1. In AOSDE-LS, landscape information is used to select the best-performing DE mutation strategy, as in Eq. 13.
2. In AOSDE-SR, a success rate is used to select the best-performing operator, as in Eq. 11.
3. In AOSDE-FIR, the fitness improvement rate, which evaluates the difference in quality between the parent solution and its offspring in a normalized manner [20], is used, computed by:

$$FIR_{op} = \frac{pf - cf}{pf} \tag{21}$$

where pf and cf are the best fitness values of the parent and child solutions, respectively.

4. In AOSDE-PR, the progress ratio [14], which evaluates the capability of an operator to improve its best solution, is computed by:



$$PR = \left| \ln \frac{f_{min}(t-1)}{f_{min}(t)} \right| \tag{22}$$


5. In AOSDE-PBM, an exponential model ($ae^{bx}$) of the absolute function error between the best solution found and the optimal solution is built for every DE mutation strategy. It is then used to calculate the expected absolute error after the subsequent CS generations, with the DE mutation strategy that has the minimum expected absolute error chosen as the best-performing operator [11].
6. In AOSDE-RD, one of the DE mutation strategies is randomly selected every CS generations to evolve the entire population for the subsequent CS generations.
7. In AOSDE-PAOC, DE mutation strategies are given probabilities in proportion to their improvement in fitness over two iterations [33], computed by:

$$Pb(t) = Pb(t-1) + \sum_{i=1}^{S_{op}} \left( f(x_i(t-1)) - f(x_i(t)) \right) \tag{23}$$

where $S_{op}$ is the number of individuals evolved by operator op and $Pb_{op}(0)$ the initial probability, calculated by:

$$Pb_{op}(0) = \frac{1}{S_{op}} \tag{24}$$
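Minimal sketches of these credit-assignment measures (Eqs. 21-23), assuming minimization with positive fitness values; function names are illustrative:

```python
import numpy as np

def fir(parent_best, child_best):
    """Fitness improvement rate (Eq. 21)."""
    return (parent_best - child_best) / parent_best

def progress_ratio(f_min_prev, f_min_curr):
    """Progress ratio (Eq. 22): log-scale improvement of the best solution."""
    return abs(np.log(f_min_prev / f_min_curr))

def update_probability(pb_prev, f_prev, f_curr):
    """Eq. (23): raise an operator's selection probability by the total
    fitness improvement of the individuals it evolved."""
    return pb_prev + float(np.sum(np.asarray(f_prev) - np.asarray(f_curr)))
```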

All the algorithms were coded in Matlab R2014a and run on a PC with a 3.4 GHz Core i7 processor, 16 GB RAM and Windows 7. For the state-of-the-art algorithms, all parameter values were set as recommended in their corresponding papers and, for a fair comparison, all the algorithms started from the same seed. As the population sizes originally reported for JADE were only for 10D, 30D and 100D, we used an approximate value of 185 individuals for 50D. For more comprehensive comparisons, each algorithm was run 51 times on all the optimization functions with 10D, 30D and 50D, up to a maximum number of fitness evaluations of 10,000 × D, with the best, mean and standard deviation results recorded. Note that, for any run, if the deviation of the best fitness value from the optimal solution was less than or equal to 1.0e-8, it was considered zero. To compare the different algorithms, two non-parametric tests (the Wilcoxon signed-rank test [12] and Friedman's ranking [12]) and performance profiles [1] were used. In the Wilcoxon signed-rank test, one of three signs (+, -, and ≈) is assigned to the comparison of any two algorithms, where '+' means the first algorithm was significantly better than the second, '-' that it was significantly worse, and '≈' that there was no significant difference between the two algorithms; a 5% significance level was used to judge the difference between any pair of algorithms, and a Friedman test to rank all the algorithms. Also, to compare the performance of the proposed algorithm graphically, performance profiles [1] are plotted; these are a tool for comparing the performance of a number of algorithms (S) on a set of problems (P), using a comparison goal (such as the computational time or the average number of fitness evaluations) to reach a certain level of a performance indicator (such as the optimal fitness). For an algorithm s, the performance profile $Rho_s$ is given by

$$Rho_s(\tau) = \frac{1}{n_p} \times \left| \left\{ p \in P : r_{p,s} \leq \tau \right\} \right| \tag{25}$$

where $Rho_s(\tau)$ is the probability for $s \in S$ that its performance ratio $r_{p,s}$ is within a factor $\tau \in \mathbb{R}$ of the best possible ratio, and $Rho_s$ is the (cumulative) distribution function for the performance ratio.

4.1. Algorithms' parameters

Regarding the algorithms' parameters, $NP^{init}$ was set to 18D, $NP^{min}$ to 7, CS to 25, ϕ to 0.5 for DE/ϕbest/1 (to maintain diversity) and 0.1 for all other variants (to speed up the convergence rate), the archive rate (A) to 1.4, the memory size (H) to 5, and limit, the budget for running the multi-operator phase, to $\frac{1}{2} \times MAXFES$, after which the best-performing operator evolved the population until the end of the run.

4.2. Analysis of algorithm's components

In this section, the proposed LSAOS-DE's components are analyzed to determine the final version of the proposed algorithm.

4.2.1. Analysis of number of operators in LSAOS-DE (m)

To design a multi-operator DE algorithm, one should first analyze and determine the mutation strategies to be used. We tested 11 variants of the standard DE algorithm (DE/ϕbest/1, DE/current-to-ϕbest/1 with archive, DE/current-to-ϕbest/1 without archive, DE/rand/1, DE/rand/2, DE/current-to-rand/1 with archive, DE/current-to-rand/1 without archive, DE/rand-to-ϕbest/1 with archive, DE/rand-to-ϕbest/1 without archive, DE/ϕbest/2, and DE/rand-to-ϕbest/2 with archive) by using them to solve the CEC2014 problems with 30D. Based on their average results, they were ranked according to the Friedman test, with their mean ranks presented in Table 1.

The number of DE mutation strategies (m) required for LSAOS-DE was determined by conducting experiments using m = 3, 4, 5, 6 and 7 (with the best variants based on the results shown in Table 1) to solve the 30D problems.

Table 1 Friedman's test results.

  Algorithm                                Mean rank   Order
  DE/current-to-ϕbest/1 without archive    4.22        1
  DE/current-to-ϕbest/1 with archive       4.63        2
  DE/ϕbest/1                               5.57        3
  DE/rand-to-ϕbest/1 without archive       5.60        4
  DE/rand/1                                6.07        5
  DE/rand-to-ϕbest/1 with archive          6.17        6
  DE/current-to-rand/1 with archive        6.35        7
  DE/rand-to-ϕbest/2 with archive          6.37        8
  DE/current-to-rand/1 without archive     6.48        9
  DE/ϕbest/2                               7.02        10
  DE/rand/2                                7.53        11

Table 2 Summary of comparison of different variants of the proposed algorithm with different m values (3, 4, 5, 6 and 7) based on average and median results, where 'Dec.' is the statistical decision based on Wilcoxon signed-rank test results.

          Algorithms     Better  Equal  Worse  Prob.   Dec. α=0.05  Dec. α=0.10
  Mean    m=3 vs. m=4    12      7      11     0.761   ≈            ≈
          m=3 vs. m=5    9       8      13     0.322   ≈            ≈
          m=3 vs. m=6    13      8      9      0.390   ≈            ≈
          m=3 vs. m=7    15      9      6      0.054   ≈            +
          m=4 vs. m=5    6       11     13     0.147   ≈            ≈
          m=4 vs. m=6    16      7      7      0.034   +            +
          m=4 vs. m=7    15      7      8      0.064   ≈            +
          m=5 vs. m=6    18      7      5      0.002   +            +
          m=5 vs. m=7    15      9      6      0.009   +            +
          m=6 vs. m=7    13      8      9      0.833   ≈            ≈
  Median  m=3 vs. m=4    11      9      10     0.768   ≈            ≈
          m=3 vs. m=5    7       11     12     0.314   ≈            ≈
          m=3 vs. m=6    13      9      8      0.092   ≈            +
          m=3 vs. m=7    17      8      5      0.014   +            +
          m=4 vs. m=5    7       10     13     0.232   ≈            ≈
          m=4 vs. m=6    16      9      5      0.003   +            +
          m=4 vs. m=7    15      8      7      0.026   +            +
          m=5 vs. m=6    17      10     3      0.001   +            +
          m=5 vs. m=7    17      8      5      0.002   +            +
          m=6 vs. m=7    10      8      12     0.465   ≈            ≈

Table 3 Friedman test results.

Average ranking of LSAOS-DE and the other seven variants:
  Algorithm    LSAOS-DE  AOSDE-LS  AOSDE-SR  AOSDE-FIR  AOSDE-PR  AOSDE-PBM  AOSDE-RD  AOSDE-PAOC
  Mean rank    3.33      4.88      4.49      4.68       4.62      4.65       4.79      4.57

Average ranking for different values of NP^init:
  NP^init      5D    10D   15D   18D   20D   25D   30D
  Mean rank    5.28  4.52  3.93  3.37  3.90  3.17  3.83

Average ranking for different values of NP^min:
  NP^min       7     10    20    30    40    50
  Mean rank    2.50  2.68  3.36  3.36  4.30  4.25

The remaining parameters were set as in Section 4.1. Note that increasing the number of operators did not ensure better performance [10]; one reason for this is that the operators used were not complementary, which may have caused a loss of FES and/or a biased search process. The Wilcoxon signed-rank test results are presented in Table 2, in which, for example, row 4, which compares m = 3 and m = 7, indicates that m = 3 was superior for 15 test problems, obtained the same results for 9 and was inferior for 6, which means that m = 3 was better than m = 7. It is also clear that m = 5 was better than the others, as confirmed by the Friedman test results and the performance profiles shown in Table 4 and Fig. 2a, respectively.

4.2.2. Analysis of reduction mechanism

In this section, we analyze how m was reduced by running four different variants of the proposed algorithm (Ver1, Ver2, Ver3 and Ver4). In Ver1, it was reduced from 5 to 1; in Ver2, from 5 to 4 to 3 to 2 to 1; in Ver3, from 5 to 3 to 1; and in Ver4, from 5 to 3 to 2 to 1. To demonstrate the benefit of reducing m over generations, another variant (Ver5), with no reduction, was also run.


Table 4 Friedman’s test results (cont.). Average ranking for different values of CS

Average ranking for different values of m

Average rankings for different five variants

Algorithms

Mean rank

Algorithms

Mean rank

Algorithms

Mean rank

CS = 25 CS = 50 CS = 75 CS = 100 CS = 150 CS = 200

3.50 3.87 4.08 3.38 2.95 3.22

m=3 m=4 m=5 m=6 m=7

2.80 2.83 2.55 3.33 3.48

Ver1 Ver2 Ver3 Ver4 Ver5

2.38 3.42 3.37 2.70 3.13

Fig. 2. Performance profiles for different parameters, where $Rho_s(\tau)$ indicates the fraction of benchmark problems a solver can solve within a factor τ ≥ 1 of the best-observed performance.

Table 5 Summary of comparisons of different variants of the proposed algorithm with different variants for reducing m based on average and median results, where 'Dec.' is the statistical decision based on Wilcoxon signed-rank test results.

          Algorithms       Better  Equal  Worse  Prob.   Dec. α=0.05  Dec. α=0.10
  Mean    Ver1 vs. Ver2    19      7      4      0.002   +            +
          Ver1 vs. Ver3    17      8      5      0.026   +            +
          Ver1 vs. Ver4    12      10     8      0.332   ≈            ≈
          Ver1 vs. Ver5    14      8      8      0.042   +            +
          Ver2 vs. Ver3    11      7      12     0.891   ≈            ≈
          Ver2 vs. Ver4    8       7      15     0.097   ≈            -
          Ver2 vs. Ver5    11      6      13     0.909   ≈            ≈
          Ver3 vs. Ver4    8       6      16     0.103   ≈            ≈
          Ver3 vs. Ver5    10      7      13     0.761   ≈            ≈
          Ver4 vs. Ver5    15      7      8      0.059   ≈            +
  Median  Ver1 vs. Ver2    21      7      2      0.000   +            +
          Ver1 vs. Ver3    17      9      4      0.001   +            +
          Ver1 vs. Ver4    13      8      9      0.338   ≈            ≈
          Ver1 vs. Ver5    12      13     3      0.011   +            +
          Ver2 vs. Ver3    9       7      14     0.274   ≈            ≈
          Ver2 vs. Ver4    4       8      18     0.001   -            -
          Ver2 vs. Ver5    7       7      16     0.114   ≈            ≈
          Ver3 vs. Ver4    8       7      15     0.107   ≈            ≈
          Ver3 vs. Ver5    7       7      16     0.181   ≈            ≈
          Ver4 vs. Ver5    13      7      10     0.114   ≈            ≈

In it, all five search operators were used in each generation, and the number of individuals evolved by each mutation strategy was updated based on our proposed selection mechanism. All the parameters were set as in the previous section, and the detailed results (average and St.d) are presented in the supplementary material. A summary of the results based on the Wilcoxon signed-rank test is shown in Table 5, which indicates that Ver1 was the best, as is also evident from the overall rankings of the five different variants presented in Table 4. These results were confirmed by the plots of the performance profiles depicted in Fig. 2b. From the above comparisons, it is clear that Ver1 was the best of the variants that used a reduction mechanism for m. It was also better than Ver5, which means that reducing the number of DE mutation strategies is better than fixing it during the search process.

4.2.3. Effect of NP^init

The influence of $NP^{init}$ on the performance of LSAOS-DE was investigated by conducting several experiments with $NP^{init}$ = 5D, 10D, 15D, 18D, 20D, 25D and 30D for solving the 30D problems. The other parameters were fixed as in Section 4.1 (detailed results are shown in the supplementary material, accessible via the link provided in the appendix). Table 6 presents a summary of the obtained results based on the Wilcoxon signed-rank test, in which it is clear that $NP^{init}$ = 25D was the best. Further analysis based on the Friedman test results for different values of $NP^{init}$ was conducted, with the mean ranks presented in Table 3 showing that $NP^{init}$ = 25D was the best. Fig. 2c presents the performance profiles for the different values of $NP^{init}$, in which it is clear that the variant with $NP^{init}$ = 25D was better than all the others.

4.2.4. Effect of CS

The effect of CS, which determines reductions in the number of mutation strategies, was analyzed using 25, 50, 75, 100, 150 and 200 generations, while all the other parameters were set as in the previous section (detailed results are shown in the supplementary material, accessible via the link in the appendix). The overall rankings for all the different CS values based on the Friedman test are presented in Table 4, in which it is clear that higher values of CS had a good impact on performance, with CS = 150 the best. Table 7 presents a summary of the comparisons of the six different CS values based on the Wilcoxon signed-rank test results, with CS = 150 proven to be better than the others. From the performance profiles of the six different CS values depicted in Fig. 2d, it is clear that the variant with CS = 150 was the first to reach a probability of 1, with τ between 4 and 5.

4.2.5. Effect of NP^min

As previously mentioned, the number of individuals in the whole population was initially $NP^{init}$, which was then linearly reduced to $NP^{min}$. To analyze the effect of $NP^{min}$, different experiments were conducted using $NP^{min}$ values of 7, 10, 20, 30, 40 and 50, while all the other parameters were fixed as previously discussed (detailed results are shown in the supplementary material, accessible via the link provided in the appendix), with all the variants ranked based on the Friedman test, and their results summarized in Table 3. Also, a Wilcoxon signed-rank test of the different variants of the proposed algorithm with different $NP^{min}$ values was conducted.


Table 6
Summary of comparisons of different variants of the proposed algorithm with different NPinit values (5D, 10D, 15D, 18D, 20D, 25D and 30D) based on mean and median results, where 'Dec.' is the statistical decision based on the Wilcoxon signed-rank test results.

Mean
Algorithms     Better  Equal  Worse  Prob.  Dec. (α = 0.05)  Dec. (α = 0.10)
5D vs. 10D     4       8      18     0.006  −                −
5D vs. 15D     5       6      19     0.002  −                −
5D vs. 18D     4       6      20     0.001  −                −
5D vs. 20D     7       5      18     0.013  −                −
5D vs. 25D     6       4      20     0.005  −                −
5D vs. 30D     9       4      17     0.028  −                −
10D vs. 15D    6       7      17     0.001  −                −
10D vs. 18D    6       8      16     0.002  −                −
10D vs. 20D    9       6      15     0.034  −                −
10D vs. 25D    7       5      18     0.005  −                −
10D vs. 30D    9       5      16     0.048  −                −
15D vs. 18D    8       7      15     0.052  ≈                −
15D vs. 20D    11      6      13     0.376  ≈                ≈
15D vs. 25D    8       4      18     0.032  −                −
15D vs. 30D    11      6      13     0.230  ≈                ≈
18D vs. 20D    16      5      9      0.122  ≈                ≈
18D vs. 25D    11      5      14     0.451  ≈                ≈
18D vs. 30D    13      5      12     0.936  ≈                ≈
20D vs. 25D    9       6      15     0.119  ≈                ≈
20D vs. 30D    11      8      11     0.709  ≈                ≈
25D vs. 30D    15      6      9      0.141  ≈                ≈

Median
5D vs. 10D     3       7      20     0.000  −                −
5D vs. 15D     3       7      20     0.000  −                −
5D vs. 18D     4       7      19     0.001  −                −
5D vs. 20D     4       7      19     0.003  −                −
5D vs. 25D     7       5      18     0.003  −                −
5D vs. 30D     7       5      18     0.005  −                −
10D vs. 15D    3       7      20     0.002  −                −
10D vs. 18D    6       7      17     0.016  −                −
10D vs. 20D    6       7      17     0.042  −                −
10D vs. 25D    9       5      17     0.089  ≈                −
10D vs. 30D    9       5      16     0.135  ≈                ≈
15D vs. 18D    10      8      12     0.733  ≈                ≈
15D vs. 20D    8       7      15     0.248  ≈                ≈
15D vs. 25D    7       4      19     0.042  −                −
15D vs. 30D    11      5      14     0.657  ≈                ≈
18D vs. 20D    8       8      14     0.506  ≈                ≈
18D vs. 25D    11      6      13     0.732  ≈                ≈
18D vs. 30D    12      6      12     0.932  ≈                ≈
20D vs. 25D    7       11     12     0.523  ≈                ≈
20D vs. 30D    14      7      9      0.447  ≈                ≈
25D vs. 30D    14      9      7      0.274  ≈                ≈
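Each of these summary tables counts, per test problem, whether one setting is better than, equal to or worse than another, and attaches a Wilcoxon signed-rank decision. For illustration, a minimal sketch of how one such row can be computed is given below, assuming vectors of per-problem average errors over 51 runs; the tie tolerance tol and the +/−/≈ decision rule are assumptions that mirror the convention used in these tables.

```python
import numpy as np
from scipy.stats import wilcoxon

def summary_row(err_a, err_b, alpha=0.05, tol=1e-8):
    """Count the problems on which A is better/equal/worse than B and attach
    a Wilcoxon signed-rank decision: '+' (A significantly better), '-'
    (significantly worse) or '≈' (no significant difference)."""
    a, b = np.asarray(err_a, float), np.asarray(err_b, float)
    better = int(np.sum(a < b - tol))
    equal = int(np.sum(np.abs(a - b) <= tol))
    worse = int(np.sum(a > b + tol))
    if better + worse == 0:            # identical results on every problem
        return better, equal, worse, 1.0, '≈'
    _, p = wilcoxon(a, b)              # zero differences are discarded here
    dec = '≈' if p >= alpha else ('+' if better > worse else '-')
    return better, equal, worse, p, dec
```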

Also, a Wilcoxon signed-rank test of the different variants of the proposed algorithm with different NPmin values was conducted, and a summary of the results is presented in Table 8, which indicates that the setting NPmin = 7 was slightly better than the other values, with small values preferred. The performance profiles for the different values of NPmin are depicted in Fig. 2e, which shows that the variant with NPmin = 7 had the highest probability at the beginning and was the first to reach a probability of 1, at τ between 70 and 80.

4.3. Comparisons of proposed algorithm and other heuristic-based selection methods

In this section, a comparison of LSAOS-DE and its seven variants, which are identical apart from their selection mechanisms, is discussed (detailed results are shown in the supplementary material accessed via a link provided in the appendix). Their rankings obtained from the Friedman test are given in Table 9, which shows that LSAOS-DE was ranked first for the 10D, 30D and 50D problems, with AOSDE-RD second for the 10D, AOSDE-SR second for the 30D, and AOSDE-SR and AOSDE-FIR equal second for the 50D problems. Table 10 presents a summary of comparisons of the proposed algorithm and its seven variants considering the average and median results obtained from the Wilcoxon signed-rank test for the 10D, 30D and 50D problems, in which it is clear that, for example, for the 30D problems, LSAOS-DE was superior to AOSDE-LS on 32 problems, obtained the same results on 3 and was inferior on 10. Generally speaking, it is evident that the proposed algorithm was better than its variants. Furthermore, from the graphs of the performance profiles depicted in Fig. 2f, it is clear that LSAOS-DE had the highest probability at the start and was the first to reach a probability of 1, at τ = 2.5, while AOSDE-SR was second.
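The performance profiles in Fig. 2 follow the methodology of [1]. For reference, a minimal sketch of the underlying computation is given below, assuming a matrix of positive per-problem costs (lower is better).

```python
import numpy as np

def performance_profiles(costs):
    """costs[s][p] is the performance of solver s on problem p (lower is
    better, all values positive). Returns the tau breakpoints and, for each
    solver s, rho_s(tau): the fraction of problems on which s is within a
    factor tau of the best solver -- the curves plotted in Fig. 2 (cf. [1])."""
    costs = np.asarray(costs, dtype=float)
    ratios = costs / costs.min(axis=0)      # performance ratios r[s, p]
    taus = np.unique(ratios)                # breakpoints of the step functions
    # rho[s, t] = fraction of problems with ratio r[s, p] <= taus[t]
    rho = (ratios[:, None, :] <= taus[None, :, None]).mean(axis=2)
    return taus, rho
```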


Table 7
Summary of comparisons of different variants of the proposed algorithm with different CS values (25, 50, 75, 100, 150 and 200) based on mean and median results, where 'Dec.' is the statistical decision based on the Wilcoxon signed-rank test results.

Mean
Algorithms              Better  Equal  Worse  Prob.  Dec. (α = 0.05)  Dec. (α = 0.10)
CS = 25 vs. CS = 50     12      9      9      0.566  ≈                ≈
CS = 25 vs. CS = 75     13      9      8      0.313  ≈                ≈
CS = 25 vs. CS = 100    10      9      11     0.741  ≈                ≈
CS = 25 vs. CS = 150    8       9      13     0.095  ≈                −
CS = 25 vs. CS = 200    10      8      12     0.256  ≈                ≈
CS = 50 vs. CS = 75     9       10     11     0.709  ≈                ≈
CS = 50 vs. CS = 100    7       9      14     0.251  ≈                ≈
CS = 50 vs. CS = 150    8       7      15     0.089  ≈                −
CS = 50 vs. CS = 200    7       13     10     0.140  ≈                ≈
CS = 75 vs. CS = 100    8       8      14     0.168  ≈                ≈
CS = 75 vs. CS = 150    3       8      19     0.004  −                −
CS = 75 vs. CS = 200    6       8      16     0.020  −                −
CS = 100 vs. CS = 150   8       9      13     0.076  ≈                −
CS = 100 vs. CS = 200   9       10     11     0.232  ≈                ≈
CS = 150 vs. CS = 200   10      10     10     1.000  ≈                ≈

Median
CS = 25 vs. CS = 50     14      8      8      0.236  ≈                ≈
CS = 25 vs. CS = 75     13      6      9      0.168  ≈                ≈
CS = 25 vs. CS = 100    12      8      10     0.661  ≈                ≈
CS = 25 vs. CS = 150    9       8      13     0.158  ≈                ≈
CS = 25 vs. CS = 200    13      8      9      0.465  ≈                ≈
CS = 50 vs. CS = 75     10      10     10     0.528  ≈                ≈
CS = 50 vs. CS = 100    10      12     8      0.758  ≈                ≈
CS = 50 vs. CS = 150    7       8      15     0.081  ≈                −
CS = 50 vs. CS = 200    10      8      12     0.158  ≈                ≈
CS = 75 vs. CS = 100    8       10     12     0.794  ≈                ≈
CS = 75 vs. CS = 150    7       9      14     0.073  ≈                −
CS = 75 vs. CS = 200    9       8      13     0.062  ≈                −
CS = 100 vs. CS = 150   8       9      13     0.099  ≈                −
CS = 100 vs. CS = 200   10      8      12     0.200  ≈                ≈
CS = 150 vs. CS = 200   11      11     8      0.376  ≈                ≈
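As analyzed above, LSAOS-DE re-ranks its pool of mutation strategies at the end of every cycle of CS generations and discards the worst-performing one (the Ver1 setting discussed in Section 4.2.2). The following sketch illustrates such a cycle-end reduction step; the combined score, which rewards a high success rate and penalizes a larger normalized landscape (ruggedness) measure, is an illustrative assumption rather than the paper's exact formula.

```python
import numpy as np

def end_of_cycle_reduction(pool, success_rates, landscape_measures):
    """At the end of a CS-generation cycle, rank the surviving mutation
    strategies and drop the worst one (never dropping the last strategy).
    Illustrative scoring only: a higher success rate is assumed better and
    a smaller landscape measure is assumed better."""
    if len(pool) <= 1:
        return pool
    sr = np.asarray(success_rates, float)
    lm = np.asarray(landscape_measures, float)
    # normalize both measures before combining them (assumption)
    sr = sr / sr.sum() if sr.sum() > 0 else np.full_like(sr, 1 / len(sr))
    lm = lm / lm.sum() if lm.sum() > 0 else np.full_like(lm, 1 / len(lm))
    score = sr - lm
    worst = int(np.argmin(score))
    return [op for k, op in enumerate(pool) if k != worst]
```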

Table 8
Summary of comparisons of different variants of the proposed algorithm with different NPmin values (7, 10, 20, 30, 40 and 50) based on average and median results, where 'Dec.' is the statistical decision based on the Wilcoxon signed-rank test results.

Mean
Algorithms                     Better  Equal  Worse  Prob.  Dec. (α = 0.05)  Dec. (α = 0.10)
NPmin = 7 vs. NPmin = 10       14      8      8      0.961  ≈                ≈
NPmin = 7 vs. NPmin = 20       19      8      3      0.002  +                +
NPmin = 7 vs. NPmin = 30       19      7      4      0.006  +                +
NPmin = 7 vs. NPmin = 40       18      6      6      0.063  ≈                +
NPmin = 7 vs. NPmin = 50       18      5      7      0.042  +                +
NPmin = 10 vs. NPmin = 20      18      7      5      0.006  +                +
NPmin = 10 vs. NPmin = 30      17      7      6      0.005  +                +
NPmin = 10 vs. NPmin = 40      21      7      2      0.000  +                +
NPmin = 10 vs. NPmin = 50      18      6      6      0.001  +                +
NPmin = 20 vs. NPmin = 30      15      6      9      0.265  ≈                ≈
NPmin = 20 vs. NPmin = 40      16      7      7      0.191  ≈                ≈
NPmin = 20 vs. NPmin = 50      15      6      9      0.130  ≈                ≈
NPmin = 30 vs. NPmin = 40      17      9      4      0.068  ≈                +
NPmin = 30 vs. NPmin = 50      18      5      7      0.013  +                +
NPmin = 40 vs. NPmin = 50      14      7      9      0.097  ≈                +

Median
NPmin = 7 vs. NPmin = 10       15      9      6      0.181  ≈                ≈
NPmin = 7 vs. NPmin = 20       19      9      3      0.005  +                +
NPmin = 7 vs. NPmin = 30       15      9      6      0.006  +                +
NPmin = 7 vs. NPmin = 40       20      7      3      0.002  +                +
NPmin = 7 vs. NPmin = 50       17      8      5      0.023  +                +
NPmin = 10 vs. NPmin = 20      18      8      4      0.004  +                +
NPmin = 10 vs. NPmin = 30      17      9      4      0.002  +                +
NPmin = 10 vs. NPmin = 40      18      7      5      0.003  +                +
NPmin = 10 vs. NPmin = 50      16      8      6      0.031  +                +
NPmin = 20 vs. NPmin = 30      13      9      8      0.106  ≈                ≈
NPmin = 20 vs. NPmin = 40      16      7      7      0.055  ≈                +
NPmin = 20 vs. NPmin = 50      13      7      10     0.605  ≈                ≈
NPmin = 30 vs. NPmin = 40      17      7      6      0.056  ≈                +
NPmin = 30 vs. NPmin = 50      13      8      9      0.445  ≈                ≈
NPmin = 40 vs. NPmin = 50      11      7      12     0.903  ≈                ≈


Fig. 3. Convergence graphs for F01, F06, F20 and F30, comparing LSAOS-DE with both AOSDE-LS and AOSDE-SR.


Table 9
Friedman test results obtained from comparing LSAOS-DE and its seven variants.

             10D               30D               50D
Algorithm    Mean rank  Order  Mean rank  Order  Mean rank  Order  Overall rank
LSAOS-DE     3.42       1      3.38       1      3.18       1      1
AOSDE-LS     4.86       6      4.74       6      5.03       5      8
AOSDE-SR     5.07       7      4.13       2      4.28       2.5    2
AOSDE-FIR    4.79       5      4.98       7      4.28       2.5    6
AOSDE-PR     4.50       4      4.66       5      4.69       5      4
AOSDE-PBM    4.27       3      4.47       3      5.22       7      5
AOSDE-RD     4.26       2      5.00       8      5.11       8      7
AOSDE-PAOC   4.86       6      4.64       4      4.21       4      3

Overall rank based on the sum of orders for 10D, 30D and 50D.
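The mean ranks in Tables 9 and 11 come from the Friedman test [12]. A minimal sketch of how such per-dimension mean ranks can be computed from a problems × algorithms matrix of average errors is given below; ties share an average rank, which is how fractional orders such as 2.5 arise.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_mean_ranks(errors):
    """errors[p][a]: average error of algorithm a on problem p (lower is
    better). Returns the mean rank of each algorithm over all problems."""
    errors = np.asarray(errors, dtype=float)
    ranks = np.apply_along_axis(rankdata, 1, errors)  # rank within each problem
    return ranks.mean(axis=0)

# The significance of the rank differences can be checked with
# friedmanchisquare(*errors.T), also from scipy.stats.
```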

Table 10
Summary of comparisons of the proposed algorithm and the other seven variants based on mean and median results, where 'Dec.' is the statistical decision based on the Wilcoxon signed-rank test results.

10D
Mean
Algorithms                 Better  Equal  Worse  Prob.  Dec.
LSAOS-DE vs. AOSDE-LS      24      11     10     0.004  +
LSAOS-DE vs. AOSDE-SR      27      8      10     0.017  +
LSAOS-DE vs. AOSDE-FIR     25      9      11     0.010  +
LSAOS-DE vs. AOSDE-PR      27      8      10     0.006  +
LSAOS-DE vs. AOSDE-PBM     24      10     11     0.007  +
LSAOS-DE vs. AOSDE-RD      21      10     14     0.047  +
LSAOS-DE vs. AOSDE-PAOC    25      10     10     0.004  +
Median
LSAOS-DE vs. AOSDE-LS      17      19     9      0.012  +
LSAOS-DE vs. AOSDE-SR      20      13     12     0.044  +
LSAOS-DE vs. AOSDE-FIR     19      16     10     0.024  +
LSAOS-DE vs. AOSDE-PR      21      15     9      0.042  +
LSAOS-DE vs. AOSDE-PBM     18      17     10     0.031  +
LSAOS-DE vs. AOSDE-RD      19      16     10     0.023  +
LSAOS-DE vs. AOSDE-PAOC    21      8      16     0.007  +

30D
Mean
LSAOS-DE vs. AOSDE-LS      32      3      10     0.003  +
LSAOS-DE vs. AOSDE-SR      25      4      16     0.141  ≈
LSAOS-DE vs. AOSDE-FIR     29      2      14     0.038  +
LSAOS-DE vs. AOSDE-PR      28      4      13     0.035  +
LSAOS-DE vs. AOSDE-PBM     31      2      12     0.009  +
LSAOS-DE vs. AOSDE-RD      28      3      14     0.037  +
LSAOS-DE vs. AOSDE-PAOC    28      4      13     0.015  +
Median
LSAOS-DE vs. AOSDE-LS      23      12     10     0.041  +
LSAOS-DE vs. AOSDE-SR      24      8      13     0.158  ≈
LSAOS-DE vs. AOSDE-FIR     22      12     11     0.027  +
LSAOS-DE vs. AOSDE-PR      25      13     7      0.003  +
LSAOS-DE vs. AOSDE-PBM     27      7      11     0.037  +
LSAOS-DE vs. AOSDE-RD      27      8      10     0.031  +
LSAOS-DE vs. AOSDE-PAOC    23      10     12     0.043  +

50D
Mean
LSAOS-DE vs. AOSDE-LS      35      1      9      0.000  +
LSAOS-DE vs. AOSDE-SR      29      3      13     0.043  +
LSAOS-DE vs. AOSDE-FIR     28      2      15     0.036  +
LSAOS-DE vs. AOSDE-PR      29      2      14     0.033  +
LSAOS-DE vs. AOSDE-PBM     32      1      12     0.001  +
LSAOS-DE vs. AOSDE-RD      37      1      7      0.000  +
LSAOS-DE vs. AOSDE-PAOC    28      2      15     0.070  ≈
Median
LSAOS-DE vs. AOSDE-LS      32      7      6      0.000  +
LSAOS-DE vs. AOSDE-SR      27      7      11     0.023  +
LSAOS-DE vs. AOSDE-FIR     26      9      10     0.041  +
LSAOS-DE vs. AOSDE-PR      24      15     6      0.001  +
LSAOS-DE vs. AOSDE-PBM     26      5      14     0.008  +
LSAOS-DE vs. AOSDE-RD      30      5      10     0.000  +
LSAOS-DE vs. AOSDE-PAOC    25      5      15     0.108  ≈

Table 11
Friedman test results obtained from comparisons of LSAOS-DE and state-of-the-art algorithms.

              10D               30D               50D
Algorithm     Mean rank  Order  Mean rank  Order  Mean rank  Order  Overall rank
LSAOS-DE      3.38       1      3.12       1      3.01       1      1
LSHADE        4.02       2      3.90       2      3.91       2      2
JADE          5.96       5      6.59       8      6.34       7      8
SHADE         5.49       4      5.72       5      5.64       6      4
CoDE          4.87       3      5.88       7      6.52       8      7
SaDE          6.40       7      7.27       10     7.41       10     9.5
EPSDE         6.93       9      7.17       9      7.27       9      9.5
UMOEAs        6.37       6      5.84       6      5.47       5      6
MPEDE         4.87       3      4.11       3      4.53       3      3
AMALGAM-SO    6.72       8      5.40       4      4.89       4      5

Overall rank based on the sum of orders for 10D, 30D and 50D.

As a further illustration, the convergence plots of the best results obtained during 51 runs of LSAOS-DE, AOSDE-SR and AOSDE-LS for four functions (F01, F06, F20 and F30) with 30D are shown in Fig. 3, in which it is clear that the convergence speed of LSAOS-DE was the best. The values of the success rates and the landscape measure are also depicted in Fig. 3. It can be concluded that a balance between the success rate and landscape information led to good performance in each case.

4.4. Comparisons of LSAOS-DE and state-of-the-art algorithms

Experiments were conducted to compare the performance of the proposed algorithm with those of state-of-the-art ones using the best set of previously determined parameter values, that is, NPinit = 25D, NPmin = 7 and CS = 150, with NP updated using Eq. 8.

Details of the computational results (f(x_best) − f(x*)) obtained from the proposed algorithm for the 10D, 30D and 50D problems are shown in the supplementary material accessed via a link provided in the appendix, in which the average values and standard deviations of the solutions are compared with those obtained from the state-of-the-art algorithms, with the best average result for each problem presented in bold.


Table 12
Summary of comparisons of the proposed algorithm and nine state-of-the-art algorithms based on mean and median results, where 'Dec.' is the statistical decision based on the Wilcoxon signed-rank test results.

10D
Mean
Algorithms                 Better  Equal  Worse  Prob.  Dec.
LSAOS-DE vs. LSHADE        20      12     13     0.057  ≈
LSAOS-DE vs. SHADE         35      8      2      0.000  +
LSAOS-DE vs. JADE          33      9      3      0.000  +
LSAOS-DE vs. CoDE          25      8      12     0.022  +
LSAOS-DE vs. SaDE          32      8      5      0.000  +
LSAOS-DE vs. EPSDE         29      8      8      0.011  +
LSAOS-DE vs. UMOEAs        28      8      9      0.001  +
LSAOS-DE vs. MPEDE         28      9      8      0.001  +
LSAOS-DE vs. AMALGAM-SO    31      2      12     0.006  +
Median
LSAOS-DE vs. LSHADE        19      14     12     0.321  ≈
LSAOS-DE vs. SHADE         28      14     3      0.000  +
LSAOS-DE vs. JADE          27      14     4      0.000  +
LSAOS-DE vs. CoDE          17      14     14     0.518  ≈
LSAOS-DE vs. SaDE          32      12     1      0.000  +
LSAOS-DE vs. EPSDE         29      9      7      0.013  +
LSAOS-DE vs. UMOEAs        29      10     6      0.000  +
LSAOS-DE vs. MPEDE         30      12     3      0.000  +
LSAOS-DE vs. AMALGAM-SO    34      10     1      0.003  +

30D
Mean
LSAOS-DE vs. LSHADE        21      8      16     0.023  +
LSAOS-DE vs. SHADE         35      6      4      0.000  +
LSAOS-DE vs. JADE          36      5      4      0.000  +
LSAOS-DE vs. CoDE          33      5      7      0.000  +
LSAOS-DE vs. SaDE          39      2      4      0.000  +
LSAOS-DE vs. EPSDE         34      5      6      0.000  +
LSAOS-DE vs. UMOEAs        32      7      6      0.000  +
LSAOS-DE vs. MPEDE         28      7      10     0.005  +
LSAOS-DE vs. AMALGAM-SO    29      16     0      0.144  ≈
Median
LSAOS-DE vs. LSHADE        19      12     14     0.018  +
LSAOS-DE vs. SHADE         31      9      5      0.000  +
LSAOS-DE vs. JADE          34      7      4      0.000  +
LSAOS-DE vs. CoDE          32      7      6      0.000  +
LSAOS-DE vs. SaDE          37      5      3      0.000  +
LSAOS-DE vs. EPSDE         32      6      7      0.000  +
LSAOS-DE vs. UMOEAs        29      8      8      0.000  +
LSAOS-DE vs. MPEDE         25      10     10     0.011  +
LSAOS-DE vs. AMALGAM-SO    29      0      16     0.271  ≈

50D
Mean
LSAOS-DE vs. LSHADE        23      5      17     0.125  ≈
LSAOS-DE vs. SHADE         33      4      8      0.000  +
LSAOS-DE vs. JADE          36      3      6      0.000  +
LSAOS-DE vs. CoDE          35      1      9      0.000  +
LSAOS-DE vs. SaDE          40      1      4      0.000  +
LSAOS-DE vs. EPSDE         36      0      9      0.001  +
LSAOS-DE vs. UMOEAs        31      4      10     0.001  +
LSAOS-DE vs. MPEDE         32      4      9      0.000  +
LSAOS-DE vs. AMALGAM-SO    31      0      14     0.019  +
Median
LSAOS-DE vs. LSHADE        22      7      16     0.380  ≈
LSAOS-DE vs. SHADE         32      6      7      0.000  +
LSAOS-DE vs. JADE          35      6      4      0.000  +
LSAOS-DE vs. CoDE          36      4      5      0.000  +
LSAOS-DE vs. SaDE          40      2      3      0.000  +
LSAOS-DE vs. EPSDE         31      6      8      0.001  +
LSAOS-DE vs. UMOEAs        29      9      7      0.000  +
LSAOS-DE vs. MPEDE         33      7      5      0.000  +
LSAOS-DE vs. AMALGAM-SO    33      0      12     0.045  +

Table 13
Algorithmic complexity.

                  T0        T1        T̂2        (T̂2 − T1)/T0   R
CEC2014  D = 10   0.110568  0.307946  0.926645  5.59565064     0.981004
         D = 30             0.288033  1.176284  8.033201288
         D = 50             0.559683  1.943288  12.51798893
         D = 100            2.55944   4.424032  16.86744809
CEC2015  D = 10   0.110568  0.117537  0.932886  7.374186021    0.946601
         D = 30             0.382173  1.276088  8.084753274
         D = 50             0.789205  2.218334  12.92534006
         D = 100            2.463938  4.172859  15.45583713

The comparison summaries for LSAOS-DE and the state-of-the-art algorithms on the 10D, 30D and 50D test problems are presented in Table 12. From this table, it is clear that the number of problems for which the proposed algorithm obtained better average and median fitness values for 10D, 30D and 50D is higher than the corresponding numbers obtained by the state-of-the-art algorithms. Considering the Wilcoxon signed-rank test for the average results, LSAOS-DE is significantly better than all the other algorithms, except LSHADE for 10D and 50D and AMALGAM-SO for 30D. For the median results, the proposed algorithm showed superiority to all the other algorithms except CoDE for 10D, LSHADE for 10D and 50D, and AMALGAM-SO for 30D. Overall, the results obtained were clearly in favor of LSAOS-DE.

For further analysis, the Friedman test was performed to rank all the algorithms based on the average fitness values they achieved. The results shown in Table 11 indicate that, generally, LSAOS-DE ranked first, followed by LSHADE, MPEDE, SHADE, AMALGAM-SO, UMOEAs, JADE, CoDE, SaDE and EPSDE, respectively. As a further illustration, the convergence plots of the average results obtained during 51 runs of all the algorithms for the 30D problems are shown in Fig. 4, in which it is evident that the convergence speed of LSAOS-DE was the best.

4.5. Relative time complexity

This section reports the algorithmic complexity of our LSAOS-DE code, as defined in [21,22]. Table 13 shows the time complexity of LSAOS-DE computed for 10D, 30D, 50D and 100D on CEC2014 [21] and CEC2015 [22]. As defined in [21,22], T0 is the time taken to run the reference loop

    for i = 1:1,000,000
        x = 0.55 + (double)i; x = x + x; x = x/2; x = x*x;
        x = sqrt(x); x = log(x); x = exp(x); x = x/(x + 2);
    end

T1 is the time needed to execute 200,000 evaluations of benchmark function F18 for CEC2014 and F31 for CEC2015, respectively, in D dimensions, and T2 is the time needed to execute LSAOS-DE with 200,000 evaluations of the same function in D dimensions. T̂2 is the mean of the T2 values over 5 runs. The correlation coefficients (R) in Table 13 were computed to determine the relationship between (T̂2 − T1)/T0 and the problem dimension and, for both CEC2014 and CEC2015, indicate a strong positive linear relationship (i.e., (T̂2 − T1)/T0 scaled linearly with the number of dimensions).
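A sketch of how these quantities can be measured is shown below; only the T0 loop follows the definition in [21,22], while the benchmark-function and optimizer timings are placeholders.

```python
import math
import time

def timed(fn):
    """Wall-clock time of fn(), as used for T0, T1 and T2."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def t0_loop():
    # Reference computation defined in the CEC2014/CEC2015 reports [21,22]
    for i in range(1_000_000):
        x = 0.55 + float(i)
        x = x + x
        x = x / 2.0
        x = x * x
        x = math.sqrt(x)
        x = math.log(x)
        x = math.exp(x)
        x = x / (x + 2.0)

T0 = timed(t0_loop)
# T1 would time 200,000 bare evaluations of the benchmark function, T2_hat
# the mean over 5 full LSAOS-DE runs with the same evaluation budget, and the
# relative complexity is then (T2_hat - T1) / T0.
```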


Fig. 4. Convergence plots for different functions.
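For completeness, a minimal sketch of how convergence curves such as those in Figs. 3 and 4 can be drawn is given below, assuming each algorithm's history of best error values, f(x_best) − f(x*), per generation has been recorded.

```python
import matplotlib.pyplot as plt

def plot_convergence(histories, title):
    """histories: {algorithm name: list of best errors per generation}.
    Errors are drawn on a logarithmic scale, as is usual for such plots."""
    for name, errors in histories.items():
        plt.semilogy(range(len(errors)), errors, label=name)
    plt.xlabel('Generation')
    plt.ylabel('f(x_best) - f(x*)')
    plt.title(title)
    plt.legend()
    plt.show()
```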

5. Conclusion and future work

It is known that no single algorithm or operator is capable of solving all kinds of optimization problems as, although each may perform well in the early generations of a single run, its performance often deteriorates in later ones. Therefore, the selection of an appropriate algorithm or operator is not an easy task. During the last few decades, DE algorithms have shown performances superior to many other state-of-the-art algorithms for solving both constrained and unconstrained optimization problems. In this paper, the LSAOS-DE algorithm, which uses both a landscape measure and a performance-history measure to select appropriate DE mutation strategies from a pool of five operators, was presented. We investigated the effects of the parameters of the proposed algorithm, such as the initial population size, minimum population size and cycle length. To verify the performance of the proposed LSAOS-DE algorithm, comparisons were carried out with three DE-based single-operator algorithms, four multi-operator-based algorithms and two powerful multi-method-based algorithms on 45 bound-constrained numerical optimization problems from CEC2014 and CEC2015. For further comparison, the proposed algorithm was also compared with seven of its variants, in each of which only the selection mechanism was different. The comparisons showed that LSAOS-DE was superior to the state-of-the-art algorithms in most cases.

In future work, we will investigate using more than one landscape measure and a multi-method approach in which each method uses multiple operators. We will also determine the effect of using a local search method.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.ins.2017.08.028.

References

[1] H.J. Barbosa, H.S. Bernardino, A.M. Barreto, Using performance profiles for the analysis and design of benchmark experiments, in: Advances in Metaheuristics, Springer, 2013, pp. 21–36.


[2] B. Bischl, O. Mersmann, H. Trautmann, M. Preuß, Algorithm selection based on exploratory landscape analysis and cost-sensitive learning, in: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, ACM, 2012, pp. 313–320.
[3] Y. Borenstein, R. Poli, Decomposition of fitness functions in random heuristic search, in: Foundations of Genetic Algorithms, Springer, 2007, pp. 123–137.
[4] F. Caraffini, F. Neri, L. Picinali, An analysis on separability for memetic computing automatic design, Inf. Sci. (Ny) 265 (2014) 1–22.
[5] J. Cheng, G. Zhang, F. Neri, Enhancing distributed differential evolution with multicultural migration for global numerical optimization, Inf. Sci. (Ny) 247 (2013) 72–93.
[6] F. Chicano, G. Luque, E. Alba, Autocorrelation measures for the quadratic assignment problem, Appl. Math. Lett. 25 (4) (2012) 698–705.
[7] P.A. Consoli, Y. Mei, L.L. Minku, X. Yao, Dynamic selection of evolutionary operators based on online learning and fitness landscape analysis, Soft. Comput. (2016) 1–26.
[8] S. Das, S.S. Mullick, P. Suganthan, Recent advances in differential evolution – an updated survey, Swarm. Evol. Comput. 27 (2016) 1–30.
[9] S. Elsayed, N. Hamza, R. Sarker, Testing united multi-operator evolutionary algorithms-II on single objective optimization problems, in: Evolutionary Computation (CEC), 2016 IEEE Congress on, IEEE, 2016, pp. 2966–2973.
[10] S.M. Elsayed, R.A. Sarker, D.L. Essam, Multi-operator based evolutionary algorithms for solving constrained optimization problems, Comput. Oper. Res. 38 (12) (2011) 1877–1896.
[11] S.M. Elsayed, R.A. Sarker, D.L. Essam, N.M. Hamza, Testing united multi-operator evolutionary algorithms on the CEC2014 real-parameter numerical optimization, in: Evolutionary Computation (CEC), 2014 IEEE Congress on, IEEE, 2014, pp. 1650–1657.
[12] S. García, A. Fernández, J. Luengo, F. Herrera, Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power, Inf. Sci. (Ny) 180 (10) (2010) 2044–2064.
[13] R.W. Garden, A.P. Engelbrecht, Analysis and classification of optimisation benchmark functions and benchmark suites, in: IEEE Congress on Evolutionary Computation (CEC), 2014, IEEE, 2014, pp. 1641–1649.
[14] L.-A. Gordián-Rivera, E. Mezura-Montes, A combination of specialized differential evolution variants for constrained optimization, in: Advances in Artificial Intelligence – IBERAMIA 2012, Springer, 2012, pp. 261–270.
[15] J. Grobler, A.P. Engelbrecht, G. Kendall, V. Yadavalli, Heuristic space diversity control for improved meta-hyper-heuristic performance, Inf. Sci. (Ny) 300 (2015) 49–62.
[16] N. Hansen, A. Auger, S. Finck, R. Ros, Real-parameter black-box optimization benchmarking 2010: Experimental setup, INRIA, 2010, Technical report.
[17] N. Hansen, S. Finck, R. Ros, A. Auger, Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions, INRIA, 2009, Technical report.
[18] G. Iacca, F. Caraffini, F. Neri, Multi-strategy coevolving aging particle optimization, Int. J. Neural Syst. 24 (01) (2014) 1450008.
[19] K. Deb, Optimization for engineering design: Algorithms and examples, PHI Learning Pvt. Ltd., 2012.
[20] K. Li, A. Fialho, S. Kwong, Q. Zhang, Adaptive operator selection with bandits for a multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. 18 (1) (2014) 114–130.
[21] J. Liang, B. Qu, P. Suganthan, Problem definitions and evaluation criteria for the CEC 2014 special session and competition on single objective real-parameter numerical optimization, Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou, China and Technical Report, Nanyang Technological University, Singapore (2013).
[22] J. Liang, B. Qu, P. Suganthan, Q. Chen, Problem definitions and evaluation criteria for the CEC 2015 competition on learning-based real-parameter single objective optimization, Computational Intelligence Laboratory (2014).
[23] J. Liu, J. Lampinen, A fuzzy adaptive differential evolution algorithm, Soft Comput. 9 (6) (2005) 448–462.
[24] M. Lunacek, D. Whitley, The dispersion metric and the CMA evolution strategy, in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, ACM, 2006, pp. 477–484.
[25] L. Luo, X. Hou, J. Zhong, W. Cai, J. Ma, Sampling-based adaptive bounding evolutionary algorithm for continuous optimization problems, Inf. Sci. (Ny) 382 (2017) 216–233.
[26] K. Malan, A. Engelbrecht, Characterising the searchability of continuous optimisation problems for PSO, Swarm Intell. 8 (4) (2014) 275–302.
[27] K.M. Malan, A.P. Engelbrecht, Quantifying ruggedness of continuous landscapes using entropy, in: Proceedings of the Eleventh Conference on Congress on Evolutionary Computation, IEEE Press, 2009, pp. 1440–1447.
[28] K.M. Malan, A.P. Engelbrecht, Particle swarm optimisation failure prediction based on fitness landscape characteristics, in: Swarm Intelligence (SIS), 2014 IEEE Symposium on, IEEE, 2014, pp. 1–9.
[29] R. Mallipeddi, P.N. Suganthan, Q.-K. Pan, M.F. Tasgetiren, Differential evolution algorithm with ensemble of parameters and mutation strategies, Appl. Soft. Comput. 11 (2) (2011) 1679–1696.
[30] O. Mersmann, B. Bischl, H. Trautmann, M. Preuss, C. Weihs, G. Rudolph, Exploratory landscape analysis, in: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, ACM, 2011, pp. 829–836.
[31] M.A. Muñoz, M. Kirley, S.K. Halgamuge, A meta-learning prediction model of algorithm performance for continuous optimization problems, in: Parallel Problem Solving from Nature – PPSN XII, Springer, 2012, pp. 226–235.
[32] M.A. Muñoz, Y. Sun, M. Kirley, S.K. Halgamuge, Algorithm selection for black-box continuous optimization problems: a survey on methods and challenges, Inf. Sci. (Ny) 317 (2015) 224–245.
[33] F.V. Nepomuceno, A.P. Engelbrecht, A self-adaptive heterogeneous PSO inspired by ants, in: International Conference on Swarm Intelligence, Springer, 2012, pp. 188–195.
[34] F. Neri, V. Tirronen, Recent advances in differential evolution: a survey and experimental analysis, Artif. Intell. Rev. 33 (1–2) (2010) 61–106.
[35] M.G. Omran, A. Salman, A.P. Engelbrecht, Self-adaptive differential evolution, in: Computational Intelligence and Security, Springer, 2005, pp. 192–199.
[36] E. Pitzer, M. Affenzeller, A comprehensive survey on fitness landscape analysis, in: Recent Advances in Intelligent Engineering Systems, Springer, 2012, pp. 161–191.
[37] I. Poikolainen, F. Neri, F. Caraffini, Cluster-based population initialization for differential evolution frameworks, Inf. Sci. (Ny) 297 (2015) 216–235.
[38] A.K. Qin, V.L. Huang, P.N. Suganthan, Differential evolution algorithm with strategy adaptation for global numerical optimization, Evol. Comput., IEEE Trans. 13 (2) (2009) 398–417.
[39] L. Rutkowski, Computational Intelligence: Methods and Techniques, Springer, 2008.
[40] K.M. Sallam, R.A. Sarker, D.L. Essam, S.M. Elsayed, Neurodynamic differential evolution algorithm and solving CEC2015 competition problems, in: Evolutionary Computation (CEC), 2015 IEEE Congress on, IEEE, 2015, pp. 1033–1040.
[41] R. Sarker, J. Kamruzzaman, C. Newton, Evolutionary optimization (EvOpt): a brief review and analysis, Int. J. Comput. Intell. Appl. 3 (04) (2003) 311–330.
[42] E.K. da Silva, H.J. Barbosa, A.C. Lemonge, An adaptive constraint handling technique for differential evolution with dynamic use of variants in engineering optimization, Optimization and Engineering 12 (1–2) (2011) 31–54.
[43] K.C. Steer, A. Wirth, S.K. Halgamuge, Information theoretic classification of problems for metaheuristics, in: Simulated Evolution and Learning, Springer, 2008, pp. 319–328.
[44] R. Storn, K. Price, Differential evolution – a simple and efficient adaptive scheme for global optimization over continuous spaces, International Computer Science Institute, Berkeley, CA (1995).
[45] Y. Sun, S.K. Halgamuge, M. Kirley, M. Munoz, On the selection of fitness landscape analysis metrics for continuous optimization problems, in: Information and Automation for Sustainability (ICIAfS), 2014 7th International Conference on, IEEE, 2014, pp. 1–6.
[46] A.M. Sutton, D. Whitley, M. Lunacek, A. Howe, PSO and multi-funnel landscapes: how cooperation might limit exploration, in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, ACM, 2006, pp. 75–82.


[47] R. Tanabe, A. Fukunaga, Evaluating the performance of SHADE on CEC 2013 benchmark problems, in: Evolutionary Computation (CEC), 2013 IEEE Congress on, IEEE, 2013, pp. 1952–1959.
[48] R. Tanabe, A.S. Fukunaga, Improving the search performance of SHADE using linear population size reduction, in: Evolutionary Computation (CEC), 2014 IEEE Congress on, IEEE, 2014, pp. 1658–1665.
[49] M. Tomassini, L. Vanneschi, P. Collard, M. Clergue, A study of fitness distance correlation as a difficulty measure in genetic programming, Evol. Comput. 13 (2) (2005) 213–239.
[50] V.K. Vassilev, T.C. Fogarty, J.F. Miller, Information characteristics and the structure of landscapes, Evol. Comput. 8 (1) (2000) 31–60.
[51] J.A. Vrugt, B.A. Robinson, Improved evolutionary optimization from genetically adaptive multimethod search, Proc. Natl. Acad. Sci. 104 (3) (2007) 708–711.
[52] J.A. Vrugt, B.A. Robinson, J.M. Hyman, Self-adaptive multimethod search for global optimization in real-parameter spaces, Evol. Comput., IEEE Trans. 13 (2) (2009) 243–259.
[53] Y. Wang, Z. Cai, Q. Zhang, Differential evolution with composite trial vector generation strategies and control parameters, Evol. Comput., IEEE Trans. 15 (1) (2011) 55–66.
[54] G. Wu, R. Mallipeddi, P. Suganthan, R. Wang, H. Chen, Differential evolution with multi-population based ensemble of mutation strategies, Inf. Sci. (Ny) 329 (2016) 329–345.
[55] K.Q. Ye, Orthogonal column latin hypercubes and their application in computer experiments, J. Am. Stat. Assoc. 93 (444) (1998) 1430–1439.
[56] D. Zaharie, Parameter adaptation in differential evolution by controlling the population diversity, in: Proceedings of the International Workshop on Symbolic and Numeric Algorithms for Scientific Computing, 2002, pp. 385–397.
[57] A. Zamuda, J. Brest, Population reduction differential evolution with multiple mutation strategies in real world industry challenges, in: Swarm and Evolutionary Computation, Springer, 2012, pp. 154–161.
[58] J. Zhang, A.C. Sanderson, JADE: adaptive differential evolution with optional external archive, Evol. Comput., IEEE Trans. 13 (5) (2009) 945–958.
[59] Z. Zhao, J. Yang, Z. Hu, H. Che, A differential evolution algorithm with self-adaptive strategy and control parameters based on symmetric latin hypercube design for unconstrained optimization problems, Eur. J. Oper. Res. 250 (1) (2016) 30–45.
[60] L.M. Zheng, S.X. Zhang, S.Y. Zheng, Y.M. Pan, Differential evolution algorithm with two-step subpopulation strategy and its application in microwave circuit designs, IEEE Trans. Ind. Inf. 12 (3) (2016) 911–923.