Aggregate meta-models for evolutionary multiobjective and many-objective optimization


Neurocomputing 116 (2013) 392–402


Martin Pilát a,b,*, Roman Neruda b

a Faculty of Mathematics and Physics, Charles University in Prague, Malostranské náměstí 25, Prague, Czech Republic
b Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, Prague 8, Czech Republic

Available online 10 October 2012

Abstract

Evolutionary algorithms are among the best multiobjective optimizers; however, they need a large number of function evaluations. In this paper, a meta-model based approach to reducing the needed number of function evaluations is presented: local aggregate meta-models are used in a memetic operator. The algorithm is first discussed from a theoretical point of view, and then it is shown that the meta-models greatly reduce the number of function evaluations. The approach is compared to a similar one with a single global meta-model, as well as to the more traditional NSGA-II and ε-IBEA. Moreover, it is shown that aggregate meta-models work even for a larger number of objectives and should therefore be considered when designing many-objective evolutionary algorithms.

Keywords: Evolutionary algorithms; Multiobjective optimization; Many-objective optimization; Surrogate models; Meta-models; Memetic algorithm

1. Introduction

Many real-life optimization tasks require optimizing multiple conflicting objectives at once. It has been shown and is widely accepted that multiobjective evolutionary algorithms (MOEAs) are among the best methods for multiobjective optimization. In the past years, several multiobjective evolutionary algorithms [1–4] were proposed and used to deal with these problems. However, most of them require a large number of evaluations of each objective function, which makes them problematic for solving real-life problems: these may have complex objective functions whose evaluations are expensive, either in terms of time or money.

There are two main approaches to making MOEAs more practically usable. One of them is parallelization, the other is the use of so-called meta-models. Parallel evolutionary algorithms [5–7] use the power of multiple computers to get the results faster. This approach only helps when the objective functions are hard to compute and one has enough free computational resources; the number of evaluations is generally not decreased, nor are any costs associated with them.

A meta-model (also referred to as a surrogate model or a response surface model) is a simplified and cheaper approximation of the real objective function. The use of meta-models aims at lowering the number of objective function evaluations needed to obtain the final solution: the approximation is used instead of the complex and expensive original function. These models can be constructed in several ways. One approach, used especially in engineering, is to consider a completely different physical model (such as removing the less important variables and using different approximations). Other approaches use known precise values to create the approximate model; here, models such as polynomial regression [8], Gaussian processes [9] or similar can be used. Often, models from the field of computational intelligence are utilized in this context, including neural networks [8] and support vector machines [10].

Although meta-models are able to lower the number of evaluations of the original objective functions, they also add overhead: the meta-models need to be trained and, of course, also evaluated several times. Under some circumstances, this overhead might be larger than the time saved by using the meta-models, and these issues should be considered before a meta-model based algorithm is applied to practical tasks. The situation is quite simple if only computation time is important: it is sufficient to test whether the required quality of solutions is attained faster with or without the use of meta-models. The reduction in the number of objective function evaluations becomes even more important if there are financial costs involved in the evaluation (e.g. if a real-life experiment needs to be run [11]). In this case almost any reduction is desirable, as computation time is usually cheaper than the cost of evaluating the objective function.

The goals of this paper are

- to introduce a multiobjective evolutionary algorithm with local aggregate meta-models (i.e. meta-models which are different for each individual and predict all the objectives at once),
- to show that the local meta-models work better than a single global meta-model,
- to show that the algorithm can be used to reduce the computation time even for fast objective functions,
- to show that aggregate meta-models improve the performance of many-objective optimizers and reduce the number of objective function evaluations,
- to show that aggregate meta-models are a natural hybridization of different many-objective techniques,
- to encourage the research of aggregate meta-models and memetic algorithms in the field of evolutionary multiobjective and many-objective optimization.

The rest of the paper is organized as follows: in the next section we briefly discuss the basics of multiobjective evolutionary algorithms and describe the use of meta-models in evolutionary algorithms, especially in evolutionary multiobjective optimization. In Section 3 the problem of multiobjective optimization and the related terminology are defined and discussed. In Section 4 we describe our approach, and in Section 5 we discuss under which circumstances the algorithm speeds up the search. In Sections 6 and 7 we present the results of the algorithm on bi-objective and many-objective problems, respectively. Finally, Section 8 concludes the paper.

2. Related work

During the last decades, several approaches have been presented to deal with single-objective optimization; among them are, aside from evolutionary algorithms, particle swarm optimizers [12–14], gradient methods and others. All of them now coexist and are used to solve single-objective problems. However, in the field of multiobjective optimization, evolutionary algorithms are by far the most often used and studied method (several particle swarm methods have been surveyed in [15]).

2.1. Multiobjective evolutionary algorithms

Multiobjective evolutionary algorithms (MOEAs) usually do not return a single solution; rather, the whole population in the last generation (or an external archive) is returned as the solution. The algorithms differ in how they select individuals for the next generation, and they can be divided into three groups based on the type of selection they perform.

The first group, represented by the oldest multiobjective evolutionary algorithm, uses some kind of scalarization, or aggregation, during the fitness assignment. VEGA [16], the oldest MOEA, used a different objective function in each generation, thus looking for compromise solutions. However, this often leads to convergence towards the optima of the respective objective functions, and only a few compromise solutions remain in the population. A newer algorithm from the same group, called MSOPS [17], uses weighted sums of objectives to create a ranking matrix which is later used during the selection (as a simplification: objective vectors which yield better values of the weighted sums more often are better and have a higher probability of being selected).

Another group of algorithms, represented, e.g., by the well-known NSGA-II algorithm [1], uses the dominance relation (see Definition 3) during the selection process. Usually, the population is divided into so-called non-dominated fronts: individuals which are not dominated by any other individual in the population are assigned front number 1; these are temporarily removed, and the individuals non-dominated by the rest are assigned front number 2; this process is iterated as long as there are individuals left in the population. Individuals from fronts with lower numbers are then selected first. There are usually other criteria to discriminate between individuals in the same front; in the case of NSGA-II it is the so-called crowding distance, which roughly corresponds to the distance to the closest individual in the objective space (individuals from less crowded regions are given preference).

Yet another group of algorithms is based on indicators, which usually somehow refine the dominance relation. One of the algorithms representing this group is IBEA [3]. This algorithm uses a binary indicator which compares two individuals: the indicator value of each pair of individuals is computed, and an individual $i$ is assigned the fitness
$$F(i) = \sum_{j \in P \setminus \{i\}} -e^{-I(\{j\},\{i\})/\kappa},$$
where $P$ is the population of individuals and $\kappa$ is a scaling factor which has to be set in advance. The purpose of the exponential is to amplify the differences between dominated and non-dominated individuals. An example of such an indicator is the additive $\epsilon+$ indicator, which expresses how much an objective vector needs to be moved to become dominated by another vector. The following definition states this formally.

Definition 1. Let $A$, $B$ be two sets of decision vectors. Then
$$I_{\epsilon+}(A,B) = \min_{\epsilon \in \mathbb{R}} \{\forall \vec{x} \in B\ \exists \vec{y} \in A : f_i(\vec{y}) - \epsilon \le f_i(\vec{x}) \text{ for all } i \in \{1,\dots,n\}\}.$$
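To make the fitness assignment concrete, the following minimal Python sketch (ours, not from the paper) computes the additive $\epsilon+$ indicator for sets of objective vectors and the resulting IBEA-style fitness; objective vectors are plain lists and minimization is assumed.

```python
import math

def eps_plus(A, B):
    """Additive epsilon indicator I_eps+(A, B) for minimization: the
    smallest eps such that, after improving every vector in A by eps in
    each objective, every vector in B is weakly dominated by some
    vector in A."""
    return max(
        min(max(a_i - b_i for a_i, b_i in zip(a, b)) for a in A)
        for b in B
    )

def ibea_fitness(population, kappa):
    """F(i) = sum over j != i of -exp(-I({j}, {i}) / kappa)."""
    return [
        sum(-math.exp(-eps_plus([fj], [fi]) / kappa)
            for j, fj in enumerate(population) if j != i)
        for i, fi in enumerate(population)
    ]
```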

Indicator based MOEAs are among the most modern ones. Some of them even use the hypervolume indicator directly; in this case, they must somehow overcome the complexity of the computation of this indicator to be able to scale well to problems with many objective functions. One such algorithm, HypE [4], solves this problem by using Monte Carlo sampling to compute the hypervolume indicator.

2.2. Meta-models in evolutionary algorithms

The use of meta-models has quite a long tradition in the field of single-objective optimization. Here, the meta-models may be used in the following ways:

- Meta-models are used directly instead of the fitness function: the fitness function is replaced by the meta-model, and the meta-model is optimized. In the extreme case, this is done at the beginning and the model never changes thereafter; more usually, the model is updated after a given number of generations [18,19].
- Meta-models are used to pre-evaluate individuals: each individual is evaluated by the meta-model to estimate its quality, but only the best individuals are evaluated by the original fitness function [20,21] (see the sketch after this list).
- Meta-models are used in some kind of memetic operator: this operator takes some of the individuals and moves them closer to a (local) optimum of the meta-model. Gradient methods and other local optimization methods (even evolutionary algorithms) may be used in this case [22].
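As an illustration of the second strategy, the hypothetical sketch below filters offspring by a surrogate's prediction so that only the k most promising individuals receive a real evaluation; the function names are ours.

```python
def preselect(offspring, surrogate, evaluate, k):
    """Pre-evaluation: rank offspring by the surrogate's prediction
    (higher predicted value = better) and spend expensive real
    evaluations only on the top k individuals."""
    ranked = sorted(offspring, key=surrogate, reverse=True)
    return [(ind, evaluate(ind)) for ind in ranked[:k]]
```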

In the case of multiobjective optimization, the situation is more challenging, as there are multiple objective functions, and the approaches differ in how and what they predict using the meta-model. In one of the first approaches [23], Voutchkov and Keane used NSGA-II [1] and replaced the objective functions with their meta-models. A more advanced method was proposed in [24]: the authors present various methods for the pre-evaluation of individuals, using the estimated increase in hypervolume [25] based on a meta-model for each objective function. OEGADO [26] uses a single-objective meta-model based algorithm for each of the objectives; information exchange between the single-objective algorithms occurs at intervals, which helps to find trade-off individuals. ParEGO [27] uses a weighted sum of the objective functions to perform memetic local search; the weights are changed after each iteration.

In [10] the authors describe an aggregate meta-model based on the combination of a One-Class SVM and support vector regression. Their model is trained to differentiate between dominated and non-dominated individuals, and it is used during the evolution to pre-evaluate the individuals and drop those that are not promising. The same authors proposed in [28] a similar approach based on rank-based SVM [29]. An alternative strategy is represented by the Pareto-following variance operator [30], where the behavior of the underlying optimizer is approximated instead of the objective functions.

Although the memetic variant is also possible in the multiobjective setting, only a few references were found in the literature which deal with meta-model assisted multiobjective memetic algorithms. In [8] the authors propose such an algorithm: they use a meta-model (in this case RBF networks) for each of the objective functions; during the local search, one of the objectives is selected for refinement, and a local meta-model is trained and used during the local search. In [31] the authors propose another method: they use a single-objective meta-model assisted evolutionary algorithm in the local search phase. Two different local meta-models are used, both trained to approximate a weighted sum of the objectives: one is an ensemble model, the other a low order polynomial. Two single-objective algorithms are run to find the optima of the respective models, which are then precisely evaluated; a selection procedure then decides which of the individuals (if any) is added to the population. The authors of [32] propose a method based on RBF networks as local meta-models: they cluster the training set with the k-means algorithm before the model training and train a meta-model for each of the clusters; the meta-models approximate the objective functions directly. There is also a memetic variant of MOEA/D [33] called MOEA/D-EGO [9]. MOEA/D uses decomposition (weighted aggregation) to sample the Pareto front; MOEA/D-EGO uses Gaussian processes to model each of the objective functions and then derives a model for each of the decomposed problems. These models are then optimized using evolutionary algorithms, and some of the best individuals found are placed back into the population of the external algorithm. Finally, we proposed a method [34] which uses distance based aggregate models in a memetic algorithm to find new non-dominated solutions.

In many-objective optimization, i.e. the optimization of 5 or more objectives at once, there are additional problems beyond those that appear in multiobjective optimization. One of the most important is that the dominance relation used in many multiobjective algorithms like NSGA-II [1] and SPEA2 [2] can no longer discriminate between good and bad individuals, as most of the individuals are mutually non-dominated [35]. There are several approaches which deal with this problem. One of them is to use scalarization functions (e.g. a weighted sum of the objectives, the distance to a goal vector, etc.) to rank the individuals in the population; this approach is used in MSOPS [17] and MOEA/D [33]. Another option is to use a different approach to compute the fitness instead of the dominance relation: IBEA [3] uses indicators to compare two individuals, and the fitness is assigned based on the values of these indicators; HypE [4] optimizes the hypervolume directly, using Monte Carlo sampling to avoid the complexity of the hypervolume computation. There are, to the best of our knowledge, no publications which deal with meta-models in the field of many-objective optimization.

In this paper, we show that aggregate meta-models may be another approach which improves the performance of evolutionary optimizers for problems with many objective functions.

3. Preliminaries

Contrary to single-objective optimization, in multiobjective optimization there are multiple objective functions which shall be optimized simultaneously. These objective functions are usually conflicting, and thus there is no single solution which would be optimal for all of them; this leads to a set of so-called Pareto optimal solutions. The following definitions introduce the multiobjective optimization problem and the Pareto dominance relation, which is used to compare two potential solutions to the problem.

Definition 2. The multiobjective optimization problem (MOP) is a quadruple $\langle D, O, \vec{f}, C \rangle$, where

- $D$ is the decision space,
- $O \subseteq \mathbb{R}^n$ is the objective space,
- $C = \{g_1, \dots, g_m\}$, where $g_i : D \to \mathbb{R}$, is the set of constraint functions (constraints) defining the feasible space $F = \{\vec{x} \in D \mid g_i(\vec{x}) \le 0\}$,
- $\vec{f} : F \to O$ is the vector of $n$ objective functions (objectives), $\vec{f} = (f_1, \dots, f_n)$, $f_i : F \to \mathbb{R}$.

$\vec{x} \in D$ is called the decision vector, and $\vec{y} \in O$ is denoted as the objective vector. In the field of multiobjective optimization, problems with more than four objectives are often called many-objective, as this higher number of objectives poses additional challenges for the MOEAs (e.g. the dominance relation defined in the next paragraph loses its power to discriminate between good and bad individuals, as most of them are mutually incomparable).

To compare two decision vectors, we define the so-called Pareto dominance relation: if one vector is better (has lower objective values) in all of the objective functions, we say it dominates the other vector. This is formally stated in the following definition.

Definition 3. Given decision vectors $\vec{x}, \vec{y} \in D$, we say that

- $\vec{x}$ weakly dominates $\vec{y}$ ($\vec{x} \preceq \vec{y}$) if $\forall i \in \{1, \dots, n\} : f_i(\vec{x}) \le f_i(\vec{y})$,
- $\vec{x}$ does not dominate $\vec{y}$ ($\vec{x} \npreceq \vec{y}$) if $\vec{y} \preceq \vec{x}$, or $\vec{x}$ and $\vec{y}$ are incomparable (i.e. neither weakly dominates the other).
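In code, Definition 3 amounts to the following predicates over objective vectors (a Python sketch of ours, assuming minimization):

```python
def weakly_dominates(fx, fy):
    """x weakly dominates y: x is no worse than y in every objective."""
    return all(a <= b for a, b in zip(fx, fy))

def does_not_dominate(fx, fy):
    """x does not dominate y: y weakly dominates x, or the two vectors
    are incomparable (neither weakly dominates the other)."""
    return weakly_dominates(fy, fx) or (
        not weakly_dominates(fx, fy) and not weakly_dominates(fy, fx))
```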

Now we can state that the goal of multiobjective optimization is to find those decision vectors which are minimal with respect to the Pareto dominance relation.

Definition 4. The solution of a MOP is the Pareto (optimal) set
$$P^* = \{\vec{x} \in F \mid \forall \vec{y} \in F, \vec{y} \ne \vec{x} : \vec{y} \npreceq \vec{x}\}.$$
The projection of $P^*$ under $\vec{f}$ is called the Pareto optimal front.

The Pareto optimal set is usually infinite in continuous optimization, and thus we usually seek a finite approximation of this set. This approximation should be close to the Pareto set (ideally it is a subset of it) and should also be evenly distributed along the Pareto front. We can extend the Pareto dominance relation to such approximations and compare them with this relation; however, as the ordering is only partial, there would be pairs of approximations which are mutually incomparable (in fact, most such pairs would be incomparable). As we want to compare approximations which are the solutions found by multiobjective optimizers, we need a way to compare any two sets. During the past years, many measures were proposed to compare such Pareto set approximations; one of the most often used is the hypervolume indicator [36], which expresses the hypervolume of the part of the objective space that is dominated by the solutions.

Definition 5. Let $R \subset O$ be a reference set. The hypervolume metric $S$ is defined as
$$S(A) = \lambda(H(A,R)),$$
where

- $H(A,R) = \{\vec{x} \in O \mid \exists \vec{a} \in A\ \exists \vec{r} \in R : \forall i \in \{1, \dots, n\} : f_i(\vec{a}) \le x_i \le r_i\}$, where $f_i$ is the $i$-th objective function, and
- $\lambda$ is the Lebesgue measure with $\lambda(H(A,R)) = \int_O 1_{H(A,R)}(z)\,dz$, where $1_{H(A,R)}$ is the characteristic function of the set $H(A,R)$.

The reference set bounds the hypervolume from above; it usually contains only a single reference point. We should note here that although the definition of the hypervolume indicator is quite simple, its computation is known to be #P-complete [37], and its complexity grows exponentially with the number of objectives.
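Because of this complexity, the hypervolume is in practice often estimated by sampling, as HypE does and as we do in Section 7. A minimal Monte Carlo sketch (ours), assuming minimization, a single reference point, and a known lower bound of the interesting region of the objective space:

```python
import random

def hypervolume_mc(front, ref_point, lower_bound, n_samples=100_000):
    """Estimate the hypervolume dominated by `front` and bounded above
    by `ref_point`: sample the box [lower_bound, ref_point] uniformly
    and count the fraction of points weakly dominated by a solution."""
    box_volume = 1.0
    for lo, hi in zip(lower_bound, ref_point):
        box_volume *= hi - lo
    hits = 0
    for _ in range(n_samples):
        z = [random.uniform(lo, hi) for lo, hi in zip(lower_bound, ref_point)]
        if any(all(a_i <= z_i for a_i, z_i in zip(a, z)) for a in front):
            hits += 1
    return box_volume * hits / n_samples
```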

4. Algorithm description

In the previous paper [34] we proposed a multiobjective memetic algorithm with an aggregate meta-model (ASM-MOMA). Aggregate means that all of the objective functions are modeled using only a single model; this model expresses the overall quality of a particular individual instead of the values of each objective function. The algorithm is able to reduce the number of required evaluations of the objective functions by a factor of 5–10 on most problems. ASM-MOMA uses a single global meta-model, trained after each generation, as the fitness function during the local search. One of the questions we stated in that paper was whether local models would improve the convergence speed of the algorithm any further.

In this paper, we propose a new variant of ASM-MOMA with local models instead of a single global one; we call this variant LAMMA. The main difference between LAMMA and other multiobjective evolutionary algorithms is the addition of a special memetic operator which performs local search on some of the newly generated individuals (the generation of the new individuals is handled by the respective MOEA to which this operator is added). The operator uses a meta-model constructed from previously evaluated points in the decision space, for which the values of the objective functions are known. The meta-model is trained to predict the distance to the currently known non-dominated solutions. Moreover, as an addition to ASM-MOMA, in LAMMA the points do not have the same weight: those that are closer to the locally optimized one are considered more important during the model building phase, see Eq. (1) for details. The main idea is that points closer to the known Pareto front are more interesting during the run of the algorithm, and the memetic operator moves the individuals closer to the Pareto front. The purpose of the meta-model is not to precisely predict the value but rather to provide a general direction in which the memetic search should proceed.

To obtain a training set for the meta-models, we also added an external archive of individuals with known objective values. This archive is updated after each generation when new individuals are added, and at the same time the archive is truncated to ensure it does not grow indefinitely. The following sections detail the important parts of the algorithm. The main loop (see Algorithm 1) is essentially a generic MOEA with an added memetic operator.

Algorithm 1. The main loop of the algorithm.

Require G_init: the number of initial generations run without the memetic operator
t ← 0
Initialize random population P_0
Initialize empty archive A
while the termination criterion is not met do
    Use crossover and mutation to create a new population P'
    if t ≥ G_init then
        Use the individuals in the archive A to train a new model M_t
        Use the memetic operator to improve each of the newly created individuals in P' with probability p_mem
    end if
    Compute the objective values of each new individual in P'
    Add all new individuals to the archive and truncate it
    P_{t+1} ← selected individuals from P_t ∪ P'
    t ← t + 1
end while
return the non-dominated individuals from the population

4.1. Meta-model construction

We train a dedicated model for each individual $I$ which shall be locally optimized by the memetic operator. For such an individual $I$ we create a weighted training set
$$T_I = \left\{\langle (x_i, y_i), w_i \rangle \;\middle|\; y_i = -d(x_i, P),\ w_i = \frac{1}{1 + \lambda d(x_i, I)}\right\}, \qquad (1)$$
where $d(x,y)$ is the Euclidean distance of individuals $x$ and $y$ in the decision space, $P$ is the set of non-dominated individuals in the archive, and $d(x,P)$ is the distance of individual $x$ to the closest point in the set $P$. $\lambda$ is a parameter which controls the locality of the model: larger values of $\lambda$ lead to a more local model, whereas lower values lead to a more global one. The points which are closer to the individual $I$ are more important during the training of the model; this distance weighting adds some locality to the models trained for each individual.

The training set is constructed in such a way that for the individuals closer to the currently known Pareto front the meta-model should return larger values. This fact is used during the local search phase (which uses the meta-model as a fitness function). The target value of the model depends only on the distance of the individual from the Pareto front: the non-dominated points have the value of 0.0, and any dominated points have negative values. Ideally, after the local search phase, there would be new non-dominated points in the population, which should have positive values predicted by the model. The model does not respect the dominance relation: a point dominated by many others may have a higher target value than another point further from the Pareto front. We also experimented with models based on the number of the non-dominated front in which the particular individual lies, but such models did not work well. The distance based models described in this paper provide better information about the search space and guide the local search algorithm towards the Pareto front.
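A sketch of the construction of $T_I$ (ours; it assumes individuals carry a decision vector `x` and an objective vector `f`, both plain lists, and minimization):

```python
import math

def dominates(fx, fy):
    """Strict Pareto dominance for minimization."""
    return all(a <= b for a, b in zip(fx, fy)) and any(
        a < b for a, b in zip(fx, fy))

def build_training_set(archive, target, lam):
    """Weighted training set from Eq. (1): each archive point x_i gets
    the target value y_i = -d(x_i, P) (0 for non-dominated points) and
    the weight w_i = 1 / (1 + lam * d(x_i, target))."""
    P = [a for a in archive if not any(dominates(b.f, a.f) for b in archive)]
    rows = []
    for a in archive:
        y = -min(math.dist(a.x, p.x) for p in P)
        w = 1.0 / (1.0 + lam * math.dist(a.x, target.x))
        rows.append((a.x, y, w))
    return rows
```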


Algorithm 2. Meta-model training.

Require the archive of evaluated individuals A
Require the optimized individual I
Require the locality parameter λ
Initialize empty training set T_I and meta-model M_I
N ← the non-dominated individuals in the archive
for each individual x_i in the archive A do
    d(x_i, N) ← the distance of x_i to the closest individual in N
    d(x_i, I) ← the distance of x_i to the optimized individual I
    Add ⟨(x_i, −d(x_i, N)), 1/(1 + λ d(x_i, I))⟩ into T_I
end for
Train the model M_I on the data T_I
return the trained meta-model M_I
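For concreteness, the weighted fit of Algorithm 2 can be reproduced with an off-the-shelf regressor; the sketch below uses scikit-learn's linear regression with sample weights (instead of the Weka models used in the paper) and consumes the rows produced by the `build_training_set` sketch above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def train_local_model(rows):
    """Fit a weighted linear-regression meta-model on rows (x, y, w);
    returns a callable predicting the quality of a decision vector."""
    X = np.array([x for x, _, _ in rows])
    y = np.array([t for _, t, _ in rows])
    w = np.array([wi for _, _, wi in rows])
    model = LinearRegression().fit(X, y, sample_weight=w)
    return lambda x: float(model.predict(np.asarray(x).reshape(1, -1))[0])
```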

4.2. Local search

In the local search phase, during the run of the memetic operator (see Algorithm 3), we use another evolutionary algorithm (this time a single-objective one) to find better points in the neighborhood of each individual. The algorithm runs only for a few generations and uses only meta-model evaluations. The newly found individuals are placed back into the population. During the initialization of the local search, the individual which should be optimized is inserted into the initial population, and its variables are perturbed to create the rest of the initial population.

Note that any other optimization method could in theory be used for finding the optima of the surrogate model; however, we chose the evolutionary algorithm for two reasons: it does not need any assumptions about the meta-model used, and due to its randomized nature, it provides multiple different points, which leads to better diversity in the population of the external algorithm. Although the complexity of the evolutionary algorithm is larger than the complexity of other methods, the local search is still limited mainly by the time needed for the training of the model, as we shall see in Section 5.

Algorithm 3. Memetic operator.

Require the trained meta-model M
Require the individual I to be improved
Create a new population by perturbing the values of the individual I
Add I to the population
Use an evolutionary algorithm with the meta-model as the fitness function to improve the individual
return the best individual found
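A minimal sketch of the memetic operator (ours; a mutation-only internal EA for brevity, whereas the paper uses the same SBX and polynomial mutation operators as the external algorithm). No real objective evaluations are spent here; the meta-model is the only fitness function.

```python
import random

def memetic_operator(ind, model, n_gens=30, pop_size=50, sigma=0.1):
    """Local search on the surrogate: `model` maps a decision vector
    to its predicted quality (higher = closer to the known front)."""
    def perturb(x):
        return [v + random.gauss(0.0, sigma) for v in x]

    pop = [list(ind)] + [perturb(ind) for _ in range(pop_size - 1)]
    for _ in range(n_gens):
        children = [perturb(random.choice(pop)) for _ in range(pop_size)]
        pop = sorted(pop + children, key=model, reverse=True)[:pop_size]
    return pop[0]  # the best individual found
```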

4.3. Archive

The algorithm uses an archive of previously evaluated individuals, which is used in the model building phase. The algorithm needs a few initial generations to fill the archive with enough individuals to build the meta-model. This number of generations depends on the number of variables, the number of individuals in the population, and the complexity of the chosen model. Generally, it corresponds to the number of free parameters of the model: for linear regression this is the dimension D of the search space, and therefore there have to be at least D individuals in the archive before the linear regression model can be trained. The minimum required number of individuals in the archive can be determined in advance based on the model used; however, usually more points are desired in order to obtain more precise models.

The size of the archive should be kept under a certain limit to prevent large memory usage, so the archive is truncated after each generation; in this paper we keep at most 400 individuals in the archive. The truncation process is very simple: random individuals are selected for removal from the archive (see the sketch below). This ensures that individuals from more recent generations are more likely to be in the archive and thus to be used to build the meta-model. For a more detailed discussion of the archive truncation see our previous paper [34].
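A sketch of the update-and-truncate step, assuming the archive is a plain list:

```python
import random

def update_archive(archive, new_individuals, max_size=400):
    """Add the newly evaluated individuals, then randomly drop entries
    until the size limit holds; older entries have survived more
    truncation rounds, so recent individuals are more likely present."""
    archive.extend(new_individuals)
    while len(archive) > max_size:
        archive.pop(random.randrange(len(archive)))
    return archive
```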

5. Modeling overhead

Generally, when talking about surrogate modeling, the assumption is that the fitness evaluations take a long time and that the complexity of the other parts of the algorithm is therefore negligible. The proposed algorithm uses quite a large number of meta-model evaluations and even of meta-model trainings; in this section we discuss how long it takes for the algorithm to create the models and make all the evaluations, and therefore how long the evaluation of the real fitness function must take in order to hide this overhead. In the equations below we use the notation defined in Table 1.

We ignore the time needed to evaluate the fitness function once the values of the objective functions are known, as well as the time consumed by the genetic operators. Including these would make the difference between the memetic and original variants even larger, as their time is constant in each generation and the memetic algorithm should need fewer generations. Note that both the initial and the final populations need to be evaluated, so the factor $G_e$ is one higher than the number of generations of the external evolutionary algorithm (the same holds for $G_i$ and the internal evolutionary algorithm).

The original algorithm takes the time
$$T_{orig} = G_e P_e T_o$$
to finish. The use of the meta-model reduces the number of generations needed but, on the other hand, adds the time to train and evaluate the local models. It takes
$$T_{meta} = R G_e P_e T_o + R G_e P_e p_{mem} (T_t + P_i G_i T_m)$$
to find a solution of the same quality. The first part is identical to the original algorithm with the reduced number of generations; the second part corresponds to the training and the local search. Now, we would like to know how much faster the meta-model training and evaluations need to be (compared to the original objective evaluation) for this method to speed up the optimization, i.e. under which conditions the inequality $T_{meta} < T_{orig}$ holds.

Table 1. The notation used in the equations.

| Symbol | Meaning |
|---|---|
| T_o | The time of an objective function evaluation |
| T_t | The time of meta-model training |
| T_m | The time of meta-model evaluation |
| G_e | Number of evaluated generations (external EA) |
| G_i | Number of evaluated generations (internal EA) |
| P_e | External EA population size |
| P_i | Internal EA population size |
| p_mem | Memetic operator probability |
| R | Reduction in the number of evaluations |


Table 2. Times needed for training and evaluation of selected meta-models, in seconds.

| Model | Training (T_t) | Evaluation (T_m) |
|---|---|---|
| Linear regression | 0.142 | 8.46 × 10⁻⁷ |
| Support vector reg. | 0.328 | 7.14 × 10⁻⁷ |
| Multilayer perceptron | 3.75 | 1.80 × 10⁻⁵ |

Table 3. Limit evaluation threshold T_o for which a real time reduction is achieved, for various values of R, in milliseconds.

| Model | R = 0.1 | R = 0.2 | R = 0.5 | R = 0.8 | R = 0.9 |
|---|---|---|---|---|---|
| Linear regression | 4 | 8 | 30 | 144 | 324 |
| Support vector reg. | 9 | 21 | 82 | 330 | 742 |
| Multilayer perceptron | 105 | 237 | 949 | 3800 | 8540 |

After substituting the above expressions and solving for $T_o$ we get
$$T_o > \frac{R}{1-R}\, p_{mem}\, (T_t + G_i P_i T_m).$$

This inequality holds for R between 0 and 1. R is always positive, and if R is larger than 1, no reduction is made, so the memetic algorithm cannot work faster than the original one. Although the factor $G_i P_i$ may be quite large, the training of the meta-model is what usually dominates the time in this case. Table 2 shows the results of measurements we made in order to find out how fast some of the models we used are. All the tests were done on a computer with an Intel Core i7 920 (2.87 GHz) processor and 6 GB RAM. The size of the training set was 400, and the training set was obtained during a run of the described algorithm. Based on these measurements and the equations above, we can compute the limit time of the objective function evaluation for which the use of our algorithm reduces the time needed to find a solution; some of these values are computed in Table 3 (the remaining parameters were p_mem = 0.25, G_i = 50, P_i = 50). The numbers imply that the use of multilayer perceptrons in this context might not be very advantageous unless they are able to provide much better reductions than the other models. Later in this paper, we show that values of R = 0.2 and even lower are attainable. This means that LAMMA is usable even for problems with relatively fast objective functions which take only milliseconds to evaluate.
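The threshold can be computed directly from the inequality; the following sketch (ours) reproduces the support vector regression row of Table 3.

```python
def min_objective_time(R, p_mem, T_t, T_m, G_i=50, P_i=50):
    """Smallest objective evaluation time T_o (same units as T_t and
    T_m) for which the surrogate-assisted run is faster, from
    T_o > R / (1 - R) * p_mem * (T_t + G_i * P_i * T_m)."""
    assert 0.0 < R < 1.0, "no speed-up is possible for R >= 1"
    return R / (1.0 - R) * p_mem * (T_t + G_i * P_i * T_m)

# Support vector regression with R = 0.2 (values from Table 2):
print(min_objective_time(R=0.2, p_mem=0.25, T_t=0.328, T_m=7.14e-7))
# -> about 0.021 s, i.e. the 21 ms reported in Table 3
```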

6. Experiments—bi-objective problems

To assess the performance of LAMMA in the bi-objective case, we tested our approach on the widely used ZDT [38] benchmark problems. These problems all have two objectives; we used 30 variables for ZDT1 and 15 variables for the other problems. In the local search phase we used various meta-models, namely a multilayer perceptron, support vector regression, and linear regression. All the models use the default parameters from the Weka framework [39] (which we used to run the experiments), i.e. polynomial kernels and normalization for the support vector regression, and a learning rate of 0.2 and a momentum of 0.3 for the multilayer perceptron, together with four neurons in the hidden layer (the instances are again normalized). See Table 4 for the parameters of the main multiobjective algorithm and the internal single-objective algorithm. We used NSGA-II and ε-IBEA with Simulated Binary Crossover [40] and Polynomial Mutation [41] as the external multiobjective evolutionary algorithm. In the local search phase we used a simple single-objective evolutionary algorithm with the same operators and the meta-model as the fitness function.

Table 4. Parameters of the multiobjective algorithm.

| Parameter | MOEA value | Local search value |
|---|---|---|
| Stopping criterion | 50,000 objective evaluations | 30 generations |
| Population size | 50 | 50 |
| Crossover operator | SBX | SBX |
| Crossover probability | 0.8 | 0.8 |
| Mutation operator | Polynomial | Polynomial |
| Mutation probability | 0.1 | 0.2 |
| Archive size | 400 | – |
| Memetic operator probability | 0.25 | – |
| Meta-model locality parameter λ | – | 1 |

6.1. Performance measure

To compare the results, we use a measure we call the $H_{ratio}$, defined as
$$H_{ratio} = \frac{H_{real}}{H_{optimal}},$$
where $H_{real}$ is the hypervolume of the dominated space attained by the algorithm and $H_{optimal}$ is the hypervolume dominated by the real Pareto set. As the Pareto set is known for all the ZDT problems, we can compute this number directly. We use the vector $\vec{2} = (2,2)$ as the reference point in the hypervolume computation; all points that do not dominate the reference point are excluded from the computation. We compare the median number of function evaluations needed to attain an $H_{ratio}$ of 0.5, 0.75, 0.9, 0.95, and 0.99, respectively.

6.2. Results

Table 5 shows the results of our algorithm compared to the original NSGA-II and ε-IBEA and to ASM-MOMA. In all the tables, NSGA is the original NSGA-II and IBEA denotes the original ε-IBEA; LR, SVM, and MLP stand for the model used (linear regression, support vector regression, and multilayer perceptron, respectively); G denotes the single global model of ASM-MOMA and L stands for the local models described in this paper. The numbers in the table represent the median number of objective function evaluations needed to reach the specified $H_{ratio}$ value; twenty runs for each configuration were made. A "–" symbol means that the particular configuration was not able to attain the specified $H_{ratio}$ within the limit of 50,000 evaluations of the objective functions.

From the results, we can see that the global models significantly decrease the number of required function evaluations, and that the local models are even better than the global ones. Moreover, we can see that linear regression gives better results than support vector regression and multilayer perceptrons; it probably creates simpler models which indicate the right general direction in which the local search should proceed. Furthermore, the results of the local models are almost always better than those of a single global model (see the following paragraphs for a more detailed discussion). Comparing the differences between the chosen models (linear regression, support vector regression and multilayer perceptron), we can note that among the local models these differences are smaller. This implies that the choice of the type of the model is less important when the local models are used.


Table 5. Median number of function evaluations needed to reach the specified H_ratio on the ZDT1, ZDT2, ZDT3 and ZDT6 test problems.

ZDT1:

| Algorithm | 0.5 | 0.75 | 0.9 | 0.95 | 0.99 |
|---|---|---|---|---|---|
| NSGA | 5600 | 18,600 | 19,850 | 20,750 | 21,850 |
| NSGA-LR-G | 1500 | 2000 | 2400 | 2800 | 12,750 |
| NSGA-SVM-G | 1450 | 2050 | 2350 | 2850 | 13,550 |
| NSGA-MLP-G | 2100 | 2800 | 3850 | 4500 | 15,200 |
| NSGA-LR-L | 1300 | 1750 | 2250 | 2600 | 13,100 |
| NSGA-SVM-L | 1350 | 1650 | 2150 | 2450 | 14,150 |
| NSGA-MLP-L | 1600 | 2100 | 2700 | 3250 | 15,700 |
| IBEA | 7400 | 13,750 | 18,200 | 20,000 | 25,550 |
| IBEA-LR-G | 1450 | 2500 | 2800 | 2950 | 7450 |
| IBEA-SVM-G | 1400 | 2050 | 2700 | 3100 | 6850 |
| IBEA-MLP-G | 1800 | 2550 | 4000 | 4600 | 10,100 |
| IBEA-LR-L | 1300 | 1900 | 2400 | 2750 | 7500 |
| IBEA-SVM-L | 1350 | 1900 | 2350 | 2600 | 7100 |
| IBEA-MLP-L | 1400 | 1850 | 2450 | 3250 | 9650 |

ZDT2:

| Algorithm | 0.5 | 0.75 | 0.9 | 0.95 | 0.99 |
|---|---|---|---|---|---|
| NSGA | 650 | 1650 | 3550 | 5050 | 7900 |
| NSGA-LR-G | 350 | 550 | 750 | 950 | 1250 |
| NSGA-SVM-G | 350 | 450 | 700 | 1050 | 1750 |
| NSGA-MLP-G | 400 | 550 | 800 | 1000 | 1500 |
| NSGA-LR-L | 350 | 450 | 600 | 850 | 1100 |
| NSGA-SVM-L | 350 | 550 | 750 | 900 | 1250 |
| NSGA-MLP-L | 350 | 500 | 750 | 850 | 1250 |
| IBEA | 750 | 2050 | 5150 | 7800 | 13,000 |
| IBEA-LR-G | 350 | 550 | 750 | 900 | 1650 |
| IBEA-SVM-G | 350 | 550 | 850 | 1050 | 1550 |
| IBEA-MLP-G | 450 | 650 | 950 | 1200 | 2700 |
| IBEA-LR-L | 300 | 500 | 700 | 850 | 1350 |
| IBEA-SVM-L | 350 | 550 | 800 | 1000 | 1450 |
| IBEA-MLP-L | 350 | 550 | 750 | 900 | 1400 |

ZDT3:

| Algorithm | 0.5 | 0.75 | 0.9 | 0.95 | 0.99 |
|---|---|---|---|---|---|
| NSGA | 600 | 1250 | 4150 | 7250 | – |
| NSGA-LR-G | 300 | 500 | 700 | 800 | 1150 |
| NSGA-SVM-G | 350 | 500 | 700 | 750 | 1100 |
| NSGA-MLP-G | 450 | 700 | 1000 | 1150 | 1750 |
| NSGA-LR-L | 300 | 450 | 650 | 800 | 1050 |
| NSGA-SVM-L | 350 | 550 | 700 | 850 | 1000 |
| NSGA-MLP-L | 350 | 550 | 850 | 950 | 1300 |
| IBEA | 650 | 1550 | 5400 | 8150 | 33,350 |
| IBEA-LR-G | 350 | 550 | 850 | 950 | 1300 |
| IBEA-SVM-G | 350 | 550 | 850 | 1000 | 1300 |
| IBEA-MLP-G | 450 | 800 | 1100 | 1250 | 1800 |
| IBEA-LR-L | 350 | 450 | 750 | 900 | 1300 |
| IBEA-SVM-L | 400 | 650 | 850 | 1050 | 1450 |
| IBEA-MLP-L | 400 | 650 | 950 | 1150 | 1600 |

ZDT6:

| Algorithm | 0.5 | 0.75 | 0.9 | 0.95 | 0.99 |
|---|---|---|---|---|---|
| NSGA | 7950 | 10,200 | 13,950 | 17,700 | 28,650 |
| NSGA-LR-G | 2750 | 5950 | 11,100 | 15,750 | 30,500 |
| NSGA-SVM-G | 2500 | 4950 | 8650 | 12,500 | 23,500 |
| NSGA-MLP-G | 3300 | 5850 | 10,350 | 14,650 | 26,800 |
| NSGA-LR-L | 2850 | 5850 | 10,550 | 15,350 | 29,200 |
| NSGA-SVM-L | 2600 | 4950 | 9100 | 12,900 | 25,300 |
| NSGA-MLP-L | 3350 | 6050 | 10,300 | 13,950 | 27,150 |
| IBEA | 10,300 | 13,650 | 18,400 | 23,150 | 34,050 |
| IBEA-LR-G | 3050 | 6500 | 13,400 | 17,600 | 32,100 |
| IBEA-SVM-G | 3000 | 7250 | 14,100 | 19,250 | 34,150 |
| IBEA-MLP-G | 3500 | 7250 | 13,250 | 18,900 | 32,450 |
| IBEA-LR-L | 3050 | 6850 | 13,050 | 18,750 | 31,400 |
| IBEA-SVM-L | 3000 | 6500 | 12,650 | 17,850 | 32,550 |
| IBEA-MLP-L | 3400 | 7050 | 13,300 | 18,200 | 32,950 |

Following from the discussion in Section 5, we could recommend using the faster models, i.e. linear regression or support vector regression, instead of multilayer perceptrons.

To rule out the possibility of improperly set parameters of the multilayer perceptrons, we tried changing the number of neurons in the hidden layer. The results of this experiment on ZDT1 are in Table 6. We can see that the results of this model can indeed be improved when more attention is paid to its settings; however, LAMMA with multilayer perceptrons still does not outperform LAMMA with the other meta-models.

Table 6. The effect of different numbers of neurons in the hidden layer on the performance of LAMMA on the ZDT1 test problem.

| Number of neurons | 0.5 | 0.75 | 0.9 | 0.95 | 0.99 |
|---|---|---|---|---|---|
| 1 | 1300 | 2900 | 3150 | 3300 | 8750 |
| 3 | 1400 | 2100 | 2800 | 3250 | 9600 |
| 5 | 1450 | 1800 | 2200 | 2950 | 8500 |
| 10 | 1500 | 1900 | 2550 | 3250 | 9350 |

On ZDT1, the global model decreased the number of function evaluations by a factor of 7.4 for $H_{ratio} = 0.95$ (NSGA-II and linear regression); the local model decreased this number by another almost 8%, yielding a combined factor of 8. The numbers for $H_{ratio} = 0.99$ are not as good: the number of function evaluations dropped to approximately a half with the global model and remains practically unchanged with the use of local models. The results for other combinations of MOEA and meta-model type are similar on ZDT1.

On ZDT2 (again NSGA-II and linear regression), the global model reduced the number of evaluations required to reach $H_{ratio} = 0.99$ by a factor of 6.3, with the local model lowering the number by another 12%, yielding an overall reduction factor of 7.2. Again, the results are similar for other combinations of MOEA and meta-model type, the only exception being the behavior of the multilayer perceptron in combination with ε-IBEA, where the local model decreased the number of evaluations to 1400, compared to 2700 for the global model and 13,000 for plain ε-IBEA, a reduction by a factor of 9.3.

On ZDT3, both ASM-MOMA and LAMMA were able to reach $H_{ratio} = 0.99$ while the original NSGA-II was not. Moreover, LAMMA needed only 1000 evaluations (with support vector regression as the meta-model); ASM-MOMA needed 100 more evaluations, so the local models reduced this number by 9%. The original ε-IBEA needed over 30,000 function evaluations to attain $H_{ratio} = 0.99$; ASM-MOMA and LAMMA both needed only 1300 evaluations, reducing the number of evaluations almost 26 times.

ZDT6 proved again (as in our previous paper [34]) to be the most difficult problem among those we used for comparison. Although the number of evaluations needed to reach $H_{ratio} = 0.5$ dropped to approximately a third of the original, this difference gets smaller as the $H_{ratio}$ grows, and the results for $H_{ratio} = 0.99$ are almost identical. In this case, the local models helped to reduce the number of evaluations slightly, and for most configurations of LAMMA the numbers were lower than those needed by the original algorithms. We believe the poor results are partially caused by premature convergence (some preliminary tests showed that the results for a higher percentage of locally improved individuals are even worse), together with the difficulty of modeling this particular function. On ZDT6 the Pareto front is biased towards solutions with one of the functions close to 1, which yields a training set with low diversity; that could be the reason for the poorly trained models.

To compare LAMMA to another surrogate based algorithm, we chose the results of Loshchilov et al. presented in [10]. They describe an algorithm which uses a Pareto SVM-based aggregate surrogate model to pre-select the individuals, and they use a similar comparison methodology. They conclude that their algorithm decreases the number of required objective function evaluations 1.5–2 times compared to the original non-surrogate version of the algorithms (they compare their approach to a variant of NSGA-II). LAMMA is able to decrease the number of required evaluations much more drastically, and thus can be considered better.

7. Experiments—many-objective problems

As we have shown, both ASM-MOMA and LAMMA greatly reduce the number of objective function evaluations needed to find a good solution to a multiobjective problem with two objectives. However, the question remained how these two algorithms scale with respect to the number of objectives and how they perform on many-objective problems. The idea is that both algorithms use a kind of scalarization during the creation of the meta-model, which is similar to some of the many-objective optimization algorithms; this could provide non-dominated individuals and similar performance without any additional objective function evaluations. The aggregate meta-models thus provide a natural hybridization of scalarization techniques with other many-objective optimization techniques (in our case with the indicator based algorithms, as we used IBEA as the main many-objective optimizer).

In this scenario we tested only the ε-IBEA based ASM-MOMA and LAMMA, as NSGA-II is known to perform poorly in many-objective cases [35]. We also did not test the performance of multilayer perceptrons as the meta-model, as the results in the bi-objective case indicated that they do not work well and are much slower than linear regression and support vector regression.

To assess the performance of aggregate meta-models in the many-objective case, we tested ASM-MOMA and LAMMA (with locality parameter λ = 1 and λ = 4) on the well-known DTLZ1 to DTLZ4 test problems with 5, 10, and 15 objectives; we used 20 variables in all cases. See Table 7 for the parameters of the algorithms used.

Table 7. Parameters of the multiobjective algorithm.

| Parameter | MOEA value | Local search value |
|---|---|---|
| Stopping criterion | 10,000 objective evaluations | 30 generations |
| Population size | 50 | 50 |
| Crossover operator | SBX | SBX |
| Crossover probability | 0.8 | 0.8 |
| Mutation operator | Polynomial | Polynomial |
| Mutation probability | 0.1 | 0.2 |
| Archive size | 400 | – |
| Memetic operator probability | 0.25 | – |
| Meta-model locality parameter λ (LAMMA only) | – | 1.0 and 4.0 |

We chose the hypervolume as the performance measure and report its values after 1000, 2000, 5000 and 10,000 evaluations of the objective functions. The hypervolume is normalized in such a way that the point $\vec{0}$ has a hypervolume of 1.0. As the exact hypervolume cannot be computed effectively for these large numbers of objectives, we used Monte Carlo sampling with 100,000 samples to find its approximation. Again, 20 runs for each configuration were performed, and the average values are reported.

We do not use the $H_{ratio}$ in this case, as we do not know the real hypervolume for all of the test problems; moreover, the hypervolumes differ largely for different numbers of objectives. Although we believe that the comparison methodology used in the bi-objective case is better, as it allows a direct comparison of the number of needed objective function evaluations, the complexity of the hypervolume computation for these large numbers of objectives makes it impractical, as it would require the evaluation of the hypervolume after each generation. Also, due to the slight deviations in the computed hypervolume caused by the Monte Carlo sampling, the reported numbers of evaluations might have been incorrect in cases where the estimated hypervolume was larger than the real one.

Table 8 shows the results of the aggregate meta-models on the many-objective optimization problems. A value with appended letters is significantly better (one-sided t-test, p-value < 0.05) than the variants denoted by the letters (N for no model, S for support vector regression, and L for linear regression). Generally, we can see that the aggregate meta-models improve the results compared to plain ε-IBEA. The following sections discuss the results in more detail.

7.1. ASM-MOMA

With ASM-MOMA, we can see that the results were improved in almost all cases (the only exception being DTLZ1 with 5 objective functions after 1000 and 2000 function evaluations). Moreover, the results for linear regression are generally better than those where support vector regression was used as the underlying meta-model.

On the DTLZ1 problem, ASM-MOMA with linear regression as the meta-model reached the best hypervolume for 5, 10 and 15 objective functions after 5000 and 10,000 objective function evaluations. However, for the configuration with only 5 objective functions it was beaten by ε-IBEA after 1000 and 2000 function evaluations. This might mean that the aggregate meta-models are able to provide new non-dominated solutions even in the later phases of the evolution; on the other hand, in the earlier phases the models are not yet well trained and may slow the evolution down, as they may provide a wrong direction for the search.

On DTLZ2 with 5 and 10 objective functions, ASM-MOMA with support vector regression performed better than ASM-MOMA with linear regression as the meta-model. Both performed better than ε-IBEA; however, only the support vector regression variant was significantly better. When the number of objectives is increased to 15, the situation changes: ASM-MOMA with linear regression works best of the compared algorithms and is significantly better than both ε-IBEA and ASM-MOMA with the support vector regression model. We can also note that ASM-MOMA reduces the number of evaluations: the result ε-IBEA reached after 10,000 evaluations was reached by ASM-MOMA after 5000 evaluations in the 5 objective case, 2000 evaluations in the 10 objective case, and only 1000 evaluations in the 15 objective case.

Linear regression seems to be the best model for DTLZ3 as well, for almost all of the configurations (except 15 objectives after 5000 and 10,000 evaluations, where support vector regression wins). For the configuration with 10 objectives, the linear regression ASM-MOMA provides significantly better results than ε-IBEA. Moreover, we can notice that even after 1000 evaluations ASM-MOMA already has a better result than ε-IBEA after 10,000 evaluations, thus reducing the required number of evaluations more than 10 times.

On DTLZ4, ASM-MOMA with either model is significantly better than ε-IBEA in all cases. Similar to DTLZ2, support vector


Table 8. Results of ASM-MOMA and LAMMA (λ = 1 and λ = 4) with linear regression (LR) and support vector regression (SVR) on the test problems: average hypervolume attained over 20 runs, after 1000, 2000, 5000 and 10,000 objective function evaluations. Appended letters mark values significantly better than the denoted variants (N for no model, i.e. plain IBEA; S for support vector regression; L for linear regression); only the different models for the same variant of the algorithm are compared.

DTLZ1:

| Dim. | Algorithm | 1000 | 2000 | 5000 | 10,000 |
|---|---|---|---|---|---|
| 5 | IBEA | 0.294 | 0.413 | 0.569 | 0.711 |
| 5 | LR-ASM-MOMA | 0.265 | 0.386 | 0.580 | 0.753 |
| 5 | SVR-ASM-MOMA | 0.270 | 0.384 | 0.573 | 0.737 |
| 5 | LR-LAMMA-1 | 0.283 | 0.400 | 0.596 | 0.747 |
| 5 | SVR-LAMMA-1 | 0.260 | 0.383 | 0.595 | 0.749 |
| 5 | LR-LAMMA-4 | 0.246 | 0.374 | 0.549 | 0.702 |
| 5 | SVR-LAMMA-4 | 0.262 | 0.375 | 0.566 | 0.731 |
| 10 | IBEA | 0.244 | 0.271 | 0.304 | 0.325 |
| 10 | LR-ASM-MOMA | 0.257 | 0.305NS | 0.318 | 0.339 |
| 10 | SVR-ASM-MOMA | 0.245 | 0.270 | 0.308 | 0.336 |
| 10 | LR-LAMMA-1 | 0.260 | 0.291 | 0.316 | 0.346 |
| 10 | SVR-LAMMA-1 | 0.249 | 0.277 | 0.325 | 0.352 |
| 10 | LR-LAMMA-4 | 0.250 | 0.277 | 0.303 | 0.322 |
| 10 | SVR-LAMMA-4 | 0.244 | 0.268 | 0.311 | 0.311 |
| 15 | IBEA | 0.193 | 0.210 | 0.224 | 0.235 |
| 15 | LR-ASM-MOMA | 0.198S | 0.221 | 0.248S | 0.263N |
| 15 | SVR-ASM-MOMA | 0.178 | 0.209 | 0.223 | 0.239 |
| 15 | LR-LAMMA-1 | 0.199 | 0.217 | 0.240 | 0.256N |
| 15 | SVR-LAMMA-1 | 0.193 | 0.212 | 0.226 | 0.240 |
| 15 | LR-LAMMA-4 | 0.196 | 0.214 | 0.238 | 0.254 |
| 15 | SVR-LAMMA-4 | 0.191 | 0.222 | 0.256N | 0.259N |

DTLZ2:

| Dim. | Algorithm | 1000 | 2000 | 5000 | 10,000 |
|---|---|---|---|---|---|
| 5 | IBEA | 0.650 | 0.685 | 0.732 | 0.742 |
| 5 | LR-ASM-MOMA | 0.654 | 0.699 | 0.745 | 0.783 |
| 5 | SVR-ASM-MOMA | 0.660 | 0.720 | 0.779N | 0.821NL |
| 5 | LR-LAMMA-1 | 0.640 | 0.700 | 0.768N | 0.817N |
| 5 | SVR-LAMMA-1 | 0.639 | 0.688 | 0.772N | 0.822N |
| 5 | LR-LAMMA-4 | 0.653 | 0.710 | 0.773N | 0.815N |
| 5 | SVR-LAMMA-4 | 0.652 | 0.708 | 0.771N | 0.812N |
| 10 | IBEA | 0.726 | 0.730 | 0.738 | 0.744 |
| 10 | LR-ASM-MOMA | 0.732 | 0.739 | 0.748 | 0.756 |
| 10 | SVR-ASM-MOMA | 0.739 | 0.743 | 0.754 | 0.761 |
| 10 | LR-LAMMA-1 | 0.747 | 0.755 | 0.760 | 0.771S |
| 10 | SVR-LAMMA-1 | 0.727 | 0.735 | 0.740 | 0.745 |
| 10 | LR-LAMMA-4 | 0.726 | 0.737 | 0.745 | 0.748 |
| 10 | SVR-LAMMA-4 | 0.717 | 0.725 | 0.737 | 0.741 |
| 15 | IBEA | 0.83 | 0.832 | 0.836 | 0.836 |
| 15 | LR-ASM-MOMA | 0.849NS | 0.856NS | 0.864NS | 0.868NS |
| 15 | SVR-ASM-MOMA | 0.836 | 0.837 | 0.837 | 0.839 |
| 15 | LR-LAMMA-1 | 0.845 | 0.846 | 0.854N | 0.859N |
| 15 | SVR-LAMMA-1 | 0.853N | 0.853N | 0.860N | 0.862N |
| 15 | LR-LAMMA-4 | 0.851N | 0.851N | 0.857N | 0.863N |
| 15 | SVR-LAMMA-4 | 0.840 | 0.844 | 0.846 | 0.852 |

DTLZ3:

| Dim. | Algorithm | 1000 | 2000 | 5000 | 10,000 |
|---|---|---|---|---|---|
| 5 | IBEA | 0.732 | 0.79 | 0.839 | 0.872 |
| 5 | LR-ASM-MOMA | 0.752 | 0.799 | 0.860 | 0.888 |
| 5 | SVR-ASM-MOMA | 0.738 | 0.788 | 0.84 | 0.867 |
| 5 | LR-LAMMA-1 | 0.734 | 0.78 | 0.834 | 0.865 |
| 5 | SVR-LAMMA-1 | 0.752 | 0.798 | 0.854 | 0.883 |
| 5 | LR-LAMMA-4 | 0.734 | 0.784 | 0.828 | 0.866 |
| 5 | SVR-LAMMA-4 | 0.756 | 0.804 | 0.845 | 0.872 |
| 10 | IBEA | 0.519 | 0.519 | 0.527 | 0.531 |
| 10 | LR-ASM-MOMA | 0.570N | 0.572N | 0.581NS | 0.585N |
| 10 | SVR-ASM-MOMA | 0.545 | 0.549 | 0.548 | 0.555 |
| 10 | LR-LAMMA-1 | 0.542 | 0.551N | 0.560N | 0.565N |
| 10 | SVR-LAMMA-1 | 0.546 | 0.55 | 0.56 | 0.566N |
| 10 | LR-LAMMA-4 | 0.540 | 0.549 | 0.556 | 0.562 |
| 10 | SVR-LAMMA-4 | 0.525 | 0.537 | 0.547 | 0.55 |
| 15 | IBEA | 0.447 | 0.452 | 0.457 | 0.458 |
| 15 | LR-ASM-MOMA | 0.458 | 0.463 | 0.466 | 0.469 |
| 15 | SVR-ASM-MOMA | 0.458 | 0.462 | 0.468 | 0.474 |
| 15 | LR-LAMMA-1 | 0.464 | 0.472N | 0.477N | 0.477 |
| 15 | SVR-LAMMA-1 | 0.467 | 0.472 | 0.473 | 0.47 |
| 15 | LR-LAMMA-4 | 0.479N | 0.484N | 0.489N | 0.491N |
| 15 | SVR-LAMMA-4 | 0.477N | 0.480N | 0.490N | 0.491N |

DTLZ4:

| Dim. | Algorithm | 1000 | 2000 | 5000 | 10,000 |
|---|---|---|---|---|---|
| 5 | IBEA | 0.092 | 0.147 | 0.184 | 0.208 |
| 5 | LR-ASM-MOMA | 0.172N | 0.290N | 0.386N | 0.421N |
| 5 | SVR-ASM-MOMA | 0.202N | 0.347N | 0.449N | 0.484N |
| 5 | LR-LAMMA-1 | 0.217N | 0.328N | 0.428N | 0.470N |
| 5 | SVR-LAMMA-1 | 0.230N | 0.346N | 0.431N | 0.460N |
| 5 | LR-LAMMA-4 | 0.169N | 0.289N | 0.380 | 0.407N |
| 5 | SVR-LAMMA-4 | 0.166N | 0.272N | 0.388N | 0.434N |
| 10 | IBEA | 0.557 | 0.558 | 0.559 | 0.559 |
| 10 | LR-ASM-MOMA | 0.626N | 0.637N | 0.637N | 0.636N |
| 10 | SVR-ASM-MOMA | 0.610N | 0.625N | 0.627N | 0.626N |
| 10 | LR-LAMMA-1 | 0.627NS | 0.640NS | 0.641NS | 0.642NS |
| 10 | SVR-LAMMA-1 | 0.597N | 0.605N | 0.605N | 0.604N |
| 10 | LR-LAMMA-4 | 0.613N | 0.618N | 0.621N | 0.621N |
| 10 | SVR-LAMMA-4 | 0.599N | 0.617N | 0.642N | 0.674 |
| 15 | IBEA | 0.96 | 0.961 | 0.962 | 0.962 |
| 15 | LR-ASM-MOMA | 0.973N | 0.973N | 0.973N | 0.973N |
| 15 | SVR-ASM-MOMA | 0.973N | 0.973N | 0.973N | 0.973N |
| 15 | LR-LAMMA-1 | 0.974N | 0.974N | 0.974N | 0.974N |
| 15 | SVR-LAMMA-1 | 0.972N | 0.972N | 0.972N | 0.972N |
| 15 | LR-LAMMA-4 | 0.976N | 0.976N | 0.975N | 0.974N |
| 15 | SVR-LAMMA-4 | 0.974N | 0.976N | 0.976N | 0.976N |

regression works better in the 5 objective case, and linear regression is slightly better for higher numbers of objectives. Moreover, in this case the performance of ε-IBEA after 10,000 evaluations is reached by ASM-MOMA after only 1000 evaluations in all cases; the number of objective function evaluations is again decreased more than 10 times.

7.2. LAMMA

The results for LAMMA on the selected benchmark functions are similar to those of ASM-MOMA. On DTLZ1 with 5 objectives the convergence is even slower than for ASM-MOMA, but the results after 10,000 evaluations are almost the same. The convergence slows down even more when more local models (λ = 4) are used. In this case the locality of the models does not help, and a single global meta-model is better. When the number of objectives increases, the differences tend to get smaller. LAMMA provides less significant improvements than ASM-MOMA on this test problem.

On DTLZ2 we see more significant results with LAMMA compared to ASM-MOMA. Also, the differences between the two types of meta-models are lower, both for λ = 1 and λ = 4. LAMMA provides similar speed-ups (in terms of function evaluations) as ASM-MOMA (5–10 times in this case).

On DTLZ3 LAMMA again provides better results than ε-IBEA. In the case of 15 objectives, the results are even significantly better regardless of the meta-model used (for λ = 4).

On DTLZ4 we again see, in all situations and cases, the significant improvements we observed with ASM-MOMA. In this case LAMMA with λ = 1 provides the best results. The speed-ups are again more than 10 times, as LAMMA after 1000 evaluations reaches better hypervolumes than those obtained by ε-IBEA after 10,000 evaluations.

8. Conclusion

In this paper we presented a memetic evolutionary algorithm for multiobjective optimization with local meta-models. We showed that the local models give better results than a single global model, usually reducing the number of needed function evaluations by 10%, with occasional reductions as high as 48%. Although this difference may seem rather small, it may greatly reduce the associated costs in practical tasks. We also showed that the algorithm is usable even for problems with quite simple objective functions which take only milliseconds to evaluate, thus making it more widely applicable. However, we saw that some problems are still difficult to solve with LAMMA, and these provide the motivation for further research. The question is how well the benchmark problems correspond to real-life ones, and also how to decide whether a given problem belongs to a class of problems which can be easily solved by the presented evolutionary algorithm.

We have also shown that aggregate meta-models can be used to speed up the search of evolutionary algorithms on many-objective optimization problems. These meta-models are able to provide new non-dominated individuals and thus speed up the search; the reduction in the number of objective function evaluations is important before these algorithms can be used for solving practical tasks. Experiments with such tasks remain as future work. The use of aggregate meta-models in many-objective optimization can also be seen as a combination of two approaches: indicator based algorithms and scalarization. In this case the scalarization is incorporated as a memetic operator and does not use new objective function evaluations; however, it still provides new non-dominated individuals which can be used by the main many-objective algorithm. The results indicate that using aggregate meta-models might be a promising direction in the field of many-objective optimization; however, some questions remain. One of them is how these approaches would scale with the number of variables, as this is what directly affects the dimension of the space in which the model is built.

We will also continue the work on memetic multiobjective algorithms with aggregate meta-models. One of the goals is the reduction in the number of times the model is trained, which is a problem especially for the more expensive local models, as these are trained multiple times in each generation. One possibility could be to cluster the individuals before the model is constructed and to create a single local model for all the individuals in a cluster. Another open question is the effect of the degree of locality (represented by the λ parameter) on the convergence speed of the evolution and the possibility of changing this parameter adaptively. Moreover, local and global meta-models might be combined: the global meta-model may be used to pre-evaluate the individuals, and some of them might then be locally improved with a memetic operator based on a local model. The question is whether it would be more beneficial to improve those individuals which already have good values, or those which are worse.

Acknowledgments

Roman Neruda has been supported by the Grant Agency of the Czech Republic under project no. P202/11/1368. Martin Pilát has been partially supported by Czech Science Foundation project no. 201/09/H057, GAUK project no. 345511, and SVV project no. 265 314.

References

[1] K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, A fast elitist non-dominated sorting genetic algorithm for multi-objective optimisation: NSGA-II, in: M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J.J.M. Guervós, H.-P. Schwefel (Eds.), PPSN, Lecture Notes in Computer Science, vol. 1917, Springer, 2000, pp. 849–858.
[2] E. Zitzler, M. Laumanns, L. Thiele, SPEA2: improving the strength Pareto evolutionary algorithm, TIK Report 103, Computer Engineering and Networks Laboratory (TIK), ETH Zurich, Zurich, Switzerland, 2001.
[3] E. Zitzler, S. Künzli, Indicator-based selection in multiobjective search, in: X. Yao, et al. (Eds.), Conference on Parallel Problem Solving from Nature (PPSN VIII), Lecture Notes in Computer Science, vol. 3242, Springer, 2004, pp. 832–842.
[4] J. Bader, E. Zitzler, HypE: An Algorithm for Fast Hypervolume-Based Many-Objective Optimization, TIK Report 286, Computer Engineering and Networks Laboratory (TIK), ETH Zurich, November 2008.
[5] A.L. Jaimes, C.A.C. Coello, MRMOGA: a new parallel multi-objective evolutionary algorithm based on the use of multiple resolutions, Concurrency Comput.: Pract. Exper. 19 (4) (2007) 397–441.
[6] F. Streichert, H. Ulmer, A. Zell, Parallelization of multi-objective evolutionary algorithms using clustering algorithms, in: C.A.C. Coello, A.H. Aguirre, E. Zitzler (Eds.), EMO, Lecture Notes in Computer Science, vol. 3410, Springer, 2005, pp. 92–107.
[7] M. Pilát, R. Neruda, Combining multiobjective and single-objective genetic algorithms in heterogeneous island model, in: IEEE Congress on Evolutionary Computation, IEEE, 2010, pp. 1543–1550.
[8] C. Georgopoulou, K. Giannakoglou, Multiobjective metamodel-assisted memetic algorithms, Multiobjective Memetic Algorithms (2009) 153–181.
[9] Q. Zhang, W. Liu, E. Tsang, B. Virginas, Expensive multiobjective optimization by MOEA/D with Gaussian process model, IEEE Trans. Evol. Comput. 14 (3) (2010) 456–474.
[10] I. Loshchilov, M. Schoenauer, M. Sebag, A mono surrogate for multiobjective optimization, in: M. Pelikan, J. Branke (Eds.), GECCO, ACM, 2010, pp. 471–478.
[11] M. Holeňa, D. Linke, U. Rodemerck, L. Bajer, Neural networks as surrogate models for measurements in optimization algorithms, in: K. Al-Begain, D. Fiems, W. Knottenbelt (Eds.), Analytical and Stochastic Modeling Techniques and Applications, Lecture Notes in Computer Science, vol. 6148, Springer, Berlin/Heidelberg, 2010, pp. 351–366.
[12] J. Zhang, D.-S. Huang, K.-H. Liu, Multi-sub-swarm particle swarm optimization algorithm for multimodal function optimization, in: IEEE Congress on Evolutionary Computation, 2007, pp. 3215–3220.
[13] J. Zhang, D.-S. Huang, T.-M. Lok, M.R. Lyu, A novel adaptive sequential niche technique for multimodal function optimization, Neurocomputing 69 (16–18) (2006) 2396–2401.
[14] J. Zhang, J.-R. Zhang, K. Li, A sequential niching technique for particle swarm optimization, in: D.-S. Huang, X.-P. Zhang, G.-B. Huang (Eds.), ICIC (1), Lecture Notes in Computer Science, vol. 3644, Springer, 2005, pp. 390–399.
[15] M. Reyes-Sierra, C.A.C. Coello, Multi-objective particle swarm optimizers: a survey of the state-of-the-art, Int. J. Comput. Intell. Res. 2 (3).
[16] J.D. Schaffer, Multiple objective optimization with vector evaluated genetic algorithms, in: J.J. Grefenstette (Ed.), ICGA, Lawrence Erlbaum Associates, 1985, pp. 93–100.
[17] E.J. Hughes, Evolutionary many-objective optimisation: many once or one many?, in: 2005 IEEE Congress on Evolutionary Computation (CEC 2005), vol. 1, IEEE Service Center, Edinburgh, Scotland, 2005, pp. 222–227.
[18] A. Ratle, Optimal sampling strategies for learning a fitness model, in: Proceedings of the 1999 Congress on Evolutionary Computation (CEC '99), vol. 3, 1999.
[19] Y. Jin, M. Olhofer, B. Sendhoff, A framework for evolutionary optimization with approximate fitness functions, IEEE Trans. Evol. Comput. 6 (5) (2002) 481–494.
[20] Y. Ong, K. Lum, P. Nair, D. Shi, Z. Zhang, Global convergence of unconstrained and bound constrained surrogate-assisted evolutionary search in aerodynamic shape design, in: The 2003 Congress on Evolutionary Computation (CEC '03), vol. 3, 2003, pp. 1856–1863.
[21] H. Ulmer, F. Streichert, A. Zell, Evolution strategies assisted by Gaussian processes with improved preselection criterion, in: The 2003 Congress on Evolutionary Computation (CEC '03), vol. 1, 2003, pp. 692–699.
[22] C. Goh, D. Lim, L. Ma, Y. Ong, P. Dutta, A surrogate-assisted memetic co-evolutionary algorithm for expensive constrained optimization problems, in: 2011 IEEE Congress on Evolutionary Computation (CEC), 2011, pp. 744–749.
[23] I. Voutchkov, A. Keane, Multiobjective optimization using surrogates, in: Adaptive Computing in Design and Manufacture (ACDM '06), 2006.
[24] M. Emmerich, K. Giannakoglou, B. Naujoks, Single- and multi-objective evolutionary optimization assisted by Gaussian random field metamodels, IEEE Trans. Evol. Comput. 10 (4) (2006) 421–439.

[25] E. Zitzler, L. Thiele, Multiobjective optimization using evolutionary algorithms: a comparative case study, in: Conference on Parallel Problem Solving from Nature (PPSN V), Amsterdam, 1998, pp. 292–301.
[26] D. Chafekar, L. Shi, K. Rasheed, J. Xuan, Multiobjective GA optimization using reduced models, IEEE Trans. Syst. Man Cybern., Part C: Appl. Rev. 35 (2) (2005) 261–265.
[27] J. Knowles, ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems, IEEE Trans. Evol. Comput. 10 (1) (2006) 50–66.
[28] I. Loshchilov, M. Schoenauer, M. Sebag, Dominance-based Pareto-surrogate for multi-objective optimization, in: K. Deb, A. Bhattacharya, N. Chakraborti, P. Chakroborty, S. Das, J. Dutta, S. Gupta, A. Jain, V. Aggarwal, J. Branke, S. Louis, K. Tan (Eds.), Simulated Evolution and Learning, Lecture Notes in Computer Science, vol. 6457, Springer, Berlin/Heidelberg, 2010, pp. 230–239.
[29] T. Joachims, A support vector method for multivariate performance measures, in: Proceedings of the 22nd International Conference on Machine Learning (ICML '05), ACM, New York, NY, USA, 2005, pp. 377–384.
[30] A. Talukder, M. Kirley, R. Buyya, The Pareto-following variation operator as an alternative approximation model, in: IEEE Congress on Evolutionary Computation (CEC '09), 2009, pp. 8–15.
[31] D. Lim, Y. Jin, Y.-S. Ong, B. Sendhoff, Generalizing surrogate-assisted evolutionary computation, IEEE Trans. Evol. Comput. 14 (2010) 329–355.
[32] A. Isaacs, T. Ray, W. Smith, An evolutionary algorithm with spatially distributed surrogates for multiobjective optimization, in: Proceedings of the Third Australian Conference on Progress in Artificial Life (ACAL '07), Springer-Verlag, Berlin, Heidelberg, 2007, pp. 257–268.
[33] Q. Zhang, H. Li, MOEA/D: a multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. 11 (6) (2007) 712–731.
[34] M. Pilát, R. Neruda, ASM-MOMA: multiobjective memetic algorithm with aggregate surrogate model, in: Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2011), IEEE, 2011, pp. 1202–1208.
[35] H. Ishibuchi, N. Tsukamoto, Y. Nojima, Evolutionary many-objective optimization: a short review, in: Proceedings of the 2008 IEEE Congress on Evolutionary Computation, 2008, pp. 2424–2431.
[36] A. Auger, J. Bader, D. Brockhoff, E. Zitzler, Theory of the hypervolume indicator: optimal μ-distributions and the choice of the reference point, in: Foundations of Genetic Algorithms (FOGA 2009), 2009, workshop version.
[37] K. Bringmann, T. Friedrich, Approximating the volume of unions and intersections of high-dimensional geometric objects, CoRR abs/0809.0835.
[38] E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: empirical results, Evol. Comput. 8 (2) (2000) 173–195.
[39] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, The WEKA data mining software: an update, SIGKDD Explor. 11 (1) (2009) 10–18, http://dx.doi.org/10.1145/1656274.1656278.

[40] K. Deb, R.B. Agrawal, Simulated binary crossover for continuous search space, Complex Systems 9 (1995) 115–148.
[41] K. Deb, M. Goyal, A combined genetic adaptive search (GeneAS) for engineering design, Comput. Sci. Inf. 26 (1996) 30–45.

Martin Pilát received his M.Sc. in theoretical computer science in 2009 at the Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic. Currently, he is a Ph.D. student and lecturer at the same institution. His scientific interests include evolutionary multiobjective optimization, parallel evolutionary algorithms, and data mining.

Roman Neruda received his Ph.D. in theoretical computer science in 1998 from the Academy of Sciences of the Czech Republic. He works as a researcher at the Institute of Computer Science, Academy of Sciences of the Czech Republic in Prague, and as a lecturer at Charles University in Prague, Faculty of Mathematics and Physics. His research interests include hybrid computational intelligence models and intelligent agents.