Advances in Engineering Software 47 (2012) 62–71
A computational intelligence algorithm for simulation-driven optimization problems

Yoel Tenne, Faculty of Science and Engineering, Kyoto University, Kyoto, Japan

Article history: Received 9 December 2010; Revised 2 August 2011; Accepted 13 December 2011; Available online 9 January 2012.

Keywords: Expensive functions; Evolutionary algorithms; Modeling; Classification; Model selection; Simulation-driven optimization
Abstract

The modern engineering design process often relies on computer simulations to evaluate candidate designs. This simulation-driven approach results in what is commonly termed a computationally expensive black-box optimization problem. In practice, there will often exist candidate designs which cause the simulation to fail. Such simulation failures can consume a large portion of the allotted computational resources, and can thus lead to search stagnation and a poor final solution. To address this issue, this study proposes a new computational intelligence optimization algorithm which combines a model and a k-NN classifier. The latter predicts which solutions are expected to cause the simulation to fail, and its prediction is combined with the model prediction to bias the search towards valid solutions, namely, those for which the simulation is expected to succeed. A main contribution of this study is that, to further improve the search efficacy, the proposed algorithm leverages model-selection theory and continuously calibrates the classifier during the search. An extensive performance analysis using an engineering application of airfoil shape optimization shows the efficacy of the proposed algorithm.

© 2011 Elsevier Ltd. All rights reserved.
1. Introduction

Nowadays, engineers often use computer simulations to evaluate candidate designs. Such simulations, which must be properly validated against real-world laboratory experiments, can reduce the time and cost of the design process, and hence have established themselves as a valuable tool. This simulation-driven setup entails an optimization problem with three distinct challenges [1]:

- The simulation is often available only as legacy or commercial software whose internal workings are inaccessible to the user. Accordingly, it is treated as a black-box function with no analytic expression defining how candidate solutions are mapped to their objective values.
- Each simulation run is computationally expensive, that is, it requires a lengthy execution time, and so only a small number of such evaluations can be made during the entire optimization search.
- The resultant objective function often has a complicated, nonconvex landscape, which makes it difficult to locate an optimum.

An established strategy for dealing with such optimization problems is to couple a computational intelligence (CI) optimizer, such as an evolutionary algorithm (EA), a particle swarm optimizer (PSO), or the like, with a model, namely, a mathematical approximation of the objective function which is computationally cheaper to evaluate. This combination has proven effective since the CI optimizer typically performs well on nonconvex objective landscapes, while the model replaces the expensive function and economically provides its predicted objective values to the optimizer.

However, simulation-driven problems introduce another difficulty: the simulation may 'crash' and fail to return an objective value for some candidate designs. We refer to such designs as simulator-infeasible (SI), while those for which the simulation completes successfully and provides the output value are termed simulator-feasible (SF). SI designs (or vectors) have two main implications for the optimization problem: (a) they do not have an objective value associated with them, which results in a discontinuous objective function and exacerbates the difficulty of the optimization, and (b) such designs can consume a considerable portion of the allotted computational resources without providing new information or driving the search towards better designs, thus degrading the search effectiveness.

To effectively handle such SI vectors in simulation-driven optimization, we propose a CI algorithm which incorporates a classifier into the optimization search. Specifically, the proposed algorithm retains the SI vectors during the entire search, so they can be used to train the classifier. The classifier is continuously trained using both the SI and SF vectors already evaluated, and its role is to predict whether a new vector is SI or not. The proposed algorithm combines the classifier's prediction with the model prediction to bias the search towards vectors predicted to be SF. The classifier used is the k nearest neighbors (k-NN) classifier, chosen for its robustness and versatility. However, this classifier requires a user-prescribed parameter, namely, k, the number of nearest neighbors to consider in the classification process. The optimal value of this parameter is unknown a priori, and it may be impractical to obtain by numerical experiments due to the limited computational budget. Therefore, a main contribution of this paper is the classifier-adaptation stage, which continuously selects an optimal k value during the search to further improve the search efficacy. An extensive performance analysis using an engineering application of airfoil shape optimization shows the efficacy of the proposed algorithm.

The remainder of this paper is organized as follows: Section 2 provides the relevant theoretical background on expensive optimization problems, Section 3 describes the proposed algorithm, and Section 4 gives a detailed performance analysis. Lastly, Section 5 summarizes this paper.

(E-mail address: [email protected]; doi:10.1016/j.advengsoft.2011.12.009; 0965-9978 © 2011 Elsevier Ltd. All rights reserved.)
2. Theoretical background

2.1. The use of models in expensive optimization problems

Computationally expensive optimization problems are prevalent in engineering and science, and as mentioned in Section 1, a typical scenario is the design optimization process driven by computationally expensive computer simulations. Fig. 1 shows the layout of such problems. The objective function arising in such problems is often nonconvex, and so classical gradient-based optimizers may converge to a poor local optimum. This has motivated the application of CI optimizers, as they tend to perform well in such challenging objective landscapes. In contrast to classical optimizers, which use a single iterate, CI optimizers employ a population of candidate solutions as well as stochastic operators, and both of these features provide them with enhanced exploration capabilities and better immunity to premature convergence. One such CI optimizer is the evolutionary algorithm (EA), which applies operators inspired by the theory of evolution, namely, selection, recombination, and mutation. The fittest solutions survive and propagate through the generations, which results in a gradual adaptation of the population to the function landscape [2]. While CI optimizers do not require gradient information and perform well on nonconvex landscapes, they often require a large number of function evaluations to converge. This poses an obstacle to applying them to expensive optimization problems, where the expensive function can be evaluated only a small number of times. One strategy to overcome this is modeling, where a computationally cheaper mathematical approximation, namely, the model, replaces the true expensive function during most of the search. Models are typically interpolants trained with previously evaluated vectors, for example, polynomials, artificial neural networks, radial basis functions (RBF), and Kriging [3].

CI algorithms which use models are commonly termed model-assisted or surrogate-assisted, and following their established effectiveness [4,5] we adopt the modeling approach in this study. The high evaluation cost implies only a small number of vectors will be available to train the model, and so it will be inherently
Fig. 1. Principle schematic of simulation-driven optimization problems.
inaccurate. The extent of the model's inaccuracy is unknown a priori, and depends on problem features such as the number of variables and the complexity of the objective function landscape [6]. If the model inaccuracy is severe, the optimizer may even converge to a false optimum, namely, an optimum of the model which is not an optimum of the true expensive function [7]. As such, model-assisted algorithms must manage the model to ensure convergence to an optimum of the true expensive function. One established approach for model management is the trust-region (TR) framework, which originated in the domain of nonlinear programming. Here, the model is assumed to be accurate in the TR (designated as T), that is, a neighborhood around the current iterate. Starting from an initial guess x^(0), at each iteration, i = 0, 1, ..., a model is trained and the optimizer performs a trial step where it seeks an optimum of the model m(x) in the TR, where
T = {x : ‖x − x^(i)‖₂ ≤ Δ},    (1)
and D is the TR radius. Assuming a minimization problem, this defines the following constrained optimization problem
min m(x)  s.t.  x ∈ T,    (2)
which yields xm, the optimum of the model in the TR. The success of the trial step is then measured by the merit value
ρ = ( f(x^(i)) − f(x_m) ) / ( m(x^(i)) − m(x_m) ),    (3)
where ρ > 0 indicates the trial was successful. Based on the value of ρ, the algorithm then updates the iterate and the TR, and different TR updating schemes have been proposed. For example, Wujek and Renaud [8,9] proposed using three ranges of values, where the TR was contracted if ρ was deemed low, enlarged if ρ was deemed high, and left unchanged for intermediate values. Another scheme was proposed by Conn et al. [10], and used two threshold values for updating the TR. In this study, we apply the following updating scheme: if ρ > 0, the TR is centered at x_m (so x^(i+1) = x_m) and Δ is increased; otherwise, Δ is decreased. A strong merit of the TR framework is that it guarantees asymptotic convergence to an optimum of the true expensive objective function [11], and this has motivated its use with CI optimizers, for example [5,12–14].

2.2. Simulator-infeasible vectors

As mentioned in Section 1, this study focuses on expensive optimization problems with candidate solutions which cause the simulation code to fail, namely, SI vectors. The difficulties introduced by such vectors are well documented in the literature: for example, Koehler and Owen [15] mentioned 'input combinations which are likely to crash the simulator', Rasheed et al. [16] described a multidisciplinary optimization problem with 'unevaluable points' which 'cause the simulator to crash', while Conn et al. [17] mentioned 'virtual constraints' where 'function evaluations fail at certain points'. Additional studies mentioning this issue include [7,18–22]. Regarding techniques to handle SI vectors, Rasheed et al. [16] used an EA as the optimizer, and proposed using a classifier to screen vectors before evaluating them. Those predicted to be SI were assigned a 'death penalty', that is, a fictitious and highly penalized objective value, to quickly eliminate them from the population. The study did not consider using models, and the EA
directly evaluated the expensive function. A related approach was used by Emmerich et al. [20], where SI vectors were assigned a highly penalized objective value and incorporated into the model. This was done to divert the search away from such vectors. In an alternative approach, Büche et al. [22] excluded all SI vectors from the training set of the model, effectively ignoring them altogether.

The above approaches may not be effective in expensive optimization problems for several reasons:

- Assigning SI vectors a fictitious and highly penalized objective value, and then incorporating them into the model, can severely deform the model landscape and create false optima.
- SI vectors represent a piece of information evaluated at a high computational cost, which may prove useful in the subsequent stages of the optimization search, so discarding them altogether is wasteful.

To demonstrate these potential issues, Fig. 2 compares two Kriging models of the Rosenbrock function. Fig. 2a shows a model trained using 50 vectors, all SF, while in Fig. 2b 20 of those vectors were artificially set as SI and assigned a penalized fitness taken to be the worst objective value of the SF vectors. The resultant model has a severely deformed landscape containing many false optima, making it difficult to locate an optimum of the true objective function.

Fig. 2. Kriging models of the Rosenbrock function. (a) The resultant model when using a baseline training sample including all SF design vectors. (b) The resultant model when using the baseline sample, but where 20 vectors were set as SI and assigned the worst objective value.

Such issues have motivated seeking alternative approaches to handle SI vectors. For example, Tenne and Armfield [14] proposed a dual-model approach, where one model was used for the objective function and another provided an interpolated penalty, so vectors predicted to be SI were assigned a high penalty, and vice versa. Other studies have examined the use of classifiers for constrained nonlinear programming, but unrelated to SI vectors [23]. Further exploring the use of classifiers, Ref. [24] presented preliminary results with a classifier-assisted algorithm for handling SI vectors.

3. Proposed algorithm

The algorithm proposed in this study is designed to handle computationally expensive problems with SI vectors, and incorporates a model and a k-NN classifier. The algorithm manages the model to ensure convergence to an optimum of the original design problem, and calibrates the k-NN classifier to improve the search efficacy. The main steps of the algorithm are described in the remainder of this section.

3.1. Generating the model

The proposed algorithm imposes no restriction on the type of model which can be used, and in this study it uses a Kriging model, thanks to its ability to represent complicated landscapes [25,26]. This model takes a statistical approach to interpolation by combining two components: a 'drift' function, which is a coarse global approximation of the true function, and a local correction based on the correlation between the interpolation vectors. Given a set of evaluated vectors x_i ∈ R^d, i = 1...n, the Kriging model is trained such that it exactly interpolates the observed values, that is, m(x_i) = f(x_i), where m(x) and f(x) are the model and the true objective function, respectively. Using a constant drift function [15] gives the Kriging model

m(x) = β + κ(x),    (4)

with the drift function β and local correction κ(x). The latter is defined by a stationary Gaussian process with zero mean and covariance

Cov[κ(x), κ(y)] = σ² R(θ, x, y),    (5)

where R is a user-prescribed correlation function. A common choice is the Gaussian correlation function [15], defined as

R(θ, x, y) = ∏_{i=1}^{d} exp(−θ (x_i − y_i)²),    (6)

and combining it with the constant drift function transforms the model from (4) into the following form:

m(x) = β̂ + r(x)ᵀ R⁻¹ (f − 1β̂).    (7)

Here, β̂ is the estimated drift coefficient, R is the symmetric matrix of correlations between all interpolation vectors, f is the vector of objective values, and 1 is a vector whose elements are all equal to 1. rᵀ is the correlation vector between a new vector x and the sample vectors, namely,

rᵀ = [R(θ, x, x₁), ..., R(θ, x, x_n)].    (8)

The estimated drift coefficient β̂ and variance σ̂² are obtained from

β̂ = (1ᵀ R⁻¹ 1)⁻¹ 1ᵀ R⁻¹ f,    (9a)

σ̂² = (1/n) (f − 1β̂)ᵀ R⁻¹ (f − 1β̂).    (9b)

Fully defining the model requires the correlation parameter θ, whose optimal value θ* is commonly taken as the maximizer of the model likelihood or, equivalently, the minimizer of the negative log-likelihood of the model, that is,

θ* : min_θ { n log(σ̂²) + log(|R|) }.    (10)
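As a minimal illustration of Eqs. (7)–(10), the following sketch fits a constant-drift Kriging model with the Gaussian correlation function, choosing θ by a coarse grid search over the negative log-likelihood. The grid of candidate θ values and the small jitter term added for numerical stability are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def corr_matrix(X, Y, theta):
    # Gaussian correlation R(theta, x, y) = prod_i exp(-theta (x_i - y_i)^2), Eq. (6)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-theta * d2)

def fit_kriging(X, f, thetas=(1.0, 10.0, 100.0)):
    """Fit a constant-drift Kriging model, Eqs. (7)-(10); theta is chosen by
    a coarse grid search on the negative log-likelihood (an assumption)."""
    n = len(f)
    one = np.ones(n)
    best = None
    for theta in thetas:
        R = corr_matrix(X, X, theta) + 1e-8 * np.eye(n)   # jitter for stability
        Ri = np.linalg.inv(R)
        beta = (one @ Ri @ f) / (one @ Ri @ one)          # Eq. (9a)
        resid = f - one * beta
        sigma2 = (resid @ Ri @ resid) / n                 # Eq. (9b)
        _, logdet = np.linalg.slogdet(R)
        nll = n * np.log(sigma2) + logdet                 # Eq. (10)
        if best is None or nll < best[0]:
            best = (nll, theta, Ri, beta, resid)
    _, theta, Ri, beta, resid = best

    def predict(x):
        r = corr_matrix(np.atleast_2d(x), X, theta)[0]    # Eq. (8)
        return beta + r @ Ri @ resid                      # Eq. (7)
    return predict
```

Since the model is an interpolant, `predict` reproduces the training values up to the jitter-induced error, consistent with the condition m(x_i) = f(x_i).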
3.2. Description of the classifier

Classifiers originated in the domain of machine learning, and their goal is to predict the class of a vector [27]. Specifically, given a set of vectors, each associated with a group, the goal of a classifier is to map a new vector to one of these groups based on the similarity between the new vector and the existing ones. Accordingly, given a set of vectors x_i ∈ R^d, i = 1...n, with corresponding class labels, for example, in a two-class problem, F(x_i) ∈ I = {−1, +1}, a classifier performs the mapping

c(x) : R^d → I,    (11)

where I is the set of possible classes, and c(x) is the class assigned by the classifier. In this study, vectors are classified as either SF (a class label of +1) or SI (a class label of −1), which entails a two-class problem. To classify vectors, the proposed algorithm uses the nearest neighbor (NN) classifier [28], which assigns the new vector the same class as its closest training vector (x_NN), namely,

c(x) = F(x_NN) : d(x, x_NN) = min_{i=1...n} d(x, x_i),    (12)

where d(x, y) is a distance measure, such as the Euclidean distance. A variant of this classifier is the k nearest neighbors (k-NN) classifier, which
assigns the new vector the class most frequent among its k nearest neighbors. k-NN classifiers require the user to prescribe the parameter k, which affects the classifier's accuracy. To illustrate this, we tested k-NN classifiers with different k values on two well-established data sets, iris and yeast, from the UCI machine learning repository [29]. Each data set was split in an 80–20 training–testing ratio, and the number of misclassifications was recorded for each k value, with k = 1, 2, 5, 8, and 10. Table 1 gives the observed results, which confirm that the classification accuracy depended on the value of k. In practice, the optimal value of k is unknown a priori, while a poor parameter value can degrade the classifier's accuracy and hence the search efficacy. To address this, the proposed algorithm introduces a classifier-adaptation step in which it selects a value for the k parameter out of a family of candidate values. To identify a good value in a mathematically rigorous manner, the parameter calibration is treated under the statistical framework of model selection [30]. Specifically, for each candidate k value a corresponding classifier is generated, and its accuracy is estimated using an available sample of previously evaluated vectors. Lastly, the k value chosen is the one corresponding to the classifier with the best estimated accuracy. Also following the model-selection framework, the classifier's accuracy is estimated using the cross-validation (CV) technique, where the vectors stored in memory are split into a training set and a testing set. A classifier is trained using the former and its accuracy is estimated using the latter, with the following error measure:
e = Σ_{i=1}^{m} [ ĉ(x_i) ≠ F(x_i) ],    (13)
where ĉ(x) is a classifier trained using the training set, and x_i, i = 1...m, are the testing vectors. To identify a suitable split ratio between the training and testing sets, we performed numerical experiments, as described in Section 4.2.

3.3. Description of the optimization algorithm

The proposed algorithm begins by generating an initial sample of vectors using a Latin hypercube (LH) design of experiments [31]. This technique ensures the sample is space-filling, which improves the model accuracy. Briefly, for a sample of l vectors, the range of each variable is split into l equal intervals, and one point is sampled at random in each interval. Next, a sample point is selected at random and without replacement for each variable, and these points are combined to give a vector. This procedure is repeated l times to generate the complete sample. The sample vectors are then evaluated with the expensive simulation and stored in memory. The main optimization loop then begins: the proposed algorithm trains a Kriging model using only the SF vectors stored in memory, and then trains a classifier using all vectors stored in memory, namely, both SF and SI, to account for both vector classes. The best vector in memory is taken as the TR center (x_c), and the proposed algorithm performs a trial step by seeking the optimum
Table 1. Misclassification errors of a k-NN classifier for different k values.

k     Iris   Yeast
1     3      375
2     5      388
5     1      330
8     2      321
10    2      313
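The classifier-calibration step described in Section 3.2, which chooses k from a family of candidates by cross-validation using the error measure of Eq. (13), can be sketched as follows. The candidate set, the hold-out split handling, and the tie-breaking rule are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # assign the class most frequent among the k nearest training vectors
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]

def calibrate_k(X, y, candidates=(1, 2, 5, 8, 10), split=0.2, seed=0):
    """Pick k by hold-out cross-validation, Eq. (13):
    e = sum_i [c_hat(x_i) != F(x_i)] over the testing set.
    `split` is the training fraction (e.g. 0.2 for a 20-80 split)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_tr = max(1, int(split * len(y)))
    tr, te = idx[:n_tr], idx[n_tr:]
    errors = {}
    for k in candidates:
        kk = min(k, len(tr))  # cannot use more neighbors than training points
        errors[k] = sum(knn_predict(X[tr], y[tr], X[i], kk) != y[i] for i in te)
    return min(errors, key=errors.get)  # k with the smallest CV error
```

In the proposed algorithm, the chosen k would then define the classifier trained on all vectors stored in memory (both SF, labeled +1, and SI, labeled −1).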
Table 2. Internal parameters of the EA utilized in this study (based on Ref. [32]).

Population size   100
Generations       100
Selection         Stochastic universal selection (SUS)
Recombination     Intermediate, applied with probability p = 0.7
Mutation          Breeder Genetic Algorithm (BGA) mutation, applied with probability p = 0.1
Elitism           10%
of the model in the TR. As the optimizer, it uses the real-coded EA of Chipperfield et al. [32], and since evaluating the model is computationally cheap, the EA uses a large population and many generations to improve its search. Table 2 gives the full EA parameter settings. However, during the trial step the EA does not obtain the predicted objective values from the model directly, but instead from the following modified objective function:

m̂(x) = m(x) if c(x) is SF,  m̂(x) = p if c(x) is SI,    (14)

where m(x) is the model prediction, and p is a penalized fitness taken to be the worst function value from the initial LH sample. Accordingly, the EA receives the model prediction if the classifier predicts a vector is SF, but receives the penalized fitness otherwise. In this setup, the information about SI vectors is preserved in the classifier, but they are not incorporated into the model with a penalized fitness, and hence do not deform its landscape, which avoids the issues discussed in Section 2.2. Next, the optimum found by the EA, namely x*, is evaluated with the true expensive function, which yields f(x*). Based on the outcome of the trial step, the proposed algorithm manages the model and the TR as follows:

- If f(x*) < f(x_c): The trial step was successful, and so the TR is centered at the new optimum and its radius is doubled.
- If f(x*) ≥ f(x_c) and there are sufficient SF points inside the TR: The trial step was unsuccessful, and since there are enough points in the TR, the model is considered sufficiently accurate to justify contracting the TR. Accordingly, the TR radius is halved.
- If f(x*) ≥ f(x_c) and there are insufficient SF points inside the TR: The search was unsuccessful, but this may be due to poor model accuracy resulting from too few points in the TR. Accordingly, the proposed algorithm adds a new point (x_n) inside the TR, using the procedure explained below.

In comparison to the classical TR framework, the above steps also monitor the number of vectors in the TR, since the model is an interpolant. Specifically, contracting the TR when there are too few vectors in it may lead to premature convergence [10]. To select a suitable value for the threshold number of vectors in the TR, p, we performed numerical experiments, as described in Section 4.2. Another change from the classical TR framework is the addition of a new vector (x_n) in the TR to improve the model accuracy.
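The modified objective of Eq. (14) and the three-way TR update just described can be sketched as follows. The function and argument names are illustrative assumptions; `model` and `classifier` stand for the trained Kriging model and k-NN classifier, and a minimization problem is assumed throughout.

```python
def modified_objective(x, model, classifier, penalty):
    """Eq. (14): the EA sees the model prediction only for vectors the
    classifier predicts to be SF (+1); predicted-SI vectors get a penalty."""
    return model(x) if classifier(x) == +1 else penalty

def update_trust_region(f_new, f_center, radius, n_sf_in_tr, min_pts):
    """TR management after a trial step (assumed helper, minimization).
    Returns the new radius and the action taken."""
    if f_new < f_center:
        return 2.0 * radius, "recenter"   # success: recenter at x* and double
    if n_sf_in_tr >= min_pts:
        return 0.5 * radius, "contract"   # enough SF points: model trusted, halve
    return radius, "add_point"            # too few points: sample a new x_n in the TR
```

Note that `min_pts` plays the role of the threshold parameter p of the text (distinct from the penalty p of Eq. (14)).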
Since this accuracy depends on the distance between the interpolation vectors [33], the new vector should be remote from the existing ones in the TR. Mathematically, this translates to the following max–min problem:

x_n : max_{x ∈ T} min_{x_i ∈ T} { ‖x − x_i‖₂ },    (15)

where x_i, i = 1...l, are the existing vectors in the TR [34]. To simplify the solution of Eq. (15), the proposed algorithm generates an LH sample in the TR and chooses the sample vector with the largest minimum distance.
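The procedure just described, an LH sample in the TR box followed by the max–min selection of Eq. (15), can be sketched as follows; the candidate sample size and the axis-aligned box representation of the TR are illustrative assumptions.

```python
import numpy as np

def latin_hypercube(l, d, rng):
    """LH design: each variable's range is split into l intervals with one
    random point per interval; each column is then shuffled independently,
    combining the per-variable samples without replacement."""
    u = (rng.random((l, d)) + np.arange(l)[:, None]) / l  # one point per interval
    for j in range(d):
        rng.shuffle(u[:, j])
    return u

def maxmin_new_point(existing, lower, upper, l=50, rng=None):
    """Eq. (15): among an LH sample in the box [lower, upper], return the
    candidate whose minimum distance to the existing vectors is largest."""
    rng = rng or np.random.default_rng()
    cand = lower + latin_hypercube(l, len(lower), rng) * (upper - lower)
    dmin = np.linalg.norm(cand[:, None, :] - existing[None, :, :], axis=2).min(axis=1)
    return cand[np.argmax(dmin)]
```

The same routine serves both uses in the text: with the TR bounds it yields the interior point x_n, and with the full search-space bounds it yields the global exploration point added upon stagnation.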
The trial step may also fail when, in spite of the model correctly predicting the location of a new optimum, the classifier erroneously predicts this vector is SI, so the optimization search is biased away from it. This implies that it is necessary to also monitor the effect of the classifier on the search. To accomplish this, the proposed algorithm checks the number of successive failed iterations, that is, iterations in which no better optimum was found. If there have been r such successive iterations, where r is a user-prescribed parameter, another trial step is performed but with the classifier disabled, so the EA receives only the model's predictions. The optimum obtained in this step is compared to the one from the baseline TR trial step, namely, with the classifier enabled, and if the two optima differ, the classifier is affecting the search. Accordingly, to improve the classifier, a new vector is added by a procedure similar to the one described above, that is, by generating a Latin hypercube design (LHD) sample and selecting a vector based on the max–min criterion. However, the sample is now generated in the cuboid bounded by the two predicted optima. To select a suitable value for r, we performed numerical experiments, as described in Section 4.2. Lastly, if the TR has contracted q consecutive times, this is taken as an indication of convergence to an optimum. Therefore, the proposed algorithm adds a new vector outside the TR to improve the model accuracy globally and to assist in the discovery of new optima. This new vector is generated using the same procedure as for the new interior point x_n, but now considering the entire search space instead of just the TR. To select a suitable value for q, we performed numerical experiments, as described in Section 4.2. To complete the description, Table 3 gives the pseudocode of the proposed algorithm, and Fig. 3 shows the workflow of the optimization iteration.

4. Results and discussion

This section analyzes the performance of the proposed algorithm as follows: it first describes the test problem used for the performance analysis, it then provides a parameter sensitivity analysis
used to select suitable settings for the algorithm parameters, and lastly it describes benchmark tests in which the proposed algorithm was compared to existing model-assisted algorithms, and extensively analyzes the results.

4.1. Problem description

As a pertinent test problem, which is both simulation-driven and contains SI vectors, we used the engineering application of airfoil shape optimization, formulated as follows. During flight, an aircraft generates lift, namely, the force which counters the aircraft's weight and keeps it airborne, and drag, an aerodynamic friction force obstructing the aircraft's movement. In practice, the design requirements for airfoils are specified in terms of the nondimensional lift and drag coefficients, c_l and c_d, respectively, defined as
c_l = L / (½ ρ V² S),    (16a)

c_d = D / (½ ρ V² S),    (16b)
where L and D are the lift and drag forces, respectively, ρ is the air density, V is the aircraft speed, and S is a reference area, such as the wing area. Besides the lift and drag coefficients, the other relevant physical quantities are the aircraft altitude, speed, and angle of attack (AOA), which is the angle between the aircraft velocity and the airfoil chord line. Fig. 4a shows the setup of the airfoil problem. In the airfoil problem used here, the optimization goal was to find an airfoil shape which maximizes the lift coefficient (c_l) and minimizes the aerodynamic drag coefficient (c_d) at the prescribed flight altitude, speed, and AOA. Also, to ensure structural integrity, between 0.2 and 0.8 of the chord length the airfoil's thickness (t) must not be smaller than a critical value t* = 0.1. To facilitate the model-assisted optimization, namely, so the algorithm would rely on a single model to account for both the
Table 3. Pseudocode of the proposed optimization algorithm with the adaptive k-NN classifier.

generate an initial LH sample;
evaluate the sample vectors and store them in memory;
repeat
    set the best vector in memory as the TR center;
    train a model using the SF vectors stored in memory;
    /* classifier calibration step */
    for k = {k1, k2, ...} do
        test the k-NN classifier accuracy using CV;
    record the k parameter corresponding to the most accurate classifier;
    define a k-NN classifier with the optimal k parameter, and with all vectors stored in memory;
    /* TR trial step */
    search for the model optimum using an EA and the modified objective function;
    evaluate the predicted optimum;
    /* managing the model and TR */
    if the new optimum is better than the TR center then
        double the TR radius;
    else if the new optimum is not better than the TR center and there are insufficient vectors in the TR then
        add a new point in the TR to improve the model;
    else if the new optimum is not better than the TR center and there are sufficient vectors in the TR then
        halve the TR radius;
    /* improving the classifier */
    if there have been r unsuccessful iterations then
        obtain an optimum with the classifier disabled;
        if the new optimum differs from the one from the baseline TR step then
            add a new vector to improve the classifier;
    /* checking for search stagnation */
    if there have been q consecutive TR contractions then
        add a vector outside the TR to improve the model and classifier global accuracy;
until the maximum number of analyses is completed;
Fig. 3. Workflow of an optimization iteration in the proposed algorithm: train a model; select an optimal k parameter for the classifier; perform a TR trial step using the modified function; update the TR and model; check for classifier improvement.
Fig. 4. (a) Nomenclature schematic of the airfoil optimization problem; (b) the relationship between the number of SI designs and the angle of attack.
objective function and the thickness constraint, we incorporated the latter into the objective function using a penalty approach, which resulted in the following objective function:

f = −(c_l / c_d) + p,    (17a)

where p is the penalty for airfoils which violate the thickness constraint, defined as

p = (t* − t)(c_l / c_d) if t < t*,  p = 0 otherwise.    (17b)
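Under the formulation above, the penalized objective of Eq. (17) can be sketched as follows. The function name is an illustrative assumption, and the exact penalty form is a reconstruction of Eq. (17b); the key point is that minimizing f maximizes c_l/c_d, while a sub-critical thickness shifts f towards worse (less negative) values.

```python
def airfoil_objective(cl, cd, t, t_crit=0.1):
    """Eq. (17), minimized: f = -cl/cd + p, with a thickness penalty
    p = (t_crit - t) * (cl/cd) applied when t < t_crit (reconstructed form)."""
    ratio = cl / cd
    p = (t_crit - t) * ratio if t < t_crit else 0.0
    return -ratio + p
```

For example, an airfoil with c_l/c_d = 50 and adequate thickness scores −50, while the same airfoil with t = 0.08 is penalized towards a worse value.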
The flight conditions, which were fixed for all tests, were a cruise altitude of 30,000 ft and a cruise speed of Mach 0.7 (70% of the speed of sound). Candidate airfoils were represented with the Hicks–Henne parameterization, which uses a baseline airfoil and smoothly modifies it using a set of shape functions b_i(x), i = 1...h, where h is prescribed by the user [35]. The upper and lower profiles of a candidate airfoil were then described by

y = y_b + Σ_{i=1}^{h} a_i b_i(x),    (18)

where y_b is the baseline upper/lower profile, taken from the NACA0012 symmetric airfoil, a_i ∈ [−0.01, 0.01] are the coefficients (design variables) to be found, and the basis functions are [36]

b_i(x) = [ sin( π x^{log(0.5)/log(i/(h+1))} ) ]⁴,    (19)

with h = 10 functions for each of the upper and lower profiles, or a total of 20 functions per airfoil. The lift and drag coefficients were obtained with XFoil, a computational fluid dynamics simulation for the analysis of isolated airfoils operating in the subsonic regime [37]. Each airfoil evaluation required up to 30 s on a desktop computer. Fig. 4a summarizes the nomenclature of the different quantities involved in the design problem.

Since the turbulence of the flow field increases with the angle of attack (AOA), at higher AOA values it is more difficult to complete the aerodynamic simulation, and more trial designs will be SI. To verify this, 30 different airfoils were sampled and evaluated under identical flight conditions, except for the AOA, which was increased from 20° to 50°. Fig. 4b shows the obtained results, where, as expected, the number of failed evaluations increased with the AOA. Therefore, by changing the AOA we could change the density of SI vectors in the search space, and hence the relative difficulty of the tests. In view of these results, the proposed algorithm was tested at the three settings AOA = 20°, 30°, and 40°, which, following Fig. 4b, correspond to a low, medium, and high density of SI vectors in the search space, respectively.

4.2. Parameter sensitivity analysis
As described in Section 3, the proposed algorithm relies on four main parameters, namely:
- p: the minimum number of vectors in the TR needed to contract the TR.
- q: the minimum number of successive TR contractions needed to add a new vector outside the TR.
- r: the minimum number of unsuccessful iterations needed to add a new vector to improve the classifier.
Y. Tenne / Advances in Engineering Software 47 (2012) 62–71
Table 4
Performance statistics for different parameter values.

(a) p, number of vectors in the TR needed to contract the TR:
          p = 2        p = 10       p = 20
Mean     -2.052e+00   -2.121e+00   -2.143e+00
SD        2.570e-02    5.067e-02    2.999e-02
Median   -2.057e+00   -2.127e+00   -2.142e+00
Min      -2.099e+00   -2.188e+00   -2.191e+00
Max      -2.015e+00   -2.036e+00   -2.109e+00

(b) r, number of unsuccessful iterations to improve the classifier:
          r = 2        r = 4        r = 6
Mean     -2.052e+00   -2.062e+00   -2.091e+00
SD        2.570e-02    5.793e-02    5.245e-02
Median   -2.057e+00   -2.049e+00   -2.088e+00
Min      -2.099e+00   -2.203e+00   -2.160e+00
Max      -2.015e+00   -1.999e+00   -2.018e+00

(c) q, number of unsuccessful iterations to add a global point:
          q = 2        q = 1        q = 5
Mean     -2.052e+00   -2.089e+00   -2.059e+00
SD        2.570e-02    5.181e-02    3.377e-02
Median   -2.057e+00   -2.072e+00   -2.060e+00
Min      -2.099e+00   -2.181e+00   -2.108e+00
Max      -2.015e+00   -2.019e+00   -2.010e+00

(d) Split ratio (%) between the training and testing samples:
          80–20        50–50        20–80
Mean     -2.052e+00   -2.079e+00   -2.086e+00
SD        2.570e-02    5.047e-02    5.025e-02
Median   -2.057e+00   -2.070e+00   -2.104e+00
Min      -2.099e+00   -2.154e+00   -2.166e+00
Max      -2.015e+00   -1.997e+00   -2.005e+00
- The split ratio between the training and testing sets, used in the classifier adaptation stage.

To identify suitable values for each of these parameters, we benchmarked the proposed algorithm in identical settings except that one parameter was changed at a time, with three candidate values per parameter and 10 repetitions per candidate value. Table 4 shows the obtained results, from which it follows that suitable parameter settings are: p = 20, q = 1, r = 6, and a split ratio of 20–80%. Accordingly, these settings were used throughout the benchmark tests described in the following section.

4.3. Benchmark tests

For a comprehensive evaluation, the proposed algorithm was benchmarked against the following two representative model-assisted EAs from the literature:

- Model-assisted EA with periodic sampling (EA–PS) [38]: The algorithm safeguards the model accuracy by periodically evaluating a small subset of the population with the true objective function, and incorporating the new members into the model. It begins by generating candidate solutions using a LH design, evaluates them with the expensive function, and trains an initial Kriging model. A real-coded EA then runs for 10 generations while evaluating only the model. Next, the algorithm evaluates the best ten candidate solutions in the population with the true expensive function, and incorporates them into the model. The process repeats until the maximum number of expensive function evaluations is reached. In the tests, the EA was identical to the one in the proposed algorithm (Table 2).

- Expected-improvement with model-assisted CMA-ES (EI–CMA-ES) [22]: The algorithm searches for and evaluates solutions which represent different trade-offs between a global and a local search. It begins by generating an initial sample of candidate solutions. The main loop then begins, where at each generation the algorithm trains a local Kriging model around the current best solution using both the recently evaluated vectors and those stored in memory which are nearest to the best solution. The algorithm excludes SI vectors from the model training set. Next, the algorithm seeks the optimum of the model in the bounded region defined by the latter two sets of solutions, namely, the recently evaluated ones and the nearest neighbors. As the optimizer, it uses the covariance matrix adaptation evolutionary strategy (CMA-ES). In the spirit of the expected-improvement framework [39], during this optimization step the algorithm uses the following merit function:
f̂(x) = m(x) - l·f(x),   (20)
where m(x) is the prediction of the Kriging model, l is a prescribed coefficient, and f(x) is the estimate of the model prediction error, which is zero at sampled points, since there the true objective value is known. The search is repeated for l = 0, 1, 2, and 4, to obtain four solutions corresponding to different search profiles, ranging from a local search (l = 0) to a more explorative one (l = 4). All non-duplicate solutions found are evaluated with the true expensive function and stored in memory. In case no new solutions were evaluated, for example because they already exist in the cache, the algorithm generates a new solution by perturbing the current elite. Following Ref. [22], the algorithm used a training set of 100 vectors comprising the 50 most recently evaluated ones and the 50 nearest neighbors, and used the CMA-ES implementation from Ref. [40].

To gauge the contribution of the classifier adaptation stage, we also ran the proposed algorithm with the parameter k fixed to 1; this variant was identical to the proposed algorithm except that it did not employ classifier adaptation. In all tests the optimization budget was 200 evaluations of the true objective function, namely, simulation runs, and the size of the initial sample was 20. To support a valid statistical analysis, 30 runs were repeated for each combination of algorithm and AOA.

Table 5 gives the test statistics for the four algorithms across the different AOAs, where the best mean and median scores are shown in bold. The table also gives the significance level α, either 0.01 or 0.05, at which the proposed algorithm was better than each of
Table 5
Optimization results for different values of the angle of attack.

AOA = 20
             P-A          P-F          EA–PS        EI–CMA-ES
Mean        -3.389e+00   -3.363e+00   -3.288e+00   -3.294e+00
SD           6.968e-02    9.178e-02    8.834e-02    3.154e-01
Median      -3.384e+00   -3.374e+00   -3.270e+00   -3.282e+00
Min (best)  -3.509e+00   -3.472e+00   -3.585e+00   -4.433e+00
Max (worst) -3.270e+00   -3.190e+00   -3.205e+00   -2.958e+00
α                                      0.01         0.05

AOA = 30
             P-A          P-F          EA–PS        EI–CMA-ES
Mean        -2.670e+00   -2.680e+00   -2.640e+00   -2.641e+00
SD           2.608e-02    2.508e-02    1.251e-02    5.204e-02
Median      -2.673e+00   -2.673e+00   -2.638e+00   -2.652e+00
Min (best)  -2.714e+00   -2.723e+00   -2.671e+00   -2.733e+00
Max (worst) -2.631e+00   -2.640e+00   -2.625e+00   -2.532e+00
α                                      0.01

AOA = 40
             P-A          P-F          EA–PS        EI–CMA-ES
Mean        -2.104e+00   -2.083e+00   -2.027e+00   -2.058e+00
SD           5.973e-02    3.327e-02    3.942e-02    1.033e-01
Median      -2.089e+00   -2.084e+00   -2.013e+00   -2.068e+00
Min (best)  -2.222e+00   -2.144e+00   -2.129e+00   -2.236e+00
Max (worst) -2.022e+00   -2.037e+00   -1.981e+00   -1.784e+00
α                                      0.01

P-A: proposed algorithm with classifier adaptation. P-F: proposed algorithm but with a fixed classifier (k = 1). EA–PS: EA with periodic sampling [38]. EI–CMA-ES: expected-improvement framework with a CMA-ES optimizer [22].
the other three algorithms, and where an empty cell indicates that the performance difference was not statistically significant at these levels. Statistical significance tests were done using the Mann–Whitney nonparametric test [41]. From an analysis of the test results at each AOA setting, it follows that:

- AOA = 20°: The proposed algorithm obtained both the best mean and median scores, followed by the variant with a fixed k parameter (P-F). The proposed algorithm also had the best (lowest) standard deviation, followed by the EA–PS algorithm.
- AOA = 30°: The P-F variant had the best mean statistic, closely followed by the proposed algorithm. Both had the same median statistic, and were better than the other two algorithms, namely, EA–PS and EI–CMA-ES. With respect to the standard deviation, the EA–PS algorithm had the best score, followed by the P-F variant.
- AOA = 40°: Similarly to the AOA = 20° case, the proposed algorithm had the best mean and median scores, followed by the P-F variant. With respect to the standard deviation, the P-F variant had the best score, followed by the EA–PS algorithm.

Overall, the results show that the proposed algorithm obtained either the best or second-best objective value, while its performance was robust, namely, characterized by a low standard deviation, across the three AOA settings. The statistics also highlight the specific contribution of the classifier adaptation stage, as the proposed algorithm consistently outperformed the P-F variant, which uses a fixed classifier parameter. Lastly, the analysis shows that in the AOA = 20° case the proposed algorithm achieved performance gains which were statistically significant over both reference algorithms, namely, EA–PS and EI–CMA-ES. At the 30° and 40° settings its performance was statistically significantly better than the EA–PS algorithm. Investigating another performance aspect, Table 6 gives the test statistics for the number of SI vectors encountered during the optimization search.
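The significance levels reported in Table 5 follow from pairwise comparisons of this form; the sketch below uses SciPy's Mann–Whitney U test on illustrative samples (the arrays are made-up stand-ins, not the paper's data):

```python
from scipy.stats import mannwhitneyu

# Illustrative final-objective samples from repeated runs of two
# algorithms (made-up values; lower is better, as in Table 5):
proposed  = [-3.41, -3.38, -3.44, -3.39, -3.35, -3.42, -3.37, -3.40]
reference = [-3.28, -3.25, -3.31, -3.27, -3.22, -3.30, -3.26, -3.24]

# Two-sided Mann-Whitney nonparametric test; the difference is deemed
# significant at level alpha when the p-value falls below alpha.
stat, p_value = mannwhitneyu(proposed, reference, alternative="two-sided")
for alpha in (0.01, 0.05):
    if p_value < alpha:
        print(f"significant at alpha = {alpha}")
        break
else:
    print("not significant at the tested levels")
```

Being rank-based, the test makes no normality assumption about the objective values across runs, which suits the small samples (30 runs) used here.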
This data is important as it indicates the efficiency of the algorithms, namely, the number of failed evaluations, and therefore of wasted computer resources, which occurred during their respective runs. It follows that the proposed algorithm consistently obtained the second-best score for the mean statistic across all AOA cases. The EA–PS algorithm had the best (lowest) mean score, which is attributed to its incorporating the penalized SI vectors into the model. This strongly biases the search away from SI vectors, but results in a deformed model landscape, as discussed in Section 2.1, and a poor final result, as evident from Table 5. Overall, these results show that the proposed algorithm also performed an efficient search, namely, one characterized by a competitively low number of SI vectors.

To visualize the performance of the four algorithms, Fig. 5 provides representative convergence logs for the AOA = 20° and 40° cases. Each curve corresponds to a single run by one algorithm. The plots show that the proposed algorithm converged much faster than the other algorithms, including the P-F variant with a fixed k parameter, and obtained a better final result.

Another important performance aspect is how frequently the k parameter was updated during the optimization searches, namely, whether a single value was chosen and used during most of the search, or whether the selected value was updated frequently. Fig. 6 shows two representative plots of the optimal k parameter chosen during a test run with AOA = 20°, and during another with AOA = 40°. It follows that the optimal k value was updated frequently during both runs, indicating that there was no single selected value, and that it varied during the search. This further highlights the merit of the proposed algorithm, as it continuously adapts the k parameter. With respect to the algorithms’ run time, namely, the time consumed by training a model, performing a search using the model, and so on, it was
Table 6
Variation of the number of SI vectors with respect to the value of the angle of attack.

AOA = 20
          P-A          P-F          EA–PS        EI–CMA-ES
Mean      3.700e+00    3.900e+00    2.100e+00    1.055e+01
SD        4.001e+00    4.149e+00    1.553e+00    2.793e+01
Median    1.500e+00    3.000e+00    2.000e+00    1.000e+00
Min       0.000e+00    1.000e+00    0.000e+00    0.000e+00
Max       1.100e+01    1.500e+01    5.000e+00    1.160e+02

AOA = 30
          P-A          P-F          EA–PS        EI–CMA-ES
Mean      6.800e+00    8.200e+00    4.650e+00    1.075e+01
SD        5.391e+00    4.780e+00    2.519e+00    2.084e+01
Median    5.000e+00    8.000e+00    4.000e+00    3.000e+00
Min       1.000e+00    1.000e+00    0.000e+00    0.000e+00
Max       1.700e+01    1.700e+01    1.100e+01    8.500e+01

AOA = 40
          P-A          P-F          EA–PS        EI–CMA-ES
Mean      4.400e+01    4.830e+01    2.960e+01    3.565e+01
SD        1.795e+01    9.274e+00    6.056e+00    3.164e+01
Median    4.200e+01    4.800e+01    2.950e+01    3.000e+01
Min       2.400e+01    3.100e+01    1.800e+01    2.000e+00
Max       8.300e+01    6.800e+01    4.200e+01    1.030e+02

P-A: proposed algorithm with classifier adaptation. P-F: proposed algorithm but with a fixed classifier (k = 1). EA–PS: EA with periodic sampling [38]. EI–CMA-ES: expected-improvement framework with a CMA-ES optimizer [22].
Fig. 5. Representative convergence plots for the four algorithms.
Fig. 6. The optimal value of the k parameter selected during two test runs.
comparable for all four algorithms and was negligible compared to the time consumed by simulation runs. Lastly, Fig. 7 shows representative airfoils obtained by the proposed algorithm in each of the AOA cases, showing that it achieved valid profiles.

Overall, the results show that the proposed algorithm outperformed the two reference algorithms from the literature in terms of the final objective value obtained, and was able to maintain a competitively low number of SI vectors during the search, so fewer computer resources were wasted on failed evaluations. Its convergence was also significantly faster, implying it was able to achieve a good solution using a smaller number of expensive evaluations. The results also highlight the specific contribution of the proposed classifier
Fig. 7. Representative airfoils obtained by the proposed algorithm, shown in black. For comparison, the baseline NACA0012 airfoil is also shown, in gray.
adaptation stage, as evident from the comparisons to the P-F variant, which did not use classifier adaptation.

5. Summary

The modern engineering design process often relies on computer simulations to evaluate candidate designs, a setup which results in an expensive black-box optimization problem. In practice, there will often exist candidate designs which cause the computer simulation to fail, and so such designs will have no objective value assigned to them. This scenario can lead to search stagnation and a poor final result. Existing approaches to handling such candidate designs either assign them a penalized objective value or discard them altogether, but both of these approaches have significant demerits, as discussed in Section 2.

Accordingly, this study has proposed a new computational intelligence optimization algorithm which combines a model and a k-NN classifier. The latter predicts which solutions are expected to crash the simulation, and its prediction is incorporated with the model's prediction to bias the search towards candidate designs for which the simulation is expected to complete successfully. However, the classifier's accuracy depends on the parameter k, namely, the number of nearest neighbors to consider, and the optimal parameter value is typically not known a priori and may be impractical to obtain by numerical experiments due to the limited computational budget. Therefore, the proposed algorithm incorporates a classifier adaptation stage where, using model selection techniques, it identifies the best k value out of a family of candidate values. The parameter is continuously updated during the search.

Section 4 presented an extensive performance analysis, starting with a parameter sensitivity analysis aimed at identifying a suitable set of values for the parameters of the proposed algorithm.
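To make the classifier adaptation stage concrete, the sketch below recreates its main idea in plain Python: split the evaluated vectors into training and testing sets, and keep the candidate k with the highest classification accuracy. This is an illustrative reconstruction under stated assumptions (Euclidean distance, majority vote, a 20–80% train–test split), not the paper's actual implementation:

```python
import random

def knn_predict(train_x, train_y, x, k):
    """Classify x by a majority vote among its k nearest training vectors."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def adapt_k(vectors, labels, candidates=(1, 3, 5), train_frac=0.2):
    """Classifier adaptation: train on a fraction of the evaluated
    vectors, score each candidate k on the remaining testing set,
    and keep the most accurate one."""
    idx = list(range(len(vectors)))
    random.shuffle(idx)
    n_train = max(1, int(train_frac * len(vectors)))
    train_i, test_i = idx[:n_train], idx[n_train:]
    train_x = [vectors[i] for i in train_i]
    train_y = [labels[i] for i in train_i]
    best_k, best_acc = candidates[0], -1.0
    for k in candidates:
        hits = sum(knn_predict(train_x, train_y, vectors[j], k) == labels[j]
                   for j in test_i)
        acc = hits / len(test_i)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k

# Toy usage: vectors near the origin are 'valid', distant ones are 'SI'.
random.seed(1)
vectors = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(40)]
labels = ['SI' if x * x + y * y > 0.5 else 'valid' for x, y in vectors]
print(adapt_k(vectors, labels))
```

In the proposed algorithm this selection is repeated during the search, which is why the chosen k can vary as new vectors are evaluated (cf. Fig. 6).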
Next, the algorithm was benchmarked against two representative model-assisted algorithms, and against a variant of itself with no classifier adaptation. All algorithms were benchmarked using a simulation-driven engineering application of airfoil shape optimization. The results and analysis show that:

- The proposed algorithm consistently obtained either the best or near-best objective value.
- Its performance was robust, namely, having a small standard deviation, across different problem settings.
- During the optimization search, it generated a competitively small number of SI vectors, so it reduced the amount of computer resources wasted in failed simulation runs.
- Its convergence was typically much faster than that of the other three algorithms.
- The parameter k was updated frequently, indicating there was no single optimal value, and that the optimal value varied during the optimization search.

Accordingly, the proposed algorithm appears to be an adequate strategy for effectively and efficiently handling SI vectors in expensive optimization problems.

References

[1] Santner TJ, Williams BJ, Notz W. The design and analysis of computer experiments. Springer series in statistics. New York; Berlin: Springer; 2003.
[2] Schwefel H-P. Evolution and optimum seeking. In: Sixth-generation computer technology series. New York: Wiley; 1995. [3] Simpson TW, Poplinski JD, Koch PN, Allen JK. Metamodels for computer-based engineering design: survey and recommendations. Eng Comput 2001;17:129–50. [4] Lim D, Jin Y, Ong Y-S, Sendhoff B. Generalizing surrogate-assisted evolutionary computation. IEEE Trans Evolut Comput 2010;14(3):329–55. [5] Tenne Y, Armfield SW. A framework for memetic optimization using variable global and local surrogate models. J Soft Comput 2009;13(8):781–93. [6] Madych WR. Miscellaneous error bounds for multiquadric and related interpolators. Comput Math Appl 1992;24(12):121–38. [7] Jin Y, Olhofer M, Sendhoff B. A framework for evolutionary optimization with approximate fitness functions. IEEE Trans Evolut Comput 2002;6(5):481–94. [8] Wujek BA, Renaud JE. New adaptive move-limit management strategy for approximate optimization, part 1. AIAA J 1998;10:1911–21. [9] Wujek BA, Renaud JE. New adaptive move-limit management strategy for approximate optimization, part 2. AIAA J 1998;10:1922–34. [10] Conn AR, Scheinberg K, Toint PL. On the convergence of derivative-free methods for unconstrained optimization. In: Iserles A, Buhmann MD, editors. Approximation theory and optimization: tributes to M.J.D. Powell. Cambridge; New York: Cambridge University Press; 1997. p. 83–108. [11] Conn AR, Gould NIM, Toint PL. Trust region methods. Philadelphia, Pennsylvania: SIAM; 2000. [12] Ong Y-S, Nair PB, Keane AJ. Evolutionary optimization of computationally expensive problems via surrogate modeling. AIAA J 2003;41(4):687–96. [13] Tenne Y, Armfield SW. A memetic algorithm using a trust-region derivativefree optimization with quadratic modelling for optimization of expensive and noisy black-box functions. In: Yang S, Ong Y-S, Jin Y, editors. Evolutionary computation in dynamic and uncertain environments. Studies in computational intelligence, vol. 51. Berlin: Springer-Verlag; 2007. p. 
389–415. [14] Tenne Y, Armfield SW. A versatile surrogate-assisted memetic algorithm for optimization of computationally expensive functions and its engineering applications. In: Yang A, Shan Y, Thu Bui L, editors. Success in evolutionary computation. Studies in computational intelligence, vol. 92. Berlin; Heidelberg: Springer-Verlag; 2008. p. 43–72. [15] Koehler JR, Owen AB. Computer experiments. In: Ghosh S, Rao CR, Krishnaiah PR, editors. Handbook of statistics. Amsterdam: Elsevier; 1996. p. 261–308. [16] Rasheed K, Hirsh H, Gelsey A. A genetic algorithm for continuous design space search. Artif Intell Eng 1997;11:295–305. [17] Conn AR, Scheinberg K, Toint PL. A derivative free optimization algorithm in practice. In: Proceedings of the seventh AIAA/USAF/NASA/ISSMO symposium on multidisciplinary analysis and optimization. American Institute of Aeronautics and Astronautics, Reston, Virginia, AIAA Paper AIAA-1998-4718; 1998. [18] Poloni C, Giurgevich A, Onesti L, Pediroda V. Hybridization of a multi-objective genetic algorithm, a neural network and a classical optimizer for a complex design problem in fluid dynamics. Comput Methods Appl Mech Eng 2000;186(2–4):403–20. [19] Okabe T. Stabilizing parallel computation for evolutionary algorithms on real-world applications. In: Proceedings of the 7th international conference on optimization techniques and applications–ICOTA 7. Tokyo: Universal Academy Press; 2007. p. 131–2. [20] Emmerich MTM, Giotis A, Özdemir M, Bäck T, Giannakoglou KC. Metamodel-assisted evolution strategies. In: Merelo Guervós JJ, editor. The 7th international conference on parallel problem solving from nature–PPSN VII. Lecture notes in computer science, vol. 2439. Berlin: Springer; 2002. p. 361–70. [21] Liang K-H, Yao X, Newton C. Evolutionary search of approximated N-dimensional landscapes. Int J Knowl-Based Intell Eng Syst 2000;4(3):172–83. [22] Büche D, Schraudolph NN, Koumoutsakos P.
Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Trans Syst, Man, Cybern–Part C 2005;35(2):183–94. [23] Handoko S, Kwoh CK, Ong Y-S. Feasibility structure modeling: an effective chaperon for constrained memetic algorithms. IEEE Trans Evolut Comput 2010;14(5):740–58. [24] Tenne Y, Izui K, Nishiwaki S. Handling undefined vectors in expensive optimization problems. In: Di Chio C, editor. Proceedings of the 2010 EvoStar conference. Lecture notes in computer science, vol. 6024. Berlin: Springer; 2010. p. 582–91. [25] Cressie NAC. Statistics for spatial data. New York: Wiley; 1993. [26] MacKay DJC. Gaussian processes – a replacement for supervised neural networks? Lecture notes. Cavendish Laboratory, Department of Physics, Cambridge University; 1997. [27] Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. New York: Wiley; 2001. [28] MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Neyman J, Le Cam LM, editors. Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. University of California Press; 1967. p. 281–97.
[29] Frank A, Asuncion A. UCI machine learning repository; 2010. [30] Burnham KP, Anderson DR. Model selection and inference: a practical information-theoretic approach. New York: Springer; 2002. [31] McKay MD, Beckman RJ, Conover WJ. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 1979;21(2):239–45. [32] Chipperfield A, Fleming P, Pohlheim H, Fonseca C. Genetic algorithm toolbox for use with MATLAB, version 1.2. Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield; 1994. [33] Schaback R. Multivariate interpolation and approximation by translates of a basis function. In: Chui CK, Schumaker LL, editors. Approximation theory VIII. Singapore; River Edge, New Jersey: World Scientific; 1995. p. 491–514. [34] Johnson ME, Moore LM, Ylvisaker D. Minimax and maximin distance designs. J Statist Plan Infer 1990;26(2):131–48. [35] Hicks RM, Henne PA. Wing design by numerical optimization. J Aircraft 1978;15(7):407–12.
[36] Wu H-Y, Yang S, Liu F, Tsai H-M. Comparison of three geometric representations of airfoils for aerodynamic optimization. In: Proceedings of the 16th AIAA computational fluid dynamics conference. American Institute of Aeronautics and Astronautics, AIAA 2003-4095; 2003. [37] Drela M, Youngren H. XFOIL 6.9 user primer. Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge (MA); 2001. [38] Ratle A. Optimal sampling strategies for learning a fitness model. In: The 1999 IEEE congress on evolutionary computation–CEC 1999. Piscataway, New Jersey: IEEE; 1999. p. 2078–85. [39] Jones DR, Schonlau M, Welch WJ. Efficient global optimization of expensive black-box functions. J Global Optim 1998;13:455–92. [40] Hansen N, Ostermeier A. Completely derandomized self-adaptation in evolution strategies. Evolut Comput 2001;9(2):159–95. http://www.lri.fr/~hansen/cmaes_inmatlab.html. [41] Sheskin DJ. Handbook of parametric and nonparametric statistical procedures. 4th ed. Boca Raton (Florida): Chapman and Hall; 2007.