Nested evolutionary algorithms for computationally expensive bilevel optimization problems: Variants and their systematic analysis

Nested evolutionary algorithms for computationally expensive bilevel optimization problems: Variants and their systematic analysis

Swarm and Evolutionary Computation 48 (2019) 329–344 Contents lists available at ScienceDirect Swarm and Evolutionary Computation journal homepage: ...

1MB Sizes 0 Downloads 34 Views

Swarm and Evolutionary Computation 48 (2019) 329–344

Contents lists available at ScienceDirect

Swarm and Evolutionary Computation journal homepage: www.elsevier.com/locate/swevo

Nested evolutionary algorithms for computationally expensive bilevel optimization problems: Variants and their systematic analysis Hemant Kumar Singh ∗ , Md Monjurul Islam, Tapabrata Ray, Michael Ryan School of Engineering and Information Technology, The University of New South Wales, Canberra, ACT, 2600, Australia

A R T I C L E

I N F O

Keywords: Bilevel optimization Surrogate assisted algorithm evolutionary algorithm Differential evolution

A B S T R A C T

Bilevel optimization problems involve a hierarchical model where an upper level optimization problem is solved with a constraint on the optimality of a nested lower level problem. The use of evolutionary algorithms (EAs) and other metaheuristics has been gaining attention to solve bilevel problems, especially when they contain nonlinear/black-box objective(s) and/or constraint(s). However, EAs typically operate in a nested mode wherein a lower level optimization is executed for each upper level solution. Evidently, this process requires excessive number of function evaluations, which might become untenable if the underlying functions are computationally expensive. In order to reduce this expense, the use of approximations (also referred to as surrogates or metamodels) has been suggested previously. However, the previous works have focused only on the use of surrogates for the lower level problem, whereas the computational expense of the upper level problem has not been considered. In this paper, we aim to make two contributions to address this research gap. The first is to introduce an improved nested EA which uses surrogate-assisted search at both levels in order to solve bilevel problems using limited number of function evaluations. The second is the revelation and a systematic investigation of a previously overlooked aspect of bilevel search – that the objective/constraints at the upper and lower levels may involve different computational expense. Consideration of this aspect can help in deciding a suitable strategy, i.e., in which level is the use of surrogates most appropriate for the given problem. Towards this end, four different nested strategies – with surrogate at either level, none or at both levels, are compared under various experimental settings. Numerical experiments are presented on a wide range of problems to demonstrate the efficacy and utility of the proposed contributions.

1. Background and motivation Bilevel optimization problems are those that contain a nested lower level (LL) optimization problem as a constraint within an upper level (UL) optimization problem. Bilevel, or more generally, multi-level optimization problems, have been a subject of research in mathematical programming for over half a century. The origins of such formulations are attributed to the works of Stackelberg [1], Bracken and McGill [2]; the former with an interest to model economic market theory and the latter in regards to defense applications. A number of other applications have been studied since that utilize a hierarchical model, spanning diverse areas such as structural optimization [3], transportation [4], production planning [5] and chemical engineering [6] among many others [7,8]. In the mathematical programming domain, several exact techniques have been developed to solve bilevel optimization problems, a biblio-

graphic review of which can be found in Ref. [9]. Understandably, these methods typically incorporate assumptions that the analytical expressions of the objective/constraint functions are known and conform to certain mathematical conditions. For example the studies [10–14] assumed the problem to be linear at both levels [15], assumed the problems to be quadratic [16], assumed them to be convex, whereas [17] considered nonlinear non-convex problems with strong second-order sufficiency condition and linear independence condition. In order to address more generic problems/applications where the above conditions may not be satisfied or the analytical functions for objective/constraints may not be available altogether, the use of metaheuristics such as genetic/evolutionary algorithms (GA/EA), has been suggested [7]. A recent review of evolutionary approaches for bilevel optimization can be found in Ref. [8]. Since evolutionary approaches operate without assuming any knowledge about the nature of the underlying objective/constraint functions, they come with their own short-

∗ Corresponding author. E-mail addresses: [email protected] (H.K. Singh), [email protected] (M.M. Islam), [email protected] (T. Ray), [email protected] (M. Ryan). https://doi.org/10.1016/j.swevo.2019.05.002 Received 25 September 2018; Received in revised form 1 February 2019; Accepted 9 May 2019 Available online 23 May 2019 2210-6502/© 2019 Elsevier B.V. All rights reserved.

H.K. Singh et al.

Swarm and Evolutionary Computation 48 (2019) 329–344

• The first is to approximate the optimal LL decision variables as func-

comings. The most evident one is that the global optimality of the solutions can no longer be ascertained (at either upper or lower levels). The second is that typically these methods operate in a nested mode, where to evaluate each upper level decision vector, an optimization needs to be performed at the lower level. Such approaches typically result in a very high number of lower level (and overall) function evaluations during the optimization. For instance, the number of evaluations used by nested metaheuristic methods in Refs. [18–21] run in millions. Thus, while both the above two limitations are typical of EAs even for standard (single level) optimization tasks, they are further compounded when the optimization is performed at multiple levels. Additionally, the conflicting interaction of upper and lower level problems (if present) poses further challenges to evolutionary strategies since a solution at the upper level may falsely appear to be of high quality if its lower level decision vector is not optimal [22]. Therefore, it is of significant research interest to improve evolutionary algorithms for bilevel problems such that (a) they can be used to identify near-optimal solutions to the complex problems/applications that do not conform to limiting mathematical conditions and (b) are not computationally overwhelming. Some of the major research directions being undertaken towards accomplishing this include:

tions of the UL decision variables. If an accurate mapping could be obtained, then some of the lower level optimization tasks can be bypassed in order to save significant number of exact function evaluations. This type of approach has been adopted in Refs. [30,36,37], where a quadratic approximation is used to create this mapping. On the same lines, a similarity based surrogate model (SBSM) is used in Ref. [31], and an additional parameter is prescribed to control the proportion of conducting optimization versus approximating the lower level optimum. More recently, a quadratic approximation of LL optimal value function (instead of variables) has been incorporated in Ref. [38], primarily to deal with cases where the lower level variable mapping is undefined due to existence of multiple optimal solutions. • The second is to not forgo the LL optimization altogether, but instead use approximations to guide the lower level search instead of exact function evaluations. For example, in the modified N-BLEA approach presented in Ref. [39], the LL problem is approximated as a quadratic programming problem first. If such a model is built successfully, then quadratic programming (in lieu of EA) can be used to solve the approximated LL problem. The corresponding solution is accepted as optimal if its LL objective value (based on quadratic programming model) is within a certain threshold of its exactly evaluated objective. In Ref. [40], approximate solutions are used for the lower level which is NP-hard, for solving a capacitated facility location problem. In a more recent work [22], the SABLA (surrogateassisted bilevel algorithm) approach progressively approximates and searches through the lower level problem using multiple-surrogate assisted EA.

1. Hybridization of evolutionary and classical algorithms to capitalize on the versatility of evolutionary algorithms with the speed of quickconverging local searches [21,23–25]. 2. Reformulation of the bilevel problem into a single level problem for the EA through use of approximate optimality conditions [26–29]. 3. Use of approximations in lieu of full optimization of lower level problems [22,30,31]. In this study, we are concerned with the last class of the above approaches, i.e., the use of approximations (also referred to as metamodels or surrogates). Such approaches are particularly suitable for the problems where the evaluation of the objective and/or constraint function(s) is computationally expensive. An expensive evaluation could be a manifestation of several different kinds of scenarios. For example, the evaluation may involve time consuming numerical simulations such as finite element analysis (FEA), computational fluid dynamics (CFD), physical experiments in loop such as wind-tunnel testing, or monetary expense such as paying a third party to conduct an analysis. Consequently, reduction in the number of exact evaluations is considered more critical than reducing time complexity of evolutionary operators (such as crossover, mutation, approximation). This is an inherent assumption in the study of computationally expensive problems, since the time taken for evaluation may, e.g., run in minutes/hours, whereas time involved in all algorithmic operations combined would typically still be in seconds (or less). The total computation time is considered directly proportional to the number of exact evaluations in this paradigm [32,33]. Therefore, the studies addressing such problems attempt to ensure, to the best extent possible, that a solution is likely to produce improved result (also termed as pre-selection) [34] before conducting exact evaluation on it. 
While there is significant amount of literature on application of these techniques for standard (single-level) problems [35], only a few of the recent works have explored their potential for evolutionary bilevel optimization problems. These are briefly discussed below. When it comes to bilevel optimization task, the numbers of overall LL function evaluations conducted typically end up being much higher than the UL evaluations in a nested solution approach. Therefore the approximation-based methods have primarily focused on reducing the number of the LL evaluations [22,31,36] (as they form the overwhelming majority of the total evaluations). This statement could be interpreted in either way: reducing the number of LL evaluations to obtain similar quality of results or, equivalently, obtain better results using similar number of function evaluations compared to the peer algorithms. The existing methods that use approximations could be broadly categorized into two approaches:

As evident from the review above, the existing studies have attempted to reduce the number of LL evaluations, but none of them have considered a scenario where the UL objective and/or constraint function(s) are computationally expensive. However, it is indeed the case in some of the practical scenarios that the UL problem is more computationally expensive. Consider, for example, the shape optimization problem studied in Ref. [41], which involves minimization of weight (volume) of a solid subject to restriction on the Von Mises stress at the UL; and minimization of the body energy functional at the lower level for equilibrium. For complex geometries and/or large number of nodes, the finite element analysis (and similarly other numerical methods) to obtain the stresses at UL could be time consuming. Another example can be drawn from transportation research domain, where the “toll setting” is an often cited bilevel problem [8]. In this problem, a transport authority seeks to set an optimal toll with the UL objective of maximizing revenue, while for any given value, the user(s) make a decision on whether to use the toll road or not in order to optimize their commute time and cost. Thus, the LL problem in this case might be fairly simple (consisting of only an individual’s decision); but for the operator at UL, computing the revenue accurately could involve building a high-fidelity model of traffic flow and user’s choices in order to estimate the revenues accurately. In order to handle such problems, the existing techniques need to be enhanced to not only limit the function evaluations at the lower level, but also (or instead) the upper level. Infact, for completeness it is possible to conceive four different scenarios – two where either the upper or lower level objectives/constraints are expensive to evaluate and two where both are expensive or cheap to evaluate. In the current literature, this aspect has not been considered. 
In order to bridge this gap, this paper attempts to make the following new contributions: 1. The first is the identification of an important gap in the current evolutionary bilevel literature and propose consideration of a new perspective in solving them. So far all methods, in particular approximation based methods, have indiscreetly tried to reduce LL evaluations; which subsumes that the expense of evaluating a UL and LL problem are exactly the same. However, in principle it may be possi330

H.K. Singh et al.

Swarm and Evolutionary Computation 48 (2019) 329–344

ble for certain problems that UL evaluation is much more expensive than LL, in which case a large number of LL evaluations may be more affordable than a single UL evaluation. 2. The second is to then design and study nested strategies that consider UL as expensive. In-fact, all four scenarios discussed are systematically discussed by using surrogate-assistance at different levels and benchmarking them on a range of problems. To the authors’ knowledge, this is the first attempt to design metaheuristics that explicitly try to reduce the UL evaluations. It is worth noting here that the application of surrogate to UL is not trivial, unlike the LL (or single level problems). This is because for LL or single level problems, evaluating a solution truly is straightforward and unambiguous. Therefore, the models that are built using these truly evaluated solutions have a good training database, which makes the approximation more accurate. However, when applying this at UL, the true evaluation itself involves an LL optimization; and depending on how accurate LL optimum is obtained, the training data itself for the surrogate modeling becomes inaccurate, having the evident follow-on effects on UL optimization. Therefore, at UL, we employ two types of strategies - training with available points every generation, and only the re-evaluated points periodically. The latter dataset is more accurate, but has much lower number of available points, since re-evaluation is done only selectively considering its computational expense. 3. The third is to demonstrate that the surrogates should not be indiscreetly applied to a given problem; but rather the strategy should be selected based on the nature of the problem at hand. Depending on which level has computationally expensive evaluations, the choice of appropriate strategy (from among the four discussed) will give a better performance than indiscreetly applying surrogate assisted search. 
The key reason for this is that surrogates are good for quick improvements, but the involvement of approximations may mislead the search. On the other hand, the non-surrogate assisted strategies use a lot of true evaluations, but due to accurate performance estimates the chance of the search getting misled is lower. Hence when affordable, the latter should be preferred. This is demonstrated through extensive experimentation in Part 2 of the experiments presented (in Section 3).

mally defined as shown in Equation (1). The subscripts u and l refer to the attributes of the upper and lower level problems, respectively. For a valid evaluation of any given UL solution xu , it is imperative that the xl must be the optimum solution of the corresponding lower level problem. Note that xu acts as a fixed parameter set for the LL problem. Minimize xu

S.t.

Fu (xu , xl ), Gk (xu , xl ) ≤ 0, k = 1, … , qu , Hk (xu , xl ) = 0, k = 1, … , ru ,

Minimize xl

S.t.

(1)

fl (xu , xl ), gk (xu , xl ) ≤ 0, k = 1, … , ql , hk (xu , xl ) = 0, k = 1, … , rl ,

where xu ∈ 𝕏u ,

xl ∈ 𝕏l

The UL and LL objective functions are denoted by Fu (xu , xl ) and fl (xu , xl ) respectively in Equation (1). The UL variables (xu ) and LL variables (xl ) are considered to be continuous and lie within rectangular bounds 𝕏u and 𝕏l respectively in this study. The inequality/equality constraints at UL and LL are represented using the sets G∕H and g∕h. For brevity, we do not include the equality constraints in the subsequent discussions since they can be reduced to two inequality constraints through the use of a tolerance threshold (a common practice in evolutionary optimization). 2.2. General optimization framework We start with a high-level overview of the approaches considered in this study, and then go into more specifics of each strategy in the subsections that follow. Considering the nested paradigm, two optimization algorithms need to be deployed – one to optimize the upper and the other to optimize the lower level problems. Among the two optimization algorithms, one with the use of surrogates and the other without leads to four possibilities listed in Table 1. In this work, we use a differential evolution (DE) based EA as the baseline algorithm, and hence the four strategies are abbreviated as DE-DE, DE-SA, SA-SA and SA-DE (where SA stands for surrogate-assisted). Out of these four strategies, the first one (DE-DE) is the simple combination of EA at both levels, which is representative of the existing algorithms such as BiDE [19] and N-BLEA [18]. The second one involves the use of surrogates at the lower level, a strategy similar to the SABLA [22] previously developed by the first three authors. The last two (SA-DE and SA-SA) involve use of surrogates for the upper level optimization, which in authors’ knowledge has not been studied before. Both of these approaches are newly proposed in this work. Of particular interest is SA-SA which extends SABLA by considering the problem to be computationally expensive at both levels. 
Nevertheless, all four strategies are of relevance to the further numerical experiments in the study. Also worth noting is that owing to the different nature of the problem at the upper and lower levels, there are slight differences in how the DE/SA are employed at the upper and lower levels. We have coded all four strategies under a unified framework (that uses exactly same implementations of recombination, local search etc.) for this study in order to avoid any variations due to minutiae of different implementations. These strategies are discussed next.

Towards demonstrating the above contributions, a set of 25 wellknown bilevel benchmark problems with different challenges are considered. An extensive set of numerical experiments conducted on the problems to characterize the algorithms performance. The various nested evolutionary strategies are compared and their relative advantages and shortcomings in different settings are reported. Following this background and motivation for the study, we discuss the general framework and nested strategies in Section 2, followed by numerical experiments in Section 3. Concluding remarks are presented in Section 4.

2. Proposed approach(es) 2.1. Preliminaries In its generic form, bilevel optimization problems may have single or multiple objectives at either level. Within the scope of this paper, we consider single-objective problems at both levels, which can be for-

Table 1 Four different nested algorithms studied in this paper. Upper level optimizer

Lower level optimizer

Abbreviation

Comments

Differential Evolution Differential Evolution Surrogate-assisted DE Surrogate-assisted DE

Differential Evolution Surrogate-assisted DE Surrogate-assisted DE Differential Evolution

DE-DE DE-SA SA-SA SA-DE

Similar to Ref. [19] Similar to Ref. [22] New New

331

H.K. Singh et al.

Swarm and Evolutionary Computation 48 (2019) 329–344

Algorithm 1 Surrogate Assisted Bilevel Algorithm (SA-SA)

2.3. The four strategies: SA-SA, SA-DE, DE-SA, DE-DE

to ensure that the accuracy as well as choice of the surrogate models for any given function improves over the course of search. The process repeats until the LL generations are completed. Thereafter, a local search is run using the surrogate model (i.e. without use of any additional exact evaluations during the search) to estimate the best possible LL solution; and this obtained (predicted optimal) solution is then exactly evaluated. Evidently, if the model was an accurate representation of the LL landscape, this solution would most likely result in an improved solution. However if not, then this predicted optimum solution may not turn out to be better than existing solutions as it might have been misled to a sub-optimal region. Therefore, the solution obtained through this local search is not by default assumed to the optimal. Instead, the overall exact best solution obtained from this LL search (among all LL solutions exactly evaluated in DE + local search) is sent to the UL for the evaluation of corresponding xu . • Upper level search: At the upper level, DE mutation, crossover and ranking operators are used for evolution and selection of solutions. These operators are briefly outlined later in Section 2.4. Two different archives are maintained during the search: Archiveall which contains all evaluated solutions at the upper level, and ArchiveR which stores only re-evaluated solutions. The latter contains solutions more representative of the true performance of the UL solutions (because of more extensive LL search involved in re-evaluation); but is much smaller in size since re-evaluation is only done on the top ranked UL solution in each generation. The former represents a more crude estimate, but contains more points to aid the surrogate modeling. After the evaluation and ranking of the population, Archiveall is used to build multiple surrogate models for xu vs Fu , Gu . 
Similar to LL optimization, the models with the least validation error for each of these functions are considered the corresponding representative approximations. Subsequently, a DE and local search are executed on the surrogate functions (with UL variables) in order to come up with a best estimate of Fu . Note that this step does not evolve any exact evaluation. The solution obtained through this exercise is only approximate, so it is then re-evaluated to estimate its true performance.

In order to present all four strategies succinctly, we present below the description of SA-SA in Algorithms 1 and 2 and thereafter identify on it the lines which will be switched off for the corresponding algorithm without the surrogate in the remaining three strategies.

2.3.1. SA-SA For SA-SA, the upper level (UL) algorithm uses a DE algorithm whose convergence is expedited through the use of surrogates. At the same time, surrogate-assisted DE is used at the lower level (LL) optimization for reducing the underlying number of LL evaluations. The key steps (Algorithm 1) are as follows.

• Initialization: The algorithm starts with an initialization of Nu UL solutions within the bounds (𝕏u ) using Latin hypercube sampling [42]. For evaluating each of the UL solutions, the LL problem is optimized through a surrogate assisted optimization (SAO) and the best found xl is sent to UL for determining Fu . The top (best feasible) solution in the population at the upper level is re-evaluated. The re-evaluation (discussed later in Section 2.6) is intended to put a more stringent check on the LL optimality of a given solution in order to reflect its exact UL objective value, as demonstrated in our previous works [22, 23]. • Lower level search: The SAO used for LL optimization is outlined in Algorithm 2 and also uses DE as the underlying algorithm. It starts by generating the initial LL population using Latin hypercube sampling within the bounds 𝕏l and executing exact function evaluation for each individual in the population. A set of different surrogate models (discussed in Section 2.5) are then constructed to approximate fl , gl as a function of xl . The individual models that approximate each objective/constraint the best (based on validation error) are then chosen as representative surrogate models for prediction in subsequent LL generations. After evolution for a certain (prescribed) number of generations, the population is exactly evaluated again. Concurrently, the archive of evaluated solutions so far is updated, as well as the surrogate models are re-built using the updated archive. This periodic re-training of the surrogate models is 332

H.K. Singh et al.

Swarm and Evolutionary Computation 48 (2019) 329–344

Algorithm 2 Optimization at the lower level

2.4. DE operators and local search

After every few (prescribed) number of generations (Tuls ), the surrogate model is built on the re-evaluated archive (ArchiveR ) instead of the overall archive (Archiveall ). The reason is that the surrogate models are likely to be more accurate on ArchiveR since the performance estimates in ArchiveR are much better. However, since ArchiveR only gains one new solution per generation, some generations are needed before there are enough solutions to build the surrogate models. Additionally, this process is also executed once in the last generation in order to further improve the final solution delivered (Line 14 in Algorithm 1). • Ranking and reduction: The final ranking in the generation (referred to as RankAF in Algorithm 1) orders the solutions in ArchiveR by Fu and constraint violations, and keeps them and above other solutions in the population. Once again, the reason is that the objective/constraint values in ArchiveR are likely to be true reflection of the performance. Therefore, even if a solution in ArchiveR has a worse objective value than the one in the current population, it will obtain a higher ranking, in order to maintain a selection pressure towards the exact optimum of the problem. Out of the combined set of ranked solutions (current population and ArchiveR ), the top Nu solutions are selected to progress to the next generation.

The underlying operators of the DE in this study are adopted from the 𝜖 -constrained differential evolution (𝜖 DE) algorithm [43]. For evolution, a trial vector is first constructed from each population member xi . Considering three random vectors xr1 , xr2 , xr3 from the population, and a scaling factor F, the trial vector is generated through mutation as: xt = xr1 + F .(xr2 − xr3 ),

(2)

Next, a binomial crossover is performed between the trial solution xt and parent solution xi using the commonly used DE/1/rand/bin strategy. Equation (3) shows the details of this operation for each variable. jr = a random integer in the range[1, Nvar ]; Nvar = no. of variables { xt (j), if rand[0, 1) ≤ CR or j = jr ; CR = crossover rate xc (j) = xi (j), otherwise (3) The child vector generated through the crossover operation above is then evaluated and accepted if it is better than the parent. In order to perform this comparison, the 𝜖 level based method is used, which allows for some marginally infeasible solutions to compete with the feasible solutions based on objective values if the constraint violations are below a threshold. This type of constraint handling has shown improvements over strategies where feasible solutions are ubiquitously preferred over infeasible irrespective of the magnitude of violation [43]. For more details of the above DE and constraint handling, the readers are referred to Ref. [43]. As far as the local search is concerned, we employ the commonly used interior point (IP) solver based on [44]. The implementation available under the fmincon package within Matlab 2017b is used in this study with default search options. Other suitable local optimizers could also be used.

2.3.2. SA-DE The algorithm progresses in the same way at the upper level as SASA discussed above. The difference is at the lower level (Algorithm 2) where no surrogates are used and DE progresses in a standard way. 2.3.3. DE-SA The algorithm progresses in the same way as SA-SA, except for the upper level where no surrogates are used. The strategy is the same as the one in previous work (SABLA). One difference, however, in this study is that we do not use the “nested local optimization” component in this (or the remaining three) strategy. While the utility of the component has already been demonstrated in Ref. [22], we have excluded it from this study in order to (a) further reduce the number of function evaluations, and (b) avoid the impact of the component while inferring relative performance of the four strategies presented here which is the main intent of this study.

2.5. Surrogate models

2.3.4. DE-DE This is the most straightforward strategy which follows the steps in Algorithm 1 and 2 without the use of any surrogates.

A number of recent studies have emphasized that a single type of surrogate is unlikely to be the best choice to approximate different types of functions [22,32]. An objective function may be of different 333

H.K. Singh et al.

Swarm and Evolutionary Computation 48 (2019) 329–344

non-linearity than constraints; as well as this characteristic can also change as the training database changes/grows. Therefore, keeping inline with the SABLA algorithm [22], each objective/constraint function is approximated using multiple surrogate models using the archive of exactly evaluated solutions. The training of the surrogates is done on 80% solutions in the data (randomly chosen), while the testing error is calculated as the mean squared error (MSE) on the remaining 20% data. Among all the models considered, the one with the least validation error is considered to be the most appropriate to represent the given objective/constraint function based on the given archive. The advantages of using multiple surrogate models instead of relying on a single model have been previously established in Refs. [22,32]. Three types of models are used in this study: response surface method (RSM) [45] of order 1 and 2, and Kriging [46,47]. The first two are common types of linear/quadratic regression, whereas Kriging is a more common choice for generic nonlinear functions. A brief overview of these models is given below.

2.6. Re-evaluation mechanism In Section 1, we had mentioned that a characteristic challenge of bilevel optimization is the conflicting nature of UL and LL objectives. For the bilevel problems where such conflicts exist, an inaccurate LL optimum may result in a better UL objective value compared to even the true UL optimum. This creates an obvious challenge for UL ranking, since it would be difficult to remove such spurious solutions since their UL objective values appear to be (falsely) superior. Consequently, the UL search may be easily mislead towards a region which eventually turns out to be sub-optimal if the LL optimization is not extensively performed. When using metaheuristics for LL optimization, evidently this possibility cannot be entirely eliminated, but a more selective strict check can be applied on the highly ranked solution(s) in the population to reduce the probability of occurrence of this scenario. This is done through a re-evaluation step, wherein the top ranked UL solution undergoes a more extensive LL search. The search operates on exact LL evaluations (and not on surrogate). It comprises a global search using DE first, followed by a further local search using IP. Thus, although it consumes additional function evaluations, if used selectively it is able to improve the search substantially; as demonstrated in the previous studies [22,23]. In case the solution identified for re-evaluation already exists in the archive (ArchiveR ), then the next ranked solution in the population is considered for re-evaluation instead, and so on.

2.5.1. Response surface method (RSM)

Response surface method refers to a regression process that uses polynomials (typically of first or second degree, i.e. linear or quadratic) to fit a given dataset [45] of variables and corresponding responses. A generic 2nd order (quadratic) model with n input variables {x1, x2, …, xn} can be written as shown in Equation (4):

y(x) = α0 + Σ_{i=1}^{n} αi xi + Σ_{i=1}^{n} αii xi² + Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} αij xi xj        (4)

where {α0, αi, αii, αij} are the unknown parameters of the model that are determined from the given data. In vector form, this can be written as y(x) = fᵀa, where the vector f contains all the terms of x1, x2, …, xn and the vector a contains all the unknown coefficients. The values of the unknown coefficients are determined using the least squares method. The least squares estimate of a is given by Equation (5):

â = (FᵀF)⁻¹ FᵀY        (5)

where F is a matrix containing N rows, each row being the vector fᵀ evaluated at one observation, and Y contains the observed responses. In this study, we consider RSM of order 1 and 2, referred to as RSM1 and RSM2 respectively. For RSM1, the coefficients of the quadratic terms (i.e. αii, αij) are zero.

2.5.2. Kriging

Kriging, also known as Design and Analysis of Computer Experiments (DACE) [46,47], is among the most popular methods to approximate generic non-linear functions. It attempts to approximate the function as a combination of a global regression model and a deviation term (with zero mean):

y(x) = μ(x) + ε(x)        (6)

The regression model μ(x) is typically a polynomial, for example the one shown in Equation (4). The covariance of the deviation term is given by:

cov(ε(x_i), ε(x_j)) = σ² R(x_i, x_j)        (7)

where σ² is the process variance and R(x_i, x_j) is a spatial correlation function, typically modeled as shown in Equation (8):

R(x_i, x_j) = exp(− Σ_{k=1}^{m} θk |x_k^i − x_k^j|^p)        (8)

Here, θk and p are hyper-parameters that are determined using maximum likelihood estimation (MLE) in order to obtain the Kriging model. For a more detailed description of the model, the readers are referred to Ref. [46].

3. Numerical experiments

Before getting into the specific details of test problems and algorithmic parameters, we first explain the overall rationale behind the manner in which the experiments are set up and conducted. We begin by collecting a set of 25 well-known benchmark problems from the field of evolutionary bilevel optimization. The problems are composed of analytical expressions, so they are not truly expensive, but we solve them assuming that one or more levels may be computationally expensive. The obvious reason for this is that for benchmark problems the optimum values are known, making it possible to undertake quantitative comparisons. Additionally, the benchmark problems are constructed with controllable difficulties and can hence be used to test the algorithm's capability more broadly in terms of handling the diverse challenges that can occur within the paradigm of bilevel problems. The computational expense can be easily modeled in a problem by asserting that the number of evaluations for optimization at the computationally expensive level must be restricted, while the other level could potentially use a large number of evaluations without adding much to the computational expense. For example, if both levels are expensive, a small number of evaluations needs to be allotted for optimization of both the UL and LL problems. On the other hand, if only the UL is expensive, then the number of evaluations needs to be small at the UL, but a much larger number of evaluations could be used for the LL optimization. In the numerical experiments that follow, we study all four such scenarios systematically in two parts, which are discussed shortly, after we briefly outline the test problems and metrics.

3.1. Test problems

A total of 25 problems are considered in this paper for the numerical experiments. These consist of the following two sets of problems:

• SMD problems: The SMD series of test problems was proposed in Ref. [18]. It contains 12 problems encapsulating a range of challenging properties such as conflict between the upper and lower level objectives, nonlinearity, multi-modality, etc. The suite is scalable in terms of the variables, and 5- and 10-variable versions are studied in this paper. For the 5-variable problems (used in Subsections 3.3–3.5), 2 upper


Table 2. Bilevel test problems considered in this study, and their optimum values.

SMD problems ([18]):

Problem | Fu*  | fl*
SMD1    | 0    | 0
SMD2    | 0    | 0
SMD3    | 0    | 0
SMD4    | 0    | 0
SMD5    | 0    | 0
SMD6    | 0    | 0
SMD7    | 0    | 0
SMD8    | 0    | 0
SMD9    | 0    | 0
SMD10   | 4    | 3
SMD11   | −1   | 1
SMD12   | 3    | 4

BLTP problems:

Problem | Source          | Fu*     | fl*
BLTP1   | [48] (Prob. 2)  | 0       | 200
BLTP2   | [16] (Prob. 1)  | 17      | 1
BLTP3   | [19] (Prob. 14) | 1       | 0
BLTP4   | [49] (F1)       | 1000    | 1
BLTP5   | [50]            | −1.4074 | 7.6172
BLTP6   | [16] (Prob. 3)  | 100     | 0
BLTP7   | [51]            | 49      | −17
BLTP8   | [17]            | −1      | 0
BLTP9   | [6] (Prob. 4)   | 85.0909 | −50.1818
BLTP10  | [6] (Prob. 2)   | 5       | 4
BLTP11  | [6] (Prob. 3)   | 9       | 0
BLTP12  | [52] (Prob. 2)  | 3.25    | 4
BLTP13  | [11]            | 29.2    | −3.2

Table 3. Four settings used for the numerical experiments and the corresponding number of exact function evaluations used at the UL and LL.

Setting | UL evals | LL evals (per UL) | UL evals (total) | LL evals (total)
S1      | 20 × 5   | 20 × 5            | 105              | 1.50E+04
S2      | 20 × 5   | 20 × 20           | 105              | 4.4E+04
S3      | 20 × 20  | 20 × 5            | 420              | 6.5E+04
S4      | 20 × 20  | 20 × 20           | 420              | 1.75E+05
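The multiplicative structure behind these budgets can be reproduced with a short sketch. It assumes a plain nested run with one full LL optimization per UL evaluation, before surrogate savings and re-evaluation overheads are accounted for; the function name is our own:

```python
def nested_budget(ul_pop, ul_gens, ll_pop, ll_gens):
    """Exact-evaluation budget of a plain nested EA: every UL evaluation
    triggers a full LL optimization of ll_pop * ll_gens evaluations."""
    ul_evals = ul_pop * ul_gens                   # UL objective evaluations
    ll_evals = ul_evals * (ll_pop * ll_gens)      # one LL run per UL evaluation
    return ul_evals, ll_evals

# Setting S1 (20 x 5 at both levels) gives 100 UL and 10,000 LL evaluations,
# while setting S4 (20 x 20 at both levels) already requires 160,000 LL evaluations.
print(nested_budget(20, 5, 20, 5))
print(nested_budget(20, 20, 20, 20))
```

The totals reported in Table 3 are somewhat higher than these products because they also include the evaluations consumed by the re-evaluation step.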

Table 4. Median performance comparison among SA-SA, SA-DE, DE-SA and DE-DE for setting S1. (+), (−) and (=) indicate statistically higher, lower or equivalent values relative to SA-SA (Wilcoxon signed rank test, 95% confidence); '–' indicates that no feasible solution was found.

Prob.  | DE-DE εu     | DE-DE εl     | DE-SA εu     | DE-SA εl     | SA-DE εu     | SA-DE εl     | SA-SA εu | SA-SA εl
SMD1   | 3.30E−01(−)  | 1.06E−01(−)  | 3.86E−01(+)  | 6.68E−02(+)  | 8.37E−03(+)  | 2.34E−03(+)  | 2.84E−02 | 6.85E−03
SMD2   | 3.74E−01(+)  | 1.18E−01(+)  | 2.12E−01(+)  | 9.13E−02(+)  | 3.74E−01(+)  | 3.51E−02(+)  | 1.43E−02 | 8.80E−04
SMD3   | 5.48E−01(=)  | 1.89E−01(=)  | 4.34E−01(−)  | 1.64E−01(=)  | 9.47E−01(=)  | 1.54E−01(=)  | 7.98E−01 | 2.41E−01
SMD4   | 1.23E−01(+)  | 3.49E−02(+)  | 3.28E−02(+)  | 2.03E−02(+)  | 4.62E−02(+)  | 2.47E−03(+)  | 6.00E−05 | 2.00E−06
SMD5   | 7.02E−01(−)  | 2.13E−01(−)  | 1.07E+00(+)  | 2.67E−01(+)  | 4.32E−02(=)  | 1.18E−02(=)  | 1.11E−01 | 4.27E−02
SMD6   | 3.69E−01(+)  | 1.27E−01(+)  | 2.77E−01(+)  | 1.30E−01(+)  | 3.00E−05(+)  | 6.00E−06(+)  | 1.00E−06 | 1.00E−06
SMD7   | 8.72E−01(+)  | 1.25E+02(=)  | 1.08E−01(−)  | 8.37E−02(−)  | 7.79E−01(+)  | 1.25E+01(=)  | 1.61E−01 | 2.00E−01
SMD8   | 4.70E+00(−)  | 6.76E−01(−)  | 7.59E+00(−)  | 9.78E−01(−)  | 5.04E+00(−)  | 4.40E−01(−)  | 1.16E+01 | 1.97E+00
SMD9   | 3.84E−01(=)  | 2.12E−01(+)  | 4.37E−01(+)  | 1.18E−01(=)  | 3.84E−01(+)  | 6.78E−02(+)  | 1.67E−01 | 1.23E−02
SMD10  | 2.64E+01(=)  | 3.76E+00(=)  | 2.87E+01(=)  | 2.85E+00(=)  | 2.01E+01(=)  | 9.80E−01(+)  | 1.76E+01 | 8.36E−01
SMD11  | 2.57E−01(+)  | 1.77E−01(=)  | 9.23E−02(=)  | 8.05E−02(+)  | 1.74E−01(+)  | 4.39E−02(+)  | 1.15E−01 | 3.20E−02
SMD12  | –            | –            | –            | –            | –            | –            | –        | –
BLTP1  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06 | 1.00E−06
BLTP2  | 8.41E−02(=)  | 1.88E−01(=)  | 8.41E−02(=)  | 1.88E−01(=)  | 1.07E−01(=)  | 2.31E−01(=)  | 9.99E−02 | 2.18E−01
BLTP3  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06(=)  | 1.00E−06 | 1.00E−06
BLTP4  | 4.74E+01(−)  | 1.00E−06(=)  | 9.00E+01(=)  | 1.00E−06(=)  | 3.46E+02(−)  | 1.00E−06(=)  | 9.00E+01 | 1.00E−06
BLTP5  | 2.49E−01(+)  | 4.89E−01(+)  | 5.36E−02(−)  | 1.11E−01(−)  | 2.81E−01(+)  | 5.47E−01(+)  | 9.88E−02 | 2.03E−01
BLTP6  | 1.24E+00(=)  | 3.81E−03(=)  | 1.05E+00(−)  | 2.74E−03(−)  | 3.18E+00(=)  | 2.45E−02(=)  | 1.52E+00 | 5.66E−03
BLTP7  | 1.20E+00(+)  | 8.60E−01(+)  | 7.67E−01(+)  | 5.48E−01(+)  | 1.00E−06(+)  | 1.00E−06(+)  | 1.00E−06 | 1.00E−06
BLTP8  | 1.55E−02(=)  | 1.00E−06(=)  | 2.08E−02(+)  | 1.00E−06(+)  | 1.37E−02(+)  | 1.00E−06(+)  | 1.46E−02 | 1.00E−06
BLTP9  | 3.04E+00(=)  | 1.06E+00(=)  | 3.07E+00(−)  | 1.07E+00(−)  | 5.70E+00(=)  | 2.00E+00(=)  | 5.15E+00 | 1.80E+00
BLTP10 | 9.46E−03(+)  | 3.55E−01(+)  | 2.15E−03(=)  | 1.68E−01(=)  | 1.22E−02(+)  | 3.85E−01(+)  | 1.85E−03 | 1.55E−01
BLTP11 | 1.25E−02(+)  | 1.00E−06(=)  | 3.45E−04(+)  | 1.00E−06(=)  | 2.00E−06(+)  | 1.00E−06(=)  | 1.00E−06 | 1.00E−06
BLTP12 | 5.00E−06(=)  | 3.70E−05(=)  | 6.00E−06(−)  | 5.00E−05(−)  | 1.20E−05(−)  | 9.20E−05(−)  | 2.30E−05 | 1.33E−04
BLTP13 | 6.16E+00(+)  | 9.54E−01(+)  | 1.60E+00(−)  | 2.07E−01(−)  | 8.98E+00(+)  | 1.09E+00(+)  | 2.23E+00 | 2.70E−01

and 3 lower level variables are considered, while for the 10-variable problems (used in Subsection 3.6), 5 upper and 5 lower level variables are considered. For reference, the optimum values of Fu and fl are listed in Table 2.¹

¹ Please note that in the formulations of SMD2, SMD7 and SMD11, log10 was inadvertently used instead of loge. However, from the mathematical equations in the test problem construction, it is easy to observe that this change does not affect the true optimum values for these functions.

• BLTP problems: The thirteen test problems considered here are collected from various references in the literature, listed in Table 2. The problems are well known in the bilevel optimization domain and have often been used to evaluate the efficacy of different bilevel optimization algorithms. The detailed formulations of the problems can be found in the references given in Table 2. Since a number of these problems have been solved with classical methods, they mostly involve linear and/or quadratic objective and constraint functions. For ease of reference, the problems have been labeled


here as BLTP1–BLTP13 (BLTP refers to Bilevel Test Problems), and their optimum values are listed in Table 2. For some of the problems (BLTP1, BLTP3, BLTP5, BLTP12), the true optimum values are not explicitly reported, and hence the best values reported in the literature are listed.

3.2. Performance assessment criteria

3.2.1. Objective values
The performance is measured using the absolute difference of the obtained objective values (Fu*, fl*) from the true optimum (Fu*t, fl*t) listed in Table 2. The absolute differences are denoted as εu and εl for the upper and lower levels respectively. The median values of εu and εl are computed across multiple (29) independent runs of the compared algorithms. A smaller value is indicative of better performance.

3.2.2. Performance profiles
While the comparison of objective values with the true optimum can help compare the performance on individual problems, it cannot quantify how the relative performance of multiple algorithms compares across the full spectrum of problems. A useful statistical tool for such a comparison is the performance profile [53,54], which plots the cumulative distribution of the performance of each algorithm. The horizontal axis of a performance profile is denoted by τ, which quantifies how a particular algorithm compares with the best performing algorithm for each problem. On the vertical axis, the quantity ρ is plotted, which signifies the proportion (or percentage) of the overall problems that the algorithm was able to solve within the factor τ of the best performing algorithm. For example, a point τ = 2, ρ = 0.5 for an algorithm on the plot means that for 50% of the problems, the metric value (in this case the median εu or εl) delivered by this algorithm was within 2 times that delivered by the best performing algorithm(s) for those problems. The performance can also be visualized in another intuitive way from this plot: by simply observing the area under the curve (AUC = ∫ρ(τ)dτ) of the profile, with a larger value denoting better overall performance. Also note that when the τ values are very high, their log10 values can be plotted instead. This is the case here, since the median εu, εl values can differ by orders of magnitude between the best and worst strategies.

3.3. Experimental setup

For the numerical experiments, we have kept most of the algorithmic settings consistent with the recommendations in the previous literature, and adjusted the population size and number of generations to set a reasonable computational expense. No parameter tuning has been undertaken. We use a constant population size of 20 for all strategies at the upper as well as the lower level, as used in Ref. [22] where the test problems were considered expensive. Note that this represents a much lower allowable computational expense compared to works such as [18,20,21,23], where a much larger population size of 40–50 was used. For the DE crossover/mutation operators, CR = 0.9 and F = 0.7 are used, as done in a number of previous studies [21,22]. The frequency of rebuilding surrogates during SAO at the lower level (SurrRF), as well as of invoking surrogates based on ArchiveR at the upper level, is set to 10. The reason this cannot be done in every generation is as follows: (a) for the LL, re-building the surrogates involves true evaluations of the population members; if this is done every generation, the function evaluations will grow as in a standard DE and the benefit of optimizing the surrogate model over the prescribed number of generations will not be realized; (b) for the upper level, the surrogates built using ArchiveR will have very few solutions in the database (as this is the number of re-evaluated solutions). Hence, there is a trade-off at the UL between building the surrogate from a larger number of less accurate solutions or from a small number of more accurate (re-evaluated) points. In order to derive the possible benefits of both types of modeling, as well as to avoid an additional parameter, the frequency is set the same as that for the LL surrogate update. Finally, for the re-evaluation of solutions, the DE needs to be run much longer than for a standard evaluation (for a more thorough search). This is therefore set to a maximum of 35 generations, followed by IP with a maximum of 250 function evaluations, so that the overall expense involved in each re-evaluation is approximately 1000 LL evaluations. An upper limit is needed since re-evaluation is a component that involves a significant number of evaluations. Since the population size is fixed, the primary mechanism for controlling the total number of function evaluations is varying the number of generations. We have conducted the numerical experiments in two parts, using the settings S1–S4 shown in Table 3:

• Part 1: This experiment demonstrates the efficacy of the proposed approach compared to the others when dealing with computationally expensive problems at both levels. Therefore, a small number of generations (5) is allowed for the optimization at each level, denoted as setting S1 in Table 3. In this experiment, all four strategies (SA-SA, SA-DE, DE-SA, DE-DE) are run with S1.
• Part 2: This experiment studies the performance of the four strategies when one or both levels are cheap to evaluate. Therefore, a higher number of generations (20) is allowed for the level(s) that are computationally cheap, denoted as settings S2, S3 and S4 in Table 3. In this experiment, the results of SA-SA obtained using setting S1 are individually compared with SA-DE run with setting S2, DE-SA run with setting S3, and DE-DE run with setting S4.

Note that the total numbers of evaluations presented in Table 3 include the evaluations undertaken for the re-evaluation of solutions. The numbers are prone to some variation (≈10%) since the number of evaluations in the local search (IP) cannot be strictly controlled due to the numerical calculation of gradients (even though an upper limit of 250 is imposed).

3.4. Results: part 1

For the setting S1, Table 4 presents the median results obtained by the four strategies on the 25 test problems. The algorithm(s) with the best εu and εl values are marked for easier identification. Also note that values less than 10^−6 are simply indicated as 10^−6 in the table, as precision below this value is unlikely to be of practical relevance in most cases. It can be observed from the table that the SA-SA strategy obtains the best median results for 10 problems at the upper level and 13 problems at the lower level. For most of the problems where it is not the best, the difference from the best performing strategy is relatively marginal. SA-DE shows the best results for 7 and 8 problems at the upper and lower level, respectively, while DE-DE shows the best results for 6 and 9 problems at the upper and lower level, respectively. The performance of DE-SA is the closest to SA-SA, with 9 and 11 best performances at the upper and lower levels respectively; however, for the cases where it is not the best, the difference in performance is relatively larger than for SA-SA. Thus, overall, SA-SA is significantly better than the other individual strategies in most cases, and only marginally inferior in the others. The Wilcoxon signed rank test with 95% confidence is also conducted for each of the algorithms compared with SA-SA, and the results are indicated using (+), (−) or (=) for statistically higher, lower or equivalent values respectively. The above statement about the summative performance is backed by the observations from the performance profiles in Fig. 1, where SA-SA shows the highest area under the curve (its curve is the leftmost for both UL and LL performances). At ρ(τ) = 1 for the UL performance profile, we observe τ = 0.6628, 2.886, 5.885, 6.079 for SA-SA, SA-DE, DE-SA and DE-DE respectively. This means that among all problems,


Fig. 1. Performance profile of the four strategies under the setting S1. Note: x-axis shows log10 τ.

the median εu value for SA-SA was no more than 4.60 (= 10^0.6628) times that of the best performing algorithm. To put this in perspective, for the other strategies these factors are 769 (SA-DE), 7.67 × 10^5 (DE-SA) and 1.20 × 10^6 (DE-DE), denoting much poorer cumulative performance than SA-SA. None of the algorithms were able to solve SMD12 in this setting. For the visualization of the convergence behavior, the median runs for all problems (except SMD12) are plotted in Figs. 2 and 3, which show the progression of re-evaluated UL solutions over UL function evaluations for the SMD and BLTP problems, respectively.

3.5. Results: part 2

The key aim of this part of the experiments is to demonstrate that approximations should not be used ubiquitously for the lower level evaluations, but should instead be deployed based on the computational expense required to evaluate a solution at each level. It is indeed true that in most strategies (nested strategies in particular), the total number of LL evaluations will typically be much higher than the number of UL evaluations, and hence the existing research has focused on reducing the former. However, a large number of LL evaluations does not necessarily mean that the computational expense of evaluating them will be very large compared to the UL evaluations; in fact, this holds only when we assume that both evaluations involve equal expense. There may be cases where a UL evaluation takes, say, an hour (which could be due to the use of numerical simulations such as finite element analysis to evaluate the structural properties of a design), whereas the LL evaluation takes micro-seconds (say, the calculation of the total mass of the design components). In such cases the overall time of optimization is evidently dictated by the UL evaluations, and hence the focus should be on reducing the UL evaluations. At the lower level, on the other hand, since the evaluation is cheap, the use of surrogates may be unnecessary since a large number of function evaluations is affordable. In Part 1 of the experiments, we presented the results assuming both levels to be expensive. In this part, we assume the following scenarios: (a) the LL evaluation is not expensive, (b) the UL evaluation is not expensive, and (c) neither the LL nor the UL is expensive. Correspondingly, we run SA-DE with setting S2 (no surrogates at the LL) to handle scenario (a), DE-SA with setting S3 (no surrogates at the UL) to handle scenario (b), and DE-DE with setting S4 (no surrogates at either level) to handle scenario (c). We compare the results of each strategy individually with SA-SA under setting S1. The pairwise comparisons of the median values εu and εl between the above strategies are shown in Table 5. It can be seen that SA-SA is outperformed by each of the other strategies in an overwhelming majority of the problems. At the upper level, SA-DE, DE-SA and DE-DE are better than (or occasionally equal to) SA-SA on 18, 20 and 19 problems respectively out of 25, whereas at the lower level these numbers are 17, 20 and 20 respectively. At the same time, the differences in the cases where the other strategies were worse off were very small. This behavior can be seen clearly from the performance profile plots shown in Fig. 4. This is in contrast to the earlier observations in Table 4, where SA-SA was the best algorithm under setting S1. This improvement in performance of course comes with a higher number of function evaluations, as indicated in Table 6. However, the assumption that the computational expense is low at the corresponding level makes the higher number of evaluations a viable choice to improve the results, instead of focusing on reducing the function evaluations. The Wilcoxon signed rank test with 95% confidence is also conducted for each of the algorithms compared with SA-SA, and the results are indicated in Table 5 using (+), (−) or (=) for statistically higher, lower or equivalent values respectively. Thus, the numerical experiments demonstrate how differing computational expense at the upper and lower levels should be incorporated into the selection of the solution strategy that is likely to perform best for a given problem.

3.6. Scalability and current limitations

In order to study the scalability of the proposed algorithms, further experiments are conducted with the 10-variable SMD problems. The settings for this experiment are kept the same as S1, except that the number of generations is increased from 5 to 10 to allow some more evaluations to handle the higher number of variables. The median accuracy and the number of function evaluations used are listed in Table 7. The results are largely consistent with those for the 5-variable versions of the problems, with both the newly proposed SA-SA and SA-DE performing better than the other two variants. In this case, the results of the top two variants are relatively much closer to each other, which is apparent from the profile plots shown in Fig. 5. The above results confirm that the use of surrogates at the upper level is effective for a higher number of variables (10 in this case) as well. However, an important comment we would like to make here is that the proposed algorithm is not scalable to an arbitrarily large number of variables, since metamodels such as Kriging do not scale well to high dimensions. This is because the training of the model


Fig. 2. Convergence of UL objective values in the median run for SMD problems for the four strategies using S1 setting. SMD12 is excluded as no feasible solution was found by any variant.


Fig. 3. Convergence of UL objective values in the median run for BLTP problems for the four strategies using the S1 setting.
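The performance profiles shown in Figs. 1 and 4 (described in Section 3.2.2) can be computed from a matrix of per-problem metric values with a short routine. The following is a sketch assuming numpy, using raw τ rather than the log10 τ plotted in the figures; the variable names are our own:

```python
import numpy as np

def performance_profile(metrics, taus):
    """metrics: (n_problems, n_algorithms) array of median errors (lower is better).
    Returns rho of shape (len(taus), n_algorithms): the fraction of problems on which
    each algorithm is within a factor tau of the best algorithm for that problem."""
    metrics = np.asarray(metrics, dtype=float)
    best = metrics.min(axis=1, keepdims=True)       # per-problem best value
    ratios = metrics / best                         # performance ratios, all >= 1
    taus = np.asarray(taus, dtype=float)
    return (ratios[None, :, :] <= taus[:, None, None]).mean(axis=1)

# Two algorithms on three problems: A is best on problems 1 and 3, B on problem 2.
m = [[1.0, 2.0],
     [4.0, 1.0],
     [1.0, 3.0]]
taus = np.array([1.0, 2.0, 4.0])
rho = performance_profile(m, taus)

# Area under the curve via the trapezoidal rule; a larger AUC means better overall performance.
auc = ((rho[1:] + rho[:-1]) / 2 * np.diff(taus)[:, None]).sum(axis=0)
print(rho)
print(auc)
```

This assumes strictly positive metric values (guaranteed here, since values below 10^−6 are floored at 10^−6), so the per-problem ratios are well defined.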


Table 5. Comparison of solutions obtained using SA-SA(S1) with SA-DE(S2), DE-SA(S3) and DE-DE(S4). (+), (−) and (=) indicate statistically higher, lower or equivalent values with respect to SA-SA (Wilcoxon signed rank test, 95% confidence); '–' indicates that no feasible solution was found.

Upper level accuracy (εu):

Prob.  | SA-SA(S1) | SA-DE(S2)   | DE-SA(S3)   | DE-DE(S4)
SMD1   | 2.84E−02  | 7.00E−05(−) | 1.08E−02(=) | 2.31E−03(−)
SMD2   | 1.43E−02  | 2.69E−02(−) | 3.90E−03(−) | 1.08E−03(+)
SMD3   | 7.98E−01  | 3.74E−02(−) | 4.12E−02(−) | 2.52E−03(−)
SMD4   | 6.00E−05  | 1.90E−05(+) | 7.57E−04(+) | 1.22E−03(=)
SMD5   | 1.11E−01  | 1.45E−04(−) | 1.33E−01(=) | 8.01E−03(−)
SMD6   | 1.00E−06  | 1.00E−06(+) | 1.13E−03(+) | 1.14E−03(=)
SMD7   | 1.61E−01  | 1.23E−01(−) | 2.78E−03(−) | 2.54E−03(−)
SMD8   | 1.16E+01  | 9.52E−01(−) | 3.76E+00(−) | 3.58E−02(−)
SMD9   | 1.67E−01  | 3.70E−02(−) | 2.62E−02(−) | 9.52E−03(−)
SMD10  | 1.76E+01  | 1.71E+01(−) | 3.69E+00(−) | 3.09E+00(=)
SMD11  | 1.15E−01  | 1.07E−01(+) | 2.76E−01(+) | 3.31E−01(=)
SMD12  | –         | –           | 1.51E+00(−) | 9.95E−01(−)
BLTP1  | 1.00E−06  | 1.00E−06(−) | 1.00E−06(=) | 1.00E−06(=)
BLTP2  | 9.99E−02  | 8.41E−02(−) | 1.54E−03(=) | 2.24E−03(=)
BLTP3  | 1.00E−06  | 1.00E−06(=) | 1.00E−06(=) | 1.00E−06(=)
BLTP4  | 9.00E+02  | 4.34E+02(−) | 9.00E+02(=) | 7.98E+02(−)
BLTP5  | 9.88E−02  | 1.15E−01(−) | 9.79E−04(−) | 2.20E−03(=)
BLTP6  | 1.52E+00  | 1.32E+00(−) | 2.50E−02(−) | 4.08E−02(=)
BLTP7  | 1.00E−06  | 1.00E−06(+) | 1.29E−02(+) | 1.07E−02(=)
BLTP8  | 1.46E−02  | 1.52E−02(−) | 1.24E−04(−) | 2.49E−04(=)
BLTP9  | 5.15E+00  | 5.23E+00(−) | 2.94E−02(−) | 5.89E−02(=)
BLTP10 | 1.85E−03  | 1.81E−03(−) | 1.00E−06(−) | 4.50E−05(=)
BLTP11 | 1.00E−06  | 2.00E−06(+) | 1.00E−06(=) | 8.00E−06(+)
BLTP12 | 2.30E−05  | 8.00E−06(−) | 6.00E−06(−) | 7.00E−06(=)
BLTP13 | 2.23E+00  | 3.04E+00(−) | 3.03E−02(−) | 4.06E−01(=)

Lower level accuracy (εl):

Prob.  | SA-SA(S1) | SA-DE(S2)   | DE-SA(S3)   | DE-DE(S4)
SMD1   | 6.85E−03  | 3.00E−05(−) | 3.98E−03(=) | 1.36E−03(−)
SMD2   | 8.80E−04  | 1.87E−03(−) | 6.36E−04(=) | 2.73E−04(=)
SMD3   | 2.41E−01  | 3.42E−03(−) | 1.66E−02(−) | 7.63E−04(−)
SMD4   | 2.00E−06  | 2.00E−06(+) | 3.56E−04(+) | 5.75E−04(=)
SMD5   | 4.27E−02  | 4.60E−05(−) | 5.47E−02(=) | 3.04E−03(−)
SMD6   | 1.00E−06  | 1.00E−06(+) | 6.98E−04(+) | 2.86E−04(=)
SMD7   | 2.00E−01  | 4.67E−02(−) | 1.17E−04(−) | 1.01E−04(−)
SMD8   | 1.97E+00  | 1.05E−01(−) | 4.64E−01(−) | 3.57E−03(−)
SMD9   | 1.23E−02  | 1.27E−03(−) | 1.05E−02(=) | 1.76E−03(−)
SMD10  | 8.36E−01  | 8.78E−01(=) | 8.23E−01(=) | 7.18E−01(=)
SMD11  | 3.20E−02  | 1.16E−02(+) | 3.88E−01(+) | 4.82E−01(=)
SMD12  | –         | –           | 1.23E+00(−) | 1.98E+00(−)
BLTP1  | 1.00E−06  | 1.00E−06(=) | 1.00E−06(=) | 1.00E−06(=)
BLTP2  | 2.18E−01  | 1.88E−01(−) | 4.03E−03(−) | 5.84E−03(=)
BLTP3  | 1.00E−06  | 1.00E−06(=) | 1.00E−06(=) | 1.00E−06(=)
BLTP4  | 1.00E−06  | 1.00E−06(=) | 1.00E−06(=) | 1.00E−06(−)
BLTP5  | 2.03E−01  | 2.34E−01(−) | 2.00E−03(−) | 4.57E−03(=)
BLTP6  | 5.66E−03  | 4.27E−03(−) | 2.00E−06(−) | 4.00E−06(=)
BLTP7  | 1.00E−06  | 1.00E−06(+) | 9.23E−03(+) | 7.62E−03(=)
BLTP8  | 1.00E−06  | 1.00E−06(=) | 1.00E−06(=) | 1.00E−06(=)
BLTP9  | 1.80E+00  | 1.83E+00(−) | 1.03E−02(−) | 2.06E−02(=)
BLTP10 | 1.55E−01  | 1.56E−01(−) | 2.42E−03(−) | 1.32E−02(=)
BLTP11 | 1.00E−06  | 1.00E−06(=) | 1.00E−06(=) | 1.00E−06(=)
BLTP12 | 1.33E−04  | 6.60E−05(−) | 5.00E−05(−) | 5.40E−05(=)
BLTP13 | 2.70E−01  | 3.06E−01(−) | 5.09E−03(−) | 5.58E−02(=)


Fig. 4. Pairwise performance profiles for median upper and lower level accuracy for SA-SA(S1) vs SA-DE(S2), DE-SA(S3), DE-DE(S4) approaches. The top row is for UL and the bottom row for LL performance. Note: x-axis shows log10 τ.

Table 6. Comparison of function evaluations used by SA-SA(S1) with SA-DE(S2), DE-SA(S3) and DE-DE(S4).

Upper level function evals (×10²):

Prob.  | SA-SA(S1) | SA-DE(S2) | DE-SA(S3) | DE-DE(S4)
SMD1   | 1.07 | 1.07 | 4.20 | 4.20
SMD2   | 1.07 | 1.07 | 4.20 | 4.20
SMD3   | 1.07 | 1.07 | 4.20 | 4.20
SMD4   | 1.07 | 1.07 | 4.20 | 4.20
SMD5   | 1.07 | 1.07 | 4.20 | 4.20
SMD6   | 1.07 | 1.06 | 4.20 | 4.20
SMD7   | 1.07 | 1.07 | 4.20 | 4.20
SMD8   | 1.07 | 1.07 | 4.20 | 4.20
SMD9   | 1.07 | 1.07 | 4.20 | 4.20
SMD10  | 1.07 | 1.07 | 4.20 | 4.20
SMD11  | 1.07 | 1.07 | 4.20 | 4.20
SMD12  | 1.07 | 1.07 | 4.20 | 4.20
BLTP1  | 1.07 | 1.07 | 4.20 | 4.20
BLTP2  | 1.07 | 1.07 | 4.20 | 4.20
BLTP3  | 1.07 | 1.07 | 4.20 | 4.20
BLTP4  | 1.07 | 1.07 | 4.19 | 4.17
BLTP5  | 1.07 | 1.07 | 4.20 | 4.20
BLTP6  | 1.07 | 1.07 | 4.20 | 4.20
BLTP7  | 1.07 | 1.07 | 4.20 | 4.20
BLTP8  | 1.07 | 1.07 | 4.20 | 4.20
BLTP9  | 1.07 | 1.07 | 4.20 | 4.20
BLTP10 | 1.07 | 1.07 | 4.20 | 4.20
BLTP11 | 1.07 | 1.07 | 4.20 | 4.20
BLTP12 | 1.07 | 1.07 | 4.20 | 4.20
BLTP13 | 1.07 | 1.07 | 4.20 | 4.20

Lower level function evals (×10⁴; the DE-DE(S4) column is ×10⁵, consistent with the totals in Table 3):

Prob.  | SA-SA(S1) | SA-DE(S2) | DE-SA(S3) | DE-DE(S4, ×10⁵)
SMD1   | 1.62 | 4.40 | 6.45 | 1.76
SMD2   | 1.63 | 4.41 | 6.47 | 1.76
SMD3   | 1.65 | 4.44 | 6.57 | 1.77
SMD4   | 1.64 | 4.43 | 6.58 | 1.77
SMD5   | 1.63 | 4.41 | 6.52 | 1.77
SMD6   | 1.57 | 4.37 | 6.26 | 1.76
SMD7   | 1.64 | 4.41 | 6.49 | 1.76
SMD8   | 1.63 | 4.41 | 6.52 | 1.77
SMD9   | 1.65 | 4.42 | 6.60 | 1.77
SMD10  | 1.69 | 4.48 | 6.74 | 1.79
SMD11  | 1.68 | 4.47 | 6.72 | 1.79
SMD12  | 1.69 | 4.48 | 6.74 | 1.79
BLTP1  | 1.59 | 4.39 | 6.33 | 1.75
BLTP2  | 1.60 | 4.40 | 6.51 | 1.77
BLTP3  | 1.57 | 4.36 | 6.26 | 1.74
BLTP4  | 1.60 | 4.39 | 6.31 | 1.73
BLTP5  | 1.65 | 4.44 | 6.65 | 1.78
BLTP6  | 1.58 | 4.39 | 6.40 | 1.76
BLTP7  | 1.62 | 4.41 | 6.53 | 1.77
BLTP8  | 1.60 | 4.39 | 6.39 | 1.76
BLTP9  | 1.62 | 4.41 | 6.52 | 1.77
BLTP10 | 1.60 | 4.40 | 6.47 | 1.76
BLTP11 | 1.56 | 4.36 | 6.25 | 1.75
BLTP12 | 1.65 | 4.45 | 6.73 | 1.79
BLTP13 | 1.67 | 4.47 | 6.74 | 1.79

Table 7 Median performance comparison among SA-SA, SA-DE, DE-SA and DE-DE for 10-variable SMD problems. Accuracy Prob.

SMD1 SMD2 SMD3 SMD4

DE-DE

DE-SA

SA-DE

SA-SA

𝜖u

𝜖l

𝜖u

𝜖l

𝜖u

𝜖l

𝜖u

𝜖l

3.32E + 00(−) 3.83E + 00(=) 6.96E + 00(=) 9.85E − 01(+)

2.30E + 00(−) 1.41E + 00(=) 4.50E + 00(=) 7.40E − 01(+)

6.05E + 00(+) 3.54E + 00(+) 5.86E + 00(−) 9.76E − 01(+)

2.95E + 00(+) 2.03E + 00(+) 3.5E + 00(−) 5.74E − 01(+)

4.53E − 02(+) 7.85E − 01(+) 10.00E + 00(−) 2.34E − 02(+)

2.48E − 02(+) 3.93E − 02(+) 6.71E + 00(−) 1.09E − 03(+)

1.08E − 01 7.95E − 01 1.03E + 01 3.04E − 03

6.27E − 02 7.89E − 02 6.05E + 00 2.57E − 04

(continued on next page)

341

H.K. Singh et al.

Swarm and Evolutionary Computation 48 (2019) 329–344

Table 7 (continued) SMD5 SMD6 SMD7 SMD8 SMD9 SMD10 SMD11 SMD12

7.79E + 00(−) 1.43E + 00(+) 1.50E + 00(=) 1.66E + 01(−) 6.96E + 00(=) 4.26E + 01(=) – 2.02E + 02(=)

4.57E + 00(−) 2.12E + 01(+) 4.34E + 02(=) 9.99E + 00(=) 3.78E + 00(=) 3.00E + 01(=) – 5.90E + 01(=)

2.89E + 01(=) 1.30E + 00(+) 2.10E + 00(+) 2.34E + 01(=) 4.23E + 00(+) 4.54E + 01(=) – 2.16E + 02

1.66E + 01(=) 1.65E + 01(+) 3.75E + 02(=) 1.52E + 01(+) 2.38E + 00(+) 2.64E + 01 – 6.31E + 01(=)

2.92E − 01(−) 7.30E − 05(+) 1.72E + 00(=) 1.81E + 01(−) 1.00E + 00(=) 5.24E + 00(−) – 2.16E + 02(=)

2.20E − 01(−) 2.31E + 01(+) 3.74E + 02(=) 1.26E + 01(=) 1.76E − 01(+) 2.63E + 01(=) – 6.12E + 01(=)

3.09E + 01 0.00E + 00 1.91E + 00 2.14E + 01 1.09E + 00 4.35E + 01 – 1.98E + 02

1.62E + 01 6.15E + 00 7.59E + 02 1.33E + 01 2.67E − 01 2.85E + 01 – 6.66E + 01

Table 7 (continued): function evaluations at the upper level (UL) and lower level (LL).

| Prob. | DE-DE UL | DE-DE LL | DE-SA UL | DE-SA LL | SA-DE UL | SA-DE LL | SA-SA UL | SA-SA LL |
|-------|----------|----------|----------|----------|----------|----------|----------|----------|
| SMD1  | 2.10E+02 | 4.94E+04 | 2.10E+02 | 5.36E+04 | 2.17E+02 | 4.92E+04 | 2.17E+02 | 5.35E+04 |
| SMD2  | 2.10E+02 | 4.94E+04 | 2.10E+02 | 5.36E+04 | 2.17E+02 | 4.92E+04 | 2.17E+02 | 5.34E+04 |
| SMD3  | 2.10E+02 | 4.95E+04 | 2.10E+02 | 5.37E+04 | 2.17E+02 | 4.95E+04 | 2.17E+02 | 5.37E+04 |
| SMD4  | 2.10E+02 | 4.95E+04 | 2.10E+02 | 5.37E+04 | 2.17E+02 | 4.95E+04 | 2.17E+02 | 5.37E+04 |
| SMD5  | 2.10E+02 | 4.94E+04 | 2.10E+02 | 5.37E+04 | 2.17E+02 | 4.93E+04 | 2.17E+02 | 5.36E+04 |
| SMD6  | 2.10E+02 | 4.95E+04 | 2.10E+02 | 5.26E+04 | 2.17E+02 | 4.95E+04 | 2.17E+02 | 5.34E+04 |
| SMD7  | 2.10E+02 | 4.95E+04 | 2.10E+02 | 5.37E+04 | 2.17E+02 | 4.94E+04 | 2.17E+02 | 5.36E+04 |
| SMD8  | 2.10E+02 | 4.95E+04 | 2.10E+02 | 5.37E+04 | 2.17E+02 | 4.95E+04 | 2.17E+02 | 5.37E+04 |
| SMD9  | 2.10E+02 | 4.94E+04 | 2.10E+02 | 5.36E+04 | 2.17E+02 | 4.94E+04 | 2.17E+02 | 5.36E+04 |
| SMD10 | 2.10E+02 | 4.95E+04 | 2.10E+02 | 5.37E+04 | 2.17E+02 | 4.95E+04 | 2.17E+02 | 5.37E+04 |
| SMD11 | 2.10E+02 | 4.95E+04 | 2.10E+02 | 5.36E+04 | 2.17E+02 | 4.94E+04 | 2.17E+02 | 5.36E+04 |
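The evaluation counts above make the cost argument concrete: every strategy performs on the order of 2×10² upper-level and 5×10⁴ lower-level evaluations, so which level dominates the run time depends entirely on the per-evaluation cost ratio. A minimal sketch (the 1000× cost ratio below is a hypothetical example, not a value from the paper):

```python
def level_cost_share(n_ul, n_ll, c_ul, c_ll):
    """Fraction of total run cost incurred at the upper and lower levels,
    given evaluation counts (n_ul, n_ll) and per-evaluation costs (c_ul, c_ll)."""
    total = n_ul * c_ul + n_ll * c_ll
    return n_ul * c_ul / total, n_ll * c_ll / total

# Counts of the order reported in Table 7. With equal per-evaluation cost,
# the lower level dominates, so lower-level surrogates pay off most:
ul_share, ll_share = level_cost_share(210, 49_400, c_ul=1.0, c_ll=1.0)
# ll_share exceeds 0.99 here.

# If an upper-level evaluation is, say, 1000x costlier (hypothetical ratio),
# the upper level dominates and surrogate assistance there matters more:
ul_share2, ll_share2 = level_cost_share(210, 49_400, c_ul=1000.0, c_ll=1.0)
```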

becomes time-consuming and inaccurate due to the large number of coefficients. Furthermore, evolutionary algorithms themselves suffer the so-called "curse of dimensionality" for large numbers of variables, since the decision space grows exponentially; they need specialized mechanisms such as decomposition to improve the search [55,56]. However, a problem can be computationally expensive even in low dimensions. For example, engineering design often involves sizing optimization of a component, where the number of parameters defining the design is small, but each analysis requires a time-consuming simulation such as FEA or CFD. The work presented in this study is geared more towards such expensive problems with relatively few variables.

Another notable limitation of the current implementation is that the surrogate models used are suitable only for continuous variables. In order to deal with other types of variables, as well as a mix of different types (such as discrete, binary and permutation variables), further enhancements need to be incorporated in the surrogate modeling itself. Such improvements to deal with larger numbers and more types of variables can certainly be accommodated in the general framework presented here, and will be explored in depth in future extensions of this work.

4. Concluding remarks

Bilevel optimization is an interesting and challenging area from a mathematical perspective, in that it incorporates the optimality of a lower level problem as a constraint within an upper level optimization problem. This hierarchical model is suitable for modeling a number of real-life problems, and effective solution methodologies for it are well sought after by theoreticians as well as practitioners. The complexity of some of these problems may render them unsuitable for exact mathematical techniques, for example when the involved objectives/constraints are highly nonlinear, discontinuous, non-differentiable or black-box. This has motivated the use of metaheuristics such as EAs for solving bilevel problems, although in its basic form a nested EA requires a very large number of evaluations. To overcome this challenge, the use of approximation has been proposed in prior work. However, so far it has focused primarily on reducing the lower level evaluations, which carries the implicit assumption that function evaluations at both levels are equally expensive. In this study, we rationalize that this assumption may not always hold, and that this aspect has significant implications for the use of approximations in solving bilevel optimization problems.

Fig. 5. Performance profile of the four strategies under the setting S1 for 10-variable SMD problems. Note: x-axis shows log10 𝜏.
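Fig. 5 plots performance profiles in the sense of Dolan and Moré [53,54]: for each strategy s, the profile value ρ_s(τ) is the fraction of problems on which the strategy's measure is within a factor τ of the best strategy's measure on that problem. A minimal sketch follows; the small `eps` guarding against a zero best value (e.g. SA-SA's 0.00E+00 accuracy on SMD6) is an implementation assumption, not part of the original definition.

```python
def performance_profile(metric, tau, eps=1e-12):
    """Dolan-More performance profile value rho_s(tau) for each strategy.

    metric[s] lists the performance measure (lower is better, e.g. the
    accuracy epsilon_u) of strategy s on each problem; eps avoids a
    division by zero when the best measure on a problem is exactly 0."""
    strategies = list(metric)
    n_prob = len(next(iter(metric.values())))
    # Best (smallest) measure achieved on each problem across strategies.
    best = [min(metric[s][p] for s in strategies) for p in range(n_prob)]
    return {
        s: sum((metric[s][p] + eps) / (best[p] + eps) <= tau
               for p in range(n_prob)) / n_prob
        for s in strategies
    }
```

Sweeping `tau` over a grid and plotting ρ_s against log10 τ, as in Fig. 5, yields one monotone curve per strategy; curves closer to the top-left dominate.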


The first key contribution of this study is to propose new algorithm(s) that use approximations to reduce function evaluations at the upper level, either in addition to the lower level (SA-SA) or instead of it (SA-DE). The second contribution is a systematic study of the four strategies on a set of 25 commonly used bilevel optimization benchmark problems. The experiments show that when both levels are considered expensive, the SA-SA algorithm outperforms the other strategies. However, when one or both levels are computationally cheap, the other three strategies can outperform SA-SA by expending more function evaluations at the level(s) that are cheap to compute. The key takeaway from the exercise is that a user should make a careful choice about the level(s) at which to introduce surrogate-assisted optimization in order to solve the problem at hand most efficiently. The prevailing strategy of applying surrogate assistance at the lower level (as currently done in the literature) will generally produce benefits if the LL evaluation carries a significant cost. On the other hand, a number of real-life problems may have an expensive UL, in which case the approach(es) presented in this paper could be considered to obtain better solutions within the computational budget. While the presented work adequately demonstrates the choice of strategy for dealing with computationally expensive problems, only nested bilevel strategies are considered within the scope of this study. In future work, the authors will explore such implications within non-nested formulations of bilevel optimization problems. Moreover, extensions to deal with larger numbers and different types of variables will also be considered.

Acknowledgment

The authors would like to acknowledge the University of New South Wales, Capability Systems Centre grant RG181312 for supporting this work. The second and third authors would also like to acknowledge Discovery Project grants DP190102591 and DP190101271, respectively, from the Australian Research Council.

References

[1] H. v. Stackelberg, Theory of the Market Economy, Oxford University Press, 1952.
[2] J. Bracken, J.T. McGill, Mathematical programs with optimization problems in the constraints, Oper. Res. 21 (1) (1973) 37–44.
[3] S. Christiansen, M. Patriksson, L. Wynter, Stochastic bilevel programming in structural optimization, Struct. Multidiscip. Optim. 21 (5) (2001) 361–371.
[4] Y. Yin, Multiobjective bilevel optimization for transportation planning and management problems, J. Adv. Transp. 36 (1) (2002) 93–105.
[5] Z. Lukač, K. Šorić, V.V. Rosenzweig, Production planning problem with sequence dependent setups as a bilevel programming problem, Eur. J. Oper. Res. 187 (3) (2008) 1504–1512.
[6] J. Rajesh, K. Gupta, H.S. Kusumakar, V.K. Jayaraman, B.D. Kulkarni, A tabu search based approach for solving a class of bilevel programming problems in chemical engineering, J. Heuristics 9 (4) (2003) 307–319.
[7] E.-G. Talbi, A taxonomy of metaheuristics for bi-level optimization, in: Metaheuristics for Bi-level Optimization, Springer, 2013, pp. 1–39.
[8] A. Sinha, P. Malo, K. Deb, A review on bilevel optimization: from classical to evolutionary approaches and applications, IEEE Trans. Evol. Comput. 22 (2) (2018) 276–295.
[9] L.N. Vicente, P.H. Calamai, Bilevel and multilevel programming: a bibliography review, J. Glob. Optim. 5 (3) (1994) 291–306.
[10] S. Dempe, A simple algorithm for the linear bilevel programming problem, Optimization 18 (3) (1987) 373–385.
[11] W. Candler, R. Townsley, A linear two-level programming problem, Comput. Oper. Res. 9 (1) (1982) 59–76.
[12] J.F. Bard, An efficient point algorithm for a linear two-stage optimization problem, Oper. Res. 31 (4) (1983) 670–684.
[13] D.J. White, G. Anandalingam, A penalty function approach for solving bi-level linear programs, J. Glob. Optim. 3 (4) (1993) 397–419.
[14] H. Önal, A modified simplex approach for solving bilevel linear programming problems, Eur. J. Oper. Res. 67 (1) (1993) 126–135.
[15] L. Vicente, G. Savard, J. Júdice, Descent approaches for quadratic bilevel programming, J. Optim. Theory Appl. 81 (2) (1994) 379–399.
[16] J.F. Bard, Convex two-level optimization, Math. Program. 40 (1–3) (1988) 15–27.
[17] J.E. Falk, J. Liu, On bilevel programming, part I: general nonlinear cases, Math. Program. 70 (1–3) (1995) 47–72.
[18] A. Sinha, P. Malo, K. Deb, Test problem construction for single-objective bilevel optimization, Evol. Comput. 22 (3) (2014) 439–477.
[19] J. Angelo, E. Krempser, H. Barbosa, Differential evolution for bilevel programming, in: IEEE Congress on Evolutionary Computation (CEC), 2013, pp. 470–477.
[20] J.S. Angelo, H.J. Barbosa, A study on the use of heuristics to solve a bilevel programming problem, Int. Trans. Oper. Res. 22 (5) (2015) 861–882.
[21] M.M. Islam, H.K. Singh, T. Ray, A memetic algorithm for solving single objective bilevel optimization problems, in: IEEE Congress on Evolutionary Computation (CEC), IEEE, 2015, pp. 1643–1650.
[22] M.M. Islam, H.K. Singh, T. Ray, A surrogate assisted approach for single-objective bilevel optimization, IEEE Trans. Evol. Comput. 21 (5) (2017) 681–696.
[23] M.M. Islam, H.K. Singh, T. Ray, A. Sinha, An enhanced memetic algorithm for single-objective bilevel optimization problems, Evol. Comput. 25 (4) (2017) 607–642.
[24] A. Koh, A metaheuristic framework for bi-level programming problems with multi-disciplinary applications, in: Metaheuristics for Bi-level Optimization, Springer, 2013, pp. 153–187.
[25] K. Deb, A. Sinha, An efficient and accurate solution methodology for bilevel multi-objective programming problems using a hybrid evolutionary-local-search algorithm, Evol. Comput. 18 (3) (2010) 403–449.
[26] M.M. Islam, H.K. Singh, T. Ray, Use of a non-nested formulation to improve search for bilevel optimization, in: Australasian Joint Conference on Artificial Intelligence, Springer, 2017, pp. 106–118.
[27] Y. Wang, Y.-C. Jiao, H. Li, An evolutionary algorithm for solving nonlinear bilevel programming based on a new constraint-handling scheme, IEEE Trans. Syst., Man, Cybern., Part C (Appl. Rev.) 35 (2) (2005) 221–232.
[28] A. Koh, Solving transportation bi-level programs with differential evolution, in: IEEE Congress on Evolutionary Computation (CEC), IEEE, 2007, pp. 2243–2250.
[29] A. Sinha, T. Soun, K. Deb, Evolutionary bilevel optimization using KKT proximity measure, in: IEEE Congress on Evolutionary Computation (CEC), 2017, pp. 2412–2419.
[30] A. Sinha, P. Malo, K. Deb, An improved bilevel evolutionary algorithm based on quadratic approximations, in: IEEE Congress on Evolutionary Computation (CEC), 2014, pp. 1870–1877.
[31] J.S. Angelo, E. Krempser, H.J. Barbosa, Differential evolution assisted by a surrogate model for bilevel programming problems, in: IEEE Congress on Evolutionary Computation (CEC), 2014, pp. 1784–1791.
[32] K.S. Bhattacharjee, H.K. Singh, T. Ray, Multi-objective optimization with multiple spatially distributed surrogates, ASME J. Mech. Design 138 (9) (2016) 091401.
[33] K.S. Bhattacharjee, H.K. Singh, T. Ray, Multiple surrogate-assisted many-objective optimization for computationally expensive engineering design, J. Mech. Des. 140 (5) (2018) 051403.
[34] K.S. Bhattacharjee, H.K. Singh, T. Ray, J. Branke, Multiple surrogate assisted multiobjective optimization using improved pre-selection, in: IEEE Congress on Evolutionary Computation (CEC), 2016, pp. 4328–4335.
[35] Y. Jin, Surrogate-assisted evolutionary computation: recent advances and future challenges, Swarm Evolut. Comput. 1 (2) (2011) 61–70.
[36] A. Sinha, P. Malo, K. Deb, Evolutionary algorithm for bilevel optimization using approximations of the lower level optimal solution mapping, Eur. J. Oper. Res. 257 (2) (2017) 395–411.
[37] A. Sinha, P. Malo, K. Deb, Efficient Evolutionary Algorithm for Single-Objective Bilevel Optimization, 2013, arXiv:1303.3901.
[38] A. Sinha, P. Malo, K. Deb, P. Korhonen, J. Wallenius, Solving bilevel multicriterion optimization problems with lower level decision uncertainty, IEEE Trans. Evol. Comput. 20 (2) (2016) 199–217.
[39] A. Sinha, P. Malo, P. Xu, K. Deb, A bilevel optimization approach to automated parameter tuning, in: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, ACM, 2014, pp. 847–854.
[40] M.-S. Casas-Ramírez, J.-F. Camacho-Vallejo, I.-A. Martínez-Salazar, Approximating solutions to a bilevel capacitated facility location problem with customer's patronization toward a list of preferences, Appl. Math. Comput. 319 (2018) 369–386.
[41] J. Herskovits, A. Leontiev, G. Dias, G. Santos, Contact shape optimization: a bilevel programming approach, Struct. Multidiscip. Optim. 20 (3) (2000) 214–221.
[42] M.D. McKay, R.J. Beckman, W.J. Conover, Comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 21 (2) (1979) 239–245.
[43] T. Takahama, S. Sakai, Constrained optimization by the constrained differential evolution with gradient-based mutation and feasible elites, in: IEEE Congress on Evolutionary Computation (CEC), IEEE, 2006, pp. 1–8.
[44] R.H. Byrd, M.E. Hribar, J. Nocedal, An interior point algorithm for large-scale nonlinear programming, SIAM J. Optim. 9 (4) (1999) 877–900.
[45] R.H. Myers, D.C. Montgomery, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, John Wiley and Sons, New York, 2002.
[46] J. Sacks, W.J. Welch, T.J. Mitchell, H.P. Wynn, Design and analysis of computer experiments, Stat. Sci. (1989) 409–423.
[47] N.V. Queipo, J.V. Goicochea, S. Pintos, Surrogate modeling-based optimization of SAGD processes, J. Pet. Sci. Eng. 35 (1) (2002) 83–93.
[48] E. Aiyoshi, K. Shimizu, A solution method for the static constrained Stackelberg problem via penalty method, IEEE Trans. Autom. Control 29 (12) (1984) 1111–1114.
[49] V. Oduguwa, R. Roy, Bi-level optimisation using genetic algorithm, in: IEEE International Conference on Artificial Intelligence Systems (ICAIS 2002), IEEE, 2002, pp. 322–327.
[50] G. Savard, J. Gauvin, The steepest descent direction for the nonlinear bilevel programming problem, Oper. Res. Lett. 15 (5) (1994) 265–272.
[51] G. Anandalingam, D. White, A solution method for the linear static Stackelberg problem using penalty functions, IEEE Trans. Autom. Control 35 (10) (1990) 1170–1173.
[52] J.F. Bard, J.E. Falk, An explicit solution to the multi-level programming problem, Comput. Oper. Res. 9 (1) (1982) 77–100.
[53] E.D. Dolan, J.J. Moré, Benchmarking optimization software with performance profiles, Math. Program. 91 (2) (2002) 201–213.
[54] H.J. Barbosa, H.S. Bernardino, A. Barreto, Using performance profiles to analyze the results of the 2006 CEC constrained optimization competition, in: IEEE Congress on Evolutionary Computation (CEC), 2010, pp. 1–8.
[55] H.K. Singh, T. Ray, Divide and conquer in coevolution: a difficult balancing act, in: Agent-Based Evolutionary Search, Springer, 2010, pp. 117–138.
[56] M.N. Omidvar, X. Li, Y. Mei, X. Yao, Cooperative co-evolution with differential grouping for large scale optimization, IEEE Trans. Evol. Comput. 18 (3) (2014) 378–393.