Robust data envelopment analysis approaches for evaluating algorithmic performance


Accepted Manuscript

Robust Data Envelopment Analysis Approaches for Evaluating Algorithmic Performance

Chung-Cheng Lu

PII: S0360-8352(14)00456-2
DOI: http://dx.doi.org/10.1016/j.cie.2014.12.027
Reference: CAIE 3907

To appear in: Computers & Industrial Engineering

Received Date: 22 January 2013
Revised Date: 16 June 2014
Accepted Date: 24 December 2014

Please cite this article as: Lu, C-C., Robust Data Envelopment Analysis Approaches for Evaluating Algorithmic Performance, Computers & Industrial Engineering (2014), doi: http://dx.doi.org/10.1016/j.cie.2014.12.027

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Abstract

Recent advances in state-of-the-art meta-heuristics feature the incorporation of probabilistic operators that aim to diversify search directions or to escape from local optima. As a result, the solutions produced by a meta-heuristic are non-deterministic and vary from one run to another. Consequently, both the average and the variation of outputs over multiple runs have to be considered in evaluating the performance of different configurations of a meta-heuristic or of distinct meta-heuristics. To this end, this work considers each algorithm as a decision-making unit (DMU) and develops robust data envelopment analysis (DEA) models that take into account not only the average but also the standard deviation of an algorithm’s output for evaluating the relative efficiencies of a set of algorithms. The robust DEA models describe uncertain output using an uncertainty set, and aim to maximize a DMU’s worst-case relative efficiency with respect to that uncertainty set. The proposed models are employed to evaluate a set of distinct configurations of a genetic algorithm and a set of parameter settings of a simulated annealing heuristic. Evaluation results demonstrate that the robust DEA models are able to identify efficient algorithmic configurations. The proposed models contribute not only to the evaluation of meta-heuristics but also to the DEA methodology.

Keywords: meta-heuristics; uncertainty modeling; data envelopment analysis; robust counterpart optimization.


1. Introduction

This work focuses on determining the relative efficiencies of a set of algorithmic configurations of a meta-heuristic, or of a set of different meta-heuristics, for solving combinatorial optimization problems. Typically, a meta-heuristic consists of several algorithmic operators, each of which may be implemented in a number of distinct ways, thus resulting in various configurations (or combinations) with different performances. Comparing these combinations and identifying the most efficient one(s) are critical tasks at the final stage of developing a meta-heuristic.

In the literature, the most commonly used method is empirical analysis, which involves an extensive set of paired-t tests for comparing average algorithmic performance with respect to a number of criteria (such as computational time, objective values, robustness, and flexibility) on a wide range of problem instances (Bräysy & Gendreau, 2005a, 2005b). More sophisticated methods may be based on the Design of Experiments (DOE) and Analysis of Variance (ANOVA), e.g., Coy (2000), Francois and Lavergne (2001), Rardin and Uzsoy (2001), Bartz-Beielstein (2006), Ruiz et al. (2006), and Birattari (2009). While these methods help to identify the best configurations, the experimental design has to be applied carefully to a number of combinations of algorithmic operators in order to arrive at statistically meaningful conclusions. Inevitably, the complexity of making a decision based on empirical analysis increases dramatically with the numbers of distinct operators, parameter values, and evaluation criteria, mainly due to the large number of paired-t tests. Furthermore, since ANOVA is parametric, using DOE with ANOVA requires checking its three main hypotheses, namely normality, homoskedasticity, and independence of residuals; this is generally not a simple task. As also pointed out by Ruiz et al. (2006), the resulting ANOVA has many degrees of freedom, so one has to be very careful when analyzing the results of an experiment with such large sample sizes. In solving the meta-heuristic tuning problem, which aims to determine the best configuration of a meta-heuristic, Birattari (2009) proposed the F-Race algorithm, which adopts the Friedman two-way ANOVA within the racing algorithm inspired by the Hoeffding race that Maron & Moore (1994) introduced for solving the model selection problem in machine learning. Although this approach is statistically sound, it is computationally intensive and still has to satisfy the hypotheses of ANOVA.

To avoid the hassles of the above empirical analysis methods, Lu and Yu (2012) proposed an alternative that adopts data envelopment analysis (DEA; e.g., Banker et al., 1984; Charnes et al., 1978; Amin and Toloo, 2007; Toloo and Nalchigar, 2009; Toloo, 2012, 2013, 2014(a), 2014(b)) to evaluate the relative efficiencies of a set of combinations of genetic algorithm (GA) operators (i.e., selection, crossover and mutation) for solving the pickup and delivery vehicle routing problem with soft time windows (PDVRPSTW). In this approach, each possible combination of GA operators was considered as a decision-making unit (DMU), and DEA was adopted to evaluate and compare the algorithmic efficiency of the distinct GA combinations under consideration. In addition, the cross-efficiency (CE) method (e.g., Doyle and Green, 1994) was employed to rank the combinations. The numerical results showed that DEA is well-suited for determining efficient combinations of GA operators (Lu and Wu, 2014).

While DEA represents a promising alternative for evaluating the relative efficiencies of algorithms, one of its limitations is that the input and output data of DMUs, which are used as the coefficients in the corresponding linear programs (LPs), have to be precisely known a priori (Toloo and Nalchigar, 2011). This limitation may affect the efficiency evaluation of state-of-the-art meta-heuristics that incorporate probabilistic operators with the aim of diversifying search directions or escaping from local optima. For example, GAs include a mutation rate that controls the probability of applying a mutation operator to chromosomes; simulated annealing (SA) approaches use Cauchy or Boltzmann functions (e.g., Lin et al., 2011; Ying et al., 2011) in the annealing process to determine the probability of replacing the current solution with a worse solution. In this type of meta-heuristic, incorporating probabilistic operators results in non-deterministic or uncertain outputs that vary from one run to another. Consequently, both the average and the variation of outputs over multiple runs have to be considered in evaluating algorithmic efficiency.
Previous research on DEA models with imprecise data may be classified into three main categories, namely, fuzzy DEA, interval DEA, and robust DEA.

In the fuzzy DEA approach, Sengupta (1992) presented a fuzzy LP transformation to deal with DEA models with fuzzy input and output data. Based on the α-cut approach, Kao and Liu (2000) proposed a transformation of a fuzzy DEA model into a family of crisp DEA models, while Guo and Tanaka (2001) introduced an approach that converts a fuzzy DEA model into a bi-level LP model. Other examples of fuzzy DEA models can be found in Soleimani-damaneh et al. (2006) and Liu (2008). A common shortcoming of the fuzzy DEA approach is that fuzzy DEA models are computationally expensive (Soleimani-damaneh et al., 2006).

In the interval DEA approach, input and output values are selected from their respective intervals, with prescribed lower and upper bounds, so as to maximize a DMU’s relative efficiency score. Because the input and output data as well as the weights are variables, the interval DEA approach results in nonlinear programs (NLPs). Cooper et al. (1999) developed an interval approach that permits mixtures of imprecise and precise data. To deal with the nonlinear interval DEA model, they proposed a two-stage transformation that involves scale transformations and variable alternations to transform the interval DEA model into an ordinary linear program. Despotis and Smirlis (2002) defined upper and lower bounds for the efficiency scores of the DMUs, and proposed transformations of nonlinear interval DEA models into LP equivalents. They also used a post-DEA model and endurance indices to discriminate among the efficient DMUs. Other examples of interval DEA models can be found in Entani et al. (2002), Kao (2006), and Toloo and Ertay (2014). In the interval DEA approach, some DMUs may be always efficient or always inefficient for any combination of values within the given intervals, while others may be either efficient or inefficient depending on the values assigned. Moreover, the DMUs are no longer represented by points in the hyper-plane; instead, a number of efficient frontiers may exist, and the efficiency of the DMUs may vary according to the efficient frontier selected.

The robust DEA approach is based on the robust counterpart optimization (RCO) approach (e.g., Ben-Tal and Nemirovski, 1999; Bertsimas and Sim, 2004), which describes uncertain data using an uncertainty set and aims to maximize a DMU’s worst-case relative efficiency with respect to that uncertainty set. Sadjadi and Omrani (2008) proposed robust DEA models that consider uncertainty in output parameters for the performance assessment of electricity distribution companies. Shokouhi et al. (2010) developed robust DEA models that consider uncertainty in both input and output parameters.
Note that both of these works focused on the adaptation of Bertsimas and Sim’s (2004) approach to the CCR model (Charnes et al., 1978). More recently, Sadjadi et al. (2011) applied the RCO approach of Ben-Tal and Nemirovski (1999) to the super-efficiency DEA model of Andersen and Petersen (1993), which was also based on the CCR model.

To take into account the aforementioned output uncertainties (due to probabilistic operators) when using DEA models to evaluate algorithmic efficiency, this research develops two robust DEA models that consider not only the average but also the standard deviation of an algorithm’s output values. In particular, our work adapts the RCO approach to the BCC model (Banker et al., 1984), which allows variable returns-to-scale (VRS) on production frontiers, whereas previous robust DEA approaches were based on the CCR model, which assumes constant returns-to-scale (CRS). The BCC model has seen wider use than the CCR model because it is built on the more general VRS assumption. As a result, the proposed robust BCC models are expected to be more widely applicable than the robust CCR models presented in the literature (Sadjadi and Omrani, 2008; Sadjadi et al., 2011; Shokouhi et al., 2010). Moreover, this research is the first to develop robust BCC models using two distinct RCO techniques from the literature (i.e., Bertsimas and Sim, 2004 and Ben-Tal and Nemirovski, 1999) and to compare their results in algorithmic efficiency evaluation.

Using the same test dataset and GA algorithms provided by Lu and Yu (2012), we demonstrate the application of the robust DEA models to evaluate a set of GA algorithms for solving the PDVRPSTW. Additionally, the proposed models are applied to evaluate a set of parameter settings of a simulated annealing (SA) heuristic developed for solving the truck and trailer routing problem with time windows (Lin et al., 2011). To verify its effectiveness, the proposed approach is compared with the conventional empirical analysis method based on paired-t tests. It is important to note that a large number of paired-t tests have to be conducted if the empirical analysis method is employed to determine the efficient combinations of GA operators or the efficient SA parameter settings. Furthermore, the solution variation among different runs of GA and SA, caused by the probabilistic features of their operators, adds more complexity to the empirical analysis approach. In contrast, by applying the proposed robust DEA models, which explicitly address (output) coefficient uncertainties in the BCC model, the relative efficiency of each algorithmic configuration can be determined directly.

The rest of this paper is structured as follows. Section 2 describes the robust DEA models for evaluating the relative efficiency of a set of algorithmic combinations. Section 3 presents the applications of the robust DEA models to evaluate a number of combinations of GA operators and a set of SA parameter settings. Concluding remarks are given in Section 4.

2. Robust DEA Model for Evaluating Algorithmic Efficiency

Notation

$n$: number of combinations or configurations of algorithmic operators (or number of DMUs),
$j$: the subscript of the combinations of algorithmic operators, $j = 1, 2, \dots, n$,
$o$: the subscript of the evaluated combination on any trial; $o$ ranges over $1, 2, \dots, n$,
$y_{rj}$: the amount of output $r$ produced by combination $j$, $r = 1, 2, \dots, s$,
$x_{ij}$: the amount of input $i$ utilized by combination $j$, $i = 1, 2, \dots, m$,
$u_r$: the weight assigned to output $r$ by the evaluated combination $o$, $r = 1, 2, \dots, s$,
$v_i$: the weight assigned to input $i$ by the evaluated combination $o$, $i = 1, 2, \dots, m$,
$\lambda_j$: the weight associated with combination $j$, $j = 1, \dots, n$,
$E_j$: the efficiency of combination $j$.

2.1 Relative Efficiency

DEA is a multi-factor productivity analysis model for measuring the relative efficiency of a set of DMUs. To apply DEA to the evaluation of algorithmic efficiency, each combination of algorithmic operators of interest is considered as a DMU. The efficiency of combination $j$ in the presence of multiple outputs and inputs is defined as the weighted sum of its outputs divided by the weighted sum of its inputs:

$$E_j = \frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}}. \tag{1}$$
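As a small illustration of Eq. (1), the ratio can be computed directly once a set of weights is given. The sketch below is only a worked instance; the function name `efficiency` and all numbers are hypothetical and do not come from the paper's dataset.

```python
def efficiency(outputs, inputs, u, v):
    """Weighted sum of outputs divided by weighted sum of inputs, as in Eq. (1)."""
    weighted_output = sum(ur * yr for ur, yr in zip(u, outputs))
    weighted_input = sum(vi * xi for vi, xi in zip(v, inputs))
    return weighted_output / weighted_input


# Hypothetical DMU with two outputs and one input, and illustrative weights.
score = efficiency(outputs=[4.0, 2.0], inputs=[5.0], u=[0.5, 1.0], v=[1.0])
```

In DEA the weights are not fixed in advance as they are here; each evaluated DMU chooses them through the linear programs that follow.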

With the assumption of CRS, Charnes et al. (1978) formulated the choice of these input and output weights (i.e., the $v_i$’s and $u_r$’s) as a linear program, called the CCR model, that allows each DMU (i.e., combination) $j$ to maximize its own measured efficiency relative to the other combinations. An optimal solution to the CCR model assigns the input and output weights that maximize the efficiency of the evaluated combination. This model is solved $n$ times, once for each combination of algorithmic operators, to obtain the relative efficiencies of all the combinations under consideration. To relax the strict CRS assumption, Banker et al. (1984) proposed a generalization of the CCR model, called the BCC model, which allows VRS on production frontiers. The BCC model is adapted in this work to measure the relative efficiency of each combination. The envelopment form of the (output-oriented) BCC model for the evaluated DMU $o$ is presented as follows (Banker et al., 1984).

(BCC)

$$
\begin{aligned}
\text{Maximize}\quad & \eta_o \\
\text{subject to}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, && i = 1, \dots, m, \\
& \eta_o\, y_{ro} \le \sum_{j=1}^{n} \lambda_j y_{rj}, && r = 1, \dots, s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \\
& \lambda_j \ge 0, && j = 1, \dots, n.
\end{aligned} \tag{2}
$$

The slack variables associated with the first and second constraints of the BCC model (2) are

$$s_i^{-} = x_{io} - \sum_{j=1}^{n} \lambda_j x_{ij} \ge 0, \quad i = 1, \dots, m,$$

and

$$s_r^{+} = \sum_{j=1}^{n} \lambda_j y_{rj} - \eta_o\, y_{ro} \ge 0, \quad r = 1, \dots, s,$$

respectively, where $s_i^{-}$ represent input excesses and $s_r^{+}$ output shortfalls. If the optimal objective value $\eta_o^{*} = 1$ and all the slack variables are zero, then the evaluated DMU is BCC-efficient (Cooper et al., 2007). Let $v_i$, $i = 1, \dots, m$, $u_r$, $r = 1, \dots, s$, and $v_o$ be the dual variables associated with the first, second, and third constraints of the BCC model (2), respectively. The dual problem (i.e., the multiplier form) of the BCC model is given as follows.

(BCC-D)

$$
\begin{aligned}
\text{Minimize}\quad & z = \sum_{i=1}^{m} v_i x_{io} - v_o \\
\text{subject to}\quad & \sum_{r=1}^{s} u_r y_{ro} = 1, \\
& \sum_{i=1}^{m} v_i x_{ij} - \sum_{r=1}^{s} u_r y_{rj} - v_o \ge 0, && j = 1, \dots, n,
\end{aligned} \tag{3}
$$

$v_i \ge 0$, $i = 1, \dots, m$, $u_r \ge 0$, $r = 1, \dots, s$, and $v_o$ free in sign.

2.2 Undesired Outputs in Efficiency Evaluation

Classical DEA models rely on the assumption that inputs have to be minimized and outputs have to be maximized. However, production processes may also generate undesirable outputs, such as smoke pollution or solid waste. In particular, motivated by environmental consciousness, ecological efficiency measurement that considers undesirable pollutants has recently attracted much interest (Allen, 1999). Undesirable outputs may also appear in other applications such as health care (e.g., complications of medical operations) and business (e.g., tax payments). In the context of evaluating algorithmic efficiency for solving a minimization problem, a combination of algorithmic operators (or a DMU) is considered to be efficient if the objective value (i.e., the output) obtained by that DMU is minimal (or smaller than the objective values obtained by other DMUs), so the output may be viewed as undesired. For instance, the PDVRPSTW considered by Lu and Yu (2012) aims to minimize the number of vehicles used, the total traveled distance, and the violations of time window constraints (i.e., earliness and tardiness), so these objectives are viewed as the undesired outputs of a combination of GA operators used to solve instances of the PDVRPSTW. Thus, applying DEA to evaluate algorithmic performance on minimization problems has to take undesirable outputs into account in the efficiency evaluation. Several approaches proposed in the literature for dealing with undesirable outputs could be employed to this end (Scheel, 2001). This study applies the following simple transformation to the (undesired) outputs:

$$y_{rj}^{+} = M_r - y_{rj}^{-}, \quad r = 1, \dots, s,\ j = 1, \dots, n, \tag{4}$$

where $y_{rj}^{-}$ and $y_{rj}^{+}$ are the values of output $r$ of DMU $j$ before and after the transformation, respectively, and $M_r$ denotes a large number that is greater than the maximum possible value of output $r$.

2.3 Robust BCC Model based on Ben-Tal and Nemirovski’s Approach

This section presents the robust BCC model based on the robust counterpart approach proposed by Ben-Tal and Nemirovski (1999) for LPs with uncertain coefficients. While both the inputs and the outputs of algorithms (i.e., DMUs) could be uncertain, our work is particularly concerned with the uncertainty in outputs due to the probabilistic operators incorporated in meta-heuristics. Let $\bar{y}_{rj}$ be the average of output $r$ over multiple runs of algorithm $j$ (or DMU $j$) and $\hat{y}_{rj}$ the maximum deviation from the average value. By adopting Ben-Tal and Nemirovski’s approach, the set of the uncertain outputs of DMU $j$ is described using an ellipsoid as follows:

$$Y(\theta) = \left\{ y_{rj},\ r = 1, \dots, s \ \middle|\ \sum_{r=1}^{s} \left( \frac{y_{rj} - \bar{y}_{rj}}{\hat{y}_{rj}} \right)^{2} \le \theta^{2} \right\}, \tag{5}$$

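where the parameter $\theta$ is a subjective value chosen by the decision-maker to reflect his/her attitude toward risk; the larger $\theta$ is, the more risk-averse he/she is. Specifically, for $\theta = 0$, the set shrinks to the singleton $Y(0) = \{\bar{y}_{rj},\ r = 1, \dots, s\}$; for $\theta = 1$, $Y(1)$ is the largest-volume ellipsoid contained in the box $B = \{y_{rj} \mid |y_{rj} - \bar{y}_{rj}| \le \hat{y}_{rj},\ r = 1, \dots, s\}$; for $\theta = \sqrt{s}$, the set $Y(\sqrt{s})$ is the smallest-volume ellipsoid containing the box $B$. Ben-Tal and Nemirovski (1999) showed that the robust counterpart of an uncertain LP with an ellipsoidal uncertainty set is computationally tractable, since it leads to a conic quadratic program, which can be solved in polynomial time.

The idea of this approach is to replace an uncertain LP with its robust counterpart. Consider the general form of an LP: Minimize $\{\sum_j c_j z_j \mid \sum_j a_{ij} z_j \ge 0,\ \forall i\}$, where $z = \{z_j\}$ denotes the decision variable vector, and $c = \{c_j\}$ and $A = \{a_{ij}\}$ are the coefficients. Without loss of generality, consider that $A$ is uncertain and belongs to the uncertainty set $U$. Then, the robust counterpart of the LP is: Minimize $\{\sum_j c_j z_j \mid \sum_j a_{ij} z_j \ge 0,\ \forall i,\ \forall A \in U\}$. Specifically, when $U$ is described using an ellipsoid, the robust counterpart of the LP can be derived as: Minimize $\{\sum_j c_j z_j \mid \sum_j \bar{a}_{ij} z_j - \theta \sqrt{\sum_j \hat{a}_{ij}^{2} z_j^{2}} \ge 0,\ \forall i\}$, where $\bar{a}_{ij}$ and $\hat{a}_{ij}$ denote the nominal value and the maximum deviation of $a_{ij}$, respectively. Accordingly, the robust BCC-BN model can be obtained as follows:

(BCC-BN)

$$
\begin{aligned}
\text{Minimize}\quad & z \\
\text{subject to}\quad & z - \sum_{i=1}^{m} v_i x_{io} + v_o \ge 0, \\
& \sum_{r=1}^{s} u_r \bar{y}_{ro} - \theta \sqrt{\sum_{r=1}^{s} u_r^{2}\, \hat{y}_{ro}^{2}} \ge 1, \\
& \sum_{i=1}^{m} v_i x_{ij} - \sum_{r=1}^{s} u_r \bar{y}_{rj} - v_o - \theta \sqrt{\sum_{r=1}^{s} u_r^{2}\, \hat{y}_{rj}^{2}} \ge 0, && j = 1, \dots, n,
\end{aligned} \tag{6}
$$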

$v_i \ge 0$, $i = 1, \dots, m$, $u_r \ge 0$, $r = 1, \dots, s$, and $v_o$ free in sign.

In BCC-BN, the objective value $z$ represents the efficiency of the evaluated DMU $o$. While efficient combinations attain an efficiency score of 1 in the (nominal) BCC model, for any non-zero value of $\theta$ the efficiency score obtained by the BCC-BN model is always less than 1. BCC-BN offers the flexibility of controlling the degree of solution conservatism through the safety parameter $\theta$.

2.4 Robust BCC Model based on Bertsimas and Sim’s Approach

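This section presents the RCO formulation of the BCC model based on the approach proposed by Bertsimas and Sim (2004) for linear and integer programs with uncertain data. Their approach introduces a protection function into a mathematical program whose uncertain coefficients are described using intervals. This protection function serves as a buffer against uncertain coefficients. Specifically, the robust counterpart of an LP with uncertain coefficients is given as: Minimize $\{\sum_j c_j z_j \mid \sum_j \bar{a}_{ij} z_j - \beta_i(z, \Gamma) \ge 0,\ \forall i\}$, where $\bar{a}_{ij}$ is the nominal value of coefficient $a_{ij}$, $\beta_i(z, \Gamma)$ denotes the protection function corresponding to constraint $i$, and the budget parameter $\Gamma$ is used to control the degree of uncertainty and solution conservatism. According to this approach, the set of the uncertain outputs of DMU $j$ is defined as the following symmetric and bounded set, where a prescribed interval $[\bar{y}_{rj} - \hat{y}_{rj},\ \bar{y}_{rj} + \hat{y}_{rj}]$ is used to describe the uncertain output $y_{rj}$.

$$Y(\Gamma) = \left\{ y_{rj},\ \forall r \ \middle|\ y_{rj} \in [\bar{y}_{rj} - \hat{y}_{rj},\ \bar{y}_{rj} + \hat{y}_{rj}],\ \forall r;\ \sum_{r=1}^{s} \frac{|y_{rj} - \bar{y}_{rj}|}{\hat{y}_{rj}} \le \Gamma \right\}. \tag{7}$$

The idea behind the parameter $\Gamma$ is that it is unlikely that all the uncertain coefficients take their worst-case values simultaneously, so it may be adequate to protect against up to $\lfloor \Gamma \rfloor$ of those coefficients taking their worst-case values and one coefficient, say $t$, changing by $(\Gamma - \lfloor \Gamma \rfloor)\, \hat{y}_{tj}$, where $\lfloor \Gamma \rfloor$ denotes the largest integer not exceeding $\Gamma$. In light of the set defined in Eq. (7), the protection function $\beta_j(u, v, \Gamma)$ corresponding to DMU $j$ is defined as follows.

$$\beta_j(u, v, \Gamma) = \max_{\{S_j \cup \{t_j\} \,\mid\, S_j \subseteq O,\ |S_j| = \lfloor \Gamma \rfloor,\ t_j \in O \setminus S_j\}} \left\{ \sum_{r \in S_j} u_r \hat{y}_{rj} + (\Gamma - \lfloor \Gamma \rfloor)\, u_{t_j} \hat{y}_{t_j j} \right\}, \tag{8}$$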

where $u$ and $v$ are the decision variable vectors of output and input weights, respectively, and $O$ is the index set of the outputs, with cardinality $|O| = s$. Then, the RCO formulation of the BCC model is given as follows:

(BCC-BS)

$$
\begin{aligned}
\text{Minimize}\quad & z \\
\text{subject to}\quad & z - \sum_{i=1}^{m} v_i x_{io} + v_o \ge 0, \\
& \sum_{r=1}^{s} u_r \bar{y}_{ro} - \beta_o(u, v, \Gamma) \ge 1, \\
& \sum_{i=1}^{m} v_i x_{ij} - \sum_{r=1}^{s} u_r \bar{y}_{rj} - \beta_j(u, v, \Gamma) - v_o \ge 0, && j = 1, \dots, n,
\end{aligned} \tag{9}
$$

$v_i \ge 0$, $i = 1, \dots, m$, $u_r \ge 0$, $r = 1, \dots, s$, and $v_o$ free in sign.

By varying $\Gamma \in [0, |O|]$, the decision-maker is able to trade off solution robustness against the level of conservatism of the solution. When $\Gamma = 0$, $\beta_j(u, v, \Gamma) = 0$ and the problem is equivalent to the nominal problem. On the other hand, if $\Gamma = s$, we obtain the most conservative version of the problem. Note that the above formulation is an NLP. The following theorem shows that this NLP can be reformulated as an LP, which is much easier to solve than the BCC-BS model.
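For fixed weights, the protection function of Eq. (8) can be evaluated by sorting: the $\lfloor \Gamma \rfloor$ largest weighted deviations are counted in full and the next largest is counted with the fractional part of $\Gamma$. The following is a minimal sketch under that reading; the function name `protection` and the numbers are hypothetical.

```python
import math


def protection(u, y_hat, gamma):
    """Protection function of Eq. (8) for one DMU and fixed output weights.

    u      -- non-negative output weights u_r
    y_hat  -- maximum deviations of the outputs of that DMU
    gamma  -- budget of uncertainty, 0 <= gamma <= len(u)
    """
    if not 0 <= gamma <= len(u):
        raise ValueError("gamma must lie in [0, number of outputs]")
    # Weighted deviations u_r * y_hat_r, largest first.
    terms = sorted((ur * yh for ur, yh in zip(u, y_hat)), reverse=True)
    whole = math.floor(gamma)
    value = sum(terms[:whole])                    # floor(gamma) largest terms in full
    if whole < len(terms):
        value += (gamma - whole) * terms[whole]   # next term, weighted by the fraction
    return value


# Hypothetical weights and deviations: the weighted deviations are 2.0, 1.0, 2.0.
u_demo = [0.5, 1.0, 0.25]
y_hat_demo = [4.0, 1.0, 8.0]
beta_half = protection(u_demo, y_hat_demo, 1.5)   # 2.0 + 0.5 * 2.0
```

Inside the optimization model the weights are decision variables, so this sorting step cannot be used directly; it is only a way to check the value of the protection function at a given solution.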

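Theorem 1: The BCC-BS model is equivalent to the following LP.

(BCC-BS-LP)

$$
\begin{aligned}
\text{Minimize}\quad & z \\
\text{subject to}\quad & z - \sum_{i=1}^{m} v_i x_{io} + v_o \ge 0, \\
& \sum_{r=1}^{s} u_r \bar{y}_{ro} - \Gamma \varpi_o - \sum_{r=1}^{s} \pi_{ro} \ge 1, \\
& \sum_{i=1}^{m} v_i x_{ij} - \sum_{r=1}^{s} u_r \bar{y}_{rj} - \Gamma \varpi_j - \sum_{r=1}^{s} \pi_{rj} - v_o \ge 0, && j = 1, \dots, n, \\
& \varpi_j + \pi_{rj} \ge u_r \hat{y}_{rj}, && r = 1, \dots, s,\ j = 1, \dots, n, \\
& \varpi_j \ge 0, && j = 1, \dots, n, \\
& \pi_{rj} \ge 0, && r = 1, \dots, s,\ j = 1, \dots, n, \\
& v_i \ge 0,\ i = 1, \dots, m,\quad u_r \ge 0,\ r = 1, \dots, s,\quad v_o \text{ free in sign}.
\end{aligned} \tag{10}
$$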

Proof of Theorem 1: Given decision vectors $u$ and $v$, the protection function $\beta_j(u, v, \Gamma)$ in Eq. (8) is equal to the optimal objective value of the following LP problem (LP1), because an optimal solution of LP1, $z^{*} = \{z_{rj}^{*},\ \forall r\}$, comprises $\lfloor \Gamma \rfloor$ variables equal to 1 and one variable equal to $\Gamma - \lfloor \Gamma \rfloor$.

(LP1)

$$
\begin{aligned}
\text{Maximize}\quad & \sum_{r=1}^{s} u_r \hat{y}_{rj}\, z_{rj} \\
\text{subject to}\quad & \sum_{r=1}^{s} z_{rj} \le \Gamma, \\
& 0 \le z_{rj} \le 1, && r = 1, \dots, s.
\end{aligned} \tag{11}
$$

This is equivalent to selecting a subset $\{S_j \cup \{t_j\} \mid S_j \subseteq O,\ |S_j| = \lfloor \Gamma \rfloor,\ t_j \in O \setminus S_j\}$ with the corresponding objective value $\sum_{r \in S_j} u_r \hat{y}_{rj} + (\Gamma - \lfloor \Gamma \rfloor)\, u_{t_j} \hat{y}_{t_j j}$. Let $\varpi_j$ and $\pi_{rj}$ be the dual variables associated with the first and second constraints of the LP1 model (11), respectively. The dual problem of LP1 is presented in the following.

(DLP1)

$$
\begin{aligned}
\text{Minimize}\quad & \Gamma \varpi_j + \sum_{r=1}^{s} \pi_{rj} \\
\text{subject to}\quad & \varpi_j + \pi_{rj} \ge u_r \hat{y}_{rj}, && r = 1, \dots, s, \\
& \varpi_j \ge 0,\quad \pi_{rj} \ge 0, && r = 1, \dots, s.
\end{aligned} \tag{12}
$$

Since Problem LP1 is feasible and bounded for $\Gamma \in [0, |O|]$, by strong duality, Problem DLP1 is also feasible and bounded, and their optimal objective values coincide. The protection function $\beta_j(u, v, \Gamma)$ in Eq. (8) is hence equal to the optimal objective value of Problem DLP1. By substituting Problem DLP1 into Problem BCC-BS, we obtain that Problem BCC-BS is equivalent to Problem BCC-BS-LP. This completes the proof.

The BCC-BS-LP reformulation retains linearity. Moreover, the budget parameter controls the number of coefficients that can simultaneously take their largest variations, allowing a tradeoff between robustness and optimality. While the same value of $\Gamma$ is applied to all DMUs in the above BCC-BS and BCC-BS-LP formulations, the decision-maker may attach different values of $\Gamma$ to the DMUs, reflecting his/her own knowledge of the DMUs and attitude toward risk.
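To make the BCC-BS-LP model (10) concrete, the sketch below builds it for one evaluated DMU and solves it with `scipy.optimize.linprog`. It is not the paper's GAMS/MINOS implementation. It assumes that NumPy and SciPy are installed, that the normalization and DMU constraints are read as "greater than or equal to" constraints, and that $z$ is eliminated by minimizing $\sum_i v_i x_{io} - v_o$ directly. The function name `bcc_bs_lp`, the variable layout, and the toy data are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def bcc_bs_lp(x, y_bar, y_hat, o, gamma):
    """Solve a BCC-BS-LP style model for DMU `o` and return the optimal z.

    x      -- inputs, shape (m, n)
    y_bar  -- nominal (average) outputs, shape (s, n)
    y_hat  -- maximum output deviations, shape (s, n)
    gamma  -- budget of uncertainty, 0 <= gamma <= s
    """
    x, y_bar, y_hat = (np.asarray(a, dtype=float) for a in (x, y_bar, y_hat))
    m, n = x.shape
    s = y_bar.shape[0]
    # Variable layout: v (m), u (s), v_o (1), varpi (n), pi (s*n, index r*n + j).
    iv, iu, ivo, iw, ip = 0, m, m + s, m + s + 1, m + s + 1 + n
    nvar = ip + s * n

    c = np.zeros(nvar)
    c[iv:iv + m] = x[:, o]          # objective: sum_i v_i x_io - v_o
    c[ivo] = -1.0

    rows, rhs = [], []

    def add_ge(row, b):             # store "row . vars >= b" as "-row . vars <= -b"
        rows.append(-row)
        rhs.append(-b)

    # sum_r u_r ybar_ro - gamma * varpi_o - sum_r pi_ro >= 1
    row = np.zeros(nvar)
    row[iu:iu + s] = y_bar[:, o]
    row[iw + o] = -gamma
    row[[ip + r * n + o for r in range(s)]] = -1.0
    add_ge(row, 1.0)

    for j in range(n):
        # sum_i v_i x_ij - sum_r u_r ybar_rj - gamma*varpi_j - sum_r pi_rj - v_o >= 0
        row = np.zeros(nvar)
        row[iv:iv + m] = x[:, j]
        row[iu:iu + s] = -y_bar[:, j]
        row[ivo] = -1.0
        row[iw + j] = -gamma
        row[[ip + r * n + j for r in range(s)]] = -1.0
        add_ge(row, 0.0)
        for r in range(s):
            # varpi_j + pi_rj >= u_r * yhat_rj
            row = np.zeros(nvar)
            row[iw + j] = 1.0
            row[ip + r * n + j] = 1.0
            row[iu + r] = -y_hat[r, j]
            add_ge(row, 0.0)

    bounds = [(0, None)] * nvar
    bounds[ivo] = (None, None)      # v_o is free in sign
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


# Toy data: two DMUs, one input (both 1.0), one output.
x_demo = [[1.0, 1.0]]
y_bar_demo = [[2.0, 4.0]]
y_hat_demo = [[0.2, 0.4]]
z_nominal = bcc_bs_lp(x_demo, y_bar_demo, y_hat_demo, o=0, gamma=0.0)
z_robust = bcc_bs_lp(x_demo, y_bar_demo, y_hat_demo, o=0, gamma=1.0)
```

With $\Gamma = 0$ the sketch reduces to the nominal multiplier form, in which a nominally efficient DMU attains $z = 1$ and larger values of $z$ indicate lower efficiency; increasing $\Gamma$ can only increase $z$ under this reading.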

3. Numerical Examples

3.1 Evaluation of GA configurations

This sub-section presents a numerical example of applying the robust BCC models to evaluate the relative efficiencies of a set of different combinations of GA operators for solving the PDVRPSTW. To avoid repetition, the reader is referred to Lu and Yu (2012) for the descriptions of the problem and the GA operators. There are three selection operators, four crossover operators and four mutation operators, resulting in 48 combinations of the GA operators. Each combination (or DMU) is named by the acronyms of its three constituent operators. For example, RWS-1PX-M1 denotes the combination of roulette wheel selection, one-point crossover, and mutation 1. Moreover, the test dataset and results from Lu and Yu (2012) are used in this work. There are three instances of the problem: R (randomly distributed customers), C (clustered customers), and RC (a mix of random and clustered customers), differentiated by geographic data. Each instance has 100 customers; one half of them are randomly designated as pickup requests and the other half as delivery requests. The three DEA models presented in Section 2, namely BCC, BCC-BN, and BCC-BS, are solved using the MINOS solver in GAMS for the three instances.

The input of a DMU is the computation (CPU) time (in seconds) required to solve a problem instance, and the outputs are the number of vehicles used, the total traveled distance, the total early time (earliness), and the total late time (tardiness) obtained from solving a problem instance. In the numerical experiment, each of the 48 combinations is tested 30 times (using different random seeds) on each problem instance. For each combination on each problem instance, the averages and standard deviations of the selected input and output items over the 30 runs provide the data required by the three DEA models. Specifically, for each problem instance, the average input and output values over 30 runs of each combination $j$ are used as the nominal data: $\bar{y}_{rj}$, $\forall r$, and $x_{ij}$, $\forall i$. The maximum deviation of each output $r$ of combination $j$ is determined as $\hat{y}_{rj} = 2\,SD_{rj}$, $\forall r, j$, where $SD_{rj}$ is the standard deviation of output $r$ of combination $j$ over 30 runs. The maximum deviations obtained in this way cover more than 98% of all output values from all combinations on all problem instances.
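The data preparation described above (transformation of an undesired output by Eq. (4), averaging over runs, and a maximum deviation of two standard deviations) can be sketched as follows. The sketch assumes the sample standard deviation, which the text does not specify; the function name `nominal_and_deviation`, the constant `big_m`, and the run values are hypothetical.

```python
from statistics import mean, stdev


def nominal_and_deviation(runs, big_m):
    """Turn the raw values of one undesired output into (y_bar, y_hat).

    runs   -- values of the output over repeated runs of one combination
    big_m  -- constant M_r exceeding every possible value of that output
    """
    transformed = [big_m - value for value in runs]   # Eq. (4)
    y_bar = mean(transformed)                         # nominal output
    y_hat = 2 * stdev(transformed)                    # maximum deviation = 2 SD (sample SD assumed)
    return y_bar, y_hat


# Hypothetical total distances from four runs (the paper uses 30 runs).
distances = [1010.0, 990.0, 1005.0, 995.0]
y_bar, y_hat = nominal_and_deviation(distances, big_m=2000.0)
```

Because the transformation in Eq. (4) only shifts and reflects the values, the standard deviation is the same before and after the transformation.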

The efficiency scores and rankings of the 48 GA combinations on the three instances C, R, and RC are presented in Tables 1, 2, and 3, respectively. In the three tables, the first (leftmost) column lists the 48 combinations. The second and third columns show the efficiency scores and ranking obtained by the BCC model, respectively, followed by the results from the BCC-BN model in columns four to seven. In the BCC-BN model, the parameter θ is set to 1 and 2, corresponding to the largest-volume ellipsoid contained in the box B and the smallest-volume ellipsoid containing the box B, respectively. Columns eight to eleven display the results from the BCC-BS model, where the parameter Γ is set to 1 and 4, corresponding to two different levels of conservatism.

As shown in the tables, the BCC model identifies multiple efficient combinations (i.e., with an efficiency score equal to 1) for each of the three instances, so we are not able to rank all of the combinations exactly. All of the efficient combinations include tournament selection (TS) as the selection operator, while the roulette wheel selection (RWS) and uniform selection (US) operators appear only in inefficient combinations. As for the crossover operators, one-point crossover (1PX) is found in the majority of the efficient combinations, though some efficient combinations adopt the cycle crossover (CX) operator. None of the efficient combinations adopts the linear-order crossover (LOX) or partially-mapped crossover (PMX) operators. Furthermore, the four mutation operators do not produce significant differences in algorithmic efficiency because, as shown in the tables, all four mutation operators are used in both efficient and inefficient combinations.

Compared to the BCC model, in which several combinations attain an efficiency score of 1.0 (i.e., multiple efficient combinations) for each test instance, both the BCC-BN and BCC-BS models identify exactly one and the same efficient combination for each test instance.
Specifically, the best (and robust) combinations for instances C, R, and RC are TS-1PX-M4, TS-CX-M4, and TS-1PX-M4, respectively. The results indicate that our attention may be restricted to a smaller number of efficient combinations when the BCC-BN or BCC-BS model is used to evaluate algorithmic efficiencies. The reason is that not only the average but also the variation (or uncertainty) of the DMUs’ output values is explicitly considered in both the BCC-BN and BCC-BS models, whereas the BCC model takes only the average output values into account. Thus, the robust BCC approach can facilitate ranking the DMUs exactly if only one efficient combination is identified by the robust models. Moreover, the efficiency scores decrease as the level of protection or conservatism increases (i.e., as the parameters θ and Γ get larger), due to the protection against uncertain output values in the two models. The ranking of the 48 combinations does not change for different values of the parameters, because the same values of θ and Γ are applied to all 48 combinations in this experiment.

Table 4 presents the comparison of the best (robust), average, and worst combinations for the three instances in terms of average input and output values. The comparison in terms of the standard deviation of input and output values is reported in Table 5. In these two tables, for each instance, the input and output values of the average combination are obtained by averaging the input and output values over the 48 combinations. It can be observed that the best combinations produce significantly lower (or better) averages and standard deviations of input and output values than the average and worst combinations. This observation reveals that the best combinations not only outperform the average and worst combinations but also are indeed more robust.

A set of statistical tests is conducted to compare the robust DEA evaluation results with the results of the empirical analysis method typically adopted in the literature. Based on the ranking results presented in Tables 1, 2, and 3, the aim is to examine whether or not the vector of the input and outputs of a DMU with a higher rank is significantly better than that of a DMU with a lower rank. Hotelling’s two-sample T-square statistic (Hotelling, 1931), which is the multivariate counterpart of Student’s t-test, is employed in these statistical tests. The results are shown in Table 6. For each test instance, the tests are conducted by following the ranks determined by the robust DEA method. Specifically, the best DMU is compared with the second best DMU, the second best DMU is then compared with the third best DMU, and so on.
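For reference, the two-sample Hotelling statistic and its F transformation can be sketched in plain Python as follows. This is a generic textbook form with a pooled covariance matrix, not the paper's own code; the function name `hotelling_two_sample` and the data are hypothetical, and the pooled covariance matrix is assumed to be non-singular. The critical value itself is not computed here, since it requires an F-distribution quantile from a statistics library.

```python
def hotelling_two_sample(a, b):
    """Return (T2, F, df1, df2) for two samples given as lists of equal-length rows."""
    n1, n2, p = len(a), len(b), len(a[0])

    def mean_vec(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def scatter(rows, mu):
        # Sum of outer products of deviations from the mean.
        s = [[0.0] * p for _ in range(p)]
        for row in rows:
            d = [value - m for value, m in zip(row, mu)]
            for i in range(p):
                for k in range(p):
                    s[i][k] += d[i] * d[k]
        return s

    ma, mb = mean_vec(a), mean_vec(b)
    sa, sb = scatter(a, ma), scatter(b, mb)
    pooled = [[(sa[i][k] + sb[i][k]) / (n1 + n2 - 2) for k in range(p)]
              for i in range(p)]
    diff = [xa - xb for xa, xb in zip(ma, mb)]

    # Solve pooled * w = diff by Gaussian elimination with partial pivoting.
    aug = [row[:] + [d] for row, d in zip(pooled, diff)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, p + 1):
                aug[r][k] -= factor * aug[col][k]
    w = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][k] * w[k] for k in range(i + 1, p))
        w[i] = (aug[i][p] - tail) / aug[i][i]

    t2 = n1 * n2 / (n1 + n2) * sum(d * wi for d, wi in zip(diff, w))
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = df2 / (df1 * (n1 + n2 - 2)) * t2
    return t2, f_stat, df1, df2


# Hypothetical two-dimensional samples with four observations each.
sample_a = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
sample_b = [[1.0, 0.0], [3.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
t2_demo, f_demo, df1_demo, df2_demo = hotelling_two_sample(sample_a, sample_b)
```

With five variables (one input and four outputs) and 30 runs per DMU, the same formula gives degrees of freedom 5 and 30 + 30 - 5 - 1 = 54.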
It can be observed that all of the F-values obtained in the tests are larger than the threshold value, F(5, 54) = 2.425, at the significance level α = 0.05. Equivalently, in all of the pair-wise comparisons, the higher-ranked DMU is significantly better than the lower-ranked DMU. This indicates that the ranks obtained by the robust DEA method are consistent with the empirical analysis results based on statistical tests.

3.2 Evaluation of SA parameter settings

This sub-section presents a numerical example of applying the robust BCC models to evaluate the relative efficiencies of a set of different parameter settings of an SA heuristic for solving the vehicle routing problem with time windows (VRPTW). The reader is referred to Lin et al. (2011) for the description of the SA heuristic. In the example, the three major parameters under consideration are the temperature reduction ratio (β), the maximum number of iterations between two different temperatures (Itemax), and the maximum number of temperature reductions (Nummax). In light of the numerical results of Lin et al. (2011), the following values are considered for the three parameters, resulting in 36 different parameter settings for the SA heuristic.

β = 0.955, 0.965, 0.975
Itemax = 50000, 70000, 90000, 120000, 150000, 200000
Nummax = 20, 30

Each parameter setting is considered as a DMU and named by the notation of the three parameters. For example, β1-I1-N1 denotes the setting in which β = 0.955, Itemax = 50000 and Nummax = 20. The numerical results of the three instances of VRPTW (R101, C101, and RC101) from Lin et al. (2011) are used in this example. The input of a DMU is the computation (CPU) time (in seconds) required to solve an instance, and the outputs are the number of vehicles used, the total traveled distance, the total early time (earliness), and the total late time (tardiness) obtained from solving a problem instance. For each instance, the average input and output values over 30 runs of each setting j are used as the nominal data for every output r and every input i. The maximum deviation of each output r of setting j is set to 2SDrj, ∀r, j, where SDrj is the standard deviation of output r of setting j over 30 runs.

The efficiency scores and ranking of the 36 SA parameter settings on solving the three instances C101, R101, and RC101 are presented in Tables 7, 8, and 9, respectively. The arrangement of these three tables is the same as that of Tables 1, 2, and 3. As shown in the tables, the BCC model identifies multiple efficient settings (i.e., with efficiency score equal to 1) for each of the three instances, so it is not possible to rank all of the settings exactly. All of the efficient settings have β = 0.965 and Itemax = 150000 or 200000. For Nummax, some efficient settings have Nummax = 20 and the others have Nummax = 30. In general, the settings with β = 0.965 outperform those with β = 0.955 and β = 0.975. Both the BCC-BN and BCC-BS models identify exactly one and the same efficient setting for each test instance. The best (and robust) settings for instances C101, R101, and RC101 are (β2-I5-N2), (β2-I6-N2), and (β2-I5-N2), respectively.
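The construction of the 36 DMUs and of the nominal and maximum-deviation data can be sketched as follows. This is only an illustration: the helper names `enumerate_settings` and `nominal_and_deviation` are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
from itertools import product
from statistics import mean, stdev

# Parameter levels as listed in the text.
BETA = [0.955, 0.965, 0.975]
ITE_MAX = [50000, 70000, 90000, 120000, 150000, 200000]
NUM_MAX = [20, 30]


def enumerate_settings():
    """Map DMU names such as 'β1-I1-N1' to (β, Itemax, Nummax) tuples."""
    settings = {}
    for (b, beta), (i, ite), (n, num) in product(
        enumerate(BETA, 1), enumerate(ITE_MAX, 1), enumerate(NUM_MAX, 1)
    ):
        settings[f"β{b}-I{i}-N{n}"] = (beta, ite, num)
    return settings


def nominal_and_deviation(runs):
    """Return (mean, 2 * standard deviation) of one output over repeated runs.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return mean(runs), 2 * stdev(runs)
```

For instance, `enumerate_settings()["β2-I5-N2"]` corresponds to β = 0.965, Itemax = 150000 and Nummax = 30, and `nominal_and_deviation` would be applied to the 30 recorded values of each output of each setting.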
The evaluation result is consistent with the best parameter setting adopted by Lin et al. (2011), where β = 0.965 (β2), Itemax = 150000 (I5), and Nummax = 30 (N2). Although this research adopts a different method (i.e., the robust BCC models) to identify the efficient parameter setting, the result is the same as the best parameter setting determined by Lin et al. (2011) based on empirical analysis (i.e., paired-t tests). Note that, as in the example presented in Section 3.1, the efficiency scores of the robust models decrease as the level of protection or conservatism increases (i.e., as parameters θ and Γ get larger), and the ranking of the 36 settings does not change for different values of the parameters θ and Γ. Additionally, the proposed method is able to rank all of the parameter settings and provides more information for determining robust efficient parameter settings, since both the BCC-BN and BCC-BS models explicitly consider not only the average but also the variation (or uncertainty) of the DMUs' output values.

Based on the ranking results presented in Tables 7, 8, and 9 for the instances C101, R101, and RC101, respectively, the two-sample Hotelling's T-square test is conducted to examine whether or not the vector of the input and outputs of a DMU with a higher rank is significantly better than that of another DMU with a lower rank. The results are shown in Table 10. It can be observed that all of the F-values obtained in the tests are larger than the threshold value, F(5, 54) = 2.425, at the significance level α = 0.05. Thus, in all of the pair-wise comparisons, the higher-ranked DMU is significantly better than the lower-ranked DMU, indicating that the ranks obtained by the robust models are consistent with the results based on statistical tests.
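The difference between the tied BCC ranks and the strict ranks of the robust models can be illustrated with a small ranking sketch. The scores below are copied from Table 7 (instance C101, BCC and BCC-BN with θ = 1) for the five top-ranked settings only; the function name `dense_rank` is hypothetical, and the tie-handling rule (tied scores share a rank and the next distinct score receives the next integer) is inferred from the rank columns of the tables rather than stated in the text.

```python
def dense_rank(scores):
    """Rank DMUs by descending score; tied scores share a rank and no rank is skipped."""
    distinct = sorted(set(scores.values()), reverse=True)
    position = {score: k for k, score in enumerate(distinct, 1)}
    return {dmu: position[score] for dmu, score in scores.items()}


# Scores of the five top-ranked settings for instance C101 (Table 7).
bcc = {"β2-I4-N2": 0.99643, "β2-I5-N1": 1.00000, "β2-I5-N2": 1.00000,
       "β2-I6-N1": 0.99846, "β2-I6-N2": 1.00000}
bcc_bn = {"β2-I4-N2": 0.99307, "β2-I5-N1": 0.99694, "β2-I5-N2": 1.00000,
          "β2-I6-N1": 0.99565, "β2-I6-N2": 0.99639}

bcc_ranks = dense_rank(bcc)        # three settings tie at rank 1
robust_ranks = dense_rank(bcc_bn)  # a single setting holds rank 1
```

Under the BCC scores three settings share rank 1, whereas under the BCC-BN scores only β2-I5-N2 is ranked first, which is the behavior the text attributes to the robust models.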

4. Concluding Remarks

With a particular focus on evaluating the relative efficiency of meta-heuristics and enhancing the decision-making for determining efficient combinations of algorithmic operators, this work proposed two robust BCC models based on the RCO approaches (Ben-Tal and Nemirovski, 1999; Bertsimas and Sim, 2004). The models take into account not only the average but also the variation of an algorithm's output values that results from the probabilistic operators used in various meta-heuristics. The three DEA models, namely BCC, BCC-BN, and BCC-BS, were then applied to evaluate a number of different combinations of GA operators for solving the PDVRPSTW and a set of parameter settings of an SA heuristic for solving the VRPTW. The numerical results demonstrated that the proposed approaches are well suited for evaluating the relative efficiencies of meta-heuristics. Moreover, the relative efficiencies obtained by the robust BCC models facilitate an exact ranking of the configurations or algorithms under consideration.

While this paper presented a successful application of the proposed robust DEA models to evaluate GA operator combinations and SA parameter settings, the robust DEA models can be applied to evaluate the relative efficiencies of a wide spectrum of meta-heuristics (e.g., Ant Colony Systems and Tabu search) for solving combinatorial optimization problems. In particular, the proposed robust BCC models are suitable for evaluating the relative efficiency of meta-heuristics that incorporate probabilistic operators to diversify the search process. The robust BCC models provide an alternative way of identifying the best set of parameter values. Thus, we believe that DEA can be employed to enhance the decision-making for identifying efficient combinations of algorithmic operators and/or parameter values when developing meta-heuristics for solving optimization problems.

References

Allen, K., 1999. DEA in the ecological context – an overview. In: Westermann, G. (Ed.), Data Envelopment Analysis in the Service Sector, pp. 203–235, Gabler, Wiesbaden.
Amin, G. R., Toloo, M., 2007. Finding the most efficient DMUs in DEA: An improved integrated model. Computers & Industrial Engineering 52(1), 71–77.
Andersen, P., Petersen, N. C., 1993. A procedure for ranking efficient units in data envelopment analysis. Management Science 39(10), 1261–1264.
Banker, R. D., Charnes, A., Cooper, W. W., 1984. Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science 30(9), 1078–1092.
Bartz-Beielstein, T., 2006. Experimental research in evolutionary computation - the new experimentalism. Natural Computing Series, Springer-Verlag, Heidelberg, Berlin.
Ben-Tal, A., Nemirovski, A., 1999. Robust solutions to uncertain linear programs. Operations Research Letters 25, 1–13.
Bertsimas, D., Sim, M., 2004. The price of robustness. Operations Research 52, 35–53.
Birattari, M., 2009. Tuning metaheuristics - A machine learning perspective. Springer, New York.
Bräysy, O., Gendreau, M., 2005a. Vehicle routing problem with time windows part I: route construction and local search algorithms. Transportation Science 39(1), 104–118.
Bräysy, O., Gendreau, M., 2005b. Vehicle routing problem with time windows part II: metaheuristics. Transportation Science 39(1), 119–139.
Charnes, A., Cooper, W. W., Rhodes, E., 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2, 429–444.
Cooper, W. W., Park, K. S., Yu, G., 1999. IDEA and AR-IDEA: Models for dealing with imprecise data in DEA. Management Science 45(4), 597–607.
Cooper, W. W., Seiford, L. M., Tone, K., 2007. Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software (2nd edition). Springer, New York.
Coy, S., 2000. Using experimental design to find effective parameter settings for heuristics. Journal of Heuristics 7(1), 77–97.
Despotis, D. K., Smirlis, Y. G., 2002. Data envelopment analysis with imprecise data. European Journal of Operational Research 140(1), 24–36.
Doyle, J., Green, R., 1994. Efficiency and cross-efficiency in DEA: Derivations, meanings and uses. Journal of the Operational Research Society 45(5), 567–578.
Entani, T., Maeda, Y., Tanaka, H., 2002. Dual models of interval DEA and its extension to interval data. European Journal of Operational Research 136(1), 32–45.
Francois, O., Lavergne, C., 2001. Design of Evolutionary Algorithms – A Statistical Perspective. IEEE Transactions on Evolutionary Computation 5(2), 129–148.
Guo, P., Tanaka, H., 2001. Fuzzy DEA: A perceptual evaluation method. Fuzzy Sets and Systems 119(1), 149–160.
Hotelling, H., 1931. The generalization of Student's ratio. Annals of Mathematical Statistics 2(3), 360–378.
Kao, C., Liu, S. T., 2000. Fuzzy efficiency measures in data envelopment analysis. Fuzzy Sets and Systems 113(3), 427–437.
Kao, C., 2006. Interval efficiency measures in data envelopment analysis with imprecise data. European Journal of Operational Research 174(2), 1087–1099.
Lin, S.-W., Yu, Vincent F., Lu, C.-C., 2011. A simulated annealing heuristic for the truck and trailer routing problem with time windows. Expert Systems with Applications 38(12), 15244–15252.
Lin, S.-W., Lee, Z.-J., Lu, C.-C., Ying, K.-C., 2011. Minimization of maximum lateness on parallel machines with sequence-dependent setup times and job release dates. Computers & Operations Research 38(5), 809–815.
Liu, S. T., 2008. A fuzzy DEA/AR approach to the selection of flexible manufacturing system. Computers & Industrial Engineering 54(1), 66–76.
Lu, C.-C., Yu, V. F., 2012. Data envelopment analysis for evaluating the efficiency of genetic algorithms on solving the vehicle routing problem with soft time windows. Computers & Industrial Engineering 63(2), 520–529.
Lu, C.-C., Wu, Y.-C., 2014. Evaluation of heuristics using Data Envelopment Analysis. International Journal of Information Technology & Decision Making http://dx.doi.org/10.1142/S0219622014500606.
Maron, O., Moore, A. W., 1994. Hoeffding races: Accelerating model selection search for classification and function approximation. In: Cowan, J. D., Tesauro, G., Alspector, J. (Eds.), Advances in Neural Information Processing Systems, Vol. 6 (pp. 59–66). Morgan Kaufmann, San Francisco.
Sadjadi, S. J., Omrani, H., Abdollahzadeh, S., Alinaghian, M., Mohammadi, H., 2011. A robust super-efficiency data envelopment analysis model for ranking of provincial gas companies in Iran. Expert Systems with Applications 38(9), 10875–10881.
Sadjadi, S. J., Omrani, H., 2008. Data envelopment analysis with uncertain data: An application for Iranian electricity distribution companies. Energy Policy 36(11), 4247–4254.
Sengupta, J. K., 1992. A fuzzy systems approach in data envelopment analysis. Computers and Mathematics with Applications 24(8-9), 259–266.
Shokouhi, A. H., Hatami-Marbini, A., Tavana, M., Saati, S., 2010. A robust optimization approach for imprecise data envelopment analysis. Computers & Industrial Engineering 59(3), 387–397.
Soleimani-damaneh, M., Jahanshahloo, G. R., Abbasbandy, S., 2006. Computational and theoretical pitfalls in some current performance measurement techniques and a new approach. Applied Mathematics and Computation 181(2), 1199–1207.
Rardin, R. L., Uzsoy, R., 2001. Experimental evaluation of heuristic algorithms: A tutorial. Journal of Heuristics 7(3), 261–304.
Ruiz, R., Maroto, C., Alcaraz, J., 2006. Two new robust genetic algorithms for the flowshop scheduling problem. Omega, the international journal of management science 34(5), 461–476.
Scheel, H., 2001. Undesirable outputs in efficiency valuations. European Journal of Operational Research 132, 400–410.
Toloo, M., Nalchigar, S., 2009. A new integrated DEA model for finding most BCC efficient DMU. Applied Mathematical Modelling 33, 597–604.
Toloo, M., Nalchigar, S., 2011. A new DEA method for supplier selection in presence of both cardinal and ordinal data. Expert Systems with Applications 38(12), 14726–14731.
Toloo, M., 2012. On finding the most BCC-efficient DMU: A new integrated MIP-DEA model. Applied Mathematical Modelling 36, 5515–5520.
Toloo, M., 2013. The most efficient unit without explicit inputs: An extended MILP-DEA model. Measurement 46, 3628–3634.
Toloo, M., Ertay, T., 2014. The most cost efficient automotive vendor with price uncertainty: A new DEA approach. Measurement 52, 135–144.
Toloo, M., 2014(a). An epsilon-free approach for finding the most efficient unit in DEA. Applied Mathematical Modelling 38(13), 3182–3192.
Toloo, M., 2014(b). The role of non-Archimedean epsilon in finding the most efficient unit with an application of professional tennis players. Applied Mathematical Modelling http://dx.doi.org/10.1016/j.apm.2014.04.010.
Ying, K.-C., Lin, S.-W., Lu, C.-C., 2011. Cell formation using a simulated annealing algorithm with variable neighborhood. European Journal of Industrial Engineering 5(1), 22–42.


Table 1 Efficiency scores and ranking – PDVRPSTW Instance C BCC DMUs RWS-1PX-M1 RWS-1PX-M2 RWS-1PX-M3 RWS-1PX-M4 RWS-CX-M1 RWS-CX-M2 RWS-CX-M3 RWS-CX-M4 RWS-LOX-M1 RWS-LOX-M2 RWS-LOX-M3 RWS-LOX-M4 RWS-PMX-M1 RWS-PMX-M2 RWS-PMX-M3 RWS-PMX-M4 TS-1PX-M1 TS-1PX-M2 TS-1PX-M3 TS-1PX-M4 TS-CX-M1 TS-CX-M2 TS-CX-M3 TS-CX-M4 TS-LOX-M1 TS-LOX-M2 TS-LOX-M3 TS-LOX-M4 TS-PMX-M1 TS-PMX-M2 TS-PMX-M3 TS-PMX-M4 US-1PX-M1 US-1PX-M2 US-1PX-M3 US-1PX-M4 US-CX-M1 US-CX-M2 US-CX-M3 US-CX-M4 US-LOX-M1 US-LOX-M2 US-LOX-M3 US-LOX-M4 US-PMX-M1 US-PMX-M2 US-PMX-M3 US-PMX-M4

BCC-BN

θ=1 Score 0.89452 0.88810 0.86599 0.89488 0.81493 0.81154 0.81563 0.82045 0.83653 0.83549 0.82613 0.83928 0.85289 0.84233 0.84074 0.86075 1.00000 0.98911 0.98919 1.00000 1.00000 0.99319 0.99005 0.99085 0.92233 0.91009 0.92237 0.92339 0.91898 0.91294 0.92207 0.93065 0.87568 0.87841 0.85535 0.88391 0.80117 0.79892 0.80931 0.80961 0.83115 0.83031 0.81775 0.83524 0.84737 0.84154 0.83500 0.85623

Rank 16 17 21 15 41 42 40 38 31 32 37 30 25 27 29 22 1 6 5 1 1 2 4 3 10 14 9 8 12 13 11 7 20 19 24 18 45 46 44 43 35 36 39 33 26 28 34 23

Score 0.88694 0.88155 0.86197 0.88936 0.81154 0.80773 0.81488 0.81992 0.83592 0.82935 0.82196 0.83738 0.85152 0.84185 0.84020 0.86025 0.99275 0.97888 0.98882 0.99949 0.98492 0.97329 0.98570 0.98820 0.91608 0.90768 0.92106 0.91727 0.91560 0.90017 0.92158 0.91899 0.87073 0.87194 0.85244 0.88023 0.80032 0.79658 0.80865 0.80910 0.82728 0.82492 0.81630 0.83163 0.84686 0.83893 0.83448 0.85573

θ=2 Rank 18 19 23 17 43 46 42 40 33 36 39 32 27 29 30 24 2 7 3 1 6 8 5 4 13 15 10 12 14 16 9 11 22 21 26 20 47 48 45 44 37 38 41 35 28 31 34 25

Score 0.88657 0.88111 0.86156 0.88898 0.81102 0.80719 0.81435 0.81940 0.83548 0.82891 0.82154 0.83694 0.85099 0.84137 0.83967 0.85975 0.99220 0.97831 0.98844 0.99899 0.98430 0.97276 0.98520 0.98760 0.91559 0.90722 0.92066 0.91681 0.91516 0.89963 0.92110 0.91846 0.87036 0.87154 0.85201 0.87979 0.79981 0.79604 0.80800 0.80858 0.82686 0.82450 0.81590 0.83118 0.84636 0.83844 0.83396 0.85522


Rank 18 19 23 17 43 46 42 40 33 36 39 32 27 29 30 24 2 7 3 1 6 8 5 4 13 15 10 12 14 16 9 11 22 21 26 20 47 48 45 44 37 38 41 35 28 31 34 25

BCC-BS Γ= 1 Γ= 4 Score Rank Score Rank 0.88698 18 0.88694 18 0.88158 19 0.88155 19 0.86199 23 0.86197 23 0.88938 17 0.88936 17 0.81156 43 0.81154 43 0.80775 46 0.80773 46 0.81488 42 0.81488 42 0.81992 40 0.81992 40 0.83592 33 0.83592 33 0.82938 36 0.82934 36 0.82198 39 0.82196 39 0.83739 32 0.83738 32 0.85153 27 0.85152 27 0.84185 29 0.84185 29 0.84020 30 0.84020 30 0.86025 24 0.86025 24 0.99279 2 0.99275 2 0.97892 7 0.97888 7 0.98882 3 0.98882 3 0.99950 1 0.99949 1 0.98498 6 0.98491 6 0.97336 8 0.97329 8 0.98572 5 0.98570 5 0.98821 4 0.98820 4 0.91611 13 0.91608 13 0.90769 15 0.90768 15 0.92106 10 0.92106 10 0.91730 12 0.91727 12 0.91562 14 0.91560 14 0.90020 16 0.90017 16 0.92159 9 0.92158 9 0.91902 11 0.91899 11 0.87075 22 0.87073 22 0.87197 21 0.87194 21 0.85246 26 0.85244 26 0.88025 20 0.88023 20 0.80032 47 0.80032 47 0.79659 48 0.79658 48 0.80865 45 0.80865 45 0.80910 44 0.80910 44 0.82730 37 0.82728 37 0.82495 38 0.82492 38 0.81631 41 0.81630 41 0.83165 35 0.83163 35 0.84686 28 0.84686 28 0.83894 31 0.83893 31 0.83448 34 0.83448 34 0.85573 25 0.85573 25

Table 2 Efficiency scores and ranking – PDVRPSTW Instance R BCC DMUs RWS-1PX-M1 RWS-1PX-M2 RWS-1PX-M3 RWS-1PX-M4 RWS-CX-M1 RWS-CX-M2 RWS-CX-M3 RWS-CX-M4 RWS-LOX-M1 RWS-LOX-M2 RWS-LOX-M3 RWS-LOX-M4 RWS-PMX-M1 RWS-PMX-M2 RWS-PMX-M3 RWS-PMX-M4 TS-1PX-M1 TS-1PX-M2 TS-1PX-M3 TS-1PX-M4 TS-CX-M1 TS-CX-M2 TS-CX-M3 TS-CX-M4 TS-LOX-M1 TS-LOX-M2 TS-LOX-M3 TS-LOX-M4 TS-PMX-M1 TS-PMX-M2 TS-PMX-M3 TS-PMX-M4 US-1PX-M1 US-1PX-M2 US-1PX-M3 US-1PX-M4 US-CX-M1 US-CX-M2 US-CX-M3 US-CX-M4 US-LOX-M1 US-LOX-M2 US-LOX-M3 US-LOX-M4 US-PMX-M1 US-PMX-M2 US-PMX-M3 US-PMX-M4

BCC-BN

θ=1 Score 0.87376 0.86849 0.86264 0.88501 0.81519 0.81079 0.82566 0.82771 0.82806 0.81988 0.82098 0.83740 0.83937 0.83020 0.83790 0.85017 0.99065 1.00000 0.99697 1.00000 0.97813 0.97582 0.99955 1.00000 0.92828 0.93122 0.93064 0.93569 0.91938 0.91031 0.92374 0.92235 0.86741 0.85877 0.85473 0.87806 0.80922 0.80484 0.81998 0.82334 0.82169 0.81766 0.81449 0.82970 0.83582 0.82724 0.83534 0.84629

Rank 17 18 20 15 42 44 35 33 32 40 38 27 25 30 26 23 4 1 3 1 5 6 2 1 10 8 9 7 13 14 11 12 19 21 22 16 45 46 39 36 37 41 43 31 28 34 29 24

Score 0.87309 0.86775 0.86197 0.88425 0.81448 0.81006 0.82499 0.82709 0.82751 0.81934 0.82034 0.83677 0.83877 0.82961 0.83721 0.84957 0.98932 0.99271 0.99565 0.99877 0.97673 0.97505 0.99855 0.99933 0.92147 0.91949 0.92889 0.93276 0.91860 0.90351 0.92138 0.92114 0.86673 0.85798 0.85405 0.87730 0.80845 0.80408 0.81928 0.82265 0.82114 0.81711 0.81389 0.82907 0.83517 0.82660 0.83468 0.84566

θ=2 Rank 19 20 22 17 44 46 37 35 34 41 40 29 27 32 28 25 6 5 4 2 7 8 3 1 11 14 10 9 15 16 12 13 21 23 24 18 47 48 42 38 39 43 45 33 30 36 31 26

Score 0.87253 0.86719 0.86145 0.88366 0.81377 0.80933 0.82431 0.82646 0.82696 0.81881 0.81969 0.83613 0.83816 0.82903 0.83652 0.84898 0.98881 0.99219 0.99519 0.99843 0.97610 0.97437 0.99784 0.99867 0.92100 0.91897 0.92832 0.93227 0.91814 0.90301 0.92085 0.92064 0.86617 0.85743 0.85355 0.87677 0.80769 0.80333 0.81858 0.82196 0.82059 0.81658 0.81328 0.82845 0.83451 0.82595 0.83401 0.84503


Rank 19 20 22 17 44 46 37 35 34 41 40 29 27 32 28 25 6 5 4 2 7 8 3 1 11 14 10 9 15 16 12 13 21 23 24 18 47 48 42 38 39 43 45 33 30 36 31 26

BCC-BS Γ= 1 Γ= 4 Score Rank Score Rank 0.87311 19 0.87309 19 0.86779 20 0.86775 20 0.86200 22 0.86197 22 0.88429 17 0.88424 17 0.81448 44 0.81448 44 0.81006 46 0.81006 46 0.82499 37 0.82499 37 0.82709 35 0.82709 35 0.82751 34 0.82751 34 0.81934 41 0.81934 41 0.82034 40 0.82034 40 0.83677 29 0.83677 29 0.83877 27 0.83877 27 0.82961 32 0.82961 32 0.83721 28 0.83721 28 0.84957 25 0.84957 25 0.98943 6 0.98930 6 0.99291 5 0.99266 5 0.99574 4 0.99564 4 0.99877 2 0.99877 2 0.97674 7 0.97673 7 0.97507 8 0.97505 8 0.99859 3 0.99855 3 0.99934 1 0.99933 1 0.92153 11 0.92146 11 0.91959 14 0.91946 14 0.92902 10 0.92888 10 0.93287 9 0.93274 9 0.91865 15 0.91860 15 0.90364 16 0.90349 16 0.92148 12 0.92137 12 0.92126 13 0.92113 13 0.86676 21 0.86673 21 0.85803 23 0.85798 23 0.85408 24 0.85405 24 0.87734 18 0.87730 18 0.80845 47 0.80845 47 0.80408 48 0.80408 48 0.81928 42 0.81928 42 0.82265 38 0.82265 38 0.82114 39 0.82114 39 0.81712 43 0.81711 43 0.81389 45 0.81389 45 0.82907 33 0.82907 33 0.83517 30 0.83517 30 0.82660 36 0.82660 36 0.83468 31 0.83468 31 0.84566 26 0.84566 26

Table 3 Efficiency scores and ranking – PDVRPSTW Instance RC BCC DMUs RWS-1PX-M1 RWS-1PX-M2 RWS-1PX-M3 RWS-1PX-M4 RWS-CX-M1 RWS-CX-M2 RWS-CX-M3 RWS-CX-M4 RWS-LOX-M1 RWS-LOX-M2 RWS-LOX-M3 RWS-LOX-M4 RWS-PMX-M1 RWS-PMX-M2 RWS-PMX-M3 RWS-PMX-M4 TS-1PX-M1 TS-1PX-M2 TS-1PX-M3 TS-1PX-M4 TS-CX-M1 TS-CX-M2 TS-CX-M3 TS-CX-M4 TS-LOX-M1 TS-LOX-M2 TS-LOX-M3 TS-LOX-M4 TS-PMX-M1 TS-PMX-M2 TS-PMX-M3 TS-PMX-M4 US-1PX-M1 US-1PX-M2 US-1PX-M3 US-1PX-M4 US-CX-M1 US-CX-M2 US-CX-M3 US-CX-M4 US-LOX-M1 US-LOX-M2 US-LOX-M3 US-LOX-M4 US-PMX-M1 US-PMX-M2 US-PMX-M3 US-PMX-M4

BCC-BN

θ=1 Score 0.86578 0.85920 0.85623 0.87670 0.80696 0.80419 0.81490 0.81644 0.82819 0.82411 0.81802 0.83553 0.83502 0.83380 0.83208 0.84948 1.00000 1.00000 0.99423 1.00000 0.98378 0.99782 0.99997 0.98845 0.91978 0.92708 0.92472 0.92302 0.90076 0.89173 0.90959 0.90520 0.85967 0.85294 0.84517 0.87047 0.79312 0.79209 0.80482 0.80330 0.81693 0.81280 0.81086 0.82822 0.84066 0.82859 0.82670 0.84568

Rank 16 18 19 14 40 42 37 36 31 33 34 25 26 27 28 21 1 1 3 1 5 2 1 4 9 6 7 8 12 13 10 11 17 20 23 15 44 45 41 43 35 38 39 30 24 29 32 22

Score 0.86517 0.85858 0.85556 0.87609 0.80635 0.80358 0.81426 0.81579 0.82411 0.82013 0.81740 0.83332 0.83390 0.82890 0.83159 0.84891 0.98833 0.98275 0.99234 0.99932 0.98002 0.97128 0.99739 0.98750 0.91417 0.91549 0.92398 0.92230 0.89922 0.88541 0.90888 0.90417 0.85903 0.85229 0.84452 0.86982 0.79253 0.79150 0.80422 0.80268 0.81632 0.81218 0.81027 0.82761 0.83093 0.82332 0.82617 0.84312

θ=2 Rank 19 21 22 17 43 45 40 39 34 36 37 28 27 31 29 24 4 6 3 1 7 8 2 5 12 11 9 10 15 16 13 14 20 23 25 18 47 48 44 46 38 41 42 32 30 35 33 26

Score 0.86456 0.85795 0.85489 0.87549 0.80573 0.80297 0.81362 0.81514 0.82356 0.81957 0.81679 0.83273 0.83340 0.82839 0.83110 0.84835 0.98761 0.98209 0.99155 0.99864 0.97931 0.97053 0.99661 0.98676 0.91351 0.91481 0.92324 0.92159 0.89862 0.88481 0.90818 0.90352 0.85838 0.85165 0.84387 0.86917 0.79194 0.79091 0.80361 0.80206 0.81570 0.81157 0.80969 0.82701 0.83041 0.82281 0.82564 0.84262


Rank 19 21 22 17 43 45 40 39 34 36 37 28 27 31 29 24 4 6 3 1 7 8 2 5 12 11 9 10 15 16 13 14 20 23 25 18 47 48 44 46 38 41 42 32 30 35 33 26

BCC-BS Γ= 1 Γ= 4 Score Rank Score Rank 0.86517 19 0.86517 19 0.85858 21 0.85858 21 0.85556 22 0.85556 22 0.87609 17 0.87609 17 0.80635 43 0.80635 43 0.80358 45 0.80358 45 0.81426 40 0.81426 40 0.81579 39 0.81579 39 0.82414 34 0.82411 34 0.82015 36 0.82013 36 0.81741 37 0.81740 37 0.83334 28 0.83332 28 0.83390 27 0.83390 27 0.82892 31 0.82890 31 0.83159 29 0.83159 29 0.84891 24 0.84891 24 0.98841 4 0.98833 4 0.98283 6 0.98274 6 0.99235 3 0.99234 3 0.99933 1 0.99932 1 0.98005 7 0.98002 7 0.97141 8 0.97126 8 0.99740 2 0.99739 2 0.98750 5 0.98750 5 0.91421 12 0.91417 12 0.91556 11 0.91549 11 0.92398 9 0.92398 9 0.92230 10 0.92230 10 0.89922 15 0.89922 15 0.88544 16 0.88541 16 0.90888 13 0.90888 13 0.90418 14 0.90417 14 0.85903 20 0.85903 20 0.85229 23 0.85229 23 0.84452 25 0.84452 25 0.86982 18 0.86982 18 0.79253 47 0.79253 47 0.79150 48 0.79150 48 0.80422 44 0.80422 44 0.80268 46 0.80268 46 0.81632 38 0.81632 38 0.81218 41 0.81218 41 0.81027 42 0.81027 42 0.82761 32 0.82761 32 0.83099 30 0.83093 30 0.82336 35 0.82332 35 0.82617 33 0.82617 33 0.84313 26 0.84312 26

Table 4 Comparison of the best, average and worst GA combinations in terms of the averages of input and output values

Instance  Combination  DMU        CPU Time  Num. of Vehicles  Total Distance  Total Earliness  Total Tardiness
C         Best         TS-1PX-M4  1.534     12.833            1619.790        650.412          232.217
C         Average      -          1.758     13.165            1737.700        654.543          309.292
C         Worst        US-CX-M2   1.925     13.500            1818.210        656.352          335.705
R         Best         TS-CX-M4   1.548     13.400            1783.960        678.932          287.591
R         Average      -          1.774     13.443            1794.625        723.311          308.340
R         Worst        US-CX-M2   1.923     13.933            1867.780        731.368          353.118
RC        Best         TS-1PX-M4  1.594     13.533            1863.050        578.490          293.719
RC        Average      -          1.842     14.219            2008.982        578.921          356.122
RC        Worst        US-CX-M2   2.012     14.667            2095.400        600.653          423.905

Table 5 Comparison of the best, average and worst GA combinations in terms of the standard deviations of input and output values

Instance  Combination  DMU        CPU Time  Num. of Vehicles  Total Distance  Total Earliness  Total Tardiness
C         Best         TS-1PX-M4  0.0013    0.379             69.031          111.153          61.443
C         Average      -          0.0009    0.444             82.841          115.265          82.012
C         Worst        US-CX-M2   0.0006    0.630             131.193         117.318          83.367
R         Best         TS-CX-M4   0.0015    0.498             70.367          106.454          66.663
R         Average      -          0.0009    0.514             77.109          115.190          78.367
R         Worst        US-CX-M2   0.0007    0.907             117.757         128.002          88.530
RC        Best         TS-1PX-M4  0.0012    0.507             83.239          95.496           70.607
RC        Average      -          0.0010    0.567             88.420          95.862           78.648
RC        Worst        US-CX-M2   0.0007    0.607             106.945         98.131           91.078

Table 6 Results of Hotelling's T-square tests (F values) for GA combinations

DMUs in the test  Instance C  Instance R  Instance RC
1st vs. 2nd       3.66        3.14        4.49
2nd vs. 3rd       6.73        2.88        17.60
3rd vs. 4th       4.72        8.58        23.27
4th vs. 5th       3.01        4.01        9.82
5th vs. 6th       9.36        5.57        3.26
6th vs. 7th       8.38        5.38        3.97
7th vs. 8th       5.71        9.23        11.43
8th vs. 9th       4.39        4.87        12.95
9th vs. 10th      28.62       38.54       18.43
10th vs. 11th     5.35        3.90        3.75
11th vs. 12th     8.61        10.53       6.98
12th vs. 13th     10.25       7.33        22.40
13th vs. 14th     7.94        18.64       3.68
14th vs. 15th     33.92       14.49       8.41
15th vs. 16th     3.77        18.32       7.21
16th vs. 17th     22.15       13.46       6.27
17th vs. 18th     3.00        8.38        7.00
18th vs. 19th     31.40       8.71        4.96
19th vs. 20th     21.87       8.29        33.01
20th vs. 21st     4.95        10.52       37.13
21st vs. 22nd     10.30       9.80        28.01
22nd vs. 23rd     8.40        7.20        30.83
23rd vs. 24th     4.66        12.18       15.91
24th vs. 25th     26.22       11.64       9.14
25th vs. 26th     3.20        10.08       29.35
26th vs. 27th     5.66        13.30       0.24
27th vs. 28th     8.07        5.43        12.15
28th vs. 29th     22.64       5.70        9.52
29th vs. 30th     8.00        5.20        7.07
30th vs. 31st     8.64        24.12       13.33
31st vs. 32nd     79.05       8.37        12.05
32nd vs. 33rd     5.24        13.78       8.26
33rd vs. 34th     12.04       13.86       7.94
34th vs. 35th     4.20        4.47        8.28
35th vs. 36th     36.33       22.81       30.81
36th vs. 37th     5.22        11.01       23.89
37th vs. 38th     10.91       10.72       11.21
38th vs. 39th     18.63       4.08        14.94
39th vs. 40th     33.55       21.60       7.15
40th vs. 41st     19.28       9.97        17.36
41st vs. 42nd     10.91       9.02        7.10
42nd vs. 43rd     9.23        5.56        5.61
43rd vs. 44th     3.55        12.51       24.10
44th vs. 45th     8.00        5.62        15.95
45th vs. 46th     7.02        13.00       7.60
46th vs. 47th     13.39       3.91        28.71
47th vs. 48th     3.61        5.01        16.91

Table 7 Efficiency scores and ranking – VRPTW Instance C101 BCC DMUs

β1-I1-N1 β1-I1-N2 β1-I2-N1 β1-I2-N2 β1-I3-N1 β1-I3-N2 β1-I4-N1 β1-I4-N2 β1-I5-N1 β1-I5-N2 β1-I6-N1 β1-I6-N2 β2-I1-N1 β2-I1-N2 β2-I2-N1 β2-I2-N2 β2-I3-N1 β2-I3-N2 β2-I4-N1 β2-I4-N2 β2-I5-N1 β2-I5-N2 β2-I6-N1 β2-I6-N2 β3-I1-N1 β3-I1-N2 β3-I2-N1 β3-I2-N2 β3-I3-N1 β3-I3-N2 β3-I4-N1 β3-I4-N2 β3-I5-N1 β3-I5-N2 β3-I6-N1 β3-I6-N2

BCC-BN

θ=1 Score 0.89758 0.91497 0.90450 0.91729 0.90484 0.91998 0.90782 0.92024 0.91748 0.92989 0.91290 0.92837 0.93012 0.93321 0.93079 0.99451 0.93163 0.99602 0.93340 0.99643 1.00000 1.00000 0.99846 1.00000 0.86400 0.88316 0.86552 0.88435 0.86564 0.89006 0.86778 0.89023 0.87335 0.89187 0.87045 0.89447

Rank 22 17 21 16 20 14 19 13 15 11 18 12 10 7 9 5 8 4 6 3 1 1 2 1 34 28 33 27 32 26 31 25 29 24 30 23

Score 0.89863 0.90528 0.90348 0.91001 0.90426 0.91607 0.90426 0.91666 0.91365 0.91932 0.90471 0.92018 0.92684 0.98647 0.92967 0.98660 0.93120 0.99243 0.93244 0.99307 0.99694 1.00000 0.99565 0.99639 0.86471 0.87670 0.86497 0.87820 0.86711 0.87821 0.86977 0.88588 0.87079 0.88668 0.87539 0.88696

θ=2 Rank 24 19 23 18 22 16 21 15 17 14 20 13 12 8 11 7 10 6 9 5 2 1 4 3 36 30 35 29 34 28 33 27 32 26 31 25

Score 0.89810 0.90477 0.90293 0.90935 0.90365 0.91548 0.90367 0.91607 0.91304 0.91874 0.90402 0.91957 0.92623 0.98574 0.92913 0.98597 0.93066 0.99185 0.93189 0.99246 0.99641 1.00000 0.99497 0.99575 0.86405 0.87603 0.86429 0.87765 0.86643 0.87766 0.86909 0.88531 0.87023 0.88611 0.87484 0.88639


Rank 24 19 23 18 22 16 21 15 17 14 20 13 12 8 11 7 10 6 9 5 2 1 4 3 36 30 35 29 34 28 33 27 32 26 31 25

BCC-BS Γ= 1 Γ= 4 Score Rank Score Rank 0.89866 24 0.89863 24 0.90531 19 0.90528 19 0.90350 23 0.90348 23 0.91003 18 0.91001 18 0.90434 22 0.90425 22 0.91619 16 0.91606 16 0.90437 21 0.90425 21 0.91667 15 0.91666 15 0.91368 17 0.91364 17 0.91940 14 0.91932 14 0.90478 20 0.90470 20 0.92026 13 0.92017 13 0.92685 12 0.92684 12 0.98654 8 0.98646 8 0.92969 11 0.92967 11 0.98667 7 0.98660 7 0.93123 10 0.93120 10 0.99246 6 0.99243 6 0.93245 9 0.93244 9 0.99314 5 0.99306 5 0.99696 2 0.99694 2 1.00000 1 1.00000 1 0.99571 4 0.99565 4 0.99647 3 0.99638 3 0.86472 36 0.86471 36 0.87676 30 0.87669 30 0.86497 35 0.86497 35 0.87829 29 0.87820 29 0.86711 34 0.86711 34 0.87831 28 0.87820 28 0.86977 33 0.86977 33 0.88593 27 0.88587 27 0.87081 32 0.87078 32 0.88674 26 0.88667 26 0.87545 31 0.87538 31 0.88704 25 0.88695 25

Table 8 Efficiency scores and ranking – VRPTW Instance R101 BCC DMUs

β1-I1-N1 β1-I1-N2 β1-I2-N1 β1-I2-N2 β1-I3-N1 β1-I3-N2 β1-I4-N1 β1-I4-N2 β1-I5-N1 β1-I5-N2 β1-I6-N1 β1-I6-N2 β2-I1-N1 β2-I1-N2 β2-I2-N1 β2-I2-N2 β2-I3-N1 β2-I3-N2 β2-I4-N1 β2-I4-N2 β2-I5-N1 β2-I5-N2 β2-I6-N1 β2-I6-N2 β3-I1-N1 β3-I1-N2 β3-I2-N1 β3-I2-N2 β3-I3-N1 β3-I3-N2 β3-I4-N1 β3-I4-N2 β3-I5-N1 β3-I5-N2 β3-I6-N1 β3-I6-N2

BCC-BN

θ=1 Score 0.87677 0.91381 0.88144 0.91445 0.89780 0.92065 0.89785 0.92128 0.90105 0.92199 0.92272 0.92282 0.92302 0.92515 0.92359 0.92928 0.92912 0.93434 0.97697 0.98764 0.99057 0.99739 1.00000 1.00000 0.86076 0.86596 0.86175 0.86801 0.86243 0.86989 0.86553 0.87014 0.87113 0.87165 0.87355 0.87462

Rank 23 18 22 17 21 16 20 15 19 14 13 12 11 9 10 7 8 6 5 4 3 2 1 1 35 31 34 30 33 29 32 28 27 26 25 24

Score 0.89408 0.91061 0.89435 0.91352 0.89705 0.91404 0.90809 0.91514 0.91408 0.91742 0.91736 0.91761 0.91894 0.92526 0.92497 0.98509 0.92920 0.98748 0.97503 0.98934 0.98421 0.99339 0.99380 0.99911 0.85968 0.86336 0.86445 0.86917 0.86470 0.87041 0.86481 0.87096 0.86575 0.87537 0.87229 0.87604

θ=2 Rank 24 20 23 19 22 18 21 16 17 14 15 13 12 10 11 6 9 5 8 4 7 3 2 1 36 35 34 30 33 29 32 28 31 26 27 25

Score 0.89322 0.90978 0.89349 0.91273 0.89627 0.91325 0.90725 0.91437 0.91332 0.91659 0.91647 0.91681 0.91814 0.92446 0.92416 0.98425 0.92836 0.98663 0.97419 0.98847 0.98339 0.99256 0.99300 1.00000 0.85894 0.8626 0.86369 0.86839 0.86397 0.86969 0.86409 0.87028 0.86501 0.87457 0.87160 0.87531


Rank 24 20 23 19 22 18 21 16 17 14 15 13 12 10 11 6 9 5 8 4 7 3 2 1 36 35 34 30 33 29 32 28 31 26 27 25

BCC-BS Γ= 1 Γ= 4 Score Rank Score Rank 0.89419 24 0.89408 24 0.91075 20 0.91060 20 0.89447 23 0.89434 23 0.91367 19 0.91350 19 0.89718 22 0.89704 22 0.91421 18 0.91406 18 0.90820 21 0.90808 21 0.91534 16 0.91512 16 0.91427 17 0.91401 17 0.91757 14 0.91741 14 0.91754 15 0.91735 15 0.91773 13 0.91760 13 0.91911 12 0.91892 12 0.92540 10 0.92526 10 0.92512 11 0.92496 11 0.98538 6 0.98504 6 0.92934 9 0.92919 9 0.98775 5 0.98745 5 0.97506 8 0.97503 8 0.98935 4 0.98934 4 0.98423 7 0.98421 7 0.99365 3 0.99336 3 0.99403 2 0.99378 2 1.00000 1 1.00000 1 0.85971 36 0.85967 36 0.86341 35 0.86336 35 0.86452 34 0.86445 34 0.86927 30 0.86916 30 0.86481 33 0.86469 33 0.87041 29 0.87041 29 0.86481 32 0.86481 32 0.87096 28 0.87096 28 0.86583 31 0.86575 31 0.87548 26 0.87535 26 0.87231 27 0.87229 27 0.87604 25 0.87604 25

Table 9 Efficiency scores and ranking – VRPTW Instance RC101

| DMUs | BCC Score | BCC Rank | BCC-BN θ=1 Score | BCC-BN θ=1 Rank | BCC-BN θ=2 Score | BCC-BN θ=2 Rank | BCC-BS Γ=1 Score | BCC-BS Γ=1 Rank | BCC-BS Γ=4 Score | BCC-BS Γ=4 Rank |
|---|---|---|---|---|---|---|---|---|---|---|
| β1-I1-N1 | 0.89444 | 21 | 0.89373 | 24 | 0.89302 | 24 | 0.89373 | 24 | 0.89373 | 24 |
| β1-I1-N2 | 0.89446 | 20 | 0.90916 | 20 | 0.90836 | 20 | 0.90921 | 20 | 0.90915 | 20 |
| β1-I2-N1 | 0.90573 | 19 | 0.89373 | 23 | 0.89302 | 23 | 0.89375 | 23 | 0.89373 | 23 |
| β1-I2-N2 | 0.91222 | 16 | 0.91135 | 19 | 0.91058 | 19 | 0.91137 | 19 | 0.91135 | 19 |
| β1-I3-N1 | 0.90875 | 18 | 0.90419 | 22 | 0.90346 | 22 | 0.90432 | 22 | 0.90417 | 22 |
| β1-I3-N2 | 0.91793 | 15 | 0.91303 | 18 | 0.91212 | 18 | 0.91317 | 18 | 0.91302 | 18 |
| β1-I4-N1 | 0.91028 | 17 | 0.90474 | 21 | 0.90405 | 21 | 0.90477 | 21 | 0.90473 | 21 |
| β1-I4-N2 | 0.91806 | 14 | 0.91318 | 17 | 0.91241 | 17 | 0.91333 | 17 | 0.91316 | 17 |
| β1-I5-N1 | 0.91874 | 12 | 0.91719 | 15 | 0.91649 | 15 | 0.91721 | 15 | 0.91719 | 15 |
| β1-I5-N2 | 0.92036 | 10 | 0.91777 | 13 | 0.91708 | 13 | 0.91779 | 13 | 0.91777 | 13 |
| β1-I6-N1 | 0.91850 | 13 | 0.91491 | 16 | 0.91420 | 16 | 0.91507 | 16 | 0.91488 | 16 |
| β1-I6-N2 | 0.91991 | 11 | 0.91766 | 14 | 0.91686 | 14 | 0.91772 | 14 | 0.91766 | 14 |
| β2-I1-N1 | 0.92106 | 9 | 0.91864 | 12 | 0.91790 | 12 | 0.90432 | 12 | 0.91862 | 12 |
| β2-I1-N2 | 0.97965 | 5 | 0.91917 | 11 | 0.91848 | 11 | 0.91920 | 11 | 0.91917 | 11 |
| β2-I2-N1 | 0.92154 | 8 | 0.92028 | 10 | 0.91950 | 10 | 0.92028 | 10 | 0.92028 | 10 |
| β2-I2-N2 | 0.98057 | 4 | 0.97967 | 7 | 0.97878 | 7 | 0.97968 | 7 | 0.97967 | 7 |
| β2-I3-N1 | 0.92500 | 7 | 0.93350 | 9 | 0.93279 | 9 | 0.93351 | 9 | 0.93350 | 9 |
| β2-I3-N2 | 0.98716 | 3 | 0.98618 | 6 | 0.98520 | 6 | 0.98618 | 6 | 0.98618 | 6 |
| β2-I4-N1 | 0.93423 | 6 | 0.97877 | 8 | 0.97791 | 8 | 0.97879 | 8 | 0.97877 | 8 |
| β2-I4-N2 | 0.99157 | 2 | 0.99067 | 5 | 0.98980 | 5 | 0.99070 | 5 | 0.99066 | 5 |
| β2-I5-N1 | 1.00000 | 1 | 0.99825 | 3 | 0.99739 | 3 | 0.99847 | 3 | 0.99823 | 3 |
| β2-I5-N2 | 1.00000 | 1 | 0.99916 | 1 | 1.00000 | 1 | 1.00000 | 1 | 1.00000 | 1 |
| β2-I6-N1 | 1.00000 | 1 | 0.99313 | 4 | 0.99225 | 4 | 0.99326 | 4 | 0.99312 | 4 |
| β2-I6-N2 | 1.00000 | 1 | 0.99897 | 2 | 0.99811 | 2 | 0.99908 | 2 | 0.99895 | 2 |
| β3-I1-N1 | 0.86365 | 32 | 0.86144 | 36 | 0.86072 | 36 | 0.86144 | 36 | 0.86144 | 36 |
| β3-I1-N2 | 0.86516 | 30 | 0.86436 | 33 | 0.86355 | 33 | 0.86436 | 33 | 0.86436 | 33 |
| β3-I2-N1 | 0.86216 | 33 | 0.86236 | 35 | 0.86164 | 35 | 0.86243 | 35 | 0.86235 | 35 |
| β3-I2-N2 | 0.86516 | 29 | 0.86437 | 32 | 0.86360 | 32 | 0.86437 | 32 | 0.86437 | 32 |
| β3-I3-N1 | 0.86514 | 31 | 0.86436 | 34 | 0.86355 | 34 | 0.86436 | 34 | 0.86436 | 34 |
| β3-I3-N2 | 0.86812 | 28 | 0.86672 | 31 | 0.86598 | 31 | 0.86687 | 31 | 0.86670 | 31 |
| β3-I4-N1 | 0.87167 | 27 | 0.86675 | 30 | 0.86604 | 30 | 0.86688 | 30 | 0.86673 | 30 |
| β3-I4-N2 | 0.87613 | 24 | 0.86682 | 29 | 0.86620 | 29 | 0.86692 | 29 | 0.86681 | 29 |
| β3-I5-N1 | 0.87949 | 23 | 0.87298 | 27 | 0.87228 | 27 | 0.87310 | 27 | 0.87297 | 27 |
| β3-I5-N2 | 0.88000 | 22 | 0.87430 | 25 | 0.87367 | 25 | 0.87437 | 25 | 0.87429 | 25 |
| β3-I6-N1 | 0.87506 | 26 | 0.86729 | 28 | 0.86665 | 28 | 0.86744 | 28 | 0.86727 | 28 |
| β3-I6-N2 | 0.87558 | 25 | 0.87351 | 26 | 0.87285 | 26 | 0.87366 | 26 | 0.87349 | 26 |

Table 10 Results of Hotelling’s T-square tests (F values) for SA parameter settings

| DMUs in the test | Instance C101 | Instance R101 | Instance RC101 |
|---|---|---|---|
| 1st vs. 2nd | 180.85 | 10.84 | 468.39 |
| 2nd vs. 3rd | 18.62 | 263.52 | 44.01 |
| 3rd vs. 4th | 48.47 | 34.81 | 3.65 |
| 4th vs. 5th | 7.05 | 4.78 | 16.23 |
| 5th vs. 6th | 210.17 | 21.72 | 423.65 |
| 6th vs. 7th | 452.02 | 80.1 | 233.33 |
| 7th vs. 8th | 128.53 | 48.18 | 3.75 |
| 8th vs. 9th | 65.25 | 326.31 | 65.79 |
| 9th vs. 10th | 8.07 | 129.64 | 69.6 |
| 10th vs. 11th | 189.05 | 33.32 | 37.16 |
| 11th vs. 12th | 262.14 | 245.44 | 16.63 |
| 12th vs. 13th | 8.56 | 61.73 | 312.77 |
| 13th vs. 14th | 11.34 | 23.23 | 75.99 |
| 14th vs. 15th | 85.6 | 162.22 | 12.75 |
| 15th vs. 16th | 179.47 | 283.5 | 13.7 |
| 16th vs. 17th | 187.25 | 17.73 | 9.19 |
| 17th vs. 18th | 3.94 | 32.77 | 20.46 |
| 18th vs. 19th | 14.53 | 21.76 | 6.19 |
| 19th vs. 20th | 6.79 | 91.87 | 12.93 |
| 20th vs. 21st | 15.58 | 51.62 | 19.4 |
| 21st vs. 22nd | 8.15 | 48.5 | 4.14 |
| 22nd vs. 23rd | 4.12 | 23.53 | 160.09 |
| 23rd vs. 24th | 211.25 | 4.29 | 10.72 |
| 24th vs. 25th | 16.33 | 24.5 | 77.78 |
| 25th vs. 26th | 55.7 | 24.17 | 55.02 |
| 26th vs. 27th | 374.66 | 5.85 | 594.26 |
| 27th vs. 28th | 16.16 | 14.5 | 728.94 |
| 28th vs. 29th | 54.11 | 29.41 | 9.82 |
| 29th vs. 30th | 3.12 | 113.65 | 741.06 |
| 30th vs. 31st | 282.79 | 188.42 | 206.72 |
| 31st vs. 32nd | 1124.7 | 56.72 | 12.21 |
| 32nd vs. 33rd | 5.77 | 117.98 | 17.84 |
| 33rd vs. 34th | 9.65 | 236.93 | 29.86 |
| 34th vs. 35th | 398.44 | 60.76 | 4.3 |
| 35th vs. 36th | 10.36 | 15.66 | 252.12 |
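Table 10 reports F values from Hotelling’s T-square tests between DMUs at adjacent ranks. As an illustration only, the sketch below shows how a two-sample Hotelling’s T-square statistic and its F transformation could be computed. It assumes that each DMU is represented by a list of per-run observation vectors and that the two samples share a common covariance matrix. The function names and the toy data are hypothetical and are not taken from the paper, whose exact test setup may differ.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length observation rows."""
    return [fmean(col) for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the mean (p x p)."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hotelling_f(sample_a, sample_b):
    """Return (T2, F, (df1, df2)) for a two-sample Hotelling's T-square test.

    Assumes a pooled (common) covariance matrix for the two samples.
    """
    n1, n2 = len(sample_a), len(sample_b)
    mean_a, mean_b = mean_vector(sample_a), mean_vector(sample_b)
    p = len(mean_a)
    s_a = scatter_matrix(sample_a, mean_a)
    s_b = scatter_matrix(sample_b, mean_b)
    pooled = [[(s_a[i][j] + s_b[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [x - y for x, y in zip(mean_a, mean_b)]
    w = solve(pooled, diff)  # pooled^{-1} * diff without forming the inverse
    t2 = (n1 * n2) / (n1 + n2) * sum(d * wi for d, wi in zip(diff, w))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f, (p, n1 + n2 - p - 1)


# Hypothetical toy data: four runs of two DMUs, two measures per run.
runs_a = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
runs_b = [[2.0, 3.0], [3.0, 2.0], [4.0, 5.0], [5.0, 4.0]]
t2_value, f_value, dof = hotelling_f(runs_a, runs_b)
```

Under these assumptions, the F value would be compared against the critical value of an F distribution with the returned degrees of freedom to judge whether two adjacently ranked DMUs differ significantly.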

Research Highlights
• DEA is adopted to measure the relative efficiency of optimization algorithms
• Consider not only average but also variation of algorithm’s output values
• Robust DEA models are developed based on robust counterpart optimization approaches
• Apply the models to evaluate GA operators for solving the vehicle routing problem


Robust Data Envelopment Analysis Approaches for Evaluating Algorithmic Performance
Chung-Cheng Lu, Ph.D.
Email: [email protected] (corresponding author)
Tel: +886-2-27712171 ext 2306; Fax: +886-2-87726946
Department of Industrial Engineering and Management, National Taipei University of Technology
1 Section 3, Chung-Hsiao East Road, Taipei, 10608, Taiwan

Keywords: meta-heuristics; uncertainty modeling; data envelopment analysis; robust counterpart optimization.

Acknowledgements This paper is partially based on a project (NSC 102-2410-H-027-015-MY3) sponsored by the National Science Council, Taiwan. The author is grateful to three anonymous reviewers for many constructive comments and suggestions to improve the quality of this paper. The author is solely responsible for the content of this paper.