A filtering method for algorithm configuration based on consistency techniques

Knowledge-Based Systems 60 (2014) 73–81


Ignacio Araya, María-Cristina Riff
Depto. Informática, Universidad Técnica Federico Santa María, Av. España 1680, V Región, Chile

Article history: Received 9 January 2013; Received in revised form 2 January 2014; Accepted 6 January 2014; Available online 11 January 2014

Keywords: Parameter tuning; Constraint satisfaction problems; Consistency techniques; Algorithm configuration; Algorithm design

Abstract

Heuristic-based algorithms are typically constructed following an iterative process in which the designer gradually introduces or modifies components or strategies whose performance is then tested by empirical evaluation on one or more sets of benchmark problems. This process often starts with some generic or broadly applicable problem-solving method (e.g., metaheuristics, backtracking search), a new algorithmic idea or even an algorithm suggested by theoretical considerations. Then, through an iterative process, various combinations of components, methods and strategies are implemented/improved and tested. Even experienced designers often have to spend substantial amounts of time exploring and experimenting with different alternatives before obtaining an effective algorithm for a given problem. In this work, we are interested in assisting the designer in this task. Considering that components, methods and strategies are generally associated with parameters and parameter values, we propose a method able to detect, through a fine-tuning process, ineffective and redundant components/strategies of an algorithm. The approach is a model-free method and applies simple consistency techniques in order to discard values from the domains of the parameters. We validate our approach with two algorithms for solving SAT and MIP problems.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

When we design an algorithm for problem solving, we generally address several important decisions related to its components, such as which variable-ordering heuristic to use, which transformation algorithm to use, and whether to include some specific methods. These decisions can be handled by algorithm parameters. We can then apply a fine-tuning process to determine which components are crucial for the algorithm's performance. There exist several automated procedures to find the optimal instantiation of the parameters (called a configuration here) of a given algorithm. The problem of finding the optimal configuration is known as the algorithm configuration problem (AC) [17,3,9,16]. Automated procedures for solving this problem are called configuration algorithms or configurators, while algorithms whose parameters are tuned are called target algorithms. Usually, finding near-optimal configurations allows good performance on large data sets. However, this does not directly assist with the design phase, specifically with discarding, simplifying or just better understanding the components of the target algorithm. Several configuration algorithms exist that supply information


about the parameters/components of the target algorithm in addition to finding near-optimal configurations. Sampling methods (e.g., Latin-Square [21] and Taguchi Orthogonal Arrays [26]) take a representative sample of the configuration space using a full factorial design. The different configurations are analyzed to predict which parameter values work best and which are the most robust. CALIBRA [2] and meta-GA [21] reduce the search area from which new configurations are sampled at each iteration. Thus, they can only be used to find good configurations and are not well suited to analyzing the parameters. Model-based methods construct a model based on a reduced set of configurations. This model predicts the performance of new configurations. A common approach is to use a regression method to predict the utility of a configuration [12,22,14]. Based on the model, it is possible to identify some features of the parameters. Sequential model-based methods allow for a better exploration of the configuration space. Coy et al. [11] propose a procedure consisting of a model-based method followed by a local search procedure to optimize the parameter values. SPO [5,6] goes further: in each iteration, it generates a new set of configurations and predicts their utilities using the current model. The vectors with the highest predicted performance are used to update the model for the next iteration. Finally, the procedure returns an accurate model of the most promising areas. Model-based methods commonly use


Gaussian process models [23]; thus they are limited to continuous parameters. SMAC [16] uses a new model class, based on the Hamming distance and random forests [10], to support categorical parameters. SMAC is also capable of handling multiple instances by integrating information about their features into the response surface model. Thus, the final model predicts the algorithm runtime from the configurations and instance features. The objective of racing methods [20,8,7] is to identify the best configurations from a large set, performing a minimum number of tests. In addition to finding the best configuration (with a certain confidence level), these methods provide information to estimate the robustness to changes in parameter values. Iterative F-RACE [4,9] combines racing with model-based methods. It starts by using a reduced population that represents the whole space of configurations. Using F-RACE [8,9], it reduces this population until a certain condition is met. Then, a multi-variate normal distribution fit on the surviving vectors is used as a probability density function to sample points for a new population. The entire procedure is repeated until a termination criterion is met. Like sequential model-based methods, it finds good configurations and valuable information about the parameters. Evolutionary algorithms have also been used to find configurations with high utility. The ParamILS framework [18,17] starts with a default parameter configuration and iteratively improves it by searching in a neighborhood defined as the variation of the value of only one parameter. It requires that a procedure deciding whether a configuration θ is better than a configuration θ′ be defined. A basic implementation of this procedure consists of comparing the average utilities over N runs; however, FocusedILS [17], an extension of the framework, replaces this procedure with racing. Evolutionary algorithms are very good at finding high-quality vectors; however, they do not provide any indication of the robustness of the target algorithm. REVAC [24] is a specific type of EA for configuration where the population approximates the density function of the most promising areas, similar to Iterative F-RACE. The function is decomposed by coordinates (i.e., it is blind to parameter interactions), but it can be used to analyze the sensitivity and relevance of the different parameters. An extension of REVAC [13] finds values that obtain good results across a large range of different instances. Multi-objective methods allow us to take into account multiple problems or performance indicators. M-FETA [25] models the problem as a multi-objective optimization problem. It creates a Parameter Pareto Front that can be used to evaluate robustness to changes in the problem definition, as well as performance under multiple performance criteria. An important added value of multi-objective methods with respect to other approaches lies in the insights regarding the applicability and fallibility (i.e., the relative difference between good and bad configuration performances [13]) of the target algorithm.

1.1. Our approach

In this paper we introduce NODOM-C (NOn-DOMinated Consistency algorithm), a model-free algorithm based on sampling. The goal of NODOM-C is to detect useful components and to help the algorithm designer to identify those that are ineffective. NODOM-C can work with target algorithms containing a very large number of parameters. To our knowledge, only FocusedILS, GGA [3] and SMAC (a model-based approach) work with this type of target algorithm. Of the approaches that work with a large number of parameters, only SMAC may provide information about the parameters of the target algorithm. Iterative F-RACE also provides valuable information. Both of them, however, are sophisticated model-based approaches.

NODOM-C is based on sampling. However, unlike other sampling methods, which use sampling to detect promising parameter regions, our method uses sampling to filter the configuration space. We thus refer to our approach as a filtering method for algorithm configuration. The main procedure iterates over all the parameters of the target algorithm. For each parameter, all its possible values are compared by using a sample where the rest of the parameters are set randomly from the parameter domains. Special care is taken to perform a fair comparison: two parameter values are compared using the same values for the rest of the parameters, the same instance and the same seed. The values of the other parameters, the instance and the seed change from one comparison to the next. A number m of comparisons is performed for every pair of parameter values. Then, a parameter value is eliminated from the domain if it is dominated by some other value in the domain (i.e., if in every comparison it is worse than or equal to that other value). If, after iterating over all the parameters, the configuration space has been reduced, then the whole procedure may be repeated to obtain further reductions. Finally, NODOM-C returns a reduced space of configurations and sets of comparable values for the parameters. From the results, the algorithm designer can easily deduce some interesting information: which components to discard and which components to use to perform a given task (without impacting performance). It is important to emphasize that, unlike other configuration algorithms, our goal is not to find the best configuration of the target algorithm for a given set of instances. (A best configuration works relatively well for each instance of the given set; however, it may omit some important components of the target algorithm which may be effective on a few instances of the set.) In fact, our goal is to detect ineffective components in order to reduce the complexity of the target algorithm while maintaining performance. By ineffective components we mean components that are useless on every single instance in the set. The algorithm is inspired by regression models, where we can reduce the complexity of a model by reducing the number of variables involved without decreasing the quality of the model's estimates. In Section 2, we provide a formal definition of the algorithm configuration problem. Section 3 describes some concepts and definitions used in the filtering process. In Section 4, we detail NODOM-C, our filtering method for algorithm configuration. Experiments summarizing different aspects of the approach are shown in Section 5. Conclusions are given in Section 6.

2. The algorithm configuration problem

The algorithm configuration problem (AC) can be stated as follows: given a target algorithm, a set of parameters for the algorithm and a set of input data, find parameter values under which the algorithm achieves the best performance on the input data. First, let us introduce some definitions. Let p_1, ..., p_d be the parameters of the target algorithm. The domain of possible values for each parameter p_i is denoted by Θ_i. Θ = Θ_1 × ... × Θ_d denotes the space of all feasible configurations, and θ = (θ_1, ..., θ_d) ∈ Θ corresponds to a parameter instantiation or configuration. We distinguish two main types of parameters:

- Categorical parameters: those parameters which correspond to a choice inside the target algorithm. Components, methods and strategies are included in this category.
- Numerical parameters: those parameters with real or integer domains. Some examples of numerical parameters are the population size and the mutation rate in a genetic algorithm, the temperature in simulated annealing, and the required precision of an iterative method.
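To fix ideas, a discretized configuration space can be represented simply as a mapping from parameter names to finite sets of values. The following minimal sketch uses made-up parameter names (they are not taken from SPEAR or CPLEX) and only illustrates the representation assumed in the rest of this note:

```python
# A hypothetical discretized configuration space Theta = Theta_1 x ... x Theta_d.
# Categorical parameters hold named choices; numerical ones hold a few discretized values.
domains = {
    "variable_ordering": ["activity", "random", "degree"],  # categorical (a component choice)
    "restart_strategy":  ["luby", "geometric", "none"],     # categorical
    "mutation_rate":     [0.01, 0.05, 0.1, 0.2],            # numerical, discretized
    "population_size":   [10, 50, 100],                     # numerical, discretized
}

# A configuration theta picks one value per parameter.
theta = {name: values[0] for name, values in domains.items()}

# Size of the configuration space |Theta| is the product of the domain cardinalities.
space_size = 1
for values in domains.values():
    space_size *= len(values)
print(space_size)  # 3 * 3 * 4 * 3 = 108
```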


In this work, we assume discrete domains for the parameters. Thus, the algorithm configuration problem is a combinatorial optimization problem, where numerical parameters are discretized.

Definition 1 (Algorithm configuration problem [17]). An instance of the AC consists of a 5-tuple P = (A, Θ, P, κ_max, o), where A is the target algorithm, Θ is the space of configurations for A, and P denotes the set of input problem instances. c(θ, p) is a function that computes the expected cost (e.g., spent CPU time) of running A on the instance p ∈ P when using configuration θ and cutoff time κ_max. κ_max is a cutoff time, after which each run of A will be terminated if it is still running. Any configuration θ ∈ Θ is a candidate configuration of P. The cost of a candidate configuration θ is given by:

C_P(θ) = mean_{p ∈ P} c(θ, p)    (1)
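A direct reading of Eq. (1), with runtimes capped at the cutoff κ_max, can be sketched as follows. The `target_algorithm` interface (returning the CPU time spent, or None on timeout) is an assumption made purely for illustration:

```python
import statistics

KAPPA_MAX = 5.0  # cutoff time kappa_max in seconds (the value used later in the experiments)

def run_cost(target_algorithm, theta, p, kappa_max=KAPPA_MAX):
    """c(theta, p): CPU time spent by the target algorithm on instance p with
    configuration theta, capped at kappa_max (a timeout counts as kappa_max)."""
    runtime = target_algorithm(theta, p, timeout=kappa_max)  # assumed to return seconds, or None on timeout
    return kappa_max if runtime is None else min(runtime, kappa_max)

def configuration_cost(target_algorithm, theta, instances):
    """C_P(theta): mean capped cost over the instance set P, as in Eq. (1)."""
    return statistics.mean(run_cost(target_algorithm, theta, p) for p in instances)
```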

The definition considers the mean of the costs induced by c(θ, p); however, any other statistic could be used instead (e.g., median, variance). An optimal configuration θ* minimizes C_P(θ):

θ* ∈ arg min_{θ ∈ Θ} C_P(θ)

The optimal cost is C* = C_P(θ*). In the following, we consider that c(θ, p) corresponds to the expected CPU time spent by A on an instance p ∈ P when using the configuration θ and a cutoff time κ_max. If the expected time is greater than κ_max, then c(θ, p) = κ_max.

3. Basic concepts related to the filtering process

Before describing our filtering method for algorithm configuration, we define some concepts involved in the filtering process. We call filtering the process of removing values from the domains of the parameters, such that the space generated by the Cartesian product of the reduced domains has the same optimal cost as the original space. Our filtering process involves the use of the following definition.

Definition 2 (Domination and strict domination). Consider the AC instance P = (A, Θ, P, κ_max, o). Let θ′_i and θ″_i be two values belonging to the domain Θ_i. We say that, for the instantiation of the parameter p_i, θ′_i dominates θ″_i in Θ if every pair of configurations θ′, θ″ belonging to Θ, such that θ′ = (θ_1, ..., θ_{i−1}, θ′_i, θ_{i+1}, ..., θ_d) and θ″ = (θ_1, ..., θ_{i−1}, θ″_i, θ_{i+1}, ..., θ_d), satisfies:

C_P(θ′) ≤ C_P(θ″)    (2)

Furthermore, if for all p ∈ P:

c(θ′, p) ≤ c(θ″, p)    (3)

we say that θ′_i strictly dominates θ″_i in Θ. We remark that removing parameter values dominated by others allows us to filter the configuration space without discarding optimal configurations. However, this does not mean that these parameter values (or algorithm components) are ineffective: they may still be good on single instances belonging to P. The strict domination definition allows us to detect those algorithm components that are truly ineffective. For instance, if a parameter value θ″_i is strictly dominated by another value θ′_i, then setting the parameter p_i to θ″_i makes the target algorithm perform no better than setting p_i to θ′_i on every single instance in P. Thus, the component related to the instantiation p_i = θ″_i may be discarded.
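To make the distinction concrete, here is a small hypothetical example (the numbers are purely illustrative and not taken from the paper). Assume P = {p_1, p_2} and, for simplicity, that the costs below hold for every completion of the remaining parameters:

c(θ′, p_1) = 1.0,  c(θ″, p_1) = 3.0
c(θ′, p_2) = 2.0,  c(θ″, p_2) = 1.5

Then C_P(θ′) = 1.5 ≤ C_P(θ″) = 2.25, so θ′_i dominates θ″_i (condition (2) holds) and θ″_i can be removed without losing an optimal configuration. However, c(θ′, p_2) > c(θ″, p_2), so condition (3) fails and the domination is not strict: θ″_i is still the better choice on instance p_2 and should not be declared ineffective.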

The following definition formalizes the local consistency we attempt to reach.

Definition 3 (Non-dominated consistency). Consider the AC instance P = (A, Θ, P, κ_max, o). We say that a domain Θ_i is non-dominated consistent if for any θ″_i ∈ Θ_i there is no θ′_i ∈ Θ_i such that, for the parameter p_i, θ′_i strictly dominates θ″_i. The space Θ = Θ_1 × ... × Θ_d is non-dominated consistent in P if all the domains Θ_i are non-dominated consistent.

In practice, we cannot reach non-dominated consistency because it is too expensive to compute the domination condition for two values of one parameter. Thus, to decide whether, for a parameter p_i, a value θ′_i strictly dominates another value θ″_i, we apply Definition 2 with a reduced sample of pairs of configurations (θ′, θ″) representing the universe Θ² and a reduced sample of instances from P representing the set of instances. This leads us to the following definition.

Definition 4 (Quasi-domination (strict)). Consider the AC instance P = (A, Θ, P, κ_max, o). Let θ′_i and θ″_i be two values belonging to Θ_i. Consider a sample of tuples S = {(θ′^(1), θ″^(1), p^(1)), ..., (θ′^(m), θ″^(m), p^(m))}, where θ′^(k) ∈ Θ, θ″^(k) ∈ Θ and p^(k) ∈ P for all k = 1, ..., m. Each tuple verifies θ′^(k)_j = θ″^(k)_j for all j ≠ i, θ′^(k)_i = θ′_i and θ″^(k)_i = θ″_i. We say that, for the parameter p_i and based on the sample S, θ′_i quasi-dominates θ″_i in Θ if:

∀ (θ′, θ″, p) ∈ S :  c(θ′, p) ≤ c(θ″, p)    (4)
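A minimal sketch of the quasi-domination test of Definition 4, assuming a cost function c(θ, p) is available (for example, the capped runtime sketched in Section 2) and the dictionary-based domain representation used above; the sample is built so that the two configurations of each tuple differ only in parameter i:

```python
import random

def build_sample(domains, i, v1, v2, instances, m, rng=None):
    """Build the sample S of Definition 4: m tuples (theta1, theta2, p) in which the
    two configurations share random values for every parameter except parameter i."""
    rng = rng or random.Random()
    sample = []
    for _ in range(m):
        context = {name: rng.choice(values) for name, values in domains.items()}
        theta1 = {**context, i: v1}
        theta2 = {**context, i: v2}
        sample.append((theta1, theta2, rng.choice(instances)))
    return sample

def quasi_dominates(cost, sample):
    """Condition (4): v1 quasi-dominates v2 if it is never worse on any tuple of S."""
    return all(cost(theta1, p) <= cost(theta2, p) for theta1, theta2, p in sample)
```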

4. The NODOM-C Algorithm

Our NOn-DOMinated Consistency algorithm (NODOM-C) attempts to reach the largest non-dominated consistent space of configurations. It can be summarized in the following steps:

1. Begin with an initial search space Θ corresponding to all the parameter configurations.
2. Reduce Θ by filtering the domain of each parameter p_i independently: the algorithm removes values from the domain Θ_i if they are quasi-dominated by some other value in Θ_i.
3. Repeat step 2 a given number of times or until no changes are produced in Θ.

Note that step 2 tries to reach a non-dominated consistent domain for the parameter p_i. The algorithm must iterate because the different parameters are usually correlated; reductions in one parameter domain may eventually imply further filtering in other domains.

Algorithm 1. filter_P(in: i, m, a_t; inout: Θ, comp_values[i])

1:  for all k ∈ {1, ..., m} do
2:    p ← get_random_instance(P)
3:    θ^H ← get_random_configuration(Θ)
4:    L_θ ← {}
5:    for all θ_i ∈ Θ_i do
6:      θ ← (θ^H_1, ..., θ^H_{i−1}, θ_i, θ^H_{i+1}, ..., θ^H_d)
7:      L_θ ← L_θ ∪ {θ}
8:      Compute c(θ, p)   /* to be used in the comparison phase */
9:    end for
10:   /** Comparison phase **/
11:   Initialize the elements of the matrix D_{d×d} to true
12:   for all (θ′, θ″) ∈ L_θ², θ′ ≠ θ″ do
13:     if c(θ′, p) > (1 + a_t) · c(θ″, p) then
14:       D[θ′_i][θ″_i] ← false
15:     end if
16:   end for
17:   for all θ_i ∈ Θ_i do c_T[θ_i] ← c_T[θ_i] + c(θ, p) end for
18: end for
19: /** Filtering phase **/
20: for all (θ′_i, θ″_i) ∈ Θ_i² do
21:   if D[θ′_i][θ″_i] and c_T[θ″_i] > c_T[θ′_i] then
22:     if D[θ″_i][θ′_i] then comp_values[i].add(θ′_i, θ″_i) end if
23:     remove(θ″_i, Θ_i)
24:   end if
25: end for
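The following Python sketch mirrors Algorithms 1 and 2 under simplifying assumptions: configurations are dictionaries mapping parameter names to values (as in the earlier sketches), `cost(theta, p)` is some cost function such as the capped runtime of Section 2, and instances/configurations are sampled uniformly. The domination matrix D is initialized once, so that it accumulates the result of all m comparisons, as described in the text below. This is an illustration of the filtering logic, not the authors' implementation:

```python
import random
from collections import defaultdict

def filter_parameter(domains, i, m, a_t, cost, instances, comp_values, rng):
    """One call of filter_P (Algorithm 1): filter the domain of parameter i in place."""
    values = list(domains[i])
    # D[v1][v2] == True means "v1 quasi-dominates v2 so far".
    D = {v1: {v2: True for v2 in values if v2 != v1} for v1 in values}
    c_T = defaultdict(float)  # total cost per value, used as a tie breaker
    for _ in range(m):
        p = rng.choice(instances)                      # get_random_instance(P)
        context = {name: rng.choice(vals) for name, vals in domains.items()}
        costs = {}
        for v in values:                               # same context and instance for every value
            costs[v] = cost({**context, i: v}, p)
            c_T[v] += costs[v]
        # Comparison phase (lines 12-16): v1 loses its domination claim over v2
        # whenever it is more than a factor (1 + a_t) slower on this instance.
        for v1 in values:
            for v2 in values:
                if v1 != v2 and costs[v1] > (1 + a_t) * costs[v2]:
                    D[v1][v2] = False
    # Filtering phase (lines 19-25): remove quasi-dominated values, record comparable pairs.
    for v1 in values:
        for v2 in values:
            if v1 == v2 or v1 not in domains[i] or v2 not in domains[i]:
                continue
            if D[v1][v2] and c_T[v2] > c_T[v1]:
                if D[v2][v1]:                          # mutual quasi-domination => comparable
                    comp_values[i].add((v1, v2))
                domains[i].remove(v2)

def nodom_c(domains, m, a_t, max_iter, cost, instances, seed=0):
    """NODOM-C (Algorithm 2): repeat the per-parameter filtering until a fixpoint or max_iter."""
    rng = random.Random(seed)
    comp_values = {i: set() for i in domains}
    for _ in range(max_iter):
        before = {i: list(vals) for i, vals in domains.items()}
        for i in domains:
            filter_parameter(domains, i, m, a_t, cost, instances, comp_values, rng)
        if before == {i: list(vals) for i, vals in domains.items()}:
            break
    return domains, comp_values
```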

Algorithm 1 filters a domain Θ_i. To filter, the algorithm performs m all-against-all pairwise comparisons among the different values in Θ_i. In each comparison, a random instance p is extracted from the training set. In addition, a random parameter configuration θ^H is obtained from the current search space Θ. For each value θ_i in Θ_i, a configuration θ is generated. The configurations are put in a list L_θ, and their costs are computed. Note that the computed costs are (re)used in the comparison phase, thus avoiding unnecessary runs of the target algorithm. After the large loop (lines 1–18), each element D[θ′_i][θ″_i] indicates whether, according to condition (4), θ′_i quasi-dominates θ″_i. Note that the comparison, performed in line 13, includes a parameter a_t > 0. a_t is a tolerance parameter that allows one to relax condition (4) in order to find more dominated values. The last for-all loop performs the filtering. A value θ″_i is removed from Θ_i only if it is quasi-dominated by another value θ′_i ∈ Θ_i and c_T[θ″_i] > c_T[θ′_i]. c_T[θ_i] corresponds to a total cost related to the parameter value θ_i, and it is computed incrementally in line 17 (the elements of the array must be initialized to 0). This total cost is used as a tie breaker when two parameter values dominate each other. When this occurs, the eliminated value θ″_i and the non-eliminated one θ′_i are considered comparable; when instantiating p_i to θ′_i we should obtain a performance similar to that obtained when instantiating p_i to θ″_i.

Algorithm 2. NODOM-C_P(in: m, a_t, max_iter; inout: Θ; out: comp_values)

comp_values ← [{}, ..., {}]; iter ← 0
repeat
  iter ← iter + 1
  Θ_old ← Θ
  for all i ∈ {1, ..., d} do
    filter_P(i, m, a_t, Θ, comp_values[i])
  end for
until Θ_old = Θ or iter = max_iter

Finally, Algorithm 2 shows the implementation of the NODOM-C algorithm. It simply consists of a repeat-until loop that performs the filtering of each parameter domain, by using Algorithm 1, until no further reduction is produced in Θ. At the end, the algorithm returns a filtered space Θ and comp_values, an array of sets of comparable values for each parameter.

4.1. Optimizing the pairwise comparison

When Algorithm 1 performs the all-against-all pairwise comparison, we can know, before finishing the m comparisons, whether a value θ^H_i ∈ Θ_i is incomparable with any value in Θ_i, i.e., θ^H_i neither quasi-dominates any other value nor is quasi-dominated by any other value in the domain. Formally, θ^H_i is incomparable when:

(∀ θ_i ∈ Θ_i s.t. θ_i ≠ θ^H_i) :  (D[θ_i][θ^H_i] ∨ D[θ^H_i][θ_i]) ⟺ false

Incomparable values can neither be discarded nor discard other values from the domains. Thus, to avoid performing useless comparisons and runs of the target algorithm, we add a black list (set) Θ^bl_i storing the incomparable values related to the parameter p_i. The set Θ^bl_i should be initialized to an empty set before the first line of Algorithm 1. The following code updates Θ^bl_i, and it is added after line 17:

for all θ^H_i ∈ (Θ_i \ Θ^bl_i) do
  if (∀ θ_i ∈ Θ_i, θ_i ≠ θ^H_i) : (D[θ_i][θ^H_i] ∨ D[θ^H_i][θ_i]) ⟺ false then
    Θ^bl_i ← Θ^bl_i ∪ {θ^H_i}
  end if
end for

Line 5 is replaced by: for all θ_i ∈ (Θ_i \ Θ^bl_i) do; and line 20 by: for all (θ′_i, θ″_i) ∈ (Θ_i \ Θ^bl_i)².

5. Experiments

We performed several experiments to analyze different aspects of our approach. The main objectives of these experiments are:

1. to evaluate the filtering power of NODOM-C related to discarding useless parameter values (Section 5.1);
2. to evaluate the reliability of NODOM-C related to (a) not discarding useful parameter values (Section 5.2), (b) returning only useful parameter values (Section 5.3) and (c) finding sets of comparable values (Section 5.4);
3. to evaluate the practical interest of NODOM-C related to simplifying algorithms by removing useless or redundant components (Section 5.5).

Target algorithms and instance sets

We selected the target algorithms SPEAR [15] and CPLEX [1]. These algorithms have already been used to test configuration algorithms [17,19].

- SPEAR is a tree search algorithm for solving SAT problems. Configured with ParamILS, SPEAR won the quantifier-free bit-vector arithmetic category of the 2007 Satisfiability Modulo Theories Competition [15]. 26 parameters with a total number of 177 values have been considered. This algorithm is configured using two sets of instances: SAT-encoded quasi-group completion problems (QCP) and SAT-encoded graph-coloring problems based on small-world graphs (SW-GCP).
- CPLEX is a well-known algorithm for solving mixed integer programming problems. Among the 159 user-specifiable parameters, only 76 have been chosen. Hutter et al. [17] claim to have carefully chosen these 76 parameters, keeping only those that affect CPLEX's search trajectory. A total number of 346 values for the parameters have been considered. CPLEX was configured using one set of instances, the MIP-encoded winner determination problem for combinatorial auctions (Regions100).

Parameters with continuous domains were discretized, and a reduced number of values were considered for parameters with


Fig. 1. Average size of the search space reported after each iteration for 10 independent runs of NODOM-C.

integer domains. We used the same values proposed by Hutter et al. [17]. The authors performed the selection of values using different criteria (e.g., according to intuition based on experience, or choosing values uniformly distributed over the continuous domain). A detailed list of all the parameters and their domains can be found at http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/algorithms.html.

Our configuration algorithm uses a training set to filter the parameter domains of the target algorithm. Since, in some cases, configuration algorithms find configurations adapted specifically to the instances in the training set (see over-tuning [8,7,15]), we decided to evaluate the results using a new set of test instances. Thus, for each problem we use two sets of 1000 instances: a training set and a test set. The complete list can be found at http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/results.html.

Parameters of the configurator

All of our experiments were carried out on a PowerEdge T420 server with two 2.20 GHz quad-core Intel Xeon processors and 8 GB RAM running Ubuntu Linux. For the AC, we set a cutoff time of κ_max = 5 s (the same value used in [17]). The tolerance parameter was fixed to a_t = 0.1, i.e., we consider that two costs are equivalent if the difference between them is at most 10% of the lowest cost. The parameter m was fixed to 50. The number of iterations was fixed to 3 (i.e., max_iter = 3). The expected cost c(θ, p) corresponds to the CPU time spent by the target algorithm on the instance p with the configuration θ.

5.1. Discarding ineffective values: filtering power

The first results give us an idea of how much we can reduce the search space by removing ineffective values. Consider that many of these values correspond to components of the target algorithm (others are related to numerical parameters). Thus, we could remove these components without affecting the performance. The plots of Fig. 1 show the average size of the search space computed at the end of each iteration for each scenario. The size

at iteration 0 corresponds to the initial search space. Note that for each target algorithm, the largest reduction in the size of the search space is produced by the first iteration, where the size decreases by several orders of magnitude (the plots are given on a logarithmic scale). Fig. 2 is similar to Fig. 1: it shows the average sum of the domain cardinalities computed at the end of each iteration. Note that during the first iteration about 50% of the values are removed from the domains of the parameters. Fig. 3 shows the CPU time spent by the filtering algorithm at each iteration. Note that the time spent by the target algorithm in each iteration of NODOM-C is bounded by:

t ≤ av_target_time · m · Σ_{i=1}^{d} card(Θ_i)    (5)

where av_target_time is the average CPU time spent by one run of the target algorithm (av_target_time ≤ κ_max) and Σ_{i=1}^{d} card(Θ_i) corresponds to the sum of the current domain cardinalities. Thus, as the iterations continue, the iteration time is reduced because (1) the cardinality of each domain is also reduced (Fig. 2) and (2) the search space contains better configurations, reducing the average CPU time spent by the target algorithm on each instance.

Table 1 shows the average costs of different configurations on the test sets for each scenario. The table also reports the average CPU time required to obtain these configurations. Configurations were obtained by applying the following strategies: random configurations from the initial space of configurations Θ, configurations obtained by applying FocusedILS to the initial space of configurations, random configurations from the space of configurations Θ_F filtered by NODOM-C, and configurations obtained by applying FocusedILS to this filtered space. The application of FocusedILS was restricted to 5 CPU hours. Each strategy was run 10 times, obtaining 10 different configurations for each case (the average costs are shown in the table).
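As an illustration of the bound (5), consider the first NODOM-C iteration on the SPEAR scenarios with the settings used here (κ_max = 5 s, m = 50) and SPEAR's 177 initial parameter values. Since av_target_time ≤ κ_max, the time spent by the target algorithm in that iteration is at most

t ≤ 5 s · 50 · 177 = 44,250 s ≈ 12.3 h,

a worst-case figure: in practice most runs finish well before the cutoff, and later iterations operate on smaller domains.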


Fig. 2. Average sum of the cardinalities of the different domains reported after each iteration for 10 independent runs of NODOM-C.

Fig. 3. Average CPU time spent by the NODOM-C algorithm in each of the three iterations.

Note that random configurations in the filtered space are, on average, considerably better than random configurations in the initial space. Also note that FocusedILS finds comparable configurations in both spaces. Thus, although NODOM-C does not seem to improve the efficacy of FocusedILS or find better configurations, it does, at least, filter the initial space while preserving the best configurations. In the following experiments, when we refer to the filtered space returned by NODOM-C for a given scenario, we actually refer

to the largest space among the 10 filtered spaces returned by NODOM-C.

5.2. Reliability A: does NODOM-C discard only ineffective values?

Our hypothesis is that NODOM-C discards only parameter values which are useless on every single instance of the training set. In the following experiment, we attempt to refute this statement. Thus, the experiment tries to find a configuration


Table 1
Comparison of different configurations for each scenario. The top value of each entry reports the cost (i.e., the average runtime over the set of instances) of the related configuration applied to the given scenario; the bottom value reports the average CPU time, in hours, spent by the tuning strategy to obtain the related configuration. Both results are expressed as a confidence interval (mean ± error) with α = 0.05. Each cell below is written as cost / tuning time.

Scenario          Random (Θ)          FocusedILS (Θ)      Random (Θ_F)              FocusedILS (Θ_F)
Spear (SW-GCP)    2.41 ± 0.31 / 0.0   1.02 ± 0.02 / 5.0   1.26 ± 0.25 / 5.7 ± 0.5   1.00 ± 0.02 / 10.7 ± 0.5
Spear (QCP)       0.55 ± 0.12 / 0.0   0.18 ± 0.01 / 5.0   0.24 ± 0.05 / 2.1 ± 0.2   0.21 ± 0.05 / 7.1 ± 0.2
CPLEX (Reg.100)   3.91 ± 0.24 / 0.0   0.19 ± 0.01 / 5.0   1.67 ± 0.12 / 22.9 ± 2.1  0.18 ± 0.01 / 27.9 ± 2.1

Table 2
Comparison between near-optimal configurations in the initial space Θ and near-optimal configurations in the filtered space Θ_F. The average costs are expressed as a confidence interval (mean ± error) with α = 0.05.

Scenario              Proportion of cases c(θ^i, p_i) + 0.02 < c(θ^i_F, p_i)   Average cost (Θ)   Average cost (Θ_F)
Spear (SW-GCP)        0.032                                                    0.38 ± 0.01        0.38 ± 0.02
Spear (QCP)           0.028                                                    0.06 ± 0.00        0.06 ± 0.00
CPLEX (Regions100)    0.008                                                    0.19 ± 0.01        0.19 ± 0.01

Table 3
Number of parameter values with at least one comparable value (#pv with cv), average number of comparable values for a parameter value (av #cv) and maximum number of comparable values for a parameter value (max #cv) for each scenario.

Scenario              #pv with cv   av #cv   max #cv
Spear (SW-GCP)        14            3.9      16
Spear (QCP)           8             1.4      2
CPLEX (Regions100)    46            2.5      6

outside the filtered space that is better than all the configurations inside the filtered space for some single instance in the training set. To find this outer configuration, we search for the best configuration θ^i in the whole space for each instance p_i in the training set. In the same way, we search for the best configuration θ^i_F in the filtered space for each instance. Then, we compare θ^i and θ^i_F when solving the instance p_i. According to our hypothesis, θ^i should never be better than θ^i_F. The second column of Table 2 shows the proportion of instances in the corresponding training set for which the estimated cost of θ^i_F is greater than the estimated cost of θ^i. To estimate the optimal configuration cost for a given scenario, space and instance p_i, we first run ParamILS restricted to 150 iterations on the given instance, obtaining a near-optimal configuration θ^i. The estimated cost then corresponds to the average time spent by the target algorithm (10 runs with different seeds) on solving p_i using the configuration θ^i. The third column of the table shows the average costs over the set of training instances of each scenario. Note that the proportion of cases in which a configuration θ^i is better than θ^i_F is rather low. This provides partial evidence for the hypothesis, although it also means that some (few) important values are likely discarded by NODOM-C from the parameter domains. Discarding important values is due to two causes: (1) the relation of dominance between two values is computed by comparing just a reduced sample of configurations (m = 50), and (2) repeating the filtering process on each variable increases the probability of mistakenly removing a value.

5.3. Reliability B: does NODOM-C find a set of only effective values?

No experiment is required to answer this question. Recall that in the filtering phase of NODOM-C (see Algorithm 1), values are eliminated from the domains if and only if they are shown to be strictly dominated by others. Thus, any non-eliminated value is not strictly dominated by another in the current domain. If we define as effective values those values which are not strictly dominated by others, then all the values in the filtered domains are effective (assuming, of course, that the domains were not reduced in the last iteration of NODOM-C, in which case the domains can still contain some ineffective values).

Table 4
Proportion of instances in which θ′ and a configuration from a given space (Θ_C or Θ_NC) have comparable costs. Θ_C is the space of configurations comparable to θ′ found by NODOM-C. Θ_NC is a set of configurations which are not comparable to θ′ according to NODOM-C.

Scenario              Θ_C (Train.)   Θ_C (Test.)   Θ_NC (Train.)   Θ_NC (Test.)
Spear (SW-GCP)        0.98           0.98          0.34            0.32
Spear (QCP)           0.98           0.97          0.90            0.85
CPLEX (Regions100)    0.97           0.96          0.13            0.07

5.4. Reliability C: are the comparable values found by NODOM-C actually comparable?

Recall that for any parameter value, NODOM-C retains a set of comparable values in the comp_values array. In this section, we show evidence that the comparable parameter values found by NODOM-C have similar performance. Basically, in the experiments, we observed whether the performance of the target algorithm is maintained when a parameter value is replaced by a comparable value. Table 3 shows the number of parameter values with at least one comparable value for each scenario (column 2). The third column shows the average number of comparable values for each such parameter value, and the fourth column shows the maximum number of comparable values for a parameter value. For instance, in the CPLEX scenario (Regions100), there are 46 parameter values with comparable values, each with an average of 2.5 comparable values. The value 5 = shifting for the parameter lpmethod (method for linear optimization) has the maximum number of comparable values (0 = automatic, 1 = primal simplex, 2 = dual simplex, 3 = network simplex, 4 = barrier, 6 = concurrent dual and barrier); thus, replacing the shifting method by any of these comparable methods should not affect the performance of CPLEX on any instance.

In the experiments, we took a competitive configuration θ′ from the filtered space Θ_F. Then, we constructed Θ_C, a space of configurations comparable to θ′, initializing each parameter domain Θ_i with the set of comparable values found by NODOM-C related to θ′_i and the parameter p_i. We generated 10 random configurations θ_c from Θ_C and we compared each of them to θ′ to verify whether they are actually comparable on every single instance. Table 4 reports


Table 5
Performance comparison between the original and the simplified target algorithms (based on the results reported by NODOM-C).

Algorithm (scenario)   Whole set of instances   Individual instances
Spear (SW-GCP)         1.00 ± 0.02              0.39 ± 0.01
sSpear (SW-GCP)        1.01 ± 0.02              0.38 ± 0.01
Spear (QCP)            0.18 ± 0.05              0.07 ± 0.0
sSpear (QCP)           0.20 ± 0.11              0.06 ± 0.0
CPLEX (Reg.100)        0.19 ± 0.01              0.19 ± 0.01
sCPLEX (Reg.100)       0.19 ± 0.01              0.19 ± 0.00

the proportion of instances in which θ_c and θ′ have comparable costs. (We consider that two configurations θ′ and θ″ have comparable costs for an instance p if their costs c1 = c(θ′, p) and c2 = c(θ″, p) satisfy max(|c1 − c2| − 0.02, 0) / max(c1, c2) < 0.2.) Θ_NC ⊂ Θ_F corresponds to a set of 10 configurations generated by replacing the parameter values (with comparable values) of θ′ with random non-comparable values. In other words, Θ_NC is a set of configurations which are not comparable to θ′ according to NODOM-C. Observe that configurations in Θ_C are clearly comparable to θ′, while configurations in Θ_NC are, in general, not comparable to it.

5.5. Using NODOM-C to simplify the target algorithm

Finally, we note that the results reported by NODOM-C (ineffective and comparable parameter values) may be used to reduce the complexity of the target algorithm in at least three different ways:

1. We can find useless components by searching those filtered domains containing only the one value that disables a component (e.g., sp-clause-inversion=false in the Spear algorithm).
2. Any other filtered domain may also be used to find useless components. For instance, the parameter lpmethod of CPLEX initially has 7 possible values: 0 = automatic, 1 = primal simplex, 2 = dual simplex, 3 = network simplex, 4 = barrier, 5 = shifting, 6 = concurrent dual and barrier. If, for instance, the domain of this parameter is reduced to {1, 6}, we could most likely remove the components related to the network simplex and shifting methods, because they do not seem to be related to methods 1 and 6. However, if the domain is reduced to a set of values containing 0, it is not clear whether we can remove some component (automatic means that any of the methods could be chosen).
3. Comparable values can help to discard components. Suppose that the filtered domain of the parameter lpmethod is {0} with the comparable value 1. Then, we could remove all the components related to the parameter except the primal simplex method.

Taking into account these three methods for simplification, we estimate that, assisted by NODOM-C, we could discard approximately 40 components of the CPLEX algorithm and approximately 12 components of the SPEAR algorithm. In other words, NODOM-C indicates that these components are unnecessary for solving the corresponding instances. We simulated the implementation of a simplified CPLEX algorithm (sCPLEX) and a simplified SPEAR algorithm (sSpear) by reducing their parameter domains according to their seemingly useless components. Then, we performed two tests to compare the performance of the original and the simplified target algorithms. Table 5 reports the results. The second column shows the results of the first test. We generated 10 configurations for each scenario.

Each configuration was generated by applying FocusedILS (restricted to 5 h) to the given scenario (target algorithm + instance training set). The cells report the average cost (mean ± error) of the 10 configurations found by FocusedILS. The third column reports the results of the second test. For each instance in the given scenario, we searched for the best configuration using FocusedILS restricted to 10 min. The table reports the average costs of the set of configurations. Note that the times are lower because in this case we used a different ad hoc configuration for each instance. Note that, according to these experiments, the original and the simplified target algorithms have virtually equivalent performances.

6. Conclusion

In this paper, we propose NODOM-C, a filtering method for the algorithm configuration problem. NODOM-C is simple (model-free), and it is applicable to target algorithms with a large number of components/parameters. The main focus of NODOM-C is the algorithm design process. The experiments in Section 5.5 highlight that our approach can be a very useful tool to automatically detect ineffective or comparable (interchangeable) components, methods or strategies. Furthermore, the experiments in Sections 5.2–5.4 show that NODOM-C is quite reliable, i.e., it mainly discards ineffective parameter values from the parameter domains. To increase the reliability of NODOM-C, in future work we plan to incorporate a stochastic technique to estimate the distribution of the difference between the costs of two parameter values. This estimation would be used to improve the accuracy of the filtering.

Acknowledgments

This work is supported by the Fondecyt Project 1120781 and Fondecyt Project 11121366.

References

[1] CPLEX, 11.0 User's Manual, 2008.
[2] B. Adenso-Diaz, M. Laguna, Fine-tuning of algorithms using fractional experimental designs and local search, Oper. Res. 54 (1) (2006) 99–114.
[3] C. Ansotegui, M. Sellmann, K. Tierney, A gender-based genetic algorithm for the automatic configuration of solvers, in: Proc. CP, Springer, 2009, pp. 142–157.
[4] P. Balaprakash, M. Birattari, T. Stutzle, Improvement strategies for the F-race algorithm: sampling design and iterative refinement, in: Proc. of the 4th International Conference on Hybrid Metaheuristics, Springer, 2007, pp. 108–122.
[5] T. Bartz-Beielstein, Experimental research in evolutionary computation: the new experimentalism, Nat. Comput. Ser. XIV (2006).
[6] T. Bartz-Beielstein, M. Preuss, The future of experimental research, in: Proc. GECCO 2009, ACM Press, 2009, pp. 3185–3226.
[7] M. Birattari, The Problem of Tuning Metaheuristics: As Seen from a Machine Learning Perspective, IOS Press, 2005.
[8] M. Birattari, T. Stutzle, L. Paquete, K. Varrentrapp, A racing algorithm for configuring metaheuristics, in: Proc. GECCO, vol. 2, Morgan Kaufmann, 2002, pp. 11–18.
[9] M. Birattari, Z. Yuan, P. Balaprakash, T. Stützle, F-race and iterated F-race: an overview, in: T. Bartz-Beielstein, M. Chiarandini, L. Paquete (Eds.), Experimental Methods for the Analysis of Optimization Algorithms, Springer, 2010.
[10] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[11] S. Coy, B. Golden, G. Runger, E. Wasil, Using experimental design to find effective parameter settings for heuristics, J. Heuristics 7 (1) (2001) 77–97.
[12] A. Czarn, C. MacNish, K. Vijayan, B. Turlach, R. Gupta, Statistical exploratory analysis of genetic algorithms, IEEE Trans. Evolut. Comput. 8 (2004) 405–421.
[13] A. Eiben, S. Smit, Parameter tuning for configuring and analyzing evolutionary algorithms, Swarm Evol. Comput. 1 (1) (2011) 19–31.
[14] O. François, C. Lavergne, Design of evolutionary algorithms – a statistical perspective, IEEE Trans. Evol. Comput. 5 (2001) 129–148.
[15] F. Hutter, D. Babic, H. Hoos, A. Hu, Boosting verification by automatic tuning of decision procedures, in: Proc. Uncertainty in Artificial Intelligence (UAI'01), Morgan Kaufmann, 2007, pp. 235–244.

I. Araya, M.-C. Riff / Knowledge-Based Systems 60 (2014) 73–81 [16] F. Hutter, H. Hoos, K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, in: LION 2011, Springer, 2011, pp. 507–523. [17] F. Hutter, H. Hoos, K. Leyton-Brown, T. Stutzle, Paramils: an automatic algorithm configuration framework, J. Artif. Intell. Res. 36 (1) (2009) 267–306. [18] F. Hutter, H. Hoos, T. Stutzle, Automatic algorithm configuration based on local search, in: Proc. AAAI, AAAI Press, 2007, pp. 1152–1157. [19] F. Hutter, H.H. Hoos, K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, in: LION-5, Springer, 2011, pp. 507–523. [20] O. Maron, A. Moore, The racing algorithm: model selection for lazy learners, Artif. Intell. Rev. 11 (1) (1997) 193–225. [21] R. Myers, E. Hancock, Empirical modelling of genetic algorithms, Evol. Comput. 9 (2001) 461–493.

81

[22] I. Ramos, M. Goldbarg, E. Goldbarg, A. Neto, Logistic regression for parameter tuning on an evolutionary algorithm, in: Proc. CEC Congress, IEEE Press, Edinburgh, UK, 2005, pp. 1061–1068. [23] C. Rasmussen, C. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006. [24] S. Smit, A. Eiben, Comparing parameter tuning methods for evolutionary algorithms, in: Evolutionary Computation, 2009. CEC’09, IEEE, 2009, pp. 399– 406. [25] S. Smit, A. Eiben, Z. Szlávik, An moea-based method to tune ea parameters on multiple objective functions, in: IJCCI, SciTePress, 2010, pp. 261–268. [26] G. Taguchi, T. Tokotama, Taguchi Methods: Design of Experiments, ASI Press, 1993.