A filtering method for algorithm configuration based on consistency techniques

Knowledge-Based Systems 60 (2014) 73–81


Ignacio Araya, María-Cristina Riff
Depto. Informática, Universidad Técnica Federico Santa María, Av. España 1680, V Región, Chile

Article history: Received 9 January 2013; Received in revised form 2 January 2014; Accepted 6 January 2014; Available online 11 January 2014

Keywords: Parameter tuning; Constraint satisfaction problems; Consistency techniques; Algorithm configuration; Algorithm design

Abstract

Heuristic-based algorithms are typically constructed following an iterative process in which the designer gradually introduces or modifies components or strategies whose performance is then tested by empirical evaluation on one or more sets of benchmark problems. This process often starts with some generic or broadly applicable problem-solving method (e.g., metaheuristics, backtracking search), a new algorithmic idea or even an algorithm suggested by theoretical considerations. Then, through an iterative process, various combinations of components, methods and strategies are implemented/improved and tested. Even experienced designers often have to spend substantial amounts of time exploring and experimenting with different alternatives before obtaining an effective algorithm for a given problem. In this work, we are interested in assisting the designer in this task. Considering that components, methods and strategies are generally associated with parameters and parameter values, we propose a method able to detect, through a fine-tuning process, ineffective and redundant components/strategies of an algorithm. The approach is a model-free method and applies simple consistency techniques in order to discard values from the domains of the parameters. We validate our approach with two algorithms for solving SAT and MIP problems.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

When we design an algorithm for problem solving, we generally address several important decisions related to its components, such as which variable-ordering heuristic to use, which transformation algorithm to use, and whether to include some specific methods. These decisions can be handled by algorithm parameters. We can then apply a fine-tuning process to determine which components are crucial for the algorithm's performance. There exist several automated procedures to find the optimal instantiation of the parameters (called a configuration here) of a given algorithm. The problem of finding the optimal configuration is known as the algorithm configuration problem (AC) [17,3,9,16]. Automated procedures for solving this problem are called configuration algorithms or configurators, while algorithms whose parameters are tuned are called target algorithms. Usually, finding near-optimal configurations allows good performance on large data sets. However, this does not directly assist with the design phase, specifically with discarding, simplifying or just better understanding the components of the target algorithm. Several configuration algorithms exist that supply information


about the parameters/components of the target algorithm in addition to finding near-optimal configurations. Sampling methods (e.g., Latin-Square [21] and Taguchi Orthogonal Arrays [26]) take a representative sample of the configuration space using a full factorial design. The different configurations are analyzed to predict which parameter values work best and which are the most robust. CALIBRA [2] and meta-GA [21] reduce the search area from which new configurations are sampled at each iteration. Thus, they can only be used to find good configurations and are not well suited to analyzing the parameters. Model-based methods construct a model based on a reduced set of configurations. This model predicts the performance of new configurations. A common approach is to use a regression method to predict the utility of a configuration [12,22,14]. Based on the model, it is possible to identify some features of the parameters. Sequential model-based methods allow for a better exploration of the configuration space. Coy et al. [11] propose a procedure consisting of a model-based method followed by a local search procedure to optimize the parameter values. SPO [5,6] goes further: in each iteration, it generates a new set of configurations and predicts their utilities using the current model. The vectors with the highest predicted performance are used to update the model for the next iteration. Finally, the procedure returns an accurate model of the most promising areas. Model-based methods commonly use


Gaussian process models [23]; thus they are limited to continuous parameters. SMAC [16] uses a new model class, based on the Hamming distance and random forests [10], to support categorical parameters. SMAC is also capable of handling multiple instances by integrating information about their features into the response surface model. Thus, the final model predicts the algorithm runtime from the configurations and instance features. The objective of racing methods [20,8,7] is to identify the best configurations from a large set, performing a minimum number of tests. In addition to finding the best configuration (with a certain confidence level), these methods provide information to estimate the robustness to changes in parameter values. Iterative F-RACE [4,9] combines racing with model-based methods. It starts by using a reduced population that represents the whole space of configurations. Using F-RACE [8,9], it reduces this population until a certain condition is met. Then, a multi-variate normal distribution fit on the surviving vectors is used as a probability density function to sample points for a new population. The entire procedure is repeated until a termination criterion is met. Like sequential model-based methods, it finds good configurations and valuable information about the parameters. Evolutionary algorithms have also been used to find configurations with high utility. The ParamILS framework [18,17] starts with a default parameter configuration and iteratively improves it by searching in a neighborhood defined as the variation of the value of only one parameter. It requires that a procedure deciding whether a configuration θ is better than a configuration θ′ be defined. A basic implementation of this procedure consists of comparing the average utilities over N runs; however, FocusedILS [17], an extension of the framework, replaces this procedure with racing. Evolutionary algorithms are very good at finding high-quality vectors; however, they do not provide any indication of the robustness of the target algorithm. REVAC [24] is a specific type of EA for configuration where the population approximates the density function of the most promising areas, similar to Iterative F-RACE. The function is decomposed by coordinates (i.e., it is blind to parameter interactions), but it can be used to analyze the sensitivity and relevance of the different parameters. An extension of REVAC [13] finds values that obtain good results across a large range of different instances. Multi-objective methods allow us to take into account multiple problems or performance indicators. M-FETA [25] models the problem as a multi-objective optimization problem. It creates a Parameter Pareto Front that can be used to evaluate robustness to changes in the problem definition, as well as performance under multiple performance criteria. An important added value of multi-objective methods with respect to other approaches lies in the insights regarding the applicability and fallibility (i.e., the relative difference between good and bad configuration performances [13]) of the target algorithm.

1.1. Our approach

In this paper we introduce NODOM-C (NOn-DOMinated Consistency algorithm), a model-free algorithm based on sampling. The goal of NODOM-C is to detect useful components and to help the algorithm designer to identify those that are ineffective. NODOM-C can work with target algorithms containing a very large number of parameters. To our knowledge, only FocusedILS, GGA [3] and SMAC (a model-based approach) work with this type of target algorithm. Of the approaches that work with a large number of parameters, only SMAC may provide information about the parameters of the target algorithm. Iterative F-RACE also provides valuable information. Both of them, however, are sophisticated model-based approaches.

NODOM-C is based on sampling. However, unlike other sampling methods, which use sampling to detect promising parameter regions, our method uses sampling to filter the configuration space. We thus refer to our approach as a filtering method for algorithm configuration. The main procedure iterates over all the parameters of the target algorithm. For each parameter, all its possible values are compared by using a sample where the rest of the parameters are set randomly from the parameter domains. Special care is taken to perform a fair comparison: two parameter values are compared using the same values for the rest of the parameters, the same instance and the same seed. The values of the other parameters, the instance and the seed change from one comparison to the next. A number m of comparisons is performed for every pair of parameter values. Then, a parameter value is eliminated from the domain if it is dominated by some other value in the domain (i.e., if in every comparison it is worse than or equal to that other value). If, after iterating over all the parameters, the configuration space has been reduced, then the whole procedure may be repeated to obtain further reductions. Finally, NODOM-C returns a reduced space of configurations and sets of comparable values for the parameters. From the results, the algorithm designer can easily deduce some interesting information: which components to discard and which components to use to perform a given task (without impacting performance). It is important to emphasize that, unlike other configuration algorithms, our goal is not to find the best configuration of the target algorithm for a given set of instances. (A best configuration works relatively well for each instance of the given set; however, it may omit some important components of the target algorithm which may be effective on a few instances of the set.) In fact, our goal is to detect ineffective components in order to reduce the complexity of the target algorithm while maintaining performance. By ineffective components we mean components that are useless on every single instance in the set. The algorithm is inspired by regression models, where we can reduce the complexity of a model by reducing the number of variables involved without decreasing the quality of the model's estimates. In Section 2, we provide a formal definition of the algorithm configuration problem. Section 3 describes some concepts and definitions used in the filtering process. In Section 4, we detail NODOM-C, our filtering method for algorithm configuration. Experiments summarizing different aspects of the approach are shown in Section 5. Conclusions are given in Section 6.

2. The algorithm configuration problem

The algorithm configuration problem (AC) can be stated as follows: given a target algorithm, a set of parameters for the algorithm and a set of input data, find parameter values under which the algorithm achieves the best performance on the input data. First, let us introduce some definitions. Let p_1, ..., p_d be the parameters of the target algorithm. The domain of possible values for each parameter p_i is denoted by Θ_i. Θ = Θ_1 × ... × Θ_d denotes the space of all feasible configurations, and θ = (θ_1, ..., θ_d) ∈ Θ corresponds to a parameter instantiation or configuration. We distinguish two main types of parameters:

- Categorical parameters: those parameters which correspond to a choice inside the target algorithm. Components, methods and strategies are included in this category.
- Numerical parameters: those parameters with real or integer domains. Some examples of numerical parameters are the population size and the mutation rate in a genetic algorithm, the temperature in simulated annealing, and the required precision of an iterative method.
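To fix ideas, a discretized configuration space can be represented simply as a mapping from parameter names to finite sets of values. The following minimal sketch uses made-up parameter names (they are not taken from SPEAR or CPLEX) and only illustrates the representation assumed in the rest of this note:

```python
# A hypothetical discretized configuration space Theta = Theta_1 x ... x Theta_d.
# Categorical parameters hold named choices; numerical ones hold a few discretized values.
domains = {
    "variable_ordering": ["activity", "random", "degree"],  # categorical (a component choice)
    "restart_strategy":  ["luby", "geometric", "none"],     # categorical
    "mutation_rate":     [0.01, 0.05, 0.1, 0.2],            # numerical, discretized
    "population_size":   [10, 50, 100],                     # numerical, discretized
}

# A configuration theta picks one value per parameter.
theta = {name: values[0] for name, values in domains.items()}

# Size of the configuration space |Theta| is the product of the domain cardinalities.
space_size = 1
for values in domains.values():
    space_size *= len(values)
print(space_size)  # 3 * 3 * 4 * 3 = 108
```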


In this work, we assume discrete domains for the parameters. Thus, the algorithm configuration problem is a combinatorial optimization problem, where numerical parameters are discretized.

Definition 1 (Algorithm configuration problem [17]). An instance of the AC consists of a 5-tuple P = (A, Θ, P, κ_max, o), where A is the target algorithm, Θ is the space of configurations for A, and P denotes the set of input problem instances. c(θ, p) is a function that computes the expected cost (e.g., spent CPU time) of running A on the instance p ∈ P when using configuration θ and cutoff time κ_max. κ_max is a cutoff time, after which each run of A will be terminated if it is still running. Any configuration θ ∈ Θ is a candidate configuration of P. The cost of a candidate configuration θ is given by:

C_P(θ) = mean_{p ∈ P} c(θ, p)    (1)
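A direct reading of Eq. (1), with runtimes capped at the cutoff κ_max, can be sketched as follows. The `target_algorithm` interface (returning the CPU time spent, or None on timeout) is an assumption made purely for illustration:

```python
import statistics

KAPPA_MAX = 5.0  # cutoff time kappa_max in seconds (the value used later in the experiments)

def run_cost(target_algorithm, theta, p, kappa_max=KAPPA_MAX):
    """c(theta, p): CPU time spent by the target algorithm on instance p with
    configuration theta, capped at kappa_max (a timeout counts as kappa_max)."""
    runtime = target_algorithm(theta, p, timeout=kappa_max)  # assumed to return seconds, or None on timeout
    return kappa_max if runtime is None else min(runtime, kappa_max)

def configuration_cost(target_algorithm, theta, instances):
    """C_P(theta): mean capped cost over the instance set P, as in Eq. (1)."""
    return statistics.mean(run_cost(target_algorithm, theta, p) for p in instances)
```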

The definition considers the mean of the costs induced by c(θ, p); however, any other statistic could be used instead (e.g., median, variance). An optimal configuration θ* minimizes C_P(θ):

θ* ∈ arg min_{θ ∈ Θ} C_P(θ)

The optimal cost is C* = C_P(θ*). In the following, we consider that c(θ, p) corresponds to the expected CPU time spent by A on an instance p ∈ P when using the configuration θ and a cutoff time κ_max. If the expected time is greater than κ_max, then c(θ, p) = κ_max.

3. Basic concepts related to the filtering process

Before describing our filtering method for algorithm configuration, we define some concepts involved in the filtering process. We call filtering the process of removing values from the domains of the parameters, such that the space generated by the Cartesian product of the reduced domains has the same optimal cost as the original space. Our filtering process involves the use of the following definition.

Definition 2 (Domination and strict domination). Consider the AC instance P = (A, Θ, P, κ_max, o). Let θ′_i and θ″_i be two values belonging to the domain Θ_i. We say that, for the instantiation of the parameter p_i, θ′_i dominates θ″_i in Θ if every pair of configurations θ′, θ″ belonging to Θ, such that θ′ = (θ_1, ..., θ_{i−1}, θ′_i, θ_{i+1}, ..., θ_d) and θ″ = (θ_1, ..., θ_{i−1}, θ″_i, θ_{i+1}, ..., θ_d), satisfies:

C_P(θ′) ≤ C_P(θ″)    (2)

Furthermore, if for all p ∈ P:

c(θ′, p) ≤ c(θ″, p)    (3)

we say that θ′_i strictly dominates θ″_i in Θ. We remark that removing parameter values dominated by others allows us to filter the configuration space without discarding optimal configurations. However, this does not mean that these parameter values (or algorithm components) are ineffective: they may still be good on single instances belonging to P. The strict domination definition allows us to detect those algorithm components that are truly ineffective. For instance, if a parameter value θ″_i is strictly dominated by another value θ′_i, then setting the parameter p_i to θ″_i makes the target algorithm perform no better than setting p_i to θ′_i on every single instance in P. Thus, the component related to the instantiation p_i = θ″_i may be discarded.
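To make the distinction concrete, here is a small hypothetical example (the numbers are purely illustrative and not taken from the paper). Assume P = {p_1, p_2} and, for simplicity, that the costs below hold for every completion of the remaining parameters:

c(θ′, p_1) = 1.0,  c(θ″, p_1) = 3.0
c(θ′, p_2) = 2.0,  c(θ″, p_2) = 1.5

Then C_P(θ′) = 1.5 ≤ C_P(θ″) = 2.25, so θ′_i dominates θ″_i (condition (2) holds) and θ″_i can be removed without losing an optimal configuration. However, c(θ′, p_2) > c(θ″, p_2), so condition (3) fails and the domination is not strict: θ″_i is still the better choice on instance p_2 and should not be declared ineffective.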

The following definition formalizes the local consistency we attempt to reach.

Definition 3 (Non-dominated consistency). Consider the AC instance P = (A, Θ, P, κ_max, o). We say that a domain Θ_i is non-dominated consistent if for any θ″_i ∈ Θ_i there is no θ′_i ∈ Θ_i such that, for the parameter p_i, θ′_i strictly dominates θ″_i. The space Θ = Θ_1 × ... × Θ_d is non-dominated consistent in P if all the domains Θ_i are non-dominated consistent.

In practice, we cannot reach non-dominated consistency because it is too expensive to compute the domination condition for two values of one parameter. Thus, to decide whether, for a parameter p_i, a value θ′_i strictly dominates another value θ″_i, we apply Definition 2 with a reduced sample of pairs of configurations (θ′, θ″) representing the universe Θ² and a reduced sample of instances from P representing the set of instances. This leads us to the following definition.

Definition 4 (Quasi-domination (strict)). Consider the AC instance P = (A, Θ, P, κ_max, o). Let θ′_i and θ″_i be two values belonging to Θ_i. Consider a sample of tuples S = {(θ′^(1), θ″^(1), p^(1)), ..., (θ′^(m), θ″^(m), p^(m))}, where θ′^(k) ∈ Θ, θ″^(k) ∈ Θ and p^(k) ∈ P for all k = 1, ..., m. Each tuple verifies θ′^(k)_j = θ″^(k)_j for all j ≠ i, θ′^(k)_i = θ′_i and θ″^(k)_i = θ″_i. We say that, for the parameter p_i and based on the sample S, θ′_i quasi-dominates θ″_i in Θ if:

∀ (θ′, θ″, p) ∈ S :  c(θ′, p) ≤ c(θ″, p)    (4)
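A minimal sketch of the quasi-domination test of Definition 4, assuming a cost function c(θ, p) is available (for example, the capped runtime sketched in Section 2) and the dictionary-based domain representation used above; the sample is built so that the two configurations of each tuple differ only in parameter i:

```python
import random

def build_sample(domains, i, v1, v2, instances, m, rng=None):
    """Build the sample S of Definition 4: m tuples (theta1, theta2, p) in which the
    two configurations share random values for every parameter except parameter i."""
    rng = rng or random.Random()
    sample = []
    for _ in range(m):
        context = {name: rng.choice(values) for name, values in domains.items()}
        theta1 = {**context, i: v1}
        theta2 = {**context, i: v2}
        sample.append((theta1, theta2, rng.choice(instances)))
    return sample

def quasi_dominates(cost, sample):
    """Condition (4): v1 quasi-dominates v2 if it is never worse on any tuple of S."""
    return all(cost(theta1, p) <= cost(theta2, p) for theta1, theta2, p in sample)
```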

4. The NODOM-C Algorithm

Our NOn-DOMinated Consistency algorithm (NODOM-C) attempts to reach the largest non-dominated consistent space of configurations. It can be summarized in the following steps:

1. Begin with an initial search space Θ corresponding to all the parameter configurations.
2. Reduce Θ by filtering the domain of each parameter p_i independently: the algorithm removes values from the domain Θ_i if they are quasi-dominated by some other value in Θ_i.
3. Repeat step 2 a given number of times or until no changes are produced in Θ.

Note that step 2 tries to reach a non-dominated consistent domain for the parameter p_i. The algorithm must iterate because the different parameters are usually correlated; reductions in one parameter domain may eventually imply further filtering in other domains.

Algorithm 1. filter_P(in: i, m, a_t; inout: Θ, comp_values[i])

1:  for all k ∈ {1, ..., m} do
2:    p ← get_random_instance(P)
3:    θ^H ← get_random_configuration(Θ)
4:    L_θ ← {}
5:    for all θ_i ∈ Θ_i do
6:      θ ← (θ^H_1, ..., θ^H_{i−1}, θ_i, θ^H_{i+1}, ..., θ^H_d)
7:      L_θ ← L_θ ∪ {θ}
8:      Compute c(θ, p)   /* to be used in the comparison phase */
9:    end for
10:   /** Comparison phase **/
11:   Initialize the elements of the matrix D_{d×d} to true
12:   for all (θ′, θ″) ∈ L_θ², θ′ ≠ θ″ do
13:     if c(θ′, p) > (1 + a_t) · c(θ″, p) then
14:       D[θ′_i][θ″_i] ← false
15:     end if
16:   end for
17:   for all θ_i ∈ Θ_i do c_T[θ_i] ← c_T[θ_i] + c(θ, p) end for
18: end for
19: /** Filtering phase **/
20: for all (θ′_i, θ″_i) ∈ Θ_i² do
21:   if D[θ′_i][θ″_i] and c_T[θ″_i] > c_T[θ′_i] then
22:     if D[θ″_i][θ′_i] then comp_values[i].add(θ′_i, θ″_i) end if
23:     remove(θ″_i, Θ_i)
24:   end if
25: end for
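The following Python sketch mirrors Algorithms 1 and 2 under simplifying assumptions: configurations are dictionaries mapping parameter names to values (as in the earlier sketches), `cost(theta, p)` is some cost function such as the capped runtime of Section 2, and instances/configurations are sampled uniformly. The domination matrix D is initialized once, so that it accumulates the result of all m comparisons, as described in the text below. This is an illustration of the filtering logic, not the authors' implementation:

```python
import random
from collections import defaultdict

def filter_parameter(domains, i, m, a_t, cost, instances, comp_values, rng):
    """One call of filter_P (Algorithm 1): filter the domain of parameter i in place."""
    values = list(domains[i])
    # D[v1][v2] == True means "v1 quasi-dominates v2 so far".
    D = {v1: {v2: True for v2 in values if v2 != v1} for v1 in values}
    c_T = defaultdict(float)  # total cost per value, used as a tie breaker
    for _ in range(m):
        p = rng.choice(instances)                      # get_random_instance(P)
        context = {name: rng.choice(vals) for name, vals in domains.items()}
        costs = {}
        for v in values:                               # same context and instance for every value
            costs[v] = cost({**context, i: v}, p)
            c_T[v] += costs[v]
        # Comparison phase (lines 12-16): v1 loses its domination claim over v2
        # whenever it is more than a factor (1 + a_t) slower on this instance.
        for v1 in values:
            for v2 in values:
                if v1 != v2 and costs[v1] > (1 + a_t) * costs[v2]:
                    D[v1][v2] = False
    # Filtering phase (lines 19-25): remove quasi-dominated values, record comparable pairs.
    for v1 in values:
        for v2 in values:
            if v1 == v2 or v1 not in domains[i] or v2 not in domains[i]:
                continue
            if D[v1][v2] and c_T[v2] > c_T[v1]:
                if D[v2][v1]:                          # mutual quasi-domination => comparable
                    comp_values[i].add((v1, v2))
                domains[i].remove(v2)

def nodom_c(domains, m, a_t, max_iter, cost, instances, seed=0):
    """NODOM-C (Algorithm 2): repeat the per-parameter filtering until a fixpoint or max_iter."""
    rng = random.Random(seed)
    comp_values = {i: set() for i in domains}
    for _ in range(max_iter):
        before = {i: list(vals) for i, vals in domains.items()}
        for i in domains:
            filter_parameter(domains, i, m, a_t, cost, instances, comp_values, rng)
        if before == {i: list(vals) for i, vals in domains.items()}:
            break
    return domains, comp_values
```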

Algorithm 1 filters a domain Θ_i. To filter, the algorithm performs m all-against-all pairwise comparisons among the different values in Θ_i. In each comparison, a random instance p is extracted from the training set. In addition, a random parameter configuration θ^H is obtained from the current search space Θ. For each value θ_i in Θ_i, a configuration θ is generated. The configurations are put in a list L_θ, and their costs are computed. Note that the computed costs are (re)used in the comparison phase, thus avoiding unnecessary runs of the target algorithm. After the large loop (lines 1–18), each element D[θ′_i][θ″_i] indicates whether, according to condition (4), θ′_i quasi-dominates θ″_i. Note that the comparison, performed in line 13, includes a parameter a_t > 0. a_t is a tolerance parameter that allows one to relax condition (4) in order to find more dominated values. The last for-all loop performs the filtering. A value θ″_i is removed from Θ_i only if it is quasi-dominated by another value θ′_i ∈ Θ_i and c_T[θ″_i] > c_T[θ′_i]. c_T[θ_i] corresponds to a total cost related to the parameter value θ_i, and it is computed incrementally in line 17 (the elements of the array must be initialized to 0). This total cost is used as a tie breaker when two parameter values dominate each other. When this occurs, the eliminated value θ″_i and the non-eliminated one θ′_i are considered comparable; when instantiating p_i to θ′_i we should obtain a performance similar to that obtained when instantiating p_i to θ″_i.

Algorithm 2. NODOM-C_P(in: m, a_t, max_iter; inout: Θ; out: comp_values)

comp_values ← [{}, ..., {}]; iter ← 0
repeat
  iter ← iter + 1
  Θ_old ← Θ
  for all i ∈ {1, ..., d} do
    filter_P(i, m, a_t, Θ, comp_values[i])
  end for
until Θ_old = Θ or iter = max_iter

Finally, Algorithm 2 shows the implementation of the NODOM-C algorithm. It simply consists of a repeat-until loop that performs the filtering of each parameter domain, by using Algorithm 1, until no further reduction is produced in Θ. At the end, the algorithm returns a filtered space Θ and comp_values, an array of sets of comparable values for each parameter.

4.1. Optimizing the pairwise comparison

When Algorithm 1 performs the all-against-all pairwise comparison, we can know, before finishing the m comparisons, whether a value θ^H_i ∈ Θ_i is incomparable with any value in Θ_i, i.e., θ^H_i neither quasi-dominates any other value nor is quasi-dominated by any other value in the domain. Formally, θ^H_i is incomparable when:

(∀ θ_i ∈ Θ_i s.t. θ_i ≠ θ^H_i) :  (D[θ_i][θ^H_i] ∨ D[θ^H_i][θ_i]) ⟺ false

Incomparable values can neither be discarded nor discard other values from the domains. Thus, to avoid performing useless comparisons and runs of the target algorithm, we add a black list (set) Θ^bl_i storing the incomparable values related to the parameter p_i. The set Θ^bl_i should be initialized to an empty set before the first line of Algorithm 1. The following code updates Θ^bl_i, and it is added after line 17:

for all θ^H_i ∈ (Θ_i \ Θ^bl_i) do
  if (∀ θ_i ∈ Θ_i, θ_i ≠ θ^H_i) : (D[θ_i][θ^H_i] ∨ D[θ^H_i][θ_i]) ⟺ false then
    Θ^bl_i ← Θ^bl_i ∪ {θ^H_i}
  end if
end for

Line 5 is replaced by: for all θ_i ∈ (Θ_i \ Θ^bl_i) do; and line 20 by: for all (θ′_i, θ″_i) ∈ (Θ_i \ Θ^bl_i)².

5. Experiments

We performed several experiments to analyze different aspects of our approach. The main objectives of these experiments are:

1. to evaluate the filtering power of NODOM-C related to discarding useless parameter values (Section 5.1);
2. to evaluate the reliability of NODOM-C related to (a) not discarding useful parameter values (Section 5.2), (b) returning only useful parameter values (Section 5.3) and (c) finding sets of comparable values (Section 5.4);
3. to evaluate the practical interest of NODOM-C related to simplifying algorithms by removing useless or redundant components (Section 5.5).

Target algorithms and instance sets

We selected the target algorithms SPEAR [15] and CPLEX [1]. These algorithms have already been used to test configuration algorithms [17,19].

- SPEAR is a tree search algorithm for solving SAT problems. Configured with ParamILS, SPEAR won the quantifier-free bit-vector arithmetic category of the 2007 Satisfiability Modulo Theories Competition [15]. 26 parameters with a total number of 177 values have been considered. This algorithm is configured using two sets of instances: SAT-encoded quasi-group completion problems (QCP) and SAT-encoded graph-coloring problems based on small-world graphs (SW-GCP).
- CPLEX is a well-known algorithm for solving mixed integer programming problems. Among the 159 user-specifiable parameters, only 76 have been chosen. Hutter et al. [17] claim to have carefully chosen these 76 parameters, keeping only those that affect CPLEX's search trajectory. A total number of 346 values for the parameters have been considered. CPLEX was configured using one set of instances, the MIP-encoded winner determination problem for combinatorial auctions (Regions100).

Parameters with continuous domains were discretized, and a reduced number of values were considered for parameters with


Fig. 1. Average size of the search space reported after each iteration for 10 independent runs of NODOM-C.

integer domains. We used the same values proposed by Hutter et al. [17]. The authors performed the selection of values using different criteria (e.g., according to intuition based on experience, or choosing values uniformly distributed over the continuous domain). A detailed list of all the parameters and their domains can be found at http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/algorithms.html.

Our configuration algorithm uses a training set to filter the parameter domains of the target algorithm. Since, in some cases, configuration algorithms find configurations adapted specifically to the instances in the training set (see over-tuning [8,7,15]), we decided to evaluate the results using a new set of test instances. Thus, for each problem we use two sets of 1000 instances: a training set and a test set. The complete list can be found at http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/results.html.

Parameters of the configurator

All of our experiments were carried out on a PowerEdge T420 server with two 2.20 GHz quad-core Intel Xeon processors and 8 GB RAM running Ubuntu Linux. For the AC, we set a cutoff time of κ_max = 5 s (the same value used in [17]). The tolerance parameter was fixed to a_t = 0.1, i.e., we consider that two costs are equivalent if the difference between them is at most 10% of the lowest cost. The parameter m was fixed to 50. The number of iterations was fixed to 3 (i.e., max_iter = 3). The expected cost c(θ, p) corresponds to the CPU time spent by the target algorithm on the instance p with the configuration θ.

5.1. Discarding ineffective values: filtering power

The first results give us an idea of how much we can reduce the search space by removing ineffective values. Consider that many of these values correspond to components of the target algorithm (others are related to numerical parameters). Thus, we could remove these components without affecting the performance. The plots of Fig. 1 show the average size of the search space computed at the end of each iteration for each scenario. The size

at iteration 0 corresponds to the initial search space. Note that for each target algorithm, the largest reduction in the size of the search space is produced by the first iteration, where the size decreases by several orders of magnitude (the plots are given on a logarithmic scale). Fig. 2 is similar to Fig. 1: it shows the average sum of the domain cardinalities computed at the end of each iteration. Note that during the first iteration about 50% of the values are removed from the domains of the parameters. Fig. 3 shows the CPU time spent by the filtering algorithm at each iteration. Note that the time spent by the target algorithm in each iteration of NODOM-C is bounded by:

t ≤ av_target_time · m · Σ_{i=1}^{d} card(Θ_i)    (5)

where av_target_time is the average CPU time spent by one run of the target algorithm (av_target_time ≤ κ_max) and Σ_{i=1}^{d} card(Θ_i) corresponds to the sum of the current domain cardinalities. Thus, as the iterations continue, the iteration time is reduced because (1) the cardinality of each domain is also reduced (Fig. 2) and (2) the search space contains better configurations, reducing the average CPU time spent by the target algorithm on each instance.

Table 1 shows the average costs of different configurations on the test sets for each scenario. The table also reports the average CPU time required to obtain these configurations. Configurations were obtained by applying the following strategies: random configurations from the initial space of configurations Θ, configurations obtained by applying FocusedILS to the initial space of configurations, random configurations from the space of configurations Θ_F filtered by NODOM-C, and configurations obtained by applying FocusedILS to this filtered space. The application of FocusedILS was restricted to 5 CPU hours. Each strategy was run 10 times, obtaining 10 different configurations for each case (the average costs are shown in the table).
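As an illustration of the bound (5), consider the first NODOM-C iteration on the SPEAR scenarios with the settings used here (κ_max = 5 s, m = 50) and SPEAR's 177 initial parameter values. Since av_target_time ≤ κ_max, the time spent by the target algorithm in that iteration is at most

t ≤ 5 s · 50 · 177 = 44,250 s ≈ 12.3 h,

a worst-case figure: in practice most runs finish well before the cutoff, and later iterations operate on smaller domains.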


Fig. 2. Average sum of the cardinalities of the different domains reported after each iteration for 10 independent runs of NODOM-C.

Fig. 3. Average CPU time spent by the NODOM-C algorithm in each of the three iterations.

Note that random configurations in the filtered space are, on average, considerably better than random configurations in the initial space. Also note that FocusedILS finds comparable configurations in both spaces. Thus, although NODOM-C does not seem to improve the efficacy of FocusedILS or find better configurations, it does, at least, filter the initial space while preserving the best configurations. In the following experiments, when we refer to the filtered space returned by NODOM-C for a given scenario, we actually refer

to the largest space among the 10 filtered spaces returned by NODOM-C.

5.2. Reliability A: does NODOM-C discard only ineffective values?

Our hypothesis is that NODOM-C discards only parameter values which are useless on every single instance of the training set. In the following experiment, we attempt to refute this statement. Thus, the experiment tries to find a configuration


Table 1
Comparison of different configurations for each scenario. The top value of each entry reports the cost (i.e., the average runtime over the set of instances) of the related configuration applied to the given scenario; the bottom value reports the average CPU time, in hours, spent by the tuning strategy to obtain the related configuration. Both results are expressed as a confidence interval (mean ± error) with α = 0.05. Each cell below is written as cost / tuning time.

Scenario          Random (Θ)          FocusedILS (Θ)      Random (Θ_F)              FocusedILS (Θ_F)
Spear (SW-GCP)    2.41 ± 0.31 / 0.0   1.02 ± 0.02 / 5.0   1.26 ± 0.25 / 5.7 ± 0.5   1.00 ± 0.02 / 10.7 ± 0.5
Spear (QCP)       0.55 ± 0.12 / 0.0   0.18 ± 0.01 / 5.0   0.24 ± 0.05 / 2.1 ± 0.2   0.21 ± 0.05 / 7.1 ± 0.2
CPLEX (Reg.100)   3.91 ± 0.24 / 0.0   0.19 ± 0.01 / 5.0   1.67 ± 0.12 / 22.9 ± 2.1  0.18 ± 0.01 / 27.9 ± 2.1

Table 2
Comparison between near-optimal configurations in the initial space Θ and near-optimal configurations in the filtered space Θ_F. The average costs are expressed as a confidence interval (mean ± error) with α = 0.05.

Scenario              Proportion of cases c(θ^i, p_i) + 0.02 < c(θ^i_F, p_i)   Average cost (Θ)   Average cost (Θ_F)
Spear (SW-GCP)        0.032                                                    0.38 ± 0.01        0.38 ± 0.02
Spear (QCP)           0.028                                                    0.06 ± 0.00        0.06 ± 0.00
CPLEX (Regions100)    0.008                                                    0.19 ± 0.01        0.19 ± 0.01

Table 3
Number of parameter values with at least one comparable value (#pv with cv), average number of comparable values for a parameter value (av #cv) and maximum number of comparable values for a parameter value (max #cv) for each scenario.

Scenario              #pv with cv   av #cv   max #cv
Spear (SW-GCP)        14            3.9      16
Spear (QCP)           8             1.4      2
CPLEX (Regions100)    46            2.5      6

outside the filtered space that is better than all the configurations inside the filtered space for some single instance in the training set. To find this outer configuration, we search for the best configuration θ^i in the whole space for each instance p_i in the training set. In the same way, we search for the best configuration θ^i_F in the filtered space for each instance. Then, we compare θ^i and θ^i_F when solving the instance p_i. According to our hypothesis, θ^i should never be better than θ^i_F. The second column of Table 2 shows the proportion of instances in the corresponding training set for which the estimated cost of θ^i_F is greater than the estimated cost of θ^i. To estimate the optimal configuration cost for a given scenario, space and instance p_i, we first run ParamILS restricted to 150 iterations on the given instance, obtaining a near-optimal configuration θ^i. The estimated cost then corresponds to the average time spent by the target algorithm (10 runs with different seeds) on solving p_i using the configuration θ^i. The third column of the table shows the average costs over the set of training instances of each scenario. Note that the proportion of cases in which a configuration θ^i is better than θ^i_F is rather low. This provides partial evidence for the hypothesis, although it also means that some (few) important values are likely discarded by NODOM-C from the parameter domains. Discarding important values is due to two causes: (1) the relation of dominance between two values is computed by comparing just a reduced sample of configurations (m = 50), and (2) repeating the filtering process on each variable increases the probability of mistakenly removing a value.

5.3. Reliability B: does NODOM-C find a set of only effective values?

No experiment is required to answer this question. Recall that in the filtering phase of NODOM-C (see Algorithm 1), values are eliminated from the domains if and only if they are shown to be strictly dominated by others. Thus, any non-eliminated value is not strictly dominated by another in the current domain. If we define as effective values those values which are not strictly dominated by others, then all the values in the filtered domains are effective (assuming, of course, that the domains were not reduced in the last iteration of NODOM-C, in which case the domains can still contain some ineffective values).

Table 4
Proportion of instances in which θ′ and a configuration from a given space (Θ_C or Θ_NC) have comparable costs. Θ_C is the space of configurations comparable to θ′ found by NODOM-C. Θ_NC is a set of configurations which are not comparable to θ′ according to NODOM-C.

Scenario              Θ_C (Train.)   Θ_C (Test.)   Θ_NC (Train.)   Θ_NC (Test.)
Spear (SW-GCP)        0.98           0.98          0.34            0.32
Spear (QCP)           0.98           0.97          0.90            0.85
CPLEX (Regions100)    0.97           0.96          0.13            0.07

5.4. Reliability C: are the comparable values found by NODOM-C actually comparable?

Recall that for any parameter value, NODOM-C retains a set of comparable values in the comp_values array. In this section, we show evidence that the comparable parameter values found by NODOM-C have similar performance. Basically, in the experiments, we observed whether the performance of the target algorithm is maintained when a parameter value is replaced by a comparable value. Table 3 shows the number of parameter values with at least one comparable value for each scenario (column 2). The third column shows the average number of comparable values for each such parameter value, and the fourth column shows the maximum number of comparable values for a parameter value. For instance, in the CPLEX scenario (Regions100), there are 46 parameter values with comparable values, each with an average of 2.5 comparable values. The value 5 = shifting for the parameter lpmethod (method for linear optimization) has the maximum number of comparable values (0 = automatic, 1 = primal simplex, 2 = dual simplex, 3 = network simplex, 4 = barrier, 6 = concurrent dual and barrier); thus, replacing the shifting method by any of these comparable methods should not affect the performance of CPLEX on any instance.

In the experiments, we took a competitive configuration θ′ from the filtered space Θ_F. Then, we constructed Θ_C, a space of configurations comparable to θ′, initializing each parameter domain Θ_i with the set of comparable values found by NODOM-C related to θ′_i and the parameter p_i. We generated 10 random configurations θ_c from Θ_C and we compared each of them to θ′ to verify whether they are actually comparable on every single instance. Table 4 reports


Table 5
Performance comparison between the original and the simplified target algorithms (based on the results reported by NODOM-C).

Algorithm (scenario)   Whole set of instances   Individual instances
Spear (SW-GCP)         1.00 ± 0.02              0.39 ± 0.01
sSpear (SW-GCP)        1.01 ± 0.02              0.38 ± 0.01
Spear (QCP)            0.18 ± 0.05              0.07 ± 0.0
sSpear (QCP)           0.20 ± 0.11              0.06 ± 0.0
CPLEX (Reg.100)        0.19 ± 0.01              0.19 ± 0.01
sCPLEX (Reg.100)       0.19 ± 0.01              0.19 ± 0.00

the proportion of instances in which θ_c and θ′ have comparable costs. (We consider that two configurations θ′ and θ″ have comparable costs for an instance p if their costs c1 = c(θ′, p) and c2 = c(θ″, p) satisfy max(|c1 − c2| − 0.02, 0) / max(c1, c2) < 0.2.) Θ_NC ⊂ Θ_F corresponds to a set of 10 configurations generated by replacing the parameter values (with comparable values) of θ′ with random non-comparable values. In other words, Θ_NC is a set of configurations which are not comparable to θ′ according to NODOM-C. Observe that configurations in Θ_C are clearly comparable to θ′, while configurations in Θ_NC are, in general, not comparable to it.

5.5. Using NODOM-C to simplify the target algorithm

Finally, we note that the results reported by NODOM-C (ineffective and comparable parameter values) may be used to reduce the complexity of the target algorithm in at least three different ways:

1. We can find useless components by searching those filtered domains containing only the one value that disables a component (e.g., sp-clause-inversion=false in the Spear algorithm).
2. Any other filtered domain may also be used to find useless components. For instance, the parameter lpmethod of CPLEX initially has 7 possible values: 0 = automatic, 1 = primal simplex, 2 = dual simplex, 3 = network simplex, 4 = barrier, 5 = shifting, 6 = concurrent dual and barrier. If, for instance, the domain of this parameter is reduced to {1, 6}, we could most likely remove the components related to the network simplex and shifting methods, because they do not seem to be related to methods 1 and 6. However, if the domain is reduced to a set of values containing 0, it is not clear whether we can remove some component (automatic means that any of the methods could be chosen).
3. Comparable values can help to discard components. Suppose that the filtered domain of the parameter lpmethod is {0} with the comparable value 1. Then, we could remove all the components related to the parameter except the primal simplex method.

Taking into account these three methods for simplification, we estimate that, assisted by NODOM-C, we could discard approximately 40 components of the CPLEX algorithm and approximately 12 components of the SPEAR algorithm. In other words, NODOM-C indicates that these components are unnecessary for solving the corresponding instances. We simulated the implementation of a simplified CPLEX algorithm (sCPLEX) and a simplified SPEAR algorithm (sSpear) by reducing their parameter domains according to their seemingly useless components. Then, we performed two tests to compare the performance of the original and the simplified target algorithms. Table 5 reports the results. The second column shows the results of the first test. We generated 10 configurations for each scenario.

Each configuration was generated by applying FocusedILS (restricted to 5 h) to the given scenario (target algorithm + instance training set). The cells report the average cost (mean ± error) of the 10 configurations found by FocusedILS. The third column reports the results of the second test. For each instance in the given scenario, we searched for the best configuration using FocusedILS restricted to 10 min. The table reports the average costs of the set of configurations. Note that the times are lower because in this case we used a different ad hoc configuration for each instance. Note that, according to these experiments, the original and the simplified target algorithms have virtually equivalent performances.

6. Conclusion

In this paper, we propose NODOM-C, a filtering method for the algorithm configuration problem. NODOM-C is simple (model-free), and it is applicable to target algorithms with a large number of components/parameters. The main focus of NODOM-C is the algorithm design process. The experiments in Section 5.5 highlight that our approach can be a very useful tool to automatically detect ineffective or comparable (interchangeable) components, methods or strategies. Furthermore, the experiments in Sections 5.2–5.4 show that NODOM-C is quite reliable, i.e., it mainly discards ineffective parameter values from the parameter domains. To increase the reliability of NODOM-C, in future work we plan to incorporate a stochastic technique to estimate the distribution of the difference between the costs of two parameter values. This estimation would be used to improve the accuracy of the filtering.

Acknowledgments

This work is supported by the Fondecyt Project 1120781 and Fondecyt Project 11121366.

References

[1] CPLEX, 11.0 User's Manual, 2008.
[2] B. Adenso-Diaz, M. Laguna, Fine-tuning of algorithms using fractional experimental designs and local search, Oper. Res. 54 (1) (2006) 99–114.
[3] C. Ansotegui, M. Sellmann, K. Tierney, A gender-based genetic algorithm for the automatic configuration of solvers, in: Proc. CP, Springer, 2009, pp. 142–157.
[4] P. Balaprakash, M. Birattari, T. Stutzle, Improvement strategies for the F-race algorithm: sampling design and iterative refinement, in: Proc. of the 4th International Conference on Hybrid Metaheuristics, Springer, 2007, pp. 108–122.
[5] T. Bartz-Beielstein, Experimental research in evolutionary computation: the new experimentalism, Nat. Comput. Ser. XIV (2006).
[6] T. Bartz-Beielstein, M. Preuss, The future of experimental research, in: Proc. GECCO 2009, ACM Press, 2009, pp. 3185–3226.
[7] M. Birattari, The Problem of Tuning Metaheuristics: As Seen from a Machine Learning Perspective, IOS Press, 2005.
[8] M. Birattari, T. Stutzle, L. Paquete, K. Varrentrapp, A racing algorithm for configuring metaheuristics, in: Proc. GECCO, vol. 2, Morgan Kaufmann, 2002, pp. 11–18.
[9] M. Birattari, Z. Yuan, P. Balaprakash, T. Stützle, F-race and iterated F-race: an overview, in: T. Bartz-Beielstein, M. Chiarandini, L. Paquete (Eds.), Experimental Methods for the Analysis of Optimization Algorithms, Springer, 2010.
[10] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[11] S. Coy, B. Golden, G. Runger, E. Wasil, Using experimental design to find effective parameter settings for heuristics, J. Heuristics 7 (1) (2001) 77–97.
[12] A. Czarn, C. MacNish, K. Vijayan, B. Turlach, R. Gupta, Statistical exploratory analysis of genetic algorithms, IEEE Trans. Evolut. Comput. 8 (2004) 405–421.
[13] A. Eiben, S. Smit, Parameter tuning for configuring and analyzing evolutionary algorithms, Swarm Evol. Comput. 1 (1) (2011) 19–31.
[14] O. François, C. Lavergne, Design of evolutionary algorithms – a statistical perspective, IEEE Trans. Evol. Comput. 5 (2001) 129–148.
[15] F. Hutter, D. Babic, H. Hoos, A. Hu, Boosting verification by automatic tuning of decision procedures, in: Proc. Uncertainty in Artificial Intelligence (UAI'01), Morgan Kaufmann, 2007, pp. 235–244.

I. Araya, M.-C. Riff / Knowledge-Based Systems 60 (2014) 73–81 [16] F. Hutter, H. Hoos, K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, in: LION 2011, Springer, 2011, pp. 507–523. [17] F. Hutter, H. Hoos, K. Leyton-Brown, T. Stutzle, Paramils: an automatic algorithm configuration framework, J. Artif. Intell. Res. 36 (1) (2009) 267–306. [18] F. Hutter, H. Hoos, T. Stutzle, Automatic algorithm configuration based on local search, in: Proc. AAAI, AAAI Press, 2007, pp. 1152–1157. [19] F. Hutter, H.H. Hoos, K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, in: LION-5, Springer, 2011, pp. 507–523. [20] O. Maron, A. Moore, The racing algorithm: model selection for lazy learners, Artif. Intell. Rev. 11 (1) (1997) 193–225. [21] R. Myers, E. Hancock, Empirical modelling of genetic algorithms, Evol. Comput. 9 (2001) 461–493.

81

[22] I. Ramos, M. Goldbarg, E. Goldbarg, A. Neto, Logistic regression for parameter tuning on an evolutionary algorithm, in: Proc. CEC Congress, IEEE Press, Edinburgh, UK, 2005, pp. 1061–1068. [23] C. Rasmussen, C. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006. [24] S. Smit, A. Eiben, Comparing parameter tuning methods for evolutionary algorithms, in: Evolutionary Computation, 2009. CEC’09, IEEE, 2009, pp. 399– 406. [25] S. Smit, A. Eiben, Z. Szlávik, An moea-based method to tune ea parameters on multiple objective functions, in: IJCCI, SciTePress, 2010, pp. 261–268. [26] G. Taguchi, T. Tokotama, Taguchi Methods: Design of Experiments, ASI Press, 1993.