Computers & Operations Research 27 (2000) 841–865
Experiments with new stochastic global optimization search techniques

Linet Özdamar*, Melek Demirhan

Istanbul Kültür University, Department of Computer Engineering, Gayrettepe Emekli Subay Evleri 23/5, 80280 Istanbul, Turkey
Yeditepe University, Department of Systems Engineering, Istanbul, Turkey

Received 1 October 1997; received in revised form 1 February 1998; accepted 1 March 1999
Abstract

In this paper several probabilistic search techniques are developed for global optimization under three heuristic classifications: simulated annealing, clustering methods and adaptive partitioning algorithms. The algorithms proposed here combine different methods found in the literature and they are compared with well-established approaches in the corresponding areas. Computational results are obtained on 77 small to moderate size (up to 10 variables) nonlinear test functions with simple bounds and 18 large size test functions (up to 400 variables) collected from the literature.

Scope and purpose

Global optimization techniques aim at identifying the global optimum solution of a function which need not be convex or differentiable. In this paper, we consider stochastic global optimization search techniques. Among them, one can name probabilistic search methods such as simulated annealing, genetic algorithms, controlled random search and clustering methods. Another class of global optimization methods considered here are adaptive partitioning algorithms, which aim at reducing the search space around the global optimum by consecutive partitioning iterations of a promising subregion into smaller and smaller subspaces. Here, we specifically investigate simulated annealing, clustering methods and adaptive partitioning algorithms. Besides implementing some well-established techniques in these fields, we develop new techniques which also lead to hybrid methods combined with existing ones. In an adaptive partitioning algorithm proposed here, a new partition evaluation measure of fuzzy nature is developed. Furthermore, both simulated annealing and random search are used for collecting samples in the adaptive partitioning algorithm. We conduct a study on an extensive set of test problems and compare the performances of these algorithms.
* Corresponding author. Tel.: 90-212-6393024/159; fax: 90-212-5511189. E-mail address: [email protected] (L. Özdamar)

0305-0548/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0305-0548(99)00054-4
© 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Probabilistic search methods; Global optimization; Adaptive partitioning algorithms; Fuzzy measures
1. Introduction

We consider the optimization problem max f(x) s.t. x ∈ α ⊆ R^n, where α is a hypercube defined by the boundaries lb_i ≤ x_i ≤ ub_i, i = 1, ..., n, and lb_i, ub_i are lower and upper bounds on variable x_i. There are no restrictions on f(x) in terms of convexity or differentiability. Therefore, it is hard to locate the global optimum point(s) of such functions. Global optimization attracts attention from researchers in different disciplines, because many real life problems in areas such as physics, chemistry and molecular biology [1,2] involve nonlinear functions of many variables with attributes (multimodality, discontinuity, etc.) that are difficult to handle by conventional mathematical tools. Classical gradient-based algorithms such as the Quasi-Newton method [3,4], the modified steepest descent algorithm [5], or the conjugate gradient method [6] may be readily trapped by local optima when the feasible region around the unique optimum is not well-conditioned [7].

To overcome the difficulties described above, probabilistic search methods which rely heavily on computer power have been developed since the 1980s. Random search algorithms, such as Density Clustering [8], Controlled Random Search [9], Multistart [10], and Multi Level Single Linkage (MLSL) [10], enable the exploration of the whole feasible region through random sampling followed by hill-climbing or gradient-based methods. For instance, in MLSL, α is subjected to random sampling and the promising points in the sample are selected according to a geometric criterion. Then, in order to intensify the search around these selected points, hill-climbing search is applied simultaneously starting with each selected point. The procedure continues repeating these operations until a given number of function evaluations are executed. In each cycle the value of the geometric criterion is tightened. Multi-start algorithms are provided with asymptotic convergence proofs. Topographical MLSL (TMLSL) [11] and Topographical Optimization with Local Search (TOLS) [12] are two recent methods along this line. TMLSL and TOLS are both reported to be considerably superior to MLSL and other previously introduced clustering methods.

Apart from the multi-start solution approaches described above, a stochastic single-point search technique, Simulated Annealing (SA), proposed by Kirkpatrick et al. [13], is also used in optimization. In SA, given a starting point, a neighbour solution is generated randomly. If the neighbour solution is better than the current solution, it replaces the current solution. Else, it is rejected or accepted according to a probability of acceptance which depends on the current temperature and the amount of deterioration in the function value. Thus, the search may escape from a local optimum solution by moving on to worse solutions. SA guides the search by specifying a cooling scheme where the temperature is reduced as the search makes progress so that it may converge towards the global optimum. A convergence proof for SA in the continuous domain is provided by Dekkers and Aarts [14]. The strong points of SA and some pitfalls for potential SA users are indicated in an extensive review given by Ingber [15], where a wide range of application areas from finance to combat analysis are described.
"nance to combat analysis are described. Parallelization of the SA procedure as a multi-start search technique is discussed in Frost et al. [16] and also in Ingber [15]. Ingber provides a careful analysis of parallelization issues in SA applications and points out that parallel SA applications are yet in their early phase of development and that further investigation is necessary in this area. Here, we develop some simple SA schemes and compare them with SA approaches well established in global optimization. We also combine SA approach with TOLS in order to observe the e!ects of replacing hill-climbing search by SA as well as provide di!erent starting points for SA using TOLS. A third class of algorithms is proposed here for applying random search techniques. This class consists of adaptive partitioning algorithms, APA. APA divides the feasible region into smaller subspaces, discards some unfavourable regions and re-partitions promising areas to reduce the "nal search space [17}22]. A convergence proof and an extensive review of previous work carried out in this area are given by Pinter [23]. The most important factor in APA is the partition evaluation scheme. Two classes of partition evaluation strategies exist: interval estimation approaches based on a model supposition of f (x) and interval estimation approaches based on inclusion functions of f (x). In interval estimation approaches based on a model supposition of f (x) (e.g., [24}29]). It is assumed that f (x) belongs to a prespeci"ed class of functions such as Lipshitzian functions or it is assumed that f (x) is a realization of a stochastic process (e.g., Wiener (for unary f (x)), or Gaussian). The prior model of f (x) is integrated with the information gained from evaluating the functional values of points at given locations such as the boundaries or the diagonal of the sub-region. Using this information an adaptive posterior model of f (x) is obtained in each iteration and sub-regions are evaluated accordingly. In the interval estimation approach based on inclusion functions [30}32], the bounds on f (x) within a speci"c interval are calculated by replacing the real operations in f (x) by their predetermined interval operations resulting in an interval on f (x) for each operation. Then, when all the terms are aggregated, an enclosure of f (x) is obtained. Thus, each partition is evaluated with the aid of an inclusion function de"ned on f (x) and only the boundaries on each variable need to be known in every interval. Naturally, the strength of the bounds on f (x) depends on the inclusion function as well as the behaviour of f (x) within the interval. Here, we also develop an adaptive partitioning algorithm which, rather than evaluating subregions by taking samples from prespeci"ed locations, uses probabilistic search approaches such as pure random search (APARS) or SA (APASA) in order to decide on promising partitions. Furthermore, the information related to the samples obtained by the probabilistic search techniques is then aggregated into a fuzzy measure which guides the search in deciding on which partition to sub-divide, preserve or discard. We carry out computational experiments with all three classes of approaches described above. Among the di!erent methods included in the experiments are the following. The best reported clustering method reported in the literature, MLSL, and the recently introduced methods, TOLS and TMLSL, are included in the comparison as clustering techniques. 
Two additional variants of TOLS merged with SA and developed here are also included. A new adaptive partitioning algorithm, APA, used with RS and SA constitutes two of the partitioning type of heuristics (APARS and APASA). Furthermore, six interval estimation algorithms based on a model supposition of f(x) are imbedded in APA and tested here: DPS (Danilin and Piyavskii [25]; Shubert [26]), STR (Strongin [27]), KAB (Kushner [24]; Archetti and Betro [33]), MTZ (Mockus, Tiesis and Zilinskas [34]), ZIL (Zilinskas [28]), and BOE (Boender [35]).
The latter six algorithms are discussed in detail in Pinter [23] and their convergence conditions are demonstrated. Two SA algorithms reported to be considerably superior to existing methods [14,36] and their variants developed here are included in the experimentation. Furthermore, three simple SA schemes introduced here complete the SA group of heuristics. We also compare the deterministic inductive search method described in Bilchev and Parmee [37], because good results are reported for the standard test functions of the First International Contest on Evolutionary Optimization found at the web site http://iridia.ulb.ac.be/langerman/ICEO.html, and for four hard permutation test functions indicated at the web site http://solon.cma.univie.ac.at/~neum/glopt.

In order to provide reliable results and point out promising types of heuristics where future efforts should be concentrated, we include in our experiments a wide range of functions (77 small to moderate size nonlinear test functions with simple bounds and up to 10 variables, and 18 large size test functions with up to 400 variables) collected from the authors of different optimization methods as well as from public web sites. In the following sections, we briefly describe the methods used in the comparison and provide the computational results. The sources of the test functions and some of their properties are listed in the appendix.
2. Methods used in the comparison

2.1. Simulated annealing algorithms

Method-1. SA-I (developed here): SA-I is a simple SA scheme developed here where the neighbour to a current solution is generated randomly as follows. A dimension i* is selected randomly and a decision is made arbitrarily whether the value of the variable in the selected dimension is to be decreased or increased. Then, a random amount is added to or subtracted from the current variable value without violating the corresponding upper and lower bounds, while the values of the remaining variables stay intact. The neighbour solution, x^{k+1}, to the current solution x^k in iteration k is created as follows. If i = i*, then

x_i^{k+1} = x_i^k + u(ub_i - x_i^k) if sgn(u - 0.5) < 0, and
x_i^{k+1} = x_i^k - u(x_i^k - lb_i) if sgn(u - 0.5) ≥ 0;

else x_i^{k+1} = x_i^k, where u ∈ U[0, 1]. Here, u is a random variable uniformly distributed between zero and one. The new functional value is calculated and the solution is accepted if the move improves the current solution. Else, a nonimproving move is accepted according to the following cooling scheme used successfully in previous studies on mixed integer optimization problems [38–40]:

PA(Δ, t^k) = exp(-Δ/(f(x^k) t^k)),
where PA is the probability of acceptance, Δ is the difference between f(x^{k+1}) and f(x^k), and t^k is the temperature in iteration k. If a randomly generated number p between zero and one turns out to be less than PA, then the deteriorating move is carried out. t^k depends on the number of times a deteriorated solution has been accepted. Initially t^k is equal to one, but after each accepted nonimproving move it is reduced as follows: t^{k+1} ← t^k/(1 + b·t^k), where b is a nonnegative small constant less than one. Here, and in all of the following SA algorithms, b is assigned a value of 0.1 after preliminary experimentation. The geometric cooling scheme described here is classified as Simulated Quenching by Ingber [15] and it is faster than Boltzmann annealing. The aim in reducing the temperature at every accepted nonimproving move is to direct the search away from unfavourable regions. Re-annealing takes place if t^k drops below 0.01 during the search; t^k is then re-assigned the value of one.

Method-2. SA-I with cyclic search dimensions (SA-II) (developed here): SA-II is basically the same as SA-I, with the difference being that, rather than selecting a random dimension to perturb the corresponding variable's value, each dimension is selected sequentially, thus realizing a cyclic movement which advances once in every dimension in each cycle. In SA-II, each dimension has its own temperature which is reduced according to the nonimproving moves occurring in that dimension. Thus, t^k becomes an array of length n, t_i^k. In SA-II, the search is conducted in each dimension to provide the means of exploring all dimensions without bias. The neighbour solution is now generated by applying the following formula for each dimension i = 1, ..., n:

x_i^{k+1} = x_i^k + u_i(ub_i - x_i^k) if sgn(u_i - 0.5) < 0,
x_i^{k+1} = x_i^k - u_i(x_i^k - lb_i) if sgn(u_i - 0.5) ≥ 0.

Here, u_i is a random variable uniformly distributed between zero and one.

Method-3. SA-I with local search (SA-III) (developed here): The difference of SA-III from SA-I is that when an improving move is identified, a hill-climbing procedure (LOCAL) is applied occasionally with the improved solution as the reference point. In the procedure LOCAL, neighbour solutions are generated randomly in the same manner as in SA-II, but the new value of each dimension should be within a distance which is less than 10% of the corresponding variable's bound length from the reference value of the initial improved solution. A new solution is accepted and replaces the current solution if and only if the new solution improves the current solution. The aim in applying procedure LOCAL is to intensify the search around the improved solution, so that SA does not skip a nearby global optimum. After procedure LOCAL is applied, SA continues from the best solution found during hill-climbing in the close neighbourhood of the reference solution. No gradient information is used in procedure LOCAL, because we wish to report results pertaining to pure probabilistic search without exploiting functional information. In SA-III, procedure LOCAL is not applied every time an improved solution is found, because we do not wish to restrict the freedom of the SA search so often. Rather, procedure LOCAL is applied if a number of improving moves (MaxMove × 0.005 improving moves, where MaxMove is the total number of moves allowed for SA) have been carried out after the last time procedure LOCAL was called. The number of iterations carried out by procedure LOCAL also depends on the total number of function evaluations permitted for SA-III.
In this implementation, we allow MaxMove × 0.001 hill-climbing moves each time procedure LOCAL is called.

Method-4. STANDARD-SA (STSA-I) [14]: In STSA-I, neighbour solutions are generated in the same manner as in SA-I. However, the cooling scheme is different. The temperature is not decreased geometrically at every nonimproving move, but is reduced at every 2n moves, where 2n is called the Markov chain length by Dekkers and Aarts [14]. Only after 2n moves are executed is the iteration number k increased by one. The temperature t^{k+1}, updated at the end of the Markov chain, is calculated as follows:

t^{k+1} = t^k [1 + t^k ln(1 + b)/(3σ^k)]^{-1},

where σ^k is the standard deviation of all functional values calculated during the moves within the last Markov cycle of 2n moves. The temperature is set to the value of one if it drops below 0.01. The probability of acceptance for a nonimproving move is calculated in the same manner as in SA-I.

Method-5. STANDARD-SA with local search (STSA-II) (variant of Dekkers and Aarts's [14] developed here): STSA-II differs from STSA-I in only one aspect: the hill-climbing procedure LOCAL is applied in the same manner as in SA-III in order to improve the performance of STSA-I.

Method-6. ADAPTIVE SA (ASA) [36]: The most important feature in ASA is the formula used for generating neighbours. A neighbour to the current reference point is generated via an expression which depends on the temperature t_i^k corresponding to the considered dimension. Thus, t_i^k is an array of length n, similar to SA-II. The neighbour solution, x^{k+1}, to the current solution x^k is given by applying the following formulae for every dimension i:

x_i^{k+1} = x_i^k + y_i^k(ub_i - lb_i), where y_i^k ∈ [-1, 1] is defined by
y_i^k = sgn(u_i - 0.5) t_i^k [(1 + 1/t_i^k)^{|2u_i - 1|} - 1], with u_i ∈ U[0, 1].

In ASA, t_i^k is reduced after one move is attempted in each dimension and the iteration index k is increased. The temperature at any iteration k is calculated as

t_i^k = exp(-c_i k^{d_i/n}),

where c_i and d_i are tuning parameters.
In this particular implementation, c_i and d_i are set equal to 0.1A_i and 0.5A_i, respectively, where

A_i = Σ_j (ub_j - lb_j)/(ub_i - lb_i).

The aim in assigning these values to c_i and d_i is to lower the temperature reduction rate for dimensions with a large search space permitted by ub_i - lb_i, and vice versa.

Method-7. ADAPTIVE SA with Boltzmann annealing (ASA-B) (variant of Ingber's [36] developed here): In ASA-B the neighbour solution is generated as in ASA, but the cooling scheme is different. The temperature is an array of length n and it is reduced according to the expression below whenever an improving move is made in a given dimension i:

t_i^{k+1} = 1/ln(k_i),

where k_i is the number of improving moves in dimension i up to and including iteration k. This cooling scheme reduces the temperature rather slowly and at a different rate for each dimension, depending on the status of the search in each dimension.
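To make the mechanics shared by these schemes concrete, the sketch below implements the SA-I move generator together with the geometric quenching rule described above, for a box-constrained maximization problem. It is a minimal reading of the verbal description, not the authors' code; the test function, the bound arrays, the small constant added to the denominator of PA and all parameter values are illustrative assumptions.

```python
import math
import random

def sa_one(f, lb, ub, max_moves=10000, b=0.1):
    """Minimal SA-I style search for max f(x) on a box (a sketch, not the authors' code)."""
    n = len(lb)
    x = [random.uniform(lb[i], ub[i]) for i in range(n)]
    fx = f(x)
    best_x, best_f = x[:], fx
    t = 1.0                                      # initial temperature
    for _ in range(max_moves):
        i = random.randrange(n)                  # randomly selected dimension i*
        u = random.random()                      # u ~ U[0, 1]
        y = x[:]
        if u < 0.5:                              # sgn(u - 0.5) < 0: step towards the upper bound
            y[i] = x[i] + u * (ub[i] - x[i])
        else:                                    # otherwise step towards the lower bound
            y[i] = x[i] - u * (x[i] - lb[i])
        fy = f(y)
        if fy >= fx:                             # improving move: always accept
            x, fx = y, fy
        else:
            delta = fx - fy                      # amount of deterioration
            pa = math.exp(-delta / (abs(fx) * t + 1e-12))   # PA = exp(-delta / (f(x^k) t^k))
            if random.random() < pa:
                x, fx = y, fy
                t = t / (1.0 + b * t)            # geometric quenching after an accepted worse move
                if t < 0.01:
                    t = 1.0                      # re-annealing
        if fx > best_f:
            best_x, best_f = x[:], fx
    return best_x, best_f

if __name__ == "__main__":
    sphere = lambda x: -sum(v * v for v in x)    # maximizing -sum(x_i^2); optimum 0 at the origin
    print(sa_one(sphere, lb=[-5.12] * 5, ub=[5.12] * 5))
```

SA-II and SA-III differ from this loop only in how the dimension is chosen (cyclically, with one temperature per dimension) and in the occasional call to the random hill-climbing procedure LOCAL, respectively.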
2.2. Clustering methods

Method-1. Topographical optimization using presampled points plus local search (TOLS) [12]: In TOLS, a sample of solutions is obtained randomly from the feasible region with the condition that the Euclidean distance from each newly generated point to every existing point is at least R units, so that the generated sample covers the whole feasible space uniformly. The authors decide on the value of the parameter R by trial and error, but in this application the parameter R is given by

R = max_i {ub_i - lb_i}/s^{1/n},

where s is the sample size, given by s = 0.1 m(α), and m(α) is the Lebesgue measure of the feasible space. However, s is not permitted to surpass n × 500. The latter radius formula is motivated by the fact that each generated point is surrounded by a sphere of radius R which should not be trespassed by other points. Next, the topographical minima among the sample are found. A topographical minimum is a solution whose functional value is better than those of its h nearest neighbours, and it is thus a promising point in that area. The number of nearest neighbours, h, is an important parameter in TOLS. Our preliminary experiments show that h, determined as 0.1s, produces a reasonable number of topographical minima for most of the test functions. However, h is not permitted to be below 10 or larger than 1000. Törn and Viitanen [12] apply gradient search to each topographical minimum identified after the initial sampling in order to improve the result. Here, we divide the remaining number of permitted function evaluations by the number of topographical minima and apply an equal number of random hill-climbing moves (procedure LOCAL) to every topographical minimum, and report the best result. In our experiments the permitted number of function evaluations is 1000n or 2000n, so at least 500n function evaluations are shared among the topographical minima for applying procedure LOCAL to each.

Method-2. Topographical optimization using presampled points plus SA-I (TOSA) (developed here): TOSA differs from TOLS in one respect: instead of applying local search to each topographical minimum, procedure SA-I is applied. Thus, topographical minima are used as good starting points for multi-start SA.

Method-3. Topographical optimization using presampled points plus ASA (TOASA) (developed here): In TOASA, procedure ASA is applied to each topographical minimum.

Method-4. Multi level single linkage (MLSL) [10]: The procedure starts with obtaining a random sample of size s and then selecting some points as reference points which are then subjected to local (gradient) search. A solution becomes a reference point if there do not exist other solutions with better functional values around it within a Euclidean distance R^k. The index k indicates the iteration number and R^k decreases in each iteration. After all such reference points are subjected to local search, a new sample is collected, and reference points are re-identified provided that they are not in the list of previous reference points. The MLSL procedure has a proven asymptotic convergence given that R^k is determined as follows and sufficient sampling iterations are carried out:

R^k = π^{-1/2} [Γ(1 + 0.5n) m(α) σ ln(ks)/(ks)]^{1/n},

where σ is a constant (in this implementation, σ = 4), s is the sample size, and Γ is the gamma function. Here, s is equal to 10n and at most five sampling iterations (k = 1, ..., 5) are permitted. A sketch of this critical-distance computation is given below.
The values of the σ and s parameters selected here are the best reported values in Ali and Storey [11]. In our implementation, procedure LOCAL is called instead of gradient search.

Method-5. Topographical multi level single linkage (TMLSL) [11]: TMLSL differs from MLSL in one aspect: in each iteration, the sample of solutions is generated as in TOLS in order to obtain a uniform cover, and then the topographical minima are found. Then, the solutions among these topographical minima which do not have better solutions around them within the Euclidean distance R^k are subjected to local search. In this manner, the identification of promising local minima is enhanced.

2.3. Adaptive partitioning methods (APA)

An adaptive partitioning approach with a fuzzy measure (APASA variants and APARS, developed here): In APARS or APASA, the initial search region α is divided into P equal area partitions and the iteration number, k, is set to zero. Then, a sample S_j^k of size s is collected from each existing sub-region α_j^k, j = 1, ..., P. (The sampling procedure may be purely random (APARS), or the sample may consist of the best s solutions that SA located during the search (APASA).) Then, in each α_j^k, the functional values f(x_{qjk}) of all solutions x_{qjk} ∈ S_j^k, q = 1, ..., s, contribute to the value r_j^k of the partition evaluation measure. The sub-region with the highest r_j^k value is selected for re-partitioning into P new partitions in the next iteration k+1, whereas the partitions with r_j^k values less than a specified value are discarded. The partitions that are neither discarded nor selected for re-partitioning preserve their sizes and are subjected to re-sampling in the next iteration. Thus, APARS (APASA) intensifies the search in promising sub-regions while enabling it to divert from a sub-region where consecutive re-partitioning takes place to a sub-region which was not considered for re-partitioning in the previous iterations. A sketch of this partitioning loop is given below. Demirhan et al. [41] show the probabilistic convergence of APARS when guided by an intelligent agent. We now discuss why the partition evaluation measure r_j^k is fuzzy.

APARS (APASA) involves two kinds of uncertainties. The first one is probabilistic in nature and is due to the random acquisition of sample points in the search region. The second one is the fuzzy uncertainty involved in the selection of the (most promising) region to be re-partitioned in the next iteration. The presence of fuzzy uncertainty can be clarified as follows. In any partitioning iteration k of APA, the bounds on f(x) within a partition α_j^k are not sharp, since partition α_j^k is represented by a finite set of sample points, S_j^k. Similarly, the set f(α*) (the set of functional values which are close to the global optimum, f*) is represented by a finite set and, therefore, also has fuzzy boundaries on f(x). Accordingly, APARS (APASA) uses a fuzzy approach in the assessment of sub-regions, achieved in the following manner. The functional value of every sample point is assigned a membership value which represents the degree to which it belongs to the set f(α*). Then, the partition evaluation measure of each sub-region is computed by combining these membership values. Demirhan and Özdamar [42] show that the fuzzy uncertainty involved in APARS is reduced with an increasing number of partitioning iterations and convergence is proved in the limit.
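The skeleton below corresponds to the select/re-partition/discard loop described above, with the partition evaluation measure and the sampler left as plug-in functions (the fuzzy measures FZ1 and FZ2 defined later would play the role of `measure`). Everything here, including the bisection of the longest edge and the discard threshold, is an illustrative reading of the description rather than the authors' implementation.

```python
import random

def apa(f, lb, ub, sampler, measure, iters=100, s=10, discard_below=0.2):
    """Skeleton of the adaptive partitioning loop (a sketch, not the authors' code).

    sampler(f, lo, hi, s): returns s (point, value) pairs drawn inside the box (lo, hi).
    measure(values, f_best, r_parent): returns the partition evaluation value r."""
    boxes = [(list(lb), list(ub), 0.0)]            # (lower corner, upper corner, parental r)
    best = None
    for _ in range(iters):
        scored = []
        for lo, hi, r_parent in boxes:
            pts = sampler(f, lo, hi, s)
            cand = max(pts, key=lambda p: p[1])
            best = cand if best is None or cand[1] > best[1] else best
            scored.append((lo, hi, measure([v for _, v in pts], best[1], r_parent)))
        scored.sort(key=lambda b: b[2], reverse=True)
        lo, hi, r = scored[0]                      # most promising partition: re-partition it
        d = max(range(len(lo)), key=lambda i: hi[i] - lo[i])   # bisect the longest edge (P = 2)
        mid = 0.5 * (lo[d] + hi[d])
        children = [(lo[:], hi[:d] + [mid] + hi[d + 1:], r),
                    (lo[:d] + [mid] + lo[d + 1:], hi[:], r)]
        kept = [b for b in scored[1:] if b[2] >= discard_below]  # keep only sufficiently promising regions
        boxes = children + kept
    return best

# Placeholder sampler: pure random sampling inside the box (the APARS choice).
def random_sampler(f, lo, hi, s):
    return [(p, f(p)) for p in
            ([random.uniform(a, b) for a, b in zip(lo, hi)] for _ in range(s))]
```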
Membership functions: Before describing the fuzzy measure used in the evaluation of each sample, we need to define a membership function, μ_{qjk}, which maps f(x_{qjk}) to the unit interval. A value μ_{qjk}, measuring the degree of membership of f(x_{qjk}) in the set f(α*), is assigned to every solution x_{qjk} in S_j^k. Practically, membership values represent the scaled position of f(x_{qjk}) against the best functional value obtained so far, F^k. Membership values are adaptive in the sense that μ_{qjk} changes as the accumulated information on f(α*) is expanded by re-sampling at each iteration k. The first membership function used in APA is the linear membership function (LMF), which provides a linear scaling factor for each f(x_{qjk}) in the range of all functional values obtained in the search process. The second one (GMF) is a modified version of the Gaussian membership function which shows an exponentially increasing bias towards a membership value of zero as the difference between f(x_{qjk}) and F^k increases. The mathematical expressions for LMF and GMF are given below:

μ_{qjk} = 1 - |F^k - f(x_{qjk})|/R^k    (LMF)

where R^k is the range of all functional values obtained up to iteration k, and

μ_{qjk} = exp(-[F^k - f(x_{qjk})]/σ_k)    (GMF)

where σ_k is the standard deviation of the functional values in all samples gathered in the current iteration k.

Fuzzy measures: The assessment of each partition α_j^k is achieved through a fuzzy measure r_j^k which is based on the membership values obtained in that particular region. To dispose of the unfavourable effects of poor solutions on r_j^k, not all but only the best s′ (s′ ≤ s) solutions are included in the fuzzy measure calculations (in this implementation s′ = 10). We denote the latter truncated set by S′_j^k. Membership values μ_{qjk} for all x_{qjk} ∈ S′_j^k are calculated and an α-cut on μ_{qjk} is applied, i.e., only those μ_{qjk} which are greater than or equal to 0.5 are aggregated into some functional form to result in r_j^k. However, since every undiscarded partition is subjected to re-sampling in each iteration, parental information (the information about a partition gathered in the previous iterations) is incorporated into r_j^k by taking the weighted average of r_j^k and r_j^{k-1}. Note that r_j^{k-1} belongs either to the same partition j, which was not discarded (and not partitioned) in iteration k-1, or to its parent, which was re-partitioned in iteration k-1.

The two fuzzy measures used in this study are derived from entropy measures described in Pal and Bezdek [43]. Entropy measures are functions of membership values and they reflect the average ambiguity in a given fuzzy set. In an entropy measure, the uncertainties related to both the set itself and to its complement are included. Thus, in our context, an entropy measure is usually factored into two parts: the part reflecting the presence of an element in a fuzzy set and the part reflecting its absence. A fuzzy set's uncertainty level is minimum when the membership values (the components of the entropy measure) approach either zero or one. The uncertainty level reaches its maximum when the membership values approach the value of 0.5. In the fuzzy measures defined in this particular context, the part of the entropy measure which reflects the uncertainty related to the complement of the set is excluded, and the information inherited from the parent is added. The first fuzzy measure, FZ1, is provided by Pal and Pal [44] and is based on an exponential gain function:
FZ1:  r_j^k = d [ Σ_{q=1}^{v} μ_{qjk} exp(1 - μ_{qjk}) ] / v + (1 - d) r_j^{k-1},

where v (v ≤ s′) is the number of solutions in S′_j^k with μ_{qjk} ≥ 0.5, and d (0 ≤ d ≤ 1) is a scaling factor for incorporating parental information into r_j^k. In this implementation, d = 0.2.
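Read literally, LMF and FZ1 combine into the small routine below, which scores one partition's sample. The α-cut at 0.5, s′ = 10 and d = 0.2 come from the text; the sample values, the handling of an empty α-cut and the function names are assumptions made for illustration.

```python
import math

def lmf(values, f_best, f_range):
    """Linear membership: mu = 1 - |F^k - f(x)| / R^k."""
    return [1.0 - abs(f_best - v) / f_range for v in values]

def fz1(values, f_best, f_range, r_prev, s_prime=10, d=0.2, alpha_cut=0.5):
    """FZ1 partition measure for one sub-region (a sketch based on the expression above)."""
    top = sorted(values, reverse=True)[:s_prime]          # keep only the best s' sampled values
    mus = [m for m in lmf(top, f_best, f_range) if m >= alpha_cut]  # alpha-cut at 0.5
    if not mus:                                           # no membership survives the cut
        return (1.0 - d) * r_prev                         # (assumed behaviour; not stated in the text)
    v = len(mus)
    return d * sum(m * math.exp(1.0 - m) for m in mus) / v + (1.0 - d) * r_prev

# Illustrative call for one partition (all numbers are made up):
sample_values = [3.2, 7.9, 6.5, 7.1, 2.0, 7.8]
print(fz1(sample_values, f_best=8.0, f_range=10.0, r_prev=0.4))
```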
The second fuzzy measure is adapted from the one described in Deluca and Termini [45]. The effects of μ_{qjk} on r_j^k are smoother in FZ2 than in FZ1:

FZ2:  r_j^k = d [ 1 + Σ_{q=1}^{v} μ_{qjk} ln(μ_{qjk}) / v ] + (1 - d) r_j^{k-1}.

The two combinations (LMF + FZ1) and (GMF + FZ2) have proved to be satisfactory in terms of solution quality in a previous investigation conducted by Özdamar and Demirhan [46] on the effects of various membership functions and fuzzy measures on the performance of APARS.

Unlike other methods, APARS (APASA) is perfectly amenable to parallelization, because the data to be transmitted from the master computer to the slaves before the search starts in each region consist of the regional bounds and the number of moves to be carried out. The data transmitted from the slaves to the master are composed of the best function values obtained in each regional search, among which F^k is identified by the master. F^k is then forwarded from the master to the slaves, which use it to calculate their r_j^k. The r_j^k values are then transmitted to the master, which decides on the region to be re-partitioned and triggers the search in the next level. Thus, on each slave the search in each region can be conducted independently. Both the CPU time and the memory requirements of APARS (APASA) are reduced by the number of slave processors. On the other hand, in clustering methods only the local search part can be parallelized, and our experimentation as well as that of other researchers indicates that the sampling part for obtaining a uniform cover takes a lot of CPU time. As for SA, further theoretical work is needed to benefit fully from parallelization [15].

In this study, we differentiate APARS (APASA) applications by the number of subdivisions, P, and the method of sampling. Furthermore, in some applications the combination (LMF + FZ1) is used whereas in others (GMF + FZ2) is used for selecting the most promising partition. The following describes the characteristics of five APASA/APARS applications in these respects.

Method-1 (APASA-I): In APASA-I, the feasible region is divided into P = 2^n subspaces (N1 = n; each dimension's bound length is bisected and all combinations of bounds are generated, each corresponding to an area), each of which is subjected to SA-III, and the s′ (s′ = 10) best solutions encountered during the search are included in the membership function, LMF. FZ1 is used as the fuzzy measure. In APASA-I, we permit at most 10 partitioning iterations in the search tree (k ≤ 10) and functions of up to five variables (n ≤ 5) are solved, due to the excessive memory requirements caused by the 2^n new regions added at each iteration. The number of SA-III moves, Moves_j^k, permitted in a region j at iteration k is given by the formula

Moves_j^k = MaxMove × 0.5^k (m(α_j^k)/m(α^k)),

where m(α^k) is the Lebesgue measure of the union of all non-discarded regions at iteration k. The search ends either at the tenth iteration or when the number of functional evaluations surpasses MaxMove.

Method-2 (APASA-II): Since it is prohibitive to solve functions with many variables by APASA-I (in terms of computer memory and the excessive number of function evaluations), instead of dividing the selected region into 2^n subregions, in APASA-II the selected region is divided into two subregions (P = 2, N1 = 1) by bisecting the dimension with the longest bound length in that region. The latter dimension selection is proper in terms of convergence, as indicated by Ratz and Csendes [21], since all dimension sizes tend to be reduced in a balanced manner. Here, Moves_j^k = d_j^k m(α_j^k), where d_j^k is the density pertaining to region j at iteration k and its value varies such that no region is assigned a Moves_j^k value of more than 10,000. At most 100 iterations are permitted, because the maximum number of regions at a given iteration k is equal to k+1 and is therefore linear in k. APASA-II's memory requirements are very low, at the expense of larger areas to search through in the early iterations as compared to APASA-I. As in APASA-I, the combination (LMF + FZ1) is used.

Method-3 (APASA-III): In APASA-III, 2^{N1} new subregions emanate from the selected region, with N1 less than n but greater than 1. Thus, N1 out of n dimensions are bisected at each iteration, where the dimensions are selected in descending order of bound length. APASA-III is a mid-way solution between the two earlier versions of APASA. In this implementation, N1 is specified as 5. The remaining parameters are the same as in APASA-II, with the exception that at most 10 iterations are permitted. (The combination (LMF + FZ1) is used.)

Method-4 (APASA-IV): All parameters of APASA-IV are the same as in APASA-II, with the exception that the combination (GMF + FZ2) is used to determine the promising partition. The sampling method is again SA-III.

Method-5 (APARS): Similar to APASA-IV, in APARS the partition selection membership function/fuzzy measure combination is (GMF + FZ2). However, here the sampling procedure is pure random search in each partition area. Furthermore, the number of random samples collected from each partition is fixed at 200 and the number of partitioning iterations is limited to 100. The latter number proves to be satisfactory for most problems [46].

2.4. Adaptive partitioning algorithms based on interval estimation

Pinter [23] provides an excellent survey on partition assessment operators used in a generic partitioning algorithm, PAS, which is equivalent to the partitioning scheme in APARS (APASA) except for the fact that partitions are not discarded in PAS. Pinter [23] formulates necessary and sufficient convergence conditions for these operators. Here, six operators described in detail by Pinter are selected for computational comparison. While some of these operators are based on a Lipschitzian overestimate of f* in α_j^k, others reflect the probability of finding an improved estimate of f* on α_j^k. Thus, all partition measures considered in this class are based on a prior class of models for f(x). Originally, these partition measures were developed for univariate functions. However, here we adapt them to multivariate functions as described in Pinter [23] and imbed them into our own partitioning scheme APA as partition evaluation measures r_j^k, instead of our fuzzy measures. The mathematical expressions for these measures are given below. In all of the following assessment schemes, we fix P to two, similar to APASA-II, and we do not discard any partitions during the iterations, similar to PAS.

Method-1. DPS (Danilin and Piyavskii [25]; Shubert [26]): In this, and in all following measures (except MTZ), two functional evaluations are required in each partition, the first being at the upper vertex U_j^k of the diagonal of the hypercube defined by region α_j^k and the second at the lower vertex, L_j^k. DPS assumes a Lipschitzian prior model for f(x), and the Lipschitz constant, which is a parameter of this model, is adaptively altered in each iteration:

r_j^k = 0.5 L ||U_j^k - L_j^k|| - 0.5 (f(U_j^k) + f(L_j^k)),
where L is the Lipschitz constant or a known overestimate of it, and ||·|| is the Euclidean distance between two coordinates. L is adaptively calculated in each iteration by taking the maximum of the slopes over the dimensions of the region and averaging these maxima over all existing partitions.

Method-2. STR (Strongin [27]): STR also assumes a Lipschitzian model:

r_j^k = g ||U_j^k - L_j^k|| + [f(U_j^k) - f(L_j^k)]^2 / (g ||U_j^k - L_j^k||) - 2 [f(U_j^k) + f(L_j^k)],

where g > 4L.

Method-3. KAB (Kushner [24]; Archetti and Betro [33]): KAB assumes a Wiener process for f(x) where the conditional expectation and variance are reduced to a piecewise linear conditional mean and a piecewise quadratic variance:

r_j^k = exp[-2 (F^k - c - f(U_j^k)) (F^k - c - f(L_j^k)) / (σ_k ||U_j^k - L_j^k||)],

where c and σ_k are parameters. Here, we take c as the arithmetic mean of the functional evaluations obtained in iteration k and σ_k as their standard deviation.

Method-4. MTZ (Mockus, Tiesis and Zilinskas [34]): MTZ and the following two measures are based on a Gaussian model and they are constructed to minimize the average sum of lengths of intervals which may contain f*:

r_j^k = max_{x_{qjk}} σ_{qjk} [1 - N((F_j^k - λ_{qjk})/σ_{qjk})],

where

λ_{qjk} = (f(U_j^k) ||x_{qjk} - L_j^k|| + f(L_j^k) ||U_j^k - x_{qjk}||) / ||U_j^k - L_j^k||

and

σ_{qjk} = σ_k ||x_{qjk} - L_j^k|| ||U_j^k - x_{qjk}|| / ||U_j^k - L_j^k||.

Here, x_{qjk} is an element of a set of equidistant points on the diagonal joining U_j^k and L_j^k, F_j^k is the best functional value found in the latter set, and N(·) is the standard normal distribution function. In addition to the two functional evaluations at U_j^k and L_j^k, we calculate f(x) for four equidistant points on the diagonal.

Method-5. ZIL (Zilinskas [28]):

r_j^k = 1 - N([F′_j^k ||U_j^k - L_j^k|| - f(U_j^k) ||a_j^k - L_j^k|| - f(L_j^k) ||U_j^k - a_j^k||] / [σ_k (||a_j^k - L_j^k|| ||U_j^k - a_j^k|| ||U_j^k - L_j^k||)^{1/2}]).

Here, F′_j^k = F_j^k - ε, and

a_j^k = 0.5 (U_j^k + L_j^k) - ||U_j^k - L_j^k|| (f(U_j^k) - f(L_j^k)) / [2 (f(U_j^k) + f(L_j^k) - 2 F′_j^k)].

Method-6. BOE (Boender [35]):

r_j^k = max{f(U_j^k), f(L_j^k)} + (0.5 π ||U_j^k - L_j^k||)^{1/2} σ_k exp[(f(U_j^k) - f(L_j^k)) / (2 σ_k ||U_j^k - L_j^k||)] [1 - N(-|f(U_j^k) - f(L_j^k)| / (σ_k ||U_j^k - L_j^k||))].
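As a concrete example of how such a measure is used, the snippet below evaluates the DPS value for a set of sub-boxes, with the Lipschitz constant estimated adaptively from the boxes themselves. The formula follows the reconstruction given above; the simplified slope estimate along the diagonal (rather than per dimension), the example boxes and all names are assumptions.

```python
import math

def dps_measure(u, l, f_u, f_l, lip):
    """DPS partition value r = 0.5 * L * ||U - L|| - 0.5 * (f(U) + f(L)).

    u, l: upper and lower vertices of the sub-box diagonal; lip: current Lipschitz estimate."""
    return 0.5 * lip * math.dist(u, l) - 0.5 * (f_u + f_l)

def adaptive_lipschitz(boxes):
    """Crude adaptive L: slope of f along each box diagonal, averaged over all boxes.

    (The text averages per-dimension maximum slopes; the diagonal slope is a simplification.)"""
    slopes = [abs(f_u - f_l) / max(math.dist(u, l), 1e-12) for u, l, f_u, f_l in boxes]
    return sum(slopes) / len(slopes)

# Two hypothetical sub-boxes of a 2-variable problem with their vertex evaluations:
boxes = [((1.0, 1.0), (0.0, 0.0), -2.5, -0.1),
         ((2.0, 2.0), (1.0, 1.0), -6.0, -2.5)]
L_hat = adaptive_lipschitz(boxes)
print([round(dps_measure(u, l, fu, fl, L_hat), 3) for u, l, fu, fl in boxes])
```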
2.5. Deterministic method

Method-1. Inductive Search (IS) (Bilchev and Parmee [37]): In inductive search, the function is optimized in each dimension one at a time and sequentially (by ignoring the remaining variables in the function and fixing previously optimized variables to their best found values), using Brent's parabolic interpolation routine designed for single-dimension search [47]. The solution is then improved by the hill-climbing procedure, with the starting solution consisting of the best values found for the previous dimensions and the currently considered dimension. Next, having fixed the values of the optimized dimensions, the next dimension is considered for optimization by Brent's method. The procedure continues until all dimensions are considered.
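A stripped-down version of this dimension-by-dimension strategy is sketched below, using SciPy's bounded Brent minimizer as a stand-in for the parabolic interpolation routine of [47]. It omits the hill-climbing refinement and, unlike the description above, simply holds the not-yet-optimized variables at their current values; SciPy's availability, the starting point and the example function are assumptions.

```python
from scipy.optimize import minimize_scalar

def inductive_search(f, lb, ub, x0=None):
    """Dimension-by-dimension sketch of inductive search (IS); not the authors' implementation."""
    n = len(lb)
    x = list(x0) if x0 is not None else [0.5 * (lb[i] + ub[i]) for i in range(n)]
    for i in range(n):
        def along_i(v, i=i):
            y = x[:]
            y[i] = v                     # vary only dimension i; the others stay fixed
            return f(y)
        res = minimize_scalar(along_i, bounds=(lb[i], ub[i]), method="bounded")
        x[i] = res.x                     # fix dimension i at its best found value
    return x, f(x)

# Example on a separable quadratic (written for minimization, following SciPy's convention):
quad = lambda x: sum((v - 1.0) ** 2 for v in x)
print(inductive_search(quad, lb=[-5.0] * 4, ub=[5.0] * 4))
```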
3. Computational results

All of the methods indicated in the previous section are coded in Borland C++ running under Windows 95. The memory requirements of all methods are minimal (even 8 MB RAM would be sufficient to run the executable code) with the exception of TOLS and its variants, MLSL variants and APA variants. Due to the ease of parallelization, the APA variants have been implemented using the PVM daemon (installed on Sun Solaris workstations), which searches for free processors in the network to conduct the search in each independent region. All methods are terminated after 1000n function evaluations, and 20 independent runs are executed for each method with different random seeds, except for the interval estimation based partitioning methods. The operations carried out by the SA variants and APA variants are very simple and the only burden for these methods is the function evaluations. The latter is not true for the TOLS and MLSL variants, due to the computational burden of the initial sampling phase. These effects are observed in the CPU times, which are measured on a Pentium 100 MHz PC with 32 MB RAM (Table 1).

In Table 1, the quality of solutions is indicated in terms of the absolute deviation from the known global optimum. The best deviation observed during 20 runs (each with a different seed) is reported. The average absolute deviation, the standard deviation and the maximum absolute deviation from the global optima are indicated for all methods in Tables 1-2 and Table 3, for the 77 small and the 18 larger size test functions, respectively. The number of optimum solutions found by each method (Copt) is also indicated in the results, to a precision of 10^{-3}. In order to provide an average case analysis, all results are provided again in Tables 1 and 3 by excluding three outlier problems, C23 (the Trid), C24 (Schwefel's Sine Root) and C63 (Ingber's f0), where most of the methods' performances deteriorate particularly, except for ASA and APASA-I. C23, the Trid, is a convex function of 10 variables. It is best solved by gradient methods, and since the techniques described here are gradient-blind, it is only natural that their performance deteriorates on the Trid. C24 contains a second local minimum which is very far away from the global minimum, and most methods are trapped by this second minimum. C63, which contains the largest number of local minima among all test functions, is the most difficult of all.

In both Tables 1 and 3, some promising methods are run with 2000n function evaluations in order to observe the rate of convergence. TOASA seems to benefit the most from the increased number of function evaluations as compared to SA-III and ASA, due to the augmented number of moves dedicated to local search applied to the topographical minima.
Table 1 Results on 77 small to moderate size test functions up to 10 variables Method
SA-I
SA-II
SA-III
TOLS
TOSA
TOASA
1000n iterations Avg Stdev Max Copt CPU (s)
2.290 8.283 64.680 49 0.49
15.347 109.667 968.235 48 0.50
1.256 3.479 17.68 52 0.48
285.79 1929.655 16,806.83 11 0.93
66.423 441.486 3736.354 43 0.89
6.613 32.741 221.654 47 0.61
0.994 3.068 17.689
40.135 316.696 2745.175
50.918 431.320 3736.354
4.057 25.679 221.654
TMLSL 67.105 307.393 1817.936 23 1.85
STSA-I 67.429 350.650 2550.684 38 0.48
STSA-II 34.290 194.927 1353.150 40 0.50
ASA 1.154 3.213 17.689 53 0.57
ASA-B 22.184 124.073 881.854 47 0.53
27.782 209.821 1817.939
10.201 61.829 530.880
2.992 9.214 60.318
1.097 3.222 17.689
10.844 76.790 664.190
APASA-II 1.123 3.403 17.689 56 0.56
APASA-III 15.984 110.069 936.728 48 3.52
APASA-IV 1.45565 3.90557 17.6889 51 0.65
APARS 208.366 1581.58 13,908.3 28 0.32
IS 14.579 96.880 785.082 38 0.45
1000n iterations (without 3 outliers) Avg 1.061 0.881 Stdev 3.221 2.979 Max 17.689 17.689
0.476 2.038 16.800
1.10641 3.39875 17.6889
7.70768 46.5556 400.895
0.958 2.494 14.275
2000n iterations Method Avg Stdev Max Copt
TOASA 3.148 17.417 150.167 47
ASA 1.074 3.174 17.689 53
APASA-I 0.970 3.147 17.689 46
APASA-II 0.970 2.976 17.689 56
1.007 3.148 17.689
1.064 3.226 17.689
0.979 3.171 17.689
0.875 2.975 17.689
1000n iterations (without 3 outliers) Avg 1.275 2.041 Stdev 3.816 6.695 Max 19.362 49.378 Method Avg Stdev Max Copt CPU (s) 1000n iterations Avg Stdev Max Method Avg Stdev Max Copt CPU (s)
MLSL 397.759 3227.91 28,495.81 22 1.08 (without 3 outliers) 3.240 8.402 46.889 APASA-I 1.051 3.197 17.689 39 0.62
SA-I 2.030 8.830 73.002 50
SA-III 1.151 3.238 17.689 53
2000n iterations (without 3 outliers) Avg 0.954 0.992 Stdev 3.088 3.129 Max 17.689 17.689 2000n iterations Method Avg Stdev Max Copt
with outliers APASA-IV 0.39406 8.68095 17.6889 63
APARS 31.91 208.6 1708 35
In IS some problem results are missing.
without 3 outliers APASA-IV APARS 0.12689 1.62 8.68496 6.03 17.6889 44.92
Table 2
Results of interval estimation based adaptive partitioning algorithms on 77 test functions (20 × 2000n iterations)

Method   Avg       Stdev    Max        Copt   CPU (s)
DPS      83.44     414.64   2640.10    32     2.4
STR      109.61    500.11   3074.32    25     1.65
KAB      713.28    4865.9   40,008.45  23     1.55
MTZ      642.25    4787.2   40,008.45  24     1.77
ZIL      594.42    4584.0   38,325.61  22     1.86
BOE      615.07    4785.2   40,008.45  22     1.85

Without 3 outliers:
Method   Avg       Stdev    Max
DPS      50.5974   326.6    2640.1
STR      78.7052   426.4    3074.32
KAB      617.416   4886.7   40,008.4
MTZ      606.761   4886.8   40,008.4
ZIL      582.168   4681.1   38,325.6
BOE      603.74    4487.0   40,008.4
Table 3
Results on 18 test functions with more than 10 variables up to 400 variables

1000n iterations:
Method      Avg       Std        Max        Copt   CPU (s)
SA-I        20.242    46.398     199.702    3      83.99
SA-III      15.154    31.002     128.038    3      80.92
ASA         378.367   749.194    2402.393   4      107.56
TOASA       518.416   1198.110   4592.130   4      71.08
APASA-II    100.265   260.263    1065.299   3      170.12
APASA-IV    182.784   504.812    2109.04    3      173.4

2000n iterations:
Method      Avg       Std        Max        Copt
SA-I        17.026    37.900     161.481    5
SA-III      12.630    19.856     63.106     5
ASA         352.340   738.473    2747.921   5
TOASA       334.793   716.855    2271.618   6
APASA-II    100.954   295.011    1276.991   4
APASA-IV    154.825   422.102    1887.231   4

Note: Executed on two processors.
Similarly, APARS shows considerable improvement, due to the increased effectiveness of more densely sampled partitions. All interval-based adaptive partitioning algorithms in Table 2 are run with 20 × 2000n function evaluations, because the initial random seed does not affect their results and a fair comparison should be made with the other methods. In order to enhance these interval arithmetic based approaches, we permit the latter algorithms to run to a partitioning depth of 5000. However, it is observed in the experimentation that at most 1000 partitioning iterations are made. The best interval estimation measures are the two Lipschitzian measures, DPS and STR. Yet, their performance is far inferior to those of the APASA variants and APARS with 2000n function evaluations. The inferiority is more acute when the outlier problems are not included in the results. Furthermore, the results should be viewed in the light of the fact that some of the functions' global optima lie on the origin (one of the boundaries of the initial partitions generated at iteration zero).
Although the interval estimation measures are naturally placed in an advantageous position by this fact (their Copt values are rather high despite their inferior average performance), their performance is remarkably inferior to that of APA with the fuzzy partition assessment measures.

Without excluding the outlier problems, we can make the following comments. APASA-I is applied to problems with up to and including 5 variables, and consequently 10 of the more difficult problems are omitted from its reported results. Therefore, it is not surprising to see that APASA-III's performance (in solving all 77 problems) is quite inferior to that of APASA-I, and that the situation is reversed when the outlier problems (all having more than 5 variables) are omitted. Comparing the performance of APASA-II against APASA-III on the outlier problems, we observe that although APASA-III divides the feasible region into finer subspaces, many function evaluations are wasted in areas that are meaningless in terms of the global optimum in the first iterations of the search, which leads to a deterioration in performance. This effect shows itself specifically in problem C23, the Trid, where APASA-III results in a deviation of 936.72 whereas APASA-II results in a deviation of 5.15 (which is in fact the second best result after ASA's deviation of 3.32). As for comparing the two membership function/fuzzy measure combinations, although there is no significant difference between the results of APASA-II and APASA-IV when the number of moves is limited to 1000n, this difference becomes marked in favour of APASA-IV when the number of moves is doubled. This result may be due to the subdued effect of membership values in FZ2: when the number of samples per partition is increased, an averaging effect is obtained and more reliable decisions can be made.

A fact which should be mentioned concerns the deterministic IS method. Although we managed to verify the results of problems C14 to C18 reported in Bilchev and Parmee [37], we could not solve the hard permutation problems C20 and C21, nor verify the results of IS given at the referenced web site http://solon.cma.univie.ac.at/~neum/glopt. The reason is that, in some problems, we could not find three points satisfying convexity conditions locally to start Brent's method. The other problems which our code of IS cannot solve are C46, C48, C72 and C73.

Comparisons between ASA and TOASA, and between SA-I and TOSA, demonstrate that identifying the topographical minima wastes function evaluations as far as SA-I and ASA are concerned. However, as Ali and Storey [11] indicate, this additional phase in TMLSL does improve the performance of MLSL, specifically on the outlier problems. The considerably poor performances of MLSL and TOLS are due to problems C23, 24, 63 and 68, whereas MLSL is a good method on the moderately difficult problems.

Some remarks should be made about the problem specific results obtained in the experiments. In general, functions C16, 21, 23, 24, 27, 55, 63 and 68 are the trouble makers for most of the methods. For C16 (Shekel's Foxholes), APASA-IV finds the exact optimum within 2800 functional evaluations whereas all other methods (apart from the other APA variants) result in a deviation of 7.21.
For C21 (PERM0), which is a hard optimization problem where small differences from the optimal variable values lead to large differences from the global optimum functional value, SA-I, all APA variants with SA-III, and ASA provide the best results (a deviation less than 0.40). C24 (Schwefel's Sine Root) is a function where deviations may rise up to 1802. Here, SA-III and ASA end up with the best deviations (less than 0.3 within 10,000 function evaluations). For C27 (Ackley's), SA-I, SA-III and APASA-II provide the smallest deviations (less than 0.2) whereas ASA gives a deviation of 5.86.
C55, which is the popular Goldstein–Price test function, is solved to the exact global optimum only by the four APA variants with SA-III, in less than 50 function evaluations. C63 is a function with very many local minima, and deviations rise up to 28,495 (in MLSL). APASA-I and IS find a deviation of 0.41 for this difficult function, and APASA-II and ASA result in deviations of 1.04 and 1.35 within 146 and 3139 function evaluations, respectively. The latter ASA result, as well as the results for C64 and C77, are consistent with those of Ingber [36], although we did not fine tune any parameters in our implementation of ASA. As for C68, only the four APA variants with SA-III and IS find a result of three-digit accuracy.

Considering the results of STSA-I, Dekkers and Aarts [14] claim that STSA-I performs as well as MLSL (although sometimes worse), but their tests involve a small set of test problems. In Table 1, we observe that the performance of STSA-I is considerably superior to that of MLSL when the outliers are included in the results. This is due to the fact that its performance is comparatively better on problems C63 and 68. On the Trid function, STSA-I results in the worst deviation of 2550.68. The bad performance of all methods on the Trid function is not surprising, due to the lack of gradient information. On the other hand, STSA-II, where procedure LOCAL is inserted, is considerably superior to STSA-I both when the outlier problems are included and when they are excluded. A similar conclusion may be drawn for SA-I and SA-III. Therefore, we may conclude that the random hill-climbing procedure LOCAL remedies most of the handicaps of the methods within which it is imbedded.

To summarize, in the first set of problems, SA-III, ASA and the APA variants are successful in general. Furthermore, although it has a linear partitioning scheme, APASA-II, where SA-III is imbedded, somewhat improves on SA-III. It is important to note that no fine-tuning of parameters was carried out for any of the methods and gradient information is not exploited at all. Therefore, the average performance of all methods might be improved considerably if these two channels of experimentation were taken up.

In larger size problems (Table 3), only the methods which are successful in the first set of problems are compared. However, the performances of APASA-II, APASA-IV, TOASA and ASA are far from comparable with those of SA-I and SA-III, considering the results in Table 3. It seems that ASA's rate of convergence slows down in very large feasible spaces due to the cyclic visits to each dimension. The reason why APASA-II's performance is much worse than that of SA-III is that APASA-II loses many function evaluations in partitions which do not contain the global optimum before these partitions are assessed and discarded. In large problems, where it is more difficult to converge to the optimum, these losses affect the performance of the algorithm more acutely than in smaller problems. The freedom of search in SA-I and SA-III becomes very effective in this respect. Again, SA-III's performance could be improved by fine tuning parameters such as b, the frequency of applying procedure LOCAL, the neighbourhood perimeter searched in LOCAL, etc.
4. Conclusion

We propose various new approaches for global optimization in the areas of clustering, simulated annealing and adaptive partitioning, and compare them with well established methods in their respective areas. The test functions used in the comparison are collected from a wide range of references and they range from small to large size test functions with up to 400 variables.
Thus, this report should provide a more robust basis for introducing new methods and/or improving existing ones in the field of global optimization. Computational experiments demonstrate that more research should be carried out to ameliorate the performance of probabilistic search algorithms in dealing with large scale nonlinear functions.
Appendix

The test functions below are referred to in the text as C1-C95.

No. | Name | Source / property | Reported by | No. of vars. | Bounds (lower, upper) | Global optimum
1 | Complex | Press et al. [47] | Androulakis and Vrahatis [48] | 2 | (-2, 2) | 0
2 | Stenger | Stenger [49] | [48] | 2 | (-1, 4) | 0
3 | Himmelblau | Botsaris [50] | [48] | 2 | (-6, 6) | 0
4 | Helical Valley | Moré et al. [51] | [48] | 3 | (-2, 2) | 0
5 | Brown almost linear-3 | Moré et al. [51] | [48] | 3 | (-2, 4) | 1
6 | Brown almost linear-4 | Moré et al. [51] | [48] | 4 | (-2, 4) | 1
7 | Extended Kearfott | Kearfott [52] | [48] | 4 | (-3, 4) | 0
8 | Sine Envelope | Schaffer [53]; increasing barrier between minima, multimodal, symmetric | Srinivas and Patnaik [54] | 2 | (-100, 100) | 0
9 | | Srinivas and Patnaik [54]; decreasing barrier between minima, multimodal | [54] | 2 | (-100, 100) | 0
10 | Epistacity, k=5 | Srinivas and Patnaik [54]; multimodal, 2 local optima | [54] | 2 | (-0.5, 0.5) | 0
11 | Epistacity, k=5 | Srinivas and Patnaik [54]; 4 local optima | [54] | 4 | (-0.5, 0.5) | 0
12 | Epistacity, k=5 | Srinivas and Patnaik [54]; 5 local optima | [54] | 5 | (-0.5, 0.5) | 0
13 | Spike | Michalewicz [55]; many spikes with a slight increasing tendency towards the global maximum | Michalewicz [55] | 2 | X1: (-3, 12.1), X2: (4.1, 5.8) | 38.85 (max)
14 | Sphere | http://iridia.ulb.ac.be/langerman/ICEO.html; unimodal | Bilchev and Parmee [37] | 5 | (-5.12, 5.12) | 0
15 | Griewank | http://iridia.ulb.ac.be/langerman/ICEO.html | [37] | 5 | (-600, 600) | 0
16 | Shekel's Foxholes | http://iridia.ulb.ac.be/langerman/ICEO.html; very spiky, m=30 | [37] | 5 | (0, 10) | -10.40
17 | Michalewicz | http://iridia.ulb.ac.be/langerman/ICEO.html | [37] | 5 | (0, π) | -4.687
18 | Langerman | http://iridia.ulb.ac.be/langerman/ICEO.html; valley between three spikes | [37] | 5 | (0, 10) | -1.5
19 | Sphere | Same as 14 | [37] | 3 | (-5.12, 5.12) | 0
20 | Permutation, Beta=0.005 | Neumaier web site; many local minima, very hard problem, small discrepancies in variable values lead to large differences in function value | Neumaier, solon.cma.univie.ac.at/~neum/glopt/myproblems | 4 | (-4, 4) | 0
21 | Permutation, Beta=100 | Neumaier web site; many local minima, very hard problem, small discrepancies in variable values lead to large differences in function value | Neumaier, solon.cma.univie.ac.at/~neum/glopt/myproblems | 10 | (-1, 1) | 0
22 | Powersum | Neumaier web site; singular minimum among very flat valleys | Neumaier, solon.cma.univie.ac.at/~neum/glopt/myproblems | 8 | (0, 2) | 0
23 | Trid | Neumaier web site; convex, quadratic, tridiagonal Hessian, best solved by gradient | Neumaier, solon.cma.univie.ac.at/~neum/glopt/myproblems | 10 | (-100, 100) | -210
24 | Schwefel's Sine Root | web site MATLAB/TEST/Lazauskas; second local minimum, very far from the global minimum, traps the search | web site MATLAB/TEST/Lazauskas | 10 | (-500, 500) | 0
25 | Rastrigin | web site MATLAB/TEST/Lazauskas; highly multimodal and difficult | web site MATLAB/TEST/Lazauskas | 10 | (-5.12, 5.12) | 0
26 | Rosenbrock | Rosenbrock [56]; long curved, only slightly decreasing valley, unimodal | web site MATLAB/TEST/Lazauskas | 4 | (-2.048, 2.048) | 0
27 | Ackley | web site MATLAB/TEST/Lazauskas | web site MATLAB/TEST/Lazauskas | 10 | (-30, 30) | 0
28 | Griewank | http://iridia.ulb.ac.be/langerman/ICEO.html; 4 local minima | http://iridia.ulb.ac.be/langerman/ICEO.html | 10 | (-600, 600) | 0
29 | Three-Hump Camel Back | Levy et al. [57] | Jansson and Knüppel [58] | 2 | (-2, 4) | 0
30 | Levy 1 | Levy et al. [57]; 3 local minima | [58] | 1 | (-4, 4) | 7
31 | Levy 2 | Levy et al. [57]; 19 local minima | [58] | 1 | (-10, 10) | -14.508
32 | Levy 3 | Levy et al. [57]; 760 local minima | [58] | 2 | (-10, 10) | -176.541
33 | Levy 5 | Levy et al. [57]; 760 local minima | [58] | 2 | (-10, 10) | -176.137
34 | Levy 8 | Levy et al. [57]; 125 local minima | [58] | 3 | (-10, 10) | 0
35 | Levy 9 | Levy et al. [57]; 625 local minima | [58] | 4 | (-10, 10) | 0
36 | Levy 10 | Levy et al. [57]; 10^5 local minima | [58] | 5 | (-10, 10) | 0
37 | Levy 11 | Levy et al. [57]; 10^8 local minima | [58] | 8 | (-10, 10) | 0
38 | Levy 12 | Levy et al. [57]; 10^10 local minima | [58] | 10 | (-10, 10) | 0
39 | Levy 13 | Levy et al. [57]; 900 local minima | [58] | 2 | (-10, 10) | 0
40 | Levy 14 | Levy et al. [57]; 2700 local minima | [58] | 3 | (-10, 10) | 0
41 | Levy 15 | Levy et al. [57]; 71000 local minima | [58] | 4 | (-10, 10) | 0
42 | Levy 16 | Levy et al. [57]; 10^5 local minima | [58] | 5 | (-5, 5) | 0
43 | Levy 18 | Levy et al. [57]; 10^8 local minima | [58] | 7 | (-5, 5) | 0
44 | Beale | Schwefel [59] | [58] | 2 | (-4.5, 4.5) | 0
45 | Schwefel 3.1 | Schwefel [59] | [58] | 3 | (-10, 10) | 0
46 | Perturbed Schwefel | Schwefel [59] | [58] | 3 | (-10^3, 10^3) | 0
47 | Booth | Schwefel [59] | [58] | 2 | (-5, 5) | 0
48 | Box 3D | Schwefel [59] | [58] | 3 | (-10, 30) | 0
49 | Kowalik | Schwefel [59] | [58] | 4 | (0, 0.42) | 0.000307
50 | Powell | Schwefel [59]; singular, Hessian at origin | [58] | 4 | (-2, 4) | 0
51 | Matyas | Schwefel [59] | [58] | 2 | (-30, 30) | 0
52 | Schwefel 3.7 | Schwefel [59]; singular Hessian at X* = 0 | [58] | 5 | (-0.005, 0.89) | 0
53 | Schwefel 3.2 | Schwefel [59]; singular Hessian at X* = 0 | [58] | 2 | (-10, 10) | 0
54 | Branin | Törn and Zilinskas [60]; 3 local minima, slow gradient | [60] | 2 | X1: (-5, 10), X2: (0, 15) | 0.3978
55 | Goldstein-Price | Törn and Zilinskas [60]; 4 local minima | [60] | 2 | (-2, 2) | 3
56 | Hartman | Törn and Zilinskas [60]; 4 local minima | [60] | 3 | (0, 1) | -3.862782
57 | Six-Hump Camel Back Valley | Törn and Zilinskas [60] | [60] | 2 | (-5, 5) | -1.031628
58 | Rastrigin | Törn and Zilinskas [60] | [60] | 2 | (-1, 1) | -2
59 | Shekel 5 | Törn and Zilinskas [60]; 5 local minima | [60] | 4 | (0, 10) | -10.153
60 | Shekel 7 | Törn and Zilinskas [60]; 7 local minima | [60] | 4 | (0, 10) | -10.402
61 | Shekel 10 | Törn and Zilinskas [60]; 10 local minima | [60] | 4 | (0, 10) | -10.536
62 | Hartman | Törn and Zilinskas [60]; 4 local minima | [60] | 6 | (0, 1) | -3.32236
63 | Ingber's | Corana et al. [61]; 10^20 local minima, gradient search is useless | Ingber [36] | 4 | (-1000, 1000) | 0
64 | p3 (2D Schubert) | Aluffi-Pentini et al. [62]; 760 local, 18 global minima | Dekkers and Aarts [14] | 2 | (-10, 10) | -186.7309
65 | p8 | Aluffi-Pentini et al. [62]; 5 local minima | [14] | 3 | (-10, 10) | 0
66 | p16 | Aluffi-Pentini et al. [62]; 15 local minima | [14] | 5 | (-5, 5) | 0
67 | p22 | Aluffi-Pentini et al. [62]; origin is a local minimum | [14] | 2 | (-20, 20) | -24775
68 | WingoA | Wingo [29] | Breiman and Cutler [63] | 1 | (3, 17) | -28.1617
69 | WingoB | Wingo [29] | [63] | 1 | (2, 26) | -73.452
70 | WingoC | Wingo [29] | [63] | 1 | (4.1, 2746) | -407.61
71 | Exp 2 | Breiman and Cutler [63] | [63] | 2 | (-1, 1) | 0.3678
72 | Exp 4 | Breiman and Cutler [63] | [63] | 4 | (-1, 1) | 0.1353
73 | Cos 2 | Breiman and Cutler [63] | [63] | 2 | (-1, 1) | -2.2
74 | Cos 4 | Breiman and Cutler [63] | [63] | 4 | (-1, 1) | -4.4
75 | Easom | Easom [64]; a singleton maximum in a horizontal valley | Stuckman and Easom [65] | 2 | (-100, 100) | -1
76 | De Jong t2 | De Jong [66] | Wodrich and Bilchev [67] | 2 | (-2.048, 2.048) | -3906.98
77 | De Jong t3 (Step Function) | De Jong [66] | http://iridia.ulb.ac.be/langerman/ICEO.html | 5 | (-5.12, 5.12) | -30
78 | Baluja-1 | Baluja [68]; high interdependence among variables | Wodrich and Bilchev [67] | 100 | (-2.56, 2.56) | -100,000
79 | Baluja-2 | Baluja [68]; high interdependence among variables | [67] | 100 | (-2.56, 2.56) | -100,000
80 | Baluja-3 | Baluja [68]; no interdependence among variables | [67] | 100 | (-2.56, 2.56) | -100,000
81 | Sphere | Same as 14 | | 30 | (-5.12, 5.12) | 0
82 | Michalewicz | Same as 17 | | 10 | (0, π) | -9.66
83 | Powersum | Same as 22 | | 64 | (0, 2) | 0
84 | Schwefel's Sine Root | Same as 24 | | 20 | (-500, 500) | 0
85 | Rastrigin | Same as 25 | | 20 | (-5.12, 5.12) | 0
86 | Rastrigin | Same as 25 | | 50 | (-5.12, 5.12) | 0
87 | Griewank | Same as 25 | | 20 | (-600, 600) | 0
88 | Griewank | Same as 25 | | 100 | (-600, 600) | 0
89 | Griewank Original | Griewank [69] | Törn and Zilinskas [60] | 50 | (-100, 100) | 0
90 | Schwefel 3.7 | Same as 52 | | 10 | (-0.002, 0.63) | 0
91 | Schwefel 3.7 | Same as 52 | | 30 | (-0.005, 0.36) | 0
92 | Ackley's Fnc. | Same as 27 | | 30 | (-30, 30) | 0
93 | Ackley's Fnc. | Same as 27 | | 100 | (-30, 30) | 0
94 | Ackley Fnc. | Same as 27 | | 200 | (-30, 30) | 0
95 | Ackley Fnc. | Same as 27 | | 400 | (-30, 30) | 0

Note: Solution may not be global optimum.
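Entries 92-95 above are Ackley functions in 30 to 400 variables. For reference, the sketch below evaluates a commonly used Ackley formulation at the largest of these sizes; the parameter values a = 20, b = 0.2 and c = 2π are the usual textbook choices and are assumptions here, since the appendix specifies only the bounds (-30, 30) and the optimum value 0.

```python
import math
import random

def ackley(x, a=20.0, b=0.2, c=2.0 * math.pi):
    # A common Ackley formulation (assumed, not necessarily the exact variant
    # used in the experiments); global minimum 0 at x = (0, ..., 0).
    n = len(x)
    mean_sq = sum(xi * xi for xi in x) / n
    mean_cos = sum(math.cos(c * xi) for xi in x) / n
    return -a * math.exp(-b * math.sqrt(mean_sq)) - math.exp(mean_cos) + a + math.e

if __name__ == "__main__":
    random.seed(0)
    n = 400  # the largest problem size listed in the appendix (entry 95)
    x = [random.uniform(-30.0, 30.0) for _ in range(n)]
    print(ackley(x))           # a random point evaluates far above the optimum
    print(ackley([0.0] * n))   # the origin evaluates to (numerically) 0
```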
References

[1] Moré JJ, Wu Z. Global smoothing and continuation for large-scale molecular optimization. Preprint MCS-P539-1095, Argonne National Laboratory, Illinois, 1995.
[2] Schoen F. A wide class of test functions for global optimization. Journal of Global Optimization 1993;3:133-7.
[3] Gill P, Murray W. Quasi-Newton methods for unconstrained optimization. Journal of the Institute of Mathematics and its Applications 1972;9:91-108.
[4] Fletcher R, Powell M. A rapidly convergent descent method for minimization. Computer Journal 1963;6:163-8.
[5] Armijo L. Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 1966;16:1-3.
[6] Fletcher R, Reeves C. Function minimization by conjugate gradients. Computer Journal 1964;7:149-54.
[7] Beveridge G, Schechter R. Optimization: theory and practice. New York: McGraw-Hill, 1970.
[8] Törn A. A search clustering approach to global optimization. In: Dixon LCW, Szegö GP, editors. Towards global optimization 2. Amsterdam: North-Holland, 1978.
[9] Price WL. A controlled random search procedure for global optimization. In: Dixon LCW, Szegö GP, editors. Towards global optimization 2. Amsterdam: North-Holland, 1978.
[10] Rinnooy Kan AHG, Timmer GT. Stochastic methods for global optimization. American Journal of Mathematical and Management Sciences 1984;4:7-40.
[11] Ali MM, Storey C. Topographical multilevel single linkage. Journal of Global Optimization 1994;5:349-58.
[12] Törn A, Viitanen S. Topographical global optimization using pre-sampled points. Journal of Global Optimization 1994;5:267-76.
[13] Kirkpatrick S, Gelatt CD Jr, Vecchi MP. Optimization by simulated annealing. Science 1983;220:671-80.
[14] Dekkers A, Aarts E. Global optimization and simulated annealing. Mathematical Programming 1991;50:367-93.
[15] Ingber L. Simulated annealing: practice versus theory. Mathematical and Computer Modelling 1993;18(11):29-57.
[16] Frost R. Ensemble based simulated annealing. ftp.sdsc.edu/pub/sdsc/math/Ebsa. La Jolla, CA: University of California, San Diego, 1993.
[17] Wood GR. Multidimensional bisection and global minimization. Working Paper, University of Canterbury, New Zealand, 1985.
[18] Horst R. A general class of branch and bound methods in global optimization, with some new approaches for concave minimization. Journal of Optimization Theory and Applications 1986;51:271-91.
[19] Csendes T, Pinter J. The impact of accelerating tools on the interval subdivision algorithm for global optimization. European Journal of Operational Research 1993;65:314-20.
[20] Caprani O, Gothaab B, Madsen K. Use of a real-valued local minimum in parallel interval global optimization. Interval Computations 1993;3:71-82.
[21] Ratz D, Csendes T. On the selection of subdivision directions in interval branch and bound methods. Working Paper, Institut für Angewandte Mathematik, Karlsruhe University, Karlsruhe, Germany, 1994.
[22] Streltsov S, Vakili P. Global optimization: partitioned random search and optimal index policies. Working Paper, Manufacturing Engineering Department, Boston University, Boston, 1995.
[23] Pinter J. Convergence qualification of adaptive partitioning algorithms in global optimization. Mathematical Programming 1992;56:343-60.
[24] Kushner HJ. A new method of locating the maximum point of an arbitrary multi-peak curve in the presence of noise. Transactions of the ASME, Series D, Journal of Basic Engineering 1964;86:97-105.
[25] Danilin YM, Piyavskii SA. On an algorithm for finding the absolute minimum. In: Theory of optimal solutions. Kiev: Institute of Cybernetics, 1967. p. 25-37.
[26] Schubert BO. A sequential method seeking the global maximum of a function. SIAM Journal of Numerical Analysis 1972;9:379-88.
[27] Strongin RG. Numerical methods for multiextremal problems. Moscow: Nauka, 1978.
[28] Zilinskas A. Two algorithms for one-dimensional multimodal minimization. Optimization 1981;12:53-63.
[29] Wingo DR. Estimating the location of the Cauchy distribution by numerical global optimization. Communications in Statistics, Part B: Simulation and Computation 1983;12:201-12.
[30] Moore RE. Automatic error analysis in digital computation. Technical Report LMSD-48421, Lockheed Missiles and Space Division, Sunnyvale, CA, 1959.
[31] Moore RE. Methods and applications of interval analysis. Philadelphia: SIAM, 1979.
[32] Moore RE, Ratschek H. Inclusion functions and global optimization II. Mathematical Programming 1988;41:341-56.
[33] Archetti F, Betro B. A probabilistic algorithm for global optimization. Calcolo 1979;16:335-43.
[34] Mockus JB, Thiesis V, Zilinskas A. The application of Bayesian methods for seeking the extremum. In: Dixon LCW, Szegö GP, editors. Towards global optimization, vol. 2. Amsterdam: North-Holland, 1978. p. 117-29.
[35] Boender CGE. The generalized multinomial distribution: a Bayesian analysis and applications. PhD dissertation, Erasmus University, Rotterdam, 1984.
[36] Ingber L. Adaptive simulated annealing (ASA): lessons learned. Control and Cybernetics (Polish Journal), 1999, to appear.
[37] Bilchev G, Parmee I. Inductive search. IEEE Transactions 1996:832-6.
[38] Özdamar L, Bozyel MA. The capacitated lot sizing problem with overtime decisions and setup times. Working Paper, Department of Computer Engineering, Istanbul Kültür University, Istanbul, Turkey, 1996.
[39] Özdamar L, Bozyel MA. Simultaneous lot sizing and loading of product families on parallel facilities of different classes. International Journal of Production Research 1998;36:1305-24.
[40] Özdamar L, Birbil IS. A hybrid genetic algorithm for the capacitated lot sizing and loading problem. European Journal of Operational Research 1998;110:525-47.
[41] Demirhan M, Özdamar L, Helvacioglu L, Birbil IS. A fractal partitioning algorithm (with fuzzy measures) for optimization. EUFIT '97, September 1997, Aachen, Germany.
[42] Demirhan M, Özdamar L. A note on the use of fuzzy measures in adaptive partitioning algorithms for global optimization. IEEE Transactions on Fuzzy Systems 1999, to appear.
[43] Pal NR, Bezdek JC. Measuring fuzzy uncertainty. IEEE Transactions on Fuzzy Systems 1994;2:107-18.
[44] Pal NR, Pal SK. Object-background segmentation using new definitions of entropy. IEE Proceedings-E 1989;136:284-95.
[45] Deluca A, Termini S. A definition of nonprobabilistic entropy in the setting of fuzzy sets theory. Information and Control 1972;20:301-12.
[46] Özdamar L, Demirhan M. Comparison of partition evaluation measures in an adaptive partitioning algorithm for global optimization. Fuzzy Sets and Systems 1999, to appear.
[47] Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical recipes: the art of scientific computing. New York: Cambridge University Press, 1992.
[48] Androulakis GS, Vrahatis MN. OPTAC: a portable software package for analyzing and comparing optimization methods by visualization. Journal of Computational and Applied Mathematics 1996;72:41-62.
[49] Stenger F. Computing the topological degree of a mapping in R^n. Numerical Mathematics 1975;25:23-38.
[50] Botsaris CA. A curvilinear optimization method based upon iterative estimation of the eigensystem of the Hessian matrix. Journal of Mathematical Analysis and Applications 1978;63:396-411.
[51] Moré JJ, Garbow BS, Hillstrom KE. Testing unconstrained optimization software. ACM Transactions on Mathematical Software 1981;7:17-41.
[52] Kearfott B. An efficient degree-computation method for a generalized method of bisection. Numerical Mathematics 1979;32:109-27.
[53] Schaffer JD. A study of control parameters affecting online performance of genetic algorithms for function optimization. Proceedings of the Third International Conference on Genetic Algorithms, 1989. p. 51-60.
[54] Srinivas M, Patnaik LM. Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics 1994;24:656-67.
[55] Michalewicz Z. Genetic algorithms + data structures = evolution programs. Berlin: Springer, 1994.
[56] Rosenbrock HH. State-space and multivariable theory. London: Nelson, 1970.
[57] Levy AV, Montalvo A, Gomez S, Calderon A. Topics in global optimization. Lecture Notes in Mathematics, No. 909. Berlin: Springer, 1981.
[58] Jansson C, Knüppel O. Numerical results for a self-validating global optimization method. Working Paper No. 94.1, Technical University of Hamburg-Harburg, 1994.
[59] Schwefel H. Numerical optimization of computer models. New York: Wiley, 1991.
[60] Törn A, Zilinskas A. Global optimization. Lecture Notes in Computer Science. Berlin: Springer, 1989.
[61] Corana A, Marchesi M, Marini C, Ridella S. Minimizing multimodal functions of continuous variables with the simulated annealing algorithm. ACM Transactions on Mathematical Software 1987;13:262-79.
[62] Aluffi-Pentini F, Parisi V, Zirilli F. Global optimization and stochastic differential equations. Journal of Optimization Theory and Applications 1985;47:1-16.
[63] Breiman L, Cutler A. A deterministic algorithm for global optimization. Mathematical Programming 1993;58:179-99.
[64] Easom E. A survey of global optimization techniques. M.Eng. thesis, University of Louisville, Louisville, KY, 1990.
[65] Stuckman BE, Easom EE. A comparison of Bayesian/sampling global optimization techniques. IEEE Transactions on Systems, Man and Cybernetics 1992;22:1024-32.
[66] De Jong K. An analysis of the behaviour of a class of genetic adaptive systems. PhD thesis, University of Michigan, Ann Arbor, 1975.
[67] Wodrich M, Bilchev G. Cooperative distributed search: the ant's way. Working Paper, University of Cape Town, South Africa, 1997.
[68] Baluja S. Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning. Working Paper CMU-CS-94-163, School of Computer Science, Carnegie Mellon University, 1994.
[69] Griewank AO. Generalized descent for global optimization. Journal of Optimization Theory and Applications 1981;34:11-39.
Linet Özdamar has her B.Sc., M.Sc. and Ph.D. from Boğaziçi University, Istanbul, Turkey. She currently holds the Chair of the Computer Engineering Department at Istanbul Kültür University and has published work in international scientific journals on project scheduling, hierarchical production planning, simultaneous lot sizing and loading, applications of meta-heuristics to OR problems, global optimization and environmental site assessment.

Melek Demirhan has her B.Sc. from Middle East Technical University, Ankara, her M.Sc. from Hacettepe University, Ankara, and her Ph.D. from the Department of Econometrics, Marmara University, Istanbul. She is currently a professor in the Systems Engineering Department, Yeditepe University, Istanbul. Her interests lie in probabilistic and fuzzy uncertainty modelling, and spatial statistics. Dr. Demirhan has published work in journals such as IEEE Transactions on Fuzzy Systems, Fuzzy Sets and Systems, and the Journal of Global Optimization.