Clustered genetic search in continuous landscape exploration


Engineering Applications of Artificial Intelligence 17 (2004) 407–416

Robert Schaefer*, Katarzyna Adamska, Henryk Telega
Institute of Computer Science, Jagiellonian University, Nawojki 11, 30-072 Kraków, Poland

Abstract

We present the strategy of clustered genetic search as a case-oriented tool for global optimization problems. The strategy is performed in two quite different manners: as an iterative discrete hill-crunching approach and as a continuous method based on a finite mixture model. We set up the main points of the theory on convergence, stop conditions and error estimation. We illustrate the properties of both methods in two examples, including the problem of the optimal geometry of an argon cluster. © 2004 Elsevier Ltd. All rights reserved.

Keywords: Basin of attraction; Genetic algorithm; Clustering

1. Statement of the problem

There is a wide range of global optimization methods which aim at finding a single local minimizer (or maximizer) $x^+$ of a real-valued objective $F$ in an admissible set $D \subset \mathbb{R}^n$. However, there is often a need to locate all available minimizers (maximizers) as well as to detect their basins of attraction (see the definition below). One can mention inverse parametric problems, classical ambiguous problems with many minimizers, such as tasks appearing in ultrasonographic defectoscopy. The information about the shape and the volume of certain level sets that belong to basins of attraction may be useful for stability analysis, which is crucial for the safety of mechanical structures. Detecting multiple extrema and their basins of attraction also appears frequently in chemical computations concerning optimal geometry and paths of chemical reactions.

This paper focuses on global minimization problems with continuous objective functions of the form $F: D \to \mathbb{R}$, $D \subset \mathbb{R}^n$, $m \le F(x) \le M$ $\forall x \in D$, which have only isolated minimizers in the interior of the domain and for which one can construct the equivalent maximization problem $\tilde{F} = M - m - F$.

*Corresponding author. E-mail addresses: [email protected] (R. Schaefer), [email protected] (K. Adamska), [email protected] (H. Telega).
doi:10.1016/j.engappai.2004.04.014

Let $L(y) = \{x \in D : F(x) \le y\}$ and $\hat{L}(y) = \{x \in D : F(x) < y\}$ denote two types of level sets of the function $F$. $L_x(y)$ and $\hat{L}_x(y)$ stand for the simply connected components of $L(y)$ and $\hat{L}(y)$ (respectively) that contain $x$. For an arbitrary fixed $x^*$ being a stationary point of the function $F$, let $\bar{y}(x^*) \in \mathbb{R}$ be defined as follows:

$$\bar{y}(x^*) = \begin{cases} \min\left\{ y : \exists\, x \text{ an isolated stationary point of } F,\ x \ne x^*,\ x \in L_{x^*}(y) \right\} & \text{if such an } x \text{ exists}, \\ \min_{x \in \partial D} F(x) & \text{otherwise}, \end{cases}$$

where $x \in \partial D$ denotes points on the boundary of the domain. The basin of attraction $B_{x^*}$ of a local minimizer $x^*$ is the simply connected component of the interior of $\hat{L}_{x^*}(\bar{y}(x^*))$ such that $x^* \in B_{x^*}$.

Generally, cluster analysis (see e.g. Jain et al., 1999 and the references therein) is a specific group of pattern recognition techniques that does not utilize any learning set. Clustering consists in the unsupervised exploration of a given data set, aimed at discovering groups in the data. More formally, clustering results in the construction of a partition of a discrete data set $X = \{x_1, \ldots, x_m\}$ into non-empty, exclusive subsets $X_1, \ldots, X_k$, $k \le m$, called clusters. This means that clustering is governed by some equivalence relation R on $X$, and all data items classified into the same cluster $X_i$ are equivalent in the sense of R. Clustering algorithms have also been applied in


continuous global optimization. A group of such methods is described in Guus et al. (1995) and Horst and Pardalos (1995). The idea of applying clustering methods in global optimization is to determine groups of points from which local searches can be started. It is desirable that the number of local searches be reduced to one in each basin of attraction of a local extremum.

The role of genetic algorithms in solving clustering problems has increased in recent years. Chiou and Lan (2001) classify heuristic algorithms for clustering into four groups, one of which consists of genetic algorithms. The authors provide a comprehensive list of literature concerning that approach. They propose and compare three models of genetic clustering utilizing different methods of coding. Tseng and Yang (2001) emphasize that the genetic algorithm gives an opportunity to avoid the common problem of determining the number of clusters a priori, which is a necessary assumption in most clustering methods. They present a clustering algorithm, based on a genetic algorithm, that is designed to detect spherical clusters. In a related work, Tseng and Yang (2000) extend their ideas to the case of non-spherical clusters. Maulik and Bandyopadhyay (2000) describe genetic clustering as a kind of enhancement of the k-means method: instead of the usual k-means procedure, they find the centers of the clusters with a genetic algorithm.

In the above-mentioned works, as well as in nearly all papers where the term genetic clustering is used, a genetic algorithm serves as an auxiliary tool for a clustering task. We would like to show another perspective on combining these two strategies. We propose to exchange the roles and to use the clustering strategy as a tool for analyzing the results of genetic algorithms. In other words, the genetic algorithm is not an aid to clustering, but a generator of an input data set for clustering. Let us introduce for our strategy the name clustered genetic search (CGS); in previous works (Adamska, 2004; Schaefer et al., 2002) we called it genetic clustering. Respecting the notation introduced above, let us now define the relation R of the following form:

Definition 1. Let $F: D \to \mathbb{R}$, $D \subset \mathbb{R}^n$, be a continuous function and $X \subset D$ a discrete data set. Two elements $x_i, x_j \in X$ are in relation R if and only if they belong to the same basin of attraction of a local minimizer of the function $F$, or neither of them belongs to any basin of attraction.

It is obvious that R is an equivalence relation. In our approach, a cluster is a discrete data set (identified by means of the above relation) located in the basin of attraction of an isolated local minimizer $x^+$ of $F$. Moreover, for each $x^+$ we look for a cluster extension, which is a closed set of positive measure located in the central part of $B_{x^+}$ (it is included in $B_{x^+}$ and contains $x^+$ in its interior).
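As an illustration only, the relation R can be realized numerically by starting a local descent from each data point: two points are equivalent when their descents end at the same minimizer. A minimal sketch under this reading (the two-well objective, the sample and the rounding tolerance below are our own illustrative assumptions, not taken from the paper):

```python
import numpy as np
from scipy.optimize import minimize

def basin_label(F, x0, digits=3):
    # Run a local descent from x0; the rounded minimizer identifies the basin.
    res = minimize(F, x0, method='Nelder-Mead')
    return tuple(np.round(res.x, digits))

def cluster_by_basin(F, sample):
    # Group sample points that are equivalent in the sense of Definition 1.
    clusters = {}
    for x in sample:
        clusters.setdefault(basin_label(F, x), []).append(x)
    return clusters

# Hypothetical two-well objective with minimizers at (-1, 0) and (1, 0).
F = lambda v: (v[0]**2 - 1.0)**2 + v[1]**2
sample = np.random.default_rng(0).uniform(-2.0, 2.0, size=(100, 2))
print({k: len(v) for k, v in cluster_by_basin(F, sample).items()})
```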

2. Selected results of genetic algorithm asymptotic theory

This section presents some theoretical results that will be used in Section 3 as a background for the analysis of the asymptotic properties of the two clustered genetic search methods introduced there. In Section 2.1 we introduce some basic elements of Simple Genetic Algorithm (SGA) theory. We interpret the SGA as a measure-transforming system and conclude with two theorems which are tools for the asymptotic analysis of both clustered genetic search methods. Section 2.1 also describes comprehensively the idea of a well-tuned genetic algorithm, which is crucial for clustered genetic search. In Section 2.2 we present a theorem about the properties of the hierarchical genetic strategy (HGS), a genetic algorithm used in one of our methods; it enables the error evaluation (Section 3.4) for that method.

2.1. On SGA asymptotic properties

We recall some basic mathematical features of the SGA (see Vose, 1999), which is intensively used in our clustering strategies. The genetic universum $\Omega$ in this case is a finite set composed of binary strings of an arbitrary length $l$. The set of phenotypes (encoded design variables) $D \supset D_r = \{a_i,\ i \in \Omega\}$ is also discrete. Each population may be represented by the frequency vector $x$ that belongs to the unit simplex

$$\Lambda^{r-1} = \left\{ x \in \mathbb{R}^r :\ x_i \ge 0,\ i = 1, \ldots, r,\ \sum_{i=1}^{r} x_i = 1 \right\} \subset \mathbb{R}^r, \qquad (1)$$

where the cardinality of the genotype space is $\#\Omega = r = 2^l < \infty$. The SGA dynamics may be expressed by the stochastic sampling rule $\tau: \Lambda^{r-1} \to \mathcal{M}(\Lambda^{r-1})$ coupling the current state $x^t$ of the population with the probabilistic measure $\tau(x^t)$ over $\Lambda^{r-1}$, such that the passage to the next step $t + 1$ may be interpreted as a one-time sampling from $\Lambda^{r-1}$ with respect to $\tau(x^t)$. Here $\mathcal{M}$ stands for the space of probabilistic measures over the set specified in brackets. Each point in $\Lambda^{r-1}$ may also be handled as a probabilistic measure on $\Omega$, so the formal mapping $\Theta: \Lambda^{r-1} \to \mathcal{M}(\Omega)$ may be established. This mapping may easily be extended to the mapping $\hat\Theta: \Lambda^{r-1} \to \mathcal{M}(D_r)$ by using the assumed one-to-one coding $\Omega \leftrightarrow D_r$ between the genetic space and the set of phenotypes. For the SGA we can effectively define the algebraic form of the operator $G: \Lambda^{r-1} \to \Lambda^{r-1}$ (the SGA heuristics).
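In this picture one genetic epoch is just a single sample drawn from the measure $\tau(x^t)$. A minimal sketch of such a measure-transforming step, assuming the heuristics is supplied as a function returning the expected next-generation frequency vector (a concrete heuristics for selection with mutation appears in (4) below):

```python
import numpy as np

def sga_step(x, G, m, rng):
    # x: frequency vector on the unit simplex (one entry per genotype).
    # The next population of size m is one multinomial sample drawn
    # from the expected next-generation distribution G(x).
    p = G(x)
    counts = rng.multinomial(m, p / p.sum())
    return counts / m  # back to a frequency vector on the simplex
```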


We say that $G$ is focused if there exists a non-empty set $K \subset \Lambda^{r-1}$ of fixed points of $G$ such that for every $x \in \Lambda^{r-1}$ there exists $z \in K$ with $G^p(x) \to z$ as $p \to \infty$.

Theorem 2 (Cabib et al., 1998). Let $K_\varepsilon = \{x \in \Lambda^{r-1} : \exists y \in K \text{ such that } d(x, y) < \varepsilon\}$, where $d$ is a distance in $\mathbb{R}^r$. If $G$ is focused, then $\forall \varepsilon > 0$, $\forall \eta > 0$, $\exists N \in \mathbb{N}$, $\exists W(N) \in \mathbb{N}$ such that $\forall m > N$, $\forall k > W(N)$
$$p^k_m(K_\varepsilon) > 1 - \eta,$$
where $p^k_m \in \mathcal{M}(\Lambda^{r-1})$ stands for the probabilistic measure associated with the $m$-sized population after $k$ genetic epochs.

In other words, if $G$ is focused, then a sufficiently large SGA population can be concentrated arbitrarily close to the set of fixed points $K$, with an arbitrarily high probability $1 - \eta$, after a sufficient number of evolution epochs.

Let us also assume that it is possible to extend the measures $\hat\Theta(x) \in \mathcal{M}(D_r)$, $x \in \Lambda^{r-1}$, which are concentrated on the phenotype set, to measures in $\mathcal{M}(D)$ which have densities $\rho_x \in L^p(D)$ for some $p > 1$. The construction of the density $\rho_x$, $x \in \Lambda^{r-1}$, can be performed as follows. We divide $D$ into $r$ polyhedra $\{W_i,\ i \in \Omega\}$ such that $W_i$ surrounds the phenotype $a_i$; the partitioning is similar to the well-known Voronoi one. Then we set
$$\rho_x(w) = \frac{\hat\Theta(x)(\{a_i\})}{\mathrm{meas}(W_i)} \quad \text{for } w \in W_i. \qquad (2)$$

Theorem 3 (Schaefer and Jabłoński, 2002). If the operator $G$ is focused, then $\forall \varepsilon > 0$, $\forall \eta > 0$, $\exists N \in \mathbb{N}$, $\exists W(N) \in \mathbb{N}$, $\exists z \in K$ such that $\forall m > N$, $\forall t > W(N)$
$$P\left\{ \|\rho_{x^t_m} - \rho_z\|_{L^p(D)} < \frac{\varepsilon}{c} \right\} > 1 - \eta,$$
where $c = \min_{i \in \Omega}\{\mathrm{meas}\, W_i\}$.

The consecutive iterates $G^p(x_0)$, $p = 1, 2, \ldots$, of a focused heuristics depict the trajectory of the infinite-population genetic algorithm, which deterministically tends to some fixed point $z \in K$ starting from the population $x_0 \in \Lambda^{r-1}$, which is not necessarily infinite. We may conjecture that this sequence of populations performs the best search that starts from $x_0 \in \Lambda^{r-1}$, and that $z \in K$ contains the maximum information about the optimization problem that can be collected by a genetic algorithm of the current type. Next we may distinguish the class of genetic algorithms that are well suited for attractor recognition.

Definition 4 (Schaefer and Kołodziej, 2003). We say that a particular class of the SGA with the heuristics $G$ is well-tuned with respect to the essential finite set of local minimizers $W_{ess}$ if:

* $G$ is focused and the set of its fixed points $K$ is finite;
* $\forall x^+ \in W_{ess}$ $\exists C(x^+) \subset D$ which is closed, has non-zero Lebesgue measure, $\mathrm{meas}(C(x^+)) > 0$, and includes $x^+$ in its interior, $x^+ \in \mathrm{int}\, C(x^+)$; and
* $\rho_z(x) \ge \mathit{threshold}$ for $x \in C(x^+)$, while $\rho_z(x) < \mathit{threshold}$ for $x \in D \setminus \bigcup_{x^+ \in W_{ess}} C(x^+)$,

which means that the density $\rho_z(x)$ induced by a fixed-point population is greater than or equal to some arbitrarily selected threshold inside the sets $C(x^+)$ and less than the threshold outside these sets. Here $z \in K$ and $\mathit{threshold}$ are parameters of this definition.

The above definition introduces a parameterized criterion which can distinguish whether the sampling measure can be successfully used for separating local minimizers and roughly locating them in the admissible set $D$. The central parts of the basins of attraction are simply occupied by level sets of the density.

Let us now introduce a simple example of a well-tuned SGA. According to a consideration by Grygiel (1996), in the case of an algorithm with mutation and selection only, the limit population can easily be computed using elementary tools of linear algebra. Let us denote by $f_i$ the fitness value for the genotype $i \in \Omega$ and by $f = (f_0, \ldots, f_{r-1})$ the fitness vector for all considered genotypes. For a given mutation rate $\xi$, the entries of the mutation matrix $U = \{u_{ij}\}$ are calculated according to the formula

$$u_{ij} = \xi^{\langle i \oplus j \rangle} (1 - \xi)^{l - \langle i \oplus j \rangle}. \qquad (3)$$

In the above, $\langle \cdot \rangle$ is a one-argument operator which returns the number of ones in a binary string, and $\oplus$ denotes the bitwise exclusive or. If only selection and mutation are in use, the heuristics is given by the formula
$$G(x) = (f, x)^{-1}\, \Gamma x, \qquad (4)$$
where $\Gamma = U\, (\mathrm{diag}\, f)$. It has been proved in Grygiel (1996) that the limit population vector (i.e. the one corresponding to a population reached after an infinite number of steps) is equal to the eigenvector related to the largest eigenvalue of $\Gamma$. Thus the question of whether the algorithm with selection and mutation is well-tuned for a given fitness function $f$ can be answered by computing and analyzing this eigenvector of the matrix $\Gamma$.

Let us take as the fitness function the one-dimensional Rastrigin function
$$f_R(x) = -x^2 + 10 \cos(2\pi x) + 110, \quad x \in [-4.12, 4.12]. \qquad (5)$$
The graph of the function is shown in Fig. 1a. The domain is coded uniformly with binary strings of length 8 and the mutation rate is $\xi = 0.05$. The 256-element eigenvector corresponding to the largest eigenvalue of $\Gamma$ is visualized in Fig. 1b.
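As an illustration, this well-tuned check can be reproduced numerically: build $\Gamma = U\,\mathrm{diag}(f)$ for the fitness (5) and inspect the eigenvector of its largest eigenvalue. A minimal sketch (the uniform affine mapping of 8-bit strings onto $[-4.12, 4.12]$ is our assumption about the coding):

```python
import numpy as np

l, xi = 8, 0.05                   # chromosome length, mutation rate
r = 2 ** l
a = np.linspace(-4.12, 4.12, r)   # phenotypes under uniform coding (assumed)
f = -a**2 + 10 * np.cos(2 * np.pi * a) + 110   # Rastrigin fitness (5)

# Mutation matrix (3): u_ij depends on the Hamming distance <i xor j>.
h = np.array([[bin(i ^ j).count('1') for j in range(r)] for i in range(r)])
U = xi**h * (1 - xi)**(l - h)

Gamma = U @ np.diag(f)
vals, vecs = np.linalg.eig(Gamma)
z = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
z /= z.sum()                      # limit population frequency vector
# Genotypes with z[i] above a chosen threshold mark central parts of basins.
print(np.sum(z > z.mean()))
```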


Fig. 1. Example of a well-tuned algorithm: (a) the objective— Rastrigin function, (b) the frequencies of genotypes in the limit population.

The horizontal axis shows the 256 genotypes constituting the space $\Omega$; the vertical axis shows their frequencies in the limit population (i.e. the values of the eigenvector). In this limit population, genotypes which code points located in the central parts of basins of attraction occur with significantly higher frequencies than the other genotypes (see Fig. 1b). Comparison of Figs. 1a and 1b shows that the larger the fitness value at a given coded domain point, the higher the frequency of the corresponding genotype. As a result, the well-tuning condition is preserved for a wide spectrum of threshold values. For example, for the threshold value $t_1$ marked in Fig. 1b, the algorithm is well-tuned with respect to the maximizers corresponding to the seven highest maxima, omitting the two outermost ones. If some lower threshold $t_3$ is assumed (see Fig. 1b), the algorithm is well-tuned only with respect to the four maximizers corresponding to the lowest maxima in the considered domain. In this case the algorithm is not well-tuned with respect to the remaining maximizers, since checking whether a genotype frequency exceeds $t_3$ is not enough to distinguish among genotypes from different basins of attraction. For the threshold value $t_2$, the algorithm is well-tuned with respect to all maximizers in the considered domain.

2.2. Selected properties of hierarchical genetic strategy

Another algorithm utilized to obtain sample points in the admissible domain $D$ is the hierarchical genetic strategy (HGS) (Schaefer and Kołodziej, 2003; Schaefer et al., 2000). It consists in running in parallel a set of dependent evolutionary processes. The dependency relation has a tree structure with a restricted number of levels $m$. The processes of lower order (close to the root of the structure) represent a chaotic search with low accuracy. They detect the promising regions of the optimization landscape, in which a more accurate process of higher order is activated. Populations evolving in different processes can contain individuals which represent the solution (the phenotype) with different precision. This precision can be achieved by binary genotypes of different lengths or by different phenotype scaling. The evolutionary mechanism used in a single population is the SGA.

A detailed description of HGS can be found in the Appendix. In the remainder of this section we take into account only the binary implementation, so all SGA features introduced previously are valid for each particular HGS branch.

Let $G_m: \Lambda^{r_m - 1} \to \Lambda^{r_m - 1}$ denote the heuristics for all populations of the maximum order $m$, where $r_m = 2^{s_m}$ and $s_m$ is the chromosome length in these populations. Assume $G_m$ is focused, well-tuned and has the unique fixed point $z_m$, so that $\forall x \in \Lambda^{r_m - 1}$ $\lim_{t \to \infty} G^t_m(x) = z_m$. Assume also that after $t$ epochs at least $b > 0$ populations of the maximum order, $(x^1)^t_m, \ldots, (x^b)^t_m$, are active, where $m$ now stands for the identical size of each population of this order. Each of them induces the discrete measure $\hat\Theta((x^i)^t_m)$ on the set of phenotypes $D_{r_m}$. Following Schaefer and Kołodziej (2003) we can introduce a new measure
$$\zeta^t_{b,m} = \frac{1}{b}\left( \hat\Theta((x^1)^t_m) + \cdots + \hat\Theta((x^b)^t_m) \right), \qquad (6)$$
which corresponds to all populations of the currently highest level that are active after $t$ epochs. Analogously to (2), the density for that measure can be defined as
$$\rho_{\zeta^t_{b,m}}(w) = \frac{\zeta^t_{b,m}(\{a_i\})}{\mathrm{meas}(W_i)} \quad \text{for } w \in W_i.$$
Using Theorem 4.1 in Schaefer and Kołodziej (2003) we may prove:

Theorem 5 (Adamska, 2004). $\forall \varepsilon > 0$, $\forall \eta > 0$, $\exists N \in \mathbb{N}$, $\exists W(N) > t_0$ such that $\forall m > N$, $\forall t > W(N)$
$$P\left\{ \|\rho_{\zeta^t_{b,m}} - \rho_{z_m}\|_{L^p(D)} < \frac{\varepsilon}{c} \right\} > 1 - \eta,$$
where $p \in [1, \infty)$ and $c = \min_{i \in \Omega}\{\mathrm{meas}\, W_i\}$, as in Theorem 3.
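A minimal sketch of the measure (6) and of its density, assuming the leaf populations are given as genotype count vectors and the cells $W_i$ have known measures:

```python
import numpy as np

def combined_density(leaf_counts, cell_measures):
    # (6): average the phenotype frequency vectors of the b active
    # populations of the highest order ...
    zeta = np.mean([c / c.sum() for c in leaf_counts], axis=0)
    # ... then divide by meas(W_i), as in (2), to get a piecewise density.
    return zeta / cell_measures
```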

3. Two examples of clustered genetic search

For each strategy below we define the positive fitness function $f = \tilde{F} \circ \mathrm{code}$ on the space of genetic codes $\Omega$ (the genetic universum), which expresses the source minimization problem for $F$ on $D$. The encoding function $\mathrm{code}: \Omega \to D$ represents the $N$-dimensional vector in $D$, defined in the standard way (see e.g. Goldberg, 1989).

3.1. Clustered genetic search based on sequential niching (HC-CGS)

An example of clustered genetic search based on sequential niching was proposed in Telega (1999). This method can be referred to as hill-crunching clustered genetic search (HC-CGS). It utilizes the SGA as the genetic engine. The idea of HC-CGS is presented in Scheme 1. Initially the set of cluster extensions is empty.


Scheme 1. Main scheme of HC-CGS.

REPEAT
  Step 1. Generate the initial population (according to the uniform distribution).
  Step 2. Evaluate the fitness function f outside the already recognized cluster extensions.
  Step 3. Modify the fitness function (f ← MIN on cluster extensions, where MIN is the minimum fitness value that was found).
  Step 4. Produce new generations of the SGA until the complex stop criterion is satisfied:
          (a) parts of basins (parts of cluster extensions) can be recognized, OR
          (b) the GA recognizes a plateau outside the known basins (cluster extensions); this means that the whole domain has been processed.
  Step 5. Recognize cluster extensions.
  Step 6. Update the set of recognized cluster extensions (join parts of cluster extensions).
UNTIL the whole domain has been processed OR a satisfactory set of cluster extensions is found.

The domain of searches $D$ is divided into hypercubes that constitute a grid (raster). In this approach, the cluster extensions are unions of hypercubes. Each cluster extension is recognized in a stepwise manner; each step of this process is performed after the SGA has stopped (that is, after Step 4). Step 5 can be expanded as shown in Scheme 2.

Scheme 2. Recognizing cluster extensions in HC-CGS.

REPEAT
  Step 5a. Reject hypercubes that contain fewer individuals than an arbitrary constant; reject the individuals that were placed in these hypercubes.
  Step 5b. Select the hypercube that contains the best individual as the seed of a new part of a cluster extension.
  Step 5c. Attach neighboring non-rejected hypercubes to this new part of a cluster extension. Start a rough local optimization method from the new part of a cluster extension.
  Step 5d. If this local method ends in an already recognized cluster extension, then attach the new part to it.
UNTIL there are no more unclustered, non-rejected hypercubes.
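A minimal sketch of Scheme 2 on a regular raster, assuming the final population is given as an array of points; the fitness-guided seed choice (Step 5b) is approximated by the densest cell, and the local-search merging (Step 5d) is omitted:

```python
import numpy as np
from collections import deque

def recognize_extensions(points, lo, hi, cells, min_count):
    # Bin individuals into hypercubes of a regular grid over [lo, hi]^d.
    idx = np.clip(((points - lo) / (hi - lo) * cells).astype(int), 0, cells - 1)
    counts = {}
    for c in map(tuple, idx):
        counts[c] = counts.get(c, 0) + 1
    # Step 5a: reject sparse hypercubes (and the individuals inside them).
    alive = {c for c, n in counts.items() if n >= min_count}
    extensions = []
    while alive:
        seed = max(alive, key=counts.get)   # stand-in for the best individual's cell
        part, queue = set(), deque([seed])
        while queue:                        # Step 5c: attach neighboring cells
            c = queue.popleft()
            if c not in alive:
                continue
            alive.discard(c)
            part.add(c)
            for d in range(len(c)):
                for s in (-1, 1):
                    queue.append(c[:d] + (c[d] + s,) + c[d + 1:])
        extensions.append(part)             # Step 5d (merging by local search omitted)
    return extensions
```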


The fitness modification results in repelling individuals from cluster extensions (or their parts) that are already known. The global stop criterion distinguishes two basic kinds of SGA behavior. The first is that the SGA finds parts of cluster extensions after a few generations; the second is that the SGA "converges" to the uniform distribution of individuals. The latter corresponds to the recognition of a plateau (or of areas where the fitness has small variability) outside of the already known cluster extensions. Other cases are treated as situations in which the SGA does not fit the particular problem, and a refinement of the SGA parameters is suggested.

The stopping strategy is as follows: check the stagnation of a sequence of some estimator of the probability density. If it does not change, then check whether an arbitrary number of hypercubes has a density of individuals below the threshold. If so, then begin the clustering procedure; otherwise, check whether the individuals are uniformly distributed.

Computations show that HC-CGS constitutes a filter that eliminates local maxima with small fitness variability and narrow basins of attraction. Such a property can be useful in some cases. The HC-CGS strategy should be especially convenient for functions with large areas of small variability (areas "similar to" plateaus), which can be difficult for other global and local optimization methods.

3.2. Justification of the HC-CGS stop criteria

We try to detect the situation in which the population is sufficiently concentrated in basins so that density cluster recognition is possible. The state in which an arbitrary rate of raster cells contains the assumed number (much less than the average) of individuals can be treated as the local stop criterion. By Theorem 3, this situation is asymptotically highly probable if there exists at least one basin of attraction outside the union of the cluster extensions already recognized.

The chart of the modified fitness function becomes sufficiently flat at the end of the computations. This corresponds to the unique fixed point of $G$ at the center of $\Lambda^{r-1}$ (see Vose, 1999, Theorem 10.8). If a sufficiently large population that starts from the center of $\Lambda^{r-1}$ (the uniform distribution of individuals) does not leave its neighborhood for a sufficiently long time, this implies that the center of $\Lambda^{r-1}$ is the fixed point of $G$ (with arbitrarily large probability). This follows from Theorem 2 and corresponds to the situation in which the probability of finding new local minimizers is arbitrarily small.

3.3. Density restoring clustered genetic search (DR-CGS)

In contrast to the iterative approach described above, the method presented in this section consists


of two stages performed in a single run. First, a genetic algorithm is executed; when it finishes, its results are clustered. The clustering stage consists in the construction of a probability density function which reflects the spatial distribution of the final population. The method does not impose any particular kind of genetic algorithm, but in this work we concentrate on the application of HGS (see Section 2.2). Assuming the individuals generated by HGS are concentrated near extremal points (i.e. HGS is well-tuned; see Section 2.2 for an explanation), the probability density function will have greater values in those parts of the admissible domain which are close to the extrema. By cutting the function at some threshold, we get its level sets. Those level sets can be treated as cluster extensions.

The probability density function is constructed with the use of the finite mixture model (FMM) (McLachlan and Peel, 2000), a well-known statistical approach in cluster analysis. The main assumption of the FMM is that the density function $\rho$ may be represented by a convex combination of components,
$$\rho(x) = \sum_{k=1}^{s} \gamma_k\, g_k(x; q_k), \qquad (7)$$

where $g_k(x; q_k)$ stands for a component function depending on the input data $x$ and a set of specific parameters $q_k$. Each component should describe one cluster, indexed by $k$. The coefficients $\gamma_k$ in (7) are called mixing proportions and the function $\rho$ is called a mixture. As in most clustering techniques, the number of clusters $s$ must be predicted. The functional form of the components $g_k$ is assumed, but their specific parameters remain unknown, so the mixture depends on the parameter vector $q = (q_1, \ldots, q_s)$. In most practical works the components have the form of a Gauss distribution, so a covariance matrix $C_k$ and a mean vector $m_k$ are the characteristic parameters of the $k$th component, $q_k = (C_k, m_k)$.

The mixing proportions $\gamma_k$ are also evaluated. In order to obtain that evaluation, a probability matrix $G = [g_{ij}]_{i=1,\ldots,s}^{j=1,\ldots,n}$ is introduced. An element $g_{ij}$ of the matrix $G$ stands for the probability that the $j$th data element belongs to the $i$th cluster. A mixing proportion $\gamma_k$ is computed from $G$ as a normalized sum over the $k$th row, $\gamma_k = \frac{1}{n}\sum_{i=1}^{n} g_{ki}$ for all $k = 1, \ldots, s$.

An elementary calculation based on the Bayes rule shows that the sets of mixing proportions and component parameters are related to each other; given one of these sets of values, the other can be calculated. This constitutes the basis of the expectation-maximization (EM) algorithm (Dempster et al., 1977), which is an effective method of computing the mixture density function. One execution of the EM algorithm consists of two steps. The E-step calculates the expected values of the entries of $G$. In the M-step, the component parameter vector $q$ is worked out in such a way that some auxiliary likelihood function (Duda and Hart, 1973, Section 3.2) is maximized. Starting the EM algorithm requires some initial evaluation of $q$ or $G$; then the iteration of EM steps begins.
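For concreteness, a minimal EM iteration for a Gaussian mixture, following the E-step/M-step description above (the random initialization and the small regularization term are our own assumptions):

```python
import numpy as np

def em_gmm(X, s, iters=50, seed=0):
    # X: (n, d) sample (e.g. the final HGS population); s: assumed cluster count.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=s, replace=False)]            # means m_k
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * s)    # covariances C_k
    gamma = np.full(s, 1.0 / s)                             # mixing proportions
    G = np.empty((s, n))
    for _ in range(iters):
        # E-step: G[k, j] = posterior probability that x_j belongs to cluster k.
        for k in range(s):
            diff = X - mu[k]
            quad = np.einsum('nd,dc,nc->n', diff, np.linalg.inv(cov[k]), diff)
            norm = ((2 * np.pi) ** d * np.linalg.det(cov[k])) ** -0.5
            G[k] = gamma[k] * norm * np.exp(-0.5 * quad)
        G /= G.sum(axis=0, keepdims=True)
        # M-step: re-estimate q_k = (C_k, m_k); gamma_k is the normalized row sum of G.
        for k in range(s):
            w = G[k] / G[k].sum()
            mu[k] = w @ X
            diff = X - mu[k]
            cov[k] = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(d)
        gamma = G.sum(axis=1) / n
    return gamma, mu, cov
```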

3.4. Error estimation for the DR-CGS approach

As mentioned in Section 3.3, each iteration of the EM algorithm results in a set of parameters unambiguously determining a mixture density function. Assume that for the given input data the EM algorithm is convergent. Then one can construct a contractive function $H$, mapping the space of mixture parameters into itself, which describes a single step of the algorithm. According to the well-known theorem on the convergence of iterated functions (e.g. Stoer and Bulirsch, 1980), the function $H$ has a stationary point. A detailed description of the construction of $H$ and of the adaptation of the theorem to the EM algorithm is presented in Schaefer et al. (2002).

Denote by $\rho_i$ the mixture density obtained in the $i$th step of the algorithm, and by $\rho_A$ the density corresponding to a stationary point of the algorithm. As a conclusion from the mentioned theorem, one gets the following estimation (Schaefer et al., 2002):
$$\|\rho_i - \rho_A\|_{L^p} \le \frac{c^i}{1 - c}\, w, \qquad (8)$$

where $c \in (0, 1)$ and $w$ is a norm-dependent constant. Putting together the properties of HGS (Theorem 5) and of the EM algorithm (8) results in the following proposition:

Theorem 6 (Adamska, 2004). $\forall i > 0$, $\forall \varepsilon > 0$, $\forall \eta > 0$, $\exists N \in \mathbb{N}$, $\exists W(N) \in \mathbb{N}$ such that $\forall m > N$, $\forall t > W(N)$
$$P\left\{ \|\rho_i - \rho_{z_m}\|_{L^p(D)} < \frac{\varepsilon}{c} + \frac{c^i}{1 - c}\, w + \sigma_{GC} \right\} > 1 - \eta,$$
where $i$ is the number of EM iterations and $\sigma_{GC} = \|\rho_{\zeta^t_{b,m}} - \rho_A\|_{L^p(D)}$ (the remaining symbols are as described above).

The quantity $\|\rho_i - \rho_{z_m}\|_{L^p(D)}$ is the difference between the best density available to the HGS and the mixture density calculated after $i$ iterations of the EM algorithm. The first two quantities estimating the difference can be arbitrarily small: in the first component, $\varepsilon$ can be any positive number, and the second decreases as $i$ increases. Therefore the error of the clustered genetic search depends on the value of $\sigma_{GC}$, and the rest of this section is devoted to evaluating this component.
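The bound in Theorem 6 can be read as a triangle inequality in $L^p(D)$ combining estimation (8) with Theorem 5; a sketch of this decomposition (our reading, not spelled out in the paper):

$$\|\rho_i - \rho_{z_m}\|_{L^p(D)} \le \underbrace{\|\rho_i - \rho_A\|_{L^p(D)}}_{\le\, c^i w / (1-c) \text{ by } (8)} + \underbrace{\|\rho_A - \rho_{\zeta^t_{b,m}}\|_{L^p(D)}}_{=\, \sigma_{GC}} + \underbrace{\|\rho_{\zeta^t_{b,m}} - \rho_{z_m}\|_{L^p(D)}}_{<\, \varepsilon/c \text{ with prob.} > 1 - \eta \text{ (Theorem 5)}}.$$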


The error $\sigma_{GC}$ is defined as the difference between the genetic measure density corresponding to an $m$-sized HGS population after $t$ genetic epochs ((2), (6) and the following comments) and the mixture density function (7) obtained after an infinite number of iterations of the EM algorithm. Assume that the mixture is composed of Gaussians. Also assume that $\rho_{\zeta^t_{b,m}}$ is constructed as a piecewise constant function over some hypercubes, analogously to the SGA case in Section 2.1. Then $\sigma_{GC}$ can be treated as the error of approximating a sum of Gaussians by a piecewise constant function. Application of the approximation theorem (after Ciarlet, 1978), followed by simple transformations based on the properties of the norm and the integral, leads to the final conclusion (Adamska, 2004):
$$\sigma_{GC} = \|\rho_{\zeta^t_{b,m}} - \rho_A\|_{L^q(D)} \le C\, (\mathrm{meas}\, D)^{1/q}\, d_W^{\max}\, |\rho_A|^{\max}_{\infty,1,W}, \qquad (9)$$
where $C$ is a constant depending on the norm and on the geometrical properties of the partition of $D$; $\mathrm{meas}\, D$ is the Lebesgue measure of $D$; $d_W^{\max} = \max_{W_i \subset D}\{d_{W_i}\}$, with $d_{W_i}$ the diameter of $W_i$; and $|\rho_A|^{\max}_{\infty,1,W} = \max_{W_i \subset D}\{|\rho_A|_{\infty,1,W_i}\}$, where $|\cdot|_{p,m,D}$ stands for the seminorm in a Sobolev space, generally defined as
$$|h|_{p,m,D} = \left( \sum_{|\alpha| = m} \int_D \left| \frac{\partial^\alpha h(x)}{\partial x^\alpha} \right|^p \mathrm{d}x \right)^{1/p}.$$

The clustered genetic search error depends on the measure of the admissible domain as well as on the partition of the domain which constitutes the basis for constructing the genetic measure density. If the partition is regular, then the smaller the measures of the subsets $W_i$, the lower the value of $d_W^{\max}$, and thus the smaller the approximation error. In particular, if the binary coding is more accurate (coordinates are coded with longer strings), the error will decrease.

Estimation (9) works well when comparing density functions; however, it is not always reliable for comparing level sets of those functions. It is possible to show an example (see Adamska, 2004) in which the difference between the densities is very small, but the level sets for some specific threshold differ significantly. This case indicates that the problem still needs more sophisticated tools.

4. Applications

In order to illustrate some properties of the presented methods, we show the results of tests for two functions: a two-dimensional sine function and a simple chemical interaction potential.

The tests for the sine function ((10) and Fig. 2a),
$$f_1(x, y) = \sin(xy) + 1, \quad (x, y) \in [-3, 3]^2, \qquad (10)$$
illustrate the ability of both methods to recognize many basins in the case of a multimodal function (see Figs. 2 and 3). The minima of the function constitute one-dimensional manifolds, which is an additional difficulty.

Fig. 2. The objective and the results of HC-CGS method: (a) the objective, (b) cluster extensions.

Fig. 3. The results of the DR-CGS based approach for sinus: (a) mixture density function, (b) cluster extensions.

Although this example goes beyond the definition of the basin of attraction (which was given for isolated minimizers), we can see that the recognized sets can be treated as central parts of attractors: starting from each point of a cluster extension we reach at least one point of the same manifold.

The other example concerns the geometry optimization of a chemical cluster consisting of three argon atoms. We define the geometry using the inter-atomic distances and describe the potential energy with the pairwise additive Lennard-Jones potential ((11) and Fig. 4a):
$$f_2(x) = \sum_{i=1}^{3} \sum_{j > i} 4\epsilon \left[ \left( \frac{\sigma}{x_{ij}} \right)^{12} - \left( \frac{\sigma}{x_{ij}} \right)^{6} \right], \quad x \in [3.6, 10]^3. \qquad (11)$$
The constant values are $\epsilon = 0.0123$ eV and $\sigma = 3.36$ Å (after Niesse and Mayne, 1996). The aim was to find the basin of attraction of the minimizer of the potential function (11), which corresponds to the optimal geometry of the argon cluster. We performed the calculation in the three-dimensional space and then, for visualization purposes, fixed one distance to the value of 3.77 Å (according to the three-dimensional outputs) and repeated the optimization in a two-dimensional space. The results are presented in Figs. 4b, c and 5. In this case there is only one isolated minimizer; it was found both by HC-CGS and by DR-CGS. By modifying the HC-CGS parameters, one can control the rate at which the recognized cluster extensions fill the basin, as shown in Fig. 4.

Tests show that HC-CGS can better recognize basins with more complicated shapes.
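A minimal sketch of the pairwise energy (11) for the three-atom cluster, assuming the decision vector holds the three inter-atomic distances directly:

```python
import numpy as np

EPS, SIGMA = 0.0123, 3.36   # eV, Angstrom (after Niesse and Mayne, 1996)

def f2(x):
    # x = (x12, x13, x23): inter-atomic distances of the argon trimer.
    r = np.asarray(x, dtype=float)
    return float(np.sum(4 * EPS * ((SIGMA / r)**12 - (SIGMA / r)**6)))

# Energy at the equilateral geometry used in the visualization step;
# 3.77 is close to the pair equilibrium distance 2**(1/6) * SIGMA.
print(f2([3.77, 3.77, 3.77]))
```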


Fig. 4. The objective and HC-CGS cluster extensions for argon cluster geometry optimization: (a) the objective, (b) cluster extension, (c) cluster extension.

Fig. 5. Results of the DR-CGS based approach for argon cluster geometry optimization: (a) mixture density function, (b) cluster extension.

However, the main drawback of this method is its large memory complexity. In the case of multidimensional functions, the DR-CGS-based approach can be much more effective; this property comes from its method of cluster representation.

5. Concluding remarks

The combination of genetic algorithms and clustering methods in which the genetic sample is classified exhibits high positive synergy in solving global optimization problems. This approach has the following advantages:

* central parts of the basins can be detected and approximated;
* the evolution landscape can be partially explored and recognized;
* there is a possibility to analyze the stability of minimizers;
* the number of time-expensive local searches can be limited to one in each basin when a very accurate approximation of minimizers is necessary.

The quality of the presented methods can be verified theoretically, at least in some cases; this includes the accuracy and the stop criteria. In the HC-CGS strategy, one can control the rate at which the basins are filled by cluster extensions. Clustered genetic search based on the finite mixture model offers low complexity in solving problems of high dimension. Both strategies recognize basins even in more complicated cases in which the minimizers constitute a manifold.

Appendix. The basis of hierarchical genetic strategy (HGS)

The following description is reported after Schaefer and Kołodziej (2003). HGS is a multi-population parallel genetic algorithm. Populations in HGS operate on different levels, and each level is characterized by a specific accuracy of coding (determined by the chromosome length), mutation rate and population size. The dependencies among populations can be described by a tree structure. The populations with low accuracy of coding correspond to branches of low order, and they perform a chaotic search over wide parts of the domain. These populations can sprout populations of higher order, which are related to more accurate coding and a lower mutation rate, and are usually smaller. Populations of higher order are aimed at a more precise search over specific parts of the domain. The lower branches of the population tree play the role of a control system which pushes the populations of higher order into selected parts of the domain. Every single HGS population evolves according to the SGA rules. To start HGS one has to fix the following parameters:

* $1 \le s_1 < s_2 < s_3 < \cdots < s_m < \infty$: the lengths of binary code strings ($\Omega^{s_1}, \Omega^{s_2}, \ldots, \Omega^{s_m}$ are the genetic spaces associated with these strings),
* $1 \le n_1 \le n_2 \le \cdots \le n_m$: the sizes of populations, which are multisubsets of $\Omega^{s_1}, \Omega^{s_2}, \ldots, \Omega^{s_m}$, respectively.

For $j = 2, \ldots, m$ we can represent these genetic spaces in the following way: $\Omega^{s_j} = \{(\omega^{(s_{j-1})}\, \omega^{(s_j - s_{j-1})}) : \omega^{(s_{j-1})} \in \Omega^{s_{j-1}},\ \omega^{(s_j - s_{j-1})} \in \Omega^{(s_j - s_{j-1})}\}$. We assume that for every $j = 1, \ldots, m$ the genetic space $\Omega^{s_j}$ is linearly ordered. Nested coding, specially designed for HGS, is utilized.


For $D \subset \mathbb{R}$ we define a sequence of meshes $D_{r_1}, \ldots, D_{r_m}$ and a sequence of coding mappings $\mathrm{code}_j : D_{r_j} \ni x_{(\omega)} \mapsto \omega \in \Omega^{s_j}$, $j = 1, \ldots, m$. The construction is performed recursively. Let us introduce:

(1) the densest mesh $D_{r_m} \subset \mathbb{R}$ and a strictly increasing coding function $\mathrm{code}_m : D_{r_m} \to \Omega^{s_m}$;
(2) selecting functions $\phi_j : \Omega^{s_j} \to \Omega^{(s_{j+1} - s_j)}$ for $1 \le j \le m - 1$;
(3) $D_{r_j} = \{x_{(\omega, \phi_j(\omega))} : \omega \in \Omega^{s_j}\}$, where $x_{(\omega, \xi)} = \mathrm{code}^{-1}_{j+1}(\omega\, \xi)$, $1 \le j \le m - 1$;
(4) $\mathrm{code}_j : D_{r_j} \ni x_{(\omega, \phi_j(\omega))} \mapsto \omega \in \Omega^{s_j}$, $1 \le j \le m - 1$.

For $D \subset \mathbb{R}^N$, $N > 1$, $\mathrm{meas}(D) > 0$, and for $\bar{s}_j \in \mathbb{N}$ with $N \bar{s}_j = s_j$, we define the sets $D^1_{r_m}, \ldots, D^N_{r_m} \subset \mathbb{R}$ such that the Cartesian product $D^1_{r_m} \times \cdots \times D^N_{r_m} \subset D$. We also define strictly increasing mappings $\mathrm{code}^i_m : D^i_{r_m} \to \Omega^{\bar{s}_m}$ for $i = 1, \ldots, N$. For each $i \in \{1, \ldots, N\}$ we define recursively a sequence of sets $D^i_{r_j}$, a sequence of mappings $\mathrm{code}^i_j : D^i_{r_j} \to \Omega^{\bar{s}_j}$, $j = 1, \ldots, m - 1$, and selecting functions $\phi_j : \Omega^{\bar{s}_j} \to \Omega^{\bar{s}_{j+1} - \bar{s}_j}$ for $1 \le j \le m - 1$, such that:

(1) $D^i_{r_j} = \{x_{(\bar\omega, \phi_j(\bar\omega))} : \bar\omega \in \Omega^{\bar{s}_j}\}$, where $x_{(\bar\omega, \bar\xi)} = (\mathrm{code}^i_{j+1})^{-1}(\bar\omega\, \bar\xi)$, $1 \le j \le m - 1$;
(2) $\mathrm{code}^i_j : D^i_{r_j} \ni x_{(\bar\omega, \phi_j(\bar\omega))} \mapsto \bar\omega \in \Omega^{\bar{s}_j}$, $1 \le j \le m - 1$.

We put:
$$D_{r_j} = D^1_{r_j} \times \cdots \times D^N_{r_j}, \quad \Omega^{s_j} = (\Omega^{\bar{s}_j})^N, \quad \mathrm{code}_j = (\mathrm{code}^1_j, \ldots, \mathrm{code}^N_j), \quad j = 1, \ldots, m.$$

Fig. 6. Nested meshes for the hierarchical genetic strategy in the case $s_1 = 2$, $s_2 = 3$, $s_3 = 5$.

Remark A.1. $D_{r_1} \subset D_{r_2} \subset \cdots \subset D_{r_{m-1}} \subset D_{r_m}$.

Fig. 6 shows the sets $D_{r_1}$, $D_{r_2}$ and $D_{r_3} \subset \mathbb{R}$ in the case of $s_1 = 2$, $s_2 = 3$, $s_3 = 5$, $\phi_1(00) = \phi_1(01) = 1$, $\phi_1(10) = \phi_1(11) = 0$, $\phi_2 \equiv 01$. In the following we assume that $\phi_j \equiv 0$, $j = 1, \ldots, m - 1$.

Definition A.2. We say that a branch has degree $j \in \{1, \ldots, m\}$ if it is created by populations containing chromosomes of length $s_j$, $s_1 < s_2 < \cdots < s_m$. The unique branch of the lowest degree 1 is called the root. Populations evolving in this branch have individuals with the shortest chromosomes.

Every branch of degree $j \in \{1, \ldots, m\}$ needs its own fitness definition:
$$f_j = \tilde{F} \circ \mathrm{code}_j^{-1} : \Omega^{s_j} \to \mathbb{R}_+. \qquad (A.1)$$

Definition A.3. A $k$-periodic metaepoch $M_k$, $k \in \mathbb{N}$, is a discrete evolution process which starts from a given population and terminates after at most $k$ generations by selection of the best adapted individual.

Let $M_k(p^r) = (p^{r+l}, \hat{x}, stop)$, $l \le k$, denote the outcome of a $k$-periodic metaepoch started from the population $p^r$ in its $r$th generation; $p^{r+l}$ stands for the frequency vector of the resulting population, $\hat{x}$ for the best adapted individual found in the metaepoch, and $stop$ for the branch stop criterion marker. There are two types of stop criteria in HGS: a global one, which is responsible for stopping all still active populations in the tree, and a local branch stop criterion, which aims at heuristic detection of a lack of progress in a given population.

The HGS tree expands by sprouting new branches (called children) from a parental one. If a parental branch has degree $j \in \mathbb{N}$, then the degree of its child is $j + 1$. Before we describe the sprouting operation formally, let us introduce the auxiliary operator $A_{s'}$, which cuts an $s'$-length prefix from a given binary string of length $s''$, $s'' > s'$.

Definition A.4.

* If $D \subset \mathbb{R}$, then for a binary string $\omega_2$ of length $|\omega_2| = s''$ we have $A_{s'}(\omega_2) = \omega_1$, where $|\omega_1| = s'$, $s', s'' \in \mathbb{N}$, $s'' > s'$. This means that $\omega_1$ constitutes the $s'$-length prefix of the string $\omega_2$.
* If $D \subset \mathbb{R}^N$, $N > 1$, $\mathrm{meas}(D) > 0$, then for $\bar{s}', \bar{s}'' \in \mathbb{N}$ such that $N \bar{s}' = s'$, $N \bar{s}'' = s''$, and for $\omega_j = (\omega^1_j \cdots \omega^N_j)$, $j = 1, 2$, $|\omega^i_1| = \bar{s}'$, $|\omega^i_2| = \bar{s}''$, $i = 1, \ldots, N$, the operator $A_{s'}$ is defined as the concatenation of the operators $A^i_{\bar{s}'}$, $i = 1, \ldots, N$:
$$A_{s'} = (A^1_{\bar{s}'} \cdots A^N_{\bar{s}'}), \quad A^i_{\bar{s}'}(\omega^i_2) = \omega^i_1, \quad i = 1, \ldots, N,$$
where $\omega^i_1$ constitutes the $\bar{s}'$-length prefix of $\omega^i_2$, $i = 1, \ldots, N$.

Now we are ready to define the sprouting operator $SO$.

Definition A.5. Let $\hat\omega^{(j)}$ be the best adapted individual in some metaepoch, selected from the population represented by $p^{s_j}$ evolving in a branch of degree $j$, and let $s_{j+1} > s_j$. The operator $SO$, given by the formula $SO(p^{s_j}) = (p^{s_j}, p^{s_{j+1}})$, defines the process of constructing a population represented by the vector $p^{s_{j+1}}$, which is the initial population for a new branch of degree $j + 1$. This population is a multisubset of $\Omega^{s_{j+1}}$, and the individuals


for this set are selected according to the following rules:

* $A_{s_j}(y) = \hat\omega^{(j)}$ for every $y \in \Omega^{s_{j+1}}$ (i.e. $|y| = s_{j+1}$);
* the $(s_{j+1} - s_j)$-length suffix is selected according to the uniform probability distribution over $\Omega^{(s_{j+1} - s_j)}$.

The sprouting operator can be activated or not, depending on the outcome of the prefix comparison operator, defined as follows.

Definition A.6. The operator $PC: Q \to \{0, 1\}$ given by the formula
$$PC(X, Y, s) = \begin{cases} 1 & \text{if } \exists x \in X \text{ and } \exists y \in Y \text{ such that } A_s(x) = A_s(y), \\ 0 & \text{otherwise}, \end{cases}$$
where $Q = \{(X, Y, s) : X, Y \text{ are multisets in } \Omega^z, \Omega^r \text{ respectively},\ s \in \mathbb{N},\ s \le z,\ s \le r\}$, is called the prefix comparison operator.

The HGS starts with an initial population which consists of $n_1$ individuals with chromosomes of length $s_1$; this population creates the root of the HGS tree. The population tree is then built using the sprouting operation. A new branch is conditionally sprouted from the current branch $p^{s_j}$ by applying $SO(p^{s_j})$ if $PC(p^{s_j}, y, s_j) = 0$ for all child populations $y$ (also dead ones) of the branch $p^{s_j}$. The global stop criterion can be defined as no sprouting of new branches from the root, combined with satisfaction of the local branch stop criterion in all branches of order higher than 1.
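A minimal sketch of this prefix machinery for the one-dimensional case, assuming genotypes are plain '0'/'1' strings (the lengths and sizes below are illustrative):

```python
import random

def A(s, omega):
    # Prefix-cut operator A_s of Definition A.4 (one-dimensional case).
    return omega[:s]

def PC(X, Y, s):
    # Prefix comparison operator of Definition A.6.
    return int(any(A(s, x) == A(s, y) for x in X for y in Y))

def sprout(best, s_next, size, rng=random):
    # Sprouting operator SO of Definition A.5: children keep the parent's best
    # genotype as prefix; the suffix is drawn uniformly from the shorter space.
    k = s_next - len(best)
    return [best + ''.join(rng.choice('01') for _ in range(k)) for _ in range(size)]

best = '101'                          # best individual of a degree-j branch, s_j = 3
child = sprout(best, s_next=5, size=4)
# PC([best], child, 3) == 1 here, so the same prefix region is not sprouted twice:
may_sprout = all(PC([best], y, 3) == 0 for y in [child])
```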

References

Adamska, K., 2004. Genetic clustering as a parallel algorithm for approximating basins of attraction. Lecture Notes in Computer Science 3019, 536–543.
Cabib, R., Schaefer, R., Telega, H., 1998. A parallel genetic clustering for inverse problems. Lecture Notes in Computer Science, Vol. 1541. Springer, Berlin, pp. 551–556.
Chiou, Y., Lan, L.W., 2001. Genetic clustering algorithms. European Journal of Operational Research 135 (2), 413–427.
Ciarlet, P., 1978. The Finite Element Method for Elliptic Problems. North-Holland, Amsterdam.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.
Duda, R.O., Hart, P.E., 1973. Pattern Classification and Scene Analysis. Wiley, New York.
Goldberg, D., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA.
Grygiel, K., 1996. On asymptotic properties of a selection-with-mutation operator. In: Proceedings of the First National Conference on Genetic Algorithms, Murzasichle, Poland.
Guus, C., Boender, E., Edwin Romeijn, H., 1995. Stochastic methods. In: Horst, R., Pardalos, P.M. (Eds.), Handbook of Global Optimization. Kluwer Academic Publishers, Dordrecht, pp. 828–869.
Horst, R., Pardalos, P.M. (Eds.), 1995. Handbook of Global Optimization. Kluwer Academic Publishers, Dordrecht.
Jain, A.K., Murty, M.N., Flynn, P.J., 1999. Data clustering: a review. ACM Computing Surveys 31 (3), 264–323.
Maulik, U., Bandyopadhyay, S., 2000. Genetic algorithm-based clustering technique. Pattern Recognition 33, 1455–1465.
McLachlan, G., Peel, D., 2000. Finite Mixture Models. Wiley, New York.
Niesse, J.A., Mayne, H.R., 1996. Global geometry optimization of atomic clusters using a modified genetic algorithm in space-fixed coordinates. Journal of Chemical Physics 105, 4700–4706.
Schaefer, R., Jabłoński, Z.J., 2002. On the convergence of sampling measures in the global genetic search. Lecture Notes in Computer Science, Vol. 2328. Springer, Berlin, pp. 593–600.
Schaefer, R., Kołodziej, J., 2003. Genetic search reinforced by the population hierarchy. In: De Jong, K.A., Poli, R., Rowe, J.E. (Eds.), Foundations of Genetic Algorithms, Vol. 7. Morgan Kaufmann, Los Altos, CA, pp. 383–399.
Schaefer, R., Kołodziej, J., Gwizdała, R., Wojtusiak, J., 2000. How simpletons can increase the community development: an attempt at hierarchical genetic computation. In: Proceedings of the Fourth KAEiOG, Lądek Zdrój, pp. 187–197.
Schaefer, R., Adamska, K., Jabłoński, Z.J., 2002. Clustering driven by the genetic sampling measure. In: Proceedings of the Symposium on Methods of Artificial Intelligence AI-METH 2002, Gliwice, Poland, pp. 361–366.
Stoer, J., Bulirsch, R., 1980. Introduction to Numerical Analysis. Springer, Berlin (Section 5.2).
Telega, H., 1999. Parallel algorithms for solving selected inverse problems. Ph.D. Thesis, Academy of Mining and Metallurgy, Kraków (in Polish).
Tseng, L.Y., Yang, S.B., 2000. A genetic clustering algorithm for data with non-spherical-shape clusters. Pattern Recognition 33, 1251–1259.
Tseng, L.Y., Yang, S.B., 2001. A genetic approach to the automatic clustering problem. Pattern Recognition 34, 415–424.
Vose, M.D., 1999. The Simple Genetic Algorithm. MIT Press, Cambridge, MA.