Expert Systems with Applications 37 (2010) 5645–5652
A new clustering algorithm based on hybrid global optimization based on a dynamical systems approach algorithm

Ali Maroosi, Babak Amiri

Iran University of Science and Technology, Tehran, Iran
Keywords: Clustering; K-means; Dynamical systems; Tabu search

Abstract
Many methods for local optimization are based on the notion of a direction of local descent at a given point, and a local improvement of the point in hand can be made along this direction. As a rule, modern methods for global optimization do not use directions of global descent for a global improvement of the point in hand. From this point of view, the global optimization algorithm based on a dynamical systems approach (GOP) is an unusual method. Its structure is similar to that used in local optimization: a new iterate is obtained as an improvement of the previous one along a certain direction. In contrast with local methods, this direction is a direction of global descent, and for greater diversification it is combined with Tabu search. The resulting algorithm is called hybrid GOP (HGOP). Cluster analysis is one of the attractive data mining techniques used in many fields. One popular class of data clustering algorithms is the center-based clustering algorithms. K-means is a popular clustering method because of its simplicity and high speed on large datasets. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. Many studies have been carried out to overcome the local optima problem in clustering. In this paper, we propose the application of the hybrid global optimization algorithm based on a dynamical systems approach to clustering. We compare HGOP with other clustering algorithms, such as GAK, SA, TS, and ACO, by implementing them on several simulated and real datasets. Our findings show that the proposed algorithm performs better than the others.
© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Clustering, also called set partitioning, is a basic and widely applied methodology. Application fields include statistics, mathematical programming (location selection, network partitioning, routing, scheduling and assignment problems, etc.) and computer science (pattern recognition, learning theory, image processing, computer graphics, etc.). Clustering groups all objects into several mutually exclusive clusters in order to maximize or minimize an objective function. Because of the combinatorial character of the method, clustering rapidly becomes computationally intractable as the problem scale increases. Brucker (1978) and Ward (1963) proved that, for specific objective functions, clustering becomes an NP-hard problem when the number of clusters exceeds 3. Many methods are applied in clustering analysis, such as hierarchical clustering, partition-based clustering, density-based clustering, and artificial intelligence-based clustering.
One popular class of data clustering algorithms is the center-based clustering algorithms. K-means is a popular clustering method because of its simplicity and high speed on large datasets (Forgy, 1965). However, K-means has two shortcomings: dependency on the initial state and convergence to local optima (Selim & Ismail, 1984), and global solutions of large problems cannot be found with a reasonable amount of computational effort (Spath, 1989). Many studies have therefore tried to overcome the local optima problem in clustering. Mualik and Bandyopadhyay (2000) proposed a genetic algorithm based method to solve the clustering problem and experimented on synthetic and real-life datasets to evaluate its performance; the results showed that the GA-based method may improve the final output of K-means. Krishna and Murty (1999) proposed a novel approach called the genetic K-means algorithm (GKA) for clustering analysis. It defines a basic mutation operator specific to clustering, called distance-based mutation, and using finite Markov chain theory they proved that GKA converges to the best-known optimum. Selim and Al-Sultan (1991) discussed the solution of the clustering problem usually solved by the K-means algorithm. This problem is known to have local minimum solutions, which are
usually what the K-means algorithm obtains. They described the simulated annealing approach for solving optimization problems and proposed it for solving the clustering problem; the parameters of the algorithm were discussed in detail, and it was shown that the algorithm converges to a global solution of the clustering problem. Sung and Jin (2000) considered a clustering problem in which a given data set is partitioned into a certain number of natural and homogeneous subsets, such that each subset is composed of elements similar to one another but different from those of any other subset. For this clustering problem they developed a heuristic algorithm that combines the Tabu search heuristic with two complementary functional procedures, called packing and releasing procedures. The algorithm was numerically tested for its effectiveness in comparison with reference works, including the Tabu search algorithm, the K-means algorithm and the simulated annealing algorithm. Over the last decade, modeling the behavior of social insects, such as ants and bees, for the purpose of search and problem solving has been the context of the emerging area of swarm intelligence. Ant colony optimization is a typical, successful swarm-based optimization approach, where the search algorithm is inspired by the behavior of real ants. Kuo, Wang, Hu, and Chou (2005) proposed a novel clustering method, the ant K-means (AK) algorithm. AK modifies K-means by locating the objects in a cluster with a probability that is updated by the pheromone, while the rule for updating the pheromone is based on the total within-cluster variance (TWCV). Shelokar, Jayaraman, and Kulkarni (2004) presented an ant colony optimization methodology for optimally clustering N objects into K clusters. The algorithm employs distributed agents that mimic the way real ants find the shortest path from their nest to a food source and back. They compared their results with other clustering algorithms (GA, Tabu search, SA) and showed that their algorithm performs better in terms of solution quality and time. This paper presents the application of the HGOP algorithm to clustering. The paper is organized as follows: Section 2 discusses the cluster analysis problem; Section 3 introduces the HGOP philosophy and its application to clustering; Section 4 presents experimental results of the proposed clustering algorithm in comparison with other clustering algorithms.
2. Clustering

Data clustering, which is an NP-complete problem of finding groups in heterogeneous data by minimizing some measure of dissimilarity, is one of the fundamental tools in data mining, machine learning and pattern classification (Garey, Johnson, & Witsenhausen, 1982). Clustering in the N-dimensional Euclidean space R^N is the process of partitioning a given set of n points into a number, say k, of groups (clusters) based on some similarity (distance) metric. The metric used here is the Euclidean distance, derived from the Minkowski metric (Eqs. (1) and (2)):

d(x, y) = (∑_{i=1}^{m} |x_i − y_i|^r)^{1/r}    (1)

d(x, y) = √(∑_{i=1}^{m} (x_i − y_i)^2)    (2)

Let the set of n points {X_1, X_2, ..., X_n} be represented by the set S and the k clusters be represented by C_1, C_2, ..., C_K. Then:

C_i ≠ ∅ for i = 1, ..., k;    C_i ∩ C_j = ∅ for i, j = 1, ..., k and i ≠ j;    ∪_{i=1}^{K} C_i = S.

In this study we also use the Euclidean metric as the distance metric. Existing clustering algorithms can be broadly classified into two categories: hierarchical clustering and partitional clustering. The most popular class of partitional clustering methods is the center-based clustering algorithms (Gungor & Unler, 2006). The K-means algorithm is one of the most widely used center-based clustering algorithms (Forgy, 1965). To find the K centers, the problem is defined as the minimization of a performance function f(X, Z) defined on both the data items and the center locations. A popular performance function for measuring the goodness of a k-clustering is the total within-cluster variance, or total mean-square quantization error (MSE), Eq. (3) (Gungor & Unler, 2006):

f(X, Z) = ∑_{i=1}^{N} min{ ‖X_i − Z_l‖^2 : l = 1, ..., K }    (3)
The steps of the K-means algorithm are as follows (Mualik & Bandyopadhyay, 2000):

Step 1: Choose K cluster centers Z_1, Z_2, ..., Z_K randomly from the n points {X_1, X_2, ..., X_n}.
Step 2: Assign point X_i, i = 1, 2, ..., n, to cluster C_j, j ∈ {1, 2, ..., K}, if ‖X_i − Z_j‖ < ‖X_i − Z_p‖ for p = 1, 2, ..., K and j ≠ p.
Step 3: Compute new cluster centers Z_1, Z_2, ..., Z_K as

Z_i = (1/n_i) ∑_{X_j ∈ C_i} X_j,    i = 1, 2, ..., K,

where n_i is the number of elements belonging to cluster C_i.
Step 4: If the termination criteria are satisfied, stop; otherwise continue from Step 2. Note that if the process does not terminate normally at Step 4, it is executed for a maximum fixed number of iterations.
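To make the procedure concrete, the following is a minimal sketch of the K-means steps above with the MSE objective of Eq. (3). It is illustrative only: the paper's experiments were coded in Matlab 7.1, so the Python function and names here are assumptions, not the authors' implementation.

import numpy as np

def kmeans(X, K, max_iter=100, seed=None):
    """Minimal K-means; X is an (n, d) array, K is the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: choose K centers randomly from the n data points.
    Z = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: recompute each center as the mean of the points assigned to it.
        new_Z = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else Z[k]
                          for k in range(K)])
        # Step 4: stop when the centers no longer move.
        if np.allclose(new_Z, Z):
            break
        Z = new_Z
    # Total within-cluster variance (MSE), Eq. (3).
    mse = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2).min(axis=1).sum()
    return Z, labels, mse

In the HGOP formulation that follows, a candidate solution U concatenates all K centers into one vector, so an objective of exactly this MSE form plays the role of f(U).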
Global optimization algorithm (GOP)

The steps of the global optimization algorithm are as follows:
1. Select some initial random points, that is, define a set A = {U(t) = (u_1(t), ..., u_n(t)), t = 1, ..., T}; the set A is selected uniformly from the box U ∈ R^{Kd}, a_i ≤ u_i ≤ b_i, i = 1, ..., Kd.
2. Calculate the performance value at each point of A, choose the point U* ∈ A that provides the best performance value, and set U_T = U*.
3. Find a good point U_{T+1} from U*(U_T) and add it to the set A. For each point U ∈ A and each coordinate i, calculate the degree of change of the objective function value f(U) when u_i changes; here a change means either a decrease or an increase of a scalar variable.
4. With U_T = U* = (u_1*, ..., u_n*), calculate F(u_i ↑) and F(u_i ↓).
5. Using these degrees, calculate the forces F(T) = (F_1(T), ..., F_{Kd}(T)) acting on the increase of the objective function value f at the point U*(U_T): F(T) = F(u_i ↑) − F(u_i ↓).
6. Calculate U_{T+1} from U_T by U_{T+1} = U_T + αF(T).
7. In the same manner choose a new point U_{T+2}, and so on.
8. The process terminates either when F(T) = 0 or after a maximum number of iterations; the best solution is kept, and the whole procedure is repeated in another stage starting from other initial random points.
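The sketch below renders this iteration scheme in Python. The force estimation (detailed in Section 3) is abstracted behind a force(A, U_T) argument; the step size alpha, the number of initial points, the projection onto the box, and all names are illustrative assumptions rather than the authors' code.

import numpy as np

def gop(f, bounds, force, n_init=20, alpha=0.1, max_iter=40):
    """Sketch of the GOP loop: improve f over the box given by bounds (an (n, 2) array).
    force(A, U_T) must return the direction F(T) estimated from the set of points A."""
    rng = np.random.default_rng()
    lo, hi = bounds[:, 0], bounds[:, 1]
    # Step 1: initial set A drawn uniformly from the box a_i <= u_i <= b_i.
    A = [lo + rng.random(lo.size) * (hi - lo) for _ in range(n_init)]
    U_T = min(A, key=f)                    # Step 2: best point of A is the current iterate
    for _ in range(max_iter):
        F = force(A, U_T)                  # Steps 3-5: direction estimated from the population
        if np.allclose(F, 0.0):            # Step 8: stop when the force vanishes
            break
        U_next = np.clip(U_T + alpha * F, lo, hi)   # Step 6: step along F(T), kept inside the box
        A.append(U_next)                   # Step 7: enlarge A and continue from the best point
        U_T = min(A, key=f)
    return U_T, f(U_T)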
3. Application of GOP in clustering

3.1. Philosophy behind the development of the GOP

There are many different methods and algorithms developed for global optimization problems (Migdalas, Pardalos, & Varbrand, 2001). The GOP algorithm takes into account some relatively "worse" points for further consideration. This is what many other methods do, such as Simulated Annealing, Genetic Algorithms (Smith, 2002) and Taboo Search (Glover & Laguna, 1997; Cvijovic & Klinovski, 2002). The choice of a decent (good) direction
is the main part of each algorithm. Instead of using a stochastic search (as in the algorithms mentioned), GOP uses the formulation proposed in Mammadov (2004) and Mammadov, Rubinov, and Yearwood (2005). Note that the GOP algorithm has quite different settings and motivations compared to the methods that use so-called dynamical search (Pronzato, Wynn, & Zhigljausky, 2002). This method has some ideas in common with heuristic methods that attempt to estimate the overall convexity characteristics of the objective function. The advantage of this approach is that it does not use any approximate underestimations, including convex underestimations.
Fig. 1. Flowchart of the HGOP clustering algorithm.
3.2. Application of the hybrid GOP algorithm to clustering

The search capability of the HGOP algorithm is used in this article for appropriately determining a fixed number K of cluster centers in R^N, thereby suitably clustering the set of n unlabelled points. The clustering metric adopted is the sum of the Euclidean distances of the points from their respective cluster centers. The steps of the proposed algorithm are shown in Fig. 1 and described in detail in this section; the pseudo code for the HGOP clustering algorithm is shown in Fig. 2. The steps of the application of HGOP to clustering are as follows:

Step 0. Set f(U_Total_Best) = +∞, Num_Main_Iter = 0 and TabuList (TL) = ∅.
Step 1. Select some initial random points, that is, define a set A = {U(t) = (U_1(t), ..., U_n(t)), t = 1, ..., T}; the set A is uniformly selected from the box U ∈ R^{Kd}, a_i ≤ u_i ≤ b_i, i = 1, ..., Kd, and does not belong to the Tabu Region (visited region). Here a_i and b_i are the minimum and maximum bounds of u_i, set from the minimum and maximum of the data to be clustered, K is the number of clusters, d is the number of dimensions of the points, and the Tabu Region (TR) is determined by the Tabu List.
Step 2. If TL = {V_1, V_2, ..., V_M}, then TR = {U : |U − V_i| < q_i, i ∈ {1, 2, ..., M}}, where q_i is the radius of a spherical area with center V_i that is a percentage of C (C = max(b_i − a_i), i = 1, ..., Kd). The i-th point is represented as a vector of decision variable values U(i) = (u_i^1, u_i^2, ..., u_i^{Kd}), a candidate solution with K cluster centers. For example, U(1) = (2, 5, 1, 6, 3, 2, 5, 7, 4) represents a solution with three cluster centers Z_1 = (2, 5, 1), Z_2 = (6, 3, 2) and Z_3 = (5, 7, 4), each with three dimensions.
Step 3. Compute the performance value f(i) for each point U(i). Note that, to prevent U_1 = (Z_1, Z_2, Z_3) and U_1 = (Z_2, Z_1, Z_3) from being treated as different solutions, the centers Z_i are reordered (swapped) after the initial random points are selected.
Step 4. Calculate the performance value at each point of A, choose U* ∈ A that provides the best performance value, and set U_T = U*.
Step 5. Find a good point U_{T+1} from U*(U_T), add it to the set A, and update the Tabu List as follows:
Step 5-1. For each point U ∈ A and each coordinate i, calculate the degree of change of f(U) when u_i changes; here a change means either a decrease or an increase of the scalar variable. Let U_T = U* = (u_1*, ..., u_n*) be the point for which the force F(T) is calculated, and let U = (u_1, ..., u_n) denote the points of A. This step relies on a dynamical system described by the non-functional relationship between two variables, based on the notion of a fuzzy derivative. The fuzzy derivative ∂u_i/∂u_j is defined as the influence of feature j on feature i:

∂u_i/∂u_j = μ(u_i, u_j),    i, j ∈ {1, ..., n},

where

μ(u_i, u_j) = (μ_1(u_i, u_j), μ_2(u_i, u_j), μ_3(u_i, u_j), μ_4(u_i, u_j)),    i ≠ j,

and

μ_1(u_i, u_j) = d(u_i ↑, u_j ↑),  μ_2(u_i, u_j) = d(u_i ↑, u_j ↓),  μ_3(u_i, u_j) = d(u_i ↓, u_j ↓),  μ_4(u_i, u_j) = d(u_i ↓, u_j ↑).

Here μ_1(u_i, u_j) expresses the degree of increase of entry i if entry j increases, starting from the initial state (u_i*, u_j*); μ_2, μ_3 and μ_4 are interpreted analogously. These degrees are estimated as

μ_1 = M_{11}/M_1,  μ_2 = M_{12}/M_1,  μ_3 = M_{13}/M_2,  μ_4 = M_{14}/M_2,

where M_1 is the number of points (u_i, u_j) satisfying u_i > u_i*, M_{11} is the number of points satisfying u_i > u_i* and u_j > u_j*, M_{12} is the number of points satisfying u_i > u_i* and u_j < u_j*, M_2 is the number of points satisfying u_i < u_i*, M_{13} is the number of points satisfying u_i < u_i* and u_j < u_j*, and M_{14} is the number of points satisfying u_i < u_i* and u_j > u_j*. If μ(u_i, u_j) = (g_1, g_2, g_3, g_4) and μ(u_j, u_i) = (η_1, η_2, η_3, η_4), then F(u_j → u_i ↑) = g_1 η_1 + g_4 η_2 and F(u_j → u_i ↓) = g_3 η_3 + g_2 η_4. The resulting force on entry i is the sum of all these forces:

F(u_i ↑) = ∑_{j ≠ i} F(u_j → u_i ↑),    F(u_i ↓) = ∑_{j ≠ i} F(u_j → u_i ↓).

Step 5-2. Using these degrees, calculate the forces F(T) = (F_1(T), ..., F_{Kd}(T)) acting to increase f at the point U*(U_T): F(T) = F(u_i ↑) − F(u_i ↓).
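The counting rules for μ_1 to μ_4 and the resulting forces translate directly into code. The sketch below estimates F(u_i ↑), F(u_i ↓) and F(T) from a population A (one point per row) relative to the current point U*; it follows the formulas of Steps 5-1 and 5-2, with illustrative names and no claim to reproduce the authors' Matlab implementation.

import numpy as np

def mu(A, U_star, i, j):
    """Fuzzy derivative mu(u_i, u_j) = (mu1, mu2, mu3, mu4) estimated from the set A."""
    up_i, down_i = A[:, i] > U_star[i], A[:, i] < U_star[i]
    up_j, down_j = A[:, j] > U_star[j], A[:, j] < U_star[j]
    M1, M2 = up_i.sum(), down_i.sum()                  # points with u_i above / below u_i*
    mu1 = (up_i & up_j).sum() / M1 if M1 else 0.0      # M11 / M1
    mu2 = (up_i & down_j).sum() / M1 if M1 else 0.0    # M12 / M1
    mu3 = (down_i & down_j).sum() / M2 if M2 else 0.0  # M13 / M2
    mu4 = (down_i & up_j).sum() / M2 if M2 else 0.0    # M14 / M2
    return mu1, mu2, mu3, mu4

def forces(A, U_star):
    """F(T): for every coordinate i, F_i(T) = F(u_i up) - F(u_i down) (Steps 5-1 and 5-2)."""
    A = np.asarray(A)
    n = A.shape[1]
    F = np.zeros(n)
    for i in range(n):
        up = down = 0.0
        for j in range(n):
            if j == i:
                continue
            g1, g2, g3, g4 = mu(A, U_star, i, j)
            e1, e2, e3, e4 = mu(A, U_star, j, i)
            up += g1 * e1 + g4 * e2      # F(u_j -> u_i up)
            down += g3 * e3 + g2 * e4    # F(u_j -> u_i down)
        F[i] = up - down
    return F

A function of this shape could serve as the force argument of the GOP loop sketched in Section 2.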
Fig. 2. Pseudo code for HGOP clustering algorithm.
Step 5-3. Calculate Û_{T+1} from U_T:

    set α = 1 and U_Dynamical_System = U_T
    while α > ε and Iteration < Max_Num_Iter_Dynamical_System
        if the step improves the objective, i.e. f(U_Dynamical_System + αF(T)) < f(U_Dynamical_System)
            U_Dynamical_System = U_Dynamical_System + αF(T)
        else
            α = α/2
        Iteration = Iteration + 1
    Û_{T+1} = U_Dynamical_System

In the above, ε is a small positive number.
Step 5-4. Refinement of the new point: using a local optimization procedure starting at Û_{T+1}, find a local minimum U_{T+1}. A direct search method (pattern search, Hart (2001)) is used at this stage; a code sketch of this refinement is given after the step list.
Step 5-4-1. Set u^0 = Û_{T+1} and Δ_0 = Δ^0.
Step 5-4-2. For p = 0, 1, ..., number of pattern iterations, we have an iterate u^p and a step-length parameter Δ_p > 0. Let e_i, i = 1, ..., n, denote the standard unit basis vectors.
Step 5-4-3. Examine the points u_pattern = u^p ± Δ_p e_i, i = 1, ..., n, to find a point u_pattern for which f(u_pattern) < f(u^p). If no such u_pattern is found, reduce Δ_p by half and continue; otherwise leave the step-length parameter alone, setting Δ_{p+1} = Δ_p and u^{p+1} = u_pattern. In the latter case the step-length parameter can also be increased, say by a factor of 2, if a longer step seems justified.
Step 5-4-4. Repeat the iteration just described until Δ_p is deemed sufficiently small.
Step 5-4-5. At the end, set U_{T+1} = u^{number of pattern iterations}.
Step 5-5. Construct a new population A(t + 1) = A(t) ∪ {U_{T+1}}.
Step 5-6. Update the Tabu List with U_{T+1}; if the Tabu List is full, delete the point with the worst ranking. Tabu List entries are ranked and saved according to their objective function values and the sum of the Euclidean distances of each point from the other points in the list:
Step 5-6-1. Save the ranking of the Tabu List according to the objective function values and call it the Objective Function Rank (OFR).
Step 5-6-2. Save the ranking of the Tabu List according to the maximum sum of Euclidean distances from the other Tabu List points and call it the Distance Rank (DR).
Step 5-6-3. The ranking of each point in TL is the best of its rankings in DR and OFR.
Step 6. If either F(T) < γ or T > T*, go to Step 7; otherwise return to Step 4. Here γ > 0 and T* is a positive integer.
Step 7. If f(U_Total_Best) > f(U_{T+1}), then U_Total_Best = U_{T+1}; otherwise U_Total_Best is left unchanged. If Num_Main_Iter is less than Max_Num_Main_Iter, then set Num_Main_Iter = Num_Main_Iter + 1 and return to Step 1.
Step 8. U_Total_Best is the final solution and gives the cluster centers.
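As announced at Step 5-4, the following is a compact sketch of the pattern-search refinement, assuming the halving rule described above and a fixed iteration budget; the function name and default values are illustrative, not the authors' settings.

import numpy as np

def pattern_search(f, u0, delta0=0.5, n_iter=20, min_delta=1e-6):
    """Direct (pattern) search: probe u +/- delta * e_i and keep any improving point."""
    u, delta = np.asarray(u0, dtype=float).copy(), float(delta0)
    n = u.size
    for _ in range(n_iter):
        improved = False
        for i in range(n):
            for sign in (+1.0, -1.0):
                trial = u.copy()
                trial[i] += sign * delta      # pattern point u^p +/- Delta_p * e_i
                if f(trial) < f(u):           # accept the first improving pattern point
                    u, improved = trial, True
                    break
            if improved:
                break
        if not improved:
            delta *= 0.5                      # no improvement: halve the step length
            if delta < min_delta:             # stop when the step is sufficiently small
                break
    return u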
4. Experimental results

The experimental results comparing the HGOP clustering algorithm with several typical algorithms, including the ACO algorithm (Shelokar et al., 2004), the simulated annealing approach (Selim & Al-Sultan, 1991), the genetic K-means algorithm (Krishna & Murty, 1999), and the Tabu search approach (Sung & Jin, 2000), are provided for four artificial data sets (Data 1, Data 2, Data 3 and Data 4) and five real-life data sets (Vowel, Iris, Crude Oil, Wine and Thyroid disease data), respectively. These are first described below. The effectiveness of the algorithms depends greatly on the generation of initial solutions; therefore, for every data set each algorithm was run 10 times, each time with randomly generated initial solutions. We performed our experiments on a Pentium IV, 2.8 GHz, 512 MB RAM computer, and the algorithms were coded in Matlab 7.1. We ran all five algorithms on all data sets.

4.1. Artificial data sets

Data 1: This is a nonoverlapping two-dimensional data set where the number of clusters is two. It has 10 points. The value of K is chosen to be 2 for this data set.
Data 2: This is a nonoverlapping two-dimensional data set where the number of clusters is three. It has 76 points. The value of K is chosen to be 3 for this data set.
Data 3: This is an overlapping two-dimensional triangular distribution of data points having nine classes, where all the classes are assumed to have equal a priori probabilities (= 1/9). It has 900 data points. The X-Y ranges for the nine classes are as follows:

Class 1: [-3.3, -0.7] × [0.7, 3.3]
Class 2: [-1.3, 1.3] × [0.7, 3.3]
Class 3: [0.7, 3.3] × [0.7, 3.3]
Class 4: [-3.3, -0.7] × [-1.3, 1.3]
Class 5: [-1.3, 1.3] × [-1.3, 1.3]
Class 6: [0.7, 3.3] × [-1.3, 1.3]
Class 7: [-3.3, -0.7] × [-3.3, -0.7]
Class 8: [-1.3, 1.3] × [-3.3, -0.7]
Class 9: [0.7, 3.3] × [-3.3, -0.7]

Thus the domain of the triangular distribution for each class and each axis is 2.6. Consequently, the height will be 1/1.3 (since 1/2 × 2.6 × height = 1). The value of K is chosen to be 9 for this data set.
Data 4: This is an overlapping ten-dimensional data set generated using a triangular distribution of the form shown in Fig. 3 for two classes, 1 and 2. It has 1000 data points. The value of K is chosen to be 2 for this data set. The range for class 1 is [0, 2] × [0, 2] × ... × [0, 2] (10 times), and that for class 2 is [1, 3] × [0, 2] × ... × [0, 2] (9 times), with the corresponding peaks at (1, 1) and (2, 1).

Fig. 3. Triangular distribution along the X-axis.

The distribution along the first axis (X) for class 1 may be formally quantified as

f_1(x) =
    0        for x ≤ 0
    x        for 0 < x ≤ 1
    2 − x    for 1 < x ≤ 2
    0        for x > 2.

Similarly, for class 2,
f_2(x) =
    0        for x ≤ 1
    x − 1    for 1 < x ≤ 2
    3 − x    for 2 < x ≤ 3
    0        for x > 3.

The distribution along the other nine axes (Y_i, i = 1, 2, ..., 9) for both classes is

f(y_i) =
    0          for y_i ≤ 0
    y_i        for 0 < y_i ≤ 1
    2 − y_i    for 1 < y_i ≤ 2
    0          for y_i > 2.
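Since f_1, f_2 and f(y_i) are standard triangular densities, an artificial data set of the Data 4 type can be reproduced, for example, with numpy's triangular sampler. The split of 500 points per class and the seed below are illustrative assumptions; the paper does not specify its exact generator.

import numpy as np

rng = np.random.default_rng(0)

# Class 1: X ~ triangular on [0, 2] peaked at 1; the nine Y_i axes share the same distribution.
class1 = rng.triangular(left=0.0, mode=1.0, right=2.0, size=(500, 10))

# Class 2: X ~ triangular on [1, 3] peaked at 2; Y_1..Y_9 distributed as for class 1.
class2 = np.column_stack([
    rng.triangular(1.0, 2.0, 3.0, size=500),
    rng.triangular(0.0, 1.0, 2.0, size=(500, 9)),
])

data4 = np.vstack([class1, class2])  # 1000 ten-dimensional points, two overlapping classes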
Table 1
Result obtained by the five algorithms for 10 different runs on Data 1.

Method   F_best      F_average   F_worst     CPU time (s)
HGOP     3.120125    3.131337    3.228137    1.81
ACO      3.142375    3.163422    3.352843    1.89
GAK      3.273426    3.355521    3.683901    2.01
TS       3.244326    3.310024    3.572814    1.92
SA       3.217832    3.282089    3.539115    1.99
Table 2
Result obtained by the five algorithms for 10 different runs on Data 2.

Method   F_best       F_average    F_worst      CPU time (s)
HGOP     51.493674    51.533427    51.687453    8.23
ACO      52.082746    52.212071    52.729373    8.98
GAK      56.142562    56.377520    57.317354    17.24
TS       54.752946    54.879342    55.384927    14.57
SA       53.562492    53.635943    53.929748    14.82
Table 3
Result obtained by the five algorithms for 10 different runs on Data 3.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     962.342786    962.578234    964.753761    25.93
ACO      964.739472    965.048327    966.283745    26.88
GAK      966.649837    966.772302    966.853946    38.52
TS       972.629478    973.209275    975.528463    32.78
SA       966.418263    966.614089    967.397392    31.24
4.2. Real-life data sets

Vowel data: This data set consists of 871 Indian Telugu vowel sounds (Pal & Majumder, 1977). These were uttered in a consonant–vowel–consonant context by three male speakers in the age group of 30–35 years. The data set has three features, F1, F2 and F3, corresponding to the first, second and third vowel formant frequencies, and six overlapping classes {d, a, i, u, e, o}. The value of K is therefore chosen to be 6 for this data.
Iris data: This is the Iris data set, perhaps the best-known database in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. The data set contains three classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other. There are 150 instances with four numeric attributes and no missing attribute values. The attributes are sepal length, sepal width, petal length and petal width, all in cm (Blake & Merz).
Crude oil data: This overlapping data set (Johnson & Wichern, 1982) has 56 data points, 5 features and 3 classes. Hence the value of K is chosen to be 3 for this data set.
Wine data: This is the wine data set, also taken from the MCI laboratory. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. There are 178 instances with 13 numeric, continuous attributes and no missing attribute values.
Thyroid disease data: This data set categorizes N = 215 samples of patients suffering from three human thyroid diseases (K = 3): euthyroid, hyperthyroidism and hypothyroidism, with 150 euthyroid individuals, 30 hyperthyroid patients and 35 hypothyroid patients. Each individual is characterized by the results of n = 5 laboratory tests: total serum thyroxin, total serum tri-iodothyronine, serum tri-iodothyronine resin uptake, serum thyroid-stimulating hormone (TSH), and the increase of TSH after injection of TSH-releasing hormone (Blake & Merz).

Table 4
Result obtained by the five algorithms for 10 different runs on Data 4.

Method   F_best         F_average      F_worst        CPU time (s)
HGOP     1246.135426    1246.325342    1246.374356    120.63
ACO      1248.958685    1249.034036    1249.335442    122.34
GAK      1258.673362    1520.777767    1271.635528    178.42
TS       1282.538294    1285.988483    1299.789237    142.15
SA       1249.736287    1249.968105    1250.895375    136.61

Table 5
Result obtained by the five algorithms for 10 different runs on Vowel data.

Method   F_best           F_average        F_worst          CPU time (s)
HGOP     148718.363754    148718.454321    148718.674567    69.85
ACO      148837.736634    148837.768828    148837.937878    73.65
GAK      149346.152274    149391.501798    149436.851323    98.72
TS       150635.653256    150648.079532    150697.784636    81.25
SA       149357.634587    149436.017542    149749.549362    79.46

Table 6
Result obtained by the five algorithms for 10 different runs on Iris data.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     96.370352     96.373654     96.387564     31.35
ACO      97.100777     97.171546     97.808466     33.72
GAK      113.986503    125.197025    139.778272    105.53
TS       97.365977     97.868008     98.569485     72.86
SA       97.100777     97.134625     97.263845     95.92
Table 7
Result obtained by the five algorithms for 10 different runs on Crude oil data.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     250.983245    251.243564    252.028164    14.43
ACO      253.564637    254.180897    256.645938    14.98
GAK      278.965152    279.907028    283.674535    35.26
TS       254.645375    255.422953    258.533264    26.55
SA       253.763548    254.653207    258.211847    24.74
The comparison of results for each data set is based on the best solution found in 10 distinct runs of each algorithm and the processing time taken to attain the best solution. The solution quality is also given in terms of the average and worst values of the clustering metric (F_average and F_worst, respectively) after 10 different runs for each of the five algorithms, where F is the clustering performance function of Eq. (3). Tables 1–9 show these results.

For Data 1 (Table 1), the HGOP clustering algorithm provides the optimal value of 3.120125 in 90% of the total runs, which is better than the other clustering algorithms. The ACO clustering algorithm found a value of 3.142375 in 90% of runs, and GAK, TS and SA found values of 3.273426, 3.244326 and 3.217832 in 80% of runs. HGOP required the least processing time (1.81 s). For Data 2 (Table 2), the HGOP clustering algorithm attains the best value of 51.493674 in 90% of the total runs, whereas the ACO, GAK, TS and SA algorithms attain 52.082746, 56.142562, 54.752946 and 53.562492 in 80% of the total runs. The execution time taken by the HGOP algorithm (8.23 s) is less than that of the other algorithms. Similarly, for Data 3 (Table 3) and Data 4 (Table 4), the HGOP clustering algorithm attains the best values of 962.342786 and 1246.135426 in 90% and 100% of the total runs, respectively; the best values provided by ACO, TS and SA were obtained in 80% of the total runs, and the best value provided by GAK in 40% of runs. In terms of processing time, HGOP performed better than the other clustering algorithms, as can be observed from Tables 3 and 4. For the Vowel data (Table 5), the HGOP clustering algorithm attains the best value of 148718.363754 in 90% of runs; ACO, TS and SA provided their best values in 80% of runs, and the GAK algorithm attains its best value in only 50% of the total runs. In addition, the HGOP clustering algorithm performed better than the other algorithms in terms of the processing time required (69.85 s). For the Iris data set, the results given in Table 6 show that HGOP provides the optimum value of 96.370352. HGOP and ACO were able to find their optima nine times, compared with five times for SA. HGOP required the least processing time (31.35 s). For the Crude Oil data set (Table 7), the HGOP clustering algorithm attains the best value of 250.983245 in 90% of the total runs, while ACO, GAK, TS and SA attain best values of 253.564637, 278.965152, 254.645375 and 253.763548 in 80% of the total runs. The processing time required by HGOP (14.43 s) is less than that of the other algorithms. The results obtained for the Wine data set are given in Table 8: HGOP finds the optimum solution of 16228.645326, while the ACO, SA and GAK methods provide 16530.533807. The HGOP, ACO, SA and GAK methods found their optimum solutions in all of their 10 runs. The execution time taken by the HGOP algorithm is less than that of the other algorithms.
Table 8
Result obtained by the five algorithms for 10 different runs on Wine data.

Method   F_best          F_average       F_worst         CPU time (s)
HGOP     16228.645326    16228.645326    16228.645326    56.37
ACO      16530.533807    16530.533807    16530.533807    68.29
GAK      16530.533807    16530.533807    16530.533807    226.68
TS       16666.226987    16785.459275    16837.535670    161.45
SA       16530.533807    16530.533807    16530.533807    57.28
Table 9
Result obtained by the five algorithms for 10 different runs on Thyroid data.

Method   F_best          F_average       F_worst         CPU time (s)
HGOP     10109.874563    10111.132455    10113.657348    94.34
ACO      10111.827759    10112.126903    10114.819200    102.15
GAK      10116.294861    10128.823145    10148.389608    153.24
TS       10249.729170    10354.315021    10438.780449    114.01
SA       10111.827759    10114.045265    10115.934358    108.22
Table 10
Values of the parameters of each of the five algorithms.

HGOP: Max Num_Main_Iter = 15; positive small number γ = 0.01; positive small number ε = 0.0005; number of iterations T* = 40; step length of pattern search Δ_0 = 0.5; number of initial points = 20; number of pattern iterations = 20; Max_Num_Iter_Dynamical_System = 40; radius of tabu region q_i = 0.2C; Tabu List size = 20.
ACO: ants (R) = 50; probability threshold for maximum trail (q0) = 0.98; local search probability (pls) = 0.01; evaporation rate (ρ) = 0.01; maximum number of iterations (itermax) = 1000.
GAK: population size = 50; crossover rate = 0.8; mutation rate = 0.001; maximum number of iterations = 1000.
TS: tabu list size = 25; number of trial solutions = 40; probability threshold = 0.98; maximum number of iterations = 1000.
SA: probability threshold = 0.98; initial temperature = 5; temperature multiplier = 0.98; final temperature = 0.01; number of iterations to detect steady state = 100; maximum number of iterations = 30,000.
For the human thyroid disease data set, the HGOP algorithm provides the optimum solution of 10109.874563 with a success rate of 90% over 10 runs. In terms of processing time, HGOP performed better than the other clustering algorithms, as can be observed from Table 9. Shelokar et al. (2004) performed several simulations to find the algorithmic parameters that give the best performance of the ACO, GAK, SA and TS algorithms in terms of the quality of solution found, the number of function evaluations and the processing time required; in this study we used their algorithmic parameters. In addition, we performed several simulations to find the algorithmic parameters for the HGOP algorithm. The parameters of all algorithms are listed in Table 10. The results show that the proposed HGOP approach can be considered a viable and efficient heuristic for finding optimal or near-optimal solutions to clustering problems of allocating N objects to K clusters. As mentioned earlier, the final solution of the K-means algorithm is sensitive to the initial state. In the proposed HGOP algorithm, the initial solutions and individual solutions are not critical; the exchange of information among different individual solutions lets the proposed algorithm find the global solution and thus overcome this K-means shortcoming.
5. Conclusion

In summary, in this paper the hybrid HGOP algorithm is used to solve clustering problems. The HGOP algorithm uses the notion of a relationship between variables that describes the influence of changes in the variables on each other. The HGOP algorithm takes into account some relatively worse points for further consideration, as other methods such as Simulated Annealing, Genetic Algorithms and Taboo Search do. The HGOP algorithm attempts to jump over local minimum points and tries to find deeper points. In this paper the global optimization algorithm is combined with Tabu search to avoid revisiting already visited regions; this hybridization makes the algorithm faster. The HGOP algorithm for data clustering can be applied when the number of clusters is known a priori and the clusters are crisp in nature. To evaluate the performance of the HGOP algorithm, it is compared with other stochastic algorithms, viz. ant colony optimization, the genetic algorithm, simulated annealing and Tabu search. The algorithm was implemented and tested on several simulated and real datasets; preliminary computational experience is very encouraging in terms of the quality of solution found and the processing time required.
References

Blake, C. L., & Merz, C. J. UCI repository of machine learning databases. Available from: .
Brucker, P. (1978). On the complexity of clustering problems, optimization and operation research. Lecture Notes in Economics and Mathematical Systems, 157, 45–54.
Cvijovic, D., & Klinovski, J. (2002). Taboo search: An approach to the multiple-minima problem for continuous functions. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Forgy, E. W. (1965). Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics, 21(3), 768–769.
Garey, M. R., Johnson, D. S., & Witsenhausen, H. S. (1982). The complexity of the generalized Lloyd–Max problem. IEEE Transactions on Information Theory, 28(2), 255–256.
Glover, F., & Laguna, M. (1997). Taboo search. Kluwer Academic Publishers.
Gungor, Z., & Unler, A. (2006). K-harmonic means data clustering with simulated annealing heuristic. Applied Mathematics and Computation.
Hart, W. E. (2001). A convergence analysis of unconstrained and bound constrained evolutionary pattern search. Evolutionary Computation, 9(1), 1–23.
Johnson, R. A., & Wichern, D. W. (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice-Hall.
Krishna, K., & Murty (1999). Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 29, 433–439.
Kuo, R. I., Wang, H. S., Hu, Tung-Lai, & Chou, S. H. (2005). Application of ant K-means on clustering analysis. Computers and Mathematics with Applications, 50, 1709–1724.
Mammadov, M. A. (2004). A new global optimization algorithm based on a dynamical systems approach. In Proceedings of the 6th international conference on optimization: Techniques and applications. Ballarat, Australia.
Mammadov, M. A., Rubinov, A. M., & Yearwood, J. (2005). Dynamical systems described by relational elasticities with applications to global optimization. In V. Jeyakumar & A. Rubinov (Eds.), Continuous optimisation: Current trends and applications (pp. 365–387). Springer.
Migdalas, A., Pardalos, P., & Varbrand, P. (2001). From local to global optimization. Nonconvex Optimization and Its Applications (Vol. 53). Kluwer Academic Publishers.
Mualik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33, 1455–1465.
Pal, S. K., & Majumder, D. D. (1977). Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC-7, 625–629.
Pronzato, L., Wynn, H., & Zhigljausky, A. A. (2002). An introduction to dynamical search. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Selim, S. Z., & Al-Sultan, K. (1991). A simulated annealing algorithm for the clustering problem. Pattern Recognition, 24(10), 1003–1008.
Selim, S. Z., & Ismail, M. A. (1984). K-means type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 81–87.
Shelokar, P. S., Jayaraman, V. K., & Kulkarni, B. D. (2004). An ant colony approach for clustering. Analytica Chimica Acta, 509, 187–195.
Smith, J. (2002). Genetic algorithms. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Spath, H. (1989). Clustering analysis algorithms. Chichester, UK: Ellis Horwood.
Sung, C. S., & Jin, H. W. (2000). A tabu-search-based heuristic for clustering. Pattern Recognition, 33, 849–858.
Ward, J. W. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.