Expert Systems with Applications 37 (2010) 5645–5652
A new clustering algorithm based on hybrid global optimization based on a dynamical systems approach algorithm

Ali Maroosi, Babak Amiri

Iran University of Science and Technology, Tehran, Iran
Keywords: Clustering; K-means; Dynamical systems; Tabu search

Abstract
Many methods for local optimization are based on the notion of a direction of local descent at a given point, and a local improvement of the point in hand can be made along this direction. As a rule, modern methods for global optimization do not use directions of global descent for a global improvement of the point in hand. From this point of view, the global optimization algorithm based on a dynamical systems approach (GOP) is an unusual method. Its structure is similar to that used in local optimization: a new iterate is obtained as an improvement of the previous one along a certain direction. In contrast with local methods, this direction is a direction of global descent, and for greater diversification it is combined with Tabu search. The resulting algorithm is called hybrid GOP (HGOP). Cluster analysis is one of the attractive data mining techniques used in many fields. One popular class of data clustering algorithms is the center-based clustering algorithms. K-means is a popular clustering method because of its simplicity and high speed on large datasets. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. Many studies have been carried out to overcome the local optima problem in clustering. In this paper, we propose the application of the hybrid global optimization algorithm based on a dynamical systems approach to clustering. We compare HGOP with other clustering algorithms, such as GAK, SA, TS, and ACO, by implementing them on several simulated and real datasets. Our findings show that the proposed algorithm performs better than the others.
© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Clustering, also called set partitioning, is a basic and widely applied methodology. Application fields include statistics, mathematical programming (location selection, network partitioning, routing, scheduling and assignment problems, etc.) and computer science (pattern recognition, learning theory, image processing, computer graphics, etc.). Clustering groups all objects into several mutually exclusive clusters in order to maximize or minimize an objective function. Because of the combinatorial character of the method, clustering rapidly becomes computationally intractable as the problem scale increases. Brucker (1978) and Ward (1963) proved that, for specific objective functions, clustering becomes an NP-hard problem when the number of clusters exceeds 3. Many methods are applied in clustering analysis, such as hierarchical clustering, partition-based clustering, density-based clustering, and artificial intelligence-based clustering.
One popular class of data clustering algorithms is the center-based clustering algorithms. K-means is a popular clustering method because of its simplicity and high speed on large datasets (Forgy, 1965). However, K-means has two shortcomings: dependency on the initial state and convergence to local optima (Selim & Ismail, 1984), and global solutions of large problems cannot be found with a reasonable amount of computational effort (Spath, 1989). Many studies have therefore tried to overcome the local optima problem in clustering. Mualik and Bandyopadhyay (2000) proposed a genetic algorithm based method to solve the clustering problem and experimented on synthetic and real-life datasets to evaluate its performance; the results showed that the GA-based method may improve the final output of K-means. Krishna and Murty (1999) proposed a novel approach called the genetic K-means algorithm (GKA) for clustering analysis. It defines a basic mutation operator specific to clustering, called distance-based mutation, and using finite Markov chain theory they proved that GKA converges to the best-known optimum. Selim and Al-Sultan (1991) discussed the solution of the clustering problem usually solved by the K-means algorithm. This problem is known to have local minimum solutions, which are
usually what the K-means algorithm obtains. They described the simulated annealing approach for solving optimization problems and proposed it for solving the clustering problem; the parameters of the algorithm were discussed in detail, and it was shown that the algorithm converges to a global solution of the clustering problem. Sung and Jin (2000) considered a clustering problem in which a given data set is partitioned into a certain number of natural and homogeneous subsets, such that each subset is composed of elements similar to one another but different from those of any other subset. For this clustering problem they developed a heuristic algorithm that combines the Tabu search heuristic with two complementary functional procedures, called packing and releasing procedures. The algorithm was numerically tested for its effectiveness in comparison with reference works, including the Tabu search algorithm, the K-means algorithm and the simulated annealing algorithm. Over the last decade, modeling the behavior of social insects, such as ants and bees, for the purpose of search and problem solving has been the context of the emerging area of swarm intelligence. Ant colony optimization is a typical, successful swarm-based optimization approach, where the search algorithm is inspired by the behavior of real ants. Kuo, Wang, Hu, and Chou (2005) proposed a novel clustering method, the ant K-means (AK) algorithm. AK modifies K-means by locating the objects in a cluster with a probability that is updated by the pheromone, while the rule for updating the pheromone is based on the total within-cluster variance (TWCV). Shelokar, Jayaraman, and Kulkarni (2004) presented an ant colony optimization methodology for optimally clustering N objects into K clusters. The algorithm employs distributed agents that mimic the way real ants find the shortest path from their nest to a food source and back. They compared their results with other clustering algorithms (GA, Tabu search, SA) and showed that their algorithm performs better in terms of solution quality and time. This paper presents the application of the HGOP algorithm to clustering. The paper is organized as follows: Section 2 discusses the cluster analysis problem; Section 3 introduces the HGOP philosophy and its application to clustering; Section 4 presents experimental results of the proposed clustering algorithm in comparison with other clustering algorithms.
2. Clustering

Data clustering, which is an NP-complete problem of finding groups in heterogeneous data by minimizing some measure of dissimilarity, is one of the fundamental tools in data mining, machine learning and pattern classification (Garey, Johnson, & Witsenhausen, 1982). Clustering in the N-dimensional Euclidean space R^N is the process of partitioning a given set of n points into a number, say k, of groups (clusters) based on some similarity (distance) metric. The metric used here is the Euclidean distance, derived from the Minkowski metric (Eqs. (1) and (2)):

d(x, y) = (∑_{i=1}^{m} |x_i − y_i|^r)^{1/r}    (1)

d(x, y) = √(∑_{i=1}^{m} (x_i − y_i)^2)    (2)

Let the set of n points {X_1, X_2, ..., X_n} be represented by the set S and the k clusters be represented by C_1, C_2, ..., C_K. Then:

C_i ≠ ∅ for i = 1, ..., k;    C_i ∩ C_j = ∅ for i, j = 1, ..., k and i ≠ j;    ∪_{i=1}^{K} C_i = S.

In this study we also use the Euclidean metric as the distance metric. Existing clustering algorithms can be broadly classified into two categories: hierarchical clustering and partitional clustering. The most popular class of partitional clustering methods is the center-based clustering algorithms (Gungor & Unler, 2006). The K-means algorithm is one of the most widely used center-based clustering algorithms (Forgy, 1965). To find the K centers, the problem is defined as the minimization of a performance function f(X, Z) defined on both the data items and the center locations. A popular performance function for measuring the goodness of a k-clustering is the total within-cluster variance, or total mean-square quantization error (MSE), Eq. (3) (Gungor & Unler, 2006):

f(X, Z) = ∑_{i=1}^{N} min{ ‖X_i − Z_l‖^2 : l = 1, ..., K }    (3)
The steps of the K-means algorithm are as follows (Mualik & Bandyopadhyay, 2000):

Step 1: Choose K cluster centers Z_1, Z_2, ..., Z_K randomly from the n points {X_1, X_2, ..., X_n}.
Step 2: Assign point X_i, i = 1, 2, ..., n, to cluster C_j, j ∈ {1, 2, ..., K}, if ‖X_i − Z_j‖ < ‖X_i − Z_p‖ for p = 1, 2, ..., K and j ≠ p.
Step 3: Compute new cluster centers Z_1, Z_2, ..., Z_K as

Z_i = (1/n_i) ∑_{X_j ∈ C_i} X_j,    i = 1, 2, ..., K,

where n_i is the number of elements belonging to cluster C_i.
Step 4: If the termination criteria are satisfied, stop; otherwise continue from Step 2. Note that if the process does not terminate normally at Step 4, it is executed for a maximum fixed number of iterations.
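To make the procedure concrete, the following is a minimal sketch of the K-means steps above with the MSE objective of Eq. (3). It is illustrative only: the paper's experiments were coded in Matlab 7.1, so the Python function and names here are assumptions, not the authors' implementation.

import numpy as np

def kmeans(X, K, max_iter=100, seed=None):
    """Minimal K-means; X is an (n, d) array, K is the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: choose K centers randomly from the n data points.
    Z = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: recompute each center as the mean of the points assigned to it.
        new_Z = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else Z[k]
                          for k in range(K)])
        # Step 4: stop when the centers no longer move.
        if np.allclose(new_Z, Z):
            break
        Z = new_Z
    # Total within-cluster variance (MSE), Eq. (3).
    mse = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2).min(axis=1).sum()
    return Z, labels, mse

In the HGOP formulation that follows, a candidate solution U concatenates all K centers into one vector, so an objective of exactly this MSE form plays the role of f(U).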
Global optimization algorithm (GOP)

The steps of the global optimization algorithm are as follows:
1. Select some initial random points, that is, define a set A = {U(t) = (u_1(t), ..., u_n(t)), t = 1, ..., T}; the set A is selected uniformly from the box U ∈ R^{Kd}, a_i ≤ u_i ≤ b_i, i = 1, ..., Kd.
2. Calculate the performance value at each point of A, choose the point U* ∈ A that provides the best performance value, and set U_T = U*.
3. Find a good point U_{T+1} from U*(U_T) and add it to the set A. For each point U ∈ A and each coordinate i, calculate the degree of change of the objective function value f(U) when u_i changes; here a change means either a decrease or an increase of a scalar variable.
4. With U_T = U* = (u_1*, ..., u_n*), calculate F(u_i ↑) and F(u_i ↓).
5. Using these degrees, calculate the forces F(T) = (F_1(T), ..., F_{Kd}(T)) acting on the increase of the objective function value f at the point U*(U_T): F(T) = F(u_i ↑) − F(u_i ↓).
6. Calculate U_{T+1} from U_T by U_{T+1} = U_T + αF(T).
7. In the same manner choose a new point U_{T+2}, and so on.
8. The process terminates either when F(T) = 0 or after a maximum number of iterations; the best solution is kept, and the whole procedure is repeated in another stage starting from other initial random points.
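The sketch below renders this iteration scheme in Python. The force estimation (detailed in Section 3) is abstracted behind a force(A, U_T) argument; the step size alpha, the number of initial points, the projection onto the box, and all names are illustrative assumptions rather than the authors' code.

import numpy as np

def gop(f, bounds, force, n_init=20, alpha=0.1, max_iter=40):
    """Sketch of the GOP loop: improve f over the box given by bounds (an (n, 2) array).
    force(A, U_T) must return the direction F(T) estimated from the set of points A."""
    rng = np.random.default_rng()
    lo, hi = bounds[:, 0], bounds[:, 1]
    # Step 1: initial set A drawn uniformly from the box a_i <= u_i <= b_i.
    A = [lo + rng.random(lo.size) * (hi - lo) for _ in range(n_init)]
    U_T = min(A, key=f)                    # Step 2: best point of A is the current iterate
    for _ in range(max_iter):
        F = force(A, U_T)                  # Steps 3-5: direction estimated from the population
        if np.allclose(F, 0.0):            # Step 8: stop when the force vanishes
            break
        U_next = np.clip(U_T + alpha * F, lo, hi)   # Step 6: step along F(T), kept inside the box
        A.append(U_next)                   # Step 7: enlarge A and continue from the best point
        U_T = min(A, key=f)
    return U_T, f(U_T)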
3. Application of GOP in clustering

3.1. Philosophy behind the development of the GOP

There are many different methods and algorithms developed for global optimization problems (Migdalas, Pardalos, & Varbrand, 2001). The GOP algorithm takes into account some relatively "worse" points for further consideration. This is what many other methods do, such as Simulated Annealing, Genetic Algorithms (Smith, 2002) and Taboo Search (Glover & Laguna, 1997; Cvijovic & Klinovski, 2002). The choice of a decent (good) direction
is the main part of each algorithm. Instead of using a stochastic search (as in the algorithms mentioned), GOP uses the formulation proposed in Mammadov (2004) and Mammadov, Rubinov, and Yearwood (2005). Note that the GOP algorithm has quite different settings and motivations compared to the methods that use so-called dynamical search (Pronzato, Wynn, & Zhigljausky, 2002). This method has some ideas in common with heuristic methods that attempt to estimate the overall convexity characteristics of the objective function. The advantage of this approach is that it does not use any approximate underestimations, including convex underestimations.
Fig. 1. Flowchart of the HGOP clustering algorithm.
3.2. Application of the hybrid GOP algorithm to clustering

The search capability of the HGOP algorithm is used in this article for appropriately determining a fixed number K of cluster centers in R^N, thereby suitably clustering the set of n unlabelled points. The clustering metric adopted is the sum of the Euclidean distances of the points from their respective cluster centers. The steps of the proposed algorithm are shown in Fig. 1 and described in detail in this section; the pseudo code for the HGOP clustering algorithm is shown in Fig. 2. The steps of the application of HGOP to clustering are as follows:

Step 0. Set f(U_Total_Best) = +∞, Num_Main_Iter = 0 and TabuList (TL) = ∅.
Step 1. Select some initial random points, that is, define a set A = {U(t) = (U_1(t), ..., U_n(t)), t = 1, ..., T}; the set A is uniformly selected from the box U ∈ R^{Kd}, a_i ≤ u_i ≤ b_i, i = 1, ..., Kd, and does not belong to the Tabu Region (visited region). Here a_i and b_i are the minimum and maximum bounds of u_i, set from the minimum and maximum of the data to be clustered, K is the number of clusters, d is the number of dimensions of the points, and the Tabu Region (TR) is determined by the Tabu List.
Step 2. If TL = {V_1, V_2, ..., V_M}, then TR = {U : |U − V_i| < q_i, i ∈ {1, 2, ..., M}}, where q_i is the radius of a spherical area with center V_i that is a percentage of C (C = max(b_i − a_i), i = 1, ..., Kd). The i-th point is represented as a vector of decision variable values U(i) = (u_i^1, u_i^2, ..., u_i^{Kd}), a candidate solution with K cluster centers. For example, U(1) = (2, 5, 1, 6, 3, 2, 5, 7, 4) represents a solution with three cluster centers Z_1 = (2, 5, 1), Z_2 = (6, 3, 2) and Z_3 = (5, 7, 4), each with three dimensions.
Step 3. Compute the performance value f(i) for each point U(i). Note that, to prevent U_1 = (Z_1, Z_2, Z_3) and U_1 = (Z_2, Z_1, Z_3) from being treated as different solutions, the centers Z_i are reordered (swapped) after the initial random points are selected.
Step 4. Calculate the performance value at each point of A, choose U* ∈ A that provides the best performance value, and set U_T = U*.
Step 5. Find a good point U_{T+1} from U*(U_T), add it to the set A, and update the Tabu List as follows:
Step 5-1. For each point U ∈ A and each coordinate i, calculate the degree of change of f(U) when u_i changes; here a change means either a decrease or an increase of the scalar variable. Let U_T = U* = (u_1*, ..., u_n*) be the point for which the force F(T) is calculated, and let U = (u_1, ..., u_n) denote the points of A. This step relies on a dynamical system described by the non-functional relationship between two variables, based on the notion of a fuzzy derivative. The fuzzy derivative ∂u_i/∂u_j is defined as the influence of feature j on feature i:

∂u_i/∂u_j = μ(u_i, u_j),    i, j ∈ {1, ..., n},

where

μ(u_i, u_j) = (μ_1(u_i, u_j), μ_2(u_i, u_j), μ_3(u_i, u_j), μ_4(u_i, u_j)),    i ≠ j,

and

μ_1(u_i, u_j) = d(u_i ↑, u_j ↑),  μ_2(u_i, u_j) = d(u_i ↑, u_j ↓),  μ_3(u_i, u_j) = d(u_i ↓, u_j ↓),  μ_4(u_i, u_j) = d(u_i ↓, u_j ↑).

Here μ_1(u_i, u_j) expresses the degree of increase of entry i if entry j increases, starting from the initial state (u_i*, u_j*); μ_2, μ_3 and μ_4 are interpreted analogously. These degrees are estimated as

μ_1 = M_{11}/M_1,  μ_2 = M_{12}/M_1,  μ_3 = M_{13}/M_2,  μ_4 = M_{14}/M_2,

where M_1 is the number of points (u_i, u_j) satisfying u_i > u_i*, M_{11} is the number of points satisfying u_i > u_i* and u_j > u_j*, M_{12} is the number of points satisfying u_i > u_i* and u_j < u_j*, M_2 is the number of points satisfying u_i < u_i*, M_{13} is the number of points satisfying u_i < u_i* and u_j < u_j*, and M_{14} is the number of points satisfying u_i < u_i* and u_j > u_j*. If μ(u_i, u_j) = (g_1, g_2, g_3, g_4) and μ(u_j, u_i) = (η_1, η_2, η_3, η_4), then F(u_j → u_i ↑) = g_1 η_1 + g_4 η_2 and F(u_j → u_i ↓) = g_3 η_3 + g_2 η_4. The resulting force on entry i is the sum of all these forces:

F(u_i ↑) = ∑_{j ≠ i} F(u_j → u_i ↑),    F(u_i ↓) = ∑_{j ≠ i} F(u_j → u_i ↓).

Step 5-2. Using these degrees, calculate the forces F(T) = (F_1(T), ..., F_{Kd}(T)) acting to increase f at the point U*(U_T): F(T) = F(u_i ↑) − F(u_i ↓).
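The counting rules for μ_1 to μ_4 and the resulting forces translate directly into code. The sketch below estimates F(u_i ↑), F(u_i ↓) and F(T) from a population A (one point per row) relative to the current point U*; it follows the formulas of Steps 5-1 and 5-2, with illustrative names and no claim to reproduce the authors' Matlab implementation.

import numpy as np

def mu(A, U_star, i, j):
    """Fuzzy derivative mu(u_i, u_j) = (mu1, mu2, mu3, mu4) estimated from the set A."""
    up_i, down_i = A[:, i] > U_star[i], A[:, i] < U_star[i]
    up_j, down_j = A[:, j] > U_star[j], A[:, j] < U_star[j]
    M1, M2 = up_i.sum(), down_i.sum()                  # points with u_i above / below u_i*
    mu1 = (up_i & up_j).sum() / M1 if M1 else 0.0      # M11 / M1
    mu2 = (up_i & down_j).sum() / M1 if M1 else 0.0    # M12 / M1
    mu3 = (down_i & down_j).sum() / M2 if M2 else 0.0  # M13 / M2
    mu4 = (down_i & up_j).sum() / M2 if M2 else 0.0    # M14 / M2
    return mu1, mu2, mu3, mu4

def forces(A, U_star):
    """F(T): for every coordinate i, F_i(T) = F(u_i up) - F(u_i down) (Steps 5-1 and 5-2)."""
    A = np.asarray(A)
    n = A.shape[1]
    F = np.zeros(n)
    for i in range(n):
        up = down = 0.0
        for j in range(n):
            if j == i:
                continue
            g1, g2, g3, g4 = mu(A, U_star, i, j)
            e1, e2, e3, e4 = mu(A, U_star, j, i)
            up += g1 * e1 + g4 * e2      # F(u_j -> u_i up)
            down += g3 * e3 + g2 * e4    # F(u_j -> u_i down)
        F[i] = up - down
    return F

A function of this shape could serve as the force argument of the GOP loop sketched in Section 2.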
Fig. 2. Pseudo code for HGOP clustering algorithm.
Step 5-3. Calculate Û_{T+1} from U_T:

    set α = 1 and U_Dynamical_System = U_T
    while α > ε and Iteration < Max_Num_Iter_Dynamical_System
        if the step improves the objective, i.e. f(U_Dynamical_System + αF(T)) < f(U_Dynamical_System)
            U_Dynamical_System = U_Dynamical_System + αF(T)
        else
            α = α/2
        Iteration = Iteration + 1
    Û_{T+1} = U_Dynamical_System

In the above, ε is a small positive number.
Step 5-4. Refinement of the new point: using a local optimization procedure starting at Û_{T+1}, find a local minimum U_{T+1}. A direct search method (pattern search, Hart (2001)) is used at this stage; a code sketch of this refinement is given after the step list.
Step 5-4-1. Set u^0 = Û_{T+1} and Δ_0 = Δ^0.
Step 5-4-2. For p = 0, 1, ..., number of pattern iterations, we have an iterate u^p and a step-length parameter Δ_p > 0. Let e_i, i = 1, ..., n, denote the standard unit basis vectors.
Step 5-4-3. Examine the points u_pattern = u^p ± Δ_p e_i, i = 1, ..., n, to find a point u_pattern for which f(u_pattern) < f(u^p). If no such u_pattern is found, reduce Δ_p by half and continue; otherwise leave the step-length parameter alone, setting Δ_{p+1} = Δ_p and u^{p+1} = u_pattern. In the latter case the step-length parameter can also be increased, say by a factor of 2, if a longer step seems justified.
Step 5-4-4. Repeat the iteration just described until Δ_p is deemed sufficiently small.
Step 5-4-5. At the end, set U_{T+1} = u^{number of pattern iterations}.
Step 5-5. Construct a new population A(t + 1) = A(t) ∪ {U_{T+1}}.
Step 5-6. Update the Tabu List with U_{T+1}; if the Tabu List is full, delete the point with the worst ranking. Tabu List entries are ranked and saved according to their objective function values and the sum of the Euclidean distances of each point from the other points in the list:
Step 5-6-1. Save the ranking of the Tabu List according to the objective function values and call it the Objective Function Rank (OFR).
Step 5-6-2. Save the ranking of the Tabu List according to the maximum sum of Euclidean distances from the other Tabu List points and call it the Distance Rank (DR).
Step 5-6-3. The ranking of each point in TL is the best of its rankings in DR and OFR.
Step 6. If either F(T) < γ or T > T*, go to Step 7; otherwise return to Step 4. Here γ > 0 and T* is a positive integer.
Step 7. If f(U_Total_Best) > f(U_{T+1}), then U_Total_Best = U_{T+1}; otherwise U_Total_Best is left unchanged. If Num_Main_Iter is less than Max_Num_Main_Iter, then set Num_Main_Iter = Num_Main_Iter + 1 and return to Step 1.
Step 8. U_Total_Best is the final solution and gives the cluster centers.
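As announced at Step 5-4, the following is a compact sketch of the pattern-search refinement, assuming the halving rule described above and a fixed iteration budget; the function name and default values are illustrative, not the authors' settings.

import numpy as np

def pattern_search(f, u0, delta0=0.5, n_iter=20, min_delta=1e-6):
    """Direct (pattern) search: probe u +/- delta * e_i and keep any improving point."""
    u, delta = np.asarray(u0, dtype=float).copy(), float(delta0)
    n = u.size
    for _ in range(n_iter):
        improved = False
        for i in range(n):
            for sign in (+1.0, -1.0):
                trial = u.copy()
                trial[i] += sign * delta      # pattern point u^p +/- Delta_p * e_i
                if f(trial) < f(u):           # accept the first improving pattern point
                    u, improved = trial, True
                    break
            if improved:
                break
        if not improved:
            delta *= 0.5                      # no improvement: halve the step length
            if delta < min_delta:             # stop when the step is sufficiently small
                break
    return u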
4. Experimental results

The experimental results comparing the HGOP clustering algorithm with several typical algorithms, including the ACO algorithm (Shelokar et al., 2004), the simulated annealing approach (Selim & Al-Sultan, 1991), the genetic K-means algorithm (Krishna & Murty, 1999), and the Tabu search approach (Sung & Jin, 2000), are provided for four artificial data sets (Data 1, Data 2, Data 3 and Data 4) and five real-life data sets (Vowel, Iris, Crude Oil, Wine and Thyroid disease data), respectively. These are first described below. The effectiveness of the algorithms depends greatly on the generation of initial solutions; therefore, for every data set each algorithm was run 10 times, each time with randomly generated initial solutions. We performed our experiments on a Pentium IV, 2.8 GHz, 512 MB RAM computer, and the algorithms were coded in Matlab 7.1. We ran all five algorithms on all data sets.

4.1. Artificial data sets

Data 1: This is a nonoverlapping two-dimensional data set where the number of clusters is two. It has 10 points. The value of K is chosen to be 2 for this data set.
Data 2: This is a nonoverlapping two-dimensional data set where the number of clusters is three. It has 76 points. The value of K is chosen to be 3 for this data set.
Data 3: This is an overlapping two-dimensional triangular distribution of data points having nine classes, where all the classes are assumed to have equal a priori probabilities (= 1/9). It has 900 data points. The X-Y ranges for the nine classes are as follows:

Class 1: [-3.3, -0.7] × [0.7, 3.3]
Class 2: [-1.3, 1.3] × [0.7, 3.3]
Class 3: [0.7, 3.3] × [0.7, 3.3]
Class 4: [-3.3, -0.7] × [-1.3, 1.3]
Class 5: [-1.3, 1.3] × [-1.3, 1.3]
Class 6: [0.7, 3.3] × [-1.3, 1.3]
Class 7: [-3.3, -0.7] × [-3.3, -0.7]
Class 8: [-1.3, 1.3] × [-3.3, -0.7]
Class 9: [0.7, 3.3] × [-3.3, -0.7]

Thus the domain of the triangular distribution for each class and each axis is 2.6. Consequently, the height will be 1/1.3 (since 1/2 × 2.6 × height = 1). The value of K is chosen to be 9 for this data set.
Data 4: This is an overlapping ten-dimensional data set generated using a triangular distribution of the form shown in Fig. 3 for two classes, 1 and 2. It has 1000 data points. The value of K is chosen to be 2 for this data set. The range for class 1 is [0, 2] × [0, 2] × ... × [0, 2] (10 times), and that for class 2 is [1, 3] × [0, 2] × ... × [0, 2] (9 times), with the corresponding peaks at (1, 1) and (2, 1).

Fig. 3. Triangular distribution along the X-axis.

The distribution along the first axis (X) for class 1 may be formally quantified as

f_1(x) =
    0        for x ≤ 0
    x        for 0 < x ≤ 1
    2 − x    for 1 < x ≤ 2
    0        for x > 2.

Similarly, for class 2,
f_2(x) =
    0        for x ≤ 1
    x − 1    for 1 < x ≤ 2
    3 − x    for 2 < x ≤ 3
    0        for x > 3.

The distribution along the other nine axes (Y_i, i = 1, 2, ..., 9) for both classes is

f(y_i) =
    0          for y_i ≤ 0
    y_i        for 0 < y_i ≤ 1
    2 − y_i    for 1 < y_i ≤ 2
    0          for y_i > 2.
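Since f_1, f_2 and f(y_i) are standard triangular densities, an artificial data set of the Data 4 type can be reproduced, for example, with numpy's triangular sampler. The split of 500 points per class and the seed below are illustrative assumptions; the paper does not specify its exact generator.

import numpy as np

rng = np.random.default_rng(0)

# Class 1: X ~ triangular on [0, 2] peaked at 1; the nine Y_i axes share the same distribution.
class1 = rng.triangular(left=0.0, mode=1.0, right=2.0, size=(500, 10))

# Class 2: X ~ triangular on [1, 3] peaked at 2; Y_1..Y_9 distributed as for class 1.
class2 = np.column_stack([
    rng.triangular(1.0, 2.0, 3.0, size=500),
    rng.triangular(0.0, 1.0, 2.0, size=(500, 9)),
])

data4 = np.vstack([class1, class2])  # 1000 ten-dimensional points, two overlapping classes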
Table 1
Result obtained by the five algorithms for 10 different runs on Data 1.

Method   F_best      F_average   F_worst     CPU time (s)
HGOP     3.120125    3.131337    3.228137    1.81
ACO      3.142375    3.163422    3.352843    1.89
GAK      3.273426    3.355521    3.683901    2.01
TS       3.244326    3.310024    3.572814    1.92
SA       3.217832    3.282089    3.539115    1.99
Table 2
Result obtained by the five algorithms for 10 different runs on Data 2.

Method   F_best       F_average    F_worst      CPU time (s)
HGOP     51.493674    51.533427    51.687453    8.23
ACO      52.082746    52.212071    52.729373    8.98
GAK      56.142562    56.377520    57.317354    17.24
TS       54.752946    54.879342    55.384927    14.57
SA       53.562492    53.635943    53.929748    14.82
Table 3
Result obtained by the five algorithms for 10 different runs on Data 3.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     962.342786    962.578234    964.753761    25.93
ACO      964.739472    965.048327    966.283745    26.88
GAK      966.649837    966.772302    966.853946    38.52
TS       972.629478    973.209275    975.528463    32.78
SA       966.418263    966.614089    967.397392    31.24
4.2. Real-life data sets

Vowel data: This data set consists of 871 Indian Telugu vowel sounds (Pal & Majumder, 1977). These were uttered in a consonant–vowel–consonant context by three male speakers in the age group of 30–35 years. The data set has three features, F1, F2 and F3, corresponding to the first, second and third vowel formant frequencies, and six overlapping classes {d, a, i, u, e, o}. The value of K is therefore chosen to be 6 for this data.
Iris data: This is the Iris data set, perhaps the best-known database in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. The data set contains three classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other. There are 150 instances with four numeric attributes and no missing attribute values. The attributes are sepal length, sepal width, petal length and petal width, all in cm (Blake & Merz).
Crude oil data: This overlapping data set (Johnson & Wichern, 1982) has 56 data points, 5 features and 3 classes. Hence the value of K is chosen to be 3 for this data set.
Wine data: This is the wine data set, also taken from the MCI laboratory. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. There are 178 instances with 13 numeric, continuous attributes and no missing attribute values.
Thyroid disease data: This data set categorizes N = 215 samples of patients suffering from three human thyroid diseases (K = 3): euthyroid, hyperthyroidism and hypothyroidism, with 150 euthyroid individuals, 30 hyperthyroid patients and 35 hypothyroid patients. Each individual is characterized by the results of n = 5 laboratory tests: total serum thyroxin, total serum tri-iodothyronine, serum tri-iodothyronine resin uptake, serum thyroid-stimulating hormone (TSH), and the increase of TSH after injection of TSH-releasing hormone (Blake & Merz).

Table 4
Result obtained by the five algorithms for 10 different runs on Data 4.

Method   F_best         F_average      F_worst        CPU time (s)
HGOP     1246.135426    1246.325342    1246.374356    120.63
ACO      1248.958685    1249.034036    1249.335442    122.34
GAK      1258.673362    1520.777767    1271.635528    178.42
TS       1282.538294    1285.988483    1299.789237    142.15
SA       1249.736287    1249.968105    1250.895375    136.61

Table 5
Result obtained by the five algorithms for 10 different runs on Vowel data.

Method   F_best           F_average        F_worst          CPU time (s)
HGOP     148718.363754    148718.454321    148718.674567    69.85
ACO      148837.736634    148837.768828    148837.937878    73.65
GAK      149346.152274    149391.501798    149436.851323    98.72
TS       150635.653256    150648.079532    150697.784636    81.25
SA       149357.634587    149436.017542    149749.549362    79.46

Table 6
Result obtained by the five algorithms for 10 different runs on Iris data.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     96.370352     96.373654     96.387564     31.35
ACO      97.100777     97.171546     97.808466     33.72
GAK      113.986503    125.197025    139.778272    105.53
TS       97.365977     97.868008     98.569485     72.86
SA       97.100777     97.134625     97.263845     95.92
Table 7
Result obtained by the five algorithms for 10 different runs on Crude oil data.

Method   F_best        F_average     F_worst       CPU time (s)
HGOP     250.983245    251.243564    252.028164    14.43
ACO      253.564637    254.180897    256.645938    14.98
GAK      278.965152    279.907028    283.674535    35.26
TS       254.645375    255.422953    258.533264    26.55
SA       253.763548    254.653207    258.211847    24.74
The comparison of results for each data set is based on the best solution found in 10 distinct runs of each algorithm and the processing time taken to attain the best solution. The solution quality is also given in terms of the average and worst values of the clustering metric (F_average and F_worst, respectively) after 10 different runs for each of the five algorithms, where F is the clustering performance function of Eq. (3). Tables 1–9 show these results.

For Data 1 (Table 1), the HGOP clustering algorithm provides the optimal value of 3.120125 in 90% of the total runs, which is better than the other clustering algorithms. The ACO clustering algorithm found a value of 3.142375 in 90% of runs, and GAK, TS and SA found values of 3.273426, 3.244326 and 3.217832 in 80% of runs. HGOP required the least processing time (1.81 s). For Data 2 (Table 2), the HGOP clustering algorithm attains the best value of 51.493674 in 90% of the total runs, whereas the ACO, GAK, TS and SA algorithms attain 52.082746, 56.142562, 54.752946 and 53.562492 in 80% of the total runs. The execution time taken by the HGOP algorithm (8.23 s) is less than that of the other algorithms. Similarly, for Data 3 (Table 3) and Data 4 (Table 4), the HGOP clustering algorithm attains the best values of 962.342786 and 1246.135426 in 90% and 100% of the total runs, respectively; the best values provided by ACO, TS and SA were obtained in 80% of the total runs, and the best value provided by GAK in 40% of runs. In terms of processing time, HGOP performed better than the other clustering algorithms, as can be observed from Tables 3 and 4. For the Vowel data (Table 5), the HGOP clustering algorithm attains the best value of 148718.363754 in 90% of runs; ACO, TS and SA provided their best values in 80% of runs, and the GAK algorithm attains its best value in only 50% of the total runs. In addition, the HGOP clustering algorithm performed better than the other algorithms in terms of the processing time required (69.85 s). For the Iris data set, the results given in Table 6 show that HGOP provides the optimum value of 96.370352. HGOP and ACO were able to find their optima nine times, compared with five times for SA. HGOP required the least processing time (31.35 s). For the Crude Oil data set (Table 7), the HGOP clustering algorithm attains the best value of 250.983245 in 90% of the total runs, while ACO, GAK, TS and SA attain best values of 253.564637, 278.965152, 254.645375 and 253.763548 in 80% of the total runs. The processing time required by HGOP (14.43 s) is less than that of the other algorithms. The results obtained for the Wine data set are given in Table 8: HGOP finds the optimum solution of 16228.645326, while the ACO, SA and GAK methods provide 16530.533807. The HGOP, ACO, SA and GAK methods found their optimum solutions in all of their 10 runs. The execution time taken by the HGOP algorithm is less than that of the other algorithms.
Table 8
Result obtained by the five algorithms for 10 different runs on Wine data.

Method   F_best          F_average       F_worst         CPU time (s)
HGOP     16228.645326    16228.645326    16228.645326    56.37
ACO      16530.533807    16530.533807    16530.533807    68.29
GAK      16530.533807    16530.533807    16530.533807    226.68
TS       16666.226987    16785.459275    16837.535670    161.45
SA       16530.533807    16530.533807    16530.533807    57.28
Table 9
Result obtained by the five algorithms for 10 different runs on Thyroid data.

Method   F_best          F_average       F_worst         CPU time (s)
HGOP     10109.874563    10111.132455    10113.657348    94.34
ACO      10111.827759    10112.126903    10114.819200    102.15
GAK      10116.294861    10128.823145    10148.389608    153.24
TS       10249.729170    10354.315021    10438.780449    114.01
SA       10111.827759    10114.045265    10115.934358    108.22
Table 10
Values of the parameters of each of the five algorithms.

HGOP: Max Num_Main_Iter = 15; positive small number γ = 0.01; positive small number ε = 0.0005; number of iterations T* = 40; step length of pattern search Δ_0 = 0.5; number of initial points = 20; number of pattern iterations = 20; Max_Num_Iter_Dynamical_System = 40; radius of tabu region q_i = 0.2C; Tabu List size = 20.
ACO: ants (R) = 50; probability threshold for maximum trail (q0) = 0.98; local search probability (pls) = 0.01; evaporation rate (ρ) = 0.01; maximum number of iterations (itermax) = 1000.
GAK: population size = 50; crossover rate = 0.8; mutation rate = 0.001; maximum number of iterations = 1000.
TS: tabu list size = 25; number of trial solutions = 40; probability threshold = 0.98; maximum number of iterations = 1000.
SA: probability threshold = 0.98; initial temperature = 5; temperature multiplier = 0.98; final temperature = 0.01; number of iterations to detect steady state = 100; maximum number of iterations = 30,000.
For the human thyroid disease data set, the HGOP algorithm provides the optimum solution of 10109.874563 with a success rate of 90% over 10 runs. In terms of processing time, HGOP performed better than the other clustering algorithms, as can be observed from Table 9. Shelokar et al. (2004) performed several simulations to find the algorithmic parameters that give the best performance of the ACO, GAK, SA and TS algorithms in terms of the quality of solution found, the number of function evaluations and the processing time required; in this study we used their algorithmic parameters. In addition, we performed several simulations to find the algorithmic parameters for the HGOP algorithm. The parameters of all algorithms are listed in Table 10. The results show that the proposed HGOP approach can be considered a viable and efficient heuristic for finding optimal or near-optimal solutions to clustering problems of allocating N objects to K clusters. As mentioned earlier, the final solution of the K-means algorithm is sensitive to the initial state. In the proposed HGOP algorithm, the initial solutions and individual solutions are not critical; the exchange of information among different individual solutions lets the proposed algorithm find the global solution and thus overcome this K-means shortcoming.
5. Conclusion

In summary, in this paper the hybrid HGOP algorithm is used to solve clustering problems. The HGOP algorithm uses the notion of a relationship between variables that describes the influence of changes in the variables on each other. The HGOP algorithm takes into account some relatively worse points for further consideration, as other methods such as Simulated Annealing, Genetic Algorithms and Taboo Search do. The HGOP algorithm attempts to jump over local minimum points and tries to find deeper points. In this paper the global optimization algorithm is combined with Tabu search to avoid revisiting already visited regions; this hybridization makes the algorithm faster. The HGOP algorithm for data clustering can be applied when the number of clusters is known a priori and the clusters are crisp in nature. To evaluate the performance of the HGOP algorithm, it is compared with other stochastic algorithms, viz. ant colony optimization, the genetic algorithm, simulated annealing and Tabu search. The algorithm was implemented and tested on several simulated and real datasets; preliminary computational experience is very encouraging in terms of the quality of solution found and the processing time required.
References

Blake, C. L., & Merz, C. J. UCI repository of machine learning databases. Available from: .
Brucker, P. (1978). On the complexity of clustering problems, optimization and operation research. Lecture Notes in Economics and Mathematical Systems, 157, 45–54.
Cvijovic, D., & Klinovski, J. (2002). Taboo search: An approach to the multiple-minima problem for continuous functions. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Forgy, E. W. (1965). Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics, 21(3), 768–769.
Garey, M. R., Johnson, D. S., & Witsenhausen, H. S. (1982). The complexity of the generalized Lloyd–Max problem. IEEE Transactions on Information Theory, 28(2), 255–256.
Glover, F., & Laguna, M. (1997). Taboo search. Kluwer Academic Publishers.
Gungor, Z., & Unler, A. (2006). K-harmonic means data clustering with simulated annealing heuristic. Applied Mathematics and Computation.
Hart, W. E. (2001). A convergence analysis of unconstrained and bound constrained evolutionary pattern search. Evolutionary Computation, 9(1), 1–23.
Johnson, R. A., & Wichern, D. W. (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice-Hall.
Krishna, K., & Murty (1999). Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 29, 433–439.
Kuo, R. I., Wang, H. S., Hu, Tung-Lai, & Chou, S. H. (2005). Application of ant K-means on clustering analysis. Computers and Mathematics with Applications, 50, 1709–1724.
Mammadov, M. A. (2004). A new global optimization algorithm based on a dynamical systems approach. In Proceedings of the 6th international conference on optimization: Techniques and applications. Ballarat, Australia.
Mammadov, M. A., Rubinov, A. M., & Yearwood, J. (2005). Dynamical systems described by relational elasticities with applications to global optimization. In V. Jeyakumar & A. Rubinov (Eds.), Continuous optimisation: Current trends and applications (pp. 365–387). Springer.
Migdalas, A., Pardalos, P., & Varbrand, P. (2001). From local to global optimization. Nonconvex Optimization and Its Applications (Vol. 53). Kluwer Academic Publishers.
Mualik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33, 1455–1465.
Pal, S. K., & Majumder, D. D. (1977). Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC-7, 625–629.
Pronzato, L., Wynn, H., & Zhigljausky, A. A. (2002). An introduction to dynamical search. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Selim, S. Z., & Al-Sultan, K. (1991). A simulated annealing algorithm for the clustering problem. Pattern Recognition, 24(10), 1003–1008.
Selim, S. Z., & Ismail, M. A. (1984). K-means type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 81–87.
Shelokar, P. S., Jayaraman, V. K., & Kulkarni, B. D. (2004). An ant colony approach for clustering. Analytica Chimica Acta, 509, 187–195.
Smith, J. (2002). Genetic algorithms. In P. Pardalos & H. Romeijn (Eds.), Handbook of global optimization (Vol. 2). Kluwer Academic Publishers.
Spath, H. (1989). Clustering analysis algorithms. Chichester, UK: Ellis Horwood.
Sung, C. S., & Jin, H. W. (2000). A tabu-search-based heuristic for clustering. Pattern Recognition, 33, 849–858.
Ward, J. W. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.