Clustering using firefly algorithm: Performance study




Swarm and Evolutionary Computation 1 (2011) 164–171


Regular paper

J. Senthilnath, S.N. Omkar*, V. Mani
Department of Aerospace Engineering, Indian Institute of Science, Bangalore, India

Article history: Received 10 February 2011; Received in revised form 5 May 2011; Accepted 2 June 2011; Available online 30 June 2011.

Keywords: Clustering; Classification; Firefly algorithm

Abstract: The Firefly Algorithm (FA) is a recent nature-inspired optimization algorithm that simulates the flash pattern and characteristics of fireflies. Clustering is a popular data analysis technique used to identify homogeneous groups of objects based on the values of their attributes. In this paper, the FA is used for clustering on benchmark problems, and its performance is compared with two other nature-inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), as well as with nine other methods used in the literature. Thirteen typical benchmark data sets from the UCI machine learning repository are used to demonstrate the results of the techniques. From the results obtained, we compare the performance of the FA algorithm and conclude that the FA can be used efficiently for clustering. Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.

1. Introduction

Clustering is an important unsupervised classification technique in which a set of patterns, usually vectors in a multidimensional space, are grouped into clusters (or groups) based on some similarity metric [1-4]. Clustering is used in a variety of applications in statistical data analysis, image analysis, data mining and other fields of science and engineering. Clustering algorithms can be classified into two categories: hierarchical clustering and partitional clustering [5,6]. Hierarchical clustering constructs a hierarchy of clusters by splitting large clusters into smaller ones and merging smaller clusters into their nearest centroid [7]. There are two main approaches: (i) the divisive approach, which splits a larger cluster into two or more smaller ones; and (ii) the agglomerative approach, which builds a larger cluster by merging two or more smaller clusters. Partitional clustering [8,9], on the other hand, attempts to divide the data set into a set of disjoint clusters without a hierarchical structure. The most widely used partitional clustering algorithms are the prototype-based algorithms, where each cluster is represented by its center and the objective function (a squared-error function) is the sum of the distances from the patterns to their centers [6]. In this paper we are concerned with partitional clustering for generating cluster centers, and with using these cluster centers to classify the data set. A popular partitional clustering algorithm, k-means, is essentially a function minimization technique in which the objective function is the squared error. However, the main




drawback of the k-means algorithm is that it converges to a local minimum that depends on the starting position of the search [10]. To overcome this local-optima problem, many nature-inspired algorithms, such as the genetic algorithm [11], ant colony optimization [12], artificial immune systems [13], artificial bee colony [9] and particle swarm optimization [14], have been used. Recently, efficient hybrid evolutionary optimization algorithms, which combine evolutionary methods with k-means to overcome local optima in clustering, have also been proposed [15-17]. The Firefly Algorithm (FA) is a recent nature-inspired technique [18] that has been used for solving nonlinear optimization problems. The algorithm is based on the behavior of social insects (fireflies). In social insect colonies, each individual seems to have its own agenda, and yet the group as a whole appears to be highly organized. Algorithms based on nature have been demonstrated to be effective and efficient on difficult optimization problems. A swarm is a group of multi-agent systems, such as fireflies, in which simple agents coordinate their activities to solve complex problems, such as allocating foragers to multiple food sources in a dynamic environment. In this study, the Firefly Algorithm (FA), described by Yang [18] for numerical optimization problems, is applied to clustering. To study the performance of the FA on clustering problems, we consider the standard benchmark problems (13 typical test databases) that are available in the literature [9,14]. The performance of the FA on clustering is compared with the results of two other nature-inspired techniques, Artificial Bee Colony (ABC) [19] and Particle Swarm Optimization (PSO) [20], on the same test data sets [9,14]. The FA, ABC and PSO algorithms are in the same class of population-based, nature-inspired optimization techniques; hence we compare the performance of the FA with the ABC and PSO algorithms.



We also present the results of nine other methods used in the literature [9,14]. For ease of understanding and comparison, we follow the same manner of analysis and discussion as used in [9]; the only key difference is the use of the FA algorithm in this study.

Contribution of this paper: In this work, the FA is used to find the cluster centers of a given data set. The cluster centers are obtained from a randomly selected 75% of the data, which we call the training set. The FA algorithm uses this training set to obtain the cluster centers. To study the performance of the FA algorithm, the remaining 25% of the data (the test data set) is used. The performance measure used for the FA is the classification error percentage (CEP), defined as the ratio of the number of misclassified samples in the test data set to the total number of samples in the test data set. This can be computed because the actual class of each sample in the test data set is known. The distances between a given test sample and the cluster centers are computed, and the sample is assigned to the cluster center (class) with the minimum distance; from these assignments we compute the performance measure, the classification error percentage (CEP).

The paper is organized as follows: the implementation of the FA algorithm is given in Section 2; clustering using the FA and the performance evaluation are given in Sections 3 and 4 respectively; the results are presented and discussed in Section 5; and Section 6 concludes the paper by summarizing the observations.

2. Firefly algorithm

Fireflies are glowworms that glow through bioluminescence. For simplicity in describing our firefly algorithm, we use the following three idealized rules: (i) all fireflies are unisex, so that one firefly is attracted to other fireflies regardless of their sex; (ii) an important and interesting behavior of fireflies is to glow brighter, mainly to attract prey and to share food with others; (iii) attractiveness is proportional to brightness, so each agent first moves toward a neighbor that glows brighter [21].

The Firefly Algorithm (FA) [18] is a population-based algorithm, based on swarm intelligence and inspired by the flashing behavior of fireflies, for finding the global optima of objective functions. In the FA, physical entities (agents, or fireflies) are randomly distributed in the search space. Each agent carries a luminescence quality, called luciferin, and emits light proportional to its value. Each firefly is attracted by the brighter glow of other neighboring fireflies, and the attractiveness decreases as the distance between them increases. If no firefly is brighter than a given one, that firefly moves randomly. In the application of the FA to clustering, the decision variables are the cluster centers, and the objective function is related to the sum, over all training set instances, of Euclidean distances in an N-dimensional space, as given in [9]. Based on this objective function, all the agents (fireflies) are initially dispersed randomly across the search space. The two phases of the firefly algorithm are as follows.

i. Variation of light intensity: Light intensity is related to objective values [18], so for a maximization/minimization problem a firefly with high/low intensity will attract another firefly with high/low intensity. Assume that there exists a swarm of n agents (fireflies), where x_i represents a solution for firefly i and f(x_i) denotes its fitness value.
Here the brightness I of a firefly is chosen to reflect the fitness value f(x) at its current position x [18]:

    I_i = f(x_i),    1 <= i <= n.    (1)

ii. Movement toward an attractive firefly: A firefly's attractiveness is proportional to the light intensity seen by adjacent fireflies [18]. Each firefly has a distinctive attractiveness beta, which indicates how strongly it attracts other members of the swarm. The attractiveness beta is relative, however: it varies with the distance r_ij between two fireflies i and j at locations x_i and x_j, given as

    r_{ij} = \|x_i - x_j\|.    (2)

The attractiveness function beta(r) of the firefly is determined by

    \beta(r) = \beta_0 e^{-\gamma r^2}    (3)

where beta_0 is the attractiveness at r = 0 and gamma is the light absorption coefficient. The movement of a firefly i at location x_i attracted to a more attractive (brighter) firefly j at location x_j is determined by

    x_i(t+1) = x_i(t) + \beta_0 e^{-\gamma r^2} (x_j - x_i).    (4)

A detailed description of this FA is given in [18]. A pseudo-code of the algorithm is given below.

Pseudo-code: a high-level description of the firefly algorithm

Input: an initial population of n fireflies within the d-dimensional search space, x_ik, i = 1, 2, ..., n and k = 1, 2, ..., d; the fitness of the population f(x_ik), which is directly proportional to the light intensity I_ik; algorithm parameters beta_0, gamma
Output: obtained minimum location x_i^min

begin
    repeat
        for i = 1 to n
            for j = 1 to n
                if (I_j < I_i)
                    Move firefly i toward j in d dimensions using Eq. (4)
                end if
                Attractiveness varies with distance r via exp(-r^2)
                Evaluate new solutions and update light intensity using Eq. (1)
            end for j
        end for i
        Rank the fireflies and find the current best
    until stop condition is true
end
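To make the pseudo-code concrete, the following is a minimal Python sketch of the same loop under our own illustrative choices: the function name firefly_minimize, the bounds argument and the fixed seed are not part of the paper, and the random move of the brightest firefly is omitted for brevity.

import numpy as np

def firefly_minimize(objective, dim, n=20, beta0=1.0, gamma=1.0,
                     generations=100, bounds=(0.0, 1.0), seed=0):
    """Minimize `objective` with the basic firefly loop of Section 2.
    For minimization, a lower objective value plays the role of a
    brighter firefly, so fireflies drift toward better solutions."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n, dim))      # random initial fireflies
    f = np.array([objective(xi) for xi in x])   # light intensity, Eq. (1)

    for _ in range(generations):
        for i in range(n):
            for j in range(n):
                if f[j] < f[i]:                           # firefly j is brighter
                    r = np.linalg.norm(x[i] - x[j])       # distance, Eq. (2)
                    beta = beta0 * np.exp(-gamma * r**2)  # attractiveness, Eq. (3)
                    x[i] = x[i] + beta * (x[j] - x[i])    # movement, Eq. (4)
                    f[i] = objective(x[i])                # update light intensity
        # ranking the fireflies is implicit: f always holds current intensities
    best = f.argmin()
    return x[best], f[best]

For clustering, `objective` would be the cost of Eq. (7) below, with each firefly encoding all c cluster centers flattened into a single vector of length c*d.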

3. Clustering using FA

Clustering methods, which separate objects into groups or classes, are developed based on unsupervised learning. In the unsupervised technique, the training data are first grouped based solely on the numerical information in the data (i.e., the cluster centers), and the groups are then matched by the analyst to information classes. The data sets that we tackle contain the class information for each datum. Therefore, the main goal is to find the centers of the clusters by minimizing the objective function, the sum of the distances of the patterns to their centers. For N given objects, the problem is to minimize the sum of squared Euclidean distances between each pattern and its cluster center, allocating each pattern to one of k cluster centers. The clustering objective function, the sum of squared errors, is given in Eq. (5), as described in [22]:

    J(K) = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - c_k\|^2    (5)


where K is the number of clusters, x_i (i = 1, ..., n) is the location of the ith pattern, and c_k (k = 1, ..., K) is the kth cluster center, found by Eq. (6):

    c_k = \frac{1}{n_k} \sum_{i \in C_k} x_i    (6)

where n_k is the number of patterns in the kth cluster. Cluster analysis assigns the data to clusters so that similar data are grouped into the same cluster based on some similarity measure [23]. Distance measurement is most widely used for evaluating similarity between patterns. The cluster centers are the decision variables, obtained by minimizing the sum of Euclidean distances, over all training set instances in the d-dimensional space, between a generic instance x_j and the center of its cluster. The cost (objective) function for a candidate solution (firefly) i is given by Eq. (7), as in [9,14]:

    f_i = \frac{1}{D_{Train}} \sum_{j=1}^{D_{Train}} d(x_j, p_i^{CL_{known}(x_j)})    (7)

where D_Train is the number of training instances, used to normalize the sum so that any distance falls within [0.0, 1.0], and p_i^{CL_known(x_j)} is the center of the class that instance x_j belongs to according to the database. Note that in our FA algorithm the decision variables are the cluster centers, and the objective function is given by Eq. (7). In our study, we consider the 13 standard benchmark problems given in [14]. For a given data set, let n be the number of data points, d the dimension and c the number of classes; a given data point belongs to exactly one of the c classes. Of the given data set, 75% of the points are randomly selected and used to obtain the cluster centers via Eq. (7); in this way we obtain cluster centers for all c classes. The remaining 25% (the test data set) is used to obtain the classification error percentage (CEP). An illustrative example of the FA algorithm and its performance measure is given in the next section.
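As a sketch of how Eqs. (6) and (7) translate to code, the helpers below are our own (the names class_centers and clustering_cost, and the assumption of integer class labels 0..c-1, are not from the paper); class_centers illustrates Eq. (6), and clustering_cost is the objective a firefly would carry in the sketch of Section 2.

import numpy as np

def class_centers(X, labels, c):
    """Eq. (6): the center of each class is the mean of its patterns."""
    return np.array([X[labels == k].mean(axis=0) for k in range(c)])

def clustering_cost(flat_centers, X, labels, c, d):
    """Eq. (7): mean Euclidean distance between each training instance
    x_j and the candidate center of its known class CL_known(x_j).
    `flat_centers` is one firefly's position: c centers flattened to c*d."""
    centers = flat_centers.reshape(c, d)
    diffs = X - centers[labels]                   # center of each point's class
    return np.linalg.norm(diffs, axis=1).mean()   # sum normalized by D_Train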

4. Performance measures and an illustrative example

As discussed in the previous section, the training data sets are used in the firefly algorithm to extract knowledge of each class in the form of cluster centers. Using these cluster centers, the test data are classified and the performance of the classification is analyzed.

4.1. Performance evaluation

The knowledge extracted by the FA, in the form of cluster centers, is evaluated using the Classification Error Percentage (CEP) and the classification efficiency. CEP depends only on the test data, while the classification efficiency depends on both the training and test data.

4.1.1. Classification Error Percentage (CEP)

CEP is obtained using only the test data [9]. For each problem, we report the CEP, the percentage of incorrectly classified patterns of the test data sets, as given in [9], to make a reliable comparison. Each pattern is classified by assigning it to the class whose cluster center is closest. The classified output is then compared with the desired output; if they are not exactly the same, the pattern is counted as misclassified [9]. This procedure is applied to all test data, and the total number of misclassified patterns is expressed as a percentage of the size of the test data set:

    CEP = (number of misclassified samples / total size of test data set) x 100.    (8)

4.1.2. Classification efficiency

Classification efficiency is obtained using both the training and test data. The classification matrix is used to obtain the statistical measures for the class-level performance (individual efficiency) and the global performance (average and overall efficiency) of the classifier [24]. The individual efficiency is indicated by the percentage classification, which tells us how many samples belonging to a particular class have been correctly classified. The percentage classification (eta_i) for the class c_i is given by Eq. (9):

    \eta_i = q_{ii} / \sum_{j=1}^{n} q_{ji}    (9)

where q_ii is the number of correctly classified samples and n is the number of samples for the class c_i in the data set. The global performance measures are the average (eta_a) and overall (eta_o) classification efficiencies, defined as

    \eta_a = \frac{1}{n_c} \sum_{i=1}^{n_c} \eta_i    (10)

    \eta_o = \frac{1}{N} \sum_{i=1}^{n_c} q_{ii}    (11)

where n_c is the total number of classes and N is the total number of patterns.
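A small sketch of these measures follows, assuming a nearest-center classifier and integer labels; the function names nearest_center_labels, cep and efficiencies are ours, not the paper's.

import numpy as np

def nearest_center_labels(X, centers):
    """Assign each pattern to the class whose center is closest."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def cep(y_true, y_pred):
    """Eq. (8): percentage of misclassified test patterns."""
    return 100.0 * np.mean(y_true != y_pred)

def efficiencies(y_true, y_pred, n_classes):
    """Eqs. (9)-(11): individual, average and overall efficiency.
    Assumes every class occurs at least once in y_true."""
    q = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        q[p, t] += 1                       # q[j, i]: class i classified as j
    eta_i = np.diag(q) / q.sum(axis=0)     # Eq. (9), one value per class
    eta_a = eta_i.mean()                   # Eq. (10)
    eta_o = np.trace(q) / len(y_true)      # Eq. (11)
    return eta_i, eta_a, eta_o

Applied to the Iris predictions of Section 5.2.4, these functions would reproduce the eta_i, eta_a and eta_o entries of Table 9.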

4.2. Illustrative example

We illustrate how the Firefly Algorithm (FA) is used for clustering with the following synthetic data. Although the algorithm can be used for any type of mixture model, we focus on a Gaussian mixture. Consider two Gaussian classes with two input features, x and y, mean values mu_1 = [8, 8]^T and mu_2 = [16, 16]^T, a common covariance matrix {(6, 3); (3, 2)}, and an equal number of samples per class. In our experiments, 100 samples are generated randomly for each class; of these, 75 data points per class are used for training and the remaining 25 for testing. The synthetic data generated are shown in Fig. 1.

[Fig. 1. Data distribution: training and testing data for Class 1 and Class 2 in the (x, y) plane.]

We use the firefly algorithm on the training data to obtain cluster centers. Let x_i be one of the solutions (a set of cluster centers) and J_i the objective function value for this solution. We consider a population of 5 fireflies at locations x_1, x_2, x_3, x_4 and x_5 within the 2d-dimensional search space, and evaluate the fitness of the population, J_1, ..., J_5, using Eq. (7); the fitness is directly proportional to the light intensity I_1, ..., I_5. The intensity values are then compared pairwise: if I_2 < I_1, firefly 1 moves toward firefly 2 using Eq. (4), and all pairs of agents are compared in the same way.
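The synthetic two-Gaussian data described above could be generated as follows; the seed is an arbitrary choice of ours, so individual samples will differ from the paper's figures.

import numpy as np

rng = np.random.default_rng(42)                 # arbitrary seed

mu1, mu2 = np.array([8.0, 8.0]), np.array([16.0, 16.0])
cov = np.array([[6.0, 3.0],
                [3.0, 2.0]])                    # shared covariance matrix

class1 = rng.multivariate_normal(mu1, cov, size=100)
class2 = rng.multivariate_normal(mu2, cov, size=100)

# 75 training / 25 testing points per class, as in the paper
train = np.vstack([class1[:75], class2[:75]])
train_labels = np.array([0] * 75 + [1] * 75)
test = np.vstack([class1[75:], class2[75:]])
test_labels = np.array([0] * 25 + [1] * 25)

Running the FA sketch of Section 2 on `train` with the cost of Eq. (7) should recover centers near the class means, comparable to the centers reported at the end of this example.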


The movement phase of each agent is then updated by evaluating new solutions and updating the light intensity using Eq. (1). This procedure continues until the population converges to the optimal cluster centers x_i^min, as shown in Fig. 2. The cluster centers generated are (x, y) = {(7.2233, 7.6659); (16.0580, 16.3230)}.

[Fig. 2. Optimal cluster centers: agents, agent movements and the converged cluster centers in the (x, y) plane.]

Classifying the test data of each class with the centers found by the firefly algorithm gives a classification error percentage of zero. For the entire data set, the individual, average and overall efficiencies are all 100%.

5. Results and discussion

In this work, we present the results obtained using the Firefly Algorithm (FA) on 13 typical benchmark data sets which are well known in the literature (UCI database repository [25]). First, we describe the characteristics of the standard classification data sets. Next, we present the results obtained by the FA on the 13 benchmark problems. Finally, we compare the FA with two other nature-inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), and with nine other methods used in the literature [9,14], and analyze their performance.

5.1. Data set description

The 13 classification data sets are well-known benchmarks, widely used by the machine learning community. The size of each data set, the number of input features and the number of classes are presented in Table 1. These 13 benchmark problems are chosen to be exactly the same as in [9,14], to make a reliable comparison. Each data set is segregated into two parts: 75% of the data is used for training and the remaining 25% as testing samples. The sizes of the training and test sets can be found in Table 1. After training, we obtain the cluster centers (extracted knowledge) that are used to classify the test data. The problems considered in this work can be described briefly as follows.

Data set 1: The Balance data set is based on balance-scale weight and distance. It contains 625 patterns, split into 469 for training and 156 for testing. There are 4 integer-valued attributes and 3 classes.

Data sets 2 and 3: The Cancer and Cancer-Int data sets are based on the "breast cancer Wisconsin, Diagnostic" and "breast cancer Wisconsin, Original" data sets respectively. Each has 2 classes, with a tumor labeled as either benign or malignant.

Table 1
Properties of the problems.

Problem        Data   Train   Test   Input   Class
Balance        625    469     156    4       3
Cancer         569    427     142    30      2
Cancer-Int     699    524     175    9       2
Credit         690    518     172    15      2
Dermatology    366    274     92     34      6
Diabetes       768    576     192    8       2
E.Coli         327    245     82     7       5
Glass          214    161     53     9       6
Heart          303    227     76     35      2
Horse          364    273     91     58      3
Iris           150    112     38     4       3
Thyroid        215    162     53     5       3
Wine           178    133     45     13      3

The Cancer data set contains 569 patterns with 30 attributes; Cancer-Int contains 699 patterns with 9 attributes.

Data set 4: The Credit data set is based on Australian credit card data, used to assess applications for credit cards. There are 690 patterns (applicants), 15 input features and 2 output classes.

Data set 5: The Dermatology data set is based on the differential diagnosis of erythemato-squamous diseases. There are 6 classes, 366 samples and 34 attributes.

Data set 6: The Pima Diabetes data set has 768 instances of 8 attributes and two classes, which indicate whether the detection of diabetes is positive (class A) or negative (class B).

Data set 7: The Escherichia coli data set is based on the cellular localization sites of proteins. The original data set has 336 patterns in 8 classes, but 3 of the classes are represented by only 2, 2 and 5 patterns; these 9 examples are therefore omitted, leaving 327 patterns, 5 classes and 7 attributes.

Data set 8: The Glass data set classifies glass type in terms of oxide content. The nine inputs are based on 9 chemical measurements, and there are 6 types of glass. The data set contains 214 patterns, split into 161 for training and 53 for testing.

Data set 9: The Heart data set is based on the diagnosis of heart disease. It contains 76 attributes per pattern, 35 of which are used as input features. The data is the Cleveland Heart data from the repository, with 303 patterns and 2 classes.

Data set 10: The Horse data set is used to predict the fate of a horse with colic: whether the horse will die, survive, or be euthanized. The data set contains 364 patterns, each with 58 inputs derived from 27 attributes, and 3 classes.

Data set 11: The Iris data set consists of three varieties of iris flowers: setosa, virginica and versicolor. There are 150 instances and 4 attributes that make up the 3 classes.

Data set 12: The Thyroid data set is based on the diagnosis of thyroid hyper- or hypofunction. The data set contains 215 patterns, 5 attributes and 3 classes.

Data set 13: The Wine data set is obtained from the chemical analysis of wines derived from three different cultivars. It contains 3 types of wine, with 178 patterns and 13 attributes.

5.2. Results obtained using the firefly algorithm

In this section, we discuss the results obtained using the Firefly Algorithm (FA) on the 13 benchmark data set problems and compare the FA with the 11 other methods used in the literature, based on the performance measures.

5.2.1. FA clustering and parameter setting

The fireflies are initialized randomly in the search space.


Table 2
Average classification error percentages using nature-inspired techniques on the test data sets.

Problem        FA      ABC     PSO
Balance        14.1    15.38   25.47
Cancer         1.06    2.81    5.8
Cancer-Int     0       0       2.87
Credit         12.79   13.37   22.96
Dermatology    5.43    5.43    5.76
Diabetes       21.88   22.39   22.5
E.Coli         8.54    13.41   14.63
Glass          37.74   41.5    39.05
Heart          13.16   14.47   17.46
Horse          32.97   38.26   40.98
Iris           0       0       2.63
Thyroid        0       3.77    5.55
Wine           0       0       2.22

The parameter values used in our algorithm are:

Number of fireflies (N) = 20
Attractiveness (beta_0) = 1
Light absorption coefficient (gamma) = 1
Number of generations (T) = 100.

For most applications, the same parameter values are suggested by Yang [18]. After the fireflies are deployed randomly within the search space, the value beta_0 = 1 corresponds to a scheme of cooperative local search, in which the brightest firefly strongly determines the positions of the other fireflies, especially in its neighborhood. The value gamma = 1 determines the variation of light intensity with increasing distance from the communicating firefly; as gamma grows large, the attraction vanishes and the search becomes completely random.

The number of function evaluations in the firefly algorithm can be obtained as follows: let N be the size of the initial population and T the maximum number of generations. The number of function evaluations per generation is N(N - 1)/2, so the total number of function evaluations is [N(N - 1)/2] x T. In our studies, we have used 100 as the maximum number of generations, so the number of function evaluations for each of the 13 classification data sets (with N = 20 and T = 100) in one simulation run is 20 x 19/2 x 100 = 19,000.

5.2.2. Analysis of Classification Error Percentage using FA

In [9,14], the Classification Error Percentage (CEP) measure is used with all 13 benchmark data sets. Falco et al. [14] compared the performance of the PSO algorithm with nine other methods, namely Bayes Net [26], MultiLayer Perceptron Artificial Neural Network (MLP) [27], Radial Basis Function Artificial Neural Network (RBF) [28], KStar [29], Bagging [30], MultiBoostAB [31], Naive Bayes Tree (NBTree) [32], Ripple Down Rule (Ridor) [33] and Voting Feature Interval (VFI) [34]. Karaboga and Ozturk [9] implemented the ABC algorithm and analyzed CEP against all the above-mentioned methods. In this study, in addition to these methods [9,14], we analyze the CEP measure of the FA to make a reliable comparison.

From the training data set, the knowledge in the form of cluster centers is obtained using the Firefly Algorithm (FA). The test data sets are then classified against these cluster centers and the CEP values are obtained. The CEP values of the nature-inspired techniques FA, ABC and PSO are given in Table 2. The FA matches or outperforms the ABC and PSO algorithms on all 13 problems, whereas the ABC algorithm is better than the PSO algorithm on 12 of the 13 problems, the exception being the Glass problem. Moreover, the average classification error percentage over all problems is also better for the FA (11.36%) than for ABC (13.13%) and PSO (15.99%).

Table 3 presents the CEP measure of the FA and of the 11 methods given in [9,14]; the ranking, based on the ascending order of the average classification

error of the classifiers on each problem, is given in parentheses. At a glance, one can see that the FA obtains the best solution on 8 of the 13 problems. To allow a fuller comparison of all the algorithms, Tables 4 and 5 are reported. Table 4 shows the average classification error over all problems, with the ranking based on the ascending order of this average. Table 5 shows the sum of each algorithm's per-problem rankings, again in ascending order. From Table 4, we can observe that based on average CEP values the FA is the best, ahead of the MLP artificial neural network and ABC, while MLP performs better than ABC. However, even though the results in the table are comparable, averaging alone may cause some significant points to be disregarded, since the distributions of the error rates are not proportional. Therefore, the general ranking of the techniques in Table 5 is obtained by summing the per-problem ranks from Table 3. By this ranking, once again the FA is the best, with the ABC algorithm in second position and the BayesNet technique in third. The classification error rates and rankings in these tables show that clustering with the FA offers superior generalization capability. Note that we use the results of all the other methods as given in the earlier studies [9,14]; only the FA results are new.

5.2.3. Analysis of classification efficiency using FA

In the previous section, we presented the results obtained using CEP. CEP alone does not indicate how efficient the algorithm is. To analyze any classifier, it is important to check the individual classification efficiency on the testing samples, and also the average and overall efficiency on the complete data set. Using the same cluster centers, the average and overall efficiency for the entire data set are obtained.

i. Significance of individual efficiency: For a testing data set, the main purpose of the individual efficiency is to analyze the class-level performance of a classifier. From Table 6, we can observe the individual classification efficiencies on the testing samples: Cancer-Int, Iris, Thyroid and Wine are classified without any misclassification and hence have individual efficiencies of 100%. In the Balance data set, the individual efficiency of Class 2 is 66.7%. In the Credit and Diabetes data sets, Class 1 is misclassified as Class 2, with an individual efficiency of 70%, whereas in the Heart and Dermatology data sets Class 2 has lower individual efficiencies of 73.1% and 50% respectively. From Table 3 we can observe that for the Heart problem (a 2-class problem) the FA performed better than all the other classifiers, with a CEP value of 13.16. This does not mean that the individual efficiency of each class is good. To illustrate this in more detail, consider the Heart data set: from Table 6, Class 1 has an impressive individual efficiency of 94%, whereas most of the samples belonging to Class 2 are misclassified as Class 1, giving an individual efficiency of 73.1%. Hence it is important to consider the individual efficiency when analyzing the class-level performance of a clustering algorithm.

ii. Performance of average and overall efficiency: For the entire data set, it is necessary to know the global performance of an algorithm. This is captured by the average and overall efficiency.
Table 6 reports the average and overall efficiencies obtained by the firefly algorithm on the entire data for each of the 13 benchmark data sets. The Balance data set has average and overall efficiencies of 74.9% and 80.8% respectively, and the Cancer data set 91% and 92.5% respectively. The (equal) average and overall efficiencies of Cancer-Int, Dermatology, Iris and Wine are 97.9%, 81.9%, 94.7% and 90.6% respectively. The average efficiencies of Credit, Diabetes, E.Coli, Heart and Thyroid are 75.5%, 73.4%, 88.5%, 77.1% and 92.6%.


Table 3
Average classification error percentages of the techniques given in [9,14] and of the FA algorithm on each problem; rankings are shown in parentheses.

Problem      FA          ABC         PSO         BayesNet    MlpAnn      RBF         KStar       Bagging     MultiBoost  NBTree      Ridor       VFI
Balance      14.1 (3)    15.38 (5)   25.47 (10)  19.74 (6)   9.29 (1)    33.61 (11)  10.25 (2)   14.77 (4)   24.20 (9)   19.74 (6)   20.63 (8)   38.85 (12)
Cancer       1.06 (1)    2.81 (3)    5.80 (8)    4.19 (5)    2.93 (4)    20.27 (12)  2.44 (2)    4.47 (6)    5.59 (7)    7.69 (11)   6.36 (9)    7.34 (10)
Cancer-Int   0 (1)       0 (1)       2.87 (3)    3.42 (4)    5.25 (8)    8.17 (12)   4.57 (6)    3.93 (5)    5.14 (7)    5.71 (10)   5.48 (9)    5.71 (10)
Credit       12.79 (5)   13.37 (6)   22.96 (11)  12.13 (2)   13.81 (7)   43.29 (12)  19.18 (10)  10.68 (1)   12.71 (4)   16.18 (8)   12.65 (3)   16.47 (9)
Dermatology  5.43 (6)    5.43 (6)    5.76 (8)    1.08 (1)    3.26 (3)    34.66 (11)  4.66 (5)    3.47 (4)    53.26 (12)  1.08 (1)    7.92 (10)   7.60 (9)
Diabetes     21.88 (1)   22.39 (2)   22.50 (3)   25.52 (4)   29.16 (8)   39.16 (12)  34.05 (10)  26.87 (6)   27.08 (7)   25.52 (4)   29.31 (9)   34.37 (11)
E.Coli       8.54 (1)    13.41 (2)   14.63 (4)   17.07 (6)   13.53 (3)   24.38 (11)  18.29 (9)   15.36 (5)   31.70 (12)  20.73 (10)  17.07 (6)   17.07 (6)
Glass        37.73 (7)   41.50 (10)  39.05 (8)   29.62 (5)   28.51 (4)   44.44 (11)  17.58 (1)   25.36 (3)   53.70 (12)  24.07 (2)   31.66 (6)   41.11 (9)
Heart        13.16 (1)   14.47 (2)   17.46 (3)   18.42 (4)   19.46 (7)   45.25 (12)  26.70 (11)  20.25 (8)   18.42 (4)   22.36 (9)   22.89 (10)  18.42 (4)
Horse        32.97 (6)   38.26 (8)   40.98 (11)  30.76 (2)   32.19 (5)   38.46 (9)   35.71 (7)   30.32 (1)   38.46 (9)   31.86 (3)   31.86 (3)   41.75 (12)
Iris         0 (1)       0 (1)       2.63 (8)    2.63 (8)    0 (1)       9.99 (12)   0.52 (6)    0.26 (5)    2.63 (8)    2.63 (8)    0.52 (6)    0 (1)
Thyroid      0 (1)       3.77 (3)    5.55 (4)    6.66 (6)    1.85 (2)    5.55 (4)    13.32 (11)  14.62 (12)  7.40 (7)    11.11 (9)   8.51 (8)    11.11 (9)
Wine         0 (1)       0 (1)       2.22 (5)    0 (1)       1.33 (4)    2.88 (8)    3.99 (9)    2.66 (7)    17.77 (12)  2.22 (5)    5.10 (10)   5.77 (11)

Table 4
Average classification error percentages and general ranking of the techniques over all problems.

Technique    FA      ABC     PSO     BayesNet  MlpAnn  RBF     KStar   Bagging  MultiBoost  NBTree  Ridor   VFI
Average      11.36   13.13   15.99   13.17     12.35   26.93   14.71   13.3     22.92       14.68   15.38   18.89
Rank         1       3       9       4         2       12      6       5        11          7       8       10

Table 5
Sum of the per-problem rankings of the techniques and general ranking based on the total.

Technique    FA      ABC     PSO     BayesNet  MlpAnn  RBF     KStar   Bagging  MultiBoost  NBTree  Ridor   VFI
Total        35      50      86      54        57      137     89      67       110         86      97      113
Rank         1       2       6       3         4       12      8       5        10          7       9       11

Table 6
FA best classification efficiency.

        Balance  Cancer  Cancer-Int  Credit  Dermatology  Diabetes  E.Coli  Glass  Heart  Horse  Iris   Thyroid  Wine
eta_1   90.3     96.7    100         70      96.7         70        100     66.7   94     81.7   100    100      100
eta_2   66.7     100     100         92.4    50           82.8      80      40     73.1   40.9   100    100      100
eta_3   88.9     -       -           -       100          -         100     60     -      44.4   100    100      100
eta_4   -        -       -           -       100          -         80      80     -      -      -      -        -
eta_5   -        -       -           -       100          -         94.7    -      -      -      -      -        -
eta_6   -        -       -           -       100          -         -       90     -      -      -      -        -
eta_a   74.9     91      97.9        75.5    81.9         73.4      88.5    70.8   77.1   53.7   94.7   92.6     90.6
eta_o   80.8     92.5    97.9        78.7    81.9         75.9      89.2    61.6   78.7   66.1   94.7   94.2     90.6

The overall efficiencies of Credit, Diabetes, E.Coli, Heart and Thyroid are 78.7%, 75.9%, 89.2%, 78.7% and 94.2%. For the Glass and Horse data sets the FA has lower average efficiencies of 70.8% and 53.7% respectively, and overall efficiencies of 61.6% and 66.1% respectively.

5.2.4. Comparison of classification efficiency of the nature-inspired techniques using the Iris data set

From Table 6 we can observe that for the benchmark problems Cancer-Int, Iris, Thyroid and Wine there is no misclassification in any individual class, i.e., all the test data are classified correctly and hence the CEP value is 0. This does not mean that the overall efficiency is 100%. To illustrate this in more detail, consider the Iris data set, which has fewer input features than Cancer-Int, Thyroid and Wine. The FA, ABC and PSO are

in the same class of population-based, nature-inspired optimization techniques. Here we compare the three nature-inspired techniques in extracting knowledge in the form of cluster centers, and analyze the performance using classification efficiency. The cluster centers generated by the FA, ABC and PSO for the Iris training data are shown in Table 7; the cluster centers obtained by ABC match the published literature [35]. The parameter values used for the FA are as given in Section 5.2.1; for ABC and PSO we use the parameter values of [9,14] respectively. From Table 8, we can observe that the optimal and mean fitness values obtained by the FA are better than those of ABC and PSO. Since this is a continuous optimization problem, the cluster centers initially picked by the population are generally not the optimal points in the search space. Therefore, the selection of new cluster centers after fitness


Table 7
Optimal cluster centers for the Iris data set using the nature-inspired techniques.

Technique   Feature A   Feature B   Feature C   Feature D
FA          5.0538      3.4813      1.4665      0.2895
            5.9432      2.7627      4.4195      1.4106
            6.3929      2.9021      5.1321      2.3328
ABC         5.0061      3.4171      1.4641      0.2466
            5.9026      2.7483      4.3937      1.4349
            6.8501      3.0737      5.7433      2.0711
PSO         5.0416      3.4433      1.4796      0.2475
            6.0431      2.9738      4.327       1.3649
            6.9410      2.9321      5.4664      1.388

Table 8
Results of the nature-inspired techniques after 20 runs for optimal cluster centers on the Iris data set.

Technique   Optimal   Worst    Average   Standard deviation
FA          0.4880    0.5928   0.4897    0.0087
ABC         0.4891    0.4957   0.4903    0.0018
PSO         0.4943    0.9115   0.5186    0.0668

Table 9
Comparison of classification efficiency for the Iris data set.

Efficiency   FA (%)   ABC (%)   PSO (%)
eta_1        100      100       100
eta_2        96       96        92
eta_3        88       72        74
eta_a        94.7     89.3      88.7
eta_o        94.7     89.3      88.7

evaluation is iterated until all the particles converge to an optimal result, i.e., a set of cluster centers. In the FA algorithm, a firefly moves toward another firefly that has a better objective function value (fitness). The distance moved by the firefly in each step is governed by the distance r between the two fireflies: when r is large the firefly moves a small distance, and when r is small it moves a large distance, which affects the computation time of the algorithm. In PSO, each particle moves a distance based on its personal best and the global best; in ABC, each bee's position is compared twice with the best particle position. The cluster centers generated by these algorithms are then evaluated by computing the distances between the given data and the cluster centers; each datum is assigned to the cluster center (class) with the minimum distance. The performance measures help us examine which method has generated the best cluster centers.

The classification matrix for the entire Iris data set is shown in Table 9. From this table, we can observe that for all the nature-inspired algorithms some samples belonging to Class 2 and Class 3 are misclassified as Class 3 and Class 2 respectively. For the optimal cluster centers generated by the FA and ABC, the individual efficiency of Class 2 is 96%, whereas PSO attains 92%. The individual efficiency of Class 3 using the FA is 88%, better than ABC and PSO with 72% and 74% respectively. With all the algorithms, Class 1 is classified without any misclassification, and hence has an individual efficiency of 100%, as it is linearly separable from the other two classes. The average and overall efficiencies are also better for the FA, at 94.7%, compared with 89.3% for ABC and 88.7% for PSO. Hence it is important to consider the individual, average and overall efficiencies for the generated cluster centers in multi-class classification problems.

It is important to note that the performance of clustering depends mainly on the size and quality of the training data set. Some methods are available in the literature for the selection of the training data

set [36]. An earlier study showed that proper selection of the training data set improves the performance measure [36]. In our study, we selected 75% of the data randomly as the training set and tabulated the results based on the most favorable performance measure for the selected training set. Overall, for most of the data sets, the FA shows good global performance. Given the accuracy and robustness of the FA, we can claim that it can be used for the clustering problems studied in this paper.

6. Discussions and conclusions

This paper investigates a new nature-inspired algorithm, the FA, for clustering, and evaluates its performance. The FA is compared with ABC and PSO, as all these methods are in the same class of population-based, nature-inspired optimization techniques. As in other population-based algorithms, the performance of the FA depends on the population size, beta_0 and gamma; the effects of the values of beta and gamma are discussed in [18]. In the FA algorithm, a firefly moves toward another firefly that has a better objective function value (fitness), and the distance moved in each step is governed by the distance r between the two fireflies: when r is large/small, the firefly moves a small/large distance, which affects the computation time of the algorithm. In PSO, each particle moves a distance based on its personal best and the global best; in ABC, each bee's position is compared twice with the best particle position. In the FA algorithm, only the distance is necessary for the movement. The performance measure (CEP) helps us examine which method has generated the best cluster centers.

The clustering of 13 benchmark data sets was accomplished successfully by partitional clustering using a recent nature-inspired technique, the Firefly Algorithm (FA). Clustering is an important technique for identifying homogeneous clusters (or classes) such that patterns within a cluster share a high degree of affinity while being very dissimilar to those of other clusters. The performance of the FA, measured by the classification error percentage, was compared with two other nature-inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), and with nine other methods widely used by researchers. The classification efficiency of the FA (individual, average and overall) was analyzed on the 13 benchmark problems. From the results obtained, we conclude that the FA is an efficient, reliable and robust method that can be applied successfully to generate optimal cluster centers.

Acknowledgments

The authors would like to thank the reviewers for their comments, which were useful during the revision of this study.

References

[1] M.R. Anderberg, Cluster Analysis for Application, Academic Press, New York, 1973.
[2] J.A. Hartigan, Clustering Algorithms, Wiley, New York, 1975.
[3] P.A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, London, 1982.
[4] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, 1988.
[5] H. Frigui, R. Krishnapuram, A robust competitive clustering algorithm with applications in computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (1999) 450–465.
[6] Y. Leung, J. Zhang, Z. Xu, Clustering by scale-space filtering, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 1396–1410.
[7] C. Ding, X. He, Cluster merging and splitting in hierarchical clustering algorithms, in: Proc. IEEE ICDM, 2002, pp. 1–8.
[8] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Publishers, Dordrecht, 1996.
[9] D. Karaboga, C. Ozturk, A novel clustering approach: Artificial Bee Colony (ABC) algorithm, Applied Soft Computing 11 (1) (2010) 652–657.

[10] S.Z. Selim, M.A. Ismail, K-means type algorithms: a generalized convergence theorem and characterization of local optimality, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 81–87.
[11] E. Falkenauer, Genetic Algorithms and Grouping Problems, Wiley, Chichester, 1998.
[12] Y. Kao, K. Cheng, An ACO-based clustering algorithm, in: M. Dorigo, et al. (Eds.), ANTS, in: LNCS, vol. 4150, Springer, Berlin, 2006, pp. 340–347.
[13] R. Younsi, W. Wang, A new artificial immune system algorithm for clustering, in: Z.R. Yang (Ed.), LNCS, vol. 3177, Springer, Berlin, 2004, pp. 58–64.
[14] I. De Falco, A.D. Cioppa, E. Tarantino, Facing classification problems with particle swarm optimization, Applied Soft Computing 7 (3) (2007) 652–658.
[15] T. Niknam, B. Amiri, J. Olamaei, A. Arefi, An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering, Journal of Zhejiang University: Science A 10 (4) (2009) 512–519.
[16] T. Niknam, E. Taherian Fard, N. Pourjafarian, A.R. Rousta, An efficient hybrid algorithm based on modified imperialist competitive algorithm and k-means for data clustering, Engineering Applications of Artificial Intelligence 24 (2) (2011) 306–317.
[17] T. Niknam, B. Amiri, An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis, Applied Soft Computing 10 (1) (2010) 183–197.
[18] X.S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, 2008.
[19] D. Karaboga, A. Basturk, On the performance of Artificial Bee Colony (ABC) algorithm, Applied Soft Computing 8 (1) (2008) 687–697.
[20] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: IEEE Intl. Conf. on Neural Networks, vol. 4, 1995, pp. 1942–1948.
[21] J. Tyler, Glow-worms. http://website.lineone.net/galaxypix/Tylerbookpt1.html.
[22] Y. Marinakis, M. Marinaki, M. Doumpos, N. Matsatsinis, C. Zopounidis, A hybrid stochastic genetic-GRASP algorithm for clustering analysis, Operational Research An International Journal 8 (1) (2008) 33–46.
[23] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Computing Surveys 31 (3) (1999) 264–323.


[24] S. Suresh, N. Sundararajan, P. Saratchandran, A sequential multi-category classifier using radial basis function networks, Neurocomputing 71 (7–9) (2008) 1345–1358.
[25] C.L. Blake, C.J. Merz, University of California at Irvine Repository of Machine Learning Databases, 1998. http://www.ics.uci.edu/mlearn/MLRepository.html.
[26] F. Jensen, An Introduction to Bayesian Networks, UCL Press, Springer-Verlag, 1996.
[27] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.
[28] M.H. Hassoun, Fundamentals of Artificial Neural Networks, The MIT Press, Cambridge, 1995.
[29] J.G. Cleary, L.E. Trigg, K*: an instance-based learner using an entropic distance measure, in: Proceedings of the 12th International Conference on Machine Learning, 1995, pp. 108–114.
[30] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
[31] G.I. Webb, Multiboosting: a technique for combining boosting and wagging, Machine Learning 40 (2) (2000) 159–196.
[32] R. Kohavi, Scaling up the accuracy of Naive-Bayes classifiers: a decision tree hybrid, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1996, pp. 202–207.
[33] P. Compton, R. Jansen, Knowledge in context: a strategy for expert system maintenance, in: Proceedings of Artificial Intelligence, in: LNAI, vol. 406, Springer-Verlag, Berlin, 1988, pp. 292–306.
[34] G. Demiroz, A. Guvenir, Classification by voting feature intervals, in: Proceedings of the Seventh European Conference on Machine Learning, 1997, pp. 85–92.
[35] C. Zhang, D. Ouyang, J. Ning, An artificial bee colony approach for clustering, Expert Systems with Applications 37 (7) (2010) 4761–4767.
[36] T. Yoshida, S. Omatu, Neural network approach to land cover mapping, IEEE Transactions on Geoscience and Remote Sensing 32 (5) (1994) 1103–1109.