A prototype classifier based on gravitational search algorithm

Applied Soft Computing 12 (2012) 819–825 Contents lists available at SciVerse ScienceDirect Applied Soft Computing journal homepage: www.elsevier.co...



Applied Soft Computing journal homepage: www.elsevier.com/locate/asoc

Abbas Bahrololoum, Hossein Nezamabadi-pour*, Hamid Bahrololoum, Masoud Saeed

Department of Electrical Engineering, Shahid Bahonar University of Kerman, P.O. Box 76169-133, Kerman, Iran

Article info

Article history: Received 4 October 2010; received in revised form 4 April 2011; accepted 23 October 2011; available online 31 October 2011

Keywords: Classification; Prototype classifier; Swarm intelligence; Gravitational search algorithm; UCI machine learning repository

Abstract

In recent years, heuristic algorithms have been successfully applied to clustering and classification problems. In this paper, the gravitational search algorithm (GSA), one of the newest swarm based heuristic algorithms, is used to provide a prototype classifier for the classification of instances in multiclass data sets. The proposed method employs GSA as a global searcher to find the best positions of the representatives (prototypes). The proposed GSA-based classifier is applied to several well-known benchmark sets. Its performance is compared with the artificial bee colony (ABC), the particle swarm optimization (PSO), and nine other classifiers from the literature. The experimental results on twelve data sets from the UCI machine learning repository confirm that GSA can successfully be applied as a classifier to classification problems. © 2011 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +98 341 3235900; fax: +98 341 3235900. E-mail address: [email protected] (H. Nezamabadi-pour).
1568-4946/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2011.10.008

1. Introduction

A classification problem is the task of assigning objects to one of several predefined categories. Mathematically, it is a mapping from the input feature space into a set of labels. Many classification techniques have been developed in the field of machine learning (ML), the study of computer programs that improve their performance through the experience they gain over prepared collections of data known as data sets. Among these techniques are binary classifiers [1–4], decision tree classifiers [5–7], artificial neural network (ANN) classifiers [8–10], Bayesian classifiers [1,11], support vector machine (SVM) classifiers [12–16], and instance (prototype) based classifiers [17–22], which are discussed in more detail later in this paper. The choice of a proper classification technique depends on factors such as whether the data are noisy or noise free and whether the inputs are discrete-valued or real-valued. It also depends on the type of hypothesis space representation, the choice of a suitable inductive bias, prior knowledge about the class probabilities of the input data (for instance, parameters or shapes of the probability density functions), the size of the data set, the dimension of the inputs, the complexity of the classifier, etc. More details can be found in [1].

A number of optimization algorithms have been used for classification problems. The main reason, perhaps, comes from the nature of ML techniques, which carry uncertainty within themselves. Since the accuracy of such techniques depends largely on the quality of training, they need to be trained well. Training is gradual: as the data set members are fed into an ML technique, it becomes more experienced and its performance improves. This gradual improvement is usually achieved by a search through a high-dimensional space called the “hypothesis space” [1]. Different ML techniques may have different hypothesis space representations. For example, in concept learning, which is a binary classifier, the space has the form of a lattice structure, and in genetic algorithms (GAs) it is usually the space of bit strings. The common characteristic of all the techniques is that they search through their own hypothesis spaces using different semantics or strategies. Such a space may contain local minima. A side effect of bad training can be convergence to a local optimum, leading to poor performance of the technique; in classification, for instance, it may lead to a significant increase in the number of misclassifications. In other words, every ML technique contains an optimization part of its own. When the technique is trained properly, it converges correctly in its hypothesis space. Here, by “training properly” we mean an extension of the search mechanism of the ML technique. Such an extension is expected to handle difficulties in searching hypothesis spaces, such as trapping in local minima, better. It is therefore reasonable that some researchers have combined other optimization techniques with ML techniques to enhance their performance in terms of accuracy or speed of convergence.


As briefly pointed out above, one type of classifier is the prototype (instance based) classifier. These classifiers are referred to as model-free classification techniques. The training data in prototype classifiers are represented as a set of points in feature space, but the prototypes are usually chosen to be different from the training examples; one-nearest-neighbor (1-NN) is the exception [17]. Each prototype has a class label, and a new instance is classified by finding the closest prototype under some defined distance measure (typically Euclidean distance). Variations of prototype methods include K-means clustering [23,24], learning vector quantization (LVQ) [25,26], Gaussian mixtures (GM) [27–30] and K-nearest neighbor (K-NN) [31–33]. They differ in two respects: the number of prototypes used and the way the prototypes are selected.

In K-means clustering (KMC), the goal is to find clusters and their centers such that the within-cluster variance is minimized. After the initial cluster centers are chosen randomly, the algorithm repeats the following two steps until convergence. First, each training point is assigned to the cluster with the closest center. Second, each cluster center is replaced with the mean of the instances assigned to it (the mean is calculated for each dimension, i.e. each feature, of the cluster points). Clustering can be cast into a classification task in three steps. First, KMC is run on each class. Next, each prototype is assigned a label. Last, each new instance is classified according to the label of the nearest prototype. In LVQ, sampled training points either attract or repel their closest prototypes. The degree of attraction and repulsion is controlled through a parameter called the “learning rate”.
In GM, each cluster is represented by a Gaussian density with a centroid and a covariance matrix. The Expectation Maximization (EM) algorithm adjusts the cluster parameters iteratively in two steps. In the first step, each instance (observation) is given a weight based on the likelihood under each Gaussian. The likelihood shows to which cluster the instance belongs and to what degree. In the second step, the observations contribute to the centroids and covariance matrices. After this optimization, the best parameters of the Gaussians are identified and instances can be classified by calculating posterior probabilities. In K-NN, which is a memory based method, an instance is classified by the majority vote of its K nearest neighbors. The neighbors are identified by defining a distance metric. K-NN requires no optimization. For more details, along with a comparative study of the above prototype methods, see [17].

A number of researchers have applied optimization techniques to prototype based techniques. For example, De Falco et al. proposed a particle swarm optimization (PSO) based classifier for the classification of instances in multiclass databases [34], together with three different fitness functions for classification purposes. Karaboga and Ozturk [35] adapted the artificial bee colony (ABC) to classification using one of the functions proposed by De Falco et al. The experiments in [34,35] confirm that the heuristic based classifiers (PSO and ABC) provide good results in comparison with well-known classification techniques.

Gravitational search algorithm (GSA) is one of the latest heuristic optimization algorithms. It was first introduced by Rashedi et al. as a new stochastic population-based optimization tool [36] based on the metaphor of gravitational interaction between masses. This approach provides an iterative method that simulates mass interactions and moves through a multi-dimensional search space under the influence of gravitation. The algorithm is inspired by the Newtonian laws of gravity and motion [36]. The effectiveness of GSA and its binary version (BGSA) [37] in solving a set of nonlinear benchmark functions has been demonstrated [36,37]. Moreover, the results obtained in [38–40] confirm that GSA is a suitable tool for linear and nonlinear filter modeling, parameter identification of a hydraulic turbine governing system, and synthesis of thinned scanned concentric ring array antennas, respectively. GSA belongs to the class of swarm based heuristic algorithms. Rashedi et al. [36] carried out a comparative study between GSA and a small number of well-known swarm algorithms such as PSO. The results suggest that GSA has merit in the field of optimization.

In the current paper a prototype based classification approach that uses GSA is proposed. For simplicity, this approach uses one prototype per class. Finding the appropriate position of each prototype (class representative) is the main objective. Once the class representatives are found, an instance is assigned to the class whose representative is closest to it in Euclidean distance. Finding the proper positions of the class representatives is itself an optimization task, and GSA is used to tackle it. Since GSA is a heuristic optimization algorithm, it requires a fitness function to guide its search. In this paper the same fitness functions as those applied in [34] are used. To verify the effectiveness of the proposed GSA based classifier, it is applied to 12 benchmark data sets, and the results are compared with those of 11 well-known classification techniques reported in the literature.

The remainder of this paper is organized as follows. Section 2 describes the structure of GSA. In Section 3, GSA is adapted to conduct a classification task using the three fitness functions mentioned above. In Section 4, the twelve selected data sets from the UCI machine learning repository [41] are described first; then the proposed classifier is applied to them and compared with 11 well-known classification techniques, including PSO and ABC. Finally, Section 5 states the conclusions.

2. A prologue to gravitational search algorithm

In GSA, agents are considered as objects, and their performance is measured by their masses. All these objects attract each other by the force of gravity, and this force causes a global movement of all objects towards the objects with heavier masses. The heavy masses correspond to good solutions of the problem. The position of an agent corresponds to a solution of the problem, and its mass is determined using a fitness function. As time passes, the masses are attracted by the heaviest mass, which is expected to represent an optimum solution in the search space. GSA can be considered as an isolated system of masses. It is like a small artificial world of masses obeying the Newtonian laws of gravitation and motion [36]. To describe GSA, consider a system with N masses (agents) in which the position of the ith mass is defined as follows:

$$X_i = (x_i^1, \ldots, x_i^d, \ldots, x_i^n), \quad i = 1, 2, \ldots, N \tag{1}$$

where $x_i^d$ is the position of the ith mass in the dth dimension, and n is the dimension of the search space. The positions of the masses correspond to the solutions of the problem. Based on [36], the mass of each agent is calculated after computing the fitness of the current population, as follows:

$$M_i(t) = \frac{\mathrm{fit}_i(t) - \mathrm{worst}(t)}{\sum_{j=1}^{N} \left( \mathrm{fit}_j(t) - \mathrm{worst}(t) \right)} \tag{2}$$


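where $\mathrm{fit}_i(t)$ represents the fitness value of agent i at time t, and $\mathrm{worst}(t)$ is defined as follows:

$$\mathrm{worst}(t) = \begin{cases} \min_{j \in \{1,\ldots,N\}} \mathrm{fit}_j(t) & \text{for maximization problems} \\ \max_{j \in \{1,\ldots,N\}} \mathrm{fit}_j(t) & \text{for minimization problems} \end{cases} \tag{3}$$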
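As a small worked example of Eqs. (2) and (3), consider a minimization problem with N = 3 agents whose fitness values are 10, 20 and 40. These numbers are made up for illustration and are not taken from the paper.

$$\mathrm{worst} = \max(10, 20, 40) = 40, \qquad \sum_{j=1}^{3} \left( \mathrm{fit}_j - \mathrm{worst} \right) = (-30) + (-20) + 0 = -50$$

$$M_1 = \frac{-30}{-50} = 0.6, \qquad M_2 = \frac{-20}{-50} = 0.4, \qquad M_3 = \frac{0}{-50} = 0$$

The agent with the lowest (best) fitness receives the largest mass, the worst agent receives zero mass, and the masses sum to one.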

At a specific time t, the force acting on mass i from mass j is defined as follows:

$$F_{ij}^d(t) = G(t) \frac{M_i(t) \times M_j(t)}{R_{ij}(t) + \varepsilon} \left( x_j^d(t) - x_i^d(t) \right) \tag{4}$$

To compute the acceleration of an agent, the total force applied to it by all other masses is considered based on the law of gravity (Eq. (5)), and the acceleration of the agent then follows from the law of motion (Eq. (6)):

$$F_i^d(t) = \sum_{j=1, j \neq i}^{N} r_j F_{ij}^d(t) = G(t) M_i(t) \sum_{j=1, j \neq i}^{N} r_j \frac{M_j(t)}{R_{ij}(t) + \varepsilon} \left( x_j^d(t) - x_i^d(t) \right) \tag{5}$$

$$a_i^d(t) = \frac{F_i^d(t)}{M_i(t)} = G(t) \sum_{j=1, j \neq i}^{N} r_j \frac{M_j(t)}{R_{ij}(t) + \varepsilon} \left( x_j^d(t) - x_i^d(t) \right) \tag{6}$$

where $a_i^d(0) = 0$. Afterward, the next velocity of an agent is calculated as a fraction of its current velocity added to its acceleration (Eq. (7)). Then, its next position is calculated using Eq. (8):

$$v_i^d(t + 1) = r_i \times v_i^d(t) + a_i^d(t) \tag{7}$$

$$x_i^d(t + 1) = x_i^d(t) + v_i^d(t + 1) \tag{8}$$

where $v_i^d(0) = 0$, $r_i$ and $r_j$ are two uniformly distributed random numbers in the interval [0, 1], and $\varepsilon$ is a small value. $G(t) = G(G_0, t)$ is the gravitational constant at time t, a decreasing function of time that is set to $G_0$ at the beginning (t = 0) and decreases exponentially [36] or linearly [37] towards zero at the last iteration. $R_{ij}(t)$ is the Euclidean distance between two agents i and j, defined as:

$$R_{ij}(t) = \left\| X_i(t), X_j(t) \right\|_2 \tag{9}$$

The pseudo code of the GSA is given by Fig. 1.

Fig. 1. The pseudo code of GSA.

3. The proposed GSA based classifier

Assume we have a C-class classification problem in a D-dimensional feature space, that is, each instance should be classified into one of the C existing classes. The design of the GSA based prototype classifier has two phases: a training phase and a performance phase. For this purpose the data set is divided into two parts: a training set and a test set. In the training phase, GSA uses the training data to provide the best representative for each class according to a specific fitness function. The block diagram of the proposed classifier in the training phase is given by Fig. 2.

GSA is initialized with N masses in an n-dimensional space with n = C × D. That is, GSA is asked to find one representative for each class, C representatives in total, where each representative is a D-dimensional vector that the classifier uses as the class prototype. The initial positions of the masses are selected randomly from the training data. The positions of the masses form the candidate solutions to the problem. The fitness value of each mass is computed, and the results are fed into GSA for the mass calculation (Eq. (2)), followed by Eqs. (4)–(8). The search for the best representatives continues until the stopping condition is met. At the end of the training phase, the best representatives obtained are passed to the second phase, where they set the parameters of the prototype classifier. The final prototype classifier then has C prototypes, each with a class label, and each new instance (from the test set) is classified by finding the closest prototype using a defined distance measure (typically Euclidean distance). The block diagram of the proposed classifier in the performance phase is given by Fig. 3.

3.1. Fitness functions

To allow a comparison of GSA with PSO and ABC in terms of classification accuracy, the same fitness functions as those reported in [34] are applied to GSA, namely fit1, fit2, and fit3.
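Taken together, the update rules of Section 2 and the training phase described above amount to a short loop. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `gsa_minimize` and its parameters are hypothetical, the random factors r_j are drawn independently for every pair of agents, r_i is drawn once per agent, G decreases linearly from G0, and equal masses are used when all agents have the same fitness (a case Eq. (2) leaves undefined).

```python
import numpy as np


def gsa_minimize(fitness, init_positions, n_iter=50, g0=1.0, eps=1e-12, seed=0):
    """Minimize `fitness` with a basic GSA loop following Eqs. (2)-(9).

    fitness        : callable mapping a position vector of length n to a float
    init_positions : array of shape (N, n), initial positions of the N masses
    Returns the best evaluated position and its fitness value.
    """
    rng = np.random.default_rng(seed)
    x = np.array(init_positions, dtype=float)
    n_agents = x.shape[0]
    v = np.zeros_like(x)                        # v_i^d(0) = 0
    best_x, best_fit = x[0].copy(), np.inf

    for t in range(n_iter):
        fit = np.array([fitness(p) for p in x], dtype=float)
        i_best = int(fit.argmin())
        if fit[i_best] < best_fit:              # remember the best agent seen so far
            best_fit, best_x = float(fit[i_best]), x[i_best].copy()

        # Eqs. (2)-(3): for a minimization problem, worst(t) is the largest fitness.
        num = fit - fit.max()
        total = num.sum()
        # Assumption: equal masses when every agent has the same fitness (0/0 case).
        masses = num / total if total != 0 else np.full(n_agents, 1.0 / n_agents)

        g = g0 * (1.0 - t / n_iter)             # assumed linear decrease of G

        # Eq. (6): a_i = G * sum_{j != i} r_j * M_j / (R_ij + eps) * (x_j - x_i)
        diff = x[None, :, :] - x[:, None, :]    # diff[i, j] = x_j - x_i
        dist = np.linalg.norm(diff, axis=2)     # Eq. (9): Euclidean distance R_ij
        weights = rng.random((n_agents, n_agents)) * masses[None, :] / (dist + eps)
        np.fill_diagonal(weights, 0.0)          # exclude the term j == i
        accel = g * np.einsum("ij,ijd->id", weights, diff)

        # Eqs. (7)-(8): velocity and position updates.
        v = rng.random((n_agents, 1)) * v + accel
        x = x + v

    return best_x, best_fit
```

For the classifier, each position would be a flat vector of length C × D that the fitness callable reshapes into C prototypes of dimension D before scoring it on the training data.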

Fig. 2. The schematic of the proposed GSA-based prototype classifier in the training phase. (Blocks: training data, GSA, suggested representatives (one for each class), prototype classifier, distance measure, fitness function, fitness value.)

3.1.1. Fitness function fit1

According to this function, the coordinates of the prototypes are chosen so that the percentage of misclassification (in the training set, not in the test set) is minimized. The percentage of misclassification is calculated as follows [34]:

$$\mathrm{fit}_1 = \frac{100}{D_{Train}} \sum_{j=1}^{D_{Train}} m(I_j) \tag{10}$$


where $D_{Train}$ is the number of training examples, each represented by an ordered pair consisting of a vector $I_j$ and an output label from the label set of the data set. The term $m(I_j)$ indicates whether a misclassification has occurred: its value is 1 for a misclassification and 0 for a correct classification. The summation over all instances gives the number of misclassifications. The denominator $D_{Train}$ normalizes the number of misclassifications, and the factor 100 converts it to a percentage, so the fitness function varies in the interval [0, 100].

3.1.2. Fitness function fit2

The second fitness function is based on the Euclidean distance between each training instance $I_j$ and the representative (prototype) of its class. The formula is as follows [34]:

$$\mathrm{fit}_2 = \frac{1}{D_{Train}} \sum_{j=1}^{D_{Train}} d(I_j, P_j) \tag{11}$$

where d() is the Euclidean distance between a generic instance $I_j$ and $P_j$, the prototype of the class to which the instance belongs. The denominator $D_{Train}$ normalizes the objective value. This second fitness function has a key advantage over fit1: it is more continuous, owing to the continuous nature of the Euclidean distance, whereas fit1 can only change in steps corresponding to the fraction $1/D_{Train}$.

3.1.3. Fitness function fit3

In the third fitness function, each training instance is first assigned to the closest prototype. Then fit3 is calculated as a linear combination of Eqs. (10) and (11) [34]:

$$\mathrm{fit}_3 = \frac{1}{2} \left( \frac{\mathrm{fit}_1}{100} + \mathrm{fit}_2 \right) \tag{12}$$

To normalize fit3, fit1 is first divided by 100, and its sum with fit2 is then divided by 2. With these three fitness functions, the classification task is shaped into a minimization problem, and three versions of the GSA based prototype classifier are obtained. The performance of each version is then measured by the misclassification percentage of the instances under the best agent found. This misclassification percentage is obtained by comparing the assigned label with the correct label from the training examples.

Fig. 3. The schematic of the proposed GSA-based prototype classifier in the performance phase. (Blocks: final representatives obtained by GSA (one for each class), prototype classifier, distance measure, test data, class of data.)

4. Experimental results

4.1. UCI selected data sets

Again, to make the comparison possible, the same classification data sets from the UCI machine learning repository [41] as used in [34,35] are classified by the three versions of the GSA based classifier. In 9 data sets, the first 75% of the elements are used as the training set and the remaining 25% as the test set. In the data sets Glass, Thyroid, and Wine, the elements are ordered by class. Because a split of such ordered data would not represent all classes in both the training and test sets, we first shuffle the elements of these three data sets, as in [34] and [35], and then pick the training and test sets as before. The data set attributes are normalized before the three versions of GSA are run on them. The characteristics of these 12 data sets, summarized in Table 1, are as follows.

Balance data set: This data set models psychological experimental results. It contains 625 elements, classified into three classes: balance scale tips to the right, balance scale tips to the left, or balanced. The 4 attributes are the left weight, the left distance, the right weight, and the right distance. Of these elements, 469 are used for training and 156 for testing.

Cancer data set: This data set is based on the “breast cancer Wisconsin-Diagnostic” data set. It contains the symptoms by which a tumor is classified as benign or malignant. A benign tumor is one that is not likely to cause death, whereas a malignant tumor has advanced considerably, becomes uncontrollable, and is likely to lead to death. The data set includes 569 elements, and each instance uses 30 attributes out of 32.

Cancer-Int data set: This data set is based on the “breast cancer Wisconsin-Original” data set. Its description is the same as that of the previous one, but it contains 699 elements and each instance uses 9 attributes out of 10.

Credit data set: This data set concerns credit approval (the Australian version of credit). The names and values of the attributes have been changed to meaningless symbols to protect the confidentiality of the data. An advantage of this data set is its blend of continuous, small-valued nominal, and large-valued nominal attributes. The attributes are encoded into 51 input values [42].

Dermatology data set: This data set is used for the recognition of skin diseases. Dermatology is the scientific study of skin diseases. Two major difficulties in recognizing these diseases are that, first, a microscopic tissue test is required to confirm the disease and, second, a disease may show the characteristics of another disease in its first stages. The data set contains 366 elements, 34 inputs, and the following 6 classes: psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, and pityriasis rubra pilaris.

Diabetes data set: This data set is used for the recognition of diabetes. The data were gathered under constraints, such as the patients being females at least 21 years old of Pima Indian heritage. The data set has 768 elements; the first 576 are used as the training set and the remaining 192 as the test set. Each sample has 8 attributes.

E. coli data set: This data set includes protein localization sites for E. coli bacteria. Originally it has 336 members classified into 8 classes. Since 3 of the classes are represented by just 2, 2, and 5 instances, we set them aside. The resulting data set has 5 classes and 327 members, of which 245 are used for training and 82 for testing, as can be seen in Table 1.

Glass data set: This data set is used for the classification of glass types as float processed building windows, non-float processed building windows, vehicle windows, containers, tableware


and head lamps. This data set can be used in investigations in which a crime has been committed and a piece of glass is left behind. Once the glass type is classified correctly, it can be used as evidence against the criminal. The input attributes are 9 chemical measurements for each of the 6 glass types, which have 70, 76, 17, 13, 9, and 29 instances, respectively, of the total 214 instances. Of all the elements, 161 are selected for training and the remaining 53 for testing.

Horse data set: This data set is used to predict the fate of a horse that suffers from colic and severe pain. The fate is classified into the following classes: the horse will be euthanized (killed painlessly), the horse will survive, or it will die. The data set has 364 elements. Each element has 58 inputs formed from 27 attributes, and there are 3 classes.

Iris data set: This data set is the most popular UCI data set and is used for the classification of Iris flowers into one of three species, namely Setosa, Versicolor, and Virginica. The classification is based on 4 attributes, namely sepal length, sepal width, petal length, and petal width. The data set has 150 elements, of which 50 belong to each species.

Thyroid data set: This data set is used to classify the state of the thyroid gland into three classes: over function, normal function, or under function. The data set is based on the new-thyroid data and contains 215 elements. Each instance has 5 attributes.

Wine data set: This data set is used for the classification of wines into one of three cultivars based on a chemical analysis. The data set has 178 elements, and each instance has 13 attributes.

Table 1
Characteristics of the 12 UCI data sets.

Data set      # of data set elements   # of training data   # of testing data   # of input attributes   # of classes
Balance       625                      469                  156                 4                       3
Cancer        569                      427                  142                 30                      2
Cancer-Int    699                      524                  175                 9                       2
Credit        690                      518                  172                 51                      2
Dermatology   366                      274                  92                  34                      6
Diabetes      768                      576                  192                 8                       2
E. coli       327                      245                  82                  7                       5
Glass         214                      161                  53                  9                       6
Horse         364                      273                  91                  58                      3
Iris          150                      112                  38                  4                       3
Thyroid       215                      162                  53                  5                       3
Wine          178                      133                  45                  13                      3

4.2. Results

To classify using GSA, the following settings are applied: the population size N and the number of iterations T are set to 20 and 50, respectively. G is set using Eq. (13), where $G_0$ is set to 1, t is the current iteration, and T is the total number of iterations [37]:

$$G = G_0 \left( 1 - \frac{t}{T} \right) \tag{13}$$

Therefore, for the GSA-based classifier the number of fitness evaluations is 20 × 50 = 1000. In PSO and ABC, as reported in [35], the numbers of fitness evaluations are 50,000 and 20,000, respectively. The small number of fitness evaluations for GSA was chosen to keep the comparison as reliable as possible and to avoid results that could be attributed to giving GSA more time than the other algorithms. The percentage of misclassification is used as the comparison criterion. It is computed as follows: first, the entire test data is classified and the misclassifications are counted by comparison with the correct labels. Second, this number is divided by the cardinality of the test data. Finally, it is multiplied by 100 to obtain a percentage.
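The nearest-prototype rule, the three fitness functions of Section 3.1 and the misclassification criterion described above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the function names are hypothetical, class labels are assumed to be integers 0, ..., C − 1, and row k of `prototypes` is assumed to be the representative of class k.

```python
import numpy as np


def nearest_prototype(instances, prototypes):
    """Return, for each instance, the index of the closest prototype (Euclidean)."""
    dists = np.linalg.norm(instances[:, None, :] - prototypes[None, :, :], axis=2)
    return dists.argmin(axis=1)


def misclassification_percentage(instances, labels, prototypes):
    """Eq. (10) on training data; the comparison criterion when given test data."""
    predicted = nearest_prototype(instances, prototypes)
    return 100.0 * float(np.mean(predicted != labels))


def fit2(instances, labels, prototypes):
    """Eq. (11): mean distance from each instance to the prototype of its own class."""
    return float(np.mean(np.linalg.norm(instances - prototypes[labels], axis=1)))


def fit3(instances, labels, prototypes):
    """Eq. (12): average of fit1 / 100 and fit2."""
    fit1 = misclassification_percentage(instances, labels, prototypes)
    return 0.5 * (fit1 / 100.0 + fit2(instances, labels, prototypes))
```

Note that fit1 and fit3 depend on which prototype is closest to each instance, whereas fit2 uses only the distance to the prototype of the instance's own class, which is why it changes smoothly as the prototypes move.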

Table 2 shows the average misclassification percentages and rankings of the classifiers PSO, ABC, and GSA over the 12 chosen data sets. Each table entry, which belongs to a specific classifier and data set, contains the average misclassification percentage over 20 runs of that classifier on that data set. The number in brackets shows the ranking of the classifier on that data set according to this criterion. The best results are shown in bold. The PSO and ABC results are those reported in [34,35], respectively. As can be seen in Table 2, the GSA versions obtain acceptable results in comparison with PSO and ABC: only for the Iris and Wine data sets does ABC get the first rank, with GSA in the second and third ranks; in the other cases GSA has the first rank. For the Cancer-Int data set, ABC and GSA get the same result. ABC results are present only for fitness function fit2 because no experiments on fit1 and fit3 were reported in [35]. For this reason, we also limit our comparison to the case of fitness function fit2 in later tables.

In Table 3, the results of PSO, ABC, and GSA based on fitness function fit2 are displayed for a better comparison. PSO gets the first rank only in the Balance data set, ABC in five data sets (Cancer, Cancer-Int, Diabetes, Iris, and Wine), and GSA in seven data sets, including Cancer-Int.

Table 4 shows the average misclassification percentages, along with the relative rankings, of 12 different classifiers on the selected data sets. The table gives a comprehensive list of classifiers and provides a better comparative framework for the current work. Except for GSA, the results of the classifiers are those reported in [34,35], where more details about the classifiers in Table 4 can be found.

Table 5 shows the average misclassification percentage over all data sets for each of the 12 classifiers, along with the resulting ranks. In other words, the average of each column of the previous table is calculated first, and the classifiers are then ranked by it (ranks in brackets). Among all the classifiers, GSA, MLP-ANN, and Bayes-Net get the first, second, and third rankings, respectively. In Table 6 the comparison is made according to the sum of the ranks in each column of Table 4. Although this quantity reports results with lower precision in some cases, it is common in nonparametric statistics. As can be seen in this table, MLP-ANN, GSA, and ABC get the first, second, and third rankings, respectively. Although no universal classifier can get the best results on all available benchmarks, the results obtained by GSA confirm that the proposed GSA-based prototype classifier can be regarded as a worthwhile classifier beside other practical classifiers that have proved their efficiency thus far.
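The two aggregate criteria described above, the mean misclassification percentage over all data sets and the sum of per-data-set ranks, can be illustrated on the three fit2 columns of Table 3. The sketch below is only an illustration on that three-classifier subset, so its output is not the content of Tables 5 and 6. It assumes that tied classifiers share the best rank, which is how the bracketed ranks of Table 3 appear to treat ties.

```python
import numpy as np

# Average misclassification percentages using fit2, copied from Table 3.
# Values follow the data set order of Table 3 (Balance, ..., Wine).
table3 = {
    "PSO": [13.24, 3.49, 2.75, 22.19, 19.67, 23.22, 23.65, 40.18, 37.69, 3.68, 6.66, 6.22],
    "ABC": [15.38, 2.81, 0.00, 13.37, 5.43, 22.39, 13.41, 41.50, 38.26, 0.00, 3.77, 0.00],
    "GSA": [18.88, 4.16, 0.00, 10.94, 5.27, 23.59, 11.22, 32.07, 33.31, 2.63, 1.85, 2.27],
}
names = list(table3)
errors = np.array([table3[k] for k in names])   # shape: (3 classifiers, 12 data sets)

# First criterion: mean misclassification percentage over all data sets.
mean_error = dict(zip(names, errors.mean(axis=1)))

# Second criterion: sum of per-data-set ranks (1 = lowest error).
# Assumption: tied classifiers share the best rank.
# ranks[i, d] = 1 + number of classifiers k with errors[k, d] < errors[i, d]
ranks = 1 + (errors[None, :, :] < errors[:, None, :]).sum(axis=1)
rank_sum = dict(zip(names, ranks.sum(axis=1)))
```

On these three columns the rank sums are 30 for PSO, 21 for ABC and 20 for GSA, and the mean errors are about 16.89, 13.03 and 12.18, respectively. On this subset the two criteria give the same ordering, whereas the text above reports that they order the 12 classifiers of Table 4 differently.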


Table 2
Average misclassification percentages and rankings of each of the three classifiers PSO, ABC, GSA executed on 12 UCI data sets. The number in brackets in each table slot shows the ranking of each classifier.

Data set      PSO fit1     PSO fit2     PSO fit3     ABC fit2     GSA fit1     GSA fit2     GSA fit3
Balance       25.47 (7)    13.24 (4)    13.12 (3)    15.38 (5)    12.67 (1)    18.88 (6)    12.92 (2)
Cancer        5.80 (7)     3.49 (4)     3.49 (4)     2.81 (3)     1.53 (2)     4.16 (6)     1.39 (1)
Cancer-Int    2.87 (7)     2.75 (6)     2.64 (5)     0 (1)        0.97 (4)     0 (1)        0.91 (3)
Credit        22.96 (7)    22.19 (6)    18.77 (5)    13.37 (2)    14.36 (3)    10.94 (1)    14.42 (4)
Dermatology   5.76 (5)     19.67 (7)    6.08 (6)     5.43 (4)     3.80 (1)     5.27 (3)     4.12 (2)
Diabetes      22.50 (5)    23.22 (6)    21.77 (3)    22.39 (4)    21.40 (2)    23.59 (7)    20.88 (1)
E. coli       14.63 (6)    23.65 (7)    13.90 (5)    13.41 (4)    7.37 (2)     11.22 (3)    7.31 (1)
Glass         40.18 (5)    40.18 (5)    38.67 (4)    41.50 (7)    33.01 (3)    32.07 (2)    31.89 (1)
Horse         40.98 (7)    37.69 (5)    35.16 (4)    38.26 (6)    33.91 (3)    33.31 (2)    33.15 (1)
Iris          2.63 (3)     3.68 (6)     5.26 (7)     0 (1)        1.84 (2)     2.63 (3)     2.89 (5)
Thyroid       5.55 (6)     6.66 (7)     3.88 (5)     3.77 (4)     3.14 (2)     1.85 (1)     3.14 (2)
Wine          2.22 (4)     6.22 (7)     2.88 (6)     0 (1)        0.90 (2)     2.27 (5)     1.58 (3)

Table 3 Average misclassification percentages and rankings of PSO, ABC, and GSA using fit2 on each of the 12 chosen UCI data sets. The number in brackets in each table slot shows the ranking of each classifier.

| Data set | PSO fit2 | ABC fit2 | GSA fit2 |
|---|---|---|---|
| Balance | 13.24 (1) | 15.38 (2) | 18.88 (3) |
| Cancer | 3.49 (2) | 2.81 (1) | 4.16 (3) |
| Cancer-Int | 2.75 (3) | 0 (1) | 0 (1) |
| Credit | 22.19 (3) | 13.37 (2) | 10.94 (1) |
| Dermatology | 19.67 (3) | 5.43 (2) | 5.27 (1) |
| Diabetes | 23.22 (2) | 22.39 (1) | 23.59 (3) |
| E. coli | 23.65 (3) | 13.41 (2) | 11.22 (1) |
| Glass | 40.18 (2) | 41.50 (3) | 32.07 (1) |
| Horse | 37.69 (2) | 38.26 (3) | 33.31 (1) |
| Iris | 3.68 (3) | 0 (1) | 2.63 (2) |
| Thyroid | 6.66 (3) | 3.77 (2) | 1.85 (1) |
| Wine | 6.22 (3) | 0 (1) | 2.27 (2) |
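The bracketed ranks in Tables 2 and 3 appear to follow standard competition ranking: the lowest misclassification percentage gets rank 1, tied values share a rank, and the rank after a tie is skipped. The following is a minimal Python sketch under that assumption; the helper name `competition_ranks` is illustrative and does not come from the paper. It reproduces three rows of Table 3.

```python
def competition_ranks(errors):
    """Rank ascending: rank = 1 + number of strictly smaller values.

    Tied values share the same rank and the following rank is skipped.
    """
    return [1 + sum(other < e for other in errors) for e in errors]


# Three rows of Table 3, in column order (PSO fit2, ABC fit2, GSA fit2).
rows = {
    "Balance": [13.24, 15.38, 18.88],
    "Cancer-Int": [2.75, 0.0, 0.0],
    "Iris": [3.68, 0.0, 2.63],
}

row_ranks = {name: competition_ranks(values) for name, values in rows.items()}
for name, ranks in row_ranks.items():
    print(name, ranks)
```

On the Cancer-Int row, ABC and GSA both reach 0 and share rank 1, so PSO receives rank 3 rather than 2, which matches the table.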

Table 4 Average misclassification percentages along with relative rankings for each of the 12 different classifiers running on UCI data sets. The number in brackets in each table slot shows the ranking of each classifier.

| Data set | Bayes Net | MLP ANN | RBF | KStar | Bagging | MultiBoost | NBTree | Ridor | VFI | PSO fit3 | ABC fit2 | GSA fit3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Balance | 19.74 (7) | 9.29 (1) | 33.61 (11) | 10.25 (2) | 14.77 (5) | 24.20 (10) | 19.74 (7) | 20.63 (9) | 38.85 (12) | 13.12 (4) | 15.38 (6) | 12.92 (3) |
| Cancer | 4.19 (6) | 2.93 (4) | 20.27 (12) | 2.44 (2) | 4.47 (7) | 5.59 (8) | 7.69 (11) | 6.36 (9) | 7.34 (10) | 3.49 (5) | 2.81 (3) | 1.39 (1) |
| Cancer-Int | 3.42 (4) | 5.25 (8) | 8.17 (12) | 4.57 (6) | 3.93 (5) | 5.14 (7) | 5.71 (10) | 5.48 (9) | 5.71 (10) | 2.64 (3) | 0.00 (1) | 0.91 (2) |
| Credit | 12.13 (2) | 13.81 (6) | 43.29 (12) | 19.18 (11) | 10.68 (1) | 12.71 (4) | 16.18 (8) | 12.65 (3) | 16.47 (9) | 18.77 (10) | 13.37 (5) | 14.42 (7) |
| Dermatology | 1.08 (1) | 3.26 (3) | 34.66 (11) | 4.66 (6) | 3.47 (4) | 53.26 (12) | 1.08 (1) | 7.92 (10) | 7.60 (9) | 6.08 (8) | 5.43 (7) | 4.12 (5) |
| Diabetes | 25.52 (4) | 29.16 (8) | 39.16 (12) | 34.05 (10) | 26.87 (6) | 27.08 (7) | 25.52 (4) | 29.31 (9) | 34.37 (11) | 21.77 (2) | 22.39 (3) | 20.88 (1) |
| E. coli | 17.07 (6) | 13.53 (3) | 24.38 (11) | 18.29 (9) | 15.36 (5) | 31.70 (12) | 20.73 (10) | 17.07 (6) | 17.07 (6) | 13.90 (4) | 13.41 (2) | 7.31 (1) |
| Glass | 29.62 (5) | 28.51 (4) | 44.44 (11) | 17.58 (1) | 25.36 (3) | 53.70 (12) | 24.07 (2) | 31.66 (6) | 41.11 (9) | 38.67 (8) | 41.50 (10) | 31.89 (7) |
| Horse | 30.76 (2) | 32.19 (5) | 38.46 (10) | 35.71 (8) | 30.32 (1) | 38.46 (10) | 31.86 (3) | 31.86 (3) | 41.75 (12) | 35.16 (7) | 38.26 (9) | 33.15 (6) |
| Iris | 2.63 (7) | 0.00 (1) | 9.99 (12) | 0.52 (5) | 0.26 (4) | 2.63 (7) | 2.63 (7) | 0.52 (5) | 0.00 (1) | 5.26 (11) | 0.00 (1) | 2.89 (10) |
| Thyroid | 6.66 (6) | 1.85 (1) | 5.55 (5) | 13.32 (11) | 14.62 (12) | 7.40 (7) | 11.11 (9) | 8.51 (8) | 11.11 (9) | 3.88 (4) | 3.77 (3) | 3.14 (2) |
| Wine | 0.00 (1) | 1.33 (3) | 2.88 (7) | 3.99 (9) | 2.66 (6) | 17.77 (12) | 2.22 (5) | 5.10 (10) | 5.77 (11) | 2.88 (7) | 0.00 (1) | 1.58 (4) |

Table 5 Average misclassification percentages on the entire data sets for each of the 12 different classifiers along with the new relative obtained ranks.

| Classifier | Bayes Net | MLP ANN | RBF | KStar | Bagging | MultiBoost | NBTree | Ridor | VFI | PSO | ABC | GSA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Average | 12.73 (3) | 11.75 (2) | 25.40 (12) | 13.71 (6) | 12.73 (3) | 23.30 (11) | 14.04 (8) | 14.75 (9) | 18.92 (10) | 13.80 (7) | 13.02 (5) | 11.21 (1) |

Table 6 Sum of rankings of Table 4 on each column.

| Classifier | Bayes Net | MLP ANN | RBF | KStar | Bagging | MultiBoost | NBTree | Ridor | VFI | PSO | ABC | GSA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sum of ranks | 51 (3) | 47 (1) | 126 (12) | 80 (8) | 59 (5) | 108 (10) | 77 (7) | 87 (9) | 109 (11) | 73 (6) | 51 (3) | 49 (2) |
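The aggregation behind Tables 5 and 6 can be checked with a short script. The Python sketch below recomputes both summaries from the Table 4 percentages, under two assumptions that are inferred from the printed numbers rather than stated in the text: ranks use competition ranking (ties share the best rank), and averages are truncated, not rounded, to two decimals before being ranked. The names `TABLE4`, `competition_ranks`, and `summarize` are illustrative.

```python
# Table 4 percentages, one list per classifier, in data-set order:
# Balance, Cancer, Cancer-Int, Credit, Dermatology, Diabetes,
# E. coli, Glass, Horse, Iris, Thyroid, Wine
TABLE4 = {
    "Bayes Net":  [19.74, 4.19, 3.42, 12.13, 1.08, 25.52, 17.07, 29.62, 30.76, 2.63, 6.66, 0.00],
    "MLP ANN":    [9.29, 2.93, 5.25, 13.81, 3.26, 29.16, 13.53, 28.51, 32.19, 0.00, 1.85, 1.33],
    "RBF":        [33.61, 20.27, 8.17, 43.29, 34.66, 39.16, 24.38, 44.44, 38.46, 9.99, 5.55, 2.88],
    "KStar":      [10.25, 2.44, 4.57, 19.18, 4.66, 34.05, 18.29, 17.58, 35.71, 0.52, 13.32, 3.99],
    "Bagging":    [14.77, 4.47, 3.93, 10.68, 3.47, 26.87, 15.36, 25.36, 30.32, 0.26, 14.62, 2.66],
    "MultiBoost": [24.20, 5.59, 5.14, 12.71, 53.26, 27.08, 31.70, 53.70, 38.46, 2.63, 7.40, 17.77],
    "NBTree":     [19.74, 7.69, 5.71, 16.18, 1.08, 25.52, 20.73, 24.07, 31.86, 2.63, 11.11, 2.22],
    "Ridor":      [20.63, 6.36, 5.48, 12.65, 7.92, 29.31, 17.07, 31.66, 31.86, 0.52, 8.51, 5.10],
    "VFI":        [38.85, 7.34, 5.71, 16.47, 7.60, 34.37, 17.07, 41.11, 41.75, 0.00, 11.11, 5.77],
    "PSO fit3":   [13.12, 3.49, 2.64, 18.77, 6.08, 21.77, 13.90, 38.67, 35.16, 5.26, 3.88, 2.88],
    "ABC fit2":   [15.38, 2.81, 0.00, 13.37, 5.43, 22.39, 13.41, 41.50, 38.26, 0.00, 3.77, 0.00],
    "GSA fit3":   [12.92, 1.39, 0.91, 14.42, 4.12, 20.88, 7.31, 31.89, 33.15, 2.89, 3.14, 1.58],
}


def competition_ranks(values):
    """Rank ascending; tied values share a rank and the next rank is skipped."""
    return [1 + sum(other < v for other in values) for v in values]


def summarize(table):
    """Return {classifier: (average, average rank, rank sum, rank of rank sum)}."""
    names = list(table)
    n_sets = len(table[names[0]])
    # Work in integer hundredths of a percent to avoid floating-point round-off.
    hundredths = {n: [round(v * 100) for v in table[n]] for n in names}
    # Integer division truncates the average to two decimals (assumed convention).
    avg = [sum(hundredths[n]) // n_sets for n in names]
    avg_rank = competition_ranks(avg)
    rank_sum = [0] * len(names)
    for j in range(n_sets):
        per_set = competition_ranks([hundredths[n][j] for n in names])
        rank_sum = [total + r for total, r in zip(rank_sum, per_set)]
    sum_rank = competition_ranks(rank_sum)
    return {n: (avg[i] / 100, avg_rank[i], rank_sum[i], sum_rank[i])
            for i, n in enumerate(names)}


summary = summarize(TABLE4)
for name, (average, a_rank, r_sum, s_rank) in summary.items():
    print(f"{name:10s} avg={average:5.2f} ({a_rank})  rank sum={r_sum} ({s_rank})")
```

Working in integer hundredths keeps the tie between Bayes Net and Bagging in Table 5 intact: their exact averages differ slightly (about 12.735 versus 12.731), and both truncate to 12.73.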


5. Conclusion

This paper proposed a novel prototype classifier that employs GSA as a global searcher to find the best positions of the representatives (prototypes). The proposed method was applied to the classification of benchmark problems, and the performance of the GSA-based classifier was compared with ABC, PSO, and nine other well-known classifiers that are widely used in the literature. The experimental results confirm the effectiveness and efficiency of the proposed method and show that it can successfully be applied to classification problems.

Acknowledgements

The authors would like to extend their appreciation to Saber Zahedi and Abdol-hamid Haeri for proofreading the manuscript and providing valuable comments. In addition, the authors would like to thank the ASOC Editorial Board and the anonymous reviewers for their very helpful suggestions.

References

[1] T. Mitchell, Machine Learning, McGraw-Hill, 1997, ISBN 0070428077.
[2] A. Unler, A. Murat, A discrete particle swarm optimization method for feature selection in binary classification problems, European Journal of Operational Research 206 (3) (2010) 528–539.
[3] S.D. Bhavani, T.S. Rani, R.S. Bapi, Feature selection using correlation fractal dimension: issues and applications in binary classification problems, Applied Soft Computing 8 (1) (2008) 555–563.
[4] N. García-Pedrajas, D. Ortiz-Boyer, An empirical study of binary classifier fusion methods for multiclass classification, Information Fusion 12 (2) (2011) 111–130.
[5] K. Polat, S. Güneş, A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems, Expert Systems with Applications 36 (2) (2009) 1587–1592.
[6] M.W. Kurzynski, The optimal strategy of a tree classifier, Pattern Recognition 16 (1983) 81–87.
[7] F. Seifi, M.R. Kangavari, H. Ahmadi, E. Lotfi, S. Imaniyan, S. Lagzian, Optimizing twins decision tree classification using genetic algorithms, in: 7th IEEE International Conference on Cybernetic Intelligent Systems, 2008, pp. 1–6.
[8] P. Knagenhjelm, P. Brauer, Classification of vowels in continuous speech using MLP and a hybrid net, Speech Communication 9 (1) (1990) 31–34.
[9] J-Y. Wu, MIMO CMAC neural network classifier for solving classification problems, Applied Soft Computing 11 (2) (2011) 2326–2333.
[10] C.R. De Silva, S. Ranganath, L.C. De Silva, Cloud basis function neural network: a modified RBF network architecture for holistic facial expression recognition, Pattern Recognition 41 (4) (2008) 1241–1253.
[11] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., Wiley, New York, 2001.
[12] Y. Liu, Z. You, L. Cao, A novel and quick SVM-based multi-class classifier, Pattern Recognition 39 (11) (2006) 2258–2264.
[13] H. Qian, Y. Mao, W. Xiang, Z. Wang, Recognition of human activities using SVM multi-class classifier, Pattern Recognition Letters 31 (2) (2010) 100–111.
[14] R. Kumar, A. Kulkarni, V.K. Jayaraman, B.D. Kulkarni, Symbolization assisted SVM classifier for noisy data, Pattern Recognition Letters 25 (4) (2004) 495–504.
[15] R. Kumar, V.K. Jayaraman, B.D. Kulkarni, An SVM classifier incorporating simultaneous noise reduction and feature selection: illustrative case examples, Pattern Recognition 38 (1) (2005) 41–49.
[16] E.J.R. Justino, F. Bortolozzi, R. Sabourin, A comparison of SVM and HMM classifiers in the off-line signature verification, Pattern Recognition Letters 26 (9) (2005) 1377–1385.
[17] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second ed., Springer Verlag, New York, 2009.
[18] C.-L. Liu, M. Nakagawa, Evaluation of prototype learning algorithms for nearest-neighbor classifier in application to handwritten character recognition, Pattern Recognition 34 (2001) 601–615.
[19] F. Shen, O. Hasegawa, A fast nearest neighbor classifier based on self-organizing incremental neural network, Neural Networks 21 (10) (2008) 1537–1547.
[20] F. Chang, C.-H. Chou, C.-C. Lin, C.-J. Chen, A prototype classification method and its application to handwritten character recognition, in: IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, March, 2004, pp. 4738–4743.
[21] S.R.H. Hatem, A. Fayed, A.F. Atiya, Self-generating prototypes for pattern classification, Pattern Recognition 40 (5) (2007) 1498–1509.
[22] C.-H. Chou, C.-C. Lin, Y.-H. Liu, F. Chang, A prototype classification method and its use in a hybrid solution for multiclass pattern recognition, Pattern Recognition 39 (4) (2006) 624–634.
[23] C.-C. Hung, L. Wan, Hybridization of particle swarm optimization with the K-Means algorithm for image classification, in: IEEE Symposium on Computational Intelligence for Image Processing, 2009, pp. 60–64.
[24] S. Su, Image classification based on particle swarm optimization combined with K-means, in: International Conference on Test and Measurement (ICTM), Hong Kong, 5–6 December, 2009, pp. 367–370.
[25] H.-H. Song, S.-W. Lee, LVQ combined with simulated annealing for optimal design of large-set reference models, Neural Networks 9 (2) (1996) 329–336.
[26] D.T. Pham, S. Otri, A. Ghanbarzadeh, E. Koç, Application of the bees algorithm to the training of learning vector quantization networks for control chart pattern recognition, in: Information and Communication Technologies (ICTTA), 2006, pp. 1624–1629.
[27] C.M. Bishop, Pattern Recognition and Machine Learning, first ed., Springer, 2006.
[28] Z. Botev, D.P. Kroese, Global likelihood optimization via the cross-entropy method, with an application to mixture models, in: R.G. Ingalls, M.D. Rossetti, J.S. Smith, B.A. Peters (Eds.), IEEE Proceedings of the 2004 Winter Simulation Conference, December, Washington DC, 2004, pp. 529–535.
[29] A. Bessadok, P. Hansen, A. Rebai, EM algorithm and variable neighborhood search for fitting finite mixture model parameters, in: Proceedings of the International Multiconference on Computer Science and Information Technology, 2009, pp. 725–733.
[30] X. Zhou, X. Wang, Optimisation of Gaussian mixture model for satellite image classification, IEE Proceedings - Vision, Image and Signal Processing 153 (3) (2006) 349–356.
[31] Y. Liao, V.R. Vemuri, Use of K-nearest neighbor classifier for intrusion detection, Computers & Security 21 (5) (2002) 439–448.
[32] C.-L. Liu, M. Nakagawa, Prototype learning algorithms for nearest neighbor classifier with application to handwritten character recognition, in: ICDAR'99: IEEE Proceedings of the Fifth International Conference on Document Analysis and Recognition, Washington DC, USA, 1999, pp. 378–381.
[33] L. Jiang, Z. Cai, D. Wang, S. Jiang, Survey of improving k-nearest-neighbor for classification, in: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), vol. 1, Haikou, China, August 24–27, 2007, pp. 679–683.
[34] I. De Falco, A. Della Cioppa, E. Tarantino, Facing classification problems with particle swarm optimization, Applied Soft Computing 7 (3) (2007) 652–658.
[35] D. Karaboga, C. Ozturk, A novel clustering approach: artificial bee colony (ABC) algorithm, Applied Soft Computing 11 (1) (2011) 652–657.
[36] E. Rashedi, H. Nezamabadi-pour, S. Saryazdi, GSA: a gravitational search algorithm, Information Sciences 179 (2009) 2232–2248.
[37] E. Rashedi, H. Nezamabadi-pour, S. Saryazdi, BGSA: binary gravitational search algorithm, Natural Computing 9 (3) (2010) 727–745.
[38] E. Rashedi, H. Nezamabadi-pour, S. Saryazdi, Filter modeling using gravitational search algorithm, Engineering Applications of Artificial Intelligence 24 (1) (2011) 117–122.
[39] C. Li, J. Zhou, Parameters identification of hydraulic turbine governing system using improved gravitational search algorithm, Energy Conversion and Management 52 (1) (2011) 374–381.
[40] A. Chatterjee, G.K. Mahanti, Comparative performance of gravitational search algorithm and modified particle swarm optimization algorithm for synthesis of thinned scanned concentric ring array antenna, Progress in Electromagnetics Research B 25 (2010) 331–348.
[41] A.C.L. Blake, C.J. Merz, University of California at Irvine Repository of Machine Learning Databases, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[42] L. Prechelt, Proben1: A set of neural network benchmark problems and benchmarking rules, Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe. Available via: ftp.ira.uka.de, 1994.
