Hybridizing firefly algorithms with a probabilistic neural network for solving classification problems


Applied Soft Computing xxx (2015) xxx–xxx



Mohammed Alweshah (a), Salwani Abdullah (b,∗)

(a) Prince Abdullah Bin Ghazi Faculty of Information Technology, Al-Balqa Applied University, Salt, Jordan
(b) Data Mining and Optimization Research Group (DMO), Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor, Malaysia

∗ Corresponding author. Tel.: +60 389216667. E-mail addresses: [email protected] (M. Alweshah), [email protected] (S. Abdullah).


Article history: Received 5 March 2013; Received in revised form 15 June 2015; Accepted 16 June 2015; Available online xxx.


Keywords: Firefly algorithm; Lévy flight; Simulated annealing; Probabilistic neural networks; Classification problems.


Abstract

Classification is one of the important tasks in data mining. The probabilistic neural network (PNN) is a well-known and efficient approach for classification. The objective of the work presented in this paper is to build on this approach to develop an effective method for classification problems that can find high-quality solutions (with respect to classification accuracy) at a high convergence speed. To achieve this objective, we propose a method that hybridizes the firefly algorithm with simulated annealing (denoted as SFA), where simulated annealing is applied to control the randomness step inside the firefly algorithm while optimizing the weights of the standard PNN model. We also extend our work by investigating the effectiveness of using Lévy flight within the firefly algorithm (denoted as LFA) to better explore the search space, and by integrating SFA with Lévy flight (denoted as LSFA) to improve the performance of the PNN. The algorithms were tested on 11 standard benchmark datasets. Experimental results indicate that the LSFA performs better than the SFA and LFA. Moreover, when compared with other algorithms in the literature, the LSFA is able to obtain better results in terms of classification accuracy.

1. Introduction

Classification is one of the key data mining tasks. Classification maps data into predefined groups or classes. It is a form of supervised learning because the classes are learned before the data is examined. The goal of a classification method is to create, from historical data, a model that correctly maps the input to the output, so that the model can be used to predict the output when the desired output is unknown. Several techniques have been successfully used for classification problems, including the neural network (NN) [1], support vector machine (SVM) [2], naive Bayes (NB) [3], radial basis function (RBF) [4], logistic regression (LR) [5], K-nearest neighbours (KNN), and the iterative dichotomiser 3 (ID3) [6]. The NN is one of the most well-known and widely used techniques for classification. The NN model was first proposed by Rosenblatt in the late 1950s [7]. Since then, many NN models have been developed, including feed-forward networks, RBF networks, the multi-layer perceptron, modular networks, and the probabilistic neural network (PNN).


These models differ from each other in terms of architecture, behaviour, and learning approach; hence they are suitable for solving different problems such as time series forecasting [5], stock market prediction [8], weather prediction [9], and pattern recognition [10]. The PNN is one of the appropriate approaches for solving classification problems. It is a general NN model based on the notion of the gradient steepest-descent method, which reduces the error between the actual and predicted outputs by permitting the network to correct the network weights [1,6,11–14].

Recently, the hybridization of metaheuristics with different kinds of classifiers has been investigated, and the developed models, some of which are described below, show better performance than the above-mentioned standard classification approaches. Both single-solution-based and population-based metaheuristics can be used to train a NN. Single-solution-based approaches include tabu search [15] and the simulated annealing (SA) approach [14], the latter of which is based on a Monte Carlo model that was applied by Metropolis et al. to replicate energy levels in cooling solids. The population-based approach in combination with a NN has attracted great interest because NNs combined with evolutionary algorithms (EAs) result in better intelligent systems than NNs or EAs alone [16,17]. Among these population-based approaches, a particle swarm optimization (PSO) algorithm in isolation and a PSO algorithm hybridized with a local search operator have both been employed to train a NN [18,19].


Other swarm intelligence methods such as ant colony optimization (ACO) have also been employed [20,21]. In addition, Chen [22] proposed a novel hybrid algorithm based on the artificial fish swarm algorithm. The genetic algorithm (GA) [23], differential evolution [24], improved bacterial chemotaxis optimization (IBCO) [25], the electromagnetism-like mechanism-based algorithm (EMA) [26], and the harmony search algorithm (HSA) [27–30] are some of the other important methods that have been proposed in recent years.

The efficiency of metaheuristic algorithms can be attributed to their ability to imitate the best features in nature and the 'selection of the fittest' in biological systems. The two most important characteristics of metaheuristic algorithms are diversification and intensification [31]. The aim of intensification, also called exploitation, is to search locally and more intensively, while diversification, also called exploration, aims to ensure that the algorithm explores the search space globally. The two terms might appear contradictory, but their balanced combination is crucial to the success of any metaheuristic algorithm [7,31–33]. In firefly algorithm (FA) optimization, the diversification component is represented by the random movement component, while the intensification component is implicitly controlled by the attraction between fireflies and the attractiveness strength. Unlike in other metaheuristics, the interactions between exploration and exploitation in the FA are intermingled, which might be an important factor in its success in solving classification problems.

In this work, we investigate combining the FA with a SA algorithm and hybridizing the FA with Lévy flight in an attempt to improve the performance of the PNN by creating an effective balance between exploration and exploitation during the optimization process. We attempt to achieve this by controlling the randomness steps and exploring the search space efficiently in order to find the optimal weights of the PNN classification technique. Such hybridization requires a fine balance between diversification and intensification to ensure faster and more efficient convergence and high-quality solutions. Therefore, from the first iteration we calculate the accuracy by modifying the PNN weights using the FA and SA to improve the quality of the best solution. To our knowledge, this is the first attempt to hybridize the FA with SA for classification problems.

The rest of the paper is organized as follows: Section 2 presents the background and literature on FAs and Section 3 describes the proposed method. Section 4 presents a discussion of the experimental results and Section 5 provides details of the computational complexity of the proposed method. Section 6 concludes the work presented in this paper.

2. Firefly algorithm: background and literature

The FA was initially developed by Yang [34] as a population-based technique for solving optimization problems. It was motivated by the short and rhythmic flashing light produced by fireflies. These flashing lights enable fireflies to attract each other, assist them in finding a mate, attract prey, and protect them by creating a sense of fear in the minds of predators [35]. Less bright fireflies are attracted by brighter fireflies, and the brightness of the light of a firefly is affected by the landscape [7,36]. This process can be formulated as an optimization algorithm because the flashing lights (solutions) can be matched with the fitness function to be optimized. The FA follows three rules: (1) fireflies are unisex; (2) a less bright firefly is attracted to the randomly moving brighter fireflies; and (3) the brightness of every firefly symbolizes the quality of the solution.

Łukasik and Żak employed a FA for continuous constrained optimization tasks and found that it consistently outperforms PSO [37]. Yang compared a FA with PSO on various test functions and found that the FA obtains better results than both PSO and a GA in terms of efficiency and success rate; it has also been found that the broadcasting ability of the FA gives better and quicker convergence towards optimality [34]. In a similar work by Yang and Deb, experimental results revealed that the FA outperforms other approaches such as PSO [38]. Sayadi introduced a FA with local search for minimizing the makespan in permutation flow shop scheduling problems, and the initial results indicated that the proposed method performs better than an ACO algorithm [39]. Gandomi applied a FA to mixed-variable structural optimization problems, and the empirical results showed that the FA is better than PSO, GA, SA, and HSA [40]. Another work on the FA can be found in [35], where it was successfully applied to solve the economic emissions load dispatch problem. In light of the foregoing, it can be seen that the FA is more effective than some other methods, which motivated us to further investigate its performance with respect to the classification problem.

In the FA, the attractiveness function of a firefly is defined as follows:

\beta(r) = \beta_0 \exp(-\gamma r^2)    (1)

where r is the distance between any two fireflies, β₀ is the initial attractiveness at r = 0 (set to 1 in this work), and γ is an absorption coefficient that controls the decrease of the light intensity (also set to 1 in this work). The distance between any two fireflies i and j at positions x_i and x_j, respectively, can be defined as a Cartesian or Euclidean distance as follows:

r_{ij} = \|x_i - x_j\| = \sqrt{\sum_{k=1}^{d} (x_{i,k} - x_{j,k})^2}    (2)

where d is the dimensionality of the given problem. The movement of a firefly i, which is attracted by a brighter firefly j, is represented by the following equation:



x_i = x_i + \beta_0 \exp(-\gamma r_{ij}^2)\,(x_j - x_i) + \alpha\,(\mathrm{rand} - \tfrac{1}{2})    (3)

where the first term is the current position of the firefly, the second term accounts for the attraction of the firefly towards the light intensity of neighbouring fireflies, and the third term produces the random movement of a firefly when it cannot 'see' any brighter ones. The coefficient α is a randomization parameter determined by the problem of interest, while rand is a random number uniformly distributed in [0, 1] [41].
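As an illustration of Eqs. (1)-(3), the following minimal Python sketch performs one firefly move; β₀ = γ = 1 follow the settings used in this work (see Table 3), while α = 0.2 is an assumed value, since the randomization parameter is problem-dependent. This is not the authors' Matlab implementation.

import numpy as np

def move_firefly(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2, rng=np.random):
    """One movement step of firefly i towards a brighter firefly j, Eq. (3)."""
    r = np.linalg.norm(x_i - x_j)                 # Cartesian distance, Eq. (2)
    beta = beta0 * np.exp(-gamma * r ** 2)        # attractiveness, Eq. (1)
    step = alpha * (rng.rand(*x_i.shape) - 0.5)   # random term, alpha * (rand - 1/2)
    return x_i + beta * (x_j - x_i) + step

Because β decays with distance, far-away fireflies exert little pull and the random term dominates, which is exactly the exploration behaviour that the hybrid methods proposed below seek to control.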

3. Proposed method: hybridized FAs

According to Specht [11] and Paliwal and Kumar [42], the PNN is an effective approach for solving classification problems. A PNN has a relatively faster training process than the backpropagation NN and has an inherently parallel structure that is guaranteed to converge to an optimal classifier as the size of the representative training set increases; training samples can also be added or removed without extensive retraining [11,42]. A PNN consists of four layers: input, pattern, summation, and output, as shown in Fig. 1. The input layer is the first layer of neurons; each input neuron represents a separate attribute in the training/test datasets (for example, from x_1 to x_n), so the number of inputs is equal to the number of attributes in the dataset.
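As a minimal sketch of this layer structure (in Python rather than the authors' Matlab, with an assumed smoothing parameter sigma, and without the weighted connections w(ij) that the proposed methods optimize), a plain PNN forward pass can be written as:

import numpy as np

def pnn_classify(x, train_X, train_y, sigma=0.1):
    # Pattern layer: one Gaussian kernel per training sample.
    k = np.exp(-np.sum((train_X - x) ** 2, axis=1) / (2 * sigma ** 2))
    # Summation layer: average kernel activation per class.
    classes = np.unique(train_y)
    scores = [k[train_y == c].mean() for c in classes]
    # Output layer: winner-takes-all over the class scores.
    return classes[int(np.argmax(scores))]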


Fig. 1. Mechanism of firefly algorithm with probabilistic neural network.


The values from the input data are then multiplied by the appropriate weights w(ij), as determined by the PNN algorithm shown in Fig. 1, and are transmitted to the pattern layer. The output layer is the last layer and typically contains only one neuron, because only one output (the predicted class) is usually requested. During the training phase, the goal is to determine the most accurate weights to be assigned to the connector lines. In this phase, the output is computed repeatedly and the result is compared with the preferred output generated from the training/test datasets. As shown in Fig. 2, the procedure starts from initial weights that are randomly generated by the original PNN classification model; the values from the input data are then multiplied by these weights w(ij).

In this work, we focus on exploration and exploitation [33] because the balance of these two components is crucial to the success of any metaheuristic algorithm [31,33]. We therefore propose three hybrid methods for data classification, all based on the FA. The FA was chosen in order to obtain the optimal parameter settings for training a PNN and to achieve the best accuracy. The random step length in the FA (the random distance between the current position of a firefly and neighbouring fireflies) is neither limited nor controlled in the original firefly mechanism. The first hybrid algorithm tries to speed up the convergence of the FA towards the optimal solution by integrating it with SA: after the population is evaluated, the best result found by the FA is passed to the SA algorithm to generate a neighbour solution. The second hybrid algorithm consists of a FA integrated with Lévy flight: because the third term in the firefly movement step is random, we use Lévy flight to control the randomization parameter in order to achieve a balance between exploration and exploitation. In sum, the two base hybridization methods proposed are a FA with a SA algorithm (denoted as SFA) and a FA with Lévy flight (denoted as LFA). The third hybrid algorithm integrates SFA and LFA (denoted as LSFA).

3.1. FA with SA (SFA)

In this method, denoted as SFA, the FA is hybridized with SA to solve classification problems. The hybridization is employed to achieve a balance between exploration and exploitation in order to ensure efficient convergence and an accurate solution. The SA algorithm is based on a Monte Carlo model that was applied to replicate energy levels in cooling solids, and the cooling process is central to it: the temperature follows the chosen cooling schedule, and getting stuck in local optima is avoided by probabilistically accepting worsening moves, even when the algorithm begins from a poor initial solution. After the initial temperature, final temperature, and cooling rate parameters are initialized, the algorithm generates a neighbour of the current solution and accepts it if it is superior to the existing solution.


Fig. 2. Representation of initial weights (initial weights w(ij) generated by the PNN are updated through the firefly algorithm movement to give the weights by FA).


However, if the outcome is not better than the existing one, the algorithm checks the probability rule and accepts the new solution if it fulfils that rule. The procedure proceeds until the stopping criterion is satisfied.

Fig. 3 shows the mechanism of the original PNN, where training data is used to train the PNN and the test data is then classified using the trained network. At this stage, the accuracy of the classified data is calculated by Eq. (7). Fig. 4 shows the structure of the proposed SFA approach: the FA is invoked to adjust the weights of the PNN, the test data is classified, and the accuracy of the newly classified data is calculated.

The first phase of the proposed approach concerns the FA, which is used as an improvement algorithm. As mentioned above, the FA is based on the light intensity of fireflies: brighter fireflies attract other fireflies to the locations of more efficient solutions and, finally, to the optimal solution. The FA has been applied successfully to many optimization problems because of its good exploration capability. Since the search space of the PNN weights is very large, an efficient method is needed to find the optimal solution; hence we employ the FA to find optimal values for the weights of the PNN. In the FA, a set of candidate solutions is randomly created and spread in the search space, and the fitness value of each candidate is calculated. All the candidates then move towards the better positions, and during this movement the search for the optimal solution is carried out. If a candidate reaches the best position, it is considered to be the best candidate.

The second phase of the proposed approach relates to the SA (see Fig. 4), which is also used as an improvement algorithm. The best result obtained by the FA after evaluating the population is passed to the SA algorithm to generate a neighbour solution. The new solution is accepted if it is better than the current one; if it is worse, the probability rule is checked and the new solution is accepted if it satisfies that rule (as in Eq. (4)). This process continues until the temperature reaches its final value. After that, the best solution is sent back to the FA to generate a new population based on the best candidate, so the search for the optimal solution is continuous. The accuracy of the optimal solution is calculated at the end of the procedure. The SFA is summarized in the pseudocode shown in Fig. 5.

As seen from Fig. 4, the procedure starts from an initial population of randomly generated individuals; the quality of each individual is calculated using Eq. (7) and the best solution among them is selected. Concurrently, the FA randomly generates the initial population of candidate solutions for the given problem (here, the weights of the PNN). After that, it calculates the light intensity of all fireflies and finds the most attractive firefly (best candidate) within the population. Then, it calculates the attractiveness and distance for each firefly in order to move all fireflies towards the most attractive firefly in the search space. Next, the best solution among the population is passed to the SA as the initial solution, Sol. The SA then generates a neighbourhood/new solution, Sol*, and accepts it if it is better than the current one; if Sol* is worse than Sol, it checks the probability rule and accepts the new solution if it satisfies the rule, which is computed as follows, where temp is the current temperature:

\exp\left(-\frac{f(\mathrm{Sol}^{*}) - f(\mathrm{Sol})}{\mathrm{temp}}\right) > \mathrm{random}[0, 1]    (4)

At every iteration, temp is decreased according to α, as defined in Eq. (5). The algorithm stops when the maximum number of iterations, Iter_max, is reached or the penalty cost is zero.

\alpha = \frac{\log(T_0) - \log(T_f)}{\mathrm{Iter\_max}}    (5)

If the termination criterion is not met, the solution is returned to the FA and the FA begins again.

3.2. FA with Lévy flight (LFA)

Fig. 3. Mechanism of original probabilistic neural network.

In the second proposed hybrid method, denoted as LFA, the FA is hybridized with Lévy flight to solve classification problems. Lévy flight is a random walk in which each step is independently drawn from a probability distribution that has a heavy power-law tail.


Fig. 4. Flowchart of proposed SFA.


Because of this heavy tail, a very large step is sometimes taken. A well-known application of Lévy flight is the modelling of the movements of fishing boats, where it acts as an indicator of susceptibility and might serve as a warning mechanism for fisheries management [43]; the behaviour of Lévy flight can also be applied to optimization and optimal search [44]. The hybridization of the FA with Lévy flight is employed to create a balance between exploration and exploitation in order to ensure efficient convergence and an accurate solution.

Fig. 6 shows the structure of the proposed LFA approach. In the FA, all the candidates move towards the best positions, and Lévy flight is used to control the movement of each firefly towards the best candidate by controlling the random step inside the movement operation. The LFA is a metaheuristic algorithm developed by combining Lévy flight with the search strategy of the FA [33], as represented by the following equation:

x_i = x_i + \beta_0 \exp(-\gamma r_{ij}^2)\,(x_j - x_i) + \alpha\,\mathrm{sign}(\mathrm{rand} - \tfrac{1}{2}) \oplus \mathrm{Lévy}    (6)

where the first term is the current position of a firefly, the second term accounts for the attractiveness of a firefly, and the third term is randomization using Lévy flight as the randomization parameter. The product ⊕ means entrywise multiplication. The factor sign(rand − 1/2), where rand is drawn from [0, 1], essentially provides a random sign or direction, while the random step length is drawn from a Lévy distribution.
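The paper does not spell out how the Lévy-distributed step length in Eq. (6) is generated; a commonly used construction is Mantegna's algorithm, sketched below with an assumed stability index λ = 1.5.

import numpy as np
from math import gamma, sin, pi

def levy_step(dim, lam=1.5, rng=np.random):
    """Draw a dim-dimensional Lévy step via Mantegna's algorithm (assumed generator)."""
    sigma = (gamma(1 + lam) * sin(pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / lam)   # heavy-tailed: occasional very large steps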

The LFA is summarized in the pseudocode shown in Fig. 7. As seen in Fig. 6, Lévy flight is used to control the movement of each firefly towards the best candidate by controlling the random step inside the movement operation in the FA, in order to ensure efficient convergence and obtain an accurate solution.

Algorithm: SFA
Begin
    Generate the initial population randomly;
    Evaluate each individual in the population f(x) based on Eq. (7);
    Find the best solution from the population;
    While (stopping criterion is not satisfied)
        For i = 1 to n do
            For j = 1 to n do
                If (f(xj) > f(xi))
                    Calculate attractiveness based on Eq. (1);
                    Calculate the distance between fireflies i and j based on Eq. (2);
                    Move firefly xi towards the better solution xj based on Eq. (3);
                Endif
            End for j
        End for i
        Improve the final solution of the FA using SA (Phase 2 in Fig. 4);
    End while
    Return best (TP), (TN), (FP), and (FN) (see Table 2);
End

Fig. 5. Pseudocode for SFA.


Fig. 6. Flowchart of proposed LFA.

Algorithm: LFA
Begin
    Generate the initial population randomly;
    Evaluate each individual in the population f(x) based on Eq. (7);
    Find the best solution from the population;
    While (stopping criterion is not satisfied)
        For i = 1 to n do
            For j = 1 to n do
                If (f(xj) > f(xi))
                    Calculate attractiveness based on Eq. (1);
                    Calculate the distance between fireflies i and j based on Eq. (2);
                    Move firefly xi towards the better solution xj based on Eq. (6) using Lévy flight;
                Endif
            End for j
        End for i
    End while
    Return best (TP), (TN), (FP), and (FN) (see Table 2);
End

Fig. 7. Pseudocode for LFA.

Table 1. Characteristics of the datasets.

Dataset                            No. of attributes   Training set   Test set
PIMA Indian Diabetes (PID)         8                   518            192
Haberman Surgery Survival (HSS)    3                   206            77
Appendicitis (AP)                  7                   71             27
Breast Cancer (BC)                 10                  193            72
BUPA Liver Disorders (LD)          6                   233            86
Statlog (Heart)                    13                  182            68
German Credit Data (GCD)           20                  675            250
Parkinsons                         23                  131            49
SPECTF                             45                  180            67
Australian Credit Approval (ACA)   14                  465            173
Fourclass                          2                   581            216

3.3. SFA with Lévy flight (LSFA)

In this method, denoted as LSFA, the FA is hybridized with Lévy flight to evaluate the population in each iteration, and the best result is passed to the SA to generate a neighbour solution. The aim of this algorithm is to control the random step in the firefly mechanism. In order to improve the algorithm further, two particular issues, both related to the absorption coefficient and both affecting the accuracy of the solution, need to be considered. In the first case, fireflies can clearly see the brighter fireflies because these emit the same amount of light at any distance; some of the further-away fireflies therefore attempt to advance towards the brighter fireflies by taking the largest possible steps to save time. Exploration and exploitation are thus out of balance, because exploitation is maximal while exploration is normal. In the second case, fireflies cannot see any brighter fireflies, so they move in random steps; exploration and exploitation are again out of balance, because the fireflies only explore and do not exploit the search space. Our proposed LSFA attempts to address these imbalances by ensuring that both sides of the current value are explored thoroughly to find the optimal value. In other words, the firefly mechanism starts searching the problem space at distances far from the current location (exploration capability), while the search process is restricted to a very small area around the current location as the iterations elapse (exploitation capability).

4. Experimental results

This research makes a number of contributions in respect of solving classification problems with higher accuracy and faster convergence. This is achieved as follows: first, in the proposed hybrid methods based on a PNN and FA, the FA is used to optimize the weights of the PNN in order to obtain better classification accuracy. Second, the FA is hybridized with a SA algorithm (SFA) to achieve a balance between exploration and exploitation, ensuring efficient convergence and an accurate solution; SA is also used as an improvement algorithm. Third, the FA is hybridized with Lévy flight (LFA) to control the movement of each firefly towards the best candidate by controlling the random step inside the movement operation in the FA, again to ensure efficient convergence and an accurate solution. Finally, the proposed LSFA method combines the FA with Lévy flight and, after evaluating the population in each iteration, passes the best result to the SA algorithm to generate a neighbour solution. The aim of the LSFA is to control the random step in the firefly mechanism in order to create a balance between exploration and exploitation and find a near-optimal solution in the shortest possible time.

In the experiments, the proposed algorithms (SFA, LFA, LSFA) were implemented in Matlab and the simulations were performed on an Intel Pentium 4, 2.8 GHz computer. The characteristics of the datasets used are summarized in Table 1. We execute 20 independent runs for each dataset.



The outcomes (quality of the solutions) of the experiments using the proposed hybrid methods are compared against those of approaches in the literature that deal with the same problems. The 11 benchmark UCI classification datasets to which the proposed approaches are applied range in size from 98 to 925 instances and contain different numbers of attributes [45]. The quality of the classification is measured by the accuracy value, as in Eq. (7). Accuracy is defined in terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) [34], as shown in Table 2.

Table 2. Classification outcomes of a two-class problem.

                  Predicted: Yes        Predicted: No
Actual: Yes       True Positive (TP)    False Negative (FN)
Actual: No        False Positive (FP)   True Negative (TN)

Table 3. Parameter settings.

Parameter                            Value
Population size (# of fireflies)     50
Number of iterations/generations     100
Initial attractiveness (β₀)          1
Absorption coefficient (γ)           1
Initial temperature T₀               100
Final temperature T_f                0.5


\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}    (7)

A TP classification occurs when the actual label and the predicted label of an object are both positive. A TN occurs when the actual label and the predicted label of an object are both negative. A FP occurs when the actual label is negative but the classifier predicts it as positive.

Table 4. Classification accuracy, sensitivity, specificity and error rate (%) for FA, SFA, LFA and LSFA with PNN.

Dataset      Approach   TP    FN    TN    FP    Accuracy   Sensitivity   Specificity   Error rate
PID          FA         33    30    113   16    76.04      0.52          0.88          0.24
             SFA        46    17    124   5     88.54      0.73          0.96          0.11
             LFA        34    29    114   15    77.08      0.54          0.88          0.23
             LSFA       50    13    123   6     90.10      0.79          0.95          0.10
HSS          FA         54    2     10    11    83.12      0.96          0.48          0.17
             SFA        53    3     12    9     84.41      0.95          0.57          0.16
             LFA        50    6     14    7     83.12      0.89          0.67          0.17
             LSFA       53    3     13    8     85.71      0.95          0.62          0.14
AP           FA         24    0     1     2     92.59      1.00          0.33          0.07
             SFA        24    0     1     2     92.59      1.00          0.33          0.07
             LFA        24    0     1     2     92.59      1.00          0.33          0.07
             LSFA       24    0     1     2     92.59      1.00          0.33          0.07
BC           FA         31    1     24    12    80.88      0.97          0.67          0.19
             SFA        17    6     44    5     84.00      0.74          0.90          0.15
             LFA        21    2     38    11    81.94      0.91          0.78          0.18
             LSFA       16    7     45    4     84.72      0.70          0.92          0.15
LD           FA         18    15    50    3     79.07      0.55          0.94          0.21
             SFA        28    5     46    7     86.05      0.85          0.87          0.14
             LFA        24    9     47    6     82.56      0.73          0.89          0.17
             LSFA       27    6     48    0     87.21      0.82          1.00          0.07
Heart        FA         31    1     24    12    80.88      0.97          0.67          0.19
             SFA        32    0     24    12    82.35      1.00          0.67          0.18
             LFA        32    0     23    13    80.88      1.00          0.64          0.19
             LSFA       32    0     24    12    82.35      1.00          0.67          0.18
GCD          FA         166   13    30    41    78.40      0.93          0.42          0.22
             SFA        163   16    56    15    87.60      0.91          0.79          0.12
             LFA        166   13    30    41    78.40      0.93          0.42          0.22
             LSFA       169   10    53    18    88.80      0.94          0.75          0.11
Parkinsons   FA         38    1     6     4     89.80      0.97          0.60          0.10
             SFA        39    0     6     4     91.84      1.00          0.60          0.08
             LFA        38    1     6     4     89.80      0.97          0.60          0.10
             LSFA       39    0     7     3     93.88      1.00          0.70          0.06
SPECTF       FA         52    1     10    4     92.54      0.98          0.71          0.07
             SFA        52    1     11    3     94.03      0.98          0.79          0.06
             LFA        50    3     12    2     92.54      0.94          0.86          0.07
             LSFA       49    4     14    0     94.03      0.92          1.00          0.06
ACA          FA         65    9     94    5     91.91      0.88          0.95          0.08
             SFA        72    2     93    6     95.37      0.97          0.94          0.05
             LFA        65    9     94    5     91.91      0.88          0.95          0.08
             LSFA       73    1     94    5     96.53      0.99          0.95          0.03
Fourclass    FA         78    0     138   0     100.00     1.00          1.00          0.00
             SFA        78    0     138   0     100.00     1.00          1.00          0.00
             LFA        78    0     138   0     100.00     1.00          1.00          0.00
             LSFA       78    0     138   0     100.00     1.00          1.00          0.00


A FN occurs when the actual label is positive but the classifier predicts it as negative. In addition to accuracy, we consider three other performance measures: error rate, sensitivity, and specificity, defined in Eqs. (8)-(10) below:

\mathrm{Error\ Rate} = 1 - \frac{TP + TN}{TP + TN + FP + FN}    (8)

\mathrm{Sensitivity} = \frac{TP}{TP + FN}    (9)

\mathrm{Specificity} = \frac{TN}{TN + FP}    (10)
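Eqs. (7)-(10) translate directly into code from the confusion-matrix counts in Table 2. The following illustrative sketch (not the authors' implementation) uses the PID/LSFA row of Table 4 (TP = 50, FN = 13, TN = 123, FP = 6) as a sanity check, reproducing the reported values.

def metrics(tp, fn, tn, fp):
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total          # Eq. (7)
    error_rate = 1 - accuracy             # Eq. (8)
    sensitivity = tp / (tp + fn)          # Eq. (9)
    specificity = tn / (tn + fp)          # Eq. (10)
    return accuracy, error_rate, sensitivity, specificity

print(metrics(50, 13, 123, 6))  # -> approx (0.901, 0.099, 0.794, 0.953)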

Table 3 shows the parameters and their settings for the proposed algorithms, which were determined after some preliminary experiments. Table 4 presents the results of a comparison of the three proposed hybrid FAs and the FA in isolation when applied to the 11 datasets. From Table 4, the hybrid approaches show better performance (with respect to classification accuracy) than the FA alone. The LSFA clearly outperforms the other hybrid methods and obtains the highest classification accuracy on all the tested datasets, followed by the SFA and the LFA. This finding correlates with the error rate, where the LSFA obtains lower error rates than the other approaches.

Sensitivity is the proportion of true positives that are correctly identified by a diagnostic test. For example, in the BC dataset the LSFA achieved a sensitivity of 70%, which means that if one conducted a diagnostic test to identify patients with a certain disease, there would be a 70% chance that a patient with the disease would be positively identified. From Table 4, the LSFA shows better sensitivity than the other algorithms on almost all datasets, while the FA is better on two datasets (HSS and BC) and the SFA is better on one dataset (LD). Specificity, on the other hand, is the proportion of true negatives that are correctly identified by the test. For example, the LSFA obtains a specificity of 92% on the BC dataset, which indicates that if one conducted a diagnostic test to identify patients without a certain disease, there would be a 92% chance that a patient without the disease would be identified as a negative case.

Based on the results presented in Table 4, we can conclude the following:

• In general, the SFA is better than the FA due to the capability of SA to keep the best found solution, which helps the algorithm to retain the best position identified so far.
• The LFA is better than the FA because Lévy flight is able to restrict the movement step of the FA to a very small area around the current location.
• The LSFA is better than the other proposed approaches (SFA, LFA) in terms of classification accuracy, sensitivity, specificity, and error rate. This is because SA and Lévy flight control the randomness step within the FA and create an exploration-exploitation strategy in which the FA starts the search at distances far from the current location (exploration), while Lévy flight and SA restrict the search process to a very small area around the current location as the iterations elapse (exploitation).

We further investigate the performance of the LSFA to determine whether it is statistically different from the FA, SFA, and LFA by conducting a pairwise Wilcoxon test at a 95% significance level (α = 0.05) on classification accuracy, sensitivity, and specificity; the obtained p-values are presented in Tables 5-7, respectively. A sketch of this test is given below.
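A minimal illustration of this pairwise comparison using SciPy's Wilcoxon signed-rank test follows; the per-run accuracy arrays here are hypothetical placeholders, not the paper's raw results.

import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
acc_lsfa = 90 + rng.normal(0, 0.5, 20)   # 20 independent runs (hypothetical values)
acc_fa = 76 + rng.normal(0, 0.5, 20)

stat, p = wilcoxon(acc_lsfa, acc_fa)     # paired, two-sided by default
print(p < 0.05)                          # significant at the 95% level (alpha = 0.05)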

Table 5. p-Values of the Wilcoxon test for LSFA vs. FA, SFA and LFA: classification accuracy.

Dataset      LSFA vs. FA   LSFA vs. SFA   LSFA vs. LFA
PID          0.005         0.854          0.005
HSS          0.009         1.000          0.007
AP           0.002         1.000          1.000
BC           0.007         0.404          0.007
LD           0.005         0.212          0.005
Heart        0.005         0.009          0.003
GCD          0.005         0.032          0.005
Parkinsons   0.002         0.002          0.002
SPECTF       0.074         0.211          0.958
ACA          0.005         0.012          0.005
Fourclass    1.000         1.000          1.000

Table 6. p-Values of the Wilcoxon test for LSFA vs. FA, SFA and LFA: sensitivity.

Dataset      LSFA vs. FA   LSFA vs. SFA   LSFA vs. LFA
PID          0.005         0.343          0.012
HSS          0.953         0.276          0.108
AP           1.000         1.000          1.000
BC           0.012         0.328          0.229
LD           0.005         0.021          0.008
Heart        0.157         1.000          1.000
GCD          0.047         0.050          0.202
Parkinsons   0.002         1.000          0.002
SPECTF       0.027         0.042          0.042
ACA          0.005         0.403          0.005
Fourclass    1.000         1.000          1.000

Table 7. p-Values of the Wilcoxon test for LSFA vs. FA, SFA and LFA: specificity.

Dataset      LSFA vs. FA   LSFA vs. SFA   LSFA vs. LFA
PID          0.007         0.465          0.005
HSS          0.043         0.336          0.475
AP           1.000         1.000          1.000
BC           0.231         0.292          0.077
LD           0.008         0.005          0.008
Heart        0.010         0.010          0.003
GCD          0.005         0.199          0.005
Parkinsons   0.002         0.002          0.002
SPECTF       0.008         0.411          0.223
ACA          0.008         0.411          0.223
Fourclass    1.000         1.000          1.000

The statistical information presented in Table 5 (with respect to classification accuracy) shows that the results for the LSFA are statistically different (p-value < 0.05) from the FA (except on the Fourclass dataset) and from the LFA (except on the AP and Fourclass datasets). However, there is no significant difference between the LSFA and the SFA, with seven datasets having a p-value > 0.05. This finding is supported by the results presented in Table 4, which show that the SFA is the closest competitor to the LSFA. In terms of sensitivity, it can be seen from Table 6 that the LSFA is statistically different from the FA. However, there is no significant difference between the LSFA and the SFA and LFA, with eight and six datasets, respectively, having a p-value > 0.05. As seen from the results in Tables 5-7, the results for the LSFA are better than those of the other two proposed approaches because it leverages exploration and exploitation capabilities simultaneously.

Fig. 8 shows box plots that illustrate the distribution of solution quality for six datasets obtained by the FA, LFA, SFA, and LSFA. In most cases, the LSFA reduces the gap between the best, average, and worst solution qualities, which demonstrates that the approach is robust.


Fig. 8. Box plots of penalty costs for all instances obtained by FA, LFA, SFA, and LSFA (one panel per algorithm; datasets PID, HSS, AP, BC, GCD, and ACA on the x-axis, solution quality from 70 to 100 on the y-axis).


We believe that this is because SA is able to keep the position of the best found solution and Lévy flight is able to control or limit the movement step in the LSFA approach. In addition to this comparison based on classification accuracy, we present simulation results for the convergence characteristics of the FA, SFA, LFA, and LSFA, plotted in Fig. 9 for the PID, HSS, GCD, Parkinsons, SPECTF, and ACA datasets. The experimental results indicate that the LSFA converges faster than the other approaches and shows the same trend of convergence over all the tested datasets.

We now assess the computational results (in terms of classification accuracy) of the LSFA against the results of the other approaches in the literature listed in Table 8. Note that none of the methods in Table 8 were tested on all the datasets used in this work; thus, the results for each dataset are compared against different sets of approaches, as shown in Table 9. On most of the tested data the LSFA is ranked first, the exception being the Heart dataset, where the LSFA is ranked second (with respect to classification accuracy). Moreover, the LSFA manages to classify the Fourclass dataset without any misclassification (accuracy of 100%).

In this work, the search strategy of the FA is to start searching the problem space at distances far from the current location (exploration capability), which helps it to achieve a better convergence to the near-optimal solution.

In addition, the exploitation component is designed to exploit the history and experience of the search process. The search strategy aims to speed up convergence by reducing randomness and limiting exploration through a combination of Lévy flight and SA. It has been noted in previous studies that such hybridization can achieve a better balance between exploration and exploitation, which can lead to faster convergence [61,62].

Table 8. Acronyms of the compared methods.

#    Acronym   Name of approach                         Reference
1    M1        Flexible Neural-Fuzzy Inference System   [46]
2    M2        Fuzzy Neural Networks                    [47]
3    M3        Fuzzy Kernel Multiple Hyperspheres       [48]
4    M4        SVMs using linear terms                  [49]
5    M5        Proximal SVMs                            [50]
6    M6        SVMs                                     [51]
7    M7        K-NN                                     [52]
8    M8        BP ANNs                                  [53]
9    M9        —                                        [54]
10   M10       CLIP 3                                   [55]
11   M11       C4.5                                     [3]
12   M12       PSOPART-RVNS                             [56]
13   M13       FAIRS                                    [57]
14   M14       Predictive Value Maximization            [58]
15   M15       CBA                                      [59]
16   M16       Hybrid simplex-GA                        [60]


Fig. 9. Convergence characteristics of FA, SFA, LFA, and LSFA.


Table 9. Comparison of results using LSFA and state-of-the-art approaches.

Dataset      Approach   Classification accuracy (%)
PID          LSFA       90.10 (1st rank)
             M1         78.6
             M2         81.8
             M15        76.7
HSS          LSFA       85.71 (1st rank)
             M4         71.2
             M5         72.5
             M15        72.7
AP           LSFA       92.59 (1st rank)
             M14        89.6
             M15        91.4
BC           LSFA       84.72 (1st rank)
             M8         73.5
             M15        84.3
             M16        79.687
LD           LSFA       87.21 (1st rank)
             M12        70.97
             M13        83.4
             M15        77.5
Heart        LSFA       82.35 (2nd rank)
             M12        84.36
             M15        77.9
GCD          LSFA       88.80 (1st rank)
             M6         77.9
             M15        86.4
Parkinsons   LSFA       93.88 (1st rank)
             M6         91.4
             M9         81.3
             M15        85.7
SPECTF       LSFA       94.03 (1st rank)
             M10        77
             M15        83.6
ACA          LSFA       96.53 (1st rank)
             M6         85.5
             M11        85.7
             M15        69.9
Fourclass    LSFA       100 (1st rank)
             M3         99.8
             M7         100 (1st rank)
             M15        94.6




The hybridization of a population-based method with a local search helps to create a balance between exploration and exploitation and prevents the process from becoming stuck in local optima. The simulation results indicate that one of the reasons for the faster convergence of the LSFA compared with the other models is that the algorithm starts from a good initial state. In sum, the results indicate that the LSFA is a suitable method for solving classification problems because it shows good performance in terms of classification accuracy and fast convergence. Indeed, all three of the proposed approaches could be applied to other real and high-dimensional datasets in order to study their behaviour under different conditions in terms of numbers of classes and attributes, and the results could be compared with state-of-the-art approaches. As yet, the performance of the LSFA has not been tested on a large number of classes, and this will therefore be the subject of future work.

5. Computational complexity

In this section, we calculate the computational complexity of the LSFA as follows:


• The time complexity of PNN training for a dataset of size S is O(S) [63].
• The time complexity of generating the initial FA population is O(N*S), where N is the population size.
• The time complexity of sorting and finding the best solution in the population is O(N).
• The time complexity of moving all fireflies (xi) towards the best solution (xj) is O(M*N²*S), where M is the number of iterations.
• The time complexity of the SA algorithm is O(M*N).
• The overall time complexity of the LSFA is therefore dominated by O(M*N²*S); the actual running time can be significantly lower, given that N is a constant.


6. Conclusion


The overall goal of the work presented in this paper was to investigate the performance of modified FAs in solving classification problems. Three hybridized FAs were proposed: hybridization of the FA with SA (SFA), hybridization of the FA with Lévy flight (LFA), and a combination of the FA, SA, and Lévy flight (LSFA). These hybridizations were developed with the aim of creating an improved balance between exploration and exploitation when obtaining near-optimal weights for the PNN, thereby maximizing classification accuracy and achieving a high convergence speed for classification problems. A comparison of the three hybridization methods showed that the LSFA exhibited the best overall performance on 11 benchmark datasets. The approach is simple yet effective and manages to produce a number of new best results in comparison with other approaches in the literature. We are confident that this study makes a significant contribution and will enable the production of high-quality solutions for classification problems. However, we believe that a further study is needed to investigate how a good initial state could drive better classification accuracy and higher convergence speed. In this regard, the LSFA could be applied to other real and high-dimensional datasets in order to study its behaviour under different conditions in terms of numbers of classes and attributes. This issue is the subject of our future work.


Acknowledgements


This work was supported by the Ministry of Education, Malaysia (ERGS/1/2013/ICT07/UKM/02/5) and the Universiti Kebangsaan Malaysia (DIP-2012-15). We are very grateful to the anonymous referees whose thoughtful and considered comments significantly improved the paper.

References

[1] G.P. Zhang, Neural networks for classification: a survey, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 30 (2002) 451–462.
[2] Y.J. Lee, O.L. Mangasarian, SSVM: a smooth support vector machine for classification, Comput. Optim. Appl. 20 (2001) 5–22.
[3] N. Friedman, et al., Bayesian network classifiers, Mach. Learn. 29 (1997) 131–163.
[4] A. Wai-Ho, K.C.C. Chan, Classification with degree of membership: a fuzzy approach, in: Proceedings of the International Conference on Data Mining, California, USA, 2001.
[5] E.M. Azoff, Neural Network Time Series Forecasting of Financial Markets, John Wiley & Sons, Inc., 1994.
[6] J.K. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008.
[7] M. Alweshah, Firefly algorithm with artificial neural network for time series problems, Res. J. Appl. Sci. Eng. Technol. 7 (2014) 3978–3982.
[8] Y. Zhang, L. Wu, Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network, Expert Syst. Appl. 36 (2009) 8849–8854.
[9] M.N. French, et al., Rainfall forecasting in space and time using a neural network, J. Hydrol. 137 (1992) 1–31.
[10] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge Univ. Press, 2008.








[11] D.F. Specht, Probabilistic neural networks, Neural Netw. 3 (1990) 109–118.
[12] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.
[13] W.P. Sweeney Jr., et al., Classification of chromosomes using a probabilistic neural network, Cytometry 16 (1994) 17–24.
[14] S. Chalup, F. Maire, A study on hill climbing algorithms for neural network training, in: Proc. Congress on Evolutionary Computation, 1999, pp. 2014–2021.
[15] N.K. Treadgold, T.D. Gedeon, Simulated annealing and weight decay in adaptive learning: the SARPROP algorithm, Neural Netw. 9 (1998) 662–668.
[16] E. Alba, J. Chicano, Training neural networks with GA hybrid algorithms, in: Proceedings of GECCO'04, Seattle, Washington, 2004.
[17] P. Malinak, R. Jaksa, Simultaneous gradient and evolutionary neural network weights adaptation methods, in: IEEE Congress on Evolutionary Computation (CEC), 2007.
[18] F. Zhao, et al., Application of an improved particle swarm optimization algorithm for neural network training, in: Conference on Neural Networks and Brain (ICNN&B'05), 2005.
[19] M.G.H. Omran, Using opposition-based learning with particle swarm optimization and barebones differential evolution, in: Particle Swarm Optimization, vol. 23, InTech Education and Publishing, 2009, pp. 343–384.
[20] K. Socha, C. Blum, An ant colony optimization algorithm for continuous optimization: application to feed-forward neural network training, Neural Comput. Appl. 16 (2007) 235–247.
[21] C. Blum, K. Socha, Training feed-forward neural networks with ant colony optimization: an application to pattern classification, in: Fifth International Conference on Hybrid Intelligent Systems (HIS'05), 2005, pp. 233–238.
[22] X. Chen, et al., A novel hybrid evolutionary algorithm based on PSO and AFSA for feedforward neural network training, in: IEEE 4th International Conference on Wireless Communications, Networking and Mobile Computing, 2008.
[23] D.J. Montana, L. Davis, Training feedforward neural networks using genetic algorithms, in: International Joint Conference on Artificial Intelligence, USA, 1989.
[24] A. Slowik, M. Bialko, Training of artificial neural networks using differential evolution algorithm, in: Conference on Human System Interactions, Amsterdam, 2008.
[25] Y.D. Zhang, L. Wu, Weights optimization of neural network via improved BCO approach, Prog. Electromagn. Res. 83 (2008) 185–198.
[26] X.J. Wang, et al., Electromagnetism-like mechanism based algorithm for neural network training, in: Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, 2008, pp. 40–45.
[27] A. Kattan, R. Abdullah, A parallel & distributed implementation of the harmony search based supervised training of artificial neural networks, in: ISMS, 2011.
[28] A. Kattan, R. Abdullah, An enhanced parallel & distributed implementation of the harmony search based supervised training of artificial neural networks, in: CICSyN, 2011.
[29] S. Kulluk, et al., Self-adaptive global best harmony search algorithm for training neural networks, Procedia Comput. Sci. 3 (2011) 282–286.
[30] A. Kattan, et al., Harmony search based supervised training of artificial neural networks, in: ISMS'10 Proceedings of the 2010 International Conference on Intelligent Systems, Modelling and Simulation, Washington, DC, USA, 2010.
[31] C. Blum, A. Roli, Metaheuristics in combinatorial optimization: overview and conceptual comparison, ACM Comput. Surv. 35 (2003) 268–308.
[32] M. Dorigo, C. Blum, Ant colony optimization theory: a survey, Theor. Comput. Sci. 344 (2005) 243–278.
[33] X.S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, 2008.
[34] X.S. Yang, Firefly algorithms for multimodal optimization, Stoch. Algorithms: Found. Appl. (2009) 169–178.
[35] T. Apostolopoulos, A. Vlachos, Application of the firefly algorithm for solving the economic emissions load dispatch problem, Int. J. Comb. 2011 (2011) 1–23.
[36] X.-S. Yang, Engineering Optimization: An Introduction with Metaheuristic Applications, Wiley, 2010.
[37] S. Łukasik, S. Żak, Firefly algorithm for continuous constrained optimization tasks, in: Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems, 2009, pp. 97–106.
[38] X.S. Yang, S. Deb, Eagle strategy using Lévy walk and firefly algorithms for stochastic optimization, in: Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), 2010, pp. 101–111.
[39] M.K. Sayadi, et al., A discrete firefly meta-heuristic with local search for makespan minimization in permutation flow shop scheduling problems, Int. J. Ind. Eng. Comput. 1 (2010) 1–10.
[40] A.H. Gandomi, et al., Mixed variable structural optimization using firefly algorithm, Comput. Struct. (2011) 2325–2336.
[41] H.M.J. Hung, et al., The behavior of the P-value when the alternative hypothesis is true, Biometrics 53 (1997) 11–22.
[42] M. Paliwal, U.A. Kumar, Neural networks and statistical techniques: a review of applications, Expert Syst. Appl. 36 (2009) 2–17.
[43] S. Bertrand, et al., Scale-invariant movements of fishermen: the same foraging strategy as natural predators, Ecol. Appl. 17 (2007) 331–337.
[44] X.S. Yang, S. Deb, Cuckoo Search via Lévy flights, IEEE, 2009, pp. 210–214.
[45] H. Pham, E. Triantaphyllou, A meta-heuristic approach for improving the accuracy in some classification algorithms, Comput. Oper. Res. (2010).
[46] L. Rutkowski, K. Cpalka, Flexible neuro-fuzzy systems, Neural Netw. 14 (2003) 554–574.
[47] W. Leon IV, Enhancing Pattern Classification with Relational Fuzzy Neural Networks and Square BK-products (PhD dissertation in computer science), Springer, FL, USA, 2006.
[48] G. Lei, et al., A novel classification algorithm based on fuzzy kernel multiple hyperspheres, in: Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Haikou, Hainan, China, 2007.
[49] V.A. Kecman, I.T. Hadzic, LP and QP based learning from empirical data, in: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2001), Washington, DC, 2001.
[50] G.M. Fung, O.L. Mangasarian, Multicategory proximal support vector machine classifiers, Mach. Learn. 59 (2005) 77–97.
[51] C.L. Huang, et al., Credit scoring with a data mining approach based on support vector machines, Expert Syst. Appl. 33 (2007) 847–856.
[52] N. Segata, E. Blanzieri, Empirical Assessment of Classification Accuracy of Local SVM, 2008.
[53] F. Zarndt, A Comprehensive Case Study: An Examination of Machine Learning and Connectionist Algorithms (PhD thesis), Dept. Comput. Sci., Brigham Young Univ., 1995.
[54] M. Ene, Neural network-based approach to discriminate healthy people from those with Parkinson's disease, Ann. Univ. Craiova-Math. Comput. Sci. Ser. 35 (2008) 112–116.
[55] L.A. Kurgan, et al., Knowledge discovery approach to automated cardiac SPECT diagnosis, Artif. Intell. Med. 23 (2001) 149–169.
[56] F. Al-Obeidat, et al., Automatic parameter settings for the PROAFTN classifier using hybrid particle swarm optimization, in: Canadian Conference on AI, 2010.
[57] K. Polat, et al., Breast cancer and liver disorders classification using artificial immune recognition system (AIRS) with performance evaluation by fuzzy resource allocation mechanism, Expert Syst. Appl. 32 (2007) 172–183.
[58] R.J. Mooney, Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning, The Computing Research Repository (CoRR), 1996, pp. 82–91.
[59] H.N.A. Pham, E. Triantaphyllou, A meta-heuristic approach for improving the accuracy in some classification algorithms, Comput. Oper. Res. 38 (2011) 174–189.
[60] H. Salar, F. Farrokhi, Improving genetic algorithm performance in multiclassification using simplex method, in: First International Conference on Integrated Intelligent Computing (ICIIC), 2010.
[61] X.S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, 2008.
[62] E.-G. Talbi, Metaheuristics: From Design to Implementation, Wiley, 2009.
[63] F. Ancona, et al., Implementing probabilistic neural networks, Neural Comput. Appl. 5 (1997) 152–159.

