To appear in: Neurocomputing. DOI: http://dx.doi.org/10.1016/j.neucom.2015.09.045
Received 2 October 2014; revised 30 July 2015; accepted 16 September 2015.

A novel hybrid clustering approach based on K-harmonic means using robust design

Wei-Chang Yeh 1, Chyh-Ming Lai 1 and Kuei-Hu Chang 2,*

1 Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu 300, Taiwan
2 Department of Management Sciences, R.O.C. Military Academy, Kaohsiung 830, Taiwan

Abstract

The K-harmonic means (KHM) algorithm was recently proposed to address the initialization problem of K-means. However, KHM still suffers from becoming trapped in local optima. In order to solve this problem, this paper presents a novel hybrid algorithm named RMSSOKHM, based on KHM and a modified simplified swarm optimization. The proposed RMSSOKHM adopts the rapid centralized strategy (RCS) to increase the convergence speed and the minimum movement strategy (MMS) to search for better solutions effectively and efficiently without becoming trapped in local optima. In addition, the parameter settings of the proposed approach were optimized by using the Taguchi method. The performance of the proposed RMSSOKHM was examined and compared with existing well-known methods on nine benchmark datasets. The experimental results indicate that the proposed RMSSOKHM is superior to its competitors in terms of both solution quality and efficiency.

Keywords: particle swarm optimization; simplified swarm optimization; data clustering; K-harmonic means; Taguchi method


1. Introduction

With the development of information technology, the quantity of data has been increasing rapidly. As a result, the analysis of the information hidden in these data is distinctly important. Data mining is a useful and efficient process for extracting useful information and knowledge from large datasets. Among data mining methods, cluster analysis is an important and popular technique for grouping unlabeled data into different groups (clusters) based on the similarities and dissimilarities between pairs of data instances [1, 2]. K-means (KM) is one of the most widespread and fundamental cluster analysis methods, valued for its simplicity and efficiency; it measures the distances between centers and data objects to partition the data into k clusters [3]. However, the initial state may cause the solution obtained by K-means to be a local optimum, thereby impacting the quality of the solution [4].

K-harmonic means (KHM), first proposed by Zhang et al. [5], is a center-based clustering algorithm developed to solve the initialization problem of KM. Unlike KM, KHM assigns a dynamic weight to each data object based on a harmonic average, so that KHM is less sensitive to initialization than K-means. However, it still suffers from becoming trapped in local optima because of its greedy search nature [6]. Recently, some evolutionary algorithms have made convincing progress in overcoming this drawback when combined with KHM. Gungor and Unler [6] proposed an approach based on the simulated annealing technique to help KHM overcome its greedy search nature; the results demonstrated better performance than that of KM and KHM. Subsequently, Gungor and Unler [7] combined tabu search (TS) with KHM, and a comparison of its results with those of KM, fuzzy KM, and KHM showed that the algorithm outperforms its competitors. Jiang et al. [8] integrated an ant clustering algorithm with KHM, exploiting the merit of the ant clustering algorithm, which partitions data without any knowledge of the initial cluster centers, to help KHM reach a global optimum effectively. Yang et al. [9] proposed a hybrid clustering algorithm based on particle swarm optimization (PSO) [10, 11] and KHM; the experimental results showed that the algorithm not only improves the convergence speed of PSO but also helps KHM escape from local optima. Recently, Yin et al. [12] introduced an improved gravitational search algorithm (GSA) to solve the local-optimum problem of KHM. Their algorithm adopted two strategies to avoid losing population diversity and to improve the convergence speed of GSA, and the experimental results showed that it is both efficient and effective.

The results of the above studies indicate that integrating KHM into a hybrid algorithm can enhance the capacity of KHM to escape from local optima. However, those approaches incur higher computational complexity. In contrast, simplified swarm optimization (SSO), first proposed by Yeh [13], which updates solutions based on comparisons of one randomly generated variable with only three parameters, is easy to implement and efficient. In recent years, a growing number of applications based on SSO have demonstrated its performance [14-17]. As with most evolutionary algorithms, the performance of SSO depends on a proper setting of its parameters. Over the last several decades, growing attention has been paid to the use of the Taguchi method for robust design, which finds the best parameter settings by minimizing variation [18-26]. In this work, the two most popular tools of the Taguchi method, the orthogonal array and the signal-to-noise (S/N) ratio [25-27], are applied to the parameter design of SSO to obtain optimized parameter settings for the clustering problem.

The aim of this paper is to overcome the drawback of KHM and to demonstrate further advantages of SSO on the clustering problem. KHM is integrated with a modified SSO, which adopts the rapid centralized strategy (RCS) to increase the convergence speed and searches through the minimum movement strategy (MMS) to improve the quality of the solution, to form a new hybrid clustering algorithm. In order to optimize its parameter settings, robust-design experiments based on the Taguchi method are conducted. For evaluating this algorithm, nine benchmark datasets are tested, and the performance is compared with that of the related works. Encouraging results are found in terms of the efficiency and effectiveness of the algorithm.

The rest of the paper is organized as follows. In Section 2, the clustering problem, KHM and the Taguchi method are briefly described. Section 3 introduces the proposed algorithm for solving clustering problems. The results of the experiments and the comparison with related works are discussed in Section 4. The conclusions of this paper are presented in Section 5.

2. Related works

2.1 K-harmonic means clustering

The goal of clustering is to partition a given set of N objects X = {X_1, X_2, ..., X_i, ..., X_N}, each X_i = {x_{i1}, x_{i2}, ..., x_{id}, ..., x_{iD}} ∈ R^D, into K groups, also called clusters, C = {(C_1, Z_1), (C_2, Z_2), ..., (C_k, Z_k), ..., (C_K, Z_K)}, where C_k is the subset of objects representing the kth cluster, Z_k is the centroid of C_k and K ≤ N, such that [2]:

(1) C_k \neq \emptyset, \quad k = 1, 2, \ldots, K;

(2) \bigcup_{k=1}^{K} C_k = X;

(3) C_i \cap C_j = \emptyset, \quad i, j = 1, 2, \ldots, K \text{ and } i \neq j.

The KHM algorithm, proposed by Zhang et al. [5] and modified by Hamerly and Elkan [28], is a clustering approach similar to KM. The process of KHM can be described as follows [5, 9, 12]:

Step 1. Generate the initial centroids randomly.

Step 2. Calculate the objective function value according to

KHM(X, Z) = \sum_{i=1}^{N} \frac{K}{\sum_{k=1}^{K} \dfrac{1}{\| X_i - Z_k \|^{p}}}   (1)

where p is an input parameter, usually p ≥ 2. The KHM objective function, which is the sum of the harmonic averages of the distances (SHAD) from each data object to all centroids, is a differentiable function; the aim of KHM is to minimize the SHAD.

Step 3. Calculate the membership m(Z_k | X_i) of each data object according to

m(Z_k \mid X_i) = \frac{\| X_i - Z_k \|^{-p-2}}{\sum_{k=1}^{K} \| X_i - Z_k \|^{-p-2}}   (2)

Step 4. Calculate the weight w(X_i) of each data object according to

w(X_i) = \frac{\sum_{k=1}^{K} \| X_i - Z_k \|^{-p-2}}{\left( \sum_{k=1}^{K} \| X_i - Z_k \|^{-p} \right)^{2}}   (3)

Step 5. Update each centroid according to Eq. (4), and then calculate the objective function value according to Eq. (1):

Z'_k = \frac{\sum_{i=1}^{N} m(Z_k \mid X_i)\, w(X_i)\, X_i}{\sum_{i=1}^{N} m(Z_k \mid X_i)\, w(X_i)}   (4)

Go to Step 6 if the predefined number of iterations is reached; otherwise, repeat Steps 3-5.

Step 6. Assign each data object X_i to the cluster k with the largest m(Z_k | X_i).
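For illustration only, the following minimal NumPy sketch shows how Steps 2-5 could be computed under the assumption of Euclidean distances; the function names (khm_objective, khm_iteration) are illustrative and not from the original paper.

```python
import numpy as np

def khm_objective(X, Z, p=2.5, eps=1e-12):
    """Eq. (1): sum over objects of the harmonic average of distances to all centroids."""
    d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2) + eps    # N x K distances
    return np.sum(Z.shape[0] / np.sum(d ** (-p), axis=1))

def khm_iteration(X, Z, p=2.5, eps=1e-12):
    """One pass of Steps 3-5: memberships (Eq. 2), weights (Eq. 3), centroid update (Eq. 4)."""
    d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2) + eps    # N x K
    m = d ** (-p - 2) / np.sum(d ** (-p - 2), axis=1, keepdims=True)   # Eq. (2)
    w = np.sum(d ** (-p - 2), axis=1) / np.sum(d ** (-p), axis=1) ** 2 # Eq. (3)
    mw = m * w[:, None]                                                # combined membership-weight
    return (mw.T @ X) / np.sum(mw, axis=0)[:, None]                    # Eq. (4): new K x D centroids
```

Repeating khm_iteration for the predefined number of iterations and then assigning each object to the centroid with the largest membership reproduces Steps 1-6.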

2.2 Simplified swarm optimization (SSO)

SSO is a population-based algorithm proposed by Yeh [13] to compensate for the deficiencies of PSO in solving discrete problems. In SSO, each individual in the swarm, called a particle representing a solution, is encoded as a finite-length string with a fitness value. Similar to most population-based algorithms, the solution of a specified problem is improved by the update mechanism (UM), which is the core of any evolutionary algorithm scheme. The UM of SSO is as follows:

x_{ij}^{t+1} = \begin{cases} x_{ij}^{t} & \text{if } \rho \in [0, C_w) \\ p_{ij}^{t} & \text{if } \rho \in [C_w, C_p) \\ g_j & \text{if } \rho \in [C_p, C_g) \\ x & \text{if } \rho \in [C_g, 1) \end{cases}   (5)

where xijt is the position value in the ith solution with respect to the jth variable of the solution space at generation t. pi  ( pi1 , pi 2 ,, pij ) represents the best solution with the best fitness value in its own history, known as pBest. The best solution with the best fitness value among all solutions is called gBest, which is denoted by g  ( g1, g2 ,, gn ) , and gj denotes the jth variable in gBest. x is a new randomly generated value between the lower bound and the upper bound of the jth variable.  is a uniform random number between 0 and 1. Cw, Cp and Cg are three predetermined parameters, which form four interval probabilities. Thus, cw=Cw, cp=Cp-Cw, cg=Cg-Cp and cr=1-Cg represent probabilities of the new variable from four sources, such as the current solution, pBest, gBest and a random movement in the UM, respectively. The UM updates each solution to be a compromise of the four sources; in particular, a random movement maintains population diversity and enhances the capacity of escaping from a local optimum.
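As a hedged sketch of Eq. (5), assuming real-valued variables with per-dimension bounds, the UM might look as follows (the function name sso_update and the argument names are illustrative, not from the paper):

```python
import numpy as np

def sso_update(x, pbest, gbest, lb, ub, Cw=0.1, Cp=0.4, Cg=0.9, rng=np.random):
    """Eq. (5): each variable is kept, copied from pBest, copied from gBest,
    or redrawn at random, depending on where rho falls in [0, 1)."""
    rho = rng.random(x.shape)
    return np.where(rho < Cw, x,
           np.where(rho < Cp, pbest,
           np.where(rho < Cg, gbest,
                    lb + rng.random(x.shape) * (ub - lb))))   # random movement
```

With the values Cw = 0.1, Cp = 0.4 and Cg = 0.9 used later for SSOKHM, a variable is kept with probability 0.1, taken from pBest with probability 0.3, taken from gBest with probability 0.5, and redrawn at random with probability 0.1.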


2.3 Taguchi method

In the 1940s, Taguchi introduced a form of design of experiments, popularly known as the Taguchi method, for improving the quality of products or manufacturing processes [27]. Two techniques, the orthogonal array and the signal-to-noise (S/N) ratio, are used in the Taguchi method for parameter setting [24-27]. By using an orthogonal array, which is employed for the arrangement of control and noise factors, the effect of the chosen parameters can be estimated reliably without complex experimental designs. The control factors, which can be controlled in the experiments, are assigned to the outer array; the noise factors, which are difficult or impossible to control in terms of cost or experimental environment, are assigned to the inner array. Table 1 shows a typical example of an experimental strategy for parameter design.

Table 1 Experimental strategy for parameter design (schematic): each run of the outer array fixes the levels of the control factors A-C, each replication of the inner array fixes the levels of the noise factors D-F, each run-replication combination yields a response y_ij, and each run is summarized by its S/N ratio SN_i.

In this work, the aim of the Taguchi method is to optimize the parameter settings so as to improve the robustness of the evolutionary algorithm by maximizing the S/N ratio. Taguchi used the S/N ratio to measure the deviation between the response value (here, the fitness value) and the desired value. There are three categories of S/N ratio, namely nominal-is-better, smaller-is-better and larger-is-better, defined as follows [18]:

Nominal-is-better: SN_i = 10 \log_{10} \left( \frac{\bar{y}_i}{s_{y_i}^{2}} \right)   (6)

Smaller-is-better: SN_i = -10 \log_{10} \left( \frac{1}{n} \sum_{j=1}^{n} y_{ij}^{2} \right)   (7)

Larger-is-better: SN_i = -10 \log_{10} \left( \frac{1}{n} \sum_{j=1}^{n} \frac{1}{y_{ij}^{2}} \right)   (8)


where \bar{y}_i and s_{y_i}^{2} are the mean and variance of the response variable y in the ith run, respectively, y_{ij} is the response of the jth replication in the ith run, and n is the number of replications. Nominal-is-better is used if the objective is to diminish variability around a specific target, smaller-is-better if the experiment is optimized when the response is as small as possible, and larger-is-better if the experiment is optimized when the response is as large as possible [18]. In this paper, smaller-is-better is adopted, since the objective of the proposed algorithm is to minimize the SHAD [26].
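As an illustration, the smaller-is-better ratio of Eq. (7), which is the category used in this work, can be computed per run as in the following minimal sketch (the function name is illustrative):

```python
import numpy as np

def sn_smaller_is_better(y):
    """Eq. (7): S/N ratio of one experimental run from its n replication responses y_i1..y_in."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(y ** 2))

# Example: small (normalized) responses yield a large S/N ratio.
print(sn_smaller_is_better([0.18, 0.21, 0.15, 0.20]))  # about 14.6
```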

3. The proposed methods

SSO is first implemented to solve clustering problems and is merged with KHM to capture the merits of both. In order to enhance the performance of this hybrid algorithm, the restriction on the use of KHM and the modifications to SSO are addressed in this section.

3.1 SSOKHM clustering algorithm

Similar to most population-based algorithms, a population of solutions, also called particles, is generated randomly. Encoding the solutions is the critical first step in constructing a clustering algorithm. Each solution is encoded as a finite-length string composed of the K initial centroids of the clusters. Fig. 1 illustrates a solution string for a clustering problem with K clusters, where each data object has D features, also known as dimensions or variables.

Z_1 = (z_11, z_12, ..., z_1D), ..., Z_K = (z_K1, z_K2, ..., z_KD)

Fig. 1. Example of a solution string.
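A brief sketch of this encoding, assuming the K x D centroid matrix is flattened into a single particle vector and initialized uniformly within the per-feature bounds of the dataset (the helper names are illustrative):

```python
import numpy as np

def init_particle(X, K, rng=np.random):
    """Encode one solution as a flattened string of K centroids, each with D features."""
    lb, ub = X.min(axis=0), X.max(axis=0)              # per-feature bounds of the dataset
    Z = lb + rng.random((K, X.shape[1])) * (ub - lb)   # K x D random centroids
    return Z.reshape(-1)                               # [z11, ..., z1D, ..., zK1, ..., zKD]

def decode(particle, K, D):
    """Recover the K x D centroid matrix from the solution string."""
    return particle.reshape(K, D)
```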

To overcome the drawback of KHM, KHM is incorporated with SSO to form a hybrid clustering algorithm called SSOKHM. KHM can find a near-optimal solution with fewer function evaluations than SSO, but it tends to get stuck in local optima because of its greedy characteristic [6]. To limit this harmful characteristic, the works in [9, 12] activate KHM for only four iterations in every eight iterations, i.e., at the 8th and 16th iterations. However, updating every particle with KHM wastes computational time and restricts the diversity of the population. Thus, a particle in SSOKHM is updated by KHM only at initialization and at the 4th and 8th iterations, and only when its random number falls in [0, 0.2]; this restriction on the use of KHM was obtained by trial and error. The primary steps of SSOKHM for the clustering problem are summarized as follows:

Step 1: Generate a population of particles representing the centroids of each cluster, with random positions based on the given dataset, and let iter denote the current iteration number.
Step 2: For each particle, if its random number is in [0, 0.2], activate KHM.
Step 3: Evaluate the fitness value of each particle in the population according to Eq. (1).
Step 4: Update pBest and gBest if necessary.
Step 5: Update the variables of each solution according to Eq. (5).
Step 6: If iter = 4 or 8, go to Step 2; otherwise go to Step 7.
Step 7: Stop the algorithm if the predefined number of iterations is reached; otherwise go back to Step 3.

3.2 The proposed RCS

The UM makes SSO an algorithm with the advantages of simplicity, efficiency and flexibility. However, the UM is a stochastic process, which can impact the efficiency and robustness of SSO, especially on continuous problems. Therefore, two strategies are developed to address these shortcomings. SSO exploits the RCS to obtain a better initial solution and to increase the convergence speed of SSO when dealing with the clustering problem. The RCS, inspired by Bandyopadhyay and Maulik [29], can efficiently find a better centroid for each cluster at the beginning by using the arithmetic average: after each object has been assigned to the centroid obtained by SSO, the mean of each cluster is recalculated to become the new centroid according to Eq. (9). However, the algorithm would be inefficient if the RCS were underused, and the exploration of the algorithm would be restricted if the RCS were overused. Therefore, the RCS is only used in the first β% of the iterations of the proposed algorithm; β is decided in the next section.

Z_{new,k} = \frac{1}{N_k} \sum_{X_i \in C_k} X_i   (9)
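A hedged sketch of the RCS step of Eq. (9), assuming hard assignment of each object to its nearest centroid before the means are taken (function and variable names are illustrative):

```python
import numpy as np

def rapid_centralized_strategy(X, Z):
    """Eq. (9): replace each centroid by the arithmetic mean of the objects assigned to it."""
    d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)   # N x K distances
    labels = np.argmin(d, axis=1)                               # nearest-centroid assignment
    Z_new = Z.copy()
    for k in range(Z.shape[0]):
        members = X[labels == k]
        if len(members) > 0:                                    # keep the old centroid if the cluster is empty
            Z_new[k] = members.mean(axis=0)
    return Z_new
```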


3.3 The proposed MMS

Because of the stochastic UM, SSO has an inadequate capability to find nearby extreme points, which diminishes its robustness. Applying a local search scheme is one way to overcome this exploitation problem. In this paper, a new scheme called the MMS is added to the UM, and the pBest position is discarded in favor of searching near the gBest position to improve solution quality. The new variable produced by the MMS is calculated as in Eq. (10), and the modified UM is given in Eq. (11):

MMS(x_j) = g_j + \alpha \cdot \delta \cdot (Ub_j - Lb_j) \cdot \frac{Niter - iter}{Niter}   (10)

where α is a predetermined parameter that controls the movement range of the variables, δ is a random number uniformly distributed in [-1, 1], Lb_j and Ub_j are the lower and upper bounds of the jth variable, respectively, Niter is the total number of iterations and iter is the current iteration number. As the number of iterations increases, the movement range of the variables becomes smaller and more stable.

x_{ij}^{t+1} = \begin{cases} x_{ij}^{t} & \text{if } \rho \in [0, C_w) \\ g_j & \text{if } \rho \in [C_w, C_g) \\ MMS(x_j) & \text{if } \rho \in [C_g, 1) \end{cases}   (11)
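The following minimal sketch of Eqs. (10)-(11) assumes the same per-dimension bounds as before; the helper names (mms, modified_update) are illustrative, and the default parameter values are the tuned settings reported later for RMSSOKHM.

```python
import numpy as np

def mms(gbest, lb, ub, alpha, it, n_iter, rng=np.random):
    """Eq. (10): a small random move around gBest whose range shrinks as iterations progress."""
    delta = rng.uniform(-1.0, 1.0, size=gbest.shape)
    return gbest + alpha * delta * (ub - lb) * (n_iter - it) / n_iter

def modified_update(x, gbest, lb, ub, Cw=0.2, Cg=0.9, alpha=0.05, it=0, n_iter=100, rng=np.random):
    """Eq. (11): keep the variable, copy it from gBest, or apply MMS; pBest is no longer used."""
    rho = rng.random(x.shape)
    return np.where(rho < Cw, x,
           np.where(rho < Cg, gbest,
                    mms(gbest, lb, ub, alpha, it, n_iter, rng)))
```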

Based on the above description, KHM and modified SSO are incorporated to form a hybrid clustering algorithm called RMSSOKHM. The steps of the proposed algorithm are shown in Fig. 2.


Fig. 2 summarizes the procedure of RMSSOKHM. Initial particles are generated randomly and refined by the rapid centralized strategy; each particle whose random number is not larger than 0.2 is improved by executing the KHM method for 4 iterations before the fitness function is evaluated and gBest is initialized. Then, for iter = 1 to Niter and for every particle and every variable, a random number ρ selects the update source: the original variable is kept if 0 ≤ ρ < Cw, it is replaced by gBest if Cw ≤ ρ < Cg, and the MMS is applied if Cg ≤ ρ < 1. The rapid centralized strategy is applied while iter < β% × Niter; at iter = 4 or 8, a particle whose random number is not larger than 0.2 executes the KHM method for 4 iterations. The fitness function is then evaluated, gBest is updated if necessary, and the algorithm stops once iter = Niter.

Fig. 2. Flowchart of the RMSSOKHM algorithm.

4. Experiments and results

In order to find the best settings for the parameters of the proposed algorithm, the Taguchi method is applied in this section, and the proposed RMSSOKHM algorithm is then evaluated on nine benchmark datasets [30] and compared with the most recent well-known algorithms presented in the literature. All of the experiments are implemented in MATLAB R2012b running on a computer equipped with an Intel 2.4 GHz CPU and 12 GB of memory.

4.1 Data sets

For evaluating the performance of the proposed hybrid algorithm, nine datasets adapted from http://archive.ics.uci.edu/ml are tested. These datasets range in size from about one hundred to roughly 1,500 objects, and their feature dimensionality spans a wide range, from 4 to 309. They have no missing data and all features are numeric. According to the categorization of problem size described by Kudo and Sklansky [31], the datasets can be grouped into three categories in terms of the number of features: small with 0 < d ≤ 19, medium with 20 ≤ d ≤ 49 and large with d ≥ 50. The characteristics of these datasets are summarized in Table 2.

Table 2 Characteristics of the considered data sets.

Categorization  Dataset      Features (d)  Objects (n)  Clusters (k)
Small           Iris         4             150          3
Small           Cancer       9             683          2
Small           CMC          9             1473         3
Small           Glass        9             214          6
Small           Wine         13            178          3
Medium          WDBC         30            569          2
Medium          Ionosphere   34            351          2
Large           Sonar        60            208          2
Large           LSVT voice   309           126          2

4.2 The parameter settings

In order to optimize the parameter settings of RMSSOKHM, four datasets from Table 2 are selected as test datasets: Iris, Cancer, WDBC and Sonar. The selected datasets come from different size categories, so the size differences are represented in the test datasets; since the time cost is directly related to the number of selected datasets, as few test datasets as possible are selected. The performance of RMSSOKHM strongly depends on the parameters Cw, Cg, α and β. In this work, the best settings of these parameters, which are the four control factors in the experiments, are determined by using the Taguchi method. Furthermore, the performance can also be affected by the dataset and by the value of p. In order to minimize the sensitivity of the performance to these influences, the value of p and the dataset are regarded as noise factors, each with four levels. The factors and levels of concern for the experiments are listed in Table 3.

Table 3 Factors and levels for the experiments of parameter settings.

Control factors   Level 1   Level 2   Level 3   Level 4
Cw                0.05      0.1       0.2       0.3
Cg                0.7       0.8       0.9       0.95
α                 0.05      0.01      0.1       0.2
β                 0         5         10        20

Noise factors     Level 1   Level 2   Level 3   Level 4
The value of p    2.5       3.0       3.5       4.0
Dataset           Iris      Cancer    WDBC      Sonar

Following the Taguchi method, an L16 (4^4) orthogonal array, which has 4 four-level factors and 16 runs, is chosen to be the outer array for the control factors, as shown in Table 4. The ones, twos, threes and fours in the columns of the outer array correspond to the levels of these factors. The inner array, containing the 16 combinations of the noise factors, is used to vary the experimental conditions; the integers in its columns indicate the actual levels of the factors, as introduced in Table 5.


Table 4 L16 orthogonal array for the control factors.

Run   Cw   Cg   α   β
1     1    1    1   1
2     1    2    2   2
3     1    3    3   3
4     1    4    4   4
5     2    1    2   3
6     2    2    1   4
7     2    3    4   1
8     2    4    3   2
9     3    1    3   4
10    3    2    4   3
11    3    3    1   2
12    3    4    2   1
13    4    1    4   2
14    4    2    3   1
15    4    3    2   4
16    4    4    1   3

Table 5 The inner array for the noise factors.

Replication   p   Dataset
1             1   1
2             1   2
3             1   3
4             1   4
5             2   1
6             2   2
7             2   3
8             2   4
9             3   1
10            3   2
11            3   3
12            3   4
13            4   1
14            4   2
15            4   3
16            4   4

Since each run has 16 replications, the total number of repetitions of RMSSOKHM is 16 × 16 = 256 when the inner array is combined with the outer array. For each repetition, the response is the mean of the objective function value obtained after 30 generations, with 20 iterations in each generation, using RMSSOKHM with the specified settings for each control factor level and the specified experimental conditions for each noise factor level. Before the calculation of the S/N ratios, all responses are normalized by the min-max normalization in Eq. (12) so that they share the same scale [6]:

y'_{ij} = \frac{y_{ij} - \min(y_i)}{\max(y_i) - \min(y_i)}   (12)
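For clarity, the scaling of Eq. (12) applied to a set of responses can be sketched as follows (the function name is illustrative, and the responses shown are made-up values):

```python
import numpy as np

def minmax_normalize(y):
    """Eq. (12): rescale a vector of responses into [0, 1]."""
    y = np.asarray(y, dtype=float)
    return (y - y.min()) / (y.max() - y.min())

print(minmax_normalize([148.8, 151.2, 149.5]))  # [0.0, 1.0, 0.29...]
```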

The aim of the experiment is to minimize the objective function value obtained from RMSSOKHM, so the S/N ratio for smaller-is-better is computed for each experimental run. The results, including the S/N ratio, mean and standard deviation (STD) of the response, are described in Table 6. The main effects plots for the S/N ratio, mean and STD are depicted in Figs. 3-5, respectively. For maximizing the S/N ratio, Cw = 0.2, Cg = 0.9, α = 0.05 and β = 10 are the best settings, which agree with those for minimizing the mean and STD, except that β = 20 is slightly better than β = 10 in terms of the STD.


Table 6 The results of the experiment.

Run   S/N ratio   Mean     STD
1     3.2957      0.5894   0.3590
2     11.3250     0.1819   0.2082
3     11.7388     0.2005   0.1691
4     3.1729      0.6250   0.3116
5     9.7764      0.2475   0.2167
6     3.0011      0.6707   0.2336
7     8.7540      0.2738   0.2493
8     7.3351      0.3294   0.2851
9     8.8029      0.2548   0.2669
10    12.4235     0.1487   0.1935
11    5.2314      0.4743   0.2825
12    8.0130      0.3145   0.2511
13    2.0498      0.7106   0.3560
14    3.2304      0.6404   0.2637
15    11.9214     0.2006   0.1600
16    3.1979      0.5715   0.4030

Fig. 3. Main effects plot for S/N ratios of the control factors.

Fig. 4. Main effects plot for means of the control factors.

Fig. 5. Main effects plot for STD of the control factors.

4.3 Experimental and statistical results

4.3.1 Experimental results

In order to evaluate the performance of the proposed algorithm, well-known and recent approaches are implemented for comparison, including KHM [5], PSOKHM [9] and IGSAKHM [12]. The algorithmic parameters of each of these approaches used in this study are provided in Table 7, and all approaches are run for 20 iterations in each generation. The parameter settings of PSOKHM and IGSAKHM can be found in the literature.

Table 7 Parameter settings.

Algorithm   Parameters
PSOKHM      Particle size = 20, c1 = 1.49618, c2 = 1.49618, w = 0.7298
IGSAKHM     Particle size = 20, G0 = 100, α = 20
SSOKHM      Particle size = 20, Cw = 0.1, Cp = 0.4, Cg = 0.9
RMSSOKHM    Particle size = 20, Cw = 0.2, Cg = 0.9, α = 0.05, β (%) = 10

The performance of the algorithms is evaluated and compared in terms of two criteria:

(1) The sum of the harmonic average distances (SHAD): the sum of the harmonic means of the distances from each data object to all centers, as defined in Eq. (1), which is the value of the objective function of KHM. A smaller SHAD corresponds to a higher-quality solution.

(2) The F-measure: this criterion integrates the ideas of precision and recall from information retrieval [32, 33]. The precision p and the recall r are defined as

p(i, j) = \frac{N_{ij}}{N_j}   (13)

r(i, j) = \frac{N_{ij}}{N_i}   (14)

where N_{ij} is the number of members of class i (given by the class labels of the dataset) in cluster j, N_j is the number of members in cluster j, and N_i is the number of members in class i. The F-measure corresponding to a class i and a cluster j is

F(i, j) = \frac{(b^2 + 1) \cdot p(i, j) \cdot r(i, j)}{b^2 \cdot p(i, j) + r(i, j)}   (15)

where b = 1, i.e., the precision and the recall have equal weighting. The overall F-measure of the clustering for a dataset of size N is defined by

F = \sum_{i} \frac{N_i}{N} \max_{j} \{ F(i, j) \}   (16)

where F is limited to the interval [0, 1]; the larger the value of F, the better the result obtained by the clustering algorithm. The experimental results, including the SHAD, the F-measure and the CPU time (in seconds), are summarized in Tables 8-11 for p = 2.5, 3, 3.5 and 4. These results are given as the mean and the standard deviation (in brackets) of the values obtained after 30 generations.
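A minimal sketch of Eqs. (13)-(16), assuming ground-truth class labels and hard cluster labels are available as integer arrays (the function name f_measure is illustrative):

```python
import numpy as np

def f_measure(true_labels, cluster_labels, b=1.0):
    """Overall F-measure of a clustering (Eqs. 13-16) given ground-truth class labels."""
    true_labels, cluster_labels = np.asarray(true_labels), np.asarray(cluster_labels)
    N = len(true_labels)
    total = 0.0
    for i in np.unique(true_labels):                    # classes
        Ni = np.sum(true_labels == i)
        best = 0.0
        for j in np.unique(cluster_labels):             # clusters
            Nj = np.sum(cluster_labels == j)
            Nij = np.sum((true_labels == i) & (cluster_labels == j))
            if Nij == 0:
                continue
            p, r = Nij / Nj, Nij / Ni                   # Eqs. (13)-(14)
            F = (b**2 + 1) * p * r / (b**2 * p + r)     # Eq. (15)
            best = max(best, F)
        total += Ni / N * best                          # Eq. (16): weighted best match per class
    return total
```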


Table 8 Simulation results of all algorithms when p = 2.5.

Dataset      Metric      KHM                    PSOKHM                 IGSAKHM                SSOKHM                 RMSSOKHM
Iris         SHAD        151.214(12.655)        148.901(0.003)         148.903(0.007)         148.930(0.077)         148.830(0.007)
             F-measure   0.8741(0.061)          0.8853(0.000)          0.8853(0.000)          0.8853(0.000)          0.8857(0.002)
             CPU time    0.066(0.007)           1.091(0.043)           1.121(0.036)           0.906(0.062)           0.807(0.040)
Cancer       SHAD        58,615(7,956)          57,067(55)             57,142(33)             57,122(36)             56,830(7)
             F-measure   0.9461(0.082)          0.9605(0.001)          0.9605(0.002)          0.9602(0.000)          0.9610(0.001)
             CPU time    0.248(0.007)           4.012(0.195)           3.902(0.131)           3.165(0.156)           2.825(0.186)
CMC          SHAD        96,197(4)              96,198(8)              96,200(9)              96,297(235)            96,104(8)
             F-measure   0.3980(0.001)          0.3972(0.002)          0.3980(0.002)          0.3972(0.001)          0.3987(0.003)
             CPU time    0.682(0.011)           12.243(0.459)          11.661(0.268)          9.623(0.526)           8.749(0.458)
Glass        SHAD        1,196(33)              1,185(23)              1,198(17)              1,209(29)              1,115(14)
             F-measure   0.4928(0.018)          0.5012(0.019)          0.4999(0.024)          0.5033(0.017)          0.5107(0.004)
             CPU time    0.172(0.007)           3.480(0.132)           3.547(0.086)           2.620(0.123)           2.478(0.112)
Wine         SHAD        77,063,378(8,796,892)  75,346,808(13,291)     75,437,838(231,715)    75,378,957(85,696)     75,333,372(738)
             F-measure   0.6783(0.027)          0.6878(0.004)          0.6882(0.008)          0.6868(0.007)          0.6891(0.001)
             CPU time    0.088(0.007)           1.664(0.064)           1.780(0.042)           1.295(0.067)           1.281(0.064)
WDBC         SHAD        3,087,756,051(2,178)   3,087,730,587(16,811)  3,087,746,430(10,367)  3,087,836,321(141,613) 3,087,704,450(7,867)
             F-measure   0.8101(0.000)          0.8101(0.000)          0.8101(0.000)          0.8101(0.000)          0.8101(0.000)
             CPU time    0.255(0.006)           4.806(0.185)           4.793(0.184)           3.789(0.208)           3.471(0.189)
Ionosphere   SHAD        2,805.202(0.249)       2,804.903(0.261)       2,804.979(0.451)       2,804.949(0.362)       2,801.498(0.676)
             F-measure   0.6987(0.002)          0.7034(0.002)          0.7057(0.001)          0.7057(0.003)          0.7060(0.003)
             CPU time    0.160(0.007)           3.117(0.111)           3.228(0.067)           2.402(0.115)           2.390(0.104)
Sonar        SHAD        199.577(0.078)         199.479(0.023)         199.503(0.055)         199.552(0.081)         199.406(0.069)
             F-measure   0.5211(0.006)          0.5410(0.005)          0.5445(0.007)          0.5434(0.006)          0.5461(0.006)
             CPU time    0.113(0.006)           2.425(0.065)           2.816(0.055)           1.889(0.095)           2.038(0.054)
LSVT         SHAD*       2.0056(4.4×10^20)      2.0056(4.3×10^21)      2.0056(3.9×10^20)      2.0057(3.9×10^23)      2.0055(1.2×10^22)
             F-measure   0.5306(0.000)          0.5306(0.000)          0.5306(0.000)          0.5312(0.003)          0.5316(0.000)
             CPU time    0.2964(0.027)          8.8723(0.465)          10.6892(0.232)         6.1974(0.144)          7.3950(0.202)

The best values are indicated in bold type in the original. * denotes that the mean of SHAD is the value × 10^27.

Table 9 Simulation results of all algorithms when p = 3.

Dataset      Metric      KHM                                   PSOKHM                       IGSAKHM                      SSOKHM                       RMSSOKHM
Iris         SHAD        129.194(17.030)                       126.072(0.013)               126.074(0.004)               126.105(0.047)               125.771(0.012)
             F-measure   0.8774(0.062)                         0.8907(0.002)                0.8913(0.002)                0.8892(0.003)                0.8916(0.001)
             CPU time    0.073(0.006)                          1.220(0.045)                 1.204(0.022)                 0.923(0.051)                 0.844(0.048)
Cancer       SHAD        113,644(135)                          112,956(453)                 113,551(152)                 113,325(151)                 111,285(32)
             F-measure   0.9544(0.001)                         0.9641(0.002)                0.9647(0.000)                0.9642(0.001)                0.9650(0.001)
             CPU time    0.257(0.007)                          4.155(0.141)                 4.035(0.101)                 3.345(0.190)                 2.925(0.130)
CMC          SHAD        186,998(32)                           186,978(25)                  186,976(17)                  187,027(119)                 186,509(39)
             F-measure   0.3967(0.001)                         0.3966(0.001)                0.3962(0.001)                0.3973(0.002)                0.3978(0.002)
             CPU time    0.712(0.017)                          12.302(0.402)                11.749(0.359)                9.516(0.526)                 8.235(0.503)
Glass        SHAD        1,444(124)                            1,425(56)                    1,424(64)                    1,500(87)                    1,392(32)
             F-measure   0.5035(0.019)                         0.5067(0.009)                0.5042(0.011)                0.5009(0.021)                0.5061(0.008)
             CPU time    0.175(0.005)                          3.620(0.115)                 3.689(0.077)                 2.701(0.113)                 2.545(0.102)
Wine         SHAD        1,406,160,975(1,949,625,665)          1,051,507,686(2,770,708)     1,051,293,029(3,847,935)     1,054,148,695(5,193,193)     1,049,005,478(66,004)
             F-measure   0.5514(0.010)                         0.5532(0.007)                0.5601(0.015)                0.5551(0.009)                0.5591(0.000)
             CPU time    0.092(0.010)                          1.593(0.087)                 1.656(0.034)                 1.173(0.063)                 1.156(0.057)
WDBC         SHAD        154,975,361,252(144,969,372,513)      84,330,558,433(11,922,363)   84,333,068,361(8,996,139)    84,349,767,821(26,981,192)   84,313,254,455(1,194,279)
             F-measure   0.7703(0.022)                         0.7766(0.000)                0.7766(0.000)                0.7766(0.000)                0.7766(0.000)
             CPU time    0.234(0.007)                          4.383(0.116)                 4.441(0.101)                 3.499(0.154)                 3.204(0.172)
Ionosphere   SHAD        2,659.601(39.898)                     2,643.782(3.386)             2,644.155(3.426)             2,644.021(2.818)             2,627.126(1.834)
             F-measure   0.6968(0.016)                         0.7025(0.004)                0.7011(0.001)                0.7025(0.004)                0.7032(0.004)
             CPU time    0.147(0.005)                          2.894(0.099)                 2.946(0.072)                 2.187(0.091)                 2.132(0.092)
Sonar        SHAD        114.824(0.139)                        114.604(0.055)               114.638(0.077)               114.612(0.100)               114.364(0.120)
             F-measure   0.5416(0.011)                         0.5417(0.006)                0.5418(0.005)                0.5411(0.006)                0.5470(0.007)
             CPU time    0.105(0.005)                          2.274(0.081)                 2.657(0.068)                 1.758(0.065)                 1.935(0.087)
LSVT         SHAD*       7.6234(4.6×10^32)                     2.4268(2.1×10^21)            2.4296(4.8×10^29)            2.4277(3.1×10^29)            2.4255(7.6×10^27)
             F-measure   0.5307(0.046)                         0.5372(0.006)                0.5355(0.007)                0.5359(0.007)                0.5372(0.005)
             CPU time    0.2881(0.015)                         8.3819(0.242)                10.5909(0.165)               6.0601(0.127)                7.1901(0.132)

The best values are indicated in bold type in the original. * denotes that the mean of SHAD is the value × 10^32.

Table 10 Simulation results of all algorithms when p = 3.5.

Dataset      Metric      KHM                                     PSOKHM                            IGSAKHM                           SSOKHM                            RMSSOKHM
Iris         SHAD        578.996(1,382.420)                      109.862(0.107)                    109.876(0.095)                    110.201(0.303)                    109.158(0.020)
             F-measure   0.8446(0.107)                           0.8911(0.002)                     0.8911(0.002)                     0.8892(0.003)                     0.8918(0.000)
             CPU time    0.069(0.011)                            1.127(0.050)                      1.124(0.038)                      0.890(0.049)                      0.792(0.044)
Cancer       SHAD        231,943(365)                            228,338(2,479)                    231,147(1,124)                    231,323(631)                      222,177(169)
             F-measure   0.9648(0.001)                           0.9648(0.001)                     0.9653(0.001)                     0.9650(0.001)                     0.9653(0.000)
             CPU time    0.238(0.006)                            4.014(0.171)                      3.848(0.075)                      3.104(0.153)                      2.735(0.181)
CMC          SHAD        676,077(1,608,454)                      381,013(618)                      380,972(507)                      381,808(948)                      379,089(116)
             F-measure   0.3970(0.008)                           0.3982(0.002)                     0.3979(0.002)                     0.3980(0.002)                     0.3981(0.001)
             CPU time    0.670(0.013)                            11.943(0.445)                     11.351(0.301)                     9.056(0.427)                      8.195(0.423)
Glass        SHAD        1,961(230)                              1,858(1)                          1,858(3)                          1,929(69)                         1,844(62)
             F-measure   0.4926(0.029)                           0.5048(0.002)                     0.5056(0.002)                     0.4906(0.038)                     0.5084(0.029)
             CPU time    0.164(0.008)                            3.420(0.147)                      3.521(0.100)                      2.537(0.120)                      2.426(0.092)
Wine         SHAD        55,646,787,982(116,052,454,131)         14,527,915,213(340,383,285)       16,092,546,666(1,193,053,484)     14,418,659,086(203,761,115)       14,192,527,604(2,101,660)
             F-measure   0.5358(0.060)                           0.5478(0.020)                     0.5489(0.001)                     0.5511(0.017)                     0.5633(0.056)
             CPU time    0.085(0.006)                            1.580(0.056)                      1.654(0.046)                      1.225(0.065)                      1.177(0.066)
WDBC         SHAD        13,999,283,696,423(8,930,941,714,016)   2,436,817,685,774(21,828,969,746) 2,439,845,199,776(12,445,393,949) 2,441,755,797,387(25,031,811,702) 2,423,420,174,654(168,773,490)
             F-measure   0.7086(0.150)                           0.7482(0.002)                     0.7484(0.003)                     0.7484(0.002)                     0.7484(0.000)
             CPU time    0.239(0.007)                            4.495(0.134)                      4.512(0.135)                      3.486(0.183)                      3.248(0.160)
Ionosphere   SHAD        2,566.208(20.436)                       2,553.783(6.723)                  2,554.206(7.434)                  2,550.848(7.369)                  2,517.773(3.435)
             F-measure   0.6998(0.005)                           0.7000(0.003)                     0.7017(0.006)                     0.7004(0.005)                     0.7035(0.006)
             CPU time    0.153(0.005)                            2.934(0.114)                      3.041(0.090)                      2.251(0.081)                      2.187(0.117)
Sonar        SHAD        68.049(0.303)                           67.698(0.081)                     67.706(0.082)                     67.651(0.078)                     67.316(0.130)
             F-measure   0.5317(0.015)                           0.5472(0.009)                     0.5448(0.007)                     0.5455(0.009)                     0.5481(0.007)
             CPU time    0.107(0.006)                            2.281(0.070)                      2.657(0.078)                      1.737(0.079)                      1.938(0.092)
LSVT         SHAD*       17.093(1.5×10^37)                       3.0791(7.5×10^35)                 3.2576(4.3×10^36)                 3.0543(5.1×10^35)                 3.0189(1.7×10^33)
             F-measure   0.5294(0.050)                           0.5303(0.002)                     0.5303(0.002)                     0.5305(0.002)                     0.5323(0.000)
             CPU time    0.2917(0.008)                           8.1921(0.204)                     10.4853(0.169)                    5.9483(0.159)                     6.9852(0.109)

The best values are indicated in bold type in the original. * denotes that the mean of SHAD is the value × 10^37.

Table 11 Simulation results of all algorithms when p = 4.

Dataset      Metric      KHM                                      PSOKHM                            IGSAKHM                            SSOKHM                             RMSSOKHM
Iris         SHAD        1,314.553(3,073.137)                     100.464(2.242)                    101.494(3.650)                     99.964(1.554)                      96.858(0.017)
             F-measure   0.8169(0.137)                            0.8890(0.004)                     0.8870(0.003)                      0.8877(0.003)                      0.8918(0.000)
             CPU time    0.066(0.007)                             1.122(0.041)                      1.128(0.037)                       0.881(0.046)                       0.793(0.038)
Cancer       SHAD        1,007,793(759,156)                       468,379(5,330)                    478,437(3,243)                     479,933(1,016)                     449,480(791)
             F-measure   0.9284(0.111)                            0.9641(0.002)                     0.9649(0.001)                      0.9649(0.001)                      0.9649(0.001)
             CPU time    0.239(0.007)                             3.918(0.174)                      3.835(0.131)                       3.128(0.167)                       2.817(0.128)
CMC          SHAD        2,621,873(6,361,576)                     812,578(8,966)                    810,317(8,614)                     822,890(12,683)                    800,000(284)
             F-measure   0.3964(0.006)                            0.3971(0.003)                     0.3978(0.001)                      0.3971(0.001)                      0.3982(0.002)
             CPU time    0.662(0.012)                             11.842(0.440)                     11.354(0.290)                      9.111(0.455)                       8.111(0.415)
Glass        SHAD        3,427(3,194)                             2,545(6)                          2,549(17)                          2,625(89)                          2,447(54)
             F-measure   0.4644(0.045)                            0.4816(0.040)                     0.4876(0.034)                      0.4740(0.041)                      0.4844(0.035)
             CPU time    0.160(0.009)                             3.374(0.099)                      3.497(0.082)                       2.538(0.137)                       2.443(0.091)
Wine         SHAD        3,199,004,844,608(4,621,596,106,542)     205,340,999,961(12,384,939,199)   249,039,167,175(28,319,085,987)    201,278,587,453(6,581,291,999)     193,990,364,268(20,210,605)
             F-measure   0.5265(0.066)                            0.5396(0.034)                     0.5585(0.042)                      0.5412(0.026)                      0.5903(0.025)
             CPU time    0.088(0.006)                             1.559(0.062)                      1.644(0.047)                       1.199(0.046)                       1.192(0.060)
WDBC         SHAD        580,699,023,772,159(396,913,673,050,140) 72,946,105,323,910(772,518,371,594) 73,033,921,328,920(1,182,612,734,970) 72,828,741,866,917(1,070,214,530,259) 71,898,088,221,483(5,999,479,917)
             F-measure   0.7225(0.145)                            0.7228(0.015)                     0.7286(0.010)                      0.7284(0.006)                      0.7287(0.000)
             CPU time    0.234(0.010)                             4.480(0.133)                      4.480(0.114)                       3.471(0.165)                       3.315(0.141)
Ionosphere   SHAD        2,579.307(103.817)                       2,510.374(8.774)                  2,517.680(8.960)                   2,507.539(8.419)                   2,449.031(5.293)
             F-measure   0.6820(0.037)                            0.7007(0.005)                     0.6984(0.006)                      0.6983(0.006)                      0.7016(0.005)
             CPU time    0.148(0.006)                             2.884(0.097)                      3.001(0.086)                       2.223(0.128)                       2.206(0.114)
Sonar        SHAD        41.332(0.477)                            40.902(0.085)                     40.945(0.105)                      40.810(0.059)                      40.465(0.101)
             F-measure   0.5487(0.022)                            0.5532(0.014)                     0.5527(0.016)                      0.5524(0.012)                      0.5598(0.009)
             CPU time    0.106(0.006)                             2.250(0.076)                      2.633(0.065)                       1.745(0.075)                       1.902(0.071)
LSVT         SHAD*       47.671(3.0×10^43)                        4.0269(1.5×10^41)                 4.7291(1.2×10^42)                  4.0250(1.3×10^41)                  3.8434(5.8×10^39)
             F-measure   0.5296(0.054)                            0.5313(0.002)                     0.5314(0.011)                      0.5310(0.002)                      0.5323(0.000)
             CPU time    0.2829(0.008)                            8.1131(0.130)                     10.5373(0.188)                     5.8698(0.136)                      6.8672(0.116)

The best values are indicated in bold type in the original. * denotes that the mean of SHAD is the value × 10^42.

From the results in Tables 8-11, the performance of RMSSOKHM is better than that of SSOKHM in terms of SHAD, F-measure and CPU time on most of the selected datasets for each p, which indicates that the two proposed strategies play their intended roles in dealing with clustering problems and enhance the performance of RMSSOKHM.


In terms of the SHAD, the results clearly reveal that RMSSOKHM yields higher-quality solutions than KHM, PSOKHM and IGSAKHM on every selected dataset for each p. In other words, RMSSO enhances the capacity of KHM to escape from local optima as well as to converge to the global optimum. The F-measure obtained by RMSSOKHM is also higher than those of the other approaches on most of the datasets for each p, which means that the clustering quality of the proposed hybrid algorithm, as measured against the class labels, is superior to that of its competitors. The number of function evaluations of each algorithm per generation is reported in Table 12. Because the simple UM of RMSSO searches the solution space very efficiently, RMSSOKHM consumes less CPU time and requires fewer function evaluations than PSOKHM and IGSAKHM. According to these results, RMSSOKHM is a more promising alternative to KHM than its competitors.

Table 12 The number of function evaluations of each algorithm per generation.

Algorithm              PSOKHM   IGSAKHM   SSOKHM   RMSSOKHM
Function evaluations   560      560       432*     432*

*: Expected values, since each particle of SSOKHM and RMSSOKHM has a 20% chance of activating KHM in each iteration.

4.3.2 Statistical results

In order to confirm whether the proposed RMSSOKHM offers a significant improvement, two nonparametric statistical analyses, the Friedman test and the Iman-Davenport test, are carried out based on the two criteria used in this work. If there are statistically significant differences among all algorithms, then Holm's method and the Bonferroni-Dunn test are employed as post hoc tests to compare the proposed algorithm (the control algorithm) with the rest of the algorithms. A significance level of α = 0.05 is used to determine whether the hypothesis is rejected in all cases. The details of these tests are introduced in Derrac et al. [34].

For each problem i, the jth algorithm is assigned a rank 1 ≤ r_i^j ≤ k, where 1 and k denote the best and worst result, respectively, and k is the number of algorithms. The average Friedman rank of each algorithm is then obtained by R_j = \frac{1}{n} \sum_{i=1}^{n} r_i^j, where n = 36, since the 4 values of p and the 9 datasets constitute 36 problems. Table 13 depicts the average ranks computed through the Friedman test based on the two criteria used in this work, the SHAD and the F-measure. As can be seen in the table, the proposed RMSSOKHM is the best performing algorithm, followed by PSOKHM or IGSAKHM (depending on whether the SHAD or the F-measure is considered), SSOKHM and KHM, successively.

Table 13 Average Friedman ranks of clustering algorithms based on the SHAD and F-measure.

Algorithm    SHAD    F-measure
KHM          4.722   4.583
PSOKHM       2.472   3.153
IGSAKHM      3.389   2.708
SSOKHM       3.417   3.236
RMSSOKHM     1.000   1.319

The Friedman statistic χ_F^2 and the Iman-Davenport statistic F_ID can be derived by Eqs. (17) and (18), respectively. The results, including the p-values computed through the Friedman and Iman-Davenport tests based on the SHAD and the F-measure, are given in Table 14; all of them strongly suggest the existence of significant differences among the considered approaches on these two criteria.

\chi_F^2 = \frac{12n}{k(k+1)} \left[ \sum_{j} R_j^2 - \frac{k(k+1)^2}{4} \right]   (17)

F_{ID} = \frac{(n-1)\,\chi_F^2}{n(k-1) - \chi_F^2}   (18)
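Both statistics can be reproduced from the average ranks in Table 13, as in this minimal sketch (the function name is illustrative; the printed values agree with Table 14 up to rounding):

```python
import numpy as np

def friedman_iman_davenport(R, n):
    """Eqs. (17)-(18) from the average Friedman ranks R of k algorithms over n problems."""
    R = np.asarray(R, dtype=float)
    k = len(R)
    chi2_F = 12 * n / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4)  # Eq. (17)
    F_ID = (n - 1) * chi2_F / (n * (k - 1) - chi2_F)                           # Eq. (18)
    return chi2_F, F_ID

# SHAD ranks from Table 13 (k = 5 algorithms, n = 36 problems)
print(friedman_iman_davenport([4.722, 2.472, 3.389, 3.417, 1.000], n=36))  # roughly (109, 109)
```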

Table 14 The results of the Friedman and Iman-Davenport tests based on the SHAD and F-measure.

Method           Statistic (SHAD)   Statistic (F-measure)   p-Value (SHAD)   p-Value (F-measure)   Hypothesis (SHAD)   Hypothesis (F-measure)
Friedman         109                79.133                  0.000            0.000                 Rejected            Rejected
Iman-Davenport   109                42.698                  0.000            0.000                 Rejected            Rejected

For determining whether there are sufficient statistical differences between RMSSOKHM and the remaining algorithms, Holm's method is conducted as a post hoc test. The test statistic z-value for comparing the ith algorithm with the proposed algorithm is calculated by Eq. (19), where R_c is the average Friedman rank of the proposed algorithm. The results given in Table 15 show that all p-values are smaller than α = 0.05, which indicates that the proposed algorithm, RMSSOKHM, is statistically better than KHM, PSOKHM, IGSAKHM and SSOKHM regarding both the SHAD and the F-measure.

z_i = \frac{R_i - R_c}{\sqrt{\dfrac{k(k+1)}{6n}}}   (19)


Table 15 Results of Holm's method comparing RMSSOKHM with the other clustering algorithms based on the SHAD and F-measure.

Algorithm   z-Value (SHAD)   z-Value (F-measure)   p-Value (SHAD)   p-Value (F-measure)   Hypothesis (SHAD)   Hypothesis (F-measure)
KHM         9.988            8.758                 0.000            0.000                 Rejected            Rejected
PSOKHM      3.950            4.919                 0.000            0.000                 Rejected            Rejected
IGSAKHM     6.410            3.727                 0.000            0.000                 Rejected            Rejected
SSOKHM      6.485            5.143                 0.000            0.000                 Rejected            Rejected

Using the critical difference (CD) for all comparisons makes the Bonferroni-Dunn test simple to calculate and to visualize; the details of this test are introduced by Demsar [35]. The CD is a threshold for identifying whether the performance of two algorithms is significantly different: this is the case if the corresponding average Friedman ranks differ by at least this threshold. The CD can be derived as follows:

CD = q \sqrt{\frac{k(k+1)}{6n}}   (20)

where k and n are the number of algorithms and problems, respectively, and q is the critical value based on the Studentized range statistic divided by \sqrt{2}, as shown in Table 16.

Table 16 Critical values for the two-tailed Bonferroni-Dunn test [35].

#Algorithms   2       3       4       5       6       7       8       9       10
q_0.05        1.960   2.241   2.394   2.498   2.576   2.638   2.690   2.724   2.733

For five algorithms the critical value is 2.498, and the corresponding CD is 2.498 × \sqrt{5 \cdot 6 / (6 \cdot 36)} = 0.93.

The differences of the average Friedman ranks between the algorithms are calculated in terms of the SHAD and the F-measure, as shown in Tables 17 and 18, respectively. The differences involving the proposed algorithm, based on both the SHAD and the F-measure, are larger than the CD; thus the conclusion, namely that the performance of the proposed algorithm is significantly better than that of the other competitors in terms of the SHAD and F-measure, is the same as the result of Holm's method. Furthermore, PSOKHM and IGSAKHM can be identified as belonging to the same group, since the difference between these two algorithms is 0.917 ≤ CD = 0.93 based on the SHAD, as shown in Table 17. Likewise, SSOKHM and IGSAKHM belong to the same group. The results obtained from the Bonferroni-Dunn test can be visually represented with a simple diagram, as described in Figs. 6-7 [35]. In those figures, the arrows represent the width of the CD, and any algorithm whose rank lies outside this range is significantly different from RMSSOKHM. Also, the algorithms sharing the same solid line belong to the same group. Appendix Tables 1-7 present the best centroids obtained by RMSSOKHM on the small and medium datasets when p = 2.5; the results for all datasets and all p values are not presented due to space restrictions. The best centroids can be used to validate the best SHAD values in Appendix Table 8 by using Eq. (1).

Table 17 The difference of average Friedman ranks between the algorithms based on SHAD.

             KHM     PSOKHM   IGSAKHM   SSOKHM   RMSSOKHM
KHM          0       2.25     1.333     1.305    3.722
PSOKHM       2.25    0        0.917     0.945    1.472
IGSAKHM      1.333   0.917    0         0.028    2.389
SSOKHM       1.305   0.945    0.028     0        2.417
RMSSOKHM     3.722   1.472    2.389     2.417    0

The differences between algorithms that are smaller than the CD are indicated in bold type in the original.

Table 18 The difference of average Friedman ranks between the algorithms based on F-measure.

             KHM     PSOKHM   IGSAKHM   SSOKHM   RMSSOKHM
KHM          0       1.43     1.875     1.347    3.264
PSOKHM       1.43    0        0.445     0.083    1.834
IGSAKHM      1.875   0.445    0         0.528    1.389
SSOKHM       1.347   0.083    0.528     0        1.917
RMSSOKHM     3.264   1.834    1.389     1.917    0

The differences between algorithms that are smaller than the CD are indicated in bold type in the original.
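As a quick numerical check of the CD threshold of Eq. (20) used in Tables 17 and 18, a minimal sketch (the variable names are illustrative):

```python
import math

k, n, q = 5, 36, 2.498                      # five algorithms, 36 problems, q_0.05 from Table 16
CD = q * math.sqrt(k * (k + 1) / (6 * n))   # Eq. (20)
print(round(CD, 2))                         # 0.93
```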

Fig. 6. Comparison of all algorithms against each other with the Bonferroni-Dunn test based on SHAD.


Fig. 7. Comparison of all algorithms against each other with the Bonferroni-Dunn test based on F-measure.

5. Conclusions

In this study, a hybrid clustering algorithm based on KHM and RMSSO, which is evolved from SSO, is proposed. RMSSOKHM attempts to incorporate the advantages of KHM and SSO simultaneously: KHM offers promising initial solutions, and RMSSO improves them. RMSSO overcomes the shortcomings of SSO by adding two strategies, the RCS and the MMS, and enhances the capacity of KHM to escape from local optima. Furthermore, the parameter settings of RMSSOKHM are optimized by using the Taguchi method, based on experiments designed to minimize the SHAD. According to the results, the proposed algorithm outperforms KHM, PSOKHM and IGSAKHM on most of the datasets in terms of SHAD, F-measure and CPU time. In future research, more applications, such as classification and feature selection, should be examined using RMSSO. In addition, the combination of KHM with other algorithms is another research direction.

Acknowledgements

The authors would like to thank the Ministry of Science and Technology of the Republic of China, for financially supporting this research under Contract No. MOST 103-2410-H-145-002.

References

[1] P. Willett, Recent trends in hierarchic document clustering: a critical review, Inf. Process. Manage. 24 (5) (1988) 577-597.

[2] R. Xu, D. Wunsch, Survey of clustering algorithms, IEEE Trans. Neural Netw. 16 (3) (2005) 645-678.
[3] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, A.Y. Wu, An efficient k-means clustering algorithm: analysis and implementation, IEEE Trans. Pattern Anal. Mach. Intell. 24 (7) (2002) 881-892.
[4] S.Z. Selim, M.A. Ismail, K-means-type algorithms: a generalized convergence theorem and characterization of local optimality, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1) (1984) 81-87.
[5] B. Zhang, M. Hsu, U. Dayal, K-harmonic means - a data clustering algorithm, Technical Report HPL-1999-124, Hewlett-Packard Laboratories, 1999.
[6] Z. Gungor, A. Unler, K-harmonic means data clustering with simulated annealing heuristic, Appl. Math. Comput. 184 (2) (2007) 199-209.
[7] Z. Gungor, A. Unler, K-harmonic means data clustering with tabu-search method, Appl. Math. Model. 32 (6) (2008) 1115-1125.
[8] H. Jiang, S.H. Yi, J. Li, F.Q. Yang, X. Hu, Ant clustering algorithm with K-harmonic means clustering, Expert Syst. Appl. 37 (12) (2010) 8679-8684.
[9] F.Q. Yang, T.E.L. Sun, C.H. Zhang, An efficient hybrid data clustering method based on K-harmonic means and particle swarm optimization, Expert Syst. Appl. 36 (6) (2009) 9847-9852.
[10] A.A.A. Esmin, R.A. Coelho, S. Matwin, A review on particle swarm optimization algorithm and its variants to clustering high-dimensional data, Artif. Intell. Rev., in press, DOI 10.1007/s10462-013-9400-4.
[11] S. Alam, G. Dobbie, Y.S. Koh, P. Riddle, S.U. Rehman, Research on particle swarm optimization based clustering: a systematic review of literature and techniques, Swarm Evol. Comput. 17 (2014) 1-13.
[12] M.H. Yin, Y.M. Hu, F.Q. Yang, X.T. Li, W.X. Gu, A novel hybrid K-harmonic means and gravitational search algorithm approach for clustering, Expert Syst. Appl. 38 (8) (2011) 9319-9324.
[13] W.C. Yeh, A two-stage discrete particle swarm optimization for the problem of multiple multi-level redundancy allocation in series systems, Expert Syst. Appl. 36 (5) (2009) 9192-9200.
[14] C. Bae, W.C. Yeh, N. Wahid, Y.Y. Chung, Y. Liu, A new simplified swarm optimization (SSO) using exchange local search scheme, Int. J. Innov. Comp. Inf. Control 8 (6) (2012) 4391-4406.
[15] Y.Y. Chung, N. Wahid, A hybrid network intrusion detection system using simplified swarm optimization (SSO), Appl. Soft Comput. 12 (9) (2012) 3014-3022.
[16] R. Azizipanah-Abarghooee, A new hybrid bacterial foraging and simplified swarm optimization algorithm for practical optimal dynamic load dispatch, Int. J. Electr. Power Energy Syst. 49 (2013) 414-429.
[17] W.C. Yeh, Y.M. Yeh, P.C. Chang, Y.C. Ke, V. Chung, Forecasting wind power in the Mai Liao Wind Farm based on the multi-layer perceptron artificial neural network model with improved simplified swarm optimization, Int. J. Electr. Power Energy Syst. 55 (2014) 741-748.
[18] J.T. Tsai, J.H. Chou, T.K. Liu, Tuning the structure and parameters of a neural network by using hybrid Taguchi-genetic algorithm, IEEE Trans. Neural Netw. 17 (1) (2006) 69-80.
[19] B.W. Cheng, C.L. Chang, A study on flowshop scheduling problem combining Taguchi experimental design and genetic algorithm, Expert Syst. Appl. 32 (2) (2007) 415-421.
[20] L.K. Pan, C.C. Wang, S.L. Wei, H.F. Sher, Optimizing multiple quality characteristics via Taguchi method-based Grey analysis, J. Mater. Process. Technol. 182 (1-3) (2006) 107-116.
[21] A.R. Yildiz, A new design optimization framework based on immune algorithm and Taguchi's method, Comput. Ind. 60 (8) (2009) 613-620.
[22] B. Vahdani, R. Soltani, M. Zandieh, Scheduling the truck holdover recurrent dock cross-dock problem using robust meta-heuristics, Int. J. Adv. Manuf. Technol. 46 (5-8) (2010) 769-783.
[23] M.K. Tiwari, N. Raghavendra, S. Agrawal, S.K. Goyal, A hybrid Taguchi-immune approach to optimize an integrated supply chain design problem with multiple shipping, Eur. J. Oper. Res. 203 (1) (2010) 95-106.
[24] G. Candan, H.R. Yazgan, Genetic algorithm parameter optimisation using Taguchi method for a flexible manufacturing system scheduling problem, Int. J. Prod. Res. 53 (3) (2015) 897-915.
[25] A. Mozdgir, I. Mahdavi, I.S. Badeleh, M. Solimanpur, Using the Taguchi method to optimize the differential evolution algorithm parameters for minimizing the workload smoothness index in simple assembly line balancing, Math. Comput. Model. 57 (1-2) (2013) 137-151.
[26] V.P. Vinay, R. Sridharan, Taguchi method for parameter design in ACO algorithm for distribution-allocation in a two-stage supply chain, Int. J. Adv. Manuf. Tech. 64 (9-12) (2013) 1333-1343.
[27] G. Taguchi, Introduction to Quality Engineering: Designing Quality into Products and Processes, Asian Productivity Organization, Tokyo, 1986.
[28] G. Hamerly, C. Elkan, Alternatives to the k-means algorithm that find better clusterings, in: Proceedings of the 11th International Conference on Information and Knowledge Management, Virginia, USA, 2002, pp. 600-607.
[29] S. Bandyopadhyay, U. Maulik, An evolutionary technique based on K-means algorithm for optimal clustering in R^N, Inform. Sciences 146 (1-4) (2002) 221-237.
[30] K. Bache, M. Lichman, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, 2013.
[31] M. Kudo, J. Sklansky, Comparison of algorithms that select features for pattern classifiers, Pattern Recognit. 33 (1) (2000) 25-41.
[32] A. Dalli, Adaptation of the F-measure to cluster based lexicon quality evaluation, in: Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing, Association for Computational Linguistics, 2003, pp. 51-56.
[33] L. Zhang, Q.X. Cao, A novel ant-based clustering algorithm using the kernel method, Inform. Sciences 181 (20) (2011) 4658-4672.
[34] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm Evol. Comput. 1 (1) (2011) 3-18.
[35] J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1-30.

28

Wei-Chang Yeh is a professor of the Department of Industrial Engineering and Engineering Management at the National Tsing Hua University (NTHU), Hsinchu, Taiwan. He received his M.S. and Ph.D. from the Department of Industrial Engineering at the University of Texas at Arlington. His research interests include network reliability theory, graph theory, deadlock problem, and scheduling. Dr. Yeh is a member of IEEE and INFORMS and has received awards for his research achievement from the National Science Council.

Chyh-Ming Lai is a PhD student of the Department of Industrial Engineering and Engineering Management at the National Tsing Hua University (NTHU), Hsinchu, Taiwan. He received his M.S degree from Management College at National Defense University. His research interests are Evolutionary Computation and Data Mining.


Kuei-Hu Chang received his Bachelor’s degree in Mathematics from the Chinese Military Academy in 1996, his Master’s degree in Resources Management from the National Defense Management College in 2000, and his PhD degree in Industrial Engineering and Management from National Chiao-Tung University in 2008. Now he is an Associate Professor of the Management Sciences department in the Chinese Military Academy. His research is mainly in the fields of fuzzy logic, soft computing and reliability.

Appendix

Appendix Table 1 The best centroids obtained by RMSSOKHM on the Iris dataset when p = 2.5.
Center 1: 5.89276 2.74607 4.36118 1.42357
Center 2: 6.85932 3.07692 5.73517 2.05343
Center 3: 5.00785 3.41620 1.49449 0.26374

Appendix Table 2 The best centroids obtained by RMSSOKHM on the Cancer dataset when p = 2.5.
Center 1: 3.14609 1.59280 1.68221 1.59615 2.32683 1.78225 2.24538 1.56053 1.31128
Center 2: 7.16913 6.95097 6.90915 5.98962 5.71260 7.61734 6.21204 6.14546 3.11849

Appendix Table 3 The best centroids obtained by RMSSOKHM on the CMC dataset when p = 2.5. Center 1 44.21899 2.76968 3.22727 4.99774 0.75985 0.73825 1.93256 3.27186 0.17688

Center 2 33.60259 2.98658 3.41335 3.71702 0.78821 0.65833 2.09691 3.17712 0.08404

Center 3 23.79732 2.95841 3.37045 1.84590 0.86569 0.74083 2.27084 2.82240 0.05425

Appendix Table 4 The best centroids obtained by RMSSOKHM on the Glass dataset when p = 2.5. Center 1 1.52047 13.73937 3.40323 1.06692 71.98006 0.29222 9.30581 0.06990 0.06004

Center 2 1.51744 13.09338 3.47017 1.39771 72.84714 0.58236 8.34263 0.05265 0.06079

Center 3 1.51342 13.15253 0.32936 2.86883 70.79640 5.48011 7.03424 0.23374 0.00513

Center 4 1.52802 12.04075 0.08135 1.11917 71.94398 0.26741 14.29054 0.33306 0.11170

Center 5 1.51683 14.58701 0.22617 2.17992 73.20948 0.15188 8.66225 1.10694 0.01889

Center 6 1.52042 13.46928 0.52338 1.45827 72.92127 0.34197 11.05400 0.15955 0.05197

Appendix Table 5 The best centroids obtained by RMSSOKHM on the Wine dataset when p = 2.5.
Center 1: 13.81946 1.97045 2.39299 16.76468 105.81237 2.96249 3.04375 0.27465 1.90503 5.88321 1.09756 3.07797 1247.06163
Center 2: 12.96704 2.59908 2.44116 19.59077 106.80663 2.29423 1.80176 0.37709 1.57457 5.70651 0.92532 2.47933 775.11805
Center 3: 12.51325 2.55273 2.27398 20.78369 93.71124 2.13642 1.77749 0.40497 1.45254 4.48848 0.94552 2.55537 463.67740

Appendix Table 6 The best centroids obtained by RMSSOKHM on the WDBC dataset when p = 2.5. Center 1 11.95981 19.00468 81.57528 506.85327 0.09579 0.09660 0.06853 0.03038 0.17485 0.06505 0.29710 1.18501 2.35277 27.57236 0.00708 0.02492 0.03427 0.01078 0.02268 0.00323 14.19338 24.91795 93.01736 643.86270 0.13120 0.24004 0.20975 0.09471 0.29136 0.08107

Center 2 20.04466 21.74010 132.88104 1277.43929 0.09975 0.15196 0.19654 0.11108 0.19230 0.06182 0.81522 1.22678 6.19278 112.48486 0.00728 0.02586 0.03601 0.01596 0.02156 0.00452 24.61467 29.53357 165.59523 1912.53435 0.13862 0.37836 0.44130 0.19286 0.31023 0.08510


Appendix Table 7 The best centroids obtained by RMSSOKHM on the Ionosphere dataset when p = 2.5. Center 1 0.80325 0.00000 0.47356 0.14278 0.35908 0.20770 0.28750 0.30775 0.20222 0.38191 0.10599 0.32017 -0.02767 0.21912 -0.16045 0.16287 -0.05898 0.02802 -0.07517 -0.01593 -0.09926 -0.06690 -0.03503 -0.09549 0.05075 -0.13056 0.26980 -0.16096 0.06742 -0.06389 0.05742 -0.01401 0.06686 0.03698

Center 2 0.90792 0.00000 0.69986 -0.01760 0.71093 0.04782 0.69763 -0.01526 0.68498 0.05680 0.70691 0.05199 0.63812 0.00944 0.63793 0.01926 0.65166 -0.02612 0.63477 -0.02395 0.60390 0.04707 0.61577 -0.04312 0.61056 -0.04660 0.68239 -0.03305 0.57297 -0.01528 0.54656 0.00003 0.56211 -0.00388

Appendix Table 8 The best SHAD obtained by RMSSOKHM on small and medium datasets when p = 2.5.

Dataset      Best SHAD
Iris         148.82208
Glass        1112.48571
Cancer       56822.46414
CMC          96092.39223
Wine         75332710.97219
WDBC         3087692223.51004
Ionosphere   2800.71399
