Variance-based Differential Evolution Algorithm with an Optional Crossover for Data Clustering

Mohammed Alswaitti, Mohanad Albughdadi, Nor Ashidi Mat Isa

Applied Soft Computing Journal, https://doi.org/10.1016/j.asoc.2019.03.013 (PII: S1568-4946(19)30131-0, Reference: ASOC 5383)
Received 19 August 2018; revised 31 January 2019; accepted 2 March 2019.

Highlights

• A single-solution representation is adopted to avoid setting the initial solutions' sizes and positions.
• A new switchable mutation scheme is employed to enhance the balance of the search behaviour.
• A multidimensional mutation factor is introduced to enhance the quality of the mutant solution.
• A new optional crossover strategy is proposed to increase the convergence rate.
• The four proposals are integrated into one DE-based clustering algorithm.


Mohammed Alswaitti (1), Mohanad Albughdadi (2), and Nor Ashidi Mat Isa (3,*)

(1) School of Information Science and Technology, Xiamen University Malaysia, Jalan Sunsuria, Bandar Sunsuria, 43900 Sepang, Selangor Darul Ehsan, Malaysia.
(2) TerraNIS (New Information Services) SAS, 10 Avenue de l'Europe, 31520 Ramonville, France.
(3) School of Electrical & Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, 14300 Nibong Tebal, Penang, Malaysia.
(*) Corresponding author

E-mail: [email protected], [email protected], [email protected]
Phone: (+60) 182097532, (+33) 7 81 85 34 77, (+60) 129896051

Abstract

Differential evolution optimization-based clustering techniques are powerful, robust, and more sophisticated than conventional clustering methods due to their stochastic and heuristic characteristics. Unfortunately, these algorithms suffer from several drawbacks, such as the tendency to be trapped in or stagnate at local optima and slow convergence rates. These drawbacks are consequences of the difficulty in balancing the exploitation and exploration processes, which directly affects the final quality of the clustering solutions. Hence, a variance-based differential evolution algorithm with an optional crossover for data clustering is presented in this paper to further enhance the quality of the clustering solutions along with the convergence speed. The proposed algorithm balances the exploitation and exploration processes by introducing (i) a single-based solution representation, (ii) a switchable mutation scheme, (iii) a vector-based estimation of the mutation factor, and (iv) an optional crossover strategy. The performance of the proposed algorithm is compared with current state-of-the-art differential evolution-based clustering techniques on 15 benchmark datasets from the UCI repository. The experimental results are also thoroughly evaluated and verified via non-parametric statistical analysis. Based on the obtained experimental results, the proposed algorithm achieves an average enhancement of up to 11.98% in classification accuracy and a significant improvement in cluster compactness over the competing algorithms. Moreover, the proposed algorithm outperforms its peers in terms of convergence speed and provides repeatable clustering results over 50 independent runs.

Keywords: Differential Evolution, exploitation and exploration, data clustering, switchable mutation, optional crossover, convergence speed.

1. Introduction

Recently, the vast advancements in data storage technologies and internet applications have resulted in a massive growth of data of all types. This diversity of data is an outcome of an endless sequence of daily-life interactions while accessing, recording, and transferring information (such as text, images, and videos) among humans. The increase in both the volume and the variety of this data has induced the need for advanced technology that is automatically capable of summarizing these huge amounts of data into meaningful, comprehensible, and useful information. To meet this requirement, data mining has emerged as a powerful technique for extracting valuable hidden information and knowledge from large databases. Cluster analysis is one of the simplest data mining tools; it is used to categorize data objects, based on their features, into a set of natural and similar clusters without prior knowledge of the data. Naturally, the grouped objects within the same cluster share a high degree of similarity while being dissimilar to objects belonging to other clusters. In other words, the formed clusters should satisfy a high degree of homogeneity among their members and a high degree of heterogeneity with respect to other clusters. Grouping these patterns into meaningful clusters in an unsupervised manner is done using clustering algorithms.

Unsupervised learning algorithms play an outstanding role in machine learning due to their capability of exploring data without any prior information about them, i.e., there are no labels associated with these data. These algorithms aim at modeling the underlying structure or distribution in the data, which can be used for decision making and predicting future inputs, among other purposes. In the past few decades, cluster analysis has played a vital role in a diversity of fields, and clustering techniques have been applied in a wide range of areas, including web analysis [1-3], business [4], marketing [5, 6], education [7, 8], data science [9, 10], and medical diagnosis [11-13], among others.

There are various types of clustering algorithms, which can be classified broadly into different categories, namely, partitional, hierarchical, density-based, and optimization-based clustering algorithms. Each category has its own working mechanism, capability to deal with certain types of data, advantages, and drawbacks [14]. Partitional clustering algorithms, such as the K-means algorithm [15], are considered to be versatile, but in general, partitional algorithms lack the capability to discover groupings in overlapping clusters. Furthermore, they are sensitive to the initialization step, where the initial positions of the centroids are specified [16]. In contrast, hierarchical clustering algorithms provide the advantage of exploring data at different levels of the dendrogram with no prior information about the number of clusters. However, the complexity of hierarchical clustering algorithms is higher than that of partitional methods [17]. Moreover, employing the density concept in density-based clustering algorithms allows the detection of clusters with arbitrary shapes and makes the algorithm robust against outliers. However, density-based clustering algorithms are not adequate for clusters with varied densities or for high-dimensional data [18].

Recently, optimization-based clustering algorithms have seized a competitive stature in solving clustering problems [19, 20] due to their capability of discovering better solutions. The prominence of bio-inspired computing is increasing due to its various applications in engineering [21], where numerous variations of these algorithms have been proposed for dynamic optimization [22] along with their applications to real-world problems. Mostly, the search process of these algorithms is performed by a population-based, decentralized, and self-organized behavior, which provides a powerful alternative to conventional clustering methods. In metaheuristic and population-based algorithms, the two cornerstones of problem solving by search are the exploitation and exploration processes. The exploitation process aims at searching the area near the current solution, whereas the exploration process allows searching for new solutions far from the current one in the search space. An algorithm is considered efficient if an appropriate balance between the exploitation and exploration processes is achieved on a given optimization problem [23]. Many factors control the exploitation and exploration processes while searching the problem space, and these factors differ according to the mechanism followed by the optimization algorithm. The exploitation and exploration processes are conflicting objectives, and achieving the balance between them is algorithm-dependent. Hence, the dominant factors of the search process need to be studied and analyzed for each algorithm.

One of the most famous nature-inspired algorithms utilized in cluster analysis is the Differential Evolution (DE) algorithm [24]. This algorithm is simple and relatively efficient, but it also comes with associated drawbacks when solving clustering problems. It is a stochastic search method with simpler, more reliable, and more robust characteristics than other evolutionary algorithms [25, 26]. However, the search behavior of the canonical DE algorithm is not properly balanced. In other words, the exploration ability of the DE algorithm is sufficient, whereas its exploitation is considered to be weak, which affects the convergence speed of the algorithm [27]. Also, the DE algorithm is not guaranteed to converge to the global solution, and it is vulnerable to premature convergence [28] and stagnation in suboptimal solutions [29, 30]. Moreover, the performance of the DE algorithm is affected by the control parameter setups, and various research studies have aimed to estimate these parameters to enhance the ability of the algorithm to solve a wider range of problems [31, 32]. However, none of these research works considered the effect of the control parameters on all dimensions: most approaches assigned magnitude values, especially to the mutation factor, and assumed an equally weighted difference for all dimensions.

Based on the aforementioned concerns, this paper proposes a new variant of DE-based clustering algorithms, namely, the Variance-based Differential Evolution Algorithm with an Optional Crossover for Data Clustering (VDEO). The proposal includes a single-based solution representation to avoid the difficulty associated with determining the number of initial solutions and to reduce the computations related to the fitness evaluation of all these initial solutions. In addition, the proposed algorithm presents a new DE mutation scheme that switches between a random and a variance-based scheme to balance the search behavior of the canonical DE algorithm. Furthermore, a vector-based estimation of the mutation factor, which takes into consideration the data distribution at each dimension, is also introduced. This multidimensional mutation factor contributes to the quality of the generated mutant solution. Finally, an optional crossover strategy is also presented to enhance the convergence rate of the proposed algorithm. The integration of these proposals aims at balancing the exploitation and exploration processes, improving the convergence rate of the algorithm, and providing high-quality clustering solutions.

The rest of the paper is organized as follows. A background of the canonical DE algorithm and its utilization in cluster analysis is provided in Section 2. The proposed algorithm is introduced in Section 3. Validation on datasets from the UCI repository using different measures and statistical analysis is conducted in Section 4. Finally, conclusions are drawn in Section 5.

2. Differential Evolution Optimization-based Clustering Algorithms

The DE algorithm is a heuristic, population-based, evolution-inspired optimization technique that was proposed by Storn and Price [24] to solve complex real-world problems. Consequently, numerous variants of the DE algorithm have been developed and employed to solve a diverse range of real-world problems [33-38]. The key idea of the DE algorithm is to evolve the population (solutions) at each generation (iteration) by mutation and crossover processes, where random combinations of population differences are generated to form mutant vectors. These mutants then contribute to the base population (current solution) to form a new candidate solution, and the fittest among the current and the new solution survives to the next iteration.

Generally, the performance of the DE algorithm is affected by a number of factors, such as the initial population size and positions, the controlling parameters (mutation and crossover factors), and the employed mutation strategy. Hence, numerous variants have been proposed in the literature to adapt these factors or to propose new schemes that enhance the convergence speed, the complexity, and the quality of the solutions provided by the canonical DE algorithm. A recent and comprehensive survey on the development of several aspects of DE as an optimization algorithm can be found in [32].

In this paper, only optimization-based clustering techniques related to the DE algorithm are reviewed, in order to remain within the scope of the paper. Therefore, the next sections are confined to the canonical DE algorithm and the related works in the literature that have used it for cluster analysis.

2.1. The Canonical Differential Evolution Algorithm

Similar to other heuristic optimization algorithms, the DE algorithm searches for an optimal solution in a $D$-dimensional search space by initiating a population of $NP$ candidate solutions that evolve through iterations to provide better solutions. The framework of the canonical DE algorithm mainly consists of four consecutive phases, namely, population initialization, mutation, crossover, and selection. It is worth mentioning that the population initialization phase is only done once; the three remaining phases are then executed successively through the iterations until a termination criterion is satisfied. The following explanation of these phases is based on the canonical DE proposed in [24].

2.1.1. Population Initialization

Each candidate solution $x_i^t$ that belongs to the population at an iteration $t$ is represented as

$$x_i^t = [x_{i,1}^t, x_{i,2}^t, \ldots, x_{i,D}^t], \qquad (1)$$

where $i = 1, 2, \ldots, NP$ and $D$ represents the search space dimension. The initial positions (at $t = 0$) of the candidate solutions are randomly selected within the bounds of the search space, $x_{min}$ and $x_{max}$, in a uniform manner in order to achieve a maximum coverage of the search space. Each of the initial solutions is also known as a target vector.

2.1.2. Mutation

After the population initialization phase, the DE algorithm generates a donor/mutant solution $v_i^t$ for each given target solution $x_i^t$ by computing the weighted difference between randomly selected solutions from the population. In other words, three solutions $x_{r_1}^t$, $x_{r_2}^t$, and $x_{r_3}^t$ are randomly selected from the population such that $r_1 \neq r_2 \neq r_3 \neq i$. The donor vector is then computed as

$$v_i^t = x_{r_1}^t + F\,(x_{r_2}^t - x_{r_3}^t), \qquad (2)$$

where $F$ is the mutation factor that controls the mutation process and takes magnitude values in the range of $[0, 2]$.

2.1.3. Crossover

In this phase, a trial solution $u_i^t$ is generated as a combination of the target and the donor solutions, where this process allows the target solution to inherit a number of attributes from the donor solution. Certainly, the crossover process is subjected to constraints that determine the number and indices of the inherited features. The canonical DE algorithm employs the binomial crossover scheme, which is controlled by a probability $CR$. The trial solution is then defined as

$$u_{i,j}^t = \begin{cases} v_{i,j}^t & \text{if } rand_j \le CR \text{ or } j = j_{rand}, \\ x_{i,j}^t & \text{otherwise}, \end{cases} \qquad (3)$$

where $rand_j$ is a uniform random number in the range of [0,1], and $j_{rand}$ is a random integer number in $\{1, 2, \ldots, D\}$. The purpose of using the index $j_{rand}$ is to guarantee that the donor solution contributes at least one attribute to the trial solution.

2.1.4. Selection

The role of the selection phase is to determine the fittest solution among the target and trial solutions based on the value of the objective function $f$ to be minimized. The vector that yields the lower value of the objective function is selected to survive (be a part of the population) to the next iteration $t+1$, as follows:

$$x_i^{t+1} = \begin{cases} u_i^t & \text{if } f(u_i^t) \le f(x_i^t), \\ x_i^t & \text{otherwise}. \end{cases} \qquad (4)$$

The mutation, crossover, and selection processes are repeatedly executed until convergence or until a specified termination criterion is met. The DE optimization algorithm described above has been employed in cluster analysis (as will be described in Section 3) by considering the intra-cluster distances as the objective function to be minimized. The next section is dedicated to the related works in the literature that have customized the DE algorithm as a clustering tool.
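To make the four phases concrete, the following is a minimal sketch of the canonical DE/rand/1/bin loop described above. It assumes NumPy and a generic objective function f to be minimized; the function name and parameter defaults are illustrative, not part of the original formulation.

```python
import numpy as np

def canonical_de(f, bounds, NP=30, F=0.8, CR=0.9, T=1000, seed=None):
    """Minimal canonical DE/rand/1/bin loop (a sketch of Eqs. (1)-(4)).

    f      : objective function to minimize, f(x) -> float
    bounds : (D, 2) array of [min, max] per dimension
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    D = len(lo)
    # Population initialization: uniform over the search space, Eq. (1)
    X = lo + rng.random((NP, D)) * (hi - lo)
    fit = np.array([f(x) for x in X])
    for t in range(T):
        for i in range(NP):
            # Mutation, Eq. (2): three distinct random indices != i
            r1, r2, r3 = rng.choice([k for k in range(NP) if k != i],
                                    size=3, replace=False)
            v = X[r1] + F * (X[r2] - X[r3])
            # Binomial crossover, Eq. (3)
            j_rand = rng.integers(D)
            mask = rng.random(D) <= CR
            mask[j_rand] = True          # donor contributes >= 1 attribute
            u = np.where(mask, v, X[i])
            # Selection, Eq. (4): keep the fitter of trial and target
            fu = f(u)
            if fu <= fit[i]:
                X[i], fit[i] = u, fu
    return X[np.argmin(fit)], fit.min()
```

For clustering, calling a routine like this with the sum of intra-cluster distances (Section 3.1) as f would reproduce a basic DE-based clustering run.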

2.2. DE Utilization in Cluster Analysis

An automatic clustering technique using an improved differential evolution algorithm (ACDE) was proposed in [39]. This technique enhanced the convergence of the canonical DE algorithm by varying the values of the mutation factor and crossover probability through the iterations using mathematical formulas to balance the search behavior. Furthermore, the proposed technique provided a new representation of each solution in the search process in order to automatically estimate the number of clusters in a dataset, where the strength of a solution is measured using a clustering validity index (e.g., the CS or Davies-Bouldin measure). The proposed technique was tested on real-world datasets and grayscale images to prove its ability to detect the optimal number of clusters. Despite the novelty of the proposed technique, its performance is highly sensitive to the selection of the clustering validity index, where an insufficient selection may result in an inaccurate number of clusters.

The work presented in [40] adopted a solution representation similar to that of [39] to introduce a multi-objective differential evolution algorithm for automatic clustering. In this framework, two conflicting objective functions were optimized simultaneously in order to produce multiple solutions that have different trade-offs between the two objectives, along with different possible numbers of clusters. Therefore, this approach provides the flexibility to choose the optimal solution based on the problem to be optimized. However, the framework is highly sensitive to the selection of the objective functions to be optimized.

An algorithm for data clustering with differential evolution incorporating macro-mutations (DEMM) was proposed in [41]. The authors claimed that the solutions (population) of the canonical DE algorithm tend to become more similar through the iterations, which limits the exploration behavior of the algorithm. Hence, they proposed the macro-mutations as an alternative scheme to improve the exploration ability of the DE algorithm, where a new probability of macro-mutations was defined to switch between the traditional mutation/crossover processes and the macro-mutations scheme. This probability was increased through the iterations by a linear function, to give a higher probability of exploring the search space by the macro-mutations scheme at the final stages of the search. Furthermore, an exponentially decreasing function of the crossover probability of the canonical DE algorithm was introduced, which allows an exploration process at the early stages of the search (high values) that gradually turns into an exploitation process (low values) at the final stages. In fact, the proposed technique provides a good balance between the exploitation and exploration processes by adjusting the crossover and macro-mutation probabilities, and it provided high-quality solutions based on the reported experimental results. However, the minimum and maximum values of the two probabilities are set empirically, and they are sensitive to the maximum number of iterations.

In the work proposed in [42], an efficient approach based on the differential evolution algorithm for data clustering was presented. The key idea of this technique is simple: each member of the population has an equal probability of being selected for the mutation process. Furthermore, the algorithm employed the inter- and intra-cluster distances as the objective functions to be maximized and minimized, respectively. Although the performance of the presented technique was compared to the K-means algorithm and provided more acceptable results, further evaluation of the algorithm's performance on a wider range of real-world problems is needed.

An effective hybrid of the bees and differential evolution algorithms for data clustering was proposed in [43]. In this hybrid technique, the K-means algorithm was used as a preprocessing stage to determine the initial cluster centroids. Then, the bees algorithm was used to start the global search and represent the explorative search behavior, whereas the DE algorithm was assigned to perform the local search and represent the exploitative search behavior. The hybrid technique employed the strengths of both algorithms and overcame their shortcomings. Based on the reported experimental results, the algorithm produced competitive clustering results with relatively acceptable complexity compared to other hybrids in the literature. However, an adequate compromise between the quality of the solution and the associated complexity is necessary when this hybrid is applied to cluster high-dimensional datasets.

Another hybrid technique of the K-means and DE algorithms was proposed in [44] for optimal clustering. In this technique, the K-means algorithm was applied to the trial solution generated by the DE algorithm to perform a local search. In addition, a rearrangement scheme of the clusters' centroids was presented to maximize the classification process of data points. Furthermore, two different objective functions were employed separately for clustering: the first aimed at minimizing the trace-within criterion (TRW), and the second aimed at maximizing the variance ratio criterion (VRC). The proposed technique was tested on real-world problems, and the results concluded that the hybrid DE algorithms outperformed the non-hybrid ones when these two objective functions were used. However, the performance of the hybrid will be affected if the objective function is replaced by another clustering validity measure.

The work in [45] presented a dynamic shuffled differential evolution algorithm for data clustering (DSDE) to enhance the convergence rate of the canonical DE algorithm. Due to the sensitivity of most clustering algorithms to the initial positions of the centroids, this technique presented an initialization scheme called random multistep sampling to avoid premature convergence to a local optimum. Furthermore, a new scheme for sorting and dividing the population into two subpopulations was presented, based on the shuffled frog leaping algorithm, to increase the diversity (exploration) and enhance the information exchange among the population. Additionally, both the convergence speed and the exploitation ability of the algorithm were improved by employing the DE/best/1 mutation strategy for the subpopulations during the evolution process. The reported experimental results concluded that the proposed technique has the ability to provide high-quality clustering solutions in terms of classification accuracy and intra-cluster distances compared to the canonical DE and other state-of-the-art evolutionary algorithms in the literature. On the other hand, the algorithm requires a high number of function evaluations (in sorting and dividing the population), which is directly proportional to the complexity of the algorithm.

Due to the high effect of the mutation strategy on the performance of the DE algorithm, a forced strategy differential evolution algorithm for data clustering (FSDE) was proposed in [46]. The new mutation strategy presented in this DE variant is as follows:

$$v_i^t = x_{r_1}^t + F\,(x_{r_2}^t - x_{r_3}^t) + K\,(x_{best}^t - x_{r_1}^t), \qquad (5)$$

where $v_i^t$ is the mutant solution; $x_{r_1}^t$, $x_{r_2}^t$, and $x_{r_3}^t$ are randomly selected solutions from the population; and $x_{best}^t$ is the best solution found at the current iteration $t$. Besides the traditional mutation factor $F$, which takes a constant value of 0.6, this technique also proposed an additional controlling factor $K$ that takes a varying value in the range of [0,1] at each iteration. These modifications were made to improve the quality of the mutant solution and, consequently, the efficiency of the DE algorithm. It is worth mentioning that the clustering result of the K-means clustering algorithm was used as one of the initial population members of the proposed DE variant, with the remaining population members initialized randomly. Based on the reported experimental results, the proposed technique provided good clustering solutions according to different clustering validation indices.

To summarize, the following problems need to be addressed in order to develop a robust DE clustering algorithm:

i. The initial population sizes and positions highly affect the performance of the DE algorithm. Most of the previous studies [31, 32] assign the initial positions of the population (centroids of the clusters) randomly, which increases the algorithm's vulnerability to premature convergence and to entrapment or stagnation in locally optimal solutions.

ii. The mutation scheme adopted by the DE algorithm plays an important role in the search behavior of the algorithm. Specifically, the canonical DE algorithm employed the DE/rand/1/bin mutation scheme to produce the mutant solution. This scheme preserves the diversity of the search through the randomness concept, but it slows the convergence speed of the algorithm and does not guarantee convergence to the optimal solution. On the other hand, other approaches employ the DE/best/1/bin mutation scheme to produce the mutant solution [45, 46]. This scheme indeed increases the convergence speed of the DE algorithm, but the greediness toward the best solution makes the algorithm vulnerable to premature convergence to local optima.

iii. The DE algorithm includes parameters that control the search behavior, such as the mutation and crossover factors, which control the generation of the mutant and trial solutions, respectively. The previous studies in the DE clustering literature [40-42, 45, 46] assign magnitude values to these controlling parameters within a conventional range, which neglects the variety of clustering problems and the data distribution along all the dimensions. The main problem is to find the optimal estimations or settings for these controlling parameters to make the algorithm more efficient in solving a wider range of clustering problems.

iv. In the majority of DE-based clustering algorithms, the trial solution is always the one compared to the best solution in order to select the fittest, whereas sometimes the mutant solution is fitter than both of them. Hence, adding a restriction to this step would be beneficial in increasing the algorithm's convergence rate.

The previous literature on the diverse proposed techniques surely enhanced the performance of the canonical DE algorithm. However, an algorithm that guarantees convergence to the global optimum (by balancing the search behavior), is insensitive to the initial population size and positions, requires few input parameters, and has a low computational complexity is still missing. These demands motivate the research work proposed in Section 3, which aims to improve the ability of the DE algorithm to provide faster convergence with better clustering solutions.

3. The Proposed Algorithm

The proposed algorithm consists of four main stages. The first is the initialization and problem formulation stage, where the initial solution representation and the initial position in the search space are determined. Also, a preprocessing step for the input data to be clustered is presented to cope with the newly proposed mutation scheme. The second stage includes the two employed mutation schemes, the switchable mechanism between them, and the vector-based estimation of the mutation factor. The third stage is concerned with the optional crossover strategy, where the necessity of the crossover process is determined according to the fitness of both the mutant and trial solutions. Finally, the fourth stage determines whether the trial or the current solution will survive to the next iteration through the selection of the fittest. Fig. 1 shows the general flow of the proposed VDEO algorithm; the four stages are described in the following sections.

Fig. 1. An overview of the proposed Variance-based Differential Evolution algorithm with an Optional Crossover for Data Clustering (VDEO). [Flowchart: the mutation factor is estimated and the dataset is sorted by its maximum-variance feature and divided into K subsets; the solution position is initialized randomly; a random threshold φ then switches each iteration between the exploitative first mutation scheme (DE/rand/1, three random solutions with each centroid drawn within the bounds of its corresponding subset) and the explorative second mutation scheme (DE/best/1, two random solutions with each centroid drawn within the bounds of the whole dataset); the trial solution is created using a random crossover probability, the fitter of the mutant and trial solutions is kept, and the fitter of that solution and the current one survives to the next iteration, until the termination criterion is met.]

3.1. Initialization and Problem Formulation

In order to customize the DE algorithm for data clustering purposes, the representation of the solution is modified. In this research work, a single-based solution scheme is adopted, and the concept of the population is discarded. This reduces the limitations associated with multi-solution techniques, such as the required number of initial solutions (which is considered another optimization problem that depends on the input data size [47]) and the associated complexity represented by the fitness function evaluation of each solution. Hence, only one solution $x^t$ is initialized in the search space and represented by a matrix as follows:

$$x^t = \begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,A} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,A} \\ \vdots & \vdots & \ddots & \vdots \\ c_{K,1} & c_{K,2} & \cdots & c_{K,A} \end{bmatrix}, \qquad (6)$$

where each row vector represents the coordinates of a centroid with $A$ attributes, and the number of row vectors is equal to the required number $K$ of clusters. In other words, the first dimensional vector in the matrix refers to the position of the first centroid, the $k$-th dimensional vector refers to the position of the $k$-th centroid, and so on. The initial positions of the centroids that form the whole solution (the target solution) are distributed in a uniformly random manner over the search space. The constraints on the maximum and minimum values of the attributes are also considered, where each centroid that corresponds to a row vector takes values between the minimum and maximum of the dataset attributes (as described in Section 2.1.1).

To represent the quality of the solution in an optimization-based clustering problem, an objective function based on a similarity measure should be defined. In this paper, the employed objective function is the sum of intra-cluster distances, where the Euclidean distance between data points and the corresponding cluster centers is selected as the similarity measure. The value of the objective function $f$ can be calculated as:

$$f(x^t) = \sum_{k=1}^{K} \sum_{p_j \in C_k} d(p_j, c_k), \qquad (7)$$

where $K$ is the number of clusters, $C_k$ is the cluster corresponding to the centroid $c_k$, $d(\cdot,\cdot)$ is the similarity function (Euclidean distance), and $p_j$ is the $j$-th data point that belongs to the cluster $C_k$. The smaller the value of the objective function is, the more compact the clusters are. Hence, the clustering process of the proposed algorithm is considered a global optimization problem that aims at minimizing the intra-cluster distances.
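As a concrete illustration of Eq. (7), the objective can be evaluated in a few lines. This sketch assumes NumPy and the usual convention in DE-based clustering that each data point belongs to its nearest centroid; the function name is illustrative.

```python
import numpy as np

def intra_cluster_distance(X, centroids):
    """Sum of intra-cluster Euclidean distances, as in Eq. (7).

    X         : (N, A) data matrix
    centroids : (K, A) matrix, one row per centroid (the solution of Eq. (6))
    """
    # Distance of every point to every centroid: shape (N, K)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # Assign each point to its nearest centroid and sum the distances
    return dists.min(axis=1).sum()
```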

Due to the single-based solution representation adopted in the proposed algorithm, a preparation step for the new mutation scheme is also introduced to cope with the framework mechanism. Assume an $N \times A$ dataset, where $N$ is the number of observations and $A$ is the number of features. The dataset is divided into $K$ subsets using the following steps:

1) Compute the variance of each feature (column).

2) Find the column with the maximum variance and sort the dataset in ascending order according to it.

3) Split the dataset into $K$ approximately equal-sized subsets, where $K$ is the number of clusters. Assuming that $k$ is an index that runs over the clusters such that $k = 1, \ldots, K$, the row indices $\lfloor (k-1)N/K \rfloor + 1$ through $\lfloor kN/K \rfloor$ can be used to define each subset. For example, for $k = 1$, the first subset will contain all observations whose indices lie in the interval between $1$ and $\lfloor N/K \rfloor$. Consequently, the second subset will contain all observations whose indices lie in the interval between $\lfloor N/K \rfloor + 1$ and $\lfloor 2N/K \rfloor$, and so on. It is worth mentioning that if the term $kN/K$ is not a valid index, the floor function is used to convert it to the closest smaller integer. Hence, the output can be expressed as:

$$S_k = \left\{ p_j : \left\lfloor \frac{(k-1)N}{K} \right\rfloor + 1 \le j \le \left\lfloor \frac{kN}{K} \right\rfloor \right\}, \qquad k = 1, \ldots, K. \qquad (8)$$
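Steps 1-3 and Eq. (8) can be sketched as follows, assuming NumPy; zero-based row indices replace the one-based indices of Eq. (8), and the function name is illustrative.

```python
import numpy as np

def sort_and_split(X, K):
    """Variance-based preprocessing of Section 3.1 (steps 1-3, Eq. (8)).

    Sorts the dataset by its highest-variance feature, then cuts it
    into K approximately equal-sized subsets.
    """
    j = np.argmax(X.var(axis=0))      # steps 1-2: maximum-variance column
    Xs = X[np.argsort(X[:, j])]       # ascending sort on that column
    N = len(Xs)
    # step 3: subset k holds rows floor((k-1)N/K) .. floor(kN/K)-1 (0-based)
    edges = [(k * N) // K for k in range(K + 1)]
    return [Xs[edges[k]:edges[k + 1]] for k in range(K)]
```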

3.2. Mutation

In the proposed algorithm, two different mutation schemes are introduced to produce the donor/mutant solutions. The first mutation scheme uses the DE/rand/1 strategy, and it employs the $K$ output subsets as a pool from which the random solutions are selected to generate the mutant solutions. In contrast, the second mutation scheme uses the DE/best/1 strategy, and it uses the bounds of the whole dataset to select the random solutions that generate the mutant solutions. During the clustering process of the proposed algorithm, only one of the mutation schemes is executed at each iteration. Hence, both mutation schemes and the selection (switching) criteria between them are presented in the next sections.

3.2.1. The First Mutation Scheme

This mutation scheme uses the DE/rand/1 strategy, i.e., the weighted difference of two randomly selected solutions is added to a third random solution to produce the mutant solution, as in Eq. (10). In addition, this strategy uses the binomial crossover to produce the trial solution. This mutation scheme represents the exploitation part of the framework, where the mutant solution is generated using three solutions $x_{r_1}$, $x_{r_2}$, and $x_{r_3}$, which are produced according to the $K$ subsets obtained as in Eq. (8). To be more specific, the minimum and maximum of each subset are first computed; then the general formula for producing a random solution $x_r$ is as follows:

$$x_r = [c_1, c_2, \ldots, c_K]^T, \qquad c_k \sim U\big(\min(S_k), \max(S_k)\big), \qquad (9)$$

where $k = 1, \ldots, K$ and the attribute values of a centroid $c_k$ are generated in a uniformly random manner within the range of the minimum and maximum values of the corresponding subset, such that $\min(S_k) \le c_k \le \max(S_k)$. More specifically, three solutions are generated at each iteration, and for each solution the first centroid is assigned using a uniform random function within the bounds of the first subset, the second centroid is assigned using a uniform random function within the bounds of the second subset, the $k$-th centroid is assigned using a uniform random function within the bounds of the $k$-th subset, and so on. The mutant solution $v^t$ is then calculated as:

$$v^t = x_{r_1}^t + F\,(x_{r_2}^t - x_{r_3}^t), \qquad (10)$$

where $F$ is the mutation factor that weights the difference of the mutation process.
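A minimal sketch of the first mutation scheme follows, assuming NumPy, the subsets produced by Eq. (8), and an element-wise (vector) reading of the mutation factor F defined later in Eq. (12); names are illustrative.

```python
import numpy as np

def random_solution(subsets, rng):
    """Eq. (9): one random solution; centroid k is drawn uniformly
    within the bounds of the k-th subset."""
    return np.array([rng.uniform(S.min(axis=0), S.max(axis=0))
                     for S in subsets])

def mutant_rand1(subsets, F, rng):
    """First mutation scheme (DE/rand/1 over subset-bounded solutions),
    Eq. (10); F is applied element-wise per attribute."""
    xr1, xr2, xr3 = (random_solution(subsets, rng) for _ in range(3))
    return xr1 + F * (xr2 - xr3)
```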

3.2.2. The Second Mutation Scheme

The alternative mutation scheme proposed in this algorithm uses the DE/best/1 strategy, i.e., the weighted difference of two randomly selected solutions is added to the best (current) solution to produce the mutant solution, as in Eq. (11). Also, this strategy uses the binomial crossover to produce the trial solution. This mutation scheme represents the exploration part of the algorithm, where the best solution found up to the current iteration, $x_{best}^t$, along with two other random solutions $x_{r_1}$ and $x_{r_2}$, is employed in generating the mutant solution. In contrast to the first mutation scheme, the centroids of the random solutions are assigned in a uniformly random manner within the minimum and maximum values of the whole dataset before the division process. Therefore, the general formula for producing a random solution $x_r$ is the same as in Eq. (9), except that the attribute values of a centroid $c_k$ are generated by a uniform random function within the range of the minimum and maximum values of the whole dataset, such that $\min(X) \le c_k \le \max(X)$. The mutant solution $v^t$ is then calculated as:

$$v^t = x_{best}^t + F\,(x_{r_1}^t - x_{r_2}^t), \qquad (11)$$

where $F$ is the mutation factor that weights the difference of the mutation process.

Usually, the mutation factor $F$ takes magnitude values within a fixed range in the techniques proposed in the literature, which were reviewed in Section 2.2. Such a magnitude value does not take into consideration the spatial distribution of the data in each dimension. Furthermore, the effect of this magnitude value differs across datasets with multiple dimensions and variances. Hence, in this work, a new simple, dynamic, and vector-based estimation technique of $F$ is presented. The value of the mutation factor is set to be equal to the variance vector that corresponds to the input dataset to be clustered, and it is represented as:

$$F = [\sigma_1^2, \sigma_2^2, \ldots, \sigma_A^2], \qquad (12)$$

where $\sigma_a^2$ is the variance of the $a$-th attribute column and $A$ is the number of dimensions of the dataset. It is worth mentioning that this estimation method is applied in both mutation schemes of the proposed algorithm.

3.2.3. The Mutation Scheme Switching Criteria

At each iteration, only one of the two previously described mutation schemes is selected to generate the mutant solution. In order to determine which scheme will be executed, a threshold value $\phi$ is introduced; a uniform random number $rand^t$ is then generated at each iteration to control the switching process between the mutation schemes, such that:

$$v^t = \begin{cases} \text{generated by the first scheme, Eq. (10)}, & \text{if } rand^t > \phi, \\ \text{generated by the second scheme, Eq. (11)}, & \text{otherwise}, \end{cases} \qquad (13)$$

where $t$ represents the current iteration and $\phi$ is the mutation scheme switching threshold. The value of $\phi$ is set to 0.5 to give the two mutation schemes an equal probability of being selected at each iteration. In other words, the probability of the generated value of $rand^t$ at each iteration being smaller or greater than $\phi$ is equal.

3.3. Optional Crossover

The canonical DE algorithm applies the binomial crossover process between the target solution $x^t$ (current solution) and the mutant solution $v^t$ in order to produce a trial solution $u^t$ (as in Eq. (3)). After that, the objective function outputs of both the target and trial solutions are compared to select the fittest solution, which will survive to the next iteration. In fact, it is not always guaranteed that the trial solution will be fitter than both the mutant and the target solutions. In particular, the crossover process is not always a step forward in enhancing the quality of the current solution. Hence, in this research work, the possibility of the mutant solution $v^t$ being fitter than the trial solution $u^t$ is considered. Therefore, the definition of the trial solution is modified to represent the fitter solution among the mutant and the trial solutions, based on the value of the objective function $f$ to be minimized, as follows:

$$u^t \leftarrow \begin{cases} v^t, & \text{if } f(v^t) < f(u^t), \\ u^t, & \text{otherwise}. \end{cases} \qquad (14)$$

This newly introduced, simple mechanism for determining the trial solution can increase the convergence speed of the algorithm. Also, the crossover probability $CR$ in Eq. (3) is usually set to a fixed value within the range of [0,1] in the majority of the approaches proposed in the literature. In the proposed algorithm, the value of $CR$ is varied through the iterations using a uniform random function. Meanwhile, the $j_{rand}$ value becomes a random integer over all entries of the solution matrix, since the dimension of the optimization problem now depends on the number of clusters and the number of attributes of the dataset to be clustered. This random assignment increases the diversity of the search and produces a variety of mutant solutions that contribute to evolving the quality of the solution.

3.4. Selection

After determining the trial solution using the modified crossover scheme, the selection of the fittest solution among the target and trial solutions is determined similarly to the canonical DE algorithm, as in Eq. (4). Eventually, the evolved quality of the target solution $x^t$ is transferred to the next iteration, and the mutation, crossover, and selection processes are repeated until convergence.

The main objective of the proposed algorithm is to enhance the exploitation ability of the DE algorithm through a divide-and-conquer strategy and to regulate its random exploration behavior. The input dataset is divided into $K$ subsets based on the attribute with the highest variance, where each subset is expected to contain at least one cluster. Then, using the first mutation strategy, the difference between the randomly produced solutions (where each centroid is selected from its corresponding subset) confines each centroid to exploiting its current subset for the best position. Since using only this strategy might lead to premature convergence or stagnation, an alternative strategy (the second mutation strategy) is also presented, which produces these random solutions by selecting the centroid positions within the bounds of the whole dataset. This second mutation strategy allows the random exploration of the search space but does not waste the resources of the algorithm, since the best-solution scheme is adopted. The switching between these two mutation schemes with equal probability gives the algorithm the ability to balance its search behavior.

Another noteworthy modification that contributes to the balance of the search behavior of the DE algorithm is the mutation factor. The vector-based estimation of the mutation factor $F$ according to the variance of each attribute gives more insight into the search process, where the difference of each attribute in the produced solution is weighted based on its variance. This estimation takes into consideration the distribution of the input data and logically leads to a better search behavior.

The last proposed modification is the optional crossover strategy, which aims at increasing the convergence speed of the algorithm. Due to the random search characteristics of the DE algorithm, a better solution is not expected to appear at every iteration. Applying the same reasoning to the crossover process, it is not always guaranteed that a trial solution fitter than the mutant solution will be produced, and vice versa. Hence, selecting the fittest among these two solutions to be compared with the best solution increases the chance of finding the global solution and consequently increases the convergence speed of the algorithm. For more clarity, an analysis of the working mechanism of the proposed VDEO framework, along with an illustrative example, is provided in Section 3.5. The pseudo-code of the proposed VDEO algorithm is given in Algorithm 1.

Algorithm 1: Pseudo-code of the proposed VDEO algorithm.

Input: dataset. Output: optimal clusters.
1) Initialization:
   Set K (the number of clusters);
   Initialize the centroid positions of the solution x as in Eq. (9);
   Set the mutation factor F as in Eq. (12);
   Sort and divide the dataset into K subsets as in Eq. (8);
   Set the best solution to x; set T (the maximum number of iterations) and t (the current iteration).
While t <= T:
2) Mutation:
   If rand > phi:
      Generate x_r1, x_r2, and x_r3 as in Eq. (9) within the corresponding subset ranges;
      Create the mutant solution v as in Eq. (10);
   Else:
      Generate x_r1 and x_r2 as in Eq. (9) within the whole dataset range;
      Create the mutant solution v as in Eq. (11);
   Create the trial solution u as in Eq. (3);
3) Optional crossover:
   If f(v) < f(u): set the trial solution u = v;
4) Selection:
   Determine x for the next iteration as in Eq. (4).
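Putting the pieces of Algorithm 1 together, one VDEO iteration can be sketched as follows. This is an illustrative reading of the text, assuming NumPy; the exact range from which CR is drawn is not reproduced in the text above, so a uniform draw over [0,1] is used as a placeholder, and all names are illustrative.

```python
import numpy as np

def vdeo_step(x, f, X, subsets, F, phi=0.5, rng=None):
    """One VDEO iteration (a sketch of Algorithm 1, not the authors' code).

    x       : current K x A solution (centroid matrix of Eq. (6))
    f       : objective of Eq. (7)
    X       : full (N, A) dataset
    subsets : output of the division of Eq. (8)
    F       : variance vector of Eq. (12), applied element-wise
    """
    rng = np.random.default_rng(rng)
    K, A = x.shape
    lo, hi = X.min(axis=0), X.max(axis=0)

    def rand_in_subsets():
        return np.array([rng.uniform(S.min(axis=0), S.max(axis=0))
                         for S in subsets])

    # Switchable mutation, Eq. (13): equal probability for each scheme
    if rng.random() > phi:
        r1, r2, r3 = (rand_in_subsets() for _ in range(3))
        v = r1 + F * (r2 - r3)                 # exploitation, Eq. (10)
    else:
        r1 = rng.uniform(lo, hi, size=(K, A))
        r2 = rng.uniform(lo, hi, size=(K, A))
        v = x + F * (r1 - r2)                  # exploration, Eq. (11)

    # Binomial crossover, Eq. (3), with CR drawn anew each iteration
    CR = rng.uniform(0.0, 1.0)                 # placeholder range
    mask = rng.random((K, A)) <= CR
    mask.flat[rng.integers(K * A)] = True      # j_rand over all K*A entries
    u = np.where(mask, v, x)

    # Optional crossover, Eq. (14): keep the fitter of mutant and trial
    if f(v) < f(u):
        u = v

    # Selection, Eq. (4)
    return u if f(u) <= f(x) else x
```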

3.5. Characteristics of the Proposed Algorithm

This section is dedicated to providing more insight into the working mechanism of the proposed VDEO algorithm. To observe the search behavior of the proposed algorithm, it was applied to cluster the Ruspini dataset [48], which has 75 observations, 2 dimensions, and 4 clusters. This dataset was selected for illustration purposes only, due to its simplicity, the clarity of its distinctive clusters, and the ability to plot its 2-D features.

The VDEO algorithm adopts a stochastic and heuristic working mechanism (based on the DE algorithm) while searching for the solution. This randomness property makes observing and tracing the solution trajectories through the iterations strenuous. Therefore, in what follows, only examples of the significant cases are provided to illustrate the working mechanism of the VDEO algorithm when applied to the Ruspini dataset. In Fig. 2, the current solution (centroids) is represented by black asterisk marks (i.e., '*'), and it was initialized randomly. Moreover, Fig. 2 shows the initial setup of the VDEO algorithm, where each rectangle represents the bounds of a subset after dividing the dataset into 4 subsets (as in Eq. (8)). The colors of the rectangles and the included data points are set for clarification purposes only and do not indicate the current clusters. The randomly generated solutions from the first mutation strategy are represented by red, green, and blue triangles, where every four triangles sharing the same color represent a complete solution. As shown in Fig. 2(b), the generation of these solutions is restricted by selecting each centroid from a different subset when this mutation strategy is selected (exploitation). In other words, each subset bound must contain three differently colored triangles (there are no triangles sharing the same color inside one subset bound or located outside the bounds). Similarly, the randomly generated solutions from the second mutation strategy are represented by red and blue squares, where every four squares sharing the same color represent a complete solution. As shown in Fig. 2(c), the generation of these solutions is totally random, and each centroid is selected from the bounds of the whole dataset (exploration) with no restrictions related to the subset bounds. These two strategies alternately generate different permutations of the random solutions, which contribute to producing the mutant solution at each iteration.

Fig. 2(d) shows the mutant and trial solutions, which are represented by cyan diamonds and yellow hexagrams, respectively. In this figure, neither the mutant nor the trial solution (centroids) is located in a better position than the current solution (the best solution). Specifically, regardless of the colors of the data points, the mutant and trial solutions do not provide a better clustering solution (representative centroids of the four clusters of the Ruspini dataset). Hence, in this scenario, the current solution will not be changed, and it will survive to the next iteration. On the other hand, Fig. 2(e) shows another scenario, where the trial solution is fitter than the current solution. It shares three centroids with the current solution and has a better position for the fourth centroid (the yellow hexagram in the blue rectangle). This better position is a result of crossing over the current solution and the mutant solution provided by the exploitative mutation scheme. To be more specific, the trial vector inherits the first three centroids from the current solution and the fourth centroid from crossing over the current and mutant solutions (the x-axis value from the current solution and the y-axis value from the mutant solution). Hence, in this scenario, the trial solution will be set as the current solution (the best solution), and the fourth centroid will move from its current position to the better one, as illustrated by the black arrow. Eventually, the mutation (two switchable schemes), crossover (with the optional strategy), and selection processes are repeatedly executed until the final best solution is obtained, which is depicted in Fig. 2(f).

Fig. 2. Ruspini dataset clustering by the proposed VDEO algorithm. (a) initial setup of the algorithm, (b) random solutions generation using the first mutation scheme, (c) random solutions generation using the second mutation scheme, (d) neither the mutant nor the trial solutions are fitter than the current one, (e) the trial solution is fitter than the current solution, and (f) the final clustering solution.

4. Experimental Results and Discussion

The proposed VDEO algorithm aims at enhancing the convergence speed of the canonical DE algorithm while providing high-quality clustering solutions. To achieve these two conflicting goals, the proposed VDEO algorithm presents a new switchable mutation scheme, a multidimensional and variance-based estimation of the mutation factor, and an optional crossover strategy. Intuitively, these modifications will set the appropriate balance between the exploitation and exploration processes and consequently improve both the convergence rate and the quality of the clustering solutions.

4.1. Methods

In order to validate the proposed VDEO algorithm, it was tested on 15 datasets selected from the UCI repository [49]. The selection of these datasets considers different distributions, complexities, and degrees of cluster overlap in the data. The used datasets are Iris, Balance, Wine, Cancer (WDBC), Lung Cancer, Transfusion (BTSCD), Breast Cancer (WDBC-Int), Glass, Vowel, Seeds, New Thyroid, Haberman, Dermatology, Heart, and Landsat. A description of these datasets is provided in Table 1.

Table 1: Description of datasets.

Dataset      | Number of observations | Number of features | Number of classes
Iris         | 150   | 4   | 3
Haberman     | 306   | 3   | 2
New Thyroid  | 215   | 5   | 3
Seeds        | 210   | 7   | 3
Lung Cancer  | 32    | 56  | 3
Glass        | 214   | 9   | 6
Wine         | 178   | 13  | 3
Balance      | 625   | 4   | 3
Vowel        | 871   | 3   | 6
BTSCD        | 748   | 4   | 2
Heart        | 303   | 13  | 2
WDBC-Int     | 699   | 9   | 2
Dermatology  | 366   | 34  | 6
WDBC         | 569   | 30  | 2
Landsat      | 2000  | 36  | 6

691 692

The performance of the proposed algorithm was compared to the canonical

693

DE algorithm [24] with two variations DE/rand/1/bin and DE/best/1/bin based on the

694

adopted mutation scheme. Moreover, the comparison included the most recent state-

695

of-the-art DE-based clustering techniques, namely, a dynamic shuffled differential

696

evolution algorithm for data clustering (DSDE) [45], and a forced strategy

697

differential evolution algorithm for data clustering (FSDE) [46]. For a fair

698

comparison, the maximum number of fitness function evaluations was set to 1e4 for

699

all experiments as recommended in [50]. The parameters used for all algorithms are

700

summarized in Table 2 according to the reported setups in their original works.

701

First, each of the competing algorithms was tested on the 15 selected datasets

702

through 50 independent runs. The average objective function values (sum of intra-

703

cluster distances) and the average classification accuracy (CA) of the obtained

704

solutions by each algorithm through the 50 runs are presented in Tables 3 and 4,

705

respectively. Then, to detect the statistical differences among a group of results, the 32

706

Friedman Aligned-Ranks (FA) test was used to obtain a rank for each algorithm [51].

707

Consequently, the adjusted p-values can be computed by applying a post-hoc method

708

using the results in the Friedman Aligned-Ranks test. The Holm's test, which is

709

described in [52] is adopted as the post-hoc method. In this paper, the null hypothesis

710

is the case of no difference between the performances of two clustering methods. If

711

the p-value is less than or equal to a specified significance level, then the null

712

hypothesis is rejected and the existence of a significant difference between the two

713

methods is accepted. To determine the significance, this study set the default level of

714

significance to 0.05, then the adjusted p-values obtained by the Holm post-hoc will

715

decide the corresponding level of significance of each experiment. Eventually, the

716

convergence rates of each algorithm represented by the convergence curves are

717

provided for all algorithms on all datasets in Fig. 3.

Table 2: Experimental parameter settings for the DE-based clustering algorithms.

Algorithms                Parameters/Values
DE/rand/1/bin
DE/best/1/bin [24]
DSDE [45]
FSDE [46]
The proposed VDEO

4.2. Experimental Results

From the results reported in Table 3, the proposed VDEO algorithm obtained similar or better objective function values than all the competing algorithms in 13 out of 15 test datasets (bolded results), while it attained very competitive objective function values in the remaining 2 datasets (Wine and Balance) in contrast to the best values achieved by the DSDE and DE/best/1/bin algorithms, respectively. Moreover, the standard deviation values obtained by the proposed VDEO algorithm are smaller than those achieved by the other competing algorithms in 12 out of 15 datasets, which suggests the effectiveness and repeatability of the proposed VDEO algorithm. In general, the DE/best/1/bin algorithm produced better objective function values than the DE/rand/1/bin algorithm in all test datasets except the Iris, Haberman, WDBC-Int, and WDBC datasets. Furthermore, the FSDE and DSDE algorithms performed better than both the DE/best/1/bin and DE/rand/1/bin algorithms in at least 5 and 12 out of 15 test datasets, respectively. The forced strategy employed by the FSDE algorithm slightly enhanced performance relative to both mutation strategies (rand and best) adopted by the canonical DE algorithm.

Table 3: Average objective function values and standard deviation (Std) among the competing DE-based clustering algorithms for 50 runs on the 15 datasets.

Dataset       Method               Best        Worst       Median      Mean        Std
Iris          DE/rand/1/bin        96.62       109.03      98.26       100.10      4.22
              DE/best/1/bin        96.54       127.57      96.54       100.52      9.81
              DSDE                 96.65       96.65       96.65       96.65       0.00
              FSDE                 96.65       96.77       96.68       96.70       0.10
              The proposed VDEO    96.54       96.54       96.54       96.54       0.00
Haberman      DE/rand/1/bin        2566.99     2567.00     2566.99     2566.99     0.00
              DE/best/1/bin        2566.99     2567.82     2566.99     2567.07     0.26
              DSDE                 2566.99     2566.99     2566.99     2566.99     0.00
              FSDE                 2566.99     2566.99     2566.99     2566.99     0.00
              The proposed VDEO    2566.99     2566.99     2566.99     2566.99     0.00
New Thyroid   DE/rand/1/bin        1873.31     2093.37     1898.33     1919.12     52.23
              DE/best/1/bin        1866.47     2155.62     1890.21     1900.50     60.73
              DSDE                 1866.47     1895.99     1868.29     1874.00     11.76
              FSDE                 1866.54     1895.91     1890.27     1882.60     11.74
              The proposed VDEO    1866.47     1868.44     1868.29     1867.51     0.91
Seeds         DE/rand/1/bin        315.48      352.28      327.29      329.15      10.88
              DE/best/1/bin        311.80      311.80      311.80      311.80      0.00
              DSDE                 311.80      311.80      311.80      311.80      0.00
              FSDE                 311.83      315.37      311.93      312.19      0.78
              The proposed VDEO    311.80      311.80      311.80      311.80      0.00
Lung Cancer   DE/rand/1/bin        119.38      124.79      122.31      122.04      1.64
              DE/best/1/bin        106.20      117.56      110.97      111.76      3.54
              DSDE                 105.34      109.52      106.97      107.23      1.18
              FSDE                 113.13      117.97      116.48      116.17      1.63
              The proposed VDEO    103.52      109.72      106.14      105.91      1.51
Glass         DE/rand/1/bin        265.17      314.54      282.51      283.20      11.39
              DE/best/1/bin        214.81      254.70      246.69      243.27      10.09
              DSDE                 210.26      249.55      215.19      220.83      12.16
              FSDE                 222.54      271.33      246.36      245.02      12.13
              The proposed VDEO    210.40      216.55      214.03      213.62      1.99
Wine          DE/rand/1/bin        16309.31    16440.72    16355.82    16359.22    33.34
              DE/best/1/bin        16292.18    16294.69    16292.67    16292.92    0.77
              DSDE                 16292.18    16292.67    16292.18    16292.28    0.20
              FSDE                 16295.19    16423.60    16303.89    16325.26    38.92
              The proposed VDEO    16292.43    16295.15    16293.21    16293.56    0.87
Balance       DE/rand/1/bin        1429.89     1435.36     1431.30     1432.00     1.73
              DE/best/1/bin        1423.82     1425.94     1423.82     1424.43     0.87
              DSDE                 1423.82     1431.59     1428.44     1427.70     3.74
              FSDE                 1424.47     1430.51     1426.81     1427.29     1.97
              The proposed VDEO    1423.83     1426.29     1425.64     1425.13     0.88
Vowel         DE/rand/1/bin        171462.40   187476.12   179146.96   179692.78   4576.01
              DE/best/1/bin        148967.24   153051.91   149070.46   149850.77   1091.12
              DSDE                 149062.08   160667.46   150130.05   150629.33   2530.23
              FSDE                 149684.28   169710.95   154197.35   156641.90   6285.35
              The proposed VDEO    149073.62   150913.46   149388.61   149682.56   615.17
BTSCD         DE/rand/1/bin        407714.24   407721.83   407714.25   407715.10   2.24
              DE/best/1/bin        407714.23   407714.23   407714.23   407714.23   0.00
              DSDE                 407714.23   407714.23   407714.23   407714.23   0.00
              FSDE                 407714.23   407918.88   407714.23   407733.76   52.21
              The proposed VDEO    407714.23   407714.23   407714.23   407714.23   0.00
Heart         DE/rand/1/bin        11687.69    11694.39    11689.00    11689.55    1.90
              DE/best/1/bin        11685.14    11685.46    11685.14    11685.18    0.11
              DSDE                 11685.14    11685.21    11685.15    11685.15    0.02
              FSDE                 11685.41    11694.35    11685.78    11686.43    2.08
              The proposed VDEO    11685.15    11685.15    11685.15    11685.15    0.00
WDBC-Int      DE/rand/1/bin        2965.37     2972.50     2966.53     2966.93     1.80
              DE/best/1/bin        2964.39     3108.78     2964.39     2971.67     32.27
              DSDE                 2964.39     2990.20     2964.39     2968.28     9.45
              FSDE                 2964.41     2964.65     2964.46     2964.47     0.05
              The proposed VDEO    2964.41     2964.49     2964.43     2964.43     0.02
Dermatology   DE/rand/1/bin        2760.83     2964.67     2864.28     2854.43     48.66
              DE/best/1/bin        2049.92     2172.80     2099.10     2104.62     28.43
              DSDE                 1995.83     2017.63     2007.15     2006.13     6.45
              FSDE                 2398.88     2722.89     2619.30     2600.49     78.55
              The proposed VDEO    1995.83     1995.83     1995.83     1995.83     0.00
WDBC          DE/rand/1/bin        149519.13   149881.22   149552.64   149577.83   81.82
              DE/best/1/bin        149473.88   152280.00   149473.94   149621.72   626.18
              DSDE                 149473.86   149473.93   149473.86   149473.87   0.02
              FSDE                 149479.81   149555.75   149486.14   149490.61   16.88
              The proposed VDEO    149473.86   149473.86   149473.86   149473.86   0.00
Landsat       DE/rand/1/bin        190329.69   231272.96   214775.55   214280.25   10694.11
              DE/best/1/bin        117269.37   165444.97   121387.57   137836.10   22526.19
              DSDE                 106209.14   120436.04   116954.00   115179.22   4685.60
              FSDE                 149897.82   183902.56   171863.49   169425.23   10704.02
              The proposed VDEO    98762.72    110352.64   104248.29   104574.39   3028.83

On the other hand, the DSDE algorithm significantly enhanced the performance of the DE algorithm by employing the best strategy coupled with its initialization and population sorting/dividing schemes. Eventually, the proposed VDEO algorithm employed the switchable property with equal probability between the best and rand mutation schemes to inherit the advantages of both and to balance the exploitation and exploration processes. In addition, the restricted choice of the random solutions that form the mutant solution in each mutation scheme gave the proposed VDEO algorithm superiority over all the competing algorithms.
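The sketch below is a simplified reading of this switchable mutation idea; names such as subset_bounds and data_bounds are illustrative assumptions, not the paper's exact interface.

```python
# Switchable mutation (sketch): with equal probability, build a rand/1 mutant
# from solutions drawn within variance-based subset bounds (exploitation), or
# a best/1 mutant from solutions drawn within the whole data bounds (exploration).
import numpy as np

rng = np.random.default_rng(0)

def mutate(best, F, subset_bounds, data_bounds, p_switch=0.5):
    """best: best centroid solution (flattened vector).
    F: per-dimension mutation factor vector (variance-based).
    *_bounds: (low, high) arrays, one entry per dimension."""
    dim = best.size
    if rng.random() < p_switch:
        # rand/1 scheme restricted to the subset bounds (exploitation).
        low, high = subset_bounds
        x1, x2, x3 = (rng.uniform(low, high, dim) for _ in range(3))
        return x1 + F * (x2 - x3)
    # best/1 scheme with solutions drawn from the whole data bounds (exploration).
    low, high = data_bounds
    x2, x3 = (rng.uniform(low, high, dim) for _ in range(2))
    return best + F * (x2 - x3)
```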

Similarly, the best, worst, median, mean, and standard deviation (Std) values of the average classification accuracies obtained by the competing algorithms are summarized in Table 4. The proposed VDEO framework obtained similar or better average classification accuracies than all other competing algorithms in 10 out of 15 datasets (i.e., the Iris, Haberman, Seeds, Balance, BTSCD, Heart, WDBC-Int, Dermatology, WDBC, and Landsat datasets), and provided very competitive results in the remaining 5 datasets. Furthermore, the proposed algorithm achieved an average classification accuracy enhancement of up to 11.98% over the competing algorithms on the Landsat dataset, which is considered a large-scale dataset.
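Classification accuracy for an unsupervised partition is conventionally computed by labelling each cluster with the majority true class of its members; the sketch below shows this common convention, assumed here rather than taken from the paper.

```python
# Majority-vote classification accuracy for a clustering result (assumed convention).
import numpy as np

def clustering_accuracy(true_labels, cluster_ids):
    """true_labels, cluster_ids: integer arrays of equal length."""
    correct = 0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        correct += np.bincount(members).max()  # members of the majority class
    return correct / len(true_labels)
```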

Table 4: Average classification accuracy and standard deviation (Std) among the competing DE-based clustering algorithms for 50 runs on the 15 datasets.

Dataset       Method               Best%   Worst%   Median%   Mean%   Std
Iris          DE/rand/1/bin        94.67   88.67    90.00     89.43   1.47
              DE/best/1/bin        90.00   54.00    90.00     85.23   11.88
              DSDE                 90.00   90.00    90.00     90.00   0.00
              FSDE                 90.00   90.00    90.00     90.00   0.00
              The proposed VDEO    90.00   90.00    90.00     90.00   0.00
Haberman      DE/rand/1/bin        51.96   51.96    51.96     51.96   0.00
              DE/best/1/bin        51.96   51.96    51.96     51.96   0.00
              DSDE                 51.96   51.96    51.96     51.96   0.00
              FSDE                 51.96   51.96    51.96     51.96   0.00
              The proposed VDEO    51.96   51.96    51.96     51.96   0.00
New Thyroid   DE/rand/1/bin        67.91   60.00    65.58     64.74   2.29
              DE/best/1/bin        65.58   50.70    65.58     63.79   3.94
              DSDE                 65.58   58.14    62.79     62.14   1.71
              FSDE                 65.58   58.14    65.58     63.44   2.93
              The proposed VDEO    65.58   59.30    63.12     62.58   1.34
Seeds         DE/rand/1/bin        90.95   86.19    89.52     88.93   1.19
              DE/best/1/bin        89.52   89.52    89.52     89.52   0.00
              DSDE                 89.52   89.52    89.52     89.52   0.00
              FSDE                 89.52   89.52    89.52     89.52   0.00
              The proposed VDEO    89.52   89.52    89.52     89.52   0.00
Lung Cancer   DE/rand/1/bin        59.26   37.04    46.30     45.56   8.51
              DE/best/1/bin        62.96   37.04    44.44     46.30   8.37
              DSDE                 70.37   51.85    62.96     61.67   5.80
              FSDE                 59.26   37.04    46.30     44.63   7.36
              The proposed VDEO    66.67   48.15    59.26     58.52   5.71
Glass         DE/rand/1/bin        51.87   48.13    49.53     49.86   1.07
              DE/best/1/bin        54.21   50.47    51.17     51.50   0.97
              DSDE                 53.27   37.85    48.60     48.41   3.44
              FSDE                 54.21   47.20    51.87     51.82   1.62
              The proposed VDEO    55.61   46.73    51.87     51.33   1.10
Wine          DE/rand/1/bin        71.91   70.79    70.79     71.12   0.46
              DE/best/1/bin        71.91   70.79    71.35     71.46   0.35
              DSDE                 71.91   71.35    71.91     71.80   0.23
              FSDE                 71.91   70.79    71.35     71.43   0.38
              The proposed VDEO    71.91   70.79    71.35     71.35   0.45
Balance       DE/rand/1/bin        57.76   44.00    50.64     50.30   4.22
              DE/best/1/bin        56.16   49.44    52.64     52.45   1.71
              DSDE                 53.92   50.08    52.64     52.75   0.94
              FSDE                 55.36   47.20    52.48     52.09   2.10
              The proposed VDEO    58.40   47.20    53.28     53.64   2.62
Vowel         DE/rand/1/bin        61.54   45.24    51.72     53.15   4.33
              DE/best/1/bin        57.98   47.30    49.37     52.20   4.59
              DSDE                 57.98   47.76    57.06     54.63   4.27
              FSDE                 59.82   47.53    52.18     53.19   4.02
              The proposed VDEO    59.01   47.65    49.20     52.22   4.55
BTSCD         DE/rand/1/bin        65.11   65.11    65.11     65.11   0.00
              DE/best/1/bin        65.11   65.11    65.11     65.11   0.00
              DSDE                 65.11   65.11    65.11     65.11   0.00
              FSDE                 65.11   65.11    65.11     65.11   0.00
              The proposed VDEO    65.11   65.11    65.11     65.11   0.00
Heart         DE/rand/1/bin        59.26   58.25    58.92     58.79   0.25
              DE/best/1/bin        59.26   58.59    58.92     58.89   0.15
              DSDE                 58.92   58.59    58.75     58.75   0.17
              FSDE                 59.26   58.59    58.92     58.87   0.16
              The proposed VDEO    58.92   58.92    58.92     58.92   0.00
WDBC-Int      DE/rand/1/bin        96.49   96.19    96.49     96.45   0.08
              DE/best/1/bin        96.49   95.17    96.49     96.42   0.29
              DSDE                 96.49   96.34    96.49     96.46   0.05
              FSDE                 96.49   96.49    96.49     96.49   0.00
              The proposed VDEO    96.49   96.49    96.49     96.49   0.00
Dermatology   DE/rand/1/bin        32.12   25.70    28.77     29.01   1.62
              DE/best/1/bin        30.73   26.82    27.51     28.07   1.25
              DSDE                 29.89   26.54    27.09     27.67   1.25
              FSDE                 31.01   24.58    28.63     28.39   1.93
              The proposed VDEO    29.89   29.89    29.89     29.89   0.00
WDBC          DE/rand/1/bin        86.82   86.82    86.82     86.82   0.00
              DE/best/1/bin        86.82   86.82    86.82     86.82   0.00
              DSDE                 86.82   86.82    86.82     86.82   0.00
              FSDE                 86.82   86.82    86.82     86.82   0.00
              The proposed VDEO    86.82   86.82    86.82     86.82   0.00
Landsat       DE/rand/1/bin        51.55   31.35    40.28     41.47   7.33
              DE/best/1/bin        51.60   33.00    50.85     43.94   8.88
              DSDE                 52.80   50.90    51.03     51.31   0.65
              FSDE                 51.85   33.35    47.98     43.60   7.74
              The proposed VDEO    67.60   45.70    55.75     57.06   5.68

In addition to the aforementioned competitive results obtained by the proposed VDEO algorithm in terms of both average objective function and CA values, the convergence rate to the optimal solution is also improved. Fig. 3 illustrates the convergence behaviour of the competing algorithms over the maximum number of fitness function evaluations, averaged over the 50 runs, on all test datasets. It is noticeable that the objective function values of all competing algorithms are relatively high at the beginning compared to the DSDE algorithm. This is attributed to the initialization method (multistep sampling) employed by the DSDE algorithm, which enhances the quality of the initial solutions as a preprocessing step, whereas all other competing algorithms employed random initialization of the initial solutions. The DE/best/1/bin algorithm converged faster than both the DE/rand/1/bin and FSDE algorithms on all test datasets regardless of the finally obtained objective function value. This fast convergence is due to the adopted mutation scheme, which involves the best solution while producing the mutant solution. Unfortunately, the greediness of this mutation scheme sometimes leads to premature convergence to a local optimum. For instance, Fig. 3a shows the fast initial convergence of the DE/best/1/bin algorithm in contrast to the FSDE algorithm, yet the FSDE algorithm eventually succeeded in finding a solution with a better objective function value than the DE/best/1/bin algorithm. Moreover, the convergence performance of the DE/rand/1/bin algorithm is poor compared to all other algorithms due to the rand mutation scheme adopted for producing the mutant solution. Furthermore, the forced mutation scheme adopted by the FSDE algorithm improved the convergence of the canonical DE algorithm with the DE/rand/1/bin mutation scheme to some extent, but it remains inferior in comparison to the DE/best/1/bin, DSDE, and the proposed VDEO algorithms.

4.2.1. Statistical analysis of the results

The Friedman Aligned-Ranks test was used to detect the significance of the proposed algorithm over the other competing algorithms. Table 5 displays the ranks computed through the Friedman Aligned-Ranks (FA) test and the adjusted p-values with the Holm post-hoc test (Holm APV) for the mean objective function values in Table 3. In this table, the algorithms are ordered from the best to the worst ranking, and significant results in Holm APV are bolded. Since the proposed VDEO algorithm obtained the best FA ranking, it was considered the control method. The p-values obtained by the Holm post-hoc test confirm the significant improvement of the proposed VDEO algorithm over the FSDE and DE/rand/1/bin algorithms (their p-values of 0.004819 and 0.000002 are ≤ 0.016667 and 0.0125, respectively).

Table 5: Average ranking obtained by Friedman Aligned-Ranks and Holm's test for objective function values using the 15 datasets of the competing algorithms.

Algorithm            FA ranking   p-Values   Holm APV
The proposed VDEO    22.4         -          -
DSDE                 25.6         0.687611   0.05
DE/best/1/bin        36.8667      0.069091   0.025
FSDE                 44.8333      0.004819   0.016667
DE/rand/1/bin        60.3         0.000002   0.0125

Aligned Friedman statistic: 12.084555. p-Value computed: 0.016733445627.

Similarly, the Friedman Aligned-Ranks analysis was conducted on the results in Table 4. In terms of the obtained average CA, the performance order of the competing algorithms is: the proposed VDEO algorithm > DSDE > DE/best/1/bin > FSDE > DE/rand/1/bin. This order is concluded from the FA ranking and the p-values obtained by the Holm post-hoc test, as shown in Table 6. However, the results show no significant improvement of the proposed VDEO algorithm over the competing algorithms in terms of average CA (none of the competing algorithms has an unadjusted p-value ≤ the corresponding Holm APV value).

Table 6: Average ranking obtained by Friedman Aligned-Ranks and Holm's test for CA using the 15 datasets of the competing algorithms.

Algorithm            FA ranking   p-Values   Holm APV
The proposed VDEO    28.8         -          -
DSDE                 33.8333      0.52708    0.05
DE/best/1/bin        39.4         0.182875   0.025
FSDE                 42.7         0.080703   0.016667
DE/rand/1/bin        45.2667      0.038533   0.0125

Aligned Friedman statistic: 12.586136. p-Value computed: 0.013485454537.

4.3. Discussions

The combination of the best and rand mutation schemes with a switchable property enhanced convergence and decreased the probability of premature convergence. The effect of this combination, along with the associated selection criteria of the random solutions that produce the mutant solution in each mutation scheme (exploitation and exploration), is evident in the proposed VDEO algorithm. More specifically, the proposed VDEO algorithm converged faster than all the competing algorithms on all test datasets despite the random initialization of the initial solutions, as shown in Fig. 3a-o. In other words, the first proposed mutation scheme subdivides the dataset into smaller subsets according to the data variance and employs the rand mutation scheme with random generation of solutions within each subset's bounds. This mechanism explains the fast initial drops of the convergence curves of the proposed VDEO algorithm on all datasets (Fig. 3), owing to the high probability of finding the best solution within these subsets' bounds (exploitation). The second proposed mutation scheme then employs the best mutation scheme with random generation of solutions from the whole dataset bounds (exploration). This mechanism gives diversity to the search in case the optimal solution is not located within the subsets' bounds; for instance, it explains the better solutions (objective function values) found by the proposed VDEO algorithm at later iterations on the Lung Cancer, Glass, Dermatology, and Landsat datasets (Fig. 3e, f, m, and o, respectively). It is worth mentioning that the proposed variance-based multidimensional estimation of the mutation factor and the proposed optional crossover strategy also contribute to finding better solutions faster, especially on the aforementioned high-dimensional datasets.
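As an illustration of this per-dimension, variance-based mutation factor, the sketch below scales each dimension's factor by its relative variance; the normalization and the [f_min, f_max] range are assumptions for illustration, not the paper's exact formula.

```python
# Variance-based, multidimensional mutation factor (illustrative assumption):
# dimensions with a larger spread receive a larger factor, so mutation steps
# follow the data distribution along each dimension.
import numpy as np

def variance_based_factor(data, f_min=0.2, f_max=0.8):
    """data: (n_samples, n_features). Returns one factor per feature."""
    var = data.var(axis=0)
    rel = var / var.max() if var.max() > 0 else np.zeros_like(var)
    return f_min + (f_max - f_min) * rel
```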

In general, the novelty of the proposed algorithm lies in a combination of multiple modifications that together enhance the balance between the exploitation and exploration processes. These modifications are inseparable: the effect of each modification alone is barely noticeable in the overall performance, and each can be a complementary part of another modification. Specifically, the initialization and preprocessing step addresses the difficulty of determining the initial population size along with the expensive function evaluations for each solution (part of the convergence speed). It also addresses part of the exploitation process by determining the bounds of each subset, and part of the exploration process by determining the mutation factor vector. In addition, the switchable mutation scheme mechanism addresses the remaining parts of the exploitation and exploration processes together by giving each scheme an equal probability of execution. Eventually, the optional crossover strategy contributes to the convergence rate by checking the quality of the mutant solution against the trial one. Hence, these interleaved modifications should be considered as one package to achieve the best results. Another noteworthy limitation of the study is determining the value of the probability threshold. In this study, an equal-probability assumption has been adopted to select between the two proposed mutation schemes. However, it was noticed in some experiments that increasing or decreasing this threshold also affects the convergence rate on some datasets. Hence, finding the optimal value of this threshold is still under investigation.
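A minimal sketch of the optional crossover idea follows, under the assumption that crossover is skipped whenever the mutant alone already improves on the target, with the crossover probability drawn per call from a range:

```python
# Optional crossover (sketch under assumptions): binomial crossover is applied
# only when the mutant is not already fitter than the target; CR is drawn from
# a range to add stochasticity. `fitness` is any clustering objective, e.g.
# the sum of intra-cluster distances.
import numpy as np

rng = np.random.default_rng(1)

def optional_crossover(target, mutant, fitness, cr_range=(0.1, 0.9)):
    """Return the trial solution for the selection step."""
    if fitness(mutant) < fitness(target):   # mutant good enough: skip crossover
        return mutant
    cr = rng.uniform(*cr_range)             # varying crossover probability
    mask = rng.random(target.size) < cr
    mask[rng.integers(target.size)] = True  # guarantee at least one mutant gene
    return np.where(mask, mutant, target)
```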

According to the aforementioned analyses, it can be concluded that the performance of the proposed VDEO algorithm is significantly enhanced in contrast to the canonical DE algorithm, and that it outperformed the other state-of-the-art DE-based clustering algorithms in terms of objective function values (cluster compactness), classification accuracy, repeatability, and convergence speed.

Fig. 3. The convergence rate of the competing algorithms on the 15 datasets. (a) Iris, (b) Haberman, (c) New Thyroid, (d) Seeds, (e) Lung Cancer, (f) Glass, (g) Wine, (h) Balance, (i) Vowel, (j) BTSCD, (k) Heart, (l) WDBC-Int, (m) Dermatology, (n) WDBC, (o) Landsat.

5. Conclusion and Future Work

The main purpose of this research work is to balance the search behaviour of the canonical DE algorithm and improve its efficiency as a clustering tool. This purpose was fulfilled by proposing the VDEO algorithm, which includes four main modifications that address the limitations of the canonical DE. First, the VDEO algorithm adopted a single-based solution representation and a preprocessing step for the input data to be clustered. This phase reduces the limitations associated with multi-based solution techniques, such as the required number of initial solutions and the associated complexity of evaluating the fitness function for each solution. Moreover, the preprocessing of the input data (splitting into subsets based on the highest-variance feature) is done only once and is considered a preparation stage for the exploitation process. Secondly, the proposed VDEO algorithm employed two mutation schemes with a switchable mechanism to balance the exploitation and exploration processes during the search. Specifically, the first mutation scheme produces mutant solutions using randomly generated solutions within the bounds of the split subsets to reinforce exploitation, while the second mutation scheme produces mutant solutions using randomly generated solutions within the bounds of the whole dataset to preserve the diversity of the search (exploration). The third modification is a simple variance-based estimation method for the mutation factor that takes the data distribution at each dimension into consideration. Both the switchable mechanism and the mutation factor estimation contribute to providing higher-quality mutant solutions that increase the probability of finding the optimal one. Lastly, an optional crossover strategy was presented, where the necessity of the crossover process is determined according to the fitness of both the mutant and trial solutions. In addition, the crossover probability is varied through iterations within a specified range to give more stochastic characteristics to the algorithm and to increase the chance of finding better permutations of the trial solutions. By including these four modifications in one framework, the experimental results showed the prominent performance of the proposed VDEO algorithm against its peers, providing better clustering solutions in terms of classification accuracy, cluster compactness (sum of intra-cluster distances), repeatability, and convergence speed.
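To make the preprocessing step concrete, the following sketch splits the data range along the highest-variance feature into equal-width subsets whose bounds can then confine the exploitative mutation; the equal-width split and the function name are illustrative assumptions.

```python
# One-off preprocessing (sketch, assuming an equal-width split): find the
# feature with the highest variance and divide its range into subsets whose
# bounds later restrict the exploitative mutation scheme.
import numpy as np

def split_by_highest_variance(data, n_subsets):
    """Return the chosen feature index and a list of (low, high) bounds."""
    f = int(np.argmax(data.var(axis=0)))          # highest-variance feature
    edges = np.linspace(data[:, f].min(), data[:, f].max(), n_subsets + 1)
    return f, [(edges[i], edges[i + 1]) for i in range(n_subsets)]
```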

Prospective investigations could be conducted into optimizing the control parameters of the DE algorithm, such as the mutation factor and the crossover probability. Many research works in the literature introduce a controlled amount of randomness; it would certainly be interesting to investigate the usefulness of increasing or decreasing the degree of randomization and to identify suitable methods for doing so. More specifically, the amount of randomness could be linked to some features of the clustering problem, or some kind of correlation could be sought among the decision variables. In addition, the relation between the input data characteristics and the probability threshold can be a potential future avenue of research.

References

[1] T.D. Rajkumar, S.P. Raja, A. Suruliandi, Users' Click and Bookmark Based Personalization Using Modified Agglomerative Clustering for Web Search Engine, International Journal on Artificial Intelligence Tools, 26 (2017) 1-16.

[2] L.M. Torres, E. Magana, D. Morato, S. Garcia-Jimenez, M. Izal, TBDClust: Time-based density clustering to enable free browsing of sites in pay-per-use mobile Internet providers, J. Netw. Comput. Appl., 99 (2017) 17-27.

[3] G. Saisai, H. Wei, L. Haoxuan, Q. Yuzhong, Property Clustering in Linked Data: An Empirical Study and Its Application to Entity Browsing, International Journal on Semantic Web and Information Systems (IJSWIS), 14 (2018) 31-70.

[4] C.H. Chou, S.C. Hsieh, C.J. Qiu, Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction, Appl. Soft. Comput., 56 (2017) 298-316.

[5] V. Holy, O. Sokol, M. Cerny, Clustering retail products based on customer behaviour, Appl. Soft. Comput., 60 (2017) 752-762.

[6] E. Kurum, G.W. Weber, C. Iyigun, Early warning on stock market bubbles via methods of optimization, clustering and inverse problems, Ann. Oper. Res., 260 (2018) 293-320.

[7] A.B.U. Najera, J. de la Calleja, M.A. Medina, Associating students and teachers for tutoring in higher education using clustering and data mining, Comput. Appl. Eng. Educ., 25 (2017) 823-832.

[8] A.M. Navarro, P. Moreno-Ger, Comparison of Clustering Algorithms for Learning Analytics with Educational Datasets, International Journal of Interactive Multimedia and Artificial Intelligence, 5 (2018) 9-16.

[9] J.d. Andrade Silva, E.R. Hruschka, J. Gama, An evolutionary algorithm for clustering data streams with a variable number of clusters, Expert Systems with Applications, 67 (2017) 228-238.

[10] R. Hyde, P. Angelov, A.R. MacKenzie, Fully online clustering of evolving data streams into arbitrarily shaped clusters, Information Sciences, 382-383 (2017) 96-114.

[11] J.M.V. Kinani, A.J.R. Silva, F.G. Funes, D.M. Vargas, E.R. Diaz, A. Arellano, Medical Imaging Lesion Detection Based on Unified Gravitational Fuzzy Clustering, J. Healthc. Eng., 2017 (2017) 14 pages.

[12] N.D. Thanh, M. Ali, L.H. Son, A Novel Clustering Algorithm in a Neutrosophic Recommender System for Medical Diagnosis, Cogn. Comput., 9 (2017) 526-544.

[13] L.D. Wang, X.G. Zhou, Y. Xing, M.K. Yang, C. Zhang, Clustering ECG heartbeat using improved semi-supervised affinity propagation, IET Softw., 11 (2017) 207-213.

[14] S. Saraswathi, M.I. Sheela, A comparative study of various clustering algorithms in data mining, International Journal of Computer Science and Mobile Computing, 11 (2014) 422-428.

[15] J.A. Hartigan, M.A. Wong, Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society, Series C (Applied Statistics), 28 (1979) 100-108.

[16] M.E. Celebi, H.A. Kingravi, P.A. Vela, A comparative study of efficient initialization methods for the k-means clustering algorithm, Expert Systems with Applications, 40 (2013) 200-210.

[17] J. Han, J. Pei, M. Kamber, Data mining: concepts and techniques, Elsevier, 2011.

[18] A. Moreira, M.Y. Santos, S. Carneiro, Density-based clustering algorithms: DBSCAN and SNN, University of Minho, Portugal, (2005).

[19] S.J. Nanda, G. Panda, A survey on nature inspired metaheuristic algorithms for partitional clustering, Swarm and Evolutionary Computation, 16 (2014) 1-18.

[20] R. Ayachi, H. Bouhani, N.B. Amor, An Evolutionary Approach for Learning Opponent's Deadline and Reserve Points in Multi-Issue Negotiation, International Journal of Interactive Multimedia and Artificial Intelligence, 5 (2018) 131-140.

[21] A.K. Kar, Bio inspired computing: A review of algorithms and scope of applications, Expert Systems with Applications, 59 (2016) 20-32.

[22] M. Mavrovouniotis, C. Li, S. Yang, A survey of swarm intelligence for dynamic optimization: Algorithms and applications, Swarm and Evolutionary Computation, 33 (2017) 1-17.

[23] I. BoussaïD, J. Lepagnot, P. Siarry, A survey on optimization metaheuristics, Information Sciences, 237 (2013) 82-117.

[24] R. Storn, K. Price, Differential Evolution: A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, Journal of Global Optimization, 11 (1997) 341-359.

[25] J. Vesterstrom, R. Thomsen, A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems, in: Proceedings of the Congress on Evolutionary Computation, Portland, OR, USA, 2004, pp. 1980-1987.

[26] K. Price, R.M. Storn, J.A. Lampinen, Differential evolution: a practical approach to global optimization, Springer-Verlag Berlin Heidelberg, 2006.

[27] N. Noman, H. Iba, Accelerating Differential Evolution Using an Adaptive Local Search, IEEE Transactions on Evolutionary Computation, 12 (2008) 107-125.

[28] R. Knobloch, J. Mlýnek, R. Srb, The classic differential evolution algorithm and its convergence properties, Applications of Mathematics, 62 (2017) 197-208.

[29] S. Das, A. Abraham, U.K. Chakraborty, A. Konar, Differential Evolution Using a Neighborhood-Based Mutation Operator, IEEE Transactions on Evolutionary Computation, 13 (2009) 526-553.

[30] F. Neri, V. Tirronen, Recent advances in differential evolution: a survey and experimental analysis, Artificial Intelligence Review, 33 (2010) 61-106.

[31] S. Das, P.N. Suganthan, Differential Evolution: A Survey of the State-of-the-Art, IEEE Transactions on Evolutionary Computation, 15 (2011) 4-31.

[32] S. Das, S.S. Mullick, P.N. Suganthan, Recent advances in differential evolution: An updated survey, Swarm and Evolutionary Computation, 27 (2016) 1-30.

[33] Y.J. Gong, Y. Zhou, Differential Evolutionary Superpixel Segmentation, IEEE Transactions on Image Processing, 27 (2018) 1390-1404.

[34] M.Z. Ali, N.H. Awad, P.N. Suganthan, R.G. Reynolds, An Adaptive Multipopulation Differential Evolution With Dynamic Population Reduction, IEEE Transactions on Cybernetics, 47 (2017) 2768-2779.

[35] U.M. Nunes, D.R. Faria, P. Peixoto, A human activity recognition framework using max-min features and key poses with differential evolution random forests classifier, Pattern Recognition Letters, 99 (2017) 21-31.

[36] A. Majed, Z. Salam, A.M. Amjad, Harmonics elimination PWM based direct control for 23-level multilevel distribution STATCOM using differential evolution algorithm, Electric Power Systems Research, 152 (2017) 48-60.

[37] D. Teijeiro, X.C. Pardo, D.R. Penas, P. González, J.R. Banga, R. Doallo, A cloud-based enhanced differential evolution algorithm for parameter estimation problems in computational systems biology, Cluster Computing, 20 (2017) 1937-1950.

[38] L. Jebaraj, C. Venkatesan, I. Soubache, C.C.A. Rajan, Application of differential evolution algorithm in static and dynamic economic or emission dispatch problem: A review, Renewable and Sustainable Energy Reviews, 77 (2017) 1206-1220.

[39] S. Das, A. Abraham, A. Konar, Automatic Clustering Using an Improved Differential Evolution Algorithm, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 38 (2008) 218-237.

[40] K. Suresh, D. Kundu, S. Ghosh, S. Das, A. Abraham, S.Y. Han, Multi-objective differential evolution for automatic clustering with application to micro-array data analysis, Sensors (Basel, Switzerland), 9 (2009) 3981-4004.

[41] G. Martinović, D. Bajer, Data Clustering with Differential Evolution Incorporating Macromutations, in: B.K. Panigrahi, P.N. Suganthan, S. Das, S.S. Dash (Eds.), Swarm, Evolutionary, and Memetic Computing: 4th International Conference, SEMCCO 2013, Chennai, India, December 19-21, 2013, Proceedings, Part I, Springer International Publishing, Cham, 2013, pp. 158-169.

[42] M. Hosseini, M. Sadeghzade, R. Nourmandi-Pour, An efficient approach based on differential evolution algorithm for data clustering, Decision Science Letters, 3 (2014) 319-324.

[43] M.B. Bonab, S.Z. Hashim, N.E.N. Bazin, A.K.Z. Alsaedi, An Effective Hybrid of Bees Algorithm and Differential Evolution Algorithm in Data Clustering, Mathematical Problems in Engineering, 2015 (2015) 17 pages.

[44] J. Tvrdík, I. Křivý, Hybrid differential evolution algorithm for optimal clustering, Appl. Soft. Comput., 35 (2015) 502-512.

[45] W.-l. Xiang, N. Zhu, S.-f. Ma, X.-l. Meng, M.-q. An, A dynamic shuffled differential evolution algorithm for data clustering, Neurocomputing, 158 (2015) 144-154.

[46] M. Ramadas, A. Abraham, S. Kumar, FSDE: Forced Strategy Differential Evolution used for data clustering, Journal of King Saud University - Computer and Information Sciences, (In Press, Corrected Proof) (2016).

[47] K.Y. Kok, P. Rajendran, Differential-evolution control parameter optimization for unmanned aerial vehicle path planning, PLoS ONE, 11 (2016) e0150558.

[48] E.H. Ruspini, Numerical methods for fuzzy clustering, Information Sciences, 2 (1970) 319-350.

[49] M. Lichman, UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, 2013.

[50] B. Jiang, N. Wang, L. Wang, Particle swarm optimization with age-group topology for multimodal functions and data clustering, Communications in Nonlinear Science and Numerical Simulation, 18 (2013) 3134-3145.

[51] J. Hodges, E.L. Lehmann, Rank methods for combination of independent experiments in analysis of variance, The Annals of Mathematical Statistics, 33 (1962) 482-497.

[52] S. Holm, A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics, (1979) 65-70.