Please cite this article as: M. Alswaitti, M. Albughdadi and N.A. Mat Isa, Variance-based differential evolution algorithm with an optional crossover for data clustering, Applied Soft Computing Journal (2019), https://doi.org/10.1016/j.asoc.2019.03.013
Highlights

• A single solution representation is adopted to avoid setting the initial solutions' sizes and positions.
• A new switchable mutation scheme is employed to enhance the balance of the search behaviour.
• A multidimensional mutation factor is introduced to enhance the mutant solution quality.
• A new optional crossover strategy is proposed to increase the convergence rate.
• Integration of the four proposals in one DE-based clustering algorithm.
Variance-based Differential Evolution Algorithm with an Optional Crossover for Data Clustering

MOHAMMED ALSWAITTI 1,a, MOHANAD ALBUGHDADI 2,b, AND NOR ASHIDI MAT ISA 3,c,*

1 School of Information Science and Technology, Xiamen University Malaysia, Jalan Sunsuria, Bandar Sunsuria, 43900 Sepang, Selangor Darul Ehsan, Malaysia.
2 TerraNIS (New Information Services) SAS, 10 Avenue de l'Europe, 31520 Ramonville, France.
3 School of Electrical & Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, 14300 Nibong Tebal, Penang, Malaysia.
* Corresponding author

E-mail: a [email protected], b [email protected], c [email protected]
Phone numbers: (+60) 182097532, (+33) 7 81 85 34 77, (+60) 129896051
Abstract
The differential evolution optimization-based clustering techniques are powerful, robust and more sophisticated than the conventional clustering methods due to their stochastic and heuristic characteristics. Unfortunately, these algorithms suffer from several drawbacks such as the tendency to be trapped or stagnated into local optima and slow convergence rates. These drawbacks are consequences of the difficulty in balancing the exploitation and exploration processes, which directly affects the final quality of the clustering solutions. Hence, a variance-based differential evolution algorithm with an optional crossover for data clustering is presented in this paper to further enhance the quality of the clustering solutions along with the convergence speed. The proposed algorithm considers the balance between the exploitation and exploration processes by introducing (i) a single-based solution representation, (ii) a switchable mutation scheme, (iii) a vector-based estimation of the mutation factor, and (iv) an optional crossover strategy. The performance of the proposed algorithm is compared with current state-of-the-art differential evolution-based clustering techniques on 15 benchmark datasets from the UCI repository. The experimental results are also thoroughly evaluated and verified via non-parametric statistical analysis. Based on the obtained experimental results, the proposed algorithm achieves an average enhancement of up to 11.98% in classification accuracy and obtains a significant improvement in terms of cluster compactness over the competing algorithms. Moreover, the proposed algorithm outperforms its peers in terms of convergence speed and provides repeatable clustering results over 50 independent runs.
Keywords: Differential Evolution, exploitation and exploration, data clustering, switchable mutation, optional crossover, convergence speed.
1. Introduction
Recently, the vast advancements in data storage technologies and internet applications have resulted in a massive growth of data of all types. This diversity of the data is an outcome of an endless sequence of daily-life interactions while accessing, recording, and transferring information (such as text, images, and videos) among humans. The increase in both the volume and the variety of these data induced the need for an advanced technology that is capable of automatically summarizing these huge amounts of data into meaningful, comprehensible, and useful information. To meet this requirement, data mining has emerged as a powerful technique to extract the valuable hidden information and knowledge from large databases. Cluster analysis is one of the simplest data mining tools; it is used to categorize data objects based on their features into a set of natural and similar clusters without prior knowledge of the data. Naturally, the grouped objects within the same cluster share a high degree of similarity while being dissonant to objects belonging to other clusters. In other words, the formed clusters should satisfy a high degree of homogeneity among their members and a high degree of heterogeneity to other clusters. Grouping these patterns into meaningful clusters in an unsupervised manner is done using clustering algorithms.

Unsupervised learning algorithms play an outstanding role in machine learning due to their capability of exploring data without any prior information about them, i.e., there are no labels associated with these data. These algorithms aim at modeling the underlying structure or distribution in the data, which can be used for decision making and predicting future inputs, among other tasks. In the past few decades, cluster analysis has played a vital role in a diversity of fields, and clustering techniques have been applied in a wide range of areas, including web analysis [1-3], business [4], marketing [5, 6], education [7, 8], data science [9, 10], and medical diagnosis [11-13], among others.

There are various types of clustering algorithms, which can be classified broadly into different categories, namely, partitional, hierarchical, density-based, and optimization-based clustering algorithms. Each category has its own working mechanism, capability to deal with certain types of data, advantages, and drawbacks [14]. Partitional clustering algorithms, such as the K-means algorithm [15], are considered to be versatile, but in general, partitional algorithms lack the capability to discover groupings in overlapping clusters. Furthermore, they are sensitive to the initialization step, where the initial positions of the centroids are specified [16]. In contrast, hierarchical clustering algorithms provide the advantage of exploring data at different levels of the dendrogram with no prior information about the number of clusters. However, the complexity of hierarchical clustering algorithms is higher than that of partitional methods [17]. Moreover, employing the density concept in density-based clustering algorithms allows for detecting clusters with arbitrary shapes and makes the algorithm robust against outliers. However, density-based clustering algorithms are not adequate for clusters with varied densities and high-dimensional data [18].

Recently, optimization-based clustering algorithms have seized a competitive stature in solving clustering problems [19, 20] due to their capability of discovering better solutions. The prominence of bio-inspired computing is increasing due to its various applications in engineering [21], where numerous variations of these algorithms have been proposed for dynamic optimization [22] along with their applications to real-world problems. Mostly, the search process of these algorithms is driven by a population-based, decentralized, and self-organized behavior, which provides a powerful alternative to the conventional clustering methods. In metaheuristic and population-based algorithms, the two cornerstones of problem solving by search are the exploitation and exploration processes. The exploitation process aims at searching the nearby area of the current solution, whereas the exploration process allows searching for new solutions far from the current one in the search space. An algorithm is considered efficient if an appropriate balance between the exploitation and exploration processes is achieved on a given optimization problem [23]. Many factors control the exploitation and exploration processes while searching the problem space, and these factors differ according to the mechanism followed by the optimization algorithm.

The exploitation and exploration processes are conflicting objectives, and achieving the balance between them is algorithm-dependent. Hence, the dominant factors of the search process need to be studied and analyzed for each algorithm.

One of the most famous nature-inspired algorithms utilized in cluster analysis is the Differential Evolution (DE) algorithm [24]. This algorithm is simple and relatively efficient, but it also comes with associated drawbacks when solving clustering problems. It is a stochastic search method with simpler, more reliable, and more robust characteristics than other evolutionary algorithms [25, 26]. However, the search behavior of the canonical DE algorithm is not properly balanced. In other words, the exploration ability of the DE algorithm is sufficient, whereas its exploitation is considered to be weak, which affects the convergence speed of the algorithm [27]. Also, the DE algorithm is not guaranteed to converge to the global solution, and it is vulnerable to premature convergence [28] and stagnation into suboptimal solutions [29, 30]. Moreover, the performance of the DE algorithm is affected by the control parameter setups, and various research studies have aimed to estimate these parameters to enhance the ability of the algorithm to solve a wider range of problems [31, 32]. However, none of these research works considered the effect of the control parameters on all dimensions: most approaches assigned magnitude values, especially to the mutation factor, and assumed an equally weighted difference for all dimensions.

Based on the aforementioned concerns, this paper proposes a new variant of DE-based clustering algorithms, namely, the Variance-based Differential Evolution Algorithm with an Optional Crossover for Data Clustering (VDEO). The proposal includes a single-based solution representation to avoid the difficulty associated with determining the number of initial solutions and to reduce the computations related to the fitness evaluation of all these initial solutions. In addition, the proposed algorithm presents a new DE mutation scheme that switches between a random and a variance-based scheme to balance the search behavior of the canonical DE algorithm. Furthermore, a vector-based estimation of the mutation factor, which takes into consideration the data distribution at each dimension, is also introduced. This multidimensional mutation factor contributes to the quality of the generated mutant solution. Finally, an optional crossover strategy is also presented to enhance the convergence rate of the proposed algorithm. The integration of the previous proposals aims at balancing the exploitation and exploration processes, improving the convergence rate of the algorithm, and providing high-quality clustering solutions.

The rest of the paper is organized as follows. A background of the canonical DE algorithm and its utilization in cluster analysis is provided in Section 2. The proposed algorithm is introduced in Section 3. Validation on datasets from the UCI repository using different measures and statistical analysis is conducted in Section 4. Finally, conclusions are drawn in Section 5.
2. Differential Evolution Optimization-based Clustering Algorithms
The DE algorithm is a heuristic, population-based, and evolution-inspired optimization technique which was proposed by Storn and Price [24] to solve complex real-world problems. Consequently, numerous variants of the DE algorithm have been developed and employed to solve a diverse range of real-world problems [33-38]. The key idea of the DE algorithm is to evolve the population (solutions) at each generation (iteration) by mutation and crossover processes, where random combinations of population differences are generated to form mutant vectors. Then, these mutants contribute to the base population (current solution) to form a new candidate solution, where the fittest among the current and the new solution survives to the next iteration.

Generally, the performance of the DE algorithm is affected by a number of factors such as the initial population size and positions, the controlling parameters (mutation and crossover factors), and the employed mutation strategy. Hence, numerous variants were proposed in the literature to adapt these factors or to propose new schemes that enhance the convergence speed, the complexity, and the quality of the solutions provided by the canonical DE algorithm. A recent and comprehensive survey on the development of the several aspects of DE as an optimization algorithm can be found in [32].

In this paper, only optimization-based clustering techniques related to the DE algorithm are considered in the literature review in order to stay within the paper scope. Therefore, the next sections are confined to exhibiting the canonical DE algorithm and the related works in the literature that have used it for cluster analysis.
2.1. The Canonical Differential Evolution Algorithm

Similar to other heuristic optimization algorithms, the DE algorithm searches for an optimal solution in a $D$-dimensional search space by initiating a population of $NP$ candidate solutions that evolve through the iterations to provide better solutions. The framework of the canonical DE algorithm mainly consists of four consecutive phases, namely, population initialization, mutation, crossover, and selection. It is worth mentioning that the population initialization phase is only done once; the three remaining phases are then executed successively through the iterations until a termination criterion is satisfied. The following explanation of all these phases is based on the canonical DE proposed in [24].
2.1.1. Population Initialization
Each candidate solution $X_{i,G}$ that belongs to the population $P_G$ at an iteration $G$ is represented as

$$X_{i,G} = [x_{1,i,G}, x_{2,i,G}, \dots, x_{D,i,G}] \quad (1)$$

where $i = 1, 2, \dots, NP$ and $D$ represents the search space dimension. The initial positions (at $G = 0$) of the candidate solutions are randomly selected within the bounds of the search space $X_{\min}$ and $X_{\max}$ in a uniform manner in order to achieve a maximum coverage of the search space. Each of the initial solutions is also known as a target vector.
2.1.2. Mutation
After the population initialization phase, the DE algorithm generates a donor/mutant solution $V_{i,G}$ for each given target solution $X_{i,G}$ by computing the weighted difference between randomly selected solutions from the population. In other words, three solutions $X_{r_1,G}$, $X_{r_2,G}$, and $X_{r_3,G}$ are randomly selected from the population such that $r_1 \neq r_2 \neq r_3 \neq i$. The donor vector is then computed as

$$V_{i,G} = X_{r_1,G} + F \cdot (X_{r_2,G} - X_{r_3,G}) \quad (2)$$

where $F$ is the mutation factor that controls the mutation process and takes magnitude values in the range of $[0, 2]$.
2.1.3. Crossover
In this phase, a trial solution $U_{i,G}$ is generated as a combination of the target and the donor solutions, where this process allows the target solution to inherit a number of attributes from the donor solution. Certainly, the crossover process is subjected to constraints that determine the number and indices of the inherited features. The canonical DE algorithm employs the binomial crossover scheme, which is controlled by a probability $CR$. The trial solution is then defined as

$$u_{j,i,G} = \begin{cases} v_{j,i,G}, & \text{if } rand_j \le CR \text{ or } j = j_{rand} \\ x_{j,i,G}, & \text{otherwise} \end{cases} \quad (3)$$

where $rand_j$ is a uniform random number in the range of $[0, 1]$, and $j_{rand}$ is a random integer number in $\{1, 2, \dots, D\}$. The purpose of using $j_{rand}$ is to guarantee that the donor solution contributes to the trial solution with at least one attribute.
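For concreteness, the binomial crossover of Eq. (3) can be sketched in a few lines of Python with NumPy (a minimal illustration with our own function name and random-number handling; it is not code from [24]):

```python
import numpy as np

def binomial_crossover(target, donor, CR, rng=np.random.default_rng()):
    """Eq. (3): each trial attribute comes from the donor with probability CR;
    the j_rand index is always taken from the donor, so at least one attribute
    is inherited from it."""
    D = target.size
    mask = rng.random(D) <= CR      # rand_j <= CR
    mask[rng.integers(D)] = True    # guaranteed j_rand attribute
    return np.where(mask, donor, target)
```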
2.1.4. Selection

The role of the selection phase is to determine the fittest solution among the target and the trial solutions based on the value of the objective function $f$ to be minimized. The vector that yields the lower value of the objective function is selected to survive (be a part of the population) to the next iteration $G + 1$ as follows:

$$X_{i,G+1} = \begin{cases} U_{i,G}, & \text{if } f(U_{i,G}) \le f(X_{i,G}) \\ X_{i,G}, & \text{otherwise} \end{cases} \quad (4)$$

The mutation, crossover, and selection processes are repeatedly executed until convergence or until a specified termination criterion is met. The previous description of the DE optimization algorithm has been employed in cluster analysis (as will be described in Section 3) by considering the intra-cluster distances as the objective function to be minimized. The next section is dedicated to exhibiting the related works in the literature that have customized the DE algorithm as a clustering tool.
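Putting the four phases together, the canonical DE/rand/1/bin loop can be sketched as follows (a minimal Python illustration of [24]; `f` is any objective to be minimized, and the parameter defaults are common choices in the literature rather than values prescribed here):

```python
import numpy as np

def de_rand_1_bin(f, lower, upper, NP=30, F=0.5, CR=0.9, max_iter=1000, seed=0):
    """Canonical DE: uniform initialization (Eq. (1)), DE/rand/1 mutation
    (Eq. (2)), binomial crossover (Eq. (3)), and greedy selection (Eq. (4))."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    D = lower.size
    pop = lower + rng.random((NP, D)) * (upper - lower)
    fit = np.array([f(x) for x in pop])
    for _ in range(max_iter):
        for i in range(NP):
            r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
            donor = pop[r1] + F * (pop[r2] - pop[r3])      # Eq. (2)
            mask = rng.random(D) <= CR
            mask[rng.integers(D)] = True                   # j_rand
            trial = np.where(mask, donor, pop[i])          # Eq. (3)
            f_trial = f(trial)
            if f_trial <= fit[i]:                          # Eq. (4)
                pop[i], fit[i] = trial, f_trial
    return pop[np.argmin(fit)], fit.min()
```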
2.2. DE Utilization in Cluster Analysis
An automatic clustering technique using an improved differential evolution algorithm (ACDE) was proposed in [39]. This technique enhanced the convergence of the canonical DE algorithm by varying the values of the mutation factor and crossover probability through the iterations using mathematical formulas to balance the search behavior. Furthermore, the proposed technique provided a new representation of each solution in the search process in order to automatically estimate the number of clusters in a dataset, where the strength of a solution is measured using a clustering validity index (e.g., the CS or Davies-Bouldin measure). The proposed technique was tested on real-world datasets and grayscale images to prove its ability to detect the optimal number of clusters. Despite the novelty of the proposed technique, its performance is highly sensitive to the selection of the clustering validity index, where an insufficient selection may result in an inaccurate number of clusters.

The work presented in [40] adopted a solution representation similar to the work proposed in [39] to introduce a multi-objective differential evolution algorithm for automatic clustering. In this framework, two conflicting objective functions were optimized simultaneously in order to produce multiple solutions that have different trade-offs between the two objectives along with different possible numbers of clusters. Therefore, this approach provides the flexibility to choose the optimal solution based on the problem to be optimized. However, the framework is highly sensitive to the selection of the objective functions to be optimized.

An algorithm for data clustering with differential evolution incorporating macro-mutations (DEMM) was proposed in [41]. The authors claimed that the solutions (population) of the canonical DE algorithm tend to become more similar through the iterations, which limits the exploration behavior of the algorithm. Hence, they proposed the macro-mutations as an alternative scheme to improve the exploration ability of the DE algorithm, where a new probability of macro-mutations was defined to switch between the traditional mutation/crossover processes and the macro-mutations scheme. This probability was increased through the iterations by a linear function to give a higher probability of exploring the search space by the macro-mutations scheme at the final stages of the search. Furthermore, an exponentially decreasing function of the crossover probability of the canonical DE algorithm was introduced, which allows an exploration process at the early stages of the search (high values) and gradually turns into an exploitation process (low values) at the final stages of the search. In fact, the proposed technique provides a good balance between the exploitation and exploration processes by adjusting the crossover and macro-mutation probabilities, and it provided high-quality solutions based on the reported experimental results. However, the minimum and maximum values of the two probabilities are set empirically, and they are sensitive to the maximum number of iterations.
In the work proposed in [42], an efficient approach based on the differential evolution algorithm for data clustering was presented. The key idea of this technique is simple, where each member of the population has an equal probability of being selected for the mutation process. Furthermore, the algorithm employed the inter- and intra-cluster distances as the objective functions to be maximized and minimized, respectively. Although the performance of the presented technique was compared to the K-means algorithm and provided more acceptable results, further evaluation of the algorithm performance on a wider range of real-world problems is needed.

An effective hybrid of the bees and differential evolution algorithms in data clustering was proposed in [43]. In this hybrid technique, the K-means algorithm was used as a preprocessing stage to determine the initial cluster centroids. Then, the bees algorithm was used to start the global search and represent the explorative search behavior, whereas the DE algorithm was assigned to perform the local search and represent the exploitative search behavior. The hybrid technique employed the strengths of both algorithms and overcame their shortcomings. Based on the reported experimental results, the algorithm produced competitive clustering results with relatively acceptable complexity compared to other hybrids in the literature. However, an adequate compromise between the quality of the solution and the associated complexity is necessary when this hybrid is applied to cluster high-dimensional datasets.

Another hybrid technique of the K-means and DE algorithms was proposed in [44] for optimal clustering. In this technique, the K-means algorithm was applied to the trial solution generated by the DE algorithm to perform a local search. In addition, a rearrangement scheme of the clusters' centroids was presented to improve the classification of data points. Furthermore, two different objective functions were employed separately for clustering: the first aimed at minimizing the trace-within criterion (TRW), and the second aimed at maximizing the variance ratio criterion (VCR). The proposed technique was tested on real-world problems, and the results concluded that the hybrid DE algorithms outperformed the non-hybrid ones when the previous two objective functions were used. However, the performance of the hybrid will be affected if the objective function is replaced by another clustering validity measure.

The work in [45] presented a dynamic shuffled differential evolution algorithm for data clustering (DSDE) to enhance the convergence rate of the canonical DE algorithm. Due to the sensitivity of most clustering algorithms to the initial positions of the centroids, this technique presented an initialization scheme called random multistep sampling to avoid premature convergence to a local optimum. Furthermore, a new scheme of sorting and dividing the population into two subpopulations was presented based on the shuffled frog leaping algorithm to increase the diversity (exploration) and enhance the information exchange among the population. Additionally, both the convergence speed and the exploitation ability of the algorithm were improved by employing the DE/best/1 mutation strategy for the subpopulations during the evolution process. The reported experimental results concluded that the proposed technique has the ability to provide high-quality clustering solutions in terms of classification accuracy and intra-cluster distances compared to the canonical DE and other state-of-the-art evolutionary algorithms in the literature. On the other hand, the algorithm requires a high number of function evaluations (in sorting and dividing the population), which is directly proportional to the complexity of the algorithm.
Due to the strong effect of the mutation strategy on the performance of the DE algorithm, a forced strategy differential evolution algorithm for data clustering (FSDE) was proposed in [46]. The new mutation strategy presented in this DE variant is as follows:

$$V_{i,G} = X_{r_1,G} + F \cdot (X_{r_2,G} - X_{r_3,G}) + \lambda \cdot (X_{best,G} - X_{r_1,G}) \quad (5)$$

where $V_{i,G}$ is the mutant solution, $X_{r_1,G}$, $X_{r_2,G}$, and $X_{r_3,G}$ are randomly selected solutions from the population, and $X_{best,G}$ is the best solution found at the current iteration $G$. Besides the traditional mutation factor $F$, which takes a constant value of 0.6, this technique also proposed an additional controlling factor $\lambda$ that takes a varying value in the range of $[0, 1]$ at each iteration. These modifications were made to improve the quality of the mutant solution and consequently the efficiency of the DE algorithm. It is worth mentioning that the clustering result of the K-means algorithm was used as one of the initial population members of the proposed DE variant, whereas the remaining population members were initialized randomly. Based on the reported experimental results, the proposed technique provided good clustering solutions according to different clustering validation indices.

To summarize, the following problems need to be addressed in order to develop a robust DE clustering algorithm:
i. The initial population sizes and positions highly affect the performance of the DE algorithm. Most of the previous studies [31, 32] assign the initial positions of the population (centroids of the clusters) randomly, which increases the algorithm's vulnerability to premature convergence and to entrapment or stagnation in locally optimal solutions.

ii. The mutation scheme adopted by the DE algorithm plays an important role in the search behavior of the algorithm. Specifically, the canonical DE algorithm employs the DE/rand/bin/1 mutation scheme in producing the mutant solution. This scheme preserves the diversity of the search through the randomness concept, but it slows the convergence speed of the algorithm and does not guarantee convergence to the optimal solution. On the other hand, other approaches employ the DE/best/bin/1 mutation scheme in producing the mutant solution [45, 46]. This scheme indeed increases the convergence speed of the DE algorithm, but the greediness toward the best solution makes the algorithm vulnerable to premature convergence to local optima.

iii. The DE algorithm includes parameters to control the search behavior, such as the mutation and crossover factors, which control the generation of the mutant and the trial solutions, respectively. The previous studies in the DE clustering literature [40-42, 45, 46] assign magnitude values to these controlling parameters within a conventional range, which neglects the variety of clustering problems and the data distribution along all the dimensions. The main problem is to find the optimal estimations or settings for these controlling parameters to make the algorithm more efficient in solving a wider range of clustering problems.

iv. In the majority of DE-based clustering algorithms, the trial solution is always selected to be compared with the best solution in order to select the fittest one, whereas sometimes the mutant solution is fitter than both of them. Hence, adding a restriction to this step would be beneficial in increasing the algorithm's convergence rate.
The previous literature on the diverse proposed techniques has surely enhanced the performance of the canonical DE algorithm. However, an algorithm that guarantees convergence to the global optimum (by balancing the search behavior), is insensitive to the initial population size and positions, requires few input parameters, and has a low computational complexity is still missing. These demands motivate the research work proposed in Section 3, which aims to improve the ability of the DE algorithm to provide faster convergence with better clustering solutions.
3. The Proposed Algorithm
The proposed algorithm consists of four main stages. The first one is the initialization and problem formulation stage, where the solution representation and its initial position in the search space are determined. In this stage, the preprocessing step for the input data to be clustered is also presented to cope with the newly proposed mutation scheme. The second stage includes the two employed mutation schemes, the switchable mechanism between them, and the vector-based estimation of the mutation factor. The third stage is concerned with the optional crossover strategy, where the necessity of the crossover process is determined according to the fitness of both the mutant and trial solutions. Finally, the fourth stage determines whether the trial or the current solution survives to the next iteration through the selection of the fittest. Fig. 1 shows the general flow of the proposed VDEO algorithm, and the four stages mentioned above are described in the following sections.
[Fig. 1 flowchart: the algorithm estimates the multidimensional mutation factor, sorts the input dataset by the feature with the maximum variance, divides it into K subsets (equal to the number of required clusters), and initializes the solution position randomly. At each iteration, a random number is compared with the threshold ϕ: if it is larger, three random solutions are generated by selecting each centroid within the bounds of the corresponding subset and the mutant is created with the DE/rand/1 scheme (exploitation, first mutation scheme); otherwise two random solutions are generated within the bounds of the whole input dataset and the mutant is created with the DE/best/1 scheme (exploration, second mutation scheme). The trial solution is created using a random crossover probability, the fittest of the mutant and trial solutions is selected, then compared with the current solution to select the survivor for the next iteration, until the termination criterion is met.]

Fig. 1. An overview of the proposed Variance-based Differential Evolution algorithm with an Optional Crossover for Data Clustering (VDEO).
3.1. Initialization and Problem Formulation
In order to customize the DE algorithm for data clustering purposes, the representation of the solution is modified. In this research work, a single-based solution scheme is adopted, and the concept of the population is discarded. This reduces the limitations associated with multi-solution techniques, such as the required number of initial solutions (which is considered as another optimization problem depending on the input data size [47]) and their associated complexity represented by the fitness function evaluation of each solution. Hence, only one solution $S$ is initialized in the search space and represented by a matrix as follows:

$$S = \begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,d} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ c_{K,1} & c_{K,2} & \cdots & c_{K,d} \end{bmatrix} \quad (6)$$

where each row vector represents the coordinates of a centroid with $d$ attributes, and the number of row vectors is equal to the required number of clusters $K$. In other words, the first dimensional vector in the matrix refers to the position of the first centroid, the $k$-th dimensional vector refers to the position of the $k$-th centroid, and so on. The initial positions of the centroids that form the whole solution (the target solution) $S$ are distributed in a uniformly random manner over the search space. The constraints on the maximum and minimum values of the attributes are also considered, where each centroid that corresponds to a row vector takes values between $X_{\min}$ and $X_{\max}$ of the dataset attributes (as described in Section 2.1.1).
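For illustration, a minimal NumPy sketch of this single-solution representation (the names are ours, not the paper's):

```python
import numpy as np

def init_solution(data, K, rng=np.random.default_rng()):
    """Eq. (6): one K x d matrix of centroids, each attribute drawn uniformly
    between the dataset's per-attribute minimum and maximum (X_min, X_max)."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    return lo + rng.random((K, data.shape[1])) * (hi - lo)
```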
To represent the quality of the solution in an optimization-based clustering problem, an objective function based on a similarity measure should be defined. In this paper, the employed objective function is the sum of intra-cluster distances, where the Euclidean distance between the data points and the corresponding cluster centers is selected as the similarity measure. The value of the objective function can be calculated as:

$$f(S, Z) = \sum_{k=1}^{K} \sum_{z_i \in C_k} d(z_i, c_k) \quad (7)$$

where $K$ is the number of clusters, $C_k$ is the cluster corresponding to a centroid $c_k$, $d(\cdot, \cdot)$ is a similarity function (the Euclidean distance), and $z_i$ is the $i$-th data point that belongs to the cluster $C_k$. The smaller the value of the objective function is, the more compact the clusters are. Hence, the clustering process of the proposed algorithm is considered as a global optimization problem that aims at minimizing the intra-cluster distances.
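A minimal sketch of this objective (assuming NumPy arrays, with `data` of shape N x d and `centroids` the K x d solution matrix):

```python
import numpy as np

def intra_cluster_distance(data, centroids):
    """Eq. (7): assign each point to its nearest centroid under the Euclidean
    distance and sum the point-to-centroid distances over all clusters."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1).sum()
```

Smaller values indicate more compact clusters, so the optimizer minimizes this quantity.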
Due to the single-based solution representation adopted in the proposed algorithm, a preparation step for the new mutation scheme is also introduced to cope with the framework mechanism. Assume an $N \times d$ dataset $Z$, where $N$ is the number of observations and $d$ is the number of features. The dataset is divided into $K$ subsets using the following steps (see the sketch after this list):

1) Compute the variance of each feature (column).
2) Find the column with the maximum variance and sort the dataset in ascending order according to it.
3) Split the dataset into approximately equal-sized $K$ subsets, where $K$ is the number of clusters. Assuming that $k$ is an index that runs over the clusters such that $k = 1, 2, \dots, K$, the formula $\lfloor N/K \rfloor$ can be used to define each subset. For example, for $k = 1$, the first subset will contain all observations whose indices lie in the interval between $1$ and $\lfloor N/K \rfloor$. Consequently, the second subset will contain all observations whose indices lie in the interval between $\lfloor N/K \rfloor + 1$ and $2\lfloor N/K \rfloor$, and so on. It is worth mentioning that if the term $N/K$ is not a valid index, the floor function is used to convert it to the closest smaller integer. Hence, the output can be expressed as:

$$Z = \{Z_1, Z_2, \dots, Z_K\} \quad (8)$$
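The three steps above can be rendered as follows (our own sketch; how the remainder of $N/K$ observations is handled is an implementation detail the text leaves open, so here the last subset simply absorbs it):

```python
import numpy as np

def sort_and_split(data, K):
    """Section 3.1 preprocessing: sort by the attribute with the largest
    variance, then cut the sorted dataset into K subsets of floor(N/K) rows
    (Eq. (8)); the last subset takes any remaining rows."""
    data_sorted = data[np.argsort(data[:, np.argmax(data.var(axis=0))])]
    size = data_sorted.shape[0] // K
    subsets = [data_sorted[k * size:(k + 1) * size] for k in range(K - 1)]
    subsets.append(data_sorted[(K - 1) * size:])
    return data_sorted, subsets
```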
3.2. Mutation
In the proposed algorithm, two different mutation schemes are introduced to produce the donor/mutant solutions. The first mutation scheme uses the DE/rand/1 strategy and employs the $K$ output subsets as a pool from which the random solutions are selected to generate the mutant solutions. In contrast, the second mutation scheme uses the DE/best/1 strategy and uses the bounds of the whole dataset to select the random solutions used to generate the mutant solutions. During the clustering process of the proposed algorithm, only one of the mutation schemes is executed at each iteration. Hence, both mutation schemes and the selection criterion between them (switching) are presented in the next sections.
3.2.1. The First Mutation Scheme
This mutation scheme uses the DE/rand/1 strategy, i.e., the weighted difference of two randomly selected solutions is added to a third random solution to produce the mutant solution, as in Eq. (10). In addition, this strategy uses the binomial crossover to produce the trial solution. This mutation scheme represents the exploitation part of the framework, where the mutant solution is generated using three solutions $X_{r_1}$, $X_{r_2}$, and $X_{r_3}$ which are produced according to the $K$ obtained subsets as in Eq. (8). To be more specific, the $\min(Z_k)$ and $\max(Z_k)$ of each subset are first computed; then the general formula for producing a random solution $X_r$ is as follows:

$$X_r = [c_1, c_2, \dots, c_K]^T, \qquad c_{k,j} = \min_j(Z_k) + rand_j \cdot \big(\max_j(Z_k) - \min_j(Z_k)\big) \quad (9)$$

where $k = 1, 2, \dots, K$ and the attribute values of a centroid $c_k$ are generated in a uniformly random manner within the range of the minimum and maximum values of the corresponding subset, such that $\min_j(Z_k) \le c_{k,j} \le \max_j(Z_k)$. More specifically, three solutions are generated at each iteration, and for each solution the first centroid is assigned using a uniform random function within the bounds of the first subset, the second centroid is assigned using a uniform random function within the bounds of the second subset, the $k$-th centroid is assigned using a uniform random function within the bounds of the $k$-th subset, and so on. The mutant solution $V$ is then calculated as:

$$V = X_{r_1} + F \cdot (X_{r_2} - X_{r_3}) \quad (10)$$

where $F$ is the mutation factor that weights the difference of the mutation process.
490
The alternative mutation scheme proposed in this algorithm uses the
491
DE/best/1 strategy, i.e., the weighted difference of two randomly selected solutions is
492
added to the best (current) solution to produce the mutant solution (as in Eq. (11)).
493
Also, this strategy uses the binomial crossover to produce the trial solution. The
494
proposal of this mutation scheme represents the exploration part of the algorithm,
495
where the best solution found until the current iteration
496
random solutions
497
contrast to the first mutation scheme, the centroids of the random solutions are
498
assigned in a uniformly random manner within the minimum and maximum values
499 500
and
and
are employed in generating the mutant solution. In
of the whole dataset before the division process. Therefore, the
general formula for producing a random
501
along with two other
is the same as in Eq. (9), where
and the attribute values of a centroid
are generated by a uniform random
502
function within the range of the minimum and maximum values of the whole dataset
503
such that
. The mutant solution
is then calculated as
504 ,
(11)
505 506
where
is the mutation factor that weights the difference of the mutation process.
507
Usually, the mutation factor
takes magnitude values in the range of 22
in the
508
proposed techniques in the literature which were reviewed in Section 2.2. This
509
magnitude value does not take into consideration the spatial distribution of the data
510
in each dimension. Furthermore, the effect of this magnitude value on a variety of
511
datasets with multiple dimensions and variances is different. Hence, in this work, a
512
new simple, dynamic, and vector-based estimation technique of
513
value of the mutation factor is set to be equal to the variance vector that corresponds
514
to the input dataset to be clustered, and it is represented as:
is presented. The
515 (12) 516 517
where
is the variance of the attribute column and
518
dataset. It is worth mentioning that this
519
mutation schemes of the proposed algorithm.
is the dimensions of the
estimation method is applied in both
520 521
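Both the variance-based factor and the second scheme are easy to express in NumPy (again an illustrative sketch with our own names; `data` is the full N x d dataset and `best` the current K x d solution):

```python
import numpy as np

def mutation_factor(data):
    """Eq. (12): F is the vector of per-attribute variances, so each
    dimension's difference is weighted by the data spread along it."""
    return data.var(axis=0)                        # shape (d,)

def mutant_best_1(best, data, F, rng=np.random.default_rng()):
    """Second scheme (Eq. (11), DE/best/1): two random solutions drawn within
    the bounds of the whole dataset perturb the best solution found so far."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    K, d = best.shape
    x1, x2 = (lo + rng.random((K, d)) * (hi - lo) for _ in range(2))
    return best + F * (x1 - x2)
```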
3.2.3. The Mutation Scheme Switching Criteria
At each iteration, only one of the two previously described mutation schemes is selected to generate the mutant solution. In order to determine which scheme will be executed, a threshold value $\phi$ is introduced; then a uniform random number $rand_G \in [0, 1]$ is generated at each iteration to control the switching process between the mutation schemes such that:

$$\text{scheme}_G = \begin{cases} \text{first scheme (exploitation)}, & \text{if } rand_G > \phi \\ \text{second scheme (exploration)}, & \text{otherwise} \end{cases} \quad (13)$$

where $G$ represents the current iteration and $\phi$ is the mutation scheme switching threshold. The value of $\phi$ is set to 0.5 to give an equal probability of the two mutation schemes being selected at each iteration. In other words, the probability of the generated value of $rand_G$ at each iteration being smaller or greater than $\phi$ is equal.
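In code, the switch of Eq. (13) is a single comparison per iteration. The sketch below reuses the helper functions from the previous snippets (`mutant_rand_1`, `mutant_best_1`, `subsets`, `best`, `data`, `F` as defined above); with ϕ = 0.5 the direction of the inequality only fixes which branch is labeled first:

```python
import numpy as np

rng = np.random.default_rng()
phi = 0.5                                   # switching threshold of Eq. (13)
if rng.random() > phi:
    # first scheme: exploitation, DE/rand/1 within the K subset bounds
    mutant = mutant_rand_1(subsets, F, rng)
else:
    # second scheme: exploration, DE/best/1 within the whole-dataset bounds
    mutant = mutant_best_1(best, data, F, rng)
```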
3.3. Optional Crossover

The canonical DE algorithm applies the binomial crossover process between the target solution $S$ (the current solution) and the mutant solution $V$ in order to produce a trial solution $U$ (as in Eq. (3)). After that, the objective function outputs of both the target and trial solutions are compared to select the fittest solution that will survive to the next iteration. In fact, it is not always guaranteed that the trial solution will be fitter than both the mutant and the target solutions. In particular, the crossover process is not always a step forward in enhancing the quality of the current solution. Hence, in this research work, the possibility of the mutant solution $V$ being fitter than the trial solution $U$ is considered. Therefore, the definition of the trial solution is modified to represent the fitter solution among the mutant and the trial solutions based on the value of the objective function $f$ to be minimized, as follows:

$$U = \begin{cases} U, & \text{if } f(U) \le f(V) \\ V, & \text{otherwise} \end{cases} \quad (14)$$

This newly introduced simple mechanism for determining the trial solution can increase the convergence speed of the algorithm. Also, the crossover probability $CR$ in Eq. (3) is usually set as a fixed value within the range of $[0, 1]$ in the majority of the approaches proposed in the literature. In the proposed algorithm, the value of $CR$ is varied through the iterations within the range $[0, 1]$ using a uniform random function. Meanwhile, the $j_{rand}$ value becomes a random integer number in $\{1, 2, \dots, K \times d\}$, since the dimension of the optimization problem now depends on the number of clusters and the number of attributes of the dataset to be clustered. This random assignment increases the diversity of the search and produces a variety of mutant solutions that contribute to evolving the quality of the solution.
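A sketch of the modified crossover step (our names; `f` is the objective of Eq. (7), `current` and `mutant` are K x d solution matrices):

```python
import numpy as np

def optional_crossover(current, mutant, f, rng=np.random.default_rng()):
    """Binomial crossover with a CR drawn uniformly from [0, 1] on each call
    and a j_rand index over all K x d attributes, followed by Eq. (14): if
    the crossover was harmful, the fitter mutant replaces the trial solution."""
    K, d = current.shape
    CR = rng.random()                                  # CR varied per iteration
    mask = rng.random((K, d)) <= CR
    mask[rng.integers(K), rng.integers(d)] = True      # guaranteed j_rand gene
    trial = np.where(mask, mutant, current)
    return trial if f(trial) <= f(mutant) else mutant  # Eq. (14)
```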
3.4. Selection
After determining the trial solution using the modified crossover scheme, the selection of the fittest solution among the target and trial solutions is performed in the same manner as in the canonical DE algorithm, as in Eq. (4). Eventually, the evolved target solution is transferred to the next iteration, and the mutation, crossover, and selection processes are repeated until convergence.
The main objective of the proposed algorithm is to enhance the exploitation ability of the DE algorithm through a divide-and-conquer strategy and to regulate its random exploration behavior. The input dataset is divided into $K$ subsets based on the attribute with the highest variance, where each subset is expected to contain at least one cluster. Then, using the first mutation strategy, the difference between the randomly produced solutions (where each centroid is selected from its corresponding subset) confines each centroid to exploiting its current subset for the best position. Since using only this strategy might lead to premature convergence or stagnation, an alternative strategy (the second mutation strategy) is also presented, which produces these random solutions by selecting the centroid positions within the bounds of the whole dataset. This second mutation strategy allows the random exploration of the search space but does not waste the resources of the algorithm, since the best-solution scheme is adopted. Switching between these two mutation schemes with an equal probability gives the algorithm the ability to balance its search behavior.

Another noteworthy modification that contributes to the balance of the search behavior of the DE algorithm is the mutation factor. The vector-based estimation of the mutation factor $F$ according to the variance of each attribute gives more insight into the search process, where the difference of each attribute in the produced solution is weighted based on its variance. This estimation takes into consideration the distribution of the input data and logically leads to a better search behavior.

The last proposed modification is the optional crossover strategy, which aims at increasing the convergence speed of the algorithm. Due to the random search characteristics of the DE algorithm, a better solution is not expected to appear at every iteration. Applying the same reasoning to the crossover process, it is not always guaranteed that the produced trial solution is fitter than the mutant solution, and vice versa. Hence, selecting the fittest among these two solutions to be compared with the best solution increases the chance of finding the global solution and consequently increases the convergence speed of the algorithm. For more clarity, an analysis of the working mechanism of the proposed VDEO framework along with an illustrative example is provided in Section 3.5. The pseudo-code of the proposed VDEO algorithm is given in Algorithm 1.
Algorithm 1: Pseudo-code of the proposed VDEO algorithm.

Input: dataset $Z$, number of clusters $K$
Output: optimal clusters

1) Initialization
   Estimate the mutation factor $F$ as in Eq. (12);
   Sort and divide the dataset into $K$ subsets as in Eq. (8);
   Initialize the centroid positions of the best solution $X_{best}$ randomly as in Eq. (9);
   Set the maximum number of iterations $G_{max}$ and the current iteration $G = 0$;
2) Mutation (while $G < G_{max}$)
   if $rand_G > \phi$ then
      Generate $X_{r_1}$, $X_{r_2}$, and $X_{r_3}$ as in Eq. (9) within the corresponding subset ranges;
      Create the mutant solution $V$ as in Eq. (10);
   else
      Generate $X_{r_1}$ and $X_{r_2}$ as in Eq. (9) within the whole dataset range;
      Create the mutant solution $V$ as in Eq. (11);
   end
   Create the trial solution $U$ as in Eq. (3);
3) Optional crossover
   if $f(V) < f(U)$ then set $U \leftarrow V$ as in Eq. (14);
4) Selection
   Determine $X_{best}$ as in Eq. (4); set $G \leftarrow G + 1$ and return to step 2 until termination.
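Under the same assumptions as the earlier snippets (all names are ours; the paper specifies the logic only at the level of Eqs. (8)-(14) and Algorithm 1), one possible end-to-end Python reading is:

```python
import numpy as np

def vdeo(data, K, max_iter=1000, phi=0.5, seed=0):
    """A minimal sketch of the VDEO loop: variance-based F (Eq. (12)),
    switchable mutation (Eqs. (10), (11), (13)), optional crossover
    (Eq. (14)), and greedy selection (Eq. (4))."""
    rng = np.random.default_rng(seed)
    F = data.var(axis=0)                                    # Eq. (12)
    data = data[np.argsort(data[:, np.argmax(F)])]          # sort by max-variance attribute
    size = len(data) // K                                   # Eq. (8) split
    subsets = [data[k * size:(k + 1) * size] for k in range(K - 1)] + [data[(K - 1) * size:]]
    lo, hi = data.min(axis=0), data.max(axis=0)
    d = data.shape[1]

    def fitness(c):                                         # Eq. (7)
        return np.linalg.norm(data[:, None] - c[None], axis=2).min(axis=1).sum()

    def rand_sol(bounds):                                   # Eq. (9)
        return np.stack([b_lo + rng.random(d) * (b_hi - b_lo) for b_lo, b_hi in bounds])

    sub_bounds = [(s.min(axis=0), s.max(axis=0)) for s in subsets]
    all_bounds = [(lo, hi)] * K
    best = rand_sol(all_bounds)                             # single initial solution
    f_best = fitness(best)
    for _ in range(max_iter):
        if rng.random() > phi:                              # Eq. (13): exploitation
            x1, x2, x3 = (rand_sol(sub_bounds) for _ in range(3))
            mutant = x1 + F * (x2 - x3)                     # Eq. (10)
        else:                                               # exploration
            x1, x2 = (rand_sol(all_bounds) for _ in range(2))
            mutant = best + F * (x1 - x2)                   # Eq. (11)
        mask = rng.random((K, d)) <= rng.random()           # random CR each iteration
        mask[rng.integers(K), rng.integers(d)] = True       # j_rand
        trial = np.where(mask, mutant, best)                # Eq. (3)
        if fitness(mutant) < fitness(trial):                # Eq. (14): optional crossover
            trial = mutant
        f_trial = fitness(trial)
        if f_trial <= f_best:                               # Eq. (4): selection
            best, f_best = trial, f_trial
    return best, f_best
```

For example, `centroids, score = vdeo(X, K=3)` for a NumPy array `X` returns the final centroid matrix and its intra-cluster distance.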
3.5. Characteristics of The Proposed Algorithm
This section provides more insight into the working mechanism of the proposed VDEO algorithm. To observe the search behavior of the proposed algorithm, it was applied to cluster the Ruspini dataset [48], which has 75 observations, 2 dimensions, and 4 clusters. This dataset was selected for illustration purposes only due to its simplicity, the clarity of its distinctive clusters, and the ability to plot its 2-D features.
The VDEO algorithm adopts a stochastic and heuristic working mechanism (based on the DE algorithm) while searching for the solution. This randomness property makes observing and tracing the solution trajectories through the iterations strenuous. Therefore, in what follows, only examples of the significant cases are provided to illustrate the working mechanism of the VDEO algorithm when applied to the Ruspini dataset. In Fig. 2, the current solution (centroids) is represented by black asterisk marks (i.e., '*'), and it was initialized randomly. Moreover, Fig. 2 shows the initial setup of the VDEO algorithm, where each rectangle represents the bounds of a subset after dividing the dataset into 4 subsets (as in Eq. (8)). The colors of each rectangle and the included data points are set for clarification purposes only and do not indicate the current clusters. The randomly generated solutions from the first mutation strategy are represented by red, green, and blue triangles, where every four triangles sharing the same color represent a complete solution. As shown in Fig. 2(b), the generation of these solutions is restricted by selecting each centroid from a different subset in case this mutation strategy is selected (exploitation). In other words, each subset bound must contain three differently colored triangles (there are no triangles sharing the same color inside one subset bound or located outside the bounds). Similarly, the randomly generated solutions from the second mutation strategy are represented by red and blue squares, where every four squares sharing the same color represent a complete solution. As shown in Fig. 2(c), the generation of these solutions is totally random, and each centroid is selected from the bounds of the whole dataset (exploration) with no restrictions related to the subset bounds. These two strategies alternately generate different permutations of the random solutions that contribute to producing the mutant solution at each iteration.

Fig. 2(d) shows the mutant and the trial solutions, which are represented by cyan diamonds and yellow hexagrams, respectively. In this figure, neither the mutant nor the trial solutions (centroids) are located in better positions than the current solution (the best solution). Specifically, regardless of the colors of the data points, the mutant and the trial solutions do not provide a better clustering solution (representative centroids of the four clusters of the Ruspini dataset). Hence, in this scenario, the current solution will not be changed, and it will survive to the next iteration. On the other hand, Fig. 2(e) shows another scenario where the trial solution is fitter than the current solution. It shares three centroids with the current solution and has a better position for the fourth centroid (the yellow hexagram in the blue rectangle). This better position is a result of crossing over the current solution and the mutant solution provided by the exploitative mutation scheme. To be more specific, the trial vector inherits the first three centroids from the current solution and the fourth centroid from crossing over the current and the mutant solutions (the x-coordinate from the current solution and the y-coordinate from the mutant solution). Hence, in this scenario, the trial solution will be set as the current solution (the best solution), and the fourth centroid will move from its current position to the better one, as illustrated by the black arrow. Eventually, the mutation (two switchable schemes), crossover (with the optional strategy), and selection processes are repeatedly executed until the final best solution is obtained, which is depicted in Fig. 2(f).
Fig. 2. Ruspini dataset clustering by the proposed VDEO algorithm. (a) initial setup of the algorithm, (b) random solutions generation using the first mutation scheme, (c) random solutions generation using the second mutation scheme, (d) neither the mutant nor the trial solutions are fitter than the current one, (e) the trial solution is fitter than the current solution, and (f) the final clustering solution.
4. Experimental Results and Discussion
The proposed VDEO algorithm aims at enhancing the convergence speed of the canonical DE algorithm while providing high-quality clustering solutions. To achieve these two conflicting goals, the proposed VDEO algorithm presents a new switchable mutation scheme, a multidimensional and variance-based estimation of the mutation factor, and an optional crossover strategy. Intuitively, these modifications set the appropriate balance between the exploitation and exploration processes and consequently improve both the convergence rate and the clustering solution quality.
4.1. Methods
In order to validate the proposed VDEO algorithm, it was tested on 15 datasets selected from the UCI repository [49]. The selection of these datasets considers different distributions, complexities, and degrees of cluster overlap in the data. The used datasets are Iris, Balance, Wine, Cancer (WDBC), Lung Cancer, Transfusion (BTSCD), Breast Cancer (WDBC-Int), Glass, Vowel, Seeds, New Thyroid, Haberman, Dermatology, Heart, and Landsat. The description of these datasets is provided in Table 1.
Table 1: Description of datasets.
Dataset        Number of observations   Number of features   Number of classes
Iris           150                      4                    3
Haberman       306                      3                    2
New Thyroid    215                      5                    3
Seeds          210                      7                    3
Lung Cancer    32                       56                   3
Glass          214                      9                    6
Wine           178                      13                   3
Balance        625                      4                    3
Vowel          871                      3                    6
BTSCD          748                      4                    2
Heart          303                      13                   2
WDBC-Int       699                      9                    2
Dermatology    366                      34                   6
WDBC           569                      30                   2
Landsat        2000                     36                   6
The performance of the proposed algorithm was compared to the canonical DE algorithm [24] with two variations, DE/rand/1/bin and DE/best/1/bin, based on the adopted mutation scheme. Moreover, the comparison included the most recent state-of-the-art DE-based clustering techniques, namely, the dynamic shuffled differential evolution algorithm for data clustering (DSDE) [45] and the forced strategy differential evolution algorithm for data clustering (FSDE) [46]. For a fair comparison, the maximum number of fitness function evaluations was set to 1e4 for all experiments, as recommended in [50]. The parameters used for all algorithms are summarized in Table 2 according to the setups reported in their original works.

First, each of the competing algorithms was tested on the 15 selected datasets through 50 independent runs. The average objective function values (sum of intra-cluster distances) and the average classification accuracy (CA) of the solutions obtained by each algorithm through the 50 runs are presented in Tables 3 and 4, respectively. Then, to detect the statistical differences among a group of results, the Friedman Aligned-Ranks (FA) test was used to obtain a rank for each algorithm [51]. Consequently, the adjusted p-values can be computed by applying a post-hoc method to the results of the Friedman Aligned-Ranks test. The Holm test, which is described in [52], is adopted as the post-hoc method. In this paper, the null hypothesis is the case of no difference between the performances of two clustering methods. If the p-value is less than or equal to a specified significance level, then the null hypothesis is rejected and the existence of a significant difference between the two methods is accepted. To determine the significance, this study set the default level of significance to 0.05; the adjusted p-values obtained by the Holm post-hoc test then decide the corresponding level of significance of each experiment. Eventually, the convergence rates of each algorithm, represented by the convergence curves, are provided for all algorithms on all datasets in Fig. 3.
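As a sketch of this analysis pipeline (illustrative only: SciPy ships the standard Friedman test rather than the Aligned-Ranks variant of [51], and the Holm correction of [52] is implemented by hand; the `results` array is a placeholder for per-dataset scores):

```python
import numpy as np
from scipy import stats

# Placeholder scores: rows = 15 datasets, columns = 5 competing algorithms.
results = np.random.default_rng(0).random((15, 5))

# Omnibus test over the algorithms (standard Friedman test; the paper uses
# the Friedman Aligned-Ranks variant, which SciPy does not provide).
stat, p = stats.friedmanchisquare(*results.T)

def holm(p_values):
    """Holm step-down correction: adjusted p_(i) = max over j<=i of (m-j+1)*p_(j)."""
    p_values = np.asarray(p_values, float)
    order = np.argsort(p_values)
    m = len(p_values)
    adj = np.maximum.accumulate((m - np.arange(m)) * p_values[order])
    return np.minimum(adj, 1.0)[np.argsort(order)]

# Reject the null hypothesis of equal performance where adjusted p <= 0.05.
```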
Table 2: Experimental parameter settings for the DE-based clustering algorithms.
Algorithms            Parameters/Values
DE/rand/1/bin
DE/best/1/bin [24]
DSDE [45]
FSDE [46]
The proposed VDEO
4.2. Experimental Results
From the results reported in Table 3, the proposed VDEO algorithm obtained similar or better objective function values than all the competing algorithms in 13 out of 15 test datasets, while it also attained very competitive objective function values on the other 2 datasets (Wine and Balance) in contrast to the best values achieved by the DSDE and DE/best/1/bin algorithms, respectively. Moreover, the standard deviation values obtained by the proposed VDEO algorithm are smaller than those achieved by the other competing algorithms in 12 out of 15 datasets, which suggests the effectiveness and the repeatability of the results of the proposed VDEO algorithm. In general, the DE/best/1/bin algorithm produced better objective function values than the DE/rand/1/bin algorithm on all test datasets except for the Iris, Haberman, WDBC-Int, and WDBC datasets. Furthermore, the FSDE and DSDE algorithms performed better than both the DE/best/1/bin and DE/rand/1/bin algorithms in at least 5 and 12 out of the 15 test datasets, respectively. The forced strategy employed by the FSDE algorithm slightly enhanced the performance in contrast to both mutation strategies (rand and best) adopted by the canonical DE algorithm.
Table 3: Average objective function values and standard deviation (Std) among the competing DE-based clustering algorithms for 50 runs on the 15 datasets.

| Dataset | Method | Best | Worst | Median | Mean | Std |
|---|---|---|---|---|---|---|
| Iris | DE/rand/1/bin | 96.62 | 109.03 | 98.26 | 100.10 | 4.22 |
| Iris | DE/best/1/bin | 96.54 | 127.57 | 96.54 | 100.52 | 9.81 |
| Iris | DSDE | 96.65 | 96.65 | 96.65 | 96.65 | 0.00 |
| Iris | FSDE | 96.65 | 96.77 | 96.68 | 96.70 | 0.10 |
| Iris | The proposed VDEO | 96.54 | 96.54 | 96.54 | 96.54 | 0.00 |
| Haberman | DE/rand/1/bin | 2566.99 | 2567.00 | 2566.99 | 2566.99 | 0.00 |
| Haberman | DE/best/1/bin | 2566.99 | 2567.82 | 2566.99 | 2567.07 | 0.26 |
| Haberman | DSDE | 2566.99 | 2566.99 | 2566.99 | 2566.99 | 0.00 |
| Haberman | FSDE | 2566.99 | 2566.99 | 2566.99 | 2566.99 | 0.00 |
| Haberman | The proposed VDEO | 2566.99 | 2566.99 | 2566.99 | 2566.99 | 0.00 |
| New Thyroid | DE/rand/1/bin | 1873.31 | 2093.37 | 1898.33 | 1919.12 | 52.23 |
| New Thyroid | DE/best/1/bin | 1866.47 | 2155.62 | 1890.21 | 1900.50 | 60.73 |
| New Thyroid | DSDE | 1866.47 | 1895.99 | 1868.29 | 1874.00 | 11.76 |
| New Thyroid | FSDE | 1866.54 | 1895.91 | 1890.27 | 1882.60 | 11.74 |
| New Thyroid | The proposed VDEO | 1866.47 | 1868.44 | 1868.29 | 1867.51 | 0.91 |
| Seeds | DE/rand/1/bin | 315.48 | 352.28 | 327.29 | 329.15 | 10.88 |
| Seeds | DE/best/1/bin | 311.80 | 311.80 | 311.80 | 311.80 | 0.00 |
| Seeds | DSDE | 311.80 | 311.80 | 311.80 | 311.80 | 0.00 |
| Seeds | FSDE | 311.83 | 315.37 | 311.93 | 312.19 | 0.78 |
| Seeds | The proposed VDEO | 311.80 | 311.80 | 311.80 | 311.80 | 0.00 |
| Lung Cancer | DE/rand/1/bin | 119.38 | 124.79 | 122.31 | 122.04 | 1.64 |
| Lung Cancer | DE/best/1/bin | 106.20 | 117.56 | 110.97 | 111.76 | 3.54 |
| Lung Cancer | DSDE | 105.34 | 109.52 | 106.97 | 107.23 | 1.18 |
| Lung Cancer | FSDE | 113.13 | 117.97 | 116.48 | 116.17 | 1.63 |
| Lung Cancer | The proposed VDEO | 103.52 | 109.72 | 106.14 | 105.91 | 1.51 |
| Glass | DE/rand/1/bin | 265.17 | 314.54 | 282.51 | 283.20 | 11.39 |
| Glass | DE/best/1/bin | 214.81 | 254.70 | 246.69 | 243.27 | 10.09 |
| Glass | DSDE | 210.26 | 249.55 | 215.19 | 220.83 | 12.16 |
| Glass | FSDE | 222.54 | 271.33 | 246.36 | 245.02 | 12.13 |
| Glass | The proposed VDEO | 210.40 | 216.55 | 214.03 | 213.62 | 1.99 |
| Wine | DE/rand/1/bin | 16309.31 | 16440.72 | 16355.82 | 16359.22 | 33.34 |
| Wine | DE/best/1/bin | 16292.18 | 16294.69 | 16292.67 | 16292.92 | 0.77 |
| Wine | DSDE | 16292.18 | 16292.67 | 16292.18 | 16292.28 | 0.20 |
| Wine | FSDE | 16295.19 | 16423.60 | 16303.89 | 16325.26 | 38.92 |
| Wine | The proposed VDEO | 16292.43 | 16295.15 | 16293.21 | 16293.56 | 0.87 |
| Balance | DE/rand/1/bin | 1429.89 | 1435.36 | 1431.30 | 1432.00 | 1.73 |
| Balance | DE/best/1/bin | 1423.82 | 1425.94 | 1423.82 | 1424.43 | 0.87 |
| Balance | DSDE | 1423.82 | 1431.59 | 1428.44 | 1427.70 | 3.74 |
| Balance | FSDE | 1424.47 | 1430.51 | 1426.81 | 1427.29 | 1.97 |
| Balance | The proposed VDEO | 1423.83 | 1426.29 | 1425.64 | 1425.13 | 0.88 |
| Vowel | DE/rand/1/bin | 171462.40 | 187476.12 | 179146.96 | 179692.78 | 4576.01 |
| Vowel | DE/best/1/bin | 148967.24 | 153051.91 | 149070.46 | 149850.77 | 1091.12 |
| Vowel | DSDE | 149062.08 | 160667.46 | 150130.05 | 150629.33 | 2530.23 |
| Vowel | FSDE | 149684.28 | 169710.95 | 154197.35 | 156641.90 | 6285.35 |
| Vowel | The proposed VDEO | 149073.62 | 150913.46 | 149388.61 | 149682.56 | 615.17 |
| BTSCD | DE/rand/1/bin | 407714.24 | 407721.83 | 407714.25 | 407715.10 | 2.24 |
| BTSCD | DE/best/1/bin | 407714.23 | 407714.23 | 407714.23 | 407714.23 | 0.00 |
| BTSCD | DSDE | 407714.23 | 407714.23 | 407714.23 | 407714.23 | 0.00 |
| BTSCD | FSDE | 407714.23 | 407918.88 | 407714.23 | 407733.76 | 52.21 |
| BTSCD | The proposed VDEO | 407714.23 | 407714.23 | 407714.23 | 407714.23 | 0.00 |
| Heart | DE/rand/1/bin | 11687.69 | 11694.39 | 11689.00 | 11689.55 | 1.90 |
| Heart | DE/best/1/bin | 11685.14 | 11685.46 | 11685.14 | 11685.18 | 0.11 |
| Heart | DSDE | 11685.14 | 11685.21 | 11685.15 | 11685.15 | 0.02 |
| Heart | FSDE | 11685.41 | 11694.35 | 11685.78 | 11686.43 | 2.08 |
| Heart | The proposed VDEO | 11685.15 | 11685.15 | 11685.15 | 11685.15 | 0.00 |
| WDBC-Int | DE/rand/1/bin | 2965.37 | 2972.50 | 2966.53 | 2966.93 | 1.80 |
| WDBC-Int | DE/best/1/bin | 2964.39 | 3108.78 | 2964.39 | 2971.67 | 32.27 |
| WDBC-Int | DSDE | 2964.39 | 2990.20 | 2964.39 | 2968.28 | 9.45 |
| WDBC-Int | FSDE | 2964.41 | 2964.65 | 2964.46 | 2964.47 | 0.05 |
| WDBC-Int | The proposed VDEO | 2964.41 | 2964.49 | 2964.43 | 2964.43 | 0.02 |
| Dermatology | DE/rand/1/bin | 2760.83 | 2964.67 | 2864.28 | 2854.43 | 48.66 |
| Dermatology | DE/best/1/bin | 2049.92 | 2172.80 | 2099.10 | 2104.62 | 28.43 |
| Dermatology | DSDE | 1995.83 | 2017.63 | 2007.15 | 2006.13 | 6.45 |
| Dermatology | FSDE | 2398.88 | 2722.89 | 2619.30 | 2600.49 | 78.55 |
| Dermatology | The proposed VDEO | 1995.83 | 1995.83 | 1995.83 | 1995.83 | 0.00 |
| WDBC | DE/rand/1/bin | 149519.13 | 149881.22 | 149552.64 | 149577.83 | 81.82 |
| WDBC | DE/best/1/bin | 149473.88 | 152280.00 | 149473.94 | 149621.72 | 626.18 |
| WDBC | DSDE | 149473.86 | 149473.93 | 149473.86 | 149473.87 | 0.02 |
| WDBC | FSDE | 149479.81 | 149555.75 | 149486.14 | 149490.61 | 16.88 |
| WDBC | The proposed VDEO | 149473.86 | 149473.86 | 149473.86 | 149473.86 | 0.00 |
| Landsat | DE/rand/1/bin | 190329.69 | 231272.96 | 214775.55 | 214280.25 | 10694.11 |
| Landsat | DE/best/1/bin | 117269.37 | 165444.97 | 121387.57 | 137836.10 | 22526.19 |
| Landsat | DSDE | 106209.14 | 120436.04 | 116954.00 | 115179.22 | 4685.60 |
| Landsat | FSDE | 149897.82 | 183902.56 | 171863.49 | 169425.23 | 10704.02 |
| Landsat | The proposed VDEO | 98762.72 | 110352.64 | 104248.29 | 104574.39 | 3028.83 |
On the other hand, the DSDE algorithm significantly enhanced the performance of the DE algorithm by employing only the best strategy, coupled with its initialization and population sorting/dividing schemes. Finally, the proposed VDEO algorithm employed the switchable property with equal probability between the best and rand mutation schemes to inherit the advantages of both and to balance the exploitation and exploration processes. In addition, the restricted choice in generating the random solutions that form the mutant solution under each mutation scheme gave the proposed VDEO algorithm superiority over all the competing algorithms.
Similarly, the best, worst, median, mean, and standard deviation (Std) values of the average classification accuracies obtained by the competing algorithms are summarized in Table 4. The proposed VDEO framework obtained similar or better average classification accuracies than all other competing algorithms in 10 out of 15 datasets (i.e., the Iris, Haberman, Seeds, Balance, BTSCD, Heart, WDBC-Int, Dermatology, WDBC, and Landsat datasets), and provided very competitive results in the remaining 5 datasets. Furthermore, the proposed algorithm achieved an average enhancement of the classification accuracy of up to 11.98% over the competing algorithms on the Landsat dataset, which is considered a large-scale dataset.
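As a quick arithmetic check, the 11.98% figure follows directly from the Landsat mean-CA values in Table 4 (the proposed VDEO at 57.06% against the average of the four competitors):

```python
# Mean CA (%) on Landsat (Table 4): DE/rand/1/bin, DE/best/1/bin, DSDE, FSDE.
competitors = [41.47, 43.94, 51.31, 43.60]
vdeo = 57.06
print(round(vdeo - sum(competitors) / len(competitors), 2))  # -> 11.98
```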
Table 4: Average classification accuracy and standard deviation (Std) among the competing DE-based clustering algorithms for 50 runs on the 15 datasets.

| Dataset | Method | Best% | Worst% | Median% | Mean% | Std |
|---|---|---|---|---|---|---|
| Iris | DE/rand/1/bin | 94.67 | 88.67 | 90.00 | 89.43 | 1.47 |
| Iris | DE/best/1/bin | 90.00 | 54.00 | 90.00 | 85.23 | 11.88 |
| Iris | DSDE | 90.00 | 90.00 | 90.00 | 90.00 | 0.00 |
| Iris | FSDE | 90.00 | 90.00 | 90.00 | 90.00 | 0.00 |
| Iris | The proposed VDEO | 90.00 | 90.00 | 90.00 | 90.00 | 0.00 |
| Haberman | DE/rand/1/bin | 51.96 | 51.96 | 51.96 | 51.96 | 0.00 |
| Haberman | DE/best/1/bin | 51.96 | 51.96 | 51.96 | 51.96 | 0.00 |
| Haberman | DSDE | 51.96 | 51.96 | 51.96 | 51.96 | 0.00 |
| Haberman | FSDE | 51.96 | 51.96 | 51.96 | 51.96 | 0.00 |
| Haberman | The proposed VDEO | 51.96 | 51.96 | 51.96 | 51.96 | 0.00 |
| New Thyroid | DE/rand/1/bin | 67.91 | 60.00 | 65.58 | 64.74 | 2.29 |
| New Thyroid | DE/best/1/bin | 65.58 | 50.70 | 65.58 | 63.79 | 3.94 |
| New Thyroid | DSDE | 65.58 | 58.14 | 62.79 | 62.14 | 1.71 |
| New Thyroid | FSDE | 65.58 | 58.14 | 65.58 | 63.44 | 2.93 |
| New Thyroid | The proposed VDEO | 65.58 | 59.30 | 63.12 | 62.58 | 1.34 |
| Seeds | DE/rand/1/bin | 90.95 | 86.19 | 89.52 | 88.93 | 1.19 |
| Seeds | DE/best/1/bin | 89.52 | 89.52 | 89.52 | 89.52 | 0.00 |
| Seeds | DSDE | 89.52 | 89.52 | 89.52 | 89.52 | 0.00 |
| Seeds | FSDE | 89.52 | 89.52 | 89.52 | 89.52 | 0.00 |
| Seeds | The proposed VDEO | 89.52 | 89.52 | 89.52 | 89.52 | 0.00 |
| Lung Cancer | DE/rand/1/bin | 59.26 | 37.04 | 46.30 | 45.56 | 8.51 |
| Lung Cancer | DE/best/1/bin | 62.96 | 37.04 | 44.44 | 46.30 | 8.37 |
| Lung Cancer | DSDE | 70.37 | 51.85 | 62.96 | 61.67 | 5.80 |
| Lung Cancer | FSDE | 59.26 | 37.04 | 46.30 | 44.63 | 7.36 |
| Lung Cancer | The proposed VDEO | 66.67 | 48.15 | 59.26 | 58.52 | 5.71 |
| Glass | DE/rand/1/bin | 51.87 | 48.13 | 49.53 | 49.86 | 1.07 |
| Glass | DE/best/1/bin | 54.21 | 50.47 | 51.17 | 51.50 | 0.97 |
| Glass | DSDE | 53.27 | 37.85 | 48.60 | 48.41 | 3.44 |
| Glass | FSDE | 54.21 | 47.20 | 51.87 | 51.82 | 1.62 |
| Glass | The proposed VDEO | 55.61 | 46.73 | 51.87 | 51.33 | 1.10 |
| Wine | DE/rand/1/bin | 71.91 | 70.79 | 70.79 | 71.12 | 0.46 |
| Wine | DE/best/1/bin | 71.91 | 70.79 | 71.35 | 71.46 | 0.35 |
| Wine | DSDE | 71.91 | 71.35 | 71.91 | 71.80 | 0.23 |
| Wine | FSDE | 71.91 | 70.79 | 71.35 | 71.43 | 0.38 |
| Wine | The proposed VDEO | 71.91 | 70.79 | 71.35 | 71.35 | 0.45 |
| Balance | DE/rand/1/bin | 57.76 | 44.00 | 50.64 | 50.30 | 4.22 |
| Balance | DE/best/1/bin | 56.16 | 49.44 | 52.64 | 52.45 | 1.71 |
| Balance | DSDE | 53.92 | 50.08 | 52.64 | 52.75 | 0.94 |
| Balance | FSDE | 55.36 | 47.20 | 52.48 | 52.09 | 2.10 |
| Balance | The proposed VDEO | 58.40 | 47.20 | 53.28 | 53.64 | 2.62 |
| Vowel | DE/rand/1/bin | 61.54 | 45.24 | 51.72 | 53.15 | 4.33 |
| Vowel | DE/best/1/bin | 57.98 | 47.30 | 49.37 | 52.20 | 4.59 |
| Vowel | DSDE | 57.98 | 47.76 | 57.06 | 54.63 | 4.27 |
| Vowel | FSDE | 59.82 | 47.53 | 52.18 | 53.19 | 4.02 |
| Vowel | The proposed VDEO | 59.01 | 47.65 | 49.20 | 52.22 | 4.55 |
| BTSCD | DE/rand/1/bin | 65.11 | 65.11 | 65.11 | 65.11 | 0.00 |
| BTSCD | DE/best/1/bin | 65.11 | 65.11 | 65.11 | 65.11 | 0.00 |
| BTSCD | DSDE | 65.11 | 65.11 | 65.11 | 65.11 | 0.00 |
| BTSCD | FSDE | 65.11 | 65.11 | 65.11 | 65.11 | 0.00 |
| BTSCD | The proposed VDEO | 65.11 | 65.11 | 65.11 | 65.11 | 0.00 |
| Heart | DE/rand/1/bin | 59.26 | 58.25 | 58.92 | 58.79 | 0.25 |
| Heart | DE/best/1/bin | 59.26 | 58.59 | 58.92 | 58.89 | 0.15 |
| Heart | DSDE | 58.92 | 58.59 | 58.75 | 58.75 | 0.17 |
| Heart | FSDE | 59.26 | 58.59 | 58.92 | 58.87 | 0.16 |
| Heart | The proposed VDEO | 58.92 | 58.92 | 58.92 | 58.92 | 0.00 |
| WDBC-Int | DE/rand/1/bin | 96.49 | 96.19 | 96.49 | 96.45 | 0.08 |
| WDBC-Int | DE/best/1/bin | 96.49 | 95.17 | 96.49 | 96.42 | 0.29 |
| WDBC-Int | DSDE | 96.49 | 96.34 | 96.49 | 96.46 | 0.05 |
| WDBC-Int | FSDE | 96.49 | 96.49 | 96.49 | 96.49 | 0.00 |
| WDBC-Int | The proposed VDEO | 96.49 | 96.49 | 96.49 | 96.49 | 0.00 |
| Dermatology | DE/rand/1/bin | 32.12 | 25.70 | 28.77 | 29.01 | 1.62 |
| Dermatology | DE/best/1/bin | 30.73 | 26.82 | 27.51 | 28.07 | 1.25 |
| Dermatology | DSDE | 29.89 | 26.54 | 27.09 | 27.67 | 1.25 |
| Dermatology | FSDE | 31.01 | 24.58 | 28.63 | 28.39 | 1.93 |
| Dermatology | The proposed VDEO | 29.89 | 29.89 | 29.89 | 29.89 | 0.00 |
| WDBC | DE/rand/1/bin | 86.82 | 86.82 | 86.82 | 86.82 | 0.00 |
| WDBC | DE/best/1/bin | 86.82 | 86.82 | 86.82 | 86.82 | 0.00 |
| WDBC | DSDE | 86.82 | 86.82 | 86.82 | 86.82 | 0.00 |
| WDBC | FSDE | 86.82 | 86.82 | 86.82 | 86.82 | 0.00 |
| WDBC | The proposed VDEO | 86.82 | 86.82 | 86.82 | 86.82 | 0.00 |
| Landsat | DE/rand/1/bin | 51.55 | 31.35 | 40.28 | 41.47 | 7.33 |
| Landsat | DE/best/1/bin | 51.60 | 33.00 | 50.85 | 43.94 | 8.88 |
| Landsat | DSDE | 52.80 | 50.90 | 51.03 | 51.31 | 0.65 |
| Landsat | FSDE | 51.85 | 33.35 | 47.98 | 43.60 | 7.74 |
| Landsat | The proposed VDEO | 67.60 | 45.70 | 55.75 | 57.06 | 5.68 |
In addition to the aforementioned competitive results obtained by the proposed VDEO algorithm in terms of both average objective function and CA values, the convergence rate to the optimal solution is also improved. Fig. 3 illustrates the convergence performance of the competing algorithms through the maximum number of fitness function evaluations over the 50 runs on all test datasets. The objective function values of all competing algorithms are relatively high at the beginning compared with those of the DSDE algorithm. This is attributed to the initialization method (multistep sampling) employed by the DSDE algorithm, which enhances the quality of the initial solutions as a preprocessing step, whereas all other competing algorithms use random initialization. The DE/best/1/bin algorithm converged faster than both the DE/rand/1/bin and FSDE algorithms in all test datasets, regardless of the final objective function value. This fast convergence is due to its mutation scheme, which involves the best solution when producing the mutant solution. Unfortunately, the greediness of this mutation scheme sometimes leads to premature convergence to a local optimum. For instance, Fig. 3a shows the fast initial convergence of the DE/best/1/bin algorithm in contrast to the FSDE algorithm, yet the FSDE algorithm eventually found a solution with a better objective function value than the DE/best/1/bin algorithm. Moreover, the convergence performance of the DE/rand/1/bin algorithm is poor compared with all other algorithms due to the rand mutation scheme adopted in producing the mutant solution. Furthermore, the forced mutation scheme adopted by the FSDE algorithm improved the convergence of the canonical DE algorithm with the DE/rand/1/bin mutation scheme to some extent, but it remains inferior to the DE/best/1/bin, DSDE, and proposed VDEO algorithms.
4.2.1. Statistical analysis of the results
The Friedman Aligned-Ranks test was used to detect the significance of the proposed algorithm over the other competing algorithms. Table 5 displays the ranks computed through the Friedman Aligned-Ranks (FA) test and the adjusted p-values with the Holm post-hoc test (Holm APV) for the mean objective function values (Table 3). In this table, the algorithms are ordered from best to worst ranking, and significant results under the Holm procedure are bolded. Since the proposed VDEO algorithm obtained the best FA ranking, it was considered the control method. The p-values obtained by the Holm post-hoc test confirm the significant improvement of the proposed VDEO algorithm over the FSDE and DE/rand/1/bin algorithms (their p-values of 0.004819 and 0.000002 are ≤ 0.016667 and 0.0125, respectively).
Table 5: Average ranking obtained by Friedman Aligned-Ranks and Holm's test for objective function values using the 15 datasets of the competing algorithms.

| Algorithm | FA ranking | p-value | Holm APV |
|---|---|---|---|
| The proposed VDEO | 22.4 | - | - |
| DSDE | 25.6 | 0.687611 | 0.05 |
| DE/best/1/bin | 36.8667 | 0.069091 | 0.025 |
| FSDE | 44.8333 | **0.004819** | 0.016667 |
| DE/rand/1/bin | 60.3 | **0.000002** | 0.0125 |

Aligned Friedman statistic: 12.084555; p-value computed: 0.016733445627.
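For readers who wish to reproduce the analysis, the sketch below illustrates the two ingredients behind Tables 5 and 6: the aligned-rank averaging of the FA test and Holm's step-down comparisons. It is a simplified illustration (ties are not handled and the aligned-rank test statistic itself is omitted), not the code used in this study.

```python
import numpy as np

def friedman_aligned_ranking(results):
    # results: (n_datasets, n_algorithms) matrix of mean objective values.
    # Subtract each dataset's mean (alignment), rank all aligned values
    # jointly (1 = best for minimization), and average ranks per algorithm.
    aligned = results - results.mean(axis=1, keepdims=True)
    ranks = aligned.ravel().argsort().argsort() + 1  # no tie handling here
    return ranks.reshape(results.shape).mean(axis=0)

def holm_reject(p_values, alpha=0.05):
    # Holm's step-down procedure: compare the i-th smallest p-value
    # against alpha / (m - i) and stop at the first non-rejection.
    m = len(p_values)
    reject = np.zeros(m, dtype=bool)
    for step, idx in enumerate(np.argsort(p_values)):
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break
    return reject

# Unadjusted p-values from Table 5: DSDE, DE/best/1/bin, FSDE, DE/rand/1/bin.
print(holm_reject(np.array([0.687611, 0.069091, 0.004819, 0.000002])))
# -> [False False  True  True]: only FSDE and DE/rand/1/bin differ
#    significantly from the control method (the proposed VDEO).
```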
Similarly, the Friedman Aligned-Ranks analysis was conducted on the results provided in Table 4. The performance order of the competing algorithms in terms of average CA is: the proposed VDEO algorithm > DSDE > DE/best/1/bin > FSDE > DE/rand/1/bin. This order follows from the FA ranking and the p-values obtained by the Holm post-hoc test, as shown in Table 6. However, the results show no significant improvement of the proposed VDEO algorithm over the competing algorithms in terms of average CA (none of the competing algorithms has an unadjusted p-value ≤ its corresponding Holm APV value).
Table 6: Average ranking obtained by Friedman Aligned-Ranks and Holm's test for CA using the 15 datasets of the competing algorithms.

| Algorithm | FA ranking | p-value | Holm APV |
|---|---|---|---|
| The proposed VDEO | 28.8 | - | - |
| DSDE | 33.8333 | 0.52708 | 0.05 |
| DE/best/1/bin | 39.4 | 0.182875 | 0.025 |
| FSDE | 42.7 | 0.080703 | 0.016667 |
| DE/rand/1/bin | 45.2667 | 0.038533 | 0.0125 |

Aligned Friedman statistic: 12.586136; p-value computed: 0.013485454537.
4.3. Discussions
The combination of the best and rand mutation schemes with a switchable property enhanced the convergence and decreased the probability of premature convergence. The effect of this combination, along with the associated selection criteria of the random solutions that produce the mutant solution in each mutation scheme (exploitation and exploration), is evident in the proposed VDEO algorithm. More specifically, the proposed VDEO algorithm converged faster than all the competing algorithms in all test datasets despite the random initialization of the initial solutions, as shown in Fig. 3a-o. In other words, the first proposed mutation scheme subdivides the dataset into smaller subsets according to the data variance and employs the rand mutation scheme with random generation of solutions within each subset's bounds. This mechanism explains the steep initial drops in the convergence curves of the proposed VDEO algorithm in all datasets (Fig. 3), owing to the high probability of finding the best solution within these subsets' bounds (exploitation). The second proposed mutation scheme then employs the best mutation scheme with random generation of solutions within the bounds of the whole dataset (exploration). This mechanism adds diversity to the search in case the optimal solution is not located within the subsets' bounds. For instance, this explains why the proposed VDEO algorithm found better solutions (objective function values) at later iterations in the Lung Cancer, Glass, Dermatology, and Landsat datasets (Fig. 3e, f, m, and o, respectively). It is worth mentioning that the proposed variance-based multidimensional estimation of the mutation factor and the proposed optional crossover strategy also contribute to finding better solutions faster, especially in the aforementioned high-dimensional datasets.
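To make this mechanism concrete, the following is a minimal sketch of the switchable mutation step; it is not the authors' implementation. The equal-probability switch and the subset-versus-whole-dataset sampling bounds follow the description above, while the exact sampling of the random solutions and the normalization used in mutation_factor are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng()

def mutation_factor(data):
    # Illustrative per-dimension, variance-based mutation-factor vector
    # (the normalization is assumed; the paper defines its own estimate).
    var = data.var(axis=0)
    return var / var.max()

def switchable_mutation(best, F, data_low, data_high, subset_low, subset_high):
    if rng.random() < 0.5:  # equal probability between the two schemes
        # Exploitation: rand-style mutation from random solutions drawn
        # within the bounds of a variance-based subset.
        r1, r2, r3 = (rng.uniform(subset_low, subset_high) for _ in range(3))
        mutant = r1 + F * (r2 - r3)
    else:
        # Exploration: best-style mutation from random solutions drawn
        # within the bounds of the whole dataset.
        r1, r2 = (rng.uniform(data_low, data_high) for _ in range(2))
        mutant = best + F * (r1 - r2)
    return np.clip(mutant, data_low, data_high)  # keep the mutant in-bounds
```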
In general, the novelty of the proposed algorithm lies in combining multiple modifications to enhance the balance between the exploitation and exploration processes. These modifications are inseparable: the effect of each modification alone is barely noticeable in the overall performance, and each can act as a complementary part of another. Specifically, the initialization and preprocessing step addresses the difficulty of determining the initial population size along with the expensive function evaluations for each solution (part of the convergence speed). It also addresses part of the exploitation process by determining the bounds of each subset, and part of the exploration process by determining the mutation factor vector. In addition, the switchable mutation scheme addresses the remaining parts of the exploitation and exploration processes together by giving each scheme an equal probability of execution. Finally, the optional crossover strategy contributes to the convergence rate by checking the quality of the mutant solution against the trial one. Hence, these interleaved modifications should be considered as one package to achieve the best results. Another noteworthy limitation of the study is determining the value of the probability threshold. In this study, an equal-probability assumption was used to select between the two proposed mutation schemes. However, experiments showed that increasing or decreasing this threshold also affects the convergence rate on some datasets. Hence, finding the optimal value of this threshold is still under investigation.
According to the aforementioned analyses, it can be concluded that the performance of the proposed VDEO algorithm is significantly enhanced in contrast to the canonical DE algorithm, and that it outperforms the other state-of-the-art DE-based clustering algorithms in terms of objective function values (cluster compactness), classification accuracy, repeatability, and convergence speed.
Fig. 3. The convergence rate of the competing algorithms on the 15 datasets: (a) Iris, (b) Haberman, (c) New Thyroid, (d) Seeds, (e) Lung Cancer, (f) Glass, (g) Wine, (h) Balance, (i) Vowel, (j) BTSCD, (k) Heart, (l) WDBC-Int, (m) Dermatology, (n) WDBC, (o) Landsat.
5. Conclusion and Future Work

The main purpose of this research work is to balance the search behavior of the canonical DE algorithm and to improve its efficiency as a clustering tool. This purpose was fulfilled by proposing the VDEO algorithm, which includes four main modifications that address the limitations of the canonical DE. First, the VDEO algorithm adopts a single-based solution representation and a preprocessing step for the input data to be clustered. This phase reduces the limitations associated with multi-based solution techniques, such as the required number of initial solutions and the complexity of evaluating the fitness function for each of them. Moreover, the preprocessing of the input data (splitting it into subsets based on the highest-variance feature) is performed only once and serves as a preparation stage for the exploitation process. Second, the proposed VDEO algorithm employs two mutation schemes with a switchable mechanism to balance the exploitation and exploration processes during the search. Specifically, the first mutation scheme produces mutant solutions using randomly generated solutions within the bounds of the split subsets to reinforce exploitation, whereas the second mutation scheme produces mutant solutions using randomly generated solutions within the bounds of the whole dataset to preserve the diversity of the search (exploration). The third modification is a simple variance-based estimation method for the mutation factor that takes the data distribution in each dimension into consideration. Both the switchable mechanism and the mutation factor estimation contribute to providing higher-quality mutant solutions, which increases the probability of finding the optimal one. Lastly, an optional crossover strategy was presented, in which the necessity of the crossover process is determined according to the fitness of both the mutant and trial solutions. In addition, the crossover probability is varied through the iterations within a specified range to give the algorithm more stochastic characteristics and to increase the chance of finding better permutations of the trial solutions. By including these four modifications in one framework, the experimental results showed the prominent performance of the proposed VDEO algorithm against its peers, providing better clustering solutions in terms of classification accuracy, cluster compactness (sum of intra-cluster distances), repeatability, and convergence speed.
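As an illustration of the optional crossover, the sketch below adopts one plausible reading of the strategy rather than the paper's exact procedure: the mutant is used directly as the trial solution when it already improves on the target, binomial crossover is applied otherwise, and the crossover probability is resampled each iteration within an assumed range.

```python
import numpy as np

rng = np.random.default_rng()

def crossover_probability(cr_min=0.1, cr_max=0.9):
    # Varied across iterations within a specified range; the [0.1, 0.9]
    # range and the uniform resampling are assumptions for illustration.
    return rng.uniform(cr_min, cr_max)

def optional_crossover(target, mutant, fitness, cr):
    # Skip crossover when the mutant already improves on the target
    # solution (minimization); otherwise apply binomial crossover.
    if fitness(mutant) < fitness(target):
        return mutant
    mask = rng.random(target.size) < cr
    mask[rng.integers(target.size)] = True  # keep at least one mutant gene
    return np.where(mask, mutant, target)
```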
Prospective investigations could be conducted on optimizing the control parameters of the DE algorithm, such as the mutation factor and the crossover probability. Many research works in the literature introduce a controlled amount of randomness; it is certainly interesting to investigate the usefulness of increasing or decreasing the degree of randomization and to identify suitable methods for doing so. More specifically, the amount of randomness could be linked to features of the clustering problem, or some form of correlation among the decision variables could be sought. In addition, the relation between the input data characteristics and the probability threshold can be a potential avenue of future research.
References

[1] T.D. Rajkumar, S.P. Raja, A. Suruliandi, Users' Click and Bookmark Based Personalization Using Modified Agglomerative Clustering for Web Search Engine, International Journal on Artificial Intelligence Tools, 26 (2017) 1-16.
[2] L.M. Torres, E. Magana, D. Morato, S. Garcia-Jimenez, M. Izal, TBDClust: Time-based density clustering to enable free browsing of sites in pay-per-use mobile Internet providers, J. Netw. Comput. Appl., 99 (2017) 17-27.
[3] G. Saisai, H. Wei, L. Haoxuan, Q. Yuzhong, Property Clustering in Linked Data: An Empirical Study and Its Application to Entity Browsing, International Journal on Semantic Web and Information Systems (IJSWIS), 14 (2018) 31-70.
[4] C.H. Chou, S.C. Hsieh, C.J. Qiu, Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction, Appl. Soft. Comput., 56 (2017) 298-316.
[5] V. Holy, O. Sokol, M. Cerny, Clustering retail products based on customer behaviour, Appl. Soft. Comput., 60 (2017) 752-762.
[6] E. Kurum, G.W. Weber, C. Iyigun, Early warning on stock market bubbles via methods of optimization, clustering and inverse problems, Ann. Oper. Res., 260 (2018) 293-320.
[7] A.B.U. Najera, J. de la Calleja, M.A. Medina, Associating students and teachers for tutoring in higher education using clustering and data mining, Comput. Appl. Eng. Educ., 25 (2017) 823-832.
[8] A.M. Navarro, P. Moreno-Ger, Comparison of Clustering Algorithms for Learning Analytics with Educational Datasets, International Journal of Interactive Multimedia and Artificial Intelligence, 5 (2018) 9-16.
[9] J. de Andrade Silva, E.R. Hruschka, J. Gama, An evolutionary algorithm for clustering data streams with a variable number of clusters, Expert Systems with Applications, 67 (2017) 228-238.
[10] R. Hyde, P. Angelov, A.R. MacKenzie, Fully online clustering of evolving data streams into arbitrarily shaped clusters, Information Sciences, 382-383 (2017) 96-114.
[11] J.M.V. Kinani, A.J.R. Silva, F.G. Funes, D.M. Vargas, E.R. Diaz, A. Arellano, Medical Imaging Lesion Detection Based on Unified Gravitational Fuzzy Clustering, J. Healthc. Eng., 2017 (2017) 14 pages.
[12] N.D. Thanh, M. Ali, L.H. Son, A Novel Clustering Algorithm in a Neutrosophic Recommender System for Medical Diagnosis, Cogn. Comput., 9 (2017) 526-544.
[13] L.D. Wang, X.G. Zhou, Y. Xing, M.K. Yang, C. Zhang, Clustering ECG heartbeat using improved semi-supervised affinity propagation, IET Softw., 11 (2017) 207-213.
[14] S. Saraswathi, M.I. Sheela, A comparative study of various clustering algorithms in data mining, International Journal of Computer Science and Mobile Computing, 11 (2014) 422-428.
[15] J.A. Hartigan, M.A. Wong, Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society, Series C (Applied Statistics), 28 (1979) 100-108.
[16] M.E. Celebi, H.A. Kingravi, P.A. Vela, A comparative study of efficient initialization methods for the k-means clustering algorithm, Expert Systems with Applications, 40 (2013) 200-210.
[17] J. Han, J. Pei, M. Kamber, Data Mining: Concepts and Techniques, Elsevier, 2011.
[18] A. Moreira, M.Y. Santos, S. Carneiro, Density-based clustering algorithms - DBSCAN and SNN, University of Minho, Portugal, (2005).
[19] S.J. Nanda, G. Panda, A survey on nature inspired metaheuristic algorithms for partitional clustering, Swarm and Evolutionary Computation, 16 (2014) 1-18.
[20] R. Ayachi, H. Bouhani, N.B. Amor, An Evolutionary Approach for Learning Opponent's Deadline and Reserve Points in Multi-Issue Negotiation, International Journal of Interactive Multimedia and Artificial Intelligence, 5 (2018) 131-140.
[21] A.K. Kar, Bio inspired computing - A review of algorithms and scope of applications, Expert Systems with Applications, 59 (2016) 20-32.
[22] M. Mavrovouniotis, C. Li, S. Yang, A survey of swarm intelligence for dynamic optimization: Algorithms and applications, Swarm and Evolutionary Computation, 33 (2017) 1-17.
[23] I. Boussaïd, J. Lepagnot, P. Siarry, A survey on optimization metaheuristics, Information Sciences, 237 (2013) 82-117.
[24] R. Storn, K. Price, Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, Journal of Global Optimization, 11 (1997) 341-359.
[25] J. Vesterstrom, R. Thomsen, A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems, in: Proceedings of the Congress on Evolutionary Computation, Portland, OR, USA, 2004, pp. 1980-1987.
[26] K. Price, R.M. Storn, J.A. Lampinen, Differential Evolution: A Practical Approach to Global Optimization, Springer-Verlag, Berlin Heidelberg, 2006.
[27] N. Noman, H. Iba, Accelerating Differential Evolution Using an Adaptive Local Search, IEEE Transactions on Evolutionary Computation, 12 (2008) 107-125.
[28] R. Knobloch, J. Mlýnek, R. Srb, The classic differential evolution algorithm and its convergence properties, Applications of Mathematics, 62 (2017) 197-208.
[29] S. Das, A. Abraham, U.K. Chakraborty, A. Konar, Differential Evolution Using a Neighborhood-Based Mutation Operator, IEEE Transactions on Evolutionary Computation, 13 (2009) 526-553.
[30] F. Neri, V. Tirronen, Recent advances in differential evolution: a survey and experimental analysis, Artificial Intelligence Review, 33 (2010) 61-106.
[31] S. Das, P.N. Suganthan, Differential Evolution: A Survey of the State-of-the-Art, IEEE Transactions on Evolutionary Computation, 15 (2011) 4-31.
[32] S. Das, S.S. Mullick, P.N. Suganthan, Recent advances in differential evolution - An updated survey, Swarm and Evolutionary Computation, 27 (2016) 1-30.
[33] Y.J. Gong, Y. Zhou, Differential Evolutionary Superpixel Segmentation, IEEE Transactions on Image Processing, 27 (2018) 1390-1404.
[34] M.Z. Ali, N.H. Awad, P.N. Suganthan, R.G. Reynolds, An Adaptive Multipopulation Differential Evolution With Dynamic Population Reduction, IEEE Transactions on Cybernetics, 47 (2017) 2768-2779.
[35] U.M. Nunes, D.R. Faria, P. Peixoto, A human activity recognition framework using max-min features and key poses with differential evolution random forests classifier, Pattern Recognition Letters, 99 (2017) 21-31.
[36] A. Majed, Z. Salam, A.M. Amjad, Harmonics elimination PWM based direct control for 23-level multilevel distribution STATCOM using differential evolution algorithm, Electric Power Systems Research, 152 (2017) 48-60.
[37] D. Teijeiro, X.C. Pardo, D.R. Penas, P. González, J.R. Banga, R. Doallo, A cloud-based enhanced differential evolution algorithm for parameter estimation problems in computational systems biology, Cluster Computing, 20 (2017) 1937-1950.
[38] L. Jebaraj, C. Venkatesan, I. Soubache, C.C.A. Rajan, Application of differential evolution algorithm in static and dynamic economic or emission dispatch problem: A review, Renewable and Sustainable Energy Reviews, 77 (2017) 1206-1220.
[39] S. Das, A. Abraham, A. Konar, Automatic Clustering Using an Improved Differential Evolution Algorithm, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 38 (2008) 218-237.
[40] K. Suresh, D. Kundu, S. Ghosh, S. Das, A. Abraham, S.Y. Han, Multi-objective differential evolution for automatic clustering with application to micro-array data analysis, Sensors (Basel, Switzerland), 9 (2009) 3981-4004.
[41] G. Martinović, D. Bajer, Data Clustering with Differential Evolution Incorporating Macromutations, in: B.K. Panigrahi, P.N. Suganthan, S. Das, S.S. Dash (Eds.), Swarm, Evolutionary, and Memetic Computing: 4th International Conference, SEMCCO 2013, Chennai, India, December 19-21, 2013, Proceedings, Part I, Springer International Publishing, Cham, 2013, pp. 158-169.
[42] M. Hosseini, M. Sadeghzade, R. Nourmandi-Pour, An efficient approach based on differential evolution algorithm for data clustering, Decision Science Letters, 3 (2014) 319-324.
[43] M.B. Bonab, S.Z. Hashim, N.E.N. Bazin, A.K.Z. Alsaedi, An Effective Hybrid of Bees Algorithm and Differential Evolution Algorithm in Data Clustering, Mathematical Problems in Engineering, 2015 (2015) 17 pages.
[44] J. Tvrdík, I. Křivý, Hybrid differential evolution algorithm for optimal clustering, Appl. Soft. Comput., 35 (2015) 502-512.
[45] W.-l. Xiang, N. Zhu, S.-f. Ma, X.-l. Meng, M.-q. An, A dynamic shuffled differential evolution algorithm for data clustering, Neurocomputing, 158 (2015) 144-154.
[46] M. Ramadas, A. Abraham, S. Kumar, FSDE - Forced Strategy Differential Evolution used for data clustering, Journal of King Saud University - Computer and Information Sciences, (2016), in press.
[47] K.Y. Kok, P. Rajendran, Differential-evolution control parameter optimization for unmanned aerial vehicle path planning, PLoS ONE, 11 (2016) e0150558.
[48] E.H. Ruspini, Numerical methods for fuzzy clustering, Information Sciences, 2 (1970) 319-350.
[49] M. Lichman, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, 2013.
[50] B. Jiang, N. Wang, L. Wang, Particle swarm optimization with age-group topology for multimodal functions and data clustering, Communications in Nonlinear Science and Numerical Simulation, 18 (2013) 3134-3145.
[51] J. Hodges, E.L. Lehmann, Rank methods for combination of independent experiments in analysis of variance, The Annals of Mathematical Statistics, 33 (1962) 482-497.
[52] S. Holm, A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics, 6 (1979) 65-70.