Expert Systems with Applications 41 (2014) 5960–5971
An interval weighed fuzzy c-means clustering by genetically guided alternating optimization

Liyong Zhang a,*, Witold Pedrycz b, Wei Lu a, Xiaodong Liu a, Li Zhang c

a School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China
b Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6R 2V4 AB, Canada
c School of Information, Liaoning University, Shenyang 110036, China

* Corresponding author.
Keywords: Fuzzy clustering; Attribute weighting; Interval number; Genetic algorithm; Alternating optimization
Abstract

The fuzzy c-means (FCM) algorithm is a widely applied clustering technique, but its implicit assumption that each attribute of the object data is equally important affects the clustering performance. Attribute weighted fuzzy clustering has become a very active area of research, and numerous approaches that develop numerical weights have been combined with fuzzy clustering. In this paper, interval numbers are introduced for attribute weighting in weighted fuzzy c-means (WFCM) clustering, and it is shown, from the viewpoint of geometric probability, that interval weighting makes it easier to obtain appropriate weights. Moreover, a genetic heuristic strategy for attribute weight searching is proposed to guide the alternating optimization (AO) of WFCM, so that improved attribute weights within interval-constrained ranges and a reasonable data partition can be obtained simultaneously. The experimental results demonstrate that the proposed algorithm is superior in clustering performance. They reveal that interval weighted clustering can act as an optimization operator on top of traditional numerical weighted clustering, and that the effect of interval weight perturbation on clustering performance is reduced.

© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

Clustering partitions a data set into groups so that similar objects fall within the same cluster and dissimilar objects fall in different clusters. Fuzzy clustering introduces the concept of membership into data partition: the membership indicates the degree to which an object belongs to each cluster and thus represents the data partition more faithfully. The fuzzy c-means (FCM) algorithm, one of the basic unsupervised clustering approaches, was proposed by Dunn and developed by Bezdek et al., and many valuable results have been achieved on aspects such as the distance metric (Hathaway, Bezdek, & Hu, 2000; Zhang et al., 2012), optimization calculation (Cura, 2012; Hruschka et al., 2009; Izakian & Abraham, 2011; Sheng et al., 2005), computational efficiency (Fan, Zhen, & Xie, 2004), parameter selection (Yu, Cheng, & Huang, 2004), and clustering validity (Baskir & Turksen, 2013; Sheng et al., 2005). In particular, Pimentel and Souza (2013) proposed a multivariate fuzzy c-means algorithm by introducing membership degrees that differ from one attribute to another, which fuses the information inherent in each attribute into the FCM algorithm.

However, due to differences in the natural properties and structural characteristics of different data sets, clustering remains a difficult and challenging problem. To partition a set of object data {x1, x2, . . . , xn} ⊂ R^s into clusters, the unsupervised FCM algorithm treats each attribute j (1 ≤ j ≤ s) as equally important and minimizes the clustering objective function using the common Euclidean distance between objects and cluster prototypes, which can degrade the clustering performance. As an improvement, attribute weighted fuzzy c-means (WFCM) algorithms introduce a weighted Euclidean distance to emphasize the contribution of important attributes, and may help partition the data into more meaningful clusters (Frigui & Nasraoui, 2004). In WFCM, each attribute is assigned a value representing its importance in clustering, and the clustering performance can be significantly degraded if irrelevant attribute weights are used. For example, Wang, Wang, and Wang (2004) assigned the weights (0, 0, 1, 1) and (1, 1, 0, 0) to the four attributes of the IRIS data and presented the clustering in these two extreme cases, demonstrating the effect of weight assignment on clustering performance. Finding a reasonable attribute weighting approach for weighted fuzzy clustering is therefore an important issue.
With continuing research on weighted fuzzy clustering algorithms, many attribute weighting approaches have been incorporated into clustering. ReliefF is considered a successful algorithm for assessing the quality of attributes thanks to its simplicity and effectiveness (Sun, 2007), and Li, Gao, and Jiao (2005) used the ReliefF algorithm to assign weights to the attributes in a weighted fuzzy clustering algorithm. Combining attribute weighting by ReliefF with a weight assigned to each object datum based on the probability density of the object data, Bao, Han, and Wu (2006) proposed a general weighted fuzzy clustering algorithm. Attribute weight learning based on an evaluation index is another important method of generating weights: Yeung and Wang (2002) and Wang et al. (2004) gave each attribute a weight by minimizing an attribute evaluation index E(w) through a gradient descent technique and then applied the weights to improve the clustering performance; this approach was also adopted in the literature, for example by Chang, Fan, and Wang (2008), Chang, Fan, and Dzan (2010) and Fan et al. (2011). Further, Wang et al. (2006) introduced another attribute evaluation index, CFuzziness(w), and showed that it performs slightly better than E(w) in terms of the clustering results. Besides, Frigui and Nasraoui (2004) proposed an approach that performs attribute weighting and clustering analysis simultaneously in an unsupervised manner, assigning different relevance weights to the different categories of a data set. Li et al. (2007) and Shen et al. (2006) obtained the attribute weights and clustering results simultaneously by minimizing the weighted FCM clustering objective function and the weighted fuzzy kernel-clustering objective function, respectively. Pimentel and Souza (2014) proposed a weighted multivariate fuzzy c-means method for interval data, in which the weights of the memberships regarding cluster prototypes and attributes are determined by minimizing the corresponding clustering objective function under its constraints. With the development of rough set theory, attribute weighting by rough sets was also brought into fuzzy clustering, which is especially suitable for cases in which the attributes need to be largely reduced (Han, Gao, & Ji, 2006; Li & Gao, 2009). Considering that attribute weights should become larger when the classification structure is clearer and smaller when it is more unclear, Mika (2007) and Mika (2008) proposed a membership-based attribute weighting approach, in which the initial membership can be obtained by standard FCM.

The above approaches obtain attribute weights by developing mathematical models based on specific criteria and information content, and belong to the objective approaches. The subjective approaches, on the other hand, select weights based on the preference information, experience and knowledge of decision-makers. The analytic hierarchy process (AHP) proposed by Saaty is one of the most influential subjective approaches to weighting attributes; it realizes system analysis both qualitatively and quantitatively, and has been widely used in multi-attribute decision making problems. Combined with weights determined by AHP, Chen et al. (2009) presented an investigation of spatial fuzzy clustering, and Zhang et al. (2007) proposed a weighted fuzzy c-means algorithm, which was also applied to customer classification. Besides, Bandeira, Sousa, and Kaymak (2003) proposed to weight the attributes based on expert knowledge and trial-and-error experiments, and then added the weights to fuzzy clustering techniques.

In the literature on weighted fuzzy clustering mentioned above, all the attribute weighting approaches incorporated into clustering develop numerical weights. Different objective approaches lead to different weights based on different principles, and for the subjective approaches, attribute weights certainly differ across subjective judgments. It is therefore hard to say which numerical weights, obtained by the different approaches, are more appropriate in weighted clustering. Meanwhile, in the field of decision analysis, due to the complexity and uncertainty involved in real-world decision problems, the concept of interval weight has been proposed and has received growing attention (Chen & Miao, 2011; Entani & Tanaka, 2007; Wang & Elhag, 2007; Wang, Yang, & Xu, 2005). In this paper, interval attribute weights are incorporated into weighted clustering. When solving the fuzzy clustering, the attribute weights are viewed as variables constrained by intervals, and a hybrid optimization strategy combining a genetic heuristic mechanism with gradient-based alternating iteration is proposed, which obtains attribute weights within the interval-constrained ranges and the data partition simultaneously.

The rest of the paper is organized as follows. Section 2 presents a short description of the numerical WFCM algorithm. Section 3 discusses the interval attribute weights. The novel interval weighted fuzzy c-means algorithm is introduced in Section 4, with a focus on solving for the attribute weights and data partition by genetically guided alternating optimization. Section 5 presents clustering results on several well-known data sets and a comparative study with numerical WFCM algorithms. Finally, conclusions are drawn in Section 6.

2. Preliminary knowledge: numerical weighted fuzzy c-means algorithm

The numerical weighted fuzzy c-means (WFCM) algorithm partitions a set of object data {x1, x2, . . . , xn} ⊂ R^s into c (fuzzy) clusters. For a given attribute weight vector w = [w1, w2, . . . , ws]^T with w_j > 0 for all j, and usually Σ_{j=1}^{s} w_j = 1, the WFCM algorithm minimizes the objective function (Wang et al., 2004)
$$J(U,V)=\sum_{i=1}^{c}\sum_{k=1}^{n}u_{ik}^{m}\,\|x_k-v_i\|_w^{2} \qquad (1)$$

subject to the constraint

$$\sum_{i=1}^{c}u_{ik}=1,\qquad k=1,2,\ldots,n, \qquad (2)$$
where x_k = [x_{1k}, x_{2k}, . . . , x_{sk}]^T is an object datum and x_{jk} is the jth attribute value of x_k; v_i ∈ R^s is the ith cluster prototype, and V = [v_{ji}] = [v_1, v_2, . . . , v_c] ∈ R^{s×c} is the matrix of cluster prototypes; u_{ik} ∈ [0, 1] is the membership representing the degree to which x_k belongs to the ith cluster, and U = [u_{ik}] ∈ R^{c×n} is the partition matrix; m ∈ (1, ∞) is a fuzzification parameter; ‖·‖_w denotes the weighted Euclidean distance between x_k and v_i, defined as
$$\|x_k-v_i\|_w=\left[(x_k-v_i)^{T}W^{T}W(x_k-v_i)\right]^{1/2}, \qquad (3)$$
where W = diag(w_1, w_2, . . . , w_s) is a diagonal matrix.

Using the Lagrange multiplier method, let the Lagrange function be

$$J_a(U,V)=\sum_{i=1}^{c}\sum_{k=1}^{n}u_{ik}^{m}\,\|x_k-v_i\|_w^{2}+\sum_{k=1}^{n}\lambda_k\left(\sum_{i=1}^{c}u_{ik}-1\right), \qquad (4)$$
where λ = [λ_1, λ_2, . . . , λ_n]^T is the vector of Lagrange multipliers. The necessary conditions for minimizing (1) under the constraint (2) are the following update equations (Wang et al., 2004):
$$v_i=\frac{\sum_{k=1}^{n}u_{ik}^{m}x_k}{\sum_{k=1}^{n}u_{ik}^{m}},\qquad i=1,2,\ldots,c, \qquad (5)$$

and

$$u_{ik}=\left[\sum_{t=1}^{c}\left(\frac{\|x_k-v_i\|_w^{2}}{\|x_k-v_t\|_w^{2}}\right)^{\frac{1}{m-1}}\right]^{-1},\qquad i=1,2,\ldots,c,\;\; k=1,2,\ldots,n. \qquad (6)$$
When using the WFCM algorithm to partition an object data set, the numerical weights must first be determined during initialization by a subjective or objective weighting approach. The procedure of WFCM is as follows.

Step (1) Determine the numerical weight vector w by an objective or subjective weighting approach.
Step (2) Choose m, c and ε, where ε > 0 is a small positive constant; initialize the partition matrix U^(0).
Step (3) At iteration l (l = 1, 2, . . .), calculate the matrix of cluster prototypes V^(l) using (5) and U^(l−1).
Step (4) Update the partition matrix U^(l) using (6), V^(l) and w.
Step (5) If max_{i,k} |u_{ik}^(l) − u_{ik}^(l−1)| < ε, stop and output the partition matrix U and the matrix of cluster prototypes V; otherwise set l = l + 1 and return to Step (3).

When all the attribute weights in vector w are equal, the WFCM algorithm reduces to standard FCM.
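To make the alternating optimization concrete, the following minimal NumPy sketch (ours, not from the paper; the function name `wfcm` and all defaults are illustrative) implements Steps (2)–(5) above together with the weighted distance (3):

```python
import numpy as np

def wfcm(X, w, c, m=2.0, eps=1e-6, max_iter=300, seed=None):
    """Numerical WFCM by alternating optimization of (5) and (6).

    X: (n, s) data matrix; w: (s,) attribute weights; c: number of clusters.
    Returns the partition matrix U (c, n) and cluster prototypes V (c, s).
    """
    rng = np.random.default_rng(seed)
    n, s = X.shape
    # Step (2): random initial partition matrix whose columns sum to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Step (3): prototype update, Eq. (5).
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Weighted squared distances ||x_k - v_i||_w^2, Eq. (3): W^T W = diag(w_j^2).
        diff = X[None, :, :] - V[:, None, :]              # shape (c, n, s)
        d2 = np.einsum('cns,s->cn', diff ** 2, w ** 2)
        d2 = np.fmax(d2, 1e-12)                           # guard against zero distance
        # Step (4): membership update, Eq. (6).
        U_new = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
        # Step (5): stop once the memberships change by less than eps.
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return U, V
```

With equal weights, w = (1/s, . . . , 1/s), the routine behaves as standard FCM, consistent with the preceding remark.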
3. Interval attribute weights

3.1. Superiority of interval weights

From the viewpoint of cognition, attribute weighting is relative and fuzzy, and this kind of fuzziness can be described by interval numbers; it is relatively easy to provide a fuzzy interval judgment, which can be illustrated by Fig. 1(a). In Fig. 1(a), the blue unit cube W denotes the weight space. (i) Let the square shaded region W* represent the set of appropriate weights. If the numerical weights determined by an objective or subjective approach are represented by a vector w, denoted by a point in Fig. 1(a), then for w to be reasonable, w ∈ W* must be satisfied. (ii) Now consider the case in which the attribute weights are determined to be intervals; the black cuboid W′ in Fig. 1(a) then denotes the region bounded by the interval weights. If appropriate numerical weights exist in W′, then, when interval weights are incorporated into weighted fuzzy clustering, some of these appropriate numerical weights are likely to be found provided proper search approaches are employed, and the clustering performance can thus be improved. The necessary condition for W′ to be rational is therefore that the intersection of W′ and W* is not empty, that is, W′ ∩ W* ≠ ∅.

Let the length, width and height of W′ and W* be a′, b′, c′ and a*, b*, c* respectively. If no particular approach is used and the numerical weights and interval weights are determined purely at random, the probability of the random event w ∈ W* follows readily from geometric probability:

$$P(w\in W^{*})=\frac{V(W^{*})}{V(W)}=a^{*}b^{*}c^{*}, \qquad (7)$$
Fig. 1. Interval weights and appropriate weights with three-dimensional attributes as a schematic (panels (a)–(c); axes w1, w2, w3).
where V(W*) and V(W) are the volumes of W* and W respectively. Furthermore, suppose that W′ and W* are sufficiently small relative to W, so that no matter how W′ and W* intersect, W′ ⊂ W is always satisfied. In Fig. 1(b), the edge lengths of the red cuboid W1 are 1 − a′, 1 − b′ and 1 − c′ respectively, and point A is a vertex of W′; it can be seen that W′ ⊂ W holds if and only if A ∈ W1. In Fig. 1(c), the edge lengths of the red cuboid W2 are a* + a′, b* + b′ and c* + c′ respectively, and point C is a vertex of W′; it can be seen that W′ ∩ W* ≠ ∅ holds if and only if C ∈ W2. So the probability of the random event W′ ∩ W* ≠ ∅ is

$$P(W'\cap W^{*}\neq\emptyset)=\frac{V(W_2)}{V(W_1)}=\frac{(a^{*}+a')(b^{*}+b')(c^{*}+c')}{(1-a')(1-b')(1-c')}, \qquad (8)$$

where V(W1) and V(W2) are the volumes of W1 and W2 respectively. It can be seen that P(W′ ∩ W* ≠ ∅) ≥ P(w ∈ W*), with equality if and only if a′ = b′ = c′ = 0, that is, when the interval weights degenerate into numerical ones. This explains why it is easier to obtain appropriate weights when interval numbers, which are a more suitable description of attribute weights, are adopted to weight the attributes.
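As a quick numeric check of (7) and (8) — with purely hypothetical box edge lengths chosen only for illustration:

```python
# Hypothetical edge lengths inside the unit cube W: W* (appropriate set)
# and W' (interval-weight box); all values are illustrative only.
a_s, b_s, c_s = 0.10, 0.10, 0.10   # edges of W*
a_p, b_p, c_p = 0.05, 0.05, 0.05   # edges of W'

p_point = a_s * b_s * c_s                                  # Eq. (7): P(w in W*)
p_box = ((a_s + a_p) * (b_s + b_p) * (c_s + c_p)
         / ((1 - a_p) * (1 - b_p) * (1 - c_p)))            # Eq. (8)

print(p_point, p_box)    # 0.001 vs. ~0.0039: the interval event is ~4x likelier
assert p_box >= p_point  # equality only when a' = b' = c' = 0
```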
3.2. Interval weight determination

In this paper, the analytic hierarchy process (AHP) is used to determine the interval attribute weights. AHP, proposed by Saaty, is a mathematically grounded technique for analyzing complex situations; since its introduction, it has received considerable attention and wide application in a variety of areas (Monti & Carenini, 2000; Wang, Elhag, & Hua, 2006). The traditional AHP is based on eliciting from decision-makers pairwise judgments of the importance of different attributes, and in most situations these pairwise comparisons contain subjective preferences that affect the evaluation results. Fortunately, many objective weighting approaches can supply prior knowledge for the pairwise comparisons, providing a reference for decision-makers when setting the pairwise comparison matrix. Moreover, the pairwise comparisons can be obtained by applying axiomatic fuzzy set (AFS) theory, converting the data information into fuzzy sets and their logic operations, which reduces the effect of subjective preferences on the evaluation results (Tao, Chen, Liu, et al., 2012). Here, the results of AHP are used directly as the interval centers. The procedure of AHP is as follows (Saaty, 2003).

Step (1) Pairwise compare the s attributes using the 1–9 scale, and construct an s × s numerical judgment matrix A.
Step (2) Calculate the largest eigenvalue λmax(A) of A, and check the consistency ratio (C.R.) of A:

$$C.R.=\frac{\lambda_{\max}(A)-s}{(s-1)\,R.I.}, \qquad (9)$$

where R.I. is the random index, obtained from the standard lookup table according to the number of attributes. If C.R. < 0.05, the consistency level of A is acceptable. The consistency ratio provides a measure of the consistency of the judgments, which is one of the notable features of AHP.
Step (3) Calculate the principal right eigenvector w of A, which satisfies Aw = λmax(A)w; the interval centers are obtained by normalizing w.
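The following sketch (illustrative, not from the paper; it assumes NumPy's eigendecomposition, Saaty's standard R.I. lookup values, and a judgment matrix that is acceptably consistent, i.e. C.R. < 0.05) implements Steps (1)–(3) together with the interval-range rules stated in the next paragraph:

```python
import numpy as np

# Saaty's random index R.I. for matrix orders 1..9 (standard lookup table).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_interval_weights(A):
    """Interval centers from the principal eigenvector of judgment matrix A,
    consistency ratio by (9), and interval endpoints by Rules 1-3 below."""
    s = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)
    lam_max = eigvals.real[k]
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                                 # normalized interval centers
    cr = (lam_max - s) / ((s - 1) * RI[s])       # Eq. (9); assume cr < 0.05
    # Interval ranges scaled by the uncertainty level (Rules 1-3).
    if cr < 0.01:
        r = 0.10
    elif cr < 0.03:
        r = 10.0 * cr
    else:
        r = 0.30                                 # covers 0.03 <= C.R. < 0.05
    a = w * (1 - r)
    b = np.minimum(w * (1 + r), 1.0)
    return w, cr, a, b
```

Applied, for example, to the IRIS judgment matrix (22) of Section 5.1, this should reproduce C.R. ≈ 0.0103 and the interval weights reported there, up to rounding.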
As for the interval ranges, the uncertainty level is utilized, scaled by the consistency ratio (C.R.) of the pairwise comparison of attributes. If the value of C.R. obtained by (9) is small, that is, the uncertainty level of the pairwise comparisons is low, then the reliability of the interval centers is high and the ranges extended around them as benchmarks should be limited. Conversely, if the obtained C.R. is large, the extended interval ranges should be wider. Let w = [w1, w2, . . . , ws]^T be the interval center vector, and let a_j and b_j be the left and right endpoints of the jth interval weight for j = 1, 2, . . . , s. The interval ranges are determined as follows:

Rule 1: IF C.R. < 0.01, THEN a_j = w_j × (1 − 10%) and b_j = min{w_j × (1 + 10%), 1};
Rule 2: IF 0.01 ≤ C.R. < 0.03, THEN a_j = w_j × (1 − 10 C.R.) and b_j = min{w_j × (1 + 10 C.R.), 1};
Rule 3: IF 0.03 ≤ C.R. < 0.05, THEN a_j = w_j × (1 − 30%) and b_j = min{w_j × (1 + 30%), 1}.

As can be seen from the above rules, the extended interval ranges vary roughly linearly with the value of C.R., under the premise that the interval weights remain within the closed interval [0, 1]. However, when C.R. is less than 0.01, the interval ranges do not shrink further as C.R. decreases; this ensures sufficient interval lengths and reduces the dependence of the clustering on the numerical weights (namely the interval centers). And when C.R. exceeds 0.03, the interval ranges do not grow further as C.R. increases, which avoids oversized intervals and helps keep the clustering from falling into unnecessary local minima outside the set of appropriate weights.

4. Interval weighted fuzzy c-means clustering

4.1. Algorithm formulation

When interval weights are incorporated into fuzzy c-means clustering, the attribute weights can be viewed as interval-constrained variables. The formulation of the interval weighted fuzzy c-means algorithm is as follows: for a set of object data {x1, x2, . . . , xn} ⊂ R^s, let the attribute weight vector be w = [w1, w2, . . . , ws]^T, where w_j ∈ [a_j, b_j] with a_j, b_j ∈ [0, 1] for all j, and

$$\sum_{j=1}^{s}w_j=1. \qquad (10)$$
Under the constraints (2), (10) and w_j ∈ [a_j, b_j], calculate U, V and w by minimizing the objective function

$$J(U,V,w)=\sum_{i=1}^{c}\sum_{k=1}^{n}u_{ik}^{m}\,\|x_k-v_i\|_w^{2}. \qquad (11)$$
If we assume for the moment that no w_j is constrained by an interval, then the Lagrange multiplier method can be used to solve this problem. Let the Lagrange function be

$$J_a(U,V,w)=\sum_{i=1}^{c}\sum_{k=1}^{n}u_{ik}^{m}\,\|x_k-v_i\|_w^{2}+\sum_{k=1}^{n}\lambda_k\left(\sum_{i=1}^{c}u_{ik}-1\right)+\eta\left(\sum_{j=1}^{s}w_j-1\right), \qquad (12)$$

where λ = [λ_1, λ_2, . . . , λ_n]^T and η are Lagrange multipliers. The necessary conditions for minimizing the objective function (11) under the constraints (2) and (10) are (5), (6) and the following:
8" # 2 " #1 391 c X n s c X n < X = X X 2 2 m m 4 5 wj ¼ uik ðxjk v ji Þ uik ðxtk v ti Þ ; : i¼1 k¼1 ; t¼1 i¼1 k¼1 j ¼ 1; 2; . . . ; s:
ð13Þ
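For reference, a one-step NumPy sketch of the closed-form update (13) (illustrative; array shapes follow the WFCM sketch in Section 2):

```python
import numpy as np

def weight_update(X, U, V, m=2.0):
    """Closed-form attribute weights by (13): w_j is inversely proportional
    to the fuzzy within-cluster dispersion along attribute j."""
    Um = U ** m                                   # (c, n)
    diff2 = (X[None, :, :] - V[:, None, :]) ** 2  # (c, n, s)
    D = np.einsum('in,ins->s', Um, diff2)         # D_j = sum_i sum_k u_ik^m (x_jk - v_ji)^2
    return (1.0 / D) / (1.0 / D).sum()            # w_j = D_j^{-1} / sum_t D_t^{-1}
```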
Cyclically iterating (5), (6) and (13), we can readily obtain U and V as well as w; this is the idea that Li et al. (2007) and Shen et al. (2006) used to obtain the clustering and the attribute weights. In our setting, however, every w_j for j = 1, 2, . . . , s is constrained by an interval. Gradient-based algorithms are not theoretically applicable to optimization constrained to bounded closed sets, which is exactly where evolutionary algorithms, such as the genetic algorithm, excel: they do not require gradient information about the objective function, and their coding representation is naturally based on bounded ranges, which makes them well suited to handling the interval-constrained attribute weight variables. In this paper we therefore propose an interval weighted fuzzy c-means clustering by genetically guided alternating optimization (GIWFCM), in which the attribute weights are searched by a genetic algorithm within the corresponding interval ranges to guide the clustering process, while the cluster prototypes and the data partition are obtained by the alternating optimization (AO) scheme of numerical WFCM.

4.2. Custom-built genetic scheme

The genetic algorithm (GA) is a population-based stochastic global optimization technique inspired by the biological mechanisms of evolution and heredity; here we employ it to optimize the interval weighted fuzzy c-means clustering model. In our algorithm, only the attribute weights are encoded, and they evolve toward the desired optimal or sub-optimal attribute weights within the corresponding interval ranges through a set of genetically motivated operations. Although the coding is independent of the partition matrix U and the matrix of cluster prototypes V, the attribute weights and the data partition are obtained simultaneously. And owing to its short coding length and small-scale genetic operations, the algorithm is computationally efficient. In the following subsections, we describe how the attribute weight vector is encoded, how the equality constraint is handled, how the solutions are initially created, how they evolve during the optimization process, and how the process terminates.

4.2.1. Genetic representation and initialization

In genetic clustering applications, binary and real-parameter representations are commonly used (Liu et al., 2004; Mukhopadhyay, Maulik, & Bandyopadhyay, 2009; Sheng et al., 2005). In general, the results with a real-valued GA are about the same as those obtained with a binary representation (Hall, Ozyurt, & Bezdek, 1999), so the overall conclusions of this paper are independent of the representation. Compared with a binary representation, a real representation provides direct coding and omits the encoding/decoding operations; moreover, higher precision can be achieved thanks to the continuous code, and the shorter chromosome length of a real-coded GA leads to higher computational efficiency. For these reasons, each chromosome is a sequence of real-valued numbers representing the attribute weights. Let E be the population and M the population size; for a set of object data {x1, x2, . . . , xn} ⊂ R^s, the pth individual chromosome of the population at generation t has s components, i.e.,
$$E_p(t)=[w_{p,1},w_{p,2},\ldots,w_{p,s}],\qquad 1\le p\le M, \qquad (14)$$
where w_{p,j} (1 ≤ j ≤ s) is the jth attribute weight of the pth individual and must satisfy its interval constraint, that is, w_{p,j} ∈ [a_j, b_j]. Note that the coding is independent of the partition matrix U and the matrix of cluster prototypes V, so the coding scheme helps avoid large costs in both storage and computation time.

Given our determination of the interval attribute weights, the interval centers can be regarded as preferred weights. To merge the preferred weights into the initial population, the first chromosome is initialized as the interval centers, and all the other initial solutions are randomized within the corresponding interval-constrained ranges, that is,
$$E_1(1)=[(a_1+b_1)/2,\,(a_2+b_2)/2,\,\ldots,\,(a_s+b_s)/2], \qquad (15)$$
$$E_p(1)=[\mathrm{rand}(a_1,b_1),\,\mathrm{rand}(a_2,b_2),\,\ldots,\,\mathrm{rand}(a_s,b_s)],\qquad p=2,3,\ldots,M. \qquad (16)$$

Clearly, this initialization strategy lets the attribute weights evolve starting from the interval centers, and the clustering performance will improve if more rational numerical weights within the intervals can be found. Even if we fail to find more rational numerical weights, the interval centers are very likely to be retained and to act as the attribute weights in the weighted clustering, so acceptable clustering results are still obtained. Therefore, with this initialization strategy, the reliability of the clustering performance is guaranteed even if the optimization ability of the genetic operations is limited.

4.2.2. Fitness function

Each individual E_p(t) = [w_{p,1}, w_{p,2}, . . . , w_{p,s}] (1 ≤ p ≤ M) should satisfy the constraint

$$\sum_{j=1}^{s}w_{p,j}=1. \qquad (17)$$
Accordingly, this is an equality-constrained optimization problem. The simplest way to handle the constraint (17) in a genetic algorithm is to discard infeasible solutions as soon as they are generated; this is obviously time-consuming and does not exploit the information contained in the infeasible solutions. The method adopted here instead imposes a penalty on infeasible individuals and assigns them low fitness, which guides the search toward feasible solutions. A penalty term is added to the fitness function, and invalid solutions are treated as valid but penalized according to the degree of violation of the constraint (Petridis, Kazarlis, & Bakirtzis, 1998). The fitness metric is defined as the reciprocal of the function

$$F(E_p(t))=\sum_{i=1}^{c}\sum_{k=1}^{n}u_{ik}^{m}\,\|x_k-v_i\|_w^{2}+K\left|\sum_{j=1}^{s}w_{p,j}-1\right|, \qquad (18)$$

where K is a penalty gain. The fitness function has two components. The first is the objective function (11) of interval weighted fuzzy c-means, which yields compact clusters with the assistance of the alternating optimization scheme of fuzzy c-means. The second component of Eq. (18) is the penalty term, whose gain K determines how strictly the constraint (17) is satisfied. Generally speaking, the larger the value of K, the closer the sum of w_{p,j} for j = 1, 2, . . . , s is driven to 1; however, too large a value of K weakens the influence of the clustering process on the fitness metric. Taking the clustering of the IRIS data as an example, Fig. 2 shows the effect of K on the sum of the attribute weights in logarithmic coordinates. As can be seen from Fig. 2, once K exceeds a certain constant the constraint (17) is always satisfied, which makes choosing the value of K easy. By this means, the equality-constrained attribute weighted clustering is converted into an easier unconstrained problem.
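A sketch of the penalized fitness evaluation (18) (illustrative; it reuses the hypothetical `wfcm` helper from the Section 2 sketch):

```python
import numpy as np

def fitness(weights, X, c, K=500.0, m=2.0):
    """Reciprocal of F in (18): penalized WFCM objective for one chromosome."""
    U, V = wfcm(X, weights, c, m=m)               # AO clustering under these weights
    diff = X[None, :, :] - V[:, None, :]
    d2 = np.einsum('cns,s->cn', diff ** 2, weights ** 2)
    J = ((U ** m) * d2).sum()                     # objective (11)
    F = J + K * abs(weights.sum() - 1.0)          # penalty for violating (17)
    return 1.0 / F
```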
Fig. 2. Trend line of the sum of attribute weights as a function of K.
4.2.3. Genetic operators

Genetic algorithms are inspired by the biological processes of Darwinian evolution and genetics, in which selection, crossover and mutation play the major roles. After the evolution process, the final generation of the population consists of highly fit individuals that provide optimal or near-optimal solutions.

4.2.3.1. Selection. As an artificial version of natural selection, this operator gives chromosomes with better fitness higher probabilities of being selected into the next generation. One popular selection scheme, employed here, is the roulette wheel. After sorting the individuals in ascending order of fitness, the selection probability of each individual is defined as
$$P_{\mathrm{selection}}(E_p(t))=\frac{2p}{M(M+1)},\qquad p=1,2,\ldots,M. \qquad (19)$$
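A matching sketch of the rank-based roulette wheel (19) (illustrative; individuals are sorted in ascending order of fitness inside the function):

```python
import numpy as np

def roulette_select(population, fitnesses, rng=None):
    """Rank-based roulette wheel: after ascending sort, the individual of rank p
    (1-based) is drawn with probability 2p / (M(M+1)), as in (19)."""
    rng = np.random.default_rng(rng)
    order = np.argsort(fitnesses)                 # ascending fitness
    M = len(population)
    probs = 2.0 * np.arange(1, M + 1) / (M * (M + 1))   # sums to 1
    picks = rng.choice(M, size=M, p=probs)        # sample a full new generation
    return [population[order[i]] for i in picks]
```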
4.2.3.2. Crossover. Crossover requires two individuals to exchange their genetic composition so that information between different candidate solutions can be recombined. Here we adopt the whole arithmetic crossover mechanism, in which an arithmetic crossover multiplier γ ∈ [0, 1] is introduced, and the crossover offspring of the consecutive pth and (p + 1)th chromosomes at generation t are
$$E_p(t+1)=\gamma E_{p+1}(t)+(1-\gamma)E_p(t), \qquad (20)$$

$$E_{p+1}(t+1)=\gamma E_p(t)+(1-\gamma)E_{p+1}(t),\qquad 1\le p,\;p+1\le M. \qquad (21)$$

The convex combination guarantees the legality of the resulting offspring.

4.2.3.3. Mutation. This operation is applied to each child individually after crossover and randomly alters some individuals with a small probability; mutation provides a means of increasing population diversity. Uniform mutation is used here, in which each element w_{p,j} of a randomly selected chromosome E_p(t) (1 ≤ p ≤ M) is replaced by a random number in the corresponding range [a_j, b_j].

4.2.4. Termination condition

The above genetic operations are repeated until some termination condition is satisfied, typically the attainment of an acceptable fitness level or a maximal number of generations. In our implementation we employ the latter criterion.

4.3. Algorithm procedure of interval weighted fuzzy clustering

In the following, the procedure of the interval weighted fuzzy c-means clustering by genetically guided alternating optimization (GIWFCM) is presented.

Step (1) Determine the interval endpoints a_j and b_j for each attribute weight w_j (j = 1, 2, . . . , s).
Step (2) Set the genetic population size M, the maximal number of generations G, the crossover probability Pc and the mutation probability Pm. Initialize the genetic population E_p(1) (p = 1, 2, . . . , M) by (15) and (16).
Step (3) At genetic generation t (t = 1, 2, . . . , G), for each chromosome E_p(t) (1 ≤ p ≤ M), calculate V_p^(t) and U_p^(t) using the alternating optimization of (5) and (6).
Step (4) Calculate the fitness value of each chromosome E_p(t) (1 ≤ p ≤ M) using (18), and sort the individuals in ascending order of fitness. Save the two individuals with the best fitness.
Step (5) Perform roulette wheel selection according to the selection probability (19); perform whole arithmetic crossover using (20) and (21) according to the crossover probability Pc; perform uniform mutation according to the mutation probability Pm.
Step (6) If the genetic generation index t = G, stop and output the appropriate attribute weights and the corresponding clustering results; otherwise set t = t + 1 and return to Step (3).

4.4. Illustration

4.4.1. Characteristics of the GIWFCM algorithm

The overall framework of GIWFCM is shown in Fig. 3. First, based on the attribute weights represented by chromosome E_p(t) (t = 1, 2, . . . , G), the algorithm calls the numerical WFCM described in Section 2, and the corresponding partition matrix U_p^(t) and matrix of cluster prototypes V_p^(t) are obtained by alternating optimization, as shown by the first arrows in Fig. 3. Second, the fitness value F(E_p(t)) is calculated by substituting V_p^(t), U_p^(t) and E_p(t) into (18), as shown by the second arrows. Finally, F(E_p(t)) guides the genetic selection that generates the next population E_p(t + 1), as shown by the third arrows. The genetic iteration continues until the maximal number of generations G is reached. The overall algorithm thus operates within the GA framework and obtains the AO clustering results of the weighted fuzzy c-means algorithm under appropriate weights within the interval ranges.

Fig. 3. Overall framework of GIWFCM.
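Putting the pieces together, a condensed sketch of the GIWFCM loop of Section 4.3 (illustrative; it reuses the hypothetical `wfcm`, `fitness` and `roulette_select` helpers sketched earlier, and keeps the two best individuals across generations as in Step (4)):

```python
import numpy as np

def giwfcm(X, a, b, c, M=60, G=100, Pc=0.6, Pm=0.1, gamma=0.3, K=500.0, seed=None):
    """Genetically guided interval weighted FCM (Steps (1)-(6) of Section 4.3).
    a, b: (s,) interval endpoints for the attribute weights."""
    rng = np.random.default_rng(seed)
    s = X.shape[1]
    # Step (2): first chromosome = interval centers (15), rest random in [a, b] (16).
    pop = [(a + b) / 2] + [rng.uniform(a, b) for _ in range(M - 1)]
    for _ in range(G):
        # Steps (3)-(4): AO clustering per chromosome, then fitness by (18).
        fit = np.array([fitness(w, X, c, K=K) for w in pop])
        elite = [pop[i].copy() for i in np.argsort(fit)[-2:]]   # save two best
        # Step (5): selection (19), whole arithmetic crossover (20)-(21), mutation.
        pop = roulette_select(pop, fit, rng)
        for p in range(0, M - 1, 2):
            if rng.random() < Pc:
                x, y = pop[p], pop[p + 1]
                pop[p] = gamma * y + (1 - gamma) * x
                pop[p + 1] = gamma * x + (1 - gamma) * y
        for p in range(M):
            if rng.random() < Pm:                 # uniform mutation within [a_j, b_j]
                j = rng.integers(s)
                pop[p] = pop[p].copy()
                pop[p][j] = rng.uniform(a[j], b[j])
        pop[:2] = elite                           # elitism, per Step (4)
    best = pop[int(np.argmax([fitness(w, X, c, K=K) for w in pop]))]
    return best, wfcm(X, best, c)                 # weights plus final (U, V)
```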
4.4.2. The relation between GIWFCM and numerical weighted fuzzy clustering

To begin with, let us clarify the two numerical weighted fuzzy c-means algorithms mentioned above: (i) the weighted fuzzy c-means algorithm based on AHP (AWFCM). In AWFCM, the attribute weight vector w is a constant determined by AHP in advance (Chen et al., 2009; Zhang et al., 2007). At present, attribute weights applied to weighted fuzzy clustering are almost always determined by subjective or objective approaches before clustering, so AWFCM can be viewed as a representative of such approaches. (ii) The weighted fuzzy c-means algorithm based on clustering objective function minimization (OWFCM). In OWFCM, the attribute weight vector w is a variable, obtained by cyclically iterating (5), (6) and (13) so that the clustering objective function (11) reaches its minimum (Li et al., 2007; Shen et al., 2006), but w is not constrained by intervals. Attribute weights are thus not determined before clustering in OWFCM, which is a typical way of obtaining the attribute weights together with the cluster prototypes and memberships simultaneously.
In the following, we consider the two extreme cases of the interval ranges in GIWFCM to illustrate its relation to AWFCM and OWFCM, as shown in Fig. 4. From Fig. 4, it can be seen that in GIWFCM: (i) if the two endpoints of the interval weights tend to the corresponding interval centers, the interval weights become numerical ones, and the GIWFCM algorithm degenerates to the AWFCM algorithm; (ii) if the two endpoints of the interval weights tend to 0 and 1 respectively, the attribute weights are effectively unconstrained, and the GIWFCM algorithm degenerates to the OWFCM algorithm.

Fig. 4. Schematic of the relation between GIWFCM and AWFCM, OWFCM.

4.4.3. Time complexity

(i) The time complexity of the standard FCM algorithm (Kolen & Hutcheson, 2002; Pimentel & Souza, 2013) is O(nc²s), where n is the number of object data, c is the number of clusters, and s is the dimension of the data vectors. (ii) The time complexity of the AWFCM algorithm is also O(nc²s), the same as that of standard FCM, because the attribute weights in AWFCM are determined in advance and are independent of the iterative process. (iii) In OWFCM, the calculation of the partition matrix U requires nc²s operations and the calculation of the attribute weight vector w requires ncs² operations, so the time complexity of the OWFCM algorithm is O(ncs(c + s)). (iv) In GIWFCM, most of the time is spent performing the WFCM algorithm; therefore the time complexity of the GIWFCM algorithm is O(MNnc²s), where N is the number of iterations in WFCM and M is the population size.

5. Experiments and discussions

In the experiments presented below, we test the performance of the proposed GIWFCM on three well-known data sets: IRIS, Crude-Oil and New-thyroid (Frank & Asuncion, 2014; Johnson & Wichern, 1982). These data sets are often used as standard benchmarks for testing the performance of clustering algorithms.

5.1. Experimental results on IRIS
The IRIS data set contains 150 four-dimensional attribute vectors describing four attributes of irises: Petal Length, Petal Width, Sepal Length and Sepal Width. The three IRIS classes involved are Setosa, Versicolor and Virginica, each containing 50 vectors; Setosa is well separated from the others, while Versicolor and Virginica are not easily separable due to the overlap of their vectors. Hathaway and Bezdek (1995) presented the actual cluster prototypes of the IRIS data: v1* = (5.00, 3.42, 1.46, 0.24)^T, v2* = (5.93, 2.77, 4.26, 1.32)^T, v3* = (6.58, 2.97, 5.55, 2.02)^T. When using GIWFCM to partition the data sets, the interval weights are determined by the analytic hierarchy process (AHP) before clustering. Pairwise comparing the attributes of the IRIS data according to the 1–9 scale, we obtain the judgment matrix
$$A_{\mathrm{IRIS}}=\begin{bmatrix}1&1/2&1/9&1/6\\2&1&1/5&1/3\\9&5&1&1\\6&3&1&1\end{bmatrix}, \qquad (22)$$
where A_IRIS is a 4 × 4 matrix in which all diagonal entries equal 1, the entries a_ij for 1 ≤ j < i ≤ s below the diagonal represent the relative importance of the ith attribute over the jth attribute, and the entries above the diagonal satisfy the reciprocal property of pairwise comparisons, a_ji = 1/a_ij. The consistency ratio is C.R._IRIS = 0.0103, so A_IRIS is acceptably consistent. The interval centers produced from the judgment matrix A_IRIS are 0.0554, 0.1082, 0.4657 and 0.3708 in turn, and the interval weights given by Rules 1–3 are [0.0497, 0.0611], [0.0971, 0.1193], [0.4177, 0.5137] and [0.3326, 0.4090] respectively. We choose the fuzzification parameter m = 2, the number of clusters c = 3 and the convergence threshold ε = 10⁻⁶, with genetic population size M = 60, maximal number of generations G = 100, crossover probability Pc = 0.6, mutation probability Pm = 0.1, arithmetic crossover multiplier γ = 0.3 and penalty gain K = 500. The clustering results are listed in Table 1, in which the accuracy of the results is evaluated by the error sum of squares (ESS) between the obtained cluster prototypes and the actual ones, and by the number of misclassifications compared with the actual data partition. The ESS is defined as
$$\mathrm{ESS}=\sum_{i=1}^{c}\|v_i-v_i^{*}\|_2^{2}. \qquad (23)$$
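For instance, the ESS value reported for GIWFCM in Table 1 below can be checked directly from the prototypes (a minimal sketch; the values are transcribed from this section):

```python
import numpy as np

# Actual IRIS prototypes (Hathaway & Bezdek, 1995) and those found by GIWFCM (Table 1).
V_true = np.array([[5.00, 3.42, 1.46, 0.24],
                   [5.93, 2.77, 4.26, 1.32],
                   [6.58, 2.97, 5.55, 2.02]])
V_giwfcm = np.array([[5.0061, 3.4118, 1.4722, 0.2460],
                     [5.9560, 2.7697, 4.3287, 1.3491],
                     [6.6773, 3.0283, 5.6184, 2.0852]])

ess = ((V_giwfcm - V_true) ** 2).sum()   # Eq. (23)
print(round(ess, 4))                     # ~0.0283, matching Table 1
```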
Table 1. Clustering performance on IRIS.

| Algorithm | Final attribute weights | Cluster prototypes | Error sum of squares | Number of misclassifications | Iterations |
|---|---|---|---|---|---|
| FCM | w = (0.2500, 0.2500, 0.2500, 0.2500)^T | v1 = (5.0033, 3.4029, 1.4894, 0.2512)^T; v2 = (5.8712, 2.7402, 4.3450, 1.3718)^T; v3 = (6.7390, 3.0592, 5.5741, 2.0478)^T | 0.0501 | 16 | 25 |
| OWFCM | w = (0.1286, 0.1114, 0.4520, 0.3081)^T | v1 = (5.0059, 3.4104, 1.4745, 0.2473)^T; v2 = (5.9424, 2.7696, 4.3297, 1.3568)^T; v3 = (6.7005, 3.0334, 5.6354, 2.0806)^T | 0.0363 | 6 | 21 |
| AWFCM | w = (0.0554, 0.1082, 0.4657, 0.3708)^T | v1 = (5.0061, 3.4118, 1.4722, 0.2462)^T; v2 = (5.9539, 2.7702, 4.3268, 1.3500)^T; v3 = (6.6802, 3.0284, 5.6218, 2.0833)^T | 0.0289 | 7 | 16 |
| GIWFCM | w = (0.0571, 0.1058, 0.4505, 0.3867)^T | v1 = (5.0061, 3.4118, 1.4722, 0.2460)^T; v2 = (5.9560, 2.7697, 4.3287, 1.3491)^T; v3 = (6.6773, 3.0283, 5.6184, 2.0852)^T | 0.0283 | 5 | 100 |

The clustering partition of the IRIS data is depicted in the Petal Length–Sepal Width plane, as shown in Fig. 5. In Fig. 5, the object data classified to Setosa, Versicolor and Virginica are represented by triangles, plus signs and dots respectively, and the three cluster prototypes are represented by stars. Object data that actually belong to Virginica but are classified to Versicolor are represented by circles; object data that actually belong to Versicolor but are classified to Virginica are represented by squares.

Fig. 5. Clustering partition of IRIS data (Petal Length vs. Sepal Width). (a) FCM, (b) OWFCM, (c) AWFCM, (d) GIWFCM.

Fig. 6 shows the variation of the objective function value (18) for the optimal and suboptimal individuals over 100 generations when clustering the IRIS data. The optimal and suboptimal individuals reach agreement within 10 generations and remain invariant in the subsequent generations, which shows that the convergence of the algorithm is guaranteed.
Fig. 6. Genetic iteration trend lines for the optimal and suboptimal individuals (objective function value vs. generations; top axis: time in seconds).
FCM, AWFCM and OWFCM all adopt gradient-based alternating optimization, and their numbers of iterations are listed in Table 1. By contrast, the 100 generations of genetic operations in the GIWFCM algorithm obviously incur higher computational costs. From Fig. 6, the average CPU time per generation for GIWFCM is 0.1719 s (tested on a dual-core 2.53 GHz PC with 4 GB of RAM), which shows that the computational efficiency of our algorithm is acceptable.

From Table 1 and Fig. 5, it can be seen that the final attribute weights determined by the different algorithms differ, resulting in different cluster prototypes and memberships. The error rate for GIWFCM is only 5/150, and its cluster prototypes are closest to the actual ones, with an error sum of squares of 0.0283, so the clustering performance is satisfying.

5.2. Experimental results on Crude-Oil

The Crude-Oil data set has 56 data points and five attributes, and stems from a chemical analysis of crude-oil samples from three zones of sandstone: 7 samples are from Wilhelm, 11 from Sub-Mulinia and 38 from Upper-Mulinia. We set the judgment matrix
$$A_{\mathrm{Crude\text{-}Oil}}=\begin{bmatrix}1&2&1&1/4&2\\1/2&1&1&1/5&2\\1&1&1&1/4&2\\4&5&4&1&7\\1/2&1/2&1/2&1/7&1\end{bmatrix}, \qquad (24)$$
from which we obtain the consistency ratio C.R._Crude-Oil = 0.0126. The interval centers produced are 0.1537, 0.1112, 0.1317, 0.5355 and 0.0678 in turn, and the corresponding interval weights determined by Rules 1–3 are [0.1343, 0.1731], [0.0972, 0.1252], [0.1151, 0.1483], [0.4680, 0.6030] and [0.0593, 0.0763] respectively. We choose m = 2, c = 3, ε = 10⁻⁶, M = 60, G = 100, Pc = 0.7, Pm = 0.1, γ = 0.1 and K = 100. The clustering results are listed in Table 2; the number of misclassifications for GIWFCM is only 16, which is again superior to the other three clustering algorithms.

Table 2. Clustering performance on Crude-Oil.

| Algorithm | Final attribute weights | Cluster prototypes | Number of misclassifications | Iterations |
|---|---|---|---|---|
| FCM | w = (0.2000, 0.2000, 0.2000, 0.2000, 0.2000)^T | v1 = (3.8552, 40.7619, 0.1837, 6.8284, 10.5005)^T; v2 = (5.4522, 28.9891, 0.3732, 5.1773, 3.9234)^T; v3 = (8.0937, 18.9972, 0.3313, 4.4342, 6.8494)^T | 23 | 36 |
| OWFCM | w = (0.1282, 0.1194, 0.1216, 0.5644, 0.0664)^T | v1 = (4.3854, 40.4133, 0.2792, 7.3882, 9.4687)^T; v2 = (4.7727, 26.7636, 0.3943, 5.5805, 5.7100)^T; v3 = (7.8578, 21.1707, 0.2984, 4.1364, 5.6853)^T | 19 | 45 |
| AWFCM | w = (0.1537, 0.1112, 0.1317, 0.5355, 0.0678)^T | v1 = (4.3752, 40.3719, 0.2758, 7.3760, 9.4536)^T; v2 = (4.7026, 27.0277, 0.3912, 5.5740, 5.6622)^T; v3 = (7.8805, 21.0703, 0.2980, 4.1442, 5.7129)^T | 18 | 27 |
| GIWFCM | w = (0.1615, 0.1058, 0.1240, 0.5422, 0.0665)^T | v1 = (4.3827, 40.2803, 0.2789, 7.3765, 9.4296)^T; v2 = (4.6503, 27.0909, 0.3868, 5.5772, 5.6630)^T; v3 = (7.8820, 21.0813, 0.2990, 4.1455, 5.7093)^T | 16 | 100 |

5.3. Experimental results on New-Thyroid

The New-Thyroid data set comprises 215 patients from the same hospital, with five attributes per sample. The individuals are divided into three groups according to the diagnosis: 150 healthy individuals, 35 patients suffering from hyperthyroidism, and 30 from hypothyroidism. We set the judgment matrix as follows:
$$A_{\mathrm{New\text{-}thyroid}}=\begin{bmatrix}1&1/2&1&1/5&1\\2&1&1&1/4&1\\1&1&1&1/6&1\\5&4&6&1&3\\1&1&1&1/3&1\end{bmatrix}. \qquad (25)$$
For this judgment matrix, the consistency ratio is C.R._New-thyroid = 0.0198. The interval centers produced are 0.1012, 0.1399, 0.1116, 0.5187 and 0.1286 in turn, and the corresponding interval weights are [0.0812, 0.1212], [0.1122, 0.1676], [0.0895, 0.1337], [0.4160, 0.6214] and [0.1031, 0.1541] respectively. We choose m = 2, c = 3, ε = 10⁻⁶, M = 30, G = 100, Pc = 0.7, Pm = 0.2, γ = 0.3 and K = 20. The clustering results are listed in Table 3. From Table 3, the number of misclassifications for GIWFCM is 19, which is superior to the 20 achieved by AWFCM and implies that GIWFCM finds more rational weights than the interval centers. The number of misclassifications for OWFCM reaches 30, which provides an example of the inapplicability of OWFCM to attribute weighting and clustering.

Table 3. Clustering performance on New-thyroid.

| Algorithm | Final attribute weights | Cluster prototypes | Number of misclassifications | Iterations |
|---|---|---|---|---|
| FCM | w = (0.2000, 0.2000, 0.2000, 0.2000, 0.2000)^T | v1 = (86.7181, 19.5317, 5.2735, 1.3063, 0.5155)^T; v2 = (109.7613, 9.3628, 1.7681, 1.4655, 2.5446)^T; v3 = (122.8814, 4.6773, 1.2143, 11.1698, 14.2673)^T | 20 | 103 |
| OWFCM | w = (0.0830, 0.0899, 0.1962, 0.4569, 0.1740)^T | v1 = (89.6860, 18.8250, 5.2549, 1.5734, 1.1489)^T; v2 = (110.2596, 9.2807, 1.7398, 1.4713, 2.8049)^T; v3 = (124.2291, 2.7611, 0.8465, 20.1277, 18.8176)^T | 30 | 69 |
| AWFCM | w = (0.1012, 0.1399, 0.1116, 0.5187, 0.1286)^T | v1 = (96.6049, 17.0849, 4.0055, 1.5534, 1.3398)^T; v2 = (110.4819, 8.8374, 1.7304, 1.5197, 3.0806)^T; v3 = (124.3463, 2.5759, 0.7810, 21.7782, 17.4562)^T | 20 | 78 |
| GIWFCM | w = (0.0959, 0.1288, 0.1109, 0.5177, 0.1468)^T | v1 = (97.1118, 16.8053, 3.9432, 1.5979, 1.4793)^T; v2 = (110.4917, 8.8437, 1.7288, 1.5167, 3.0530)^T; v3 = (124.2064, 2.5814, 0.7814, 22.0531, 17.7198)^T | 19 | 100 |
5.4. Repeated experiments and results statistics

In this part, we carry out the following experiments. (i) Using the judgment matrices A_IRIS, A_Crude-Oil and A_New-thyroid shown in (22), (24) and (25), and choosing the same parameters as in Sections 5.1–5.3 respectively, 10 trials are performed on each of the three data sets, and statistics of the number of misclassifications, as well as of the best, worst and mean minimum values of the objective function (18), are compiled.
(ii) We also set alternative judgment matrices A′_IRIS, A′_Crude-Oil and A′_New-thyroid for the three data sets respectively:

$$A'_{\mathrm{IRIS}}=\begin{bmatrix}1&1/2&1/8&1/5\\2&1&1/7&1/4\\8&7&1&1\\5&4&1&1\end{bmatrix},\quad
A'_{\mathrm{Crude\text{-}Oil}}=\begin{bmatrix}1&2&1&1/3&2\\1/2&1&1&1/5&2\\1&1&1&1/5&2\\3&5&5&1&7\\1/2&1/2&1/2&1/7&1\end{bmatrix},\quad
A'_{\mathrm{New\text{-}thyroid}}=\begin{bmatrix}1&1/2&1&1/6&1\\2&1&1&1/4&1\\1&1&1&1/6&1\\6&4&6&1&3\\1&1&1&1/3&1\end{bmatrix}, \qquad (26)$$
in which the relative importance of the individual attributes differs slightly from that in A_IRIS, A_Crude-Oil and A_New-thyroid. The interval centers and interval weights derived from these judgment matrices are listed in Table 4. The same parameters as in Sections 5.1–5.3 are again chosen for the three data sets respectively, and statistics of
the number of misclassifications, as well as of the best, worst and mean minimum values of the objective function (18), are compiled over 10 trials. The experimental results are listed in Table 5. From Table 5, it can be seen that: (i) in terms of the minimal number of misclassifications and the best objective function minima obtained over 10 trials, GIWFCM attains a small number of misclassifications and a small objective function value for both judgment matrices of each data set, which shows that GIWFCM has the ability to search for the optimal solution; (ii) in terms of the maximal number of misclassifications and the worst objective function minima over 10 trials, GIWFCM still achieves the same clustering performance as AWFCM in the worst cases, that is, the attribute weights found are the interval centers; (iii) in terms of the average number of misclassifications and the mean objective function minima over 10 trials, GIWFCM always obtains satisfying results.

Table 4. Interval centers and interval weights derived from the judgment matrices.

| Judgment matrix | Interval centers | Interval weights |
|---|---|---|
| A′_IRIS | 0.0585, 0.0907, 0.4804, 0.3703 | [0.0438, 0.0732], [0.0679, 0.1135], [0.3598, 0.6010], [0.2774, 0.4632] |
| A′_Crude-Oil | 0.1630, 0.1110, 0.1271, 0.5308, 0.0680 | [0.1405, 0.1855], [0.0957, 0.1263], [0.1096, 0.1446], [0.4575, 0.6041], [0.0586, 0.0774] |
| A′_New-thyroid | 0.0963, 0.1373, 0.1097, 0.5296, 0.1271 | [0.0768, 0.1158], [0.1094, 0.1652], [0.0874, 0.1320], [0.4221, 0.6371], [0.1013, 0.1529] |

Table 5. Statistics of experimental results over 10 trials (OWFCM does not depend on the judgment matrix, so its entries for the primed matrices are left blank).

Number of misclassifications:

| Algorithm | A_IRIS | A′_IRIS | A_Crude-Oil | A′_Crude-Oil | A_New-thyroid | A′_New-thyroid |
|---|---|---|---|---|---|---|
| OWFCM | 6 | – | 19 | – | 30 | – |
| AWFCM | 7 | 7 | 18 | 18 | 20 | 21 |
| GIWFCM-Worst | 7 | 7 | 18 | 18 | 20 | 21 |
| GIWFCM-Mean | 6.1 | 6.2 | 17.5 | 17.7 | 19.7 | 20.4 |
| GIWFCM-Best | 5 | 5 | 16 | 16 | 19 | 19 |

Minimum of objective function obtained:

| Algorithm | A_IRIS | A′_IRIS | A_Crude-Oil | A′_Crude-Oil | A_New-thyroid | A′_New-thyroid |
|---|---|---|---|---|---|---|
| OWFCM | 0.2604 | – | 0.1654 | – | 0.2362 | – |
| AWFCM | 0.3227 | 0.3231 | 0.1762 | 0.1767 | 0.2480 | 0.2480 |
| GIWFCM-Worst | 0.2739 | 0.2746 | 0.1671 | 0.1674 | 0.2480 | 0.2480 |
| GIWFCM-Mean | 0.2729 | 0.2729 | 0.1663 | 0.1667 | 0.2474 | 0.2469 |
| GIWFCM-Best | 0.2721 | 0.2708 | 0.1659 | 0.1662 | 0.2447 | 0.2448 |

5.5. Discussions

5.5.1. Comparison of GIWFCM and AWFCM
From Tables 1–3 as well as Table 5, it can be seen that AWFCM reaches good clustering results thanks to its reasonable attribute weights, but GIWFCM is still superior to AWFCM because GIWFCM can find weights within the interval-constrained ranges that are more rational than the interval centers; GIWFCM thus acts as an optimization operator on top of AWFCM.

5.5.2. Comparison of GIWFCM and OWFCM

From Table 5, it can be seen that on all three data sets the objective function minima obtained by OWFCM are always less than those obtained by GIWFCM. From Tables 1–4, the attribute weight vectors generated by OWFCM are not within the regions bounded by the interval weights of GIWFCM; nevertheless, the numbers of misclassifications of OWFCM are unsatisfying, especially on the New-thyroid data, where OWFCM makes more misclassification errors than standard FCM. These results reveal that if the attribute weight vector is determined only by minimizing the clustering objective function, without being constrained to an appropriate region, the obtained weight vector is not necessarily in the set of appropriate weights, even though the final iteration may yield a smaller objective function value. Only if an attribute weight vector within the set of appropriate weights is found can reasonable clustering results be achieved.

5.5.3. Effects of interval weight perturbation

In our experiments, two slightly different judgment matrices were given for each data set, yielding different but overlapping interval weights. From Table 5, for the two different interval weights, GIWFCM always finds an optimized or feasible solution within the interval-constrained ranges, and the numbers of misclassifications obtained by GIWFCM are generally better than those of the compared algorithms. This indicates that when interval weighting is adopted, the clustering performance is robust to slight differences in the interval weights.

6. Conclusion

In this paper, interval numbers are introduced for attribute weighting in weighted fuzzy clustering. Interval-represented attribute weights greatly raise the probability that the attribute weight vector falls into the set of appropriate weights, which makes attribute weighting much easier. Based on the predetermined interval weights, the custom-built hybrid optimization scheme within the genetic heuristic framework can find appropriate numerical weights within the interval-constrained ranges and lead to a reasonable data partition. The proposed algorithm merges the merits of two kinds of attribute weighting ideas, namely weight determination by subjective or objective approaches before clustering, and collaborative optimization of attribute weights and clustering by minimizing the objective function. An application to three real data sets shows the usefulness of the proposed method, with accuracy evaluated by the two most direct indices: the number of misclassifications and the prototype error. As the results show, considerably more accurate clustering results are usually acquired by extending numerical weights into interval ones, which confirms that interval weighted clustering can act as an optimization operator on the basis of numerical weighted clustering. Moreover, the clustering results based on two different but overlapping interval weights show that the effect of interval weight perturbation on clustering performance is reduced, which is quite important for practical applications.
Future work on this topic will mainly focus on combining interval weighting with the multivariate fuzzy c-means algorithm, in which the interval weights may be determined from the membership values of each attribute. In addition, interval weighting will also be applied to the clustering of interval data sets and of incomplete data sets with missing values, and the new clustering algorithms will be designed using the interval algebra approach.
Acknowledgments The authors would like to express their gratitude to the editor, associate editor, and all reviewers for their helpful suggestions for improving this manuscript. This work was supported by the National Natural Science Foundation of China (61175041, 61174115) and Canada Research Chair (CRC) Program.
References

Bandeira, L., Sousa, J., & Kaymak, U. (2003). Fuzzy clustering in classification using weighted features. Lecture Notes in Computer Science, 2715, 560–567.
Bao, Z., Han, B., & Wu, S. (2006). A general weighted fuzzy clustering algorithm. Lecture Notes in Computer Science, 4142, 102–109.
Baskir, M., & Turksen, I. (2013). Enhanced fuzzy clustering algorithm and cluster validity index for human perception. Expert Systems with Applications, 40(3), 929–937.
Chang, P., Fan, C., & Dzan, W. (2010). A CBR-based fuzzy decision tree approach for database classification. Expert Systems with Applications, 37(1), 214–225.
Chang, P., Fan, C., & Wang, Y. (2008). Data clustering and evolving fuzzy decision tree for data base classification problems. Communications in Computer and Information Science, 15, 463–470.
Chen, G., et al. (2009). Research on spatially weighted fuzzy dynamic clustering algorithm and spatial data mining visualization. In Proceedings of the world congress on software engineering (pp. 60–66), Xiamen.
Chen, M., & Miao, D. (2011). Interval set clustering. Expert Systems with Applications, 38(4), 2923–2932.
Cura, T. (2012). A particle swarm optimization approach to clustering. Expert Systems with Applications, 39(1), 1582–1588.
Entani, T., & Tanaka, H. (2007). Interval estimations of global weights in AHP by upper approximation. Fuzzy Sets and Systems, 158(17), 1913–1921.
Fan, C., et al. (2011). A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification. Applied Soft Computing, 11(1), 632–644.
Fan, J., Zhen, W., & Xie, W. (2004). Suppressed fuzzy c-means clustering algorithm. Pattern Recognition Letters, 24(9–10), 1607–1612.
Frank, A., & Asuncion, A. (2014). UCI machine learning repository.
Frigui, H., & Nasraoui, O. (2004). Unsupervised learning of prototypes and attribute weights. Pattern Recognition, 37(3), 567–581.
Hall, L., Ozyurt, I., & Bezdek, J. (1999). Clustering with a genetically optimized approach. IEEE Transactions on Evolutionary Computation, 3(2), 103–112.
Han, B., Gao, X., & Ji, H. (2006). A novel feature weighted clustering algorithm based on rough sets for shot boundary detection. Lecture Notes in Computer Science, 4223, 471–480.
Hathaway, R., & Bezdek, J. (1995). Optimization of clustering criteria by reformulation. IEEE Transactions on Fuzzy Systems, 3(2), 241–245.
Hathaway, R., Bezdek, J., & Hu, Y. (2000). Generalized fuzzy c-means clustering strategies using Lp norm distances. IEEE Transactions on Fuzzy Systems, 8(5), 576–582.
Hruschka, E., et al. (2009). A survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man and Cybernetics – Part C: Applications and Reviews, 39(2), 133–155.
Izakian, H., & Abraham, A. (2011). Fuzzy c-means and fuzzy swarm for fuzzy clustering problem. Expert Systems with Applications, 38(3), 1835–1838.
Johnson, R. A., & Wichern, D. W. (1982). Applied multivariate statistical analysis. New Jersey: Prentice-Hall.
Kolen, J. F., & Hutcheson, T. (2002). Reducing the time complexity of the fuzzy c-means algorithm. IEEE Transactions on Fuzzy Systems, 10(2), 263–267.
Li, J., & Gao, J. (2009). Research on improved weighted fuzzy clustering algorithm based on rough set. In Proceedings of the international conference on computer engineering and technology (pp. 98–102), Singapore.
Li, J., et al. (2007). A feature weighted fuzzy clustering algorithm. Journal of Beijing Electronic Science and Technology Institute, 15(2), 74–76.
Li, J., Gao, X., & Jiao, L. (2005). A new feature weighted fuzzy clustering algorithm. Lecture Notes in Computer Science, 3641, 412–420.
Liu, Y., et al. (2004). A genetic clustering method for intrusion detection. Pattern Recognition, 37(5), 927–942.
Mika, S. (2007). Weighted fuzzy clustering on subsets of variables. In Proceedings of the international symposium on signal processing and its applications (pp. 1–4), Sharjah.
Mika, S. (2008). Fuzzy variable selection with degree of classification based on dissimilarity between distributions of variables. International Journal of Intelligent Technologies and Applied Statistics, 1(2), 1–18.
Monti, S., & Carenini, G. (2000). Dealing with the expert inconsistency in probability elicitation. IEEE Transactions on Knowledge and Data Engineering, 12(4), 499–508.
Mukhopadhyay, A., Maulik, U., & Bandyopadhyay, S. (2009). Multiobjective genetic algorithm-based fuzzy clustering of categorical attributes. IEEE Transactions on Evolutionary Computation, 13(5), 991–1005.
Petridis, V., Kazarlis, S., & Bakirtzis, A. (1998). Varying fitness functions in genetic algorithm constrained optimization: The cutting stock and unit commitment problems. IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, 28(5), 629–640.
Pimentel, B. A., & Souza, R. M. C. R. (2013). A multivariate fuzzy c-means method. Applied Soft Computing, 13(4), 1592–1607.
Pimentel, B. A., & Souza, R. M. C. R. (2014). A weighted multivariate fuzzy c-means method in interval-valued scientific production data. Expert Systems with Applications, 41(7), 3223–3236.
Saaty, T. (2003). Decision-making with the AHP: Why is the principal eigenvector necessary. European Journal of Operational Research, 145(1), 85–91.
Shen, H., et al. (2006). Attribute weighted mercer kernel based fuzzy clustering algorithm for general non-spherical datasets. Soft Computing, 10(11), 1061–1073.
Sheng, W., et al. (2005). A weighted sum validity function for clustering with a hybrid niching genetic algorithm. IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, 35(6), 1156–1167.
Sun, Y. (2007). Iterative RELIEF for feature weighting: Algorithms, theories, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 1035–1051.
Tao, L., et al. (2012). An integrated multiple criteria decision making model applying axiomatic fuzzy set theory. Applied Mathematical Modelling, 36(10), 5046–5058.
Wang, Y., & Elhag, T. (2007). A goal programming method for obtaining interval weights from an interval comparison matrix. European Journal of Operational Research, 177(1), 458–471.
Wang, Y., Elhag, T., & Hua, Z. (2006). A modified fuzzy logarithmic least squares method for fuzzy analytic hierarchy process. Fuzzy Sets and Systems, 157(23), 3055–3071.
Wang, L., et al. (2006). Fuzzy c-mean algorithm based on feature weights. Chinese Journal of Computers, 29(10), 1797–1803.
Wang, X., Wang, Y., & Wang, L. (2004). Improving fuzzy c-means clustering based on feature-weight learning. Pattern Recognition Letters, 25(10), 1123–1132.
Wang, Y., Yang, J., & Xu, D. (2005). A two-stage logarithmic goal programming method for generating weights from interval comparison matrices. Fuzzy Sets and Systems, 152(3), 475–498.
Yeung, D., & Wang, X. (2002). Improving performance of similarity-based clustering by feature weight learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4), 556–561.
Yu, J., Cheng, Q., & Huang, H. (2004). Analysis of the weighting exponent in the FCM. IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, 34(1), 634–639.
Zhang, J., et al. (2012). Robust data clustering by learning multi-metric Lq-norm distances. Expert Systems with Applications, 39(1), 335–349.
Zhang, L., et al. (2007). Research on customer classification based on fuzzy clustering. Journal of Computational Information Systems, 3(5), 1971–1976.