Robust kernelized approach to clustering by incorporating new distance measure


Prabhjot Kaur (a), A.K. Soni (b), Anjana Gosain (c)

(a) Department of Information Technology, Maharaja Surajmal Institute of Technology, C-4, Janakpuri, Guru Gobind Singh Indraprastha University, New Delhi 110058, India
(b) Department of Computer Science, Sharda University, Greater Noida, Uttar Pradesh, India
(c) Department of Information Technology, USIT, Guru Gobind Singh Indraprastha University, New Delhi, India


Article history: Received 9 January 2012; Received in revised form 2 July 2012; Accepted 3 July 2012; Available online 31 July 2012

Keywords: Fuzzy clustering; Kernel approach; Density-oriented approach; Noise clustering; Robust clustering; Distance metric

Abstract

A new data clustering algorithm, the Density oriented Kernelized version of Fuzzy C-Means with a new distance metric (DKFCM-new), is proposed. It creates noiseless clusters by identifying noise points and assigning them to a separate cluster. In earlier work, the Density Oriented Fuzzy C-Means (DOFCM) algorithm was proposed with the Euclidean distance metric, which only considered the distance between the cluster centroid and the data points. In this paper, we improve the performance of DOFCM by incorporating a new distance measure that also considers the distance variation within a cluster to regularize the distance between a data point and the cluster centroid, and we present the kernel version of the method. Experiments are performed using two-dimensional synthetic data-sets, standard data-sets from previous papers such as the DUNN and Bensaid data-sets, and real-life high-dimensional data-sets such as the Wisconsin Breast Cancer data and the Iris data. The proposed method is compared with other kernel methods, with noise-resistant methods such as PCM, PFCM, CFCM and NC, and with credal partition based clustering methods such as ECM, RECM and CECM. Results show that the proposed algorithm significantly outperforms its earlier version and other competitive algorithms.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Clustering helps in finding natural boundaries in the data, whereas fuzzy clustering can be used to handle the problem of vague cluster boundaries. In fuzzy clustering, the requirement of a crisp partition of the data is replaced by the weaker requirement of a fuzzy partition, where the associations among data are represented by fuzzy relations. Fuzzy clustering can be applied to a wide variety of applications such as image segmentation, pattern recognition, object recognition, and customer segmentation; it deals with uncertainty, fuzziness and vagueness. The Fuzzy C-Means (FCM) algorithm, proposed by Bezdek (1981), is the first and most widely used clustering algorithm of this kind because it is robust to ambiguity and retains much more information than hard segmentation methods. FCM is an extension of the fuzzy ISODATA algorithm proposed by Dunn (1974). It has been successfully applied to feature analysis, clustering, and classifier design in fields such as astronomy, geology, medical imaging, target recognition, and image


segmentation. When the data is noisy, FCM wrongly classifies noisy objects because of their abnormal feature data. Various approaches have been proposed by researchers to compensate for this drawback of FCM. A related technique, PCM, proposed by Krishnapuram and Keller (1993), interprets clustering as a possibilistic partition; however, it can cause the clustering to get stuck in one or two identical clusters. To overcome the problem of identical clusters, Pal et al. (2005) introduced PFCM, which generates both membership and typicality values when clustering unlabeled data. PFCM fails to give the desired results if the data-set consists of unequally sized clusters with noise. Dave (Dave and Krishnapuram, 1997; Dave, 1991, 1993) introduced the concept of a noise cluster in Noise Clustering (NC), which collects outliers in a separate cluster. The problem with NC is that it does not identify outliers located in between the clusters, and it is not independent of the number of clusters for the same data-set (Kaur and Gosain, 2011, 2010). Chintalapudi and Kam (1998) proposed Credibilistic Fuzzy C-Means (CFCM), which introduced a credibility function to reduce the influence of outliers on cluster centroids. Although CFCM is sound in reducing the effect of outliers, most of the time it assigns some outliers to more than one cluster (Kaur and Gosain, 2011, 2010). Recently, a new concept of partition, the credal partition, developed in the framework of belief function theory, has been


introduced (Denoeux and Masson, 2004; Denoeux and Smets, 2006; Masson and Denoeux, 2008, 2009; Antoine et al., 2009). This concept generalizes the existing concepts of hard, fuzzy (probabilistic) and possibilistic partitions by allowing an object to belong to several subsets of classes. Several algorithms, Evidential CLUStering (EVCLUS) (Denoeux and Masson, 2004), Evidential c-means (ECM) (Masson and Denoeux, 2008), Relational Evidential c-means (RECM) (Masson and Denoeux, 2009) and Constrained Evidential c-means (CECM) (Antoine et al., 2009), have been proposed to derive such credal partitions from data and to increase robustness against outliers. All these algorithms are based upon the Dempster–Shafer theory of belief functions. They assign a basic belief assignment to each object in such a way that the degree of conflict between the masses given to any two objects reflects their dissimilarity. EVCLUS was the first incursion of belief functions into the cluster analysis domain. It was designed to handle relational data: it is applicable to both metric and non-metric dissimilarity data and does not use any explicit geometrical model of the data. It only postulates that the more similar two objects are, the more plausible it is that they belong to the same cluster. ECM is a direct extension of FCM and Dave's noise clustering and is only applicable to object data. Here each class is represented by a prototype, and the similarity between an object and a cluster is measured using the Euclidean metric. ECM is computationally more efficient than EVCLUS when applied to object data. RECM is the relational version of ECM in the framework of belief functions; it provides results similar to EVCLUS but is computationally much more efficient. Recently, CECM was proposed as another variant of ECM that takes pairwise constraints into account; these constraints are translated into the belief function framework and integrated into the cost function.

All the above techniques, except the NC approach and the credal partition based clustering methods, do not produce efficient clusters, because they include outliers in the final clusters and are not able to obtain noiseless clusters. The clustering output depends upon various parameters such as the distribution of points inside and outside the cluster, the shape of the cluster, and linear or non-linear separability. The effectiveness of a clustering method relies heavily on the choice of distance metric. FCM uses the Euclidean distance as its distance measure and thus can only detect hyper-spherical clusters. Researchers have proposed various other distance measures, such as the Mahalanobis distance and kernel based distance measures in the data space and in a high dimensional feature space, so that non-hyper-spherical/non-linear clusters can be detected (Zhang and Chen, 2003, 2004). Tsai and Lin (2011) proposed extensions of FCM, FCM-s and KFCM-s, by incorporating a new distance metric into FCM and its kernelized version. They incorporate the distance variation of each individual data group into the FCM method to regularize the distance between a data point and the cluster centroid, and are thereby able to detect non-hyper-spherical clusters with uneven densities for linear as well as non-linear separation. However, these methods suffer from the same problem as conventional FCM, i.e. they are not able to give efficient clustering results in the presence of noise.
Earlier we proposed a technique called Density Oriented Fuzzy C-Means (DOFCM) (Kaur and Gosain, 2011, 2010), which identifies outliers using the density of points in the data-set before creating clusters. DOFCM results in 'n+1' clusters, with 'n' good clusters and one invalid cluster containing the outliers. DOFCM is based on the idea that if outliers are not wanted in the clustering, their memberships should not be involved during clustering; we nullify the effect of outliers by assigning them zero membership value during clustering. In this paper, we further improve the performance of

DOFCM and present the Density oriented kernelized approach to fuzzy c-means using a new distance metric (DKFCM-new), which improves upon DOFCM by incorporating a kernel function and the distance measure proposed by Tsai and Lin. Visualizing the clustering results can help to quickly assimilate this information and provide insights that support and complement textual descriptions or statistical summaries. A standard way to display the result of clustering is the dendrogram, where the bottom row shows the initial elements and each subsequent row shows how clusters are combined. But even with this technique we lack insight into the distribution of the elements, so we need a representation with which a user can view any details desired. One such representation is the calendar. For effective data exploration, user interaction is as important as presentation, and the combination of cluster analysis with a calendar representation provides good opportunities for interaction (van Wijk and Edword, 1999).

The organization of the paper is as follows: Section 2 briefly reviews Fuzzy C-Means (FCM) and its variants, Kernel based FCM (KFCM), Density Oriented FCM (DOFCM), and FCM-s and KFCM-s. Section 3 describes the proposed algorithm, the Density oriented kernelized approach to fuzzy c-means using a new distance metric (DKFCM-new). Section 4 evaluates the performance of the proposed algorithm using synthetic, standard and real-life data-sets, followed by concluding remarks in Section 5.

2. Background information

2.1. The fuzzy c-means algorithm and its variants

This section briefly discusses the Fuzzy C-Means (FCM) algorithm and its variants: Kernel based FCM, Density Oriented FCM, and FCM-s and its kernelized version. In this paper, the data-set is denoted by X, where X = {x1, x2, x3, ..., xn} specifies an image with n pixels in M-dimensional space to be partitioned into c clusters. Centroids of clusters are denoted by $v_i$, and $d_{ik}$ is the distance between $x_k$ and $v_i$.

2.1.1. The fuzzy c-means

FCM is the most popular fuzzy clustering algorithm. It assumes that the number of clusters c is known a priori and minimizes the objective function

$$J_{FCM} = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^m d_{ik}^2 \qquad (1)$$

where $d_{ik} = \|x_k - v_i\|$ and $u_{ik}$ is the membership of pixel $x_k$ in cluster i, which satisfies the following relationship:

$$\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1,2,\ldots,n \qquad (2)$$

Here m is a constant known as the fuzzifier (or fuzziness index), which controls the fuzziness of the resulting partition; m = 2 is used in this paper. Any norm $\|\cdot\|$ can be used for calculating $d_{ik}$. Minimization of $J_{FCM}$ is performed by a fixed point iteration scheme known as the alternating optimization technique. The conditions for a local extreme of (1) and (2), derived using Lagrange multipliers, are

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left(d_{ik}/d_{jk}\right)^{2/(m-1)}} \quad \forall k,i \qquad (3)$$

where $1 \le i \le c$, $1 \le k \le n$, and

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^m x_k}{\sum_{k=1}^{n} u_{ik}^m} \quad \forall i \qquad (4)$$

The FCM algorithm iteratively optimizes $J_{FCM}(U,V)$ with the continuous update of U and V until $|U^{(l+1)} - U^{(l)}| \le \varepsilon$, where l is the number of iterations. FCM works fine for images that are not corrupted by noise, but if the image is noisy or distorted, it wrongly classifies noisy pixels because of their abnormal feature data (pixel intensity, in the case of images), resulting in incorrect memberships and improper segmentation.
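To make the alternating optimization of (1)-(4) concrete, the following NumPy sketch implements the update loop; the function name, array layout and random initialization are illustrative assumptions, not part of the original FCM specification.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-4, max_iter=100, seed=0):
    """Minimal FCM sketch: alternates the membership update (3)
    and the centroid update (4) until U stabilizes."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                      # enforce constraint (2)
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)           # eq. (4)
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)               # guard against division by zero
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)                        # eq. (3)
        if np.abs(U_new - U).max() <= eps:
            U = U_new
            break
        U = U_new
    return U, V
```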

2.1.2. Possibilistic C-Means clustering (PCM)

To avoid the noise sensitivity problem of FCM, Krishnapuram and Keller (1993) relaxed the column sum constraint

$$\sum_{k=1}^{c} u_{ki} = 1, \quad i = 1,2,\ldots,n \qquad (5)$$

and proposed a possibilistic approach to clustering, minimizing the objective function

$$J_{PCM}(U,V) = \sum_{k=1}^{c}\sum_{i=1}^{n} u_{ki}^m d_{ki}^2 + \sum_{k=1}^{c} \eta_k \sum_{i=1}^{n} (1 - u_{ki})^m \qquad (6)$$

where the $\eta_k$ are suitable positive numbers. The first term tries to keep the distances from the data points to the centroids as low as possible, while the second term forces the $u_{ki}$ to be as large as possible, thus avoiding the trivial solution. The updating of centroids is the same as in FCM, but the membership matrix of PCM is updated as

$$u_{ki} = \frac{1}{1 + \left(\|x_i - v_k\|^2/\eta_k\right)^{1/(m-1)}} \qquad (7)$$

PCM sometimes helps when the data is noisy. However, it is very sensitive to initialization and sometimes results in overlapping or identical clusters.

2.1.3. Possibilistic Fuzzy C-Means clustering (PFCM)

Pal et al. (2005) integrated the fuzzy approach with the possibilistic approach; hence PFCM has two types of memberships, viz. a possibilistic membership $t_{ki}$ that measures the absolute degree of typicality of a point in any particular cluster, and a fuzzy membership $u_{ki}$ that measures the relative degree of sharing of a point among the clusters. PFCM minimizes the objective function

$$J_{PFCM}(U,V,T) = \sum_{k=1}^{c}\sum_{i=1}^{n} \left(a\,u_{ki}^m + b\,t_{ki}^{\eta}\right) d_{ki}^2 + \sum_{k=1}^{c} \gamma_k \sum_{i=1}^{n} (1 - t_{ki})^{\eta} \qquad (8)$$

subject to the constraint

$$\sum_{k=1}^{c} u_{ki} = 1 \quad \forall i \qquad (9)$$

Here a > 0, b > 0, m > 1, and η > 1. The constants a and b define the relative importance of the fuzzy membership and typicality values in the objective function. Minimization of the objective function gives the following conditions:

$$u_{ki} = \frac{1}{\sum_{j=1}^{c} \left(d_{ki}/d_{ji}\right)^{2/(m-1)}} \quad \forall k,i \qquad (10)$$

$$t_{ki} = \frac{1}{1 + \left((b/\gamma_k)\,d_{ki}^2\right)^{1/(\eta-1)}} \quad \forall k \qquad (11)$$

and

$$v_k = \frac{\sum_{i=1}^{n} \left(a\,u_{ki}^m + b\,t_{ki}^{\eta}\right) x_i}{\sum_{i=1}^{n} \left(a\,u_{ki}^m + b\,t_{ki}^{\eta}\right)} \qquad (12)$$

Though PFCM is found to perform better than FCM and PCM, it fails to give the desired results when two highly unequal sized clusters with outliers are given.

2.1.4. Noise clustering (NC)

Noise clustering was introduced by Dave to overcome the major deficiency of the FCM algorithm, i.e. its noise sensitivity. He gave the concept of a ''noise prototype'', a universal entity that is always at the same distance from every point in the data-set. Let $v_k$ be the noise prototype and $x_i$ be any point in the data-set, with $v_k, x_i \in R^p$. Then the noise prototype distance $d_{ki}$ is given by

$$d_{ki} = \delta \quad \forall i \qquad (13)$$

The NC algorithm considers noise as a separate class. The membership $u_{*i}$ of $x_i$ in the noise cluster is defined as

$$u_{*i} = 1 - \sum_{k=1}^{c} u_{ki} \qquad (14)$$

NC reformulates the FCM objective function as

$$J_{NC}(U,V) = \sum_{k=1}^{c+1}\sum_{i=1}^{N} u_{ki}^m d_{ki}^2 \qquad (15)$$

where 'c+1' consists of 'c' good clusters and one noise cluster, and for the noise cluster, $k = * = c+1$,

$$\delta^2 = \lambda\left[\frac{\sum_{k=1}^{c}\sum_{i=1}^{N} d_{ki}^2}{Nc}\right] \qquad (16)$$

and the membership equation is

$$u_{ki} = \left(\sum_{j=1}^{c}\left(\frac{d_{ki}^2}{d_{ji}^2}\right)^{1/(m-1)} + \left(\frac{d_{ki}^2}{\delta^2}\right)^{1/(m-1)}\right)^{-1} \qquad (17)$$
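The noise distance (16) and the extended membership update (17) can be sketched as follows; treating the noise cluster as an extra row of the squared-distance matrix is an implementation convenience assumed here, not prescribed by the original formulation.

```python
import numpy as np

def noise_distance(d, lam=0.6):
    """delta^2 = lam * average squared point-to-centroid distance, eq. (16).
    d: (c, n) matrix of distances between c centroids and n points."""
    c, n = d.shape
    return lam * (d ** 2).sum() / (n * c)

def nc_memberships(d, delta2, m=2.0):
    """Membership update (17): the noise cluster enters as one more
    'centroid' at constant squared distance delta^2 from every point."""
    c, n = d.shape
    d2 = np.vstack([d ** 2, np.full((1, n), delta2)])   # (c+1, n)
    d2 = np.fmax(d2, 1e-12)                             # guard against zeros
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)        # rows: c good clusters + noise row
```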

Noise clustering is a better approach than FCM, PCM, and PFCM. It identifies outliers in a separate cluster but does not produce efficient clusters, because it fails to identify outliers located in between the regular clusters (refer to Section 4.2). Its main objective is to reduce the influence of outliers on the clusters rather than to identify them. Real-life data-sets usually contain cluster structures that differ from our assumptions, so a clustering technique should be independent of the number of clusters for the same data-set. In NC, the noise distance is given by (16); it depends upon the distance measure, the number of assumed clusters, and λ, the multiplier used to obtain δ from the average of distances. From the equation, it follows that if the number of clusters is increased, δ assumes high values. NC assigns to the noise cluster only those points whose distance from the regular clusters exceeds the noise distance δ. If the number of clusters is increased for the same data-set, NC does not detect outliers, because in that scenario the average distance between points and regular clusters decreases while the noise distance remains almost constant or assumes relatively high values (Rehm et al., 2007). The performance of NC also degrades when the number of outliers is increased for the same data-set (refer to Section 4.2).

2.1.5. Credibilistic Fuzzy C-Means (CFCM)

Chintalapudi and Kam (1998) proposed Credibilistic Fuzzy C-Means (CFCM) to reduce the effect of outliers by introducing a new variable, credibility. Credibility is a function that assumes low values for outliers and high values for non-outliers. CFCM defines credibility as

$$c_k = 1 - \frac{(1-\theta)\,\alpha_k}{\max_{j=1\ldots n} \alpha_j}, \quad 0 \le \theta \le 1 \qquad (18)$$

where $\alpha_k = \min_{i=1\ldots c}(d_{ik})$ is the distance of vector $x_k$ from its nearest centroid. The farther $x_k$ is from its nearest centroid, the lower its credibility. The parameter θ controls the minimum value of $c_k$, so that the noisiest vector gets a credibility equal to θ; θ ≠ 0 is needed only if the user wishes to set the credibility of the noisiest vector a priori. Setting θ = 1 reduces the scheme to FCM, while θ = 0 assigns zero membership to the noisiest vector. CFCM partitions the data-set by minimizing

$$J_{CFCM}(U,V) = \sum_{k=1}^{c}\sum_{i=1}^{n} u_{ki}^m d_{ki}^2 \qquad (19)$$

subject to the constraint

$$\sum_{i=1}^{c} u_{ik} = c_k, \quad k = 1,\ldots,n \qquad (20)$$

Since credibility is very small for outliers, the memberships generated by CFCM for outliers are smaller than those generated by FCM. Although CFCM is superior to FCM, PCM, and PFCM, we observed that most of the time it assigns some outlier points to more than one cluster (refer to Section 4.2). Another problem, identified by Chen and Wang (1999), is that credibility is estimated from unstable prototypes that are still undergoing convergence, so the result is affected by the choice of initial prototypes. Moreover, CFCM does not separate the outliers, so noiseless clusters are not obtained; its main emphasis is only on reducing the effect of outliers on the regular clusters.
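A minimal sketch of the credibility computation (18); the (c, n) distance-matrix layout is an assumption for illustration.

```python
import numpy as np

def credibility(d, theta=0.0):
    """Credibility of each point, eq. (18): alpha_k is the distance of x_k
    to its nearest centroid; the noisiest point gets credibility theta.
    d: (c, n) distance matrix between c centroids and n points."""
    alpha = d.min(axis=0)                     # distance to nearest centroid
    return 1.0 - (1.0 - theta) * alpha / alpha.max()
```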

2.2. Kernel based approach

The kernel function can be applied to any algorithm that solely depends on the dot product between two vectors: wherever a dot product is used, it is replaced by a kernel function. When this is done, linear algorithms are transformed into non-linear algorithms, equivalent to their linear originals operating in the range space of a feature space. Because kernels are used, the mapping never needs to be explicitly computed. This is highly desirable, as the higher-dimensional feature space could even be infinite-dimensional and thus infeasible to compute. A kernel function generalizes the distance metric by measuring the distance between two data points as if they were mapped into a high dimensional space in which they are more clearly separable. By employing a mapping function $\Phi(x)$, which defines the non-linear transformation $x \to \Phi(x)$, a non-linearly separable structure in the original data space can possibly be mapped into a linearly separable one in the higher dimensional feature space.

Given an unlabeled data set X = {x1, x2, ..., xn} in the p-dimensional space $R^p$, let $\Phi$ be a non-linear mapping function from this input space to a high dimensional feature space H:

$$\Phi : R^p \to H, \quad x \to \Phi(x)$$

The key notion in kernel based learning is that the mapping function $\Phi$ need not be explicitly specified. The dot product in the high dimensional feature space can be calculated through the kernel function $K(x_i, x_j)$ in the input space $R^p$:

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \qquad (21)$$

Consider the following example. For p = 2 and the mapping function

$$\Phi : R^2 \to H = R^3, \quad (x_{i1}, x_{i2}) \to \left(x_{i1}^2,\, x_{i2}^2,\, \sqrt{2}\,x_{i1}x_{i2}\right)$$

the dot product in the feature space H is calculated as

$$\Phi(x_i) \cdot \Phi(x_j) = \left(x_{i1}^2, x_{i2}^2, \sqrt{2}\,x_{i1}x_{i2}\right) \cdot \left(x_{j1}^2, x_{j2}^2, \sqrt{2}\,x_{j1}x_{j2}\right) = (x_i \cdot x_j)^2 = K(x_i, x_j) \qquad (22)$$

where the K-function is the square of the dot product in the input space. This example shows that the kernel function makes it possible to calculate the value of the dot product in the feature space H without explicitly calculating the mapping function $\Phi$. Some examples of kernel functions are:

Example 1: Polynomial kernel: $K(x_i, x_j) = (x_i \cdot x_j + c)^d$, where $c \ge 0$, $d \in N$.
Example 2: Gaussian kernel: $K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)$, where $\sigma > 0$.
Example 3: Radial basis kernel: $K(x_i, x_j) = \exp\left(-\sum |x_i^a - x_j^a|^b / \sigma^2\right)$, where $\sigma, a, b > 0$. The RBF kernel with a = 1, b = 2 reduces to the Gaussian form.
Example 4: Hyper tangent kernel: $K(x_i, x_j) = 1 - \tanh\left(-\|x_i - x_j\|^2 / \sigma^2\right)$, where $\sigma > 0$.
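The four example kernels can be written directly in NumPy as below; the parameter defaults are illustrative assumptions, and the sign inside tanh for the hyper tangent kernel follows the form commonly used with kernelized FCM.

```python
import numpy as np

def polynomial_kernel(x, y, c=1.0, d=2):
    # Example 1: (x . y + c)^d, c >= 0, d a natural number
    return (x @ y + c) ** d

def gaussian_kernel(x, y, sigma=1.0):
    # Example 2: exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def rbf_kernel(x, y, sigma=1.0, a=1.0, b=2.0):
    # Example 3: exp(-sum |x^a - y^a|^b / sigma^2); a=1, b=2 gives a Gaussian form
    return np.exp(-np.sum(np.abs(x ** a - y ** a) ** b) / sigma ** 2)

def hyper_tangent_kernel(x, y, sigma=1.0):
    # Example 4: 1 - tanh(-||x - y||^2 / sigma^2), so K(x, x) = 1
    return 1.0 - np.tanh(-np.sum((x - y) ** 2) / sigma ** 2)
```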

2.3. Kernel version of Fuzzy C-Means (KFCM)

KFCM is the kernel version of FCM: it exploits a kernel function for calculating the distance of a data point from the cluster centers, i.e. it maps the data points from the input space to a high dimensional space. KFCM modifies the objective function of FCM with the mapping $\Phi$ as

$$J_{KFCM} = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^m \left\|\Phi(x_k) - \Phi(v_i)\right\|^2 \qquad (23)$$

where $\|\Phi(x_k) - \Phi(v_i)\|^2$ is the squared distance between $\Phi(x_k)$ and $\Phi(v_i)$. The distance in the feature space is calculated through the kernel in the input space as follows:

$$\Phi d_{ki}^2 = \left\|\Phi(x_k) - \Phi(v_i)\right\|^2 = \left(\Phi(x_k) - \Phi(v_i)\right) \cdot \left(\Phi(x_k) - \Phi(v_i)\right) = K(x_k, x_k) - 2K(x_k, v_i) + K(v_i, v_i) \qquad (24)$$

If the kernel width is a positive number, then K(x,x) = 1. Thus (23) can be written as

$$J_{KFCM} = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^m \left(1 - K(x_k, v_i)\right) \qquad (25)$$

Minimizing (25) under the constraint on U, we get

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left(\Phi d_{ki}^2 / \Phi d_{kj}^2\right)^{1/(m-1)}} \qquad (26)$$

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^m x_k}{\sum_{k=1}^{n} u_{ik}^m} \qquad (27)$$
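One KFCM iteration, combining the kernel-induced distance (24) with the updates (26) and (27), might look as follows with a Gaussian kernel (so that K(x,x) = 1); the function name and the small constant guarding against division by zero are assumptions.

```python
import numpy as np

def kfcm_step(X, V, m=2.0, sigma=2.0):
    """One KFCM iteration sketch: with a Gaussian kernel, K(x,x)=1,
    so the kernel-induced distance (24) reduces to 1 - K as in (25)."""
    K = np.exp(-((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
               / (2.0 * sigma ** 2))                  # K(x_k, v_i), shape (c, n)
    d2 = np.fmax(1.0 - K, 1e-12)                      # kernel distance, eqs. (24)-(25)
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    U = 1.0 / ratio.sum(axis=1)                       # eq. (26)
    Um = U ** m
    V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)  # eq. (27)
    return U, V_new
```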

2.4. Density Oriented Fuzzy C-Means (DOFCM)

DOFCM separates noise into a different cluster. It modifies the membership function of FCM and results in 'n+1' clusters, with 'n' good clusters and one invalid cluster of outliers. It identifies outliers on the basis of the density of points in the data. The neighborhood membership of a point 'i' in the data-set 'X' is defined as

$$M^{i}_{neighborhood} = \frac{Z^{i}_{neighborhood}}{Z_{max}} \qquad (28)$$

where $Z^{i}_{neighborhood}$ is the number of points in the neighborhood of point i and $Z_{max}$ is the maximum number of points in the neighborhood of any point in the data-set. The neighborhood membership of each point in the data-set 'X' is calculated as per (28), and from the complete range of neighborhood membership values, depending on the density of points in the data-set, the threshold value 'α' is selected. A point is considered an outlier if its neighborhood membership is less than 'α'. Let 'i' be a point in the data-set 'X'; then

$$M^{i}_{neighborhood} \;\begin{cases} < \alpha & \text{outlier} \\ \ge \alpha & \text{non-outlier} \end{cases} \qquad (29)$$

'α' can be selected from the range of $M_{neighborhood}$ values after observing the density of points in the data-set and should be close to zero. After identifying the outliers, clustering follows. DOFCM reformulates the FCM objective function as

$$J_{DOFCM}(U,V) = \sum_{k=1}^{c+1}\sum_{i=1}^{n} u_{ki}^m d_{ki}^2 \qquad (30)$$

where $d_{ki} = \|x_k - v_i\|$. In this equation the number of clusters is changed from 'c' to 'c+1' because the noise is separated into one cluster: if there are 'c' good clusters, then 'c+1' consists of the 'c' good clusters plus one noise cluster, hereafter indexed by c+1. The membership function $u_{ki}$ is modified as

$$u_{ki} = \begin{cases} \dfrac{1}{\sum_{j=1}^{c} \left(d_{ki}/d_{ji}\right)^{2/(m-1)}} \;\forall k,i & \text{if } M^{i}_{neighborhood} \ge \alpha \\[2mm] 0 & \text{if } M^{i}_{neighborhood} < \alpha \end{cases} \qquad (31)$$

Here m is the fuzziness index. The centroid update is the same as in FCM, eq. (4). The constraint on the fuzzy membership is extended to

$$0 \le \sum_{k=1}^{c} u_{ki} \le 1, \quad i = 1,2,\ldots,n \qquad (32)$$

instead of the following, which was used in the conventional FCM algorithm:

$$\sum_{k=1}^{c} u_{ki} = 1 \qquad (33)$$

2.5. Fuzzy c-means with new distance metric (FCM-s)

FCM-s was proposed by Tsai and Lin (2011) by incorporating a new distance measure into conventional FCM. The new distance metric is defined as

$$\hat{d}_{ki}^2 = \frac{\|x_k - v_i\|^2}{s_i} \qquad (34)$$

where $s_i$ is the weighted mean distance of cluster i, given by

$$s_i = \left(\frac{\sum_{k=1}^{n} u_{ki}^m \|x_k - v_i\|^2}{\sum_{k=1}^{n} u_{ki}^m}\right)^{1/2} \qquad (35)$$

FCM-s minimizes the objective function

$$J_{FCM\text{-}s} = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^m\, \frac{\|x_k - v_i\|^2}{s_i} \qquad (36)$$

The membership and modified cluster center equations are

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left(\hat{d}_{ik}/\hat{d}_{jk}\right)^{2/(m-1)}} \quad \forall k,i \qquad (37)$$

where $1 \le i \le c$, $1 \le k \le n$, and

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^m x_k}{\sum_{k=1}^{n} u_{ik}^m} \quad \forall i \qquad (38)$$
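The weighted mean distance (35) and the regularized distance (34) can be computed in a few lines; the (c, n) layout and the guard constant are assumptions for illustration.

```python
import numpy as np

def fcm_s_distances(X, V, U, m=2.0):
    """Regularized distance (34): squared Euclidean distance divided by the
    weighted mean distance s_i of each cluster, eq. (35)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (c, n)
    Um = U ** m
    s = np.sqrt((Um * d2).sum(axis=1) / Um.sum(axis=1))       # eq. (35), shape (c,)
    s = np.fmax(s, 1e-12)                                     # avoid division by zero
    return d2 / s[:, None]                                    # eq. (34)
```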

2.6. Kernel Fuzzy c-means with new distance metric (KFCM-s)

Conventional FCM and FCM-s can only deal with linearly separable data points in the observation space. The observed data-set can be transformed to a higher dimensional feature space by applying a non-linear mapping function to achieve non-linear separation. KFCM-s incorporates the new distance measure into the mapped feature space. The new distance metric is defined as

$$\Phi\hat{d}_{ki}^2 = \frac{\|\Phi(x_k) - \Phi(v_i)\|^2}{\Phi s_i} = \frac{\Phi d_{ki}^2}{\Phi s_i} \qquad (39)$$

where $\Phi s_i$ is the weighted mean distance of cluster i in the mapped feature space, given by

$$\Phi s_i = \left(\frac{\sum_{k=1}^{n} u_{ki}^m\, \Phi d_{ki}^2}{\sum_{k=1}^{n} u_{ki}^m}\right)^{1/2} \qquad (40)$$

KFCM-s minimizes the objective function

$$J_{KFCM\text{-}s} = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^m \left\|\Phi(x_k) - \Phi(v_i)\right\|^2 \qquad (41)$$

Here $u_{ik}$ is the membership of pixel $x_k$ in cluster i, which satisfies the relationship

$$\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1,2,\ldots,n \qquad (42)$$

and $\|\Phi(x_k) - \Phi(v_i)\|^2$ is the squared distance between $\Phi(x_k)$ and $\Phi(v_i)$. As per (24), minimizing (41) w.r.t. U gives

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left(\Phi\hat{d}_{ki}^2 / \Phi\hat{d}_{kj}^2\right)^{1/(m-1)}} \qquad (43)$$

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^m x_k}{\sum_{k=1}^{n} u_{ik}^m} \qquad (44)$$

The KFCM-s algorithm allows the clustering of non-hyper-spherically shaped data with uneven densities in the mapped feature space and achieves non-linear separation of the data in the observation space. Although the results of FCM-s and KFCM-s are efficient for noiseless data, they do not perform well in the presence of noise.

3. The proposed technique

3.1. Density oriented kernelized approach to Fuzzy C-Means with new distance metric (DKFCM-new)

Having described KFCM, KFCM-s and DOFCM, we are now in a position to construct the density oriented kernel version of FCM using the new distance measure. DKFCM-new is designed to remove the effect of noise on the cluster centroid locations and on the resulting clusters, in the data space as well as in a high dimensional feature space. Like the NC technique, DKFCM-new results in 'n+1' clusters, with 'n' good clusters and one noise cluster. The proposed algorithm


first identifies outliers and then applies the clustering algorithm to construct noiseless clusters.

3.2. Identification of outliers

DKFCM-new identifies outliers on the basis of the density of points in the data-set, i.e. on the basis of the number of other points in a point's neighborhood. DKFCM-new defines a density factor, called neighborhood membership, which measures the density around an object in relation to its neighborhood. As per the technique, the neighborhood of a given radius around each point in the data-set has to contain at least a minimum number of other points for that point to be a good point (non-outlier). The shape of the neighborhood is determined by the choice of distance function between two points x1 and x2, denoted dist(x1, x2); e.g. with the Manhattan distance in 2D space the neighborhood shape is rectangular, while with the Euclidean distance it is spherical. The neighborhood membership of a point 'i' in the data-set 'X' is defined as

$$M^{i}_{neighborhood} = \frac{Z^{i}_{neighborhood}}{Z_{max}} \qquad (45)$$

where $Z^{i}_{neighborhood}$ is the number of points in the neighborhood of point i and $Z_{max}$ is the maximum number of points in the neighborhood of any point in the data-set. A point 'q' in the neighborhood of point 'i' satisfies

$$\left\{q \in X \mid dist(i,q) \le r_{neighborhood}\right\} \qquad (46)$$

where $r_{neighborhood}$ is the radius of the neighborhood and dist(i,q) is the distance between points 'i' and 'q'. The neighborhood membership of each point in the data-set 'X' is calculated as per (45), and from the complete range of neighborhood membership values, depending on the density of points in the data-set, the threshold value 'α' is selected. A point is considered an outlier if its neighborhood membership is less than 'α'. So, as per this analysis, an outlier in DKFCM-new is defined as a point in the data-set 'X' whose neighborhood membership is less than the threshold value 'α'. Let 'i' be a point in the data-set 'X'; then

$$M^{i}_{neighborhood} \;\begin{cases} < \alpha & \text{outlier} \\ \ge \alpha & \text{non-outlier} \end{cases} \qquad (47)$$

'α' can be selected from the range of $M_{neighborhood}$ values after observing the density of points in the data-set and should be close to zero.

3.2.1. Selection of the threshold value 'α'

Ideally, a point would be an outlier only if no other point is present in its neighborhood, i.e. when its neighborhood membership is zero, or threshold value α = 0. However, in the proposed scheme a point is considered an outlier when its neighborhood membership is less than 'α', where 'α' is a critical parameter for identifying outliers. Its value depends upon the nature of the data-set, i.e. how dense the data-set is, and will vary for different data-sets. The steps to select the threshold value are:

Step 1: A neighborhood radius $r_{neighborhood}$ is calculated, as per Ester et al. (1996).
Step 2: Considering the circular region with radius $r_{neighborhood}$, the number of points in the local neighborhood of every point in the data-set is found, i.e. $Z_{neighborhood}$.
Step 3: From the total range of $Z_{neighborhood}$ values, the point in the data-set having the maximum $Z_{neighborhood}$ value is selected, i.e. $Z_{max}$ (this is done to normalize the local membership values).
Step 4: Local membership values, i.e. $M_{neighborhood}$, for the data-set are calculated by dividing $Z_{neighborhood}/Z_{max}$ (this normalizes the local membership values between 0 and 1).
Step 5: By visually observing the data-set, we select a particular $Z_{neighborhood}$ value from the whole range of $Z_{neighborhood}$ values, and its corresponding $M_{neighborhood}$ value is selected as the threshold value (α).

This concept is best realized through an example; a condensed code sketch follows after this subsection. Let x1, x2, x3, x4, k1, k2, p1, p2, and p3 be points in the data-set as shown in Fig. 1. x1 has two points in its neighborhood (k1 and k2), x2 has three points (p1, p2, and p3), x3 has 15 points, and x4 has 9 points. As x1, x2, k1, k2, p1, p2, and p3 are far from the dense part of the data-set, these points should obviously be outliers; but under the ideal condition, i.e. a point is an outlier only if no other point is present in its neighborhood, they would not be outliers, which cannot be tolerated with real-life data-sets. To tackle this problem, the proposed algorithm uses the threshold variable 'α', selected after observing the data-set carefully. With this condition, from Fig. 1, after visual observation we have decided that a point will be considered an outlier if the number of points in its neighborhood is less than four, so we have selected $Z_{neighborhood} = 4$. For this example we assume the $Z_{max}$ value: let $Z_{max} = 15$. The threshold value 'α' is then calculated as

$$\text{Threshold value}\;(\alpha) = \frac{Z_{neighborhood}}{Z_{max}} = \frac{4}{15} = 0.266$$

Hence, all points whose neighborhood membership is less than 0.266 will be outliers. With this condition, the calculations for the above points are:

Point x1: $M_{neighborhood}$ = 2/15 = 0.133, which is less than α, i.e. 0.266. Hence, x1 is an outlier.
Point x2: $M_{neighborhood}$ = 3/15 = 0.20, which is less than α, i.e. 0.266. Hence, x2 is an outlier.
Point x3: $M_{neighborhood}$ = 15/15 = 1.0, which is more than α, i.e. 0.266. Hence, x3 is not an outlier.
Point x4: $M_{neighborhood}$ = 9/15 = 0.60, which is more than α, i.e. 0.266. Hence, x4 is not an outlier.
Point k1: $M_{neighborhood}$ = 2/15 = 0.133, which is less than α, i.e. 0.266. Hence, k1 is an outlier.
Point k2: $M_{neighborhood}$ = 2/15 = 0.133, which is less than α, i.e. 0.266. Hence, k2 is an outlier.
Point p1: $M_{neighborhood}$ = 1/15 = 0.066, which is less than α, i.e. 0.266. Hence, p1 is an outlier.
Point p2: $M_{neighborhood}$ = 3/15 = 0.20, which is less than α, i.e. 0.266. Hence, p2 is an outlier.
Point p3: $M_{neighborhood}$ = 3/15 = 0.20, which is less than α, i.e. 0.266. Hence, p3 is an outlier.

Hence x1, x2, k1, k2, p1, p2, and p3 are outliers, which is also visually verified.

Fig. 1. The neighborhood range.

Let us also observe this with the synthetic data-set D45 (a data-set with 45 points) from Dave (1991). Fig. 2 shows the identification of outliers for D45 as the threshold value 'α' changes. We observe from Fig. 2 and Table 1 that large values of 'α' lead to more compact clusters with a larger number of outliers. As α → 0, DKFCM-new behaves as KFCM-s. Proper selection of 'α' provides better results.

Fig. 2. Outlier identification by varying threshold value 'α'.

Table 1 Effect of changing the threshold value.

Data set | Zmax* | Threshold value 'α' | No. of outliers
D45 | 15 | 0.0 | 6
D45 | 15 | 0.09 | 9
D45 | 15 | 0.14 | 10
D45 | 15 | 0.21 | 12

* Zmax is the maximum number of points in the neighborhood of any other point in the data-set.

After identifying the outliers, the clustering process is carried out as follows.
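Steps 1-5 can be condensed into the following sketch, which echoes the worked example above when z_thresh = 4; the brute-force pairwise distance computation and the exclusion of each point from its own neighborhood count are assumptions.

```python
import numpy as np

def identify_outliers(X, r_neighborhood, z_thresh):
    """Neighborhood membership (45) and outlier rule (47).
    A point is an outlier if fewer than z_thresh points fall inside
    its r_neighborhood ball (threshold alpha = z_thresh / z_max)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    z = (dist <= r_neighborhood).sum(axis=1) - 1   # exclude the point itself
    z_max = z.max()
    M = z / z_max                                   # eq. (45)
    alpha = z_thresh / z_max                        # as in the worked example
    return M < alpha, M, alpha                      # boolean outlier mask
```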

3.3. Clustering process

3.3.1. Formulation

We construct the kernel version of DOFCM with the new distance metric and modify its objective function with the mapping $\Phi$ as follows:

$$J_{DKFCM\text{-}new}(U,V) = \sum_{i=1}^{c+1}\sum_{k=1}^{n} u_{ki}^m\, \Phi\hat{d}_{ki}^2 \qquad (48)$$

where $u_{ki}$ is the membership of pixel $x_k$ in cluster i and

$$\Phi\hat{d}_{ki}^2 = \frac{\|\Phi(x_k) - \Phi(v_i)\|^2}{\Phi s_i} = \frac{\Phi d_{ki}^2}{\Phi s_i} \qquad (49)$$

From (49), the DKFCM-new objective function can be re-written as

$$J_{DKFCM\text{-}new}(U,V) = \sum_{i=1}^{c+1}\sum_{k=1}^{n} u_{ki}^m\, \frac{\|\Phi(x_k) - \Phi(v_i)\|^2}{\Phi s_i} \qquad (50)$$

where $\Phi s_i$ is the weighted mean distance of cluster i in the mapped feature space, given by

$$\Phi s_i = \left(\frac{\sum_{k=1}^{n} u_{ki}^m\, \Phi d_{ki}^2}{\sum_{k=1}^{n} u_{ki}^m}\right)^{1/2} \qquad (51)$$

and $\|\Phi(x_k) - \Phi(v_i)\|^2$ is the squared distance between $\Phi(x_k)$ and $\Phi(v_i)$. The distance in the feature space is calculated through the kernel in the input space as follows:

$$\Phi d_{ki}^2 = \left\|\Phi(x_k) - \Phi(v_i)\right\|^2 = \left(\Phi(x_k) - \Phi(v_i)\right) \cdot \left(\Phi(x_k) - \Phi(v_i)\right) = K(x_k, x_k) - 2K(x_k, v_i) + K(v_i, v_i) \qquad (52)$$

As we adopt the radial basis kernel in the proposed technique,

$$K(x,y) = \exp\left(-\frac{\sum |x_i^a - y_i^a|^b}{h^2}\right), \quad a,b > 0 \qquad (53)$$

where h is the kernel width, a positive number; then K(x,x) = 1. Thus (50) can be written as

$$J_{DKFCM\text{-}new}(U,V) = \sum_{i=1}^{c+1}\sum_{k=1}^{n} u_{ki}^m\, \frac{1 - K(x_k, v_i)}{\Phi s_i} \qquad (54)$$

i.e.

$$J_{DKFCM\text{-}new}(U,V) = \sum_{i=1}^{c+1}\sum_{k=1}^{n} u_{ik}^m\, \frac{1 - \exp\left(-\sum |x_k^a - v_i^a|^b / h^2\right)}{\Phi s_i} \qquad (55)$$

Given a set of points X, we minimize $J_{DKFCM\text{-}new}$ in order to determine $u_{ki}$ and $v_i$. We adopt an alternating optimization approach and need the following theorem.

Theorem 1. Minimizing (54) under the constraint on U gives the necessary condition

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left(\Phi d_{ki}^2 / \Phi d_{kj}^2\right)^{1/(m-1)}} \qquad (56)$$

But as per the definition of DKFCM-new, we assign zero membership to the points identified as outliers. So the modified membership equation for DKFCM-new is

$$u_{ki} = \begin{cases} \dfrac{1}{\sum_{j=1}^{c} \left(\Phi\hat{d}_{ki}^2 / \Phi\hat{d}_{kj}^2\right)^{1/(m-1)}} \;\forall k,i & \text{if } M^{i}_{neighborhood} \ge \alpha \\[2mm] 0 & \text{if } M^{i}_{neighborhood} < \alpha \end{cases} \qquad (57)$$

Here the constraint on the fuzzy membership is extended to

$$0 \le \sum_{k=1}^{c} u_{ki} \le 1, \quad i = 1,2,\ldots,n \qquad (58)$$

and the cluster centers are calculated as

$$v_i = \frac{\sum_{k=1}^{n} u_{ki}^m\, K(x_k, v_i)\, x_k}{\sum_{k=1}^{n} u_{ki}^m\, K(x_k, v_i)} \qquad (59)$$

Proof. We differentiate $J_{DKFCM\text{-}new}$ with respect to $u_{ki}$ and $v_k$ and set the derivatives to zero; this yields (56) and (59). The details are given in Appendix A.

The steps of the proposed algorithm, the density oriented kernelized approach to fuzzy c-means incorporating the new distance measure (DKFCM-new), are:

Input parameters: data-set (X), number of clusters (i = c+1), number of iterations, stopping criterion (ε), fuzziness index (m).
Output: cluster centroid matrix, outlier vector, and membership matrix.

Identification of outliers
Step 1: for i = 1, 2, 3, ..., n do:
(a) Calculate the number of points in the neighborhood of each point, i.e. $Z^{i}_{neighborhood}$
(b) Select $Z_{max}$
(c) Compute the neighborhood membership $M^{i}_{neighborhood}$ for each point using (45)
Step 2: Select the threshold value 'α' from the whole range of neighborhood membership values, based upon the density of points in the data-set.
Step 3: With the given value of 'α', identify the outliers using (47).

Clustering process
Step 4: Determine the initial centroids $v_k$.
Step 5: Initialize the memberships $u_{ki}$ and set the memberships of all the outliers to zero.
Step 6: for n = 1, 2, 3, ..., max_iter do:
(a) Update all centroids $v_i^{n}$ using (59)
(b) Update all membership values $u_{ki}$ using (57)
(c) Compute the objective function $O^{n}$ using (55)
(d) Compute $E^{n} = \max|O^{n} - O^{n-1}|$; if $E^{n} \le \varepsilon$, stop; else n = n+1
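The whole pipeline (Steps 1-6) might be sketched as follows, assuming the RBF kernel (53) with a = 1, b = 2 and random initialization of the memberships; the initialization scheme and the guards against division by zero are assumptions, not part of the algorithm statement above.

```python
import numpy as np

def dkfcm_new(X, c, alpha, r_nb, m=2.0, h=10.0, eps=1e-4, max_iter=100, seed=0):
    """DKFCM-new sketch: Steps 1-3 flag outliers via (45) and (47);
    Steps 4-6 run the kernelized updates (57) and (59) with the
    cluster-wise normalizer (51). Kernel: RBF, a=1, b=2, so K(x,x)=1."""
    n = X.shape[0]
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    z = (dist <= r_nb).sum(axis=1) - 1          # neighborhood counts
    good = (z / z.max()) >= alpha               # eq. (47): non-outlier mask
    rng = np.random.default_rng(seed)
    V = X[rng.choice(np.where(good)[0], c, replace=False)]   # Step 4
    U = rng.random((c, n))
    U /= U.sum(axis=0)
    U[:, ~good] = 0.0                           # Step 5: outliers get 0
    for _ in range(max_iter):
        K = np.exp(-((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) / h ** 2)
        d2 = np.fmax(1.0 - K, 1e-12)            # kernel distance, (52) with K(x,x)=1
        Um = U ** m
        s = np.sqrt((Um * d2).sum(axis=1) / np.fmax(Um.sum(axis=1), 1e-12))
        d2n = d2 / np.fmax(s, 1e-12)[:, None]   # normalized distance, eqs. (49)/(51)
        ratio = (d2n[:, None, :] / d2n[None, :, :]) ** (1.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)         # first case of eq. (57)
        U_new[:, ~good] = 0.0                   # second case of eq. (57)
        W = (U_new ** m) * K
        V = (W @ X) / np.fmax(W.sum(axis=1, keepdims=True), 1e-12)   # eq. (59)
        if np.abs(U_new - U).max() <= eps:
            U = U_new
            break
        U = U_new
    return U, V, ~good    # memberships, centroids, outlier mask
```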

4. Results and simulations

In the first subsection, we compare the proposed method with five methods: FCM, KFCM, DOFCM, FCM-s, and KFCM-s. In the second subsection, we compare DKFCM-new with other well known noise-resistant methods, namely PCM, PFCM, CFCM and NC. In the last subsection, the proposed method is compared with credal partition based evidential clustering methods. Experiments are implemented and simulated using MATLAB Version 7.0 on a Core 2 Duo processor, 2.0 GHz with 4 GB RAM. We used the following common parameters: m = 2, which is a common choice for fuzzy clustering; ε = 0.0001; maximum iterations = 100.

4.1. Comparison with FCM, KFCM, DOFCM, FCM-s and KFCM-s

Example 1. Data-set: Diamond data-sets D11, D12 (referred from Pal et al. (2005)). D11 is a noiseless data-set of points $\{x_i\}_{i=1}^{11}$; D12 is the union of D11 and an outlier x12.
Algorithms: FCM, KFCM, DOFCM, FCM-s, KFCM-s and DKFCM-new
Number of clusters: 2 (identical data with noise)

Fig. 3 shows the clustering results of FCM, KFCM, FCM-s, KFCM-s, DOFCM and DKFCM-new; the '*' symbol shows centroids. Table 2 lists the centroids generated by the algorithms. It is observed from the figure that FCM could not detect the clusters; its performance is badly affected by the presence of even a single outlier. FCM-s detected the clusters, but the centroids are diverted towards the outlier. The centroids generated by KFCM and KFCM-s are not much affected by the presence of noise, but KFCM allocated the noise point to both clusters, as indicated in the figure, and the final clusters include the noise point because these methods do not identify outliers. DOFCM and DKFCM-new identified the noise point (marked with the symbol ○) and produce final clusters without noise. It is observed from Fig. 3 and Table 2 that DKFCM-new detected almost the original centroid locations, and its performance is better than the other algorithms and its earlier version. The ideal (true) centroids of data-set D11 are

$$V_{ideal} = \begin{pmatrix} -3.34 & 0 \\ 3.34 & 0 \end{pmatrix}$$

To show the effectiveness of the proposed algorithm, we also calculated the error $E_* = \|V_{ideal} - V_*\|^2$, where * is FCM/KFCM/FCM-s/KFCM-s/DOFCM/DKFCM-new. Table 3 lists the error percentages.

Example 2. Data-set: DUNN, 2-dimensional square data (Dunn, 1974) (142 points)
Algorithms: FCM, KFCM, DOFCM, FCM-s, KFCM-s and DKFCM-new
Number of clusters: 2

The DUNN square data-set with noise is used in this example. It consists of one small and one big cluster of square shape. We have increased the number of core points to increase the density of the data-set. Fig. 4(a) and (b) show the original and noisy data-sets. Fig. 4(c)-(h) show the clustering results of the algorithms, and Table 2 lists their centroid locations. It is observed that the results of FCM in the data space, and with the new distance incorporated, are not very good, whereas the kernelized FCM methods improve the performance in comparison to the FCM methods. But these algorithms are not able to identify outliers and hence do not produce the original clusters. DOFCM and DKFCM-new identified the outliers, and the cluster locations generated by DKFCM-new are closest to the ideal centroids. The ideal (true) centroids of the DUNN data-set are

$$V_{ideal} = \begin{pmatrix} 5.25 & 0.25 \\ 17 & 0 \end{pmatrix}$$

Table 3 lists the error percentages.

Example 3. Data-set: Bensaid (Bensaid et al., 1996) 2-dimensional data (213 points)
Algorithms: FCM, KFCM, DOFCM, FCM-s, KFCM-s and DKFCM-new
Number of clusters: 3 clusters, with 2 small and 1 big cluster

Bensaid's two-dimensional data-set, consisting of one big and two small clusters, is used in this example. We have kept the structure of this set but increased the count of core points and added uniform noise distributed over the region [0,120] × [10,80]. Fig. 5 shows the clustering results of all the methods. The clustering result of the FCM method, whether in the observed space, in the feature space, or including the new distance, is in all cases highly affected by the presence of noise. FCM-s and KFCM-s are not able to detect the clusters properly, as some points of the middle cluster are allocated to the left-most cluster in both cases. Both DOFCM and DKFCM-new identified the noise and the original clusters, but DKFCM-new outperformed its earlier version.


Fig. 3. (a) Clustering result of FCM, (b) clustering result of KFCM with h (kernel width) = 6, a = 1, b = 5, (c) clustering result of FCM-s, (d) clustering result of KFCM-s with h = 20, a = 1, b = 5, (e) clustering result of DOFCM with α = 0.09, and (f) clustering result of DKFCM-new with α = 0.09, h = 19, a = 1, b = 6. Centroids are shown with the '*' symbol and outliers (noise points) are identified with the symbol ○.

Name of the algorithms

D12

FCM (m¼ 2) 0 0

DUNN data-set

0 37

FCM (m¼ 2)

KFCM (m¼ 2, h¼ 20, a ¼1, b¼ 5)

FCM-r (m¼ 2)

KFCM-r (m ¼2, h ¼ 20, a¼ 1, b¼5)

DOFCM (m¼ 2, a ¼0.09)

DKFCM-new (m¼ 2, a ¼0.09, h ¼ 19, a ¼ 1, b ¼6)

3.252  3.224

3.62 0.280  2.14 5.249

3.315  3.073

3.1672 0  3.167 0

3.324  3.291

FCM-r (m¼ 2)

KFCM-r DOFCM (m ¼2,h ¼10, a ¼1, b¼ 2.45) (m¼ 2,a ¼0.1)

5.509 0.511 16.778 0.880

5.463 16.98

KFCM (m¼ 2,h ¼ 150, a¼ 1, b¼ 2.3)

FCM-r (m¼ 2)

KFCM-r (m ¼2, h ¼ 150, a¼ 1, b¼2.25)

10.592 58.042 109.91

20.757 50.016 24.244 59.671 51.279 59.938 109.41 48.856 109.45

KFCM (m¼ 2, h¼ 10, a ¼1, b¼ 2.3)

6.182 0.782 5.634 17.207 0.7046 17,148 Bensaid data-set FCM (m¼ 2)

10.64 49.12 57.75 51.49 109.21 48.69

0.003 0.002

0.367 0.363

49.195 51.448 48.614

4.1.1.1. High dimensional data-set We now test the performance of our proposed clustering algorithm on well known real data-sets namely Wisconsin breast cancer data-set and Iris data-set. The clustering results were assessed using Huang’s accuracy measure (r) (Huang and Ng, 1999). Pk n ð60Þ r¼ i¼1 i n where ni is the number of data occurring in both the ith cluster and its corresponding true cluster and ‘n’ is the number of data points

0.0006 0

0.288 0.30

50.552 51.162 48.676

5.59 17.31

0 0

DKFCM-new (m¼ 2,a ¼0.09, h ¼10, a ¼1, b¼ 2.31)

0.241 5.27  0.007 17.06

0.249  0.001

DOFCM (m¼ 2, a ¼0.0)

DKFCM-new (m¼ 2, a ¼0.0, h ¼150, a ¼ 1, b¼ 2.3)

5.530 57.306 110.82

3.846 57 111.58

49 51.501 48.487

48.865 51.494 48.451

in the data-set. According to this measure, a higher value of r indicates a better clustering result, with perfect clustering yielding a value r = 1.
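A sketch of the accuracy computation (60); resolving each cluster's "corresponding true cluster" by majority overlap is an assumption made here for illustration.

```python
import numpy as np

def huang_accuracy(labels, truth, k):
    """Huang's accuracy measure (60): r = (sum of n_i) / n, where n_i counts
    points occurring in both cluster i and its corresponding true cluster.
    labels, truth: integer arrays of length n; each cluster is matched to
    the true class it overlaps most (an assumption)."""
    n = len(labels)
    total = 0
    for i in range(k):
        members = truth[labels == i]
        if members.size:
            total += np.bincount(members).max()   # best-matching true class
    return total / n
```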

Example 4. Data-set: Wisconsin Breast Cancer data set, 30-dimensional data (569 points)
Algorithms: FCM, KFCM, DOFCM, FCM-s, KFCM-s and DKFCM-new
Number of clusters: 2
Size of clusters: 357, 212


Table 3 Error percentage.

D12 data-set:
Name of the algorithm | Cluster 1 | Cluster 2 | Average error
FCM | Does not recognize clusters | — | —
KFCM | 0.01346 | 0.007834 | 0.01064
FCM-s | 0.1568 | 28.89 | 14.52
KFCM-s | 0.071289 | 0.0006011 | 0.0359451
DOFCM | 0.029 | 0.029000 | 0.029
DKFCM-new | 0.0023 | 0.000256 | 0.001278

Square data-set:
Name of the algorithm | Cluster 1 | Cluster 2 | Average error
FCM | 1.15295 | 0.53931 | 0.84613
KFCM | 0.16045 | 0.15367 | 0.15706
FCM-s | 0.13520 | 0.82368 | 0.47944
KFCM-s | 0.04681 | 0.09039 | 0.06860
DOFCM | 0.11567 | 0.09614 | 0.10591
DKFCM-new | 0.00038 | 0.00360 | 0.00199

Fig. 4. (a) Original data-set, (b) noisy data-set, (c) clustering result of FCM, (d) clustering result of KFCM with h (kernel width) = 10, a = 1, b = 2.3, (e) clustering result of FCM-s, (f) clustering result of KFCM-s with h = 10, a = 1, b = 2.45, (g) clustering result of DOFCM with α = 0.1, (h) clustering result of DKFCM-new with α = 0.1, h = 10, a = 1, b = 2.31. Centroids are shown with the '*' symbol and outliers (noise points) are identified with the symbol ○.


Fig. 5. (a) Original data-set, (b) noisy data-set, (c) clustering result of FCM, (d) clustering result of KFCM with h (kernel width) = 150, a = 1, b = 2.3, (e) clustering result of FCM-s, (f) clustering result of KFCM-s with h = 150, a = 1, b = 2.25, (g) clustering result of DOFCM with α = 0, and (h) clustering result of DKFCM-new with α = 0, h = 150, a = 1, b = 2.3. Centroids are shown with the '*' symbol and outliers (noise points) are identified with the symbol ○.

The Wisconsin Breast Cancer data is widely used to test the effectiveness of classification. The aim of the classification is to distinguish between benign and malignant cancers based on the available 30 attributes. The original database contains 569 instances; the class distribution is 357 benign and 212 malignant. Table 4 shows the misclassifications and the accuracy values. In the case of the Wisconsin Breast Cancer data, the best result is given by DKFCM-new, in which out of 569 points only 08 points are misclustered, with an accuracy value of 0.9929. KFCM-s is close to DKFCM-new, with 14 misclassifications. KFCM and FCM-s have almost the same performance, with 92 and 94 misclassifications and accuracy values of r = 0.9490 and 0.9173 respectively.

Table 4 The number of misclassified data and accuracies using FCM, KFCM, FCM-s, KFCM-s, DOFCM, and DKFCM-new for the Wisconsin Breast Cancer data-set.

Methods | Misclassification | Accuracy
FCM | 162 | 0.8576
KFCM | 92 | 0.9490
FCM-s | 94 | 0.9173
KFCM-s | 14 | 0.9876
DOFCM | 136 | 0.8150
DKFCM-new | 08 | 0.9929

Example 5. Data-set: Iris data set, 4-dimensional data (150 points)
Algorithms: FCM, KFCM, DOFCM, FCM-s, KFCM-s and DKFCM-new
Number of clusters: 3

Size of clusters: 50, 50, 50

It is a four-dimensional data set containing 50 samples of each of three types of Iris flowers. One of the three clusters (class 1) is


well separated from the other two, while classes 2 and 3 have some overlap. We made several runs of these algorithms and of the proposed method. On the Iris data-set, DKFCM-new detected the original clusters with an accuracy value of 1. As indicated in Table 5, FCM has 20 misclassifications when compared with the physically correct labels of the Iris data; the misclassifications made by KFCM and KFCM-s are 02 and 04 respectively.

Table 5 The number of misclassified data and accuracies using FCM, KFCM, FCM-s, KFCM-s, DOFCM, and DKFCM-new for the Iris data-set.

Methods | Misclassification | Accuracy
FCM | 20 | 0.9333
KFCM | 02 | 0.9933
FCM-s | 20 | 0.9333
KFCM-s | 04 | 0.9866
DOFCM | 12 | 0.94
DKFCM-new | Nil | 1

4.2. Comparison of DKFCM-new with other noise resistant methods: PCM, PFCM, CFCM and NC

We also compared the proposed method with other well known noise-resistant methods. Fig. 6 shows the clustering comparison of DKFCM-new with PCM, PFCM, CFCM and NC on the D12 data-set. The centroids in the case of PFCM are attracted towards the outlier, and PFCM does not recognize efficient clusters. PCM resulted in identical clusters. The centroids of NC and CFCM are not much affected by the presence of the outlier, but CFCM assigned one outlier point to both clusters and included the outlier in the final cluster. Although both NC and DKFCM-new separate the outlier, compared with NC, DKFCM-new generated almost the original centroids. Table 6 lists the centroid locations and error percentages for these methods.

Noise-resistant methods should be independent of the location of outliers, i.e. the method should be able to identify outliers located anywhere in the data-set. A problem with NC is that it is not independent of the location of outliers: it does not identify an outlier that lies in between two clusters. To demonstrate this, we consider a data-set of 115 points, consisting of two clusters of approximately equal size with noise, with the data points distributed over a two-dimensional space. Fig. 7(a) and (b) show the original data-set and its noisy version. From Fig. 7, it can be clearly seen that NC did not identify all the outliers, including the one lying in between the clusters. Consider the labeled point 'A' shown in Fig. 7(b), which is an outlier. As per the NC approach, this point cannot be considered an outlier, because its distance from the regular clusters can never exceed the noise distance when it is located in between the regular clusters (Dave and Krishnapuram, 1997; Dave, 1991, 1993). This scenario can never be justified with the NC approach.

PCM (m ¼2, Z ¼2) PFCM (a¼ 1, b ¼1, m¼2, Z ¼ 2) CFCM (m¼ 2, y ¼0) NC (l ¼0.6, m¼ 2) DKFCM-new (m ¼2, a ¼0.09, h ¼19, a¼ 1, b¼ 6)

Centroid locations Center 1

Center 2

0 0.0007 3.179 3.151 3.324

0  0.0007  3.179  3.135  3.291

0.013 0.819 0 0.2 0

Average error

0.013 11.15 0.819 3.94 0 0.028 0.2 0.082 0 0.0012

Fig. 6. (a) Clustering result of PCM for m = 2, η = 2, (b) clustering result of PFCM for m = 2, η = 2, a = 1, b = 1, (c) clustering result of CFCM for m = 2, θ = 0, (d) clustering result of NC for m = 2, λ = 0.6, and (e) clustering result of DKFCM-new for m = 2, α = 0, a = 1, b = 6, h = 19. Centroids are plotted with '*', outliers are represented as ○, and clusters are separated with the symbols '.' and '+'.


Fig. 7. (a) Original data-set D115, (b) noisy version, (c) clustering result of NC for m = 2, λ = 0.50, and (d) clustering result of DKFCM-new for m = 2, α = 0.14, a = 1, b = 6, h = 50. Centroids are plotted with '*', outliers are represented as ○, and clusters are separated with the symbols '.' and '+'.

Fig. 8. (a) Clustering result of NC for m = 2, λ = 0.51 and (b) clustering result of DKFCM-new for m = 2, α = 0.14, a = 1, b = 6, h = 50. Centroids are plotted with '*', outliers are represented as ○, and clusters are separated with the symbols '.', 'x', '×' and '+'.

DKFCM-new detected it as an outlier because the local membership of this point is less than 'α'. It is also visually verified that DKFCM-new detected the original clusters.

Noise-resistant methods should be independent of the number of clusters for the same data-set (Rehm et al., 2007). NC is not: if the number of clusters is increased for the same data-set, it does not identify the outliers, whereas DKFCM-new is independent of the number of clusters for the same data-set. To test this, the same data-set was divided into four clusters. Fig. 8 shows that NC could not detect even those outliers which were identified in the 2-cluster structure, whereas DKFCM-new detected all the outliers.

Noise-resistant methods should also be independent of the amount of outliers, i.e. the performance of the algorithm should not change when the number of outliers in the same data-set is increased. Fig. 9 shows the performance comparison of NC and DKFCM-new when we increased the number of outliers from one to four in the D12 data-set.

It is visually verified that the performance of NC degrades as we increase the number of outliers, whereas the cluster centroids do not change in the case of the proposed method.

4.3. Comparison of DKFCM-new with credal partition based evidential clustering methods

Credal partition based methods are developed within the framework of belief functions, based upon the Dempster–Shafer theory of evidence (Denoeux and Masson, 2004; Denoeux and Smets, 2006; Masson and Denoeux, 2008, 2009; Antoine et al., 2009). A credal partition extends the existing concepts of hard, fuzzy (probabilistic) and possibilistic partitions by allocating a mass of belief to each object. It represents partial knowledge regarding the class membership of an object i by a bba (basic belief assignment) $m_i$ on the set $\Omega = \{\omega_1, \ldots, \omega_c\}$. Evidential c-means (ECM) is inspired by the FCM and NC algorithms.


Fig. 9. (a) Clustering result of NC for m = 2, λ = 0.6 and (b) clustering result of DKFCM-new with α = 0.09, h = 19, a = 1, b = 6. Centroids are shown with the '*' symbol and outliers (noise points) are identified with the symbol ○.

A credal partition (Masson and Denoeux, 2008) is defined as the n-tuple M = (m1, ..., mn). It can be seen as a general model of partitioning:
– when each $m_i$ is a certain bba, M defines a conventional, crisp partition of the set of objects;
– when each $m_i$ is a Bayesian bba, M specifies a fuzzy partition, as defined by Bezdek (1981);
– when the focal elements of all bbas are restricted to singletons of $\Omega$ or the empty set, a partition similar to that of Dave is recovered.

If a credal partition is derived from object data, we have to determine, for each object i, the quantities $m_{ij} = m_i(A_j)$, $A_j \ne \emptyset$, $A_j \subseteq \Omega$, in such a way that $m_{ij}$ is low when the distance $d_{ij}$ between i and the focal set $A_j$ is high. The distance between an object and any non-empty subset of $\Omega$ thus has to be defined, as in NC. Just as fuzzy clustering assumes that each class is represented by a center $v_k$, a barycenter $v_j$ is associated with each subset $A_j$ of $\Omega$:

$$v_j = \frac{1}{c_j}\sum_{k=1}^{c} s_{kj}\, v_k \qquad (61)$$

where $c_j = |A_j|$ denotes the cardinality of $A_j$ and

$$s_{kj} = \begin{cases} 1 & \text{if } \omega_k \in A_j \\ 0 & \text{else} \end{cases} \qquad (62)$$

The distance $d_{ij}$ is defined by

$$d_{ij}^2 \triangleq \|x_i - v_j\|^2 \qquad (63)$$

In credal framework, empty set is equivalent to the noise cluster. The objective function of ECM is J ECM ðM,VÞ9

n X

X

i ¼ 1fj=Aj a |,Aj D Og

Subject to X mij þ mi| ¼ 1

2

caj mbij dij þ

8i ¼ 1,n

n X

d2 mbi|

separate then in the empty set. But, as there is a need to set a particular fixed distance in these algorithms, which is called noise distance in case of NC; these algorithms will not be independent of locations of outliers as well as number of clusters for the same data-set. DKFCM-new is the extension and kernelized version of FCM with a new distance metric which also consider the variation of points within the cluster also. Hence, is able to detect non-linear clusters whereas in credal based methods, Euclidean distance is used which can only detect hyper spherical clusters. Thus, it is evident that DKFCM-new performed better as compared to other competitive algorithms and is highly robust against noise.

5. Conclusion In this paper, we proposed a robust kernel approach to the earlier proposed method. Proposed method used a new distance metric instead of Euclidean distance that incorporates the distance variation of data-points within each cluster to DOFCM in the feature space. The DKFCM-new with the new distance metric significantly shows improvement over DOFCM and other competitive methods. To test the performance of proposed method it is applied to synthetic data-sets, Standard data-sets like DUNN and Bensaid data-set, and to real life data-sets like Wisconsin Breast cancer and Iris data-set and compared with various kernel based methods. The performance of proposed method is also compared with other robust clustering methods like PCM, PFCM, CFCM, NC and credal based clustering methods like ECM, RECM, CECM. It has been found that proposed algorithm significantly outperformed its earlier version and was found highly robust to noise and outliers.

Appendix A. Kernel version of Density oriented Fuzzy C-means with new distance metric (DKFCM-new)



A.1. Proof of density oriented kernelized approach to fuzzy c-means with new distance metric



In this appendix, we give the proof of DKFCM-new, the kernelized version of the density oriented fuzzy c-means clustering algorithm incorporating the new distance metric. The minimization of the objective function J_DKFCM-new {given in (48)} subject to the constraint specified by (2) is solved by minimizing the constraint-free objective function

$$\hat J_{DKFCM\text{-}new}(U,V) = \sum_{i=1}^{c+1}\sum_{k=1}^{n} u_{ik}^{m}\, \Phi\hat d_{ki}^{2} \;+\; \sum_{k=1}^{n} \lambda_{k}\Bigl(\sum_{i=1}^{c+1} u_{ik} - 1\Bigr) \qquad (66)$$

where λ_k (k = 1, 2, …, n) are the Lagrange multipliers.


Taking the partial derivatives of $\hat J_{DKFCM\text{-}new}$ with respect to u_ik and v_i yields the solution of this problem. First, the kernel-induced distance expands as

$$\Phi\hat d_{ki}^{2} = \frac{\|\Phi(x_k)-\Phi(v_i)\|^{2}}{\Phi s_i} = \frac{(\Phi(x_k)-\Phi(v_i))^{T}(\Phi(x_k)-\Phi(v_i))}{\Phi s_i} = \frac{\Phi(x_k)^{T}\Phi(x_k)}{\Phi s_i} - \frac{2\,\Phi(x_k)^{T}\Phi(v_i)}{\Phi s_i} + \frac{\Phi(v_i)^{T}\Phi(v_i)}{\Phi s_i} = \hat K(x_k,x_k) - 2\hat K(x_k,v_i) + \hat K(v_i,v_i) \qquad (67)$$

where $\hat K(x,y) = K(x,y)/\Phi s_i$ and $\Phi s_i$ denotes the distance variation of cluster i in the feature space. If we adopt the radial basis kernel in the proposed technique, then

$$K(x,y) = \exp\Bigl(-\frac{\sum_{d}\,\lvert x_{d}^{\,a}-y_{d}^{\,a}\rvert^{b}}{h^{2}}\Bigr), \qquad a,\, b > 0. \qquad (68)$$

Let a = 1 and b = 2; then

$$K(x,y) = \exp\Bigl(-\frac{\|x-y\|^{2}}{h^{2}}\Bigr)$$

where h is defined as the kernel width and is a positive number; then K(x,x) = 1. Hence, (67) reduces to $\Phi\hat d_{ki}^{2} = 2\bigl(1-K(x_k,v_i)\bigr)/\Phi s_i$.
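For concreteness, (68) and the reduced distance of (67) can be coded as follows (an illustrative sketch; the argument `s_i` stands for the within-cluster variation Φs_i):

```python
import numpy as np

def rbf_kernel(x, y, h):
    """Radial basis kernel of Eq. (68) with a = 1, b = 2; h is the kernel width."""
    return np.exp(-np.sum((x - y) ** 2) / h ** 2)

def kernel_distance_sq(x_k, v_i, h, s_i):
    """Kernel-induced squared distance of Eq. (67): since K(x, x) = 1 for the
    RBF kernel, ||Phi(x_k) - Phi(v_i)||^2 = 2(1 - K(x_k, v_i)), and the result
    is normalized by the within-cluster variation s_i (Phi s_i in the text)."""
    return 2.0 * (1.0 - rbf_kernel(x_k, v_i, h)) / s_i
```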

Substituting the reduced distance into (66), and noting that the constant factor 2 does not affect the minimization, the objective becomes

$$\hat J_{DKFCM\text{-}new} = \sum_{i=1}^{c+1}\sum_{k=1}^{n} u_{ik}^{m}\bigl(1-\hat K(x_k,v_i)\bigr) + \sum_{k=1}^{n}\lambda_{k}\Bigl(\sum_{i=1}^{c+1} u_{ik}-1\Bigr)$$
$$= \sum_{i=1}^{c+1}\sum_{k=1}^{n} u_{ik}^{m}\Bigl(1-\frac{1}{\Phi s_i}\exp\Bigl(-\frac{\|x_k-v_i\|^{2}}{h^{2}}\Bigr)\Bigr) + \sum_{k=1}^{n}\lambda_{k}\Bigl(\sum_{i=1}^{c+1} u_{ik}-1\Bigr) \qquad (69)$$

A.1.1. Partial derivative of J_DKFCM-new with respect to v_i

The partial derivative of $\hat J_{DKFCM\text{-}new}$ with respect to v_i is

$$\frac{\partial \hat J}{\partial v_i} = -\sum_{k=1}^{n} \frac{u_{ik}^{m}}{\Phi s_i}\,\exp\Bigl(-\frac{\|x_k-v_i\|^{2}}{h^{2}}\Bigr)\,\frac{2\,(x_k-v_i)}{h^{2}} \qquad (70)$$

Equating (70) to zero leads to

$$\frac{\partial \hat J}{\partial v_i} = 0 \;\Rightarrow\; \sum_{k=1}^{n} u_{ik}^{m}\,\hat K(x_k,v_i)\,(x_k-v_i) = 0 \;\Rightarrow\; \sum_{k=1}^{n} u_{ik}^{m}\,\hat K(x_k,v_i)\,x_k = \sum_{k=1}^{n} u_{ik}^{m}\,\hat K(x_k,v_i)\,v_i$$

$$\Rightarrow\; v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m}\,\hat K(x_k,v_i)\,x_k}{\sum_{k=1}^{n} u_{ik}^{m}\,\hat K(x_k,v_i)} \qquad (71)$$

Since $\hat K(x_k,v_i)$ itself depends on v_i, (71) is a fixed-point update; a sketch of one such step is given below.
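In the sketch (ours; X is the n×d data matrix, u_i the memberships of cluster i, v_i its current centroid), the factor 1/Φs_i cancels between numerator and denominator of (71), so plain RBF kernel values suffice as weights:

```python
import numpy as np

def update_centroid(X, u_i, v_i, m, h):
    """One fixed-point step of the centroid update, Eq. (71). The factor
    1/(Phi s_i) in K_hat cancels between numerator and denominator."""
    K = np.exp(-np.sum((X - v_i) ** 2, axis=1) / h ** 2)  # K(x_k, v_i), Eq. (68)
    w = (u_i ** m) * K                                    # weights u_ik^m K(x_k, v_i)
    return (w[:, None] * X).sum(axis=0) / w.sum()
```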

A.1.2. Partial derivative of J_DKFCM-new with respect to u_ik

$$\frac{\partial \hat J}{\partial u_{ik}} = m\,u_{ik}^{m-1}\bigl(1-\hat K(x_k,v_i)\bigr) + \lambda_{k} \qquad (72)$$

Equating (72) to zero leads to the following:

$$\frac{\partial \hat J}{\partial u_{ik}} = 0 \;\Rightarrow\; m\,u_{ik}^{m-1}\bigl(1-\hat K(x_k,v_i)\bigr) + \lambda_{k} = 0 \;\Rightarrow\; u_{ik} = \Bigl(\frac{-\lambda_{k}}{m}\Bigr)^{1/(m-1)}\Bigl(\frac{1}{1-\hat K(x_k,v_i)}\Bigr)^{1/(m-1)}$$

To fulfill the constraint (2),

$$u_{ik} = \frac{u_{ik}}{\sum_{j=1}^{c} u_{jk}} \qquad (73)$$

In view of (72), Eq. (73) can be written as

$$u_{ik} = \frac{\Bigl(1\big/\bigl(1-\hat K(x_k,v_i)\bigr)\Bigr)^{1/(m-1)}}{\sum_{j=1}^{c}\Bigl(1\big/\bigl(1-\hat K(x_k,v_j)\bigr)\Bigr)^{1/(m-1)}}$$

By using $\Phi\hat d_{ki}^{2} = \|\Phi(x_k)-\Phi(v_i)\|^{2}/\Phi s_i$, the above equation can be re-written as

$$u_{ik} = \frac{1}{\sum_{j=1}^{c}\Bigl(\Phi\hat d_{ki}^{2}\big/\Phi\hat d_{kj}^{2}\Bigr)^{1/(m-1)}} \qquad (74)$$
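For completeness, a compact sketch of the membership update (74) follows (ours; the array `s` holds the within-cluster variations Φs_i, which DOFCM estimates from the data beforehand). Alternating (71) and (74) until convergence yields the DKFCM-new iteration:

```python
import numpy as np

def update_memberships(X, V, s, m, h):
    """Membership update of Eq. (74): u_ik is inversely proportional to
    (Phi d_hat_ki^2)^(1/(m-1)), normalized over the c clusters. s[i] is the
    within-cluster variation Phi s_i of cluster i."""
    c, n = len(V), len(X)
    D = np.empty((c, n))
    for i in range(c):
        K = np.exp(-np.sum((X - V[i]) ** 2, axis=1) / h ** 2)  # Eq. (68)
        D[i] = 2.0 * (1.0 - K) / s[i] + 1e-12  # Eq. (67); epsilon avoids 0/0
    U = (1.0 / D) ** (1.0 / (m - 1))
    return U / U.sum(axis=0)                   # normalization, Eq. (73)
```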

References

Antoine, V., Quost, B., Masson, M.-H., Denoeux, T., 2009. CECM: constrained evidential c-means algorithm. Comput. Stat. Data Anal. 56, 894–914.
Bensaid, A.M., Hall, L.O., Bezdek, J.C., Clarke, L.P., Silbiger, M.L., Arrington, J.A., Murtagh, R.F., 1996. Validity-guided clustering with applications to image segmentation. IEEE Trans. Fuzzy Syst. 4 (2), 112–123.
Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York.
Chen, J.L., Wang, J.H., 1999. A new robust clustering algorithm: density-weighted fuzzy c-means. In: IEEE International Conference on Systems, Man and Cybernetics, vol. 3, pp. 90–94.
Chintalapudi, K.K., Kam, M., 1998. A noise resistant fuzzy c-means algorithm for clustering. In: IEEE Conference on Fuzzy Systems Proceedings, vol. 2, pp. 1458–1463.
Dave, R.N., 1991. Characterization and detection of noise in clustering. Pattern Recognition Lett. 12 (11), 657–664.
Dave, R.N., 1993. Robust fuzzy clustering algorithms. In: 2nd IEEE International Conference on Fuzzy Systems, San Francisco, CA, March 28–April 1, 1993, pp. 1281–1286.
Dave, R.N., Krishnapuram, R., 1997. Robust clustering methods: a unified view. IEEE Trans. Fuzzy Syst. 5 (2).
Denoeux, T., Masson, M.-H., 2004. EVCLUS: evidential clustering of proximity data. IEEE Trans. Syst. Man Cybernet. B 34 (1), 95–109.
Denoeux, T., Smets, P., 2006. Classification using belief functions: the relationship between the case-based and model-based approaches. IEEE Trans. Syst. Man Cybernet. B 36 (6), 1395–1406.
Dunn, J., 1974. A fuzzy relative of the ISODATA process and its use in detecting compact well separated clusters. J. Cybernet. 3, 32–57.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd ACM SIGKDD, Portland, OR, pp. 226–231.
Huang, Z., Ng, M.K., 1999. A fuzzy k-modes algorithm for clustering categorical data. IEEE Trans. Fuzzy Syst. 7 (4), 446–452.
Kaur, P., Gosain, A., 2010. Density oriented approach to identify outliers and get noiseless clusters in fuzzy c-means. In: 2010 IEEE International Conference on Fuzzy Systems, July 18–23, Barcelona, Spain.
Kaur, P., Gosain, A., 2011. A density oriented fuzzy c-means clustering algorithm for recognising original cluster shapes from noisy data. Int. J. Innovative Comput. Appl. 3 (2), 77–87.
Krishnapuram, R., Keller, J., 1993. A possibilistic approach to clustering. IEEE Trans. Fuzzy Syst. 1 (2), 98–110.
Masson, M.-H., Denoeux, T., 2008. ECM: an evidential version of the fuzzy c-means algorithm. Pattern Recognition 41, 1384–1397.
Masson, M.-H., Denoeux, T., 2009. RECM: relational evidential c-means algorithm. Pattern Recognition Lett. 30, 1015–1026.
Pal, N.R., Pal, K., Keller, J., Bezdek, J.C., 2005. A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Syst. 13 (4), 517–530.
Rehm, F., Klawonn, F., Kruse, R., 2007. A novel approach to noise clustering for outlier detection. In: Applications and Science in Soft Computing, vol. 11. Springer-Verlag, pp. 489–494.
Tsai, D.M., Lin, C.C., 2011. Fuzzy c-means based clustering for linearly and nonlinearly separable data. Pattern Recognition 44, 1750–1760.
van Wijk, J.J., van Selow, E.R., 1999. Cluster and calendar based visualization of time series data. In: IEEE Symposium on Information Visualization, San Francisco, October 25–26, 1999.
Zhang, D.Q., Chen, S.C., 2003. Clustering incomplete data using kernel-based fuzzy c-means algorithm. Neural Process. Lett. 18, 155–162.
Zhang, D.Q., Chen, S.C., 2004. A novel kernelized fuzzy c-means algorithm with application in medical image segmentation. Artif. Intell. Med. 32, 37–50.