Validation criteria for enhanced fuzzy clustering

Pattern Recognition Letters 29 (2008) 97–108
www.elsevier.com/locate/patrec
Asli Celikyilmaz a,*, I. Burhan Türkşen a,b

a Department of Mechanical and Industrial Engineering, University of Toronto, 5 King's College Road, Toronto, Ontario, Canada M5S 3G8
b Head, Department of Industrial Engineering, TOBB Economy and Technology University, Sögütözü Cad. No. 43, Sögütözü 06560, Ankara, Turkey

Received 13 November 2006; received in revised form 22 June 2007; available online 22 September 2007. Communicated by F. Roli.

Abstract

We introduce two new criteria for validating the results obtained from the recently introduced improved fuzzy clustering (IFC) algorithm, which is used to find patterns in regression and classification type datasets, respectively. The IFC algorithm calculates membership values that are used as additional predictors to form fuzzy decision functions for each cluster. The proposed validity criteria are based on the ratio of the compactness to the separability of the clusters. The compactness of a cluster is represented by the average distances between every object and the cluster centers, together with the total estimation error of the corresponding fuzzy decision functions. The separability is based on a conditional ratio between the similarities of the cluster representatives and the similarities of the fuzzy decision surfaces of each cluster. The performance of the proposed validity criteria is compared to that of other structurally similar cluster validity indexes using datasets from different domains. The results indicate that the new cluster validity functions are useful criteria when selecting parameters of IFC models.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Supervised clustering; Fuzzy clustering; Cluster validity index; Fuzzy functions

1. Introduction

Since Zadeh's initial introduction of the concept of fuzzy sets (1965), numerous fuzzy set-based approaches have been developed to model systems with uncertainties. The principle of these theories is to identify uncertainties in a given system by means of linguistic terms represented with membership functions. Fuzzy clustering methods are one of the strategies implemented to identify these membership functions, by organizing patterns into clusters such that data samples within a cluster are more similar to each other than to samples in other clusters. The most commonly used fuzzy clustering method is the fuzzy C-means (FCM) algorithm (Bezdek, 1981). Numerous variations of the FCM algorithm have since been developed for different purposes, e.g. (Hathaway and Bezdek, 1993; Höppner and Klawonn, 2003; Pedrycz, 2004; Cimino et al., 2006).

* Corresponding author. Tel.: +1 416 978 1278. E-mail address: [email protected] (A. Celikyilmaz).

0167-8655/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2007.08.017

Recently, the authors have developed a new improved fuzzy clustering (IFC) algorithm (Celikyilmaz and Türkşen, in press, submitted for publication) for regression and classification type domains. IFC combines the standard fuzzy clustering algorithm, i.e. FCM (Bezdek, 1981), and fuzzy C-regression, i.e. FCRM (Hathaway and Bezdek, 1993), for identification of the structures of fuzzy system models (FSM) with improved fuzzy functions (IFFs) (Türkşen, in press; Türkşen and Celikyilmaz, 2006), which use membership values as additional predictors of the system model. The authors have shown that FSMs with IFFs can provide better estimations. The extension of the novel IFC to classification problems, IFC-C (Celikyilmaz and Türkşen, submitted for publication), also explores a given classification dataset to find local fuzzy partitions and simultaneously builds c fuzzy classifiers (functions). Every fuzzy clustering approach, including the latest IFC (Celikyilmaz and Türkşen, in press, submitted for publication) algorithms, assumes that some initialization parameters are known.


Cluster validity index (CVI) measures (Fukuyama and Sugeno, 1989; Xie and Beni, 1991; Pal and Bezdek, 1995; Bezdek, 1976) have been proposed to validate the underlying assumptions on the number of clusters, mainly for the FCM (Bezdek, 1981) clustering approach. Later, many variations of these functions were introduced, e.g. (Bouguessa et al., 2006; Dave, 1996; Kim et al., 2003; Kim and Ramakrishna, 2005; Wu and Yang, 2005a,b). The main characteristic of these CVIs is that they all use either within-cluster distances, viz. compactness, or between-cluster distances, viz. separability, or both, as a way of assessing the clustering schema (Kim et al., 2003). Based on the way the compactness and separability are coupled, the CVI measures are generally classified into ratio-type or summation-type measures.

Most of the CVIs listed above are designed to validate the FCM (Bezdek, 1981) clustering algorithm, and they may not be suitable for other variations of fuzzy clustering algorithms designed for different purposes, e.g. the fuzzy C-regression (switching regression) algorithm (FCRM) (Hathaway and Bezdek, 1993). For these types of FCM variations, different validity measures have been created. For instance, in (Kung and Lin, 2002, 2004), a new CVI, a modification of the Xie and Beni (1991) ratio-type validity function, is introduced to measure the optimum number of clusters in FCRM applications. It accounts for the similarity between regression models using the standard inner product of unit normal vectors, instead of the distance between the cluster centers.

In this paper, two new validity criteria are introduced to measure the optimum number of clusters, denoted c*, for the two versions of the IFC algorithm (Celikyilmaz and Türkşen, in press, submitted for publication). The new ratio-type validity criteria measure the ratio between the compactness and the separability of the clusters. Since IFC is a new type of hybrid clustering method, which uses structures from two separate clustering algorithms during optimization, viz. FCM (Bezdek, 1981) and FCRM (Hathaway and Bezdek, 1993), in a novel way and utilizes fuzzy functions (FFs), the new CVI is designed to validate two different concepts. The compactness couples within-cluster distances and the errors of the c regression/classification functions between the actual and estimated output values/class labels. The separability, on the other hand, determines the structure of the clusters by measuring the ratio between the cluster center distances and the angle between their fuzzy decision surfaces.

The organization of this paper is as follows. In Section 2, the IFC algorithms are briefly reviewed. In Section 3, the new CVIs designed for the IFC algorithm are introduced. In Section 4, we present simulation results of the application of the new CVIs on different dataset domains using artificial and real-life datasets, and compare the results to three other well-known cluster validity measures that are closely related to the proposed ones.

2. Improved fuzzy clustering (IFC) algorithm

In the earlier FSM with FFs modeling strategies (Celikyilmaz and Türkşen, 2007; Türkşen, in press; Türkşen and Celikyilmaz, 2006), the standard FCM clustering algorithm is used to find membership values, which are supposed to represent good partitions of the given system domain. In (Celikyilmaz and Türkşen, in press, submitted for publication), two new fuzzy clustering methods are presented: improved fuzzy clustering, IFC (Celikyilmaz and Türkşen, in press), for regression type domains, and its variation, improved fuzzy clustering for classification (IFC-C) type systems. The optimization approach of the IFC not only searches for the best partitions of the data, but also aims at increasing the predictive power of the membership values to model the input–output relations in local fuzzy functions for regression models, or in local fuzzy decision surfaces for classification models. The IFC algorithm introduces a new objective function, which serves two purposes: (i) to find a good representation of the partition matrix; and (ii) to find membership values that minimize the error of the models of FSM with improved FFs (FSMIFF). To optimize the membership values for these aims, in the new IFC we append the error of the regression functions, for estimation of the given output using the membership values, to the objective function of the standard FCM as follows:

$$J_m^{IFC} = \underbrace{\sum_{i=1}^{c}\sum_{k=1}^{nd} \mu_{ik}^m d_{ik}^2}_{\text{FCM}} \;+\; \underbrace{\sum_{i=1}^{c}\sum_{k=1}^{nd} \mu_{ik}^m \big(y_k - f_i(s_{ik},\hat{w}_i)\big)^2}_{\text{SE of Fuzzy Function}} \qquad (1)$$

where m determines the degree of overlap between clusters, d²_ik is the distance between the kth input vector, k = 1, …, nd, and the ith cluster center identified at iteration t, and μ_ik is the membership value of the kth data vector in cluster i, i = 1, …, c, where c is the number of clusters initialized by the expert and nd is the total number of vectors in the training dataset. The first term of the objective function in Eq. (1) is the same as in the standard FCM algorithm; it controls the precision of each input vector with respect to its cluster and vanishes when every instance is a cluster by itself. The second term measures the squared error (SE) of the (approximated) fuzzy functions, f(s_i, ŵ_i), that are built at each step of the IFC algorithm using the membership values and/or their possible transformations as input variables, excluding the original scalar inputs. Therefore, the input matrix, s_i(μ_i), which is used to estimate the parameters ŵ_i of f(s_i, ŵ_i), includes membership values from the previous iteration step and/or their transformations. When the system domain is a classification type, the objective function of the extended version of IFC, IFC-C (Celikyilmaz and Türkşen, submitted for publication), is as follows:

$$J_m^{IFC\text{-}C} = \underbrace{\sum_{i=1}^{c}\sum_{k=1}^{nd} \mu_{ik}^m d_{ik}^2}_{\text{FCM}} \;+\; \underbrace{\sum_{i=1}^{c}\sum_{k=1}^{nd} \mu_{ik}^m \big(y_k - \hat{P}_{ik}(y_k=1 \mid f(s_{ik}))\big)^2}_{\text{SE of IFC-C}} \qquad (2)$$

In Eq. (2), the SE of IFC-C measures the squared deviation of the actual class labels from the estimated posterior probabilities, P̂_ik(y_k = 1 | f(s_ik)). For any cluster i, the second term takes the form:

$$SE_i = \left(\begin{bmatrix} y_1 \\ \vdots \\ y_{nd}\end{bmatrix} - \begin{bmatrix}\hat{P}_{i1}(y_1=1) \\ \vdots \\ \hat{P}_{i,nd}(y_{nd}=1)\end{bmatrix}\right)^2$$

Eq. (3) displays the two special cases used to estimate the fuzzy function parameters, f(s_i, ŵ_i), during iteration t of IFC and IFC-C for the different system domains, using linear regression (LSE) for regression and logistic regression (LR) for classification:

$$f_i(s_i,\hat{w}_i) = \hat{w}_{i,0} + \hat{w}_{i,1}\mu_{ik} + \cdots + \hat{w}_{i,nm}\, e^{\mu_{ik}}, \qquad s_{ik} = [1\;\; \mu_{ik}\; \cdots\; e^{\mu_{ik}}]$$

$$\hat{P}_{ik}(y_k=1\mid s_{ik}) = \frac{1}{1+\exp(-\hat{w}_i \cdot s_{ik})} = \left(1+\exp\!\Big(-[\hat{w}_{i,0}\;\hat{w}_{i,1}\cdots\hat{w}_{i,nm}]\,[1\;\mu_{ik}\cdots e^{\mu_{ik}}]^{\mathrm T}\Big)\right)^{-1}, \quad s_{ik}\in\mathbb{R}^{nm+1} \qquad (3)$$

During IFC optimization, the aim is to minimize the error between the actual and the estimated output values, or the predicted posterior probabilities in classification cases, in each cluster. By excluding the original input variables while shaping the membership values, we are able to measure and improve their individual effect on the performance of a model. In the IFC-C case, one could use suitable statistical learning algorithms, e.g. LR, or a more effective soft computing approach such as support vector classification (SVC), to approximate the classifiers. We then measure the posterior probabilities of the SVM using the improved Platt probability method, which is based on the approximation of ŷ_k with a sigmoid function. The solution minimizing the new objective functions of IFC, J_m^IFC, and IFC-C, J_m^IFC-C, can be found by taking the dual of the model using the Lagrange transformation of the objective function, which yields new membership functions as follows:

$$\mu_{ik}^{IFC} = \left[\sum_{j=1}^{c}\left(\frac{d_{ik}^2 + \big(y_k - f_i(s_{ik},\hat{w}_i)\big)^2}{d_{jk}^2 + \big(y_k - f_j(s_{jk},\hat{w}_j)\big)^2}\right)^{1/(m-1)}\right]^{-1} \qquad (4)$$

Since the second term of the objective function J_m^IFC does not include the cluster center term as the first term does, the cluster center formulation, v_i(x, y), of the IFC remains the same as in the standard FCM:

$$v_i = \frac{\sum_{k=1}^{nd}(\mu_{ik})^m\, x_k}{\sum_{k=1}^{nd}(\mu_{ik})^m}, \qquad 1 \le i \le c \qquad (5)$$
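To make the update cycle concrete, the following minimal sketch (our illustration, not the authors' implementation) runs one IFC iteration in Python/NumPy, using weighted least squares for the per-cluster fuzzy functions; the choice of [1, μ_i, exp(μ_i)] as the only predictors and the small epsilon guard are illustrative assumptions.

```python
import numpy as np

def ifc_iteration(x, y, mu, m=2.0, eps=1e-12):
    """One IFC step: centers per Eq. (5), weighted-LSE fuzzy functions,
    then memberships per Eq. (4) on the combined distance d^2 + SE."""
    c, nd = mu.shape
    w = mu ** m                                   # mu_ik^m weights
    v = (w @ x) / w.sum(axis=1, keepdims=True)    # Eq. (5): FCM-style centers
    D2 = np.empty((c, nd))
    for i in range(c):
        # Membership-only predictors [1, mu_i, exp(mu_i)]; original inputs
        # are deliberately excluded, as required by the IFC objective.
        S = np.column_stack([np.ones(nd), mu[i], np.exp(mu[i])])
        sw = np.sqrt(w[i])
        w_hat, *_ = np.linalg.lstsq(S * sw[:, None], y * sw, rcond=None)
        err2 = (y - S @ w_hat) ** 2               # SE of the fuzzy function
        d2 = ((x - v[i]) ** 2).sum(axis=1)        # squared distance to center
        D2[i] = d2 + err2 + eps                   # eps avoids division by zero
    inv = D2 ** (-1.0 / (m - 1.0))                # Eq. (4)
    return inv / inv.sum(axis=0, keepdims=True), v
```

Iterating this update until the membership matrix stabilizes reproduces the optimization loop described above.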

The optimization method searches for the optimum membership values, which are later used as additional predictors to estimate the parameters of FSM with improved fuzzy regression/classification functions (FSMIFF) of the given system model.

3. Validity measures for improved fuzzy clustering (IFC)

Since IFC couples point-wise clustering, e.g. FCM, and regression/classification type clustering, e.g. fuzzy C-regression (FCRM), we hypothesize that the new validity index should include concepts from both types of CVIs. Hence, we investigate both types in this section before presenting the new CVI.

3.1. Well-known CVIs for point-wise and regression type clustering algorithms

Most common CVIs are structured by combining two different clustering concepts:

(i) Compactness. Similarity between cluster elements within each cluster.
(ii) Separability. Dissimilarity between the individual clusters.

It has been shown that a clustering algorithm is effective when compactness is small and separability is large (Kim and Ramakrishna, 2005). Based on the conjunction of these two concepts, cluster validity indices can be categorized into two types: ratio-type and summation-type. Ratio-type validity indices are formed by measuring the ratio of the compactness to the separability, e.g. the Xie and Beni (1991) index. Summation-type validity indices couple the compactness and separability concepts by adding them in various ways, e.g. the Fukuyama and Sugeno (1989) index. Since the new validity index proposed in this paper is a ratio-type measure, we give details of the prominent ratio-type validity indexes.

A well-known ratio-type CVI (compactness/separability) is the XB CVI (Xie and Beni, 1991), which is formulated as:

$$XB(c) = \frac{\left(\sum_{i=1}^{c}\sum_{k=1}^{nd} \mu_{ik}^2\, d(x_k,v_i)^2\right)/nd}{\min_{i,j\ne i}\, d(v_i,v_j)^2}, \qquad d(x_k,v_i)^2 = \|x_k - v_i\|^2 \qquad (6)$$
where x_k ∈ R^nv represents the kth input vector, k = 1, …, nd, and v_i ∈ R^nv represents a cluster center as a vector of nv dimensions. XB decreases monotonically when c is close to nd. Kim and Ramakrishna (2005) discuss the relationship and behaviour of the compactness and separability of clusters obtained from FCM for changing values of the number of clusters; Fig. 1 is adapted from Kim and Ramakrishna (2005). Generally, the compactness increases sharply as c decreases from c* to c* − 1. In the XB validity index, the compactness of the whole clustering structure (the numerator in (6)) is determined by averaging the compactness of every cluster. However, averaging might suppress the effect of large changes in the compactness of some clusters, which are caused by FCM models with an undersized (or oversized) number of clusters. Therefore, to exploit these large shifts in compactness, the maximum compactness of the clusters should be measured. The relative changes of the compactness and separability are somewhat similar when c ≠ c* (see Fig. 1); therefore, their effects on a ratio-type validity index should be somewhat similar, i.e., they should both be increasing/decreasing functions at the same c values. Hence, in (Kim and Ramakrishna, 2005), an improved version of the XB validity index is proposed as follows:

$$XB^*(c) = \frac{\max_{i=1,\ldots,c}\left(\sum_{k=1}^{nd}\mu_{ik}^2\,\|x_k - v_i\|^2/nd\right)}{\min_{i,j\ne i}\,\|v_i - v_j\|^2} \qquad (7)$$

Fig. 1. Compactness and separation concepts of ratio-type CVI index (Kim and Ramakrishna, 2005). [Figure: compactness and separability plotted against the number of clusters, both minimized near the optimal c.]

The XB* index has been proven to be more effective than the XB index, because XB* can detect the clusters with large compactness, which allows c* to be determined by observing these ambiguities in the clustering structure. Hence, XB* will be the starting point of our new cluster validity formula for IFC (Celikyilmaz and Türkşen, in press, submitted for publication).
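For reference, Eqs. (6) and (7) differ only in whether the per-cluster compactness is averaged or maximized; a compact sketch (ours, assuming mu of shape (c, nd), data x of shape (nd, nv) and centers v of shape (c, nv)):

```python
import numpy as np

def xb_and_xb_star(x, mu, v):
    """XB (Eq. 6) and XB* (Eq. 7): summed vs. worst-cluster compactness,
    both over the minimum squared distance between cluster centers."""
    d2 = ((x[None, :, :] - v[:, None, :]) ** 2).sum(axis=2)   # (c, nd)
    comp = (mu ** 2 * d2).sum(axis=1) / x.shape[0]            # per cluster
    cc = ((v[None, :, :] - v[:, None, :]) ** 2).sum(axis=2)   # center dists
    sep = cc[~np.eye(len(v), dtype=bool)].min()
    return comp.sum() / sep, comp.max() / sep                 # XB, XB*
```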

On the other hand, Kung and Lin (2004) formulated a validity index to validate FCRM (Hathaway and Bezdek, 1993) type clustering approaches. The objective of FCRM is to identify C-regression model parameters and a partition matrix, which is interpreted as the importance or weight attached to the measured error between the actual output and each regression model output of a system with multiple models. Their CVI is based on the XB index; however, the compactness is measured by the regression model fit between the output of each regression model and the actual output. The separability is measured by the inverse dissimilarity between the clusters, defined by the absolute value of the standard inner product of the unit normal vectors representing the c hyper-planes. The Kung and Lin (2004) CVI is formulated as follows:

$$\text{Kung–Lin}(c) = \frac{\left(\sum_{i=1}^{c}\sum_{k=1}^{nd}\mu_{ik}^2\,(x_k^{\mathrm T}h_i - y_k)^2\right)/nd}{1\big/\max_{i\ne j}|\langle u_i, u_j\rangle|}, \qquad h_i = (x^{\mathrm T}\mu_i x)^{-1}\,x^{\mathrm T}\mu_i y \qquad (8)$$

The numerator is the compactness measure and the denominator represents the separability. The u_i denote the unit normal vectors of the c regression functions; for the normal vectors n_i they are defined as:

$$u_i = \frac{n_i}{\|n_i\|}, \qquad n_i = [h_{i1}\;\cdots\;h_{i,nv}\;\; {-1}] \in \mathbb{R}^{nv+1} \qquad (9)$$

Here n_i contains the regression function parameters, ‖·‖ is the Euclidean norm, and nv is the number of variables of the input dataset, x = [x_1, …, x_nv]. Kung and Lin (2004) use the inner product of the unit vectors of two clusters, which equals the cosine of the angle between them, to measure the separability of the c regression functions. When the functions are orthogonal, the separability is maximized.

3.2. Two new CVIs for IFC and IFC-C

Two new ratio-type CVIs are proposed to validate the supervised IFC (Celikyilmaz and Türkşen, in press) and its extension for classification domains, IFC-C (Celikyilmaz and Türkşen, submitted for publication), and to find the optimum number of clusters, c*. In IFC, the clusters are identified jointly by cluster prototypes (centers) and their corresponding functions. Membership values calculated with IFC are also used as candidate input variables, which help to identify a decision function, along with the original input variables, to model the output variable for each prototype. Due to the coupling of two concepts in one clustering algorithm, i.e. clustering and regression/classification, the new validity index should be able to validate both concepts.

The compactness of the new validity measures proposed in this paper, cviFF for IFC and cviFF-C for IFC-C, combines two terms. We use the XB* compactness (the numerator in (7)) as the first term, and a modified version of the compactness of the Kung–Lin index (the numerator in (8)) as the second term. The second term of the compactness of cviFF represents the error between the actual output and the output estimated from the "fuzzy functions", f(x, s_i) (either scalar outputs, or posterior probabilities for classification models), which are approximated using the membership values and the original input variables as explanatory variables. The separability of cviFF couples the angle between the decision surfaces of the FFs and the distance between each input–output vector and the cluster centers. Herein, we used simple function optimizers: LSE for IFC and LR for IFC-C.

Calculation of the cviFF or cviFF-C values proceeds as follows: (i) a different dataset is structured for every cluster i by using the membership values (μ_i) and/or their transformations as additional dimensions, i.e. x ∈ R^nv → Φ_i(x, μ_i) ∈ R^{nv+nm}, where nm is the number of membership value transformations used as additional dimensions of the decision surface; (ii) we fit a linear or a non-linear function using each dataset Φ_i(x, μ_i) and measure the new CVI designed for IFC as follows:

$$v_c^* = \max_{i=1,\ldots,c}\;\frac{1}{nd}\sum_{k=1}^{nd}\mu_{ik}^m\Big(\|(x_k,y_k)-v_i\|^2 + \big(y_k - f_i(\Phi_i(x,\mu_i),\hat{w}_i)\big)^2\Big)$$

$$v_s^* = \begin{cases}\min_{i,j\ne i}\big(\|v_i-v_j\|^2\big/|\langle a_i,a_j\rangle|\big), & \text{if } |\langle a_i,a_j\rangle|\ne 0,\; i,j=1,\ldots,c,\; j\ne i\\ \min_{i,j\ne i}\|v_i-v_j\|^2, & \text{otherwise}\end{cases}$$

$$cviFF = \frac{v_c^*}{(c\cdot v_s^*)+1} \qquad (10)$$

where v_c* represents the compactness and v_s* the separability in the new validity measure. Let n_{Φi} = [b_{i1} b_{i2} ⋯ b_{i,nm} b_{i,nm+1} ⋯ b_{i,nm+nv} −1] ∈ R^{nv+nm+1} represent the normal vector of the fuzzy function obtained from the dataset in the feature space Φ_i(x, μ_i) ∈ R^{nv+nm}. The a_i in |⟨a_i, a_j⟩| ∈ [0, 1] denotes the unit normal vector of fuzzy function i, a_i = n_{Φi}/‖n_{Φi}‖. The absolute value of the inner product of the unit normal vectors of the fuzzy functions of two clusters, i, j = 1, …, c, i ≠ j, equals the cosine of the angle between them:

$$\cos\theta_{i,j} = \frac{\langle n_{\Phi_i}, n_{\Phi_j}\rangle}{\|n_{\Phi_i}\|\,\|n_{\Phi_j}\|} = \langle a_i, a_j\rangle$$
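As a concrete reading of Eq. (10), the sketch below (ours, with assumed array shapes; the squared fuzzy-function errors are passed in, since the fitting happens inside IFC) scores one clustering solution. The +1 guard in the denominator, motivated in the next paragraph, keeps the index finite when two centers nearly coincide.

```python
import numpy as np

def cvi_ff(xy, mu, v, n_phi, f_err2, m=2.0):
    """cviFF per Eq. (10). xy: (nd, nv+1) joint input-output vectors;
    v: (c, nv+1) centers; n_phi: (c, dim) fuzzy-function normal vectors;
    f_err2: (c, nd) squared errors (y_k - f_i)^2 of the fitted functions."""
    c, nd = mu.shape
    # Compactness vc*: the worst cluster's weighted distance-plus-error mean.
    d2 = ((xy[None, :, :] - v[:, None, :]) ** 2).sum(axis=2)  # (c, nd)
    vc = ((mu ** m) * (d2 + f_err2)).sum(axis=1).max() / nd
    # Separability vs*: center distance, scaled by 1/|cos(angle)| between
    # decision surfaces whenever the surfaces are not orthogonal.
    a = n_phi / np.linalg.norm(n_phi, axis=1, keepdims=True)  # unit normals
    vs = np.inf
    for i in range(c):
        for j in range(i + 1, c):
            dist2 = ((v[i] - v[j]) ** 2).sum()
            cos_ij = abs(a[i] @ a[j])
            vs = min(vs, dist2 / cos_ij if cos_ij > 0 else dist2)
    return vc / (c * vs + 1.0)                                # +1 guard
```

Lower values are better; scanning c and taking the minimizer mirrors how cviFF is used in Section 4.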

When two cluster centers are too close to each other, due to an oversized number of clusters, the distance between them becomes almost invisible (≅ 0) and the validity measure goes to infinity. To prevent this, the denominator of cviFF in Eq. (10) is increased by 1. In cviFF-C for the IFC-C model, the FF term in v_c*, f_i(Φ_i(x, μ_i), ŵ_i), is replaced with the posterior probabilities obtained from the classifier models as follows:

$$v_c^* = \max_{i=1,\ldots,c}\left\{\frac{\sum_{k=1}^{nd}\mu_{ik}^m\Big(\|(x_k,y_k)-v_i\|^2 + \big(y_k - \hat{P}_i(y=1\mid f(\Phi_i(x,\mu_i),\hat{w}_i))\big)^2\Big)}{nd}\right\} \qquad (11)$$

The compactness measure, v_c*, of the new validity criteria introduces an additional term: the error of the fuzzy function models. v_c* still shares a similar behaviour with the compactness of the XB* index, as shown in Fig. 1, for the following reasons. When c > c*, the compactness of the clusters will be small. This is because, as the number of clusters is increased, within-cluster distances decrease, since clusters include objects that are more similar. In addition, one regression function is estimated for each cluster; as the number of functions is increased, the error of the fuzzy functions decreases, because the regression model output approaches the actual output. The compactness will be zero when c = nd, which is when every object becomes its own cluster center and one function passes through every object in the dataset. When c < c*, the clusters will group dissimilar objects together, which increases the first term of the compactness in cviFF. Since there will be fewer functions than the actual number of models, the deviation between the actual and estimated outputs will be high. Therefore, if the number of clusters is less than c*, the compactness will be high. The compactness decreases as the number of clusters converges to c*; from c = c* to c = nd, it gradually converges to zero.

As in XB*, the new validity indices cviFF and cviFF-C, designed for IFC and IFC-C, respectively, also account for the fact that averaging the compactness of individual clusters could suppress the effect of clusters with high compactness when the number of clusters is small. Hence, to identify the sudden changes, one should examine the maximum compactness instead of the average.

For the separability, between-cluster distances are represented by Euclidean distances between the cluster centers, as in Eq. (10). Additionally, the absolute value of the cosine of the angle between each pair of fuzzy functions, |⟨a_i, a_j⟩| ∈ [0, 1], is used as an additional separability criterion. If the functions are orthogonal, they are maximally dissimilar, which is an attribute of an optimum model. The separability (the denominator of cviFF in Eq. (10)) conditionally combines between-cluster distances and angles by taking the ratio between them. If the angle between any two functions is zero, i.e. they are parallel, the separability is represented by the minimum distance between their clusters. The larger the separability, the better the clustering result.

The validity of the separability can be explained by considering a clustering structure as shown in Fig. 2. When two clusters are far apart from each other, i.e. the distance

between their cluster centers is larger than for the rest of the clusters, e.g. cluster 1 (△) and cluster 2 (•), the separability will be large no matter what the angle between the functions is. The separability gets larger still if the vector lines are orthogonal, i.e. the cosine of the angle approaches zero as the angle approaches 90°.

Fig. 2. Four different functions with varying separability relations. [Figure: four lines labelled 1–4 over x ∈ [−5, 5], y ∈ [−20, 50].]

On the other hand, when clusters are very close to each other, e.g. cluster 2 (•), cluster 3 (*) and cluster 4, the angle between their functions is the dominant separability identifier. If the vectors are close to orthogonal, i.e. the cosine of the angle between them is very close to 0, as for cluster 3 and cluster 4, then the minimum separability will be very large even if the distance between the centers is close to zero. When two cluster centers are close to each other and their functions are almost parallel, i.e. the cosine of the angle is close to 1, e.g. cluster 2 (•) and cluster 3 (*), the separability will be small.

4. Analysis of experiments

This section presents the simulation results from the application of cviFF and cviFF-C to the results of IFC and IFC-C, respectively, using artificial and real datasets with different structures.

4.1. Experiment 1

The best way to demonstrate the performance of a validity index is to test it on datasets whose structure is known.

We introduce a dataset structure similar to Kung and Lin (2004), but we use four different datasets containing different numbers of linear models. Each dataset is created using a different number of functions with Gaussian noise, e_{l,m} (l = 1, …, 4, m = 1, …, nf), having zero mean and a variance of 0.9 in each function; nf denotes the number of functions used to generate dataset l, l = 1, …, 4. First, 400 training input vectors, x, uniformly distributed in the range [−5, +5], are generated randomly for the first dataset. The dataset is then split into 4 separate groups of 100 observations, and each function from Table 1A is applied to one group to obtain the output values, so that the 4-clustered dataset comprises four separate models. Following the same convention, we generated 500, 490 and 450 further training vectors, x, also uniformly distributed in [−5, +5], to build the datasets with 5, 7 and 9 patterns using the corresponding functions in Table 1B, C, and D: their groups of 100, 70 and 50 observations were applied to the 5, 7, and 9 functions, respectively, to form three more single-input single-output datasets, dataset2, dataset3, and dataset4. The scatter diagrams of the four datasets are shown in Fig. 3.

We applied the IFC algorithm (Celikyilmaz and Türkşen, in press) to these four datasets separately, using two different degrees of fuzziness, m = 1.3 and m = 2.0, and 14 different numbers of clusters, c = 2, …, 15, and obtained the membership matrix for each combination of c and m. We calculated the values of the proposed cviFF for each combination. To show the effectiveness of cviFF, using the same membership matrix from IFC, we also calculated the values of the XB, XB*, and Kung–Lin CVIs. Fig. 4 illustrates the validity measures calculated from the membership values of IFC applied on dataset1 for changing values of the number of clusters, c, for the two fuzziness values, m = 1.3 and 2.0; m = 1.3 represents a more crisp model where the overlap of clusters is negligible, while m = 2.0 is a model where the local fuzzy clusters (models) overlap to a degree of 2.0. Figs. 5–7 compare the different CVI measures of the IFC algorithm applied on dataset2, dataset3 and dataset4, respectively. It is thus expected that c* is 4, 5, 7, and 9 for dataset1, dataset2, dataset3 and dataset4, respectively.

Table 1. Functions used to generate the artificial datasets

(A) 4-cluster (dataset1):
y1 = x1ᵀb1 = 2x + 5 + e1,1
y2 = x2ᵀb2 = 0.2x − 5 + e1,2
y3 = x3ᵀb3 = x + 1 + e1,3
y4 = x4ᵀb4 = x − 8 + e1,4

(B) 5-cluster (dataset2):
y1 = x1ᵀb1 = 2x + e2,1
y2 = x2ᵀb2 = 7x − 5 + e2,2
y3 = x3ᵀb3 = x + 1 + e2,3
y4 = x4ᵀb4 = x + 14 + e2,4
y5 = x5ᵀb5 = 5x − 6 + e2,5

(C) 7-cluster (dataset3):
y1 = x1ᵀb1 = 2x + e3,1
y2 = x2ᵀb2 = 7x − 5 + e3,2
y3 = x3ᵀb3 = x + 1 + e3,3
y4 = x4ᵀb4 = x − 14 + e3,4
y5 = x5ᵀb5 = 5x − 6 + e3,5
y6 = x6ᵀb6 = −6 + e3,6
y7 = x7ᵀb7 = 3x − 25 + e3,7

(D) 9-cluster (dataset4):
y1 = x1ᵀb1 = x + e4,1
y2 = x2ᵀb2 = 3x + 2 + e4,2
y3 = x3ᵀb3 = 0.5x + 1 + e4,3
y4 = x4ᵀb4 = 3x − 3 + e4,4
y5 = x5ᵀb5 = x + 2 + e4,5
y6 = x6ᵀb6 = 2x − 9 + e4,6
y7 = x7ᵀb7 = 2x + e4,7
y8 = x8ᵀb8 = x − 0.5 + e4,8
y9 = x9ᵀb9 = x + 0.2 + e4,9
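For instance, dataset1 can be regenerated along the lines of Table 1A with a few lines of NumPy (our sketch; the seed and the contiguous grouping of the 400 inputs are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, 400)                     # 400 uniform training inputs
funcs = [lambda t: 2 * t + 5, lambda t: 0.2 * t - 5,
         lambda t: t + 1, lambda t: t - 8]      # the four lines of Table 1A
y = np.empty_like(x)
for g, f in enumerate(funcs):                   # four groups of 100 samples
    s = slice(100 * g, 100 * (g + 1))
    y[s] = f(x[s]) + rng.normal(0.0, np.sqrt(0.9), 100)  # variance-0.9 noise
```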

Fig. 3. Input–output graphs of the four different datasets. [Scatter panels: 4-cluster, 5-cluster, 7-cluster and 9-cluster; x1 on the horizontal axis, y on the vertical axis.]

Fig. 4. Cluster validity measures, XB, XB*, Kung–Lin and cviFF, versus c for two m values; m = 1.3 (·) and m = 2.0 (*), using the four-patterned dataset.

4.2. Experiment 2

To evaluate the performance of the proposed cviFF measure and strengthen our findings, we used the results from the application of IFC (Celikyilmaz and Türkşen, in press) on a real dataset of historical stock prices of a major Canadian financial institution, estimating its last 2 months of stock prices from the prior 10 months of prices.

Fig. 5. Cluster validity measures, XB, XB*, Kung–Lin and cviFF, versus c for two m values; m = 1.3 (·) and m = 2.0 (*), using the five-patterned dataset.

Fig. 6. Cluster validity measures, XB, XB*, Kung–Lin and cviFF, versus c for two m values; m = 1.3 (·) and m = 2.0 (*), using the seven-patterned dataset.

In real datasets, just as in artificial datasets, there are hidden components within which input–output relations can be individually defined with FSMIFF strategies. Fig. 8 illustrates Gaussian density graphs of the stock price dataset using two financial indicators; two separate components can easily be noticed in the graph. The degree of overlap between these components might affect the number of clusters perceived by a human operator. We expect to distinguish any hidden overlapping clusters in real datasets using the new cviFF or cviFF-C, based on the structure of the system.

The stock prices collected over 12 months are divided into two parts. Data from approximately the first 10 months, i.e. from 27 July 2005 to 11 May 2006, is used to train the models and to optimize the model parameters.

Fig. 7. Cluster validity measures, XB, XB*, Kung–Lin and cviFF, versus c for two m values; m = 1.3 (·) and m = 2.0 (*), using the nine-patterned dataset.

Fig. 8. Density graph of the stock price dataset using two financial indicators (20-day Bollinger Band and 5-day EMA). Two well-separated components, potential clusters #1 and #2, are indicated.

The last 2 months, i.e. from 11 May to 21 July 2006, are held out for testing model performance. Experiments were repeated with 20 random subsets of the above sizes. Model performance, measured by root-mean-square error (RMSE) on the holdout dataset, is averaged over the 20 repetitions. Financial indicators such as the moving average and the exponential moving average are used as predictors of the stock price.

The optimum parameters of IFC are determined from the best FSMIFF model, i.e. the one with the least RMSE, via a grid search over changing values of c ≤ nd·(1/10), where nd is the number of training samples. Membership values, μ*, their logit transformations, i.e. log((1 − μ*)/μ*), and exponential transformations, i.e. exp(μ*), are used as additional dimensions to approximate the systems of multiple models with FFs.
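Forming those extra dimensions is a one-liner per transformation; a small helper (ours, with an assumed clipping guard so the logit stays finite):

```python
import numpy as np

def membership_features(mu, eps=1e-6):
    """Candidate predictors from memberships: mu, log((1-mu)/mu), exp(mu)."""
    mu = np.clip(mu, eps, 1 - eps)              # guard the logit at 0 and 1
    return np.column_stack([mu, np.log((1 - mu) / mu), np.exp(mu)])
```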


The average optimum c, c*, of the best FSMIFF models over the 20 repetitions was found to be c* = 4.9 ± 1.77. In summary, the stock price dataset is estimated to consist of c* ∈ [3, 6] structures. To validate this statement, we calculated the values of the proposed validity index, cviFF, and of the XB, XB* and Kung–Lin indexes for each combination, and plotted the validity index values for different c values using the two fuzziness values, m = 1.3 and 2.0. Fig. 9 compares the values of

the different CVIs using membership values from the IFC algorithm applied on the stock price dataset.

4.3. Experiment 3

In addition, we used the Ionosphere dataset from the UCI repository (Hettich and Bay, 1999) to demonstrate the performance of the new cviFF-C, this time using a classification dataset.

Fig. 9. Stock price dataset – cluster validity measures, XB, XB*, Kung–Lin and cviFF, versus c for two m values; m = 1.3 (·) and m = 2.0 (*).

Fig. 10. Ionosphere dataset – cluster validity measures, XB, XB*, Kung–Lin and cviFF-C, versus c for two m values; m = 1.3 (·) and m = 2.0 (*).


The list of parameters used to model the stock price dataset in the previous experiment is also used to build the different models with the FSMIFF strategy (Celikyilmaz and Türkşen, submitted for publication) on the Ionosphere dataset, using a three-way data-split cross-validation method to estimate the classification error rate of the final models. The experiment is repeated 10 times. The calculated average c* is found to be c* = 3.5 ± 0.92, based on the highest classification accuracies among the 10 repetitions. Hence, to validate this c*, we measured the values of cviFF-C for the classification case, and of the XB, XB* and Kung–Lin validity indexes, using the same dataset and averaged over the 10 repetitions. Fig. 10 demonstrates the results for the two fuzziness values, m = 1.3 and 2.0.

5. Discussions

We draw the following conclusions from the application of the new cluster validity functions, cviFF as well as the XB, XB* and Kung–Lin CVI measures, to the outcome of IFC using four different artificial datasets and a real dataset, and additionally of cviFF-C to the outcome of IFC-C using a real classification dataset.

5.1. Experiment 1

Table 2 displays the optimum number of clusters indicated by each validity measure for the four artificial datasets and two fuzziness levels, m = 1.3 and m = 2.0. The results (Table 2) indicate that the new index can successfully separate the different models in a given dataset. The new validity measure is not affected by the changing values of the fuzziness parameter of the IFC algorithm, i.e. m = 1.3 and m = 2.0; it converges at the optimum number of clusters and then asymptotes to zero as the number of clusters grows toward nd. In all four datasets, the new validity criterion is around its minimum at c*, and the validity measure converges to zero for c > c*.

Table 2. Optimum number of clusters of the artificial datasets for m ∈ {1.3, 2.0}

Actual # of clusters →    4       5       7         9

m = 1.3
  XB                      2–6     2–7     5         2, 5, 8
  XB*                     2–8     2–9     10        4
  Kung–Lin                n/a     n/a     8         6
  cviFF                   4       5       7         9

m = 2.0
  XB                      2–6     2–6     3, 5, 6   4, 9
  XB*                     2–8     10      8         9–10
  Kung–Lin                8, 13   n/a     6, 7      6
  cviFF                   4       5, 6    7         9

The proposed cviFF is expected to show asymptotic behaviour toward larger numbers of clusters, whereas the XB* validity index can increase or decrease, its c* being where the index is minimum. When the actual number of clusters is small, as for dataset1 and dataset2, XB* cannot identify c*. For larger c values, e.g. dataset3 in Fig. 6, XB* can more closely identify the actual number of clusters. We conclude that when the number of different components in the dataset is large, XB* can validate IFC clustering applications to a certain degree; when the system has fewer models, XB* is not the best CVI for the IFC algorithm.

The Kung and Lin (2004) CVI asymptotes to zero, and the number of clusters c indicates c* at the knee-point where the index starts to asymptote to zero. From the CVI graphs of the four artificial datasets in Figs. 4–7, the Kung–Lin index is unable to identify the optimum number of clusters for the small-clustered datasets and is only somewhat capable of identifying it for the datasets with larger numbers of clusters.

In (Kim and Ramakrishna, 2005), the inefficiency of the XB index for identifying c* in FCM clustering is proven with examples. We tested whether it can identify c* for the IFC application as well; the minimum of XB indicates c*. The XB index was unable to identify c* in most of the datasets.

5.2. Experiment 2

From the application of the enhanced FSM using IFC for structure identification on the real stock price dataset (Celikyilmaz and Türkşen, in press), the optimum number of clusters was obtained as c* ∈ [3, 6], based on a 20-repetition three-way cross-validation method. Hence, we applied the proposed cviFF function and averaged over the 20 repetitions. Fig. 9 depicts the CVI values for the four validity measures. The proposed cviFF indicates that c* is around 4 or 6 for the two m values. The Kung–Lin index can also validate c*, but only for crisper models, viz. m = 1.3. In addition, the XB index can somewhat identify c* within the interval c* ∈ [4, 8]. Among the four CVI measures, cviFF has the closest values to the actual c*.

5.3. Experiment 3

In this experiment, we wanted to demonstrate that the new cviFF-C, specifically designed for classification type datasets, can identify the optimum number of clusters of a real dataset, viz. the Ionosphere dataset from the UCI repository. The optimum number of clusters based on the best classification accuracy from a grid search over the classifier parameters is c* = 3.5 ± 0.92. Fig. 10 demonstrates that the new cviFF-C is able to identify the overlapping clusters; the elbow indicates that the optimum c* should be 4. None of the other CVI measures were able to confidently identify c* for changing values of fuzziness.


6. Conclusion

In this paper, two new cluster validity criteria are introduced for the validation of the previously proposed improved fuzzy clustering (IFC) algorithm. Given fuzzy partitions with different input–output relations, the proposed validity index, cviFF, computes two different clustering (dis)similarities: compactness and separability. The best CVI value is obtained as the ratio between the maximum compactness and the minimum separability. The new index gradually asymptotes to its minimum after it reaches an elbow, which is the indicator of the optimum number of clusters. The performance of the proposed index was tested on various datasets, demonstrating its validity, effectiveness and reliability. The simulation results indicate that, for different m values, the new validity index was more robust than the other validity indices on artificial datasets of hyper-plane structures, as well as on real datasets with general structures.

Acknowledgements

We gratefully thank the anonymous reviewers for their many helpful and constructive comments and suggestions. This work has been supported by research grants from the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

Bezdek, J.C., 1976. Cluster validity with fuzzy sets. J. Cybernetics 3, 58–72.
Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.
Bouguessa, M., Wang, S., Sun, H., 2006. An objective approach to cluster validation. Pattern Recognition Lett. 27, 1419–1430.
Celikyilmaz, A., Türkşen, I.B., 2007. Fuzzy functions with support vector machines. Inform. Sciences 177, 5163–5177.
Celikyilmaz, A., Türkşen, I.B., in press. Enhanced fuzzy system models with improved fuzzy clustering algorithm. IEEE Trans. Fuzzy Systems.

Celikyilmaz, A., Türkşen, I.B., submitted for publication. Increasing accuracy of two-class pattern recognition with enhanced fuzzy functions. Expert Systems Appl.
Cimino, M.G.C.A., Lazzerini, B., Marcelloni, F., 2006. A novel approach to fuzzy clustering based on a dissimilarity relation extracted from data using TS system. Pattern Recognition 39, 2077–2091.
Dave, R.N., 1996. Validating fuzzy partitions obtained through C-shells clustering. Pattern Recognition Lett. 17, 613–623.
Fukuyama, Y., Sugeno, M., 1989. A new method of choosing the number of clusters for the fuzzy C-means method. In: Proc. 5th Fuzzy Systems Symposium, pp. 247–250.
Hathaway, R.J., Bezdek, J.C., 1993. Switching regression models and fuzzy clustering. IEEE Trans. Fuzzy Systems 1 (3), 195–203.
Hettich, S., Bay, S.D., 1999. The UCI KDD Archive []. Department of Information and Computer Science, University of California, Irvine, CA.
Höppner, F., Klawonn, F., 2003. Improved fuzzy partitions for fuzzy regression models. Internat. J. Approx. Reason. 32, 85–102.
Kim, M., Ramakrishna, R.S., 2005. New indices for cluster validity assessment. Pattern Recognition Lett. 26, 2353–2363.
Kim, D.-W., Lee, K.H., Lee, D., 2003. Fuzzy cluster validation index based on inter-cluster proximity. Pattern Recognition Lett. 24, 2561–2574.
Kung, C.-C., Lin, C.-C., 2002. Fuzzy C-regression model with a new cluster validity criterion. IEEE World Congr. Comput. Intell. 2, 499–504.
Kung, C.-C., Lin, C.-C., 2004. A new cluster validity criterion for fuzzy C-regression model and its application to T-S fuzzy model identification. In: Proc. IEEE Internat. Conf. on Fuzzy Systems, vol. 3, pp. 1673–1678.
Pal, N.R., Bezdek, J.C., 1995. On cluster validity for the fuzzy C-means model. IEEE Trans. Fuzzy Systems 3 (3), 370–379.
Pedrycz, W., 2004. Fuzzy clustering with knowledge-based guidance. Pattern Recognition Lett. 25 (4), 469–480.
Türkşen, I.B., in press. Fuzzy functions with LSE. Appl. Soft Comput.
Türkşen, I.B., Celikyilmaz, A., 2006. Comparison of fuzzy functions with fuzzy rule base approaches. Internat. J. Fuzzy Systems (IJFS) 8 (3), 137–149.
Wu, K.-L., Yang, M.-S., 2005a. A cluster validity index for fuzzy clustering. Pattern Recognition Lett. 26 (9), 1275–1291.
Wu, K.-L., Yu, J., Yang, M.-S., 2005b. A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests. Pattern Recognition Lett. 26 (5), 639–652.
Xie, X.L., Beni, G., 1991. A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Machine Intell. 13 (8), 841–847.
Zadeh, L.A., 1965. Fuzzy sets. Inform. Control 8, 338–353.