Interval-valued fuzzy set approach to fuzzy co-clustering for data classification


ARTICLE IN PRESS
JID: KNOSYS [m5G;June 20, 2016;16:1]

Knowledge-Based Systems 000 (2016) 1–13

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

Van Nha Pham a,b, Long Thanh Ngo a,∗, Witold Pedrycz c,d,e

a Department of Information Systems, Faculty of Information Technology, Le Quy Don Technical University, 236 Hoang Quoc Viet, Hanoi, Vietnam
b MIST Institute of Science and Technology, 17 Hoang Sam, Hanoi, Vietnam
c Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6R 2V4 AB, Canada
d Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
e Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

ARTICLE INFO

Article history:
Received 18 October 2015
Revised 23 May 2016
Accepted 25 May 2016
Available online xxx

Keywords:
Fuzzy clustering
Fuzzy co-clustering
Interval type-2 fuzzy sets
Interval-valued fuzzy sets
Data classification

ABSTRACT

Data clustering is aimed at discovering structure in data. The revealed structure is usually represented in terms of prototypes and partition matrices. In some cases, the prototypes are formed simultaneously from data and features by running a co-clustering (bi-clustering) algorithm. Interval-valued fuzzy clustering exhibits advantages when handling uncertainty. This study introduces a novel clustering technique that combines a fuzzy co-clustering approach with interval-valued fuzzy sets, in which two values of the fuzzifier of the fuzzy clustering algorithm are used to form the footprint of uncertainty (FOU). The study demonstrates the performance of the proposed method through a series of experiments completed for various datasets (including color segmentation, multi-spectral image classification, and document categorization). The experiments quantify the quality of the results with the aid of validity indices and visual inspection. Some comparative analysis is also covered.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Data clustering concerns unsupervised learning, in which we partition data into groups or clusters on the basis of the similarity of their features. Techniques that simultaneously cluster both data and their features are referred to as co-clustering algorithms. The applicability and quality of clusters can be improved by augmenting co-clustering with the concepts and techniques of fuzzy sets; the resulting techniques are called fuzzy co-clustering. Fuzzy co-clustering follows the same principle as general co-clustering; the difference is that the boundary between any two clusters is described in terms of membership functions rather than characteristic functions. For example, in document classification, fuzzy co-clustering allows any document and word to belong to more than a single co-cluster. Fuzzy co-clustering is thus suitable for clustering complex data that are multi-dimensional, multi-feature, and large in size. Bezdek et al. [1] introduced Fuzzy C-Means (FCM) clustering, which has become one of the most commonly used methods of fuzzy clustering. Salehi et al. [35] proposed a synergistic combination model based on Particle Swarm Optimization and fuzzy sets to

∗ Corresponding author.
E-mail addresses: [email protected] (V.N. Pham), [email protected] (L.T. Ngo), [email protected] (W. Pedrycz).

cope with uncertainty, optimize the seed points, and expand the hyper-boxes used in granular computing. Bhoyar et al. [2] proposed a modified FCM for color image segmentation using a just-noticeable-difference histogram. Hu et al. [36] proposed a hierarchical cluster ensemble model based on knowledge granulation, providing a new way to deal with the cluster ensemble problem together with an ensemble-learning application of knowledge granulation. Recently, type-2 fuzzy sets have been studied and combined with clustering techniques to enhance the ability of these methods to capture and quantify uncertainty [3–5]. Furthermore, interval type-2 fuzzy sets have been applied to many problems, such as land cover classification [6] and color image segmentation [7]. Yeh et al. [8] applied interval type-2 fuzzy sets to data-based system modeling by combining a type-2 fuzzy neural network with a hybrid learning algorithm. Mau et al. [9] proposed an interval type-2 fuzzy subtractive clustering approach to obstacle detection in robot vision using an RGB-D camera. Melin and Castillo [10] presented studies and applications of type-2 fuzzy clustering to segmentation, classification, and pattern recognition. Nguyen et al. [32] proposed a genetic type-2 fuzzy C-means clustering approach, which was developed and applied to the segmentation and classification of M-FISH images. Ngo et al. [30] exploited local spatial information between a pixel and its neighbors to compute the membership degree using an interval type-2 fuzzy clustering algorithm. Nguyen et al. [31] proposed

http://dx.doi.org/10.1016/j.knosys.2016.05.049 0950-7051/© 2016 Elsevier B.V. All rights reserved.

Please cite this article as: V.N. Pham et al., Interval-valued fuzzy set approach to fuzzy co-clustering for data classification, KnowledgeBased Systems (2016), http://dx.doi.org/10.1016/j.knosys.2016.05.049


kernel interval-valued Fuzzy C-Means clustering (KIFCM) and multiple-kernel interval-valued Fuzzy C-Means clustering (MKIFCM), which were built on the basis of kernel learning methods and interval-valued fuzzy sets with the intent of overcoming some drawbacks of the "conventional" Fuzzy C-Means (FCM) algorithm. Clustering of interval-valued data has been considered following various approaches [38,39]: a fuzzy k-Means has been proposed using adaptive quadratic distances obtained by optimizing an adequacy criterion [38], while another clustering approach [39] for interval-valued data is based on fuzzy c-ordered medoids, using Huber's M-estimators and Yager's ordered weighted averaging to make it robust to outliers. Xu et al. [37] also referred to interval-valued fuzzy sets by considering a weak transitivity index to quantify the transitivity consistency degree of this relation. Dang et al. [33] proposed methods involving interval type-2 fuzzy sets to realize collaborative clustering. Note that an interval-valued fuzzy set is the special case of an interval type-2 fuzzy set in which the interval Jx is a closed interval.

Most co-clustering algorithms are used to deal with dyadic data, e.g., document-word co-occurrence frequencies. Co-clustering is an alternative clustering method for data exhibiting complex structures, such as web data, multi-spectral images, hyper-spectral images, and the like. A number of pertinent studies on co-clustering have been reported in [11–14]. The first framework of fuzzy co-clustering was proposed by Honda as the fuzzy co-clustering model (FCCM) [15] for clustering multi-dimensional, multi-feature data. FCCM is an FCM-type co-clustering model whose goal is to extract co-clusters of objects and features from co-occurrence matrices. Kummamuru et al. [16] proposed an improved fuzzy co-clustering algorithm for clustering documents and keywords (Fuzzy CoDoK). However, Fuzzy CoDoK is vulnerable to outliers because of the algorithm's sole reliance on its fuzzy C-means-like nature. Tjhi and Chen [17] proposed an improved fuzzy co-clustering algorithm for clustering documents and words, imposing standard partition-like requirements to generate fuzzy word clusters that capture the natural distribution of words, which may be beneficial for further information retrieval. Tjhi and Chen [18] also developed a possibilistic fuzzy co-clustering algorithm (PFCC) for automatic categorization of large collections of documents; PFCC integrates a possibilistic document clustering technique and a combined formulation of fuzzy word ranking and partitioning into a fast iterative co-clustering procedure. The key idea behind HFCR [19] is a dual-partitioning approach for fuzzy co-clustering, replacing the existing partitioning-ranking approach; HFCR adopts an efficient and practical heuristic method that can be shown to be more robust [17] than the dual-partitioning approach. Kanzawa [20] compared imputation strategies in FNM-based and RFCM-based fuzzy co-clustering. Honda et al. [21] improved the performance of recommenders that combine content-based and collaborative filtering approaches in a two-layer graph model. Tjhi et al. [22] proposed a fuzzy semi-supervised co-clustering algorithm for the categorization of large collections of web documents; in this approach, the clustering process incorporates prior knowledge, coming in the form of pairwise constraints provided by users, into the fuzzy co-clustering framework. New cluster validity measures have been proposed for finding the optimal number of clusters and verifying the quality of approaches based on fuzzy c-means [24,25] and general type-2 fuzzy c-means [34]. Hanmandlu et al. [23] established a fuzzy co-clustering model to segment color image data and assessed the quality of co-clustering with several validity indices.
In this study, an interval-valued fuzzy co-clustering algorithm is proposed by combining the advantages of fuzzy co-clustering and interval-valued fuzzy sets. The proposed algorithm, called the Interval-Valued Fuzzy Co-Clustering algorithm (IVFCoC), is aimed at solving clustering problems in the presence of complex data. IVFCoC extends fuzzy co-clustering by using two values of the fuzziness coefficient, m1 and m2, to produce the FOU corresponding to the upper and lower values of type-2 fuzzy co-clustering; the membership functions thus assume interval values. Experiments are conducted on various datasets, including color segmentation, multi-spectral image classification, and document categorization, and the quality of the experimental results is reported in terms of performance evaluation criteria, accuracy, stability, and efficiency. Some comparative analysis is also delivered.

The paper is organized as follows. Section 2 provides a brief background on fuzzy sets and the fuzzy co-clustering algorithm. Section 3 introduces the IVFCoC algorithm. Section 4 covers experiments. Section 5 includes conclusions and future work.

2. Prerequisites

Here we offer a concise overview of the essentials concerning type-2 fuzzy sets and fuzzy co-clustering.

2.1. Interval-valued fuzzy sets

Definition 1. A type-2 fuzzy set, denoted \tilde{A}, is characterized by a type-2 membership function (MF) \mu_{\tilde{A}}(x, u), where for every x \in X, J_x is a subset of [0, 1]. A type-2 MF \mu_{\tilde{A}} is a function from the set \{(x, u) \mid x \in X \text{ and } u \in J_x\} to [0, 1], i.e.,

\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\}   (1)

in which 0 \le \mu_{\tilde{A}}(x, u) \le 1 and

J_x = \{u \in [0, 1] \mid \mu_{\tilde{A}}(x, u) > 0\}   (2)

At each value of x, say x = x', the 2-D plane whose axes are u and \mu_{\tilde{A}}(x', u) is called a vertical slice of \mu_{\tilde{A}}(x, u). A secondary membership function is a vertical slice of \mu_{\tilde{A}}(x, u); it comes in the form \mu_{\tilde{A}}(x = x', u) for x' \in X and u \in J_{x'} \subseteq [0, 1], i.e.,

\mu_{\tilde{A}}(x = x', u) \equiv \mu_{\tilde{A}}(x') = \int_{u \in J_{x'}} f_{x'}(u)/u, \quad J_{x'} \subseteq [0, 1]   (3)

J_{x'} = \{u \in [0, 1] \mid \mu_{\tilde{A}}(x', u) > 0\}   (4)

in which 0 \le f_{x'}(u) \le 1.

Definition 2. Uncertainty in the primary memberships of a type-2 fuzzy set \tilde{A} consists of a bounded region that we call the footprint of uncertainty (FOU). It is the union of all primary memberships, i.e., FOU(\tilde{A}) = \bigcup_{x \in X} J_x. An upper MF and a lower MF are two type-1 MFs that bound the FOU of a type-2 fuzzy set \tilde{A}. The upper MF is associated with the upper bound of FOU(\tilde{A}) and is denoted \overline{\mu}_{\tilde{A}}(x), \forall x \in X; the lower MF is associated with the lower bound of FOU(\tilde{A}) and is denoted \underline{\mu}_{\tilde{A}}(x), \forall x \in X, i.e.,

\overline{\mu}_{\tilde{A}}(x) = \sup \{u \mid u \in [0, 1], \mu_{\tilde{A}}(x, u) > 0\}   (5)

\underline{\mu}_{\tilde{A}}(x) = \inf \{u \mid u \in [0, 1], \mu_{\tilde{A}}(x, u) > 0\}   (6)

Definition 3. Type-2 fuzzy sets \tilde{A} are called interval type-2 fuzzy sets [5] if all \mu_{\tilde{A}}(x, u) = 1. An interval type-2 fuzzy set \tilde{A} is thus characterized by a membership function \mu_{\tilde{A}}(x, u) = 1, where x \in X and u \in J_x \subseteq [0, 1], i.e.,

\tilde{A} = \{((x, u), 1) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\}   (7)
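To make the FOU concrete, consider a common textbook construction of an interval type-2 set: a Gaussian primary MF whose standard deviation varies over an interval, with the bounds below realizing Eqs. (5) and (6) for that case. This particular MF choice is illustrative only and is not taken from the paper:

```python
import math

def gaussian(x, mean, sigma):
    """Type-1 Gaussian membership function."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def fou_bounds(x, mean, sigma_lo, sigma_hi):
    """Upper and lower MFs (Eqs. (5)-(6)) of an interval type-2 set
    built from a Gaussian with uncertain width in [sigma_lo, sigma_hi]."""
    a = gaussian(x, mean, sigma_lo)
    b = gaussian(x, mean, sigma_hi)
    return min(a, b), max(a, b)     # [lower MF, upper MF] at this x

lo, hi = fou_bounds(1.5, 0.0, 0.8, 1.2)   # membership interval J_x at x = 1.5
```

At x = mean both bounds equal 1, so the FOU pinches to a point there; away from the mean the interval [lo, hi] widens, which is exactly the uncertainty the FOU encodes.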


Let

J_x = \{u \in [0, 1]\} = [\underline{\mu}_{\tilde{A}}(x), \overline{\mu}_{\tilde{A}}(x)]   (8)

where \underline{\mu}_{\tilde{A}}(x) and \overline{\mu}_{\tilde{A}}(x) are defined in Eqs. (5) and (6).

Definition 4. Let L([0,1]) denote the set of all closed subintervals of [0,1], that is, L([0,1]) = \{x = [\underline{x}, \overline{x}] \mid [\underline{x}, \overline{x}] \subseteq [0,1], \underline{x} \le \overline{x}\}. An interval-valued fuzzy set A on the universe X \neq \emptyset is a mapping A: X \to L([0,1]) such that the membership degree of x \in X is given by A(x) = [\underline{A}(x), \overline{A}(x)] \in L([0,1]), where \underline{A}: X \to [0,1] and \overline{A}: X \to [0,1] are mappings defining the lower and the upper bound of the membership interval A(x). An interval-valued fuzzy set (IVFS) is a special case of an interval type-2 fuzzy set [5].

2.2. Fuzzy co-clustering

Fuzzy co-clustering is a variant of co-clustering that generates fuzzy co-clusters, i.e., fuzzy object and feature clusters. Compared to its Boolean counterpart, fuzzy co-clustering can produce more realistic results by allowing co-clusters to overlap. Three fuzzy co-clustering algorithms in the literature are particularly relevant to this work: Fuzzy Clustering for Categorical Multivariate Data (FCCM) [15], Fuzzy Co-Clustering of Documents and Keywords (Fuzzy CoDoK) [16], and fuzzy co-clustering for color image segmentation [23]. All three share the same principle, in which co-clustering is seen as a combination of object partitioning and feature ranking.

Let

X = \{x_1, x_2, \ldots, x_i, \ldots, x_N\} \subset R^K

denote a dataset with N data objects to be partitioned into C clusters, where x_i is a K-feature data object and x_{ij} denotes the jth feature of data object i. Let P = \{p_{cj}\}, c = 1..C, j = 1..K, be the set of feature-based centroids. Various distances, such as the Minkowski, Chebychev, and Mahalanobis distances, could be considered in place of the Euclidean one. However, d_{cij} is the individual distance with respect to the jth feature between the data object x_i and the centroid p_c, so the chosen distance has to decompose over the individual variables of the data; moreover, iterative clustering algorithms are time-consuming. In this study we therefore use the Euclidean distance, a special case of the Minkowski distance, which facilitates computation and satisfies the decomposability requirement: d_{cij} is the squared Euclidean distance between x_{ij} and p_{cj}, given by

d_{cij} = \|x_{ij} - p_{cj}\|^2 = (x_{ij} - p_{cj})^2   (12)

Let u_{ci} denote the object membership of data object i in cluster c and U = [u_{ci}] be the C \times N object membership matrix; let v_{cj} denote the feature membership of feature j in cluster c and V = [v_{cj}] be the corresponding C \times K feature membership matrix. The objective function J_{FCoC}(U, V, P) is expressed in the following form:

J_{FCoC}(U, V, P) = \sum_{c=1}^{C}\sum_{i=1}^{N}\sum_{j=1}^{K} u_{ci} v_{cj} d_{cij} + T_u \sum_{c=1}^{C}\sum_{i=1}^{N} u_{ci} \log u_{ci} + T_v \sum_{c=1}^{C}\sum_{j=1}^{K} v_{cj} \log v_{cj}   (9)

in which C, N, and K are the numbers of clusters, objects, and features, respectively; u_{ci} and v_{cj} are the object and feature membership grades; and T_u, T_v are the co-clustering membership parameters. The components of the objective function (9) are determined as follows. The membership grade u_{ci} of the ith data object is expressed as

u_{ci} = \frac{\exp\left(-\sum_{j=1}^{K} v_{cj} d_{cij} / T_u\right)}{\sum_{q=1}^{C} \exp\left(-\sum_{j=1}^{K} v_{qj} d_{qij} / T_u\right)}   (10)

and the membership grade v_{cj} of the features of each data object is calculated as

v_{cj} = \frac{\exp\left(-\sum_{i=1}^{N} u_{ci} d_{cij} / T_v\right)}{\sum_{q=1}^{K} \exp\left(-\sum_{i=1}^{N} u_{ci} d_{ciq} / T_v\right)}   (11)

Generalizing FCoC with a fuzzifier parameter m yields the GFCoC algorithm, an iterative optimization that minimizes the cost function

J_{GFCoC}(U^{(m)}, V^{(m)}, P^{(m)}) = \sum_{c=1}^{C}\sum_{i=1}^{N}\sum_{j=1}^{K} u_{ci}^{m} v_{cj}^{m} d_{cij} + T_u \sum_{c=1}^{C}\sum_{i=1}^{N} u_{ci}^{m} \log u_{ci}^{m} + T_v \sum_{c=1}^{C}\sum_{j=1}^{K} v_{cj}^{m} \log v_{cj}^{m}   (13)

To establish an optimal clustering, (13) is minimized subject to the following constraints:

\sum_{c=1}^{C} u_{ci} = 1, \quad u_{ci} \in [0, 1], \; \forall i = 1..N; \qquad \sum_{j=1}^{K} v_{cj} = 1, \quad v_{cj} \in [0, 1], \; \forall c = 1..C   (14)

where T_u and T_v are the weights of the fuzzifiers; increasing T_u and T_v increases the fuzziness (overlap) of the clusters.
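The coupled updates (10)-(11) amount to two softmax normalizations over the per-feature distance matrix. A minimal NumPy sketch (array shapes and the function name `fcoc_update` are our own, not from the paper):

```python
import numpy as np

def fcoc_update(X, P, V, Tu, Tv):
    """One FCoC membership update round, following Eqs. (10)-(12).
    X: (N, K) data, P: (C, K) feature centroids,
    V: (C, K) current feature memberships.
    Returns updated object memberships U (C, N) and feature memberships (C, K)."""
    # d[c, i, j] = (x_ij - p_cj)^2, Eq. (12)
    D = (X[None, :, :] - P[:, None, :]) ** 2

    # Eq. (10): softmax over clusters of -sum_j v_cj * d_cij / Tu
    Eu = np.einsum('cj,cij->ci', V, D)
    U = np.exp(-Eu / Tu)
    U /= U.sum(axis=0, keepdims=True)          # each object's memberships sum to 1

    # Eq. (11): softmax over features of -sum_i u_ci * d_cij / Tv
    Ev = np.einsum('ci,cij->cj', U, D)
    V_new = np.exp(-Ev / Tv)
    V_new /= V_new.sum(axis=1, keepdims=True)  # each cluster's feature weights sum to 1
    return U, V_new

# toy run: 10 objects, 4 features, 3 co-clusters
rng = np.random.default_rng(0)
U, V = fcoc_update(rng.random((10, 4)), rng.random((3, 4)),
                   np.full((3, 4), 0.25), Tu=1.0, Tv=1.0)
```

Both constraints in (14) hold by construction, since each column of U and each row of V is divided by its own sum.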

3. Interval-valued fuzzy co-clustering algorithm

The interval-valued fuzzy co-clustering algorithm (IVFCoC) combines the advantages of fuzzy co-clustering and interval-valued fuzzy sets to solve clustering problems with complex data, i.e., data that are multi-dimensional, multi-feature, and large in size. IVFCoC assigns data objects to each cluster by using two interval-valued fuzzy membership functions, one with regard to the data objects and one with regard to their features. It extends the objective function (13) by using two fuzziness parameters m_1 and m_2 to form the FOU, corresponding to the upper and lower values of interval-valued fuzzy co-clustering. Using the two fuzzifiers gives two different objective functions to be minimized:

J_{m_1}(U^{(m_1)}, V^{(m_1)}, P^{(m_1)}) = \sum_{c=1}^{C}\sum_{i=1}^{N}\sum_{j=1}^{K} u_{ci}^{m_1} v_{cj}^{m_1} d_{cij} + T_u \sum_{c=1}^{C}\sum_{i=1}^{N} u_{ci}^{m_1} \log u_{ci}^{m_1} + T_v \sum_{c=1}^{C}\sum_{j=1}^{K} v_{cj}^{m_1} \log v_{cj}^{m_1}   (15)

and

J_{m_2}(U^{(m_2)}, V^{(m_2)}, P^{(m_2)}) = \sum_{c=1}^{C}\sum_{i=1}^{N}\sum_{j=1}^{K} u_{ci}^{m_2} v_{cj}^{m_2} d_{cij} + T_u \sum_{c=1}^{C}\sum_{i=1}^{N} u_{ci}^{m_2} \log u_{ci}^{m_2} + T_v \sum_{c=1}^{C}\sum_{j=1}^{K} v_{cj}^{m_2} \log v_{cj}^{m_2}   (16)

To determine the components of the objective functions, we establish the following theorem.

Theorem 1. J_{m_1} in (15) and J_{m_2} in (16) attain their local minima when the matrices U^{(m_1)} = [u_{ci}^{(m_1)}]_{C \times N}, U^{(m_2)} = [u_{ci}^{(m_2)}]_{C \times N}, V^{(m_1)} = [v_{cj}^{(m_1)}]_{C \times K}, V^{(m_2)} = [v_{cj}^{(m_2)}]_{C \times K}, P^{(m_1)} = [p_{cj}^{(m_1)}]_{C \times K}, and P^{(m_2)} = [p_{cj}^{(m_2)}]_{C \times K} satisfy the following relationships:

u_{ci}^{(m_1)} = \frac{\exp\left(-\frac{\sum_{j=1}^{K} v_{cj}^{m_1} d_{cij}}{m_1 T_u}\right)}{\sum_{p=1}^{C} \exp\left(-\frac{\sum_{j=1}^{K} v_{pj}^{m_1} d_{pij}}{m_1 T_u}\right)}, \qquad u_{ci}^{(m_2)} = \frac{\exp\left(-\frac{\sum_{j=1}^{K} v_{cj}^{m_2} d_{cij}}{m_2 T_u}\right)}{\sum_{p=1}^{C} \exp\left(-\frac{\sum_{j=1}^{K} v_{pj}^{m_2} d_{pij}}{m_2 T_u}\right)}   (17)

v_{cj}^{(m_1)} = \frac{\exp\left(-\frac{\sum_{i=1}^{N} u_{ci}^{m_1} d_{cij}}{m_1 T_v}\right)}{\sum_{q=1}^{K} \exp\left(-\frac{\sum_{i=1}^{N} u_{ci}^{m_1} d_{ciq}}{m_1 T_v}\right)}, \qquad v_{cj}^{(m_2)} = \frac{\exp\left(-\frac{\sum_{i=1}^{N} u_{ci}^{m_2} d_{cij}}{m_2 T_v}\right)}{\sum_{q=1}^{K} \exp\left(-\frac{\sum_{i=1}^{N} u_{ci}^{m_2} d_{ciq}}{m_2 T_v}\right)}   (18)

p_{cj}^{(m_1)} = \frac{\sum_{i=1}^{N} u_{ci}^{(m_1)} x_{ij}}{\sum_{i=1}^{N} u_{ci}^{(m_1)}}, \qquad p_{cj}^{(m_2)} = \frac{\sum_{i=1}^{N} u_{ci}^{(m_2)} x_{ij}}{\sum_{i=1}^{N} u_{ci}^{(m_2)}}   (19)

in which c = 1, ..., C, i = 1, ..., N, and j = 1, ..., K; C is the number of clusters, N the number of patterns, and K the number of features.

Proof. The goal of IVFCoC is to simultaneously determine the partition matrices U^{(m_1)}, U^{(m_2)} and V^{(m_1)}, V^{(m_2)} that minimize the objective functions (15) and (16) and satisfy the constraints (14). IVFCoC adopts an alternating optimization approach to minimize J_{m_1} and J_{m_2}. With the constraints of (14), the minima of the objective functions are found by forming Lagrange functions: multipliers \lambda_i^{(m_1)} and \lambda_i^{(m_2)} capture the constraints \sum_{c=1}^{C} u_{ci}^{(m_1)} = 1 and \sum_{c=1}^{C} u_{ci}^{(m_2)} = 1, and multipliers \gamma_c^{(m_1)} and \gamma_c^{(m_2)} capture the constraints \sum_{j=1}^{K} v_{cj}^{(m_1)} = 1 and \sum_{j=1}^{K} v_{cj}^{(m_2)} = 1, respectively. Thus the following Lagrange functions are obtained:

\tilde{J}_{m_1} = \sum_{c=1}^{C}\sum_{i=1}^{N}\sum_{j=1}^{K} u_{ci}^{m_1} v_{cj}^{m_1} d_{cij} + T_u \sum_{c=1}^{C}\sum_{i=1}^{N} u_{ci}^{m_1} \log u_{ci}^{m_1} + T_v \sum_{c=1}^{C}\sum_{j=1}^{K} v_{cj}^{m_1} \log v_{cj}^{m_1} + \sum_{i=1}^{N} \lambda_i^{(m_1)} \left( \sum_{c=1}^{C} u_{ci}^{m_1} - 1 \right) + \sum_{c=1}^{C} \gamma_c^{(m_1)} \left( \sum_{j=1}^{K} v_{cj}^{m_1} - 1 \right)   (20)

and

\tilde{J}_{m_2} = \sum_{c=1}^{C}\sum_{i=1}^{N}\sum_{j=1}^{K} u_{ci}^{m_2} v_{cj}^{m_2} d_{cij} + T_u \sum_{c=1}^{C}\sum_{i=1}^{N} u_{ci}^{m_2} \log u_{ci}^{m_2} + T_v \sum_{c=1}^{C}\sum_{j=1}^{K} v_{cj}^{m_2} \log v_{cj}^{m_2} + \sum_{i=1}^{N} \lambda_i^{(m_2)} \left( \sum_{c=1}^{C} u_{ci}^{m_2} - 1 \right) + \sum_{c=1}^{C} \gamma_c^{(m_2)} \left( \sum_{j=1}^{K} v_{cj}^{m_2} - 1 \right)   (21)

To find the optimal fuzzy object memberships U^{(m_1)} and U^{(m_2)}, the fuzzy feature memberships V^{(m_1)} and V^{(m_2)} are fixed and the distances d_{cij} are treated as constants. Taking the derivatives of the Lagrange functions (20) and (21) with respect to the object memberships and setting them to zero yields, for each u_{ci}^{(m_1)} and u_{ci}^{(m_2)},

\frac{\partial \tilde{J}_{m_1}}{\partial u_{ci}^{(m_1)}} = \sum_{j=1}^{K} v_{cj}^{m_1} d_{cij} + T_u (m_1 \log u_{ci}^{(m_1)} + 1) + \lambda_i^{(m_1)} = 0   (22)

\frac{\partial \tilde{J}_{m_2}}{\partial u_{ci}^{(m_2)}} = \sum_{j=1}^{K} v_{cj}^{m_2} d_{cij} + T_u (m_2 \log u_{ci}^{(m_2)} + 1) + \lambda_i^{(m_2)} = 0   (23)

After some algebraic manipulations of (22) and (23), we obtain

u_{ci}^{(m_1)} = \exp\left(-\frac{\sum_{j=1}^{K} v_{cj}^{m_1} d_{cij}}{m_1 T_u}\right) \exp\left(-\frac{\lambda_i^{(m_1)}}{m_1 T_u} - \frac{1}{m_1}\right), \qquad u_{ci}^{(m_2)} = \exp\left(-\frac{\sum_{j=1}^{K} v_{cj}^{m_2} d_{cij}}{m_2 T_u}\right) \exp\left(-\frac{\lambda_i^{(m_2)}}{m_2 T_u} - \frac{1}{m_2}\right)   (24)

Because of the constraints \sum_{c=1}^{C} u_{ci}^{(m_1)} = 1 and \sum_{c=1}^{C} u_{ci}^{(m_2)} = 1, the Lagrange multipliers \lambda_i^{(m_1)} and \lambda_i^{(m_2)} are eliminated:

\exp\left(-\frac{\lambda_i^{(m_1)}}{m_1 T_u} - \frac{1}{m_1}\right) = \left[ \sum_{c=1}^{C} \exp\left(-\frac{\sum_{j=1}^{K} v_{cj}^{m_1} d_{cij}}{m_1 T_u}\right) \right]^{-1}   (25)

\exp\left(-\frac{\lambda_i^{(m_2)}}{m_2 T_u} - \frac{1}{m_2}\right) = \left[ \sum_{c=1}^{C} \exp\left(-\frac{\sum_{j=1}^{K} v_{cj}^{m_2} d_{cij}}{m_2 T_u}\right) \right]^{-1}   (26)

Substituting (25) and (26) into (24), the closed-form solutions for the optimal object memberships are obtained as

u_{ci}^{(m_1)} = \frac{\exp\left(-\frac{\sum_{j=1}^{K} v_{cj}^{m_1} d_{cij}}{m_1 T_u}\right)}{\sum_{p=1}^{C} \exp\left(-\frac{\sum_{j=1}^{K} v_{pj}^{m_1} d_{pij}}{m_1 T_u}\right)}, \qquad u_{ci}^{(m_2)} = \frac{\exp\left(-\frac{\sum_{j=1}^{K} v_{cj}^{m_2} d_{cij}}{m_2 T_u}\right)}{\sum_{p=1}^{C} \exp\left(-\frac{\sum_{j=1}^{K} v_{pj}^{m_2} d_{pij}}{m_2 T_u}\right)}   (27)

which coincide with (17).
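Each of the two update chains (17)-(19) can be run independently for its own fuzzifier. A hedged NumPy sketch of one such pass follows (the function name and array shapes are ours; a full IVFCoC iteration would call it once with m1 and once with m2):

```python
import numpy as np

def ivfcoc_update_m(X, P, V, m, Tu, Tv):
    """One alternating-optimization pass for a single fuzzifier m,
    following Eqs. (17)-(19). X: (N, K), P: (C, K), V: (C, K)."""
    D = (X[None, :, :] - P[:, None, :]) ** 2       # d_cij, Eq. (12)

    # Eq. (17): object memberships, normalized over the C clusters
    Eu = np.einsum('cj,cij->ci', V ** m, D)        # sum_j v_cj^m d_cij
    U = np.exp(-Eu / (m * Tu))
    U /= U.sum(axis=0, keepdims=True)

    # Eq. (18): feature memberships, normalized over the K features
    Ev = np.einsum('ci,cij->cj', U ** m, D)        # sum_i u_ci^m d_cij
    V_new = np.exp(-Ev / (m * Tv))
    V_new /= V_new.sum(axis=1, keepdims=True)

    # Eq. (19): membership-weighted centroids
    P_new = (U @ X) / U.sum(axis=1, keepdims=True)
    return U, V_new, P_new

rng = np.random.default_rng(1)
X, P0, V0 = rng.random((8, 3)), rng.random((2, 3)), np.full((2, 3), 1 / 3)
U1, V1, P1 = ivfcoc_update_m(X, P0, V0, m=1.5, Tu=1.0, Tv=1.0)   # m1 pass
U2, V2, P2 = ivfcoc_update_m(X, P0, V0, m=2.5, Tu=1.0, Tv=1.0)   # m2 pass
```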


To find the optimal feature memberships V^{(m_1)} and V^{(m_2)}, we assume the fuzzy partitions U^{(m_1)} and U^{(m_2)} are fixed and the distances d_{cij} are again constants. Taking the derivatives of the Lagrange functions (20) and (21) with respect to the fuzzy feature memberships, setting them to zero, and proceeding as before for U^{(m_1)} and U^{(m_2)}, we obtain

v_{cj}^{(m_1)} = \frac{\exp\left(-\frac{\sum_{i=1}^{N} u_{ci}^{m_1} d_{cij}}{m_1 T_v}\right)}{\sum_{q=1}^{K} \exp\left(-\frac{\sum_{i=1}^{N} u_{ci}^{m_1} d_{ciq}}{m_1 T_v}\right)}, \qquad v_{cj}^{(m_2)} = \frac{\exp\left(-\frac{\sum_{i=1}^{N} u_{ci}^{m_2} d_{cij}}{m_2 T_v}\right)}{\sum_{q=1}^{K} \exp\left(-\frac{\sum_{i=1}^{N} u_{ci}^{m_2} d_{ciq}}{m_2 T_v}\right)}   (28)

To find the cluster centroids P^{(m_1)} and P^{(m_2)}, the fuzzy memberships U^{(m_1)}, U^{(m_2)}, V^{(m_1)}, and V^{(m_2)} are fixed. For each cluster centroid p_{cj}^{(m_1)} and p_{cj}^{(m_2)}, using the squared Euclidean distance d_{cij} = \|x_{ij} - p_{cj}\|^2 = x_{ij}^2 - 2 x_{ij} p_{cj} + p_{cj}^2 between the jth feature of the ith pattern and the jth feature of the cth cluster centroid, we obtain

\frac{\partial \tilde{J}_{m_1}}{\partial p_{cj}^{(m_1)}} = v_{cj}^{m_1} \sum_{i=1}^{N} u_{ci}^{m_1} x_{ij} - v_{cj}^{m_1} p_{cj}^{(m_1)} \sum_{i=1}^{N} u_{ci}^{m_1} = 0   (29)

\frac{\partial \tilde{J}_{m_2}}{\partial p_{cj}^{(m_2)}} = v_{cj}^{m_2} \sum_{i=1}^{N} u_{ci}^{m_2} x_{ij} - v_{cj}^{m_2} p_{cj}^{(m_2)} \sum_{i=1}^{N} u_{ci}^{m_2} = 0   (30)

Following some algebraic simplifications of (29) and (30), we obtain

p_{cj}^{(m_1)} = \frac{\sum_{i=1}^{N} u_{ci}^{(m_1)} x_{ij}}{\sum_{i=1}^{N} u_{ci}^{(m_1)}}, \qquad p_{cj}^{(m_2)} = \frac{\sum_{i=1}^{N} u_{ci}^{(m_2)} x_{ij}}{\sum_{i=1}^{N} u_{ci}^{(m_2)}}   (31)

Theorem 1 has thus been proved. □

Lemma 2. Let \Phi(U^{(m_1)}) = J_{m_1}(U^{(m_1)}, V^{(m_1)}) and \Phi(U^{(m_2)}) = J_{m_2}(U^{(m_2)}, V^{(m_2)}), where U^{(m_1)} = [u_{ci}^{(m_1)}]_{C \times N} and U^{(m_2)} = [u_{ci}^{(m_2)}]_{C \times N} satisfy the constraints \sum_{c=1}^{C} u_{ci}^{(m_1)} = 1 and \sum_{c=1}^{C} u_{ci}^{(m_2)} = 1 (for i = 1..N), with d_{cij} > 0 and m_2 > m_1 > 1. Then U^{(m_1)} is a local optimum of \Phi(U^{(m_1)}) and U^{(m_2)} is a local optimum of \Phi(U^{(m_2)}) if and only if u_{ci}^{(m_1)} and u_{ci}^{(m_2)} (for c = 1..C and i = 1..N) are calculated by (17).

Proof. The necessity has been proved above. To prove sufficiency, the Hessian matrices H(\Phi(U^{(m_1)})) and H(\Phi(U^{(m_2)})) are obtained from the Lagrange functions (20) and (21) as follows:

h_{fg,ci}(U^{(m_1)}) = \frac{\partial}{\partial u_{fg}} \left[ \frac{\partial \Phi(U^{(m_1)})}{\partial u_{ci}^{(m_1)}} \right] = \begin{cases} \frac{T_u m_1}{u_{ci}^{(m_1)}} & \text{if } f = c \text{ and } g = i \\ 0 & \text{otherwise} \end{cases}   (32)

h_{fg,ci}(U^{(m_2)}) = \frac{\partial}{\partial u_{fg}} \left[ \frac{\partial \Phi(U^{(m_2)})}{\partial u_{ci}^{(m_2)}} \right] = \begin{cases} \frac{T_u m_2}{u_{ci}^{(m_2)}} & \text{if } f = c \text{ and } g = i \\ 0 & \text{otherwise} \end{cases}   (33)

According to (32) and (33), H(\Phi(U^{(m_1)})) and H(\Phi(U^{(m_2)})) are diagonal matrices. For all 1 \le c \le C and 1 \le i \le N, with u_{ci}^{(m_1)} and u_{ci}^{(m_2)} calculated by (17), we have u_{ci}^{(m_1)} > 0, u_{ci}^{(m_2)} > 0, m_2 > m_1 > 1, and d_{cij} > 0, so both Hessian matrices are positive definite. Hence (32) and (33) give the sufficient condition for minimizing \Phi(U^{(m_1)}) and \Phi(U^{(m_2)}), respectively. □

Remark. From the above theorem and lemma, IVFCoC converges, with the object membership matrices U^{(m_1)} = [u_{ci}^{(m_1)}]_{C \times N} and U^{(m_2)} = [u_{ci}^{(m_2)}]_{C \times N} calculated by (17), the feature membership matrices V^{(m_1)} = [v_{cj}^{(m_1)}]_{C \times K} and V^{(m_2)} = [v_{cj}^{(m_2)}]_{C \times K} calculated by (18), and the cluster centroids P^{(m_1)} = [p_{cj}^{(m_1)}]_{C \times K} and P^{(m_2)} = [p_{cj}^{(m_2)}]_{C \times K} calculated by (19), respectively. The IVFCoC method considers an interval-valued fuzzifier [m_1, m_2] rather than a precise numerical value. The interval object membership [\underline{u}_{ci}, \overline{u}_{ci}] can then be computed following (34) and (35):

\underline{u}_{ci} = \min(u_{ci}^{(m_1)}, u_{ci}^{(m_2)})   (34)

\overline{u}_{ci} = \max(u_{ci}^{(m_1)}, u_{ci}^{(m_2)})   (35)

For the object membership grades we have

u_{ci} = \frac{\underline{u}_{ci} + \overline{u}_{ci}}{2}, \quad c = 1..C, \; i = 1..N   (36)

The interval feature membership [\underline{v}_{cj}, \overline{v}_{cj}] can then be computed following (37) and (38):

\underline{v}_{cj} = \min(v_{cj}^{(m_1)}, v_{cj}^{(m_2)})   (37)

\overline{v}_{cj} = \max(v_{cj}^{(m_1)}, v_{cj}^{(m_2)})   (38)

For the feature membership grades we have

v_{cj} = \frac{\underline{v}_{cj} + \overline{v}_{cj}}{2}, \quad c = 1..C, \; j = 1..K   (39)

The left and right end-points of the centroids, \underline{p}_{cj} and \overline{p}_{cj}, can then be computed following (40) and (41):

\underline{p}_{cj} = \min(p_{cj}^{(m_1)}, p_{cj}^{(m_2)})   (40)

\overline{p}_{cj} = \max(p_{cj}^{(m_1)}, p_{cj}^{(m_2)})   (41)

For the cluster centroids we have

p_{cj} = \frac{\underline{p}_{cj} + \overline{p}_{cj}}{2}, \quad c = 1..C, \; j = 1..K   (42)
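The combination rules (34)-(42) act entrywise on the m1- and m2-solutions: take the min and max to get the interval end-points, then their midpoint as the reduced value. A minimal pure-Python sketch (the helper name is ours):

```python
def interval_combine(a_m1, a_m2):
    """Entrywise combination per Eqs. (34)-(42): applied to each u_ci,
    v_cj and p_cj computed under fuzzifiers m1 and m2."""
    lower = min(a_m1, a_m2)                       # Eqs. (34)/(37)/(40)
    upper = max(a_m1, a_m2)                       # Eqs. (35)/(38)/(41)
    return lower, upper, (lower + upper) / 2.0    # Eqs. (36)/(39)/(42)

lo, hi, mid = interval_combine(0.75, 0.25)   # e.g. one object membership grade
```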

In the IVFCoC algorithm, the interval object membership grades [\underline{u}_{ci}, \overline{u}_{ci}] are formed by (34) and (35), and the object membership grades u_{ci} are calculated from these by (36). The interval feature membership grades [\underline{v}_{cj}, \overline{v}_{cj}] are formed by (37) and (38), and the feature membership grades v_{cj} by (39). The centroid end-points \underline{p}_{cj} and \overline{p}_{cj} are formed by (40) and (41), and the cluster centroids by (42). Using the two fuzzifiers m_1 and m_2, and in order to approach an optimum of the constrained optimization problem, IVFCoC proceeds as a learning scheme that iteratively updates the components of the objective functions (15) and (16). The steps of IVFCoC are shown as Algorithm 1. The time complexity of IVFCoC is O(CK²N) per iteration, where N is the number of data objects, C the number of clusters, K the number of features, and τ denotes the number of iterations. The computation of the distance matrix D requires O(CKN). The formulas for calculating the upper and lower bounds of U, V, and P can be


Algorithm 1 IVFCoC algorithm: a processing scheme

1. Input: data X = {x_i}, x_i ∈ R^K, i = 1..N; the number of clusters C; fuzzifier parameters T_u, T_v, m_1, m_2 (1 < m_1 < m_2 < +∞); tolerance ε; maximum number of iterations τ_max.
2. Set the iteration counter τ = 1.
3. Initialize u_{ci} such that 0 ≤ u_{ci} ≤ 1 and set \underline{u}_{ci} = \overline{u}_{ci} = u_{ci}.
4. DO
5.   Update \underline{p}_{cj}, \overline{p}_{cj}, and p_{cj} using (40), (41), and (42).
6.   Update d_{cij} using (12).
7.   Update \underline{v}_{cj}, \overline{v}_{cj}, and v_{cj} using (37), (38), and (39).
8.   Update \underline{u}_{ci}, \overline{u}_{ci}, and u_{ci} using (34), (35), and (36).
9.   Increase τ = τ + 1.
10. WHILE (max |u_{ci}[τ] − u_{ci}[τ−1]| > ε and τ < τ_max)
11. Output: clustering result.

performed independently. The computation of the centroid matrix P requires O(CKN). The time complexity of updating all feature memberships v_{cj} per iteration is O(CK²N), and that of updating all object memberships u_{ci} per iteration is O(C²KN). In practice the number of features is larger than the number of clusters; therefore, taking the number of iterations τ into account, the overall time complexity of IVFCoC is O(τCK²N), of the same order as fuzzy co-clustering algorithms.

4. Experimental results

4.1. Validity indexes

Fuzzy clustering and fuzzy co-clustering are unsupervised learning techniques, so the data objects carry neither labels nor an anticipated structure. Numerous validity indexes have been proposed to evaluate the quality of clustering for algorithms and datasets. Xie and Beni [24] proposed the validity function S of clusters as follows:

S(c) = \frac{\sigma / N}{d_{min}}    (43)

where d_{min} is the minimum distance between the cluster prototypes, determined by

d_{min} = \min_{\forall c} \sum_{j=1}^{K} (p_{(c+1)j} - p_{cj})^2    (44)

and \sigma is the maximum variance among the C clusters, given by

\sigma = \max_{\forall c} \sum_{i=1}^{N} u_{ci}^2 \sum_{j=1}^{K} (x_{ij} - p_{cj})^2    (45)

Shieh [25] defined a validity measure based on the separation between clusters and the compactness within each cluster to indicate the optimal number of clusters obtained by fuzzy clustering algorithms:

CS(c) = Com(c) \times SE(c)    (46)

where Com(c) measures the compactness of each cluster, defined as

Com(c) = \frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci} \| x_i - p_c \|^2 w_{ci}    (47)

with the weights

w_{ci} = \exp\left( - \frac{\| x_i - p_c \|^2}{2\sigma^2} \right)    (48)

and SE(c) measures the separation between clusters, defined as

SE(c) = \frac{1}{\sqrt{C}} \cdot \frac{\sum_{k=1}^{C(C-1)/2} \| p_i - p_c \|^2}{\min_{i \neq j} \| p_i - p_j \|}    (49)

According to Xie and Beni [24] and Shieh [25], the smaller the value of S or CS, the better the clusters. Therefore, the optimal number of clusters can be determined by locating the minimum of S(c) or CS(c): the value of c at which S(c) or CS(c) is minimal is the optimal number of clusters. To assess the quality of clustering, we use a set of indicators related to the parameters of the clustering algorithm. Many clustering assessment criteria have been developed, but most studies are limited to fuzzy clustering and fuzzy co-clustering. Bezdek [26] proposed two coefficients, the Partition Coefficient (PC) and the Partition Entropy (PE):

PC = \frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci}^2    (50)

and

PE = - \frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci} \log u_{ci}    (51)

PC (respectively, PE) takes a smaller (larger) value when the membership grades become ambiguous; hence, the optimal partition is the one having the largest PC (or the smallest PE) value. In fuzzy co-clustering, the validity indices are affected not only by the object membership grades u_ci but also by the feature membership grades v_cj. Honda et al. [21] therefore adapted the indices PC and PE into four components PC_u, PC_v, PE_u and PE_v:

PC_u = \frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci}^2, \qquad PC_v = \frac{1}{C} \sum_{c=1}^{C} \sum_{j=1}^{K} v_{cj}^2    (52)

PE_u = - \frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci} \log u_{ci}, \qquad PE_v = - \frac{1}{C} \sum_{c=1}^{C} \sum_{j=1}^{K} v_{cj} \log v_{cj}    (53)

To assess the quality of image segmentation, we use two indices, the Mean Squared Error (MSE) [27] and the Image Quality Index (IQI) [28]:

MSE(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2    (54)

IQI = \frac{4 \sigma_{xy} \bar{x} \bar{y}}{(\sigma_x^2 + \sigma_y^2)(\bar{x}^2 + \bar{y}^2)}    (55)

with \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i, \sigma_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2, \sigma_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y})^2 and \sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}), where x = {x_1, x_2, ..., x_N} and y = {y_1, y_2, ..., y_N} correspond to the original image and the segmented image.

To assess the quality of clustering on labeled datasets, we use Precision and Recall, defined as follows [29]:

Precision = \frac{1}{C} \sum_{c=1}^{C} \frac{|R_c|}{|R_c| + |S_c|}    (56)

and

Recall = \frac{1}{C} \sum_{c=1}^{C} \frac{|R_c|}{|R_c| + |T_c|}    (57)

where R_c is the set of all objects correctly assigned to cluster c, S_c is the set of all objects incorrectly assigned to cluster c, and T_c is the set of all objects incorrectly not assigned to cluster c. When computing precision and recall, we use the optimal clustering assignment to determine which objects have been correctly clustered.
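As a concrete reference for the indices above, here is a minimal Python sketch using the standard definitions of PC (Eq. 50), PE (Eq. 51), MSE (Eq. 54) and IQI (Eq. 55); the membership matrix U is assumed to be C × N with columns summing to 1:

```python
import numpy as np

def partition_coefficient(U):
    """PC, Eq. (50): (1/N) * sum of squared memberships; equals 1 for a crisp partition."""
    return float((U ** 2).sum() / U.shape[1])

def partition_entropy(U):
    """PE, Eq. (51): -(1/N) * sum u*log(u); equals 0 for a crisp partition."""
    V = np.where(U > 0, U * np.log(U), 0.0)   # convention: 0*log(0) = 0
    return float(-V.sum() / U.shape[1])

def mse(x, y):
    """MSE, Eq. (54): mean squared difference between original and segmented image."""
    x = np.asarray(x, float).ravel()
    y = np.asarray(y, float).ravel()
    return float(((x - y) ** 2).mean())

def iqi(x, y):
    """Image Quality Index, Eq. (55); equals 1 when the two images are identical."""
    x = np.asarray(x, float).ravel()
    y = np.asarray(y, float).ravel()
    mx, my = x.mean(), y.mean()
    n = x.size
    vx = ((x - mx) ** 2).sum() / (n - 1)       # sample variance of x
    vy = ((y - my) ** 2).sum() / (n - 1)       # sample variance of y
    cxy = ((x - mx) * (y - my)).sum() / (n - 1)  # sample covariance
    return float(4 * cxy * mx * my / ((vx + vy) * (mx ** 2 + my ** 2)))
```

For a crisp partition, PC evaluates to 1 and PE to 0, which is why the largest PC (smallest PE) identifies the least ambiguous partition.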

Please cite this article as: V.N. Pham et al., Interval-valued fuzzy set approach to fuzzy co-clustering for data classification, KnowledgeBased Systems (2016), http://dx.doi.org/10.1016/j.knosys.2016.05.049


Table 1. Color image segmentation results using FCM, FCoC and IVFCoC on color image "147091.jpg".

| No | Type | C | K | PCu | PEu | MSE | IQI | S | CS |
|----|--------|----|----|--------|---------|-------|--------|---------|-------|
| 1 | FCM | 3 | 15 | 0.7332 | 0.20602 | 39.44 | 0.9344 | 0.1631 | 0.010 |
|   | FCoC | 3 | 12 | 0.9399 | 0.00018 | 44.07 | 0.8754 | 0.0610 | 0.035 |
|   | IVFCoC | 3 | 35 | 0.9972 | 0.00219 | 36.95 | 0.9569 | 0.0649 | 0.057 |
| 2 | FCM | 4 | 26 | 0.6949 | 0.25638 | 32.72 | 0.9524 | 0.1052 | 0.044 |
|   | FCoC | 4 | 32 | 0.9704 | 0.00037 | 32.61 | 0.9505 | 0.0474 | 0.030 |
|   | IVFCoC | 4 | 34 | 0.9952 | 0.00379 | 30.67 | 0.9736 | 0.0392 | 0.008 |
| 3 | FCM | 5 | 27 | 0.6225 | 0.32337 | 29.96 | 0.9643 | 0.3307 | 0.070 |
|   | FCoC | 5 | 35 | 0.9851 | 0.00089 | 28.68 | 0.9582 | 0.0736 | 0.108 |
|   | IVFCoC | 5 | 43 | 0.9916 | 0.00647 | 27.14 | 0.9811 | 0.0598 | 0.102 |
| 4 | FCM | 6 | 28 | 0.5892 | 0.36517 | 27.12 | 0.9709 | 0.1989 | 0.076 |
|   | FCoC | 6 | 60 | 0.9987 | 0.00096 | 22.70 | 0.9875 | 0.0524 | 0.074 |
|   | IVFCoC | 6 | 39 | 0.9883 | 0.00898 | 22.71 | 0.9875 | 0.0498 | 0.186 |
| 5 | FCM | 7 | 26 | 0.5431 | 0.41882 | 26.61 | 0.9738 | 0.1338 | 0.172 |
|   | FCoC | 7 | 57 | 0.9978 | 0.00147 | 20.86 | 0.9898 | 0.0723 | 0.357 |
|   | IVFCoC | 7 | 27 | 0.9841 | 0.01231 | 20.74 | 0.9899 | 0.0487 | 0.479 |
| 6 | FCM | 8 | 24 | 0.5162 | 0.45083 | 24.26 | 0.9770 | 0.8675 | 0.319 |
|   | FCoC | 8 | 35 | 0.9977 | 0.00165 | 19.63 | 0.9909 | 0.0556 | 0.302 |
|   | IVFCoC | 8 | 44 | 0.9818 | 0.01381 | 19.64 | 0.9909 | 0.0529 | 0.237 |
| 7 | FCM | 9 | 26 | 0.4868 | 0.48969 | 23.39 | 0.9797 | 23.0918 | 0.191 |
|   | FCoC | 9 | 34 | 0.9972 | 0.00199 | 18.66 | 0.9919 | 0.1039 | 0.275 |
|   | IVFCoC | 9 | 36 | 0.9745 | 0.01976 | 18.67 | 0.9920 | 0.0925 | 0.258 |
| 8 | FCM | 10 | 27 | 0.4629 | 0.52590 | 22.76 | 0.9814 | 6.8071 | 0.196 |
|   | FCoC | 10 | 31 | 0.9970 | 0.00219 | 18.17 | 0.9925 | 0.0877 | 0.263 |
|   | IVFCoC | 10 | 38 | 0.9711 | 0.02231 | 18.01 | 0.9923 | 0.0859 | 0.182 |

Fig. 1. Image segmentation results using IVFCoC on the original image "147091.jpg" with the number of clusters ranging from 2 to 10.

4.2. Experiments

In this section, experimental results are presented in order to demonstrate the advantages of the proposed algorithm.

4.2.1. Color image segmentation

These experiments exhibit the effectiveness of the proposed algorithm in segmenting natural color images. The color image dataset was downloaded from the Berkeley photo library¹. For color image segmentation, we consider co-clustering datasets in which the pixels are the data objects and their color components are the features. In the first experiment, we clustered the color image "147091.jpg" with the number of clusters ranging from 2 to 10, using the three clustering methods FCM, FCoC and IVFCoC. The clustering results, including the values of the validity indices obtained by each algorithm, are presented in Table 1. Fig. 1 shows the segmentation results produced by IVFCoC on the original image "147091.jpg" with the number of clusters set from 2 to 10.

¹ The Berkeley Segmentation Dataset and Benchmark [Online], http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/

In Table 1, the indices S and CS of all three algorithms reach their first minimum when the number of clusters is set to 4, i.e., the optimal number of clusters is 4. At this optimal number of clusters, the PCu, MSE and IQI indices of IVFCoC are better than those of FCM and FCoC.

In the next experiment, we used 100 color images, each of size 321 × 481 or 481 × 321. The segmentation of all 100 images by the proposed IVFCoC algorithm is shown in Fig. 2. The results show a good match with the human ground-truth segmentations, as indicated by high values of the Partition Coefficient (PC) and Image Quality Index (IQI), and efficient color differentiation, as indicated by a low value of the Mean Squared Error (MSE). The number of clusters is determined from the first local minimum in the cluster validity graph of S (typically 7 clusters in our experiments). In this experiment, we used the input parameters Tu = 8, Tv = 9 × 10^7, m1 = 1.5, m2 = 3.5.

Next, we conducted experiments on 10 randomly selected test images from the Berkeley Segmentation Dataset. With the number of clusters and the fuzzy parameters fixed, the three algorithms FCM, FCoC and IVFCoC were used to segment these images. The results of
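The co-clustering setup described above (pixels as objects, color components as features) amounts to flattening each image into an N × K data matrix, and painting each pixel with its cluster's prototype color to obtain the segmented image used by MSE and IQI. A small sketch (the function names are illustrative, not from the paper):

```python
import numpy as np

def image_to_data_matrix(img):
    """Flatten an H x W x K color image into an (H*W) x K object-feature matrix."""
    h, w, k = img.shape
    return img.reshape(h * w, k).astype(float)

def labels_to_segmentation(labels, prototypes, shape):
    """Paint every pixel with the prototype (e.g. mean color) of its assigned cluster."""
    return prototypes[labels].reshape(shape)
```

With hard assignments taken as the arg-max of the object memberships u_ci, the segmented image is simply `labels_to_segmentation(labels, prototypes, img.shape)`.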


Fig. 2. Image segmentation results using IVFCoC on 100 images in the Berkeley gallery. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

the segmentation and the corresponding PCu, PEu, MSE, IQI, S and CS evaluation indices are listed in Fig. 3, which shows the lowest values of S and CS for the proposed method as compared to all the other methods. As observed from the segmentation results in Fig. 3, the evaluation indices obtained by the proposed method are consistently better than those of the prior proposals: the corresponding PE, MSE, S and CS values are lower, and the corresponding PC and IQI values are higher.

4.2.2. Multi-spectral satellite image clustering

Multi-spectral images are one of the image types acquired by remote sensing (RS) radiometers. By dividing the spectrum into many bands, multi-spectral imaging is the opposite of panchromatic imaging, which records only the total intensity of the radiation falling on each pixel. Usually, satellites carry three or more radiometers. Each one acquires one digital image (called a "scene" in remote sensing) in a small band of the spectrum, covering the visible wavelengths from 0.0004 mm to 0.0007 mm, called the red-green-blue (RGB) region, and extending to infrared wavelengths of 0.0007 mm to 0.001 mm or more, classified as near infrared (NIR), middle infrared (MIR) and far infrared (FIR, or thermal).

Table 2. The electromagnetic spectrum for bands of Landsat7 imagery.

| Band No. | WaveBand | WaveLength (nm) |
|----------|---------------------|-----------|
| 1 | Blue | 450–520 |
| 2 | Green | 520–600 |
| 3 | Red | 630–690 |
| 4 | Very Near-Infrared | 760–900 |
| 5 | Near-Infrared | 1550–1750 |
| 7 | Shortwave Infrared | 2080–2350 |

Because land covers such as forestry, agriculture, bare land, water surface, etc., usually have no clear boundaries, an individual pixel can in fact contain ambiguous information, e.g., 30% soil, 25% forest and 45% water. In this case the pixel is assigned to the water cover class, although 55% of it is not water, so a yes-no classification is not suitable. Therefore, fuzzy clustering, in which a pixel can be assigned to many layers according to its membership grades, is more suitable than Boolean clustering. IVFCoC is applied to classify land covers from Landsat7 imagery, which consists of 7 bands numbered from 1 to 7, whose wavebands and wavelengths are described in Table 2. To increase the precision of the classification, we used 6 bands of the Landsat7 images. Because the 6th band is the thermal infrared band, which is mainly used to


Fig. 3. Image segmentation results using FCM, FCoC and IVFCoC on 10 images. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

calculate the surface temperature, this band is not used. Let X = {x_1, x_2, ..., x_N}, where N is the number of pixels and x_i = (x_i1, x_i2, x_i3, x_i4, x_i5, x_i7) is the vector of grey levels of the image bands for pixel i; x_i6, corresponding to the 6th band, is not used. The clustering algorithms are used to classify X into 6 subsets (classes) corresponding to the 6 types of land cover shown in Fig. 4. The study dataset of Landsat7 imagery concerns an area north of Hanoi, Vietnam (21°54'23.11"N, 105°03'06.47"E to 20°55'14.25"N, 106°02'58.57"E), with an area of 3774.8736 km² (Fig. 5). The size of each image band is 2048 × 2048 pixels and the resolution is 30 m × 30 m per pixel; hence N = 4,194,304 pixels. Fig. 5 shows the 6 bands of the Hanoi data, from band 1 to band 7, excluding band 6.
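Constructing X from the six retained bands is a simple stacking operation. A sketch under the assumption that each band is available as a 2-D grey-level array (the function name is illustrative):

```python
import numpy as np

def stack_landsat_bands(bands, used=(1, 2, 3, 4, 5, 7)):
    """Build the N x 6 feature matrix from Landsat7 bands, skipping band 6 (thermal)."""
    cols = [np.asarray(bands[b], dtype=float).ravel() for b in used]
    return np.stack(cols, axis=1)  # row i = (x_i1, x_i2, x_i3, x_i4, x_i5, x_i7)
```

Each row of the returned matrix is the 6-dimensional grey-level vector of one pixel, which is exactly the object representation clustered by FCM, FCoC and IVFCoC in this experiment.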

Fig. 4. Six basic classes in multi-spectral satellite image classification. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Fig. 5. Six images corresponding to the six spectral bands of the original multi-spectral satellite image: (a) Band 1; (b) Band 2; (c) Band 3; (d) Band 4; (e) Band 5; (f) Band 7.

Fig. 6. Results of multi-spectral satellite image clustering: (a) FCM, (b) FCoC, (c) IVFCoC.

Table 3. Results of multi-spectral satellite image clustering.

| Algorithm | Tu | Tv | m1 | m2 | PCu | PEu | MSE | IQI |
|-----------|-----|------------|-----|-----|--------|---------|------|--------|
| FCM | – | – | – | – | 0.2050 | 0.58314 | 5.65 | 0.8332 |
| FCoC | 0.1 | 9.8 × 10^7 | – | – | 0.9696 | 0.01392 | 5.62 | 0.8522 |
| IVFCoC | 0.1 | 9.8 × 10^7 | 1.5 | 1.7 | 0.9767 | 0.00808 | 4.70 | 0.9133 |

The fuzzy parameters are set to Tu = 0.1, Tv = 9.8 × 10^7, m1 = 1.5, m2 = 1.7. The clustering results obtained by FCM (a), FCoC (b) and IVFCoC (c) are shown in Fig. 6, which uses the six basic classes of Fig. 4 to represent the clustered multi-spectral image pixels. To assess the performance of the algorithms on the experimental images, we analyzed the results on the basis of several validity indices reported in Table 3. As in the previous experiments, we considered Bezdek's partition coefficient PCu [21], the Mean Squared Error MSE [27] and the Image Quality Index IQI [28]. The values of these validity indices are shown in Table 3. In these results, the PCu and IQI indices obtained by IVFCoC are the largest and its MSE index is the smallest in comparison with FCM and FCoC, i.e., the proposed algorithm achieves better clustering quality than the FCM and FCoC algorithms. This method can be used for multi-spectral satellite image classification to assess land cover over a large area quickly, reduce costs compared to other approaches to change detection, and make predictions about land cover fluctuations in support of urban planning, natural resource management, etc.

4.2.3. Clustering high-dimensional datasets

Next, we used six high-dimensional datasets downloaded from the clustering datasets of the Speech and Image Processing Unit, School of Computing, University of Eastern Finland². The number of clusters in these datasets is 16, the number of objects is 1024, and the number of features ranges from 32 to 1024. Table 4 summarizes these datasets.

² Speech and Image Processing Unit, School of Computing University of Eastern Finland, Clustering datasets [Online], http://cs.joensuu.fi/sipu/datasets/.

Table 4. Concise information of the high-dimensional datasets.

| Datasets | No. of clusters | No. of objects | No. of features |
|----------|-----------------|----------------|-----------------|
| Dim032 | 16 | 1024 | 32 |
| Dim064 | 16 | 1024 | 64 |
| Dim128 | 16 | 1024 | 128 |
| Dim256 | 16 | 1024 | 256 |
| Dim512 | 16 | 1024 | 512 |
| Dim1024 | 16 | 1024 | 1024 |

In this experiment, we used the fuzzy parameters Tu = 10, Tv = 10^6, m1 = 1.1 and m2 = 1.5. The clustering results on the six datasets, in terms of the MSE, IQI and PC measures, are shown in Table 5.

Table 5. Clustering results on six high-dimensional datasets.

| Datasets | Index | FCM | FCoC | IVFCoC |
|----------|-------|-------|-------|--------|
| Dim32 | MSE | 12.34 | 8.07 | 7.45 |
|  | IQI | 0.92 | 0.983 | 0.997 |
|  | PC | 0.850 | 0.900 | 0.966 |
| Dim64 | MSE | 13.21 | 5.11 | 0.19 |
|  | IQI | 0.89 | 0.995 | 0.999 |
|  | PC | 0.935 | 0.966 | 0.997 |
| Dim128 | MSE | 15.20 | 24.38 | 1.198 |
|  | IQI | 0.83 | 0.969 | 0.999 |
|  | PC | 0.813 | 0.889 | 0.680 |
| Dim256 | MSE | 19.10 | 8.22 | 0.661 |
|  | IQI | 0.75 | 0.976 | 0.999 |
|  | PC | 0.800 | 0.900 | 0.986 |
| Dim512 | MSE | 11.23 | 6.42 | 0.842 |
|  | IQI | 0.86 | 0.987 | 0.999 |
|  | PC | 0.780 | 0.933 | 0.718 |
| Dim1024 | MSE | 21.12 | 7.022 | 3.93 |
|  | IQI | 0.72 | 0.986 | 0.995 |
|  | PC | 0.750 | 0.955 | 0.249 |

From the experimental results in Table 5, some advantages of IVFCoC can be summarized: the MSE, IQI and PC measures obtained by the IVFCoC algorithm are mostly better than those of the other proposals.

4.2.4. Discussions

In this paper, we have conducted three groups of experiments on different types of data. To achieve the best clustering result, it is very important to choose a suitable set of parameters for each particular dataset. In this section, the impact of the parameters on the clustering results is discussed. We conducted experiments on each dataset with different values of the input parameters Tu, Tv, m1 and m2. The algorithm is found to be more sensitive to the value of Tu than to Tv, since the values of uci obtained are close to the Boolean case, and a careful choice of Tu is required for the algorithm to converge. The value of Tu in our experiments is found to range from 1 to 90 for different images, with values close to 1 for images with non-distinct colors and higher values for visually distinct colors. The valid values of Tv range widely, from 10^5 to 10^8, and do not have a major impact on the resulting clusters, since the function of Tv is to contribute to the computation of uci by scaling vcj. The valid values of m1 and m2 range narrowly, from 1.1 to 3.5.

Table 6. Clustering results using IVFCoC on dataset "Dim256", Tu = 1–19, Tv = 10^6, m1 = 1.5, m2 = 3.5.

| Tu | PCu | MSE | IQI |
|----|-------|-------|-------|
| 1 | 0.966 | 3.430 | 0.990 |
| 2 | 0.966 | 3.430 | 0.990 |
| 3 | 0.966 | 3.430 | 0.990 |
| 4 | 0.999 | 0.660 | 0.999 |
| 5 | 0.999 | 0.660 | 0.999 |
| 6 | 0.998 | 0.660 | 0.999 |
| 7 | 0.997 | 0.660 | 0.999 |
| 8 | 0.996 | 0.660 | 0.999 |
| 9 | 0.993 | 0.662 | 0.999 |
| 10 | 0.986 | 0.664 | 0.999 |
| 11 | 0.975 | 0.665 | 0.999 |
| 12 | 0.959 | 0.667 | 0.999 |
| 13 | 0.937 | 0.668 | 0.999 |
| 14 | 0.856 | 1.900 | 0.994 |
| 15 | 0.828 | 1.930 | 0.994 |
| 16 | 0.846 | 0.700 | 0.995 |
| 17 | 0.810 | 0.730 | 0.997 |
| 18 | 0.770 | 0.778 | 0.999 |
| 19 | 0.730 | 0.830 | 0.999 |

Table 7. Clustering results using IVFCoC on dataset "Dim256", Tu = 5, Tv = 500–2.4 × 10^10, m1 = 1.5, m2 = 3.5.

| Tv | PCu | MSE | IQI |
|------------|-------|------|-------|
| 500 | 0.68 | 25.5 | 0.9 |
| 2500 | 0.83 | 7.94 | 0.959 |
| 12500 | 0.93 | 8.5 | 0.987 |
| 62500 | 0.93 | 5.78 | 0.99 |
| 312500 | 0.999 | 0.66 | 0.999 |
| 1.6 × 10^6 | 0.999 | 0.66 | 0.999 |
| 7.8 × 10^6 | 0.999 | 0.66 | 0.999 |
| 3.9 × 10^7 | 0.999 | 0.66 | 0.999 |
| 1.9 × 10^8 | 0.999 | 0.66 | 0.999 |
| 9.7 × 10^8 | 0.999 | 0.66 | 0.999 |
| 4.9 × 10^9 | 0.999 | 0.66 | 0.999 |
| 2.4 × 10^10 | 0.999 | 0.66 | 0.999 |

To determine the parameters for the proposed algorithm, we conducted the following steps: first, we seek suitable values of the parameters Tu and Tv for the dataset; and then, we gradually refine
the parameters Tu, Tv, m1 and m2 to obtain the best clustering results, according to the best values of several validity indices. To clarify the choice of parameters, we used the dataset "Dim256" of Section 4.2.3 to analyze the process of determining the parameters that yield the best clustering results. First, the parameters were set to Tu = 10, Tv = 10^6, m1 = 1.5 and m2 = 3.5, and were then adjusted to obtain the best clustering. Tu is varied in the range [1, 19] to determine the validity indices shown in Table 6. Based on these clustering results, the validity indices reach their best values at Tu = 4 or Tu = 5; hence, Tu was set to 5. Table 7 shows the validity indices obtained by IVFCoC when Tv is varied in the range [500, 2.4 × 10^10]. When Tv is larger than 312,500, the validity indices all reach their best values; since similar results were obtained at Tv = 10^6, Tv was set to 10^6. To determine the parameters m1 and m2, Tu and Tv were fixed at the selected values, i.e., Tu = 5 and Tv = 10^6, while m1 and m2 were varied in the range [1.1, 2.1]. The IVFCoC algorithm was used to determine the validity indices PC, MSE and IQI shown in Table 8. The best values of PC, MSE and IQI were obtained at the (m1, m2) pairs (1.3, 1.7), (1.5, 1.6), (1.6, 1.7) and (1.5, 1.7). In almost all experiments, the proposed algorithm obtained better results in comparison with the other clustering methods, FCM


Table 8. Validity indices (PC, MSE, IQI, respectively) obtained by running IVFCoC on dataset "Dim256", Tu = 5, Tv = 10^6, m1 = 0.9–1.9, m2 = 1.0–2.1. Each cell lists PC/MSE/IQI; "–" marks combinations without reported values.

| m1 \ m2 | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 | 1.9 | 2.0 | 2.1 |
|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 0.9 | – | – | – | – | – | – | – | – | – | – | – | – |
| 1.0 | – | 0.900/8.200/0.976 | 0.900/8.200/0.976 | 0.930/6.440/0.988 | 0.930/6.440/0.988 | 0.930/9.200/0.989 | 0.930/9.400/0.989 | 0.930/9.400/0.989 | 0.950/3.600/0.990 | 0.520/14.90/0.960 | 0.330/17.40/0.937 | 0.280/19.50/0.920 |
| 1.1 | – | – | 0.900/8.200/0.976 | 0.930/6.440/0.988 | 0.930/6.440/0.988 | 0.930/6.440/0.988 | 0.930/6.440/0.988 | 0.960/3.670/0.994 | 0.950/3.600/0.994 | 0.520/14.50/0.960 | 0.340/16.40/0.940 | 0.280/17.60/0.937 |
| 1.2 | – | – | – | 0.930/6.440/0.988 | 0.930/6.440/0.988 | 0.930/6.440/0.988 | 0.930/6.440/0.988 | 0.960/3.430/0.994 | 0.992/0.662/0.999 | 0.520/14.70/0.970 | 0.350/16.105/0.960 | 0.280/17.20/0.950 |
| 1.3 | – | – | – | – | 0.930/6.440/0.988 | 0.960/3.430/0.994 | 0.960/3.430/0.994 | 0.999/0.660/0.999 | 0.991/0.661/0.999 | 0.520/10.10/0.974 | 0.340/11.20/0.960 | 0.280/12.40/0.960 |
| 1.4 | – | – | – | – | – | 0.960/3.430/0.994 | 0.960/3.430/0.994 | 0.960/3.430/0.994 | 0.950/3.430/0.994 | 0.810/0.760/0.999 | 0.368/9.710/0.977 | 0.290/10.78/0.972 |
| 1.5 | – | – | – | – | – | – | 0.999/0.660/0.999 | 0.999/0.660/0.999 | 0.992/0.660/0.999 | 0.810/0.710/0.998 | 0.410/2.860/0.991 | 0.310/3.000/0.996 |
| 1.6 | – | – | – | – | – | – | – | 0.990/0.660/0.999 | 0.992/0.660/0.998 | 0.810/0.688/0.998 | 0.450/1.420/0.998 | 0.310/2.440/0.997 |
| 1.7 | – | – | – | – | – | – | – | – | 0.950/1.880/0.994 | 0.760/1.910/0.994 | 0.450/1.180/0.998 | 0.310/1.990/0.998 |
| 1.8 | – | – | – | – | – | – | – | – | – | 0.800/0.670/0.998 | – | – |
| 1.9 | – | – | – | – | – | – | – | – | – | – | – | – |
and FCoC. In particular, the experiments were performed on complex datasets with high uncertainty and high dimensionality. For simple datasets, the proposed algorithm does not usually give better results than the other algorithms. Besides, the theoretical computational complexity of IVFCoC is larger than that of FCoC and FCM; however, IVFCoC normally converged faster in the experiments conducted.

5. Conclusion

In this paper, we developed an Interval-Valued Fuzzy Co-Clustering algorithm built upon a combination of a co-clustering algorithm and interval-valued fuzzy sets. IVFCoC is guided by a new objective function with two values of the fuzzifier producing the FOU. Experiments were conducted on datasets of color images, multi-spectral satellite images and high-dimensional data from the Berkeley sample photo gallery and the UCI Machine Learning Repository. The obtained results were compared with those produced by several other fuzzy clustering and fuzzy co-clustering approaches. Future work could focus on the problem of centroid initialization using optimization methods, e.g., particle swarm optimization. Another research direction would be to apply the proposed algorithm to real problems such as hyper-spectral image classification, medical diagnosis or environmental studies.

References

[1] J.C. Bezdek, R. Ehrlich, W. Full, The fuzzy c-means clustering algorithm, Comput. Geosci. 10 (2–3) (1984) 191–203.
[2] K. Bhoyar, O. Kakde, Colour image segmentation using fast fuzzy c-means algorithm, Electron. Lett. Comput. Vis. Image Anal. 9 (2010) 18–31.
[3] N.N. Karnik, J.M. Mendel, Operations on type-2 fuzzy sets, Fuzzy Sets Syst. 122 (2) (2001) 327–348.
[4] J.M. Mendel, Advances in type-2 fuzzy sets and systems, Inf. Sci. 177 (2008) 84–110.
[5] J.M. Mendel, M.R. Rajati, P. Sussner, On clarifying some definitions and notations used for type-2 fuzzy sets as well as some recommended changes, Inf. Sci. (2016) 337–345.
[6] L.A. Lucas, T.M. Centeno, M.R. Delgado, Land cover classification based on general type-2 fuzzy classifiers, Int. J. Fuzzy Syst. 10 (3) (2008) 207–216.
[7] J. Clairet, A. Bigand, O. Colot, Color image segmentation using type-2 fuzzy sets, IEEE Int. Conf. E-Learn. Indus. Electron. (2006) 52–57.
[8] C. Yeh, W.R. Jeng, S. Lee, Data-based system modeling using a type-2 fuzzy neural network with a hybrid learning algorithm, IEEE Trans. Neural Netw. 22 (12) (2011) 2296–2309.
[9] U.N. Mau, L.N. Thanh, T.D. Thanh, An interval type-2 fuzzy subtractive clustering approach to obstacle detection of robot vision using RGB-D camera, Int. J. Hybrid Intell. Syst. 11 (2) (2014) 97–107.
[10] P. Melin, O. Castillo, A review on type-2 fuzzy logic applications in clustering, classification, Appl. Soft Comput. 21 (2014) 568–577.
[11] S.C. Madeira, A.L. Oliveira, Biclustering algorithms for biological data analysis: a survey, IEEE/ACM Trans. Comput. Biol. Bioinform. 1 (2004) 24–45.
[12] V. Sindhwani, J. Hu, A. Mojsilovic, Regularized co-clustering with dual supervision, in: Proceedings of NIPS, 2008.
[13] H. Ma, W. Zhao, Q. Tan, Z. Shi, Orthogonal nonnegative matrix tri-factorization for semi-supervised document co-clustering, Pacific-Asia Conf. Knowl. Disc. Data Mining (2010) 189–200.
[14] Y. Song, S. Pan, S. Liu, Constrained co-clustering for textual documents, Assoc. Adv. Artif. Intell. (2010).
[15] C.H. Oh, K. Honda, H. Ichihashi, Fuzzy clustering for categorical multivariate data, in: Proceedings of the Joint Ninth IFSA World Congress and Twentieth NAFIPS International Conference, 2001, pp. 2154–2159.
[16] K. Kummamuru, A. Dhawale, R. Krishnapuram, Fuzzy co-clustering of documents and keywords, IEEE Int. Conf. Fuzzy Syst. 2 (2003) 772–777.
[17] W.C. Tjhi, L. Chen, A partitioning based algorithm to fuzzy co-cluster documents and words, Pattern Recog. Lett. 27 (2006) 151–159.
[18] W.C. Tjhi, L. Chen, Possibilistic fuzzy co-clustering of large document collections, Pattern Recog. 40 (12) (2007) 3452–3466.
[19] W.C. Tjhi, L. Chen, A heuristic-based fuzzy co-clustering algorithm for categorization, Fuzzy Sets Syst. 159 (2008) 371–389.
[20] Y. Kanzawa, Comparison of imputation strategies in FNM-based and RFCM-based fuzzy co-clustering, in: International Conference on Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS), 2012, pp. 1988–1993.
[21] K. Honda, M. Muranishi, A. Notsu, H. Ichihashi, FCM-type cluster validation in fuzzy co-clustering and collaborative filtering applicability, IJCSNS Int. J. Comput. Sci. Netw. Secur. 13 (1) (2013) 24–29.
[22] Y. Yan, L. Chen, W.C. Tjhi, Fuzzy semi-supervised co-clustering for text documents, Fuzzy Sets Syst. 215 (2013) 74–89.



[23] M. Hanmandlu, O.P. Verma, S. Susan, V.K. Madasu, Color segmentation by fuzzy co-clustering of chrominance color features, Neurocomputing 120 (2013) 235–249. [24] X.L. Xie, G. Beni, A validity measure for fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell. 13 (1991) 841–847. [25] H. Shieh, A hybrid fuzzy clustering method with a robust validity index, Int. J. Fuzzy Syst. 16 (1) (2014) 39–45. [26] J.C. Bezdek, Cluster validity with fuzzy sets, J. Cybernet. 3 (1974) 58–73. [27] Z. Wang, A.C. Bovik, Mean squared error: love it or leave it? A new look at signal fidelity measures, IEEE Signal Process. Mag. (2009) 98–117. [28] Z. Wang, A.C. Bovik, A universal image quality index, IEEE Signal Process. Lett. 9 (3) (2002) 81–84. [29] I.S. Dhillon, S. Mallela, D.S. Modha, Information-theoretic co-clustering, in: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), 2003, pp. 89–98. [30] L.T. Ngo, D.S. Mai, W. Pedrycz, Semi-supervising interval type-2 fuzzy c-means clustering with spatial information for multi-spectral satellite image classification and change detection, Comput. Geosci. 83 (2014) 1–16. [31] D.D. Nguyen, L.T. Ngo, L.T. Pham, W. Pedrycz, Towards hybrid clustering approach to data classification: multiple kernels based interval-valued fuzzy c-means algorithms, Fuzzy Sets Syst. 279 (2015) 17–39.


[32] D.D. Nguyen, L.T. Ngo, J. Watada, A genetic type-2 fuzzy c-means clustering approach to m-FISH segmentation, J. Intell. Fuzzy Syst. 27 (2012) 3111–3122. [33] T.H. Dang, L.T. Ngo, W. Pedrycz, Interval type-2 fuzzy c-means approach to collaborative clustering, IEEE Conf. Fuzzy Syst. (2015). [34] A.D. Torshizi, M.H.F. Zarandi, A new cluster validity measure based on general type-2 fuzzy sets: application in gene expression data clustering, Knowl. Based Syst. 64 (2014) 81–93. [35] S. Salehi, A. Selamat, M.R. Mashinchi, H. Fujita, The synergistic combination of particle swarm optimization and fuzzy sets to design granular classifier, Knowl. Based Syst. 76 (2015) 200–218. [36] J. Hu, T. Li, H. Wang, H. Fujita, Hierarchical cluster ensemble model based on knowledge granulation, Knowl. Based Syst. 91 (2016) 179–188. [37] Y. Xu, H. Wang, D. Yu, Weak transitivity of interval-valued fuzzy relations, Knowl. Based Syst. 63 (2014) 24–32. [38] F. de A.T. de Carvalho, C.P. Tenório, Fuzzy k-means clustering algorithms for interval-valued data based on adaptive quadratic distances, Fuzzy Sets Syst. 161 (2010) 2978–2999. [39] P. D'Urso, J.M. Leski, Fuzzy c-ordered medoids clustering for interval-valued data, Pattern Recog. 58 (2016) 49–67.
