Active learning for improving a soft subspace clustering algorithm

Amel Boulemnadjel (a), Fella Hachouf (a), Amel Hebboul (b), Khalifa Djemal (c)

(a) Laboratoire d'Automatique et de Robotique, Département d'Électronique, Faculté des Sciences de l'Ingénieur, Université des Frères Mentouri Constantine, Route d'Ain el Bey, 25000 Constantine, Algeria
(b) Département des Sciences Exactes et Informatique, École Normale Supérieure de Constantine, Ali Mendjli, Constantine 3, Algeria
(c) IBISC Laboratory, University Evry Val d'Essonnes, 40 Pelvoux Street, 91080 Evry Courcouronnes Cedex, France

Article history: Received 26 October 2014; Received in revised form 4 July 2015; Accepted 6 August 2015

Abstract

In this paper a new soft subspace clustering algorithm is proposed. It is an iterative algorithm based on the minimization of a new objective function. The classification approach is developed by acting on three essential points. The first one is related to the initialization step; we suggest using a multi-class support vector machine (SVM) to improve the initial classification parameters. The second point is the new objective function itself, formed by a separation term and a compactness term. The density of the clusters is introduced in the latter term to yield different cluster shapes. The third and most important point consists of an active learning scheme with an SVM incorporated into the classification process. It allows a good estimation of the centers and the membership degrees and speeds up the convergence of the proposed algorithm. The developed approach has been tested on different synthetic datasets and real image databases. Several performance indices have been used to demonstrate the superiority of the proposed method. Experimental results corroborate the effectiveness of the proposed method in terms of clustering quality and runtime.

Keywords: Subspace clustering; Density; Active learning; SVM

1. Introduction

The clustering problem seeks to partition a given dataset into clusters. Objects in the same cluster have high similarity, while they are very dissimilar to objects in other clusters according to a similarity measure. In high dimensional datasets, the representation of objects varies from one dimension to another. Therefore, it is difficult to find a single notion of similarity to group all objects. Classical algorithms are inefficient for classifying data lying in different subspaces: they suffer from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. Dimensionality reduction methods such as feature transformation and selection (Hu et al., 2007) have been proposed to solve this type of problem. Biclustering algorithms (Madeira and Oliveira, 2004; Busygin et al., 2008) use a dissimilarity measure between the rows and the columns. These methods are mainly applied to cluster binary data. In data mining, datasets may have hundreds or thousands of categorical attributes; in this case, bi-clustering algorithms need a large number of binary attributes, which inevitably increases the memory required for computation. This problem has led researchers to look for other methods that detect different clusters in different subspaces.


Subspace clustering is an extension of traditional clustering (Parsons et al., 2004), with the goal of finding clusters that form in different subspaces. Aggrawal et al. (1999a) were the first to successfully introduce a methodology for locating clusters in different subspaces. Subspace clustering methods are divided into two broad categories: hard subspace clustering (Parsons et al., 2004) and soft subspace clustering (Deng et al., 2010; Xia et al., 2013; Wang et al., 2013). In the hard methods, each point of the dataset can belong to only one category. The first example of a bottom-up method was the Clique algorithm, while Proclus (Aggarwal et al., 1999b) was the first top-down one. In soft subspace clustering methods, each point of the dataset does not belong fully to a single class but has different degrees of membership in several classes. Weights are assigned to each feature to measure its contribution to building a particular cluster. In recent years, soft subspace clustering has emerged as an important research topic and many algorithms have been developed. Michele (2008) proposed a semi-supervised method in which a metric learning approach is used to improve the classical fuzzy C-means (Bezdek, 1981). This method is based on two steps: in the first step, a series of different metrics is learnt from the data; in the second one, the fuzzy C-means algorithm is executed with the previously computed metric. Wu et al. (2005) employed two important pieces of information, within-cluster compactness and between-cluster separation, to develop the fuzzy compactness and separation algorithm (FCS).


Liang et al. (2011) proposed a clustering method for high-dimensional data based on the k-modes algorithm; two types of weights are introduced to identify important attributes and discard non-significant ones. Feature-group k-means (FG-k-means) is an algorithm proposed by Xiaojun et al. (2012), where two added terms compute two types of attribute weights that are then used to simultaneously identify the importance of feature groups and of individual features in categorizing each cluster. A shortcoming of these methods lies in the initialization step, which makes the final results unstable. Soft subspace clustering has been enhanced by introducing the concept of entropy weight tuning (Anil Kumar et al., 2010; Domeniconi et al., 2007; Friedman and Meulman, 2004). LAC (Domeniconi et al., 2007) and COSA (Friedman and Meulman, 2004) associate a weight vector to each cluster. Both have been developed on an exponential weighting scheme, but they are fundamentally different in their search strategies and outputs, and only the inter-cluster dissimilarity is used in the objective function. In Deng et al. (2010), the objective function of the enhanced soft subspace clustering (ESSC) algorithm is based on two important pieces of information, within-cluster compactness and between-cluster separation. It suffers from the initialization step, which induces instability in the results, and the effectiveness of this type of algorithm drops if the clusters have different densities and shapes. Better results are obtained with methods based on the cluster density (Sunita and Parag, 2009; Sembiring and Zain, 2010). Among these methods, DBSCAN (Ester et al., 1996) is the oldest; it is based on density reachability and connectivity. Two parameters are selected to initialize the clustering process, the neighborhood size and the minimum density of clusters, so this method presents a parameter selection problem and a high computational complexity. An improved version of this method is given in Damodar and Prasanta (2012), where the K-means algorithm is used to perform an automatic prototype selection.

The paper is organized as follows. Section 2 presents the motivations of this work. In Section 3, the proposed approach, built around active learning for improving a soft subspace clustering algorithm (ALISSC), is presented. Section 4 is dedicated to the experimental results and their discussion; cluster analysis and technical validation of the proposed method are developed. Finally, in Section 5, a conclusion and perspectives to improve the proposed work are given.

2. Motivations

Classification methods based on optimization models (Benaichouche et al., 2013) are iterative methods that need an initialization step. An arbitrary initialization increases computing time and affects the stability of the results: the cluster centers are poorly localized. Indeed, errors in estimating the distances between centers are amplified through the iterations, inducing a low classification rate and a high processing time. The complexity of the objective function also makes it difficult to find the local minima. To overcome these shortcomings, a new soft subspace clustering approach is proposed. It is based on a new objective function containing two terms: a weighted within-cluster compactness term and a weighted between-cluster separation term. The weighted within-cluster compactness term is based on the local variance and the cluster density. In this term, weights are assigned to the features according to the local variance of the data along each dimension; they are computed using a new and simple formulation based on the local entropy of each feature. The learned weights perform a directional local reshaping of distances. Hence, they allow a better cluster separation and therefore the discovery of new patterns in different subspaces. An initial classification step is added, using a support vector machine (SVM) algorithm (Vapnik, 1999; Sangeetha et al., 2011; Langone et al., 2015), to generate the initial set of cluster centers and membership degrees.


Fig. 1. Flowchart of the proposed method.

The cluster density is introduced to improve the separation efficiency and to yield different cluster shapes. To take advantage of the proposed objective function and of the multi-class SVM, a combination of these two concepts is achieved: a novel multi-label strategy based on active learning (Jain and Kapoor, 2009; Hua Ho et al., 2011) is proposed to accelerate the convergence of the SVM and to estimate the cluster centers (Fig. 1). In all processing steps, the parameters used are selected automatically.

3. Proposed method

3.1. Nomenclature

To improve the readability of the equations, the following notations are used:

ALISSC    Active learning to improve a soft subspace clustering
ESSC      Enhanced soft subspace clustering
FCS       Fuzzy compactness and separation
J_ALISSC  Proposed objective function
v         Cluster center
u         Membership degree
m         Fuzzy parameter
c         Clusters number
N         Data size
D         Features number (subspaces)
w         Weight matrix
x         Data
n_ik      Cluster density in the ith cluster and kth subspace
η         Parameter controlling the influence of the weights in the clusters separation
v_0       Center of centers in D dimensions
J_Ct      Proposed compactness term
J_St      Weighting between-cluster separation term
ent       Dataset entropy
N_c       Number of objects in a cluster
SVM       Support vector machine
RBF       Radial basis function
T         Learning vector
K         Kernel function
y_l       Class of the input x_l
σ         SVM kernel parameter
ϵ         Threshold
NMI       Normalized mutual information
N_i       Number of data points in cluster i
N_j       Number of data points in true cluster j
N_ij      Number of agreements between cluster i and true cluster j
IP        Percentage index
N_ti      Number of true data points in cluster i
N_fi      Number of false data points in cluster i
CD        Center deviation
tv_i      True center
nt        Iterations number
CVE       Cross-validation evaluation

3.2. Objective function

The new objective function J_ALISSC (active learning for improving soft subspace clustering) is defined as

J_{ALISSC}(v, u) = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{D} w_{ik} \frac{(x_{jk} - v_{ik})^{2}}{n_{ik} + 1} - \eta \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{D} w_{ik} (v_{ik} - v_{0k})^{2}    (1)

In Eq. (1) the compactness term is based on the local variance using the cluster density (Boulemnadjel and Hachouf, 2012):

J_{Ct}(v, u) = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{D} w_{ik} \frac{(x_{jk} - v_{ik})^{2}}{n_{ik} + 1}    (2)

The second term (weighting between-cluster separation) is inherited from the objective functions of EWKM (Jing et al., 2007) and ESSC (Deng et al., 2010):

J_{St}(v, u) = \eta \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{D} w_{ik} (v_{ik} - v_{0k})^{2}    (3)

The weights w_{ik} are computed using the local entropy as

w_{ik} = \frac{smc_{k}}{\sum_{k=1}^{D} smc_{k}}    (4)

with

smc_{k} = \sum_{h=1}^{N_{c}} ent(h, k)

under the constraint \sum_{k=1}^{D} w_{ik} = 1. This constrained optimization problem is solved by introducing Lagrange multipliers and minimizing the objective function.

Result 1. Given v = [v_{1}, v_{2}, \dots, v_{c}] fixed, the optimal u_{ij} is obtained by setting \partial\phi_{1}/\partial u_{ij} = 0, where

\phi_{1}(u, \lambda) = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{D} w_{ik} \left[ \frac{(x_{jk} - v_{ik})^{2}}{n_{ik} + 1} - \eta (v_{ik} - v_{0k})^{2} \right] - \sum_{j=1}^{N} \lambda_{j}^{u} \left( \sum_{i=1}^{c} u_{ij} - 1 \right)    (5)

\frac{\partial\phi_{1}}{\partial u_{ij}} = m\, u_{ij}^{m-1} \sum_{k=1}^{D} w_{ik} \left[ \frac{(x_{jk} - v_{ik})^{2}}{n_{ik} + 1} - \eta (v_{ik} - v_{0k})^{2} \right] - \lambda_{j}^{u} = 0    (6)

\frac{\partial\phi_{1}}{\partial \lambda_{j}^{u}} = \sum_{i=1}^{c} u_{ij} - 1 = 0    (7)

The membership degrees u minimizing the objective function of Eq. (1) are given by

u_{ij} = \frac{\left[ \sum_{k=1}^{D} w_{ik} \left( \frac{(x_{jk} - v_{ik})^{2}}{n_{ik} + 1} - \eta (v_{ik} - v_{0k})^{2} \right) \right]^{-1/(m-1)}}{\sum_{i=1}^{c} \left[ \sum_{k=1}^{D} w_{ik} \left( \frac{(x_{jk} - v_{ik})^{2}}{n_{ik} + 1} - \eta (v_{ik} - v_{0k})^{2} \right) \right]^{-1/(m-1)}}    (8)

with

\sum_{i=1}^{c} u_{ij} = 1    (9)

Result 2. Given u = [u_{1}, u_{2}, \dots, u_{N}] fixed, the optimal v_{ik} is obtained by setting \partial\phi_{2}/\partial v_{ik} = 0, where

\phi_{2}(v) = J_{ALISSC}(v, u) = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{D} w_{ik} \frac{(x_{jk} - v_{ik})^{2}}{n_{ik} + 1} - \eta \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{D} w_{ik} (v_{ik} - v_{0k})^{2}

\frac{\partial\phi_{2}}{\partial v_{ik}} = w_{ik} \sum_{j=1}^{N} u_{ij}^{m} \left( \frac{x_{jk} - v_{ik}}{n_{ik} + 1} + \eta (v_{ik} - v_{0k}) \right) = 0    (10)

which gives

v_{ik} = \frac{\sum_{j=1}^{N} u_{ij} \left( \frac{x_{jk}}{n_{ik} + 1} - \eta v_{0k} \right)}{\sum_{j=1}^{N} u_{ij} \left( \frac{1}{n_{ik} + 1} - \eta \right)}    (11)

3.3. Active learning using multi-class SVM

Active learning studies the phenomenon of a learner selecting actions or making queries in a closed loop; this influences the choice of the data to be added to the training set. The local learning idea is to build, from the dataset, a new database that will be used by the SVM. Kernel functions that allow an optimal separation between the data are employed by the SVM algorithm. The one-against-all method has been widely used in the support vector literature to solve the multi-class pattern recognition problem (Polat and Gunes, 2009). For our application, the one-against-all method is well suited: it is fast in terms of computation time because the number of steps is small, so a quick decision is obtained for a test sample. The one-against-all approach constructs M binary SVM classifiers, each of which separates one class from all the rest. The ith SVM is trained with all the training examples of the ith class with positive labels, and all the others with negative labels. Mathematically, the ith SVM solves the following problem, which yields the ith decision function f_{i}(x) = w_{i}^{T}\phi(x) + b_{i}, minimized using the Lagrangian

L(w_{i}, \xi_{j}^{i}) = \frac{1}{2}\lVert w_{i} \rVert^{2} + C \sum_{j=1}^{N} \xi_{j}^{i}

subject to \tilde{y}_{j}\,(w_{i}^{T}\phi(x_{j}) + b_{i}) \ge 1 - \xi_{j}^{i}, \quad \xi_{j}^{i} \ge 0

where \tilde{y}_{j} = 1 if y_{j} = i and \tilde{y}_{j} = -1 otherwise,

and where w is the hyperplane normal, b a scalar, ξ the slack variables and C a tuning parameter. At the classification step, a sample x is assigned to the class i* whose decision function produces the largest value:

i^{*} = \arg\max_{i = 1, \dots, M} f_{i}(x) = \arg\max_{i = 1, \dots, M} \left( w_{i}^{T}\phi(x) + b_{i} \right)

Table 1. UCI datasets.

Database name  Clusters number  Vectors number  Dimensions number
Abdolan        3                4177            8
Australian     2                690             14
Balance        3                625             4
Car            4                946             18
Glass          6                214             9
Heart1         5                270             13
Heart          2                270             13
Iris           3                150             4
Wine           3                178             13

Table 2. S, A and Dim datasets.

Database name  Clusters number  Vectors number  Dimensions number
A1             20               3000            2
A2             35               5250            2
A3             50               7500            2
Dim32          16               1024            32
Dim64          16               1024            64
Dim128         16               1024            128
S1             15               5000            2
S2             15               5000            2
S3             15               5000            2
S4             15               5000            2

The multi-class SVM is applied in the initialization step using an arbitrary learning vector T:

T_{xy} = \{(x_{1}, y_{1}), \dots, (x_{k}, y_{k})\}

The SVM rules are trained on the data T_{xy} with the kernel function K:

K(x_{i}, x_{j}) = \exp\left( -\frac{\lVert x_{i} - x_{j} \rVert}{2\sigma^{2}} \right)    (12)

The cluster centers v_{ik} and the membership degrees u_{ij} obtained by the SVM can then be written respectively as (Boulemnadjel and Hachouf, 2013)

v_{ik} = \frac{\sum_{l=1}^{N_{i}} x_{lk}}{N_{i}} = \frac{\sum_{j=1}^{N} u'_{ij}\, x_{jk}}{N_{i}}, \quad i = 1, \dots, c,\; k = 1, \dots, D    (13)

u'_{ij} = \begin{cases} 1 & \text{if the cluster of the data point } j \text{ is equal to } i \\ 0 & \text{otherwise} \end{cases}, \quad i = 1, \dots, c,\; j = 1, \dots, N    (14)
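As an illustration of this initialization, the following sketch shows how a one-against-all RBF SVM could be used to produce the initial centers and crisp memberships of Eqs. (13) and (14). It is not the authors' implementation: the use of scikit-learn, the function name init_with_svm and the direct use of class means are assumptions made for the example.

import numpy as np
from sklearn.svm import SVC

def init_with_svm(X_train, y_train, X, n_clusters, sigma=1.0):
    """Train a one-against-all RBF SVM and derive initial centers (Eq. (13))
    and crisp memberships u'_ij (Eq. (14)) from its predicted labels."""
    # gamma = 1/(2*sigma**2), in the spirit of the RBF kernel of Eq. (12)
    svm = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2),
              decision_function_shape="ovr")
    svm.fit(X_train, y_train)
    labels = svm.predict(X)                       # predicted cluster of each object
    N, D = X.shape
    u_crisp = np.zeros((n_clusters, N))
    centers = np.zeros((n_clusters, D))
    for i in range(n_clusters):
        members = labels == i
        u_crisp[i, members] = 1.0                 # Eq. (14): u'_ij = 1 if j falls in cluster i
        if members.any():
            centers[i] = X[members].mean(axis=0)  # Eq. (13): per-dimension class mean
    return centers, u_crisp, svm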

Table 3. Clustering results obtained for the nine UCI datasets with the NMI metric in nt iterations.

Database    ALISSC NMI  nt  ESSC NMI       nt  FCS NMI        nt  Multi-class SVM NMI
Abdolan     0.176       2   0.163 ± 0.02   11  0.150 ± 0.014  20  0.133
Australian  0.451       2   0.361 ± 0.016  12  0.105 ± 0.218  20  0.319
Balance     0.38        3   0.287 ± 0.14   9   0.159 ± 0.09   20  0.14
Car         0.181       3   0.123 ± 0.197  12  0.141 ± 0.04   20  0.655
Glass       0.668       3   0.350 ± 0.012  12  0.307 ± 0.047  20  0.352
Heart1      0.213       2   0.111 ± 0.153  12  0.046 ± 0.052  20  0.547
Heart       0.245       3   0.306 ± 0.047  16  0.131 ± 0.040  20  0.337
Iris        0.873       4   0.741 ± 0.005  16  0.717 ± 0.184  20  0.873
Wine        0.834       2   0.862 ± 0.05   12  0.733 ± 0.005  20  0.853

Bold values in the original table indicate the best result obtained for each database.

Table 4. Clustering results obtained for the nine UCI datasets with the IP metric.

Database    ALISSC  ESSC          FCS           Multi-class SVM
Abdolan     48.00   45.37 ± 0.60  41.63 ± 5.01  65.86
Australian  87.00   80.65 ± 2.10  51.90 ± 0.70  93.00
Balance     60.00   59.20 ± 7.32  45.19 ± 7.13  54.51
Car         44.00   34.04 ± 1.50  31.38 ± 2.53  80.87
Glass       65.89   41.50 ± 1.77  43.47 ± 7.41  74.79
Heart1      48.51   21.45 ± 2.33  47.51 ± 0.45  92.73
Heart       78.00   72.43 ± 3.51  31.76 ± 5.81  70.30
Iris        95.33   90.00 ± 0.51  88.67 ± 6.18  95.33
Wine        94.94   95.26 ± 0.54  88.20 ± 0.50  96.05

Bold values in the original table indicate the best result obtained for each database.

Optimal values of σ are retained; they correspond to the best classification rate. At each iteration, the active learning steps taken to create a new learning base are as follows. The cluster centers v_{ik} obtained from Eq. (11) and their near neighborhoods, detected using a minimum Euclidean distance, are selected. The near neighborhood is the set of objects x_{j} having membership degrees greater than 1/c. These objects are used to build the new learning base T as a new SVM training vector. Mathematically, the learning base generated by the active learning is selected as follows. A Euclidean distance is computed as

dis(i, j) = \sum_{j=1}^{N} \sum_{i=1}^{c} (x_{jk} - v_{i})^{2}

Cluster labels y_{j} are given by

y_{j} = \arg\min_{i} \left( dis(i, j) \right)

A pseudo training vector T' is constructed from the previously cited near neighborhood and the labels y_{j}:

T' = \{(x_{jk}, y_{j}) \mid u_{ij} \ge 1/c\}

The final training vector T, given by Eq. (15), is created using the cluster centers and the pseudo vector T':

T_{xy} = \{(v_{1}, 1), \dots, (v_{c}, c), (x_{i}, y_{i}), \dots, (x_{l}, y_{l})\}    (15)
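To make the active-learning selection above concrete, the sketch below builds the new training vector T of Eq. (15) from the current centers and membership degrees. The 1/c threshold and the nearest-center labelling follow the text; the function name build_training_set and the array layout (memberships as a c × N matrix) are assumptions.

import numpy as np

def build_training_set(X, centers, U):
    """Select the near neighborhood (some u_ij >= 1/c), label it by the
    nearest center, and prepend the centers themselves, as in Eq. (15)."""
    c = centers.shape[0]
    # Squared Euclidean distances between every object and every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (N, c)
    labels = d2.argmin(axis=1)                                     # y_j = argmin_i dis(i, j)
    near = U.max(axis=0) >= 1.0 / c          # objects forming the near neighborhood
    X_new = np.vstack([centers, X[near]])    # centers first, labelled by their own index
    y_new = np.concatenate([np.arange(c), labels[near]])
    return X_new, y_new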


Table 5. Clustering results obtained for the A, Dim and S datasets with the NMI metric in nt iterations.

Database  ALISSC NMI  nt  ESSC NMI       nt  FCS NMI        nt  Multi-class SVM NMI
A1        0.900       7   0.795 ± 0.206  19  0.723 ± 0.046  20  0.455
A2        0.912       16  0.784 ± 0.207  23  0.720 ± 0.055  20  0.350
A3        0.914       8   0.910 ± 0.009  33  0.718 ± 0.028  20  0.291
Dim32     0.983       4   0.741 ± 0.005  16  0.831 ± 0.032  20  0.485
Dim64     0.982       5   0.954 ± 0.005  45  0.826 ± 0.085  20  0.442
Dim128    0.900       6   0.900 ± 0.023  20  0.841 ± 0.114  20  0.399
S1        0.770       17  0.689 ± 0.134  24  0.725 ± 0.019  20  0.490
S2        0.756       13  0.84 ± 0.128   21  0.798 ± 0.053  20  0.586
S3        0.777       18  0.60 ± 0.164   10  0.731 ± 0.045  20  0.481
S4        0.710       17  0.60 ± 0.128   18  0.669 ± 0.057  20  0.443

Bold values in the original table indicate the best result obtained for each database.

Table 6. Clustering results obtained for the A, Dim and S datasets with the IP metric.

Database  ALISSC  ESSC          FCS           Multi-class SVM
A1        74.33   62.87 ± 0.94  45.10 ± 6.00  20.00
A2        77.81   56.59 ± 1.04  38.82 ± 1.27  11.43
A3        76.43   74.94 ± 0.79  28.49 ± 0.90  8.00
Dim32     95.33   90.00 ± 1.95  65.13 ± 7.40  24.90
Dim64     93.00   84.00 ± 1.34  70.59 ± 1.28  23.14
Dim128    75.74   73.14 ± 1.76  71.00 ± 1.40  18.75
S1        84.62   66.53 ± 1.94  58.25 ± 1.06  38.01
S2        78.98   75.03 ± 3.05  65.17 ± 6.21  28.84
S3        79.50   53.18 ± 2.86  52.50 ± 7.21  28.94
S4        75.16   52.48 ± 2.34  59.38 ± 5.80  27.65

Bold values in the original table indicate the best result obtained for each database.

A new formulation of the membership degrees u is given by

u_{ij} = \frac{\left[ \sum_{k=1}^{D} w_{ik} \left( som1 - \eta\, som2 \right) \right]^{-1/(m-1)}}{\sum_{i=1}^{c} \left[ \sum_{k=1}^{D} w_{ik} \left( som1 - \eta\, som2 \right) \right]^{-1/(m-1)}}    (16)

where

som1 = \frac{\left( x_{jk} - \frac{\sum_{j=1}^{N} u_{ij} x_{jk}}{N_{i}} \right)^{2}}{n_{ik} + 1}, \qquad som2 = \left( \frac{\sum_{j=1}^{N} u_{ij} x_{jk}}{N_{i}} - v_{0k} \right)^{2}

Fig. 2. Centers evolution against the iterations number: blue line with squares: ALISSC; plain line: ESSC. (a) Iris dataset, (b) Glass dataset, (c) and (d) S1 dataset. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

There are no compulsory geometric constraints in any computing step of the proposed approach, which is summarized by Algorithm 1.

Algorithm 1.
Step 1: Initialization
  – Input: number of clusters c, parameter m, ϵ = 0.01
  – Data normalization.
  – SVM training.
  – Compute u, v and w using Eqs. (8), (11) and (4), respectively.
  – Output: v_{ik}^{(0)}, v_{0}, w_{ik}^{(0)}
Step 2: Processing
  While ||v(t+1) − v(t)|| ≥ ϵ do
    – Compute u_{ij} and v_{ik} using Eqs. (16) and (13).
    – Apply active learning to extract the new learning vector using the cluster centers and the membership degrees previously computed.
    – Apply the SVM algorithm and compute the new centers matrix using Eq. (13).
    – Compute w using Eq. (4).
  End while
Step 3: Data classification
  – Assign each object x_{j} to its potential cluster using the maximum of the membership degrees u.
End.
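The following is a compact, hedged sketch of the main iteration of Algorithm 1. It chains an entropy-based weight update in the spirit of Eq. (4), the membership update of Eq. (8)/(16) and the closed-form center update of Eq. (11); in the full ALISSC loop the centers would instead come from the SVM retrained on the actively selected set (Eq. (13), see the earlier sketches). The entropy estimate, the flooring of negative distances and the treatment of v_0 are assumptions, not the paper's exact choices.

import numpy as np

def entropy_weights(X, U, m=2.0, n_bins=10):
    """Eq. (4)-style weights: per-cluster, per-dimension entropy, normalised
    over the D dimensions (a membership-weighted histogram entropy is assumed)."""
    c = U.shape[0]
    D = X.shape[1]
    W = np.zeros((c, D))
    for i in range(c):
        for k in range(D):
            hist, _ = np.histogram(X[:, k], bins=n_bins, weights=U[i] ** m)
            p = hist / max(hist.sum(), 1e-12)
            W[i, k] = -(p[p > 0] * np.log(p[p > 0])).sum()
    return W / W.sum(axis=1, keepdims=True)

def update_memberships(X, V, W, density, v0, eta, m=2.0):
    """Membership update written in the Eq. (8) form of the objective."""
    c, N = V.shape[0], X.shape[0]
    dist = np.zeros((c, N))
    for i in range(c):
        comp = (X - V[i]) ** 2 / (density[i] + 1.0)   # compactness part
        sep = eta * (V[i] - v0) ** 2                  # separation part
        # values are floored so the fractional power of Eq. (8) stays defined
        dist[i] = np.clip((W[i] * (comp - sep)).sum(axis=1), 1e-12, None)
    inv = dist ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def update_centers(X, U, density, v0, eta):
    """Closed-form center update of Eq. (11)."""
    c, D = U.shape[0], X.shape[1]
    V = np.zeros((c, D))
    for i in range(c):
        s_u, s_ux = U[i].sum(), U[i] @ X
        num = s_ux / (density[i] + 1.0) - eta * v0 * s_u
        den = s_u * (1.0 / (density[i] + 1.0) - eta)
        V[i] = num / den
    return V

def iterate(X, V, U, density, eta=0.1, m=2.0, eps=0.01, max_iter=50):
    """Step 2 of Algorithm 1 with the closed-form updates.
    density is assumed to be the (c x D) matrix of cluster densities n_ik."""
    v0 = V.mean(axis=0)                 # "center of centers" in D dimensions
    for _ in range(max_iter):
        W = entropy_weights(X, U, m=m)
        U = update_memberships(X, V, W, density, v0, eta, m=m)
        V_new = update_centers(X, U, density, v0, eta)
        if np.linalg.norm(V_new - V) < eps:   # ||v(t+1) - v(t)|| < eps
            return V_new, U, W
        V = V_new
    return V, U, W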

The best value of the fuzzy parameter m is computed as follows (Yu et al., 2004): if min(N, D − 1) ≥ 3, then 1 < m < min(N, D − 1)/(min(N, D − 1) − 2); otherwise m = 2, where N is the number of objects in a cluster and D is the number of features.

The parameter η is used to maintain a balance between the compactness and the cluster separation terms. Its value is determined by the distance between som1 and som2 of Eq. (16). Let

d_{ij} = som1 - \eta\, som2, \qquad s_{t} = som1/som2

When η is equal to s_{t}, the distance d_{ij} is zero, so the membership degree u_{ij} obtained by Eq. (16) is undefined (division by zero). To ensure that 0 ≤ u ≤ 1, d_{ij} must be greater than zero; it is therefore necessary to impose that η remains lower than s_{t} at each iteration. This rule is enforced by

\eta = \min\left( \min\left( \eta, \frac{som1}{som2} \cdot F \right) \right), \qquad F \ne 1

Fig. 3. (a) Iris dataset with the centers obtained by the ALISSC and ESSC algorithms, (b) clustering by the ESSC algorithm and (c) clustering by the ALISSC algorithm.

4. Results and discussion

To test the performance of the proposed method, experiments have been carried out on different real images and synthetic datasets, using Matlab 2010 on a Core Duo 2.2 GHz computer with 2 GB of RAM. The UCI repository (http://archive.ics.uci.edu/ml/) and the S, A and Dim databases (Frnti and Virmajoki, 2006), with specific numbers of clusters, vectors and dimensions, have been used; a description of these datasets is given in Tables 1 and 2. Nine databases have been selected from the UCI datasets, with cluster numbers 2 ≤ c ≤ 6, the highest number of vectors being 4177 for the Abdolan database and the lowest 150 for the Iris database. The A sets are synthetic 2-d data with a number of clusters varying between 20 and 50 and a number of vectors between 3000 and 7500. Dim32, Dim64 and Dim128 are synthetic data with 16 Gaussian clusters, 1024 vectors and 32, 64 and 128 dimensions, respectively. The S sets are 2-d data with 5000 vectors and 15 clusters with different degrees of cluster overlapping. Subspace clustering is performed in all these dimensions. The proposed approach has also been tested to classify different types of images.

Fig. 4. (a) S1 dataset with centers obtained by the ALISSC and ESSC algorithms, (b) clustering by the ESSC algorithm and (c) clustering by the ALISSC algorithm. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 5. Convergence of the methods: (a) ESSC algorithm and (b) ALISSC algorithm.

Table 7. Clustering results obtained for the nine UCI datasets with the CD metric.

Database    ALISSC  ESSC            FCS
Abdolan     0.210   0.171 ± 0.01    0.083 ± 0.015
Australian  0.229   0.243 ± 0.03    0.804 ± 0.061
Balance     0.026   0.012 ± 0.0016  0.321 ± 0.250
Car         0.760   0.07 ± 0.10     2.276 ± 0.838
Glass       0.027   0.064 ± 0.077   0.968 ± 0.355
Heart1      0.013   0.073 ± 0.02    0.047 ± 0.143
Heart       0.003   0.073 ± 0.02    1.539 ± 0.188
Iris        0.0086  0.048 ± 0.03    0.933 ± 0.294
Wine        0.0043  0.03 ± 0.02     0.163 ± 0.392

Bold values in the original table indicate the best result obtained for each database.

Table 8. Clustering results obtained for the A, Dim and S datasets with the CD metric.

Database  ALISSC  ESSC           FCS
A1        0.016   0.180 ± 0.243  0.098 ± 0.144
A2        0.028   0.162 ± 0.13   0.072 ± 0.071
A3        0.10    0.145 ± 0.12   0.057 ± 0.071
Dim32     0.052   0.048 ± 0.03   7.846 ± 0.594
Dim64     0.012   0.05 ± 0.126   15.863 ± 1.277
Dim128    0.231   0.228 ± 0.194  40.910 ± 2.617
S1        0.035   0.175 ± 0.04   0.036 ± 0.043
S2        0.03    0.184 ± 0.178  0.055 ± 0.063
S3        0.028   0.14 ± 0.098   0.042 ± 0.047
S4        0.035   0.199 ± 0.271  0.309 ± 0.179

Bold values in the original table indicate the best result obtained for each database.

Table 10. Cross validation on the nine UCI datasets.

Database    ALISSC  ESSC  Multi-class SVM
Abdolan     0.55    0.33  0.44
Australian  0.53    0.53  0.80
Balance     0.55    0.38  0.98
Car         0.99    0.48  0.77
Glass       0.97    0.42  0.94
Heart1      0.64    0.56  0.70
Heart       0.74    0.29  0.73
Iris        0.99    0.82  0.96
Wine        0.97    0.77  0.94

Bold values in the original table indicate the best result obtained for each database.

ALISSC is compared to ESSC (Deng et al., 2010), FCS (fuzzy compactness and separation; Wu et al., 2005) and the multi-class SVM (Yi and Zheng, 2005). It should be noticed that the effectiveness of ESSC has already been evaluated against other existing methods such as EWKM (entropy weighting k-means; Jing et al., 2007) and LAC (local adaptive clustering; Domeniconi et al., 2007); for this reason, it is not necessary to use them for comparison with our method.

4.1. Performance analysis

In this section, three metrics and the cross-validation method are used to evaluate the results of the proposed algorithm. The normalized mutual information (NMI; Liu et al., 2006) is given by

NMI = \frac{\sum_{i=1}^{c} \sum_{j=1}^{c} N_{ij} \log \frac{N \cdot N_{ij}}{N_{i} N_{j}}}{\sqrt{\sum_{i=1}^{c} N_{i} \log \frac{N_{i}}{N} \cdot \sum_{j=1}^{c} N_{j} \log \frac{N_{j}}{N}}}    (17)

The higher the value of NMI, the better the clustering result. NMI takes a value within the interval [0, 1].
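For reference, a small sketch of how the NMI of Eq. (17) can be computed from the contingency counts N_ij between the obtained and the true clusters; the square-root normalisation shown above is used, and the function is an illustration rather than the authors' code.

import numpy as np

def nmi(labels_pred, labels_true):
    """Normalized mutual information, Eq. (17)."""
    pred, true = np.unique(labels_pred), np.unique(labels_true)
    N = float(len(labels_pred))
    # N_ij: number of agreements between cluster i and true cluster j
    Nij = np.array([[np.sum((labels_pred == i) & (labels_true == j)) for j in true]
                    for i in pred], dtype=float)
    Ni, Nj = Nij.sum(axis=1), Nij.sum(axis=0)
    num = sum(Nij[a, b] * np.log(N * Nij[a, b] / (Ni[a] * Nj[b]))
              for a in range(len(pred)) for b in range(len(true)) if Nij[a, b] > 0)
    den = np.sqrt((Ni * np.log(Ni / N)).sum() * (Nj * np.log(Nj / N)).sum())
    return num / den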

Table 9. IP values for the different subspace clusterings of the Iris dataset.

Dataset  Cluster  IP dimension 1  IP dimension 2  IP dimension 3  IP dimension 4  Best w_i
Iris     C1       92              98              100             100             3
Iris     C2       98              94              98              100             4
Iris     C3       100             94              100             96              3

Bold values in the original table indicate the best result obtained for each cluster.

Fig. 6. The obtained average of the best clustering results.

Table 11. Running time (seconds) of the algorithms performed on the nine UCI datasets.

Database    ALISSC (1 execution)  ESSC (10 executions)  FCS (10 executions)  Multi-class SVM (1 execution)
Abdolan     12.308                57.905                616.697              0.339
Australian  1.865                 6.655                 10.469               0.887
Balance     1.559                 6.50                  12.176               0.812
Car         0.981                 5.72                  3.580                0.429
Glass       3.864                 8.350                 7.931                0.91
Heart1      2.577                 6.212                 9.214                1.019
Heart       0.833                 3.194                 3.344                0.449
Iris        0.842                 3.72                  3.295                0.452
Wine        1.127                 5.32                  3.849                0.511

Bold values in the original table indicate the best result obtained for each database.

The percentage index (IP) is given by

IP = \sum_{i=1}^{c} \left( \frac{N_{ti}}{N_{ti} + N_{fi}} \right) \times 100    (18)

The center deviation (CD; Hathaway et al., 2000) is given by

CD = \frac{1}{c} \sum_{i=1}^{c} \lVert v_{i} - tv_{i} \rVert_{2}^{2}    (19)

The quality of the estimated center locations is measured from the obtained center location v_{i} and the true center tv_{i} given by the datasets; the lower the value of CD, the better the location of the centers.

Cross validation (CVE; Devijver and Kittler, 1982) is a well-known method for analyzing results. It is a popular strategy for algorithm selection because of its simplicity and universality. The main idea behind CVE is to split the data, once or several times, in order to estimate the risk of each algorithm: part of the data (the training sample) is used for training each algorithm, and the remaining part (the validation sample) is used for estimating its risk. CVE then selects the algorithm with the smallest estimated risk.

Complexity analysis: the computational complexity of the proposed algorithm has been studied. It depends on the data size N, the number of dimensions D (subspaces), the number of clusters c and the multi-class SVM. The SVM complexity depends on the number of support vectors Nsv and the number of clusters c, so it is O(c × Nsv) (Xisheng et al., 2012). Since Nsv ⪡ N (the data size), the global complexity depends only on N, D and c. At each iteration the complexity of the proposed algorithm is O(N × D × c); for nt iterations, it becomes O(nt × N × D × c). The running times of the different algorithms have also been compared.
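To complete the picture, here is a hedged sketch of the percentage index of Eq. (18) and the center deviation of Eq. (19). It assumes that the obtained clusters are already matched to the true ones (the paper does not detail the matching) and that the IP sum is averaged over the c clusters so that it stays within [0, 100], which the reported values suggest.

import numpy as np

def percentage_index(labels_pred, labels_true):
    """Eq. (18): proportion of correctly assigned points per cluster, in percent."""
    clusters = np.unique(labels_pred)
    score = 0.0
    for i in clusters:
        in_cluster = labels_pred == i
        n_true = np.sum(in_cluster & (labels_true == i))   # N_ti
        n_false = np.sum(in_cluster) - n_true              # N_fi
        score += 100.0 * n_true / max(n_true + n_false, 1)
    return score / len(clusters)    # averaging over clusters is an assumption

def center_deviation(centers, true_centers):
    """Eq. (19): mean squared Euclidean distance to the true centers."""
    return np.mean(np.sum((centers - true_centers) ** 2, axis=1))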

4.2. Experiments on synthetic datasets

Based on the above evaluation techniques, different tests have been conducted on the enumerated databases, and the results obtained by the proposed method, ESSC, FCS and the multi-class SVM algorithm (Yi and Zheng, 2005) are analyzed. The ALISSC, ESSC and FCS algorithms have been executed 10 times, and the NMI average accuracies and standard deviations are reported in the different clustering results. Forty percent of each database is used to train the SVM. The cluster labels given by the datasets are used to train the SVM, while they only contribute to the validation step of ALISSC, ESSC and FCS; it should be noted that the SVM algorithm is a supervised classification method (Yi and Zheng, 2005; Finley, 2005). The results for the different measures are reported in Tables 3–6. For all the tested datasets, the best NMI values are given by the ALISSC algorithm, except for the Wine and S2 datasets for which ESSC produces the best ones (NMI_wine = 0.86 ± 0.05, NMI_S2 = 0.84 ± 0.128) in T = 12 and T = 21 iterations, respectively. The multi-class SVM algorithm performs well on the Car, Heart1 and Heart datasets, with NMI_car = 0.65, NMI_heart1 = 0.54 and NMI_heart = 0.33, respectively. Apart from the Wine dataset (IP_wine = 95.26 for the ESSC algorithm), the percentage index IP, which informs about correct clustering, is the highest for the proposed algorithm on the rest of the datasets, and in few iterations. It is noticed that over the 10 executions ALISSC provides the same results, unlike the variable ones achieved by the ESSC and FCS methods; the proposed approach yields stable results because of the efficiency of the initialization step.

Since the proposed improvements are essentially based on the estimation of the center locations, it is necessary to make a full analysis of the obtained centers. For this purpose, the norm of the evolution of the centers is first plotted against the iteration number. Fig. 2 shows that the centers obtained by the proposed method come close to the real ones in only a few iterations; from the beginning, they are well distributed because of the good initialization step, unlike the cluster centers obtained by ESSC, which take several iterations to converge to the real centers. Figs. 3 and 4 represent the clustering results for the Iris and S1 datasets, respectively. Despite the observed overlapping between clusters, ALISSC produces well-located centers corresponding to the synthetic centers (Figs. 3c and 4c).

In Fig. 4b, ESSC has created a false center (black circle) and consequently a nonexistent cluster; some clusters are missed and others are merged (red circle), affecting the clustering quality. The good convergence of the proposed algorithm is shown in Fig. 5.

The values of the center deviation CD are computed and reported in Tables 7 and 8. Once again, ALISSC outperforms ESSC regarding the CD results. The proposed algorithm fails to give the best CD on four databases (Abdolan, CD_FCS = 0.08; Balance, CD_ESSC = 0.012; Dim32, CD_ESSC = 0.048; Dim128, CD_ESSC = 0.228) because these datasets contain only well-separated clusters; ESSC and FCS do not perform well when clusters are overlapped.

Fig. 7. (a, e, i, m, and q) Original images; (b, f, j, n, and r) ESSC results; (c, g, k, o, and s) FCS; and (d, h, l, p, and t) ALISSC. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

The entropy weights represent the influence of each subspace on the cluster distribution; the highest weights indicate the best partition in a subspace. If the weight w_{ik} of a subspace k is the highest, the corresponding center v_{ik} of the cluster is the best.
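As a small illustration of this reading of the weights, the best subspace of each cluster is simply the arg-max of its weight row; the weight values below are hypothetical and only chosen so that the result reproduces the "Best w_i" column of Table 9.

import numpy as np

def best_subspace_per_cluster(W):
    """For each cluster i, return the dimension k with the highest w_ik,
    i.e. the subspace whose partition (and center v_ik) is considered best."""
    return W.argmax(axis=1)

# Hypothetical (c x D) weight matrix for the three Iris clusters of Table 9
W = np.array([[0.20, 0.25, 0.30, 0.25],
              [0.15, 0.20, 0.25, 0.40],
              [0.22, 0.18, 0.35, 0.25]])
print(best_subspace_per_cluster(W) + 1)   # -> [3 4 3]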


Fig. 8. Results in the seven subspaces: ESSC (1st column), FCS (2nd column) and ALISSC (3rd column).

Classification results for each subspace are given in Table 9; as an example, we give the results for the Iris dataset. The highest values of IP correspond to the best weights.

To test the generality of the proposed approach, we perform a cross-validation evaluation (CVE), since ALISSC is based on an active learning method.


In order to have a fair performance evaluation, for each of the tested algorithms we divide every dataset into 10 groups: nine groups are used for the training step and one group for the test step. This process is repeated 10 times, with each group used exactly once as a test set; finally, the 10 fold results are averaged to produce a single performance estimate. CVE takes a value within the interval [0, 1], and the higher the CVE value, the better the clustering result. The CVE values for the studied algorithms are tabulated in Table 10 for the UCI databases. The best values are provided by the proposed algorithm for almost all datasets (CVE_iris = CVE_car = 0.99, CVE_glass = 0.97, CVE_heart = 0.74 and CVE_abdolan = 0.55), followed by the multi-class SVM algorithm (CVE_australian = 0.80, CVE_balance = 0.98 and CVE_heart1 = 0.70), which confirms the efficiency of our approach. In order to clarify the differences between the clustering results, the average values of the used metrics over all the datasets are plotted in Fig. 6.

The running times of the different algorithms on the UCI datasets are computed and reported in Table 11. While the ESSC and FCS algorithms require 10 executions, the proposed method needs only one execution to obtain the final result. It is noticed that for almost all datasets ALISSC presents the lowest execution time; consequently, the proposed method is faster than the ESSC and FCS algorithms. ALISSC is obviously slower than the multi-class SVM alone, because the latter is incorporated in the proposed method.

4.3. Experiments on real images

Several tests have been conducted on different real images to evaluate our algorithm, since image segmentation can be considered as an unsupervised classification task. The subspaces used consist of the first five parameters of Haralick et al. (1973), namely contrast, homogeneity, correlation, entropy and energy, extracted from the co-occurrence matrix, together with the Canny edge detector response and the gray level. Hence, in this case the feature vector dimension is seven.

Fig. 7a is a synthetic image. For all methods, we have considered the same number of clusters. Fig. 7b and c shows the images classified by the ESSC and FCS algorithms, respectively, initialized with c = 5: the triangle and the image background are merged into one cluster, leading to a result with c = 3 clusters. The result of our approach is illustrated in Fig. 7d; despite the same initialization, the clusters are well detected and the boundaries of the objects are well delimited. Fig. 7e, i and m represents different abdominal CT scan images, and Fig. 7q a magnetic resonance imaging (MRI) scan. The number of clusters has been set to 6 for Fig. 7e, 7 for Fig. 7i and m, and 5 for Fig. 7q. Generally, it is noticed that the shapes of the different clusters obtained by ALISSC are well detected, in contrast to the results obtained by the ESSC and FCS methods. In Fig. 7g, the left kidney (1) does not appear, and in Fig. 7f the shape of the right kidney (2) is not detected, whereas in Fig. 7h all organs are detected. In Fig. 7j, the kidney (5) and the spine (3) are represented by the same cluster and the renal vein (4) is merged with the kidney (5), whereas in Fig. 7l they are well separated (blue arrow). The liver (6), the spine (7) and the spleen (8) are fused into a same cluster. Also, the inferior vena cava (10) in the upper abdomen is not detected in Fig. 7n but appears clearly in Fig. 7p, and the vein (9) is well shaped (green arrow). In Fig. 7t all clusters are well detected.
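To show how such a seven-dimensional feature vector could be assembled in practice, here is a minimal sketch using scikit-image. The window size, the gray-level quantisation and the entropy computed directly from the co-occurrence matrix are assumptions, not the authors' exact settings.

import numpy as np
from skimage.feature import graycomatrix, graycoprops, canny

def patch_features(patch_u8, edge_value):
    """Seven subspace features for one gray-level window: five Haralick-style
    co-occurrence features, the Canny edge response and the mean gray level."""
    glcm = graycomatrix(patch_u8, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))   # entropy is not a graycoprops property
    feats = [graycoprops(glcm, prop)[0, 0]
             for prop in ("contrast", "homogeneity", "correlation", "energy")]
    return np.array(feats + [entropy, edge_value, patch_u8.mean()])

def image_feature_vectors(image_u8, win=15):
    """Slide a window over a uint8 image and stack one 7-D vector per window."""
    edges = canny(image_u8 / 255.0)
    half = win // 2
    vectors = []
    for r in range(half, image_u8.shape[0] - half, win):
        for c in range(half, image_u8.shape[1] - half, win):
            patch = image_u8[r - half:r + half + 1, c - half:c + half + 1]
            vectors.append(patch_features(patch, float(edges[r, c])))
    return np.vstack(vectors)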
To display the detected clusters in each subspace, the centers obtained for each feature are used (Boulemnadjel and Hachouf Amel, 2012). It is obvious that some clusters are detected in some subspaces while part of them remain hidden in others; each subspace provides different information. Fig. 8 presents the clustering results in the used subspaces for the different considered methods (the original image is that of Fig. 7e). Fig. 8a, b and c are the resulting images when only the edge subspace is used; they are similar. For the entropy subspace, the same clusters are detected (Fig. 8d–f).


However, the proposed approach (Fig. 8f) produces the best results: the shapes of most organs are respected. Similar results are obtained by all the methods for the correlation and/or contrast subspaces, with a better quality for the ALISSC result (Fig. 8i and l), where the stomach and the vertebra are well detected. Using only homogeneity and/or energy, no method has proven its superiority over another (Fig. 8m–o and p–r). The best result is achieved by the proposed approach using the gray level as the last subspace, where most of the clusters are detected (Fig. 8u); in Fig. 8t, the ribs and the liver have been merged by the FCS algorithm. Finally, it is noticed that for almost all subspaces the ALISSC results are satisfying. We can therefore conclude that the proposed approach yields good results in real image classification compared to existing subspace clustering methods.

5. Conclusion

In this paper, a new soft subspace clustering approach has been presented. It is based on the minimization of an objective function that is less complex than existing ones in the literature. To increase the classification rate by handling missed and badly classified vectors, an active learning process is adopted. A new initialization step based on an SVM is introduced to reduce the number of iterations and stabilize the clustering results. Incorporating the SVM into the active learning at each iteration has made it possible to construct a new formulation of the membership degrees and the cluster centers; consequently, the localization of the cluster centers has been improved and the convergence of the algorithm has been accelerated. The developed method is general and can be implemented efficiently on any dataset with various dimension sizes. It has been compared to other subspace clustering algorithms, and the obtained results have revealed the efficiency of the proposed method. We are continuing efforts to improve the ALISSC algorithm; further studies will be oriented towards developing specific distances and optimizing the features. We are hopeful that this new approach will give new insights into the subspace clustering paradigm.

Acknowledgments The authors would like to thank the anonymous reviewers whose valuable comments and suggestions have helped to considerably improve the quality of the paper.

Appendix A. Supplementary data Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.engappai.2015.08.005.

References Aggrawal, R., et al., 1999a. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 94–105. Aggarwal, C., et al., 1999b. Fast algorithms for projected clustering. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, vol. 28 (2). ACM Press. pp. 61–72. Anil Kumar, T., et al., 2010. Entropy weighting genetic k-means algorithm for subspace clustering. Int. J. Comput. Appl. (7), 975–8887. 〈http://archive.ics.uci.edu/ml/〉. Benaichouche, A.N., Oulhadj, H., Siarry, P., 2013. Improved spatial fuzzy c-means clustering for image segmentation using PSO initialization, Mahalanobis distance and post-segmentation correction. Digit. Signal Process. 23 (5), 1390–1400.


Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, NY. Boulemnadjel, A., Hachouf, F., 2012. An improved algorithm for subspace clustering applied to image segmentation. In: IV 2012, 297–301. Boulemnadjel, A., Hachouf Amel, 2012. A new method for finding clusters embedded in subspaces applied to medical tomography scan image. In: IEEE 3rd International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 383–390. Boulemnadjel, A., Hachouf, F., 2013. Estimating clusters centres using support vector machine: an improved soft subspace clustering algorithm. In: CAIP 2013, pp. 254–261. Busygin, S., et al., 2008. Biclustering in data mining. Comput. Oper. Res. 35, 2964–2987. Damodar, R., Prasanta, K., 2012. A prototype-based modified DBSCAN for gene clustering, Procedia Technol. 6, 485–492. Deng, Z., et al., 2010. Enhanced soft subspace clustering integrating within cluster and between-cluster information. Pattern Recognit. 43, 767–781. Devijver, Pierre A., Kittler, Josef, 1982. Pattern Recognition: A Statistical Approach. Prentice-Hall, London, GB. Domeniconi, C., et al., 2007. Locally adaptive metrics for clustering high dimensional data. Data Min. Knowl. Discov. 14, 63–67. Ester, M., et al., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, pp. 291–316. Finley, T Joachims. 2005. Supervised clustering with support vector machines. In: Machine Learning, Bonn, Germany. Friedman, J.H., Meulman, J.J., 2004. Clustering objects on subsets of attributes. J. R. Stat. Soc. B 66 (4), 815–849. Frnti, P., Virmajoki, O., 2006. Iterative shrinking method for clustering problems. Pattern Recognit. 39 (May (5)), 761–765. Haralick, R., et al., 1973. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3 (6), 610–621. Hathaway, R.J., et al., 2000. Generalized fuzzy c-means clustering strategies using Lp norm distances. IEEE Trans. Fuzzy Syst. 8 (5), 576–582. Hua Ho, C., et al., 2011. Active learning and experimental design with SVMs. JMLR: Workshop Conf. Proc. 16, 71–84 (Workshop on Active Learning and Experimental Design). Hu, Q.H., et al., 2007. Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation. Pattern Recognit. 40, 3509–3521. Jain, P., Kapoor, A., 2009. Active learning for large multi-class problems. In: Vision and Pattern Recognition, CVPR. Jing, L.P., Ng, M.K., Huang, Z.X., et al., 2007. An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data. IEEE Trans. Knowl. Data Eng. 19 (8), 1026–1041.

Langone, R., Alzate, Carlos, et al., 2015. LS-SVM based spectral clustering and regression for predicting maintenance of industrial machines. Eng. Appl. Artif. Intell. (37), 268–278. Liang, B., et al., 2011. A novel attribute weighting algorithm for clustering high dimensional categorical data. Pattern Recognit. 44, 2843–2861. Liu, J., Mohammed, J., Carter, J., et al., 2006. Distance-based clustering of CGH data. Bioinformatics 22 (16), 1971–1978. Madeira, S.C., Oliveira, A.L., 2004. Biclustering algorithms for biological data analysis: a survey. IEEE Trans. Comput. Biol. Bioinform. 1, 24–45. Michele, C.A.M., 2008. Improving fuzzy clustering of biological data by metric learning with side information. Int. J. Approx. Reason. 47, 45–57. Parsons, L., et al., 2004. Evaluating subspace clustering algorithms. In: Workshop on Clustering High Dimensional Data and its Applications. SIAM International Conference on Data Mining, pp. 48–56. Polat, K., Gunes, S., 2009. A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems. Expert Syst. Appl. 36 (2009) 1587–1592. Sangeetha, R., et al., 2011. Identifying efficient kernel function in multiclass support vector machines. Int. J. Comput. Appl. 28, 55. Sembiring, R.W., Zain, J.M., 2010. Cluster evaluation of density based subspace clustering. J. Comput. 2 (November (11)), 2151–9617. Sunita, J., Parag, K., 2009. Intelligent subspace clustering, a density based clustering approach for high dimensional dataset. World Acad. Sci. Eng. Technol. 55, 69–73. Vapnik, V., 1999. An overview of statistical learning theory. IEEE Trans. 477 on Neural Networks 10 (5), 988–999. Wang, Jun, Wang, Shitong, Chung, Fulai, Deng, Zhaohong, 2013. Fuzzy partition based soft subspace clustering and its applications in high dimensional data. Inf. Sci. 246, 133–154. Wu, K.L., Yu, J., Yang, M.S., 2005. A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests. Pattern Recognit. Lett. 26 (5), 639–652. Xia, Hu, Zhuang, Jian, Yu, Dehong, 2013. Novel soft subspace clustering with multiobjective evolutionary approach for high-dimensional data. Pattern Recognit. 46, 2562–2575. Xiaojun, C., et al., 2012. A feature weighting method for subspace clustering of highdimensional data. Pattern Recognit. 45, 434–446. Xisheng, H., Zhe, W., et al., 2012. A simplified multi-class support vector machine with reduced dual optimization. Pattern Recognit. Lett. (33), 71–82. Yi, L., Zheng, Y., 2005. One against – all multi – class SVM classification using reliability measure. In: IEEE International Joint Conference, vol. 2, pp. 849–854. Yu, J., et al., 2004. Analysis of the weighting exponent in the FCM. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 34 (1), 164–176.