Neurocomputing 97 (2012) 125–130


Constructing affinity matrix in spectral clustering based on neighbor propagation

Xin-Ye Li*, Li-jie Guo

Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei 071003, China

* Corresponding author. E-mail address: [email protected] (X.Y. Li).

Article history: Received 22 July 2011; received in revised form 13 March 2012; accepted 15 June 2012; available online 3 July 2012. Communicated by M. Sato-Ilic.

Abstract

The Ng–Jordan–Weiss (NJW) spectral clustering method partitions data using the largest K eigenvectors of the normalized affinity matrix derived from a dataset. When the dataset has a complex structure, however, the affinity matrix constructed by the traditional Gaussian function cannot reflect the real similarity among data points, and then the decision of the clustering number and the selection of the K largest eigenvectors are not always effective. Constructing a good affinity matrix is therefore very important to spectral clustering. A new affinity matrix generation method is proposed by using the neighbor relation propagation principle, and a neighbor relation propagation algorithm is also given. The generated affinity matrix increases the similarity of point pairs that should be in the same cluster and can well detect the structure of the data. An improved multi-way spectral clustering algorithm is then proposed. We have performed experiments on a dataset of complex structure, adopting the method of Tian Xia et al. as a baseline. The experimental results show that our affinity matrix well reflects the real similarity among data points and that selecting the largest K eigenvectors gives the correct partition. We have also made comparisons with the NJW method on some common datasets; the results show that our method is more robust.

Keywords: Pattern recognition; Spectral clustering; Affinity matrix; Neighbor relation propagation

1. Introduction

In recent years, spectral clustering [1–7] has attracted more and more interest due to its high performance in data clustering and its simplicity in implementation. Spectral clustering methods utilize the eigenvectors of the normalized affinity matrix derived from the data to perform data partitioning. Compared with traditional clustering methods, spectral clustering does not need to assume that the data distribution is spherical, so it can recognize non-spherical clusters.

The NJW method [3] is one of the most widely used spectral clustering algorithms. For a K-clustering problem, this method always partitions data using the largest K eigenvectors of the normalized affinity matrix of a dataset. Although the spectral relaxation solution of the normalized cut criterion lies in the subspace spanned by these eigenvectors, it is not guaranteed that the largest K eigenvectors can well detect the structure of the data [1]. To improve the spectral clustering algorithm, [8] uses a neighbor-adaptive scale, which simplifies the selection of parameters and makes the improved algorithm insensitive to both density and outliers. [9] proposed a relevance learning method which measures the relevance of an eigenvector according to how well it can separate the dataset into different clusters.


[10] proposed a novel eigenvector selection method based on entropy ranking for spectral clustering.

According to [3], in the ideal case in which all points in different clusters are infinitely far apart, the entry in the ith row and jth column of the affinity matrix is greater than zero if the ith and jth points are in the same cluster, and zero if they are in different clusters. The K largest eigenvalues of the Laplacian matrix derived from the affinity matrix are all 1, while the (K+1)th largest eigenvalue is far away from 1. Using the K largest eigenvectors to construct the matrix Y (the new space in the NJW method), we get K mutually orthogonal points on the surface of the unit K-sphere around which Y's rows will cluster; moreover, these clusters correspond exactly to the true clustering of the original data. In the general case, if the perturbation of the affinity matrix is small, the gap between the Kth and the (K+1)th largest eigenvalues will be the largest; selecting the largest K eigenvectors to construct Y, the rows of Y will still form tight clusters around K well-separated points on the surface of the K-sphere according to their "true" cluster.

Typically the distribution of a dataset generated by a real-world system is complex and of an unknown shape. The affinity matrix constructed by the Gaussian function does not consider the distribution structure of the dataset and cannot reflect the real similarity among data points; the eigenvalues of the Laplacian matrix derived from the affinity matrix then no longer conform to the above rules, so it is not guaranteed that the largest K eigenvectors can well detect the structure of the data.


In this case, research on selecting parameters or eigenvectors would not be effective. Constructing an effective affinity matrix that represents the dataset's distribution structure is therefore important. [11] proposed a new definition of the affinity graph for spectral clustering from the graph partition perspective, defining the affinity graph so as to respect two consistencies in a regularization framework of ranking on manifolds. The resulting affinity matrix is A = (I − αS)^{-1} Y, where S = D^{-1/2} W D^{-1/2}, W is the initial affinity matrix computed by the Gaussian function, the degree matrix D is a diagonal matrix whose element D_ii is the degree of the point x_i, α ∈ (0, 1), and I is an identity matrix. The proposed definition of the affinity graph is applicable to both unsupervised and semi-supervised spectral clustering. In unsupervised spectral clustering, Y = I, that is, Y is also an identity matrix; then A has the same eigenvectors as S, so we think that in unsupervised spectral clustering the affinity matrix A has not been improved radically. Here, we also propose a construction method of the affinity matrix; we use neighbor propagation to obtain a final affinity matrix that can depict the intrinsic structure of the data.

The rest of the paper is organized as follows. In Section 2, we review the NJW method. In Section 3, we first present some definitions, and then give the construction of the affinity matrix based on neighbor propagation, a neighbor propagation algorithm, and an improved multi-way spectral clustering algorithm. Experimental results on a dataset of complex structure and on several common datasets are given in Section 4, comparing our method with the method in Ref. [11] and with the NJW method, respectively. Finally, some concluding remarks and issues for future work are given in Section 5.

2. Ng–Jordan–Weiss (NJW) spectral clustering algorithm

Spectral clustering methods are widely used graph-based approaches for data clustering. Given a dataset X = {x_1, x_2, ..., x_n} in R^d with K clusters, we can define an n × n affinity matrix A whose element A_ij can be viewed as the weight on the edge connecting the ith and jth data points. The element A_ij is measured by a typical Gaussian function:

A_ij = exp(−d²(x_i, x_j) / σ²) if i ≠ j, and A_ij = 0 if i = j.    (1)

Furthermore, the degree matrix D is a diagonal matrix whose element D_ii = Σ_{j=1}^{n} A_ij is the degree of the point x_i. As a spectral approach to the graph partitioning problem, the NJW method [3] uses the normalized affinity matrix as the Laplacian matrix and solves the optimization of the normalized cut criterion by considering the eigenvectors associated with the largest eigenvalues. The idea of the NJW method is to find a new representation of patterns from the first K eigenvectors of the Laplacian matrix. The details of the NJW method are as follows.

(1) Form the affinity matrix A ∈ R^{n×n} defined by Eq. (1).
(2) Compute the degree matrix D and the normalized affinity matrix L = D^{-1/2} A D^{-1/2}.
(3) Let 1 = λ_1 ≥ λ_2 ≥ ... ≥ λ_K be the K largest eigenvalues of L and v_1, v_2, ..., v_K the corresponding eigenvectors. Form the matrix V = [v_1, v_2, ..., v_K] ∈ R^{n×K}, where each v_i is a column vector.
(4) Form the matrix Y from V by renormalizing each of V's rows to have unit length, i.e. Y_ij = V_ij / (Σ_j V_ij²)^{1/2}.
(5) Treat each row of Y as a point in R^K, and cluster the rows into K clusters via the K-means algorithm to obtain the final clustering of the original dataset.

In step (1), the Gaussian function is widely used to construct the affinity matrix for spectral clustering. But the Gaussian function does not consider the distribution structure of the dataset and cannot reflect the real similarity among data points, especially when a dataset generated by a real-world system is complex and of an unknown shape. In this case, it is not guaranteed that the largest K eigenvectors of L in step (3) can well detect the structure of the data.

3. Defining affinity matrix for spectral clustering through neighbor propagation

3.1. Some definitions

Here, we give several definitions. Given a set of points S = {s_1, ..., s_n} in R^l:

Distance matrix B is an n × n symmetric matrix whose element b_ij, in the ith row and jth column, is the Euclidean distance between points s_i and s_j in S:

b_ij = sqrt((s_i1 − s_j1)² + (s_i2 − s_j2)² + ... + (s_il − s_jl)²)    (2)

Distance threshold ε is defined as

ε = max_{1≤i≤n} ( min_{1≤j≤n, j≠i} b_ij )    (3)

Similarity matrix W is an n × n symmetric matrix whose element w_ij, in the ith row and jth column, is the similarity between points s_i and s_j in S:

w_ij = exp(−b_ij² / (2σ²))    (4)

Neighbor relation R: if b_ij of the distance matrix B is less than the distance threshold ε, then points s_i and s_j in S are called neighbors, denoted (s_i, s_j) ∈ R.

Neighbor relation matrix N is an n × n symmetric matrix whose elements are either zero or one. If points s_i and s_j in S are neighbors, then the element in the ith row and jth column of N is one; otherwise it is zero.

Neighbor propagation principle: if (s_i, s_j) ∈ R and (s_j, s_k) ∈ R, then (s_i, s_k) ∈ R.
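As an illustration of these definitions, the following is a minimal Python sketch (not from the paper; the function and variable names are our own) that computes the distance matrix B of Eq. (2), the distance threshold ε of Eq. (3), the Gaussian similarity matrix W of Eq. (4), and the initial neighbor relation matrix N for a small point set:

import numpy as np

def initial_matrices(S, sigma=1.0):
    """Compute B, epsilon, W and N as defined in Section 3.1.

    S     : (n, l) array of points.
    sigma : Gaussian scale parameter used in Eq. (4).
    """
    n = S.shape[0]
    # Distance matrix B (Eq. (2)): pairwise Euclidean distances.
    diff = S[:, None, :] - S[None, :, :]
    B = np.sqrt((diff ** 2).sum(axis=2))

    # Distance threshold epsilon (Eq. (3)): for each point take the distance
    # to its nearest other point, then take the maximum over all points.
    off_diag = B + np.diag(np.full(n, np.inf))   # mask out b_ii = 0
    eps = off_diag.min(axis=1).max()

    # Similarity matrix W (Eq. (4)).
    W = np.exp(-B ** 2 / (2 * sigma ** 2))

    # Initial neighbor relation matrix N: 1 if b_ij < epsilon, else 0.
    N = (B < eps).astype(int)
    return B, eps, W, N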

3.2. Affinity matrix construction

We construct the affinity matrix according to the following steps:

Step (1). Compute the Euclidean distance between each pair of points in S according to formula (2) to get the distance matrix B; then compute the similarity between each pair of points in S according to formula (4) to get the similarity matrix W.

Step (2). Initialize the neighbor relation matrix N according to the distance matrix B and the distance threshold ε: if b_ij of B is less than ε, then n_ij = 1 and n_ji = 1.

Step (3). Update the neighbor relation matrix N and the similarity matrix W according to the neighbor propagation principle: if n_ij = 1, n_jk = 1 and n_ik = 0, then set n_ik = 1 and n_ki = 1, and simultaneously update w_ik and w_ki to min(w_ij, w_jk).

We also propose a neighbor propagation algorithm here:

Input: n × n distance matrix B, n × n similarity matrix W, n × n initial neighbor relation matrix N
Output: affinity matrix A


Algorithm:
For i = 1 to n
    For j = i+1 to n
        If N[i][j] == 0 then
            For k = 1 to n
                If (k <> i) and (k <> j) then
                    N[i][j] = N[i][k] * N[k][j]
                    If N[i][j] == 1 then
                        N[j][i] = 1
                        W[i][j] = min(W[i][k], W[k][j])
                        W[j][i] = W[i][j]
                        Exit for
                    End if
                End if
            End for
        End if
    End for
End for
MinSim = min(W)
For i = 1 to n
    For j = i+1 to n
        If (N[i][j] == 0) and (W[i][j] > MinSim) then
            W[i][j] = MinSim
            W[j][i] = W[i][j]
        End if
    End for
End for
A = W

It is clear that the algorithm's complexity is O(n³).
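A direct Python transcription of this propagation step might look like the following sketch (our own illustration, not the authors' code; it assumes the matrices produced by the earlier initial_matrices sketch and, like the pseudocode above, performs a single propagation sweep rather than a full transitive closure):

import numpy as np

def neighbor_propagation(B, W, N):
    """Neighbor propagation (Section 3.2), returning the affinity matrix A.

    B : (n, n) distance matrix (unused here, kept to mirror the stated inputs).
    W : (n, n) initial Gaussian similarity matrix.
    N : (n, n) initial 0/1 neighbor relation matrix.
    """
    n = W.shape[0]
    W = W.copy()
    N = N.copy()
    # Pass 1: propagate neighbor relations and update similarities.
    for i in range(n):
        for j in range(i + 1, n):
            if N[i, j] == 0:
                for k in range(n):
                    if k != i and k != j and N[i, k] == 1 and N[k, j] == 1:
                        N[i, j] = N[j, i] = 1
                        W[i, j] = W[j, i] = min(W[i, k], W[k, j])
                        break
    # Pass 2: pairs that are still not neighbors get the smallest similarity in W.
    min_sim = W.min()
    for i in range(n):
        for j in range(i + 1, n):
            if N[i, j] == 0 and W[i, j] > min_sim:
                W[i, j] = W[j, i] = min_sim
    return W  # the affinity matrix A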


Step (4). The output of the neighbor propagation algorithm is the final affinity matrix A.

In our method, the values of the neighbor relation matrix N reflect the neighbor relations of all point pairs, and the final affinity matrix reflects the real similarity of all point pairs in a complex distribution structure. Especially for point pairs with a small initial similarity computed by the Gaussian function but lying in the same distribution structure, after the neighbor propagation their similarity will be large enough to reflect their neighbor relation.

3.3. An improved spectral clustering algorithm

Step (1). Construct the affinity matrix A according to Section 3.2.
Step (2). Construct the Laplacian matrix L = D^{-1/2} A D^{-1/2}.
Step (3). Find the K largest eigenvectors of L and form the matrix V.
Step (4). Form the matrix Y from V by renormalizing each of V's rows to have unit length; the element y_ij of Y is defined as y_ij = V_ij / (Σ_{j=1}^{K} V_ij²)^{1/2}.
Step (5). Treat each row of Y as a point in R^K and cluster the rows into K clusters via K-means; assign the original point s_i to cluster j if and only if row i of the matrix Y was assigned to cluster j.
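Putting the pieces together, a possible end-to-end sketch of this improved algorithm in Python is shown below. It is our own illustration under stated assumptions: it reuses the initial_matrices and neighbor_propagation helpers from the earlier sketches (names of our choosing), relies on SciPy and scikit-learn for the eigendecomposition and K-means steps, and treats sigma as a free parameter.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def improved_spectral_clustering(S, K, sigma=1.0):
    """Sketch of the improved multi-way spectral clustering of Section 3.3."""
    # Sections 3.1/3.2: build the propagated affinity matrix A.
    B, eps, W, N = initial_matrices(S, sigma)      # defined in the earlier sketch
    A = neighbor_propagation(B, W, N)              # defined in the earlier sketch

    # Step (2): normalized Laplacian L = D^{-1/2} A D^{-1/2}.
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ A @ D_inv_sqrt

    # Step (3): the K largest eigenvectors of L (eigh returns ascending order).
    vals, vecs = eigh(L)
    V = vecs[:, -K:]

    # Step (4): renormalize each row of V to unit length to obtain Y.
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)

    # Step (5): cluster the rows of Y with K-means.
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(Y)
    return labels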

4. Experiment results and analysis

First, consider the synthetic dataset shown in Fig. 1. According to our definition of the neighbor relation, we have (c,f) ∈ R, (f,g) ∈ R, (g,h) ∈ R, (h,d) ∈ R and so on, but (c,d) ∉ R and (c,e) ∉ R. According to our neighbor propagation principle, we obtain the following inferences:

(c,f) ∈ R and (f,g) ∈ R ⇒ (c,g) ∈ R;
(c,g) ∈ R and (g,h) ∈ R ⇒ (c,h) ∈ R;
(c,h) ∈ R and (h,d) ∈ R ⇒ (c,d) ∈ R.

Simultaneously, we update the following similarity values:

Fig. 1. A synthetic dataset with complex structure.

W(c,g) = min{W(c,f), W(f,g)}
W(c,h) = min{W(c,g), W(g,h)}
W(c,d) = min{W(c,h), W(h,d)}

Fig. 2. The first 7 eigenvalues gained by (a) using our method for σ = 1, 1.5, 2, 3, 4, 5 and (b) using the method in Ref. [11] for σ = 1, 1.5, 2, 3, ..., 10.



Fig. 3. Three common datasets.


Fig. 4. The first 7 eigenvalues gained by our method for σ = 0.1, 0.2, 0.3, 0.4, 0.5, 1.2, 1.3, 1.5, 1.8, and 1.9 on three common datasets: (a) on the dataset in Fig. 3(a), (b) on the dataset in Fig. 3(b) and (c) on the dataset in Fig. 3(c).


It is clear that W(c,d) is larger than before. In the same way, we obtain (c,e) ∈ R and increase the value of W(c,e). In Fig. 1, since the distance between points a and c is larger than ε, they are not neighbors, so their initial neighbor relation value is 0; and since there are no common points in a's neighborhood and c's neighborhood, after the neighbor propagation the neighbor relation value is still 0. When the neighbor propagation process is over, we set the similarity of those non-neighbor points to the smallest value in W. From the above we can see that our method can increase the similarity of point pairs belonging to the same cluster and decrease the similarity of point pairs belonging to different clusters, so as to improve the affinity matrix.

Then we make a comparison between our method and the method in Ref. [11]. Fig. 2 shows the Laplacian matrix's eigenvalues gained by using our method and by using the method in Ref. [11].
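For reference, the baseline affinity of Ref. [11] used in this comparison can be sketched in Python as follows (our own illustration of the formula A = (I − αS)^{-1} Y with Y = I in the unsupervised case; the function name and the value of alpha below are assumptions, not taken from the paper):

import numpy as np

def ranking_affinity(W, alpha=0.5):
    """Baseline of Ref. [11]: A = (I - alpha*S)^{-1}, with S = D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    I = np.eye(W.shape[0])
    return np.linalg.inv(I - alpha * S)   # Y = I in unsupervised spectral clustering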


Fig. 5. The first 7 eigenvalues gained by the NJW method for σ = 0.1, 0.2, 0.3, 0.4, 0.5, 1.2, 1.3, 1.5, 1.8, and 1.9 (from the top down).


Fig. 6. Clustering results of the NJW method for different σ: (a) σ = 0.1; (b) σ = 0.2; (c) σ = 0.3; (d) σ = 0.4; (e) σ = 0.5; (f) σ = 1.2; (g) σ = 1.3–1.8 and (h) σ = 1.9.


In Fig. 2, the horizontal axis is the index of the eigenvalue and the vertical axis is the eigenvalue itself; we show the first 7 eigenvalues for both methods. Fig. 2(a) gives the Laplacian matrix's eigenvalues gained by our method for the scale parameter σ = 1, 1.5, 2, 3, 4 and 5. When σ < 4, the first three eigenvalues equal or nearly equal 1 while the fourth eigenvalue is much smaller than 1, so the eigengap between the 3rd and the 4th eigenvalues is very large; when σ > 4, the eigengap between the 3rd and the 4th eigenvalues is still the largest one. So we can decide that the clustering number is 3 and select the first 3 eigenvectors to form the new space to cluster, and then we get the correct 3 clusters shown in Fig. 1. Fig. 2(b) gives the Laplacian matrix's eigenvalues gained by the method in Ref. [11] for σ = 1, 1.5, 2, 3, ..., 10; it is clear that it is hard to decide the clustering number and to select the correct eigenvectors from it. The reason is that in [11] the improved affinity matrix is A = (I − αS)^{-1} Y; for unsupervised spectral clustering Y = I, so A has the same eigenvectors as S and has not been improved radically.

To verify our method's effectiveness, we also make a comparison with the NJW method on the common datasets shown in Fig. 3. Using our method on these datasets, we get the first 7 eigenvalues shown in Fig. 4. In Fig. 4, the curves from the top down are all for σ = 0.1, 0.2, 0.3, 0.4, 0.5, 1.2, 1.3, 1.5, 1.8, and 1.9, respectively; when σ = 0.1, 0.2, 0.3, 0.4, 0.5, the first 2 eigenvalues are all 1, so the curves for these cases almost coincide. In Fig. 4(b) and (c), except for σ = 1.8 and 1.9, the eigengap between the second and the third largest eigenvalues is always the largest one; in Fig. 4(a), the same holds except for σ = 1.5, 1.8, and 1.9. We select the 2 largest eigenvectors to construct the matrix Y and get the correct clustering result in all cases.

The NJW method, in contrast, is not always effective. Taking dataset (a) in Fig. 3 as an example, we change the scale parameter σ from 0.1 to 1.9 and get the first 7 eigenvalues shown in Fig. 5, where the curves from the top down are for σ = 0.1, 0.2, 0.3, 0.4, 0.5, 1.2, 1.3, 1.5, 1.8, and 1.9, respectively. According to the NJW method, the K largest eigenvalues should nearly approach 1 and the (K+1)th eigenvalue should be far away from 1, or the eigengap between the Kth and the (K+1)th eigenvalues should be the largest; but no case in Fig. 5 conforms to that. If we select the 2 largest eigenvectors to cluster, we get the results shown in Fig. 6. In Fig. 6, for the top six cases in Fig. 5, that is σ = 0.1, 0.2, 0.3, 0.4, 0.5, and 1.2, the partitions are wrong; for 1.2 < σ < 1.9, that is σ = 1.3, 1.5, and 1.8 in Fig. 5, the partitions are correct; but when σ > 1.8, the partitions are wrong again. It is clear that our method is more robust than the NJW method, and its result is hardly influenced by the scale parameter.
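The decision rule used throughout this analysis, picking the number of clusters at the largest eigengap, can be sketched as follows in Python (a minimal illustration of the rule described above, not code from the paper; the function name and the max_k parameter are our own):

import numpy as np

def estimate_cluster_number(eigvals, max_k=7):
    """Pick K where the gap lambda_K - lambda_{K+1} is largest.

    eigvals : eigenvalues of the normalized Laplacian.
    max_k   : only the first max_k eigenvalues are considered (7 in the figures).
    """
    lam = np.sort(np.asarray(eigvals))[::-1][:max_k]   # descending order
    gaps = lam[:-1] - lam[1:]            # gap between the k-th and (k+1)-th eigenvalues
    return int(np.argmax(gaps)) + 1      # +1 because Python indexing starts at 0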

5. Conclusions

In this paper, we present an affinity matrix construction method based on neighbor propagation for spectral clustering. The experiment on a dataset of complex structure shows that our affinity matrix well reflects the real similarity among data points and that selecting the largest K eigenvectors gives the correct partition. We also make a comparison with the NJW method on some common datasets; the results show that our method is more robust.

Future work includes more experiments on different real datasets. In particular, when the dataset contains outliers or noise points, the choice of the distance threshold ε needs to be verified and studied further. We will pursue these research directions in future work.

Acknowledgment

This work was mainly supported by the Fundamental Research Funds for the Central Universities (09QG08).

References

[1] F.R. Bach, M.I. Jordan, Blind one-microphone speech separation: a spectral learning approach, in: Proceedings of the 2004 Neural Information Processing Systems, 2004, pp. 65–72.
[2] M. Meila, J. Shi, A random walks view of spectral segmentation, in: Proceedings of the Eighth International Conference on AI and Statistics, 2001, pp. 873–879.
[3] A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, in: Proceedings of the 2001 Neural Information Processing Systems, 2001, pp. 849–856.
[4] H. Chang, D.Y. Yeung, Robust path-based spectral clustering, Pattern Recognition 41 (1) (2008) 191–203.
[5] A. Azran, Z. Ghahramani, Spectral methods for automatic multiscale data clustering, in: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 190–197.
[6] Guimin Qin, Lin Gao, Spectral clustering for detecting protein complexes in protein–protein interaction (PPI) networks, Math. Comput. Model. 52 (11–12) (2010) 2066–2074.
[7] Wenyuan Li, Wee-Keong Ng, Ying Liu, Kok-Leong Ong, Enhancing the effectiveness of clustering with spectra analysis, IEEE Trans. Knowl. Data Eng. 19 (7) (2007) 887–902.
[8] Jiacai Wang, Ruijun Gu, An improved spectral clustering algorithm based on neighbor adaptive scale, in: Proceedings of the 2009 International Conference on Business Intelligence and Financial Engineering, 2009, pp. 233–236.
[9] Tao Xiang, Shaogang Gong, Spectral clustering with eigenvector selection, Pattern Recognition 41 (3) (2008) 1012–1029.
[10] Feng Zhao, Licheng Jiao, Hanqiang Liu, Xinbo Gao, Maoguo Gong, Spectral clustering with eigenvector selection based on entropy ranking, Neurocomputing 73 (10–12) (2010) 1704–1717.
[11] Tian Xia, Juan Cao, Yong-dong Zhang, Jin-tao Li, On defining affinity graph for spectral clustering through ranking on manifolds, Neurocomputing 72 (13–15) (2009) 3203–3211.

Xinye Li received the M.Sc. degree from North China Electric Power University in 1996 and the Ph.D. degree from North China Electric Power University in 2009. Her research interests include pattern recognition, intelligent information processing and information retrieval.

Lijie Guo is currently an M.Sc. student at North China Electric Power University. Her current research interests include pattern recognition and XML information retrieval.