Pattern Recognition Letters 28 (2007) 1981–1986 www.elsevier.com/locate/patrec
Inappropriateness of the criterion of k-way normalized cuts for deciding the number of clusters

Ayumu Nagai *

Department of Computer Science, Gunma University, 1-5-1 Tenjin-cho, Kiryu, Gunma 376-8515, Japan

Received 16 February 2006; received in revised form 20 April 2007; available online 12 June 2007

Communicated by R.P.W. Duin

* Tel.: +81 277 30 1809; fax: +81 277 30 1801. E-mail address: [email protected]

doi:10.1016/j.patrec.2007.05.020
Abstract

Spectral clustering differs from other existing clustering algorithms in that it relies on a linear-algebraic approach, including spectral decomposition. Normalized Cuts is a representative algorithm of spectral clustering. It incorporates a criterion for deciding the number $k$ of clusters to partition. This paper shows that the criterion is not appropriate for deciding $k$. We show this by proving that the optimal bipartition (that is, when $k = 2$) becomes the optimal clustering; namely, under the criterion, the evaluation becomes better when $k$ is small. We also show that the criterion is inappropriate for comparing approximate solutions with various $k$. In particular, we prove that a bipartition which surpasses the best given approximate solution $\hat{H}_{\hat{k}}$ can be constructed from $\hat{H}_{\hat{k}}$ within time complexity at most $O(\hat{k}^3)$, where $\hat{k}$ is the number of clusters contained in $\hat{H}_{\hat{k}}$. For these two reasons, the Normalized Cuts criterion is not appropriate for deciding $k$. An alternative criterion is necessary.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Clustering; Spectral clustering; Number of clusters; Cluster validation
1. Introduction

Clustering algorithms, which are used for unsupervised classification, classify a given dataset into groups called clusters, intending to recognize sets of similar data as clusters. There are two major goals of clustering algorithms: (1) to find a proper partition and (2) to decide the number of clusters. Since a "proper" partition is difficult to define, much effort has been invested in the first goal. Whether directly or indirectly, some of these studies also deal with the second goal. In this paper, we mainly discuss the second goal, that is, the decision of the number of clusters.

Among various algorithms, we focus on an algorithm called spectral clustering (Dhillon, 2001; Ding et al., 2001;
Kannan et al., 2004; Ding and He, 2004). Spectral clustering is quite different from other algorithms in the sense that it uses spectral decomposition in order to obtain an approximate solution. The methods for spectral clustering are classified into several types depending on the objective function used. Among them, we focus on a representative method called Normalized Cuts (Shi and Malik, 2000). Normalized Cuts is mostly used for image segmentation (Malik et al., 2001; Yu and Shi, 2003; Cour et al., 2005). Besides, there are applications to biological data (Pentney and Meilă, 2005; Dhillon et al., 2004).

The idea of Normalized Cuts (Shi and Malik, 2000) is as follows. The problem is similar to the min-cut problem in the field of graph theory, since a cut of a graph is similar to a classification of the dataset. However, the minimal cut tends to be a cut that separates an outlier from the data. Such a cut is meaningless in actual cases, because the sizes of the clusters are exceedingly one-sided. Therefore, Shi and Malik
proposed a criterion which gives a good evaluation when the sizes of the clusters are balanced. This is the idea of Normalized Cuts.

When we regard every individual input datum as a node, and the similarity between two data as the weight on the edge between the corresponding nodes, we can consider an input of the problem as a weighted graph. Therefore, we assume that a weighted graph $G = (V, E, W)$ is given, where $V$ is the set of all nodes (corresponding to the input data), $E$ is the set of edges connecting the nodes, and $W$ is a similarity matrix which denotes the similarity between any two input data. The matrix $W$ is assumed to be a nonnegative, symmetric $n \times n$ matrix. Clustering the $n$ input data into $k$ clusters means grouping $V$ into $k$ disjoint sets, denoted $C_V^k \stackrel{\mathrm{def}}{=} \{V_1, V_2, \ldots, V_k\}$, which satisfy $\bigcup_{a=1}^{k} V_a = V$ and $V_a \cap V_b = \emptyset$ for all $a \neq b$. $C_V^k$ corresponds to a clustering. We simply define $V$, the set of $n$ input data, as $\{1, 2, \ldots, n\}$ in this paper.

Let $A, B \subseteq V$. We define links between two sets $A$ and $B$ by

$$\mathrm{links}(A, B) \stackrel{\mathrm{def}}{=} \sum_{i \in A} \sum_{j \in B} (W)_{ij} \qquad (1)$$

The quantity links is the same as cut, except that links from $A$ to $A$ itself is also defined. We define the degree of a set as the total links to all the nodes. That is,

$$\mathrm{degree}(A) \stackrel{\mathrm{def}}{=} \mathrm{links}(A, V) \qquad (2)$$

The linkratio between two sets is defined as follows; the degree is used for normalization:

$$\mathrm{linkratio}(A, B) \stackrel{\mathrm{def}}{=} \frac{\mathrm{links}(A, B)}{\mathrm{degree}(A)} \qquad (3)$$
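As a concrete illustration of Eqs. (1)–(3), the following minimal NumPy sketch (ours, not part of the original paper; the function names are our own) computes links, degree, and linkratio directly from a similarity matrix $W$:

```python
import numpy as np

def links(W, A, B):
    """Eq. (1): total similarity between node sets A and B."""
    return W[np.ix_(A, B)].sum()

def degree(W, A):
    """Eq. (2): links from A to the whole node set V."""
    return links(W, A, range(W.shape[0]))

def linkratio(W, A, B):
    """Eq. (3): links(A, B) normalized by degree(A)."""
    return links(W, A, B) / degree(W, A)

# Toy similarity matrix: two obvious clusters {0, 1} and {2, 3}
# joined by a weak "bridge" of weight 0.1.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])

A, B = [0, 1], [2, 3]
print(links(W, A, B))      # 0.2: only the two bridge edges
print(linkratio(W, A, B))  # ~0.09: A keeps most of its links inside itself
```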
The linkratio between $A$ and $B$ is the proportion of the links to $B$ among the total links that $A$ has. Then, the formulation of Normalized Cuts is to minimize the following criterion:

$$F_k(C_V^k) \stackrel{\mathrm{def}}{=} \frac{1}{k} \sum_{a=1}^{k} \mathrm{linkratio}(V_a, V \setminus V_a) \qquad (4)$$

$\mathrm{linkratio}(V_a, V \setminus V_a)$ is the proportion of the links exiting a cluster $V_a$ among the total links in $V_a$. $F_k$, the criterion of Normalized Cuts, is the average linkratio over all clusters.

So far, we have assumed that $k$, i.e., the number of clusters, is given. In this case, since $k$ is fixed, only the partition $C_V^k$ has to be determined under the criterion $F_k$. However, depending on the problem, it is often the case that $k$ is not given. In this case, $k$ has to be determined as well as a partition $C_V^k$. This is a more difficult problem than one in which $k$ is given. As a matter of fact, the criterion $F_k$ is a representative criterion of Normalized Cuts for deciding $k$ in problems where $k$ is not given (Yu and Shi, 2003; Meilă and Xu, 2003; Dhillon et al., 2004; Yu and Shi, 2004).

In this paper, we show that the criterion $F_k$ is not appropriate for deciding $k$. In concrete terms, we prove that the optimal bipartition (i.e., $k = 2$) becomes the optimal (w.r.t. $k$) clustering based on the criterion $F_k$. $\bar{F}_k$, which denotes the evaluation of the optimal $k$-way partition, is a monotonically increasing function of $k$. In other words, $F_k$ is biased toward small $k$. Therefore, we cannot use the criterion $F_k$ in order to decide $k$.

2. Normalized Cuts Criterion

An objective function of Normalized Cuts is usually expressed in the form of linear algebra. We denote the partition $C_V^k$ by an $n \times k$ "partition matrix" $H$. Let $[h_1, h_2, \ldots, h_k] \stackrel{\mathrm{def}}{=} H$; that is, $h_a$ is a binary indicator vector for a cluster $V_a$. The $i$th element of $h_a$ is 1 if and only if the $i$th datum is a member of cluster $V_a$ (otherwise it is 0). Note that $\sum_{a=1}^{k} h_a = 1_n$, since each datum is a member of a single cluster. In other words,

$$(H)_{ia} \stackrel{\mathrm{def}}{=} \begin{cases} 1 & (i \in V_a) \\ 0 & (i \notin V_a) \end{cases} \quad (1 \le i \le n,\ 1 \le a \le k) \qquad (5)$$

We define an $n \times n$ degree matrix $D$ as follows:

$$(D)_{ij} \stackrel{\mathrm{def}}{=} \begin{cases} \sum_{l=1}^{n} (W)_{il} & (i = j) \\ 0 & (i \neq j) \end{cases} \quad (1 \le i, j \le n) \qquad (6)$$

$D$ is a diagonal matrix. Note that $W 1_n = D 1_n$. Then, links and degree can be denoted as follows:

$$\mathrm{links}(V_a, V_b) = h_a^T W h_b = h_b^T W h_a \qquad (7)$$

$$\mathrm{degree}(V_a) = h_a^T D h_a \qquad (8)$$

Then, the criterion $F_k$ of Normalized Cuts can be denoted as Eq. (9):

$$F_k \stackrel{\mathrm{def}}{=} \frac{1}{k} \sum_{a=1}^{k} \mathrm{linkratio}(V_a, V \setminus V_a) = \frac{1}{k} \sum_{a=1}^{k} \big( \mathrm{linkratio}(V_a, V) - \mathrm{linkratio}(V_a, V_a) \big) = 1 - \frac{1}{k} \sum_{a=1}^{k} \frac{h_a^T W h_a}{h_a^T D h_a} \qquad (9)$$
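To make the matrix form concrete, here is a short sketch (again ours, for illustration only) that evaluates $F_k$ through the indicator-vector expression of Eq. (9); by construction it agrees with averaging the linkratios of Eq. (4):

```python
import numpy as np

def ncut_criterion(W, labels):
    """F_k of Eq. (9): 1 - (1/k) * sum_a (h_a^T W h_a) / (h_a^T D h_a)."""
    D = np.diag(W.sum(axis=1))           # degree matrix, Eq. (6)
    clusters = np.unique(labels)
    total = 0.0
    for a in clusters:
        h = (labels == a).astype(float)  # indicator vector h_a, Eq. (5)
        total += (h @ W @ h) / (h @ D @ h)
    return 1.0 - total / len(clusters)

# Same toy W as before: the natural bipartition scores well (low F_2).
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(ncut_criterion(W, labels))  # ~0.09: both clusters keep most links inside
```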
Normalized Cuts results in the minimization problem expressed by Eq. (9). $F_k(H)$ explicitly denotes the evaluation of a partition $H$ by the criterion $F_k$. $H_k$ explicitly denotes a $k$-way partition $H$. When $k$ is given, the criterion for Normalized Cuts is defined as follows:

$$\min_{H_k} \ F_k(H_k) \quad \text{s.t.} \quad H_k \in \{0, 1\}^{n \times k}, \quad \sum_{a=1}^{k} h_a = 1_n \qquad (10)$$
However, when $k$ is not given, $k$ has to be determined as well as a partition $H$. In that case, the problem is defined as (Yu and Shi, 2003, 2004; Meilă and Xu, 2003; Dhillon et al., 2004)

$$\min_{k} \min_{H_k} \ F_k(H_k) \quad \text{s.t.} \quad H_k \in \{0, 1\}^{n \times k}, \quad \sum_{a=1}^{k} h_a = 1_n \qquad (11)$$
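Eq. (11) can be probed by exhaustive search on a tiny dataset. The following brute-force sketch (ours, for illustration; it reuses `ncut_criterion` from the previous sketch) enumerates all partitions with exactly $k$ nonempty clusters and reports $\bar{F}_k = \min_{H_k} F_k(H_k)$ for each $k$; consistent with Theorem 3.1 below, the minimum over $k$ is attained at $k = 2$:

```python
from itertools import product
import numpy as np
# Reuses ncut_criterion() from the previous sketch.

def best_Fk(W, k):
    """Brute-force computation of min_{H_k} F_k(H_k).

    Feasible only for very small n, since there are k^n assignments."""
    n = W.shape[0]
    best = np.inf
    for assign in product(range(k), repeat=n):
        labels = np.array(assign)
        if len(np.unique(labels)) != k:  # require exactly k nonempty clusters
            continue
        best = min(best, ncut_criterion(W, labels))
    return best

W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
for k in range(2, 5):
    print(k, best_Fk(W, k))  # the optimum increases with k: k = 2 wins
```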
3. Theorem and its proof

What we want to prove in this paper is Theorem 3.1.

Theorem 3.1. For any given $k \ge 2$, $\min_k \min_{H_k} F_k(H_k) = \min_{H_2} F_2(H_2)$.

The claim of Theorem 3.1 is as follows. It is when $k = 2$ that the criterion $F_k$ is minimized. There may be another minimal solution when $k > 2$, but at least there exists an optimal solution when $k = 2$.

The strategy of our proof is as follows. First, we prove a proposition that, for any $k$, the optimal $k$-way partition surpasses (or is not worse than) the optimal $(k+1)$-way partition. Here, we show that, by merging some pair of clusters in the optimal $(k+1)$-way partition, we can construct a new $k$-way partition whose evaluation excels that of the original optimal $(k+1)$-way partition. Since this new $k$-way partition is obviously not better than the optimal $k$-way partition, the proposition holds. By applying this proposition iteratively, Theorem 3.1 is derived. That is, $F_k$ is optimal (w.r.t. $k$) when $k = 2$ (the minimum partition).

For the actual proof, we use another criterion $G_k$ instead of $F_k$. The criterion $G_k$ is defined as follows:

$$G_k(H_k) \stackrel{\mathrm{def}}{=} \frac{k}{k-1} F_k(H_k) \qquad (12)$$

$\bar{G}_k$ is defined as the evaluation of an optimal $k$-way partition under the criterion $G_k$. That is,

$$\bar{G}_k \stackrel{\mathrm{def}}{=} \min_{H_k} G_k(H_k) \qquad (13)$$

$\bar{F}_k$ is defined in a similar way, as $\bar{F}_k \stackrel{\mathrm{def}}{=} \min_{H_k} F_k(H_k)$. The optimal $k$-way partition $H_k^*$ is defined as follows:

$$H_k^* \stackrel{\mathrm{def}}{=} \arg\min_{H_k} F_k(H_k) \qquad (14)$$

Since $G_k(H_k)$ equals a constant times $F_k(H_k)$ for any $H_k$,

$$H_k^* = \arg\min_{H_k} G_k(H_k) \qquad (15)$$

Let $[h_1^*, h_2^*, \ldots, h_k^*] \stackrel{\mathrm{def}}{=} H_k^*$. Note that $\bar{G}_k = G_k(H_k^*)$ and $\bar{F}_k = F_k(H_k^*)$.

Lemma 3.2. For any given $k \ge 2$, $\bar{G}_{k+1} \ge \bar{G}_k$.

Lemma 3.2 claims that the optimal $k$-way partition surpasses (or is equal to) the optimal $(k+1)$-way partition under the criterion $G$.

Proof of Lemma 3.2. For any given $k$, suppose that the optimal $(k+1)$-way partition $H_{k+1}^*$, whose evaluation $\bar{G}_{k+1}$ is given by Eq. (16), is given:

$$\bar{G}_{k+1} \stackrel{\mathrm{def}}{=} \frac{k+1}{k} \bar{F}_{k+1} = \frac{1}{k} \left( k + 1 - \sum_{i=1}^{k+1} \frac{W_i}{D_i} \right) \qquad (16)$$

where $W_i \stackrel{\mathrm{def}}{=} h_i^T W h_i$, $D_i \stackrel{\mathrm{def}}{=} h_i^T D h_i$ ($1 \le i \le k+1$), and $H_{k+1}^* = [h_1, h_2, \ldots, h_{k+1}]$ is the optimal $(k+1)$-way partition (we drop the asterisks on the $h_i$ for readability). On the other hand, the evaluation of a $k$-way partition is given by Eq. (17):

$$G'_k \stackrel{\mathrm{def}}{=} \frac{k}{k-1} F'_k = \frac{1}{k-1} \left( k - \sum_{i=1}^{k} \frac{W'_i}{D'_i} \right) \qquad (17)$$

where $W'_i \stackrel{\mathrm{def}}{=} h_i'^T W h'_i$ and $D'_i \stackrel{\mathrm{def}}{=} h_i'^T D h'_i$ ($1 \le i \le k$). We consider a $k$-way partition which is obtained by merging two clusters $V_a$ and $V_b$ ($a < b$) selected from the $k+1$ clusters (see Fig. 1). That is, we only consider $h'_i$ defined as follows:

$$h'_i \stackrel{\mathrm{def}}{=} \begin{cases} h_a + h_b & (i = 1) \\ h_{i-1} & (2 \le i \le a) \\ h_i & (a < i < b) \\ h_{i+1} & (b \le i \le k) \end{cases} \qquad (18)$$

By Eq. (17),

$$G'_k = \frac{1}{k-1} \left( k - \left( \sum_{i=1}^{k+1} \frac{W_i}{D_i} - \frac{W_a}{D_a} - \frac{W_b}{D_b} + \frac{W'_1}{D'_1} \right) \right) \qquad (19)$$

By Eq. (16) and Eq. (19),

$$\bar{G}_{k+1} \ge G'_k \iff (k-1)\left( k + 1 - \sum_{i=1}^{k+1} \frac{W_i}{D_i} \right) \ge k \left( k - \left( \sum_{i=1}^{k+1} \frac{W_i}{D_i} - \frac{W_a}{D_a} - \frac{W_b}{D_b} + \frac{W'_1}{D'_1} \right) \right) \iff \sum_{i=1}^{k+1} \frac{W_i}{D_i} - 1 - k \left( \frac{W_a}{D_a} + \frac{W_b}{D_b} - \frac{W'_1}{D'_1} \right) \ge 0 \qquad (20)$$
Fig. 1. An example of a $(k+1)$-way partition when $k = 3$. A $k$-way partition is obtained by merging two clusters $V_a$ and $V_b$ ($a < b$).
We are going to show that there is a pair $a$ and $b$ ($1 \le a < b \le k+1$) which satisfies Eq. (20). To show this by contradiction, we first assume to the contrary that Eq. (20) is not satisfied for any pair $a$ and $b$. Then

$$\forall a, \forall b\ (a \neq b):\ (\text{left-hand side of Eq. (20)}) < 0$$

$$\iff \forall a, \forall b\ (a \neq b):\ (\text{left-hand side of Eq. (20)}) \cdot (D_a + D_b) < 0$$

$$\implies \sum_{1 \le a < b \le k+1} \big[ (\text{left-hand side of Eq. (20)}) \cdot (D_a + D_b) \big] < 0 \qquad (21)$$

We then show that the left-hand side of Eq. (21) is equal to 0, which results in a contradiction. To that end, we use Eq. (22) and Eq. (23):

$$\sum_{1 \le a < b \le k+1} \left( \frac{W_a}{D_a} + \frac{W_b}{D_b} \right)(D_a + D_b) = k \sum_{a=1}^{k+1} W_a + \sum_{a=1}^{k+1} \frac{W_a}{D_a}(D - D_a) = (k-1) \sum_{i=1}^{k+1} W_i + D \sum_{i=1}^{k+1} \frac{W_i}{D_i} \qquad (22)$$

where $D \stackrel{\mathrm{def}}{=} \sum_{i=1}^{k+1} D_i = 1_n^T D 1_n = 1_n^T W 1_n$, and

$$\sum_{1 \le a < b \le k+1} W'_1 = \sum_{1 \le a < b \le k+1} (h_a + h_b)^T W (h_a + h_b) = \sum_{1 \le a < b \le k+1} (W_a + W_b) + \sum_{1 \le a < b \le k+1} 2 h_a^T W h_b = k \sum_{i=1}^{k+1} W_i + \sum_{1 \le a < b \le k+1} 2 h_a^T W h_b \qquad (23)$$

Since $D$ is diagonal and $h_a^T D h_b = 0$ for $a \neq b$,

$$D'_1 = (h_a + h_b)^T D (h_a + h_b) = h_a^T D h_a + 2 h_a^T D h_b + h_b^T D h_b = D_a + D_b \qquad (24)$$

Noting also that $\sum_{1 \le a < b \le k+1} (D_a + D_b) = kD$, we have the following fact due to Eq. (22), Eq. (23), and Eq. (24):

$$(\text{left-hand side of Eq. (21)}) = \sum_{1 \le a < b \le k+1} \left[ \left( \sum_{i=1}^{k+1} \frac{W_i}{D_i} - 1 \right)(D_a + D_b) - k \left( \frac{W_a}{D_a} + \frac{W_b}{D_b} \right)(D_a + D_b) + k W'_1 \right]$$

$$= kD \sum_{i=1}^{k+1} \frac{W_i}{D_i} - kD - k(k-1) \sum_{i=1}^{k+1} W_i - kD \sum_{i=1}^{k+1} \frac{W_i}{D_i} + k^2 \sum_{i=1}^{k+1} W_i + k \sum_{1 \le a < b \le k+1} 2 h_a^T W h_b$$

$$= -kD + k \left( \sum_{1 \le a = b \le k+1} h_a^T W h_b + \sum_{1 \le a < b \le k+1} 2 h_a^T W h_b \right) = -kD + k \sum_{a=1}^{k+1} \sum_{b=1}^{k+1} h_a^T W h_b = -kD + k\, 1_n^T W 1_n = -kD + kD = 0 \qquad (25)$$

Since the left-hand side of Eq. (21) is equal to 0, Eq. (21) is a contradiction. Therefore, there always exists a pair of clusters $V_a$ and $V_b$ which satisfies Eq. (20). Thus, $\bar{G}_{k+1} \ge G'_k$ holds, where $G'_k$ is the evaluation of the new $k$-way partition constructed by merging the two clusters. On the other hand, $G'_k \ge \bar{G}_k$ obviously holds; that is, this new $k$-way partition is not better than the optimal $k$-way partition. Hence, $\bar{G}_{k+1} \ge G'_k \ge \bar{G}_k$. □

Lemma 3.3. For any given $k \ge 2$, $\bar{F}_{k+1} \ge \bar{F}_k$.

Proof of Lemma 3.3. It follows easily from Lemma 3.2:

$$\bar{F}_{k+1} - \bar{F}_k = \frac{k}{k+1} \bar{G}_{k+1} - \frac{k-1}{k} \bar{G}_k = \frac{k}{k+1} \bar{G}_{k+1} - \frac{k^2-1}{k^2} \cdot \frac{k}{k+1} \bar{G}_k \ge \frac{k}{k+1} \left( \bar{G}_{k+1} - \bar{G}_k \right) \ge 0 \quad (\text{by Lemma 3.2})$$

where the first inequality holds since $(k^2-1)/k^2 \le 1$ and $\bar{G}_k \ge 0$. □

The proof of Theorem 3.1, the goal of this paper, is as follows.

Proof of Theorem 3.1. By applying Lemma 3.3 iteratively, we have $\bar{F}_k \ge \bar{F}_{k-1} \ge \cdots \ge \bar{F}_3 \ge \bar{F}_2$. Theorem 3.1 is proved, since

$$\min_k \min_{H_k} F_k(H_k) = \min_{H_2} F_2(H_2) \iff \min_k \bar{F}_k = \bar{F}_2 \qquad \square$$
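The constructive step of Lemma 3.2, namely that merging some pair of clusters never worsens the criterion $G$, can be checked numerically. The averaging argument in the proof never uses the optimality of the starting partition, so the assertion below also holds for an arbitrary $(k+1)$-way partition (this is exactly the generalization used as Lemma 3.3' in the next section). The sketch (ours; it reuses `ncut_criterion` from the earlier sketch) scans all pairs $(a, b)$, evaluates each merged partition via Eq. (12), and confirms the inequality:

```python
import numpy as np
# Reuses ncut_criterion() from the earlier sketch.

def G(W, labels):
    """G_k of Eq. (12): (k / (k - 1)) * F_k."""
    k = len(np.unique(labels))
    return k / (k - 1) * ncut_criterion(W, labels)

def best_merge(W, labels):
    """The step of Lemma 3.2: try every pair merge, keep the best G'_k."""
    clusters = np.unique(labels)
    best_G, best_labels = np.inf, None
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            merged = np.where(labels == b, a, labels)  # merge V_b into V_a
            g = G(W, merged)
            if g < best_G:
                best_G, best_labels = g, merged
    return best_G, best_labels

W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
labels3 = np.array([0, 0, 1, 2])            # some 3-way partition
g_merged, merged = best_merge(W, labels3)
assert g_merged <= G(W, labels3) + 1e-12    # some merge never worsens G
print(G(W, labels3), g_merged, merged)
```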
Hence, the criterion $F_k$, which is widely used for Normalized Cuts, becomes optimal (w.r.t. $k$) when $k = 2$. This fact shows that it is not appropriate to use the criterion $F_k$ for deciding $k$, that is, the number of clusters to partition.

4. Inappropriateness of a comparison of approximate solutions

In this section, we briefly show that the criterion $F_k$ is also not appropriate for comparing approximate solutions that are obtained individually.
For any given $k \ge 2$, the problem of optimizing the $k$-way partition is known to be NP-hard (Meilă and Xu, 2003). Therefore, $k$ is in practice decided by comparing approximate solutions. Suppose that we are given approximate solutions $\hat{H}_k$ ($2 \le k \le K$). Let $\hat{F}_k \stackrel{\mathrm{def}}{=} F_k(\hat{H}_k)$. Note that $\hat{F}_k \ge \bar{F}_k$. Let $\hat{k} \stackrel{\mathrm{def}}{=} \arg\min_k \hat{F}_k$; that is, the $\hat{k}$-way partition $\hat{H}_{\hat{k}}$ is the best solution among the given $K - 1$ approximate solutions.

Theorem 4.1. A bipartition $H'_2$ which satisfies $\hat{F}_{\hat{k}} \ge F_2(H'_2)$ can be constructed from $\hat{H}_{\hat{k}}$ within time complexity at most $O(\hat{k}^3)$.

We only give an outline of the proof of Theorem 4.1, since it is almost the same as that of Lemma 3.3. In the following discussion, the term $F_i(H'_i)$ is abbreviated as $F'_i$.

We assumed that an "optimal" $(k+1)$-way partition is given when proving Lemma 3.3 (and Lemma 3.2). Actually, the same argument goes through even when an "arbitrary" $(k+1)$-way partition is given, which leads to a generalized version of Lemma 3.3, denoted Lemma 3.3'. Lemma 3.3' can be applied to the best-known approximate solution $\hat{H}_{\hat{k}}$, which yields a $(\hat{k}-1)$-way partition $H'_{\hat{k}-1}$ that surpasses the best solution, because $\hat{F}_{\hat{k}} \ge F'_{\hat{k}-1}$. By applying Lemma 3.3' iteratively, we can finally construct a bipartition $H'_2$ which surpasses the best solution. Note that $\hat{F}_{\hat{k}} \ge F'_{\hat{k}-1} \ge \cdots \ge F'_2$. $H'_2$ is constructed by applying Lemma 3.3' $\hat{k} - 2$ times. The time complexity of applying Lemma 3.3' once is at most $O(\hat{k}^2)$, since the number of combinations for selecting two clusters $V_a$ and $V_b$ from the $\hat{k}$ clusters is $O(\hat{k}^2)$. Therefore, $H'_2$ can be obtained within time complexity at most $O(\hat{k}^3)$.

As an example, consider the case of Fig. 2. The best-known approximate solution in Fig. 2 is $\hat{H}_4$, since $\hat{k} = 4$. By merging an appropriate pair of clusters composing $\hat{H}_4$, we can construct a 3-way partition $H'_3$ which surpasses $\hat{H}_4$. $H'_3$ is denoted by a white circle in Fig. 3. In the same way, we can construct a bipartition $H'_2$ which surpasses $H'_3$, by merging an appropriate pair of clusters composing $H'_3$.
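The $O(\hat{k}^3)$ procedure behind Theorem 4.1 is simply repeated pair-merging. A sketch follows (ours; it assumes the `ncut_criterion` and `best_merge` helpers defined earlier): starting from any $\hat{k}$-way approximate solution, it applies the merge step $\hat{k} - 2$ times, each sweep trying $O(\hat{k}^2)$ pairs, and ends with a bipartition whose evaluation is no worse than $\hat{F}_{\hat{k}}$:

```python
import numpy as np
# Reuses ncut_criterion() and best_merge() from the earlier sketches.

def merge_down_to_bipartition(W, labels):
    """Theorem 4.1 sketch: repeatedly merge the pair that minimizes G.

    By the Lemma 3.3' argument each merge never increases F, and the
    total work is O(k-hat^3) pair evaluations."""
    labels = labels.copy()
    while len(np.unique(labels)) > 2:
        _, labels = best_merge(W, labels)
    return labels

W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
approx4 = np.array([0, 1, 2, 3])  # a given 4-way approximate solution
bipart = merge_down_to_bipartition(W, approx4)
assert ncut_criterion(W, bipart) <= ncut_criterion(W, approx4) + 1e-12
print(bipart)  # a bipartition at least as good as the 4-way solution
```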
Fig. 2. The best of the given approximate solutions $\hat{H}_k$ ($2 \le k \le K$) is $\hat{H}_4$ (i.e., $\hat{k} = 4$). The $H^*_k$ are the optimal solutions, which are unknown.

Fig. 3. A 3-way partition $H'_3$ is constructed by merging an appropriate pair of clusters composing $\hat{H}_4$. Note that $F'_3$ surpasses $\hat{F}_4$. In the same way, a bipartition $H'_2$ which surpasses $\hat{H}_4$ is constructed by merging an appropriate pair of clusters composing $H'_3$. In this way, a bipartition $H'_2$ can be constructed which surpasses the best-known approximate solution $\hat{H}_4$.
In this way, a bipartition $H'_2$ can be obtained which surpasses (or is equal to) the best-known approximate solution $\hat{H}_{\hat{k}}$. Therefore, $F_k$ is not appropriate for deciding $k$ by comparing approximate solutions.

5. Discussions

In this section, we discuss why it has not been noticed so far that the optimal $k$-way partition $H^*_k$ surpasses $H^*_{k+1}$, nor that $\bar{F}_k$ is a monotonically increasing function of $k$. We believe that there are two major reasons.

(1) First of all, the optimal $k$-way solution $H^*_k$ is difficult to obtain, since finding it is an NP-hard problem. Because of this difficulty, in practice we can usually obtain only an approximate solution $\hat{H}_k$. Indeed, Normalized Cuts is one of the methods for obtaining approximate solutions instead of optimal solutions.

(2) Besides, in general, $\hat{H}_k$ is not the partition obtained by merging two clusters of $\hat{H}_{k+1}$, as long as $\hat{H}_k$ and $\hat{H}_{k+1}$ are obtained individually by Normalized Cuts. Normalized Cuts uses spectral decomposition in order to obtain approximate solutions. However, only $k$ eigenvectors (including a trivial eigenvector) are used, leaving all the other eigenvectors unused. In short, the eigenvectors used for obtaining $\hat{H}_k$ and $\hat{H}_{k+1}$ are different. Therefore, generally, $\hat{H}_k$ is not the partition obtained by merging two clusters constructing $\hat{H}_{k+1}$, as illustrated by the sketch below.

Because of these two reasons, it has not been noticed so far that $H^*_k$ surpasses $H^*_{k+1}$, nor that $\bar{F}_k$ is a monotonically increasing function of $k$.
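For concreteness, here is a minimal sketch of the spectral relaxation step that point (2) above refers to. It follows the standard recipe (cluster the rows of the $k$ leading eigenvectors of the normalized affinity matrix); it is not code from this paper, and the hand-rolled k-means is only illustrative:

```python
import numpy as np

def spectral_partition(W, k, seed=0):
    """Standard spectral relaxation: embed with the k leading eigenvectors
    of D^{-1/2} W D^{-1/2} (the first is trivial), then cluster the rows.

    A different k uses a different set of eigenvectors, which is why the
    solutions for k and k + 1 are generally not nested (point (2) above)."""
    d = W.sum(axis=1)
    Dis = np.diag(1.0 / np.sqrt(d))
    vals, vecs = np.linalg.eigh(Dis @ W @ Dis)
    U = vecs[:, -k:]                                   # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row-normalize
    rng = np.random.default_rng(seed)                  # tiny Lloyd's k-means
    centers = U[rng.choice(len(U), k, replace=False)]
    for _ in range(50):
        labels = np.argmin(((U[:, None] - centers) ** 2).sum(-1), axis=1)
        for a in range(k):
            if (labels == a).any():
                centers[a] = U[labels == a].mean(axis=0)
    return labels

# labels = spectral_partition(W, k=2)  # W as in the earlier sketches
```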
6. Related works

In terms of criteria in the field of spectral clustering, Ratio-Cut (Hagen and Kahng, 1992) and Min–Max Cut (Ding et al., 2001) are well known. The difference between them and Normalized Cuts lies in the definition of degree (i.e., Eq. (8)). $\mathrm{degree}(V_a)$ for Ratio-Cut is defined as the number of data included in the cluster $V_a$. $\mathrm{degree}(V_a)$ for Min–Max Cut is defined as $\mathrm{links}(V_a, V_a)$, which shows how strongly the data in $V_a$ are connected with each other. Among them, Normalized Cuts is the most practically used criterion in the field of spectral clustering.

Deciding the number of clusters (or model order) is a part of cluster validation. There is an approach called resampling (Levine and Domany, 2001; Tibshirani et al., 2001), whose basic idea is to find a partition with high stability against resampling of the dataset. Resampling means selecting a subset of the given dataset. The stability of a certain partition is empirically estimated by the average probability that the cluster memberships do not change over repeated cluster analyses of each resampled dataset. An advantage of resampling is that a partition at a stable local optimum can be obtained, while a disadvantage is that it is time-consuming because of the repeated cluster analyses.

In terms of methods for deciding the number of clusters, there are still more approaches. We briefly introduce three of them. The first approach is based on log-likelihood with a penalty, such as the Akaike Information Criterion (AIC) (Akaike, 1974) and the Bayesian Information Criterion (BIC) (Schwarz, 1978). Both of them are adopted by SPSS (Norusis, 2005), one of the most commonly used statistical tools. The second approach is based on coding length, such as the Minimum Description Length (MDL) principle (Kontkanen et al., 2003). The last one is the full Bayesian approach, whose representative implementation is AutoClass (Cheeseman and Stutz, 1996), which performs soft clustering.

7. Conclusion

Normalized Cuts is a representative algorithm of spectral clustering, where $F_k$ is widely used as a criterion for deciding the number $k$ of clusters to partition. However, it is not appropriate to use $F_k$ for deciding $k$. We showed this by proving that the optimal bipartition (when $k = 2$) becomes the optimal (w.r.t. $k$) clustering based on $F_k$. Moreover, we proved that $\bar{F}_k$, i.e., the evaluation of the optimal $k$-way partition, is a monotonically increasing function of $k$. Therefore, we cannot use the criterion $F_k$ in order to decide $k$.
It is also inappropriate to decide $k$ by comparing approximate solutions with various $k$, because we can construct a bipartition which surpasses the best given approximate solution within time complexity at most $O(\hat{k}^3)$, where $\hat{k}$ is the number of clusters contained in $\hat{H}_{\hat{k}}$.

Therefore, the criterion $F_k$ is not appropriate for deciding $k$. An alternative criterion is necessary.

References

Akaike, H., 1974. A new look at the statistical identification model. IEEE Trans. Automat. Control 19, 716–723.

Cheeseman, P., Stutz, J., 1996. Bayesian classification (AutoClass): theory and results. In: Fayyad, U., Shapiro, G.P., Smyth, P., Uthurusamy, R. (Eds.), Advances in Knowledge Discovery and Data Mining. AAAI Press, pp. 153–180.

Cour, T., Bénézit, F., Shi, J., 2005. Spectral segmentation with multiscale graph decomposition. In: Conf. on Computer Vision and Pattern Recognition, pp. 1124–1131.

Dhillon, I.S., 2001. Co-clustering documents and words using bipartite spectral graph partitioning. In: Internat. Conf. on Knowledge Discovery and Data Mining, pp. 269–274.

Dhillon, I.S., Guan, Y., Kulis, B., 2004. Kernel k-means, spectral clustering and normalized cuts. In: Internat. Conf. on Knowledge Discovery and Data Mining, Poster Session: Research track posters, pp. 551–556.

Ding, C., He, X., 2004. Linearized cluster assignment via spectral ordering. In: Proc. 21st Internat. Conf. on Machine Learning.

Ding, C.H.Q., He, X., Zha, H., Gu, M., Simon, H.D., 2001. A min–max cut algorithm for graph partitioning and data clustering. In: Proc. IEEE Internat. Conf. on Data Mining, pp. 107–114.

Hagen, L., Kahng, A., 1992. New spectral methods for ratio-cut partitioning and clustering. IEEE Trans. Comput.-Aided Des. 11 (9), 1074–1085.

Kannan, R., Vempala, S., Vetta, A., 2004. On clusterings: Good, bad and spectral. J. ACM 51 (3), 497–515.

Kontkanen, P., Myllymäki, P., Buntine, W., Rissanen, J., Tirri, H., 2003. An MDL framework for data clustering. HIIT Technical Report 2004-6.

Levine, E., Domany, E., 2001. Resampling method for unsupervised estimation of cluster validity. Neural Comput. 13 (11), 2573–2593.

Malik, J., Belongie, S., Leung, T.K., Shi, J., 2001. Contour and texture analysis for image segmentation. Internat. J. Comput. Vision 43 (1), 7–27.

Meilă, M., Xu, L., 2003. Multiway cuts and spectral clustering. In: Advances in Neural Information Processing Systems.

Norusis, M., 2005. SPSS 13.0 Statistical Procedures Companion. Prentice-Hall.

Pentney, W., Meilă, M., 2005. Spectral clustering of biological sequence data. In: American Association for Artificial Intelligence, pp. 845–850.

Schwarz, G., 1978. Estimating a dimension of a model. Ann. Statist. 6, 461–464.

Shi, J., Malik, J., 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 22 (8), 888–905.

Tibshirani, R., Walther, G., Botstein, D., Brown, P., 2001. Cluster validation by prediction strength. Technical Report, Department of Biostatistics, Stanford University.

Yu, S., Shi, J., 2003. Multiclass spectral clustering. In: Internat. Conf. on Computer Vision.

Yu, S., Shi, J., 2004. Segmentation given partial grouping constraints. IEEE Trans. Pattern Anal. Mach. Intell. 26 (2), 173–183.