A parameter-free similarity graph for spectral clustering


Tülin İnkaya


Uludağ University, Industrial Engineering Department, Görükle, 16059 Bursa, Turkey


Keywords: Spectral clustering; Similarity graph; k-nearest neighbor; ε-neighborhood; Fully connected graph

Spectral clustering is a popular clustering method due to its simplicity and superior performance in the data sets with non-convex clusters. The method is based on the spectral analysis of a similarity graph. Previous studies show that clustering results are sensitive to the selection of the similarity graph and its parameter(s). In particular, when there are data sets with arbitrary shaped clusters and varying density, it is difficult to determine the proper similarity graph and its parameters without a priori information. To address this issue, we propose a parameter-free similarity graph, namely Density Adaptive Neighborhood (DAN). DAN combines distance, density and connectivity information, and it reflects the local characteristics. We test the performance of DAN with a comprehensive experimental study. We compare k-nearest neighbor (KNN), mutual KNN, ε -neighborhood, fully connected graph, minimum spanning tree, Gabriel graph, and DAN in terms of clustering accuracy. We also examine the robustness of DAN to the number of attributes and the transformations such as decimation and distortion. Our experimental study with various artificial and real data sets shows that DAN improves the spectral clustering results, and it is superior to the competing approaches. Moreover, it facilitates the application of spectral clustering to various domains without a priori information. © 2015 Published by Elsevier Ltd.


1. Introduction


Spectral clustering determines the clusters based on the spectral analysis of a similarity graph. The approach is easy to implement, and it outperforms traditional clustering methods such as the k-means algorithm. For this reason, it is one of the widely used clustering algorithms in bioinformatics (Higham, Kalna, & Kibble, 2007), pattern recognition (Vázquez-Martín & Bandera, 2013; Wang, 2008), image segmentation (Zeng, Huang, Kang, & Sang, 2014), and text mining (Dhillon, 2001; He, Qin, & Liu, 2012). Basically, a spectral clustering algorithm consists of three steps: pre-processing, decomposition, and grouping. In the pre-processing step, a similarity graph and its adjacency matrix are constructed for the data set. In the decomposition step, the representation of the data set is changed using the eigenvectors of the matrix. In the grouping step, clusters are extracted from the new representation. In this study, we focus on the pre-processing step. Our aim is to represent the local characteristics of the data set using a similarity graph. In spectral clustering, we consider three important properties of a similarity graph (Von Luxburg, 2007): (1) the similarity graph should be symmetric and non-negative; (2) the similarity graph should be connected unless the connected components (subclusters) form the target clusters; (3) the similarity graph should be robust.







The most commonly used similarity graphs in the literature are the k-nearest neighbor (KNN), mutual KNN, ε-neighborhood, and fully connected graphs (Von Luxburg, 2007). The main idea in these approaches is to represent the local characteristics of the data set using a parameter such as k, ε, or σ. A recent study by Maier, von Luxburg, and Hein (2013) shows that the clustering results depend on the choice of the similarity graph and its parameters. However, proper parameter setting becomes a challenging task for data sets with arbitrary shaped clusters, varying density, and imbalanced clusters. For instance, KNN may connect points in different density regions. A similar problem is observed in the ε-neighborhood and fully connected graphs due to their spherical-shaped neighborhoods. To overcome these limitations, a stream of research addresses the parameter selection problem for the similarity graph (Nadler & Galun, 2006; Ng, Jordan, & Weiss, 2002; Zelnik-Manor & Perona, 2004; Zhang, Li, & Yu, 2011). Another research stream incorporates proximity relations into the similarity graph using the minimum spanning tree and the β-skeleton (Carreira-Perpinan & Zemel, 2005; Correa & Lindstorm, 2012). There are also studies that use k-means, genetic algorithms, and random forests to obtain robust similarity matrices (Beauchemin, 2015; Chrysouli & Tefas, 2015; Zhu, Loy, & Gong, 2014). These approaches provide some improvement; however, they still include parameters that must be set properly. Moreover, some of them do not handle data sets with varying density. In this study, we propose a parameter-free similarity graph to address the limitations of the aforementioned approaches. We adopt the neighborhood construction (NC) method proposed by





İnkaya, Kayalıgil, and Özdemirel (2015) to reflect the local characteristics of the data set. NC yields a unique neighborhood for each point, and the similarity graph generated using NC neighborhoods may be asymmetric. Also, it may include isolated vertices and subgraphs. However, spectral clustering algorithms require symmetric and connected graphs. In order to satisfy these properties, we perform additional steps. First, we construct an undirected graph using the NC neighborhoods. We call this graph the Density Adaptive Neighborhood (DAN) graph. Then, we insert edges into DAN if it includes more connected components than the target number of clusters. Finally, we form the weighted adjacency matrix of DAN using the Gaussian kernel function. In order to find the clusters, the decomposition and grouping steps of any spectral clustering algorithm are applied to the proposed approach. Our comprehensive experimental study with various artificial and real data sets shows the superiority of DAN over the competing approaches. To sum up, our contribution is the development of a pre-processing step for spectral clustering that requires no a priori information on the data set. The proposed approach includes the construction of a parameter-free similarity graph and its weighted adjacency matrix. It is flexible in the sense that it can be applied to any spectral clustering algorithm. It works on data sets with arbitrary shaped clusters and varying density. Moreover, it is robust to the number of attributes and to transformations. The rest of the paper is organized as follows. The related literature is reviewed in Section 2. We introduce the background information about spectral clustering and similarity graphs in Section 3. The proposed approach is explained in Section 4. The performance of the proposed approach is examined in Section 5. The discussion of the experiments is given in Section 6. Finally, we conclude in Section 7.


2. Literature review


Spectral clustering has its roots in the graph partitioning problem. Nascimento and Carvalho (2011), Von Luxburg (2007), and Jia, Ding, Xu, and Nie (2014) provide comprehensive reviews of spectral clustering algorithms. The literature on spectral clustering can be classified into two categories (Zhu et al., 2014): (1) studies that focus on data grouping when a similarity graph is given, and (2) studies that focus on similarity graph construction when a particular spectral clustering algorithm is used. In the first category, there are several studies that improve the clustering performance. For instance, Liu, Poon, Liu, and Zhang (2014) use latent tree models to find the number of leading eigenvectors and partition the data points. Lu, Fu, and Shu (2014) combine spectral clustering with non-negative matrix factorization, and propose a non-negative and sparse spectral clustering algorithm. Xiang and Gong (2008) introduce a novel informative/relevant eigenvector selection algorithm, which determines the number of clusters. In this study, we address the similarity graph construction problem, so our work is related to the second category. A group of studies in the second category aims to determine the local characteristics of the data set using proper parameter selection. Ng et al. (2002) suggest executing the spectral clustering algorithm for different values of the neighborhood width σ. Then, they pick the one having the least squared intra-cluster distance to the centroid. This method extracts the local characteristics better; however, additional parameters are required, and the computational complexity is high. Zelnik-Manor and Perona (2004) propose the calculation of a local scaling parameter σi for each data point instead of a global parameter σ. However, this approach has limitations for data sets with density variations. Zhang et al. (2011) introduce a local density adaptive similarity measure, namely Common-Near-Neighbor (CNN). CNN uses the local density between two points, and reflects the connectivity by a set of successive points in a dense region. This measure complements the scale parameter σ in the Gaussian similarity function. In an alternative scheme,



Nadler and Galun (2006) introduce a coherence measure for a set of points in the same cluster. The proposed measure is compared with threshold values to accept or reject a partition. Although this approach finds the clusters correctly, it is not capable of finding clusters with density variations. Carreira-Perpinan and Zemel (2005) and Correa and Lindstorm (2012) use proximity graphs to incorporate connectivity information into the similarity graph. Carreira-Perpinan and Zemel (2005) propose two similarity graphs based on the minimum spanning tree (MST). Both graphs are constructed using an ensemble of trees. In the first graph, each point is perturbed using a noise model, and a given number of MSTs are constructed using perturbed versions of the data set. Then, these MSTs are combined to obtain the similarity graph. In the second one, a given number of MSTs are constructed such that the edges in the MSTs are disjoint. Then, the combination of these disjoint MSTs forms the similarity graph. Correa and Lindstorm (2012) introduce an approach that combines a β-skeleton (empty region) graph with a local scaling algorithm. The local scaling algorithm uses a diffusion-based mechanism. It starts from an estimate of the local scale, and the local scale is refined over a number of iterations. Two parameters are used to control the diffusion speed. Although these approaches find arbitrary shaped clusters, the density relations among the data points are not reflected in the similarity graphs. Moreover, their performances are sensitive to proper parameter selection. A group of studies combines various methods to improve similarity matrix construction. For example, a recent study by Beauchemin (2015) proposes a density-based similarity matrix construction method based on k-means with subbagging. The subbagging procedure increases the accuracy of the density estimate. However, the proposed approach requires six hyperparameters. Moreover, it has shortcomings when there is manifold proximity in the data set. Zhu et al. (2014) use clustering random forests to obtain a robust similarity matrix. A binary split function is optimized for learning a clustering forest. This approach also includes two parameters. Chrysouli and Tefas (2015) combine spectral clustering and genetic algorithms (GA). Using GA, they evolve a number of similarity graphs according to the clustering result. There are also other variants of spectral clustering algorithms. For example, approximate spectral clustering (ASC) is developed for large data sets. ASC works with the representatives of the data samples (points), namely prototypes. Hence, the desired similarity matrix should reflect the relations between the data samples and the prototypes. Taşdemir (2012) adopts the connectivity graph proposed by Taşdemir and Merényi (2009), and introduces a similarity measure for the vector quantization prototypes, namely CONN. CONN calculates the similarity considering the distribution of the data samples in the Voronoi polygons with respect to the prototypes. Taşdemir, Yalçin, and Yildirim (2015) extend this idea and incorporate topology, distance and density information using geodesic-based similarity criteria. Different from these studies, we aim to define the relations among all points in the data set. In this study, we propose a pre-processing step for spectral clustering, with no a priori information. The proposed approach yields a similarity graph and its weighted adjacency matrix, which can be used with any spectral clustering algorithm.
Our work differs from the previous studies in the following sense: (1) It is a parameter-free approach. (2) It reflects the connectivity, density and distance relations among all data points. (3) It works on the data sets not only with convex clusters, but also with clusters having arbitrary shapes and varying density. (4) It is robust to the transformations in the data set.


3. Spectral clustering


In this section, we explain the most commonly used similarity graphs and spectral clustering algorithms in the literature.





3.1. Similarity graphs

Let X = {x1, …, xn} be the set of data points. We represent X in the form of a similarity graph G = (V, E), where each data point is represented by a vertex. G is an undirected graph with vertex set V and edge set E. The weighted adjacency matrix of the graph is W = (wij), i, j = 1, …, n. If wij = 0, then the vertices vi and vj are not connected.

k-nearest neighbor graph: Vertex vi is connected with vertex vj if vj belongs to the k nearest neighbors of vi, or vi belongs to the k nearest neighbors of vj. The resulting graph is called the k-nearest neighbor graph (KNN). After edge insertion, each edge is weighted by the similarity of its end points. The KNN graph should be connected, or it should include only a few connected components (Von Luxburg, 2007). For this purpose, the asymptotic connectivity result for random graphs (Brito, Chavez, Quiroz, & Yukich, 1997) can be used in a finite sample, i.e. k can be chosen in the order of log(n).

Mutual k-nearest neighbor graph: Vertices vi and vj are connected if both vi belongs to the k nearest neighbors of vj, and vj belongs to the k nearest neighbors of vi. The resulting graph is called the mutual k-nearest neighbor graph (MKNN). As in KNN, each edge is weighted by the similarity of its end points. In general, MKNN has fewer edges than KNN, so selecting a larger k than the one used in KNN is reasonable.

ε-neighborhood graph: Vertices vi and vj are connected if dij is smaller than ε, where dij denotes the distance between vertices vi and vj. In general, edge weighting is not applied, as the distances between the connected points are on a similar scale. There are two alternative ways to determine the value of ε: (i) setting ε to the length of the longest edge in the minimum spanning tree (MST) of the data points, or (ii) setting ε to the mean distance of a point to its kth closest neighbor. The former ensures connectivity in the graph, whereas the latter can extract the local characteristics inherent in the data set.

Fully connected graph: In this graph, all vertices are connected. For this reason, the selection of the similarity function is important, as the adjacency matrix should represent the local characteristics of the neighborhood. A typical example of such a similarity function is the Gaussian kernel s(xi, xj) = exp(−dij² / (2σ²)), where the parameter σ controls the neighborhood width. Parameter σ plays a role similar to that of k and ε. For this reason, σ can be chosen as the length of the longest edge in the MST or the mean distance of a point to its kth closest neighbor with k = log(n). Alternative ways of choosing σ are proposed by Ng et al. (2002), and Zelnik-Manor and Perona (2004) (see Section 2).

Proximity graphs: The most well-known proximity graphs are the MST, the relative neighborhood graph (RNG), and the Gabriel graph (GG) (Gabriel & Sokal, 1969; Jaromczyk & Toussaint, 1992). The MST is a spanning tree with the minimum total edge weight. In RNG, vertex vi is connected with vertex vj if dij ≤ max{dip, djp} for ∀vp ∈ V. In GG, vi is connected with vertex vj if dij ≤ min{√(dip² + dpj²)} for ∀vp ∈ V. MST, RNG and GG do not have any parameters.
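As a concrete illustration of the graphs above, the following is a minimal sketch (not code from the paper) that builds the KNN, mutual KNN, ε-neighborhood and fully connected graphs with NumPy; the function names and the toy data at the bottom are illustrative assumptions.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix d_ij for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def knn_graph(X, k, mutual=False):
    """Binary adjacency of the (mutual) k-nearest-neighbor graph."""
    D = pairwise_distances(X)
    n = D.shape[0]
    nn = np.argsort(D, axis=1)[:, 1:k + 1]    # k nearest neighbors, excluding self
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nn.ravel()] = True
    # KNN symmetrizes with OR; mutual KNN symmetrizes with AND
    return (A & A.T) if mutual else (A | A.T)

def eps_graph(X, eps):
    """Binary adjacency of the eps-neighborhood graph."""
    D = pairwise_distances(X)
    return (D < eps) & ~np.eye(len(X), dtype=bool)

def gaussian_affinity(X, sigma):
    """Fully connected graph with Gaussian kernel weights."""
    D = pairwise_distances(X)
    W = np.exp(-D ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))              # toy data, for illustration only
    k = int(np.ceil(np.log(len(X))))           # k = log(n), as recommended above
    A_knn = knn_graph(X, k)                    # KNN graph
    A_mknn = knn_graph(X, k, mutual=True)      # mutual KNN graph
```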

3.2. Spectral clustering algorithms


Let S = (sij), i, j = 1, …, n, be a similarity matrix and W = (wij), i, j = 1, …, n, be its weighted adjacency matrix. The degree of vertex vi is di = Σj wij (the sum over j = 1, …, n), and the degree matrix D is the diagonal matrix with the degrees d1, …, dn on the diagonal. Spectral clustering is based on the graph Laplacian, which is a matrix representation of the graph (Chung, 1997, chap. 1). The unnormalized graph Laplacian, L, is calculated as L = D − W. There are also normalized versions of the graph Laplacian, i.e. Lsym = D^(−1/2) L D^(−1/2) and Lrw = D^(−1) L. The former is a symmetric matrix, whereas the latter is based on the random walk perspective. These graph Laplacians help extract the properties of a data set. The unnormalized spectral clustering algorithm (Von Luxburg, 2007) and two normalized spectral clustering algorithms (Ng et al., 2002; Shi & Malik, 2000) are presented in Figs. 1–3, respectively. The unnormalized spectral clustering algorithm is based on the unnormalized graph Laplacian, whereas the normalized spectral clustering algorithms use one of the normalized graph Laplacians.
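To make the decomposition and grouping steps concrete, the sketch below computes the three Laplacians and runs the unnormalized algorithm of Fig. 1. It is a minimal illustration assuming a symmetric weighted adjacency matrix W (for example, from the sketch in Section 3.1) and scikit-learn's KMeans for the grouping step; it is not the authors' Matlab implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def graph_laplacians(W):
    """Return L, L_sym and L_rw for a weighted adjacency matrix W."""
    d = W.sum(axis=1)                       # vertex degrees d_i
    D = np.diag(d)
    L = D - W                               # unnormalized Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt     # symmetric normalized Laplacian
    L_rw = np.diag(1.0 / d) @ L             # random-walk normalized Laplacian
    return L, L_sym, L_rw

def unnormalized_spectral_clustering(W, k):
    """USC: embed with the first k eigenvectors of L, then run k-means."""
    L, _, _ = graph_laplacians(W)
    _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = eigvecs[:, :k]                      # first k eigenvectors as columns
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

The normalized variants of Figs. 2 and 3 follow the same pattern with L_rw or L_sym (plus row normalization) in place of L.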


4. The proposed approach


The proposed approach corresponds to the pre-processing step of a spectral clustering algorithm, and it includes the construction of the Density Adaptive Neighborhood (DAN) graph and its adjacency matrix. The steps of the proposed approach are given in Fig. 4. In the first step, the local characteristics of the data set are extracted. The Neighborhood Construction (NC) algorithm (İnkaya et al., 2015) is adopted for this purpose.


Input: Data set X and target number of clusters k
Compute the unnormalized graph Laplacian L = D − W.
Compute the first k eigenvectors u1, …, uk of L.
Form the matrix U ∈ R^(n×k) which contains the vectors u1, …, uk as columns.
For i = 1, …, n, let yi ∈ R^k be the vector corresponding to the ith row of U.
Cluster the points (yi), i = 1, …, n, in R^k with the k-means algorithm into clusters C1, …, Ck.
Output: Clusters C1, …, Ck
Fig. 1. Unnormalized spectral clustering algorithm by Von Luxburg (2007).

Input: Data set X and target number of clusters k
Compute the normalized graph Laplacian Lrw = D^(−1)L, where L = D − W.
Compute the first k eigenvectors u1, …, uk of Lrw.
Form the matrix U ∈ R^(n×k) which contains the vectors u1, …, uk as columns.
For i = 1, …, n, let yi ∈ R^k be the vector corresponding to the ith row of U.
Cluster the points (yi), i = 1, …, n, in R^k with the k-means algorithm into clusters C1, …, Ck.
Output: Clusters C1, …, Ck
Fig. 2. Normalized spectral clustering algorithm by Shi and Malik (2000).



Input: Data set X and target number of clusters k
Compute the normalized graph Laplacian Lsym = D^(−1/2) L D^(−1/2).
Compute the first k eigenvectors u1, …, uk of Lsym.
Form the matrix U ∈ R^(n×k) which contains the vectors u1, …, uk as columns.
Normalize the rows of U to norm 1, and form the matrix T ∈ R^(n×k) such that tij = uij / (Σk uik²)^(1/2).
For i = 1, …, n, let yi ∈ R^k be the vector corresponding to the ith row of T.
Cluster the points (yi), i = 1, …, n, in R^k with the k-means algorithm into clusters C1, …, Ck.
Output: Clusters C1, …, Ck
Fig. 3. Normalized spectral clustering algorithm by Ng et al. (2002).

Step 1. Construct the neighborhood of each vertex using the NC algorithm (İnkaya et al., 2015).
  1.1. Find the nearest direct neighbors.
  1.2. Find the indirect neighbors till the first density decrease.
  1.3. Extend the indirect neighbors in Step 1.2 using the indirect connectivity.
  1.4. Determine the final neighbors by mutual connectivity tests.
Step 2. Construct an undirected graph, namely DAN.
Step 3. Determine the connected components of DAN.
Step 4. If the number of connected components is greater than k, insert an edge between the nearest connected components, update the number of connected components and the DAN graph accordingly, and repeat Step 4. Otherwise, go to Step 5.
Step 5. Form the weighted adjacency matrix of DAN using Eq. (1).
Fig. 4. The proposed approach.


Let X = {x1, …, xn} be the set of data points, and let each data point in X be represented by a vertex in the set V = {v1, …, vn}. In NC, the hypersphere passing through vertices vi and vj with diameter dij is used for density calculation, where dij is the Euclidean distance between vertices vi and vj. The number of vertices lying in this hypersphere gives the density between vertices vi and vj, densityij. If densityij = 0, then vertices vi and vj are directly connected. If densityij > 0, then vertices vi and vj are indirectly connected. Using these density and connectivity definitions, Steps 1.1–1.4 in Fig. 4 are executed, and the NC neighborhood of each vertex, NCi, is determined uniquely. NC neighborhoods reflect the density and connectivity relations in the data set. In the second step, we construct an undirected similarity graph G = (V, E) with vertex set V and edge set E. We insert an edge (vi, vj) if vi ∈ NCj or vj ∈ NCi. This graph is called the Density Adaptive Neighborhood (DAN) graph. This step yields a symmetric similarity graph. In the third step, we determine the connected components (subclusters) of DAN. A connected component of an undirected graph is a maximal subgraph in which all vertices are connected. These connected components are the potential clusters in the data set. In the fourth step, the number of connected components is compared with the target number of clusters (k). This check shows whether DAN satisfies the connectivity property. When the number of connected components is larger than k, there are too many isolated subclusters or vertices. For this reason, we insert the edge (vi, vj) = arg min {dij | vi ∈ CCr, vj ∈ CCq, q ≠ r}, where CCr and CCq denote connected components r and q, respectively. We repeat this step until the number of connected components is less than or equal to k. In the final step, the weighted adjacency matrix of DAN is formed using the Gaussian kernel. The weight between vertices vi and vj, wij,

is calculated as follows

wij = exp(−dij² / (max{dik : (i, k) ∈ E})²) if (i, j) ∈ E, and wij = 0 otherwise.    (1)

where E is the edge set of DAN. In the Gaussian kernel, the neighborhood width is equal to the longest edge in the neighborhood of the corresponding point. Hence, it is uniquely calculated for each point.
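The sketch below walks through Steps 1–5 of Fig. 4 under a simplifying assumption: Step 1 keeps only the direct NC neighbors (densityij = 0), whereas the full NC algorithm (İnkaya et al., 2015) also adds indirect neighbors up to the first density decrease and applies the mutual connectivity tests of Steps 1.2–1.4, which are not reproduced here. Function names are illustrative, not from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def density_ij(D, i, j):
    """Number of other points inside the hypersphere whose diameter is the
    segment (i, j): point p lies inside iff d_ip^2 + d_jp^2 <= d_ij^2."""
    inside = (D[i] ** 2 + D[j] ** 2) <= D[i, j] ** 2
    inside[[i, j]] = False
    return int(inside.sum())

def dan_graph(X, k):
    """Sketch of the DAN pre-processing step for a data set X and target k."""
    n = len(X)
    D = cdist(X, X)
    # Steps 1-2 (simplified): symmetric edges between direct NC neighbors.
    E = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            if density_ij(D, i, j) == 0:
                E[i, j] = E[j, i] = True
    # Steps 3-4: while there are more than k connected components, add the
    # shortest edge between two different components.
    while True:
        n_comp, labels = connected_components(csr_matrix(E.astype(np.int8)),
                                              directed=False)
        if n_comp <= k:
            break
        between = labels[:, None] != labels[None, :]
        i, j = np.unravel_index(np.argmin(np.where(between, D, np.inf)), D.shape)
        E[i, j] = E[j, i] = True
    # Step 5: Eq. (1) weights with per-point width sigma_i = max{d_ik : (i,k) in E}.
    sigma = np.where(E.any(axis=1), (D * E).max(axis=1), 1.0)
    W = np.where(E, np.exp(-(D / sigma[:, None]) ** 2), 0.0)
    # With a per-point width, W need not be exactly symmetric; symmetrizing with
    # max(W, W.T) is an assumption of this sketch, not stated in the paper.
    return np.maximum(W, W.T)
```

The resulting matrix W can then be passed directly to the decomposition and grouping steps of any of the algorithms in Figs. 1–3.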


5. Experimental study


In this section, we performed a comparative study of DAN and other similarity graphs for spectral clustering.


5.1. Data sets and comparison


We conducted experiments with spatial data sets (Buerk, 2015; Chang & Yeung, 2008; İyigün, 2008; Sourina, 2013), and the I-Λ (Fukunaga, 1990) and Ness (Van Ness, 1980) data sets. There are 20 spatial data sets including clusters with various shapes and density differences. We provide some example data sets in Fig. 5(a)–(f). I-Λ and Ness are Gaussian data sets with two clusters. The data generation models are explained briefly below. Let μi denote the mean vector for cluster i, and Σi denote the covariance matrix for cluster i. We define Ip as the p×p identity matrix, and diag[.] as the diagonal matrix. We set n1 = 100 and n2 = 100, where n1 and n2 denote the number of points in clusters 1 and 2, respectively. I-Λ data set: 8-dimensional Gaussian data set with μ1 = 0, μ2 = [3.86 3.10 0.84 0.84 1.64 1.08 0.26 0.01]T, Σ1 = I8 and Σ2 = diag[8.41 12.06 0.12 0.22 1.49 1.77 0.35 2.73].




Fig. 5. Example data sets, (a) spiral, (b) data-uc-cc-nu-n_v2, (c) data-c-cc-nu-n_v2, (d) I-Λ, (e) two_moons, and (f) chainlink.

Ness data sets: p-dimensional Gaussian data sets with μ1 = 0, μ2 = [Δ/2 0 … 0 Δ/2]T, Σ1 = Ip, and Σ2 the block-diagonal matrix [Ip1 0; 0 2Ip2], where p = 2, …, 8 and Δ = 6 and 8. Δ is the Mahalanobis distance between the two clusters.

We also used data sets from the UCI Machine Learning Repository (Bache & Lichman, 2013) to generalize our results. The characteristics of the eight UCI data sets are shown in Table 1. All the data sets have numerical attributes. We eliminated the missing values in each data set. In addition, we normalized each data set.

Table 1
The properties of UCI data sets.

Data set          CN   PN    DN
Banknotes         2    200   6
Breast cancer     2    449   9
Hepta             7    212   3
Iris              3    147   4
Liver             2    341   6
Seeds             3    210   7
User knowledge    4    258   5
Vertebral         2    310   6

CN: # of clusters; PN: # of points in the data set; DN: # of attributes.

5.2. Comparison and performance criteria

We used KNN, MKNN, ε-neighborhood, fully connected graph, MST, and GG for comparison. We set the parameters considering the recommendations in the literature. In KNN, we examined three settings where k is log(n), 5% and 10% of the number of points in the data set. We labeled these settings as KNN1, KNN2 and KNN3, respectively. Similar to KNN, we chose three settings for the parameter k in MKNN, with labels MKNN1, MKNN2 and MKNN3. In the ε-neighborhood graph, we used two settings where ε is equal to (i) the length of the longest edge in the MST of the data points, and (ii) the mean distance of a point to its kth closest neighbor where k = log(n). We labeled these settings as eps-1 and eps-2, respectively. In the fully connected graph (FCG), we used the Gaussian kernel. The neighborhood width σ in the Gaussian kernel plays a similar role as the parameter ε, so we used the same parameter settings for σ, with labels FCG1 and FCG2. There are no parameters for MST and GG. After constructing the similarity graph, we applied the unnormalized spectral clustering algorithm and the two normalized spectral clustering algorithms described in Section 3.2. We labeled these algorithms as USC, NSC1 and NSC2. All the algorithms were coded in Matlab 8.1. We ran the algorithms on a PC with an Intel Core i5 3.00 GHz processor and 4 GB RAM. We evaluated the clustering quality of the different similarity graphs in terms of Normalized Mutual Information (NMI) (Fred & Jain, 2003) and the Rand Index (RI) (Rand, 1971).
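For reference, a small sketch (with illustrative names, not the paper's Matlab code) of how the parameter settings described above can be computed from a data set:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def comparison_settings(X):
    """k values for KNN1/2/3 and the two epsilon (and sigma) settings."""
    D = cdist(X, X)
    n = len(X)
    k1 = int(np.ceil(np.log(n)))                            # KNN1 / MKNN1: k = log(n)
    k2, k3 = max(1, int(0.05 * n)), max(1, int(0.10 * n))   # KNN2/3: 5% and 10% of n
    eps1 = minimum_spanning_tree(D).toarray().max()         # longest MST edge (eps-1, FCG1)
    eps2 = np.sort(D, axis=1)[:, k1].mean()                 # mean k-th-NN distance (eps-2, FCG2)
    return k1, k2, k3, eps1, eps2
```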


Table 2
Comparison results for the spatial data sets (the numbers in parentheses show the ranks within each spectral clustering algorithm such that 1 is the best and 13 is the worst). TCN: # of data sets in which target clusters are found.

USC
Graph    TCN   RI mean      RI min   NMI mean     NMI min
KNN1     4     0.821 (4)    0.618    0.747 (3)    0.339
KNN2     6     0.836 (3)    0.499    0.744 (4)    0.004
KNN3     5     0.811 (6)    0.497    0.708 (5)    0
MKNN1    9     0.877 (2)    0.5      0.794 (2)    0.024
MKNN2    5     0.817 (5)    0.496    0.708 (6)    0
MKNN3    4     0.795 (7)    0.497    0.685 (7)    0
eps-1    0     0.705 (12)   0.435    0.501 (12)   0.002
eps-2    7     0.787 (9)    0.498    0.682 (9)    0.024
full-1   4     0.794 (8)    0.498    0.683 (8)    0.006
full-2   0     0.660 (13)   0.498    0.47 (13)    0.004
MST      2     0.776 (10)   0.589    0.666 (10)   0.315
GG       1     0.714 (11)   0.499    0.539 (11)   0.005
DAN      10    0.915 (1)    0.746    0.887 (1)    0.684

NSC1
Graph    TCN   RI mean      RI min   NMI mean     NMI min
KNN1     7     0.846 (3)    0.618    0.777 (2)    0.339
KNN2     5     0.818 (5)    0.517    0.717 (5)    0.034
KNN3     5     0.801 (9)    0.498    0.680 (10)   0.001
MKNN1    8     0.848 (2)    0.5      0.751 (3)    0.008
MKNN2    5     0.825 (4)    0.517    0.723 (4)    0.034
MKNN3    6     0.803 (8)    0.498    0.686 (8)    0.001
eps-1    0     0.681 (13)   0.479    0.491 (13)   0.027
eps-2    5     0.817 (6)    0.498    0.717 (6)    0.024
full-1   4     0.786 (10)   0.526    0.682 (9)    0.001
full-2   1     0.714 (12)   0.512    0.541 (12)   0.003
MST      3     0.805 (7)    0.585    0.711 (7)    0.359
GG       1     0.721 (11)   0.517    0.542 (11)   0.005
DAN      11    0.901 (1)    0.589    0.868 (1)    0.535

NSC2
Graph    TCN   RI mean      RI min   NMI mean     NMI min
KNN1     9     0.873 (2)    0.639    0.822 (2)    0.524
KNN2     5     0.814 (3)    0.505    0.717 (5)    0.012
KNN3     6     0.814 (4)    0.498    0.712 (7)    0.001
MKNN1    3     0.809 (8)    0.509    0.809 (3)    0.509
MKNN2    6     0.813 (5)    0.505    0.711 (8)    0.012
MKNN3    5     0.787 (10)   0.498    0.669 (10)   0.001
eps-1    1     0.745 (12)   0.5      0.555 (12)   0.05
eps-2    5     0.812 (6)    0.498    0.715 (6)    0.024
full-1   5     0.801 (9)    0.554    0.695 (9)    0.001
full-2   1     0.726 (13)   0.534    0.553 (13)   0.001
MST      3     0.812 (7)    0.638    0.723 (4)    0.459
GG       2     0.746 (11)   0.505    0.585 (11)   0.005
DAN      13    0.925 (1)    0.725    0.901 (1)    0.598


NMI is an information-theoretic measure based on entropy. RI penalizes both divided clusters and mixed clusters.
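The following is a minimal sketch (not the paper's evaluation code) of the two quality criteria, computed from ground-truth labels y_true and predicted labels y_pred. The NMI normalization by the geometric mean of the entropies shown here is one common convention; Fred and Jain (2003) define the criterion in a closely related but not necessarily identical way, and scikit-learn's rand_score and normalized_mutual_info_score could be used instead.

```python
import numpy as np
from itertools import combinations

def rand_index(y_true, y_pred):
    """Fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(y_true)), 2))
    agree = sum((y_true[i] == y_true[j]) == (y_pred[i] == y_pred[j])
                for i, j in pairs)
    return agree / len(pairs)

def normalized_mutual_info(y_true, y_pred):
    """NMI = I(U;V) / sqrt(H(U) H(V)), estimated from the contingency table."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    classes, labels = np.unique(y_true), np.unique(y_pred)
    C = np.array([[np.sum((y_true == u) & (y_pred == v)) for v in labels]
                  for u in classes], dtype=float)     # contingency counts
    pu, pv, puv = C.sum(1) / n, C.sum(0) / n, C / n
    h_u, h_v = -np.sum(pu * np.log(pu)), -np.sum(pv * np.log(pv))
    if h_u == 0.0 or h_v == 0.0:
        return 0.0
    nz = puv > 0
    mi = np.sum(puv[nz] * np.log(puv[nz] / np.outer(pu, pv)[nz]))
    return mi / np.sqrt(h_u * h_v)
```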


5.3. Experiments on artificial data sets


The comparison results for the spatial data sets and I-Λ are provided in Table 2. We rank each performance criterion for ease of comparison. Among the 21 data sets, USC with DAN finds the target clusters in 10 data sets. NSC1 and NSC2 with DAN extract the target clusters in 11 and 13 data sets, respectively. MKNN1 and KNN1 follow DAN. MKNN1 extracts the target clusters in 9, 8 and 3 data sets with USC, NSC1 and NSC2, respectively. The corresponding numbers are 4, 7 and 9 for KNN1. Among the 13 competing similarity graphs, DAN gives the best RI and NMI values with all spectral clustering algorithms, followed by KNN1 and MKNN1. The performances of ε-neighborhood, FCG, MST and GG are inferior to those of DAN, KNN and MKNN. Hence, DAN outperforms the other similarity graphs in finding target clusters with various shapes and density differences. However, it has limitations in data sets that include noise and mixed clusters. Still, the worst-case performance of DAN (minimum of RI and NMI) is superior to that of the others. Table 2 also indicates that there is a relation between the similarity graph and the spectral clustering algorithm used in terms of clustering performance. For instance, NSC2 together with DAN is the best performer for all spatial data sets. NSC2 is also preferable for most of the similarity graphs, including ε-neighborhood, FCG, MST and GG. However, for KNN and MKNN, the best performing spectral clustering algorithm is USC or NSC1, depending on the parameter setting. We also analyzed the impact of the parameter settings on the KNN, MKNN, ε-neighborhood and fully connected graphs. In Table 2, we observe that the performances of these similarity graphs are sensitive to the parameter settings. KNN and MKNN are more successful in cluster extraction for k = log(n). For larger values of k, the RI and NMI values worsen. FCG shows significantly better performance when the neighborhood width (σ) is chosen as the length of the longest edge in the MST. In the ε-neighborhood graph, setting ε to the mean distance of a point to its kth closest neighbor with k = log(n) provides a significant improvement in the clustering performance. This is also consistent with the superior performance of KNN and MKNN for k = log(n). Finally, we examined the performance of DAN while varying the number of attributes. We generated 2- to 8-dimensional Ness data sets with Δ = 6, 8. The experimental results are shown in Fig. 6, where we present the average RI values for the different parameter settings of KNN, MKNN, ε-neighborhood and FCG. When the clusters are well separated (Δ = 8), DAN finds the target clusters in most

of the data sets, as shown in Fig. 6(a)–(c). Even when the number of attributes increases, the performance of DAN stays the same. When the clusters are closer (Δ = 6), the performance of DAN shows a slight decrease in Fig. 6(d)–(f). Still, the number of attributes does not affect the performance of DAN significantly. In both settings of Δ, the performances of KNN and GG are close to the performance of DAN. Similar to DAN, the number of attributes does not have an impact on the performance of KNN. MST is sensitive to the number of attributes, in particular when the clusters are closer. ε-neighborhood and MKNN are inferior to the other similarity graphs, and their performances depend on the number of attributes. To sum up, DAN is successful in finding target clusters for data sets with a large number of attributes.


5.4. Experiments on UCI data sets


In Table 3, we present the RI value for each UCI data set when USC is applied. DAN outperforms the other similarity graphs in all data sets except "seeds" and "user knowledge". For "seeds", KNN1 is the best performer, whereas MKNN2 gives the highest RI for "user knowledge". For the "banknotes" and "vertebral" data sets, other similarity graphs such as KNN, MKNN, FCG and the ε-neighborhood graph also find the target clusters. However, the performance of these similarity graphs is sensitive to the parameter settings. The DAN graph is able to represent the local characteristics of the data set without any parameter. We also provide the RI values of NSC1 and NSC2 in the Appendix. Even for different spectral clustering algorithms, DAN is the best performer among the 13 similarity graphs.


5.5. Experiments on robustness


We analyzed the robustness of the similarity graphs using two types of transformations: geometric distortion and decimation. We consider the spatial data sets in Section 5.1 for this purpose. Let xi = (xi1, xi2) and xi′ = (xi1′, xi2′) be the original and distorted points, respectively. In the geometric distortion, we displace each point horizontally such that xi1′ = xi1 + λxi2 and xi2′ = xi2, where λ is the distortion factor. In decimation, we remove 0.05Dn points randomly, where D is the decimation factor. The impact of distortion is illustrated on two example data sets in Fig. 7, together with the resulting clusters for the three best performing similarity graphs, i.e. DAN, KNN1 and MKNN1. DAN finds the target clusters in both data sets in Fig. 7(c) and (f), whereas KNN1 and MKNN1 mix the target clusters as shown in Fig. 7(a), (b), (d), and (e). For the distortion factor λ ∈ {0, 0.1, …, 0.5}, we present the comparison results for all spatial data sets in Fig. 8.
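A minimal sketch (with assumed helper names, not the paper's code) of the two transformations for a two-dimensional data set X of shape (n, 2) with labels y:

```python
import numpy as np

def distort(X, lam):
    """Geometric distortion: x'_i1 = x_i1 + lam * x_i2, x'_i2 = x_i2."""
    Xd = X.copy()
    Xd[:, 0] = X[:, 0] + lam * X[:, 1]
    return Xd

def decimate(X, y, D, seed=0):
    """Decimation: randomly remove 0.05 * D * n points (and their labels)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    keep = rng.permutation(n)[int(round(0.05 * D * n)):]
    return X[keep], y[keep]
```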


Fig. 6. Experimental results for Ness data sets with varying the number of attributes, when (a) USC is applied for Δ = 8, (b) NSC1 is applied for Δ = 8, (c) NSC2 is applied for Δ = 8, (d) USC is applied for Δ = 6, (e) NSC1 is applied for Δ = 6, and (f) NSC2 is applied for Δ = 6.

Table 3
Comparison results for the UCI data sets in terms of RI when USC is applied (the best performer for each data set is bolded in the original).

Data set         KNN1   KNN2   KNN3   MKNN1  MKNN2  MKNN3  eps-1  eps-2  FCG1   FCG2   MST    GG     DAN
Banknotes        1.000  1.000  1.000  0.498  1.000  1.000  0.861  0.498  0.498  0.498  0.887  0.827  1.000
Breast cancer    0.887  0.891  0.887  0.501  0.891  0.887  0.684  0.500  0.500  0.500  0.841  0.899  0.907
Hepta            0.737  0.947  0.862  0.755  0.861  0.722  0.583  0.876  0.951  0.738  0.937  0.953  0.956
Iris             0.816  0.864  0.837  0.432  0.864  0.837  0.729  0.767  0.767  0.768  0.702  0.716  0.876
Liver            0.506  0.504  0.505  0.511  0.504  0.505  0.507  0.512  0.510  0.512  0.504  0.510  0.512
Seeds            0.922  0.911  0.894  0.377  0.910  0.894  0.359  0.687  0.731  0.860  0.904  0.873  0.891
User knowledge   0.705  0.675  0.673  0.317  0.718  0.674  0.319  0.317  0.300  0.300  0.699  0.650  0.701
Vertebral        0.516  0.534  0.545  0.557  0.534  0.545  0.541  0.559  0.559  0.559  0.559  0.503  0.559

DAN provides the highest RI among all competing similarity graphs, followed by KNN and MKNN. ε-neighborhood, FCG, MST, and GG fall behind DAN, KNN and MKNN. As the distortion factor increases, the performance of DAN decreases. KNN and MKNN are also sensitive to the distortion factor. Although GG has the worst performance among all similarity graphs, it is less sensitive to the distortion factor. We also show the impact of decimation on two example data sets in Fig. 9, where the results of the three best performing similarity graphs, i.e. DAN, KNN1 and MKNN1, are given. In Fig. 9(c) and (f), DAN outperforms KNN1 and MKNN1 in terms of RI.

In Fig. 10, the comparison results for the 21 spatial data sets are given while varying the decimation factor D ∈ {0, 1, …, 5}. Although the average RI values of DAN decrease for larger decimation factors, DAN is still superior to the competing similarity graphs. KNN and MKNN follow DAN, and GG has the minimum RI values. To sum up, DAN is the best performer under the distortion and decimation transformations.


6. Discussion


Our experimental study shows that DAN is an effective pre-processing step for spectral clustering.




Fig. 7. Example data sets with distortion factor λ = 0.3, when NSC2 is applied to (a) spiral with KNN1 (RI = 0.95), (b) spiral with MKNN1 (RI = 0.72), (c) spiral with DAN (RI = 1), (d) data-uc-cc-nu-n_v2 with KNN1 (RI = 0.92), (e) data-uc-cc-nu-n_v2 with MKNN1 (RI = 0.83), and (f) data-uc-cc-nu-n_v2 with DAN (RI = 1).


Fig. 8. The impact of distortion for (a) USC, (b) NSC1, and (c) NSC2.


Fig. 9. Example data sets with decimation factor D = 3, when USC is applied to (a) spiral with KNN1 (RI = 0.72), (b) spiral with MKNN1 (RI = 0.72), (c) spiral with DAN (RI = 1); and when NSC2 is applied to (d) data-uc-cc-nu-n_v2 with KNN1 (RI = 0.95), (e) data-uc-cc-nu-n_v2 with MKNN1 (RI = 0.85), and (f) data-uc-cc-nu-n_v2 with DAN (RI = 1).


It is successful in finding clusters with arbitrary shapes and varying density. Also, it is superior to the other approaches in terms of RI, NMI and the number of data sets in which the target clusters are found. In particular, in the spatial data sets, DAN together with the normalized spectral clustering algorithm by Ng et al. (2002) is able to find the target clusters correctly.

Moreover, DAN is robust to the number of attributes and to transformations such as distortion and decimation. Although KNN and MKNN follow DAN, their performances are sensitive to the parameter setting. Even the properties of the data set affect the parameter setting.



Fig. 10. The impact of decimation for (a) USC, (b) NSC1, and (c) NSC2.


For instance, the clustering accuracy improves by choosing k = log(n) in the spatial data sets, whereas larger values of k perform better in the UCI data sets, which have a larger number of attributes and data points compared to the spatial data sets. When the data set has varying density, MKNN yields slightly better results than KNN. However, MKNN is more sensitive to decimation and distortion than KNN. The performances of ε-neighborhood and FCG are inferior to those of DAN, KNN and MKNN in most of the data sets. As ε-neighborhood and FCG define the similarity graphs based on spherical-shaped neighborhoods, these two similarity graphs are not capable of finding clusters with arbitrary shapes and varying density. MST and GG are parameter-free approaches; however, they perform poorly in spectral clustering. Hence, defining the proximity relations without density information is not sufficient for spectral clustering. ε-neighborhood, FCG, MST and GG show superior performance with the normalized spectral clustering algorithm by Ng et al. (2002), whereas the choice of the spectral clustering algorithm for KNN and MKNN depends on the parameter setting.


7. Conclusion


Determining the local characteristics of a data set is a building block for spectral clustering. However, the existing approaches such as KNN, MKNN, ε-neighborhood and FCG are sensitive to the parameter selection, and there is no systematic way of finding the proper setting. This study aims to fill this gap in the spectral clustering literature by providing a pre-processing step, which includes the construction of a parameter-free similarity graph and its adjacency matrix.


The proposed similarity graph, namely DAN, facilitates the use of spectral clustering algorithms in various domains without a priori information on the data set. Compared to the existing approaches, the main advantages of DAN are as follows: (i) it is parameter-free, (ii) it can work together with the well-known spectral clustering algorithms, (iii) it is successful in finding the local characteristics of data sets with arbitrary shaped clusters and varying density, and (iv) its performance is robust to the number of attributes and to transformations such as distortion and decimation. Possible future research directions are as follows. DAN has limitations in handling data sets with noise. Hence, a future research direction is the development of ways to handle noise. When there exist mixed clusters, the neighborhood relations among the data points may not be precise. Instead of hard (crisp) neighborhood relations, an interesting research direction can be the use of fuzzy neighborhood relations in spectral clustering. In practice, side information might be available to guide the clustering result toward the desired partition. This can be in the form of pairwise constraints or partial labeling. Learning a similarity graph based on side information can be investigated further. Hybridization of spectral clustering with metaheuristics is also a promising tool to improve the clustering accuracy.









Appendix



Table A.1, Table A.2.


Table A.1
Comparison results for the UCI data sets in terms of RI when NSC1 is applied (the best performer for each data set is bolded in the original).

Data set         KNN1   KNN2   KNN3   MKNN1  MKNN2  MKNN3  eps-1  eps-2  FCG1   FCG2   MST    GG     DAN
Banknotes        1.000  1.000  1.000  0.498  1.000  1.000  0.887  0.498  0.961  0.923  0.887  0.835  1.000
Breast cancer    0.891  0.895  0.895  0.501  0.895  0.895  0.500  0.500  0.791  0.826  0.841  0.899  0.907
Hepta            0.897  0.854  0.948  0.715  0.857  0.854  0.867  0.864  0.738  0.952  0.947  0.949  0.820
Iris             0.572  0.864  0.842  0.429  0.864  0.842  0.703  0.772  0.772  0.818  0.828  0.721  0.883
Liver            0.499  0.500  0.499  0.506  0.500  0.499  0.512  0.512  0.510  0.509  0.504  0.502  0.503
Seeds            0.916  0.905  0.900  0.380  0.905  0.900  0.373  0.732  0.881  0.885  0.894  0.879  0.900
User knowledge   0.669  0.672  0.669  0.456  0.672  0.675  0.337  0.666  0.648  0.678  0.699  0.654  0.701
Vertebral        0.551  0.559  0.562  0.557  0.559  0.562  0.553  0.559  0.559  0.545  0.559  0.574  0.564

Table A.2
Comparison results for the UCI data sets in terms of RI when NSC2 is applied (the best performer for each data set is bolded in the original).

Data set         KNN1   KNN2   KNN3   MKNN1  MKNN2  MKNN3  eps-1  eps-2  FCG1   FCG2   MST    GG     DAN
Banknotes        1.000  1.000  1.000  0.852  1.000  1.000  0.498  0.498  0.961  0.932  0.923  0.628  1.000
Breast cancer    0.911  0.915  0.915  0.758  0.915  0.915  0.845  0.500  0.895  0.867  0.899  0.499  0.903
Hepta            0.950  1.000  0.950  0.892  0.904  0.898  0.906  0.959  1.000  1.000  0.948  1.000  0.953
Iris             0.729  0.832  0.837  0.566  0.832  0.837  0.713  0.842  0.773  0.833  0.823  0.823  0.917
Liver            0.499  0.500  0.500  0.509  0.501  0.500  0.501  0.512  0.510  0.499  0.507  0.499  0.502
Seeds            0.911  0.895  0.911  0.390  0.895  0.911  0.400  0.864  0.889  0.882  0.888  0.654  0.900
User knowledge   0.672  0.701  0.669  0.620  0.673  0.669  0.547  0.673  0.717  0.665  0.690  0.653  0.721
Vertebral        0.564  0.564  0.566  0.539  0.564  0.566  0.504  0.559  0.559  0.559  0.555  0.569  0.579


References

Bache, K., & Lichman, M. (2013). UCI machine learning repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Beauchemin, M. (2015). A density-based similarity matrix construction for spectral clustering. Neurocomputing, 151, 835–844.
Brito, M., Chavez, E., Quiroz, A., & Yukich, J. (1997). Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Statistics & Probability Letters, 35, 33–42.
Buerk, I. (2015). Fast and efficient spectral clustering. Retrieved from http://www.mathworks.com/matlabcentral/fileexchange/34412. Last accessed: July 10, 2015.
Carreira-Perpinan, M. A., & Zemel, R. S. (2005). Proximity graphs for clustering and manifold learning. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems (pp. 225–232). Cambridge: MIT Press.
Chang, H., & Yeung, D. Y. (2008). Robust path-based spectral clustering. Pattern Recognition, 41(1), 191–203.
Chung, F. R. K. (1997). Spectral graph theory. Providence: American Mathematical Society.
Chrysouli, C., & Tefas, A. (2015). Spectral clustering and semi-supervised learning using evolving similarity graphs. Applied Soft Computing. doi:10.1016/j.asoc.2015.05.026.
Correa, C. D., & Lindstorm, P. (2012). Locally-scaled spectral clustering using empty region graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1330–1338).
Dhillon, I. S. (2001). Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 269–274).
Fred, A. L. N., & Jain, A. K. (2003). Robust data clustering. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 128–136).
Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.). Academic Press.
Gabriel, K. R., & Sokal, R. R. (1969). New statistical approach to geographic variation analysis. Systematic Zoology, 18(3), 259–278.
He, R., Qin, B., & Liu, T. (2012). A novel approach to update summarization using evolutionary manifold-ranking and spectral clustering. Expert Systems with Applications, 39(3), 2375–2384.
Higham, D. J., Kalna, G., & Kibble, M. (2007). Spectral clustering and its use in bioinformatics. Journal of Computational and Applied Mathematics, 204(1), 25–37.
İnkaya, T., Kayalıgil, S., & Özdemirel, N. E. (2015). An adaptive neighbourhood construction algorithm based on density and connectivity. Pattern Recognition Letters, 52, 17–24.
İyigün, C. (2008). Probabilistic distance clustering. Ph.D. Dissertation. New Brunswick, New Jersey: Rutgers University.
Jaromczyk, J. W., & Toussaint, G. T. (1992). Relative neighborhood graphs and their relatives. Proceedings of the IEEE, 80(9), 1502–1517.
Jia, H., Ding, S., Xu, X., & Nie, R. (2014). The latest research progress on spectral clustering. Neural Computing and Applications, 24, 1477–1486.

Liu, A. H., Poon, L. K. M., Liu, T.-F., & Zhang, N. L. (2014). Latent tree models for rounding in spectral clustering. Neurocomputing, 144, 448–462.
Lu, H., Fu, Z., & Shu, X. (2014). Non-negative and sparse spectral clustering. Pattern Recognition, 47, 418–426.
Maier, M., von Luxburg, U., & Hein, M. (2013). How the result of graph clustering methods depends on the construction of the graph. ESAIM: Probability and Statistics, 17, 370–418.
Nadler, B., & Galun, M. (2006). Fundamental limitations of spectral clustering. Advances in Neural Information Processing Systems, 19, 1017–1024.
Nascimento, M. C. V., & Carvalho, A. C. P. L. F. (2011). Spectral methods for graph clustering - A survey. European Journal of Operational Research, 211, 221–231.
Ng, A., Jordan, M., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In T. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems (pp. 849–856). Cambridge: MIT Press.
Rand, W. M. (1971). Objective criteria for the evaluation of the clustering methods. Journal of the American Statistical Association, 66(336), 846–850.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Sourina, O. (2013). Current projects in the homepage of Olga Sourina. http://www.ntu.edu.sg/home/eosourina/projects.html. Last accessed: March 2, 2013.
Taşdemir, K. (2012). Vector quantization based approximate spectral clustering of large datasets. Pattern Recognition, 45, 3034–3044.
Taşdemir, K., & Merényi, E. (2009). Exploiting data topology in visualization and clustering of self-organizing maps. IEEE Transactions on Neural Networks, 20(4), 549–562.
Taşdemir, K., Yalçin, B., & Yildirim, I. (2015). Approximate spectral clustering with utilized similarity information using geodesic based hybrid distance measures. Pattern Recognition, 48, 1465–1477.
Van Ness, J. (1980). On the dominance of nonparametric Bayes rule discriminant algorithms in high dimensions. Pattern Recognition, 12(6), 355–368.
Vázquez-Martín, R., & Bandera, A. (2013). Spatio-temporal feature-based keyframe detection from video shots using spectral clustering. Pattern Recognition Letters, 34(7), 770–779.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17, 395–416.
Wang, C.-H. (2008). Recognition of semiconductor defect patterns using spatial filtering and spectral clustering. Expert Systems with Applications, 34(3), 1914–1923.
Xiang, T., & Gong, S. (2008). Spectral clustering with eigenvector selection. Pattern Recognition, 41(3), 1012–1029.
Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. Advances in Neural Information Processing Systems, 17, 1601–1608.
Zeng, S., Huang, R., Kang, Z., & Sang, N. (2014). Image segmentation using spectral clustering of Gaussian mixture models. Neurocomputing, 144, 346–356.
Zhang, X., Li, J., & Yu, H. (2011). Local density adaptive similarity measurement for spectral clustering. Pattern Recognition Letters, 32, 352–358.
Zhu, X., Loy, C. C., & Gong, S. (2014). Constructing robust affinity graphs for spectral clustering. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1450–1457).
