Pattern Recognition Letters 32 (2011) 816–823
An empirical evaluation on dimensionality reduction schemes for dissimilarity-based classifications

Sang-Woon Kim, Department of Computer Science and Engineering, Myongji University, Yongin 449-728, South Korea. Tel.: +82 31 330 6437; e-mail address: [email protected]

This work was supported by the National Research Foundation of Korea funded by the Korean Government (NRF-2010-0015829). Preliminary versions of this paper were partially presented at CIARP 2008, the 13th Iberoamerican Congress on Pattern Recognition, Havana, Cuba, September 9–12, 2008, and at ICAART 2010, the International Conference on Agents and Artificial Intelligence, Valencia, Spain, January 22–24, 2010.
Article history: Received 12 August 2009; available online 18 January 2011. Communicated by F. Roli.
Keywords: Dissimilarity-based classifications; Dimensionality reduction schemes; Prototype selection methods; Linear discriminant analysis
Abstract

This paper presents an empirical evaluation of methods for reducing the dimensionality of dissimilarity spaces in order to optimize dissimilarity-based classifications (DBCs). One problem of DBCs is the high dimensionality of the dissimilarity spaces. To address this problem, two kinds of solutions have been proposed in the literature: prototype selection (PS) based methods and dimension reduction (DR) based methods. Although PS-based and DR-based methods have been explored separately by many researchers, little work has compared the two. Therefore, this paper aims to find a suitable method for optimizing DBCs through a comparative study. Our empirical evaluation, obtained with the two approaches on an artificial data set and three real-life benchmark databases, demonstrates that DR-based methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA) based methods, generally improve the classification accuracies more than PS-based methods do. In particular, the experimental results demonstrate that PCA is more useful for well-represented data sets, while LDA is more helpful for small sample size problems.

© 2011 Elsevier B.V. All rights reserved.
1. Introduction

One of the most recent and novel developments in the field of statistical pattern recognition (PR) (Jain et al., 2000) is the concept of dissimilarity-based classifications (DBCs) proposed by Duin and his co-authors (Pekalska and Duin, 2005). DBCs are a way of defining classifiers among the classes; the process is based not on the feature measurements of individual object samples, but rather on a suitable dissimilarity measure among the individual samples. The three major questions encountered when designing DBCs are summarized as follows: (1) How can prototype subsets be selected (or created) from the training samples? (2) How can the dissimilarities between object samples be measured? (3) How can a classifier in the dissimilarity space be designed? Numerous strategies have been used to explore these questions (Kim and Oommen, 2007; Pekalska and Duin, 2005; Pekalska et al., 2006).

First of all, the prototype subsets can be selected in a simple way by using all of the input vectors as prototypes. In most cases, however, this strategy imposes a computational burden on the classifier. To select a prototype subset that is compact and capable of simultaneously representing the entire data set, Duin and his colleagues (Pekalska and Duin, 2005; Pekalska et al., 2006) discussed a number of methods in which a training set is pruned to yield a set of representation prototypes. On the other hand, by invoking a prototype reduction scheme (PRS), Kim and Oommen (2007) also obtained a representative subset, which is utilized by the DBC. Aside from using PRSs, the same authors proposed the use of the Mahalanobis distance as the dissimilarity-measurement criterion. With regard to the second question, investigations have focused on measuring the appropriate dissimilarity by using various lp norms, the modified Hausdorff norms, and traditional PR-based measures, such as those used in template matching and correlation-based analysis (Pekalska and Duin, 2005; Pekalska et al., 2006). Finally, the third question refers to the learning paradigms, especially those which deal with either parametric or nonparametric classifiers. On this subject, the use of many traditional decision classifiers, including the k-NN rule and the linear/quadratic normal-density-based classifiers, has been reported (Pekalska and Duin, 2005). In the interest of brevity, the details of the second and third questions are omitted here, but they can be found in the literature cited above.

In DBCs, a good selection of prototypes seems to be crucial for the classification algorithm to succeed in the dissimilarity space. The prototypes should avoid redundancies, in the sense of not selecting similar samples, and should include as much discriminative information as possible. However, it is difficult to find the optimal number of prototypes. Furthermore, there is a possibility of losing some useful information for discrimination when selecting prototypes (Bicego et al., 2004; Bunke and Riesen, 2007; Kim and Gao, 2008; Riesen et al., 2007).
To avoid these problems, the authors of Bicego et al. (2004), Bunke and Riesen (2007), Kim and Gao (2008), and Riesen et al. (2007) separately proposed an alternative approach in which all of the available samples are selected as prototypes and, subsequently, a dimension reduction scheme, such as principal component analysis (PCA) or linear discriminant analysis (LDA), is applied to reduce the dimensionality. That is, they preferred not to select the representation prototypes directly from the training samples; rather, they proposed applying a dimension reduction scheme after computing the dissimilarity matrix with the entire set of training samples. This approach seems to be more principled and allows us to completely avoid the problem of finding the optimal number of prototypes.

In this paper, we perform an empirical evaluation of the two approaches for reducing the dimensionality of dissimilarity spaces for optimizing DBCs: prototype selection (PS) based methods and dimension reduction (DR) based methods. In PS-based methods, we first select the representation prototypes from the training data set by resorting to one of the prototype selection methods described in Kim and Oommen (2007), Pekalska and Duin (2005), and Pekalska et al. (2006). Then, we compute the dissimilarity matrix, in which each individual dissimilarity is computed on the basis of the measures described in Pekalska et al. (2006). Finally, we perform the classification by invoking a classifier built in the dissimilarity space. In DR-based methods, on the other hand, we prefer not to select the representation prototypes directly from the training samples; rather, we apply a DR after computing the dissimilarity matrix with the entire set of training samples. A point to be addressed here is how to choose the optimal number of prototypes and the subspace dimension to which the data are reduced. In PS-based methods, we heuristically select the same number of (or twice as many) prototypes as the number of classes. In DR-based methods, on the other hand, we can use a cumulative proportion technique (Laaksonen and Oja, 1996; Oja, 1983) to choose the subspace dimensions.

The main contribution of this paper is to present an empirical evaluation of the two approaches for reducing the dimensionality of dissimilarity spaces for optimizing DBCs. This evaluation shows that DBCs can be optimized by employing a dimension reduction scheme as well as a prototype selection method. Here, the aim of using a dimension reduction scheme instead of selecting prototypes is to retain useful information for discrimination and to avoid the problem of finding the optimal number of prototypes. Our experimental results, obtained with the two approaches on an artificial data set and three real-life benchmark databases, demonstrate that DR-based methods, such as PCA and LDA based methods, can generally improve the classification accuracy of DBCs more than PS-based methods can.

The remainder of the paper is organized as follows: in Section 2, after presenting a brief overview of dissimilarity representation and prototype selection methods, we present a scheme for using dimension reduction schemes. Following this, in Section 3, we present the experimental set-up for the prototype selection based methods and the dimension reduction based methods. In Section 4, we present the experimental results for an artificial data set and three real-life image databases. Finally, in Section 5, we present our concluding remarks.
2. Dissimilarity-based classification (DBC)

2.1. Foundations of DBCs

A dissimilarity representation of a set of samples, $T = \{x_i\}_{i=1}^{n} \subset \mathbb{R}^d$, is based on pairwise comparisons and is expressed, for example, as an $n \times m$ dissimilarity matrix $D_{T,Y}[\cdot,\cdot]$, where $Y = \{y_j\}_{j=1}^{m} \subset \mathbb{R}^d$, a prototype set, is extracted from $T$, and the subscripts of $D$ represent the sets of elements on which the dissimilarities are evaluated. Thus, each entry $D_{T,Y}[i,j]$ corresponds to the dissimilarity between the pair of objects $x_i$ and $y_j$, where $x_i \in T$ and $y_j \in Y$. Consequently, an object $x_i$ is represented as a column (or a row) vector $d(x_i, Y)$ as follows:

$d(x_i, Y) = [d(x_i, y_1), d(x_i, y_2), \ldots, d(x_i, y_m)]^T, \quad 1 \le i \le n. \qquad (1)$

Here, the dissimilarity matrix $D_{T,Y}[\cdot,\cdot]$ defines vectors in a dissimilarity space in which the $d$-dimensional object $x$, given in the feature space, is represented as an $m$-dimensional vector $d(x, Y)$. In this paper, the vector $d(x, Y)$ is simply denoted by $d_Y(x)$.

2.2. Prototype selection (PS) methods

The intention of selecting prototypes is to guarantee a good trade-off between the recognition accuracy and the computational complexity when the DBC is built on $D_{T,Y}[\cdot,\cdot]$ rather than on $D_{T,T}[\cdot,\cdot]$. Various PS methods, such as Random, RandomC, KCentres, ModeSeek, LinProg, FeatSel, KCentres-LP, and EdiCon, have been proposed in the literature (Pekalska and Duin, 2005; Pekalska et al., 2006). KCentres consists of a procedure that is applied to each class separately. For each class $i$, the algorithm is invoked so as to choose $m_i$ center objects, $\{y_j\}_{j=1}^{m_i}$, which are evenly distributed with respect to the dissimilarity matrix $D_{T_i,T_i}[\cdot,\cdot]$. Here, the center objects are chosen from all objects such that the maximum of the distances over all objects to the nearest center is minimized. FeatSel makes use of supervised information in the dissimilarity space, while KCentres is a k-means clustering (unsupervised) algorithm. Also, FeatSel, like a traditional feature selection method (i.e., a forward/backward or plus-L-take-away-R feature selection method; Jain et al., 2000), is capable of selecting an optimal set of prototypes from the original dissimilarity representation according to some separation measure. Therefore, using this PS method, we can reduce the distance matrix $D_{T,T}[\cdot,\cdot]$ to $D_{T,Y}[\cdot,\cdot]$ by selecting an optimal set of prototypes $Y$ according to the leave-one-out 1-NN error, which can be computed directly on the given dissimilarity space $D_{T,T}$. In the interest of compactness, the details of the other methods are omitted here, but they can be found in the related literature.

DBCs summarized in Section 2.1 in which the representation prototypes are selected with a PS are referred to as PS-based DBCs or simply as PS-based methods. An algorithm for PS-based DBCs is summarized in the following:

1. Select the representation subset, Y, from the training set, T, by resorting to one of the prototype selection methods.
2. Using Eq. (1), compute the dissimilarity matrix, $D_{T,Y}[\cdot,\cdot]$, in which each individual dissimilarity is computed on the basis of the measures described in the literature.
3. For a testing sample, z, compute a dissimilarity column vector, $d_Y(z)$, by using the same measure used in Step 2.
4. Achieve the classification by invoking a classifier built in the dissimilarity space and by operating the classifier on the vector $d_Y(z)$.

From these four steps, we can see that the performance of DBCs relies heavily on how well the dissimilarity space, which is determined by the dissimilarity matrix $D_{T,Y}[\cdot,\cdot]$, is constructed. To improve the performance, we need to ensure that the dissimilarity matrix is well designed.
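To make the PS-based pipeline above concrete, the following minimal Python sketch builds a dissimilarity space from a set of prototypes and classifies in it. It assumes the Euclidean distance as the dissimilarity measure, a RandomC-like per-class random prototype selection, and a 1-NN classifier; these are illustrative choices rather than the only ones considered in the paper, and all function names are ours.

```python
# A minimal sketch of a PS-based DBC (Euclidean dissimilarities, random per-class
# prototype selection, 1-NN classifier in the dissimilarity space).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.neighbors import KNeighborsClassifier

def select_prototypes_randomc(X, y, per_class, rng):
    """Pick `per_class` prototypes at random from each class (RandomC-like)."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        idx.extend(rng.choice(members, size=per_class, replace=False))
    return X[np.array(idx)]

def dissimilarity_matrix(T, Y):
    """D_{T,Y}[i, j] = d(x_i, y_j); here d is the Euclidean distance (cf. Eq. (1))."""
    return cdist(T, Y, metric='euclidean')

# toy usage
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4)); y_train = rng.integers(0, 2, size=100)
X_test = rng.normal(size=(25, 4));   y_test = rng.integers(0, 2, size=25)

Y = select_prototypes_randomc(X_train, y_train, per_class=2, rng=rng)  # step 1: 2c prototypes
D_train = dissimilarity_matrix(X_train, Y)        # step 2: n x m dissimilarity space
clf = KNeighborsClassifier(n_neighbors=1).fit(D_train, y_train)
D_test = dissimilarity_matrix(X_test, Y)          # step 3: d_Y(z) for each test sample
print("error rate:", 1.0 - clf.score(D_test, y_test))  # step 4
```

Any of the PS methods listed above (e.g., KCentres or FeatSel) could be substituted for the random selection without changing the rest of the pipeline.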
2.3. Dimension reduction (DR) schemes
With regard to reducing the dimensionality of the dissimilarity spaces, we can use a strategy of employing DRs after computing the dissimilarity matrix with the entire set of training samples.
Numerous DRs have been proposed in the literature (among them Belhumeur et al., 1997; Howland et al., 2006; Loog and Duin, 2004; Martinez and Kak, 2001; Yang et al., 2005; Yu and Yang, 2001). The most well known of these are the classes of linear discriminant analysis (LDA) and principal component analysis (PCA) (Martinez and Kak, 2001). The details of PCA and LDA are omitted here, but we briefly explain below the Chernoff distance-based LDA (CLDA) (Loog and Duin, 2004), which is pertinent to our present study.

It is well known that LDA is incapable of dealing with heteroscedastic data in a proper way. To overcome this limitation, the Fisher criterion can be extended in the transformed space based on the Chernoff measure (Loog and Duin, 2004). In CLDA, the squared Euclidean distance between the class means, $S_E = \frac{1}{p_1 p_2} S_B$, where $S_B$ and $p_i$ are the between-class scatter matrix and the a priori probability of class $i$, respectively, is replaced with the Chernoff distance, defined as

$S_C = S^{-1/2}(m_1 - m_2)(m_1 - m_2)^T S^{-1/2} + \frac{1}{p_1 p_2}\left(\log S - p_1 \log S_1 - p_2 \log S_2\right),$

where $S_i$ and $m_i$ are the scatter matrix and the mean vector of class $i$, respectively, and $S = p_1 S_1 + p_2 S_2$. Using $S_C$, the Fisher separation criterion can be extended to the heteroscedastic Chernoff criterion

$J_H = \mathrm{tr}\Big\{(A S_W A^T)^{-1}\Big(p_1 p_2\, A(m_1 - m_2)(m_1 - m_2)^T A^T - A S_W^{1/2}\big(p_1 \log(S_W^{-1/2} S_1 S_W^{-1/2}) + p_2 \log(S_W^{-1/2} S_2 S_W^{-1/2})\big) S_W^{1/2} A^T\Big)\Big\}$

(see Eq. (5) of Loog and Duin, 2004). CLDA finds a mapping of the labeled training data onto a $q$-dimensional linear subspace ($q < c$) such that it maximizes the heteroscedastic Chernoff criterion $J_H$. Here, to simplify the computation, the within-class scatter matrix $S_W$ can be set to the identity matrix. Therefore, the labeled data set can first be centered and sphered by a Karhunen-Loève mapping, followed by scaling; a solution to the Chernoff criterion can then be determined.
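As a concrete illustration of the criterion above, the short sketch below evaluates $J_H$ for a given linear map A under the simplification $S_W = I$ mentioned in the text (i.e., assuming the data have already been centered and sphered). The function name and the toy example are ours; the actual experiments in Section 3 use the PRTools routine chernoffm instead.

```python
# Sketch: evaluate the heteroscedastic Chernoff criterion J_H for a projection A,
# assuming the data were pre-whitened so that S_W = I (as suggested in the text).
# Illustrative only; this is not the PRTools chernoffm routine.
import numpy as np
from scipy.linalg import inv, logm

def chernoff_criterion(A, m1, m2, S1, S2, p1, p2):
    """J_H = tr{(A A^T)^{-1}(p1 p2 A(m1-m2)(m1-m2)^T A^T - A(p1 log S1 + p2 log S2)A^T)}."""
    dm = (m1 - m2).reshape(-1, 1)
    between = p1 * p2 * (A @ dm @ dm.T @ A.T)            # homoscedastic (Fisher) part
    hetero = A @ (p1 * logm(S1) + p2 * logm(S2)) @ A.T   # heteroscedastic (Chernoff) part
    return float(np.real(np.trace(inv(A @ A.T) @ (between - hetero))))

# toy usage: two 2-D classes with different covariances, projected onto the x-axis
A = np.array([[1.0, 0.0]])
print(chernoff_criterion(A, np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
                         np.eye(2), np.diag([2.0, 0.5]), 0.5, 0.5))
```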
DBCs in which the dimensionality of the dissimilarity space is reduced with a DR are referred to as DR-based DBCs or DR-based methods. An algorithm for DR-based DBCs is summarized in the following:

1. Select the entire set of training samples, T, as the representation subset, Y.
2. Using Eq. (1), compute the dissimilarity matrix, $D_{T,T}[\cdot,\cdot]$, in which each individual dissimilarity is computed on the basis of the measures described in the literature. After computing $D_{T,T}[\cdot,\cdot]$, reduce its dimensionality by invoking a DR.
3. This step is the same as Step 3 of the PS-based DBC (see the previous section).
4. This step is the same as Step 4 of the PS-based DBC (see the previous section).

The rationale of this strategy is presented in a later section together with the experimental results.
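A minimal sketch of this DR-based pipeline is given below, assuming Euclidean dissimilarities, PCA retaining 99% of the variance (as in Section 3.2), and a 1-NN classifier; as before, these choices and the function name are illustrative only.

```python
# A minimal sketch of a DR-based DBC: build D_{T,T} with all training samples as
# prototypes, reduce its dimensionality with PCA (99% of the variance kept), and
# classify in the reduced dissimilarity space.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def dr_based_dbc(X_train, y_train, X_test):
    D_train = cdist(X_train, X_train)          # steps 1-2: Y = T, so D is n x n
    pca = PCA(n_components=0.99).fit(D_train)  # reduce the dissimilarity space
    clf = KNeighborsClassifier(n_neighbors=1).fit(pca.transform(D_train), y_train)
    D_test = cdist(X_test, X_train)            # step 3: d_T(z) for each test sample
    return clf.predict(pca.transform(D_test))  # step 4
```

Replacing the PCA step with an LDA or CLDA projection of the dissimilarity vectors would yield the corresponding LDA/CLDA-based DBCs.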
3. Experimental set-up

3.1. Experimental data

PS-based and DR-based methods were tested and compared with each other by conducting experiments on an artificial data set (which is referred to as XOR4: 4-dimensional Exclusive OR), a digit image data set, and two well-known face databases, namely, Nist38 (Wilson and Garris, 1992), AT&T (Samaria and Harter, 1994), and Yale (Georghiades et al., 2001).

The data set named XOR4, which has been included in the experiment as a baseline data set, was generated from a mixture of four 4-dimensional Gaussian distributions as follows: $p_1(x) = \frac{1}{2}N(\mu_{11}, I_4) + \frac{1}{2}N(\mu_{12}, I_4)$ and $p_2(x) = \frac{1}{2}N(\mu_{21}, I_4) + \frac{1}{2}N(\mu_{22}, I_4)$, where $\mu_{11} = [2, 2, 0, 0]$, $\mu_{12} = [-2, -2, 0, 0]$, $\mu_{21} = [2, -2, 0, 0]$, $\mu_{22} = [-2, 2, 0, 0]$, and $I_4$ is the 4-dimensional identity matrix. XOR4 consists of 400 data points of two classes represented within a square; the 200 points at the first and third corners of the square are in class 1, while the 200 points at the opposite corners are in class 2. From this, it is clear that each class contains two clusters.
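For reference, the following sketch generates an XOR4-style data set matching the description above; the sign pattern of the four means follows the stated XOR corner layout, and the random seed is arbitrary.

```python
# Sketch: generate the XOR4 data set (400 points from four 4-D Gaussians, identity
# covariance); the mean signs follow the XOR corner layout described in the text.
import numpy as np

def make_xor4(n_per_cluster=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = {1: [[+2, +2, 0, 0], [-2, -2, 0, 0]],   # class 1: first and third corners
          2: [[+2, -2, 0, 0], [-2, +2, 0, 0]]}   # class 2: opposite corners
    X, y = [], []
    for label, means in mu.items():
        for m in means:
            X.append(rng.normal(loc=m, scale=1.0, size=(n_per_cluster, 4)))
            y.extend([label] * n_per_cluster)
    return np.vstack(X), np.array(y)

X, y = make_xor4()   # 400 samples, 200 per class
```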
The data set captioned Nist38, chosen from the NIST database (Wilson and Garris, 1992), consists of two kinds of digits, 3 and 8, for a total of 1000 binary images. The size of each image is 32 × 32 pixels, for a total dimensionality of 1024 pixels. The face database of AT&T, formerly known as the ORL database of faces, consists of ten different images of each of 40 distinct subjects, for a total of 400 images. The size of each image is 112 × 92 pixels, for a total dimensionality of 10304 pixels. The face database of Yale contains 165 gray-scale images of 15 individuals. The size of each image is 243 × 320 pixels, for a total dimensionality of 77760 pixels. To reduce the computational complexity of the experiment, the facial images of Yale were down-sampled to 178 × 236 pixels and then represented by a centered vector of normalized intensity values.

On the other hand, when designing a biometric classification system, we sometimes have difficulty collecting sufficient data for each subject. For example, in face recognition, since the images are very high-dimensional, the dimensionality easily exceeds the number of images in the database. In this case, where the number of data samples is smaller than the dimension of the data space, the scatter matrices become singular and their inverses are not defined. Therefore, the Fisher criterion cannot be applied directly for reducing the dimensionality, resulting in what is referred to as a small sample size (SSS) problem (Howland et al., 2006). From this consideration, XOR4 and Nist38 were employed as benchmarks for well-represented data sets, while AT&T and Yale were utilized as examples of the SSS problems.

3.2. Experimental method

In this experiment, the data sets are first randomly split into training sets and test sets in the ratio of 75:25. Then, the training and testing procedures are repeated 30 times and the results obtained are averaged. To measure the dissimilarity between two object samples, we used conventional measures, namely, the Euclidean distance (ED) and the regional distance (RD) (Adini et al., 1997). In RD, the dissimilarity is defined as the average of the minimum difference between the gray value of a pixel and the gray value of each pixel in a 5 × 5 neighborhood of the corresponding pixel. In this case, therefore, the regional distance compensates for a displacement of up to three pixels of the images.
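The regional distance just described can be sketched as follows; the direction of the comparison (pixels of the first image against neighborhoods of the second) and the edge padding at image borders are our assumptions about details the text leaves open.

```python
# Sketch of the regional distance (RD): for every pixel of img1, take the minimum
# absolute gray-value difference against the 5x5 neighborhood of the corresponding
# pixel in img2, then average over all pixels.
import numpy as np

def regional_distance(img1, img2, radius=2):
    padded = np.pad(img2, radius, mode='edge')          # edge padding is an assumption
    h, w = img1.shape
    # stack all (2*radius + 1)^2 shifted views of img2
    shifts = [padded[dy:dy + h, dx:dx + w]
              for dy in range(2 * radius + 1)
              for dx in range(2 * radius + 1)]
    diffs = np.abs(np.stack(shifts) - img1[None, :, :])
    return diffs.min(axis=0).mean()
```

Read this way, the measure is not symmetric in its two arguments; a symmetric variant could average the two directions.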
To construct the dissimilarity matrices, in PS-based methods, we employed RandomC, KCentres, and FeatSel,¹ which were selected, respectively, as a random-based method, a k-means clustering (unsupervised) algorithm, and a supervised algorithm that optimizes the leave-one-out 1-NN error in the dissimilarity space. In DR-based methods, on the other hand, to keep a balance in the comparison between the two approaches, we employed only three DR schemes, namely, PCA (Belhumeur et al., 1997), direct LDA (LDA) (Yu and Yang, 2001), and Chernoff distance-based LDA (CLDA) (Loog and Duin, 2004). Here, PCA and LDA were selected as the counterparts of KCentres and FeatSel, respectively, and CLDA was considered as a more advanced dimensionality reduction scheme. For the sake of researchers who would like to duplicate the results, CLDA was implemented in this experiment with the PRTools (Duin et al., 2004) function chernoffm, with the regularization variable set to 0.1.
¹ Some of the DR methods (e.g., LDA and CLDA) are supervised in the sense that they optimize the between-class vs. within-class scatter information and thereby partially explore the gap between the classes. To be fair, FeatSel (i.e., the plus-L-take-away-R feature selection method) was selected as a supervised PS method. The authors are thankful to the referee who brought this observation to our attention.
In addition, to select the dimensions for these systems, we used the cumulative proportion, $\alpha$, which is defined as follows (Laaksonen and Oja, 1996; Oja, 1983):

$\alpha(q) = \sum_{j=1}^{q} \lambda_j \Big/ \sum_{j=1}^{d} \lambda_j. \qquad (2)$

Here, the subspace dimension, $q$, of the data sets (where $d$ and $\lambda_j$ are the dimensionality and the $j$th eigenvalue) is determined by considering the cumulative proportion $\alpha(q)$. The eigenvectors and eigenvalues are computed, and the cumulative sum of the eigenvalues is compared with a fixed number, $k$. In other words, the subspace dimensions are selected by considering the following relation:

$\alpha(q) \le k \le \alpha(q+1). \qquad (3)$

The algorithm employed can be formalized as follows:

1. Compute the covariance matrix, G, with the training data set, T. From G, compute the eigenvalues, $\lambda_j$, $j = 1, \ldots, d$, and the cumulative proportion, $\alpha(q)$, as in Eq. (2).
2. Select q as the final best subspace dimension by considering Eq. (3).
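A compact sketch of this dimension-selection procedure is given below; it reads Eq. (3) as picking the smallest q whose cumulative proportion reaches the threshold k (k = 0.99 in the experiments), and the function name is ours.

```python
# Sketch of subspace-dimension selection via the cumulative proportion (Eqs. (2)-(3)).
import numpy as np

def select_subspace_dim(T, k=0.99):
    G = np.cov(T, rowvar=False)                   # covariance matrix of the training data
    eigvals = np.sort(np.linalg.eigvalsh(G))[::-1]
    alpha = np.cumsum(eigvals) / eigvals.sum()    # alpha(q), q = 1, ..., d  (Eq. (2))
    return int(np.searchsorted(alpha, k) + 1)     # smallest q with alpha(q) >= k (one reading of Eq. (3))
```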
In this experiment, the PS-based approaches run in dissimilarity spaces of either c or 2c dimensions (where c is the number of classes). The DR-based approaches, on the other hand, work in q-dimensional spaces, where q is chosen such that 99% of the variance is preserved. More specifically, the dimensionality of the dissimilarity space for the PCA-based method is q, while that of the LDA/CLDA-based methods is min(q, c − 1). Finally, to maintain diversity among the dissimilarity-based classifications, we designed different classifiers, such as the k-nearest neighbor classifier (k = 1), regularized normal-density-based linear/quadratic classifiers, and support vector machines (Fan et al., 2005). All of the classifiers mentioned above were implemented with PRTools (Duin et al., 2004) and are denoted in the next section as knnc, ldc, qdc, and libsvm, respectively. Here, ldc and qdc were regularized with the parameters (r, s) = (0.01, 0.01) when computing the covariance matrix G by: G = (1 − r − s)*G + r*diag(diag(G)) + s*mean(diag(G))*eye(size(G,1)), where the functions diag, mean, eye, and size return, respectively, the diagonal of a matrix, the mean of a vector, an identity matrix, and the size of an array. Also, libsvm (referred to as libsvc in PRTools) was implemented using the software provided in Fan et al. (2005) (i.e., svm-train and svm-predict) and a polynomial kernel function of degree 1.
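For completeness, the covariance regularization quoted above can be written out as a one-line helper; this mirrors the formula in the text rather than the internal PRTools implementation.

```python
# Sketch of the covariance regularization used for ldc/qdc:
# G <- (1 - r - s)*G + r*diag(diag(G)) + s*mean(diag(G))*I, with (r, s) = (0.01, 0.01).
import numpy as np

def regularize_covariance(G, r=0.01, s=0.01):
    d = np.diag(np.diag(G))                        # keep only the diagonal of G
    return (1 - r - s) * G + r * d + s * np.mean(np.diag(G)) * np.eye(G.shape[0])
```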
Fig. 1. Plots of the estimated error rates of PS-based DBCs for Yale, obtained with knnc, ldc, qdc, and libsvm (panels (a)-(d), respectively); x-axis: dimensions (1-128), y-axis: estimated error rates, curves: Euclidean distance (ED) and regional distance (RD). Here, the prototype subsets are selected with KCentres.
4. Experimental results

The run-time characteristics of the empirical evaluation on the four data sets are reported below in figures and tables. We first investigate the rationale of employing a PS or a DR method for reducing the dimensionality. Then, we present the classification accuracies of the PS- and DR-based methods for an artificial data set and three real-life databases. Based on the classification results, we then rank the methods. Finally, we provide a comparison of the processing CPU-times.

Firstly, the experimental results of the PS- and DR-based methods explained in Sections 2.2 and 2.3 were probed with figures. Figs. 1 and 2 show plots of the estimated error rates obtained with knnc, ldc, qdc, and libsvm for Yale. In Fig. 1, the dissimilarity matrix in which the four classifiers were evaluated was generated with the prototype subset selected with a PS (i.e., KCentres). In Fig. 2, on the other hand, after generating the dissimilarity matrix with the
entire data set, the dimensionality of the matrix was reduced by invoking a DR algorithm (i.e., PCA). In the figures, it is interesting to note that the PS- and DR-based DBCs can be optimized by choosing the number of prototypes or by reducing the dimensionality of the dissimilarity space. Consider first the results of RD for knnc in Fig. 1(a) and Fig. 2(a). In this case, the classification accuracies of the two DBCs saturate when the number of prototypes and the subspace dimension are 8 and 16, respectively. Similarly, in the cases of ldc, qdc, and libsvm, the accuracies saturate when the number of prototypes and the subspace dimensions are 16 or 32. The same characteristics could also be observed for the AT&T and Nist38 databases; the results for these databases are omitted here to avoid repetition. Here, the problem to be addressed is how to choose the optimal number of prototypes and the dimension to be reduced.
Fig. 2. Plots of the estimated error rates of DR-based DBCs for Yale, obtained with knnc, ldc, qdc, and libsvm (panels (a)-(d), respectively); x-axis: dimensions (1-128), y-axis: estimated error rates, curves: Euclidean distance (ED) and regional distance (RD). Here, the subspace dimensions are selected with PCA.

Fig. 3. Plots of the characteristics between the criterion values and the number of features selected with FeatSel for the Nist38 (2c = 4), AT&T (2c = 80), and Yale (2c = 30) data sets, obtained with the ED and RD metrics (panels (a) and (b), respectively). Here, the selected dimensions for the three data sets are 4, 80, and 30, respectively.
In PS-based methods, an experimental parameter, such as c or 2c, can be selected as the number of prototypes. For FeatSel, we selected an optimal set of prototypes by computing a criterion value based on the leave-one-out 1-NN classification applied directly to the dissimilarity matrix. To provide an argument for why such a criterion makes sense, we investigated the changes in the criterion values evaluated for the Nist38, AT&T, and Yale data sets. Fig. 3 shows a plot of the characteristics between the criterion values and the number of features selected with FeatSel. The figure shows that we can optimize the number of (selected) features by considering the criterion values for the data sets. From this consideration, it is evident that the rationale for employing this criterion works well. In DR-based methods, on the other hand, the subspace dimension can be chosen by using the cumulative proportion (Laaksonen and Oja, 1996; Oja, 1983). For example, in this experiment, by considering Eqs. (2) and (3) with k = 0.99, we chose the subspace dimensions as follows: q_ED = 7 for XOR4; q_ED = 41 and q_RD = 22 for Nist38; q_ED = 184 and q_RD = 21 for AT&T; and q_ED = 76 and q_RD = 15 for Yale.
4.1. Experimental results: artificial data sets

Although it is hard to quantitatively evaluate the various PS- and DR-based methods, we have attempted to do exactly this. First, the experimental results obtained with the artificial data set, XOR4, were examined. Table 1 shows the estimated error rates (standard deviations) of the two approaches for XOR4. In the table, the PS/DR methods in the second column refer to the prototype selection methods and/or the dimension reduction schemes, and the remaining columns give the estimated error rates obtained with the four classifiers (PS-based and DR-based DBCs). Here, the values marked with an asterisk are the lowest ones among the six error rates of each classifier (three of RandomC, KCentres, and FeatSel and three of PCA, LDA, and CLDA). As can be seen in Fig. 1, increasing the cardinality of the prototype subset decreases the average error rates of the resulting DBCs. For this reason, we repeated the experiment for the benchmark data set with two dissimilarity matrices constructed with cardinalities of c and 2c, respectively. By comparing the two error rates, we observed that the error rate achieved with the 2c cardinality is generally lower than that with the c cardinality. In the interest of brevity, however, only the error rates obtained with the cardinality of 2c are presented here.
Table 1. A numerical comparison of the estimated error rates (standard deviations) of PS/DR-based DBCs for XOR4 in the dissimilarity spaces of 2c and q cardinalities.

| Data set | PS/DR methods | knnc | ldc | qdc | libsvm |
| XOR4 | RandomC | 0.1050 (0.0329) | 0.3560 (0.1390) | 0.1087 (0.0384) | 0.4213 (0.1222) |
| | KCentres | 0.0887 (0.0286) | 0.1100 (0.0396) | 0.0687 (0.0274) | 0.1397 (0.0540) |
| | FeatSel | 0.1007 (0.0275) | 0.3200 (0.1243) | 0.1037 (0.0325) | 0.3823 (0.1055) |
| | PCA | 0.0883 (0.0241) | 0.0623 (0.0237)* | 0.0613 (0.0247)* | 0.0560 (0.0239)* |
| | LDA | 0.2973 (0.0841) | 0.3103 (0.1405) | 0.2980 (0.1284) | 0.3043 (0.1368) |
| | CLDA | 0.0878 (0.0406)* | 0.1144 (0.0516) | 0.2822 (0.0747) | 0.1122 (0.0442) |
From the results shown in Table 1, we can see that almost all of the lowest error rates are achieved with a DR-based method, such as PCA or LDA. For example, the lowest rate in the knnc column is obtained with CLDA, and the lowest rates of the ldc, qdc, and libsvm classifiers are all achieved with PCA. This observation confirms the possibility that we can improve the classification accuracy of DBCs by effectively reducing the dimensionality of dissimilarity spaces after constructing them with all of the training samples.

To observe how well the PS/DR methods work, we picked the best three among the six methods (three PSs and three DRs) for each classifier and ranked them in order from the lowest to the highest error rate.
Table 2. Top three PS/DR methods in terms of error rates obtained with the four classifiers for XOR4 in the dissimilarity spaces of 2c and q cardinalities.

| Data set | Performance rank | knnc | ldc | qdc | libsvm |
| XOR4 | 1st | CLDA | PCA | PCA | PCA |
| | 2nd | PCA | KCentres | KCentres | CLDA |
| | 3rd | KCentres | CLDA | FeatSel | KCentres |
Although this comparison is a very simplistic model of comparison, we believe that it is the easiest approach a researcher can employ when dealing with algorithms that have different characteristics. A more scientific model for comparison, like the one employed by Sohn (1999), is an avenue for future work.

Table 2 shows the rankings of the PS and DR methods for XOR4 in terms of classification accuracies (error rates). In this comparison, DBCs were also designed in the dissimilarity spaces of cardinality c, 2c, and q (or min(q, c − 1) for LDA and CLDA), respectively. In the interest of brevity, however, as in Table 1, only the rankings of the two DBCs of 2c and q cardinalities are presented here. From the table, we can clearly observe the possibility of improving the classification performance of DBCs by utilizing PCA.

4.2. Experimental results: real-life data sets

In order to further investigate the run-time characteristics gained by utilizing the PS and DR-based methods, we repeated the experiment for three real-life databases, namely, Nist38, AT&T, and Yale. Table 3 shows the estimated error rates (standard deviations) of these two approaches for the three databases. In the table, ED and RD in the second column are abbreviations for the dissimilarity measures employed, and the other entries are the same as in Table 1.

Consider the classification results obtained with the first data set, Nist38. From the results, we can see that all of the lowest error rates are achieved with a DR-based method, PCA.
Table 3. A numerical comparison of the estimated error rates (standard deviations) of PS/DR-based DBCs for the three real-life databases in the dissimilarity spaces of 2c and q cardinalities. For each database (Nist38, AT&T, and Yale) and each distance metric (ED and RD), the table lists the error rates of the six PS/DR methods (RandomC, KCentres, FeatSel, PCA, LDA, and CLDA) under the four classifiers (knnc, ldc, qdc, and libsvm).
Table 4. Top three PS/DR methods in terms of error rates obtained with the four classifiers for the three real-life databases in the dissimilarity spaces of 2c and q cardinalities.

| Data set | Metric | Rank | knnc | ldc | qdc | libsvm |
| Nist38 | ED | 1st | PCA | PCA | PCA | PCA |
| | | 2nd | FeatSel | FeatSel | FeatSel | FeatSel |
| | | 3rd | CLDA | LDA | LDA | LDA |
| | RD | 1st | PCA | PCA | PCA | PCA |
| | | 2nd | FeatSel | FeatSel | FeatSel | FeatSel |
| | | 3rd | KCentres | KCentres | KCentres | LDA |
| AT&T | ED | 1st | CLDA | PCA | LDA | CLDA |
| | | 2nd | LDA | LDA | RandomC | PCA, LDA (tie) |
| | | 3rd | PCA | CLDA | KCentres | - |
| | RD | 1st | LDA | KCentres | LDA | LDA |
| | | 2nd | CLDA | RandomC | PCA | CLDA |
| | | 3rd | FeatSel | FeatSel | RandomC | PCA |
| Yale | ED | 1st | CLDA | PCA | LDA | CLDA |
| | | 2nd | LDA | LDA | PCA | LDA |
| | | 3rd | PCA | KCentres | FeatSel | PCA |
| | RD | 1st | LDA | RandomC | LDA | LDA |
| | | 2nd | CLDA | PCA | PCA, FeatSel (tie) | CLDA |
| | | 3rd | FeatSel | LDA | - | PCA |
Similar, though not exactly the same, characteristics could also be observed for the second and third data sets. For AT&T, we can see one exception, 0.0138 for (RD - KCentres - ldc), which occurred with a PS method, i.e., KCentres. Next, for Yale, we can also see another exception, 0.0300 for (RD - RandomC - ldc). From the table, it should also be mentioned that the classification error of CLDA is sometimes greater than that of LDA, even though CLDA has been considered as a more advanced dimensionality reduction scheme. For example, for the AT&T data set, the classification error rates of (RD - LDA - qdc) and (RD - CLDA - qdc) are 0.0167 and 0.1479, respectively. This difference in classification accuracy between the two methods seems to result from the assumptions made in computing the heteroscedastic Chernoff criterion and from the choice of the regularization variable. Finding the exact causes of this difference is an avenue for future work.

To avoid the confusion caused by the exceptions encountered and to observe how well the methods work, we again picked the best three among the six methods for each classifier and ranked them as in Table 2. Table 4 shows the rankings of the PS and DR methods for Nist38, AT&T, and Yale in terms of classification accuracies (error rates). From the table, we can clearly observe the possibility of improving the classification performance of DBCs by utilizing DRs.

Consider the ranking of the first data set, Nist38, shown in the first rows of Table 4. From the rankings, we can see that almost all of the higher rankings come from the results of DRs, namely, PCA and LDA, and only a few of them come from PSs, such as FeatSel and KCentres. In the case of ED, only four among the 12 entries (three ranks × four classifiers) are the results of PS-based DBCs, and the remaining ones are those of DR-based DBCs. However, the second row of Nist38, which was obtained with the RD metric, shows a different trend for some classifiers. From these considerations, we can see that it is difficult to grade the methods as they are. Therefore, for a simple comparison, we first gave marks of 3, 2, or 1 to all entries (DR and PS methods) of Tables 2 and 4 according to their ranks: 3 marks for the 1st rank, 2 marks for the 2nd, and 1 mark for the 3rd. Then, we summed all the marks that each method earned over the four classifiers and the two measuring methods. As an example, for Nist38 in Table 4, the marks that PCA earned from the ED and RD rows are 12 (= 3 + 3 + 3 + 3) and 12 (= 3 + 3 + 3 + 3), respectively; thus, the total number of marks that PCA earned is 24. Using the same system, we graded the other DR (and PS) methods and, as a final ranking, obtained the competition result shown in Table 5.
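The 3/2/1 grading just described is simple enough to express directly; the sketch below recomputes the Nist38 grades from the Table 4 rankings as a check (function names are ours).

```python
# Sketch of the 3/2/1 grading behind Table 5: every 1st, 2nd, and 3rd place in
# Tables 2 and 4 earns 3, 2, and 1 marks, summed per method.
from collections import Counter

MARKS = {1: 3, 2: 2, 3: 1}

def grade(rankings):
    """rankings: list of (1st, 2nd, 3rd) tuples, one per classifier/metric pair."""
    totals = Counter()
    for ranked in rankings:
        for place, method in enumerate(ranked, start=1):
            totals[method] += MARKS[place]
    return totals.most_common()

nist38 = [('PCA', 'FeatSel', 'CLDA'), ('PCA', 'FeatSel', 'LDA'),
          ('PCA', 'FeatSel', 'LDA'), ('PCA', 'FeatSel', 'LDA'),       # ED: knnc, ldc, qdc, libsvm
          ('PCA', 'FeatSel', 'KCentres'), ('PCA', 'FeatSel', 'KCentres'),
          ('PCA', 'FeatSel', 'KCentres'), ('PCA', 'FeatSel', 'LDA')]  # RD: knnc, ldc, qdc, libsvm
print(grade(nist38))   # PCA: 24, FeatSel: 16, LDA: 4, ... (cf. Table 5)
```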
Table 5. Grades of the PS/DR methods obtained with the four classifiers for the four data sets, where the values in parentheses are the grades obtained.

| Rank | XOR4 | Nist38 | AT&T | Yale |
| 1st | PCA (11) | PCA (24) | LDA (18) | LDA (19) |
| 2nd | CLDA (6) | FeatSel (16) | CLDA (11) | PCA (12) |
| 3rd | KCentres (5) | LDA (4) | PCA (9) | CLDA (10) |
From the final ranking, we can see that almost all of the three highest ranks belong to DR methods. Thus, in general, it can be said that a more satisfactory optimization of DBCs can be achieved by applying a DR after building the dissimilarity space with all of the available samples, rather than by selecting a prototype subset from them.

The other observations obtained from the experiments are the following. PCA + LDA, which is one of the most popular LDA-based strategies for dealing with SSS problems, should lead to worse results than LDA in the cases of XOR4 and Nist38, simply because these data sets are well represented. On the other hand, we should expect an improvement of PCA + LDA over LDA for the other data sets, as the covariance matrices (between-class scatter and within-class scatter) are badly estimated in these enormously high-dimensional spaces.³ In addition, it is interesting to note that, among the three PS methods, we obtained the lowest error rates with KCentres and FeatSel for XOR4 and Nist38, respectively. For AT&T and Yale, however, we obtained the lowest ones with RandomC. As mentioned in Section 3.2, to be fair, we selected FeatSel as a PS method that utilizes supervised information. However, we failed to obtain an improved accuracy with FeatSel for the SSS problems. We also failed to answer the following questions: Why is the accuracy of RD poorer than that of ED?⁴ Why is the accuracy of CLDA lower than that of LDA? Answering these questions remains an open problem.

Finally, in an attempt to provide a comparison between the PS-based and DR-based methods, we present an analysis of their computational complexities. Rather than embark on another analysis of the computational complexities of the two approaches, however, we simply measured their processing CPU-times (in seconds) for the four data sets.
³ We are also thankful to the referee who informed us about this point.
⁴ Note that RD was developed to improve upon the performance of the ED metric (Adini et al., 1997).
Table 6. A comparison of the processing CPU-times (in seconds) of the PS and DR methods under the ED metric for the four data sets (sample # / dimension # / class #). Each processing time is obtained by averaging the results of ten iterations on a Windows platform (CPU: 2.67 GHz, RAM: 3.50 GB).

| PS/DR methods | XOR4 (400/4/2) | Nist38 (1000/1024/2) | AT&T (400/10304/40) | Yale (165/42008/15) |
| PS: RandomC | 0.00 | 0.00 | 0.00 | 0.00 |
| PS: KCentres | 0.09 | 0.49 | 0.36 | 0.13 |
| PS: FeatSel | 28.82 | 619.44 | 1851.83 | 83.65 |
| DR: PCA | 0.79 | 15.55 | 1.16 | 0.23 |
| DR: LDA | 0.06 | 3.11 | 0.07 | 0.00 |
| DR: CLDA | 2.03 | 47.54 | 1291.55 | 20.89 |
Table 6 shows a comparison of the averaged processing CPU-times (which do not include the time for classification) of the PS and DR methods under the ED metric for the four data sets. From Table 6, we can observe that the processing CPU-times increase considerably when FeatSel and CLDA are applied. In particular, it should be mentioned that the processing time of these two methods increases sharply as the number of classes increases. For example, consider the processing times for Nist38: the processing times for the PCA, LDA, and CLDA methods are, respectively, 15.55, 3.11, and 47.54 s, while those of RandomC, KCentres, and FeatSel are, respectively, 0.00, 0.49, and 619.44 s. The same characteristic could also be observed in the results for the RD metric.

In review, the experimental results show that when a DR-based method, namely, a PCA or LDA-based method, is applied to the dissimilarity representation, the classification accuracy of the resultant DBCs increases. More specifically, in terms of classification accuracies (error rates), PCA is clearly more useful for the well-represented data sets, such as XOR4 and Nist38, while LDA is more helpful for the SSS problems, such as AT&T and Yale.

5. Conclusions

In order to reduce the dimensionality of dissimilarity spaces for optimizing dissimilarity-based classifications (DBCs), two kinds of approaches, prototype selection (PS) based methods and dimension reduction (DR) based methods, have been explored separately in the literature. The aim of this paper is to empirically evaluate the two kinds of methods in terms of classification accuracy on an artificial data set and three real-life benchmark databases. It is a well-known fact that classification algorithms based on LDA are superior to those based on PCA, while PCA can outperform LDA when the training data set is small (Martinez and Kak, 2001). However, no empirical comparison had been made between the performances of DBCs in dissimilarity spaces reduced by PCA and by LDA. Our experimental results, obtained with these examples, demonstrate that DR-based methods, such as PCA and LDA based methods, surpass PS-based methods in terms of classification accuracy. The results also show that the PCA-based method is generally more useful for well-represented data sets, while the LDA-based method works better for the SSS problems.

Despite this success, problems remain to be addressed. First, in this evaluation, we employed a very simplistic model of comparison; developing a more scientific model, such as the one in Sohn (1999), is an avenue for future work. Besides this, we failed to obtain an improved accuracy with the FeatSel and CLDA methods, and we still have to answer the question of why the accuracy of RD is lower than that of ED. Answering these questions requires further analysis of the experimental results obtained.

Acknowledgments

The author thanks the anonymous referees for their valuable comments, which improved the quality and readability of the paper.
References

Adini, Y., Moses, Y., Ullman, S., 1997. Face recognition: The problem of compensating for changes in illumination direction. IEEE Trans. Pattern Anal. Machine Intell. 19 (7), 721-732.
Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J., 1997. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Machine Intell. 19 (7), 711-720.
Bicego, M., Murino, V., Figueiredo, M.A.T., 2004. Similarity-based classification of sequences using hidden Markov models. Pattern Recognit. 37, 2281-2291.
Bunke, H., Riesen, K., 2007. A family of novel graph kernels for structural pattern recognition. In: Proc. 12th Iberoamerican Congress on Pattern Recognition, Vina del Mar-Valparaiso, Chile, LNCS 4756, pp. 20-31.
Duin, R.P.W., Juszczak, P., de Ridder, D., Paclik, P., Pekalska, E., Tax, D.M.J., 2004. PRTools 4: A Matlab Toolbox for Pattern Recognition. Technical Report, Delft University of Technology, The Netherlands.
Fan, R.-E., Chen, P.-H., Lin, C.-J., 2005. Working set selection using second order information for training support vector machines. J. Machine Learning Research 6, 1889-1918.
Georghiades, A.S., Belhumeur, P.N., Kriegman, D.J., 2001. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Machine Intell. 23 (6), 643-660.
Howland, P., Wang, J., Park, H., 2006. Solving the small sample size problem in face recognition using generalized discriminant analysis. Pattern Recognit. 39, 277-287.
Jain, A.K., Duin, R.P.W., Mao, J., 2000. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Machine Intell. 22 (1), 4-37.
Kim, S.-W., Gao, J., 2008. On using dimensionality reduction schemes to optimize dissimilarity-based classifiers. In: Proc. 13th Iberoamerican Congress on Pattern Recognition, Havana, Cuba, LNCS 5197, pp. 309-316.
Kim, S.-W., Oommen, B.J., 2007. On using prototype reduction schemes to optimize dissimilarity-based classification. Pattern Recognit. 40, 2946-2957.
Laaksonen, J., Oja, E., 1996. Subspace dimension selection and averaged learning subspace method in handwritten digit classification. In: Proc. ICANN, Bochum, Germany, pp. 227-232.
Loog, M., Duin, R.P.W., 2004. Linear dimensionality reduction via a heteroscedastic extension of LDA: The Chernoff criterion. IEEE Trans. Pattern Anal. Machine Intell. 26 (6), 732-739.
Martinez, A.M., Kak, A.C., 2001. PCA versus LDA. IEEE Trans. Pattern Anal. Machine Intell. 23 (2), 228-233.
Oja, E., 1983. Subspace Methods of Pattern Recognition. Research Studies Press.
Pekalska, E., Duin, R.P.W., 2005. The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. World Scientific Publishing, Singapore.
Pekalska, E., Duin, R.P.W., Paclik, P., 2006. Prototype selection for dissimilarity-based classifiers. Pattern Recognit. 39, 189-208.
Riesen, K., Kilchherr, V., Bunke, H., 2007. Reducing the dimensionality of vector space embeddings of graphs. In: Proc. 5th Internat. Conf. on Machine Learning and Data Mining, LNAI 4571, pp. 563-573.
Samaria, F., Harter, A., 1994. Parameterisation of a stochastic model for human face identification. In: Proc. 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, FL, pp. 215-220.
Sohn, S.Y., 1999. Meta analysis of classification algorithms for pattern recognition. IEEE Trans. Pattern Anal. Machine Intell. 21 (11), 1137-1144.
Wilson, C.L., Garris, M.D., 1992. Handprinted Character Database 3. Technical Report, National Institute of Standards and Technology, Gaithersburg, Maryland.
Yang, J., Zhang, D., Yong, X., Yang, J.-Y., 2005. Two-dimensional discriminant transform for face recognition. Pattern Recognit. 38, 1125-1129.
Yu, H., Yang, J., 2001. A direct LDA algorithm for high-dimensional data, with application to face recognition. Pattern Recognit. 34, 2067-2070.