Fuzzy multi-class classifier based on support vector data description and improved PCM


Expert Systems with Applications 36 (2009) 8714–8718


Yong Zhang a,b,*, Zhong-Xian Chi b, Ke-Qiu Li b

a Department of Computer, Liaoning Normal University, Dalian 116081, China
b Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China


Keywords: Support vector data description; Possibilistic c-means algorithm; Minimum enclosing sphere; Classifier; SVM

Abstract

In this paper, a novel fuzzy classifier for multi-class classification problems, based on support vector data description (SVDD) and an improved possibilistic c-means (PCM) algorithm, is proposed. The proposed method is a robust version of SVDD obtained by assigning a weight to each data point, where the weight represents the fuzzy membership degree of the cluster computed by the improved PCM method. Accordingly, this paper presents a multi-class classification algorithm based on the robust weighted SVDD and gives a simple classification rule. Experimental results show that the proposed method can reduce the effect of outliers and yield a higher classification rate.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

SVMs for pattern classification are based on two-class classification problems. An interesting property of the SVM is that it is an approximate implementation of the structural risk minimization principle of statistical learning theory (SLT) (Vapnik, 1995), rather than of the empirical risk minimization method. In recent years, the SVM has drawn considerable attention due to its high generalization ability over a wide range of applications and its better performance than other traditional learning machines (Muller, Mika, & Ratsch, 2001; Schölkopf & Smola, 2002). However, unclassifiable regions exist when SVMs are extended to multi-class problems.

Several methods have been proposed to address this problem. One is to construct several two-class classifiers and then assemble a multi-class classifier, such as one-against-one, one-against-all, and DAGSVMs (Hsu & Lin, 2002; Platt, Cristianini, & Shawe-Taylor, 2002). A second approach is to construct a multi-class classifier directly, such as the k-SVM proposed by Weston and Watkins (1998). Zhang, Chi, Liu, and Wang (2007) also proposed a fuzzy compensation multi-class SVM based on Weston and Watkins' idea. Recently, Zhu, Chen, and Liu (2003) proposed a multi-class classification algorithm that adopts minimum enclosing spheres to classify a new example and showed that the resulting classifier performs comparably to standard SVMs. Building on Zhu et al. (2003), Wang, Neskovic, and Cooper (2007) and Lee and Lee (2007) each proposed new classification rules on the basis of Bayesian optimal decision theory.

* Corresponding author. Address: Department of Computer, Liaoning Normal University, No. 20, Liushu South Street, Ganjingzi, Dalian, Liaoning Province 116081, China. E-mail address: [email protected] (Y. Zhang).
© 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.03.026

The minimum enclosing sphere of a set of data, defined as the smallest sphere enclosing the data, was first used by Schölkopf et al. to estimate the VC-dimension of support vector classifiers (Schölkopf, Burges, & Vapnik, 1995, 2001) and was later applied by Tax and Duin to data domain description (Tax & Duin, 1999, 2004), where it was named support vector data description (SVDD). From a computational point of view, there is a great advantage to using minimum enclosing spheres for pattern classification.

In many practical engineering applications, the available training data are contaminated by noise. Furthermore, some points in the training set are misplaced far away from the main body of the data, or even lie on the wrong side in feature space. The standard SVM is very sensitive to such outliers and noise, and several techniques have been proposed in the data classification literature to tackle this problem. One method is to use an averaging algorithm to relax the effect of outliers (Song, Hu, & Xie, 2002; Hu & Song, 2004), but in this kind of method the additional parameter is difficult to tune. A second method, proposed in Lin and Wang (2002, 2003) and Inoue and Abe (2001) and named the fuzzy SVM (FSVM), applies fuzzy memberships to the training data to relax the effect of outliers; however, the selection of the membership function remains an open problem for FSVM. A third method, support vector data description (SVDD) (Tax & Duin, 1999, 2004), has been shown to be effective at detecting outliers among normal data points, but SVDD is specific to the one-class classification problem.

The original possibilistic c-means (PCM) algorithm (Krishnapuram & Keller, 1993, 1996) has been shown to handle noise and outliers satisfactorily. In addition, the induced partition assigns each data point a relative value (probability) according to the compatibility of the point with the class prototypes: important data points have high values, whereas outliers and noise points have


low values. The weights used in our proposed method are generated by an improved possibilistic c-means algorithm, which extends the original PCM algorithm into the kernel space by using kernel methods. That is to say, the relative membership values produced by the improved PCM algorithm serve as the weights for the training samples.

In this paper we propose a robust version of SVDD by assigning a weight to each data point, where the weight represents the fuzzy membership degree of the cluster computed by the improved PCM method. Accordingly, we propose a multi-class classification algorithm based on the robust weighted SVDD and give its classification rule.

The rest of this paper is organized as follows. In Section 2 we give a brief description of support vector data description (SVDD) and the possibilistic c-means (PCM) clustering algorithm, extend the PCM algorithm into the kernel space, and present the weight-generation algorithm. In Section 3 we define the objective functions of the multi-class classifier based on the weighted SVDD and present the proposed algorithm. In Section 4 we illustrate the new algorithm with examples, comparing its performance with other methods. Finally, conclusions are given in Section 5.

2. Previous works

In this section, we first review support vector data description and the original possibilistic c-means algorithm. We then extend the PCM algorithm into the kernel space and present the weight-generation algorithm.

2.1. Support vector data description

Over the last several years, the SVM has been generalized to solve one-class problems. Tax and Duin (1999, 2004) presented the support vector data description (SVDD) method for classifier design and outlier detection. The main idea in Tax and Duin (1999, 2004) is to map data points by means of a nonlinear transformation to a high-dimensional feature space and to find the smallest sphere that contains most of the mapped data points in that feature space. This sphere, when mapped back to the data space, can separate into several components, each enclosing a separate cluster of points. In practice, the soft-margin idea and kernel techniques from SVM are introduced, allowing some data points to lie outside the resulting, more general sphere. The advantage of SVDD lies in the fact that the algorithm is very intuitive and comprehensible.

More specifically, given a set of training data {x_i, i = 1, 2, ..., l} ⊂ R^n, the minimum enclosing sphere S, characterized by its center c and radius R, can be found by solving the following constrained quadratic optimization problem:

\min_{c,R}\; R^2 \qquad (1)

subject to the constraints

\|x_i - c\|^2 \le R^2 \qquad (2)

However, in real-world applications, the training data of a class is rarely distributed spherically, even if the outermost samples are excluded. To obtain a more flexible description of a class, one can first transform the training samples into a high-dimensional feature space using a nonlinear mapping Φ and then compute the minimum enclosing sphere S in the feature space. Furthermore, to allow for the possibility of some samples falling outside the sphere, one can relax the constraints (2), which require the distance from x_i to the center c to be smaller than R for all x_i, into a set of soft constraints

\|\Phi(x_i) - c\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad \text{for } i = 1, 2, \ldots, l \qquad (3)

where c is the center of the minimum enclosing sphere S and the ξ_i are slack variables allowing for soft boundaries. To penalize large distances to the center of the sphere, the quadratic optimization problem in Eq. (1) is accordingly rewritten as

\min_{c,R}\; R^2 + C \sum_i \xi_i \qquad (4)

under the constraints (3), where C > 0 is a penalty constant that controls the trade-off between the size of the sphere and the number of samples that may fall outside the sphere. To solve this problem, we introduce the Lagrangian

L = R^2 - \sum_i \big(R^2 + \xi_i - \|\Phi(x_i) - c\|^2\big)\alpha_i - \sum_i \beta_i \xi_i + C \sum_i \xi_i \qquad (5)

Using the Lagrange multiplier method (Tax & Duin, 2004), the above constrained quadratic optimization problem can be formulated in its Wolfe dual form. Differentiating (5) with respect to R, c, and ξ_i and setting the derivatives to zero, we obtain

\frac{\partial L}{\partial R} = 2R\Big(1 - \sum_i \alpha_i\Big) = 0 \;\Rightarrow\; \sum_i \alpha_i = 1 \qquad (6)

\frac{\partial L}{\partial c} = 2 \sum_i \alpha_i\big(\Phi(x_i) - c\big) = 0 \;\Rightarrow\; c = \sum_i \alpha_i \Phi(x_i) \qquad (7)

\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \beta_i = 0 \;\Rightarrow\; C = \alpha_i + \beta_i \qquad (8)

Substituting (6)–(8) back into (5), we obtain the dual formulation

\max\; \sum_i \alpha_i K(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)
\text{s.t.}\; 0 \le \alpha_i \le C, \quad \sum_i \alpha_i = 1, \quad i = 1, 2, \ldots, l \qquad (9)

where the kernel K(x_i, x_j) = Φ(x_i) · Φ(x_j) satisfies Mercer's condition. Points x_i with α_i > 0, called support vectors (SVs), are needed in the description and lie on the boundary of the sphere. To test a point x, the distance to the center of the sphere has to be calculated. A test point x is accepted when this distance is smaller than or equal to the radius:

\|\Phi(x) - c\|^2 = K(x, x) - 2 \sum_i \alpha_i K(x_i, x) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \le R^2 \qquad (10)

where R^2 is the squared distance from the center of the sphere c to any support vector x_i on the boundary.
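To make the derivation concrete, the following minimal Python sketch solves the dual problem (9) numerically and evaluates the acceptance rule (10). It is an illustration only, not the authors' implementation; the choice of a Gaussian kernel width sigma, the penalty C, and SciPy's SLSQP solver are assumptions introduced here for the example.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(A, B, sigma):
    # K(x, y) = exp(-||x - y||^2 / sigma^2), cf. Eq. (13)
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / sigma**2)

def svdd_fit(X, C=1.0, sigma=1.0):
    """Solve the SVDD dual (9): max sum_i a_i K_ii - a'Ka, 0 <= a_i <= C, sum_i a_i = 1."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    diagK = np.diag(K)
    obj = lambda a: a @ K @ a - a @ diagK          # negative dual objective
    res = minimize(obj, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, C)] * n,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}])
    a = res.x
    # R^2 is the squared distance from c to any unbounded support vector (0 < a_i < C).
    # This sketch assumes at least one such support vector exists.
    sv = np.where((a > 1e-6) & (a < C - 1e-6))[0][0]
    dist2 = lambda Z: (np.diag(gaussian_kernel(Z, Z, sigma))
                       - 2.0 * gaussian_kernel(Z, X, sigma) @ a + a @ K @ a)
    R2 = dist2(X[sv:sv + 1])[0]
    return a, R2, dist2

# Following Eq. (10), a test point x is accepted when dist2(x[None, :]) <= R2.
```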

2.2. Improved possibilistic c-means algorithm

In SVDD, the training process is very sensitive to outliers and noisy data. The quadratic form (4) uses the parameter C to control the penalty on misclassified samples. A larger C imposes a larger penalty on misclassified samples and thus decreases the number of misclassified data points, while a smaller C ignores some "negligible" misclassified data and yields a larger classification margin. But the value of C is fixed throughout the SVDD training process, which means all training data are treated alike. Consequently, the SVDD training process is sensitive to certain data, such as outliers and noisy points, and can even result in over-fitting. To decrease the effect of such outliers and noise, one can assign each data point in the training set a membership and sum the deviations weighted by these memberships. If a data point is detected as an outlier, it is assigned a low membership, so its contribution to the total error term is reduced. Unlike the equal treatment in the standard SVM, this method fuzzifies the penalty term to reduce the sensitivity to less important data points.


The possibilistic c-means (PCM) algorithm, first proposed by Krishnapuram and Keller (1993) and further explored in Krishnapuram and Keller (1996) to overcome the relative-membership problem of the fuzzy c-means (FCM) algorithm, has been shown to handle noise and outliers satisfactorily. Its basic idea is to relax the probabilistic membership constraint of FCM and adopt a possibilistic approach to clustering by minimizing the following objective function:

J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \|x_k - v_i\|^2 + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{n} (1 - u_{ik})^{m} \qquad (11)

where c is the number of clusters (selected as a specified value in this paper), n is the number of data points, u_{ik} ∈ [0, 1] is the membership of x_k in class i, m is the quantity controlling clustering fuzziness, V is the set of cluster centers or prototypes (v_i ∈ R^p), and the η_i are suitable positive numbers. The function J(U, V) is minimized by the well-known alternating iterative algorithm (Krishnapuram & Keller, 1993, 1996). The first term demands that the distances from the data points to the prototypes be as small as possible, whereas the second term forces the u_{ik} to be as large as possible, thus avoiding the trivial solution.

To make use of the PCM algorithm for the proposed weighted SVDD classifier, we extend the original PCM into the kernel space by using kernel methods and develop an improved kernel-based PCM algorithm that generates weights in the kernel space instead of in the input space. Define a nonlinear map φ: x → φ(x) ∈ F, where x ∈ X, X denotes the data space, and F denotes the transformed feature space of higher dimension. Kernel-based PCM minimizes the following objective function:

J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \|\phi(x_k) - \phi(v_i)\|^2 + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{n} (1 - u_{ik})^{m} \qquad (12)

where ||φ(x_k) − φ(v_i)||² = K(x_k, x_k) + K(v_i, v_i) − 2K(x_k, v_i). Compared with Eq. (11), Eq. (12) uses the squared Euclidean distance between the transformed pattern φ(x_k) and the cluster prototype φ(v_i) in the kernel space. In this paper, we adopt the Gaussian function as the kernel function:

K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{\sigma^2}\right) \qquad (13)

This kernel is independent of the position of the data with respect to the origin, since it uses only distances between objects. The parameter σ is a width parameter that controls how tight the description is around the data. Obviously, when the Gaussian function is used as the kernel function, K(x, x) = 1 and ||φ(x_k) − φ(v_i)||² = 2 − 2K(x_k, v_i). Accordingly, Eq. (12) can be rewritten as

J(U, V) = 2 \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \big(1 - K(x_k, v_i)\big) + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{n} (1 - u_{ik})^{m} \qquad (14)

Following the recommendation in Krishnapuram and Keller (1993), η_i can be obtained from the average possibilistic intra-cluster distance of cluster i as

\eta_i = K \, \frac{\sum_{k=1}^{n} u_{ik}^{m} \|\phi(x_k) - \phi(v_i)\|^2}{\sum_{k=1}^{n} u_{ik}^{m}} = K \, \frac{2 \sum_{k=1}^{n} u_{ik}^{m} \big(1 - K(x_k, v_i)\big)}{\sum_{k=1}^{n} u_{ik}^{m}} \qquad (15)

Typically, the constant K in (15) is chosen to be 1, and we set it to 1 in this paper. The minimization of (14) gives the updating equations for the possibilistic memberships and cluster prototypes as follows (Krishnapuram & Keller, 1993):

u_{ik} = \frac{1}{1 + \big(\|\phi(x_k) - \phi(v_i)\|^2 / \eta_i\big)^{1/(m-1)}} = \frac{1}{1 + \big(2(1 - K(x_k, v_i)) / \eta_i\big)^{1/(m-1)}} \qquad (16)

v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \phi(x_k)}{\sum_{k=1}^{n} u_{ik}^{m}} \qquad (17)

The kernel-based PCM algorithm is implemented by iteratively updating (16) and (17) until the stopping criterion is satisfied. After partitioning, the weights of outliers and noisy points will be very small or even close to zero, so that their effect on the decision boundary is significantly reduced. To reduce the training time of the weighted SVDD classifier, we set a threshold rt on the weights u_{ik}. If u_{ik} > rt, the training sample x_k is considered to affect the ith class to some degree; otherwise, the training algorithm ignores the effect of x_k on the ith class. A sketch of this weight-generation procedure is given below.
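The following Python sketch illustrates one way to implement the kernel-based PCM weight generation of Eqs. (13)–(17). It is not the authors' code; in particular, the feature-space prototypes φ(v_i) are represented implicitly as normalized combinations of the mapped training points (so the distances in (14)–(16) are evaluated through kernel expansions), and the random initialization, fuzzifier m, and stopping rule are assumptions made for the example.

```python
import numpy as np

def kernel_pcm_weights(K, c, m=2.0, n_iter=100, tol=1e-5, rng=None):
    """Kernel-based PCM on a precomputed Gaussian kernel matrix K (n x n).

    Returns the membership matrix U (c x n) used as SVDD weights; entries below
    a threshold rt can then be zeroed out, as in Step 1 of the proposed method.
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    U = rng.uniform(0.2, 1.0, size=(c, n))              # initial memberships (assumption)
    for _ in range(n_iter):
        W = U**m / np.sum(U**m, axis=1, keepdims=True)   # prototype coefficients, cf. Eq. (17)
        # Squared feature-space distances ||phi(x_k) - v_i||^2 via kernel expansions.
        d2 = (np.diag(K)[None, :] - 2.0 * W @ K
              + np.sum((W @ K) * W, axis=1, keepdims=True))
        d2 = np.maximum(d2, 0.0)
        eta = np.sum(U**m * d2, axis=1) / np.sum(U**m, axis=1)          # Eq. (15), K = 1
        U_new = 1.0 / (1.0 + (d2 / eta[:, None])**(1.0 / (m - 1.0)))    # Eq. (16)
        if np.max(np.abs(U_new - U)) < tol:
            U = U_new
            break
        U = U_new
    return U
```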

3. Weighted SVDD based on improved PCM

3.1. Weighted SVDD

Let {(x_i, m_i), i = 1, 2, ..., l} ⊂ R^n be a given training data set, where m_i is the weight of training sample x_i. Once the weights m_i have been computed with the improved PCM method of Section 2, the weighted SVDD problem can be described as follows:

\min_{c,R}\; R^2 + C \sum_i m_i \xi_i \qquad (18)

subject to the constraints

\|\Phi(x_i) - c\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad \text{for } i = 1, 2, \ldots, l \qquad (19)

where m_i is the weight generated by the improved PCM method in Section 2. We can find the solution to this optimization problem in dual variables by finding the saddle point of the Lagrangian. The corresponding Lagrangian is

L = R^2 - \sum_i \big(R^2 + \xi_i - \|\Phi(x_i) - c\|^2\big)\alpha_i - \sum_i \beta_i \xi_i + C \sum_i m_i \xi_i \qquad (20)

Differentiating (20) with respect to R, c, and ξ_i and setting the derivatives to zero, we obtain

\frac{\partial L}{\partial R} = 2R\Big(1 - \sum_i \alpha_i\Big) = 0 \qquad (21)

\frac{\partial L}{\partial c} = 2 \sum_i \alpha_i\big(\Phi(x_i) - c\big) = 0 \qquad (22)

\frac{\partial L}{\partial \xi_i} = C m_i - \alpha_i - \beta_i = 0 \qquad (23)

which lead to \sum_i \alpha_i = 1 and c = \sum_i \alpha_i \Phi(x_i), respectively. Substituting (21)–(23) back into (20), we obtain the dual formulation

\max\; \sum_i \alpha_i K(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)
\text{s.t.}\; 0 \le \alpha_i \le C m_i, \quad \sum_i \alpha_i = 1, \quad i = 1, 2, \ldots, l \qquad (24)

We replace the inner products in (24) by a kernel function K(x_i, x_j) = Φ(x_i) · Φ(x_j). Several kernel functions have been proposed, including the linear kernel, the polynomial kernel, the radial basis function kernel, and the sigmoid kernel; in our experiments we use the Gaussian kernel of the form (13). A minimal sketch of solving this weighted dual is given below.
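As an illustration of how the weights m_i change the optimization, the sketch below adapts the earlier svdd_fit example to the weighted dual (24): the only change is that the box constraint on each α_i becomes 0 ≤ α_i ≤ C·m_i. The gaussian_kernel helper is the same hypothetical one used in the Section 2 sketch (redefined here for self-containment); this is an assumption-laden illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(A, B, sigma):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / sigma**2)

def weighted_svdd_fit(X, m_weights, C=1.0, sigma=1.0):
    """Solve the weighted SVDD dual (24):
    max sum_i a_i K_ii - a'Ka, subject to 0 <= a_i <= C * m_i and sum_i a_i = 1."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    diagK = np.diag(K)
    obj = lambda a: a @ K @ a - a @ diagK               # negative dual objective
    upper = C * np.asarray(m_weights, dtype=float)      # per-sample upper bounds C * m_i
    res = minimize(obj, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=list(zip(np.zeros(n), upper)),
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}])
    return res.x, K
```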


Solving the dual QP problem, we obtain the Lagrange multipliers α_i, which give the center c of the minimum enclosing sphere S as a linear combination of the mapped training points,

c = \sum_i \alpha_i \Phi(x_i) \qquad (25)

According to the Karush–Kuhn–Tucker (KKT) optimality conditions, we have

\alpha_i = 0 \;\Rightarrow\; \|\Phi(x_i) - c\|^2 < R^2 \text{ and } \xi_i = 0
0 < \alpha_i < C m_i \;\Rightarrow\; \|\Phi(x_i) - c\|^2 = R^2 \text{ and } \xi_i = 0
\alpha_i = C m_i \;\Rightarrow\; \|\Phi(x_i) - c\|^2 \ge R^2 \text{ and } \xi_i \ge 0 \qquad (26)

Therefore, only the α_i that correspond to training examples x_i lying on or outside the sphere are nonzero. All the remaining α_i are zero, and the corresponding training samples are irrelevant to the final solution. It is clear that the only difference between standard SVDD and weighted SVDD is the upper bound on the Lagrange multipliers α_i in the dual problem: in standard SVDD the α_i are bounded by the constant C, whereas in weighted SVDD they are bounded by the dynamic values C m_i.

3.2. The proposed algorithm description

In this section, we present a method for multi-class classification problems. Suppose a set of training data (x_1, y_1), (x_2, y_2), ..., (x_l, y_l) is given, where l denotes the number of training samples, x_i ∈ R^n denotes an input pattern, and y_i ∈ {1, 2, ..., C} denotes its output class, where C is the number of classes. The central idea of the proposed method is to use the fuzzy domain description generated by the fuzzy SVDD to estimate the distribution of each partitioned class, and then to use these estimates to classify a data point via the given decision rule. The detailed procedure of the proposed method is as follows:

Step 1 (Computing weights u_ik): Set a threshold rt. Use the improved kernel-based PCM method to compute the weight u_ik of each training sample x_i for each class k. In the weight matrix U = {u_ik}, set u_ik to 0 if u_ik ≤ rt.

Step 2 (Data partitioning): Partition the given training data into C subsets {D_p}, p = 1, ..., C, according to the weights u_ik. For example, the pth subset D_p contains l_p elements:

D_p = \{(x_{i_1}, p), \ldots, (x_{i_{l_p}}, p)\} \qquad (27)

where u_{i_c p} ≥ rt for c = 1, ..., l_p.

Step 3 (Weighted SVDD for each class data set): For each class data set D_p, build a trained kernel support function via the fuzzy SVDD. Specifically, solve the dual problem of Eq. (9). Let its solution be α_i*, i = i_1, ..., i_{l_p}, and let J_p ⊆ {1, ..., l_p} be the set of indices of the nonzero α_i*. The trained kernel support function for class data set D_p is then given by

f_p(x) = K(x, x) - 2 \sum_{i \in J_p} \alpha_i^{*} K(x_i, x) + \sum_{i,j \in J_p} \alpha_i^{*} \alpha_j^{*} K(x_i, x_j) \qquad (28)

Step 4 (Constructing a classification rule): We construct the following classification rule for a new sample x:

F(x) = \arg\max_{p} \frac{l_p \big((R_p)^2 - f_p(x)\big)}{l \,(R_p)^2} \qquad (29)

where p ∈ {1, 2, ..., C}, (R_p)² = f_p(x_i) for any support vector x_i of the pth class, l denotes the number of training samples, and l_p denotes the number of training samples in the pth class.

The classification rule proposed in Step 4 is based on the SVDD, which Lee and Lee (2007) showed to be an optimal classifier according to Bayesian decision theory, and the experiments in Section 4 show that it leads to better generalization accuracy than the classification rules of Zhu et al. (2003) and Wang et al. (2007).
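The decision rule (29) is straightforward to apply once each class has a trained support function f_p and radius R_p. The sketch below is an illustrative composition of these pieces; the per-class dictionary structure is a hypothetical convention introduced here, not the authors' implementation.

```python
import numpy as np

def classify(x, class_models, l_total):
    """Apply the decision rule (29): F(x) = argmax_p l_p * (R_p^2 - f_p(x)) / (l * R_p^2).

    class_models maps each class label p to a dict with keys
    'f' (the kernel support function f_p of Eq. (28)), 'R2' ((R_p)^2), and 'lp'
    (the number of training samples of class p); l_total is l.
    """
    best_label, best_score = None, -np.inf
    for p, mdl in class_models.items():
        score = mdl["lp"] * (mdl["R2"] - mdl["f"](x)) / (l_total * mdl["R2"])
        if score > best_score:
            best_label, best_score = p, score
    return best_label
```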

4. Experimental results

The effectiveness of the proposed weighted SVDD algorithm on classification problems is tested on benchmark data sets from the UCI machine learning repository (http://www.ics.uci.edu/~mlearn/MLRepository.html), including glass, ionosphere, sonar, pima-diabetes, iris and vehicle. Their characteristics are listed in Table 1, where #pts denotes the number of data points, #att the number of attributes, and #class the number of classes.

Table 1
Data sets from UCI.

Data set        #pts   #att   #class
Glass            214      9        6
Ionosphere       351     34        2
Sonar            208     60        2
Pima-diabetes    768      8        2
Iris             150      4        3
Vehicle          846     18        4

On each data set, we compared our method with the standard SVM classifier and with the sphere-based classifier using the decision rules of Zhu et al. (2003) and Wang et al. (2007). We used the Gaussian kernel for all algorithms and fivefold cross-validation to estimate the generalization errors of the classifiers; in the cross-validation process, we ensured that the training and testing sets were the same for all algorithms. Note that the generalization errors of the classifiers depend on the values of the kernel parameter σ and the regularization parameter C, so the boundary description is strongly influenced by the width parameter σ. For small values of σ, all objects tend to become support vectors and the description approximates a Parzen density estimate with a small width. For large values of σ, the SVDD description approximates the original spherical solution. For intermediate values of σ, a weighted Parzen density is obtained. Accordingly, we set the threshold rt to 0.1 in Step 1 of the proposed algorithm; the value of rt influences the algorithm runtime, since for small values of rt more training samples may be included in each class subset. In the experiments we standardized the data to zero mean. For the Gaussian kernel, we determined the tuning parameters σ² and C by standard fivefold cross-validation. The initial search for the optimal parameters σ² and C was done on a 10 × 10 uniform coarse grid in the (σ², C) space, namely σ² ∈ {2^-3, 2^-1, ..., 2^13, 2^15} and C ∈ {2^-15, 2^-13, ..., 2^1, 2^3}. A sketch of this parameter-selection procedure is given below.
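For concreteness, the following sketch shows one way to organize the parameter search described above: a 10 × 10 grid over (σ², C) evaluated with fivefold cross-validation. The train_classifier and error_rate callables are hypothetical placeholders standing in for the weighted SVDD procedure of Section 3, not the authors' code.

```python
import numpy as np

def five_fold_grid_search(X, y, train_classifier, error_rate, seed=0):
    """Coarse grid search over (sigma^2, C) with fivefold cross-validation.

    train_classifier(X_tr, y_tr, sigma2, C) -> model, and error_rate(model, X_te, y_te)
    -> float, are assumed interfaces for the weighted SVDD classifier.
    """
    sigma2_grid = 2.0 ** np.arange(-3, 17, 2)    # 2^-3, 2^-1, ..., 2^15
    C_grid = 2.0 ** np.arange(-15, 5, 2)         # 2^-15, 2^-13, ..., 2^3
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, 5)
    best = (None, None, np.inf)                  # (sigma2, C, mean error)
    for s2 in sigma2_grid:
        for C in C_grid:
            errs = []
            for k in range(5):
                te = folds[k]
                tr = np.concatenate([folds[j] for j in range(5) if j != k])
                model = train_classifier(X[tr], y[tr], s2, C)
                errs.append(error_rate(model, X[te], y[te]))
            if np.mean(errs) < best[2]:
                best = (s2, C, np.mean(errs))
    return best
```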


Table 2
Comparison between SVMs and the sphere-based classifiers (classification error rate, %).

Data set        SVM     Sphere-based classifier   Proposed method (rt = 0.1)
Glass           27.57   28.54                     27.51
Ionosphere       5.12    4.84                      4.68
Sonar           10.62   15.12                     11.38
Pima-diabetes   27.48   27.00                     26.82
Iris             2.67    2.72                      2.68
Vehicle         12.64   12.60                     12.60
Iris*            5.42    5.64                      3.89
Vehicle*        19.86   19.32                     17.58

The experimental results are summarized in Table 2; the values in Table 2 are classification error rates. As can be seen, except on the Sonar and Iris data sets, the proposed algorithm improves the performance of the resulting classifier significantly, and it is better than the standard SVM classifier on four of the six data sets tested. Moreover, the experimental results show that the proposed method also improves on the sphere-based classifiers of Zhu et al. (2003) and Wang et al. (2007). Additionally, to test the algorithm's sensitivity to outlier data, we randomly added 10% "outlier" samples to iris and vehicle, respectively; these data sets are labeled Iris* and Vehicle* in Table 2, and the corresponding results are shown in the last two rows. The results indicate that the proposed method successfully reduces the effect of outliers and accordingly yields good generalization performance, especially when outlier data exist in the data sets.

5. Conclusions

In this paper, a weighted support vector data description (SVDD) is proposed, which can be used to deal with the outlier sensitivity problem in traditional multi-class classification problems. The robust possibilistic c-means clustering algorithm is improved to generate different weight values for the main training data points and the outliers according to their relative importance in the training set. Experimental results show that the proposed method can reduce the effect of outliers and yield a higher classification rate than other existing methods.

Acknowledgements

This work is supported by the Liaoning Doctoral Research Foundation of China (Grant No. 20081079) and the University Scientific Research Project of the Liaoning Education Department of China (Grant No. 2008347). The authors are grateful to the anonymous reviewers, who made constructive comments.

References

Hsu, C. W., & Lin, C. J. (2002). A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13(2), 415–425.

Hu, W. J., & Song, Q. (2004). An accelerated decomposition algorithm for robust support vector machines. IEEE Transactions on Circuits and Systems II: Express Briefs, 51(5), 234–240.
Inoue, T., & Abe, S. (2001). Fuzzy support vector machines for pattern classification. In Proceedings of the international joint conference on neural networks (IJCNN '01) (Vol. 2, pp. 1449–1454).
Krishnapuram, R., & Keller, J. M. (1993). A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1(2), 98–110.
Krishnapuram, R., & Keller, J. M. (1996). The possibilistic c-means algorithm: Insights and recommendations. IEEE Transactions on Fuzzy Systems, 4(3), 385–393.
Lee, D., & Lee, J. (2007). Domain described support vector classifier for multi-classification problems. Pattern Recognition, 40, 41–51.
Lin, C. F., & Wang, S. D. (2002). Fuzzy support vector machines. IEEE Transactions on Neural Networks, 13(2), 464–471.
Lin, C. F., & Wang, S. D. (2003). Training algorithms for fuzzy support vector machines with noisy data. In Proceedings of the IEEE 8th workshop on neural networks for signal processing (pp. 517–526).
Muller, K. R., Mika, S., Ratsch, G., et al. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2), 181–201.
Platt, J. C., Cristianini, N., & Shawe-Taylor, J. (2002). Large margin DAGs for multiclass classification. In S. A. Solla, T. K. Leen, & K. R. Müller (Eds.), Advances in neural information processing systems (Vol. 12, pp. 547–553). Cambridge, MA: MIT Press.
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. Cambridge, MA: MIT Press.
Schölkopf, B., Burges, C., & Vapnik, V. (1995). Extracting support data for a given task. In Proceedings of the first international conference on knowledge discovery and data mining (pp. 252–257).
Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A. J., & Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 1443–1471.
Song, Q., Hu, W. J., & Xie, W. F. (2002). Robust support vector machine with bullet hole image classification. IEEE Transactions on Systems, Man and Cybernetics, 32(4), 440–448.
Tax, D., & Duin, R. (1999). Support vector domain description. Pattern Recognition Letters, 20, 1191–1199.
Tax, D., & Duin, R. (2004). Support vector data description. Machine Learning, 54, 45–66.
Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer.
Wang, J., Neskovic, P., & Cooper, L. N. (2007). Bayes classification based on minimum bounding spheres. Neurocomputing, 70, 801–808.
Weston, J., & Watkins, C. (1998). Multi-class support vector machines. Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London.
Zhang, Y., Chi, Z. X., Liu, X. D., & Wang, X. H. (2007). A novel fuzzy compensation multi-class support vector machines. Applied Intelligence, 27(1), 21–28.
Zhu, M. L., Chen, S. F., & Liu, X. D. (2003). Sphere-structured support vector machines for multi-class pattern recognition. In Lecture Notes in Computer Science (Vol. 2639, pp. 589–593). Berlin, Heidelberg: Springer.