The probabilistic constraints in the support vector machine


Applied Mathematics and Computation 194 (2007) 467–479 www.elsevier.com/locate/amc

Hadi Sadoghi Yazdi a,*, Sohrab Effati b, Zahra Saberi b

a Engineering Department, Tarbiat Moallem University of Sabzevar, Sabzevar, Iran
b Department of Mathematics, Tarbiat Moallem University of Sabzevar, Sabzevar, Iran

Abstract

In this paper, a new support vector machine classifier with probabilistic constraints is proposed, in which the presence probability of the samples in each class is determined by a distribution function. Noise causes incorrect calculation of the support vectors, and the margin therefore cannot be maximized. In the proposed method, both the constraint boundaries and the constraint occurrences are described by probability density functions, which helps to achieve the maximum margin. Experimental results show the superiority of the probabilistic constraints support vector machine (PC-SVM) over the standard SVM.

© 2007 Elsevier Inc. All rights reserved.

Keywords: Probabilistic constraints; Support vector machine; Margin maximization

1. Introduction

Conventional learning aims only at minimizing the classification error in the training phase and cannot guarantee the lowest error rate in the testing phase. In statistical learning theory, the support vector machine (SVM) has been developed to address this bottleneck. Support vector machines were originally introduced by Vapnik within the framework of statistical learning theory and structural risk minimization [1], and they build a classifier with minimized VC dimension. They have proven to work successfully on a wide range of nonlinear classification and function estimation applications, such as optical character recognition [2,3], text categorization [4], face detection in images [5], vehicle tracking in video sequences [6], nonlinear equalization in communication systems [7], and the generation of fuzzy rule-based systems within the SVM framework [8,9]. Basically, the support vector machine is a linear machine with some very attractive properties. When a separating hyperplane cannot be constructed for a set of training data without classification error, a set of slack variables is introduced for the samples that reduce the confidence interval. The problem can then be formulated in a dual form, in which the slack variables no longer appear and the problem becomes separable. The main motivation of this paper lies in probabilistic constraints; the results obtained include an asymmetric margin that depends on the probability density functions of the data classes and on the importance of each sample in determining the hyperplane parameters.

* Corresponding author. E-mail address: [email protected] (H.S. Yazdi).

doi:10.1016/j.amc.2007.04.109

1.1. Related work on support vector machines

This sub-section reviews several issues that researchers have considered in the field of support vector machines.

1.1.1. Large training sets in SVM

Usually, SVMs are trained in batch mode: all training data are given a priori and training is performed in one batch. If more training data are obtained later, or if we wish to test different constraint parameters, the SVM must be retrained from scratch. However, if we add a small amount of data to a large training set and the problem is well posed, the addition will likely have only a minimal effect on the decision surface, and resolving the problem from scratch seems computationally wasteful. An alternative is to "warm-start" the solution process by using the old solution as a starting point for finding the new one. This approach is at the heart of active set optimization methods [10,11], and incremental learning is a natural extension of these methods. Incremental learning for SVMs has been studied in [12-15]. In [23], the new set of points to be added, instead of being generated randomly, is generated according to the probability that a sample is a support vector. Selecting a subset of a large data set is considered in [22], which solves a smaller optimization problem but notes that generalization ability may suffer.

1.1.2. Kernel determination in SVM

The introduction of kernel methods gives SVMs the ability to handle nonlinear problems. Many Mercer kernels are available, such as the Gaussian radial basis function kernel, the sigmoid kernel, the polynomial kernel, spline kernels, and others. These kernels must satisfy Mercer's condition, i.e., they must be symmetric and positive semidefinite. Some work extends the range of usable kernels to ones that are not required to be positive definite. As is well known, kernel functions are introduced from the viewpoint of nonlinear mapping, and feedforward and radial basis function neural networks have good nonlinear mapping and approximation abilities. Accordingly, in [16] the input space is mapped into a hidden space by a set of hidden units of an artificial neural network, and the structural risk is then introduced in the hidden space to implement hidden-space support vector machines. In [21], the authors determine the kernel from the data properties: when all feature vectors are almost orthogonal, the solution found is nearly the center of gravity of the examples; conversely, when the feature vectors are almost the same, the solution approximates that of the inhomogeneous case and the SVM has a linear kernel.

1.1.3. Training procedure in SVM

The standard SVM is trained by solving a quadratic optimization problem. In [24,25], the least squares SVM was proposed, which is trained by solving a set of linear equations.

1.1.4. Fuzzy SVM and the soft penalty term

As shown in previous work [18,19], the SVM is very sensitive to outliers and noise, since its penalty term treats every data point equally during training. This may lead to overfitting if one or a few data points have relatively large values of the slack variable. The fuzzy SVM (FSVM) was proposed to deal with this overfitting problem. FSVM is an extension of SVM that takes into account the different significance of the training samples: each training sample is associated with a fuzzy membership value. The membership value reflects the fidelity of the data, in other words, how confident we are about the actual class label of the sample; the higher its value, the more confident we are about its class label. The optimization problem of the FSVM is formulated in [17,26] and has been used in works such as [20,27,28]. In this method, each slack variable is scaled by its membership value.

The fuzzy membership values are used to weight the soft penalty term in the cost function of the SVM. The weighted soft penalty term reflects the relative fidelity of the training samples during training: important samples with larger membership values have more impact on FSVM training than those with smaller values.

In this paper we present probabilistic constraints in the SVM for the first time. The main features of the proposed method are:

• creating a soft penalty term;
• reducing the effect of noisy samples on the computation of the optimal hyperplane;
• the ability to attach a confidence coefficient to each training sample.

The rest of the paper is organized as follows. Section 2 introduces the PC-SVM and its geometrical interpretation. Experiments are discussed in Section 3. Conclusions are given in Section 4.

2. The proposed probabilistic constraints SVM (PC-SVM)

We first give a brief description of the SVM and then introduce the PC-SVM formulation.

2.1. Support vector machine formulation

This sub-section gives a brief introduction to the SVM. Let $S = \{(x_i, d_i)\}_{i=1}^{n}$ be a set of $n$ training samples, where $x_i \in \mathbb{R}^m$ is an $m$-dimensional sample in the input space and $d_i \in \{-1, 1\}$ is the class label of $x_i$. The SVM finds the optimal separating hyperplane with minimal classification error. Let $w_0$ and $b_0$ denote the optimum values of the weight vector and the bias, respectively. The hyperplane can be represented as

$$w_0^T x + b_0 = 0, \tag{1}$$

where $w = [w_1, w_2, \ldots, w_m]^T$ and $x = [x_1, x_2, \ldots, x_m]^T$; $w$ is the normal vector of the hyperplane and $b$ is the bias, which is a scalar. The optimal hyperplane can be obtained by solving the following optimization problem [1]:

$$\text{Minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \tag{2}$$

$$\text{s.t.} \quad d_i(w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n. \tag{3}$$

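To make the role of C and the slack variables concrete, the following sketch minimizes the equivalent unconstrained hinge-loss form of (2)-(3), (1/2)||w||^2 + C Σ_i max(0, 1 - d_i(w^T x_i + b)), by plain sub-gradient descent. This is only an illustrative sketch; the learning rate, epoch count, and the sub-gradient scheme are our own choices, not part of the paper.

```python
import numpy as np

def svm_primal_subgradient(X, d, C=1.0, lr=1e-3, epochs=2000):
    """Soft-margin SVM primal (2)-(3), written with hinge losses in place of
    explicit slack variables, minimized by plain sub-gradient descent."""
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    for _ in range(epochs):
        margins = d * (X @ w + b)                    # d_i (w^T x_i + b)
        violated = margins < 1                       # samples with positive slack
        grad_w = w - C * (d[violated, None] * X[violated]).sum(axis=0)
        grad_b = -C * d[violated].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

A larger C penalizes the slack more heavily and narrows the soft margin; in the dual derived below the same C reappears as the upper bound on the Lagrange multipliers.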
The constrained optimization problem is solved with the method of Lagrange multipliers. The constraints are first rewritten as

$$1 - \xi_i - d_i(w^T x_i + b) \leq 0.$$

The optimization then continues as follows:

$$J(w, b, \alpha, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \alpha_i \left[1 - \xi_i - d_i(w^T x_i + b)\right] - \sum_{i=1}^{n} \mu_i \xi_i, \tag{4}$$

where the $\mu_i \geq 0$ are the multipliers associated with the constraints $\xi_i \geq 0$, and

$$\frac{\partial J}{\partial w} = w - \sum_{i=1}^{n} \alpha_i d_i x_i = 0, \qquad \frac{\partial J}{\partial b} = \sum_{i=1}^{n} \alpha_i d_i = 0, \qquad \frac{\partial J}{\partial \xi_i} = C - \alpha_i - \mu_i = 0, \quad i = 1, \ldots, n, \tag{5}$$

where $C$ is the regularization parameter controlling the trade-off between margin maximization and classification error; it has to be selected by the user, and $\xi_i$ is called the slack variable, which is related to the classification errors of the SVM. The optimization problem can be transformed into the following equivalent dual problem:

$$\text{Maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j d_i d_j x_i^T x_j \tag{6}$$

$$\text{subject to} \quad \sum_{i=1}^{n} d_i \alpha_i = 0, \qquad 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, n, \tag{7}$$

where $\alpha_i$ is the Lagrange multiplier. From the Kuhn-Tucker conditions of optimization theory we get the following properties:

$$\left.\begin{array}{l} \sum_{i=1}^{n} \alpha_i d_i = 0 \\ 0 \leq \alpha_i \leq C \end{array}\right\} \text{ dual feasible}, \qquad 1 - \xi_i - d_i(w^T x_i + b) \leq 0, \quad i = 1, \ldots, n \quad \text{ primal feasible},$$

$$\alpha_i\left[1 - \xi_i - d_i(w^T x_i + b)\right] = 0, \qquad \mu_i \xi_i = 0, \qquad i = 1, \ldots, n. \tag{8}$$

Here the $\mu_i$ are the Lagrange multipliers used to enforce the nonnegativity of the slack variables $\xi_i$ for all $i$. At the saddle point the derivative of $J$ with respect to $\xi_i$ is zero, the evaluation of which yields $\alpha_i + \mu_i = C$, so that $\xi_i = 0$ if $\alpha_i < C$. Given the optimum Lagrange multipliers, denoted by $\alpha_{0,i}$, we can compute the optimum weight vector $w_0$ using

$$w_0 = \sum_{i=1}^{n} \alpha_{0,i} d_i x_i, \tag{9}$$

and, for $\alpha_i < C$,

$$b = \begin{cases} 1 - w^T x_i, & d_i = 1, \\ -(1 + w^T x_i), & d_i = -1. \end{cases}$$

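For reference, a minimal numerical sketch of the dual (6)-(7) and of the recovery of w_0 and b via Eq. (9) follows. It relies on a general-purpose SciPy optimizer rather than a dedicated QP solver, and the tolerance used to pick the margin support vectors is our own choice, not part of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def train_linear_svm_dual(X, d, C=1.0):
    """Solve the dual problem (6)-(7) for a linear SVM and recover (w, b).

    X: (n, m) array of training samples, d: (n,) array of labels in {-1, +1}.
    Numerical sketch only; a dedicated QP solver would normally be used.
    """
    n = X.shape[0]
    K = (d[:, None] * X) @ (d[:, None] * X).T        # K_ij = d_i d_j x_i^T x_j

    def neg_dual(a):                                 # negated objective of (6)
        return 0.5 * a @ K @ a - a.sum()

    cons = ({'type': 'eq', 'fun': lambda a: a @ d},) # sum_i alpha_i d_i = 0
    bounds = [(0.0, C)] * n                          # 0 <= alpha_i <= C
    a = minimize(neg_dual, np.zeros(n), bounds=bounds, constraints=cons).x

    w = (a * d) @ X                                  # Eq. (9)
    sv = (a > 1e-6) & (a < C - 1e-6)                 # margin support vectors (alpha_i < C)
    if not sv.any():
        sv = a > 1e-6                                # fall back to any support vector
    b = float(np.mean(d[sv] - X[sv] @ w))            # from d_i (w^T x_i + b) = 1
    return w, b
```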
2.2. Formulation of the proposed PC-SVM

In the proposed algorithm the optimal hyperplane is obtained by solving the following optimization problem:

$$\text{Minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \tag{10}$$

$$\text{subject to} \quad \Pr\left(d_i(w^T x_i + b) \geq u_i - \xi_i\right) \geq \delta_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n, \tag{11}$$

where the $u_i$ are independent random variables with known distribution functions and $0 \leq \delta_i \leq 1$ expresses the degree to which the $i$th sample affects the position of the optimal hyperplane. Then (11) can be written as

$$d_i(w^T x_i + b) \geq F_i^{-1}(\beta_i), \tag{12}$$

where $\beta_i = 1 - \delta_i$ and $F_i^{-1}(\cdot)$ is the inverse distribution function of the variable $u_i - \xi_i$, $i = 1, \ldots, n$, which has to be continuous. As with the conventional SVM, the optimization problem of the PC-SVM can be transformed into its dual problem. One reason for moving to the dual form is that its constraints are significantly simpler than those of the primal form; another is that in the dual form the training data appear only through dot products. The constraints are rewritten as

$$F_i^{-1}(\beta_i) - d_i(w^T x_i + b) \leq 0. \tag{13}$$

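As an illustration of how the chance constraint (11) turns into the deterministic constraint (12), the snippet below evaluates the right-hand side F_i^{-1}(beta_i) when u_i - xi_i is taken to be Gaussian. The Gaussian choice and its parameters are assumptions made purely for illustration, since the paper only requires F_i to be continuous.

```python
from scipy.stats import norm

def chance_constraint_threshold(delta_i, mu_i=1.0, sigma_i=0.1):
    """Right-hand side F_i^{-1}(beta_i) of Eq. (12), which replaces the fixed
    margin of 1 of the standard SVM. Assumes u_i - xi_i ~ N(mu_i, sigma_i^2)."""
    beta_i = 1.0 - delta_i                     # beta_i = 1 - delta_i, as in the paper
    return norm.ppf(beta_i, loc=mu_i, scale=sigma_i)

# With this convention a larger delta_i yields a smaller required margin for
# sample i, and a smaller delta_i a larger one.
print(chance_constraint_threshold(0.95))       # ~ 0.84
print(chance_constraint_threshold(0.05))       # ~ 1.16
```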
The optimization procedure continues as follows:

$$J(w, b, \alpha, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \alpha_i \left[F_i^{-1}(\beta_i) - d_i(w^T x_i + b)\right], \tag{14}$$

$$\frac{\partial J}{\partial w} = w - \sum_{i=1}^{n} \alpha_i d_i x_i = 0, \qquad \frac{\partial J}{\partial b} = \sum_{i=1}^{n} \alpha_i d_i = 0, \qquad \frac{\partial J}{\partial \xi_i} = C - \alpha_i = 0, \quad i = 1, \ldots, n.$$

To solve this problem it is converted to the dual form: given the training samples $\{(x_i, d_i)\}_{i=1}^{n}$, find the Lagrange multipliers $\{\alpha_i\}_{i=1}^{n}$ that maximize the objective function

$$\text{Maximize} \quad \sum_{i=1}^{n} \alpha_i F_i^{-1}(\beta_i) - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j d_i d_j x_i^T x_j \tag{15}$$

$$\text{subject to} \quad \sum_{i=1}^{n} d_i \alpha_i = 0, \qquad 0 \leq \alpha_i, \quad i = 1, \ldots, n. \tag{16}$$

From the Kuhn-Tucker conditions of optimization theory we get the following properties:

$$\left.\begin{array}{l} \sum_{i=1}^{n} \alpha_i d_i = 0 \\ 0 \leq \alpha_i \end{array}\right\} \text{ dual feasible}, \qquad F_i^{-1}(\beta_i) - d_i(w^T x_i + b) \leq 0, \quad i = 1, \ldots, n \quad \text{ primal feasible},$$

$$\alpha_i\left[F_i^{-1}(\beta_i) - d_i(w^T x_i + b)\right] = 0, \quad i = 1, \ldots, n. \tag{17}$$

Given the optimum Lagrange multipliers, denoted by $\alpha_{0,i}$, we may compute the optimum weight vector $w_0$, the bias $b$, and the slack variables $\xi_i$, respectively, using

$$w_0 = \sum_{i=1}^{n} \alpha_{0,i} d_i x_i, \qquad b = \begin{cases} F_i^{-1}(\beta_i) - w^T x_i, & d_i = 1, \\ -F_i^{-1}(\beta_i) - w^T x_i, & d_i = -1, \end{cases} \qquad \xi_i = 1 - d_i(w^T x_i + b), \quad i = 1, \ldots, n. \tag{18}$$

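Analogously to the sketch given for the standard SVM, the PC-SVM dual (15)-(16) can be solved numerically as below; the per-sample thresholds F_i^{-1}(beta_i) enter the linear term and the multipliers are only bounded below. This is again a sketch under our own solver choices, and it assumes the data and thresholds keep the dual bounded.

```python
import numpy as np
from scipy.optimize import minimize

def train_pc_svm_dual(X, d, thresholds):
    """Solve the PC-SVM dual (15)-(16), where thresholds[i] = F_i^{-1}(beta_i),
    and recover (w, b) as in Eq. (18). Illustrative sketch only."""
    X, d, t = np.asarray(X, float), np.asarray(d, float), np.asarray(thresholds, float)
    n = X.shape[0]
    K = (d[:, None] * X) @ (d[:, None] * X).T        # d_i d_j x_i^T x_j

    def neg_dual(a):                                 # negated objective of (15)
        return 0.5 * a @ K @ a - a @ t

    cons = ({'type': 'eq', 'fun': lambda a: a @ d},) # sum_i alpha_i d_i = 0
    bounds = [(0.0, None)] * n                       # only alpha_i >= 0, Eq. (16)
    a = minimize(neg_dual, np.zeros(n), bounds=bounds, constraints=cons).x

    w = (a * d) @ X                                  # Eq. (18)
    sv = a > 1e-6                                    # active constraints, Eq. (17)
    b = float(np.mean(np.where(d[sv] == 1,
                               t[sv] - X[sv] @ w,
                               -t[sv] - X[sv] @ w)))
    return w, b
```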
2.3. Visualization of the proposed PC-SVM

We start with an example to explain the cogency of the probabilistic constraints. Fig. 1 shows two classes with the following properties:

• Class 1 is dense and class 2 is dispersed.
• The samples of class 1 were collected and measured with high confidence, whereas those of class 2 were not.
• The training data of class 2 were polluted with noise during data capture.

As can be seen, the standard SVM finds a margin, but without any a priori knowledge about the probability density function describing the confidence in the collected data or the level of added noise. If we know the properties listed above, we can create a soft margin based on the reliability of the data, which the standard SVM cannot do. We assume that this reliability has a semi-normal PDF as follows:

Fig. 1. Margin found by the standard SVM.

Class 1, $N_s(x, \mu_1, \sigma_1)$:

$$f_u(x) = \begin{cases} \dfrac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}, & \text{for } x \text{ on the support-vector side of } \mu_1, \\[4pt] 1, & \text{otherwise}; \end{cases} \tag{19}$$

Class 2, $N_s(x, \mu_2, \sigma_2)$:

$$f_u(x) = \begin{cases} \dfrac{1}{\sigma_2\sqrt{2\pi}}\, e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}, & \text{for } x \text{ on the support-vector side of } \mu_2, \\[4pt] 1, & \text{otherwise}. \end{cases} \tag{20}$$

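A small sketch of the semi-normal reliability density in (19)-(20) follows. Since the region where the density is Gaussian rather than constant is only described qualitatively in the text, the `hyperplane_side` argument used to pick that region is our own assumption.

```python
import numpy as np

def semi_normal_pdf(x, mu, sigma, hyperplane_side=-1):
    """Semi-normal reliability density N_s(x, mu, sigma) of Eqs. (19)-(20):
    Gaussian on the side of the class mean facing the separating hyperplane
    (assumed to be the left side when hyperplane_side = -1), and 1 elsewhere."""
    x = np.asarray(x, dtype=float)
    gauss = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    near_hyperplane = (x <= mu) if hyperplane_side < 0 else (x >= mu)
    return np.where(near_hyperplane, gauss, 1.0)

# Reliability densities used later in Example 1: N_s(x, 1, 0.1) and N_s(x, 1, 0.4).
xs = np.linspace(0.0, 2.0, 5)
print(semi_normal_pdf(xs, mu=1.0, sigma=0.1))
print(semi_normal_pdf(xs, mu=1.0, sigma=0.4))
```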
Fig. 2. Confidence of the data for the two classes. For better demonstration, class 1 is shown with negative probability and class 2 with positive probability.

Fig. 3. Effect of reliability on the hyperplanes and their asymmetric movement. The hyperplane related to class 2 moves further, whereas the class 1 hyperplane moves only slightly.

where $N_s(x, \mu_i, \sigma_i)$ is the probability density function that expresses the reliability of the data; $f_u(x)$ in (19) and (20) is the PDF of $u$ in (11). For $x$ near the support vectors, which lie on the hyperplane side of the class mean, this PDF is normal, and for samples far from the class mean the probability is one. This gives a high weight to samples far from the support vectors and a lower weight to samples near the support vectors when the parameters of the optimal hyperplane are computed. These probabilities express the reliability of the class data. To present both classes in one figure, the probability of class 1 is plotted with negative sign and that of class 2 with positive sign in Fig. 2. In this figure, the old support vectors (SVs) of the conventional SVM have low reliability (probability near zero) and the class centers have maximum reliability (probability near 1 for class 2 and near -1 for class 1). Samples far from the conventional SVs are therefore expected to attract the hyperplanes toward themselves. If the reliability of the class 1 samples is larger than that of class 2, the hyperplane related to class 2 is expected to move further while the class 1 hyperplane moves only slightly, as illustrated in Fig. 3. The margin is enlarged, but in an asymmetric way. A further note concerns data points that fall on the wrong side of the optimal decision surface of the standard SVM: in the PC-SVM, a priori knowledge can emphasize these data points and use them in obtaining the optimal hyperplane, a capability that the conventional SVM does not have.

3. Experimental results

We first define the overlap between classes used to generate the test data. Let (r11, r12) be the amplitude boundaries of class 1 and (r21, r22) those of class 2. If r21 (the minimum value of class 2) is larger than r12 (the maximum value of class 1), there is no overlap between the two classes; but if r21 = r12 - g for some g > 0, the overlap of each class is given by the following relations:

$$\mathrm{Overlap}_1 = 100\,\frac{|r_{21} - r_{12}|}{r_{12} - r_{11}}, \tag{21}$$

$$\mathrm{Overlap}_2 = 100\,\frac{|r_{21} - r_{12}|}{r_{22} - r_{21}}, \tag{22}$$

where Overlap1 is the overlap of class 1 with class 2 and Overlap2 is the overlap of class 2 with class 1. Note that the synthetic data of the two classes are generated by a uniform density function over the boundaries defined above.

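The overlap measure (21)-(22) and the uniform data generation can be sketched as follows; the `sample_classes` helper and the choice of NumPy's default generator are our own, introduced only to outline the experimental setup.

```python
import numpy as np

def class_overlaps(r11, r12, r21, r22):
    """Overlap percentages of Eqs. (21)-(22) for two 1-D uniform classes."""
    overlap1 = 100.0 * abs(r21 - r12) / (r12 - r11)   # overlap of class 1 with class 2
    overlap2 = 100.0 * abs(r21 - r12) / (r22 - r21)   # overlap of class 2 with class 1
    return overlap1, overlap2

def sample_classes(r11, r12, r21, r22, n=100, seed=None):
    """Uniformly distributed synthetic 1-D samples on the stated boundaries."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(r11, r12, size=n)                # class 1, label +1
    x2 = rng.uniform(r21, r22, size=n)                # class 2, label -1
    return x1, x2

# Example 1 settings: (r11, r12) = (0.07, 0.27), (r21, r22) = (0.12, 0.52)
print(class_overlaps(0.07, 0.27, 0.12, 0.52))         # ~ (75.0, 37.5), matching Example 1
```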
To show the capability of the PC-SVM algorithm relative to the conventional SVM we discuss the following examples.

Example 1. The test patterns are generated with (r11, r12) = (0.07, 0.27) and (r21, r22) = (0.12, 0.52), so that Overlap1 = 75% and Overlap2 = 37.5%. The two algorithms are run 500 times and the recognition rates are shown in Fig. 4. We use 100 samples for training and 100 samples for testing. The PDFs of ui in (19), (20) for the two classes are: class 1, Ns(x, 1, 0.1); class 2, Ns(x, 1, 0.4). The results show that the proposed PC-SVM behaves better than the conventional SVM over all runs. The average recognition rate over all runs is 83.96% for the conventional SVM and 88.65% for the PC-SVM.

Fig. 4. Recognition rates of the conventional SVM and the proposed PC-SVM for Example 1 over 500 runs.

Fig. 5. Recognition rates of the conventional SVM and the proposed PC-SVM for Example 2 over 100 runs.

Example 2. The test patterns are generated with (r11, r12) = (0.03, 0.23) and (r21, r22) = (0.18, 0.23), so that Overlap1 = 25% and Overlap2 = 100%. The two algorithms are run 100 times and the recognition rates are shown in Fig. 5. We use 100 samples for training and 100 samples for testing. The PDFs of ui in (19), (20) for the two classes are: class 1, Ns(x, 1, 0.7); class 2, Ns(x, 1, 0.1). The results show that the proposed PC-SVM behaves better than the conventional SVM over all runs. The average recognition rate over all runs is 86.41% for the conventional SVM and 93.35% for the PC-SVM.

Fig. 6a. Classification of the two-feature classes using the standard SVM algorithm.

Fig. 6b. Classification of the two-feature classes using the proposed PC-SVM algorithm.

Fig. 6c. A sample of Farsi number in each class.

Example 3. In this example we study the reasons for the power of the PC-SVM relative to the standard SVM. As shown in Fig. 6a-6c, the standard SVM generates the illustrated margin; however, the data collected for class 1 have low noise and high reliability while those of class 2 do not, and this knowledge should be exploited in the classification problem. Classification of the test data of class 1 by the standard SVM has a 50% error, but after applying the reliability factor with the proposed PC-SVM algorithm the margin is extended toward class 2, as shown in Fig. 7, and the error on the test data is reduced to zero. The recognition rate on the test patterns is 75% for the standard SVM and 100% for the PC-SVM.

Example 4. The main purpose of this example is to illustrate the unrepresentativeness and unreliability of collected training data. Optical character recognition (OCR) is used to compare the PC-SVM and the standard SVM. OCR deals with the recognition of optical characters and has wide applications in modern society: document reading and sorting, postal address reading, bank check recognition, form recognition, signature verification, digital bar code reading, map interpretation, engineering drawing recognition, and various other industrial and commercial applications. The difficulty of text recognition greatly depends on the type of characters to be recognized, varying from relatively easy mono fonts to extremely difficult cursive text. In this example we focus on Farsi handwritten digit recognition. A rich database is used, from which two patterns are selected for this binary classification: 54 samples from the first class (number 2) and 86 samples from the second class (number 5). Fig. 6c shows a sample of each class. Two features are extracted from the digit images so that the operation of the classifiers (SVM and PC-SVM) can be visualized. After normalizing each digit image to a size of 64 x 64, the concentration of pixels in the top of the image is taken as the first feature (x1) and the concentration of pixels in the bottom of the image as the second feature (x2); this operation is shown in Fig. 7, and a code sketch of the computation follows this paragraph. Noise is added to the training images of number 2, as shown in Fig. 8, while number 5 is used under normal conditions. Fig. 9 shows the training and testing data in the feature space (x2 vs x1). Confusion matrices over the test samples are given in Tables 1 and 2, which report the recognition rates of the standard SVM and the proposed PC-SVM. Moving the hyperplane toward the center of class 1 helps to recognize the test data better; the main reason for this movement is the unrepresentativeness and unreliability of the collected data of class 1. In this case, the PC-SVM causes the class 1 hyperplane to move toward the center of class 1, i.e., toward the position of high reliability. In Table 1, all samples of class 1 are identified correctly, while 14.1% of the samples of class 2 are recognized as class 1 and 85.9% of them are identified correctly.

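The two features of Example 4 can be computed roughly as below. The paper only states that they are the concentrations of pixels in the top and the bottom of the normalized 64 x 64 image, so the binarization and the normalization by the total ink count are our own assumptions.

```python
import numpy as np

def top_bottom_concentration(img):
    """Two features of Example 4: concentration of foreground pixels in the top
    half (x1) and bottom half (x2) of a character image normalized to 64 x 64.
    Binarization and normalization by the total ink count are assumptions."""
    img = np.asarray(img, dtype=float)
    assert img.shape == (64, 64), "normalize the character image to 64 x 64 first"
    foreground = img > 0                       # nonzero pixels treated as ink
    total = foreground.sum()
    if total == 0:                             # blank image guard
        return 0.0, 0.0
    x1 = foreground[:32, :].sum() / total      # concentration in the top half
    x2 = foreground[32:, :].sum() / total      # concentration in the bottom half
    return float(x1), float(x2)
```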
Fig. 7. Feature extraction from number 2 or class 1 (Farsi number ).

Fig. 8. Noisy images of number 2, class 1 (Farsi number ), on the left, and normal handwritten images of number 5, class 2 (Farsi number ), on the right.

Fig. 9. Training and testing data in the feature space (x2 vs x1). The hyperplanes shown are obtained using the standard SVM; in the PC-SVM the class 1 hyperplane moves toward the center of class 1, resulting in a higher recognition rate.

Table 1
Confusion matrix of the PC-SVM (%)

Class name            Class 1 (number 2)    Class 2 (number 5)
Class 1 (number 2)    100                   0
Class 2 (number 5)    14.10                 85.90

Finally, the recognition rate obtained with the PC-SVM is 92.30%, while that of the standard SVM is 76.28%, which indicates the superiority of the proposed approach.

Table 2
Confusion matrix of the standard SVM (%)

Class name            Class 1 (number 2)    Class 2 (number 5)
Class 1 (number 2)    100                   0
Class 2 (number 5)    58.97                 41.03

4. Conclusions

Noisy training data prevent the optimal hyperplane from being found in a suitable position. The probabilistic constraints help the support vector classifier to maximize the margin based on a reliability factor. The results showed that the proposed algorithm has a higher capability than the standard SVM. In future work we will present an automatic approach for finding the parameters of the PC-SVM algorithm, including the PDFs of ui in (19), (20) and the probability bounds di in (11), according to the density and shape of each class. The novelty of the proposed algorithm lies in applying the reliability of the collected training data to the computation of the optimum hyperplane.

References

[1] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[2] Y. LeCun, L. Botou, L. Jackel, H. Drucker, C. Cortes, J. Denker, I. Guyon, U. Muller, E. Sackinger, P. Simard, V. Vapnik, Learning algorithms for classification: a comparison on handwritten digit recognition, Neural Networks (1995) 261-276.
[3] C. Liu, K. Nakashima, H. Sako, H. Fujisawa, Handwritten digit recognition: benchmarking of state-of-the-art techniques, Pattern Recognit. 36 (2003) 2271-2285.
[4] T. Joachims, Text categorization with support vector machines: learning with many relevant features, in: C. Nedellec, C. Rouveirol (Eds.), Proc. Europ. Conf. Mach. Learn., Berlin, Germany, 1998, pp. 137-142.
[5] E. Osuna, R. Freund, F. Girosi, Training support vector machines: an application to face detection, IEEE Conf. Comput. Vis. Pattern Recognit. (1997) 130-136.
[6] S. Avidan, Support vector tracking, IEEE Trans. Pattern Anal. Mach. Intell. 26 (8) (2004) 1064-1072.
[7] D.J. Sebald, J.A. Bucklew, Support vector machine techniques for nonlinear equalization, IEEE Trans. Signal Process. 48 (11) (2000) 3217-3226.
[8] J.-H. Chiang, P.-Y. Hao, Support vector learning mechanism for fuzzy rule-based modeling: a new approach, IEEE Trans. Fuzzy Syst. 12 (1) (2004) 1-12.
[9] Y. Chen, J.Z. Wang, Support vector learning for fuzzy rule-based classification systems, IEEE Trans. Fuzzy Syst. 11 (6) (2003) 716-728.
[10] R. Fletcher, Practical Methods of Optimization, Wiley, New York, 1981.
[11] T.F. Coleman, L.A. Hulbert, A direct active set method for large sparse quadratic programs with simple bounds, Math. Program. 45 (1989) 373-406.
[12] A. Sanderson, A. Shilton, Image tracking and recognition with applications to fisheries: an investigation of support vector machines, Final year honors thesis, Dept. Elect. Electron. Eng., Univ. Melbourne, Melbourne, Australia, 1999.
[13] M. Palaniswami, A. Shilton, D. Ralph, B.D. Owen, Machine learning using support vector machines, in: Proc. Int. Conf. Artificial Intelligence Science and Technology (AISAT), 2000.
[14] A. Shilton, M. Palaniswami, D. Ralph, A.C. Tsoi, Incremental training of support vector machines, in: Proc. Int. Joint Conf. Neural Networks (IJCNN), 2001.
[15] A. Shilton, M. Palaniswami, D. Ralph, A.C. Tsoi, Incremental training of support vector machines, IEEE Trans. Neural Networks 16 (1) (2005) 114-131.
[16] L. Zhang, W. Zhou, L. Jiao, Hidden space support vector machines, IEEE Trans. Neural Networks 15 (6) (2004) 1424-1434.
[17] C.F. Lin, S.D. Wang, Fuzzy support vector machines, IEEE Trans. Neural Networks 13 (2) (2002) 464-471.
[18] I. Guyon, N. Matic, V. Vapnik, Discovering informative patterns and data cleaning, in: U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA, 1996, pp. 181-203.
[19] X. Zhang, Using class-center vectors to build support vector machines, in: Proc. IEEE Workshop Neural Networks Signal Process. (NNSP'99), Madison, WI, 1999, pp. 3-11.
[20] Y.-H. Liu, Y.-T. Chen, Face recognition using total margin-based adaptive fuzzy support vector machines, IEEE Trans. Neural Networks 18 (1) (2007) 178-192.
[21] K. Ikeda, Effects of kernel function on nu support vector machines in extreme cases, IEEE Trans. Neural Networks 17 (1) (2006) 1-9.


[22] K.-M. Lin, C.-J. Lin, A study on reduced support vector machines, IEEE Trans. Neural Networks 14 (6) (2003) 1449-1459.
[23] P. Mitra, C.A. Murthy, S.K. Pal, A probabilistic active support vector learning algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 26 (3) (2004) 413-418.
[24] D. Tsujinishi, S. Abe, Fuzzy least squares support vector machines for multi-class problems, Neural Networks 16 (2003) 785-792.
[25] E. Çomak, K. Polat, S. Gunes, A. Arslan, A new medical decision making system: least square support vector machine (LSSVM) with fuzzy weighting pre-processing, Expert Syst. Appl. 32 (2007) 409-414.
[26] H.-P. Huang, Y.-H. Liu, Fuzzy support vector machine for pattern recognition and data mining, Int. J. Fuzzy Syst. 4 (3) (2002) 826-8235.
[27] T.-Y. Wang, H.-M. Chiang, Fuzzy support vector machine for multi-class text categorization, Inform. Process. Manage. (2006), doi:10.1016/j.ipm.2006.09.011.
[28] C.-H. Yang, L.-C. Jin, L.-Y. Chuang, Fuzzy support vector machines for adaptive Morse code recognition, Med. Eng. Phys. 28 (2006) 925-931.