Solving one-class problem with outlier examples by SVM


Zhigang Wang a,b,*, Zengshun Zhao c, Shifeng Weng d, Changshui Zhang b

a Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin 300384, PR China
b State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Automation, Tsinghua University, Beijing 100084, PR China
c College of Electronics Communication and Physics, Shandong University of Science and Technology, Qingdao 266590, PR China
d School of Electronics and Information, Zhejiang Wanli University, Ningbo 315100, PR China

* Corresponding author at: State Key Laboratory on Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing 100084, PR China. Tel.: +86 10 62796872. E-mail address: [email protected] (Z. Wang).

Article history: Received 20 June 2013; Received in revised form 18 January 2014; Accepted 3 March 2014

Keywords: SVM; SVDD; Data description

Abstract

Support Vector Data Description (SVDD) is an important algorithm for the data description problem. SVDD uses only positive examples to learn a predictor of whether an example is positive or negative. When a fraction of negative examples is available, the performance of SVDD is expected to improve. SVDD-neg, an extension of SVDD, learns a predictor from positive examples together with a fraction of negative ones. However, in some cases SVDD-neg performs worse than SVDD when negative examples are available. In this paper, a new algorithm, "SVM-SVDD", is proposed, in which both the Support Vector Machine (SVM) and SVDD are used to solve the data description problem with negative examples. The experimental results show that SVM-SVDD outperforms SVDD-neg in both training time and accuracy.

1. Introduction

Binary classification is one of the most important problems in machine learning. In the binary classification problem, two classes of examples labeled +1 and -1 are provided in the training step, and the task is to learn a decision function that predicts the label of an unseen example. Several classification algorithms have been developed for this problem, such as SVM [1] and Boosting [2]. However, in some applications only examples from one class are provided, with no or only a few examples from the other class, and a decision function is still required to judge whether an example comes from the given class or not. If an example is very different from the given class, we consider it to come from the non-given class with high likelihood. Here, an example of the given class is called a "positive example" or "target", and an example of the non-given class is called a "negative example" or "outlier". This problem is usually called data description or one-class classification [3]. It typically arises because examples of one class can be collected conveniently while examples of the non-given class are difficult to obtain. The data description problem occurs frequently in real life and cannot be solved directly by binary classification algorithms.

A typical application of data description is a machine monitoring system. Suppose we want to describe the measurements of a machine under normal operating conditions. While the machine works normally we can easily collect many targets, but outliers only become available when the machine goes out of order. For this reason the data description problem is also called outlier detection.

Scholkopf et al. [4] modified the classical two-class SVM and proposed the one-class SVM for the data description problem. The idea of the one-class SVM is to maximize the margin between the given-class examples and the origin in the feature space. The Density Level Detection (DLD) framework [5] was proposed to find a density level set that detects observations not belonging to the given class; following the DLD principle, a modified SVM, DLD-SVM, was developed for one-class classification. The above algorithms are discriminative. Alternatively, the data description problem can be treated as a traditional distribution estimation problem, so existing density estimation algorithms (such as the Parzen window [6] and Gaussian distribution estimation [7]) can be used for one-class classification. However, density estimators usually require many examples to achieve high performance, while in many real settings the data description problem cannot provide sufficient examples; this disadvantage is even more pronounced when each example is high-dimensional (several hundred or even thousands of dimensions). SVDD was developed by Tax and Duin [8] and is built on the hypothesis that examples of the given class lie inside a hypersphere while examples of the non-given class lie outside. SVDD has become a popular method for the data description problem and has been applied successfully in many areas such as remote sensing [9,10], face detection and


recognition [11,12], fault detection [13], and document retrieval [14]. Although several algorithms can learn a function to predict the label of an example using only one class of examples, high accuracy is difficult to achieve because of the scarcity of non-given-class examples. In some real applications a few outliers are usually available, and they can be used to improve over training with targets alone. To deal with data description with negative examples, [8] adapted SVDD into SVDD-neg, which can handle the problem of data description with negative examples. But SVDD-neg often performs worse than SVDD [8]. Worse still, SVDD-neg requires solving a non-convex optimization problem, for which a global optimal solution is difficult to obtain. In this paper a new algorithm, SVM-SVDD, is proposed to solve the data description problem with negative examples. The experimental results show that SVM-SVDD achieves better performance with less training time than SVDD-neg on benchmark data sets.

The remainder of the paper is arranged as follows. Section 2 reviews SVDD-neg. Section 3 proposes the new approach for the one-class problem with negative examples. Section 4 presents our experiments evaluating SVM-SVDD. We conclude in the last section.

2. Review of SVDD and SVDD-neg

A brief introduction to SVDD and SVDD-neg [8] is given here. Suppose a set of targets $\{x_i\}$, $i = 1, 2, \ldots, N$, is given for training. The goal of SVDD is to learn a decision function that predicts whether an example is a target or an outlier. SVDD is built on the hypothesis that the targets are enclosed by a closed boundary in the feature space. The simplest form of closed boundary is a hypersphere, described by two parameters: a center $a$ and a radius $R$. SVDD seeks the hypersphere that encloses the given targets with the smallest radius $R$. This optimization problem is formulated as

$$\min\; R^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad \|x_i - a\|^2 \le R^2 + \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, 2, \ldots, N, \qquad (1)$$

where $C$ penalizes the loss term $\sum_i \xi_i$ and the $\xi_i$ are slack variables. The value of $C$ is determined by the expected upper bound $\nu$ on the fraction of misclassified targets:

$$C = \frac{1}{N\nu}. \qquad (2)$$

The center $a$ is computed as

$$a = \sum_{i=1}^{N}\alpha_i x_i, \quad 0 \le \alpha_i \le C. \qquad (3)$$

The values of $\alpha_i$ fall into three categories:

$$\alpha_i = 0 \;\Rightarrow\; \|x_i - a\|^2 < R^2, \qquad 0 < \alpha_i < C \;\Rightarrow\; \|x_i - a\|^2 = R^2, \qquad \alpha_i = C \;\Rightarrow\; \|x_i - a\|^2 > R^2.$$

To predict an example $v$, the distance between $v$ and $a$ is computed:

$$\|v - a\|^2 \le R^2 \;\Rightarrow\; v \text{ is a target}, \qquad \|v - a\|^2 > R^2 \;\Rightarrow\; v \text{ is an outlier}. \qquad (4)$$

SVDD-neg is an extension of SVDD that handles data description with negative examples. The training set now contains both $N$ targets and $M$ outliers. Intuitively, the given outliers should lie outside the closed boundary that encloses the targets, so the distance between an outlier and $a$ should be larger than $R$ to ensure that targets and outliers are separable in the feature space. Given the training set of $N$ targets $\{x_i\}$, $i = 1, 2, \ldots, N$, and $M$ outliers $\{x_j\}$, $j = N+1, \ldots, N+M$, the optimization objective of SVDD-neg is formulated as

$$\min\; R^2 + C_1\sum_{i=1}^{N}\xi_i + C_2\sum_{j=N+1}^{N+M}\xi_j$$
$$\text{s.t.}\quad \|x_i - a\|^2 \le R^2 + \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, 2, \ldots, N,$$
$$\qquad\;\; \|x_j - a\|^2 \ge R^2 - \xi_j,\ \ \xi_j \ge 0,\ \ j = N+1, N+2, \ldots, N+M,$$
$$C_1 = \frac{1}{\nu_1 N}, \qquad C_2 = \frac{1}{\nu_2 M}. \qquad (5)$$

The values $\nu_1$ and $\nu_2$ are the allowed misclassified fractions of targets and outliers, respectively, in the training step. For example, if rejecting 5% of the targets and accepting 1% of the outliers is acceptable, then $C_1 = 1/(0.05\,N)$ and $C_2 = 1/(0.01\,M)$.

To handle nonlinear separability between targets and outliers, SVDD-neg can be rewritten in terms of inner products. The decision function is

$$\|v - a\|^2 = (v \cdot v) - 2\sum_{i=1}^{N}\alpha_i (v \cdot x_i) + \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j (x_i \cdot x_j) \le R^2, \qquad (6)$$

where $(x \cdot y)$ denotes the inner product between two examples. The inner products can be replaced by a kernel function; polynomial, RBF and sigmoid kernels are used most frequently in practice. In the remainder of this paper we use the RBF kernel by default because of its flexibility.
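To make Eqs. (3), (4) and (6) concrete, the following is a minimal NumPy sketch of the kernelized SVDD decision rule. It assumes the dual coefficients `alpha`, the squared radius `R2` and the RBF width `gamma` have already been obtained from an SVDD solver; these variable names are ours, not the paper's.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def svdd_is_target(v, X, alpha, R2, gamma):
    """Eq. (4)/(6): accept v as a target iff ||v - a||^2 <= R^2,
    with the center a = sum_i alpha_i x_i expressed through kernels."""
    k_vv = rbf_kernel(v, v, gamma)                          # (v . v) term
    k_vx = np.array([rbf_kernel(v, x, gamma) for x in X])   # (v . x_i) terms
    K = np.array([[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X])
    dist2 = k_vv - 2.0 * alpha @ k_vx + alpha @ K @ alpha   # ||v - a||^2
    return dist2 <= R2
```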

3. SVM-SVDD

SVDD-neg and SVM share a similar idea and formulation. If a few outliers can be obtained in the training step, SVDD-neg is intuitively expected to achieve better performance than SVDD. However, SVDD-neg performs worse than SVDD with high likelihood when the target and outlier regions overlap in the feature space [8]. In summary, several factors cause the worse performance of SVDD-neg. First, SVDD-neg forces the given targets inside the hypersphere and the outliers outside, but there is no separation gap between outliers and targets, which easily leads to overlap between the two classes; in contrast, SVM places a margin between the two classes that pushes them as far apart as possible. Second, the closed boundary in SVDD-neg has two tasks, enclosing most of the targets and keeping the targets far from the outliers, and SVDD-neg has difficulty accomplishing both well at the same time. Additionally, the objective function of SVDD-neg (Eq. (5)) is more difficult to solve than that of SVDD (Eq. (1)). With outliers in the training set, Eq. (5) becomes a non-convex program, which is likely to converge to a local minimum and rarely reaches the global minimum. If a global optimal solution is required, advanced optimization algorithms (for example [15,16]) can be applied to Eq. (5), but no algorithm guarantees the global optimum of a non-convex program. By contrast, Eq. (1) is a convex optimization problem whose global optimum is easy to obtain.

Having analyzed in detail why the performance of SVDD-neg degrades in some cases when outliers are added, we design an algorithm for the data description problem with negative examples that uses SVM to improve the performance of SVDD.

SVM [1,17] is a popular algorithm for the binary classification problem. Two classes of examples (positive and negative) are given in the training step; positive examples are labeled +1 and negative examples -1. The number of positive examples is $N$ and the number of negative examples is $M$. The set $\{x_i\}$, $i = 1, \ldots, N+M$, contains the given positive and negative examples for training.


The corresponding labels are $\{y_i\}$, $i = 1, \ldots, N+M$, with $y_i = +1$ for $i = 1, \ldots, N$ and $y_i = -1$ for $i = N+1, \ldots, N+M$. SVM learns a decision function that predicts the label of an example. The optimization formulation of SVM is [1]

$$\min\; \frac{\|w\|^2}{2} + C\sum_{i=1}^{N+M}\xi_i$$
$$\text{s.t.}\quad w \cdot x_i + b \ge 1 - \xi_i,\ \ i = 1, \ldots, N,$$
$$\qquad\;\; w \cdot x_i + b \le -1 + \xi_i,\ \ i = N+1, \ldots, N+M,\ \ \xi_i \ge 0, \qquad (7)$$

where the $\xi_i$ are slack variables and $C$ controls the fraction of misclassified training examples.

Based on SVM and SVDD, we propose a new algorithm, SVM-SVDD, with the following formulation:

$$\min_{w, r}\; \frac{\|w\|^2}{2} + r^2 + C_0\sum_{k=1}^{N}\zeta_k + C_1\sum_{i=1}^{N}\xi_i + C_2\sum_{j=N+1}^{N+M}\xi_j$$
$$\text{s.t.}\quad w \cdot x_i + b \ge 1 - \xi_i,\ \ i = 1, \ldots, N,$$
$$\qquad\;\; w \cdot x_j + b \le -1 + \xi_j,\ \ j = N+1, \ldots, N+M,$$
$$\qquad\;\; \|x_k - a\|^2 \le r^2 + \zeta_k,\ \ k = 1, 2, \ldots, N,$$
$$\qquad\;\; \xi_i \ge 0,\ \xi_j \ge 0,\ \zeta_k \ge 0. \qquad (8)$$

SVM-SVDD (Eq. (8)) solves the data description problem with negative examples in place of SVDD-neg (Eq. (5)). The goal of Eq. (8) is to find both a hyperplane with the analytic form $y = w \cdot x + b$ and a hypersphere with the analytic form $\|x - a\|^2 \le r^2$: the hyperplane separates targets from outliers while the hypersphere encloses the targets. A large value is advised for $C_0$ so that more targets are enclosed inside the hypersphere; $C_1$ and $C_2$ adjust the error fractions on targets and outliers, respectively. Both $\xi$ and $\zeta$ are slack variables. Eq. (8) can be reformulated in dual form:

$$\min_{\alpha, \beta}\; \frac{1}{2}\sum_{i,j=1}^{N+M} y_i y_j \alpha_i \alpha_j\, \mathrm{ker}_1(x_i, x_j) + \sum_{i,j=1}^{N}\beta_i\beta_j\, \mathrm{ker}_2(x_i, x_j) - \sum_{i=1}^{N+M}\alpha_i - \sum_{i=1}^{N}\beta_i\, \mathrm{ker}_2(x_i, x_i)$$
$$\text{s.t.}\quad \sum_{i=1}^{N+M} y_i\alpha_i = 0, \quad 0 \le \alpha_i \le C_1\ (i = 1, \ldots, N), \quad 0 \le \alpha_i \le C_2\ (i = N+1, \ldots, N+M),$$
$$\qquad\;\; \sum_{i=1}^{N}\beta_i = 1, \quad 0 \le \beta_i \le C_0, \qquad (9)$$

where $\mathrm{ker}_1(\cdot,\cdot)$ and $\mathrm{ker}_2(\cdot,\cdot)$ are two kernel functions and $y_i = +1$ for $i = 1, \ldots, N$, $y_i = -1$ for $i = N+1, \ldots, N+M$. Allowing two different kernel functions makes SVM-SVDD more flexible. The variables $\alpha = (\alpha_1, \ldots, \alpha_{N+M})^T$ and $\beta = (\beta_1, \ldots, \beta_N)^T$ are solved for separately in Eq. (9) because $\alpha$ and $\beta$ are independent: if $\beta$ is fixed, Eq. (9) is a convex quadratic program in $\alpha$, and if $\alpha$ is fixed it is convex in $\beta$. Because Eq. (9) splits into two independent quadratic programs, SVM (Eq. (7)) and SVDD (Eq. (1)), its time complexity is the sum of the two. The time complexities of SVM and SVDD are $O((N_{sv} + M_{sv})^3)$ and $O(N^3)$, where $N_{sv}$ and $M_{sv}$ are the numbers of support vectors among the $N$ targets and $M$ outliers, so the time complexity of Eq. (9) is $O((N_{sv} + M_{sv})^3 + N^3)$. By comparison, Eq. (5) is a non-convex quadratic program whose time complexity is $O((N+M)^3)$. In most cases $M_{sv} + N_{sv} \ll M + N$, so SVM-SVDD usually obtains its solution in less time than SVDD-neg.

An example $v$ is predicted as a target only when both Eq. (10) and Eq. (11) hold:

$$f_1(v) = \mathrm{sign}\left(\sum_{i=1}^{N+M} y_i\alpha_i\, \mathrm{ker}_1(x_i, v) + b\right) = +1, \qquad (10)$$

$$f_2(v) = \mathrm{sign}\left(r^2 - \mathrm{ker}_2(v, v) + 2\sum_{i=1}^{N}\beta_i\, \mathrm{ker}_2(v, x_i) - \sum_{i=1}^{N}\sum_{j=1}^{N}\beta_i\beta_j\, \mathrm{ker}_2(x_i, x_j)\right) = +1. \qquad (11)$$

If $f_1(v) = -1$ or $f_2(v) = -1$, $v$ is predicted to be an outlier. In Eq. (8), $C_0$, $C_1$ and $C_2$ are set as follows:

$$C_0 = \frac{1}{\nu_0 N}, \qquad C_1 = \frac{1}{\nu_1 N}, \qquad C_2 = \frac{1}{\nu_2 M}, \qquad (12)$$

where $\nu_0$ and $\nu_1$ are the upper bounds on the fraction of rejected targets in SVDD and SVM, respectively, and $\nu_2$ is the upper bound on the fraction of accepted outliers in SVM. The upper bound on the fraction of rejected targets in the training step of SVM-SVDD satisfies the following.

Theorem 1. In Eq. (8), $\min(\nu_0, \nu_1)$ is the upper bound on the fraction of rejected targets.

Proof. According to [1,8], $\nu_0$ and $\nu_1$ are the upper bounds on the fraction of rejected targets for SVDD and SVM, respectively. An example is predicted as a target by SVM-SVDD only when it is accepted by both SVM (Eq. (10)) and SVDD (Eq. (11)). So $\min(\nu_0, \nu_1)$ is the upper bound on the fraction of rejected targets. □

Figs. 1–3 illustrate the merits of SVM-SVDD over SVDD-neg; a linear kernel is chosen for a clear illustration. In Figs. 1–3 the large circle encloses both the target area (white) and the outlier area (the black part on the left), and the aim is to obtain a region containing no outliers and as many targets as possible. The inner circle (centered at the cross sign) is the result of SVDD-neg. The solid (not dashed) closed boundary, formed by the solid part of the straight line and the solid part of the circle, is the decision boundary of SVM-SVDD. The closed boundary of SVM-SVDD clearly contains more of the target area (white) than that of SVDD-neg while still excluding all of the outlier area. By comparison, the SVDD-neg circle misses part of the white area, which suggests that SVM-SVDD reduces the number of false negatives while keeping the number of false positives similar to that of SVDD-neg.

There are thus several reasons why SVM-SVDD performs better than SVDD-neg. Having analyzed the disadvantages of SVDD-neg in detail, let us examine whether they are overcome in SVM-SVDD. First, the $\|w\|^2/2$ term of Eq. (8) corresponds to the margin in SVM; the margin keeps the targets (positive class) far from the outliers (negative class), so SVM-SVDD has stronger discriminative ability than SVDD-neg. Second, SVDD-neg requires the target boundary to be closed, whereas the hyperplane from SVM has no such restriction and can give an open or a closed boundary; a closed boundary is not suited to separation in some cases, which suggests that the hyperplane is more flexible than the hypersphere. In addition, the original SVDD-neg is a non-convex problem that is difficult to solve to global optimality, while the SVM-SVDD formulation can be handled by solving two convex problems, which is much easier.
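Because Eq. (9) splits into an SVM part and an SVDD part that are trained independently, and an example is accepted only when Eqs. (10) and (11) both hold, the method can be prototyped with off-the-shelf solvers. The sketch below is a minimal illustration using scikit-learn, with OneClassSVM (the ν one-class SVM, which coincides with SVDD under an RBF kernel) standing in for the SVDD component; the function names, parameter values and data layout are our assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

def train_svm_svdd(X_target, X_outlier, nu0=0.05, gamma=0.5, C1=10.0):
    """Train the two convex sub-problems separately: an SVM on targets (+1)
    vs. outliers (-1), and a one-class model on targets only (a stand-in
    for the SVDD part of Eq. (8))."""
    X = np.vstack([X_target, X_outlier])
    y = np.hstack([np.ones(len(X_target)), -np.ones(len(X_outlier))])
    svm = SVC(kernel="rbf", gamma=gamma, C=C1).fit(X, y)                # Eq. (7)
    svdd = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu0).fit(X_target)  # Eq. (1)
    return svm, svdd

def predict_svm_svdd(svm, svdd, V):
    """Eqs. (10)-(11): V[i] is predicted a target only if both models accept it."""
    f1 = svm.predict(V) == 1    # on the target side of the hyperplane
    f2 = svdd.predict(V) == 1   # inside the enclosing boundary
    return np.where(f1 & f2, 1, -1)
```

Different misclassification costs for targets and outliers (the $C_1$ and $C_2$ of Eq. (12)) could be approximated through SVC's `class_weight` parameter.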

Fig. 1. The distributions of targets and outliers. We consider the area inside the large circle, which contains both distributions: the white area contains targets and the black area contains outliers. The task is to learn a closed boundary that encloses the targets and excludes the outliers.


The initial motivation of SVDD is to solve the data description problem without negative examples, a problem that SVM cannot solve directly, so SVDD succeeds in the case with only positive examples. However, SVDD is not well equipped to exploit negative examples. In contrast, SVM treats targets and outliers as two classes of equal importance, but it only finds a kernelized hyperplane separating them, whereas what is needed is a boundary that describes the distribution of the target class accurately; SVM alone therefore cannot solve the data description problem with negative examples either. Our proposed approach keeps the advantages of both SVDD and SVM while avoiding their disadvantages, so SVM-SVDD can achieve higher performance than SVDD-neg.

Fig. 2. The inner circle (centered at the cross sign) is the result of SVDD-neg with a linear kernel. The inner circle excludes all of the outlier area at the cost of leaving some target area outside the circle.

Fig. 3. The solid (not dashed) boundary is the result of SVM-SVDD. This solution includes all of the target area without any outlier inside.

4. Experiments

In this section, SVM-SVDD is evaluated on benchmark data sets from the UCI machine learning repository [20]; the toolkits "libsvm" [18] and "dd_tools" [19] are used in our experiments. The data sets are described in Table 1; each of them contains both targets and outliers. A five-fold cross-validation strategy is applied for training and testing.

Table 1
Data sets description.

Dataset          Example number (target/outlier)   Dimension
a1a              395/1210                           123
australian       307/383                            14
breast-cancer    239/444                            10
diabetes         500/268                            8
haberman         225/81                             3
german-credit    700/300                            24

Table 2 reports the accuracy and training speed of both SVM-SVDD and SVDD-neg; the results are mean values over 10 runs, and the parameters of both methods are tuned by grid search. The second column of Table 2 gives the training time in seconds. The accuracy results comprise three indices: target error (the first error), outlier error (the second error) and total error. They are defined from four quantities, TT (True Target), TO (True Outlier), FT (False Target) and FO (False Outlier), as follows [17]:

$$\text{target error} = \frac{FO}{TT + FO}, \qquad \text{outlier error} = \frac{FT}{TO + FT}, \qquad \text{total error} = \frac{FO + FT}{TT + TO + FT + FO}. \qquad (13)$$
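As a small illustration of Eq. (13), the function below computes the three error indices from labeled predictions, using +1 for targets and -1 for outliers (the variable names are ours).

```python
def description_errors(y_true, y_pred):
    """Eq. (13): y_true/y_pred use +1 for target and -1 for outlier.
    FO = targets predicted as outliers, FT = outliers predicted as targets."""
    TT = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    FO = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    TO = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    FT = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    target_error = FO / (TT + FO)
    outlier_error = FT / (TO + FT)
    total_error = (FO + FT) / (TT + TO + FT + FO)
    return target_error, outlier_error, total_error
```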

By comparison, SVM-SVDD finishes the training step in less time than SVDD-neg and also achieves higher accuracy. In Table 2, SVM-SVDD obtains a large improvement in target error at the cost of a smaller loss in outlier error relative to SVDD-neg. The ROC curves of SVM-SVDD and SVDD-neg in Fig. 4 further illustrate the advantage of SVM-SVDD: each ROC curve of SVDD-neg falls below that of SVM-SVDD, which implies that SVM-SVDD achieves a higher target acceptance rate at the same outlier acceptance rate.
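The ROC curves in Fig. 4 plot the fraction of targets accepted (TP) against the fraction of outliers accepted (FP) as a decision threshold is swept over a continuous score. The paper does not state which score is swept for SVM-SVDD; assuming some real-valued score where larger means "more target-like" (for example the SVM decision value), such a curve can be computed with scikit-learn as in this sketch.

```python
import numpy as np
from sklearn.metrics import roc_curve

# +1 = target, -1 = outlier; scores: larger means "more target-like"
# (e.g. an SVM decision value -- an assumption, not specified in the paper).
y_true = np.array([1, 1, 1, -1, 1, -1, -1])
scores = np.array([0.9, 0.4, 0.7, -0.3, 0.1, 0.2, -0.8])

# fpr = fraction of outliers accepted (FP), tpr = fraction of targets accepted (TP)
fpr, tpr, thresholds = roc_curve(y_true, scores, pos_label=1)
```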

Table 2
Comparison of training time and accuracy (%) between SVM-SVDD and SVDD-neg. In each cell, the number before "/" is the result of SVM-SVDD and the number after "/" is that of SVDD-neg.

Data sets        Training time (s)   Target error (%)   Outlier error (%)   Total error (%)
a1a              0.883/22.144        36.27/32.19        13.64/17.09         19.08/20.73
australian       0.144/1.399         17.32/31.58        11.21/11.74         13.92/20.57
breast-cancer    0.118/0.522         1.26/12.52         2.93/2.71           2.2/6.15
diabetes         0.165/2.219         13.2/27.8          41.38/40.68         23.05/32.30
haberman         0.102/0.498         8.44/26.22         66.54/60.29         23.86/35.29
german-credit    0.3131/1.8513       10.43/27.29        55.33/48.67         23.90/33.70


[Fig. 4: six ROC curve panels plotting targets accepted (TP) against outliers accepted (FP) for SVM-SVDD and SVDD-neg.]

Fig. 4. ROC curve comparison between SVDD-neg and SVM-SVDD on the data sets: (a) a1a, (b) australian, (c) breast-cancer, (d) diabetes, (e) german-credit and (f) haberman.

5. Conclusion and future work

SVDD-neg aims to improve on SVDD trained with only targets by finding a hypersphere that contains the targets inside and keeps the available outliers outside. This task is difficult to accomplish when some outliers are available for training, and the non-convex formulation of SVDD-neg is hard to solve and time consuming. We propose the algorithm SVM-SVDD to solve data description with negative examples efficiently; its objective function can be optimized by solving two convex quadratic programming problems. The experimental results show that SVM-SVDD outperforms SVDD-neg in both prediction accuracy and training time. In future work, we expect that the training time can be reduced further without much loss of accuracy if dimensionality reduction approaches [21–23] are utilized. In addition, other neural network optimization methods such as [24–30] can also be utilized.


Acknowledgments

This work is supported by the 973 Program (2013CB329503), NSFC (Grant no. 91120301) and the Beijing Municipal Education Commission Science and Technology Development Plan key project under Grant KZ201210005007.

References

[1] V. Vapnik, Statistical Learning Theory, Wiley, New York, NY, 1998.
[2] Y. Freund, R. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, in: Computational Learning Theory, Springer, 1995, pp. 23–37.
[3] M. Moya, M. Koch, L. Hostetler, One-Class Classifier Networks for Target Recognition Applications, Technical Report SAND-93-0084C, Sandia National Labs., Albuquerque, NM (United States), 1993.
[4] B. Scholkopf, R. Williamson, A. Smola, J. Shawe-Taylor, SV estimation of a distribution's support, Adv. Neural Inf. Process. Syst. 41 (1999) 42–44.
[5] I. Steinwart, D. Hush, C. Scovel, A classification framework for anomaly detection, J. Mach. Learn. Res. 6 (1) (2006) 211.
[6] L. Tarassenko, P. Hayton, N. Cerneaz, M. Brady, Novelty detection for the identification of masses in mammograms, in: Fourth International Conference on Artificial Neural Networks, 1995, pp. 442–447.
[7] L. Parra, G. Deco, S. Miesbach, Statistical independence and novelty detection with information preserving nonlinear maps, Neural Comput. 8 (2) (1996) 260–269.
[8] D. Tax, R. Duin, Support vector data description, Mach. Learn. 54 (1) (2004) 45–66.
[9] C. Sanchez-Hernandez, D. Boyd, G. Foody, One-class classification for mapping a specific land-cover class: SVDD classification of Fenland, IEEE Trans. Geosci. Remote Sens. 45 (4) (2007) 1061–1073.
[10] W. Sakla, A. Chan, J. Ji, A. Sakla, An SVDD-based algorithm for target detection in hyperspectral imagery, IEEE Geosci. Remote Sens. Lett. 8 (2) (2011) 384–388.
[11] J. Seo, H. Ko, Face detection using support vector domain description in color images, in: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'04), vol. 5, IEEE, 2004, pp. V-729.
[12] S. Lee, J. Park, S. Lee, Low resolution face recognition based on support vector data description, Pattern Recognit. 39 (9) (2006) 1809–1812.
[13] H. Luo, J. Cui, Y. Wang, A SVDD approach of fuzzy classification for analog circuit fault diagnosis with FWT as preprocessor, Expert Syst. Appl. 38 (8) (2011) 10554–10561.
[14] T. Onoda, H. Murata, S. Yamada, Non-relevance feedback document retrieval based on one class SVM and SVDD, in: International Joint Conference on Neural Networks (IJCNN'06), IEEE, 2006, pp. 1212–1219.
[15] L. Cheng, Z.-G. Hou, Y. Lin, M. Tan, W.C. Zhang, F.-X. Wu, Recurrent neural network for non-smooth convex optimization problems with application to the identification of genetic regulatory networks, IEEE Trans. Neural Netw. 22 (5) (2011) 714–726.
[16] X. Hu, J. Wang, An improved dual neural network for solving a class of quadratic programming problems and its k-winners-take-all application, IEEE Trans. Neural Netw. 19 (12) (2008) 2022–2031.
[17] K. Veropoulos, C. Campbell, N. Cristianini, Controlling the sensitivity of support vector machines, in: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI99), 1999.
[18] http://www.csie.ntu.edu.tw/~cjlin/libsvm/
[19] http://ict.ewi.tudelft.nl/~davidt/dd_tools.html
[20] http://archive.ics.uci.edu/ml/
[21] C. Hou, C. Zhang, Y. Wu, Y. Jiao, Stable local dimensionality reduction approaches, Pattern Recognit. 42 (9) (2009) 2054–2066.
[22] C. Hou, C. Zhang, Y. Wu, F. Nie, Multiple view semi-supervised dimensionality reduction, Pattern Recognit. 43 (3) (2010) 720–730.
[23] C. Hou, F. Nie, C. Zhang, D. Yi, Y. Wu, Multiple rank multi-linear SVM for matrix data classification, Pattern Recognit. 47 (1) (2014) 454–469.
[24] Y. Xia, J. Wang, A recurrent neural network for solving nonlinear convex programs subject to linear constraints, IEEE Trans. Neural Netw. 16 (2) (2005) 379–386.
[25] Y. Xia, G. Feng, J. Wang, A novel recurrent neural network for solving nonlinear optimization problems with inequality constraints, IEEE Trans. Neural Netw. 19 (8) (2008) 1340–1353.
[26] J. Wang, Analysis and design of k-winners-take-all model with a single state variable and Heaviside step activation function, IEEE Trans. Neural Netw. 21 (9) (2010) 1496–1506.
[27] L. Cheng, Z.-G. Hou, M. Tan, A delayed projection neural network for solving linear variational inequalities, IEEE Trans. Neural Netw. 20 (6) (2009) 915–925.
[28] L. Cheng, Z.-G. Hou, M. Tan, Solving linear variational inequalities by projection neural network with time-varying delays, Phys. Lett. A 373 (20) (2009) 1739–1743.
[29] L. Cheng, Z.-G. Hou, M. Tan, A neutral-type delayed projection neural network for solving nonlinear variational inequalities, IEEE Trans. Circuits Syst. II: Express Briefs 55 (8) (2008) 806–810.
[30] L. Cheng, Z.-G. Hou, M. Tan, Relaxation labeling using an improved Hopfield neural network, in: Intelligent Computing in Signal Processing and Pattern Recognition, Springer, 2006, pp. 430–439.

Zhigang Wang is a teacher at Tianjin University of Technology. He received his Ph.D. degree from the Department of Automation, Tsinghua University. His research interests include machine learning, computer vision and pattern recognition.

Changshui Zhang received the B.S. degree in mathematics from Peking University, Beijing, China, in 1986, and the M.S. and Ph.D. degrees in control science and engineering from Tsinghua University, Beijing, in 1989 and 1992, respectively. He joined the Department of Automation, Tsinghua University, in 1992, and is currently a Professor. He has authored more than 200 papers. His current research interests include pattern recognition and machine learning. Prof. Zhang is currently an Associate Editor of the Pattern Recognition Journal. He is a member of the Standing Council of the Chinese Association of Artificial Intelligence.

Shifeng Weng received his Ph.D. degree from the Department of Automation, Tsinghua University. He now works in the School of Electronics and Information, Zhejiang Wanli University. His research interests include machine learning, computer vision and pattern recognition.

Zengshun Zhao received the Ph.D. degree in control engineering from the Institute of Automation, Chinese Academy of Sciences, in 2007. He is currently an associate professor at the College of Information and Electrical Engineering, Shandong University of Science and Technology, Qingdao, China. In 2011, he worked as a visiting scientist with Prof. C.S. Zhang at Tsinghua University. His research interests include machine learning, pattern recognition, computer vision and intelligent robots.
