Neurocomputing 129 (2014) 146–152
A new classification model using privileged information and its application

Zhiquan Qi^a, Yingjie Tian^a,*, Yong Shi^{a,b}

^a Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China
^b College of Information Science & Technology, University of Nebraska at Omaha, Omaha, NE 68182, USA
Article history: Received 11 May 2013; received in revised form 16 September 2013; accepted 21 September 2013; available online 29 October 2013. Communicated by Xianbin Cao.

Abstract
In human behavior and cognition, teachers always play an important role. In the field of machine learning, however, the information offered by a teacher is seldom used. In this paper, inspired by Vapnik et al., we propose a fast learning model using privileged information, which solves two smaller-sized Linear Programming (LP) models in place of one larger Quadratic Programming (QP) model and uses two nonparallel hyperplanes to construct the final classifier. We then introduce the learning model using privileged information (LUPI) into the Visual Object Tracking (VOT) field, where it can accelerate the convergence rate of learning and effectively improve tracking quality. In detail, we give a clear definition of the privileged information for the VOT problem and propose a simple but effective online object tracking algorithm using privileged information. All experimental results show the robustness and effectiveness of the proposed method, and at the same time show that the privileged information provides great help for further improving tracking quality.
Keywords: Classification; Kernel learning; Privileged information
1. Introduction

Recently, machine learning methods have been widely applied in the computer vision field [1–3]. For the traditional supervised binary classification problem, the learner's goal is to find a decision function with small generalization error on the unknown test examples, given training data $\{(x_1, y_1), \dots, (x_l, y_l)\}$, where $x_i \in \mathbb{R}^n$ is the input and $y_i \in \{-1, 1\}$ is the corresponding label. In human learning, however, there is also a lot of teacher information, such as explanations, comments and comparisons, accompanying the examples. Vapnik et al. [4–6] called this kind of additional prior information privileged information; it is only available at the training stage and never available for the test set. They proposed a new learning model, called Learning Using Privileged Information (LUPI), which can accelerate the convergence rate of learning, especially when the learning problem itself is hard. This model has been proven to be a powerful machine learning technique in theory [4–6]. Compared with semi-supervised learning, in the optimistic case the LUPI model can improve the bound on the probability of test error from $O(1/\sqrt{l})$ to $O(1/l)$, where $l$ is the number of training data. In contrast, the semi-supervised learning model with $l$ labeled and $n$ unlabeled
examples can only reach the bound $O(1/\sqrt{l+n})$; the LUPI model can thus significantly outperform it [7]. Though the LUPI method has had great success, the number of variables in the LUPI-SVM model is more than twice that of the standard SVM, and it usually requires solving a more difficult optimization problem than the standard SVM [7]. In this paper, we propose a Fast Twin Support Vector Machine Using Privileged Information (called FTSVMPI), whose classifiers can be obtained by solving two smaller LP models. Another important contribution of this paper is that we introduce the concept of privileged information into the VOT field for the first time, and at the same time design a simple but effective VOT algorithm using privileged information (called O2TUPI1). Experimental results on both digit recognition datasets and visual object tracking datasets show the robustness and effectiveness of the proposed methods, and also show that O2TUPI1 achieves superior results with real-time performance with the help of the privileged information.

The paper is organized as follows. In Section 2, we briefly review the mechanism of Learning Using Privileged Information (LUPI) proposed by Vapnik and Vashist [5], and then give our new model, the Fast Twin Support Vector Machine Using Privileged Information (FTSVMPI), for both the linear and nonlinear cases. The application to the VOT problem and all experimental results are described in Section 3. Finally, Section 4 gives the conclusions.

All mathematical notation and corresponding background material are described in the following. All vectors will be column
vectors unless transposed to a row vector by a superscript $\top$. For a vector $x$ in the $n$-dimensional real space $\mathbb{R}^n$, the sign function is defined componentwise by $\operatorname{sgn}(x_i) = 1$ if $x_i > 0$ and $\operatorname{sgn}(x_i) = -1$ if $x_i \le 0$, for $i = 1, \dots, n$. The inner product of two vectors $x$ and $y$ in $\mathbb{R}^n$ will be denoted by $(x \cdot y)$ and the square of the 2-norm of $w$ will be denoted by $\|w\|_2^2$. For a matrix $A \in \mathbb{R}^{m \times n}$, $A_i$ is the $i$th row of $A$, which is a row vector in $\mathbb{R}^n$, while $A_{\cdot j}$ is the $j$th column of $A$. A column vector of ones of arbitrary dimension will be denoted by $e$. For $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times k}$, the kernel $K(A, B)$ maps $\mathbb{R}^{m \times n} \times \mathbb{R}^{n \times k}$ into $\mathbb{R}^{m \times k}$. In particular, if $x$ and $y$ are column vectors in $\mathbb{R}^n$, then $K(x, y)$ is a real number, $K(x^\top, A^\top)$ is a row vector in $\mathbb{R}^m$, and $K(A, A^\top)$ is an $m \times m$ matrix.
2. Fast twin support vector machine using privileged information (FTSVMPI)

2.1. Learning model using privileged information (LUPI) [4]

In order to explain the basic idea of LUPI, we first introduce the definition of the oracle function [4].

Different from the standard binary classification problem, LUPI is given a training set of the form
$$T = \{(x_1, x_1^*, y_1), \dots, (x_l, x_l^*, y_l)\}, \qquad (1)$$
where $x_i \in \mathbb{R}^n$, $x_i^* \in \mathbb{R}^m$, $y_i \in \{-1, 1\}$, $i = 1, \dots, l$, and the privileged information $x_i^*$ is only included in the training input $(x_i, x_i^*)$, never in any testing input $x$. The goal is to find a real-valued function $g(x)$ in $\mathbb{R}^n$ such that the value of $y$ for any $x$ can be predicted by the decision function
$$f(x) = \operatorname{sgn}(g(x)). \qquad (2)$$

Definition 2.1 (Oracle function). Given a traditional classification problem with the training set
$$T = \{(x_1, y_1), \dots, (x_l, y_l)\}, \qquad (3)$$
suppose there exists the best but unknown linear hyperplane
$$(w_0 \cdot x) + b_0 = 0. \qquad (4)$$
The oracle function $\xi(x)$ of the input $x$ is defined as follows:
$$\xi^0 = \xi(x) = [1 - y((w_0 \cdot x) + b_0)]_+, \qquad (5)$$
where
$$[\eta]_+ = \begin{cases} \eta, & \text{if } \eta \ge 0, \\ 0, & \text{otherwise.} \end{cases} \qquad (6)$$

If we could know the value of the oracle function on each training input $x_i$, i.e., if we knew the triplets $(x_i, \xi_i^0, y_i)$ with $\xi_i^0 = \xi(x_i)$, $i = 1, \dots, l$, we could accelerate the learning rate. In fact, however, a teacher knows neither the values of the slacks nor the oracle function. Instead, Vapnik et al. use a so-called correcting function to approximate the oracle function. In the linear case,
$$\phi(x^*) = (w^* \cdot x^*) + b^*. \qquad (7)$$
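To make Eqs. (5) and (6) concrete, here is a minimal numerical sketch (our own illustration, not part of the original model). It evaluates the oracle slack for a hand-picked hyperplane $(w_0, b_0)$, which in practice is the best but unknown hyperplane:

```python
import numpy as np

def plus(eta):
    """The plus function [eta]_+ from Eq. (6)."""
    return np.maximum(eta, 0.0)

def oracle_slack(x, y, w0, b0):
    """Oracle function xi(x) = [1 - y((w0 . x) + b0)]_+ from Eq. (5).

    Assumes the best (in practice unknown) hyperplane (w0, b0) is given.
    """
    return plus(1.0 - y * (np.dot(w0, x) + b0))

# Toy check: a point correctly classified with margin >= 1 has zero slack.
w0, b0 = np.array([1.0, -1.0]), 0.0
print(oracle_slack(np.array([2.0, 0.0]), +1, w0, b0))  # 0.0
print(oracle_slack(np.array([0.5, 0.0]), +1, w0, b0))  # 0.5
```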
Replacing $\xi_i$ ($i = 1, \dots, l$) by $\phi(x_i^*)$ in the primal problem of SVM, we can get the following primal problem:
$$\min_{w, w^*, b, b^*} \; \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{l} [(w^* \cdot x_i^*) + b^*],$$
$$\text{s.t.} \quad y_i[(w \cdot x_i) + b] \ge 1 - [(w^* \cdot x_i^*) + b^*],$$
$$(w^* \cdot x_i^*) + b^* \ge 0, \quad i = 1, \dots, l. \qquad (8)$$

The corresponding dual problem is as follows:
$$\max_{\alpha, \beta} \; \sum_{j=1}^{l} \alpha_j - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j),$$
$$\text{s.t.} \quad \sum_{i=1}^{l} \alpha_i y_i = 0,$$
$$\sum_{i=1}^{l} (\alpha_i + \beta_i - C) = 0,$$
$$\sum_{i=1}^{l} (\alpha_i + \beta_i - C)\, x_i^* = 0,$$
$$\alpha_i \ge 0, \; \beta_i \ge 0, \quad i = 1, \dots, l. \qquad (9)$$
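As an illustration of how the primal (8) can be posed to an off-the-shelf solver, the following sketch uses the CVXPY modeling package. This is our assumption for illustration only (the experiments in Section 3 solve the corresponding problems with LOQO and CVX), and the function name is a hypothetical helper:

```python
import cvxpy as cp

def svm_plus_primal(X, X_star, y, C=1.0):
    """A sketch of the linear LUPI primal (8): the slacks are modeled by
    the correcting function phi(x*) = (w* . x*) + b* of Eq. (7).

    X: (l, n) inputs; X_star: (l, m) privileged inputs; y: (l,) labels in {-1, +1}.
    """
    l, n = X.shape
    m = X_star.shape[1]
    w, b = cp.Variable(n), cp.Variable()
    w_s, b_s = cp.Variable(m), cp.Variable()
    corr = X_star @ w_s + b_s                       # phi(x_i*), Eq. (7)
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(corr))
    constraints = [cp.multiply(y, X @ w + b) >= 1 - corr,   # margin constraint
                   corr >= 0]                               # nonnegative slacks
    cp.Problem(objective, constraints).solve()
    return w.value, b.value, w_s.value, b_s.value
```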
For the nonlinear case, introducing two transformations $x = \Phi(x): \mathbb{R}^n \to H$ and $x^* = \Phi^*(x^*): \mathbb{R}^m \to H^*$, the primal problem is constructed as follows:
$$\min_{w, w^*, b, b^*} \; \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{l} [(w^* \cdot \Phi^*(x_i^*)) + b^*],$$
$$\text{s.t.} \quad y_i[(w \cdot \Phi(x_i)) + b] \ge 1 - [(w^* \cdot \Phi^*(x_i^*)) + b^*],$$
$$(w^* \cdot \Phi^*(x_i^*)) + b^* \ge 0, \quad i = 1, \dots, l. \qquad (10)$$

Similarly, we can give its dual programming:
$$\min_{\alpha, \beta} \; \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j) - \sum_{j=1}^{l} \alpha_j,$$
$$\text{s.t.} \quad \sum_{i=1}^{l} \alpha_i y_i = 0,$$
$$\sum_{i=1}^{l} (\alpha_i + \beta_i - C) = 0,$$
$$\sum_{i=1}^{l} (\alpha_i + \beta_i - C) K^*(x_i^*, x_j^*) = 0, \quad j = 1, \dots, l,$$
$$\alpha_i \ge 0, \; \beta_i \ge 0, \quad i = 1, \dots, l. \qquad (11)$$

2.2. FTSVMPI

Let us reconsider the above classification problem with $l_1$ positive points and $l_2$ negative points. Suppose that the positive training points and their additional information (privileged information) are denoted by $A \in \mathbb{R}^{l_1 \times n}$ and $A^* \in \mathbb{R}^{l_1 \times m}$, where each row of $A$ and $A^*$ represents a training point and its additional information, respectively. Similarly, $B \in \mathbb{R}^{l_2 \times n}$ and $B^* \in \mathbb{R}^{l_2 \times m}$ represent all the data points and their additional information belonging to the negative class:
$$A \in \mathbb{R}^{l_1 \times n}, \quad A^* \in \mathbb{R}^{l_1 \times m}, \quad B \in \mathbb{R}^{l_2 \times n}, \quad B^* \in \mathbb{R}^{l_2 \times m}. \qquad (12)$$

2.2.1. Linear case

Similar to [8–14], in order to improve the training speed of LUPI, we first use two small models to construct the classifier. Replacing the slack variables by $\phi(A_i^*)$ and $\phi(B_i^*)$ in the primal problem of TWSVM (twin support vector machine) [8], we use two linear correcting functions to approximate the related oracle functions:
$$\phi(A_i^*) = (w_-^* \cdot A_i^*) + b_-^*, \qquad \phi(B_i^*) = (w_+^* \cdot B_i^*) + b_+^*, \qquad (13)$$
where $w_+^*, w_-^* \in \mathbb{R}^m$, $b_+^*, b_-^* \in \mathbb{R}$, and $(\cdot)$ is the dot product operation. The corresponding models can be formulated as
$$\min_{w_+, w_+^*, b_+, b_+^*} \; \frac{1}{2}\|A w_+ + e_+ b_+\|_2^2 + c_1 e_-^\top (B^* w_+^* + e_- b_+^*),$$
$$\text{s.t.} \quad -(B w_+ + e_- b_+) \ge e_- - (B^* w_+^* + e_- b_+^*),$$
$$B^* w_+^* + e_- b_+^* \ge 0, \qquad (14)$$
and
$$\min_{w_-, w_-^*, b_-, b_-^*} \; \frac{1}{2}\|B w_- + e_- b_-\|_2^2 + c_2 e_+^\top (A^* w_-^* + e_+ b_-^*),$$
$$\text{s.t.} \quad (A w_- + e_+ b_-) \ge e_+ - (A^* w_-^* + e_+ b_-^*),$$
$$A^* w_-^* + e_+ b_-^* \ge 0, \qquad (15)$$
where $c_1, c_2 \ge 0$ are pre-specified penalty factors and $e_+, e_-$ are vectors of ones of appropriate dimensions.

Next, we use the 1-norm distance to replace the square of the 2-norm in models (14) and (15). Specifically, $\|A w_+ + e_+ b_+\|_2^2$ is replaced by $\|A w_+ + e_+ b_+\|_1$, which can easily be converted to a linear term $e_+^\top \alpha$ with the corresponding constraint $-\alpha \le A w_+ + e_+ b_+ \le \alpha$, where $\alpha = (\alpha_1, \dots, \alpha_{l_1})^\top$. So the optimization problem (14) is replaced by
$$\min_{w_+, w_+^*, b_+, b_+^*, \alpha} \; \frac{1}{2} e_+^\top \alpha + c_1 e_-^\top (B^* w_+^* + e_- b_+^*),$$
$$\text{s.t.} \quad -\alpha \le A w_+ + e_+ b_+ \le \alpha,$$
$$-(B w_+ + e_- b_+) \ge e_- - (B^* w_+^* + e_- b_+^*),$$
$$B^* w_+^* + e_- b_+^* \ge 0. \qquad (16)$$
Similarly, the optimization problem (15) can be converted to
$$\min_{w_-, w_-^*, b_-, b_-^*, \beta} \; \frac{1}{2} e_-^\top \beta + c_2 e_+^\top (A^* w_-^* + e_+ b_-^*),$$
$$\text{s.t.} \quad -\beta \le B w_- + e_- b_- \le \beta,$$
$$(A w_- + e_+ b_-) \ge e_+ - (A^* w_-^* + e_+ b_-^*),$$
$$A^* w_-^* + e_+ b_-^* \ge 0. \qquad (17)$$
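The 1-norm linearization above is mechanical, and a sketch may help. The following hypothetical helper poses LP (16), the positive-class problem, in CVXPY under the same assumptions as the earlier sketch; problem (17) is symmetric, with the roles of $A$ and $B$ exchanged:

```python
import cvxpy as cp

def ftsvmpi_lp_positive(A, B, B_star, c1=1.0):
    """A sketch of LP (16): the positive-class hyperplane of linear FTSVMPI.
    The squared 2-norm of (14) is replaced by a 1-norm, linearized via alpha."""
    l1, n = A.shape
    m = B_star.shape[1]
    w_p, b_p = cp.Variable(n), cp.Variable()
    w_ps, b_ps = cp.Variable(m), cp.Variable()   # correcting function weights
    alpha = cp.Variable(l1)
    corr = B_star @ w_ps + b_ps                  # phi(B_i*), Eq. (13)
    obj = cp.Minimize(0.5 * cp.sum(alpha) + c1 * cp.sum(corr))
    cons = [A @ w_p + b_p <= alpha,
            -(A @ w_p + b_p) <= alpha,           # -alpha <= Aw+ + e+b+ <= alpha
            -(B @ w_p + b_p) >= 1 - corr,        # margin constraint of (16)
            corr >= 0]
    cp.Problem(obj, cons).solve()
    return w_p.value, b_p.value
```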
Finally, we get two nonparallel hyperplanes
$$f_+(x) = w_+^\top x + b_+ = 0 \quad \text{and} \quad f_-(x) = w_-^\top x + b_- = 0, \qquad (18)$$
where $w_+, w_- \in \mathbb{R}^n$ and $b_+, b_- \in \mathbb{R}$. A new data point $x \in \mathbb{R}^n$ is then assigned to the positive or negative class, depending on which of the two hyperplanes it lies closer to, i.e.
$$f(x) = \arg\min_{+,-} \{d_+(x), d_-(x)\}, \qquad (19)$$
where
$$d_\pm(x) = |w_\pm^\top x + b_\pm|, \qquad (20)$$
and $|\cdot|$ is the perpendicular distance of the point $x$ from the plane $w_+^\top x + b_+ = 0$ or $w_-^\top x + b_- = 0$.
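A minimal sketch of the decision rule (19) and (20) follows. Note that we normalize by $\|w_\pm\|$ so that $d_\pm$ is a true perpendicular distance, a normalization that Eq. (20) leaves implicit:

```python
import numpy as np

def predict(x, w_p, b_p, w_m, b_m):
    """Assign x to the class of the nearer hyperplane, Eqs. (19)-(20)."""
    d_p = abs(np.dot(w_p, x) + b_p) / np.linalg.norm(w_p)  # distance to f_+
    d_m = abs(np.dot(w_m, x) + b_m) / np.linalg.norm(w_m)  # distance to f_-
    return +1 if d_p <= d_m else -1
```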
2.2.2. Non-linear case

Now we extend the linear FTSVMPI to the non-linear case. Similar to the linear case, two hyperplanes $f_+(x) = (w_+ \cdot \Phi(x)) + b_+ = 0$ and $f_-(x) = (w_- \cdot \Phi(x)) + b_- = 0$ are considered, where $\Phi(\cdot)$ is a non-linear mapping from a low-dimensional space to a higher-dimensional Hilbert space $H$. According to Hilbert space theory [15], $w_+$ and $w_-$ can be expressed as $w_+ = \sum_{i=1}^{l_1+l_2} (\lambda_+)_i \Phi(x_i) = \Phi(M)\lambda_+$ and $w_- = \sum_{i=1}^{l_1+l_2} (\lambda_-)_i \Phi(x_i) = \Phi(M)\lambda_-$, respectively. Similarly, $w_+^* = \sum_{i=1}^{l_1+l_2} (\lambda_+^*)_i \Phi^*(x_i^*) = \Phi^*(M^*)\lambda_+^*$ and $w_-^* = \sum_{i=1}^{l_1+l_2} (\lambda_-^*)_i \Phi^*(x_i^*) = \Phi^*(M^*)\lambda_-^*$. So the two hyperplanes turn into the following kernel-generated formulations:
$$K(x^\top, M^\top)\lambda_+ + b_+ = 0, \qquad K(x^\top, M^\top)\lambda_- + b_- = 0, \qquad (21)$$
where $K$ is a kernel function, $K(x_i, x_j) = (\Phi(x_i) \cdot \Phi(x_j))$, $(\cdot)$ denotes the dot product operation, $M^\top = [A^\top \; B^\top]_{n \times l}$, $\lambda_+, \lambda_- \in \mathbb{R}^l$, and $b_+, b_- \in \mathbb{R}$. Correspondingly, the correcting functions can be written as
$$K^*(x^{*\top}, M^{*\top})\lambda_+^* + b_+^*, \qquad K^*(x^{*\top}, M^{*\top})\lambda_-^* + b_-^*, \qquad (22)$$
where $K^*$ is a kernel function, $K^*(x_i^*, x_j^*) = (\Phi^*(x_i^*) \cdot \Phi^*(x_j^*))$, $M^{*\top} = [A^{*\top} \; B^{*\top}]_{m \times l}$, $\lambda_+^*, \lambda_-^* \in \mathbb{R}^l$, and $b_+^*, b_-^* \in \mathbb{R}$. Therefore, the optimization problems for the nonlinear case are constructed as
$$\min_{\lambda_+, b_+, \lambda_+^*, b_+^*, \phi} \; \frac{1}{2} e_+^\top \phi + c_1 e_-^\top (K^*(B^*, M^{*\top})\lambda_+^* + e_- b_+^*),$$
$$\text{s.t.} \quad -\phi \le K(A, M^\top)\lambda_+ + e_+ b_+ \le \phi,$$
$$-(K(B, M^\top)\lambda_+ + e_- b_+) \ge e_- - (K^*(B^*, M^{*\top})\lambda_+^* + e_- b_+^*),$$
$$K^*(B^*, M^{*\top})\lambda_+^* + e_- b_+^* \ge 0, \qquad (23)$$
and
$$\min_{\lambda_-, b_-, \lambda_-^*, b_-^*, \varphi} \; \frac{1}{2} e_-^\top \varphi + c_2 e_+^\top (K^*(A^*, M^{*\top})\lambda_-^* + e_+ b_-^*),$$
$$\text{s.t.} \quad -\varphi \le K(B, M^\top)\lambda_- + e_- b_- \le \varphi,$$
$$(K(A, M^\top)\lambda_- + e_+ b_-) \ge e_+ - (K^*(A^*, M^{*\top})\lambda_-^* + e_+ b_-^*),$$
$$K^*(A^*, M^{*\top})\lambda_-^* + e_+ b_-^* \ge 0. \qquad (24)$$

Notice that problems (23) and (24) are two standard Linear Programming (LP) problems.
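In practice, the kernel blocks $K(A, M^\top)$, $K(B, M^\top)$, $K^*(A^*, M^{*\top})$ and $K^*(B^*, M^{*\top})$ are precomputed matrices, after which (23) and (24) have exactly the shape of the linear LPs (16) and (17). The sketch below builds such a block with an RBF kernel (the kernel family used in Section 3; the helper name is ours):

```python
import numpy as np

def rbf_gram(X, M, sigma=1.0):
    """Build K(X, M^T) with an RBF kernel; rows index the points in X,
    columns index the stacked training points M = [A; B]."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(M**2, axis=1)[None, :]
          - 2.0 * X @ M.T)                 # pairwise squared distances
    return np.exp(-sq / (2.0 * sigma**2))
```

Once these Gram matrices are formed, the linear LP code can be reused with $A \to K(A, M^\top)$, $B \to K(B, M^\top)$ and $B^* \to K^*(B^*, M^{*\top})$ (and symmetrically for problem (24)).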
2.2.3. Discussion

Since the LUPI-SVM model is more than two times slower than the standard SVM and usually needs to solve a more difficult optimization problem than the standard SVM, we improve the LUPI model in the following two ways: reducing the model size and using a 1-norm regularization term. For the first way, FTSVMPI not only accelerates the training speed but also inherits the virtue of TWSVM [8], which uses two nonparallel hyperplanes to construct a decision function and has a better generalization capability than the traditional LUPI (Jayadeva et al. used the well-known XOR dataset to fully confirm this viewpoint [8]). Furthermore, unlike TWSVM, our model avoids solving for inverse matrices, whose computational complexity is more than $O(l^3)$, and further reduces the model's training time. For the second way, FTSVMPI obtains the following advantages: (1) our model can help to perform feature ranking and selection in the learning process; as a result, the final classification rule found by FTSVMPI may be more interpretable; (2) since the computational cost of solving an LP is much lower than that of solving a QPP of the same scale, our model is usually much faster and cheaper to train than LUPI. In fact, some recent works have adopted different strategies and methods to improve the speed and quality of SVM [16–19]. For example, Luo et al. proposed a manifold regularized multi-task SVM learning algorithm to improve the quality of classification [17], and Zhou, Tao et al. proposed a fast gradient method for SVM [19]. How to add privileged information to these improved SVM algorithms is an interesting direction for future work.
Fig. 1. Samples of "5" and "8" in different image resolutions. Images in the first and third rows are 28 × 28; images in the second and fourth rows are 10 × 10. When the resolution of these images is reduced, some of them become vague and incomplete, and it is hard to recognize them even by human eyes.
Fig. 2. The results of the comparison between FTSVMPI and dSVM+. (Left panel: error rates (%) versus the size of the training data; right panel: running time (seconds) versus the size of the training data.)
3. Experiments

3.1. Digit recognition problem

In this section we first compare FTSVMPI against dSVM+ (see footnote 1) on the MNIST dataset. $c_1$, $c_2$ and the RBF kernel parameter $\sigma$ are all selected from the set $\{2^i \mid i = -7, \dots, 7\}$. The experimental environment: Intel Core i7-2600 CPU, 4 GB memory. The "LOQO" function (footnote 2) is employed to solve the quadratic programming problems related to LUPI, and the LP problems related to FTSVMPI are solved by the CVX package (footnote 3).

The dataset was obtained from the MNIST dataset [5] for digit recognition. As in [5], we only consider the binary problem of "5" vs. "8" in 28 × 28 pixels (this database contains 5522 and 5652 images of 5 and 8, respectively). To make the problem more difficult, the digits are resized to 10 × 10 pixel images (see Fig. 1). We use the first 50 images of "5" and 50 images of "8" as the training set with the privileged information. Training sets of smaller size are randomly extracted from the 100 selected images. 4000 of the remaining digits are used as a fixed validation set and 1766 digits are used as a test set. Each training example with privileged information was supplied with a holistic description of the corresponding image [5]. These holistic descriptions are translated into 21-dimensional features such as two-part-ness (0–5), tilting to the right (0–3), aggressiveness (0–2), stability (0–3), uniformity (0–3), and so on. This privileged information was created prior to the learning process by an independent expert (more details can be found at the NEC labs website, footnote 4).

In Fig. 2, we illustrate the results obtained by varying the number of training data over a wide range. From the results (left of Fig. 2), we find that FTSVMPI outperforms dSVM+ in all cases; the average error rate of FTSVMPI is 1.9422% lower than that of dSVM+. These results show again that a classifier determined by two nonparallel hyperplanes is superior to one determined by a single hyperplane [8]. More importantly, the computing speed of FTSVMPI is much faster than that of dSVM+ (see the right of Fig. 2), which confirms our analysis in Section 2.

1 dSVM+ is a version of LUPI; Vapnik shows that the dSVM+ method outperforms the X*SVM+ method in almost all of the experiments [5].
2 http://www.princeton.edu/~rvdb/loqo/LOQO.html
3 http://cvxr.com/cvx
4 www.nec-labs.com/research/machine/ml_website/department/software/learning-with-teacher

3.2. Visual object tracking problem

In this section, we apply FTSVMPI to the Visual Object Tracking (VOT) problem. VOT is one of the challenging problems in computer vision and has attracted much attention. Recently, one dominant research trend in the area has been to consider object tracking as a
binary classification problem and then to apply on-line updating, appearance-based classifiers to track the designated moving object. Classical papers include SVM methods [20–22], on-line boosting [23,24], multiple instance learning [25], metric learning [26], and so on. These methods mainly adopt the following three means to improve the accuracy of VOT: (1) design a better object tracking algorithm or strategy; (2) express the tracking object more reasonably; (3) fully exploit the known prior information.

Essentially, VOT is a classification problem with small samples. Vapnik and Vashist [5] have shown that privileged information is able to accelerate the convergence rate of learning, especially in the case of small samples. As a result, we attempt to use privileged information to improve VOT quality. To the best of our knowledge, there is no research discussing privileged information in the VOT field, let alone exploiting this information to improve the corresponding tracker. Here, we propose a new VOT algorithm based on privileged information.

First, we define the privileged information of VOT (see Fig. 3). Suppose the $(t+k+1)$-th frame is the current frame, and examples obtained from the $t$-th frame are taken as the standard input data. Frames between the $t$-th and the $(t+k+1)$-th can be considered "the future in the past"; the data from these frames are taken as the privileged information, which is not available for testing (but obtainable for training) and can be expressed as
$$x_i^* = (x_{i(t+1)}, x_{i(t+2)}, \dots, x_{i(t+k)}), \quad i = 1, \dots, l. \qquad (25)$$
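A small sketch of (25): assuming features have already been extracted per frame and per sample (the `frame_feats` layout below is hypothetical), the privileged vector simply stacks a sample's features from the $k$ "future in the past" frames:

```python
import numpy as np

def privileged_features(frame_feats, t, k, i):
    """Stack 'future in the past' features for sample i, Eq. (25):
    x_i* = (x_i(t+1), ..., x_i(t+k)).

    frame_feats[s][i] is assumed to hold the feature vector of sample i
    in frame s (a hypothetical layout for illustration).
    """
    return np.concatenate([frame_feats[t + j][i] for j in range(1, k + 1)])
```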
Next, similar to [21], we give a simple object tracking algorithm using privileged information. Details are described in Algorithm 1.
Algorithm 1. On-line object tracking using privileged information (O2TUPI1).

Input: $N$ frames of video images.
Initialize: Set the target region $R$, the value of $k$ in (25), and the parameters in FTSVMPI. For the 1st to the $(k+1)$-th frames:
  ○ Extract positive and negative samples and privileged information: $S_+^0 = \{(x_i, x_i^*, +1)\}$, $S_-^0 = \{(x_i, x_i^*, -1)\}$.
  ○ Train the FTSVMPI model by (23) and (24) and obtain the two nonparallel classifiers (see (18)).
For each new frame $n$ ($1 \le n \le N-k-1$):
  Find the local maximum score $V$ of the search region by $f_+^{(n-1)}(x)$ and $f_-^{(n-1)}(x)$. If $V > 0$, go to the next step; else stop updating the current model, take $R_n$ as the target region, and go to the next frame.
  Update positive samples and negative samples:
  ○ If $n \le m+1$, then $S_+^n = S_+^{n-1} \cup P$ and $S_-^n = S_-^{n-1} \cup N$, where $P$ and $N$ are the current positive and negative samples.
  ○ Else update $S_+^n = (S_+^{n-1} \setminus S_+^{n-m-1}) \cup P$ and $S_-^n = (S_-^{n-1} \setminus S_-^{n-m-1}) \cup N$.
  Retrain the FTSVMPI model using the new samples and update our classifiers.

Fig. 3. The interpretation of the privileged information in the VOT problem.
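The following Python skeleton mirrors the control flow of Algorithm 1. It is a schematic sketch only: the three callables `extract_samples`, `local_max_score` and `train_ftsvmpi` are hypothetical stand-ins for the sampling, scoring and LP-training steps described above, and the window `frames[n:n + k + 1]` reflects our reading of how the privileged "future" frames are gathered online:

```python
def o2tupi1(frames, R0, k, m, extract_samples, local_max_score, train_ftsvmpi):
    """Schematic version of Algorithm 1 (O2TUPI1).

    extract_samples(frames, region) -> (positive, negative) sample lists
    local_max_score(frame, f_pos, f_neg, prev_region) -> (score V, region)
    train_ftsvmpi(pos, neg) -> (f_pos, f_neg) nonparallel classifiers
    """
    # Initialize on the 1st to (k+1)-th frames (Python indices 0..k).
    S_pos, S_neg = extract_samples(frames[:k + 1], R0)
    f_pos, f_neg = train_ftsvmpi(S_pos, S_neg)
    key_pos, key_neg = [S_pos], [S_neg]   # the last m "key frames" of samples
    region, regions = R0, [R0]
    for n in range(1, len(frames) - k - 1):
        V, region = local_max_score(frames[n], f_pos, f_neg, region)
        regions.append(region)
        if V <= 0:                        # low score: keep the old model
            continue
        P, N = extract_samples(frames[n:n + k + 1], region)
        key_pos.append(P); key_neg.append(N)
        if len(key_pos) > m:              # drop the oldest key frame
            key_pos.pop(0); key_neg.pop(0)
        f_pos, f_neg = train_ftsvmpi([s for f in key_pos for s in f],
                                     [s for f in key_neg for s in f])
    return regions
```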
In Algorithm 1, in order to preserve the history information, we take $m$ frames' samples as "key frames". At the same time, we update the FTSVMPI model using new samples to avoid the drifting problem as much as possible (see footnote 5). In addition, we use two kinds of features to extract the image samples' features efficiently: the HOG feature [27] is used to build the standard input data and the Haar feature [28] builds the corresponding privileged information. For the HOG feature, we construct a 9-bin HOG histogram for each cell (4 × 4 pixels); each block contains four cells, giving a 36-d HOG feature vector. For the Haar feature, we build a 16-d vector for each block, in which each cell contains 4 kinds of Haar features (see [28]). Each image patch is tiled with a grid of overlapping blocks (we set the overlap ratio to 0.5 in our experiments). Positive samples are generated by shifting the tracking object window within a scope of 4 pixels and rotating it from −5° to +5°. Negative samples are generated within the search region. (In practice, we use 20 positive and negative samples to construct the training data. The search radius is set to 40 pixels. For the privileged information, we set $k = 3$. For the history information, we select $m = 10$.) The search region is scanned at multiple scales; the scaling factor is set to [0.8, 1.5] and the search step is set to 1.

Finally, we perform our experiments on 4 publicly available video sequences: Tiger2, Sylvester, Occluded Face2 and Surfer [25]. In order to show the effect of the privileged information, we compare our results with Algorithm 1 without the privileged information (called O2TUPI2). For horizontal comparison, we also compare our results to MILTrack [25]. Table 1 and Figs. 4 and 5 report our tracking results.

5 In fact, Algorithm 1 is not the best VOT tactic compared with existing object trackers. However, since our goal is only to confirm whether privileged information is helpful for VOT, we use this simple tactic.
Table 1
Location mean error on different datasets. Location mean error shows the average Euclidean distance from object locations to ground truths.

Datasets        | O2TUPI1 | O2TUPI2 | MILTrack
Tiger2          | 9.15    | 12.91   | 9.81
Sylvester       | 9.02    | 12.67   | 9.38
Occluded Face2  | 28.55   | 34.94   | 30.73
Surfer          | 5.68    | 8.48    | 5.12
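For reference, the two evaluation metrics used here (the location mean error of Table 1 and the threshold curves of Fig. 4) can be computed as follows; this is a straightforward reading of the captions, with hypothetical array inputs of shape (frames, 2):

```python
import numpy as np

def location_mean_error(pred, gt):
    """Average Euclidean distance between predicted object locations and
    ground-truth locations (the metric of Table 1)."""
    return np.mean(np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=1))

def error_rate(pred, gt, threshold):
    """Fraction of frames whose predicted location lies outside a threshold
    distance of the ground truth (the curves of Fig. 4)."""
    d = np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=1)
    return np.mean(d > threshold)
```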
From these results, we can draw the following conclusions. (1) O2TUPI1 outperforms O2TUPI2 in all cases. We already know that VOT may be regarded as a classification problem with small samples, for which quickly finding the optimal solution of the corresponding optimization problem is the key issue. All results show that the utilization of privileged information indeed accelerates the convergence rate of learning and helps the object tracker to further improve its quality. (2) O2TUPI1 is superior to MILTrack on the Tiger2, Sylvester and Occluded Face2 datasets, while MILTrack performs better on the Surfer dataset. In this horizontal comparison, O2TUPI1 is at least competitive with MILTrack, which again confirms our method's effectiveness on the VOT problem.
4. Conclusion

In this paper, we proposed a fast support vector machine method using privileged information (called FTSVMPI). Unlike LUPI, our method can be solved efficiently via two smaller-sized LP models. At the same time, our method inherits all the virtues of TWSVM and has natural advantages in training time and generalization capability compared with the standard LUPI. We then raised an interesting topic: how to exploit privileged information to improve performance on the VOT problem. We first defined the privileged information of VOT, and then gave a simple but effective on-line object tracking algorithm using privileged information, O2TUPI1. All experiments show that the privileged information is very useful and can further improve our tracker's performance and robustness. In fact, this paper is only a beginning in using privileged information to improve a tracker's quality. How to better express privileged information and how to design a more efficient object tracking algorithm remain under our consideration for future work.
Fig. 4. The error rates at different thresholds, comparing MILTrack, O2TUPI1 and O2TUPI2 (each panel plots error rate (%) against the threshold distance). The error rates show the percentage of frames for which the estimated object location was outside some threshold distance of the ground truth. (a) Tiger2, (b) Sylvester, (c) Occluded Face2, (d) Surfer.
Fig. 5. Partial results of VOT on the four datasets. Top to bottom: Tiger2, Sylvester, Occluded Face2, Surfer.
Acknowledgments

This work has been partially supported by the China Postdoctoral Science Foundation under Grant no. 2013M530702, grants from the National Natural Science Foundation of China (Nos. 11271361 and 71331005), the CAS/SAFEA International Partnership Program for Creative Research Teams, the Major International (Regional) Joint Research Project (No. 71110107026), and the Ministry of Water Resources' special funds for scientific research on public causes (No. 201301094).
References

[1] R. Cipolla, S. Battiato, G.M. Farinella, Machine Learning for Computer Vision, Springer, 2013.
[2] Z. Qi, Y. Xu, L. Wang, Y. Song, Online multiple instance boosting for object detection, Neurocomputing 74 (2011) 1769–1775.
[3] Z. Qi, Y. Tian, Y. Shi, Efficient railway tracks detection and turnouts recognition method using HOG features, Neural Comput. Appl. 23 (1) (2013) 245–254.
[4] V. Vapnik, Estimation of Dependences Based on Empirical Data (Information Science and Statistics), Springer, 2006.
[5] V. Vapnik, A. Vashist, A new learning paradigm: learning using privileged information, Neural Networks 22 (2009) 544–557.
[6] D. Pechyony, V. Vapnik, On the theory of learning with privileged information, in: Advances in Neural Information Processing Systems, vol. 23, 2010.
[7] D. Pechyony, R. Izmailov, A. Vashist, V. Vapnik, SMO-style algorithms for learning using privileged information, in: DMIN, CSREA Press, 2010, pp. 235–241.
[8] Jayadeva, R. Khemchandani, S. Chandra, Twin support vector machines for pattern classification, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 905–910.
[9] Y.-H. Shao, C.-H. Zhang, X.-B. Wang, N.-Y. Deng, Improvements on twin support vector machines, IEEE Trans. Neural Networks 22 (2011) 962–968.
[10] Z. Qi, Y. Tian, Y. Shi, Structural twin support vector machine for classification, Knowledge-Based Systems, http://dx.doi.org/10.1016/j.knosys.2013.01.008.
[11] Z. Qi, Y. Tian, Y. Shi, Laplacian twin support vector machine for semi-supervised classification, Neural Networks 35 (2012) 46–53.
[12] Z. Qi, Y. Tian, Y. Shi, Robust twin support vector machine for pattern classification, Pattern Recognition 46 (1) (2012) 305–316.
[13] Z. Qi, Y. Tian, Y. Shi, Twin support vector machine with Universum data, Neural Networks 36C (2012) 112–119.
[14] Y. Tian, Z. Qi, X. Ju, X. Liu, Nonparallel support vector machines for pattern classification, IEEE Trans. Syst. Man Cybernet. Part B, http://dx.doi.org/10.1109/TCYB.2013.2279167, in press.
[15] B. Schölkopf, A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2002.
[16] N. Guan, D. Tao, Z. Luo, J. Shawe-Taylor, MahNMF: Manhattan non-negative matrix factorization, arXiv preprint, arXiv:1207.3438.
[17] Y. Luo, D. Tao, B. Geng, C. Xu, S. Maybank, Manifold regularized multi-task learning for semi-supervised multi-label image classification, IEEE Trans. Image Process. 22 (2) (2013) 523–536.
[18] D. Tao, X. Tang, X. Li, X. Wu, Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 28 (7) (2006) 1088–1099.
[19] T. Zhou, D. Tao, X. Wu, NESVM: a fast gradient method for support vector machines, in: 2010 IEEE 10th International Conference on Data Mining (ICDM), IEEE, 2010, pp. 679–688.
[20] S. Avidan, Support vector tracking, IEEE Trans. Pattern Anal. Mach. Intell. (2001) 184–191.
[21] M. Tian, W. Zhang, F. Liu, On-line ensemble SVM for robust object tracking, in: ACCV, vol. 4843, 2007, pp. 355–364.
[22] X. Li, A.R. Dick, H. Wang, C. Shen, A. van den Hengel, Graph mode-based contextual kernels for robust SVM tracking, in: ICCV, 2011, pp. 1156–1163.
[23] H. Grabner, M. Grabner, H. Bischof, Real-time tracking via on-line boosting, in: BMVC, vol. 1, 2006, pp. 47–56.
[24] H. Grabner, C. Leistner, H. Bischof, Semi-supervised on-line boosting for robust tracking, in: ECCV (1), vol. 5302, Springer, 2008, pp. 234–247.
[25] B. Babenko, M.-H. Yang, S. Belongie, Robust object tracking with online multiple instance learning, IEEE Trans. Pattern Anal. Mach. Intell. 33 (8) (2011) 1619–1632.
[26] X. Wang, G. Hua, T.X. Han, Discriminative tracking by metric learning, in: ECCV, vol. 6313, 2010, pp. 200–214.
[27] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 886–893.
[28] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2001, pp. 511–518.
Zhiquan Qi is an assistant professor at the Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences. His research interests include object detection, object tracking, change detection and machine learning.

Yingjie Tian is a professor at the Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences. He received his first degree in mathematics (1994), a Master's in applied mathematics (1997), and a Ph.D. in management science and engineering. He has published 4 books about SVMs, one of which has been cited over 1000 times. His research interests include support vector machines, optimization theory and applications, data mining, intelligent knowledge management and risk management.

Yong Shi is a professor. He currently serves as the Executive Deputy Director of the Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences. He has been the Charles W. and Margre H. Durham Distinguished Professor of Information Technology, College of Information Science and Technology, Peter Kiewit Institute, University of Nebraska, USA, since 1999. His research interests include business intelligence, data mining, and multiple criteria decision making. He has published more than 17 books and over 200 papers in various journals and numerous conference proceedings. He is the Editor-in-Chief of the International Journal of Information Technology and Decision Making (SCI) and a member of the editorial boards of a number of academic journals. He has received many distinguished awards, including the Georg Cantor Award of the International Society on Multiple Criteria Decision Making (MCDM), 2009; the Outstanding Young Scientist Award, National Natural Science Foundation of China, 2001; and Speaker of the Distinguished Visitors Program (DVP) for 1997–2000, IEEE Computer Society. He has consulted or worked on business projects for a number of international companies in data mining and knowledge management.