
Procedia Engineering 15 (2011) 2145–2149. doi:10.1016/j.proeng.2011.08.401. www.elsevier.com/locate/procedia

Advanced in Control Engineering and Information Science

Query-Level Stability of Ranking SVM for Replacement Case

Yun Gao a, Wei Gao b, Yungang Zhang c,*

a Department of Editorial, Yunnan Normal University, Kunming 650092, China
b Department of Mathematics, Soochow University, Suzhou, Jiangsu 215006, China
c Department of Computer Science, Yunnan Normal University, Kunming 650092, China

Abstract

The quality of ranking determines the success or failure of information retrieval, and the goal of ranking is to learn a real-valued ranking function that induces a ranking or ordering over an instance space. We focus on the stability and generalization ability of Ranking SVM in the replacement case. We establish the query-level stability of Ranking SVM for the replacement case and derive generalization bounds for this ranking algorithm via query-level stability under replacement of one element of the sample set.

Keywords: ranking; algorithmic stability; generalization bounds; learnability; Reproducing Kernel Hilbert Space

1. Introduction

The ranking problem is formulated as learning, from given labeled samples, a scoring function with small ranking error. Well-known ranking algorithms include RankBoost (see [1]), gradient descent ranking (see [2]), margin-based ranking (see [3]), P-Norm Push ranking (see [4]), Ranking SVM (see [5]), MFoM (see [6]), magnitude-preserving ranking (see [7]), and so on. Some theoretical analysis can be found in [8-12]. Stability analysis is an important issue in learning theory. In particular, it is known that stability of empirical risk minimization (ERM) is sufficient for learnability. In [13], it is argued that stability is also necessary for learnability.

* Corresponding author. Tel.: +86-871-5516277; fax: +86-871-5516277. E-mail address: [email protected]



That is to say, when uniform convergence is equivalent to learnability, stability is also necessary for any learning algorithm. In [14], generalization bounds were derived for an extension of this ranking algorithm via uniform leave-one-query-out associate-level loss stability. As a continuation of [14], this paper considers the stability of the extended ranking algorithm introduced in [14] and the generalization bounds of such ranking algorithms in the replacement case. The paper is organized as follows: the next section describes the setting of the ranking problem; using these notions, we then derive the stability and the generalization bound of stable ranking algorithms in the replacement case.

2. Settings

Assume Q is a query space and a query q ∈ Q is a random sample drawn according to a probability distribution P_Q. For a query q, an associate ω^(q) and its ground truth g(ω^(q)) are sampled from the space Ω × G according to a joint probability distribution D_q, where Ω is the space of associates and G is the space of ground truths. Here the associate ω^(q) can be a pair of documents, a single document, or a set of documents, and correspondingly the ground truth g(ω^(q)) can be a relevance score. Let l(f; ω^(q), g(ω^(q))) denote a loss (referred to as the associate-level loss) defined on (ω^(q), g(ω^(q))) and a ranking function f, which is a scoring function. The expected query-level loss is defined as

\[ L(f;q) = \int_{\Omega \times G} l\bigl(f;\omega^{(q)}, g(\omega^{(q)})\bigr)\, D_q\bigl(d\omega^{(q)}, dg(\omega^{(q)})\bigr). \]

L(f;q) measures the quality of f. However, L(f;q) cannot be computed directly since the probability distribution is unknown. We therefore compute the empirical query-level loss instead, defined as

\[ \hat{L}(f;q) = \frac{1}{n_q} \sum_{j=1}^{n_q} l\bigl(f;\omega_j^{(q)}, g(\omega_j^{(q)})\bigr), \]

where (ω_j^(q), g(ω_j^(q))), j = 1, …, n_q, are the n_q associates of q, sampled i.i.d. according to D_q.

The empirical query-level loss is an estimate of the expected query-level loss, and the estimation is consistent. The goal of learning to rank is to learn the ranking function f that minimizes the expected query-level risk, defined as

\[ R_l(f) = E_Q\, L(f;q) = \int_Q L(f;q)\, P_Q(dq). \tag{1} \]

Again, P_Q is unknown. We choose training samples (q_1, S_1), …, (q_r, S_r), where S_i = {(ω_1^(i), g(ω_1^(i))), …, (ω_{n_i}^(i), g(ω_{n_i}^(i)))}, i = 1, …, r, and n_i is the number of associates for query q_i. Here q_1, …, q_r can be viewed as data sampled i.i.d. according to P_Q, and (ω_j^(i), g(ω_j^(i))) as data sampled i.i.d. according to D_{q_i}, j = 1, …, n_i, i = 1, …, r.

The empirical query-level risk associated with the training sample is defined as

\[ \hat{R}_l(f) = \frac{1}{r} \sum_{i=1}^{r} \hat{L}(f;q_i). \tag{2} \]


The empirical query-level risk is an estimate of the expected query-level risk, and the estimation is consistent. This probabilistic formulation covers most existing learning-to-rank algorithms. Therefore, we expect R_l(f) and \hat{R}_l(f) to be as close to each other as possible.
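To make these two empirical quantities concrete, the following minimal sketch (our own illustration, not part of the paper) computes the empirical query-level loss \hat{L}(f;q) and the empirical query-level risk of Eq. (2), assuming the associates are document pairs, the associate-level loss is a 0-1 pairwise misranking loss, and the scoring function is linear; all names and the toy data are hypothetical.

```python
import numpy as np

def pairwise_loss(f_w, doc_a, doc_b, label):
    """0-1 associate-level loss l(f; (doc_a, doc_b), label):
    label = +1 means doc_a should rank above doc_b."""
    margin = label * (f_w @ doc_a - f_w @ doc_b)
    return 0.0 if margin > 0 else 1.0

def empirical_query_loss(f_w, associates):
    """L_hat(f; q): average associate-level loss over the n_q associates of one query."""
    return np.mean([pairwise_loss(f_w, a, b, y) for (a, b, y) in associates])

def empirical_query_risk(f_w, queries):
    """R_hat_l(f): average of L_hat(f; q_i) over the r training queries (Eq. (2))."""
    return np.mean([empirical_query_loss(f_w, assoc) for assoc in queries])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_w = rng.normal(size=5)                      # toy linear scoring function f(x) = <w, x>
    queries = [                                   # r = 3 queries, each with its own associates
        [(rng.normal(size=5), rng.normal(size=5), 1) for _ in range(rng.integers(2, 6))]
        for _ in range(3)
    ]
    print("empirical query-level risk:", empirical_query_risk(f_w, queries))
```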

Ranking SVM is widely used for ranking in IR. It views a document pair as an associate of the query and minimizes

\[ \min_{f \in F}\ \frac{1}{n} \sum_{i=1}^{n} l_h(f; z_i, y_i) + \lambda \| f \|_K^2, \tag{3} \]

where l_h(f; z_i, y_i) is the hinge loss, K is a kernel function, and ||f||_K is the norm of f in the Reproducing Kernel Hilbert Space (RKHS) induced by K.
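As an illustration only (not the paper's implementation), the sketch below instantiates the objective of Eq. (3) for the special case of a linear kernel, where f(x) = <w, x> and ||f||_K = ||w||; the pairwise hinge loss, the subgradient solver, and all variable names are our own assumptions.

```python
import numpy as np

def hinge_loss(w, x_pair, y):
    """Pairwise hinge loss l_h(f; z_i, y_i) for z_i = (x_a, x_b) and y_i in {+1, -1}."""
    x_a, x_b = x_pair
    return max(0.0, 1.0 - y * (w @ (x_a - x_b)))

def ranking_svm_objective(w, pairs, labels, lam):
    """(1/n) * sum_i l_h(f; z_i, y_i) + lam * ||f||_K^2, with a linear kernel so ||f||_K = ||w||."""
    n = len(pairs)
    data_term = sum(hinge_loss(w, z, y) for z, y in zip(pairs, labels)) / n
    return data_term + lam * (w @ w)

def fit_by_subgradient(pairs, labels, lam=0.1, steps=200, lr=0.05, dim=5):
    """Minimize the objective with plain subgradient descent (illustrative, not an SVM solver)."""
    w = np.zeros(dim)
    n = len(pairs)
    for _ in range(steps):
        grad = 2.0 * lam * w
        for (x_a, x_b), y in zip(pairs, labels):
            if 1.0 - y * (w @ (x_a - x_b)) > 0.0:   # hinge is active for this pair
                grad -= y * (x_a - x_b) / n
        w -= lr * grad
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pairs = [(rng.normal(size=5), rng.normal(size=5)) for _ in range(50)]
    labels = [1 if (a - b).sum() > 0 else -1 for a, b in pairs]   # toy preference labels
    w = fit_by_subgradient(pairs, labels)
    print("objective value:", ranking_svm_objective(w, pairs, labels, lam=0.1))
```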

3. Uniform associate-level loss stability for the replacement case

Lan et al. (see [14]) defined uniform leave-one-query-out associate-level loss stability. Using the notions defined above, we adopt the uniform associate-level loss stability under replacement of one element of the training sample, as defined in [15], and call it uniform associate-level loss stability for the replacement case. It is also a good measure of how robust a ranking algorithm is. Using the conventional stability theory in [16], we obtain the following result, which gives the query-level stability of Ranking SVM in the replacement case.

Theorem 1. If ∀ x ∈ X, K(x,x) ≤ κ² < ∞, then Ranking SVM has query-level stability with coefficient

\[ \tau(r) = \frac{8\kappa^2}{\lambda r} \times \max_{\forall n_i, S_i} \frac{n_i}{\frac{1}{r}\sum_{i=1}^{r} n_i}. \]
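As a purely numerical reading of Theorem 1 (our own sketch, not part of the paper), the snippet below evaluates the stability coefficient τ(r) for one hypothetical training sample; the values of κ, λ and the associate counts n_i are made up.

```python
import numpy as np

def stability_coefficient(n_list, kappa, lam):
    """tau(r) = (8 * kappa^2 / (lam * r)) * max_i n_i / ((1/r) * sum_i n_i), per Theorem 1."""
    n = np.asarray(n_list, dtype=float)
    r = len(n)
    return (8.0 * kappa**2 / (lam * r)) * (n.max() / n.mean())

# Hypothetical example: r = 5 training queries with 10-40 associates each, kappa = 1, lambda = 0.01.
print(stability_coefficient([10, 25, 40, 15, 30], kappa=1.0, lam=0.01))
```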

As discussed in [14], suppose the mean and variance of the distribution of n_q are μ and σ², respectively. When r tends to infinity, then ∀ 0 < δ < 1 and ∀ ε > 0 there exists R(ε) such that, if r > R(ε), with probability at least 1 − δ, by the Law of Large Numbers and Chebyshev's inequality we have

\[ \max_{\forall n_i, S_i} \frac{n_i}{\frac{1}{r}\sum_{i=1}^{r} n_i} \le \frac{1 + \frac{\sigma}{\mu}\sqrt{r/\delta}}{1 - \varepsilon/\mu}. \]

So,

\[ \tau(r) \le \frac{8\kappa^2}{\lambda r} \cdot \frac{1 + \frac{\sigma}{\mu}\sqrt{r/\delta}}{1 - \varepsilon/\mu}. \]

That is to say, when r goes to infinity, τ(r) will tend to zero with a convergence rate of O(1/√r).
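To see this decay numerically, one can simulate the n_q from a fixed distribution and evaluate the Theorem 1 coefficient for growing r; the sketch below is illustrative only, and the Poisson model for n_q and the values of κ, λ, μ are arbitrary assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
kappa, lam, mu = 1.0, 0.01, 20.0      # arbitrary illustrative choices; mu is the mean of n_q

for r in [10, 100, 1_000, 10_000]:
    n = rng.poisson(mu, size=r).astype(float) + 1.0   # +1 avoids empty queries
    tau = (8.0 * kappa**2 / (lam * r)) * (n.max() / n.mean())
    # tau shrinks as r grows; the Chebyshev argument in the text bounds it by O(1/sqrt(r))
    # with high probability, and for light-tailed n_q (as here) the observed decay is
    # closer to O(1/r).
    print(f"r = {r:6d}   tau(r) = {tau:.5f}")
```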

For the practical case, r is finite, and then there is no reasonable statistical estimate of the term

\[ \max_{\forall n_i, S_i} \frac{n_i}{\frac{1}{r}\sum_{i=1}^{r} n_i}. \]

As a result, we only get a loose bound for τ(r), namely 8κ²/λ, since the term above is at most r. That is, when r increases but is still finite, τ(r) does not necessarily decrease.

4. Generalization bound

Based on McDiarmid's inequality and Theorem 1, we can further derive the generalization bound of Ranking SVM. In particular, since the function f_{{(q_i,S_i)}_{i=1}^r} is learned from the training samples (q_1, S_1), …, (q_r, S_r), there exists a constant C such that, for all (q_1, S_1), …, (q_r, S_r), we have ||f_{{(q_i,S_i)}_{i=1}^r}||_K ≤ C. Then for all (q_1, S_1), …, (q_r, S_r), z ∈ Z and y ∈ Y, we have l_h(f_{{(q_i,S_i)}_{i=1}^r}; z, y) ≤ 1 + 2Cκ. We obtain the following theorems.

Theorem 2. If ∀ x ∈ X, K(x,x) ≤ κ² < ∞, then for Ranking SVM, ∀ δ ∈ (0, 1) and ∀ ε > 0, there exists R(ε) such that, if r > R(ε), with probability at least 1 − 2δ over the samples of {(q_i, S_i)}_{i=1}^r in the product space ∏_{i=1}^{r} {Q × (X × X × Y)^∞}, we have

\[
R_l\bigl(f_{\{(q_i,S_i)\}_{i=1}^r}\bigr) \le \hat{R}_l\bigl(f_{\{(q_i,S_i)\}_{i=1}^r}\bigr)
+ \frac{16\kappa^2}{\lambda r}\cdot\frac{1+\frac{\sigma}{\mu}\sqrt{r/\delta}}{1-\varepsilon/\mu}
+ \frac{8\kappa^2\cdot\frac{1+\frac{\sigma}{\mu}\sqrt{r/\delta}}{1-\varepsilon/\mu}+\lambda(1+2C\kappa)}{\lambda}\,\sqrt{\frac{\ln(1/\delta)}{2r}}.
\]

Theorem 3. If ∀ x ∈ X, K(x,x) ≤ κ² < ∞ and we have no constraint on r, then for Ranking SVM, ∀ δ ∈ (0, 1), with probability at least 1 − δ over the samples of {(q_i, S_i)}_{i=1}^r in the product space ∏_{i=1}^{r} {Q × (X × X × Y)^∞}, we have

\[
R_l\bigl(f_{\{(q_i,S_i)\}_{i=1}^r}\bigr) \le \hat{R}_l\bigl(f_{\{(q_i,S_i)\}_{i=1}^r}\bigr)
+ \frac{16\kappa^2}{\lambda}
+ \frac{8r\kappa^2+\lambda(1+2C\kappa)}{\lambda}\,\sqrt{\frac{\ln(1/\delta)}{2r}}.
\]

Theorem 2 applies to the case in which the number of training queries tends to infinity; it shows that, with high probability, the empirical query-level risk of Ranking SVM will then converge to its expected query-level risk. In the practical case, however, the number of training queries is finite, and the expected query-level risk and the empirical query-level risk are not necessarily close to each other; the bound in Theorem 3 quantifies the difference, which is an increasing function of the number of training queries.
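For intuition about this last remark, the sketch below plugs illustrative values into the two slack terms of the Theorem 3 bound as reconstructed above and shows that the second term, and hence the whole slack, grows with r; the values of κ, λ, C and δ are arbitrary assumptions of ours, and the sketch is not part of the paper.

```python
import numpy as np

kappa, lam, C, delta = 1.0, 0.01, 1.0, 0.05   # arbitrary illustrative values

def theorem3_slack(r):
    """Gap between expected and empirical query-level risk allowed by Theorem 3."""
    term1 = 16.0 * kappa**2 / lam
    term2 = ((8.0 * r * kappa**2 + lam * (1.0 + 2.0 * C * kappa)) / lam
             * np.sqrt(np.log(1.0 / delta) / (2.0 * r)))
    return term1 + term2

for r in [10, 100, 1_000, 10_000]:
    print(f"r = {r:6d}   Theorem 3 slack = {theorem3_slack(r):.1f}")
```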


5. Conclusion

In this paper, we focus on the stability and generalization ability of Ranking SVM in the replacement case. We establish the query-level stability of Ranking SVM for the replacement case and derive generalization bounds for this ranking algorithm via query-level stability under replacement of one element of the sample set.

Acknowledgements

We would like to thank the anonymous referees for providing us with constructive comments and suggestions.

References

[1] R. Cynthia, E. Robert, D. Ingrid, Boosting based on a smooth margin, in: Proceedings of the 16th Annual Conference on Computational Learning Theory, pp. 502-517, 2004.
[2] C. Burges, Learning to rank using gradient descent, in: Proceedings of the 22nd International Conference on Machine Learning, pp. 89-96, 2005.
[3] Y. Rong, Alexander D. Hauptmann, Efficient margin-based rank learning algorithms for information retrieval, CIVR, pp. 113-122, 2006.
[4] R. Cynthia, Ranking with a P-Norm Push, COLT 2006, LNAI 4005, pp. 589-604, 2006.
[5] T. Joachims, Optimizing search engines using clickthrough data, in: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, New York, USA, pp. 133-142, 2002.
[6] T.S. Chua, S.Y. Neo, H.K. Goh, et al., TRECVID 2005 by NUS PRIS, NIST TRECVID, 2005.
[7] C. Corinna, M. Mehryar, R. Ashish, Magnitude-preserving ranking algorithms, in: Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007.
[8] S. Kutin, P. Niyogi, The interaction of stability and weakness in AdaBoost, Technical Report TR-2001-30, Computer Science Department, University of Chicago, 2001.
[9] S. Agarwal, P. Niyogi, Stability and generalization of bipartite ranking algorithms, in: Proceedings of the 18th Annual Conference on Learning Theory, 2005.
[10] S. Agarwal, P. Niyogi, Generalization bounds for ranking algorithms via algorithmic stability, Journal of Machine Learning Research, vol. 10, pp. 441-474, December 2009.
[11] W. Gao, Y. Zhang, Y. Gao, L. Liang, Y. Xia, Strong and weak stability of bipartite ranking algorithms, in: International Conference on Engineering and Information Management (ICEIM 2011), Chengdu, China, April 2011, pp. 303-307.
[12] W. Gao, Y. Zhang, L. Liang, Y. Xia, Stability analysis for ranking algorithms, in: 2010 IEEE International Conference on Information Theory and Information Security (ICITIS), Beijing, pp. 973-976.
[13] S. Mukherjee, P. Niyogi, T. Poggio, R. Rifkin, Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization, Advances in Computational Mathematics, 25(1-3), pp. 161-193, 2006.
[14] Y. Lan, T. Liu, T. Qin, Z. Ma, H. Li, Query-level stability and generalization in learning to rank, in: Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008.
[15] X. He, W. Gao, Z. Jia, Generalization bounds of ranking via query-level stability I, in: Proceedings of the 2nd International Conference on Intelligent Transportation Systems and Intelligent Computing (ITSIC 2011), 2011.
[16] O. Bousquet, A. Elisseeff, Stability and generalization, Journal of Machine Learning Research, 2, pp. 499-526, 2002.
[17] C. McDiarmid, On the method of bounded differences, in: Surveys in Combinatorics 1989, Cambridge University Press, pp. 148-188.
