Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 139 (2018) 605–612
www.elsevier.com/locate/procedia
10.1016/j.procs.2018.10.203

The International Academy of Information Technology and Quantitative Management, the Peter Kiewit Institute, University of Nebraska
A New Discriminative Ordinal Regression Method

Wenhan Jiang*
First Research Institute of the Ministry of Public Security, Beijing, 100048, China
Abstract

Ordinal regression, an important machine learning problem, has been widely applied to information retrieval and collaborative filtering. Current ordinal regression methods include perceptron based (Prank) methods, SVM based ordinal regression (SVOR) methods, and discriminant learning ordinal regression (KDLOR) methods, etc. Among these methods, KDLOR performs well for rank prediction because it exploits the discriminant information of the classes. However, it only considers the ordinal information of two adjacent classes for discriminant learning. In fact, there is much ordinal information in any upper-lower class pair. In this paper we present an enhanced discriminant learning ordinal regression (EDLOR) method which uses global ordinal constraints on any upper-lower class pair, while simultaneously maximizing the distance between the paired classes. The results of numerical experiments on synthetic and benchmark datasets verify the usefulness of our approach.

© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer review under responsibility of the scientific committee of The International Academy of Information Technology and Quantitative Management, the Peter Kiewit Institute, University of Nebraska.

Keywords: Ranking; Ordinal Regression; Discriminant Analysis.
1. Introduction

Ordinal regression is a supervised learning problem and also a kind of pointwise learning-to-rank methodology, in which the training instances are labeled on an ordinal scale called the rank. The task of ordinal regression is to predict the ranks of instances with minimal losses. Ordinal regression has been widely applied in information retrieval and collaborative filtering [2][3][10][22]. For example, in collaborative filtering, if some users' preferences "very bad", "bad", "average", "good", "very good" on movies are known, then for a new user who has ranked the movies already seen, we can give a rating of the new movies to him by a ranking model.
* Corresponding author. Tel.: +86-18610739716. E-mail address: [email protected].
The rank labels in ordinal regression are usually represented by a finite number of ranks {1, 2, ..., k}, which differs from the labels in metric regression and multi-class classification.

From the viewpoint of learning to rank, ordinal regression belongs to the pointwise ranking methods [10]. There are mainly three ranking approaches: pointwise, pairwise and listwise. The pointwise approach takes a single sample's features for training, and its decision rule is defined on a single sample; it mainly includes perceptron based ranking (Prank) methods [2][12], SVM based ordinal regression (SVOR) methods [3][13][21], the multi-class ranking (McRank) method [17], the subset ranking (SubsetRank) method [4], and the discriminant learning based ordinal regression (KDLOR) method [22], etc. The pairwise approach transforms ranking into pair classification, where any pair of samples and their order are used for training; it includes RankNet, IR-SVM, GBRank, FRank, LambdaRank, and LambdaMART, etc. [10][8][19][23][26]. The listwise approach addresses the ranking problem in a more natural way, taking ranking lists as instances in both learning and prediction; examples are ListNet, ListMLE, AdaRank, SVM-MAP, SoftRank, and AppRank, etc. [10][2][25]. Our proposed method belongs to the pointwise approach, so the related work discussed in this paper focuses on pointwise ordinal regression.

In the past ten years, many algorithms for ordinal regression have been proposed in the machine learning domain. Since ordinal regression for ranking can also be treated as a special multi-class classification or regression problem, existing ideas for classification and regression are generally applied to it. The common idea is to transform the ordinal scales into numeric values and then deal with the resulting regression problem. Kramer et al. [15] used a regression tree learner after mapping the ordinal scale to numeric values. Cossock and Zhang [4] considered using the discounted cumulative gain (DCG) of Järvelin and Kekäläinen [16] as the evaluation measure, and solved the original ranking task as a regression problem. Frank and Hall [9] decomposed the original ordinal regression problem into a set of binary classification tasks that encode the ordering of the original ranks. Li et al. [17] cast the ranking problem as multi-class classification, and used the gradient boosting tree algorithm to calculate the class probabilities of objects for ranking. Crammer and Singer [5] proposed a perceptron-based online algorithm for rank prediction, known as the Prank algorithm. Harrington [12] extended the Prank algorithm into an online large margin version, the OAP-BPM algorithm. Har-Peled et al. [11] proposed a constraint classification approach that provides a unified framework for solving ranking and multi-classification problems. Herbrich et al. [13] made a theoretical study of ordinal regression and applied the principle of structural risk minimization [24] to it. Shashua and Levin [21] proposed two large-margin principles, the fixed-margin principle and the sum-of-margins principle, to handle the projection direction and the multiple thresholds. Chu and Keerthi [3] improved the methods of Shashua and Levin [21], and proposed two support vector ordinal regression methods, one adding the ordering of the thresholds as explicit constraints and the other considering all the ranks in the constraints for each threshold. Sun et al. [22] extended the traditional linear discriminant analysis algorithm to KDLOR, which utilizes the global discriminant information of the classes for ordinal regression.
Among the above methods, the SVM-based SVOR methods [3][21] have shown great promise in ordinal regression. However, they suffer from two problems [22]: (1) the decision boundary is determined exclusively by the support vectors and ignores the global information of the samples; (2) the computational complexity is generally higher because of the quadratic programming problem involved in SVMs. Compared with the SVOR methods, the KDLOR method proposed by Sun et al. [22] makes use of the global information of the classes, and follows the discriminant analysis idea of minimizing the within-class distance while maximizing the between-class distance for ranking. Experimental results on benchmark datasets demonstrated that KDLOR is competitive with SVM-based ordinal regression methods in accuracy and computational complexity. However, KDLOR still has a problem: its rank constraint only considers the nearest two classes rather than all pairs of classes, i.e. the maximized between-class distance is limited to neighboring grades. In fact, there usually exist large overlaps among the classes in real data. To ensure that the overall rank loss on all the data is small, it is more suitable to maximize the distance of every possible ordinal pair of classes. In this paper, we propose an extended LDA ordinal regression method that enhances the rank constraints to all possible pairs of classes.
Experimental results show that our proposed method outperforms the other compared methods, and improves on KDLOR in accuracy for ordinal regression.

This paper is organized as follows. In Section 2 we propose the enhanced discriminative learning ordinal regression (EDLOR) method. In Section 3 we test our method on several datasets, and in the last section we conclude our work.

2. Enhanced Discriminative Ordinal Regression

2.1. Proposed Method

The objective of our proposed method is to find a linear projection on which every pair of ordinal classes has maximal inter-class distance while the inner-class distance is kept minimal. As in LDA [1][7][18] and KDLOR [22], the inner-class scatter matrix is defined as

S_w = \frac{1}{N} \sum_{k=1}^{r} \sum_{x \in X_k} (x - m_k)(x - m_k)^T    (1)

where X_k denotes the set of samples with the kth rank label, and m_k is the mean of the kth class:
m_k = \frac{1}{N_k} \sum_{x \in X_k} x    (2)

For the kth-rank class, we define the sum of between-class distances from it to every lower-rank class as

d(k) = \sum_{q < k} \theta_{k,q} \, w^T (m_k - m_q)    (3)

where w^T(m_k − m_q) is the distance between the means of classes k and q on the projection vector w, and θ_{k,q} is the weight of the class pair, which controls how important each rank interval is for d(k). In this paper we introduce an exponential dependence of the weight θ_{k,q} on the interval:

\theta_{k,q} = \exp(k - q), \quad k > q    (4)

θ_{k,q} can also be written as θ_{k,q} = exp(y_k − q), where y_k denotes the kth rank label. We wish a sample of the kth class to incur a low absolute error under our model, so the distance d(k) (see Eq. (3)) should be sensitive to a large error interval k − q. For example, given a sample of the kth class with q ≪ y_k (i.e. k − q large), if the predicted result is ŷ_k ≤ q, then d(k) loses a much larger value. Based on the above analysis, the objective function of our method can be written as

\min J(w, \rho) = w^T S_w w - C\rho + \lambda (w^T w)
\text{s.t.} \quad \sum_{q \le k} \theta_{k,q} \, w^T (m_{k+1} - m_q) \ge \rho, \quad k = 1, 2, \ldots, r - 1    (5)
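To make these quantities concrete, the following is a minimal sketch, not the paper's code: the function names, the NumPy setting, and the 1/N normalization read off Eq. (1) are our assumptions. It computes S_w, the class means, and the weighted pair-difference vectors that appear in the constraints of Eq. (5).

import numpy as np

def scatter_and_means(X, y, r):
    """Inner-class scatter S_w (Eq. (1)) and class means m_k (Eq. (2)).

    X: (N, d) samples; y: (N,) integer ranks in {1, ..., r}.
    """
    N, d = X.shape
    Sw = np.zeros((d, d))
    means = np.zeros((r, d))
    for k in range(1, r + 1):
        Xk = X[y == k]
        means[k - 1] = Xk.mean(axis=0)
        diff = Xk - means[k - 1]
        Sw += diff.T @ diff          # sum over x in X_k of (x - m_k)(x - m_k)^T
    return Sw / N, means             # assumes the 1/N normalization of Eq. (1)

def pair_diff_vectors(means, r):
    """Row k-1 holds sum_{q <= k} theta_{k,q} (m_{k+1} - m_q), the vector
    multiplying w in the k-th constraint of Eq. (5), with
    theta_{k,q} = exp(k - q) as in Eq. (4)."""
    d = means.shape[1]
    V = np.zeros((r - 1, d))
    for k in range(1, r):                     # k = 1, ..., r-1
        for q in range(1, k + 1):             # q <= k
            V[k - 1] += np.exp(k - q) * (means[k] - means[q - 1])
    return V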
Our model tries to minimize the distance between samples of the same class while ensuring the ordinal information through the constraints. Unlike KDLOR, however, our model extends the ordinal constraint to every pair of classes, and gives each class pair a different weight through the exponential dependence on its rank interval. In Eq. (5), ρ ≥ 0, C > 0 is a penalty coefficient, and λ > 0 is a regularization factor that helps overcome the singularity problem of S_w. From Eq. (5) we get the Lagrangian

L(w, \rho, \alpha) = w^T (S_w + \lambda I) w - C\rho - \sum_{k=1}^{r-1} \alpha_k \Big( \sum_{q \le k} \theta_{k,q} \, w^T (m_{k+1} - m_q) - \rho \Big)    (6)
with Lagrange multipliers α_k ≥ 0, k = 1, ..., r − 1. Setting the derivatives of the Lagrangian in Eq. (6) to zero, we obtain

\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \frac{1}{2} (S_w + \lambda I)^{-1} \sum_{k=1}^{r-1} \alpha_k \sum_{q \le k} \theta_{k,q} (m_{k+1} - m_q)    (7)

\frac{\partial L}{\partial \rho} = 0 \;\Rightarrow\; \sum_{k=1}^{r-1} \alpha_k = C    (8)

Let u = \sum_{k=1}^{r-1} \alpha_k \big( \sum_{q \le k} \theta_{k,q} (m_{k+1} - m_q) \big). The corresponding optimization problem then becomes

\min f(\alpha) = \frac{1}{4} u^T (S_w + \lambda I)^{-1} u
\text{s.t.} \quad \alpha_k \ge 0, \; k = 1, \ldots, r - 1, \qquad \sum_{k=1}^{r-1} \alpha_k = C    (9)
This is a convex quadratic programming (QP) problem, and many methods, such as interior point, active set, and conjugate gradient methods, can be used to solve it [22]. After obtaining the optimal solution α, we can compute w by Eq. (7).
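As an illustration of how the dual in Eq. (9) can be solved, the sketch below assembles the (r−1)×(r−1) quadratic form and hands it to a general-purpose solver. This is only one plausible realization: the paper solves its QPs with Matlab's quadprog, whereas here we use SciPy's SLSQP, and all names are ours. The argument V is the matrix whose kth row is the vector sum_{q<=k} theta_{k,q}(m_{k+1} − m_q), e.g. as built by the pair_diff_vectors sketch above.

import numpy as np
from scipy.optimize import minimize

def solve_edlor_dual(Sw, V, C=0.5, lam=1e-3):
    """Solve Eq. (9): min (1/4) u^T (S_w + lam I)^{-1} u with u = V^T alpha,
    s.t. alpha_k >= 0 and sum_k alpha_k = C; then recover w via Eq. (7)."""
    d = Sw.shape[0]
    A = Sw + lam * np.eye(d)
    M = V @ np.linalg.solve(A, V.T)          # M[k, j] = v_k^T A^{-1} v_j
    fun = lambda a: 0.25 * a @ M @ a         # f(alpha) = (1/4) alpha^T M alpha
    jac = lambda a: 0.5 * M @ a
    m = V.shape[0]
    res = minimize(fun, np.full(m, C / m), jac=jac, method="SLSQP",
                   bounds=[(0.0, None)] * m,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - C}])
    alpha = res.x
    w = 0.5 * np.linalg.solve(A, V.T @ alpha)    # Eq. (7)
    return alpha, w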
The multiple thresholds b_k (k = 1, ..., r − 1) are given by Eq. (10), and b_r is set to a large number, b_r = 10^5:

b_k = w^T (N_{k+1} m_{k+1} - N_k m_k) / (N_{k+1} + N_k)    (10)
The decision rule for predicting the rank of a given vector x is then

f(x) = \min_{k \in \{1, \ldots, r\}} \{\, k : w^T x - b_k < 0 \,\}    (11)
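The thresholds and the decision rule translate directly into code. The following sketch (function names ours) transcribes Eq. (10) as printed and uses b_r = 10^5 as in Algorithm 1 below; counts holds the class sizes N_k.

import numpy as np

def thresholds(w, means, counts):
    """b_k of Eq. (10) for k = 1, ..., r-1, plus the sentinel b_r = 10^5."""
    r = means.shape[0]
    b = np.empty(r)
    for k in range(r - 1):
        b[k] = w @ (counts[k + 1] * means[k + 1] - counts[k] * means[k]) \
               / (counts[k + 1] + counts[k])
    b[-1] = 1e5                      # b_r = 10^5, as in the paper
    return b

def predict(X, w, b):
    """Eq. (11): the smallest k with w^T x - b_k < 0."""
    scores = X @ w
    # argmax over booleans returns the first threshold exceeding the score.
    return np.argmax(scores[:, None] - b[None, :] < 0, axis=1) + 1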
The complexity of our method is essentially the same as that of KDLOR, which is smaller than that of Prank and the SVORs [22].

2.2. Algorithm

Given training data (x_i, y_i) ∈ R^l × R, i = 1, ..., N, where y_i ∈ {1, 2, ..., r} is the rank label, let X_k = {x_i | y_i = k} be the sample set labelled with rank k, with size N_k. The proposed EDLOR procedure is given as Algorithm 1. Similar to the kernelized learning of KDLOR, our method can also be extended to nonlinear ordinal regression problems: the kernel trick maps the original training data into a higher dimensional feature space in which EDLOR is conducted. We omit the details of the kernelized EDLOR process here. The parameters C and λ in Algorithm 1 are each set to a suitable value by search; usually C is set to about 0.5, and λ is a small positive number.

3. Experiments

In this section, we compare our proposed EDLOR method with Prank [5], SVOR-IMC, SVOR-EXC [3] and KDLOR [22] on synthetic datasets, benchmark datasets and a collaborative filtering dataset. The experiments are conducted in Matlab 2013b, and the optimization problems involved in these methods are all solved with the quadprog function of the Matlab Optimization Toolbox. For the SVORs, we do not adopt the fast SMO implementation, but solve the standard optimization model directly in Matlab. For the Prank method, we run 500 rounds of the perceptron iteration, where a round is one loop of predictions over all the samples. The parameters involved in SVOR-EXC, SVOR-IMC, KDLOR and our EDLOR are set by search. The linear kernel is adopted for all the compared methods in our tests.
In our experiments, two evaluation metrics are adopted: the mean zero-one error (MZE) and the mean absolute error (MAE). Let the actual test outputs be {y_1, y_2, ..., y_n} and the predicted test outputs be {ŷ_1, ŷ_2, ..., ŷ_n}; the two errors are defined as follows.
- Mean zero-one error: the fraction of incorrect predictions on the test data, i.e. (1/n) Σ_{i=1}^{n} I(ŷ_i ≠ y_i), where I(·) is the indicator function, which gives 1 when its argument is true and 0 otherwise.
- Mean absolute error: the average deviation of the predicted test outputs from the true ranks, which are treated as consecutive integers, i.e. (1/n) Σ_{i=1}^{n} |ŷ_i − y_i|.
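Both metrics are one-liners; a minimal sketch (names ours) is:

import numpy as np

def mze(y_true, y_pred):
    """Mean zero-one error: fraction of incorrectly predicted ranks."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def mae(y_true, y_pred):
    """Mean absolute error between ranks treated as consecutive integers."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))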
Algorithm 1 EDLOR Algorithm
Input: the training set {(x_i, y_i) | y_i ∈ {1, 2, ..., r}, i = 1, ..., N}.
Initialization: the penalty coefficient C; the regularization weight λ.
Training Process:
1. Compute the inner-class scatter matrix S_w;
2. Compute the mean of each class, m_k, k = 1, ..., r;
3. Solve the following optimization problem to obtain α:
   min u^T (S_w + λI)^{-1} u, s.t. α_k ≥ 0, Σ_{k=1}^{r-1} α_k = C,
   where u = Σ_{k=1}^{r-1} α_k ( Σ_{q ≤ k} θ_{k,q} (m_{k+1} - m_q) );
4. Compute w = (1/2) (S_w + λI)^{-1} Σ_{k=1}^{r-1} α_k ( Σ_{q ≤ k} θ_{k,q} (m_{k+1} - m_q) );
5. Compute b_k = w^T (N_{k+1} m_{k+1} - N_k m_k) / (N_{k+1} + N_k), k = 1, ..., r-1; set b_r = 10^5.
Output: the decision rule f(x) = min_{k ∈ {1,...,r}} { k : w^T x - b_k < 0 }.
3.1. Experiments on Synthetic Data Set

In this experiment, we randomly generate four ranks of data with Gaussian distributions in a three-dimensional space, with 100 samples in each rank. The dataset clearly exhibits overlaps among the classes. Figure 1 shows the distribution of the data.
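The paper does not report the exact means or spacings of the four Gaussians, so the following generator is only an illustrative stand-in for such a dataset: four classes whose means are ordered along a common direction, with unit-variance noise creating overlaps between neighbouring ranks.

import numpy as np

rng = np.random.default_rng(0)
r, n_per_class, dim = 4, 100, 3

# Class means spaced 2.0 apart along one direction (illustrative values).
direction = np.ones(dim) / np.sqrt(dim)
X = np.vstack([2.0 * k * direction + rng.standard_normal((n_per_class, dim))
               for k in range(1, r + 1)])
y = np.repeat(np.arange(1, r + 1), n_per_class)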
Figure 1. Synthetic data set for ordinal regression
On the synthetic dataset, we compared the Prank, SVOR-EXC, SVOR-IMC, KDLOR and our EDLOR methods, using all the samples as both training and test set. The results of MZE, MAE and the run time are given in Table 1.

Table 1. Results on synthetic dataset

Methods     MZE     MAE     Time(s)
Prank       0.515   0.572   0.56
SVOR-EXC    0.207   0.207   36.06
SVOR-IMC    0.362   0.362   272.63
KDLOR       0.187   0.187   0.005
EDLOR       0.177   0.177   0.007
In Table 1, our EDLOR method obtains the lowest mean zero-one error and mean absolute error, and its run time, similar to that of KDLOR, is much smaller than those of Prank and the SVORs.

3.2. Experiments on Benchmark Datasets

We use five benchmark datasets, Pyrimidines, Machine, Boston, Abalone and Bank [14][20][6][3], for our experiments. The data descriptions are given in Table 2.

Table 2. Benchmark Datasets

Benchmark     Size   Dimension
Pyrimidines   74     27
Machine       209    6
Boston        506    13
Abalone       4177   8
Bank          8182   32
Firstly, the benchmark datasets are discretized into 5 equal-frequency bins, where the partition into training and test sets is the same as in [3] and the results of EXC and IMC are taken from [3]. The averages over 20 trials are given in Table 3. In Table 3, our EDLOR has the lowest MZE and MAE on three datasets: Pyrimidines, Machine and Boston.

Table 3. Compared Results with Related Works on Benchmark Datasets

Mean zero-one error (MZE):

Dataset (r=5)   Prank         EXC           IMC           KDLOR         EDLOR
Pyrimidines     0.689±0.189   0.525±0.095   0.517±0.086   0.475±0.141   0.466±0.158
Machine         0.765±0.083   0.423±0.060   0.431±0.054   0.422±0.102   0.402±0.089
Boston          0.796±0.050   0.336±0.033   0.332±0.024   0.351±0.051   0.307±0.045
Abalone         0.740±0.075   0.522±0.015   0.527±0.009   0.560±0.029   0.533±0.008
Bank            0.787±0.087   0.528±0.004   0.537±0.004   0.603±0.026   0.534±0.007

Mean absolute error (MAE):

Dataset (r=5)   Prank         EXC           IMC           KDLOR         EDLOR
Pyrimidines     0.966±0.358   0.623±0.120   0.615±0.127   0.618±0.143   0.587±0.112
Machine         1.034±0.136   0.458±0.067   0.462±0.062   0.454±0.122   0.444±0.097
Boston          1.260±0.134   0.362±0.036   0.357±0.024   0.368±0.058   0.331±0.055
Abalone         1.555±0.410   0.662±0.005   0.657±0.011   0.805±0.063   0.742±0.021
Bank            1.611±0.489   0.674±0.006   0.661±0.005   0.729±0.068   0.693±0.009
In order to study different quantizations for ordinal regression, we conduct experiments with different quantization sizes for the rank label. We separately select the number of ranks as r = 2, 4, 6, 8, 10; that is, each benchmark dataset is discretized into r equal-frequency bins. We randomly select 20 percent of the samples in each class for training and use the rest for testing; for the Bank dataset, only 5 percent of the samples are used for training. On each dataset and for each setting of r, the averages over 20 trials are shown in Figures 2 to 6.
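Equal-frequency binning of a continuous target can be done with empirical quantiles. A minimal sketch follows (names ours; ties in the target can make the bins slightly unequal in practice).

import numpy as np

def equal_frequency_bins(target, r):
    """Discretize a continuous target into r equal-frequency ranks {1, ..., r}."""
    target = np.asarray(target, dtype=float)
    # Interior cut points at the 1/r, 2/r, ..., (r-1)/r quantiles.
    cuts = np.quantile(target, np.linspace(0.0, 1.0, r + 1)[1:-1])
    return np.searchsorted(cuts, target, side="right") + 1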
Figure 2. Different quantizing metrics for compared methods on Pyrimidines dataset
Figure 3. Different quantizing metrics for compared methods on Machine dataset
Figure 4. Different quantizing metrics for compared methods on Boston dataset
Figure 5. Different quantizing metrics for compared methods on Abalone dataset
Figure 6. Different quantizing metrics for compared methods on Bank dataset
From the results in Figures 2 to 6, we can see that as the number of quantized ranks increases, the error rates (MZE and MAE) of all the methods become larger. Among the methods, the two discriminant learning methods, EDLOR and KDLOR, generally perform better than the others, and our EDLOR is superior to KDLOR. Comparing the MZE results, the five compared methods behave similarly in Figure 2, while in Figures 3 to 6 the MZE of EDLOR and KDLOR is lower than that of the others. In terms of MAE, our EDLOR clearly outperforms the other methods.

4. Conclusion

This paper presents a new discriminant learning ordinal regression method, EDLOR, for ranking data. The proposed method enhances discriminant ordinal learning by imposing global ordinal constraints on class pairs while simultaneously maximizing the distance of every upper-lower pair of categories.
On synthetic and benchmark datasets, our proposed method is competitive with the other compared methods in accuracy, and matches the fast speed of KDLOR.

Acknowledgements

This work was supported by the National Key R&D Program of China under Grant 2016YFC0801100.

References

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[2] Olivier Chapelle and Mingrui Wu. Gradient descent optimization of smoothed information retrieval metrics. Information Retrieval, 13(3):216–235, 2010.
[3] Wei Chu and S. Sathiya Keerthi. New approaches to support vector ordinal regression. In ICML '05, pages 145–152, 2005.
[4] David Cossock and Tong Zhang. Subset ranking using regression. In COLT '06, pages 605–619, 2006.
[5] Koby Crammer and Yoram Singer. Pranking with ranking. In NIPS, pages 641–647, 2001.
[6] David Clark, Zoltan Schreter, and Anthony Adams. A quantitative comparison of Dystal and backpropagation. In The Australian Conference on Neural Networks (ACNN '96), 1996.
[7] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification and Scene Analysis, 2nd ed., 1995.
[8] Jonathan L. Elsas, Vitor R. Carvalho, and Jaime G. Carbonell. Fast learning of document ranking functions with the committee perceptron. In WSDM '08, pages 55–64, New York, NY, USA, 2008.
[9] E. Frank and M. Hall. A simple approach to ordinal classification. In Proceedings of the European Conference on Machine Learning, pages 145–165, 2001.
[10] Hang Li. Learning to Rank for Information Retrieval and Natural Language Processing. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2013.
[11] S. Har-Peled, D. Roth, and D. Zimak. Constraint classification: A new approach to multiclass classification and ranking. In Advances in Neural Information Processing Systems 15, 2002.
[12] Edward Harrington. Online ranking/collaborative filtering using the perceptron algorithm. In ICML 2003, pages 251–258, 2003.
[13] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115–132. MIT Press, 2000.
[14] Jonathan D. Hirst, Ross D. King, and Michael J. E. Sternberg. Quantitative structure-activity relationships by neural networks and inductive logic programming. Journal of Computer-Aided Molecular Design, 8:405–420, 1994.
[15] S. Kramer, G. Widmer, B. Pfahringer, and M. De Groeve. Prediction of ordinal classes using regression trees. Fundamenta Informaticae, 47:1–13, 2001.
[16] Kalervo Järvelin and Jaana Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In ACM SIGIR, pages 41–48, New York, NY, USA, 2000.
[17] Ping Li, Christopher Burges, and Qiang Wu. McRank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904. MIT Press, Cambridge, MA, 2008.
[18] S. Mika. Kernel Fisher Discriminants. PhD thesis, Univ. of Technology, 2002.
[19] Taesup Moon, Alex J. Smola, Yi Chang, and Zhaohui Zheng. IntervalRank: Isotonic regression with listwise and pairwise constraints. In WSDM 2010, pages 151–160, 2010.
[20] R. Quinlan. Combining instance-based and model-based learning. In Tenth International Conference on Machine Learning, pages 236–243, University of Massachusetts, Amherst. Morgan Kaufmann, 1993.
[21] Amnon Shashua and Anat Levin. Ranking with large margin principle: Two approaches. In Advances in Neural Information Processing Systems 15. MIT Press, 2003.
[22] Bing-Yu Sun, Jiuyong Li, Desheng Dash Wu, Xiao-Ming Zhang, and Wen-Bo Li. Kernel discriminant learning for ordinal regression. IEEE Transactions on Knowledge and Data Engineering, 22(6), June 2010.
[23] Nicolas Usunier, David Buffoni, and Patrick Gallinari. Ranking with ordered weighted pairwise classification. In ICML '09, pages 1057–1064, New York, NY, USA, 2009.
[24] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
[25] Maksims N. Volkovs and Richard S. Zemel. BoltzRank: Learning to maximize expected ranking gain. In ICML '09, pages 1089–1096, New York, NY, USA, 2009.
[26] Ke Zhou, Gui-Rong Xue, Hongyuan Zha, and Yong Yu. Learning to rank with ties. In SIGIR 2008, pages 275–282, 2008.