Matrix factorization with heterogeneous multiclass preference context


Jing Lin a,b,c, Weike Pan a,b,c,∗, Lin Li a,b,c, Zixiang Chen a,b,c, Zhong Ming a,b,c,∗

a National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, China
b Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, China
c College of Computer Science and Software Engineering, Shenzhen University, China

Article history: Received 18 June 2019; revised 12 November 2019; accepted 20 January 2020. Communicated by Dr. Jiajun Bu.

Keywords: Multiclass preference context (MPC); Dual MPC; Pipelined MPC; Matrix factorization; Collaborative filtering

Abstract

The spreading use of the Internet and big data technology has spawned the need for recommendation systems. However, to alleviate public anxiety about privacy, this paper advocates making recommendations with internal context, which refers to the implicit context hidden beneath the rating matrix alone. Inspired by a recent work that embeds neighborhood information among users (represented as multiclass preference context, MPC) into a factorization-based method, we extend it to both user-oriented and item-oriented forms, and put both user-oriented MPC and item-oriented MPC into a generic factorization framework called matrix factorization with heterogeneous MPC (MF-HMPC). In particular, we derive two specific recommendation methods from MF-HMPC with different structures, namely MF with dual MPC (MF-DMPC) and MF with pipelined MPC (MF-PMPC), which correspond to a concurrent structure and a sequential structure, respectively. Extensive empirical studies on four public datasets show that our two forms of MF-HMPC outperform the referenced state-of-the-art MF methods as well as two representative deep learning methods. Moreover, we also discover some interesting facts about how these two kinds of internal contextual information complement each other. The main advantage of our methods is that they strike a good balance between user-oriented and item-oriented neighborhood information.

✩ This work is an extension of our previous work [1]. In this paper, we have added the following new contents: (i) we have developed a new method, i.e., MF-PMPC, in Section 5; (ii) we have included new experimental results and associated analyses in Section 6; (iii) we have added more related works and more detailed descriptions of the baseline methods in Sections 2–4; and (iv) we have made many improvements throughout the whole paper, including figure illustrations and linguistic quality.

∗ Corresponding authors: Weike Pan and Zhong Ming. E-mail addresses: [email protected] (J. Lin), [email protected] (W. Pan), [email protected] (L. Li), [email protected] (Z. Chen), [email protected] (Z. Ming).

1. Introduction

Recommendation systems [2–4], which are designed to help users pick out preferable items from a massive collection, benefit both customers and suppliers with convenience and efficiency, and therefore attract a great deal of attention from both the academic and industrial communities. Nowadays, apart from the traditional user-item rating matrix, a wide variety of auxiliary data can be captured and utilized by recommendation systems to enhance recommendation performance, since such data can reveal different kinds of external context information

such as physical context (e.g., time [5,6], location [7]) and social context (e.g., trust [8]). However, with the growing awareness of personal privacy, using the rating matrix only to discover more internal context (latent collaborative patterns) [9,10] is a more reliable and consistently effective strategy, and it still occupies an important place in the research community. Different from context-aware recommendation systems [5–8,11], which make recommendations with external context, traditional recommendation systems, typically collaborative filtering recommendation systems [9–13], only consider internal context that is extracted from the ratings assigned by users to items.

A recently proposed model called matrix factorization with multiclass preference context (MF-MPC) [10] is a unified method which combines the two major categories of collaborative filtering: neighborhood-based [11] and model-based [9,12]. Briefly, MF-MPC is an improved version of SVD [12], which is also a matrix factorization (MF) method. Specifically, MF-MPC strengthens the MF method by modeling the multiclass preference context (MPC) of users as a newly added matrix, which performs the same function as the user similarity in a neighborhood-based method.


In this paper, we further introduce two MF models that contain not only user similarity but also item similarity, collectively referred to as matrix factorization with heterogeneous multiclass preference context (MF-HMPC). More specifically, MF-HMPC consists of matrix factorization with dual multiclass preference context (MF-DMPC) and matrix factorization with pipelined multiclass preference context (MF-PMPC). MF-DMPC combines user-oriented MPC and item-oriented MPC in a concurrent way by joining them in one prediction formula, while in MF-PMPC, user-oriented MPC and item-oriented MPC appear in different prediction rules for two sequential steps.

For empirical studies, we compare our two models against not only the two referenced models, SVD and MF-MPC, but also two representative deep learning models, RBM [14] and NCF [15]. We also analyze the factors behind the advantages of the two proposed models by studying the independent and joint effects of the two types of MPC. Experimental results show that: (1) it is helpful to introduce MPC into MF models, but the influences of user-oriented and item-oriented MPC differ according to the data characteristics; (2) MF-DMPC outperforms the two single-oriented MF-MPC models; (3) MF-PMPC can obtain better performance than MF-DMPC, and it relies more on the first-step training; (4) both MF-DMPC and MF-PMPC outperform the two deep learning methods, i.e., RBM and NCF. In general, our MF-HMPC, which unifies MF-DMPC and MF-PMPC, inherits both the high accuracy of model-based recommendation algorithms and the good explainability of neighborhood-based algorithms, and further strikes a good balance between user-oriented and item-oriented neighborhood information.

We organize the remainder of this paper as follows. We introduce some matrix factorization based and deep learning based recommendation algorithms that use the rating matrix only in Section 2. We describe the rating prediction problem that we focus on, and give an interpretation of MPC, in Section 3. We formulate our proposed framework MF-HMPC, which consists of MF with dual MPC in Section 4 and MF with pipelined MPC in Section 5. We conduct experiments with our proposed models and other related models, and analyze the results, in Section 6. Finally, we give some concluding remarks in Section 7.

2. Related work

As mentioned above, discovering recommendation algorithms that exploit the rating matrix only is still a research hotspot, especially when novel technologies become popular. In this section, we introduce the traditional CF methods on which our proposed models are based, as well as some deep learning methods that have recently been in vogue.

2.1. Traditional collaborative filtering algorithms

Traditional collaborative filtering algorithms adopted in recommendation systems utilize explicit rating data from users' feedback. Making recommendations with internal context only, traditional CF methods can be broadly classified into two categories: neighborhood-based methods and model-based methods. Neighborhood-based CF makes predictions by calculating user similarity (represented by user-oriented CF) or item similarity (represented by item-oriented CF [13]) based on certain assumptions. Model-based CF (represented by SVD [12]) works out latent feature vectors of users and items through matrix factorization, and then obtains predictive scores. The SVD++ method proposed by Koren [9], which is regarded as an attempt to combine neighborhood-based and model-based methods, goes a step further. By analyzing the merits and demerits of SVD++, Pan and Ming proposed the MF-MPC method [10], in which they pointed out that the new complex latent vector in SVD++ is actually a vector form to represent user similarity. Instead of ignoring the categorical scores, MF-MPC makes full use of multiclass preference context and treats SVD++ as a special case, so as to gain more accurate results. Notice that such a CF problem is in essence equivalent to a low-rank matrix completion problem, which can also be efficiently solved by Schatten p-norm and lp-norm minimization to enhance robustness [16–18]. In this paper, we focus on utilizing the internal context that is peculiar to recommendation systems. We thus extend MF-MPC and obtain a unified framework, MF-HMPC, with two special cases, i.e., MF-DMPC and MF-PMPC. Our MF-HMPC framework is a typical example of exploring and utilizing the internal context, aiming at more accurate rating prediction.

2.2. Deep learning based collaborative filtering algorithms

With the great increase of data resources and computational power, deep learning methods have come into view in a high-profile way. Conceivably, many researchers have adopted these deep learning methods in the recommendation systems domain and achieved many delightful results. For example, there are two typical deep learning based CF methods which adapt restricted Boltzmann machines (RBM) and neural networks (NN), respectively, to the rating prediction problem in recommendation systems: RBM [14] and NCF [15]. RBM is composed of two undirected layers of binary hidden units and softmax visible units, where the binary rating matrix of each user is input and reconstructed in the visible layer. NCF adopts a multi-layer perceptron to learn the user-item interaction so as to provide the model with non-linearities. Though proven to be effective, these methods are quite similar to MF methods, because in principle they all solve a data reconstruction problem, and they lack interpretability, let alone consideration of neighborhood information. Now, in our MF-HMPC framework, we include an approach to join neighborhood information into MF models, which we believe will also be enlightening for the improvement of deep learning CF methods. Notice that deep learning methods that involve external context through recurrent neural networks [19,20], convolutional neural networks [21,22] and so on are not in our focused area.

3. Preliminaries

Before presenting our proposed models, we need to formalize the rating prediction problem that we focus on, and give an interpretation of multiclass preference context (MPC), which is also our entry point for upgrading the MF models.

3.1. Problem definition

In this paper, we focus on making the most of the internal context in a recommendation system, which means that we only utilize an incomplete rating matrix, represented by R = {(u, i, rui)}, for preference prediction. Notice that u represents one of the ID numbers of the n users (or rows in the rating matrix), i represents one of the ID numbers of the m items (or columns), and rui ∈ M is the recorded rating of user u to item i, where M can be {1, 2, 3, 4, 5}, {0.5, 1, 1.5, ..., 5} or another range. Our goal is to build progressive models to predict the vacancies of the rating matrix. In other words, we study a problem called rating prediction, which is a typical problem in the field of collaborative filtering. Some notations used to establish our recommendation methods are shown in Table 1.
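To make the notations concrete, here is a minimal sketch in Python of how the rating records R and the index sets Iu, Iur, Ui and Uir from Table 1 can be materialized from rating triples; the toy data and variable names are ours, purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy rating records R = {(u, i, r_ui)} with M = {1, ..., 5}.
ratings = [(0, 10, 4), (0, 11, 2), (1, 10, 5), (1, 12, 4)]

I_u = defaultdict(set)    # I_u[u]: items rated by user u
I_ur = defaultdict(set)   # I_ur[(u, r)]: items rated by user u with rating r
U_i = defaultdict(set)    # U_i[i]: users who rate item i
U_ir = defaultdict(set)   # U_ir[(i, r)]: users who rate item i with rating r

for u, i, r in ratings:
    I_u[u].add(i)
    I_ur[(u, r)].add(i)
    U_i[i].add(u)
    U_ir[(i, r)].add(u)

mu = sum(r for _, _, r in ratings) / len(ratings)  # global average rating
```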


Table 1. Some notations and explanations.

n: number of users
m: number of items
u, u': user ID
i, i': item ID
$\mathcal{M}$: multiclass preference set
$r_{ui} \in \mathcal{M}$: rating of user u to item i
$\mathcal{R} = \{(u, i, r_{ui})\}$: rating records of the training data
$y_{ui} \in \{0, 1\}$: indicator; $y_{ui} = 1$ if $(u, i, r_{ui}) \in \mathcal{R}$ and $y_{ui} = 0$ otherwise
$\mathcal{I}_u$: items rated by user u
$\mathcal{I}^r_u, r \in \mathcal{M}$: items rated by user u with rating r
$\mathcal{U}_i$: users who rate item i
$\mathcal{U}^r_i, r \in \mathcal{M}$: users who rate item i with rating r
$\mu \in \mathbb{R}$: global average rating value
$b_u \in \mathbb{R}$: user bias
$b_i \in \mathbb{R}$: item bias
$d \in \mathbb{R}$: number of latent dimensions
$U_{u\cdot} \in \mathbb{R}^{1\times d}$: user-specific latent feature vector
$V_{i\cdot} \in \mathbb{R}^{1\times d}$: item-specific latent feature vector
$N^r_{u\cdot} \in \mathbb{R}^{1\times d}$: user-specific latent feature vector w.r.t. rating r
$M^r_{i\cdot} \in \mathbb{R}^{1\times d}$: item-specific latent feature vector w.r.t. rating r
$\bar{U}^{MPC}_{u\cdot}$: aggregated user-specific latent preference vector
$\bar{V}^{MPC}_{i\cdot}$: aggregated item-specific latent preference vector
$\mathcal{R}^{te} = \{(u, i, r_{ui})\}$: rating records of the test data
$\hat{r}_{ui}$: the final predicted rating of user u to item i
$\hat{r}^1_{ui}$: the first predicted rating (only in residual-based algorithms)
$\hat{r}^2_{ui}$: the second predicted rating (ditto)
$r^{RES}_{ui}$: the residual rating (ditto)
T: iteration number

3.2. Multiclass preference context

In the state-of-the-art matrix factorization based model, i.e., SVD [12], the prediction rule for the rating of user u to item i can be written as follows,

$\hat{r}_{ui} = U_{u\cdot} V_{i\cdot}^\top + b_u + b_i + \mu,$  (1)

where $U_{u\cdot} \in \mathbb{R}^{1\times d}$, $V_{i\cdot} \in \mathbb{R}^{1\times d}$, $b_u \in \mathbb{R}$, $b_i \in \mathbb{R}$ and $\mu \in \mathbb{R}$ are the user-specific latent feature vector, item-specific latent feature vector, user bias, item bias and global average, respectively.

Through investigations into combining neighborhood-based and factorization-based methods, Pan et al. [10] proposed a categorical internal context to encode the neighborhood information in a matrix factorization framework. Intuitively, the rating of user u to item i, i.e., $r_{ui}$, can be represented in a probabilistic way as follows,

$Prob(r_{ui} \mid (u, i); (u, i', r_{ui'}), i' \in \cup_{r \in \mathcal{M}} \mathcal{I}^r_u \setminus \{i\}),$  (2)

which means that $r_{ui}$ depends not only on the (user, item) pair (u, i), but also on the examined items $i' \in \mathcal{I}_u \setminus \{i\}$ and the categorical score $r_{ui'} \in \mathcal{M}$ assigned to each such item by user u. Here, the condition $(u, i', r_{ui'}), i' \in \cup_{r \in \mathcal{M}} \mathcal{I}^r_u \setminus \{i\}$ is given the name multiclass preference context (MPC), in contrast to oneclass preference context (OPC) without categorical scores.

In order to introduce MPC into an MF method, a user-specific aggregated latent preference vector $\bar{U}^{MPC}_{u\cdot}$ is defined for user u from the multiclass preference context [10],

$\bar{U}^{MPC}_{u\cdot} = \sum_{r \in \mathcal{M}} \frac{1}{\sqrt{|\mathcal{I}^r_u \setminus \{i\}|}} \sum_{i' \in \mathcal{I}^r_u \setminus \{i\}} M^r_{i'\cdot}.$  (3)

Notice that $M^r_{i'\cdot} \in \mathbb{R}^{1\times d}$ can be considered a classified item-specific latent feature vector w.r.t. rating r, and $\frac{1}{\sqrt{|\mathcal{I}^r_u \setminus \{i\}|}}$ plays the role of a normalization term for the preference of class r. It is also worth noting that the variables in $M^r_{i'\cdot}$ are independent of the users and are shared by all calculations of $\bar{U}^{MPC}_{u\cdot}$. The difference between the MPC of two users u and u' is essentially determined by $\mathcal{I}^r_u$ and $\mathcal{I}^r_{u'}$, i.e., the items they rated and the corresponding ratings. The outcome that users who rated similar items with similar ratings have similar MPC is consistent with the basic assumption of user-based CF; therefore we believe that MPC can represent user similarity, as declared in [10].
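A minimal sketch of Eq. (3), assuming the index sets from the earlier snippet and a hypothetical per-class factor table `M_factors[r][i']` holding the vectors $M^r_{i'\cdot}$:

```python
import numpy as np

def user_mpc_vector(u, i, I_ur, M_factors, rating_classes, d):
    """Eq. (3): aggregate the class-specific item factors M^r_{i'.} of the
    items i' that user u rated with class r (excluding the target item i),
    normalized by the square root of the class size."""
    u_mpc = np.zeros(d)
    for r in rating_classes:
        items = I_ur.get((u, r), set()) - {i}
        if items:
            u_mpc += sum(M_factors[r][j] for j in items) / np.sqrt(len(items))
    return u_mpc
```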

Pan et al. then added the neighborhood information $\bar{U}^{MPC}_{u\cdot}$ to the SVD model so as to obtain the MF-MPC prediction rule [10] for the rating of user u to item i as follows,

$\hat{r}_{ui} = U_{u\cdot} V_{i\cdot}^\top + \bar{U}^{MPC}_{u\cdot} V_{i\cdot}^\top + b_u + b_i + \mu,$  (4)

where $U_{u\cdot}$, $V_{i\cdot}$, $b_u$, $b_i$ and $\mu$ are exactly the same as those of the SVD model. MF-MPC has been shown to deliver better recommendation performance than SVD [12] and SVD++ [9], and also embraces them as special cases.

4. Matrix factorization with dual multiclass preference context

Inspired by the differences between user-oriented and item-oriented collaborative filtering [11], we can infer that item similarity (item-oriented MPC) can also be introduced to improve the performance of matrix factorization models. Furthermore, thanks to the extendibility of MF models, we can hopefully join both user-oriented MPC and item-oriented MPC in the prediction rule so as to obtain a hybrid model, i.e., matrix factorization with dual multiclass preference context (MF-DMPC).

4.1. Item-oriented multiclass preference context

We now restate $\bar{U}^{MPC}_{u\cdot}$ (mentioned in the previous section) as user-oriented multiclass preference context (user-oriented MPC). Similarly, we have a symmetrical form of MPC called item-oriented multiclass preference context (item-oriented MPC), $\bar{V}^{MPC}_{i\cdot}$, to represent item similarity, which is formulated as,

$\bar{V}^{MPC}_{i\cdot} = \sum_{r \in \mathcal{M}} \frac{1}{\sqrt{|\mathcal{U}^r_i \setminus \{u\}|}} \sum_{u' \in \mathcal{U}^r_i \setminus \{u\}} N^r_{u'\cdot},$  (5)

where $N^r_{u'\cdot} \in \mathbb{R}^{1\times d}$ is a user-specific latent preference vector w.r.t. rating r. Likewise, we have the prediction rule of item-oriented MF-MPC,

$\hat{r}_{ui} = U_{u\cdot} V_{i\cdot}^\top + \bar{V}^{MPC}_{i\cdot} U_{u\cdot}^\top + b_u + b_i + \mu.$  (6)
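For concreteness, the two single-oriented prediction rules, Eqs. (4) and (6), can be sketched as follows; `u_mpc` and `v_mpc` stand for $\bar{U}^{MPC}_{u\cdot}$ and $\bar{V}^{MPC}_{i\cdot}$, computed as in the snippet after Eq. (3) and its item-oriented analogue (illustrative code only):

```python
def predict_user_oriented(U, V, b_u, b_i, mu, u, i, u_mpc):
    # Eq. (4): r_hat = U_u. V_i.^T + U_bar^MPC_u. V_i.^T + b_u + b_i + mu
    return (U[u] + u_mpc) @ V[i] + b_u[u] + b_i[i] + mu

def predict_item_oriented(U, V, b_u, b_i, mu, u, i, v_mpc):
    # Eq. (6): r_hat = U_u. V_i.^T + V_bar^MPC_i. U_u.^T + b_u + b_i + mu
    return U[u] @ (V[i] + v_mpc) + b_u[u] + b_i[i] + mu
```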

The learning processes of matrix factorization with user-oriented MPC and with item-oriented MPC are quite similar. With different prediction rules, they share the same abbreviated optimization function,

$\arg\min_{\Theta} \sum_{u=1}^{n} \sum_{i=1}^{m} y_{ui} \left[ \frac{1}{2} (r_{ui} - \hat{r}_{ui})^2 + reg(u, i) \right].$  (7)

In particular, the regularization terms reg(u, i) vary for the specific cases, i.e., in user-oriented MF-MPC,

$reg(u, i) = \frac{\alpha_m}{2} \sum_{r \in \mathcal{M}} \sum_{i' \in \mathcal{I}^r_u \setminus \{i\}} \|M^r_{i'\cdot}\|^2_F + \frac{\alpha}{2}\|U_{u\cdot}\|^2 + \frac{\alpha}{2}\|V_{i\cdot}\|^2 + \frac{\alpha}{2}\|b_u\|^2 + \frac{\alpha}{2}\|b_i\|^2,$

and in item-oriented MF-MPC,

$reg(u, i) = \frac{\alpha_n}{2} \sum_{r \in \mathcal{M}} \sum_{u' \in \mathcal{U}^r_i \setminus \{u\}} \|N^r_{u'\cdot}\|^2_F + \frac{\alpha}{2}\|U_{u\cdot}\|^2 + \frac{\alpha}{2}\|V_{i\cdot}\|^2 + \frac{\alpha}{2}\|b_u\|^2 + \frac{\alpha}{2}\|b_i\|^2.$

Hence, with $e_{ui} = (r_{ui} - \hat{r}_{ui})$, the model parameters to be learned and the corresponding gradients are also different, as shown in Table 2.

Table 2. The gradients of the model parameters $\Theta = \{M^r_{i\cdot}, U_{u\cdot}, V_{i\cdot}, b_u, b_i, \mu\}$ in user-oriented MF-MPC and $\Theta = \{N^r_{u\cdot}, U_{u\cdot}, V_{i\cdot}, b_u, b_i, \mu\}$ in item-oriented MF-MPC, with $u = 1, 2, \ldots, n$, $i = 1, 2, \ldots, m$, $r \in \mathcal{M}$ in common.

User-oriented MF-MPC:
$\nabla U_{u\cdot} = -e_{ui} V_{i\cdot} + \alpha U_{u\cdot}$
$\nabla V_{i\cdot} = -e_{ui} (U_{u\cdot} + \bar{U}^{MPC}_{u\cdot}) + \alpha V_{i\cdot}$
$\nabla M^r_{i'\cdot} = \frac{-e_{ui}}{\sqrt{|\mathcal{I}^r_u \setminus \{i\}|}} V_{i\cdot} + \alpha_m M^r_{i'\cdot}, \; i' \in \mathcal{I}^r_u \setminus \{i\}$

Item-oriented MF-MPC:
$\nabla U_{u\cdot} = -e_{ui} (V_{i\cdot} + \bar{V}^{MPC}_{i\cdot}) + \alpha U_{u\cdot}$
$\nabla V_{i\cdot} = -e_{ui} U_{u\cdot} + \alpha V_{i\cdot}$
$\nabla N^r_{u'\cdot} = \frac{-e_{ui}}{\sqrt{|\mathcal{U}^r_i \setminus \{u\}|}} U_{u\cdot} + \alpha_n N^r_{u'\cdot}, \; u' \in \mathcal{U}^r_i \setminus \{u\}$

Common:
$\nabla b_u = -e_{ui} + \alpha b_u$
$\nabla b_i = -e_{ui} + \alpha b_i$
$\nabla \mu = -e_{ui}$

With the gradients, we can update the model parameters $\Theta$ via the update rule,

$\theta = \theta - \gamma \nabla\theta,$  (8)

where $\gamma$ is the learning rate, and $\theta \in \Theta$ is a model parameter to be learned.
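Combining the gradients of Table 2 with the update rule of Eq. (8), one SGD step of user-oriented MF-MPC for a sampled record (u, i, r_ui) might look like the following sketch; this is our own illustration, reusing `user_mpc_vector` from above, and the item-oriented case is symmetric with $N^r$ and $\alpha_n$:

```python
def sgd_step_user_oriented(u, i, r_ui, U, V, b_u, b_i, mu, M_factors,
                           I_ur, rating_classes, alpha, alpha_m, gamma, d):
    u_mpc = user_mpc_vector(u, i, I_ur, M_factors, rating_classes, d)
    e_ui = r_ui - ((U[u] + u_mpc) @ V[i] + b_u[u] + b_i[i] + mu)
    # Gradients from the user-oriented column of Table 2.
    grad_U = -e_ui * V[i] + alpha * U[u]
    grad_V = -e_ui * (U[u] + u_mpc) + alpha * V[i]
    V_old = V[i].copy()  # pre-update value, used for the M^r gradients
    # Update rule of Eq. (8): theta = theta - gamma * grad_theta.
    U[u] -= gamma * grad_U
    V[i] -= gamma * grad_V
    b_u[u] -= gamma * (-e_ui + alpha * b_u[u])
    b_i[i] -= gamma * (-e_ui + alpha * b_i[i])
    mu -= gamma * (-e_ui)
    for r in rating_classes:
        items = I_ur.get((u, r), set()) - {i}
        for j in items:
            grad_M = -e_ui * V_old / np.sqrt(len(items)) + alpha_m * M_factors[r][j]
            M_factors[r][j] -= gamma * grad_M
    return mu  # mu is a scalar, so the caller keeps the returned value
```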


4.2. Dual multiclass preference context

When choosing between a user-oriented CF and an item-oriented CF to implement in recommendation systems, we often consider criteria such as accuracy, efficiency, and so on [11]. However, these properties always depend on the characteristics of the

datasets [23], e.g., the ratio between the number of users and the number of items in the system. The good news is that, through the advanced matrix factorization method, we can handle both user-oriented and item-oriented neighborhood information by joining both $\bar{U}^{MPC}_{u\cdot}$ and $\bar{V}^{MPC}_{i\cdot}$ simultaneously in the model, which is thus named dual multiclass preference context (DMPC). To combine them together, we take $\bar{U}^{MPC}_{u\cdot} V_{i\cdot}^\top + \bar{V}^{MPC}_{i\cdot} U_{u\cdot}^\top$ as the DMPC term. In this way, we obtain our new model, i.e., matrix factorization with dual multiclass preference context (MF-DMPC). The prediction rule of our MF-DMPC for the rating of user u to item i is finally defined as follows,

$\hat{r}^{DMPC}_{ui} = U_{u\cdot} V_{i\cdot}^\top + \bar{U}^{MPC}_{u\cdot} V_{i\cdot}^\top + \bar{V}^{MPC}_{i\cdot} U_{u\cdot}^\top + b_u + b_i + \mu,$  (9)

with all notations described ahead. The development process from SVD to MF-DMPC can be found in Fig. 1. As shown in this figure, the ingredients of SVD are the core of all offspring methods. With a newly defined user-oriented MPC $\bar{U}^{MPC}_{u\cdot}$ interacting with the item feature $V_{i\cdot}$, user-oriented MF-MPC (MF-MPCuser) comes into being. Similarly, we obtain item-oriented MF-MPC (MF-MPCitem). MF-DMPC is a fusion of both the <user-oriented MPC, item> interaction and the <item-oriented MPC, user> interaction. All these methods aim at reducing the loss between the predicted rating $\hat{r}_{ui}$ and the true rating $r_{ui}$.
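A one-line sketch of the MF-DMPC prediction rule in Eq. (9), reusing the hypothetical helpers above:

```python
def predict_dmpc(U, V, b_u, b_i, mu, u, i, u_mpc, v_mpc):
    # Eq. (9): U_u. V_i.^T + U_bar^MPC_u. V_i.^T + V_bar^MPC_i. U_u.^T + b_u + b_i + mu
    return U[u] @ V[i] + u_mpc @ V[i] + v_mpc @ U[u] + b_u[u] + b_i[i] + mu
```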


4.3. Learning algorithm of MF-DMPC

Now let us run through the algorithm of MF-DMPC by reviewing the similar routines of MF-MPC again. With the MF-DMPC prediction rule in Eq. (9), we can learn the model parameters by solving the following minimization problem,

$\arg\min_{\Theta} \sum_{u=1}^{n} \sum_{i=1}^{m} y_{ui} \left[ \frac{1}{2} \left( r_{ui} - \hat{r}^{DMPC}_{ui} \right)^2 + reg^{DMPC}(u, i) \right],$  (10)

where $reg^{DMPC}(u, i) = \frac{\alpha_m}{2} \sum_{r \in \mathcal{M}} \sum_{i' \in \mathcal{I}^r_u \setminus \{i\}} \|M^r_{i'\cdot}\|^2_F + \frac{\alpha_n}{2} \sum_{r \in \mathcal{M}} \sum_{u' \in \mathcal{U}^r_i \setminus \{u\}} \|N^r_{u'\cdot}\|^2_F + \frac{\alpha}{2}\|U_{u\cdot}\|^2 + \frac{\alpha}{2}\|V_{i\cdot}\|^2 + \frac{\alpha}{2}\|b_u\|^2 + \frac{\alpha}{2}\|b_i\|^2$ is the regularization term used to avoid overfitting, and $\Theta = \{U_{u\cdot}, V_{i\cdot}, b_u, b_i, \mu, M^r_{i\cdot}, N^r_{u\cdot}\}$ with $u = 1, 2, \ldots, n$, $i = 1, 2, \ldots, m$, $r \in \mathcal{M}$ are the model parameters to be learned. Notice that the objective function of MF-DMPC is still similar to that of MF-MPC. The difference lies in the newly added item-oriented MPC, i.e., $\bar{V}^{MPC}_{i\cdot} U_{u\cdot}^\top$ in the prediction rule, and $\frac{\alpha_n}{2} \sum_{r \in \mathcal{M}} \sum_{u' \in \mathcal{U}^r_i \setminus \{u\}} \|N^r_{u'\cdot}\|^2_F$ in the regularization term.

Using the stochastic gradient descent (SGD) algorithm, we can easily obtain the gradients of the model parameters for a randomly sampled rating record (u, i, rui) according to Table 2, and update the parameters according to Eq. (8). In summary, the algorithm of MF-DMPC (see Fig. 2) consists of three major steps. Firstly, we randomly pick a rating record from the training data. Secondly, we calculate the gradients. Thirdly, we update each model parameter via Eq. (8). The major difference between the algorithm of MF-DMPC and that of MF-MPC [10] lies in the prediction rule, as shown in Eq. (9), and the corresponding gradients. From the prediction rules shown in Eqs. (4) and (9), and the algorithmic framework in Fig. 2, we can see that the time complexity of MF-DMPC is about twice that of the most closely related work, MF-MPC.

5. Matrix factorization with pipelined multiclass preference context

In the above section, user-oriented MPC and item-oriented MPC join together through the addition of the <user-oriented MPC, item> interaction $\bar{U}^{MPC}_{u\cdot} V_{i\cdot}^\top$ and the <item-oriented MPC, user> interaction $\bar{V}^{MPC}_{i\cdot} U_{u\cdot}^\top$ in one MF prediction rule. Intuitively, this is a concurrent way of combination. However, some studies have found that combining two kinds of methods via the residual training strategy can be remarkably effective [24,25]. For instance, we can obtain the predicted ratings through a pipelined process, which executes independent user-oriented MPC and item-oriented MPC methods one after another. For this reason, we name this kind of method matrix factorization with pipelined multiclass preference context (MF-PMPC).

5.1. Residual training

It is understandable that the residual training strategy can be regarded as adding a fine-tuning process upon a sufficiently trained state. That is to say, in residual training, the prediction rule for a rating is divided into two parts, i.e., $\hat{r}^1_{ui}$ and $\hat{r}^2_{ui}$, as follows,

$\hat{r}_{ui} = \hat{r}^1_{ui} + \hat{r}^2_{ui}.$  (11)

Having the prediction rule above, it can be inferred that there are two steps in series in an integrated residual training process of MF-PMPC:

• Step 1: Train a user-oriented or item-oriented MF-MPC model to obtain $\hat{r}^1_{ui}$, making it as close to the training data $r_{ui}$ as possible, and get the residual rating,

$r^{RES}_{ui} = r_{ui} - \hat{r}^1_{ui}.$  (12)

• Step 2: Train another MF-MPC model to obtain the prediction value $\hat{r}^2_{ui}$, of which the target value is the residual data $r^{RES}_{ui}$ instead of the original training data $r_{ui}$.

5.2. Pipelined multiclass preference context

To assemble user-oriented MPC and item-oriented MPC into a pipelined process, we now arrange the <user-oriented MPC, item> interaction $\bar{U}^{MPC}_{u\cdot} V_{i\cdot}^\top$ and the <item-oriented MPC, user> interaction $\bar{V}^{MPC}_{i\cdot} U_{u\cdot}^\top$ in two steps. When introducing $\bar{U}^{MPC}_{u\cdot} V_{i\cdot}^\top$ in the first step and $\bar{V}^{MPC}_{i\cdot} U_{u\cdot}^\top$ in the residual step, the pipelined algorithm is denoted as MF-PMPCuser→item. Conversely, we denote a residual MF method which starts with item-oriented MPC as MF-PMPCitem→user. When choosing the first prediction rule in step 1 and the residual prediction rule in step 2, we logically have the following formulas (taking MF-PMPCuser→item as an example, see Fig. 3),

Step 1: $\hat{r}^1_{ui} = U_{u\cdot} V_{i\cdot}^\top + \bar{U}^{MPC}_{u\cdot} V_{i\cdot}^\top + b_u + b_i + \mu$ (the same as Eq. (4)),  (13)

Step 2: $\hat{r}^2_{ui} = U^2_{u\cdot} V^{2\top}_{i\cdot} + \bar{V}^{MPC}_{i\cdot} U^{2\top}_{u\cdot} + b^2_u + b^2_i + \mu^2$ (the same as Eq. (6)),  (14)

or

$\hat{r}^2_{ui} = \bar{V}^{MPC}_{i\cdot} U_{u\cdot}^\top$ (interaction only).  (15)


Fig. 1. Illustration of SVD (in black), MF-MPCuser (in green), MF-MPCitem (in red) and our MF-DMPC (in blue). Notice that all symbols in rectangles are defined in Table 1. The stars mark the results of each method. Solid lines express value pass, and dashed line points from the true rating to the predicted one.

Fig. 2. The algorithm of matrix factorization with dual multiclass preference context (MF-DMPC).

The essential distinction between the above two residual rules in the second step is whether to retrain $U_{u\cdot}$, $V_{i\cdot}$, $b_u$, $b_i$ and $\mu$, i.e., the original model parameters in SVD. Notice that $U^2_{u\cdot}$, $V^2_{i\cdot}$, $b^2_u$, $b^2_i$ and $\mu^2$ in Eq. (14) need to be initialized and trained somewhat differently from Eq. (13), due to the change of the target rating, $r^2_{ui} = r^{RES}_{ui} = r_{ui} - \hat{r}^1_{ui}$. However, in Eq. (15), $U_{u\cdot}$ is kept the same as in the previous step, and only the parameter $\bar{V}^{MPC}_{i\cdot}$ needs to be updated.

Our preliminary studies show that it is more steady and effective to retrain the SVD parameters in the residual step, probably because this ensures the coordination of the different parameters. So we determine the prediction rules of MF-PMPCuser→item as Eqs. (13) and (14). Meanwhile, we have MF-PMPCitem→user in a symmetrical way as follows,

Step 1: $\hat{r}^1_{ui} = U_{u\cdot} V_{i\cdot}^\top + \bar{V}^{MPC}_{i\cdot} U_{u\cdot}^\top + b_u + b_i + \mu,$  (16)

Step 2: $\hat{r}^2_{ui} = U^2_{u\cdot} V^{2\top}_{i\cdot} + \bar{U}^{MPC}_{u\cdot} V^{2\top}_{i\cdot} + b^2_u + b^2_i + \mu^2.$  (17)

5.3. Learning algorithm of MF-PMPC

Continuing with the MF-PMPCuser→item example, the optimization function, gradient calculation and update rule of Eqs. (13) and (14) are similar to those of single user-oriented MF-MPC and single item-oriented MF-MPC (see Table 2), respectively. Only the target rating changes as,

Step 1: $r^1_{ui} = r_{ui}$ (the same as the original rating),  (18)

Step 2: $r^2_{ui} = r^{RES}_{ui} = r_{ui} - \hat{r}^1_{ui}$ (change to the residual rating).  (19)

The algorithm of MF-PMPC is shown in Fig. 4. Looking through the algorithm, it can be found that MF-PMPC degenerates into MF-MPC when lines 12–23 are removed. Moreover, on account of the need to retrain the SVD parameters, MF-PMPC contains more parameters to be learned. In this way, MF-PMPC is more likely to promote the coordination of different parameters and obtain better performance. It can be easily seen that the time complexity of MF-PMPC is twice that of MF-MPC.

Fig. 3. Illustration of MF-PMPCuser→item. Similar to Fig. 1, the left side shows possible components of one step's results in turn, and the right side depicts the integration of the two steps' results. The difference is that the two coarse lines denote two options for the residual result.


Fig. 4. The algorithm of matrix factorization with pipelined multiclass preference context (MF-PMPC) via the user→item configuration.

5.4. Our final matrix factorization with heterogeneous multiclass preference context framework

We summarize the relationship among our MF-DMPC, MF-PMPC and other methods in Fig. 5, which illustrates the development of our solutions from basic MF methods. Finally, we unify MF-DMPC and MF-PMPC into a generic factorization-based framework, i.e., matrix factorization with heterogeneous multiclass preference context (MF-HMPC). In this framework, the two specific variants of MF models with two types of MPC are of different structures, i.e., MF with dual MPC (MF-DMPC) for the concurrent structure and MF with pipelined MPC (MF-PMPC) for the sequential structure. Both methods in this framework integrate two kinds of neighborhood information into a factorization model for rating prediction in a seamless and complementary way.

Fig. 5. Illustration of the relationship among different model-based recommendation methods. Notice that OPC and MPC denote oneclass preference context and multiclass preference context, respectively.

6. Experiments

In this section, we conduct extensive empirical studies in order to evaluate the recommendation performance of our proposed models, as well as to investigate the comparison and complementarity between the two single-oriented MPC.

6.1. Datasets and evaluation metrics

We follow [10] in using three MovieLens datasets and supplement them with a Netflix subset in our experiments. Among these four datasets, MovieLens 100K (ML100K), MovieLens 1M (ML1M) and MovieLens 10M (ML10M) [26] can be treated as three datasets of different scales, sizes and densities from the same domain, while the Netflix subset with 50,000 users and about 10 million records (NF10M) is also a movie rating dataset but collected from another platform. We show the statistics of these datasets in Table 3. Notice that the ratio of the number of users to the number of items, i.e., n/m, or the shape of the rating matrix, is an important factor in analyzing the effectiveness of MPC.

Table 3. Statistics of the datasets used in the experiments, including the number of users (n), the number of items (m), the number of rating records in the whole dataset (|R| + |Rte|), the ratio of the number of users to the number of items (n/m), and the density of the training data (|R|/nm).

Dataset   n        m       |R| + |Rte|   n/m    |R|/nm
ML100K    943      1682    100,000       0.56   5.04%
ML1M      6040     3952    1,000,209     1.53   3.35%
ML10M     71,567   10,681  10,000,054    6.70   1.05%
NF10M     50,000   17,770  10,442,504    2.81   0.94%

We use five-fold cross validation to obtain the evaluation results of each case (i.e., each combination of dataset and baseline), in which the best tradeoff parameters (e.g., α, αm and αn) are searched using the first copy of each dataset. The recommendation performance of each method is measured by the prediction error with respect to the real ratings. From an overall perspective of the datasets, we adopt mean absolute error (MAE) and root mean square error (RMSE) as the final evaluation metrics in our studies, which are formulated as follows (the smaller the better),

$MAE = \sum_{(u,i,r_{ui}) \in \mathcal{R}^{te}} |r_{ui} - \hat{r}_{ui}| \,/\, |\mathcal{R}^{te}|,$

$RMSE = \sqrt{\sum_{(u,i,r_{ui}) \in \mathcal{R}^{te}} (r_{ui} - \hat{r}_{ui})^2 \,/\, |\mathcal{R}^{te}|},$


where Rte denotes the set of test records.
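Both metrics are straightforward to compute; a minimal sketch over the test records, given any predict(u, i) function:

```python
import math

def mae_rmse(test_records, predict):
    """Compute MAE and RMSE over R^te given a predict(u, i) function."""
    abs_err = sq_err = 0.0
    for u, i, r in test_records:
        e = r - predict(u, i)
        abs_err += abs(e)
        sq_err += e * e
    n = len(test_records)
    return abs_err / n, math.sqrt(sq_err / n)
```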

6.2. Baselines and parameter settings

Our experiments aim at two questions: (i) how much do different kinds of MPC help an MF model, and (ii) how can they complement each other. The first question mainly concerns the effect of a single type of MPC under diverse conditions (e.g., data of different scales). For this reason, we have the foremost baselines:

• SVD is the basic MF model for recommendation without MPC (see Eq. (1)).
• MF-MPCuser is an MF model with only a user-oriented MPC term (see Eq. (4)).
• MF-MPCitem is an MF model with only an item-oriented MPC term (see Eq. (6)).

For the second question, we investigate further how the two single types of MPC can be integrated and how the integrations perform. Thus we include some additional methods in the experiments:

• MF-DMPC is an MF model combined with dual MPC, integrating both a user-oriented MPC term and an item-oriented MPC term, which can be found in Eq. (9). Structurally, DMPC is a parallel combination form.
• MF-PMPCuser→item is a model which is divided into two MF steps: the first step is equivalent to a user-oriented MF-MPC, and the second is an item-oriented MF-MPC whose target value is replaced by the residual value obtained from step 1, as shown in Eqs. (13) and (14). PMPC can be considered a sequential modeling technique.
• MF-PMPCitem→user is the reverse process of MF-PMPCuser→item, which runs an item-oriented MF-MPC followed by a user-oriented MF-MPC on the residual, as described in Eqs. (16) and (17).

Apart from the above MF models, we also include two deep learning CF models to further demonstrate the recommendation performance of our proposed models:

• RBM [14] utilizes a class of two-layer undirected graphical models called restricted Boltzmann machines to make rating predictions.
• NCF [15] utilizes neural networks with multi-layer perceptron layers to model implicit feedback in a non-linear way, and can be easily adapted to an explicit feedback version.

In order to study the complementarity of item-oriented MF-MPC and user-oriented MF-MPC, we follow [10] for fair comparison. Specifically, we configure the parameters of the factorization-based methods as follows. For the learning rate γ, we set its initial value to a common default value, i.e., γ = 0.01. For the number of latent dimensions d, it is sufficient to show the advantages of each method when d = 20 [10]. For the iteration number, we fix it as T = 50, where the results have reached a steady state. And for each baseline on each dataset, the tradeoff parameters are selected through parameter tuning experiments using the first copy of each dataset and the RMSE metric (as shown in Table 4; the results of the second step in MF-PMPC are gained using the corresponding results of MF-MPC, so the first-step tradeoff parameters are kept unchanged).
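As an illustration of this tuning protocol, a simple grid search over the tradeoff parameters might look as follows; `train_mf_dmpc` is a hypothetical stand-in for the MF-DMPC trainer, `mae_rmse` is the metric sketch above, and the grid follows χi ∈ {0.001, 0.01, 0.1}:

```python
import itertools

grid = [0.001, 0.01, 0.1]  # candidate values for each chi_i

def tune_dmpc(first_copy_train, first_copy_val):
    best_rmse, best_params = float("inf"), None
    for alpha, alpha_m, alpha_n in itertools.product(grid, repeat=3):
        # train_mf_dmpc is a hypothetical stand-in for the MF-DMPC trainer.
        model = train_mf_dmpc(first_copy_train, alpha, alpha_m, alpha_n)
        _, rmse = mae_rmse(first_copy_val, model.predict)
        if rmse < best_rmse:
            best_rmse, best_params = rmse, (alpha, alpha_m, alpha_n)
    return best_params
```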

Table 4. Configurations of the tradeoff parameters of the MF methods, where χi ∈ {0.001, 0.01, 0.1}, i = 1, 2, ..., 8, and "-" denotes N/A (α2 is the second-step α in MF-PMPC).

Method              α    α2   αm   αn
SVD                 χ1   -    -    -
MF-MPCuser          χ2   -    χ2   -
MF-MPCitem          χ3   -    -    χ3
MF-DMPC             χ4   -    χ5   χ6
MF-PMPCuser→item    χ2   χ7   χ2   χ7
MF-PMPCitem→user    χ3   χ8   χ8   χ3

For RBM [14], we use a hidden layer of 20 units, mini-batches of size 16, CD learning with 1-step Gibbs sampling, and a learning rate of 0.05. Taking efficiency and effectiveness into consideration, we adopt a momentum of 0.9 and search for the optimal weight decay value αw ∈ {0.0005, 0.001, 0.005, 0.01} according to the RMSE results. We also apply an early stopping strategy to avoid overfitting, i.e., model training stops when the RMSE no longer decreases within 20 epochs or the number of training iterations reaches a ceiling value of 200. For more details, the weights in the model are initialized with random values sampled from a normal distribution N(0, 0.01²) and the biases are initialized as 0.

Since we apply NCF [15] to rating prediction, we use a mean-square loss and a linear function for the prediction layer. Following [15], we adopt similar parameter settings: a Gaussian distribution (with a mean of 0 and a standard deviation of 0.01) is used for model parameter initialization, and mini-batch Adam for optimization. Specifically, we fix the MLP hidden layers as 64 → 32 → 16 → 8, which corresponds to the best performance reported in the paper, and search for the best value of the learning rate from γNCF ∈ {0.0001, 0.0005, 0.001, 0.005} and the batch size from sbatch ∈ {128, 256, 512, 1024}.

6.3. Results

6.3.1. Main results

The experimental results of all mentioned baselines are shown in Table 5. The adjustable parameters shown in the table are the searched best values for each method. The best result for each dataset and metric is marked with "*".

Table 5. Recommendation performance of singular value decomposition (SVD), matrix factorization with multiclass preference context (MF-MPC), our MF with dual MPC (MF-DMPC) and MF with pipelined MPC (MF-PMPC), and two deep learning methods, RBM and NCF, on four datasets. Parameters are (α, αm, αn) for the MF methods.

Data     Method             MAE                RMSE               Parameters
ML100K   SVD                0.7446 ± 0.0033    0.9445 ± 0.0035    (0.01, N/A, N/A)
         MF-MPCuser         0.7129 ± 0.0032    0.9098 ± 0.0028    (0.01, 0.01, N/A)
         MF-MPCitem         0.7025 ± 0.0025    0.8980 ± 0.0020    (0.01, N/A, 0.01)
         MF-DMPC            0.7008 ± 0.0027    0.8972 ± 0.0029    (0.01, 0.01, 0.01)
         MF-PMPCuser→item   0.7002 ± 0.0025    0.8959 ± 0.0032    (0.01, 0.01, N/A)
         MF-PMPCitem→user   0.6991* ± 0.0029   0.8941* ± 0.0025   (0.01, N/A, 0.01)
         RBM                0.7734 ± 0.0028    0.9693 ± 0.0022    αw = 0.01
         NCF                0.7189 ± 0.0042    0.9169 ± 0.0041    γNCF = 0.0001, sbatch = 128
ML1M     SVD                0.7017 ± 0.0016    0.8899 ± 0.0023    (0.01, N/A, N/A)
         MF-MPCuser         0.6605 ± 0.0013    0.8442 ± 0.0017    (0.01, 0.01, N/A)
         MF-MPCitem         0.6568 ± 0.0013    0.8411 ± 0.0016    (0.01, N/A, 0.01)
         MF-DMPC            0.6552 ± 0.0013    0.8409 ± 0.0017    (0.01, 0.01, 0.01)
         MF-PMPCuser→item   0.6572 ± 0.0014    0.8410 ± 0.0018    (0.01, 0.01, N/A)
         MF-PMPCitem→user   0.6549* ± 0.0012   0.8396* ± 0.0052   (0.01, N/A, 0.01)
         RBM                0.7071 ± 0.0009    0.8979 ± 0.0008    αw = 0.001
         NCF                0.6746 ± 0.0034    0.8634 ± 0.0034    γNCF = 0.0001, sbatch = 256
ML10M    SVD                0.6067 ± 0.0007    0.7913 ± 0.0009    (0.01, N/A, N/A)
         MF-MPCuser         0.5950 ± 0.0005    0.7782 ± 0.0006    (0.01, 0.01, N/A)
         MF-MPCitem         0.5983 ± 0.0005    0.7833 ± 0.0005    (0.01, N/A, 0.01)
         MF-DMPC            0.5941 ± 0.0005    0.7782 ± 0.0007    (0.01, 0.001, 0.1)
         MF-PMPCuser→item   0.5933* ± 0.0006   0.7766* ± 0.0006   (0.01, 0.01, N/A)
         MF-PMPCitem→user   0.5952 ± 0.0007    0.7796 ± 0.0006    (0.01, N/A, 0.01)
         RBM                0.6550 ± 0.0008    0.8500 ± 0.0005    αw = 0.0005
         NCF                0.6108 ± 0.0006    0.7994 ± 0.0009    γNCF = 0.0005, sbatch = 512
NF10M    SVD                0.6506 ± 0.0003    0.8402 ± 0.0004    (0.01, N/A, N/A)
         MF-MPCuser         0.6415 ± 0.0004    0.8314 ± 0.0005    (0.01, 0.01, N/A)
         MF-MPCitem         0.6435 ± 0.0005    0.8342 ± 0.0003    (0.01, N/A, 0.01)
         MF-DMPC            0.6390 ± 0.0005    0.8303 ± 0.0004    (0.01, 0.001, 0.01)
         MF-PMPCuser→item   0.6392 ± 0.0003    0.8298* ± 0.0005   (0.01, 0.01, N/A)
         MF-PMPCitem→user   0.6389* ± 0.0004   0.8300 ± 0.0005    (0.01, N/A, 0.01)
         RBM                0.7000 ± 0.0007    0.8948 ± 0.0005    αw = 0.0005
         NCF                0.6526 ± 0.0021    0.8477 ± 0.0004    γNCF = 0.001, sbatch = 128

From the results in Table 5, we can make the following observations:

• The recommendation performance improves with richer preference context, i.e., from void preference context in SVD to simple and complex preference context in MF-MPC, MF-DMPC and MF-PMPC.
• For modeling user-oriented preference context and item-oriented preference context, the performance depends on the ratio of the user group size to the item group size, i.e., n/m. Specifically, item-oriented MF-MPC performs better when n/m is of a suitable size (as on ML100K and ML1M), while as n/m becomes larger (as on ML10M and NF10M), user-oriented MPC behaves more reliably because of the increasing probability of finding similar users. This can be observed from the ratio n/m in Table 3 and the results in Table 5.
• For modeling dual preference context, MF-DMPC outperforms SVD and MF-MPC in all cases. However, the performance of MF-DMPC is in a way restrained by the better of user-oriented MF-MPC and item-oriented MF-MPC. Gratifyingly, this improvement reveals that MF-DMPC strikes a good balance between user-oriented MPC and item-oriented MPC, and provides evidence that combining user-oriented MPC and item-oriented MPC facilitates recommendation.
• For modeling sequential preference context, MF-PMPC is the best approach in all cases, which showcases the effectiveness of our residual-based sequential modeling approach. The relative performance between the two types of MF-PMPC is similar to that between MF-MPCuser and MF-MPCitem, which shows the importance of the first step in MF-PMPC. There is one exception: it is hard to tell whether MF-MPCuser or MF-MPCitem is better on NF10M, where n/m is small and the number of records is large. And MF-PMPC may fail to outperform MF-DMPC when the first-stage performance is much weaker (see the results of MF-PMPCuser→item on ML1M and NF10M, and the results of MF-PMPCitem→user on ML10M).
• Last but not least, we find that the two deep learning methods behave poorly in our cases. Specifically, RBM is utterly defeated by SVD on all four datasets. Although NCF is more competitive with SVD, it only beats SVD on the two smaller (or denser) datasets (i.e., ML100K and ML1M) and is still worse than MF-MPC on all datasets, which in a way proves MF-MPC to be a strong baseline.

In summary, the results in Table 5 show that our MF-PMPC, as well as MF-DMPC, not only inherits the high accuracy of model-based algorithms and the good explainability of neighborhood-based algorithms, but also makes better use of neighborhood information in collaborative filtering.

6.3.2. Effectiveness of dual MPC

To better reveal the effectiveness of dual preference context, we show the MAE and RMSE results of MF-MPC and MF-DMPC for different iteration numbers in Fig. 6. In this figure, we focus on the study of the two one-staged solutions and leave out the two-staged solution MF-PMPC, because its first-stage training is the same as MF-MPC. Notice that, with the advantage shown in the sufficiently trained first stage, the second stage of MF-PMPC mainly plays a fine-tuning role, and its effect is discussed in the next subsection. As mentioned earlier, the results of both methods have converged by the 50th iteration. The changing trends of MAE and RMSE are quite similar. Looking back at the earlier iterations, we can see that MF-DMPC shows its strongest superiority during the first 10 iterations and roughly keeps a stable advantage after the 30th iteration. These observations indicate that dual preference context helps speed up the improvement of accuracy during training. Moreover, it explores and utilizes more information, especially when data records are limited, so as to gain better results.

6.3.3. Effectiveness of pipelined MPC

It has been clarified that MF-PMPC is an algorithm which executes two different MF-MPC steps in series. In this section, we study the effectiveness of pipelined MPC by visually comparing the performance of MF-MPC and MF-PMPC using the histograms in Fig. 7. As found in previous research, this figure shows the varying degrees of influence of MF-MPCuser and MF-MPCitem on different datasets according to the n/m ratio. Furthermore, MF-PMPCuser→item and MF-PMPCitem→user have similar properties. Specifically, the positive adjustment from MF-MPC to the corresponding MF-PMPC (i.e., from MF-MPCuser to MF-PMPCuser→item, or from MF-MPCitem to MF-PMPCitem→user) is relatively larger if the first-step MF-MPC is at a disadvantage. However, the second step only plays a role in narrowing the contrast rather than notably reversing it. Overall, with a different MF-MPC step concatenated, MF-PMPC gains an obvious error decrease in all cases. Since all these results are obtained with their best searched parameter values, we can ignore the impact of the appended number of parameters and iterations, and infer that the improvements of the second step probably come from the only difference (i.e., the other MPC) that is newly included, which means that pipelined MPC captures what is complementary to a single MPC to bring these improvements.


Fig. 6. Recommendation performance of MF-MPC and our MF-DMPC on the first copies of four datasets with different iteration numbers.

Fig. 7. Recommendation performance of MF-MPC and our MF-PMPC on the first copy of three datasets.

7. Conclusions and future works

In this paper, we study a classical recommendation problem, i.e., rating prediction in a user-item matrix. In particular, we focus on exploiting the important but rarely studied internal context beneath users' ratings, and develop a generic factorization-based framework, i.e., matrix factorization with heterogeneous multiclass preference context (MF-HMPC). We then design two specific variants with different structures, including MF with dual MPC (MF-DMPC) for the concurrent structure and MF with pipelined MPC (MF-PMPC) for the sequential structure, which integrate both user-oriented and item-oriented neighborhood information into a factorization model for rating prediction in a seamless and complementary way. Empirical studies on four public datasets clearly showcase the advantages of our methods over state-of-the-art methods.

For future work, we are interested in studying the robustness of factorization-based algorithms with internal preference context. We also plan to study some advanced strategies, such as adversarial sampling [27], denoising [28] and multilayer perceptrons [15], in our proposed factorization framework.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Jing Lin: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft. Weike Pan: Conceptualization, Methodology, Writing - review & editing, Supervision, Funding acquisition. Lin Li: Software, Validation, Investigation. Zixiang Chen: Software, Validation, Investigation. Zhong Ming: Resources, Supervision, Funding acquisition.


Acknowledgement

We thank the handling Associate Editor and Reviewers for their efforts and constructive expert comments, and Mr. Yunfeng Huang for assistance in code review and helpful discussions. We thank the support of the National Natural Science Foundation of China under Grant Nos. 61872249, 61836005 and 61672358. Weike Pan and Zhong Ming are the corresponding authors for this work.

References

[1] J. Lin, W. Pan, Z. Ming, MF-DMPC: matrix factorization with dual multiclass preference context for rating prediction, in: Proceedings of the International Conference on Web Services, ICWS'18, 2018, pp. 337–349.
[2] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng. 17 (6) (2005) 734–749.
[3] J. Bobadilla, F. Ortega, A. Hernando, A. Gutiérrez, Recommender systems survey, Knowl. Based Syst. 46 (2013) 109–132.
[4] J. Lu, D. Wu, M. Mao, W. Wang, G. Zhang, Recommender system application developments, Decis. Support Syst. 74 (C) (2015) 12–32.
[5] P.G. Campos, F. Díez, I. Cantador, Time-aware recommender systems: a comprehensive survey and analysis of existing evaluation protocols, User Model. User-Adapt. Interact. 24 (1–2) (2014) 67–119.
[6] Y. Koren, Collaborative filtering with temporal dynamics, in: Proceedings of the Fifteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'09, 2009, pp. 447–456.
[7] X. Liu, Y. Liu, X. Li, Exploring the context of locations for personalized location recommendations, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI'16, 2016, pp. 1188–1194.
[8] B. Yang, Y. Lei, J. Liu, W. Li, Social collaborative filtering by trust, IEEE Trans. Pattern Anal. Mach. Intell. 39 (8) (2017) 1633–1647.
[9] Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in: Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'08, 2008, pp. 426–434.
[10] W. Pan, Z. Ming, Collaborative recommendation with multiclass preference context, IEEE Intell. Syst. 32 (2) (2017) 45–51.
[11] F. Ricci, L. Rokach, B. Shapira, Recommender Systems Handbook, 2nd ed., Springer, 2015.
[12] S. Rendle, Factorization machines with libFM, ACM Trans. Intell. Syst. Technol. 3 (3) (2012) 57:1–57:22.
[13] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, in: Proceedings of the Tenth International Conference on World Wide Web, WWW'01, 2001, pp. 285–295.
[14] R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann machines for collaborative filtering, in: Proceedings of the Twenty-Fourth International Conference on Machine Learning, ICML'07, 2007, pp. 791–798.
[15] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, T.-S. Chua, Neural collaborative filtering, in: Proceedings of the Twenty-Sixth International Conference on World Wide Web, WWW'17, 2017, pp. 173–182.
[16] F. Nie, H. Wang, H. Huang, C.H.Q. Ding, Joint Schatten p-norm and lp-norm robust matrix completion for missing value recovery, Knowl. Inf. Syst. 42 (3) (2015) 525–544.
[17] F. Nie, H. Wang, X. Cai, H. Huang, C.H.Q. Ding, Robust matrix completion via joint Schatten p-norm and lp-norm minimization, in: Proceedings of the Twelfth IEEE International Conference on Data Mining, ICDM'12, Brussels, Belgium, 2012, pp. 566–574.
[18] F. Nie, H. Huang, C. Ding, Low-rank matrix recovery via efficient Schatten p-norm minimization, in: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI'12, 2012, pp. 655–661.
[19] B. Hidasi, M. Quadrana, A. Karatzoglou, D. Tikk, Parallel recurrent neural network architectures for feature-rich session-based recommendations, in: Proceedings of the Tenth ACM Conference on Recommender Systems, RecSys'16, 2016, pp. 241–248.
[20] J. Manotumruksa, C. Macdonald, I. Ounis, A contextual attention recurrent architecture for context-aware venue recommendation, in: Proceedings of SIGIR'18, ACM, 2018, pp. 555–564.
[21] D. Kim, C. Park, J. Oh, S. Lee, H. Yu, Convolutional matrix factorization for document context-aware recommendation, in: Proceedings of the Tenth ACM Conference on Recommender Systems, RecSys'16, 2016, pp. 233–240.
[22] S. Seo, J. Huang, H. Yang, Y. Liu, Interpretable convolutional neural networks with dual local and global attention for review rating prediction, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys'17, 2017, pp. 297–305.
[23] G. Adomavicius, J. Zhang, Impact of data characteristics on recommender systems performance, ACM Trans. Manag. Inf. Syst. 3 (1) (2012) 3:1–3:17.
[24] M. Jahrer, A. Töscher, R. Legenstein, Combining predictions for accurate recommender systems, in: Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'10, 2010, pp. 693–702.
[25] L. Li, W. Pan, L. Chen, Z. Ming, RLT: residual-loop training in collaborative filtering for combining factorization and global-local neighborhood, in: Proceedings of the International Conference on Web Services, ICWS'18, 2018, pp. 326–336.
[26] F.M. Harper, J.A. Konstan, The MovieLens datasets: history and context, ACM Trans. Interact. Intell. Syst. 5 (4) (2015) 19:1–19:19.
[27] J. Wang, L. Yu, W. Zhang, Y. Gong, Y. Xu, B. Wang, P. Zhang, D. Zhang, IRGAN: a minimax game for unifying generative and discriminative information retrieval models, in: Proceedings of the Fortieth International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'17, 2017, pp. 515–524.
[28] P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: Proceedings of the Twenty-Fifth International Conference on Machine Learning, ICML'08, 2008, pp. 1096–1103.

Jing Lin received the B.S. degree in Electronic and Information Engineering from Shenzhen University, Shenzhen, China, in 2018. She is currently a master student in the National Engineering Laboratory for Big Data System Computing Technology and the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. Her research interests include collaborative recommendation, deep learning and transfer learning. Contact her at [email protected].

Weike Pan received the Ph.D. degree in Computer Science and Engineering from the Hong Kong University of Science and Technology, Kowloon, Hong Kong, China, in 2012. He is currently an associate professor with the National Engineering Laboratory for Big Data System Computing Technology and the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include transfer learning, recommender systems and machine learning. He has been active in professional services. He has served as an editorial board member of Neurocomputing, a co-guest editor of a special issue on big data of IEEE Intelligent Systems (2015–2016), an information officer of ACM Transactions on Intelligent Systems and Technology (2009–2015), and a journal reviewer and conference/workshop PC member for dozens of journals, conferences and workshops. Contact him at [email protected].

Lin Li received the B.S. degree in Computer Science and Technology from Shenzhen University, Shenzhen, China, in 2017. She is currently a master student in the National Engineering Laboratory for Big Data System Computing Technology and the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. Her research interests include collaborative recommendation, deep learning and transfer learning. Contact her at [email protected].

Zixiang Chen received the B.S. degree in Software Engineering from Ludong University, Yantai, China, in 2016. He is currently a master student in the National Engineering Laboratory for Big Data System Computing Technology and the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include collaborative recommendation and deep learning. Contact him at [email protected].

Zhong Ming received the Ph.D. degree in Computer Science and Technology from Sun Yat-Sen University, Guangzhou, China, in 2003. He is currently a professor with the National Engineering Laboratory for Big Data System Computing Technology and the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include software engineering and web intelligence. Contact him at [email protected].
