Matrix completion incorporating auxiliary information for recommender system design


Expert Systems with Applications 42 (2015) 5789–5799


Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Anupriya Gogna ⇑, Angshul Majumdar

Indraprastha Institute of Information Technology – Delhi, Okhla Phase – III, New Delhi, Delhi 110020, India

Article info

Article history: Available online 10 April 2015

Keywords: Collaborative filtering; Matrix completion; Metadata; Latent factor model

Abstract

The rating prediction accuracy of latent factor based techniques in collaborative filtering is limited by the sparsity of available ratings. Usually more than 90% of the ratings are missing and need to be predicted from the less than 10% that are available, so the problem is highly under-determined. In this work, we propose to improve the prediction accuracy by exploiting the user’s demographic information. We propose a new formulation to incorporate this information into the matrix completion framework of latent factor based collaborative filtering. The ensuing problem is efficiently solved using the split Bregman technique. Experimental evaluation indicates that the use of additional information indeed improves the accuracy of rating prediction. We also compare our proposed approach with an existing technique that incorporates auxiliary information using a graph-Laplacian framework and with one utilizing a neighborhood based approach; we find that our proposed method yields considerably superior results.

© 2015 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +91 9910622345.
E-mail addresses: [email protected] (A. Gogna), [email protected] (A. Majumdar).
http://dx.doi.org/10.1016/j.eswa.2015.04.012
0957-4174/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

With the ever-growing number of online portals and e-commerce sites, customers can access almost anything online, from travel plans and conference alerts to movies and books. However, this ease of access is plagued by information overload: a customer has to navigate through a large volume of information before finding the desired item. This is where a Recommender System (RS) (Hornick & Tamayo, 2012; Liu et al., 2014; Miller, Albert, Lam, Konstan, & Riedl, 2003) gains prominence, for customers as well as for online portals. Most websites and service portals, be it movie rental services, online shopping sites or travel package providers, offer some form of recommendation to their users. These recommendations quickly and accurately shortlist the items or information that users need to search through, thereby improving the customer’s experience. A relevant suggestion improves user satisfaction and hence the popularity of the portal. The design of an effective recommender system has therefore sparked both academic and industrial interest, as it is linked directly to revenue generation for e-commerce sites.

Most recommender system databases primarily consist of a partially filled rating matrix containing ratings given by users to certain items. Ratings can be either explicit, such as ratings given by users to items on a scale of 1–5, or implicit, inferred from a user’s behavior such as browsing history or buying pattern. Explicit ratings are more dependable but suffer from the sparsity problem: usually the task is to predict more than 90% of the ratings from less than 10% of available data. Implicit ratings are easier to come across but less dependable. Moreover, it is not possible to determine negative views from implicit ratings. In this work, we concentrate on explicit ratings.

The problem of rating prediction is highly under-determined. In such a scenario, additional available information, apart from the rating database, can augment the basic model, improve performance and help achieve better Quality of Prediction (QoP). Most RS databases contain some secondary information such as the user’s demographic profile, item categories or genres, and the user’s social network information. This information can be exploited to augment the rating information and improve the QoP.

Numerous approaches have been proposed for RS design (Adomavicius & Tuzhilin, 2005; Bobadilla, Ortega, Hernando, & Gutiérrez, 2013), the most popular being collaborative filtering (CF) (Su & Khoshgoftaar, 2009) because of its superior performance over other methods. Conventional CF techniques use the ratings given by the user to predict his/her choice and make relevant suggestions. These methods can be further divided into memory based and (latent factor) model based approaches (Adomavicius & Tuzhilin, 2005).

At their core, memory based methods (Bell & Koren, 2007) are linear interpolation techniques; they follow heuristic neighborhood based strategies relying on the assumption that if two users have rated certain items similarly, they will make similar choices on other items as well. A weighted average of ratings by similar users is used to make predictions for the target user. A similar strategy can be extended to an item-item similarity based approach (Sarwar, Karypis, Konstan, & Riedl, 2001) or a combination of user and item based approaches (Wang, De Vries, & Reinders, 2006). These techniques have the advantage of easier interpretability and analysis but are computationally slow and comparatively less accurate.

On the other hand, latent factor models (Hofmann, 2004) construct a lower dimensional model from the available dataset and use it for subsequent predictions. These methods are based on the premise that a user’s choice of an item is determined by a small number of factors or characteristics, the latent factors, thereby enabling a lower dimensional representation. It has been observed that latent factor models provide better prediction than neighborhood based techniques. They are also able to provide much wider coverage and improved accuracy than their memory based counterparts (Adomavicius & Tuzhilin, 2005).

Most existing works using either of the two collaborative filtering methods utilize only the available (sparse) rating matrix, which limits the prediction accuracy. In an attempt to improve coverage and accuracy, some works have been proposed that utilize secondary/auxiliary information, in addition to rating values, in either memory based (Bedi & Sharma, 2012; Lika, Kolomvatsos, & Hadjiefthymiades, 2014; Vozalis & Margaritis, 2007) or model based set ups (Gu, Zhou, & Ding, 2010; Koren, 2008; Ma, Zhou, Liu, Lyu, & King, 2011; Zhang, Chen, & Yin, 2013; Zhou, Shan, Banerjee, & Sapiro, 2012; Zhu, Xin, Wei, & Zhao, 2013).
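The neighborhood based prediction described above (a similarity-weighted average of the ratings of like-minded users) can be sketched as follows. This is only an illustration: the cosine similarity, the helper names `cosine_similarity` and `predict_user_based`, and the toy ratings are assumptions made here and are not taken from the cited works.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated (None = unrated)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return 0.0
    dot = sum(x * y for x, y in common)
    norm_a = sqrt(sum(x * x for x, _ in common))
    norm_b = sqrt(sum(y * y for _, y in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_user_based(ratings, target_user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, row in enumerate(ratings):
        if other == target_user or row[item] is None:
            continue  # skip the target user and users who did not rate the item
        w = cosine_similarity(ratings[target_user], row)
        num += w * row[item]
        den += abs(w)
    return num / den if den else None


# Toy explicit ratings on a 1-5 scale (hypothetical data); rows are users.
ratings = [
    [5, 4, None, 1],   # target user: item 2 is unrated
    [5, 5, 4, 1],      # rates like the target user
    [1, 2, 1, 5],      # rates unlike the target user
]
prediction = predict_user_based(ratings, target_user=0, item=2)
```

Because every prediction loops over all other users and recomputes similarities, the cost grows quickly with the size of the rating matrix, which is one way to see why such methods are described as computationally slow.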
Although strategies that incorporate side information in memory based models help improve accuracy and coverage to some extent, they still suffer from slow computational speeds.

The most commonly used formulation for the latent factor model is the matrix factorization (MF) framework (Koren, Bell, & Volinsky, 2009), which aims to recover the rating matrix as a product of two matrices: the item latent factor matrix and the user latent factor matrix. Matrix factorization, though fast, is bilinear and hence non-convex; therefore there are no guarantees of convergence to a global optimum. Recently, researchers have proposed an alternative formulation based on low rank matrix completion (Jaggi & Sulovský, 2010; Lee, Recht, Srebro, Tropp, & Salakhutdinov, 2010; Shamir & Shalev-Shwartz, 2011) for recovering the full rating matrix given the partial observations. It is formulated as the nuclear norm minimization problem (1).

$$\min_{X} \|Y - M \odot X\|_F^2 + \lambda \|X\|_* \tag{1}$$

where $X, Y \in \mathbb{R}^{m \times n}$ represent the completely filled (to be recovered) and the partially observed rating matrix respectively; $\lambda$ is the regularization parameter, $M$ is a binary mask applied element-wise ($\odot$) and $\|\cdot\|_*$ denotes the nuclear norm. In latent factor models, the number of independent variables (latent factors) is far smaller than the number of users or the number of items, so the rating matrix has a low rank structure. Thus the matrix completion approach (1) can be extended to RS design as well. The advantage of this approach is that it leads to a convex formulation, unlike the matrix factorization framework.

In this work, our objective is to utilize auxiliary information about the users to improve rating prediction. Acquiring the auxiliary information incurs no extra cost, because most e-commerce portals require users to sign up before using them, and the supplementary information regarding the user is collected as a part of this process. Some works exist (Gupta & Gadge, 2015; Safoury & Salah, 2013) which use similar information, but their focus is primarily on handling the cold start problem. We, on the other hand, solve the more general and challenging problem of improving overall rating prediction for all users. So far, no paper has incorporated auxiliary information into the matrix completion approach. Also, most existing works focus on exploiting the user’s social profile or network information to augment the base model; however, in several cases such information is not available.

In this work, we design a framework for using both (explicit) rating data and supplementary information, for improving accuracy, in a matrix completion framework. We focus on incorporating user demographic information (age, gender and occupation) to augment the latent factor model; this information is more readily available to the portal than the user’s network structure. Our design is based on the proposition that users belonging to the same age group or sharing the same profession tend to have similar preferences. A few works utilize similar arguments in either a neighborhood based set up (Vozalis & Margaritis, 2004) or using graphical modeling in a non-negative matrix factorization (NMF) framework (Zhu et al., 2013). We also design an efficient algorithm for our proposed framework.

The novelty of our approach lies in presenting a new formulation (based on matrix completion) for incorporating the user’s demographic information. We focus on reducing the variability of (predicted) rating values amongst users grouped together by some demographic trait(s) by including suitable penalty terms in the regularized matrix completion formulation. The main contributions of our work can be summarized as follows.

• Propose a new formulation by augmenting the matrix completion framework to utilize the user’s demographic information in an attempt to improve prediction accuracy.
• Propose a generalized framework for the same which can be customized to include multiple information sources (like age, gender, etc.) as per available data.
• Propose an algorithm using the split Bregman technique (Goldstein & Osher, 2009) for solving our formulation, which is efficient both in terms of accuracy and processing speed.
• Conduct extensive experimentation to study the impact of various kinds of demographic data on the QoP. We also propose a model including just the demographic information (without explicit ratings) which performs fairly well in terms of QoP and has a much lower complexity (run time) than existing works.

The remainder of the paper is organized as follows. In Section 2 we discuss the MF formulation for latent factor models and existing works incorporating auxiliary information in the same. We also review state of the art matrix completion algorithms. Section 3 describes our problem formulation and proposed algorithm. Section 4 contains the experimental setup, results and comparisons with existing CF techniques. The paper ends with conclusions and future directions in Section 5.

2. Related work

2.1. Latent factor model – matrix factorization framework

The latent factor model is the current de facto approach for recommender system design. As stated above, it is based on the belief that a handful of features decide the user’s preference for any particular item; these features are the latent factors. For example, in the case of a book recommendation system, a user’s liking for any book may be influenced by features such as author, genre, language etc. A book may also be described in terms of the degree to which it possesses these traits. Thus, both users and items can be modeled as vectors describing the degree of affinity to or possession of latent factors: $U_u$ (user $u$’s latent factor vector) and $I_i$ (item $i$’s latent factor vector) respectively. Then, the choice of a user for an item can be modeled as the interaction (inner product) between their individual latent factor vectors (2).

$$\mathrm{Interaction}(u, i) = \langle U_u, I_i \rangle \tag{2}$$
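As a small illustration of (2), the interaction is just a dot product of two short vectors. The three factors and their values below are made up for the example, and the function name `interaction` is a placeholder.

```python
def interaction(user_factors, item_factors):
    """Inner product <U_u, I_i> of a user's and an item's latent factor vectors."""
    if len(user_factors) != len(item_factors):
        raise ValueError("both vectors must have one entry per latent factor")
    return sum(a * b for a, b in zip(user_factors, item_factors))


# Hypothetical factors: (affinity to thrillers, affinity to romance, affinity to long books)
user_u = [0.9, 0.1, -0.4]
# Degree to which the book possesses each of the same three traits
item_i = [1.0, 0.0, 0.5]

score = interaction(user_u, item_i)  # 0.9*1.0 + 0.1*0.0 + (-0.4)*0.5
```

A large positive score indicates that the item possesses the traits the user cares about; a negative score indicates a mismatch.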

The most popular approach for estimating the interaction component is the matrix factorization framework. It aims to recover the interaction as a product of two matrices, $U$ (containing the latent factor vectors of users) and $V$ (containing the latent factor vectors of items), from the partially observed rating matrix as in (3).

$$\min_{U,V} \|Y - M \odot (UV)\|_F^2 + \lambda \left( \|U\|_F^2 + \|V\|_F^2 \right) \tag{3}$$

where $U \in \mathbb{R}^{m \times f}$ and $V \in \mathbb{R}^{f \times n}$, with $f$, $m$, $n$ being the number of latent factors, users and items respectively; $\lambda$ is the regularization parameter, $Y \in \mathbb{R}^{m \times n}$ is the available set of ratings and $M$ is the binary mask. The regularization terms prevent over-fitting of the model to the available ratings. Some of the algorithms proposed for MF are probabilistic matrix factorization (PMF) (Mnih & Salakhutdinov, 2007), non-negative matrix factorization (NMF) (Lee & Seung, 2001), and fast maximum margin matrix factorization (Rennie & Srebro, 2005).

2.2. Auxiliary information in latent factor model

The major bottleneck while estimating the interaction component from the available rating information is the extreme sparsity of the data. Several works have been proposed which augment the basic MF model (3) to include supplementary data in order to improve the accuracy of estimation. We review some of them in this section.

Initial work on incorporating side information into the latent factor framework was undertaken in Koren (2008), where a combination of neighborhood and latent factor models was proposed to achieve higher accuracy. The author combined the global outlook of latent factor models and the similarity measure of the neighborhood approach into a global optimization framework.

Ma et al. (2011) and Zhang et al. (2013) used information about the trust or social network of users to augment basic matrix factorization models. They modeled the matrix factorization problem, with suitable add-on regularization terms, such that the resultant latent factor vectors for users are similar to those in their social circle or trust network. The authors of Ma et al. (2011) modified the MF model (3) as follows

$$\min_{U,V} \|Y - M \odot (UV)\|_F^2 + \lambda \left( \|U\|_F^2 + \|V\|_F^2 \right) + \beta \sum_{i} \sum_{f \in F^{+}(i)} sim(i, f) \, \|U_i - U_f\|_F^2 \tag{4}$$

The additional regularization term ensures that a user’s latent factor vector is similar to those in its trust network $(F^{+})$. The similarity amongst users $sim(i, f)$ (based on rating pattern) is used to weigh the members of the trust network differently. A gradient based method was used to solve the formulation.

Gu et al. (2010) and Zhu et al. (2013) utilized user demography and item categorization to supplement the rating matrix data. They use graph regularization to include side information pertaining to user demography, item category and the user’s social trust network in a weighted NMF framework as in (5).

$$\min_{U,V} \|Y - M \odot (UV)\|_F^2 + \lambda \, \mathrm{trace}\left( U^T L_u U \right) + \mu \, \mathrm{trace}\left( V^T L_v V \right) \tag{5}$$

where $L_u$ and $L_v$ are the graph Laplacians. The user and item graphs are constructed with edges weighted by the similarity amongst users (based on demography or social network) and the similarity amongst item genres, respectively. The minimization problem is formulated as a low-rank semidefinite program.

Zhou et al. (2012) proposed to incorporate external information (the user’s social circle) into a PMF framework. Basic PMF assumes independent latent factor vectors for describing the rows (users) and columns (items). It does not take into account any correlation that may exist between various items or users. In Zhou et al. (2012) the prior distribution of the rows and columns of the latent factor matrix is described as a Gaussian process, whose correlation function captures the correlation between the vectors describing various users or items. They employed stochastic gradient descent for solving their formulation.

All the existing latent factor model based works (utilizing secondary data) are based on the matrix factorization framework. In this work, we incorporate supplementary information in a matrix completion framework, which has not been undertaken so far. Use of matrix completion gives a convex problem formulation with proven convergence guarantees. Also, existing MF based works rely mostly on first order gradient methods to design algorithms and thus are not very efficient. We aim to design an efficient algorithm for solving the augmented matrix completion formulation.

2.3. Matrix completion

Matrix factorization, although extensively used, is non-convex. Low-rank matrix completion (LRMC), a nascent area of research, has been used recently for matrix recovery in applications including recommender system design. There are only a handful of papers on algorithms for this. In this section, we review them briefly.

In Cai, Candès, and Shen (2010) the authors proposed a Singular Value Thresholding (SVT) algorithm which uses first order methods for solving (1). SVT is able to yield good results only for small matrices with a relatively high sampling ratio, thereby making it inefficient for collaborative filtering problems.
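The building block that gives SVT its name is the shrinkage of singular values. The sketch below shows only that operator, not the full iterative algorithm of Cai, Candès, and Shen (2010); the function name `singular_value_threshold`, the threshold value and the test matrix are assumptions chosen for illustration.

```python
import numpy as np


def singular_value_threshold(B, tau):
    """Soft-threshold the singular values of B by tau and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # singular values below tau are zeroed
    return (U * s_shrunk) @ Vt            # scale the columns of U, then recombine


rng = np.random.default_rng(0)
B = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # rank-2 matrix
B_noisy = B + 0.01 * rng.standard_normal(B.shape)               # small full-rank noise
X = singular_value_threshold(B_noisy, tau=0.5)
```

Since the noise contributes only very small singular values, thresholding removes them, and the result has rank at most two. This is how nuclear norm based methods promote low rank solutions.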
In Toh and Yun (2010) the authors proposed the Accelerated Proximal Gradient (APG) algorithm for low rank matrix recovery. It employs the Proximal Gradient (PG) (Nesterov & Nesterov, 2004) method (using the proximity operator) with an appropriate step size and an extra interpolation step to achieve faster convergence. The iterative algorithm can be summarized as follows

$$\begin{aligned}
W^{k} &= X^{k} + \frac{t^{k-1} - 1}{t^{k}} \left( X^{k} - X^{k-1} \right) \\
G^{k} &= W^{k} - \tau^{k} A^{T} \left( A W^{k} - b \right); \quad b = vec(Y), \; A: \text{block diag form of } M \\
X^{k+1} &= S_{\tau^{k}} \left( G^{k} \right) \\
t^{k+1} &= \frac{1 + \sqrt{1 + 4 (t^{k})^{2}}}{2}
\end{aligned} \tag{6}$$

where $\tau^{k}$ is the step size and $S_{\tau^{k}}$ denotes soft thresholding of the singular values.
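The iterates in (6) can be sketched in a few lines of NumPy. The sketch makes several simplifying assumptions that are not in Toh and Yun (2010): the data term is taken as $\frac{1}{2}\|M \odot (X - Y)\|_F^2$ (so that its gradient is $M \odot (X - Y)$ and a fixed step size of 1 is valid), the mask is applied element-wise instead of through an explicit block diagonal matrix, and a fixed iteration count replaces a convergence test. The names `apg_complete`, `svt` and `objective` are placeholders.

```python
import numpy as np


def svt(G, threshold):
    """Soft-threshold the singular values of G (proximal map of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * np.maximum(s - threshold, 0.0)) @ Vt


def objective(X, Y, M, lam):
    """0.5 * ||M o (X - Y)||_F^2 + lam * ||X||_* (assumed scaling, see lead-in)."""
    return 0.5 * np.sum((M * (X - Y)) ** 2) + lam * np.linalg.svd(X, compute_uv=False).sum()


def apg_complete(Y, M, lam, n_iter=200, tau=1.0):
    """Accelerated proximal gradient sketch for masked nuclear norm minimization."""
    X_prev = np.zeros_like(Y)
    X = np.zeros_like(Y)
    t_prev = t = 1.0
    for _ in range(n_iter):
        W = X + ((t_prev - 1.0) / t) * (X - X_prev)      # interpolation step
        G = W - tau * M * (W - Y)                         # gradient step on the data term
        X_prev, X = X, svt(G, tau * lam)                  # singular value shrinkage
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return X


# Hypothetical test problem: rank-2 matrix with roughly 60% of the entries observed.
rng = np.random.default_rng(0)
X_true = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))
M = (rng.random((20, 20)) < 0.6).astype(float)
Y = M * X_true
X_hat = apg_complete(Y, M, lam=0.1)
```

Each iteration costs one SVD, which is the dominant expense for large rating matrices.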

The authors of Mohan and Fazel (2012) proposed a method for low-rank matrix recovery using the Iterative Reweighted Least Squares (IRLS) technique. It aims at minimizing the weighted Frobenius norm $\| W_p^{1/2} X \|_F^2$ of the matrix $X$. A low rank matrix $X$ results if the weighting matrix $W_p$ is chosen appropriately. The IRLS algorithm for nuclear norm minimization consists of the following iterates

$$\begin{aligned}
X^{k} &= \arg\min_{X} \left\{ \mathrm{Tr}\left( W_p^{k-1} X^{T} X \right) : A(X) = b \right\} \\
W_p^{k} &= \left( (X^{k})^{T} X^{k} + \gamma^{k} I \right)^{p/2 - 1}
\end{aligned} \tag{7}$$

Most of the existing methods for LRMC employ first order (gradient descent type) algorithms, which require a large number of iterations for convergence on large datasets. We propose an efficient algorithm based on the split Bregman technique (Goldstein & Osher, 2009) to solve our formulation. Use of split Bregman helps achieve faster convergence and improved recovery accuracy.

3. Proposed formulation

In this section, we describe our proposed formulation for the design of a recommender system based on the latent factor model. The novelty of our work lies in formulating a new model for exploiting user metadata along with ratings for improving prediction accuracy. Such user metadata is more readily available to the portal than the users’ network structure; therefore the idea of improving rating prediction with such (easily available) metadata is feasible. Our assumption is that users in the same category (age, occupation etc.) are likely to have similar tastes; therefore the deviation of ratings within these categories will be small.

3.1. Problem formulation

Consider a partially filled rating matrix $R \in \mathbb{R}^{M \times N}$ having ratings given by each of the $M$ users to a subset of the $N$ items. The explicit ratings in the matrix can be modeled as in (8) as a combination of baseline estimates and the interaction component (Koren et al., 2009).

$$R_{u,i} = \underbrace{\mu_g + b_i + b_u}_{\text{Baseline}} + \underbrace{\langle U_u, I_i \rangle}_{\text{Interaction}} \tag{8}$$

where $R_{u,i}$ is the rating given by user $u$ on item $i$.

The baseline component consists of the global mean $(\mu_g)$, user bias $(b_u)$ and item bias $(b_i)$, which are independent of each other and capture the general rating patterns of individual items and users. Users who are more critical tend to rate all items lower than the population average and have a negative (user) bias. Similarly, very popular items like award winning books or movies are usually rated high by almost all users; hence such items have a positive item bias.

On the other hand, the interaction component captures the affinity of a user towards an item. According to the latent factor model, the interaction depends on a small number of (latent) variables or characteristics. As an example, consider a movie recommender system. Any movie can be characterized by traits such as genre, language, cast and other related features. Similarly, a user can be defined in terms of his/her affinity to each of these traits. These defining traits are the latent factors, and each user/item can be modeled as a vector containing information about its affinity to or degree of possession of each of these latent factors. The user’s choice of an item (interaction component) is just an inner product between the individual latent factor vectors as in (2).

In our work, we estimate the baseline offline using stochastic gradient descent (http://sifter.org/simon/journal/20061211.html) to solve (9) (Koren et al., 2009).

$$\min_{b_u, b_i} \left\| R_{u,i} - b_u - b_i - \mu_g \right\|_2^2 + \delta \left( \|b_u\|_2^2 + \|b_i\|_2^2 \right) \tag{9}$$

where $\delta$ is the regularization parameter; the regularization term is added to prevent over-fitting. Once the baseline is estimated, we extract the interaction component $Y_{u,i}$ from the available ratings as in (10) and use it for further rating prediction.

$$Y_{u,i} = R_{u,i} - b_u - b_i - \mu_g \tag{10}$$
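A minimal sketch of the baseline step, assuming plain stochastic gradient descent on (9) over the observed ratings followed by the subtraction in (10). The learning rate, the number of epochs, the value of $\delta$, the toy ratings and the function names `fit_baseline` and `interaction_residuals` are illustrative choices, not values reported in the paper.

```python
import random


def fit_baseline(ratings, n_users, n_items, delta=0.02, lr=0.01, epochs=200, seed=0):
    """SGD on (9). `ratings` is a list of (user, item, rating) triples."""
    mu = sum(r for _, _, r in ratings) / len(ratings)   # global mean
    b_user = [0.0] * n_users
    b_item = [0.0] * n_items
    order = list(ratings)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - (mu + b_user[u] + b_item[i])
            # gradient step; the constant factor 2 is absorbed into the learning rate
            b_user[u] += lr * (err - delta * b_user[u])
            b_item[i] += lr * (err - delta * b_item[i])
    return mu, b_user, b_item


def interaction_residuals(ratings, mu, b_user, b_item):
    """Eq. (10): what remains of each observed rating after removing the baseline."""
    return {(u, i): r - b_user[u] - b_item[i] - mu for u, i, r in ratings}


# Hypothetical observed ratings: (user, item, rating on a 1-5 scale)
observed = [(0, 0, 5), (0, 1, 4), (1, 0, 3), (1, 2, 2), (2, 1, 4), (2, 2, 1)]
mu, b_user, b_item = fit_baseline(observed, n_users=3, n_items=3)
Y_obs = interaction_residuals(observed, mu, b_user, b_item)
```

The dictionary `Y_obs` holds the sparse interaction values that the matrix completion step then has to extend to all user-item pairs.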

Baseline estimation is a relatively easy and straightforward task. The real challenge is in computing the interaction component from the very sparse (rating) dataset $Y$ thus obtained. The amount of available information is very limited (usually less than 10% of the ratings are available); this is the major bottleneck for increasing prediction accuracy.

As discussed in Section 2, matrix factorization has traditionally been used for recovering the filled-in interaction matrix. However, it is a bilinear and hence non-convex formulation. In our work we use the alternative matrix completion formulation, which is convex with proven convergence guarantees. As discussed above, the interaction component depends on a handful of latent factors, far fewer than the dimensions of the rating matrix. Thus, the limited number of variables defining the overall interaction matrix yields a low rank structure. Recovering it can be posed as a nuclear norm minimization problem.

$$\min_{Z} \|Y - A(Z)\|_F^2 + \lambda \|Z\|_* \tag{11}$$

where $A$ is the binary mask and $Z$ is the completely filled matrix of the interaction component between the users’ and items’ latent factor vectors. The nuclear norm is the convex envelope of the rank function over the set of matrices with spectral norm at most one, so minimizing it is a convex relaxation of the NP-hard rank minimization problem. Theoretical studies guarantee the recovery of low-rank matrices via this formulation (Recht, Fazel, & Parrilo, 2010). It can be solved efficiently using semidefinite programming.

In order to recover the matrix with high probability, nuclear norm minimization requires that the number of available ratings be larger than $(6n - 5r)r$ for an $n \times n$ matrix of rank $r$ (Candès & Recht, 2013). This is by far the tightest bound on the sampling requirement for matrix completion. For our problem $n$ is between 1000 and 3000 and the approximate number of factors $(r)$ is around 40. Therefore, according to theory, the proportion of available ratings should be at least around 23% (for $n = 1000$). This is never the case; as mentioned before, recommender systems typically have less than 10% of the ratings available. Therefore, following the theory of low-rank matrix completion, nuclear norm minimization will never be able to predict all the missing ratings accurately.

Given this scenario, there is always scope for improvement by exploiting auxiliary information. In a sense, the auxiliary information makes the problem less under-determined. Obtaining the auxiliary information from the users is almost automatic in e-commerce portals. In most cases, users need to sign up before they can use the system, and the demographic information of the user is collected during this process. Therefore acquiring this auxiliary information bears no extra cost. Our model is based on the premise that users who belong to the same age bracket or are in similar professions exhibit similar affinities.
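The 23% figure quoted above for $n = 1000$ follows directly from the bound $(6n - 5r)r$. The short check below reproduces the arithmetic; the function name `min_sampling_fraction` is a placeholder.

```python
def min_sampling_fraction(n, r):
    """Fraction of the n*n entries required by the (6n - 5r) r sampling bound."""
    required_entries = (6 * n - 5 * r) * r
    return required_entries / (n * n)


# n = 1000 users/items, r = 40 latent factors, as in the text
required = (6 * 1000 - 5 * 40) * 40          # number of ratings required
fraction = min_sampling_fraction(1000, 40)   # required / 1,000,000
```

Here `required` is 232,000 ratings out of one million entries, i.e. a fraction of 0.232, which is well above the sub-10% density typical of rating datasets.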
In this work we use additional information – age, gender and occupation – about users (available in the dataset) to establish similarity between various users. Following our basic assumption, we postulate that the ratings given by such similar users would also be similar. We incorporate this idea into the matrix completion framework by modifying (11) to include penalty terms that reduce variability in ratings within a group.

$$\min_{Z} \|Y - A(Z)\|_F^2 + \lambda \|Z\|_* + \sum_{G \in Groups} \mu_G \left( \sum_{g \in G} var_g(R) \right) \tag{12}$$

where $Groups$ is the set of grouping schemes (age bracket, occupation, gender, etc.); $var_g(R)$ is the variance of the ratings given by users in a group $g$ of similar users (say, all females); and $\mu_G$ is the regularization parameter for each kind of grouping (age/occupation/gender) $G \in Groups$.

Inclusion of the second regularization term in (12) promotes similarity in ratings within the group by penalizing the group’s variance. The variance within a group, $var_g(R)$, can be defined as $\sum_{u \in g} \| Z_{u,:} - m_{g,:} \|_2^2$, where $Z_{u,:}$ is the vector of ratings given by user $u$ to all items, $m_{g,:} = (1/|g|) \sum_{u \in g} Z_{u,:}$ is the vector of mean rating values for individual items in group $g$, and $|g|$ is the cardinality of the group.

The variance term $var_g(R)$ can be viewed in matrix form as in (13), wherein the similarity matrix $S$ is defined such that $S_{i,j} = 1$ if users $i, j$ belong to the same group and 0 otherwise; $Z_{k,l}$ is the rating (interaction component) given by user $k$ on item $l$.

$$\left\| \begin{bmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ Z_{m1} & Z_{m2} & \cdots & Z_{mn} \end{bmatrix} - \frac{1}{|g|} \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1m} \\ S_{21} & S_{22} & \cdots & S_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ S_{m1} & S_{m2} & \cdots & S_{mm} \end{bmatrix} \begin{bmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ Z_{m1} & Z_{m2} & \cdots & Z_{mn} \end{bmatrix} \right\|_F^2 \tag{13}$$
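The equivalence between the per-group variance and the matrix form (13) can be checked numerically. In the sketch below the $1/|g|$ factor is folded into each row of the similarity matrix so that several groups of different sizes can be handled at once; this normalization, the gender labels and the function names are assumptions made for the illustration.

```python
import numpy as np


def normalized_similarity(labels):
    """S_ij = 1/|g| if users i and j share a group label, 0 otherwise."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    return same / same.sum(axis=1, keepdims=True)   # each row sum equals |g| before scaling


def group_variance(Z, labels):
    """Sum over groups of sum_{u in g} ||Z_u - m_g||_2^2, computed directly."""
    labels = np.asarray(labels)
    total = 0.0
    for g in np.unique(labels):
        rows = Z[labels == g]
        total += np.sum((rows - rows.mean(axis=0)) ** 2)
    return total


rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 4))                 # 6 users, 4 items (toy interaction matrix)
labels = ["F", "M", "F", "F", "M", "M"]         # hypothetical gender labels
S = normalized_similarity(labels)
W = np.eye(len(labels)) - S                     # (I - S) removes each group's mean row
matrix_form = np.linalg.norm(W @ Z, "fro") ** 2
direct = group_variance(Z, labels)
```

Both quantities agree up to floating point error, which is what allows the variance penalty to be written as a Frobenius norm of a linear map of $Z$.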

Using (13) we can rewrite (12) as follows

$$\min_{Z} \|Y - A(Z)\|_F^2 + \lambda \|Z\|_* + \sum_{G \in Groups} \mu_G \|Z - S_G Z\|_F^2 \tag{14}$$

where $S_G$ is the corresponding similarity matrix for group structure $G$, constructed as discussed above (with the $1/|g|$ normalization of (13) absorbed into its entries). With $I_m - S_G = W_G$, where $I_m$ is the $m \times m$ identity matrix, we can rewrite (14) as

$$\min_{Z} \|Y - A(Z)\|_F^2 + \lambda \|Z\|_* + \sum_{G \in Groups} \mu_G \|W_G Z\|_F^2 \tag{15}$$

Eq. (15) illustrates our generalized formulation for supplementing the matrix completion model with auxiliary information pertaining to users. It captures our reasoning that users grouped by a certain similarity measure have a similar rating pattern, by using auxiliary information to augment the standard matrix completion framework. Use of additional information helps improve the robustness and accuracy of our recommender system. Moreover, the formulation is still convex; we have just added variances as penalty terms, variance being a convex function.

In this work, we construct groups according to the age-gender combination of users, so that each group contains similar users having the same age group and gender. Thus, our formulation can be written as a special case of (12) as follows

$$\min_{Z} \|Y - A(Z)\|_F^2 + \lambda \|Z\|_* + \mu_1 \sum_{u \in g_1} \left\| Z_{u,:} - \frac{1}{|g_1|} \sum_{v \in g_1} Z_{v,:} \right\|_2^2 \tag{16}$$

where $g_1$ denotes the groups formed as per the age-gender combination. We also propose to incorporate information about the occupation of users as an additional regularization term added to (16), as shown in (17).

$$\min_{Z} \|Y - A(Z)\|_F^2 + \lambda \|Z\|_* + \mu_1 \sum_{u \in g_1} \left\| Z_{u,:} - \frac{1}{|g_1|} \sum_{v \in g_1} Z_{v,:} \right\|_2^2 + \mu_2 \sum_{u \in g_2} \left\| Z_{u,:} - \frac{1}{|g_2|} \sum_{v \in g_2} Z_{v,:} \right\|_2^2 \tag{17}$$

where $g_2$ denotes the groups formed as per the occupational profile of users. Both (16) and (17) can be implemented as special cases of (15). We design an algorithm for our generalized formulation (15) using the split Bregman technique (Goldstein & Osher, 2009). Split Bregman techniques (and Augmented Lagrangian methods in general) are suitable for solving convex problems such as ours.

3.2. Brief outline of split Bregman technique

In this section, we briefly review the split Bregman technique. The Bregman distance forms the basis for the formulation of these algorithms. For a convex function $E: X \to \mathbb{R}$, where $u, v \in X$ and $p$ belongs to the set of subgradients of the function, the Bregman distance $D_E^p$ is given by

$$D_E^p(u, v) = E(u) - E(v) - \langle p, u - v \rangle \tag{18}$$

Consider the objective function given in (19), where $\Phi(u)$ and $H(u)$ are convex and $H$ is differentiable.

$$\min_{u} |\Phi(u)|_1 + H(u) \tag{19}$$

The split Bregman technique focuses on decomposing a complex optimization problem (with multiple norm terms) into sub problems which can be solved more easily than the original composite objective function. Rewriting (19) by letting $d = \Phi(u)$ we get

$$\min_{u,d} |d|_1 + H(u) \quad \text{subject to } d = \Phi(u) \tag{20}$$

The unconstrained equivalent of (20) is obtained by adding a penalization function to the problem as in (21).

$$\min_{u,d} |d|_1 + H(u) + \frac{\lambda}{2} \|d - \Phi(u)\|_2^2 \tag{21}$$

Considering $E(u, d) = |d|_1 + H(u)$, using (18) we can write

$$\begin{aligned}
(u^{k+1}, d^{k+1}) &= \min_{u,d} D_E^p(u, u^k, d, d^k) + \frac{\lambda}{2} \|d - \Phi(u)\|_2^2 \\
&= \min_{u,d} E(u, d) - \langle p_u^k, u - u^k \rangle - \langle p_d^k, d - d^k \rangle + \frac{\lambda}{2} \|d - \Phi(u)\|_2^2 \\
p_u^{k+1} &= p_u^k - \lambda (\nabla \Phi)^T \left( \Phi u^{k+1} - d^{k+1} \right) \\
p_d^{k+1} &= p_d^k - \lambda \left( d^{k+1} - \Phi u^{k+1} \right)
\end{aligned} \tag{22}$$

The first update step can be solved using ADMM (alternating direction method of multipliers) (Boyd, Parikh, Chu, Peleato, & Eckstein, 2011) by alternately optimizing over each variable. The simplified form of (22) is given below.

$$\begin{aligned}
u^{k+1} &= \min_{u} H(u) + \frac{\lambda}{2} \|d^k - \Phi(u) - b^k\|_2^2 \\
d^{k+1} &= \min_{d} |d|_1 + \frac{\lambda}{2} \|d - \Phi(u^{k+1}) - b^k\|_2^2 \\
b^{k+1} &= b^k + \left( \Phi u^{k+1} - d^{k+1} \right)
\end{aligned} \tag{23}$$

Since $H(u)$ is smooth and differentiable everywhere, the update for $u$ can be solved analytically. The solution for $d$ is simply the solution of a synthesis prior formulation and is obtained directly by the shrinkage (soft thresholding) operator. The last step is the update of the Bregman variable. Use of the split Bregman technique aids faster convergence and lower recovery errors: no cooling of the regularization parameter is required, and thus optimal values of the regularization parameters can be set for each of the sub problems.

3.3. Algorithm design

In this section, we discuss the design of our algorithm based on the split Bregman technique. Reconsidering our formulation (15), we introduce proxy variables $P_G$ to enable split Bregman type splitting of the norm terms (24).

$$\min_{Z, P_G} \|Y - A(Z)\|_F^2 + \lambda \|Z\|_* + \sum_{G \in Groups} \mu_G \|W_G P_G\|_F^2 + \sum_{G \in Groups} \eta_G \|P_G - Z - B_G\|_F^2 \tag{24}$$

Fig. 1. Algorithm for matrix completion with auxiliary information (MCAI).

where $B_G$ are the Bregman relaxation variables used to enforce equality at convergence between the original and proxy variables. Eq. (24) can be split into two (simpler) sub problems by variable splitting as follows.

Sub problem 1

$$\min_{Z} \|Y - A(Z)\|_F^2 + \lambda \|Z\|_* + \sum_{G \in Groups} \eta_G \|P_G - Z - B_G\|_F^2 \tag{25a}$$

Sub problem 2

$$\min_{P_G} \sum_{G \in Groups} \mu_G \|W_G P_G\|_F^2 + \sum_{G \in Groups} \eta_G \|P_G - Z - B_G\|_F^2 \tag{25b}$$

Sub problem 2 involves minimization over each of the proxy variables alternately. The first sub problem can be recast as follows ($G_p$ being the total number of grouping schemes used).

$$
\min_{Z}\; \left\|
\begin{pmatrix} Y \\ \sqrt{\eta_1}\,(P_1 - B_1) \\ \sqrt{\eta_2}\,(P_2 - B_2) \\ \vdots \\ \sqrt{\eta_{G_p}}\,(P_{G_p} - B_{G_p}) \end{pmatrix}
-
\begin{pmatrix} A \\ \sqrt{\eta_1}\, I \\ \sqrt{\eta_2}\, I \\ \vdots \\ \sqrt{\eta_{G_p}}\, I \end{pmatrix} Z
\right\|_F^2 + \lambda \|Z\|_*
\tag{26}
$$

Eq. (26) can be solved by soft thresholding of singular values as follows

$$
B = Z + \frac{1}{a}
\begin{pmatrix} A \\ \sqrt{\eta_1}\, I \\ \sqrt{\eta_2}\, I \\ \vdots \\ \sqrt{\eta_{G_p}}\, I \end{pmatrix}^{T}
\left(
\begin{pmatrix} Y \\ \sqrt{\eta_1}\,(P_1 - B_1) \\ \sqrt{\eta_2}\,(P_2 - B_2) \\ \vdots \\ \sqrt{\eta_{G_p}}\,(P_{G_p} - B_{G_p}) \end{pmatrix}
-
\begin{pmatrix} A \\ \sqrt{\eta_1}\, I \\ \sqrt{\eta_2}\, I \\ \vdots \\ \sqrt{\eta_{G_p}}\, I \end{pmatrix} Z
\right),
\qquad
a \ge \max\left( \operatorname{eig}\left(
\begin{pmatrix} A \\ \sqrt{\eta_1}\, I \\ \sqrt{\eta_2}\, I \\ \vdots \\ \sqrt{\eta_{G_p}}\, I \end{pmatrix}^{T}
\begin{pmatrix} A \\ \sqrt{\eta_1}\, I \\ \sqrt{\eta_2}\, I \\ \vdots \\ \sqrt{\eta_{G_p}}\, I \end{pmatrix}
\right) \right)
\tag{27}
$$

$$
Z = \mathrm{Soft}\left( \text{Singular value}(B),\; \frac{\lambda}{2a} \right)
\tag{28}
$$

where $\mathrm{Soft}(t, u) = \mathrm{sign}(t)\,\max(0, |t| - u)$.

Sub problem 2 is a simple least squares formulation (in each proxy variable) which can be solved efficiently. We used lsqr (Paige & Saunders, 1982) to solve it. Each iterative step for solving the above sub problems is followed by an update of the Bregman variables as follows

$$
B_G = B_G + Z - P_G
$$

Use of Bregman variables ensures that the equality constraint is relaxed during the initial iterations; updating the Bregman variables over subsequent iterations helps achieve convergence. Iterations continue until convergence, i.e. until either the decrease in the objective function falls below a threshold or the maximum number of iterations is reached. The complete algorithm (matrix completion with auxiliary information, MCAI) is summarized in Fig. 1.
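The loop described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation of Fig. 1: the sampling operator $A$ is taken to be an elementwise binary mask, each $W_G$ is assumed to subtract the group mean from every user row, a single $\lambda$, $\mu$ and $\eta$ is shared by all groupings, a direct solver stands in for lsqr, and one thresholding step is taken per outer iteration. All function names and default values are hypothetical.

```python
import numpy as np

def soft(t, u):
    """Soft(t, u) = sign(t) * max(0, |t| - u)."""
    return np.sign(t) * np.maximum(0.0, np.abs(t) - u)

def centering(groups):
    """Assumed form of W_G: (W @ Z)[u] = Z[u] - mean of Z over u's group."""
    groups = np.asarray(groups)
    W = np.eye(groups.size)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        W[np.ix_(idx, idx)] -= 1.0 / idx.size
    return W

def mcai_sketch(Y, mask, groupings, lam=0.1, mu=0.1, eta=0.1, n_iter=200):
    """Illustrative split Bregman loop for formulation (24)."""
    n_users = Y.shape[0]
    Ws = [centering(g) for g in groupings]
    P = [np.zeros_like(Y, dtype=float) for _ in Ws]
    B = [np.zeros_like(Y, dtype=float) for _ in Ws]
    Z = np.zeros_like(Y, dtype=float)
    # With a binary mask, eig(A^T A) <= 1, so a = 1 + sum(eta) satisfies (27)
    a = 1.0 + eta * len(Ws)
    lhs = [mu * W.T @ W + eta * np.eye(n_users) for W in Ws]
    for _ in range(n_iter):
        # Sub problem 1: gradient-type step (27), then thresholding (28)
        step = mask * (Y - Z) + sum(eta * (Pg - Bg - Z) for Pg, Bg in zip(P, B))
        U, s, Vt = np.linalg.svd(Z + step / a, full_matrices=False)
        Z = (U * soft(s, lam / (2.0 * a))) @ Vt
        for i in range(len(Ws)):
            # Sub problem 2: least squares in each proxy variable
            P[i] = np.linalg.solve(lhs[i], eta * (Z + B[i]))
            # Bregman variable update
            B[i] = B[i] + Z - P[i]
    return Z
```

Here `groupings` is a list of label vectors, one per grouping scheme (for example one for age-gender and one for occupation), each assigning a group label to every user.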


4. Experiment and results

We conducted experiments on two MovieLens datasets, 100K and 1M (http://grouplens.org/datasets/movielens/). These datasets from GroupLens are the most widely used publicly available datasets (which also include users' demographic information) for evaluating the performance of recommender systems. The larger (10 Million) dataset from the MovieLens repository contains only the explicit rating values and no demographic information; hence, our evaluation is restricted to the 100K and 1M datasets. We compared our results against existing formulations for matrix completion and other frameworks utilizing auxiliary information.

4.1. Description of dataset

The 100K dataset has 100,000 ratings provided by 943 users for 1682 movies. The 1M dataset consists of ratings on 3952 movies by 6040 users. Thus, both datasets are extremely sparse, with only about 6% (100K) and 4% (1M) of the possible ratings available. To augment this information, in order to improve prediction accuracy, we use the demographic profile of the users, available as part of the dataset. The information pertaining to age, gender and occupation of users is taken into consideration. The users are grouped as per their age-gender information and professions. For the 100K dataset, the groups formed are male and female in age brackets of 1–10, 11–20, 21–30, 31–40, 41–50, 51–60, 61–70, 71–80. Thus we have a total of 18 groups of similar users as per their age-gender combination. We also grouped users on the basis of their occupation, wherein there are 21 different occupations and an equal number of groups. In the case of the 1M dataset, there are 7 different age groups: 1–17, 18–24, 25–34, 35–44, 45–49, 50–55 and 56+. So, a total of 14 groups are made with age-gender (M/F) information. Similar to the 100K dataset, users are divided into 21 groups based on their occupation.

4.2. Experimental setup and evaluation criteria

Both datasets are divided into test and training data for cross validation. We conducted fivefold cross validation, taking 80% of the available ratings as training data and the rest as test data. The simulations are carried out on a system with an i7-3770S CPU @3.10 GHz and 8 GB RAM. 100 independent test runs for each test-train pair were carried out and the results reported are the average value of all the runs. For our algorithm (MCAI), the value of the regularization parameter δ in (9) is taken to be 1e−3. The values of the regularization parameters in our formulation (16), MCAI (Age) or MCAI (Occ), for both the 100K and 1M datasets are kept at λ = 1e+1, η1 = 1e−1, μ1 = 1e−1. For our formulation (17), MCAI (Age–Occ), for both the 100K and 1M datasets the values of the regularization parameters are kept at λ = 1e+1, η1 = 1e−1, μ1 = 1e−1, η2 = 1e−2, μ2 = 1e−4. The values are obtained using the L-curve technique (Lawson & Hanson, 1974). We compared the Quality of Prediction of our proposed formulations against existing works in terms of Mean Absolute Error (MAE) (29) and Root Mean Squared Error (RMSE) (30).

$$
\mathrm{MAE} = \frac{\sum_{m,n} \left| R_{m,n} - \hat{R}_{m,n} \right|}{|R|}
\tag{29}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{\sum_{m,n} \left( R_{m,n} - \hat{R}_{m,n} \right)^2}{|R|}}
\tag{30}
$$

where $R_{m,n}$ and $\hat{R}_{m,n}$ are the actual and predicted ratings and $|R|$ is the cardinality of the rating matrix $R$.
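The two error measures in (29) and (30) can be computed as in the short sketch below. It assumes that the sums run only over held-out test ratings, indicated by a boolean array `test_mask`, so that $|R|$ is the number of test ratings; the function and argument names are hypothetical.

```python
import numpy as np

def mae_rmse(R, R_hat, test_mask):
    """MAE (29) and RMSE (30) over the entries selected by test_mask."""
    R = np.asarray(R, dtype=float)
    R_hat = np.asarray(R_hat, dtype=float)
    diff = (R - R_hat)[np.asarray(test_mask, dtype=bool)]
    mae = np.abs(diff).mean()
    rmse = np.sqrt((diff ** 2).mean())
    return mae, rmse
```

Because RMSE squares the residuals before averaging, it penalizes large individual errors more heavily than MAE does.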

We also evaluate our algorithm based on the accuracy of Top-N recommendation, measured in terms of precision (31) and recall (32) (Shani & Gunawardana, 2011) for varying lengths of the recommendation list. High precision and recall values are marks of a good recommender system.

$$
\mathrm{Precision} = \frac{\# t_p}{\# t_p + \# f_p}
\tag{31}
$$

$$
\mathrm{Recall} = \frac{\# t_p}{\# t_p + \# f_n}
\tag{32}
$$

where $t_p$ denotes true positive (item relevant and recommended), $f_p$ denotes false positive (item irrelevant and recommended) and $f_n$ denotes false negative (item relevant and not recommended). To differentiate the relevant and irrelevant items we binarize the available ratings, marking the items rated 4 or 5 as relevant and those rated lower (1–3) as irrelevant to the user.

4.3. Impact of demographic information

In this sub section, we study the impact of adding supplementary information to the matrix completion framework. Here we do not use the low-rank structure of the matrix, i.e. the latent factor model is ignored; we only want to study how recommendations can be based only on the user's demographic information. We compare our work with the standard low-rank matrix completion (LRMC) framework for collaborative filtering. The APG (accelerated proximal gradient) (Toh & Yun, 2010) algorithm is used to solve the LRMC problem. We also compare our scheme against a matrix factorization algorithm, the block co-ordinate descent method for NMF (BCD-NMF) (Xu & Yin, 2013). In these experiments we study the impact of the various kinds of supplementary information available about users in the database and their influence on Quality of Prediction. Tables 1 and 2 give the results (error measures) for the 100K and 1M datasets for various formulations. They contain results for the case where only demographic (auxiliary) information is considered, without any low rank constraint on the rating matrix: AI (Age) and AI (Occ). The former uses just the age-gender categorization and the latter uses grouping based on the occupational profile of users. We solve a problem of the form given in (33) to study the same.

$$
\min_{Z}\; \|Y - A(Z)\|_F^2 + \mu \sum_{u \in g} \left\| Z_{u,:} - \frac{1}{|g|} \sum_{u \in g} Z_{u,:} \right\|_2^2
\tag{33}
$$

Eq. (33) is a simple least squares formulation and can be rewritten as in (34), where $W_G$ is defined as in (15).

Table 1
MAE and RMSE values for 100K dataset.

| Algorithm | MAE | RMSE |
|---|---|---|
| APG | 0.8847 | 3.7076 |
| BCD-NMF | 0.7582 | 0.9816 |
| AI (Age) | 0.7657 | 0.9814 |
| AI (Occ) | 0.7797 | 1.0014 |

Table 2
MAE and RMSE values for 1M dataset.

| Algorithm | MAE | RMSE |
|---|---|---|
| APG | 0.9782 | 3.8109 |
| BCD-NMF | 0.6863 | 0.8790 |
| AI (Age) | 0.7305 | 0.9286 |
| AI (Occ) | 0.7211 | 0.9157 |


Table 3
Run times comparison.

| Algorithm | 100K dataset (s) | 1M dataset (s) |
|---|---|---|
| APG | 15.01 | 228.5 |
| BCD-NMF | 8.83 | 190.03 |
| AI (Age) | 6.64 | 161.2 |
| AI (Occ) | 4.45 | 162.4 |

$$
\left( A^T A + \sum_{G \in Groups} W_G^T W_G \right) Z = A^T Y
\tag{34}
$$

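The above equation, being a least squares formulation, has a simple closed form solution given below

$$
Z = \left[ \left( A^T A + \sum_{G \in Groups} W_G^T W_G \right)^{T} \left( A^T A + \sum_{G \in Groups} W_G^T W_G \right) \right]^{-1} \left( A^T A + \sum_{G \in Groups} W_G^T W_G \right)^{T} A^T Y
\tag{35}
$$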
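A small numerical sketch of this closed-form solve is given below. It rests on several assumptions that are not stated in this section: $A$ is treated as an elementwise binary mask, $W_G$ is taken to subtract the group mean from each user row, the weight $\mu$ is kept explicit rather than absorbed into $W_G$, and the system is solved item by item with a least squares routine, which returns the same solution as (35) when the system matrix is invertible. The function names are hypothetical.

```python
import numpy as np

def group_centering(groups):
    """Assumed form of W_G: subtracts the group mean from every user row."""
    groups = np.asarray(groups)
    W = np.eye(groups.size)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        W[np.ix_(idx, idx)] -= 1.0 / idx.size
    return W

def ai_closed_form(Y, mask, groups, mu=0.1):
    """Solve (A^T A + mu * W^T W) z = A^T y separately for each item column."""
    Y = np.asarray(Y, dtype=float)
    mask = np.asarray(mask, dtype=float)
    W = group_centering(groups)
    WtW = W.T @ W
    Z = np.zeros_like(Y)
    for j in range(Y.shape[1]):
        D = np.diag(mask[:, j])               # A^T A restricted to item j
        M = D + mu * WtW
        rhs = D @ Y[:, j]                     # A^T y for item j
        # Least squares solve, as in (35); also handles a singular M
        Z[:, j] = np.linalg.lstsq(M, rhs, rcond=None)[0]
    return Z
```

In a two-user group where only one user has rated an item, the variance penalty pulls the missing entry to the observed rating, which is the behaviour the grouping assumption is meant to produce.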

We compared our results with those obtained using matrix completion (nuclear norm minimization) with the APG algorithm and with the BCD-NMF algorithm. It can be observed from the above values that a user's demographic profile contributes significantly to deciding his/her choice. Even with just the demographic information (AI), we are able to achieve a decent MAE/RMSE value. The values are better than those of the state-of-the-art matrix completion technique (APG). The added advantage of our proposed formulation is that it has a closed form solution, unlike matrix completion techniques, which are iterative approaches requiring computation of a singular value decomposition (SVD) in every iteration. Therefore, using the formulation (33), one is able to achieve more accurate results in considerably shorter times, as illustrated by the run time comparison shown in Table 3 for the algorithms discussed above. As compared to the matrix factorization framework (BCD-NMF algorithm), our formulation using just demographic data gives poorer MAE and RMSE values, but the run time of our formulation is shorter (by up to 50%) because of the simple closed form solution. A detailed look at Tables 1 and 2 reveals that, for the 100K dataset, the age-gender combination groups users together better than their occupational profile, as indicated by the associated MAE and RMSE values; for the 1M dataset, the occupation based grouping gives slightly lower errors. Figs. 2 and 3 give the precision and recall values as a function of the number of recommendations for all the algorithms compared in Tables 1 and 2 for the 100K dataset. Corresponding graphs for the 1M dataset are shown in Figs. 4 and 5. It can be observed from the figures that for all algorithms, precision and recall improve with increasing length of the recommendation list. Our formulation using just the demographic information consistently performs better than the APG algorithm for both datasets. The improvement in precision and recall values is more pronounced for the 1M dataset than the 100K one, as the former has higher explicit rating data sparsity (around 4% available values). In terms of top-N recommendation accuracy, our formulation using just the demographic information performs poorer than the BCD-NMF algorithm for the 100K dataset but fares comparably (sometimes better) for the 1M dataset. Given the substantially lower computational complexity of our formulation (AI-Age, AI-Occ) compared to BCD-NMF, the performance of our design is fairly good.

4.4. Comparison with existing techniques

This section contains a comparison between the results of our formulations employing nuclear norm minimization with added

variance penalty terms. We compare the performance of our three approaches, MCAI (Age), MCAI (Occ) (16) and the combined formulation MCAI (Age–Occ) (17), against other works incorporating similar secondary information. We compare our work with the graph regularized matrix factorization (Graph-Reg NMF) formulation proposed in Gu et al. (2010) incorporating users' auxiliary information, the neighborhood method (KNN based) proposed in Vozalis and Margaritis (2004), and BCD-NMF (Xu & Yin, 2013) as a baseline. We selected only these two methods (Gu et al., 2010; Vozalis & Margaritis, 2004), as there is very limited literature on methods utilizing users' demographic profiles. These two works are representative of the existing state-of-the-art formulations incorporating user information. There are a few recent works incorporating demographic information in neighborhood based models (Gupta & Gadge, 2015; Dang, Duong, & Nguyen, 2014), but their focus is primarily on handling the cold-start problem, which is not the domain of our work. Also, as per the results highlighted in these works, their performance on the MovieLens dataset is much poorer than our formulation (as per results in the paper) and hence, we do not explicitly report the results of these works. Most existing works use social network and related information to augment basic latent factor models. As the MovieLens dataset does not contain social profiling data of users, we could not compare our formulation against other recent works.

The authors in Vozalis and Margaritis (2004) modified the K nearest neighbors algorithm by utilizing users' demographic information. They combined the supplementary data information and similarity based on ratings to compute a net similarity quotient between various users. This enhanced similarity measure is used to determine the (K) nearest neighbors for rating predictions. For our formulation, bias correction is carried out offline, in order to extract only the interaction component without the (user and item) biases plaguing the explicit (raw) ratings. However, the algorithm proposed in Xu and Yin (2013) is based on non-negative matrix factorization involving no bias correction, and hence raw rating data is used for rating prediction. Also, it is a non-convex formulation, unlike MCAI (Age–Occ). The work in Vozalis and Margaritis (2004) does not incorporate user and item bias; however, the global mean of the dataset is considered. To further illustrate the improvement achieved using demographic information (age/gender and occupation), we also show the results obtained for the formulation incorporating just the nuclear norm constraint (36), denoted MC (SB).

$$
\min_{Z}\; \|Y - A(Z)\|_F^2 + \lambda \|Z\|_*
\tag{36}
$$

For solving (36), we use the split Bregman technique, as for the MCAI formulations, by introducing a proxy variable $H$ as in (37), where $B_z$ is the Bregman variable

$$
\min_{Z}\; \|Y - A(Z)\|_F^2 + \lambda \|H\|_* + \mu \|H - Z - B_z\|_F^2
\tag{37}
$$

Eq. (37) is solved using a procedure similar to that outlined in Section 3.3. Tables 4 and 5 give the MAE and RMSE values respectively for the 100K dataset. MAE and RMSE values for the 1M dataset are given in Tables 6 and 7.

Fig. 2. Precision vs. no. of recommendations (100K).

Fig. 3. Recall vs. no. of recommendations (100K).

Fig. 4. Precision vs. no. of recommendations (1M).

Fig. 5. Recall vs. no. of recommendations (1M).

Table 4
MAE for 100K dataset.

| Algorithm | Mean Absolute Error |
|---|---|
| MC (SB) | 0.7359 |
| MCAI (Age) | 0.7264 |
| MCAI (Occ) | 0.7310 |
| MCAI (Age–Occ) | 0.7206 |
| BCD-NMF | 0.7582 |
| KNN based | 0.8302 |
| Graph-Reg NMF | 0.7577 |

Table 5
RMSE for 100K dataset.

| Algorithm | Root Mean Square Error |
|---|---|
| MC (SB) | 0.9201 |
| MCAI (Age) | 0.9193 |
| MCAI (Occ) | 0.9212 |
| MCAI (Age–Occ) | 0.9187 |
| BCD-NMF | 0.9816 |
| KNN based | 1.0467 |
| Graph-Reg NMF | 0.9616 |

Table 6
MAE for 1M dataset.

| Algorithm | Mean Absolute Error |
|---|---|
| MC (SB) | 0.6943 |
| MCAI (Age) | 0.6772 |
| MCAI (Occ) | 0.6812 |
| MCAI (Age–Occ) | 0.6749 |
| BCD-NMF | 0.6863 |
| KNN based | 0.8198 |
| Graph-Reg NMF | 0.7233 |

Table 7
RMSE for 1M dataset.

| Algorithm | Root Mean Square Error |
|---|---|
| MC (SB) | 0.8813 |
| MCAI (Age) | 0.8651 |
| MCAI (Occ) | 0.8734 |
| MCAI (Age–Occ) | 0.8622 |
| BCD-NMF | 0.8790 |
| KNN based | 0.9989 |
| Graph-Reg NMF | 0.9139 |

Fig. 6. Precision vs. no. of recommendations (100K).

Fig. 7. Recall vs. no. of recommendations (100K).

Fig. 8. Precision vs. no. of recommendations (1M).

Fig. 9. Recall vs. no. of recommendations (1M).

Combining the demographic information (age-gender and occupation) with the low rank constraint on the interaction component achieves a considerable improvement in prediction accuracy as compared to just the low rank constraint (BCD-NMF algorithm). Our formulation, combining both age-gender and occupation based profiling, gives around 5% lower MAE than the BCD-NMF algorithm (which uses no demographic information) for the 100K dataset and about 1.7% lower MAE for the 1M dataset. Also, as compared to the MC (SB) algorithm, which uses only the nuclear norm constraint (basic matrix completion), our formulations incorporating demographic information give better results. By use of demographic information (combined age, gender and occupation data) we are able to achieve a reduction of over 2% in MAE values for both the 100K and 1M datasets. This validates our claim that including auxiliary information in the matrix completion framework in addition to the rating data achieves considerable improvement in rating prediction accuracy. It can be seen from the above observations that our algorithm MCAI (Age–Occ) shows significant improvement over other algorithms. MCAI (Age–Occ) achieves an improvement of over 15% in recovery accuracy as compared to the nearest neighbor based approach. This indicates the superiority of latent factor based models over the neighborhood approach. Our algorithm is also able to achieve approximately 5% improvement in both MAE and RMSE values

over the Graph-Reg NMF algorithm, which also uses similar demographic information. However, it works with raw rating data, whereas in our formulation we use baseline estimation. User and item biases negatively affect the accuracy of rating prediction when working with raw data. Also, the improvement in accuracy indicates the efficiency of the split Bregman technique in achieving better results. A similar pattern can be seen for the results with the 1M dataset as well. We are able to achieve a significant reduction in error values compared to Graph-Reg NMF (around 7%) and the KNN based approach (21% reduction in MAE). Figs. 6–9 show the precision and recall values for the 100K and 1M datasets for all the algorithms/formulations. As expected, the KNN based neighborhood model performs the poorest amongst all algorithms. Even though the Graph-Regularized NMF framework works better than the KNN model, our formulation using either age or occupation as secondary information is able to outperform Graph-Reg NMF with respect to both precision and recall. Also, our full (combined) formulation, using both age-gender and occupational similarity grouping, performs slightly but consistently better than our formulations incorporating either of them alone. This indicates that as we incorporate additional information into the matrix completion framework, the Quality of Prediction improves.

5. Conclusion

In this work, we propose to improve the rating prediction accuracy of collaborative filtering by utilizing user demographic information along with user ratings. Our new formulation exploits user demographic information to augment the matrix completion framework for collaborative filtering. Most existing works incorporating such secondary information focus on augmenting the matrix factorization framework. However, being bi-linear, matrix factorization is a non-convex formulation with no convergence guarantees. We, on the other hand, augment the matrix completion framework to include users' demographic information. Our assumption is that users in the same categories will rate items similarly, so the variance within these categories will be small. Therefore we augment the matrix completion problem with variance minimizing penalties. Our proposed formulation is convex and has been solved using the split Bregman approach; it is guaranteed to converge to the global solution.

We summarize the outcomes of our experiments. First, we find that just by utilizing the similarity information (among groups), without any latent factor model (low-rank assumption), we are able to achieve better results than state-of-the-art techniques in low-rank matrix completion, and at reduced computational cost. Second, our proposed approach of utilizing user metadata yields better results than prior techniques. In the future, we aim to also include item category/genre information in the design for further performance enhancement. We would also like to extend our design to other recommender system domains, such as books or restaurants.

References

Adomavicius, G., & Tuzhilin, A. (2005).
Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749.

Bedi, P., & Sharma, R. (2012). Trust based recommender system using ant colony for trust computation. Expert Systems with Applications, 39(1), 1183–1190.

Bell, R. M., & Koren, Y. (2007). Improved neighborhood-based collaborative filtering. In KDD cup and workshop at the 13th ACM SIGKDD international conference on knowledge discovery and data mining.

Bobadilla, J., Ortega, F., Hernando, A., & Gutiérrez, A. (2013). Recommender systems survey. Knowledge-Based Systems, 46, 109–132.

Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1), 1–122.

Cai, J. F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.

Candès, E., & Recht, B. (2013). Simple bounds for recovering low-complexity models. Mathematical Programming, 141(1-2), 577–589.

Dang, T. T., Duong, T. H., & Nguyen, H. S. (2014). A hybrid framework for enhancing correlation to solve cold-start problem in recommender systems. In Seventh IEEE symposium on computational intelligence for security and defense applications (CISDA), 2014 (pp. 1–5). IEEE.

Goldstein, T., & Osher, S. (2009). The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences, 2(2), 323–343.

Gu, Q., Zhou, J., & Ding, C. H. (2010). Collaborative filtering: Weighted nonnegative matrix factorization incorporating user and item graphs. In SDM (pp. 199–210).

Gupta, J., & Gadge, J. (2015). Performance analysis of recommendation system based on collaborative filtering and demographics. In International conference on communication, information & computing technology (ICCICT), 2015 (pp. 1–6). IEEE.

Hofmann, T. (2004). Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS), 22(1), 89–115.

Hornick, M. F., & Tamayo, P. (2012). Extending recommender systems for disjoint user/item sets: The conference recommendation problem. IEEE Transactions on Knowledge and Data Engineering, 24(8), 1478–1490.


Jaggi, M., & Sulovský, M. (2010). A simple algorithm for nuclear norm regularized problems. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 471–478).

Koren, Y. (2008). Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 426–434). ACM.

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.

Lawson, C. L., & Hanson, R. J. (1974). Solving least squares problems (Vol. 161). Englewood Cliffs, NJ: Prentice-Hall.

Lee, D. D., & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In Advances in neural information processing systems (pp. 556–562).

Lee, J., Recht, B., Srebro, N., Tropp, J., & Salakhutdinov, R. (2010). Practical large-scale optimization for max-norm regularization. In Advances in neural information processing systems (pp. 1297–1305).

Lika, B., Kolomvatsos, K., & Hadjiefthymiades, S. (2014). Facing the cold start problem in recommender systems. Expert Systems with Applications, 41(4), 2065–2073.

Liu, Q., Chen, E., Xiong, H., Ge, Y., Li, Z., & Wu, X. (2014). A cocktail approach for travel package recommendation. IEEE Transactions on Knowledge and Data Engineering, 26(2), 278–293.

Ma, H., Zhou, D., Liu, C., Lyu, M. R., & King, I. (2011). Recommender systems with social regularization. In Proceedings of the fourth ACM international conference on web search and data mining (pp. 287–296). ACM.

Miller, B. N., Albert, I., Lam, S. K., Konstan, J. A., & Riedl, J. (2003). MovieLens unplugged: Experiences with an occasionally connected recommender system. In Proceedings of the eighth international conference on intelligent user interfaces (pp. 263–266). ACM.

Mnih, A., & Salakhutdinov, R. (2007). Probabilistic matrix factorization. In Advances in neural information processing systems (pp. 1257–1264).

Mohan, K., & Fazel, M. (2012). Iterative reweighted algorithms for matrix rank minimization. The Journal of Machine Learning Research, 13(1), 3441–3473.

Nesterov, Y., & Nesterov, I. E. (2004). Introductory lectures on convex optimization: A basic course (Vol. 87). Springer.

Paige, C. C., & Saunders, M. A. (1982). LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Transactions on Mathematical Software (TOMS), 8(1), 43–71.

Recht, B., Fazel, M., & Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.

Rennie, J. D., & Srebro, N. (2005). Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd international conference on machine learning (pp. 713–719). ACM.

Safoury, L., & Salah, A. (2013). Exploiting user demographic attributes for solving cold-start problem in recommender system. Lecture Notes on Software Engineering, 1(3), 303–307.

Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on world wide web (pp. 285–295). ACM.

Shamir, O., & Shalev-Shwartz, S. (2011). Collaborative filtering with the trace norm: Learning, bounding, and transducing.

Shani, G., & Gunawardana, A. (2011). Evaluating recommendation systems. In Recommender systems handbook (pp. 257–297). US: Springer.

Su, X., & Khoshgoftaar, T. M. (2009). A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009, 4.

Toh, K. C., & Yun, S. (2010). An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization, 6(3), 615–640.

Vozalis, M., & Margaritis, K. G. (2004). Collaborative filtering enhanced by demographic correlation. In AIAI symposium on professional practice in AI, of the 18th world computer congress.

Vozalis, M. G., & Margaritis, K. G. (2007). Using SVD and demographic data for the enhancement of generalized collaborative filtering. Information Sciences, 177(15), 3017–3037.

Wang, J., De Vries, A. P., & Reinders, M. J. (2006). Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval (pp. 501–508). ACM.

Xu, Y., & Yin, W. (2013). A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6(3), 1758–1789.

Zhang, Y., Chen, W., & Yin, Z. (2013). Collaborative filtering with social regularization for TV program recommendation. Knowledge-Based Systems, 54, 310–317.

Zhou, T., Shan, H., Banerjee, A., & Sapiro, G. (2012). Kernelized probabilistic matrix factorization: Exploiting graphs and side information. In SDM (Vol. 12, pp. 403–414).

Zhu, Z., Xin, P., Wei, S., & Zhao, Y. (2013). Orthogonal graph-regularized matrix factorization and its application for recommendation. In IEEE international conference on multimedia and expo (ICME), 2013 (pp. 1–6). IEEE.