Neurocomputing 366 (2019) 86–96
LSCD: Low-rank and sparse cross-domain recommendation

Ling Huang a,b, Zhi-Lin Zhao a, Chang-Dong Wang a,b,∗, Dong Huang c, Hong-Yang Chao a

a School of Data and Computer Science, Sun Yat-sen University, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou, Guangdong 510006, China
b Guangdong Province Key Laboratory of Computational Science, Guangzhou, China
c College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
Article info

Article history: Received 2 January 2019; Revised 30 May 2019; Accepted 30 July 2019; Available online 6 August 2019. Communicated by Dr. Zenglin Xu.

Keywords: Recommendation; Cross-domain; Low-rank; Sparse
Abstract

Due to its ability to address the data sparsity and cold-start problems, Cross-Domain Collaborative Filtering (CDCF) has received a significant amount of attention. Despite this success, most existing CDCF algorithms assume that all the domains are correlated, which is not always guaranteed in practice. In this paper, we propose a novel CDCF algorithm termed the Low-rank and Sparse Cross-Domain (LSCD) recommendation algorithm. Different from most CDCF algorithms, LSCD extracts a user and an item latent feature matrix for each domain respectively, rather than tri-factorizing the rating matrix of each domain into three low-dimensional matrices. In order to simultaneously improve recommendation performance among correlated domains by transferring knowledge, and among uncorrelated domains by differentiating features in different domains, the features of users are adaptively separated into shared and domain-specific parts. Specifically, a low-rank matrix captures the features of each user shared across domains, and a sparse matrix characterizes the discriminative features in each specific domain. Extensive experiments on two real-world datasets confirm that the proposed algorithm transfers knowledge more effectively to improve the quality of recommendation and outperforms state-of-the-art recommendation algorithms. © 2019 Elsevier B.V. All rights reserved.
∗ Corresponding author at: School of Data and Computer Science, Sun Yat-sen University, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou, Guangdong 510006, China. E-mail addresses: [email protected] (L. Huang), [email protected] (Z.-L. Zhao), [email protected] (C.-D. Wang), [email protected] (D. Huang), [email protected] (H.-Y. Chao). https://doi.org/10.1016/j.neucom.2019.07.091. 0925-2312/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Nowadays, to benefit businesses and satisfy users, many online platforms such as Amazon, Netflix, Douban and the App Store use recommendation algorithms to recommend items to the users who are most likely to be interested in them, by analyzing huge amounts of user behavior data [1–10]. The task of a recommendation algorithm is to infer the values of the missing ratings in the sparse user-item rating matrix by analyzing a small portion of known ratings [11–13]. Some unrated items with high predicted ratings are then recommended to the target users. Collaborative Filtering (CF) [14–16] is the most widely used recommendation approach, and one representative technique for collaborative filtering is matrix factorization.

Over the past decade, matrix factorization has attracted an increasing amount of attention and has been applied in many areas such as machine learning [17], computer vision [18] and data mining [19]. From the perspective of collaborative filtering [14,20,21], matrix factorization first extracts latent factors of users and items from the user-item rating matrix and then predicts the missing ratings according to those latent factors. In practice, the rating matrix is usually quite sparse, so it is difficult to learn satisfactory latent factors of users and items, which greatly affects the quality of rating prediction. Besides, it is challenging to make reliable predictions for new users due to the lack of relevant rating data. In order to solve these data sparsity and cold-start problems and improve the quality of recommendation [22], transfer learning [23,24] has been integrated into matrix factorization to transfer knowledge from auxiliary data to the rating data. Typical auxiliary data include social media [25,26], user reviews [27,28], product images [29], context [30] and tags [31].

As a special case of transfer learning, Cross-Domain Collaborative Filtering (CDCF) [32–35] transfers knowledge of rating data among multiple domains to make better recommendations. It predicts ratings for all domains by learning from the rating matrices of multiple domains together. Combining all domains, we have more data to describe the features of users, which can mitigate the sparsity problem. If a user has not given any ratings
to items in a target domain but has some ratings in other related domains, we can solve the cold-start problem by learning the user's features from those related domains.

Most existing CDCF methods [36–39] tri-factorize the rating matrix of each domain into three low-dimensional matrices, which represent the user latent factors (or labels), the codebook (or rating pattern matrix) and the item latent factors (or labels) respectively. Usually, the codebook, which describes the relations between the clusters of users and items, is shared among domains. Besides, users may appear in all domains and the user factors can also be shared, so the user latent factor matrix or the codebook can be viewed as a bridge to transfer knowledge. However, when the domains are uncorrelated, the bridge cannot transfer meaningful knowledge and may even transfer negative knowledge, which degrades the recommendation. Moreover, even for the same user, preferences differ from one domain to another, so each user has some domain-specific features. Unfortunately, there is still a lack of methods that differentiate shared features and domain-specific features adaptively.

To solve the above problems, we propose a novel cross-domain collaborative filtering algorithm termed the Low-rank and Sparse Cross-Domain (LSCD) recommendation algorithm. Different from most existing CDCF algorithms, we decompose the rating matrices of all domains into user and item latent feature matrices. More specifically, each user latent feature matrix consists of two parts: a user-domain-shared feature matrix and a user-domain-specific feature matrix. The former describes the overall preferences of users across multiple domains and is modeled by a low-rank matrix. The latter characterizes the discriminative features of users in each domain, since the expression of some features varies from one domain to another, and is modeled by a sparse matrix. For example, if a user is willing to give high ratings to all items on average, we can capture this feature by analyzing the rating records of all domains. But if the user prefers movies to books, the user may give higher ratings in the movie domain than in the book domain, so a user-domain-specific feature matrix should be used to adjust the shared feature matrix to fit each specific domain. Therefore, if the domains are correlated, the performance can be improved because more rating data are used to learn the shared features of users. On the contrary, if the domains are uncorrelated, the performance can still be improved because the domain-specific features of users in different domains can be distinguished.

Extensive experiments have been conducted on two real-world datasets: Amazon and MovieLens. The results show that the proposed LSCD recommendation algorithm improves the quality of recommendation as the number of domains increases, even when the domains are uncorrelated, and that it outperforms the state-of-the-art recommendation algorithms.

The main contributions of this paper are summarized as follows.

1. We propose a novel cross-domain collaborative filtering algorithm termed LSCD, which is able to adaptively transfer knowledge among correlated domains and differentiate features among uncorrelated domains.
2. Different from most cross-domain collaborative filtering algorithms, which tri-factorize the rating matrix of each domain into three low-dimensional matrices, LSCD extracts a user and an item latent feature matrix for each domain respectively.
3. The user latent feature matrix is separated into shared and domain-specific parts adaptively. The former describes the overall preferences of users across multiple domains and the latter characterizes the discriminative features of users in different domains.
4. Extensive experiments have been conducted to confirm the effectiveness of the proposed method.

The rest of this paper is organized as follows. Section 2 briefly introduces the related work. Section 3 describes the proposed algorithm in detail, including the objective function, the optimization and the theoretical analysis. In Section 4, extensive experiments are conducted to confirm the effectiveness of the proposed method. We draw the conclusion and present future work in Section 5. Some results in this paper were first presented in [40].

2. Related work

The most remarkable matrix factorization algorithm is probabilistic matrix factorization (PMF) [19], which extracts user and item latent feature vectors by decomposing the rating matrix and makes rating predictions according to those latent features. In order to prevent overfitting, Gaussian-Wishart priors have been introduced on the hyperparameters of the user and item latent feature vectors by Bayesian probabilistic matrix factorization (BPMF) [41]. For obtaining integer rating predictions, some approaches have been developed, such as Maximum Margin Matrix Factorization (MMMF) [42], which uses a trace-norm constraint, and Bayesian Nonparametric Poisson Factorization (BNPF) [43], which captures preference patterns. Based on weighted nonnegative matrix factorization (WNMF), the Hierarchical Structures Recommender (HSR) [44] explores the implicit hierarchical structures of users and items to improve recommendation performance. However, the above algorithms work only in a single domain and often suffer from the sparsity and cold-start problems.

In order to solve these two problems and improve the quality of recommendation, transfer learning has been used. Cross-Domain Collaborative Filtering (CDCF) is a special case of transfer learning. Some efforts have been made to transfer knowledge from an auxiliary domain to a target domain [45]. For example, Cross-domain Topic Learning (CTL) [46] learns and differentiates collaboration topics from other topics to alleviate the sparsity issue. Chen et al. proposed the FUSE algorithm [47], which uses PARAFAC tensor decomposition to transfer knowledge; it can increase the user acceptance rate of recommendation lists but needs tag information and is time-consuming. Hybrid Random Walk (HRW) [48] can transfer knowledge through a social network, but errors are propagated in its layered model. Besides, these algorithms can only transfer knowledge from an auxiliary domain to a target domain; they cannot enhance the performance of multiple domains simultaneously.

Different from the above methods, some efforts have been made to transfer knowledge between domains by training several rating matrices together. The basic idea is similar to Multi-Task Learning (MTL) [49], which trains several models together to improve the performance of each model. It is therefore not necessary to distinguish auxiliary domains from target domains, because each domain serves as both an auxiliary domain and a target domain. For example, the User-based Neighborhood Cross-Domain Collaborative Filtering model (N-CDCF-U), the Item-based Neighborhood Cross-Domain Collaborative Filtering model (N-CDCF-I) and the Matrix Factorization based Cross-Domain Collaborative Filtering model (MF-CDCF) [37] combine the rating records of all domains into one rating matrix, based on which rating predictions can be made. However, these simple methods neglect the peculiarity of each domain.
In TrAnsfer Learning for MUltiple Domains (TALMUD) [36], the features and codebook knowledge of all domains can be transferred to each other, but the optimization procedure is time-consuming. In Collective Matrix Factorization (CMF) [50], the user latent feature matrix, which can be viewed as a bridge to
transfer knowledge, is shared when decomposing the rating matrices of different domains, but the domain-specific features are not considered. In [51], Li et al. proposed to share group-level rating patterns by matching user/item groups from different domains and imposing user/item dependence across domains. A codebook is used in Cross-Domain Triadic Factorization (CDTF) [37] but is not shared among domains, so the knowledge of user-item co-clustering cannot be transferred to other domains. To overcome this defect, the Cluster-level Latent Factor Model (CLMF) [38] allows part of the codebook to be shared, and the item latent feature matrix of each domain is divided into two parts to cater to this situation; however, the shared features of users are not taken into account. As an improved version of CLMF, the Probabilistic Cluster-Level Latent Factor Model (PCLF) [52] can capture the diversity among different domains and predict the rating of an item for a user in another domain according to the common clusters and domain-specific clusters of items, but it encounters the same problem as CLMF. Based on BPMF, the algorithm proposed in [53] assumes that the distributions of the latent feature vectors of users and items share the same mean and covariance. For predicting optional course scores for university students, our previous work [54] designed a cross-user-domain collaborative filtering (CUDCF) algorithm that takes junior students and senior students as two different user (student) domains, where the course-score information of senior students is used to predict the course scores of junior students. However, in the scenario where the domains are uncorrelated, the above approaches cannot work well, because the bridge cannot transfer meaningful knowledge and may even transfer negative knowledge, which degrades the recommendation. Besides, even for the same user, preferences differ from one domain to another, so it is necessary to extract domain-specific features for each user. Unfortunately, there is still a lack of methods that differentiate shared features and domain-specific features adaptively.

Recently, following the trend of leveraging deep learning for recommendation [55,56], deep learning has also been applied to cross-domain recommendation [57,58]. The basic idea is to extract user/item features from multiple domains by means of deep neural networks and then fuse them for cross-domain recommendation. However, as shown in our previous work [59], the major issues of deep learning for recommendation are the computational complexity caused by the deep neural network and the overfitting caused by the relatively small number of ratings compared with the number of free parameters. These issues may become more serious in cross-domain recommendation because more domains are considered.

3. The proposed algorithm

In this section, we describe the proposed LSCD algorithm in detail. Across multiple domains, we assume the features of each user can be divided into a shared part and a domain-specific part adaptively. Following the common methodology of matrix factorization, we assume that the user-domain-shared feature matrix has a low-rank structure and that the user-domain-specific feature matrix is sparse. We use the approximate projected gradient method [60] to solve the objective function. We assume there are D domains in a recommendation task.
The input user-item rating matrix $R_d \in \mathbb{R}^{m \times n_d}$ represents the rating relations between the $m$ users and the $n_d$ items in the $d$th domain. Note that the set of users is the same in every domain. Each entry denotes the rating of a user to an item within a certain numerical interval $[R_{min}, R_{max}]$, which varies across datasets. The rating is zero if the user has not rated the item. $I_d$ is the indicator matrix of $R_d$, whose entries equal one if the corresponding entry of $R_d$ has been rated and zero otherwise. $U \in \mathbb{R}^{m \times l}$ is the user-domain-shared feature matrix and $H_d \in \mathbb{R}^{m \times l}$ is the user-domain-specific feature matrix in the $d$th domain. $V_d \in \mathbb{R}^{n_d \times l}$ denotes the item latent feature matrix in the $d$th domain. Here $l \ll \min(m, n_d), \forall d = 1, \ldots, D$, is the number of latent features.

3.1. The objective function

In the $d$th domain, the user latent feature matrix is the sum of the domain-shared feature matrix and the domain-specific feature matrix, i.e., $U + H_d$. In the proposed algorithm, the shared features and domain-specific features can be differentiated adaptively. Based on the traditional matrix factorization algorithm, the predicted ratings of users to unrated items in the domain can be estimated by the product of the users' and items' latent feature matrices,
$$P_d = (U + H_d)V_d^T. \tag{1}$$
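To make the notation concrete, the following minimal NumPy sketch (our illustration, not the paper's released code; the names and toy ratings are hypothetical) builds $R_d$ and $I_d$ from rating triples and evaluates Eq. (1):

```python
import numpy as np

# Toy domain d: 4 users, 3 items, a handful of (user, item, rating) triples.
ratings = [(0, 1, 4.0), (1, 0, 5.0), (2, 2, 3.0), (3, 1, 2.0)]
m, n_d, l = 4, 3, 2

R_d = np.zeros((m, n_d))
for u, i, r in ratings:
    R_d[u, i] = r
I_d = (R_d != 0).astype(float)              # indicator matrix of observed entries

rng = np.random.default_rng(0)
U   = rng.normal(scale=0.1, size=(m, l))    # user-domain-shared features
H_d = rng.normal(scale=0.1, size=(m, l))    # user-domain-specific features
V_d = rng.normal(scale=0.1, size=(n_d, l))  # item features of domain d

P_d = (U + H_d) @ V_d.T                     # predicted ratings, Eq. (1)
```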
In matrix factorization, these latent feature matrices are learnt from the rating data by minimizing the sum of squared errors between real ratings and predicted ratings. Besides, the parameters of all domains should be learnt together so that the shared and domain-specific features of users can be distinguished, with the user-domain-shared feature matrix $U$ serving as a bridge to transfer knowledge among domains. So the loss function is

$$\frac{1}{2}\sum_{d=1}^{D}\left\|\left(R_d - (U + H_d)V_d^T\right)\odot I_d\right\|_F^2, \tag{2}$$
where $\|\cdot\|_F$ is the Frobenius norm and $\odot$ denotes the Hadamard product.

Multi-Task Learning (MTL) [49] captures the task relationship via a shared low-rank structure, and CDCF is similar to this learning paradigm since we want to explore the latent feature relationships among multiple domains. Inspired by MTL, we divide the factors of users into a shared part and a domain-specific part. The shared part models the overall preferences of users across multiple domains. Although $U$ has dimension $\mathbb{R}^{m \times l}$, the number of shared features may be less than $l$, and may even equal 0 if the domains are uncorrelated. When the domains are uncorrelated, it is not necessary to transfer knowledge, because transferring negative knowledge may hurt performance. Besides, if the domains are strongly correlated, the rank of $U$ is at most $l$, so we need to find the important shared features among the $l$ features, the number of which may be less than $l$. Moreover, we can get a good result even when the number of features is not large, since the most important features can be selected. Therefore, we assume $U$ is low-rank [61]: users share the same low-rank feature subspace, and we minimize $\mathrm{rank}(U)$ to obtain a low-rank structure for the user-domain-shared feature matrix.

On the other hand, the preference of a user varies slightly across domains, as discussed earlier, and this variation can be reflected by a few features. So we use an entry-wise sparse regularization term to identify those discriminative features in each domain. Because the $l_0$-norm counts the number of nonzero elements, we minimize $\|H_d\|_0$ to obtain a sparse structure for the user-domain-specific feature matrix, which is then used to adjust the domain-shared feature matrix to fit each specific domain. The objective function is

$$L_O = \frac{1}{2}\sum_{d=1}^{D}\left\|\left(R_d - (U + H_d)V_d^T\right)\odot I_d\right\|_F^2 + \frac{\lambda_V}{2}\sum_{d=1}^{D}\|V_d\|_F^2 + \lambda_U\,\mathrm{rank}(U) + \lambda_H\sum_{d=1}^{D}\|H_d\|_0, \tag{3}$$
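For reference, Eq. (3) can be evaluated directly in NumPy as below (a sketch under our own naming; the released implementation may differ). Note that $\mathrm{rank}(\cdot)$ and the $l_0$-norm are only evaluated here; the actual optimization uses the convex relaxation introduced next:

```python
import numpy as np

def lscd_objective(R, I, U, H, V, lam_V, lam_U, lam_H):
    """Evaluate Eq. (3), with R, I, H, V given as per-domain lists."""
    loss = 0.0
    for R_d, I_d, H_d, V_d in zip(R, I, H, V):
        E = (R_d - (U + H_d) @ V_d.T) * I_d               # Hadamard-masked residual
        loss += 0.5 * np.sum(E ** 2) + 0.5 * lam_V * np.sum(V_d ** 2)
    loss += lam_U * np.linalg.matrix_rank(U)              # rank(U) term
    loss += lam_H * sum(np.count_nonzero(H_d) for H_d in H)  # l0 terms
    return loss
```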
[Fig. 1. The LSCD model of two domains. Black represents zero values and white represents nonzero values.]
where $\lambda_V$, $\lambda_U$ and $\lambda_H$ are regularization coefficients. The ratio between $\lambda_H$ and $\lambda_U$ controls the proportions of the shared and domain-specific parts in the user latent features. The Frobenius norm of $V_d$ is added to prevent overfitting of the item latent feature matrix. However, for the user-domain-shared feature matrix $U$ and the user-domain-specific feature matrix $H_d$, Frobenius norms are not necessary, because the low-rank and sparse regularization terms on these two variables achieve the same goal [62,63].

Solving the above nonconvex optimization problem is NP-hard. As pointed out in [64], if the rank of $U$ is not too large and $H_d$ is sparse, the regularization terms $\mathrm{rank}(U)$ and $\|H_d\|_0$ can be approximated by their tightest convex relaxations $\|U\|_*$ and $\|H_d\|_1$ respectively, where $\|\cdot\|_*$ and $\|\cdot\|_1$ denote the nuclear norm and the $l_1$-norm of a matrix. Therefore, the relaxed objective function is

$$L = \underbrace{\frac{1}{2}\sum_{d=1}^{D}\left\|\left(R_d - (U + H_d)V_d^T\right)\odot I_d\right\|_F^2 + \frac{\lambda_V}{2}\sum_{d=1}^{D}\|V_d\|_F^2}_{F} + \underbrace{\lambda_U\|U\|_* + \lambda_H\sum_{d=1}^{D}\|H_d\|_1}_{G}. \tag{4}$$

The main reason why we do not replace the low-rank term $\mathrm{rank}(U)$ by the Frobenius norm $\|U\|_F^2$ in Eq. (4) is that the Frobenius norm can only find the features of users in a single domain and therefore fails to discover the shared factors of users, i.e., the basis of a low-rank subspace [49]. Besides, the low-rank regularization preferentially finds the most important factors, so a good result can be obtained even with a small dimension. For better understanding, an example with two domains, Movie and Book, is shown in Fig. 1. The proposed algorithm decomposes the two sparse rating matrices into user and item latent feature matrices. The item feature matrix is of full rank, and the user feature matrix contains a shared part as well as a domain-specific part. The approximately low-rank (after convex relaxation) user-domain-shared feature matrix, learned from the ratings of both domains, is a bridge to transfer knowledge between them, while the sparse user-domain-specific feature matrix captures the diversity of user features across domains. Finally, the latent features of users and items recover the rating matrix to obtain predictions for non-purchased items. Notice that the proposed algorithm is not designed to transfer knowledge from a source domain to a target domain; there is no need to distinguish source and target domains, because LSCD trains all domains together so that they transfer knowledge to each other.

3.2. Optimization

In order to solve the optimization problem, we work with the approximate objective function, to which the approximate projected gradient method is applied [65]. The objective function $L$ can be split into two terms $F$ and $G$, where $F$ is differentiable and the subgradients of $G$ with respect to $U$ and $H_d$ can be obtained. The gradients of $F$ with respect to the parameters are as follows:

$$\nabla_{H_d} F = \left[\left((U + H_d)V_d^T - R_d\right)\odot I_d\right]V_d, \tag{5}$$

$$\nabla_{U} F = \sum_{d=1}^{D}\left[\left((U + H_d)V_d^T - R_d\right)\odot I_d\right]V_d = \sum_{d=1}^{D}\nabla_{H_d} F, \tag{6}$$

$$\nabla_{V_d} F = \left[\left((U + H_d)V_d^T - R_d\right)\odot I_d\right]^T(U + H_d) + \lambda_V V_d. \tag{7}$$

As we can see, the gradient of $F$ with respect to $U$ can be obtained from the gradients with respect to $H_d$. So, as a matter of fact, we only need to calculate the gradients with respect to $H_d$ and $V_d$.

3.2.1. Update U

The approximate proximal gradient step for the variable $U$ is

$$U^{k+1} = \arg\min_{U}\; \mu\lambda_U\|U\|_* + \frac{1}{2}\left\|U - (U^k - \mu\nabla_U F^k)\right\|_F^2 = \mathrm{prox}_{\mu\lambda_U\|\cdot\|_*}\!\left(U^k - \mu\nabla_U F^k\right), \tag{8}$$

where $\mathrm{prox}$ is the proximal operator and $\mu$ is the step size. Let $Q^k\Sigma^k(W^k)^T = \mathrm{SVD}(U^k - \mu\nabla_U F^k)$, where SVD denotes the Singular Value Decomposition. Then $U$ can be updated by

$$U^{k+1} = Q^k S_{\mu\lambda_U}(\Sigma^k)(W^k)^T, \tag{9}$$

where $S_\phi(x)$ is the soft-thresholding (shrinkage) operator [65],

$$S_\phi(x) = \begin{cases} x - \phi, & x > \phi \\ 0, & |x| \le \phi \\ x + \phi, & x < -\phi. \end{cases} \tag{10}$$

3.2.2. Update H_d

The approximate proximal gradient step for the variable $H_d$ is

$$H_d^{k+1} = \arg\min_{H_d}\; \mu\lambda_H\|H_d\|_1 + \frac{1}{2}\left\|H_d - (H_d^k - \mu\nabla_{H_d} F^k)\right\|_F^2 = \mathrm{prox}_{\mu\lambda_H\|\cdot\|_1}\!\left(H_d^k - \mu\nabla_{H_d} F^k\right), \tag{11}$$

so $H_d^{k+1}$ is obtained by the soft-thresholding operator,

$$H_d^{k+1} = S_{\mu\lambda_H}\!\left(H_d^k - \mu\nabla_{H_d} F^k\right). \tag{12}$$
3.2.3. Update V_d

The parameter $V_d$ can be updated by gradient descent directly in each iteration,

$$V_d^{k+1} = V_d^k - \mu\nabla_{V_d} F^k. \tag{13}$$

A local minimum of the objective function is found when the iteration converges, and the optimization procedure is summarized in Algorithm 1.

Algorithm 1 LSCD.
Input: Rating matrices R = {R_1, ..., R_D}, step size μ, regularization coefficients λ_U, λ_H and λ_V.
Initialize: U, H = {H_1, ..., H_D}, V = {V_1, ..., V_D}.
1: while not converged do
2:    Compute the gradients ∇_{H_d}F, ∇_U F and ∇_{V_d}F.
3:    Update U by Eq. (9).
4:    Update H_d by Eq. (12), ∀d = 1, ..., D.
5:    Update V_d by Eq. (13), ∀d = 1, ..., D.
6: end while
Output: U, H = {H_1, ..., H_D}, V = {V_1, ..., V_D}.
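One possible NumPy realization of Algorithm 1, reusing the proximal helpers sketched in Section 3.2 (again an unoptimized illustration under our own naming, with a fixed iteration count standing in for the convergence test):

```python
import numpy as np

def lscd_train(R, I, m, dims, l, mu=1e-3, lam_U=0.1, lam_H=0.1, lam_V=0.1, iters=200):
    """Alternate the updates of Eqs. (9), (12) and (13) over D domains."""
    rng = np.random.default_rng(0)
    D = len(R)
    U = rng.normal(scale=0.1, size=(m, l))
    H = [rng.normal(scale=0.1, size=(m, l)) for _ in range(D)]
    V = [rng.normal(scale=0.1, size=(n_d, l)) for n_d in dims]
    for _ in range(iters):
        grads_H, grads_V = [], []
        for d in range(D):
            E = ((U + H[d]) @ V[d].T - R[d]) * I[d]           # masked residual
            grads_H.append(E @ V[d])                          # Eq. (5)
            grads_V.append(E.T @ (U + H[d]) + lam_V * V[d])   # Eq. (7)
        grad_U = sum(grads_H)                                 # Eq. (6)
        U = update_U(U, grad_U, mu, lam_U)                    # Eq. (9)
        H = [update_H(H[d], grads_H[d], mu, lam_H) for d in range(D)]  # Eq. (12)
        V = [V[d] - mu * grads_V[d] for d in range(D)]                 # Eq. (13)
    return U, H, V
```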
3.3. Rating prediction

After obtaining the latent feature matrices $U$, $H_d$ and $V_d$, the predicted rating matrix $P_d$ of each domain can be estimated by Eq. (1). Each entry $p_{ij}^d$ denotes the predicted rating of user $i$ to the non-purchased item $j$ in the $d$th domain, but some predicted ratings may fall outside the legal rating range $[R_{min}, R_{max}]$. To ensure that all predicted ratings $p_{ij}^d$ lie in this interval, the following adjustment is applied:

$$p_{ij}^d = \begin{cases} R_{max}, & \text{if } p_{ij}^d > R_{max} \\ R_{min}, & \text{if } p_{ij}^d < R_{min} \\ p_{ij}^d, & \text{otherwise.} \end{cases} \tag{14}$$

3.4. Complexity and convergence analysis

The dimensionalities of the latent feature matrices $U$, $H_d$ and $V_d$ are $\mathbb{R}^{m\times l}$, $\mathbb{R}^{m\times l}$ and $\mathbb{R}^{n_d\times l}$ respectively. The most complex and time-consuming operation during the optimization process is the gradient estimation for the three variables, whose time complexity is $O(mn_d l)$. Assuming that the objective function converges after $t$ iterations, the time complexity for a single domain $d$ is $O(tmn_d l)$. The proposed LSCD algorithm trains the rating data of $D$ domains together, and the value of $D$ is relatively small compared with the other variables. Therefore, the time complexity of LSCD is $O(tmn_{max}l)$, where $n_{max} = \max(n_1, \ldots, n_D)$.

The objective function $L$ in Eq. (4) is nonconvex. By applying the analysis of the nonconvex case proposed in [66], we can obtain

$$\min_{1\le k\le t}\left\|U^k - Q^k S_{\mu\lambda_U}(\Sigma^k)(W^k)^T\right\|_F \le \frac{1}{\sqrt{t}}\left(\frac{2\left(L(U^0) - L^*\right)}{\frac{1}{2}L_U(F)}\right)^{\frac{1}{2}}, \tag{15}$$

$$\min_{1\le k\le t}\left\|H_d^k - S_{\mu\lambda_H}\!\left(H_d^k - \mu\nabla_{H_d}F^k\right)\right\|_F \le \frac{1}{\sqrt{t}}\left(\frac{2\left(L(H_d^0) - L^*\right)}{\frac{1}{2}L_{H_d}(F)}\right)^{\frac{1}{2}}, \tag{16}$$

for $1 \le t \le \infty$, where $L_U(F)$ and $L_{H_d}(F)$ are the Lipschitz constants of $\nabla_U F$ and $\nabla_{H_d} F$ respectively. Hence $\|U^k - Q^k S_{\mu\lambda_U}(\Sigma^k)(W^k)^T\|_F \to 0$ and $\|H_d^k - S_{\mu\lambda_H}(H_d^k - \mu\nabla_{H_d}F^k)\|_F \to 0$ as the number of iterations increases.
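The prediction step of Section 3.3 (Eq. (1) followed by the clipping of Eq. (14)) is a one-liner in NumPy; the sketch below is illustrative:

```python
import numpy as np

def predict(U, H_d, V_d, r_min, r_max):
    """Predicted ratings of Eq. (1), clipped into [R_min, R_max] as in Eq. (14)."""
    return np.clip((U + H_d) @ V_d.T, r_min, r_max)
```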
4. Experiments

In this section, extensive experiments are conducted on two real-world datasets to evaluate the effectiveness of the proposed LSCD algorithm. The dimensionality of the latent feature vector $l$, the ratio between $\lambda_H$ and $\lambda_U$ and the number of domains are adjusted to analyze their effects on our algorithm, and we compare the performance with nine state-of-the-art recommendation algorithms. The source code and datasets are available at https://github.com/sysulawliet/LSCD.

4.1. Dataset description

In our experiments, two real-world datasets with multiple item domains are used, namely Amazon and MovieLens.

• Amazon (http://jmcauley.ucsd.edu/data/amazon): This dataset is obtained from Julian McAuley [29]. It contains 6,643,669 users, 2,441,053 products and 80,737,555 ratings in 24 domains from Amazon, spanning June 1995 to March 2013. Each record is a (user, item, rating, timestamp) tuple; the time information is not used, and potential duplicates are removed. The scale of the dataset is quite large, and users with few rating records tend to rate random items, which reduces efficiency and effectiveness, so users and items having fewer than 5 ratings are removed. Four item domains are used in our experiments: Book, CD, Music and Movie.

• MovieLens (https://grouplens.org/datasets/hetrec-2011): This dataset is obtained from the Information Retrieval Group at Universidad Autónoma de Madrid. It contains 2113 users, 10,197 movies, and 855,598 ratings from MovieLens, spanning 1970 to 2009. Compared with the Amazon dataset, the rating matrix of MovieLens is denser, and each user has rated at least 20 movies. Users label movies with tags, and there are 16,528 tags in total; since many tags have similar meanings, the tags are merged into groups according to the meanings of the words. We use the tags of movies to classify the ratings into 18 domains, four of which are used in our experiments: comedy (COM), dramatic (DRA), action (ACT) and thrilling (THR).

Without loss of generality, each dataset is split randomly, with 80% as the training set and 20% as the testing set. Four domains are selected from each dataset, and the preferences of users are dependent across these domains. In order to show the effectiveness of the CDCF algorithms, we select the users who have rated at least one item in each domain, because if a user has rated items in only a single domain, every CDCF algorithm degenerates into a traditional single-domain recommendation algorithm and gives relatively poor performance. For clarity, Table 1 summarizes the two datasets, where the density is defined as #Ratings / (#Users × #Items).

Table 1. Summary of the two testing datasets used in our experiments.

Datasets    Domains  #Users  #Items  #Ratings  Density (%)
Amazon      Book     12,761  7346    85,400    0.09
            CD       12,761  2541    85,865    0.27
            Music    12,761  778     28,680    0.29
            Movie    12,761  8270    188,507   0.18
MovieLens   COM      2113    3029    332,038   5.19
            DRA      2113    3975    381,616   4.54
            ACT      2113    1277    241,211   8.94
            THR      2113    1460    226,975   7.36
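As an aside, the per-domain preprocessing described above (dropping users and items with fewer than 5 ratings, then an 80%/20% random split) might look as follows in pandas; the file and column names are hypothetical, and a single filtering pass is used for brevity:

```python
import pandas as pd

df = pd.read_csv("ratings_book.csv", names=["user", "item", "rating"])

# Drop users and items with fewer than 5 ratings (one pass, for brevity).
df = df[df.groupby("user")["user"].transform("size") >= 5]
df = df[df.groupby("item")["item"].transform("size") >= 5]

# Random 80%/20% train/test split.
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)
```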
[Fig. 2. Parameter analysis: The impact of the dimensionality of the latent feature vector l when using the Frobenius norm and the nuclear norm for the regularization term of U.]
4.2. Evaluation measures

In order to evaluate the quality of the recommendation algorithms, two widely used evaluation metrics, namely the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE), are used to measure the accuracy of the predicted ratings. They are defined as follows:

$$\mathrm{MAE} = \frac{1}{T}\sum_{i,j}|r_{ij} - p_{ij}|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{i,j}(r_{ij} - p_{ij})^2},$$

where $T$ denotes the number of tested ratings, and $r_{ij}$ and $p_{ij}$ represent the real rating and the predicted rating respectively. By definition, smaller MAE and RMSE values mean better prediction quality. RMSE is sensitive to large errors, while MAE is sensitive to the accumulation of small errors.
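Both metrics are straightforward to compute from a test indicator matrix; a minimal sketch under our own naming:

```python
import numpy as np

def mae_rmse(R_test, P, I_test):
    """MAE and RMSE over the T tested ratings marked by I_test."""
    T = I_test.sum()
    err = (R_test - P) * I_test
    return np.abs(err).sum() / T, np.sqrt((err ** 2).sum() / T)
```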
4.3. Parameter analysis

Large regularization coefficients reduce the fitting degree of the loss function, so we choose small values for the coefficients and test the effect of the ratio between $\lambda_H$ and $\lambda_U$ to better analyze the relation between the shared and domain-specific parts of the user latent features. We also analyze the effect of the number of domains and the influence of the norm used in the regularization term of $U$, i.e., the Frobenius norm versus the nuclear norm. Firstly, we vary the dimensionality of the latent feature vector $l$ from 10 to 100 with a step size of 10 to find a proper value and to see the difference between the two norms. Secondly, we fix $\lambda_U$ to 0.1 and vary $\lambda_H$ from 0 to 0.15 with a step size of 0.01, so the ratio $\lambda_H/\lambda_U$ varies from 0 to 1.5 with a step size of 0.1. The numbers of domains $D$ are 2, 3 and 4 respectively. The results are shown in Figs. 2 and 3.

As shown in Fig. 2, as the dimensionality of the latent feature vector increases, the prediction error on the two datasets first decreases and eventually becomes stable. In the preliminary stage, the low-dimensional vectors are not enough to describe users and items, so the performance improves as the dimensionality increases. Once the dimensionality reaches a value that describes users and items adequately, further increases no longer improve the performance, because the model already has the capacity to give the best prediction. We can also see that the Frobenius norm is always inferior to the nuclear norm and needs more features to achieve its best result, because it cannot find the common features among domains and treats every feature equally. The nuclear norm achieves a good result even when the number of features is small, since it gives priority to the most important features. So we choose the dimensionality $l = 50$ for the nuclear norm in the following experiments, which is sufficient to describe users and items.

As shown in Fig. 3, the performance of the proposed LSCD algorithm on the two datasets generally improves as the number of domains increases. The reason is that more data are available to learn the features of users, so the shared and domain-specific parts can be better isolated from the user latent feature matrix. Besides, the user latent features can be learned more accurately from more data, which also leads to better learning of the item features. The ratio $\lambda_H/\lambda_U$ has little effect over two domains on the MovieLens dataset and improves the performance only slightly on the Amazon dataset when the ratio is larger than 0.8, because there is not enough data to distinguish the two components of the user features well. However, the effect of $\lambda_H/\lambda_U$ is more obvious when the number of domains increases to three or four: the performance first increases and then decreases as $\lambda_H/\lambda_U$ grows. When $D = 3$, the optimal $\lambda_H/\lambda_U$ is approximately 0.9 and 0.5 on the Amazon and MovieLens datasets respectively, while with four domains the algorithm achieves the best results on both datasets when $\lambda_H/\lambda_U$ is around 1. Based on the above analysis, one conclusion can be drawn: the optimal ratio between $\lambda_H$ and $\lambda_U$ increases with the number of domains $D$. This means that the domain-specific part of the user latent features becomes more important as the number of domains increases, because the differences between domains become more apparent when more domains are compared with each other.
[Fig. 3. Parameter analysis: The impact of λH/λU.]
4.4. Comparison experiments

We compare the predicted ratings of the proposed LSCD algorithm with the following nine state-of-the-art recommendation algorithms.

1. User-based Neighborhood Cross-Domain Collaborative Filtering model (N-CDCF-U) [37].
2. Item-based Neighborhood Cross-Domain Collaborative Filtering model (N-CDCF-I) [37].
3. Matrix Factorization based Cross-Domain Collaborative Filtering model (MF-CDCF) [37].
4. Cross-Domain Triadic Factorization model (CDTF) [37].
5. Collective Matrix Factorization (CMF) [50].
6. Cross-domain recommendation based on the Cluster-level Latent Factor Model (CLMF) [38].
7. TrAnsfer Learning for MUltiple Domains (TALMUD) [36].
8. Cross-Domain recommendation by sharing Latent vector Distributions (CDLD) [53].
9. Probabilistic Cluster-level Latent Factor model (PCLF) [52].

For the matrix factorization methods, namely MF-CDCF, CDTF, CMF, CLMF, TALMUD, CDLD, PCLF and our proposed LSCD algorithm, we set the dimensionality of the latent feature vector to $l = 50$. For the proposed LSCD algorithm, we take the best ratio $\lambda_H/\lambda_U$ according to the results shown in Fig. 3. Besides, the step size $\mu$ and the item regularization coefficient $\lambda_V$ are set to 0.001 and 0.1 respectively. To be fair, all the regularization coefficients of the compared algorithms are set to 0.1. We initialize all the latent feature matrices randomly.

4.4.1. Comparison over two domains

There are 11 combinations of the 4 domains, and plotting all the results in one figure would be cluttered, so without loss of generality we plot some typical combinations. The comparison results in terms of MAE and RMSE over two domains on the two datasets are reported in Fig. 4, where six different pairs of domains are selected from each dataset. The results show that the proposed LSCD algorithm outperforms the other state-of-the-art CDCF algorithms in all combinations of two domains. Generally speaking, the other five matrix factorization methods are inferior to the two neighborhood-based collaborative filtering methods on the MovieLens dataset but better than them on the Amazon dataset. The reason is that, according to Table 1, the Amazon dataset is much sparser than the MovieLens dataset, and matrix factorization methods work well when the data is sparse because they predict ratings according to latent features rather than the original data, whereas the neighborhood-based collaborative filtering methods have an advantage when the rating matrix is dense since they can compute a more precise user or item similarity matrix. The proposed LSCD algorithm, which is also based on matrix factorization, works well on both datasets; one reason may be that the low-rank structure of $U$ enables it to capture the shared feature subspace of users from few entries. On the other hand, the results of different combinations of two domains are quite different. For example, the performance on the (CD + Music) domains is better than on the (Book + Music) domains for the Amazon dataset, and the performance on the (DRA + ACT) domains is better than on the (COM + ACT) domains for the MovieLens dataset. The strength of domain correlation can be roughly measured by the nuclear norm of the $U$ obtained from the combination of the corresponding domains. From this perspective, Music is more correlated
[Fig. 4. Performance comparisons of different combinations of two domains.]
to CD than to Book: the nuclear norm of $U$ for Music and CD is $1.25 \times 10^4$, which is larger than that for Music and Book (i.e., $7.30 \times 10^3$). Likewise, action movies are more correlated with dramatic movies than with comedy movies: the nuclear norm of $U$ for action and dramatic movies is $2.15 \times 10^3$, larger than that for action and comedy movies (i.e., $1.07 \times 10^3$). So the performance on correlated domains is better for the eight compared recommendation algorithms. The proposed LSCD algorithm, however, obtains good results even when the two domains are uncorrelated, since the sparse matrices $H_d$ characterize the domain-specific features of users and better differentiate the domains.
4.4.2. Comparison over multiple domains

The comparison results in terms of MAE and RMSE over multiple domains on the two datasets are reported in Fig. 5. In most cases, the performance improves as the number of domains $D$ increases, but the proposed LSCD algorithm achieves a larger performance gain and significantly outperforms the other state-of-the-art recommendation algorithms. Similar to the two-domain setting, the matrix factorization methods work better on the sparser dataset, as shown in Fig. 5 and Table 1. Besides, the other five matrix factorization methods work well only when the number of domains is large, and their performance is poor when there are
[Fig. 5. Performance comparisons of multiple domains.]
just two or three domains. The proposed LSCD algorithm obtains good results even with only two domains, because the low-rank and sparse structures are more suitable for modeling the features of users. What is more, negative knowledge transfer happens in some cases: for example, when the Movie domain is added for CLMF on the Amazon dataset, and when the action movie domain is added for CMF on the MovieLens dataset, their performance is reduced rather than improved, because the patterns of knowledge transfer cannot be captured well. On the contrary, the proposed LSCD algorithm divides the features of users into shared and domain-specific parts. If the domains are correlated, knowledge can be transferred through the shared part; even if the domains are uncorrelated, the performance of each domain can still be improved by the separated domain-
specific part. This feature model thus ensures that the proposed LSCD algorithm does not transfer negative knowledge and works well in both correlated and uncorrelated domains.

5. Conclusion and future work

In this paper, we have proposed a novel cross-domain collaborative filtering recommendation algorithm termed LSCD, which can better model the latent features of users to improve the quality of rating prediction. In particular, it is able to adaptively transfer knowledge among correlated domains and differentiate features among uncorrelated domains. Based on matrix factorization, we assume the user latent features are divided into shared
and domain-specific parts. We use a low-rank matrix to capture the shared feature subspace of users and a sparse matrix to identify the discriminative features in each domain. The objective function is optimized by the approximate projected gradient method, and theoretical analysis has shown the complexity and convergence of the proposed algorithm. Extensive experiments on two real-world datasets have confirmed that the proposed algorithm transfers knowledge among domains more effectively and significantly outperforms state-of-the-art recommendation algorithms. As future work, it would be interesting to apply the proposed LSCD model to other recommendation tasks such as top-n item recommendation [67–69] and sequential next-item recommendation [56], and to investigate whether the proposed LSCD model works in optional course score prediction [54].

Declaration of Competing Interest

The authors declare that they have no financial or non-financial interests in any organization or entity with an interest in the subject matter or materials discussed in this manuscript.

Acknowledgment

This project was supported by NSFC (61876193, 61672548, U1611461), Guangdong Natural Science Funds for Distinguished Young Scholar (2016A030306014), and the Tip-top Scientific and Technical Innovative Youth Talents of Guangdong special support program (2016TQ03X542).

References

[1] B. Zheng, H. Su, K. Zheng, X. Zhou, Landmark-based route recommendation with crowd intelligence, Data Sci. Eng. 1 (2) (2016) 86–100. [2] Z.-L. Zhao, C.-D. Wang, Y.-Y. Wan, J.-H. Lai, Recommendation in feature space sphere, Electron. Commer. Res. Appl. 26 (2017) 109–118. [3] S. Deng, D. Wang, Y. Li, B. Cao, J. Yin, Z. Wu, M. Zhou, A recommendation system to facilitate business process modeling, IEEE Trans. Cybern. 47 (6) (2017) 1380–1394. [4] Q.-Y. Hu, Z.-L. Zhao, C.-D. Wang, J.-H. Lai, An item orientated recommendation algorithm from the multi-view perspective, Neurocomputing 269 (2017) 261–272. [5] C.-D. Wang, Z.-H. Deng, J.-H. Lai, P.S. Yu, Serendipitous recommendation in e-commerce using innovator-based collaborative filtering, IEEE Trans. Cybern. 49 (7) (2019) 2678–2692. [6] H.-H. Chen, Behavior2vec: generating distributed representations of users' behaviors on products for recommender systems, Trans. Knowl. Discov. Data 12 (4) (2018) 43:1–43:20. [7] L. Li, W. Peng, S. Kataria, T. Sun, T. Li, Recommending users and communities in social media, Trans. Knowl. Discov. Data 10 (2) (2015) 17:1–17:27. [8] B. Liu, Y. Wu, N.Z. Gong, J. Wu, H. Xiong, M.
Ester, Structural analysis of user choices for mobile app recommendation, Trans. Knowl. Discov. Data 11 (2) (2016) 17:1–17:23. [9] Q.-Y. Hu, L. Huang, C.-D. Wang, H.-Y. Chao, Item orientated recommendation by multi-view intact space learning with overlapping, Knowl. Based Syst. 164 (2019) 358–370.
[10] K. Wang, W. Meng, S. Li, S. Yang, Multi-modal mention topic model for mentionee recommendation, Neurocomputing 325 (2019) 190–199. [11] F. Nie, H. Wang, X. Cai, H. Huang, C.H.Q. Ding, Robust matrix completion via joint schatten p-norm and lp-norm minimization, in: 12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium, December 10–13, 2012, 2012, pp. 566–574. [12] S. Pyo, E. Kim, M. Kim, LDA-based unified topic modeling for similar TV user grouping and TV program recommendation, IEEE Trans. Cybern. 45 (8) (2015) 1476–1490. [13] S. Feng, J. Cao, J. Wang, S. Qian, Recommendations based on comprehensively exploiting the latent factors hidden in items’ ratings and content, Trans. Knowl. Discov. Data 11 (3) (2017) 35:1–35:27. [14] D. Jannach, M. Zanker, A. Felfering, G. Friedrich, Recommender Systems: An Introduction, Addison Wesley Publishing, 2013. [15] M. Mao, J. Lu, G. Zhang, J. Zhang, Multirelational social recommendations via multigraph ranking, IEEE Trans. Cybern. 47 (12) (2017) 4049–4061. [16] Y. Wang, J. Deng, J. Gao, P. Zhang, A hybrid user similarity model for collaborative filtering, Inf. Sci. 418 (2017) 102–118. [17] K. Allab, L. Labiod, M. Nadif, A semi-NMF-PCA unified framework for data clustering, IEEE Trans. Knowl. Data Eng. 29 (1) (2017) 2–16. [18] B. Qian, X. Shen, Z. Tang, T. Zhang, Deep convex NMF for image clustering, in: 11th Chinese Conference on Biometric Recognition, 2016, pp. 583–590. [19] R. Salakhutdinov, A. Mnih, Probabilistic matrix factorization, in: Advances in Neural Information Processing Systems 20, 2007, pp. 1257–1264. [20] V. Kumar, A.K. Pujari, S.K. Sahu, V.R. Kagita, V. Padmanabhan, Collaborative filtering using multiple binary maximum margin matrix factorizations, Inf. Sci. 380 (2017) 1–11. [21] F. Ortega, A. Hernando, J. Bobadilla, J.-H. Kang, Recommending items to group of users using matrix factorization based collaborative filtering, Inf. Sci. 345 (2016) 313–324. [22] G. Guo, J. Zhang, D. Thalmann, Merging trust in collaborative filtering to alleviate data sparsity and cold start, Knowl.-Based Syst. 57 (2014) 57– 68. [23] B. Tan, Y. Song, E. Zhong, Q. Yang, Transitive transfer learning, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1155–1164. [24] P. Hao, G. Zhang, L. Martinez, J. Lu, Regularizing knowledge transfer in recommendation with tag-inferred correlation, IEEE Trans. Cybern. 49 (1) (2019) 83–96. [25] H. Ma, D. Zhou, C. Liu, M.R. Lyu, I. King, Recommender systems with social regularization, in: Proceedings of the Forth International Conference on Web Search and Web Data Mining, 2011, pp. 287–296. [26] Z. Yu, M. Tian, Z. Wang, B. Guo, T. Mei, Shop-type recommendation leveraging the data from social media and location-based services, Trans. Knowl. Discov. Data 11 (1) (2016) 1:1–1:21. [27] X. Xin, Z. Liu, C.-Y. Lin, H. Huang, X. Wei, P. Guo, Cross-domain collaborative filtering with review text, in: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015, pp. 1827–1834. [28] W.X. Zhao, J. Wang, Y. He, J.-R. Wen, E.Y. Chang, X. Li, Mining product adopter information from online reviews for improving product recommendation, Trans. Knowl. Discov. Data 10 (3) (2016) 29:1–29:23. [29] J.J. McAuley, C. Targett, Q. Shi, A. van den Hengel, Image-based recommendations on styles and substitutes, in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016, pp. 
43–52. [30] L. Hong, L. Zou, C. Zeng, L. Zhang, J. Wang, J. Tian, Context-aware recommendation using role-based trust network, Trans. Knowl. Discov. Data 10 (2) (2015) 13:1–13:25. [31] Y. Gong, Q. Zhang, X. Huang, Hashtag recommendation for multimodal microblog posts, Neurocomputing 272 (2018) 170–177. [32] S. Tan, J. Bu, X. Qin, C. Chen, D. Cai, Cross domain recommendation based on multi-type media fusion, Neurocomputing 127 (2014) 124–134. [33] P. Hao, G. Zhang, J. Lu, Enhancing cross domain recommendation with domain dependent tags, in: 2016 IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2016, Vancouver, BC, Canada, July 24–29, 2016, 2016, pp. 1266–1273. [34] B. Wang, M. Ester, Y. Liao, J. Bu, Y. Zhu, Z. Guan, D. Cai, The million domain challenge: broadcast email prioritization by cross-domain recommendation, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1895–1904. [35] T. Song, Z. Peng, S. Wang, W. Fu, X. Hong, P.S. Yu, Review-based cross– domain recommendation through joint tensor factorization, in: 22nd International Conference on Database Systems for Advanced Applications, 2017, pp. 525–540. [36] O. Moreno, B. Shapira, L. Rokach, G. Shani, TALMUD: transfer learning for multiple domains, in: 21st ACM International Conference on Information and Knowledge Management, 2012, pp. 425–434. [37] L. Hu, J. Cao, G. Xu, L. Cao, Z. Gu, C. Zhu, Personalized recommendation via cross-domain triadic factorization, in: 22nd International World Wide Web Conference, 2013, pp. 595–606. [38] S. Gao, H. Luo, D. Chen, S. Li, P. Gallinari, J. Guo, Cross-domain recommendation via cluster-level latent factor model, in: European Conference on Machine Learning and Knowledge Discovery in Databases, 2013, pp. 161– 176. [39] Y.-F. Liu, C.-Y. Hsu, S.-H. Wu, Non-linear cross-domain collaborative filtering via hyper-structure transfer, in: Proceedings of the 32nd International Conference on Machine Learning, 2015, pp. 1190–1198.
[40] Z.-L. Zhao, L. Huang, C.-D. Wang, D. Huang, Low-rank and sparse cross-domain recommendation algorithm, in: 23rd International Conference on Database Systems for Advanced Applications, 2018, pp. 150–157. [41] S. Ruslan, M. Andriy, Bayesian probabilistic matrix factorization using Markov chain monte carlo, in: Proceedings of the Twenty-Fifth International Conference on Machine Learning, 2008, pp. 880–887. [42] J.D.M. Rennie, N. Srebro, Fast maximum margin matrix factorization for collaborative prediction, in: Proceedings of the Twenty-Second International Conference on Machine Learning, 2005, pp. 713–719. [43] P. Gopalan, F.J.R. Ruiz, R. Ranganath, D.M. Blei, Bayesian nonparametric poisson factorization for recommendation systems, in: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, 2014, pp. 275–283. [44] S. Wang, J. Tang, Y. Wang, H. Liu, Exploring implicit hierarchical structures for recommender systems, in: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015, pp. 1813–1819. [45] Q. Zhang, D. Wu, J. Lu, F. Liu, G. Zhang, A cross-domain recommender system with consistent information transfer, Decis. Support Syst. 104 (2017) 49–63. [46] J. Tang, S. Wu, J. Sun, H. Su, Cross-domain collaboration recommendation, in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 1285–1293. [47] W. Chen, W. Hsu, M.-L. Lee, Making recommendations from multiple domains, in: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013, pp. 892–900. [48] M. Jiang, P. Cui, X. Chen, F. Wang, W. Zhu, S. Yang, Social recommendation with cross-domain transferable knowledge, IEEE Trans. Knowl. Data Eng. 27 (11) (2015) 3084–3097. [49] J. Chen, J. Zhou, J. Ye, Integrating low-rank and group-sparse structures for robust multi-task learning, in: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, pp. 42–50. [50] A.P. Singh, G.J. Gordon, Relational learning via collective matrix factorization, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 650–658. [51] B. Li, X. Zhu, R. Li, C. Zhang, Rating knowledge sharing in cross-domain collaborative filtering, IEEE Trans. Cybern. 45 (5) (2015) 1054–1068. [52] S. Ren, S. Gao, J. Liao, J. Guo, Improving cross-domain recommendation through probabilistic cluster-level latent factor model, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, pp. 4200–4201. [53] T. Iwata, K. Takeuchi, Cross-domain recommendation without shared users or items by sharing latent vector distributions, in: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015, pp. 379–387. [54] L. Huang, C.-D. Wang, H.-Y. Chao, J.-H. Lai, P.S. Yu, A score prediction approach for optional course recommendation via cross-user-domain collaborative filtering, IEEE Access 7 (2019) 19550–19563. [55] Z.-H. Deng, L. Huang, C.-D. Wang, J.-H. Lai, P.S. Yu, DeepCF: a unified framework of representation learning and matching function learning in recommender system, in: Proceedings of the 33rd AAAI Conf. on Artificial Intelligence, 2019. [56] H. Fang, G. Guo, D. Zhang, Y. 
Shu, Deep learning-based sequential recommender systems: concepts, algorithms, and evaluations, in: Proceedings of the 19th International Conference on Web Engineering, ICWE 2019, Daejeon, South Korea, June 11–14, 2019, pp. 574–577. [57] A.M. Elkahky, Y. Song, X. He, A multi-view deep learning approach for cross domain user modeling in recommendation systems, in: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18–22, 2015, 2015, pp. 278–288. [58] H. Kanagawa, H. Kobayashi, N. Shimizu, Y. Tagami, T. Suzuki, Cross-domain recommendation via deep domain adaptation, in: Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part II, 2019, pp. 20–29. [59] W.-D. Xi, L. Huang, C.-D. Wang, Y.-Y. Zheng, J.-H. Lai, BPAM: recommendation based on bp neural network with attention mechanism, in: Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019. [60] S.A. Bhaskar, Probabilistic low-rank matrix recovery from quantized measurements: application to image denoising, in: 49th Asilomar Conference on Signals, Systems and Computers, 2015, pp. 541–545. [61] F. Nie, H. Huang, C.H.Q. Ding, Low-rank matrix recovery via efficient schatten p-norm minimization, in: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, July 22–26, 2012, Toronto, Ontario, Canada, 2012. [62] Z. Kang, H. Pan, S.C.H. Hoi, Z. Xu, Robust graph learning from noisy data, IEEE Trans. Cybern. (2019), in press. [63] Z. Kang, L. Wen, W. Chen, Z. Xu, Low-rank kernel learning for graph-based clustering, Knowl.-Based Syst. 163 (2019) 510–517. [64] E.J. Candes, X. Li, Y. Ma, J. Wright, Robust principal component analysis, J. ACM 58 (3) (2011) 1–39. [65] N. Parikh, S.P. Boyd, Proximal algorithms, Found. Trends Optim. 1 (3) (2014) 127–239. [66] D. Palomar, Y.E. Eds, Convex Optimization in Signal Processing and Communications, Cambridge University Press, 2009.
[67] Z. Kang, C. Peng, Q. Cheng, Top-n recommender system via matrix completion, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12–17, 2016, Phoenix, Arizona, USA., 2016, pp. 179–185. [68] G. Guo, J. Zhang, F. Zhu, X. Wang, Factored similarity models with social trust for top-n item recommendation, Knowl.-Based Syst. 122 (2017) 17–25. [69] Z. Kang, C. Peng, Q. Cheng, Kernel-driven similarity learning, Neurocomputing 267 (2017) 210–219. Ling Huang received her undergraduate and master degree in 2009 and 2013 respectively from South China University of Technology. She is currently working toward the Ph.D. degree at Sun Yat-sen University. She has published several papers in international journals and conferences such as Pattern Recognition, Information Sciences, Neurocomputing, Knowledge-Based Systems, KDD, AAAI, IJCAI, ICDM, BIBM and DASFAA. Her research interest is data mining.
Zhi-Lin Zhao received his undergraduate and master degree in 2016 and 2018 respectively from Sun Yat-sen University, Guangzhou, China. He is currently working toward the Ph.D. degree at University of Technology Sydney. He has published several papers in international conferences such as CIKM and DASFAA. His research interest is machine learning.
Chang-Dong Wang received the Ph.D. degree in computer science in 2013 from Sun Yat-sen University, Guangzhou, China. He was a visiting student at the University of Illinois at Chicago from Jan. 2012 to Nov. 2012. He joined Sun Yat-sen University in 2013 as an assistant professor with the School of Mobile Information Engineering and is currently an associate professor with the School of Data and Computer Science. His current research interests include machine learning and data mining. He has published over 100 scientific papers in international journals and conferences such as IEEE TPAMI, IEEE TKDE, IEEE TCYB, IEEE TSMC-C, Pattern Recognition, Neurocomputing, KDD, AAAI, IJCAI, ICDM, CIKM, SDM, BIBM and DASFAA. His ICDM 2010 paper won the Honorable Mention for Best Research Paper Awards. He won the 2012 Microsoft Research Fellowship Nomination Award and was awarded the 2015 Chinese Association for Artificial Intelligence (CAAI) Outstanding Dissertation. Dong Huang received the B.S. degree in computer science from South China University of Technology, Guangzhou, China, in 2009, and the M.Sc. and Ph.D. degrees in computer science from Sun Yat-sen University, Guangzhou, in 2011 and 2015, respectively. He joined South China Agricultural University in 2015, where he is currently an Associate Professor with the College of Mathematics and Informatics. From July 2017 to July 2018, he was a visiting fellow with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. His research interests are in the areas of data mining and machine learning, and more specifically focus on ensemble clustering, multi-view clustering, and large-scale clustering. Hong-Yang Chao received the B.S. and Ph.D. degrees in computational mathematics from Sun Yat-sen University, Guangzhou, China. In 1988, she joined the Department of Computer Science, Sun Yat-sen University, where she was initially an Assistant Professor and later became an Associate Professor. She is currently a Full Professor with the School of Data and Computer Science. She has published extensively in the area of image/video processing and holds three U.S. patents and four Chinese patents in the related area. Her current research interests include image and video processing, image and video compression, massive multimedia data analysis, and content-based image (video) retrieval.