Accepted Manuscript

Mixed Similarity Learning for Recommendation with Implicit Feedback

Mengsi Liu, Weike Pan, Miao Liu, Yaofeng Chen, Xiaogang Peng, Zhong Ming

PII: S0950-7051(16)30504-4
DOI: 10.1016/j.knosys.2016.12.010
Reference: KNOSYS 3761

To appear in: Knowledge-Based Systems

Received date: 13 June 2016
Revised date: 8 December 2016
Accepted date: 9 December 2016

Please cite this article as: Mengsi Liu, Weike Pan, Miao Liu, Yaofeng Chen, Xiaogang Peng, Zhong Ming, Mixed Similarity Learning for Recommendation with Implicit Feedback, Knowledge-Based Systems (2016), doi: 10.1016/j.knosys.2016.12.010

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights

• We study an important recommendation problem with implicit feedback from the perspective of item similarity.

• We exploit the complementarity of the predefined similarity and the learned similarity via a novel mixed similarity model.

• We develop a novel recommendation algorithm, i.e., pairwise factored mixed similarity model (P-FMSM), based on the mixed similarity and pairwise preference assumption.

• We showcase the effectiveness of P-FMSM as compared with several state-of-the-art methods on four public datasets.
Mixed Similarity Learning for Recommendation with Implicit Feedback
Mengsi Liu, Weike Pan#, Miao Liu, Yaofeng Chen, Xiaogang Peng∗, Zhong Ming∗
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

College of Computer Science and Software Engineering, Shenzhen University
Abstract
Implicit feedback such as users' examination behaviors has been recognized as a very important source of information in most recommendation scenarios. For recommendation with implicit feedback, a good similarity measurement and a proper preference assumption are critical for the quality of personalization services. So far, the similarities used in state-of-the-art recommendation methods include the predefined similarity and the learned similarity, and the preference assumptions include the well-known pairwise assumption.

In this paper, we exploit the complementarity of the predefined similarity and the learned similarity via a novel mixed similarity model. Furthermore, we develop a novel recommendation algorithm, i.e., the pairwise factored mixed similarity model (P-FMSM), based on the mixed similarity and the pairwise preference assumption. Our P-FMSM is able to (i) capture the locality of the user-item interactions via the symmetric predefined similarity, (ii) model the global correlations among items via the asymmetric learned similarity, and (iii) digest the uncertain implicit feedback via the pairwise preference assumption. Empirical studies on four public datasets show that our P-FMSM can recommend significantly more accurately than several state-of-the-art methods.

#: Co-first author. The first two authors contributed equally.
∗: Xiaogang Peng and Zhong Ming are corresponding authors.

Preprint submitted to Knowledge-Based Systems, December 13, 2016
Keywords: Mixed Similarity, Implicit Feedback, Recommender Systems

1. Introduction
The task of recommendation with implicit feedback [6, 27] aims to provide personalization services by learning users' preferences from their examination behaviors. In most scenarios, it has recently been recognized as a more important problem than the well-studied numerical rating prediction problem of the Netflix $1 Million Prize [15]. It has also attracted attention from industry practitioners because of the pervasiveness of the problem setting in numerous recommendation systems, such as users' browsing behaviors in e-commerce, watching patterns on video streaming sites, and check-in records in location-based mobile social networks.
For recommendation with implicit feedback, there are at least three categories of approaches. The first one adopts some predefined similarities such as the Cosine similarity and the Jaccard index, based on which a neighborhood of items is constructed and a prediction can then be made [5]. The second one chooses to learn the similarity instead of using some predefined similarities [14]. The third one first proposes a proper preference assumption and then designs a corresponding loss or objective function to be optimized for user behavior reasoning, among which the pairwise assumption [27] usually performs well regarding accuracy and efficiency. However, most recommendation methods focus on one specific aspect of those
three categories of approaches, such as a predefined similarity, a learned similarity or some preference assumption. Each aspect has its own advantage, e.g., the ability to capture local and global relations via the predefined similarity and the learned similarity, respectively. As far as we know, no published works aim to integrate those three aspects into one single algorithm in a principled way.
In this paper, we take each aspect as a reusable component and compile them into one single algorithm as a unified whole. Specifically, we first design a mixed similarity model that combines the predefined similarity and the learned similarity. We then embed the prediction rule based on the mixed similarity into a typical loss function of the well-known pairwise preference assumption. Finally, we obtain our algorithm, called pairwise factored mixed similarity model (P-FMSM).

We conduct extensive empirical studies with several closely related state-of-the-art methods on four public real-world datasets, and find that our proposed solution, i.e., P-FMSM, performs significantly better than methods with only one of the aforementioned three aspects or components. Furthermore, our mixed similarity learning model is significantly better than the corresponding unmixed similarity model in the same pairwise preference learning framework. These observations clearly show the effectiveness of the designed mixed similarity model and the developed recommendation algorithm.

We organize the rest of the paper as follows. We first give some background about the problem definition and some existing approaches for the studied problem in Section 2, and then describe our proposed solution in detail in Section 3. We conduct extensive empirical studies in Section 4, and discuss and summarize some closely related work in Section 5. Finally, we conclude our work in Section 6.
2. Background

2.1. Problem Definition

In the studied problem of recommendation with implicit feedback, we have n users, m items and some associated examination records in the form of (user, item) pairs denoted as R. Our goal is to learn users' preferences from such pairs and provide a personalized recommendation list for each user. We put some heavily used notations and their explanations in Table 1.

2.2. Recommendation with Predefined Similarity
Most memory-based or neighborhood-based recommendation methods adopt some predefined similarity measurement, such as the well-known Cosine similarity between item i and item i',

s^{(p)}_{ii'} = \frac{|\mathcal{U}_i \cap \mathcal{U}_{i'}|}{\sqrt{|\mathcal{U}_i| |\mathcal{U}_{i'}|}},   (1)

where |U_i|, |U_{i'}| and |U_i ∩ U_{i'}| denote the number of users that examined item i, item i', and both item i and item i', respectively. Once the similarity of each item pair has been calculated, we can obtain a set of most similar items to a target item i, denoted as N_i. Finally, we can make a prediction for the preference of user u on item i as follows [5],

\hat{r}^{(p)}_{ui} = \sum_{i' \in \mathcal{I}_u \cap \mathcal{N}_i} s^{(p)}_{ii'},   (2)

where I_u ∩ N_i denotes the intersection of the items examined by user u, i.e., I_u, and the nearest neighbors of item i, i.e., N_i. The prediction rule in Eq.(2) represents the overall similarity between the target item i and the most similar items examined by user u, i.e., I_u ∩ N_i. With the predicted
Table 1: Some notations and explanations.

U                          user set, u ∈ U, |U| = n
I                          item set, i, i', j ∈ I, |I| = m
U_i                        users that examined item i
I_u                        items examined by user u
N_i                        nearest neighbors of item i
R = {(u, i)}               examination records
s^{(p)}_{ii'}              predefined similarity between i and i'
s^{(ℓ)}_{ii'}              learned similarity between i and i'
s^{(m)}_{ii'}              mixed similarity between i and i'
U_{u·} ∈ R^{1×d}           user-specific latent feature vector
V_{i·}, W_{i'·} ∈ R^{1×d}  item-specific latent feature vectors
b_i                        item bias
r̂^{(p)}_{ui}, r̂^{(ℓ)}_{ui}, r̂^{(m)}_{ui}   predicted preference
T                          iteration number
λ_s, α                     tradeoff parameters
preference on each (user, item) pair, we can rank the items and recommend a few items with the highest preference scores. Note that the preference prediction rule in Eq.(2) is usually called item-oriented recommendation, parallel to user-oriented recommendation based on a predefined similarity between every two users.

2.3. Recommendation with Learned Similarity
The predefined similarity offers good interpretability, easy defect tracking, and convenient system operation and maintenance. However, its main drawback is the lack of transitivity, i.e., two items will not be connected if there is no common user (U_i ∩ U_{i'} = ∅). As a result, such an unconnected item i' will not contribute to the preference prediction, as it will not be included in the item set N_i in Eq.(2). In order to address this limitation, a learned similarity is proposed [14],

s^{(\ell)}_{ii'} = V_{i\cdot} W_{i'\cdot}^T,   (3)

where the similarity between item i and item i' is replaced by the inner product of two latent feature vectors, i.e., V_{i·} ∈ R^{1×d} for item i and W_{i'·} ∈ R^{1×d} for item i'. The inner product of two learned latent feature vectors is unlikely to be zero. Such a non-zero similarity means that two items can always be connected, which overcomes the aforementioned intransitivity limitation of the predefined similarity. With the learned similarity s^{(ℓ)}_{ii'}, a similar prediction rule can then be derived [14],

\hat{r}^{(\ell)}_{ui} = b_i + \frac{1}{\sqrt{|\mathcal{I}_u \setminus \{i\}|}} \sum_{i' \in \mathcal{I}_u \setminus \{i\}} s^{(\ell)}_{ii'},   (4)

where b_i is the newly added item bias denoting the learned popularity of item i, |I_u\{i}| is the size of the item set I_u excluding item i, and 1/\sqrt{|I_u \setminus \{i\}|} is a normalization for comparability among predictions from different users. For a ranking-oriented recommendation model, we usually do not include the user bias because it will not affect the ranking order for a typical user. Note that a recommendation method with learned similarity is usually not a staged method like item-oriented recommendation, with its steps of similarity calculation and neighborhood construction. Hence, we do not include the item set N_i in the prediction rule in Eq.(4), but use I_u\{i} instead for simplicity [14].
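As a concrete, hedged sketch, the two prediction rules in Eq.(2) and Eq.(4) can be written in Python as follows. The toy interaction data and the latent dimension d = 4 are illustrative assumptions of ours, not values from the paper; the neighborhood size k = 20 mirrors the ICF setting used later in the experiments.

```python
import numpy as np

def cosine_sim(Ui, Uj):
    """Predefined Cosine similarity of Eq.(1), from the user sets of two items."""
    if not Ui or not Uj:
        return 0.0
    return len(Ui & Uj) / np.sqrt(len(Ui) * len(Uj))

def predict_predefined(u_items, i, user_sets, k=20):
    """Item-oriented prediction of Eq.(2): sum similarities over I_u ∩ N_i."""
    sims = {j: cosine_sim(user_sets[i], user_sets[j]) for j in user_sets if j != i}
    Ni = set(sorted(sims, key=sims.get, reverse=True)[:k])  # neighborhood of item i
    return sum(sims[j] for j in u_items & Ni)

def predict_learned(u_items, i, V, W, b):
    """FISM-style prediction of Eq.(4) with the learned similarity V_i W_{i'}^T."""
    others = [j for j in u_items if j != i]
    if not others:
        return b[i]
    return b[i] + sum(V[i] @ W[j] for j in others) / np.sqrt(len(others))

# toy data: item -> set of users who examined it (purely illustrative)
user_sets = {0: {0, 1}, 1: {1, 2}, 2: {0, 2, 3}}
u_items = {0, 2}                      # items examined by the target user
rng = np.random.default_rng(0)
V, W = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
b = np.zeros(3)
r_pre = predict_predefined(u_items, 1, user_sets)
r_lrn = predict_learned(u_items, 1, V, W, b)
```

Note how the learned rule contributes a (generally non-zero) term for every examined item, while the predefined rule only sums over the intersection with the neighborhood, which is the intransitivity point made above.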
2.4. Recommendation with Pairwise Preference Assumption

For the studied problem of recommendation with implicit feedback, besides the prediction rule based on the predefined or learned similarity, another important issue is the preference assumption. A proper preference assumption is usually proposed to digest the uncertain implicit feedback from different perspectives. There are typically three types of preference assumptions: the pointwise preference assumption, the pairwise preference assumption and the listwise preference assumption. Empirically, the pairwise preference assumption [27] usually performs more accurately than the pointwise assumption and more efficiently than the listwise assumption, and is thus a good balance between accuracy and efficiency.

In pairwise preference learning [27], we usually assume that a user prefers an examined item to an unexamined one. Specifically, there are three different instantiations of this assumption, including (i) r̂_ui > r̂_uj, meaning the relative preference of user u on item i over item j [27], (ii) r̂_ui − r̂_uj = 1, meaning the preference difference is a positive constant [11, 14, 31], and (iii) an extension with user groups [24].
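The standard way of instantiating such assumptions is to train on triples (u, i, j) with i examined and j unexamined by the same user. A minimal sketch (the toy feedback set is our own illustration):

```python
import random

def sample_triple(R, n_items, rng=random):
    """Sample a pairwise training triple (u, i, j): (u, i) is an observed pair
    and j is drawn uniformly from the items user u has not examined."""
    u, i = rng.choice(R)                        # observed (user, item) pair
    Iu = {item for (uu, item) in R if uu == u}  # items examined by u
    j = rng.randrange(n_items)
    while j in Iu:                              # rejection sampling of j ∉ I_u
        j = rng.randrange(n_items)
    return u, i, j

R = [(0, 1), (0, 3), (1, 2)]                    # toy implicit feedback (illustrative)
u, i, j = sample_triple(R, n_items=5)
assert (u, i) in R and j not in {item for (uu, item) in R if uu == u}
```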
3. Our Solution

3.1. Mixed Similarity Model

The predefined similarity s^{(p)}_{ii'} is known to be good at capturing local relations among items, while the learned similarity s^{(ℓ)}_{ii'} can model global relations well because of its transitivity. We propose to combine the predefined similarity and the learned similarity in a principled way in order to make full use of both. Specifically, (i) we first use the predefined similarity to augment the learned similarity, i.e., s^{(p)}_{ii'} s^{(ℓ)}_{ii'}, which is a typical way of integrating two different signals [19]; and (ii) we then further integrate the augmented similarity into the learned similarity because of the latter's superiority in preference prediction in most cases, which will also be discussed in our empirical studies. The enhanced similarity is expected to capture both the local and global relations among items. We illustrate the main idea of our mixed similarity learning in Figure 1. Mathematically, we formalize our mixed similarity model via a linear combination as follows,

s^{(m)}_{ii'} = (1 - \lambda_s) s^{(\ell)}_{ii'} + \lambda_s s^{(p)}_{ii'} s^{(\ell)}_{ii'},   (5)

where the tradeoff parameter λ_s ∈ [0, 1] is the weight on the augmented similarity s^{(p)}_{ii'} s^{(ℓ)}_{ii'}. When λ_s = 0, it reduces to the learned similarity only [14]; when λ_s = 1, it is similar to the augmented similarity using auxiliary content information [19]; and when 0 < λ_s < 1, we have a mixed similarity model, which usually performs the best in our empirical studies.

Figure 1: Illustration of mixed similarity learning for recommendation with implicit feedback.

3.2. Factored Mixed Similarity Model

In order to overcome the limitation of intransitivity, we follow the seminal work [14] and use two latent feature vectors to represent each learned similarity, i.e., s^{(ℓ)}_{ii'} = V_{i·} W_{i'·}^T, and have a factored version of the mixed similarity,

s^{(m)}_{ii'} = ((1 - \lambda_s) + \lambda_s s^{(p)}_{ii'}) V_{i\cdot} W_{i'\cdot}^T.   (6)

With the factored mixed similarity, we can estimate the preference of user u on item i in a similar way to that of [14], i.e., \hat{r}^{(m)}_{ui} = b_i + \frac{1}{\sqrt{|\mathcal{I}_u \setminus \{i\}|}} \sum_{i' \in \mathcal{I}_u \setminus \{i\}} s^{(m)}_{ii'} for an examined item i ∈ I_u, and \hat{r}^{(m)}_{uj} = b_j + \frac{1}{\sqrt{|\mathcal{I}_u|}} \sum_{i' \in \mathcal{I}_u} s^{(m)}_{ji'} for an unexamined item j ∉ I_u. We can see that the difference between the prediction rule of \hat{r}^{(ℓ)}_{ui} in FISM [14], as shown in Eq.(4), and ours is the mixed similarity s^{(m)}_{ii'}. Hence, we call our similarity model the factored mixed similarity model (FMSM).
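A minimal NumPy sketch of the factored mixed similarity in Eq.(6); the sizes and the stand-in predefined-similarity matrix below are illustrative assumptions of ours:

```python
import numpy as np

def mixed_similarity(i, ip, V, W, s_pre, lam_s):
    """Factored mixed similarity of Eq.(6):
    s^(m)_{ii'} = ((1 - lam_s) + lam_s * s^(p)_{ii'}) * (V_i W_{i'}^T)."""
    return ((1.0 - lam_s) + lam_s * s_pre[i, ip]) * (V[i] @ W[ip])

# illustrative toy setup (not from the paper)
rng = np.random.default_rng(0)
m, d = 4, 8
V, W = rng.normal(size=(m, d)), rng.normal(size=(m, d))
s_pre = np.eye(m)           # stand-in for precomputed Cosine similarities

# lam_s = 0 recovers the purely learned similarity of Eq.(3)
assert np.isclose(mixed_similarity(0, 1, V, W, s_pre, 0.0), V[0] @ W[1])
```

Note the asymmetry: swapping i and i' exchanges the roles of V and W, so the learned part of the similarity is directional, while the predefined Cosine factor is symmetric.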
3.3. Pairwise Factored Mixed Similarity Model

Furthermore, in order to develop a recommendation algorithm, we adopt the well-known pairwise preference assumption [27], and assume that the preference of an examined item i ∈ I_u is larger than that of an unexamined item j ∉ I_u, i.e., r̂_ui > r̂_uj. For this reason, we call our algorithm the pairwise factored mixed similarity model (P-FMSM). With such a pairwise preference assumption, we reach the
objective function of our P-FMSM,

\min_{\Theta} \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}_u} \sum_{j \in \mathcal{I} \setminus \mathcal{I}_u} f^{(m)}_{uij},   (7)

where f^{(m)}_{uij} = -\ln \sigma(\hat{r}^{(m)}_{uij}) + \frac{\alpha}{2} \|V_{i\cdot}\|^2 + \frac{\alpha}{2} \|V_{j\cdot}\|^2 + \frac{\alpha}{2} \sum_{i' \in \mathcal{I}_u} \|W_{i'\cdot}\|_F^2 + \frac{\alpha}{2} \|b_i\|^2 + \frac{\alpha}{2} \|b_j\|^2 is a tentative objective function for the sampled triple (u, i, j), and Θ = {W_{i·}, V_{i·}, b_i, i = 1, 2, . . . , m} denotes the set of parameters to be learned. Note that although the objective function of P-FMSM as shown in Eq.(7) looks very similar to that of BPR [27], the prediction rule is totally different because of the proposed factored mixed similarity model. The similarity measurement is actually a very fundamental issue, as discussed before, and is also our focus in this paper.

3.4. Learning the P-FMSM
For learning the model parameters θ ∈ Θ in Eq.(7), we use the popular stochastic gradient descent (SGD) algorithmic framework [27], which is described in Figure 2. Note that the P-FMSM algorithm as shown in Figure 2 refers to the step of "Learn Parameters" in the middle part of Figure 1. Specifically, our algorithm contains two loops, where the outer loop is over the number of scans, and the inner loop is over one scan of the whole set of feedback. For each iteration of the inner loop, we first randomly sample an examined (user, item) pair (u, i) ∈ R and then sample an unexamined item j ∈ I\I_u, which is a typical sampling strategy in recommendation methods based on the pairwise preference assumption [27]. Once a triple (u, i, j) is sampled, we can calculate the gradients and use them to update the model parameters via θ ← θ − γ∇θ, with γ > 0 as the learning rate.
1: Initialize the model parameters Θ
2: for t = 1, . . . , T do
3:   for t2 = 1, . . . , |R| do
4:     Randomly pick a pair (u, i) ∈ R
5:     Randomly pick an item j from I\I_u
6:     Calculate the gradients ∇θ, θ ∈ Θ
7:     Update the model parameters, θ ← θ − γ∇θ, θ ∈ Θ

Figure 2: The SGD algorithm for P-FMSM.
The gradients of the model parameters, ∇θ = ∂f^{(m)}_{uij}/∂θ, are as follows,

\nabla b_i = -\sigma(-\hat{r}^{(m)}_{uij}) + \alpha b_i,   (8)

\nabla b_j = -\sigma(-\hat{r}^{(m)}_{uij})(-1) + \alpha b_j,   (9)

\nabla V_{i\cdot} = -\sigma(-\hat{r}^{(m)}_{uij}) \frac{1}{\sqrt{|\mathcal{I}_u \setminus \{i\}|}} \sum_{i' \in \mathcal{I}_u \setminus \{i\}} ((1-\lambda_s) + \lambda_s s^{(p)}_{ii'}) W_{i'\cdot} + \alpha V_{i\cdot},   (10)

\nabla V_{j\cdot} = -\sigma(-\hat{r}^{(m)}_{uij}) \Big(-\frac{1}{\sqrt{|\mathcal{I}_u|}} \sum_{i' \in \mathcal{I}_u} ((1-\lambda_s) + \lambda_s s^{(p)}_{ji'}) W_{i'\cdot}\Big) + \alpha V_{j\cdot},   (11)

\nabla W_{i'\cdot} = -\sigma(-\hat{r}^{(m)}_{uij}) ((1-\lambda_s) + \lambda_s s^{(p)}_{ii'}) \Big(\frac{V_{i\cdot}}{\sqrt{|\mathcal{I}_u \setminus \{i\}|}} - \frac{V_{j\cdot}}{\sqrt{|\mathcal{I}_u|}}\Big) + \alpha W_{i'\cdot}, \quad i' \in \mathcal{I}_u \setminus \{i\},   (12)

\nabla W_{i\cdot} = -\sigma(-\hat{r}^{(m)}_{uij}) \frac{-((1-\lambda_s) + \lambda_s s^{(p)}_{ji}) V_{j\cdot}}{\sqrt{|\mathcal{I}_u|}} + \alpha W_{i\cdot},   (13)

where \hat{r}^{(m)}_{uij} = \hat{r}^{(m)}_{ui} - \hat{r}^{(m)}_{uj}.

From the algorithm in Figure 2, we can see that the time complexity of our P-FMSM is O(T|R|dρ), and that of BPR is O(T|R|d), where ρ denotes the average number of items examined by each user.
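To make the update step concrete, here is a hedged Python sketch of one SGD update following Eqs.(8)-(13). The toy dimensions, the dense stand-in predefined-similarity matrix, and the assumption that the sampled user has examined at least two items are all illustrative simplifications of ours, not part of the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coeff(i, ip, s_pre, lam_s):
    """Shared factor ((1 - lam_s) + lam_s * s^(p)_{ii'}) of Eqs.(10)-(13)."""
    return (1.0 - lam_s) + lam_s * s_pre[i, ip]

def r_hat(u_items, i, V, W, b, s_pre, lam_s, exclude_i):
    """FMSM prediction: b_i + (1/sqrt(|S|)) * sum_{i' in S} s^(m)_{ii'}."""
    S = [ip for ip in u_items if not (exclude_i and ip == i)]
    if not S:
        return b[i]
    return b[i] + sum(coeff(i, ip, s_pre, lam_s) * (V[i] @ W[ip]) for ip in S) / np.sqrt(len(S))

def sgd_step(u_items, i, j, V, W, b, s_pre, lam_s, alpha, gamma):
    """One P-FMSM update for a sampled triple (u, i, j), following Eqs.(8)-(13)."""
    r_uij = (r_hat(u_items, i, V, W, b, s_pre, lam_s, True)
             - r_hat(u_items, j, V, W, b, s_pre, lam_s, False))
    e = -sigmoid(-r_uij)                    # d(-ln sigma(r_uij)) / d r_uij
    Si = [ip for ip in u_items if ip != i]  # assumes the user examined >= 2 items
    ci, cj = np.sqrt(len(Si)), np.sqrt(len(u_items))
    grad_bi = e + alpha * b[i]                                                    # Eq.(8)
    grad_bj = -e + alpha * b[j]                                                   # Eq.(9)
    grad_Vi = e / ci * sum(coeff(i, ip, s_pre, lam_s) * W[ip] for ip in Si) + alpha * V[i]        # Eq.(10)
    grad_Vj = -e / cj * sum(coeff(j, ip, s_pre, lam_s) * W[ip] for ip in u_items) + alpha * V[j]  # Eq.(11)
    for ip in Si:                                                                 # Eq.(12)
        gW = e * coeff(i, ip, s_pre, lam_s) * (V[i] / ci - V[j] / cj) + alpha * W[ip]
        W[ip] -= gamma * gW
    gWi = -e * coeff(j, i, s_pre, lam_s) * V[j] / cj + alpha * W[i]               # Eq.(13)
    W[i] -= gamma * gWi
    b[i] -= gamma * grad_bi; b[j] -= gamma * grad_bj
    V[i] -= gamma * grad_Vi; V[j] -= gamma * grad_Vj

# illustrative toy run; dimensions and values are assumptions, not the paper's
rng = np.random.default_rng(0)
V, W = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
b = np.zeros(5)
s_pre = np.full((5, 5), 0.5)    # stand-in for precomputed Cosine similarities
sgd_step({0, 1, 2}, 1, 4, V, W, b, s_pre, lam_s=0.5, alpha=0.01, gamma=0.1)
```

All gradients are computed before any parameter is overwritten, so a single call is a consistent step at the current iterate; in a real implementation one would wrap this in the two loops of Figure 2.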
4. Experimental Results

4.1. Datasets

For direct comparative empirical studies, we use four public datasets¹, where the training and test records of each dataset are obtained via a random splitting of the converted implicit feedback [24]. Specifically, the first dataset, MovieLens100K, contains 943 users, 1682 items, 27688 training records and 27687 test records; the second dataset, MovieLens1M, contains 6040 users, 3952 items, 287641 training records and 287640 test records; the third dataset, UserTag, contains 3000 users, 2000 items, 123218 training records and 123218 test records; and the fourth dataset, Netflix5K5K, contains 5000 users, 5000 items, 77936 training records and 77936 test records. Each dataset contains three copies of training records and test records. For the first copy of each dataset, we construct validation data for parameter tuning via sampling n (i.e., the number of users) records from the corresponding training records. We put the statistics of the first copy of each dataset in Table 2.

¹ https://sites.google.com/site/weikep/GBPRdata.zip

4.2. Evaluation Metrics

For recommendation with implicit feedback, a user is usually only interested in a few top-ranked items (say 5, for example) in the recommendation list instead of their predicted scores [3]. We thus use two commonly adopted ranking-oriented evaluation metrics in recommendation and information retrieval [24], i.e., Prec@5 and NDCG@5, for precision and normalized discounted cumulative gain (NDCG) at position 5, respectively.
Table 2: Statistics of the data used in the experiments, including numbers of users (|U|), items (|I|), implicit feedback (|R|) of training records (Tr.) and test records (Te.), and densities (|R|/|U|/|I|) of training records.

Data set        |U|    |I|    |R| (Tr.)  |R| (Te.)  |R|/|U|/|I| (Tr.)
MovieLens100K   943    1682   27688      27687      1.75%
MovieLens1M     6040   3952   287641     287640     1.21%
UserTag         3000   2000   123218     123218     2.05%
Netflix5K5K     5000   5000   77936      77936      0.31%
Mathematically, Prec@5 and NDCG@5 are defined as follows [26],

Prec@5 = \frac{1}{|\mathcal{U}^{te}|} \sum_{u \in \mathcal{U}^{te}} \Big[ \frac{1}{5} \sum_{p=1}^{5} \delta(L_u(p) \in \mathcal{I}_u^{te}) \Big],

NDCG@5 = \frac{1}{|\mathcal{U}^{te}|} \sum_{u \in \mathcal{U}^{te}} \Big[ \frac{1}{\sum_{p=1}^{\min(5,|\mathcal{I}_u^{te}|)} \frac{1}{\log(p+1)}} \sum_{p=1}^{5} \frac{2^{\delta(L_u(p) \in \mathcal{I}_u^{te})} - 1}{\log(p+1)} \Big],

where L_u(p) is the pth item for user u in the recommendation list, and δ(x) is an indicator function with value 1 if x is true and 0 otherwise. Note that U^{te} and I_u^{te} denote the set of test users and the set of examined items by user u in the test data, respectively.
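The two per-user metrics can be sketched as follows; the toy ranked list is our own illustration.

```python
import numpy as np

def prec_at_5(ranked, test_items):
    """Prec@5: fraction of the top-5 recommended items that are in I_u^{te}."""
    return sum(p in test_items for p in ranked[:5]) / 5.0

def ndcg_at_5(ranked, test_items):
    """NDCG@5 with the (2^delta - 1) / log(p + 1) gain used in the paper."""
    dcg = sum((2 ** (item in test_items) - 1) / np.log(pos + 2)
              for pos, item in enumerate(ranked[:5]))
    ideal = sum(1.0 / np.log(pos + 2) for pos in range(min(5, len(test_items))))
    return dcg / ideal if ideal > 0 else 0.0

# toy sanity check (illustrative): a perfect top-5 list scores 1.0 on both metrics
assert prec_at_5([3, 1, 4, 5, 9], {3, 1, 4, 5, 9}) == 1.0
```

The dataset-level scores are then the averages of these per-user values over U^{te}, as in the definitions above.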
4.3. Baselines and Parameter Settings

In order to study the effectiveness of the developed algorithm with the designed mixed similarity model in the pairwise preference learning framework, we choose state-of-the-art methods based on the predefined similarity, the learned similarity or the pairwise preference assumption. Specifically, we include the following baselines:

• Ranking based on items' popularity (PopRank) is a basic algorithm for recommendation with implicit feedback;

• Item-oriented collaborative filtering with Cosine similarity (ICF) [5] is a classical algorithm based on the predefined similarity;

• ICF with an amplifier on the similarity (ICF-A) [2] favors the items with higher similarity;

• Factored item similarity model with AUC loss (FISMauc) and with RMSE loss (FISMrmse) [14] are two seminal algorithms based on the learned similarity;

• Hierarchical Poisson factorization (HPF) [8] is a recent Bayesian approach tailored for modeling implicit feedback in recommender systems;

• Bayesian personalized ranking (BPR) [27] is the first work that introduces the pairwise preference assumption into the task of recommendation with implicit feedback, and performs among the best in most cases;

• Adaptive K-nearest-neighbors based recommendation (A-KNN) [27] generalizes the pairwise preference assumption to ICF with similarity learning. However, the learned similarity in A-KNN is not factorized, and thus has the same intransitivity limitation. For fair comparison, we implement a factored version of adaptive KNN (FA-KNN);

• Group preference based BPR (GBPR) [24] is an extended pairwise preference learning method considering user groups.
For ICF and ICF-A, we use the Cosine similarity and |N_i| = 20 in neighborhood construction. For the amplifier in ICF-A, we use 2.5 as a typical value [2]. For the factorization-based methods, i.e., FISMauc, FISMrmse, BPR, FA-KNN, GBPR and our P-FMSM, we use d = 20 latent features, and search the tradeoff parameter on the regularization terms α ∈ {0.001, 0.01, 0.1} and the iteration number T ∈ {100, 500, 1000} using NDCG@5. For the Bayesian method HPF, we use the public code² and the default parameter configurations, i.e., 20 for the latent dimension number, 10 for the iteration number multiple, 0.3 for the hyperparameter, and true for bias and hierarchy. The tradeoff parameter λ_s for the mixed similarity in P-FMSM is also searched via NDCG@5 from {0.2, 0.4, 0.6, 0.8, 1}. The learning rate γ is fixed at 0.01. The results of GBPR are copied from [24]. Note that the sampling strategy of BPR and P-FMSM in our empirical studies is shown in Figure 2, which is more efficient than that of BPR and GBPR in [24].

² https://github.com/premgopalan/hgaprec
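The grid search described above can be sketched as follows; `train_and_eval` is a hypothetical callback of ours standing in for training P-FMSM and scoring it on the validation split, not a function from the paper's code.

```python
import itertools

def grid_search(train_and_eval):
    """Pick (alpha, T, lam_s) maximizing validation NDCG@5, as described above.
    `train_and_eval` is a hypothetical callback: it should train the model with
    the given hyperparameters and return NDCG@5 on the validation split."""
    alphas = [0.001, 0.01, 0.1]        # regularization tradeoff
    Ts = [100, 500, 1000]              # iteration numbers
    lams = [0.2, 0.4, 0.6, 0.8, 1.0]   # mixed-similarity weight lambda_s
    return max(itertools.product(alphas, Ts, lams),
               key=lambda cfg: train_and_eval(*cfg))

# toy stand-in scorer, just to exercise the search (not a real model)
best = grid_search(lambda a, T, l: -abs(a - 0.001) - abs(T - 500) - abs(l - 0.4))
assert best == (0.001, 500, 0.4)
```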
4.4. Experimental Results

4.4.1. Overall Results

The main experimental results are shown in Tables 3, 4, 5 and 6. We can make the following observations: (i) our proposed P-FMSM performs significantly better than all other methods in all cases (the p-value of the significance test is smaller than 0.01)³, which clearly shows the advantage of integrating the predefined and learned similarities in the pairwise preference learning framework; (ii) the recommendation methods based on the learned similarity (e.g., FISMauc, FISMrmse, FA-KNN and our P-FMSM) usually perform better than those based on the predefined similarity (i.e., ICF and ICF-A), which shows the helpfulness of similarity learning for a recommendation algorithm; (iii) the recommendation methods based on the pairwise preference assumption (e.g., BPR, FA-KNN, GBPR and our P-FMSM) perform well, which shows its effectiveness for the uncertain implicit feedback; and (iv) the very basic method PopRank usually performs worst, as expected, since it is a non-personalized solution and recommends the same set of most popular items to all users.

³ We calculate the p-value of the statistical significance test via the MATLAB function ttest2.m as shown at http://www.mathworks.com/help/stats/ttest2.html

4.4.2. Effect of Mixed Similarity
In this subsection, we study the effect of the mixed similarity in P-FMSM. For direct comparison, we fix λ_s = 0 and obtain a reduced model, P-FISM, with the unmixed learned similarity only. We show the results in Figure 3, and can see that P-FMSM is better than P-FISM across all four datasets, which clearly shows the merit of the mixed similarity via exploiting the complementarity of the predefined similarity and the learned similarity in the pairwise preference learning framework.
4.4.3. Effect of Neighborhood Size
In this subsection, we study the effect of using different neighborhood sizes, i.e., K ∈ {20, 30, 40, 50}, in P-FMSM instead of using all the neighbors by default. For fair comparison, we also use the same numbers of neighbors in ICF. The recommendation performance on Prec@5 and NDCG@5 is shown in Figure 4 and Figure 5, respectively. We can make the following observations: (i) both
Table 3: Recommendation performance of P-FMSM and other methods on MovieLens100K using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                   Prec@5         NDCG@5
PopRank                                  0.2724±0.0094  0.2915±0.0072
ICF                                      0.3145±0.0018  0.3305±0.0010
ICF-A                                    0.2820±0.0048  0.2943±0.0091
FISMauc (α = 0.01, T = 100)              0.3651±0.0086  0.3788±0.0139
FISMrmse (α = 0.001, T = 100)            0.3987±0.0085  0.4153±0.0092
HPF                                      0.3412±0.0077  0.3523±0.0097
BPR (α = 0.1, T = 1000)                  0.3627±0.0079  0.3779±0.0079
FA-KNN (α = 0.1, T = 1000)               0.3679±0.0044  0.3868±0.0076
GBPR                                     0.4051±0.0038  0.4201±0.0031
P-FMSM (α = 0.001, T = 500, λ_s = 0.4)   0.4115±0.0059  0.4320±0.0054
Table 4: Recommendation performance of P-FMSM and other methods on MovieLens1M using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                   Prec@5         NDCG@5
PopRank                                  0.2822±0.0019  0.2935±0.0010
ICF                                      0.3831±0.0021  0.3966±0.0020
ICF-A                                    0.3921±0.0016  0.4066±0.0022
FISMauc (α = 0.001, T = 100)             0.3502±0.0069  0.3569±0.0082
FISMrmse (α = 0.001, T = 100)            0.4227±0.0006  0.4366±0.0011
HPF                                      0.4131±0.0080  0.4262±0.0096
BPR (α = 0.01, T = 1000)                 0.4195±0.0013  0.4307±0.0008
FA-KNN (α = 0.01, T = 100)               0.3650±0.0026  0.3808±0.0017
GBPR                                     0.4494±0.0020  0.4636±0.0014
P-FMSM (α = 0.001, T = 500, λ_s = 0.4)   0.4562±0.0037  0.4731±0.0038
Table 5: Recommendation performance of P-FMSM and other methods on UserTag using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                   Prec@5         NDCG@5
PopRank                                  0.2647±0.0012  0.2730±0.0014
ICF                                      0.2257±0.0051  0.2306±0.0045
ICF-A                                    0.2480±0.0028  0.2540±0.0032
FISMauc (α = 0.1, T = 500)               0.2536±0.0031  0.2619±0.0037
FISMrmse (α = 0.001, T = 100)            0.3006±0.0046  0.3092±0.0049
HPF                                      0.2684±0.0070  0.2756±0.0074
BPR (α = 0.1, T = 500)                   0.2849±0.0036  0.2931±0.0047
FA-KNN (α = 0.1, T = 1000)               0.2641±0.0046  0.2720±0.0036
GBPR                                     0.3011±0.0008  0.3104±0.0009
P-FMSM (α = 0.001, T = 500, λ_s = 0.6)   0.3129±0.0026  0.3218±0.0030
Table 6: Recommendation performance of P-FMSM and other methods on Netflix5K5K using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                   Prec@5         NDCG@5
PopRank                                  0.1728±0.0012  0.1794±0.0004
ICF                                      0.2017±0.0012  0.2186±0.0008
ICF-A                                    0.1653±0.0032  0.1804±0.0041
FISMauc (α = 0.01, T = 500)              0.1932±0.0039  0.2022±0.0023
FISMrmse (α = 0.001, T = 100)            0.2217±0.0043  0.2413±0.0063
HPF                                      0.1899±0.0047  0.2044±0.0041
BPR (α = 0.01, T = 1000)                 0.2207±0.0051  0.2374±0.0055
FA-KNN (α = 0.001, T = 100)              0.2101±0.0020  0.2254±0.0029
GBPR                                     0.2411±0.0027  0.2611±0.0025
P-FMSM (α = 0.001, T = 1000, λ_s = 0.2)  0.2469±0.0027  0.2661±0.0027
Figure 3: Recommendation performance (Prec@5 and NDCG@5) of P-FISM (i.e., with the learned similarity only, when λ_s = 0) and P-FMSM on the four datasets. Note that the searched values of P-FISM are α = 0.01, T = 1000 on MovieLens100K; α = 0.001, T = 500 on MovieLens1M; α = 0.001, T = 100 on UserTag; and α = 0.001, T = 500 on Netflix5K5K.
P-FMSM and P-FMSM (with different K) give about 23% on MovieLens100K,
AC
10% on MovieLens1M, 28% on UserTag, and 15% on Netflix5k5k of significantly better results compared to ICF w.r.t. the corresponding values of the neighborhood sizes K ∈ {20, 30, 40, 50}, which shows the effectiveness of the developed algo-
rithm; and (ii) the performance of P-FMSM and P-FMSM (with different K) are
close, which shows that we can use a proper size of neighborhood instead of using
ACCEPTED MANUSCRIPT
all the neighbors to achieve low space complexity. It is an appealing property for real deployment by industry practitioners.
CR IP T
The above experimental results show that our P-FMSM is significantly more accurate than the state-of-the-art recommendation methods for the studied problem, which shows the effectiveness of mixed similarity learning model via inte-
grating the predefined similarity, learned similarity and pairwise preference learning in one single algorithm. One major potential limitation is its effectiveness in
AN US
an extremely sparse case, because neither the predefined similarity nor the learned similarity can guarantee to succeed in bridging two latently related items well.
Figure 4: Recommendation performance of Prec@5 using ICF and P-FMSM with different values of K (i.e., different neighborhood sizes), and the default P-FMSM with all neighbors, on the four datasets.
Figure 5: Recommendation performance of NDCG@5 using ICF and P-FMSM with different values of K (i.e., different neighborhood sizes), and the default P-FMSM with all the neighbors, on the four datasets.

5. Related Work
PT
In the big data era, the information overload problem has become unavoidable for almost everyone. We are submerged in a world with various types of information, from social updates to recent scientific and technological advances. So far, there are mainly two approaches that address the information overload problem to some extent: information retrieval [4] and intelligent recommendation [28]. The paradigm of information retrieval goes from a user-issued query to a ranked list of documents w.r.t. a certain relevance metric, which requires proactive involvement of the user. In contrast, an intelligent recommendation engine exploits users' historical behavior and generates a personalized ranked list of items (e.g., web pages [20], tags [1], travel sequences [13] and many others [18]) for a target user or a group of target users [34] without intensive user involvement.

Technically, there are mainly two types of recommendation methods, neighborhood-based recommendation and factorization-based recommendation, besides the very basic approach of recommending items by popularity (i.e., PopRank). We discuss these two types of methods separately in the subsequent two subsections.
5.1. Neighborhood-based Recommendation
There are usually two types of user feedback in a typical recommender system [28, 12]: explicit feedback such as 5-star ratings and implicit feedback such as examination records. For recommendation with either explicit or implicit feedback, neighborhood construction is always a critical step [21], which requires some similarity measure between two users or two items.
For explicit feedback, the most well-known similarity measure is probably the Pearson correlation coefficient (PCC) [7, 9], defined on the items co-rated by two users or on the users who co-rate two items. For implicit feedback, there are similarities such as the Cosine similarity, the Jaccard index and their extensions [5, 33]. Note that the similarity between two users (or two items), such as the Cosine similarity, may be further amplified to favor users (or items) with high similarity [2].

5.2. Factorization-based Recommendation

Factorization-based methods have been well recognized as the state-of-the-art methods in most recommendation problems. For explicit feedback, a typical factorization-based method approximates each feedback, e.g., a numerical rating, with the inner product of two latent feature vectors [29], or with additional feature vectors of implicitly examined items [15]. There are also some factorization-based methods that directly optimize the ranking list derived from the relative ordering of explicit feedback [35].
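To make the predefined similarities for implicit feedback concrete, the following is a minimal sketch of the Cosine and Jaccard item-item similarities over a binary user-item interaction matrix; the function and variable names are our own, not from the paper:

```python
import numpy as np

def item_similarities(R):
    """Predefined item-item similarities on a binary user-item matrix R.

    Cosine:  |U_i ∩ U_j| / sqrt(|U_i| * |U_j|)
    Jaccard: |U_i ∩ U_j| / |U_i ∪ U_j|
    where U_i is the set of users who interacted with item i.
    """
    R = np.asarray(R, dtype=float)
    co = R.T @ R                       # co-occurrence counts |U_i ∩ U_j|
    counts = np.diag(co)               # |U_i|
    denom_cos = np.sqrt(np.outer(counts, counts))
    union = counts[:, None] + counts[None, :] - co
    with np.errstate(divide="ignore", invalid="ignore"):
        cosine = np.where(denom_cos > 0, co / denom_cos, 0.0)
        jaccard = np.where(union > 0, co / union, 0.0)
    return cosine, jaccard

# 3 users x 3 items of implicit feedback (1 = observed interaction)
R = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])
cos, jac = item_similarities(R)
```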
For implicit feedback, a factorization-based method is usually associated with a preference assumption, including the pointwise preference assumption [10, 14, 22], the pairwise preference assumption [14, 27, 36, 24], and the listwise preference assumption [30]. A specific preference assumption is usually associated with a loss to be minimized or an objective to be maximized, such as the square loss [10, 22] in the pointwise assumption, the Logistic loss [27], the Hinge loss [36] and the square loss of the relative preference difference [14] in the pairwise assumption, and the objective of mean reciprocal rank (MRR) [30] in the listwise assumption. For the prediction rule in a typical loss or objective function, there are usually two choices: the inner product of two latent feature vectors [10, 22, 27, 30, 36], or the prediction rule of a neighborhood-based method [14]. Besides, a recent approach digests the implicit user feedback in a Bayesian framework [8] instead of the well-known matrix factorization framework.
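As an illustration of the pairwise preference assumption with the Logistic loss [27] and an inner-product prediction rule, a generic BPR-style stochastic gradient step can be sketched as follows; the learning rate, regularization weight and dimensions are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 5, 6, 4
U = 0.01 * rng.standard_normal((n_users, d))   # user latent factors
V = 0.01 * rng.standard_normal((n_items, d))   # item latent factors
lr, reg = 0.05, 0.01                           # illustrative hyperparameters

def bpr_step(u, i, j):
    """One SGD step on the pairwise Logistic (BPR) loss -ln sigma(r_ui - r_uj),
    where r_ui = <U_u, V_i>, i is an observed item and j an unobserved one."""
    x = U[u] @ (V[i] - V[j])          # preference difference r_ui - r_uj
    g = 1.0 / (1.0 + np.exp(x))       # sigma(-x) = -d(loss)/dx
    du = g * (V[i] - V[j]) - reg * U[u]
    di = g * U[u] - reg * V[i]
    dj = -g * U[u] - reg * V[j]
    U[u] += lr * du
    V[i] += lr * di
    V[j] += lr * dj

# one step for user 0 on an observed item 1 versus an unobserved item 0:
before = U[0] @ (V[1] - V[0])
bpr_step(0, 1, 0)
after = U[0] @ (V[1] - V[0])          # the preference gap widens
```

In practice such steps are repeated over sampled (user, observed item, unobserved item) triples until convergence.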
Similarity between two items or two users is a fundamental issue in recommendation. Our mixed similarity learning model goes one step beyond traditional unmixed similarity, such as the predefined similarity or the learned similarity alone, and inherits the merits of both. The designed mixed similarity learning model is also generic and can be applied to recommendation settings with heterogeneous feedback [25] and social connections [32]. We summarize some related works in Table 7, from which we can see that our
Table 7: Summary of some related works.

                    Neighborhood-based recommendation     Factorization-based recommendation
Explicit feedback   predefined similarity: e.g., [7]      w/o similarity: e.g., [29, 35]
                                                          learned similarity: e.g., [15]
                                                          mixed similarity: N/A
Implicit feedback   predefined similarity: e.g., [2, 5]   w/o similarity: e.g., [8, 10, 22, 24, 27, 30, 36]
                                                          learned similarity: e.g., [14]
                                                          mixed similarity: e.g., P-FMSM
proposed mixed similarity learning model is a novel solution for recommendation with implicit feedback.

6. Conclusions and Future Work
In this paper, we study an important recommendation problem with implicit feedback from the perspective of item similarity. Specifically, we propose a novel mixed similarity learning model that exploits the complementarity of the predefined similarity and the learned similarity commonly used in state-of-the-art recommendation methods. As far as we know, we are the first to study the integration of these two canonical approaches for bridging two items in a principled way. With the mixed similarity, we further develop a novel recommendation algorithm in the pairwise preference learning framework, i.e., the pairwise factored mixed similarity model (P-FMSM). Our P-FMSM recommends significantly better than the state-of-the-art methods with the predefined similarity or the learned similarity alone.
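Conceptually, the mixed similarity blends the two signals for a pair of items. The sketch below uses a simple convex combination as a stand-in; the weighting scheme, `alpha`, and all names are illustrative assumptions, not the exact P-FMSM formulation described earlier in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, d = 4, 3
s_pre = rng.random((n_items, n_items))        # a predefined similarity, e.g., Cosine
P = 0.1 * rng.standard_normal((n_items, d))   # learned item factors
Q = 0.1 * rng.standard_normal((n_items, d))

def mixed_similarity(i, j, alpha=0.5):
    """Blend a predefined similarity with a learned factored similarity.

    The convex combination and alpha are illustrative assumptions; they
    stand in for, but are not, the exact P-FMSM model.
    """
    learned = P[i] @ Q[j]                      # learned similarity <p_i, q_j>
    return alpha * s_pre[i, j] + (1.0 - alpha) * learned

s = mixed_similarity(0, 1)
```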
For future work, we are interested in studying the mixed similarity model in an extremely sparse case, generalizing it to cross-domain problem settings with heterogeneous auxiliary data of text and temporal information [16, 23], and deploying it in a cloud computing environment [17].

Acknowledgments

We thank the handling editor and reviewers for their expert comments and constructive suggestions. We thank the support of National Natural Science Foundation of China No. 61502307 and No. 61672358, and Natural Science Foundation of Guangdong Province No. 2014A030310268 and No. 2016A030313038.

References
[1] Fabiano M. Belém, Carolina S. Batista, Rodrygo L. T. Santos, Jussara M. Almeida, and Marcos A. Gonçalves. Beyond relevance: Explicitly promoting novelty and diversity in tag recommendation. ACM Transactions on Intelligent Systems and Technology, 7(3):26:1–26:34, 2016.

[2] John S. Breese, David Heckerman, and Carl Myers Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, UAI '98, pages 43–52, 1998.

[3] Li Chen and Pearl Pu. Users' eye gaze pattern in organization-based recommender interfaces. In Proceedings of the 16th International Conference on Intelligent User Interfaces, IUI '11, pages 311–314, 2011.

[4] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

[5] Mukund Deshpande and George Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[6] Asmaa Elbadrawy and George Karypis. User-specific feature-based similarity models for top-N recommendation of new items. ACM Transactions on Intelligent Systems and Technology, 6(3):57:1–57:22, 2015.

[7] David Goldberg, David Nichols, Brian M. Oki, and Douglas Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61–70, 1992.

[8] Prem Gopalan, Jake M. Hofman, and David M. Blei. Scalable recommendation with hierarchical Poisson factorization. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, UAI '15, pages 326–335, 2015.

[9] Guibing Guo, Jie Zhang, and Neil Yorke-Smith. A novel Bayesian similarity measure for recommender systems. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, IJCAI '13, pages 2619–2625, 2013.

[10] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th IEEE International Conference on Data Mining, ICDM '08, pages 263–272, 2008.

[11] Michael Jahrer and Andreas Töscher. Collaborative filtering ensemble for ranking. In Proceedings of KDD Cup 2011 Competition, pages 153–167, 2012.

[12] Gawesh Jawaheer, Peter Weller, and Patty Kostkova. Modeling user preferences in recommender systems: A classification framework for explicit and implicit user feedback. ACM Transactions on Interactive Intelligent Systems, 4(2):8:1–8:26, 2014.

[13] Shuhui Jiang, Xueming Qian, Tao Mei, and Yun Fu. Personalized travel sequence recommendation on multi-source big social media. IEEE Transactions on Big Data, 2(1):43–56, 2016.

[14] Santosh Kabbur, Xia Ning, and George Karypis. FISM: Factored item similarity models for top-N recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 659–667, 2013.

[15] Yehuda Koren. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 426–434, 2008.

[16] Jianqiang Li, Jing Li, Xianghua Fu, M. A. Masud, and Joshua Zhexue Huang. Learning distributed word representation with multi-contextual mixed embedding. Knowledge-Based Systems, 106:220–230, 2016.

[17] Jiayin Li, Meikang Qiu, Zhong Ming, Gang Quan, Xiao Qin, and Zonghua Gu. Online optimization for scheduling preemptable tasks on IaaS cloud systems. Journal of Parallel and Distributed Computing, 72(5):666–677, 2012.

[18] Jie Lu, Dianshuang Wu, Mingsong Mao, Wei Wang, and Guangquan Zhang. Recommender system application developments: A survey. Decision Support Systems, 74:12–32, 2015.

[19] Zhongqi Lu, Zhicheng Dou, Jianxun Lian, Xing Xie, and Qiang Yang. Content-based collaborative filtering for news topic recommendation. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI '15, pages 217–223, 2015.

[20] Thi Thanh Sang Nguyen, Haiyan Lu, and Jie Lu. Web-page recommendation based on web usage and domain knowledge. IEEE Transactions on Knowledge and Data Engineering, 26(10):2574–2587, 2014.

[21] Xia Ning, Christian Desrosiers, and George Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook (Second Edition), pages 37–76. 2015.

[22] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. One-class collaborative filtering. In Proceedings of the 8th IEEE International Conference on Data Mining, ICDM '08, pages 502–511, 2008.

[23] Weike Pan. A survey of transfer learning for collaborative recommendation with auxiliary data. Neurocomputing, 177:447–453, 2016.

[24] Weike Pan and Li Chen. GBPR: Group preference based Bayesian personalized ranking for one-class collaborative filtering. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, IJCAI '13, pages 2691–2697, 2013.

[25] Weike Pan, Shanchuan Xia, Zhuode Liu, Xiaogang Peng, and Zhong Ming. Mixed factorization for collaborative recommendation with heterogeneous explicit feedbacks. Information Sciences, 332:84–93, 2016.

[26] Weike Pan, Hao Zhong, Congfu Xu, and Zhong Ming. Adaptive Bayesian personalized ranking for heterogeneous implicit feedbacks. Knowledge-Based Systems, 73:173–180, 2015.

[27] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI '09, pages 452–461, 2009.

[28] Francesco Ricci, Lior Rokach, and Bracha Shapira. Recommender Systems Handbook (Second Edition). Springer, 2015.

[29] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Annual Conference on Neural Information Processing Systems 20, NIPS '08, pages 1257–1264, 2008.

[30] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Nuria Oliver, and Alan Hanjalic. CLiMF: Learning to maximize reciprocal rank with collaborative less-is-more filtering. In Proceedings of the 6th ACM Conference on Recommender Systems, RecSys '12, pages 139–146, 2012.

[31] Gábor Takács and Domonkos Tikk. Alternating least squares for personalized ranking. In Proceedings of the 6th ACM Conference on Recommender Systems, RecSys '12, pages 83–90, 2012.

[32] Jiliang Tang, Xia Hu, and Huan Liu. Social recommendation: A review. Social Network Analysis and Mining, 3(4):1113–1133, 2013.

[33] Koen Verstrepen and Bart Goethals. Unifying nearest neighbors collaborative filtering. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys '14, pages 177–184, 2014.

[34] Wei Wang, Guangquan Zhang, and Jie Lu. Member contribution-based group recommender system. Decision Support Systems, 87:80–93, 2016.

[35] Markus Weimer, Alexandros Karatzoglou, and Alexander J. Smola. Improving maximum margin matrix factorization. Machine Learning, 72(3):263–276, 2008.

[36] Shuang-Hong Yang, Bo Long, Alexander J. Smola, Hongyuan Zha, and Zhaohui Zheng. Collaborative competitive filtering: Learning recommender using context of user choice. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '11, pages 295–304, 2011.