Mixed Similarity Learning for Recommendation with Implicit Feedback

Mengsi Liu, Weike Pan, Miao Liu, Yaofeng Chen, Xiaogang Peng, Zhong Ming

Accepted Manuscript

PII: S0950-7051(16)30504-4
DOI: 10.1016/j.knosys.2016.12.010
Reference: KNOSYS 3761

To appear in: Knowledge-Based Systems

Received date: 13 June 2016
Revised date: 8 December 2016
Accepted date: 9 December 2016

Please cite this article as: Mengsi Liu, Weike Pan, Miao Liu, Yaofeng Chen, Xiaogang Peng, Zhong Ming, Mixed Similarity Learning for Recommendation with Implicit Feedback, Knowledge-Based Systems (2016), doi: 10.1016/j.knosys.2016.12.010

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights

• We study an important recommendation problem with implicit feedback from the perspective of item similarity.

• We exploit the complementarity of the predefined similarity and the learned similarity via a novel mixed similarity model.

• We develop a novel recommendation algorithm, i.e., pairwise factored mixed similarity model (P-FMSM), based on the mixed similarity and the pairwise preference assumption.

• We showcase the effectiveness of P-FMSM as compared with several state-of-the-art methods on four public datasets.


Mixed Similarity Learning for Recommendation with Implicit Feedback

Mengsi Liu, Weike Pan#, Miao Liu, Yaofeng Chen, Xiaogang Peng∗, Zhong Ming∗

College of Computer Science and Software Engineering, Shenzhen University
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

Implicit feedback such as users' examination behaviors has been recognized as a very important source of information in most recommendation scenarios. For recommendation with implicit feedback, a good similarity measurement and a proper preference assumption are critical for the quality of personalization services. So far, the similarities used in state-of-the-art recommendation methods include the predefined similarity and the learned similarity, and the preference assumptions include the well-known pairwise assumption.

In this paper, we exploit the complementarity of the predefined similarity and the learned similarity via a novel mixed similarity model. Furthermore, we develop a novel recommendation algorithm, i.e., pairwise factored mixed similarity model (P-FMSM), based on the mixed similarity and the pairwise preference assumption. Our P-FMSM is able to (i) capture the locality of the user-item interactions via the symmetric predefined similarity, (ii) model the global correlations among items via the asymmetric learned similarity, and (iii) digest the uncertain implicit feedback via the pairwise preference assumption. Empirical studies on four public datasets show that our P-FMSM can recommend significantly more accurately than several state-of-the-art methods.

Keywords: Mixed Similarity, Implicit Feedback, Recommender Systems

#: Co-first author. The first two authors contributed equally.
∗: Xiaogang Peng and Zhong Ming are corresponding authors.

Preprint submitted to Knowledge-Based Systems, December 13, 2016

1. Introduction

The task of recommendation with implicit feedback [6, 27] aims to provide personalization services via learning users' preferences from their examination behaviors. In most scenarios, it has recently been recognized as a more important problem than the well-studied numerical rating prediction problem of the Netflix $1 Million Prize [15]. It has also attracted attention from industry practitioners because of the pervasiveness of the problem setting in numerous recommendation systems, such as users' browsing behaviors in e-commerce, watching patterns on video streaming sites, and check-in records in location-based mobile social networks.

For recommendation with implicit feedback, there are at least three categories of approaches. The first one adopts some predefined similarity such as the Cosine similarity or the Jaccard index, based on which a neighborhood of items is constructed and a prediction can then be made [5]. The second one chooses to learn the similarity instead of using a predefined one [14]. The third one first proposes a proper preference assumption and then designs a corresponding loss or objective function to be optimized for user behavior reasoning, among which the pairwise assumption [27] usually performs well regarding accuracy and efficiency. However, most recommendation methods focus on one specific aspect of those


three categories of approaches, such as a predefined similarity, a learned similarity, or some preference assumption. Each aspect has its own advantage, e.g., the ability to capture local and global relations via the predefined similarity and the learned similarity, respectively. As far as we know, no published work aims to integrate those three aspects into one single algorithm in a principled way.

In this paper, we take each aspect as a reusable component and compile them into one single algorithm as a unified whole. Specifically, we first design a mixed similarity model that combines the predefined similarity and the learned similarity. We then embed the prediction rule based on the mixed similarity into a typical loss function of the well-known pairwise preference assumption. Finally, we obtain our algorithm called pairwise factored mixed similarity model (P-FMSM).

We conduct extensive empirical studies with several closely related state-of-the-art methods on four public real-world datasets, and find that our proposed solution, i.e., P-FMSM, performs significantly better than those with only one of the aforementioned three aspects or components. Furthermore, our mixed similarity learning model is significantly better than the corresponding unmixed similarity model in the same pairwise preference learning framework. These observations clearly show the effectiveness of the designed mixed similarity model and the developed recommendation algorithm.

We organize the rest of the paper as follows. We first give some background about the problem definition and some existing approaches for the studied problem in Section 2, and then describe our proposed solution in detail in Section 3. We conduct extensive empirical studies in Section 4, and discuss and summarize some closely related work in Section 5. Finally, we conclude our work in Section 6.


2. Background

2.1. Problem Definition

In the studied problem of recommendation with implicit feedback, we have n users, m items, and some associated examination records in the form of (user, item) pairs denoted as R. Our goal is to learn users' preferences from such pairs and provide a personalized recommendation list for each user. We put some heavily used notations and their explanations in Table 1.

2.2. Recommendation with Predefined Similarity

Most memory-based or neighborhood-based recommendation methods adopt a predefined similarity measurement such as the well-known Cosine similarity between item i and item i′,

s^{(p)}_{ii′} = \frac{|U_i ∩ U_{i′}|}{\sqrt{|U_i||U_{i′}|}},   (1)

where |U_i|, |U_{i′}| and |U_i ∩ U_{i′}| denote the numbers of users who examined item i, item i′, and both item i and item i′, respectively. Once the similarity of each item pair has been calculated, we can obtain a set of the most similar items to a target item i, denoted as N_i. Finally, we can make a prediction for the preference of user u on item i as follows [5],

\hat{r}^{(p)}_{ui} = \sum_{i′ ∈ I_u ∩ N_i} s^{(p)}_{ii′},   (2)

where I_u ∩ N_i denotes the intersection of the set of items examined by user u, i.e., I_u, and the set of nearest neighbors of item i, i.e., N_i. The prediction rule in Eq.(2) thus represents the overall similarity between the target item i and the most similar items examined by user u, i.e., I_u ∩ N_i.
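As a concrete illustration of Eqs.(1)-(2), the neighborhood-based scheme can be sketched in a few lines (a minimal NumPy sketch on a toy binary feedback matrix; the variable names and the toy data are ours, not from the paper):

```python
import numpy as np

# Toy binary feedback matrix (n = 3 users, m = 4 items); R[u, i] = 1 if user u examined item i.
R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1]], dtype=float)

# Eq.(1): cosine similarity s_{ii'} = |U_i ∩ U_{i'}| / sqrt(|U_i| |U_{i'}|).
co = R.T @ R                          # co[i, i'] = |U_i ∩ U_{i'}|
cnt = co.diagonal()                   # cnt[i] = |U_i|
S = co / np.sqrt(np.outer(cnt, cnt))
np.fill_diagonal(S, 0.0)              # exclude self-similarity

def predict(u, i, K=2):
    """Eq.(2): sum similarities over the K nearest neighbors of i that u examined."""
    neighbors = np.argsort(-S[i])[:K]         # N_i: top-K most similar items
    examined = np.flatnonzero(R[u])           # I_u
    return S[i, np.intersect1d(neighbors, examined)].sum()

print(predict(0, 2))  # 0.5
```

Items are then ranked per user by these scores, and the top few are recommended.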

Table 1: Some notations and explanations.

U                                  user set, u ∈ U, |U| = n
I                                  item set, i, i′, j ∈ I, |I| = m
U_i                                users that examined item i
I_u                                items examined by user u
N_i                                nearest neighbors of item i
R = {(u, i)}                       examination records
s^{(p)}_{ii′}                      predefined similarity between i and i′
s^{(ℓ)}_{ii′}                      learned similarity between i and i′
s^{(m)}_{ii′}                      mixed similarity between i and i′
U_{u·} ∈ R^{1×d}                   user-specific latent feature vector
V_{i·}, W_{i′·} ∈ R^{1×d}          item-specific latent feature vectors
b_i                                item bias
\hat{r}^{(p)}_{ui}, \hat{r}^{(ℓ)}_{ui}, \hat{r}^{(m)}_{ui}   predicted preference
T                                  iteration number
λ_s, α                             tradeoff parameters

With the predicted preference on each (user, item) pair, we can rank the items and recommend a few items with the highest preference scores. Note that the prediction rule in Eq.(2) is usually called item-oriented recommendation, parallel to user-oriented recommendation based on a predefined similarity between each pair of users.

2.3. Recommendation with Learned Similarity

The predefined similarity offers good interpretability, easy defect tracking, and convenient system operation and maintenance. However, its main drawback is the lack of transitivity, i.e., two items will not be connected if they share no common user (U_i ∩ U_{i′} = ∅). As a result, such an unconnected item i′ will not contribute to the preference prediction, since it will not be included in the item set N_i in Eq.(2). In order to address this limitation, a learned similarity is proposed [14],

s^{(ℓ)}_{ii′} = V_{i·} W^T_{i′·},   (3)

where the similarity between item i and item i′ is replaced by the inner product of two latent feature vectors, i.e., V_{i·} ∈ R^{1×d} for item i and W_{i′·} ∈ R^{1×d} for item i′. The inner product of two learned latent feature vectors is unlikely to be zero. Such a non-zero similarity means that any two items can be connected, which overcomes the aforementioned intransitivity limitation of the predefined similarity. With the learned similarity s^{(ℓ)}_{ii′}, a similar prediction rule can then be derived [14],

\hat{r}^{(ℓ)}_{ui} = b_i + \frac{1}{\sqrt{|I_u \setminus \{i\}|}} \sum_{i′ ∈ I_u \setminus \{i\}} s^{(ℓ)}_{ii′},   (4)

where b_i is the newly added item bias denoting the learned popularity of item i, |I_u \setminus \{i\}| is the size of the item set I_u excluding item i, and 1/\sqrt{|I_u \setminus \{i\}|} is a normalization term that makes predictions comparable across users. For


a ranking-oriented recommendation model, we usually do not include the user bias because it does not affect the ranking order for a given user. Note that a recommendation method with learned similarity is usually not a staged method like item-oriented recommendation with its separate steps of similarity calculation and neighborhood construction. Hence, we do not include the item set N_i in the prediction rule in Eq.(4), but use I_u \setminus \{i\} instead for simplicity [14].
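The transitivity argument can also be seen numerically: even when two items share no common user (so their cosine similarity in Eq.(1) is exactly zero), the learned inner product of Eq.(3) is generally non-zero. A toy NumPy sketch (our illustration, with made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 4, 3                              # toy sizes: m items, d latent dimensions
V = rng.normal(scale=0.1, size=(m, d))   # item-as-target factors V_i
W = rng.normal(scale=0.1, size=(m, d))   # item-as-context factors W_i'

# Eq.(3): learned similarity as an inner product of latent feature vectors.
s_learned = V @ W.T

# Unlike the predefined cosine similarity, s_learned[i, i'] is (almost surely)
# non-zero even for item pairs with no common user, so such items still
# contribute to the prediction rule of Eq.(4).
print(np.count_nonzero(s_learned))
```

This is exactly why every item in I_u \ {i} can contribute to the prediction in Eq.(4), regardless of co-occurrence.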

2.4. Recommendation with Pairwise Preference Assumption

For the studied problem of recommendation with implicit feedback, besides the prediction rule based on the predefined or learned similarity, another important issue is the preference assumption. A proper preference assumption is usually proposed to digest the uncertain implicit feedback from different perspectives. There are typically three types of preference assumptions: the pointwise, pairwise, and listwise preference assumptions. Empirically, the pairwise preference assumption [27] is usually more accurate than the pointwise assumption and more efficient than the listwise assumption, and is thus a good balance between accuracy and efficiency.

In pairwise preference learning [27], we usually assume that a user prefers an examined item to an unexamined one. Specifically, there are three different instantiations of this assumption, including (i) \hat{r}_{ui} > \hat{r}_{uj}, meaning a relative preference of user u for item i over item j [27]; (ii) \hat{r}_{ui} − \hat{r}_{uj} = 1, meaning the preference difference is a positive constant [11, 14, 31]; and (iii) an extension with user groups [24].
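As a tiny numerical illustration of instantiation (i) with its common logistic surrogate (our example, not from the paper): the pairwise loss −ln σ(\hat{r}_{ui} − \hat{r}_{uj}) shrinks as the examined item i is scored further above the unexamined item j:

```python
import math

def pairwise_loss(r_ui, r_uj):
    """BPR-style logistic loss -ln sigma(r_ui - r_uj) for one (u, i, j) triple."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_ui - r_uj))))

print(round(pairwise_loss(1.0, 0.0), 4))  # 0.3133 (small margin)
print(round(pairwise_loss(3.0, 0.0), 4))  # 0.0486 (larger margin -> smaller loss)
```

Minimizing this loss over sampled (u, i, j) triples is what drives the examined items above the unexamined ones in the ranking.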


3. Our Solution

3.1. Mixed Similarity Model

The predefined similarity s^{(p)}_{ii′} is known to be good at capturing local relations among items, while the learned similarity s^{(ℓ)}_{ii′} can model global relations well because of its transitivity. We propose to combine the predefined similarity and the learned similarity in a principled way in order to make full use of both. Specifically, (i) we first use the predefined similarity to augment the learned similarity, i.e., s^{(p)}_{ii′} s^{(ℓ)}_{ii′}, which is a typical way of integrating two different signals [19]; and (ii) we then further integrate the augmented similarity into the learned similarity because of the latter's superiority in preference prediction in most cases, which will also be discussed in our empirical studies. The resulting similarity is expected to capture both the local and global relations among items. We illustrate the main idea of our mixed similarity learning in Figure 1. Mathematically, we formalize our mixed similarity model via a linear combination as follows,

(p) (`)

(1 − λs )sii0 + λs sii0 sii0 ,

(5)

(p) (`)

PT

where the tradeoff parameter λs ∈ [0, 1] is the weight on the augmented similarity

sii0 sii0 . When λs = 0, it reduces to the learned similarity only [14]; when λs = 1,

CE

it is similar to the augmented similarity using auxiliary content information [19]; and when 0 < λs < 1, we have a mixed similarity model, which usually performs

AC

the best in our empirical studies. 3.2. Factored Mixed Similarity Model In order to overcome the limitation of intransitivity, we follow the seminal

work [14] and use two latent feature vectors to represent each learned similarity, i.e., s^{(ℓ)}_{ii′} = V_{i·} W^T_{i′·}, and obtain a factored version of the mixed similarity,

s^{(m)}_{ii′} = ((1 − λ_s) + λ_s s^{(p)}_{ii′}) V_{i·} W^T_{i′·}.   (6)

[Figure 1: Illustration of mixed similarity learning for recommendation with implicit feedback.]

With the factored mixed similarity, we can estimate the preference of user u on item i in a similar way to that of [14], i.e., \hat{r}^{(m)}_{ui} = b_i + \frac{1}{\sqrt{|I_u \setminus \{i\}|}} \sum_{i′ ∈ I_u \setminus \{i\}} s^{(m)}_{ii′} for an examined item i ∈ I_u, and \hat{r}^{(m)}_{uj} = b_j + \frac{1}{\sqrt{|I_u|}} \sum_{i′ ∈ I_u} s^{(m)}_{ji′} for an unexamined item j ∉ I_u. We can see that the difference between the prediction rule of \hat{r}^{(ℓ)}_{ui} in FISM [14] as shown in Eq.(4) and ours is the mixed similarity s^{(m)}_{ii′}. Hence, we call our similarity model the factored mixed similarity model (FMSM).
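To make Eq.(6) concrete: the mixing only rescales each learned term V_{i·}W^T_{i′·} by the coefficient (1 − λ_s) + λ_s s^{(p)}_{ii′}. A minimal NumPy sketch of FMSM scoring (toy sizes and a made-up predefined-similarity matrix; this is our illustration, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 4, 3                                   # toy sizes: m items, d latent factors
V = rng.normal(scale=0.1, size=(m, d))        # item-as-target factors V_i
W = rng.normal(scale=0.1, size=(m, d))        # item-as-context factors W_i'
b = np.zeros(m)                               # item biases
Sp = np.array([[0.0, 0.5, 0.5, 0.0],          # toy predefined (cosine) similarities
               [0.5, 0.0, 0.5, 0.7],
               [0.5, 0.5, 0.0, 0.7],
               [0.0, 0.7, 0.7, 0.0]])

def predict_fmsm(examined, i, lam):
    """Eq.(6): each term V_i . W_i' is rescaled by (1 - lam) + lam * s^(p)_{ii'}."""
    ctx = [k for k in examined if k != i]                 # I_u \ {i}
    coef = (1 - lam) + lam * Sp[i, ctx]
    return b[i] + (coef * (W[ctx] @ V[i])).sum() / np.sqrt(len(ctx))

# With lam = 0 this reduces to the purely learned-similarity rule of Eq.(4).
score = predict_fmsm([0, 2, 3], 1, lam=0.4)
```

The design choice is visible here: the predefined similarity acts as a data-dependent gate on the learned similarity, rather than as an additive correction.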

3.3. Pairwise Factored Mixed Similarity Model

Furthermore, in order to develop a recommendation algorithm, we adopt the well-known pairwise preference assumption [27], and assume that the preference of an examined item i ∈ I_u is larger than that of an unexamined item j ∉ I_u, i.e., \hat{r}_{ui} > \hat{r}_{uj}. For this reason, we call our algorithm the pairwise factored mixed similarity model (P-FMSM). With such a pairwise preference assumption, we reach the objective function of our P-FMSM,

\min_{Θ} \sum_{u ∈ U} \sum_{i ∈ I_u} \sum_{j ∈ I \setminus I_u} f^{(m)}_{uij},   (7)

where f^{(m)}_{uij} = −\ln σ(\hat{r}^{(m)}_{uij}) + \frac{α}{2}‖V_{i·}‖^2 + \frac{α}{2}‖V_{j·}‖^2 + \frac{α}{2} \sum_{i′ ∈ I_u} ‖W_{i′·}‖^2_F + \frac{α}{2}‖b_i‖^2 + \frac{α}{2}‖b_j‖^2 is a tentative objective function for the sampled triple (u, i, j), with \hat{r}^{(m)}_{uij} = \hat{r}^{(m)}_{ui} − \hat{r}^{(m)}_{uj}, and Θ = {W_{i·}, V_{i·}, b_i, i = 1, 2, . . . , m} denotes the set of parameters to be learned. Note that although the objective function of P-FMSM as shown in Eq.(7) looks very similar to that of BPR [27], the prediction rule is totally different because of the proposed factored mixed similarity model. As discussed before, the similarity measurement is a very fundamental issue, and it is also our focus in this paper.

3.4. Learning the P-FMSM

For learning the model parameters θ ∈ Θ in Eq.(7), we use the popular stochastic gradient descent (SGD) algorithmic framework [27], which is described in Figure 2. Note that the P-FMSM algorithm as shown in Figure 2 refers to the step of "Learn Parameters" in the middle part of Figure 1. Specifically, our algorithm contains two loops, where the outer loop is over the number of scans, and the inner loop is over one scan of the whole set of feedback. For each iteration of the inner loop, we first randomly sample an examined (user, item) pair (u, i) ∈ R and then sample an unexamined item j ∈ I \setminus I_u, which is a typical sampling strategy in recommendation methods based on the pairwise preference assumption [27]. Once a triple (u, i, j) is sampled, we can calculate the gradients and use them to update the model parameters via θ ← θ − γ∇θ, with γ > 0 as the learning rate.

1: Initialize the model parameters Θ
2: for t = 1, . . . , T do
3:   for t2 = 1, . . . , |R| do
4:     Randomly pick a pair (u, i) ∈ R
5:     Randomly pick an item j from I \setminus I_u
6:     Calculate the gradients ∇θ, θ ∈ Θ
7:     Update the model parameters, θ ← θ − γ∇θ, θ ∈ Θ
8:   end for
9: end for

Figure 2: The SGD algorithm for P-FMSM.

The gradients of the model parameters, ∇θ = ∂f^{(m)}_{uij}/∂θ, are as follows,

∇b_i = −σ(−\hat{r}^{(m)}_{uij}) + α b_i,   (8)

∇b_j = −σ(−\hat{r}^{(m)}_{uij})(−1) + α b_j,   (9)

∇V_{i·} = −σ(−\hat{r}^{(m)}_{uij}) \frac{1}{\sqrt{|I_u \setminus \{i\}|}} \sum_{i′ ∈ I_u \setminus \{i\}} ((1 − λ_s) + λ_s s^{(p)}_{ii′}) W_{i′·} + α V_{i·},   (10)

∇V_{j·} = −σ(−\hat{r}^{(m)}_{uij}) \Big(−\frac{1}{\sqrt{|I_u|}} \sum_{i′ ∈ I_u} ((1 − λ_s) + λ_s s^{(p)}_{ji′}) W_{i′·}\Big) + α V_{j·},   (11)

∇W_{i′·} = −σ(−\hat{r}^{(m)}_{uij}) \Big(((1 − λ_s) + λ_s s^{(p)}_{ii′}) \frac{V_{i·}}{\sqrt{|I_u \setminus \{i\}|}} − ((1 − λ_s) + λ_s s^{(p)}_{ji′}) \frac{V_{j·}}{\sqrt{|I_u|}}\Big) + α W_{i′·},  i′ ∈ I_u \setminus \{i\},   (12)

∇W_{i·} = −σ(−\hat{r}^{(m)}_{uij}) \frac{−((1 − λ_s) + λ_s s^{(p)}_{ji}) V_{j·}}{\sqrt{|I_u|}} + α W_{i·},   (13)

where \hat{r}^{(m)}_{uij} = \hat{r}^{(m)}_{ui} − \hat{r}^{(m)}_{uj}.

From the algorithm in Figure 2, we can see that the time complexity of our P-FMSM is O(T|R|dρ), while that of BPR is O(T|R|d), where ρ denotes the average number of examined items by each user.


4. Experimental Results

4.1. Datasets

For direct comparative empirical studies, we use four public datasets¹, where the training and test records of each dataset are obtained via a random split of the converted implicit feedback [24]. Specifically, the first dataset, MovieLens100K, contains 943 users, 1682 items, 27688 training records and 27687 test records; the second dataset, MovieLens1M, contains 6040 users, 3952 items, 287641 training records and 287640 test records; the third dataset, UserTag, contains 3000 users, 2000 items, 123218 training records and 123218 test records; and the fourth dataset, Netflix5K5K, contains 5000 users, 5000 items, 77936 training records and 77936 test records. Each dataset contains three copies of training and test records. For the first copy of each dataset, we construct validation data for parameter tuning via sampling n (i.e., the number of users) records from the corresponding training records. We put the statistics of the first copy of each dataset in Table 2.

¹ https://sites.google.com/site/weikep/GBPRdata.zip

4.2. Evaluation Metrics

For recommendation with implicit feedback, a user is usually only interested in a few top-ranked items (say 5) in the recommendation list rather than in their predicted scores [3]. We thus use two commonly adopted ranking-oriented evaluation metrics in recommendation and information retrieval [24], i.e., Prec@5 and NDCG@5, for precision and normalized discounted cumulative gain (NDCG) at position 5, respectively.


Table 2: Statistics of the data used in the experiments, including the numbers of users (|U|), items (|I|), and implicit feedback records (|R|) in the training (Tr.) and test (Te.) sets, and the densities (|R|/|U|/|I|) of the training records.

Data set         |U|     |I|     |R| (Tr.)   |R| (Te.)   |R|/|U|/|I| (Tr.)
MovieLens100K    943     1682    27688       27687       1.75%
MovieLens1M      6040    3952    287641      287640      1.21%
UserTag          3000    2000    123218      123218      2.05%
Netflix5K5K      5000    5000    77936       77936       0.31%

ED

M

Mathematically, P rec@5 and N DCG@5 are defined as follows [26], " # 5 X 1X 1 P re@5 = δ(Lu (p) ∈ Iute ) , |U te | 5 p=1 u∈U te " # u 5 X 1 2δ(Lu (p)∈Ite ) − 1 1 X , N DCG@5 = Pmin(5,|Iute |) 1 |U te | log(p + 1) te p=1 p=1 u∈U

log(p+1)

where Lu (p) is the pth item for user u in the recommendation list, and δ(x) is an

PT

indicator function with value of 1 if x is true and 0 otherwise. Note that U te and Iute denote the set of test users and the set of examined items by user u in the test
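These two metrics for a single user can be computed directly from the definitions (our own sketch; the log base cancels in the NDCG ratio, so base 2 is used below without loss of generality):

```python
import numpy as np

def prec_ndcg_at_5(ranked, test_items):
    """Prec@5 and NDCG@5 for one user: `ranked` is the recommendation list,
    `test_items` the set of items the user examined in the test data."""
    hits = [1.0 if ranked[p] in test_items else 0.0 for p in range(5)]
    prec = sum(hits) / 5.0
    dcg = sum(h / np.log2(p + 2) for p, h in enumerate(hits))        # p is 0-indexed here
    idcg = sum(1.0 / np.log2(p + 2) for p in range(min(5, len(test_items))))
    return prec, dcg / idcg

p5, n5 = prec_ndcg_at_5([3, 7, 1, 9, 4], {7, 9, 2})
print(round(p5, 4), round(n5, 4))  # 0.4 0.4982
```

The reported numbers are then the averages of these per-user values over all test users.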

CE

data, respectively.

AC

4.3. Baselines and Parameter Settings In order to study the effectiveness of the developed algorithm with the de-

signed mixed similarity model in the pairwise preference learning framework, we choose the state-of-the-art methods based on the predefined similarity, the learned

ACCEPTED MANUSCRIPT

similarity or the pairwise preference assumption. Specifically, we include the following baselines:

ommendation with implicit feedback;

CR IP T

• Ranking based on items’ popularity (PopRank) is a basic algorithm for rec-

• Item-oriented collaborative filtering with Cosine similarity (ICF) [5] is a classical algorithm based on the predefined similarity;

AN US

• ICF with an amplifier on the similarity (ICF-A) [2] favors the items with higher similarity;

• Factored item similarity model with AUC loss (FISMauc) and RMSE loss

(FISMrmse) [14] are two seminal algorithms based on the learned similar-

M

ity;

• Hierarchical poisson factorization (HPF) [8] is a recent Bayesian approach

ED

tailed for modeling implicit feedback in recommender systems;

• Bayesian personalized ranking (BPR) [27] is the first work that introduces

PT

the pairwise preference assumption into the task of recommendation with

CE

implicit feedback, which performs among the best in most cases; • Adaptive K nearest neighbors based recommendation (A-KNN) [27] gen-

AC

eralizes the pairwise preference assumption to ICF with similarity learning.

However, the learned similarity in A-KNN is not factorized, which thus has the same limitation of intransitivity. For fair comparison, we implement a factored version of adaptive KNN (FA-KNN);

ACCEPTED MANUSCRIPT

• and Group preference based BPR (GBPR) [24] is an extended pairwise preference learning method considering user groups.

For ICF and ICF-A, we use the Cosine similarity and |N_i| = 20 in neighborhood construction. For the amplifier in ICF-A, we use 2.5 as a typical value [2]. For the factorization-based methods, i.e., FISMauc, FISMrmse, BPR, FA-KNN, GBPR and our P-FMSM, we use d = 20 latent features, and search the tradeoff parameter on the regularization terms α ∈ {0.001, 0.01, 0.1} and the iteration number T ∈ {100, 500, 1000} using NDCG@5. For the Bayesian method HPF, we use the public code² with its default parameter configuration, i.e., 20 for the number of latent dimensions, 10 for the iteration-number multiple, 0.3 for the hyperparameter, and true for bias and hierarchy. The tradeoff parameter λ_s for the mixed similarity in P-FMSM is also searched via NDCG@5 from {0.2, 0.4, 0.6, 0.8, 1}. The learning rate γ is fixed as 0.01. The results of GBPR are copied from [24]. Note that the sampling strategy of BPR and P-FMSM in our empirical studies is the one shown in Figure 2, which is more efficient than that of BPR and GBPR in [24].

² https://github.com/premgopalan/hgaprec
³ We calculate the p-value of the statistical significance test via the MATLAB function ttest2.m, as documented at http://www.mathworks.com/help/stats/ttest2.html

4.4. Experimental Results

4.4.1. Overall Results

The main experimental results are shown in Tables 3, 4, 5 and 6. We can make the following observations: (i) our proposed P-FMSM performs significantly better than all the other methods in all cases (the p-value of the significance test is smaller than 0.01)³, which clearly shows the advantage of integrating the predefined and


learned similarities in the pairwise preference learning framework; (ii) the recommendation methods based on the learned similarity (e.g., FISMauc, FISMrmse, FA-KNN and our P-FMSM) usually perform better than those based on the predefined similarity (i.e., ICF and ICF-A), which shows the helpfulness of similarity learning for a recommendation algorithm; (iii) the recommendation methods based on the pairwise preference assumption (e.g., BPR, FA-KNN, GBPR and our P-FMSM) perform well, which shows its effectiveness for the uncertain implicit feedback; and (iv) the very basic method PopRank usually performs worst, as expected, since it is a non-personalized solution and recommends the same set of most popular items to all users.

4.4.2. Effect of Mixed Similarity

In this subsection, we study the effect of the mixed similarity in P-FMSM. For a direct comparison, we fix λ_s = 0 and obtain a reduced model, P-FISM, with the unmixed learned similarity only. We show the results in Figure 3, and can see that P-FMSM is better than P-FISM across all four datasets, which clearly shows the merit of the mixed similarity via exploiting the complementarity of the predefined similarity and the learned similarity in the pairwise preference learning framework.

4.4.3. Effect of Neighborhood Size

In this subsection, we study the effect of using different neighborhood sizes, i.e., K ∈ {20, 30, 40, 50}, in P-FMSM, instead of using all the neighbors by default. For a fair comparison, we also use the same numbers of neighbors in ICF. The recommendation performance on Prec@5 and NDCG@5 is shown in Figure 4 and Figure 5, respectively. We can make the following observations: (i) both

Table 3: Recommendation performance of P-FMSM and other methods on MovieLens100K using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                    Prec@5           NDCG@5
PopRank                                   0.2724±0.0094    0.2915±0.0072
ICF                                       0.3145±0.0018    0.3305±0.0010
ICF-A                                     0.2820±0.0048    0.2943±0.0091
FISMauc (α = 0.01, T = 100)               0.3651±0.0086    0.3788±0.0139
FISMrmse (α = 0.001, T = 100)             0.3987±0.0085    0.4153±0.0092
HPF                                       0.3412±0.0077    0.3523±0.0097
BPR (α = 0.1, T = 1000)                   0.3627±0.0079    0.3779±0.0079
FA-KNN (α = 0.1, T = 1000)                0.3679±0.0044    0.3868±0.0076
GBPR                                      0.4051±0.0038    0.4201±0.0031
P-FMSM (α = 0.001, T = 500, λ_s = 0.4)    0.4115±0.0059    0.4320±0.0054

Table 4: Recommendation performance of P-FMSM and other methods on MovieLens1M using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                    Prec@5           NDCG@5
PopRank                                   0.2822±0.0019    0.2935±0.0010
ICF                                       0.3831±0.0021    0.3966±0.0020
ICF-A                                     0.3921±0.0016    0.4066±0.0022
FISMauc (α = 0.001, T = 100)              0.3502±0.0069    0.3569±0.0082
FISMrmse (α = 0.001, T = 100)             0.4227±0.0006    0.4366±0.0011
HPF                                       0.4131±0.0080    0.4262±0.0096
BPR (α = 0.01, T = 1000)                  0.4195±0.0013    0.4307±0.0008
FA-KNN (α = 0.01, T = 100)                0.3650±0.0026    0.3808±0.0017
GBPR                                      0.4494±0.0020    0.4636±0.0014
P-FMSM (α = 0.001, T = 500, λ_s = 0.4)    0.4562±0.0037    0.4731±0.0038

Table 5: Recommendation performance of P-FMSM and other methods on UserTag using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                    Prec@5           NDCG@5
PopRank                                   0.2647±0.0012    0.2730±0.0014
ICF                                       0.2257±0.0051    0.2306±0.0045
ICF-A                                     0.2480±0.0028    0.2540±0.0032
FISMauc (α = 0.1, T = 500)                0.2536±0.0031    0.2619±0.0037
FISMrmse (α = 0.001, T = 100)             0.3006±0.0046    0.3092±0.0049
HPF                                       0.2684±0.0070    0.2756±0.0074
BPR (α = 0.1, T = 500)                    0.2849±0.0036    0.2931±0.0047
FA-KNN (α = 0.1, T = 1000)                0.2641±0.0046    0.2720±0.0036
GBPR                                      0.3011±0.0008    0.3104±0.0009
P-FMSM (α = 0.001, T = 500, λ_s = 0.6)    0.3129±0.0026    0.3218±0.0030

Table 6: Recommendation performance of P-FMSM and other methods on Netflix5K5K using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                     Prec@5           NDCG@5
PopRank                                    0.1728±0.0012    0.1794±0.0004
ICF                                        0.2017±0.0012    0.2186±0.0008
ICF-A                                      0.1653±0.0032    0.1804±0.0041
FISMauc (α = 0.01, T = 500)                0.1932±0.0039    0.2022±0.0023
FISMrmse (α = 0.001, T = 100)              0.2217±0.0043    0.2413±0.0063
HPF                                        0.1899±0.0047    0.2044±0.0041
BPR (α = 0.01, T = 1000)                   0.2207±0.0051    0.2374±0.0055
FA-KNN (α = 0.001, T = 100)                0.2101±0.0020    0.2254±0.0029
GBPR                                       0.2411±0.0027    0.2611±0.0025
P-FMSM (α = 0.001, T = 1000, λ_s = 0.2)    0.2469±0.0027    0.2661±0.0027

[Figure 3: Recommendation performance (Prec@5 and NDCG@5) of P-FISM (i.e., with the learned similarity only, λ_s = 0) and P-FMSM on the four datasets. Note that the searched values of P-FISM are α = 0.01, T = 1000 on MovieLens100K; α = 0.001, T = 500 on MovieLens1M; α = 0.001, T = 100 on UserTag; and α = 0.001, T = 500 on Netflix5K5K.]

P-FMSM and P-FMSM (with different K) achieve significantly better results than ICF, by about 23% on MovieLens100K, 10% on MovieLens1M, 28% on UserTag, and 15% on Netflix5K5K w.r.t. the corresponding neighborhood sizes K ∈ {20, 30, 40, 50}, which shows the effectiveness of the developed algorithm; and (ii) the performance of P-FMSM and P-FMSM (with different K) is close, which shows that we can use a properly sized neighborhood instead of all the neighbors to achieve low space complexity. This is an appealing property for real deployment by industry practitioners.

The above experimental results show that our P-FMSM is significantly more accurate than the state-of-the-art recommendation methods for the studied problem, which demonstrates the effectiveness of the mixed similarity learning model via integrating the predefined similarity, the learned similarity, and pairwise preference learning in one single algorithm. One major potential limitation is its effectiveness in an extremely sparse case, because neither the predefined similarity nor the learned similarity can be guaranteed to bridge two latently related items well.

[Figure 4: Recommendation performance on Prec@5 using ICF and P-FMSM with different values of K (i.e., different neighborhood sizes), and the default P-FMSM with all neighbors, on MovieLens100K, MovieLens1M, UserTag and Netflix5K5K.]

Figure 5: Recommendation performance of NDCG@5 using ICF and P-FMSM with different values of K (i.e., different neighborhood sizes), and the default P-FMSM with all the neighbors, on MovieLens100K, MovieLens1M, UserTag and Netflix5K5K.

5. Related Work

In the big data era, the information overload problem has become unavoidable for almost everyone. We are immersed in a world with various types of information, from social updates to recent scientific and technological advances. So far, there are mainly two approaches that address the information overload problem to some extent: information retrieval [4] and intelligent recommendation [28]. The paradigm of information retrieval goes from a user-issued query to a ranked list of documents w.r.t. a certain relevance metric, which needs proactive involvement of the user. On the contrary, an intelligent recommendation engine exploits users' historical behavior and then generates a personalized ranked list of items (e.g., web pages [20], tags [1], travel sequences [13] and many others [18]) for a target user or a group of target users [34] without intensive user involvement.

Technically, there are mainly two types of recommendation methods, namely neighborhood-based recommendation and factorization-based recommendation, besides the very basic approach of recommending items by popularity (i.e., PopRank). We discuss these two types of methods separately in the subsequent two subsections.

5.1. Neighborhood-based Recommendation

There are usually two types of user feedback in a typical recommendation system [28, 12]: explicit feedback such as 5-star ratings, and implicit feedback such as examination records. For recommendation with either explicit or implicit feedback, neighborhood construction is always a very critical step [21], which requires some similarity measurement between two users or two items.
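For illustration (our own sketch, not part of the original paper), two widely used implicit-feedback similarity measures between items, the cosine similarity and the Jaccard index, can be computed directly from the sets of users who examined each item:

```python
# Illustration only: item-item similarities over binary implicit feedback,
# with each item represented by the set of users who examined it.

def cosine_sim(users_a, users_b):
    """Cosine similarity of two binary feedback vectors given as user-ID sets."""
    if not users_a or not users_b:
        return 0.0
    inter = len(users_a & users_b)
    return inter / ((len(users_a) ** 0.5) * (len(users_b) ** 0.5))

def jaccard_sim(users_a, users_b):
    """Jaccard index: intersection over union of the two user-ID sets."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0

users_i = {1, 2, 3, 4}  # users who examined item i (toy data)
users_j = {3, 4, 5}     # users who examined item j (toy data)

print(cosine_sim(users_i, users_j))   # 2 / (sqrt(4) * sqrt(3)) ≈ 0.577
print(jaccard_sim(users_i, users_j))  # 2 / 5 = 0.4
```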

For explicit feedback, the most well-known similarity measure is probably the Pearson correlation coefficient (PCC) [7, 9], defined on the items co-rated by two users or the users who co-rate two items. For implicit feedback, we have similarities such as the cosine similarity, the Jaccard index and their extensions [5, 33]. Note that the similarity between two users (or two items), such as the cosine similarity, may be further amplified to favor users (or items) with high similarity [2].

5.2. Factorization-based Recommendation

Factorization-based methods have been well recognized as the state-of-the-art methods in most recommendation problems. For explicit feedback, a typical factorization-based method approximates each feedback, e.g., a numerical rating, with the inner product of two latent feature vectors [29], or with additional feature vectors of implicitly examined items [15]. There are also some factorization-based methods that directly optimize the ranking of the relative orderings derived from explicit feedback [35].
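The inner-product prediction rule for explicit feedback can be sketched as follows (a minimal illustration with toy sizes and our own helper names, not the exact implementation of any cited system): a rating is approximated by the inner product of two latent feature vectors, trained with SGD on the square loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 5, 7, 3                 # d = latent dimensionality
U = rng.normal(scale=0.1, size=(n_users, d))  # user latent feature vectors
V = rng.normal(scale=0.1, size=(n_items, d))  # item latent feature vectors

def predict(u, i):
    """Predicted rating of user u on item i: inner product of latent vectors."""
    return float(U[u] @ V[i])

def sgd_step(u, i, r_ui, lr=0.1, reg=0.01):
    """One SGD step on (r_ui - U_u . V_i)^2 plus L2 regularization."""
    err = r_ui - predict(u, i)
    u_old = U[u].copy()                       # cache before the in-place update
    U[u] += lr * (err * V[i] - reg * U[u])
    V[i] += lr * (err * u_old - reg * V[i])
    return err

for _ in range(200):                          # fit a single observed rating of 4.0
    sgd_step(0, 0, 4.0)
print(round(predict(0, 0), 2))                # close to 4.0 after training
```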

For implicit feedback, a factorization-based method is usually associated with a preference assumption, including the pointwise [10, 14, 22], pairwise [14, 27, 36, 24], and listwise [30] preference assumptions. A specific preference assumption is usually associated with a loss to be minimized or an objective to be maximized, such as the square loss [10, 22] for the pointwise assumption; the logistic loss [27], the hinge loss [36] and the square loss of the relative preference difference [14] for the pairwise assumption; and the objective of mean reciprocal rank (MRR) [30] for the listwise assumption. For the prediction rule in a typical loss or objective function, there are usually two choices: the inner product of two latent feature vectors [10, 22, 27, 30, 36], or the prediction rule of a neighborhood-based method [14]. Besides, a recent approach digests implicit user feedback in a Bayesian framework [8] instead of the well-known matrix factorization framework.
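The pairwise assumption with the logistic loss can be sketched as follows (our own minimal code in the style of BPR [27], not the authors' implementation): for a user u, an observed item i should score higher than an unobserved item j, with scores given by the inner-product prediction rule.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, d = 4, 6, 3
U = rng.normal(scale=0.1, size=(n_users, d))  # user latent feature vectors
V = rng.normal(scale=0.1, size=(n_items, d))  # item latent feature vectors

def bpr_step(u, i, j, lr=0.05, reg=0.01):
    """One SGD step on -ln sigmoid(x_ui - x_uj), with x the inner-product score."""
    x_uij = float(U[u] @ (V[i] - V[j]))
    g = 1.0 / (1.0 + np.exp(x_uij))           # sigmoid(-x_uij), the gradient weight
    u_old = U[u].copy()
    U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (g * u_old - reg * V[i])
    V[j] += lr * (-g * u_old - reg * V[j])

for _ in range(300):                          # user 0 prefers item 1 over item 2
    bpr_step(0, 1, 2)
print(float(U[0] @ V[1]) > float(U[0] @ V[2]))  # the pairwise order is learned
```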

Similarity between two items or two users is a fundamental issue in recommendation. Our mixed similarity learning model goes one step beyond the traditional unmixed similarities, i.e., the predefined similarity or the learned similarity alone, and inherits the merits of both. The designed mixed similarity learning model is also generic and can be applied to recommendation settings with heterogeneous feedback [25] and social connections [32].

We summarize some related works in Table 7, from which we can see that our proposed mixed similarity learning model is a novel solution for recommendation with implicit feedback.

Table 7: Summary of some related works.

                    Neighborhood-based recommendation      Factorization-based recommendation
Explicit feedback   predefined similarity: e.g., [7]       w/o similarity: e.g., [29, 35]
                                                           learned similarity: e.g., [15]
                                                           mixed similarity: N/A
Implicit feedback   predefined similarity: e.g., [2, 5]    w/o similarity: e.g., [8, 10, 22, 24, 27, 30, 36]
                                                           learned similarity: e.g., [14]
                                                           mixed similarity: e.g., P-FMSM

6. Conclusions and Future Work

In this paper, we study an important recommendation problem with implicit feedback from the perspective of item similarity. Specifically, we propose a novel mixed similarity learning model that exploits the complementarity of the predefined similarity and the learned similarity commonly used in state-of-the-art recommendation methods. As far as we know, we are the first to study the integration of these two canonical approaches for bridging two items in a principled way. With the mixed similarity, we further develop a novel recommendation algorithm in the pairwise preference learning framework, i.e., the pairwise factored mixed similarity model (P-FMSM). Our P-FMSM recommends significantly better than the state-of-the-art methods with the predefined similarity or the learned similarity.
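The mixed-similarity idea can be sketched schematically as follows. The blending below is our own hypothetical illustration of combining a predefined similarity with a learned one, NOT the exact P-FMSM formulation; the names `S_pre`, `mixed_similarity` and the weighting scheme are assumptions for exposition only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, d = 5, 3
V = rng.normal(scale=0.1, size=(n_items, d))   # learned item latent features

S_pre = rng.random((n_items, n_items))         # placeholder predefined similarity
S_pre = (S_pre + S_pre.T) / 2                  # symmetrize
np.fill_diagonal(S_pre, 0.0)

def mixed_similarity(i, j, lam=0.5):
    """Blend a predefined similarity with a learned (latent inner-product)
    similarity; lam plays a trade-off role akin to the paper's lambda_s,
    and lam = 0 falls back to the learned similarity alone."""
    learned = float(V[i] @ V[j])
    return lam * S_pre[i, j] + (1.0 - lam) * learned

print(mixed_similarity(0, 1, lam=0.0) == float(V[0] @ V[1]))  # True
```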

For future work, we are interested in studying the mixed similarity model in an extremely sparse case, generalizing it to cross-domain problem settings with heterogeneous auxiliary data of text and temporal information [16, 23], and deploying it in a cloud computing environment [17].

Acknowledgments

We thank the handling editor and the reviewers for their expert comments and constructive suggestions. We thank the support of the National Natural Science Foundation of China (No. 61502307 and No. 61672358) and the Natural Science Foundation of Guangdong Province (No. 2014A030310268 and No. 2016A030313038).

References

[1] Fabiano M. Belém, Carolina S. Batista, Rodrygo L. T. Santos, Jussara M. Almeida, and Marcos A. Gonçalves. Beyond relevance: Explicitly promoting novelty and diversity in tag recommendation. ACM Transactions on Intelligent Systems and Technology, 7(3):26:1–26:34, 2016.

[2] John S. Breese, David Heckerman, and Carl Myers Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, UAI '98, pages 43–52, 1998.

[3] Li Chen and Pearl Pu. Users' eye gaze pattern in organization-based recommender interfaces. In Proceedings of the 16th International Conference on Intelligent User Interfaces, IUI '11, pages 311–314, 2011.

[4] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

[5] Mukund Deshpande and George Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 2004.

[6] Asmaa Elbadrawy and George Karypis. User-specific feature-based similarity models for top-N recommendation of new items. ACM Transactions on Intelligent Systems and Technology, 6(3):57:1–57:22, 2015.

[7] David Goldberg, David Nichols, Brian M. Oki, and Douglas Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61–70, 1992.

[8] Prem Gopalan, Jake M. Hofman, and David M. Blei. Scalable recommendation with hierarchical Poisson factorization. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, UAI '15, pages 326–335, 2015.

[9] Guibing Guo, Jie Zhang, and Neil Yorke-Smith. A novel Bayesian similarity measure for recommender systems. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, IJCAI '13, pages 2619–2625, 2013.

[10] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th IEEE International Conference on Data Mining, ICDM '08, pages 263–272, 2008.

[11] Michael Jahrer and Andreas Töscher. Collaborative filtering ensemble for ranking. In Proceedings of KDD Cup 2011 Competition, pages 153–167, 2012.

[12] Gawesh Jawaheer, Peter Weller, and Patty Kostkova. Modeling user preferences in recommender systems: A classification framework for explicit and implicit user feedback. ACM Transactions on Interactive Intelligent Systems, 4(2):8:1–8:26, 2014.

[13] Shuhui Jiang, Xueming Qian, Tao Mei, and Yun Fu. Personalized travel sequence recommendation on multi-source big social media. IEEE Transactions on Big Data, 2(1):43–56, 2016.

[14] Santosh Kabbur, Xia Ning, and George Karypis. FISM: Factored item similarity models for top-N recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 659–667, 2013.

[15] Yehuda Koren. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 426–434, 2008.

[16] Jianqiang Li, Jing Li, Xianghua Fu, M. A. Masud, and Joshua Zhexue Huang. Learning distributed word representation with multi-contextual mixed embedding. Knowledge-Based Systems, 106:220–230, 2016.

[17] Jiayin Li, Meikang Qiu, Zhong Ming, Gang Quan, Xiao Qin, and Zonghua Gu. Online optimization for scheduling preemptable tasks on IaaS cloud systems. Journal of Parallel and Distributed Computing, 72(5):666–677, 2012.

[18] Jie Lu, Dianshuang Wu, Mingsong Mao, Wei Wang, and Guangquan Zhang. Recommender system application developments: A survey. Decision Support Systems, 74:12–32, 2015.

[19] Zhongqi Lu, Zhicheng Dou, Jianxun Lian, Xing Xie, and Qiang Yang. Content-based collaborative filtering for news topic recommendation. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI '15, pages 217–223, 2015.

[20] Thi Thanh Sang Nguyen, Haiyan Lu, and Jie Lu. Web-page recommendation based on web usage and domain knowledge. IEEE Transactions on Knowledge and Data Engineering, 26(10):2574–2587, 2014.

[21] Xia Ning, Christian Desrosiers, and George Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook (Second Edition), pages 37–76. 2015.

[22] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. One-class collaborative filtering. In Proceedings of the 8th IEEE International Conference on Data Mining, ICDM '08, pages 502–511, 2008.

[23] Weike Pan. A survey of transfer learning for collaborative recommendation with auxiliary data. Neurocomputing, 177:447–453, 2016.

[24] Weike Pan and Li Chen. GBPR: Group preference based Bayesian personalized ranking for one-class collaborative filtering. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, IJCAI '13, pages 2691–2697, 2013.

[25] Weike Pan, Shanchuan Xia, Zhuode Liu, Xiaogang Peng, and Zhong Ming. Mixed factorization for collaborative recommendation with heterogeneous explicit feedbacks. Information Sciences, 332:84–93, 2016.

[26] Weike Pan, Hao Zhong, Congfu Xu, and Zhong Ming. Adaptive Bayesian personalized ranking for heterogeneous implicit feedbacks. Knowledge-Based Systems, 73:173–180, 2015.

[27] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI '09, pages 452–461, 2009.

[28] Francesco Ricci, Lior Rokach, and Bracha Shapira. Recommender Systems Handbook (Second Edition). Springer, 2015.

[29] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Annual Conference on Neural Information Processing Systems 20, NIPS '08, pages 1257–1264, 2008.

[30] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Nuria Oliver, and Alan Hanjalic. CLiMF: Learning to maximize reciprocal rank with collaborative less-is-more filtering. In Proceedings of the 6th ACM Conference on Recommender Systems, RecSys '12, pages 139–146, 2012.

[31] Gábor Takács and Domonkos Tikk. Alternating least squares for personalized ranking. In Proceedings of the 6th ACM Conference on Recommender Systems, RecSys '12, pages 83–90, 2012.

[32] Jiliang Tang, Xia Hu, and Huan Liu. Social recommendation: A review. Social Network Analysis and Mining, 3(4):1113–1133, 2013.

[33] Koen Verstrepen and Bart Goethals. Unifying nearest neighbors collaborative filtering. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys '14, pages 177–184, 2014.

[34] Wei Wang, Guangquan Zhang, and Jie Lu. Member contribution-based group recommender system. Decision Support Systems, 87:80–93, 2016.

[35] Markus Weimer, Alexandros Karatzoglou, and Alexander J. Smola. Improving maximum margin matrix factorization. Machine Learning, 72(3):263–276, 2008.

[36] Shuang-Hong Yang, Bo Long, Alexander J. Smola, Hongyuan Zha, and Zhaohui Zheng. Collaborative competitive filtering: Learning recommender using context of user choice. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '11, pages 295–304, 2011.