User preference modeling based on meta paths and diversity regularization in heterogeneous information networks


Hongzhi Liu a, Zhengshen Jiang b, Yang Song c, Tao Zhang c, Zhonghai Wu d



a School of Software and Microelectronics, Peking University, Beijing, 102600, PR China
b School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, PR China
c BossZhipin NLP, Beijing, 100028, PR China
d National Engineering Center of Software Engineering, Peking University, Beijing, 100871, PR China

Article history: Received 17 September 2018; Received in revised form 16 May 2019; Accepted 16 May 2019; Available online 21 May 2019

Keywords: Recommender systems; Heterogeneous information network; Ensemble learning; Meta path; User preference

Abstract

Recommendation methods based on heterogeneous information networks (HINs) have been attracting increased attention recently. Meta paths in HINs represent different types of semantic relationships. Meta path-based recommendation methods aim to use meta paths in HINs to evaluate the similarity or relevancy between nodes to make recommendations. In previous work, the meta paths have usually been selected manually (based on experience), and the path weight optimization methods usually suffer from overfitting. To solve these problems, we propose to automatically select and combine the meta paths through weight optimization. Diversity is introduced into the objective function as a regularization term to avoid overfitting. Inspired by the ambiguity decomposition theory in ensemble learning, we present a new diversity measure and use it to encourage diversity among meta paths to improve recommendation performance. Experimental results on item recommendation and tag recommendation tasks confirm the effectiveness of the proposed method compared with traditional collaborative filtering and state-of-the-art HIN-based recommendation methods.

1. Introduction

1.1. Background

Recommender systems have been used to provide users with recommendations of products or services in various domains, such as e-commerce platforms and social networks [1,2]. Commonly used recommendation approaches fall into three categories: content-based filtering, collaborative filtering, and hybrid approaches [3,4]. As a type of hybrid approach, heterogeneous information network (HIN)-based recommendation methods have been attracting increased attention recently. HINs are logical networks involving multiple-typed objects and multiple-typed links denoting different relations [5]. For instance, Fig. 1 illustrates an example of an HIN, which contains five types of objects: users, movies, genres, actors, and directors, together with the relationships between them.

Fig. 1. An example of a heterogeneous information network and its corresponding network schema.

HIN-based approaches combine user feedback data with additional information, such as item or user attributes and relationships. Therefore, they can emulate collaborative filtering, content-based filtering, context-aware recommendation, and combinations of these recommendation semantics [6].


In this paper, we focus on the common scenario in which feedback is implicit (for example, clicks, purchases, or likes). Compared with explicit feedback data (for example, 5-star graded ratings), implicit feedback data is more widespread and more easily collected [7]. In addition, implicit feedback-based recommender systems have several advantages: they are less intrusive, more widely applicable, and can provide a better customer experience [8].

1.2. Related work

Several HIN-based recommendation algorithms have been proposed [6,9,10], which can be divided into two categories: random walk-based and meta path-based methods.

Random walk-based methods aim to calculate the similarity or relevancy between nodes by simulating the physical process of a random walk [11]. One of the most commonly used random walk models is random walk with restart [12]. This assumes that a random walker, starting from a node i, recursively moves to a random neighbor with probability α or returns to node i with probability 1 − α. Let r_j^{(t)} denote the probability that the walker is located at node j at time t, and let the vector r^{(t)} denote the probabilities of all nodes at time t. We can write the iterative equation as follows:

r^{(t+1)} = α P · r^{(t)} + (1 − α) t    (1)

where P is the transition probability matrix and t is called the "teleport vector", which is usually set as t = r^{(0)}. That is, the elements of t are all zeros except the element for the starting node i, for which t_i = 1.

The random walk with restart model was traditionally used in PageRank for information retrieval [13], and was introduced into recommender systems by Personalized PageRank [14]. Several similar methods have been proposed, including ObjectRank [15] and ItemRank [16]. Recently, Pham et al. [17] and Jiang et al. [18] proposed two HIN-based methods with this model, called HeteRS and HeteLearn, respectively. These methods assume that all nodes are the same and ignore the differences between paths with the same length, which limits their performance.

Meta path-based methods use meta paths in HINs to evaluate the similarity (or relevancy) between nodes to make recommendations. Meta paths in HINs represent different types of semantic relationships. For example, in Fig. 1, the meta path User−Movie−Genre−Movie explores the movies of the same genre as the movies that the user has watched; that is, it assumes that users like to watch movies according to genres. In addition, the meta path User−Movie−User−Movie can be interpreted as user-based collaborative filtering: it assumes that users may like movies watched by similar users. Typical meta path-based similarities include PathSim [19] and HeteSim [20].

Two of the most representative meta path-based recommendation methods are path-constrained random walk (PCRW) [21] and PathRank [6]. PCRW [21] uses single paths in HINs to make recommendations; this method is also called semantic recommendation because each meta path has its own corresponding semantics. The limited information contained in single paths restricts the performance of this method. To solve this problem, PathRank [6] combines multiple meta paths with the random walk with restart model. Although it can achieve better performance, it relies on manually selecting several meta paths.
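To make the iteration in Eq. (1) concrete, the following is a minimal NumPy sketch (not the implementation used by any of the methods above); it assumes a column-stochastic transition matrix P, so that P @ r redistributes the probability mass of r over the graph.

    import numpy as np

    def random_walk_with_restart(P, start, alpha=0.8, n_iter=50):
        """Iterate Eq. (1): r_(t+1) = alpha * P @ r_t + (1 - alpha) * t,
        with the teleport vector t = r_0 putting all mass on the start node."""
        n = P.shape[0]
        r = np.zeros(n)
        r[start] = 1.0            # r_0, also used as the teleport vector t
        t = r.copy()
        for _ in range(n_iter):
            r = alpha * P @ r + (1.0 - alpha) * t
        return r                  # relevance of every node w.r.t. the start node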

1.3. Motivation

Random walk-based methods, which usually employ the stable solution of the random walk-based model as the scores of nodes, have the following disadvantages. First, they assume that all nodes with paths between them interact with each other. However, random walks along long paths may result in poor generalization performance, because the longer the path, the less likely it is that the path reflects the real preferences of users. Second, it is usually costly to compute the stable solution. Finally, it is usually assumed that all nodes and paths are the same, which results in difficulties in explaining the results of the random walk.

Meta path-based methods avoid the above problems. However, single meta path-based methods, such as PCRW [21], assume that user preference is determined by a single meta path, and ignore the effects of all other paths. Methods based on multiple meta paths, such as PathRank [6], aim to combine information from different meta paths. However, they face two new problems: (1) which meta paths should be combined, and (2) how to set the weights to combine them. In previous work, the meta paths were usually selected manually, based on experience. In this paper, we propose to select and combine the meta paths automatically through weight optimization. Existing weight optimization methods usually suffer from overfitting. In this paper, inspired by the theory of ensemble learning, we introduce diversity into the objective function to avoid overfitting.

1.4. Contributions

Our contributions are both theoretical and empirical. First, we prove that the result of the random walk with restart model is a weighted accumulation of a series of random walks with different lengths, and longer random walks have less impact on the final result. Because long paths may result in poor generalization performance and high computational cost, we propose to use a combination of meta paths with a limited number of steps to model user preferences. Second, we propose to select and combine the meta paths automatically through weight optimization. Diversity is introduced into the objective function as a regularization term to avoid overfitting. Third, inspired by the ambiguity decomposition theory in ensemble learning, we present a new diversity measurement for AUC optimization, and use it to encourage diversity among meta paths to improve recommendation performance. Finally, we empirically validate the effectiveness of the proposed method for both item recommendation and tag recommendation tasks. For both tasks, the proposed method performs significantly better than both traditional collaborative filtering methods and state-of-the-art HIN-based recommendation methods.

The remainder of this paper is organized as follows. Section 2 introduces some background knowledge and formulates the problem. Section 3 describes the proposed model and algorithm. Empirical evaluation of the proposed algorithm and a comparison with other state-of-the-art algorithms are presented in Section 4. Finally, Section 5 concludes the paper.

2. Preliminaries

2.1. Heterogeneous information network

An HIN is a type of information network involving multiple-typed objects and multiple-typed links, which can integrate behavior information and attribute information into one representation.


Definition 2.1 (Heterogeneous Information Network [19]). An information network is a directed graph G = (V, E) with an object type mapping function φ: V → A and a link type mapping function ψ: E → R, where each object v ∈ V belongs to a particular object type φ(v) ∈ A, and each link e ∈ E belongs to a particular relation ψ(e) ∈ R. When the types of objects satisfy |A| > 1 or the types of relations satisfy |R| > 1, the network is a heterogeneous information network; otherwise, it is a homogeneous information network.

The network schema, also called a meta-graph, provides a meta-level description of an HIN.

Definition 2.2 (Network Schema [19]). The network schema is a meta-template for a heterogeneous network G = (V, E) with an object type mapping φ: V → A and link mapping ψ: E → R. The network schema is a directed graph defined over object types A, with edges as relations from R, denoted as T_G = (A, R).

An example of an HIN and its corresponding network schema are shown in Fig. 1.

2.2. Meta path

In HINs, two nodes may be connected by different paths, which may have various lengths and may contain different node types and link types. The concept of the meta path was proposed to describe the path types in an HIN [19].

Definition 2.3 (Meta Path [19]). A meta path P is a path defined on the graph of network schema T_G = (A, R), and takes the form

A_0 \xrightarrow{R_1} A_1 \xrightarrow{R_2} \cdots \xrightarrow{R_k} A_k

which defines a composite relation R_1 ∘ R_2 ∘ ⋯ ∘ R_k between node types A_0 and A_k, where ∘ denotes the composition operator on relations.

The length of a meta path P is the number of relations in it. In this paper, we use names of node types to denote the meta path: P = A_1 A_2 ... A_k. For instance, UMAM is used to denote the meta path User−Movie−Actor−Movie in Fig. 1, whose length is 3. Each meta path has its corresponding semantics [20]. For instance, the meta path UMUM explores the social relationships of users who watched movies in common with the target user, while the meta path UMAM utilizes movie-actor links to build relationships between users and movies.

2.3. Random walk with restart

As noted previously, the random walk with restart model has been widely used in graph-based recommendation methods. The iterative equation has been shown in Eq. (1). This model has a stable solution. However, it is usually costly to compute the stable solution in practice. In the following, we will show that it is not necessary to iterate the model until convergence. From the iterative equation in Eq. (1), with the typical setting t = r^{(0)}, we can infer:

r^{(k)} = α^k P^k r^{(0)} + \sum_{i=0}^{k-1} α^i (1 − α) P^i r^{(0)}    (2)
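As a quick numerical illustration of Eq. (2), the weight placed on the i-step random walk is α^i(1 − α); the snippet below (an illustrative calculation only) prints the first few weights for α = 0.8, the value discussed next, and shows how quickly they decay:

    alpha = 0.8
    for i in range(6):
        weight = alpha**i * (1 - alpha)   # importance of the i-step walk in Eq. (2)
        print(i, round(weight, 3))        # 0.2, 0.16, 0.128, 0.102, 0.082, 0.066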

Note that P^i r^{(0)} corresponds to a random walk with i steps. Therefore, the result of a k-step random walk with restart is a weighted accumulation of a series of random walks with no more than k steps. The coefficients α^i(1 − α) reflect the relative importance of random walks with different lengths. They follow a geometric distribution and decay exponentially. Half-life, a commonly used term for characterizing exponential decay, is a characteristic unit for the exponential decay equation [22]. The half-life τ of the relative importance of a random walk satisfies the equation

α^τ (1 − α) = (1 − α)/2    (3)

Hence τ = ln 0.5 / ln α. With the typical setting used in practice, α = 0.8, we obtain τ ≈ 3.11. Therefore, it is reasonable to set k = 3 in most cases.

The importance of a path decays exponentially as its length increases. Conversely, the number of explored items for recommendation and the computational cost increase exponentially as the length of the path increases. In addition, the semantics of long paths are less clear than those of short paths, and the correlations between nodes connected through long paths are also less reliable. Therefore, long paths may introduce more noise than information, and using long paths in the learning procedure may result in poor generalization performance with high computational cost.

In summary, the random walk with restart model can be decomposed into a series of meta paths with different lengths. Moreover, it is not necessary to use long paths, considering the problems of semantic interpretation, information contribution, and computational cost.

2.4. Problem definition

Let U and I denote the set of users and items, respectively. With u ∈ U and i ∈ I denoting a user and an item, respectively, we define the implicit user feedback matrix R as follows:

R_{ui} = \begin{cases} 1, & \text{if user } u \text{ has given positive feedback on item } i; \\ 0, & \text{otherwise.} \end{cases}

The set I_u^+ ⊂ I denotes the items on which user u has given positive feedback: I_u^+ = {i | R_{ui} = 1}. The set of remaining items is denoted by I \ I_u^+. The predicted preference of user u for item i is denoted as r_{ui}. The goal of personalized recommendation is to give each user u a personalized ranked list of items from I \ I_u^+. This problem is usually called the one-class recommendation problem [23] or recommendation with implicit feedback [24,25]. Besides the implicit feedback matrix R, we assume that there exists some additional information, such as item or user attributes and relationships, which can be used to construct the corresponding HIN. Our goal is to use this information to model the preferences of users and give a personalized recommendation for each user.

3. User preference modeling and model learning with diversity regularization

First, in Section 3.1, we propose a meta path-based user preference model with the assumption that each meta path starting from USER reflects some preferences of users. Second, in Section 3.2, we design an algorithm, called HeteBPR, to learn model parameters with the goal of optimizing AUC. Third, in Section 3.3, we present a new diversity measure for AUC optimization and introduce it into the objective function. Finally, we design a new algorithm, called HeteWeight, to learn the model parameters with the goal of maximizing both AUC and diversity among meta paths.

3.1. Meta path-based user preference model

The key to personalized recommendation is the modeling of user preferences. Inspired by Eq. (2) and the analysis in Section 2.3, we propose to use the combination of different meta paths to model user preferences. We assume that each meta path starting from USER reflects some preferences of users. Specifically, suppose that we are given a set of meta paths that connect the users and the items: M = {P_1, P_2, ..., P_m}. For each user, through random walking from the user along a path P_α, we can obtain a preference vector r_α, which records the preferences of the user for all items according to path P_α. For path P_α, the preference vectors of all users form a preference matrix R_α. Given the meta path set M, we approximate the user preference matrix using

\hat{R} = \sum_{α=1}^{m} θ_α R_α    (4)

where θ_α is the weight of the αth meta path in M, which can be learned from training data. As analyzed in Section 2.3, the most important and meaningful meta paths are the short ones. Therefore, to obtain the meta path set M, we enumerate the meta paths with their length up to k steps, where k is a small positive integer.
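The following NumPy sketch illustrates how the per-path preference matrices R_α and their weighted combination in Eq. (4) can be computed; it is not the authors' implementation, and the use of products of row-normalized adjacency matrices to emulate a path-constrained random walk, as well as all function names, are assumptions made for illustration.

    import numpy as np

    def row_normalize(A):
        """Turn an adjacency matrix into per-step transition probabilities."""
        s = A.sum(axis=1, keepdims=True)
        s[s == 0] = 1.0
        return A / s

    def path_preference_matrix(adjacencies):
        """Preference matrix R_alpha of one meta path, e.g. for U-M-A-M this is
        the product of the row-normalized U-M, M-A and A-M adjacency matrices."""
        R_path = row_normalize(adjacencies[0])
        for A in adjacencies[1:]:
            R_path = R_path @ row_normalize(A)
        return R_path                    # shape: |U| x |I|

    def combine_paths(path_matrices, theta):
        """Eq. (4): weighted sum of the per-path preference matrices."""
        return sum(t * R for t, R in zip(theta, path_matrices))

For instance, for the meta path UMAM in Fig. 1, `adjacencies` would hold the user-movie, movie-actor, and actor-movie adjacency matrices.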

3.2. Objective function

In this paper, we focus on recommendation with implicit feedback. Similar to Bayesian personalized ranking (BPR) [7], we attempt to optimize the area under the ROC curve (AUC) metric. AUC can measure how well a recommender system can distinguish the positive items (those appreciated by a user) from the negative items (all the others). For a given user u and a recommender, AUC is defined as follows:

AUC(u) \stackrel{def}{=} \frac{1}{|I_u^+| |I \ I_u^+|} \sum_{i ∈ I_u^+} \sum_{j ∈ I \ I_u^+} δ(r_{uij} > 0)

where i is a positive item from I_u^+, j is a negative item from I \ I_u^+, and

r_{uij} = r_{ui} − r_{uj}

in which r_{ui} and r_{uj} are the estimated preferences of user u for items i and j, respectively. The average AUC of all users can be used to evaluate the performance of the recommender:

AUC \stackrel{def}{=} \frac{1}{|U|} \sum_{u ∈ U} AUC(u).

The loss function δ(x > 0) used in AUC is non-differentiable; it is identical to the Heaviside step function:

δ(x > 0) = H(x) := \begin{cases} 1, & x > 0; \\ 0, & \text{otherwise.} \end{cases}

Therefore, directly optimizing AUC often leads to an NP-hard problem [26]. In practice, it is usually approximated by a convex optimization problem that minimizes the following objective function:

L(u) \stackrel{def}{=} \frac{1}{|I_u^+| |I \ I_u^+|} \sum_{i ∈ I_u^+} \sum_{j ∈ I \ I_u^+} f(r_{uij})    (5)

where f(·) is a surrogate loss function that is set as a convex function. Widely used surrogate loss functions include:

• square loss (e.g., OPAUC [26]): f(x) = (1 − x)^2
• logistic loss (e.g., RankNet [27]): f(x) = ln(1 + e^{−x})
• exponential loss (e.g., RankBoost [28]): f(x) = e^{−x}

These three loss functions have all been proved to be consistent with AUC [29].

Directly optimizing L(u) usually leads to the overfitting problem. A common method to solve this problem is to use the L2-norm regularization. Specifically, such methods are intended to solve the following optimization problem:

\arg\min_{Θ} L(u) + λ \|Θ\|^2    (6)

where Θ = (θ_1, θ_2, ..., θ_m)^T is the parameter vector of the model to be learned, and λ is the regularization coefficient. When the logistic loss is used, this method is equivalent to using the BPR criterion [7] to optimize the path weights. Such an optimization problem can be solved using stochastic gradient descent (SGD) [27]. Because we use the weighted average preference of multiple meta paths as the final preference, as in Eq. (4), we have

r_{ui} = \sum_{α=1}^{m} θ_α r_{ui}^{α}

where r_{ui}^{α} denotes user u's preference for item i according to the meta path P_α. Similarly, we also have

r_{uij} = r_{ui} − r_{uj} = \sum_{α=1}^{m} θ_α r_{uij}^{α}

where r_{uij}^{α} = r_{ui}^{α} − r_{uj}^{α}. Therefore, the update rule of SGD for Eq. (6) with logistic loss is

θ_α ← θ_α − η \frac{∂(L(u) + λ\|Θ\|^2)}{∂θ_α} = θ_α − η \left( −\frac{r_{uij}^{α}}{1 + e^{r_{uij}}} + λ θ_α \right)    (7)

where η is the learning rate. In this paper, we call this method HeteBPR, which directly uses the BPR criterion [7] to optimize the weights of meta paths. According to Eq. (7), this method prefers to give larger weights to all paths with higher accuracy, even if they are the same or very similar. The L2-norm regularization term used in Eq. (6) can avoid some weights becoming too large. However, it cannot deal with redundancy among meta paths.
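For illustration, one HeteBPR update on a sampled triple (u, i, j) with the logistic loss of Eq. (7) can be sketched as follows; the array layout r_paths[a, u, i] and the default hyperparameter values are assumptions for this sketch, not the paper's settings.

    import numpy as np

    def hetebpr_step(theta, r_paths, u, i, j, eta=0.05, lam=0.01):
        """One SGD step of Eq. (7) on a sampled triple (u, i, j).
        r_paths[a, u, i] is user u's preference for item i under meta path a."""
        r_uij_a = r_paths[:, u, i] - r_paths[:, u, j]    # per-path differences
        r_uij = float(theta @ r_uij_a)                   # combined difference
        grad = -r_uij_a / (1.0 + np.exp(r_uij)) + lam * theta
        return theta - eta * grad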

3.3. Diversity regularization

In the field of ensemble learning, it has become widely accepted that base learners in an ensemble model should be both accurate and diverse to achieve good performance [30–33]. Inspired by this, we propose to use diversity among the meta paths as another regularization term, to address the redundancy problem. To investigate the relationship between AUC and diversity, following the generalized ambiguity decomposition theory [34], we present two decompositions of L(u) (Theorems 3.1 and 3.2). Theorem 3.1 demonstrates the effectiveness of ensemble learning for recommendation. According to Theorem 3.2, we present a diversity measurement for recommender combination.

Theorem 3.1. Assume that we are given a set of trained base recommenders {h_1, h_2, ..., h_m} and that these recommenders are combined by weighted averaging; that is, r_{ui} = \sum_{α=1}^{m} θ_α r_{ui}^{α} with \sum_{α=1}^{m} θ_α = 1 and θ_α ≥ 0. Let L_α(u) denote the surrogate function of AUC of the base recommender h_α. Then for any surrogate loss function f(·) that is twice differentiable, the objective function in Eq. (5), which is used to maximize AUC, can be decomposed into

L = \bar{L} − \bar{A}

where

L = L(u)

\bar{L} = \sum_{α} θ_α L_α(u)

\bar{A} = \frac{1}{2 |I_u^+| |I \ I_u^+|} \sum_{i ∈ I_u^+} \sum_{j ∈ I \ I_u^+} \sum_{α} θ_α f''(r^{*}) (r_{uij}^{α} − r_{uij})^2

with r^{*} being some number between r_{uij}^{α} and r_{uij}.

Theorem 3.2. With the same conditions as in Theorem 3.1, for any surrogate loss function f(·) that is twice differentiable, the objective function in Eq. (5) can be decomposed into

L = \bar{L} − \bar{A}

with L and \bar{L} remaining the same as in Theorem 3.1, and \bar{A} can also be defined as

\bar{A} = \frac{1}{2 |I_u^+| |I \ I_u^+|} \sum_{i ∈ I_u^+} \sum_{j ∈ I \ I_u^+} \sum_{α} θ_α \left[ f''(r_α^{*}) (r_{uij}^{α})^2 − f''(r^{*}) (r_{uij})^2 \right]

with r_α^{*} being some number between 0 and r_{uij}^{α}, and r^{*} being some number between 0 and r_{uij}.

For readability, we place the proofs of these two theorems in the appendix. Although the above results are derived in the case of \sum_{α} θ_α = 1 and θ_α ≥ 0 for simplicity, they can easily be extended to the case of \sum_{α} θ_α ≠ 1. Using Theorem 3.1, we can prove the following corollary.

Corollary 3.1. If the surrogate loss function f(·) is convex, then L ≤ \bar{L} always holds. This indicates that blending methods would always perform better than the average performance of base recommenders.

Proof. For any convex function f(·), f'' ≥ 0 always holds. As a result, \bar{A} is always non-negative. Therefore, L is always less than or equal to \bar{L}.

According to Corollary 3.1, if we use the square loss, the logistic loss, or other convex loss functions in Eq. (5), then the blending methods are more effective than the average level of base recommenders. This is consistent with the empirical results in previous work [35,36].

In Theorem 3.2, although r_α^{*} and r^{*} are uncertain numbers, we can conclude that f''(r_α^{*}) ≥ 0 and f''(r^{*}) ≥ 0 always hold if the surrogate loss function f(·) is convex. According to the research in the field of ensemble learning, diversity is the key to the success of ensemble systems [37]. To combine the meta paths effectively, we should encourage diversity among them. The ambiguity term \bar{A} is considered to be closely related to diversity [38]: larger ambiguity indicates larger diversity. Therefore, we can use the ambiguity term for diversity regularization.

According to Theorem 3.2, the larger (r_{uij})^2 is, the smaller the ambiguity \bar{A} will be. Note that r_{uij} = \sum_{α=1}^{m} θ_α r_{uij}^{α}; that is, r_{uij} is the weighted average of r_{uij}^{α}, where r_{uij}^{α} means user u's preference difference between a positive item i and a negative item j according to the meta path P_α. Intuitively, a larger (r_{uij})^2 means greater correlation and lower diversity among the meta paths. Therefore, we can use −(r_{uij})^2 to reflect diversity, or use (r_{uij})^2 to reflect redundancy:

(r_{uij})^2 = \sum_{α} \sum_{β} θ_α θ_β r_{uij}^{α} r_{uij}^{β}    (8)

Based on the above analysis, we introduce diversity into the objective function and rewrite the optimization problem as follows:

\arg\min_{Θ} F(Θ) = L(u) + λ \|Θ\|^2 + γ \sum_{i ∈ I_u^+} \sum_{j ∈ I \ I_u^+} (r_{uij})^2    (9)

where L(u) denotes the loss function, as given in Eq. (5), and \sum_{i ∈ I_u^+} \sum_{j ∈ I \ I_u^+} (r_{uij})^2 denotes the diversity term, with (r_{uij})^2 as given in Eq. (8).
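The identity behind Eq. (8) is easy to check numerically; the sketch below (illustrative only, with assumed function and variable names) computes the redundancy term both directly and via the double sum over path pairs and confirms that they coincide.

    import numpy as np

    def redundancy(theta, r_uij_per_path):
        """Eq. (8): (r_uij)^2 = sum_a sum_b theta_a * theta_b * r^a_uij * r^b_uij."""
        direct = float(theta @ r_uij_per_path) ** 2
        double_sum = float((np.outer(theta, theta)
                            * np.outer(r_uij_per_path, r_uij_per_path)).sum())
        assert np.isclose(direct, double_sum)            # the two forms agree
        return direct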

3.4. The HeteWeight algorithm

The gradient of this objective function F(Θ) with respect to θ_α and the corresponding update rule of SGD are given in Eq. (10) and Eq. (11), respectively:

grad_α ← g + λ θ_α + γ · r_{uij} r_{uij}^{α}    (10)

θ_α ← θ_α − η · grad_α    (11)

where g is the gradient of L(u) with respect to θ_α: g = \frac{∂L(u)}{∂θ_α}. When using square loss, g = 2(r_{uij} − 1) r_{uij}^{α}. When using logistic loss, g = −r_{uij}^{α} / (1 + e^{r_{uij}}). When using exponential loss, g = −r_{uij}^{α} / e^{r_{uij}}.

To learn the weights of the meta paths – the parameters in Eq. (9) – we design an algorithm called HeteWeight, whose pseudocode is shown in Algorithm 1. The preference scores r_{ui}^{α} of single meta paths can be generated using path-constrained random walk [21] or other methods, and the procedure is omitted. The triple (u, i, j) in Step 3, which includes a user u, a positive item i, and a negative item j, is sampled in the same manner as in [7]. To satisfy the constraint θ_α ≥ 0, if any weight becomes negative after the update, the learning rate η is halved (Steps 5–7).

Because the loss function L(u) (Eq. (5)) and the diversity function (Eq. (8)) are convex functions, the objective function F(Θ) (Eq. (9)) is also a convex function. Therefore, there exists a unique optimal solution, which we can find by a gradient descent-based method [39]. Steps 8–10 in Algorithm 1 ensure that the updating of parameters leads to a decrease of the objective function's value, that is, moves toward the optimal solution. In addition, the learning rate η decreases during the learning procedure (Steps 5–7). Therefore, the proposed HeteWeight algorithm will converge after a finite number of iterations.

Algorithm 1: HeteWeight.
Input: set of meta paths M = {P_1, ..., P_k}, initial learning rate η, regularization coefficients λ and γ
Output: path weights Θ
1: initialize Θ
2: repeat
3:   randomly draw a triple (u, i, j) from the training data, where i ∈ I_u^+ and j ∈ I \ I_u^+
4:   compute the gradient grad_α according to Eq. (10)
5:   while there exists α such that θ_α − η · grad_α < 0 do
6:     η ← η/2
7:   end while
8:   compute Θ' ← Θ − η · ∂F(Θ)/∂Θ according to Eq. (11)
9:   if F(Θ') < F(Θ) then
10:    update the parameters Θ ← Θ'
11:  end if
12: until convergence
13: return Θ
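To complement the pseudocode, the following is an illustrative Python sketch of the HeteWeight learning loop with the square loss; the data layout (r_paths[a, u, i], pos_items[u]), the default values of λ and γ, and the per-triple approximation of the acceptance test in Steps 8–10 are simplifying assumptions, not the authors' implementation.

    import numpy as np

    def heteweight_sketch(r_paths, pos_items, eta=0.5, lam=0.01, gamma=0.01,
                          n_steps=100000, seed=0):
        """Sketch of Algorithm 1 (HeteWeight) with the square loss.
        r_paths[a, u, i]: preference of user u for item i under meta path P_a.
        pos_items[u]:     set of items with positive feedback from user u."""
        rng = np.random.default_rng(seed)
        m, n_users, n_items = r_paths.shape
        theta = np.full(m, 1.0 / m)                           # Step 1: initialize

        def triple_objective(th, d):
            # per-triple stand-in for F(Theta) in Eq. (9), square loss
            r = float(th @ d)
            return (1.0 - r) ** 2 + lam * float(th @ th) + gamma * r ** 2

        for _ in range(n_steps):                              # Step 2: repeat
            u = int(rng.integers(n_users))                    # Step 3: sample (u, i, j)
            if not pos_items[u]:
                continue
            i = rng.choice(list(pos_items[u]))
            j = int(rng.integers(n_items))
            while j in pos_items[u]:
                j = int(rng.integers(n_items))

            d = r_paths[:, u, i] - r_paths[:, u, j]           # per-path r^a_uij
            r_uij = float(theta @ d)
            g = 2.0 * (r_uij - 1.0) * d                       # square-loss gradient of L(u)
            grad = g + lam * theta + gamma * r_uij * d        # Step 4: Eq. (10)

            while np.any(theta - eta * grad < 0) and eta > 1e-12:
                eta /= 2.0                                    # Steps 5-7: keep theta >= 0
            theta_new = np.maximum(theta - eta * grad, 0.0)   # Eq. (11), clipped for safety
            if triple_objective(theta_new, d) < triple_objective(theta, d):
                theta = theta_new                             # Steps 8-10 (simplified check)
        return theta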

4. Empirical study

In this section, we demonstrate the effectiveness of the proposed method on two types of recommendation tasks: item recommendation and tag recommendation. First, the datasets and experimental setup are presented in Sections 4.1 and 4.2, respectively. Second, the baseline methods and parameter settings are described in Section 4.3. Finally, experimental results are shown and discussed in Section 4.4.


Table 1. Statistics of the Restaurant and Consumer dataset.

Types        Total no.   Average no. (per restaurant)   Average no. (per user)
User         138         –                              –
Restaurant   130         –                              –
ParkingLot   7           –                              –
Payment      11          2.14                           1.33
Cuisine      59          1.19                           2.39
Rating       1,161       8.93                           8.41

Table 2. Statistics of the HetRec 2011 MovieLens dataset.

Types      Total no.   Average no. (per movie)   Average no. (per user)
User       2,113       –                         –
Movie      10,197      –                         –
Genre      20          2.040                     –
Director   4,060       1.0                       –
Actor      95,321      22.778                    –
Country    72          1.0                       –
Tag        13,222      8.117                     22.696
Rating     855,598     84.637                    404.921

Fig. 2. The network schema of the HIN constructed using the Restaurant and Consumer dataset.

4.1. Datasets

Two public datasets were used as experimental data to evaluate the proposed method. The smaller one is the Restaurant and Consumer dataset from the UCI Machine Learning Repository [40]. The larger one is the HetRec 2011 MovieLens dataset published by the GroupLens research group (http://www.grouplens.org).

The Restaurant and Consumer dataset, which is a customer dining dataset, contains 1,161 ratings given by 138 users on 130 restaurants during a period of seven months. This dataset also contains some information about the users and restaurants, such as the cuisine and parking lots of the restaurants, and the preferred cuisine and payment methods of the users. Table 1 displays the statistics of this dataset.

The HetRec 2011 MovieLens dataset is an extension of the MovieLens10M dataset (https://grouplens.org/datasets/movielens/10m/), which links the movies of the MovieLens dataset with their corresponding web pages at the Internet Movie Database (IMDb, http://www.imdb.com) and the Rotten Tomatoes movie review system (http://www.rottentomatoes.com). It contains 855,598 ratings given by 2,113 users on 10,197 movies. It also contains some information about users and movies, such as the genre, director, actor, and country of movies, and the tags given by users for movies. Table 2 shows the statistics of this dataset.

As in previous work [6,8,24], we performed a preprocessing step on the datasets, which retained only the ratings larger than the middle rating as the observed positive feedback (to simulate one-class feedback).

4.2. Experimental setup

Based on the two datasets, we constructed three HINs for the item recommendation and tag recommendation tasks. Using the Restaurant and Consumer dataset, we constructed a network for the item recommendation task. It contained five types of nodes, abbreviated as (U: USER, R: RESTAURANT, P: PAYMENT, C: CUISINE, and L: PARKING LOT), and six types of links. The network schema is shown in Fig. 2.
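As a side note, the rating binarization described in Section 4.1 can be sketched as below; this is only an illustrative helper, and the column names as well as the pandas-based representation are assumptions, not part of the original experimental code.

    import pandas as pd

    def to_implicit(ratings, middle, user_col="userID", item_col="itemID",
                    rating_col="rating"):
        """Keep only ratings strictly larger than `middle` (e.g. 3.0 on a
        5-star scale) as one-class positive feedback, grouped per user."""
        positive = ratings[ratings[rating_col] > middle]
        return positive.groupby(user_col)[item_col].apply(set).to_dict()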

Fig. 3. The network schema of the HIN constructed using the HetRec 2011 MovieLens dataset and used in the item recommendation task.

Fig. 4. The network schema of the HIN constructed using the HetRec 2011 MovieLens dataset and used in the tag recommendation task.

Using the HetRec 2011 MovieLens dataset, we constructed two different networks: one for item recommendation and another for tag recommendation. For the item recommendation task, we constructed a network with seven types of nodes, abbreviated as (U: USER, M: MOVIE, T: TAG, G: GENRE, D: DIRECTOR, A: ACTOR, and C: COUNTRY), and seven types of links. The network schema is shown in Fig. 3. For the tag recommendation task, we constructed a network with six types of nodes, abbreviated as (M: MOVIE, T: TAG, G: GENRE, D: DIRECTOR, A: ACTOR, and C: COUNTRY), and five types of links. The network schema is shown in Fig. 4.

To study the empirical performance of different methods, two widely used metrics, AUC and NDCG [41], were adopted to evaluate the recommendation performance. We first evaluated AUC and NDCG for each user in the test data, and then obtained the AUC and NDCG results averaged over all users. For the Restaurant and Consumer dataset, to simulate sparse data, we used 50% of the data as training data and the other 50% as testing data. For the HetRec 2011 MovieLens dataset, we sorted the ratings (tag assignments) by their timestamps, and used the first 80% as the training set and the remaining 20% as the testing set.

4.3. Baselines and parameter settings

Nine recommendation algorithms were used as baseline methods: User CF [42], Item CF [43], NMF [44], LFM [45], BPR-MF [7], PRank [14], PCRW [21], PathRank [6], and HeteBPR.

• User-Based Collaborative Filtering (User CF)
  User CF predicts a user's preference for an item based on the preferences of similar users [42]. The preference r_{ui} is estimated as r_{ui} = \sum_{u' ∈ \hat{U}} sim(u, u') × r_{u'i}, where \hat{U} is a set of the n_k users most similar to user u, and sim(u, u') is the similarity between two users. In our experiments, cosine similarity was used as the similarity measure.

• Item-Based Collaborative Filtering (Item CF)
  Item CF is similar to User CF, but uses the similarity between items instead of users [43]. The preference r_{ui} is estimated as r_{ui} = \sum_{i' ∈ \hat{I}} sim(i, i') × r_{ui'}, where \hat{I} is a set of the n_k items most similar to item i. As with User CF, cosine similarity was used as the similarity measure.

• Non-negative Matrix Factorization (NMF)
  As a representative matrix factorization-based method, NMF tries to find two non-negative factor matrices whose product is an approximation to the original matrix. This method was originally proposed for image analysis and has been used in collaborative filtering recently [44].

• Latent Factor Model (LFM)
  LFM is another representative matrix factorization-based method, also known as the singular value decomposition (SVD) model. The key idea of this method is to factorize the user-item rating matrix into a product of two lower-rank matrices: one containing the "user factors", and the other containing the "item factors" [45].

• Bayesian Personalized Ranking (BPR-MF)
  BPR is a state-of-the-art learning to rank method for personalized recommendation from implicit feedback [7]. It uses a logistic loss-based approximation to AUC as the objective function.

• Personalized PageRank (PRank)
  PRank is a graph-based node ranking method [14] based on the classic PageRank algorithm [13]. The stable solution of Eq. (1) is used as the preferences of a given user for the nodes in a graph.

• Path-Constrained Random Walk (PCRW)
  PCRW is based on single meta paths in the HIN. It is also called "semantic recommendation" because each meta path has its corresponding semantics [20]. According to the semantics of meta paths and the results of previous work, 3-step paths are the most common and important. In our experiments, we considered paths with length no longer than 3.

• PathRank
  PathRank is a state-of-the-art HIN-based recommendation method [6]. In our experiments, we used the same settings as in [6]: w_restart = 0.1, w_trans = 0.25, w_path = 0.65. For the HetRec 2011 MovieLens dataset, the path-guide UMTM+UMU(20)M+UMUM(20) was used because it showed the best performance in the original paper [6]. For the Restaurant and Consumer dataset, because no tag information was available, the path-guide URU(20)R+URUR(20) was used instead.

• HeteBPR
  HeteBPR directly uses BPR-Opt [7] for path weight optimization. The optimization problem is shown in Eq. (6), and the update rule of SGD is given in Eq. (7).


As for the proposed method, HeteWeight (Algorithm 1), the optimization problem is given in Eq. (9), and the update rule of SGD is given in Eq. (11). In our experiments, we used the square loss as the surrogate loss function; the results of logistic loss and exponential loss were similar. The set of meta paths M was set to contain all paths with length no longer than 3. To avoid overfitting, the paths UR (Restaurant and Consumer dataset) and UM (MovieLens dataset) were not included in M. The single-path preferences r_{ui}^{α} were generated using PCRW [21]. The initial learning rate η was set to 0.5. During the learning process, if any of the path weights became negative after being updated, the learning rate was halved.

4.4. Experimental results

4.4.1. Task 1: Item recommendation

The experimental results of HeteWeight and all baseline methods on the two datasets for the item recommendation task are shown in the second and third columns of Tables 3 and 4. The proposed HeteWeight method performed the best among all of the compared methods on both datasets, according to different evaluation metrics. In most cases, the improvements of HeteWeight over the baseline methods are statistically significant at the level of 0.01.

Compared with PCRW, which relies on single meta paths, HeteWeight performed better than any single path-based method; this confirms the conclusion of Corollary 3.1 given in Section 3.3. It also confirms a consensus in ensemble learning: that a proper combination of multiple base learners can achieve better performance than any individual one. In addition, the performance of PCRW varied greatly with different paths in both datasets, and the best path was different in different datasets. This indicates that we need to select the path before using PCRW, either manually or by using some other method.

Compared with PRank, which is based on the random walk with restart model (relying on all meta paths), HeteWeight (using only the meta paths with length no longer than 3) performed better. This confirms the analysis in Section 2.3: that it is not necessary to use all meta paths. In addition, the performance of PRank relies heavily on the setting of the input parameter α, and the optimal setting of this parameter varies greatly between different datasets. It seems that a larger value of α caused PRank to perform better on the Restaurant and Consumer dataset but worse on the HetRec MovieLens dataset.

Compared with PathRank, which relies on several manually selected meta paths, HeteWeight performed better without the need to manually select meta paths. This demonstrates the effectiveness of weight learning in HeteWeight. In addition, we observe that the length of all meta paths selected for PathRank was 3, which partially confirms the analysis in Section 2.3.

Compared with HeteBPR, which directly uses BPR-Opt [7] for path weight optimization, HeteWeight performed better, as a result of introducing diversity into the objective function. This confirms the importance of diversity for ensemble learning and demonstrates the effectiveness of the proposed diversity measure.

To compare the performance of the different recommendation methods more rigorously, we performed the Friedman test in conjunction with the Bonferroni–Dunn test [46] at significance level 0.05. The results are shown in Figs. 5 and 6. The dots indicate the average ranks, and the bars indicate the critical difference with the Bonferroni–Dunn test at significance level 0.05. Methods with non-overlapping bars are significantly different. For each method, the best parameter setting was used to perform the test. For PRank, α = 1.0 was used on Restaurant and Consumer, and α = 0.6 on HetRec MovieLens. For PCRW, the path UPUR was used on Restaurant and Consumer, and UMGM on HetRec MovieLens. On the Restaurant and Consumer dataset, HeteWeight performed significantly better than NMF, PCRW, HeteBPR, User CF, and Item CF. On the HetRec MovieLens dataset, HeteWeight was significantly better than all other methods.

Table 3. Performance of different methods measured by AUC.

Method            Restaurant and Consumer   HetRec MovieLens (Item recommendation)   HetRec MovieLens (Tag recommendation)
User CF           0.7241**                  0.7004**                                 0.7128**
Item CF           0.7241**                  0.7190**                                 0.7115**
NMF               0.8953**                  0.6528**                                 0.8195**
LFM               0.9147*                   0.6587**                                 0.8180**
BPR-MF            0.9169*                   0.7924**                                 0.8120**
PRank (α = 0.6)   0.8932**                  0.7551**                                 0.8924**
PRank (α = 0.8)   0.9080**                  0.7462**                                 0.8908**
PRank (α = 1.0)   0.9139*                   0.7163**                                 0.8423**
PCRW              UR: 0.6571**              UM: 0.5110**                             MT: 0.5651**
                  UCR: 0.6662**             UTM: 0.5302**                            MDMT: 0.5715**
                  UPR: 0.7941**             UMCM: 0.5463**                           MTMT: 0.7188**
                  URCR: 0.6745**            UTUM: 0.5600**                           MAMT: 0.7229**
                  URUR: 0.7242**            UMDM: 0.6107**                           MCMT: 0.8063**
                  URLR: 0.7563**            UMUM: 0.6525**                           MGMT: 0.8668**
                  UCUR: 0.8202**            UMAM: 0.6619**
                  URPR: 0.8238**            UMTM: 0.6648**
                  UPUR: 0.8684**            UMGM: 0.6865**
PathRank          URU(20)R+URUR(20):        UMTM+UMU(20)M+UMUM(20):                  MTM(20)T+MTMT(20):
                  0.9164*                   0.7127**                                 0.8902**
HeteBPR           0.8784**                  0.7918**                                 0.8921**
HeteWeight        0.9381                    0.8233                                   0.9005

The best result on each dataset is displayed in bold. An entry is marked with '*' (or '**') if HeteWeight is significantly better than the compared method, based on the Wilcoxon signed rank test with continuity correction at the significance level 0.05 (or 0.01, respectively).

Table 4. Performance of different methods measured by NDCG. The best result on each dataset is displayed in bold.

Method            Restaurant and Consumer   HetRec MovieLens (Item recommendation)   HetRec MovieLens (Tag recommendation)
User CF           0.2455                    0.3623                                   0.1850
Item CF           0.2311                    0.3706                                   0.1965
NMF               0.2984                    0.3609                                   0.1836
LFM               0.2894                    0.3443                                   0.1619
BPR-MF            0.2961                    0.3024                                   0.1812
PRank (α = 0.6)   0.2771                    0.3726                                   0.2770
PRank (α = 0.8)   0.2793                    0.3661                                   0.2765
PRank (α = 1.0)   0.2961                    0.3189                                   0.1912
PCRW              UR: 0.2141                UM: 0.2598                               MT: 0.1203
                  UCR: 0.2230               UTM: 0.2485                              MDMT: 0.1327
                  UPR: 0.2486               UMCM: 0.2590                             MTMT: 0.1574
                  URCR: 0.2156              UTUM: 0.2800                             MAMT: 0.1633
                  URUR: 0.2441              UMDM: 0.3128                             MCMT: 0.1820
                  URLR: 0.2325              UMUM: 0.3359                             MGMT: 0.2020
                  UCUR: 0.2598              UMAM: 0.3313
                  URPR: 0.2769              UMTM: 0.3407
                  UPUR: 0.2820              UMGM: 0.3281
PathRank          URU(20)R+URUR(20):        UMTM+UMU(20)M+UMUM(20):                  MTM(20)T+MTMT(20):
                  0.2968                    0.3803                                   0.2707
HeteBPR           0.2737                    0.3890                                   0.2847
HeteWeight        0.2984                    0.3899                                   0.2874

4.4.2. Task 2: Tag recommendation

The goal of tag recommendation is to find a set of proper words (tags) to describe resources. Existing tag recommendation methods mainly include content-based methods and co-occurrence-based methods [47,48]. Content-based methods directly use the content of items, such as the genre of movies, to perform tag recommendation. Co-occurrence-based methods use the co-occurrence of tags among items (the item–tag matrix) for tag recommendation. The proposed method, HeteWeight, is a hybrid method that combines both content-based and co-occurrence-based methods.

The performance of the different methods for this task is shown in the last column of Tables 3 and 4. The Friedman test results on this task are shown in Fig. 7. All of the HIN-based methods performed significantly better than the traditional co-occurrence methods, including User CF, Item CF, NMF, LFM, and BPR-MF. This indicates that the additional side information in the HIN is useful for this tag recommendation task.

Compared with PCRW, which relies on single meta paths, all of the hybrid HIN-based methods performed better because of the combination of multiple meta paths. This confirms that hybrid methods usually perform better than single methods.

Compared with PRank, which relies on all meta paths, both HeteWeight and HeteBPR, which use only meta paths with length no longer than 3, performed significantly better. This again confirms the analysis in Section 2.3 that it is not necessary to use all meta paths, and long paths may cause poor generalization performance. Among the compared methods, HeteWeight achieved the best results, which again confirms the effectiveness of the proposed method.

Fig. 5. Friedman test result on the Restaurant and Consumer dataset, measured by AUC.

Fig. 6. Friedman test result on the HetRec MovieLens dataset for item recommendation, measured by AUC.

Fig. 7. Friedman test result on the HetRec MovieLens dataset for tag recommendation, measured by AUC.

5. Conclusion

In this paper, based on the analysis that the most important and meaningful meta paths in the random walk with restart model are the short ones, we proposed using the combination of short meta paths to model user preferences. To find the optimal weights to combine the meta paths, we introduced diversity into the objective function as a regularization term. Inspired by the ambiguity decomposition theory in the field of ensemble learning, we presented a new diversity measure and used it to encourage diversity among the meta paths. We tested the proposed method on both item recommendation and tag recommendation tasks, and the experimental results confirmed the effectiveness of the proposed method. Both our theoretical and empirical results indicate that the proposed method, HeteWeight, is a good choice for HIN-based recommendation.

Acknowledgments

The work is supported by the National Key R&D Program of China under Grant No. 2017YFB1002000 and the National Natural Science Fund of China under Grant No. 61232005.

Appendix

Proof of Theorem 3.1. The function f(r_{uij}^{α}) can be expanded near f(r_{uij}) according to Taylor's theorem:

f(r_{uij}^{α}) = f(r_{uij}) + f'(r_{uij})(r_{uij}^{α} − r_{uij}) + \frac{1}{2} f''(r^{*})(r_{uij}^{α} − r_{uij})^2

where r^{*} is a value between r_{uij}^{α} and r_{uij}. Summing over all of the base recommenders using weighted averaging yields

\sum_{α} θ_α f(r_{uij}^{α}) = f(r_{uij}) + \frac{1}{2} \sum_{α} θ_α f''(r^{*})(r_{uij}^{α} − r_{uij})^2

Therefore, according to the definitions of L(u) and L_α(u), we obtain

\sum_{α} θ_α L_α(u) − L(u) = \frac{1}{2 |I_u^+| |I \ I_u^+|} \sum_{i ∈ I_u^+} \sum_{j ∈ I \ I_u^+} \sum_{α} θ_α f''(r^{*})(r_{uij}^{α} − r_{uij})^2

which is exactly the theorem.

Proof of Theorem 3.2. The function f(r_{uij}^{α}) can be expanded near f(0) according to Taylor's theorem:

f(r_{uij}^{α}) = f(0) + f'(0) r_{uij}^{α} + \frac{1}{2} f''(r_α^{*})(r_{uij}^{α})^2

where r_α^{*} is a value between zero and r_{uij}^{α}. Summing over all of the base recommenders using weighted averaging yields

\sum_{α} θ_α f(r_{uij}^{α}) = f(0) + f'(0) r_{uij} + \frac{1}{2} \sum_{α} θ_α f''(r_α^{*})(r_{uij}^{α})^2

The function f(r_{uij}) can also be expanded similarly:

f(r_{uij}) = f(0) + f'(0) r_{uij} + \frac{1}{2} f''(r^{*})(r_{uij})^2

where r^{*} is a value between zero and r_{uij}. Therefore, we obtain

\sum_{α} θ_α f(r_{uij}^{α}) − f(r_{uij}) = \frac{1}{2} \sum_{α} θ_α \left[ f''(r_α^{*})(r_{uij}^{α})^2 − f''(r^{*})(r_{uij})^2 \right]

According to the definitions of L(u) and L_α(u), we obtain

\sum_{α} θ_α L_α(u) − L(u) = \frac{1}{2 |I_u^+| |I \ I_u^+|} \sum_{i ∈ I_u^+} \sum_{j ∈ I \ I_u^+} \sum_{α} θ_α \left[ f''(r_α^{*})(r_{uij}^{α})^2 − f''(r^{*})(r_{uij})^2 \right]

which is exactly the theorem.


References

[1] J. Lu, D. Wu, M. Mao, W. Wang, G. Zhang, Recommender system application developments: A survey, Decis. Support Syst. 74 (2015) 12–32.
[2] J. Bobadilla, F. Ortega, A. Hernando, A. Gutiérrez, Recommender systems survey, Knowl.-Based Syst. 46 (2013) 109–132.
[3] F. Ricci, L. Rokach, B. Shapira, Recommender systems: Introduction and challenges, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer US, Boston, MA, 2015, pp. 1–34.
[4] M. Al-Hassan, H. Lu, J. Lu, A semantic enhanced hybrid recommendation approach: A case study of e-government tourism service recommendation system, Decis. Support Syst. 72 (2015) 97–109.
[5] J. Han, Mining heterogeneous information networks by exploring the power of links, in: J. Gama, V.S. Costa, A.M. Jorge, P.B. Brazdil (Eds.), Discovery Science, Springer, Berlin, Heidelberg, 2009, pp. 13–30.
[6] S. Lee, S. Park, M. Kahng, S. Lee, PathRank: Ranking nodes on a heterogeneous graph for flexible hybrid recommender systems, Expert Syst. Appl. 40 (2) (2013) 684–697.
[7] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI'09, AUAI Press, Arlington, Virginia, United States, 2009, pp. 452–461.
[8] H. Liu, Z. Wu, X. Zhang, CPLR: Collaborative pairwise learning to rank for personalized recommendation, Knowl.-Based Syst. 148 (2018) 31–40.
[9] X. Yu, X. Ren, Y. Sun, B. Sturt, U. Khandelwal, Q. Gu, B. Norick, J. Han, Recommendation in heterogeneous information networks with implicit user feedback, in: Proceedings of the 7th ACM Conference on Recommender Systems, RecSys'13, ACM, New York, NY, USA, 2013, pp. 347–350.
[10] X. Zhou, L. Ding, Z. Li, R. Wan, Collaborator recommendation in heterogeneous bibliographic networks using random walks, Inf. Retr. J. 20 (4) (2017) 317–337.
[11] F. Fouss, A. Pirotte, J.-M. Renders, M. Saerens, Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation, IEEE Trans. Knowl. Data Eng. 19 (3) (2007) 355–369.
[12] F. Yu, A. Zeng, S. Gillard, M. Medo, Network-based recommendation algorithms: A review, Physica A 452 (2016) 192–208.
[13] L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: Bringing order to the web, Tech. rep., Stanford InfoLab, 1999.
[14] T.H. Haveliwala, Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search, IEEE Trans. Knowl. Data Eng. 15 (4) (2003) 784–796.
[15] A. Balmin, V. Hristidis, Y. Papakonstantinou, ObjectRank: Authority-based keyword search in databases, in: Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB'04, VLDB Endowment, 2004, pp. 564–575.
[16] M. Gori, A. Pucci, ItemRank: A random-walk based scoring algorithm for recommender engines, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI'07, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2007, pp. 2766–2771.
[17] T.-A.N. Pham, X. Li, G. Cong, Z. Zhang, A general recommendation model for heterogeneous networks, IEEE Trans. Knowl. Data Eng. 28 (12) (2016) 3140–3153.
[18] Z. Jiang, H. Liu, B. Fu, Z. Wu, T. Zhang, Recommendation in heterogeneous information networks based on generalized random walk model and Bayesian personalized ranking, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM'18, ACM, 2018, pp. 288–296.
[19] Y. Sun, J. Han, X. Yan, P.S. Yu, T. Wu, PathSim: Meta path-based top-k similarity search in heterogeneous information networks, Proc. VLDB Endow. 4 (11) (2011) 992–1003.
[20] C. Shi, X. Kong, Y. Huang, P.S. Yu, HeteSim: A general framework for relevance measure in heterogeneous networks, IEEE Trans. Knowl. Data Eng. 26 (10) (2014) 2479–2492.
[21] N. Lao, W.W. Cohen, Fast query execution for retrieval models based on path-constrained random walks, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'10, ACM, New York, NY, USA, 2010, pp. 881–888.
[22] S. Arbesman, The Half-Life of Facts: Why Everything We Know Has an Expiration Date, Penguin, 2013.
[23] R. Pan, Y. Zhou, B. Cao, N.N. Liu, R. Lukose, M. Scholz, Q. Yang, One-class collaborative filtering, in: Proceedings of the Eighth IEEE International Conference on Data Mining, ICDM'08, IEEE Computer Society, Washington, DC, USA, 2008, pp. 502–511.
[24] W. Pan, L. Chen, GBPR: Group preference based Bayesian personalized ranking for one-class collaborative filtering, in: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI'13, AAAI Press, 2013, pp. 2691–2697.
[25] T. Ebesu, Y. Fang, Neural semantic personalized ranking for item cold-start recommendation, Inf. Retr. J. 20 (2) (2017) 109–131.
[26] W. Gao, L. Wang, R. Jin, S. Zhu, Z.-H. Zhou, One-pass AUC optimization, Artificial Intelligence 236 (2016) 1–29.
[27] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. Hullender, Learning to rank using gradient descent, in: Proceedings of the 22nd International Conference on Machine Learning, ICML'05, ACM, New York, NY, USA, 2005, pp. 89–96.
[28] Y. Freund, R. Iyer, R.E. Schapire, Y. Singer, An efficient boosting algorithm for combining preferences, J. Mach. Learn. Res. 4 (Nov) (2003) 933–969.
[29] W. Gao, Z.-H. Zhou, On the consistency of AUC pairwise optimization, in: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, AAAI Press, 2015, pp. 939–945.
[30] N. Li, Y. Yu, Z.-H. Zhou, Diversity regularized ensemble pruning, in: Machine Learning and Knowledge Discovery in Databases, Springer, 2012, pp. 330–345.
[31] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms, Chapman and Hall/CRC Press, 2012.
[32] C. Zhang, C. Liu, X. Zhang, G. Almpanidis, An up-to-date comparison of state-of-the-art classification algorithms, Expert Syst. Appl. 82 (2017) 128–150.
[33] C. Zhang, J. Bi, S. Xu, E. Ramentol, G. Fan, B. Qiao, H. Fujita, Multi-imbalance: An open-source software for multi-class imbalance learning, Knowl.-Based Syst. 174 (2019) 137–143.
[34] Z. Jiang, H. Liu, B. Fu, Z. Wu, Generalized ambiguity decompositions for classification with applications in active learning and unsupervised ensemble pruning, in: Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI'17, AAAI Press, 2017, pp. 2073–2079.
[35] Y. Koren, The BellKor solution to the Netflix grand prize, Netflix Prize Documentation, 81, 2009, pp. 1–10.
[36] A. Töscher, M. Jahrer, R.M. Bell, The BigChaos solution to the Netflix grand prize, Netflix Prize Documentation, 2009, pp. 1–52.
[37] Y. Yu, Y.-F. Li, Z.-H. Zhou, Diversity regularized machine, in: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI'11, AAAI Press, 2011, pp. 1603–1608.
[38] G. Brown, J. Wyatt, R. Harris, X. Yao, Diversity creation methods: a survey and categorisation, Inf. Fusion 6 (1) (2005) 5–20.
[39] M.J.D. Powell, R. Fletcher, A rapidly convergent descent method for minimization, Comput. J. 6 (2) (1963) 163–168.
[40] M. Lichman, UCI machine learning repository, 2013, URL http://archive.ics.uci.edu/ml.
[41] Y. Wang, L. Wang, Y. Li, D. He, W. Chen, T.-Y. Liu, A theoretical analysis of NDCG ranking measures, in: Proceedings of the 26th Annual Conference on Learning Theory, COLT'13, pp. 25–54.
[42] J. Wang, A.P. de Vries, M.J.T. Reinders, Unifying user-based and item-based collaborative filtering approaches by similarity fusion, in: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'06, ACM, New York, NY, USA, 2006, pp. 501–508.
[43] G. Linden, B. Smith, J. York, Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Comput. 7 (1) (2003) 76–80.
[44] Y. Bao, H. Fang, J. Zhang, TopicMF: Simultaneously exploiting ratings and reviews for recommendation, in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI'14, AAAI Press, 2014, pp. 2–8.
[45] P. Cremonesi, Y. Koren, R. Turrin, Performance of recommender algorithms on top-n recommendation tasks, in: Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys'10, ACM, New York, NY, USA, 2010, pp. 39–46.
[46] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (Jan) (2006) 1–30.
[47] H. Murfi, K. Obermayer, A two-level learning hierarchy of concept based keyword extraction for tag recommendations, in: Proceedings of the 2009 International Conference on ECML PKDD Discovery Challenge, ECML-PKDD DC'09, CEUR-WS.org, Aachen, Germany, 2009, pp. 201–214.
[48] H. Wang, B. Chen, W.-J. Li, Collaborative topic regression with social regularization for tag recommendation, in: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI'13, AAAI Press, 2013, pp. 2719–2725.