A machine learning-based recommendation model for bipartite networks


Physica A xxx (xxxx) xxx

Contents lists available at ScienceDirect

Physica A journal homepage: www.elsevier.com/locate/physa

Ozge Kart ∗, Oguzhan Ulucay, Berkay Bingol, Zerrin Isik

Computer Engineering Department, Dokuz Eylul University, Izmir, Turkey

Article info

Article history:
Received 27 August 2019
Received in revised form 22 December 2019
Available online xxxx

Keywords: Recommendation model; Link prediction; Social network; Network similarity metric; Random walk; Machine learning

Abstract

Online user reviews of products, services or content have been widely used in recommender systems with the spread of the internet and online applications. Link prediction is one of the popular recommender system approaches. It can exploit the structure of a social network by mapping users' item reviews to a bipartite user–item graph. This study investigates how topological information, namely neighbor-based, path-based and random walk-based network similarity metrics, improves the prediction capability of a recommendation model. It proposes a supervised machine learning-based link prediction model for weighted and bipartite social networks. The input features of the machine learning model are versions of similarity metrics extended for weighted and bipartite networks. The proposed model achieves AUC values of 0.93 and 0.90 for the Goodreads and MovieLens datasets, respectively. The ensemble models, random forest and extreme gradient boosting, achieved the highest performance with the ItemRank metric in both datasets. © 2020 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail address: [email protected] (O. Kart).
https://doi.org/10.1016/j.physa.2020.124287
0378-4371/© 2020 Elsevier B.V. All rights reserved.

Please cite this article as: O. Kart, O. Ulucay, B. Bingol et al., A machine learning-based recommendation model for bipartite networks, Physica A (2020) 124287, https://doi.org/10.1016/j.physa.2020.124287.

1. Introduction

Social media is now widely used to get information or share opinions about products or services. Online user reviews and ratings on social media have therefore become a powerful source of information when people make decisions. One of the main methods used in item recommendation systems is collaborative filtering. Conventional collaborative filtering algorithms generally use transaction data to recommend products or services to users [1]. In contrast to collaborative filtering, link prediction uses both node-based information and the structural information of the network to predict the occurrence of new links in the future [2]. By using the interaction relationships between nodes and the attribute information of nodes, link prediction can capture the real world more precisely and mitigate the shortcomings of conventional transaction data. Hence, many researchers have worked on the link prediction problem in the last decade [3–5].

Lü and Zhou [2] provided a comprehensive review of studies on link prediction. They explained similarity-based algorithms, probabilistic models and maximum likelihood algorithms. Yao et al. [6] proposed a local path-based method for link prediction that explores the impact of the Resources from Short Paths (RSP) in complex networks. Their method outperformed the traditional structure-based network similarity metrics on real-world datasets. Mallek et al. [7] applied belief function tools to link prediction; their method combines neighborhood and common-group information in social networks to predict new links. They also stated that the proposed method surpasses network structure-based methods. Muniz et al. [8] proposed a weighted metric that uses contextual, temporal and topological information to improve the performance of unsupervised link prediction in social networks. They applied their weighting criteria to two well-known weighted centrality measures (Adamic-Adar and common neighbors). Their criteria performed better than a weighting criterion based only on topological information. Bastami et al. [9] developed a new unsupervised gravitation-based link predictor that combines node attributes, community information and graph properties to improve the accuracy of local and global predictions. Their approach gave more successful and more scalable results than classical similarity-based methods and reduced the running time. Bütün et al. [10] proposed a new link prediction metric by extending neighbor-based metrics and considering the direction of the links in directed, weighted and temporal networks. Combined with supervised learning algorithms, their approach improves prediction performance compared to frequently applied link prediction metrics.

Previously proposed link prediction models generally apply neighbor-based measures and compare their individual performance on different network datasets. Path-based and random walk-based measures are used less frequently in recommender systems. Additionally, network centrality measures have rarely been used as the input of ensemble machine learning models in link prediction studies. The current study investigates how neighbor-based, path-based and random walk-based measures contribute to the prediction performance of a recommendation model.
This study proposes a machine learning-based link prediction model for weighted and bipartite social networks. Neighbor-based metrics (Jaccard Coefficient, Preferential Attachment, Adamic-Adar) and a path-based metric (Local Path) were adapted to a weighted and bipartite graph setting. We combined the weighted [11] and bipartite [12,13] extensions of these network similarity metrics in our supervised recommendation model. A random walk-based metric (the original ItemRank) was also implemented. Several machine learning algorithms were run in a supervised scheme, taking the computed metric values as input to predict whether a link between a given pair of nodes will form in the future.

The experiments were performed on two datasets: MovieLens and GoodReads. They consist of timestamped ratings given by users to movies or books. The MovieLens dataset is commonly used to evaluate recommendation systems. GoodReads is a relatively new network dataset; hence, the proposed models were further evaluated on this network as well.

The main contribution of the current study is the use of different network similarity metrics as the feature vectors of machine learning algorithms, which yields a supervised learning setup as an alternative recommendation system. Previous recommendation methods commonly apply the rank values calculated by individual network similarity metrics directly. In contrast, the proposed method adds a supervised learning step to test the effectiveness of machine learning algorithms in such recommendation systems. We also adapted the weighted and bipartite extensions of the network similarity metrics so that they can be used in graphs with two types of nodes. Another contribution is the evaluation of all proposed metrics used together as feature vectors, to observe the resulting performance improvement in the machine learning algorithms. The rest of the paper is organized as follows.
Section 2 explains the datasets, the adapted and implemented network similarity metrics, and the machine learning algorithms used in this study. Section 3 presents the experiments and the results obtained. Section 4 presents conclusions and future work.

2. Materials and methods

The workflow of the link prediction process is illustrated in Fig. 1. First, the social network dataset is imported. Two networks, G[T0] and G[T1], are constructed by dividing this dataset according to the timestamps. G[T0] is used for training and G[T1] is used for binary (positive/negative) labeling of links. The details of the division and labeling process are explained in Section 2.1. The network similarity metrics described in Section 2.2 are calculated as the input features of a binary classifier. The machine learning models used as the binary classifier are described in Section 2.3. The techniques used to evaluate the machine learning models are given in Section 2.4.

2.1. Datasets

The study was conducted on two network datasets. The first is the public MovieLens 1M dataset [14], which contains users' ratings (1–5) of movies. It covers 1,000,209 ratings made by 6040 users for approximately 3900 movies. The attributes of the dataset are User ID, Movie ID, Rating (1–5) and Timestamp (rating time). This dataset was represented as a bipartite network in which one node set contains the users and the other contains the movies (Fig. 2). When a user has rated a movie, a link connects the user node and the movie node. The weight of the link is the rating the user gave to the movie.

The second is the Goodreads poetry dataset [15]. It consists of 1,636,718 timestamped ratings of 36,514 books made by 377,799 users. In the Goodreads application, users can rate books after reading them. Book ratings range from one to five. A value of zero means that the user read the book but did not rate it. Ratings with a value of zero have been omitted from the dataset.
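The graph construction and timestamp split outlined above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the `("u", id)` / `("i", id)` node keys, the `cutoff` value and the toy ratings are all hypothetical, and restricting candidate pairs to nodes already present in G[T0] is an assumption made here for simplicity.

```python
from collections import defaultdict

def build_graph(ratings):
    """Weighted bipartite graph as {node: {neighbor: rating}}.

    Users are keyed ("u", id) and items ("i", id) so the two node
    types can never collide (hypothetical convention).
    """
    graph = defaultdict(dict)
    for user, item, rating, _timestamp in ratings:
        u, i = ("u", user), ("i", item)
        graph[u][i] = rating
        graph[i][u] = rating
    return dict(graph)

def split_by_time(ratings, cutoff):
    """Ratings before `cutoff` form G[T0]; the rest form G[T1]."""
    t0 = [r for r in ratings if r[3] < cutoff]
    t1 = [r for r in ratings if r[3] >= cutoff]
    return t0, t1

def label_pairs(g_t0, g_t1):
    """Label each user-item non-edge of G[T0]: 1 if it is an edge in G[T1], else 0."""
    users = [n for n in g_t0 if n[0] == "u"]
    items = [n for n in g_t0 if n[0] == "i"]
    labels = {}
    for u in users:
        for i in items:
            if i in g_t0[u]:
                continue  # already linked in G[T0], so not a candidate
            labels[(u, i)] = 1 if i in g_t1.get(u, {}) else 0
    return labels

# Toy (user, item, rating, timestamp) records, invented for illustration.
ratings = [
    (1, 10, 5, 100), (1, 11, 3, 110), (2, 10, 4, 120), (3, 12, 4, 130),
    (2, 11, 2, 200), (2, 12, 5, 210),
]
t0, t1 = split_by_time(ratings, cutoff=150)
g_t0, g_t1 = build_graph(t0), build_graph(t1)
labels = label_pairs(g_t0, g_t1)
print(sum(labels.values()), "positive pairs out of", len(labels))
```

Storing the graph as a dictionary of dictionaries keeps neighbor lookups and link weights in one structure, which is convenient for the similarity metrics of Section 2.2.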
For the binary labeling process of the supervised learning, an approach similar to that of [10] has been adopted. In this study, G(V+W, E), the social network of interest, is divided into two sub-graphs, G[T0] and G[T1], according to the timestamps in the ratings datasets. These timestamps provide information about the evolution of the social network. The binary labeling problem is to check whether a link (edge) that does not exist in G[T0] will appear in G[T1], where T0 < T1. If a non-edge in G[T0] becomes an edge in G[T1], it is labeled positive; otherwise it is labeled negative (Fig. 3).

Fig. 1. The workflow of the link prediction process.

Fig. 2. Conceptual representation of the data sets as a bipartite graph.

Fig. 3. Some of the non-edges in G[T0] (shown as red dashed lines) become new edges (links) in G[T1] (shown as blue lines) and are labeled as positive links. The remaining non-edge pairs are labeled as negative links.

For the experimental studies, two weighted and bipartite sub-networks, G[T0] and G[T1], were constructed by dividing the MovieLens dataset equally according to the timestamps. G[T0] contains the user and movie nodes and the edges (ratings) from April 2000 to October 2000. G[T1] was constructed from the ratings from November 2000 to December 2003. The number of newly appearing links in G[T1], namely the positive examples, is 47,636.

The same procedure was applied to the Goodreads dataset. The first sub-network (G[T0]) contains the data from 2007 to 2013. The second sub-network (G[T1]) was constructed from the data from 2014 to 2017. After network and graph creation, 382 small sub-components that are not connected to any other component in the network were removed from G[T0]; as a result, nearly 500 links were deleted from G[T0]. Finally, 146,756 positively labeled records were obtained.

Because the number of negatively labeled records is huge, the same number of negative examples as positive examples was selected at random from all negative examples, giving an equal distribution of positive and negative samples in both datasets. The similarity/centrality metrics described in the next section were calculated for all user–item pairs as features for training the machine learning models. Each record consists of five numerical features (JC, AA, PA, LP and IR) and a binary target (0 or 1).

2.2. Similarity metrics

Link prediction techniques are mainly categorized as node-based, topology-based and social-based [16]. The topology-based category is further divided into neighbor-based, path-based and random walk-based techniques. This study focuses on feature extraction from topology-based similarity metrics for supervised link prediction in weighted bipartite networks. The neighbor-based metrics (Adamic-Adar, Jaccard Coefficient and Preferential Attachment), the path-based metric (Local Path) and the random walk-based metric (ItemRank) have been evaluated on weighted and bipartite networks. However, the neighbor-based and path-based similarity metrics need to be adapted to the bipartite setting, since the neighbors of nodes of different types in a bipartite network do not intersect (e.g., the neighbors of users are items and the neighbors of items are users). To calculate the similarity metrics, both sets of nodes need to be of the same type. Therefore, two sets S(x) and S(y) are built for a given pair of nodes (x, y) of different types, as in [12,13]. S(x) is the set of nodes that can be reached from x in two hops; thus, S(x) contains nodes of the same type as x.
S(y) is the set of y's direct neighbors, which are on the same side as x. The study [11] updated the Adamic-Adar (AA), Jaccard Coefficient (JC), Preferential Attachment (PA) and Local Path (LP) metrics for weighted networks as in formulas (1)–(4). Γ(x) is the set of neighbors of a node x in the network and |Γ(x)| is the number of elements in this set; ω(x, y) is the link weight between nodes x and y.

\[ AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{\omega(x, z) + \omega(y, z)}{\log\left(1 + \sum_{c \in \Gamma(z)} \omega(z, c)\right)} \tag{1} \]

\[ JC(x, y) = \frac{\sum_{z \in \Gamma(x) \cap \Gamma(y)} \left(\omega(x, z) + \omega(y, z)\right)}{\sum_{a \in \Gamma(x)} \omega(a, x) + \sum_{b \in \Gamma(y)} \omega(b, y)} \tag{2} \]

\[ PA(x, y) = \sum_{a \in \Gamma(x)} \omega(a, x) \ast \sum_{b \in \Gamma(y)} \omega(b, y) \tag{3} \]

\[ LP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \left(\omega(x, z) + \omega(y, z)\right) + \epsilon \ast \sum_{\{x, a, b, y\} \in paths^{(3)}_{x,y}} \left(\omega(x, a) + \omega(a, b) + \omega(b, y)\right) \tag{4} \]

In the weighted version of Local Path (LP) given by [11], each path of length exactly 2 contributes ω(x, z) + ω(y, z), where z is a common neighbor of x and y. Each path of length exactly 3 contributes ω(x, a) + ω(a, b) + ω(b, y), where node a is adjacent to x and not to y, node b is adjacent to y and not to x, and a and b are direct neighbors.
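A minimal sketch of the weighted metrics (1)–(4) on a dictionary-of-dictionaries graph may help make the sums concrete. The function names and the toy graph are hypothetical, and the default `eps=0.01` for the length-3 path weight in LP is an assumption, since the text does not state the value used.

```python
import math

def strength(g, n):
    """Total weight of the links incident to node n."""
    return sum(g[n].values())

def common_neighbors(g, x, y):
    return set(g[x]) & set(g[y])

def weighted_aa(g, x, y):
    # Eq. (1): each common neighbor z is discounted by log(1 + strength of z).
    return sum((g[x][z] + g[y][z]) / math.log(1 + strength(g, z))
               for z in common_neighbors(g, x, y))

def weighted_jc(g, x, y):
    # Eq. (2): common-neighbor weight relative to the total strength of x and y.
    num = sum(g[x][z] + g[y][z] for z in common_neighbors(g, x, y))
    den = strength(g, x) + strength(g, y)
    return num / den if den else 0.0

def weighted_pa(g, x, y):
    # Eq. (3): product of the two node strengths.
    return strength(g, x) * strength(g, y)

def weighted_lp(g, x, y, eps=0.01):
    # Eq. (4): length-2 paths plus eps times the length-3 paths x-a-b-y.
    two = sum(g[x][z] + g[y][z] for z in common_neighbors(g, x, y))
    three = 0.0
    for a in g[x]:
        if a == y or a in g[y]:
            continue  # a must be adjacent to x and not to y
        for b in g[a]:
            if b == x or b in g[x]:
                continue  # b must be adjacent to y and not to x
            if y in g[b]:
                three += g[x][a] + g[a][b] + g[b][y]
    return two + eps * three

# Toy undirected weighted graph (invented for illustration).
g = {
    "x": {"z": 2, "a": 1},
    "y": {"z": 3, "b": 4},
    "z": {"x": 2, "y": 3, "c": 1},
    "c": {"z": 1},
    "a": {"x": 1, "b": 2},
    "b": {"a": 2, "y": 4},
}
print(weighted_aa(g, "x", "y"), weighted_jc(g, "x", "y"),
      weighted_pa(g, "x", "y"), weighted_lp(g, "x", "y"))
```

In this toy graph the only common neighbor of x and y is z, and the only qualifying length-3 path is x-a-b-y, so each sum has a single term that can be checked by hand.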


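In the current study, these formulas have been adapted to the bipartite setting as follows.

\[ AA(x, y) = \sum_{z \in S(x) \cap S(y)} \frac{\max_{i \in \Gamma(x) \cap \Gamma(z)} \{\omega(x, i) + \omega(z, i)\} + \omega(y, z)}{\log\left(1 + \sum_{c \in \Gamma(z)} \omega(z, c)\right)} \tag{5} \]

\[ JC(x, y) = \frac{\sum_{z \in S(x) \cap S(y)} \left(\max_{i \in \Gamma(x) \cap \Gamma(z)} \{\omega(x, i) + \omega(z, i)\} + \omega(y, z)\right)}{\sum_{a \in S(x)} \max_{j \in \Gamma(x) \cap \Gamma(a)} \{\omega(x, j) + \omega(a, j)\} + \sum_{b \in S(y)} \omega(b, y)} \tag{6} \]

\[ PA(x, y) = \sum_{a \in S(x)} \max_{i \in \Gamma(x) \cap \Gamma(a)} \{\omega(x, i) + \omega(a, i)\} \ast \sum_{b \in S(y)} \omega(b, y) \tag{7} \]

\[ LP(x, y) = \sum_{z \in S(x) \cap S(y)} \left(\max_{i \in \Gamma(x) \cap \Gamma(z)} \{\omega(x, i) + \omega(z, i)\} + \omega(y, z)\right) + \epsilon \ast \sum_{\{x, a, b, y\} \in paths^{(3)}_{x,y}} \left(\omega(x, a) + \omega(a, b) + \omega(b, y)\right) \tag{8} \]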
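The bipartite adaptations (5)–(7) can be sketched in the same style. This is an illustration under stated assumptions rather than the authors' implementation: the helper names and the toy graph are hypothetical, x is excluded from its own two-hop set S(x), and S(y) is taken to be the direct neighbors of y, so the sum of ω(b, y) over S(y) is simply the strength of y.

```python
import math

def two_hop(g, x):
    """S(x): nodes reached from x in exactly two hops (same side as x), x excluded."""
    return {z for i in g[x] for z in g[i]} - {x}

def proxy_weight(g, x, z):
    """Stand-in for the missing w(x, z): the heaviest two-step path x - i - z."""
    return max(g[x][i] + g[z][i] for i in set(g[x]) & set(g[z]))

def bipartite_aa(g, x, y):
    # Eq. (5): sum over nodes z that are both in S(x) and direct neighbors of y.
    shared = two_hop(g, x) & set(g[y])
    return sum((proxy_weight(g, x, z) + g[y][z]) / math.log(1 + sum(g[z].values()))
               for z in shared)

def bipartite_jc(g, x, y):
    # Eq. (6): same numerator terms, normalized by the totals over S(x) and S(y).
    shared = two_hop(g, x) & set(g[y])
    num = sum(proxy_weight(g, x, z) + g[y][z] for z in shared)
    den = sum(proxy_weight(g, x, a) for a in two_hop(g, x)) + sum(g[y].values())
    return num / den if den else 0.0

def bipartite_pa(g, x, y):
    # Eq. (7): product of the S(x) total and the strength of y.
    return sum(proxy_weight(g, x, a) for a in two_hop(g, x)) * sum(g[y].values())

# Toy user-item graph (invented): users u1..u4, items i1 and i2.
g = {
    "u1": {"i1": 5},
    "u2": {"i1": 4, "i2": 3},
    "u3": {"i1": 2, "i2": 1},
    "u4": {"i1": 3},
    "i1": {"u1": 5, "u2": 4, "u3": 2, "u4": 3},
    "i2": {"u2": 3, "u3": 1},
}
print(bipartite_aa(g, "u1", "i2"), bipartite_jc(g, "u1", "i2"), bipartite_pa(g, "u1", "i2"))
```

For the candidate pair (u1, i2), S(u1) is {u2, u3, u4} and S(i2) is {u2, u3}, so u4 contributes to the normalizing terms but not to the sums over the intersection.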

Nodes x and z, which are of the same type, cannot be linked to each other because of the bipartite structure of the network. Therefore, the path of length 2 from x to z via the common neighbor i that yields the maximum sum of weights is used instead of ω(x, z).

A random walk-based metric was proposed by [17], who present the ItemRank algorithm as an adaptation of the original PageRank for use in a recommender system. G[T0] is used to compute an ItemRank value IR_{u_i} for every item node and for every user profile. The correlation matrix C plays the role of the stochastic matrix M in PageRank, and for every user u_i a different IR_{u_i} is computed simply by choosing a different static score distribution vector d_{u_i}. α is a decay factor. The resulting equation for the ItemRank value of u_i is:

\[ IR_{u_i} = \alpha \times C \times IR_{u_i} + (1 - \alpha) \times d_{u_i} \tag{9} \]
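Equation (9) is a fixed-point equation, so one common way to solve it is by repeated substitution. The sketch below assumes that C is already column-normalized and that d sums to one; the function name, the default `alpha=0.85`, the stopping rule and the toy inputs are assumptions for illustration, and the construction of C and d from the ratings (described in [17]) is not shown.

```python
def item_rank(C, d, alpha=0.85, max_iter=1000, tol=1e-12):
    """Iterate IR <- alpha * C * IR + (1 - alpha) * d until the update is below tol."""
    n = len(d)
    ir = [1.0 / n] * n  # uniform starting vector
    for _ in range(max_iter):
        new = [alpha * sum(C[i][j] * ir[j] for j in range(n)) + (1 - alpha) * d[i]
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, ir)) < tol:
            return new
        ir = new
    return ir

# Toy inputs (invented): three items with a column-stochastic correlation
# matrix and a user profile concentrated on item 0.
C = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
d = [1.0, 0.0, 0.0]
scores = item_rank(C, d)
print(scores)
```

Because the iteration shrinks the error by a factor of at most alpha per step when C is column-stochastic, it converges for any alpha below one; the item the user already prefers keeps the largest score and the other two items tie by symmetry.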

ItemRank (IR) scores rank the items according to their expected preference for a given user. The higher the ItemRank score of an item, the more likely the user is to prefer it. We calculated ItemRank scores for all users and used them as a new feature for the user–movie/book pairs in our datasets.

2.3. Machine learning models

The machine learning (ML) models [18] built using the features described in the previous section are Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machines (SVM) and the ensemble models Random Forest (RF) and Extreme Gradient Boosting (XGB). The Python scikit-learn and XGBoost libraries were used to apply the machine learning algorithms. All experiments were run with the default parameters of the libraries, except for the following settings. The SVM was used with an RBF kernel; both the cost and gamma parameters were set to 10. The number of decision trees was set to 100 in the RF implementation and to 500 in XGB.

2.4. Evaluation

All of the experiments were conducted on an Intel Xeon Silver 4110 2.10 GHz CPU with 128 GB of memory, running the Ubuntu 18.04 operating system. The machine learning models were evaluated by 5-fold cross-validation. In each fold, the model is fitted on the training partition and evaluated on the held-out test partition. The overall accuracy of the validation is the average of the accuracies of the folds [19]. An equal number of positive and negative samples was picked randomly for each partition. Accuracy, precision, recall and the Area Under the ROC Curve (AUC) [20] were used as performance measures. With N denoting the total number of samples, these measures can be expressed mathematically as follows.

\[ \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{N} \tag{10} \]

\[ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \tag{11} \]

\[ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \tag{12} \]
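As a small worked example, the measures in (10)–(12) and the AUC named above can be computed from labels and classifier scores in plain Python. The function names, the 0.5 decision threshold and the toy data are hypothetical. The AUC here is computed through its rank interpretation (the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as one half), which equals the area under the ROC curve.

```python
def confusion(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)  # Eq. (10), N = all samples

def recall(tp, tn, fp, fn):
    return tp / (tp + fn)  # Eq. (11)

def precision(tp, tn, fp, fn):
    return tp / (tp + fp)  # Eq. (12)

def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores (invented for illustration).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
counts = confusion(y_true, y_pred)
print(accuracy(*counts), recall(*counts), precision(*counts), auc(y_true, scores))
```

Unlike accuracy, precision and recall, the AUC does not depend on the chosen threshold, which is one reason it is convenient for comparing models.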

The ROC curve is the plot of the true positive rate against the false positive rate. The AUC is the area under this curve.

3. Results and discussions

The performance of the ML models has been evaluated. The models were built using each feature individually and using all features together.

Table 1
AUC rates of individual metrics on the Goodreads dataset.

| Metric | LR | NB | RF | SVM | XGB |
|---|---|---|---|---|---|
| AA | 0.840 | 0.688 | 0.783 | 0.837 | 0.842 |
| JC | 0.889 | 0.747 | 0.863 | 0.889 | 0.890 |
| PA | 0.647 | 0.703 | 0.725 | 0.421 | 0.749 |
| LP | 0.846 | 0.709 | 0.831 | 0.846 | 0.848 |
| IR | 0.911 | 0.818 | 0.880 | 0.911 | 0.911 |

Table 2
AUC rates of individual metrics on the MovieLens dataset.

| Metric | LR | NB | RF | SVM | XGB |
|---|---|---|---|---|---|
| AA | 0.776 | 0.767 | 0.694 | 0.769 | 0.779 |
| JC | 0.772 | 0.77 | 0.684 | 0.757 | 0.773 |
| PA | 0.776 | 0.762 | 0.689 | 0.768 | 0.777 |
| LP | 0.803 | 0.767 | 0.719 | 0.796 | 0.803 |
| IR | 0.847 | 0.802 | 0.774 | 0.843 | 0.846 |

Table 3
Performance values for the Goodreads dataset using the AA, JC, PA and LP metrics together.

| Measure | LR | NB | RF | SVM | XGB |
|---|---|---|---|---|---|
| Avg. accuracy | 0.66 | 0.44 | 0.84 | 0.75 | 0.84 |
| Avg. precision | 0.66 | 0.99 | 0.88 | 0.95 | 0.88 |
| Avg. recall | 1.0 | 0.16 | 0.88 | 0.66 | 0.88 |
| Avg. AUC | 0.647 | 0.718 | 0.910 | 0.894 | 0.912 |

Table 4
Performance values for the MovieLens dataset using the AA, JC, PA and LP metrics together.

| Measure | LR | NB | RF | SVM | XGB |
|---|---|---|---|---|---|
| Avg. accuracy | 0.73 | 0.69 | 0.73 | 0.75 | 0.76 |
| Avg. precision | 0.76 | 0.81 | 0.75 | 0.74 | 0.75 |
| Avg. recall | 0.69 | 0.49 | 0.7 | 0.76 | 0.78 |
| Avg. AUC | 0.81 | 0.79 | 0.81 | 0.809 | 0.838 |

Table 5
Performance values for the Goodreads dataset using all metrics together.

| Measure | LR | NB | RF | SVM | XGB |
|---|---|---|---|---|---|
| Avg. accuracy | 0.66 | 0.44 | 0.87 | 0.78 | 0.87 |
| Avg. precision | 0.66 | 0.99 | 0.90 | 0.95 | 0.90 |
| Avg. recall | 1 | 0.16 | 0.90 | 0.71 | 0.90 |
| Avg. AUC | 0.647 | 0.718 | 0.931 | 0.913 | 0.934 |

3.1. Individual metric performances

Tables 1 and 2 show the AUC values of the ML models trained with each metric individually for the Goodreads and MovieLens datasets, respectively. In both datasets, the random walk-based metric, IR, achieved the highest AUC values among the topology-based metrics for all machine learning models. The ML models generally perform better on the Goodreads dataset than on MovieLens. NB is the exception: it performed worse on Goodreads than on MovieLens for all metrics except IR. In terms of metric performance, IR and JC improve markedly on Goodreads compared to MovieLens.

3.2. Comparison of combination of metrics

Tables 3 and 4 summarize the performance measures of the ML models trained with the AA, JC, PA and LP metrics together. When individual and combined metric performances are compared (Tables 1–4), the random walk-based metric IR alone gives higher AUC values than the four metrics (AA, JC, PA and LP) together for most models; the RF models are the main exception. When all five metrics are used together to train and evaluate the machine learning models on the Goodreads dataset, the ensemble models RF and XGB and the kernel model SVM improve their AUC values by 0.02 on average, as shown in Table 5. Similarly, on the MovieLens dataset, RF, XGB and SVM improve their AUC values by 0.07 on average, as shown in Table 6. In contrast, the AUC values of LR and NB remain almost the same.
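The average improvements quoted above can be reproduced from the AUC rows of Tables 3–6. The short check below copies those values for RF, SVM and XGB; interpreting "2%" and "7%" as absolute AUC differences is an assumption, and the variable and function names are hypothetical.

```python
# AUC with AA, JC, PA and LP together (Tables 3 and 4).
auc_four = {
    "Goodreads": {"RF": 0.910, "SVM": 0.894, "XGB": 0.912},
    "MovieLens": {"RF": 0.81, "SVM": 0.809, "XGB": 0.838},
}
# AUC with all five metrics together (Tables 5 and 6).
auc_all = {
    "Goodreads": {"RF": 0.931, "SVM": 0.913, "XGB": 0.934},
    "MovieLens": {"RF": 0.900, "SVM": 0.874, "XGB": 0.895},
}

def mean_gain(dataset):
    """Average absolute AUC gain over RF, SVM and XGB after adding IR."""
    gains = [auc_all[dataset][m] - auc_four[dataset][m] for m in auc_four[dataset]]
    return sum(gains) / len(gains)

for name in ("Goodreads", "MovieLens"):
    print(name, round(mean_gain(name), 3))
```

Under this reading, the per-model gains are 0.021, 0.019 and 0.022 on Goodreads and 0.090, 0.065 and 0.057 on MovieLens, which average to roughly 0.02 and 0.07.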


Table 6
Performance values for the MovieLens dataset using all metrics together.

| Measure | LR | NB | RF | SVM | XGB |
|---|---|---|---|---|---|
| Avg. accuracy | 0.5 | 0.67 | 0.82 | 0.79 | 0.81 |
| Avg. precision | 0.5 | 0.82 | 0.82 | 0.81 | 0.81 |
| Avg. recall | 1 | 0.44 | 0.81 | 0.76 | 0.80 |
| Avg. AUC | 0.804 | 0.792 | 0.900 | 0.874 | 0.895 |

Table 7
Comparison of AUC values of CAA, the proposed AA and IR for the GoodReads and MovieLens datasets.

| Dataset | Metric | LR | NB | RF | SVM | XGB |
|---|---|---|---|---|---|---|
| GoodReads | AA | 0.84 | 0.688 | 0.783 | 0.837 | 0.842 |
| GoodReads | IR | 0.911 | 0.818 | 0.880 | 0.911 | 0.911 |
| GoodReads | CAA | 0.869 | 0.718 | 0.992 | 0.869 | 0.874 |
| MovieLens | AA | 0.776 | 0.767 | 0.694 | 0.769 | 0.779 |
| MovieLens | IR | 0.847 | 0.802 | 0.774 | 0.843 | 0.846 |
| MovieLens | CAA | 0.802 | 0.76 | 0.807 | 0.797 | 0.802 |

Table 8
Running time for calculation of each metric.

| Metric | Goodreads | MovieLens |
|---|---|---|
| AA | 16 h 30 min | 17 h 10 min |
| JC | 204 h 20 min | 68 h 5 min |
| PA | 218 h 55 min | 55 h 45 min |
| LP | 17 h 35 min | 18 h 20 min |
| IR | 158 h 54 min | 4 h 21 min |
| CAA | 515 h 35 min | 335 h 25 min |

3.3. Comparison with the CAA metric

Table 7 compares the proposed metrics with the CAA metric, a CAR-based extension of the AA metric proposed in a previous study [21]. The main idea of CAR is that a node pair is more likely to connect if their common first neighbors are members of a strongly interconnected community. In general, the CAA metric obtained AUC values about 0.04 higher on average than the proposed AA metric in both the GoodReads and MovieLens datasets. On the other hand, the IR metric outperformed CAA in both datasets, with the exception of the RF models. In both datasets, the RF models trained with the CAA metric obtained higher AUC values than those trained with the proposed AA and IR metrics. However, the CAA metric is very costly to compute in terms of running time compared to all other metrics in this study (Table 8).

3.4. Computational time evaluation

The computation time of each metric for both datasets is given in Table 8. IR is the fastest of the proposed metrics to compute for MovieLens (4 h 21 min), whereas it is among the slower ones for the GoodReads dataset (158 h 54 min). It can be inferred from these results that the computation time of IR is strongly related to the size and density of the network, as it involves matrix multiplications over the whole network.

4. Conclusions

In this paper, item recommendation has been addressed as a link prediction problem in weighted and bipartite networks by proposing a machine learning-based model. Several topology-based metrics have been extended to compute network similarity on weighted and bipartite networks. These metrics have been used to train machine learning models based on different mathematical approaches. The individual and combined performances of neighbor-based, path-based and random walk-based similarity metrics have been compared.
Experimental studies on two different social networks have shown that the proposed models achieve promising results, with AUC values of 0.93 and 0.90 for the GoodReads and MovieLens datasets, respectively. The ensemble models, RF and XGB, provided the highest prediction performance for both datasets. Used individually, the random walk-based metric ItemRank outperformed all neighbor-based and path-based metrics on both datasets. Using all metrics together improved the prediction performance of the SVM model and of the ensemble models, RF and XGB. When the computation time of each similarity metric is considered, the individual ItemRank metric can be preferred as the input feature of machine learning models, specifically for LR and NB. On the other hand, the other machine learning models (RF, XGB and SVM) provided slightly better results when they used all metrics combined. If a single ML algorithm and a single similarity metric must be chosen for a recommendation model, an ensemble model (RF or XGB) with the ItemRank metric might be a good candidate.

The proposed method was evaluated on two social networks, MovieLens and GoodReads. However, it is readily applicable to social networks that have two types of nodes and weighted or unweighted connections. Therefore, the method is applicable or adaptable to most social network data. As future work, a new link prediction method might be developed by considering both node-based similarities and topological features. However, public datasets that provide detailed node attributes are limited; such data would need to be crawled manually from online social media or e-commerce web sites.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] L. Lü, M. Medo, C.H. Yeung, Y.-C. Zhang, Z.-K. Zhang, T. Zhou, Recommender systems, Phys. Rep. 519 (2012) 1–49.
[2] L. Lü, T. Zhou, Link prediction in complex networks: A survey, Physica A 390 (6) (2011) 1150–1170.
[3] S. Zeng, Link prediction based on local information considering preferential attachment, Physica A 443 (2016) 537–542.
[4] J. Liu, G. Deng, Link prediction in a user-object network based on time-weighted resource allocation, Physica A 388 (17) (2009) 3643–3650.
[5] Y. Cui, L. Zhang, Q. Wang, P. Chen, C. Xie, Heterogeneous network linkage-weight based link prediction in bipartite graph for personalized recommendation, Proc. Comput. Sci. 91 (2016) 953–958.
[6] Y. Yao, R. Zhang, F. Yang, J. Tang, Y. Yuan, R. Hu, Link prediction in complex networks based on the interactions among paths, Physica A 510 (2018) 52–67.
[7] S. Mallek, I. Boukhris, Z. Elouedi, E. Lefèvre, Evidential link prediction in social networks based on structural and social information, J. Comput. Sci. 30 (2019) 98–107.
[8] C.P. Muniz, R. Goldschmidt, R. Choren, Combining contextual, temporal and topological information for unsupervised link prediction in social networks, Knowl.-Based Syst. 156 (November 2017) (2018) 129–137.
[9] E. Bastami, A. Mahabadi, E. Taghizadeh, A gravitation-based link prediction approach in social networks, Swarm Evol. Comput. 44 (November 2017) (2019) 176–186.
[10] E. Bütün, M. Kaya, R. Alhajj, Extension of neighbor-based link prediction methods for directed, weighted and temporal social networks, Inform. Sci. 463–464 (2018) 152–165.
[11] H.R. De Sá, R.B.C. Prudêncio, Supervised link prediction in weighted networks, in: Proc. Int. Jt. Conf. Neural Networks, 2011, pp. 2281–2288.
[12] K.K. Chinta, K. Clark, CS224W project final report supervised link prediction in bipartite networks, 2014, pp. 1–10.
[13] Y. Lu, Y. Guo, A. Korhonen, Link prediction in drug-target interactions network using similarity indices, BMC Bioinformatics 18 (1) (2017) 1–9.
[14] MovieLens, 2019, [Online]. Available: https://grouplens.org/datasets/movielens/. (Accessed 21 April 2019).
[15] goodreads, 2019, [Online]. Available: https://www.goodreads.com/. (Accessed 21 April 2019).
[16] P. Wang, B.W. Xu, Y.R. Wu, X.Y. Zhou, Link prediction in social networks: the state-of-the-art, Sci. China Inf. Sci. 58 (1) (2014) 1–38.
[17] M. Gori, A. Pucci, ItemRank: A random-walk based scoring algorithm for recommender engines, in: IJCAI Int. Jt. Conf. Artif. Intell., 2007, pp. 2766–2771.
[18] J. Kelleher, B. Mac Namee, A. D'arcy, Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies, MIT Press, 2015.
[19] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in: International Joint Conference on Artificial Intelligence, Vol. 14, No. 2, 1995, pp. 1137–1145.
[20] J. Davis, M. Goadrich, The relationship between Precision-Recall and ROC curves, in: International Conference on Machine Learning, 2006, pp. 233–240.
[21] C.V. Cannistraci, G. Alanis-Lobato, T. Ravasi, From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks, Sci. Rep. 3 (2013) 1613.
