Collaborative linear manifold learning for link prediction in heterogeneous networks

Information Sciences 511 (2020) 297–308

JiaHui Liu a,b, Xu Jin a, YuXiang Hong a,b, Fan Liu a, QiXiang Chen a, YaLou Huang c, MingMing Liu a, MaoQiang Xie a,∗, FengChi Sun a,∗∗

a College of Software, Nankai University, Tianjin, China
b College of Computer Science, Nankai University, Tianjin, China
c TianJin International Joint Academy of Biomedicine, Tianjin, China

Article history: Received 27 February 2019; Revised 19 September 2019; Accepted 23 September 2019; Available online 24 September 2019

Keywords: Link prediction; Heterogeneous networks; Manifold learning; Collaborative learning

Abstract

Link prediction in heterogeneous networks aims at predicting missing interactions between pairs of nodes with the help of the topology of the target network and interconnected auxiliary networks. It has attracted considerable attention from both the computer science and bioinformatics communities in recent years. In this paper, we introduce a novel Collaborative Linear Manifold Learning (CLML) algorithm. It optimizes the consistency of node similarities by collaboratively using the manifolds embedded between the target network and the auxiliary network. Experiments on four benchmark datasets demonstrate the outstanding advantages of CLML, not only in its high prediction performance compared to baseline methods, but also in its capability to predict unknown interactions in the target networks accurately and effectively.

© 2019 Published by Elsevier Inc.

1. Introduction

Link prediction is an important problem in network analysis. Many related problems are being widely studied, including friend recommendation in social networks [24], protein-protein interaction prediction [4] and the reconstruction of airline networks [13]. In fact, many real-world networks are complicated and include heterogeneous interactions. Consequently, growing effort has been devoted to improving link prediction methods by fully using the topological structures and interconnections of heterogeneous networks [9,25].

Many advanced algorithms have been proposed to solve the link prediction problem. They can be grouped into three categories: similarity-based algorithms [1,19,34], path-based algorithms [18,25,30] and factorization-based algorithms [5,10,11,26]. In similarity-based methods, similarity scores are calculated to measure the correlations between nodes. These algorithms have low computational cost but also low prediction accuracy, since they cannot fully utilize the global structure of the known network. Path-based algorithms generally utilize the topology of the network and the attributes of nodes for link prediction; however, learning the structure from the networks brings a higher computational cost. Factorization-based algorithms can extract latent features from the known networks for link prediction. However, existing factorization-based algorithms neither fully utilize the information from the auxiliary network nor keep the valid information during feature

∗ Corresponding author. ∗∗ Co-corresponding author. E-mail addresses: [email protected] (M. Xie), [email protected] (F. Sun).

https://doi.org/10.1016/j.ins.2019.09.054


Fig. 1. A schematic of the Collaborative Linear Manifold Learning algorithm. (1) Three small-scale heterogeneous networks are constructed from real-world data. Adjacency matrices T^(0), R and A^(0) are obtained according to the interactions within the networks. (2) T^(0)R and RA^(0) are the two initial manifolds for the target and auxiliary networks. (3) T and A are updated alternately by collaborative learning of TR and RA in each iteration. (4) The predicted links in matrix T are obtained after the convergence of CLML.

integration. In brief, each of the above algorithms has its own advantages and disadvantages, but none of them considers the consistency between the target and the auxiliary network, which limits prediction performance.

Recently, manifold learning for link prediction has become popular in the machine learning and pattern recognition fields [22,27,29,33]. The basic idea of manifold learning is to project the data from the original high-dimensional space into a low-dimensional space, so that more latent information can be learned to reproduce the essential structural features of the original data. It outperforms algorithms that only consider the original feature space: after the dimensional transformation, redundant features of the original space are removed, and the mutual relationships between nodes are reconstructed to better reflect their similarities. Manifold learning has been widely applied in knowledge representation models due to its effectiveness in data distance measurement. For instance, Wan et al. proposed a two-dimensional maximum embedding difference (2DMED) method that combines graph embedding and difference criterion techniques for image feature extraction [28]. Zhang et al. introduced a drug feature-based manifold regularization method that projects drugs from the interaction space into a low-dimensional space to predict drug-drug interactions [32]. Ma et al. proposed a manifold concept factorization model with adaptive neighbor structure to learn a new representation for clustering [20].

In this paper, we propose a collaborative linear manifold learning algorithm, named CLML, to predict links in heterogeneous networks. The overall framework is illustrated in Fig. 1. In CLML, manifolds characterized by neighbors are used to measure the correlations of the data. Specifically, one manifold is constructed from the target network and the inter-network, while the other is constructed from the auxiliary network and the inter-network. The loss function of CLML is defined by jointly considering the consistency between these two manifolds. Since networks in real-world applications are generally sparse, low-rank approximations of the target and auxiliary networks are employed to handle the sparsity of the networks. Since the loss function cannot be optimized by traditional gradient descent algorithms, an alternating updating strategy that combines the proximal gradient method with an estimation of the step size is adopted. Experiments on four real-world datasets show that the proposed CLML method improves link prediction performance compared with current state-of-the-art algorithms. Its stable performance on the benchmark datasets also suggests that CLML can be easily generalized to other applications.

The main contributions and novelties of our work are as follows:

1. We propose a novel collaborative linear manifold learning model, CLML, for link prediction. CLML can better capture the correlations among data points and utilize the consistency within networks. Moreover, the two manifolds in CLML are capable of detecting hidden network topological features from both the target and auxiliary networks.
2. CLML employs the low-rank constraint and also incorporates prior knowledge to overcome the sparsity problem of heterogeneous networks. Therefore, CLML maintains stable performance on networks containing many missing links or unobserved interactions.


3. CLML significantly improves the accuracy of predicting unknown links in different real-world applications, including a social rating network, a scientific co-authorship network, and a drug-drug interaction network, compared to current state-of-the-art algorithms.

The remainder of this paper is organized as follows. In the next section, current link prediction methods are grouped into three categories, and representative methods in each category are reviewed. In Section 3, we first describe the notation of the model; we then present the proposed Collaborative Linear Manifold Learning model in detail, together with an alternating optimization algorithm for solving it. The baseline methods and experimental results are presented and analyzed in Section 4. Finally, the paper is concluded in Section 5.

2. Related work

As mentioned above, current methods for link prediction can be broadly divided into three categories: similarity-based methods, path-based methods and factorization-based methods. Representative methods in each category are briefly reviewed below.

2.1. Similarity-based methods

Common Neighbors (CN) and Adamic-Adar (AA) are the two most representative similarity-based methods. CN defines similarity for social network analysis by the number of shared neighbors between two nodes [19], while AA penalizes the shared neighbors by their degrees [1]. Various algorithms have since been developed on this basis. Zhu et al. developed a mutual information approach, NSI, that measures the likelihood of a node pair given their common neighbors; multiple structural features are also considered from the perspective of information theory [34]. Huang et al. proposed a network-connectivity score to measure the strength of network connection between a drug and its targets for drug-drug interaction prediction [16]. Specifically, it maps nodes (drugs) in the target network into the auxiliary network (the protein-protein interaction network) through the inter-network (drug-target associations), and then calculates the similarities of node pairs in the target network by scoring the systematic connectivity of the target-related node sets in the auxiliary network.

2.2. Path-based methods

Path-based link prediction methods pay more attention to the topological information of the network. The Katz status index, the most popular path-based method, counts all possible paths between a pair of nodes. It weights the paths by their lengths, since shorter paths contribute more to the final similarity [18]. Zhang et al. proposed an integrative label propagation approach for predicting missing links by considering high-order similarity; it benefits from the combination of the network topology and the integration of multiple similarity matrices derived from different data sources [31]. Xu et al. quantitatively studied the influence of paths on link prediction by utilizing information theory. Their study proposed a new similarity index, the Path Entropy (PE) index, which calculates the information entropies of the shortest paths between node pairs, with a penalty on long paths [30].

2.3. Factorization-based methods

Factorization-based methods can extract latent features and analyze local structural information in the network. Menon et al. proposed a supervised matrix factorization approach to the link prediction problem [21]. Its loss function jointly considers the latent features factorized from the target network and the inter-network (i.e., the attributes of nodes in the target network), and the model is optimized by stochastic gradient descent. Fang et al. proposed a matrix factorization model called MCRI for link prediction in recommender systems; the model incorporates rich user and item information into recommendation with implicit feedback [10]. Specifically, the user information matrix is decomposed into a subspace shared with the implicit feedback matrix, as is the item information matrix. Cao et al. applied subgraph embedding to convex matrix completion for link prediction [5]. This method induces a representation of each node by learning embeddings of the subgraphs around given nodes in the graph; therefore, the global structural information can be leveraged through fine-grained network features such as the neighborhood information of nodes. Gao et al. proposed a network link prediction algorithm called MCLP based on matrix completion [11]. In this method, the target network is treated as a corrupted dataset, and link prediction is cast as the recovery of unobserved links according to the topological information of the network.

3. Collaborative linear manifold learning

In this section, we first introduce the notation used in the paper and a base Linear Manifold Learning model. We then introduce the proposed Collaborative Linear Manifold Learning (CLML) model for link prediction in detail. The CLML model is illustrated in Fig. 1, using drug-target heterogeneous networks as an example.

Table 1. Notations.

Notation              Description
m                     Size (number of nodes) of the target network
n                     Size (number of nodes) of the auxiliary network
R ∈ R^(m×n)           Inter-network association matrix
T ∈ R^(m×m)           Target network matrix
T^(0) ∈ R^(m×m)       Initialized target network matrix with known interactions
A ∈ R^(n×n)           Auxiliary network matrix
A^(0) ∈ R^(n×n)       Initialized auxiliary network matrix with known interactions
W1 ∈ R^(m×m)          Prior knowledge weighted matrix for the target network
W2 ∈ R^(n×n)          Prior knowledge weighted matrix for the auxiliary network

Fig. 2. Manifold learning: local neighborhood extends to global space.

3.1. Notations

The data used in this study can be represented as an integrated heterogeneous network consisting of a target network T (m × m), an auxiliary network A (n × n) and an inter-network R (m × n) interconnecting the related networks T and A, where m and n are the sizes of the target network and the auxiliary network, respectively. The notation is detailed in Table 1, and the heterogeneous network is shown in the left panel of Fig. 1. In the proposed model, the prior knowledge weighted matrix is denoted by W ∈ R^(m×m). $X_{ij}$ denotes the (i, j)th entry of a matrix X, and $X_{i\cdot}$ and $X_{\cdot j}$ denote the ith row and the jth column, respectively. $\|X\|_F$ denotes the Frobenius norm of X, and $\|X\|_* = \sum_i \sigma_i$ denotes the nuclear norm, where $\sigma_i$ is the ith largest singular value of X. The symbol $\odot$ stands for the Hadamard product.

3.2. Linear manifold learning

In this subsection, the loss function of Linear Manifold Learning (LML) is defined. First, the Locally Linear Embedding (LLE) algorithm and Sparse Subspace Clustering (SSC) are integrated to measure the correlations between data points. Second, the topological information of the target network and the attributes of its nodes are used to learn the embedded manifold. Finally, the low-rank property of the target network is exploited, together with prior knowledge, to address the sparsity issue.

In the LLE algorithm, each data point and its neighbors lie on or close to a locally linear patch of the manifold [23]. The local geometry in the neighborhood of a data point can be characterized by linear coefficients and reconstructed from its neighbors. As shown in the left panel of Fig. 2, suppose a data point $X_i$ is sampled from the data set X and $\phi(i)$ denotes its k nearest neighbors. Each point $X_i$ can be approximately represented as a linear combination of the points in $\phi(i)$. The coefficient $\omega_{ij}$ is the linear coefficient of point $X_j \in \phi(i)$ used to reconstruct $X_i$, as shown in Eq. (1),

$$\hat{X}_i = \sum_{X_j \in \phi(i)} \omega_{ij} X_j. \tag{1}$$
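As an illustration of Eq. (1), the following sketch solves the standard LLE constrained least-squares step for the reconstruction weights of a single point. The function name, the regularization term and the use of dense numpy arrays are our assumptions, not part of the original description.

```python
import numpy as np

def lle_weights(X, i, k):
    # Sketch (assumption): reconstruction weights of Eq. (1) for point X[i],
    # obtained from the standard LLE constrained least-squares step.
    d = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(d)[1:k + 1]          # k nearest neighbors, self excluded
    Z = X[nbrs] - X[i]                     # center the neighbors at X[i]
    G = Z @ Z.T                            # local Gram matrix
    G += np.eye(k) * 1e-3 * np.trace(G)    # small regularizer for stability
    w = np.linalg.solve(G, np.ones(k))
    return nbrs, w / w.sum()               # weights sum to one
```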

With the help of the learned embedded manifold, link prediction algorithms can estimate the correlations among data points much better than with explicit geometric distance measures. Furthermore, integrated with SSC, the manifold can be more effective in practice by considering all the points in the space, especially when the data points are sparse [8]. Therefore, the local neighborhood needs to be extended to the global space to take full advantage of manifold learning, as illustrated in Fig. 2. Unlike the link prediction problem, SSC concentrates on finding the core subspace or a sparse representation of the source data. In contrast, our goal is to detect the missing links in a sparse network, which requires some auxiliary


information to learn the manifold. As a result, the known target network interactions are embedded as prior knowledge to reconstruct R, and a fixed weight matrix W is integrated to control the influence of the prior knowledge. The prior constraint on T is as follows:

$$f(T) = \sum_{i=1}^{m} \sum_{j=1}^{m} W_{ij}\left(T_{ij} - T^{(0)}_{ij}\right)^2, \tag{2}$$

where T^(0) is the adjacency matrix of the known target network, with 1 for entries with known interactions and 0 otherwise. To avoid an ill-posed problem, the weight matrix W is defined as in Eq. (3). In our experiments, μ = 0.2 and ρ = 2.

$$W_{ij} = \begin{cases} \mu & \text{if } T^{(0)}_{ij} = 0 \\ 1-\mu & \text{if } T^{(0)}_{ij} = 1 \\ \rho & \text{if } i = j \end{cases} \tag{3}$$
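A minimal sketch of Eq. (3), assuming T^(0) is a dense 0/1 numpy array; the function name is illustrative, with μ = 0.2 and ρ = 2 taken from the text.

```python
import numpy as np

def prior_weight_matrix(T0, mu=0.2, rho=2.0):
    # Eq. (3): mu on unknown entries, 1 - mu on known interactions,
    # rho on the diagonal (mu = 0.2 and rho = 2 as in the experiments).
    W = np.where(T0 == 1, 1.0 - mu, mu)
    np.fill_diagonal(W, rho)
    return W
```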

With the prior knowledge as a constraint (the prior constraint) in the loss function, the manifold constraint becomes more flexible, yielding a robust solution. Due to the sparsity of the target network, a low-rank constraint on T is also imposed through the nuclear norm [6], denoted $\|T\|_*$. Consequently, the loss function of LML is formulated as follows:

$$\begin{aligned} L(T) &= \sum_{i=1}^{m}\Big\|R_{i\cdot} - \sum_{j=1}^{m} T_{ij}R_{j\cdot}\Big\|_F^2 + \alpha\sum_{i=1}^{m}\sum_{j=1}^{m} W_{ij}\left(T_{ij}-T^{(0)}_{ij}\right)^2 + \gamma\|T\|_* \\ &= \|R - TR\|_F^2 + \alpha\big\|W \odot \big(T - T^{(0)}\big)\big\|_F^2 + \gamma\|T\|_*, \end{aligned} \tag{4}$$

where α and γ are hyper-parameters that balance the weights of the prior constraint and the low-rank constraint, respectively. Eq. (4) can be optimized to an optimal solution $T^*$ by a proximal gradient method [17].

3.3. Collaborative linear manifold learning

Although learning the linear manifold of the data space is less data-dependent than previous link prediction methods, additional feature information is still needed to improve the link prediction accuracy. Therefore, the inter-network is integrated with the target network to improve the prediction. Analogously to the solution described in Section 3.2, the missing links in the auxiliary network can also be inferred through linear manifold learning; the corresponding loss function on the auxiliary network A is given in Eq. (5). The appearance of the same matrix R in Eqs. (4) and (5) motivates the modeling of the CLML algorithm.

$$\min_A \; \|R - RA\|_F^2 + \beta\big\|W_2 \odot \big(A - A^{(0)}\big)\big\|_F^2 + \gamma\|A\|_*. \tag{5}$$

Two optimal linear combination matrices, denoted $R^{t*}$ and $R^{a*}$, can be learned from Eqs. (4) and (5), respectively. They are completion results of the same original matrix R, and the only difference between them is the prior knowledge used. Since completion results based on the same original matrix R should be close to the true matrix $R^{\mathrm{real}}$ regardless of the source of prior knowledge, we obtain $R^{t*} \approx R^{a*} \approx R^{\mathrm{real}}$. This can also be explained by the locally invariant idea [14], which states that results obtained from different directions of linear manifold learning should be similar. Based on this observation, replacing the manifold constraints in Eqs. (4) and (5) with a collaborative constraint on T and A, the loss function of CLML is defined as

$$L(T, A) = \|RA - TR\|_F^2 + \alpha\big\|W_1 \odot \big(T - T^{(0)}\big)\big\|_F^2 + \beta\big\|W_2 \odot \big(A - A^{(0)}\big)\big\|_F^2 + \gamma\left(\|T\|_* + \|A\|_*\right) \tag{6}$$

where W1 and W2 are the prior knowledge weighted matrices for the target network and the auxiliary network, respectively, and α and β are two hyper-parameters. CLML is illustrated in Fig. 1. The input heterogeneous networks, shown in the left panel of Fig. 1, include the initial target network T^(0), the inter-network R and the initial auxiliary network A^(0). The target network and the auxiliary network are integrated with the inter-network to construct the initial manifolds T^(0)R and RA^(0). Then, motivated by the consistency of the two manifolds TR and RA for the same R, a collaborative learning strategy reconstructs the two manifolds iteratively as high-potential links are updated in T and A. As a result, the learned target network is expected to be close to the real one when the collaborative learning converges.

3.4. Alternating optimization method for CLML

The loss function in Eq. (6) can be divided into two parts by matrix norm: (1) the Frobenius-norm terms of the collaborative manifold constraint and the prior knowledge constraint, and (2) the nuclear norm of the low-rank constraint, as sketched below. Since the nuclear-norm term is not differentiable, a proximal gradient method is needed to solve it.
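The split can be made concrete with a short sketch that evaluates Eq. (6); the function name and the dense-matrix representation are our assumptions.

```python
import numpy as np

def clml_loss(T, A, R, T0, A0, W1, W2, alpha, beta, gamma):
    # Part (1): differentiable Frobenius-norm terms of Eq. (6).
    smooth = (np.linalg.norm(R @ A - T @ R, "fro") ** 2
              + alpha * np.linalg.norm(W1 * (T - T0), "fro") ** 2
              + beta * np.linalg.norm(W2 * (A - A0), "fro") ** 2)
    # Part (2): non-smooth nuclear-norm (low-rank) terms.
    nuclear = gamma * (np.linalg.norm(T, "nuc") + np.linalg.norm(A, "nuc"))
    return smooth + nuclear
```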


Based on the proximal gradient method, an alternating iterative scheme is proposed that minimizes the loss function L(T, A) with respect to one variable while fixing the other in each iteration. Additionally, an estimation of the step size is employed to accelerate convergence.

3.4.1. Updating T while A is fixed

Because of the non-differentiability of the nuclear norm, the loss function with respect to T is written as

$$L(T) = \gamma\|T\|_* + f(T) \tag{7}$$

where $f(T) = \|RA - TR\|_F^2 + \alpha\big\|W_1 \odot \big(T - T^{(0)}\big)\big\|_F^2$. It is known [17] that the gradient step for minimizing f(T) can be reformulated equivalently as a proximal regularization of the linearized function f(T) at $T^{(k-1)}$:

$$T^{(k)} = \arg\min_T Q_{\eta_t^{(k)}}\!\left(T, T^{(k-1)}\right) = \arg\min_T \; f\!\left(T^{(k-1)}\right) + \left\langle T - T^{(k-1)}, \nabla f\!\left(T^{(k-1)}\right)\right\rangle + \frac{1}{2\eta_t^{(k)}}\left\|T - T^{(k-1)}\right\|_F^2 + \gamma\|T\|_* \tag{8}$$

where k is the iteration index, $\eta_t^{(k)}$ is the step size of the kth iteration for T, $\nabla f(T)$ is the gradient of f(T), and $\langle X, Y\rangle = \mathrm{tr}(X^\top Y)$ is the matrix inner product. Ignoring the term $f(T^{(k-1)})$, which does not depend on T, we denote $P_{\eta_t^{(k)}}(T^{(k-1)}) = \arg\min_T Q_{\eta_t^{(k)}}(T, T^{(k-1)})$.

Eq. (8) can then be reformulated as

$$T^{(k)} = P_{\eta_t^{(k)}}\!\left(T^{(k-1)}\right) = \arg\min_T \; \frac{1}{2\eta_t^{(k)}}\left\|T - T^{(k-1)} + \eta_t^{(k)}\nabla f\!\left(T^{(k-1)}\right)\right\|_F^2 + \gamma\|T\|_*. \tag{9}$$

This can be solved by first computing the singular value decomposition (SVD) of $T^{(k-1)} - \eta_t^{(k)}\nabla f(T^{(k-1)})$ and then applying the singular value thresholding (SVT) algorithm [3] to the singular values. Hence, Eq. (9) can be expressed as

$$T^{(k)} = U\Sigma_{\gamma\eta_t^{(k)}}V^\top \tag{10}$$

where $U\Sigma V^\top$ is the singular value decomposition of $T^{(k-1)} - \eta_t^{(k)}\nabla f(T^{(k-1)})$, with U the left singular vectors and V the right singular vectors. The thresholded matrix $\Sigma_{\gamma\eta_t^{(k)}}$ is diagonal with $(\Sigma_{\gamma\eta_t^{(k)}})_{ii} = \max\{0, \Sigma_{ii} - \gamma\eta_t^{(k)}\}$.
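A sketch of the SVT step in Eq. (10); svt is a hypothetical helper name, and the threshold tau corresponds to $\gamma\eta_t^{(k)}$.

```python
import numpy as np

def svt(Z, tau):
    # Soft-threshold the singular values of Z at level tau (Eq. (10)).
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

For the update of T, Z would be $T^{(k-1)} - \eta_t^{(k)}\nabla f(T^{(k-1)})$ and tau would be $\gamma\eta_t^{(k)}$.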

3.4.2. Choosing an appropriate step size

The gradient method can be accelerated if the step size $\eta_t^{(k)}$ is chosen properly [17]. In particular, $\eta_t^{(k)}$ in the kth iteration is initially set equal to that of the (k−1)th iteration and is then multiplied by a factor θ > 1 while the inequality (11) holds:

$$L\!\left(P_{\eta_t^{(k)}}\!\big(T^{(k-1)}\big)\right) > Q_{\eta_t^{(k)}}\!\left(P_{\eta_t^{(k)}}\!\big(T^{(k-1)}\big),\, T^{(k-1)}\right). \tag{11}$$

The detailed steps are presented in Algorithm 1.

Algorithm 1 Estimation of step size $\eta_t^{(k)}$.
Initialization: $\eta_t^{(k)} = \eta_t^{(k-1)}$, θ > 1
1: while $L(P_{\eta_t^{(k)}}(T^{(k-1)})) > Q_{\eta_t^{(k)}}(P_{\eta_t^{(k)}}(T^{(k-1)}), T^{(k-1)})$ do
2:   $\eta_t^{(k)} \leftarrow \theta\,\eta_t^{(k)}$
3: end while
4: return $\eta_t^{(k)}$
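A minimal sketch of Algorithm 1, assuming loss, quad_model and prox_step are callables implementing L, Q and P from Eqs. (7)-(9); all names are illustrative, and the update direction follows the paper's stated θ > 1 rule.

```python
def estimate_step_size(loss, quad_model, prox_step, T_prev, eta, theta=2.0):
    # Scale eta by theta > 1 while inequality (11) holds, i.e. while the
    # loss at the proximal point still exceeds its quadratic model Q.
    while True:
        T_new = prox_step(T_prev, eta)
        if loss(T_new) <= quad_model(T_new, T_prev, eta):
            return eta
        eta *= theta
```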

3.4.3. Updating A while T is fixed

The update of A is analogous to that of T and is formulated as

$$A^{(k)} = \arg\min_A \; \frac{1}{2\eta_a^{(k)}}\left\|A - A^{(k-1)} + \eta_a^{(k)}\nabla f\!\left(A^{(k-1)}\right)\right\|_F^2 + \gamma\|A\|_*. \tag{12}$$

Finally, the algorithm for minimizing the loss function of CLML (Eq. (6)) is outlined in Algorithm 2. The final affinity matrix used to predict the missing links in the target network T is $P = (|T^*| + |T^{*\top}|)/2$, where a larger value of $P_{ij}$ indicates a higher possibility of a link between nodes i and j.


Algorithm 2 Optimization algorithm for CLML.
Initialization: $\eta_t^{(0)} > 0$, $\eta_a^{(0)} > 0$, $T^{(0)} \in \mathbb{R}^{m\times m}$, $A^{(0)} \in \mathbb{R}^{n\times n}$, k = 1
1: while not converged do
2:   Estimate $\eta_t^{(k)}$ and $\eta_a^{(k)}$ by Algorithm 1
3:   $T^{(k)} \leftarrow P_{\eta_t^{(k)}}(T^{(k-1)})$
4:   $A^{(k)} \leftarrow P_{\eta_a^{(k)}}(A^{(k-1)})$
5:   k ← k + 1
6: end while
7: return $T^{(k)}$, $A^{(k)}$
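Putting the pieces together, the following sketch implements the alternating updates of Algorithm 2 using the svt helper from the sketch after Eq. (10). For brevity it uses a fixed step size eta in place of the Algorithm 1 line search; all names and default values are our assumptions.

```python
import numpy as np

def clml_optimize(R, T0, A0, W1, W2, alpha, beta, gamma,
                  eta=0.01, max_iter=200, tol=1e-4):
    T, A = T0.astype(float).copy(), A0.astype(float).copy()
    for _ in range(max_iter):
        # Gradient of the smooth part of Eq. (6) w.r.t. T (A fixed).
        grad_T = -2 * (R @ A - T @ R) @ R.T + 2 * alpha * W1 * W1 * (T - T0)
        T_new = svt(T - eta * grad_T, gamma * eta)        # Eqs. (9)-(10)
        # Gradient of the smooth part w.r.t. A (T fixed).
        grad_A = 2 * R.T @ (R @ A - T_new @ R) + 2 * beta * W2 * W2 * (A - A0)
        A_new = svt(A - eta * grad_A, gamma * eta)        # Eq. (12)
        done = (np.linalg.norm(T_new - T, "fro") < tol
                and np.linalg.norm(A_new - A, "fro") < tol)
        T, A = T_new, A_new
        if done:
            break
    return T, A
```

The affinity matrix described before Algorithm 2 would then be computed as P = (np.abs(T) + np.abs(T.T)) / 2.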

Table 2. Statistics summary of datasets (SN: similarity network).

                                 Douban         Yelp            NIPS                DDI
Target network T                 User network   User network    Co-author network   DDI network
Auxiliary network A              Movie SN       Business SN     Doc SN              PPI network for target proteins
Relations between T and A        User-Movie     User-Business   Author-Doc          Drug-Target
# of nodes in T^(0)              13367          16239           2865                758
# of nodes in A                  12677          14284           2484                473
# of links in T                  4085           158590          63178               11852
# of links in inter-network R    168278         198397          5879                2296

Fig. 3. Statistics of node interactions on each dataset.

4. Experiments

4.1. Datasets

To evaluate the performance of the proposed CLML method, four real-world heterogeneous networks from different domains were collected: two social rating networks, a scientific co-authorship network and a drug-drug interaction network.

Douban: a well-known movie rating social network in China (http://movie.douban.com/). On this website, users score movies on a scale from 1 to 5. The dataset also contains the social relations between users and the attribute information of users and movies.

Yelp: another rating network (http://www.yelp.com/dataset_challenge/). The dataset contains users' ratings of local businesses and the attribute information of users and businesses.

NIPS: a widely studied co-authorship network consisting of co-authorship information among authors and their publications. In this paper, the processed version in [12] is used.

DDI: a well-known biological network of known drug-drug interactions. The dataset used in the experiments was released by [16] and also includes drug-target interactions and a PPI network.

The detailed statistics, including the numbers of nodes and links of the four datasets, are listed in Table 2. As shown in Fig. 3, Douban is a very sparse network, with 82% of its nodes having no links. In contrast, Yelp, DDI and NIPS are denser networks, with 17%, 42% and 48% of their nodes having more than 10 links, respectively.

4.2. Baselines and evaluation measures

The state-of-the-art link prediction methods CN [19], AA [1], NSI [34], LP [31], MCLP [11], MCRI [10] and MRMF [32] are used as baselines to evaluate the performance of CLML. In addition, LML, a variant of CLML, is also included for comparison. The area under the receiver operating characteristic curve (AUC) [15] and the area under the precision-recall curve (AUPR) [7] are used to evaluate the proposed method and the baselines; higher values indicate better prediction performance. The AUC20 measure is also used to evaluate performance on the top-ranked interactions: the AUC is calculated with the number of false positives up to 20. Since the top-ranked interactions are more important than the lower-ranked ones, AUC20 provides a more informative evaluation and comparison than AUC (a minimal sketch follows).
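One plausible reading of AUC20 is the partial ROC area accumulated until 20 false positives have been seen, normalized to [0, 1]; the exact normalization used in the paper is not specified, so this sketch is an assumption.

```python
import numpy as np

def auc_n(y_true, y_score, n_fp=20):
    # Walk the ranking from the highest score; accumulate the ROC "height"
    # (true positives so far) at each false positive, up to n_fp of them.
    order = np.argsort(-np.asarray(y_score))
    labels = np.asarray(y_true)[order]
    tp = fp = area = 0
    for y in labels:
        if y == 1:
            tp += 1
        else:
            fp += 1
            area += tp
            if fp == n_fp:
                break
    n_pos = int(np.sum(y_true))
    return area / (n_fp * n_pos) if n_pos else 0.0
```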

Table 3. Experimental results of 5-fold cross-validation on the four benchmark datasets. The AUC20, AUC and AUPR are reported for each dataset. The best results across all the methods are in bold in the original; they are marked here with an asterisk.

AUC20
Method   Douban    Yelp      NIPS      DDI
CN       0.0213    0.0568    0.0193    0.0161
AA       0.0290    0.0282    0.0173    0.0251
NSI      0.1725    0.2088    0.1751    0.2879
LP       0.0569    0.2447    0.1066    0.3643
MCLP     0.1528    0.1321    0.0511    0.0837
MCRI     0.1949    0.2895    0.1722    0.1760
MRMF     0.1574    0.3545*   0.1481    0.3679
LML      0.1349    0.0843    0.1990    0.1584
CLML     0.3014*   0.2070    0.2281*   0.5436*

AUC
Method   Douban    Yelp      NIPS      DDI
CN       0.5186    0.5126    0.5038    0.5121
AA       0.5248    0.5489    0.5057    0.5388
NSI      0.6474    0.6731    0.6097    0.6437
LP       0.5520    0.6969    0.4621    0.7136
MCLP     0.6904    0.7269    0.6357    0.7322
MCRI     0.6710    0.6884    0.5895    0.6503
MRMF     0.6658    0.7475    0.6243    0.8186
LML      0.6551    0.6358    0.6210    0.8040
CLML     0.7203*   0.7808*   0.6438*   0.8297*

AUPR
Method   Douban    Yelp      NIPS      DDI
CN       0.0115    0.0229    0.0171    0.0292
AA       0.0085    0.0141    0.0170    0.0322
NSI      0.0870    0.1273    0.1160    0.1506
LP       0.0285    0.1341    0.0927    0.3490
MCLP     0.0519    0.0750    0.0398    0.0795
MCRI     0.1043    0.2266    0.1137    0.0814
MRMF     0.0744    0.2548    0.1269    0.3679
LML      0.1622    0.1370    0.1717    0.3287
CLML     0.2448*   0.2642*   0.1751*   0.3917*

4.3. Comparison with baseline methods

5-fold cross-validation (CV) was designed to compare the performance of CLML with the baseline methods. Specifically, each dataset is randomly partitioned into five folds; four folds are used as the training set, while the remaining fold is the test set. The 5-fold CVs are repeated ten times for CLML and the baseline methods, and the optimal parameters of CLML and all the baseline methods are set according to the average cross-validation performance. The results of the 5-fold cross-validation experiments with optimal parameters on the four benchmark datasets are reported in Table 3. They show that CLML and LML predict top-ranked interactions well: their performance is significantly better than that of the other baselines on AUPR, which heavily penalizes top-ranked false positives. Although MRMF has a better AUC20 score on the Yelp dataset, its performance on AUC and AUPR is worse than that of the proposed methods. Thus, CLML has better prediction accuracy than the other methods. Furthermore, precision is significantly improved by the collaborative learning strategy between the two linear manifolds, with the AUPR score improved by nearly 50%. The significant improvement of link prediction on the four benchmark datasets demonstrates that the collaborative learning strategy is robust in learning hidden structure information from both the target and the auxiliary networks. Comparing the baseline methods with one another, the similarity-based, path-based and conventional factorization-based methods did not perform very well, since they cannot capture auxiliary information effectively or fully utilize the correlations of data points. In contrast, CLML, which combines manifold learning and the low-rank constraint, achieves improvements over the baselines by employing the known target and auxiliary information simultaneously. In summary, CLML is not only a measure that relies on the correlations of node pairs, but also an efficient model that improves link prediction through a proper collaborative learning strategy.

4.4. Analysis of collaborative manifold learning

A further experiment is needed to analyze the impact of collaborative manifold learning, since CLML asserts that the results obtained from different directions of linear manifold learning should be similar and helpful for the prediction.

4.4.1. Improvements by collaborative manifold learning

In CLML, the initial manifolds T^(0)R and RA^(0) are constructed from the original target and auxiliary networks. Through the alternating iterative updating of T and A, enriched manifolds are obtained in the end. As a result, the consistency, measured by the similarity between TR and RA, improved on all four datasets: Douban from 34% to 75%, Yelp from 46% to 81%, NIPS from 52% to 85% and DDI from 50% to 95%. In other words, the performance of CLML improves along with this procedure, which makes T and A more consistent until it finally reaches the optimum.

4.4.2. Impact of network density on CLML

The clustering coefficient reflects the clustering of links into tightly connected neighborhoods [2]. Its value varies from zero to one; the higher the value, the more likely the nodes are to be linked together.
Evidence shows that a sparse and clustered network tends to reveal more hidden links, so sampled networks with different clustering coefficients can be used to analyze the performance of link prediction algorithms from the perspective of network density (a minimal sketch of the measure follows). First, six sub-networks with different clustering coefficients were cut from Yelp's user network. The heterogeneous network used in this analysis consists of the sub-network, the user rating relations and the similarity network of Yelp's business items. Second, CLML and the baseline methods were executed on these six heterogeneous networks. Finally, the AUC scores were calculated from the results.
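For reference, a small sketch of the global clustering coefficient used to characterize the sub-networks, following the standard triangles-over-connected-triples definition (the paper only cites [2] for the measure, so this concrete form is an assumption).

```python
import numpy as np

def clustering_coefficient(adj):
    A = (np.asarray(adj) > 0).astype(float)
    np.fill_diagonal(A, 0.0)
    triangles = np.trace(A @ A @ A) / 6.0      # each triangle is counted 6x
    deg = A.sum(axis=1)
    triples = np.sum(deg * (deg - 1)) / 2.0    # connected triples (length-2 paths)
    return 3.0 * triangles / triples if triples else 0.0
```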


Fig. 4. Performance of CLML and baseline methods on sub-networks of Yelp’s users.

Fig. 5. The influence of parameters α and γ on AUC.

As shown in Fig. 4, the performance of CLML and the baselines is, on the whole, almost proportional to the clustering coefficient. Furthermore, CLML outperforms the other baselines at every clustering coefficient, and especially at high clustering coefficients. This indicates that collaborative manifold learning extracts better structural information for link prediction on the integrated heterogeneous network.

4.5. Impacts of prior constraint and low-rank constraint

To better understand CLML, the overall impacts of the prior constraint and the low-rank constraint are further investigated by tuning the parameters α and γ. The parameters α and β penalize the training errors on the target network and the auxiliary network, respectively. In this analysis, β is set equal to α, since they play similar roles and can jointly be regarded as the prior constraint. The larger α is, the higher the weight of the prior knowledge, which pushes the manifold features learned from the inter-network closer to the distribution of the target network. Fig. 5(a) shows the relationship between AUC and α on the four benchmark datasets. The prediction accuracy of CLML increases as α grows until the loss function reaches its optimal value, which is consistent with expectation. The parameter γ controls the rank of the adjacency matrices of the target network and the auxiliary network: a larger γ enforces lower-rank adjacency matrices, while a smaller γ gives more weight to the manifold and prior constraints. As shown in Fig. 5(b), the prediction accuracy of our model increases gradually as γ increases, reaching its optimum at γ = 1 on the Yelp dataset and dropping afterwards. Note that both the prior knowledge and the low-rank constraint play important roles in CLML. Fig. 6 illustrates the joint influence of α and γ on the prediction accuracy of CLML on the Yelp dataset. The optimal performance is obtained at α = 100 and γ = 1, which is consistent with the results in Fig. 5.

4.6. Robustness analysis with disturbed target networks

To test whether the methods are robust to bias, we repeated the experiments on six differently disturbed target networks: (1) randomly remove 10% of the links and add 10% spurious links; (2) randomly remove 20% of the links; (3) randomly remove 20% and add 20%; (4) randomly remove 40%; (5) randomly remove 60%; (6) randomly remove 80% (a minimal sketch of this protocol follows).
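A minimal sketch of the disturbance protocol, assuming an undirected 0/1 adjacency matrix; the function name and the random seeding are illustrative.

```python
import numpy as np

def disturb_network(T, remove_frac, add_frac=0.0, seed=0):
    # Randomly delete remove_frac of the existing links and add
    # add_frac (relative to the link count) spurious links.
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(T, k=1)
    upper = T[iu].copy()
    links = np.flatnonzero(upper == 1)
    non_links = np.flatnonzero(upper == 0)
    drop = rng.choice(links, int(remove_frac * links.size), replace=False)
    add = rng.choice(non_links, int(add_frac * links.size), replace=False)
    upper[drop] = 0
    upper[add] = 1
    out = np.zeros_like(T)
    out[iu] = upper
    return out + out.T            # keep the network symmetric
```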


Fig. 6. The joint impact of the prior constraint and the low-rank constraint of CLML, obtained by adjusting the parameters α and γ. AUC is adopted as the performance measure and Yelp as the dataset.

Fig. 7. The performance of CLML and baseline methods on 6 disturbed target networks.

The experimental results are shown in Fig. 7. The performance of all methods decreases as more real links in the target network are removed or shuffled. However, CLML is more robust to missing links and false positive links in the network: even when 80% of the links are removed, the AUC of CLML stays around 0.7, which is much higher than that of all the other methods. Therefore, CLML is more robust than the baselines.

4.7. Top-ranked drug-drug interactions by CLML

With the best parameters from cross-validation, CLML was used to predict drug-drug interactions on the DDI dataset of [16]. The results, reported in Table 4, were verified with the Interactions Checker tools (http://drugs.com and https://www.rxlist.com/drug-interaction-checker.htm) or the literature. Among the top 10 interactions predicted by CLML, the drug pairs in bold font in Table 4 are supported by the above resources or have substantive supporting evidence from existing interactions. As shown in Table 4, 8 out of the 10 pairs are confirmed as true interactions, which further indicates the effectiveness of CLML in predicting missing interactions. The corresponding sub-network is visualized in Fig. 8.


Table 4. Top 10 ranked interactions predicted by CLML.

Rank   DrugBank IDs         Drug names
1      DB00777, DB00674     Propiomazine, Galantamine
2      DB00674, DB01238     Galantamine, Aripiprazole
3      DB00777, DB00843     Propiomazine, Donepezil
4      DB00715, DB00382     Paroxetine, Tacrine
5      DB00777, DB00382     Propiomazine, Tacrine
6      DB01238, DB00843     Aripiprazole, Donepezil
7      DB00715, DB00843     Paroxetine, Donepezil
8      DB01238, DB00726     Aripiprazole, Trimipramine
9      DB00382, DB01239     Tacrine, Chlorprothixene
10     DB00382, DB00420     Tacrine, Promazine
Fig. 8. Visualization of predicted drug-drug interactions. Drugs involved in predicted DDIs are represented by diamonds. Black lines indicate predicted interactions that have been verified; dotted lines indicate predicted interactions that have not yet been verified; grey lines indicate known interactions. The darkness of a node reflects its degree.

5. Conclusion

In this paper, we propose a novel collaborative linear manifold learning algorithm, CLML, to improve the performance of link prediction with auxiliary networks. Inspired by the locally linear embedding algorithm, low-dimensional manifolds embedded in the target network and the auxiliary network are extracted to measure the correlations of data points. A loss function penalizing the inconsistency between the geometries of the target manifold and the auxiliary manifold is then formulated, integrated with a low-rank constraint. Finally, collaborative updating strategies are employed to optimize the loss function. CLML measures the correlations of data points effectively and fully utilizes the consistency between networks. The experimental results show that CLML significantly outperforms conventional methods, and the additional experiments and analysis further demonstrate its effectiveness.

Declaration of Competing Interest

None.


Acknowledgements

This work is supported by the National Key R&D Program of China (No. 2018YFB0504400), the National Natural Science Foundation of China (No. 61300972), and the Natural Science Foundation of Tianjin (No. 18JCYBJC15700). We would like to thank the editors and reviewers for their insightful comments. We are also grateful to Wei Zhang and ShuYin Shen for their helpful discussions and revisions.

References

[1] L.A. Adamic, E. Adar, Friends and neighbors on the web, Social Netw. 25 (3) (2003) 211–230.
[2] A. Barrat, M. Weigt, On the properties of small-world network models, Eur. Phys. J. B 13 (3) (2000) 547–560.
[3] J. Cai, E.J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM J. Optim. 20 (4) (2010) 1956–1982.
[4] C.V. Cannistraci, G. Alanis-Lobato, T. Ravasi, From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks, Sci. Rep. 3 (2013) 1613.
[5] Z. Cao, L. Wang, G. de Melo, Link prediction via subgraph embedding-based convex matrix completion, in: Proceedings of AAAI Conference on Artificial Intelligence, AAAI Press, 2018, pp. 2803–2810.
[6] J. Chen, J. Yang, Robust subspace segmentation via low-rank representation, IEEE Trans. Cybern. 44 (8) (2014) 1432–1445.
[7] J. Davis, M. Goadrich, The relationship between precision-recall and ROC curves, in: Proceedings of International Conference on Machine Learning, ACM, 2006, pp. 233–240.
[8] E. Elhamifar, R. Vidal, Sparse subspace clustering, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 2790–2797.
[9] B. Ermiş, E. Acar, A.T. Cemgil, Link prediction in heterogeneous data via generalized coupled tensor factorization, Data Min. Knowl. Discovery 29 (1) (2015) 203–236.
[10] Y. Fang, L. Si, Matrix co-factorization for recommendation with rich side information and implicit feedback, in: Proceedings of International Workshop on Information Heterogeneity and Fusion in Recommender Systems, ACM, 2011, pp. 65–69.
[11] M. Gao, L. Chen, B. Li, et al., A link prediction algorithm based on low-rank matrix completion, Appl. Intell. 48 (12) (2018) 4531–4550.
[12] A. Globerson, G. Chechik, F. Pereira, et al., Euclidean embedding of co-occurrence data, J. Mach. Learn. Res. 8 (2007) 2265–2295.
[13] R. Guimerà, M. Sales-Pardo, Missing and spurious interactions and the reconstruction of complex networks, Proc. Natl. Acad. Sci. 106 (52) (2009) 22073–22078.
[14] R. Hadsell, S. Chopra, Y. LeCun, Dimensionality reduction by learning an invariant mapping, in: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 2006, pp. 1735–1742.
[15] J.A. Hanley, B.J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology 143 (1) (1982) 29–36.
[16] J. Huang, C. Niu, C.D. Green, et al., Systematic prediction of pharmacodynamic drug-drug interactions through protein-protein-interaction network, PLoS Comput. Biol. 9 (3) (2013) e1002998.
[17] S. Ji, J. Ye, An accelerated gradient method for trace norm minimization, in: Proceedings of International Conference on Machine Learning, PMLR, 2009, pp. 457–464.
[18] L. Katz, A new status index derived from sociometric analysis, Psychometrika 18 (1) (1953) 39–43.
[19] D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks, J. Assoc. Inf. Sci. Technol. 58 (7) (2007) 1019–1031.
[20] S. Ma, L. Zhang, W. Hu, et al., Self-representative manifold concept factorization with adaptive neighbors for clustering, in: Proceedings of International Joint Conference on Artificial Intelligence, 2018, pp. 2539–2545.
[21] A.K. Menon, C. Elkan, Link prediction via matrix factorization, in: Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2011, pp. 437–452.
[22] D.C.G. Pedronette, F.M.F. Gonçalves, I.R. Guilherme, Unsupervised manifold learning through reciprocal kNN graph and connected components for image retrieval tasks, Pattern Recognit. 75 (2018) 161–174.
[23] L.K. Saul, S.T. Roweis, Think globally, fit locally: unsupervised learning of low dimensional manifolds, J. Mach. Learn. Res. 4 (2003) 119–155.
[24] S. Scellato, A. Noulas, C. Mascolo, Exploiting place features in link prediction on location-based social networks, in: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2011, pp. 1046–1054.
[25] C. Shi, Z. Zhang, P. Luo, et al., Semantic path based personalized recommendation on weighted heterogeneous information networks, in: Proceedings of ACM International on Conference on Information and Knowledge Management, ACM, 2015, pp. 453–462.
[26] A.P. Singh, G.J. Gordon, Relational learning via collective matrix factorization, in: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2008, pp. 650–658.
[27] M. Wan, Z. Lai, G. Yang, et al., Local graph embedding based on maximum margin criterion via fuzzy set, Fuzzy Sets Syst. 318 (2017) 120–131.
[28] M. Wan, M. Li, G. Yang, et al., Feature extraction using two-dimensional maximum embedding difference, Inf. Sci. 274 (2014) 55–69.
[29] W. Wang, Y. Yan, F. Nie, et al., Flexible manifold learning with optimal graph for image and video representation, IEEE Trans. Image Process. 27 (6) (2018) 2664–2675.
[30] Z. Xu, C. Pu, J. Yang, Link prediction based on path entropy, Phys. A 456 (2016) 294–301.
[31] P. Zhang, F. Wang, J. Hu, et al., Label propagation prediction of drug-drug interactions based on clinical side effects, Sci. Rep. 5 (2015) 12339.
[32] W. Zhang, Y. Chen, D. Li, et al., Manifold regularized matrix factorization for drug-drug interaction prediction, J. Biomed. Inf. 88 (2018) 90–97.
[33] B. Zhu, J.Z. Liu, S.F. Cauley, et al., Image reconstruction by domain-transform manifold learning, Nature 555 (2018) 487.
[34] B. Zhu, Y. Xia, An information-theoretic model for link prediction in complex networks, Sci. Rep. 5 (2015) 13707.