Graph regularization weighted nonnegative matrix factorization for link prediction in weighted complex network


Guangfu Chen a, Chen Xu b,∗, Jingyi Wang b, Jianwen Feng b, Jiqiang Feng b

a College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, PR China
b College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, PR China

Article history: Received 31 October 2018; Revised 27 July 2019; Accepted 19 August 2019; Available online xxx. Communicated by Prof. J. Cao.

Keywords: Link prediction; Weighted nonnegative matrix factorization; Weighted cosine similarity; Link weight

Abstract: In weighted networks, both link weights and topological structure are important features for link prediction. Almost all existing weighted-network link prediction algorithms focus only on the naturally occurring link weights and ignore topological structure information, so they suffer from network sparsity and insufficient topology information. In this paper, we propose a novel Graph Regularization Weighted Nonnegative Matrix Factorization (GWNMF) model that integrates local topology information with link weight information for link prediction. Specifically, the model integrates two types of information, local topology and link weights, and uses the weighted cosine similarity (WCS) method to calculate the weight similarity between local nodes. The WCS score matrix serves as the indicator weighting matrix to capture more useful link weight information, while graph regularization combined with the WCS score matrix captures the local information. In addition, we derive multiplicative updating rules to learn the parameters of the model. Experiments on eight real-world weighted networks demonstrate that GWNMF remarkably outperforms state-of-the-art methods on weighted link prediction tasks. © 2019 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail address: [email protected] (C. Xu).

1. Introduction

The aim of link prediction is to find missing links, identify spurious links, and forecast the emergence of future links in an observed network based on the available information, such as observed links and node attributes [1]. Link prediction has a wide range of applications in different fields. For example, in recommender systems, link prediction can help people find new friends and can recommend potential collaborators, such as patent partners, in enterprise social networks [2]. In biological networks, such as food webs, protein–protein interaction networks, and metabolic networks, our knowledge of the networks is very limited, and using link prediction algorithms to guide biological experiments reduces experimental costs [1]. In security networks, link prediction can disclose hidden connections between criminals and thereby help prevent crime or terrorist activities [3].

Most existing link prediction algorithms focus only on unweighted networks, ignoring the contribution of link weight information [4]. However, in real-world weighted networks, link weights carry different meanings in different networks: in the USAir network, the link weight represents the frequency of flights between two airports, while in the Baydry network the link weight denotes the feeding level. For link prediction in weighted networks, previous work has confirmed that considering the weight contribution improves prediction accuracy. Owing to the importance of link weight information, several recent works have extended link prediction algorithms from unweighted to weighted networks, e.g., the reliable-route method [5], the weighted mutual information model [6], benefit rank [7], strong links [8], weak ties [9], and strong and weak ties [10]. These methods use link weights to achieve good performance in predicting both missing links and link weights. Recently, nonnegative matrix factorization (NMF) models have been widely used in various fields, such as community detection [11], recommendation systems [12], and link prediction [13]. However, most existing NMF-based link prediction algorithms consider only unweighted networks and ignore weight contributions, such as link recommendation [14], perturbation methods [15], belief propagation [16], and graph communicability [17]; only a few works consider link weight information, e.g., symmetric nonnegative matrix factorization and NMF [18]. The former has the disadvantage of ignoring topology information. To exploit more network structure information, many graph-based methods have been proposed that use graph regularization techniques to effectively maintain local structure information


[19–23]. The latter directly uses only observable link weight information to predict missing links. Although these methods utilize link weight information to obtain high prediction accuracy, challenges still exist in weighted-network link prediction. First, the network adjacency matrix is usually very sparse: how can we extract more useful link weight information from a sparse weighted network? Second, previous work has obvious drawbacks: weight-similarity-based and NMF-based methods depend only on the link weights and ignore the network topology, even though topological information is an important feature for link prediction in weighted networks [10]. How, then, can we exploit the network topology within the NMF framework? To overcome the above problems, in this paper we propose a graph regularization weighted nonnegative matrix factorization model that combines graph regularization with local weight similarity to capture local topology information and exploit more useful link weight information for link prediction in weighted networks. Specifically, we employ the weighted cosine similarity (WCS) [24] to calculate the weight similarity between local nodes and thereby retain all the link weight information of the original network; graph regularization is then integrated with WCS to exploit topology information. Finally, we propose a unified link prediction model (GWNMF) and employ multiplicative updating rules to learn its parameters. To verify the validity of the proposed model, we use four evaluation metrics and eight real-world weighted networks; the experimental results show that our model outperforms traditional algorithms. The main contributions of this paper are as follows:

• We propose a Graph Regularization Weighted Nonnegative Matrix Factorization (GWNMF) model for link prediction in weighted networks, which exploits more useful link weight information and preserves local topology information.
• We derive multiplicative updating rules to learn the parameters of GWNMF and provide a theoretical analysis of their convergence.
• We conduct experiments on eight real-world weighted networks with four evaluation metrics to demonstrate that GWNMF outperforms state-of-the-art methods on weighted link prediction tasks.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the proposed GWNMF-based link prediction algorithm. Section 4 describes the experimental results and analysis. Finally, Section 5 concludes the paper.

2. Related work

In this section, we briefly review previous link prediction algorithms. Many link prediction algorithms of different types have been proposed in recent years for different scenarios. These algorithms can generally be divided into two types: similarity-based algorithms and maximum-likelihood methods. Among them, similarity-based algorithms are the simplest and most effective; they include neighborhood-based and distance-based similarity. The main idea of neighborhood-based similarity is that the more neighbors two nodes share, the more likely they are to be similar; examples include Common Neighbors (CN) [25], Adamic–Adar (AA) [26], and Resource Allocation (RA) [27]. However, on networks with low clustering coefficients, their prediction accuracy is greatly reduced. Distance-based algorithms assume that the similarity score is determined by the distance or the number of shortest paths between nodes, such as the Local Path index (LP) [28] and the Katz index [29]. Although distance-based methods overcome the low-clustering-coefficient problem, they are sensitive to sparse networks. Maximum-likelihood methods presuppose that the network structure and node properties are known, and they obtain rules and parameters by maximizing the likelihood of the observed structure and node properties; the likelihood of any non-observed link can then be calculated from those rules and parameters. The hierarchical structure model (HSM) [30], the stochastic block model (SBM) [31], and the Hamiltonian method [32] are typical examples of such algorithms. However, these algorithms are designed for unweighted networks.

In recent years, some researchers have started to pay attention to weighted networks. Murata et al. [8] propose weighted proximity measures for link prediction in social networks and extend the CN, AA, and PA indices to weighted networks, improving prediction performance by considering link weight information. De Sá et al. [33] propose a supervised machine learning approach for link prediction in weighted networks, viewing link weights as the strength of relationships and as useful information; applied to a co-authorship data set, the method achieved good results. Lü et al. [9] apply weak-tie theory to weighted networks and use a free parameter to control the relative contribution of weak ties to the weight similarity measure. Pech et al. [34] present a robust principal component analysis (robust PCA) method for link prediction in weighted and unweighted networks. In short, the above algorithms rely only on link weight information and do not consider the network topology. Zhao et al. [5] use a reliable-route method to extend unweighted local similarity indices to weighted similarity indices; this method performs well on PPI networks. Moradabadi et al. [35] use learning automata to learn optimal actions from reinforcement signals in weighted networks, but the algorithm is only suitable for co-authorship and email networks. Sett et al. [36] employ min-flow and multiplicative methods to extend WCN, WAA, and WRA; these methods are widely applicable to various types of weighted networks and far outperform their unweighted counterparts.

3. Method

In this section, we describe how the model extracts more useful link weight information and captures local topology information through weighted local similarity.

3.1. Problem description

Consider an undirected weighted network G(V, E, W), where V, E, and W are the sets of nodes, links, and link weights, respectively.

Let $A = [a_{ij}] \in \mathbb{R}^{N \times N}_{+}$ be the adjacency matrix of the weighted network. In this paper, we assume that the link weights between any two nodes $i$ and $j$ are nonnegative and symmetric, namely $W_{ij} > 0$ and $W_{ij} = W_{ji}$, where $W_{ij}$ denotes the weight of link $(i, j)$. Multiple links and self-loops are not allowed. We further denote the set of all $\frac{|V|(|V|-1)}{2}$ possible links by $U$, so that $U - E$ is the set of nonexistent links. The goal of link prediction is to find missing links in the set $U - E$. To validate algorithm accuracy, we randomly divide the observed link set $E$ into two parts: the training set $E^T$ and the probe set $E^P$, with the former treated as given information and the latter used only for testing. Clearly, $E^T \cap E^P = \emptyset$ and $E^T \cup E^P = E$.

We now introduce the basic weighted nonnegative matrix factorization model used in this paper; weighted nonnegative matrix factorization has been applied in collaborative filtering and clustering [37–39]. Let $\tilde{A} \in \mathbb{R}^{N \times N}_{+}$ be the adjacency matrix of an unweighted network; its decomposition is

$$\min_{\tilde{U} \ge 0,\ \tilde{V} \ge 0} \|\tilde{S} \circ (\tilde{A} - \tilde{U}\tilde{V}^T)\|_F^2 + \beta\,(\|\tilde{U}\|_F^2 + \|\tilde{V}\|_F^2) \qquad (1)$$

where $\tilde{U} \in \mathbb{R}^{N \times K}_{+}$ denotes the basis matrix, $\tilde{V} \in \mathbb{R}^{N \times K}_{+}$ denotes the coefficient matrix, $K$ is the dimension of the latent space, $\circ$ denotes the Hadamard product, $\beta$ is a regularization parameter that prevents over-fitting, and $\tilde{S}$ is the indicator


matrix: $\tilde{S}_{ij}$ equals 1 if nodes $i$ and $j$ are connected, and 0 otherwise.

3.2. Exploiting link weight information

In weighted networks, link weight information plays an important role in improving link prediction. However, most real-world weighted networks are sparse, so the observable link weight information is only a small fraction of the whole. We therefore employ the weighted cosine similarity (WCS) method to extract more link weight information from sparse networks. The underlying idea is that the more common neighbors two nodes share, the more similar the nodes are likely to be; for example, if two strangers have many common friends, the chances are that they will become friends. Let $x_i = (a_{i1}, \ldots, a_{iN})$ be the vector of link weights between node $i$ and all other nodes. The indicator weighting matrix is then determined by the similarity of $x_i$ and $x_j$. Here we take the weighted cosine similarity as the indicator weighting matrix; for nodes $i$ and $j$, the WCS score matrix is defined as

$$S_{ij} = w_{ij}\,\frac{\sum_{k=1}^{N} x_{ik}\, x_{jk}}{\sqrt{\sum_{k=1}^{N} x_{ik}^{2}}\ \sqrt{\sum_{k=1}^{N} x_{jk}^{2}}} \qquad (2)$$

where $w_{ij}$ represents the weight of the link between nodes $i$ and $j$, and $x_{ik}$ denotes the $k$-th component of $x_i$. Since $S$ contains all the link weight information of the original network, we assign $S$ as the indicator weighting matrix of our model. We rewrite the first part of Eq. (1) as

$$\min_{U \ge 0,\ V \ge 0} \|S \circ (A - UV^T)\|_F^2 \qquad (3)$$

where $U \in \mathbb{R}^{N \times K}_{+}$ denotes the basis matrix, $V \in \mathbb{R}^{N \times K}_{+}$ denotes the coefficient matrix, $K$ is the dimension of the latent space, and $\circ$ denotes the Hadamard product.
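To make the construction of the WCS indicator matrix concrete, the following is a minimal NumPy sketch of Eq. (2) as reconstructed above; it is our own illustration rather than the authors' code, and the function name wcs_matrix is ours.

```python
import numpy as np

def wcs_matrix(A: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Weighted cosine similarity (WCS) score matrix, a sketch of Eq. (2).

    A is the symmetric nonnegative weighted adjacency matrix; row i of A
    plays the role of x_i, the link-weight profile of node i.
    """
    norms = np.linalg.norm(A, axis=1)                        # ||x_i|| for every node
    cos = (A @ A.T) / np.outer(norms, norms).clip(min=eps)   # cosine of x_i and x_j
    return A * cos                                           # scale by the link weight w_ij

# Tiny usage example on a 3-node weighted triangle:
A = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
S = wcs_matrix(A)
print(np.round(S, 3))
```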

3.3. Exploiting local information

In this section, our goal is to employ graph regularization together with WCS so that the topological structure of the network is well preserved. Specifically, we aim to assign similar latent vectors to nodes with similar local topology. Let $v_i$ and $v_j$ denote the feature vectors associated with nodes $i$ and $j$. If nodes $i$ and $j$ have similar local topological structure, their vectors $v_i$ and $v_j$ should also be close in the latent space. We measure this with the Euclidean distance $d(v_i, v_j) = \|v_i - v_j\|^2$: the larger $S_{ij}$ is, the closer the latent representations should be. We therefore employ graph regularization to minimize the weighted sum of similarity errors between the weights of $S$ and the weights of $V$:

$$G = \frac{1}{2}\sum_{i,j} d(v_i, v_j)\, S_{ij} = \mathrm{Tr}(V^T D V) - \mathrm{Tr}(V^T S V) = \mathrm{Tr}(V^T L V) \qquad (4)$$

where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, $D$ is the diagonal matrix with $D_{ii} = \sum_{j=1}^{N} S_{ij}$, $L = D - S$ is the graph Laplacian, and $S_{ij}$ is the local weighted similarity.

3.4. The unified model: GWNMF

By integrating link weight information and local information, we propose a unified model, GWNMF, for link prediction in weighted networks. The optimized objective function is

$$J_{GWNMF} = \min_{U \ge 0,\ V \ge 0} \|S \circ (A - UV^T)\|_F^2 + \alpha\, \mathrm{Tr}(V^T L V) + \beta\, (\|U\|_F^2 + \|V\|_F^2) \qquad (5)$$

where $\circ$ denotes the Hadamard product, $\beta$ prevents over-fitting, and $\alpha$ controls the contribution of the local topological structure.

Eq. (5) is optimized with Lagrangian multiplier updating rules. The objective function is non-convex in the two variables $U$ and $V$ jointly, but it is convex in either variable when the other one is fixed. Using the matrix trace properties $\mathrm{Tr}(A) = \mathrm{Tr}(A^T)$ and $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$, we can rewrite Eq. (5) as

$$J_{GWNMF} = \mathrm{Tr}\{[S \circ (A - UV^T)][S \circ (A - UV^T)]^T\} + \alpha\, \mathrm{Tr}(V^T L V) + \beta\, [\mathrm{Tr}(U^T U) + \mathrm{Tr}(V^T V)] \qquad (6)$$

Introducing Lagrange multiplier matrices $\Phi = [\phi_{nk}] \in \mathbb{R}^{N \times K}_{+}$ and $\Psi = [\psi_{nk}] \in \mathbb{R}^{N \times K}_{+}$, we rewrite Eq. (6) as

$$J_{GWNMF} = \mathrm{Tr}\{[S \circ (A - UV^T)][S \circ (A - UV^T)]^T\} + \alpha\, \mathrm{Tr}(V^T L V) + \beta\, [\mathrm{Tr}(U^T U) + \mathrm{Tr}(V^T V)] + \mathrm{Tr}(\Phi U^T) + \mathrm{Tr}(\Psi V^T) \qquad (7)$$

Updating U: to update $U$ with $V$ fixed, we take the partial derivative

$$\frac{\partial J_{GWNMF}}{\partial U} = -2(S \circ A)V + 2(S \circ (UV^T))V + 2\beta U \qquad (8)$$

According to the Karush–Kuhn–Tucker (KKT) conditions, we have

$$U_{nk} \leftarrow U_{nk}\, \frac{[(S \circ A)V]_{nk}}{[(S \circ (UV^T))V + 2\beta U]_{nk}} \qquad (9)$$

Updating V: to update $V$ with $U$ fixed,

$$\frac{\partial J_{GWNMF}}{\partial V} = -2(S \circ A)^T U + 2(S \circ (UV^T))^T U + 2\alpha (D - S)V + 2\beta V \qquad (10)$$

According to the KKT conditions, we get

$$V_{nk} \leftarrow V_{nk}\, \frac{[(S \circ A)^T U + \alpha S V]_{nk}}{[(S \circ (UV^T))^T U + \alpha D V + \beta V]_{nk}} \qquad (11)$$

We obtain the basis matrix $U$ and the feature matrix $V$ by minimizing Eq. (5). Finally, we compute the similarity scores of the original network as $\hat{A} = UV^T$. The resulting link prediction procedure (GWNMF) is summarized in Algorithm 1.

Algorithm 1: GWNMF.
Input: A: adjacency matrix of an undirected weighted network; K: dimension of the latent space; maxiter: maximum number of iterations; parameters α, β.
Output: similarity score matrix Â.
1: Divide A into a training set E^T and a probe set E^P
2: Randomly initialize U, V
3: Calculate the WCS score matrix according to Eq. (2)
4: Exploit link weight information according to Eq. (3)
5: Preserve local information according to Eq. (4)
6: for t = 1 : maxiter do
7:   Update U according to Eq. (9)
8:   Update V according to Eq. (11)
9:   Get U and V after convergence
10: end for
11: Compute the probability matrix for link prediction: Â = UV^T

3.5. Computational complexity analysis

The GWNMF algorithm uses the multiplicative updating rules to optimize the objective function. In each iteration, updating $U$ according to Eq. (9) requires $O(n^2 K N_{iter})$ and updating $V$ according to Eq. (11) requires $O(n^2 K N_{iter})$. Therefore, the total time complexity of GWNMF is $O(n^2 K N_{iter})$.
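For readers who want to experiment, the following is a minimal NumPy sketch of the multiplicative updates (9) and (11). It is our own illustrative implementation of the equations as stated, not the authors' released code; the function name gwnmf is ours, and it assumes the WCS matrix S has been computed beforehand (e.g., with the wcs_matrix sketch above).

```python
import numpy as np

def gwnmf(A, S, K=70, alpha=0.1, beta=1.0, maxiter=40, seed=0):
    """Sketch of GWNMF: minimize ||S o (A - U V^T)||_F^2
    + alpha * Tr(V^T L V) + beta * (||U||_F^2 + ||V||_F^2)
    via the multiplicative rules of Eqs. (9) and (11)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    U = rng.random((n, K))
    V = rng.random((n, K))
    D = np.diag(S.sum(axis=1))    # degree matrix of the WCS graph
    SA = S * A                    # S o A is fixed across iterations
    eps = 1e-10                   # avoid division by zero
    for _ in range(maxiter):
        # Eq. (9): update U with V fixed (the 2*beta term follows the paper's rule)
        U *= (SA @ V) / ((S * (U @ V.T)) @ V + 2 * beta * U + eps)
        # Eq. (11): update V with U fixed
        num = SA.T @ U + alpha * (S @ V)
        den = (S * (U @ V.T)).T @ U + alpha * (D @ V) + beta * V + eps
        V *= num / den
    return U @ V.T                # score matrix A_hat

# Usage: A_hat = gwnmf(A_train, wcs_matrix(A_train))
```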

3.6. Proof of convergence

In this section, we prove that the objective function J of the proposed GWNMF algorithm is nonincreasing under the update formulas (9) and (11). We use the following definition and lemmas, based on Lee et al. [40].

Definition. A function $G(h, h')$ is an auxiliary function of $L(h)$ if

$$G(h, h') \ge L(h), \qquad G(h, h) = L(h) \qquad \text{for any } h, h'.$$

Lemma 1. If $G(h, h')$ is an auxiliary function of $L(h)$, then $L(h)$ is nonincreasing under the update

$$h^{t+1} = \arg\min_h G(h, h^t).$$

Proof. $L(h^{t+1}) \le G(h^{t+1}, h^t) \le G(h^t, h^t) = L(h^t)$. □

Lemma 2. Let $F \in \mathbb{R}^{n \times n}_{+}$ and $G \in \mathbb{R}^{g \times g}_{+}$ be any two symmetric matrices, and let $S \in \mathbb{R}^{n \times g}_{+}$ and $S' \in \mathbb{R}^{n \times g}_{+}$ be any two nonnegative matrices. Then the following inequality holds:

$$\sum_{i=1}^{n} \sum_{j=1}^{g} \frac{(F S' G)_{ij}\, S_{ij}^{2}}{S'_{ij}} \;\ge\; \mathrm{Tr}(S^{T} F S' G) \qquad (12)$$

Proof. Let $S_{ij} = S'_{ij}\, p_{ij}$. For the left-hand side (LHS) and the right-hand side (RHS) of Eq. (12) we obtain, respectively,

$$\mathrm{LHS} = \sum_{i,x=1}^{n} \sum_{j,y=1}^{g} F_{ix}\, S'_{xy}\, G_{yj}\, S'_{ij}\, p_{ij}^{2} \qquad (13)$$

$$\mathrm{RHS} = \sum_{i,x=1}^{n} \sum_{j,y=1}^{g} F_{ix}\, S'_{xy}\, G_{yj}\, S'_{ij}\, p_{ij}\, p_{xy} \qquad (14)$$

Then we have

$$\mathrm{LHS} - \mathrm{RHS} = \sum_{i,x=1}^{n} \sum_{j,y=1}^{g} F_{ix}\, S'_{xy}\, G_{yj}\, S'_{ij}\, (p_{ij}^{2} - p_{ij} p_{xy}) \qquad (15)$$

Since $F$ and $G$ are symmetric matrices, this is equal to

$$\mathrm{LHS} - \mathrm{RHS} = \sum_{i,x=1}^{n} \sum_{j,y=1}^{g} F_{ix}\, S'_{xy}\, G_{yj}\, S'_{ij} \left( \frac{p_{ij}^{2} + p_{xy}^{2}}{2} - p_{ij} p_{xy} \right) = \frac{1}{2} \sum_{i,x=1}^{n} \sum_{j,y=1}^{g} F_{ix}\, S'_{xy}\, G_{yj}\, S'_{ij}\, (p_{ij} - p_{xy})^{2} \;\ge\; 0 \qquad (16)$$

□

Based on Lemmas 1 and 2, we prove the convergence of the GWNMF algorithm in the following lemma.

Lemma 3. Fixing either one of the matrices U and V and using updating rules (9) and (11) in each iteration of algorithm GWNMF, the value of the objective function J is nonincreasing.

Proof. First, we rewrite the objective function J:

$$J = \mathrm{Tr}\{[S \circ (A - UV^{T})][S \circ (A - UV^{T})]^{T}\} + \alpha\, \mathrm{Tr}(V^{T} L V) + \beta\, [\mathrm{Tr}(U^{T} U) + \mathrm{Tr}(V^{T} V)] \qquad (17)$$

Removing the terms irrelevant to U, we get a function related only to U:

$$J(U) = \mathrm{Tr}[U^{T} (S \circ (UV^{T})) V + 2\beta\, U^{T} U] - \mathrm{Tr}[2\, U^{T} (S \circ A) V]$$

Let

$$G(U, U') = \sum_{ij} \frac{[(S \circ (U' V^{T})) V + \beta U']_{ij}\, U_{ij}^{2}}{U'_{ij}} - 2 \sum_{ij} [(S \circ A) V]_{ij}\, U'_{ij} \left( 1 + \log \frac{U_{ij}}{U'_{ij}} \right)$$

We need to prove that $G(U, U')$ is an auxiliary function of $J(U)$. Obviously, when $U = U'$ we have $G(U, U') = J(U)$. If $U \ne U'$, then according to Lemma 2,

$$\sum_{ij} \frac{[(S \circ (U' V^{T})) V + \beta U']_{ij}\, U_{ij}^{2}}{U'_{ij}} > \mathrm{Tr}[(S \circ (U' V^{T})) V U^{T} + 2\beta\, U' U^{T}]$$

and since $\frac{u}{u'} > 1 + \log \frac{u}{u'}$ for all $u, u' > 0$ with $u \ne u'$, we have

$$\sum_{ij} [(S \circ A) V]_{ij}\, U'_{ij} \left( 1 + \log \frac{U_{ij}}{U'_{ij}} \right) < \mathrm{Tr}[U^{T} (S \circ A) V]$$

Therefore $G(U, U') > J(U)$, and thus $G(U, U')$ is an auxiliary function of $J(U)$.

To find the global minimum of $G(U, U')$ with $U'$ fixed, we take the derivative of $G(U, U')$ with respect to $U_{ij}$ and, by the KKT conditions, set it to zero:

$$\frac{\partial G(U, U')}{\partial U_{ij}} = \frac{2\,[(S \circ (U' V^{T})) V + \beta U']_{ij}\, U_{ij}}{U'_{ij}} - \frac{2\,[(S \circ A) V]_{ij}\, U'_{ij}}{U_{ij}} = 0$$

which yields the update formula

$$U_{ij} \leftarrow U_{ij}\, \frac{[(S \circ A) V]_{ij}}{[(S \circ (U' V^{T})) V + 2\beta U']_{ij}} \qquad (18)$$

Since the Hessian matrix $\frac{\partial^{2} G(U, U')}{\partial U_{ij}\, \partial U_{ki}}$ is positive definite, $G(U, U')$ is a convex function. Therefore, using Eq. (9) at each iteration of GWNMF, the value of the objective function J is nonincreasing. Similarly, we can prove that when the GWNMF algorithm is iteratively updated with Eq. (11), the value of the objective function J is also nonincreasing. □

4. Experiment results

4.1. Evaluation metric

To verify the performance of our proposed algorithm, we employ AUC (the area under the receiver operating characteristic curve) [41], Precision [42], RMSE (root mean square error) [5,43], and the Pearson correlation coefficient (PCC) [18] as accuracy measures.

(1) AUC: this can be interpreted as the probability that a missing link has a higher score than a nonexistent link. In practice, among n independent comparisons, we randomly pick a missing link and a nonexistent link each time and compare their scores. If $n_1$ times the missing link has the higher score and $n_2$ times the two scores are equal, the AUC value is

$$\mathrm{AUC} = \frac{n_1 + 0.5\, n_2}{n} \qquad (19)$$

(2) Precision: given the ranking of the non-observed links, precision is defined as the ratio of relevant items among the items selected. Namely, if we take the top-L links as the predicted ones and $L_r$ of them are right links in the probe set $E^P$, then

$$\mathrm{precision} = \frac{L_r}{L} \qquad (20)$$

(3) RMSE and PCC: these measure, respectively, the standard deviation of the differences and the correlation between the vectors of predicted and actual link weights:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i,j} (w_{ij} - r_{ij})^2}{n}} \qquad (21)$$

$$\mathrm{PCC} = \frac{\sum_{i,j} (w_{ij} - \bar{w})(r_{ij} - \bar{r})}{\sqrt{\sum_{i,j} (w_{ij} - \bar{w})^2}\ \sqrt{\sum_{i,j} (r_{ij} - \bar{r})^2}} \qquad (22)$$

where $w_{ij}$ and $r_{ij}$ are the predicted and actual weights, respectively, n is the size of the probe set, and $\bar{w}$ and $\bar{r}$ are the averages of w and r. For Eq. (21), a smaller score corresponds to better performance; for Eq. (22), a higher score corresponds to better performance.

4.2. Dataset

We compare our method with baseline indices on eight real-world weighted networks, whose statistics are summarized in Table 1. The networks come from different fields.

Language [44]: a bipartite network recording which languages are spoken in which countries. The nodes represent countries and languages; the weight of a link denotes the proportion (between zero and one) of the population of a given country speaking a given language.

Celegans [45]: a directed, weighted network representing the neural network of C. elegans. The nodes denote neurons; the weight of a link is the number of synapses between the corresponding neuron pair.

USAir [46]: the aviation network of the USA, where nodes denote airports and links are routes. The weight of a link represents the frequency of flights between two airports.

Biocegn and Bioscgt [47]: weighted biological networks, where nodes denote genes.

Baywet [46]: a network of the carbon exchanges in the cypress wetlands of South Florida during the wet season, where the nodes represent taxa and a link denotes that one taxon uses another as food with a given trophic factor (feeding level).

Baydry [46]: the same carbon-exchange network of the South Florida cypress wetlands during the dry season.

Adolescent [48]: a directed, weighted network created from a survey that took place in 1994/1995, in which each student was asked to list his or her 5 best female and 5 best male friends. Each node represents a student; the weight of a link encodes the level of common activity between the two students.

Table 1. The basic topological features of the eight real-world networks. |N| is the number of nodes and |M| is the number of links. CC and CCW are the clustering coefficients of nodes in the unweighted and weighted networks, respectively; r is the assortativity coefficient; ⟨k⟩ is the average degree; ⟨d⟩ is the average distance; H = ⟨k²⟩/⟨k⟩² is the degree heterogeneity.

Network     |N|    |M|      ⟨k⟩      ⟨d⟩     r         H       CC      CCW
BayWet      128    2106     32.9063  1.7724  −0.1044   1.2307  0.3346  0.1793
BayDry      128    1247     4.0603   3.5560  −0.1681   5.1786  0.0828  0.0389
Celegans    306    306      14.0392  2.3128  −0.1632   1.8554  0.2838  0.2522
USAir       332    2126     12.8072  2.7381  −0.2079   3.4639  0.6252  0.3203
Language    614    1247     4.0603   3.5560  −0.16815  5.1786  0.0828  0.0389
Bioscgt     1715   33,985   39.6327  2.7507  0.0632    2.3883  0.3471  0.2988
Biocegn     2219   53,676   48.3785  2.6858  0.0681    1.9198  0.1839  0.1474
Adolescent  2539   10,455   8.2355   4.5594  0.2513    1.2741  0.1467  0.1350

4.3. Weighted similarity indices

In this section, we briefly introduce some existing typical weighted indices. They are used as baseline methods against which our proposed algorithm is compared in the later experiments.

(1) Common Neighbors (CN), denoted as

$$S^{CN}_{xy} = |\Gamma(x) \cap \Gamma(y)| \qquad (23)$$

where $\Gamma(x)$ denotes the set of neighbors of x. For weighted networks, WCN is denoted as

$$S^{WCN}_{xy} = \sum_{z \in O_{xy}} (W_{xz} + W_{zy}) \qquad (24)$$

(2) Adamic/Adar (AA), denoted as

$$S^{AA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log(k_z)} \qquad (25)$$

where $k_z$ is the degree of node z. For weighted networks, WAA is denoted as

$$S^{WAA}_{xy} = \sum_{z \in O_{xy}} \frac{W_{xz} + W_{zy}}{\log(1 + S_z)} \qquad (26)$$

(3) Resource Allocation (RA), denoted as

$$S^{RA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z} \qquad (27)$$

For weighted networks, WRA is denoted as

$$S^{WRA}_{xy} = \sum_{z \in O_{xy}} \frac{W_{xz} + W_{zy}}{S_z} \qquad (28)$$

where $O_{xy} = \{z : z \in \Gamma(x) \cap \Gamma(y)\}$ denotes the common-neighbor set of nodes x and y, $W_{zy}$ is the weight of link (z, y), and $S_z$ represents the strength of node z.

(4) Preferential Attachment (PA), denoted as

$$S^{PA}_{xy} = k_x \times k_y \qquad (29)$$

where $k_x$ is the degree of node x. For weighted networks, the PA index can be extended as

$$S^{WPA}_{xy} = \sum_{a \in \Gamma(x)} W_{ax} \ast \sum_{b \in \Gamma(y)} W_{by} \qquad (30)$$

where $W_{ax}$ is the weight of link (a, x).

(5) Local Path (LP), denoted as

$$S^{LP}_{xy} = |\mathrm{path}^{\langle 2 \rangle}_{xy}| + \varepsilon\, |\mathrm{path}^{\langle 3 \rangle}_{xy}| \qquad (31)$$

where only paths of length 2 and 3 are considered. For weighted networks, the LP index can be extended as

$$S^{WLP}_{xy} = \sum_{\mathrm{path}^{\langle 2 \rangle}} (W_{xz} + W_{zy}) + \varepsilon \sum_{\mathrm{path}^{\langle 3 \rangle}} (W_{xa} + W_{ab} + W_{by}) \qquad (32)$$

(6) Nonnegative matrix factorization (NMF): applied directly on the real network to produce the score matrix, using only the network structure information.
Please cite this article as: G. Chen, C. Xu and J. Wang et al., Graph regularization weighted nonnegative matrix factorization for link prediction in weighted complex network, Neurocomputing, https://doi.org/10.1016/j.neucom.2019.08.068

ARTICLE IN PRESS

JID: NEUCOM 6

[m5G;September 2, 2019;21:28]

G. Chen, C. Xu and J. Wang et al. / Neurocomputing xxx (xxxx) xxx Table 2 Comparison of the prediction accuracy under the Precision in eight weighted networks. The results are the average 20 independent implementations with random partitions of training set (90%) and probe set (10%). For each networks, the highest value is emphasized in boldface.

Baydry Baywet Celegans USAir Language Biocegn Bioscgt Adolescent

WCN

WAA

WRA

WPA

WLP

NMF

LR

GNMF

GWNMF

0.0752 0.0782 0.0997 0.3693 0.0333 0.1024 0.2704 0.1056

0.0785 0.0789 0.1061 0.3885 0.0406 0.1046 0.2834 0.1122

0.0759 0.0770 0.1048 0.4493 0.0351 0.0904 0.3376 0.0935

0.1598 0.1563 0.0598 0.3186 0.0603 0.0344 0.0855 0.0029

0.1806 0.1836 0.1235 0.3595 0.0442 0.1171 0.2327 0.0669

0.4629 0.4630 0.1337 0.3729 0.0278 0.1861 0.4105 0.0450

0.5338 0.5324 0.1270 0.4066 0.0090 0.2690 0.4666 0.0287

0.4838 0.4858 0.1499 0.3892 0.0212 0.1748 0.3933 0.0007

0.5481 0.5489 0.1680 0.4581 0.0635 0.2815 0.4997 0.0792

Table 3 Comparison of the prediction accuracy under the AUC in eight weighted networks. The results are the average 20 independent implementations with random partitions of training set (90%) and probe set (10%). For each networks, the highest value is emphasized in boldface.

Baydry Baywet Celegans USAir Language Biocegn Bioscgt Adolescent

WCN

WAA

WRA

WPA

WLP

NMF

LR

GNMF

GWNMF

0.5986 0.6006 0.8516 0.9284 0.6330 0.8818 0.9458 0.7699

0.6062 0.6035 0.8711 0.9458 0.6332 0.8854 0.9490 0.7703

0.6106 0.6150 0.8751 0.9503 0.6326 0.8874 0.9533 0.7683

0.7213 0.7196 0.7682 0.8854 0.7063 0.8530 0.8717 0.6150

0.7621 0.7637 0.8689 0.9210 0.7257 0.9139 0.9342 0.8461

0.9248 0.9232 0.8499 0.9152 0.6448 0.9083 0.9349 0.7754

0.9099 0.9078 0.5761 0.8072 0.4977 0.6332 0.7269 0.5262

0.9250 0.9348 0.827 0.9209 0.8075 0.9095 0.9439 0.7302

0.9493 0.9476 0.8927 0.9268 0.6924 0.9358 0.9575 0.8558

(7) Low Rank(LR) index employ Robust PCA method for link prediction to predict missing links. (8) GNMF [19] method combines the link structure with graph neighbor information of nodes.

min

U≥0,V ≥0

A − UV T 2F + λT r (V T LV )

(33)

Where λ is parameter, L is graph laplacian matrix and L = D − W, D is a diagonal matrix and W is a similarity matrix between nodes. Therefore, the GNMF method similarity score SGNMF = UV T . In this paper, the weights are analogous to link-existence probabilities, for networks whose weights do not lie in the range[0,1]. Therefore, we need to normalize link weights by mapping to [0,1] through

w =

1 1 + e−w

(34)

where w and w stands for the original and regulated weights, respectively. 4.4. Experiment analysis We use four different metrics and eight weight networks to test the performance of our proposed algorithm. The GWNMF contain four parameter α and β ,latent factorization dimensionality K and maximum number of iterations maxiter which directly affect algorithm performance, respectively. We empirically set α =0.1, β =1, maxiter = 40 and K = 70. The results of the experiment are reported in Tables 2–5. For convenience, we refer to WCN, WAA, WRA, and WPA as weight local similarity. Tables 2 and 3 demonstrate that our proposed model achieves significant improvements in most weighted networks. Specifically, in precision metric, our methods has an notable improvement over the second excellent method by 1.4%, 1.6%, 1.8%, 0.8%, 0.3%, 1.2% and 3.3% for the datasets Baydry, Baywet, Celegans, USAir, Language, Biocegn and Bioscgt, respectively. While in AUC metric,our methods has an remarkable improvement over the second good method by 2.4%, 1.2%, 2.1%, 2.1%, 1.3% and 0.9% for the

datasets Baydry, Baywet, Celegans, Biocegn, Bioscgt and Adolescent,respectively. From Tables 2 and 3, the weight local similarity methods have acquired lower-quality performance because they all rely on common link weights information between nodes, e.g., BayWet and BayDry networks are asymmetric networks. Besides, another reason for the poor performance of weight local similarity methods is the preference for high clustering coefficient networks, i.e., US Air network. The LR index also achieved better performance because LR uses the Robust PCA method to robust sparse networks and eliminate noise. In addition, we verify that our approach is effective in dealing with sparse networks. We randomly remove some links from the training set. In this experiment, the random removal link ratio was 40–90%, and the experimental results are shown in Figs. 1 and 2. Experimental results show that our method is robust to sparse networks. RMSE evaluates weight prediction metrics, and its experimental results are reported in Table 4. From Table 4, the lowest RMSE are achieved by GWNMF methods in network Baydry, Baywet, USAir, Adolescent, Biocegn and Bioscgt, respectively, confirming the remarkable performance of GWNMF method. The reason that the performance of NMF, GNMF and GWNMF are very close or equal is that they have the same theory. The PCC metric measures the correlation between the vectors of actual weight and predicted weights(similarity scores) for the links in the probe set. Table 5 shows our methods all obtain positive values, indicating that positive linear correlation between the actual weight and the predicted weight. If PCC is negative, there is no correlation between the actual weight and the predicted weight, i.e., in Language network, PCC of WCN is −0.0916. Therefore,GWNMF gets higher-quality performance in most weighted networks exceptions for USAir, Language and Adolescent networks. Whether the link prediction algorithm can improve performance after considering the link weights. We selected two indicators WRA and GWNMF to verify. In Fig. 3, when GWNMF considers links weights, GWNMF performance is significantly improved. Fig. 3 shows that WRA has improved in most weight networks. The main reason is that similarity and link weights are fully represented between nodes. In summary, the link prediction algorithm

Please cite this article as: G. Chen, C. Xu and J. Wang et al., Graph regularization weighted nonnegative matrix factorization for link prediction in weighted complex network, Neurocomputing, https://doi.org/10.1016/j.neucom.2019.08.068

ARTICLE IN PRESS

JID: NEUCOM

[m5G;September 2, 2019;21:28]

G. Chen, C. Xu and J. Wang et al. / Neurocomputing xxx (xxxx) xxx USAir

Celegans

Baydry 0.95 0.9

0.95

0.85 AUC

0.8 AUC

0.75 0.7

0.8

0.65 0.75

0.6

0.7

0.55 0.4

0.5

0.6

0.7

0.8

0.9

0.4

Ratio of removed links

0.5

WCN

WAA

0.65

AUC

AUC

0.7

0.6 0.55 0.5 0.45 0.7

0.9

WRA

0.7 0.6 0.55 0.5

0.8

0.95 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5

0.9

0.4

Ratio of removed links

0.5

0.6

0.7

0.8

0.6

0.7

0.8

0.9

0.4

Ratio of removed links WLP

WPA

0.75 0.65

0.4

NMF

LR

0.9

0.5

0.6

0.7

0.8

0.9

Ratio of removed links

GNMF Bioscgt

GWNMF Adolescent

1 0.95 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5

AUC

0.75

0.6

0.8

0.8

Biocegn

0.8

0.5

0.7

0.9 0.85

Ratio of removed links

Language

0.4

0.6

0.95

0.9 0.85 0.8 0.75 AUC

AUC

0.9 0.85

Baywet

0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45

AUC

1

7

0.7 0.65 0.6 0.55 0.5

0.4

Ratio of removed links

0.5

0.6

0.7

0.8

0.9

0.4

Ratio of removed links

0.5

0.6

0.7

0.8

0.9

Ratio of removed links

Baydry

0.5

0.6

0.7

0.8

0.9

0.5

0.6

0.6

0.5

0.5

0.4 0.3 0.2

WCN

0.1

0

0 0.5

0.7

0.8

0.9

0.3

0.1

0.25

0.5

WLP

0.02

0.05

0

0 0.8

Ratio of removed links

0.9

0.4

0.5

0.6

0.7

0.7

0.8

0.9

0.4

NMF

LR

0.5

0.8

0.9

Ratio of removed links

0.6

0.7

0.8

0.9

Ratio of removed links

GNMF

GWNMF Adolescent

Bioscgt

0.15 0.1

0.6

Ratio of removed links

0.2

0.04

0.7

0 0.4

Precision

0.35

0.12 Precision

Precision

0.6

Ratio of removed links WAA WRA WPA

0.14

0.06

0.3 0.2

Biocegn

0.08

0.4

0.1

0.4

0.6

0.3

0.1

0.16

0.5

0.4

0.2

Language

0.4

Precision

0.7

0.4

Ratio of removed links

Baywet

0.7

0.6 0.55 0.5 0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05

0.16 0.14 0.12 Precision

0.4

Celegans

0.6

Precision

USAir

0.55 0.5 0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05

Precision

Precision

Fig. 1. Predicting missing links for different ratios of removed links: link prediction accuracy measured by AUC on the eight networks. The ratio of removed links ranges from 0.4 to 0.9.


Fig. 2. Predicting missing links for different ratios of removed links: link prediction accuracy measured by Precision on the eight networks. The ratio of removed links ranges from 0.4 to 0.9.

In summary, link prediction algorithms improve their performance after taking link weights into account, but different algorithms suit different networks.

4.5. Parameter sensitivity

To further evaluate the effect of GWNMF's parameters, we vary them on the eight weighted networks. Our model has four important parameters: the dimension of the latent space K, the maximum number of iterations maxiter, and the parameters α and β, of which α controls the contribution of topology information and β prevents over-fitting. We vary K over {10, 20, 30, ..., 100}, α over {10^5, 10^3, 10^1, 10^-1, 10^-3, 10^-5}, β over {0.1, 1.9, 2.8, 3.7, 4.6, 5.5}, and maxiter over {10, 20, 30, ..., 80}. In each experiment, we fix three of the four parameters; for example, to study the effect of α on performance, we fix K, maxiter, and β. The other parameters are examined in the same way, as sketched after this section's discussion.

Impact of α. Fig. 5 shows the effect of different choices of the parameter α, ranging from 10^5 down to 10^-5. When the value of α is greater than 0.1, performance decreases rapidly because the model fails to capture topology information. As α decreases, performance gradually improves and finally stays at a constant level. From this analysis, the best choice is α = 0.1.

Impact of β. Fig. 6 demonstrates that GWNMF is sensitive to changes in the parameter β. When β is less than 1, performance drops because GWNMF converges slowly within the given number of iterations; as β increases beyond 1, performance degrades rapidly because the model over-fits. The optimal value is β = 1.

Impact of K. We next consider how to select the dimension of the latent space K, which directly affects both the accuracy and the time complexity of our proposed algorithm.
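As an illustration of this protocol, the sketch below (our own, reusing the illustrative gwnmf and auc_score helpers defined earlier) fixes three parameters and sweeps the fourth; A_train, S_train, probe, and non_edges are assumed to be prepared beforehand.

```python
# Sweep alpha with K, maxiter, and beta fixed, following the paper's protocol.
for alpha in [1e5, 1e3, 1e1, 1e-1, 1e-3, 1e-5]:
    A_hat = gwnmf(A_train, S_train, K=70, alpha=alpha, beta=1.0, maxiter=40)
    print(f"alpha={alpha:g}  AUC={auc_score(A_hat, probe, non_edges):.4f}")
```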


Table 4. Comparison of prediction accuracy under RMSE on eight weighted networks. The results are averages over 20 independent runs with random partitions into training set (90%) and probe set (10%). For each network, the best (lowest) value is emphasized in boldface.

Network     WCN     WAA     WRA     WPA     WLP     NMF     LR      GNMF    GWNMF
Baydry      0.0864  0.0861  0.0860  0.1402  0.0833  0.0811  0.0815  0.0812  0.0806
Baywet      0.0862  0.0861  0.0862  0.1431  0.0831  0.0808  0.0815  0.0812  0.0808
Celegans    0.0865  0.0863  0.0860  0.0644  0.0628  0.0612  0.0589  0.0584  0.0603
USAir       0.0510  0.0508  0.0504  0.0412  0.0348  0.0321  0.0322  0.0314  0.0318
Language    0.0633  0.0654  0.0645  0.0216  0.0327  0.0207  0.0137  0.0187  0.0213
Biocegn     0.0618  0.0617  0.0618  0.0801  0.0416  0.0374  0.0375  0.0372  0.0369
Bioscgt     0.0705  0.0703  0.0703  0.0445  0.0475  0.0393  0.0385  0.0388  0.0382
Adolescent  0.1125  0.1127  0.1128  0.0588  0.0477  0.0167  0.0162  0.0163  0.0162

Table 5. Comparison of prediction accuracy under PCC on eight weighted networks. The results are averages over 20 independent runs with random partitions into training set (90%) and probe set (10%). For each network, the highest value is emphasized in boldface.

Network     WCN      WAA      WRA     WPA     WLP     NMF     LR       GNMF    GWNMF
Baydry      0.0732   0.0727   0.0824  0.1736  0.1949  0.3242  0.2488   0.2756  0.3291
Baywet      0.0773   0.0746   0.0763  0.1728  0.1971  0.3244  0.2545   0.2785  0.3302
Celegans    0.1352   0.1495   0.1500  0.0900  0.1584  0.1648  0.1286   0.1709  0.1889
USAir       0.2985   0.3238   0.3196  0.2725  0.2848  0.2448  0.2594   0.2638  0.2642
Language    −0.0916  −0.0216  0.0197  0.0716  0.0308  0.0304  −0.0011  0.0384  0.0401
Biocegn     0.1136   0.1167   0.1033  0.0810  0.1374  0.2021  0.1449   0.1962  0.2666
Bioscgt     0.2918   0.3001   0.3078  0.1281  0.2712  0.3661  0.3782   0.3675  0.4306
Adolescent  0.0198   0.0166   0.0186  0.0093  0.1111  0.0611  0.0111   0.0061  0.0941


Fig. 3. GWNMF performance on unweighted versus weighted networks under four metrics: Precision, AUC, RMSE, and PCC.


Fig. 4. WRA versus RA performance on unweighted and weighted networks under four metrics: Precision, AUC, RMSE, and PCC.


Fig. 5. Varying α for GWNMF on the eight weighted networks. α ranges from 10^5 to 10^-5.


Fig. 6. Varying the β for GWNMF on eight weighted networks. β ranges from 0.1 to 5.5.


Fig. 7. Average Precision, AUC, RMSE, and PCC performance of the latent matrix factorization based methods on the eight weighted networks when varying K. K ranges from 10 to 100.


Fig. 8. Average Precision, AUC, RMSE, and PCC performance on the eight weighted networks when varying the number of iterations maxiter. maxiter ranges from 10 to 80.

If K is too large, it increases the algorithm's time complexity; if K is too small, it reduces the prediction accuracy. It is therefore important to choose the best K. The experimental results are reported in Fig. 7: performance gradually increases with K and remains stable once K = 70.

Impact of maxiter. We further analyze the convergence of the algorithm; the experimental results are reported in Fig. 8. From Fig. 8, we can see that our model converges quickly, within about 40 iterations.

5. Conclusion

In this paper, we have focused on the problem of link prediction in weighted networks and have proposed a novel algorithm based on graph regularization weighted nonnegative matrix factorization, which combines a weight similarity matrix with topological structure to capture link weights and local structure information. Specifically, our method builds on the NMF framework and elegantly combines link weights with topology information to perform weighted link prediction tasks. In addition, we provide multiplicative updating rules to learn the model parameters, together with theoretical and experimental evidence of the algorithm's convergence. Extensive experiments on eight real-world weighted networks demonstrate the superior performance and robustness of our method over state-of-the-art methods. In future research, we will consider extending the algorithm to dynamic, weighted, and directed networks.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61872429).

References

[1] L. Lü, T. Zhou, Link prediction in complex networks: a survey, Physica A 390 (6) (2011) 1150–1170.
[2] W. Peng, X.U. Baowen, W.U. Yurong, X.Y. Zhou, Link prediction in social networks: the state-of-the-art, Sci. China Inform. Sci. 58 (1) (2015) 1–38.
[3] G. Berlusconi, F. Calderoni, N. Parolini, M. Verani, C. Piccardi, Link prediction in criminal networks: a tool for criminal intelligence analysis, PLoS One 11 (4) (2016) e0154244.
[4] Y. Hou, L.B. Holder, Deep learning approach to link weight prediction, in: 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, pp. 1855–1862.
[5] J. Zhao, L. Miao, J. Yang, H. Fang, Q.-M. Zhang, M. Nie, P. Holme, T. Zhou, Prediction of links and weights in networks by reliable routes, Sci. Rep. 5 (2015) 12261.
[6] B. Zhu, Y. Xia, Link prediction in weighted networks: a weighted mutual information model, PLoS One 11 (2) (2016) e0148265.
[7] Z. Lin, X. Yun, Y. Zhu, Link prediction using benefit ranks in weighted networks, in: 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, volume 1, IEEE, 2012, pp. 423–430.
[8] T. Murata, S. Moriyasu, Link prediction of social networks based on weighted proximity measures, in: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, 2007, pp. 85–88.
[9] L. Lü, T. Zhou, Link prediction in weighted networks: the role of weak ties, Europhys. Lett. 89 (1) (2010) 18001.
[10] B. Liu, S. Xu, T. Li, J. Xiao, X.-K. Xu, Quantifying the effects of topology and weight for link prediction in weighted complex networks, Entropy 20 (5) (2018) 363.
[11] F. Ye, C. Chen, Z. Zheng, Deep autoencoder-like nonnegative matrix factorization for community detection, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, ACM, 2018, pp. 1393–1402.
[12] X. Luo, M. Zhou, Y. Xia, Q. Zhu, An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems, IEEE Trans. Ind. Inform. 10 (2) (2014) 1273–1284.
[13] L. Zhu, D. Guo, J. Yin, G. Ver Steeg, A. Galstyan, Scalable temporal latent space inference for link prediction in dynamic social networks, IEEE Trans. Knowl. Data Eng. 28 (10) (2016) 2765–2777.
[14] A.R. Nelakurthi, J. He, Finding cut from the same cloth: cross network link recommendation via joint matrix factorization, in: AAAI, 2017, pp. 1467–1473.
[15] W. Wang, F. Cai, P. Jiao, L. Pan, A perturbation-based framework for link prediction via non-negative matrix factorization, Sci. Rep. 6 (2016) 38938.
[16] C. Dai, L. Chen, B. Li, Y. Li, Link prediction in multi-relational networks based on relational similarity, Inf. Sci. 394–395 (2017) 198–216.
[17] X. Ma, P. Sun, G. Qin, Nonnegative matrix factorization algorithms for link prediction in temporal networks using graph communicability, Pattern Recognit. 71 (2017) 361–374.
[18] D.K. Wind, M. Mørup, Link prediction in weighted networks, in: 2012 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, 2012, pp. 1–6.
[19] D. Cai, X. He, J. Han, T.S. Huang, Graph regularized nonnegative matrix factorization for data representation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (8) (2011) 1548–1560.
[20] X. Li, G. Cui, Y. Dong, Graph regularized non-negative low-rank matrix factorization for image clustering, IEEE Trans. Cybern. 47 (11) (2017) 3840–3853.
[21] Y. Feng, J. Xiao, K. Zhou, Y. Zhuang, A locally weighted sparse graph regularized non-negative matrix factorization method, Neurocomputing 169 (2015) 68–76.
[22] S. Gao, L. Denoyer, P. Gallinari, Temporal link prediction by integrating content and structure information, in: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM, 2011, pp. 1169–1174.
[23] X. Ma, P. Sun, Y. Wang, Graph regularized nonnegative matrix factorization for temporal link prediction in dynamic networks, Physica A 496 (2018) 121–136.
[24] D. Wang, H. Lu, C. Bo, Visual tracking via weighted local cosine similarity, IEEE Trans. Cybern. 45 (9) (2015) 1838–1850.
[25] M.E. Newman, Clustering and preferential attachment in growing networks, Phys. Rev. E 64 (2) (2001) 025102.
[26] L.A. Adamic, E. Adar, Friends and neighbors on the web, Soc. Netw. 25 (3) (2003) 211–230.
[27] T. Zhou, L. Lü, Y.C. Zhang, Predicting missing links via local information, Eur. Phys. J. B 71 (4) (2009) 623–630.
[28] L. Lü, C.H. Jin, T. Zhou, Similarity index based on local paths for link prediction of complex networks, Phys. Rev. E 80 (2) (2009) 046122.
[29] L. Katz, A new status index derived from sociometric analysis, Psychometrika 18 (1) (1953) 39–43.
[30] A. Clauset, C. Moore, M.E.J. Newman, Hierarchical structure and the prediction of missing links in networks, Nature 453 (7191) (2008) 98.
[31] R. Guimerà, M. Sales-Pardo, Missing and spurious interactions and the reconstruction of complex networks, Proc. Natl. Acad. Sci. U.S.A. 106 (52) (2009) 22073–22078.
[32] L. Pan, T. Zhou, L. Lü, C.-K. Hu, Predicting missing links and identifying spurious links via likelihood analysis, Sci. Rep. 6 (2016) 22955.
[33] H.R. De Sá, R.B.C. Prudêncio, Supervised link prediction in weighted networks, in: The 2011 International Joint Conference on Neural Networks (IJCNN), IEEE, 2011, pp. 2281–2288.
[34] R. Pech, D. Hao, L. Pan, H. Cheng, T. Zhou, Link prediction via matrix completion, Europhys. Lett. 117 (3) (2017) 38002.
[35] B. Moradabadi, M.R. Meybodi, Link prediction in weighted social networks using learning automata, Eng. Appl. Artif. Intell. 70 (2018) 16–24.
[36] N. Sett, S.R. Singh, S. Nandi, Influence of edge weight on node proximity based link prediction methods: an empirical analysis, Neurocomputing 172 (2016) 71–83.
[37] Y. Kim, S. Choi, Weighted nonnegative matrix factorization, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), IEEE, 2009, pp. 1541–1544.
[38] Q. Gu, J. Zhou, C. Ding, Collaborative filtering: weighted nonnegative matrix factorization incorporating user and item graphs, in: Proceedings of the 2010 SIAM International Conference on Data Mining, SIAM, 2010, pp. 199–210.
[39] W. Shao, L. He, S.Y. Philip, Multiple incomplete views clustering via weighted nonnegative matrix factorization with l2,1 regularization, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2015, pp. 318–334.
[40] D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in: Advances in Neural Information Processing Systems, 2001, pp. 556–562.
[41] J.A. Hanley, B.J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology 143 (1) (1982) 29–36.
[42] J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Trans. Inform. Syst. 22 (1) (2004) 5–53.
[43] J. Kunegis, A. Lommatzsch, Learning spectral graph transformations for link prediction, in: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, 2009, pp. 561–568.
[44] J. Kunegis, KONECT network dataset, 2017, http://konect.uni-koblenz.de/.
[45] L.R. Varshney, B.L. Chen, E. Paniagua, D.H. Hall, D.B. Chklovskii, Structural properties of the Caenorhabditis elegans neuronal network, PLoS Comput. Biol. 7 (2) (2011) e1001066.
[46] V. Batagelj, A. Mrvar, Pajek datasets, http://vlado.fmf.uni-lj.si/pub/networks/data/.
[47] R.A. Rossi, N.K. Ahmed, Network repository, http://networkrepository.com.
[48] J. Leskovec, A. Krevl, SNAP Datasets: Stanford large network dataset collection, 2014, http://snap.stanford.edu/data.

Guangfu Chen received the M.S. degree in computer software and theory from Guilin University of Electronic Technology, Guilin, China, in 2011. He is currently pursuing the Ph.D. degree at Shenzhen University, Shenzhen, China. His current research interests include complex networks and link prediction.

Chen Xu received the B.Sc. and M.Sc. degrees from Xidian University in 1986 and 1989, respectively, and the Ph.D. degree from Xi'an Jiaotong University in 1992. He joined Shenzhen University, Shenzhen, China, in 1992 and is currently a Professor. From September 1999 to January 2000, he was a research fellow at Kansai University, Japan, and from August 2002 to August 2003 he was a research fellow at the University of Hawaii, USA. His research interests are image processing, intelligent computing, and wavelet analysis.

Jingyi Wang received the B.S. degree in Information Management and Information System from Northwest Normal University, and the M.S. degree in Applied Mathematics and the Ph.D. degree in Information and Communication Engineering from Shenzhen University, Guangdong, China, in 2009, 2012, and 2015, respectively. From December 2015 to February 2018, he was a Post-Doctoral Researcher at Shenzhen University. He is currently an Assistant Professor with the College of Mathematics and Statistics, Shenzhen University. His research interests include multi-agent systems and complex dynamical networks.

Jianwen Feng received the B.S. degree in Mathematics/Applied Mathematics from Hubei Normal University, Huangshi, China, in 1986, and the M.S. and Ph.D. degrees in Mathematics/Applied Mathematics from Wuhan University, Wuhan, China, in 1995 and 2001, respectively. From 1986 to 1998, he was a faculty member at Yunyang Normal College, Shiyan, China. Since 2001, he has worked in the College of Mathematics and Computational Science, Shenzhen University, where he is currently a Professor of Applied Mathematics. From 2009 to 2010, he was a Visiting Research Fellow and a Visiting Professor with the Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong. He has authored and co-authored more than 30 refereed international journal papers and has served as a reviewer for several international journals. His research interests include nonlinear systems, control theory and applications, complex networks, stability theory, and applied mathematics.

Jiqiang Feng received the B.Sc. degree from Yantai Normal College in 2005, and the M.Sc. and Ph.D. degrees from Shenzhen University in 2008 and 2011, respectively. He joined Shenzhen University, Shenzhen, China, in 2011 and is currently a Lecturer. His research interests are swarm optimization, fuzzy theory, and image processing.
