An evidential link prediction method and link predictability based on Shannon entropy


Physica A 482 (2017) 699–712

Contents lists available at ScienceDirect

Physica A journal homepage: www.elsevier.com/locate/physa

Likang Yin a,b, Haoyang Zheng a, Tian Bian a, Yong Deng a,c,∗

a School of Computer and Information Science, Southwest University, Chongqing, 400715, China
b School of Hanhong, Southwest University, Chongqing 400715, China
c Institute of Integrated Automation, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, 710049, China

Highlights

• A novel method for predicting missing links by fusing attribute and structural similarity is proposed.
• The proposed method is compared with nine other indices in both unweighted and weighted networks.
• A new method to measure link predictability based on Shannon entropy is proposed.

Article info

Article history: Received 5 December 2016; Received in revised form 18 April 2017; Available online 3 May 2017

Keywords: Complex networks; Link prediction; Dempster–Shafer theory; Belief function; Predictability

Abstract

Predicting missing links is of both theoretical value and practical interest in network science. In this paper, we empirically investigate a new link prediction method based on similarity and compare it with nine well-known local similarity measures on nine real networks. Most previous studies focus on accuracy; however, it is crucial to consider link predictability as an inherent property of the network itself. Hence, this paper proposes a new link prediction approach, called the evidential measure (EM), based on Dempster–Shafer theory. Moreover, this paper proposes a new method to measure link predictability via local information and Shannon entropy. © 2017 Elsevier B.V. All rights reserved.

1. Introduction

Complex networks are ubiquitous in both nature and society [1,2]. Link prediction is a field of great potential for both research and application. It is used to predict missing links, that is, links which will appear in the future or which already exist but have not been observed. Link prediction is an effective mathematical tool for handling the uncertainty and potential relationships between nonadjacent nodes in complex networks [3]. It is applied in a wide variety of areas related to, but not restricted to, network science, such as recommendation on social media [4,5], biological analysis [6], network reconstruction [7,8] and other active fields [9,10].

In the beginning, researchers used probabilistic models to predict missing links, such as the Markov chain model [11,12], statistical relational learning [13], the naive Bayes model [14], degree-related clustering [15] and other mathematical tools [16,17]. However, the attributes of nodes are easier to know than the local structure of the nodes. Therefore, some other studies focus on modeling the problem based on the network structure, such as the subgraph-based ranking model [18], the local information model [19,20], link direction [21] and other research fields [22–24]. Moreover, some studies consider both structural features and the attributes of nodes to build local conditional probability models to predict missing links [25].

Recently, most studies focus on the similarity of nodes [26]. The idea is easy to understand: if we are good friends, there must be some interests that we share, and another friend of mine has a better chance of becoming friends with you than a stranger does [27,28]. In online social networks, we find an interesting phenomenon: people tend to hide their true interests. One reason for this phenomenon is that people want to stay in their comfort zone and try to maintain their relationships. Based on that, we consider that the similarity between nodes may be either obvious (e.g. the nodes have the same neighbors) or hidden (e.g. the nodes have the same attributes). This paper proposes a new method based on Dempster–Shafer theory [29,30] to combine the two kinds of similarity and presents a general formula to calculate the similarity degree between nodes.

Moreover, previous studies [31,32] consider unpredictability to be a property of the network as well and proposed a measure of network predictability based on structural consistency. In this paper, we consider that not only the whole network but also every single node has its own unpredictability. To address this issue, we transform each node in the network into a belief function based on Dempster–Shafer theory.

The paper is organized as follows. Related work is briefly presented in Section 2. Section 3 introduces the benchmark methods. The proposed Evidential Measure (EM) is presented in Section 4, together with some numerical examples that illustrate its procedure. The experimental results are shown in Section 5. The predictability of nodes based on Shannon entropy is discussed in Section 6. Finally, the conclusion is presented in Section 7.

∗ Corresponding author at: School of Computer and Information Science, Southwest University, Chongqing, 400715, China. Fax: +86 023 68254555. E-mail addresses: [email protected], [email protected] (Y. Deng).

http://dx.doi.org/10.1016/j.physa.2017.04.106
0378-4371/© 2017 Elsevier B.V. All rights reserved.

2. Related work

With the development of big data and data mining, link prediction has become an effective analysis tool in the physics and computer science communities [33–35]. The common framework of link prediction methods is the similarity-based algorithm [36,37]. The problem can be simplified to measuring the probability that any two individuals become friends in a social network [38]. To solve this problem, the first idea that comes to mind is to measure the similarity or connection between the two individuals, since people who are alike often become friends [31]. Moreover, a similarity index can be extremely simple and easy to understand. For example, the Common Neighbors index (CN) considers that the more common neighbors two nodes share, the more similar they are. To improve the accuracy of prediction, the Adamic–Adar (AA) [39] and Resource Allocation (RA) [19] indices have been proposed; both refine the simple counting of common neighbors by assigning more weight to the less connected neighbors. The main difference between AA and RA is that the RA index punishes high-degree common neighbors more heavily than AA does.

Meanwhile, people tend to hide their true interests in online social networks [40]. To better account for this phenomenon, it is necessary to investigate similarity from other aspects. In this paper, we call this kind of similarity structural similarity, since it is based solely on the network structure. Structural similarity indices can be classified in various ways, such as by the use of local information, regular equivalence and structural equivalence [41]. Among similarity-based prediction methods, some studies focus on attributive similarity while others focus on structural similarity [38]. However, one link prediction method may work very well for some networks while failing for others.
One possible reason for this phenomenon is that networks differ greatly in both attributive and structural aspects. To investigate the predictability of each network, Lü proposed a method to measure the predictability of networks via structural perturbation [32]. Qi et al. proposed a novel method for predicting missing links by biased cross-network sampling [42]. Moreover, Shannon entropy [43] is widely used to measure uncertainty in information science. Furthermore, Xu et al. proposed a link prediction method based on path entropy [44], and the entropy-based method was also applied to weighted networks [45]. Shang et al. proposed a novel method to model evolving networks based on their past structure [46].

To sum up, most existing link prediction methods consider either attributive similarity or structural similarity only. Building on previous studies, this paper proposes a new method that fuses the two. Moreover, link predictability is becoming more and more important for predicting missing links, yet most existing methods do not consider it. This paper proposes a method to measure the predictability of nodes based on Shannon entropy and Dempster–Shafer evidence theory.

2.1. Introduction to Dempster–Shafer evidence theory

The theory of evidence is an efficient tool for handling uncertain information from multiple information sources [29,30]. Dempster–Shafer theory is often applied to uncertain decision making [47–49], fuzzy information processing [50], Z-numbers modeling [51,52], D numbers modeling [53–55], information fusion [56–58] and other active fields [59]. To explain the method, some basic concepts are introduced as follows. The frame of discernment (FOD) is used to represent the set of all observed events. Let φ be the set of mutually exclusive and collectively exhaustive events Ei, namely

$$\varphi = \{E_1, E_2, \ldots, E_i, \ldots, E_n\}. \tag{1}$$

The power set of φ is denoted by $2^{\varphi}$, and

$$2^{\varphi} = \{\emptyset, \{E_1\}, \ldots, \{E_n\}, \{E_1, E_2\}, \ldots, \varphi\} \tag{2}$$

where ∅ denotes the empty set. For a FOD φ = {E1, E2, ..., En}, a mass function is a mapping m from $2^{\varphi}$ to [0, 1] that assigns a probability mass to every event, formally defined as

$$m : 2^{\varphi} \to [0, 1] \tag{3}$$

which needs to satisfy the following properties:

$$m(\emptyset) = 0 \quad \text{and} \quad \sum_{\theta \in 2^{\varphi}} m(\theta) = 1, \quad 0 \le m(\theta) \le 1. \tag{4}$$

Dempster–Shafer theory has many merits in uncertainty modeling [60] due to its high efficiency and accuracy. In the theory of evidence, the belief function (Bel) and the plausibility function (Pl) are defined as

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B) \tag{5}$$

$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B). \tag{6}$$
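The definitions in Eqs. (3)–(6) can be checked numerically. The following is a minimal Python sketch, not part of the original paper: the helper names (`is_valid_mass`, `belief`, `plausibility`) and the three-event frame with its mass values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def powerset(frame):
    """All subsets of the frame of discernment, as frozensets."""
    items = list(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_valid_mass(m, frame, tol=1e-9):
    """Check Eq. (4): m(empty) = 0, 0 <= m <= 1, and the masses sum to 1."""
    subsets = set(powerset(frame))
    if any(a not in subsets for a in m):
        return False
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    if any(v < 0.0 or v > 1.0 for v in m.values()):
        return False
    return abs(sum(m.values()) - 1.0) <= tol


def belief(m, a):
    """Bel(A): total mass of the focal elements contained in A, Eq. (5)."""
    a = frozenset(a)
    return sum(v for b, v in m.items() if b and b <= a)


def plausibility(m, a):
    """Pl(A): total mass of the focal elements that intersect A, Eq. (6)."""
    a = frozenset(a)
    return sum(v for b, v in m.items() if b & a)


# Hypothetical example: three events and an illustrative mass assignment.
frame = {"E1", "E2", "E3"}
m = {
    frozenset({"E1"}): 0.5,
    frozenset({"E1", "E2"}): 0.3,
    frozenset(frame): 0.2,
}
A = {"E1", "E2"}
bel_A = belief(m, A)        # masses of {E1} and {E1, E2}
pl_A = plausibility(m, A)   # every focal element intersects A
```

For A = {E1, E2} this assignment gives Bel(A) = 0.8 and Pl(A) = 1.0, so the belief interval has length 0.2.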

The belief function Bel(A) represents the justified specific support for the focal element (or proposition) A, while the plausibility function Pl(A) represents the potential specific support for A. The length of the belief interval [Bel(A), Pl(A)] represents the degree of imprecision for A.

2.2. Data pretreatment

The real world is very complex, with many factors affecting each other [61–63]. Different complex networks are used to model different complicated systems [64,65]. In this paper, we choose nine representative networks from real network data:
(i) C.elegans: the neural network of the nematode worm C. elegans [66];
(ii) NS: a coauthorship network of scientists working on network theory and experiment [67];
(iii) USAir: the US air transportation system [38];
(iv) Jazz: a collaboration network of jazz musicians [68];
(v) Power: an electrical power grid of the western US [66];
(vi) Metabolic: a metabolic network of C. elegans [69,70];
(vii) Yeast: a protein–protein interaction network in budding yeast [71];
(viii) Router: a symmetrized snapshot of the structure of the Internet at the level of autonomous systems [72];
(ix) PB: a network of US political blogs [73].
The basic topological features of the nine real networks are shown in Table 1.

Some weighted networks are also used to measure the accuracy of the proposed method [74]:
(i) Adolescent health [75] is a directed network created from a survey conducted in 1994–1995. The nodes represent adolescents, and a link from node vi to node vj means that adolescent vi chose adolescent vj as a friend. A higher link weight indicates more interactions between the two adolescents.
(ii) King James [76] is an undirected network which contains names and their occurrences in the King James Bible.
(iii) USA airports [76] is a directed network of flights between USA airports in 2010. Each link represents an airline connection from one airport to another, and the weight of a link is the number of flights on that connection in the given direction.

To test the accuracy of the proposed EM model, the data set S is divided into two parts: the training set $S^T$ and the probe set $S^P$. Obviously,

$$S = S^P \cup S^T \quad \text{and} \quad S^P \cap S^T = \emptyset. \tag{7}$$

The training set contains 90% of the data and the probe set contains the remaining 10%. The data are divided randomly while maintaining the connectivity of the whole network. Moreover, only the information in $S^T$ may be used to compute the score $S_{xy}$, while the probe set $S^P$ is used for testing and no information therein may be used for prediction [77].

2.3. Evaluation metrics

In this paper, we use a general measure called the Area Under the receiver operating characteristic Curve (AUC) to evaluate the accuracy of the proposed method and compare the result with a range of existing prediction methods. In Signal Detection Theory (SDT), the Receiver Operating Characteristic (ROC) is often used to evaluate the effectiveness of classification algorithms. Likewise, we use the AUC to measure the accuracy of link prediction [38]. Namely,

$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n} \tag{8}$$

where n is the number of independent comparisons, n′ is the number of times the missing link has a higher score, and n′′ is the number of times the missing link and the nonexistent link have the same score.
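The evaluation protocol described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: `split_links` and `auc` are hypothetical helper names, the split does not enforce the connectivity constraint mentioned in the text, and the AUC is estimated from n random comparisons as in Eq. (8).

```python
import random


def split_links(links, probe_fraction=0.1, seed=0):
    """Randomly split links into a training set and a probe set.

    Simplification: unlike the protocol in the text, the connectivity of
    the training network is not enforced here.
    """
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_probe = max(1, round(probe_fraction * len(shuffled)))
    return shuffled[n_probe:], shuffled[:n_probe]


def auc(score, probe_links, nonexistent_links, n=10000, seed=0):
    """Eq. (8): AUC = (n' + 0.5 n'') / n over n random comparisons.

    score(x, y) must be computed from the training set only.
    """
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n):
        s_missing = score(*rng.choice(probe_links))
        s_absent = score(*rng.choice(nonexistent_links))
        if s_missing > s_absent:
            higher += 1      # contributes to n'
        elif s_missing == s_absent:
            ties += 1        # contributes to n''
    return (higher + 0.5 * ties) / n


# Hypothetical toy data: a path with 20 links.
links = [(i, i + 1) for i in range(20)]
train, probe = split_links(links)
```

A score function that cannot tell missing links from nonexistent ones yields an AUC of 0.5 in this sketch, which matches the interpretation of 0.5 as pure chance.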


Table 1
The basic topological features of the nine real networks. N is the total number of nodes and M is the total number of links. C and A denote the clustering coefficient and the assortativity coefficient, respectively. L is the number of self-loops in the network. H denotes the degree heterogeneity, defined as $H = \langle k^2 \rangle / \langle k \rangle^2$, where ⟨k⟩ is the average degree of the network [78].

| Network | N | M | C | A | H | L |
|---|---|---|---|---|---|---|
| C.elegans | 297 | 2345 | 0.308 | −0.163 | 1.801 | 0 |
| NS | 1589 | 2742 | 0.791 | 0.462 | 2.011 | 0 |
| USAir | 332 | 2126 | 0.749 | −0.208 | 3.464 | 0 |
| Jazz | 198 | 5484 | 0.633 | 0.020 | 1.395 | 0 |
| Power | 4941 | 6594 | 0.107 | 0.003 | 1.450 | 0 |
| Metabolic | 453 | 4596 | 0.655 | −0.226 | 4.485 | 22 |
| Yeast | 2361 | 7182 | 0.368 | 0.444 | 3.486 | 536 |
| Router | 5022 | 6258 | 0.033 | −0.138 | 5.503 | 0 |
| PB | 1222 | 19021 | 0.360 | −0.221 | 2.971 | 0 |
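The quantities N, M and H reported in Table 1 follow directly from an edge list. The sketch below is a hypothetical helper, not the authors' code; it assumes an undirected edge list without duplicate links and skips self-loops when counting degrees, which may differ from how the table was produced.

```python
from collections import Counter


def basic_features(edges):
    """Return N, M and H = <k^2> / <k>^2 for an undirected edge list.

    Assumptions: edges are unordered pairs with no duplicates, self-loops
    are skipped, and nodes without any link are not counted.
    """
    degree = Counter()
    m = 0
    for u, v in edges:
        if u == v:
            continue
        degree[u] += 1
        degree[v] += 1
        m += 1
    n = len(degree)
    mean_k = sum(degree.values()) / n
    mean_k2 = sum(k * k for k in degree.values()) / n
    return n, m, mean_k2 / (mean_k ** 2)


# Hypothetical toy network: a star with three leaves.
n_nodes, n_links, heterogeneity = basic_features([(0, 1), (0, 2), (0, 3)])
```

For the star, the degrees are 3, 1, 1, 1, so ⟨k⟩ = 1.5 and ⟨k²⟩ = 3, giving H = 4/3; a regular network, where all degrees are equal, gives H = 1.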

This measure can be interpreted as the probability that a randomly chosen missing link (e.g. a link in $S^P$) is given a higher score than a randomly chosen nonexistent link. In general, if a prediction model performs better than choosing links at random, the AUC is larger than 0.5, and vice versa. The degree to which the AUC exceeds 0.5 indicates how much better the algorithm performs than pure chance.

3. Benchmarks

3.1. Common Neighbors (CN)

For a node x, let Γ(x) denote the set of neighbors of x. Intuitively, two nodes x and y are more likely to have a link if they have many common neighbors. Namely,

$$S_{xy} = |\Gamma(x) \cap \Gamma(y)|. \tag{9}$$

3.2. Adamic–Adar Index (AA)

This index refines the simple counting of common neighbors by giving the less connected neighbors more weight [39]. With k(z) denoting the degree of node z,

$$S_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log(k(z))}. \tag{10}$$

3.3. Resource Allocation (RA)

The RA index [19] is similar to the AA index. The only difference is the size of the punishment applied to large-degree nodes (i.e., 1/k(z) and 1/log k(z)). Moreover, it should be noted that the difference between RA and AA is insignificant when the average degree is small, and it is great otherwise. The definition of RA is

$$S_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k(z)}. \tag{11}$$

3.4. Salton index

This index was proposed by Salton [79] and is defined as

$$S_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\sqrt{k(x) \times k(y)}} \tag{12}$$

where k(x) = |Γ(x)| denotes the degree of x. The Salton index is also called cosine similarity in the literature.

3.5. Jaccard index

This index was proposed by Jaccard [80] and is defined as

$$S_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}. \tag{13}$$


3.6. Sørensen index

This index is mainly used for ecological community data [81] and is defined as

$$S_{xy} = \frac{2 \times |\Gamma(x) \cap \Gamma(y)|}{k(x) + k(y)}. \tag{14}$$

3.7. Hub Promoted Index (HPI)

This index was proposed to quantify the topological overlap of pairs of substrates in metabolic networks [82] and is defined as

$$S_{xy} = \frac{2 \times |\Gamma(x) \cap \Gamma(y)|}{\min(k(x), k(y))}. \tag{15}$$

3.8. Leicht–Holme–Newman Index (LHN)

This index assigns high similarity to node pairs that have many common neighbors compared not to the possible maximum, but to the expected number of such neighbors [83]. It can be defined as

$$S_{xy} = \frac{2 \times |\Gamma(x) \cap \Gamma(y)|}{k(x) \times k(y)}. \tag{16}$$

3.9. Preferential Attachment Index (PAI)

The mechanism of preferential attachment [15] can be used to generate evolving scale-free networks (i.e., networks with power-law degree distributions), where the probability that a new link is connected to node x is proportional to k(x). A similar mechanism can also lead to scale-free networks without growth, where at each time step an old link is removed and a new link is generated. The probability that this new link connects x and y is proportional to k(x) × k(y) [84]. Motivated by this mechanism, a corresponding similarity index can be defined as

$$S_{xy} = k(x) \times k(y). \tag{17}$$
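The nine benchmark indices of Eqs. (9)–(17) share the same ingredients (common neighbors, degrees), so they can be computed together. The following Python sketch is illustrative only: `local_indices` is a hypothetical helper, the toy graph is made up, the natural logarithm is assumed for AA because the text does not state a base, and HPI and LHN keep the factor 2 exactly as written above.

```python
import math


def local_indices(adj, x, y):
    """Scores of the nine local indices, Eqs. (9)-(17), for the pair (x, y).

    adj maps each node to the set of its neighbours (undirected, unweighted).
    """
    common = adj[x] & adj[y]
    union = adj[x] | adj[y]
    kx, ky = len(adj[x]), len(adj[y])
    cn = len(common)
    return {
        "CN": cn,
        # A common neighbour z has k(z) >= 2, so log k(z) > 0.
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common),
        "RA": sum(1.0 / len(adj[z]) for z in common),
        "Salton": cn / math.sqrt(kx * ky) if kx and ky else 0.0,
        "Jaccard": cn / len(union) if union else 0.0,
        "Sorensen": 2 * cn / (kx + ky) if kx + ky else 0.0,
        "HPI": 2 * cn / min(kx, ky) if min(kx, ky) else 0.0,
        "LHN": 2 * cn / (kx * ky) if kx and ky else 0.0,
        "PAI": kx * ky,
    }


# Hypothetical toy graph with links 1-3, 1-4, 2-3, 2-4 and 4-5.
edges = [(1, 3), (1, 4), (2, 3), (2, 4), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = local_indices(adj, 1, 2)
```

In the toy graph, nodes 1 and 2 share the neighbors 3 (degree 2) and 4 (degree 3), so RA gives 1/2 + 1/3 while CN simply counts 2; this is the degree penalty that distinguishes AA and RA from CN.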

3.10. Measurement index in weighted networks

Most natural networks are weighted by some built-in attribute, such as the communication frequency between two friends in social networks, the carbon flow between species in food webs or the traffic load along connections in transportation networks. Here, we use three common link prediction methods for weighted networks, named WCN, WAA and WRA, as the benchmarks [85]. In fact, WCN, WAA and WRA are the extensions of the similarity indices CN, AA and RA to weighted networks, respectively. In addition, weighted algorithms with a free parameter α significantly improve the performance of previous weighted methods. Namely,

$$S_{xy}^{WCN} = \sum_{z \in O_{xy}} \left[ w(x, z)^{\alpha} + w(z, y)^{\alpha} \right], \tag{18}$$

$$S_{xy}^{WAA} = \sum_{z \in O_{xy}} \frac{w(x, z)^{\alpha} + w(z, y)^{\alpha}}{\log(1 + s(z))}, \tag{19}$$

$$S_{xy}^{WRA} = \sum_{z \in O_{xy}} \frac{w(x, z)^{\alpha} + w(z, y)^{\alpha}}{s(z)} \tag{20}$$

where $O_{xy}$ is the set of common neighbors of the node pair (x, y), w(x, z) is the weight of the link between x and z, and $s(x) = \sum_{z \in \Gamma(x)} w(x, z)^{\alpha}$. When α = 0, s(x) is the degree of node x and the indices degenerate to the unweighted cases. When α = 1, the indices are equivalent to the simply weighted indices. Generally, the optimal values of α are smaller than 1 in most weighted networks [86,87].

4. Proposed method

In this section, we show how to build our method. Consider an undirected network G(V, E), where V denotes the set of nodes and E denotes the set of links. This paper uses $s_{xy}$ to denote the similarity between node x and node y; the higher $s_{xy}$, the higher the similarity. Recall that there are two kinds of similarity between two nodes. On the one hand, the larger the number of shared neighbors, the higher the similarity of two nodes. However, this does not always match our intuition, since people who follow the same popular user (i.e. one with a large degree) are not necessarily similar. If, instead, people follow the same person with a low degree, maybe they share the same unique hobby.

On the other hand, we need to consider the attributes of the nodes (e.g. the degree of the nodes). For example, suppose one person is an avid reader who reads all kinds of books, including science fiction, while another person is a science fiction fan who reads only science fiction. Both of them may have read The Three-Body Problem, but intuitively they are less similar than two science fiction fans (see Figs. 1 and 2). Therefore, this paper considers both local structural similarity and node attribute similarity to measure the similarity between two nodes. The proposed method is based on D–S evidence theory and is called the Evidential Measure (EM).

Fig. 1. The flow of computing the similarity between two nodes.

Fig. 2. For any pair of nodes in the network, we color the two nodes red. The green nodes denote the common neighbors they share, and the remaining nodes in the network, shown in blue, are non-shared nodes. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Assume that every node behaves like a real person and that all adjacent nodes are its friends. We may ask: who is its most trusted friend? Intuitively, the friend who communicates with you most frequently is probably your most trusted friend. In this paper, we use the link weight to denote the level of trust. Based on the local information of each node, we can build the basic probability assignment (BPA) to denote the level of trust. Namely,

$$m(v_k) = \frac{w_k}{\sum_{v_i \in \Theta} w_i} \quad \text{and} \quad \sum_{k}^{n} m(v_k) = 1 \tag{21}$$

where Θ denotes the frame of discernment (FOD), $v_i$ is a focal element of this BPA, and $w_i$ is the weight of the link to node $v_i$. After the above procedure, each node becomes a mass function that assigns its own belief to each of its adjacent nodes. Now we can calculate the similarity between two indirectly connected nodes:

$$S_{i,j}^{EM} = \sum_{z \in \Gamma(i) \cap \Gamma(j)} \frac{\phi_{i,j}}{\varphi_z} \tag{22}$$

where φ represents the similarity caused by structure and ϕ represents the similarity caused by attributes. $\varphi_z$ denotes the size of the FOD of a common neighbor z shared by node i and node j, Γ(i) denotes the set of neighbors of i, and ϕ is defined as

$$\phi_{i,j} = F\left( \frac{N^2}{D_i \times D_j} \right) \tag{23}$$

where F is a threshold function, N is the number of shared neighbors, and $D_i$ and $D_j$ are the degrees of node i and node j, respectively. In this paper, we choose the sigmoid function s(x) as the fusion function F:

$$s(x) = \frac{1}{1 + e^{-a(x - c)}}. \tag{24}$$

Algorithm 1 The algorithm of evidential measurement.
Input: The adjacency matrix of the training set T;
Output: The similarity matrix S of each pair of nodes;
Calculate the degree D of each node;
Initialize the similarity matrix S to the null matrix;
for node i from i = 1 to i = n do
    for node j from j = 1 to j = n do
        % The procedure of calculating the structural similarity φ.
        Calculate the number of shared nodes N_ij between node i and j;
        Assign the reciprocal logarithm of N_ij to φ;
        % The procedure of calculating the attribute similarity ϕ.
        Get the degrees D_i and D_j of the two selected nodes i and j;
        Choose an appropriate F as the fusion function;
        Assign F(x) with the parameter N_ij^2 / (D_i × D_j) to ϕ;
        S_ij ← S_ij + (ϕ / φ);
        j ← j + 1
    end for
    i ← i + 1;
end for

As shown in the pseudo-code, EM considers both node attributes and network structure, so the computational cost is theoretically high. However, this problem can be overcome by using matrix operations instead of for-loops to reduce the actual running time.

Theorem. Given an undirected network G(V, E), the evidential measure is symmetric, namely

$$s_{i,j} = s_{j,i}. \tag{25}$$
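Eqs. (22)–(24) translate into a short function. The sketch below is one possible reading, not the authors' implementation: it assumes an unweighted network in which $\varphi_z$, the FOD size of a common neighbor z, equals the degree of z, and it uses a = 3 and c = 0.5 for the sigmoid. The helper names and the toy graph are hypothetical; the graph is built so that nodes 1, 2 and 3 have degrees 10, 4 and 1, node 4 is adjacent to all three, and nodes 1 and 2 share one more neighbor of degree 2.

```python
import math


def sigmoid(x, a=3.0, c=0.5):
    """Fusion function of Eq. (24)."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))


def em_similarity(adj, i, j, a=3.0, c=0.5):
    """Evidential measure of Eq. (22) for an unweighted, undirected graph.

    Assumption: the FOD size of a common neighbour z is its degree.
    """
    common = adj[i] & adj[j]
    if not common:
        return 0.0
    n = len(common)
    # Attribute similarity, Eq. (23).
    attribute = sigmoid(n * n / (len(adj[i]) * len(adj[j])), a, c)
    # Structural part: each common neighbour contributes 1 / (its FOD size).
    return sum(attribute / len(adj[z]) for z in common)


def build_adjacency(edges):
    """Neighbour sets of an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Hypothetical toy graph (see the description above).
edges = [(1, 4), (2, 4), (3, 4), (1, 5), (2, 5)]
edges += [(1, leaf) for leaf in range(10, 18)]   # 8 extra neighbours of node 1
edges += [(2, leaf) for leaf in range(20, 22)]   # 2 extra neighbours of node 2
adj = build_adjacency(edges)

s12 = em_similarity(adj, 1, 2)
s13 = em_similarity(adj, 1, 3)
s23 = em_similarity(adj, 2, 3)
```

Under these assumptions the three scores round to 0.1929, 0.0772 and 0.1069, the values reported in the numerical example that follows, and swapping the two arguments leaves the score unchanged, in line with Eq. (25).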

Here is a simple example to illustrate the idea of the proposed method. Consider a network in which we select three nodes, named node 1, node 2 and node 3. Note that their degrees are different and that the three nodes share the same adjacent node, node 4. $S_{i,j}$ denotes the similarity between node i and node j.

$$S_{1,2} = f\left( \frac{2^2}{10 \times 4} \right) \times \left( \frac{1}{3} + \frac{1}{2} \right) = 0.1929$$

$$S_{1,3} = f\left( \frac{1^2}{10 \times 1} \right) \times \frac{1}{3} = 0.0772$$

$$S_{2,3} = f\left( \frac{1^2}{4 \times 1} \right) \times \frac{1}{3} = 0.1069$$

where the threshold function f denotes the sigmoid function with parameters a = 3 and c = 0.5. $S_{1,3}$ is lower than $S_{2,3}$, although node 1 and node 2 share the same neighbor with node 3. The reason is that the evidential measure punishes large-degree nodes in both attribute similarity and local structural similarity. In the next section, we show the accuracy of the proposed method in both unweighted and weighted networks (see Fig. 3).

Fig. 3. Study case.

5. Results

The comparison of the EM index with nine other indices on nine networks is summarized in Table 2. The EM index generally outperforms the other nine indices in link prediction according to the AUC and precision measures. The highest accuracy in each row is emphasized in bold. For the other nine measures, from CN to PAI, if the degrees of nodes are evenly distributed in a network, it is difficult to distinguish the accuracy of the measures (e.g. the accuracy of all measures is either pretty low or pretty high). However, the difference between two measures becomes significant when a network has a very low clustering coefficient; based on our results, one reason for this phenomenon is that the number of neighbors matters a great deal. Therefore, the USAir and political blogs networks are typical networks for measuring the accuracy of a prediction model.

Table 2
The accuracy of EM compared with nine other measures under AUC and Precision (Precision values are in brackets). The AUC and Precision values are means over 100 independent realizations. The entries corresponding to the highest value among these measures are emphasized in bold.

| Network | CN | Salton | Jaccard | Sørensen | HPI | LHN | AA | RA | PAI | EM |
|---|---|---|---|---|---|---|---|---|---|---|
| C.elegans | 0.8478 (0.1211) | 0.7979 (0.0134) | 0.7904 (0.0126) | 0.7911 (0.0126) | 0.8058 (0.0111) | 0.7234 (0.0106) | 0.8663 (0.1358) | 0.8701 (0.1266) | 0.7613 (0.0688) | 0.8764 (0.1267) |
| Jazz | 0.9566 (0.8140) | 0.9663 (0.7658) | 0.9625 (0.7544) | 0.9627 (0.7544) | 0.9478 (0.0126) | 0.9036 (0.060) | 0.9639 (0.8404) | 0.9723 (0.8226) | 0.7707 (0.1876) | 0.9745 (0.8533) |
| Metabolic | 0.9241 (0.1786) | 0.8158 (0.0834) | 0.7764 (0.0642) | 0.7765 (0.0642) | 0.9161 (0.1633) | 0.7398 (0.0440) | 0.9556 (0.2494) | 0.9606 (0.3067) | 0.8232 (0.1514) | 0.9615 (0.3054) |
| NS | 0.9911 (0.8707) | 0.9914 (0.4922) | 0.9912 (0.0842) | 0.9914 (0.0842) | 0.9912 (0.0126) | 0.9908 (0.2376) | 0.9914 (0.9722) | 0.9915 (0.9694) | 0.7298 (0.0031) | 0.9940 (0.9620) |
| PB | 0.9223 (0.3988) | 0.8786 (0.0021) | 0.8759 (0.0021) | 0.8757 (0.0003) | 0.8538 (0.1633) | 0.7617 (0.0002) | 0.9262 (0.3718) | 0.9265 (0.2733) | 0.9093 (0.1167) | 0.9275 (0.3133) |
| Power | 0.6269 (0.1064) | 0.6272 (0.0267) | 0.6267 (0.0400) | 0.6264 (0.0400) | 0.6267 (0.0033) | 0.6266 (0.0267) | 0.6268 (0.1033) | 0.6269 (0.0741) | 0.5788 (0.0433) | 0.6209 (0.1067) |
| Router | 0.6522 (0.0910) | 0.6514 (0.0004) | 0.6518 (0.0021) | 0.6522 (0.0003) | 0.6512 (0.0005) | 0.6511 (0.1633) | 0.6526 (0.1052) | 0.6530 (0.0846) | 0.9550 (0.0196) | 0.6606 (0.0620) |
| USAir | 0.9532 (0.5820) | 0.9249 (0.0004) | 0.9144 (0.0102) | 0.9142 (0.0101) | 0.8813 (0.0037) | 0.7789 (0.0067) | 0.9655 (0.6232) | 0.9721 (0.6348) | 0.9087 (0.4782) | 0.9782 (0.6033) |
| Yeast | 0.7348 (0.1746) | 0.7323 (0.0044) | 0.7328 (0.0033) | 0.7329 (0.0033) | 0.7329 (0.0037) | 0.7318 (0.0012) | 0.7359 (0.2244) | 0.7358 (0.1826) | 0.8872 (0.0333) | 0.7440 (0.2330) |

The results of the weighted link prediction methods are shown in Table 3. Moreover, the relative similarity of nodes under the RA and EM indices is illustrated in Fig. 4. From Table 2, we can see that the RA and EM indices are well matched in unweighted networks. However, the situation is different in weighted networks, as shown in Table 3. Recall that the weighted indices (WCN, WAA and WRA) are affected by the free parameter α according to Eqs. (18)–(20); thus we use the optimal value of α for each index to indicate its accuracy. The numerical results are given in Fig. 5. We can see that weak links actually play a more important role than strong links, and most of the indices reach their optimal value in the interval [−1.5, 1.5].

Table 3
The accuracy of EM compared with the three weighted measures at their optimal parameter α, under AUC and precision. The AUC and precision values are means over 100 independent realizations. The entries corresponding to the highest value among these measures are emphasized in bold.

| Network | WCN AUC | WCN Precision | WAA AUC | WAA Precision | WRA AUC | WRA Precision | EM AUC | EM Precision |
|---|---|---|---|---|---|---|---|---|
| Adolescent | 0.7734 | 0.2825 | 0.7724 | 0.2995 | 0.7738 | 0.3365 | 0.7814 | 0.2700 |
| King James | 0.9854 | 0.4570 | 0.9851 | 0.5426 | 0.9857 | 0.7040 | 0.9862 | 0.9200 |
| US AirPorts | 0.9677 | 0.9625 | 0.9654 | 0.9635 | 0.9673 | 0.9415 | 0.9778 | 0.8433 |
| NS | 0.9933 | 0.9559 | 0.9942 | 0.9163 | 0.9935 | 0.9828 | 0.9940 | 0.9620 |
| USAir | 0.9774 | 0.6375 | 0.9685 | 0.6465 | 0.9733 | 0.6680 | 0.9782 | 0.6033 |

Fig. 4. Relative similarity of nodes under the RA index and the EM index. To eliminate the effect of differences in absolute value, the x-axis and y-axis show the top percentage ranking values of the EM index and the RA index, respectively. As we can see, some ranking dots of EM are lower than those of the RA index (in the upper left side); one possible reason is that EM punishes large-degree neighbors more than the RA index does.

Theoretically, the proposed EM considers both the degrees of the nodes themselves and the degrees of their shared neighbors, and it works well for a range of real networks, especially weighted networks. Moreover, the sigmoid function is used for fusing the degrees of the selected nodes (the sigmoid function maps the real numbers to the interval [0, 1]), and we use it to calculate the similarity between nodes. Because the EM index is multiplied by a sigmoid function of the two nodes' degrees, and this value lies between 0.18 and 0.81, the result of the sigmoid function converges to 0.18 as the node degrees increase (if the degrees of both nodes are more than 9). This means that the impact of the sigmoid function on the similarity decreases as the node degrees increase. To get better performance, the parameters of the sigmoid function need to be tuned for each network and for the relationship between its large-degree nodes. The reason is twofold. First, if the degrees in the network are evenly distributed, it becomes harder to distinguish similarities if the parameter c is set too low or too high, since all the nodes are clustered together. Second, the parameter a behaves like a line of demarcation: if the similarity score is close to the parameter a, it is difficult to balance the similarity degree after the fusion process. One possible solution to this bottleneck is to use machine learning to optimize these parameters, at the cost of higher computational complexity.
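The range of the sigmoid factor discussed above can be checked numerically. The sketch below is illustrative and assumes the sigmoid of Eq. (24) with a = 3 and c = 0.5 and the argument N²/(D_i × D_j) of Eq. (23), which lies between 0 and 1; `attribute_similarity` is a hypothetical helper name.

```python
import math


def sigmoid(x, a=3.0, c=0.5):
    """Fusion function of Eq. (24) with the parameters used in the example."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))


def attribute_similarity(n_common, deg_i, deg_j, a=3.0, c=0.5):
    """Eq. (23): sigmoid of N^2 / (D_i * D_j)."""
    return sigmoid(n_common ** 2 / (deg_i * deg_j), a, c)


lower = sigmoid(0.0)   # limit for very large degrees, about 0.1824
upper = sigmoid(1.0)   # value when N = D_i = D_j, about 0.8176

# One shared neighbour and increasingly large degrees: the factor
# decreases towards the lower bound.
trend = [attribute_similarity(1, k, k) for k in (1, 3, 10, 30)]
```

With a single shared neighbor, the factor is already within about 0.005 of the lower bound once both degrees reach 10, which illustrates why the sigmoid term contributes little to the ranking among large-degree node pairs.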

6. View From Shannon entropy Recall that we transfer each node in the network into a mass function by assigning belief to each adjacent node and each belief of adjacent node sums up to 1. It is consistent with Bayes structure in probability theory since the size of FOD equals to the number of adjacent nodes. Moreover, the belief degree is corresponding to the weight of each node (i.e., the link with higher weight means higher belief degree in a mass function), for the unweighted network, the belief is evenly distributed to each adjacent node. In our real life, if one man gets into trouble and he has many friends to ask for help (i.e., he is a social butterfly with a big degree in the social network). Therefore, it is difficult to predict which friend he would choose for help. For the worst case, each person is an ordinary friend to him (i.e., all links have the same weight). However, the same case for another


Fig. 5. Precision as a function of α for Adolescent, King James, Net Science, and US Airs. Each data point is obtained by averaging over 100 realizations, each of which corresponds to an independent division into training set and probe set. The optimal value of the parameter α is the one that yields the highest precision; the optimal accuracy of each weighted index is therefore obtained at its respective α.

Fig. 6. A and C are unweighted networks; B is a weighted network.

person who has only one friend will obviously ask that friend for help. One possible reason for this difference in predictability is that the local structure of the former person has higher entropy than that of the latter. In information science, when we want to analyze the uncertainty of a piece of information, the first question is whether the information is quantifiable. Shannon proposed information entropy [43] to represent the degree of uncertainty of information: the larger the entropy, the more uncertain the information. Here, the information is contained in the local structure of the network and is represented by mass functions. Namely,

$$H = -\sum_{i=1}^{N} \theta_i \log_b \theta_i \tag{26}$$

where $N$ is the number of basic states in a system, $\theta_i$ is the probability that state $i$ appears, satisfying $\sum_{i=1}^{N} \theta_i = 1$, and $b$ is the base of the logarithm. When $b = 2$, the unit of information entropy is the bit.
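As a minimal sketch of Eq. (26), the following Python function computes the Shannon entropy of a probability vector. The function name `shannon_entropy` and the two example distributions (a person with eight equally close friends and a person with a single friend) are illustrative assumptions, not taken from the paper.

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon entropy H = -sum(theta_i * log_b(theta_i)) of a probability vector."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # States with theta_i == 0 are skipped: x * log(x) tends to 0 as x -> 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0)

# Hypothetical example: eight equally likely choices versus a single choice.
many_friends = [1 / 8] * 8
one_friend = [1.0]

print(shannon_entropy(many_friends))  # close to 3 bits
print(shannon_entropy(one_friend))    # no uncertainty
```

With base 2 the result is in bits, so the evenly spread case should come out near log2(8) = 3 and the single-friend case at zero, matching the intuition in the text.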

Fig. 7. Sketch of how predictability responds to changes in node degree. (a) Unpredictability as a function of node degree in unweighted networks: the uncertainty (unpredictability) increases with the degree of the node. (b) An example of a node with two adjacent nodes: the uncertainty of this node reaches its maximum when the belief is evenly distributed.

Related research has found that predictability is also a property of the network itself [31]. In this paper, we define the uncertainty of the local structure as the unpredictability of a node. Knowing the unpredictability of nodes tells us more about the local structure and lets us go further with our link prediction methods. It is natural to transform the local structure of each node into a mass function based on D–S theory, and we illustrate the characteristics of Shannon entropy using the three simple networks shown in Fig. 6. The calculation of entropy for these networks is simple, but we first need to transform the structure of each node into a mass function:

$$m_A(i) = \frac{1}{10}, \quad \forall i \in \Theta$$

$$m_B(a) = \frac{4}{1+2+4} = \frac{4}{7}, \quad m_B(b) = \frac{1}{1+2+4} = \frac{1}{7}, \quad m_B(c) = \frac{2}{1+2+4} = \frac{2}{7}$$

$$m_C(\Theta) = 1$$

where $\Theta$ denotes the FOD in $m_C$. Note that all of these mass functions have a Bayesian structure, so their Shannon entropy is calculated in the same way as for a probability distribution. Namely,

$$H_A = -\sum_{n=1}^{10} \frac{1}{10} \log_2 \frac{1}{10} = 3.3219$$

$$H_B = -\left( \frac{4}{7} \log_2 \frac{4}{7} + \frac{1}{7} \log_2 \frac{1}{7} + \frac{2}{7} \log_2 \frac{2}{7} \right) = 1.3787$$

$$H_C = -1 \times \log_2 1 = 0.$$
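The worked example above can be reproduced with a short sketch. It assumes each node's local structure is given as a dictionary of link weights (all weights 1 for an unweighted network); the helper names `mass_from_weights` and `entropy_bits` and the neighbor labels are hypothetical.

```python
import math

def mass_from_weights(weights):
    """Normalize link weights into a Bayesian mass function (beliefs sum to 1)."""
    total = sum(weights.values())
    return {neighbor: w / total for neighbor, w in weights.items()}

def entropy_bits(mass):
    """Shannon entropy in bits of a Bayesian mass function."""
    return sum(-m * math.log2(m) for m in mass.values() if m > 0)

# Assumed encodings of the three local structures in Fig. 6.
net_a = {f"n{i}": 1 for i in range(10)}  # unweighted, ten adjacent nodes
net_b = {"a": 4, "b": 1, "c": 2}         # weighted, three adjacent nodes
net_c = {"only": 1}                      # a single adjacent node

for name, weights in [("A", net_a), ("B", net_b), ("C", net_c)]:
    print(name, entropy_bits(mass_from_weights(weights)))
```

The printed values should agree with H_A, H_B and H_C in the text up to rounding in the last reported digit.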

The result is consistent with our intuition: $m_A$ has the highest unpredictability (or uncertainty), since the belief is evenly distributed over its focal elements, and $m_C$ has the lowest unpredictability, since the node in network C has only one adjacent node. As discussed above, the greater the information entropy, the closer the weights of the adjacent nodes are to one another and the more uncertain the node is. Assume that nodes $i$ and $j$ have the same number of neighbors $n$, while the weight distributions of their adjacent nodes differ: one has a higher variance and the other a relatively low one.

Theorem 1. The unpredictability $U$ corresponds to the degree of the node in unweighted networks, namely

$$U = -\sum_{i=1}^{k} \frac{1}{k} \log \frac{1}{k} = \log k \tag{27}$$

where $k$ denotes the degree of the node.


Fig. 8. The degree of link predictability of each network. The darker the color the higher its unpredictability. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Proof. Consider an unweighted network $G$ and a node with degree $k$. Suppose another node connects to this node after the first uncertainty measurement. Then

$$U_{k+1} - U_k = \log_2 (k+1) - \log_2 k = \log_2 \frac{k+1}{k} > 0. \tag{28}$$

Hence the unpredictability increases strictly with the degree $k$.
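A quick numerical check of Theorem 1 and Eq. (28) might look as follows. The function name `unpredictability_unweighted` is an assumption for illustration; it simply evaluates the entropy of a uniform belief 1/k over k adjacent nodes.

```python
import math

def unpredictability_unweighted(k):
    """Entropy in bits of a node with k adjacent nodes and uniform belief 1/k."""
    return sum(-(1 / k) * math.log2(1 / k) for _ in range(k))

for k in range(1, 6):
    u_k = unpredictability_unweighted(k)
    gain = unpredictability_unweighted(k + 1) - u_k
    # u_k should match log2(k); gain should match log2((k + 1) / k) and be positive.
    print(k, round(u_k, 4), round(gain, 4))
```

The gain shrinks as k grows, so each additional neighbor adds less uncertainty than the previous one, although the total keeps rising.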

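Theorem 2. The unpredictability $U$ of a node corresponds to the relative weights of its adjacent nodes in weighted networks, namely

$$U_h - U_l < 0 \tag{29}$$

where $U_h$ denotes the unpredictability of a mass function with higher variance and $U_l$ that of a mass function with lower variance.

Proof. Let $p_i = m(F_i)$ denote the belief assigned to the $i$-th focal element $F_i$, so that

$$U = -\sum_{i=1}^{n} p_i \log p_i \quad \text{and} \quad \sum_{i=1}^{n} p_i = 1. \tag{30}$$

Then the Lagrange function can be defined as

$$U_0 = -\sum_{i=1}^{n} p_i \ln p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right). \tag{31}$$

Now we can calculate the gradient,

$$\frac{\partial U_0}{\partial p_i} = -\ln p_i - 1 + \lambda = 0, \quad i = 1, 2, 3, \ldots, n. \tag{32}$$

From the equation $-\ln p_i - 1 + \lambda = 0$, we get

$$p_i = \exp(\lambda - 1), \quad i = 1, 2, 3, \ldots, n. \tag{33}$$

All $p_i$ are therefore equal, and since they sum to 1,

$$p_1 = p_2 = \cdots = p_n = \frac{1}{n}. \tag{34}$$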
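The stationary point in Eq. (34) can be illustrated numerically. The sketch below compares three hand-picked mass functions over four focal elements; the specific belief values are hypothetical, and the comparison is an illustration for these choices rather than a proof of a general relation between variance and entropy.

```python
import math
from statistics import pvariance

def entropy_bits(beliefs):
    """Shannon entropy in bits of a list of beliefs that sum to 1."""
    return sum(-p * math.log2(p) for p in beliefs if p > 0)

# Hypothetical mass functions with the same number of focal elements (n = 4).
uniform = [0.25, 0.25, 0.25, 0.25]  # the stationary point p_i = 1/n
low_var = [0.30, 0.25, 0.25, 0.20]  # close to uniform
high_var = [0.70, 0.10, 0.10, 0.10] # one dominant adjacent node

for name, beliefs in [("uniform", uniform), ("low", low_var), ("high", high_var)]:
    print(name, round(pvariance(beliefs), 5), round(entropy_bits(beliefs), 4))
```

For these examples the entropy falls as the variance of the beliefs rises, with the uniform case attaining the largest value, log2(4) = 2 bits.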

Since the mass function of $U_h$ has a higher variance than that of $U_l$, it deviates further from the uniform distribution of Eq. (34), so we get $U_h - U_l < 0$. As shown in Fig. 7(b), the uncertainty reaches its maximum when the belief is evenly distributed, i.e., when this person does not have any particularly good friends in the social network.

Let us return to the main question. The link predictability of each network is shown in Fig. 8. The total unpredictability of a network is obtained by taking the mean of the unpredictability of all nodes in the network.

7. Conclusion

In this paper, we found that the similarity between two nodes in a network is twofold: one part comes from the attributes of the nodes, and the other, named structural similarity, comes from the structural perturbation of the local structure. We then proposed a new measure called evidential measure (EM) based on Dempster–Shafer theory. Moreover, we empirically compared some


common link prediction algorithms on real network data. EM combines the attribute similarity and the structural similarity of the network. Our results show that the proposed method has better accuracy when most of the large-degree nodes are sparsely connected and seldom share neighbors. One reason for this is that the proposed method penalizes large-degree nodes in both attribute similarity and structural similarity. Moreover, this paper proposed a novel method to study the uncertainty of prediction. We measured the unpredictability of some common networks; our results show that the larger the average degree, the more unpredictable the network, which is consistent with what is observed in real networks.

Acknowledgments

The authors greatly appreciate the reviewers' suggestions and the editor's encouragement. The work is partially supported by National Natural Science Foundation of China (Grant Nos. 61174022, 61573290, 61503237).

References

[1] W.-J. Li, Y.-Y. Xu, Q. Dong, J.-L. Zhou, Y. Fu, Tadb: A time-aware diffusion-based recommender algorithm, Internat. J. Modern Phys. C 26 (09) (2015) 1550102.
[2] S. Wang, Y. Du, Y. Deng, A new measure of identifying influential nodes: Efficiency centrality, Commun. Nonlinear Sci. Numer. Simul. 47 (2017) 151–163.
[3] D. Chen, L. Lü, M.-S. Shang, Y.-C. Zhang, T. Zhou, Identifying influential nodes in complex networks, Physica A 391 (4) (2012) 1777–1787. http://dx.doi.org/10.1016/j.physa.2011.09.017.
[4] S. Scellato, A. Noulas, C. Mascolo, Exploiting place features in link prediction on location-based social networks, in: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2011, pp. 1046–1054.
[5] D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks, J. Am. Soc. Inf. Sci. Technol. 58 (7) (2007) 1019–1031.
[6] M. Al Hasan, V. Chaoji, S. Salem, M. Zaki, Link prediction using supervised learning, in: SDM06: Workshop on Link Analysis, Counter-Terrorism and Security, 2006.
[7] R. Guimerà, M. Sales-Pardo, Missing and spurious interactions and the reconstruction of complex networks, Proc. Natl. Acad. Sci. 106 (52) (2009) 22073–22078.
[8] C.J. Zhang, A. Zeng, Prediction of missing links and reconstruction of complex networks, Internat. J. Modern Phys. C 27 (10) (2016).
[9] F.W. Peng Zhang, Xiang Wang, J.X. An Zeng, Measuring the robustness of link prediction algorithms under noisy environment, Sci. Rep. 6 (2016).
[10] L. Peng, W. Chong, L. Yongli, Link prediction measures considering different neighbors' effects and application in social networks, Internat. J. Modern Phys. C (2016) 1750033.
[11] R.R. Sarukkai, Link prediction and path analysis using Markov chains, Comput. Netw. 33 (1) (2000) 377–386.
[12] J. Zhu, J. Hong, J.G. Hughes, Using Markov chains for link prediction in adaptive web sites, in: Soft-Ware 2002: Computing in an Imperfect World, Springer, 2002, pp. 60–73.
[13] R. Popescul, L.H. Ungar, Statistical relational learning for link prediction, in: Proceedings of the Workshop on Learning Statistical Models from Relational Data at IJCAI-2003.
[14] Z. Liu, Q.-M. Zhang, L. Lü, T. Zhou, Link prediction in complex networks: A local Naive Bayes model, Europhys. Lett. 96 (4) (2011) http://dx.doi.org/10.1209/0295-5075/96/48007.
[15] Y. Liu, C. Zhao, X. Wang, Q. Huang, X. Zhang, D. Yi, The degree-related clustering coefficient and its application to link prediction, Physica A 454 (2016) 24–33. http://dx.doi.org/10.1016/j.physa.2016.02.014.
[16] Z. Wu, Y. Lin, J. Wang, S. Gregory, Link prediction with node clustering coefficient, Physica A 452 (2016) 1–8.
[17] A. Andalib, S.M. Babamir, A class-based link prediction using distance dependent Chinese restaurant process, Physica A 456 (2016) 204–214.
[18] F. Guo, Z. Yang, T. Zhou, Predicting link directions via a recursive subgraph-based ranking, Physica A 392 (16) (2013) 3402–3408. http://dx.doi.org/10.1016/j.physa.2013.03.025.
[19] T. Zhou, L. Lü, Y.-C. Zhang, Predicting missing links via local information, Eur. Phys. J. B 71 (4) (2009) 623–630. http://dx.doi.org/10.1140/epjb/e2009-00335-8.
[20] P. Pei, B. Liu, L. Jiao, Link prediction in complex networks based on an information allocation index, Physica A (2016).
[21] K. ke Shang, M. Small, W. sheng Yan, Link direction for link prediction, Physica A 469 (2017) 767–776.
[22] J. Liu, B. Xu, X. Xu, T. Xin, A link prediction algorithm based on label propagation, J. Comput. Sci. 16 (2016) 43–50.
[23] A. Clauset, C. Moore, M.E. Newman, Hierarchical structure and the prediction of missing links in networks, Nature 453 (7191) (2008) 98–101.
[24] Z. Wang, Y. Wu, Q. Li, F. Jin, W. Xiong, Link prediction based on hyperbolic mapping with community structure for complex networks, Physica A 450 (2016) 609–623.
[25] A. Grabowski, N. Kruszewska, R. Kosiński, Dynamic phenomena and human activity in an artificial society, Phys. Rev. E 78 (6) (2008) 066110.
[26] X. Feng, J. Zhao, K. Xu, Link prediction in complex networks: a clustering perspective, Eur. Phys. J. B 85 (1) (2012) 1–9.
[27] W. Cui, C. Pu, Z. Xu, S. Cai, J. Yang, A. Michaelson, Bounded link prediction in very large networks, Physica A 457 (2016) 202–214.
[28] C. Fan, Z. Liu, X. Lu, B. Xiu, Q. Chen, An efficient link prediction index for complex military organization, Physica A 469 (2017) 572–587.
[29] A.P. Dempster, Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Statist. (1967) 325–339.
[30] G. Shafer, et al., A Mathematical Theory of Evidence, Vol. 1, Princeton University Press, Princeton, 1976.
[31] C. Ma, T. Zhou, H.-F. Zhang, Playing the role of weak clique property in link prediction: A friend recommendation model, Sci. Rep. 6 (2016) http://dx.doi.org/10.1038/srep30098.
[32] L. Lü, L. Pan, T. Zhou, Y.-C. Zhang, H.E. Stanley, Toward link predictability of complex networks, Proc. Natl. Acad. Sci. U.S.A. 112 (8) (2015) 2325–2330. http://dx.doi.org/10.1073/pnas.1424644112.
[33] H. Zhang, D. Wei, Y. Hu, X. Lan, Y. Deng, Modeling the self-similarity in complex networks based on Coulomb's law, Commun. Nonlinear Sci. Numer. Simul. 35 (2016) 97–104.
[34] Z. Gao, Y. Shi, S. Chen, Measures of node centrality in mobile social networks, Internat. J. Modern Phys. C 26 (09) (2015) 1550107.
[35] G. Qi, C.C. Aggarwal, T.S. Huang, Breaking the barrier to transferring link information across networks, IEEE Trans. Knowl. Data Eng. 27 (7) (2015) 1741–1753. http://dx.doi.org/10.1109/TKDE.2014.2313871.
[36] B. Moradabadi, M.R. Meybodi, Link prediction based on temporal similarity metrics using continuous action set learning automata, Physica A 460 (2016) 361–373.
[37] H. Liao, A. Zeng, Y.C. Zhang, Predicting missing links via correlation between nodes, Physica A 436 (1) (2015) 216–223.
[38] L. Lü, T. Zhou, Link prediction in complex networks: A survey, Physica A 390 (6) (2011) 1150–1170. http://dx.doi.org/10.1016/j.physa.2010.11.027.
[39] L.A. Adamic, E. Adar, Friends and neighbors on the web, Soc. Networks 25 (3) (2003) 211–230.
[40] F. Du, Q. Xuan, T. Wu, Empirical analysis of attention behaviors in online social networks, Internat. J. Modern Phys. C 21 (7) (2010) 955–971.
[41] E. Sherkat, M. Rahgozar, M. Asadpour, Structural link prediction based on ant colony approach in social networks, Physica A 419 (2015) 80–94.


[42] G. Qi, C.C. Aggarwal, T. Huang, Link prediction across networks by biased cross-network sampling, in: 2013 29th IEEE International Conference on Data Engineering, ICDE 2013, Vol. 00, 2013, pp. 793–804. http://dx.doi.org/doi.ieeecomputersociety.org/10.1109/ICDE.2013.6544875.
[43] C.E. Shannon, A mathematical theory of communication, ACM SIGMOBILE Mob. Comput. Commun. Rev. 5 (1) (2001) 3–55.
[44] Z. Xu, C. Pu, J. Yang, Link prediction based on path entropy, Physica A 456 (2016) 294–301. http://dx.doi.org/10.1016/j.physa.2016.03.091.
[45] Z. Xu, C. Pu, R.R. Sharafat, L. Li, J. Yang, Entropy-based link prediction in weighted networks, Chin. Phys. B 26 (1) (2017) 18902. http://dx.doi.org/10.1088/1674-1056/26/1/018902.
[46] K. ke Shang, W. sheng Yan, M. Small, Evolving networks using past structure to predict the future, Physica A 455 (2016) 120–135. http://dx.doi.org/10.1016/j.physa.2016.02.067.
[47] F. Ye, J. Chen, Y. Li, J. Kang, Decision-making algorithm for multisensor fusion based on Grey relation and DS evidence theory, J. Sens. (2016) http://dx.doi.org/10.1155/2016/3954573.
[48] X. Zhang, Y. Deng, F.T.S. Chan, A. Adamatzky, S. Mahadevan, Supplier selection based on evidence theory and analytic network process, Proc. Inst. Mech. Eng. B 230 (3) (2016) 562–573. http://dx.doi.org/10.1177/0954405414551105.
[49] X. Deng, Q. Liu, Y. Deng, Matrix games with payoffs of belief structures, Appl. Math. Comput. 273 (2016) 868–879.
[50] X. Zhang, Y. Deng, F.T. Chan, S. Mahadevan, A fuzzy extended analytic network process-based approach for global supplier selection, Appl. Intell. 43 (4) (2015) 760–772.
[51] W. Jiang, C. Xie, M. Zhuang, Y. Shou, Y. Tang, Sensor data fusion with z-numbers and its application in fault diagnosis, Sensors 16 (9) (2016) http://dx.doi.org/10.3390/s16091509.
[52] B. Kang, Y. Hu, Y. Deng, D. Zhou, A new methodology of multicriteria decision-making in supplier selection based on Z-numbers, Math. Probl. Eng. (2016) http://dx.doi.org/10.1155/2016/8475987.
[53] X. Zhou, X. Deng, Y. Deng, S. Mahadevan, Dependence assessment in human reliability analysis based on d numbers and ahp, Nucl. Eng. Des. 313 (2017) 243–252.
[54] H. Mo, Y. Deng, A new aggregating operator in linguistic decision making based on d numbers, Internat. J. Uncertain. Fuzziness Knowledge-Based Systems 24 (6) (2016) 831–846.
[55] X. Zhou, Y. Shi, X. Deng, Y. Deng, D-DEMATEL: A new method to identify critical success factors in emergency management, Saf. Sci. 91 (2017) 93–104.
[56] Y. Li, J. Chen, F. Ye, D. Liu, The improvement of DS evidence theory and its application in IR/MMW target recognition, J. Sens. (1903792) (2016).
[57] J. Wang, Y. Hu, F. Xiao, X. Deng, Y. Deng, A novel method to use fuzzy soft sets in decision making based on ambiguity measure and Dempster-Shafer theory of evidence: An application in medical diagnosis, Artif. Intell. Med. 69 (2016) 1–11.
[58] J. Liu, F. Lian, M. Mallick, Distributed compressed sensing based joint detection and tracking for multistatic radar system, Inform. Sci. 369 (2016) 100–118.
[59] Y. Deng, Deng entropy, Chaos Solitons Fractals 91 (2016) 549–553.
[60] Y. Du, X. Lu, X. Su, Y. Hu, Y. Deng, New failure mode and effects analysis: An evidential downscaling method, Qual. Reliab. Eng. Int. 32 (2) (2016) 737–746.
[61] X. Ning, J. Yuan, X. Yue, Uncertainty-based optimization algorithms in designing fractionated spacecraft, Sci. Rep. 6 (2016) 22979.
[62] Y. Hu, F. Du, H.L. Zhang, Investigation of unsteady aerodynamics effects in cycloidal rotor using RANS solver, Aeronaut. J. 120 (1228) (2016) 956–970. http://dx.doi.org/10.1017/aer.2016.38.
[63] X. Ning, T. Zhang, Y. Wu, P. Zhang, J. Zhang, S. Li, X. Yue, J. Yuan, Coordinated parameter identification technique for the inertial parameters of noncooperative target, PLoS One 11 (4) (2016) e0153604.
[64] W.-B. Du, X.-L. Zhou, O. Lordan, Z. Wang, C. Zhao, Y.-B. Zhu, Analysis of the Chinese airline network as multi-layer networks, Transp. Res. Part E: Logist. Transp. Rev. 89 (2016) 108–116.
[65] R. Zhang, X. Ran, C. Wang, Y. Deng, Fuzzy evaluation of network vulnerability, Qual. Reliab. Eng. Int. 32 (5) (2016) 1715–1730.
[66] D.J. Watts, S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393 (6684) (1998) 440–442.
[67] C. Von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, P. Bork, Comparative assessment of large-scale data sets of protein–protein interactions, Nature 417 (6887) (2002) 399–403.
[68] P.M. Gleiser, L. Danon, Community structure in jazz, Adv. Complex Syst. 6 (04) (2003) 565–573.
[69] H. Jeong, B. Tombor, R. Albert, Z. Oltvai, A. Barabasi, The large-scale organization of metabolic networks, Nature 407 (6804) (2000) 651–654.
[70] J. Duch, A. Arenas, Community detection in complex networks using extremal optimization, Phys. Rev. E 72 (2) (2005) 027104.
[71] D. Bu, Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, N. Zhang, et al., Topological structure analysis of the protein–protein interaction network in budding yeast, Nucleic Acids Res. 31 (9) (2003) 2443–2450.
[72] N. Spring, R. Mahajan, D. Wetherall, T. Anderson, Measuring isp topologies with rocketfuel, IEEE/ACM Trans. Netw. 12 (1) (2004) 2–16.
[73] S.D. Reese, L. Rutigliano, K. Hyun, J. Jeong, Mapping the blogosphere professional and citizen-based media in the global news arena, Journalism 8 (3) (2007) 235–261.
[74] L. Lü, D. Chen, X. Ren, Q. Zhang, Y. Zhang, T. Zhou, Vital nodes identification in complex networks, Phys. Rep. 650 (2016) 1–63. http://dx.doi.org/10.1016/j.physrep.2016.06.007.
[75] J. Moody, Peer influence groups: identifying dense clusters in large networks, Social Networks 23 (4) (2001) 261–283. http://dx.doi.org/10.1016/S0378-8733(01)00042-9.
[76] Konect, http://konect.uni-koblenz.de/networks/, (2015).
[77] K. ke Shang, M. Small, W. sheng Yan, Fitness networks for real world systems via modified preferential attachment, Physica A 474 (2017) 49–60. http://dx.doi.org/10.1016/j.physa.2017.01.066.
[78] M. Newman, Networks: An introduction, Astron. Nachr. 327 (8) (2010) 741–743.
[79] G. Salton, M.J. McGill, Introduction to modern information retrieval.
[80] L. Hamers, Y. Hemeryck, G. Herweyers, M. Janssen, H. Keters, R. Rousseau, A. Vanhoutte, Similarity measures in scientometric research: the jaccard index versus salton’s cosine formula, Inf. Process. Manag. 25 (3) (1989) 315–318.
[81] T. Sørensen, A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons, Biol. Skr. 5 (1948) 1–34.
[82] E. Ravasz, A.L. Somera, D.A. Mongru, Z.N. Oltvai, A.-L. Barabási, Hierarchical organization of modularity in metabolic networks, Science 297 (5586) (2002) 1551–1555.
[83] E.A. Leicht, P. Holme, M.E. Newman, Vertex similarity in networks, Phys. Rev. E 73 (2) (2006) 026120.
[84] A.-L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (5439) (1999) 509–512.
[85] B. Zhu, Y. Xia, Link prediction in weighted networks: A weighted mutual information model, PLoS One 11 (2) (2015) http://dx.doi.org/10.1371/journal.pone.0148265.
[86] M.S. Granovetter, The strength of weak ties, Am. J. Sociol. 78 (6) (1973) 1360–1380.
[87] L. Lü, T. Zhou, Link prediction in weighted networks: The role of weak ties, Europhys. Lett. 89 (1) (2010) 18001.