Accepted Manuscript

Extended resource allocation index for link prediction of complex network

Shuxin Liu, Xinsheng Ji, Caixia Liu, Yi Bai

PII: S0378-4371(17)30199-1
DOI: http://dx.doi.org/10.1016/j.physa.2017.02.078
Reference: PHYSA 18058
To appear in: Physica A
Received date: 23 April 2016
Revised date: 31 January 2017

Please cite this article as: S. Liu, X. Ji, C. Liu, Y. Bai, Extended resource allocation index for link prediction of complex network, Physica A (2017), http://dx.doi.org/10.1016/j.physa.2017.02.078

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights

1) The similarity of two endpoints depends on the potential resource exchanged between them.
2) Resources transferred by both common neighbors and non-common-neighbors are considered.
3) The index is self-adaptive: a parameter adjusts the resource carried by longer paths.
4) The proposed method is well suited to large-scale networks.
5) It performs well under two standard metrics, AUC and precision.
Extended resource allocation index for link prediction of complex network

Shuxin Liu a,*, Xinsheng Ji a,b,c, Caixia Liu a, Yi Bai a

a National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, P. R. China
b National Mobile Communications Research Laboratory, Southeast University, Nanjing 211189, P. R. China
c Wireless Technology Institute, National Engineering Laboratory for Mobile Network Security, Beijing 100876, P. R. China

Abstract

Recently, a number of similarity-based methods have been proposed to predict missing links in complex networks. Among these indices, the resource allocation index performs very well with low time complexity. However, it ignores the potential resources transferred along local paths between two endpoints. Motivated by the resource exchange taking place between endpoints, an extended resource allocation index is proposed. An empirical study on twelve real networks and three synthetic dynamic networks shows that the proposed index achieves good performance compared with eight mainstream baselines.

PACS: 89.20.Ff; 89.75.Hc; 89.65.-s

Keywords: Link prediction; Complex network; Resource exchange; Similarity index

* Corresponding author. Tel.: +8615939045554; E-mail: [email protected]; [email protected]

1. Introduction

With the continuous development of complex network theory, many complex systems in nature and society have been described as complex networks [1-5]. As a research hotspot in this field, link prediction, which aims to estimate the likelihood that a link exists between two nodes [6-7], has attracted increasing attention in recent years. Predicting missing links can help discover unknown interactions in protein-protein interaction networks [8], recommend new friends to users in online social networks [9] and identify spurious connections in author-paper bipartite networks [10].

Plenty of link prediction methods have been proposed based on evolution mechanisms [11-20]. Among these methods, similarity-based prediction methods, especially topology-based similarity indices, have received close attention from researchers due to their simplicity and effectiveness [21]. Similarity methods based on topological structure assume that the similarity of two endpoints is positively correlated with the number of paths or the amount of resources transferred between them [15]; they can be classified into neighbor-based and path-based methods. Neighbor-based similarity indices include the simplest Common Neighbors (CN) index [11], which counts the number of common neighbors, and the Adamic-Adar (AA) index [12] and the resource allocation (RA) index [13], which additionally use the degrees of the common neighbors. To address the limited path information used in neighbor-based indices, the Katz index [14], effective path index [15], significant path index [16], Average Commute Time (ACT) [17], Cosine Similarity Time (Cos+) [18] and SimRank [19] were proposed, all of which consider global topological information between endpoints. Most of these global indices perform well in real networks but are not suitable for large networks due to their high time complexity [21]. To achieve a compromise between complexity and performance, Zhou et al. proposed the Local Path (LP) index [20], which adds paths of length 3 to the CN index. Many empirical studies show that indices considering local paths achieve good performance with lower complexity in link prediction on real networks. However, most similarity indices ignore the potential resource exchange between endpoints along local paths.

Fig. 1: The resource transferred by common neighbors with different topology. (Figure not reproduced; labeled nodes: x, y, i, j, a, b.)

In the real world, for example in online social networks, resources such as hot topics are disseminated between strangers through the friends around them, and the more often news spreads between two strangers via their friends, the higher the likelihood that they become friends. Take Fig. 1 as an example: there are two pairs of endpoints ($x$ and $y$, $i$ and $j$), and nodes $a$ and $b$ are their common neighbors with the same node degree. According to the RA index, the likelihood that link $L_{xy}$ exists is the same as that of link $L_{ij}$, although $L_{xy}$ is more likely to exist in reality because there are more possible paths between $x$ and $y$. In real networks, the resources at each node are divided into several pieces and then flow to its neighbors as resources flow constantly [40]. Therefore, if a new hot topic spreads from nodes $x$ and $i$ at the same time, endpoint $y$ can receive more news, that is, more resources, owing to the “branches” of its common neighbor $b$. Clearly, besides the “trunks” of common neighbors ($L_{by}$) considered by the RA index, the resource exchange through longer paths also plays an important role in describing the similarity between two endpoints.

Based on the above discussion, we propose an extended resource allocation (ERA) index, which adds longer paths to the RA index. With a parameter that adjusts the amount of resource transferred along longer paths for different networks, the ERA index measures similarity by the amount of resources exchanged between two endpoints through both common neighbors and non-common-neighbors. An empirical study shows that the proposed ERA index improves the prediction accuracy compared with eight mainstream baselines on fifteen datasets.

The rest of the paper is organized as follows: Section 2 introduces the extended resource allocation index; Section 3 describes the evaluation metrics; Section 4 describes the 15 experimental datasets; Section 5 introduces eight mainstream baselines and discusses the comparison results under two standard metrics, AUC and precision; finally, we draw conclusions.

2. The extended resource allocation index

Consider an unweighted undirected network $G(V, E)$, where $V$ and $E$ are the sets of nodes and links, respectively. A score $s_{xy}$ is assigned to each pair of nodes $x$ and $y$ according to a given similarity index. This score measures the similarity between the two nodes, and the score of each nonexistent link represents the likelihood that the link exists.

In the RA index, when calculating the amount of resource transferred by common neighbors between nodes $x$ and $y$, it is assumed that the resource of a common neighbor is allocated equally to each of its links. However, in the real world, the longer the path between two endpoints, the higher the uncertainty of the resource allocated through the path. Because this uncertainty differs from one real network to another, we introduce a parameter $\sigma$ to adjust the amount of resource transferred along longer paths (longer than 2). As shown in Fig. 2, if node $x$ has one unit of resource and distributes it to node $y$ through its neighbor $v_1$, the amount of resource that $y$ receives from $x$ is denoted as $\sigma / k_{v_1}$ ($\sigma$ is the adjustable parameter). Similarly, if node $y$ distributes one unit of resource to node $x$ through its neighbor $v_n$, the amount of resource that $x$ receives from $y$ is denoted as $\sigma / k_{v_n}$. In this paper, taking the time complexity into account, we only consider the local paths between endpoints with length shorter than or equal to 3.

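Fig. 2: The amount of resource transferred through longer paths. (Figure not reproduced; it shows paths between $x$ and $y$ through the neighbors $v_1$, $v_2$, ..., $v_n$ and other nodes, with edge labels $1/k_{v_1}$, $\sigma/k_{v_1}$, $1/k_{v_n}$ and $\sigma/k_{v_n}$.)
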
Since all resources are transferred through neighbors and the amounts transferred by different neighbors differ, we divide the neighbors of two endpoints into two classes: common neighbors and non-common-neighbors. Non-common-neighbors and the extended resource allocation index are then defined as follows.

Fig. 3: The resource exchange between nodes. (Figure not reproduced; labeled nodes: $x$, $y$, $a$, $b$, $c$.)

Definition 1. Consider a pair of nodes $x, y \in V$, and let $z$ be a neighbor of node $x$ that is not a common neighbor of $x$ and $y$. Because each neighbor of node $x$ may be a resource carrier for $x$, node $z$ is defined as a non-common-neighbor between $x$ and $y$ (such as nodes $b$ and $c$ in Fig. 3). $\bar{C}_{xy}$ is the set of all non-common-neighbors between $x$ and $y$, and $\bar{C}_{x|y}$ is the subset of $\bar{C}_{xy}$ whose members are directly connected to node $x$, so that $\bar{C}_{xy} = \bar{C}_{x|y} \cup \bar{C}_{y|x}$.

Definition 2. Consider a pair of nodes $x, y \in V$, and let $z$ be a common neighbor of $x$ and $y$. In the simplest case, we assume that node $x$ has one unit of resource and distributes it to its neighbors [13]. The amount of resource that $y$ receives from $x$ through common neighbors is defined as

$$R(x \to y) = \sum_{z \in C_{xy}} \frac{1 + \sigma n_{zy}}{k_z} \qquad (1)$$

where $C_{xy}$ is the set of common neighbors of $x$ and $y$, $k_z$ represents the degree of node $z$, $\sigma$ is the adjustable parameter and $n_{zy}$ denotes the number of common neighbors of $z$ and $y$. Taking Fig. 3 as an example, the amount of resource that $y$ receives from $x$ through the common neighbor $a$ is $(1 + 2\sigma)/k_a$.

Definition 3. Consider a pair of nodes $x, y \in V$, and let $z$ be a neighbor of node $x$ that is also a non-common-neighbor between $x$ and $y$ ($z \in \bar{C}_{x|y}$, the set of non-common-neighbors directly connected to $x$). If node $x$ has one unit of resource and distributes it to its neighbors, the amount of resource that $y$ receives from $x$ through non-common-neighbors is

$$R'(x \to y) = \sum_{z \in \bar{C}_{x|y}} \frac{\sigma n_{zy}}{k_z} \qquad (2)$$

Here, all local paths passing through non-common-neighbors have length three. Taking Fig. 3 as an example, the amount of resource that $y$ receives from $x$ through the non-common-neighbor $b$ is $\sigma / k_b$.

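Definition 4. On an unweighted undirected network $G(V, E)$, the total extended resource allocation index comprises all the resource exchanged between endpoints $x$ and $y$, defined as

$$
\begin{aligned}
s_{xy}^{ERA} &= R(x \to y) + R(y \to x) + R'(x \to y) + R'(y \to x) \\
&= \sum_{z \in C_{xy}} \frac{1 + \sigma n_{zy}}{k_z} + \sum_{z \in C_{xy}} \frac{1 + \sigma n_{zx}}{k_z} + \sum_{z \in \bar{C}_{x|y}} \frac{\sigma n_{zy}}{k_z} + \sum_{z \in \bar{C}_{y|x}} \frac{\sigma n_{zx}}{k_z} \\
&= \sum_{z \in C_{xy}} \frac{2 + \sigma (n_{zy} + n_{zx})}{k_z} + \sigma \left( \sum_{z \in \bar{C}_{x|y}} \frac{n_{zy}}{k_z} + \sum_{z \in \bar{C}_{y|x}} \frac{n_{zx}}{k_z} \right)
\end{aligned}
\qquad (3)
$$

Here $R$ and $R'$ denote the resource received through common neighbors and through non-common-neighbors, respectively, $C_{xy}$ is the set of common neighbors, and $\bar{C}_{x|y}$ ($\bar{C}_{y|x}$) is the set of non-common-neighbors directly connected to $x$ ($y$). The parameter $\sigma \ge 0$ adjusts the proportion of resource allocated to each longer branch. When $\sigma = 0$, the ERA score equals twice the RA score [13], so the ERA index gives the same ranking as the RA index. The amounts of resource transmitted between nodes $x$ and $y$ in Fig. 3 are $R(x \to y) = (2\sigma + 1)/4$, $R(y \to x) = 1/4$, $R'(x \to y) = \sigma/2$ and $R'(y \to x) = \sigma/3 + \sigma/3 + \sigma/2 = 7\sigma/6$, respectively.
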
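Definitions 2 to 4 can be checked numerically with a minimal sketch. The edge list below is an assumption: the original figure is not available, so the graph is a hypothetical one chosen to reproduce the four values quoted for Fig. 3. The function names `build_adjacency`, `resource` and `era_score` are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Hypothetical edge list (an assumption; the original figure is not available).
# It is chosen so that the values quoted for Fig. 3 are reproduced.
EDGES = [("x", "a"), ("x", "b"), ("x", "c"),
         ("y", "a"), ("y", "d"), ("y", "e"), ("y", "f"),
         ("a", "d"), ("a", "e"), ("d", "e"), ("b", "f")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected, unweighted)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def resource(adj, src, dst, sigma):
    """Return (R, R'): resource dst receives from src via common / non-common neighbors."""
    common = adj[src] & adj[dst]
    non_common = adj[src] - adj[dst] - {dst}
    # len(adj[z] & adj[dst]) is n_{z,dst}; len(adj[z]) is the degree k_z.
    via_common = sum((1 + sigma * len(adj[z] & adj[dst])) / Fraction(len(adj[z]))
                     for z in common)
    via_non_common = sum(sigma * len(adj[z] & adj[dst]) / Fraction(len(adj[z]))
                         for z in non_common)
    return via_common, via_non_common


def era_score(adj, x, y, sigma):
    """ERA score of the pair (x, y): resource exchanged in both directions."""
    return sum(resource(adj, x, y, sigma)) + sum(resource(adj, y, x, sigma))


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    sigma = Fraction(1, 2)
    print("x -> y:", resource(adj, "x", "y", sigma))
    print("y -> x:", resource(adj, "y", "x", sigma))
    print("ERA:", era_score(adj, "x", "y", sigma))
```

Exact fractions are used so that the results can be compared directly with the closed-form values $(2\sigma + 1)/4$, $1/4$, $\sigma/2$ and $7\sigma/6$; passing a float for `sigma` also works and returns floats.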
3. Metrics

To quantify the prediction accuracy of different methods, we use two standard metrics: the area under the receiver operating characteristic curve (AUC) [22] and precision [23]. The AUC value can be interpreted as the probability that the score given to a randomly chosen missing link is higher than that given to a randomly chosen non-existent link [21]. Each time, a missing link and a non-existent link are randomly picked and their scores given by the algorithm are compared. If, among $n$ independent comparisons, the missing link has a higher score $n'$ times and the two have the same score $n''$ times, the AUC value of the algorithm is

$$AUC = \frac{n' + 0.5\,n''}{n} \qquad (4)$$

Clearly, if all the scores are generated at random, $AUC \approx 0.5$. Therefore, the further the value exceeds 0.5, the better the algorithm performs. Precision is defined as the ratio of relevant links among the top $L$ predicted links. If $m$ of these top-$L$ links appear in the probe set, the precision value is

$$Precision = \frac{m}{L} \qquad (5)$$

A higher precision means a higher prediction accuracy. Here, we set $L = 100$.

Table 1: The basic topological features of the twelve real networks and three synthetic dynamic networks. |V| is the number of nodes, |E| denotes the number of links, <k> indicates the average degree, <d> denotes the average distance, r is the assortativity coefficient [38], C represents the clustering coefficient [39] and H is the degree heterogeneity.

Network     |V|    |E|     <k>     <d>    r        C      H
Jazz        198    2742    27.7    2.24   0.020    0.633  1.4
FW          69     880     25.51   1.64   -0.298   0.552  1.27
USAir       332    2128    12.81   2.74   -0.208   0.749  3.36
Hamster     1858   12534   13.49   3.39   -0.085   0.090  3.36
Yeast       2375   11693   9.85    5.10   0.469    0.378  3.48
PB          1222   16717   27.36   2.74   -0.221   0.361  2.97
Flight      2939   30501   20.75   4.18   0.051    0.255  5.22
Infect      410    2765    13.49   3.63   0.226    0.456  1.39
AIDS-Blog   146    180     2.47    3.42   -0.725   0.052  5.99
Email       167    5784    69.26   1.87   -0.295   0.541  1.66
UcSocial    1899   13838   14.57   3.06   -0.188   0.109  3.82
Figeys      2239   6432    5.76    3.98   -0.331   0.040  9.75
Ba-1        800    1727    4.32    3.14   -0.242   0.211  6.14
Ba-2        1200   2527    4.21    3.27   -0.229   0.172  6.99
Ba-3        2000   4123    4.12    3.40   -0.220   0.144  8.43

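The two metrics of Section 3 can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: `score` is assumed to be a dictionary that maps every candidate (unobserved) link to its similarity score, and the function names, the default number of comparisons and the seed are hypothetical.

```python
import random


def auc_by_sampling(score, missing_links, nonexistent_links, n=10000, seed=0):
    """Estimate AUC as (n' + 0.5 n'') / n from n random comparisons (Eq. 4)."""
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n):
        s_missing = score[rng.choice(missing_links)]
        s_nonexistent = score[rng.choice(nonexistent_links)]
        if s_missing > s_nonexistent:
            higher += 1          # contributes to n'
        elif s_missing == s_nonexistent:
            ties += 1            # contributes to n''
    return (higher + 0.5 * ties) / n


def precision_at_L(score, probe_set, L=100):
    """Fraction of the L highest-scored candidate links found in the probe set (Eq. 5)."""
    ranked = sorted(score, key=score.get, reverse=True)[:L]
    m = sum(1 for link in ranked if link in probe_set)
    return m / L


if __name__ == "__main__":
    # Toy scores for four candidate links; two of them are in the probe set.
    score = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "d"): 0.3, ("c", "d"): 0.1}
    missing = [("a", "b"), ("b", "d")]
    nonexistent = [("a", "c"), ("c", "d")]
    print(auc_by_sampling(score, missing, nonexistent))
    print(precision_at_L(score, set(missing), L=2))
```

In the toy example three of the four possible (missing, non-existent) pairs favor the missing link, so the sampled AUC should be close to 0.75, and one of the top two links is in the probe set.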
4. Data

Experiments are performed on twelve real networks and three synthetic dynamic networks (randomly generated by the BA scale-free network model [1] at different scales). The real networks are as follows:
(1) Jazz [24]: the network of jazz musicians.
(2) Food Web of South Florida ecosystem (FW) [25]: the network of carbon exchanges occurring during the wet season in the cypress wetlands of South Florida.
(3) USAir [26]: the US airline network.
(4) Hamster [27]: a friendship network of users on the website hamsterster.com.
(5) Yeast PPI (Yeast) [28]: the protein-protein interaction network of yeast.
(6) Political blogs (PB) [29]: a network of US political blogs.
(7) Open flights (Flight) [30]: the network of flights between airports of the world.
(8) Infectious (Infect) [31]: the network of face-to-face contacts between people during the exhibition “Infectious: Stay away” in 2009 at the Science Gallery in Dublin.
(9) AIDS-Blog [32]: a network of citations among blogs related to AIDS, patients, and their support networks.
(10) Email network (Email) [33]: the internal email communication network between employees of a mid-sized manufacturing company.
(11) UC Irvine messages social network (UcSocial) [34]: the message communication network between the users of an online community of students from the University of California, Irvine.
(12) Human protein network (Figeys) [35]: a network of interactions between proteins in humans (Homo sapiens).

Table 1 shows the basic topological features of these networks. Each original dataset is randomly divided into a training set, which contains 90% of the links, and a probe set, which contains the remaining 10%.

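The 90%/10% division described above can be sketched as a random split of the edge list. The function name `split_links` and its parameters are hypothetical, and the sketch assumes a purely random split: the text does not say whether the training network is required to stay connected, so no such constraint is enforced here.

```python
import random


def split_links(edges, probe_fraction=0.1, seed=None):
    """Randomly split an edge list into a training set E^T and a probe set E^P."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_probe = round(probe_fraction * len(shuffled))
    return shuffled[n_probe:], shuffled[:n_probe]


if __name__ == "__main__":
    # Toy network: a path graph with 100 links.
    edges = [(i, i + 1) for i in range(100)]
    # 20 independent divisions, matching the 20 realizations averaged in the paper.
    for seed in range(20):
        training, probe = split_links(edges, seed=seed)
        assert len(training) == 90 and len(probe) == 10
```

Using a different seed for each realization gives independent divisions whose metric values can then be averaged.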
5. Results

We compare the ERA index with eight other similarity indices, including four local indices (CN, RA, CAR and LP) and four global indices (Katz, ACT, Cos+ and MFI). They are briefly introduced as follows:
(1) The Common Neighbors (CN) index [11] assumes that the similarity of two nodes is positively correlated with the number of their common neighbors:

$$s_{xy}^{CN} = |\Gamma(x) \cap \Gamma(y)| \qquad (6)$$

where $\Gamma(x)$ is the set of neighbors of node $x$, and $\Gamma(x) \cap \Gamma(y)$ denotes the common neighbors of nodes $x$ and $y$.
(2) The Resource Allocation (RA) index [13] weights the common neighbors based on resource allocation and penalizes common neighbors with large degree:

$$s_{xy}^{RA} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z} \qquad (7)$$

(3) The CAR index [36] suggests that two nodes are more likely to link together if their common first neighbours are members of a strongly inner-linked cohort:

$$s_{xy}^{CAR} = |\Gamma(x) \cap \Gamma(y)| \cdot \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{|\gamma(z)|}{2} \qquad (8)$$

where $\gamma(z)$ refers to the subset of neighbors of $z$ that are also common neighbors of nodes $x$ and $y$.
(4) The Local Path (LP) index [20] adds the information of paths with length 3 to CN:

$$S^{LP} = A^2 + \varepsilon A^3 \qquad (9)$$

where $A$ is the adjacency matrix and $\varepsilon$ is the adjustable parameter.
(5) The Katz index [14] considers all the paths between two nodes and gives more weight to shorter paths:

$$s_{xy}^{Katz} = \sum_{l=1}^{\infty} \beta^l \, |paths_{xy}^{l}| = \beta A_{xy} + \beta^2 (A^2)_{xy} + \beta^3 (A^3)_{xy} + \cdots \qquad (10)$$

where $paths_{xy}^{l}$ is the set of paths with length $l$ between nodes $x$ and $y$, and $\beta$ is the adjustable parameter.
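(6) Average Commute Time (ACT) [17] is based on the average number of steps of a random walk between two endpoints:

$$s_{xy}^{ACT} = \frac{1}{l_{xx}^{+} + l_{yy}^{+} - 2 l_{xy}^{+}} \qquad (11)$$

where $l_{xy}^{+}$ denotes the corresponding entry of $L^{+}$, the pseudo-inverse of the Laplacian matrix $L = D - A$, with $D$ the diagonal degree matrix.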
(7) Cosine Similarity Time (Cos+) [18] measures the cosine similarity of two vectors, computed from $L^{+}$:

$$s_{xy}^{Cos+} = \frac{v_x^{T} v_y}{|v_x| \, |v_y|} = \frac{l_{xy}^{+}}{\sqrt{l_{xx}^{+} \, l_{yy}^{+}}} \qquad (12)$$

(8) The Matrix-Forest Index (MFI) [37] is defined as:

$$S = (I + L)^{-1} \qquad (13)$$

Fig. 4: The experiment result (AUC) of the ERA index on twelve real networks and three synthetic dynamic networks with different values of $\sigma$. Each AUC value is the average of 20 realizations, each of which corresponds to an independent division of $E^T$ and $E^P$. (Figure not reproduced; panels: (a) Jazz, (b) FW, (c) USAir, (d) Hamster, (e) Yeast, (f) PB, (g) Flight, (h) Infect, (i) AIDS-Blog, (j) Email, (k) UcSocial, (l) Figeys, (m) Ba-1, (n) Ba-2, (o) Ba-3; horizontal axis $\sigma$, vertical axis AUC, with one $\sigma$ value annotated in each panel.)

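The matrix-based baselines listed above can be sketched with NumPy. This is an illustration rather than the authors' implementation: the function name `baseline_scores` and the parameter names `eps` (LP) and `beta` (Katz) are assumptions, the graph is assumed to be connected and undirected, and the Katz series is evaluated with the standard closed form $(I - \beta A)^{-1} - I$, which requires $\beta$ to be smaller than the reciprocal of the largest eigenvalue of $A$. CAR is left out because it needs the local community of each node pair.

```python
import numpy as np


def baseline_scores(A, eps=0.01, beta=0.01):
    """Return score matrices for CN, RA, LP, Katz, ACT, Cos+ and MFI."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k = A.sum(axis=1)                                  # node degrees
    inv_k = np.divide(1.0, k, out=np.zeros_like(k), where=k > 0)
    A2 = A @ A
    I = np.eye(n)
    L = np.diag(k) - A                                 # Laplacian L = D - A
    Lp = np.linalg.pinv(L)                             # pseudo-inverse L^+
    d = np.diag(Lp)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Diagonal entries of ACT divide by (about) zero and are not meaningful.
        act = 1.0 / (d[:, None] + d[None, :] - 2.0 * Lp)
        cos = Lp / np.sqrt(np.outer(d, d))
    return {
        "CN": A2,                                      # Eq. (6)
        "RA": A @ np.diag(inv_k) @ A,                  # Eq. (7)
        "LP": A2 + eps * (A2 @ A),                     # Eq. (9)
        "Katz": np.linalg.inv(I - beta * A) - I,       # Eq. (10), closed form
        "ACT": act,                                    # Eq. (11)
        "Cos+": cos,                                   # Eq. (12)
        "MFI": np.linalg.inv(I + L),                   # Eq. (13)
    }


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]])
    scores = baseline_scores(A)
    for name, S in scores.items():
        print(name, S[0, 3])
```

Dense matrices keep the sketch short; the pseudo-inverse and the matrix inverses are what make the global indices expensive on large networks, which is the complexity concern raised in the Introduction.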
5.1 AUC results

First, we report the AUC results of the ERA index with different values of $\sigma$; each data point is the average of 20 realizations. As shown in Fig. 4, the AUC value varies with $\sigma$ in all fifteen datasets. For most networks, ERA attains its optimal value with $\sigma < 1$ (the exceptions are the FW and AIDS-Blog networks, where $\sigma > 1$). There is a sudden increase of AUC around $\sigma = 0$ for all the datasets, which indicates the effectiveness of the potential resource allocated through longer paths in the ERA index. In most networks, the AUC values remain stable around the maximum after the sudden increase (there is no big difference between the AUC values at these points and at the peak). However, in some datasets such as Jazz, USAir, Infect and Email, the prediction accuracy declines gradually after reaching the highest point.

Table 2: Comparison of the AUC values of ERA and other similarity indices. Each AUC value is the average of 20 realizations, each of which corresponds to an independent division of $E^T$ and $E^P$.

AUC         CN     RA     CAR    LP(a)  LP(b)  Katz(a) Katz(b) ACT    Cos+   MFI    ERA
Jazz        0.954  0.971  0.955  0.951  0.947  0.951   0.941   0.795  0.925  0.921  0.972
FW          0.690  0.708  0.689  0.709  0.735  0.707   0.738   0.786  0.507  0.707  0.875
USAir       0.954  0.972  0.954  0.953  0.952  0.952   0.951   0.902  0.956  0.939  0.976
Hamster     0.812  0.815  0.812  0.933  0.940  0.933   0.937   0.843  0.924  0.948  0.973
Yeast       0.915  0.916  0.916  0.970  0.969  0.972   0.971   0.898  0.971  0.971  0.974
PB          0.924  0.928  0.924  0.936  0.939  0.937   0.933   0.893  0.928  0.905  0.952
Flight      0.969  0.972  0.969  0.984  0.984  0.983   0.981   0.909  0.989  0.979  0.992
Infect      0.940  0.946  0.941  0.960  0.960  0.961   0.960   0.802  0.947  0.960  0.968
AIDS-Blog   0.601  0.615  0.602  0.821  0.822  0.840   0.841   0.954  0.579  0.730  0.932
Email       0.921  0.925  0.919  0.921  0.921  0.920   0.917   0.899  0.905  0.887  0.928
UcSocial    0.780  0.786  0.779  0.892  0.902  0.891   0.903   0.895  0.867  0.868  0.934
Figeys      0.565  0.568  0.564  0.887  0.903  0.887   0.900   0.875  0.806  0.884  0.952
Ba-1        0.640  0.641  0.556  0.705  0.703  0.699   0.697   0.566  0.261  0.581  0.761
Ba-2        0.614  0.615  0.535  0.669  0.668  0.666   0.667   0.535  0.267  0.556  0.713
Ba-3        0.599  0.597  0.525  0.646  0.647  0.641   0.637   0.516  0.274  0.535  0.683

(a) In these methods, the adjustable parameter is set to 0.001.
(b) The adjustable parameter is set to 0.01.

Table 2 compares the AUC values of ERA and the other similarity indices. In 14 of the 15 datasets, the AUC value of ERA is the highest; it is lower only than that of ACT in the AIDS-Blog network. Because it neglects long-path information, the CN index obtains the lowest AUC values for most networks. The performance of the CAR index is almost the same as that of the CN index under the AUC metric, although CAR additionally considers the local community on top of CN. With more path information considered, LP achieves a better performance than CN, and in some networks its values are close to those of the global indices. The global indices, especially the Katz and Cos+ indices, generally obtain higher prediction accuracy than the local indices in real networks. Nevertheless, ACT, Cos+ and MFI perform worse in the synthetic dynamic networks than in the real networks. It is worth mentioning that, because the RA index considers the resource allocation of common neighbors, it performs better than expected at lower complexity, which indicates that resource interaction between endpoints may be more important than the number of paths (LP index) in some networks such as Jazz, Email and USAir. However, by also considering the potential resource exchange through longer paths, ERA performs even better than RA in both synthetic dynamic and real networks. In addition, for practical prediction under the AUC metric, we recommend setting the parameter $\sigma$ to around 0.04 for ERA (most of the resulting AUC values are equal to or close to the optimal value).

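To explore how the score changes with $\sigma$, the ERA scores of all node pairs can be computed at once in matrix form. The rearrangement used below, $T = A D^{-1} (A + \sigma A^2)$ and $S = T + T^{T}$, is not stated in the paper; it is a reading of Eqs. (1)-(3) that holds for pairs that are not yet linked, because $(A^2)_{zy}$ equals the number of common neighbors $n_{zy}$. The function name `era_matrix`, the toy graph (hypothetical, chosen to match the values quoted for Fig. 3) and the grid of $\sigma$ values are assumptions.

```python
import numpy as np


def era_matrix(A, sigma):
    """All-pairs ERA scores; entries for already linked pairs are not meaningful."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                                  # node degrees
    inv_k = np.divide(1.0, k, out=np.zeros_like(k), where=k > 0)
    # T[x, y] = sum_z A[x, z] / k_z * (A[z, y] + sigma * (A^2)[z, y])
    T = (A * inv_k) @ (A + sigma * (A @ A))
    return T + T.T


if __name__ == "__main__":
    # Hypothetical graph; node order: x, y, a, b, c, d, e, f.
    edges = [(0, 2), (0, 3), (0, 4), (1, 2), (1, 5), (1, 6), (1, 7),
             (2, 5), (2, 6), (5, 6), (3, 7)]
    A = np.zeros((8, 8))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    # sigma = 0 gives twice the RA score; 0.04 is the value suggested for AUC.
    for sigma in (0.0, 0.04, 0.5):
        print(sigma, era_matrix(A, sigma)[0, 1])
```

In an actual sweep, each score matrix would be evaluated with the AUC or precision of Section 3 on the probe set, and the value of $\sigma$ with the best average over the realizations would be kept.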
5.2 Precision results

To further understand the performance of ERA, the standard metric precision is used to measure the prediction accuracy from a different perspective. Fig. 5 shows the precision of the ERA index as $\sigma$ changes in the different datasets. For most networks, there is again a sudden increase of precision around $\sigma = 0$ (except for Jazz). In highly clustered networks such as USAir, Jazz, Infect and Email, the precision declines gradually after reaching its highest point, because the longer paths between nodes are more important for resource interaction in these networks. In contrast, for many networks without a high clustering coefficient, the precision of ERA attains its optimal value with $\sigma > 1$ and stays around a certain value after the sudden increase.

Fig. 5: The prediction result (precision) of the ERA index on twelve real networks and three synthetic dynamic networks with different values of $\sigma$. Each precision value is the average of 20 realizations, each of which corresponds to an independent division of $E^T$ and $E^P$. (Figure not reproduced; panels: (a) Jazz, (b) FW, (c) USAir, (d) Hamster, (e) Yeast, (f) PB, (g) Flight, (h) Infect, (i) AIDS-Blog, (j) Email, (k) UcSocial, (l) Figeys, (m) Ba-1, (n) Ba-2, (o) Ba-3; horizontal axis $\sigma$, vertical axis precision, with one $\sigma$ value annotated in each panel.)

Table 3: Comparison of the precision values of ERA and other similarity indices. Each precision value is the average of 20 realizations.

Precision   CN     RA     CAR    LP(a)  LP(b)  Katz(a) Katz(b) ACT    Cos+   MFI    ERA
Jazz        0.814  0.828  0.851  0.802  0.775  0.802   0.747   0.253  0.354  0.218  0.828
FW          0.148  0.170  0.143  0.161  0.187  0.161   0.192   0.271  0.000  0.047  0.384
USAir       0.585  0.632  0.580  0.583  0.580  0.583   0.574   0.477  0.078  0.052  0.651
Hamster     0.015  0.008  0.033  0.016  0.052  0.016   0.077   0.085  0.016  0.036  0.346
Yeast       0.684  0.491  0.674  0.684  0.736  0.683   0.729   0.571  0.243  0.062  0.853
PB          0.409  0.242  0.467  0.416  0.442  0.416   0.451   0.131  0.326  0.007  0.456
Flight      0.509  0.365  0.627  0.514  0.549  0.514   0.543   0.337  0.042  0.056  0.545
Infect      0.397  0.512  0.385  0.360  0.356  0.360   0.344   0.134  0.204  0.151  0.512
AIDS-Blog   0.016  0.032  0.016  0.052  0.052  0.053   0.053   0.079  0.000  0.000  0.082
Email       0.703  0.702  0.702  0.709  0.706  0.709   0.694   0.619  0.614  0.365  0.722
UcSocial    0.032  0.024  0.055  0.033  0.046  0.033   0.047   0.069  0.010  0.002  0.142
Figeys      0.014  0.016  0.030  0.014  0.015  0.014   0.015   0.010  0.006  0.001  0.223
Ba-1        0.192  0.088  0.190  0.192  0.193  0.194   0.193   0.001  0.000  0.000  0.195
Ba-2        0.173  0.101  0.174  0.176  0.175  0.177   0.174   0.002  0.001  0.001  0.180
Ba-3        0.187  0.125  0.183  0.188  0.187  0.188   0.186   0.000  0.003  0.001  0.197

(a) In these methods, the adjustable parameter is set to 0.001.
(b) The adjustable parameter is set to 0.01.

Table 3 reports the average precision values of ERA and the other similarity indices. In 12 of the 15 datasets, the ERA index obtains the best performance; it is worse only than the CAR index in Jazz and PB, and than the CAR and LP indices in the Flight network. Unlike the AUC results, the precision of the local indices (CN, RA) is very close to that of the global indices, and even higher than that of LP and Katz in some highly clustered networks such as Jazz, USAir, Yeast, Infect and Email. By considering the local community, CAR achieves a good performance under the precision metric, even better than LP and Katz in some networks. In the Jazz, PB and Flight networks, the precision values of CAR are higher than those of ERA, which indicates that the longer paths passing through common neighbors (the local community in CAR) play a more important role in link prediction for these networks than those passing through non-common-neighbors. Surprisingly, some global indices, including ACT, Cos+ and MFI, perform worse than expected in all the datasets, partly because these indices pay more attention to AUC and ignore the standard metric precision. Compared with the other indices, the RA index performs worse in the synthetic dynamic networks than in the real networks. Nevertheless, by considering longer paths, ERA improves on the performance of RA in the synthetic networks. Besides, the complexity of ERA lies between $O(N \langle k \rangle^2)$ (RA) and $O(N \langle k \rangle^3)$ (LP). For practical prediction under the precision metric, we recommend setting the parameter $\sigma$ to around 1.1 for common networks, and to around 0 (such as 0.001) for highly clustered networks.

6. Conclusions and discussions

Similarity indices based on topological structure play an important role in link prediction. Motivated by the potential resource transferred through longer paths, an extended resource allocation index is proposed. The ERA index considers all the neighbors that can transfer resources, and achieves good performance by adjusting the proportion of resource allocated to longer paths. In the twelve real networks and three synthetic dynamic networks, the AUC values of ERA, and in most networks the precision values as well, increase suddenly around $\sigma = 0$ (RA), which indicates that considering the potential resource transferred by longer paths can effectively improve the prediction accuracy of RA. By varying the parameter $\sigma$, an optimal prediction value can be found for each network. As can be seen from the AUC and precision results, the local indices (CN and RA) are more suitable for networks with smaller average distance or higher clustering coefficient, where they even perform better than the global indices. In contrast, the global indices perform well in networks with larger average distance or lower clustering coefficient, because they consider all the path information. By considering the resource exchange between endpoints, the ERA index finds a tradeoff between the node degree of neighbors and the potential paths for different networks through an adjustable parameter. This indicates that the growth mechanism of an edge is closely related to the node degree of the neighbors that lie on the paths between the two endpoints of the new edge, which is of great significance for understanding the network evolution mechanism. In addition, many indices pay more attention to the standard metric AUC and ignore precision. However, precision also plays an important role in measuring the prediction accuracy for some real networks such as protein-protein interaction networks. The ERA index achieves high values under both standard metrics, AUC and precision. Due to its good performance on datasets with different clustering coefficients and its low time complexity, the ERA index can be applied to many more real networks, especially large-scale networks.

Acknowledgements

This work is partially supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 61521003) and the National High Technology Research and Development Program of China (No. SS2015AA011306).

References

[1] A.-L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (1999) 509-512.
[2] S. Aral, D. Walker, Identifying influential and susceptible members of social networks, Science 337 (2012) 337-341.
[3] E. Bullmore, O. Sporns, Complex brain networks: graph theoretical analysis of structural and functional systems, Nat. Rev. Neurosci. 10 (2009) 186-198.
[4] F. Schweitzer, G. Fagiolo, D. Sornette, F. Vega-Redondo, A. Vespignani, D.R. White, Economic networks: The new challenges, Science 325 (2009) 422.
[5] L. Sun, L. Liu, Z. Xu, et al., Locating inefficient links in a large-scale transportation network, Physica A 419 (2015) 537-545.
[6]. P. Wang, B. Xu, Y. Wu, X. Zhou, Link prediction in social networks: the state-of-
2
the-art, Sci. China Inform. Sci. 58 (2015) 1-38.
3
[7]. A. Clauset, C. Moore, M.E. Newman, Hierarchical structure and the prediction of
4
missing links in networks, Nature 453 (2008) 98-101.
5
[8]. C. Von Mering, L.J. Jensen, B. Snel, S.D. Hooper, M. Krupp, M. Foglierini, N.
6
Jouffre, M.A. Huynen, P. Bork, STRING: known and predicted protein-protein
7
associations, integrated and transferred across organisms, Nucleic Acids Res. 33 (2005)
8
D433-D437.
9
[9]. S. Scellato, A. Noulas, C. Mascolo, Exploiting place features in link prediction on
10
location-based social networks, in: Proceedings of the 17th ACM SIGKDD
11
international conference on Knowledge discovery and data mining, ACM, 2011, pp.
12
1046-1054.
13
[10]. P. Zhang, A. Zeng, Y. Fan, Identifying missing and spurious connections via the
14
bi-directional diffusion on bipartite networks, Phys. Lett. A 378 (2014) 2350-2354.
15
[11]. F. Lorrain, H.C. White, Structural equivalence of individuals in social networks, J.
16
Math. Sociol 1 (1971) 49-80.
17
[12]. L.A. Adamic, E. Adar, Friends and neighbors on the web, Soc. Netw 25 (2003)
18
211-230.
19
[13]. T. Zhou, L. Lü, Y.-C. Zhang, Predicting missing links via local information, Eur.
20
Phys. J. B 71 (2009) 623-630.
21
[14]. L. Katz, A new status index derived from sociometric analysis, Psychometrika 18
22
(1953) 39-43.
23
[15]. X. Zhu, H. Tian, S. Cai, Predicting missing links via effective paths, Physica A
24
413 (2014) 515-522.
18
Extended resource allocation index for link prediction of complex network
1
[16]. X. Zhu, H. Tian, S. Cai, J. Huang, T. Zhou, Predicting missing links via significant
2
paths, Europhys. Lett. 106 (2014) 18008.
3
[17]. D.J. Klein, M. Randic, Resistance distance, J. Math. Chem. 12 (1993) 81.
4
[18]. F. Fouss, A. Pirotte, J.-M. Renders, M. Saerens, Random-walk computation of
5
similarities between nodes of a graph with application to collaborative recommendation,
6
IEEE Trans. Knowl. Data. Eng. 19 (2007) 355.
7
[19]. G. Jeh, J. Widom, SimRank: a measure of structural-context similarity, in:
8
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery
9
and Data Mining, ACM Press, New York, 2002, pp. 271–279.
10
[20]. L. Lü, C.-H. Jin, T. Zhou, Similarity index based on local paths for link prediction
11
of complex networks, Phys. Rev. E 80 (2009) 046122.
12
[21]. X. Feng, J. Zhao, K. Xu, Link prediction in complex networks: a clustering
13
perspective, Eur. Phys. J. B 85 (2012) 1-9.
14
[22]. L. Lü, T. Zhou, Link prediction in complex networks: A survey, Physica A 390
15
(2011) 1150-1170.
16
[23]. J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative
17
filtering recommender systems, ACM Trans. Inf. Syst. 22 (2004) 5-53.
18
[24]. P.M. Gleiser, L. Danon, Community structure in jazz, Adv. Complex Syst. 6
19
(2003) 565-573.
20
[25]. R.E Ulanowicz, D.L DeAngelis, Network analysis of trophic dynamics in south
21
florida ecosystems, US Geological Survey Program on the South Florida Ecosystem
22
(2005) 114.
23
[26]. V Batagelj, A Mrvar, Pajek-program for large network analysis, Connections
24
21(1998) 47-57.
19
Extended resource allocation index for link prediction of complex network
1
[27]. L. Lü, L. Pan, T. Zhou, Y.-C. Zhang, H.E. Stanley, Toward link predictability of
2
complex networks, Proc. Natl. Acad. Sci. 112 (2015) 2325-2330.
3
[28]. D. Bu, Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, N.
4
Zhang, G. Li, R. Chen, Topological structure analysis of the protein-protein interaction
5
network in budding yeast, Nucleic Acids Res. 31 (2003) 2443-2450.
6
[29]. L.A. Adamic, N. Glance, The political blogosphere and the 2004 US election:
7
divided they blog, in: Proceedings of the 3rd international workshop on Link discovery,
8
ACM, 2005, pp. 36-43.
9
[30]. T. Opsahl, F. Agneessens, J. Skvoretz, Node centrality in weighted networks:
10
Generalizing degree and shortest paths, Soc. Netw 32 (2010) 245-251.
11
[31]. L. Isella, J. Stehlé, A. Barrat, C. Cattuto, J. F. Pinton, W. Van den Broeck, What's
12
in a crowd? Analysis of face-to-face behavioral networks, J. Theor. Biol. 271 (2011)
13
166-180.
14
[32]. S. Gopal, The evolving social geography of blogs, H. Miller, Ed. Berlin: Springer,
15
2007, pp. 275-294.
16
[33]. R. Michalski, S. Palus, P. Kazienko, Matching organizational structure and social
17
network extracted from email communication, in: Business Information Systems,
18
Springer, 2011, pp. 197-206.
19
[34]. Tore Opsahl and Pietro Panzarasa, Clustering in weighted networks, Soc. Netw,
20
31(2009) 155-163.
21
[35]. R. M. Ewing, P. Chu, F. Elisma, H. Li, P.Taylor, S. Climie, L. M.- Cerajewski, M.
22
D. Robinson, L. O'Connor, M. Li, R. Taylor, M. Dharsee, Y. Ho, A. Heilbut, L. Moore,
23
S. Zhang, O. Ornatsky, Y. V. Bukhman, M. Ethier, Y. Sheng, J. Vasilescu, M. A.-
24
Farha, J. P. Lambert, H. S Duewel, I. I Stewart, B. Kuehl, K. Hogue, K. Colwill, K.
20
Extended resource allocation index for link prediction of complex network
1
Gladwish, B. Muskat, R. Kinach, S.- L. Adams, M. F Moran, G. B Morin, T.
2
Topaloglou, D. Figeys, Large-scale mapping of human protein-protein interactions by
3
mass spectrometry, Mol. Syst. Biol. 3 (2007).
4
[36]. C.V. Cannistraci, G Alanis-Lobato, T Ravasi, From link-prediction in brain
5
connectomes and protein interactomes to the local-community-paradigm in complex
6
networks, Sci. Rep. 3 (2013).
7
[36]. P. Chebotarev, E.V. Shamis, The matrix-forest theorem and measuring relations in
8
small social groups, Autom. Remote Control 58 (1997) 1505.
9
[37]. M. E. Newman, Assortative mixing in networks, Phys. Rev. Lett. 89 (2002)
10
208701.
11
[39]. D.J. Watts, S.H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature
12
393 (1998) 440-442.
13
[40]. Q. Ou, Y.-D. Jin, T. Zhou, B.-H. Wang, B.-Q. Yin, Power-law strength-degree
14
correlation from resource-allocation dynamics on weighted networks, Phys. Rev. E 75
15
(2007) 021102.
21