Journal Pre-proof Identifying influential spreaders in complex networks based on improved k-shell method Min Wang, Wanchun Li, Yuning Guo, Xiaoyan Peng, Yingxiang Li
PII: S0378-4371(20)30055-8
DOI: https://doi.org/10.1016/j.physa.2020.124229
Reference: PHYSA 124229
To appear in: Physica A
Received date : 16 January 2019 Revised date : 15 January 2020 Please cite this article as: M. Wang, W. Li, Y. Guo et al., Identifying influential spreaders in complex networks based on improved k-shell method, Physica A (2020), doi: https://doi.org/10.1016/j.physa.2020.124229. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2020 Published by Elsevier B.V.
Identifying influential spreaders in complex networks based on improved k-shell method
Min Wang a, Wanchun Li a,∗, Yuning Guo a, Xiaoyan Peng a, Yingxiang Li b
a School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
b Department of Communication Engineering, Meteorological Information and Signal Processing Key Laboratory of Sichuan Higher Education Institutes of Chengdu University of Information Technology, Chengdu 611731, China
Abstract
Identifying influential spreaders in complex networks is a fundamental problem in network research. It has drawn great attention in recent years because of its theoretical significance and practical value in several fields. K-shell is an efficient method for identifying influential spreaders. However, k-shell neglects information about the topological position of the nodes. In this paper, we propose an improved algorithm based on the k-shell and node information entropy, named IKS, to identify influential spreaders from the higher shells as well as the lower shells. The proposed method is evaluated with the susceptible-infected-recovered (SIR) epidemic model, Kendall's coefficient τ, the monotonicity M, and the average shortest path length Ls, and is compared with other benchmark methods. The results of experiments on eight real-world networks show that the proposed method ranks the influential spreaders more accurately. Moreover, IKS has low computational complexity and can be extended to large-scale networks.
Keywords: Complex network, Influential spreaders, Improved k-shell, Node information entropy, SIR epidemic model
∗ Corresponding author. Email address: [email protected] (Wanchun Li)
Preprint submitted to Journal of Physica A
January 15, 2020
1. Introduction
With the rise of complex network research, the identification of influential spreaders in the network is critical for global message dissemination [1] and
effective broadcast of various news [2, 3, 4]. This research has been used in many applications, such as disease spreading control [5], information dissemination
[6, 7], virus spreading deceleration [8, 9], social network analysis[10, 11] and
new product campaign [12]. A basic question is how to identify influential
spreaders in complex networks [13]. A variety of methods have been reported to identify influential spreaders in complex networks [14, 15, 16, 17].
There are many classic centrality methods, such as degree centrality [18], closeness centrality [19], betweenness centrality [20], neighborhood-based centralities [21, 22], path-based centralities [23], local structural similarity [24], iterative refinement centralities [25], and so on [26]. The k-shell method proposed by Kitsak et al. [27] is the most widely used method [28]; it indicates that a node's location is one of the most essential factors in identifying the most efficient spreaders. However, the k-shell decomposition tends to assign an identical k-shell index to many nodes whose importance may differ [29]. Studies have shown that the nodes with the largest coreness value may not be the super spreaders [30, 31]. Moreover, as the k-shell index does not provide sufficient information about the topological locations of nodes [32], the influential spreaders identified by the k-shell method are located at the core of the graph, which exhibits the rich-club phenomenon [33]. Avoiding the rich-club phenomenon can achieve better spreading effects [34, 35].
Several studies have been devoted to improving the performance of the k-shell algorithm and avoiding the rich-club phenomenon. Hu et al. [36] identified multiple influential spreaders in community networks and found that the distance among spreaders plays an important role. Guo et al. [37] proposed an improved distance-based coloring method to identify influential spreaders. Zhang et al. [38] proposed a method to select spreader sources with the best spreading capacity, which can effectively avoid the rich-club phenomenon. Ren et al.
[29] developed a method to identify the spreading capability of nodes with the minimum k-shell value; the method is effective only for those minimum k-shell nodes. Bian et al. [39] proposed a measure based on node information dimension to identify influential nodes, in which the local dimensions at different topological distance scales for each node constitute the node information dimension, but its computational complexity is high. Sheikhahmadi et al. [40] defined a hybrid parameter based on the degree, structural location and dispersion of the neighbors to discover influential nodes. Recently, Zareie et al. [41] proposed an index that determines node centrality using the notions of Shannon entropy and Jensen-Shannon divergence. In this index, the influential node is specified based on the dispersion of its neighbors in the graph as well as the specification of their influence spread spheres on the network and the intensity of the influence. Additionally, several node ranking algorithms were proposed to improve the ranking performance [42, 43, 44, 45]. The influence of a node is determined jointly by its location and its neighbor nodes. A node belonging to a lower shell can sometimes be more powerful than nodes belonging to a higher shell [40]. Therefore, it is critical to design an effective method to rank node importance.
In this paper, we consider influential spreaders not only from the highest shell but also from the lower shells. In addition to the shell of a node, its local metric is considered. We present a selection method to identify influential nodes based on the k-shell method and the influence of the neighbor nodes. The SIR model is employed to simulate the epidemic spreading process on eight real-world networks to evaluate the effectiveness of the proposed method. Experimental results demonstrate that our new selection algorithm can identify the influential nodes more accurately than conventional centrality methods such as the degree centrality, k-shell, neighborhood coreness centrality, weight neighborhood centrality, improved k-shell indicator, and the mixed degree decomposition. Since the proposed method is based on the k-shell decomposition method, which has low computational complexity, it can be applied to large-scale networks.
The remainder of this paper is organized as follows. Section 2 briefly reviews the conventional centrality methods used for comparison. The proposed improved k-shell (IKS) method is described in Section 3. Section 4 presents the data description, evaluation methods and experimental results. Finally, the conclusion and future work are presented in Section 5.
2. Related Works
Suppose an undirected and unweighted complex network is represented as a graph $G(V, E)$, where $N = |V|$ and $M = |E|$ denote the numbers of nodes and edges, respectively. $A = (a_{ij})_{N \times N}$ denotes the adjacency matrix, where $a_{ij} = 1$ if node $v_i$ is connected to node $v_j$ and $a_{ij} = 0$ otherwise. We use $\Gamma(v)$ to denote the set of neighbors of node $v$.
2.1. Degree centrality
The degree centrality (DC) [18] is the simplest index for measuring a node's influence. The larger the degree of a node, the greater its influence, since it can spread its impact to more neighbor nodes. The normalized degree centrality is defined as:
$$ DC(i) = \frac{k_i}{N - 1} \tag{1} $$
where $N$ is the number of nodes in $G$ and $k_i$ is the degree of node $v_i$, defined as the number of directly connected neighbors of node $v_i$.
2.2. K-shell method
The k-shell decomposition method quantifies nodal significance by dividing all nodes into different shells [27]. The algorithm first removes all nodes of degree 1 and proceeds iteratively until no node with degree 1 remains in the network. All nodes and edges removed in this way form the 1-shell. Repeating this procedure for degree 2, 3, and so on yields the 2-shell, the 3-shell, and so on. Finally, each node belongs to a specific shell, within which all nodes have the same coreness value. Accordingly, the nodes within the same shell are considered to have the same importance and the same spreading capability. Thus, k-shell is a coarse-grained index of node importance, since it neglects all information about the removed nodes and edges, and its results are not suitable for some real-world networks.
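The peeling procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `k_shell` and the adjacency-dictionary input format are assumptions made here, and isolated nodes (which the text does not discuss) are simply assigned shell 0.

```python
def k_shell(adj):
    """Return {node: k-shell index} for an undirected graph.

    `adj` maps each node to the set of its neighbors (assumed input format).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}  # residual degrees
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The next shell index is the smallest residual degree still present.
        k = max(k, min(degree[v] for v in remaining))
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via a duplicate entry
            remaining.discard(v)
            shell[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        peel.append(w)  # removal cascades within shell k
    return shell


# Hypothetical toy graph: a triangle a-b-c with a chain a-d-e attached.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
toy_shells = k_shell(toy)
```

In the toy graph, removing `e` lowers the residual degree of `d` to 1, so `d` is also peeled into the 1-shell even though its original degree is 2. This cascade is what distinguishes the k-shell index from the plain degree.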
2.3. Mixed degree decomposition method
The mixed degree decomposition (MDD) method was proposed by Zeng et al. [23] to improve the precision of k-shell. It modifies the decomposition process by considering both the residual degree $k_r$ and the exhausted degree $k_e$. Let $k_m(v)$ denote the mixed degree of node $v$. The larger the value of $k_m$, the greater the influence of the node on the network. The mixed degree of node $v$ is defined as:
$$ k_m(v) = k_r + \lambda \cdot k_e \tag{2} $$
where $\lambda$ is a tunable parameter between 0 and 1. Note that the MDD method coincides with the k-shell method when $\lambda = 0$ and is equivalent to the degree centrality when $\lambda = 1$. The network is decomposed into layers according to the mixed degree, but it is difficult to find the optimal value of $\lambda$.
2.4. Improved k-shell indicator
The improved k-shell indicator [46] is obtained by taking into account the shortest distance from a target node to the network core, which is defined as the set of nodes with the highest k-shell value. Nodes with the same k-shell value can then be distinguished by Eq. (3):
$$ \theta(v) = (ks_{max} - ks(v) + 1) \sum_{w \in S_c} d(v, w) \tag{3} $$
where $ks_{max}$ is the largest k-shell index in the network, $S_c$ is the set of core nodes with the highest k-shell index $ks_{max}$, and $d(v, w)$ is the shortest distance from node $v$ to $w \in S_c$. The larger the value of $\theta$, the farther the node lies from the network core, which indicates that the node is less important. Although this method can differentiate the nodes within a shell, computing the shortest distances to the core nodes is relatively expensive.
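As an illustration of Eq. (3) and of where its cost comes from, the following sketch runs one breadth-first search per core node. It assumes a connected graph given as an adjacency dictionary and a precomputed k-shell index `ks`; the function names are hypothetical and not taken from [46].

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def improved_k_shell_indicator(adj, ks):
    """theta(v) = (ks_max - ks(v) + 1) * sum of d(v, w) over core nodes w.

    Assumes the graph is connected, so every distance is defined.
    """
    ks_max = max(ks.values())
    core = [v for v in adj if ks[v] == ks_max]
    total = {v: 0 for v in adj}
    for w in core:  # one BFS per core node: this loop dominates the cost
        dist = bfs_distances(adj, w)
        for v in adj:
            total[v] += dist[v]
    return {v: (ks_max - ks[v] + 1) * total[v] for v in adj}
```

With $|S_c|$ core nodes the sketch performs $|S_c|$ traversals of the whole graph, which is consistent with the remark above that the indicator is comparatively expensive when the core is large.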
2.5. Neighborhood coreness centrality
The neighborhood coreness centrality was proposed by Bae et al. [21] to estimate the spreading influence of a node in a network by summing the k-shell values of all its neighbors. A high neighborhood coreness value indicates that a spreader has more connections to neighbors located in the core of the network. The neighborhood coreness of node $v$ is defined as:
$$ C_{nc}(v) = \sum_{w \in N(v)} ks(w) \tag{4} $$
where $N(v)$ is the set of neighbors adjacent to node $v$ and $ks(w)$ is the k-shell index of its neighbor node $w$. Furthermore, the extended neighborhood coreness $C_{nc+}$ of node $v$ is calculated by:
$$ C_{nc+}(v) = \sum_{w \in N(v)} C_{nc}(w) \tag{5} $$
where $C_{nc}(w)$ is the neighborhood coreness of neighbor $w$.
2.6. Weight neighborhood centrality
The weight neighborhood centrality was proposed [22] to combine the centrality of a node with the centrality of its neighbors when generating the influence ranking list. The diffusion importance of an edge is used to adjust the contribution of each neighbor's centrality. The weight neighborhood centrality of node $i$ uses an edge weighting method based on a power-law function of degree [47, 48] to quantify the diffusion importance of edges, and is defined as:
$$ C_i(\varphi) = \varphi_i + \sum_{j \in \Gamma(i)} \frac{w_{ij}}{\langle w \rangle} \cdot \varphi_j \tag{6} $$
where $\varphi$ is the benchmark centrality, for which we use k-shell in this paper, $\Gamma(i)$ is the set of nearest neighbors of node $i$, $w_{ij}$ is the diffusion importance of edge $e_{ij}$, and $\langle w \rangle$ is the average diffusion importance of all edges. This equation means that the weight neighborhood centrality encodes the centrality of a node and of its neighbors, weighted by the diffusion importance of the links. The effect of a neighbor increases with the diffusion importance of the link between it and the origin node.
2.7. Mixed Core, Degree and Entropy
The mixed core, degree and entropy (MCDE) measure was proposed based on node degree and topological location as well as the diversity of the neighbors in different shells. Sheikhahmadi et al. [40] defined the weighted entropy in Eq. (7) to determine the diversity of the neighbors of node $v$ in the graph, given the significance of neighboring nodes with greater k-shell values in determining the influence of a node:
$$ Entropy(v) = -\sum_{i=1}^{ks_{max}} p_i \log_2 p_i \tag{7} $$
where $p_i$ is the probability that a friend of node $v$ is present in the $i$th core, which is calculated by Eq. (8):
$$ p_i = \frac{Count(v\text{'s friends in core } i)}{k(v)} \tag{8} $$
In this approach, MCDE (the mixed core, degree and entropy) is defined as:
$$ MCDE(v) = \alpha\, ks(v) + \beta\, k(v) + \gamma\, Entropy(v) \tag{9} $$
In Eq. (9), $ks(v)$ is the core number of node $v$, $k(v)$ is the number of links of node $v$, and $Entropy(v)$ measures the dispersion of node $v$'s friends across different cores.
Three adjustable parameters, $\alpha$, $\beta$ and $\gamma$, adjust the effect of the core, degree and entropy of each node. To equate the effect of these measures, the three parameters are set equal ($\alpha = \beta = \gamma = 1$).
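Eqs. (7) to (9) can be illustrated with a short sketch. It assumes an adjacency dictionary and a precomputed k-shell index `ks`; the function name `mcde` and its keyword defaults are choices made here for illustration and are not taken from [40].

```python
import math
from collections import Counter


def mcde(adj, ks, alpha=1.0, beta=1.0, gamma=1.0):
    """MCDE(v) = alpha*ks(v) + beta*k(v) + gamma*Entropy(v), following Eq. (9)."""
    scores = {}
    for v, nbrs in adj.items():
        k_v = len(nbrs)
        # Count how many neighbors of v fall into each core (Eq. (8) numerator).
        counts = Counter(ks[w] for w in nbrs)
        # Cores containing no neighbor contribute nothing to the sum in Eq. (7).
        entropy = -sum((c / k_v) * math.log2(c / k_v) for c in counts.values())
        scores[v] = alpha * ks[v] + beta * k_v + gamma * entropy
    return scores
```

A node whose neighbors all lie in one core has zero entropy, so its score reduces to the sum of its core number and degree. The entropy term only rewards nodes whose neighbors are spread over several shells.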
3. The improved k-shell algorithm
As shown above, many methods have been proposed to rank nodes in networks, each with its own advantages and disadvantages. In information science, analyzing the uncertainty of information requires a quantitative measure of that information. Shannon proposed information entropy to represent the degree of uncertainty of information: the larger the entropy, the more uncertain the information. To measure the spreading contribution of neighbor nodes to a specific node, we extend the concept of entropy to complex networks and call it node information entropy. Here, the node information entropy captures the global structure in the network, and the larger the entropy, the more influential the node is. Although the concept of information entropy is mentioned in [49], that work did not consider location information. Assuming that the degree of node $v_i$ is $k_i$, the node importance of $v_i$ is
$$ I_i = \frac{k_i}{\sum_{j=1}^{N} k_j} $$
where $N$ is the number of nodes in $G$, so the node information entropy is defined as:
$$ e_i = -\sum_{j \in \Gamma(i)} I_j \cdot \ln I_j \tag{10} $$
where $\Gamma(i)$ is the set of neighbors of node $v_i$. Node information entropy takes the propagation effect of neighbor nodes into account: the larger the node information entropy, the more easily the node propagates influence to its neighbors, and so the more influential the node is. Node information entropy is a measure of disorder in the network. If the network is randomly connected, the node information entropy of each node is similar. Conversely, if the network is scale-free, there are a small number of nodes with high connectivity and a large number of nodes with low connectivity. Node importance then differs from node to node and the node information entropy is unevenly distributed, so the more important nodes have larger node information entropy. The IKS method is conducted using the following procedure:
Step 1: Decompose the network into shells according to the k-shell decomposition algorithm;
Step 2: Calculate the node information entropy $e_i$ of each node according to Eq. (10);
Step 3: Sort the nodes in each shell by node information entropy in descending order;
Step 4: In the shell with the highest k-shell value, select the node with the largest node information entropy. Then select the node with the largest node information entropy in the next highest shell. This process continues until a node has been selected in the 1-shell. At this point, the first iteration is finished;
Step 5: Repeat Step 4 on the remaining nodes until all nodes have been selected, ignoring any shell in which all nodes have already been selected. When several nodes in a shell have equal node information entropy, choose among them randomly.
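The five steps above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function names and the adjacency-dictionary input are assumptions, the graph is assumed to contain at least one edge, and ties in entropy are broken by node label instead of randomly so that the output is reproducible.

```python
import math
from collections import defaultdict


def k_shell(adj):
    """Step 1: k-shell index of every node, obtained by iterative peeling."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining, shell, k = set(adj), {}, 0
    while remaining:
        k = max(k, min(degree[v] for v in remaining))
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue
            remaining.discard(v)
            shell[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        peel.append(w)
    return shell


def node_information_entropy(adj):
    """Step 2: e_i = -sum over neighbors j of I_j * ln(I_j), as in Eq. (10)."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    importance = {v: len(nbrs) / total_degree for v, nbrs in adj.items()}
    return {
        v: -sum(importance[w] * math.log(importance[w]) for w in adj[v])
        for v in adj
    }


def iks_ranking(adj):
    """Steps 3 to 5: take one node per shell per round, highest shell first."""
    shell = k_shell(adj)
    entropy = node_information_entropy(adj)
    by_shell = defaultdict(list)
    for v in adj:
        by_shell[shell[v]].append(v)
    # Step 3: sort each shell by decreasing entropy.
    # Assumption: ties are broken by label, where the text chooses randomly.
    queues = [
        sorted(by_shell[k], key=lambda v: (-entropy[v], str(v)))
        for k in sorted(by_shell, reverse=True)
    ]
    ranking, position = [], 0
    while len(ranking) < len(adj):
        for queue in queues:  # exhausted shells are skipped (Step 5)
            if position < len(queue):
                ranking.append(queue[position])
        position += 1
    return ranking
```

Because each shell is sorted once and then consumed in order, the selection rounds add only a linear pass on top of the k-shell decomposition and the entropy computation.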
This method addresses the shortcomings of the original k-shell method. It further distinguishes the nodes within each shell according to the node information entropy. To avoid the problem that some influential spreaders are so close together that their spheres of influence overlap, only one node in each shell is selected per iteration. The IKS selects influential nodes not only from the higher shells but also from the lower shells, and nodes within the same shell are assigned different importance.
To give an intuitive explanation of our algorithm, we consider the following example. Fig. 1 [50] is a simple network with 26 nodes. First, we use the k-shell decomposition algorithm to decompose the network into 3 shells. As shown in Fig. 1, the node set {1,2,3,4} is in the 3-shell, {5,6,7,8} is in the 2-shell, and nodes 9 to 26 are in the 1-shell. We then calculate the node information entropy of all nodes in the three shells. Node 4 is taken as an example and calculated as follows.
Figure 1: An illustrative example of a simple network [50] (shell labels in the figure: K=1, K=2, K=3).
$$ I_4 = \frac{k_4}{\sum_{j=1}^{N} k_j} = \frac{4}{62} $$
$$ e_4 = -\sum_{j \in \Gamma(v_4)} I_j \cdot \ln I_j = -\left( \frac{5}{62}\ln\frac{5}{62} + \frac{8}{62}\ln\frac{8}{62} + \frac{4}{62}\ln\frac{4}{62} + \frac{8}{62}\ln\frac{8}{62} \right) = 0.9083 $$
The degree of node $v_4$ is 4, so the node importance of $v_4$ is $4/62$. The neighbors of node $v_4$ are {1, 2, 3, 23}, so the node information entropy of node $v_4$ is 0.9083. Similarly, the node information entropy of every node in the network is calculated, as shown in Table 1.
Table 1
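Result of node information entropy in each shell

| Shell   | Node                    | Node Information Entropy |
|---------|-------------------------|--------------------------|
| 3-shell | 2                       | 1.0882 |
| 3-shell | 1                       | 0.9715 |
| 3-shell | 4                       | 0.9083 |
| 3-shell | 3                       | 0.8209 |
| 2-shell | 5                       | 0.6004 |
| 2-shell | 8                       | 0.5572 |
| 2-shell | 6/7                     | 0.3750 |
| 1-shell | 23                      | 0.6870 |
| 1-shell | 17                      | 0.4108 |
| 1-shell | 11/12/16/18/19/20/21/22 | 0.2642 |
| 1-shell | 15                      | 0.2439 |
| 1-shell | 25                      | 0.2434 |
| 1-shell | 9/10/26                 | 0.1768 |
| 1-shell | 13/14                   | 0.1465 |
| 1-shell | 24                      | 0.1108 |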
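The value $e_4 = 0.9083$ obtained for node 4 can be checked numerically. The short sketch below only uses the quantities that appear in the worked calculation: the total degree 62 and the neighbor degrees 5, 8, 4 and 8. The variable names are chosen here for illustration.

```python
import math

total_degree = 62              # sum of all node degrees in the example network
neighbor_degrees = [5, 8, 4, 8]  # degrees of the four neighbors of node 4

# Eq. (10): sum I_j * ln(I_j) over the neighbors, with I_j = k_j / total degree.
e4 = -sum(
    (k / total_degree) * math.log(k / total_degree) for k in neighbor_degrees
)
print(round(e4, 4))  # 0.9083
```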
According to the IKS method, selection starts from the 3-shell: node 2 has the highest node information entropy there, so node 2 is selected first. Next, node 5 is selected in the 2-shell, and then node 23 in the 1-shell. Only three nodes are selected in this iteration, so the selection process continues. In the second iteration, node 1 in the 3-shell is selected first, then node 8 in the 2-shell, and finally node 17 in the 1-shell. This process continues until all nodes are selected. Note that in the third iteration, the choice between node 6 and node 7 is random, as is the choice among nodes 11, 12, 16, 18, 19, 20, 21 and 22, because their node information entropy is equal.
The ranking orders obtained by the different methods are summarized in Table 2. The result implies that the IKS ranking is not coarse-grained. Notably, the IKS is a hybrid measure that balances the local metric, node information entropy, and the global metric, k-shell. We argue that the IKS is likely to be effective for identifying influential nodes because it considers both the information entropy and the coreness of a spreader.
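The selection order described above can be reproduced from the entropy ordering in Table 1 alone. The sketch below lists each shell in decreasing order of node information entropy and takes one node per shell per iteration. As a simplifying assumption, nodes with equal entropy are kept in the order in which Table 1 lists them, whereas the method itself chooses among them randomly.

```python
from itertools import zip_longest

# Nodes of each shell, sorted by decreasing node information entropy (Table 1).
shells = {
    3: [2, 1, 4, 3],
    2: [5, 8, 6, 7],
    1: [23, 17, 11, 12, 16, 18, 19, 20, 21, 22, 15, 25, 9, 10, 26, 13, 14, 24],
}

# One node per shell per iteration, highest shell first.
# zip_longest pads exhausted shells with None, which is filtered out.
ordered = [shells[k] for k in sorted(shells, reverse=True)]
ranking = [
    node
    for iteration in zip_longest(*ordered)
    for node in iteration
    if node is not None
]
print(ranking[:6])  # [2, 5, 23, 1, 8, 17]
```

After the fourth iteration the 3-shell and the 2-shell are exhausted, so the remaining positions in the ranking are filled from the 1-shell only.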
Table 2
The order of spreaders revealed by different ranking methods: DC, KS, MDD, θ, Cnc, Cnc+, C(ks), MCDE and IKS.
Rank
DC
KS
1
2,23
2
1
5,6,7,8
3
3,4,5,8
others
4
MDD
θ
Cnc
Cnc+
C(ks)
MCDE
IKS
1,2,3,4
2
2
2
2
2
1
5,8
1
1
4
23
5
3,4
6,7
3
4
1
1
23
15
5,8
11,12,23
4,23
3
3
3,4
1
5
6,7,17,25
15
9,10,25,26
5,8
5
5
5,8
8
6
others
6,7,17
16-22
6,7
8
23
6,7
17
7
25
24
11,12,15,25
6,7
8
15,25
4
8
others
15
9,10,17,26
23
17
17
6
13,14
others
11,12
6,7
others
11
17
11,12,16
1,2,3,4 2,23
9
11
12 13 14 15 16
17
10
18-22
16
25
7
15
12,16
18-22 25
18-22 9,10,26
9,10,26
15
15
13,14
25
13,14,24
24
9,10,26
4. Experiment and analysis
Performance evaluation was conducted for the proposed method and the conventional centrality methods DC, k-shell, MDD, θ, Cnc, Cnc+, MCDE and C(ks).
3
13,14 24
4.1. Datasets
Eight real networks are employed:
(1) Jazz: a network of jazz bands that performed between 1912 and 1940 [51];
(2) USAir: the American airline network in 1997, in which nodes are American airports and edges denote airlines between airports [52];
(3) EEC: email exchanges between members of a large European research institution [53];
(4) email: mail exchanges between users from University Rovira I Virgili (Tarragona) [54];
(5) Hamsterster: friendships and family links between users of the website hamsterster.com [55];
(6) Power: the power grid of the Western States of the United States of America [56];
(7) PGP: an encrypted communication network. Pretty-Good-Privacy algorithms were developed to maintain privacy between peers, so the network is also called the PGP web of trust [57];
(8) Sex: a bipartite network in which nodes are females and males, and links between them are established when males write posts indicating sexual encounters with females [58].
Table 3 shows the statistical features of the above-mentioned networks, where $n$ is the number of nodes, $m$ is the number of edges, $\langle k \rangle$ denotes the average degree, $c$ is the clustering coefficient, $\langle d \rangle$ is the average shortest path length, $\beta_{min}$ is the epidemic threshold in the SIR epidemic model, and $\beta$ is the infection probability.
Table 3
The statistical features of the eight real complex networks [59]

| Network     | n     | m     | ⟨k⟩     | c      | ⟨d⟩     | βmin   | β    |
|-------------|-------|-------|---------|--------|---------|--------|------|
| Jazz        | 198   | 2742  | 27.6970 | 0.6175 | 2.2350  | 0.0266 | 0.05 |
| USAir       | 332   | 2461  | 12.8072 | 0.7494 | 2.7381  | 0.0231 | 0.03 |
| EEC         | 986   | 16064 | 32.5842 | 0.4505 | 2.5869  | 0.0136 | 0.02 |
| email       | 1133  | 5451  | 9.6222  | 0.2540 | 3.6060  | 0.0530 | 0.08 |
| Hamsterster | 2426  | 16631 | 13.711  | 0.231  | 3.67    | 0.022  | 0.04 |
| Power       | 4941  | 6594  | 2.6691  | 0.1065 | 18.9892 | 0.3483 | 0.35 |
| PGP         | 10680 | 24316 | 4.5536  | 0.2659 | 7.463   | 0.0558 | 0.06 |
| Sex         | 15810 | 38540 | 4.8754  | 0      | 5.7846  | 0.0365 | 0.04 |
4.2. Evaluation criteria
This section introduces the evaluation criteria used to verify the effectiveness of the proposed IKS method. They are the SIR epidemic model, the monotonicity relation, the correlation coefficient, and the average shortest path length.
4.2.1. SIR epidemic model
At present, most scholars adopt the standard susceptible-infected-recovered (SIR) model [60] to measure the spreading scale of information and viruses. In the SIR epidemic model, nodes have three states:
(i) Susceptible (S): nodes that are susceptible to infection but have not been infected;
(ii) Infected (I): nodes that are already infected and infect their susceptible neighbors with probability β;
(iii) Recovered (R): nodes that were infected, have recovered with probability µ, and will not be infected again.
At the initial time, a group of seed nodes is infected and all other nodes are susceptible. At each time step, every infected node makes contact with its neighbors and each susceptible neighbor is infected with probability β. Then each infected node enters the recovered state with probability µ and can no longer be infected. The spreading process stops when there is no infected node in the network. We set µ = 0.01 in this paper. The proportion of infected nodes is denoted by S(t). The epidemic threshold is expressed as $\beta_{min} = \langle k \rangle / \langle k^2 \rangle$ [61].
In the SIR simulation, the infection probability β should be neither too small nor too large. If β is too small, the epidemic cannot spread over the network, so the spreading capability of each node cannot be measured. If β is too large, each node infects others so easily that the epidemic breaks out over almost the whole network, which is not conducive to distinguishing the influence of individual nodes. To ensure normal transmission, we choose β slightly larger than $\beta_{min}$; the values of the epidemic threshold $\beta_{min}$ and the infection probability β are presented in the seventh and eighth columns of Table 3. The proportion of infected nodes at time t, denoted by S(t), can be regarded as an indicator of the influence of the initially infected nodes at time t. S(t) increases with t and eventually reaches a stable value, denoted by S, which represents the eventual influence of the initially infected nodes. Thus S(t) evaluates the influence of the initially infected nodes at time t, and S evaluates their eventual influence.
4.2.2. Monotonicity relation
To quantify the resolution of different ranking methods, a monotonicity index $M(R)$ for a ranking list $R$ is used [20]:
$$ M(R) = \left[ 1 - \frac{\sum_{r \in R} N_r (N_r - 1)}{N (N - 1)} \right]^2 \tag{11} $$
where $N$ is the size of the ranking vector $R$ and $N_r$ is the number of nodes with the same rank index value $r$. This metric quantifies the fraction of nodes that share a rank in the ranking list, with $M(R) \in [0, 1]$. If $M(R) = 1$, the ranking method is perfectly monotonic and each node is assigned a different index value. If $M(R) = 0$, all nodes have the same rank. A larger value of $M$ indicates that the ranking list $R$ distinguishes the nodes better.
4.2.3. Correlation coefficient
We adopt Kendall's coefficient τ [62] to quantify the correlation between a ranking list and the spreading ability. Kendall's coefficient τ is defined as:
$$ \tau = \frac{N_c - N_d}{N (N - 1) / 2} \tag{12} $$
where $N_c$ and $N_d$ are the numbers of concordant and discordant pairs, respectively, and $N$ is the number of network nodes. Let $(x_i, y_i)$ and $(x_j, y_j)$ be a pair of joint observations from two ranking lists $X$ and $Y$. If $x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$, the observations $(x_i, y_i)$ and $(x_j, y_j)$ are said to be concordant. If $x_i > x_j$ and $y_i < y_j$, or $x_i < x_j$ and $y_i > y_j$, they are said to be discordant. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant. A larger correlation coefficient implies a more concordant relation between the two ranking lists.
4.2.4. Average shortest path length Ls
We can also use the average shortest path length $L_s$ between infected sources to evaluate the performance of different methods [38]. In this paper, the average shortest path length $L_s$ between each pair of source spreaders in $S$ is used as an evaluation metric. It is defined as:
$$ L_s = \frac{1}{|S| (|S| - 1)} \sum_{\substack{u, v \in S \\ u \neq v}} l_{u,v} \tag{13} $$
where $S$ is the selected spreader set, $|S|$ denotes the number of spreaders in $S$, and $l_{u,v}$ denotes the length of the shortest path from node $u$ to node $v$. This index measures the distance among the infection sources. Assume that each node has the same propagation ability. If the infected sources are relatively dispersed, better propagation effects can be achieved. Taking the network of Figure 1 as an example, Figure 2(a) shows the spreading scale of the four infected sources selected by the k-shell method, and Figure 2(b) shows the propagation effect of the four infected sources selected by the IKS method. Blue nodes represent the sources of infection, and red circles represent the reach of the infected sources. For convenience, it is assumed that the ability of each node is equal. In Figure 2(a), four nodes {5,6,7,8} are infected. In Figure 2(b), ten nodes {3,4,6,16,17,18,19,20,21,22} are infected. It can be seen that the IKS method infects more nodes than the k-shell method. Moreover, the IKS method prevents the infected sources from being too close to each other, which reduces the overlap of their propagation and improves the overall propagation effect.
Figure 2: Schematic diagram of the propagation effect. (a) K-shell method. (b) IKS method.
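The three ranking criteria of Sections 4.2.2 to 4.2.4 can be sketched together as follows. The function names and the dictionary-based inputs are assumptions made for illustration. The Kendall computation is the direct pairwise form of Eq. (12), which costs $O(N^2)$ and is meant to show the definition, not to be efficient. The sketch assumes at least two nodes, at least two sources, and a connected graph.

```python
from collections import Counter, deque
from itertools import combinations


def monotonicity(scores):
    """Eq. (11): M(R) for a {node: rank value} mapping."""
    n = len(scores)
    tied = sum(c * (c - 1) for c in Counter(scores.values()).values())
    return (1 - tied / (n * (n - 1))) ** 2


def kendall_tau(x, y):
    """Eq. (12): x and y map the same nodes to two scores (e.g. rank and S)."""
    nodes = list(x)
    n = len(nodes)
    concordant = discordant = 0
    for u, v in combinations(nodes, 2):
        sign = (x[u] - x[v]) * (y[u] - y[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1  # pairs tied in x or y count as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


def average_source_distance(adj, sources):
    """Eq. (13): mean shortest-path length over ordered pairs of sources."""
    total = 0
    for u in sources:
        dist = {u: 0}
        queue = deque([u])
        while queue:  # breadth-first search from source u
            a = queue.popleft()
            for b in adj[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        total += sum(dist[v] for v in sources if v != u)
    s = len(sources)
    return total / (s * (s - 1))
```

For example, a ranking that gives every node a distinct value has monotonicity 1, while two sources at the ends of a three-node path have an average source distance of 2.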
4.3. Experimental analysis
This section reports the experimental results of the different algorithms. With the infection probability β fixed, we compare their effectiveness on eight real-world networks. In Figure 3, we plot the spreading scale as a function of infected time for nine methods on eight networks under the SIR model.
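The SIR simulation of Section 4.2.1 that underlies these curves can be sketched as below. This is a minimal illustration, not the code used for the experiments. The function names are hypothetical, and S(t) is taken here as the fraction of nodes that are infected or recovered at step t, an assumption chosen because the text describes S(t) as non-decreasing.

```python
import random


def sir_spread(adj, seeds, beta, mu, steps, rng=None):
    """Run one SIR realization and return [S(0), S(1), ..., S(steps)]."""
    rng = rng or random.Random()
    infected, recovered = set(seeds), set()
    n = len(adj)
    history = [(len(infected) + len(recovered)) / n]
    for _ in range(steps):
        # Each infected node tries to infect each susceptible neighbor.
        newly_infected = {
            w
            for u in infected
            for w in adj[u]
            if w not in infected and w not in recovered and rng.random() < beta
        }
        # Then each currently infected node recovers with probability mu.
        newly_recovered = {u for u in infected if rng.random() < mu}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        history.append((len(infected) + len(recovered)) / n)
    return history


def mean_spread(adj, seeds, beta, mu, steps, runs=100, seed=0):
    """Average S(t) over independent realizations (100 in the experiments)."""
    rng = random.Random(seed)
    totals = [0.0] * (steps + 1)
    for _ in range(runs):
        for t, s in enumerate(sir_spread(adj, seeds, beta, mu, steps, rng)):
            totals[t] += s
    return [value / runs for value in totals]
```

A curve such as those in Figure 3 would then correspond to calling `mean_spread` with the top 20% of nodes of a given ranking as `seeds`, the β value from Table 3, and µ = 0.01.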
Figure 3: Comparison of the spreading scale S(t) as a function of infected time t of nine methods on eight networks (panels: Jazz, USAir, EEC, email, Hamsterster, Power, PGP, Sex; x-axis: infected time t; y-axis: S(t)). The number of infected source nodes is set to 20% of the total nodes n. Results are obtained by averaging over 100 independent implementations.
Figure 3 shows the proportion of infected nodes S(t) as a function of infected time t. The X-axis is the infected time t, which ranges from 0 to 16, and the Y-axis is the proportion of infected nodes in the network. As can be concluded from the figure, the number of total infected nodes increases with time and ultimately reaches a steady value. At each time step, our method outperforms all the other well-known centrality measures in the number of infected nodes S(t). Moreover, the steady value of our proposed method is the highest. The reason is that the top 20% important nodes ranked by IKS may be more scattered across the whole network. With the other methods, the propagation scopes of the individual source nodes overlap with each other, as in Figure 2(a).
Figure 4: Comparison of the spreading scale S as a function of the proportion of initial infected nodes on eight networks (panels: Jazz, USAir, EEC, email, Hamsterster, Power, PGP, Sex; x-axis: proportion of initial infected nodes; y-axis: S). The node proportion of source spreaders ranges from 0 to 0.2. Results are obtained by averaging over 100 independent implementations.
To evaluate the effect of the proportion of source spreaders, we simulate the epidemic spreading on eight networks by varying the proportion of initial infected nodes from 0 to 0.2. Fig.4 shows the eventual influence S for different proportions of initial infected nodes. The X-axis is the proportion of initial infected nodes, and the Y-axis is the proportion of infected nodes in the network after t = 20 propagation steps, up to the stable state. Fig.4 shows that IKS gives better results than the other measures, especially in Jazz, USAir, EEC, email and Hamsterster. This illustrates that, from the infected sources obtained by the IKS method, information spreads faster and eventually reaches a larger scale than with the other methods. It is reasonable that the community leaders of relatively small clusters may be located in lower k-shells than the community leaders of larger clusters. The IKS method can capture the spreading capability of influential nodes in the lower shells when the community structure is obvious.
Then Kendall's coefficient τ is used to measure the correlation between the ranking result and the spreading ability. The total number of infected and recovered nodes is defined as the spreading ability $\sigma_i$ under the influence of the node $v_i$. For accurate measurement, the actual spreading ability of node $v_i$ is $\bar{\sigma}_i = \frac{1}{M} \sum_{m=1}^{M} \sigma_i$, where M is the number of repeated experiments on the initial node. M is set to 100 in our paper. The higher the correlation, the more accurately the measure reflects the influence of the nodes. The ideal case, τ = 1, indicates that the method exactly reproduces the real influence ranking list.
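The two quantities in the preceding paragraph can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. The function names are hypothetical, and the tie handling (tied pairs count as neither concordant nor discordant, with all pairs in the denominator) is an assumption, since the section does not state which variant of Kendall's τ is used.

```python
from itertools import combinations


def mean_spreading_ability(sigma_runs):
    """Average the spreading ability of one node over M repeated runs."""
    return sum(sigma_runs) / len(sigma_runs)


def kendall_tau(x, y):
    """Kendall's tau between two equally long score lists.

    A pair (i, j) is concordant when x and y order it the same way and
    discordant when they order it oppositely. Tied pairs count as neither
    (an assumption about tie handling).
    """
    n = len(x)
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Here `x` would hold the centrality values of the nodes under one ranking method and `y` the corresponding averaged spreading abilities.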
Fig.5 shows the correlation between the different evaluation indexes and the actual influence $\bar{\sigma}_i$ on the email network. It can be seen that, while the result of the θ algorithm is negatively correlated with $\bar{\sigma}_i$, the other algorithms are positively correlated with $\bar{\sigma}_i$. The IKS method has a better correlation tendency in the email network. The correlation curves of the k-shell method in different networks are more divergent, which shows that there is a great difference in spreading ability between nodes with equal k-shell value. The Cnc algorithm and the IKS algorithm presented in this paper have a better correlation in different networks, but the IKS algorithm has an edge over the others in the ability to mine influential nodes.
Figure 5: Correlation analyses of different algorithms and actual transmission influence on the email network.
The rank correlation coefficient τ in different networks is summarized in Table 4. We can observe that our method outperforms the other methods in most cases. In the USAir network, Cnc is the highest and IKS ranks third. In the Power network, Cnc+ is the highest. Meanwhile, we observe that the IKS method is significantly correlated with the transmission ability, producing highly monotonic relations with the spreading power of nodes in the network.
Table 4: The correlation of different ranking measures with the spreading ability, measured by Kendall's τ.
Network τ(σ̄, DC) τ(σ̄, KS) τ(σ̄, MDD) τ(σ̄, θ) τ(σ̄, Cnc) τ(σ̄, Cnc+) τ(σ̄, C(ks)) τ(σ̄, MCDE) τ(σ̄, IKS)
0.8673
0.8617
0.7238
0.8652
0.7930
0.8235
0.8015
EEC
0.8886
0.9023
0.8971
email
0.7744
0.7913
0.7843
-0.7747
0.8369
0.8359
0.7653
0.8521
-0.7388
0.8319
0.7849
0.8161
0.7930
-0.8893
0.8908
0.9312
0.8846
0.7834
0.7975
0.8939
0.8822
0.8971
-0.9123
0.9069
0.9168
0.9251
0.8965
0.9283
0.8397
0.8418
0.8501
-0.8061
0.8958
0.9036
0.9081
0.8672
0.8717
Hamsterster Power PGP Sex
-0.6721
0.8080
0.8317
0.8642
0.8545
-0.8946
0.9334
0.8963
0.8925
0.7917
0.8959
-0.9125
0.9269
0.9354
0.9251
0.8914
0.9369
0.8832
0.8143
0.7721
0.9138
0.8610
0.8901
0.8418
0.9126
Jazz USAir
Next, we investigate the capability of the DC, K-shell, MDD, θ, Cnc, Cnc+, C(ks), MCDE and IKS methods to distinguish the spreading ability of nodes by means of the monotonicity M. For a specific centrality measure, nodes in the network are ranked according to their centrality values in descending order. Nodes with the same centrality value have the same rank. The monotonicity M of the different ranking methods is summarized in Table 5. From Table 5, we can see that the monotonicity of IKS is very prominent in all eight networks. Compared with the benchmark centralities, the IKS method gives the higher value of M in most cases. Moreover, M(IKS) is very close to 1 in all networks. Therefore, the IKS method can better distinguish the nodes' influence. Note that the monotonicity of the Cnc+ method is excellent too.
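As an illustration of how a monotonicity score of this kind can be computed, the Python sketch below uses the form commonly seen in this literature, M = [1 - Σ_r n_r(n_r - 1) / (n(n - 1))]^2, where n is the number of nodes and n_r is the number of nodes sharing rank r. That the paper uses exactly this form is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def monotonicity(scores):
    """Monotonicity of a ranking given one centrality score per node.

    Returns 1.0 when every node has a distinct score and 0.0 when all
    nodes share the same score (assumed definition, see the text above).
    """
    n = len(scores)
    if n < 2:
        return 1.0
    # Sum n_r * (n_r - 1) over groups of nodes with identical scores.
    tied = sum(c * (c - 1) for c in Counter(scores).values())
    return (1 - tied / (n * (n - 1))) ** 2
```

Under this definition, a coarse measure such as the raw k-shell index, which assigns the same value to many nodes, scores low, which is consistent with the low M(KS) values reported for Power, PGP and Sex.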
Table 5: The monotonicity M of different ranking methods.
Network      M(DC)   M(KS)   M(MDD)
Jazz         0.9659  0.7944  0.9937
USAir        0.8586  0.8114  0.8893
EEC          0.9571  0.9216  0.9691
email        0.8874  0.8088  0.9249
Hamsterster  0.8980  0.8714  0.9278
Power        0.5927  0.2460  0.7048
PGP          0.6193  0.4807  0.6706
Sex          0.6002  0.5288  0.6323
M(θ) M(Cnc) M(Cnc+) M(C(ks)) M(MCDE) M(IKS)
0.9982
0.9993
0.9994
0.9981
0.9640
0.9628
0.9945
0.9941
0.9179
0.9943
0.9968
0.9975
0.9998
0.9998
0.9774
0.9999 0.9995
0.9783
0.9839
0.9991
0.9989
0.9460
0.9347
0.9751
0.9855
0.9854
0.9523
0.9843
0.9604
0.7292
0.9419
0.9635
0.6671
0.9667
0.9856
0.8920
0.9852
0.9782
0.6753
0.9874
0.9980
0.9332
0.9957
0.9909
0.6469
0.9957
As mentioned in Section 1, the IKS method deals well with the "rich-clubs" problem. It also reduces the influence of overlapping neighbors when the set of infected sources is larger. To verify that the infected sources selected by our method are more scattered than those of the other eight methods, the average shortest path length Ls from [38] is evaluated for the sources obtained by IKS and by the other methods. Because of the high computational complexity of the distance calculation, we analyze the shortest path length on Jazz, USAir, EEC and email. Fig.6 compares the average shortest path length Ls obtained by the nine methods. The X-axis is the proportion of initial infected nodes, and the Y-axis is Ls. From Fig.6, we can see that the infected sources selected by the IKS method have a larger Ls than those of the other methods. Information can spread faster and more effectively when Ls is larger.
0.9994
0.9345
Figure 6: Average shortest path length Ls by nine methods under different proportions of source spreaders. Panels: (a) Jazz, (b) USAir, (c) EEC, (d) email.
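The quantity Ls plotted in Figure 6 can be sketched as the mean shortest-path distance over all pairs of selected sources. The Python sketch below is an illustration under that assumption, not the exact definition from [38]. The function names are hypothetical, the graph is treated as unweighted and undirected, and pairs of sources with no connecting path are skipped, which is a further assumption.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def average_source_distance(adj, sources):
    """Mean shortest-path length over all pairs of source nodes."""
    sources = list(sources)
    dist_from = {s: bfs_distances(adj, s) for s in sources}
    total = 0
    pairs = 0
    for u, v in combinations(sources, 2):
        if v in dist_from[u]:  # skip disconnected pairs (assumption)
            total += dist_from[u][v]
            pairs += 1
    return total / pairs if pairs else 0.0
```

One breadth-first search per source is needed, which is why the cost grows quickly with the size of the source set and of the network.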
4.4. Computational complexity analysis
In this section, the efficiency of the proposed method is investigated. As can be seen from our algorithm, the time complexity of the whole algorithm depends on the calculation of the k-shell method. Table 6 shows the time complexity of the different methods mentioned in this paper. The main advantage of our method is that it achieves high performance with the same complexity as k-shell.
Table 6: The computational complexity of different methods.
Method   Category  Computational complexity
DC       Local     O(n)
k-shell  Global    O(n)
MDD      Global    O(n)
θ        Global    O(n^3) or O(mn)
Cnc      Hybrid    O(n)
Cnc+     Hybrid    O(n)
C(ks)    Local     O(n)
MCDE     Hybrid    O(n)
IKS      Hybrid    O(n)
It can be seen from Table 6 that these methods belong to different categories and have different computational complexity. For example, θ is based on the shortest paths, which need the full network information and have a high complexity of O(n^3). C(ks) only needs the local information of a node, and KS, MDD and θ need the full network information, while Cnc and Cnc+ need both. The computational complexity of k-shell centrality is O(n). The computational complexity of IKS is also O(n), which is equal to that of DC, KS, MDD, Cnc and Cnc+ and lower than that of θ. Local measures determine the node rank based on local information from the neighbors. Global measures need to traverse the whole network to determine the node rank. Hybrid measures, which make use of both types of information, are evidently less efficient than global measures.
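Since the cost of IKS is dominated by the k-shell decomposition, a short sketch of that step may help. The Python code below is a minimal illustration of the standard peeling procedure, not the authors' MATLAB code. The function name and the adjacency-dict input are assumptions. This simple version rescans the remaining nodes once per shell, so it is easier to read than an optimized bucket-based implementation but is not tuned for speed.

```python
def k_shell(adj):
    """Return a dict mapping each node to its k-shell index.

    adj : dict mapping each node to a list of its neighbors
          (undirected graph, no self-loops assumed)
    """
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    shell = {}
    while remaining:
        # The smallest remaining degree defines the current shell.
        k = min(degree[v] for v in remaining)
        queue = [v for v in remaining if degree[v] <= k]
        while queue:
            v = queue.pop()
            if v not in remaining:
                continue
            remaining.discard(v)
            shell[v] = k
            # Removing v lowers the degree of its remaining neighbors,
            # which may pull them into the same shell.
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        queue.append(u)
    return shell
```

For a triangle with one pendant node attached, the pendant node falls in shell 1 and the three triangle nodes in shell 2.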
Furthermore, the runtime of the different methods is examined. The DC and θ methods are ignored here because their complexity is significantly lower or higher than that of our method. For this purpose, each method is run 100 times and the average run time is calculated. The experiments are carried out on a desktop PC with an Intel Core i7 4GHz CPU and 4GB RAM. The results of the experiment are shown in Fig.7. They confirm that the KS, Cnc+, C(ks), MDD, MCDE and IKS methods run in nearly linear time on the different networks. The KS method has the lowest run time, and the IKS and Cnc+ methods are only slightly higher than KS. As the size of the network increases, the run times of the proposed method, Cnc+ and C(ks) increase slowly, and IKS becomes more efficient than the other measures. It can be seen that there is a direct relationship between the number of edges in the network and the run time. MDD and MCDE are the slowest methods, and they often do not have good differentiation and accuracy. In summary, the run time of IKS is acceptable for large networks.
Figure 7: Time costs of execution of different methods on six datasets (Jazz, USAir, EEC, email, Power and PGP). The Y-axis is the running time in seconds.
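The averaging procedure behind Figure 7 can be sketched as a small timing helper. This Python sketch is illustrative only. The helper name `average_runtime` is hypothetical, and the authors' measurements were made with their own MATLAB code, so absolute numbers from this helper would not be comparable with the figure.

```python
import time


def average_runtime(func, *args, repeats=100):
    """Run func(*args) `repeats` times and return the mean wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats
```

A ranking function would be passed as `func` together with the network, for example `average_runtime(rank_nodes, adj)`, where `rank_nodes` stands for any of the compared methods.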
5. Conclusion
This paper has presented an improved k-shell (IKS) method that efficiently ranks and measures the influential spreaders in complex networks. The IKS method combines the k-shell and the node information entropy to optimize the use of available resources so that information can be disseminated efficiently. IKS considers the shell in which a node is located and the influence of the node and its nearest neighbors, so as to effectively rank the spreading ability of nodes in complex networks. Experimental results on eight real networks show that the proposed IKS method distinguishes the differences in node influence better than conventional centrality methods such as the DC, K-shell, MDD, θ, Cnc, Cnc+ and C(ks) algorithms. The influential spreaders selected by IKS are scattered more broadly than those selected by other methods. This accelerates the propagation of information. Moreover, computational complexity analysis shows that the IKS method can be extended to large-scale networks.
It is still a long-term challenge to find a more efficient method that combines network structure and spreading dynamics to identify the node spreading influence in large-scale dynamic networks. Our future work will focus on how to apply the proposed IKS method to identify multiple spreaders in dynamic networks.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No.U1530126, the Fundamental Research Funds for the Central Universities (Grant No.ZYGX2016Z005, ZYGX2016J218), and the Meteorological information and Signal Processing Key Laboratory of Sichuan Higher Education Institutes at Chengdu University of Information Technology (Grant No.QXXCSYS201702). The authors would like to thank the anonymous reviewers for their helpful comments and suggestions. We greatly appreciate Chungu Guo and Yanzhou Su for providing guidance and encouragement on Pearson's correlation.
Data Availability
All relevant datasets and MATLAB code are available at https://github.com/uestcwm/complex-network.
References
[1] A. Zareie, A. Sheikhahmadi, A hierarchical approach for influential node ranking in complex social networks, Expert Systems with Applications 93 (2018) 200–211.
[2] D. Chen, L. Lü, M.-S. Shang, Y.-C. Zhang, T. Zhou, Identifying influential nodes in complex networks, Physica A: Statistical Mechanics and its Applications 391 (4) (2012) 1777–1787.
[3] H. Yu, X. Cao, Z. Liu, Y. Li, Identifying key nodes based on improved structural holes in complex networks, Physica A: Statistical Mechanics and its Applications 486 (2017) 318–327.
[4] Z.-Y. Jiang, Y. Zeng, Z.-H. Liu, J.-F. Ma, Identifying critical nodes' group in complex networks, Physica A: Statistical Mechanics and its Applications 514 (2019) 121–132.
[5] Z. Wang, L. Wang, A. Szolnoki, M. Perc, Evolutionary games on multilayer networks: a colloquium, The European physical journal B 88 (5) (2015) 124.
[6] M. Medo, Y.-C. Zhang, T. Zhou, Adaptive model for recommendation of news, EPL (Europhysics Letters) 88 (3) (2009) 38005.
[7] L. Lü, Y.-C. Zhang, C. H. Yeung, T. Zhou, Leaders in social networks, the delicious case, PloS one 6 (6) (2011) e21202.
[8] Z. Wang, M. A. Andrews, Z.-X. Wu, L. Wang, C. T. Bauch, Coupled disease–behavior dynamics on complex networks: A review, Physics of life reviews 15 (2015) 1–29.
[9] F. Morone, B. Min, L. Bo, R. Mari, H. A. Makse, Collective influence algorithm to find influencers via optimal percolation in massively large social media, Scientific reports 6 (2016) 30062.
[10] L. Yang, Y. Qiao, Z. Liu, J. Ma, X. Li, Identifying opinion leader nodes in online social networks with a new closeness evaluation algorithm, Soft Computing 22 (2) (2018) 453–464.
[11] D. Purevsuren, G. Cui, Efficient heuristic algorithm for identifying critical nodes in planar networks, Computers & Operations Research 106 (2019) 143–153.
[12] T. Wen, S. Duan, W. Jiang, Node similarity measuring in complex networks with relative entropy, Communications in Nonlinear Science and Numerical Simulation (2019) 104867.
[13] K. Rahimkhani, A. Aleahmad, M. Rahgozar, A. Moeini, A fast algorithm for finding most influential people based on the linear threshold model, Expert Systems with Applications 42 (3) (2015) 1353–1361.
[14] S. Gao, J. Ma, Z. Chen, G. Wang, C. Xing, Ranking the spreading ability of nodes in complex networks based on local structure, Physica A: Statistical Mechanics and its Applications 403 (2014) 130–147.
[15] Y. Zhao, S. Li, F. Jin, Identification of influential nodes in social networks with community structure based on label propagation, Neurocomputing 210 (2016) 34–44.
[16] A. Sheikhahmadi, M. A. Nematbakhsh, A. Shokrollahi, Improving detection of influential nodes in complex networks, Physica A: Statistical Mechanics and its Applications 436 (2015) 833–845.
[17] Z. Lv, N. Zhao, F. Xiong, N. Chen, A novel measure of identifying influential nodes in complex networks, Physica A: Statistical Mechanics and its Applications 523 (2019) 488–497.
[18] L. C. Freeman, Centrality in social networks conceptual clarification, Social networks 1 (3) (1978) 215–239.
[19] G. Sabidussi, The centrality index of a graph, Psychometrika 31 (4) (1966) 581–603.
[20] L. C. Freeman, A set of measures of centrality based on betweenness, Sociometry (1977) 35–41.
[21] J. Bae, S. Kim, Identifying and ranking influential spreaders in complex networks by neighborhood coreness, Physica A: Statistical Mechanics and its Applications 395 (2014) 549–559.
[22] J. Wang, X. Hou, K. Li, Y. Ding, A novel weight neighborhood centrality algorithm for identifying influential spreaders in complex networks, Physica A: Statistical Mechanics and its Applications 475 (2017) 88–105.
[23] A. Zeng, C.-J. Zhang, Ranking spreaders by decomposing complex networks, Physics Letters A 377 (14) (2013) 1031–1035.
[24] J.-G. Liu, Z.-Y. Wang, Q. Guo, L. Guo, Q. Chen, Y.-Z. Ni, Identifying multiple influential spreaders via local structural similarity, EPL (Europhysics Letters) 119 (1) (2017) 18001.
[25] L. Lü, D. Chen, X.-L. Ren, Q.-M. Zhang, Y.-C. Zhang, T. Zhou, Vital nodes identification in complex networks, Physics Reports 650 (2016) 1–63.
[26] L. Fei, Q. Zhang, Y. Deng, Identifying influential nodes in complex networks based on the inverse-square law, Physica A: Statistical Mechanics and its Applications 512 (2018) 1044–1059.
[27] M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, H. A. Makse, Identification of influential spreaders in complex networks, Nature physics 6 (11) (2010) 888.
[28] L. Jiang, X. Zhao, B. Ge, W. Xiao, Y. Ruan, An efficient algorithm for mining a set of influential spreaders in complex networks, Physica A: Statistical Mechanics and its Applications 516 (2019) 58–65.
[29] Z.-M. Ren, J.-G. Liu, F. Shao, Z.-L. Hu, Q. Guo, Analysis of the spreading influence of the nodes with minimum k-shell value in complex networks, Acta Physica Sinica 62 (10) (2013) 956–959.
[30] L.-l. Ma, C. Ma, H.-F. Zhang, B.-H. Wang, Identifying influential spreaders in complex networks based on gravity formula, Physica A: Statistical Mechanics and its Applications 451 (2016) 205–212.
[31] X. Ren, L. Linyuan, Review of ranking nodes in complex networks, Chinese Science Bulletin 59 (13) (2014) 1175–1197.
[32] C. Salavati, A. Abdollahpouri, Z. Manbari, Ranking nodes in complex networks based on local structure and improving closeness centrality, Neurocomputing 336 (2019) 36–45.
[33] D. Liu, Y. Jing, J. Zhao, W. Wang, G. Song, A fast and efficient algorithm for mining top-k nodes in complex networks, Scientific reports 7 (2017) 43330.
[34] A. Namtirtha, A. Dutta, B. Dutta, Identifying influential spreaders in complex networks based on kshell hybrid method, Physica A: Statistical Mechanics and its Applications 499 (2018) 310–324.
[35] C. Li, L. Wang, S. Sun, C. Xia, Identification of influential spreaders based on classified neighbors in real-world complex networks, Applied Mathematics and Computation 320 (2018) 512–523.
[36] Z.-L. Hu, J.-G. Liu, G.-Y. Yang, Z.-M. Ren, Effects of the distance among multiple spreaders on the spreading, EPL (Europhysics Letters) 106 (1) (2014) 18002.
[37] L. Guo, J.-H. Lin, Q. Guo, J.-G. Liu, Identifying multiple influential spreaders in term of the distance-based coloring, Physics Letters A 380 (7-8) (2016) 837–842.
[38] J.-X. Zhang, D.-B. Chen, Q. Dong, Z.-D. Zhao, Identifying a set of influential spreaders in complex networks, Scientific reports 6 (2016) 27823.
[39] T. Bian, Y. Deng, Identifying influential nodes in complex networks: A node information dimension approach, Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (4) (2018) 043109.
[40] A. Sheikhahmadi, M. A. Nematbakhsh, Identification of multi-spreader users in social networks for viral marketing, Journal of Information Science 43 (3) (2017) 412–423.
[41] A. Zareie, A. Sheikhahmadi, M. Jalili, Influential node ranking in social networks based on neighborhood diversity, Future Generation Computer Systems 94 (2019) 120–129.
[42] Y. Liu, B. Wei, Y. Du, F. Xiao, Y. Deng, Identifying influential spreaders by weight degree centrality in complex networks, Chaos, Solitons & Fractals 86 (2016) 1–7.
[43] M. Li, Q. Zhang, Y. Deng, Evidential identification of influential nodes in network of networks, Chaos, Solitons & Fractals 117 (2018) 283–296.
[44] Y. Wang, S. Wang, Y. Deng, A modified efficiency centrality to identify influential nodes in weighted networks, Pramana 92 (4) (2019) 68.
[45] J.-G. Liu, J.-H. Lin, Q. Guo, T. Zhou, Locating influential nodes via dynamics-sensitive centrality, Scientific reports 6 (2016) 21380.
[46] J.-G. Liu, Z.-M. Ren, Q. Guo, Ranking the spreading influence in complex networks, Physica A: Statistical Mechanics and its Applications 392 (18) (2013) 4154–4159.
[47] B. Mirzasoleiman, M. Babaei, M. Jalili, M. Safari, Cascaded failures in weighted networks, Physical Review E 84 (4) (2011) 046114.
[48] W.-X. Wang, G. Chen, Universal robustness characteristic of weighted networks against cascading failure, Physical Review E 77 (2) (2008) 026101.
[49] T. Nie, Z. Guo, K. Zhao, Z.-M. Lu, Using mapping entropy to identify node centrality in complex networks, Physica A: Statistical Mechanics and its Applications 453 (2016) 290–297.
[50] S. N. Dorogovtsev, A. V. Goltsev, J. F. F. Mendes, K-core organization of complex networks, Physical review letters 96 (4) (2006) 040601.
[51] P. M. Gleiser, L. Danon, Community structure in jazz, Advances in Complex Systems 6 (4) (2003) 565–573.
[52] V. Colizza, R. Pastor-Satorras, A. Vespignani, Reaction–diffusion processes and metapopulation models in heterogeneous networks, Nature Physics 3 (4) (2007) 276–282.
[53] H. Yin, A. R. Benson, J. Leskovec, D. F. Gleich, Local higher-order graph clustering, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 555–564.
[54] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, A. Arenas, Phys. Rev. E 68 (2003) 065103.
[55] Hamsterster full network dataset – KONECT (Apr. 2017). URL http://konect.uni-koblenz.de/networks/petster-hamster
[56] D. J. Watts, S. H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393 (1) (1998) 440–442.
[57] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, A. Arenas, Models of social networks based on social distance attachment, Phys. Rev. E 70 (5) (2004) 056122.
[58] L. E. Rocha, F. Liljeros, P. Holme, Simulated epidemics in an empirical spatiotemporal network of 50,185 sexual contacts, PLoS computational biology 7 (3) (2011) e1001109.
[59] J. Kunegis, Konect: the koblenz network collection, in: Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, pp. 1343–1350.
[60] K. J. Sharkey, Deterministic epidemic models on contact networks: correlations and unbiological terms, Theoretical population biology 79 (4) (2011) 115–129.
[61] C. Castellano, R. Pastor-Satorras, Thresholds for epidemic spreading in networks, Physical review letters 105 (21) (2010) 218701.
[62] S. Xiao-Ping, S. Yu-Rong, Leveraging neighborhood "structural holes" to identifying key spreaders in social networks, Acta physica sinica 64 (2).
Highlights
Our paper proposes a new method to identify influential nodes in complex networks, and the experimental results show its accuracy and superiority. The highlights of this paper are listed as follows:
A novel method is proposed to identify influential spreaders based on the improved k-shell.
Node information entropy is proposed based on the node and its neighbors.
Only the node with the largest node information entropy in each shell is selected.
Our approach can guarantee that the spreaders are not only influential but also scattered.
The proposed method outperforms other measures in eight real-world networks.
Conflict of interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.