Identifying influential spreaders in complex networks based on improved k-shell method


Journal Pre-proof Identifying influential spreaders in complex networks based on improved k-shell method Min Wang, Wanchun Li, Yuning Guo, Xiaoyan Peng, Yingxiang Li

PII: S0378-4371(20)30055-8
DOI: https://doi.org/10.1016/j.physa.2020.124229
Reference: PHYSA 124229

To appear in: Physica A

Received date : 16 January 2019 Revised date : 15 January 2020 Please cite this article as: M. Wang, W. Li, Y. Guo et al., Identifying influential spreaders in complex networks based on improved k-shell method, Physica A (2020), doi: https://doi.org/10.1016/j.physa.2020.124229. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier B.V.


Identifying influential spreaders in complex networks based on improved k-shell method

Min Wang (a), Wanchun Li (a,∗), Yuning Guo (a), Xiaoyan Peng (a), Yingxiang Li (b)

(a) School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
(b) Department of Communication Engineering, Meteorological information and Signal Processing Key Laboratory of Sichuan Higher Education Institutes of Chengdu University of Information Technology, Chengdu 611731, China

Abstract

Identifying influential spreaders in complex networks is a fundamental problem in network science. It has drawn great attention in recent years because of its theoretical significance and practical value in some fields. K-shell is an efficient method for identifying influential spreaders. However, k-shell neglects information about the topological position of the nodes. In this paper, we propose an improved algorithm based on the k-shell and node information entropy, named IKS, to identify influential spreaders from the higher shells as well as the lower shells. The susceptible-infected-recovered (SIR) epidemic model, Kendall's coefficient τ, the monotonicity M, and the average shortest path length Ls are employed to evaluate the performance of the proposed method and to compare it with other benchmark methods. The results of the experiments on eight real-world networks show that the proposed method can rank the influential spreaders more accurately. Moreover, IKS has low computational complexity and can be extended to large-scale networks.


Keywords: Complex network, Influential spreaders, Improved k-shell, Node information entropy, SIR epidemic model

∗ Corresponding author. Email address: [email protected] (Wanchun Li)

Preprint submitted to Journal of Physica A

January 15, 2020

1. Introduction


With the rise of complex network research, the identification of influential spreaders in a network has become critical for global message dissemination [1] and effective broadcast of various news [2, 3, 4]. This research has been used in many applications, such as disease spreading control [5], information dissemination [6, 7], virus spreading deceleration [8, 9], social network analysis [10, 11] and new product campaigns [12]. A basic question is how to identify influential spreaders in complex networks [13], and a variety of methods have been reported for this purpose [14, 15, 16, 17].

There are many classic centrality methods, such as degree centrality [18], closeness centrality [19], betweenness centrality [20], neighborhood-based centralities [21, 22], path-based centralities [23], local structural similarity [24], iterative refinement centralities [25], and so on [26]. The k-shell method proposed by Kitsak et al. [27] is the most widely used method [28]; it indicates that a node's location is one of the most essential factors in identifying the most efficient spreaders. However, the k-shell decomposition tends to assign an identical k-shell index to many nodes, although the importance of these nodes may differ [29]. Studies have shown that the nodes with the largest coreness value may not be the super spreaders [30, 31]. Moreover, as the k-shell index does not provide sufficient information about the topological locations of nodes [32], the influential spreaders identified by the k-shell method are located at the core of the graph, which exhibits the rich-club phenomenon [33]. Avoiding the rich-club phenomenon can achieve better spreading effects [34, 35].

Several studies have been devoted to improving the performance of the k-shell algorithm and avoiding the rich-club phenomenon. Hu et al. [36] identified multiple influential spreaders in community networks and found that the distance among spreaders plays an important role. Guo et al. [37] proposed an improved distance-based coloring method to identify influential spreaders. Zhang et al. [38] proposed a method to select a spreader source with the best spreading capacity, which can effectively avoid the rich-club phenomenon. Ren et al. [29] developed a method to identify the spreading capability of nodes with the minimum k-shell value; the method is effective only for those minimum k-shell nodes. Bian et al. [39] proposed a measure based on node information dimension to identify the influential nodes. The local dimensions at different topological distance scales for each node constitute the node information dimension, but its computational complexity is high. Sheikhahmadi et al. [40] defined a hybrid parameter based on degree, structural location and dispersion of the neighbors to discover influential nodes. Recently, Zareie et al. [41] proposed an index that determines node centrality using the notions of Shannon entropy and Jensen-Shannon divergence. In this index, the influential node is specified based on the dispersion of its neighbors in the graph as well as the specification of their influence spread spheres on the network and the intensity of the influence. Additionally, several node ranking algorithms were proposed to improve the ranking performance [42, 43, 44, 45]. The influence of a node is determined jointly by its location and its neighbor nodes. A node belonging to a lower shell can sometimes be more powerful than nodes belonging to a higher shell [40]. Therefore, it is critical to design an effective method to rank node importance.

In this paper, we consider the influential spreaders not only from the highest shell but also from the lower shells. In addition to the shell of a node, its local metric is considered. We present a selection method to identify influential nodes based on the k-shell method and the influence of the neighbor nodes. The SIR model is employed to simulate the epidemic spreading process on eight real-world networks to evaluate the effectiveness of the proposed method. Experimental results demonstrate that our new selection algorithm can identify the influential nodes more accurately than conventional centrality methods such as the degree centrality, k-shell, neighborhood coreness centrality, weight neighborhood centrality, improved k-shell indicator, and the mixed degree decomposition. Since the proposed method is based on the k-shell decomposition method with low computational complexity, it can be applied to large-scale networks.

The remainder of this paper is organized as follows. Section 2 briefly reviews the conventional centrality methods used for comparison. The proposed improved k-shell (IKS) method is described in Section 3. Section 4 includes the data description, evaluation methods and experimental results. Finally, the conclusion and future work are presented in Section 5.

2. Related Works

Suppose an undirected and unweighted complex network is represented as a graph G(V, E), where N = |V| and M = |E| denote the numbers of nodes and edges, respectively. The adjacency matrix is $A = (a_{ij})_{N \times N}$, where $a_{ij} = 1$ if node $v_i$ is connected to node $v_j$, and $a_{ij} = 0$ otherwise. Besides, we use Γ(v) to denote the set of neighbors of node v.

2.1. Degree centrality

The degree centrality (DC) [18] is the simplest index to measure a node's influence. The larger the degree of a node, the greater its influence, since it can spread its impact to more neighbor nodes. The normalized degree centrality is defined as:

$$DC(i) = \frac{k_i}{N - 1} \tag{1}$$

where N is the number of nodes in G and $k_i$ is the degree of node $v_i$, defined as the number of directly connected neighbors of $v_i$.

2.2. K-shell method

The k-shell decomposition method quantifies nodal significance by dividing all nodes into different shells [27]. The algorithm first removes all nodes with degree 1 and proceeds iteratively until no node with degree 1 remains in the network. All removed nodes and edges form the 1-shell. Repeating this procedure with nodes of residual degree at most 2, 3, and so on yields the 2-shell, the 3-shell, etc. Finally, each node belongs to a specific shell, within which all nodes have the same coreness value. Accordingly, the nodes within the same shell are considered to have the same importance and the same spreading capability. Thus, k-shell is a coarse-grained index of node importance, since it neglects all the information of removed nodes and edges, and the results obtained by the k-shell method are not suitable for some real-world networks.
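The peeling procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the graph is given as a dictionary that maps each node to the set of its neighbors; the function name `k_shell` and the toy graph are hypothetical and not taken from the paper.

```python
def k_shell(adj):
    """Return {node: k-shell index} for an undirected graph {node: set(neighbors)}."""
    # Residual degrees are tracked separately so the input graph is not modified.
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0  # isolated nodes (degree 0) would be placed in shell 0
    while remaining:
        # Keep stripping nodes whose residual degree is at most k.
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            for v in peel:
                shell[v] = k
                remaining.discard(v)
                for w in adj[v]:
                    if w in remaining:
                        degree[w] -= 1
            peel = [v for v in remaining if degree[v] <= k]
        k += 1
    return shell


# Toy graph (an assumption for illustration): a triangle 0-1-2 plus a pendant node 3.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(sorted(k_shell(toy).items()))  # [(0, 2), (1, 2), (2, 2), (3, 1)]
```

The pendant node falls into the 1-shell and the triangle into the 2-shell, which also shows the coarseness noted above: the three triangle nodes receive the same index even though node 0 has one more neighbor.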

2.3. Mixed degree decomposition method

The mixed degree decomposition (MDD) method was proposed by Zeng et al. [23] to improve the precision of k-shell. It modifies the decomposition process by considering both the residual degree $k_r$ and the exhausted degree $k_e$. Let $k_m(v)$ denote the mixed degree of node v. The larger the value of $k_m$, the greater the influence of the node on the network. The mixed degree of node v is defined as:

$$k_m(v) = k_r + \lambda \cdot k_e \tag{2}$$

where λ is a tunable parameter between 0 and 1. Note that the MDD method coincides with the k-shell method when λ = 0 and is equivalent to the degree centrality when λ = 1. The network is then decomposed layer by layer according to the mixed degree, but it is difficult to find the optimal parameter λ that achieves the best result.

2.4. Improved k-shell indicator

The improved k-shell indicator [46] takes into account the shortest distance from a target node to the network core, which is defined as the set of nodes with the highest k-shell value. Nodes with the same k-shell value can be distinguished by equation (3):

$$\theta(v) = (ks_{max} - ks(v) + 1) \sum_{w \in S_c} d(v, w) \tag{3}$$

where $ks_{max}$ is the largest k-shell value in the network, $S_c$ is the set of core nodes with the highest k-shell index $ks_{max}$, and d(v, w) is the shortest distance from node v to w ∈ $S_c$. The larger the value of θ, the farther the node lies from the network core, which indicates that the node is less important. Although this method is able to differentiate the nodes of a shell, calculating the shortest distances to the core nodes is computationally expensive.
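As an illustration of equation (3), the following sketch computes θ(v) with a breadth-first search. It assumes a connected graph stored as a dictionary of neighbor sets and a precomputed k-shell index per node; the helper names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def theta(adj, ks, v):
    """Eq. (3): (ks_max - ks(v) + 1) times the summed distance from v to the core nodes."""
    ks_max = max(ks.values())
    core = [w for w in ks if ks[w] == ks_max]
    dist = bfs_distances(adj, v)  # assumes every core node is reachable from v
    return (ks_max - ks[v] + 1) * sum(dist[w] for w in core)


# Toy graph (an assumption): triangle 0-1-2 forms the core, node 3 hangs off node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
ks = {0: 2, 1: 2, 2: 2, 3: 1}
print(theta(adj, ks, 3))  # 10
print(theta(adj, ks, 0))  # 2
```

One breadth-first search per node is needed, which is where the cost mentioned above comes from.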

2.5. Neighborhood coreness centrality

The neighborhood coreness centrality was proposed by Bae et al. [21] to estimate the spreading influence of a node by summing the k-shell values of all its neighbors. A high neighborhood coreness value indicates that a spreader has more connections to neighbors located in the core of the network. The neighborhood coreness of node v is defined as:

$$C_{nc}(v) = \sum_{w \in N(v)} ks(w) \tag{4}$$

where N(v) is the set of neighbors adjacent to node v and ks(w) is the k-shell index of neighbor node w. Furthermore, the extended neighborhood coreness $C_{nc+}$ of node v is calculated by:

$$C_{nc+}(v) = \sum_{w \in N(v)} C_{nc}(w) \tag{5}$$

where $C_{nc}(w)$ is the neighborhood coreness of neighbor w.

2.6. Weight neighborhood centrality

The weight neighborhood centrality was proposed [22] to combine the centrality of a node with the centrality of its neighbors to generate the influential ranking list. The diffusion importance of an edge is used to adjust the influence of the neighbors' centrality. The weight neighborhood centrality of node i uses an edge weighting method based on a power-law function of degree [47, 48] to quantify the diffusion importance of edges, and is defined as:

$$C_i(\varphi) = \varphi_i + \sum_{j \in \Gamma(i)} \frac{w_{ij}}{\langle w \rangle} \cdot \varphi_j \tag{6}$$

where ϕ is the benchmark centrality (we use k-shell in this paper), Γ(i) is the set of nearest neighbors of node i, $w_{ij}$ is the diffusion importance of edge $e_{ij}$ and $\langle w \rangle$ is the average diffusion importance of all edges. This equation means that the weight neighborhood centrality encodes the centrality of a node and of its neighbors, weighted by the diffusion importance of the links. Besides, the effect of a neighbor increases with the diffusion importance of the link between it and the origin node.

2.7. Mixed Core, Degree and Entropy

The mixed core, degree and entropy (MCDE) measure was proposed based on node degree and topological location as well as the diversity of the neighbors in different shells. Sheikhahmadi et al. [40] defined a weighted entropy, Eq. (7), for determining the diversity of the neighbors of node v on the graph, given the significance of neighboring nodes with greater k-shells in the specification of the influentiality of a node:

$$Entropy(v) = -\sum_{i=1}^{ks_{max}} (p_i \cdot \log_2 p_i) \tag{7}$$

where $p_i$ is the probability that a friend of node v is present in the ith core, which is calculated by equation (8):

$$p_i = \frac{Count(v\text{'s friends in core } i)}{k(v)} \tag{8}$$

In this approach, MCDE is defined as:

$$MCDE(v) = \alpha\, ks(v) + \beta\, k(v) + \gamma\, Entropy(v) \tag{9}$$

In equation (9), ks(v) is the core number of node v, k(v) is the number of links of node v, and Entropy(v) measures the dispersion of node v's friends in different cores.

To adjust the effect of the core, degree and entropy of each node, three adjustable parameters, α, β and γ, are used. To equalize the effect of these measures, the three parameters are set to be equal (α = β = γ = 1).
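Equations (7) to (9) can be illustrated with a short sketch. It assumes that the core number of each neighbor of v is already known and passed in as a list; the function names and the sample input (three neighbors in core 3, one in core 1) are hypothetical.

```python
import math
from collections import Counter


def neighbor_shell_entropy(neighbor_shells):
    """Eqs. (7)-(8): base-2 entropy of how the neighbors are spread over the cores."""
    k = len(neighbor_shells)  # k(v), the degree of the node
    counts = Counter(neighbor_shells)  # number of neighbors found in each core
    return -sum((c / k) * math.log2(c / k) for c in counts.values())


def mcde(ks_v, neighbor_shells, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (9): alpha * ks(v) + beta * k(v) + gamma * Entropy(v)."""
    return (alpha * ks_v
            + beta * len(neighbor_shells)
            + gamma * neighbor_shell_entropy(neighbor_shells))


# Hypothetical node in core 3 whose four neighbors sit in cores 3, 3, 3 and 1.
print(round(neighbor_shell_entropy([3, 3, 3, 1]), 4))  # 0.8113
print(round(mcde(3, [3, 3, 3, 1]), 4))  # 7.8113
```

Cores that contain no neighbor of v contribute nothing to the sum, so only the occupied cores are iterated over.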

3. The improved k-shell algorithm

As shown above, many methods have been proposed to rank nodes in networks, and each has its advantages and disadvantages. In information science, analyzing the uncertainty of information requires a quantitative measure. Shannon proposed information entropy to represent the degree of uncertainty of information: the larger the entropy, the more uncertain the information. In order to measure the spreading contribution of neighbor nodes to a specific node, we extend the concept of entropy to complex networks and call it node information entropy. Here, the node information entropy captures the global structure in the network, and the larger the entropy, the more influential the node is. Although the concept of information entropy is mentioned in [49], location information was not considered there. Assuming that the degree of node $v_i$ is $k_i$, the node importance of $v_i$ is

$$I_i = \frac{k_i}{\sum_{j=1}^{N} k_j}$$

where N is the number of nodes in G, so the node information entropy is defined as:

$$e_i = -\sum_{j \in \Gamma(i)} I_j \cdot \ln I_j \tag{10}$$

where Γ(i) is the set of neighbors of node $v_i$. Node information entropy takes the propagation effect of neighbor nodes into account: the larger the node information entropy, the more easily the node propagates influence to its neighbor nodes, and so the more influential the node is. Node information entropy is a metric of the disorder of the network. If the network is randomly connected, the node information entropy of each node is similar. Conversely, if the network is scale-free, there are a small number of nodes with high connectivity and a large number of nodes with low connectivity. Nodes then differ in importance and the distribution of node information entropy is uneven, so the more important nodes have larger node information entropy. The IKS method is conducted using the following procedure:

Step 1: Decompose the network into shells according to the k-shell decomposition algorithm;

Step 2: Calculate the node information entropy $e_i$ of every node according to formula (10);

Step 3: Sort the nodes in each shell by node information entropy, from largest to smallest;

Step 4: Among the nodes with the highest k-shell value, select the node with the largest node information entropy. Then select the node with the largest node information entropy in the next-highest shell. This process continues until a node has been selected in the 1-shell. At this point, the first iteration is finished;

Step 5: Repeat Step 4 on the remaining nodes until all nodes have been selected. Shells in which all nodes have already been selected are skipped. When several nodes in a shell have equal node information entropy, one of them is chosen at random.

This method addresses the shortcomings of the original k-shell method. It further ranks the nodes within each shell according to the node information entropy. In order to avoid the problem that some influential spreaders are so close together that their spheres of influence overlap, only one node is selected from each shell in every iteration. The IKS selects influential nodes not only from the higher shells but also from the lower shells, and every node in a shell is assigned a different importance.

To give an intuitive explanation of our algorithm, we consider the following example. Fig. 1 [50] shows a simple network with 26 nodes. First, we use the k-shell decomposition algorithm to decompose the network into 3 shells. As shown in Fig. 1, the node set {1,2,3,4} is in the 3-shell, {5,6,7,8} is in the 2-shell, and nodes 9 to 26 are in the 1-shell. We then calculate the node information entropy of all nodes in the three shells. Node 4 is taken as an example, as follows.

Figure 1: An illustrative example of a simple network [50]. The nodes are grouped into three shells labelled K = 1, K = 2 and K = 3.

$$I_4 = \frac{k_4}{\sum_{j=1}^{N} k_j} = \frac{4}{62}$$

$$e_4 = -\sum_{j \in \Gamma(v_4)} I_j \cdot \ln I_j = -\left(\frac{5}{62} \ln\frac{5}{62} + \frac{8}{62} \ln\frac{8}{62} + \frac{4}{62} \ln\frac{4}{62} + \frac{8}{62} \ln\frac{8}{62}\right) = 0.9083$$

The degree of node $v_4$ is 4, so the node importance of $v_4$ is 4/62. The neighbors of node $v_4$ are {1, 2, 3, 23}, so the node information entropy of node $v_4$ is 0.9083. Similarly, the node information entropy of every node in the network is calculated, as shown in Table 1.

Table 1
Result of node information entropy in each shell

Shell     Node                       Node Information Entropy
3-shell   2                          1.0882
          1                          0.9715
          4                          0.9083
          3                          0.8209
2-shell   5                          0.6004
          8                          0.5572
          6/7                        0.3750
1-shell   23                         0.6870
          17                         0.4108
          11/12/16/18/19/20/21/22    0.2642
          15                         0.2439
          25                         0.2434
          9/10/26                    0.1768
          13/14                      0.1465
          24                         0.1108
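The worked value for node 4 can be checked numerically. The sketch below evaluates formula (10) from the neighbor degrees that appear in the worked equation (5, 8, 4 and 8, for neighbors 1, 2, 3 and 23) and the total degree 62; the function name is hypothetical.

```python
import math


def node_information_entropy(neighbor_degrees, total_degree):
    """Formula (10): e_i = -sum_j I_j * ln(I_j), with I_j = k_j / total_degree."""
    importances = [k / total_degree for k in neighbor_degrees]
    return -sum(i_j * math.log(i_j) for i_j in importances)


# Node 4: neighbor degrees as they appear in the worked equation; all degrees sum to 62.
e4 = node_information_entropy([5, 8, 4, 8], 62)
print(round(e4, 4))  # 0.9083
```

Only the degrees of the neighbors and the total degree of the network enter the computation, so the entropy of every node can be obtained in a single pass over the edges.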

According to the IKS method, we start selecting nodes from the 3-shell. The node information entropy of node 2 is the highest, so node 2 is selected first. Second, node 5 in the 2-shell is selected. Next, node 23 is selected in the 1-shell. Only three nodes are selected in this iteration, so the selection process continues. In the second iteration, node 1 in the 3-shell is selected first, then node 8 in the 2-shell, and finally node 17 in the 1-shell. This process continues until all nodes are selected. Note that in the third iteration, one of node 6 and node 7 is selected at random, and one of nodes 11, 12, 16, 18, 19, 20, 21 and 22 is selected at random, because their node information entropies are equal.
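The selection order described above can be reproduced from the Table 1 values with a short sketch of Steps 3 to 5. One deliberate deviation is an assumption of this sketch: ties in entropy are broken by the smaller node number instead of at random, so that the output is reproducible. The function name `iks_order` is hypothetical.

```python
def iks_order(shells):
    """shells: list of shells, highest first; each shell is a list of (node, entropy)."""
    # Step 3: sort each shell by entropy, largest first (ties: smaller node id first).
    queues = [sorted(s, key=lambda pair: (-pair[1], pair[0])) for s in shells]
    order = []
    # Steps 4-5: in every pass take one node from each non-empty shell, highest shell first.
    while any(queues):
        for queue in queues:
            if queue:
                order.append(queue.pop(0)[0])
    return order


# Entropy values copied from Table 1.
shell3 = [(2, 1.0882), (1, 0.9715), (4, 0.9083), (3, 0.8209)]
shell2 = [(5, 0.6004), (8, 0.5572), (6, 0.3750), (7, 0.3750)]
shell1 = ([(23, 0.6870), (17, 0.4108)]
          + [(n, 0.2642) for n in (11, 12, 16, 18, 19, 20, 21, 22)]
          + [(15, 0.2439), (25, 0.2434)]
          + [(n, 0.1768) for n in (9, 10, 26)]
          + [(13, 0.1465), (14, 0.1465), (24, 0.1108)])

order = iks_order([shell3, shell2, shell1])
print(order[:6])  # [2, 5, 23, 1, 8, 17]
```

Once the 3-shell and the 2-shell are exhausted after four passes, the remaining 1-shell nodes are simply appended in decreasing order of entropy.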

The ranking orders obtained by different methods are summarized in Table 2. The result implies that the IKS ranking is not coarse-grained. It is worth noting that IKS is a hybrid measure that balances the local metric, node information entropy, and the global metric, k-shell. We argue that IKS is likely to be effective for identifying influential nodes because it considers both the information entropy and the coreness of a spreader.

Table 2
The order of spreaders revealed by different ranking methods: DC, KS, MDD, θ, Cnc, Cnc+, C(ks), MCDE and IKS.

Rank

DC

KS

1

2,23

2

1

5,6,7,8

3

3,4,5,8

others

4

MDD

θ

Cnc

Cnc+

C(ks)

MCDE

IKS

1,2,3,4

2

2

2

2

2

1

5,8

1

1

4

23

5

3,4

6,7

3

4

1

1

23

15

5,8

11,12,23

4,23

3

3

3,4

1

5

6,7,17,25

15

9,10,25,26

5,8

5

5

5,8

8

6

others

6,7,17

16-22

6,7

8

23

6,7

17

7

25

24

11,12,15,25

6,7

8

15,25

4

8

others

15

9,10,17,26

23

17

17

6

13,14

others

11,12

6,7

others

11

17

11,12,16

1,2,3,4 2,23

9

11

12 13 14 15 16


17


10

18-22

16

25

7

15

12,16

18-22 25

18-22 9,10,26

9,10,26

15

15

13,14

25

13,14,24

24

9,10,26

4. Experiment and analysis

Performance evaluation was conducted for the proposed method and the conventional centrality methods DC, k-shell, MDD, θ, Cnc, Cnc+, MCDE and C(ks).

3

13,14 24


4.1. Datasets

Eight real networks are employed:

(1) Jazz: a network of jazz bands that performed between 1912 and 1940 [51];
(2) USAir: the American airline network in 1997, in which nodes are American airports and edges denote the air routes between airports [52];
(3) EEC: email exchanges between members of a large European research institution [53];
(4) email: mail exchanges between users of University Rovira I Virgili (Tarragona) [54];
(5) Hamsterster: friendships and family links between users of the website hamsterster.com [55];
(6) Power: the power grid of the Western States of the United States of America [56];
(7) PGP: an encrypted communication network. Pretty-Good-Privacy algorithms have been developed to maintain privacy between peers, so this network is also called the web of trust of PGP [57];
(8) Sex: a bipartite network in which nodes are females and males, and links between them are established when males write posts indicating sexual encounters with females [58].

Table 3 shows the statistical features of the above-mentioned networks, where n is the number of nodes, m is the number of edges, ⟨k⟩ denotes the average degree, c is the clustering coefficient, ⟨d⟩ is the average shortest path length, βmin is the epidemic threshold in the SIR epidemic model, and β is the infection probability.

Table 3
The statistical features of the eight real complex networks [59]

Network       n       m       ⟨k⟩       c        ⟨d⟩       βmin     β
Jazz          198     2742    27.6970   0.6175   2.2350    0.0266   0.05
USAir         332     2461    12.8072   0.7494   2.7381    0.0231   0.03
EEC           986     16064   32.5842   0.4505   2.5869    0.0136   0.02
email         1133    5451    9.6222    0.2540   3.6060    0.0530   0.08
Hamsterster   2426    16631   13.711    0.231    3.67      0.022    0.04
Power         4941    6594    2.6691    0.1065   18.9892   0.3483   0.35
PGP           10680   24316   4.5536    0.2659   7.463     0.0558   0.06
Sex           15810   38540   4.8754    0        5.7846    0.0365   0.04

4.2. Evaluation criteria

This section introduces the evaluation criteria used to verify the effectiveness of the proposed IKS method: the SIR epidemic model, the monotonicity relation, the correlation coefficient, and the average shortest path length.

4.2.1. SIR epidemic model

At present, most scholars adopt the standard susceptible-infected-recovered (SIR) model [60] to measure the spreading scale of information and viruses. In the SIR epidemic model, nodes have three states:

(i) Susceptible (S): nodes that are susceptible to infection but have not been infected;

(ii) Infected (I): nodes that are already infected and infect their susceptible neighbors with probability β;

(iii) Recovered (R): nodes that were infected but have recovered with probability µ and will not be infected again.

At the initial time, a group of seed nodes is infected and all other nodes are susceptible. At each time step, every infected node makes contact with its neighbors and each susceptible neighbor is infected with probability β. Then each infected node enters the recovered state with probability µ and can no longer be infected. The spreading process stops when there is no infected node in the network. We set µ = 0.01 in this paper. The proportion of infected nodes at time t is denoted by S(t). The epidemic threshold is expressed as $\beta_{min} = \langle k \rangle / \langle k^2 \rangle$ [61].
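The spreading process and the threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the experiments: the graph is a dictionary of neighbor sets, updates are synchronous, S(t) is counted as the fraction of nodes that are infected or already recovered, and the function names are hypothetical.

```python
import random


def epidemic_threshold(degrees):
    """beta_min = <k> / <k^2> for a list of node degrees."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k / mean_k2


def sir_spread(adj, seeds, beta, mu, rng=None, max_steps=None):
    """One SIR realisation; returns the fraction infected-or-recovered after each step."""
    rng = rng or random.Random()
    infected = set(seeds)
    recovered = set()
    history = [len(infected) / len(adj)]  # value at t = 0
    # With mu = 0 nothing ever recovers, so pass max_steps in that case.
    while infected and (max_steps is None or len(history) <= max_steps):
        newly_infected = set()
        for u in infected:
            for w in adj[u]:
                susceptible = w not in infected and w not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(w)
        # Only nodes that were infected before this step may recover now.
        newly_recovered = {u for u in infected if rng.random() < mu}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        history.append((len(infected) + len(recovered)) / len(adj))
    return history


# Toy path graph 0-1-2-3 (an assumption for illustration), seeded at node 0.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
curve = sir_spread(path, [0], beta=0.5, mu=0.01, rng=random.Random(1), max_steps=16)
```

Averaging `history` over many independent runs, and choosing β slightly above `epidemic_threshold(degrees)`, would mirror the experimental setting described in this section.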

In the SIR simulation, the infection probability β should be neither too small nor too large. If β is too small, the epidemic cannot successfully spread over the network, so the spreading capability of each node cannot be measured. If β is too large, every node infects its neighbors easily and the epidemic breaks out over almost the whole network, which is not conducive to distinguishing the influence of individual nodes. In order to ensure normal transmission, we choose β to be slightly larger than βmin; the values of the epidemic threshold βmin and the infection probability β are presented in the seventh and eighth columns of Table 3. The proportion of infected nodes at time t, denoted by S(t), can be considered as an indicator of the influence of the initially infected nodes at time t. S(t) increases with t and eventually reaches a stable value S, which represents the eventual influence of the initially infected nodes. Thus S(t) evaluates the influence of the initially infected nodes at time t, and S evaluates their eventual influence.

4.2.2. Monotonicity relation

To quantify the resolution of different ranking methods, a monotonicity index M(R) for a ranking list R is used [20]:

$$M(R) = \left[1 - \frac{\sum_{r \in R} N_r (N_r - 1)}{N (N - 1)}\right]^2 \tag{11}$$

where N is the size of the ranking vector R and $N_r$ is the number of nodes with the same rank index value r. This metric quantifies the fraction of tied nodes in the ranking list. M(R) ∈ [0, 1]: M(R) = 1 means that the ranking method is perfectly monotonic and each node is assigned a different index value, whereas M(R) = 0 means that all nodes have the same rank. A larger value of M indicates that the ranking list R distinguishes the nodes better.

4.2.3. Correlation coefficient

We adopt Kendall's coefficient τ [62] to quantify the correlation between a ranking list and the spreading ability. Kendall's coefficient τ is defined as:

$$\tau = \frac{N_c - N_d}{N (N - 1) / 2} \tag{12}$$

where $N_c$ and $N_d$ are the numbers of concordant and discordant pairs, respectively, and N is the number of network nodes. Let $(x_i, y_i)$ and $(x_j, y_j)$ be a pair of joint observations from two ranking lists X and Y. If $x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$, the observations are said to be concordant. If $x_i > x_j$ and $y_i < y_j$, or $x_i < x_j$ and $y_i > y_j$, they are said to be discordant. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant. A larger correlation coefficient implies a more concordant relation between the two ranking lists.

4.2.4. Average shortest path length Ls

We can also use the average shortest path length Ls between infected sources to evaluate the performance of different methods [38]. In this paper, the average shortest path length Ls between each pair of source spreaders in S is used as an evaluation metric. It is defined as:

$$L_s = \frac{1}{|S| (|S| - 1)} \sum_{u, v \in S,\, u \neq v} l_{u,v} \tag{13}$$

where S is the selected spreader set, |S| denotes the number of spreaders in S and $l_{u,v}$ denotes the length of the shortest path from node u to node v. This index measures the distance among the infection sources. Assume that each node has the same propagation ability. If the infected sources are relatively dispersed, better propagation effects can be achieved. Taking the network of Figure 1 as an example, panel (a) of Figure 2 shows the spreading scale of the four infected sources chosen by the k-shell method, and panel (b) shows the propagation effect of the four infected sources chosen by the IKS method. Blue nodes represent the sources of infection, and red circles represent the reach of the infected sources. For convenience, it is assumed that the ability of each node is equal. In panel (a) of Figure 2, four nodes {5,6,7,8} are infected. In panel (b), ten nodes {3,4,6,16,17,18,19,20,21,22} are infected. It can be seen that the IKS method infects more nodes than the k-shell method. Meanwhile, the IKS method prevents the infected sources from being too close together, reduces the overlap of propagation effects and improves the overall propagation effect.
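The three ranking metrics of Section 4.2, equations (11) to (13), can be sketched together. The code below is an illustration under assumptions: rankings are plain lists of scores indexed by node, the graph is a connected dictionary of neighbor sets, and all function names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def monotonicity(scores):
    """Eq. (11): M(R) = (1 - sum_r N_r (N_r - 1) / (N (N - 1)))^2."""
    n = len(scores)
    tied = sum(c * (c - 1) for c in Counter(scores).values())
    return (1 - tied / (n * (n - 1))) ** 2


def kendall_tau(x, y):
    """Eq. (12): (N_c - N_d) / (N (N - 1) / 2); tied pairs count as neither."""
    n = len(x)
    balance = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        balance += (product > 0) - (product < 0)  # +1 concordant, -1 discordant
    return balance / (n * (n - 1) / 2)


def average_source_distance(adj, sources):
    """Eq. (13): mean shortest-path length over ordered pairs of distinct sources."""
    total = 0
    for u in sources:
        dist = {u: 0}
        queue = deque([u])
        while queue:  # breadth-first search from u
            a = queue.popleft()
            for b in adj[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        total += sum(dist[v] for v in sources if v != u)
    s = len(sources)
    return total / (s * (s - 1))


# Toy path graph 0-1-2-3 (an assumption for illustration).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(monotonicity([3, 2, 1]))                # 1.0
print(kendall_tau([1, 2, 3], [3, 2, 1]))      # -1.0
print(average_source_distance(path, [0, 3]))  # 3.0
```

On the toy path graph, the end nodes 0 and 3 are three hops apart, while the adjacent pair [1, 2] would give 1.0, which is the sense in which a larger Ls indicates more dispersed sources.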

Figure 2: Schematic diagram of the propagation effect. (a) K-shell method; (b) IKS method.

4.3. Experimental analysis

This section reports the experimental results of the different algorithms. With the infection probability β fixed, we compare their effectiveness on eight real-world networks. In Figure 3, we plot the spreading scale as a function of the infected time for nine methods on eight networks in the SIR model.

Figure 3: Comparison of the spreading scale S(t) as a function of infected time t of nine methods on eight networks (panels: Jazz, USAir, EEC, email, Hamsterster, Power, PGP and Sex; x-axis: infected time t; y-axis: S(t); legend: DC, KS, Cnc, Cnc+, C(ks), MDD, MCDE, IKS). The number of infected source nodes is set to 20% of the total nodes n. Results are obtained by averaging over 100 independent implementations.

Figure 3 shows the proportion of infected nodes S(t) as a function of infected time t. The X-axis is the infected time t, which varies from 0 to 16, and the Y-axis is the proportion of infected nodes in the network. As can be concluded from the figure, the number of infected nodes increases with time and ultimately reaches a steady value. At each time step, our method outperforms all other well-known centrality measures in terms of S(t). Meanwhile, the steady value of our proposed method is the highest. The reason is that the top 20% of nodes ranked by IKS may be more scattered across the whole network. With the other methods, the propagation scopes of the individual source nodes overlap with each other, as in Figure 2(a).
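As a rough illustration of how a curve S(t) like those in Figure 3 can be produced, the following Python sketch runs a discrete-time SIR process from a fixed seed set and averages over repeated runs. The function names, the infection probability beta, and the rule that every infected node recovers after one step are assumptions of this sketch, not details taken from the paper; S(t) is taken here as the fraction of infected plus recovered nodes.

```python
import random


def sir_spread(adj, seeds, beta, steps, rng):
    """Run one discrete-time SIR simulation and return [S(0), ..., S(steps)].

    adj   : dict mapping each node to a list of its neighbours
    seeds : initially infected nodes
    beta  : probability that an infected node infects a susceptible neighbour
    Assumption of this sketch: an infected node recovers after exactly one step.
    """
    n = len(adj)
    infected = set(seeds)
    recovered = set()
    curve = [(len(infected) + len(recovered)) / n]
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        recovered |= infected      # current spreaders recover
        infected = newly_infected  # new spreaders take over
        curve.append((len(infected) + len(recovered)) / n)
    return curve


def average_curve(adj, seeds, beta, steps, runs=100, seed=0):
    """Average S(t) over several independent runs (100 by default)."""
    rng = random.Random(seed)
    total = [0.0] * (steps + 1)
    for _ in range(runs):
        for t, s in enumerate(sir_spread(adj, seeds, beta, steps, rng)):
            total[t] += s
    return [x / runs for x in total]


# Toy example: a path graph 0-1-2-3-4 seeded at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
curve = average_curve(path, seeds=[0], beta=0.5, steps=6, runs=100)
```

In an experiment of the kind described above, `seeds` would be the top-ranked nodes of each method, and one averaged curve would be produced per method.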

[Figure 4 plots: panels Jazz, USAir, EEC, email, Hamsterster, Power, PGP, Sex; X-axis: proportion of initial infected nodes (0 to 0.2); Y-axis: S; legend: DC, KS, Cnc, Cnc+, C(ks), MDD, MCDE, IKS.]

Figure 4: Comparison of the spreading scale S as a function of the proportion of initial infected nodes on eight networks. The proportion of source spreaders ranges from 0 to 0.2. Results are obtained by averaging over 100 independent implementations.

To evaluate the effect of the proportion of source spreaders, we simulate the epidemic spreading on eight networks by varying the proportion of initial infected nodes from 0 to 0.2. Fig. 4 shows the eventual influence S for different proportions of initial infected nodes. The X-axis is the proportion of initial infected nodes, and the Y-axis is the proportion of infected nodes in the network after t = 20 propagation steps, by which time the stable state is reached. Fig. 4 shows that IKS presents better results than the other measures, especially on Jazz, USAir, EEC, email and Hamsterster. This illustrates that, starting from the infected sources obtained by the IKS method, information spreads faster and eventually reaches a larger scale than with the other methods. It is reasonable that the community leaders of relatively small clusters may be located in lower k-shells than the community leaders of larger clusters. The IKS method can capture the spreading capability of influential nodes in the lower shells when the community structure is obvious.

Kendall's coefficient τ is then used to measure the correlation between the ranking result and the spreading ability. The total number of infected and recovered nodes is defined as the spreading ability σ_i under the influence of the node v_i. For accurate measurement, the actual spreading ability of node v_i is $\bar{\sigma}_i = \frac{1}{M}\sum_{m=1}^{M}\sigma_i$, where M is the number of repeated experiments on the initial node. M is set to 100 in our paper. The higher the correlation, the more accurately the measure reflects the influence of the nodes. The ideal case, τ = 1, indicates that the method exactly reproduces the real influence ranking list.
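To make the correlation measure concrete, here is a minimal Python sketch of a Kendall rank correlation between a centrality score list and a list of simulated spreading abilities. It assumes the simple tau-a form, (concordant pairs minus discordant pairs) divided by the total number of pairs, with tied pairs counted as neither; the exact variant used in the paper may differ, and the function name is illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau (tau-a variant) between two equally long score lists.

    A pair (i, j) is concordant if x and y order it the same way,
    discordant if they order it oppositely; ties count as neither.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical centrality scores and averaged spreading abilities.
centrality = [4.0, 3.0, 2.0, 1.0]
spreading = [0.40, 0.35, 0.20, 0.10]
tau = kendall_tau(centrality, spreading)  # identical ordering gives 1.0
```

The double loop over pairs is quadratic in the number of nodes, which is acceptable for a sketch but would need a faster algorithm on the larger networks.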

Fig. 5 shows the correlation between the different evaluation indexes and the actual influence σ̄_i on the email network. It can be seen that while the result of the θ algorithm is negatively correlated with σ̄_i, the other algorithms are positively correlated with σ̄_i. The IKS method shows a better correlation tendency on the email network. The correlation curves of the k-shell method in different networks are more divergent, which shows that there is a great difference in spreading ability between nodes with equal k-shell value. The Cnc algorithm and the IKS algorithm presented in this paper have a better correlation in different networks, but the IKS algorithm has an edge over the others in the ability to mine influential nodes.

[Figure 5 plots: one panel per ranking measure (DC, KS, MDD, θ, Cnc, Cnc+, C(ks), MCDE, IKS), each annotated with its Kendall τ value.]

Figure 5: Correlation analyses of different algorithms and actual transmission influence on the email network.

The rank correlation coefficient τ in different networks is summarized in Table 4. We can observe that our method outperforms the other methods in most cases. In the USAir network, Cnc is the highest and IKS ranks third. In the Power network, Cnc+ is the highest. Meanwhile, the IKS method is significantly correlated with the transmission ability, producing highly monotonic relations with the spreading power of nodes in the network.

Table 4
The correlation of different ranking measures with the spreading ability, measured by Kendall's τ.

Network

τ(σ̄, DC) τ(σ̄, ks) τ(σ̄, MDD) τ(σ̄, θ) τ(σ̄, Cnc) τ(σ̄, Cnc+) τ(σ̄, C(ks)) τ(σ̄, MCDE) τ(σ̄, IKS)

0.8673

0.8617

0.7238

0.8652

0.7930

0.8235

0.8015

EEC

0.8886

0.9023

0.8971

email

0.7744

0.7913

0.7843

-0.7747

0.8369

0.8359

0.7653

0.8521

-0.7388

0.8319

0.7849

0.8161

0.7930

-0.8893

0.8908

0.9312

0.8846

0.7834

0.7975

0.8939

0.8822

0.8971

-0.9123

0.9069

0.9168

0.9251

0.8965

0.9283

0.8397

0.8418

0.8501

-0.8061

0.8958

0.9036

0.9081

0.8672

0.8717

Hamsterster Power PGP Sex

-0.6721

0.8080

0.8317

0.8642

0.8545

-0.8946

0.9334

0.8963

0.8925

0.7917

0.8959

-0.9125

0.9269

0.9354

0.9251

0.8914

0.9369

0.8832

0.8143

0.7721

0.9138

0.8610

0.8901

0.8418

0.9126


Jazz USAir

Next, we use the monotonicity M to investigate the capability of the DC, K-shell, MDD, θ, Cnc, Cnc+, C(ks), MCDE and IKS methods to distinguish the spreading ability of nodes. For a specific centrality measure, nodes in the network are ranked according to their centrality values in descending order. Nodes with the same centrality value have the same rank. The monotonicity M of different ranking methods is summarized in Table 5. From Table 5, we can see that the monotonicity of IKS is very prominent in all eight networks. Compared with the benchmark centralities, the IKS method gives the higher value of M in most cases. Moreover, M(IKS) is very close to 1 in all networks. Therefore, the IKS method can better distinguish the nodes' influence. Note that the monotonicity of the Cnc+ method is excellent too.
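For readers who want to reproduce a monotonicity value, the sketch below assumes the definition commonly attributed to Bae and Kim [21], M(R) = [1 − Σ_r n_r(n_r − 1) / (n(n − 1))]², where n is the number of nodes and n_r is the number of nodes sharing rank r. Whether this is exactly the form used in this paper is an assumption, and the function name is illustrative.

```python
from collections import Counter


def monotonicity(scores):
    """Monotonicity M of a ranking given one centrality score per node.

    M = 1 when every node has a distinct score (no ties),
    M = 0 when all nodes share the same score.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two nodes")
    # Sum n_r * (n_r - 1) over groups of nodes with identical score.
    tied = sum(c * (c - 1) for c in Counter(scores).values())
    return (1 - tied / (n * (n - 1))) ** 2


# Hypothetical k-shell-like scores with many ties versus distinct scores.
coarse = monotonicity([3, 3, 3, 2, 2, 1])
fine = monotonicity([6, 5, 4, 3, 2, 1])  # no ties, so M = 1.0
```

A measure that assigns many nodes the same value, as the plain k-shell index tends to do, is penalized by the tie term, which matches the low M(KS) values reported for the sparser networks.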


Table 5
The monotonicity M of different ranking methods.

Network      M(DC)   M(KS)   M(MDD)  M(θ)    M(Cnc)  M(Cnc+)  M(C(ks))  M(MCDE)  M(IKS)
Jazz         0.9659  0.7944  0.9937  0.9982  0.9993  0.9994   0.9981    0.9345   0.9994
USAir        0.8586  0.8114  0.8893  0.9640  0.9628  0.9945   0.9941    0.9179   0.9943
EEC          0.9571  0.9216  0.9691  0.9968  0.9975  0.9998   0.9998    0.9774   0.9999
email        0.8874  0.8088  0.9249  0.9783  0.9839  0.9991   0.9989    0.9460   0.9995
Hamsterster  0.8980  0.8714  0.9278  0.9347  0.9751  0.9855   0.9854    0.9523   0.9843
Power        0.5927  0.2460  0.7048  0.9604  0.7292  0.9419   0.9635    0.6671   0.9667
PGP          0.6193  0.4807  0.6706  0.9856  0.8920  0.9852   0.9782    0.6753   0.9874
Sex          0.6002  0.5288  0.6323  0.9980  0.9332  0.9957   0.9909    0.6469   0.9957

As mentioned in Section 1, the IKS method deals well with the "rich-club" problem. It also reduces the influence of overlapped neighbors when the number of infected sources is larger. To verify that the infected sources selected by our method are more scattered than those of the other eight methods, the average shortest path length Ls defined in [38] is evaluated for IKS and the other methods. Because of the high computational complexity of distance calculation, we analyze the shortest path length only on Jazz, USAir, EEC and email. Fig. 6 compares the average shortest path length Ls obtained by the nine methods. The X-axis is the proportion of initial infected nodes, and the Y-axis is Ls. From Fig. 6, we can see that the infected sources selected by the IKS method have a larger Ls than those selected by the other methods. Information can spread faster and more effectively when Ls is larger.
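The quantity Ls can be sketched in a few lines of Python. The version below assumes that Ls is the mean shortest-path distance over all pairs of selected spreaders in an unweighted, undirected network, computed with breadth-first search; pairs that are not connected are skipped, which is a choice of this sketch rather than something stated in the text, and all names are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path_length(adj, spreaders):
    """Mean shortest-path length between all pairs of spreaders."""
    spreaders = list(spreaders)
    if len(spreaders) < 2:
        return 0.0
    dist_from = {s: bfs_distances(adj, s) for s in spreaders}
    total = 0
    pairs = 0
    for u, v in combinations(spreaders, 2):
        if v in dist_from[u]:  # skip disconnected pairs (sketch assumption)
            total += dist_from[u][v]
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy example on a path graph 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
clustered = average_shortest_path_length(path, [0, 1, 2])  # (1 + 2 + 1) / 3
scattered = average_shortest_path_length(path, [0, 2, 4])  # (2 + 4 + 2) / 3
```

One breadth-first search per spreader is needed, which is why the evaluation becomes expensive on the larger networks.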

[Figure 6 plots: panels (a) Jazz, (b) USAir, (c) EEC, (d) email; X-axis: proportion of initial infected nodes (0 to 0.2); Y-axis: Ls; legend: DC, KS, Cnc, Cnc+, C(ks), MDD, MCDE, IKS.]

Figure 6: Average shortest path length Ls by nine methods under different proportions of source spreaders.

4.4. Computational complexity analysis

In this section, the efficiency of the proposed method is investigated. As can be seen from our algorithm, the time complexity of the whole algorithm depends on the calculation of the k-shell method. Table 6 shows the time complexity of the different methods mentioned in this paper. The main advantage of our method is that it achieves high performance with the same complexity as k-shell.
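Since the cost of IKS is dominated by the k-shell step, a compact Python sketch of k-shell decomposition by iterative pruning may help. This is a plain, readable version and not the optimized bucket-based implementation needed for strictly linear time; the function name and the adjacency-dict input format are assumptions of this sketch.

```python
def k_shell(adj):
    """Return {node: k-shell index} for an undirected graph.

    adj maps each node to a list of neighbours. Nodes whose remaining
    degree is at most k are removed repeatedly and assigned shell k;
    k then moves up to the smallest remaining degree.
    """
    degree = {u: len(vs) for u, vs in adj.items()}
    remaining = set(adj)
    shell = {}
    while remaining:
        k = min(degree[u] for u in remaining)
        stack = [u for u in remaining if degree[u] <= k]
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue  # already pruned via another neighbour
            remaining.discard(u)
            shell[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        stack.append(v)
    return shell


# Toy example: a triangle 0-1-2 with a tail 0-3-4.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
shells = k_shell(graph)
```

On the toy graph the tail nodes 3 and 4 fall into the 1-shell while the triangle forms the 2-shell, which is the kind of coarse grouping that IKS then refines within each shell.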

Table 6
The computational complexity of different methods

Method    Category   Computational complexity
DC        Local      O(n)
k-shell   Global     O(n)
MDD       Global     O(n)
θ         Global     O(n^3) or O(mn)
Cnc       Hybrid     O(n)
Cnc+      Hybrid     O(n)
C(ks)     Local      O(n)
MCDE      Hybrid     O(n)
IKS       Hybrid     O(n)

It can be seen from Table 6 that these methods belong to different categories and have different computational complexity. For example, θ is based on the shortest paths, which requires the full network information and has a high complexity of O(n^3). C(ks) only needs the local information of a node, KS, MDD and θ need the full network information, and Cnc and Cnc+ need both. The computational complexity of k-shell centrality is O(n). The computational complexity of IKS is also O(n), which is equal to that of DC, KS, MDD, Cnc and Cnc+ and lower than that of θ. Local measures determine node rank based on local information about the node's neighbors. Global measures need to traverse the whole network to determine node rank. Hybrid measures, which make use of both types of information, are evidently less efficient than global measures.

Furthermore, the runtime of different methods is examined. The DC and θ methods are ignored here because their complexity is significantly lower or higher than that of our method. For this purpose, each method is run 100 times and the average run time is calculated. The experiments are carried out on a desktop PC with an Intel Core i7 4GHz CPU and 4GB RAM. The results of the experiment are shown in Fig. 7. They confirm that the KS, Cnc+, C(ks), MDD, MCDE and IKS methods run in nearly linear time on the different networks. The KS method has the lowest run time, and the run times of IKS and Cnc+ are only slightly higher than that of KS. As the size of the network increases, the run times of the proposed method, Cnc+ and C(ks) increase slowly, and IKS becomes more efficient than the other measures. This attests to a direct relationship between the number of edges in the network and the run time. MDD and MCDE are the slowest methods, and they often do not have good differentiation and accuracy. In summary, the run time of IKS is acceptable for large networks.
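The timing protocol described above (repeat each method and average the run time) can be sketched in Python as follows. The helper name and the use of `time.perf_counter` are choices of this sketch; the authors' own measurements were made with their MATLAB code, so absolute numbers would not be comparable.

```python
import time


def average_runtime(func, *args, repeats=100):
    """Call func(*args) `repeats` times and return the mean wall-clock
    time per call in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats


# Example: time a stand-in ranking step (sorting nodes by degree).
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}


def rank_by_degree(adj):
    return sorted(adj, key=lambda u: len(adj[u]), reverse=True)


mean_seconds = average_runtime(rank_by_degree, graph, repeats=100)
```

Any ranking function with the same call shape could be passed in place of `rank_by_degree` to compare methods on the same network.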

[Figure 7 plot: running time in seconds (0 to 70) per dataset (Jazz, USAir, EEC, email, Power, PGP); legend: KS, Cnc+, C(ks), MDD, MCDE, IKS.]

Figure 7: Time costs of execution of different methods on six datasets.

5. Conclusion

This paper has presented an improved k-shell (IKS) method that efficiently ranks and measures the influential spreaders in complex networks. The IKS method combines the k-shell and the node information entropy to optimize the use of available resources so that information can be disseminated efficiently. IKS considers the shell in which a node is located and the influence of the node and its nearest neighbors, so as to effectively rank the spreading ability of nodes in complex networks. Experimental results on eight real networks show that the proposed IKS method distinguishes differences in node influence better than conventional centrality methods such as the DC, K-shell, MDD, θ, Cnc, Cnc+ and C(ks) algorithms. The influential spreaders selected by IKS are scattered more broadly than those selected by other methods, which accelerates the propagation of information. Moreover, the computational complexity analysis shows that the IKS method can be extended to large-scale networks.

It is still a long-term challenge to find a more efficient method that combines network structure and spreading dynamics to identify node spreading influence in large-scale dynamic networks. Our future work will focus on how to apply the proposed IKS method to identify multiple spreaders in dynamic networks.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. U1530126, the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2016Z005, ZYGX2016J218), and the Meteorological Information and Signal Processing Key Laboratory of Sichuan Higher Education Institutes at Chengdu University of Information Technology (Grant No. QXXCSYS201702). The authors would like to thank the anonymous reviewers for their helpful comments and suggestions. We greatly appreciate Chungu Guo and Yanzhou Su for providing guidance and encouragement on Pearson's correlation.

Data Availability

All relevant datasets and MATLAB code are available at https://github.com/uestcwm/complex-network.

References

[1] A. Zareie, A. Sheikhahmadi, A hierarchical approach for influential node ranking in complex social networks, Expert Systems with Applications 93 (2018) 200–211.

[2] D. Chen, L. Lü, M.-S. Shang, Y.-C. Zhang, T. Zhou, Identifying influential nodes in complex networks, Physica A: Statistical Mechanics and its Applications 391 (4) (2012) 1777–1787.

[3] H. Yu, X. Cao, Z. Liu, Y. Li, Identifying key nodes based on improved structural holes in complex networks, Physica A: Statistical Mechanics and its Applications 486 (2017) 318–327.

[4] Z.-Y. Jiang, Y. Zeng, Z.-H. Liu, J.-F. Ma, Identifying critical nodes’ group in complex networks, Physica A: Statistical Mechanics and its Applications 514 (2019) 121–132.

[5] Z. Wang, L. Wang, A. Szolnoki, M. Perc, Evolutionary games on multilayer networks: a colloquium, The European physical journal B 88 (5) (2015) 124.

[6] M. Medo, Y.-C. Zhang, T. Zhou, Adaptive model for recommendation of news, EPL (Europhysics Letters) 88 (3) (2009) 38005.

[7] L. Lü, Y.-C. Zhang, C. H. Yeung, T. Zhou, Leaders in social networks, the delicious case, PloS one 6 (6) (2011) e21202.

[8] Z. Wang, M. A. Andrews, Z.-X. Wu, L. Wang, C. T. Bauch, Coupled disease–behavior dynamics on complex networks: A review, Physics of life reviews 15 (2015) 1–29.

[9] F. Morone, B. Min, L. Bo, R. Mari, H. A. Makse, Collective influence algorithm to find influencers via optimal percolation in massively large social media, Scientific reports 6 (2016) 30062.

[10] L. Yang, Y. Qiao, Z. Liu, J. Ma, X. Li, Identifying opinion leader nodes in online social networks with a new closeness evaluation algorithm, Soft Computing 22 (2) (2018) 453–464.

[11] D. Purevsuren, G. Cui, Efficient heuristic algorithm for identifying critical nodes in planar networks, Computers & Operations Research 106 (2019) 143–153.

[12] T. Wen, S. Duan, W. Jiang, Node similarity measuring in complex networks with relative entropy, Communications in Nonlinear Science and Numerical Simulation (2019) 104867.

[13] K. Rahimkhani, A. Aleahmad, M. Rahgozar, A. Moeini, A fast algorithm for finding most influential people based on the linear threshold model, Expert Systems with Applications 42 (3) (2015) 1353–1361.

[14] S. Gao, J. Ma, Z. Chen, G. Wang, C. Xing, Ranking the spreading ability of nodes in complex networks based on local structure, Physica A: Statistical Mechanics and its Applications 403 (2014) 130–147.

[15] Y. Zhao, S. Li, F. Jin, Identification of influential nodes in social networks with community structure based on label propagation, Neurocomputing 210 (2016) 34–44.

[16] A. Sheikhahmadi, M. A. Nematbakhsh, A. Shokrollahi, Improving detection of influential nodes in complex networks, Physica A: Statistical Mechanics and its Applications 436 (2015) 833–845.

[17] Z. Lv, N. Zhao, F. Xiong, N. Chen, A novel measure of identifying influential nodes in complex networks, Physica A: Statistical Mechanics and its Applications 523 (2019) 488–497.

[18] L. C. Freeman, Centrality in social networks conceptual clarification, Social networks 1 (3) (1978) 215–239.

[19] G. Sabidussi, The centrality index of a graph, Psychometrika 31 (4) (1966) 581–603.

[20] L. C. Freeman, A set of measures of centrality based on betweenness, Sociometry (1977) 35–41.

[21] J. Bae, S. Kim, Identifying and ranking influential spreaders in complex networks by neighborhood coreness, Physica A: Statistical Mechanics and its Applications 395 (2014) 549–559.

[22] J. Wang, X. Hou, K. Li, Y. Ding, A novel weight neighborhood centrality algorithm for identifying influential spreaders in complex networks, Physica A: Statistical Mechanics and its Applications 475 (2017) 88–105.

[23] A. Zeng, C.-J. Zhang, Ranking spreaders by decomposing complex networks, Physics Letters A 377 (14) (2013) 1031–1035.

[24] J.-G. Liu, Z.-Y. Wang, Q. Guo, L. Guo, Q. Chen, Y.-Z. Ni, Identifying multiple influential spreaders via local structural similarity, EPL (Europhysics Letters) 119 (1) (2017) 18001.

[25] L. Lü, D. Chen, X.-L. Ren, Q.-M. Zhang, Y.-C. Zhang, T. Zhou, Vital nodes identification in complex networks, Physics Reports 650 (2016) 1–63.

[26] L. Fei, Q. Zhang, Y. Deng, Identifying influential nodes in complex networks based on the inverse-square law, Physica A: Statistical Mechanics and its Applications 512 (2018) 1044–1059.

[27] M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, H. A. Makse, Identification of influential spreaders in complex networks, Nature physics 6 (11) (2010) 888.

[28] L. Jiang, X. Zhao, B. Ge, W. Xiao, Y. Ruan, An efficient algorithm for mining a set of influential spreaders in complex networks, Physica A: Statistical Mechanics and its Applications 516 (2019) 58–65.

[29] Z.-M. Ren, J.-G. Liu, F. Shao, Z.-L. Hu, Q. Guo, Analysis of the spreading influence of the nodes with minimum k-shell value in complex networks, Acta Physica Sinica 62 (10) (2013) 956–959.

[30] L.-l. Ma, C. Ma, H.-F. Zhang, B.-H. Wang, Identifying influential spreaders in complex networks based on gravity formula, Physica A: Statistical Mechanics and its Applications 451 (2016) 205–212.

[31] X. Ren, L. Lü, Review of ranking nodes in complex networks, Chinese Science Bulletin 59 (13) (2014) 1175–1197.

[32] C. Salavati, A. Abdollahpouri, Z. Manbari, Ranking nodes in complex networks based on local structure and improving closeness centrality, Neurocomputing 336 (2019) 36–45.

[33] D. Liu, Y. Jing, J. Zhao, W. Wang, G. Song, A fast and efficient algorithm for mining top-k nodes in complex networks, Scientific reports 7 (2017) 43330.

[34] A. Namtirtha, A. Dutta, B. Dutta, Identifying influential spreaders in complex networks based on kshell hybrid method, Physica A: Statistical Mechanics and its Applications 499 (2018) 310–324.

[35] C. Li, L. Wang, S. Sun, C. Xia, Identification of influential spreaders based on classified neighbors in real-world complex networks, Applied Mathematics and Computation 320 (2018) 512–523.

[36] Z.-L. Hu, J.-G. Liu, G.-Y. Yang, Z.-M. Ren, Effects of the distance among multiple spreaders on the spreading, EPL (Europhysics Letters) 106 (1) (2014) 18002.

[37] L. Guo, J.-H. Lin, Q. Guo, J.-G. Liu, Identifying multiple influential spreaders in term of the distance-based coloring, Physics Letters A 380 (7-8) (2016) 837–842.

[38] J.-X. Zhang, D.-B. Chen, Q. Dong, Z.-D. Zhao, Identifying a set of influential spreaders in complex networks, Scientific reports 6 (2016) 27823.

[39] T. Bian, Y. Deng, Identifying influential nodes in complex networks: A node information dimension approach, Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (4) (2018) 043109.

[40] A. Sheikhahmadi, M. A. Nematbakhsh, Identification of multi-spreader users in social networks for viral marketing, Journal of Information Science 43 (3) (2017) 412–423.

[41] A. Zareie, A. Sheikhahmadi, M. Jalili, Influential node ranking in social networks based on neighborhood diversity, Future Generation Computer Systems 94 (2019) 120–129.

[42] Y. Liu, B. Wei, Y. Du, F. Xiao, Y. Deng, Identifying influential spreaders by weight degree centrality in complex networks, Chaos, Solitons & Fractals 86 (2016) 1–7.

[43] M. Li, Q. Zhang, Y. Deng, Evidential identification of influential nodes in network of networks, Chaos, Solitons & Fractals 117 (2018) 283–296.

[44] Y. Wang, S. Wang, Y. Deng, A modified efficiency centrality to identify influential nodes in weighted networks, Pramana 92 (4) (2019) 68.

[45] J.-G. Liu, J.-H. Lin, Q. Guo, T. Zhou, Locating influential nodes via dynamics-sensitive centrality, Scientific reports 6 (2016) 21380.

[46] J.-G. Liu, Z.-M. Ren, Q. Guo, Ranking the spreading influence in complex networks, Physica A: Statistical Mechanics and its Applications 392 (18) (2013) 4154–4159.

[47] B. Mirzasoleiman, M. Babaei, M. Jalili, M. Safari, Cascaded failures in weighted networks, Physical Review E 84 (4) (2011) 046114.

[48] W.-X. Wang, G. Chen, Universal robustness characteristic of weighted networks against cascading failure, Physical Review E 77 (2) (2008) 026101.

[49] T. Nie, Z. Guo, K. Zhao, Z.-M. Lu, Using mapping entropy to identify node centrality in complex networks, Physica A: Statistical Mechanics and its Applications 453 (2016) 290–297.

[50] S. N. Dorogovtsev, A. V. Goltsev, J. F. F. Mendes, K-core organization of complex networks, Physical review letters 96 (4) (2006) 040601.

[51] P. M. Gleiser, L. Danon, Community structure in jazz, Advances in Complex Systems 6 (4) (2003) 565–573.

[52] V. Colizza, R. Pastor-Satorras, A. Vespignani, Reaction–diffusion processes and metapopulation models in heterogeneous networks, Nature Physics 3 (4) (2007) 276–282.

[53] H. Yin, A. R. Benson, J. Leskovec, D. F. Gleich, Local higher-order graph clustering, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 555–564.

[54] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, A. Arenas, Phys. Rev. E 68 (2003) 065103.

[55] Hamsterster full network dataset – KONECT (Apr. 2017). URL http://konect.uni-koblenz.de/networks/petster-hamster

[56] D. J. Watts, S. H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature 393 (1) (1998) 440–442.

[57] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, A. Arenas, Models of social networks based on social distance attachment, Phys. Rev. E 70 (5) (2004) 056122.

[58] L. E. Rocha, F. Liljeros, P. Holme, Simulated epidemics in an empirical spatiotemporal network of 50,185 sexual contacts, PLoS computational biology 7 (3) (2011) e1001109.

[59] J. Kunegis, Konect: the koblenz network collection, in: Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, pp. 1343–1350.

[60] K. J. Sharkey, Deterministic epidemic models on contact networks: correlations and unbiological terms, Theoretical population biology 79 (4) (2011) 115–129.

[61] C. Castellano, R. Pastor-Satorras, Thresholds for epidemic spreading in networks, Physical review letters 105 (21) (2010) 218701.

[62] S. Xiao-Ping, S. Yu-Rong, Leveraging neighborhood "structural holes" to identifying key spreaders in social networks, Acta physica sinica 64 (2).

Highlights

Our paper proposes a new method to identify influential nodes in complex networks, and the experimental results show its accuracy and superiority. The highlights of this paper are listed as follows:

• A novel method is proposed to identify influential spreaders based on the improved k-shell.

• Node information entropy is proposed based on the node and its neighbors.

• Only the node with the largest node information entropy in each shell is selected.

• Our approach can guarantee that the spreaders are not only influential but also scattered.

• The proposed method outperforms other measures in eight real-world networks.

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.