Journal Pre-proof

Sequential seeding strategy for social influence diffusion with improved entropy-based centrality

Chengzhang Ni, Jun Yang, Demei Kong

PII: S0378-4371(19)32041-2
DOI: https://doi.org/10.1016/j.physa.2019.123659
Reference: PHYSA 123659
To appear in: Physica A
Received date: 31 May 2019
Revised date: 29 September 2019

Please cite this article as: C. Ni, J. Yang and D. Kong, Sequential seeding strategy for social influence diffusion with improved entropy-based centrality, Physica A (2019), doi: https://doi.org/10.1016/j.physa.2019.123659.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Published by Elsevier B.V.
Highlights

Sequential seeding strategy for social influence diffusion with improved entropy-based centrality

Chengzhang Ni, Jun Yang, Demei Kong

• A measurement based on heterogeneity of confidence in neighbors is developed.
• The entropy-based centrality is proposed to measure the individual’s influence.
• The sequential seeding strategy is compared with the single-stage strategy.
• The proposed centrality is the best of all in the BA scale-free network.
Sequential seeding strategy for social influence diffusion with improved entropy-based centrality

Chengzhang Ni, Jun Yang∗, Demei Kong

School of Management, Huazhong University of Science and Technology, Wuhan, 430074 Hubei, China

Abstract

In this paper, we investigate the centrality problem of selecting seed targets for the sequential seeding strategy in social networks. Based on the concept of entropy, we design an improved centrality that integrates interaction intimacy and confidence level to measure the total influence of an individual, which can be decomposed into a direct effect and an indirect effect. In addition, we formulate the sequential seeding strategy to evaluate the performance of the proposed centrality and compare it with its counterpart under the single-stage seeding strategy. Furthermore, extensive experiments are conducted for comparison with other centralities, including betweenness, closeness, degree, and eigenvector, in two empirical and four artificial social networks. By simulations, we find that the proposed entropy-based centrality is superior to the other centralities in terms of diffusion speed and influence coverage in the BA scale-free network. Parameter analysis of the sequential seeding strategy demonstrates that the proposed centrality achieves the greatest total influence coverage when the individual’s confidence in each neighbor is treated equally.

Keywords: social network, influence diffusion, entropy, centrality, seeding strategy.
1. Introduction

In the past decades, the emergence of extensive social network services has made it possible to investigate and analyze social influence diffusion in real-world networks. Diffusion in the social ecology of human beings has long been a very common phenomenon, including disease spread in crowds through contact[1], information dissemination and rumor spread on online or offline social media[2], and innovation communication in society[3]. In particular, a central topic that has received considerable attention in diffusion research is the maximization problem of social influence[4], which can be interpreted as how to select a small subset of individuals in a social network as initial influential seeds such that the social influence diffusion cascade triggered from these seeds leads to an optimal coverage. One of the most popular applications related to social influence diffusion is word-of-mouth marketing of new products[5][6]. For example, an individual’s decision to adopt a new product or not partly depends on the neighbors’ attitude toward this product. The greater the influence of the neighbors who adopt the product, the larger the probability that the individual will be influenced. Therefore, for a corporation that aims to launch new products into the market, how to choose and seed the most influential individuals effectively is an important strategy to optimize influence diffusion and control the seeding budget. In addition, the research results on the social influence diffusion problem have been widely applied to many business activities, ranging from public opinion analysis[7], on-line recommendation[8], and advertising release[9] to expert identification[10].

∗ Corresponding author. Email address: [email protected] (Jun Yang)

Preprint submitted to Physica A, September 29, 2019
Being a fundamental research domain of the social influence maximization problem, the identification of influential individuals is frequently approached by quantifying the centrality of each individual in the social network. Thus, a multitude of centrality measurements have been proposed[11–26]. Among them, centrality measurements such as betweenness centrality[11], Katz centrality[12], closeness centrality[11], eccentricity[13], and the information centrality index[14] are established on path traffic flows of the social network, which can affect and mirror the social influence of individuals. In addition to the path-based centralities mentioned above, the centrality measurements in[15–26] emphasize analyzing an individual’s social influence resulting from its neighbors’ location in the social network. Although the above-mentioned centrality measurements can identify influential individuals by exploiting the topological structure of the social network to characterize social influence, they ignore the role of the complexity and uncertainty of the individual neighborhood structure in analyzing network centrality. There is evidence that entropy-based centrality has been applied extensively to quantify the complexity and uncertainty of networks[27–34]. Therefore, in this paper, we propose an improved entropy-based centrality measurement that considers connection weight and the difference in the degree of confidence in neighbors.
Furthermore, in order to make social influence diffusion more efficient, how to design seeding strategies has also aroused wide research interest. Since seeking the optimal seeding strategy is known to be NP-hard, there has been considerable focus on greedy algorithms[35] and heuristic methods[4][36]. A popular setting in existing research is that all seed individuals are activated during the initial phase and then spread their social influence to neighboring individuals. This is the so-called single-stage seeding strategy. However, if all seed individuals are launched at the initial time step, some seeds may be redundant because they could have been influenced naturally in the following stages. Therefore, several seeding strategies are proposed in[37–40] that divide the seeding of individuals into sequential actions according to some rules rather than completing it at the initial stage.
Motivated by the above discussion, we propose a novel centrality measurement to evaluate individuals’ social influence based on the connection weight influence entropy and the confidence influence entropy. Considering the confidence strength among node pairs and the weights of connections, the proposed centrality is suited to describing the social influence of an individual on its one-hop and two-hop neighbors. By exploiting the concept of entropy to measure the complexity and uncertainty of social influence, the vital influential individuals can be effectively detected. In addition, based on the proposed centrality and the independent cascade model, we design a sequential seeding strategy and compare its social influence diffusion performance with that of the single-stage seeding strategy. For comparison, four other centralities, namely degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality, are also applied to these two seeding strategies. Our contributions in this paper are summarized as follows:

(1) We propose a novel centrality measurement based on the concept of entropy incorporating the individual’s confidence level, which mines a potential pattern for evaluating an individual’s social influence and thereby identifies influential individuals to trigger the diffusion cascade.

(2) We develop a new method to characterize the relationship between social interactions and the strength of social influence. It evaluates the direct influence and the indirect influence by integrating the confidence influence entropy and the weight influence entropy to measure the impact of confidence level and interaction intimacy on social influence.
(3) We provide a comprehensive comparison of our sequential seeding strategy with the single-stage seeding strategy based on the proposed centrality and perform extensive experiments to demonstrate its effectiveness and efficiency to some extent.

The remainder of this paper is organized as follows. In Section 2, the related literature is reviewed. In Section 3, we introduce the preliminaries, covering the diffusion model and the concept of entropy. In Section 4, we propose an improved centrality measurement built on entropy-based social influence, such that seed targets chosen according to that centrality can trigger the spreading process following the widely applied independent cascade model. In addition, the proposed seeding strategies are presented. In Section 5, we compare the performance of five centralities in different networks for the sequential seeding strategy and the single-stage strategy, and conduct parameter analysis. In Section 6, conclusions are given.

2. Literature review
In order to achieve maximum social influence diffusion efficiency, the criterion of seed selection and the seeding strategy are two important factors. In this section, we review the related literature on these two factors: the centrality used for seed selection and the arrangement of the seeding strategy.

(1) The centrality of seed selection
In social network analysis, some well-known classical centrality measurements have been applied to identify influential individuals by measuring the network structure[41]. Degree centrality is the most direct and popular measurement of node centrality in network analysis[42]. Katz centrality takes account of all paths between a pair of nodes to calculate social influence[12]. Eigenvector centrality, first proposed in[43], assigns each node in the network a relative score associated with the network adjacency matrix. The renowned PageRank of Google Search is an application based on a variant of eigenvector centrality[44]. Closeness centrality reflects the proximity of a node to other nodes in the network, whereas betweenness centrality indicates the importance of a node by the number of shortest paths passing through it[11]. Generally speaking, betweenness centrality and closeness centrality represent controllability and accessibility, respectively. In the literature comparing the performance of centralities, Hinz et al.[38] compared the performance and efficiency of four different seeding strategies based on nodes’ central character, namely Hubs, Bridges, Fringes, and Random, and found that the Hubs strategy outperforms the other three. Similarly, Banerjee et al.[45] showed that individuals with a high eigenvector centrality have a greater capability to influence neighbors in the context of an available microfinance loan program.
In addition to the above centrality measurements, scholars have put forward many other improved centrality approaches. Stephenson and Zelen[46] proposed a centrality that calculates the “information” contained in all potential paths between pairs of nodes. Furthermore, Chen et al.[15] proposed a semi-local centrality measure as a tradeoff between the low-relevance degree centrality and other time-consuming measures by exploiting the information of multiple-hop neighbors. Considering the significance of the node location in a given network, Kitsak et al.[19] suggested that coreness centrality may be a more effective index for identifying the most influential spreaders. However, the original k-core centrality[47] requires global topological information of the network, which may lead to computational inefficiency, especially in large-scale networks. In addition, k-core centrality may fail to distinguish the many nodes that share the same k-core value. Thus, many improved k-core algorithms[22, 24, 48] have been successively proposed from different perspectives. The H-index is introduced in[47][49] to deal with the drawback discussed above; it was originally proposed to measure scholars’ academic influence as the largest number h such that h of a scholar’s publications have each been cited at least h times[26]. Subsequently, Lv et al.[50] proposed an influence evaluation model by constructing an integrated operator containing the degree, the H-index, and the coreness.
In addition to the classical centrality measurements and improved centrality approaches discussed above, the information entropy techniques initially proposed by Shannon[51] have recently been extended to the quantitative analysis of influence in[27–32]. Among recent articles on centrality measures, Peng et al.[33] presented a model to quantitatively evaluate social influence in mobile social networks by introducing the concept of entropy to depict the uncertainty and complexity of social influence, focusing on friend entropy and interaction frequency entropy. Furthermore, Qiao et al.[34] proposed a re-defined entropy centrality model, which describes associations among node pairs, to measure the potential influence of an actor in communication activity. Based on this model, they extended it to directed and weighted networks in[52], characterizing the total influence of an individual by calculating the structural entropy and the interaction frequency entropy. Although all of the above information entropy-based centrality measures are proposed from the perspective of network topology[33, 34, 52], they have until now been difficult to compare conceptually. In addition, Tutzauer[53] proposed an entropy-based measure based on the way traffic propagates by transfer and flows through the whole network. It should be mentioned that calculating this centrality requires global information on the network structure. However, global information is usually difficult to obtain in a social network. Different from the combination of topological structure and information entropy in these works, the purpose of this paper is to propose an entropy-based measure of centrality for individuals characterized by the individual’s intimacy of connections and confidence in neighbors, each focusing on the personal emotion and local information that determine the diffusion of social influence. Specifically, a measurement taking into account interaction intimacy and confidence level based on the heterogeneity of confidence in neighbors is developed, and the entropy-based calculation can be carried out with the local information of individuals. Moreover, to the best of our knowledge, no existing study has examined the sequential seeding program using entropy-based centrality or shed light on the tradeoff between influence coverage and seeding frequency. This paper is thus the first to compare the performance of the proposed entropy-based centrality with that of other centralities under different seeding strategies. Next, we build on the centrality measurements to develop the seeding strategies.

(2) Seeding strategies
In the early seminal works[38][45], a large body of research on seeding strategies focuses on the single-stage seeding strategy, where all chosen seed nodes are activated at the beginning of the diffusion process. Recently, some works proposed an adaptive approach in which seeds are activated over time and the centrality used for seed selection needs to be reassessed in each seeding stage in order to obtain a higher activation rate[54][55]. This is the so-called sequential seeding strategy. Sequential seeding strategies focus on how to designate a suitable subset of individuals as seeds to trigger a cascade of influence diffusion and how to determine the appropriate seeding mechanisms at different stages of diffusion. Considering three node states, ‘non-active’, ‘active’, and ‘available’, Seeman et al.[54] proposed a two-stage framework for seeding actions. Tong et al.[56] presented a greedy adaptive seeding strategy and an effective heuristic algorithm based on the dynamic independent cascade model. Comparing the sequential seeding strategy and the single-stage seeding strategy, Jankowski et al.[37] demonstrated that the former is superior to the latter in terms of activation coverage. Furthermore, Jankowski et al.[57] proposed sequential seeding strategies with buffering, which can avoid selecting nodes that would be naturally activated by other nodes. Chierichetti et al.[58] studied how to determine the seeding sequence for two competitive marketers in order to maximize the expected coverage. Liu[59] proposed the push-driven cascade model with the consideration of limited user attention and a controlling factor over the diffusion process. More specifically, a seeding action occurs at each time step to obtain the overall activation scope. In recent years, Goldenberg et al.[60] proposed a scheduled seeding method which focuses on finding not only the optimal set of seed nodes but also the right timing to implement the seeding actions. They extract three properties, namely stochastic dynamics, the diminishing social effect, and state-dependent seeding, from the existing diffusion models to apply in the scheduled approach. Analogously, Sela et al.[55] suggested a greedy heuristic scheduled seeding approach motivated by the timing aspect of seeding, which takes into account the identification of seeds at the initial stage and the determination of seeding time over the diffusion process.
Although the above works have made major progress in methods of measuring centrality, the research on entropy-based centrality is still at a nascent stage and does not yet accurately capture the inherent fundamental rules followed by social influence diffusion. In fact, an individual’s opinion will not always be accepted by his neighbors, because the individual’s social influence depends largely on how much his neighbors trust him. Different from the literature mentioned above, this paper proposes an entropy-based centrality that integrates interaction intimacy and confidence level, and further analyzes the effect of confidence level on the sequential seeding strategy. Moreover, the impact of seeding frequency during the sequential seeding horizon on diffusion speed and activation coverage has rarely been investigated. Therefore, how different consecutive seeding actions affect the acceleration and coverage of influence diffusion is subsequently analyzed in the context of the sequential seeding strategy.
3. Preliminaries

3.1. Diffusion model

To study social influence diffusion through a social network, an appropriate diffusion model first needs to be set up. The literature[61] introduced different models of influence diffusion, among which the Independent Cascade Model (ICM) and the Linear Threshold Model (LTM) have been widely applied and lie at the core of most extended diffusion models. ICM assigns a spreading probability to each connection, through which a node influenced in the latest time tick may affect its neighbor in the next time tick. In LTM, an inactive node becomes active when the sum of the spreading probabilities of its connections with active neighbors exceeds its threshold value. In particular, LTM operates on a node-specific threshold from the perspective of the entity, and whether a node becomes active depends on the aggregate effect of its active neighbors. In ICM, however, a connection tying an active node to an inactive node is given a single chance through which the inactive node is successfully influenced with a certain probability, independently of the previous history[36]. Because only the connection-specific diffusion model is involved in this paper, we adopt the ICM to carry out experiments. It is worth emphasizing that, unlike the traditional version of ICM, we redefine the propagation probability by considering the weight of the connection between adjacent nodes. In previous studies, Gang Y. et al.[62] defined the probability with which an inactive node $i$ is affected by its active neighbor node $j$ as $\tau_{ij} = (w_{ij}/w_{\max})^{\alpha}$, where $\alpha > 0$ is a tunable parameter, $w_{ij}$ is the weight of the directed connection, and $w_{\max}$ denotes the maximal weight. Moreover, another definition of infection transmission, proposed by Wang W. et al.[63], states the probability as $1 - (1 - \gamma)^{w_{ij}}$, where $w_{ij}$ is again the weight of the edge connecting node $i$ and node $j$, and the parameter $\gamma$, chosen in the interval $[0, 1]$, denotes the original propagation probability. Under the realistic assumption that, in weighted social networks, the probability of a node being affected by its neighbors is related to both the original propagation probability and the edge weight, we adopt the latter version of the influence spreading probability, i.e., $p_{ij} = 1 - (1 - \gamma)^{w_{ij}}$, in place of its counterpart in the traditional ICM. In the remaining sections, each node (or individual) is referred to as being either active (an adopter of the social influence) or inactive.
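The two weighted propagation probabilities discussed above can be compared numerically. The following is a minimal sketch; the function names `tau` and `spread_probability` and the input checks are illustrative assumptions, not part of the original model description.

```python
def tau(w_ij: float, w_max: float, alpha: float) -> float:
    """Power-law form attributed to [62]: (w_ij / w_max) ** alpha with alpha > 0."""
    if alpha <= 0 or w_max <= 0 or not 0 <= w_ij <= w_max:
        raise ValueError("require alpha > 0 and 0 <= w_ij <= w_max with w_max > 0")
    return (w_ij / w_max) ** alpha


def spread_probability(w_ij: float, gamma: float) -> float:
    """Form adopted in this paper: p_ij = 1 - (1 - gamma) ** w_ij."""
    if not 0.0 <= gamma <= 1.0 or w_ij < 0:
        raise ValueError("require 0 <= gamma <= 1 and w_ij >= 0")
    return 1.0 - (1.0 - gamma) ** w_ij
```

With a unit weight, `spread_probability` returns the original propagation probability $\gamma$ itself, and it grows toward 1 as the weight increases, which matches the role of $w_{ij}$ as interaction intimacy.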
Starting with an initial set of active nodes, each newly activated node $i$ is given a single chance to influence each currently inactive neighbor node $j$, and it succeeds with probability $p_{ij} = 1 - (1 - \gamma)^{w_{ij}}$. Here, the weight $w_{ij}$ intuitively represents the heterogeneity level of each connection. If node $j$ has multiple newly activated neighbors, those nodes attempt to activate node $j$ in random order. If node $j$ is successfully activated by one of its neighbors $i$, $j$ turns active at the next time step. It is worth noting that, whether node $i$ succeeds or not, it makes no further attempts to influence its neighbors in subsequent phases. This process is repeated until no newly activated node remains in the network, at which point the propagation process ends. According to these rules, the influence diffusion unfolds in discrete time steps. Based on this improved diffusion model, we next evaluate the effect of different centrality measurements on selecting seed targets and seeding strategies for maximal coverage of influence diffusion.
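The cascade rules above can be sketched as a short simulation. This is only an illustration under stated assumptions: the graph is a hypothetical adjacency dictionary `adj[i][j] = w_ij` (with both directions stored for an undirected edge), and the names `simulate_icm`, `seeds`, and `rng` are introduced here and do not appear in the paper.

```python
import random


def simulate_icm(adj, seeds, gamma, rng=None):
    """Run one weighted independent cascade.

    adj   : dict mapping node -> {neighbor: weight} (assumed representation)
    seeds : iterable of initially active nodes
    gamma : original propagation probability in [0, 1]
    Returns a list of sets; entry t holds the nodes activated at time step t.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)          # nodes activated in the latest step
    history = [set(active)]
    while frontier:
        rng.shuffle(frontier)        # newly activated nodes act in random order
        newly = set()
        for i in frontier:
            for j, w_ij in adj.get(i, {}).items():
                if j in active or j in newly:
                    continue         # only inactive neighbors can be influenced
                p_ij = 1.0 - (1.0 - gamma) ** w_ij
                if rng.random() < p_ij:
                    newly.add(j)     # j becomes active at the next time step
        active |= newly
        frontier = list(newly)       # each node gets a single round of attempts
        if newly:
            history.append(newly)
    return history
```

Because every node enters `frontier` exactly once, each active node attempts to influence each inactive neighbor only once, and the loop stops as soon as a step activates no new node.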
3.2. Information entropy
In 1948, Shannon proposed the concept of information entropy in his well-known work “A Mathematical Theory of Communication”, which pointed out that redundancy is ubiquitous in information and elaborated on how to measure the uncertainty of information in mathematical language. In general, the more information can be transmitted in a system, the higher the entropy value it possesses. Information entropy has been widely applied in many fields such as data mining, statistical inference, and image processing.
According to the definition of Shannon’s information entropy, given a discrete random variable $X$ with a set of possible events $x_i$ whose probability of occurrence is $p(x_i)$, $i = 1, 2, \ldots, n$, the information entropy is defined as follows:

$$H(X) = H(x_1, x_2, \ldots, x_n) = -\sum_{i=1}^{n} p(x_i) \log_{10} p(x_i). \tag{1}$$

The base 10 in formula (1) is selected for the logarithm without loss of generality. Specifically, the definition has the following three basic attributes: (1) the information entropy changes continuously with $p(x_i)$; (2) when all events occur with equal probability, i.e., $p(x_i) = 1/n$, $i = 1, 2, \ldots, n$, the information entropy increases monotonically as the total number of events $n$ increases, that is, the more choices there are, the greater the uncertainty involved in the result; (3) when a choice can be broken down into two consecutive choices, the entropy values before and after decomposition should be equal and the uncertainty is the same.

Many studies have investigated important aspects of information measures, including a magnitude-based information measure[64], the partition-independent graph entropy for capturing the information of graphs[65], and the information function based on node degree[30]. One of the salient contributions of Shannon’s information entropy[66] is its extended application in social networks, including hypergraph partitioning[67], community detection[68], data forwarding[69], and influence measurement[70]. Therefore, information entropy techniques can provide an ideal basis for formulating a centrality measurement for social influence in a model-free manner. We now introduce a novel method to assess nodes’ social influence from the perspective of information entropy.
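The entropy in formula (1) and its basic attributes can be checked numerically. The sketch below assumes the usual convention that events with zero probability contribute nothing to the sum; the function name `shannon_entropy` is illustrative.

```python
import math


def shannon_entropy(probs, base=10.0):
    """Entropy of a discrete distribution, -sum p * log_base(p), as in formula (1).

    Assumption: terms with p == 0 are skipped (0 * log 0 is treated as 0).
    """
    probs = list(probs)
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must be non-negative and sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def uniform(n):
    """Equal-probability distribution over n events."""
    return [1.0 / n] * n
```

For a uniform distribution the entropy equals $\log_{10} n$, so it grows with $n$ as stated in attribute (2). Attribute (3) can be verified on Shannon’s classic example, where a choice among probabilities 1/2, 1/3, 1/6 is split into a fair choice followed by a 2/3 versus 1/3 choice.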
4. Model description
The seminal work by Christakis et al.[71] found that effective influence is detectable within the scope of the two-hop local network, i.e., the influence from nodes located beyond the boundary of two-hop neighbors can be omitted. Consequently, we decompose the social network of each individual node into a two-hop sub-network for capturing influence diffusion through the entire network. Now, let us consider an undirected, weighted social network $G(V, E, W)$, where $V$ denotes the finite set of nodes, $E$ is the set of undirected edges, each connecting a pair of nodes, and $W$ is the set of corresponding edge weights. Consistent with the idea from wireless multi-hop networks, where nodes communicate within a limited range[72], an individual in a social network has limited social power and can impose meaningful influence only on others located in its local small world. Motivated by this, if node $i$ is connected directly with node $j$, denoted as $e_{ij} \in E$, we say that node $i$ is a one-hop neighbor of node $j$ and vice versa. Correspondingly, $i$ has a direct influence on its one-hop neighbor $j$, which is represented as $DI_i$. In the analogous case where node $i$ and node $j$ have no direct connection but share a mutual neighbor node $k$, we call node $i$ and node $j$ two-hop neighbors. This means that node $i$ can influence node $j$ through their common neighbor node $k$. For example, the contagion processes of social norms and technological innovation are largely attributed to their mediators[73]. Related examples abound in real networks. Similarly, we define this two-hop-distance influence of node $i$ on node $j$ as indirect influence, denoted as $II_i$. According to the above discussion, we build a practical centrality measurement to evaluate the influence of each node based on the concept of information entropy in the following section.
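The one-hop and two-hop neighborhoods described above can be extracted from local information only. The sketch below assumes a hypothetical adjacency dictionary `adj[i]` holding the neighbors of node `i`; the function name and the toy graph in the usage note are illustrative and are not taken from Figure 1.

```python
def one_and_two_hop(adj, i):
    """Return (one_hop, two_hop) for node i.

    one_hop : set of direct neighbors of i
    two_hop : dict mapping each two-hop neighbor j to the set of mediators k
              such that i - k - j is a path and j is not adjacent to i
    """
    one_hop = set(adj[i])
    two_hop = {}
    for mediator in one_hop:
        for target in adj[mediator]:
            if target != i and target not in one_hop:
                two_hop.setdefault(target, set()).add(mediator)
    return one_hop, two_hop
```

For a toy graph with edges a-b, a-d, b-c, and d-c, node a has one-hop neighbors b and d, and a single two-hop neighbor c that is reachable through both mediators.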
4.1. Entropy-based centrality measurement
To interpret the definition of the entropy-based centrality measurement, we consider the undirected, weighted local network $G(V, E, W)$ mentioned above, whose topology is illustrated in Figure 1. In this schema, each node in the set $V$ represents an individual in a social network, $E$ denotes the set of undirected edges connecting two adjacent individuals, and the weight set $W$ corresponds to the intimacy of the connections through which the influence flows. As shown beside each edge, $w_{ij}$ denotes the weight of the edge connecting node $i$ and node $j$. To quantify the influence of a given individual, we decompose the individual’s influence into two components, direct influence $DI$ and indirect influence $II$, obtained by integrating the connection weight and the confidence level into the information entropy.
To compute the direct influence, we propose a novel definition of entropy-based centrality, which takes into consideration two aspects, the connection intimacy of an individual and the confidence level among neighbors, each focusing on the personal emotion that determines the diffusion of social influence. We believe that a more effective influence indicator is obtained when considering the connection weight, generally referred to as the interaction intimacy. The higher the connection weight for a pair of individuals, the greater the degree of mutual influence between them. Motivated by this idea, the weight influence entropy for individual $i$, denoted as $DI_i^w$, is defined as follows:

$$DI_i^w = -\sum_{j=1}^{N_i} \frac{w_{ij}}{\sum_{k=1}^{N_i} w_{ik}} \cdot \log_{10} \frac{w_{ij}}{\sum_{k=1}^{N_i} w_{ik}} \tag{2}$$

where $N_i$ denotes the total number of neighbors of individual $i$, and $w_{ij}$ indicates the weight of the connection between individual $i$ and its neighbor $j$.
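As an illustration of Eq. (2), the weight influence entropy of a node can be computed from the list of its edge weights. This is a minimal sketch; the function name and the requirement of strictly positive weights are assumptions made here.

```python
import math


def weight_influence_entropy(weights):
    """DI_i^w from Eq. (2): entropy (base 10) of the normalized edge weights of node i.

    weights : the weights w_ij of the edges from i to each of its neighbors
              (assumed strictly positive)
    """
    weights = list(weights)
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("need at least one neighbor and positive weights")
    total = sum(weights)
    return -sum((w / total) * math.log10(w / total) for w in weights)
```

When all $N_i$ weights are equal, the value is $\log_{10} N_i$, which is the maximum for a given number of neighbors; concentrating the weight on a few neighbors lowers it.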
Generally, the probability of an individual i transmitting influence to his or her neighbor
302
j will not always be consistent. Additionally, which neighbor is chosen as a recipient of
303
information depends on how trustworthy individual i think neighbor j. Therefore, considering 11
Journal Pre-proof
e1,4, w1,4
e1,2 , w
V1 3
1, 2
,w
, w 7,8 e 7,8
V7
e7
7, 12
, 12
, 12
,w
V12
,w
V10
, w 10 1
11 ,
1 0, V11 e 1
e11,12 , w11,12
Pr e-
9, 12
,w
7, 11
e7
e9
, 10
8, 10
, 11
,w
V9
V8 e 8
5, 10
e 7,9, w7,9
of
3, 7
6, 7
e5,10 , w
e6,7 , w
e 6,9, w6,9
3,5
V5
e3,7 , w
V6
e3,5 , w
V3
e3,6, w3,6
p ro
e4
, 6
,w e 2,3
w 2 ,5
e1,3 , w1,
V2 3 2,
e 2,5,
4, 6
V4
Figure 1: An undirected weighted social network
304
the heterogeneity of confidence in neighbors, we define the confidence level according to the
305
recipient’s relative social status measured by its degree ratio among all neighbors, where the
306
probability of confidence level Tij is given as follows:
$$T_{ij} = \frac{k_j^{\beta}}{\sum_{l \in N_i} k_l^{\beta}} \qquad (3)$$
where $N_i$ denotes the set of individual $i$'s neighbors, $k_j$ is the degree of recipient $j$, and the tunable parameter $\beta$, called the confidence strength, controls how sensitive the confidence probability $T_{ij}$ is to the recipient's degree $k_j$. The degree $k_j$ reflects the level of node heterogeneity. When $\beta > 0$, individual $i$ tends to influence neighbors with higher degree; when $\beta < 0$, it tends to influence neighbors with lower degree. In the special case $\beta = 0$, individual $i$ influences all neighbors with equal confidence probability.
Based on the above discussion, the confidence influence entropy $DI_i^c$ of individual $i$ is defined as follows:
$$DI_i^c = -\sum_{j=1}^{N_i} T_{ij} \cdot \log_{10} T_{ij} = -\sum_{j=1}^{N_i} \frac{k_j^{\beta}}{\sum_{l \in N_i} k_l^{\beta}} \cdot \log_{10} \frac{k_j^{\beta}}{\sum_{l \in N_i} k_l^{\beta}} \qquad (4)$$
As explained above, the weight influence entropy and the confidence influence entropy are the two components of the direct influence of $i$ on its one-hop neighbors, denoted as $DI_i$. We therefore define the direct influence as the weighted sum of $DI_i^w$ and $DI_i^c$:
$$DI_i = \theta_1 \cdot DI_i^w + \theta_2 \cdot DI_i^c \qquad (5)$$
where $\theta_1$ and $\theta_2$ denote the weight coefficients of $DI_i^w$ and $DI_i^c$, respectively, with $\theta_1 + \theta_2 = 1$.
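As an illustration of Equations (2) to (5), the following minimal Python sketch computes the direct influence of one node from the weights of its incident edges and the degrees of its neighbors. The function names and the list-based inputs are assumptions made for this example and are not taken from the authors' implementation.

```python
import math


def weight_entropy(weights):
    """DI_i^w of Eq. (2): base-10 entropy of the normalized edge weights of node i."""
    total = sum(weights)
    # Zero-weight edges contribute nothing (the usual 0 * log 0 = 0 convention).
    return -sum((w / total) * math.log10(w / total) for w in weights if w > 0)


def confidence_entropy(neighbor_degrees, beta):
    """DI_i^c of Eq. (4): base-10 entropy of T_ij = k_j^beta / sum_l k_l^beta."""
    powered = [k ** beta for k in neighbor_degrees]
    total = sum(powered)
    return -sum((p / total) * math.log10(p / total) for p in powered)


def direct_influence(weights, neighbor_degrees, beta=2.0, theta1=0.5, theta2=0.5):
    """DI_i of Eq. (5); theta1 + theta2 is expected to equal 1."""
    return (theta1 * weight_entropy(weights)
            + theta2 * confidence_entropy(neighbor_degrees, beta))


# Hypothetical node with three neighbours: edge weights and neighbour degrees.
example_di = direct_influence([0.2, 0.5, 0.9], [3, 5, 2])
```

The defaults mirror the parameter values used later in the experiments ($\beta = 2$, $\theta_1 = \theta_2 = 0.5$); every neighbor has degree at least 1, so $k_j^{\beta}$ is well defined for negative $\beta$ as well.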
Consider the case in which individual $k$ is one of the two-hop neighbors of individual $i$, and let $N_{ik}$ denote the number of one-hop neighbors shared by $i$ and $k$. In other words, $N_{ik}$ two-step paths exist between $i$ and $k$. Having quantified the direct influence on one-hop neighbors, a natural question is how to calculate the influence of $i$ on its two-hop neighbor $k$. Motivated by the three-degrees relationship theory proposed in [71], the indirect influence of individual $i$ on its two-hop neighbor $k$, denoted as $II_{ik}$, is given by:
$$II_{ik} = \frac{\sum_{j=1}^{N_{ik}} DI_i \cdot DI_j}{N_{ik}} \qquad (6)$$
where $DI_i$ is the direct influence of $i$, and $DI_j$ is the direct influence of each mutual neighbor $j$ of $i$ and $k$.
Take $N_{ik} = 3$ as an example, which means that there are three paths between individual $i$ and $k$, as shown in Figure 2. The indirect influence of $i$ on $k$ is then:
$$II_{ik} = \frac{DI_i \cdot DI_j + DI_i \cdot DI_l + DI_i \cdot DI_m}{3} \qquad (7)$$

Figure 2: Three paths between $i$ and $k$ (from $V_i$ through $V_j$, $V_l$, and $V_m$ to $V_k$)
Based on the above analysis, the average indirect influence of individual $i$ on all of its two-hop neighbors, denoted as $II_i$, is:
$$II_i = \frac{\sum_{k=1}^{M_i} II_{ik}}{M_i} \qquad (8)$$
where $M_i$ is the total number of individual $i$'s two-hop neighbors. Accordingly, the indirect influence of node $v_1$ in Figure 1 can be expressed as follows:
$$II_1 = \frac{(DI_1 \cdot DI_2 + DI_1 \cdot DI_3)/2 + DI_1 \cdot DI_3 + (DI_1 \cdot DI_4 + DI_1 \cdot DI_3)/2}{3} \qquad (9)$$
Here the three terms in the numerator correspond to the three two-hop neighbors of $v_1$, so $M_1 = 3$. Combining the two components, the total influence of individual $i$, denoted as $I_i$, is:
$$I_i = \varphi_1 \cdot DI_i + \varphi_2 \cdot II_i \qquad (10)$$
where the coefficients $\varphi_1$ and $\varphi_2$ are the weights of the direct and indirect influence, respectively, with $\varphi_1 + \varphi_2 = 1$.
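Equations (6), (8), and (10) can be sketched in a few lines of Python. In this hedged example, `adj` maps each node to the set of its neighbors and `di` holds direct-influence values that are assumed to have been computed already. The small graph is only a fragment chosen to reproduce the structure of Equation (9) around $v_1$, and the `di` values are made up for illustration.

```python
def indirect_influence(adj, di, i):
    """II_i of Eq. (8): mean over two-hop neighbours k of II_ik from Eq. (6)."""
    mutual = {}  # two-hop neighbour k -> list of one-hop neighbours shared with i
    for j in adj[i]:
        for k in adj[j]:
            if k != i and k not in adj[i]:
                mutual.setdefault(k, []).append(j)
    if not mutual:
        return 0.0  # assumption: a node without two-hop neighbours has II_i = 0
    per_k = [sum(di[i] * di[j] for j in shared) / len(shared)
             for shared in mutual.values()]
    return sum(per_k) / len(per_k)


def total_influence(adj, di, i, phi1=0.5, phi2=0.5):
    """I_i of Eq. (10); phi1 + phi2 is expected to equal 1."""
    return phi1 * di[i] + phi2 * indirect_influence(adj, di, i)


# Fragment around node 1: its two-hop neighbours are 5 (via 2 and 3),
# 6 (via 3 and 4) and 7 (via 3), matching the three terms of Eq. (9).
adj = {1: {2, 3, 4}, 2: {1, 3, 5}, 3: {1, 2, 5, 6, 7},
       4: {1, 6}, 5: {2, 3}, 6: {3, 4}, 7: {3}}
di = {1: 0.5, 2: 0.4, 3: 0.6, 4: 0.2, 5: 0.3, 6: 0.3, 7: 0.1}  # made-up values
ii_1 = indirect_influence(adj, di, 1)
```

Grouping the mutual neighbors by two-hop node first keeps the per-path average of Equation (6) separate from the per-node average of Equation (8).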
4.2. Seeding strategies
Using each of the centrality measurements discussed above (betweenness centrality, closeness centrality, degree centrality, eigenvector centrality, and the proposed entropy-based centrality), we rank all nodes in descending order and then compare the simulation results of the single-stage seeding strategy with those of the sequential seeding strategy on the same social network, with identical diffusion-model parameters and an equal number of seed targets. The two seeding strategies unfold in discrete steps as follows:
Single-stage seeding strategy:
(1) At time tick $t_0$, the top $k$ most influential nodes under a given centrality measurement are selected as the initial seed targets and all of them are activated, which means that they can now influence their neighbors. This marks the start of influence diffusion;
(2) At each subsequent time tick $t > t_0$, each node $i$ newly activated in the previous step has one chance to activate each of its inactive direct neighbors $j$ with probability $p_{ij} = 1 - (1 - \gamma)^{w_{ij}}$. Whether or not the activation succeeds, node $i$ makes no further attempts to activate its neighbors in subsequent time ticks;
(3) The influence diffusion process continues without any external support until no more nodes are activated at a certain time tick $T_{SN}$, referred to as the saturation time, at which point the influence coverage is $C_{SN}$.
Sequential seeding strategy:
In the sequential seeding strategy, we use the same number of seed targets as in the single-stage seeding strategy, but activate them over $m$ consecutive seeding rounds (the seeding frequency) interspersed with diffusion stages. Specifically, once the previous diffusion stage stops, another fraction of the seed targets is selected to trigger a new diffusion process, until the total number of selected seed targets equals $k$. The seeds for each seeding round are chosen from the ranking of nodes that are still inactive. Note that the same number of seed targets is selected in every seeding round; uneven sequential seeding rules are left for future work.
(1) At the initial time tick $t_0$, the top $x_1$ ($x_1 < k$) most influential seeds under a given centrality measurement are selected and activated, which triggers the influence diffusion process;
(2) At any time tick $t > t_0$ in the first diffusion stage, each activated node has only one chance to activate each of its inactive neighbors with probability $p_{ij} = 1 - (1 - \gamma)^{w_{ij}}$. Hence, once diffusion stalls, the only way to continue it is to activate the top $x_2$ ($x_1 + x_2 \le k$) ranked nodes that are not yet activated. In each subsequent seeding round, the next activation is suspended until the coverage produced by the current diffusion stage stops increasing, unless all $k$ seed targets have been used up;
(3) Repeat the above procedure; the whole diffusion process ends when the influence coverage reaches its maximum $C_{SQ}$ at the saturation time $T_{SQ}$.
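The two procedures above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' code: `adj` is a hypothetical dict mapping each node to a dict of neighbor weights, `ranking` is a list of nodes already sorted by some centrality, the activation probability is read as $p_{ij} = 1 - (1 - \gamma)^{w_{ij}}$, and $k$ is assumed to be divisible by $m$ so that every round uses the same number of seeds.

```python
import random


def activation_prob(w, gamma):
    """p_ij = 1 - (1 - gamma)^w_ij for an edge of weight w."""
    return 1.0 - (1.0 - gamma) ** w


def diffuse(adj, active, frontier, gamma, rng):
    """Spread from `frontier` until no node is newly activated.

    Mutates `active` and returns the number of ticks used, counting the
    final tick in which nothing new was activated.
    """
    ticks = 0
    while frontier:
        new = set()
        for i in frontier:  # each newly activated node gets exactly one round of attempts
            for j, w in adj[i].items():
                if j not in active and j not in new and rng.random() < activation_prob(w, gamma):
                    new.add(j)
        active |= new
        frontier = new
        ticks += 1
    return ticks


def single_stage(adj, ranking, k, gamma, rng):
    """Activate the top-k nodes at once; return (coverage, ticks)."""
    seeds = set(ranking[:k])
    active = set(seeds)
    ticks = diffuse(adj, active, seeds, gamma, rng)
    return len(active), ticks


def sequential(adj, ranking, k, m, gamma, rng):
    """Activate k // m seeds per round, each round followed by a diffusion stage."""
    active, ticks = set(), 0
    per_round = k // m  # assumption: m divides k
    for _ in range(m):
        # Seeds come from the ranking restricted to nodes that are still inactive.
        batch = [v for v in ranking if v not in active][:per_round]
        if not batch:
            break
        active |= set(batch)
        ticks += diffuse(adj, active, set(batch), gamma, rng)
    return len(active), ticks
```

Passing an explicit `random.Random` instance keeps runs reproducible, which makes it easier to compare the two strategies on the same network.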
In contrast to the single-stage seeding strategy, which consists of a single activation followed by one diffusion process, the sequential seeding strategy is a multistage diffusion process in which each activation is followed by its own diffusion stage. In each sequential seeding round, the new seeds are selected from the nodes that have not yet been activated, which avoids spending seeds on nodes that would be activated anyway through spontaneous diffusion.
5. Evaluation of model performance
In this section, we conduct several experiments and analyze the results. Since the network structure plays a significant role in modeling influence diffusion and in validating the efficiency of the proposed centrality measurement, we run experiments on six networks, two empirical and four classical artificial ones, and compare the proposed centrality with the other centrality measurements. The two empirical social networks are built from data collected on Facebook.com, where the propagation of influence, as in the US presidential campaign, the promotion of innovation policies, and the spread of fashion, is ubiquitous. Because of our limited computational capacity, we select two undirected, unweighted online collegiate networks: the Caltech Facebook Network [74] with 769 nodes and 16656 edges, and the Princeton Facebook Network [75] with 6596 nodes and 293320 edges. The two networks consist of the complete set of users and relationships extracted from the Facebook networks of the California Institute of Technology and Princeton University. Each node corresponds to an individual, and each undirected edge indicates a social connection. The network statistics are calculated using Python's NetworkX package and are displayed in Table 1.
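Two of the statistics reported in Table 1, the density and the average degree, follow directly from the node and edge counts quoted above. The short check below uses the standard formulas for a simple undirected graph; the function names are chosen for this example only.

```python
def density(n_nodes, n_edges):
    """Density of a simple undirected graph: 2E / (N (N - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def average_degree(n_nodes, n_edges):
    """Average degree of an undirected graph: 2E / N."""
    return 2 * n_edges / n_nodes


caltech = (769, 16656)
princeton = (6596, 293320)
stats = {name: (density(*size), average_degree(*size))
         for name, size in (("Caltech", caltech), ("Princeton", princeton))}
```

With these counts the densities round to the values listed in the table, and the integer average degrees in the table appear to be the truncated values of $2E/N$.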
In addition, we verify the performance of the proposed centrality on four classical artificial networks: the BA scale-free network, the regular ring network, the WS small-world network, and the ER random network. They are generated as follows. The BA scale-free network starts from $m_0$ fully connected nodes; at each time step, a new node with $m$ ($m < m_0$) edges is added to the existing network by preferential attachment, i.e., the probability of connecting to an existing node $i$ is proportional to its degree $k_i$. In the simulations, we set $N = 769$ for comparison with the empirical Caltech network and $m = 2$, so the average degree is about 4. The regular ring network starts from a ring of $N$ nodes in which each node is symmetrically connected to its $2m$ nearest neighbors ($m$ on each side), for a total of $Nm$ edges; here we also set $N = 769$ and $m = 2$. The WS small-world network is constructed by starting from a regular ring network with 769 nodes and 4 edges per node and then rewiring each edge with probability $p = 0.3$. Similarly, we generate the ER random network with 769 nodes and average degree $\langle k \rangle = 4$ by rewiring each edge with probability $p = 0.3$. Note that $m$ here is a network-generation parameter and is distinct from the seeding frequency $m$ of Section 4.2. After establishing these network structures, we draw a random number from the uniform distribution on $[0, 1]$ for each edge and use it as the edge weight, for simplicity.
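The regular ring and WS small-world constructions described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' generator: the rewiring rule (keep one endpoint and redirect the other to a uniformly chosen node, avoiding self-loops and duplicate edges) is the common textbook variant and is an assumption here, as are the function names.

```python
import random


def ring_lattice(n, m):
    """Ring of n nodes, each linked to its m nearest neighbours per side (n * m edges)."""
    edges = set()
    for i in range(n):
        for d in range(1, m + 1):
            j = (i + d) % n
            edges.add((min(i, j), max(i, j)))
    return edges


def rewire(edges, n, p, rng):
    """With probability p, redirect one endpoint of each edge to a random node."""
    result = set(edges)
    for u, v in sorted(edges):
        if rng.random() < p:
            # Candidate endpoints: not u itself and not already linked to u.
            candidates = [x for x in range(n)
                          if x != u and (min(u, x), max(u, x)) not in result]
            if candidates:
                new_v = rng.choice(candidates)
                result.remove((u, v))
                result.add((min(u, new_v), max(u, new_v)))
    return result


def assign_weights(edges, rng):
    """Give every edge a weight drawn uniformly from [0, 1)."""
    return {edge: rng.random() for edge in edges}


ring = ring_lattice(769, 2)                      # N = 769, m = 2
small_world = rewire(ring, 769, 0.3, random.Random(1))
weights = assign_weights(small_world, random.Random(2))
```

Because every rewiring removes one edge and adds one new edge, the edge count, and therefore the average degree of 4, is preserved.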
Table 1: Basic information of two real-world social networks
Network    Nodes  Edges   Density  Maximum degree  Average degree  Assortativity  Clustering coefficient  Maximum k-core
Caltech    769    16656   0.0564   248             43              -0.0653        0.4093                  36
Princeton  6596   293320  0.0135   628             88              0.0911         0.2369                  77
5.1. Model comparison
To compare the proposed model in detail with the other centrality measurements mentioned above, under both the single-stage and the sequential seeding strategy, we repeat the numerical simulation 100 times on the same network for each comparison and take the average influence coverage as an indicator of the centrality's effectiveness. In each simulation, under either seeding strategy, we select a total of $k = 10$ top-ranked nodes as seed targets. The results are presented in Figure 3.
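The averaging over repeated runs can be expressed as a small helper. The sketch below assumes a hypothetical callable `simulate(rng)` that performs one stochastic diffusion run and returns its final coverage; using the population standard deviation is likewise an assumption, since the text does not say which estimator is used.

```python
import random
import statistics


def average_coverage(simulate, runs=100, seed=0):
    """Run `simulate` `runs` times and return (mean, standard deviation) of the coverage."""
    rng = random.Random(seed)  # one seeded generator shared by all runs for reproducibility
    results = [simulate(rng) for _ in range(runs)]
    return statistics.mean(results), statistics.pstdev(results)


def toy_run(rng):
    """Stand-in for a diffusion run: 10 seeds plus a random number of extra activations."""
    return 10 + sum(rng.random() < 0.5 for _ in range(20))


mean_cov, std_cov = average_coverage(toy_run)
```

In practice `simulate` would wrap one single-stage or sequential seeding run on a fixed network, so that only the stochastic activations differ between realizations.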
Figure 3 illustrates how the influence coverage of the single-stage and sequential seeding strategies varies over time under different centralities in the six networks, with the parameters set to $\beta = 2$, $\gamma = 0.4$, $\theta_1 = \theta_2 = 0.5$, $\varphi_1 = \varphi_2 = 0.5$, and $m = 5$. The subfigures in the left column correspond to the single-stage seeding strategy and those in the right column to the sequential seeding strategy. Our main purpose is to compare the performance of the seeding strategies under different centralities. As Figure 3 shows, the influence coverage increases with the time step, as expected. For a given centrality measurement, the single-stage seeding strategy spreads faster initially than the sequential seeding strategy, but the latter generally reaches a larger final coverage. There is therefore a trade-off between coverage and speed. The coverage of each centrality varies with the seeding strategy and the network. For example, in the BA scale-free network the influence coverage of the five centralities is almost identical under the single-stage seeding strategy (Figure 3(a)), whereas the entropy-based centrality performs best under the sequential seeding strategy (Figure 3(b)). Further, in the left column, the curves for the ER random network (Figure 3(g)), the Caltech network (Figure 3(i)), and the Princeton network (Figure 3(k)) almost overlap across centralities, which means that the entropy-based centrality performs similarly to the other four classic centralities in those networks and suggests that the structure of the two empirical social networks may be analogous to that of the ER random network. Similarly, for the sequential seeding strategy in the ER random network (Figure 3(h)), the Caltech network (Figure 3(j)), and the Princeton network (Figure 3(l)), several curves of different centralities almost coincide. However, the performance ranking of the five centralities differs among the BA scale-free network (Figure 3(b)), the regular network (Figure 3(d)), and the WS small-world network (Figure 3(f)). This indicates that the network structure has a significant impact on the performance of the sequential seeding strategy. Although eigenvector centrality is significantly more effective than all other centralities in the regular network (Figure 3(d)), it performs worse than closeness centrality and the proposed entropy-based centrality in the BA scale-free network (Figure 3(b)).

Figure 3: The influence coverage of single-stage seeding strategy vs. sequential seeding strategy based on five centralities in different networks
(a) single-stage seeding in BA scale-free network; (b) sequential seeding in BA scale-free network
(c) single seeding in regular network; (d) sequential seeding in regular network
(e) single seeding in WS small-world network; (f) sequential seeding in WS small-world network
(g) single seeding in ER random network; (h) sequential seeding in ER random network
(i) single seeding in Caltech collegiate network; (j) sequential seeding in Caltech collegiate network
(k) single seeding in Princeton collegiate network; (l) sequential seeding in Princeton collegiate network
Among the four typical types of artificial networks above (according to the preceding analysis, the Caltech and Princeton networks can in fact be grouped with the ER random network), the ER random network stands out as an anomaly: performance is consistent across the single-stage and sequential seeding strategies. As shown in Figure 3(g), 3(h), 3(i), 3(j), 3(k), and 3(l), the curves of the five centralities almost overlap for both seeding strategies. One plausible explanation is that the randomness of the network structure makes the five centralities nearly indistinguishable, and that propagation is so rapid that the influence coverage reaches its maximum in the first diffusion stage of the sequential seeding strategy, which then differs little from the single-stage seeding strategy. In addition, Figure 3 shows that, for a given centrality, the sequential seeding strategy attains a larger influence coverage than the single-stage seeding strategy but takes longer to do so. Note that seed targets that would be influenced naturally by their neighbors should not have been selected as initial seeds, which is the waste that sequential seeding avoids. There is thus a trade-off between influence coverage and time consumed.
In summary, in terms of influence coverage in the BA scale-free network (Figure 3(a), 3(b)), the sequential seeding strategy outperforms the single-stage seeding strategy regardless of the centrality used, and the proposed entropy-based centrality performs best. In the regular network (Figure 3(d)), both the diffusion speed and the influence coverage of the sequential seeding strategy are higher with eigenvector centrality than with all other centralities, whose performance is almost the same. However, the proposed entropy-based centrality still attains a slightly higher influence coverage than the others under the single-stage seeding strategy (Figure 3(c)). In the WS small-world network (Figure 3(e), 3(f)), the proposed centrality is inferior in influence coverage to degree centrality, which performs best. The curves in the remaining subfigures (Figure 3(g), 3(h), 3(i), 3(j), 3(k), 3(l)) are roughly the same since the underlying networks are of the same type. As can be observed from the left column for the BA scale-free network (Figure 3(a)), the regular network (Figure 3(c)), and the WS small-world network (Figure 3(e)), although diffusion under the entropy-based centrality accelerates slowly for the single-stage seeding strategy, its final influence coverage reaches a higher level than that of the other centralities. The situation is different for the sequential seeding strategy in the right column (Figure 3(b), 3(d), 3(f)). In the BA scale-free network (Figure 3(b)), for example, the coverage of the entropy-based centrality rises to the maximum value with the highest acceleration, but neither its acceleration nor its coverage stands out in the regular network (Figure 3(d)) or the WS small-world network (Figure 3(f)).
5.2. Parameter analysis
To systematically investigate how the model parameters affect the performance of the sequential seeding strategy, we tune each parameter individually and record the resulting influence coverage. Because the entropy-based centrality performs best under the sequential seeding strategy in the BA scale-free network, we keep using this network as the controlled test platform. In the following simulations, we focus on two model parameters: the confidence strength $\beta$, which reflects an individual's sensitivity to the degree heterogeneity of its neighbors, and the seeding frequency $m$, which determines the trade-off between influence coverage and time cost. All other parameters remain the same: $\gamma = 0.4$, $\theta_1 = \theta_2 = 0.5$, $\varphi_1 = \varphi_2 = 0.5$. The baseline values are $\beta = 2$ and $m = 5$; when we study the effect of one parameter, the other is kept at its baseline value.
Figure 4: The non-monotonic behavior of influence coverage as the confidence strength increases

Figure 4 shows how the influence coverage of the proposed entropy-based centrality measurement varies with the confidence strength $\beta$ in the BA scale-free network under the sequential seeding strategy. The value next to each data point is the corresponding time consumed by the sequential seeding strategy. The confidence strength captures how sensitive a node's confidence in a neighboring node is to that neighbor's degree. Our original purpose was to observe whether the relationship between the confidence strength $\beta$ and the influence coverage is monotonic. As Figure 4 shows, when $\beta$ increases from $-5$ to $5$ in steps of 1, the influence coverage fluctuates noticeably. For instance, on the left part of the curve ($\beta \le 0$), as $\beta$ increases from $-5$ to $0$, the influence coverage first drops sharply from 262 to 243, then fluctuates within a small range, and finally rises quickly from 251 to 278 as $\beta$ goes from $-1$ to $0$. Similar behavior is observed on the right part of the curve. In short, the influence coverage reaches its maximum at $\beta = 0$. This raises a question: what causes this oscillation? Consider the definition of the confidence influence entropy. When $\beta = 0$, the probabilities of a node influencing its different neighbors are all equal, i.e., the distribution of confidence over the neighbors is uniform ($T_{ij} = 1/N_i$ in Equation (3)), so the confidence influence entropy is at its maximum. This also corresponds to the fact that the more elements a system contains, the greater the uncertainty about any one outcome. The results mean that the largest total influence coverage is achieved when an individual trusts all neighbors equally, that is, when node heterogeneity has little effect on the social force.
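The statement about the entropy at $\beta = 0$ can be checked directly from Equations (3) and (4). The short derivation below is a sketch that takes $N_i$ as the number of neighbors of $i$ and relies only on the standard upper bound on Shannon entropy; it concerns the entropy value alone, not the simulated coverage.

```latex
% With beta = 0 every neighbour receives the same confidence probability.
T_{ij}\big|_{\beta=0} = \frac{k_j^{0}}{\sum_{l \in N_i} k_l^{0}} = \frac{1}{N_i},
\qquad
DI_i^{c}\big|_{\beta=0} = -\sum_{j=1}^{N_i} \frac{1}{N_i}\,\log_{10}\frac{1}{N_i} = \log_{10} N_i .

% For any beta the entropy cannot exceed this value.
DI_i^{c} = -\sum_{j=1}^{N_i} T_{ij}\,\log_{10} T_{ij} \;\le\; \log_{10} N_i ,
\quad \text{with equality if and only if } T_{ij} = \frac{1}{N_i} \text{ for all } j .
```

The bound also grows with $N_i$, which matches the remark that a system with more elements carries more uncertainty about any single outcome.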
Figure 5: Results from sequential seeding strategies with four kinds of seeding frequency in the BA scale-free network
Figure 5 presents the influence coverage of the proposed entropy-based centrality measurement over time $t$ for different seeding frequencies $m$ in the BA scale-free network. Labeling and magnifying the beginning of the curves in Figure 5 makes the four seeding strategies with different seeding frequencies visually distinguishable. In the numerical simulations, each data point is the average of 100 realizations on the generated BA scale-free network. The statistics of the simulation results are given in Figure 6. Because the network structure is static, the standard deviation of each simulation result is very small relative to the network scale, which supports the robustness of the simulation results. As in Figure 3, the parameters are set to $\beta = 2$, $\gamma = 0.4$, $\theta_1 = \theta_2 = 0.5$, $\varphi_1 = \varphi_2 = 0.5$, and the total number of seeds is again 10. It is worth noting that the curve marked by red dots for seeding frequency $m = 2$, in which the sequential seeding strategy is performed in two stages with five seed targets used in each stage, behaves differently from the others. Considering the scale-free nature of the BA network, we speculate that the network structure causes this unexpected phenomenon. Generally speaking, the BA scale-free network contains a small number of nodes with extremely large degree and a vast majority of nodes with very small degree. The nodes ranked by entropy-based centrality have similar characteristics: a small number of nodes have significant influence, and most nodes have ordinary influence. In the critical case $m = 2$, some highly influential nodes are activated naturally during diffusion, so subsequent seed targets can only be selected from ordinary nodes, which slows the spread.

Figure 6: Standard deviation of the 100 realizations on the generated BA scale-free network in Figure 5
As the results show, the influence coverage increases as the seeding frequency $m$ increases, whereas the acceleration of influence diffusion decreases. In other words, there is a trade-off between the coverage and the acceleration of influence diffusion. These results have several managerial implications. A company that wants to capture market share quickly should distribute its product samples to individuals in as short a time as possible. Conversely, a company that cares less about how fast it gains market share and wants a market-scale advantage should distribute samples in as many stages as possible. In other words, under the same marketing budget, the more sample-distribution actions a company carries out, the greater the market share it ultimately obtains.
6. Conclusion
In this paper, we present an improved centrality measurement built on the concept of entropy, which takes into account the connection weight and the confidence level. We propose a new method to quantify the influence entropy of each individual, comprising the weight influence entropy and the confidence influence entropy, and then design seeding strategies. Using a realistic and widely used diffusion model to capture the influence diffusion process, we focus on the performance of the sequential seeding strategy compared with that of the single-stage seeding strategy. First, we compare the influence coverage of several types of centrality measurements under the single-stage seeding strategy and the sequential seeding strategy, and the experimental results show that the proposed entropy-based centrality is superior to the other centralities in terms of diffusion speed and influence coverage in the BA scale-free network. Then, based on the proposed entropy-based centrality, we investigate the effect of the confidence strength $\beta$ on the influence coverage under the sequential seeding strategy in the BA scale-free network; the parameter analysis shows that the proposed centrality achieves the greatest total influence coverage when the individual's confidence in each neighbor is equal. Finally, we observe the performance of sequential seeding strategies with entropy-based centrality and relate the influence coverage to the seeding frequency $m$. It is worth emphasizing that a critical situation arises from the specific structure of the BA scale-free network. By examining the topology of the social network in depth, we can better choose a proper seeding strategy to obtain the best diffusion effect.
As for future work, more sequential seeding strategies based on entropy-based centrality will be investigated. For example, finding an appropriate compromise between sequential seeding strategies and the cost of diffusion is a significant issue. Moreover, seeding strategies on dynamic social networks are a promising research direction. More broadly, we hope that our findings can inspire the study of further properties of seeding strategies and can be extended to real-world networks.
Acknowledgment
The authors are grateful to the anonymous reviewers and the editor for their valuable comments and suggestions. This research is supported by National Science Foundation of China (No. 71672065).
References
[1] M. E. J. Newman, Spread of epidemic disease on networks, Physical Review E 66 (2002) 016128.
[2] D. Zinoviev, V. Duong, H. Zhang, A game theoretical approach to modeling information dissemination in social networks, arXiv preprint arXiv:1006.5493 (2010).
[3] C. Mast, S. Huck, A. Zerfass, Innovation communication, Innovation Journalism 2 (2005) 165.
[4] W. Chen, Y. Wang, S. Yang, Efficient influence maximization in social networks, in: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2009, pp. 199–208.
[5] Y. Chen, J. Xie, Online consumer review: word-of-mouth as a new element of marketing communication mix, Management Science 54 (2008) 477–491.
[6] V. Mahajan, E. Muller, R. A. Kerin, Introduction strategy for new products with positive and negative word-of-mouth, Management Science 30 (1984) 1389–1404.
[7] E. D’Andrea, P. Ducange, A. Bechini, A. Renda, F. Marcelloni, Monitoring the public opinion about the vaccination topic from tweets analysis, Expert Systems with Applications 116 (2019) 209–226.
[8] Z. Zhao, H. Lu, D. Cai, X. He, Y. Zhuang, User preference learning for online social recommendation, IEEE Transactions on Knowledge and Data Engineering 28 (2016) 2522–2534.
[9] J. Lim, T. Li, The optimal advertising-allocation rules for sequentially released products: the case of the motion picture industry, Journal of Advertising Research 58 (2018) 228–239.
[10] M. Rafiei, A. A. Kardan, A novel method for expert finding in online communities based
615
on concept map and pagerank, Human-centric computing and information sciences 5
616
(2015) 10.
618
[11] U. Brandes, S. P. Borgatti, L. C. Freeman, Maintaining the duality of closeness and betweenness centrality, Social Networks 44 (2016) 153–159.
of
617
[12] J. Zhao, T.-H. Yang, Y. Huang, P. Holme, Ranking candidate disease genes from gene
620
expression and protein interaction: a katz-centrality based approach, PloS one 6 (2011)
621
e24306.
622
623
p ro
619
[13] P. Hage, F. Harary, Eccentricity and centrality in networks, Social networks 17 (1995) 57–63.
[14] I. Poulakakis, G. F. Young, L. Scardovi, N. E. Leonard, Information centrality and
625
ordering of nodes for accuracy in noisy decision-making networks, IEEE Transactions on
626
Automatic Control 61 (2015) 1040–1045.
Pr e-
624
[15] D. Chen, L. Lü, M.-S. Shang, Y.-C. Zhang, T. Zhou, Identifying influential nodes in
628
complex networks, Physica a: Statistical mechanics and its applications 391 (2012)
629
1777–1787.
al
627
[16] C. Gao, D. Wei, Y. Hu, S. Mahadevan, Y. Deng, A modified evidential methodology of
631
identifying influential nodes in weighted networks, Physica A: Statistical Mechanics and
632
its Applications 392 (2013) 5490–5500.
634
635
636
[17] T. Petermann, P. De Los Rios, Role of clustering and gridlike ordering in epidemic spreading, Physical Review E 69 (2004) 066116. [18] D.-B. Chen, H. Gao, L. Lü, T. Zhou, Identifying influential nodes in large-scale directed
Jo
633
urn
630
networks: the role of clustering, PloS one 8 (2013) e77455.
637
[19] M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, H. A. Makse,
638
Identification of influential spreaders in complex networks, Nature physics 6 (2010) 888.
639
[20] A. Zeng, C.-J. Zhang, Ranking spreaders by decomposing complex networks, Physics
640
Letters A 377 (2013) 1031–1035. 27
Journal Pre-proof
641
642
643
644
[21] S. Pei, L. Muchnik, J. S. Andrade Jr, Z. Zheng, H. A. Makse, Searching for superspreaders of information in real-world social media, Scientific reports 4 (2014) 5547.
[22] J.-G. Liu, Z.-M. Ren, Q. Guo, Ranking the spreading influence in complex networks, Physica A: Statistical Mechanics and its Applications 392 (2013) 4154–4159.
[23] Q. Hu, Y. Gao, P. Ma, Y. Yin, Y. Zhang, C. Xing, A new approach to identify influential spreaders in complex networks, in: International Conference on Web-Age Information Management, Springer, 2013, pp. 99–104.
[24] B. Min, F. Liljeros, H. A. Makse, Finding influential spreaders from human activity beyond network location, PloS one 10 (2015) e0136831.
[25] Y. Liu, M. Tang, T. Zhou, Y. Do, Improving the accuracy of the k-shell method by removing redundant links: from a perspective of spreading dynamics, Scientific reports 5 (2015) 13172.
[26] J. E. Hirsch, An index to quantify an individual’s scientific research output, Proceedings of the National academy of Sciences 102 (2005) 16569–16572.
[27] S. Cao, M. Dehmer, Y. Shi, Extremality of degree-based graph entropies, Information Sciences 278 (2014) 22–33.
[28] Z. Chen, M. Dehmer, Y. Shi, Bounds for degree-based network entropies, Applied Mathematics and Computation 265 (2015) 983–993.
[29] A. G. Nikolaev, R. Razib, A. Kucheriya, On efficient use of entropy centrality for social network analysis and community detection, Social Networks 40 (2015) 154–162.
[30] S. Cao, M. Dehmer, Degree-based entropies of networks revisited, Applied Mathematics and Computation 261 (2015) 141–147.
[31] T. Nie, Z. Guo, K. Zhao, Z.-M. Lu, Using mapping entropy to identify node centrality in complex networks, Physica A: Statistical Mechanics and its Applications 453 (2016) 290–297.
[32] L. Fei, Y. Deng, A new method to identify influential nodes based on relative entropy, Chaos, Solitons & Fractals 104 (2017) 257–267.
[33] S. Peng, A. Yang, L. Cao, S. Yu, D. Xie, Social influence modeling using information theory in mobile social networks, Information Sciences 379 (2017) 146–159.
[34] T. Qiao, W. Shan, C. Zhou, How to identify the most powerful node in complex networks? a novel entropy centrality approach, Entropy 19 (2017) 614.
[35] Y. Ni, L. Xie, Z.-Q. Liu, Minimizing the expected complete influence time of a social network, Information Sciences 180 (2010) 2514–2527.
[36] D. Kempe, J. Kleinberg, É. Tardos, Maximizing the spread of influence through a social network, in: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2003, pp. 137–146.
[37] J. Jankowski, P. Bródka, P. Kazienko, B. K. Szymanski, R. Michalski, T. Kajdanowicz, Balancing speed and coverage by sequential seeding in complex networks, Scientific reports 7 (2017) 891.
[38] O. Hinz, B. Skiera, C. Barrot, J. U. Becker, Seeding strategies for viral marketing: an empirical comparison, Journal of Marketing 75 (2011) 55–71.
[39] Y. Ni, Sequential seeding to optimize influence diffusion in a social network, Applied Soft Computing 56 (2017) 730–737.
[40] J. Jankowski, B. K. Szymanski, P. Kazienko, R. Michalski, P. Bródka, Probing limits of information spread with sequential seeding, Scientific reports 8 (2018) 13996.
[41] S. P. Borgatti, Centrality and network flow, Social networks 27 (2005) 55–71.
[42] Q. Liu, T. Hong, Sequential seeding for spreading in complex networks: influence of the network topology, Physica A: Statistical Mechanics and its Applications 508 (2018) 10–17.
[43] P. Bonacich, Factoring and weighting approaches to status scores and clique identification, Journal of mathematical sociology 2 (1972) 113–120.
[44] T. L. Griffiths, M. Steyvers, A. Firl, Google and the mind: Predicting fluency with pagerank, Psychological Science 18 (2007) 1069–1076.
[45] A. Banerjee, A. G. Chandrasekhar, E. Duflo, M. O. Jackson, The diffusion of microfinance, Science 341 (2013) 1236498.
[46] K. Stephenson, M. Zelen, Rethinking centrality: Methods and examples, Social networks 11 (1989) 1–37.
[47] K. L. Calvert, E. W. Zegura, M. J. Donahoo, Core selection methods for multicast routing, in: Proceedings of Fourth International Conference on Computer Communications and Networks-IC3N’95, IEEE, 1995, pp. 638–642.
[48] H. Q. Cheng, Y. Y. Shen, M. P. Fei, G. Yang, Y. Zhang, X. C. Xiao, A new approach to identify influential spreaders in complex networks, Acta Phys. Sin. 62 (2013) 140101.
[49] P. Basaras, D. Katsaros, L. Tassiulas, Detecting influential spreaders in complex, dynamic networks, Computer 46 (2013) 24–29.
[50] L. Lü, T. Zhou, Q.-M. Zhang, H. E. Stanley, The h-index of a network node and its relation to degree and coreness, Nature communications 7 (2016) 10168.
[51] C. E. Shannon, A mathematical theory of communication, Bell system technical journal 27 (1948) 379–423.
[52] T. Qiao, W. Shan, G. Yu, C. Liu, A novel entropy-based centrality approach for identifying vital nodes in weighted networks, Entropy 20 (2018) 261.
[53] F. Tutzauer, Entropy as a measure of centrality in networks characterized by path-transfer flow, Social networks 29 (2007) 249–265.
[54] L. Seeman, Y. Singer, Adaptive seeding in social networks, in: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, IEEE, 2013, pp. 459–468.
[55] A. Sela, I. Ben-Gal, A. S. Pentland, E. Shmueli, Improving information spread through a scheduled seeding approach, in: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, ACM, 2015, pp. 629–632.
[56] G. Tong, W. Wu, S. Tang, D.-Z. Du, Adaptive influence maximization in dynamic social networks, IEEE/ACM Transactions on Networking (TON) 25 (2017) 112–125.
[57] J. Jankowski, P. Bródka, R. Michalski, P. Kazienko, Seeds buffering for information spreading processes, in: International Conference on Social Informatics, Springer, 2017, pp. 628–641.
[58] F. Chierichetti, J. Kleinberg, A. Panconesi, How to schedule a cascade in an arbitrary graph, SIAM Journal on Computing 43 (2014) 1906–1920.
[59] S. Lin, Q. Hu, F. Wang, P. S. Yu, Steering information diffusion dynamically against user attention limitation, in: 2014 IEEE International Conference on Data Mining, IEEE, 2014, pp. 330–339.
[60] D. Goldenberg, A. Sela, E. Shmueli, Timing matters: influence maximization in social networks through scheduled seeding, IEEE Transactions on Computational Social Systems 5 (2018) 621–638.
[61] D. Kempe, J. Kleinberg, É. Tardos, Influential nodes in a diffusion model for social networks, in: International Colloquium on Automata, Languages, and Programming, Springer, 2005, pp. 1127–1138.
[62] Y. Gang, Z. Tao, W. Jie, F. Zhong-Qian, W. Bing-Hong, Epidemic spread in weighted scale-free networks, Chinese Physics Letters 22 (2005) 510.
[63] W. Wang, M. Tang, H.-F. Zhang, H. Gao, Y. Do, Z.-H. Liu, Epidemic spreading on complex networks with general degree and weight distributions, Physical Review E 90 (2014) 042803.
[64] D. Bonchev, N. Trinajstić, Information theory, distance matrix, and molecular branching, The Journal of Chemical Physics 67 (1977) 4517–4533.
[65] M. Dehmer, Information processing in complex networks: graph entropy and information functionals, Applied Mathematics and Computation 201 (2008) 82–94.
[66] J. L. Proops, Entropy, information and confusion in the social sciences, Journal of Interdisciplinary Economics 1 (1987) 225–242.
[67] W. Yang, G. Wang, M. Z. A. Bhuiyan, K.-K. R. Choo, Hypergraph partitioning for social networks based on information entropy modularity, Journal of Network and Computer Applications 86 (2017) 59–71.
[68] J. D. Cruz, C. Bothorel, F. Poulet, Entropy based community detection in augmented social networks, in: 2011 International Conference on computational aspects of social networks (CASoN), IEEE, 2011, pp. 163–168.
[69] P. Yuan, H. Ma, H. Fu, Hotspot-entropy based data forwarding in opportunistic social networks, Pervasive and Mobile Computing 16 (2015) 136–154.
[70] S. He, X. Zheng, D. Zeng, K. Cui, Z. Zhang, C. Luo, Identifying peer influence in online social networks using transfer entropy, in: Pacific-Asia Workshop on Intelligence and Security Informatics, Springer, 2013, pp. 47–61.
[71] N. A. Christakis, J. H. Fowler, Social contagion theory: examining dynamic social networks and human behavior, Statistics in medicine 32 (2013) 556–577.
[72] B. Latré, B. Braem, I. Moerman, C. Blondia, P. Demeester, A survey on wireless body area networks, Wireless Networks 17 (2011) 1–18.
[73] B. Wejnert, Integrating models of diffusion of innovations: A conceptual framework, Annual review of sociology 28 (2002) 297–326.
[74] A. L. Traud, E. D. Kelsic, P. J. Mucha, M. A. Porter, Comparing community structure to characteristics in online collegiate social networks, SIAM review 53 (2011) 526–543.
[75] C. L. Staudt, Y. Marrakchi, H. Meyerhenke, Detecting communities around seed nodes in complex networks, in: 2014 IEEE International Conference on Big Data (Big Data), IEEE, 2014, pp. 62–69.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: