Community structure enhanced cascade prediction


Neurocomputing 359 (2019) 276–284

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Chaochao Liu a,b, Wenjun Wang a,b, Yueheng Sun a,b,∗

a College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
b Tianjin Key Laboratory of Advanced Networking (TANK), Tianjin 300350, China

Article info

Article history: Received 7 December 2018; Revised 5 May 2019; Accepted 7 May 2019; Available online 7 June 2019
Communicated by Zidong Wang
Keywords: Cascade prediction; Deep learning; Community influences; Recurrent neural network; Cascade behavior

Abstract

Cascade prediction is a popular problem with applications in viral marketing, trending topic detection, network supervision, and so on. Conventional methods depend heavily on assumptions about the diffusion model or on hand-crafted rules, which makes them difficult to generalize to other domains. Recently, researchers have turned to deep learning methods to circumvent these problems. However, although community structure has an important impact on cascade behavior, almost no researchers take it into account in cascade prediction tasks. In this paper, we propose a deep learning framework that embeds community structure (named CS-RNN) to enhance cascade prediction by capturing community influence in cascades. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed model outperforms state-of-the-art models in predicting the next activated nodes and their community structure labels. A parameter sensitivity analysis further shows the robustness of the proposed method.

∗ Corresponding author at: College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.
https://doi.org/10.1016/j.neucom.2019.05.069
0925-2312/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Nowadays, social media such as Twitter, Facebook, and Weibo attract millions of users. Posting and reposting propagate information from one user to another, and together these behaviors make up cascades of information resharing. Many studies model and predict such cascades to support viral marketing campaigns [1], trending topic detection [2], and information summarization [3] in various social networks. Indeed, there is an extensive literature on information cascade prediction in social networks. Early research in this area derives from epidemiology; the best-known examples are the independent cascade (IC) models [4,5] and the linear threshold (LT) models [6], which attempt to predict and understand the dynamics of observed propagation. Subsequent work focused on prediction tasks in social networks [7–13], using manually designed features that take both contextual and social information into account. More recent research [14–18] shows that models built on manually designed features are not robust across different kinds of datasets, and has therefore turned to deep learning frameworks to model information cascades.

Deep learning frameworks have indeed been used successfully in time-series prediction problems. Bahdanau et al. proposed a model based on a bidirectional recurrent neural network that predicts the next word given the previous words of a sentence [19]. Vinyals et al. presented a generative model based on a deep recurrent architecture that generates natural sentences describing an image [20]. However, the cascade prediction task involves not only time-series data but also the structural information of the spreading nodes. Recently, researchers have found that community structure [21] has a remarkable effect on spreading dynamics on networks. Liu et al. found that community structure can facilitate biological spreading on static networks [22]. Ahn et al. found that there is an optimal community strength that greatly promotes social contagion [23]. However, these studies did not bring community structure information into the prediction of information resharing cascades.

In this paper, we use a deep learning method to model and predict cascades of information resharing, taking community structure information into consideration. We propose a new deep learning algorithm based on a recurrent neural network (RNN) that considers community structure information to predict cascades of information resharing, named CS-RNN. In the proposed method, community structure label embedding vectors are included in the cascade representation. Thus, our method considers not only the historical sequential state of activated nodes but also the communities through which information is transmitted during cascade behavior modeling. We also present a community structure label prediction layer, so that the method can predict both the next activated nodes and their community structure labels. We apply the proposed method to synthetic datasets built on four networks and to the real-world Digg dataset, and compare it with three baseline methods on the prediction of next activated nodes and their community structure labels. We also analyze the sensitivity of the model to the dimension of the community structure embedding vector to test its robustness. The main contributions of the CS-RNN model are as follows:

• The model incorporates community structure information, which makes it better suited than other RNN models to social tasks.
• CS-RNN contains a community structure label prediction layer, which enables the model to predict the community structure labels of the next activated nodes.
• The model is robust to the dimension of the community structure embedding vector.
• CS-RNN performs better than the other compared algorithms.

2. Related work

In this section, we present related concepts and work, focusing on cascade behavior modeling and on the effect of community structure on spreading behavior. We then give our view of existing cascade prediction methods and their connection to community structure; these observations motivate the work of this paper.

Information spreading has been modeled from many different perspectives, including social influence [24,25], textual features [26], social features [9,27,28], history information [29], visual features [10], and combinations of different features [30]. Recent studies also demonstrate that community structure plays an important role in information spreading. Weng et al. [31] found that features based on community structure are the most powerful predictors of the future popularity of a meme given its early spreading patterns. Nematzadeh et al. [23] investigated the impact of community structure on information diffusion using the linear threshold model and found that strong communities can facilitate global diffusion by enhancing local, intra-community spreading. Wu et al. [32] investigated the impact of multi-community structure on information diffusion with the linear threshold model and found that, within an appropriate range, multi-community structure facilitates information diffusion. However, few studies consider the influence of community structure in cascade prediction models. A variety of models have also been applied to information spreading modeling, including Matchbox [27,33], multiple additive regression trees (MART) [34], maximum entropy classifiers [35], autoregressive moving average (ARMA) models [36], factor graph models [7], and conditional random fields [8].
However, these approaches rely heavily on the knowledge of domain experts or on assumptions about the diffusion model, e.g., the independent cascade model and the linear threshold model, so they are usually difficult to generalize to other domains. Inspired by the success of deep neural networks in a variety of applications, researchers have recently tried to circumvent these problems by using recurrent neural networks (RNNs) to learn feature representations for cascade prediction, without needing to know the underlying diffusion model. Xiao et al. [16] proposed a composite neural network with two RNNs that interprets the conditional intensity function of a point process as a nonlinear mapping. Zhang et al. [14] converted the tweet content, the user interests, the similarity between the tweet and the user interests, user information, and author information into embeddings, and designed an attention mechanism to encode the user's interests in order to predict whether a user will retweet a tweet. Wang et al. [15] showed that each cascade generally corresponds to a diffusion tree, which causes cross-dependence in the cascade, and proposed an attention-based RNN to capture it. Liu et al. [17] assigned each user a latent influence vector and a susceptibility vector and proposed an influence model of cascade dynamics. Qiu et al. [18] designed an end-to-end framework that learns feature representations to predict social influence. In this paper, we also use deep learning to address the cascade prediction task.

3. Proposed method

3.1. Problem definition

The main problem addressed in this paper is to predict the next activated nodes, their community labels, and the times at which they are activated. We first introduce some definitions used throughout the paper.

Definition 1. Network. A network is denoted $G = (V, E)$, where $V$ is a set of nodes and $E$ is a set of edges between the nodes.

Definition 2. Cascade. A cascade $S = \{(t_i, v_i) \mid v_i \in V,\ t_i \in [0, +\infty),\ t_i \le t_{i+1},\ i = 1, 2, \dots, N\}$ is the sequence of information spreading behaviors starting from an original post, in ascending order of time, where $N$ is the number of spreading behaviors. The $i$-th behavior, recorded as $(t_i, v_i)$, means that node $v_i$ reshares the information at time $t_i$.

Definition 3. Community. Since our data contain community structure information, we introduce the definition of a community in a network.
Given a graph $G(V, E)$, a community (or cluster, or cohesive subgroup) is a subgraph $G(V', E')$ whose nodes are tightly connected [37]. Suppose a graph $G(V, E)$ can be partitioned into $q$ communities $L = \{L_1, L_2, \dots, L_q\}$, where each node $v_i$ belongs to a unique community $L_l$ ($l = 1, 2, \dots, q$); its community label is denoted $c_{v_i}$.

To state the problem formally, suppose that a collection of $F$ cascades $\mathcal{Q} = \{S^f\}_{f=1}^{F}$ is observed. An observed cascade, up to its $k$-th spreading behavior, is a list of time and node pairs denoted $S_{\le k}$, and the community labels $c_{v_i}$ of the involved nodes are also known. The objective of sequence modeling in cascade prediction, which is the task of this paper, is to predict the probability of the next activated node, $p(v_{k+1} \mid S_{\le k})$, the probability of its community label, $p(c_{v_{k+1}} \mid S_{\le k})$, and its exact activation time $t_{v_{k+1}}$.

3.2. Model framework

We propose an RNN-based deep learning model that contains community structure information, named CS-RNN, to solve the cascade prediction problem. The rationale of our model is that the activation of the next node is related not only to the historical sequential state of the activated nodes but also to the communities through which the information is transmitted. Based on this idea, we propose a new deep learning framework for cascade behavior modeling that incorporates community structure information. The framework is shown in Fig. 1.
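The objects defined in Section 3.1 can be made concrete with a small Python sketch: a cascade is stored as time-ordered $(t_i, v_i)$ events that also carry the community label $c_{v_i}$, split into the (prefix $S_{\le k}$, next event) pairs a sequence model is trained on, together with the inter-event durations $t_k - t_{k-1}$ that the model uses as temporal features. The `Event` class and helper names are hypothetical, and giving the first event a duration of 0 is an assumption; the paper does not say how the first step is handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One resharing behaviour (t_i, v_i) together with the node's community label."""
    t: float    # activation time t_i
    node: int   # node id v_i
    comm: int   # community label c_{v_i}


def training_pairs(cascade: List[Event]) -> List[Tuple[List[Event], Event]]:
    """Turn one cascade into (prefix S_{<=k}, next event) pairs for k = 1 .. N-1."""
    events = sorted(cascade, key=lambda e: e.t)   # enforce t_i <= t_{i+1}
    return [(events[:k], events[k]) for k in range(1, len(events))]


def inter_event_durations(events: List[Event]) -> List[float]:
    """t_k - t_{k-1} for each event; the first event gets 0.0 (an assumption)."""
    times = [e.t for e in events]
    return [0.0] + [b - a for a, b in zip(times, times[1:])]


# A toy cascade with hypothetical node ids and community labels.
cascade = [Event(1.75, 2, 0), Event(0.0, 3, 1), Event(0.5, 7, 1)]
pairs = training_pairs(cascade)
prefix, target = pairs[-1]
print([e.node for e in prefix], "->", target.node)   # [3, 7] -> 2
print(inter_event_durations(prefix + [target]))      # [0.0, 0.5, 1.25]
```

Sorting inside `training_pairs` enforces the ordering condition $t_i \le t_{i+1}$ from Definition 2, so the pairs stay valid even if raw logs arrive out of order.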


Fig. 1. Our deep learning network can be trained end-to-end. Nodes are converted to low-dimensional vectors by the node embedding layer and then fed into an RNN, which yields the hidden state vectors $h_1^{(0)}, h_2^{(0)}, \dots, h_h^{(0)}$. Community structure labels are converted to low-dimensional vectors $c_{v_1}, c_{v_2}, \dots, c_{v_h}$ by an embedding layer. Time series are extracted as inter-event durations and fed into another RNN, which yields the hidden state vectors $h_1^{(2)}, h_2^{(2)}, \dots, h_h^{(2)}$. The three types of vectors are concatenated and fed into linear and activation layers to obtain the cascade representation. Three prediction layers then output the predicted community structure label, the activated node, and the associated timestamp. Cross-entropy and squared loss are used for event type and timestamp prediction, respectively.

Input layer. At the $k$-th step of a cascade, node $v_k$ is first converted into a low-dimensional vector $\mathbf{v}_k \in \mathbb{R}^{d_v}$ by an embedding layer with weight matrix $W_{emv}$, which gives a more compact and efficient representation:

$$\mathbf{v}_k = W_{emv}^{T} v_k \tag{1}$$

where $d_v$ is the dimension of the embedding vector. To model the influence of community structure, the community label $c_{v_k}$ of node $v_k$ is likewise converted into a low-dimensional vector $\mathbf{c}_{v_k} \in \mathbb{R}^{d_c}$ by an embedding layer with weight matrix $W_{emc}$:

$$\mathbf{c}_{v_k} = W_{emc}^{T} c_{v_k} \tag{2}$$

where $d_c$ is the dimension of the embedding vector. In addition, for the timing input we use the inter-event duration $t_k - t_{k-1}$ as the temporal feature $\mathbf{t}_k$.

Hidden layer. At each step $k$, the model transforms the node embedding vector into a hidden representation of direct user influence with a recurrent neural network, giving the hidden state

$$h_k^{(0)} = \mathrm{RNN}\big(\mathbf{v}_k, h_{k-1}^{(0)}\big) \tag{3}$$

where $h_{k-1}^{(0)}$ is the previous hidden state, $h_0^{(0)}$ is initialized as an all-zero vector, and RNN denotes a gated recurrent unit (GRU) layer or a long short-term memory (LSTM) layer. The model then combines the community structure influence with the hidden direct user influence:

$$h_k^{(1)} = h_k^{(0)} \oplus \mathbf{c}_{v_k} \tag{4}$$

where $\oplus$ denotes concatenation. The model also transforms the temporal features $\mathbf{t}_k$ into hidden representations with a recurrent neural network:

$$h_k^{(2)} = \mathrm{RNN}\big(\mathbf{t}_k, h_{k-1}^{(2)}\big) \tag{5}$$

where $h_{k-1}^{(2)}$ is the previous hidden state of the time sequence and $h_0^{(2)}$ is initialized as an all-zero vector. Finally, we concatenate the hidden states $h_k^{(1)}$ and $h_k^{(2)}$ and apply a linear layer followed by an activation layer to generate the hidden representation of the cascade at step $k$:

$$h_k^{cas} = \delta\big(W_h^{T}\big(h_k^{(1)} \oplus h_k^{(2)}\big) + b_h\big) \tag{6}$$

where $W_h$ is the weight, $b_h$ is the bias, and $\delta(\cdot)$ is the activation function; in this paper we use the PReLU activation function.

Community structure label generation. Given the learned hidden representation $h_k^{cas}$, we add a linear layer and an activation layer to project it into the same space as the community structure label embeddings, as shown in Eq. (7):

$$h_k^{com} = \delta\big(W_{com}^{T} h_k^{cas} + b_{com}\big) \tag{7}$$

where $W_{com}$ is the weight and $b_{com}$ is the bias. Then, we calculate the cosine similarities between the hidden vector $h_k^{com}$ and the embedding vectors of all community labels, and use a softmax layer to generate the probability distribution of the next infected node's community label:

$$p_k^{com} = \mathrm{sigmoid}\big(h_k^{com} W_{emc}^{T}\big) \tag{8}$$

where $p_k^{com} \in \mathbb{R}^{C}$ and $C$ is the number of community structure labels.

Next activated node generation. The model combines the probability distribution of the next infected node's community label with the hidden representation of the cascade, and then applies a linear layer and an activation layer to project the result into the same space as the node embeddings, as shown in Eq. (9):

$$h_k^{node} = \delta\big(W_{node}^{T}\big(h_k^{cas} \oplus p_k^{com}\big) + b_{node}\big) \tag{9}$$

where $W_{node}$ is the weight and $b_{node}$ is the bias. Finally, we calculate the cosine similarities between the hidden vector $h_k^{node}$ and the embedding vectors of all nodes, and use a softmax layer to generate the probability distribution of the next infected node:

$$p_k^{node} = \mathrm{sigmoid}\big(h_k^{node} W_{emv}^{T}\big) \tag{10}$$

where $p_k^{node} \in \mathbb{R}^{N}$ and $N$ is the number of nodes.

Next activated time generation. Based on the hidden representation of the cascade at step $k$, we generate the inter-event duration between steps $k$ and $k+1$ by adding a linear layer, following Eq. (11):

$$t_{k+1} - t_k = W_t^{T} h_k^{cas} + b_t \tag{11}$$

where $W_t$ is the weight and $b_t$ is the bias.
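As a rough illustration of Eqs. (1)–(11), the NumPy sketch below runs a CS-RNN-style forward pass over one cascade and scores it with the two kinds of loss named in Fig. 1. Everything concrete here is an assumption rather than the authors' code: the layer sizes, the random initialization, a bias-free GRU cell standing in for the paper's GRU/LSTM layer, a fixed PReLU slope, and equal weighting of the loss terms. Embedding lookups are written as row selections, which is what $W_{emv}^{T} v_k$ computes if $v_k$ is a one-hot vector (the encoding is not stated in the text). The scores are plain dot products as in Eqs. (8) and (10); since the sigmoid in those equations is monotone, ranking by raw score gives the same order, and the sketch normalizes the raw scores with a log-softmax inside the cross-entropy.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes (assumptions, not values from the paper)
N, C = 10, 3                 # number of nodes / community labels
d_v, d_c = 4, 2              # node / community embedding sizes
d_h, d_t, d_cas = 8, 8, 6    # node-RNN, time-RNN and cascade representation sizes


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def prelu(x, a=0.25):
    # PReLU with a fixed slope; in training the slope would be learned
    return np.where(x > 0, x, a * x)


def log_softmax(s):
    s = s - s.max()
    return s - np.log(np.exp(s).sum())


def init(*shape):
    return rng.normal(scale=0.3, size=shape)


class GRUCell:
    """Minimal bias-free GRU cell, a stand-in for the paper's RNN layer."""

    def __init__(self, d_in, d_hid):
        self.W_z, self.W_r, self.W_n = (init(d_in + d_hid, d_hid) for _ in range(3))

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ self.W_z)                           # update gate
        r = sigmoid(xh @ self.W_r)                           # reset gate
        n = np.tanh(np.concatenate([x, r * h]) @ self.W_n)   # candidate state
        return (1.0 - z) * h + z * n


W_emv, W_emc = init(N, d_v), init(C, d_c)        # embedding matrices, Eqs. (1)-(2)
node_rnn, time_rnn = GRUCell(d_v, d_h), GRUCell(1, d_t)
W_h, b_h = init(d_h + d_c + d_t, d_cas), np.zeros(d_cas)
W_com, b_com = init(d_cas, d_c), np.zeros(d_c)
W_node, b_node = init(d_cas + C, d_v), np.zeros(d_v)
W_t, b_t = init(d_cas), 0.0


def forward(nodes, comms, durations):
    """Yield (community scores, node scores, predicted next duration) after each step."""
    h0, h2 = np.zeros(d_h), np.zeros(d_t)        # h_0^(0) = h_0^(2) = 0
    for v, c, dt in zip(nodes, comms, durations):
        h0 = node_rnn(W_emv[v], h0)                                   # Eqs. (1), (3)
        h1 = np.concatenate([h0, W_emc[c]])                           # Eqs. (2), (4)
        h2 = time_rnn(np.array([dt]), h2)                             # Eq. (5)
        h_cas = prelu(np.concatenate([h1, h2]) @ W_h + b_h)           # Eq. (6)
        h_com = prelu(h_cas @ W_com + b_com)                          # Eq. (7)
        s_com = h_com @ W_emc.T                                       # scores in Eq. (8)
        h_node = prelu(np.concatenate([h_cas, sigmoid(s_com)]) @ W_node + b_node)  # Eq. (9)
        s_node = h_node @ W_emv.T                                     # scores in Eq. (10)
        yield s_com, s_node, float(h_cas @ W_t + b_t)                 # Eq. (11)


def cascade_loss(nodes, comms, durations):
    """Cross-entropy for next node and community plus squared error for the next
    inter-event duration, summed over the steps of one cascade."""
    loss = 0.0
    steps = forward(nodes[:-1], comms[:-1], durations[:-1])
    for k, (s_com, s_node, dt_pred) in enumerate(steps):
        loss -= log_softmax(s_node)[nodes[k + 1]] + log_softmax(s_com)[comms[k + 1]]
        loss += (dt_pred - durations[k + 1]) ** 2
    return loss


nodes = [3, 7, 2, 9]                  # hypothetical node ids v_1..v_4
comms = [1, 1, 0, 2]                  # their community labels
durations = [0.0, 0.5, 1.25, 0.25]    # inter-event durations t_k - t_{k-1}
steps = list(forward(nodes, comms, durations))
top5 = np.argsort(-steps[-1][1])[:5]  # top-5 candidate next nodes after the last step
loss = cascade_loss(nodes, comms, durations)
print(len(steps), top5, loss)
```

Summing `cascade_loss` over all cascades and minimizing it corresponds, under these assumptions, to maximizing a joint log-likelihood of the kind used for training; swapping `GRUCell` for an LSTM cell would give the CS-LSTM variant.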


4. Optimization

We now describe the learning process of the model. Given a collection of cascades $\mathcal{Q} = \{S^f\}_{f=1}^{F}$, we treat the cascades as independent of each other. Thus, we can learn the model by maximizing the joint log-likelihood of observing $\mathcal{Q}$ in Eq. (12):

$$\mathrm{Loss}(\mathcal{Q}) = \sum_{f=1}^{F} \sum_{k=1}^{N^f - 1} \log p\big((t_{k+1}, v_{k+1}, c_{v_{k+1}}) \mid c_{v_k}, S_{\le k}\big) \tag{12}$$

which is the sum of the log-likelihoods of all the individual cascades. We use backpropagation through time (BPTT) to train the model. In each training iteration, we vectorize the activated nodes' information as inputs, including the node embeddings, the community structure label embeddings, and the inter-event duration features. The embedding matrices of nodes and community structure labels are learned along with the rest of the model. The learnable parameters are $W_{emv}$, $W_{emc}$, $W_h$, $W_{com}$, $W_{node}$, $W_t$, $b_{com}$, $b_h$, $b_{node}$, $b_t$, and the parameters of the RNNs. We train with mini-batch stochastic gradient descent, updating the parameters with Adam [38]. To speed up convergence, we use orthogonal initialization [39]. We also clip the gradient norm of the parameters to avoid exploding gradients.

5. Experimental setup

In this section, we introduce the data sets, the comparison methods, and the evaluation metrics used to evaluate the proposed framework quantitatively.

5.1. Data sets

Our experiments are conducted on two types of data sets: synthetic data and real-world data.

Synthetic data. Following previous work [15], our data generation consists of two parts: network generation and cascade generation. We use two network generation tools. The first follows previous work [15]: we apply the Kronecker graph model [40] to generate a random network (RD) with initiator parameters [0.5 0.5; 0.5 0.5] and a hierarchical community network (HC) with parameters [0.9 0.1; 0.1 0.9]. Both are widely used in previous diffusion studies. By default, each network has 1024 users and an average degree of 20. The second tool is the LFR benchmark proposed by Lancichinetti et al. [41], which generates synthetic networks with heterogeneous community sizes and a heterogeneous degree distribution, making them more similar to real-world networks.
The average node degree is 20, the maximum degree is 50, the power-law exponent of the degree distribution is 2, and the power-law exponent of the community size distribution is 1. We generate two networks with 500 and 1000 nodes, respectively. In the cascade generation part, the activation time of each activated user is sampled from a given time distribution. Following Wang et al. [15], we use two time distributions for sampling: (1) mixed exponential (Exp) distributions, controlled by rate parameters in [0.01, 10]; and (2) mixed Rayleigh (Ray) distributions, controlled by scale parameters in [0.01, 10]. The cascade generation process searches for the next activated node in breadth-first order and stops when the overall time exceeds the threshold ω or no further node is activated. We set ω = 100. Finally, eight synthetic data sets are generated from the different combinations of network and propagation time distribution, denoted (500, Exp), (500, Ray), (1000, Exp), (1000, Ray), (Hc, Exp), (Hc, Ray), (Rd, Exp), and (Rd, Ray). We generate 20 cascades per node in each dataset and randomly pick 80% of the cascades for training, 10% for validation, and 10% for testing.

Real-world data. We use the Digg dataset proposed by Nathan et al. [42]. It contains the diffusion of stories as voted on by users, along with the friendship network of the users [43]. We drop cascades larger than 1,000, since such large cascades rarely occur in practice and may dominate the training process [15]. We randomly pick 80% of the cascades for training, 10% for validation, and 10% for testing. Since there is no ground truth for community labels, community structure is computed with Infomap, a widely used community detection method. Infomap generates a four-level module hierarchy, and we choose the third level, which contains 6 communities.
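The synthetic-data recipe in Section 5.1 can be sketched in plain Python. This is an illustration under stated assumptions, not the generator used in [15]: `kronecker_edge_prob` computes one entry of the k-th Kronecker power of a 2×2 initiator (the HC and RD initiators are the ones quoted above); `edge_scale` is a hypothetical density knob, because the text does not say how the average degree of 20 is reached; each edge draws its own exponential rate or Rayleigh scale uniformly from [0.01, 10], which is only one reading of "mixed" distributions; and every out-neighbour of an activated node is activated unless its sampled time exceeds ω. The LFR networks are not covered.

```python
import math
import random
from collections import deque
from typing import Dict, List, Sequence, Tuple


def kronecker_edge_prob(u: int, v: int, theta: Sequence[Sequence[float]], k: int) -> float:
    """Entry (u, v) of the k-th Kronecker power of a 2x2 initiator theta."""
    p = 1.0
    for level in range(k):
        p *= theta[(u >> level) & 1][(v >> level) & 1]
    return p


def kronecker_graph(theta, k: int, rng: random.Random,
                    edge_scale: float = 1.0) -> Dict[int, List[int]]:
    """Directed stochastic Kronecker graph on 2**k nodes.
    edge_scale is a hypothetical density knob, not a parameter from the paper."""
    n = 2 ** k
    adj: Dict[int, List[int]] = {u: [] for u in range(n)}
    for u in range(n):
        for v in range(n):
            p = min(1.0, edge_scale * kronecker_edge_prob(u, v, theta, k))
            if u != v and rng.random() < p:
                adj[u].append(v)
    return adj


def sample_delay(param: float, kind: str, rng: random.Random) -> float:
    """Exponential delay with rate `param`, or Rayleigh delay with scale `param`."""
    if kind == "exp":
        return rng.expovariate(param)
    # Rayleigh via inverse CDF: sigma * sqrt(-2 ln(1 - U)), U ~ Uniform[0, 1)
    return param * math.sqrt(-2.0 * math.log(1.0 - rng.random()))


def generate_cascade(adj, source: int, kind: str, rng: random.Random,
                     omega: float = 100.0) -> List[Tuple[float, int]]:
    """Breadth-first cascade from `source`; a node is activated only if its sampled
    activation time does not exceed the observation window omega."""
    times = {source: 0.0}
    queue = deque([source])
    while queue:                                  # stops when no new node is activated
        u = queue.popleft()
        for w in adj[u]:
            if w in times:
                continue
            param = rng.uniform(0.01, 10.0)       # per-edge rate or scale (assumption)
            t_w = times[u] + sample_delay(param, kind, rng)
            if t_w <= omega:
                times[w] = t_w
                queue.append(w)
    return sorted((t, v) for v, t in times.items())


rng = random.Random(7)
theta_hc = [[0.9, 0.1], [0.1, 0.9]]   # HC initiator quoted in the text
adj = kronecker_graph(theta_hc, k=6, rng=rng, edge_scale=8.0)
cascade = generate_cascade(adj, source=0, kind="ray", rng=rng)
print(len(adj), len(cascade), cascade[0])
```

A small k keeps the quadratic pair loop fast; k = 10 would give the 1024 users mentioned above. Each returned cascade is already a time-ordered list of (t_i, v_i) pairs as in Definition 2.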

5.2. Comparison methods

We compare the proposed model with the following methods on the data sets.

CYANRNN [15]: Proposed by Wang et al., this method takes the cross-dependence in cascades into account using an attention-based recurrent neural network (RNN).
RNNPP [16]: This method takes an RNN perspective on point processes and models their background and history effects. It can predict the event timestamp, the main-type event, and the sub-type event. In this paper, we treat community structure labels as main types and nodes as sub-types.
Recurrent marked temporal point processes (RMTPP) [44]: This method views the intensity function of a temporal point process as a nonlinear function of the history and uses a recurrent neural network to learn a representation of influences from the event history automatically. RMTPP can be applied to predict activated nodes and their timestamps in information cascades.
CS-LSTM: The proposed framework with the long short-term memory (LSTM) version of the RNN.
CS-GRU: The proposed framework with the gated recurrent unit (GRU) version of the RNN.
S-RNN: The proposed method with the community structure label representation removed, to show how community structure enhances cascade prediction.

5.3. Evaluation metrics

Our task is to predict the next activated node, the community structure label of the next activated node, and the next activation timestamp, given the preceding cascade information. Since the number of potential nodes and community structure labels can be large, we regard the prediction task as a ranking problem, with the transition probabilities of users as their scores [15]. We evaluate the proposed method and the comparison methods by accuracy on top k (Acc@k) and Mean Reciprocal Rank (MRR), which are widely used metrics. For timestamp prediction, we use the root-mean-square error (RMSE), which measures the difference between the predicted and actual time points.
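The three metrics can be written down in a few lines. A minimal sketch, assuming each prediction is a list of scores over all candidates and that ties are resolved in the target's favour (the paper does not specify tie handling); multiplying Acc@k and MRR by 100 gives the percentages shown in the figures.

```python
import math
from typing import List, Sequence


def rank_of(scores: Sequence[float], target: int) -> int:
    """1-based rank of the true item when candidates are sorted by descending score.
    Ties are resolved in the target's favour (an assumption)."""
    return 1 + sum(1 for s in scores if s > scores[target])


def acc_at_k(all_scores: List[Sequence[float]], targets: List[int], k: int) -> float:
    """Fraction of predictions whose true item is ranked within the top k."""
    hits = sum(rank_of(s, t) <= k for s, t in zip(all_scores, targets))
    return hits / len(targets)


def mrr(all_scores: List[Sequence[float]], targets: List[int]) -> float:
    """Mean of 1 / rank of the true item."""
    return sum(1.0 / rank_of(s, t) for s, t in zip(all_scores, targets)) / len(targets)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predicted and actual times."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]   # toy scores over 3 candidates
targets = [2, 0]                              # true next items
print(acc_at_k(scores, targets, 1), acc_at_k(scores, targets, 2), mrr(scores, targets))
# 0.5 1.0 0.75
print(rmse([1.0, 2.0], [4.0, 6.0]))           # sqrt((9 + 16) / 2), about 3.5355
```

In the toy example the first target is ranked second and the second target first, so Acc@1 is 0.5, Acc@2 is 1.0, and MRR is (1/2 + 1)/2 = 0.75.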

6. Experimental results

We run all algorithms on servers with NVIDIA Tesla V100 32GB GPUs, Intel Xeon E5 CPUs, and 512 GB of memory, and set the maximum number of training iterations to 100 for all algorithms. In this section, we report the performance of all methods on the synthetic datasets and the real-world dataset.


Fig. 2. Prediction comparisons of the next activated node for the baselines and our proposed model, CS-RNN. Subfigures (a)–(d) show comparisons on the (500,Exp), (500,Ray), (1000,Exp), and (1000,Ray) data sets, respectively. ACC@5, ACC@10, and MRR are used as evaluation metrics; the y-axes show each value as a percentage. CS-GRU and CS-LSTM are variants of our proposed model.

Fig. 3. Prediction comparisons of the next activated node for the baselines and our proposed model, CS-RNN. Subfigures (a)–(d) show comparisons on the (Hc,Exp), (Hc,Ray), (Rd,Exp), and (Rd,Ray) data sets, respectively. ACC@5, ACC@10, and MRR are used as evaluation metrics; the y-axes show each value as a percentage. CS-GRU and CS-LSTM are variants of our proposed model.

6.1. Synthetic data results

One training epoch of our method takes 10 s, 10 s, 29 s, and 29 s on the (500,Exp), (500,Ray), (1000,Exp), and (1000,Ray) datasets, respectively, and one testing epoch takes 2 s, 2 s, 5 s, and 5 s. Fig. 2 compares the next activated node predictions of the baselines and our proposed model on these datasets. CS-RNN, CS-GRU, and CS-LSTM perform consistently and significantly better than the other baselines on Acc@5, Acc@10, and MRR on all of them. The results indicate that our proposed method and its variants predict the next activated node better.

S-RNN has lower accuracy and MRR than CS-RNN. Since S-RNN is the version of our proposed method without community structure information, the results demonstrate that community structure information improves accuracy and MRR in cascade prediction tasks. One training epoch of our method takes 6 s, 6 s, 15 s, and 15 s on the (Hc,Exp), (Hc,Ray), (Rd,Exp), and (Rd,Ray) datasets, respectively, and one testing epoch takes 2 s, 2 s, 3 s, and 3 s. Fig. 3 compares the next activated node predictions of the baselines and our proposed model on these datasets. CS-RNN, CS-GRU, and CS-LSTM perform similarly to S-RNN on the (Rd,Exp) and (Rd,Ray) datasets. Since these are random networks without community structure, it is reasonable that S-RNN performs similarly to the proposed method.

Table 1 compares the next activation timestamp predictions of the baselines and our proposed model, evaluated by RMSE. Our proposed methods perform well, although RMTPP performs better than ours on some datasets. This is because RMTPP models the data as temporal point processes, which may make its time predictions more accurate; our future work will focus on improving time prediction accuracy. The results on the two types of datasets show that CS-LSTM performs better than CS-RNN. The reason is that information diffusion cascades contain long-range dependencies, and LSTM can extract more accurate representations of long-range dependencies than a plain RNN.

Since only RNNPP, CS-RNN, CS-GRU, and CS-LSTM can predict the community structure label of the next activated node, we report their results in Table 2. We use the main-type prediction part of RNNPP to predict community structure labels and evaluate the results with Acc@5, Acc@10, and MRR. The models are run on the synthetic datasets listed in Table 2. The results show that our method performs much better than RNNPP, which also illustrates that community structure information can enhance the cascade prediction task.

Fig. 4. Prediction comparisons of the next activated node on the Digg dataset. ACC@5, ACC@10, and MRR are used as evaluation metrics; the y-axes show each value as a percentage.

Table 1
Predictive performance (RMSE) of next activation time prediction for the baselines and our proposed model, CS-RNN. CS-GRU and CS-LSTM are variants of our proposed model.

Method | 500,Exp | 500,Ray | 1000,Exp | 1000,Ray | Hc,Exp | Hc,Ray | Rd,Exp | Rd,Ray
--- | --- | --- | --- | --- | --- | --- | --- | ---
CYANRNN | 7344.35 | 313.82 | 5897.60 | 522.80 | 14253.91 | 2037.61 | 17651.81 | 1320.55
RNNPP | 6.09 | 0.24 | 2.80 | 0.20 | 8.91 | 0.73 | 10.03 | 0.63
RMTPP | 6.10 | 0.23 | 2.69 | 0.19 | 8.75 | 0.75 | 10.51 | 0.58
S-RNN | 6.14 | 0.24 | 2.84 | 0.20 | 8.83 | 0.72 | 10.01 | 0.58
CS-RNN | 5.87 | 0.24 | 2.87 | 0.20 | 8.83 | 0.69 | 9.91 | 0.58
CS-GRU | 5.87 | 0.24 | 2.87 | 0.20 | 8.83 | 0.69 | 9.91 | 0.58
CS-LSTM | 5.87 | 0.24 | 2.87 | 0.20 | 8.83 | 0.69 | 9.91 | 0.58

Table 2
Predictive performance of community structure label prediction for the next activated node.

Metric | Method | 500,Exp | 500,Ray | 1000,Exp | 1000,Ray | Hc,Exp | Hc,Ray | Rd,Exp | Rd,Ray
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
ACC@5 | RNNPP | 41.82 | 41.86 | 26.11 | 25.16 | 53.24 | 52.21 | 27.54 | 26.30
ACC@5 | CS-RNN | 81.88 | 81.05 | 78.04 | 76.78 | 92.54 | 95.64 | 79.98 | 83.05
ACC@5 | CS-GRU | 78.96 | 76.89 | 74.50 | 69.13 | 91.29 | 95.07 | 79.65 | 87.68
ACC@5 | CS-LSTM | 82.90 | 81.09 | 78.07 | 76.13 | 92.09 | 96.08 | 80.47 | 87.22
ACC@10 | RNNPP | 69.60 | 69.09 | 50.07 | 46.42 | 74.34 | 73.13 | 49.76 | 51.54
ACC@10 | CS-RNN | 92.72 | 92.62 | 88.78 | 88.25 | 95.20 | 97.35 | 85.34 | 86.39
ACC@10 | CS-GRU | 90.90 | 89.66 | 83.99 | 81.92 | 94.09 | 97.05 | 88.45 | 91.56
ACC@10 | CS-LSTM | 92.78 | 92.78 | 88.78 | 87.70 | 95.45 | 97.87 | 88.73 | 91.19
MRR | RNNPP | 26.14 | 26.23 | 18.02 | 18.09 | 31.03 | 34.60 | 18.30 | 24.21
MRR | CS-RNN | 64.21 | 63.73 | 71.40 | 68.80 | 81.36 | 94.10 | 77.58 | 82.42
MRR | CS-GRU | 67.00 | 64.72 | 71.07 | 59.27 | 87.18 | 94.72 | 69.89 | 86.81
MRR | CS-LSTM | 66.75 | 63.21 | 66.23 | 65.30 | 76.93 | 94.90 | 72.35 | 85.97


Fig. 5. Heatmaps of the prediction performance for next activated nodes in terms of ACC@5, ACC@10, and MRR as the community embedding size varies from 1 to 256 on the (1000, Exp) and (1000, Ray) datasets. The x-axes show the dimension of the community embedding weight, and the y-axes show the datasets.

Fig. 6. Heatmaps of the prediction performance for next activated nodes in terms of ACC@5, ACC@10, and MRR as the community embedding size varies from 1 to 256 on the Digg dataset. The x-axes show the dimension of the community embedding weight, and the y-axes show the datasets.

6.2. Real data results

Fig. 4 compares the next activated node predictions of our method and the baselines on the Digg dataset. CYANRNN has no results because it ran out of memory. CS-RNN performs consistently best.

6.3. Parameter sensitivity analysis

In this section, we investigate how prediction performance varies with the size of the community embedding weight. We conduct the parameter analysis on the (1000, Exp), (1000, Ray), and Digg datasets. The other hyper-parameters of our network are set similarly to Xiao's framework [16]. Fig. 5 shows heatmaps of the prediction performance for next activated nodes in terms of ACC@5, ACC@10, and MRR as the size varies from 1 to 256 on the (1000, Exp) and (1000, Ray) datasets, and Fig. 6 shows the corresponding heatmaps on the Digg dataset. Prediction performance increases quickly as the size of the community embedding weight grows up to about 25, and remains stable once the size exceeds 25.

7. Conclusion

In this paper, we address the cascade prediction task by incorporating community structure information into a recurrent neural network (RNN) framework. To the best of our knowledge, this work is an early attempt at RNN-based cascade prediction that incorporates community structure information. Unlike traditional modeling methods, an RNN is a convenient and effective tool for cascade prediction: it avoids strong prior assumptions about the diffusion model and is flexible enough to capture complex dependencies in cascades. Moreover, recent research finds that community structure affects cascade behavior. We therefore propose an RNN that contains community structure information to capture community effects in cascades. Our model also introduces a community structure label prediction module to predict the community structure label of the next activated node. We evaluate the effectiveness of the proposed model on both synthetic and real datasets. Experimental results demonstrate that our model outperforms state-of-the-art methods on both next activated node prediction and community structure label prediction for the next activated node. Additionally, CS-RNN performs better than S-RNN on both synthetic and real datasets, implying that community information can improve the precision of cascade prediction. We also analyze the sensitivity of the model to the size of the community embedding weight; the results show that our method is robust once this size becomes large enough. In the future, we can improve the proposed work in three aspects. First, since network structure information may be incomplete, we can combine a deep Boltzmann machine to infer structural information, following [45].
Second, since cascades may share common diffusion patterns, we can learn representations of background diffusion patterns, following [46]. Third, since the representation of the predicted node may have a complex nonlinear dependence on the cascade representation, we can model this mapping with Gaussian process regression, following [47].

Declarations of interest

None.

Acknowledgment

This research is partly supported by the Chinese National Funding of Social Sciences (15BTQ056) and the National Key Research and Development Program of China (2018YFC0809800).

References

[1] S. Cheng, H. Shen, J. Huang, W. Chen, X. Cheng, IMRank: influence maximization via finding self-consistent ranking, in: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, ACM, 2014, pp. 475–484.
[2] L. Weng, F. Menczer, Y.-Y. Ahn, Virality prediction and community structure in social networks, Sci. Rep. 3 (2013) 2522.
[3] D. Shahaf, J. Yang, C. Suen, J. Jacobs, H. Wang, J. Leskovec, Information cartography: creating zoomable, large-scale maps of information, in: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2013, pp. 1097–1105.
[4] J. Goldenberg, B. Libai, E. Muller, Talk of the network: a complex systems look at the underlying process of word-of-mouth, Mark. Lett. 12 (3) (2001) 211–223.
[5] K. Saito, R. Nakano, M. Kimura, Prediction of information diffusion probabilities for independent cascade model, in: Proceedings of the International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Springer, 2008, pp. 67–75.
[6] D. Kempe, J. Kleinberg, É. Tardos, Maximizing the spread of influence through a social network, in: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2003, pp. 137–146.
[7] Z. Yang, J. Guo, K. Cai, J. Tang, J. Li, L. Zhang, Z. Su, Understanding retweeting behaviors in social networks, in: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, ACM, 2010, pp. 1633–1636.
[8] H.-K. Peng, J. Zhu, D. Piao, R. Yan, Y. Zhang, Retweet modeling using conditional random fields, in: Proceedings of the 11th IEEE International Conference on Data Mining Workshops, IEEE, 2011, pp. 336–343.
[9] Z. Luo, M. Osborne, J. Tang, T. Wang, Who will retweet me?: finding retweeters in Twitter, in: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2013, pp. 869–872.
[10] E.F. Can, H. Oktay, R. Manmatha, Predicting retweet count using visual cues, in: Proceedings of the 22nd ACM International Conference on Conference on Information & Knowledge Management, ACM, 2013, pp. 1481–1484.
[11] Q. Zhang, Y. Gong, Y. Guo, X. Huang, Retweet behavior prediction using hierarchical Dirichlet process, in: Proceedings of the AAAI, 2015, pp. 403–409.
[12] P. Cui, S. Jin, L. Yu, F. Wang, W. Zhu, S. Yang, Cascading outbreak prediction in networks: a data-driven approach, in: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2013, pp. 901–909.
[13] P. Cui, F. Wang, S. Liu, M. Ou, S. Yang, L. Sun, Who should share what?: item-level social influence prediction for users and posts ranking, in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2011, pp. 185–194.
[14] Q. Zhang, Y. Gong, J. Wu, H. Huang, X. Huang, Retweet prediction with attention-based deep neural network, in: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, ACM, 2016, pp. 75–84.
[15] Y. Wang, H. Shen, S. Liu, J. Gao, X. Cheng, Cascade dynamics modeling with attention-based recurrent neural network, in: Proceedings of the 26th International Joint Conference on Artificial Intelligence, AAAI Press, 2017, pp. 2985–2991.
[16] S. Xiao, J. Yan, X. Yang, H. Zha, S.M. Chu, Modeling the intensity function of point process via recurrent neural networks, in: Proceedings of the AAAI, 17, 2017, pp. 1597–1603.
[17] S. Liu, H. Zheng, H. Shen, X. Cheng, X. Liao, Learning concise representations of users influences through online behaviors, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, 2017, pp. 2351–2357.
[18] J. Qiu, J. Tang, H. Ma, Y. Dong, K. Wang, J. Tang, DeepInf: modeling influence locality in large social networks, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), 2018.
[19] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in: International Conference on Learning Representations, 2015.
[20] O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: a neural image caption generator, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3156–3164.
[21] M.E. Newman, Modularity and community structure in networks, Proc. Natl. Acad. Sci. 103 (23) (2006) 8577–8582.
[22] Z. Liu, B. Hu, Epidemic spreading in community networks, EPL (Europhys. Lett.) 72 (2) (2005) 315.
[23] A. Nematzadeh, E. Ferrara, A. Flammini, Y.-Y. Ahn, Optimal network modularity for information diffusion, Phys. Rev. Lett. 113 (8) (2014) 088701.
[24] L. Liu, J. Tang, J. Han, M. Jiang, S. Yang, Mining topic-level influence in heterogeneous networks, in: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, ACM, 2010, pp. 199–208.
[25] J. Zhang, B. Liu, J. Tang, T. Chen, J. Li, Social influence locality for modeling retweeting behaviors, in: Proceedings of the IJCAI, 13, 2013, pp. 2761–2767.
[26] N. Naveed, T. Gottron, J. Kunegis, A.C. Alhadi, Bad news travel fast: a content-based analysis of interestingness on Twitter, in: Proceedings of the 3rd International Web Science Conference, ACM, 2011, p. 8.
[27] T.R. Zaman, R. Herbrich, J. Van Gael, D. Stern, Predicting information spreading in Twitter, in: Proceedings of the Workshop on Computational Social Science and the Wisdom of Crowds, NIPS, 104, Citeseer, 2010, pp. 17599–17601.
[28] S. Petrovic, M. Osborne, V. Lavrenko, RT to win! Predicting message propagation in Twitter, ICWSM 11 (2011) 586–589.
[29] W. Feng, J. Wang, Retweet or not?: personalized tweet re-ranking, in: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, ACM, 2013, pp. 577–586.
[30] B. Suh, L. Hong, P. Pirolli, E.H. Chi, Want to be retweeted? Large scale analytics on factors impacting retweet in Twitter network, in: Proceedings of the IEEE Second International Conference on Social Computing, IEEE, 2010, pp. 177–184.
[31] L. Weng, F. Menczer, Y.-Y. Ahn, Predicting successful memes using network and community structure, in: Proceedings of the ICWSM, 2014.
[32] J. Wu, R. Du, Y. Zheng, D. Liu, Optimal multi-community network modularity for information diffusion, Int. J. Mod. Phys. C 27 (08) (2016) 1650092.
[33] D.H. Stern, R. Herbrich, T. Graepel, Matchbox: large scale online Bayesian recommendations, in: Proceedings of the 18th International Conference on World Wide Web, ACM, 2009, pp. 111–120.
[34] Q. Wu, C.J. Burges, K.M. Svore, J. Gao, Ranking, boosting, and model adaptation, Technical Report, Microsoft Research, 2008.
[35] Y. Artzi, P. Pantel, M. Gamon, Predicting responses to microblog posts, in: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, 2012, pp. 602–606.
[36] Z. Luo, Y. Wang, X. Wu, Predicting retweeting behavior based on autoregressive moving average model, in: Proceedings of the International Conference on Web Information Systems Engineering, Springer, 2012, pp. 777–782.
[37] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang, Complex networks: structure and dynamics, Phys. Rep. 424 (4–5) (2006) 175–308.
[38] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, in: International Conference on Learning Representations, 2015.
[39] M. Henaff, A. Szlam, Y. LeCun, Recurrent orthogonal networks and long-memory tasks, in: International Conference on Machine Learning, 2016, pp. 2034–2042.
[40] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, Z. Ghahramani, Kronecker graphs: an approach to modeling networks, J. Mach. Learn. Res. 11 (Feb) (2010) 985–1042.
[41] A. Lancichinetti, S. Fortunato, F. Radicchi, Benchmark graphs for testing community detection algorithms, Phys. Rev. E 78 (4) (2008) 046110.
[42] N.O. Hodas, K. Lerman, The simple rules of social contagion, Sci. Rep. 4 (2014) 4343.
[43] T. Hogg, K. Lerman, Social dynamics of Digg, EPJ Data Sci. 1 (1) (2012) 5.
[44] N. Du, H. Dai, R. Trivedi, U. Upadhyay, M. Gomez-Rodriguez, L. Song, Recurrent marked temporal point processes: embedding event history to vector, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016, pp. 1555–1564.
[45] J. Han, D. Zhang, G. Cheng, L. Guo, J. Ren, Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning, IEEE Trans. Geosci. Remote Sens. 53 (6) (2015) 3325–3337.
[46] J. Han, D. Zhang, X. Hu, L. Guo, J. Ren, F. Wu, Background prior-based salient object detection via deep reconstruction residual, IEEE Trans. Circuits Syst. Video Technol. 25 (8) (2015) 1309–1321.
[47] J. Han, X. Ji, X. Hu, D. Zhu, K. Li, X. Jiang, G. Cui, L. Guo, T. Liu, Representing and retrieving video shots in human-centric brain imaging space, IEEE Trans. Image Process. 22 (7) (2013) 2723–2736.

Chaochao Liu received the B.S. and M.S. degrees from Hefei University of Technology, Hefei, China, in 2012 and 2015, respectively. He is currently pursuing the Ph.D. degree at the College of Intelligence and Computing, Tianjin University, Tianjin, China. His research interests include deep learning, information diffusion, multilayer networks, and social networks.

Wenjun Wang received the Ph.D. degree from Peking University, Beijing, China, in 2004. He is currently a Professor with the School of Computer Science and Technology, Tianjin University, Tianjin, China. His research interests include computational social science, emergency management, large-scale data mining and network science.

Yueheng Sun received the Ph.D. degree from Tianjin University, Tianjin, China, in 2005. He is currently a lecturer with the School of Computer Science and Technology, Tianjin University, Tianjin, China. His research interests include social computing, data mining, and machine learning.