Constrained common cluster based model for community detection in temporal and multiplex networks


Communicated by Dr. Muhammet Uzuntarla

Accepted Manuscript

Constrained Common Cluster based Model for Community Detection in Temporal and Multiplex Networks

Pengfei Jiao, Wenjun Wang, Di Jin

PII: S0925-2312(17)31507-2
DOI: 10.1016/j.neucom.2017.09.013
Reference: NEUCOM 18870

To appear in: Neurocomputing

Received date: 4 April 2017
Revised date: 11 June 2017
Accepted date: 5 September 2017

Please cite this article as: Pengfei Jiao, Wenjun Wang, Di Jin, Constrained Common Cluster based Model for Community Detection in Temporal and Multiplex Networks, Neurocomputing (2017), doi: 10.1016/j.neucom.2017.09.013

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Constrained Common Cluster based Model for Community Detection in Temporal and Multiplex Networks

Pengfei Jiao^{a,b}, Wenjun Wang^{a,b,∗}, Di Jin^{a}

a Tianjin University, School of Computer Science and Technology, Tianjin, 300350, China
b Tianjin Engineering Center of SmartSafety & Bigdata Technology, Tianjin Key Laboratory of Advanced Networking (TANK)

Abstract

The detection of tightly connected groups, known as community detection in complex networks, is a prominent problem in network analysis and mining. At the same time, almost all social, biological, bibliographic, communication and computer systems can be modeled as temporal networks, whose topological structures evolve with time, or as multiplex networks, in which each pair of nodes can be linked by multiple types of relations. Current methods of community detection for temporal networks are based on incremental, independent or evolutionary clustering, and those for multiplex networks are based on fusing the multiple link types. However, all these methods ignore the common structure hidden in the networks, which we call the common cluster. In this paper, we propose a constrained common cluster based model ($C^3$ model) to analyze temporal and multiplex networks, which can not only detect the community structure but also identify the importance of each node based on the common cluster structure of both classes of networks. The intrinsic assumption of the proposed model is that there are common or coincident clusters hidden in these networks. In detail, we first construct the Markov steady-state matrix of each snapshot of the temporal network or each slice of the multiplex network. Next, we propose the objective function of the $C^3$ model by combining the Markov steady-state matrices, similarity matrices and community membership matrices of each snapshot or slice of the network. Last, a gradient descent algorithm based on non-negative matrix factorization is proposed to optimize the objective function. Experiments on both synthetic datasets and real-world networks demonstrate that the proposed $C^3$ model has competitive performance in terms of the evaluation indexes NMI and error of community detection; in addition, the proposed model can identify the importance of nodes in temporal or multiplex networks.

Keywords: temporal or multiplex networks, constrained common structure, Markov steady-state matrices, non-negative matrix factorization

∗ Corresponding author
Email addresses: [email protected] (Pengfei Jiao), [email protected] (Wenjun Wang), [email protected] (Di Jin)

Preprint submitted to Journal of LaTeX Templates, September 11, 2017

1. Introduction

Community structure is a remarkable feature observed in complex networks, and community detection has a major impact on complex network analysis and mining [1, 2]. Communities are usually composed of tightly connected subgraphs. For example, in a protein-protein interaction (PPI) network, nodes and edges correspond to proteins and the interactions among them, respectively, and each community may represent a functional module with a specific biological function [3]. A variety of methods have been proposed for community detection in different fields, such as traditional clustering methods [3], spectral methods [4], modularity optimization [5], stochastic block models [6, 7], non-negative matrix factorization [8, 9, 10] and so on. Detailed and comprehensive introductions to community detection can be found in [11, 12, 13].

Furthermore, many social, biological, bibliographic, communication and computer systems can be modeled as temporal networks [14], whose topological structures evolve with time, or as multiplex networks [15], in which each pair of nodes can have multiple relations. On the temporal side, in a mobile communication network the links between any two individuals change over time, and the distribution of time intervals between these interactions follows a power law [16]. An ecological network, which is constructed from species or other categories of organisms and the interactions among them, may change with the seasons as organisms go through different phases of their life cycles [17]. On the multiplex side, in a social network with a multiplex structure, different slices may represent different social relationships (including Facebook, Twitter, etc.); in a transportation network, nodes and the different types of links correspond to airports and the flights of different airlines, respectively [18]. The communities of temporal and multiplex networks have their own characteristics, so finding the community structure of these two types of networks makes the problem of community detection more interesting and challenging. In addition, detecting anomalies in temporal networks has also received great attention [19, 20].

In fact, as the topological structure varies with time, the community structure of a temporal network has its own characteristics, including community growth, contraction and so on [21]. In a multiplex network, each single layer carries a piece of meaningful information from its own perspective, and hence how to exploit and fuse the information of the multiple slices to detect the community structure and analyze each layer of the network is an essential problem [22]. Some pioneering methods have been proposed for either temporal networks or multiplex networks, such as methods based on independent clustering [21], incremental clustering [23, 24] and evolutionary clustering [25, 26] for temporal networks, and matrix factorization [27] and pattern mining [28] for multiplex networks. However, the methods of community detection for temporal networks are not suitable for multiplex networks, because they do not make the best of the multi-slice structure of the networks. Conversely, the methods for multiplex networks ignore the temporal information and cannot analyze the evolution of temporal networks.

In addition, how can we detect the community structures in each snapshot of a temporal network or each slice of a multiplex network and analyze their latent properties? Is there any internal consistency among these communities? How can each observed slice of the multiplex network or each observed snapshot of the temporal network be generated by combining the common cluster structure with the temporal or multiplex communities?

Aiming at these challenges, in this paper we present an intuitive, principled, interpretable and optimizable model, called the constrained common cluster model ($C^3$ model), to analyze and explore the community structure and the common cluster structure hidden in temporal or multiplex networks. We assume that every snapshot or slice of the network has the same nodes and a fixed number of communities, with a varying link structure in the temporal network and different types of links at each slice in the multiplex network. The intrinsic assumption of the proposed model is that there is a common or coincident structure in the snapshots of temporal networks or the slices of multiplex networks, which we call common clusters. We also assume that the number of common clusters is equal to the number of communities across the temporal or multiplex network, and that nodes are divided into different communities at each snapshot or slice. However, each cluster contains all the nodes of the network in the form of probabilities, which allows our model to analyze the importance of each node in the network. In detail, we first construct the Markov steady-state matrix of each snapshot of the temporal network or each slice of the multiplex network. Next, we propose the objective function of the $C^3$ model by combining the Markov steady-state matrices, similarity matrices and community membership matrices of the network in a principled way. Finally, a gradient descent algorithm based on non-negative matrix factorization is proposed for optimizing the objective function. Experiments on both synthetic datasets and real-world networks demonstrate that our $C^3$ model has competitive performance on different evaluation indexes of community detection and can reveal interesting results in some networks.

The main contributions of this paper are summarized as follows:

• By assuming that there is a common or coincident cluster structure in the temporal or multiplex network, we propose an intuitive, principled and interpretable statistical model, $C^3$, to detect the community structure, which is suitable for both types of networks.

• Based on the common or coincident cluster structure hidden in the network, the proposed model can identify the importance of each node in each cluster of the network.

• We propose an iterative algorithm to optimize the objective function, which is constructed by combining the Markov steady-state matrices, similarity matrices and community membership matrices of the network.

• We test the proposed model and compare it with some widely used methods for community detection on a variety of temporal and multiplex networks, and the experimental results show the effectiveness and competitive performance of the proposed model.

The rest of this paper is organized as follows. First, we review some classic and widely used methods for community detection in temporal or multiplex networks in Section 2. Second, we give the detailed construction of our model and the algorithm in Sections 3 and 4, respectively. Third, experiments on both artificially generated networks and real-world datasets are shown in Section 5. Finally, we conclude our work and give some future extensions of the proposed model in Section 6.

2. Related work

Community detection, as an important and meaningful research direction in complex networks, has attracted much attention, and a number of methods have been proposed. Surveys of community detection in static networks can be found in [11, 12, 13, 29].

Moreover, there is also a growing body of literature on community detection in temporal networks and in multiplex networks, treated separately.

On one hand, a temporal network can be modeled as a series of snapshot networks, and methods of community detection for temporal networks can generally be divided into three categories. The first is independent clustering, whose basic idea is to apply static community detection methods to each snapshot of the temporal network and then analyze the community evolution among them [21]. The second is incremental clustering, which continuously updates the community structure of the network by treating the changes of the network as stream data [23, 24]. Similar methods include GraphScope [30], incremental spectral clustering [31] and dynamic modularity optimization methods [32, 33]. The last and most important one is evolutionary clustering, which assumes that temporal networks vary slowly, so that the previous snapshot or community structure can be used as a penalty term when analyzing the current snapshot. The authors of [34] extended classic k-means and hierarchical clustering to analyze dynamic datasets. Chi et al. [25] extended spectral clustering to the dynamic setting and proposed two clustering frameworks, Preserving Cluster Quality (PCQ) and Preserving Cluster Membership (PCM). Liu et al. proposed a label-propagation-based evolutionary clustering for detecting overlapping and non-overlapping communities in dynamic networks [35]. FaceNet [36], a classic and widely used method, generalizes symmetric non-negative matrix factorization to community detection in temporal networks and is also used as a comparison method in our work. Folino et al. [26] proposed DYNMOGA, which treats evolutionary clustering as a multi-objective clustering problem. Gauvin et al. [37] proposed a method based on non-negative tensor factorization to detect the community structure and activity patterns in temporal networks. In addition, the Stochastic Block Model (SBM) [38], a classical statistical model for network analysis, has also been extended to dynamic SBMs for modeling temporal networks [39, 40, 41].

On the other hand, matrix factorization and pattern mining are the two main classes of methods for community detection in multiplex networks. Based on matrix factorization, Tang et al. [27] and Gao et al. [42] proposed different graph clustering algorithms for this problem. The basic idea of both algorithms is to fuse different information by extracting common factors from multiple slices of the network. Zeng et al. [28] proposed a subgraph mining algorithm that finds cross-graph quasi-cliques in a multiplex network, which is a classic and effective method based on pattern mining. Some similar works can be found in [43], and a detailed review in [44].

Also of note, a generalized network quality function for both time-dependent (temporal) and multiplex networks was proposed in [45], where the authors proposed a heuristic method, Multislice, which detects temporal and multiplex community structure with a greedy optimization algorithm. The key idea of this method is to couple the adjacency matrices of the multiple slices or snapshots of the network, and hence the returned communities have no principled, theoretical meaning. Therefore, we propose the constrained common cluster model ($C^3$ model), which can detect the community structure in both temporal and multiplex networks and analyze the importance of each node in the network.

3. Constrained Common Cluster based Model

In this section, we introduce the basic notations and the proposed model for temporal and multiplex networks.

3.1. Notations and Model

A temporal or multiplex network can usually be represented as $G_t = (V, E_t)$, $t = 1, 2, \cdots, S$, where $V$ and $E_t$ denote the nodes of the network and the edges at snapshot or slice $t$, respectively. $N = |V|$ and $M_t = |E_t|$ are the number of nodes and the number of edges of the network at each $t$, respectively. $S$ is the number of snapshots of the temporal network or the number of slices of the multiplex network, and $K$ is the number of communities. In other words, we treat the number of nodes and the number of communities in the temporal or multiplex network as constants. We also assume that the network is unweighted and undirected, with $A_t$ denoting the adjacency matrix of the network at snapshot or slice $t$: $A_{t,ij} = 1$ if there is an edge between nodes $i$ and $j$ at snapshot or slice $t$, and $A_{t,ij} = 0$ otherwise.
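As a concrete illustration of this notation, the following minimal sketch stores a temporal or multiplex network as an array of shape $(S, N, N)$ holding the matrices $A_t$. The function name `build_adjacency_stack` and the toy edge lists are hypothetical and are not part of the original work.

```python
import numpy as np

def build_adjacency_stack(edge_lists, n_nodes):
    """Return an (S, N, N) array of symmetric 0/1 adjacency matrices A_t.

    edge_lists[t] is an iterable of (i, j) node pairs for snapshot/slice t.
    (Hypothetical helper, used only to illustrate the notation.)
    """
    stack = np.zeros((len(edge_lists), n_nodes, n_nodes), dtype=int)
    for t, edges in enumerate(edge_lists):
        for i, j in edges:
            if i == j:
                continue  # the model assumes simple graphs: skip self-loops
            stack[t, i, j] = 1
            stack[t, j, i] = 1  # undirected: keep A_t symmetric
    return stack

# Toy example with N = 4 nodes and S = 2 snapshots/slices.
edge_lists = [[(0, 1), (1, 2)], [(0, 2), (2, 3), (0, 1)]]
A = build_adjacency_stack(edge_lists, n_nodes=4)
M = A.sum(axis=(1, 2)) // 2  # M_t = |E_t| for each t
```

Keeping all $A_t$ in one array reflects the assumption above that the node set $V$ is shared by every snapshot or slice.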


Figure 1: The plate representation of the constrained common cluster based model. We present the community membership matrix at snapshot or slice t and the common cluster W for fitting the temporal or multiplex network G with the Markov steady-state matrix Pt and adjacency matrix At at each snapshot or slice t.

It is worth noting that, at each snapshot or slice $t$ of the network, the communities are denoted as a set $C = \{C_1, C_2, \cdots, C_K\}$, where each $C_k$, $k = 1, 2, \cdots, K$, is a set of nodes at snapshot or slice $t$. The communities are usually constrained by $\cup_{k=1}^{K} C_k = V$ and $C_k \cap C_l = \emptyset$ when $k \neq l$. This is called a non-overlapping community structure; if $C_k \cap C_l \neq \emptyset$ for some $k \neq l$, it is called an overlapping community structure. On the other hand, we denote a common cluster structure on all the nodes $V$ of the network as $W = \{W_1, W_2, \cdots, W_K\}$, where each $W_k \in (0, 1)^{N \times 1}$ is a probability vector, i.e. $\sum_{i=1}^{N} W_{ik} = 1$ for $k = 1, 2, \cdots, K$.

Figure 1 shows the plate representation of our proposed model. $H_t$ is the community membership matrix of the network at snapshot or slice $t$; each $H_{t,jk}$ represents the propensity of node $j$ to belong to community $k$ at $t$, where $j = 1, 2, \cdots, N$ and $k = 1, 2, \cdots, K$. $K$ is the number of communities of the network, which we assume to be a constant with the same value for all snapshots of the temporal network and all slices of the multiplex network. $P_t$ is the Markov steady-state matrix of the network at snapshot or slice $t$; each $P_{t,ij}$ represents the probability of a random walker arriving at node $j$ from node $i$, which can also be regarded as the probability that we find node $j$ from node $i$.

As denoted before, $W$ is the probabilistic matrix of the common cluster structure. Each $W_{ik}$ represents the importance of node $i$ in cluster $k$, which can also be regarded as the prior probability that cluster $k$ contains node $i$, or the probability that we find node $i$ in cluster $k$; recall that $\sum_{i=1}^{N} W_{ik} = 1$. Thus $W_{ik} H_{t,jk}$ is the expected probability that we find node $i$ from node $j$ in cluster $k$ at snapshot or slice $t$, and $\sum_{k=1}^{K} W_{ik} H_{t,jk}$ is the expected probability that we find node $i$ from node $j$ in the network at $t$. Hence $P_{t,ij} \approx \sum_{k=1}^{K} W_{ik} H_{t,jk}$, or in matrix form $P_t \approx \hat{P}_t = W H_t^T$. Similarly, $H_{t,ik} H_{t,jk}$ can be regarded as the expected number of links between nodes $i$ and $j$ in community $k$ at snapshot or slice $t$, so $\sum_{k=1}^{K} H_{t,ik} H_{t,jk}$ is the expected number of links between nodes $i$ and $j$ at snapshot or slice $t$ of the network. This gives $A_t \approx \hat{A}_t = H_t H_t^T$, where $H_t^T$ is the transpose of the matrix $H_t$.

The residuals $P_t - W H_t^T$ and $A_t - H_t H_t^T$ for every snapshot or slice $t$ are the errors we need to minimize, so we optimize the following objective function of the proposed $C^3$ model:

$$O_1 = \lambda \sum_{t=1}^{S} D_t^{(1)}\left(P_t \mid W H_t^T\right) + (1 - \lambda) \sum_{t=1}^{S} D_t^{(2)}\left(A_t \mid H_t H_t^T\right) \tag{1}$$

where $D_t^{(1)}$ and $D_t^{(2)}$, $t = 1, 2, \cdots, S$, may be any cost functions that quantify the quality of the approximation, such as the squared loss (the square of the Frobenius norm, i.e. the squared Euclidean distance between two matrices $X$ and $Y$ [46]) or the Kullback-Leibler divergence between the two matrices. The squared loss and the Kullback-Leibler divergence are defined as $D_F(X|Y) = \|X - Y\|_F^2 = \sum_{ij} (X_{ij} - Y_{ij})^2$ and $D_{KL}(X|Y) = \sum_{ij} \left(X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij}\right)$, which correspond to modeling the errors $X_{ij} - Y_{ij}$ over all pairs $(i, j)$ with a Normal distribution and a Poisson distribution, respectively. $\lambda \in [0, 1]$ is a balance parameter: if $\lambda$ is close to 1, the community membership matrices $H_t$ become more similar to each other because of the common cluster structure $W$; if $\lambda$ is close to 0, the model degenerates to independent clustering at each snapshot. A suitable value of $\lambda$ therefore achieves the best tradeoff, which can not only detect good community structure but also evaluate the importance of each node in every cluster.

For simplicity, in this paper we set all $D_t^{(1)}$ and $D_t^{(2)}$, $t = 1, 2, \cdots, S$, to the Euclidean distance, so that the objective function becomes a minimization problem that can be rewritten as

$$\min_{W, H_t} O_1 = \lambda \sum_{t=1}^{S} \|P_t - W H_t^T\|_F^2 + (1 - \lambda) \sum_{t=1}^{S} \|A_t - H_t H_t^T\|_F^2 \tag{2}$$

$$\text{s.t.} \quad H_t \geq 0,\ t = 1, 2, \ldots, S; \qquad W \geq 0; \qquad \sum_{i=1}^{N} W_{ik} = 1,\ k = 1, 2, \ldots, K. \tag{3}$$

Here we simply set $\lambda = 0.5$ (a better value of $\lambda$ may well exist, but how to select the best value is beyond the scope of this paper). The optimization of objective function (2) can be solved effectively by a gradient descent method based on non-negative matrix factorization [47]; a detailed description is given in Section 4.
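To make objective function (2) concrete, the following minimal sketch evaluates it for given factors. The function name `c3_objective` and the argument layout (sequences of per-snapshot matrices) are assumptions made for illustration only; the construction of the matrices $P_t$ is not shown and is taken as given.

```python
import numpy as np

def c3_objective(P, A, W, H, lam=0.5):
    """Evaluate Eq. (2): lam * sum_t ||P_t - W H_t^T||_F^2
                         + (1 - lam) * sum_t ||A_t - H_t H_t^T||_F^2.

    P, A, H are sequences of length S holding P_t (N x N), A_t (N x N)
    and H_t (N x K); W is the shared N x K common-cluster matrix.
    (Hypothetical helper name and signature.)
    """
    total = 0.0
    for P_t, A_t, H_t in zip(P, A, H):
        fit_P = np.linalg.norm(P_t - W @ H_t.T, "fro") ** 2    # random-walk term
        fit_A = np.linalg.norm(A_t - H_t @ H_t.T, "fro") ** 2  # topology term
        total += lam * fit_P + (1.0 - lam) * fit_A
    return total
```

With `lam=0` only the topology term remains, and with `lam=1` only the term involving the common clusters remains, matching the role of $\lambda$ described above.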

4. Parameter Learning for the $C^3$ Model

It is difficult to obtain the optimal solution by optimizing objective function (2) directly, because it is non-convex. We therefore return $W$ and $H_t$ at a local minimum of the objective function under the Majorization-Minimization framework [48], in which we iteratively update $H_t$, $t = 1, 2, \cdots, S$, given $W$, and update $W$ given $H_t$, $t = 1, 2, \cdots, S$. The update rules of the parameters are derived as follows.

We first update each $H_t$ while keeping $W$ fixed. A Lagrange multiplier matrix $\Theta_t$ is introduced for each $H_t$, and the corresponding objective function is

$$\begin{aligned} O_{H_t} &= \|P_t - W H_t^T\|_F^2 + \|A_t - H_t H_t^T\|_F^2 + \mathrm{tr}(\Theta_t H_t^T) \\ &= \mathrm{tr}\left[(P_t - W H_t^T)^T (P_t - W H_t^T)\right] + \mathrm{tr}\left[(A_t - H_t H_t^T)^T (A_t - H_t H_t^T)\right] + \mathrm{tr}(\Theta_t H_t^T) \end{aligned} \tag{4}$$

Here we omit the factors $\lambda$ and $(1 - \lambda)$ to save space; with $\lambda = 0.5$ they are equal and only rescale the objective. Setting the derivative of $O_{H_t}$ with respect to $H_t$ to 0 gives

$$2 H_t W^T W - 2 P_t W + 4 H_t H_t^T H_t - 4 A_t H_t + \Theta_t = 0 \tag{5}$$

Together with the complementary slackness Karush-Kuhn-Tucker (KKT) condition $(\Theta_t)_{ik} H_{t,ik} = 0$ for each $H_t$, we get

$$\left(2 P_t W - 2 H_t W^T W + 4 A_t H_t - 4 H_t H_t^T H_t\right)_{ik} H_{t,ik} = 0 \tag{6}$$

Following the updates in [8, 49, 50], we obtain the update rule of $H_t$ [51, 47, 52] as

$$H_{t,ik} \leftarrow H_{t,ik} \left( \frac{(P_t W + 2 A_t H_t)_{ik}}{(H_t W^T W + 2 H_t H_t^T H_t)_{ik}} \right)^{\frac{1}{4}} \tag{7}$$

This update rule of $H_t$ guarantees that the objective function is non-increasing and converges.
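The step from Eq. (4) to Eq. (5) relies on standard matrix-calculus identities. Spelled out term by term, and assuming that $P_t$ and $A_t$ are symmetric (which the $P_t W$ term in the source implicitly requires; for a non-symmetric $P_t$ the term would read $P_t^T W$), a sketch of the calculation is:

```latex
\begin{aligned}
\frac{\partial}{\partial H_t}\,\|P_t - W H_t^{T}\|_F^2
  &= 2 H_t W^{T} W - 2 P_t^{T} W \;=\; 2 H_t W^{T} W - 2 P_t W, \\
\frac{\partial}{\partial H_t}\,\|A_t - H_t H_t^{T}\|_F^2
  &= 4 H_t H_t^{T} H_t - 4 A_t H_t, \\
\frac{\partial}{\partial H_t}\,\mathrm{tr}\!\left(\Theta_t H_t^{T}\right)
  &= \Theta_t .
\end{aligned}
```

Summing the three lines and setting the result to zero reproduces Eq. (5). The multiplicative factor in Eq. (7) is then the ratio of the negative part of the gradient, $2 P_t W + 4 A_t H_t$, to its positive part, $2 H_t W^T W + 4 H_t H_t^T H_t$, raised to the power $1/4$ used in the source.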

We then update $W$ given each $H_t$. The objective function with respect to $W$ can be rewritten as

$$O_W = \sum_{t=1}^{S} \|P_t - W H_t^T\|_F^2 \tag{8}$$

with the constraints $W \geq 0$ and $\sum_{i=1}^{N} W_{ik} = 1$, $k = 1, 2, \ldots, K$.

We also introduce a Lagrange multiplier matrix $\Phi$ for the non-negativity constraints on $W$, and the objective function $O_W$ is written as

$$O_W = \sum_{t=1}^{S} \|P_t - W H_t^T\|_F^2 + \mathrm{tr}(\Phi W^T) \tag{9}$$

The derivative of $O_W$ with respect to $W$ is

$$\frac{\partial O_W}{\partial W_{ik}} = \sum_{t=1}^{S} \left(2 W H_t^T H_t - 2 P_t H_t\right)_{ik} + \Phi_{ik} \tag{10}$$

Then, similarly to [42], we obtain the update of $W$ as

$$W_{ik} \leftarrow W_{ik} \frac{\left(\sum_{t=1}^{S} P_t H_t\right)_{ik}}{\left(\sum_{t=1}^{S} W H_t^T H_t\right)_{ik}} \tag{11}$$

and we enforce the constraint $\sum_{i=1}^{N} W_{ik} = 1$, $k = 1, 2, \ldots, K$, after each iteration by column normalization.

As discussed above, given the initial values, we iteratively update $W$ and each $H_t$ and evaluate the objective function until convergence. The procedure is summarized in Algorithm 1, and its convergence and correctness can be proved following [51, 47, 52].
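The link between Eq. (10) and Eq. (11) can be made explicit with the same KKT argument used for $H_t$. The following is a sketch of that reasoning under the stated non-negativity constraint, not a convergence proof:

```latex
\begin{aligned}
&\text{Stationarity: } \frac{\partial O_W}{\partial W_{ik}} = 0
  \;\Rightarrow\;
  \Phi_{ik} = \sum_{t=1}^{S}\left(2 P_t H_t - 2 W H_t^{T} H_t\right)_{ik}, \\
&\text{Complementary slackness: } \Phi_{ik} W_{ik} = 0
  \;\Rightarrow\;
  \Big(\sum_{t=1}^{S} P_t H_t - \sum_{t=1}^{S} W H_t^{T} H_t\Big)_{ik} W_{ik} = 0 .
\end{aligned}
```

Any fixed point of the multiplicative rule in Eq. (11) satisfies the last equation: either $W_{ik} = 0$, or the ratio in Eq. (11) equals 1, in which case the bracketed difference vanishes.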

Algorithm 1 Algorithm for the Constrained Common Cluster based Model
Input: A temporal or multiplex network $G = \{A_1, A_2, \cdots, A_S\}$; the number of communities $K$; the maximum number of iterations $n_{iter}$
Output: The common structure $W$ and the community membership $H_t$ for each snapshot or slice of the network
1: initialize $W$ with $W_{ik} \geq 0$ and $\sum_{i=1}^{N} W_{ik} = 1$, $k = 1, 2, \ldots, K$
2: for $t = 1 : S$ do
3:     initialize $H_t$ with $H_{t,ik} \geq 0$
4: for $n = 1 : n_{iter}$ do
5:     $W_{ik} \leftarrow W_{ik} \frac{\left(\sum_{t=1}^{S} P_t H_t\right)_{ik}}{\left(\sum_{t=1}^{S} W H_t^T H_t\right)_{ik}}$
6:     $W_{ik} \leftarrow \frac{W_{ik}}{\sum_{i} W_{ik}}$
7:     for $t = 1, 2, \cdots, S$ do
8:         $H_{t,ik} \leftarrow H_{t,ik} \left( \frac{(P_t W + 2 A_t H_t)_{ik}}{(H_t W^T W + 2 H_t H_t^T H_t)_{ik}} \right)^{\frac{1}{4}}$
9:     calculate the objective function, Eq. (2)
return $W$ and $H_t$
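The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `fit_c3`, the random initialization, the small constant `eps` guarding against division by zero, and the fixed iteration count (no convergence test) are all assumptions. The toy usage also substitutes a one-step random-walk transition matrix for $P_t$, purely as a stand-in, because the construction of the Markov steady-state matrices is not given in this part of the text. As in Eq. (4), the weights $\lambda$ and $1 - \lambda$ are omitted.

```python
import numpy as np

def fit_c3(P, A, K, n_iter=200, eps=1e-12, seed=0):
    """Sketch of Algorithm 1. P and A have shape (S, N, N).

    Returns W (N x K), H (S x N x K) and the objective value per iteration.
    (Hypothetical function; eps and the initialization are assumptions.)
    """
    rng = np.random.default_rng(seed)
    S, N, _ = A.shape
    W = rng.random((N, K))
    W /= W.sum(axis=0, keepdims=True)          # step 1: columns of W sum to 1
    H = rng.random((S, N, K))                  # steps 2-3: non-negative H_t
    history = []
    for _ in range(n_iter):                    # step 4
        # Step 5: multiplicative update of W, Eq. (11).
        num = sum(P[t] @ H[t] for t in range(S))
        den = sum(W @ (H[t].T @ H[t]) for t in range(S)) + eps
        W *= num / den
        # Step 6: column normalization.
        W /= np.maximum(W.sum(axis=0, keepdims=True), eps)
        # Steps 7-8: multiplicative update of each H_t, Eq. (7).
        for t in range(S):
            num_h = P[t] @ W + 2.0 * (A[t] @ H[t])
            den_h = H[t] @ (W.T @ W) + 2.0 * (H[t] @ (H[t].T @ H[t])) + eps
            H[t] *= (num_h / den_h) ** 0.25
        # Step 9: objective with the lambda weights omitted, as in Eq. (4).
        obj = sum(np.linalg.norm(P[t] - W @ H[t].T) ** 2
                  + np.linalg.norm(A[t] - H[t] @ H[t].T) ** 2
                  for t in range(S))
        history.append(float(obj))
    return W, H, history

# Toy usage: two 4-node cliques observed in S = 2 identical slices.
block = np.ones((4, 4)) - np.eye(4)
A_one = np.block([[block, np.zeros((4, 4))], [np.zeros((4, 4)), block]])
A = np.stack([A_one, A_one])
# Stand-in for P_t (an assumption, NOT the paper's construction):
# row-normalized adjacency, i.e. a one-step random-walk transition matrix.
P = A / A.sum(axis=2, keepdims=True)
W, H, history = fit_c3(P, A, K=2, n_iter=50)
communities = H.argmax(axis=2)  # hard assignment per snapshot/slice
```

Taking the arg-max over the rows of each $H_t$ is one simple way to read off non-overlapping communities, while the columns of $W$ give the node importance within each common cluster.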

The most time-consuming parts of the algorithm are the updates of $W_{ik}$ and $H_{t,ik}$ for each $t$. The time cost at the $n$-th iteration is $O(S[(N + 1)K + d_i])$ for each $W_{ik}$ and $O((N + 1)K + d_i)$ for each $H_{t,ik}$, where $d_i$ denotes the degree of node $i$ at $t$. So the overall time cost of the algorithm is $O(NKS[(N + 1)K] + MK)$, where $N$, $M$, $K$ and $S$ are the number of nodes, the mean number of edges over all snapshots or slices, the number of communities and the number of snapshots or slices of the network, respectively.

5. Experimental results

In this section, we introduce the algorithms to be compared with the proposed model and the evaluation metrics, and we report experiments on both synthetic and real-world networks, including temporal networks and multiplex networks.

5.1. Comparison Algorithms

Here, we introduce the comparison algorithms used in our experiments:

• StaNMF: the static symmetric non-negative matrix factorization [8], which is the special case of our model when we set $\lambda = 0$ in objective function (2).

• DYNMOGA [26]: a multi-objective approach that is also based on evolutionary clustering and is formalized as a multi-objective optimization problem solved by a genetic algorithm.

• FaceNet [36]: combines symmetric non-negative matrix factorization and evolutionary clustering. We set the penalty coefficient of smoothness to 0.8, which gives the best performance of the method.

• Multislice [45]: optimizes the temporal, multiscale and multiplex modularity with a greedy heuristic method. We set the resolution parameter $\gamma = 1$ and the coupling parameter $\omega = 0.2$ for this method, which are the usual parameter settings in related works.

• Our proposed model ($C^3$) and the common constraint non-negative matrix factorization (ComNMF), which is the special case of the proposed model when we set $\lambda = 1$ in objective function (2).

5.2. Evaluation Metrics

Two widely used evaluation metrics, the Normalized Mutual Information (NMI) [53] and the error [54] for community detection, are used to evaluate the performance of the methods. A larger NMI value means better performance of the algorithm, and a larger error indicates a poorer result of the corresponding method. We apply both metrics to each snapshot of the temporal networks and each slice of the multiplex networks.

The Normalized Mutual Information (NMI) measures how similar the community detection result delivered by an algorithm is to the ground truth, and is defined as

$$NMI(X, Y) = \frac{2 I(X, Y)}{H(X) + H(Y)} \tag{12}$$

where $X$ and $Y$ are the two partitions of the community structure, $I(X, Y) = \sum_{x} \sum_{y} P(x, y) \log \frac{P(x, y)}{P(x) P(y)}$ is the mutual information between $X$ and $Y$, and $H(X)$ and $H(Y)$ denote the entropies of $X$ and $Y$, respectively.
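A minimal sketch of both evaluation metrics for hard (non-overlapping) partitions given as label vectors is shown here: the NMI of Eq. (12) and the error of Eq. (13), which is defined in the next paragraph. The function names `nmi` and `partition_error` are hypothetical, and the error sketch assumes that $Z$ and $C$ are 0/1 community indicator matrices, so that $ZZ^T$ and $CC^T$ are co-membership matrices.

```python
import numpy as np

def nmi(labels_x, labels_y):
    """NMI of Eq. (12) for two hard partitions given as label vectors."""
    _, xi = np.unique(np.asarray(labels_x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(labels_y).ravel(), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)            # contingency counts
    joint /= joint.sum()                       # joint distribution P(x, y)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0                             # skip empty cells (0 log 0 = 0)
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px * np.log(px))              # entropy H(X)
    hy = -np.sum(py * np.log(py))              # entropy H(Y)
    if hx + hy == 0.0:
        return 1.0                             # both partitions are trivial
    return float(2.0 * mi / (hx + hy))

def partition_error(labels_true, labels_pred):
    """Error of Eq. (13), ||Z Z^T - C C^T||_F^2, for indicator matrices Z, C."""
    z = np.asarray(labels_true).ravel()
    c = np.asarray(labels_pred).ravel()
    same_true = (z[:, None] == z[None, :]).astype(float)  # equals Z Z^T
    same_pred = (c[:, None] == c[None, :]).astype(float)  # equals C C^T
    return float(np.sum((same_true - same_pred) ** 2))
```

Both functions are invariant to a relabeling of the communities, so a detected partition does not need to be aligned with the ground-truth labels before scoring.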

The error measures the difference between two partitions and is defined as

$$error = \|Z Z^T - C C^T\|_F^2 \tag{13}$$

where $Z$ and $C$ denote the partition of the ground truth and the partition found by the algorithm, respectively, and $\|V\|_F$ is the Frobenius norm of matrix $V$.

5.3. Experiments on the synthetic networks

In this subsection, we introduce one class of multiplex benchmark and three classes of temporal synthetic networks to present the performance of our proposed model and the baseline methods introduced before.

Multiple Girvan-Newman benchmark

Here, we first generate the Girvan-Newman benchmark (GN), a class of networks with 128 nodes and 4 equal-size communities: each community has 32 nodes and the expected degree of every node is 16. An extremely important parameter, $z_{in}$, denotes the average number of edges connecting each node to its own community and controls the significance of the community structure of the network; a detailed description of this dataset and its generation procedure is given in [53]. Based on each generated GN network, we generate the multiplex network as follows: we randomly select edges of the network $S$ times while keeping all the nodes, with a fraction parameter $r$ controlling the number of selected edges, which yields a multiplex network with $S$ slices. The ground truth of every slice of the network is the same, and we ensure that each slice network is connected.
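The construction just described can be sketched as follows. The sketch makes two assumptions that the text does not spell out: the GN network is realized as a planted-partition graph with intra-community edge probability $z_{in}/31$ and inter-community probability $(16 - z_{in})/96$, which gives the stated expected degrees, and the connectivity check on each slice is omitted. The function names `gn_network` and `multiplex_slices` are hypothetical.

```python
import numpy as np

def gn_network(z_in, n_comm=4, size=32, degree=16, rng=None):
    """Planted-partition realization of a GN benchmark (assumed construction)."""
    rng = np.random.default_rng() if rng is None else rng
    n = n_comm * size
    labels = np.repeat(np.arange(n_comm), size)
    same = labels[:, None] == labels[None, :]
    p_in = z_in / (size - 1)                 # expected z_in internal edges/node
    p_out = (degree - z_in) / (n - size)     # remaining degree goes outside
    prob = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # sample each pair once
    A = (upper | upper.T).astype(int)        # symmetric, no self-loops
    return A, labels

def multiplex_slices(A, S, r, rng=None):
    """Build S slices, each keeping a random fraction r of the edges of A."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.nonzero(np.triu(A, k=1))
    m = len(rows)
    keep = int(round(r * m))
    slices = np.zeros((S,) + A.shape, dtype=int)
    for s in range(S):
        idx = rng.choice(m, size=keep, replace=False)
        slices[s, rows[idx], cols[idx]] = 1
        slices[s, cols[idx], rows[idx]] = 1
    return slices  # note: connectivity of each slice is not checked here

rng = np.random.default_rng(0)
A, labels = gn_network(z_in=8, rng=rng)
slices = multiplex_slices(A, S=3, r=0.7, rng=rng)
```

Because every slice is a subsample of the same GN network, all slices share the ground-truth vector `labels`, in line with the constant ground truth noted above.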

Figure 2 shows the experimental performance in four situations, with the parameters set to $z_{in} = 7, r = 0.7$; $z_{in} = 7, r = 0.8$; $z_{in} = 8, r = 0.7$; and $z_{in} = 8, r = 0.8$, respectively. For each situation, we run all the algorithms 20 times, and Figure 2 shows the average and standard deviation of the results at each slice over these runs. We find that the performance of our proposed model is relatively better than that of the other methods, and all the methods have relatively poor results with $z_{in} = 7$. As for the error, the Multislice method has lower values; however, it automatically selects 13 communities in each slice, which is seriously wrong. We tested all situations with the parameter $z_{in}$ varying from 8 to 15, which is the setting used by most static methods, and our proposed method is almost always better than the baseline algorithms; for conciseness, we only report the case $z_{in} = 8$. We also find that all the methods perform better with $r = 0.8$ than with $r = 0.7$. This is easy to understand, since the latter setting makes each slice network sparser.

Temporal benchmark network 1

This class of temporal synthetic networks was first proposed by Lin et al. in [54] and is the dynamic generalization of the Girvan-Newman benchmark. We generate the temporal networks with 10 snapshots; at each snapshot, there are 128 nodes and 4 communities of equal size, and the average degree of the nodes is 20. A mixing parameter, or noise level, z is used to control the significance of the communities. Here, we set z = 5 and z = 6, respectively, which are the usual settings in previous works. The dynamics of the networks are as follows: at each snapshot t ≥ 1, some nodes in each community leave their own community and join other communities at random. We use the parameter nc to represent the dynamics of each community, which means that there are 4nc nodes switching their communities at each snapshot.

To test the performance of the proposed model, we set the mixing parameter z = 5, 6 and the dynamic level parameter nc = 3, 9, respectively. As shown in Figure 3, we give the results with the different parameter settings on both evaluation metrics. Here, we compare six different methods: StaNMF, FaceNet, DYNMOGA, Multislice, ComNMF and C³. Firstly, the performance of all the methods is better when z = 5 than when z = 6 on both NMI and error, which is intuitive and easy to interpret. Secondly, our proposed C³ has nearly the best performance based on either NMI or error; the exception is the Multislice method, which overfits the network, in other words, it finds a larger number of communities than the ground truth. Thirdly, although the proposed ComNMF method has relatively poor performance, the C³ model works well, from which we know that the common structure is a good hypothesis for analyzing temporal networks. Lastly, over 20 runs of all the algorithms, our proposed method C³ has relatively stable performance on both NMI and error.
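The community dynamics described above (nc nodes leaving each of the 4 communities at every snapshot t ≥ 1) can be illustrated with a minimal sketch. The function name `switch_communities`, the use of NumPy and the uniform choice of the destination community are assumptions made for illustration; this is not the generator used in [54], and the edges of each snapshot would still have to be drawn separately from the resulting labels.

```python
import numpy as np

def switch_communities(labels, nc, k, rng):
    """Move nc randomly chosen nodes out of each community into another one.

    Hypothetical helper: movers are picked from the memberships *before*
    the step, so exactly k * nc nodes change community per snapshot.
    """
    old = np.asarray(labels)
    new = old.copy()
    for c in range(k):
        members = np.flatnonzero(old == c)
        movers = rng.choice(members, size=min(nc, members.size), replace=False)
        others = [d for d in range(k) if d != c]   # any community except c
        new[movers] = rng.choice(others, size=movers.size)
    return new

rng = np.random.default_rng(0)
labels = [np.repeat(np.arange(4), 32)]             # 128 nodes, 4 equal communities
for t in range(1, 10):                             # 10 snapshots in total
    labels.append(switch_communities(labels[-1], nc=3, k=4, rng=rng))
```

With nc = 3, every pair of consecutive label vectors differs in exactly 4nc = 12 positions, which matches the description of the benchmark.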

Temporal benchmark network 2

The second class of temporal benchmark networks was proposed in [55]; it is generated by taking into account some events that characterize the dynamics of the networks. In line with our model's assumption, we select one class of synthetic networks, called switch, and we again generate the temporal networks with 10 snapshots. At each snapshot, the network has 1000 nodes and 40 communities, the average degree of the nodes is 15, and the degree distribution follows a power law. There are two parameters, mu and p, that control the community structure and the temporal dynamics of the networks. The first is called the mixing parameter: the larger the value of mu, the fuzzier the community structure. The second is called the temporal switch parameter: p represents the probability of each node changing its community between time steps.

In our experiments, we set mu = 0.1, 0.2, · · · , 0.8 and p = 0.1, 0.8, respectively. In Figure 4, we show the experimental results on the generated temporal networks with the parameter settings mu = 0.7, p = 0.1; mu = 0.7, p = 0.8; mu = 0.8, p = 0.1; and mu = 0.8, p = 0.8. The performance of all the methods is relatively better when mu < 0.7, so we omit those results. From the results, we can easily see the following. Firstly, the proposed model C³ and FaceNet have better performance than the other methods in general based on NMI, and the former has more competitive results based on error, which shows that the C³ model is more appropriate for temporal network analysis. Secondly, the ComNMF method gives poor results with a larger p, which agrees with our intuition. Lastly, and most importantly, compared with adding a penalty term using the last community structure as in FaceNet, adding the common structure works better; meanwhile, the balance parameter is easy to set.

Temporal benchmark network 3

The dynamic Grow-shrink benchmark proposed in [56] is mainly used for testing the models and algorithms of temporal community detection. The benchmark uses a triangular waveform function to model the dynamics of the network, and each snapshot network is generated with the classic SBM. Here, we generate the temporal networks with 21 snapshots, and at each snapshot we set the number of communities to 8, with each community having 32 or 64 nodes (giving snapshots of 256 or 512 nodes), respectively. Two other parameters, pin and pout, represent the density within communities and between different communities, respectively. In our experiments, we set pin = 0.5 and vary pout from 0.1 to 0.5.
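To make the roles of pin and pout concrete, the following is a minimal sketch of drawing one static snapshot from a classic SBM with 8 communities of 32 nodes. The function name `sbm_snapshot` and the value pout = 0.24 are chosen for illustration only; the triangular waveform that drives the grow-shrink dynamics in [56] is not reproduced here.

```python
import numpy as np

def sbm_snapshot(labels, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a planted-partition SBM.

    Hypothetical helper: node pairs in the same community are linked with
    probability p_in, all other pairs with probability p_out.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, p_in, p_out)
    upper = np.triu(rng.random(prob.shape) < prob, k=1)   # one draw per node pair
    return (upper | upper.T).astype(int)                  # symmetric, no self-loops

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(8), 32)                      # 8 communities, 256 nodes
A = sbm_snapshot(labels, p_in=0.5, p_out=0.24, rng=rng)

same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(labels.size, dtype=bool)
density_in = A[same & off_diag].mean()                    # empirical within-community density
density_out = A[~same].mean()                             # empirical between-community density
```

The two empirical densities should be close to the chosen pin and pout; as pout approaches pin, the communities become harder to distinguish, which is the sense in which the text calls such networks fuzzy.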

As shown in Figure 5, the proposed model C³ has better and more stable performance based on NMI and error at each snapshot, especially for the networks with pout = 0.28, which means that our model is suitable for fuzzy temporal networks. The Multislice method has lower error values because it overfits the networks: in our results, this method finds more than 20 communities at each snapshot, whereas the number of communities at each snapshot is only 8, as we set.

To test the performance of all these methods on the temporal networks, we give the detailed results on the networks with varying parameter pout in Figure 6. We show the results in a simple way, by computing the average value of NMI and error over all snapshots of each temporal network. It is surprising but reasonable that our proposed model C³ has competitive performance based on either NMI or error, especially for larger or fuzzier temporal networks.

5.4. Experiments on the real world networks

An MIT Reality Multiplex network

The multiplex network of MIT Reality Mining [57] used in this paper was collected and analyzed by the MIT Human Dynamics Lab, involving 87 users on campus in October 2008. The network has 3 layers, each of which represents a different social relationship: Calls, SMS, and Proximity, respectively. We construct the network as follows: each user is denoted as a node and there are 3 slices in this network; at each slice t, we set the similarity matrix entry Aij = 1 if nodes i and j have more than 3 interactions, and Aij = 0 otherwise. The ground truth is given by the individual attributes based on survey data. We consider two classes of annotated data: "year-school", which year the user was in, and "floor", which living sector of the dorm building the user's apartment was located in.
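The thresholding rule above (Aij = 1 when nodes i and j interact more than 3 times) can be sketched as follows. The function name `binarize_layers`, the array layout (layers × nodes × nodes) and the toy counts are assumptions for illustration; clearing the diagonal is also an assumption, since the text does not discuss self-interactions.

```python
import numpy as np

def binarize_layers(counts, threshold=3):
    """Turn per-layer interaction counts into 0/1 matrices.

    Hypothetical helper: counts has shape (layers, n, n); an entry becomes 1
    only when the pair interacted strictly more than `threshold` times.
    """
    counts = np.asarray(counts)
    A = (counts > threshold).astype(int)
    idx = np.arange(A.shape[-1])
    A[..., idx, idx] = 0            # assumption: ignore self-interactions
    return A

# Toy interaction counts for 3 users on 2 layers (illustrative values only).
counts = np.array([
    [[0, 4, 1], [4, 0, 3], [1, 3, 0]],
    [[0, 0, 7], [0, 0, 2], [7, 2, 0]],
])
A = binarize_layers(counts)
```

In the toy data, the pair with exactly 3 interactions stays unconnected, because the rule requires more than 3 interactions.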

As represented in Figure 7, we show the results of the methods StaNMF, ComNMF, C³ and Multislice on this multiplex network for these two ground truths, based on NMI and error. It is obvious that our model C³ and Multislice have better performance in general.

A KIT email temporal network

This temporal network is constructed from the email senders, the recipients and their interactions over time¹. Each email sender or recipient is a node and the interactions represent the edges. Senders or recipients belong to the same community if they are guided by the same supervisor. We select the data with time stamps ranging from July 2007 to December 2009; furthermore, we divide these data into temporal networks at four resolutions, which are 2, 3, 4 and 6 months, respectively. So we have 4 temporal networks with 24, 16, 12, and 8 snapshots, respectively. To keep the number of nodes and communities of each snapshot the same, we select only the common nodes and communities occurring in all the snapshots for the 4 temporal networks.

As represented in Figure 8, the ComNMF and C³ methods achieve better performance than the other methods in general.

¹ www.iti.kit.edu/projects/spp1307/

A DBLP example

To exhibit the community evolution of the temporal networks and verify the rationality of the assumption of our proposed model, the common cluster structure, we analyze a DBLP dataset² as our case study. The data contain the co-authorship relations among the papers in 28 conferences over 5 years, which come from three main research areas in computer science: data mining (DM), database (DB) and artificial intelligence (AI). We construct this temporal network as follows: each author is denoted as one node, and if two authors collaborate on one or more papers, then there is an edge between the nodes. We set the number of snapshots of this network to S = 5, with one year each, and we keep the authors who appear in all the snapshots, which gives this temporal network a constant number of nodes. So the numbers of nodes and communities of this network are 632 and 3, respectively.
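The construction described above (one snapshot per year, an edge for every co-authored paper, and only the authors present in every snapshot) can be sketched as follows. The function name `coauthor_snapshots`, the input format of (year, author list) pairs and the toy records are assumptions for illustration and do not reflect the actual DBLP data files.

```python
from itertools import combinations
import numpy as np

def coauthor_snapshots(papers):
    """Build yearly co-authorship matrices over authors active in every year.

    Hypothetical helper: papers is a list of (year, [author, ...]) records.
    """
    papers = list(papers)
    years = sorted({year for year, _ in papers})
    active = {year: set() for year in years}
    for year, authors in papers:
        active[year].update(authors)
    common = sorted(set.intersection(*active.values()))   # authors seen in all years
    index = {name: i for i, name in enumerate(common)}
    A = np.zeros((len(years), len(common), len(common)), dtype=int)
    for year, authors in papers:
        s = years.index(year)
        kept = sorted({a for a in authors if a in index})
        for a, b in combinations(kept, 2):                 # one edge per co-author pair
            A[s, index[a], index[b]] = A[s, index[b], index[a]] = 1
    return years, common, A

# Toy records (illustrative names only).
papers = [
    (2001, ["ann", "bob", "cat"]),
    (2001, ["dan", "ann"]),
    (2002, ["ann", "bob"]),
    (2002, ["cat", "eve"]),
]
years, common, A = coauthor_snapshots(papers)
```

In the toy example, "dan" and "eve" are dropped because each appears in only one year, so both snapshots share the same three nodes.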

Here we give the performance of our proposed method, as shown in Figures 9 and 10. As we have defined, W denotes the common cluster structure of the network, and each Wik denotes the importance of node i in community k; we sort each column of W and select the top 20 authors in each community. As represented in Figure 9, each circle represents one author, with the name of the author inside it, and the navy blue, medium blue and light blue circles represent the top 20 authors in the three communities, respectively. The size of a circle represents the author's importance. It is easy to see that the 3 communities correspond to artificial intelligence, data mining and database, respectively. Author Michael I. Jordan has the largest size in artificial intelligence, and authors Jiawei Han and Philip S. Yu have the top and second importance in data mining. Based on the community structure found by the proposed method, we analyze the evolution of the three communities, which is shown in Figure 10. The left subfigure represents the transition probability among the three communities at two consecutive snapshots, and the right subfigure represents the evolution of the communities, which is plotted with the tool in [58].

² http://dblp.uni-trier.de/db/
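The ranking step described above (sorting each column of W and keeping the most important nodes per community) can be sketched as follows. The function name `top_nodes_per_community` and the small matrix are assumptions for illustration; in the case study W would be the learned 632 × 3 common cluster matrix and `top` would be 20.

```python
import numpy as np

def top_nodes_per_community(W, names, top=20):
    """Return, for each column k of W, the `top` nodes with the largest W[i, k].

    Hypothetical helper: W has shape (nodes, communities) and names[i] labels node i.
    """
    W = np.asarray(W, dtype=float)
    order = np.argsort(-W, axis=0)[:top]       # row indices by descending importance
    return [
        [(names[i], float(W[i, k])) for i in order[:, k]]
        for k in range(W.shape[1])
    ]

# Toy importance matrix for 4 nodes and 2 communities (illustrative values only).
W = [[0.9, 0.1],
     [0.2, 0.7],
     [0.5, 0.6],
     [0.1, 0.0]]
ranking = top_nodes_per_community(W, names=["a", "b", "c", "d"], top=2)
```

A node may appear in the top list of more than one community (node "c" in the toy example), since each column of W is ranked independently.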

6. Conclusion and discussion

In this paper, we have introduced the Constrained Common Cluster based model (C³ model) to detect the community structure in temporal and multiplex networks. The key assumption in the proposed model is that there are common cluster structures in the networks. We have also derived an iterative algorithm to optimize the proposed model, which converges to a stable local optimal solution. We have tested the model on a variety of generated and real world networks, and compared it with some widely used methods, showing its competitive performance in community detection.

In detail, in line with our intuition, the common cluster is a useful constraint for community detection in temporal or multiplex networks. Compared with the widely used methods, such as static nonnegative matrix factorization, Multislice, FaceNet, DYNMOGA and so on, both the ComNMF and the C³ model not only show competitive performance on the computer generated data and real world networks, but also have a principled, generative and theoretical interpretation. Based on the assumption, the proposed methods can identify the importance of each node in the networks, as represented in the DBLP example. Also of note, we can analyze the evolution of the temporal communities with the help of other methods, as in the experiments.

In fact, our proposed method is designed only to detect communities at each snapshot of temporal networks or each slice of multiplex networks based on the common cluster structure, which ignores the temporal order of the snapshots of temporal networks and the differences between the slices of multiplex networks. However, we analyze the evolution of the temporal communities with other methods and validate the multiplex communities against different ground truths. We hope to solve this problem better in the future.

In addition, there are a number of directions in which the proposed model could be extended. Firstly, we assume that there is a fixed number of nodes in each slice or snapshot of the network, which is often unrealistic in real networks; how can this be extended, especially in temporal networks? Secondly, the balance parameters on each slice or snapshot are the same, which is easy to set; how to set these parameters automatically based on the observed networks is an interesting idea for the future. Lastly and most importantly, how should the number of communities be set, and how can the model be extended to deal with networks that have different numbers of communities in different snapshots or slices? A lot of methods for determining the number of communities in static networks have been proposed, and extending these methods to multiplex and temporal networks in a reasonable way will be our next work.

Acknowledgment

This work was supported by the Major Project of the National Social Science Fund (14ZDB153) and the major research plan of the National Natural Science Foundation (91224009, 51438009).

References

[1] M. E. J. Newman, Communities, modules and large-scale structure in networks, Nature Physics 8 (1) (2011) 25–31.

[2] Y.-Y. Ahn, J. P. Bagrow, S. Lehmann, Link communities reveal multiscale complexity in networks, Nature 466 (7307) (2010) 761–764.

[3] M. Girvan, M. E. J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences (12) (2001) 7821–7826. arXiv:PMC122977.

[4] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborova, P. Zhang, Spectral redemption in clustering sparse networks, Proceedings of the National Academy of Sciences 110 (52) (2013) 20935–20940.

[5] M. E. J. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences 103 (23) (2006) 8577–8582.

[6] B. Ball, B. Karrer, M. E. J. Newman, Efficient and principled method for detecting communities in networks, Physical Review E 84 (3) (2011) 036103–13.

[7] B. Karrer, M. E. J. Newman, Stochastic blockmodels and community structure in networks, Phys. Rev. E 83 (2011) 016107. doi:10.1103/PhysRevE.83.016107. URL https://link.aps.org/doi/10.1103/PhysRevE.83.016107

[8] F. Wang, T. Li, X. Wang, S. Zhu, C. Ding, Community discovery using nonnegative matrix factorization, Data Mining and Knowledge Discovery 22 (3) (2010) 493–521.

[9] J. Yang, J. Leskovec, Overlapping community detection at scale: a nonnegative matrix factorization approach, WSDM (2013) 587–596.

[10] I. Psorakis, S. Roberts, M. Ebden, B. Sheldon, Overlapping community detection using Bayesian non-negative matrix factorization, Physical Review E 83 (6) (2011) 066114–9.

[11] S. Fortunato, Community detection in graphs, Physics Reports 486 (3-5) (2010) 75–174.

[12] S. Fortunato, D. Hric, Community detection in networks: A user guide, Physics Reports 659 (2016) 1–44.

[13] F. D. Malliaros, M. Vazirgiannis, Clustering and community detection in directed networks: A survey, Physics Reports (2013) 1–48.

[14] P. Holme, J. Saramäki, Temporal networks, Physics Reports 519 (3) (2012) 97–125.

[15] S. Boccaletti, G. Bianconi, R. Criado, C. I. del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendiña-Nadal, Z. Wang, M. Zanin, The structure and dynamics of multilayer networks, Physics Reports 544 (1) (2014) 1–122.

[16] M. X. Li, Z. Q. Jiang, W. J. Xie, S. Miccichè, M. Tumminello, A comparative analysis of the statistical properties of large mobile phone calling networks, Scientific reports (2014) 5132.

[17] C. Aggarwal, K. Subbian, Evolutionary Network Analysis, ACM Computing Surveys 47 (1) (2014) 1–36.

[18] M. Kivela, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, M. A. Porter, Multilayer networks, Journal of Complex Networks 2 (3) (2014) 203–271.

[19] A. Sapienza, A. Panisson, J. Wu, L. Gauvin, C. Cattuto, Detecting Anomalies in Time-Varying Networks Using Tensor Decomposition, in: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), IEEE, 2015, pp. 516–523.

[20] A. Sapienza, A. Panisson, J. Wu, L. Gauvin, C. Cattuto, Anomaly Detection in Temporal Graph Data - An Iterative Tensor Decomposition and Masking Approach, AALTD@PKDD/ECML.

[21] G. Palla, A.-L. Barabási, T. Vicsek, Quantifying social group evolution, Nature 446 (7136) (2007) 664–667.

[22] V. Nicosia, G. Bianconi, V. Latora, M. Barthelemy, Growing Multiplex Networks, Physical Review Letters 111 (5) (2013) 058701–17.

[23] H. Tong, S. Papadimitriou, J. Sun, P. S. Yu, C. Faloutsos, Colibri: fast mining of large static and dynamic graphs, KDD (2008) 686–694.

[24] C. Tantipathananandh, T. Y. Berger-Wolf, Constant-factor approximation algorithms for identifying dynamic communities, KDD (2009) 827–836.

[25] Y. Chi, X. Song, D. Zhou, K. Hino, B. L. Tseng, Evolutionary spectral clustering by incorporating temporal smoothness, KDD (2007) 153–162.

[26] F. Folino, C. Pizzuti, An Evolutionary Multiobjective Approach for Community Discovery in Dynamic Networks, IEEE Transactions on Knowledge and Data Engineering 26 (8) (2014) 1838–1852.

[27] W. Tang, Z. Lu, I. S. Dhillon, Clustering with Multiple Graphs, in: 2009 Ninth IEEE International Conference on Data Mining (ICDM), IEEE, 2009, pp. 1016–1021.

[28] Z. Zeng, J. Wang, L. Zhou, G. Karypis, Out-of-core coherent closed quasi-clique mining from large dense graph databases, ACM Transactions on Database Systems 32 (2) (2007) 13.

[29] J. Xie, S. Kelley, B. K. Szymanski, Overlapping community detection in networks, ACM Computing Surveys 45 (4) (2013) 1–35.

[30] J. Sun, C. Faloutsos, S. Papadimitriou, P. S. Yu, GraphScope - parameter-free mining of large time-evolving graphs, KDD (2007) 687.

[31] H. Ning, W. Xu, Y. Chi, Y. Gong, T. S. Huang, Incremental spectral clustering by efficiently updating the eigen-system, Pattern Recognition 43 (1) (2010) 113–127.

[32] R. Aktunc, I. H. Toroslu, M. Ozer, H. Davulcu, A Dynamic Modularity Based Community Detection Algorithm for Large-scale Networks, in: the 2015 IEEE/ACM International Conference, ACM Press, New York, New York, USA, 2015, pp. 1177–1183.

[33] T. N. Dinh, N. P. Nguyen, M. T. Thai, An adaptive approximation algorithm for community detection in dynamic scale-free networks, in: INFOCOM, 2011 Proceedings IEEE, IEEE, 2013, pp. 55–59.

[34] D. Chakrabarti, R. Kumar, A. Tomkins, Evolutionary clustering, KDD (2006) 554–560.

[35] H. Sun, J. Huang, X. Zhang, J. Liu, D. Wang, H. Liu, J. Zou, Q. Song, IncOrder: Incremental density-based community detection in dynamic networks, Knowledge-Based Systems 72 (C) (2014) 1–12.

[36] Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, B. L. Tseng, Facetnet: a framework for analyzing communities and their evolutions in dynamic networks, WWW (2008) 685–694.

[37] L. Gauvin, A. Panisson, C. Cattuto, Detecting the community structure and activity patterns of temporal networks: A non-negative tensor factorization approach, PLOS ONE 9 (1) (2014) 1–13. doi:10.1371/journal.pone.0086028. URL https://doi.org/10.1371/journal.pone.0086028

[38] K. Nowicki, T. A. B. Snijders, Estimation and Prediction for Stochastic Blockstructures, Journal of the American Statistical Association 96 (455) (2001) 1077–1087.

[39] T. Yang, Y. Chi, S. Zhu, Y. Gong, R. Jin, Detecting communities and their evolutions in dynamic social networks - a Bayesian approach, Machine Learning 82 (2) (2010) 157–189.

[40] K. S. Xu, A. O. Hero, Dynamic Stochastic Blockmodels for Time-Evolving Social Networks, IEEE Journal of Selected Topics in Signal Processing 8 (4) (2014) 552–562.

[41] C. Matias, V. Miele, Statistical clustering of temporal networks through a dynamic stochastic block model, arXiv.org.

[42] J. Gao, J. Han, J. Liu, C. Wang, Multi-View Clustering via Joint Nonnegative Matrix Factorization, SDM (2013) 252–260.

[43] X. Dong, P. Frossard, P. Vandergheynst, N. Nefedov, Clustering With Multi-Layer Graphs - A Spectral Perspective, IEEE Trans. Signal Processing 60 (11) (2012) 5820–5831.

[44] J. Kim, J.-G. Lee, Community Detection in Multi-Layer Graphs: A Survey, SIGMOD 44 (3) (2015) 37–48.

[45] P. J. Mucha, Community structure in time-dependent, multiscale, and multiplex networks (vol 328, pg 876, 2010), Science, 2010.

[46] D. Cai, X. He, J. Han, T. S. Huang, Graph Regularized Nonnegative Matrix Factorization for Data Representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (8) (2011) 1548–1560.

[47] D. D. Lee, H. S. Seung, Algorithms for Non-negative Matrix Factorization, NIPS (2000) 556–562.

[48] M. A. T. Figueiredo, J. M. Bioucas-Dias, R. D. Nowak, Majorization-minimization algorithms for wavelet-based image restoration, IEEE Transactions on Image Processing 16 (12) (2007) 2980–2991. doi:10.1109/TIP.2007.909318.

[49] D. Jin, Z. Chen, D. He, W. Zhang, Modeling with Node Degree Preservation Can Accurately Find Communities, AAAI.

[50] D. Jin, H. Wang, J. Dang, D. He, W. Zhang, Detect Overlapping Communities via Ranking Node Popularities, AAAI (2016) 172–178.

[51] C. Fevotte, J. Idier, Algorithms for nonnegative matrix factorization with the β-divergence, Neural computation 23 (9) (2011) 2421–2456.

[52] W. Wang, P. Jiao, D. He, D. Jin, L. Pan, B. Gabrys, Autonomous overlapping community detection in temporal networks - A dynamic Bayesian nonnegative matrix factorization approach, Knowl.-Based Syst. 110 (2016) 121–134.

[53] M. E. J. Newman, M. Girvan, Finding and evaluating community structure in networks, Physical Review E 69 (2) (2004) 026113–15.

[54] Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, B. L. Tseng, Analyzing communities and their evolutions in dynamic social networks, ACM Transactions on Knowledge Discovery from Data 3 (2) (2009) 1–31.

[55] D. Greene, D. Doyle, P. Cunningham, Tracking the Evolution of Communities in Dynamic Social Networks, in: 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2010), IEEE, 2010, pp. 176–183.

[56] C. Granell, R. K. Darst, A. Arenas, S. Fortunato, S. Gómez, Benchmark model to assess community structure in evolving networks, Physical Review E 92 (1) (2015) 012805–8.

[57] W. Dong, B. Lepri, A. S. Pentland, Modeling the co-evolution of behaviors and social relationships using mobile phone data, in: the 10th International Conference, ACM Press, New York, New York, USA, 2011, pp. 134–143.

[58] R. Mall, R. Langone, J. A. K. Suykens, Netgram: Visualizing Communities in Evolving Networks, PLoS ONE 10 (9) (2015) e0137502–24.

Biography

Pengfei Jiao received the B.S. degree from Hainan University, Haikou, China, in 2012. He is currently pursuing the Ph.D. degree at the School of Computer Science and Technology, Tianjin University, Tianjin, China. His current research interests include dynamic complex network analysis, data mining, and machine learning.

Wenjun Wang received the Ph.D. degree from Peking University, Beijing, China, in 2004. He is currently a Professor with the School of Computer Science and Technology, Tianjin University, Tianjin, China. His research interests include computational social science, emergency management, large-scale data mining and network science.

Di Jin received the B.S., M.S., and Ph.D. degrees in computer science from Jilin University, Changchun, China, in 2005, 2008, and 2012, respectively. He was a Post-Doctoral Research Fellow at the School of Design, Engineering, and Computing, Bournemouth University, Poole, U.K., from 2013 to 2014. He is an Assistant Professor with the School of Computer Science and Technology, Tianjin University, Tianjin, China. He has published over 30 international journal articles and conference papers. His current research interests include data mining, complex network analysis, and machine learning.

[Figure 2 plots, panels (a)-(d): NMI and error versus t; legend: StaNMF, ComNMF, C³, Multislice.]

Figure 2: The experimental results on the multiplex Girvan-Newman synthetic networks. Each subfigure shows the performance based on NMI and error, respectively. The network size is 128 and the average degree is 16. (a): zin = 7, r = 0.7; (b): zin = 7, r = 0.8; (c): zin = 8, r = 0.7; (d): zin = 8, r = 0.8. The black lines represent the results of our ComNMF method, the red lines represent those of the C³ model proposed in this paper, and the other two lines represent the StaNMF and Multislice methods, respectively. Error bars show the standard deviations estimated on 20 runs for all the methods.

[Figure 3 plots, panels (a)-(d): NMI and error versus t; legend: StaNMF, ComNMF, C³, DYNMOGA, FaceNet, Multislice.]

Figure 3: The experimental results on the temporal Girvan-Newman synthetic networks. Each subfigure shows the performance based on NMI and error, respectively. The network size is 128 and the average degree is 20. (a): z = 5, nc = 3; (b): z = 5, nc = 9; (c): z = 6, nc = 3; (d): z = 6, nc = 9. The red lines represent the results of the C³ model proposed in this paper, the black lines represent the results of the ComNMF method, and the other four lines represent the StaNMF, FaceNet, DYNMOGA and Multislice methods, respectively. Error bars show the standard deviations estimated on 20 runs for all the methods.

[Figure 4 plots, panels (a)-(d): NMI and error versus t; legend: StaNMF, ComNMF, C³, DYNMOGA, FaceNet, Multislice.]

Figure 4: The experimental results on the temporal switch synthetic networks. Each subfigure shows the performance based on NMI and error, respectively. The network size is 1000 and the average degree is 15. (a): mu = 0.7, p = 0.1; (b): mu = 0.7, p = 0.8; (c): mu = 0.8, p = 0.1; (d): mu = 0.8, p = 0.8. The red lines represent the results of the C³ model proposed in this paper, the black lines represent the results of the ComNMF method, and the other four lines represent the StaNMF, FaceNet, DYNMOGA and Multislice methods, respectively. Error bars show the standard deviations estimated on 20 runs for all the methods.

[Figure 5 plots, panels (a)-(d): NMI and error versus t; legend: StaNMF, ComNMF, C³, DYNMOGA, FaceNet, Multislice.]

Figure 5: The experimental results on the temporal Grow-shrink synthetic networks. Each subfigure shows the performance based on NMI and error, respectively. (a): the size of each snapshot is 256, pout = 0.24; (b): the size of each snapshot is 256, pout = 0.28; (c): the size of each snapshot is 512, pout = 0.24; (d): the size of each snapshot is 512, pout = 0.28. The red lines represent the results of the C³ model proposed in this paper, the black lines represent the results of the ComNMF method, and the other four lines represent the StaNMF, FaceNet, DYNMOGA and Multislice methods, respectively. Error bars show the standard deviations estimated on 20 runs for all the methods.

[Figure 6 plots, panels (a) and (b): NMI and error; legend: StaNMF, ComNMF, C³, DYNMOGA, FaceNet, Multislice.]

Figure 6: The experimental results on the temporal Grow-shrink synthetic networks with varying pout. Each subfigure shows the performance based on NMI and error, respectively. (a): the size of each snapshot is 256; (b): the size of each snapshot is 512. The red lines represent the results of the C³ model proposed in this paper, the black lines represent the results of the ComNMF method, and the other four lines represent the StaNMF, FaceNet, DYNMOGA and Multislice methods, respectively. Error bars show the standard deviations estimated on all the snapshots for all the methods.

[Figure 7 plots: bar charts of NMI (left) and error (right) for StaNMF, ComNMF, C³ and Multislice.]

Figure 7: The results on the MIT Reality multiplex network with two classes of ground truth. The left panel shows the performance based on NMI, and the right panel gives the error results. The white and the light purple bars are the performance of the different methods based on "year-school" and "floor", respectively. Error bars show the standard deviations estimated on all the snapshots for all the methods.

[Figure 8 plots, panels (a)-(d): NMI and error versus t; legend: StaNMF, ComNMF, C³, DYNMOGA, FaceNet, Multislice.]

Figure 8: The experimental results on the KIT email temporal networks based on NMI and error. (a): the temporal network with 24 snapshots, where the numbers of nodes and communities are 138 and 23, respectively; (b): the temporal network with 16 snapshots, where the numbers of nodes and communities are 170 and 25, respectively; (c): the temporal network with 12 snapshots, where the numbers of nodes and communities are 195 and 25, respectively; (d): the temporal network with 8 snapshots, where the numbers of nodes and communities are 231 and 27, respectively. The red lines represent the results of the C³ model proposed in this paper, the black lines represent the results of the ComNMF method, and the other four lines represent the StaNMF, FaceNet, DYNMOGA and Multislice methods, respectively. Error bars show the standard deviations estimated on all the snapshots for all the methods.

Figure 9: The importance representation of the top 20 authors in the 3 communities. The navy blue, medium blue and light blue circles represent the communities of AI, DM, and DB.

[Figure 10 plots, panels (a) and (b); labels: snapshots T1-T5, communities C1-C3, transitions W1-W4.]

Figure 10: The evolution of communities in the DBLP data. (a): the transition probability among the three communities at two consecutive snapshots; (b): the evolution of the communities.