
Highlights

• This paper studies the embedding problem (DNE) in dynamic networks.
• We design a deep autoencoder (CDNE) for solving the DNE problem.
• We present a community-aware smoothing technique for DNEs.
• CDNE preserves both the structure and evolution of dynamic networks.
• We validate the superiority of CDNE over several state-of-the-art DNE methods.


Community-aware Dynamic Network Embedding by Using Deep Autoencoder

Lijia Ma^a, Yutao Zhang^a, Jianqiang Li^a, Qiuzhen Lin^a, Qing Bao^b, Shanfeng Wang^c, Maoguo Gong^c,*

^a College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China.
^b School of Computer Science, Hangzhou Electronic Science and Technology University, Hangzhou 310018, China.
^c School of Electronic Engineering, the Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, No. 2 South TaiBai Road, Xi'an 710071, China.
* Corresponding author. Email address: [email protected] (Maoguo Gong).

Abstract

Network embedding, which aims to learn low-dimensional representations of nodes that preserve their essential features, has recently attracted much attention due to its wide applications in graph tasks such as link prediction, network reconstruction, network stabilization and community stabilization. Most existing network embedding methods mainly focus on static networks or on continuous evolution patterns of the microscopic node and link structures, while neglecting the dynamics of the macroscopic community structures. In this paper, we propose a Community-aware Dynamic Network Embedding method (CDNE for short) which considers the dynamics of macroscopic community structures. First, we model the problem of dynamic network embedding as the minimization of an overall loss function, which tries to maximally preserve the global node structures, local link structures and continuous community dynamics. Then, we adopt a stacked deep autoencoder algorithm to solve this minimization problem, obtaining the low-dimensional representations of nodes. Extensive experiments on both synthetic networks and real networks demonstrate the superiority of CDNE over the existing methods on various graph tasks.

Keywords: Network embedding, dynamic networks, community structures, deep autoencoder, low-dimensional node representation

1. Introduction

Networks are widely used to represent various complex systems such as social, biological, ecological and economical systems, in which nodes and links denote the entities and communications of the systems, respectively. However, the rapid development of the Internet and Web 2.0 makes network representation more challenging, as the numbers of entities and communications increase exponentially over time. In recent years, network embedding (NE) has attracted much attention due to its wide applications in large-scale graph tasks such as link prediction, network reconstruction, community detection, community stabilization, node ranking, node classification, graph visualization and item recommendation [38, 46]. NE is a representation technique for learning the low-dimensional representations of entities while preserving their properties [13, 23], which enables a comprehensive understanding of the features and functions of systems.

Previous works mainly focused on static NE (SNE) methods, which embed the high-dimensional structures of static networks into a low-dimensional representation space by using dimension reduction approaches. The main idea behind them is that two nodes with more similar structures in a network should be closer in the embedded representation space. Generally, the high-dimensional structures are represented by microscopic structures such as link proximities [41], cycles [20], subspaces [44], paths [32] and multilayered links [22], and by macroscopic structures such as community structures [6, 24, 41, 47] and subgraphs [8]. The classical dimension reduction approaches include matrix factorization [41], deep autoencoders [40, 43, 48], deep learning [8, 13, 15, 32], singular value decomposition [4], generative adversarial nets [17] and evolutionary algorithms [23]. A systematic review of SNE methods can be found in [15, 39, 45]. Nevertheless, many real-world networks such as collaboration networks, social networks and biological networks are naturally dynamic, with the emergence and disappearance of nodes and edges over time [1, 9, 34, 37].
The above-mentioned SNE methods can hardly preserve the underlying structures of dynamic networks evolving over time, as they neglect the evolutionary patterns of the networks. To preserve the evolutionary patterns of dynamic networks, some dynamic NE (DNE) methods have been proposed under the assumption that the microscopic node and link structures constantly change over time. For instance, Goyal et al. [12, 14] proposed to learn the low-dimensional representations of nodes incrementally, with the representations at the current time step depending on those at previous time steps. They further proposed Dyn2vec and DySAT, which use a self-attention technique and a recurrent technique, respectively, to learn the potential non-linear structures of dynamic networks at multiple time stamps [11].




Table 1: Differences between our work and related works.

| Work | Network | Embedding method | Global & local & smooth loss? | Community preservation? | First- & second-order proximity? | Temporal node evolution? | Smooth community evolution? |
|---|---|---|---|---|---|---|---|
| CDNE | Dynamic | Autoencoder | Yes & Yes & Yes | No | Yes & Yes | Yes | Yes |
| DynGEM [12] | Dynamic | Autoencoder | Yes & Yes & No | No | Yes & No | No | No |
| Dyngraph2vec [11] | Dynamic | Autoencoder | Yes & Yes & No | No | Yes & No | No | No |
| DynamicTriad [49] | Dynamic | Dynamic triad | No & Yes & Yes | No | No & No | Yes | No |
| GraphSAGE [14] | Dynamic | Graph convolutional network | No & Yes & No | No | No & No | No | No |
| CTDNE [31] | Dynamic | Skip-Gram | No & Yes & No | No | Yes & Yes | No | No |
| DHPE [50] | Dynamic | Matrix factorization | Yes & No & Yes | No | Yes & No | No | No |
| COSINE [47] | Static | Probabilistic generative | No & Yes & No | Yes | Yes & No | No | No |
| ComE [5] | Static | Probabilistic generative | No & Yes & No | Yes | Yes & Yes | No | No |
| M-NMF [41] | Static | Matrix factorization | Yes & Yes & No | Yes | Yes & Yes | No | No |
| CDE [24] | Static | Matrix factorization | No & Yes & No | Yes | Yes & No | No | No |
| GNE [6] | Static | Probabilistic generative | No & Yes & No | Yes | Yes & No | No | No |

Zhou et al. [49] and Zhu et al. [50] learned the smooth evolution of triads and link proximities, respectively, while Du et al. [7] learned the smooth evolution of the most influential nodes. GraphSAGE [14] and DHPE [50] imposed extra feature information on dynamic networks, aiming to preserve the dynamics of both network structures and node attributes. However, the smooth dynamics assumption in [7, 11, 12, 14] does not always hold, as the microscopic node and link structures of some systems (like communication systems and online systems) may evolve sharply [31, 49, 50]. Moreover, this assumption neglects the dynamics of the macroscopic community structures in dynamic networks.

A community in a dynamic network is a cluster of nodes with similar functions and dense communications to each other [10, 28, 30]. In many real-world dynamic systems, the microscopic node structures greatly change over time whereas the macroscopic community structures smoothly evolve over continuous time. For instance, users and their communications temporarily appear or disappear over time in mobile sensing or email systems, whereas a population of individuals continuously evolves over time by natural selection rules in ecological systems. Although the effects of smooth node dynamics on DNEs have been considered recently, little attention has been paid to the joint preservation of temporal node dynamics and continuous community dynamics.

In this paper, we present a novel dynamic network embedding method (called CDNE) which considers the dynamics of both the microscopic node structures and the macroscopic community structures. The intuition behind CDNE is that communities generally have a smoother evolution than nodes and links in dynamic systems. In this case, the embedded representations of a community at adjacent time stamps should be close to each other in the low-dimensional space. Moreover, two similar nodes should be close to each other at each time stamp. To preserve the dynamics of communities, we present a community-aware smoothing technique and adopt a temporal regularizer. To model the similarity of nodes, we use a combination of the first-order proximity and the second-order proximity of nodes in dynamic networks. Then, we adopt a stacked deep autoencoder algorithm to learn the low-dimensional representations of nodes. A comparison of our CDNE with the most relevant works is given in Table 1. Our main contributions are summarized as follows.

• We propose a novel embedding method (CDNE) for NEs in dynamic networks, aiming to learn the low-dimensional representations of nodes over time while preserving the global node structures, local link structures and continuous community dynamics. CDNE models the DNE problem in dynamic networks as the minimization of an overall loss function, in which a community-aware smoothing loss function is presented to preserve the continuous community dynamics.

• We adopt a deep autoencoder algorithm for solving the minimization problem in CDNE, which enables us to obtain the low-dimensional representations of nodes.

• We validate the superiority of CDNE over several state-of-the-art DNE methods on two synthetic networks and nine real-world networks, in terms of performance and scalability on the graph mining tasks of network reconstruction, link prediction, network stabilization and community stabilization.

The rest of the paper is organized as follows. Section 2 gives the definition and formulation of the DNE problem in CDNE, while Section 3 introduces our solution to the DNE problem. Experimental results are analyzed in Section 4, and concluding remarks and some possible paths for future work are given in Section 5.

2. Problem Definition and Formulation

This section provides the definition and formulation of DNEs in CDNE.

2.1. Problem Definition

Notations: We use italic lower-case letters, upper-case letters, block upper-case letters and calligraphic letters to represent scalars, vectors, matrices and sets, respectively. For a matrix $A$, we represent the $(i, j)$th entry of $A$ as $A_{ij}$, and denote the $i$th row of $A$ as $A_{i.}$. Bold upper-case letters are used to represent functions. For a set $\mathcal{V}$, the operator $|\mathcal{V}|$ denotes the number of its elements. For a vector $S$, the operator $\|S\|$ is the sum of its elements. For two vectors $S_1$ and $S_2$, the operator $\|S_1 - S_2\|_2^2$ computes their squared Euclidean distance.

In real-world applications, complex systems are composed of sets of entities and communications, in which entities represent the basic functional units while communications denote the functional connections among entities. For instance, in mobile crowdsourcing systems, tasks and mobile sensors are entities while the observations of sensors on tasks are functional links. In transportation systems, underground stations and their routings are the entities and their functional communications, respectively. Such systems can be represented by a network or graph model $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$, in which the nodes $\mathcal{V}$ and edges $\mathcal{E}$ correspond to the entities and their communications, respectively. Let $n = |\mathcal{V}|$ and $m = |\mathcal{E}|$ be the numbers of nodes and edges in $\mathcal{G}$, respectively. Moreover, most real-world systems are dynamic: entities join or leave the systems while their communications appear and disappear over time [9, 34, 37]. The dynamics of systems cause the evolution of their underlying functions. For instance, in mobile crowdsourcing systems, the signals of mobile sensors always change with the positions of individuals, resulting in variations of the aggregation results for tasks. In power systems, new power stations generally replace old power stations due to the increasing power demand, causing the appearance and disappearance of physical links among power stations. Such systems can be well represented by dynamic networks.

Definition 1: Dynamic Network. A dynamic network $\mathcal{G}$ is a series of network snapshots $\{\mathcal{G}^1, \mathcal{G}^2, \ldots, \mathcal{G}^t\}$ whose structures $\mathcal{G}^a = \{\mathcal{V}, \mathcal{E}^a\}$, $a = 1, 2, \ldots, t$, evolve over time, where $t$ is the number of time stamps and $\mathcal{E}^a$ represents the edges of $\mathcal{G}$ at the $a$th time stamp. For each $a \in \{1, 2, \ldots, t\}$, $\mathcal{E}^a$ can be represented by an adjacency matrix $A^a = [A^a_{ij} \in \{0, 1\}]_{n \times n}$, where $A^a_{ij}$ denotes the possible link between nodes $i$ and $j$. Specifically, $A^a_{ij} = 1$ if nodes $i$ and $j$ have a link $e^a_{ij} \in \mathcal{E}^a$ at the $a$th time stamp, and $A^a_{ij} = 0$ otherwise.

The dynamics of nodes and links in a dynamic network $\mathcal{G}$ can be read off the adjacency matrices $A^a$ (see the sketch after this list). Specifically:

• $\|A^{a-1}_{i.}\| = 0$ and $\|A^a_{i.}\| > 0$: node $i$ appears at the $a$th time stamp.
• $\|A^{a-1}_{i.}\| > 0$ and $\|A^a_{i.}\| = 0$: node $i$ leaves at the $a$th time stamp.
• $A^{a-1}_{ij} = 0$ and $A^a_{ij} = 1$: link $e_{ij}$ appears at the $a$th time stamp.
• $A^{a-1}_{ij} = 1$ and $A^a_{ij} = 0$: link $e_{ij}$ disappears at the $a$th time stamp.
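The following minimal NumPy sketch (our own illustration, not part of the paper) reads these node and link dynamics off two consecutive snapshot adjacency matrices:

```python
import numpy as np

def snapshot_dynamics(A_prev: np.ndarray, A_curr: np.ndarray):
    """Node/link dynamics of Definition 1 between time stamps a-1 and a."""
    deg_prev = A_prev.sum(axis=1)          # ||A^{a-1}_{i.}||
    deg_curr = A_curr.sum(axis=1)          # ||A^{a}_{i.}||
    appeared_nodes = np.where((deg_prev == 0) & (deg_curr > 0))[0]
    left_nodes = np.where((deg_prev > 0) & (deg_curr == 0))[0]
    appeared_links = np.argwhere((A_prev == 0) & (A_curr == 1))
    removed_links = np.argwhere((A_prev == 1) & (A_curr == 0))
    return appeared_nodes, left_nodes, appeared_links, removed_links

# Usage on a toy pair of 4-node snapshots.
A1 = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,0],[0,0,0,0]])
A2 = np.array([[0,1,1,0],[1,0,0,0],[1,0,0,1],[0,0,1,0]])
print(snapshot_dynamics(A1, A2))
```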

In addition, many real dynamic systems have the property of communities. A community in a dynamic system is composed of a set of entities with similar functions [10, 28, 30]. For instance, in mobile crowdsourcing systems, a community is composed of a set of mobile sensors that perform similar tasks in the same area. In online game systems, a community is a team of players who often cooperate with each other to complete their common tasks. In dynamic networks, a community is defined as follows.

Definition 2: Community Structure. A community in a dynamic network consists of a set of nodes, such that the nodes in the community are densely linked with each other whereas they are sparsely connected with the other nodes at all time stamps [10, 28, 30].

For a dynamic network $\mathcal{G}$ represented by $A$, there are $n \cdot n \cdot t$ possible features, and each feature represents the possible link between a pair of nodes. The DNE problem is motivated by the limits of classical data mining techniques on revealing useful information from the $n \cdot n \cdot t$ possible features in $A$, which are generally sparse and of little direct use.

Definition 3: Dynamic Network Embedding (DNE). Given a dynamic network $\mathcal{G} = \{\mathcal{G}^1, \mathcal{G}^2, \ldots, \mathcal{G}^t\}$ and a number $d$ of dimensions, DNE aims to find a time series of mappings $\mathbf{F} = \{\mathbf{F}^1, \mathbf{F}^2, \ldots, \mathbf{F}^t\}$, in which each mapping $\mathbf{F}^a: S^a \rightarrow H^a$, $a = 1, 2, \ldots, t$, learns the latent $d$-dimensional representations $H^a$ of nodes at the $a$th time stamp, such that the learned representations $H = \{H^1, H^2, \ldots, H^t\}$ can effectively preserve both the structural information $S = \{S^1, S^2, \ldots, S^t\}$ and the evolutionary pattern in $\mathcal{G}$.


Figure 1: Schematic illustration of the nodes and communities of a toy network with 10 nodes and 2 time stamps, in both the network representation $\mathcal{G}$ and the embedded representation $H$. Each set represents a community, and there are two communities in the toy network. The dotted links denote links that disappear, while the solid links marked in red are links that appear at the next time stamp $a$.

Under the assumption that the node structures of dynamic networks constantly evolve over time [7, 11, 12, 14], some DNE methods have been proposed to learn the low-dimensional representations of nodes while preserving the microscopic node structures. However, this assumption does not always hold in real systems [49, 50]. For instance, in mobile crowdsensing systems, entities (mobile sensors and tasks) and their communications temporarily appear or disappear over time. Moreover, this assumption neglects the evolutionary patterns of macroscopic communities in dynamic networks. Generally, communities have a smoother evolution than nodes and links in dynamic systems. For instance, in ecological systems, a population of individuals continuously evolves over time by natural selection rules, whereas an individual may suddenly disappear due to death caused by disease. Fig. 1 gives a schematic illustration of the evolutionary patterns of nodes and communities in a toy network with 10 nodes and 2 time stamps. As shown in Fig. 1, when the link structures of nodes $i$ and $j$ change at adjacent time stamps, their low-dimensional representations at adjacent time stamps may change greatly, whereas their communities $C_1$ and $C_2$ evolve smoothly over time in both the network representation and the low-dimensional representation. Following this idea, we present a community-aware smoothing technique to preserve the continuous evolution of communities in dynamic networks.

2.2. Problem Formulation

In this part, we begin with a formulation of the preserved structures and evolutionary patterns, followed by a formulation of the optimization problem in CDNE.

2.2.1. Global Structure Preservation

The global structure of a network reflects the similarities of nodes, which are usually evaluated by their link structures. However, this evaluation neglects the similarities of unlinked nodes (see Fig. 2). In recent years, many indexes have been proposed to further evaluate the similarities of nodes, including common neighbors, high-order proximity, the Salton index, the Jaccard index, the preferential attachment index, the Katz index, random walks, the PageRank index, the matrix forest index and the H-index [25]. Here, we use a combination of the first-order proximity $S^1$ and the second-order proximity $S^2$ to represent the global structure of networks. $S^1$ evaluates the direct neighborhoods of nodes while $S^2$ measures the similarity of the common neighbors of nodes. Specifically, for each time stamp $a \in \{1, 2, \ldots, t\}$, $S^{1,a}$ is computed as $S^{1,a} = A^a$ [41], and for each pair $(i, j)$ of nodes, $i, j \in \mathcal{V}$, $S^{2,a}_{ij}$ is computed by using the cosine similarity [41]:

$$S^{2,a}_{ij} = \frac{\|A^a_{i.} \cdot A^a_{j.}\|}{\|A^a_{i.}\| \cdot \|A^a_{j.}\|},$$


Figure 2: Illustration of the common neighbors and similarities of nodes in a toy network with 8 nodes. Nodes j and o are similar as they are linked directly. Nodes i and j are similar as they have four common neighbors plotted in the shaded regions.

where $\|A^a_{i.}\|$ counts the neighbors linked with node $i$, and $\|A^a_{i.} \cdot A^a_{j.}\|$ counts the common neighbors of nodes $i$ and $j$. Generally, $S^{2,a}_{ij}$ lies in the range $[0, 1]$, and two similar nodes $i$ and $j$ have a high $S^{2,a}_{ij}$ value.

Here, we use a combination of the first-order proximity and the second-order proximity to model the global structures $S$ of a dynamic network. Specifically, the global structure $S^a$ at the $a$th time stamp is computed as follows [41]:

$$S^a = S^{1,a} + \lambda S^{2,a}, \tag{1}$$

where $\lambda > 0$ is a parameter which controls the weight of the second-order link proximity in the global structures. To preserve the global structures $S$, we adopt the objective of a stacked deep autoencoder, which minimizes the global reconstruction error $L_g$ between the true global structures $S$ and the structures $\hat{S}$ ($\hat{S} = \mathbf{D}(\mathbf{F}(S))$) decoded by the decoder $\mathbf{D}$ from the low-dimensional representations generated by the encoder $\mathbf{F}$ for the structures $S$. The reconstruction error $L^a_g$ at the $a$th time stamp is computed by the Euclidean distance as follows:

$$L^a_g = \sum_{i=1}^{n} \|\hat{S}^a_{i.} - S^a_{i.}\|_2^2, \tag{2}$$

where $\|\hat{S}^a_{i.} - S^a_{i.}\|_2^2 = \sum_{j=1}^{n} (\hat{S}^a_{ij} - S^a_{ij})^2$ computes the squared Euclidean distance (loss) between the decoded structures $\hat{S}^a_{i.}$ and the original ones $S^a_{i.}$ of node $i$ at the $a$th time stamp. A minimal sketch of this global-structure computation is given below.
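The following minimal NumPy sketch (our own illustration, not the authors' code) computes the global structure $S^a$ of Eq. (1) from a snapshot adjacency matrix, using the paper's definition of $\|\cdot\|$ as the sum of a vector's elements:

```python
import numpy as np

def global_structure(A: np.ndarray, lam: float = 2.0) -> np.ndarray:
    """S^a = S^{1,a} + lambda * S^{2,a} of Eq. (1)."""
    S1 = A.astype(float)                      # first-order proximity S^{1,a}
    common = A @ A.T                          # common-neighbor counts ||A_i. . A_j.||
    deg = A.sum(axis=1)                       # ||A_i.||
    denom = np.outer(deg, deg)
    S2 = np.divide(common, denom,             # proximity of Section 2.2.1
                   out=np.zeros_like(S1), where=denom > 0)
    return S1 + lam * S2

A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]])
print(global_structure(A, lam=2.0))
```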

2.2.2. Local Structure Preservation

The local structure of a node is given by its neighbors. From the view of social homophily, the neighbors of an individual $i$ are a set of individuals that are highly connected with $i$. For a dynamic network $\mathcal{G}$, the neighborhood of a node $i$ at the $a$th time stamp is composed of the set of nodes that are directly linked with $i$ in $\mathcal{G}^a$. To preserve the local structure of dynamic networks at the $a$th time stamp, we construct a local loss function $L^a_l$ which evaluates the distances of all linked nodes in the embedded space. Specifically, $L^a_l$ is computed as follows:

$$L^a_l = \sum_{e_{ij} \in \mathcal{E}^a} \|H^a_{i.} - H^a_{j.}\|_2^2, \tag{3}$$

where $H^a_{i.}$ is the low-dimensional representation of node $i$ at the $a$th time stamp in the embedded space, and $\|H^a_{i.} - H^a_{j.}\|_2^2$ evaluates the distance between nodes $i$ and $j$ in the embedded space. Based on social homophily, the local structures of $\mathcal{G}^a$ can be effectively preserved by minimizing $L^a_l$ during the embedding process.

2.2.3. Evolutionary Community Preservation

A community in a dynamic network $\mathcal{G}$ is composed of a set of nodes which are densely linked with each other and have similar structural functions at most time stamps. Let $\mathcal{C} = \{C_1, C_2, \ldots, C_q\}$ be the community structures of $\mathcal{G}$, where $q$ is the number of communities and $C_k$ represents the $k$th community. In the embedded space, each community $C_k$ has its low-dimensional representation $H_{C_k}$. In many real systems, the microscopic node structures temporally change over time whereas the macroscopic community structures smoothly evolve over time. In other words, although a node $i$ may sharply change its link structures at adjacent time stamps, its community membership evolves smoothly. Motivated by this intuition, we present a community-aware smoothing technique with an evolution loss function $L_c$, which minimizes the Euclidean distance between nodes and their communities in the low-dimensional representations at adjacent time stamps. The community evolution loss function $L^a_c$ at the $a$th time stamp is computed as follows:

$$L^a_c = \begin{cases} \sum_{k=1}^{q} \sum_{i \in C_k} \|H^a_{i.} - H^{a-1}_{C_k}\|_2^2 & \text{if } a > 1, \\ 0 & \text{otherwise,} \end{cases} \tag{4}$$


Figure 3: CDNE (Community-aware Dynamic Network Embedding) at the $a$th time stamp.

where $H^a_{C_k}$ is the low-dimensional representation of community $C_k$ at the $a$th time stamp, evaluated as the average of the low-dimensional representations of all nodes in $C_k$ at the $a$th time stamp:

$$H^a_{C_k} = \frac{\sum_{i \in C_k} H^a_{i.}}{|C_k|}. \tag{5}$$

The minimization of $L^a_c$ helps maintain the community stability of a dynamic network during the embedding process. Note that, in real systems, a large-scale community may consist of hundreds of nodes distributed into several small-scale communities. Nodes in a small-scale community are more likely to be linked than those in a large-scale community. This means that, in the embedded space, a node is closer to its small-scale community than to its large-scale community, while a small-scale community is more sensitive to the evolution than a large-scale one. To make a tradeoff between sensitivity and stability, we preserve the continuous dynamics of small-scale communities in dynamic networks. Let $\mathcal{C}_k = \{C_k^1, C_k^2, \ldots, C_k^{q_k}\}$ be the set of small-scale communities of a community $C_k$, $k = 1, 2, \ldots, q$, where $q_k$ is the number of small-scale communities in $C_k$. In this case, the community evolution loss function $L^a_c$ of a dynamic network at the $a$th time stamp is further evaluated as follows:

$$L^a_c = \begin{cases} \sum_{k=1}^{q} \sum_{j=1}^{q_k} \sum_{i \in C_k^j} \|H^a_{i.} - H^{a-1}_{C_k^j}\|_2^2 & \text{if } a > 1, \\ 0 & \text{otherwise,} \end{cases} \tag{6}$$

where $H^a_{C_k^j}$ is computed as follows:

$$H^a_{C_k^j} = \frac{\sum_{i \in C_k^j} H^a_{i.}}{|C_k^j|}. \tag{7}$$

Here, $q_k$ is determined by the predefined parameter $w \in [1, n]$, which controls the size of small communities. Note that when $w = 1$, $L^a_c$ reduces to a node-evolution loss function, while when $w = n$, $L^a_c$ becomes a large-scale community-evolution loss function. A proper $w$ is crucial to preserve the evolution of both nodes and communities in dynamic networks.

To evaluate $L^a_c$ in Eq. (6), we need some prior knowledge about the community structures of dynamic networks. However, in real dynamic networks, the community structures are generally unknown a priori, but they can be detected by classical dynamic community detection algorithms [26, 28, 33]. Here, we choose GenLouvin [28] to detect communities in dynamic networks due to its low computational complexity and high performance, which enables it to tackle large-scale networks. GenLouvin combines a multilevel community grouping technique (BGLL [2]) with a community division technique (Kernighan-Lin [29]); its detailed operations are given in [2, 28]. Moreover, classical clustering algorithms can be used to divide a large-scale community into a set of small-scale communities, including hierarchical clustering [10], Kernighan-Lin [29], multi-scale modularity optimization [36] and K-Means [18]. Among these algorithms, K-Means produces tighter clusters than the others, with each node in a cluster close to its cluster center. This property matches our intuition that nodes in a small community are closer to each other than nodes in a large community, and K-Means can effectively control the number and size of small communities. Motivated by these properties, we choose K-Means to find small-scale communities, and set the size of a small-scale community to a predefined value $w$. A minimal sketch of this division step is given below.
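The following scikit-learn sketch (a hedged illustration) divides an already-detected community into small-scale communities of target size $w$ with K-Means. The choice of node features to cluster on (here, hypothetically, the current node representations) is our own assumption, as is the helper name `split_community`:

```python
import numpy as np
from sklearn.cluster import KMeans

def split_community(community: np.ndarray, features: np.ndarray, w: int = 5):
    """Split one community (array of node ids) into ~|C_k|/w small clusters."""
    n_small = max(1, int(np.ceil(len(community) / w)))   # q_k small communities
    km = KMeans(n_clusters=n_small, n_init=10).fit(features[community])
    return [community[km.labels_ == c] for c in range(n_small)]

# Usage: a 12-node community split into small-scale communities of size ~5.
rng = np.random.default_rng(0)
community = np.arange(12)
feats = rng.normal(size=(12, 4))        # hypothetical node feature vectors
print([c.tolist() for c in split_community(community, feats, w=5)])
```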


2.2.4. Objective

Given a dynamic network $\mathcal{G}$ with $t$ time stamps and a predefined dimension number $d$, the objective of our DNE problem is to find optimal encoders $\mathbf{F} = \{\mathbf{F}^1, \mathbf{F}^2, \ldots, \mathbf{F}^t\}$ and decoders $\mathbf{D} = \{\mathbf{D}^1, \mathbf{D}^2, \ldots, \mathbf{D}^t\}$, in which an encoder $\mathbf{F}^a$, $a = 1, 2, \ldots, t$, learns the latent $d$-dimensional representations $H^a$ of the nodes of $\mathcal{G}$ at the $a$th time stamp, while a decoder $\mathbf{D}^a$ maps the low-dimensional representations $H^a$ back to the high-dimensional network representation $\hat{S}^a$, so as to minimize the global reconstruction loss function in Eq. (2), the local structure loss function in Eq. (3), and the community evolution loss function in Eq. (6).

For an optimization problem with three objectives, a weighted-sum approach is usually chosen to construct a convex combination of the objectives, adopting weight parameters to control the contribution of each objective. Here, similar to most existing NE methods, we choose the global loss function $L^a_g$ as the reference objective and set its weight to 1, and adopt two parameters $\alpha \geq 0$ and $\beta \geq 0$ to control the fractions of the preserved local structures and evolutionary patterns relative to the preserved global structures, respectively. Generally, the performance of graph tasks is highly related to the preservation of the global node structures in networks. Formally, the constructed overall loss function is as follows:

$$L^a = L^a_g + \alpha \cdot L^a_l + \beta \cdot L^a_c. \tag{8}$$
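As a concrete reading of Eqs. (2), (3), (6) and (8), the following minimal NumPy sketch (our own hedged illustration, not the authors' code) evaluates the overall loss for one snapshot. Note that for a symmetric adjacency matrix each undirected edge is counted twice in the local term:

```python
import numpy as np

def overall_loss(S, S_hat, A, H, H_prev, small_comms, alpha=1.0, beta=1.0):
    L_g = np.sum((S_hat - S) ** 2)                      # global loss, Eq. (2)
    i, j = np.nonzero(A)                                # ordered node pairs of E^a
    L_l = np.sum((H[i] - H[j]) ** 2)                    # local loss, Eq. (3)
    L_c = 0.0
    if H_prev is not None:                              # community loss, Eq. (6)
        for comm in small_comms:
            center_prev = H_prev[comm].mean(axis=0)     # H^{a-1}_{C_k^j}, Eq. (7)
            L_c += np.sum((H[comm] - center_prev) ** 2)
    return L_g + alpha * L_l + beta * L_c               # overall loss, Eq. (8)
```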

Notice that the objectives $L^a_g$, $L^a_l$ and $L^a_c$ in Eq. (8) are normalized items, and the regularization objective is omitted as it is tackled by normalization techniques during training. Our simulation and experimental results in Section 4 suggest that the setting of $\alpha$ affects the performance of our CDNE on the prediction tasks (network reconstruction and link prediction), while the setting of $\beta$ has significant impacts on the stabilization tasks (network stabilization and community stabilization) of CDNE. Fig. 3 gives the overall framework of our CDNE method.

3. Architecture of CDNE with Stacked Deep Autoencoder

To minimize the overall loss function in Eq. (8), CDNE tries to find optimal encoder functions $\mathbf{F} = \{\mathbf{F}^1, \mathbf{F}^2, \ldots, \mathbf{F}^t\}$ and decoder functions $\mathbf{D} = \{\mathbf{D}^1, \mathbf{D}^2, \ldots, \mathbf{D}^t\}$. Here, we introduce a stacked deep autoencoder algorithm to learn the nonlinear embedding functions $\mathbf{F}$ and $\mathbf{D}$ due to its superiority in analyzing nonlinear properties of data. Given a dynamic network $\mathcal{G}$ represented as $\{A^1, A^2, \ldots, A^t\}$, the architecture of CDNE with a stacked deep autoencoder at the $a$th time stamp is shown in Fig. 3. This stacked autoencoder mainly consists of an encoder $\mathbf{F}$ and a decoder $\mathbf{D}$. The encoder maps the input data $S^a$ at time $a$ to the low-dimensional representations $H^a = [H^a_{ij}]_{n \times d} \in \mathbb{R}^{n \times d}$, while the decoder maps the embedded representations $H^a$ to the high-dimensional network representations $\hat{S}^a = [\hat{S}^a_{ij}]_{n \times n} \in \mathbb{R}^{n \times n}$. Both $\mathbf{F}$ and $\mathbf{D}$ are trained by minimizing the overall loss function in Eq. (8).

Encoder. For each node $i$ at the $a$th time stamp, its low-dimensional representation $H^{a,o}_{i.}$ at layer $o$ of the stacked encoder is computed as follows:

$$H^{a,o}_{i.} = \mathbf{F}^{a,o}(H^{a,o-1}_{i.}) = S(W^{a,o} \cdot H^{a,o-1}_{i.} + b^{a,o}), \tag{9}$$

where $W^{a,o}$ and $b^{a,o}$ are the weight and bias parameters of the encoder at layer $o$, respectively, and $S(\cdot)$ is a sigmoid function which nonlinearly maps an input value $x$ to $1/(1 + e^{-x})$.

Decoder. For each node $i$ at the $a$th time stamp, its high-dimensional network representation $\hat{S}^{a,o}_{i.}$ at layer $o$ of the stacked decoder is computed as follows:

$$\hat{S}^{a,o}_{i.} = \mathbf{D}^{a,o}(H^{a,o}_{i.}) = S(\hat{W}^{a,o} \cdot H^{a,o}_{i.} + \hat{b}^{a,o}), \tag{10}$$

where $\hat{W}^{a,o}$ and $\hat{b}^{a,o}$ are the weight and bias parameters of the decoder at layer $o$, respectively.

Different from classical stacked autoencoders, our stacked deep autoencoder aims to learn a $d$-dimensional representation $H$ while preserving the global network structures, the local network structures and the community evolution of dynamic networks by minimizing the overall loss function in Eq. (8). This minimization problem can be solved by classical optimizers (e.g., genetic algorithms [42], ADAM [19], batch gradient descent and stochastic gradient descent (SGD) [3]). Among these optimizers, SGD shows fast convergence, strong stability and high performance, which makes it able to tackle the embedding of large-scale dynamic networks. Here, we adopt SGD as the optimizer to optimize the weight and bias parameters in the stacked deep autoencoder. A minimal sketch of the layer-wise mappings in Eqs. (9) and (10) is given below.

Specifically, SGD updates the parameters $\theta^{a,o} = \{W^{a,o}, \hat{W}^{a,o}, b^{a,o}, \hat{b}^{a,o}\}$ of the $o$th layer of the autoencoder at the $a$th time stamp by moving along the gradient direction with a small learning rate $\delta$. Formally, for each pair $(u, v)$ of nodes in the deep autoencoder, the parameters $\theta^{a,o}$ are updated as follows:

$$W^{a,o}_{u,v} = W^{a,o}_{u,v} - \delta \cdot \frac{\partial L^{a,o}}{\partial W^{a,o}_{u,v}}, \qquad b^{a,o}_{u} = b^{a,o}_{u} - \delta \cdot \frac{\partial L^{a,o}}{\partial b^{a,o}_{u}},$$
$$\hat{W}^{a,o}_{u,v} = \hat{W}^{a,o}_{u,v} - \delta \cdot \frac{\partial L^{a,o}_{g}}{\partial \hat{W}^{a,o}_{u,v}}, \qquad \hat{b}^{a,o}_{u} = \hat{b}^{a,o}_{u} - \delta \cdot \frac{\partial L^{a,o}_{g}}{\partial \hat{b}^{a,o}_{u}}, \tag{11}$$

where $\frac{\partial L^{a,o}}{\partial W^{a,o}_{u,v}} = \frac{\partial}{\partial W^{a,o}_{u,v}} \left( L^{a,o}_{g} + \alpha \cdot L^{a,o}_{l} + \beta \cdot L^{a,o}_{c} \right)$. By defining $Z^{a,o} = W^{a,o} H^{a,o-1} + b^{a,o}$, $H^{a,o} = S(Z^{a,o})$, $\hat{Z}^{a,o} = \hat{W}^{a,o} H^{a,o} + \hat{b}^{a,o}$, $\hat{H}^{a,o} = S(\hat{Z}^{a,o})$ and the index $\zeta \in \{g, l, c\}$, we have

$$\frac{\partial L^{a,o}_{\zeta}}{\partial W^{a,o}_{u,v}} = \sum_{i=1}^{n} \frac{\partial L^{a,o}_{\zeta}}{\partial Z^{a,o}_{v}} \cdot \frac{\partial Z^{a,o}_{v}}{\partial W^{a,o}_{u,v}} = \sum_{i=1}^{n} \frac{\partial L^{a,o}_{\zeta}}{\partial Z^{a,o}_{v}} \cdot (H^{a,o-1}_{i.})^{T},$$

$$\frac{\partial L^{a,o}_{\zeta}}{\partial b^{a,o}_{u}} = \sum_{i=1}^{n} \frac{\partial L^{a,o}_{\zeta}}{\partial Z^{a,o}_{v}} \cdot \frac{\partial Z^{a,o}_{v}}{\partial b^{a,o}_{u}} = \sum_{i=1}^{n} \frac{\partial L^{a,o}_{\zeta}}{\partial Z^{a,o}_{v}},$$

$$\frac{\partial L^{a,o}_{g}}{\partial \hat{W}^{a,o}_{u,v}} = \sum_{i=1}^{n} \frac{\partial L^{a,o}_{g}}{\partial \hat{Z}^{a,o}_{v}} \cdot \frac{\partial \hat{Z}^{a,o}_{v}}{\partial \hat{W}^{a,o}_{u,v}} = \sum_{i=1}^{n} \frac{\partial L^{a,o}_{g}}{\partial \hat{Z}^{a,o}_{v}} \cdot (H^{a,o}_{i.})^{T},$$

$$\frac{\partial L^{a,o}_{g}}{\partial \hat{b}^{a,o}_{u}} = \sum_{i=1}^{n} \frac{\partial L^{a,o}_{g}}{\partial \hat{Z}^{a,o}_{v}} \cdot \frac{\partial \hat{Z}^{a,o}_{v}}{\partial \hat{b}^{a,o}_{u}} = \sum_{i=1}^{n} \frac{\partial L^{a,o}_{g}}{\partial \hat{Z}^{a,o}_{v}}, \tag{12}$$

where $(H^{a,o}_{i.})^{T}$ is the transpose of $H^{a,o}_{i.}$, and the terms $\partial L^{a,o}_{g}/\partial Z^{a,o}_{v}$, $\partial L^{a,o}_{l}/\partial Z^{a,o}_{v}$, $\partial L^{a,o}_{c}/\partial Z^{a,o}_{v}$ and $\partial L^{a,o}_{g}/\partial \hat{Z}^{a,o}_{v}$ are computed as follows:

$$\frac{\partial L^{a,o}_{g}}{\partial Z^{a,o}_{v}} = \Big( \sum_{i=1}^{n} W^{a,o}_{u,v} \cdot \frac{\partial L^{a,o}_{g}}{\partial \hat{Z}^{a,o}_{v}} \Big) \cdot S'(Z^{a,o}_{v}),$$

$$\frac{\partial L^{a,o}_{l}}{\partial Z^{a,o}_{v}} = - \sum_{i,j=1}^{n} A^{a}_{ij} \, (H^{a,o}_{i.} - H^{a,o}_{j.}) \cdot Z^{a,o}_{v},$$

$$\frac{\partial L^{a,o}_{c}}{\partial Z^{a,o}_{v}} = - \sum_{i=1}^{n} \sum_{j=1}^{p} \big( (H^{a,o}_{i} - H^{a-1,o}_{C_{k}^{j}}) \cdot C^{ik}_{j} \big) \cdot S'(Z^{a,o}_{v}),$$

$$\frac{\partial L^{a,o}_{g}}{\partial \hat{Z}^{a,o}_{v}} = - \sum_{i=1}^{n} (H^{a,o-1}_{i.} - \hat{H}^{a,o}_{i.}) \cdot S'(\hat{Z}^{a,o}_{v}), \tag{13}$$

where $S'(x)$ is the derivative of $S(x)$.

The framework of CDNE for DNEs is given in Algorithm 1. Specifically, CDNE works on a dynamic network $\mathcal{G}$ as follows. First, the community structures $\mathcal{C}$ of $\mathcal{G}$ are detected by GenLouvin [28] (line 2). Second, each community of $\mathcal{C}$ is further divided into a set of small-scale communities by using K-Means [18] (line 3). Finally, for each snapshot of $\mathcal{G}$, a stacked autoencoder is used to learn the low-dimensional representations of nodes, and its parameters are trained by SGD (lines 7-10).


Algorithm 1: Framework of CDNE

1: Input: dynamic network $\mathcal{G} = \{\mathcal{G}^1, \mathcal{G}^2, \ldots, \mathcal{G}^t\}$ and number of dimensions $d$.
2: Detect the community structures $\mathcal{C}$ of $\mathcal{G}$ by using GenLouvin [28].
3: Divide each community of $\mathcal{C}$ into a set of small-scale communities by using K-Means [18].
4: Generate initial $\theta^0 = \{W^0, \hat{W}^0, b^0, \hat{b}^0\}$ randomly for all layers of the deep autoencoder.
5: for each snapshot $\mathcal{G}^a$, $a = 1, 2, \ldots, t$ do
6:   Calculate $S^a$ by Eq. (1).
7:   while parameters $\theta^a = \{W^a, \hat{W}^a, b^a, \hat{b}^a\}$ are not converged do
8:     Compute $H^a$ and $\hat{S}^a$ based on Eqs. (9)-(10), respectively.
9:     Update $\theta^a = \{W^a, \hat{W}^a, b^a, \hat{b}^a\}$ based on Eqs. (11)-(13).
10:    end while
11:   Compute $H^a_{C_k^j}$ by Eq. (7) for each small community.
12: end for
13: Output: $H = \{H^1, H^2, \ldots, H^t\}$.

A minimal end-to-end training sketch mirroring lines 5-12 is given below.
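The following PyTorch sketch is our own hedged re-implementation of the per-snapshot training in Algorithm 1 (lines 5-12), not the authors' released code. `S` is the $n \times n$ global structure of Eq. (1), `edges` a pair of index lists for $\mathcal{E}^a$, `small_comms` the K-Means small-scale communities (lists of node ids), and `H_prev` the embedding of the previous snapshot (`None` when $a = 1$):

```python
import torch

class StackedAE(torch.nn.Module):
    def __init__(self, dims):                        # e.g. dims = [150, 64, 32, 16]
        super().__init__()
        self.enc = torch.nn.ModuleList(
            torch.nn.Linear(a, b) for a, b in zip(dims, dims[1:]))
        rev = dims[::-1]
        self.dec = torch.nn.ModuleList(
            torch.nn.Linear(a, b) for a, b in zip(rev, rev[1:]))

    def forward(self, S):
        H = S
        for layer in self.enc:                       # Eq. (9)
            H = torch.sigmoid(layer(H))
        S_hat = H
        for layer in self.dec:                       # Eq. (10)
            S_hat = torch.sigmoid(layer(S_hat))
        return H, S_hat

def train_snapshot(model, S, edges, small_comms, H_prev,
                   alpha=1.0, beta=1.0, lr=0.01, epochs=200):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    i, j = edges
    for _ in range(epochs):
        H, S_hat = model(S)
        L_g = ((S_hat - S) ** 2).sum()               # global loss, Eq. (2)
        L_l = ((H[i] - H[j]) ** 2).sum()             # local loss, Eq. (3)
        L_c = torch.zeros(())                        # community loss, Eq. (6)
        if H_prev is not None:
            for c in small_comms:
                L_c = L_c + ((H[c] - H_prev[c].mean(0)) ** 2).sum()
        loss = L_g + alpha * L_l + beta * L_c        # overall loss, Eq. (8)
        opt.zero_grad(); loss.backward(); opt.step()
    return model(S)[0].detach()                      # H^a for the next snapshot
```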

Table 2: Basic information of all test dynamic networks. $\bar{d}$ and $q_m$ are the average node degree and the multiplex modularity [28] of the networks, respectively. A multiplex network with a higher $q_m$ value has clearer community structures.

| Networks | $n$ | $m$ | $t$ | $q$ | $\bar{d}$ | $q_m$ | Field |
|---|---|---|---|---|---|---|---|
| SYN | 128 | 20,480 | 10 | 5 | 16.00 | 0.5951 | Simulation |
| SBM | 1,000 | 727,839 | 7 | 2 | 104.0 | 0.4617 | Simulation |
| ia-contacts | 113 | 20,488 | 7 | 4 | 25.90 | 0.1927 | Communication |
| football | 120 | 7,020 | 5 | 9 | 11.70 | 0.6357 | Social |
| ia-email | 150 | 42,286 | 8 | 4 | 35.24 | 0.1983 | Email |
| ia-enron | 150 | 12,026 | 6 | 7 | 13.36 | 0.5013 | Communication |
| TCGA | 1,214 | 453,180 | 4 | 18 | 93.32 | 0.3632 | Biological |
| ca-cit-HepTh | 1,500 | 5,654,158 | 15 | 5 | 251.3 | 0.2262 | Social |
| soc-wiki-elec | 4,583 | 718,220 | 10 | 4 | 15.67 | 0.4948 | Social |
| tech-as-topology | 5,023 | 568,174 | 9 | 15 | 12.57 | 0.4856 | Physical |
| ia-stackexch | 10,217 | 927,218 | 7 | 23 | 12.96 | 0.4100 | Social |

4. Experimental Results

In this section, we test CDNE on two simulated networks and nine real-world networks, and use four criteria to comprehensively evaluate its performance. Moreover, we compare CDNE with seven classical NE methods to show its superiority. In the following, the experimental settings are first given, including the test networks, the criteria and the baseline algorithms for comparison; then the experimental comparisons between CDNE and all baseline algorithms are made. Finally, the effects of some parameter settings on the performance of CDNE are analyzed.

4.1. Experimental Settings

4.1.1. Experimental Networks

Two benchmark networks (the SYN network [10] and the SBM network [21]) and nine real-world dynamic networks are chosen as the test networks.

The SYN network is a dynamic version of the GN benchmark networks [10]. It consists of 128 nodes, 1,024 links and 10 time stamps. In SYN, nodes are evenly distributed into 4 communities. The communities smoothly evolve over time whereas the microscopic links among nodes greatly change over time.

The SBM network [21] is extracted from the dynamic stochastic block model, containing 1,000 nodes, 727,839 links and 7 time stamps. In SBM, nodes are randomly distributed into 2 communities, and each community smoothly evolves over time.

The ia-contacts network [35] is a real temporal communication network, in which nodes represent individuals while edges denote physical communications. It includes 113 nodes, 20,488 edges and 7 time stamps.

The football network [16] records the schedule of National Collegiate Athletic Association (NCAA) Football Division 1-A, in which nodes represent the football teams of 116 schools while links denote the games among teams in the regular seasons of 2005-2009. This network has 120 nodes, 7,020 links and 5 time stamps.

The ia-email network [35] describes the temporal email communications between employees of a manufacturing company. It has 150 employees, 42,286 communications and 8 time stamps (see Table 2).

The ia-enron network [35] denotes the communications among 150 Enron employees across the entire time range (1998 to 2002). A link represents the exchange of at least 5 communications between employees.


Table 3: Layer configurations of the deep autoencoder on the test networks.

| Networks | Layer sizes | Model settings ($\alpha/\beta/w/\lambda$) |
|---|---|---|
| SYN | 128-64-32-16 | 0.5/5.0/5.0/2.0 |
| SBM | 1000-512-256-128 | 0.5/5.0/5.0/2.0 |
| ia-contacts | 113-64-32-16 | 5.0/0.1/5.0/2.0 |
| football | 120-64-32-16 | 1.0/1.0/5.0/2.0 |
| ia-email | 150-64-32-16 | 1.0/2.0/5.0/2.0 |
| ia-enron | 150-64-32-16 | 5.0/2.0/5.0/2.0 |
| TCGA | 1214-512-256-128 | 1.0/1.0/5.0/2.0 |
| ca-cit-HepTh | 1500-512-256-128 | 0.5/0.5/5.0/2.0 |
| soc-wiki-elec | 4583-2048-1024-256 | 5.0/1.0/5.0/2.0 |
| tech-as-topology | 5023-2048-1024-256 | 5.0/1.0/5.0/2.0 |
| ia-stackexch | 10217-4096-2048-512 | 5.0/1.0/5.0/2.0 |

The TCGA network [27] illustrates gene expression data on cancer progression, in which nodes represent genes while edges denote gene transformations in the gene expression data. Here, 1,214 genes and 453,180 transformations across 4 time stamps are extracted from the gene expression data as the TCGA network.

The ca-cit-HepTh network [35] is the Arxiv HEP-TH (high energy physics theory) citation graph collected from arXiv, in which nodes represent papers while links denote the citations among papers in the period from January 1993 to April 2003. Here, 1,500 papers, 5,654,158 citations and 15 time stamps of the network are tracked.

The soc-wiki-elec network [35] records the administrator elections and vote history from January 3, 2008. This network is collected from the edit history in a complete dump of Wikipedia pages. Here, 4,583 voters and 718,220 elections across 10 time stamps are tracked.

The tech-as-topology network [35] shows the logical or physical layout of a topology network. Here, 5,023 nodes and 568,174 links across 9 time stamps are extracted from http://networkrepository.com/tech-as-topology.php as the test network.

The ia-stackexch network [35] is a bipartite Stack Overflow favorite network, which records the questions and answers on the websites of the Stack Exchange network. In this network, nodes denote users and posts while edges represent favorite records. The test network contains 10,217 nodes and 927,218 edges across 7 time stamps.

Table 2 gives some basic information about the test networks.

4.1.2. Baseline Algorithms

Two SNE methods (SDNE [40] and M-NMF [41]) and five DNE methods (SAGE [14], DynGEM [12], Dyn2vecAE [11], Dyn2vecRNN [11] and Dyn2vecAERNN [11]) are chosen as baseline algorithms.

SDNE [40]: It adopts a structural deep autoencoder to generate the low-dimensional representations of nodes in static networks, which jointly exploits the first-order proximity and the second-order proximity to preserve both the global and local network structures. Here, SDNE is run on the test networks at each time stamp. The comparison between SDNE and CDNE shows the superiority of CDNE in preserving the dynamics of networks.

M-NMF [41]: It presents a novel modularized nonnegative matrix factorization (NMF) to learn the low-dimensional representation of networks, aiming to preserve both node structures and community structures. It uses a combination of the first-order proximity and second-order proximity to represent node structures, and adopts a classical criterion (modularity [30]) to evaluate the quality of community structures. A comparison of M-NMF with CDNE validates the superiority of CDNE in the preservation of community evolution.

SAGE [14]: It uses a general inductive algorithm and an LSTM optimization architecture to generate the node embeddings of dynamic networks. SAGE learns the low-dimensional representations of new or unseen nodes by aggregating the features of their local neighborhoods. A comparison of CDNE with SAGE shows the superiority of CDNE over a classical DNE method with an LSTM optimization architecture on solving the DNE problem.

DynGEM [12]: It uses a deep autoencoder to solve the DNE problem in dynamic networks, in which a temporal regularizer is adopted to preserve the smooth node evolution. A comparison of CDNE with DynGEM demonstrates the effectiveness of CDNE in preserving the evolution of macroscopic communities in dynamic networks.

Dyn2vecAE, Dyn2vecRNN and Dyn2vecAERNN [11]: They adopt deep architectures with multiple non-linear dense layers and recurrent layers to learn the temporal node evolution in dynamic networks. The dense layers are used to learn the structural patterns of dynamic networks at each time stamp while the recurrent layers are adopted to learn the temporal transitions of the networks. Dyn2vecAE, Dyn2vecRNN and Dyn2vecAERNN differ in the adopted deep architectures. Specifically, Dyn2vecAE adopts a deep autoencoder, whereas Dyn2vecRNN and Dyn2vecAERNN use a recurrent neural network and an autoencoder recurrent neural network, respectively. Comparisons of CDNE with Dyn2vecAE, Dyn2vecRNN and Dyn2vecAERNN demonstrate the superiority of CDNE over classical DNE methods in preserving both the structural information and the evolutionary patterns of dynamic networks.


4.1.3. Parameter Settings

CDNE was tested with the deep autoencoder architectures and parameter settings listed in Table 3. All hyperparameters are set by a grid search on validation sets, including the number of layers, the size of each layer, the learning rate (usually set to 0.01) and the control parameters $\alpha$, $\beta$, $\lambda$ and $w$ in CDNE. To validate the stability of CDNE, the effects of some parameter settings on the performance of CDNE were also analyzed. For each simulation setting, we tested CDNE over 20 independent trials, and compared it with all baseline algorithms under the parameter settings recommended in the corresponding papers [11, 12, 14, 40]. All experiments were run on Windows 7 Professional with a 2-core CPU (i3-4150, 3.5 GHz) and 12.0 GB of RAM.

4.1.4. Criteria

To validate the performance of CDNE, its learned low-dimensional representations of nodes are used to execute the graph tasks of network reconstruction, link prediction, network stabilization and community stabilization.

Network reconstruction: It tests the performance of NE methods on reconstructing the link structures of networks. Here, we use the mean average reconstruction precision ($p_r$) [40] to evaluate the performance of all baseline algorithms on network reconstruction. $p_r$ is the averaged fraction of the real links of each node $i$ that lie among the reconstructed links over time, where the reconstructed links of node $i$ at the $a$th time stamp are those to its nearest $\|A^a_{i.}\|$ neighbors in the embedded space. $p_r$ is calculated as follows:

$$p_r = \frac{\sum_{a=1}^{t} \sum_{i=1}^{n} \left( \sum_{j=1}^{n} X^a_{ij} \cdot A^a_{ij} / \|A^a_{i.}\| \right)}{t}, \tag{14}$$

where $X^a_{ij} \in \{0, 1\}$ denotes whether the edge $e_{ij}$ is among the nearest $\|A^a_{i.}\|$ pairs of nodes for node $i$ at time $a$: $X^a_{ij} = 1$ if $e_{ij}$ is among the nearest $\|A^a_{i.}\|$ neighbors of node $i$, and $X^a_{ij} = 0$ otherwise.

Link prediction: For dynamic networks, it predicts the existence of links at the next time stamp. In DNEs, it determines whether there exists a link $e_{ij}$ at time stamp $a+1$ based on the low-dimensional representations of nodes $i$ and $j$ at the $a$th time stamp. Here, we also use the mean average prediction precision ($p_p$) to evaluate the performance of all baseline algorithms on link prediction. Similar to $p_r$, $p_p$ evaluates the averaged ratio of the real links of every node $i$ at the $(a+1)$th time stamp that are among the nearest $\|A^{a+1}_{i.}\|$ pairs of nodes in the embedded representations at the $a$th time stamp. $p_p$ is calculated as follows. For each time stamp $a$, the real links of the network at this time are first used to train the weights of the deep autoencoder, and the low-dimensional representations of nodes at the $a$th time stamp are then obtained by the trained autoencoder. Next, we calculate the Euclidean distance between any pair of nodes at time $a$. Finally, we compute $p_p$ as

$$p_p = \frac{\sum_{a=1}^{t-1} \sum_{i=1}^{n} \left( \sum_{j=1}^{n} X^{a+1}_{ij} \cdot A^{a+1}_{ij} / \|A^{a+1}_{i.}\| \right)}{t-1}. \tag{15}$$

Here, $X^{a+1}_{ij}$ is evaluated based on the embedded representations of nodes at time $a$. A minimal sketch of these precision criteria is given below.
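The following minimal NumPy sketch (a hedged illustration, not the authors' evaluation code) computes the per-snapshot term underlying Eqs. (14)-(15); averaging it over snapshots gives $p_r$, and feeding it the embedding at time $a$ with the adjacency at time $a+1$ gives $p_p$:

```python
import numpy as np

def precision_at_degree(H: np.ndarray, A: np.ndarray) -> float:
    """Per-snapshot term of Eqs. (14)-(15): for each node i, the fraction of
    its real links found among its ||A_i.|| nearest embedded neighbors,
    summed over nodes (normalization follows the paper's convention)."""
    dist = np.linalg.norm(H[:, None, :] - H[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)          # a node is not its own neighbor
    total = 0.0
    for i in range(len(A)):
        k = int(A[i].sum())                 # ||A^a_{i.}||
        if k > 0:
            nearest = np.argsort(dist[i])[:k]
            total += A[i, nearest].sum() / k
    return total

# p_r: average precision_at_degree(H[a], A[a]) over a = 1..t      (Eq. (14))
# p_p: average precision_at_degree(H[a], A[a+1]) over a = 1..t-1  (Eq. (15))
```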

Network stabilization: It evaluates the performance of DNE methods on the stability of the embedding. Generally, a dynamic network should have similar evolutionary patterns in both the learned low-dimensional representation and the network representation over time. Here, the stabilization constant ($p_s$) [12] is used to evaluate the network stabilization, and it is computed as follows:

$$p_s = \max_{a, a' \in \{1, 2, \ldots, t\}} \|p_s^a - p_s^{a'}\|_2^2, \tag{16}$$

where $p_s^a$ evaluates the evolution ratio of the low-dimensional node representations to the network representations at the $a$th time stamp, computed as

$$p_s^a = \frac{\|H^{a+1} - H^a\|_2^2 / \|H^a\|_2^2}{\|A^{a+1} - A^a\|_2^2 / \|A^a\|_2^2}. \tag{17}$$

Community stabilization: It computes the stability of communities in dynamic networks in the embedded low-dimensional representations. Similar to the network stabilization, the community stabilization $p_c$ is evaluated as follows:

$$p_c = \max_{a, a' \in \{1, 2, \ldots, t\}} \|p_c^a - p_c^{a'}\|_2^2, \tag{18}$$

where $p_c^a$ evaluates the community evolution ratio of the low-dimensional representations to the network representations at the $a$th time stamp:

$$p_c^a = \left( \sum_{k=1}^{q} \frac{\|H^{a+1}_{C_k} - H^{a}_{C_k}\|_2^2 / \|H^{a}_{C_k}\|_2^2}{\|A^{a+1}_{C_k} - A^{a}_{C_k}\|_2^2 / \|A^{a}_{C_k}\|_2^2} \right) / q. \tag{19}$$

Generally, a DNE method with smaller $p_s$ and $p_c$ values has a better performance on the network stabilization and the community stabilization of the embedding, respectively. A minimal sketch of the stabilization constant is given below.
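A minimal NumPy sketch (a hedged illustration, assuming dense arrays for $H^a$ and $A^a$) of the stabilization constant in Eqs. (16)-(17); the community stabilization of Eqs. (18)-(19) is obtained analogously by restricting $H$ and $A$ to each community's rows and averaging the ratios over the $q$ communities:

```python
import numpy as np

def stabilization_constant(H_list, A_list) -> float:
    """Eqs. (16)-(17): the maximum squared gap between the per-step
    evolution ratios of the embedding relative to the network."""
    sq = lambda M: float((np.asarray(M) ** 2).sum())   # ||.||_2^2, paper's notation
    ratios = []
    for a in range(len(H_list) - 1):
        h = sq(H_list[a + 1] - H_list[a]) / sq(H_list[a])
        g = sq(A_list[a + 1] - A_list[a]) / sq(A_list[a])
        ratios.append(h / g)                           # p_s^a, Eq. (17)
    return max((r1 - r2) ** 2 for r1 in ratios for r2 in ratios)  # Eq. (16)
```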

Table 4: Results on the test benchmark networks. $\bar{p}$ and $\tilde{p}$ are the mean and best results on criterion $p$ for each algorithm over 20 independent trials, respectively. The best result for each network is marked in boldface.

| Network | Criterion | SDNE | SAGE | DynGEM | Dyn2vecAE | Dyn2vecRNN | Dyn2vecAERNN | M-NMF | CDNE |
|---|---|---|---|---|---|---|---|---|---|
| SYN | $\bar{p}_r$ | 0.4287 | 0.3087 | 0.4668 | 0.2835 | 0.2964 | 0.2938 | 0.4669 | **0.5392** |
| | $\tilde{p}_r$ | 0.4776 | 0.3317 | 0.4969 | 0.2938 | 0.3034 | 0.3171 | 0.4758 | **0.5577** |
| | $\bar{p}_p$ | 0.4468 | 0.2975 | 0.4439 | 0.2806 | 0.2921 | 0.2857 | 0.4268 | **0.4570** |
| | $\tilde{p}_p$ | 0.4588 | 0.3148 | 0.4600 | 0.3016 | 0.2989 | 0.2988 | 0.4313 | **0.4735** |
| | $\bar{p}_s$ | 0.8064 | 1.232 | 0.7489 | 1.519 | 1.374 | 1.411 | 1.077 | **0.2234** |
| | $\tilde{p}_s$ | 0.7893 | 1.087 | 0.7310 | 1.129 | 1.305 | 1.301 | 0.9893 | **0.2116** |
| | $\bar{p}_c$ | 0.7416 | 2.137 | 0.6958 | 1.406 | 4.342 | 3.304 | 1.325 | **0.2502** |
| | $\tilde{p}_c$ | 0.7134 | 0.6821 | 0.6860 | 0.9102 | 2.590 | 2.257 | 1.122 | **0.2425** |
| SBM | $\bar{p}_r$ | 0.2329 | 0.2299 | 0.2352 | 0.2783 | 0.2696 | **0.2806** | 0.2346 | 0.2749 |
| | $\tilde{p}_r$ | 0.2347 | 0.2399 | 0.2427 | **0.2903** | 0.2852 | 0.2859 | 0.2364 | 0.2810 |
| | $\bar{p}_p$ | 0.2717 | 0.2682 | 0.2744 | 0.2785 | 0.2693 | **0.2805** | 0.2408 | 0.2736 |
| | $\tilde{p}_p$ | 0.2738 | 0.2799 | 0.2831 | **0.2895** | 0.2857 | 0.2892 | 0.2427 | 0.2794 |
| | $\bar{p}_s$ | 1.281 | 1.420 | 0.9745 | 1.684 | 1.454 | 1.407 | 1.083 | **0.7255** |
| | $\tilde{p}_s$ | 1.261 | 1.386 | **0.5161** | 1.380 | 1.414 | 1.480 | 0.9646 | 0.6417 |
| | $\bar{p}_c$ | 1.162 | 1.472 | 0.9742 | 1.552 | 1.664 | 1.607 | 1.597 | **0.9477** |
| | $\tilde{p}_c$ | 1.131 | 0.6863 | **0.5161** | 1.138 | 1.459 | 1.449 | 1.271 | 0.7814 |
| best/all | | 0/16 | 0/16 | 2/16 | 2/16 | 0/16 | 2/16 | 0/16 | 10/16 |


4.2. Experimental Results

4.2.1. Experimental Results on Simulated Benchmark Networks

All algorithms are tested on the two simulated benchmark networks (SYN and SBM), and their statistical values (mean and best) of the criteria ($p_r$, $p_p$, $p_s$ and $p_c$) over 20 independent trials are recorded in Table 4. The best results are highlighted in boldface for each network. The comparison results show that CDNE obtains the best results in 10 out of the 16 statistical values (62.50%), while SDNE, SAGE, DynGEM, Dyn2vecAE, Dyn2vecRNN, Dyn2vecAERNN and M-NMF perform best in 0, 0, 2, 2, 0, 2 and 0 cases, respectively, which validates the superiority of CDNE on the network embeddings for these benchmark networks.

The SYN network has clear community structures and smooth node evolution over time. In this network, the deep autoencoder algorithms (SDNE, DynGEM and CDNE) have a better performance than the other algorithms, which neglect high-order link proximity and smooth node dynamics. Moreover, among the deep autoencoder algorithms, CDNE has the best performance on network reconstruction, link prediction, network stabilization and community stabilization. This is because the community-aware smoothing technique in CDNE can effectively use the community structures and their smooth evolution in the SYN network to guide the node embeddings.

Compared with the SYN network, the SBM network has more link structures and less clear community structures (see Table 2). In this case, the Dyn2vec algorithms (Dyn2vecAE, Dyn2vecRNN and Dyn2vecAERNN), which use recurrent neural networks to capture the temporal changes of microscopic node structures, show a good performance on the tasks of network reconstruction and link prediction. However, they present a low performance on the tasks of network stabilization and community stabilization, as they ignore the smooth dynamics of the macroscopic community structures in networks. On this network, CDNE also has a good performance on all tasks, especially network stabilization and community stabilization. Moreover, except for Dyn2vecAE and Dyn2vecAERNN, CDNE shows the best performance on the tasks of network reconstruction and link prediction.

Table 4 also shows an interesting phenomenon: for the deep neural network based DNE methods (SDNE, DynGEM, Dyn2vecAE, Dyn2vecRNN and Dyn2vecAERNN), an algorithm with higher prediction accuracies ($\bar{p}_r$ and $\bar{p}_p$) generally has larger stabilization values ($\bar{p}_s$ and $\bar{p}_c$). This is reasonable, as the smooth evolution of dynamic networks would promote the preservation of link structures.

To quantitatively show the overall performance, Fig. 4 plots the average performance ranks of all algorithms on the simulated benchmark networks. The results show that CDNE has an average performance rank (2.000) much smaller than that of the other baseline algorithms.


Figure 4: Average performance ranking of the criteria for all baseline algorithms on the simulated benchmark networks.

This further confirms the superiority of CDNE over the other baseline algorithms when comprehensively considering all the criteria on the benchmark networks.

4.2.2. Results on Real-world Networks

To further demonstrate the effectiveness of CDNE, we test all baseline algorithms on nine real-world dynamic networks, and we record their statistical values (mean and best) of the criteria ($p_r$, $p_p$, $p_s$ and $p_c$) over 20 independent trials in Tables 5 and 6. The results illustrate that CDNE obtains the best results in 62 out of the 72 statistical values (86.11%), while SDNE, SAGE, DynGEM, Dyn2vecAE, Dyn2vecRNN, Dyn2vecAERNN and M-NMF perform best in 4, 0, 2, 0, 0, 0 and 8 cases, respectively. These results validate the superiority of CDNE over the other baseline algorithms on the network embeddings for these real-world dynamic networks.

Tables 5 and 6 also illustrate that CDNE has the best performance on the prediction tasks (network reconstruction and link prediction) for most dynamic networks (ia-contacts, football, ia-email, ia-enron, TCGA, soc-wiki-elec and tech-as-topology). For the ca-cit-Hep and ia-stackexch networks, CDNE has a worse performance than SDNE and M-NMF on the prediction tasks. Overall, CDNE has an averaged value (0.4343) over all prediction criteria in Tables 5 and 6 higher than SDNE (0.3473), SAGE (0.2439), DynGEM (0.3199), Dyn2vecAE (0.2523), Dyn2vecRNN (0.2570), Dyn2vecAERNN (0.2725) and M-NMF (0.3516). Moreover, for most real networks, CDNE has the best performance on the stabilization tasks. Specifically, CDNE reduces the averaged value of all stabilization criteria obtained by SDNE, SAGE, DynGEM, Dyn2vecAE, Dyn2vecRNN, Dyn2vecAERNN and M-NMF by 91.45%, 94.25%, 67.22%, 95.48%, 94.82%, 94.88% and 93.54%, respectively. The high performance of CDNE is attributed to its optimization model, which simultaneously considers the embedding of the high-order link proximity structures, the local link structures and the smooth dynamics of the macroscopic community structures.

Fig. 5 provides the average performance ranks over all criteria on the real-world networks. The results show that CDNE obviously outperforms the other baseline algorithms when we consider the comprehensive performance of tasks on the test real-world networks. Specifically, the average performance rank (1.153) of CDNE is much smaller than that of SDNE (3.097), SAGE (5.874), DynGEM (3.375), Dyn2vecAE (6.819), Dyn2vecRNN (6.069), Dyn2vecAERNN (5.402) and M-NMF (4.181). Without the smoothing techniques and the preservation of high-order link proximity, the DNE methods (SAGE, Dyn2vecAE, Dyn2vecRNN and Dyn2vecAERNN) present a low performance when the test networks have community structures.

In addition, from Tables 5 and 6, we can obtain the following interesting observations.

1. All DNE methods (SAGE, DynGEM, Dyn2vecAE, Dyn2vecRNN, Dyn2vecAERNN and CDNE) have a worse performance than the SNE methods (SDNE and M-NMF) on the prediction tasks for the ca-cit-Hep and ia-stackexch networks. For these networks, there may be some tradeoffs between the prediction tasks and stabilization tasks in DNEs.


Table 5: Results on the small-scale real-world networks with fewer than 2,000 nodes. $\bar{p}$ and $\tilde{p}$ are the mean and best results on criterion $p$ for each algorithm over 20 independent trials, respectively. The best result for each network is marked in boldface.

| Network | Criterion | SDNE | SAGE | DynGEM | Dyn2vecAE | Dyn2vecRNN | Dyn2vecAERNN | M-NMF | CDNE |
|---|---|---|---|---|---|---|---|---|---|
| ia-contacts | $\bar{p}_r$ | 0.3485 | 0.3162 | 0.2989 | 0.3412 | 0.3444 | 0.3443 | 0.3468 | **0.3541** |
| | $\tilde{p}_r$ | 0.3556 | 0.3273 | 0.3216 | 0.3521 | 0.3548 | 0.3539 | 0.3578 | **0.3698** |
| | $\bar{p}_p$ | 0.3799 | 0.3452 | 0.3290 | 0.3705 | 0.3706 | 0.3684 | 0.3617 | **0.3887** |
| | $\tilde{p}_p$ | 0.3889 | 0.3640 | 0.3476 | 0.3802 | 0.3833 | 0.3797 | 0.3760 | **0.4071** |
| | $\bar{p}_s$ | 2.945 | 4.015 | **0.2146** | 4.515 | 4.162 | 4.076 | 4.023 | 0.3759 |
| | $\tilde{p}_s$ | 2.737 | 3.650 | **0.2057** | 3.103 | 3.664 | 3.533 | 3.892 | 0.3240 |
| | $\bar{p}_c$ | 4.574 | 6.968 | 0.2847 | 6.252 | 8.001 | 7.900 | 6.402 | **0.2663** |
| | $\tilde{p}_c$ | 4.209 | 2.388 | 0.2670 | 3.536 | 4.741 | 4.672 | 6.254 | **0.2004** |
| football | $\bar{p}_r$ | 0.8077 | 0.2796 | 0.8239 | 0.4642 | 0.3229 | 0.4651 | 0.8848 | **0.8902** |
| | $\tilde{p}_r$ | 0.8222 | 0.3117 | 0.8360 | 0.5055 | 0.4289 | 0.5015 | 0.8892 | **0.9023** |
| | $\bar{p}_p$ | 0.8132 | 0.3516 | 0.8340 | 0.4606 | 0.3217 | 0.4628 | 0.8864 | **0.8866** |
| | $\tilde{p}_p$ | 0.8264 | 0.3852 | 0.8487 | 0.4964 | 0.4308 | 0.5003 | 0.8916 | **0.8985** |
| | $\bar{p}_s$ | 1.160 | 1.835 | 0.7813 | 1.803 | 2.061 | 2.130 | 1.684 | **0.1355** |
| | $\tilde{p}_s$ | 1.145 | 1.511 | 0.7566 | 1.439 | 1.845 | 1.654 | 1.540 | **0.1094** |
| | $\bar{p}_c$ | 1.690 | 3.673 | 0.7377 | 2.583 | 4.973 | 3.951 | 3.006 | **0.1866** |
| | $\tilde{p}_c$ | 1.664 | 1.378 | 0.7231 | 1.621 | 3.176 | 3.064 | 2.870 | **0.1233** |
| ia-email | $\bar{p}_r$ | 0.4009 | 0.3690 | 0.3741 | 0.3664 | 0.3697 | 0.3722 | 0.3728 | **0.4109** |
| | $\tilde{p}_r$ | 0.4101 | 0.3767 | 0.3824 | 0.3774 | 0.3774 | 0.3787 | 0.3796 | **0.4332** |
| | $\bar{p}_p$ | 0.4143 | 0.3816 | 0.3896 | 0.3779 | 0.3808 | 0.3838 | 0.3738 | **0.4252** |
| | $\tilde{p}_p$ | 0.4200 | 0.3929 | 0.3987 | 0.3942 | 0.3892 | 0.3913 | 0.3822 | **0.4468** |
| | $\bar{p}_s$ | 6.199 | 8.095 | 0.6269 | 13.68 | 8.090 | 8.386 | 6.932 | **0.0700** |
| | $\tilde{p}_s$ | 6.072 | 7.865 | 0.2747 | 8.299 | 6.843 | 6.824 | 6.234 | **0.0011** |
| | $\bar{p}_c$ | 15.77 | 23.06 | 1.166 | 34.94 | 25.71 | 27.23 | 16.03 | **0.0112** |
| | $\tilde{p}_c$ | 15.46 | 13.65 | 0.4802 | 13.25 | 16.54 | 17.20 | 14.56 | **0.0011** |
| ia-enron | $\bar{p}_r$ | 0.2004 | 0.2449 | 0.1462 | 0.1426 | 0.1584 | 0.1489 | 0.3161 | **0.3992** |
| | $\tilde{p}_r$ | 0.2213 | 0.2578 | 0.1627 | 0.1585 | 0.1662 | 0.1624 | 0.3205 | **0.4200** |
| | $\bar{p}_p$ | 0.2217 | 0.2547 | 0.1674 | 0.1639 | 0.1835 | 0.1731 | 0.2944 | **0.4068** |
| | $\tilde{p}_p$ | 0.2440 | 0.2720 | 0.1891 | 0.1831 | 0.1918 | 0.1876 | 0.3067 | **0.4250** |
| | $\bar{p}_s$ | 0.3181 | 0.4322 | 1.583 | 5.051 | 4.455 | 4.408 | 3.763 | **0.2477** |
| | $\tilde{p}_s$ | 0.2766 | 0.3810 | 0.8614 | 3.508 | 2.838 | 3.246 | 1.888 | **0.2412** |
| | $\bar{p}_c$ | 0.1586 | 0.3361 | 2.835 | 8.541 | 8.463 | 8.824 | 4.512 | **0.1229** |
| | $\tilde{p}_c$ | 0.1302 | 0.2793 | 1.504 | 4.955 | 2.244 | 4.135 | 2.150 | **0.1179** |
| TCGA | $\bar{p}_r$ | 0.4177 | 0.3294 | 0.4225 | 0.3192 | 0.4104 | 0.4257 | 0.4704 | **0.5132** |
| | $\tilde{p}_r$ | 0.4293 | 0.3401 | 0.4337 | 0.3275 | 0.4370 | 0.4365 | 0.4829 | **0.5220** |
| | $\bar{p}_p$ | 0.3106 | 0.2374 | 0.3075 | 0.2641 | 0.2956 | 0.3209 | 0.3584 | **0.3730** |
| | $\tilde{p}_p$ | 0.3216 | 0.2406 | 0.3295 | 0.2803 | 0.3288 | 0.3363 | 0.3681 | **0.3763** |
| | $\bar{p}_s$ | 1.014 | 1.539 | 0.8089 | 1.530 | 1.455 | 1.347 | 1.437 | **0.3782** |
| | $\tilde{p}_s$ | 0.9944 | 1.092 | 0.5151 | 1.248 | 1.331 | 0.8162 | 1.128 | **0.3431** |
| | $\bar{p}_c$ | 0.4357 | 0.8892 | 0.2705 | 1.118 | 0.8082 | 0.6490 | 0.6525 | **0.1189** |
| | $\tilde{p}_c$ | 0.4248 | 0.4207 | 0.2600 | 0.9206 | 0.6473 | 0.5453 | 0.4905 | **0.1070** |
| ca-cit-Hep | $\bar{p}_r$ | **0.5168** | 0.3169 | 0.4038 | 0.2952 | 0.3592 | 0.3543 | 0.4178 | 0.4404 |
| | $\tilde{p}_r$ | **0.5249** | 0.3188 | 0.4121 | 0.3754 | 0.3649 | 0.3565 | 0.4204 | 0.4490 |
| | $\bar{p}_p$ | **0.5964** | 0.3696 | 0.4666 | 0.3115 | 0.3794 | 0.3743 | 0.4230 | 0.4652 |
| | $\tilde{p}_p$ | **0.6057** | 0.3720 | 0.4712 | 0.3974 | 0.3857 | 0.3770 | 0.4281 | 0.4745 |
| | $\bar{p}_s$ | 16.46 | 18.89 | 4.641 | 28.46 | 19.37 | 19.39 | 24.93 | **1.773** |
| | $\tilde{p}_s$ | 15.94 | 18.85 | 2.689 | 19.40 | 18.60 | 18.81 | 22.66 | **1.530** |
| | $\bar{p}_c$ | 22.51 | 52.60 | 9.538 | 68.18 | 49.10 | 47.06 | 27.34 | **3.070** |
| | $\tilde{p}_c$ | 21.36 | 40.47 | 5.490 | 35.12 | 39.08 | 41.27 | 25.87 | **2.593** |
| best/all | | 4/48 | 0/48 | 2/48 | 0/48 | 0/48 | 0/48 | 4/48 | 42/48 |

2. Except for CDNE, all algorithms obtain higher p̄c values than p̄s values in most cases and perform poorly on the stabilization tasks (network stabilization and community stabilization). This is reasonable, as they only consider the smooth evolution of microscopic node structures while neglecting the dynamics of macroscopic community structures in dynamic networks.

3. SDNE always performs well on the prediction tasks (network reconstruction and link prediction) whereas it performs poorly on the stabilization tasks (network stabilization and community stabilization). This is reasonable, as SDNE adopts the high-order link proximity of nodes to model the network structures while neglecting the evolution of dynamic networks.

Table 6: Results on the large-scale real-world networks with more than 4,500 nodes. p̄ and p̃ are the mean and best results on the criterion p for each algorithm over 20 independent trials, respectively. The bottom row (best/all) counts, for each algorithm, the number of statistics on which it achieves the best result.

Network | Criterion | SDNE | SAGE | DynGEM | Dyn2vecAE | Dyn2vecRNN | Dyn2vecAERNN | M-NMF | CDNE
soc-wiki-elec | p̄r | 0.1482 | 0.0773 | 0.1344 | 0.0752 | 0.0893 | 0.1104 | 0.1228 | 0.1602
 | p̃r | 0.1547 | 0.0914 | 0.1482 | 0.0837 | 0.0959 | 0.1182 | 0.1311 | 0.1714
 | p̄p | 0.1526 | 0.0844 | 0.1479 | 0.0793 | 0.1031 | 0.1203 | 0.1384 | 0.1827
 | p̃p | 0.1773 | 0.1094 | 0.1531 | 0.0962 | 0.1171 | 0.1239 | 0.1547 | 0.2081
 | p̄s | 0.1928 | 0.2384 | 0.1024 | 0.5378 | 0.5053 | 0.4782 | 0.3817 | 0.0657
 | p̃s | 0.1143 | 0.1833 | 0.0774 | 0.2925 | 0.2416 | 0.2872 | 0.2356 | 0.0492
 | p̄c | 0.5839 | 0.7621 | 0.3839 | 0.9800 | 0.7682 | 0.5539 | 0.4728 | 0.0394
 | p̃c | 0.3572 | 0.5730 | 0.1387 | 0.8944 | 0.4793 | 0.5481 | 0.3822 | 0.0138
tech-as-topology | p̄r | 0.0944 | 0.0574 | 0.0868 | 0.0723 | 0.0828 | 0.1021 | 0.0927 | 0.1159
 | p̃r | 0.1089 | 0.0692 | 0.0956 | 0.0842 | 0.0930 | 0.1175 | 0.1043 | 0.1256
 | p̄p | 0.1021 | 0.0749 | 0.0973 | 0.0802 | 0.0952 | 0.1018 | 0.1065 | 0.1298
 | p̃p | 0.1138 | 0.0882 | 0.1147 | 0.0871 | 0.1002 | 0.1138 | 0.1174 | 0.1402
 | p̄s | 0.7564 | 0.7951 | 0.1224 | 1.342 | 1.232 | 1.183 | 1.153 | 0.0438
 | p̃s | 0.6243 | 0.6811 | 0.0937 | 1.028 | 1.027 | 0.9172 | 0.9761 | 0.0249
 | p̄c | 0.8645 | 0.9659 | 0.2748 | 1.112 | 1.231 | 1.372 | 1.171 | 0.0186
 | p̃c | 0.4798 | 0.8018 | 0.1380 | 0.9262 | 1.109 | 1.108 | 0.8194 | 0.0115
ia-stackexch | p̄r | 0.1032 | 0.0802 | 0.0918 | 0.0657 | 0.0793 | 0.0805 | 0.1387 | 0.1127
 | p̃r | 0.1169 | 0.0957 | 0.1082 | 0.0816 | 0.0871 | 0.0829 | 0.1514 | 0.1256
 | p̄p | 0.1107 | 0.0960 | 0.1075 | 0.0822 | 0.0827 | 0.0874 | 0.1389 | 0.1311
 | p̃p | 0.1228 | 0.1015 | 0.1393 | 0.0937 | 0.0915 | 0.0941 | 0.1519 | 0.1355
 | p̄s | 0.5283 | 0.7993 | 0.2933 | 0.7281 | 0.6571 | 0.6196 | 0.5631 | 0.0529
 | p̃s | 0.4671 | 0.7718 | 0.2174 | 0.5271 | 0.4265 | 0.5474 | 0.4152 | 0.0371
 | p̄c | 0.2911 | 0.4898 | 0.0618 | 0.4885 | 0.4362 | 0.3217 | 0.3123 | 0.0032
 | p̃c | 0.1633 | 0.3910 | 0.0247 | 0.3593 | 0.1287 | 0.1872 | 0.2597 | 0.0017
best/all |  | 0/24 | 0/24 | 0/24 | 0/24 | 0/24 | 0/24 | 4/24 | 20/24
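As a side note on how the best/all rows in Tables 5 and 6 are tallied, the sketch below counts per-statistic winners. It is a minimal illustration under two assumptions not stated as code in the paper: higher values are better for the prediction criteria (pr, pp) and lower values are better for the stabilization criteria (ps, pc), consistent with how the text compares the algorithms; the data layout and function name are hypothetical.

```python
import numpy as np

# Hypothetical layout: results[crit] is an (n_rows, n_algorithms) array
# holding one row per (network, mean-or-best) statistic from the tables.
HIGHER_IS_BETTER = {"pr": True, "pp": True, "ps": False, "pc": False}

def best_counts(results):
    """Count how many statistics each algorithm wins (the best/all row)."""
    n_alg = next(iter(results.values())).shape[1]
    wins = np.zeros(n_alg, dtype=int)
    total = 0
    for crit, table in results.items():
        # Prediction criteria are maximized; stabilization criteria are
        # treated as lower-is-better, matching the tables above.
        winner = table.argmax(1) if HIGHER_IS_BETTER[crit] else table.argmin(1)
        for i in winner:
            wins[i] += 1
        total += len(table)
    return wins, total
```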

Figure 5: Average performance ranking of the criteria for all baseline algorithms on the real-world networks (CDNE 1.153, SDNE 3.097, DynGEM 3.375, M-NMF 4.181, Dyn2vecAERNN 5.402, SAGE 5.874, Dyn2vecRNN 6.069 and Dyn2vecAE 6.819).

4. The deep autoencoder algorithms (SDNE, DynGEM and CDNE) perform well on the prediction tasks (network reconstruction and link prediction), whereas the algorithms based on other frameworks (SAGE, Dyn2vecAE, Dyn2vecRNN and Dyn2vecAERNN) perform poorly on the football network, which has clear community structures (i.e., qm > 0.6, see Table 2).


Table 7: Results on the real-world networks in terms of the ratio of the network stabilization ps to the community stabilization pc. re is the averaged ratio of the changed link structures of the networks.

networks | re | SDNE | SAGE | DynGEM | Dyn2vecAE | Dyn2vecRNN | Dyn2vecAERNN | M-NMF | CDNE
ia-contacts | 0.4961 | 0.6438 | 0.5763 | 0.7537 | 0.7221 | 0.5201 | 0.5159 | 0.6283 | 1.411
football | 0.7910 | 0.6865 | 0.4995 | 1.059 | 0.6980 | 0.5809 | 0.5846 | 0.5602 | 0.7261
ia-email | 0.2922 | 0.3930 | 0.3510 | 0.5376 | 0.3914 | 0.3146 | 0.3079 | 0.4324 | 0.6250
ia-enron | 0.5479 | 2.005 | 1.285 | 0.5583 | 0.5913 | 0.5264 | 0.4995 | 0.8339 | 2.015
TCGA | 0.4351 | 2.327 | 1.730 | 2.990 | 1.368 | 1.800 | 2.075 | 2.202 | 3.181
ca-cit-HepTh | 0.7287 | 0.7312 | 0.3591 | 0.4865 | 0.4174 | 0.3945 | 0.412 | 0.9118 | 0.5775
soc-wiki-elec | 0.1419 | 0.3301 | 0.3128 | 0.2667 | 0.6012 | 0.745 | 0.8633 | 0.8073 | 1.668
tech-as-topology | 0.3959 | 0.8749 | 0.8231 | 0.4454 | 1.206 | 1.000 | 0.8622 | 0.9846 | 2.355
ia-stackexch | 0.2192 | 1.814 | 1.793 | 4.745 | 1.490 | 1.898 | 1.926 | 1.803 | 16.53

Figure 6: Statistical performance of CDNE vs. the control parameter λ for preserving global structures on the SYN and football dynamic networks. (a) accuracy of prediction tasks and (b) performance of stabilization tasks.

5. M-NMF performs well on the prediction tasks whereas it performs poorly on the stabilization tasks. This is because M-NMF only considers the preservation of the global link structures and the local community structures.

To further validate the effectiveness of CDNE in preserving the network evolution (temporal node evolution and smooth community evolution), we record the ratio ps/pc of the network stabilization ps to the community stabilization pc in Table 7. The results show that for the dynamic networks with smooth link evolution (ia-contacts, ia-email, ia-enron, TCGA, soc-wiki-elec, tech-as-topology and ia-stackexch; see the re values in Table 7), CDNE has the highest ps/pc values (with ps/pc ≥ 1 in most cases), whereas the ratios computed by the other baseline algorithms are smaller than 1 in most cases. These results indicate that for these networks the community evolution is smoother than the node evolution, and that CDNE can effectively preserve this property. The results also show that for the dynamic networks with sharp link dynamics (football and ca-cit-HepTh, which have the largest re values), the ratios computed by CDNE are smaller than 1. This may be because the sharp link dynamics of these networks cause the temporal evolution of their communities.

4.2.3. Effects of Parameter Settings

In this part, the sensitivity of the parameters λ, α, β, d and w on the performance of CDNE is discussed on the simulated SYN network and the real football network.

λ denotes the weight of the second-order link proximity in preserving the structures of the test networks. Fig. 6 shows how the accuracy of CDNE on the prediction tasks and its stability on the stabilization tasks vary with λ. The results illustrate that the accuracy of CDNE increases with λ whereas the stability decreases with λ. This is reasonable, as embedding the high-order link proximity structures facilitates the preservation of the global structures. The results also show that when λ > 1, CDNE performs well on the prediction tasks on both the SYN and football networks.
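To make the roles of these weights concrete, the following minimal sketch shows one way they could enter a CDNE-style overall loss at a single time stamp: λ inside the global term Lag as a penalty on reconstructing the observed entries of the proximity matrix, and α and β as the weights of the local term Lal and the community-aware smoothing term Lac (cf. Eq. (8) and Table 8). The variable names and the centroid-based form of the smoothing term are illustrative assumptions, not the exact formulation used by CDNE.

```python
import numpy as np

def overall_loss(X, X_hat, Y, Y_prev, A, C, lam=2.0, alpha=1.0, beta=1.0):
    """Sketch of a CDNE-style overall loss for one network snapshot.

    X      : (n, n) high-order proximity matrix of the current snapshot
    X_hat  : (n, n) reconstruction produced by the deep autoencoder
    Y      : (n, d) current embeddings; Y_prev: embeddings at time t-1
    A      : (n, n) adjacency matrix (local link structure)
    C      : (n, k) 0/1 community membership matrix at time t
    """
    # Global term Lag: errors on observed entries are penalized lam times
    # harder, so the (second-order) link proximity is preserved.
    B = np.where(X > 0, lam, 1.0)
    L_ag = np.sum(((X_hat - X) * B) ** 2)

    # Local term Lal: linked nodes should stay close in the embedding space.
    gaps = Y[:, None, :] - Y[None, :, :]            # pairwise embedding gaps
    L_al = np.sum(A * np.sum(gaps ** 2, axis=2))

    # Smoothing term Lac (assumed form): embeddings should stay near the
    # centroids of their communities computed from the previous snapshot.
    centroids = (C.T @ Y_prev) / np.maximum(C.sum(axis=0)[:, None], 1)
    L_ac = np.sum((Y - C @ centroids) ** 2)

    return L_ag + alpha * L_al + beta * L_ac
```

Under this form, a larger λ pushes the autoencoder to reproduce the observed high-order proximities (which favors the prediction tasks), while larger α and β pull the embeddings together across nodes and time stamps (which favors the stabilization tasks), matching the trends reported in Figs. 6-8.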


Figure 7: Statistical performance of CDNE vs. the control parameter α for preserving local structures on the SYN and football dynamic networks. (a) accuracy of prediction tasks and (b) performance of stabilization tasks.

Figure 8: Statistical performance of CDNE vs. the control parameter β for preserving evolutionary patterns on the SYN and football dynamic networks. (a) accuracy of prediction tasks and (b) performance of stabilization tasks.

α denotes the weight of the preserved local structures in the embedded structures and dynamics, and it influences the preservation of the local structures. To investigate this influence, CDNE is tested with different α values in Fig. 7. The results illustrate that the prediction performance of CDNE slightly decreases as α increases, whereas its stability performance improves. This is to be expected, as two linked nodes move closer in the embedded space at all time stamps as α increases. The resulting low diversity of the node representations benefits network stabilization but is inappropriate for link prediction.

β specifies the weight of the preserved evolutionary patterns in the embedded structures and dynamics. Fig. 8 records the performance of CDNE with different β values. We obtain observations similar to those for α. This is because preserving the local structures and preserving the dynamics of the networks both decrease the diversity of the low-dimensional node representations in the embedding space.

Fig. 9 gives the variations of the performance of CDNE with the number d of embedding dimensions. The results show that the performance on the prediction tasks slightly increases with d whereas that on the stabilization tasks is hardly improved. Note that the computational complexity of a deep autoencoder increases with d. Here, we set d to 16 as a tradeoff between the computational complexity and the performance of CDNE.

w denotes the average number of nodes in each small community. Fig. 10 gives the performance of CDNE with different w. The results illustrate that both the prediction performance and the stability performance improve as w increases. When w ≥ 5, CDNE always achieves a good performance on both the prediction tasks and the stabilization tasks.

Finally, we test the performance of CDNE under some special settings which consider all possible extreme combinations of the terms in the overall loss function. In these experiments, we set the weight of a loss term to 0 when that term is removed from the overall loss function. The results are shown in Table 8.
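The ablation grid behind Table 8 can be expressed compactly, as in the sketch below. Only the weight settings come from Table 8; train_and_evaluate is a hypothetical helper standing in for training CDNE with the given term weights and reporting the four criteria, which the paper does not provide as code.

```python
from itertools import product

# Extreme weight combinations for the loss terms (Lag, Lal, Lac).
# (0, 0, 0) is excluded because removing every term leaves nothing to
# optimize; the remaining seven settings per dataset match Table 8.
settings = [w for w in product((1, 0), repeat=3) if any(w)]

for w_ag, w_al, w_ac in settings:
    # Hypothetical helper: trains CDNE with these term weights and
    # returns the pr, pp, ps and pc statistics on a given dataset.
    scores = train_and_evaluate(weight_ag=w_ag, weight_al=w_al,
                                weight_ac=w_ac)
    print((w_ag, w_al, w_ac), scores)
```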

Figure 9: Statistical performance of CDNE vs. the number d of embedding dimensions on the SYN and football dynamic networks. (a) accuracy of prediction tasks and (b) performance of stabilization tasks.

Figure 10: Statistical performance of CDNE vs. the average number w of nodes in the small communities on the SYN and football dynamic networks. (a) accuracy of prediction tasks and (b) stability of stabilization tasks.

The results in Table 8 show that the global loss function Lag determines the performance of CDNE on the prediction tasks, while both the local loss function Lal and the smooth loss function Lac affect the performance of CDNE on the stabilization tasks. They also illustrate that the ps and pc values of CDNE under the weight setting (1, 0, 1) are smaller than those under the weight setting (1, 1, 0), which indicates that Lac has a greater impact than Lal on the performance of CDNE on the stabilization tasks. CDNE obtains the best performance on both the prediction tasks and the stabilization tasks under the weight setting (1, 1, 1). This validates the effectiveness of the modeled objectives (see Eq. (8)) for DNEs.

Table 8: Performance of CDNE under some special settings which consider all possible extreme combinations of the terms in the overall loss function. In these experiments, the weight of a loss term is set to 0 when that term is removed from the overall loss function.

datasets | Lag | Lal | Lac | p̄r | p̃r | p̄p | p̃p | p̄s | p̃s | p̄c | p̃c
SYN | 1 | 1 | 1 | 0.5776 | 0.5888 | 0.4688 | 0.4832 | 0.1089 | 0.0700 | 0.1030 | 0.0608
SYN | 0 | 1 | 1 | 0.2355 | 0.3908 | 0.2328 | 0.3432 | 0 | 0 | 0 | 0
SYN | 1 | 0 | 1 | 0.5665 | 0.5712 | 0.4686 | 0.4785 | 0.5744 | 0.5489 | 0.5334 | 0.4976
SYN | 0 | 0 | 1 | 0.1891 | 0.1891 | 0.1951 | 0.1951 | 0 | 0 | 0 | 0
SYN | 1 | 0 | 0 | 0.5663 | 0.5783 | 0.4642 | 0.4710 | 0.5904 | 0.5001 | 0.5398 | 0.4295
SYN | 1 | 1 | 0 | 0.5771 | 0.5871 | 0.4674 | 0.4763 | 0.5931 | 0.5133 | 0.5587 | 0.4779
SYN | 0 | 1 | 0 | 0.2356 | 0.3367 | 0.2375 | 0.3164 | 0 | 0 | 0 | 0
football | 1 | 1 | 1 | 0.8911 | 0.9005 | 0.8910 | 0.8987 | 0.0518 | 0.0408 | 0.0679 | 0.0481
football | 0 | 1 | 1 | 0.3498 | 0.5289 | 0.3548 | 0.5335 | 0 | 0 | 0 | 0
football | 1 | 0 | 1 | 0.8776 | 0.8942 | 0.8781 | 0.8941 | 0.2036 | 0.1617 | 0.2317 | 0.1725
football | 0 | 0 | 1 | 0.1939 | 0.1939 | 0.1933 | 0.1933 | 0 | 0 | 0 | 0
football | 1 | 0 | 0 | 0.8805 | 0.8921 | 0.8845 | 0.8909 | 0.7523 | 0.5299 | 1.2346 | 0.7744
football | 1 | 1 | 0 | 0.8907 | 0.9002 | 0.8903 | 0.8978 | 0.5692 | 0.5261 | 1.1364 | 0.9851
football | 0 | 1 | 0 | 0.5161 | 0.6762 | 0.5246 | 0.7029 | 0 | 0 | 0 | 0

5. Conclusions

In this paper, we presented a community-aware dynamic network embedding method (CDNE), which learns the low-dimensional representations of nodes from the high-dimensional network representations while preserving both the structural information (microscopic node and link structures) and the evolutionary patterns (macroscopic community evolution) of dynamic networks. In CDNE, the global node similarities and the local link structures were used to facilitate the embedding of node structures, while a community-aware smoothing technique was presented to preserve the smooth dynamics of communities in dynamic networks. Moreover, a stacked deep autoencoder algorithm was adopted to embed those structures and evolutionary patterns into the low-dimensional representations of nodes. Compared with classical DNEs, CDNE considers both the temporal dynamics of real systems and the continuous dynamics of the communities.

Extensive experiments on two synthetic networks and nine real-world dynamic networks showed the superiority of CDNE over seven classical NE methods on both the prediction tasks (network reconstruction and link prediction) and the stabilization tasks (network stabilization and community stabilization).

This work assumed that all communities of a network evolve smoothly over time and that all link structures can be observed, which may not hold in some applications. In future work, we will consider the temporal dynamics of both nodes and communities in DNEs. Moreover, we will study the embedding of dynamic networks while preserving the privacy of some sensitive structures (such as node structures, link structures and community structures). In addition, we will use a graph-based deep autoencoder with some prior knowledge to simultaneously embed the network structures, the evolutionary patterns and the network attributes.

Acknowledgements

This work was supported by the Joint Funds of the National Natural Science Foundation of China Key Program under Grant U1713212, the National Natural Science Foundation of China under Grants 61672358, 61572330, 61772393, 61806153, 61806061 and 61836005, and the Natural Science Foundation of Guangdong Province under Grant 2017A030313338.

References

[1] N. Ahmed, L. Chen, An efficient algorithm for link prediction in temporal uncertain social networks, Information Sciences 289 (2016) 120–136.
[2] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (10) (2008) P10008.
[3] L. Bottou, Large-scale machine learning with stochastic gradient descent, in: Proceedings of COMPSTAT'2010, Springer, 2010, pp. 177–186.
[4] S. Cao, L. Wei, Q. Xu, GraRep: Learning graph representations with global structural information, in: Twenty-Fourth ACM International Conference on Information and Knowledge Management, ACM, Melbourne, Australia, 2015.
[5] S. Cavallari, V. W. Zheng, H. Cai, K. C.-C. Chang, E. Cambria, Learning community embedding with community detection and node embedding on graphs, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, ACM, Singapore, 2017.
[6] L. Du, Z. Lu, Y. Wang, G. Song, Y. Wang, W. Chen, Galaxy network embedding: A hierarchical community structure preserving approach, in: IJCAI, Stockholm, Sweden, 2018.
[7] L. Du, Y. Wang, G. Song, Z. Lu, J. Wang, Dynamic network embedding: An extended approach for skip-gram based network embedding, in: Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI, Stockholm, Sweden, 2018.

[8] D. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, T. Hirzel, R. P. Adams, Convolutional networks on graphs for learning molecular fingerprints, in: Advances in Neural Information Processing Systems, MIT Press, Montreal, Canada, 2015.
[9] E. Ghalebi, B. Mirzasoleiman, R. Grosu, J. Leskovec, Dynamic network model from partial observations, in: 32nd Conference on Neural Information Processing Systems, Montreal, Canada, 2018.
[10] M. Girvan, M. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences 99 (12) (2002) 7821–7826.
[11] P. Goyal, S. R. Chhetri, A. Canedo, dyngraph2vec: Capturing network dynamics using dynamic graph representation learning, in: Thirty-Third AAAI Conference on Artificial Intelligence, AAAI, Honolulu, Hawaii, USA, 2019.
[12] P. Goyal, N. Kamra, X. He, Y. Liu, DynGEM: Deep embedding method for dynamic graphs, Computing Research Repository abs/1805.11273.
[13] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Twenty-Second ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, California, USA, 2016.
[14] W. L. Hamilton, R. Ying, J. Leskovec, Inductive representation learning on large graphs, in: Thirty-First Conference on Neural Information Processing Systems, NIPS, Long Beach, California, USA, 2017.
[15] W. L. Hamilton, R. Ying, J. Leskovec, Representation learning on graphs: Methods and applications, CoRR abs/1709.05584.
[16] J. Howell, College football ratings index, www.jhowell.net/cf/scores/scoresindex.htm, updated Jan. 2019.
[17] Y. Jia, Q. Zhang, W. Zhang, X. Wang, CommunityGAN: Community detection with generative adversarial nets, in: The World Wide Web Conference, ACM, San Francisco, California, USA, 2019.
[18] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, A. Y. Wu, An efficient k-means clustering algorithm: Analysis and implementation, IEEE Transactions on Pattern Analysis & Machine Intelligence 7 (2002) 881–892.
[19] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: International Conference on Learning Representations, ICLR, San Diego, CA, USA, 2015.
[20] C.-J. Lai, J.-C. Chen, C.-H. Tsai, A systematic approach for embedding of hamiltonian cycles through a prescribed edge in locally twisted cubes, Information Sciences 289 (2014) 1–7.
[21] A. Lancichinetti, S. Fortunato, Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities, Physical Review E 80 (1) (2009) 016118.
[22] J. Li, C. Chen, H. Tong, H. Liu, Multi-layered network embedding, in: Twentieth International Conference on Data Mining, IEEE, San Diego, California, USA, 2018.
[23] M. Li, J. Liu, P. Wu, X. Teng, Evolutionary network embedding preserving both local proximity and community structure, IEEE Transactions on Evolutionary Computation (2020), doi:10.1109/TEVC.2019.2937455.
[24] Y. Li, C. Sha, X. Huang, Y. Zhang, Community detection in attributed graphs: An embedding approach, in: Thirty-Second AAAI Conference on Artificial Intelligence, AAAI, New Orleans, Louisiana, USA, 2018.
[25] L. Lü, M. Medo, C. H. Yeung, Y.-C. Zhang, Z.-K. Zhang, T. Zhou, Recommender systems, Physics Reports 519 (1) (2012) 1–49.
[26] X. Ma, D. Dong, Evolutionary nonnegative matrix factorization algorithms for community detection in dynamic networks, IEEE Transactions on Knowledge and Data Engineering 29 (5) (2017) 1045–1058.
[27] X. Ma, L. Gao, K. Tan, Modeling disease progression using dynamics of pathway connectivity, Bioinformatics 30 (16) (2014) 2343–2350.
[28] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, J.-P. Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science 328 (5980) (2010) 876–878.
[29] M. E. Newman, Finding community structure in networks using the eigenvectors of matrices, Physical Review E 74 (3) (2006) 036104.

[30] M. E. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences 103 (23) (2006) 8577–8582.
[31] G. H. Nguyen, J. B. Lee, R. A. Rossi, N. K. Ahmed, E. Koh, S. Kim, Continuous-time dynamic network embeddings, in: Companion Proceedings of the Web Conference, ACM, Lyon, France, 2018.
[32] B. Perozzi, R. Al-Rfou, S. Skiena, DeepWalk: Online learning of social representations, in: Twentieth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, USA, 2014.
[33] C. Pizzuti, Evolutionary computation for community detection in networks: A review, IEEE Transactions on Evolutionary Computation 22 (3) (2018) 464–483.
[34] G. Rossetti, R. Cazabet, Community discovery in dynamic networks: A survey, ACM Computing Surveys 51 (2) (2018) 35.
[35] R. A. Rossi, N. K. Ahmed, The network data repository with interactive graph analytics and visualization, in: Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI, Austin, Texas, USA, 2015.
[36] M. Sales-Pardo, R. Guimera, A. A. Moreira, L. A. N. Amaral, Extracting the hierarchical organization of complex systems, Proceedings of the National Academy of Sciences 104 (39) (2007) 15224–15229.
[37] V. Sekara, A. Stopczynski, S. Lehmann, Fundamental structures of dynamic social networks, Proceedings of the National Academy of Sciences 113 (36) (2015) 9977.
[38] C. Shi, B. Hu, W. X. Zhao, S. Y. Philip, Heterogeneous information network embedding for recommendation, IEEE Transactions on Knowledge and Data Engineering 31 (2) (2019) 357–370.
[39] J. Tang, Y. Dong, Representation learning on networks: Theories, algorithms, and applications, in: Proceedings of The 2019 World Wide Web Conference, ACM, San Francisco, California, USA, 2019.
[40] D. Wang, P. Cui, W. Zhu, Structural deep network embedding, in: Twenty-Second ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, 2016.
[41] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, S. Yang, Community preserving network embedding, in: Thirty-First AAAI Conference on Artificial Intelligence, AAAI, San Francisco, California, USA, 2017.
[42] D. Whitley, A genetic algorithm tutorial, Statistics and Computing 4 (2) (1994) 65–85.
[43] L. Yang, X. Cao, D. He, C. Wang, X. Wang, W. Zhang, Modularity based community detection with deep learning, in: Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI, New York, USA, 2016.
[44] Z. Yu, Z. Zhang, H. Chen, J. Shao, Structured subspace embedding on attributed networks, Information Sciences 512 (2020) 726–740.
[45] D. Zhang, J. Yin, X. Zhu, C. Zhang, Network representation learning: A survey, IEEE Transactions on Big Data (2019), doi:10.1109/TBDATA.2018.2850013.
[46] W. Zhang, Y. Du, T. Yoshida, Y. Yang, DeepRec: A deep neural network approach to recommendation with item embedding and weighted loss function, Information Sciences 470 (2019) 121–140.
[47] Y. Zhang, T. Lyu, Y. Zhang, COSINE: Community-preserving social network embedding from information diffusion cascades, in: Thirty-Second AAAI Conference on Artificial Intelligence, AAAI, New Orleans, Louisiana, USA, 2018.
[48] Z. Zhang, D. Chen, Z. Wang, H. Li, L. Bai, E. Hancock, Depth-based subgraph convolutional auto-encoder for network representation learning, Pattern Recognition 90 (2019) 363–376.
[49] L.-k. Zhou, Y. Yang, X. Ren, F. Wu, Y. Zhuang, Dynamic network embedding by modeling triadic closure process, in: Thirty-Second AAAI Conference on Artificial Intelligence, AAAI, New Orleans, USA, 2018.
[50] D. Zhu, P. Cui, Z. Zhang, J. Pei, W. Zhu, High-order proximity preserved embedding for dynamic networks, IEEE Transactions on Knowledge and Data Engineering 30 (11) (2018) 2134–2144.


Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Lijia Ma: Conceptualization, Methodology, Funding acquisition, Writing - Original draft preparation, Reviewing and Editing.
Yutao Zhang: Conceptualization, Methodology, Software, Funding acquisition, Writing - Original draft preparation.
Jianqiang Li: Visualization, Investigation, Writing - Reviewing and Editing.
Qiuzhen Lin: Investigation, Writing - Modification.
Qing Bao: Supervision, Validation, Writing - Reviewing and Modification.
Shanfeng Wang: Validation, Discussion, Writing - Modification.
Maoguo Gong: Supervision, Validation, Writing - Reviewing and Modification.