Edge-based stochastic network model reveals structural complexity of edges



Future Generation Computer Systems 100 (2019) 1073–1087


Future Generation Computer Systems journal homepage: www.elsevier.com/locate/fgcs

Xuemeng Zhai a, Wanlei Zhou b, Gaolei Fei a, Cai Lu a, Sheng Wen c, Guangmin Hu a,d,∗

a School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
b School of Software, University of Technology Sydney, Ultimo NSW 2007, Australia
c School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn, Victoria 3122, Australia
d Center for Information Geoscience, University of Electronic Science and Technology of China, Chengdu 611731, China

Highlights

• A general edge-based stochastic network model for edge research is proposed.
• The model preserves the basic edge structure of the original network.
• The model reveals the link community structure of networks.
• Motifs can be detected efficiently by the proposed model.
• The model provides an edge-based framework for network science.

Article info

Article history: Received 20 January 2019; received in revised form 19 March 2019; accepted 15 May 2019; available online 17 May 2019

Keywords: Complex network; Stochastic network model; Link community detection; Motifs

Abstract

Network science defines complex systems as objects interacting in a network with nodes and edges. Stochastic network models that treat networks as a collection of nodes with fixed degree distributions and randomly connected edges have provided significant theoretical support for network analyses. However, the structural characteristics of edges in complex networks remain largely unknown due to the lack of edge-based network models. Here, we propose a general edge-based stochastic network model with constrained edge-degree distributions and arbitrary node-degree distributions. The random edge configuration method is used to build the model with an explicit edge-connected probability, which can also be explained by Laplacian dynamics. The model reveals both basic and complex structural characteristics of edges in networks, including statistical structural characteristics, link community structure, and higher-order organization. The experimental results show advantageous performance on both link community detection and motif detection based on the edge-based stochastic network model, which demonstrates that the model is useful for conducting quantitative comparisons of the complex structural characteristics of edges. The edge-based stochastic network model is a fundamental model that helps in understanding the complex structure of edges, which is hard to quantify in complex networks. © 2019 Elsevier B.V. All rights reserved.

∗ Correspondence to: No.2006, Xiyuan Avenue, Hi-tech West Zone, University of Electronic Science and Technology of China, Chengdu, 611731, China. E-mail address: [email protected] (G. Hu).
https://doi.org/10.1016/j.future.2019.05.047
0167-739X/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Network science has been employed to describe complex systems in numerous fields such as physics, engineering, biology, and social sciences [1–3]. The general theories and approaches that have emerged from network science have provided guidelines and resulted in applications for analysis of the objects in the systems [4–6]. A complex system can be abstracted as a network in which nodes represent the active elements or objects of the system and the connections between nodes, denoted as links or edges, describe the interactions or relationships among the objects of the system [7,8], such as a reaction shared by different metabolites in a metabolic network [9] and the social relationship between two people in a social network [10].

In recent years, complex network research has mainly focused on node-based models, such as Erdos–Renyi (ER) random graphs [11], Watts–Strogatz (WS) small world networks [12], Barabasi (BR) scale-free networks [13], and the null model proposed by Newman [14]. Analyses based on the null model with arbitrary degree distributions in particular have revealed the basic node-structure characteristics of complex networks, and have been successfully applied for conducting quantitative evaluations of complex structural characteristics, including community detection [15,16], motif assessment [17], assortativity analysis [18], epidemic spreading rate [19], routing efficiency [20], and pattern detection [21]. However, because the structure of existing random network models is based on nodes with randomly connected edges, the structural characteristics of the edges themselves are ignored. Yet while the nodes of networks, being based on actual objects, typically have clear physical meanings, the edges in many networks have equally clear physical meanings. For example, an edge in a road network represents a physical road connecting two intersections [22]. In fact, research focused on networks treated as a collection of edges (i.e., edge-based) has been demonstrated to facilitate a direct analysis of the interactions or relationships among objects in complex systems, such as the detection and analysis of link communities [23], the classification of salient links [24], and link prediction [25]. However, edge-based research lacks a reasonably complete theoretical framework for analysis. Therefore, the development of an edge-based stochastic network model is urgently required to facilitate the analysis of complex networks from this unique and productive point of view.

In this paper, we propose an edge-based stochastic network model (EBSNM) that assumes random networks with arbitrary node-degree distributions and edge-degree distributions that are constrained to match that of the original network. The final result is a model with an explicit edge-connected probability that provides new insight into the specific nature of edge-based research. The statistical structure characteristics of the edges of EBSNMs are analyzed to show that the average structure of the generated EBSNMs is nearly equivalent to that of the original network, which verifies our ability to build an edge-based analysis framework and to predict the behavior of complex networks. Experiments conducted for link community detection and motif identification demonstrate that the proposed EBSNM can reveal the link community structure of networks and eliminate the influence of single edges in motif identification. In addition, we propose edge-based measures for evaluating link communities and network motifs that provide improved performance relative to node-based measures. Our main contributions are summarized as follows:

• We propose an edge-based stochastic network model (EBSNM) that has the same edge-degree distribution as the original network but in which the connections between edges are randomized. The model preserves the basic nature of edges and can be used to quantify the complex structure of edges in complex networks. It provides a unique and productive edge-based framework for conducting network science research;
• Through the analysis of the statistical characteristics, we demonstrate that the EBSNM preserves the basic edge structure of the original network and breaks the link community structure of the original network. This suggests that the EBSNM would be useful for predicting network evolution, and for conducting quantitative comparisons of complex structures;
• Based on the EBSNM, we also propose a measure to quantify the quality of link communities, analogous to modularity in node-based community research. With this measure, we propose a fast algorithm for link community detection. The algorithm performs better than some other methods, which demonstrates that the proposed EBSNM can reveal the link community structure of networks;
• For motif detection, we use our EBSNM to quantify the significance of the motifs and prove that our EBSNM can eliminate the influence of both single edges and single nodes in motif identification. The EBSNM is useful for conducting quantitative comparisons of the complex structural characteristics of edges.

The rest of this paper is organized as follows. Related work is reviewed in Section 2. Section 3 introduces the basic concept of our edge-based stochastic network model. In Section 4, we discuss the relationship between the node-based null model and the EBSNM. Section 5 gives a brief overview of the experiments and datasets. In Section 6, we analyze the statistical characteristics of the EBSNM. Link community detection using the EBSNM is presented in Section 7. Section 8 describes motif detection using the EBSNM. Finally, conclusions and future work are given in Section 9.

2. Related work

In this section, we review current research on stochastic models, link community detection, and motif identification.

2.1. Stochastic models of the complex network

Since small world networks [12] and scale-free networks [13] were proposed, Newman has made major contributions to stochastic network models, including the null model for single networks [14] and stochastic blockmodels [26]. In [14], a stochastic model with the same degree distribution as the original network was proposed to predict the behavior of the real world. This model, known as the null model, also indicates the presence of additional social structure in the network that it does not capture. Based on this null model, modularity was proposed and has had a great influence on node-based community detection [27]. In [28], Mahadevan studied higher-order expressions of the null model. The work in [26] also develops a tool for detecting community structure in networks as well as for generating synthetic networks for use as benchmarks. The model became a powerful tool for judging the effectiveness of a community detection algorithm. In [29], stochastic block models were used in multilayer networks to uncover the different interaction layers from aggregate data. In [30], the strata multilayer stochastic block model was described to reveal existing groups of layers. In [31], a stochastic model based on state transition theory was used to investigate the dynamics of cascading failures in communication networks. However, most stochastic network models regard the node as the main subject. Research has always focused on attributes of the node, such as the node degree and its distribution. The edges, although as important as the nodes, are ignored in stochastic model research.

2.2. Link community detection and motif identification

As for edge-based research, Ahn et al. found the significance of edges in communities and proposed the concept of the link community together with a detection algorithm [23]. They reinvented communities as groups of links that naturally incorporate overlap while revealing hierarchical organization. After link communities were proposed, some node-based community detection methods were adapted to edge-based community research. In [32], the map equation method was extended to find link communities in networks under the MDL principle. In [33], a scalable local community detection approach was proposed to unfold the communities of individual target nodes in networks. In [34], an edge label propagation algorithm was presented to detect link communities more competitively. The algorithm combines the natural advantage of link communities with the efficiency of the label propagation algorithm. In [35], a link-based label propagation algorithm was proposed to transform the node partition problem into a link partition problem. In the research on link community detection, however, there is no effective measure that can be used to quantify the quality of the link community structure in the way that modularity does in node-based research. In addition, some other node-based research, such as motif detection [36], only considers the influence of single nodes but ignores the edges, and so may not capture the significant edge structure in networks. In [37], a framework for clustering networks based on node-based motifs was proposed that provides mathematical guarantees on the optimality of the obtained clusters and scales to networks with billions of edges. Therefore, edge-based research lacks a reasonable theoretical framework, and the development of an edge-based stochastic network model is required to facilitate these analyses in network science.

3. Edge-based stochastic network model

In this section, we first introduce the edge adjacency matrix and the edge degree to help us understand and build the edge-based stochastic network model. Then we propose the basic concept of the edge-based stochastic network model. In addition, the random edge configuration method is given to build the model, and a random walk explanation is given to provide a deeper understanding of the model.

3.1. The edge adjacency matrix and edge degree

A number of node-based measures of network structure have been employed for network analysis [2,38]. Prominent among these is the node adjacency matrix, defined as follows:

$$A = (a_{ij})_{N \times N}, \tag{1}$$

where $a_{ij}$ is a binary variable indicating whether node $i$ is connected directly to node $j$ (i.e., $a_{ij} = 1$ if node $i$ is connected directly to node $j$, and is 0 otherwise, including the condition $i = j$) and $N$ is the total number of nodes ($N \times N$ refers to the dimension of the matrix, which has $N$ rows and $N$ columns). The node degree $k_i$ is defined as:

$$k_i = \sum_{j} a_{ij}, \tag{2}$$

which is a summation over all $j$ that represents the total number of nodes in the network directly connected to node $i$. However, these standard network measures are generally no longer directly applicable because we consider edges as the main objects of analysis here. From the present perspective, we employ $e_{i-j}$ ($i \neq j$) to represent the edge between nodes $i$ and $j$. For undirected graphs, the adjacency matrices of both nodes and edges are symmetric, meaning that $a_{ij} = a_{ji}$, and $e_{i-j}$ and $e_{j-i}$ refer to the same edge between node $i$ and node $j$. Therefore, the edge adjacency matrix is defined as:

$$EM = (m_{i-j,x-y})_{M \times M}. \tag{3}$$

The edge adjacency matrix describes the connection relationships of edges, where $M$ is the total number of edges, $m_{i-j,x-y} = 1$ when $e_{i-j}$ and $e_{x-y}$ share a common node (i.e., $x = i$, $x = j$, $y = i$, or $y = j$), and $m_{i-j,x-y} = 0$ otherwise. This is illustrated by the network given in Fig. 1A with its corresponding edge adjacency matrix given in Fig. 1B. From the EM, we can compute the edge degree $\xi(e_{i-j}) = \sum_{xy} m_{i-j,x-y}$, which is a summation over all $x$ and $y$ ($x \neq y$) that represents the total number of neighbors of edge $e_{i-j}$ (i.e., the number of edges sharing a common node). From this definition, we note that $\xi(e_{i-j})$ is a basic attribute reflecting the connection relationships of edges in a network.

3.2. Edge-based stochastic network model

We reconfigure the network in Fig. 1A to obtain the EBSNM shown in Fig. 1C, which represents an arbitrarily configured set of nodes, but with equivalent values of $\xi(e_{i-j})$ for each edge, as indicated by a comparison of Fig. 1B and D. We note that the arbitrarily configured set of nodes affects the original node labels and the total number of nodes, while retaining a one-to-one correspondence between edges. Therefore, we do not label the nodes in the EBSNM, while the edge labels remain unchanged. This reconfiguration provides a very different structure from what would be obtained from the null model, which conserves the node labels, but arbitrarily alters the edges. As such, the EBSNM differs from the null model of a line graph. Like the configuration model in a complex network [39], the EBSNM can also be built through the edge configuration method and be explained by a random walk theory [40].

3.3. Random edge configuration method and random walk explanation

To generate the EBSNM, we introduce the random edge configuration method based on the configuration model in node-based research. The random edge configuration method is shown in Algorithm 1.

Algorithm 1 Random edge configuration method
Require: The original network G.
Ensure: The edge-based stochastic network model (EBSNM) of network G.
1: Initialize: generate the M edges corresponding to the edges of network G and label their nodes from 1 to 2M. Set the degree of each edge to $\xi(e_{i-j}) = 0$.
2: repeat
3:    Randomly select two non-connected edges whose edge degrees are less than the corresponding original ones in network G.
4:    Select one end node of each edge and aggregate them into one node, which becomes the common node connecting these two edges. The common node takes the smaller of the two labels.
5:    If an edge obtained after aggregation is already in the EBSNM, or an edge degree is larger than the original one, break the two edges apart again.
6: until the edge degree of each edge is the same as in network G
7: return the EBSNM after aggregation of all edges

In the random edge configuration method, all edges are connected $\sum_{ij} \xi(e_{i-j})$ times in total. For each edge $e_{i-j}$, the number of connections is $\xi(e_{i-j})$. Therefore, the probability of there being a common node between edge $e_{i-j}$ and $e_{x-y}$ in one network is:

$$PE_{i-j,x-y} = \frac{\xi(e_{i-j}) \times \xi(e_{x-y})}{\left[\sum_{ij} \xi(e_{i-j})\right]^{2}}. \tag{4}$$

Then we give the probability explanation based on a random walk under the Laplacian dynamics [41]. First, we suppose that there is a walker on one of the edges $e_{i-j}$. Then, the walker has $\xi(e_{i-j})$ ways to go to the other edges. For the edge $e_{x-y}$, the walker has $\xi(e_{x-y})$ ways to walk into it. The random walk process is a Markov process. When the process is stable, the steady-state probability distribution $PE^{*}_{i-j}$ is:

$$PE^{*}_{i-j} = \frac{\xi(e_{i-j})}{\left[\sum_{ij} \xi(e_{i-j})\right]^{2}}. \tag{5}$$


Fig. 1. An example network, edge adjacency matrix, and an edge-based stochastic network model (EBSNM) of the original network. (A) The original network comprises six nodes and seven edges, and (B) its corresponding edge adjacency matrix. The edges are labeled according to their two terminating nodes. (C) An EBSNM of the original network in (A) comprising eight nodes and an unchanged number of edges. Because the edge-based reconfiguration alters the node labels and the number of nodes, we do not label the nodes in the EBSNM. (D) The edge degree of each edge computed based on the edge adjacency matrix in (B). The edge degrees in both EBSNM and the original network are the same.
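The random edge configuration method of Algorithm 1, which produces networks such as the EBSNM in Fig. 1C, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `edge_degrees` and `random_edge_configuration`, the retry cap `max_tries`, and the example edge list (a hypothetical six-node, seven-edge network, not necessarily the one in Fig. 1A) are introduced here for illustration. Edge degrees are computed as k_i + k_j − 2, which holds for simple undirected graphs, and a merge is rejected when it would create a self-loop, a duplicate edge, or an edge degree above its target.

```python
import random


def edge_degrees(edges):
    """Edge degree of each edge: number of other edges sharing a node with it.

    For a simple undirected graph this equals k_i + k_j - 2.
    """
    k = {}
    for i, j in edges:
        k[i] = k.get(i, 0) + 1
        k[j] = k.get(j, 0) + 1
    return [k[i] + k[j] - 2 for i, j in edges]


def random_edge_configuration(edges, seed=None, max_tries=100_000):
    """Sketch of Algorithm 1; returns (new_edges, reached_degrees, target)."""
    rng = random.Random(seed)
    target = edge_degrees(edges)
    m = len(edges)
    # Step 1: M disjoint edges; edge e joins nodes 2e and 2e + 1, degrees 0.
    ends = [[2 * e, 2 * e + 1] for e in range(m)]
    incident = {n: {n // 2} for n in range(2 * m)}  # node -> ids of its edges
    xi = [0] * m
    for _ in range(max_tries):  # assumed cap; Algorithm 1 loops until done
        open_edges = [e for e in range(m) if xi[e] < target[e]]
        if len(open_edges) < 2:
            break
        # Step 3: two edges below their target degree that share no node.
        e, f = rng.sample(open_edges, 2)
        if set(ends[e]) & set(ends[f]):
            continue
        # Step 4: pick one end node of each; the smaller label survives.
        a, b = rng.choice(ends[e]), rng.choice(ends[f])
        lo, hi = min(a, b), max(a, b)
        e_lo, e_hi = incident[lo], incident[hi]
        far_lo = {n for g in e_lo for n in ends[g] if n != lo}
        far_hi = {n for g in e_hi for n in ends[g] if n != hi}
        # Step 5: reject merges that create a self-loop, a duplicate edge,
        # or an edge degree above the original one.
        if e_lo & e_hi or far_lo & far_hi:
            continue
        if any(xi[g] + len(e_hi) > target[g] for g in e_lo):
            continue
        if any(xi[g] + len(e_lo) > target[g] for g in e_hi):
            continue
        n_lo, n_hi = len(e_lo), len(e_hi)
        for g in e_lo:
            xi[g] += n_hi
        for g in e_hi:
            xi[g] += n_lo
            ends[g][ends[g].index(hi)] = lo  # relabel the merged end node
        incident[lo] = e_lo | e_hi
        del incident[hi]
    return [tuple(p) for p in ends], xi, target


# Hypothetical example network (six nodes, seven edges).
example = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
new_edges, reached, target = random_edge_configuration(example, seed=1)
print(reached == target, new_edges)
```

Because the merges are random, a run can stall before every edge reaches its target degree; the sketch then returns the partial result, and comparing `reached` with `target` shows whether the run succeeded. Restarting with a different seed is one simple way to handle a stalled run.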

Therefore, the joint probability of the walker walking from edge $e_{i-j}$ to $e_{x-y}$ in the model is:

$$PE_{i-j,x-y} = p(x-y \mid i-j) \times PE^{*}_{i-j} = \frac{\xi(e_{i-j}) \times \xi(e_{x-y})}{\left[\sum_{ij} \xi(e_{i-j})\right]^{2}}. \tag{6}$$

The probability in the random walk process is the same as the one obtained with the random edge configuration method. The two methods are unified in the edge-based research, which verifies the correctness and validity of our method.

4. Node-based null model and EBSNM

In network science, a null model is a stochastic network that matches the original network in some of its structural features, but is otherwise taken to be an instance of a random network. Here, the node-based null model refers to Newman's random graphs with arbitrary node-degree distributions. The node-based null model is used as a term of comparison, to verify whether or not the network in question displays some feature, such as community structure. The null model proposed by Newman consists of a randomized version of the original network, where edges are rewired at random under the constraint that the expected degree of each vertex matches the degree of the vertex in the original network. From this definition, we know that the node-based null model is built on the nodes, while its edges are completely random. To compare $PE_{i-j,x-y}$ in Eq. (4) with the equivalent probability for the node-based null model with arbitrary node-degree distributions (i.e., $PN_{i-j,x-y}$), we must define the relationship between the node degree and the edge degree. First, we define the edge degree of edge $e_{i-j}$ from the perspective of node $i$ as $\xi(e_{i-j})^{-} = k_i - 1$ and the edge degree of edge $e_{i-j}$ from the perspective of node $j$ as $\xi(e_{i-j})^{+} = k_j - 1$. Therefore, $\xi(e_{i-j}) = \xi(e_{i-j})^{-} + \xi(e_{i-j})^{+} = k_i + k_j - 2$. These terms can be employed to define $PN_{i-j,x-y}$ as follows:

$$PN_{i-j,x-y} = \frac{\xi(e_{i-j})^{-} \times \xi(e_{x-y})^{-}}{\left[\sum_{e} \xi(e)^{-}\right]^{2}} + \frac{\xi(e_{i-j})^{-} \times \xi(e_{x-y})^{+} + \xi(e_{i-j})^{+} \times \xi(e_{x-y})^{-}}{\sum_{e} \xi(e)^{-} \times \sum_{e} \xi(e)^{+}} + \frac{\xi(e_{i-j})^{+} \times \xi(e_{x-y})^{+}}{\left[\sum_{e} \xi(e)^{+}\right]^{2}}. \tag{7}$$

Similarly, we can represent $PE_{i-j,x-y}$ as:

$$PE_{i-j,x-y} = \frac{\left(\xi(e_{i-j})^{-} + \xi(e_{i-j})^{+}\right) \times \left(\xi(e_{x-y})^{-} + \xi(e_{x-y})^{+}\right)}{\left[\sum_{e} \left(\xi(e)^{-} + \xi(e)^{+}\right)\right]^{2}}. \tag{8}$$

From a comparison of these two expressions, we note that $PE_{i-j,x-y} < PN_{i-j,x-y}$: expanding the numerator of Eq. (8) gives the same four products that appear in Eq. (7), while the denominator of Eq. (8) is larger than each of the three denominators in Eq. (7) whenever both sums are positive. This indicates that the uncertainty of two edges being connected in an EBSNM is greater than that in the node-based null model. Importantly, the construction of edges in the EBSNM is more random than that in the node-based null model, which represents a significant difference between graphs reconstructed according to these two perspectives.
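The quantities used in Sections 3 and 4 can be checked numerically on a small graph. The sketch below is illustrative only: the function names and the example edge list (a hypothetical six-node, seven-edge network) are assumptions introduced here, each edge (i, j) is oriented as written so that the "−" side is node i and the "+" side is node j, and diagonal entries of the edge adjacency matrix are taken as 0. It builds the edge adjacency matrix, confirms that its row sums equal k_i + k_j − 2, and evaluates Eq. (7) and Eq. (8) for a pair of edges.

```python
def edge_adjacency(edges):
    """Edge adjacency matrix EM: 1 when two distinct edges share a node."""
    m = len(edges)
    return [[int(a != b and bool(set(edges[a]) & set(edges[b])))
             for b in range(m)] for a in range(m)]


def split_edge_degrees(edges):
    """Return (xi_minus, xi_plus) with xi^- = k_i - 1 and xi^+ = k_j - 1."""
    k = {}
    for i, j in edges:
        k[i] = k.get(i, 0) + 1
        k[j] = k.get(j, 0) + 1
    xi_minus = [k[i] - 1 for i, _ in edges]
    xi_plus = [k[j] - 1 for _, j in edges]
    return xi_minus, xi_plus


def pe(xi_minus, xi_plus, a, b):
    """Edge-based probability for edges a and b, Eq. (8) (same as Eq. (4))."""
    total = sum(xi_minus) + sum(xi_plus)
    return (xi_minus[a] + xi_plus[a]) * (xi_minus[b] + xi_plus[b]) / total ** 2


def pn(xi_minus, xi_plus, a, b):
    """Node-based null-model probability for edges a and b, Eq. (7)."""
    s_minus, s_plus = sum(xi_minus), sum(xi_plus)
    return (xi_minus[a] * xi_minus[b] / s_minus ** 2
            + (xi_minus[a] * xi_plus[b] + xi_plus[a] * xi_minus[b])
            / (s_minus * s_plus)
            + xi_plus[a] * xi_plus[b] / s_plus ** 2)


# Hypothetical example network (six nodes, seven edges).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
em = edge_adjacency(edges)
xi_minus, xi_plus = split_edge_degrees(edges)
xi = [sum(row) for row in em]  # edge degree as a row sum of EM
print(xi == [u + v for u, v in zip(xi_minus, xi_plus)])
print(pe(xi_minus, xi_plus, 0, 3), pn(xi_minus, xi_plus, 0, 3))
```

In this example the two sums over ξ⁻ and ξ⁺ happen to be equal, so the null-model value is four times the edge-based value for every pair of edges; in general the ratio depends on the network.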

5. Overview of experiments and dataset

To verify that the EBSNM is a productive framework for the edge-based analysis of complex networks, we performed three experiments, including statistical analyses, link community detection, and motif detection, on a test group composed of seven real-world networks obtained from the Stanford Network Analysis Project (SNAP) [42], which included a road network, a location network, an Amazon network, a DBLP network, a YouTube network, an email network and a web graph. The edges in these networks are quite significant. For example, edges in the Amazon shopping network represent the relationship between customers and merchandise, and edges in the road network represent roads between two intersections.


In addition to these real-world networks, we also applied the EBSNM to classical network models, such as ER random graphs, scale-free networks, and small world networks, to demonstrate that our model can be applied to any network without specific requirements. The three theoretical models are built with the Python module networkx [43] to ensure that the programmed models are consistent with the original models, as follows:

Erdos–Renyi (ER) random graphs: The model was proposed by Erdos and Renyi in 1959. Denote by $G(n, M)$ the set of graphs with $n$ nodes and $M$ edges. The ER random graph model assigns equal probability to all graphs with exactly $M$ edges. Therefore, every such graph occurs with probability $1/\binom{N}{M}$, where $N = \binom{n}{2}$ is the number of possible edges. A graph is chosen uniformly at random from the collection of all graphs which have $n$ nodes and $M$ edges. This is the basic random model in network science.

Watts–Strogatz (WS) small world networks: The model was created because Erdos–Renyi graphs lack two properties that are found in many real-world networks. The first is local clustering and triadic closure. The second is the formation of hubs. WS small world networks exhibit the small-world phenomenon (popularly known as six degrees of separation): they can be highly clustered and have small characteristic path lengths.

Barabasi (BR) scale-free networks: Scale-free networks are widely observed in natural and human-made systems. The degree distribution of scale-free networks follows a power law. The model contains two concepts: (1) growth: the number of nodes in the network increases over time; (2) preferential attachment: the more connected a node is, the more likely it is to receive new edges.

6. Statistical analysis for the EBSNM

In this section, we analyze the statistical characteristics of the obtained EBSNMs in terms of the network diameter, average path length, and clustering coefficient because these three network characteristics are closely related to the structure of edges [44–46]. We first introduce the three measures.

Fig. 2. Comparison between ER random networks and their EBSNMs on the statistical measures: network diameter, average path length, and clustering coefficient.

6.1. Measures

The details of the measures we chose are as follows:

Network Diameter: The network diameter is the longest of all the calculated shortest paths in a network. The measure indicates the longest chain of connected links in the network and is representative of the linear size of a network. With this measure, we can compare the linear size of the EBSNM with that of the original network based on the edges instead of the nodes.

Average Path Length: The average path length is the sum of the shortest path lengths between all pairs of nodes, divided by the total number of pairs. The measure shows the average number of edges it takes to get from one node to another. It is a basic measure that represents the overall size of a network in terms of edges.

Average Clustering Coefficient: The clustering coefficient measures the extent to which all neighbors of a node are connected with each other. The clustering coefficient of a node is the ratio of existing links connecting a node's neighbors to each other to the maximum possible number of such links, as follows:

$$c_i = \frac{2 e_i}{k_i (k_i - 1)}, \tag{9}$$

where $e_i$ refers to the number of edges between the neighbors of node $i$, and $k_i$ is the number of edges connected directly to node $i$. The average clustering coefficient for the entire network is the average of the clustering coefficients of all the nodes, as follows:

$$C = \frac{1}{N} \sum_{i} c_i = \frac{1}{N} \sum_{i} \frac{2 e_i}{k_i (k_i - 1)}. \tag{10}$$

This measure is, in some sense, representative of community structure. A high clustering coefficient for a network is another indication of a small world. Because a community is connected by edges, the clustering coefficient can reflect the tightness of the connections among edges.

6.2. Experiments

We first generate the theoretical models as the original networks. Then we build the EBSNMs of these network models and compute the three measures above separately in the theoretical models and in their EBSNMs. Figs. 2 to 4 show the experimental results. The size of the networks ranges from 100 nodes to 1000 nodes in steps of 100 nodes. In Fig. 2, the diameter and average path length of the ER networks are identical to those of the EBSNMs. In the 100-node networks, the clustering coefficient of the ER network is less than that of the EBSNM, which may be caused by the randomness of the small-scale networks. However, as the scale increases, the clustering coefficient of the EBSNM is less than that of the ER networks.

Fig. 3. Comparison between WS small world networks and their EBSNMs on the statistical measures: network diameter, average path length, and clustering coefficient.

In Fig. 3, the diameter and average path length of the small world networks are slightly higher than those of the EBSNMs, but they can still be considered very close. As for the clustering coefficient, all small world networks are far higher than their EBSNMs. The reason is that the small world networks themselves have strong community structure and the EBSNMs break the community structure. In Fig. 4, all three parameters are very close between the scale-free networks and their EBSNMs. In particular, the clustering coefficient in most EBSNMs is less than that in the original scale-free networks. The reason for this small difference is that the community structure is weak in the scale-free networks due to the small number of triangle structures. Therefore, the clustering coefficient of scale-free networks is already so small that it cannot become much smaller even if the community structure is broken.

In summary, the network diameters and average path lengths of the EBSNMs are similar to those of all three theoretical network models, and the average clustering coefficients of the EBSNMs are smaller than those of the original networks, particularly for the small world network.
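The three measures of Section 6.1 can be computed directly from an edge list. The sketch below uses only the Python standard library so that it is self-contained; it is an illustration under stated assumptions rather than the code used for the experiments. The function names are introduced here, the graph is assumed to be connected, simple, and undirected, and nodes with fewer than two neighbors are given a clustering coefficient of 0 (a common convention, assumed here because Eq. (9) is undefined for them).

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_measures(edges):
    """Return (diameter, average path length, average clustering coefficient)."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    n = len(adj)

    diameter, total = 0, 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("the graph must be connected")
        diameter = max(diameter, max(dist.values()))
        total += sum(dist.values())
    # Every unordered pair is counted twice, hence n * (n - 1) ordered pairs.
    avg_path = total / (n * (n - 1))

    c_sum = 0.0
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # c_i taken as 0 (assumed convention)
        e_i = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        c_sum += 2 * e_i / (k * (k - 1))  # Eq. (9)
    return diameter, avg_path, c_sum / n  # Eq. (10)


# Two triangles joined by a bridge (hypothetical example).
print(network_measures([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]))
```

For larger networks, the networkx functions `diameter`, `average_shortest_path_length`, and `average_clustering` compute the same quantities, and `gnm_random_graph`, `watts_strogatz_graph`, and `barabasi_albert_graph` generate graphs of the three theoretical types; the paper does not state which networkx calls were used, so these are named only as likely candidates.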

Fig. 4. Comparison between BR scale-free networks and their EBSNMs on the statistical measures: network diameter, average path length, and clustering coefficient.

The second experiment is conducted in the seven real-word networks. We set the size of the network is about 1000 edges. The comparison results are shown in Table 1. Figs. 5–7 shows the examples of EBSNM of the road network, DBLP network and Amazon network after sampling to facilitate to observation (The size of networks are reduced to about 100 edges). The parameters are computed by Gephi, am open graph visualization platform [47]. We see that the structure of EBSNM is random compared with the real-word network but the basic structure measures (Network Diameter and Average Path Length) are similar between the EBSNMs and real-world networks. As Figs. 5–7 shown, there are lots of square in the road network and a large star-subgraphs in DBLP network and unconnected star-subgraphs in Amazon network. The edge structure of them are so special that the EBSNM are built with the average value of the parameters. The results show that the network diameters and average path lengths of the EBSNMs are similar to those of the original networks, which verifies that the EBSNM preserves the basic edge structure of the original network. We also note that the average clustering coefficients of the EBSNMs are smaller than those of the original networks, particularly for the small world network. The relatively small average clustering coefficient obtained for

X. Zhai, W. Zhou, G. Fei et al. / Future Generation Computer Systems 100 (2019) 1073–1087

1079

Fig. 5. The example of EBSNM of the road network and their statistics measure. The size of networks is reduced to about 100 edges. (A) The original road network. (B) The EBSNM of the road network in (A).

Fig. 6. The example of EBSNM of the DBLP network and their statistics measure. The size of networks is reduced to about 200 edges. (A) The original DBLP network. (B) The EBSNM of the road network in (A).

the EBSNM indicates that the EBSNM tends to break the community structure of the original network. The results of statistical analysis show that, while the EBSNM preserves the basic edge structure of the original network, the complex edge structure of the original network is considerably simplified. This suggests that the EBSNM would be useful for predicting network evolution, and for conducting quantitative comparisons of complex structures. In addition, the simplified community structure of the EBSNM provides another possibility for detecting the link community of a network. 7. Link community detection using EBSNM The simplified community structure in the EBSNM can be employed for analyzing the link community structure quantitatively in a manner similar to the analysis of modularity adopted in

node-based research [27]. The link communities are fundamental building blocks that naturally incorporate overlap while revealing hierarchical organization. However, though the phenomenon that link community reveals is notable, the algorithm to detect the link community is still not perfect enough. We found that the algorithm based on the link similarity is not suitable for the famous network models, such as scale-free networks and small world networks. The algorithm needs to compute the link similarity of all edge pairs and combine them. Therefore, the time complexity of the algorithm is quite high as O(n3 ), where n refers to the node number. We believe the reason why link community detection is not as good as the node community detection is that there is a no theoretical stochastic model for edges like the null model for nodes which is used to build the modularity of the node community. Therefore, based on our EBSNM, we propose the link community coefficient (LCC) to quantify the quality of


Fig. 7. An example of the EBSNM of the Amazon network and their statistical measures. The size of the networks is reduced to about 200 edges. (A) The original Amazon network. (B) The EBSNM of the Amazon network in (A).

Table 1
Comparison of statistical characteristics in seven real-world networks.

Network              Version           Node number  Edge number  Network diameter  Average path length  Clustering coefficient
Road network         Original network  903          1153         45                16.36                0.094
Road network         EBSNM             1123         1153         29                6.26                 0
Location network     Original network  645          965          4                 3.36                 0.612
Location network     EBSNM             683          965          6                 4.42                 0.009
Amazon network       Original network  1185         1000         4                 1.82                 0
Amazon network       EBSNM             938          1000         27                8.11                 0
DBLP network         Original network  914          1000         15                6.55                 0.377
DBLP network         EBSNM             926          1000         13                5.34                 0.003
YouTube network      Original network  968          1000         4                 2.215                0.041
YouTube network      EBSNM             937          1000         7                 3.78                 0.026
Email-Enron network  Original network  529          911          4                 2.87                 0.434
Email-Enron network  EBSNM             342          911          6                 3.27                 0.054
Web-Google network   Original network  467          967          4                 2.59                 0.66
Web-Google network   EBSNM             526          967          6                 3.56                 0.034

link communities and a fast algorithm based on LCC to detect the link community.

7.1. Edge connection strength and link community coefficient

Unlike nodes, two connected edges can be in one of two situations, shown in Fig. 8A and B. In the first, the two edges share one common node and their remaining end nodes are not connected. In the second, the edges are connected not only by the common node but also by a third edge that joins their remaining end nodes. Intuitively, the connection in the latter situation is closer than that in the former. The edge degree cannot distinguish between the two situations and is therefore no longer suitable for link community detection. Therefore, based on the edge degree, we propose the Edge co-Strength (EoS) to describe the strength of the relationship between two connected edges. For two connected edges, we assume that (1) the more edges the common node is connected with, the weaker the connection strength is; and (2) the connection strength is stronger when the two edges are connected by another edge than when they are not. Therefore, we give the expression of EoS as follows:

$$\mathrm{EoS}_{i-j,\,j-k} = \frac{1}{k_j} + \beta\, a_{ik}, \tag{11}$$

where $k_j$ is the degree of the common node $j$, $a_{ik}$ is the connection state between the two non-shared end nodes $i$ and $k$, and $\beta$ is a parameter that adjusts the influence of the case in which the two edges are connected by another edge. By default, we set $\beta = 1$. With the EoS of two edges, we can compute the edge connection strength (ECS) of one edge, given as $\mathrm{ECS}_{i-j} = \sum_k (\mathrm{EoS}_{i-j,\,i-k} + \mathrm{EoS}_{i-j,\,j-k})$. $\mathrm{ECS}_{i-j}$ refers to the total edge co-strength among all edges that are connected to edge $e_{i-j}$. An edge with a large ECS is likely to be connected with more edges that are themselves connected to each other. Fig. 8 shows an example of EoS and ECS. In Fig. 8A, $\mathrm{EoS}_{1-3,\,2-3} = 1/3$, and in Fig. 8B, $\mathrm{EoS}_{1-3,\,2-3} = 4/3$, which means that edges $e_{1-3}$ and $e_{2-3}$ are connected more closely in (B) than in (A).

In a manner similar to that used for modularity, we define the link community coefficient (LCC) in conjunction with the EoS and ECS to quantify the quality of a link community. The LCC is the difference between the actual EoS of two edges in a link community and the expected EoS of two edges in a link community in the EBSNM, expressed as follows:

$$\mathrm{LCC}_{i-j,\,j-k} = \frac{1}{\sum_{ij} \mathrm{ECS}_{i-j}} \left( \mathrm{EoS}_{i-j,\,j-k} - \frac{\mathrm{ECS}_{i-j} \times \mathrm{ECS}_{j-k}}{\sum_{ij} \mathrm{ECS}_{i-j}} \right) \delta\big(g(e_{i-j}), g(e_{j-k})\big), \tag{12}$$


Fig. 8. The two situations of connected edges and the ECS of the edges. (A) Edges e1−3 and e2−3 are connected only by node 3. (B) Edges e1−3 and e2−3 are connected not only by node 3 but also by edge e1−2, whose end nodes are the non-shared nodes of e1−3 and e2−3. (C) The ECS of the edges in (A) and (B). The existence of the single edge e1−2 causes a large difference in ECS between the two situations.

where $\delta(g(e_{i-j}), g(e_{j-k})) = 1$ when $e_{i-j}$ and $e_{j-k}$ are assigned to the same community and $\delta(g(e_{i-j}), g(e_{j-k})) = 0$ otherwise. With this formulation, we propose a fast algorithm for link community detection. The time complexity of the algorithm is about $O(n)$, which is much faster than the $O(n^3)$ of a previously proposed method based on link similarity [23], where $n$ refers to the number of edges.

7.2. Fast algorithm for link community detection

In the era of big data, the scale of networks is becoming increasingly large, especially in terms of edges. Thus, we propose a new fast algorithm for link community detection in large networks based on the link community coefficient, which we call LCCA (Link Community Coefficient Algorithm). This work is based on the work of V. D. Blondel. The steps of the algorithm are shown in Algorithm 2. The time complexity of LCCA is $O(N_e \times \xi(e)_{max})$, where $N_e$ refers to the number of edges and $\xi(e)_{max}$ denotes the maximum edge degree. For large networks, $\xi(e)_{max}$ is far smaller than the number of edges. Therefore, the time complexity of the algorithm can be approximated as $O(N_e)$, which is much faster than the $O(N_e^2)$ of a previously proposed method based on link similarity.

7.3. Experiments

To evaluate the performance of our algorithm, we compare LCCA with existing, classical algorithms for community detection. We chose two representative algorithms: the Edge Label Propagation Algorithm (ELPA), which is a state-of-the-art algorithm for link community detection, and the Link Similarity based Algorithm (LSA), which is the original and classical algorithm.
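As a concrete check of the EoS, ECS and LCC definitions of Section 7.1, the following minimal Python sketch evaluates them on a toy graph modeled on Fig. 8. It is an illustration under stated assumptions rather than the authors' implementation: the pendant edge (3, 4) is added only so that node 3 has degree 3 (as the value 1/3 in the text implies), the helper names are hypothetical, and summing Eq. (12) over all adjacent edge pairs to score a whole partition is an assumption made by analogy with modularity.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected, simple graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eos(adj, e1, e2, beta=1.0):
    """Edge co-strength of two edges that share exactly one node (Eq. 11)."""
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return 0.0  # the edges are not adjacent (or are the same edge)
    (j,) = shared
    (i,) = set(e1) - shared
    (k,) = set(e2) - shared
    closed = 1.0 if k in adj[i] else 0.0  # is there a third edge i-k?
    return 1.0 / len(adj[j]) + beta * closed


def ecs(adj, edges, edge, beta=1.0):
    """Edge connection strength: total EoS between `edge` and its adjacent edges."""
    return sum(eos(adj, edge, other, beta) for other in edges if other != edge)


def lcc_total(edges, membership, beta=1.0):
    """Sum Eq. 12 over all adjacent edge pairs placed in the same community.

    Assumption: the partition score is the plain sum of the pairwise terms.
    """
    adj = build_adjacency(edges)
    strength = {e: ecs(adj, edges, e, beta) for e in edges}
    total = sum(strength.values())
    if total == 0:
        return 0.0
    score = 0.0
    for e1, e2 in combinations(edges, 2):
        pair = eos(adj, e1, e2, beta)
        if pair == 0.0 or membership[e1] != membership[e2]:
            continue  # skip non-adjacent pairs and pairs split across communities
        score += (pair - strength[e1] * strength[e2] / total) / total
    return score


# Fig. 8-style toy graphs; the pendant edge (3, 4) is an assumption made so
# that node 3 has degree 3, as the value EoS = 1/3 in the text implies.
edges_a = [(1, 3), (2, 3), (3, 4)]
edges_b = edges_a + [(1, 2)]
adj_a, adj_b = build_adjacency(edges_a), build_adjacency(edges_b)

eos_a = eos(adj_a, (1, 3), (2, 3))  # 1/3: the edges meet only at node 3
eos_b = eos(adj_b, (1, 3), (2, 3))  # 4/3: edge 1-2 closes the triangle
print(eos_a, eos_b)

# Triangle edges in one community, pendant edge in another.
membership = {(1, 3): 0, (2, 3): 0, (1, 2): 0, (3, 4): 1}
print(lcc_total(edges_b, membership))
```

In this sketch, grouping the three triangle edges gives a positive score, whereas placing every edge in its own community gives zero, which is the behavior one would expect from a modularity-like quality function.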

Besides the algorithms, we also chose three measures for comparing the experimental results: (1) Community Quality; (2) Community Coverage; (3) Overlap Coverage [23]. These three measures were the first, and remain the best known, for measuring the quality of link communities. The quality measure is based on ground truth, and the coverage measures focus on the amount of information extracted from the network.

Community Quality: This measure is the accuracy of the experimental results compared with the real communities:

$$CQ = \frac{N_{right}}{N_e} \times 100\%, \tag{13}$$

where $N_{right}$ is the number of edges that are assigned to the right link communities, and $N_e$ is the number of edges.

Community Coverage: This measure refers to the fraction of nodes that belong to at least one community of three or more nodes. Some algorithms may treat a single edge as a link community, which is meaningless for large-scale networks; a size of three was chosen since it is the smallest nontrivial community.

Overlap Coverage: This measure is the average number of communities that nodes belong to. It shows how much information is extracted from the portion of the network that the particular algorithm was able to analyze:

$$OC = \frac{\sum_i NC_i}{N} \times 100\%, \tag{14}$$

where $NC_i$ is the number of communities that node $i$ belongs to and $N$ denotes the number of nodes.

To verify the effectiveness of our LCCA, we conduct two experiments on link community detection, in theoretical network models and in real-world networks. The network models contain the WS small-world network and the BR scale-free network in


Algorithm 2 Link Community Coefficient Algorithm (LCCA)
Require: The edge adjacency matrix of network G, $EM = (m_{i-j,\,x-y})_{M \times M}$.
Ensure: The resulting link communities, $C = \{C_1, C_2, ..., C_n\}$;
1: Initialize $EM^0 = EM$;
2: repeat
3:   Regard each edge in $EM^k$ as a community initially, $C_k = \{e_{1-2}, e_{1-3}, ..., e_{i-j}\}$. M denotes the total number of edges in $EM^k$;
4:   Traverse each edge $e_{i-j}$ to find all the edges connected with edge $e_{i-j}$. Compute the LCC increment $\Delta LCC$ of each neighboring edge of edge $e_{i-j}$ when it is added into the community of its neighbors;
5:   $\Delta LCC(i-j, j-k) = \frac{1}{\sum_{ij} \mathrm{ECS}_{i-j}} \times \left\{ \sum_{e_{j-k} \in C_j} [\mathrm{EoS}_{i-j,\,j-k} - P(e_{i-j}, e_{j-k})] - \sum_{k \in C_i} [\mathrm{EoS}_{i-j,\,j-k} - P(e_{i-j}, e_{j-k})] \right\}$, where $P(e_{i-j}, e_{j-k}) = \frac{\mathrm{ECS}_{i-j} \times \mathrm{ECS}_{j-k}}{\sum_{ij} \mathrm{ECS}_{i-j}}$ is the expected EoS of two edges in a link community in the EBSNM;
6:   Find the link community $C_k$ of edge $e_{i-j}$ with the maximum $\Delta LCC$. Add edge $e_{i-j}$ to community $C_k$;
7:   Update the set of communities $C_k$ after the aggregation of $C_{k-1}$ in step 6;
8:   Regard each link community as a new edge. The nodes within each link community can be regarded as the self-connection strength of the new edges; here, the self-connection strength is the sum of the EoS of the common nodes within the community to which the edge belongs. The nodes between two communities can be regarded as the common nodes of the two new edges. Generate a new network $EM^k$;
9: until $\Delta LCC < 0$ for all nodes;
10: return $C_k$;

Fig. 9. Comparison of link community detection between LCCA and LSA in BR scale-free networks (20 nodes and 19 edges). (A) The results of classical LSA. The algorithm could not detect any link communities in this network. (B) The results of LCCA. The network is divided into three link communities according to their star nodes. Each edge color represents a link community.

which the structure of both is simplified so that the quality of the results can be seen clearly. The real-world networks are the Amazon, DBLP and YouTube networks, which have ground truth, and the road network, in which edges have practical physical meaning.

We first compare LCCA and LSA in the theoretical network models. The results are shown in Figs. 9–11. From Fig. 9, we see that LSA cannot detect the link communities in the BR scale-free network model. The reason is that the stopping condition for the clustering, the partition density, requires that at least one pair of connected edges also be connected by another edge. The partition density is:

$$D = \frac{2}{N_e} \sum_c m_c \frac{m_c - (n_c - 1)}{(n_c - 1)(n_c - 2)}, \tag{15}$$

where $m_c$ refers to the number of edges in community $c$ and $n_c$ refers to the number of nodes in community $c$. If there is no cycle in a community, $m_c$ equals $n_c − 1$, so the partition density is always zero and cannot be used as the stopping condition in the single-linkage hierarchical clustering. Therefore, link communities cannot be detected by LSA in such networks. In contrast, our LCCA performs well, and four link communities are detected in the results. In Fig. 10, we notice that LSA does not perform well in the WS small-world network: 36 communities are detected in a network model that has only 60 edges. Intuitively, edges with such a structure should not be split into so many communities, whereas the result of our LCCA is quite reasonable.

Moreover, in Fig. 9, the running time of LSA is 0.032 s and that of LCCA is 0.03 s; with few edges, the time consumption of the two algorithms is similar. In Fig. 10, the running time of LSA is 0.42 s and that of LCCA is 0.23 s; as the number of edges increases, the time consumption of LSA becomes much higher than that of LCCA. In Fig. 11, the running time of LSA is 3203.88 s and that of LCCA is 26.229 s; with a large number of edges, the time consumption of LSA is too high to be acceptable. Therefore, as the scale of the network increases, the running time of our LCCA is far less than that of LSA, especially for large numbers of edges.

Fig. 10. Comparison of link community detection between LCCA and LSA in WS small-world networks (30 nodes and 60 edges). (A) The results of classical LSA. The edges are divided into several communities in a disordered way. (B) The results of LCCA. The network is divided into proper communities according to the neighboring nodes. Each edge color represents a link community.

The results of the comparison in real-world networks are shown in Fig. 12. The real-world networks are the Amazon, DBLP and YouTube networks, which have ground truth, and the road network, in which edges have practical physical meaning. We compared the performance of our algorithm (LCCA), the Edge Label Propagation Algorithm (ELPA) and the algorithm based on link similarity (LSA) according to three measures that are commonly used to assess the quality of link community detection. The comparison results shown in Fig. 12A indicate that our algorithm performs well for most of the networks considered. In the three networks, LCCA shows the best performance in community quality and community coverage. For overlap coverage, the three algorithms perform differently across the three networks because the ground-truth communities in the data do not naturally overlap. When comparing the three algorithms in the road network, in which edges represent roads, we find that our LCCA performs markedly better than the other two algorithms. The visualization of the comparison results is shown in Figs. 13–15. We see that our algorithm achieves a more globally optimized resolution of the link communities than the other two algorithms. This benefit stems from the rationality of the LCC based on the EBSNM.
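The partition density of Eq. (15) and the measures of Eqs. (13) and (14) are simple enough to sketch in a few lines of Python. The snippet below is only an illustration: the function names and the toy inputs are hypothetical, and treating a community with two nodes (a single edge) as contributing zero to the partition density is an assumed convention that avoids division by zero. It shows why a tree-like community, such as a star, always yields a partition density of zero.

```python
def partition_density(communities, n_edges):
    """Eq. (15). `communities` is a list of edge lists [(u, v), ...]."""
    total = 0.0
    for edges in communities:
        m_c = len(edges)
        n_c = len({node for edge in edges for node in edge})
        if n_c <= 2:
            continue  # single-edge community: contribution taken as 0 (assumption)
        total += m_c * (m_c - (n_c - 1)) / ((n_c - 1) * (n_c - 2))
    return 2.0 * total / n_edges


def community_quality(n_right, n_edges):
    """Eq. (13): percentage of edges assigned to the right link community."""
    return 100.0 * n_right / n_edges


def overlap_coverage(communities_per_node):
    """Eq. (14). `communities_per_node` maps node -> number of its communities."""
    return 100.0 * sum(communities_per_node.values()) / len(communities_per_node)


# A star with five leaves is a tree: m_c = n_c - 1, so D = 0.
star = [(0, leaf) for leaf in range(1, 6)]
d_star = partition_density([star], n_edges=len(star))

# A triangle is a clique, the densest possible community: D = 1.
triangle = [(1, 2), (2, 3), (1, 3)]
d_triangle = partition_density([triangle], n_edges=len(triangle))

print(d_star, d_triangle)
print(community_quality(n_right=750, n_edges=1000))
print(overlap_coverage({"a": 1, "b": 2, "c": 1}))
```

Any community without a cycle behaves like the star here, which matches the argument given for the BR scale-free network of Fig. 9.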


Fig. 11. Comparison of link community detection between LCCA and LSA in dense ER random networks (100 nodes and 1470 edges). (A) The results of classical LSA. (B) The results of LCCA. Each edge color represents a link community. Note that in a dense network (far more edges than nodes), LSA cannot detect any communities and therefore assigns the whole network to one community. LCCA detects 20 link communities, which we believe is a more acceptable result.

In particular, our algorithm gives notable results in the road network, where edges have physical meaning, as shown in Fig. 12B. We divide the network into several blocks instead of the disordered road links produced by the other two algorithms. These results verify that the EBSNM is useful for conducting quantitative comparisons of link community structure.

8. Motifs detection using EBSNM

Motifs, first proposed by R. Milo [36], are defined as patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks. Here, the randomized networks Milo used are the stochastic network model with the same node degree distribution as the original network. The model has the same single-node characteristics as the real network: each node in the randomized networks has the same number of edges as the corresponding node in the real network. The comparison with this randomized ensemble accounts for patterns that appear only because of the single-node characteristics of the network (e.g., the presence of nodes with a large number of edges). Only node characteristics are considered here, and the edge structures are ignored. However, edges are also significant elements in constructing motifs. We believe that identifying the significance of motifs is more reasonable if both node and edge characteristics are considered. Our EBSNM captures both edge and node characteristics because, when we constrain the edge degree, the two end nodes are also constrained. Therefore, we replace the node-based random network with the EBSNM to obtain a more reasonable significance of the motifs.
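To make the node-based randomized ensemble concrete, the sketch below rewires an undirected simple graph with double edge swaps, which keeps the degree of every node unchanged. This is one common way to generate such networks and is not necessarily the procedure used in [36]; the function names, the attempt limit and the toy graph are assumptions for illustration. It is also not the EBSNM itself, which constrains edge degrees rather than only node degrees.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize a simple undirected graph by double edge swaps.

    A swap replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every
    node keeps its degree. Swaps that would create a self-loop or a duplicate
    edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # hypothetical cap so the loop always ends
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


# Toy graph: a six-node ring with two chords.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3), (1, 4)]
rewired = degree_preserving_rewire(edges, n_swaps=20, seed=1)
print(degree_sequence(edges) == degree_sequence(rewired))
```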
In Milo's work, the relative importance of a motif was determined according to its Z-score, which is defined in terms of the number of appearances of the motif in the original network (N_orig) and the mean and standard deviation (SD) of its number of appearances in the random networks (N_rand ± SD), and is given as Z-score = (N_orig − N_rand)/SD. This Z-score is determined according to node-based random network models, which may eliminate the influence of single-node characteristics, but not that of edges. The edge-based random model provided by the EBSNM removes the influence of both single nodes and edges because the model considers both the edges and the nodes at both ends of each edge. Therefore, replacing node-based stochastic models with the EBSNM in the Z-score can be expected to improve the accuracy of motif detection.
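The Z-score computation itself is independent of which random model supplies the ensemble, which is what allows the node-based model to be swapped for the EBSNM. A minimal Python sketch follows, with the triangle as the example pattern; the function names and the ensemble counts are made up for illustration, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as an edge list."""
    adj = {}
    unique_edges = {tuple(sorted(e)) for e in edges}
    for u, v in unique_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in unique_edges) // 3


def z_score(n_orig, rand_counts):
    """Z-score = (N_orig - mean(N_rand)) / SD(N_rand)."""
    sd = pstdev(rand_counts)  # population SD; an assumption, see lead-in
    if sd == 0:
        return float("nan")  # undefined when the ensemble has no spread
    return (n_orig - mean(rand_counts)) / sd


# Pattern count in the original network (toy graph with one triangle).
n_orig_toy = count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)])

# Hypothetical pattern counts from four randomized networks. The same call
# would be used with counts from a node-based model or from the EBSNM.
z = z_score(17, [8, 10, 12, 10])
print(n_orig_toy, z)
```

In this sketch, a count from the original network is compared against whatever ensemble is passed in, so the node-based and edge-based scores differ only in the list of randomized counts supplied.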

Fig. 12. (A) The comparison results of link community detection in three networks with ground truth: Amazon network, DBLP network and YouTube network. The three algorithms are link community coefficient algorithm (LCCA), edge label propagation algorithm (ELPA) and link similarity algorithm (LSA). The three parameters are overlap coverage, community coverage and community quality. (B) The link community detection result of LCCA in road network. (C) The link community detection result of ELPA in road network. (D) The link community detection result of LSA in road network. The two blocks are divided into two different link communities at the connected intersection in (B). Instead, the roads are divided into a mess in (C) and (D) with meaningless communities.

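Table 2 shows the comparison results, where the NZ-Score is the Z-score computed against the node-based stochastic model and the EZ-Score is the Z-score computed against the EBSNM. In the road network, shown in Fig. 5A, the first pattern (accounting for 76%) has an NZ-Score of −3.03 and an EZ-Score of 7.64. If we judge the pattern by the NZ-Score, it may not be a motif. However, the frequent appearance of the first pattern is caused by the edge structure but not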


Fig. 13. Visualization of the comparison results of link community detection among LCCA, ELPA and LSA in the Amazon network (1000 edges). (A) The results of LCCA. (B) The results of ELPA. (C) The results of LSA. Each edge color represents a link community.

Fig. 14. Visualization of the comparison results of link community detection among LCCA, ELPA and LSA in the DBLP network. (A) The results of LCCA. (B) The results of ELPA. (C) The results of LSA. Each edge color represents a link community.

Fig. 15. Visualization of the comparison results of link community detection among LCCA, ELPA and LSA in the YouTube network. (A) The results of LCCA. (B) The results of ELPA. (C) The results of LSA. Each edge color represents a link community.

single-node characteristics, because there are few nodes whose degree is larger than four in road networks (most intersections connect four or fewer roads). The first pattern should be significant in road networks, as can be seen directly in the visualization shown in Fig. 5A. As a matter of common sense, the road chain is an important part of a road network. Therefore, the EZ-Score is more reasonable for edge-based networks in which the edges have physical meaning, because the edge structure is considered by the EBSNM.

For node-based networks, we see that the EZ-Scores widen the difference between the important patterns and the unimportant ones for most patterns. For example, in the DBLP network shown in Fig. 6A, the NZ-Scores of two patterns are 4.18 and −4.89 and the EZ-Scores are 6.14 and −5.63: for a high NZ-Score, the EZ-Score is higher, and for a low NZ-Score, the EZ-Score is lower. In the Email-Enron network, the NZ-Score of the first pattern is −3.73. However, there are many group emails from a leader to staff in real-world email networks. Therefore, the first pattern, which resembles a star network, should be identified as a motif. The reason why the pattern gets a low NZ-Score is that the high degree of the star node and the low degree of the leaf nodes are both captured in the node-based stochastic model, so the pattern necessarily appears many times in that model. However, when we use the EBSNM, the influence of nodes is relatively weakened because the edge structure is considered. Therefore, the significance of the pattern due to the natural node characteristics can be found through the EBSNM.

The results indicate that the EZ-Scores widen the difference between important and unimportant patterns and eliminate the influence caused by the edge structure, yielding more reasonable patterns, particularly for the road network, the DBLP network, and the Email-Enron network. The experiments demonstrate that the EBSNM is a much more efficient framework for representing the relative importance of motifs.

Table 2 The comparison results of motif detection in seven real-world networks.

9. Conclusion and future work

While prior research focusing on link communities and motifs has noted that edges play a significant role in the analysis of complex networks, the lack of efficient edge-based tools has restricted efforts toward an accurate analysis of edges. The results reported in the preceding sections demonstrate that both basic and complex structural characteristics of edges in networks are revealed by the EBSNM. In the statistical analysis, both the network diameters and the average path lengths of the EBSNMs are similar to those of the original networks, which verifies that the EBSNM preserves the basic edge structure of the original network. This suggests that the EBSNM would be useful for predicting network evolution, and for conducting quantitative comparisons of complex structures. In addition, the relatively small average clustering coefficient obtained for the EBSNM indicates that the EBSNM tends to break the community structure of the original network. The simplified community structure of the EBSNM provides another possibility for detecting the link community of a network. For link community detection and motif detection, the advantageous performance shown in the experimental results verifies that the EBSNM is useful for conducting quantitative comparisons of the complex structural characteristics of edges. This indicates that the EBSNM is a fundamental model that helps in understanding the complex structure of edges, which is hard to quantify in complex networks.

In a more general sense, the EBSNM is a general stochastic model for any complex system, such as the social networks utilized above. We developed the EBSNM using the basic configuration method based on the edge degree. The rationality of the EBSNM can also be explained by traditional random-walk theory. The model can be built without any additional conditions, according to the basic edge degree alone, and can be


used in directed networks based on the in- and out-edge degrees. The community detection and motif detection methods we proposed could be widely used in many real-life areas, such as social science, Internet topology, bioscience, and engineering. For example, link community detection may help in understanding the relationships among people in social networks [48], among AS routes in the AS-topology network [49], and among proteins in metabolic networks [50]. Motifs can help extract and analyze the core substructures of networks, such as traffic networks [37]. Because our model addresses general complex network theory, it could be used in many areas, with or without modification.

With the development of edge-based research, we believe that many other complex edge properties can be revealed through the model, such as salient links, the propagation rate of edges, edge-degree correlations and synchronization-state stability for edges, which have already been shown to be important in node-based research. Our future work is based on such extensions of our EBSNM, which may raise problems involving the applications of all edge-based research.

Finally, our EBSNM provides a unique and productive edge-based framework for conducting network science research. The model preserves the basic structure of edges in networks based on the edge degree distribution. Therefore, complex edge properties of a network that are not captured by the model can be revealed through comparison with the EBSNM. With appropriate algorithms, the model could be effective in dealing with problems in complex networks, such as link community detection. We believe that the EBSNM can give rise to much stronger and more general applications in many edge-based areas, including social science, Internet topology, bioscience, engineering, economics, and education.
To accomplish this, much more work needs to be done to gain a deeper understanding of the model, such as studying the high-order edge degree distribution [28] and determining the edge-degree distribution law. We hope that many more attributes of edges can be modeled and analyzed through edge-based stochastic network models.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61471101, No. 61301274 and No. 61571094). The data sets used to obtain the results in this manuscript are available at http://snap.stanford.edu/data.

Conflict of interest statement

None.

Declaration of conflicting interests

The authors declare that they had no conflicts of interest with respect to their authorship or the publication of this article.

References

[1] S.V. Buldyrev, R. Parshani, G. Paul, H.E. Stanley, S. Havlin, Catastrophic cascade of failures in interdependent networks, Nature 464 (7291) (2010) 1025.
[2] P.J. Mucha, T. Richardson, K. Macon, M.A. Porter, J.-P. Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science 328 (5980) (2010) 876–878.
[3] F. Morone, H.A. Makse, Influence maximization in complex networks through optimal percolation, Nature 524 (7563) (2015) 65.
[4] S.G.A. Brito, L.R.D. Silva, C. Tsallis, Role of dimensionality in complex networks: Connection with nonextensive statistics, Physics 6 (2015).


[5] D. Leitold, Á. Vathy-Fogarassy, J. Abonyi, Controllability and observability in complex networks–the effect of connection types, Sci. Rep. 7 (1) (2017) 151.
[6] D. Li, X. Wang, P. Huang, A fractal growth model: Exploring the connection pattern of hubs in complex networks, Physica A 471 (2017) 200–211.
[7] K. Börner, S. Sanyal, A. Vespignani, Network science, Annu. Rev. Inf. Sci. Technol. 41 (1) (2007) 537–607.
[8] M.E. Newman, The structure and function of complex networks, SIAM Rev. 45 (2) (2003) 167–256.
[9] A.M. Feist, C.S. Henry, J.L. Reed, M. Krummenacker, A.R. Joyce, P.D. Karp, L.J. Broadbelt, V. Hatzimanikatis, B.Ø. Palsson, A genome-scale metabolic reconstruction for escherichia coli k-12 mg1655 that accounts for 1260 orfs and thermodynamic information, Mol. Syst. Biol. 3 (1) (2007) 121.
[10] D.M. Boyd, N.B. Ellison, Social network sites: Definition, history, and scholarship, J. Comput.-Mediat. Commun. 13 (1) (2007) 210–230.
[11] P. Erdös, A. Rényi, On random graphs, i, Publ. Math. (Debrecen) 6 (1959) 290–297.
[12] D.J. Watts, S.H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature 393 (6684) (1998) 440.
[13] A.-L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (5439) (1999) 509–512.
[14] M.E. Newman, S.H. Strogatz, D.J. Watts, Random graphs with arbitrary degree distributions and their applications, Phys. Rev. E 64 (2) (2001) 026118.
[15] M.E. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69 (2) (2004) 026113.
[16] X. Zhai, W. Zhou, G. Fei, W. Liu, Z. Xu, C. Jiao, C. Lu, G. Hu, Null model and community structure in multiplex networks, Sci. Rep. 8 (1) (2018) 3245.
[17] W.E. Schlauch, K.A. Zweig, Influence of the null-model on motif detection, in: Advances in Social Networks Analysis and Mining (ASONAM), 2015 IEEE/ACM International Conference on, IEEE, 2015, pp. 514–519.
[18] J.G. Foster, D.V. Foster, P. Grassberger, M. Paczuski, Edge direction and the structure of networks, Proc. Natl. Acad. Sci. 107 (24) (2010) 10815–10820.
[19] E. Estrada, S. Meloni, M. Sheerin, Y. Moreno, Epidemic spreading in random rectangular networks, Phys. Rev. E 94 (5) (2016) 052316.
[20] X. Nian, H. Fu, Efficient routing on two layer degree-coupled networks, Physica A 410 (2014) 421–427.
[21] W. Ulrich, N.J. Gotelli, Pattern detection in null model analysis, Oikos 122 (1) (2013) 2–18.
[22] G.M. Coclite, M. Garavello, B. Piccoli, Traffic flow on a road network, SIAM J. Math. Anal. 36 (6) (2005) 1862–1886.
[23] Y.-Y. Ahn, J.P. Bagrow, S. Lehmann, Link communities reveal multiscale complexity in networks, Nature 466 (7307) (2010) 761.
[24] D. Grady, C. Thiemann, D. Brockmann, Robust classification of salient links in complex networks, Nat. Commun. 3 (2012) 864.
[25] A. Clauset, C. Moore, M.E. Newman, Hierarchical structure and the prediction of missing links in networks, Nature 453 (7191) (2008) 98.
[26] B. Karrer, M.E. Newman, Stochastic blockmodels and community structure in networks, Phys. Rev. E 83 (1) (2011) 016107.
[27] M.E. Newman, Modularity and community structure in networks, Proc. Natl. Acad. Sci. 103 (23) (2006) 8577–8582.
[28] P. Mahadevan, D. Krioukov, K. Fall, A. Vahdat, Systematic topology analysis and generation using degree correlations, in: ACM SIGCOMM Computer Communication Review, Vol. 36, ACM, 2006, pp. 135–146.
[29] T. Valles-Catala, F.A. Massucci, R. Guimera, M. Sales-Pardo, Multilayer stochastic block models reveal the multilayer structure of complex networks, Phys. Rev. X 6 (1) (2016) 011036.
[30] N. Stanley, S. Shai, D. Taylor, P.J. Mucha, Clustering network layers with the strata multilayer stochastic block model, IEEE Trans. Netw. Sci. Eng. 3 (2) (2016) 95–105.
[31] W. Ren, J. Wu, X. Zhang, R. Lai, L. Chen, A stochastic model of cascading failure dynamics in communication networks, IEEE Trans. Circuits Syst. II Exp. Briefs 65 (5) (2018) 632–636.
[32] Y. Kim, H. Jeong, Map equation for link communities, Phys. Rev. E 84 (2) (2011) 026110.
[33] P. Liakos, A. Ntoulas, A. Delis, Scalable link community detection: A local dispersion-aware approach, in: 2016 IEEE International Conference on Big Data (Big Data), IEEE, 2016, pp. 716–725.
[34] W. Liu, X. Jiang, M. Pellegrini, X. Wang, Discovering communities in complex networks by edge label propagation, Sci. Rep. 6 (2016) 22470.
[35] H. Sun, J. Liu, J. Huang, G. Wang, X. Jia, Q. Song, Linklpa: A link-based label propagation algorithm for overlapping community detection in networks, Comput. Intell. 33 (2) (2017) 308–331.

[36] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon, Network motifs: simple building blocks of complex networks, Science 298 (5594) (2002) 824–827.
[37] A.R. Benson, D.F. Gleich, J. Leskovec, Higher-order organization of complex networks, Science 353 (6295) (2016) 163–166.
[38] A.-L. Barabási, Network Science, Cambridge University Press, 2016.
[39] M. Molloy, B. Reed, A critical point for random graphs with a given degree sequence, Random Structures Algorithms 6 (2–3) (1995) 161–180.
[40] J.D. Noh, H. Rieger, Random walks on complex networks, Phys. Rev. Lett. 92 (11) (2004) 118701.
[41] R. Lambiotte, J.C. Delvenne, M. Barahona, Laplacian dynamics and multiscale modular structure in networks, Physics (2008).
[42] J. Leskovec, A. Krevl, SNAP Datasets: Stanford Large Network Dataset Collection, 2015.
[43] A. Hagberg, P. Swart, D. Schult, Exploring network structure, dynamics, and function using networkx, Tech. rep., Los Alamos National Lab. (LANL), Los Alamos, NM (United States) (2008).
[44] J. Xu, A. Kumar, X. Yu, On the fundamental tradeoffs between routing table size and network diameter in peer-to-peer networks, IEEE J. Sel. Areas Commun. 22 (1) (2004) 151–163.
[45] A. Fronczak, P. Fronczak, J.A. Hołyst, Average path length in random networks, Phys. Rev. E 70 (5) (2004) 056110.
[46] J. Wang, M. Li, H. Wang, Y. Pan, Identification of essential proteins based on edge clustering coefficient, IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 9 (4) (2012) 1070–1080.
[47] M. Bastian, S. Heymann, M. Jacomy, Gephi: an open source software for exploring and manipulating networks, in: Third International AAAI Conference on Weblogs and Social Media, 2009.
[48] B. Wellman, The network community: An introduction, in: Networks in the Global Village, Routledge, 2018, pp. 1–47.
[49] H. Hu, W. Liu, G. Fei, S. Yang, G. Hu, A novel method for router-to-as mapping based on graph community discovery, Information 10 (3) (2019) 87.
[50] R. Baldick, B. Chowdhury, I. Dobson, Z. Dong, B. Gou, D. Hawkins, H. Huang, M. Joung, D. Kirschen, F. Li, et al., Initial review of methods for cascading failure analysis in electric power transmission systems ieee pes cams task force on understanding, prediction, mitigation and restoration of cascading failures, in: Power and Energy Society General Meeting-Conversion and Delivery of Electrical Energy in the 21st Century, 2008 IEEE, IEEE, 2008, pp. 1–8.

Xuemeng Zhai was born in 1991. He received his bachelor's degree from the University of Electronic Science and Technology of China (UESTC) in 2014 and is currently pursuing a Ph.D. in the School of Communication and Information Engineering (SCIE), UESTC. His research interests include network science, complex networks, community detection, and stochastic network models.

Wanlei Zhou received the B.Eng. and M.Eng. degrees from Harbin Institute of Technology, Harbin, China, in 1982 and 1984, respectively, and the Ph.D. degree from The Australian National University, Canberra, Australia, in 1991, all in Computer Science and Engineering. He also received a D.Sc. degree from Deakin University in 2002. He is currently the Head of the School of Software at the University of Technology Sydney, Australia. He was an Alfred Deakin Professor and Chair of Information Technology at Deakin University. Prof. Zhou has published more than 300 papers in refereed international journals and refereed international conference proceedings. His research interests include distributed systems, network security, and privacy preservation. He has chaired many international conferences and has been invited to deliver keynote addresses at many international conferences. He is a Senior Member of the IEEE.

Gaolei Fei was born in Zhejiang Province, PR China, in 1982. He received his B.S. and Ph.D. degrees from the School of Information and Communication Engineering, University of Electronic Science and Technology of China (UESTC), Sichuan Province, PR China, in 2006 and 2012, respectively. From 2010 to 2011, he was a graduate research trainee at McGill University. He is now an associate professor in the School of Information and Communication Engineering, UESTC, PR China. His main interests include network topology measurement, social network analysis, and social media text mining.

Cai Lu was born in Sichuan Province, PR China, in 1975. He received his Ph.D. degree from the School of Information and Communication Engineering, University of Electronic Science and Technology of China (UESTC), Sichuan Province, PR China, in 2007. From 2014 to 2015, he was a visiting scholar at Purdue University. He is now an associate professor in the School of Information and Communication Engineering, UESTC, PR China. His main interests include signal processing, visualization, and visual analysis.

Sheng Wen received his Ph.D. degree from Deakin University, Melbourne, in October 2014. He has been working full-time as a senior lecturer at Swinburne University of Technology since October 2017. Before that, he served as a research fellow and then became a Lecturer in Computer Science in the School of Information Technology at Deakin University. Dr. Wen’s research interests include system security and social media analysis.

Guangmin Hu was born in Sichuan Province, PR China, in 1966. He received his B.S. degree from the Department of Computer Science, Nanjing University, China, in 1986, and his M.S. and Ph.D. degrees from Chengdu University of Technology, Sichuan, PR China, in 1992 and 2000, respectively. From 2000 to 2003, he was a postdoctoral researcher in the SCIE, UESTC. From 2002 to 2003, he was a visiting scholar at Hong Kong Polytechnic University. He is now a Professor in the SICE, UESTC. His current research interests include computer networks and signal processing.