Leaders in communities of real-world networks



Accepted Manuscript

Leaders in communities of real-world networks

Jing-Cheng Fu, Jian-Liang Wu, Chuan-Jian Liu, Jin Xu

PII: S0378-4371(15)00834-1
DOI: http://dx.doi.org/10.1016/j.physa.2015.09.091
Reference: PHYSA 16470

To appear in: Physica A

Received date: 21 December 2014
Revised date: 11 August 2015

Please cite this article as: J.-C. Fu, J.-L. Wu, C.-J. Liu, J. Xu, Leaders in communities of real-world networks, Physica A (2015), http://dx.doi.org/10.1016/j.physa.2015.09.091

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Leaders in Communities of Real-World Networks∗

Jing-Cheng Fu¹, Jian-Liang Wu¹†, Chuan-Jian Liu¹, Jin Xu¹

¹ School of Mathematics, Shandong University, Jinan, 250100, China

Abstract

Community structure has an important influence on the properties and dynamics of complex networks. However, to the best of our knowledge, little attention has been given in the literature to the internal structure of communities. In this paper, we study the community structures of more than twenty existing networks using ten commonly used community-detection methods, and discover that most communities have several leaders whose degrees are particularly large. We use a statistical parameter, the variance of node degrees, to classify communities as leader communities or self-organized communities. In a leader community, we define the 10% of nodes with the largest degrees as its leaders. In our experiments, removing the leaders reduces a community's internal edges by more than 40% and its inter-community edges by more than 20% on average. In addition, the community's average clustering coefficient decreases. These facts suggest that leaders play an important role in keeping communities dense and clustered, and that it is the leaders that are more likely to link to other communities. Moreover, similar results are obtained for several random networks, and a theoretical lower bound on the fraction of internal edges lost is given. Our study sheds light on the understanding and application of internal community structure in complex networks.

Keywords: small-world network, community detecting, leader community, clustering coefficient.

1 Introduction

Network science has seen many developments in recent decades because many real-world networks were found to have a variety of topological structures [1, 5, 9]. The small-world network model of Watts and Strogatz (WS) [12] and the emergence of scale-free networks [4] have made complex networks unprecedentedly popular in recent years. Data analysis has revealed the scale-free property in social [8, 11], biological [3, 6, 7], technological [2, 10], and many other types of networks. Some authors have also found that the sizes of communities detected by maximizing modularity obey a power law [13, 17]. In [9], four categories of networks were studied: social, information, technological, and biological networks. Since the number of co-authored academic papers is increasing exponentially, scientific collaboration networks have become an important subject of study.

∗ This work is supported by the National Nature Science Foundation of China (11271006), Independent Innovation Foundation of Shandong University (IFYT 14013), Shandong Provincial Natural Science Foundation (ZR2012GQ002), Shandong Provincial Natural Science Foundation (ZR2014AQ001).
† Corresponding author. E-mail address: [email protected].

Many early studies have shown that one important property of complex networks is community structure [14, 15]: groups of vertices that are more densely connected to each other than to the rest of the network. These studies follow two main directions: one examines the structure and dynamic characteristics of communities; the other develops community detection algorithms. Few researchers have focused on the internal structure of communities or on the correlation between different communities.

In a recent work of the third author [16], the internal structure of communities was studied in detail, and the coexistence of two organization modes within communities of real-world networks was investigated. One is the leader community, in which some nodes have much higher degrees than the others; the other is the self-organizing community, in which all nodes have similar degrees. The two kinds of communities are distinguished by the variance of node degrees within each community. Both kinds were observed to exist widely in social and technical networks, where they play different but important roles. In social networks, for example, people sometimes work around several leaders and at other times work as equals. Individuals in a leader community do not have equal status, in contrast to individuals in a self-organizing community.
This opens up a new line of research: when detecting communities, we should consider their internal structure and the interaction between different communities, aspects that previous algorithms did not consider. In real life, self-organized communities are typically small. It is the leaders that connect almost all the units in a community, and nodes in different communities commonly communicate with each other through the leaders. Leader communities are therefore more common in our society. Moreover, we should not only divide large networks into communities but also identify the kind of each community. To find leader communities, we need to understand the topological role that leaders play in a community.

In this paper, we first detect the community structure of more than 20 real-world networks using 10 commonly used methods. We find that most communities have some leaders whose degrees are especially large. For a community, we define the 10% of nodes with the largest degrees as its leaders; that is, we take ⌈|V_i|/10⌉ as the number of leaders in a community of size |V_i|. When the leaders are removed, each community loses more than 40% of its internal edges and more than 20% of its inter-community edges on average, and its average clustering coefficient decreases. This means that leaders play an important role in keeping communities dense and clustered, and that it is the leaders that are more likely to link to other communities. For a community in a citation network, if there are excellent papers, the topic will attract more researchers and the citation ties will be denser. Moreover, the leaders are more likely to cite or be cited by papers on other topics, because the leaders are often instrumental or introduce new techniques from other topics. Also, after the leaders are removed, the set of nodes with high betweenness in a community does not remain unchanged. That is, in a social network, if the leaders are removed, the group of people with high betweenness centrality changes. This work may shed light on the internal structure of communities in real-world networks and thereby help to find better community detection methods.

The paper is organized as follows. In Section 2, we briefly review Liu's work [16], in which two kinds of communities in real-world networks were introduced. According to our analysis, the leader community is more ubiquitous in real-world networks, so we concentrate mainly on this kind of community. In Section 3, we analyze in detail the role of leaders in the communities of one particular network. The network is Newman's scientific collaboration network [30], which we divide into communities using the BGLL method [20]. We then observe how several properties of the communities change after some nodes are removed from each community: the number of internal edges, the average clustering coefficient, the number of inter-community edges, and the betweenness of nodes. Similar results for random networks are presented in Section 4, along with a theoretical lower bound on the percentage of internal edges lost after the leaders are removed. In Section 5, we show the results for 20 other networks whose communities are detected by 10 algorithms. In Section 6, we present our conclusions and discussion.

2 Leader Community and Self-Organized Community

A community in a scientific collaboration network is like a research group in a field of study [30]. In a research group, there are some popular scientists with whom people prefer to collaborate, and others who are less popular. The communities in a scientific collaboration network therefore have some hub nodes as leaders, and we call these communities "leader communities". In the work of Liu [16], communities were distinguished by the variance of their nodes' degrees: a leader community has a larger variance, while a self-organized community has a smaller variance. Let C be a community and N_C = {1, 2, ..., k} be the set of nodes in C, and let d_i denote the degree of node i. The expectation of the degree is

$$Ex(C) = \frac{\sum_{i=1}^{k} d_i}{k}, \tag{1}$$

and the variance of the degrees of the nodes in community C is

$$Var(C) = Ex(C^2) - Ex^2(C), \tag{2}$$

where Ex(C^2) denotes the mean of the squared degrees d_i^2. In this paper, if Var(C) > 1, then C is a leader community. Otherwise, C is a self-organized community.
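As an illustration of the criterion, the following Python sketch computes Var(C) from a list of node degrees and applies the Var(C) > 1 rule. The function names are hypothetical, and whether d_i counts all edges of node i or only those inside C is an assumption left to the caller, since the text does not specify it.

```python
def degree_variance(degrees):
    """Var(C) = Ex(C^2) - Ex(C)^2 over the degrees of one community."""
    k = len(degrees)
    mean = sum(degrees) / k
    mean_sq = sum(d * d for d in degrees) / k
    return mean_sq - mean * mean


def classify_community(degrees, threshold=1.0):
    """Label a community with the Var(C) > 1 rule stated in the text."""
    if degree_variance(degrees) > threshold:
        return "leader"
    return "self-organized"


star = [4, 1, 1, 1, 1]   # one hub joined to four leaves
ring = [2, 2, 2, 2, 2]   # every node has the same degree
print(classify_community(star), classify_community(ring))
```

For the star, Ex(C) = 1.6 and Ex(C^2) = 4, so Var(C) = 1.44 and the community counts as a leader community; for the ring the variance is 0.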

Fig. 1. (Color online) The community structure of the Karate-club network and the Football network detected by the Newman fast method. Different colors represent different communities.

In the communities of Figure 1 (a), there are one or two leaders that link to almost all the nodes in their communities, so both communities of the Karate-club network are leader communities. In Figure 1 (b), by contrast, the nodes in each community have similar degrees, which suggests that all the teams in a community are equal in status, so these are self-organized communities. The Football network is generated in accordance with game rules, so its community structure differs from that of natural networks.

The Var(C) criterion is not perfect: a community consisting of one K_9 clique and two or three nodes of degree 2 has a high Var(C) value although it has no clear leaders. However, this kind of community rarely exists in real life, so we still use this metric to distinguish communities. Most of the communities in Table 1 are leader communities, in which leaders link to almost all the nodes in their communities. Furthermore, the self-organized communities are usually small. We discuss the role of leaders in communities next.

3 Leaders in Communities of Several Real-World Networks

We define a network as a graph G(V, E), where V is the set of vertices and E is the set of edges. The numbers of vertices and edges of G(V, E) are denoted by |N(G)| and |E(G)|, and we write G for G(V, E) for simplicity. Our first network is the largest connected component of a scientific collaboration network, with |N(G)| = 13861 and |E(G)| = 89238. In this network, two authors A and B are linked by an edge if they have written at least one paper together. We first apply the BGLL method and obtain 57 communities. The size of each community is given in Figure 2, in which the communities are labeled in descending order of size. We also find that small communities are more numerous than large ones.

Network               Nodes   Edges    Communities   Leader community   Self-organizing community
Karate-club           34      78       2 | 2         2 | 2              0 | 0
Football              115     616      10 | 9        0 | 0              10 | 9
Dolphins              62      159      3 | 4         2 | 3              1 | 1
Celegans              296     2359     5 | 7         4 | 6              1 | 1
Netscience coauthor   379     914      22 | 18       22 | 18            0 | 0
Power grid            4941    6594     36 | 37       36 | 37            0 | 0
Arnet citation        39357   112832   219 | 182     216 | 171          3 | 11
Internet data         22963   48436    56 | 27       56 | 27            0 | 0
GN benchmark          124     1076     4 | 4         0 | 0              4 | 4
LFR benchmark         150     902      5 | 5         5 | 5              0 | 0

Table 1: Community structures of ten networks. The two numbers in each of the last three columns are the numbers of communities detected by the Newman fast method and the BGLL algorithm, respectively. The Netscience coauthor network and HET-coauthor network have many components; here we choose the largest component. The community structures of the GN benchmark and the LFR benchmark are completely different from each other. The parameters of the GN and LFR benchmarks are the defaults.

Fig. 2. (Color online) Sizes of the communities. The horizontal axis is the label of the community and the vertical axis is its size.

We calculate the degree variance of each community and note that all the variances are larger than 15 except one, which is still larger than 1. According to the definition, all communities are leader communities. In a community, the nodes whose degrees are in the top 10% are defined as leaders.


Fig. 3. (Color online) Degree variance of each community.

We then test the leaders' roles by comparing each community after some of its large-degree nodes are removed with the original community. First we observe the change in the number of internal edges: we remove the 10%, 20% and 30% of nodes with the largest degrees from each community in turn and record the percentage of internal edges lost. In the 10% case, if a community has fewer than 10 nodes, we remove only the node of largest degree. Next we observe the change in the community's average clustering coefficient, defined as

$$C = \frac{3 N_{ta}}{N_{tp}}, \tag{3}$$

where N_ta is the number of triangles in the community and N_tp is the number of triples in the community.
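A minimal Python sketch of equation (3) follows. It assumes that a "triple" means a connected triple (a node together with two of its neighbours) and that the community is given as a hypothetical adjacency dictionary restricted to internal edges; neither detail is stated explicitly in the text.

```python
from itertools import combinations


def clustering_coefficient(adj):
    """C = 3 * N_ta / N_tp for one community.

    adj maps each node to the set of its neighbours inside the community.
    """
    triples = 0   # connected triples, counted once per centre node
    closed = 0    # triples whose two end nodes are also adjacent
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    # Each triangle closes three triples (one per centre), so closed == 3 * N_ta.
    return closed / triples if triples else 0.0


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(example))
```

The example has one triangle and five connected triples, so the function returns 3/5 = 0.6.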


Fig. 4. (Color online) In most communities, the number of internal edges is reduced by more than 40% when the 10% largest-degree nodes are removed. In some communities, as many as 60% of the internal edges are removed. When the 20% and 30% largest-degree nodes are removed, the loss of internal edges reaches up to 65% and 75%. These numbers are independent of the sizes of the communities.

As expected, most communities have a small number of large-degree nodes. The data in Figure 4 indicate that the leaders make a community tighter. The fluctuation of the curves shows that this property is more pronounced in some communities than in others. Next we choose some representative communities and investigate their structures.
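The removal experiment can be sketched in Python as follows. The sketch assumes that leaders are ranked by their degree inside the community and that the number of leaders is the ceiling of the chosen percentage of the community size; the adjacency-dictionary layout and the function names are illustrative, not the authors' code.

```python
def pick_leaders(adj, percent=10):
    """Return the ceil(percent% of |V|) nodes of largest degree."""
    count = -(-len(adj) * percent // 100)   # integer ceiling, avoids float rounding
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    return set(ranked[:count])


def internal_edge_loss(adj, removed):
    """Fraction of internal edges that touch at least one removed node."""
    total = sum(len(nbrs) for nbrs in adj.values()) // 2
    kept = sum(1 for v, nbrs in adj.items() if v not in removed
               for u in nbrs if u not in removed) // 2
    return (total - kept) / total if total else 0.0


# A hub (node 0) joined to nine leaves, plus one edge between leaves 1 and 2.
community = {0: set(range(1, 10))}
for leaf in range(1, 10):
    community[leaf] = {0}
community[1].add(2)
community[2].add(1)

leaders = pick_leaders(community)
print(leaders, internal_edge_loss(community, leaders))
```

In this toy community the single leader is the hub, and removing it deletes nine of the ten internal edges, a loss of 0.9. A community with fewer than 10 nodes still gets one leader, which matches the rule stated for the 10% case.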


Fig. 5. (Color online) (a) The 8th community, which has 379 nodes. (b) The result after 40 nodes are removed. In this case, more than 60% of the internal edges are removed.

Figure 5 shows a typical leader community. Removing the leaders clearly leads to the emergence of a multitude of isolated nodes. This example removes only 10% of the nodes; the effect is stronger still in the 20% and 30% cases. Figure 6 shows a community in which the node degrees differ less, so the leaders are less distinct.

Fig. 6. (Color online) (a) The 34th community, which has 186 nodes. (b) The result after 19 nodes are removed. In this case, more than 45% of the internal edges are removed.

The variance of the 8th community is 133.358191603, while that of the 34th community is 68.0050872933. We choose these two communities because they are not too small. Both are leader communities, since, following the previous work, a leader community has a larger variance while a self-organized community has a smaller one. We conclude that most communities are leader communities, and that the leaders make the communities denser.

Next we compare the average clustering coefficients of the communities after some nodes are removed with those of the original communities. According to Figure 7, the leaders in a community enhance its average clustering coefficient: when the leaders are removed, the clustering coefficients of all communities decrease, though by different amounts. Figure 8 shows some communities chosen as examples of these changes.


Fig. 7. (Color online) The upper three charts show the average clustering coefficient of each community: the blue line before nodes are removed and the red line after. The charts at the bottom show the difference between the two states; the average clustering coefficients of the original communities are larger.


Fig. 8. (Color online) (a) The 13th community, which has 347 nodes; (b) the result after 35 nodes are removed. (c) The 51st community, which has 106 nodes; (d) the result after 11 nodes are removed. The clustering coefficients of both communities decrease by about 0.25.

Clearly, leaders make communities more clustered. That is, leaders in a research group make the group more cohesive. Next, in Figure 9, we look at the community whose average clustering coefficient decreases least after the leaders are removed; its label is 40.

Fig. 9. (Color online) (a) The 40th community, whose average clustering coefficient decreases by 0.0075 when the 10% largest-degree nodes are removed.

Fig. 10. (Color online) (a) The Dolphins network and (b) Zachary's karate club network. Different node colors represent different communities.

In Figure 9 (a), the leaders are involved in many triangles, but far fewer than the number of possible triples. Moreover, many triangles are not destroyed, so the average clustering coefficient of the community barely changes.

We also study the role of leaders between communities and observe that a very large proportion of the edges between communities are attached to leaders, as the bold edges in Zachary's karate club network [18] and the dolphins network [19] show. Each bold edge in Figure 10 has at least one endpoint that is a leader in its community. Note that in (b), the bold edges account for 70% of the edges between the two communities. That is, the links between communities depend on the leaders to some extent. In a scientific collaboration network, this means that a scientist who wants to collaborate with someone in a different group tends to choose the leading scientists of that group, or, more commonly, that excellent scientists prefer to collaborate with scientists in different research groups.
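The proportion of inter-community edges that are attached to leaders can be measured with a short Python sketch like the one below. The edge list, the `membership` dictionary (node to community label) and the `leaders` set are hypothetical inputs chosen for illustration.

```python
def leader_share_of_bridges(edges, membership, leaders):
    """Fraction of inter-community edges with at least one leader endpoint."""
    bridges = [(u, v) for u, v in edges if membership[u] != membership[v]]
    if not bridges:
        return 0.0
    touching = sum(1 for u, v in bridges if u in leaders or v in leaders)
    return touching / len(bridges)


edges = [("a", "b"), ("a", "c"), ("a", "x"), ("b", "y"), ("x", "y"), ("x", "z")]
membership = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
leaders = {"a", "x"}
print(leader_share_of_bridges(edges, membership, leaders))
```

Here the two inter-community edges are a-x and b-y, and only a-x touches a leader, so the share is 0.5.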


Fig. 11. (Color online) Percentage of inter-community edges removed in each community.

From Figure 11, we see that the communities' inter-community edges are reduced by more than 40% after the 10% largest-degree nodes of each community are removed. To show the role of leaders more clearly, we remove the edges between communities that are not attached to any leader (the 10% largest-degree nodes of each community) and shrink each community to a single node. The resulting network is called the contractible graph of the network.
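The construction of the contractible graph described above can be sketched in Python as follows. The function name and the input layout (an edge list, a node-to-community dictionary and a set of leaders) are assumptions made for this illustration; parallel edges between the same pair of communities are merged into one.

```python
def contract_by_leaders(edges, membership, leaders):
    """Keep inter-community edges that touch a leader, then merge each
    community into a single node.

    Returns the edge set of the contracted graph as frozensets {c1, c2}.
    """
    contracted = set()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu != cv and (u in leaders or v in leaders):
            contracted.add(frozenset((cu, cv)))
    return contracted


edges = [("a1", "a2"), ("a1", "b1"), ("a2", "b2"), ("b2", "c1"), ("a2", "c2")]
membership = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
leaders = {"a1", "c1"}
print(contract_by_leaders(edges, membership, leaders))
```

In the example, communities A and B stay linked through leader a1 and B and C through leader c1, while the A-C link disappears because its only edge (a2, c2) touches no leader.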

Fig. 12. (Color online) The contractible graph of Newman's scientific collaboration network, in which each node represents a community of the network. The size of a node is proportional to the size of the community.

In Figure 12, there are 57 nodes, and the graph is connected. Considering its specificity, the contractible graph is a good description of the network. The average clustering coefficient of the graph is 0.618, and the average shortest path length is 1.5645363408521302, so it is a small-world network.

Fig. 13. (Color online) Betweenness of nodes in four rather large communities. The 1st and the 14th communities have high degree variances, and their sizes are 613 and 344, respectively; the other two communities have small degree variances, and their sizes are 338 and 295, respectively. The red points are the betweenness of nodes in each community before the leaders are removed, and the blue points are those after the leaders are removed.

We find that the average node betweenness declines after the leaders are removed. More importantly, the set of nodes with high betweenness in a leaders-removed community does not remain unchanged. We choose these four communities because they are rather large and because two of them have high degree variances while the other two have small ones.

4 Leaders in Communities of Random Networks

Communities exist not only in real-world networks but also in random networks. We use the CNM method to divide three kinds of random networks into communities: ER networks, WS networks and BA networks. Leaders are defined as in real-world networks. After removing the leaders of each community, we find that not all the models have the properties mentioned above. For the two ER networks, the percentages of edges removed inside communities and between communities are all extremely low, whereas the average clustering coefficients of the communities decrease after the leaders are removed. For the two WS networks, the average clustering coefficients of the communities are always 0, because there are hardly any triangles in them. The percentages of inter-community edges removed are more than 20% for all the communities, and most of the percentages of internal edges removed are more than 32.8% (marked with a horizontal line). The latter number is shown below to be a lower bound for this kind of percentage. From panels c, f and i in Figure 14, we can see that the properties of BA networks are similar to those of real-world networks.

Fig. 14. (Color online) As marked in the panels, we generate 2 ER networks, 2 WS networks and 2 BA networks, each with 1000 nodes. For the ER networks we set p = 0.3; for the WS networks, k = 5 and p = 0.3, where k is the number of neighbours on each side of a node in the initial regular network; for the BA networks, m = m_0 = 5.

We now give a theoretical lower bound on the percentage of edges removed inside a community. If the size of a community is about 100, the bound is 32.8%. We regard a scale-free network with n nodes as a community and assume that its degree distribution is

$$P(k) = A k^{-3}. \tag{4}$$

Then the distribution function is

$$F(x) = A \sum_{t=1}^{x} t^{-3}. \tag{5}$$

Requiring F(n - 1) = 1 gives

$$A = \frac{1}{\sum_{t=1}^{n-1} t^{-3}} = \Bigl(\sum_{t=1}^{n-1} t^{-3}\Bigr)^{-1}. \tag{6}$$

The order statistic is defined as follows: if x_1, x_2, ..., x_n are samples from X, then x_(k), the k-th order statistic of those samples, is the k-th value after the samples are ranked in increasing order. Here we draw n samples from the distribution of equation (4). The probability density function of the k-th order statistic, denoted by P_k(x), is

$$P_k(x) = \frac{n!}{(k-1)!\,(n-k)!}\,(F(x))^{k-1}\,(1-F(x))^{n-k}\,P(x). \tag{7}$$

So the expected degree of the k-th node is

$$D_k = \sum_{x=1}^{n-1} \frac{n!}{(k-1)!\,(n-k)!}\,(F(x))^{k-1}\,(1-F(x))^{n-k}\,P(x)\,x. \tag{8}$$

The leaders are the last 10% of the n nodes after ranking. If we remove the leaders, the number of edges of the community decreases by at least

$$E_r = \frac{1}{2}\sum_{k=0.9n+1}^{n} D_k = \frac{1}{2}\sum_{k=0.9n+1}^{n}\,\sum_{x=1}^{n-1} \frac{n!}{(k-1)!\,(n-k)!}\,(F(x))^{k-1}\,(1-F(x))^{n-k}\,P(x)\,x. \tag{9}$$

The total number of edges is

$$E_t = \frac{1}{2}\sum_{k=1}^{n} D_k = \frac{1}{2}\sum_{k=1}^{n}\,\sum_{x=1}^{n-1} \frac{n!}{(k-1)!\,(n-k)!}\,(F(x))^{k-1}\,(1-F(x))^{n-k}\,P(x)\,x. \tag{10}$$

So r, the percentage of edges removed, is larger than r_0, where

$$
\begin{aligned}
r_0 = \frac{E_r}{E_t}
&= \frac{\sum_{k=0.9n+1}^{n}\sum_{x=1}^{n-1}\frac{n!}{(k-1)!\,(n-k)!}\Bigl(A\sum_{t=1}^{x}t^{-3}\Bigr)^{k-1}\Bigl(1-A\sum_{t=1}^{x}t^{-3}\Bigr)^{n-k}A\,x^{-3}\,x}{\sum_{k=1}^{n}\sum_{x=1}^{n-1}\frac{n!}{(k-1)!\,(n-k)!}\,(F(x))^{k-1}\,(1-F(x))^{n-k}A\,x^{-3}\,x} \\
&= \frac{\sum_{k=0.9n+1}^{n}\sum_{x=1}^{n-1}\frac{n!}{(k-1)!\,(n-k)!}\Bigl(A\sum_{t=1}^{x}t^{-3}\Bigr)^{k-1}\Bigl(1-A\sum_{t=1}^{x}t^{-3}\Bigr)^{n-k}x^{-2}}{\sum_{k=1}^{n}\sum_{x=1}^{n-1}\frac{n!}{(k-1)!\,(n-k)!}\,(F(x))^{k-1}\,(1-F(x))^{n-k}\,x^{-2}}.
\end{aligned}
\tag{11}
$$

We set n = 100, then r0 ≈ 0.328, which means that the percentages of edges lost when the leaders of communities are removed are more than 32.8%.
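The sums in the derivation above can be evaluated numerically. The Python sketch below is one possible reading, assuming degrees x = 1, ..., n-1 with P(x) = A x^{-3}, the order-statistic weight n!/((k-1)!(n-k)!) and leaders taken as the top ranks; the function name and the `leader_percent` parameter are hypothetical. It does not assert the value 0.328: the text does not say how the sums were evaluated, so this sketch may or may not reproduce that figure.

```python
from math import comb


def lower_bound_r0(n, leader_percent=10):
    """Evaluate r0 = E_r / E_t from the order-statistic sums."""
    degrees = range(1, n)                         # x = 1, ..., n-1
    norm = sum(x ** -3 for x in degrees)          # equals 1 / A
    pmf = {x: x ** -3 / norm for x in degrees}    # P(x) = A * x^-3
    cdf, running = {}, 0.0
    for x in degrees:
        running += pmf[x]
        cdf[x] = min(running, 1.0)                # guard against rounding above 1

    def expected_degree(k):
        """D_k, the expected degree of the node of rank k (increasing order)."""
        weight = n * comb(n - 1, k - 1)           # n! / ((k-1)! (n-k)!)
        return sum(weight * cdf[x] ** (k - 1) * (1.0 - cdf[x]) ** (n - k)
                   * pmf[x] * x for x in degrees)

    leaders = -(-n * leader_percent // 100)       # integer ceiling of the leader count
    first_leader_rank = n - leaders + 1           # k = 0.9n + 1 when n is a multiple of 10
    e_r = 0.5 * sum(expected_degree(k) for k in range(first_leader_rank, n + 1))
    e_t = 0.5 * sum(expected_degree(k) for k in range(1, n + 1))
    return e_r / e_t


print(lower_bound_r0(100))
```

Two sanity checks hold by construction: the ratio lies strictly between 0 and 1 for a 10% leader share, and it grows as the leader share grows, reaching 1 when every node is treated as a leader.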


5 Leaders in Communities of Other Networks

In this section we use 20 real-world networks to show the roles of leaders in communities, and 10 methods to divide the networks into communities. To keep the description concise, we give each network and each algorithm a label.

Label   Network              Nodes
1       US airport network   500
2       c_celegens           293
3       ca_Grqc              4158
4       ca_Hepph             11204
5       ca_Hepth             8638
6       citation             1555
7       email                1133
8       facebook             3927
9       hep_th               5835
10      internet             22963
11      jazz                 198
12      kohonen              3704
13      metabolic            453
14      netscience           379
15      pgpgiantcompo        10680
16      power grid           4941
17      protein              1458
18      sciMet               2678
19      Wiki_vote            6959
20      Yeast                2375

Table 2: The networks' information.

In Table 2, if a network is not connected, we choose its largest connected component. Similarly, each algorithm is given a label in Table 3.

Label   Algorithm   Reference
1       BGLL        [20]
2       RB          [21]
3       RNSC        [22]
4       CPM         [23]
5       SCluster    [24]
6       UVCluster   [25]
7       OSLOM       [26]
8       SVI         [27]
9       Infomap     [28]
10      COPRA       [29]

Table 3: The algorithms' information.

We first use the algorithms to find the communities of all the networks. All algorithm parameters are left at their defaults, except that the SVI method requires the number of communities to be given manually; we set it to about N/100. Some algorithms can find overlapping communities; for these we choose parameters that produce no overlapping communities. As in the previous sections, we choose the 10% largest-degree nodes of each community as leaders. To study the roles of the leaders, we delete them and observe the changes in the communities. We now show several properties of these networks. All figures in this section report, for each network and each algorithm, the average value over all communities.


Fig. 15. (Color online) The average percentage of internal edges removed after the leaders of each community are removed. For each network and each algorithm, the value is the average over all communities of the percentage of internal edges removed.

Figure 15 shows that, on average, more than 40% of the communities' internal edges are removed in all 20 networks, even before considering the cases in which 20% or 30% of the nodes are removed. Moreover, different networks have very different values under the same algorithm, while the values of the same network under different algorithms are much closer. Under the 10th algorithm, for example, all the networks have high values. The effect is therefore mainly a property of the networks rather than of the algorithms.

Next we look at the change in the average clustering coefficient of the communities in each network under each algorithm. We define

$$C^{ij} = \frac{\sum_{k=1}^{K^{ij}} \bigl(C_0^{ij}(k) - C_1^{ij}(k)\bigr)}{K^{ij}}, \tag{12}$$

where C_0^{ij}(k) is the average clustering coefficient of the k-th community of the i-th network under the j-th algorithm before the 10% largest-degree nodes are removed, C_1^{ij}(k) is the average clustering coefficient of the same community after the leaders are removed, and K^{ij} is the number of communities of the i-th network under the j-th algorithm.
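For one network and one algorithm, the quantity C^{ij} is just the mean drop in clustering coefficient over the communities. A minimal Python sketch, with a hypothetical function name and made-up input values, is:

```python
def mean_clustering_drop(before, after):
    """Average of C0(k) - C1(k) over the K communities of one
    network/algorithm pair; before[k] and after[k] refer to the same community.
    """
    if len(before) != len(after):
        raise ValueError("need one before/after value per community")
    if not before:
        raise ValueError("need at least one community")
    return sum(c0 - c1 for c0, c1 in zip(before, after)) / len(before)


# Illustrative values only, not data from the paper.
print(mean_clustering_drop([0.6, 0.5, 0.4], [0.3, 0.4, 0.4]))
```

A positive result means that the communities became less clustered on average after the leaders were removed.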


Fig. 16. (Color online) C^{ij} of the first 10 networks.

Fig. 17. (Color online) C^{ij} of the last 10 networks.

C^{ij} is positive for every i and j, although the values vary dramatically in Figure 16 and Figure 17. That is, when the leaders of a community are removed, its average clustering coefficient decreases, so the leaders make the community more clustered. So far we have seen that the leaders of a community make it denser and more clustered. Figure 18 shows the leaders' role in linking different communities.


Fig. 18. (Color online) The upper panel shows, for each network and algorithm, the average percentage of inter-community edges removed from each community after its 10% largest-degree nodes are removed. The other panel shows the average reduction of node betweenness in the communities of each network under each algorithm after the leaders are removed.

Almost all the values in the upper panel of Figure 18 are above 20%, and about half of them are even above 50%. As with internal edges, the values depend more on the networks than on the algorithms. We conclude that the leaders are more likely to link to other communities. Figure 18 also shows that removing the leaders makes node betweenness increase in some communities and decrease in others, depending on the network and on the method used to detect communities.

6 Conclusions and Discussion

In this paper, we apply 10 commonly used community detection methods to divide more than 20 real-world networks into communities. Our analysis shows that most communities have some hub nodes as leaders. The leaders make the community denser and increase its average clustering coefficient. More revealingly, the leaders are more likely to link to other communities. To describe this, we construct the contractible graph of the network by removing the edges between communities that are not attached to any leader and then shrinking the communities to nodes. The contractible graph is connected and is a small-world network. Also, leaving aside the removed leaders themselves, the nodes with high betweenness in the leaders-removed communities are quite different from those in the original communities. Finally, similar results are verified in random networks by both experimental and theoretical approaches.

In view of these results, a pretreatment step could be performed before detecting communities in order to capture more of their structural properties, since leaders have the potential to influence community detection. Removing some large-degree nodes from the network could be a good way to detect communities effectively; each of these nodes would then be put into the community that contains most of its neighbours. Because we use more than one method and more than one real-world network, our results are general and reasonable. Moreover, the structure of a network becomes clearer: some leaders attract many ordinary units to form communities and connect those communities into a network. Network models that consider preferential attachment, such as the BA network, often contain some nodes with huge degrees that are close to each other. Real-world networks, however, often have many leader communities, each containing some large-degree nodes. Therefore, if large-degree nodes are dispersed during the evolution of the network, the resulting model will be closer to actual networks.

Because this paper is the first work to study the leader community in detail, several problems remain for future investigation. First, how does the algorithm affect the percentage of internal or inter-community edges that are removed? As our results show, leader communities are more pronounced in some networks than in others, so which kinds of networks have pronounced leader communities? Second, what are the roles of the leaders in information spreading in a network? In this work we discuss only the topological roles of leaders; their roles in other respects are still unknown. Third, is there a better metric to distinguish different communities?
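The reassignment step of the pretreatment suggested in this section (set large-degree nodes aside, detect communities on the remaining nodes, then put each node that was set aside into the community holding most of its neighbours) might be sketched in Python as follows. The function name and data layout are hypothetical, and the tie-breaking rule (the community counted first wins) is an assumption, since the text does not specify one.

```python
from collections import Counter


def reattach_nodes(adj, membership, held_out):
    """Assign each held-out node to the community that contains most of
    its neighbours.

    adj maps every node to its neighbour set in the full network;
    membership maps the remaining nodes to the community found without
    the held-out nodes.
    """
    result = dict(membership)
    for node in held_out:
        votes = Counter(membership[u] for u in adj[node] if u in membership)
        if votes:                                   # nodes with no assigned neighbour stay out
            result[node] = votes.most_common(1)[0][0]
    return result


adj = {"h": {"a", "b", "x"}, "a": {"h", "b"}, "b": {"h", "a"},
       "x": {"h", "y"}, "y": {"x"}}
membership = {"a": 0, "b": 0, "x": 1, "y": 1}
print(reattach_nodes(adj, membership, ["h"]))
```

In the example, node h has two neighbours in community 0 and one in community 1, so it joins community 0. Neighbours that were themselves set aside are ignored when counting.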
This work could shed some light on the in-depth understanding and application of the internal structure of communities in real-world networks, and problems concerning communities may be viewed from this new perspective. The work was carried out using Python, Gephi, and C++.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (11271006), the Independent Innovation Foundation of Shandong University (IFYT 14013), and the Shandong Provincial Natural Science Foundation (ZR2012GQ002, ZR2014AQ001).

References

[1] R. Albert, A.-L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. 74 (1) (2002) 47-97.
[2] R. Albert, H. Jeong, A.-L. Barabási, Diameter of the world wide web, Nature 401 (1999) 130-131.
[3] R. Albert, Scale-free networks in cell biology, J. Cell Sci. 118 (2005) 4947-4957.

[4] A.-L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (5439) (1999) 509-512.
[5] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.U. Hwang, Complex networks: structure and dynamics, Phys. Rep. 424 (2006) 175-308.
[6] H. Jeong, S. Mason, A.-L. Barabási, Z.N. Oltvai, Lethality and centrality in protein networks, Nature 411 (2001) 41-42.
[7] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, A.-L. Barabási, The large-scale organization of metabolic networks, Nature 407 (2000) 651-654.
[8] F. Liljeros, C.R. Edling, L.A.N. Amaral, H.E. Stanley, Y. Aberg, The web of human sexual contacts, Nature 411 (2001) 907-908.
[9] M.E.J. Newman, The structure and function of complex networks, SIAM Review 45 (2003) 167-256.
[10] R. Pastor-Satorras, A. Vespignani, Evolution and Structure of the Internet, Cambridge University Press, 2004.
[11] S. Redner, How popular is your paper? An empirical study of the citation distribution, Eur. Phys. J. B 4 (1998) 131-134.
[12] D.J. Watts, S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393 (1998) 440-442.
[13] M.E.J. Newman, Fast algorithm for detecting community structure in networks, Phys. Rev. E 69 (2004) 066133.
[14] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99 (12) (2002) 7821-7826.
[15] R. Guimera, L. Danon, A. Diaz-Guilera, et al., Self-similar community structure in a network of human interactions, Phys. Rev. E 68 (6) (2003) 065103.
[16] C.-J. Liu, Community detection and analytical application in complex networks, Dissertation for doctoral degree of Shandong University, 2014.
[17] M.T. Gastner, M.E.J. Newman, Diffusion-based method for producing density equalizing maps, Proc. Natl. Acad. Sci. USA 101 (2004) 7499-7504.
[18] W.W. Zachary, An information flow model for conflict and fission in small groups, J. Anthropol. Res. 33 (1977) 452-473.


[19] D. Lusseau, The emergent properties of a dolphin social network, Biology Letter, Proc. R. Soc. London B (suppl) (2003). DOI 10.1098/rsbl.2003.0057.
[20] V.D. Blondel, J.L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, J. Stat. Mech. (2008) P10008.
[21] J. Reichardt, S. Bornholdt, Statistical mechanics of community detection, Phys. Rev. E 74 (2006) 016110.
[22] A.D. King, N. Przulj, I. Jurisica, Protein complex prediction via cost-based clustering, Bioinformatics 20 (2004) 3013-3020.
[23] V.A. Traag, P. Van Dooren, Narrow scope for resolution-limit-free community detection, Phys. Rev. E 84 (2011) 016114.
[24] R. Aldecoa, I. Marín, Jerarca: Efficient analysis of complex networks using hierarchical clustering, PLoS ONE 5 (2010) e11585.
[25] V. Arnau, S. Mars, I. Marín, Iterative cluster analysis of protein interaction data, Bioinformatics 21 (2005) 364-378.
[26] A. Lancichinetti, F. Radicchi, J.J. Ramasco, S. Fortunato, Finding statistically significant communities in networks, PLoS ONE 6 (2011) e18961.
[27] P.K. Gopalan, D.M. Blei, Efficient discovery of overlapping communities in massive networks, Proc. Natl. Acad. Sci. USA 110 (36) (2013) 14534-14539.
[28] M. Rosvall, C.T. Bergstrom, Maps of random walks on complex networks reveal community structure, Proc. Natl. Acad. Sci. USA 105 (2008) 1118-1123.
[29] S. Gregory, Finding overlapping communities in networks by label propagation, New J. Phys. 12 (10) (2010) 103018.
[30] M.E.J. Newman, The structure of scientific collaboration networks, Proc. Natl. Acad. Sci. USA 98 (2) (2001) 404-409.

