Information Sciences 236 (2013) 83–92
Contents lists available at SciVerse ScienceDirect
Information Sciences journal homepage: www.elsevier.com/locate/ins
Maximizing modularity intensity for community partition and evolution
Peng Gang Sun a,b,⇑, Lin Gao a,b, Yang Yang a
a School of Computer Science and Technology, Xidian University, Xi’an 710071, China
b Institute of Computational Bioinformatics, Xidian University, Xi’an 710071, China
Article info
Article history: Received 7 July 2011; Received in revised form 24 January 2013; Accepted 10 February 2013; Available online 4 March 2013
Keywords: Community; Modularity intensity; Evolution
Abstract
Most previous studies of community partition have focused on network topology alone, although topology and link weights are in fact closely associated with each other in community formation in complex networks. This paper proposes a function, modularity intensity, a variation of modularity density (D-value), for evaluating the cohesiveness of a community; it considers both the links between vertices and the link weights. The results show that maximizing the modularity intensity not only resolves the resolution limits problem but also achieves better performance for community partition. To further evaluate the function and clarify the weight-topology correlation with communities, we give a simple model that simulates the development of topology and link weights during community evolution, and we use the modularity intensity to capture the communities of the network in each step of this process. In this model, a network is treated as a fuzzy relation, and two operations on the fuzzy relation are used to make the link weights stronger and weaker, respectively, as the network grows in each evolutionary step. In simulation experiments, we found that under this model the modularity intensity captures communities that undergo a gradual transition from faintness to clearness. Our model also reproduces Granovetter's finding that strong links are confined mainly within tight communities, while the links between communities are predominantly weak. From these results, we believe that the modularity intensity gives a more comprehensive evaluation of communities, and that, by studying the weight-topology correlation, the model provides a new view of community evolution.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
Over the past several years, many studies have indicated that most complex systems/networks display modular structures, or communities: groups of nodes with more internal links than external ones [2,8,11,18,27,39]. These structures are important because they help us understand the organization and function of the systems [2,8,11,18,27,39]. Graph theory provides a powerful tool for studying complex systems and detecting communities [1,3,4,6–11,14,17–35,37,38,40–42]. Although many outstanding achievements have been made [1,3,4,6–11,14,17–35,37,38,40–42], research should not stop at identifying these structures; studying how communities form and evolve is becoming more and more significant [2,11,15,16,18,27,39,41]. By analyzing the formation, partition and evolution of structure in these complex systems, we can gain insight into phenomena such as the formation of functional units, organizational structures and cultural traditions, and into the rules governing information diffusion or spreading processes [15,16,41]. Therefore, it is of great importance to understand how communities form and evolve during the growth of networks [2,11,15,16,18,27,39,41].

⇑ Corresponding author at: School of Computer Science and Technology, Xidian University, Xi’an 710071, China. E-mail address: [email protected] (P.G. Sun).
0020-0255/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ins.2013.02.032
Since real-world networks are complicated, varied and dynamic, the relation between two vertices cannot be characterized only by the presence or absence of a link. In fact, topology and link weights are two significant, closely related factors in community formation and evolution [3,6,10,12–16,28]. The empirical studies of Granovetter [12,13] indicated that strong and weak links play different roles in real-world networks: strong links maintain the communities, while weak links keep the global integrity of the network. Based on these findings, Kumpula et al. [15,16] gave a simple model that reproduces network structure properties by studying the weight-topology correlation, and used it to address the question: "how does the weight-topology correlation influence the community formation during the growth of networks". This model uses two mechanisms to construct network ties during network evolution: (1) cyclic closure, which forms ties at short range with one’s network neighbors, and (2) focal closure, which forms ties independently of range through shared activities [14]. In the implementation of this model, cyclic closure and focal closure correspond to the local search and random attachment mechanisms, respectively [15,16]. Although the model [15,16] elaborated the weight-topology correlation for community formation, it gave no quantitative measure for evaluating this correlation. Motivated by the importance of the weight-topology correlation for community formation, this paper proposes a function, modularity intensity, a variation of modularity density (D-value) [21], which considers link weights as well as the links between vertices for community detection. We also resolve the resolution limits problem by maximizing this measure.
To further evaluate the function and clarify the weight-topology correlation with communities, we give a simple model that simulates the development of topology and link weights during community evolution, and we use the modularity intensity to capture the communities of the network in each step of this process. The aim of our paper is therefore different from that of Kumpula et al. [15,16]. In addition, our model differs substantially from the model of Kumpula et al. [15,16]:
(1) Our model emphasizes the process in which network communities evolve from faintness to clearness, while the model of Kumpula et al. stresses the process in which the network undergoes a gradual structural transition from a community-free topology to one with communities.
(2) Our model uses the operations of a fuzzy relation in place of the local search and the random attachment of the model of Kumpula et al., respectively.
(3) In the process, our model considers all possible neighbors at different distances, while the model of Kumpula et al. only considers network neighbors at short range.
(4) Our model also emphasizes that strong links transfer to their neighbors at different distances with a higher likelihood.
(5) Our model describes the weight-topology correlation with communities by a quantitative function, the modularity intensity.
The rest of the paper is organized as follows. Section 2 describes the modularity intensity. Section 3 discusses maximizing the modularity intensity for the resolution limits problem. Section 4 gives a simple model that simulates the development of topology and link weights during community evolution. Section 5 presents the experimental results on artificial and real-world networks. The conclusion is provided in Section 6.

2. Modularity intensity
Recently, many criteria have been proposed for evaluating the partition of a network [8,11,21,26,27].
A widely used measure called modularity, or Q, was presented by Newman and Girvan [11,26,27]. They used Q as the objective function, transforming the problem of community detection into a modularity optimization problem [11,26,27]. However, Fortunato and Barthélemy [9] showed that modularity is subject to the resolution limits problem: the communities that can be detected depend on the total size of interactions in the network, and in the extreme case complete graphs connected by single links may not be resolved [9,21,35]. Li and Zhang [21] proposed a quantitative function called modularity density, or D-value, to solve the resolution limits problem and found that their method performs better than other algorithms [11,27] for community detection. The studies discussed above considered only the links between vertices for community identification, although topology and link weights are in fact closely associated with each other in community formation. This paper proposes a function, modularity intensity, a variation of modularity density (D-value) [21], for evaluating the cohesiveness of a community; it considers not only the links between vertices but also the link weights. A community with a higher modularity intensity is harder to split and less likely to die out.
Let $G = (V, E)$ be a network, where $V$ is a set of vertices and $E$ is a set of edges. We define $F(V_s, V_t) = \sum_{\forall i \in V_s, \forall j \in V_t} A_{ij} B_{ij}$, where $V_s$, $V_t$ are two disjoint subsets of $V$, and $F(V_s, \bar{V}_s) = \sum_{\forall i \in V_s, \forall j \in \bar{V}_s} A_{ij} B_{ij}$, where $\bar{V}_s = V - V_s$. $A$ is the adjacency matrix of $G$; $B$ is the link weights matrix of $G$, and can be defined by the following function [37]:

$$B_{ij} = \frac{\sum_{t=1}^{|V|} (A_{it} A_{tj}) + A_{ij}}{\sqrt{\sum_{t=1}^{|V|} A_{it} \, \sum_{t=1}^{|V|} A_{tj}}} \qquad (i, j \in V;\ i \neq j) \tag{1}$$
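As an illustration, Eq. (1) can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the function name `link_weights` is hypothetical, the two sums under the square root are read as a product of vertex degrees, and pairs involving an isolated vertex (zero degree) are assigned weight 0, a case the text does not address.

```python
import numpy as np

def link_weights(A):
    """Link weight matrix B of Eq. (1) for a symmetric 0/1 adjacency matrix A.

    B_ij = (number of shared neighbours of i and j + A_ij) / sqrt(deg_i * deg_j), i != j.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                      # sum_t A_it
    shared = A @ A                           # shared[i, j] = sum_t A_it * A_tj
    denom = np.sqrt(np.outer(deg, deg))      # sqrt(deg_i * deg_j)
    B = np.zeros_like(A)
    # Assumption: a pair with a zero-degree vertex keeps weight 0.
    np.divide(shared + A, denom, out=B, where=denom > 0)
    np.fill_diagonal(B, 0.0)                 # Eq. (1) is defined only for i != j
    return B

# Usage: in an isolated 4-clique every pair shares 2 neighbours and has degree 3,
# so each weight is (2 + 1) / sqrt(3 * 3) = 1.
K4 = np.ones((4, 4)) - np.eye(4)
print(link_weights(K4))
```

Note that $B_{ij}$ can be nonzero for non-adjacent vertices with shared neighbours; such entries do not contribute to $F$, because $F$ multiplies $B_{ij}$ by $A_{ij}$.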
$B_{ij}$ is defined based on the shared neighbors of $i$ and $j$: the greater the overlap of their friendship circles, the stronger the weight between them. This is consistent with the findings of Granovetter [12,13]. Let $G_1(V_1, E_1), \ldots, G_n(V_n, E_n)$ be a partition of a network $G$, where $V_i$, $E_i$ are the vertex set and edge set of the subnetwork $G_i$, respectively, $i = 1, \ldots, n$. The modularity intensity of the partition of the network is defined as follows:

$$M_{intensity} = \sum_{i=1}^{n} m_{intensity}(G_i) = \sum_{i=1}^{n} \frac{\alpha F(V_i, V_i) - \beta F(V_i, \bar{V}_i)}{|V_i|} \tag{2}$$
where $m_{intensity}(G_i)$ is the modularity intensity of the subnetwork $G_i$. By the definition of $F(V_s, V_t)$, when $V_s = V_t = V_i$, $F(V_s, V_t) = F(V_i, V_i)$, which equals twice the sum of link weights in subgraph $G_i$; $\alpha$, $\beta$ are tuning parameters ($\alpha, \beta \in (0, 1]$, $\alpha \ge \beta$). This function builds on [21]. For Eq. (2), we cannot guarantee that $M_{intensity}$ reaches its maximum when $\alpha \to 1$ and $\beta \to 0$. Here, we need to find a maximal $M_{intensity}$ (each $m_{intensity}(G_i)$ should be positive), because it corresponds to a better partition of the network. However, this is an NP-hard problem; we solved it as an integer programming problem, as discussed by Li and Zhang [21].

3. Maximizing modularity intensity for resolution limits problem
Fortunato and Barthélemy found that modularity optimization may fail to identify communities smaller than a certain scale, even in cases where the communities are unambiguously defined [9]. This scale depends on the total size of the network and on the degree of interconnectedness of the communities [9]. Here, we maximize the modularity intensity to solve this problem and evaluate it on the same examples as Fortunato and Barthélemy [9]. All the following demonstrations build on [9,21]. We first examine whether maximizing the modularity intensity can partition a clique into several parts. Consider a k-clique partitioned into several subnetworks $G_1, \ldots, G_n$ ($2 \le n \le k$, $k \ge 3$). The numbers of vertices of $G_1, \ldots, G_n$ are $t_1, \ldots, t_n$, respectively, with $t_1 + \cdots + t_n = k$. Since the degree of each vertex is $k - 1$ and any two vertices have $k - 2$ shared neighbors in the k-clique network, each link has an equal weight, which is denoted by $w_0$ ($0 < w_0 \le 1$).
$$M_{intensity}(k\text{-clique}) = \frac{\alpha w_0 k(k-1) - \beta \cdot 0}{k} = \alpha w_0 (k-1),$$

$$m_{intensity}(G_i) = \frac{\alpha w_0 t_i (t_i - 1) - \beta w_0 (k - t_i) t_i}{t_i} = \alpha w_0 (t_i - 1) - \beta w_0 (k - t_i),$$

$$\begin{aligned}
M_{intensity}(G_1, \ldots, G_n) &= \sum_{i=1}^{n} m_{intensity}(G_i) = \sum_{i=1}^{n} \left[\alpha w_0 (t_i - 1) - \beta w_0 (k - t_i)\right] \\
&= w_0 \left[\alpha \left(\sum_{i=1}^{n} t_i - n\right) - \beta \left(nk - \sum_{i=1}^{n} t_i\right)\right] \\
&= w_0 \left[\alpha (k - n) - \beta (nk - k)\right] < \alpha w_0 (k - n),
\end{aligned}$$

$$M_{intensity}(k\text{-clique}) - M_{intensity}(G_1, \ldots, G_n) > \alpha w_0 (k-1) - \alpha w_0 (k-n) = \alpha w_0 (n-1) > 0 \quad (2 \le n \le k;\ k \ge 3;\ 0 < w_0 \le 1;\ 0 < \alpha \le 1).$$

From the discussion above, we can see that maximizing the modularity intensity cannot partition a clique into several parts. We further evaluated the modularity intensity using the examples from [9] in Fig. 1. Fig. 1a is a network of several k-cliques, with adjacent cliques connected by a single edge to form a circle. Obviously, each k-clique is a community, but maximizing Q cannot correctly detect these communities: two or more k-cliques are seen as one community [9,21]. The number of k-cliques in Fig. 1a is $m$, and they can be divided into $n$ communities. These communities contain $t_1, \ldots, t_n$ consecutive k-cliques, respectively, with $t_1 + \cdots + t_n = m$ ($m \ge 2$, $m/n \ge 2$). In Fig. 1a, there are three kinds of links, whose weights are denoted by $w_0$, $w_1$, $w_2$, respectively, as shown in Fig. 1c. Obviously, $0 < w_0 < w_1 < w_2 \le 1$. In other words, the weights of internal links are larger than those of external links. We use $M'_{intensity}$ to denote the modularity intensity of the partition in which each k-clique is a community, and $M''_{intensity}$ to denote the modularity intensity of the partition in which the $m$ k-cliques are divided into $n$ communities.
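$$\begin{aligned}
M'_{intensity} - M''_{intensity}
&> m\,\frac{\alpha w_0 k(k-1) - 2\beta w_0}{k} - \sum_{i=1}^{n} \frac{\alpha w_0 \left[t_i k(k-1) + 2(t_i - 1)\right] - 2\beta w_0}{k t_i} \\
&\ge m \alpha w_0 \left(k - 1 - \frac{2}{k}\right) - \alpha w_0 \sum_{i=1}^{n} \left(k - 1 + \frac{2(t_i - 1)}{k t_i}\right) + \beta w_0 \sum_{i=1}^{n} \frac{2}{k t_i} \\
&= m \alpha w_0 \left(k - 1 - \frac{2}{k} - \frac{k-1}{m/n}\right) - w_0 \sum_{i=1}^{n} \frac{2(\alpha t_i - \alpha - \beta)}{k t_i} \\
&\ge m \alpha w_0 \left(k - 1 - \frac{2}{k} - \frac{k-1}{m/n}\right) - \frac{2n\left(\alpha \frac{m}{n} - \alpha - \beta\right)}{k \frac{m}{n}}\, w_0 \\
&= m \alpha w_0 \left[k - 1 - \frac{2}{k} - \frac{k-1}{m/n} - \frac{2}{k\, m/n}\left(1 - \frac{1 + \beta/\alpha}{m/n}\right)\right] \\
&> m \alpha w_0 \left(k - 1 - \frac{2}{k} - \frac{k-1}{m/n} - \frac{2}{k\, m/n}\right) \\
&\ge m \alpha w_0 \left(k - 1 - \frac{2}{k} - \frac{k-1}{2} - \frac{1}{k}\right), \quad (m/n \ge 2) \\
&= m \alpha w_0 \left(k - 1 - \frac{3}{k} - \frac{k-1}{2}\right) \ge 0, \quad (k \ge 3;\ 0 < w_0 \le 1;\ 0 < \alpha \le 1).
\end{aligned}$$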
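The inequality can also be checked numerically on a small ring of cliques. The following is a minimal sketch under stated assumptions, not the authors' code: the helper names (`link_weights`, `modularity_intensity`, `ring_of_cliques`) are hypothetical, link weights are computed from Eq. (1) with the denominator read as the product of the two degrees, and the values $m = 4$, $k = 4$, $\alpha = 1$, $\beta = 0.5$ are chosen only for illustration.

```python
import numpy as np

def link_weights(A):
    """Eq. (1): (shared neighbours + A_ij) / sqrt(deg_i * deg_j), zero diagonal."""
    deg = A.sum(axis=1)
    denom = np.sqrt(np.outer(deg, deg))
    B = np.zeros_like(A)
    np.divide(A @ A + A, denom, out=B, where=denom > 0)
    np.fill_diagonal(B, 0.0)
    return B

def modularity_intensity(A, parts, alpha=1.0, beta=0.5):
    """Eq. (2): sum over communities of (alpha*F(Vi,Vi) - beta*F(Vi,Vi_bar)) / |Vi|."""
    W = A * link_weights(A)                    # W_ij = A_ij * B_ij
    total = 0.0
    for part in parts:
        idx = np.asarray(part)
        inside = W[np.ix_(idx, idx)].sum()     # F(Vi, Vi): each internal link counted twice
        outside = W[idx, :].sum() - inside     # F(Vi, Vi_bar): links leaving the community
        total += (alpha * inside - beta * outside) / len(idx)
    return total

def ring_of_cliques(m, k):
    """m cliques of k vertices; adjacent cliques are joined by a single edge."""
    A = np.zeros((m * k, m * k))
    for c in range(m):
        lo = c * k
        A[lo:lo + k, lo:lo + k] = 1.0
        u, v = lo + k - 1, ((c + 1) % m) * k   # last vertex of clique c, first of the next
        A[u, v] = A[v, u] = 1.0
    np.fill_diagonal(A, 0.0)
    return A

m, k = 4, 4
A = ring_of_cliques(m, k)
singles = [list(range(c * k, (c + 1) * k)) for c in range(m)]        # one clique per community
pairs = [list(range(c * k, (c + 2) * k)) for c in range(0, m, 2)]    # two cliques per community
m_single = modularity_intensity(A, singles)
m_pairs = modularity_intensity(A, pairs)
print(m_single, m_pairs, m_single > m_pairs)
```

In this example the partition with one clique per community scores higher than the partition that merges adjacent cliques, in line with the derivation above. This is a single instance, not a proof.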
Fig. 1. The examples. (a) One k-clique is a complete graph of k vertices, two adjacent k-cliques are connected by a single edge. (b) A network consists of two p-cliques and two q-cliques.
Therefore, $M'_{intensity} > M''_{intensity}$. Based on the analysis above, maximizing the modularity intensity correctly detects each k-clique as one community in Fig. 1a. The example of Fig. 1b is a network that consists of two p-cliques and two q-cliques ($q \ge p \ge 3$). Obviously, each clique should be a community. However, Fortunato and Barthélemy found that the two p-cliques inside the blue dashed line are treated as one community when maximizing Q [9]. Here, we show that maximizing the modularity intensity correctly detects each clique as one community. We use $M'_{intensity}$ to denote the modularity intensity of the partition in which each clique is a community, and $M''_{intensity}$ to denote the modularity intensity of the partition in which the two p-cliques inside the blue dashed line are treated as one community. As above, the weights of internal links are larger than those of external links ($0 < w_6 < w_3 < w_0 < 1$, $w_0 < w_1 < w_2$, $w_3 < w_4 < w_5$, $w_6 < w_4 < w_5$), as shown in Fig. 1d–f, respectively.
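$$\begin{aligned}
M'_{intensity} &> 2\,\frac{\alpha w_1 p(p-1) - 2\beta w_0}{p} + \frac{\alpha w_4 q(q-1) - \beta (2w_3 + w_6)}{q} + \frac{\alpha w_4 q(q-1) - \beta w_6}{q}, \\
M''_{intensity} &< \frac{2\alpha w_1 p(p-1) + 2\alpha w_0 - 2\beta w_3}{2p} + \frac{\alpha w_4 q(q-1) - \beta (2w_3 + w_6)}{q} + \frac{\alpha w_4 q(q-1) - \beta w_6}{q}, \\
M'_{intensity} - M''_{intensity} &> \alpha w_1 (p-1) + \frac{\beta w_3 - 4\beta w_0 - \alpha w_0}{p} \ge \alpha w_1 (p-1) + \frac{\beta w_3 - 5\alpha w_0}{p} \\
&> \frac{\alpha w_0 (p^2 - p - 5) + \beta w_3}{p} > 0, \quad (p \ge 3;\ 0 < \alpha, \beta \le 1;\ 0 < w_3, w_0 < 1).
\end{aligned}$$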
Therefore, maximizing this measure correctly detects each clique as one community in Fig. 1b. In conclusion, maximizing the modularity intensity can effectively resolve the resolution limits problem for community partition.

4. The model
To further evaluate the modularity intensity and clarify the weight-topology correlation with communities, we give a simple model that simulates the development of topology and link weights during community evolution, and we use the modularity intensity to capture the communities of the network in each step of this process. In this model, a network is considered as a fuzzy relation, and two operations on the fuzzy relation are used to make the link weights stronger and weaker, respectively, as the network grows in each evolutionary step. At the end of each evolutionary step, a vertex is removed, and it is replaced by a new one at the beginning of the next step. A network $G = (V, E)$ can be denoted by a fuzzy relation $R$, and we use a sequence of consecutive fuzzy relations, $R \to R^2 \to \cdots \to R^k \to \cdots$, to show the network growth in each evolutionary step. $R^k$ is the network in the kth step of the evolutionary process, and $R^k(i, j)$ can be interpreted as the grade of membership of the ordered pair $(i, j)$ in the kth step. The initial link weights in $R$ are given by $B$ ($R(i, j) = B_{ij}$). We use two operations on the fuzzy relation to simulate the increase and decrease of link weights, respectively, as the network grows.
(1) $(R^{k-1} \circ R^{k-1})(i, j) = \vee_{v \in V} (R^{k-1}(i, v) \wedge R^{k-1}(v, j))$, $(i, j) \in V \times V$. If $R^{k-1}$ is reflexive, then $R^{k-1} \subseteq R^{k-1} \circ R^{k-1}$. That is, $\forall (i, j) \in V \times V$, $R^{k-1}(i, j) \le (R^{k-1} \circ R^{k-1})(i, j)$ [5]. This operation can make the link weights stronger. Thus, we assume that $R^{k-1}$ is reflexive in this operation.
(2) $(R^{k-1} \,\tilde{\circ}\, R^{k-1})(i, j) = \wedge_{v \in V} (R^{k-1}(i, v) \vee R^{k-1}(v, j))$, $(i, j) \in V \times V$. If $R^{k-1}$ is irreflexive, then $R^{k-1} \supseteq R^{k-1} \,\tilde{\circ}\, R^{k-1}$. That is, $\forall (i, j) \in V \times V$, $R^{k-1}(i, j) \ge (R^{k-1} \,\tilde{\circ}\, R^{k-1})(i, j)$ [36]. This operation can make the link weights weaker. Thus, we assume that $R^{k-1}$ is irreflexive in this operation. The proof of this conclusion can be found in the appendix.
From the discussion above, the change of the link weight between $i$ and $j$ is closely related to the paths in which they are involved, which agrees with the transitivity of network growth. The results of Granovetter [12,13] indicated that strong and weak links play different roles in real-world networks: strong links maintain the communities, while weak links maintain the global integrity of the network. This means that the internal links of communities are dense and stronger, while the external links are sparse and weaker [12,13]. Moreover, link weights (or tie strengths) between two individuals increase with the overlap of their friendship circles during the growth of the network [12,13], which suggests that internal links become stronger with a higher likelihood. According to these findings, we assume that $R^k(i, j)$ is obtained by one of the two operations, chosen with a probability $pr$, based on the hypothesis that link weights between two individuals increase with the overlap of their friendship circles during the growth of the network [12,13]. Here, we define $pr$ based on $R^{k-1}(i, j)$, because the larger $R^{k-1}(i, j)$ is, the greater the overlap between $i$ and $j$. $R^k$ can be obtained from $R^{k-1}$ as follows:
$$R^k: \begin{cases}
R^k(i, j) = (R^{k-1} \circ R^{k-1})(i, j) + \delta, & pr = R^{k-1}(i, j) \\
R^k(i, j) = (R^{k-1} \,\tilde{\circ}\, R^{k-1})(i, j) - \delta, & pr = 1 - R^{k-1}(i, j)
\end{cases} \tag{3}$$
For Eq. (3), we presume that if $R^k(i, j) + \delta > 1$, then $R^k(i, j) = 1$; if $R^k(i, j) - \delta < 0$, then $R^k(i, j) = 0.1$. $\delta$ is a parameter in $(0, 1]$ that controls the speed of the network evolutionary procedure. We also include in the model a vertex deletion mechanism, adopted from [15,16]. One vertex is removed with a probability of $p_D$ at the end of each evolutionary step. The removed vertex is replaced by a new one with the same node ID at the beginning of the next evolutionary step, so that the size of the network remains fixed at $|V|$ [15,16]. The vertex to be removed in each step is selected completely at random.
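One evolutionary step of Eq. (3) can be sketched as follows. This is an illustrative reading of the model, not the authors' implementation, and it rests on several assumptions that the text leaves open: the function names are hypothetical; the diagonal is set to 1 before the max-min composition (reflexive) and to 0 before the min-max composition (irreflexive); the random choice between the two operations is drawn once per unordered pair so that the relation stays symmetric; the lower clamp `floor=0.1` follows the value printed in the text; and a replaced vertex starts with no links.

```python
import numpy as np

def max_min(R):
    """(R o R)(i, j) = max over v of min(R(i, v), R(v, j))."""
    return np.minimum(R[:, :, None], R[None, :, :]).max(axis=1)

def min_max(R):
    """(R ~o R)(i, j) = min over v of max(R(i, v), R(v, j))."""
    return np.maximum(R[:, :, None], R[None, :, :]).min(axis=1)

def evolve_step(R, delta, p_delete, rng, floor=0.1):
    """One step R^{k-1} -> R^k of Eq. (3) for a symmetric relation R with entries in [0, 1]."""
    n = R.shape[0]
    reflexive = R.copy()
    np.fill_diagonal(reflexive, 1.0)        # assumption: operation (1) sees a reflexive relation
    irreflexive = R.copy()
    np.fill_diagonal(irreflexive, 0.0)      # assumption: operation (2) sees an irreflexive relation

    stronger = max_min(reflexive) + delta
    weaker = min_max(irreflexive) - delta
    stronger[stronger > 1.0] = 1.0          # clamp as stated after Eq. (3)
    weaker[weaker < 0.0] = floor            # lower clamp value taken from the text as printed

    # Choose operation (1) with probability R^{k-1}(i, j), otherwise operation (2).
    draw = np.triu(rng.random((n, n)), 1)
    draw = draw + draw.T                    # assumption: one draw per unordered pair
    new_R = np.where(draw < irreflexive, stronger, weaker)
    np.fill_diagonal(new_R, 0.0)

    # Vertex deletion: with probability p_delete, a random vertex is replaced by a new one.
    if rng.random() < p_delete:
        v = rng.integers(n)
        new_R[v, :] = 0.0                   # assumption: the new vertex starts without links
        new_R[:, v] = 0.0
    return new_R

# Usage with a small random symmetric relation (illustrative values only).
rng = np.random.default_rng(0)
X = np.triu(rng.random((5, 5)), 1)
R0 = X + X.T
R1 = evolve_step(R0, delta=0.03, p_delete=0.01, rng=rng)
print(R1.round(2))
```

The broadcasting in `max_min` and `min_max` builds an $n \times n \times n$ array, which is simple but memory-hungry; for large networks a loop over the intermediate vertex $v$ would be a more economical choice.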
5. Results and discussions

5.1. Analysis of community partition
In this section, we evaluate the performance of our method for community partition on the GN benchmark [11] and the Zachary network [43]. In addition, we illustrate how the weight-topology correlation influences community formation and evolution.

5.1.1. Test on GN benchmark
The network of the GN benchmark [11] has 128 nodes divided into 4 communities of 32 vertices each (1–32, 33–64, 65–96, 97–128). The links are distributed randomly with a constant probability Zin for a link to occur for each pair of intra-community nodes and another constant probability Zout for each pair of inter-community nodes so as to keep the average degree of a
Fig. 2. Comparison of community detection based on GN benchmark. (a–d) Correspond to Zout = 4, 5, 6, 7 respectively ($\alpha = 1.0$).
Fig. 3. The results of weight-topology correlation for influencing the community detection based on GN benchmark. (a and b) Correspond to Zout = 4 and 7 respectively. Descending weight removal is in blue and ascending in red. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
node at 16 (Zin + Zout = 16). In this benchmark, the larger Zout is, the vaguer the communities are. Fig. 2 shows the plot of classification accuracy (the fraction of nodes that are classified into their correct communities) as a function of Zout. In Fig. 2, for Zout = 4, the four communities are detected perfectly by our method at $\beta = 0.8$ or $\beta = 0.9$, and our method matches the performance of the D-value method. For Zout = 5, the performance declines, and our method outperforms the D-value method at $\beta = 0.3$. For Zout = 6, 7, our method achieves nearly the same performance as the D-value method at $\beta = 0.1$. From Fig. 2, we also found that the peak of our method shifts to the left as Zout increases. For the selection of the parameters $\alpha$ and $\beta$, the results agree with the discussion in Section 2: if $\alpha \to 1$ and $\beta \to 0$, then $m_{intensity}(G_i)$ is maximal, but we cannot guarantee that $M_{intensity}$ reaches its maximum. In addition, the choice of parameters depends strongly on the network structure; therefore, no closed criterion can be given to estimate their values in general. The modularity intensity is defined based on the correlation between the network topology and link weights. Granovetter [12,13] suggested that strong links are confined mainly within tight communities and that the links between communities are predominantly weak. In order to illustrate how the weight-topology correlation influences the community partition and
Fig. 4. Partition of Zachary’s karate club network. The instructor’s faction and the administrator’s faction are divided by the red dashed lines. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 5. The modularity intensity of a community (vertices 1–32) from GN benchmark in each evolutionary step. Each point is averaged over 100 realizations (Zout = 4). (a) We fixed $p_D = 1 \times 10^{-2}$. (b) We fixed $\delta = 0.03$.
formation, we removed links from the network in ascending and in descending order of their weights, respectively, while monitoring the modularity intensity of a community as a function of the ratio of removed network links, $f$ [15,16]. This experimental design is adopted from [15,16]. From Fig. 3, we found that the modularity intensity of a community first increased and then decreased when the network links were removed in ascending order. This is because the external links are weaker and are removed first, so the modularity intensity increases at the beginning until it reaches its peak. This supports the view that weak links act as bridges between communities. On the other hand, the modularity intensity of a community decreased directly when the network links were removed in descending order. This is because the internal links are stronger and are removed first, so the modularity intensity decreases. This indicates that the strong links are located mainly within tightly connected communities. From the discussion above, we confirmed that removing links in ascending order of link weight is better for community partition than removing them in descending order, and that it promotes community formation as well. This discussion not only supports the hypothesis of Granovetter that strong links are confined mainly within tight communities and that the links between communities are predominantly weak, but also explains how the weight-topology correlation influences community formation and partition.

5.1.2. Test on Zachary network
In the Zachary network [43], Zachary observed 34 members of a karate club over a period of two years.
During the course of the experiment, a disagreement developed between vertex 1 (the administrator of the club) and vertex 33 (the club’s instructor), which ultimately resulted in a split of the network (interpreted as "the instructor’s leaving and starting a new club, taking about a half of the original club’s members with him"). Zachary constructed a network of friendships between members of the club. Here we try to identify the factions involved in the split of the club by maximizing the modularity intensity. The results show that the two well-known communities are detected, centred on vertex 1 (the administrator of the club) and vertex 33 (the club’s instructor) (see Fig. 4). In Fig. 4, two primary groups are formed, divided by the red dashed lines ($\alpha = 1$, $\beta = 0.1$). Our method thus recovers the known split of the Zachary network.

5.2. Analysis of community evolution
As discussed in Section 4, the modularity intensity of a community becomes larger with a higher likelihood during the growth of the network. We used the GN benchmark as an example to analyze community evolution based on our model. Fig. 5 shows the modularity intensity of a community (vertices 1–32) from the GN benchmark in each evolutionary step. Fig. 5a shows the effect of $\delta$ on the modularity intensity of the community when we fixed $p_D = 1 \times 10^{-2}$. We found that the larger $\delta$ is, the faster the modularity intensity reaches its peak value and the network reaches a stable state. Therefore,
Fig. 6. Visualization of community evolution based on GN benchmark (Zout = 4). Each point is averaged over 100 realizations; weak links are blue and the color changes gradually to red for strong links. For comparison, the figure only shows the weight changes of links that exist in the original network. (a–c) The 1st, 20th and 40th step in the evolutionary process respectively ($p_D = 1 \times 10^{-2}$, $\delta = 0.03$). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
increasing $\delta$ can promote community formation and evolution. The results also illustrate that the network topology and link weights are closely associated with each other in community formation. Fig. 5b shows the effect of $p_D$ on the modularity intensity of the community when we fixed $\delta = 0.03$. We found that $p_D$ has a small impact on the evolutionary process. The results are thus more sensitive to $\delta$ than to $p_D$. Similar results are also found in the other communities of the GN benchmark. The community evolution is visualized in Fig. 6. From the results above, we found that under this model the modularity intensity captures the communities of the network as they undergo a gradual transition from faintness to clearness. Our model effectively captures the community evolutionary process and illustrates how the correlation between topology and link weights influences community formation and evolution. We believe that the modularity intensity gives a more comprehensive evaluation of communities, and that this model provides a new view of community evolution by studying the weight-topology correlation.

6. Conclusion
This paper proposed a function, modularity intensity, for evaluating the cohesiveness of a community, which considers link weights as well as the links between vertices. Furthermore, we gave a model to simulate community evolution and used the modularity intensity to track this process. We found that maximizing the modularity intensity can resolve the resolution limits problem, and that the model effectively captures the community evolutionary process. The results also help us understand how the weight-topology correlation influences community formation and evolution. In future work, we will continue to focus on this model and apply it to biological networks, such as protein–protein interaction (PPI) networks, to study the formation and evolution of functional modules or protein complexes.
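Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Nos. 61202175, 60933009, 91130006), the Fundamental Research Funds for the Central Universities (Grant No. K50511030001), the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2012JQ1010), and the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20120203120015).

Appendix A
We give a proof to show that if $\forall i \in V$, $R(i, i) = 0$ ($R$ is irreflexive), then $R^n \supseteq R^{n+1}$ ($R^n = R^{n-1} \,\tilde{\circ}\, R$; $(R \,\tilde{\circ}\, R)(i, j) = \wedge_{v \in V} (R(i, v) \vee R(v, j))$, $(i, j) \in V \times V$), and $R^{n+1}$ is irreflexive.

Proof. Here, we use mathematical induction to prove this theorem.

(i) When $k = 1$, $R^k = R^1 = R$, $R^{k+1} = R^2$. Let

$$R = \begin{pmatrix} 0 & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & 0 \end{pmatrix}, \quad \text{and} \quad R^2 = R \,\tilde{\circ}\, R = \begin{pmatrix} 0 & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & 0 \end{pmatrix} \tilde{\circ} \begin{pmatrix} 0 & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & 0 \end{pmatrix}.$$

(1) It is easy to see that $R^2$ is irreflexive.
(2) Given

$$R^2 = R \,\tilde{\circ}\, R = \begin{pmatrix} 0 & \cdots & \tilde{r}_{1n} \\ \vdots & \ddots & \vdots \\ \tilde{r}_{n1} & \cdots & 0 \end{pmatrix},$$

$\forall \tilde{r}_{ij}$, $i \neq j$,

$$\begin{aligned}
\tilde{r}_{ij} &= (r_{i1} \vee r_{1j}) \wedge (r_{i2} \vee r_{2j}) \wedge \cdots \wedge (r_{ii} \vee r_{ij}) \wedge \cdots \wedge (r_{ij} \vee r_{jj}) \wedge \cdots \wedge (r_{in} \vee r_{nj}); \\
&\because r_{ii} = r_{jj} = 0 \ (R \text{ is irreflexive}); \\
\therefore\ \tilde{r}_{ij} &= (r_{i1} \vee r_{1j}) \wedge (r_{i2} \vee r_{2j}) \wedge \cdots \wedge (0 \vee r_{ij}) \wedge \cdots \wedge (r_{ij} \vee 0) \wedge \cdots \wedge (r_{in} \vee r_{nj}); \\
\therefore\ \tilde{r}_{ij} &= (r_{i1} \vee r_{1j}) \wedge (r_{i2} \vee r_{2j}) \wedge \cdots \wedge (r_{ij}) \wedge \cdots \wedge (r_{ij}) \wedge \cdots \wedge (r_{in} \vee r_{nj}); \\
\therefore\ \tilde{r}_{ij} &= (r_{i1} \vee r_{1j}) \wedge (r_{i2} \vee r_{2j}) \wedge \cdots \wedge (r_{ij}) \wedge \cdots \wedge (r_{in} \vee r_{nj}); \\
\therefore\ \tilde{r}_{ij} &\le r_{ij}; \quad \therefore\ R \supseteq R^2; \quad \therefore\ \text{when } n = 1, \text{ it is true.}
\end{aligned}$$

(ii) Given $k = n - 1$, the theorem is true, and $R^k = R^{n-1} \supseteq R^n$, $R^n$ is irreflexive.

(iii) When $k = n$, $R^k = R^n$, $R^{k+1} = R^{n+1}$. Let

$$R^n = \begin{pmatrix} 0 & \cdots & \tilde{r}_{1n} \\ \vdots & \ddots & \vdots \\ \tilde{r}_{n1} & \cdots & 0 \end{pmatrix}, \quad \text{and} \quad R^{n+1} = R^n \,\tilde{\circ}\, R = \begin{pmatrix} 0 & \cdots & \tilde{r}_{1n} \\ \vdots & \ddots & \vdots \\ \tilde{r}_{n1} & \cdots & 0 \end{pmatrix} \tilde{\circ} \begin{pmatrix} 0 & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & 0 \end{pmatrix}.$$

(1) It is also easy to see that $R^{n+1}$ is irreflexive.
(2) Given

$$R^{n+1} = \begin{pmatrix} 0 & \cdots & \tilde{v}_{1n} \\ \vdots & \ddots & \vdots \\ \tilde{v}_{n1} & \cdots & 0 \end{pmatrix},$$

$\forall \tilde{v}_{ij}$, $i \neq j$,

$$\begin{aligned}
\tilde{v}_{ij} &= (\tilde{r}_{i1} \vee r_{1j}) \wedge (\tilde{r}_{i2} \vee r_{2j}) \wedge \cdots \wedge (\tilde{r}_{ii} \vee r_{ij}) \wedge \cdots \wedge (\tilde{r}_{ij} \vee r_{jj}) \wedge \cdots \wedge (\tilde{r}_{in} \vee r_{nj}); \\
&\because \tilde{r}_{ii} = 0,\ r_{jj} = 0 \ (R^n, R \text{ are irreflexive}); \\
\therefore\ \tilde{v}_{ij} &= (\tilde{r}_{i1} \vee r_{1j}) \wedge (\tilde{r}_{i2} \vee r_{2j}) \wedge \cdots \wedge (0 \vee r_{ij}) \wedge \cdots \wedge (\tilde{r}_{ij} \vee 0) \wedge \cdots \wedge (\tilde{r}_{in} \vee r_{nj}); \\
\therefore\ \tilde{v}_{ij} &= (\tilde{r}_{i1} \vee r_{1j}) \wedge (\tilde{r}_{i2} \vee r_{2j}) \wedge \cdots \wedge (r_{ij}) \wedge \cdots \wedge (\tilde{r}_{ij}) \wedge \cdots \wedge (\tilde{r}_{in} \vee r_{nj}); \\
&\because R^{n-1} \supseteq R^n; \quad \therefore\ R \supseteq R^n; \quad \therefore\ r_{ij} \ge \tilde{r}_{ij}; \\
\therefore\ \tilde{v}_{ij} &= (\tilde{r}_{i1} \vee r_{1j}) \wedge (\tilde{r}_{i2} \vee r_{2j}) \wedge \cdots \wedge (\tilde{r}_{ij}) \wedge \cdots \wedge (\tilde{r}_{in} \vee r_{nj}); \\
\therefore\ \tilde{v}_{ij} &\le \tilde{r}_{ij}; \quad \therefore\ R^n \supseteq R^{n+1}.
\end{aligned}$$

From (i), (ii) and (iii), we can see that the theorem is true. □

References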
[1] Y.-Y. Ahn, J.P. Bagrow, S. Lehmann, Link communities reveal multiscale complexity in networks, Nature 466 (2010) 761–764.
[2] R. Albert, A.-L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. 74 (2002) 47–97.
[3] N.A. Alves, Unveiling community structures in weighted networks, Phys. Rev. E 76 (2007) 036101.
[4] J.P. Bagrow, Communities and bottlenecks: trees and treelike networks have high modularity, Phys. Rev. E 85 (2012) 066118.
[5] S.L. Chen, J.G. Li, X.G. Wang, Fuzzy Set Theory and its Application, Science Press, Beijing, 2005.
[6] P. De Meo, E. Ferrara, G. Fiumara, A. Provetti, Enhancing community detection using a network weighting strategy, Inform. Sci. 222 (2013) 648–668.
[7] L. Danon, A. Diaz-Guilera, J. Duch, A. Arenas, Comparing community structure identification, J. Statist. Mech.: Theory Exp. 09 (2005) P09008.
[8] S. Fortunato, Community detection in graphs, Phys. Reports 486 (2010) 75–174.
[9] S. Fortunato, M. Barthélemy, Resolution limit in community detection, Proc. Natl. Acad. Sci. 104 (2007) 36–41.
[10] I. Farkas, D. Abel, G. Palla, T. Vicsek, Weighted network modules, New J. Phys. 9 (2007) 1–18.
[11] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. 99 (2002) 7821–7826.
[12] M.S. Granovetter, The strength of weak ties, Am. J. Sociol. 78 (1973) 1360–1380.
[13] M.S. Granovetter, Economic action and social structure: the problem of embeddedness, Sociol. Econom. Life 91 (1985) 481–510.
[14] G. Kossinets, D.J. Watts, Empirical analysis of an evolving social network, Science 311 (2006) 88–90.
[15] J.M. Kumpula, J.-P. Onnela, J. Saramäki, K. Kaski, J. Kertész, Model of community emergence in weighted social networks, Comput. Phys. Commun. 180 (2009) 517–522.
[16] J.M. Kumpula, J.-P. Onnela, J. Saramäki, K. Kaski, J. Kertész, Emergence of communities in weighted networks, Phys. Rev. Lett. 99 (2007) 228701.
[17] A. Lancichinetti, S. Fortunato, Limits of modularity maximization in community detection, Phys. Rev. E 84 (2011) 066122.
[18] A. Lancichinetti, F. Radicchi, J.J. Ramasco, S. Fortunato, Finding statistically significant communities in networks, PLoS One 6 (2011) e18961.
[19] A. Lancichinetti, S. Fortunato, Consensus clustering in complex networks, Sci. Reports 2 (2012) 1–7.
[20] A. Lancichinetti, S. Fortunato, Community detection algorithms: a comparative analysis, Phys. Rev. E 80 (2009) 056117.
[21] Z. Li, S. Zhang, R.-S. Wang, X.-S. Zhang, L. Chen, Quantitative function for community detection, Phys. Rev. E 77 (2008) 036109.
[22] W. Li, Revealing network communities with a nonlinear programming method, Inform. Sci. 229 (2013) 18–28.
[23] A.D. Medus, C.O. Dorso, Alternative approach to community detection in networks, Phys. Rev. E 79 (2009) 066111.
[24] S. Muff, F. Rao, A. Caflisch, Local modularity measure for network clusterizations, Phys. Rev. E 72 (2005) 056107.
[25] B. Mirkin, S. Nascimento, Additive spectral method for fuzzy cluster analysis of similarity data including community structure and affinity matrices, Inform. Sci. 183 (2012) 16–34.
[26] M.E.J. Newman, Detecting community structure in networks, Eur. Phys. J. B 38 (2004) 321–330.
[27] M.E.J. Newman, Modularity and community structure in networks, Proc. Natl. Acad. Sci. 103 (2006) 8577–8582.
[28] M.E.J. Newman, Analysis of weighted networks, Phys. Rev. E 70 (2004) 056131.
[29] G.K. Orman, V. Labatut, H. Cherifi, Comparative evaluation of community detection algorithms: a topological approach, J. Statist. Mech.: Theory Exp. (2012) P08001.
[30] G. Palla, I. Derenyi, I. Farkas, T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature 435 (2005) 814–818.
[31] J.M. Pujol, J. Béjar, J. Delgado, Clustering algorithms for determining community in large networks, Phys. Rev. E 74 (2006) 016107.
[32] F. Radicchi, A. Lancichinetti, J.J. Ramasco, Combinatorial approach to modularity, Phys. Rev. E 82 (2010) 026102.
[33] J. Reichardt, S. Bornholdt, Detecting fuzzy community structures in complex networks with a Potts model, Phys. Rev. Lett. 93 (2004) 218701.
[34] P. Ronhovde, Z. Nussinov, Local resolution-limit-free Potts model for community detection, Phys. Rev. E 81 (2010) 046114.
[35] M. Rosvall, C.T. Bergstrom, An information-theoretic framework for resolving community structure in complex networks, Proc. Natl. Acad. Sci. 104 (2007) 7327–7331.
[36] P.G. Sun, Clustering Algorithms and Its Application for Network Modularity Analysis, PhD thesis, Xidian University, 2011.
[37] P.G. Sun, L. Gao, S. Han, Identification of overlapping and non-overlapping community structure by fuzzy clustering, Inform. Sci. 181 (2011) 1060–1071.
[38] P.G. Sun, Y. Yang, The methods to find community based on edge centrality, Physica A (2013), http://dx.doi.org/10.1016/j.physa.2012.12.024.
[39] S.H. Strogatz, Exploring complex networks, Nature 410 (2001) 268–276.
[40] M. Sales-Pardo, R. Guimerà, L.A.N. Amaral, Extracting the hierarchical organization of complex systems, Proc. Natl. Acad. Sci. 104 (2007) 15224.
[41] J. Wu, L. Jiao, C. Jin, F. Liu, M. Gong, R. Shang, W. Chen, Overlapping community detection via network dynamics, Phys. Rev. E 85 (2012) 016115.
[42] J. Xie, S. Kelley, B.K. Szymanski, Overlapping community detection in networks: the state of the art and comparative study, ACM Comput. Surveys 45 (2013) 1–37.
[43] W.W. Zachary, An information flow model for conflict and fission in small groups, J. Anthropol. Res. 33 (1977) 452–473.