Partition network into communities based on group action on sets



Accepted Manuscript

Partition network into communities based on group action on sets
Shu-Cheng Lin, Han-Wen Tuan, Cheng-Tan Tung, Peterson Julian

PII: S0378-4371(17)30855-5
DOI: https://doi.org/10.1016/j.physa.2017.08.124
Reference: PHYSA 18570

To appear in: Physica A

Received date: 2 April 2016
Revised date: 12 June 2017

Please cite this article as: S. Lin, H. Tuan, C. Tung, P. Julian, Partition network into communities based on group action on sets, Physica A (2017), https://doi.org/10.1016/j.physa.2017.08.124

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Shu-Cheng Lin¹, Han-Wen Tuan², Cheng-Tan Tung³, Peterson Julian⁴

¹ Department of Hotel Management, Lee-Ming Institute of Technology, Taiwan
² Department of Computer Science and Information Management, Hungkuang University, Taiwan
³ Department of Information Management, Central Police University, Taiwan
⁴ Department of Traffic Science, Central Police University, Taiwan

Abstract. In this paper, an improved algorithm is provided to detect communities within a network based on group action on sets (GAS). Using modularity as the criterion, we revise the results of three previous papers and derive a better partition of the karate club network. We develop a new method that replaces the complicated GAS and achieves the same effect. Through four examples, we demonstrate that our revised approach reduces the number of modularity values that must be computed. Based on a benchmark example, we provide a detailed illustration showing that the initial stage of the GAS approach produces too many cores, which leads to too many communities in the final partition. These findings indicate that using the GAS algorithm to partition a network into communities is unreliable.

Keywords: community in network; intersection-union operation; group action on sets (GAS); r-cycle

1. Introduction


A group of people who share common values or interests can form a human community. A human community can therefore represent a type of political power, social power, consumption tendency, consumption behavior transference, commercial opportunity, new demand formation, value and preference change, or a new type of social- or scientific-oriented cohesion. All living organisms can be attributed to communities. For instance, bacteria and viruses can be classified into different communities, each having a unique set of survival conditions, parasitic conditions, influenced objects and extents, proliferation speeds, and likely mutations. The identification of communities has been a focus of researchers, who have created different approaches for different purposes. To study communities in social, computer, metabolic, and regulatory networks, Newman [1] presented a mathematical technique to investigate community structure, which distinguishes dense links from sparse links in many networks of scientific interest. Radicchi et al. [2] proposed a quantitative local algorithm for identifying community structures, claiming that it performs better than existing approaches and is applicable to large-scale technological and biological systems. Focusing on community structures in which nodes are joined in tightly knit groups with only looser connections between the groups, Girvan and Newman [3] developed a method for detecting such communities that uses centrality indices to find community boundaries. In an application to communities of living organisms, Risatti et al. [4] used hybridization probes complementary to various phylogenetic groups to directly determine the abundance and distribution of sulfate-reducing bacterial populations in a microbial non-overlapping mat community. To reveal the modular structure of complex systems, Palla et al. [5] proposed clique percolation for analyzing the main statistical features of interwoven sets of communities. Donetti and Muñoz [6] used spectral properties of the graph Laplacian matrix and hierarchical-clustering techniques to maximize the modularity of the output. Their algorithm proved quite encouraging in terms of accuracy and computing time for detecting and analyzing communities and modular structures in complex networks. Other essential means of identifying and detecting communities include local techniques [7, 8] and betweenness-based methods [9]. Zhang et al. [10] proposed a GAS algorithm to detect communities in highly clustered networks. They used community separability to reveal the inherent structure of a network and modularity to evaluate the quality of the community division. They claimed that the GAS algorithm is more accurate and effective in detecting communities in highly clustered networks.

The aim of this study is to point out seven questionable results in the work of Zhang et al. [10] and to provide improvements. First, when building communities from a particular network, the algorithm of Zhang et al. [10] considered only one of the many possible outcomes. Our analysis, by contrast, covers all possibilities that arise when carrying out the GAS community division of [10]. Second, their approach forms the target communities by adding isolated orbits to non-isolated orbits, which can yield many possible partitions. Third, generating a group from a finite set, as GAS requires, is a tedious task, so we provide a new, simplified method that achieves the same effect. Fourth, we show that computing the difference between community separability and modularity, as their method does, is unnecessary.
Fifth, following the benchmark example recommended by Girvan and Newman [3], this study shows that the GAS algorithm obtains the predesigned communities at a rate lower than 1%. Sixth, we point out that the Burnside theorem (cited in Zhang et al. [10]) is useless for predicting the number of communities in a network. Seventh, based on the benchmark example with our 3-2-3 simplification, we compare the results derived by the GAS method of Zhang et al. [10] with those of Newman [11]. GAS yields 9 communities, whereas Newman [11] yields 4. By our performance estimate, the method of Zhang et al. [10] attains 34.4% of the desired partition, less than the 75% attained by Newman [11]. These findings will help researchers choose algorithms for deriving communities from a network.

2. Review of Zhang et al. [10]

We first explain in detail how Zhang et al. [10] applied GAS to nodes to obtain the cores of communities and then merged isolated nodes into the existing cores to derive a partition of the network into communities. In this part, we point out two questionable results in Zhang et al. [10]: (a) the proof that the orbits produced by GAS do not overlap is missing, and (b) their GAS algorithm may yield many possible partitions. An r-cycle (r-clique) in a network is defined as a subset of r nodes such that there is a link between every two nodes in the subset. Zhang et al. [10] selected a number r from {3, 4, 5} and collected all r-cycles in the network as base elements, denoted by X. They used these base elements to generate a finite group $G = \langle X \rangle$. They then let G act on the set of nodes to derive the orbit $\Theta(x)$ of a node x:

$$ \Theta(x) = \{\, gx : g \in G \,\}. \tag{1} $$

The set of nodes thus becomes a disjoint union of orbits. However, Zhang et al. [10] did not prove that the orbits are disjoint. We therefore show that GAS yields a disjoint union of orbits.

Lemma 1. If $\Theta(x)$ and $\Theta(y)$ intersect, then $\Theta(x) = \Theta(y)$.

(Proof.) Let z be an element in the intersection of $\Theta(x)$ and $\Theta(y)$. Then there are two elements $g_1$ and $g_2$ in the group G such that

$$ z = g_1 x = g_2 y. \tag{2} $$

For an arbitrary point $g_3 x$ in the orbit $\Theta(x)$, we show that there is an element $g_4$ in G that satisfies

$$ g_3 x = g_4 y. \tag{3} $$

We take $g_4 = g_3 g_1^{-1} g_2$, where $g_1^{-1}$ is the group inverse of $g_1$ in G. Then

$$ g_4 y = g_3 g_1^{-1} (g_2 y) = g_3 g_1^{-1} z = g_3 x. \tag{4} $$

This proves $\Theta(x) \subseteq \Theta(y)$. Similarly, $\Theta(y) \subseteq \Theta(x)$, which implies $\Theta(x) = \Theta(y)$.

Given a node x, suppose the size of $\Theta(x)$ is greater than one, that is, $|\Theta(x)| > 1$. Then this non-isolated orbit $\Theta(x)$ becomes a core for the development of a community. On the other hand, suppose the orbit $\Theta(y)$ of a given node y is a singleton, that is, $|\Theta(y)| = 1$. Then $\Theta(y)$ is called an isolated orbit. It is merged into the most connected developing community, as explained in detail in our iterative algorithm in Section 5.

Zhang et al. [10] stated, “combine the isolated orbit, if it exists, with the orbit which has the most links with the isolated one.” However, they did not consider the scenario where an isolated orbit has the same number of links to two developing communities. We use the same numerical example as in [3, 6, 10], the network of a karate club. It shows that merging isolated orbits into cores under the GAS algorithm of [10] can result in several possible partitions. Figure 1, reproduced from Figure 5 of Zhang et al. [10], shows the 34 members of the karate club. Our goal is to partition them into two communities.

Insert Figure 1 here

They selected all 4-cycles as the generating set X of a finite group G for GAS, where

$$ X = \{(1,2,3,4),\ (1,2,4,8),\ (1,2,4,14),\ (1,3,4,14),\ (9,31,33,34),\ (24,30,33,34)\}. \tag{5} $$

Without giving a detailed description of G, they let G act on the set of nodes to derive the following non-isolated orbits:

$$ \Theta(1) = \{1,2,3,4,8,14\} = \Theta(2) = \Theta(3) = \Theta(4) = \Theta(8) = \Theta(14), \tag{6} $$

and

$$ \Theta(9) = \{9,24,30,31,33,34\} = \Theta(24) = \Theta(30) = \Theta(31) = \Theta(33) = \Theta(34). \tag{7} $$

These two orbits serve as the two cores, and the remaining orbits are isolated. In Figure 2, the nodes of orbit Θ(1) are marked in yellow and the nodes of orbit Θ(9) are marked in green.

Insert Figure 2 here
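The orbits in equations (6) and (7) can be computed without ever listing the elements of G. The Python sketch below is our own illustration, not the procedure of Zhang et al. [10]. It assumes that each r-cycle is written as a tuple in cycle notation and that every node outside a cycle is fixed by it. Because G is finite, the inverse of each generator is one of its positive powers, so closing {x} under the generators alone reaches the whole orbit. The helper names `cycle_to_map` and `orbit` are hypothetical.

```python
from collections import deque


def cycle_to_map(cycle):
    """Cycle notation (a1, ..., ar) as a dict: a1 -> a2, ..., ar -> a1."""
    return {a: cycle[(i + 1) % len(cycle)] for i, a in enumerate(cycle)}


def orbit(x, generators):
    """Orbit of node x under the group generated by the given cycles.

    Breadth-first search: apply every generator to each node reached so far.
    In a finite group each inverse is a positive power of the generator,
    so this closure equals {g x : g in G}.
    """
    maps = [cycle_to_map(c) for c in generators]
    seen, queue = {x}, deque([x])
    while queue:
        y = queue.popleft()
        for m in maps:
            z = m.get(y, y)  # nodes outside the cycle are fixed
            if z not in seen:
                seen.add(z)
                queue.append(z)
    return seen


# Generating set X of equation (5).
X = [(1, 2, 3, 4), (1, 2, 4, 8), (1, 2, 4, 14), (1, 3, 4, 14),
     (9, 31, 33, 34), (24, 30, 33, 34)]

print(sorted(orbit(1, X)))   # [1, 2, 3, 4, 8, 14], as in equation (6)
print(sorted(orbit(30, X)))  # [9, 24, 30, 31, 33, 34], as in equation (7)
print(sorted(orbit(17, X)))  # [17], an isolated orbit
```

Under these assumptions, every node of a core reproduces the same orbit, which is consistent with Lemma 1.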

Table 1 lists the number of links from each isolated orbit to the cores Θ(1) and Θ(9) of equations (6) and (7).

Table 1. The links between isolated orbits and original cores

Node   Links to Θ(1) of (6)   Links to Θ(9) of (7)
5      1                      0
6      1                      0
7      1                      0
10     1                      1
11     1                      0
12     1                      0
13     2                      0
15     0                      2
16     0                      2
17     0                      0
18     2                      0
19     0                      2
20     2                      1
21     0                      2
22     2                      0
23     0                      2
25     0                      0
26     0                      1
27     0                      2
28     1                      2
29     1                      1
32     1                      2
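The counts in Table 1 can be checked mechanically. The sketch below assumes the standard 78-edge Zachary karate club edge list, with nodes numbered 1 to 34 as in Figure 1. The names `KARATE`, `adjacency`, and `links_to_cores` are our own illustrative choices. For every node outside the cores, the sketch counts its neighbours inside each core and flags the ties discussed next.

```python
from collections import defaultdict

# Zachary karate club, 78 edges; each node lists its higher-numbered neighbours.
KARATE = {
    1: [2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 20, 22, 32],
    2: [3, 4, 8, 14, 18, 20, 22, 31], 3: [4, 8, 9, 10, 14, 28, 29, 33],
    4: [8, 13, 14], 5: [7, 11], 6: [7, 11, 17], 7: [17], 9: [31, 33, 34],
    10: [34], 14: [34], 15: [33, 34], 16: [33, 34], 19: [33, 34],
    20: [34], 21: [33, 34], 23: [33, 34], 24: [26, 28, 30, 33, 34],
    25: [26, 28, 32], 26: [32], 27: [30, 34], 28: [34], 29: [32, 34],
    30: [33, 34], 31: [33, 34], 32: [33, 34], 33: [34],
}


def adjacency(upper):
    """Symmetric adjacency sets built from the 'higher neighbours' listing."""
    adj = defaultdict(set)
    for u, vs in upper.items():
        for v in vs:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def links_to_cores(adj, cores):
    """For each node outside every core, the number of links into each core."""
    in_core = set().union(*cores)
    return {v: tuple(len(adj[v] & c) for c in cores)
            for v in sorted(adj) if v not in in_core}


CORE_1 = {1, 2, 3, 4, 8, 14}        # Θ(1) of (6)
CORE_9 = {9, 24, 30, 31, 33, 34}    # Θ(9) of (7)
counts = links_to_cores(adjacency(KARATE), [CORE_1, CORE_9])
ties = [v for v, (a, b) in counts.items() if a == b]
print(ties)  # [10, 17, 25, 29]
```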

Four isolated orbits, Θ(10), Θ(17), Θ(25), and Θ(29), have the same number of links to the cores Θ(1) and Θ(9). Under the merging rule of Zhang et al. [10], each of them may be merged into either Θ(1) or Θ(9). There are therefore 2^4 = 16 different partitions of the karate club network into communities. This reveals that the merging rule of Zhang et al. [10] may yield too many different partitions, a phenomenon discussed further in our numerical examples 2-4 in Section 6. Without explanation, they decided that node 17 belongs to Θ(1) and that nodes 10, 25, and 29 belong to Θ(9). Therefore, [10] considered only one of the sixteen possible results, which may not be the optimal partition into communities. In the next section, we use community separability and modularity to select the best among those sixteen possible partitions.

3. How to select the best partition

The intuitive goal of partitioning a network into communities is to obtain dense links within each community and sparse links between them. If there are several possible partitions of a network, how should one decide which is best? Let us recall community separability and modularity as defined in Zhang et al. [10].

They considered an undirected network with node set $D = \{s_1, \ldots, s_N\}$. Its adjacency matrix is denoted by A, where $A_{ij} = 1$ when nodes i and j are connected and $A_{ij} = 0$ when they are not. The degree of node i is $k_i = \sum_{j=1}^{N} A_{ij}$, and $m = \frac{1}{2} \sum_{i=1}^{N} k_i$ is the total number of links in the network. The community separability M is defined as

$$ M = \frac{1}{2m} \sum_{i,j=1}^{N} \left( A_{ij} - \frac{k_i k_j}{2m} \right)^{+}, \tag{8} $$

where

$$ x^{+} = \begin{cases} x, & x \ge 0, \\ 0, & x < 0, \end{cases} $$

and the magnitude of M indicates the strength of the community structure the network possesses. For a network with node set $D = \{s_1, \ldots, s_N\}$, if we partition the network into communities, then the modularity Q evaluating this community division is

$$ Q = \frac{1}{2m} \sum_{i,j=1}^{N} \left( A_{ij} - \frac{k_i k_j}{2m} \right) f(s_i, s_j), \tag{9} $$

where

$$ f(s_i, s_j) = \begin{cases} 1, & s_i, s_j \text{ in the same community}, \\ 0, & s_i, s_j \text{ in different communities}. \end{cases} $$

The smaller the divergence $M - Q$, the better the partition. We must point out that the community separability M is independent of the partition. Consider two partitions with modularity values $Q_A$ and $Q_B$. Since $\min\{M - Q_A, M - Q_B\} = M - \max\{Q_A, Q_B\}$, finding

$$ \min\{M - Q_A,\ M - Q_B\} \tag{10} $$

is equivalent to finding

$$ \max\{Q_A,\ Q_B\}. \tag{11} $$

Therefore, computing the community separability M is unnecessary. We compare the sixteen modularity values and list the results in Table 2.
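Equations (8)-(11) can be transcribed directly. The Python sketch below assumes a symmetric 0/1 adjacency matrix given as nested lists and one community label per node; the function names are hypothetical. Because M does not depend on the labels, ranking partitions by M − Q gives the same order as ranking them by Q, which is the point of equation (11). The toy graph is ours and is not taken from the paper.

```python
def separability_and_modularity(adj, labels):
    """Return (M, Q) of equations (8) and (9).

    adj: symmetric N x N list of 0/1 entries (A_ij).
    labels: labels[i] is the community of node i; f(s_i, s_j) = 1 iff equal.
    """
    n = len(adj)
    k = [sum(row) for row in adj]   # degrees k_i
    two_m = sum(k)                  # 2m
    M = Q = 0.0
    for i in range(n):
        for j in range(n):
            b = adj[i][j] - k[i] * k[j] / two_m
            M += max(b, 0.0)        # (x)^+ in equation (8)
            if labels[i] == labels[j]:
                Q += b              # f(s_i, s_j) = 1 in equation (9)
    return M / two_m, Q / two_m


def dense(n, edges):
    """0/1 adjacency matrix of an undirected edge list on nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


# Two triangles joined by one bridge (nodes 0-5).
A = dense(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
M, Q_split = separability_and_modularity(A, [0, 0, 0, 1, 1, 1])
_, Q_other = separability_and_modularity(A, [0, 0, 1, 1, 1, 1])
# min(M - Q_A, M - Q_B) selects the same partition as max(Q_A, Q_B).
assert min(M - Q_split, M - Q_other) == M - max(Q_split, Q_other)
print(round(Q_split, 4))  # 0.3571 (= 5/14)
```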


Table 2. Modularity values for the sixteen possible partitions (each entry gives the core into which the orbit is merged)

Θ(10)   Θ(17)   Θ(25)   Θ(29)   Q
Θ(1)    Θ(1)    Θ(1)    Θ(1)    0.2845
Θ(1)    Θ(1)    Θ(1)    Θ(9)    0.2915
Θ(1)    Θ(1)    Θ(9)    Θ(1)    0.3043
Θ(1)    Θ(1)    Θ(9)    Θ(9)    0.3109 (maximum)
Θ(1)    Θ(9)    Θ(1)    Θ(1)    0.2721
Θ(1)    Θ(9)    Θ(1)    Θ(9)    0.2788
Θ(1)    Θ(9)    Θ(9)    Θ(1)    0.2916
Θ(1)    Θ(9)    Θ(9)    Θ(9)    0.2980
Θ(9)    Θ(1)    Θ(1)    Θ(1)    0.2849
Θ(9)    Θ(1)    Θ(1)    Θ(9)    0.2916
Θ(9)    Θ(1)    Θ(9)    Θ(1)    0.3045
Θ(9)    Θ(1)    Θ(9)    Θ(9)    0.3108 (second best)
Θ(9)    Θ(9)    Θ(1)    Θ(1)    0.2724
Θ(9)    Θ(9)    Θ(1)    Θ(9)    0.2788
Θ(9)    Θ(9)    Θ(9)    Θ(1)    0.2916
Θ(9)    Θ(9)    Θ(9)    Θ(9)    0.2977

The maximum and the second-best of these sixteen modularity values are 0.3109 and 0.3108, respectively. In [10], the authors decided that node 17 belongs to Θ(1) and that nodes 10, 25, and 29 belong to Θ(9), with modularity value 0.3108. Table 2 therefore shows that Zhang et al. [10] did not attain the maximum. The maximum, 0.3109, belongs to the partition in which nodes 10 and 17 belong to Θ(1) and nodes 25 and 29 belong to Θ(9). In [3, 6, 10], node 10 was assigned to Θ(9). However, our results show that assigning node 10 to Θ(1) yields a higher modularity.
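The sixteen partitions behind Table 2 arise from assigning each of the four tied orbits to one of two cores. The sketch below enumerates such assignments with `itertools.product` and keeps the one with the largest modularity. It uses the per-community form of equation (9), Q = Σ_c [L_c/m − (d_c/2m)^2], where L_c is the number of links inside community c and d_c is the total degree of c. The toy graph and the function names are hypothetical, and we do not claim that the sketch reproduces the exact values of Table 2.

```python
from collections import Counter
from itertools import product


def modularity(edges, label):
    """Per-community form of equation (9): sum_c [L_c/m - (d_c/(2m))^2]."""
    m = len(edges)
    inside, degree = Counter(), Counter()
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def best_completion(edges, fixed, undecided, cores):
    """Try every assignment of the undecided nodes to the cores (t^k cases)."""
    scored = []
    for choice in product(cores, repeat=len(undecided)):
        label = dict(fixed)
        label.update(zip(undecided, choice))
        scored.append((modularity(edges, label), choice))
    return max(scored), len(scored)


# Toy network: two triangles with a bridge, plus undecided nodes 6 and 7.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3),
         (6, 0), (6, 1), (7, 4), (7, 5)]
fixed = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
(best_q, best_choice), n_cases = best_completion(edges, fixed, [6, 7], ["A", "B"])
print(best_choice, n_cases)  # ('A', 'B') 4
```

The search cost grows as t^k, which is why reducing the number k of undecided orbits matters.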

4. Our method to derive orbits as GAS

We must point out that deriving a group $G = \langle X \rangle$ from X, the set of all r-cycles in the network, is a complicated problem. Here

$$ G = \bigcup_{n=1}^{\infty} G_n, \qquad G_n = \{\, x_{j_1} x_{j_2} \cdots x_{j_n} : x_{j_1}, \ldots, x_{j_n} \in X \,\}. $$

For a network with N nodes, G is a subset of the set of all one-to-one functions from {1, 2, ..., N} onto itself, so the cardinal number of G is finite (at most N!). However, deriving G is tedious. For example, for the network with 8 nodes and 12 links in Figure 2 of Zhang et al. [10], we find that G contains 144 elements; Appendix 1 lists all 144 of them. In [10], the authors first found all r-cycles as generators to construct a group, and then used GAS on the nodes to derive orbits. We attempted to derive the group for the karate club but could not complete this task. Partial solutions for the elements of G are available at the following link: https://dl.dropboxusercontent.com/u/67654377/Supplement2.doc

All algorithms described in this study were implemented using MATLAB V7.13 on a Windows 7 Professional operating system. The experiments were performed on an Intel Core i5 3.2 GHz PC with 8 GB of RAM.

In this section, we develop a method that replaces their complicated GAS approach and obtains the orbits directly. First, we use the karate club to illustrate our method. Then we present the method rigorously and prove that it obtains the same non-isolated orbits (Lemma 2) and isolated orbits (Lemma 3) as GAS in [10]. The karate club illustration shows a simple approach that does not need the group G. It still obtains the non-isolated orbits of equations (6) and (7), with the remaining nodes as isolated orbits.

Consider a family of sets $A = \{a_k : k \in \Lambda\}$, where Λ is a finite index set. For any two distinct elements $a_\alpha$ and $a_\beta$ in A, we define the following intersection-union operations:

(i) If $a_\alpha \cap a_\beta \ne \emptyset$, then we replace $a_\alpha$ and $a_\beta$ in A by their union $a_\alpha \cup a_\beta$.

(ii) If $a_\alpha \cap a_\beta = \emptyset$, then we leave $a_\alpha$ and $a_\beta$ in A.

The operations are repeated until every two sets remaining in A are disjoint.
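Operations (i) and (ii) can be sketched in a few lines of Python. This is a straightforward quadratic-per-pass version written for clarity, not the authors' MATLAB code; the name `intersection_union` is hypothetical. Each application of rule (i) removes one set, so the loop stops after at most len(family) − 1 merges.

```python
def intersection_union(family):
    """Apply operations (i) and (ii) until all sets are pairwise disjoint.

    Rule (i): two intersecting sets are replaced by their union.
    Rule (ii): disjoint sets are left alone.
    """
    sets = [set(s) for s in family]
    merged = True
    while merged:
        merged = False
        for a in range(len(sets)):
            for b in range(a + 1, len(sets)):
                if sets[a] & sets[b]:          # rule (i)
                    sets[a] |= sets.pop(b)
                    merged = True
                    break
            if merged:
                break
    return sets


# The family A of equation (13) for the karate club.
A = [{1, 2, 3, 4}, {1, 2, 4, 8}, {1, 2, 4, 14}, {1, 3, 4, 14},
     {9, 31, 33, 34}, {24, 30, 33, 34}]
print([sorted(s) for s in intersection_union(A)])
# [[1, 2, 3, 4, 8, 14], [9, 24, 30, 31, 33, 34]]
```

For large networks a union-find structure would do the same job in near-linear time, but the nested loop mirrors the pairwise rules as stated.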

Consider a network with node set $N = \{n : n \in \Lambda\}$, where Λ is a finite index set, and denote the set of all r-cycles by $X = \{x_j : j = 1, 2, \ldots, m\}$. We define

$$ E(x_j) = \{\, n : n \in N,\ x_j n \ne n \,\}. \tag{12} $$

For example, in the karate club, $x_1 = (1,2,3,4)$ means that $x_1 1 = 2$, $x_1 2 = 3$, $x_1 3 = 4$, $x_1 4 = 1$, and $x_1 n = n$ for $n \notin \{1,2,3,4\}$, so that $E(x_1) = \{1,2,3,4\}$.
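Definition (12) and the example for $x_1$ translate directly into Python. The sketch assumes that a cycle is a tuple of distinct nodes in cycle notation; `apply_cycle` and `support` are hypothetical helper names.

```python
def apply_cycle(cycle, n):
    """x n for a cycle x in cycle notation; nodes outside the cycle are fixed."""
    if n in cycle:
        return cycle[(cycle.index(n) + 1) % len(cycle)]
    return n


def support(cycle, nodes):
    """E(x) of equation (12): the nodes that x actually moves."""
    return {n for n in nodes if apply_cycle(cycle, n) != n}


x1 = (1, 2, 3, 4)
nodes = range(1, 35)  # the 34 karate club members
print([apply_cycle(x1, n) for n in (1, 2, 3, 4, 5)])  # [2, 3, 4, 1, 5]
print(sorted(support(x1, nodes)))                      # [1, 2, 3, 4]
```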

Hence, for the karate club, we consider $A = \{E(x_j) : x_j \in X\}$ with X of equation (5), that is,

$$ A = \{\{1,2,3,4\},\ \{1,2,4,8\},\ \{1,2,4,14\},\ \{1,3,4,14\},\ \{9,31,33,34\},\ \{24,30,33,34\}\}. \tag{13} $$

After finitely many intersection-union steps, we obtain two disjoint subsets, {1,2,3,4,8,14} and {9,24,30,31,33,34}, for the karate club with X of equation (5).

These disjoint subsets give the two non-isolated orbits

$$ \Theta(1) = \{1,2,3,4,8,14\} = \Theta(2) = \Theta(3) = \Theta(4) = \Theta(8) = \Theta(14) \tag{14} $$

and

$$ \Theta(9) = \{9,24,30,31,33,34\} = \Theta(24) = \Theta(30) = \Theta(31) = \Theta(33) = \Theta(34), \tag{15} $$

and the remaining nodes (those not in {1,2,3,4,8,14} or {9,24,30,31,33,34}) are isolated orbits.

Here we demonstrate why operations (i) and (ii) obtain the desired results, using the derivation of Θ(1) as an illustration. The cycle $x_1 = (1,2,3,4)$ denotes the function f that satisfies

$$ f(1) = 2, \quad f(2) = 3, \quad f(3) = 4, \quad f(4) = 1. \tag{16} $$

It follows that $x_1^2 = (1,3)(2,4)$, $x_1^3 = (1,4,3,2)$, and $x_1^4 = (1)$, the identity function, so the subgroup generated by $x_1 = (1,2,3,4)$ contains four elements. Similarly, to simplify the expressions for the other elements of the generating set X of equation (5), we write $x_2 = (1,2,4,8)$, $x_3 = (1,2,4,14)$, $x_4 = (1,3,4,14)$, $x_5 = (9,31,33,34)$, and $x_6 = (24,30,33,34)$.
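The powers of $x_1$ stated above can be checked with a short Python sketch. It represents a permutation as a dictionary and composes right to left, so the rightmost factor acts first, as in the products used in equations (22) and (34). The function names are ours, for illustration only.

```python
def cycle_to_map(cycle):
    """Cycle notation as a dict; unlisted nodes are fixed points."""
    return {a: cycle[(i + 1) % len(cycle)] for i, a in enumerate(cycle)}


def compose(p, q):
    """(p q)(n) = p(q(n)): apply q first, then p."""
    domain = set(p) | set(q)
    return {n: p.get(q.get(n, n), q.get(n, n)) for n in domain}


def to_cycles(p):
    """Disjoint-cycle notation of p, omitting fixed points."""
    seen, cycles = set(), []
    for start in sorted(p):
        if start in seen or p[start] == start:
            continue
        cycle, n = [start], p[start]
        seen.add(start)
        while n != start:
            cycle.append(n)
            seen.add(n)
            n = p[n]
        cycles.append(tuple(cycle))
    return cycles


x1 = cycle_to_map((1, 2, 3, 4))
power = x1
for i in range(2, 5):
    power = compose(x1, power)
    print(i, to_cycles(power))
# 2 [(1, 3), (2, 4)]
# 3 [(1, 4, 3, 2)]
# 4 []   (the identity (1))
```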

Based on equation (16), we have

$$ x_1 1 = 2, \quad x_1^2 1 = 3, \quad x_1^3 1 = 4, \quad x_1^4 1 = 1. \tag{17} $$

From equation (17), we derive that

$$ \{1,2,3,4\} \subseteq \Theta(1). \tag{18} $$

Similarly, from $x_2 = (1,2,4,8)$, we obtain

$$ \{1,2,4,8\} \subseteq \Theta(1). \tag{19} $$

Applying the same argument to $x_3 = (1,2,4,14)$ and $x_4 = (1,3,4,14)$ gives $\{1,2,4,14\} \subseteq \Theta(1)$ and $\{1,3,4,14\} \subseteq \Theta(1)$. Consequently, we obtain

$$ \{1,2,3,4,8,14\} \subseteq \Theta(1). \tag{20} $$

On the other hand, we know that

$$ x_j^i k = k \tag{21} $$

for $k = 1,2,3,4,8,14$, $j = 5,6$, and $i = 1,2,3,4$. Any element g in the group G is a finite composition of the $x_j^i$ for $j = 1, \ldots, 6$ and $i = 1,2,3,4$, written as

$$ g = \prod_{n=1}^{N(g)} x_{j(n)}^{i(n)}, \tag{22} $$

where the order of the factors matters because the composition of functions is not commutative. Here $N(g)$ is a natural number depending on g, $j(n) \in \{1,2,3,4,5,6\}$, and $i(n) \in \{1,2,3,4\}$. If we compute $g1$, then each factor satisfies

$$ x_{j(k)}^{i(k)} 1 \in \{1,2,3,4,8,14\} \quad \text{if } j(k) \in \{1,2,3,4\}, \tag{23} $$

and

$$ x_{j(k)}^{i(k)} 1 = 1 \quad \text{if } j(k) \in \{5,6\}. \tag{24} $$

More generally, for $t \in \{1,2,3,4,8,14\}$,

$$ x_{j(k)}^{i(k)} t \in \{1,2,3,4,8,14\} \quad \text{if } j(k) \in \{1,2,3,4\}, \tag{25} $$

and

$$ x_{j(k)}^{i(k)} t = t \quad \text{if } j(k) \in \{5,6\}. \tag{26} $$

By equations (23)-(26), applying the factors of g one at a time never leaves {1,2,3,4,8,14}, so

$$ \Theta(1) \subseteq \{1,2,3,4,8,14\}. \tag{27} $$

From equations (20) and (27), we obtain

$$ \Theta(1) = \{1,2,3,4,8,14\}. \tag{28} $$

Similarly, we can derive the remaining non-isolated orbits of equations (14) and (15). The remaining nodes are not in {1,2,3,4,8,14} or {9,24,30,31,33,34}. For these nodes, the same reasoning as for equation (21) shows that

$$ x_j s = s \tag{29} $$

for $j \in \{1, \ldots, 6\}$ and $s \notin \{1,2,3,4,8,14,9,24,30,31,33,34\}$, which implies

$$ \Theta(s) = \{s\} \tag{30} $$

for $s \notin \{1,2,3,4,8,14,9,24,30,31,33,34\}$.

Using only these simple intersection-union operations, and without deriving the complicated group G, we directly obtain the orbits that GAS produces on the karate club network. This shows that the GAS procedure proposed by Zhang et al. [10] can be simplified. Next, we present our results in an abstract setting.

Motivated by our intersection-union operation, let $Y = \{y_i : i = 1, 2, \ldots, s\}$, where the $y_i$ are subsets of X that together partition X. We require two properties:

(i) For any pair i and k in {1, 2, ..., s} with $i \ne k$, and any $x_\alpha \in y_i$ and $x_\beta \in y_k$, we have $E(x_\alpha) \cap E(x_\beta) = \emptyset$.

(ii) For each index $i \in \{1, 2, \ldots, s\}$, one of two cases holds. Case 1: $y_i$ contains only one element. Case 2: $y_i$ contains at least two elements. In Case 2, for any two distinct elements $x_\alpha$ and $x_\beta$ in $y_i$, there is a finite sequence $x_u, x_v, \ldots, x_\delta$ in $y_i$ such that $E(x_\alpha) \cap E(x_u) \ne \emptyset$, $E(x_u) \cap E(x_v) \ne \emptyset$, ..., and $E(x_\delta) \cap E(x_\beta) \ne \emptyset$.

For example, for the karate club with X of equation (5),

$$ y_1 = \{(1,2,3,4),\ (1,2,4,8),\ (1,2,4,14),\ (1,3,4,14)\} \tag{31} $$

and

$$ y_2 = \{(9,31,33,34),\ (24,30,33,34)\}. \tag{32} $$

Let $Z = \{Z_i : i = 1, 2, \ldots, s\}$, where $Z_i = \bigcup \{E(x_j) : x_j \in y_i\}$. For an index $i \in \{1, 2, \ldots, s\}$ and an element $t \in Z_i$, we now compute $x_j t$ for $x_j \in X$. We distinguish two conditions: Condition (a), $x_j \in y_i$, and Condition (b), $x_j \notin y_i$.

For Condition (a), there are two situations. If $t \notin E(x_j)$, then $x_j t = t$, so $x_j t \in Z_i$. If $t \in E(x_j)$, then $x_j t \ne t$, but $x_j t \in E(x_j) \subseteq Z_i$, since $x_j \in y_i$. In both situations, $x_j t \in Z_i$.

For Condition (b), we claim that $t \notin E(x_j)$. Suppose, for contradiction, that $t \in E(x_j)$. Since $t \in Z_i$, there is at least one $x_\varepsilon \in y_i$ with $t \in E(x_\varepsilon)$, so $t \in E(x_\varepsilon) \cap E(x_j)$. By property (i), this forces $x_j \in y_i$, which contradicts Condition (b). Hence $t \notin E(x_j)$, and $x_j t = t \in Z_i$.

Combining the findings for Conditions (a) and (b), we obtain $x_j t \in Z_i$, that is, for every $x_j \in X$,

$$ x_j(Z_i) \subseteq Z_i. \tag{33} $$

Recalling equation (22), for $t \in Z_i$ and any $g \in G = \langle X \rangle$, we write $g = x_\alpha x_\beta \cdots x_\sigma x_\lambda$ and compute

$$ gt = x_\alpha x_\beta \cdots x_\sigma (x_\lambda t) \in x_\alpha x_\beta \cdots x_\sigma x_\lambda(Z_i) \subseteq x_\alpha x_\beta \cdots x_\sigma(Z_i) \subseteq \cdots \subseteq x_\alpha(Z_i) \subseteq Z_i. \tag{34} $$

Hence, for an element $t \in Z_i$,

$$ \Theta(t) = \{\, gt : g \in G \,\} \subseteq Z_i. \tag{35} $$

Next, we prove that $Z_i \subseteq \Theta(t)$ for any element $t \in Z_i$. Fix such an element t. Since $t \in Z_i$, there is $x_\alpha \in y_i$ with $t \in E(x_\alpha)$. For an element $s \in Z_i$, there is $x_\beta \in y_i$ with $s \in E(x_\beta)$. Our goal is to verify that $s \in \Theta(t)$. There are two circumstances: (a) $x_\alpha = x_\beta$ and (b) $x_\alpha \ne x_\beta$.

For circumstance (a), $x_\alpha$ is a single cycle whose support $E(x_\alpha)$ contains both t and s. Repeated application of $x_\alpha$ therefore carries t through every element of $E(x_\alpha)$. Hence $s \in \{x_\alpha t, (x_\alpha)^2 t, \ldots\}$, so $s \in \Theta(t)$. For circumstance (b), by property (ii) there are elements $x_u, x_v, \ldots, x_\eta, x_\delta$ in $y_i$ such that

$$ E(x_\alpha) \cap E(x_u) \ne \emptyset,\ E(x_u) \cap E(x_v) \ne \emptyset,\ \ldots,\ E(x_\eta) \cap E(x_\delta) \ne \emptyset,\ E(x_\delta) \cap E(x_\beta) \ne \emptyset. $$

We choose $t_\alpha \in E(x_\alpha) \cap E(x_u)$, $t_u \in E(x_u) \cap E(x_v)$, ..., $t_\eta \in E(x_\eta) \cap E(x_\delta)$, and $t_\delta \in E(x_\delta) \cap E(x_\beta)$. By the argument of circumstance (a), there are natural numbers $n_\alpha, n_u, \ldots, n_\delta, n_\beta$ with $(x_\alpha)^{n_\alpha} t = t_\alpha$, $(x_u)^{n_u} t_\alpha = t_u$, ..., $(x_\delta)^{n_\delta} t_\eta = t_\delta$, and $(x_\beta)^{n_\beta} t_\delta = s$. Hence

$$ (x_\beta)^{n_\beta} (x_\delta)^{n_\delta} \cdots (x_u)^{n_u} (x_\alpha)^{n_\alpha}\, t = s. \tag{36} $$

In both circumstances, $s \in \Theta(t)$ for every $s \in Z_i$, which implies

$$ Z_i \subseteq \Theta(t). \tag{37} $$

Finally, combining equations (35) and (37), we conclude that $Z_i = \Theta(t)$ for any $t \in Z_i$. We summarize our results in the next lemma.

Lemma 2. For any $t \in Z_i$, we have $Z_i = \Theta(t)$.

For $t \in N$ with $t \notin \bigcup_{i=1}^{s} Z_i$, we know that $t \notin \bigcup_{j=1}^{m} E(x_j)$. This implies $x_j t = t$ for $j = 1, 2, \ldots, m$, and then $gt = t$ for any $g \in G$. Hence $\Theta(t) = \{t\}$, that is, an isolated orbit. We summarize our findings in the next lemma.

Lemma 3. For any $t \notin \bigcup_{i=1}^{s} Z_i$, we have $\Theta(t) = \{t\}$.

Theorem 1. The intersection-union operation can replace the GAS proposed by Zhang et al. [10]. (Proof.) By Lemma 2, the non-isolated orbits are exactly the sets $Z_i$ obtained by the intersection-union operation. By Lemma 3, every remaining node forms an isolated orbit. Neither lemma requires knowing the group $G = \langle X \rangle$ generated by X.
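Theorem 1 can also be probed empirically. The sketch below computes the orbit partition in two ways and checks that the results coincide. The first way closes each node under the generators; the second applies the intersection-union operation to the supports E(x_j) and adds singletons. The random generators (cycles of length 3 or 4 on 15 nodes, fixed seed) are our own assumption. They need not be cliques of any network, since Lemmas 2 and 3 use only the cycle structure. A passing run is a sanity check, not a substitute for the proofs.

```python
import random
from collections import deque


def orbit_partition(nodes, cycles):
    """Orbits under the group generated by the cycles (closure under generators)."""
    maps = [{a: c[(i + 1) % len(c)] for i, a in enumerate(c)} for c in cycles]
    remaining, orbits = set(nodes), []
    while remaining:
        x = min(remaining)
        seen, queue = {x}, deque([x])
        while queue:
            y = queue.popleft()
            for m in maps:
                z = m.get(y, y)
                if z not in seen:
                    seen.add(z)
                    queue.append(z)
        orbits.append(frozenset(seen))
        remaining -= seen
    return set(orbits)


def union_partition(nodes, cycles):
    """Intersection-union on the supports E(x_j), plus singletons (Lemmas 2, 3)."""
    sets = [set(c) for c in cycles]
    merged = True
    while merged:
        merged = False
        for a in range(len(sets)):
            for b in range(a + 1, len(sets)):
                if sets[a] & sets[b]:
                    sets[a] |= sets.pop(b)
                    merged = True
                    break
            if merged:
                break
    covered = set().union(*sets)
    return ({frozenset(s) for s in sets}
            | {frozenset({t}) for t in nodes if t not in covered})


rng = random.Random(0)
nodes = range(1, 16)
for _ in range(200):
    cycles = [tuple(rng.sample(nodes, rng.choice([3, 4])))
              for _ in range(rng.randint(1, 5))]
    assert orbit_partition(nodes, cycles) == union_partition(nodes, cycles)
print("Theorem 1 agrees on 200 random generating sets")
```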

5. Our iterative algorithm

In Sections 2 and 3, we demonstrated that the method proposed in [10] yields sixteen possible partitions of the karate club network, so the modularity value must be computed sixteen times. This observation motivates us to reduce the number of modularity computations, and we now provide our revised merging algorithm. We avoid the complicated GAS of nodes when finding the non-isolated orbits that serve as cores for developing communities. We then apply the merging rule of [10] with our enhancement. Our new policy is a recursive process containing several steps that merge isolated orbits into cores. If an isolated orbit has the same number of links to multiple cores, then we do not merge it into any developing community during that step.

We demonstrate our new policy using the karate club example.

From Table 1, in the first step, we merge the isolated orbits Θ(5), Θ(6), Θ(7), Θ(11), Θ(12), Θ(13), Θ(18), Θ(20), and Θ(22) into the core Θ(1).

Similarly, from Table 1, in the first step, we merge the isolated orbits Θ(15), Θ(16), Θ(19), Θ(21), Θ(23), Θ(26), Θ(27), Θ(28), and Θ(32) into the core Θ(9).

The remaining isolated orbits, Θ(10), Θ(17), Θ(25), and Θ(29), are not merged into the cores Θ(1) or Θ(9) during the first step, because each has the same number of links to Θ(1) and Θ(9). After the first step of the merging process, the cores are extended as

$$ \Theta(1) = \{1,2,3,4,5,6,7,8,11,12,13,14,18,20,22\} \tag{38} $$

and

$$ \Theta(9) = \{9,15,16,19,21,23,24,26,27,28,30,31,32,33,34\}. \tag{39} $$

In Figure 3, the nodes of orbit Θ(1) are marked in yellow and the nodes of orbit Θ(9) are marked in green.

Insert Figure 3 here

In the second step, we reconsider the links between each undetermined isolated orbit and the two extended cores derived in the first step. Table 3 lists the number of links from these undetermined isolated orbits to Θ(1) and Θ(9) of equations (38) and (39).

Table 3. The links between isolated orbits and developing communities

Isolated orbit   Links to Θ(1) of (38)   Links to Θ(9) of (39)
Θ(10)            1                       1
Θ(17)            2                       0
Θ(25)            0                       3
Θ(29)            1                       2

From Table 3, Θ(17) should be merged into Θ(1), whereas Θ(25) and Θ(29) should be merged into Θ(9) in the second step. Hence, the cores are further extended as

$$ \Theta(1) = \{1,2,3,4,5,6,7,8,11,12,13,14,17,18,20,22\} \tag{40} $$

and

$$ \Theta(9) = \{9,15,16,19,21,23,24,25,26,27,28,29,30,31,32,33,34\}. \tag{41} $$

In Figure 4, the nodes of orbit Θ(1) are marked in yellow and the nodes of orbit Θ(9) are marked in green.

Insert Figure 4 here

One isolated orbit, Θ(10), still remains to be merged into Θ(1) of equation (40) or Θ(9) of equation (41). However, Θ(10) has one link to Θ(1) and also one link to Θ(9). Hence, there are two possible partitions of the network into communities, which are the two best results in Table 2 (Q = 0.3109 and Q = 0.3108). Our improved iterative process reduces the possible partitions from sixteen cases to only two. As a result, the number of modularity values Q that must be computed falls from sixteen to two. In Section 6, we demonstrate the corresponding reduction for three other examples.
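The two-round walk-through above can be reproduced with a short Python sketch of Steps 6-10 of the algorithm listed below. It assumes the standard 78-edge Zachary karate club edge list and the cores of equations (6) and (7). The function and variable names are hypothetical. A tie leaves the orbit for a later round, and merges are applied only after every pending orbit has been compared, as Step 7 requires. Step 11 (enumerating the remaining t^k assignments) is not included.

```python
from collections import defaultdict

# Zachary karate club, 78 edges; each node lists its higher-numbered neighbours.
KARATE = {
    1: [2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 20, 22, 32],
    2: [3, 4, 8, 14, 18, 20, 22, 31], 3: [4, 8, 9, 10, 14, 28, 29, 33],
    4: [8, 13, 14], 5: [7, 11], 6: [7, 11, 17], 7: [17], 9: [31, 33, 34],
    10: [34], 14: [34], 15: [33, 34], 16: [33, 34], 19: [33, 34],
    20: [34], 21: [33, 34], 23: [33, 34], 24: [26, 28, 30, 33, 34],
    25: [26, 28, 32], 26: [32], 27: [30, 34], 28: [34], 29: [32, 34],
    30: [33, 34], 31: [33, 34], 32: [33, 34], 33: [34],
}


def adjacency(upper):
    """Symmetric adjacency sets built from the 'higher neighbours' listing."""
    adj = defaultdict(set)
    for u, vs in upper.items():
        for v in vs:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def iterative_merge(adj, cores):
    """Steps 6-10: merge isolated nodes that have a unique best core, by rounds.

    Returns the extended cores after each round and the nodes left undecided.
    """
    cores = [set(c) for c in cores]
    pending = sorted(set(adj) - set().union(*cores))
    rounds = []
    while pending:
        decided = {}
        for v in pending:
            links = [len(adj[v] & c) for c in cores]  # m(C_beta, v), Step 6
            best = max(links)
            if links.count(best) == 1:               # unique maximum, Step 7
                decided[v] = links.index(best)
            # a tie (Step 8) waits for a later round
        if not decided:                              # no progress: Step 11 remains
            break
        for v, idx in decided.items():               # merge after all comparisons, Step 9
            cores[idx].add(v)
        pending = [v for v in pending if v not in decided]
        rounds.append([sorted(c) for c in cores])
    return rounds, pending


rounds, undecided = iterative_merge(adjacency(KARATE),
                                    [{1, 2, 3, 4, 8, 14}, {9, 24, 30, 31, 33, 34}])
print(len(rounds), undecided)  # 2 [10]
```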

We provide an algorithm for our iterative approach as follows.

Step 1. A network has n nodes, denoted by $V_1, \ldots, V_n$. If there is a link between nodes $V_i$ and $V_j$, then the adjacency matrix has $A_{ij} = 1$. If there is no edge connecting $V_i$ and $V_j$, then $A_{ij} = 0$.

Step 2. Four nodes $\{V_a, V_b, V_c, V_d\}$ form a 4-cycle (a 4-clique) if and only if $A_{ij} = 1$ for every pair of nodes $V_i$ and $V_j$ with $i, j \in \{a, b, c, d\}$ and $i \ne j$, that is, $A_{ab} A_{bc} A_{cd} A_{da} A_{ac} A_{bd} = 1$.

Step 3. For each of the $C(n,4) = \frac{n!}{(n-4)!\,4!}$ possible cases, we check whether or not it forms a 4-cycle.

Step 4. After finding all 4-cycles, we set $X = \{x_\alpha : x_\alpha = \{V_i, V_j, V_k, V_l\},\ \alpha \in \Omega\}$, where each $\{V_i, V_j, V_k, V_l\}$ forms a 4-cycle and Ω is an index set.

Step 5. We apply our intersection-union method to find the non-isolated orbits, which serve as cores for developing communities. The isolated orbits join a core through the following recursive steps.

Step 6. Suppose that the current cores are $\{C_1, \ldots, C_t\}$. For an isolated orbit $V_{j_0}$, we compute $m(C_\beta, V_{j_0})$, the number of links between $V_{j_0}$ and the core $C_\beta$, that is, $m(C_\beta, V_{j_0}) = \sum_{V_i \in C_\beta} A_{i j_0}$, for $\beta = 1, 2, \ldots, t$.

Step 7. Suppose exactly one of the numbers $m(C_1, V_{j_0}), \ldots, m(C_t, V_{j_0})$, say $m(C_\gamma, V_{j_0})$, attains the maximum of $\{m(C_1, V_{j_0}), \ldots, m(C_t, V_{j_0})\}$. Then, as Zhang et al. [10] mentioned, the isolated orbit $V_{j_0}$ joins the core $C_\gamma$ once the comparisons for all isolated orbits are completed.

Step 8. If there are two indices γ and η in $\{1, 2, \ldots, t\}$ that satisfy

$$ m(C_\gamma, V_{j_0}) = m(C_\eta, V_{j_0}) = \max\{m(C_1, V_{j_0}), \ldots, m(C_t, V_{j_0})\}, \tag{42} $$

then we do not merge the isolated orbit $V_{j_0}$ into any core once the comparisons for all isolated orbits are completed.

Step 9. Each isolated orbit that satisfies the condition of Step 7 is merged into the core that attains the maximum of $\{m(C_1, V_{j_0}), \ldots, m(C_t, V_{j_0})\}$, so that the cores are extended. For example, the extended core with index 1 is the union of the original core $C_1$ and those isolated orbits $V_j$ that satisfy

$$ \{\, p : m(C_p, V_j) > \max_{i = 1, \ldots, t,\ i \ne p} m(C_i, V_j) \,\} = \{1\}. \tag{43} $$

Step 10. Repeat Steps 6-9 with the iteratively extended cores.

Step 11. If all isolated orbits are merged, then the partition is complete. Otherwise, let $V_{\varepsilon_1}, \ldots, V_{\varepsilon_k}$ denote the undecided isolated orbits that remain. We compute all possible assignments of these k orbits to the t enlarged cores and compare the $t^k$ resulting modularity values; the larger the value, the better.
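Steps 1-5 can be sketched in Python under the same assumption of the standard 78-edge karate club edge list; the names are hypothetical. `itertools.combinations` enumerates the C(n, 4) candidate subsets of Step 3, and a subset is kept when all six of its pairs are linked. With this edge list, the exhaustive search finds more 4-cycles than the six listed in equation (5), for example {1, 2, 3, 8}. The intersection-union step still yields the two cores of equations (14) and (15).

```python
from collections import defaultdict
from itertools import combinations

# Zachary karate club, 78 edges; each node lists its higher-numbered neighbours.
KARATE = {
    1: [2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 20, 22, 32],
    2: [3, 4, 8, 14, 18, 20, 22, 31], 3: [4, 8, 9, 10, 14, 28, 29, 33],
    4: [8, 13, 14], 5: [7, 11], 6: [7, 11, 17], 7: [17], 9: [31, 33, 34],
    10: [34], 14: [34], 15: [33, 34], 16: [33, 34], 19: [33, 34],
    20: [34], 21: [33, 34], 23: [33, 34], 24: [26, 28, 30, 33, 34],
    25: [26, 28, 32], 26: [32], 27: [30, 34], 28: [34], 29: [32, 34],
    30: [33, 34], 31: [33, 34], 32: [33, 34], 33: [34],
}


def adjacency(upper):
    """Step 1: symmetric adjacency sets (A_ij = 1 iff j in adj[i])."""
    adj = defaultdict(set)
    for u, vs in upper.items():
        for v in vs:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def four_cliques(adj):
    """Steps 2-4: test all C(n, 4) node subsets; keep those with all 6 links."""
    return [q for q in combinations(sorted(adj), 4)
            if all(b in adj[a] for a, b in combinations(q, 2))]


def cores_from_cliques(cliques):
    """Step 5: intersection-union of the cliques' node sets."""
    sets = [set(q) for q in cliques]
    merged = True
    while merged:
        merged = False
        for a in range(len(sets)):
            for b in range(a + 1, len(sets)):
                if sets[a] & sets[b]:
                    sets[a] |= sets.pop(b)
                    merged = True
                    break
            if merged:
                break
    return sorted(sorted(s) for s in sets)


X = four_cliques(adjacency(KARATE))
print(len(X), cores_from_cliques(X))
# 11 [[1, 2, 3, 4, 8, 14], [9, 24, 30, 31, 33, 34]]
```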

6. Numerical examples

In this section, we compare our approach with other partition methods. For the karate club, we partition the network by the approach of Newman [11]. We find the maximum eigenvalue $\lambda_{\max} = 5.439$ and denote the corresponding eigenvector by $V = [v_i]_{34 \times 1}$. We define $\Theta(1) = \{i : v_i \le 0\}$ and $\Theta(34) = \{i : v_i > 0\}$; then

$$ \Theta(1) = \{1,2,3,4,5,6,7,8,10,11,12,13,14,17,18,20,22\} \tag{44} $$

and

$$ \Theta(34) = \{9,15,16,19,21,23,24,25,26,27,28,29,30,31,32,33,34\}. \tag{45} $$

We find that this partition is identical to that of Zhang et al. [10], so its modularity value is Q = 0.3108. Next, we compare the modularity values of three approaches, Newman [11], Zhang et al. [10], and our iterative algorithm, and find that

$$ Q_{\text{Newman}} = 0.3108 = Q_{\text{Zhang}} < Q_{\text{Ours}} = 0.3109. \tag{46} $$

This implies that, for the karate club network, our iterative algorithm attains the best modularity among them. Next, we construct three further numerical examples, motivated by the benchmark example proposed by Girvan and Newman [3]. We tested three networks with predesigned communities, and in each case there is only one core within each predesigned community. The three networks have 64, 96, and 128 nodes, and each community contains 32 nodes. Examples 2, 3, and 4 have 2, 3, and 4 predesigned communities, respectively. (The first example is the karate club.) Each node has 8 links within its own community and 3, 2, and 3 links, respectively, to the other three communities. For four partition methods, (a) hierarchical clustering, (b) K-means, (c) Newman [11], and (d) our iterative algorithm, we compute the modularity values and list them in Table 4.

Table 4. Comparison by modularity values for four different methods

| | Hierarchical clustering | K-means | Newman [11] | Our iterative algorithm |
|---|---|---|---|---|
| Example 2 | 0.069243 | 0.033300 | 0.174183 | 0.179688 |
| Example 3 | -0.001089 | -0.07375 | 0.108999 | 0.185648 |
| Example 4 | -0.01997 | -0.001278 | 0.130680 | 0.196999 |

Based on Table 4, our iterative algorithm partitions these predesigned networks into communities with a higher modularity value than hierarchical clustering, K-means and Newman [11], when there is only one core within each predesigned community.

Next, we compare the efficiency of our revised algorithm with that of Zhang et al. [10]. Table 5 lists the number of modularity values that each approach must compute.

Table 5. Comparison between Zhang et al. [10] and our iterative approach for the number of modularity values computed

| | Karate club | Ex. 2 | Ex. 3 | Ex. 4 |
|---|---|---|---|---|
| Zhang et al. [10] | $2^{4}$ | $2^{17}$ | $3^{36}$ | $4^{48}$ |
| Our iterative approach | $2^{1}$ | 0 | 0 | $4^{1}$ |

With the method of Zhang et al. [10], there are 4, 17, 36 and 48 nodes, respectively, for which it cannot be decided which community they should join. Consequently, the numbers of modularity values to compute are $2^{4}$, $2^{17}$, $3^{36}$ and $4^{48}$, which increase dramatically as the network becomes larger. With our iterative algorithm, the corresponding numbers are $2^{1}$, 0, 0 and $4^{1}$, which shows that our iterative algorithm improves on Zhang et al. [10].
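As a worked check of Table 5 (our arithmetic), with $t$ cores and $k$ undecided orbits the exhaustive search of Step 11 evaluates $t^k$ partitions. For the karate club and Examples 2, 3 and 4, the first line below gives the counts for Zhang et al. [10] and the second line those for our iterative approach:

$$2^{4} = 16,\qquad 2^{17} = 131{,}072,\qquad 3^{36} \approx 1.50 \times 10^{17},\qquad 4^{48} = 2^{96} \approx 7.92 \times 10^{28},$$

$$2^{1} = 2,\qquad 0,\qquad 0,\qquad 4^{1} = 4.$$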

Remark. Our previous discussion used modularity as the criterion for deciding which partition gives the best communities in a given network. However, Fortunato and Barthelemy [12], Kumpula et al. [13] and Good et al. [14] have pointed out drawbacks of modularity. Fortunato and Barthelemy [12] and Kumpula et al. [13] demonstrated the resolution limit on ring-like networks: to attain the maximum value of modularity, some small communities are merged together. Good et al. [14] used numerical examples of a ring network, a hierarchical random graph model and a metabolic network to illustrate that, near the maximum value of modularity, there are many peaks with (a) different partition structures, (b) varied module densities and (c) diverse distributions of module sizes. Hence, previous authors have cautioned against drawing conclusions from modularity alone.

7. Questionable results of the GAS algorithm for the benchmark of Girvan and Newman [3]

We are unable to apply the GAS algorithm (or our simplified version, the intersection-union operation) directly to the benchmark of Girvan and Newman [3], a network of 128 nodes partitioned as closely as possible into four predesigned communities. Each predesigned community consists of 32 nodes; each node has 8 links to the other 31 nodes of its own predesigned community and 8 links to the 96 nodes of the remaining three predesigned communities. Nonetheless, a special setting can be used to show that the GAS algorithm almost always (> 99%) fails to produce exactly 4 cores, one within each of the 4 predesigned communities, and hence fails to merge the remaining isolated orbits into 4 communities similar to the predesigned ones. Three programs, (a) to (c), were developed for the first predesigned community: (a) generate a 32-node network in which every node has 8 links to the other 31 nodes; (b) find the 4-cliques of this network; (c) apply our intersection-union operation to count how many cores result from the 4-cliques found in (b). The complete programs are available at: https://dl.dropboxusercontent.com/u/67654377/findCore.rar

We ran the programs 1,000 times: in 281 runs there was only one core, while the remaining 719 runs produced more than one core. Therefore, if the GAS algorithm of Zhang et al. [10] is applied to the original network with 128 nodes, and the four predesigned communities behave independently, the probability of obtaining exactly one core in every predesigned community is only $(0.281)^4 \approx 0.62\%$. This indicates that the GAS algorithm of Zhang et al. [10] is unable to give a meaningful partition of this network.
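Programs (a)-(c) can be sketched as below; this is not the authors' findCore code. The random 8-regular generator (a union of 8 edge-disjoint random perfect matchings) is our assumption, since the text does not say how the 32-node networks were drawn. Cores are formed by uniting 4-cliques that share a node, which is our reading of the intersection-union operation. Nodes are numbered from 0, and all function names are hypothetical.

```python
import random

def random_perfect_matching(nodes, rng):
    """Pair up the nodes uniformly at random (the number of nodes must be even)."""
    order = list(nodes)
    rng.shuffle(order)
    return {frozenset(order[i:i + 2]) for i in range(0, len(order), 2)}

def random_regular_block(n=32, d=8, rng=None):
    """Program (a), assumed generator: a d-regular simple graph built as the
    union of d edge-disjoint random perfect matchings."""
    rng = rng or random.Random()
    edges = set()
    while len(edges) < n * d // 2:
        matching = random_perfect_matching(range(n), rng)
        if not matching & edges:          # reject matchings that repeat an edge
            edges |= matching
    adj = {v: set() for v in range(n)}
    for u, v in map(tuple, edges):
        adj[u].add(v)
        adj[v].add(u)
    return adj

def four_cliques(adj):
    """Program (b): every 4-clique u < v < w < x, grown from edges via common neighbours."""
    cliques = []
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            common = adj[u] & adj[v]
            for w in common:
                if w <= v:
                    continue
                for x in common & adj[w]:
                    if x > w:
                        cliques.append((u, v, w, x))
    return cliques

def count_cores(cliques):
    """Program (c): union-find over clique nodes; cliques sharing a node share a core."""
    parent = {}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for clique in cliques:
        for node in clique:
            parent.setdefault(node, node)
        root = find(clique[0])
        for node in clique[1:]:
            parent[find(node)] = root
    return len({find(a) for a in parent})

def fraction_single_core(trials=1000, seed=0):
    """Share of random blocks whose 4-cliques form exactly one core."""
    rng = random.Random(seed)
    single = sum(
        count_cores(four_cliques(random_regular_block(rng=rng))) == 1
        for _ in range(trials)
    )
    return single / trials
```

Calling `fraction_single_core(1000)` and raising the result to the fourth power gives the analogue of the $(0.281)^4$ estimate under this generator. The value need not equal 281/1000, because the authors' generator may differ from the matching-based one assumed here.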

8. Burnside's theorem is useless for predicting the number of communities

We quote Zhang et al. [10] (page 1174, lines 11-15): "Furthermore, the following fundamental theorem can be used to calculate the number of communities in the division. Theorem 1 (Burnside). If finite group $G$ acts on a set $V$, the number of the orbits under the action of $G$ on $V$ is

$$L = \frac{1}{|G|} \sum_{g \in G} \chi(g), \qquad (47)$$

where $\chi(g) = |\{x : gx = x,\ x \in V\}|$ is the number of fixed points of $g$ acting on $V$."

The purpose of this section is to show that Burnside's theorem (Theorem 1 of [10]) is of no use in calculating the number of communities in a network.

Recall the karate club network with 34 nodes, which contains the six 4-cliques listed in Equation (5). Without generating the group from these six 4-cliques, our intersection-union operation yields the two cores (non-isolated orbits) of Equations (6) and (7), $\Theta(1) = \{1,2,3,4,8,14\}$ and $\Theta(9) = \{9,24,30,31,33,34\}$; each of the remaining 22 nodes is an isolated orbit. Hence Burnside's theorem gives $L = 24$ orbits, consisting of 2 non-isolated orbits and 22 isolated orbits.

However, the number of communities in the network is the number of non-isolated orbits, here 2. Hence, the value 24 derived from Burnside's theorem is of no use in finding the number of cores of the karate club network.

We now recall the example of Figure 2 of Zhang et al. [10], reproduced in our Appendix 1: a network with 8 nodes, $V = \{1,2,3,4,5,6,7,8\}$, and four 3-cliques, $X = \{(1\,2\,4), (2\,3\,4), (5\,6\,8), (5\,7\,8)\}$. Our intersection-union operation yields the cores $\Theta(1) = \{1,2,3,4\}$ and $\Theta(5) = \{5,6,7,8\}$, so the final partition of the network $V$ is $\Theta(1)$ and $\Theta(5)$.

For the same 8-node network, we also followed the suggestion of Zhang et al. [10] and generated the group from $X = \{(1\,2\,4), (2\,3\,4), (5\,6\,8), (5\,7\,8)\}$; it contains 144 elements, listed in Appendix 1. We then applied Burnside's theorem, computing $\chi(g)$ for each of these 144 elements $g$ (see Appendix 1 for the detailed computation), and found $L = 2$, since there are 2 non-isolated orbits and no isolated orbits. For this network, Burnside's theorem does indicate the number of communities, $L = 2$.

We conclude that Burnside's theorem predicts the number of communities only for networks in which the union of the cores (non-isolated orbits) equals the set of all nodes of the network.

In general, networks contain isolated orbits, so the value given by Burnside's theorem is too large to predict the number of communities. The number of non-isolated orbits, by contrast, can be obtained efficiently with our proposed intersection-union operation.

Moreover, before Burnside's theorem can be applied at all, there is a fundamental problem: the group $G$ generated by all n-cliques must be found, which is a very difficult task. Our intersection-union operation not only avoids this difficult task but also replaces Burnside's theorem in determining how many communities the network contains.
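The counting argument of this section can be sketched as follows. This sketch assumes our reading that the intersection-union operation repeatedly unites any two cliques sharing at least one node. Under that reading the cores coincide with the non-isolated orbits of the group generated by the cliques written as cycles, because the orbits of a group generated by cycles are exactly the sets of points linked through overlapping cycles. Every node outside all cliques is then an isolated orbit, so $L$ = (number of cores) + (number of isolated nodes). Function names are hypothetical. The five karate club cliques in the example were chosen only for illustration: they reproduce $\Theta(1)$ and $\Theta(9)$ but are not claimed to be the six cliques of Equation (5).

```python
from itertools import combinations

def intersection_union(cliques):
    """Merge any two groups that share a node until all groups are disjoint."""
    groups = [set(c) for c in cliques]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(groups)), 2):
            if groups[a] & groups[b]:
                groups[a] |= groups.pop(b)   # b > a, so index a stays valid
                merged = True
                break                        # restart: the index pairs changed
    return groups

def orbit_summary(n_nodes, cliques):
    """(number of cores, number of isolated orbits, total orbits L) for nodes 1..n_nodes."""
    cores = intersection_union(cliques)
    covered = set().union(*cores)
    isolated = n_nodes - len(covered)
    return len(cores), isolated, len(cores) + isolated

# The 8-node example of Zhang et al. [10]: two cores, no isolated orbits, L = 2.
eight_node = orbit_summary(8, [(1, 2, 4), (2, 3, 4), (5, 6, 8), (5, 7, 8)])
# Illustrative karate club cliques giving Theta(1) and Theta(9): L = 2 + 22 = 24.
karate_like = [(1, 2, 3, 4), (1, 2, 3, 8), (1, 2, 4, 14), (9, 31, 33, 34), (24, 30, 33, 34)]
karate = orbit_summary(34, karate_like)
```

The first call returns `(2, 0, 2)` and the second `(2, 22, 24)`, mirroring the contrast drawn above: only when the cores cover every node does $L$ equal the number of communities.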

9. Comparison between Zhang et al. [10] and Newman [11]

The programs of Section 7 construct the network of 128 nodes with 4 predesigned communities proposed by Girvan and Newman [3], subject to the following restriction, which we illustrate with the 16 links of node 1: 8 links connect node 1 to nodes among 2 to 32, and the other 8 links connect node 1 to nodes among 33 to 128.

We divide the 128 × 128 adjacency matrix into 16 sub-matrices of size 32 × 32. In the example we constructed, node 1 has (a) 8 links to nodes among 2 to 32, (b) 3 links to nodes among 33 to 64, (c) 2 links to nodes among 65 to 96, and (d) 3 links to nodes among 97 to 128.

We call this 3-2-3 arrangement of the links from node 1 to nodes 33 to 128 the "3-2-3 simplification" for later discussion. Its purpose is to construct a network with 128 nodes and 4 predesigned communities in which there is only one core within each predesigned community, in order to demonstrate that our merging process reduces the number of modularity values to compute, as in Table 5.
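The 3-2-3 construction can be sketched as follows, using a random generator of our own rather than the authors' Matlab code. Each 32 × 32 diagonal block is a union of 8 edge-disjoint random perfect matchings, and each off-diagonal block is a union of $k$ edge-disjoint random one-to-one pairings, which makes it $k$-regular. Only node 1's 8-3-2-3 pattern comes from the text. The pattern for the other communities (3 links to each neighbouring community, 2 to the opposite one) is our assumption; it is one symmetric choice that gives every node 16 links. Nodes are numbered from 0, so node $i$ of the paper is index $i-1$, and the names `LINKS` and `benchmark_323` are hypothetical.

```python
import random

BLOCK = 32
# Links per node between communities P1..P4 (indices 0..3). Row 0 is node 1's
# 8-3-2-3 pattern from the text; the remaining rows are an assumed symmetric choice.
LINKS = [
    [8, 3, 2, 3],
    [3, 8, 3, 2],
    [2, 3, 8, 3],
    [3, 2, 3, 8],
]

def disjoint_union(make_matching, k, rng):
    """Union of k edge-disjoint random matchings drawn from make_matching(rng)."""
    edges = set()
    for _ in range(k):
        while True:
            matching = make_matching(rng)
            if not matching & edges:   # retry if an edge would be repeated
                edges |= matching
                break
    return edges

def within(block):
    """Random perfect matchings inside one community."""
    nodes = list(range(block * BLOCK, (block + 1) * BLOCK))
    def make(rng):
        rng.shuffle(nodes)
        return {frozenset(nodes[i:i + 2]) for i in range(0, BLOCK, 2)}
    return make

def between(a, b):
    """Random one-to-one pairings between two communities."""
    left = list(range(a * BLOCK, (a + 1) * BLOCK))
    right = list(range(b * BLOCK, (b + 1) * BLOCK))
    def make(rng):
        rng.shuffle(right)
        return {frozenset(pair) for pair in zip(left, right)}
    return make

def benchmark_323(seed=0):
    """Adjacency sets of a 128-node network with the assumed 8 + 3-2-3 pattern."""
    rng = random.Random(seed)
    edges = set()
    for a in range(4):
        edges |= disjoint_union(within(a), LINKS[a][a], rng)
        for b in range(a + 1, 4):
            edges |= disjoint_union(between(a, b), LINKS[a][b], rng)
    adj = {v: set() for v in range(4 * BLOCK)}
    for u, v in map(tuple, edges):
        adj[u].add(v)
        adj[v].add(u)
    return adj

adj = benchmark_323(seed=5)
```

Every node of the generated network has 16 links, and node index 0 (node 1 of the paper) has 8, 3, 2 and 3 links into P1, P2, P3 and P4, respectively.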

The network contains 11 cliques, from which our intersection-union operation yields 9 cores: {3, 4, 20, 29}, {5, 11, 14, 15, 32}, {6, 7, 16, 25}, {19, 24, 26, 28}, {35, 37, 38, 44, 57}, {39, 46, 48, 54}, {67, 86, 87, 91}, {109, 116, 118, 123} and {119, 121, 127, 128}.

For the detailed merging process, please refer to the following link: https://dl.dropboxusercontent.com/u/67654377/Matlab_Code_for_R2.rar

After five rounds of merging isolated nodes into the growing cores, one isolated node, #106, remains. It has three links to each of cores 1, 2, 5 and 7, so we use modularity values to decide which core node #106 should join. Table 6 lists the results, where $M_i$ is the modularity value when node #106 is assigned to core $i$.

Table 6. The computation results for modularity values for node #106.

| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| $M_i$ | 0.024 | 0.035 | 0.008 | 0.016 | 0.020 | 0.013 | 0.021 | 0.016 | 0.009 |

From Table 6, $M_2 = 0.035$ is the largest value, so node #106 is merged into core 2.

Table 7 lists the final communities derived by our improved iterative merging process, expressed relative to the original predesigned communities $P_1 = \{1,2,\dots,32\}$, $P_2 = \{33,34,\dots,64\}$, $P_3 = \{65,66,\dots,96\}$ and $P_4 = \{97,98,\dots,128\}$, where $C_j$ denotes the $j$th final community derived by the method of Zhang et al. [10] with our improvement.

Table 7. The detailed list of the final partition derived from the 9 cores, expressed relative to the original predesigned communities.

| | $P_1$ | $P_2$ | $P_3$ | $P_4$ |
|---|---|---|---|---|
| $C_1$ | 1 3 4 10 12 13 18 20 22 29 30 | 36 | 90 | 99 104 115 |
| $C_2$ | 2 5 9 11 14 15 17 31 32 | 33 40 41 43 49 50 56 58 63 | 65 68 77 79 81 84 88 89 93 95 96 | 97 98 100 103 106 108 110 111 112 113 117 125 |
| $C_3$ | 6 7 16 21 25 | - | - | 112 |
| $C_4$ | 8 19 23 24 26 27 28 | 34 62 64 | - | 102 124 |
| $C_5$ | - | 35 37 38 42 44 47 53 57 60 | 70 76 85 | 120 |
| $C_6$ | - | 39 45 46 48 51 52 54 55 61 | 69 | - |
| $C_7$ | - | - | 66 67 71 73 74 78 80 82 86 87 91 92 | - |
| $C_8$ | - | 59 | 75 83 94 | 101 105 107 109 114 116 118 123 126 |
| $C_9$ | - | - | 72 | 110 119 121 122 127 128 |

Based on Table 7, we count the number of nodes shared by each of the 9 derived communities and each of the 4 predesigned communities.

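Table 8. Comparison between the predesigned communities and the 9 derived communities (entries $|P_j \cap C_i|$).

| | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $C_6$ | $C_7$ | $C_8$ | $C_9$ |
|---|---|---|---|---|---|---|---|---|---|
| $P_1$ | 11 | 9 | 5 | 7 | 0 | 0 | 0 | 0 | 0 |
| $P_2$ | 1 | 9 | 0 | 3 | 9 | 9 | 0 | 1 | 0 |
| $P_3$ | 1 | 11 | 0 | 0 | 3 | 1 | 12 | 3 | 1 |
| $P_4$ | 3 | 12 | 1 | 2 | 1 | 0 | 0 | 9 | 6 |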

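We use the following formula to estimate the performance of a partition:

$$\max \sum_{j=1}^{4} \left| P_j \cap C_{i_j} \right|, \qquad (48)$$

where $i_1, \dots, i_4 \in \{1,2,\dots,9\}$ are pairwise distinct, that is, if $\alpha \neq \beta$ then $i_\alpha \neq i_\beta$, so that no derived community is matched with two predesigned communities.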

From Table 8, the predesigned communities $P_1$, $P_2$, $P_3$ and $P_4$ correspond to $C_1$, $C_6$, $C_7$ and $C_2$, respectively, so that

$$\max \sum_{j=1}^{4} \left| P_j \cap C_{i_j} \right| = |P_1 \cap C_1| + |P_2 \cap C_6| + |P_3 \cap C_7| + |P_4 \cap C_2| = 11 + 9 + 12 + 12 = 44. \qquad (49)$$
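The maximization in Equation (48) is an assignment problem, and for a 4 × 9 table it can be solved by brute force over ordered choices of four distinct communities. The sketch below copies Table 8 as printed; the function name is hypothetical. For larger tables the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` with `maximize=True`, is the usual choice.

```python
from itertools import permutations

# Table 8: rows P1..P4, columns C1..C9, entries |P_j ∩ C_i| as printed.
TABLE_8 = [
    [11, 9, 5, 7, 0, 0, 0, 0, 0],
    [1, 9, 0, 3, 9, 9, 0, 1, 0],
    [1, 11, 0, 0, 3, 1, 12, 3, 1],
    [3, 12, 1, 2, 1, 0, 0, 9, 6],
]

def best_assignment(overlap):
    """Maximize sum_j overlap[j][i_j] over pairwise-distinct column indices i_j."""
    n_rows, n_cols = len(overlap), len(overlap[0])
    best_score, best_choice = -1, None
    for choice in permutations(range(n_cols), n_rows):
        total = sum(overlap[j][i] for j, i in enumerate(choice))
        if total > best_score:          # keep the first optimum encountered
            best_score, best_choice = total, choice
    return best_score, best_choice

score, choice = best_assignment(TABLE_8)
accuracy = score / 128                  # 128 nodes in the benchmark network
```

The search returns 44, matching Equation (49), and an accuracy of 44/128 ≈ 34.4%. Because $P_2$ shares 9 nodes with each of $C_2$, $C_5$ and $C_6$, the optimum is not unique: the brute force reports the first one it meets, $C_5$ (index 4), where Equation (49) lists $C_6$, and both give the same total.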

We now turn to the partition method of Newman [11] and list its result in Table 9, where $R_j$, for $j = 1,2,3,4$, denote the communities found by Newman [11].

Table 9. The detailed list of the final partition derived by Newman [11], expressed relative to the original predesigned communities.

R1

R3

R2

65, 66,…,96

P1

97, 98,…,128

P2 P3

R4

1 2 3 4 8 10 12 13 18 19

33 34 36 38 39 40

20 22 27 29 30

45 46 48 51 54 55 61 63 64

P4

5 6 7 9 11 14 15 16 17 21

35 37 41 42 43 44

23 24 25 26 28 31 32

47 49 50 52 53 56 57 58 59 60 62

Based on Table 9, we count the nodes shared by (a) the 4 communities found by Newman [11] and (b) the four predesigned communities proposed by Girvan and Newman [3], and list the counts in Table 10.

Table 10. Comparison between the predesigned communities and the 4 communities derived by Newman [11].

R1

R3

R2

R4

P1

0

0

32

0

P2

0

0

0

32

P3

15

15

0

0


P4

17

0

17

0

We compute

$$\max \sum_{j=1}^{4} \left| P_j \cap R_{i_j} \right|, \qquad (50)$$

where $i_1, \dots, i_4 \in \{1,2,3,4\}$ are pairwise distinct (if $\alpha \neq \beta$ then $i_\alpha \neq i_\beta$), and derive

$$\max \sum_{j=1}^{4} \left| P_j \cap R_{i_j} \right| = |P_1 \cap R_3| + |P_2 \cap R_4| + |P_3 \cap R_1| + |P_4 \cap R_2| = 32 + 32 + 15 + 17 = 96. \qquad (51)$$

Thus, on the benchmark of Girvan and Newman [3], the two partition methods achieve the performance

$$\frac{44}{128} \approx 34.4\% < \frac{96}{128} = 75\%, \qquad (52)$$

which indicates that the Group Action on Sets (GAS) partition method performs far worse than the eigenvector approach of Newman [11].

The fundamental problem of the GAS approach is that a predesigned community may contain more than one core. In the above example, the numbers of cores within $P_1$, $P_2$, $P_3$ and $P_4$ are 4, 2, 1 and 2, respectively.

We conclude that the Group Action on Sets (GAS) partition method proposed by Zhang et al. [10], even after our two revisions, (i) the intersection-union operation replacing the group generated by cliques and (ii) the iterative merging process reducing the burden of computing modularity values, still cannot be treated as a robust partition method, because it produces too many cores in the initial stage. We therefore advise practitioners to think twice before applying the GAS approach in their research.

10. Direction for future research

So far, we have developed a computer program for the GAS algorithm that incorporates our two improvements, the intersection-union operation and the iterative merging process, and run it 1000 times on the benchmark of Girvan and Newman [3]. The average accuracy ratio is 42.2%. For the detailed computation program, please refer to the following link: https://www.dropbox.com/s/ur8nuljboultam9/Matlab_Code_for_R3.rar?dl=0 An interesting topic for future research is to compare (a) Zhang et al. [10] with (b) other community detection algorithms in terms of average accuracy ratio and efficiency.

11. Conclusion

The findings of this paper are fivefold. First, we demonstrated that the merging rule of Zhang et al. [10] can lead to several different partitions of a network into communities. Second, using the karate club example studied by Girvan and Newman [3], Donetti and Muñoz [6] and Zhang et al. [10], we pointed out that the partitions of the karate club reported in these three papers do not attain the maximum modularity value. Third, we developed a new method to replace the complicated GAS proposed by Zhang et al. [10]. Fourth, we provided a patch for the merging algorithm of Zhang et al. [10] that reduces the number of modularity values to compute for partitions with multiple possible results. Lastly, using the benchmark of Girvan and Newman [3], we showed that GAS (and our simplified version, the intersection-union operation) proposed by Zhang et al. [10] offers only a 0.62% success rate in dividing the network into the four predesigned communities. Zhang et al. [10] reported performance on networks with 5000 nodes. However, their GAS approach leaves two very difficult problems, (a) how to generate the group from the cliques and (b) how to handle the undecided nodes, so their results should be treated as questionable. Our intersection-union operation avoids generating the group from the cliques, and our iterative merging process uses modularity values to resolve the partition of the undecided nodes. Finally, we pointed out that the GAS method of Zhang et al. [10] produces too many cores in the initial stage, so we suggest that researchers think carefully before applying it.

Acknowledgements The authors gratefully acknowledge financial support from the Ministry of Science and Technology under grant numbers 104-2410-H-015-007 and 105-2410-H-015-006.

References

[1] M.E.J. Newman, Eur. Phys. J. B 38 (2004) 321-330.
[2] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, D. Parisi, Proc. Natl. Acad. Sci. USA 101 (2004) 2658.
[3] M. Girvan, M.E.J. Newman, Proc. Natl. Acad. Sci. USA 99 (2002) 7821.
[4] J.B. Risatti, W.C. Capman, D.A. Stahl, Proc. Natl. Acad. Sci. USA 91 (1994) 10173.
[5] G. Palla, I. Derényi, I. Farkas, T. Vicsek, Nature (London) 435 (2005) 814.
[6] L. Donetti, M.A. Muñoz, J. Stat. Mech. (2004) P10012.
[7] J.P. Bagrow, E.M. Bollt, Phys. Rev. E 72 (2005) 046108.
[8] A. Lancichinetti, S. Fortunato, J. Kertész, New J. Phys. 11 (2009) 033015.
[9] M.E.J. Newman, Soc. Networks 27 (2005) 39.
[10] Z. Zhang, X. Jiang, L. Ma, S. Tang, Z. Zheng, Physica A 390 (2011) 1171.
[11] M.E.J. Newman, Proc. Natl. Acad. Sci. USA 103 (2006) 8577.
[12] S. Fortunato, M. Barthelemy, Proc. Natl. Acad. Sci. USA 104 (2007) 36.
[13] J.M. Kumpula, J. Saramäki, K. Kaski, J. Kertész, Eur. Phys. J. B 56 (2007) 41.
[14] B.H. Good, Y.-A. de Montjoye, A. Clauset, arXiv:0910.0165 (2010).

Appendix 1. The detailed list of 144 elements for the group generated by a network with 8 nodes

We consider Figure 2 of Zhang et al. [10], a network of 8 nodes with 12 links; see Figure 5 for the links between nodes. The set of cliques, written as 3-cycles, is $X = \{(1\,2\,4), (2\,3\,4), (5\,6\,8), (5\,7\,8)\}$. Zhang et al. [10] only denote $G_3 = \langle X \rangle$, the group generated by $X$, without listing the elements of $G_3$. Composing permutations from right to left, we find that

$$(1\,2\,4)(1\,2\,4) = (1\,4\,2), \qquad (1\,2\,4)(1\,2\,4)(1\,2\,4) = I,$$

where $I$ is the identity permutation, and

$$(1\,2\,4)(2\,3\,4) = (2\,3\,1), \qquad (1\,3\,2)(1\,3\,4) = (1\,2)(3\,4).$$

Based on these observations, $G_3 = \{xy : x \in G_A,\ y \in G_B\}$, where

$G_A = \{I, (1\,2\,4), (1\,4\,2), (2\,3\,4), (2\,4\,3), (1\,2\,3), (1\,3\,2), (1\,3\,4), (1\,4\,3), (1\,2)(3\,4), (1\,3)(2\,4), (1\,4)(2\,3)\}$

and

$G_B = \{I, (5\,6\,8), (5\,8\,6), (6\,8\,7), (6\,7\,8), (5\,7\,6), (5\,6\,7), (5\,7\,8), (5\,8\,7), (5\,6)(7\,8), (5\,7)(6\,8), (5\,8)(6\,7)\}$,

so that $G_3$ contains $12 \times 12 = 144$ elements.

Next, we use Burnside's theorem (Theorem 1 of Zhang et al. [10]) to check our findings by evaluating the fixed points of the elements of $G_3$. The identity $I$ fixes every node, $\{1,2,\dots,8\} = V$, so $\chi(I) = 8$. The fixed points of $(1\,2\,4)$ are $\{3,5,6,7,8\}$; those of $(1\,2\,4)(5\,6\,8)$ are $\{3,7\}$; those of $(1\,2)(3\,4)$ are $\{5,6,7,8\}$; those of $(1\,2)(3\,4)(5\,6\,8)$ are $\{7\}$; and $(1\,2)(3\,4)(5\,6)(7\,8)$ has no fixed points. Counting the elements of each type, we compute

$$\sum_{g \in G_3} \chi(g) = 8 + 8 \cdot 5 \cdot 2 + 8 \cdot 8 \cdot 2 + 3 \cdot 4 \cdot 2 + 3 \cdot 8 \cdot 2 + 3 \cdot 3 \cdot 0 = 288,$$

and then derive

$$\frac{1}{|G_3|} \sum_{g \in G_3} \chi(g) = \frac{288}{144} = 2,$$

which indicates that there are two different orbits under GAS. This computation is consistent with Burnside's theorem, cited as Theorem 1 of Zhang et al. [10].
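The appendix computation can be checked mechanically. The sketch below represents a permutation of $\{1,\dots,8\}$ as a tuple of images, composes right to left (the convention under which the products above hold), closes the four 3-cycles of $X$ under composition, and applies Burnside's count from Equation (47). Function names are hypothetical; `perm_from_cycles` assumes the cycles it receives are disjoint.

```python
def perm_from_cycles(cycles, n):
    """Permutation of 1..n as a tuple p with p[x - 1] = image of x (disjoint cycles)."""
    p = list(range(1, n + 1))
    for cycle in cycles:
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            p[a - 1] = b
    return tuple(p)

def compose(p, q):
    """Product pq acting right to left: apply q first, then p."""
    return tuple(p[q[x] - 1] for x in range(len(p)))

def generate_group(generators):
    """Closure of the generators under composition; finite, hence a group."""
    identity = tuple(range(1, len(generators[0]) + 1))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def burnside(group):
    """Return (sum of fixed points over the group, number of orbits)."""
    total = sum(sum(1 for x, image in enumerate(g, start=1) if image == x) for g in group)
    return total, total // len(group)

X = [(1, 2, 4), (2, 3, 4), (5, 6, 8), (5, 7, 8)]
G3 = generate_group([perm_from_cycles([c], 8) for c in X])
```

The closure has 144 elements and `burnside(G3)` returns `(288, 2)`, reproducing the sum and the two orbits obtained above.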

Figure 1. The network for karate club reproduced from Figure 5 of Zhang et al. [10].

Figure 2. The cores of Θ(1) and Θ(9) in Step 5 of our algorithm.

Figure 3. The cores of Θ(1) and Θ(9) after the first merging.

Figure 4. The cores of Θ(1) and Θ(9) after the second merging.

Figure 5. Reproduction of Figure 2 of Zhang et al. [10].

Highlights

We use modularity values to complete the incomplete algorithm of Zhang et al. (2011).
Our revised algorithm reduces the number of modularity values that must be computed.
We provide our intersection-union operation to replace the GAS algorithm.
The GAS algorithm of Zhang et al. (2011) produces too many cores in the initial stage and hence too many communities.
We point out that the Burnside Theorem, cited in Zhang et al. (2011), is of no use in deciding the number of communities.