Decentralized mining social network communities with agents

Mathematical and Computer Modelling 57 (2013) 2998–3008 Contents lists available at SciVerse ScienceDirect Mathematical and Computer Modelling journ...


Mathematical and Computer Modelling 57 (2013) 2998–3008

Contents lists available at SciVerse ScienceDirect

Mathematical and Computer Modelling journal homepage: www.elsevier.com/locate/mcm

Decentralized mining social network communities with agents

Jing Huang a,b, Bo Yang a,b,∗, Di Jin c, Yi Yang a,b

a College of Computer Science and Technology, Jilin University, ChangChun 130012, China
b Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, ChangChun 130012, China
c College of Computer Science and Technology, Tianjin University, China

Article info

Article history: Received 9 December 2009. Received in revised form 17 February 2013. Accepted 8 March 2013.

Keywords: Social network, Community mining, Multi-agent system, Decentralized algorithm

Abstract

Network community mining algorithms aim at efficiently and effectively discovering all communities in a given network. Many related methods have been proposed and applied to different areas, including social network analysis, gene network analysis and web clustering engines. Most of the existing methods for mining communities are centralized. In this paper, we present a multi-agent based decentralized algorithm, in which a group of autonomous agents work together to mine a network through a proposed self-aggregation and self-organization mechanism. Thanks to its decentralized nature, our method is potentially suitable for dealing with distributed networks, whose global structures are hard to obtain due to their geographical distribution, decentralized control or huge size. The effectiveness of our method has been tested against different benchmark networks.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Social networks are among the most important complex networks; they describe the interactive relationships among a group of active actors. Many systems in the real world, such as human societies and different types of natural ecosystems, can be modeled as social networks. For example, in the karate club network shown in Fig. 1, each node denotes a club member and each link denotes the friendship between two individuals. Different types of social networks demonstrate distinct functions. However, scientists have discovered that most existing social networks with different structures and different functions share some common statistical properties, such as the small-world effect [1,2], scale-free regularity [3], network transitivity [1], network motifs [4] and network modularity [5–7]. In particular, one can understand how a complex social network comes into being hierarchically from simple, basic building blocks such as network motifs and network communities.

Research has shown that many social networks are made up of multiple communities, which are defined as groups within which the social interactions among actors are very intensive, but between which they are very weak [5–9]. In the network shown in Fig. 1, we can observe two obvious network communities, i.e., cluster A and cluster B. The intra-community links are much denser than the inter-community ones. In fact, these two communities correspond exactly to the two real groups that formed after the karate club finally split into two parts for social reasons. Community A was led by the administrator of the club (denoted as node 1) and community B was led by the coach of the club (denoted

∗ Corresponding author at: College of Computer Science and Technology, Jilin University, ChangChun 130012, China. Tel.: +86 13069008969; fax: +86 431 85166063. E-mail addresses: [email protected], [email protected] (B. Yang). 0895-7177/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.mcm.2013.03.005

J. Huang et al. / Mathematical and Computer Modelling 57 (2013) 2998–3008


Fig. 1. The karate club network constructed by Wayne Zachary in the 1970s [15].

as node 33). As this simple example shows, being able to discover network communities helps people understand social networks in depth and thus uncover useful patterns hidden in huge or complex networks. Besides social network analysis, many problems can be transformed into network community mining problems once their problem domains are modeled as networks, such as gene network clustering [7,10], web page clustering [11,12], image segmentation [13] and sensor network clustering [14].

The task of mining network communities can be formalized as a graph partition problem, which has been proven to be NP-complete [13,16]. In the literature, much effort has gone into finding approximate algorithms that address this hard task efficiently and effectively. Most of them fall into four main categories: (1) graph theoretic methods, such as spectral methods [13,17,18], the maximum flow communities (MFC) algorithm [11] and the hyperlink-induced topic search (HITS) algorithm [12]; (2) hierarchical methods, such as the Girvan–Newman algorithm [5,6] and its improvements [19,20]; (3) greedy optimization methods, such as the Kernighan–Lin algorithm [21], the fast Newman algorithm [22] and a simulated annealing based algorithm [8]; and (4) random walk based methods, such as the FEC algorithm [23], the NCMA algorithm [24] and others [25].

However, most existing approaches to mining network communities are centralized. In order to find all communities of a given network, all the methods mentioned above have to take the whole network structure as their input. The assumption that a global view of the network can be obtained beforehand does not always hold, because of geographical distribution, decentralized control or huge size. For instance, it is not practical to obtain the complete structure of the WWW in order to find its community structure.
Also, in ad hoc networks such as P2P networks or sensor networks, it is not reasonable to detect communities by forcing one of the nodes to run a centralized method, because this inevitably breaks the peer-to-peer nature of the network.

Multi-agent systems aim to provide a general computing paradigm for solving hard computational problems as well as for characterizing complex systems, based on concepts such as autonomy, self-aggregation, self-organization and emergent behavior. Due to their inherently decentralized nature, multi-agent systems are especially suitable for solving large-scale and distributed problems such as distributed constraint satisfaction problems [26]. Recently, Yang and Liu proposed an AOC (autonomy oriented computing) based approach for solving the D-NCMP (distributed network community mining problem) [27,28]. Briefly, the D-NCMP is concerned with finding natural clusters in a distributed and dynamically evolving network. In their approach, each network node (or a group of nodes managed by one agent) is modeled as an active agent, which is responsible for gathering and processing local information through local interactions with others. In order to synchronize the collaboration among agents, they adopted a distributed clock mechanism, in which each agent maintains its own clock in a decentralized way. Through a positive feedback mechanism, agents can self-organize into the desired network communities.

In contrast to the above idea, in this paper we propose a novel decentralized algorithm for detecting network communities. In the proposed method, a group of mobile agents is generated and dispatched into the network, and the agents work together through a self-aggregation mechanism rather than direct P2P communication. As a result, agents in our method can run concurrently and asynchronously without any synchronization constraints.
The performance of this method has been validated and compared with Yang’s method on different benchmark networks. The remainder of the paper is organized as follows: Section 2 gives a problem definition and the basic ideas behind our method. Section 3 presents a multi-agent based decentralized approach to mining communities of social networks. In Section 4, we examine the performance of the proposed method by applying it to several benchmark datasets. Finally, Section 5 concludes the paper and discusses future work.

2. Problem definition and basic idea

A network can be modeled as a graph G = (V, E), where V is the set of nodes and E is the set of links. We assume that the networks considered in this paper are binary (unweighted) and undirected graphs. A k-way partition of graph G is defined as P = (C1, ..., Ck), where the components C1, ..., Ck satisfy $\bigcup_{1 \le i \le k} C_i = V$ and $C_i \cap C_j = \emptyset$ for all $i \ne j$. P is said to be a community structure if the number of edges within components is much greater than that between components. In order to quantify the meaning of "much greater", different methods adopt different evaluation functions, such as the various cuts used by spectral methods [13,17,18] or the network modularity proposed by Newman [6] and used by many methods [8,22]. Most of the existing evaluation functions, however, require global information about the network as input, and thus cannot be computed efficiently in a decentralized way. Therefore, we present a new criterion to describe how good a k-way partition P is in terms of network community structure.

Table 1
Centralized-P* algorithm.

Algorithm P = Centralized-P*(A, k)
/* A is the adjacency matrix of the network, and k is the desired community number */
Step 1. // initialize P randomly
    For each node i, P(i) is assigned a random number between 1 and k;
Step 2. /* initialize local evaluation value for each node */
    For each node i, calculate f(i) according to Eq. (2.3);
Step 3. // locally search an approximate P*
    for cycle = 1 : upbound    // upbound is a predefined constant
        Step 3.1 select a node i with maximum f(i) with a probability r1, or randomly select a node i with a probability 1 − r1;
        Step 3.2 update P(i) to minimize f(i) with a probability r2, or update P(i) to decrease f(i) with a probability 1 − r2;
        Step 3.3 for each node i, recalculate f(i) according to Eq. (2.3);
    end

Let A be the adjacency matrix of G containing n nodes, and P be a k-way partition of G. The evaluation function in terms of P is defined as:



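\[
F(P) = \sum_{1 \le i,j \le n} (1 - A_{ij})\, g_{ij} \tag{2.1}
\]

where

\[
g_{ij} =
\begin{cases}
1, & v_i \text{ and } v_j \text{ belong to the same component of } P \\
0, & \text{otherwise.}
\end{cases}
\]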

For a component of the network, its F-value reflects the number of edges that would have to be added to turn it into a clique. So, if a component is a community in which links are very dense, its F-value should be very small; otherwise, it will be very large. For example, in Fig. 1a, if we combine communities B and C into one community, its F-value will be much greater than the sum of the two original F-values corresponding to communities B and C. The partition that minimizes F is the best network community structure. The network community mining problem can now be transformed into the following optimization problem:

\[
P^{*} = \arg\min_{P} F(P) \tag{2.2}
\]

where P* is the desired network community structure. Based on the local search technique, we can design a centralized algorithm to find an approximate P*. To describe it clearly, we first introduce some necessary notation. A k-way partition P can be denoted as an n-dimensional vector (p1, ..., pn), where the i-th component pi denotes the community to which node i is assigned. For vertex i, its local evaluation function in terms of a given P is defined as:



\[
f(i) = \sum_{1 \le j \le n} (1 - A_{ij})\, g_{ij} \tag{2.3}
\]

where

\[
g_{ij} =
\begin{cases}
1, & p_i = p_j \\
0, & \text{otherwise.}
\end{cases}
\]
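As an illustration of Eqs. (2.1) and (2.3), the following minimal Python sketch evaluates F(P) and f(i) literally as written, summing over all ordered pairs (i, j) including i = j. The function names and the six-node toy graph are assumptions made for this example and do not come from the paper; NumPy is assumed to be available.

```python
import numpy as np

def same_community(p):
    """g[i, j] = 1 if nodes i and j carry the same label in p, else 0."""
    p = np.asarray(p)
    return (p[:, None] == p[None, :]).astype(int)

def local_f(A, p):
    """Local evaluation f(i) of Eq. (2.3) for every node i."""
    A = np.asarray(A)
    return ((1 - A) * same_community(p)).sum(axis=1)

def global_F(A, p):
    """Global evaluation F(P) of Eq. (2.1), i.e. the sum of all f(i)."""
    return int(local_f(A, p).sum())

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

split = [0, 0, 0, 1, 1, 1]    # one label per triangle
merged = [0, 0, 0, 0, 0, 0]   # everything in a single community

print(global_F(A, split), global_F(A, merged))
```

Under this literal reading, every node contributes 1 through its own diagonal term (assuming A has a zero diagonal), so F equals n plus twice the number of missing intra-community edges. The constant n does not change which partition minimizes F.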

Now we can give the main steps for finding an approximately optimal solution of Eq. (2.2) (see Table 1). Step 3 is the main part of the algorithm; it iteratively runs a local search process to find an optimum of the function F. This local search process is implemented by an MCMC (Markov Chain Monte Carlo) method with a Gibbs sampling strategy. According to the theory of MCMC, the sampling process is theoretically guaranteed to converge to the stationary distribution of the Markov chain, which is exactly the solution of the evaluation function F. In practice, however, one usually sets an upper bound on the number of iterations in order to get an approximately optimal solution within a reasonable time.

As an example, let us see how to cluster the network shown in Fig. 2 using the Centralized-P* method. This simple network describes the communication relationships between 32 mobile agents running in an ad hoc network. Seven communities, denoted by A–G, are formed based on their communication relationships. Within each community, communication between agents is frequent, whereas between communities it is rare. In order to improve the efficiency of the entire network, agents belonging to the same community should autonomously move to the same computer. By calling Centralized-P*(A, 7), we obtained an approximately optimal 7-way partition in terms of Eq. (2.1) after 400 cycles, as visualized in Fig. 2d, which is the desired community structure. Fig. 2b and c show that both the global evaluation function and the local evaluation function gradually decrease during the local search process of the Centralized-P*.
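The steps of Table 1 can be sketched in Python as follows. This is only one possible reading of the table, not the authors' implementation: labels run from 0 to k − 1 instead of 1 to k, "decrease f(i)" is interpreted as choosing uniformly among the labels that strictly lower f(i), and the function name, the `seed` argument, the returned history of F-values and the toy graph are all assumptions introduced for illustration.

```python
import numpy as np

def centralized_p_star(A, k, upbound=400, r1=0.8, r2=0.8, seed=0):
    """Sketch of Table 1; returns labels in 0..k-1 and the history of F-values."""
    rng = np.random.default_rng(seed)
    B = 1 - np.asarray(A)                      # the terms (1 - A_ij), diagonal included
    n = B.shape[0]
    p = rng.integers(0, k, size=n)             # Step 1: random initial labels

    def all_f():
        return (B * (p[:, None] == p[None, :])).sum(axis=1)

    f = all_f()                                # Step 2
    history = [int(f.sum())]
    for _ in range(upbound):                   # Step 3
        # Step 3.1: the worst node with probability r1, otherwise a random node
        i = int(np.argmax(f)) if rng.random() < r1 else int(rng.integers(n))
        # Value of f(i) that each candidate label c would give node i
        others = np.arange(n) != i
        cost = np.array([B[i, others & (p == c)].sum() for c in range(k)]) + B[i, i]
        # Step 3.2: the best label with probability r2, otherwise any improving label
        if rng.random() < r2:
            p[i] = int(np.argmin(cost))
        else:
            better = np.flatnonzero(cost < cost[p[i]])
            if better.size:
                p[i] = int(rng.choice(better))
        f = all_f()                            # Step 3.3
        history.append(int(f.sum()))
    return p, history

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6), dtype=int)
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[a, b] = A[b, a] = 1

labels, history = centralized_p_star(A, k=2, upbound=50)
print(labels, history[0], history[-1])
```

Because a node is only ever moved to a label that does not raise its own f(i), and A is symmetric, the recorded F-values never increase in this sketch. That matches the decreasing trend reported for Fig. 2b, although it is a simplification of the sampling scheme described in the text.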


Fig. 2. An example of running the Centralized-P*. (a) A simple network containing 7 clusters; (b) the global evaluation function value of the whole network; (c) the local evaluation function value of node 11; (d) the P* obtained after 400 cycles. In this experiment, we set r1 = 0.8 and r2 = 0.8.

3. A multi-agent based decentralized algorithm

A basic multi-agent system is composed of three main components: an environment, a predefined system objective and a group of autonomous agents [29]. We now introduce the concept of the "complementary graph" to model the environment of the agents. Let $\bar{A} = \mathbf{1} - A$, where $\mathbf{1}$ denotes the matrix in which all entries are equal to one and A is the adjacency matrix of graph G = (V, E). Let $\bar{G} = (V, \bar{E})$ be the graph defined by $\bar{A}$; $\bar{G}$ is the "complementary graph" of G. In terms of $\bar{G}$, Eq. (2.1) can be rewritten as:



\[
F(P) = \sum_{1 \le i,j \le n} \bar{A}_{ij}\, g_{ij}. \tag{3.1}
\]

Note that Eq. (3.1) is the same as the evaluation function of a graph coloring problem. So, clustering a network amounts to coloring its complementary counterpart so that adjacent nodes in the complementary network are assigned distinct colors as far as possible. In the colored complementary graph, the nodes with the same color are clustered together. In our proposed multi-agent system, the environment is the complementary network $\bar{G} = (V, \bar{E})$ of the network G = (V, E) to be clustered. Each node in this environment is characterized by a 3-tuple ⟨i, pi, f(i)⟩, where i denotes the node identifier, pi denotes the color assigned to this node, and f(i) is the local evaluation function defined as:



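\[
f(i) = \sum_{1 \le j \le n} (1 - A_{ij})\, g_{ij} = \sum_{\langle i,j \rangle \in \bar{E}} g_{ij} \tag{3.2}
\]

where

\[
g_{ij} =
\begin{cases}
1, & p_i = p_j \\
0, & \text{otherwise.}
\end{cases}
\]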

So, a localized version of Eq. (2.1) can be rewritten as:

\[
F(P) = \sum_{1 \le i \le n} f(i). \tag{3.3}
\]
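To make the coloring view concrete, the short Python sketch below builds the complementary adjacency matrix and counts, for a given coloring, how many same-colored neighbors each node has in the complementary graph, as in Eqs. (3.2) and (3.3). The helper names and the four-node example are hypothetical; NumPy is assumed.

```python
import numpy as np

def complement_adjacency(A):
    """Matrix form of the complementary graph: the all-ones matrix minus A."""
    A = np.asarray(A)
    return np.ones_like(A) - A

def coloring_conflicts(A_bar, colors):
    """Return (f, F): per-node conflicts as in Eq. (3.2) and their sum, Eq. (3.3)."""
    colors = np.asarray(colors)
    same = colors[:, None] == colors[None, :]
    f = (np.asarray(A_bar) * same).sum(axis=1)
    return f, int(f.sum())

# Hypothetical 4-node network with two disjoint edges: 0-1 and 2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
A_bar = complement_adjacency(A)

f_good, F_good = coloring_conflicts(A_bar, [0, 0, 1, 1])  # colors follow the edges of G
f_bad, F_bad = coloring_conflicts(A_bar, [0, 1, 0, 1])    # colors cut both edges of G
print(F_good, F_bad)
```

Since A has a zero diagonal, the matrix 1 − A has ones on its diagonal, so each node counts itself once. This adds the same constant to every coloring and leaves the ranking of colorings unchanged.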


Table 2
Decentralized-P* algorithm.

Algorithm 3.1 P = Decentralized-P*($\bar{G}$, k)
/* $\bar{G}$ is the complementary network of G to be clustered, and k is the desired number of communities */
Step 1. for each node i, pi is assigned a random number between 1 and k;
Step 2. generate m agents, and randomly dispatch them onto network nodes;
Step 3. activate all agents, and let each of them repeat the following actions:
    Step 3.1 get the identifier of the current node, say, i;
    Step 3.2 update pi to minimize f(i) with a probability r2, or update pi to decrease f(i) with a probability 1 − r2;
    Step 3.3 for each neighbor j of node i, update f(j) according to the new pi;
    Step 3.4 select a neighbor j of node i with maximum f(j) with a probability r1, or randomly select a neighbor j of node i with a probability 1 − r1;
    Step 3.5 die if hops is more than a predefined constant, otherwise move to node j and increment hops.
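The agent behavior in Table 2 can be sketched as a sequential simulation in Python. This is a hedged illustration rather than the authors' code: real agents would run concurrently, whereas here the live agents simply take turns; self-loops of the complementary graph are ignored; labels run from 0 to k − 1; f(j) is recomputed on demand instead of being stored on the nodes; and the function name, the `seed` argument, the per-sweep history and the toy graph are assumptions.

```python
import numpy as np

def decentralized_p_star(A_bar, k, m, max_hops=300, r1=0.6, r2=0.8, seed=0):
    """Sequential simulation of Table 2; returns labels and total conflicts per sweep."""
    rng = np.random.default_rng(seed)
    A_bar = np.array(A_bar)                    # copy, so the caller's matrix is untouched
    np.fill_diagonal(A_bar, 0)                 # assumption: ignore self-loops of the complement
    n = A_bar.shape[0]
    nbrs = [np.flatnonzero(A_bar[i]) for i in range(n)]
    p = rng.integers(0, k, size=n)             # Step 1: random initial colors

    def f(i):                                  # conflicts of node i with its neighbors
        return int((p[nbrs[i]] == p[i]).sum())

    def total():
        return sum(f(i) for i in range(n))

    agents = [(int(rng.integers(n)), 0) for _ in range(m)]   # Step 2: (node, hops)
    history = [total()]
    while agents:                              # Step 3: each sweep lets every live agent act once
        survivors = []
        for i, hops in agents:                 # Step 3.1: i is the agent's current node
            cost = np.bincount(p[nbrs[i]], minlength=k)      # conflicts per candidate color
            if rng.random() < r2:              # Step 3.2
                p[i] = int(np.argmin(cost))
            else:
                better = np.flatnonzero(cost < cost[p[i]])
                if better.size:
                    p[i] = int(rng.choice(better))
            # Step 3.3 is implicit here: f(j) is recomputed from p whenever it is needed
            if hops >= max_hops or nbrs[i].size == 0:
                continue                       # Step 3.5: the agent dies
            if rng.random() < r1:              # Step 3.4: most conflicted neighbor
                j = int(max(nbrs[i], key=f))
            else:                              # Step 3.4: random neighbor
                j = int(rng.choice(nbrs[i]))
            survivors.append((j, hops + 1))    # Step 3.5: move and count the hop
        agents = survivors
        history.append(total())
    return p, history

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6), dtype=int)
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[a, b] = A[b, a] = 1

labels, history = decentralized_p_star(1 - A, k=2, m=2, max_hops=50)
print(labels, history[0], history[-1])
```

An agent only reads and writes the color of its current node and the colors of that node's neighbors in the complementary graph, which is what makes the scheme local. The global history is kept here purely for inspection and would not exist in a truly distributed deployment.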

As discussed earlier, a color-assignment scheme for all nodes of the complementary network $\bar{G}$ is actually a k-way partition of the original network G. So, the objective of this multi-agent system is to find an optimal color-assignment scheme for $\bar{G}$, that is, to calculate a vector P that minimizes the evaluation function defined by Eq. (3.3).

Each agent in this multi-agent model is a mobile agent, which can freely move from one node to another along the links between them. When an agent arrives at a node, it takes actions to update the color of the node. Then it selects its next stop and moves there. Each agent maintains a variable, hops, which records the total number of moves it has made so far. An agent dies once its total hops reach a predefined constant. We can now give a multi-agent based decentralized algorithm as a substitute for the Centralized-P* (see Table 2). The decentralized algorithm stops when all agents have become inactive.

For each agent, steps 3.2, 3.3 and 3.4 are the most time-consuming steps, taking $O(kd)$, $O(kd^2)$ and $O(kd)$ time respectively, where d is the average degree of the complementary network $\bar{G}$. So each agent runs for $O(\text{hops} \cdot kd^2)$ time during its whole life. Note that all agents run concurrently, so the total time required by the Decentralized-P* is $O(\max_m \{\text{hops} \cdot kd^2\})$. In the worst case, we have d = O(n), where n is the number of nodes, and the time complexity of the Decentralized-P* is $O(\text{hops} \cdot kn^2)$.

4. Evaluation of the algorithm

In this section, we evaluate the effectiveness of the Decentralized-P* algorithm on some benchmark networks widely used in the literature. We also demonstrate the self-aggregation and self-organization behaviors of the Decentralized-P* algorithm through an image segmentation example. Finally, we discuss some practical issues such as parameter setting.
Unless otherwise stated, for the experiments presented in this section we set m = n/3, r1 = 0.6, r2 = 0.8 and the maximum hops to 300.

4.1. Self-aggregation and self-organization of the Decentralized-P* algorithm

Here we offer a simple example, shown in Fig. 3, to visualize the dynamic search process of the Decentralized-P* and to observe the self-aggregation and self-organization behaviors of agents that emerge during this process. We used the well-known karate network as the test bed. The figure shows six snapshots of the community results obtained by the Decentralized-P* after every 5 hops during its search process. In each subfigure, the shape of each node denotes the actual community structure, and the color of each node expresses the community result obtained by our method at the current iteration. Initially, partitions are very close to random assignments. Gradually, they get better and better. After aggregating enough local effects (about 20 hops), an approximately optimal partition emerges. It is then fine-tuned and finally reaches the best partition, with just one node misassigned.

4.2. The football association network

Fig. 4a shows the adjacency matrix of the football association network [5], which contains 115 nodes and 613 links. Each node denotes a US college football team and each link denotes a match played between two teams. All teams are grouped into 12 conferences. Intra-conference matches are played much more frequently than inter-conference ones, so each conference can be considered a network community. Due to its well-defined community structure, this network is widely used as a benchmark to test the correctness of different clustering algorithms. Fig. 4b shows the transformed adjacency matrix of the clustered network based on the 12-way partition obtained by the Decentralized-P* algorithm. We transform an adjacency matrix by rearranging its rows and columns such that the nodes in the same community are put together.
In this way, we can clearly see the community structure of a network in terms of its adjacency matrix. If a network has a well-defined cluster structure, its transformed matrix should be approximately block diagonal, with each block along the diagonal corresponding to a community. We can see that the matrix in Fig. 4b is highly regular: the non-zero entries outside the diagonal blocks (corresponding to the inter-community links) are much fewer than those within the diagonal blocks (corresponding to the intra-community links).
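The matrix transformation described above is a simultaneous permutation of rows and columns. A minimal Python sketch, assuming NumPy and using a hypothetical helper name and a four-node example, is:

```python
import numpy as np

def reorder_by_community(A, labels):
    """Permute rows and columns of A so that nodes with equal labels are adjacent."""
    A = np.asarray(A)
    order = np.argsort(labels, kind="stable")   # stable: keeps node order within a community
    return A[np.ix_(order, order)], order

# Hypothetical network: edges 0-2 and 1-3, with communities {0, 2} and {1, 3}.
A = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])
transformed, order = reorder_by_community(A, [0, 1, 0, 1])
print(order)
print(transformed)
```

After the permutation, the two edges sit in two 2 × 2 blocks along the diagonal, which is the block-diagonal pattern the text describes for a well-clustered network.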


Fig. 3. A run of our Decentralized-P* on the karate network, illustrating the underlying mechanism of our method. Subfigures (a)–(f) compare the community results obtained by our method with the actual community structure at the 0th, 1st, 5th, 10th, 15th and 20th generations respectively. In each subfigure, the shape of each node denotes the actual community structure, and the color of each node expresses the community result obtained by our method at the current iteration.

Fig. 4. (a) The adjacency matrix of the football association network; (b) the transformed adjacency matrix of the clustered football network obtained by the Decentralized-P ∗ algorithm.

After comparing the obtained 12-way partition with the 12 real conferences, we found that all teams are correctly clustered except for 8 teams that belong to several independent conferences. The reason for this misclassification is that these teams play more inter-conference matches than intra-conference ones.

4.3. The dolphin social network

Fig. 5a shows the adjacency matrix of a dolphin social network, which describes the social relationships of 62 bottlenose dolphins living in Doubtful Sound, New Zealand [30]. These dolphins eventually split into two groups after some dolphins went missing. Fig. 5b shows the output adjacency matrix of the Decentralized-P* algorithm, in which the two biggest groups are detected. This partition is almost identical to the actual division observed by Lusseau over seven years of field observation of these dolphins [30]. Only one dolphin, "sn89", is misclassified; it has exactly one link to each of the two groups.

4.4. The distributed iPDA network

The Decentralized-P* algorithm is suitable for clustering a distributed network because it works in a decentralized way. In this section, we test it on a small distributed network, the iPDA network shown in Fig. 6a and b.


Fig. 5. (a) The adjacency matrix of the dolphin network; (b) the transformed adjacency matrix of the clustered dolphin network obtained by the Decentralized-P ∗ algorithm.

Fig. 6. (a) A distributed iPDA network containing 82 nodes and 124 links; (b) the adjacency matrix of the network; (c) the transformed adjacency matrix of the clustered network obtained by the Decentralized-P ∗ algorithm; (d) the community containing one special customer.

In this network, nodes denote iPDAs (intelligent Portable Digital Assistants) carried by individuals, and links denote communication frequencies between individuals. One promising application of clustering such a distributed network is to build a recommender system, in which each iPDA is able to recommend potential business partners to its holder based on the communication records accumulated over a period. For example, once an iPDA finds that its owner, Miller, belongs to a cluster whose members have been in more frequent contact with each other than before, it reports all members of this cluster to Miller. Miller can then check this list of names and keep one or more of them in mind as possible collaborators.


Table 3
Some real-world datasets we used.

Datasets                               n      e        r1    r2    m    k (Louvain [31])
E-mail network URV [33]                1133   5,451    0.8   0.8   n    11
Political blogs [34]                   1490   16,717   0.8   0.8   n    44
Network science collaborations [35]    1589   2,742    0.8   0.8   n    277

Fig. 7. For each network, the F-value obtained by our method as a function of the iteration number. (a) Email network. (b) Polblogs network. (c) Netscience network.

Table 4
Some real-world datasets we used.

Datasets                               n      e      c (ground-truth)
Zachary’s karate club [38]             34     78     2
Dolphin social network [30]            62     160    2
High school friendship network [39]    69     220    6
Political books [40]                   105    441    3
American college football [41]         115    613    12

Fig. 6c shows the transformed adjacency matrix corresponding to a 5-way partition obtained by the Decentralized-P* algorithm. Fig. 6d shows the community containing Miller. Besides his 3 direct neighbors, 11 other persons whom he might not have known before will be recommended to him.

4.5. Convergence analysis of the Decentralized-P* algorithm

Here we take three large networks (email, polblogs and netscience) as examples to analyze the convergence of our method. These networks and the parameters we adopted are listed in Table 3. Note that, as these networks do not have ground truth, the community numbers used here were obtained with the Louvain method [31], which is one of the best algorithms for community detection [32]. Fig. 7 shows how our algorithm’s clustering quality, in terms of the function F, varies with the iteration number. As we can see, even on these large networks containing thousands of nodes and edges, our algorithm attains a good clustering solution within dozens of iterations, and it generally converges within 50 iterations.

4.6. Comparison on real networks

In order to test the performance of our method, we compare it with Yang’s method [28], which is also a decentralized algorithm, on several real-world networks whose community structures are known. These networks are listed in Table 4. We adopt two widely used metrics to evaluate the quality of the community results obtained by each algorithm: the accuracy measure Normalized Mutual Information (NMI) [36] and the quality metric modularity Q [37]. The parameters of our method are set as r1 = 0.8, r2 = 0.8, m = n and k = c, where c denotes the actual community number of the network. The parameters of Yang’s method are set as w1 = 0.4 and w2 = 0.3, following the advice of its authors [28]. Table 5 compares our method with Yang’s method on the real-world networks described in Table 4.
As we can see, in terms of both metrics, the performance of our method is consistently better than or competitive with that of Yang’s method.
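For readers who want to reproduce this kind of comparison, the sketch below computes the two metrics from an adjacency matrix and label vectors. It assumes the standard Newman–Girvan form of modularity and the normalization NMI = 2·I(X;Y) / (H(X) + H(Y)); the paper does not spell out its exact formulas, so both choices, as well as the function names and the toy graph, are assumptions.

```python
import numpy as np

def modularity(A, labels):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * [c_i == c_j] for an undirected graph."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    deg = A.sum(axis=1)
    two_m = deg.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(deg, deg) / two_m) * same).sum() / two_m)

def nmi(x, y):
    """Normalized mutual information 2*I(X;Y) / (H(X) + H(Y)) between two labelings."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)              # contingency table of the two labelings
    joint /= xi.size
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])).sum()
    hx = -(px * np.log(px)).sum()
    hy = -(py * np.log(py)).sum()
    if hx + hy == 0:                           # both labelings put everything in one group
        return 1.0
    return float(2 * mi / (hx + hy))

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6), dtype=int)
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[a, b] = A[b, a] = 1

truth = [0, 0, 0, 1, 1, 1]
print(modularity(A, truth), nmi(truth, [1, 1, 1, 0, 0, 0]))
```

NMI is invariant to how the communities are numbered, which is why the relabeled partition in the example is treated as a perfect match, while modularity needs only the network and one partition and so can be used even without ground truth.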

Table 5
The comparison of each algorithm on several real networks.

              NMI index                  Q-value
Network       Our (%)     Yang’s (%)     Our       Yang’s
Karate        83.72       43.26          0.3718    0.2476
Dolphin       48.55       36.56          0.4015    0.3113
Friendship    47.14       46.04          0.5540    0.4719
Polbooks      51.88       34.29          0.4972    0.1893
Football      91.96       85.09          0.5908    0.4172

Fig. 8. Parameters analysis of r1 in terms of F -value, running time, and hops needed.

Fig. 9. Parameters analysis of r2 in terms of F -value, running time, and hops needed.

4.7. Discussions about parameters

There are four parameters required by the Decentralized-P* algorithm: two probabilities (r1 and r2) for selecting actions, the number of agents (m) and the maximum hops. These parameters work together to regulate the performance of the Decentralized-P* algorithm, namely its speed and the quality of its solutions. In this section, we empirically analyze how the performance of the Decentralized-P* is influenced by these parameters. This discussion should help in setting appropriate parameters for different practical problems.

Figs. 8 and 9 show the experimental results. The network used here is the football association network; we ran this experiment on different networks and obtained quite similar results. Fig. 8 shows the relationship between the parameter r1 and some indexes of our algorithm, namely the F-value, the running time and the hops needed. Fig. 9 shows how the F-value, the running time T and the hops vary with r2. These figures record the mean values of 50 runs. More specifically, we make the following observations.

(1) The quality of solutions, in terms of the F-value defined by Eq. (2.1), is somewhat influenced by r1 and r2. Note that in the case of r1 = 1 (that is, agents always select the nodes with maximum local conflict values), the quality is not good enough, since the corresponding F-value is much greater than its average value. That is why we need the action of randomly selecting a node regardless of its local conflict value. With the aid of this randomness, the local search process of the Decentralized-P* is able to escape from local optima with high probability.

(2) When m is fixed, the speed of the Decentralized-P* in terms of running time T depends on hops, which denotes the minimum hops required by agents in order to find an approximately optimal solution. Recall that the optimal F-value exists and corresponds to the stationary distribution of a Markov chain. The hops represent the steps of the random walk of this chain. The larger hops is set, the closer the obtained approximation is to the stationary distribution. The number of hops thus affects both the running time and the precision of the algorithm: more hops mean more running time but also a better solution. One option is to set larger r1 and r2 (that is, each agent is asked to act more rationally rather than randomly), which forces the F-value to decrease as fast as possible. But rational acts are much more costly than random ones, so these parameters should be set carefully in order to get a good trade-off.

5. Conclusion

In this paper we have presented a multi-agent based decentralized algorithm to cluster social networks. It is a multi-agent based local search approach that aims to optimize a predefined evaluation function through the self-aggregation and self-organization behaviors of autonomous agents. Due to its decentralized nature, our method is suitable for clustering distributed networks. In contrast to the method proposed by Yang and Liu [27,28], all agents in our method can run concurrently and asynchronously without any synchronization mechanism. The effectiveness of our method has been tested against different benchmark networks.

The proposed method has some limitations. For example, instead of being given the same lifetime, agents could be dynamically generated and removed according to their past performance. Another key issue is how to remove k from the basic algorithm. The current method can only find a k-way partition of a network, where k is a predefined constant. In practice, however, the number of clusters hidden in a network is unknown. In our future work, we will focus on the above extensions.

Acknowledgments

The authors would like to express their thanks to the anonymous reviewers for their constructive comments and suggestions.
This work was supported in part by the National Natural Science Foundation of China under grants 60873149, 61133011, and 61170092, the Program for New Century Excellent Talents in University under grant NCET-11-0204, and the Jilin Province Scientific and Technological Developing Plan under grant No. 20100186.

References

[1] D.J. Watts, S.H. Strogatz, Collective dynamics of small-world networks, Nature 393 (1998) 440–442.
[2] S. Milgram, The small world problem, Psychology Today 1 (1) (1967) 60–67.
[3] A.L. Barabási, R. Albert, H. Jeong, G. Bianconi, Power-law distribution of the World Wide Web, Science 287 (5461) (2000) 2115.
[4] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon, Network motifs: simple building blocks of complex networks, Science 298 (2002) 824–827.
[5] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences 99 (2002) 7821–7826.
[6] M.E.J. Newman, M. Girvan, Finding and evaluating community structure in networks, Physical Review E 69 (2004) 026113-1–026113-15.
[7] D.M. Wilkinson, B.A. Huberman, A method for finding communities of related genes, Proceedings of the National Academy of Sciences 101 (Suppl. 1) (2004) 5241–5248.
[8] R. Guimerà, L.A.N. Amaral, Functional cartography of complex metabolic networks, Nature 433 (2005) 895–900.
[9] G. Palla, I. Derenyi, I. Farkas, T. Vicsek, Uncovering the overlapping community structures of complex networks in nature and society, Nature 435 (2005) 814–818.
[10] Z. Wang, J. Zhang, In search of the biological significance of modular structures in protein networks, PLOS Computational Biology 3 (2007) e107.
[11] G.W. Flake, S. Lawrence, C.L. Giles, F.M. Coetzee, Self-organization and identification of web communities, IEEE Computer 35 (2002) 66–71.
[12] J.M. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM 46 (1999) 604–632.
[13] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 888–904.
[14] A. Meka, A.K. Singh, Distributed spatial clustering in sensor networks, LNCS 3896 (2006) 980–1000.
[15] W.W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33 (1977) 452–473.
[16] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, New York, 1979.
[17] A. Pothen, H. Simon, K.P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM Journal on Matrix Analysis and Applications 11 (1990) 430–452.
[18] M. Fiedler, A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory, Czechoslovak Mathematical Journal 25 (1975) 619–637.
[19] J.R. Tyler, D.M. Wilkinson, B.A. Huberman, Email as spectroscopy: automated discovery of community structure within organizations, in: Proceedings of 1st International Conference of Communities and Technologies, Kluwer, Dordrecht, 2003, pp. 81–96.
[20] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, D. Parisi, Defining and identifying communities in networks, Proceedings of the National Academy of Sciences 101 (2004) 2658–2663.
[21] B.W. Kernighan, S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Technical Journal 49 (1970) 291–307.
[22] M.E.J. Newman, Fast algorithm for detecting community structure in networks, Physical Review E 69 (2004) 066133.
[23] B. Yang, W.K. Cheung, J. Liu, Community mining from signed social networks, IEEE Transactions on Knowledge and Data Engineering 19 (10) (2007) 1333–1348.
[24] B. Yang, J. Liu, An efficient probabilistic approach to network community mining, in: The Proceeding of 2007 Joint Rough Set Symposium, JRS'07, Toronto, Canada, May 14–16, 2007, LNAI 4481, pp. 267–275.
[25] P. Pons, M. Latapy, Computing communities in large networks using random walks, LNCS 3733 (2005) 284–293.
[26] J. Liu, H. Jing, Y.Y. Tang, Multi-agent oriented constraint satisfaction, Artificial Intelligence 136 (2002) 101–144.
[27] B. Yang, J. Liu, An autonomy oriented computing (AOC) approach to distributed network community mining, in: The Proceeding of the First International Conference on Self-Adaptive and Self-Organizing Systems, SASO 2007, MIT, Boston, Mass., USA, 2007, pp. 151–160.
[28] B. Yang, J. Liu, An autonomy-oriented computing approach to community mining in distributed and dynamic networks, Journal of Autonomous Agents and Multi-Agent Systems 20 (2) (2010) 123–157.
[29] N.R. Jennings, An agent-based approach for building complex software systems, Communications of the ACM 44 (4) (2001) 35–41.
[30] D. Lusseau, The emergent properties of a dolphin social network, Proceedings of the Royal Society 270 (Suppl. 2) (2003) S186–S188.
[31] V.D. Blondel, J.L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (2008) P10008.
[32] S. Fortunato, Community detection in graphs, Physics Reports 486 (3–5) (2010) 75–174.
[33] R. Guimerà, L. Danon, A. Diaz-Guilera, F. Giralt, A. Arenas, Self-similar community structure in a network of human interactions, Physical Review E 68 (6) (2003) 065103.
[34] L.A. Adamic, N. Glance, The political blogosphere and the 2004 US Election, in: Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem, 2005, pp. 36–43.
[35] M.E.J. Newman, Finding community structure in networks using the eigenvectors of matrices, Physical Review E 74 (3) (2006) 036104.
[36] L. Danon, J. Duch, A. Diaz-Guilera, A. Arenas, Comparing community structure identification, Journal of Statistical Mechanics: Theory and Experiment 2005 (2005) P09008.
[37] M.E.J. Newman, M. Girvan, Finding and evaluating community structure in networks, Physical Review E 69 (2004) 026113.
[38] W.W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33 (4) (1977) 452–473.
[39] J. Xie, S. Kelley, B.K. Szymanski, Overlapping community detection in networks: the state of the art and comparative study, ACM Computing Surveys 45 (4) (2013) 1–37.
[40] M.E.J. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences of the United States of America 103 (23) (2006) 8577–8582.
[41] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences of the United States of America 99 (12) (2002) 7821–7826.
