A cooperative game framework for detecting overlapping communities in social networks

Accepted Manuscript A cooperative game framework for detecting overlapping communities in social networks Annapurna Jonnalagadda, Lakshmanan Kuppusamy

PII: S0378-4371(17)30837-3
DOI: http://dx.doi.org/10.1016/j.physa.2017.08.111
Reference: PHYSA 18557

To appear in: Physica A

Received date: 20 January 2017
Revised date: 17 May 2017

Please cite this article as: A. Jonnalagadda, L. Kuppusamy, A cooperative game framework for detecting overlapping communities in social networks, Physica A (2017), http://dx.doi.org/10.1016/j.physa.2017.08.111

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

A Cooperative Game Framework for Detecting Overlapping Communities in Social Networks

Annapurna Jonnalagadda a,∗, Lakshmanan Kuppusamy a

a School of Computer Science and Engineering, VIT University, Vellore-632 014, India.

Abstract

Community detection in social networks is a challenging and complex task that has received much attention from researchers in multiple domains in recent years. The evolution of communities in social networks is driven by the self-interest of the nodes. An interesting feature of community structure in social networks is the multiple membership of nodes, which results in overlapping communities. Treating the nodes of the social network as self-interested players, the dynamics of community formation can be captured in the form of a game. In this paper, we propose a greedy algorithm, namely Weighted Graph Community Game (WGCG), to model the interactions among the self-interested nodes of the social network. The proposed algorithm employs the Shapley value mechanism to discover the inherent communities of the underlying social network. The experimental evaluation on real-world and synthetic benchmark networks demonstrates that the performance of the proposed algorithm is superior to that of state-of-the-art overlapping community detection algorithms.

Keywords: Coalitional game, Game theory, Shapley value, Overlapping community detection, Weighted graph game

∗ Corresponding author. Email addresses: [email protected] (Annapurna Jonnalagadda), [email protected] (Lakshmanan Kuppusamy)

Preprint submitted to Physica A

September 12, 2017

1. Introduction

Social networks play an important role in promoting communication among people and organizations across the world. Characteristics of social networks such as the small-world phenomenon and the power-law distribution reveal that social networks are not random networks [1, 2]. The evolution of social networks happens in a systematic way due to the affinity between the nodes of the networks. This develops dense connections within some groups of nodes and sparse connections across different groups. These densely connected components are called communities or modules. The nodes of a community share characteristics such as behavior, neighborhood connectivity and interests more closely with each other than with the rest of the nodes of the network [1]. Detecting the communities in real-world networks has substantial applications, such as developing efficient recommender systems [3], critical node identification [4], improving the connectivity in wireless sensor networks [5], identifying the functional modules in protein–protein interaction networks [6] and so on. In social networks, community detection serves to analyze the interplay between the nodes of the network.

In social networks, a node may belong to more than one community, resulting in overlapping communities [1, 6]. The existing community detection algorithms look for specific predefined patterns [7, 8, 9], demand a priori information on the number of clusters [7] or the overlapping membership of nodes [10, 11], or optimize an objective function such as modularity [12] or partition density [13, 14]. The evolution of communities in social networks is driven by the expedience of the individual nodes; that is, the nodes are not forced by any global authority to join, continue in or leave a particular community. This notion of self-interest of individuals inspired game theorists to capture the dynamics of community formation in the form of a game. Treating each node as a selfish agent, the interactions among nodes of the network are modeled as a game [15, 16]. A detailed study of different game theoretic models for community detection in social networks is presented in [17]. The game theoretic algorithms of Chen et al. [16], Patrick et al. [18] and Zhou et al. [19] are able to detect overlapping communities. The framework of Chen et al. [16] is randomized and non-cooperative, and does not consider the cumulative interests of the nodes during community formation. According to Witner [20], a cooperative game framework is more stable and effective in modeling real-world applications. Even though the algorithm of Patrick et al. [18] is a non-transferable utility (NTU) game, it effectively detects partitions rather than overlapping communities. Zhou et al. [19] proposed a coalitional game framework that is able to detect overlapping communities, but it needs tuning of parameter values.

Motivation: The non-game theoretic community detection algorithms such as CPM [7], COPRA [11] and SLPA [21] optimize a predefined structure such as a clique, require the number of communities, or need the extent of overlap for each node. The existing game theoretic algorithms are either randomized or demand tuning of parameters in order to obtain the optimal communities. The randomized algorithms do not guarantee the same community structure when the tests are repeated. A deterministic algorithm, such as that of Zhou et al. [19], derives distinct community structures for varying values of α and β. The motivation for the current work lies in the following observations: a) there is a need for deterministic game theoretic community detection algorithms, b) the algorithm should consider the cumulative interests of nodes in order to form a coalition, and c) the algorithm should be free from tuning parameters.

Contribution of this paper: In this paper, we propose a cooperative game theoretic framework called Weighted Graph Community Game (WGCG) to model the interactions among the nodes of the network. In WGCG, a weighted edge is induced between every pair of nodes. The weight of each edge is computed from the neighborhood connectivity and participation ratio of its end vertices. Based on the study of theoretical properties such as monotonicity, super-additivity and non-negativity, and on the gaps identified in the existing game theoretic models, we formulated the objectives of the current work as follows:


• To propose a cooperative game theoretic model called Weighted Graph Community Game (WGCG) to model the interactions among the nodes.
• To analyze the theoretical properties of WGCG.
• To propose a community detection algorithm (WGCDA) that employs the Shapley value mechanism in order to derive the stable coalitions of the underlying network.
• To analyze the complexity of WGCDA.
• To evaluate the performance of WGCDA over real-world and synthetic networks and compare it with RCHEN [16] and COPRA [11].

We provide a graphical illustration of the evolution of communities using WGCDA over a network with 9 nodes and 16 edges in Fig. 1. Consider the toy network shown in Fig. 1a.

1. The seed coalition (7,8) is selected according to the maximum worth (see Fig. 1b).
2. Coalition (7,8) is expanded by adding its neighbors. The resultant coalition is (5,6,7,8) (see Fig. 1c). In this new coalition (5,6,7,8), each node attains its desired payoff, so the coalition is expanded further.
3. The resulting coalition is (0,5,6,7,8) (see Fig. 1d). This coalition is expanded further, as every node is interested in being part of it. In the resultant coalition (0,1,3,5,6,7,8) (see Fig. 1e), the nodes (1,3) are not interested in being part of the coalition. So they exit the coalition, leaving (0,5,6,7,8) as a stable coalition.
4. The next seed selected is (2,4) (see Fig. 1f). The same procedure of expansion and stability check is repeated until no node wants to leave the coalition. The progress of the algorithm is visualized in Fig. 1g to Fig. 1i. The obtained stable coalition is (0,1,2,3,4).
5. Now, we apply the merging mechanism on these stable coalitions. The final set of resultant communities of the given network is (0,1,2,3,4), (0,5,6,7,8) (see Fig. 1j).

Figure 1: Evolution of communities using WGCDA on a network of 9 nodes and 16 edges (panels a to j)

6. The resultant communities are overlapping communities, with node 0 as the overlapping node. Thus, WGCDA is able to detect the underlying modular structure of the toy network in Fig. 1a.

The rest of the paper is organized as follows: Section 2 discusses preliminaries; Section 3 elaborates on the related work. In Section 4, we discuss the proposed game theoretic framework and the community detection algorithm. We also analyze the theoretical properties of the proposed game and the complexity of the proposed community detection algorithm. We provide the results and discussion in Section 5. We then conclude the paper with some future insights.

2. Preliminaries

Game theory is an abstract mathematical framework that focuses on decision making in scenarios in which the decision of one player can influence the decisions of other players [22, 23, 24]. A game involves a number of players, a set of strategies for each player, and a utility value that quantifies the outcome of each play of the game [22]. The discipline of game theory is broadly classified into two categories: non-cooperative game theory and cooperative game theory. Non-cooperative game theory deals with the primitive actions of individual players. The well-known solution concept of non-cooperative game theory is the Nash equilibrium [22, 24]. The primitives of cooperative game theory are the joint actions of groups of players. The popular solution concepts of cooperative game theory are the core, the nucleolus, stable sets and bargaining sets.

As the proposed model is based on a cooperative game theoretic framework, we discuss the required concepts related to cooperative games. The implicit assumption in cooperative games is that the players are allowed to form coalitions and make mutual agreements in order to share the worth of the coalition [22]. Cooperative games are classified into two categories based on the utility of the players: transferable utility (TU) games and non-transferable utility (NTU) games. The term "transferable utility" signifies that there exists some medium of exchange between the players of a coalition. Cooperative game theory deals with the problem of distributing the revenues among the players in a joint work. The basic constructs of cooperative game theory are discussed below.

Definition 1. Coalitional game with transferable utility (TU game) [22]: A coalitional form game on a finite set of players $N = \{1, 2, \ldots, n\}$ is a characteristic function $\upsilon$ that assigns a number $\upsilon(S)$ to every possible coalition $S$, i.e., $\upsilon : 2^N \to \mathbb{R}$. $\upsilon(S)$ is called the worth of coalition $S$, and $\upsilon(\emptyset) = 0$, where $\emptyset$ denotes the empty coalition.

Definition 2. Coalitional game with non-transferable utility (NTU game) [22]: An NTU game is a pair $(N, \upsilon)$ where $N = \{1, 2, \cdots, n\}$ is the set of players and $\upsilon : 2^N \to 2^{\mathbb{R}^{|S|}}$ gives the different payoff vectors that $S$ is able to achieve for each of its members.

Definition 3. Payoff allocation: A payoff allocation is any vector $u = (u_i)_{i \in N}$ in $\mathbb{R}^N$ [...] $\sum_{i \in S} y_i$.

Definition 5. Core: An allocation $x$ is in the core of $\upsilon$ iff $x$ is feasible and no coalition can improve on $x$, i.e., $x$ is in the core iff $\sum_{i \in N} x_i = \upsilon(N)$ and $\sum_{i \in S} x_i \geq \upsilon(S)$, $\forall S \subseteq N$.

Thus, if a feasible allocation is not in the core, some coalition $S$ can improve on it. The core of a coalitional game may be empty or quite large. The Shapley value is a powerful tool for measuring the power structure of players in a coalitional game. The Shapley value is a unique function $\phi : \mathbb{R}^{2^n - 1} \to$ [...] linearity. The Shapley value of a player $i$ in coalition $S$ represents the payoff received by the player as a member of the coalition.

Definition 6. Shapley value: The Shapley value of a player $i$, taken over the coalitions $T \subseteq N_{-i}$, is given by

$$\phi_i(\upsilon) = \sum_{T \subseteq N_{-i}} \frac{|T|!\,(|N| - |T| - 1)!}{|N|!} \left(\upsilon(T \cup \{i\}) - \upsilon(T)\right) \qquad (1)$$

where $N_{-i}$ denotes the set of players $N \setminus \{i\}$. The term $\frac{|T|!\,(|N| - |T| - 1)!}{|N|!}$ can be interpreted as the probability that, in a random ordering of the players, the members of $T$ are exactly those ahead of player $i$, and the term $(\upsilon(T \cup \{i\}) - \upsilon(T))$ represents the marginal contribution of player $i$ to the worth of coalition $T$.

Definition 7. Monotonic game: A game is said to be monotone if $\forall S \subseteq T \subseteq N$, $\upsilon(S) \leq \upsilon(T)$. That is, the worth of a larger coalition is never less than the worth of its subset coalitions.

For TU games, the notion of convex games introduced by Shapley provides a natural way to formalize the distribution of payoffs among the players [22, 24]. Convex games capture the intuition that the benefit of joining a coalition increases as the coalition grows. Convex games have interesting theoretical properties: the core is non-empty, the stable set and the bargaining set of the grand coalition are contained in the core, and the Shapley value is the barycentre of the core [25].

Definition 8. Convex game: A game is convex if $\forall S, T \subseteq N$: $\upsilon(S \cup T) + \upsilon(S \cap T) \geq \upsilon(S) + \upsilon(T)$. Equivalently, a game is convex if $\upsilon(S \cup \{i\}) - \upsilon(S) \leq \upsilon(T \cup \{i\}) - \upsilon(T)$ whenever $S \subseteq T$ and $i \notin T$. In other words, the marginal value that player $i$ adds to a coalition $S$ is no greater than the marginal value $i$ adds to a coalition $T \supseteq S$.

Definition 9. Weighted graph game [24]: Let $G = (V, W)$ be an undirected graph, where $V$ is the set of vertices and $W \in$ [...]
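Definitions 7 and 8 can be checked by brute force on very small games. The sketch below is only illustrative: the helper names `subsets`, `is_monotone` and `is_convex`, and the two example worth functions, are hypothetical and do not appear in the paper. It enumerates all coalitions, so it is practical only for a handful of players.

```python
from itertools import combinations

def subsets(players):
    """Return every coalition (as a frozenset) of the given players."""
    items = list(players)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_monotone(players, worth):
    """Definition 7: worth(S) <= worth(T) whenever S is a subset of T."""
    all_s = subsets(players)
    return all(worth(S) <= worth(T) for S in all_s for T in all_s if S <= T)

def is_convex(players, worth, tol=1e-12):
    """Definition 8: worth(S | T) + worth(S & T) >= worth(S) + worth(T)."""
    all_s = subsets(players)
    return all(worth(S | T) + worth(S & T) >= worth(S) + worth(T) - tol
               for S in all_s for T in all_s)

# Hypothetical example games on three players (not from the paper).
players = [1, 2, 3]

def square_worth(S):
    return len(S) ** 2      # marginal contributions grow with coalition size

def sqrt_worth(S):
    return len(S) ** 0.5    # marginal contributions shrink with coalition size

print(is_monotone(players, square_worth), is_convex(players, square_worth))
print(is_monotone(players, sqrt_worth), is_convex(players, sqrt_worth))
```

The second game is monotone but not convex, which illustrates that monotonicity alone does not imply convexity.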

Proposition 1. [24] If all weights of a weighted graph game are non-negative, then the game is convex.

Proposition 2. [24] If all weights of a weighted graph game are non-negative, then the membership of a payoff vector in the core can be tested in polynomial time.

Theorem 1. [24] The Shapley value of a node $i$ in the coalitional game $(N, \upsilon)$ induced by a weighted graph game $(V, W)$ is given by

$$\phi_i(N, \upsilon) = \frac{1}{2} \sum_{j \neq i} w(i, j). \qquad (2)$$

These properties signify that if the game is convex, a solution is guaranteed to exist in the core. In the following section, we study the existing literature on overlapping community detection in social networks, which motivated us to pursue the current study.
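As a small numerical check of Eq. (1) against Theorem 1, the sketch below evaluates the Shapley value by direct enumeration and compares it with the closed form of Eq. (2). The three-node weight table is a made-up example, and the worth of a coalition is assumed to be the sum of the weights of the edges lying inside it, with each edge counted once; the function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_value(players, worth, i):
    """Shapley value of player i by direct evaluation of Eq. (1)."""
    others = [p for p in players if p != i]
    n = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        coeff = factorial(size) * factorial(n - size - 1) / factorial(n)
        for T in combinations(others, size):
            coalition = set(T)
            total += coeff * (worth(coalition | {i}) - worth(coalition))
    return total

# Hypothetical symmetric edge weights on three nodes (not from the paper).
weights = {
    frozenset({1, 2}): 3.0,
    frozenset({1, 3}): 1.0,
    frozenset({2, 3}): 2.0,
}

def graph_worth(S):
    """Assumed worth: total weight of the edges with both ends in S."""
    return sum(w for edge, w in weights.items() if edge <= S)

def closed_form(i):
    """Eq. (2): half the total weight of the edges incident to i."""
    return 0.5 * sum(w for edge, w in weights.items() if i in edge)

players = [1, 2, 3]
brute = {i: shapley_value(players, graph_worth, i) for i in players}
closed = {i: closed_form(i) for i in players}
print(brute)
print(closed)
```

The enumeration visits every subset of the other players, whereas the closed form only touches the edges incident to the node, which is why the closed form matters for larger networks.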

3. Related work

A social network can be viewed as a graph in which the nodes represent the social actors (such as individuals or organizations) and the links represent a kind of relation between these social actors. The significant irregularities in the degree of nodes and in the distribution of edges across the nodes impart a high level of organization to the network [1, 6, 17]. This results in a high density of edges within special groups of nodes and a low density across different groups. These special groups of nodes are known as communities or clusters within the network. Disclosing these communities reveals the interplay between the structure and the functionality of the network [1, 2, 6, 17]. In many real-world networks, an individual often associates with several communities, resulting in overlapping communities. Consider, for example, Facebook networks, in which an individual may belong to different groups such as family, friends and co-workers [17]. The overlapping of a pair of communities is called a cover. Individuals belonging to this cover of the communities play an important role in intermediation and communication across the members of different groups [26]. Determining the cover of a given network is analogous to the famous set cover problem, which is known to be NP-hard [1, 17, 27]. Detecting the communities in social networks helps in developing applications such as recommender systems [3], anomaly detection [28], information diffusion [4] and so on. Hence, detecting overlapping communities in social networks is a significant and computationally expensive problem.

Palla et al. [7] proposed the clique percolation method to disclose the overlapping communities through the maximal cliques of the network. CONGA, developed by Gregory et al. [10], is a hierarchical overlapping clustering algorithm that uses the betweenness centrality measure. A greedy modularity optimization algorithm was developed to disclose the community structure [12]. Shen et al. [8] proposed an agglomerative framework called EAGLE, which utilizes the maximal cliques of the network in order to reveal the overlapping communities. COPRA [11], SLPA [21] and SATOCI [29] are label propagation algorithms that expand the labels of nodes based on the belonging coefficient of nodes in the respective communities. GAOCD [13] is a genetic algorithm that optimizes an objective function called partition density in order to reveal the underlying community structure of the network. Xu et al. [14] proposed a heuristic ant colony optimization algorithm to discover the network's community structure. Jinquili [9] proposed a maximal clique based overlapping community detection algorithm. These community detection algorithms either search for predefined patterns such as cliques or optimize objective functions such as modularity or partition density. But in real-world networks, the evolution of communities happens naturally in a bottom-up fashion, driven by the self-interest of the individual nodes.
This notion of self-interest inspired game theorists to model the interactions among nodes in the form of a game [15, 16]. The problem of community detection was modeled in a game theoretic framework by Athey and Jha [15], who analyzed the work flow among the workers of an organization. In the context of social networks, Chen et al. [16] succeeded in developing a non-cooperative game theoretic framework, called a potential game, to disclose the overlapping communities of a given network. This framework uses personalized modularity as the gain function and a linear loss function to convey the interest of a node. It is a randomized algorithm that finds a local Nash equilibrium, and therefore does not guarantee the same community structure when the experiment is repeated. Following the framework of Chen et al. [16], several game theoretic models have been developed. The non-cooperative game models have primarily succeeded in finding partitions rather than overlapping communities [17]. The non-cooperative games are based on the decisions of the individual player and do not consider the joint decisions of all members of a community during its expansion.

Unlike non-cooperative game theory, cooperative game theory does not specify a game through a minute description of the strategic environment, including the order of moves, the possible actions at each move and the corresponding payoff opportunities for each of the players. Cooperative games provide a framework that models the cooperation among the players and the distribution of gains among them [22, 24]. Cooperative games have been shown to be stable and robust compared to non-cooperative games [20, 23]. The cumulative interests of the players of the coalition are considered during the formation of the coalition. Cooperative games have interesting real-world applications such as identifying the top-k influential nodes [4], centrality analysis of terrorist networks [30], free riders [31], detecting and preventing intrusions [32] and so on.

In the context of community detection, Zhou et al. [33] proposed a Shapley value based approach to determine the overlapping communities of a given social network. They assume community formation to be a super-additive game. This framework is unsuccessful in integrating the topological and topical information. Zhou et al. [19] modeled a coalitional game based on the topological structure of nodes, but one needs to adjust the values of the parameters α and β. After an extensive analysis of the existing literature, the deterministic approach driven by cooperative games motivated us to model the dynamics of community formation as a cooperative game. As the affinity between the nodes of the network can be encoded as edge weights, the interactions among these nodes can be naturally modeled as a weighted graph game.

In this paper, we propose a Weighted Graph Community Game (WGCG), which turns out to be convex under certain properties. We then develop a deterministic community detection algorithm in order to disclose the community structure of the underlying network. The algorithm is deterministic in the sense that it generates the same community structure for a given network when the experiment is repeated. In addition, the proposed algorithm does not demand any a priori information such as the number of communities or the overlapping membership of the nodes, nor the tuning of any parameters during execution. In the following section, we present our proposed game theoretic framework for capturing the interactions among the nodes of the social network. We also analyze the theoretical properties of the proposed game, which are required to prove that the communities are disclosed when the game reaches the state of equilibrium. We then propose a greedy community detection algorithm that employs the Shapley mechanism in order to disclose the overlapping communities of the given network.

4. Proposed work

Let $G = (V, E)$ be the given social network, where $V$ is the set of vertices called social objects (or nodes) and $E$ is the set of edges between these social objects. Let $d_i$ denote the degree of node $i$. A weighted graph $G' = (V, E')$ is constructed by computing the weights of the induced edges. Even if a pair of vertices $(i, j)$ does not have a direct edge in $G$, an edge may be induced between $i$ and $j$ in $G'$. The weight of an edge is computed using the common neighborhood ratio and the participation ratio of its end vertices. Let $W_{i,j}$ be the weight of the edge between the pair of vertices $(i, j)$. We compute $W_{i,j}$ as follows:

$$W_{i,j} = CN_{i,j} + P_{i,j} \qquad (3)$$

where $CN_{i,j}$ is the common neighborhood ratio and $P_{i,j}$ is the participation ratio. The common neighborhood ratio of nodes $i, j$ is computed as follows:

$$CN_{i,j} = \left(|\text{common neighbors of } i \text{ and } j| + 1\right)\left(\frac{1}{d_i} + \frac{1}{d_j}\right). \qquad (4)$$

The common neighborhood ratio depicts the association of the nodes $i, j$ with the rest of the network. The participation ratio of the nodes $i, j$ is formulated as follows:

$$P_{i,j} = \frac{1}{d_i} + \frac{1}{d_j}. \qquad (5)$$

The participation ratio describes the prominence given by the nodes $i, j$ to the edge $e = (i, j)$. The notion behind the participation ratio is that if a node has a larger number of incident edges, then the prominence it gives to each incident edge decreases. Hence, the weight of an edge between any pair of nodes $(i, j)$ is computed using the following rules:

$$W_{ij} = \begin{cases} 0 & \text{if } d_i = 0 \text{ or } d_j = 0 \\ \dfrac{CN_{i,j} - P_{i,j}}{4} & \text{if } i \text{ and } j \text{ are disconnected in } G \\ P_{i,j} & \text{if } d_i = 1 \text{ or } d_j = 1 \text{ and } i \text{ and } j \text{ have an edge in } G \\ 2\,CN_{i,j} + P_{i,j} & \text{if } i \text{ and } j \text{ are connected in } G \end{cases} \qquad (6)$$
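The rules of Eq. (6) can be sketched as a small function over an adjacency map. This is an illustrative reading rather than the authors' implementation: the name `edge_weight` and the dictionary-of-sets graph representation are assumptions, "disconnected" is taken to mean "no direct edge in G", and the third rule is read as "(d_i = 1 or d_j = 1) and the edge exists", applied before the general connected case.

```python
def edge_weight(adj, i, j):
    """Induced edge weight W_ij following the four rules of Eq. (6).

    adj maps each node to the set of its neighbours in the original graph G.
    """
    d_i, d_j = len(adj[i]), len(adj[j])
    if d_i == 0 or d_j == 0:
        return 0.0                               # rule 1: isolated end vertex
    p = 1.0 / d_i + 1.0 / d_j                    # participation ratio, Eq. (5)
    cn = (len(adj[i] & adj[j]) + 1) * p          # common neighbourhood ratio, Eq. (4)
    if j not in adj[i]:
        return (cn - p) / 4.0                    # rule 2: no direct edge in G
    if d_i == 1 or d_j == 1:
        return p                                 # rule 3: edge with a degree-one end
    return 2.0 * cn + p                          # rule 4: general connected pair

# Hypothetical test graphs (not from the paper).
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}   # complete graph on 4 nodes
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "z": set()}  # path a-b-c plus isolated z

print(edge_weight(k4, 0, 1))        # connected, degrees 3 and 3, two common neighbours
print(edge_weight(path, "a", "c"))  # not connected, one common neighbour
print(edge_weight(path, "a", "b"))  # connected, one end has degree one
print(edge_weight(path, "a", "z"))  # one end is isolated
```

In the disconnected case the weight reduces to the number of common neighbours times the participation ratio, divided by four, so two non-adjacent nodes with no common neighbour receive weight zero.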

The notion behind this edge weight computation is as follows: if nodes $i$ and $j$ have a direct edge and share many neighbors, they have good connectivity with the rest of the network. If nodes $i$ and $j$ are not connected, they can still have affinity due to their common neighbors. The edge weight is larger in the former case than in the latter. This weight function thus assigns biased weights to the edges of the given network according to their connectivity with the neighbors and the rest of the network. Fig. 2 illustrates the induced weighted graph for the toy network in Fig. 1a. The edges are labeled with their corresponding weights. Now, we define the weighted graph community game (WGCG) on the weighted network $G'$ as follows:

Figure 2: Example of the toy network with induced and weighted edges

Definition 10. Weighted Graph Community Game (WGCG): A cooperative game $CG = (N, \upsilon)$ defined over the weighted graph $G'$, where $N = V$, i.e., the players of WGCG are the nodes of $G'$, and $\upsilon$ is the characteristic function. The characteristic function $\upsilon(S)$ for any coalition $S \subseteq N$ is given by

$$\upsilon(S) = \sum_{i \in S} \sum_{j \in S} W_{i,j}. \qquad (7)$$

The characteristic function $\upsilon(S)$ calculates the worth or value of any coalition $S \subseteq N$. Let $S = \{1, 2, 3, 4\}$ be a possible coalition of the nodes of the network (Fig. 1a). The worth of the coalition $S$ is computed as follows, with each pair of nodes contributing its edge weight once:

$$\upsilon(S) = \sum_{i \in S} \sum_{j \in S} W_{i,j} = 4.083 + 4.5 + 4.083 + 4.083 + 4.667 + 4.083 = 25.499.$$

The division of worth among the members of the coalition is computed using the Shapley value of the nodes. Therefore, the payoff of a node $i$, defined by its Shapley value in coalition $S$ and denoted $\phi_i(S)$, is given as

$$\phi_i(S) = \frac{1}{2} \sum_{j \in S} W_{i,j}. \qquad (8)$$

This directly extends the result of Theorem 1. The Shapley value of node 1 in coalition $S = \{1, 2, 3, 4\}$ is $\phi_1(S) = \frac{1}{2}[4.083 + 4.5 + 4.083] = 6.333$. Similarly, the Shapley values of the other nodes are $\phi_2(S) = 6.4165$, $\phi_3(S) = 6.333$ and $\phi_4(S) = 6.4165$.

The nodes of the network are allowed to play the WGCG game in order to form coalitions. A node may choose one of the actions of joining, leaving or switching between coalitions in order to maximize its Shapley value. When the game reaches the state of equilibrium, the nodes are settled in coalitions from which they cannot profitably deviate. These coalitions are called stable coalitions. Formally, we define a stable coalition as follows:

Definition 11. Stable coalitions: A coalition $S \subseteq N$ is said to be a stable coalition if no node $i \in S$ and no subset of nodes $S' \subset S$ can deviate from the coalition $S$ in order to improve their Shapley value. That is, $S$ is a stable coalition iff $\forall i \in S$, $\phi_i(S) \geq \phi_i(N_{-S})$.

For example, consider the coalitions $S_1 = \{0, 1, 2, 3, 4, 5\}$ and $S_2 = \{0, 5, 6, 7, 8\}$. Here $\phi_5(S_1) < \phi_5(S_2)$, i.e., node 5 gets its best payoff in coalition $S_2$. Therefore, node 5 cannot be a member of coalition $S_1$. In coalition $S_2$, each node is satisfied with its received payoff. Hence, $S_2$ is a stable coalition. We extend the definition of a stable coalition in order to define a community of the network.

Definition 12. Community: A coalition $S$ of WGCG is said to be a community of the network $G$ if $S$ is a stable coalition.

For all $i \in S$, each node is satisfied with its received payoff and is not interested in leaving the coalition $S$. This represents the similarity of the nodes in the coalition $S$. Therefore, these stable coalitions are claimed to be the required communities of the given social network.
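The worked example for $S = \{1, 2, 3, 4\}$ can be reproduced with a few lines of code. The sketch below is illustrative only: the assignment of the six quoted weights to specific node pairs is an assumption inferred from the reported Shapley values (the pair (1,3) carries 4.5 and the pair (2,4) carries 4.667), and the function names `worth` and `payoff` are hypothetical.

```python
from itertools import combinations

# Assumed pair-to-weight assignment, inferred from the reported payoffs.
W = {
    frozenset({1, 3}): 4.5,
    frozenset({2, 4}): 4.667,
    frozenset({1, 2}): 4.083,
    frozenset({1, 4}): 4.083,
    frozenset({2, 3}): 4.083,
    frozenset({3, 4}): 4.083,
}

def worth(S):
    """Worth of coalition S as in Eq. (7), counting each pair of nodes once."""
    return sum(W.get(frozenset(pair), 0.0) for pair in combinations(sorted(S), 2))

def payoff(i, S):
    """Shapley payoff of node i in coalition S as in Eq. (8)."""
    return 0.5 * sum(W.get(frozenset({i, j}), 0.0) for j in S if j != i)

S = {1, 2, 3, 4}
payoffs = {i: payoff(i, S) for i in sorted(S)}
print(round(worth(S), 3))
print({i: round(p, 4) for i, p in payoffs.items()})
```

The payoffs sum to the worth of the coalition, which is the efficiency property one expects from a Shapley-based division.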

We choose the weighted graph game framework because of its promising properties. These games are super-additive and turn out to be convex if all the edge weights are non-negative. Convex games are guaranteed to have a non-empty core, and the Shapley value is the barycentre of the core [24, 25]. Though computing the Shapley value is expensive in general [22], for weighted graph games it can be computed in quadratic time [24]. We now discuss the theoretical properties of WGCG.

4.1. Properties of the proposed game WGCG

In this section, we discuss the theoretical properties of WGCG in order to prove that WGCG is a convex game. This is required because convex games are guaranteed to have non-empty cores, and the membership of a solution in the core can be verified in polynomial time. We show that for WGCG, the Shapley value of a node can be computed in time linear in the size of the coalition.

Lemma 1. Given the weighted graph $G' = (V, E')$, if the weight of every $e \in E'$ is non-negative, then WGCG is a convex game.

Proof. Let $\upsilon$ be the characteristic function of the cooperative game WGCG and $S$ be a coalition. $\upsilon(S)$ is given as:

$$\upsilon(S) = \sum_{i,j \in S} W_{i,j}.$$

If $|S| = 0$, then $\upsilon(S) = 0$, which is non-negative. If $|S| \neq 0$, $\upsilon(S)$ is a function of $W_{i,j}$, which is composed of two terms: the common neighborhood ratio $CN_{i,j}$ and the participation ratio $P_{i,j}$. We know that $CN_{i,j} \geq 0$, as the number of common neighbors cannot be negative. Similarly, $P_{i,j} \geq 0$ as long as $d_i, d_j > 0$. In the disconnected case of Eq. (6), $CN_{i,j} - P_{i,j}$ equals the number of common neighbors multiplied by $\left(\frac{1}{d_i} + \frac{1}{d_j}\right)$, which is also non-negative. As $CN_{i,j} \geq 0$ and $P_{i,j} \geq 0$, we have $\upsilon(S) \geq 0$. Therefore, as $W_{i,j} \geq 0$, WGCG turns out to be a convex game (according to Proposition 1).

Lemma 2. Given WGCG $CG = (N, \upsilon)$, WGCG is a super-additive game.


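Proof. Let $S_1$ and $S_2$ be two coalitions of the game WGCG such that $S_1, S_2 \subset N$ and $S_1 \cap S_2 = \emptyset$. We show that the worth of the larger coalition $(S_1 + S_2)$ is at least the sum of the worths of the individual coalitions $S_1$ and $S_2$, i.e., $\upsilon(S_1 + S_2) \geq \upsilon(S_1) + \upsilon(S_2)$.

$$\begin{aligned}
\upsilon(S_1 + S_2) &= \sum_{i,j \in S_1 + S_2} W_{i,j} \\
&= \sum_{i,j \in S_1} W_{i,j} + \sum_{i,j \in S_2} W_{i,j} + \sum_{i \in S_1, j \in S_2} W_{i,j} \\
&= \upsilon(S_1) + \upsilon(S_2) + \sum_{i \in S_1, j \in S_2} W_{i,j} \\
&\geq \upsilon(S_1) + \upsilon(S_2) \qquad \Big[\because \sum_{i \in S_1, j \in S_2} W_{i,j} \geq 0\Big].
\end{aligned}$$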

This property shows that the formation of larger coalitions does not decrease the worth that the nodes receive in smaller coalitions.

Lemma 3. Given WGCG $CG = (N, \upsilon)$, $\upsilon(\cdot)$ is monotonic.

Proof. Let $S_1, S_2$ be two coalitions with $S_1 \subseteq S_2$. We show that the value of a coalition does not decrease as the coalition grows, i.e., the value of the larger coalition is greater than or equal to the value of its subset coalitions.

$$\begin{aligned}
\upsilon(S_2) &= \sum_{i,j \in S_2} W_{i,j} \\
&= \sum_{i,j \in S_1} W_{i,j} + \sum_{i \in S_1, j \in S_2 - S_1} W_{i,j} + \sum_{i,j \in S_2 - S_1} W_{i,j} \\
&= \upsilon(S_1) + \sum_{i \in S_1, j \in S_2 - S_1} W_{i,j} + \sum_{i,j \in S_2 - S_1} W_{i,j} \\
&\geq \upsilon(S_1).
\end{aligned}$$

Hence, the proposed game $\upsilon(\cdot)$ is a monotonic game.

Lemma 1 states that the edge weights are non-negative. Lemma 2 proves that the formation of a larger coalition does not reduce the utility of the players of the coalition. Lemma 3 states that the worth of a coalition does not decrease as the size of the coalition increases. Hence, we can conclude that the proposed game WGCG is a convex game. In the following section, we discuss the proposed greedy algorithm for disclosing the stable coalitions of the WGCG game.

4.2. Weighted Graph Community Detection Algorithm (WGCDA)

In this section, we propose a greedy algorithm called Weighted Graph Community Detection Algorithm (WGCDA). The algorithm is greedy in how it expands coalitions: it grows a coalition by adding the immediate neighbors of its members. The equilibrium state is reached by maximizing the Shapley value of the nodes. According to [22, 25], super-additive games often lead to grand coalitions. Even though WGCG is a super-additive game, we utilize a threshold based Shapley value mechanism in WGCDA in order to determine the interest of the node(s) in a particular coalition. We prove that WGCDA is able to disclose the desired overlapping community structure and is not sensitive to satellite nodes (the nodes having degree one). The following notations are used in the proposed algorithm:

Notations:

seedlist : list of seed coalitions
Grp : first coalition of the seedlist
ϕmax(i) : maximum Shapley value of node i in G'
temp : temporary coalition
ϕi(temp) : Shapley value of node i in the temporary coalition
NItemp : list of nodes not interested in the temporary coalition
thi : threshold for node i
SG : stable group
SSG : set of stable groups
|Win| : number of links for node i in group temp, considering the weighted graph
di^G' : degree of node i in the weighted graph G'
LL[i] : likely list of i
NC : newly formed coalition
Neighbors[i] : set of neighbors of node i
ϕmax(i) : list of maximum Shapley values of node i
Φ : empty set

Algorithm 1 WGCD // Algorithm to disclose the stable groups of the given network
input: Weighted graph G′ = (V, E′)
output: Set of stable groups SSG
 1: // To generate the seed list
 2: seedlist = {(i, j) | Wi,j maximum for either i or j}
 3: // To sort the seedlist according to their worth
 4: seedlist ← sorted seedlist in ascending order [based on worth of seed coalition]
 5: // To determine the maximum Shapley value of nodes in G′
 6: for i ∈ V do
 7:     ϕmax(i) ← max_Shapley(i)
 8: // To determine the stable coalitions
 9: while (seedlist ≠ NULL) do
10:     Grp ← seedlist(0)
11:     SG ← check_stable(Grp)
12:     if SG ⊈ any c ∈ SSG then
13:         SSG ← SSG ∪ SG
14:     seedlist ← update(seedlist, SG)
15:     // To eliminate the list of edges present in SG
16:     ∀i, j ∈ SG, Wi,j = 0
17: return SSG
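Steps 2 and 4 of Algorithm 1 can be sketched as follows. This is a minimal illustration, assuming the graph is stored as a symmetric dict of dicts with integer node ids; the name `build_seedlist`, the toy weights and the tie-breaking rule are assumptions introduced here, not taken from the paper.

```python
def build_seedlist(W):
    """Collect the heaviest incident edge of every node and sort the seeds by worth.

    W is a symmetric dict of dicts: W[i][j] is the weight of edge (i, j).
    """
    seeds = set()
    for i, nbrs in W.items():
        if not nbrs:
            continue  # isolated nodes contribute no seed
        # Heaviest neighbour of i; ties go to the smaller node id (assumption).
        j = max(nbrs, key=lambda n: (nbrs[n], -n))
        seeds.add((min(i, j), max(i, j)))  # store each edge once
    # Ascending order of worth, as stated in step 4 of Algorithm 1.
    return sorted(seeds, key=lambda e: (W[e[0]][e[1]], e))


# Hypothetical toy graph with four nodes.
W = {
    0: {1: 0.8, 2: 0.5},
    1: {0: 0.8, 2: 0.3},
    2: {0: 0.5, 1: 0.3, 3: 0.6},
    3: {2: 0.6},
}
seedlist = build_seedlist(W)
```

Because two nodes may share the same heaviest edge, the seed list can hold fewer than N entries, and a fixed tie-breaking rule keeps the result repeatable across runs.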

Algorithm 2 max_Shapley
input: node i
output: Maximum Shapley value of node i, ϕmax(i)
1: for j ∈ V do
2:     ϕmax(i) ← Σ_{j∈V} Wi,j / 2
3: return ϕmax(i)
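Algorithm 2 amounts to halving the total weight incident to a node. A minimal sketch, assuming the same dict-of-dicts graph representation (the name `max_shapley` and the toy weights are illustrative):

```python
def max_shapley(W, i):
    """phi_max(i): half the total weight of the edges incident to node i in the whole graph."""
    return sum(W[i].values()) / 2.0


# Hypothetical toy graph: W[i][j] is the weight of edge (i, j).
W = {
    0: {1: 0.8, 2: 0.5},
    1: {0: 0.8, 2: 0.3},
    2: {0: 0.5, 1: 0.3, 3: 0.6},
    3: {2: 0.6},
}
phi_max = {i: max_shapley(W, i) for i in W}
```

Each edge weight is split equally between its two endpoints, so the values in `phi_max` add up to the worth of the grand coalition.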

Algorithm 3 expand_grp(Grp)
input: A coalition (or group) Grp
output: Expanded coalition
1: for i ∈ Grp do
2:     temp ← temp ∪ neighbours(i)
3: return temp
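The expansion step can be sketched in a few lines. Algorithm 3 does not state the initial value of temp; the sketch below assumes it starts as the coalition itself, so that the result contains the seed members together with their neighbors. The name `expand_grp` follows the pseudocode, while the graph representation is an assumption.

```python
def expand_grp(W, grp):
    """Return the coalition grown by one hop: its members plus all of their neighbours."""
    temp = set(grp)  # assumption: temp starts as the coalition itself
    for i in grp:
        temp |= set(W[i])  # the keys of W[i] are the neighbours of i
    return temp


# Hypothetical toy graph: W[i][j] is the weight of edge (i, j).
W = {
    0: {1: 0.8, 2: 0.5},
    1: {0: 0.8, 2: 0.3},
    2: {0: 0.5, 1: 0.3, 3: 0.6},
    3: {2: 0.6},
}
expanded = expand_grp(W, {0, 1})
```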

Algorithm 4 check_stable
input: A coalition (or group) Grp
output: Stable group SG
 1: for i ∈ temp do
 2:     thresholdi = Σ_{j∈temp} Wi,j / (di^{G′} + 1)
 3: for i ∈ temp do
 4:     if ϕi(temp) < thresholdi ∗ ϕmax(i) then
 5:         NItemp = NItemp ∪ i
 6: if NItemp = Φ then
 7:     return temp
 8: for j ∈ NItemp do
 9:     temp ← temp − j
10: SG ← temp
11: return SG
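The stability test can be sketched as below. Two points are assumptions rather than statements of the paper: the Shapley value of a node inside a coalition is taken as half the weight of its internal edges (mirroring Algorithm 2), and the threshold is made a pluggable function because the threshold formula is hard to read in the source. `threshold_as_read` implements one reading of step 2; all names are illustrative.

```python
def shapley_in(W, i, coalition):
    """Assumed Shapley value of node i inside a coalition: half its internal edge weight."""
    return sum(w for j, w in W[i].items() if j in coalition) / 2.0


def threshold_as_read(W, i, temp):
    """One reading of step 2 of Algorithm 4: internal weight of i over (degree of i + 1)."""
    inside = sum(w for j, w in W[i].items() if j in temp)
    return inside / (len(W[i]) + 1)


def check_stable(W, temp, threshold_fn=threshold_as_read):
    """Single pass: drop every node whose payoff is below threshold * phi_max."""
    temp = set(temp)
    not_interested = set()
    for i in temp:
        phi_max = sum(W[i].values()) / 2.0  # maximum Shapley value, as in Algorithm 2
        if shapley_in(W, i, temp) < threshold_fn(W, i, temp) * phi_max:
            not_interested.add(i)
    return temp - not_interested


# Hypothetical toy graph: W[i][j] is the weight of edge (i, j).
W = {
    0: {1: 0.8, 2: 0.5},
    1: {0: 0.8, 2: 0.3},
    2: {0: 0.5, 1: 0.3, 3: 0.6},
    3: {2: 0.6},
}
stable_default = check_stable(W, {0, 1, 2})
# A constant threshold of 0.6 (purely illustrative) removes node 2,
# whose strongest tie (to node 3) lies outside the coalition.
stable_strict = check_stable(W, {0, 1, 2}, lambda W, i, temp: 0.6)
```

Keeping the threshold as a parameter makes it easy to compare different readings of step 2 without touching the comparison in step 4.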

Algorithm 5 update
input: seedlist, stable group (SG)
output: seedlist
1: for e ∈ SG do
2:     if e ∈ seedlist then
3:         seedlist ← seedlist − e
4: return seedlist
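The bookkeeping after a stable group is found (Algorithm 5 together with steps 15 and 16 of Algorithm 1) can be sketched as follows. The sketch assumes that "e ∈ SG" refers to a seed edge whose two endpoints both belong to SG; the helper name `zero_internal_edges` is hypothetical.

```python
def update(seedlist, sg):
    """Drop every seed edge whose two endpoints both lie in the stable group."""
    return [(i, j) for (i, j) in seedlist if not (i in sg and j in sg)]


def zero_internal_edges(W, sg):
    """Set W[i][j] = 0 for all edges inside the stable group (modifies W in place)."""
    for i in sg:
        for j in W[i]:
            if j in sg:
                W[i][j] = 0.0


# Hypothetical toy graph and seed list.
W = {
    0: {1: 0.8, 2: 0.5},
    1: {0: 0.8, 2: 0.3},
    2: {0: 0.5, 1: 0.3, 3: 0.6},
    3: {2: 0.6},
}
remaining = update([(2, 3), (0, 1)], {0, 1, 2})
zero_internal_edges(W, {0, 1, 2})
```

Edges that leave the group, such as (2, 3) here, keep their weight, so a node of the group can still join another coalition later.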

Algorithm 6 Voting_merge
input: set of stable groups SSG
output: Set of communities (SSG)
 1: for i ∈ SSG do
 2:     for j ∈ SSG − i do
 3:         if |i ∩ j| / min(|i|, |j|) ≥ 0.5 then
 4:             LL[i] ← LL[i] ∪ j
 5:     NC[i] ← {i ∪ j | maximum number of nodes in groups i ∪ j can improve the Shapley value after merging the groups i and j}
 6:     if NC[i] ∉ SSG then
 7:         SSG ← SSG ∪ NC[i]
 8: for i ∈ SSG do
 9:     for j ∈ SSG do
10:         if i ⊂ j then
11:             SSG ← SSG − i
12: return SSG

The proposed WGCDA algorithm works in two phases. In Phase-I, the stable

coalitions are determined. Phase-II merges these stable coalitions in order to disclose the community structure of the network. The two phases of WGCDA are explained below.

Phase-I: WGCDA first computes the seedlist, comprising the maximum-weight edge of each node. Hence, the seedlist consists of coalitions of size two. Then, the seedlist is sorted in ascending order according to the worth of the seed coalitions. This step makes WGCDA deterministic. The seed coalition having the maximum worth is chosen as the initial seed for expansion. Let the chosen coalition be c = {i, j}. The coalition c is expanded by appending the neighbors of both nodes i and j using expand_grp(). Let c′ be the new expanded coalition, c′ = {i, j, p1, p2, ..., pk}, where p1, p2, ..., pk are the neighbors of nodes i and j. This step facilitates faster convergence of WGCDA. Now, the interest of the nodes is determined using check_stable(). This function checks the payoff division across the members of the coalition using their received Shapley values. The nodes receiving a lower payoff than expected are assumed to be not interested in being part of the coalition and are removed from c′. If all the nodes of c′ get their expected Shapley value, then c′ is considered a stable group and is added to the set of stable groups (SSG). Accordingly, the seedlist and the graph G′ are updated by removing the set of edges of c′. This step reduces the required number of iterations and also avoids the formation of the grand coalition. The next seed is then chosen for expansion. This procedure is repeated until the seedlist becomes empty. Phase-I discovers the stable coalitions of the given network, but the coalitions may be small or may overlap excessively. To overcome this problem, a Voting_merge() procedure is developed in Phase-II.

Phase-II: In this phase, each stable group in SSG chooses the best possible stable group to merge with. The best match is determined by the vote of the nodes of both coalitions, that is, by the number of nodes whose Shapley value improves after the merge. The Voting_merge() procedure is repeated until no more merges are possible.

4.3. Complexity analysis

This section analyzes the computational complexity of the WGCDA algorithm. Each step of the algorithm is analyzed to compute the overall complexity. Determining the seedlist can be accomplished in O(N d_max^{G′}), where N is the number of social actors and d_max^{G′} is the maximum degree of the network G′. The maximum size of the seedlist is N. As the same seed edge can be the maximum-weight edge of more than one node, in general |seedlist| ≤ N. The seedlist is sorted in O(N log N) time. The computation of the maximum Shapley value for all nodes incurs a complexity of O(N d_max^{G′}). The expand_grp() function contributes a cost of O(|C| d_max), where |C| is the size of the coalition C. The check_stable() function can be accomplished in O(|C| d_max), where |C| is the size of the coalition C and d_max is the maximum degree of the network G. Let k be the number of times the while loop (step 9 of Algorithm 1) executes. Obviously, k ≪ |seedlist|. The subset elimination can be accomplished in O(|c|^2), where |c| is the number of resulting communities. Voting_merge() can also be accomplished in O(|c|^2). Therefore, the total complexity of our algorithm is approximately equal to k(O(N log N) + O(|C| d_max) + O(N d_max^{G′}) + O(|c|^2)). As k is very small compared with N (which is usually very large), k can be omitted, and thus the total complexity of the algorithm is O(N log N) + O(N d_max^{G′}).

The following section proves that WGCDA is a deterministic overlapping community detection algorithm and is insensitive to satellite nodes (nodes whose degree is one).

Theorem 2. Let CG = (N, υ) be a Weighted Graph Community Game. Then WGCDA is a deterministic overlapping community detection algorithm.

Proof. We analyze the atomic execution of WGCDA and prove the following claims in order to complete the proof:
1. With the given characteristic function, WGCDA always yields the same seedlist, which in turn drives the same community structure.
2. The greedy expand_grp() function does not reduce the payoff to its members.
3. The threshold-based Shapley value mechanism always yields stable coalitions.
4. As the membership of nodes is not restricted to one coalition, the resulting community structure is overlapping.
5. The Voting_merge() mechanism reveals the desired overlapping community structure.

WGCDA initially computes the seedlist, which comprises the maximum-weight edge of each node. An edge gets more weight if its end vertices have a larger number of common neighbors or if the degree of its end nodes is very low. That means the seedlist represents the set of active and interactive node pairs. As WGCDA always chooses the seed having maximum worth, for the given characteristic function it always yields the same initial seed. This makes the algorithm deterministic in the sense that it reveals the same community structure on any number of repetitions of the experiment. The expand_grp() procedure results in the formation of the larger coalition c′ from coalition c. According to Lemma 2 and Lemma 3, the nodes i ∈ c are assured of their payoff in c′. It follows that the utility of the nodes does not decrease after the expansion procedure. But there may be nodes in c′ which are not interested in staying with c′. The check_stable() function determines the set of nodes c″ = {i ∈ c′ : ϕi(c′) < threshold ∗ ϕmax(i)} as uninterested nodes. This step helps in determining the

set of uninterested nodes. Removing the set of nodes c″ from c′ results in the stable coalition c‴ = c′ − c″, i.e., c‴ = {i ∈ c′ : ϕi(c′) ≥ threshold ∗ ϕmax(i)}. That is, the nodes of a stable coalition do not depart from the coalition in order to improve their Shapley value. Then, we update the graph such that ∀i, j ∈ c‴, Wi,j = 0. In other words, the edges of the subgraph formed by the vertices of c‴ are not considered in further iterations. This step restricts the formation of the grand coalition. The membership of node i in a coalition c1 depends only on its received payoff, i.e., ϕi(c1) ≥ threshold ∗ ϕmax(i), so a node may join more than one coalition. This results in the formation of overlapping stable coalitions. Finally, Voting_merge() helps in improving the quality of the detected communities. When two coalitions share many nodes, they can be merged. This helps in reducing the extent of overlap and also the number of resultant communities. The merge considers the cumulative interests of the nodes in order to form larger coalitions. As the Voting_merge() procedure is applied on stable coalitions, the resultant merged coalitions are also stable. As each step of the algorithm can be accomplished in polynomial time (O(N log N) + O(N d_max^{G′})), WGCDA is a polynomial-time algorithm. Therefore, we conclude that WGCDA is a deterministic polynomial-time overlapping community detection algorithm.
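The two set-based steps of Voting_merge() discussed in the proof, namely the overlap test of step 3 and the subset elimination of steps 8 to 11 in Algorithm 6, can be sketched as follows. The voting step itself (step 5) is not reproduced here; the function names and the toy groups are illustrative assumptions.

```python
def overlap_ratio(a, b):
    """|a ∩ b| / min(|a|, |b|), the merge criterion of Algorithm 6, step 3."""
    return len(a & b) / min(len(a), len(b))


def likely_list(groups, min_overlap=0.5):
    """For each group index, list the indices of groups that overlap it enough to merge."""
    return {
        i: [j for j in range(len(groups))
            if j != i and overlap_ratio(groups[i], groups[j]) >= min_overlap]
        for i in range(len(groups))
    }


def drop_subsets(groups):
    """Remove duplicate groups and groups that are proper subsets of another group."""
    unique = {frozenset(g) for g in groups}
    return [g for g in unique if not any(g < other for other in unique)]


# Hypothetical stable groups.
groups = [{0, 1, 2, 3}, {2, 3, 4}, {7, 8}]
candidates = likely_list(groups)
kept = drop_subsets([{0, 1, 2}, {0, 1}, {5, 6}])
```

Both helpers compare every pair of groups, which is consistent with the O(|c|^2) cost quoted in the complexity analysis.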

Figure 3: Demo on network with one satellite node. (a) Network with one satellite node; (b) Communities disclosed by WGCDA.

Figure 4: Demo on network with three satellite nodes. (a) Network with three satellite nodes; (b) Communities disclosed by WGCDA.

The following proposition shows that, unlike the Newman fast greedy algorithm [34], WGCDA does not change the underlying community structure in the presence of satellite nodes.

Proposition 3. WGCDA is not sensitive to satellite nodes.

Proof. This proposition states that WGCDA discloses the same underlying community structure even after satellite nodes are attached to the network. We provide the following illustrations to support the claim. Consider the network with one satellite node (node 9) in Fig 3a. The community structure disclosed by WGCDA is given in Fig 3b. The detected communities are {0, 1, 2, 3, 4} and {0, 5, 6, 7, 8, 9}. Now consider the network with three satellite nodes in Fig 4a. The community structure revealed by WGCDA is {0, 5, 6, 7, 8, 9, 10, 11} and {0, 1, 2, 3, 4}, as shown in Fig 4b. These networks demonstrate that appending satellite nodes to the network does not change the original community structure of the network. Hence, we can conclude that WGCDA is not sensitive to satellite nodes.

5. Results and Discussion

In this section, we discuss the performance of WGCDA on various real-world networks as well as on the LFR benchmark networks. We also compare the performance of WGCDA with Chen's randomized algorithm (RCHEN) and the COPRA label propagation algorithm (COPRA). We choose these algorithms because RCHEN is a non-cooperative game theoretic algorithm that has been shown to be superior to existing overlapping community detection algorithms [27], and COPRA is a label propagation algorithm that has been shown to perform well compared with the state of the art [27]. While conducting the experiments with COPRA, we set the overlap membership to 2 for each node and chose the best clustering obtained over 50 repetitions of the experiment. We conducted the experiments on a system with the following configuration: Intel Core(TM) i5-4200U CPU @ 1.60 GHz 2.30 GHz, 8.0 GB RAM. The following section discusses the evaluation on real-world and synthetic benchmark network datasets.

5.1. Evaluation on real-world networks

We now discuss the performance of WGCDA on real-world benchmark networks. Table 1 describes the real-world network datasets used in the current study. It provides the number of nodes (#nodes), the number of edges (#edges) and the clustering coefficient, which represents the cohesiveness of the nodes of the given network. Fig 5 illustrates the communities detected by WGCDA, COPRA and RCHEN on the Zachary karate club network dataset [35]. WGCDA discloses three communities, of which one is a large community and the other two are overlapping communities with node 0 as the overlap (Fig 5a). This gives the high-level modular structure. COPRA discloses communities of small size and even singleton communities, as shown in Fig 5b. The communities detected by RCHEN are small and overlap more heavily, as shown in Fig 5c.

Figure 5: Demo on karate network. (a) Communities disclosed by WGCDA; (b) Communities disclosed by COPRA; (c) Communities disclosed by RCHEN.

Table 1: Description of network datasets

Sno   Dataset           #nodes   #edges   Clustering coefficient
1     Karate [35]       34       78       0.588
2     Dolphins [36]     62       159      0.303
3     Lesmis [37]       72       254      0.736
4     Football [38]     115      613      0.403
5     Polbooks [2]      105      441      0.488
6     Netscience [39]   1589     2742     0.878
7     Power [40]        4941     6594     0.107

We evaluate the performance of WGCDA on the real-world networks listed in Table 1 and compare it with the state-of-the-art algorithms RCHEN and COPRA. We use the popular quality metric Modularity [41] for overlapping communities in order to measure the quality of the communities found by the algorithms. The modularity obtained by WGCDA, RCHEN and COPRA on the real-world datasets is reported in Fig 6. The following observations can be made from the experiments:
1. WGCDA yields a better modular structure on real-world networks than the other community detection algorithms, as its modularity is higher.
2. RCHEN is a randomized algorithm; it does not guarantee the same community structure on repetition of the experiment, so its modularity value varies from run to run. Even the best modularity value of RCHEN is lower than that of WGCDA. In this respect, WGCDA is more consistent and performs better than RCHEN.
3. The computational time of WGCDA is lower than that of RCHEN.
4. Even though COPRA takes much less computational time, the quality of the communities it detects is very poor when compared to WGCDA and RCHEN. COPRA discloses many small communities.

Figure 6: Performance on different real-world networks.

Figure 7: Performance of WGCDA and COPRA.

We further analyze the performance of WGCDA and COPRA on moderately large networks. RCHEN is not included, as its execution time is very high (with the given system configuration). Fig 7 shows the modularity values obtained for Polbooks [2], Netscience [39] and Power [40], and demonstrates that WGCDA performs well when compared to COPRA. For the Power network dataset, COPRA performs very poorly in disclosing the underlying community structure.

5.2. Benchmark networks

In this section, we discuss the performance of WGCDA on synthetic benchmark networks. We choose LFR networks, as they are widely used and have proved to be versatile benchmark networks with ground-truth communities. LFR networks are very close to real-world networks, as they ensure power-law distributions for the community sizes and the node degrees. The generation of LFR networks is discussed in [42].

5.2.1. Construction of LFR networks

We conducted the experiments on LFR benchmark networks of two sizes, 1000 nodes and 5000 nodes. The average degree is kavg = 20 and the maximum degree is maxk = 50. The exponents of the power-law distributions for node degrees and community sizes are chosen as τ1 = 2 and τ2 = 1 respectively. The number of communities to which each overlapping node belongs is set to om = 2. The number of overlapping nodes varies from 0% to 50% of the total nodes. µ is the mixing parameter that controls the fraction of edges between communities. If µ = 0, then all edges are within communities, and if µ = 1, then all edges are between nodes belonging to different communities. In this study, we set the mixing parameter to µ = 0.1 and µ = 0.3 separately. The minimum and maximum community sizes are set to (10, 50) and (20, 100) separately. Table 2 describes the parameter settings of the LFR networks generated for the current study, where N is the number of nodes, µ is the mixing parameter, minC is the minimum community size and maxC is the maximum community size.

Table 2: Parameter setting for generation of LFR networks

Sno   N           µ     minC   maxC
1     1000/5000   0.1   10     50
2     1000/5000   0.1   20     100
3     1000/5000   0.3   10     50
4     1000/5000   0.3   20     100
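The parameter grid of Table 2 can be enumerated with a short sketch. It assumes that the fraction of overlapping nodes is varied in steps of 10% (0%, 10%, ..., 50%), which is consistent with the 48 synthetic networks over 8 datasets reported in the evaluation; the dictionary keys, in particular `on` for the number of overlapping nodes, are hypothetical names chosen for this illustration.

```python
from itertools import product

SIZES = (1000, 5000)                                 # N
MIXING = (0.1, 0.3)                                  # mu
COMMUNITY_SIZES = ((10, 50), (20, 100))              # (minC, maxC)
OVERLAP_FRACTIONS = (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)   # assumed 10% steps


def lfr_settings():
    """Enumerate one parameter dictionary per synthetic network."""
    fixed = {"kavg": 20, "maxk": 50, "tau1": 2, "tau2": 1, "om": 2}
    settings = []
    for n, mu, (minc, maxc), frac in product(
            SIZES, MIXING, COMMUNITY_SIZES, OVERLAP_FRACTIONS):
        settings.append({**fixed, "N": n, "mu": mu, "minC": minc,
                         "maxC": maxc, "on": round(frac * n)})
    return settings


settings = lfr_settings()
datasets = {(s["N"], s["mu"], s["minC"], s["maxC"]) for s in settings}
```

Under this assumption, 2 sizes × 2 mixing values × 2 community-size ranges give 8 datasets, and 6 overlap fractions per dataset give 48 networks.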

The following section discusses the performance of the three community detection algorithms on LFR networks.

5.2.2. Performance on LFR networks

We adopt the extended Normalized Mutual Information (NMI) [43] to measure the accuracy of the three community detection algorithms. The NMI measures the similarity between the detected communities and the ground-truth communities. Its value ranges from 0 to 1. If NMI = 1, the resulting community structure exactly matches the ground truth. If NMI = 0, the resulting community structure is unrelated to the ground truth. That is, the higher the NMI value, the more accurate the detected community structure. The best NMI values of the three algorithms on 48 synthetic networks comprising 8 datasets are depicted in Fig 8, 9, 10 and 11. The x-axis represents the fraction of overlapping nodes and the y-axis represents the corresponding NMI values. Fig 8 and Fig 10 consider the networks of 1000 and 5000 nodes with µ = 0.1. Fig 9 and Fig 11 show the performance of the algorithms on the networks of 1000 and 5000 nodes with µ = 0.3.

The NMI value of COPRA diminishes as the fraction of overlapping nodes increases, for both µ = 0.1 and µ = 0.3. On average, COPRA reaches an accuracy of 74% and 72% for µ = 0.1 and µ = 0.3 respectively. When µ = 0.1, RCHEN and WGCDA detect the community structure with more than 90% accuracy for both ranges of community sizes (min and max community sizes of 10, 50 and 20, 100). When µ = 0.3, minc = 10 and maxc = 50, RCHEN and WGCDA reach an accuracy of 74% and 89% respectively (on average over all fractions of overlapping nodes). When µ = 0.3, minc = 20 and maxc = 100, both RCHEN and WGCDA reach an accuracy of 67%. This illustrates that WGCDA detects a community structure at least as good as that of RCHEN and better than that of COPRA. We further analyze the performance of WGCDA for different values of the mixing parameter and of the fraction of overlapping nodes to determine the best performance domain of WGCDA.

Figure 8: Performance evaluation on LFR networks of 1000 nodes, µ = 0.1. (a) minc = 10, maxc = 50; (b) minc = 20, maxc = 100. Axes: fraction of overlap (x) versus Normalized Mutual Information (y); curves for WGCDA, COPRA and RCHEN.

Figure 9: Performance evaluation on LFR networks of 1000 nodes, µ = 0.3. (a) minc = 10, maxc = 50; (b) minc = 20, maxc = 100. Axes: fraction of overlap (x) versus Normalized Mutual Information (y); curves for WGCDA, COPRA and RCHEN.

Figure 10: Performance evaluation on LFR networks of 5000 nodes, µ = 0.1. (a) minc = 10, maxc = 50; (b) minc = 20, maxc = 100. Axes: fraction of overlap (x) versus Normalized Mutual Information (y); curves for WGCDA, COPRA and RCHEN.

Figure 11: Performance evaluation on LFR networks of 5000 nodes, µ = 0.3. (a) minc = 10, maxc = 50; (b) minc = 20, maxc = 100. Axes: fraction of overlap (x) versus Normalized Mutual Information (y); curves for WGCDA, COPRA and RCHEN.

5.2.3. Performance of WGCDA for different µ values

In this section, we illustrate the performance of WGCDA for different values of the mixing parameter µ on a network of 1000 nodes. The mixing parameter µ controls the fraction of edges across different communities. If µ = 0, then all edges are within communities; if µ = 1, then all edges are between nodes belonging to different communities. As µ increases, the topology of the network becomes fuzzier and the community structure becomes more difficult to detect. The LFR networks are generated under the following parameter setting: minc = 10, maxc = 50, kavg = 20, maxk = 50, τ1 = 2 and τ2 = 1. Fig 12 illustrates the observed NMI values for different values of µ. The x-axis represents the fraction of overlapping nodes and the y-axis represents the corresponding NMI values. We consider the following set of values: µ ∈ {0.1, 0.2, 0.3, 0.4, 0.5}. WGCDA reaches more than 95% accuracy when µ = 0.1 and µ = 0.2. For µ = 0.3 and µ = 0.4, WGCDA reaches more than 90% and 85% accuracy respectively. Even for µ = 0.5, which is a very difficult setting, it recovers on average 40% of the exact community structure. WGCDA gives its best performance when the fraction of overlapping nodes is in the range of 0% to 30%. Based on the observed NMI values for different values of the mixing parameter µ, we can conclude that WGCDA performs well for µ = 0.1 to µ = 0.4 and a fraction of overlap of 0% to 30%.
As WGCDA is able to detect the good community structure even when µ=0.4, it can even detect the communities in dense networks, which could be a difficult task.

Figure 12: Performance of WGCDA for different µ values. Axes: fraction of overlap (x) versus Normalized Mutual Information (y); one curve for each µ ∈ {0.1, 0.2, 0.3, 0.4, 0.5}.

6. Conclusions and future directions

Overlapping community detection in networks is a challenging and complex task. The existing overlapping community detection algorithms are either randomized or demand prior information on the number of communities or on the membership of the nodes. To address this issue, we have proposed in this paper a cooperative game theoretic framework to detect the overlapping community structure in social networks. The proposal is analyzed experimentally on real-world and synthetic LFR benchmark networks, and theoretically through properties such as super-additivity and monotonicity. The main contributions of the current research are summarized as follows.
• A novel approach is proposed, based on the Shapley value mechanism, which considers the cumulative interests of the nodes while forming the communities.
• The proposed method does not demand any a priori information on the number of communities or on the overlapping membership of the nodes.
• The proposed approach works with a simple weight function to discover the underlying community structure of the given network.
• The WGCDA algorithm uses only local information to compute the weight function as well as the Shapley values.
• The performance evaluation on synthetic and real-world benchmark networks demonstrates that WGCDA is superior to state-of-the-art algorithms such as RCHEN and COPRA.

In the future, this framework can be extended to directed and dynamic networks. Our proposed method explores one seed group per iteration and thus works sequentially. Hence, one can work on processing several seed groups per iteration, so that the method can be executed in a parallel framework.

References

[1] S. Wasserman, K. Faust, Social network analysis: Methods and applications, Vol. 8, Cambridge university press, 1994.
[2] M. E. Newman, Modularity and community structure in networks, Proceedings of the national academy of sciences 103 (23) (2006) 8577–8582.
[3] P. Cremonesi, R. Turrin, E. Lentini, M. Matteucci, An evaluation methodology for collaborative recommender systems, in: Automated solutions for Cross Media Content and Multi-channel Distribution, 2008. AXMEDIS'08. International Conference on, IEEE, 2008, pp. 224–231.
[4] R. Narayanam, Y. Narahari, A shapley value-based approach to discover influential nodes in social networks, IEEE Transactions on Automation Science and Engineering 8 (1) (2011) 130–147.
[5] M. A. de Paulo, M. C. Nascimento, V. Rosset, Improving the connectivity of community detection-based hierarchical routing protocols in large-scale wsns, Procedia Computer Science 96 (2016) 521–530.
[6] S. Fortunato, Community detection in graphs, Physics reports 486 (3) (2010) 75–174.
[7] G. Palla, I. Derényi, I. Farkas, T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature 435 (7043) (2005) 814–818.
[8] H. Shen, X. Cheng, K. Cai, M.-B. Hu, Detect overlapping and hierarchical community structure in networks, Physica A: Statistical Mechanics and its Applications 388 (8) (2009) 1706–1712.
[9] J. Li, X. Wang, Y. Cui, Uncovering the overlapping community structure of complex networks by maximal cliques, Physica A: Statistical Mechanics and its Applications 415 (2014) 398–406.
[10] S. Gregory, An algorithm to find overlapping community structure in networks, in: European Conference on Principles of Data Mining and Knowledge Discovery, Springer, 2007, pp. 91–102.
[11] S. Gregory, Finding overlapping communities in networks by label propagation, New Journal of Physics 12 (10) (2010) 103018.
[12] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of statistical mechanics: theory and experiment 2008 (10) (2008) P10008.
[13] C. Shi, Y. Cai, D. Fu, Y. Dong, B. Wu, A link clustering based overlapping community detection algorithm, Data & Knowledge Engineering 87 (2013) 394–404.
[14] X. Zhou, Y. Liu, J. Zhang, T. Liu, D. Zhang, An ant colony based algorithm for overlapping community detection in complex networks, Physica A: Statistical Mechanics and its Applications 427 (2015) 289–301.
[15] S. Athey, E. Calvano, S. Jha, A theory of community formation and social hierarchy.
[16] W. Chen, Z. Liu, X. Sun, Y. Wang, A game-theoretic framework to identify overlapping communities in social networks, Data Mining and Knowledge Discovery 21 (2) (2010) 224–240.
[17] A. Jonnalagadda, L. Kuppusamy, A survey on game theoretic models for community detection in social networks, Social Network Analysis and Mining 6 (1) (2016) 83.
[18] P. J. McSweeney, K. Mehrotra, J. C. Oh, Game-theoretic framework for community detection, in: Encyclopedia of Social Network Analysis and Mining, Springer, 2014, pp. 573–588.
[19] L. Zhou, K. Lü, P. Yang, L. Wang, B. Kong, An approach for overlapping and hierarchical community detection in social networks based on coalition formation game theory, Expert Systems with Applications 42 (24) (2015) 9634–9646.
[20] E. Winter, A value for cooperative games with levels structure of cooperation, International Journal of Game Theory 18 (2) (1989) 227–240.
[21] J. Xie, B. K. Szymanski, Towards linear time overlapping community detection in social networks, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2012, pp. 25–36.
[22] R. B. Myerson, Game theory, Harvard university press, 2013.
[23] N. Nisan, T. Roughgarden, E. Tardos, V. V. Vazirani, Algorithmic game theory, Vol. 1, Cambridge University Press Cambridge, 2007.
[24] Y. Shoham, K. Leyton-Brown, Multiagent systems: Algorithmic, game-theoretic, and logical foundations, Cambridge University Press, 2008.
[25] L. S. Shapley, Cores of convex games, International journal of game theory 1 (1) (1971) 11–26.
[26] J. Yang, J. Leskovec, Overlapping communities explain core–periphery organization of networks, Proceedings of the IEEE 102 (12) (2014) 1892–1902.
[27] J. Xie, S. Kelley, B. K. Szymanski, Overlapping community detection in networks: The state-of-the-art and comparative study, ACM computing surveys 45 (4) (2013) 43.
[28] G. K. Orman, V. Labatut, M. Plantevit, J.-F. Boulicaut, A method for characterizing communities in dynamic attributed complex networks, in: Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on, IEEE, 2014, pp. 481–484.
[29] R. Badie, A. Aleahmad, M. Asadpour, M. Rahgozar, An efficient agent-based algorithm for overlapping community detection using nodes' closeness, Physica A: Statistical Mechanics and its Applications 392 (20) (2013) 5231–5247.
[30] R. Lindelauf, H. Hamers, B. Husslage, Cooperative game theoretic centrality analysis of terrorist networks: The cases of jemaah islamiyah and al qaeda, European Journal of Operational Research 229 (1) (2013) 230–238.
[31] A. Al-Dhanhani, R. Mizouni, H. Otrok, A. Al-Rubaie, A game theoretical model for collaborative groups in social applications, Expert Systems with Applications 41 (11) (2014) 5056–5065.
[32] S. Shamshirband, A. Patel, N. B. Anuar, M. L. M. Kiah, A. Abraham, Cooperative game theoretic approach using fuzzy q-learning for detecting and preventing intrusions in wireless sensor networks, Engineering Applications of Artificial Intelligence 32 (2014) 228–241.
[33] L. Zhou, K. Lü, C. Cheng, H. Chen, A game theory based approach for community detection in social networks, in: British National Conference on Databases, Springer, 2013, pp. 268–281.
[34] M. E. Newman, Fast algorithm for detecting community structure in networks, Physical review E 69 (6) (2004) 066133.
[35] W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of anthropological research 33 (4) (1977) 452–473.
[36] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, S. M. Dawson, The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations, Behavioral Ecology and Sociobiology 54 (4) (2003) 396–405.
[37] D. E. Knuth, The Stanford GraphBase: a platform for combinatorial computing, Vol. 37, Addison-Wesley Reading, 1993.
[38] M. Girvan, M. E. Newman, Community structure in social and biological networks, Proceedings of the national academy of sciences 99 (12) (2002) 7821–7826.
[39] M. E. Newman, Finding community structure in networks using the eigenvectors of matrices, Physical review E 74 (3) (2006) 036104.
[40] D. J. Watts, S. H. Strogatz, Collective dynamics of 'small-world' networks, nature 393 (6684) (1998) 440–442.
[41] V. Nicosia, G. Mangioni, V. Carchiolo, M. Malgeri, Extending the definition of modularity to directed graphs with overlapping communities, Journal of Statistical Mechanics: Theory and Experiment 2009 (03) (2009) P03024.
[42] A. Lancichinetti, S. Fortunato, Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities, Physical Review E 80 (1) (2009) 016118.
[43] A. Lancichinetti, S. Fortunato, J. Kertész, Detecting the overlapping and hierarchical community structure in complex networks, New Journal of Physics 11 (3) (2009) 033015.

Highlights
1. A novel approach is proposed, based on the Shapley value mechanism, which considers the cumulative interests of the nodes while forming the communities.
2. The proposed method does not demand any a priori information on the number of communities or on the overlapping membership of the nodes.
3. The proposed approach works with a simple weight function to discover the underlying community structure of the given network.
4. The performance evaluation on synthetic and real-world benchmark networks demonstrates that WGCDA is superior to state-of-the-art algorithms.