Available online at www.sciencedirect.com
Procedia Engineering 29 (2012) 3080 – 3084
www.elsevier.com/locate/procedia
2012 International Workshop on Information and Electronics Engineering (IWIEE)
Mining Community in Mobile Social Network
Ke Xu a,b, Xinfang Zhang a,*
a School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
b College of Computer Science, South-Central University for Nationalities, Wuhan, China
Abstract
With the popularity of mobile devices and wireless technologies, mobile communication network systems are increasingly available. In this paper we propose a new algorithm for mining interesting communities or groups in a Mobile Social Network. The proposed algorithm is composed of two main components: an algorithm for community partition and an algorithm for selecting small communities to combine into a big community. Empirical studies on a campus mobile social network (CMSN) show that the proposed algorithm performs better than a state-of-the-art clustering algorithm for mining communities in the CMSN.
© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Harbin University of Science and Technology
Keywords: Social Network; Community Detection; Virtual Campus Mobile Network
1. Introduction
A social network is a social structure connecting individuals or organizations. Examples of social networks include email networks, Facebook, and scientific collaboration networks [1]. A social network is modeled by a graph, where the nodes represent individuals and an edge between two nodes indicates a direct relationship between the corresponding individuals. Some typical problems in social network analysis (SNA) include discovering groups of individuals sharing the same properties [2] and evaluating the importance of individuals [3, 4]. In a typical social network, there always exist various relationships between individuals, such as friendships, business relationships, and common interest relationships. A mobile social network is a typical social network in which individuals with similar interests or commonalities converse and connect with one another using mobile phones. In principle, a community can be simply defined as a group of objects sharing some common properties. In this paper, we use communication data collected from mobile phone call records to quantify the characteristic behaviors underlying relational ties and cognitive constructs reported through surveys. We propose a new algorithm for detecting interesting communities of campus social networks in a virtual campus mobile network of China Mobile Communications Corporation named CMCC-V-Net.
* Corresponding author. Tel.: +86-27-63676886; E-mail address: [email protected].
1877-7058 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2012.01.444
2. Related Works
The discovery and analysis of community structure in networks is a topic of considerable recent interest within the physics community. Newman et al. present a hierarchical agglomeration algorithm for detecting community structure. An algorithm is proposed in [5] to extract meaningful communities from a network, revealing large-scale patterns present in the purchasing habits of customers. Kimura et al. address the combinatorial optimization problem of finding the most influential nodes on a large-scale social network for two widely used fundamental stochastic diffusion models [7]. A small but emerging thread of literature examines interaction data based on e-mail [8] and call log data [9]. Bird et al. mine social networks by constructing networks of email correspondents and addressing some interesting questions in Open Source Software (OSS) projects [10]. Eagle et al. infer social network structure using mobile phone data [11]. Dong et al. present an experimental study of this kind of social network with the support of large-scale real mobile call data [13].
3. Community Mining Method
The Campus Mobile Social Network is a virtual campus mobile network of China Mobile Communications Corporation (CMCC-V-Net) in which all users are from one university or college.
We extract a Campus Mobile Social Network from the call log and model it as a directed weighted graph: a phone user corresponds to a node, and a directed edge from node u to node v is established if there exists communication from u to v, with the corresponding communication time as the weight of the edge. We denote the graph as G = (V, E, W), where V, E, and W represent nodes, edges, and weights, respectively.
3.1. Extract Social Network Feature
Degree of node in network, Degree(i), is defined over the unique dialers and receivers who had mobile phone call communication with node i. That is, it is the simple average of the in-degree and out-degree of the node:
$$\mathrm{Degree}(i) = \sum_{j=1}^{n} \frac{E_{ij} + E_{ji}}{2}$$
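The graph construction and the degree feature can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the record format `(caller, callee, seconds)`, the function names `build_call_graph` and `degree`, and the sample records are hypothetical; repeated calls between the same pair are assumed to be summed into one edge weight; and E_ij is assumed to be 1 when an edge from i to j exists and 0 otherwise.

```python
from collections import defaultdict


def build_call_graph(records):
    """Aggregate (caller, callee, seconds) records into edge weights W[(u, v)].

    Assumption: the weight of edge u -> v is the total talk time from u to v.
    """
    weights = defaultdict(float)
    for caller, callee, seconds in records:
        if caller != callee:  # ignore self-calls
            weights[(caller, callee)] += seconds
    return dict(weights)


def degree(weights, node):
    """Degree(i) = sum_j (E_ij + E_ji) / 2, with E_uv = 1 if edge u -> v exists."""
    out_deg = sum(1 for (u, v) in weights if u == node)
    in_deg = sum(1 for (u, v) in weights if v == node)
    return (out_deg + in_deg) / 2


# Hypothetical call records, for illustration only.
records = [("a", "b", 60), ("a", "b", 30), ("b", "a", 45),
           ("a", "c", 10), ("d", "a", 20)]
W = build_call_graph(records)
print(W[("a", "b")])   # 90.0: two calls from a to b are merged into one edge
print(degree(W, "a"))  # 2.0: out-degree 2 (b, c), in-degree 2 (b, d)
```

Storing the graph as a dictionary keyed by directed pairs keeps E and W in one structure; for a network of the size used in the experiments, adjacency lists would be a more efficient choice.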
Betweenness of node in network, Betweenness(i), is defined as:
$$\mathrm{Betweenness}(i) = \sum_{j \neq i \neq k} \frac{\pi_{jk}(i)}{\pi_{jk}}$$
where $\pi_{jk}$ is the number of shortest paths between j and k and $\pi_{jk}(i)$ is the number of shortest paths between j and k that go through i. This metric has been used in social network analysis [14].
3.2. Community Detection Algorithm
The methods described in this paper all assume that we are given a network structure that we wish to divide into communities in such a way that every vertex belongs to one of the communities. Newman finds the best division by first partitioning the complete graph into two groups and then further subdividing those groups until the required number of groups is reached [14, 15]. Our community detection algorithm consists of two steps, partition and combination.
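The partition step relies on label propagation, specified under Community Partition: at each iteration a node adopts the majority label among the neighbors it influences, using the labels from the previous iteration. The following Python sketch illustrates that update rule. It is not the authors' implementation: the function name `propagate_labels` is hypothetical, the influenced neighbors of a node are assumed to be supplied as an adjacency list, ties are assumed to be broken by the smallest label, and the cap of 20 iterations is borrowed from the experimental setup.

```python
from collections import Counter


def propagate_labels(influenced, max_iter=20):
    """Synchronous label propagation.

    influenced[i] lists the neighbors influenced by node i (an assumed input).
    Returns a dict mapping each node to its community label.
    """
    nodes = set(influenced)
    for nbrs in influenced.values():
        nodes.update(nbrs)
    labels = {n: n for n in nodes}  # every node starts in its own community
    for _ in range(max_iter):
        new_labels = {}
        for i in nodes:
            nbr_labels = [labels[j] for j in influenced.get(i, [])]
            if not nbr_labels:
                new_labels[i] = labels[i]  # no influenced neighbors: keep label
                continue
            counts = Counter(nbr_labels)
            best = max(counts.values())
            # Assumed tie-break: the smallest label among the most frequent ones.
            new_labels[i] = min(l for l, c in counts.items() if c == best)
        if new_labels == labels:  # no label changed, so the partition is stable
            break
        labels = new_labels
    return labels


# Two triangles joined by the single link 3 <-> 4 (hypothetical toy graph).
toy = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
       4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
print(propagate_labels(toy))
```

On this toy graph the two triangles end up with different labels. Synchronous updates can oscillate on some graphs, which is one reason to cap the number of iterations.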
Community Partition
The main principle of label propagation is that a node i should belong to the community that contains the maximum number of its influenced neighbors. At each iteration of the label propagation, we assign the community label for a node i based on the labels of its neighbors that are influenced by i. Specifically, for each node i we find the label of the community to which the majority of its influenced neighbors belong, and this label becomes the label of i. Formally, the label $L_i^t$ of a node i at iteration t is:
$$L_i^{t} = \max \mathrm{MLC}\left(L_{j_1}^{t-1}, L_{j_2}^{t-1}, L_{j_3}^{t-1}, \ldots, L_{j_{s_i}}^{t-1}\right)$$
where t denotes the t-th iteration, $s_i$ denotes the number of neighbors that are influenced by i, $j_m$ ($m \in [1, s_i]$) represents an influenced neighbor of node i, $L_{j_m}^{t-1}$ ($m \in [1, s_i]$) represents the community label of $j_m$ at iteration t-1, and max MLC computes the majority label among the $L_{j_m}^{t-1}$ ($m \in [1, s_i]$).
Community Combination
Definition (Combination-Betweenness). If a node i activates its neighbor j, we label the edge $E_{ij}$ as live. If $E_{ij}$ is live and i belongs to community $C_m$, but j belongs to a different community $C_l$, we say that j is a live node of $C_m$. Let $A[C_m]$ be the set of live nodes of $C_m$. The combination betweenness of community $C_l$ to $C_m$ is defined as:
$$\mathrm{ComBetw}(C^{\varepsilon}_{ml}) = \max_{i \in C_m,\; j \in A[C_m],\; j \in C_l} B_m(\{i\})\, \tilde{B}_m(\{j\})$$
where $B_m(\{i\})$ denotes the betweenness of node i in its community $C_m$ and $\tilde{B}_m(\{j\})$ denotes the betweenness of node j outside the community $C_m$. A node with high betweenness means that the corresponding person is a contact point between different social groups. We set a threshold θ to control whether community $C_l$ should be combined with community $C_m$: if the combination betweenness $\mathrm{ComBetw}(C^{\varepsilon}_{ml})$ of community $C_m$ to community $C_l$ is bigger than θ, then $C_m$ and $C_l$ are combined.
3.3. Macro-averaged Performance Metric
Macro-averaged performance (MAP) is a conventional metric for evaluating how well community detection assigns the nodes of a network to communities. The system-made decisions on which community each node belongs to, with respect to a specific category $L_c \in C \equiv \{C_1, C_2, \ldots, C_m\}$, can be divided into four groups: True Positives ($TP_c$), False Positives ($FP_c$), True Negatives ($TN_c$), and False Negatives ($FN_c$). The corresponding evaluation metrics are defined as:
Global Precision
$$P = \frac{\sum_{L_c \in C} TP_c}{\sum_{L_c \in C} (TP_c + FP_c)}$$
Global Recall
$$R = \frac{\sum_{L_c \in C} TP_c}{\sum_{L_c \in C} (TP_c + FN_c)}$$
Macro-averaged performance
$$\mathrm{MAP} = \frac{2PR}{P + R}$$
4. Experiments
We evaluate the Macro-averaged performance of the proposed algorithm. We first describe the data set and the experimental setup, and then present the results for different parameter values. We take Newman Clustering [6, 14, 15] as the benchmark for evaluating the proposed algorithm.
4.1. Data Sets
We extract a Mobile Social Network in which all users are from one university or college from the Call Detail Records data using the method presented in Section 3, and obtain a network with 26258 nodes and an average degree of 32.3.
4.2. Experimental Results
We experiment with θ from 0.1 to 0.8 to combine the communities generated in the partition step. When θ=0.1, all the communities are combined; when θ=0.2, most of the communities are combined and the number of final communities is only 5; when θ=0.8, the generated communities are dispersed and small (62 of them have fewer than 10 nodes); when θ=0.5, we get 91 communities, which appears to be an appropriate number after we check the data manually. Hence we set θ=0.5 as the threshold. Community number C versus θ is shown in Table 1.
Table 1. Community number C versus θ.
θ    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8
C    10    36    52    69    91    129   186   217
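The growth of C with θ in Table 1 follows from the thresholded combination step: the larger θ is, the fewer pairs of communities pass the test and the more communities remain. The Python sketch below illustrates only this thresholding. It assumes the pairwise combination scores have already been computed, and it assumes that merges are applied transitively (via a union-find structure), which the text does not specify. The function name `combine_communities`, the labels, and the scores are hypothetical.

```python
def combine_communities(labels, scores, theta):
    """Merge communities whose combination score exceeds theta.

    labels: dict node -> community label from the partition step.
    scores: dict (C_m, C_l) -> combination score (assumed precomputed).
    Returns a dict node -> label of the merged community.
    """
    parent = {c: c for c in set(labels.values())}

    def find(c):
        # Follow parent links to the representative, shortening the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (cm, cl), score in scores.items():
        if score > theta:
            parent[find(cl)] = find(cm)  # combine C_l into C_m
    return {node: find(c) for node, c in labels.items()}


# Hypothetical partition and scores, for illustration only.
labels = {1: "A", 2: "A", 3: "B", 4: "C", 5: "D"}
scores = {("A", "B"): 0.35, ("B", "C"): 0.55, ("C", "D"): 0.15}
for theta in (0.1, 0.5, 0.8):
    merged = combine_communities(labels, scores, theta)
    print(theta, len(set(merged.values())))  # community count: 1, then 3, then 4
```

The toy data reproduces the qualitative trend of Table 1, not its values.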
We observe that the partition is not completely stable after 5 iterations, i.e., the community assignments of some nodes still change. To be safe, we set the number of iterations to 20. We also compute Precision, Recall, and Macro-average for the Community Detection algorithm with θ from 0.1 to 0.8. For each community number C listed in Table 1, we partition the campus mobile social network into C clusters by Newman Clustering, and compute Precision, Recall, and Macro-average for the Newman Clustering algorithm. Precision, Recall, and Macro-average of the two algorithms are shown in Table 2. As shown in Table 1, we get 91 communities when θ=0.5 and accordingly classify the campus mobile social network into 91 clusters. The Precision, Recall, and Macro-average percentages of the two algorithms are higher at θ=0.5 than at the other settings, which supports the choice of θ=0.5 as the threshold.
Table 2. Precision, Recall, and Macro-average (%) for Newman Clustering (NC) and the Community Detection algorithm (CD).
θ                   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8
Cluster size        10     36     52     69     91     129    186    217
Precision      NC   51.2   58.3   63.1   69.3   71.6   66.4   61.7   53.3
               CD   43.1   53.4   66.3   72.9   77.9   73.6   67.3   49.9
Recall         NC   61.3   68.1   73.7   79.6   81.4   76.7   71.8   63.5
               CD   53.5   63.7   76.2   82.5   87.6   83.5   77.5   59.7
Macro-average  NC   56.1   63.1   68.6   74.2   76.7   71.3   66.4   58.6
               CD   47.8   58.2   71.8   77.1   82.5   78.2   72.8   54.9
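The Precision, Recall, and Macro-average columns of Table 2 follow the definitions of Section 3.3, in which the counts are pooled over all communities before dividing. A minimal Python sketch of that computation is given below; the function name `global_metrics` and the per-community counts are hypothetical and are not taken from the experiments.

```python
def global_metrics(counts):
    """Compute (P, R, MAP) from per-community counts.

    counts: dict community -> (TP, FP, FN). True negatives are not needed
    by the three formulas.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp)   # global precision P
    recall = tp / (tp + fn)      # global recall R
    map_score = 2 * precision * recall / (precision + recall)
    return precision, recall, map_score


# Hypothetical counts for two communities.
counts = {"C1": (50, 10, 20), "C2": (30, 10, 20)}
p, r, m = global_metrics(counts)
print(round(p, 3), round(r, 3), round(m, 3))  # 0.8 0.667 0.727
```

The sketch does not guard against a community set with no true positives, where the divisions would fail; a production version would need to handle that case.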
5. Conclusions
In this paper we propose a new algorithm, called the Community Detection algorithm, for mining communities in a CMSN. We first extend the basic features to take the edge weights of the CMSN into consideration. The Community Detection algorithm has two main components: an algorithm for detecting communities, and a dynamic programming algorithm for selecting small communities to combine into a big community. This work opens several interesting directions for future work. Notably, it is relevant to take the spatial information of mobile customers into consideration and to construct location-based social networks to find influential nodes; it is also interesting to study the evolution of influential nodes over time.
Acknowledgement
This research was supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China (2011ZX03002-001-01).
References
[1] M. Kimura and K. Saito. Tractable models for information diffusion in social networks. In PKDD 2006, LNAI 4213, pages 259-271, 2006.
[2] M. F. Schwartz and D. C. M. Wood. Discovering shared interests using graph analysis. Communications of the ACM, 36(8):78-89, 1993.
[3] H. Kautz, B. Selman, and A. Milewski. Agent amplified communication. In Proceedings of AAAI-96, pages 3-9, 1996.
[4] P. Domingos and M. Richardson. Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 57-66. ACM Press, 2001.
[5] H. Ma, H. Yang, M. R. Lyu, and I. King. Mining social networks using heat diffusion processes for marketing candidates selection. In CIKM, pages 233-242, 2008.
[6] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70, 2004.
[7] M. Kimura, K. Saito, R. Nakano, and H. Motoda. Extracting influential nodes on a social network for information diffusion. Data Mining and Knowledge Discovery, 20:70-97, 2010.
[8] G. Kossinets and D. J. Watts. Empirical analysis of an evolving social network. Science, 311:88-90, 2006.
[9] W. Aiello, F. Chung, and L. Lu. A random graph model for massive graphs. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, pages 171-180, 2000.
[10] C. Bird, A. Gourley, P. Devanbu, M. Gertz, and A. Swaminathan. Mining email social networks. In Proceedings of the 3rd International Workshop on Mining Software Repositories, 2006.
[11] N. Eagle, A. Pentland, and D. Lazer. Inferring social network structure using mobile phone data. Proceedings of the National Academy of Sciences (PNAS), 106(36):15274-15278, 2009.
[12] N. Eagle and A. Pentland. Reality mining: sensing complex social systems. Personal and Ubiquitous Computing, 10(4):255-268, 2006.
[13] Z. B. Dong, G. J. Song, K. Q. Xie, and J. Y. Wang. An experimental study of large-scale mobile social network. In WWW, pages 1175-1176, 2009.
[14] M. E. J. Newman. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA, 103:8577-8582, 2006.
[15] M. E. J. Newman. Detecting community structure in networks. Eur. Phys. J. B, 38:321-330, 2004.