A complex network community detection algorithm based on label propagation and fuzzy C-means


Accepted Manuscript A complex network community detection algorithm based on label propagation and fuzzy C-means Zheng-Hong Deng, Hong-Hai Qiao, Qun Song, Li Gao

PII: S0378-4371(18)31536-X
DOI: https://doi.org/10.1016/j.physa.2018.12.024
Reference: PHYSA 20410

To appear in: Physica A

Received date: 7 July 2018
Revised date: 11 November 2018

Please cite this article as: Z.-H. Deng, H.-H. Qiao, Q. Song et al., A complex network community detection algorithm based on label propagation and fuzzy C-means, Physica A (2018), https://doi.org/10.1016/j.physa.2018.12.024

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights

1. A complex network community detection algorithm based on improved label propagation and fuzzy C-means is studied.
2. The algorithm initializes the communities with a neighbor evaluation method.
3. The algorithm adjusts the labels of unstable vertexes using fuzzy C-means membership vectors.
4. The accuracy of community detection is improved on synthetic and real networks.


A complex network community detection algorithm based on label propagation and fuzzy C-means

Zheng-Hong Deng a,*, Hong-Hai Qiao a, Qun Song a, Li Gao b

a School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
b School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China

Abstract: Community detection algorithms are important in both the research and the practical application of complex network theory. This paper proposes a community detection method based on improved label propagation and fuzzy C-means. Because the original label propagation algorithm gives inaccurate and unstable detection results, we modify its framework. First, the initial labels of vertexes are assigned by a neighbor evaluation method. Second, the labels of vertexes with large diversity in each community are revised using fuzzy C-means membership vectors. Third, the parameters are updated until the status of the communities stabilizes. The results show that this method achieves better accuracy on synthetic and real networks.

Key words: Community detection; Label propagation; Fuzzy C-means objective function; Modularity; Normalized Mutual Information

* Corresponding author. Email: [email protected]

1 Introduction

Community detection algorithms are an important part of complex network theory. Community detection does not yet have a strict definition. Generally, it assigns vertexes with common or similar features to the same community label. Many real-world systems can be regarded as complex networks, such as mobile networks, bioinformatics [1,29], power systems [2], the Internet, social interpersonal networks [26], image segmentation [3, 4] and military decision information systems. To analyse the performance, features and functions of a complex network, its vertexes should be divided into communities reasonably. Therefore, research on community detection is of profound significance for social network science.

At present, community detection algorithms follow several mainstream research directions.

Community detection based on the hierarchy of community structure. Girvan and Newman [5, 6] proposed to regard the similarity measure or connection strength between vertexes as the feature of the hierarchical structure. Vertexes are allocated or redistributed into different communities according to a modularity optimization criterion.

Community detection based on genetic algorithms. Shi et al [7] proposed a genetic algorithm framework for community detection, which includes the initialization of populations (communities), the criterion, the updating rules of vertexes and populations, and the mutation and crossover operators. Guerrero et al [8] proposed an adaptive genetic algorithm for community detection, in which the community modularity guides the population initialization and the mutation and crossover operators to make them more efficient. Besides, genetic algorithms can analyze the features of different types of networks flexibly and adaptively at different levels of graph structure and detail.

Community detection based on label propagation. Raghavan et al [9] proposed the label propagation algorithm (LPA) framework. The core idea of label propagation is that the label of a vertex is estimated from the labels of its neighbor vertexes. The labels are updated repeatedly until each vertex carries the label shared by the majority of its neighbor vertexes. Label propagation is a fast unsupervised clustering algorithm that adapts to complex networks of different types and sizes. Li et al [10] proposed an improved label propagation algorithm. They modified the traditional model structure and set up a stepped LPA framework to divide complex networks. Community labels are assigned according to the similarity between vertexes, and are then selected and distinguished by relevant evaluation functions.

In this paper, we propose a community detection algorithm based on improved label propagation and fuzzy C-means, called LPA-FCM. Our method keeps the label propagation framework and is divided into the following steps. Firstly, the initial community labels are obtained by a neighbor evaluation method, which covers the neighbor evaluation vectors and the formation of communities. Secondly, vertexes with large diversity in each community are selected and their labels are revised using the fuzzy C-means membership vector. Their labels are exchanged among communities according to the membership similarity between vertexes and communities. Thirdly, the fuzzy C-means parameters are updated at every iteration. When the cut-off condition on the objective function is satisfied, LPA-FCM stops iterating and outputs the final community detection.

The rest of this paper is organized as follows. Section 2 introduces the original LPA and the KL-FCM model. Section 3 describes the LPA-FCM algorithm. Section 4 presents the experimental results and discussion. Finally, Section 5 gives the conclusion.

2 Related work

2.1 LPA

The original label propagation algorithm [9] introduced an advanced concept of community detection: the detection algorithm needs neither prior information nor a predefined objective function to optimize. The network structure itself guides the partition of the network vertexes. The steps of the original LPA are as follows:

1) An initial community label is assigned to each vertex randomly.
2) A random access sequence over the network vertexes is generated.
3) Following the access sequence, the label of each vertex is updated to the label carried by the majority of its neighbor vertexes. If the majority label is not unique, ties are broken randomly.
4) The iteration stops if every vertex carries the label shared by the majority of its neighbor vertexes. Otherwise, the algorithm returns to step 2 and repeats the iteration.

The computational complexity of LPA is almost linear in the size of the network, so LPA is more efficient than other community detection algorithms.

2.2 KL-FCM model

Fuzzy C-means is one of the most popular clustering methods and has been applied successfully to community detection. The original fuzzy C-means model [11] contains two elements: the fuzzy membership $u$ and the similarity distance $dis$. The objective function $J_{FCM}$ is given in Eq (1)-(3).

$$J_{FCM} = \sum_{i=1}^{K} \sum_{n=1}^{N} u_{in}^{\alpha}\, dis_{in}, \qquad u_{in} \in [0,1], \quad \sum_{i=1}^{K} u_{in} = 1, \quad \alpha \in [1, \infty) \tag{1}$$

$$dis_{in} = \lVert V_n - Z_i \rVert^{2} \tag{2}$$

$$u_{in} = \frac{1}{\sum_{j=1}^{K} \left( \dfrac{dis_{in}}{dis_{jn}} \right)^{\frac{1}{\alpha - 1}}} \tag{3}$$

where $N$ is the number of network vertexes and $K$ is the number of communities. $u_{in}$ is the fuzzy membership between vertex $V_n$ and community $C_i$. The fuzziness of the FCM model is controlled by the parameter $\alpha$. $dis_{in}$ is the squared Euclidean distance between $V_n$ and the central vector $Z_i$ of community $C_i$.

The original fuzzy C-means model cannot guarantee good clustering results under some special conditions. Ichihashi et al [4, 12, 13] suggested an improved fuzzy objective function $J_{KLFCM}$ that introduces a regularized smoothing term based on KL information, as shown in Eq (4).

$$J_{KLFCM} = \sum_{i=1}^{K} \sum_{n=1}^{N} u_{in}\, dis_{in} + \lambda \sum_{i=1}^{K} \sum_{n=1}^{N} u_{in} \log\left( \frac{u_{in}}{\pi_{in}} \right) \tag{4}$$

where $\pi_{in}$ is the prior probability membership between vertex $V_n$ and community $C_i$, and $\lambda$ is the fuzziness parameter.

3 Algorithm

This paper proposes an improved community detection method based on label propagation and fuzzy C-means, which is applied to synthetic and real complex networks. Our method modifies the label propagation pattern and is divided into the four parts below. The framework flowchart of LPA-FCM is shown in Fig. 1.

(1) The establishment of the order access sequence. The neighbor evaluation vectors are calculated from the feature vectors. The order access sequence is generated based on the number of neighbor evaluation vertexes of each vertex.
(2) The assignment of initial community labels. The initialization is divided into two steps: a) the formation of initial communities; b) the fusion of isolated vertexes. Labels of vertexes are assigned based on the labels of the communities they belong to.
(3) The modification of vertex labels. Some unstable vertexes are selected and their community labels are changed using the fuzzy C-means neighbor membership vector.
(4) Parameter update. The KL-FCM parameters are updated during the iteration until the cut-off condition is satisfied.

[Flowchart: Input adjacency matrix → Compute vertexes’ neighbor evaluation vector → Generate order access sequences → Form the initial communities → Merge isolated vertexes → Initial community labels of vertexes → Judge cut-off condition → (N) Modify label of unstable vertexes → Update KL-FCM parameters → Judge cut-off condition; (Y) Detection results]

Fig.1 The framework flowchart of LPA-FCM
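For reference, the baseline that this framework modifies, the original LPA of Section 2.1, can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each vertex starts with its own unique label (one common way to realize step 1) and that integer vertex identifiers double as labels; the names `majority_labels`, `label_propagation`, `seed` and `max_sweeps` are hypothetical.

```python
import random
from collections import Counter

def majority_labels(adj, labels, v):
    """Return the set of labels that are most frequent among v's neighbours."""
    counts = Counter(labels[u] for u in adj[v])
    top = max(counts.values())
    return {lab for lab, c in counts.items() if c == top}

def label_propagation(adj, seed=0, max_sweeps=100):
    """adj maps each vertex to the set of its neighbours; returns vertex -> label."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # step 1 (assumption: one unique label per vertex)
    order = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(order)            # step 2: random access sequence
        for v in order:               # step 3: adopt a majority label, ties broken at random
            if adj[v]:
                labels[v] = rng.choice(sorted(majority_labels(adj, labels, v)))
        # step 4: stop once every vertex already carries one of its majority labels
        if all(not adj[v] or labels[v] in majority_labels(adj, labels, v) for v in adj):
            break
    return labels
```

The random initial order and random tie-breaking are what make the baseline unstable between runs, which is the behaviour the ordered access sequence and neighbor-based initialization of LPA-FCM aim to remove.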

3.1 The assignment of initial community labels

In the original label propagation framework the initial community labels are assigned randomly, which may make the community detection results unstable. Here, the connection modes of each vertex with its surrounding vertexes are regarded as its features, and the initial community labels are assigned by evaluating these neighbor features. The initialization consists of three main steps.

a) The establishment of neighbor evaluation vectors

The adjacency matrix $M$ contains $N$ vectors, $M = (M_1, M_2, M_3, \ldots, M_N)$, where $M_i$ is the feature vector of vertex $V_i$. The smooth coefficient $\sigma$ is obtained from the average vector $\bar{M}$ and the Euclidean distances to it, as shown in Eq (5) and (6).

$$\bar{M} = \frac{1}{N} \sum_{i=1}^{N} M_i \tag{5}$$

$$\sigma = \frac{1}{N} \sum_{i=1}^{N} dis(\bar{M}, M_i) \tag{6}$$

The neighbor evaluation vector $f_i = \{ f_i^1, f_i^2, f_i^3, \ldots, f_i^{Nb_{V_i}} \}$ is acquired by the neighbor evaluation function shown in Eq (7).

$$f_i^k = \exp\left( -\frac{dis(M_i, M_k)}{\sigma} \right) \tag{7}$$

where $f_i^k$ is the evaluation function element between vertex $V_i$ and its k-th neighbor vertex $V_k$. The neighbor evaluation function decreases with distance: the larger the function value, the closer the relationship between the vertex and its neighbor vertex. A vertex and its close neighbor vertexes have a larger probability of sharing the same community label. The elements of vector $f_i$ are sorted in descending order, and the vertexes with the larger elements in $f_i$ are assigned to the neighbor evaluation set of vertex $V_i$.

b) The formation of initial communities

In the original label propagation framework the access sequence is generated randomly, which ignores the impact of individual vertexes on the network structure. Vertexes with more neighbor vertexes can influence their surrounding vertexes and change the detection results. The order access sequence is therefore sorted in descending order of the number of neighbor evaluation vertexes. The initial communities are formed by comparing the average neighbor distances of the vertexes in the order access sequence. Vertexes with a smaller average neighbor distance are put into one community and share the same label. The average neighbor distances $d(V_n)$ and $d(V_m)$ and the average mutual neighbor distance $d(V_m, V_n)$ of vertexes $V_n$ and $V_m$ are given in Eq (8)-(10).

$$d(V_n) = \frac{2}{Nb_{V_n}(Nb_{V_n} - 1)} \sum_{i,j \in SNb_{V_n}} dis(V_i, V_j) \tag{8}$$

$$d(V_m) = \frac{2}{Nb_{V_m}(Nb_{V_m} - 1)} \sum_{i,j \in SNb_{V_m}} dis(V_i, V_j) \tag{9}$$

$$d(V_m, V_n) = \frac{1}{Nb_{V_m} Nb_{V_n}} \sum_{i \in SNb_{V_m}} \sum_{j \in SNb_{V_n}} dis(V_i, V_j) \tag{10}$$

where $SNb_{V_n}$ is the neighbor evaluation set of vertex $V_n$ and $Nb_{V_n}$ is the number of its neighbor evaluation vertexes. If $d(V_m, V_n) - (d(V_m) + d(V_n))/2 \le \delta$, vertexes $V_n$ and $V_m$ are assigned to one community and share the same label. Otherwise, $V_n$ and $V_m$ are put into different communities. The parameter $\delta$ can be adjusted in a small range according to modularity before the initial communities are formed.

Table 1 The pseudo code of initial community labels

Input: adjacency matrix M
Step 1: calculate the neighbor evaluation vertexes
Step 2: generate the order access sequence
Step 3: for all i do
          if V_i is not assigned to any community
            create a new community for V_i
          end_if
          for all j do
            calculate d(V_i), d(V_j), d(V_i, V_j)
            if the threshold condition is satisfied
              V_j is assigned to the community of V_i
            end_if
          end_for
        end_for
Step 4: form K base communities
Step 5: for all isolated vertexes V_z do
          for all k = 1..K do
            calculate d(V_z, C_k)
          end_for
          if d(V_z, C_m) is the minimum average distance
            V_z is assigned to C_m
          end_if
        end_for
Output: initial community detection result C
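Steps 1 to 3 of Table 1 rest on Eq (5)-(10), which can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the text does not say how many of the "larger elements" of $f_i$ enter the neighbor evaluation set, so `evaluation_set` assumes a cut at the mean of $f_i$; single-element sets are given an average neighbor distance of 0; and $dis(V_i, V_j)$ is taken as the Euclidean distance between adjacency rows. All function names and the threshold name `delta` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors (rows of the adjacency matrix)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighbor_evaluation(M):
    """Return the smooth coefficient and, per vertex, (score, neighbour) pairs sorted descending."""
    n = len(M)
    mean = [sum(row[c] for row in M) / n for c in range(n)]   # Eq. (5)
    sigma = sum(dist(row, mean) for row in M) / n             # Eq. (6)
    f = []
    for i in range(n):
        scored = [(math.exp(-dist(M[i], M[k]) / sigma), k)    # Eq. (7)
                  for k in range(n) if k != i and M[i][k]]
        f.append(sorted(scored, reverse=True))
    return sigma, f

def evaluation_set(f_i):
    """Assumption: keep the neighbours whose score is at least the mean score of f_i."""
    if not f_i:
        return []
    cut = sum(s for s, _ in f_i) / len(f_i)
    return [k for s, k in f_i if s >= cut - 1e-12]

def avg_distance(M, S):
    """Eq. (8)/(9): mean pairwise distance inside one evaluation set (0 for a single vertex)."""
    if len(S) < 2:
        return 0.0
    total = sum(dist(M[i], M[j]) for a, i in enumerate(S) for j in S[a + 1:])
    return 2.0 * total / (len(S) * (len(S) - 1))

def mutual_distance(M, Sm, Sn):
    """Eq. (10): mean distance between the two evaluation sets."""
    return sum(dist(M[i], M[j]) for i in Sm for j in Sn) / (len(Sm) * len(Sn))

def same_community(M, Sm, Sn, delta):
    """Threshold test used in Step 3 of Table 1."""
    return mutual_distance(M, Sm, Sn) - (avg_distance(M, Sm) + avg_distance(M, Sn)) / 2 <= delta
```

Sorting the vertexes by `len(evaluation_set(f[i]))` in descending order would then give the order access sequence of Step 2.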

c) The fusion of isolated vertexes

Most vertexes are assigned to a community during the formation of communities. However, a small number of isolated vertexes remain that are not assigned to any community. Neglecting or misclassifying isolated vertexes reduces the accuracy of community detection. Each isolated vertex is therefore merged into the community at minimum average distance, which is found by comparing its average distances to all communities. The average distance $d(V_k, C_i)$ between isolated vertex $V_k$ and community $C_i$ is obtained as shown in Eq (11).

$$d(V_k, C_i) = \frac{2}{N_{C_i}(N_{C_i} - 1)} \sum_{j \in C_i} dis(V_k, V_j) \tag{11}$$

where $N_{C_i}$ is the number of vertexes of community $C_i$. If the average distance $d(V_k, C_m)$ is the minimum, vertex $V_k$ is assigned to community $C_m$ and carries the label of $C_m$, as shown in Eq (12).

$$V_k \in C_m \quad \text{if} \quad d(V_k, C_m) = \min_{h=1}^{K} d(V_k, C_h) \tag{12}$$
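The fusion rule of Eq (11)-(12) can be sketched as follows. This is a minimal illustration with hypothetical names (`vertex_community_distance`, `fuse_isolated`). It keeps the normalization of Eq (11) exactly as printed, assumes every base community has at least two vertexes, and measures distances against the K base communities only, so the result does not depend on the order in which isolated vertexes are processed.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors (rows of the adjacency matrix)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vertex_community_distance(M, k, members):
    """Eq. (11) as printed: 2 / (N_C (N_C - 1)) * sum of dis(V_k, V_j) over j in C."""
    n_c = len(members)
    total = sum(dist(M[k], M[j]) for j in members)
    return 2.0 * total / (n_c * (n_c - 1))

def fuse_isolated(M, base, isolated):
    """Eq. (12): attach each isolated vertex to the base community at minimum distance."""
    result = {c: list(members) for c, members in base.items()}
    for k in isolated:
        best = min(base, key=lambda c: vertex_community_distance(M, k, base[c]))
        result[best].append(k)
    return result
```

For two 4-cliques joined by one edge, with two vertexes of each clique already forming a base community, the remaining clique members are merged back into their own clique.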

The pseudo code of initial community labels is shown in Table 1.

3.2 The modification of vertexes labels

Some vertexes are still misclassified when the initial community detection is completed, especially vertexes with poor connectivity that lie between communities. Vertexes that are unstable or have large diversity are selected, and their community labels are revised using the fuzzy C-means membership vector. The communities $C = \{C_1, C_2, C_3, \ldots, C_K\}$ are obtained from the initial community detection results. Each network vertex is assigned to one of the $K$ communities and carries the label of that community. The modification of vertex labels consists of three main steps.

a) The establishment of membership vector

The fuzzy C-means membership vector $u_n = \{u_{1n}, u_{2n}, u_{3n}, \ldots, u_{Kn}\}$ is determined by the neighbor evaluation vertexes of vertex $V_n$. $u_{in}$ indicates the probability that vertex $V_n$ is affiliated with community $C_i$, and is obtained by normalization as shown in Eq (13).

$$u_{in} = \frac{\sum_{e=1}^{Nb_{C_i}^{n}} f_n^{e}}{\sum_{j=1}^{Nb_{V_n}} f_n^{j}} \tag{13}$$

where $Nb_{C_i}^{n}$ is the number of vertexes in the neighbor evaluation set $SNb_{V_n}$ that belong to community $C_i$, which may vary during the iteration. If the i-th element of vector $u_n$ is the largest, vertex $V_n$ most probably belongs to community $C_i$.

b) The selection of unstable vertexes

When the communities do not satisfy the stable cut-off condition, there are unstable or large-diversity vertexes among the communities. Unstable vertexes are selected by checking whether the current label of a vertex agrees with the label of its largest membership element.

$$V_s \in S_{uv} \quad \text{if} \quad V_s \in C_j, \quad u_{ps} = \max_{h=1}^{K}(u_{hs}), \quad j \neq p \tag{14}$$

If the community label $j$ of vertex $V_s$ does not match the label $p$ of the largest element of vector $u_s$, vertex $V_s$ is considered an unstable vertex and is added to the unstable vertex set $S_{uv}$, as shown in Eq (14).

c) Labels modification

The labels of unstable vertexes are changed to the label of the maximum membership element. For an unstable vertex $V_m$, $u_m$ contains the membership probability distribution between $V_m$ and all communities.

$$V_m \in C_j \quad \text{if} \quad V_m \in S_{uv}, \quad u_{jm} = \max_{h=1}^{K}(u_{hm}), \quad j \neq i \tag{15}$$

If the label $j$ of the largest element of $u_m$ is not equal to the current label $i$ of vertex $V_m$, $V_m$ is assigned to community $C_j$, and $j$ becomes the new community label of $V_m$, as shown in Eq (15).

3.3 Parameters update

During the iterative process, the parameters are updated continuously until the cut-off condition on the objective function is satisfied. When the entire network is stable, the algorithm stops iterating and outputs the final community detection.

a) Cut-off condition

The objective function is used to judge the iterative status of the communities. During the t-th iteration, the KL-FCM objective function $J_{KLFCM}^{t}$ is given by Eq (16).

$$J_{KLFCM}^{t} = \sum_{i=1}^{K} \sum_{j=1}^{N_{C_i}} u_{ij}^{t}\, fd_{ij}^{t} + \lambda \sum_{i=1}^{K} \sum_{j=1}^{N_{C_i}} u_{ij}^{t} \log\left( \frac{u_{ij}^{t}}{\pi_{ij}^{t}} \right) \tag{16}$$

where $\pi^{t}$ is the prior information during the t-th iteration. The iteration stops when the cut-off condition in Eq (17) is satisfied.

$$\left| J_{KLFCM}^{t} - J_{KLFCM}^{t-1} \right| < \varepsilon \tag{17}$$

where $\varepsilon$ is the status threshold.

b) Parameters update

The model information is renewed for the next iteration. The parameters are the prior information vector $\pi^{t+1}$ and the distance vector $fd^{t+1}$, which are obtained as shown in Eq (18)-(19).

$$fd_{ij}^{t+1} = \frac{1}{2} \log(2\pi) + \log(\sigma_i^{t+1}) + \frac{(M_j - \mu_i^{t+1})^{2}}{2(\sigma_i^{t+1})^{2}} \tag{18}$$

where $\mu_i^{t+1}$ and $\sigma_i^{t+1}$ are the internal mean and variance vectors of community $C_i$ during the (t+1)-th iteration, and $M_j$ is the feature vector of internal vertex $V_j$ in community $C_i$.
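Eq (18) has the form of a Gaussian negative log-likelihood, which the sketch below evaluates for one vertex and one community. Several points are assumptions rather than statements from the text: $\sigma$ is treated as a per-coordinate standard deviation (which is what the $\log \sigma$ and $2\sigma^2$ terms imply), the per-coordinate terms are summed into a single scalar, and a small `floor` keeps $\sigma$ away from zero for coordinates on which all community members agree. The names `community_stats`, `fd_distance` and `floor` are hypothetical.

```python
import math

def community_stats(M, members):
    """Per-coordinate mean and standard deviation of the feature vectors of one community."""
    dim, n = len(M[0]), len(members)
    mu = [sum(M[j][c] for j in members) / n for c in range(dim)]
    sd = [math.sqrt(sum((M[j][c] - mu[c]) ** 2 for j in members) / n) for c in range(dim)]
    return mu, sd

def fd_distance(x, mu, sd, floor=1e-3):
    """Eq. (18), summed over coordinates (assumption); `floor` guards against sd == 0."""
    total = 0.0
    for xc, mc, sc in zip(x, mu, sd):
        s = max(sc, floor)
        total += 0.5 * math.log(2 * math.pi) + math.log(s) + (xc - mc) ** 2 / (2 * s * s)
    return total
```

A vertex whose feature vector sits at the community mean gets the smallest distance for a given spread, and the distance grows quadratically as it moves away.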

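$$\pi_{ij}^{t+1} = \frac{1}{Nb_{C_i}^{j}} \sum_{k \in SNb_{C_i}^{j}} u_{ik}^{t+1} \tag{19}$$

where $SNb_{C_i}^{j}$ is the part of $V_j$’s neighbor evaluation set that belongs to community $C_i$, and $Nb_{C_i}^{j}$ is its size. The pseudo code of labels modification is shown in Table 2.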

Table 2 The pseudo code of labels modification

Input: initial community detection result C
Step 1: calculate the initial objective function J_KLFCM, flag = 1
Step 2: while flag
Step 3:   if the cut-off condition ε is not satisfied
            for all k = 1..K do
              select unstable vertexes in community C_k by Eq (13)-(14)
              revise labels of unstable vertexes by Eq (15)
              update parameters fd_k^{t+1}, π_k^{t+1}, u_k^{t+1}, μ_k^{t+1}, σ_k^{t+1} by Eq (18)-(19)
            end_for
            update objective function J_KLFCM^{t+1} by Eq (16), t = t + 1
            return to Step 3
Step 4:   else
            the iterative calculation ends and flag = 0, return to Step 2
          end_if
Step 5: end_while
Output: final community detection result C_final
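The core operations of Table 2 can be sketched as below: the membership vector of Eq (13), one selection-and-relabel pass of Eq (14)-(15), the objective of Eq (16) and the stopping test of Eq (17). This is a minimal sketch, not the authors' implementation. It assumes that `f[n]` holds the (score, neighbour) pairs of vertex n restricted to its neighbor evaluation set, that labels are integer community indices 0..K-1, that all unstable vertexes are relabelled together at the end of a pass, and that a vertex whose current label ties for the maximum is left alone. All names are hypothetical.

```python
import math

def membership(f_n, labels, K):
    """Eq. (13): share of vertex n's evaluation score that falls in each community."""
    total = sum(score for score, _ in f_n)
    u = [0.0] * K
    for score, k in f_n:
        u[labels[k]] += score / total
    return u

def relabel_unstable(f, labels, K):
    """One pass of Eqs. (14)-(15); returns the vertexes whose label changed."""
    moves = []
    for n, f_n in enumerate(f):
        if not f_n:
            continue
        u = membership(f_n, labels, K)
        p = max(range(K), key=lambda i: u[i])
        if u[labels[n]] < u[p]:        # Eq. (14): current label is not the largest element
            moves.append((n, p))
    for n, p in moves:                 # Eq. (15): adopt the label of the largest element
        labels[n] = p
    return [n for n, _ in moves]

def kl_fcm_objective(u, fd, prior, lam):
    """Eq. (16) for nested lists indexed [community][member]."""
    J = 0.0
    for u_i, fd_i, p_i in zip(u, fd, prior):
        for u_ij, fd_ij, p_ij in zip(u_i, fd_i, p_i):
            if u_ij > 0:               # the term 0 * log(0) is taken as 0
                J += u_ij * fd_ij + lam * u_ij * math.log(u_ij / p_ij)
    return J

def converged(J_new, J_old, eps):
    """Eq. (17): stop when the objective changes by less than the status threshold."""
    return abs(J_new - J_old) < eps
```

A driver loop in the spirit of Table 2 would call `relabel_unstable`, recompute the parameters, evaluate `kl_fcm_objective` and stop once `converged` returns true.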

4 Experimental results and analysis

In this paper, the performance of community detection is tested on artificially labelled (synthetic) networks and on real networks.

4.1 Synthetic network

The artificially synthesized networks consist of the GN and LFR networks. They simulate conditions that exist in real networks through differences in vertex degree, connection modes and community scale. In the synthetic experiments, LPA-FCM is compared with LPA [9] and Step-LPA-S [10] using the normalized mutual information (NMI) [14] and modularity [15] criteria.

4.1.1 GN network

The Girvan-Newman benchmark network (GN network) is usually used to simulate a small-scale network whose vertexes are evenly distributed over several communities and whose vertex degree is fixed. The GN network consists of 128 vertexes, distributed evenly over $l$ communities of $g$ vertexes each. The average degree $\langle k \rangle$ is set to a fixed value. The number of external connections $k_{out}$ of each vertex is determined by the internal connection probability $P_{in}$ and the external connection probability $P_{out}$. Varying $k_{out}$ changes the network structure quantitatively. In this paper, the community number $l$ is 4, the community size $g$ is 32, the average degree $\langle k \rangle$ is 16, and $k_{out}$ varies from 1 to 8. As $k_{out}$ grows, each vertex has more neighbors that do not belong to its own community. When $k_{out}$ exceeds 8, the clustering performance declines sharply and community detection loses its practical meaning.

The NMI and modularity comparison results on the GN network are shown in Fig. 2 (a) and (b). The following observations can be derived from the performance curves:

• When $k_{out}$ varies from 1 to 3, the NMI of all algorithms stays at about 1. When $k_{out}$ is in the range [4, 7], the NMI curve of LPA-FCM begins to decrease slightly, and the other curves decrease to different extents. When $k_{out}$ is 7, LPA-FCM is higher than Step-LPA-S by 34%. When $k_{out}$ is 8, the NMI of LPA drops almost to 0. Step-LPA-S and LPA-FCM can overcome this kind of problem by iterative calculation, and LPA-FCM is higher than the other algorithms by at least 2%.

• When $k_{out}$ varies from 1 to 8, the modularity curves of all algorithms keep declining, but the curve of LPA-FCM stays slightly higher than those of the comparison algorithms. When $k_{out}$ is 7, the modularity of LPA-FCM is superior to the other algorithms by at least 9%. When $k_{out}$ is 8, the modularity of LPA-FCM reaches 0.22 and is higher than the other algorithms by at least 4%.

• The NMI and modularity curves of LPA-FCM are higher than those of Step-LPA-S and LPA on the GN network.

[Figure: panels (a) and (b)]
Fig.2 The test results on the GN benchmark network. The parameters are N=128, l=4, g=32, $\langle k \rangle$=16, and the range of $k_{out}$ is [1, 8]. (a) The NMI result of LPA-FCM, Step-LPA-S and LPA on the GN network. (b) The modularity result of LPA-FCM, Step-LPA-S and LPA on the GN network.
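A GN-style benchmark with the parameters above can be generated as sketched below. The text does not give the values of $P_{in}$ and $P_{out}$; this sketch assumes the usual planted-partition construction, in which each pair of vertexes is linked independently with $P_{in} = (\langle k \rangle - k_{out})/(g - 1)$ inside a community and $P_{out} = k_{out}/(N - g)$ across communities. Under that assumption the degree equals $\langle k \rangle$ only on average, not exactly. The function name `gn_benchmark` and the `seed` argument are hypothetical.

```python
import random

def gn_benchmark(k_out, l=4, g=32, k_avg=16, seed=0):
    """Return (adjacency sets, ground-truth labels) for a GN-style benchmark."""
    rng = random.Random(seed)
    n = l * g
    truth = [v // g for v in range(n)]      # vertexes 0..g-1 form community 0, and so on
    p_in = (k_avg - k_out) / (g - 1)        # expected k_avg - k_out internal neighbours
    p_out = k_out / (n - g)                 # expected k_out external neighbours
    adj = [set() for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            p = p_in if truth[a] == truth[b] else p_out
            if rng.random() < p:
                adj[a].add(b)
                adj[b].add(a)
    return adj, truth
```

Sweeping `k_out` from 1 to 8 and scoring each detected partition against `truth` reproduces the kind of experiment summarized in Fig. 2.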

4.1.2 LFR network

The Lancichinetti-Fortunato-Radicchi benchmark network (LFR network) is usually used to simulate large-scale complex networks whose vertexes are unevenly distributed over several communities and whose vertex degree is not fixed. The LFR network consists of 1000 vertexes, which are distributed over several communities. The number of vertexes in each community is drawn from the range [$C_{min}$, $C_{max}$]. The parameters $\gamma$ and $\beta$ are the exponents of the degree distribution and the community size distribution. The average degree $\langle k \rangle$ and the maximum degree $k_{max}$ determine the connection mode. The topology mixing parameter $\mu$ represents the proportion of the edges of each vertex that connect it to vertexes in other communities. Varying $\mu$ changes the network structure and influences the clustering performance. In this paper, the LFR network parameters are set as follows: $\langle k \rangle$=20, $k_{max}$=50, $C_{min}$=20, $C_{max}$=100, $\gamma$=2, $\beta$=1. The mixing parameter $\mu$ varies from 0.1 to 0.6.

[Figure: panels (a) and (b)]
Fig.3 The test results on the LFR benchmark network. The parameters are N=1000, $\langle k \rangle$=20, $C_{min}$=20, $C_{max}$=100, $k_{max}$=50, $\gamma$=2, $\beta$=1, and the range of $\mu$ is [0.1, 0.6]. (a) The NMI result of LPA-FCM, Step-LPA-S and LPA on the LFR network. (b) The modularity result of LPA-FCM, Step-LPA-S and LPA on the LFR network.

The NMI and modularity comparison results on the LFR network are shown in Fig. 3 (a) and (b). The following observations can be derived from the performance curves:

• When $\mu$ varies from 0.1 to 0.3, the NMI curve of LPA-FCM stays at about 1, while the other algorithms decrease to different extents. When $\mu$ is 0.5, LPA-FCM is higher than Step-LPA-S by 3%. When $\mu$ is 0.6, all algorithms drop below 0.3; LPA-FCM reaches 0.28 and is superior to Step-LPA-S by 10%.

• When $\mu$ is in the range [0.1, 0.6], the modularity curves of all algorithms descend, but LPA-FCM declines more slowly than the comparison algorithms. When $\mu$ is 0.5, LPA-FCM is higher than the other algorithms by at least 8%. When $\mu$ is 0.6, LPA-FCM is superior to Step-LPA-S by 4%.

• The NMI and modularity curves of LPA-FCM are better than those of Step-LPA-S and LPA on the LFR network.
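The two criteria used in Sections 4.1.1 and 4.1.2 can be computed from an edge list and label vectors as sketched below. The sketch uses the standard definitions: Newman modularity for an undirected, unweighted graph, and NMI normalized by the mean of the two entropies, $2I(X;Y)/(H(X)+H(Y))$. The function names are hypothetical, and returning 1.0 when both partitions consist of a single community is a convention chosen here.

```python
import math
from collections import Counter

def modularity(edges, labels):
    """Newman modularity Q for an undirected, unweighted edge list."""
    m = len(edges)
    inside = Counter()   # number of edges with both ends in community c
    degree = Counter()   # total degree of the vertexes of community c
    for a, b in edges:
        degree[labels[a]] += 1
        degree[labels[b]] += 1
        if labels[a] == labels[b]:
            inside[labels[a]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def nmi(x, y):
    """Normalized mutual information between two label vectors of equal length."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx = -sum(c / n * math.log(c / n) for c in cx.values())
    hy = -sum(c / n * math.log(c / n) for c in cy.values())
    mi = sum(c / n * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())
    return 1.0 if hx + hy == 0 else 2 * mi / (hx + hy)
```

NMI needs the ground-truth labels, so it applies to the synthetic benchmarks; modularity needs only the graph and the detected partition, which is why it is the criterion available for real networks without known communities.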

4.2 Real network

The real network experiment reports the optimal modularity obtained on the GN database [16]. The description of the database is given in Table 3. The comparison algorithms are classical algorithms, including LPA [9], BGLL [17], Step-LPA-S [10], Infomap [18], MOCD [19], FC [20], GA [7], FM [5, 6] and MODBS [21].

Table 3 The description of real networks

Database [16]   Description                          Vertexes   Edges
Karate          Zachary’s karate club network        34         78
Dolphin         Dolphins network                     62         159
Football        American college football network    115        613
Polbook         US politics books network            105        441
Email           Email communication network          1133       5254
Netscience      Scientists network                   1589       2742
Power           US power grid network                4941       6953

Table 4 The optimal modularity results of different algorithms on real networks

Algorithms   Karate   Dolphin   Football   Polbook   Email    Netscience   Power    Average
BGLL         0.4188   0.5188    0.6046     0.4986    0.5412   0.9346       0.7756   0.6131
GA           0.4188   0.5014    0.5940     0.5230    0.3283   0.8224       0.6660   0.5505
MODBS        0.4198   0.5259    0.6046     0.5268    0.5355   0.9275       0.8140   0.6220
Infomap      0.4151   0.5204    0.5634     0.5127    0.5233   0.9315       0.8182   0.6120
FM           0.3808   0.4947    0.5813     0.5126    0.5289   0.8581       0.9341   0.5986
MOCD         0.4188   0.5259    0.5958     0.5230    0.3681   0.8923       0.7065   0.5757
FC           0.3573   0.4602    0.5793     0.4787    0.3657   —            0.7450   0.4977
LPA          0.3573   0.4868    0.5897     0.5117    0.5318   0.9174       0.8100   0.6006
Step-LPA-S   0.3715   0.4787    0.5754     0.5017    0.5341   0.9251       0.8541   0.6052
LPA-FCM      0.4198   0.5264    0.6046     0.5257    0.5512   0.9317       0.9051   0.6378

The optimal modularity results of different algorithms on real networks are shown in Table 4. The following observations can be derived from the comparison:

• On the Karate database, LPA-FCM and MODBS reach the best result (0.4198), with BGLL close behind (0.4188). On the Football database, LPA-FCM, MODBS and BGLL share the best result (0.6046). On the Dolphin database, LPA-FCM obtains the optimal result. On the Polbook database, LPA-FCM is lower than MODBS by 0.1% and still better than the others.

• On the Email database, LPA-FCM is superior to the comparison algorithms by at least 1%. On the Netscience database, LPA-FCM is lower than BGLL by 0.3% but still higher than the other algorithms. On the Power database, LPA-FCM is better than the comparison algorithms by at least 5%, except FM.

• On the GN database, the optimal modularity of LPA-FCM is superior to that of LPA and Step-LPA-S by about 3%.

• Regarding the average optimal modularity, LPA-FCM is higher than the comparison algorithms, which indicates that LPA-FCM has better detection performance on real networks.

4.3 Complexity

The complexity of LPA-FCM is divided into three main parts: the merging of initial communities, the fusion of isolated vertexes and the labels modification. The complexity of community merging comes from the establishment of neighbor evaluation vectors and the formation of initial communities. The complexity of initialization is $O(N^2 + mN^2)$, where $m$ is the average number of neighbor evaluation vertexes. $m$ is smaller than the number of edges $E$; it is a fixed value on synthesized networks and less than 10 on real networks. The isolated vertex fusion takes time $O(N^2)$. The labels modification costs $O(t(3m + 2)N)$, where $t$ is the number of iterations. So the overall complexity of LPA-FCM is about $O((2 + m)N^2 + t(3m + 2)N)$. The number of iterations $t$ is usually less than 20. LPA-FCM takes more runtime than LPA, $O(N + E)$, Infomap, $O(N^2 + EN)$, and Step-LPA-S, $O(N^2 + M^2 + t(E + N))$, where $M$ is the average number of vertexes in the sub-networks. But LPA-FCM is still faster than some detection algorithms, including GN and FM with complexity $O(N^3)$ [5, 22, 23].

5 Conclusion

In this paper, an improved community detection method based on label propagation and fuzzy C-means is proposed and applied to real social networks and synthetic networks. It modifies the label propagation framework in three ways. The initial labels of vertexes are assigned by a neighbor evaluation method. The unstable vertexes are selected and their labels are revised based on neighbor membership. The parameters are updated until the cut-off condition on the objective function is satisfied. Experiments show that LPA-FCM improves performance on synthetic and real complex networks. The LPA-FCM algorithm can be extended to intelligent decision support systems, wireless sensor networks, network game theory [31, 35] and other practical complex network applications [38-40].

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (61471299) and Shaanxi Province of China Key Research and Development Project (2017ZDXM-GY-139).
References

[1] R.S. Wang, S. Zhang, Y. Wang, 2008. Clustering complex networks and biological networks by nonnegative matrix factorization with various similarity measures. Neurocomputing. 72, 043. https://doi.org/10.1016/j.neucom.2007.12.043.
[2] S. Pahwa, M. Youssef, P. Schumm, 2013. Optimal intentional islanding to enhance the robustness of power grid networks. Physica A. 392, 029. https://doi.org/10.1016/j.physa.2013.03.029.
[3] R.R. Gharieb, G. Gendy, A. Abdelfattah, 2017. Adaptive local data and membership based KL divergence incorporating C-means algorithm for fuzzy image segmentation. Appl. Soft. Comput. 59, 055. https://doi.org/10.1016/j.asoc.2017.05.055.
[4] Q.H. Zhao, X.L. Li, Y. Li, 2017. A fuzzy clustering image segmentation algorithm based on Hidden Markov Random Field models and Voronoi Tessellation. Pattern. Recogn. Lett. 85, 019. https://doi.org/10.1016/j.patrec.2016.11.019.
[5] A. Clauset, M.E. Newman, C. Moore, 2004. Finding community structure in very large networks. Phys. Rev. E. 70, 066111. https://doi.org/10.1103/physreve.70.066111.
[6] M.E. Newman, M. Girvan, 2004. Finding and evaluating community structure in networks. Phys. Rev. E. 69, 026113. https://doi.org/10.1103/PhysRevE.69.026113.
[7] C. Shi, Z. Yan, Y. Wang, 2010. A genetic algorithm for detecting communities in large-scale complex networks. Adv. Complex. Syst. 13, S0219525910002463. https://doi.org/10.1142/S0219525910002463.
[8] M. Guerrero, F.G. Montoya, R. Baños, 2017. Adaptive community detection in complex networks using genetic algorithms. Neurocomputing. 266, 029. https://doi.org/10.1016/j.neucom.2017.05.029.
[9] U.N. Raghavan, R. Albert, S. Kumara, 2007. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E. 76, 036106. http://dx.doi.org/10.1103/physreve.76.036106.
[10] W. Li, C. Huang, M. Wang, 2017. Stepping community detection algorithm based on label propagation and similarity. Physica A. 472, 030. http://dx.doi.org/10.1016/j.physa.2017.01.030.
[11] J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms, Plenum 1981.
[12] R.R. Gharieb, G. Gendy, A. Abdelfattah, 2017. Adaptive local data and membership based KL divergence incorporating C-means algorithm for fuzzy image segmentation. Appl. Soft. Comput. 59, 055. https://doi.org/10.1016/j.asoc.2017.05.055.
[13] H. Ichihashi, K. Miyagishi, K. Honda, 2001. Fuzzy C-means clustering with regularization by K-L information. FUZZ-IEEE-2001. 13, 1009107. https://doi.org/10.1109/FUZZ.2001.1009107.
[14] L. Danon, A. Díaz-Guilera, J. Duch, 2005. Comparing community structure identification. J. Stat. Mech-theory. E. 2005, P09008. https://doi.org/10.1088/1742-5468/2005/09/P09008.
[15] M.E. Newman, 2006. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E. 74, 036104. https://doi.org/10.1103/physreve.74.036104.
[16] M.E.J. Newman, Network data, 2013, (http://www-personal.umich.edu/~mejn/netdata/).
[17] V.D. Blondel, J.L. Guillaume, R. Lambiotte, 2008. Fast unfolding of communities in large networks. J. Stat. Mech-theory. E. 2008, P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008.
[18] M. Rosvall, C.T. Bergstrom, 2008. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA. 105, 0706851105. https://doi.org/10.1073/pnas.0706851105.
[19] C. Shi, Z. Yan, Y. Cai, 2012. Multi-objective community detection in complex networks. Appl. Soft. Comput. 12, 005. https://doi.org/10.1016/j.asoc.2011.10.005.
[20] P.G. Sun, 2015. Community detection by fuzzy clustering. Physica A. 419, 009. https://doi.org/10.1016/j.physa.2014.10.009.
[21] F. Zou, D. Chen, S. Li, 2017. Community detection in complex networks: Multi-objective discrete backtracking search optimization algorithm with decomposition. Appl. Soft. Comput. 53, 005. https://doi.org/10.1016/j.asoc.2017.01.005.
[22] M. Girvan, M.E.J. Newman, 2002. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA. 99, 122653799. https://doi.org/10.1073/pnas.122653799.
[23] H. Lu, H. Wei, 2012. Detection of community structure in networks based on community coefficients. Physica A. 391, 062. https://doi.org/10.1016/j.physa.2012.06.062.
[24] Y.H. Ma, H.J. Li, X.D. Zhang, 2009. Strength distribution of novel local-world networks. Physica A. 388, 030. https://doi.org/10.1016/j.physa.2009.07.030.
[25] M. Perc, J.J. Jordan, D.G. Rand, 2017. Statistical physics of human cooperation. Phys. Rep. 687, 004. https://doi.org/10.1016/j.physrep.2017.05.004.
[26] Z. Wang, C.T. Bauch, S. Bhattacharyya, 2016. Statistical physics of vaccination. Phys. Rep. 664, 006. https://doi.org/10.1016/j.physrep.2016.10.006.
[27] D. Helbing, D. Brockmann, T. Chadefaux, 2015. Saving Human Lives: What Complexity Science and Information Systems can Contribute. J. Stat. Phys. 158, s10955-014-1024-9. https://doi.org/10.1007/s10955-014-1024-9.
[28] M. Gosak, R. Markovič, J. Dolenšek, 2017. Network science of biological systems at different scales: A review. Phys. Life. Rev. 24, 003. https://doi.org/10.1016/j.plrev.2017.11.003.
[29] M. Jalili, M. Perc, 2017. Information cascades in complex networks. J. Com. Net. 5, cnx019. https://doi.org/10.1093/comnet/cnx019.
[30] Z. Bu, G. Gao, H.J. Li, 2017. CAMAS: A cluster-aware multi-agent system for attributed graph clustering. Inform. Fusion. 37, 002. https://doi.org/10.1016/j.inffus.2017.01.002.
[31] Z. Bu, H. Li, J. Cao, 2016. Game theory based emotional evolution analysis for chinese online

reviews. Knowl-Based. Syst. 103,026. https://doi.org/10.1016/j.knosys.2016.03.026. [32] H.J.Li, Z.Bu, Z.Wang, 2018. Enhance the Performance of Network Computation by a Tunable Weighting Strategy. IEEE.Trans.Emerg.Topics.Compue. 2, 2829906. https://doi. org/ 10. 1109/ TETCI. 2018.2829906. [33] H.J.Li, Z.Bu, Y.Li, 2018. Evolving the attribute flow for dynamical clustering in signed networks. Chaos. Soliton. Fract. 110,009. https://doi.org/10.1016/j.chaos.2018.02.009. [34] Z.Wang, M.Jusup, R.W.Wang, 2017. Onymity promotes cooperation in social dilemma experiments. Sci. Adv. 3, 1601444. https://doi.org/10.1126/sciadv.1601444. [35] C.Shen, C.Chu, H.Guo, 2017. Coevolution of vertex weights resolves social dilemma in spatial networks. Sci. Rep-UK. 7, s41598-017-15603-2. https://doi.org/10.1038/s41598 -017-15603-2. [36] Z.Wang, S.Kokubo, M.Jusup, 2015. Universal scaling for the dilemma strength in evolutionary games. Phys. Life. Rev. 14,033. https://doi.org/10.1016/j.plrev.2015.04.033. [37] P. Holme, J. Saramäki, 2012. Temporal networks .Phys. Rep. 519,001. https://doi.org/ 10. 1016/ j.physrep. 2012. 03.001. [38] L.Wang, X.Li, 2014. Spatial epidemiology of networked meta-population: an overview. Chin. Sci. Bull. 59, s11434-014-0499-8. https://doi.org/10.1007/s11434-014-0499-8. [39] J.B.Wang, L.Wang, X.Li, 2016. Identifying spatial invasion of pandemics on meta-population networks via anatomizing arrival history. IEEE.T.Cybernetics. 46, 2489702. https://doi.org/ 10.1109/tcyb.2015.2489702. [40] L.Wang. J.T.Wu, 2018. Characterizing the dynamics underlying global spread of epidemics. Nat. Commun.9, s41467-017-02344-z. https://doi.org/ 10.1038/s41467-017-02344-z.
